AI Agentic Process Audit: Process Audit and Replay Assurance Architecture
AI Agentic Process Audit / Workflow Replay / Assurance Architecture: A Walkthrough
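Intended audience: Senior AI PM / AI Architect / Internal Audit Partner / Process Owner / CBAP-level BA / AI Governance Lead / Financial Retail Operations Leader.
Core question: When an AI agent does more than answer questions, and instead interprets intent, makes plans, calls tools, requests human approval, executes process actions, handles exceptions and learns from incidents, how does an organization design an evidence architecture that is reviewable, replayable, challengeable and continuously improvable?
Learning goal: Build an advanced mental model of the agentic process audit model, replayable trace, event-sourced agent workflow, plan/action/observation/approval/output schema, audit query, sampling strategy, process conformance, incident replay and assurance operating model.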
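Important note: This document is learning, portfolio and internal architecture training material. It does not constitute legal advice, regulatory interpretation, an audit opinion, a model validation conclusion, a conclusion on internal control effectiveness, a risk acceptance decision or a production release approval. The terms audit, assurance, evidence, control effectiveness and process owner review, as used here, all refer to an evidence architecture that supports independent review and challenge by internal audit, risk, compliance, business owners and management. They do not represent any formal audit sign-off or regulatory endorsement. Access dates are recorded as 2026-06-30.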
Source Anchors
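The following sources are used to organize the language of AI risk management, AI management systems, architecture description, requirements engineering, observability, provenance and financial-institution IT examination. This document uses them only as design anchors for product, architecture, BA and internal assurance work. It does not claim that any schema, trace or replay workbench automatically satisfies audit, regulatory or model risk approval.
| Source | Official link | Ideas adopted in this document |
|---|---|---|
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Use Govern / Map / Measure / Manage to organize risk identification, measurement, control, runtime monitoring and improvement evidence for agentic workflows. |
| ISO/IEC 42001 AI management system | https://www.iso.org/standard/81230.html | Use the AI management system's scope, policy, operation, performance evaluation, management review and improvement to design a continuous assurance operating model. |
| ISO/IEC/IEEE 42010 Architecture Description | https://www.iso.org/standard/74393.html | Use stakeholder, concern, viewpoint, architecture view, correspondence and rationale to organize a multi-view description of the replay architecture. |
| ISO/IEC/IEEE 29148 Requirements Engineering | https://www.iso.org/standard/72089.html | Use stakeholder need, requirement, verification, validation, traceability and information item to design process claims and evidence contracts. |
| OpenTelemetry Documentation | https://opentelemetry.io/docs/ | Use traces, metrics, logs, context propagation and semantic attributes to design runtime observability and the replay trace. |
| W3C PROV Overview | https://www.w3.org/TR/prov-overview/ | Use the Entity, Activity, Agent provenance model to express who generated an agentic behavior, based on what, and through which activity. |
| FFIEC IT Handbook | https://ithandbook.ffiec.gov/ | Use the language of financial-institution IT risk, examination, governance, outsourcing, business continuity and control assessment to calibrate the financial retail scenarios. |
In one sentence: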
Agentic process audit is the evidence architecture discipline that turns an AI agent's intent, plan, tool actions, policy decisions, human approvals, exceptions, outputs and learning signals into replayable process evidence for review, challenge and improvement.
1. Executive Summary
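Reviews of conventional AI systems typically focus on:
- Whether the model performs well enough.
- Whether RAG cites the correct sources.
- Whether the output is compliant, safe and explainable.
- Whether monitoring and incident response exist after launch.
Agentic workflows are more complex because the system spans multiple business steps: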
User intent
-> intent interpretation
-> plan generation
-> policy decision
-> tool selection
-> tool call
-> observation
-> human approval
-> exception handling
-> compensating action
-> final output
-> feedback
-> incident learning
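If the organization keeps only the final answer or ordinary API logs, internal audit, business owners, risk teams and architecture teams cannot answer key questions:
| Review question | Gap in ordinary logs |
|---|---|
| What the user actually asked for, and how the agent interpreted the intent | Logs keep only the request text, with no intent classification or confidence |
| Why the agent chose this plan | No plan version, candidate plan, rejected plan or rationale |
| Which tool action affected a customer, funds, a case or regulatory material | Only a successful API call is visible; side effect, idempotency, approval and rollback are not |
| Why the policy allowed, blocked or escalated | No policy version, decision reason or obligations |
| What evidence the human approver saw | Only "approved" is recorded, with no visible evidence set |
| Whether an exception is a compliance deviation or a reasonable business exception | No exception reason, owner, expiry or compensating control |
| Whether the process can be replayed after an incident | Broken timeline, incomplete version set, and sensitive content that cannot be accessed safely |
| Whether controls are effective | No sampling universe, population definition, control test result or exception aging |
A mature agentic process audit architecture does not ask the AI team to fill in more forms. It generates evidence natively in the workflow runtime: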
Process claim
-> replayable trace
-> event-sourced agent workflow
-> evidence chain
-> audit query
-> sampling and testing
-> process conformance review
-> incident replay
-> control improvement
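The job of a senior AI PM, AI Architect and CBAP-level BA is to move an agentic workflow from "looks automated" to "demonstrably controlled". "Demonstrably" here does not mean the output is fully deterministic, nor that an audit conclusion follows automatically. It means the organization can rebuild sufficient evidence, within permission, privacy and retention boundaries, to support independent challenge, management review and continuous improvement.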
2. Target Audience and Role Expectations
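| Role | Questions to answer | Typical outputs |
|---|---|---|
| Senior AI PM | Does the agentic workflow genuinely improve business results while producing runtime evidence that is reviewable, reversible and usable for learning? | outcome evidence map, replay readiness gate, scale/stop memo |
| AI Architect | How should trace, event store, provenance graph, replay workbench, redaction and retention be designed? | replay reference architecture, event schema, audit query architecture |
| Internal Audit Partner | Which process claims can be verified, which evidence can be used for independent testing, and which areas still need human challenge? | audit query catalog, sampling design input, evidence sufficiency review |
| Process Owner | Does the agent run according to the approved process, are exceptions reasonable, and are human approvals and compensating actions effective? | process conformance report, exception aging, control action plan |
| CBAP-level BA | Are user intent, business rules, process states, exception paths, approvals and acceptance criteria traceable? | process claim model, state/event dictionary, control-to-evidence matrix |
| Risk / Compliance | Are automation boundaries, customer impact, policy decisions, risk acceptance and compensating controls clear? | risk scenario map, policy decision evidence, exception register |
| Operations Leader | Are the HITL queue, review capacity, override taxonomy, incident route and training operable? | review SOP, capacity evidence, override dashboard |
| Platform / SRE | Are evidence collection, observability, data minimization, access control, SLOs and failure recovery implementable? | OTel instrumentation, evidence pipeline, replay SLO |
Mature organizations treat agentic process audit as part of the process automation product, not as an evidence patch that internal audit raises at the last minute before launch.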
3. Learning Objectives
After completing this document, the reader should be able to:
- Distinguish audit trail, observability, provenance, evidence chain and workflow replay.
- Define a process claim and decompose it into verifiable workflow, decision, control and outcome evidence.
- Design an event-sourced agent workflow covering intent, plan, action, observation, approval, policy, exception, output, feedback and incident learning.
- Write the plan/action/observation/approval/output event schema, specifying fields, versions, retention, redaction and access boundaries.
- Explain the difference between causality and chronology, and avoid treating a timeline as causal proof.
- Design a workflow replay architecture that supports technical replay, business replay, control replay and incident replay.
- Build an audit query catalog that supports internal audit, process owners, risk, compliance, model risk and incident review.
- Design a risk-based sampling strategy that tests both design effectiveness and operating effectiveness of controls.
- Distinguish process conformance, justified exception, policy override and control failure.
- Apply these ideas to an AML investigation copilot, payment dispute assistant, KYC onboarding agent, collections hardship case agent, regulatory reporting narrative drafter and payment operations repair queue agent.
4. Core Concepts
4.1 Process Claim
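A process claim is a verifiable assertion the team makes about an agentic workflow. It cannot be a vague statement such as "agent improves operations". It must be written as a statement that can be reviewed, tested and sampled.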
| Weak claim | Strong process claim |
|---|---|
| AML copilot helps analysts work faster | For AML alert type A and B, the copilot generates a sourced case timeline and draft narrative, but final disposition remains analyst-owned and every saved narrative has source-span evidence. |
| KYC agent automates onboarding | The KYC onboarding agent classifies documents, identifies missing evidence and drafts customer follow-up, but cannot reject an applicant without human review and appeal path. |
| Payment repair agent fixes exceptions | The payment repair queue agent suggests repair actions and executes only low-risk reversible updates after policy decision and dual-control approval. |
Components of a process claim:
Business outcome
+ workflow scope
+ agent responsibility
+ human responsibility
+ policy boundary
+ evidence requirement
+ control and exception handling
+ outcome measurement
4.2 Replayable Trace
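A replayable trace is not simply a stored copy of every raw prompt, answer and log. It is a minimal sufficient evidence set that can, under authorized conditions, reconstruct the key process facts:
- Who initiated the request, and with what role and permissions.
- How the user intent was interpreted, and whether ambiguity existed.
- What plan the agent generated, and which steps were executed or skipped.
- Which tools were called, and what the input, output, side effect and idempotency evidence is.
- Which policy decisions allowed, blocked, escalated or attached obligations.
- Which human approvals, overrides, rejections or supplementary inputs occurred.
- Which exceptions were identified, and what compensating actions were taken.
- How the final output was generated, stored, sent or discarded.
- How feedback, incident signals and improvement actions entered the learning loop.
Replayable does not mean bit-for-bit deterministic. For LLMs and third-party models, exact reproduction of an output often cannot be promised. A mature statement reads: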
We can reconstruct the approved version set, inputs at the agreed evidence level, policy decisions, tool actions, approvals, outputs and observed outcomes, while documenting reproducibility limits.
4.3 Event-Sourced Agent Workflow
An event-sourced agent workflow records agent behavior as a sequence of business events that cannot be arbitrarily rewritten. The current workflow state can be projected from the events, and incident reviews and process reviews can be reconstructed from them.
| Event family | Typical events | Review value |
|---|---|---|
| Intent | intent.received, intent.classified, intent.clarified | Shows the relationship between the user's request and the agent's interpretation |
| Plan | plan.generated, plan.reviewed, plan.revised, plan.approved | Shows why the agent acted along a given path |
| Action | action.proposed, tool.invoked, tool.completed, tool.failed | Shows tool actions, parameters, results and side effects |
| Observation | observation.received, context.loaded, retrieval.completed | Shows the facts and evidence the agent saw |
| Policy | policy.evaluated, policy.blocked, policy.escalated | Shows that control points actually ran |
| Approval | approval.requested, approval.decided, approval.expired | Shows HITL and dual control |
| Exception | exception.detected, override.recorded, compensation.executed | Shows exception management and remediation |
| Output | output.drafted, output.finalized, output.delivered, output.discarded | Shows the origin of outputs in customer, regulatory or business records |
| Feedback | feedback.captured, qa.sampled, user.edited | Shows adoption, quality and learning signals |
| Incident | incident.signal.detected, incident.replay.started, learning.action.created | Shows incident review and the improvement loop |
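The idea that current workflow state can be projected from events can be sketched as a simple fold over an ordered event list. This is a minimal illustration, not a reference implementation: the `STATUS_BY_EVENT` mapping, the status names and the `WorkflowState` shape are all assumptions made for the example, and a real event store would add ordering, schema validation and access control.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from event type to the workflow status it implies.
STATUS_BY_EVENT = {
    "intent.received": "intent_received",
    "plan.generated": "planned",
    "policy.evaluated": "policy_checked",
    "approval.requested": "awaiting_approval",
    "approval.decided": "approval_decided",
    "tool.completed": "tool_completed",
    "output.finalized": "finalized",
}


@dataclass
class WorkflowState:
    workflow_id: str
    status: str = "new"
    applied_event_ids: list = field(default_factory=list)


def project_state(workflow_id, events):
    """Fold an ordered event list into the current state of one workflow."""
    state = WorkflowState(workflow_id)
    for event in events:
        if event["workflow_id"] != workflow_id:
            continue
        # Event types without a mapping keep the status but are still recorded.
        state.status = STATUS_BY_EVENT.get(event["event_type"], state.status)
        state.applied_event_ids.append(event["event_id"])
    return state


events = [
    {"event_id": "e1", "workflow_id": "wf_1", "event_type": "intent.received"},
    {"event_id": "e2", "workflow_id": "wf_1", "event_type": "plan.generated"},
    {"event_id": "e3", "workflow_id": "wf_2", "event_type": "intent.received"},
    {"event_id": "e4", "workflow_id": "wf_1", "event_type": "policy.evaluated"},
    {"event_id": "e5", "workflow_id": "wf_1", "event_type": "feedback.captured"},
]

state = project_state("wf_1", events)
print(state.status, state.applied_event_ids)
```

Because the state is derived rather than stored, replaying the same events during an incident review yields the same projection, which is what makes the event sequence usable as review evidence.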
5. Agentic Process Audit Model
5.1 Audit Layers
The agentic process audit model can be divided into eight layers.
| Layer | Core question | Evidence objects |
|---|---|---|
| 1. Intent layer | What business intent was triggered by the user, system or queue? | request record, intent classification, clarification prompt, user role |
| 2. Process layer | Which approved process, state machine and business rules does the intent map to? | workflow definition, state transition, process version, BPMN / state model |
| 3. Planning layer | How does the agent turn intent into a plan, and is approval required? | plan candidate, plan rationale, risk tier, approval requirement |
| 4. Control layer | Which policy, entitlement, SoD, risk and compliance controls were executed? | policy decision, permission result, SoD check, obligations |
| 5. Action layer | What did the agent read, generate, write or submit? | tool contract, tool input hash, tool output pointer, side effect id |
| 6. Human layer | How did people review, approve, override, reject or supplement facts? | visible evidence set, decision reason, reviewer role, edit diff |
| 7. Output layer | Which outputs entered business records, customer communications or regulatory material? | output hash, citation map, delivery state, record id |
| 8. Learning layer | How do defects, feedback, incidents and sampling results change future behavior? | QA finding, incident replay report, eval case, control action |
5.2 Process Claim to Evidence Chain
A process claim must connect to an evidence chain:
Process claim
-> requirement
-> control objective
-> workflow event
-> evidence object
-> audit query
-> sampling approach
-> control result
-> management action
Example: Payment dispute assistant
| Chain element | Example |
|---|---|
| Process claim | Assistant drafts dispute evidence packet but does not submit chargeback without maker-checker approval. |
| Requirement | All customer-impacting dispute submissions require reviewer approval based on visible evidence set. |
| Control objective | Prevent unsupported or unauthorized chargeback submissions. |
| Workflow event | approval.decided before tool.invoked for chargeback_submit. |
| Evidence object | approval record, visible evidence hash, tool input hash, policy decision id, side effect id. |
| Audit query | Show all chargeback submissions where approval is missing, expired or not tied to exact execution input. |
| Sampling approach | 100% automated exception query plus monthly sample of approved submissions. |
| Control result | Exceptions by reason, reviewer quality, stale approval count, remediation status. |
| Management action | Update tool gateway to block mismatched approval input hashes. |
6. Event Schema
6.1 Common Event Envelope
Agentic process events should share a stable envelope, even if each event type has domain-specific payload.
| Field | Purpose |
|---|---|
| event_id | globally unique event identifier |
| event_type | stable event name such as ai.plan.generated |
| schema_version | payload contract version |
| occurred_at | event time in UTC |
| recorded_at | collection time in UTC |
| trace_id | end-to-end workflow trace |
| workflow_id | business workflow instance |
| case_id_hash | privacy-preserving case reference |
| use_case_id | AI use case registry id |
| risk_tier | workflow risk classification |
| producer | service, gateway or application emitting event |
| actor_type | user, agent, service, reviewer, policy engine |
| actor_id_hash | hashed or tokenized actor identity |
| data_class | classification for privacy, security and retention |
| retention_policy_id | retention rule applied to event |
| redaction_profile | how sensitive content was minimized |
| prev_event_ids | causal predecessors, not only previous time event |
| evidence_refs | pointers to controlled evidence objects |
| integrity_hash | tamper-evidence or content hash |
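One way to populate the integrity_hash field is to hash a canonical serialization of the rest of the envelope. The sketch below assumes SHA-256 over sorted-key JSON; the function names `compute_integrity_hash`, `seal_event` and `verify_event` are hypothetical, and the source does not prescribe a particular hashing scheme.

```python
import hashlib
import json


def compute_integrity_hash(event):
    """Hash every envelope field except integrity_hash itself."""
    body = {k: v for k, v in event.items() if k != "integrity_hash"}
    # Sorted keys and fixed separators give a stable byte representation.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal_event(event):
    """Return a copy of the event with integrity_hash filled in."""
    sealed = dict(event)
    sealed["integrity_hash"] = compute_integrity_hash(event)
    return sealed


def verify_event(event):
    """Check that the stored hash still matches the envelope content."""
    return event.get("integrity_hash") == compute_integrity_hash(event)


draft = {
    "event_id": "evt_demo_001",
    "event_type": "ai.plan.generated",
    "schema_version": "1.0",
    "occurred_at": "2026-06-30T18:42:10Z",
    "risk_tier": "high",
    "prev_event_ids": [],
}

sealed = seal_event(draft)
tampered = dict(sealed, risk_tier="low")
print(verify_event(sealed), verify_event(tampered))
```

A bare content hash only detects changes made by someone who does not also recompute the hash. Stronger tamper evidence would need signing or chaining each hash to its predecessors, which is outside this sketch.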
6.2 Plan / Action / Observation / Approval / Output Schema
| Schema | Required fields | Assurance use |
|---|---|---|
| Plan event | plan_id, plan_version, goal, steps, risk_assessment, requires_approval, rationale_hash, rejected_options | Shows why the agent intended to act and whether the plan crossed a risk boundary. |
| Action event | action_id, tool_name, tool_schema_version, action_type, input_hash, dry_run_result, policy_decision_id, approval_id, side_effect_id, idempotency_key | Shows tool use, write boundaries, approvals and recoverability. |
| Observation event | observation_id, source_system, source_version, fact_type, fact_hash, retrieval_refs, confidence_band, freshness | Shows what facts the agent used and whether they were current and authorized. |
| Policy event | policy_id, policy_version, decision, reason_code, obligations, input_attribute_hash, evaluation_mode | Shows allow, block, escalate, redact, approval required or restrict decisions. |
| Approval event | approval_id, request_reason, approver_role, decision, reason_code, visible_evidence_hash, approval_scope, expiry | Shows HITL, maker-checker and SoD evidence. |
| Exception event | exception_id, exception_type, detected_by, severity, justification, owner, compensating_control, expiry, closure_evidence | Shows whether exceptions are controlled or unmanaged drift. |
| Output event | output_id, output_type, output_hash, citation_map, safety_label, record_system, delivery_channel, final_status | Shows what was finalized, stored, delivered or suppressed. |
| Feedback event | feedback_id, feedback_type, accept_edit_reject, edit_diff_hash, reason_code, business_outcome_ref | Shows adoption, quality signals and business outcome evidence. |
6.3 Example Event
{
  "event_id": "evt_20260630_kyc_000184",
  "event_type": "ai.approval.decided",
  "schema_version": "1.0",
  "occurred_at": "2026-06-30T18:42:10Z",
  "recorded_at": "2026-06-30T18:42:11Z",
  "trace_id": "trc_kyc_onboarding_72f1",
  "workflow_id": "wf_kyc_case_review_9482",
  "case_id_hash": "sha256:1d4f...",
  "use_case_id": "kyc_onboarding_agent",
  "risk_tier": "high",
  "producer": "kyc-review-workbench",
  "actor_type": "reviewer",
  "actor_id_hash": "hash:user:9a27",
  "data_class": "restricted",
  "retention_policy_id": "ret_ai_high_business_record_7y",
  "redaction_profile": "pii-minimized-v3",
  "prev_event_ids": ["evt_20260630_kyc_000181", "evt_20260630_kyc_000183"],
  "evidence_refs": [
    "evidence://visible-set/vis_kyc_5521",
    "evidence://policy-decision/pol_kyc_8910",
    "evidence://tool-input/tool_kyc_followup_2322"
  ],
  "integrity_hash": "sha256:ab77...",
  "data": {
    "approval_id": "appr_kyc_7721",
    "request_reason": "customer_followup_message",
    "approver_role": "KYC_Senior_Reviewer",
    "decision": "approved_with_edit",
    "reason_code": "missing_ubo_evidence_clearer_language",
    "visible_evidence_hash": "sha256:392e...",
    "approval_scope": "draft_customer_followup_only",
    "expiry": "2026-07-01T18:42:10Z"
  }
}
7. Causality vs Chronology
Workflow replay must not confuse chronological order with causal explanation.
| Chronology asks | Causality asks |
|---|---|
| What happened before what? | Which event caused, enabled, constrained or justified another event? |
| What was the next timestamp? | What policy, approval, observation or plan step was a required predecessor? |
| Did the tool call occur after approval? | Was this approval scoped to this exact tool input and still valid at execution time? |
| Did the output follow retrieval? | Did the output's material claims rely on retrieved evidence or unsupported model generation? |
Agentic workflow needs both:
- Chronological timeline for human-readable incident review.
- Causal graph for control testing, evidence lineage and independent challenge.
Example:
Approval A happened before Tool Call B.
That is chronology.
Approval A authorized input hash H1.
Tool Call B executed input hash H2.
H1 != H2.
That is causal control failure.
A mature replay architecture should record prev_event_ids, causal_refs, policy_decision_id, approval_scope, input_hash and side_effect_id, so that reviewers can tell the difference between "happened in time order" and "was correctly authorized".
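The H1 versus H2 example can be expressed as a small binding check that looks beyond timestamps. This is a sketch under stated assumptions: `approved_input_hash` is a hypothetical field standing in for "the input hash the approval authorized", and the finding codes are invented for illustration.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 UTC timestamp that ends in 'Z'."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def check_approval_binding(approval, tool_call):
    """Return the reasons why an approval does not authorize a tool call."""
    findings = []
    if approval["approval_id"] not in tool_call["causal_refs"]:
        findings.append("approval_not_referenced")
    if approval["approved_input_hash"] != tool_call["input_hash"]:
        findings.append("input_hash_mismatch")
    executed_at = parse_utc(tool_call["occurred_at"])
    if executed_at < parse_utc(approval["occurred_at"]):
        findings.append("executed_before_approval")
    if executed_at > parse_utc(approval["expiry"]):
        findings.append("approval_expired")
    return findings


approval_a = {
    "approval_id": "appr_A",
    "approved_input_hash": "H1",  # hypothetical field: what was approved
    "occurred_at": "2026-06-30T18:42:10Z",
    "expiry": "2026-07-01T18:42:10Z",
}
tool_call_b = {
    "action_id": "act_B",
    "input_hash": "H2",  # what was actually executed
    "occurred_at": "2026-06-30T18:45:00Z",
    "causal_refs": ["appr_A"],
}

print(check_approval_binding(approval_a, tool_call_b))  # ['input_hash_mismatch']
```

The tool call passes every chronological test (it happened after the approval and before expiry) and still fails the causal one, which is the distinction this section draws.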
8. Workflow Replay Architecture
8.1 Logical Architecture
Business workflow UI / queue / API
|
AI agent orchestrator
intent classifier | planner | policy engine | tool gateway | HITL workflow
|
OpenTelemetry traces + agentic process events
|
Evidence collection layer
schema validation | redaction | hashing | retention tagging | integrity checks
|
Event store + trace store + evidence lake
|
Provenance graph
entities | activities | agents | causal edges | version set
|
Replay workbench
timeline view | causal graph | evidence chain | policy decision view | approval view
|
Audit query and assurance reporting
process conformance | sampling | incident replay | control testing | process owner review
8.2 Core Components
| Component | Responsibility |
|---|---|
| Agent orchestrator | Emits trace context and events for intent, plan, tool call, observations, approvals and outputs. |
| Policy decision point | Produces versioned allow, block, escalate, approval-required and obligation decisions. |
| Tool gateway | Enforces authorization, idempotency, dry-run, side-effect capture, approval binding and kill switch. |
| HITL workflow service | Records visible evidence, reviewer decision, reason code, edit diff, expiry and SoD result. |
| Evidence collector | Validates event schema, applies redaction profile, computes hashes and assigns retention class. |
| Event store | Stores append-only domain events for workflow replay and conformance analysis. |
| Trace store | Stores spans, timing, cost, latency and dependency context for observability and drilldown. |
| Provenance graph | Links entities, activities and agents across prompt, context, RAG, tool, approval and output objects. |
| Replay workbench | Lets authorized reviewers reconstruct timelines, causal chains, version sets and evidence gaps. |
| Audit query layer | Provides parameterized queries for internal audit, risk, process owner and incident review. |
| Redaction and access layer | Controls who can see raw content, redacted content, hashes, pointers and aggregate metrics. |
| Retention and legal hold layer | Applies retention policy, deletion handling and evidence preservation for incidents or reviews. |
8.3 Replay Modes
| Replay mode | Purpose | Limits |
|---|---|---|
| Technical replay | Reconstruct model, prompt, RAG, policy, tool and runtime version set. | May not reproduce identical LLM output if provider behavior changed. |
| Business replay | Reconstruct workflow states, user actions, approvals, outputs and business record updates. | Requires business systems to retain record ids and state transitions. |
| Control replay | Re-evaluate whether required controls fired and were bound to correct inputs. | Does not itself prove control design is sufficient. |
| Incident replay | Reconstruct events in an incident window and connect root cause, impact and remediation. | Sensitive content may require restricted access and legal guidance. |
| Learning replay | Turn failures, edits, overrides and QA findings into eval cases and process improvements. | Must avoid using feedback data outside purpose and consent boundaries. |
8.4 Reproducibility Limits
For agentic AI, replay architecture must explicitly document reproducibility limits:
| Limit | Mitigation |
|---|---|
| LLM nondeterminism | Preserve prompt/config/model route/output hash and use replay to compare behavior, not guarantee exact output. |
| Third-party model version opacity | Capture vendor metadata, response headers, model alias resolution and contractually available version info. |
| External tool state changes | Store tool request/response hashes, side-effect ids, business record state and compensating action records. |
| Knowledge base drift | Preserve KB version, index version, retrieved chunk ids, effective dates and document lifecycle state. |
| Privacy redaction | Preserve enough hashed/pointer evidence to reconstruct under authorized conditions without broad raw content exposure. |
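Documenting reproducibility limits usually starts with knowing which parts of the version set have moved since the original run. The sketch below compares a recorded version set with the current environment; the keys and version strings are illustrative assumptions, not values from any real system.

```python
def diff_version_sets(recorded, current):
    """Report every version-set component that differs between two runs."""
    drift = {}
    for key in sorted(set(recorded) | set(current)):
        if recorded.get(key) != current.get(key):
            drift[key] = {"recorded": recorded.get(key), "current": current.get(key)}
    return drift


# Version set captured with the original trace (illustrative values).
recorded = {
    "model_route": "vendor-x/alias-2026-05",
    "prompt_version": "p-14",
    "kb_index_version": "kb-88",
    "policy_version": "pol-7",
}
# Version set of the environment used for the replay attempt.
current = {
    "model_route": "vendor-x/alias-2026-06",
    "prompt_version": "p-14",
    "kb_index_version": "kb-91",
    "policy_version": "pol-7",
}

drift = diff_version_sets(recorded, current)
for component, versions in drift.items():
    print(component, versions["recorded"], "->", versions["current"])
```

A non-empty drift report does not invalidate a replay. It tells the reviewer which differences in output may be explained by changed components rather than by the behavior under review.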
9. Audit Trail vs Observability
Audit trail and observability overlap but are not the same.
| Dimension | Audit trail | Observability |
|---|---|---|
| Primary question | Can we prove who did what, when, under which authority and evidence? | Can we understand system behavior, performance and failure modes? |
| Main users | Internal audit, risk, compliance, process owners, regulators, legal | SRE, platform, engineering, product, operations |
| Time horizon | Retention aligned to control, business record and regulatory needs | Operational trend and incident analysis horizon |
| Data shape | Events, approvals, version records, evidence objects, control results | Traces, metrics, logs, exemplars, dashboards |
| Quality bar | Completeness, integrity, non-repudiation, access control, chain of evidence | Coverage, latency, sampling, diagnostic usefulness |
| Failure mode | Cannot prove control occurred or output was authorized | Cannot diagnose outage, drift, cost spike or bad dependency |
Mature architecture connects both:
Observability shows that something happened and how the system behaved.
Audit trail shows whether the behavior was authorized, controlled, evidenced and reviewable.
For example, an OTel span may show tool.invoke completed in 250 ms. Audit evidence must additionally show:
- Which tool contract and schema version.
- Which user or agent authority was used.
- Which policy decision allowed it.
- Whether approval was required and completed.
- What side effect occurred.
- Whether the action matched approval scope.
- How to reverse or compensate if needed.
10. Evidence Chain
10.1 Evidence Chain Model
Agentic process evidence should be organized as a chain:
Business outcome evidence
<- process conformance evidence
<- workflow replay evidence
<- event evidence
<- source system evidence
<- version and control evidence
| Evidence type | Example | Assurance use |
|---|---|---|
| Business outcome evidence | alert aging reduction, dispute cycle time, onboarding completion, hardship treatment quality | Shows whether workflow produced intended value without unacceptable harm. |
| Process conformance evidence | state transitions, required approvals, SoD results, exception reasons | Shows whether workflow followed approved process or justified exception. |
| Workflow replay evidence | trace, event sequence, causal graph, version set | Supports reconstruction and independent challenge. |
| Event evidence | plan, action, observation, policy, approval, output, feedback events | Provides granular proof of actions and decisions. |
| Source system evidence | case management records, payment system state, KYC document store, regulatory report data | Anchors AI evidence to systems of record. |
| Version and control evidence | prompt, model, KB, policy, tool schema, release bundle | Shows which approved artifact set governed behavior. |
10.2 Business Outcome Evidence
Business outcome evidence prevents audit architecture from becoming purely technical. For financial retail examples:
| Use case | Outcome evidence | Control counterweight |
|---|---|---|
| AML investigation copilot | reduced alert aging, better narrative completeness, lower reopen rate | no unsupported SAR implication, analyst final disposition retained |
| Payment dispute assistant | faster evidence packet creation, fewer missing-document reworks | chargeback submission requires maker-checker approval |
| KYC onboarding agent | faster first-pass completion, lower document chase volume | no automated rejection, appeal route preserved |
| Collections hardship case agent | better hardship option matching, improved follow-up timeliness | vulnerable customer and fair treatment checks |
| Regulatory reporting narrative drafter | shorter variance explanation cycle, fewer reviewer corrections | no unsupported metric cause, source lineage visible |
| Payment operations repair queue agent | lower repair backlog and fewer duplicate repairs | dual control for irreversible or customer-impacting updates |
11. Exception and Override Handling
Exceptions are expected in real processes. The assurance question is whether they are explicit, justified, owned, time-bound and learned from.
11.1 Exception Types
| Type | Example | Required evidence |
|---|---|---|
| Business exception | KYC case requires non-standard document due to jurisdiction rule | policy citation, reason code, reviewer approval, customer communication record |
| Policy override | Payment dispute response needs supervisor approval despite low automated risk score | override reason, approver role, scope, expiry, evidence set |
| Tool exception | Payment repair API unavailable, manual repair queue used | incident link, manual action record, reconciliation evidence |
| Data exception | RAG source stale for one policy section | affected scope, compensating manual source check, expiry |
| Workflow exception | HITL reviewer queue exceeds SLA, case routed to backup team | capacity signal, escalation decision, customer impact analysis |
| Model exception | Agent confidence below threshold but reviewer proceeds after independent evidence check | confidence band, reviewer rationale, QA sample inclusion |
11.2 Good Override Record
| Field | Strong content |
|---|---|
| override_id | stable id tied to workflow and case |
| baseline rule | the rule, policy or process step being overridden |
| business reason | why standard route does not fit |
| risk impact | customer, financial, compliance, operational and reputational impact |
| approver role | role with authority and independence |
| visible evidence | what the approver saw at decision time |
| compensating control | extra review, sampling, reconciliation, customer notice or temporary limit |
| expiry or closure trigger | when override ends or must be reviewed |
| learning path | whether this becomes process update, eval case, policy clarification or training |
Weak override:
Supervisor approved exception.
Strong override:
Supervisor approved hardship option deviation for case class H2 because customer submitted verified disaster impact documentation not covered by standard script. AI recommendation was restricted to draft language. Final treatment selected by hardship specialist. Case added to monthly vulnerable-customer QA sample and policy team review.
12. Segregation of Duties
Agentic AI can blur roles because the same platform may propose, execute, document and monitor a process. Segregation of duties must be explicit.
| Risk | Control design |
|---|---|
| Agent proposes and executes high-impact action without independent review | Tool gateway enforces approval for write or customer-impacting actions. |
| Same user requests and approves a payment repair | SoD check blocks self-approval and routes to independent reviewer. |
| Developer changes prompt and approves production release | Release workflow separates change author, reviewer and production approver. |
| Model owner validates their own control effectiveness | Independent challenge by risk, model risk, QA or internal audit partner. |
| Operations team suppresses incident evidence | Incident evidence preservation controlled by platform/security/legal process. |
SoD event evidence should include:
- requester_role
- approver_role
- relationship_check_result
- same_user_blocked
- delegated_authority_source
- approval_scope
- independent_challenge_required
- break_glass_reason
Independent challenge does not mean every event is manually reviewed. It means the evidence architecture allows authorized second-line, third-line, QA or process-owner reviewers to test whether controls operated as designed.
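A SoD check that produces several of these fields might look like the sketch below. The rules are assumptions made for illustration: self-approval is blocked unless a break-glass reason is recorded, and any break-glass use flags the case for independent challenge. The function name `evaluate_sod`, the `role_authorized` and `approval_allowed` keys, and the role names are hypothetical.

```python
def evaluate_sod(requester_id_hash, approver_id_hash, approver_role,
                 authorized_roles, break_glass_reason=None):
    """Evaluate a simple segregation-of-duties rule and return evidence fields."""
    same_user = requester_id_hash == approver_id_hash
    break_glass = break_glass_reason is not None
    # Assumed rule: self-approval is blocked unless break-glass is recorded.
    same_user_blocked = same_user and not break_glass
    role_authorized = approver_role in authorized_roles
    return {
        "relationship_check_result": "same_user" if same_user else "independent",
        "same_user_blocked": same_user_blocked,
        "role_authorized": role_authorized,
        "break_glass_reason": break_glass_reason,
        # Assumed rule: every break-glass approval gets a second look.
        "independent_challenge_required": break_glass,
        "approval_allowed": role_authorized and not same_user_blocked,
    }


roles = {"Payment_Repair_Checker"}
self_approval = evaluate_sod("hash:user:9a27", "hash:user:9a27",
                             "Payment_Repair_Checker", roles)
independent = evaluate_sod("hash:user:9a27", "hash:user:41c0",
                           "Payment_Repair_Checker", roles)
print(self_approval["approval_allowed"], independent["approval_allowed"])
```

Returning the full result as a record, rather than a bare boolean, is what lets a later reviewer test the control from evidence instead of from the current code.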
13. Sampling and Testing Approach
13.1 Population Definition
Sampling begins with a population definition. Without it, teams cherry-pick good examples.
| Population | Example |
|---|---|
| All high-risk workflow instances | All KYC onboarding agent cases where customer follow-up was drafted in June 2026. |
| All customer-impacting tool actions | All payment repair queue updates that changed customer-visible status. |
| All overrides | All collections hardship cases where AI recommendation was overridden. |
| All policy blocks | All regulatory narrative drafts blocked for unsupported claim. |
| All incidents or near misses | All AML copilot summaries flagged by QA as material evidence omission. |
13.2 Testing Types
| Test type | Purpose | Example |
|---|---|---|
| Design effectiveness test | Determine whether the control, if operated, would address the risk. | Does tool gateway approval binding prevent mismatched execution input? |
| Operating effectiveness test | Determine whether the control actually operated across samples. | Sample approved chargeback submissions and verify approval hash equals tool input hash. |
| Automated exception query | 100% scan for impossible or prohibited patterns. | Find tool write actions with no policy decision or expired approval. |
| Process conformance test | Compare actual event sequence to approved workflow model. | KYC follow-up must have intent, document observation, policy check, draft, review and output. |
| Outcome reasonableness test | Compare process result with business outcome and control counterweight. | Dispute cycle time improved without higher rework or complaint rate. |
| Replay drill | Reconstruct one case end-to-end under access controls. | Rebuild AML case timeline from trace, events, source records and approvals. |
13.3 Sampling Strategy
Risk-based sampling should combine:
- 100% automated queries for missing mandatory events.
- Attribute sampling for required control evidence.
- Judgmental samples for high-risk, high-value, high-complexity or complaint-linked cases.
- Stratified samples by channel, segment, language, geography, model route and reviewer.
- Incident-driven samples from near misses, overrides and QA failures.
- Negative samples where agent refused, escalated or blocked action.
Sampling should record:
| Field | Meaning |
|---|---|
| population_id | stable query or dataset defining universe |
| sample_method | random, stratified, risk-based, judgmental, exception query |
| sample_period | time window |
| sample_size | count and rationale |
| test_objective | design, operating, conformance, outcome or incident replay |
| pass_criteria | precise evidence expectation |
| exception_classification | justified exception, documentation defect, control failure, process drift |
| remediation_owner | process, product, architecture, operations or control owner |
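A stratified sample is easier to defend when it can be regenerated from the recorded population, method and seed. The sketch below is one possible approach using Python's standard `random` module; the population rows, the `channel` stratum key and the seed value are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_key, per_stratum, seed):
    """Draw up to per_stratum items from each stratum, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    strata = defaultdict(list)
    for item in population:
        strata[item[stratum_key]].append(item)
    sample = []
    for name in sorted(strata):
        # Sort members so the draw does not depend on input order.
        members = sorted(strata[name], key=lambda item: item["workflow_id"])
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Illustrative population: ten workflow instances across two channels.
population = [
    {"workflow_id": f"wf_{n:03d}", "channel": "branch" if n % 5 < 3 else "mobile"}
    for n in range(10)
]

sample = stratified_sample(population, "channel", per_stratum=2, seed=20260630)
print([item["workflow_id"] for item in sample])
```

Recording the seed alongside population_id and sample_method lets an independent reviewer confirm that the sample was drawn as described rather than hand-picked.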
14. Process Conformance
Process conformance asks whether actual workflow execution matches the approved process model.
Approved model:
intent -> plan -> policy -> tool dry-run -> approval -> tool execution -> output -> feedback
Actual trace:
intent -> plan -> tool execution -> output
Conformance result:
non-conformant because policy and approval events are missing before customer-impacting tool execution.
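The comparison above can be automated as a required-predecessor check. The sketch below is a deliberately simple illustration: the `REQUIRED_BEFORE` rule set covers only the policy and approval predecessors named in the conformance result, and the step names are shortened labels rather than real event types.

```python
# Hypothetical rule set: steps that must appear before a given step.
REQUIRED_BEFORE = {
    "tool_execution": ["policy", "approval"],
}


def find_missing_predecessors(trace, required_before):
    """List (step, missing_predecessor) pairs for an ordered trace."""
    findings = []
    seen = set()
    for step in trace:
        for required in required_before.get(step, []):
            if required not in seen:
                findings.append((step, required))
        seen.add(step)
    return findings


approved_trace = ["intent", "plan", "policy", "tool_dry_run", "approval",
                  "tool_execution", "output", "feedback"]
actual_trace = ["intent", "plan", "tool_execution", "output"]

print(find_missing_predecessors(approved_trace, REQUIRED_BEFORE))  # []
print(find_missing_predecessors(actual_trace, REQUIRED_BEFORE))
```

This check only tests presence and order. It would still need the approval binding and expiry checks described in the causality section before a trace could be called conformant.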
14.1 Conformance vs Justified Exception
| Category | Meaning | Example |
|---|---|---|
| Conformant | Actual event path follows approved process. | KYC agent drafted follow-up only after document evidence and reviewer approval. |
| Justified exception | Process deviated but with authorized reason, owner and compensating control. | Backup reviewer approved due to outage under documented continuity procedure. |
| Control failure | Required control absent, expired, mismatched or bypassed. | Payment repair tool executed without valid approval. |
| Process drift | Repeated deviations show the real process has changed without approval. | Reviewers routinely skip citation check because UI makes it difficult. |
| Process model wrong | The approved process model omits a legitimate operational path. | AML escalation path for multi-jurisdiction cases not modeled. |
Mature review does not punish every deviation. It classifies deviations and updates process, controls, training or tooling based on evidence.
14.2 Audit Queries for Conformance
| Query | Purpose |
|---|---|
| Show all tool write actions without preceding policy decision. | Detect bypassed control. |
| Show all approvals where visible evidence hash is missing. | Detect weak HITL evidence. |
| Show all output deliveries where output hash differs from approved draft hash. | Detect post-approval mutation. |
| Show all overrides by reviewer, reason and case type. | Detect concentration, training need or process ambiguity. |
| Show all workflows where exception expiry passed without closure. | Detect unmanaged residual risk. |
| Show all regulatory narrative drafts with unsupported material claims. | Detect output evidence failure. |
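Two of the queries above can be sketched over an in-memory event list. This is illustrative only: the event shape (`event_id`, `workflow_id`, `seq`, `type`, `draft_hash`, `output_hash`) and the event type names are assumptions, and a real event store would run equivalent queries in its own query language.

```python
def tool_writes_without_policy(events):
    """Tool write events with no earlier policy decision in the same workflow."""
    policy_seen = set()
    flagged = []
    for e in sorted(events, key=lambda e: (e["workflow_id"], e["seq"])):
        if e["type"] == "policy_decision":
            policy_seen.add(e["workflow_id"])
        elif e["type"] == "tool_write" and e["workflow_id"] not in policy_seen:
            flagged.append(e["event_id"])
    return flagged


def mutated_outputs(events):
    """Output deliveries whose hash differs from the approved draft hash.

    A delivery with no approval at all is also flagged, because there is
    no approved hash to match against.
    """
    approved = {
        e["workflow_id"]: e["draft_hash"]
        for e in events if e["type"] == "approval"
    }
    return [
        e["event_id"] for e in events
        if e["type"] == "output_delivery"
        and approved.get(e["workflow_id"]) != e["output_hash"]
    ]


# Hypothetical events for two workflows.
events = [
    {"event_id": "e1", "workflow_id": "W1", "seq": 1, "type": "policy_decision"},
    {"event_id": "e2", "workflow_id": "W1", "seq": 2, "type": "approval", "draft_hash": "h1"},
    {"event_id": "e3", "workflow_id": "W1", "seq": 3, "type": "tool_write"},
    {"event_id": "e4", "workflow_id": "W1", "seq": 4, "type": "output_delivery", "output_hash": "h1"},
    {"event_id": "e5", "workflow_id": "W2", "seq": 1, "type": "tool_write"},
    {"event_id": "e6", "workflow_id": "W2", "seq": 2, "type": "approval", "draft_hash": "h2"},
    {"event_id": "e7", "workflow_id": "W2", "seq": 3, "type": "output_delivery", "output_hash": "h2x"},
]

bypassed = tool_writes_without_policy(events)
mutated = mutated_outputs(events)
```

In this sample, workflow `W2` is flagged by both queries: its tool write has no preceding policy decision, and its delivered output hash differs from the approved draft hash.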
15. Incident Replay
Incident replay reconstructs what happened, why it happened, who or what allowed it, what impact occurred and what changed afterward.
15.1 Incident Replay Packet
| Section | Contents |
|---|---|
| Incident scope | incident id, use case, workflow, time window, affected cases, severity |
| Version set | model, prompt, RAG index, policy, tool schema, release bundle, feature flags |
| Timeline | chronological events and spans |
| Causal graph | required predecessors, policy decisions, approvals, tool actions, outputs |
| Evidence gaps | missing spans, missing event fields, inaccessible source records, redaction limits |
| Customer or business impact | affected customers, cases, funds, reports, timelines, operational backlog |
| Control analysis | which controls worked, failed, were bypassed or were absent |
| Exception analysis | whether deviations were justified, expired or unmanaged |
| Remediation | rollback, compensation, customer action, policy update, prompt update, tool restriction |
| Learning loop | eval cases, regression tests, training, process model update, control improvement |
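The "Evidence gaps" section of the packet can be partly populated by a mechanical field check. The sketch below assumes a hypothetical `REQUIRED_FIELDS` contract per event type; the field names are illustrative and would come from the event schema agreed for the workflow. It finds missing event fields only, not missing spans or inaccessible source records.

```python
# Assumption: minimal required fields per event type; names are illustrative.
REQUIRED_FIELDS = {
    "policy_decision": ("policy_version", "decision"),
    "approval": ("approver_id", "input_hash", "expires_at"),
    "tool_execution": ("input_hash", "side_effect_id"),
}


def evidence_gaps(events):
    """List required fields that are absent or empty on each event."""
    gaps = []
    for e in events:
        for field in REQUIRED_FIELDS.get(e["type"], ()):
            if e.get(field) in (None, ""):
                gaps.append({
                    "event_id": e["event_id"],
                    "type": e["type"],
                    "missing_field": field,
                })
    return gaps


# Hypothetical incident events.
incident_events = [
    {"event_id": "e1", "type": "policy_decision",
     "policy_version": "p-7", "decision": "allow"},
    {"event_id": "e2", "type": "approval",
     "approver_id": "rev-12", "input_hash": "abc", "expires_at": ""},
    {"event_id": "e3", "type": "tool_execution", "input_hash": "abc"},
]

gaps = evidence_gaps(incident_events)
```

Listing gaps explicitly keeps the packet from implying more evidence than exists: here the approval has no expiry recorded and the tool execution has no side effect id.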
15.2 Example: Payment Operations Repair Queue Agent
Incident signal:
Duplicate repair actions increased after a workflow release.
Replay findings:
| Finding | Evidence |
|---|---|
| Duplicate side effects occurred in 18 cases | tool side_effect_id and payment system state transitions |
| Idempotency key was generated from case id only, not repair action id | tool gateway event schema and code release notes |
| Approval existed but was scoped to first repair attempt | approval scope and expiry |
| Retry path bypassed dry-run after timeout | trace timeline and causal graph |
| Customer-visible balances were corrected through compensating entries | compensating action records and reconciliation report |
| Regression test added | eval/control test case for retry idempotency |
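The idempotency finding turns on what the key is scoped to. The sketch below does not reconstruct the incident mechanism; it only illustrates, with a hypothetical in-memory `ToolGateway`, how a key derived from the case id alone cannot tell a retry of one repair action apart from a different repair action on the same case, while a key that includes the repair action id can. All class, function and id names are assumptions.

```python
import hashlib


def key_case_only(case_id, repair_action_id):
    """Key scoped to the case only (the pattern named in the finding)."""
    return hashlib.sha256(case_id.encode("utf-8")).hexdigest()


def key_case_and_action(case_id, repair_action_id):
    """Key scoped to the case and the specific repair action."""
    return hashlib.sha256(f"{case_id}:{repair_action_id}".encode("utf-8")).hexdigest()


class ToolGateway:
    """Hypothetical gateway that deduplicates side effects by idempotency key."""

    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._results = {}
        self.side_effects = []

    def execute(self, case_id, repair_action_id):
        key = self._key_fn(case_id, repair_action_id)
        if key in self._results:
            return self._results[key]  # known key: return stored result, no new side effect
        side_effect_id = f"SE-{len(self.side_effects) + 1}"
        self.side_effects.append((side_effect_id, case_id, repair_action_id))
        self._results[key] = side_effect_id
        return side_effect_id


scoped = ToolGateway(key_case_and_action)
first = scoped.execute("CASE-1", "RA-1")
retry = scoped.execute("CASE-1", "RA-1")   # retry of the same action
other = scoped.execute("CASE-1", "RA-2")   # a different action on the same case

coarse = ToolGateway(key_case_only)
coarse_first = coarse.execute("CASE-1", "RA-1")
coarse_other = coarse.execute("CASE-1", "RA-2")  # collides with RA-1
```

A regression case for retry idempotency could assert exactly this behavior: a retry returns the original side effect id, and a distinct action produces its own.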
Assurance conclusion should be framed carefully:
The replay packet supports process owner, risk and internal audit review of the incident evidence. It does not by itself constitute audit sign-off or regulatory closure.
16. Financial Retail Examples
16.1 AML Investigation Copilot
| Audit focus | Replay evidence |
|---|---|
| Analyst final accountability | final disposition event, analyst approval, no auto-SAR submission |
| Evidence completeness | transaction refs, KYC refs, adverse media refs, source-span map |
| Narrative quality | draft narrative, edit diff, QA sample, reopen reason |
| Sensitive data boundary | redaction profile, access log, SAR-related data retention class |
| Incident learning | omitted-evidence QA finding converted into regression case |
16.2 Payment Dispute Assistant
| Audit focus | Replay evidence |
|---|---|
| Chargeback submission authority | maker-checker approval, tool input hash, policy decision |
| Evidence packet support | transaction timeline, merchant evidence, network rule source |
| Customer communication | approved letter output hash, delivery channel, complaint link |
| Exceptions | provisional credit exception, supervisor reason, expiry |
| Outcome evidence | cycle time, rework, win/loss rate, complaint trend |
16.3 KYC Onboarding Agent
| Audit focus | Replay evidence |
|---|---|
| No automated rejection | output status, reviewer decision, appeal route evidence |
| Missing evidence detection | document observation events, confidence band, source pointer |
| Customer follow-up | draft, reviewer edit, approved final message |
| High-risk jurisdiction | policy decision, escalation obligation, senior review |
| Process conformance | onboarding state transitions and exception path |
16.4 Collections Hardship Case Agent
| Audit focus | Replay evidence |
|---|---|
| Fair treatment | vulnerability flag handling, policy decision, human specialist approval |
| Option recommendation | hardship facts, source policy, plan rationale |
| Override quality | override reason codes, supervisor sample, customer outcome |
| Communications | approved message hash and record id |
| Learning loop | complaints and QA findings converted to training or policy clarification |
16.5 Regulatory Reporting Narrative Drafter
| Audit focus | Replay evidence |
|---|---|
| Source lineage | metric id, data source, report period, transformation refs |
| Unsupported claim prevention | citation map, policy block for hallucinated cause |
| Maker-checker | reviewer approval, visible evidence, edit diff |
| Attestation boundary | AI draft marked as draft, authorized signer retained |
| Retention | report pack evidence aligned to regulatory record policy |
16.6 Payment Operations Repair Queue Agent
| Audit focus | Replay evidence |
|---|---|
| Repair action authorization | tool risk tier, policy decision, dual control |
| Reversibility | side effect id, idempotency key, compensating action |
| Queue prioritization | plan rationale, SLA, customer impact |
| Exception handling | manual repair route, reconciliation evidence |
| Outcome evidence | backlog, duplicate repair rate, settlement break trend |
17. Operating Model
17.1 RACI
| Activity | AI PM | AI Architect | BA | Process Owner | Risk / Compliance | Internal Audit Partner | Platform / SRE |
|---|---|---|---|---|---|---|---|
| Process claim definition | A | C | R | A | C | C | I |
| Event schema | C | A | R | C | C | C | R |
| Replay architecture | C | A | C | C | C | C | R |
| Control-to-evidence matrix | C | C | R | A | R | C | I |
| Audit query catalog | C | R | R | C | C | A/C | C |
| Sampling strategy | C | C | C | A | R | C | I |
| Exception register | R | C | R | A | C | C | I |
| Incident replay | R | R | C | A | A/C | C | R |
| Learning loop | A | R | R | A | C | I | C |
R = Responsible, A = Accountable, C = Consulted, I = Informed.
17.2 Cadence
| Forum | Cadence | Main question | Output |
|---|---|---|---|
| Workflow evidence design review | Before pilot and major release | Are process claims, events, controls and replay needs defined? | evidence contract and release gate |
| Process conformance review | Monthly or risk-based | Do actual traces match the approved process? | conformance report and action log |
| Exception and override review | Weekly for high-risk workflows | Are exceptions justified, tracked for aging and closed on time? | exception register update |
| Incident replay review | Triggered by incident or near miss | What happened, why, with what impact and what was learned? | replay packet and remediation |
| Assurance management review | Quarterly | Are controls, outcomes and evidence architecture improving? | management action and roadmap |
18. Anti-Patterns
| Anti-pattern | Why it fails | Mature replacement |
|---|---|---|
| Final answer as audit evidence | It hides intent, plan, tool calls, approvals and policy decisions. | Replayable trace with event-sourced workflow evidence. |
| Logging everything raw | Creates privacy, security and retention risk without better assurance. | Minimum sufficient evidence with redaction, hash, pointer and controlled raw access. |
| Chronology treated as causality | "Approval happened before action" does not prove action was approved. | Causal links through input hash, approval scope and policy decision. |
| HITL recorded as yes/no | A reviewer cannot determine what evidence the human saw. | Visible evidence set, decision reason, edit diff and expiry. |
| Exceptions hidden in comments | Cannot distinguish justified business exception from control failure. | Structured exception record with owner, expiry and compensating control. |
| Audit queries as an afterthought | Evidence exists but cannot answer real review questions. | Audit query catalog designed during requirements and architecture. |
| Same team self-certifies all controls | No independent challenge or segregation of duties (SoD). | Separate author, approver, reviewer and process owner roles based on risk. |
| Replay promises exact LLM reproduction | Overclaims determinism and ignores vendor/model drift. | Document reproducibility limits and preserve version set plus output hashes. |
| Sampling only successful cases | Misses near misses, blocks, overrides and failures. | Risk-based samples covering negative paths and exceptions. |
| Outcome metrics without control counterweights | Speed gains may hide customer harm or control erosion. | Pair business outcome evidence with quality, risk and conformance evidence. |
19. PM / BA / Architect Implications
19.1 For Senior AI PM
- Define process claims before building automation.
- Treat workflow replay as a product capability, not just internal logging.
- Pair value metrics with control and customer outcome evidence.
- Require release gates for evidence coverage, replay readiness and exception handling.
- Write scale recommendations that show uncertainty, residual risk owner and monitoring triggers.
19.2 For CBAP-level BA
- Model agentic workflow as state, event, decision and evidence objects.
- Translate stakeholder needs into process claims, acceptance criteria and audit queries.
- Make exception paths first-class requirements.
- Distinguish conformance, justified exception, process drift and control failure.
- Ensure every material process claim has evidence, owner and test approach.
19.3 For AI Architect
- Design trace context, event schema and provenance graph early.
- Use event-sourced workflow where replay and conformance matter.
- Bind approvals to exact action inputs and side effects.
- Record version sets for model, prompt, RAG, policy, tool schema and workflow.
- Build redaction, retention and access control into the evidence architecture.
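Binding an approval to exact action inputs can be sketched as a hash comparison plus scope and expiry checks. This is a minimal illustration, assuming a hypothetical approval record with `input_hash`, `scope` and `expires_at` fields and a hypothetical `payment_repair` tool; canonical JSON with sorted keys is one possible way to make the hash stable, not the only one.

```python
import hashlib
import json
from datetime import datetime, timezone


def input_hash(tool_name, tool_input):
    """Hash of the tool name and its input in a canonical JSON form."""
    canonical = json.dumps(
        {"tool": tool_name, "input": tool_input},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_problems(approval, tool_name, tool_input, now):
    """Return reasons why the approval does not cover this exact call."""
    reasons = []
    if approval["input_hash"] != input_hash(tool_name, tool_input):
        reasons.append("input_hash_mismatch")
    if tool_name not in approval["scope"]:
        reasons.append("out_of_scope")
    if now >= approval["expires_at"]:
        reasons.append("expired")
    return reasons


# Hypothetical approval for one specific repair call.
approved_input = {"case_id": "CASE-1", "amount": "120.00"}
approval = {
    "input_hash": input_hash("payment_repair", approved_input),
    "scope": {"payment_repair"},
    "expires_at": datetime(2026, 6, 30, 12, 0, tzinfo=timezone.utc),
}
before_expiry = datetime(2026, 6, 30, 11, 0, tzinfo=timezone.utc)
after_expiry = datetime(2026, 6, 30, 13, 0, tzinfo=timezone.utc)

ok = approval_problems(approval, "payment_repair", approved_input, before_expiry)
changed = approval_problems(
    approval, "payment_repair", {"case_id": "CASE-1", "amount": "999.00"}, before_expiry
)
late = approval_problems(approval, "payment_repair", approved_input, after_expiry)
```

A tool gateway that refuses to execute whenever the returned list is non-empty turns "an approval exists" into "this approval covers this call".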
19.4 For Internal Audit Partner
- Challenge whether process claims are testable.
- Review whether evidence is complete, reliable and independently queryable.
- Help shape sampling strategy and audit query catalog without owning management controls.
- Distinguish evidence sufficiency for review from formal audit conclusions.
20. Interview Answers
Q1: How do you make an agentic AI workflow auditable without slowing it down?
30-second version:
I would design auditability into the workflow runtime. Every agentic workflow should emit structured events for intent, plan, observations, policy decisions, tool calls, human approvals, exceptions, outputs and feedback. Those events connect to traces, source records and a provenance graph, so process owners and audit partners can replay cases, test controls and sample exceptions without asking teams to manually reconstruct evidence.
2-minute version:
I start from process claims. For example, a payment dispute assistant may draft evidence packets but cannot submit a chargeback without maker-checker approval. That claim becomes requirements, controls, events and audit queries. The architecture emits a replayable trace: intent classification, plan, RAG evidence, policy decision, tool dry-run, approval visible evidence, tool execution, output and feedback.
To avoid slowing teams, evidence is generated as work happens. The tool gateway enforces approval binding, idempotency and side-effect capture. The HITL workflow records visible evidence and reason codes. OpenTelemetry traces provide operational context, while event store and provenance graph provide process and causal evidence. Internal audit or process owners can then run queries such as "show all customer-impacting tool actions without valid approval" and sample cases by risk. The result is not automatic audit sign-off, but a stronger evidence architecture for review and assurance.
Q2: What is the difference between observability and audit trail for agentic AI?
30-second version:
Observability explains system behavior: latency, errors, traces, costs and dependencies. An audit trail documents authority and evidence: who requested the action, what plan was approved, which policy allowed it, what tool action happened, what output was delivered and whether exceptions were justified. Agentic AI needs both, connected by trace ids and evidence references.
Q3: Why is causality more important than chronology in workflow replay?
30-second version:
Chronology tells us that one event happened before another. Causality tells us whether the later action was actually authorized, supported and caused by the earlier evidence. An approval before a tool call is not enough; the approval must apply to the same input hash, scope, policy decision and side effect.
Q4: How would you test control effectiveness for a KYC onboarding agent?
30-second version:
I would define the population, such as all KYC agent cases with customer follow-up drafts in the month. I would run automated exception queries for missing reviewer approval, unsupported document claims and automated rejection. Then I would sample by risk tier, document type, geography and reviewer to verify source evidence, policy decisions, human review, final output and customer outcome. Exceptions would be classified as justified exception, documentation defect, process drift or control failure.
Q5: What evidence is needed for incident replay?
30-second version:
An incident replay packet needs scope, affected cases, version set, event timeline, causal graph, policy decisions, approvals, tool side effects, outputs, business impact, evidence gaps, control analysis, remediation and learning actions. It should also state reproducibility limits and privacy constraints.
Q6: How do you handle reproducibility limits with LLM agents?
30-second version:
I do not promise exact deterministic reproduction unless the stack supports it. I preserve the version set, prompt/config hash, model route, RAG index, retrieved chunks, policy version, tool inputs, approvals, output hash and business state. Replay focuses on reconstructing evidence and control behavior, while documenting model nondeterminism, vendor version limits and redaction boundaries.
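Preserving a version set can be as simple as storing a fingerprint with each trace and diffing it at replay time. The sketch below is illustrative: the keys and version strings are hypothetical, and a matching fingerprint only shows that the recorded configuration is the same, not that an LLM would produce identical output.

```python
import hashlib
import json


def fingerprint(version_set):
    """Stable SHA-256 over a version set, independent of key order."""
    canonical = json.dumps(version_set, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_version_sets(recorded, current):
    """Map each differing component to its (recorded, current) values."""
    keys = sorted(set(recorded) | set(current))
    return {
        k: (recorded.get(k), current.get(k))
        for k in keys
        if recorded.get(k) != current.get(k)
    }


# Hypothetical version set captured when the case was processed.
recorded = {
    "model": "m-1",
    "prompt": "prompt-v14",
    "rag_index": "idx-2026-06-01",
    "policy": "p-7",
    "tool_schema": "ts-3",
    "workflow": "wf-22",
}
# Hypothetical environment at replay time: the model route has changed.
current = dict(recorded, model="m-2")

same_config = fingerprint(recorded) == fingerprint(current)
drift = diff_version_sets(recorded, current)
```

When `drift` is non-empty, the replay report can state which components changed since the original run instead of implying an exact reproduction.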
21. Portfolio Exercise
Build an "Agentic Process Audit and Workflow Replay Pack" for one financial retail use case. Recommended use cases:
| Use case | Suggested process claim |
|---|---|
| AML investigation copilot | Agent prepares sourced timeline and draft narrative, but analyst owns final disposition. |
| Payment dispute assistant | Agent drafts evidence packet, but chargeback submission requires maker-checker approval. |
| KYC onboarding agent | Agent detects missing evidence and drafts follow-up, but cannot reject applicant. |
| Collections hardship case agent | Agent recommends hardship options, but specialist approves customer treatment. |
| Regulatory reporting narrative drafter | Agent drafts variance narrative from approved metrics, but authorized signer remains accountable. |
| Payment operations repair queue agent | Agent suggests repair and executes only reversible low-risk updates under policy and dual control. |
Required Artifacts
- Process claim with workflow scope, agent boundary, human boundary, policy boundary and outcome evidence.
- Event dictionary covering intent, plan, observation, policy, action, approval, exception, output, feedback and incident.
- Plan/action/observation/approval/output schema with required fields and retention class.
- Replay architecture diagram showing orchestrator, policy engine, tool gateway, HITL, event store, trace store, evidence lake, provenance graph and replay workbench.
- Causal graph for one representative case.
- Audit query catalog with at least 10 queries.
- Control-to-evidence matrix with at least 12 controls.
- Sampling strategy with population definition, method, sample size rationale and pass criteria.
- Exception and override register with owner, expiry and compensating control.
- Incident replay packet for one failure or near miss.
- Process conformance report distinguishing conformant cases, justified exceptions, process drift and control failures.
- Executive assurance narrative for process owner review.
Scoring Rubric
| Criterion | Strong evidence |
|---|---|
| Process rigor | Process claim is specific, testable and linked to workflow scope. |
| BA rigor | Events, states, decisions, requirements and evidence are traceable. |
| Architecture rigor | Replay architecture separates traces, events, provenance, retention and access control. |
| Audit usefulness | Audit queries can be run without manual reconstruction. |
| Control thinking | SoD, approval binding, exception handling and sampling are practical. |
| Financial realism | Examples reflect AML, KYC, disputes, collections, reporting or payment operations constraints. |
| Reproducibility honesty | Replay limits are documented without overclaiming deterministic reproduction. |
| Outcome balance | Business value is paired with customer, control and process conformance evidence. |
22. Final Mental Model
Agentic workflow assurance should make five truths visible:
The final answer is not the process.
The timeline is not the cause.
The approval is not valid unless bound to exact evidence and action.
The exception is not acceptable unless owned, justified, expiring and monitored.
The replay is not audit sign-off, but it is the evidence architecture that makes serious review possible.
The senior-level move is to design AI agents as replayable process participants, not opaque automations.