
AI Agentic Process Audit: Process Audit and Replay Assurance Architecture

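Important note: This article is learning, portfolio and internal architecture training material. It does not constitute legal advice, regulatory interpretation, an audit opinion, a model validation conclusion, an internal control effectiveness conclusion, a risk acceptance decision or a production go-live approval. The audit, assurance, evidence, control effectiveness and process owner review discussed here all refer to an evidence architecture that supports independent review and challenge by internal audit, risk, compliance, business owners and management, and do not represent any…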


AI Agentic Process Audit / Workflow Replay / Assurance Architecture Explained

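Audience: Senior AI PM / AI Architect / Internal Audit Partner / Process Owner / CBAP-level BA / AI Governance Lead / Financial Retail Operations Leader. Core question: an AI agent may go beyond answering questions. It may interpret intent, make plans, call tools, request human approval, execute process actions, handle exceptions and learn from incidents. In that case, how should an organization design an evidence architecture that is reviewable, replayable, open to challenge and continuously improvable? Learning goal: build an advanced mental model of the agentic process audit model, replayable trace, event-sourced agent workflow, plan/action/observation/approval/output schema, audit query, sampling strategy, process conformance, incident replay and assurance operating model.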

Source Anchors

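The following sources provide the vocabulary for AI risk management, AI management systems, architecture description, requirements engineering, observability, provenance and financial-institution IT examination. This article uses them only as design anchors for product, architecture, BA and internal assurance work. It does not claim that any schema, trace or replay workbench automatically satisfies audit, regulatory or model risk approval requirements.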

Source	Official link	Idea adopted in this article
NIST AI Risk Management Framework	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize evidence for agentic workflow risk identification, measurement, control, runtime monitoring and improvement.
ISO/IEC 42001 AI management system	https://www.iso.org/standard/81230.html	Uses the AI management system's scope, policy, operation, performance evaluation, management review and improvement to design a continuous assurance operating model.
ISO/IEC/IEEE 42010 Architecture Description	https://www.iso.org/standard/74393.html	Uses stakeholder, concern, viewpoint, architecture view, correspondence and rationale to organize a multi-view description of the replay architecture.
ISO/IEC/IEEE 29148 Requirements Engineering	https://www.iso.org/standard/72089.html	Uses stakeholder need, requirement, verification, validation, traceability and information item to design process claims and evidence contracts.
OpenTelemetry Documentation	https://opentelemetry.io/docs/	Uses traces, metrics, logs, context propagation and semantic attributes to design runtime observability and replay traces.
W3C PROV Overview	https://www.w3.org/TR/prov-overview/	Uses the Entity, Activity and Agent provenance model to express who produced an agentic action, from what inputs, and through which activity.
FFIEC IT Handbook	https://ithandbook.ffiec.gov/	Uses financial-institution language on IT risk, examination, governance, outsourcing, business continuity and control assessment to calibrate financial retail scenarios.

In one sentence:

Agentic process audit is the evidence architecture discipline that turns an AI agent's intent, plan, tool actions, policy decisions, human approvals, exceptions, outputs and learning signals into replayable process evidence for review, challenge and improvement.

1. Executive Summary

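Reviews of conventional AI systems usually focus on:

Whether the model performs well enough.
Whether RAG cites the correct sources.
Whether outputs are compliant, safe and explainable.
Whether monitoring and incident response are in place after go-live.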

Agentic workflows are more complex because the system spans multiple business steps:

User intent
  -> intent interpretation
  -> plan generation
  -> policy decision
  -> tool selection
  -> tool call
  -> observation
  -> human approval
  -> exception handling
  -> compensating action
  -> final output
  -> feedback
  -> incident learning

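If the organization keeps only final answers or ordinary API logs, internal audit, business owners, risk teams and architecture teams cannot answer key questions: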

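Review question	Gap in ordinary logs
What did the user actually request, and how did the agent interpret the intent	Logs keep only the request text, with no intent classification or confidence
Why did the agent choose this plan	No plan version, candidate plans, rejected plans or rationale
Which tool action affected customers, funds, cases or regulatory materials	Only a successful API call is visible; side effects, idempotency, approval and rollback are not
Why did the policy allow, block or escalate	No policy version, decision reason or obligations
What evidence did the human approver see	Only "approved" is recorded, with no visible evidence set
Is an exception a compliance deviation or a legitimate business exception	No exception reason, owner, expiry or compensating control
Can the process be replayed after an incident	The timeline is broken, the version set is incomplete, and sensitive content cannot be accessed safely
Are controls effective	No sampling universe, population definition, control test results or exception aging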

A mature agentic process audit architecture does not ask the AI team to fill in more forms. It generates evidence natively inside the workflow runtime:

Process claim
  -> replayable trace
  -> event-sourced agent workflow
  -> evidence chain
  -> audit query
  -> sampling and testing
  -> process conformance review
  -> incident replay
  -> control improvement

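The job of senior AI PMs, AI Architects and CBAP-level BAs is to move an agentic workflow from "looks automated" to "demonstrably controlled". "Demonstrably" does not mean the output is fully deterministic, and it does not mean an audit conclusion follows automatically. It means the organization can reconstruct sufficient evidence, within permission, privacy and retention boundaries, to support independent challenge, management review and continuous improvement.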

2. Target Audience and Role Expectations

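Role	Questions to answer	Typical outputs
Senior AI PM	Does the agentic workflow genuinely improve business outcomes while producing runtime evidence that is reviewable, reversible and usable for learning	outcome evidence map、replay readiness gate、scale/stop memo
AI Architect	How to design trace, event store, provenance graph, replay workbench, redaction and retention	replay reference architecture、event schema、audit query architecture
Internal Audit Partner	Which process claims can be verified, which evidence can support independent testing, and which still need human challenge	audit query catalog、sampling design input、evidence sufficiency review
Process Owner	Does the agent run according to the approved process, are exceptions reasonable, and are human approvals and compensating actions effective	process conformance report、exception aging、control action plan
CBAP-level BA	Are user intent, business rules, process state, exception paths, approvals and acceptance criteria traceable	process claim model、state/event dictionary、control-to-evidence matrix
Risk / Compliance	Are automation boundaries, customer impact, policy decisions, risk acceptance and compensating controls clear	risk scenario map、policy decision evidence、exception register
Operations Leader	Are the HITL queue, review capacity, override taxonomy, incident route and training operable	review SOP、capacity evidence、override dashboard
Platform / SRE	Are evidence capture, observability, data minimization, access control, SLOs and failure recovery implementable	OTel instrumentation、evidence pipeline、replay SLO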

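Mature organizations treat agentic process audit as part of the process automation product. It is not an evidence patch that internal audit requests ad hoc just before go-live.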

3. Learning Objectives

After completing this article you should be able to:

Distinguish audit trail, observability, provenance, evidence chain and workflow replay.
Define a process claim and decompose it into verifiable workflow, decision, control and outcome evidence.
Design an event-sourced agent workflow covering intent, plan, action, observation, approval, policy, exception, output, feedback and incident learning.
Write plan/action/observation/approval/output event schemas that specify fields, versions, retention, redaction and access boundaries.
Explain the difference between causality and chronology, and avoid treating a timeline as proof of causation.
Design a workflow replay architecture that supports technical replay, business replay, control replay and incident replay.
Build an audit query catalog that supports internal audit, process owners, risk, compliance, model risk and incident retrospectives.
Design a risk-based sampling strategy to test control design effectiveness and operating effectiveness.
Distinguish process conformance, justified exception, policy override and control failure.
Apply these ideas to an AML investigation copilot, payment dispute assistant, KYC onboarding agent, collections hardship case agent, regulatory reporting narrative drafter and payment operations repair queue agent.

4. Core Concepts

4.1 Process Claim

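A process claim is the team's verifiable assertion about an agentic workflow. A vague statement such as "agent improves operations" is not enough. The claim must be reviewable, testable and suitable for sampling.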

Weak claim	Strong process claim
AML copilot helps analysts work faster	For AML alert type A and B, the copilot generates a sourced case timeline and draft narrative, but final disposition remains analyst-owned and every saved narrative has source-span evidence.
KYC agent automates onboarding	The KYC onboarding agent classifies documents, identifies missing evidence and drafts customer follow-up, but cannot reject an applicant without human review and appeal path.
Payment repair agent fixes exceptions	The payment repair queue agent suggests repair actions and executes only low-risk reversible updates after policy decision and dual-control approval.

Components of a process claim:

Business outcome
  + workflow scope
  + agent responsibility
  + human responsibility
  + policy boundary
  + evidence requirement
  + control and exception handling
  + outcome measurement
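As a hedged illustration, the component list above can be turned into a structural completeness check. `ProcessClaim` and `missing_components` are hypothetical names for this sketch. The check only flags blank components. It cannot judge whether a filled-in component is actually testable.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessClaim:
    """Hypothetical record mirroring the process claim components listed above."""
    business_outcome: str
    workflow_scope: str
    agent_responsibility: str
    human_responsibility: str
    policy_boundary: str
    evidence_requirement: str
    control_and_exception_handling: str
    outcome_measurement: str


def missing_components(claim: ProcessClaim) -> list:
    """Return component names that are still blank, i.e. where the claim remains 'weak'."""
    return [f.name for f in fields(claim) if not getattr(claim, f.name).strip()]


claim = ProcessClaim(
    business_outcome="faster dispute evidence packets",
    workflow_scope="card dispute intake",
    agent_responsibility="draft evidence packet",
    human_responsibility="maker-checker approval before chargeback submission",
    policy_boundary="no chargeback submission without approval",
    evidence_requirement="approval record bound to visible evidence hash",
    control_and_exception_handling="",
    outcome_measurement="dispute cycle time and rework rate",
)
print(missing_components(claim))  # ['control_and_exception_handling']
```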

4.2 Replayable Trace

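A replayable trace does not simply store every raw prompt, answer and log. It is a minimal sufficient evidence set that, under authorized conditions, can reconstruct the key process facts:

Who initiated the request, and what role and permissions they had.
How the user intent was interpreted, and whether it was ambiguous.
What plan the agent generated, and which steps were executed or skipped.
Which tools were called, and what evidence exists for their inputs, outputs, side effects and idempotency.
Which policy decisions allowed, blocked or escalated actions, or attached obligations.
Which human approvals, overrides, rejections or requests for additional information occurred.
Which exceptions were detected, and what compensating actions were taken.
How the final output was generated, saved, sent or discarded.
How feedback, incident signals and improvement actions entered the learning loop.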

Replayable does not mean bit-for-bit deterministic. For LLMs and third-party models, exact reproduction of outputs often cannot be promised. A mature formulation is:

We can reconstruct the approved version set, inputs at the agreed evidence level, policy decisions, tool actions, approvals, outputs and observed outcomes, while documenting reproducibility limits.

4.3 Event-Sourced Agent Workflow

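An event-sourced agent workflow records agent behavior as a sequence of business events that cannot be arbitrarily rewritten. The current workflow state is projected from those events, and incident retrospectives and process reviews can reconstruct any earlier state from the same events.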

Event family	Typical events	Review value
Intent	`intent.received`, `intent.classified`, `intent.clarified`	Shows the relationship between the user's request and the agent's interpretation
Plan	`plan.generated`, `plan.reviewed`, `plan.revised`, `plan.approved`	Shows why the agent acted along a particular path
Action	`action.proposed`, `tool.invoked`, `tool.completed`, `tool.failed`	Shows tool actions, parameters, results and side effects
Observation	`observation.received`, `context.loaded`, `retrieval.completed`	Shows the facts and evidence the agent saw
Policy	`policy.evaluated`, `policy.blocked`, `policy.escalated`	Shows that control points actually ran
Approval	`approval.requested`, `approval.decided`, `approval.expired`	Shows HITL and dual control
Exception	`exception.detected`, `override.recorded`, `compensation.executed`	Shows exception management and remediation
Output	`output.drafted`, `output.finalized`, `output.delivered`, `output.discarded`	Shows where outputs in customer, regulatory or business records came from
Feedback	`feedback.captured`, `qa.sampled`, `user.edited`	Shows adoption, quality and learning signals
Incident	`incident.signal.detected`, `incident.replay.started`, `learning.action.created`	Shows the incident retrospective and improvement loop
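The projection idea can be sketched minimally, assuming events arrive as an ordered, append-only list. `Event`, `WorkflowState`, `apply` and `project` are hypothetical names. Only a few of the event types from the table above are handled. Current state is never stored directly. It is rebuilt by folding the log.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_id: str
    event_type: str  # e.g. "plan.generated", "approval.decided"
    data: dict


@dataclass
class WorkflowState:
    status: str = "new"
    plan_version: Optional[int] = None
    approvals: dict = field(default_factory=dict)
    outputs: list = field(default_factory=list)


def apply(state: WorkflowState, event: Event) -> WorkflowState:
    """Fold one event into the projected state. The event log itself is never changed."""
    if event.event_type == "intent.classified":
        state.status = "intent_classified"
    elif event.event_type in ("plan.generated", "plan.revised"):
        state.plan_version = event.data["plan_version"]
        state.status = "planned"
    elif event.event_type == "approval.decided":
        state.approvals[event.data["approval_id"]] = event.data["decision"]
    elif event.event_type == "output.finalized":
        state.outputs.append(event.data["output_id"])
        state.status = "finalized"
    # Other event types stay in the log but do not change this particular projection.
    return state


def project(events) -> WorkflowState:
    """Rebuild current state from scratch by replaying the append-only log in order."""
    state = WorkflowState()
    for event in events:
        state = apply(state, event)
    return state


log = [
    Event("e1", "intent.classified", {}),
    Event("e2", "plan.generated", {"plan_version": 1}),
    Event("e3", "plan.revised", {"plan_version": 2}),
    Event("e4", "approval.decided", {"approval_id": "appr_1", "decision": "approved"}),
    Event("e5", "output.finalized", {"output_id": "out_1"}),
]
print(project(log))
```

Because `project` starts from an empty state, replaying a prefix of the log (for example, `log[:3]`) reconstructs the state as it was at that point. This is what incident and process reviews rely on.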

5. Agentic Process Audit Model

5.1 Audit Layers

The agentic process audit model can be divided into eight layers.

Layer	Core question	Evidence objects
1. Intent layer	What business intent did the user, system or queue trigger	request record、intent classification、clarification prompt、user role
2. Process layer	Which approved process, state machine and business rules correspond to this intent	workflow definition、state transition、process version、BPMN / state model
3. Planning layer	How did the agent turn the intent into a plan, and does the plan need approval	plan candidate、plan rationale、risk tier、approval requirement
4. Control layer	Which policy, entitlement, SoD, risk and compliance controls were executed	policy decision、permission result、SoD check、obligations
5. Action layer	What did the agent read, generate, write or submit	tool contract、tool input hash、tool output pointer、side effect id
6. Human layer	How did people review, approve, override, reject or supplement facts	visible evidence set、decision reason、reviewer role、edit diff
7. Output layer	Which outputs entered business records, customer communications or regulatory materials	output hash、citation map、delivery state、record id
8. Learning layer	How do defects, feedback, incidents and sampling results change future behavior	QA finding、incident replay report、eval case、control action

5.2 Process Claim to Evidence Chain

Every process claim must link to an evidence chain:

Process claim
  -> requirement
  -> control objective
  -> workflow event
  -> evidence object
  -> audit query
  -> sampling approach
  -> control result
  -> management action

Example: Payment dispute assistant

Chain element	Example
Process claim	Assistant drafts dispute evidence packet but does not submit chargeback without maker-checker approval.
Requirement	All customer-impacting dispute submissions require reviewer approval based on visible evidence set.
Control objective	Prevent unsupported or unauthorized chargeback submissions.
Workflow event	`approval.decided` before `tool.invoked` for `chargeback_submit`.
Evidence object	approval record, visible evidence hash, tool input hash, policy decision id, side effect id.
Audit query	Show all chargeback submissions where approval is missing, expired or not tied to exact execution input.
Sampling approach	100% automated exception query plus monthly sample of approved submissions.
Control result	Exceptions by reason, reviewer quality, stale approval count, remediation status.
Management action	Update tool gateway to block mismatched approval input hashes.
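The audit query in this chain can be sketched as a 100% exception scan. This is a minimal sketch with assumptions. The tool call records carry `approval_id` and `invoked_at`. Each approval record carries a hypothetical `approved_input_hash` field that binds it to one exact execution input. The function name and record shapes are illustrative, not a defined API.

```python
from datetime import datetime, timezone

APPROVED = ("approved", "approved_with_edit")


def chargeback_exceptions(tool_calls, approvals):
    """Hypothetical 100% exception query over chargeback_submit tool calls.

    tool_calls: dicts with action_id, tool_name, input_hash, approval_id, invoked_at.
    approvals: approval_id -> dict with decision, approved_input_hash, expiry.
    Returns (action_id, reason) pairs for submissions lacking a valid, bound approval.
    """
    findings = []
    for call in tool_calls:
        if call["tool_name"] != "chargeback_submit":
            continue  # only customer-impacting submissions are in this population
        approval = approvals.get(call.get("approval_id"))
        if approval is None or approval["decision"] not in APPROVED:
            findings.append((call["action_id"], "approval_missing"))
        elif call["invoked_at"] > approval["expiry"]:
            findings.append((call["action_id"], "approval_expired"))
        elif call["input_hash"] != approval["approved_input_hash"]:
            findings.append((call["action_id"], "input_hash_mismatch"))
    return findings


def at(hour):
    return datetime(2026, 6, 30, hour, tzinfo=timezone.utc)


approvals = {
    "a1": {"decision": "approved", "approved_input_hash": "h1", "expiry": at(20)},
    "a2": {"decision": "approved", "approved_input_hash": "h2", "expiry": at(10)},
    "a3": {"decision": "approved", "approved_input_hash": "h3", "expiry": at(20)},
}
calls = [
    {"action_id": "x1", "tool_name": "chargeback_submit", "input_hash": "h1", "approval_id": "a1", "invoked_at": at(12)},
    {"action_id": "x2", "tool_name": "chargeback_submit", "input_hash": "h2", "approval_id": "a2", "invoked_at": at(12)},
    {"action_id": "x3", "tool_name": "chargeback_submit", "input_hash": "hX", "approval_id": "a3", "invoked_at": at(12)},
    {"action_id": "x4", "tool_name": "chargeback_submit", "input_hash": "h4", "approval_id": None, "invoked_at": at(12)},
]
print(chargeback_exceptions(calls, approvals))
```

The query returns reasons rather than a pass/fail flag. The "Control result" row can then report exceptions by reason, and the "Management action" row (blocking mismatched input hashes at the gateway) can be verified by rerunning the same query.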

6. Event Schema

6.1 Common Event Envelope

Agentic process events should share a stable envelope, even if each event type has a domain-specific payload.

Field	Purpose
`event_id`	globally unique event identifier
`event_type`	stable event name such as `ai.plan.generated`
`schema_version`	payload contract version
`occurred_at`	event time in UTC
`recorded_at`	collection time in UTC
`trace_id`	end-to-end workflow trace
`workflow_id`	business workflow instance
`case_id_hash`	privacy-preserving case reference
`use_case_id`	AI use case registry id
`risk_tier`	workflow risk classification
`producer`	service, gateway or application emitting event
`actor_type`	user, agent, service, reviewer, policy engine
`actor_id_hash`	hashed or tokenized actor identity
`data_class`	classification for privacy, security and retention
`retention_policy_id`	retention rule applied to event
`redaction_profile`	how sensitive content was minimized
`prev_event_ids`	causal predecessors, not merely the chronologically previous event
`evidence_refs`	pointers to controlled evidence objects
`integrity_hash`	tamper-evidence or content hash
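An envelope check at collection time can be sketched as follows, assuming events are plain dicts whose timestamps use ISO 8601 with a `Z` or `+00:00` suffix. `ENVELOPE_FIELDS` copies the table above. `envelope_problems` is a hypothetical helper. It reports problems instead of raising, so that rejected events can still be counted.

```python
from datetime import datetime, timedelta

ENVELOPE_FIELDS = (
    "event_id", "event_type", "schema_version", "occurred_at", "recorded_at",
    "trace_id", "workflow_id", "case_id_hash", "use_case_id", "risk_tier",
    "producer", "actor_type", "actor_id_hash", "data_class", "retention_policy_id",
    "redaction_profile", "prev_event_ids", "evidence_refs", "integrity_hash",
)


def _parse_utc(value):
    """Parse an ISO 8601 timestamp; return None unless it carries a zero UTC offset."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.utcoffset() == timedelta(0) else None


def envelope_problems(event: dict) -> list:
    """List envelope defects: missing fields, non-UTC or inverted timestamps, bad predecessors."""
    problems = [f"missing:{name}" for name in ENVELOPE_FIELDS if name not in event]
    try:
        occurred = _parse_utc(event["occurred_at"])
        recorded = _parse_utc(event["recorded_at"])
    except (KeyError, ValueError, AttributeError):
        return problems + ["unparseable_timestamps"]
    if occurred is None or recorded is None:
        return problems + ["timestamps_not_utc"]
    if recorded < occurred:
        problems.append("recorded_before_occurred")  # collection cannot precede the event
    if not isinstance(event.get("prev_event_ids"), list):
        problems.append("prev_event_ids_not_a_list")
    return problems
```

Keeping `occurred_at` and `recorded_at` separate lets this check catch clock or pipeline defects. It also lets replay order events by when they happened rather than by when they were collected.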

6.2 Plan / Action / Observation / Approval / Output Schema

Schema	Required fields	Assurance use
Plan event	`plan_id`, `plan_version`, `goal`, `steps`, `risk_assessment`, `requires_approval`, `rationale_hash`, `rejected_options`	Shows why the agent intended to act and whether the plan crossed a risk boundary.
Action event	`action_id`, `tool_name`, `tool_schema_version`, `action_type`, `input_hash`, `dry_run_result`, `policy_decision_id`, `approval_id`, `side_effect_id`, `idempotency_key`	Shows tool use, write boundaries, approvals and recoverability.
Observation event	`observation_id`, `source_system`, `source_version`, `fact_type`, `fact_hash`, `retrieval_refs`, `confidence_band`, `freshness`	Shows what facts the agent used and whether they were current and authorized.
Policy event	`policy_id`, `policy_version`, `decision`, `reason_code`, `obligations`, `input_attribute_hash`, `evaluation_mode`	Shows allow, block, escalate, redact, approval required or restrict decisions.
Approval event	`approval_id`, `request_reason`, `approver_role`, `decision`, `reason_code`, `visible_evidence_hash`, `approval_scope`, `expiry`	Shows HITL, maker-checker and SoD evidence.
Exception event	`exception_id`, `exception_type`, `detected_by`, `severity`, `justification`, `owner`, `compensating_control`, `expiry`, `closure_evidence`	Shows whether exceptions are controlled or unmanaged drift.
Output event	`output_id`, `output_type`, `output_hash`, `citation_map`, `safety_label`, `record_system`, `delivery_channel`, `final_status`	Shows what was finalized, stored, delivered or suppressed.
Feedback event	`feedback_id`, `feedback_type`, `accept_edit_reject`, `edit_diff_hash`, `reason_code`, `business_outcome_ref`	Shows adoption, quality signals and business outcome evidence.
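The required-field column above can be enforced per schema. This is a minimal sketch in which the caller names the schema explicitly (for example `"approval"`) rather than parsing it from `event_type`. `REQUIRED_PAYLOAD_FIELDS` is copied from the table. `payload_gaps` is a hypothetical helper.

```python
# Required payload fields per schema, copied from the table above.
REQUIRED_PAYLOAD_FIELDS = {
    "plan": {"plan_id", "plan_version", "goal", "steps", "risk_assessment",
             "requires_approval", "rationale_hash", "rejected_options"},
    "action": {"action_id", "tool_name", "tool_schema_version", "action_type", "input_hash",
               "dry_run_result", "policy_decision_id", "approval_id", "side_effect_id",
               "idempotency_key"},
    "observation": {"observation_id", "source_system", "source_version", "fact_type",
                    "fact_hash", "retrieval_refs", "confidence_band", "freshness"},
    "policy": {"policy_id", "policy_version", "decision", "reason_code", "obligations",
               "input_attribute_hash", "evaluation_mode"},
    "approval": {"approval_id", "request_reason", "approver_role", "decision", "reason_code",
                 "visible_evidence_hash", "approval_scope", "expiry"},
    "exception": {"exception_id", "exception_type", "detected_by", "severity", "justification",
                  "owner", "compensating_control", "expiry", "closure_evidence"},
    "output": {"output_id", "output_type", "output_hash", "citation_map", "safety_label",
               "record_system", "delivery_channel", "final_status"},
    "feedback": {"feedback_id", "feedback_type", "accept_edit_reject", "edit_diff_hash",
                 "reason_code", "business_outcome_ref"},
}


def payload_gaps(schema: str, data: dict) -> set:
    """Return required fields missing from the payload; an unknown schema is itself a gap."""
    if schema not in REQUIRED_PAYLOAD_FIELDS:
        return {f"unknown_schema:{schema}"}
    return REQUIRED_PAYLOAD_FIELDS[schema] - set(data)


# Payload of the example approval event in 6.3.
approval_data = {
    "approval_id": "appr_kyc_7721",
    "request_reason": "customer_followup_message",
    "approver_role": "KYC_Senior_Reviewer",
    "decision": "approved_with_edit",
    "reason_code": "missing_ubo_evidence_clearer_language",
    "visible_evidence_hash": "sha256:392e...",
    "approval_scope": "draft_customer_followup_only",
    "expiry": "2026-07-01T18:42:10Z",
}
print(payload_gaps("approval", approval_data))  # set()
```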

6.3 Example Event

{
  "event_id": "evt_20260630_kyc_000184",
  "event_type": "ai.approval.decided",
  "schema_version": "1.0",
  "occurred_at": "2026-06-30T18:42:10Z",
  "recorded_at": "2026-06-30T18:42:11Z",
  "trace_id": "trc_kyc_onboarding_72f1",
  "workflow_id": "wf_kyc_case_review_9482",
  "case_id_hash": "sha256:1d4f...",
  "use_case_id": "kyc_onboarding_agent",
  "risk_tier": "high",
  "producer": "kyc-review-workbench",
  "actor_type": "reviewer",
  "actor_id_hash": "hash:user:9a27",
  "data_class": "restricted",
  "retention_policy_id": "ret_ai_high_business_record_7y",
  "redaction_profile": "pii-minimized-v3",
  "prev_event_ids": ["evt_20260630_kyc_000181", "evt_20260630_kyc_000183"],
  "evidence_refs": [
    "evidence://visible-set/vis_kyc_5521",
    "evidence://policy-decision/pol_kyc_8910",
    "evidence://tool-input/tool_kyc_followup_2322"
  ],
  "integrity_hash": "sha256:ab77...",
  "data": {
    "approval_id": "appr_kyc_7721",
    "request_reason": "customer_followup_message",
    "approver_role": "KYC_Senior_Reviewer",
    "decision": "approved_with_edit",
    "reason_code": "missing_ubo_evidence_clearer_language",
    "visible_evidence_hash": "sha256:392e...",
    "approval_scope": "draft_customer_followup_only",
    "expiry": "2026-07-01T18:42:10Z"
  }
}
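The document does not specify how `integrity_hash` is computed. A minimal sketch, assuming SHA-256 over a sorted-key JSON encoding of every field except `integrity_hash` itself, could look like this. Sorted-key `json.dumps` is a simplification and not a full canonicalization scheme such as RFC 8785. A hash stored inside the same record also only reveals naive edits, because anyone who can rewrite the event can recompute it. Real tamper evidence needs the hash anchored elsewhere, for example signed or chained.

```python
import copy
import hashlib
import json


def compute_integrity_hash(event: dict) -> str:
    """Hash every field except integrity_hash itself, using a sorted-key JSON encoding."""
    body = {k: v for k, v in event.items() if k != "integrity_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_integrity(event: dict) -> bool:
    """True if the stored hash matches a fresh computation over the current content."""
    return event.get("integrity_hash") == compute_integrity_hash(event)


# Abbreviated version of the example event above.
event = {
    "event_id": "evt_20260630_kyc_000184",
    "event_type": "ai.approval.decided",
    "prev_event_ids": ["evt_20260630_kyc_000181", "evt_20260630_kyc_000183"],
    "data": {"approval_id": "appr_kyc_7721", "decision": "approved_with_edit"},
}
event["integrity_hash"] = compute_integrity_hash(event)

tampered = copy.deepcopy(event)
tampered["data"]["decision"] = "approved"  # silently drop the "with edit" qualifier
print(verify_integrity(event), verify_integrity(tampered))  # True False
```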

7. Causality vs Chronology

Workflow replay must not confuse chronological order with causal explanation.

Chronology asks	Causality asks
What happened before what?	Which event caused, enabled, constrained or justified another event?
What was the next timestamp?	What policy, approval, observation or plan step was a required predecessor?
Did the tool call occur after approval?	Was this approval scoped to this exact tool input and still valid at execution time?
Did the output follow retrieval?	Did the output's material claims rely on retrieved evidence or unsupported model generation?

Agentic workflow needs both:

Chronological timeline for human-readable incident review.
Causal graph for control testing, evidence lineage and independent challenge.

Example:

Approval A happened before Tool Call B.
That is chronology.

Approval A authorized input hash H1.
Tool Call B executed input hash H2.
H1 != H2.
That is causal control failure.

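A mature replay architecture should record prev_event_ids, causal_refs, policy_decision_id, approval_scope, input_hash and side_effect_id. With these, reviewers can tell the difference between "happened in time order" and "was correctly authorized".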
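The Approval A / Tool Call B example can be expressed as two separate checks. This is a minimal sketch under stated assumptions. `approved_input_hash` is a hypothetical field binding an approval to one exact input, and the class and function names are illustrative. The chronology check passes, while the causal check reports the control failure.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApprovalEvent:
    event_id: str
    occurred_at: datetime
    approved_input_hash: str  # hypothetical field binding the approval to one exact input
    expiry: datetime


@dataclass(frozen=True)
class ToolCallEvent:
    event_id: str
    occurred_at: datetime
    input_hash: str
    prev_event_ids: tuple


def chronology_ok(approval: ApprovalEvent, call: ToolCallEvent) -> bool:
    """Chronology only: did the approval happen before the call?"""
    return approval.occurred_at < call.occurred_at


def causal_findings(approval: ApprovalEvent, call: ToolCallEvent) -> list:
    """Causality: was the approval a declared predecessor, bound to this input, still valid?"""
    findings = []
    if approval.event_id not in call.prev_event_ids:
        findings.append("approval_not_declared_as_predecessor")
    if call.input_hash != approval.approved_input_hash:
        findings.append("input_hash_mismatch")
    if call.occurred_at > approval.expiry:
        findings.append("approval_expired_at_execution")
    return findings


approval_a = ApprovalEvent("A", datetime(2026, 6, 30, 18, 0, tzinfo=timezone.utc),
                           approved_input_hash="H1",
                           expiry=datetime(2026, 7, 1, tzinfo=timezone.utc))
call_b = ToolCallEvent("B", datetime(2026, 6, 30, 18, 5, tzinfo=timezone.utc),
                       input_hash="H2", prev_event_ids=("A",))
print(chronology_ok(approval_a, call_b), causal_findings(approval_a, call_b))
# True ['input_hash_mismatch']
```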

8. Workflow Replay Architecture

8.1 Logical Architecture

Business workflow UI / queue / API
        |
AI agent orchestrator
  intent classifier | planner | policy engine | tool gateway | HITL workflow
        |
OpenTelemetry traces + agentic process events
        |
Evidence collection layer
  schema validation | redaction | hashing | retention tagging | integrity checks
        |
Event store + trace store + evidence lake
        |
Provenance graph
  entities | activities | agents | causal edges | version set
        |
Replay workbench
  timeline view | causal graph | evidence chain | policy decision view | approval view
        |
Audit query and assurance reporting
  process conformance | sampling | incident replay | control testing | process owner review
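The evidence collection layer (schema validation, redaction, retention tagging, hashing) can be sketched as a single function. The profile contents, retention mapping and fallback id `ret_manual_review_required` are hypothetical. In practice they would come from governed policy configuration, not constants. Redaction here replaces listed payload fields with hashes. The assumption is that raw content, if kept at all, lives in a separately controlled store.

```python
import hashlib
import json

# Hypothetical configuration; real profiles would come from governed policy, not constants.
REDACTION_PROFILES = {"pii-minimized-v3": {"customer_name", "account_number", "free_text"}}
RETENTION_RULES = {("high", "restricted"): "ret_ai_high_business_record_7y"}
REQUIRED_FIELDS = ("event_id", "event_type", "risk_tier", "data_class", "data")


def _sha256(value) -> str:
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def collect(raw_event: dict, profile_id: str) -> dict:
    """Validate -> redact -> tag retention -> hash, mirroring the evidence collection layer."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw_event]
    if missing:
        raise ValueError(f"schema validation failed, missing: {missing}")
    sensitive = REDACTION_PROFILES[profile_id]
    # Sensitive payload values are replaced by hashes before the event is stored.
    data = {k: (_sha256(v) if k in sensitive else v) for k, v in raw_event["data"].items()}
    event = {**raw_event, "data": data, "redaction_profile": profile_id}
    event["retention_policy_id"] = RETENTION_RULES.get(
        (event["risk_tier"], event["data_class"]), "ret_manual_review_required"
    )
    event["integrity_hash"] = _sha256({k: v for k, v in event.items() if k != "integrity_hash"})
    return event


raw = {"event_id": "evt_1", "event_type": "ai.output.drafted", "risk_tier": "high",
       "data_class": "restricted", "data": {"output_id": "out_9", "customer_name": "Jane Doe"}}
stored = collect(raw, "pii-minimized-v3")
print(stored["retention_policy_id"], stored["data"]["customer_name"][:7])
```

The integrity hash is computed last, over the already-redacted event. It therefore attests to what was actually stored, not to raw content that reviewers may never be allowed to see.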

8.2 Core Components

Component	Responsibility
Agent orchestrator	Emits trace context and events for intent, plan, tool call, observations, approvals and outputs.
Policy decision point	Produces versioned allow, block, escalate, approval-required and obligation decisions.
Tool gateway	Enforces authorization, idempotency, dry-run, side-effect capture, approval binding and kill switch.
HITL workflow service	Records visible evidence, reviewer decision, reason code, edit diff, expiry and SoD result.
Evidence collector	Validates event schema, applies redaction profile, computes hashes and assigns retention class.
Event store	Stores append-only domain events for workflow replay and conformance analysis.
Trace store	Stores spans, timing, cost, latency and dependency context for observability and drilldown.
Provenance graph	Links entities, activities and agents across prompt, context, RAG, tool, approval and output objects.
Replay workbench	Lets authorized reviewers reconstruct timelines, causal chains, version sets and evidence gaps.
Audit query layer	Provides parameterized queries for internal audit, risk, process owner and incident review.
Redaction and access layer	Controls who can see raw content, redacted content, hashes, pointers and aggregate metrics.
Retention and legal hold layer	Applies retention policy, deletion handling and evidence preservation for incidents or reviews.
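Several tool gateway responsibilities from the table can be combined in one guard: kill switch, policy decision, approval binding and idempotency. This is a hypothetical sketch, not a real library API. `ToolGateway`, the request fields and the `approved_input_hash` binding are assumptions. The idempotency store is an in-memory dict here, whereas a production gateway would need a durable, shared store.

```python
from typing import Callable, Optional


class ToolGateway:
    """Hypothetical gateway sketch enforcing the responsibilities in the table above."""

    def __init__(self):
        self.kill_switch = False
        self._side_effects = {}  # idempotency_key -> side_effect_id already produced

    def execute(self, request: dict, approval: Optional[dict], run: Callable[[dict], str]) -> dict:
        if self.kill_switch:
            return {"status": "blocked", "reason": "kill_switch"}
        if request.get("policy_decision") not in ("allow", "approval_required"):
            return {"status": "blocked", "reason": "policy_not_allowing"}
        needs_approval = (request["policy_decision"] == "approval_required"
                          or request["action_type"] == "write")
        if needs_approval and (approval is None
                               or approval.get("approved_input_hash") != request["input_hash"]):
            return {"status": "blocked", "reason": "approval_not_bound_to_input"}
        key = request["idempotency_key"]
        if key in self._side_effects:
            # A retry of the same action returns the original side effect instead of repeating it.
            return {"status": "duplicate_suppressed", "side_effect_id": self._side_effects[key]}
        side_effect_id = run(request)
        self._side_effects[key] = side_effect_id
        return {"status": "executed", "side_effect_id": side_effect_id}


executed_inputs = []


def fake_run(req: dict) -> str:
    """Stand-in for the downstream system; records each real execution."""
    executed_inputs.append(req["input_hash"])
    return f"se_{len(executed_inputs)}"


gateway = ToolGateway()
request = {"policy_decision": "allow", "action_type": "write",
           "input_hash": "H1", "idempotency_key": "case_42|repair_1"}
first = gateway.execute(request, {"approved_input_hash": "H1"}, fake_run)
retry = gateway.execute(request, {"approved_input_hash": "H1"}, fake_run)
print(first, retry)
```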

8.3 Replay Modes

Replay mode	Purpose	Limits
Technical replay	Reconstruct model, prompt, RAG, policy, tool and runtime version set.	May not reproduce identical LLM output if provider behavior changed.
Business replay	Reconstruct workflow states, user actions, approvals, outputs and business record updates.	Requires business systems to retain record ids and state transitions.
Control replay	Re-evaluate whether required controls fired and were bound to correct inputs.	Does not itself prove control design is sufficient.
Incident replay	Reconstruct events in an incident window and connect root cause, impact and remediation.	Sensitive content may require restricted access and legal guidance.
Learning replay	Turn failures, edits, overrides and QA findings into eval cases and process improvements.	Must avoid using feedback data outside purpose and consent boundaries.

8.4 Reproducibility Limits

For agentic AI, replay architecture must explicitly document reproducibility limits:

Limit	Mitigation
LLM nondeterminism	Preserve prompt/config/model route/output hash and use replay to compare behavior, not guarantee exact output.
Third-party model version opacity	Capture vendor metadata, response headers, model alias resolution and contractually available version info.
External tool state changes	Store tool request/response hashes, side-effect ids, business record state and compensating action records.
Knowledge base drift	Preserve KB version, index version, retrieved chunk ids, effective dates and document lifecycle state.
Privacy redaction	Preserve enough hashed/pointer evidence to reconstruct under authorized conditions without broad raw content exposure.
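The mitigations above imply that a replay compares the version set first and the output second. A minimal sketch, assuming each run is summarized as a flat dict with the hypothetical artifact keys below plus an `output_hash`:

```python
# Hypothetical version-set keys; a real release bundle would define its own.
ARTIFACT_KEYS = ("model_route", "prompt_version", "kb_version", "index_version",
                 "policy_version", "tool_schema_version")


def compare_replay(original: dict, replay: dict):
    """Compare an original run's version set and output hash with a replay run."""
    drift = {k: (original.get(k), replay.get(k)) for k in ARTIFACT_KEYS
             if original.get(k) != replay.get(k)}
    if drift:
        verdict = "version_set_drift"  # the replay did not run under the same approved artifacts
    elif original.get("output_hash") != replay.get("output_hash"):
        # Same artifacts, different output: consistent with LLM nondeterminism or opaque
        # provider changes, so compare behavior rather than assume a defect.
        verdict = "output_differs_under_same_version_set"
    else:
        verdict = "matched"
    return verdict, drift


original = {"model_route": "route_a", "prompt_version": "p7", "kb_version": "kb_2026_06",
            "index_version": "ix_12", "policy_version": "pol_4", "tool_schema_version": "ts_3",
            "output_hash": "sha256:aa"}
print(compare_replay(original, dict(original, output_hash="sha256:bb")))
print(compare_replay(original, dict(original, kb_version="kb_2026_07")))
```

Separating the two verdicts keeps the "LLM nondeterminism" limit honest. A differing output under an identical version set calls for behavioral comparison, whereas version-set drift means the replay is not evidence about the original run at all.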

9. Audit Trail vs Observability

Audit trail and observability overlap but are not the same.

Dimension	Audit trail	Observability
Primary question	Can we prove who did what, when, under which authority and evidence?	Can we understand system behavior, performance and failure modes?
Main users	Internal audit, risk, compliance, process owners, regulators, legal	SRE, platform, engineering, product, operations
Time horizon	Retention aligned to control, business record and regulatory needs	Operational trend and incident analysis horizon
Data shape	Events, approvals, version records, evidence objects, control results	Traces, metrics, logs, exemplars, dashboards
Quality bar	Completeness, integrity, non-repudiation, access control, chain of evidence	Coverage, latency, sampling, diagnostic usefulness
Failure mode	Cannot prove control occurred or output was authorized	Cannot diagnose outage, drift, cost spike or bad dependency

Mature architecture connects both:

Observability shows that something happened and how the system behaved.
Audit trail shows whether the behavior was authorized, controlled, evidenced and reviewable.

For example, an OTel span may show tool.invoke completed in 250 ms. Audit evidence must additionally show:

Which tool contract and schema version.
Which user or agent authority was used.
Which policy decision allowed it.
Whether approval was required and completed.
What side effect occurred.
Whether the action matched approval scope.
How to reverse or compensate if needed.
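The gap between an OTel span and audit evidence can be made explicit with a checklist over span attributes. The attribute names below are hypothetical and illustrative. They are not OpenTelemetry semantic conventions, and no OpenTelemetry API is used. The span is modeled as a plain attribute dict.

```python
# Hypothetical attribute names; illustrative only, not OpenTelemetry semantic conventions.
AUDIT_QUESTIONS = {
    "tool contract and schema version": "audit.tool.schema_version",
    "authority used (user or agent)": "audit.authority_ref",
    "policy decision that allowed it": "audit.policy_decision_id",
    "approval requirement and completion": "audit.approval_id",
    "side effect produced": "audit.side_effect_id",
    "match with approval scope": "audit.approval_scope_match",
    "reversal or compensation path": "audit.compensation_ref",
}


def audit_gaps(span_attributes: dict) -> list:
    """List the audit questions that a span's attributes cannot answer on their own."""
    return [question for question, key in AUDIT_QUESTIONS.items() if key not in span_attributes]


span = {"tool.name": "payment_repair_update", "duration_ms": 250,
        "audit.policy_decision_id": "pol_8910"}
print(audit_gaps(span))
```

Presence is checked with `in` rather than truthiness, so an explicit `False` (for example "approval scope did not match") counts as answered evidence rather than a gap.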

10. Evidence Chain

10.1 Evidence Chain Model

Agentic process evidence should be organized as a chain:

Business outcome evidence
  <- process conformance evidence
  <- workflow replay evidence
  <- event evidence
  <- source system evidence
  <- version and control evidence

Evidence type	Example	Assurance use
Business outcome evidence	alert aging reduction, dispute cycle time, onboarding completion, hardship treatment quality	Shows whether workflow produced intended value without unacceptable harm.
Process conformance evidence	state transitions, required approvals, SoD results, exception reasons	Shows whether workflow followed approved process or justified exception.
Workflow replay evidence	trace, event sequence, causal graph, version set	Supports reconstruction and independent challenge.
Event evidence	plan, action, observation, policy, approval, output, feedback events	Provides granular proof of actions and decisions.
Source system evidence	case management records, payment system state, KYC document store, regulatory report data	Anchors AI evidence to systems of record.
Version and control evidence	prompt, model, KB, policy, tool schema, release bundle	Shows which approved artifact set governed behavior.

10.2 Business Outcome Evidence

Business outcome evidence prevents audit architecture from becoming purely technical. For financial retail examples:

Use case	Outcome evidence	Control counterweight
AML investigation copilot	reduced alert aging, better narrative completeness, lower reopen rate	no unsupported SAR implication, analyst final disposition retained
Payment dispute assistant	faster evidence packet creation, fewer missing-document reworks	chargeback submission requires maker-checker approval
KYC onboarding agent	faster first-pass completion, lower document chase volume	no automated rejection, appeal route preserved
Collections hardship case agent	better hardship option matching, improved follow-up timeliness	vulnerable customer and fair treatment checks
Regulatory reporting narrative drafter	shorter variance explanation cycle, fewer reviewer corrections	no unsupported metric cause, source lineage visible
Payment operations repair queue agent	lower repair backlog and fewer duplicate repairs	dual control for irreversible or customer-impacting updates

11. Exception and Override Handling

Exceptions are expected in real processes. The assurance question is whether they are explicit, justified, owned, time-bound and learned from.

11.1 Exception Types

Type	Example	Required evidence
Business exception	KYC case requires non-standard document due to jurisdiction rule	policy citation, reason code, reviewer approval, customer communication record
Policy override	Payment dispute response needs supervisor approval despite low automated risk score	override reason, approver role, scope, expiry, evidence set
Tool exception	Payment repair API unavailable, manual repair queue used	incident link, manual action record, reconciliation evidence
Data exception	RAG source stale for one policy section	affected scope, compensating manual source check, expiry
Workflow exception	HITL reviewer queue exceeds SLA, case routed to backup team	capacity signal, escalation decision, customer impact analysis
Model exception	Agent confidence below threshold but reviewer proceeds after independent evidence check	confidence band, reviewer rationale, QA sample inclusion

11.2 Good Override Record

Field	Strong content
override_id	stable id tied to workflow and case
baseline rule	the rule, policy or process step being overridden
business reason	why standard route does not fit
risk impact	customer, financial, compliance, operational and reputational impact
approver role	role with authority and independence
visible evidence	what the approver saw at decision time
compensating control	extra review, sampling, reconciliation, customer notice or temporary limit
expiry or closure trigger	when override ends or must be reviewed
learning path	whether this becomes process update, eval case, policy clarification or training
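A structural check for the override record can be sketched with hypothetical snake_case names for the fields in the table. It flags only absent or blank fields. It cannot tell that "Supervisor approved exception." is a weak business reason, so content quality still needs human QA sampling.

```python
# Hypothetical field names mirroring the "Good Override Record" table.
OVERRIDE_FIELDS = ("override_id", "baseline_rule", "business_reason", "risk_impact",
                   "approver_role", "visible_evidence", "compensating_control",
                   "expiry_or_closure_trigger", "learning_path")


def override_gaps(record: dict) -> list:
    """Return override fields that are absent, None or blank."""
    return [name for name in OVERRIDE_FIELDS if not str(record.get(name) or "").strip()]


weak_override = {"override_id": "ovr_1", "business_reason": "Supervisor approved exception."}
print(override_gaps(weak_override))
```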

Weak override:

Supervisor approved exception.

Strong override:

Supervisor approved hardship option deviation for case class H2 because customer submitted verified disaster impact documentation not covered by standard script. AI recommendation was restricted to draft language. Final treatment selected by hardship specialist. Case added to monthly vulnerable-customer QA sample and policy team review.

12. Segregation of Duties

Agentic AI can blur roles because the same platform may propose, execute, document and monitor a process. Segregation of duties must be explicit.

Risk	Control design
Agent proposes and executes high-impact action without independent review	Tool gateway enforces approval for write or customer-impacting actions.
Same user requests and approves a payment repair	SoD check blocks self-approval and routes to independent reviewer.
Developer changes prompt and approves production release	Release workflow separates change author, reviewer and production approver.
Model owner validates their own control effectiveness	Independent challenge by risk, model risk, QA or internal audit partner.
Operations team suppresses incident evidence	Incident evidence preservation controlled by platform/security/legal process.

SoD event evidence should include:

requester_role
approver_role
relationship_check_result
same_user_blocked
delegated_authority_source
approval_scope
independent_challenge_required
break_glass_reason

Independent challenge does not mean every event is manually reviewed. It means the evidence architecture allows authorized second-line, third-line, QA or process-owner reviewers to test whether controls operated as designed.
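Part of the SoD evidence list can be produced by a small evaluation function. This is a hypothetical sketch. The function name, role sets and result codes are assumptions. One assumed design choice is that self-approval stays blocked even under break-glass. Break-glass use never passes silently and always sets `independent_challenge_required`.

```python
from typing import Optional


def sod_check(requester_id_hash: str, approver_id_hash: str, approver_role: str,
              authorized_roles: set, break_glass_reason: Optional[str] = None) -> dict:
    """Hypothetical SoD evaluation producing a subset of the evidence fields listed above."""
    same_user = requester_id_hash == approver_id_hash
    if same_user:
        result = "blocked_self_approval"  # assumption: blocked even under break-glass
    elif approver_role not in authorized_roles:
        result = "blocked_role_not_authorized"
    else:
        result = "passed"
    return {
        "approver_role": approver_role,
        "relationship_check_result": result,
        "same_user_blocked": same_user,
        "break_glass_reason": break_glass_reason,
        # Break-glass use always triggers second-line or third-line challenge.
        "independent_challenge_required": break_glass_reason is not None,
    }


print(sod_check("hash:user:9a27", "hash:user:9a27", "Ops_Supervisor", {"Ops_Supervisor"}))
```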

13. Sampling and Testing Approach

13.1 Population Definition

Sampling begins with a population definition. Without one, teams tend to cherry-pick favorable examples.

Population	Example
All high-risk workflow instances	All KYC onboarding agent cases where customer follow-up was drafted in June 2026.
All customer-impacting tool actions	All payment repair queue updates that changed customer-visible status.
All overrides	All collections hardship cases where AI recommendation was overridden.
All policy blocks	All regulatory narrative drafts blocked for unsupported claim.
All incidents or near misses	All AML copilot summaries flagged by QA as material evidence omission.

13.2 Testing Types

Test type	Purpose	Example
Design effectiveness test	Determine whether the control, if operated, would address the risk.	Does tool gateway approval binding prevent mismatched execution input?
Operating effectiveness test	Determine whether the control actually operated across samples.	Sample approved chargeback submissions and verify approval hash equals tool input hash.
Automated exception query	100% scan for impossible or prohibited patterns.	Find tool write actions with no policy decision or expired approval.
Process conformance test	Compare actual event sequence to approved workflow model.	KYC follow-up must have intent, document observation, policy check, draft, review and output.
Outcome reasonableness test	Compare process result with business outcome and control counterweight.	Dispute cycle time improved without higher rework or complaint rate.
Replay drill	Reconstruct one case end-to-end under access controls.	Rebuild AML case timeline from trace, events, source records and approvals.

13.3 Sampling Strategy

Risk-based sampling should combine:

100% automated queries for missing mandatory events.
Attribute sampling for required control evidence.
Judgmental samples for high-risk, high-value, high-complexity or complaint-linked cases.
Stratified samples by channel, segment, language, geography, model route and reviewer.
Incident-driven samples from near misses, overrides and QA failures.
Negative samples where agent refused, escalated or blocked action.

Sampling should record:

Field	Meaning
population_id	stable query or dataset defining universe
sample_method	random, stratified, risk-based, judgmental, exception query
sample_period	time window
sample_size	count and rationale
test_objective	design, operating, conformance, outcome or incident replay
pass_criteria	precise evidence expectation
exception_classification	justified exception, documentation defect, control failure, process drift
remediation_owner	process, product, architecture, operations or control owner
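Stratified sampling with a recorded seed can be sketched using Python's `random.Random`. The population, ids and the extra `seed` field in the record are hypothetical additions to the table above. Recording the seed lets a reviewer re-draw the identical sample, provided the population query returns records in a deterministic order.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_key, per_stratum, seed):
    """Draw up to per_stratum items from each stratum with a seeded, reproducible RNG."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:  # assumes the population query returns a deterministic order
        strata[item[stratum_key]].append(item)
    sample = []
    for stratum in sorted(strata):  # fixed stratum order keeps the draw reproducible
        members = strata[stratum]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


channels = ["branch"] * 10 + ["mobile"] * 10 + ["web"] * 2
population = [{"case": f"c{i}", "channel": ch} for i, ch in enumerate(channels)]
sample = stratified_sample(population, "channel", per_stratum=4, seed=20260630)

sampling_record = {
    "population_id": "pop_payment_repair_customer_visible_2026_06",  # hypothetical id
    "sample_method": "stratified",
    "sample_period": "2026-06-01/2026-06-30",
    "sample_size": len(sample),
    "seed": 20260630,  # extra field beyond the table, added for reproducibility
    "test_objective": "operating",
    "pass_criteria": "approval hash equals tool input hash",
}
print(sampling_record["sample_size"])  # 10
```

Small strata (here `web`) are sampled in full rather than padded. The rationale for that choice is what the `sample_size` field's "count and rationale" should capture.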

14. Process Conformance

Process conformance asks whether actual workflow execution matches the approved process model.

Approved model:
intent -> plan -> policy -> tool dry-run -> approval -> tool execution -> output -> feedback

Actual trace:
intent -> plan -> tool execution -> output

Conformance result:
non-conformant because policy and approval events are missing before customer-impacting tool execution.
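This conformance result can be reproduced with a simple predecessor check. The sketch assumes the approved model is strictly linear. Real models with branches, such as BPMN gateways or state machines, need a graph-based check. `missing_predecessors` is a hypothetical helper.

```python
APPROVED_MODEL = ["intent", "plan", "policy", "tool dry-run", "approval",
                  "tool execution", "output", "feedback"]


def missing_predecessors(approved: list, actual: list) -> dict:
    """For each executed step, list approved-model predecessors that had not occurred yet."""
    findings = {}
    seen = set()
    for step in actual:
        if step not in approved:
            findings[step] = ["<step not in approved model>"]
        else:
            required = approved[: approved.index(step)]
            missing = [s for s in required if s not in seen]
            if missing:
                findings[step] = missing
        seen.add(step)
    return findings


actual_trace = ["intent", "plan", "tool execution", "output"]
print(missing_predecessors(APPROVED_MODEL, actual_trace))
# {'tool execution': ['policy', 'tool dry-run', 'approval'], 'output': ['policy', 'tool dry-run', 'approval']}
```

Trailing steps that never ran, such as `feedback` here, are not flagged by this check. Whether a workflow instance is complete is a separate query from whether the steps that did run had their required predecessors.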

14.1 Conformance vs Justified Exception

Category	Meaning	Example
Conformant	Actual event path follows the approved process.	KYC agent's follow-up was delivered only after document evidence was observed and the reviewer approved the draft.
Justified exception	Process deviated, but with an authorized reason, owner and compensating control.	A backup reviewer approved during an outage under a documented continuity procedure.
Control failure	Required control absent, expired, mismatched or bypassed.	Payment repair tool executed without valid approval.
Process drift	Repeated deviations show the real process has changed without approval.	Reviewers routinely skip the citation check because the UI makes it difficult.
Process model wrong	Approved process model omits a legitimate operational path.	AML escalation path for multi-jurisdiction cases not modeled.

Mature review does not punish every deviation. It classifies deviations and updates process, controls, training or tooling based on evidence.

14.2 Audit Queries for Conformance

Query	Purpose
Show all tool write actions without preceding policy decision.	Detect bypassed control.
Show all approvals where visible evidence hash is missing.	Detect weak HITL evidence.
Show all output deliveries where output hash differs from approved draft hash.	Detect post-approval mutation.
Show all overrides by reviewer, reason and case type.	Detect concentration, training need or process ambiguity.
Show all workflows where exception expiry passed without closure.	Detect unmanaged residual risk.
Show all regulatory narrative drafts with unsupported material claims.	Detect output evidence failure.
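Two of these queries can be sketched over plain records, assuming each delivery carries `output_id` and `output_hash`, approved draft hashes are available by `output_id`, and exception records carry `expiry` and optional `closure_evidence`. The function names and record shapes are hypothetical.

```python
from datetime import datetime, timezone


def post_approval_mutations(deliveries: list, approved_hash_by_output: dict) -> list:
    """Deliveries whose output hash differs from the approved draft hash (or has no approval)."""
    return [d["output_id"] for d in deliveries
            if d["output_hash"] != approved_hash_by_output.get(d["output_id"])]


def stale_exceptions(exceptions: list, now: datetime) -> list:
    """Exceptions whose expiry has passed without closure evidence."""
    return [e["exception_id"] for e in exceptions
            if e["expiry"] < now and not e.get("closure_evidence")]


now = datetime(2026, 7, 15, tzinfo=timezone.utc)
deliveries = [{"output_id": "o1", "output_hash": "h1"},
              {"output_id": "o2", "output_hash": "h2-edited"}]
exceptions = [
    {"exception_id": "ex1", "expiry": datetime(2026, 7, 1, tzinfo=timezone.utc), "closure_evidence": None},
    {"exception_id": "ex2", "expiry": datetime(2026, 7, 1, tzinfo=timezone.utc), "closure_evidence": "recon_77"},
    {"exception_id": "ex3", "expiry": datetime(2026, 8, 1, tzinfo=timezone.utc), "closure_evidence": None},
]
print(post_approval_mutations(deliveries, {"o1": "h1", "o2": "h2"}))  # ['o2']
print(stale_exceptions(exceptions, now))  # ['ex1']
```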

15. Incident Replay

Incident replay reconstructs what happened, why it happened, who or what allowed it, what impact occurred and what changed afterward.

15.1 Incident Replay Packet

Section	Contents
Incident scope	incident id, use case, workflow, time window, affected cases, severity
Version set	model, prompt, RAG index, policy, tool schema, release bundle, feature flags
Timeline	chronological events and spans
Causal graph	required predecessors, policy decisions, approvals, tool actions, outputs
Evidence gaps	missing spans, missing event fields, inaccessible source records, redaction limits
Customer or business impact	affected customers, cases, funds, reports, timelines, operational backlog
Control analysis	which controls worked, failed, were bypassed or were absent
Exception analysis	whether deviations were justified, expired or unmanaged
Remediation	rollback, compensation, customer action, policy update, prompt update, tool restriction
Learning loop	eval cases, regression tests, training, process model update, control improvement
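The Timeline and Evidence gaps sections of the packet can be assembled mechanically from stored events. The sketch below assumes each event is a dict with ISO-8601 timestamps, and it uses a hypothetical `REQUIRED_FIELDS` map in place of a real event dictionary. Sorting produces chronology only; the causal graph still has to be built from hashes, scopes and policy decision references.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical minimum fields per event type; a real event dictionary defines these.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "policy_decision": ["policy_version", "decision"],
    "approval": ["approver_id", "input_hash", "scope"],
    "tool_execution": ["input_hash", "side_effect_id"],
}

def build_timeline(events: List[dict]) -> Tuple[List[dict], List[str]]:
    """Sort events chronologically and list missing required fields as evidence gaps."""
    timeline = sorted(events, key=lambda e: datetime.fromisoformat(e["ts"]))
    gaps = []
    for e in timeline:
        for field in REQUIRED_FIELDS.get(e["type"], []):
            if not e.get(field):
                gaps.append(f"{e['event_id']}: missing {field}")
    return timeline, gaps

# Illustrative events only.
events = [
    {"event_id": "e2", "ts": "2024-05-01T10:02:00", "type": "tool_execution",
     "input_hash": "h1", "side_effect_id": None},
    {"event_id": "e1", "ts": "2024-05-01T10:00:00", "type": "approval",
     "approver_id": "u7", "input_hash": "h1", "scope": "attempt-1"},
]
timeline, gaps = build_timeline(events)
print([e["event_id"] for e in timeline])  # ['e1', 'e2']
print(gaps)                               # ['e2: missing side_effect_id']
```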

15.2 Example: Payment Operations Repair Queue Agent

Incident signal:

Duplicate repair actions increased after a workflow release.

Replay findings:

Finding	Evidence
Duplicate side effects occurred in 18 cases	tool side_effect_id and payment system state transitions
Idempotency key was generated from case id only, not repair action id	tool gateway event schema and code release notes
Approval existed but was scoped to first repair attempt	approval scope and expiry
Retry path bypassed dry-run after timeout	trace timeline and causal graph
Customer-visible balances were corrected through compensating entries	compensating action records and reconciliation report
Regression test added	eval/control test case for retry idempotency
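The idempotency finding concerns the scope of the key. The minimal sketch below derives the key from both the case id and the repair action id. With that key, a retry of the same action returns the stored side effect instead of creating a new one, while a distinct repair on the same case still executes. `ToolGateway`, `idempotency_key` and the in-memory store are hypothetical. A real gateway would persist keys durably and check the key atomically with the write. This sketch does not address the dry-run bypass or the approval-scope finding.

```python
import hashlib
from typing import Dict

def idempotency_key(case_id: str, repair_action_id: str) -> str:
    # The key covers the specific repair action, not only the case (hypothetical fields).
    return hashlib.sha256(f"{case_id}:{repair_action_id}".encode()).hexdigest()

class ToolGateway:
    """Toy gateway: returns the stored side effect when a key is seen again."""
    def __init__(self) -> None:
        self._side_effects: Dict[str, str] = {}

    def execute(self, case_id: str, repair_action_id: str) -> str:
        key = idempotency_key(case_id, repair_action_id)
        if key in self._side_effects:         # retry after timeout: no new side effect
            return self._side_effects[key]
        side_effect_id = f"se-{len(self._side_effects) + 1}"  # stand-in for the real write
        self._side_effects[key] = side_effect_id
        return side_effect_id

gw = ToolGateway()
first = gw.execute("case-42", "repair-1")
retry = gw.execute("case-42", "repair-1")    # same action retried
second = gw.execute("case-42", "repair-2")   # distinct repair on the same case
print(first, retry, second)                  # se-1 se-1 se-2
```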

The assurance conclusion should be framed carefully:

The replay packet supports process owner, risk and internal audit review of the incident evidence. It does not by itself constitute audit sign-off or regulatory closure.

16. Financial Retail Examples

16.1 AML Investigation Copilot

Audit focus	Replay evidence
Analyst final accountability	final disposition event, analyst approval, no auto-SAR submission
Evidence completeness	transaction refs, KYC refs, adverse media refs, source-span map
Narrative quality	draft narrative, edit diff, QA sample, reopen reason
Sensitive data boundary	redaction profile, access log, SAR-related data retention class
Incident learning	omitted-evidence QA finding converted into regression case

16.2 Payment Dispute Assistant

Audit focus	Replay evidence
Chargeback submission authority	maker-checker approval, tool input hash, policy decision
Evidence packet support	transaction timeline, merchant evidence, network rule source
Customer communication	approved letter output hash, delivery channel, complaint link
Exceptions	provisional credit exception, supervisor reason, expiry
Outcome evidence	cycle time, rework, win/loss rate, complaint trend
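The chargeback submission authority row can be tested case by case once each submission records maker, checker and policy decision. Below is a minimal maker-checker sketch. The role name `dispute_checker` and the `"allow"` decision value are hypothetical. Binding the approval to the exact tool input hash is a separate check that this sketch does not cover.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Submission:
    maker_id: str            # who prepared the chargeback packet
    checker_id: str          # who approved the submission
    checker_roles: frozenset
    policy_decision: str     # decision value from the policy engine (assumed "allow"/"deny")

REQUIRED_CHECKER_ROLE = "dispute_checker"  # hypothetical role name

def maker_checker_violations(s: Submission) -> list:
    issues = []
    if s.maker_id == s.checker_id:
        issues.append("maker and checker are the same person")
    if REQUIRED_CHECKER_ROLE not in s.checker_roles:
        issues.append("checker lacks the required role")
    if s.policy_decision != "allow":
        issues.append("policy decision does not allow submission")
    return issues

ok = Submission("u1", "u2", frozenset({"dispute_checker"}), "allow")
print(maker_checker_violations(ok))  # []
```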

16.3 KYC Onboarding Agent

Audit focus	Replay evidence
No automated rejection	output status, reviewer decision, appeal route evidence
Missing evidence detection	document observation events, confidence band, source pointer
Customer follow-up	draft, reviewer edit, approved final message
High-risk jurisdiction	policy decision, escalation obligation, senior review
Process conformance	onboarding state transitions and exception path

16.4 Collections Hardship Case Agent

Audit focus	Replay evidence
Fair treatment	vulnerability flag handling, policy decision, human specialist approval
Option recommendation	hardship facts, source policy, plan rationale
Override quality	override reason codes, supervisor sample, customer outcome
Communications	approved message hash and record id
Learning loop	complaints and QA findings converted to training or policy clarification

16.5 Regulatory Reporting Narrative Drafter

Audit focus	Replay evidence
Source lineage	metric id, data source, report period, transformation refs
Unsupported claim prevention	citation map, policy block for hallucinated cause
Maker-checker	reviewer approval, visible evidence, edit diff
Attestation boundary	AI draft marked as draft, authorized signer retained
Retention	report pack evidence aligned to regulatory record policy
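Unsupported claim prevention depends on a citation map from each narrative claim to approved metric ids. The sketch below assumes a simple dict-based map with placeholder claim and metric ids. It flags claims that have no citation or that cite a metric outside the approved set. It cannot judge whether a cited metric actually supports the claim, so maker-checker review is still required.

```python
from typing import Dict, List, Set

def unsupported_claims(claim_citations: Dict[str, List[str]],
                       approved_metric_ids: Set[str]) -> List[str]:
    """Flag claims with no citation or with a citation outside the approved metric set."""
    flagged = []
    for claim_id, metric_ids in claim_citations.items():
        if not metric_ids or not set(metric_ids) <= approved_metric_ids:
            flagged.append(claim_id)
    return flagged

approved = {"M-101", "M-205"}     # placeholder ids from the approved report pack
citation_map = {
    "claim-1": ["M-101"],         # sourced variance statement
    "claim-2": [],                # causal explanation with no source
    "claim-3": ["M-999"],         # cites a metric outside the approved set
}
print(unsupported_claims(citation_map, approved))  # ['claim-2', 'claim-3']
```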

16.6 Payment Operations Repair Queue Agent

Audit focus	Replay evidence
Repair action authorization	tool risk tier, policy decision, dual control
Reversibility	side effect id, idempotency key, compensating action
Queue prioritization	plan rationale, SLA, customer impact
Exception handling	manual repair route, reconciliation evidence
Outcome evidence	backlog, duplicate repair rate, settlement break trend

17. Operating Model

17.1 RACI

Activity	AI PM	AI Architect	BA	Process Owner	Risk / Compliance	Internal Audit Partner	Platform / SRE
Process claim definition	A	C	R	A	C	C	I
Event schema	C	A	R	C	C	C	R
Replay architecture	C	A	C	C	C	C	R
Control-to-evidence matrix	C	C	R	A	R	C	I
Audit query catalog	C	R	R	C	C	A/C	C
Sampling strategy	C	C	C	A	R	C	I
Exception register	R	C	R	A	C	C	I
Incident replay	R	R	C	A	A/C	C	R
Learning loop	A	R	R	A	C	I	C

R = Responsible, A = Accountable, C = Consulted, I = Informed.

17.2 Cadence

Forum	Cadence	Main question	Output
Workflow evidence design review	Before pilot and major release	Are process claims, events, controls and replay needs defined?	evidence contract and release gate
Process conformance review	Monthly or risk-based	Are actual traces matching approved process?	conformance report and action log
Exception and override review	Weekly for high-risk workflows	Are exceptions justified, aging and closing?	exception register update
Incident replay review	Triggered by incident or near miss	What happened, why, impact and learning?	replay packet and remediation
Assurance management review	Quarterly	Are controls, outcomes and evidence architecture improving?	management action and roadmap

18. Anti-Patterns

Anti-pattern	Why it fails	Mature replacement
Final answer as audit evidence	It hides intent, plan, tool calls, approvals and policy decisions.	Replayable trace with event-sourced workflow evidence.
Logging everything raw	Creates privacy, security and retention risk without better assurance.	Minimum sufficient evidence with redaction, hash, pointer and controlled raw access.
Chronology treated as causality	"Approval happened before action" does not prove action was approved.	Causal links through input hash, approval scope and policy decision.
HITL recorded as yes/no	Review cannot determine what evidence human saw.	Visible evidence set, decision reason, edit diff and expiry.
Exceptions hidden in comments	Cannot distinguish justified business exception from control failure.	Structured exception record with owner, expiry and compensating control.
Audit query afterthought	Evidence exists but cannot answer real review questions.	Audit query catalog designed during requirements and architecture.
Same team self-certifies all controls	No independent challenge and no segregation of duties (SoD).	Separate author, approver, reviewer and process owner roles based on risk.
Replay promises exact LLM reproduction	Overclaims determinism and ignores vendor/model drift.	Document reproducibility limits and preserve version set plus output hashes.
Sampling only successful cases	Misses near misses, blocks, overrides and failures.	Risk-based samples covering negative paths and exceptions.
Outcome metrics without control counterweights	Speed gains may hide customer harm or control erosion.	Pair business outcome evidence with quality, risk and conformance evidence.
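The "minimum sufficient evidence" replacement can be sketched as a record that holds a redacted view, a hash of the raw payload and a pointer to controlled raw storage. The email regex is illustrative only and is not a PII detector. The `vault://` URI scheme is hypothetical. Canonical JSON (sorted keys) keeps the hash stable across key order.

```python
import hashlib
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # illustrative pattern, not a full PII detector

def evidence_record(raw_payload: dict, raw_store_uri: str) -> dict:
    """Keep a redacted view, a hash of the raw payload and a pointer to controlled storage."""
    canonical = json.dumps(raw_payload, sort_keys=True, separators=(",", ":"))
    redacted = {k: EMAIL.sub("[REDACTED_EMAIL]", v) if isinstance(v, str) else v
                for k, v in raw_payload.items()}
    return {
        "payload_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "redacted_view": redacted,
        "raw_pointer": raw_store_uri,  # raw access goes through the controlled store
    }

rec = evidence_record({"case_id": "C9", "note": "contact jane@example.com"},
                      "vault://evidence/C9/42")  # hypothetical URI
print(rec["redacted_view"]["note"])  # contact [REDACTED_EMAIL]
```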

19. PM / BA / Architect Implications

19.1 For Senior AI PM

Define process claims before building automation.
Treat workflow replay as a product capability, not just internal logging.
Pair value metrics with control and customer outcome evidence.
Require release gates for evidence coverage, replay readiness and exception handling.
Write scale recommendations that show uncertainty, residual risk owner and monitoring triggers.

19.2 For CBAP-level BA

Model agentic workflow as state, event, decision and evidence objects.
Translate stakeholder needs into process claims, acceptance criteria and audit queries.
Make exception paths first-class requirements.
Distinguish conformance, justified exception, process drift and control failure.
Ensure every material process claim has evidence, owner and test approach.

19.3 For AI Architect

Design trace context, event schema and provenance graph early.
Use event-sourced workflow where replay and conformance matter.
Bind approvals to exact action inputs and side effects.
Record version sets for model, prompt, RAG, policy, tool schema and workflow.
Build redaction, retention and access control into the evidence architecture.

19.4 For Internal Audit Partner

Challenge whether process claims are testable.
Review whether evidence is complete, reliable and independently queryable.
Help shape sampling strategy and audit query catalog without owning management controls.
Distinguish evidence sufficiency for review from formal audit conclusions.

20. Interview Answers

Q1: How do you make an agentic AI workflow auditable without slowing it down?

30-second version:

I would design auditability into the workflow runtime. Every agentic workflow should emit structured events for intent, plan, observations, policy decisions, tool calls, human approvals, exceptions, outputs and feedback. Those events connect to traces, source records and a provenance graph, so process owners and audit partners can replay cases, test controls and sample exceptions without asking teams to manually reconstruct evidence.

2-minute version:

I start from process claims. For example, a payment dispute assistant may draft evidence packets but cannot submit a chargeback without maker-checker approval. That claim becomes requirements, controls, events and audit queries. The architecture emits a replayable trace: intent classification, plan, RAG evidence, policy decision, tool dry-run, approval visible evidence, tool execution, output and feedback.

To avoid slowing teams, evidence is generated as work happens. The tool gateway enforces approval binding, idempotency and side-effect capture. The HITL workflow records visible evidence and reason codes. OpenTelemetry traces provide operational context, while the event store and provenance graph provide process and causal evidence. Internal audit or process owners can then run queries such as "show all customer-impacting tool actions without valid approval" and sample cases by risk. The result is not automatic audit sign-off, but a stronger evidence architecture for review and assurance.

Q2: What is the difference between observability and audit trail for agentic AI?

30-second version:

Observability explains system behavior: latency, errors, traces, costs and dependencies. Audit trail proves authority and evidence: who requested, what plan was approved, which policy allowed it, what tool action happened, what output was delivered and whether exceptions were justified. Agentic AI needs both connected by trace ids and evidence references.

Q3: Why is causality more important than chronology in workflow replay?

30-second version:

Chronology tells us that one event happened before another. Causality tells us whether the later action was actually authorized, supported and caused by the earlier evidence. An approval before a tool call is not enough; the approval must apply to the same input hash, scope, policy decision and side effect.
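The binding rule in this answer can be written as a field-by-field comparison between an approval and the tool call it is meant to authorize. This is a minimal sketch with hypothetical field names. The example reuses the payment repair finding, where an approval scoped to the first attempt was followed by a retry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Approval:
    input_hash: str
    scope: str               # the specific action the approval covers
    policy_decision_id: str
    approved_at: datetime
    expires_at: datetime

@dataclass(frozen=True)
class ToolCall:
    input_hash: str
    action_id: str
    policy_decision_id: str
    executed_at: datetime

def binding_failures(a: Approval, c: ToolCall) -> list:
    """Chronology is necessary but not sufficient; each bound field must match."""
    failures = []
    if c.executed_at < a.approved_at:
        failures.append("executed before approval")
    if c.executed_at > a.expires_at:
        failures.append("approval expired")
    if a.input_hash != c.input_hash:
        failures.append("input hash mismatch")
    if a.scope != c.action_id:
        failures.append("action outside approval scope")
    if a.policy_decision_id != c.policy_decision_id:
        failures.append("different policy decision")
    return failures

t0 = datetime(2024, 5, 1, 10, 0)
approval = Approval("h1", "repair-1/attempt-1", "pd-7", t0, t0 + timedelta(hours=1))
retry = ToolCall("h1", "repair-1/attempt-2", "pd-7", t0 + timedelta(minutes=5))
print(binding_failures(approval, retry))  # ['action outside approval scope']
```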

Q4: How would you test control effectiveness for a KYC onboarding agent?

30-second version:

I would define the population, such as all KYC agent cases with customer follow-up drafts in the month. I would run automated exception queries for missing reviewer approval, unsupported document claims and automated rejection. Then I would sample by risk tier, document type, geography and reviewer to verify source evidence, policy decisions, human review, final output and customer outcome. Exceptions would be classified as justified exception, documentation defect, process drift or control failure.
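The sampling step can be sketched as a seeded, stratified draw. A fixed seed lets a reviewer re-perform the selection. This sketch assumes each case is a dict holding the stratification attributes. `per_stratum=3` is a placeholder rather than a sample size rationale, and the 40-case population is synthetic. Automated exception queries still run over the full population; the sample only supports manual review.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

def stratified_sample(cases: List[dict], strata_keys: List[str],
                      per_stratum: int, seed: int = 2024) -> List[dict]:
    """Draw up to `per_stratum` cases per stratum with a re-performable seeded RNG."""
    strata: Dict[Tuple, List[dict]] = defaultdict(list)
    for case in cases:
        strata[tuple(case[k] for k in strata_keys)].append(case)
    rng = random.Random(seed)
    sample: List[dict] = []
    for key in sorted(strata):  # fixed stratum order keeps the seeded draw stable
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Synthetic population: 40 KYC cases with two stratification attributes.
cases = [{"case_id": f"K{i}",
          "risk_tier": "high" if i % 4 == 0 else "standard",
          "doc_type": "utility_bill" if i % 2 == 0 else "passport"}
         for i in range(40)]
picked = stratified_sample(cases, ["risk_tier", "doc_type"], per_stratum=3)
print(len(picked))  # 9: three strata, three cases each
```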

Q5: What evidence is needed for incident replay?

30-second version:

An incident replay packet needs scope, affected cases, version set, event timeline, causal graph, policy decisions, approvals, tool side effects, outputs, business impact, evidence gaps, control analysis, remediation and learning actions. It should also state reproducibility limits and privacy constraints.

Q6: How do you handle reproducibility limits with LLM agents?

30-second version:

I do not promise exact deterministic reproduction unless the stack supports it. I preserve the version set, prompt/config hash, model route, RAG index, retrieved chunks, policy version, tool inputs, approvals, output hash and business state. Replay focuses on reconstructing evidence and control behavior, while documenting model nondeterminism, vendor version limits and redaction boundaries.
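One way to preserve the version set is to store a fingerprint computed over a canonical form of it. The sketch below hashes sorted-key JSON with SHA-256, so reordering keys does not change the fingerprint but changing any version does. All identifiers are placeholders. A matching fingerprint shows that the configured versions match. It does not show that model outputs will be identical, which is why output hashes are kept as well.

```python
import hashlib
import json

def version_set_fingerprint(version_set: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the fingerprint."""
    canonical = json.dumps(version_set, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Placeholder identifiers only; none of these name a real model, index or release.
run = {
    "model_route": "vendor-model-a",
    "prompt_config_hash": "sha256-of-prompt-config",
    "rag_index": "kyc-index-2024-05",
    "policy_version": "policy-v7",
    "tool_schema": "tool-schema-v3",
    "workflow_release": "release-19",
}
reordered = dict(reversed(list(run.items())))
print(version_set_fingerprint(run) == version_set_fingerprint(reordered))  # True
print(version_set_fingerprint(run) ==
      version_set_fingerprint({**run, "policy_version": "policy-v8"}))    # False
```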

21. Portfolio Exercise

Build an "Agentic Process Audit and Workflow Replay Pack" for one financial retail use case. Recommended use cases:

Use case	Suggested process claim
AML investigation copilot	Agent prepares sourced timeline and draft narrative, but analyst owns final disposition.
Payment dispute assistant	Agent drafts evidence packet, but chargeback submission requires maker-checker approval.
KYC onboarding agent	Agent detects missing evidence and drafts follow-up, but cannot reject applicant.
Collections hardship case agent	Agent recommends hardship options, but specialist approves customer treatment.
Regulatory reporting narrative drafter	Agent drafts variance narrative from approved metrics, but authorized signer remains accountable.
Payment operations repair queue agent	Agent suggests repair and executes only reversible low-risk updates under policy and dual control.

Required Artifacts

Process claim with workflow scope, agent boundary, human boundary, policy boundary and outcome evidence.
Event dictionary covering intent, plan, observation, policy, action, approval, exception, output, feedback and incident.
Plan/action/observation/approval/output schema with required fields and retention class.
Replay architecture diagram showing orchestrator, policy engine, tool gateway, HITL, event store, trace store, evidence lake, provenance graph and replay workbench.
Causal graph for one representative case.
Audit query catalog with at least 10 queries.
Control-to-evidence matrix with at least 12 controls.
Sampling strategy with population definition, method, sample size rationale and pass criteria.
Exception and override register with owner, expiry and compensating control.
Incident replay packet for one failure or near miss.
Process conformance report distinguishing conformant cases, justified exceptions, process drift and control failures.
Executive assurance narrative for process owner review.

Scoring Rubric

Criterion	Strong evidence
Process rigor	Process claim is specific, testable and linked to workflow scope.
BA rigor	Events, states, decisions, requirements and evidence are traceable.
Architecture rigor	Replay architecture separates traces, events, provenance, retention and access control.
Audit usefulness	Audit queries can be run without manual reconstruction.
Control thinking	SoD, approval binding, exception handling and sampling are practical.
Financial realism	Examples reflect AML, KYC, disputes, collections, reporting or payment operations constraints.
Reproducibility honesty	Replay limits are documented without overclaiming deterministic reproduction.
Outcome balance	Business value is paired with customer, control and process conformance evidence.

22. Final Mental Model

Agentic workflow assurance should make five truths visible:

The final answer is not the process.
The timeline is not the cause.
The approval is not valid unless bound to exact evidence and action.
The exception is not acceptable unless owned, justified, expiring and monitored.
The replay is not audit sign-off, but it is the evidence architecture that makes serious review possible.

The senior-level move is to design AI agents as replayable process participants, not opaque automations.