
AI Agentic Process Audit: Process Audit and Replay Assurance Architecture


947ai-foundations/papers/161-ai-agentic-process-audit-workflow-replay-assurance-architecture.md

AI Agentic Process Audit / Workflow Replay / Assurance Architecture Explained

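Audience: Senior AI PM / AI Architect / Internal Audit Partner / Process Owner / CBAP-level BA / AI Governance Lead / Financial Retail Operations Leader.

Core question: when an AI agent no longer just answers questions but interprets intent, makes plans, calls tools, requests human approval, executes process actions, handles exceptions and learns from incidents, how does an organization design an evidence architecture that is reviewable, replayable, challengeable and continuously improvable?

Learning goal: build an advanced mental model of the agentic process audit model, replayable traces, event-sourced agent workflows, the plan/action/observation/approval/output schema, audit queries, sampling strategy, process conformance, incident replay and the assurance operating model.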

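Important note: this article is learning, portfolio and internal architecture training material. It does not constitute legal advice, regulatory interpretation, an audit opinion, a model validation conclusion, a conclusion on internal control effectiveness, a risk acceptance decision or a production launch approval. The audit, assurance, evidence, control effectiveness and process owner review discussed here all refer to an evidence architecture that supports independent review and challenge by internal audit, risk, compliance, business owners and management; they do not represent any formal audit sign-off or regulatory endorsement. Sources were accessed on 2026-06-30.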


Source Anchors

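The following sources frame the language of AI risk management, AI management systems, architecture description, requirements engineering, observability, provenance and IT examination of financial institutions. This article uses them only as design anchors for product, architecture, BA and internal assurance work. It does not claim that any schema, trace or replay workbench automatically satisfies audit, regulatory or model risk approval.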

| Source | Official link | Idea adopted in this article |
|---|---|---|
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize evidence for agentic workflow risk identification, measurement, control, runtime monitoring and improvement. |
| ISO/IEC 42001 AI management system | https://www.iso.org/standard/81230.html | Uses the AI management system's scope, policy, operation, performance evaluation, management review and improvement to design a continuous assurance operating model. |
| ISO/IEC/IEEE 42010 Architecture Description | https://www.iso.org/standard/74393.html | Uses stakeholder, concern, viewpoint, architecture view, correspondence and rationale to organize a multi-view description of the replay architecture. |
| ISO/IEC/IEEE 29148 Requirements Engineering | https://www.iso.org/standard/72089.html | Uses stakeholder need, requirement, verification, validation, traceability and information item to design process claims and evidence contracts. |
| OpenTelemetry Documentation | https://opentelemetry.io/docs/ | Uses traces, metrics, logs, context propagation and semantic attributes to design runtime observability and replay traces. |
| W3C PROV Overview | https://www.w3.org/TR/prov-overview/ | Uses the Entity / Activity / Agent provenance model to express who produced an agentic action, from which inputs, and through which activity. |
| FFIEC IT Handbook | https://ithandbook.ffiec.gov/ | Uses financial institution language on IT risk, examination, governance, outsourcing, business continuity and control assessment to calibrate the financial retail scenarios. |

In one sentence:

Agentic process audit is the evidence architecture discipline that turns an AI agent's intent, plan, tool actions, policy decisions, human approvals, exceptions, outputs and learning signals into replayable process evidence for review, challenge and improvement.

1. Executive Summary

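Reviews of conventional AI systems usually focus on:

  • Whether the model performs well enough.
  • Whether RAG cites the correct sources.
  • Whether outputs are compliant, safe and explainable.
  • Whether monitoring and incident response exist after launch.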

Agentic workflows are more complex because the system spans multiple business steps:

```text
User intent
  -> intent interpretation
  -> plan generation
  -> policy decision
  -> tool selection
  -> tool call
  -> observation
  -> human approval
  -> exception handling
  -> compensating action
  -> final output
  -> feedback
  -> incident learning
```

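If the organization keeps only the final answer or ordinary API logs, internal audit, business owners, risk teams and architecture teams cannot answer the key questions: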

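| Review question | Gap in ordinary logs |
|---|---|
| What did the user actually ask for, and how did the agent interpret the intent? | Logs keep only the request text, with no intent classification or confidence |
| Why did the agent choose this plan? | No plan version, candidate plans, rejected plans or rationale |
| Which tool action affected a customer, funds, a case or regulatory material? | Only a successful API call is visible; side effects, idempotency, approval and rollback are not |
| Why did policy allow, block or escalate? | No policy version, decision reason or obligations |
| What evidence did the human approver see? | Only "approved" is recorded, with no visible evidence set |
| Is an exception a compliance deviation or a legitimate business exception? | No exception reason, owner, expiry or compensating control |
| Can the process be replayed after an incident? | The timeline is broken, the version set is incomplete, and sensitive content cannot be accessed safely |
| Is the control effective? | No sampling universe, population definition, control test result or exception aging |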

A mature agentic process audit architecture does not make the AI team fill in more forms. Instead, it generates evidence natively in the workflow runtime:

```text
Process claim
  -> replayable trace
  -> event-sourced agent workflow
  -> evidence chain
  -> audit query
  -> sampling and testing
  -> process conformance review
  -> incident replay
  -> control improvement
```

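The job of senior AI PMs, AI Architects and CBAP-level BAs is to move an agentic workflow from "looks automated" to "provably controlled". Here "provable" does not mean that outputs are fully deterministic, nor that an audit conclusion follows automatically. It means the organization can reconstruct sufficient evidence, within permission, privacy and retention boundaries, to support independent challenge, management review and continuous improvement.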


2. Target Audience and Role Expectations

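| Role | Questions to answer | Typical outputs |
|---|---|---|
| Senior AI PM | Does the agentic workflow genuinely improve business outcomes while producing runtime evidence that is reviewable, reversible and usable for learning? | outcome evidence map, replay readiness gate, scale/stop memo |
| AI Architect | How should trace, event store, provenance graph, replay workbench, redaction and retention be designed? | replay reference architecture, event schema, audit query architecture |
| Internal Audit Partner | Which process claims can be verified, which evidence can support independent testing, and which still need human challenge? | audit query catalog, sampling design input, evidence sufficiency review |
| Process Owner | Does the agent run according to the approved process, are exceptions reasonable, and are human approvals and compensating actions effective? | process conformance report, exception aging, control action plan |
| CBAP-level BA | Are user intent, business rules, process states, exception paths, approvals and acceptance criteria traceable? | process claim model, state/event dictionary, control-to-evidence matrix |
| Risk / Compliance | Are automation boundaries, customer impact, policy decisions, risk acceptance and compensating controls clear? | risk scenario map, policy decision evidence, exception register |
| Operations Leader | Are HITL queues, review capacity, override taxonomy, incident routes and training workable? | review SOP, capacity evidence, override dashboard |
| Platform / SRE | Are evidence capture, observability, data minimization, access control, SLOs and failure recovery implementable? | OTel instrumentation, evidence pipeline, replay SLO |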

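Mature organizations treat agentic process audit as part of the process automation product, not as an evidence patch that internal audit requests ad hoc just before launch.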


3. Learning Objectives

After completing this article you should be able to:

  1. Distinguish audit trail, observability, provenance, evidence chain and workflow replay.
  2. Define a process claim and decompose it into verifiable workflow, decision, control and outcome evidence.
  3. Design an event-sourced agent workflow covering intent, plan, action, observation, approval, policy, exception, output, feedback and incident learning.
  4. Write the plan/action/observation/approval/output event schema, specifying fields, versions, retention, redaction and access boundaries.
  5. Explain the difference between causality and chronology, so that a timeline is not mistaken for proof of causation.
  6. Design a workflow replay architecture that supports technical replay, business replay, control replay and incident replay.
  7. Build an audit query catalog that supports internal audit, process owners, risk, compliance, model risk and incident review.
  8. Design a risk-based sampling strategy that tests both control design effectiveness and operating effectiveness.
  9. Distinguish process conformance, justified exception, policy override and control failure.
  10. Apply these ideas to an AML investigation copilot, payment dispute assistant, KYC onboarding agent, collections hardship case agent, regulatory reporting narrative drafter and payment operations repair queue agent.

4. Core Concepts

4.1 Process Claim

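A process claim is a team's verifiable assertion about an agentic workflow. It cannot be a vague statement such as "agent improves operations"; it must be a statement that can be reviewed, tested and sampled.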

| Weak claim | Strong process claim |
|---|---|
| AML copilot helps analysts work faster | For AML alert types A and B, the copilot generates a sourced case timeline and draft narrative, but final disposition remains analyst-owned and every saved narrative has source-span evidence. |
| KYC agent automates onboarding | The KYC onboarding agent classifies documents, identifies missing evidence and drafts customer follow-up, but cannot reject an applicant without human review and an appeal path. |
| Payment repair agent fixes exceptions | The payment repair queue agent suggests repair actions and executes only low-risk reversible updates after a policy decision and dual-control approval. |

Components of a process claim:

```text
Business outcome
  + workflow scope
  + agent responsibility
  + human responsibility
  + policy boundary
  + evidence requirement
  + control and exception handling
  + outcome measurement
```

4.2 Replayable Trace

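A replayable trace is not simply a store of every raw prompt, answer and log. It is a minimal sufficient evidence set that, under authorized conditions, can reconstruct the key process facts:

  • Who initiated the request, and with what role and permissions.
  • How the user intent was interpreted, and whether it was ambiguous.
  • What plan the agent generated, and which steps were executed or skipped.
  • Which tools were called, and what evidence exists for their inputs, outputs, side effects and idempotency.
  • Which policy decisions allowed, blocked or escalated actions, or attached obligations.
  • Which human approvals, overrides, rejections or supplementary information occurred.
  • Which exceptions were identified, and what compensating actions were taken.
  • How the final output was generated, saved, sent or discarded.
  • How feedback, incident signals and improvement actions entered the learning loop.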

Replayable does not mean bit-for-bit deterministic. For LLMs and third-party models, exact reproduction of outputs often cannot be promised. A mature formulation is:

We can reconstruct the approved version set, inputs at the agreed evidence level, policy decisions, tool actions, approvals, outputs and observed outcomes, while documenting reproducibility limits.

4.3 Event-Sourced Agent Workflow

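An event-sourced agent workflow records agent behavior as a sequence of business events that cannot be arbitrarily rewritten. The current workflow state can be projected from these events, and incident reviews and process reviews can reconstruct what happened from them.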

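| Event family | Typical events | Review value |
|---|---|---|
| Intent | intent.received, intent.classified, intent.clarified | Shows the relationship between the user's need and the agent's interpretation |
| Plan | plan.generated, plan.reviewed, plan.revised, plan.approved | Shows why the agent took a particular path |
| Action | action.proposed, tool.invoked, tool.completed, tool.failed | Shows tool actions, parameters, results and side effects |
| Observation | observation.received, context.loaded, retrieval.completed | Shows the facts and evidence the agent saw |
| Policy | policy.evaluated, policy.blocked, policy.escalated | Shows that control points actually ran |
| Approval | approval.requested, approval.decided, approval.expired | Shows HITL and dual control |
| Exception | exception.detected, override.recorded, compensation.executed | Shows exception management and remediation |
| Output | output.drafted, output.finalized, output.delivered, output.discarded | Shows the origin of outputs in customer, regulatory or business records |
| Feedback | feedback.captured, qa.sampled, user.edited | Shows adoption, quality and learning signals |
| Incident | incident.signal.detected, incident.replay.started, learning.action.created | Shows incident review and the improvement loop |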
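
To make the projection idea concrete, here is a minimal Python sketch that rebuilds current workflow state by folding an append-only list of events. It assumes simplified event dicts that use the family names from the table without an ai. prefix. The WorkflowState fields, the apply/project helpers and the rule that a compensation event closes the exception it references are hypothetical illustrations, not a prescribed read model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class WorkflowState:
    """Hypothetical read model for one workflow instance, derived only from events."""
    status: str = "new"
    plan_version: Optional[str] = None
    approvals: List[str] = field(default_factory=list)
    tool_completions: int = 0
    open_exceptions: Set[str] = field(default_factory=set)


def apply(state: WorkflowState, event: dict) -> WorkflowState:
    """Fold one event into the projection; event types not modeled here are ignored."""
    etype = event["event_type"]
    data = event.get("data", {})
    if etype == "intent.classified":
        state.status = "intent_classified"
    elif etype == "plan.approved":
        state.plan_version = data["plan_version"]
        state.status = "plan_approved"
    elif etype == "approval.decided" and data.get("decision", "").startswith("approved"):
        state.approvals.append(data["approval_id"])
    elif etype == "tool.completed":
        state.tool_completions += 1
    elif etype == "exception.detected":
        state.open_exceptions.add(data["exception_id"])
    elif etype == "compensation.executed":
        # Simplifying assumption: a compensation event closes the exception it references.
        state.open_exceptions.discard(data["exception_id"])
    elif etype == "output.finalized":
        state.status = "output_finalized"
    return state


def project(events: List[dict]) -> WorkflowState:
    """Rebuild state from scratch; events are assumed to be in append (store) order."""
    state = WorkflowState()
    for event in events:
        state = apply(state, event)
    return state


events = [
    {"event_type": "intent.classified"},
    {"event_type": "plan.approved", "data": {"plan_version": "p2"}},
    {"event_type": "exception.detected", "data": {"exception_id": "ex1"}},
    {"event_type": "approval.decided", "data": {"approval_id": "a1", "decision": "approved_with_edit"}},
    {"event_type": "tool.completed"},
    {"event_type": "compensation.executed", "data": {"exception_id": "ex1"}},
    {"event_type": "output.finalized"},
]
print(project(events))
```

Because state is derived rather than stored, a reviewer can replay the same events into a different projection, such as a conformance view, without touching the original log.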

5. Agentic Process Audit Model

5.1 Audit Layers

The agentic process audit model can be divided into eight layers.

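| Layer | Core question | Evidence objects |
|---|---|---|
| 1. Intent layer | What business intent was triggered by a user, system or queue? | request record, intent classification, clarification prompt, user role |
| 2. Process layer | Which approved process, state machine and business rules does the intent map to? | workflow definition, state transition, process version, BPMN / state model |
| 3. Planning layer | How does the agent turn intent into a plan, and does the plan need approval? | plan candidate, plan rationale, risk tier, approval requirement |
| 4. Control layer | Which policy, entitlement, SoD, risk and compliance controls were executed? | policy decision, permission result, SoD check, obligations |
| 5. Action layer | What did the agent read, generate, write or submit? | tool contract, tool input hash, tool output pointer, side effect id |
| 6. Human layer | How did people review, approve, override, reject or supplement facts? | visible evidence set, decision reason, reviewer role, edit diff |
| 7. Output layer | Which outputs entered business records, customer communications or regulatory material? | output hash, citation map, delivery state, record id |
| 8. Learning layer | How do defects, feedback, incidents and sampling results change future behavior? | QA finding, incident replay report, eval case, control action |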

5.2 Process Claim to Evidence Chain

Every process claim must connect to an evidence chain:

```text
Process claim
  -> requirement
  -> control objective
  -> workflow event
  -> evidence object
  -> audit query
  -> sampling approach
  -> control result
  -> management action
```

Example: Payment dispute assistant

| Chain element | Example |
|---|---|
| Process claim | Assistant drafts dispute evidence packet but does not submit chargeback without maker-checker approval. |
| Requirement | All customer-impacting dispute submissions require reviewer approval based on visible evidence set. |
| Control objective | Prevent unsupported or unauthorized chargeback submissions. |
| Workflow event | approval.decided before tool.invoked for chargeback_submit. |
| Evidence object | approval record, visible evidence hash, tool input hash, policy decision id, side effect id. |
| Audit query | Show all chargeback submissions where approval is missing, expired or not tied to exact execution input. |
| Sampling approach | 100% automated exception query plus monthly sample of approved submissions. |
| Control result | Exceptions by reason, reviewer quality, stale approval count, remediation status. |
| Management action | Update tool gateway to block mismatched approval input hashes. |
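
The audit query in this chain can be sketched as a 100% scan over tool events. This is a minimal illustration over in-memory records. approved_input_hash is a hypothetical approval field standing in for "the exact execution input the approver authorized", and a production version would run against the event store rather than Python lists.

```python
from datetime import datetime
from typing import Dict, List


def _utc(ts: str) -> datetime:
    # Accept a trailing "Z" on Python versions whose fromisoformat() rejects it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def classify_submission(tool_event: dict, approvals: Dict[str, dict]) -> str:
    """Return 'ok' or the first reason a chargeback_submit action fails the control."""
    approval = approvals.get(tool_event.get("approval_id"))
    if approval is None:
        return "missing_approval"
    if not approval["decision"].startswith("approved"):
        return "approval_not_granted"
    if _utc(tool_event["occurred_at"]) > _utc(approval["expiry"]):
        return "expired_approval"
    # Hypothetical binding field: the approval must cover this exact input.
    if approval["approved_input_hash"] != tool_event["input_hash"]:
        return "input_hash_mismatch"
    return "ok"


def chargeback_exception_report(tool_events: List[dict],
                                approvals: Dict[str, dict]) -> Dict[str, List[str]]:
    """100% scan: action ids of every chargeback_submit that is not 'ok', grouped by reason."""
    report: Dict[str, List[str]] = {}
    for event in tool_events:
        if event["tool_name"] != "chargeback_submit":
            continue
        reason = classify_submission(event, approvals)
        if reason != "ok":
            report.setdefault(reason, []).append(event["action_id"])
    return report
```

The grouped output maps directly onto the "Control result" row: exception counts by reason feed the remediation status, while the monthly sample covers submissions that passed.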

6. Event Schema

6.1 Common Event Envelope

Agentic process events should share a stable envelope, even if each event type carries a domain-specific payload.

| Field | Purpose |
|---|---|
| event_id | globally unique event identifier |
| event_type | stable event name such as ai.plan.generated |
| schema_version | payload contract version |
| occurred_at | event time in UTC |
| recorded_at | collection time in UTC |
| trace_id | end-to-end workflow trace |
| workflow_id | business workflow instance |
| case_id_hash | privacy-preserving case reference |
| use_case_id | AI use case registry id |
| risk_tier | workflow risk classification |
| producer | service, gateway or application emitting the event |
| actor_type | user, agent, service, reviewer, policy engine |
| actor_id_hash | hashed or tokenized actor identity |
| data_class | classification for privacy, security and retention |
| retention_policy_id | retention rule applied to the event |
| redaction_profile | how sensitive content was minimized |
| prev_event_ids | causal predecessors, not merely the chronologically previous event |
| evidence_refs | pointers to controlled evidence objects |
| integrity_hash | tamper-evidence or content hash |
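
A collector-side envelope check might look like the sketch below. It assumes events arrive as Python dicts, treats every field in the table as mandatory, and accepts only ISO 8601 timestamps ending in Z. REQUIRED_ENVELOPE_FIELDS and validate_envelope are illustrative names, not part of any standard.

```python
from datetime import datetime
from typing import Dict, List

REQUIRED_ENVELOPE_FIELDS = (
    "event_id", "event_type", "schema_version", "occurred_at", "recorded_at",
    "trace_id", "workflow_id", "case_id_hash", "use_case_id", "risk_tier",
    "producer", "actor_type", "actor_id_hash", "data_class",
    "retention_policy_id", "redaction_profile", "prev_event_ids",
    "evidence_refs", "integrity_hash",
)


def _parse_utc(ts: str) -> datetime:
    # This sketch only accepts ISO 8601 timestamps ending in "Z" (UTC).
    if not isinstance(ts, str) or not ts.endswith("Z"):
        raise ValueError(f"not a UTC 'Z' timestamp: {ts!r}")
    return datetime.fromisoformat(ts[:-1] + "+00:00")


def validate_envelope(event: dict) -> List[str]:
    """Return a list of envelope problems; an empty list means the envelope passed."""
    problems = [f"missing:{name}" for name in REQUIRED_ENVELOPE_FIELDS if name not in event]
    parsed: Dict[str, datetime] = {}
    for name in ("occurred_at", "recorded_at"):
        if name in event:
            try:
                parsed[name] = _parse_utc(event[name])
            except ValueError:
                problems.append(f"not_utc:{name}")
    # Collection may lag the business event, but must not precede it.
    if len(parsed) == 2 and parsed["recorded_at"] < parsed["occurred_at"]:
        problems.append("recorded_before_occurred")
    for name in ("prev_event_ids", "evidence_refs"):
        if name in event and not isinstance(event[name], list):
            problems.append(f"not_a_list:{name}")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets the collector quarantine a malformed event with a complete defect record instead of silently dropping it.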

6.2 Plan / Action / Observation / Approval / Output Schema

| Schema | Required fields | Assurance use |
|---|---|---|
| Plan event | plan_id, plan_version, goal, steps, risk_assessment, requires_approval, rationale_hash, rejected_options | Shows why the agent intended to act and whether the plan crossed a risk boundary. |
| Action event | action_id, tool_name, tool_schema_version, action_type, input_hash, dry_run_result, policy_decision_id, approval_id, side_effect_id, idempotency_key | Shows tool use, write boundaries, approvals and recoverability. |
| Observation event | observation_id, source_system, source_version, fact_type, fact_hash, retrieval_refs, confidence_band, freshness | Shows what facts the agent used and whether they were current and authorized. |
| Policy event | policy_id, policy_version, decision, reason_code, obligations, input_attribute_hash, evaluation_mode | Shows allow, block, escalate, redact, approval required or restrict decisions. |
| Approval event | approval_id, request_reason, approver_role, decision, reason_code, visible_evidence_hash, approval_scope, expiry | Shows HITL, maker-checker and SoD evidence. |
| Exception event | exception_id, exception_type, detected_by, severity, justification, owner, compensating_control, expiry, closure_evidence | Shows whether exceptions are controlled or unmanaged drift. |
| Output event | output_id, output_type, output_hash, citation_map, safety_label, record_system, delivery_channel, final_status | Shows what was finalized, stored, delivered or suppressed. |
| Feedback event | feedback_id, feedback_type, accept_edit_reject, edit_diff_hash, reason_code, business_outcome_ref | Shows adoption, quality signals and business outcome evidence. |
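
Payload contracts can be enforced with a per-family required-field map. This sketch makes three assumptions: the family is the segment after an optional ai. prefix; tool.* events use the action contract (a hypothetical aliasing choice); and families with no contract here, such as intent, pass unchecked. It tests only for field presence.

```python
from typing import Dict, List, Tuple

REQUIRED_PAYLOAD_FIELDS: Dict[str, Tuple[str, ...]] = {
    "plan": ("plan_id", "plan_version", "goal", "steps", "risk_assessment",
             "requires_approval", "rationale_hash", "rejected_options"),
    "action": ("action_id", "tool_name", "tool_schema_version", "action_type", "input_hash",
               "dry_run_result", "policy_decision_id", "approval_id", "side_effect_id",
               "idempotency_key"),
    "observation": ("observation_id", "source_system", "source_version", "fact_type",
                    "fact_hash", "retrieval_refs", "confidence_band", "freshness"),
    "policy": ("policy_id", "policy_version", "decision", "reason_code", "obligations",
               "input_attribute_hash", "evaluation_mode"),
    "approval": ("approval_id", "request_reason", "approver_role", "decision", "reason_code",
                 "visible_evidence_hash", "approval_scope", "expiry"),
    "exception": ("exception_id", "exception_type", "detected_by", "severity", "justification",
                  "owner", "compensating_control", "expiry", "closure_evidence"),
    "output": ("output_id", "output_type", "output_hash", "citation_map", "safety_label",
               "record_system", "delivery_channel", "final_status"),
    "feedback": ("feedback_id", "feedback_type", "accept_edit_reject", "edit_diff_hash",
                 "reason_code", "business_outcome_ref"),
}

# Hypothetical aliasing: tool.* events carry the action payload contract.
FAMILY_ALIASES = {"tool": "action"}


def event_family(event_type: str) -> str:
    """Map 'ai.approval.decided' or 'approval.decided' to 'approval'."""
    parts = event_type.split(".")
    if parts[0] == "ai" and len(parts) > 1:
        parts = parts[1:]
    return FAMILY_ALIASES.get(parts[0], parts[0])


def missing_payload_fields(event: dict) -> List[str]:
    """Presence check only: an explicit null (e.g. approval_id when none was required) passes.
    Families without a contract in this map (e.g. intent) return no findings."""
    required = REQUIRED_PAYLOAD_FIELDS.get(event_family(event["event_type"]), ())
    payload = event.get("data", {})
    return [name for name in required if name not in payload]
```

Allowing explicit nulls keeps "no approval was required" distinguishable from "the approval field was never captured", which matters for the conformance queries later in this article.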

6.3 Example Event

```json
{
  "event_id": "evt_20260630_kyc_000184",
  "event_type": "ai.approval.decided",
  "schema_version": "1.0",
  "occurred_at": "2026-06-30T18:42:10Z",
  "recorded_at": "2026-06-30T18:42:11Z",
  "trace_id": "trc_kyc_onboarding_72f1",
  "workflow_id": "wf_kyc_case_review_9482",
  "case_id_hash": "sha256:1d4f...",
  "use_case_id": "kyc_onboarding_agent",
  "risk_tier": "high",
  "producer": "kyc-review-workbench",
  "actor_type": "reviewer",
  "actor_id_hash": "hash:user:9a27",
  "data_class": "restricted",
  "retention_policy_id": "ret_ai_high_business_record_7y",
  "redaction_profile": "pii-minimized-v3",
  "prev_event_ids": ["evt_20260630_kyc_000181", "evt_20260630_kyc_000183"],
  "evidence_refs": [
    "evidence://visible-set/vis_kyc_5521",
    "evidence://policy-decision/pol_kyc_8910",
    "evidence://tool-input/tool_kyc_followup_2322"
  ],
  "integrity_hash": "sha256:ab77...",
  "data": {
    "approval_id": "appr_kyc_7721",
    "request_reason": "customer_followup_message",
    "approver_role": "KYC_Senior_Reviewer",
    "decision": "approved_with_edit",
    "reason_code": "missing_ubo_evidence_clearer_language",
    "visible_evidence_hash": "sha256:392e...",
    "approval_scope": "draft_customer_followup_only",
    "expiry": "2026-07-01T18:42:10Z"
  }
}
```
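
One way to populate integrity_hash is to hash a canonical serialization of the event with the hash field itself excluded. The sketch below uses a simplified canonical form (sorted keys, compact separators, UTF-8). That form is an assumption; it is not the RFC 8785 JSON Canonicalization Scheme. Note that a hash stored next to the event only catches edits by someone who does not also recompute it. Stronger tamper evidence usually requires storing or chaining the hash separately, or signing it.

```python
import hashlib
import json


def canonical_bytes(event: dict) -> bytes:
    """Deterministic serialization of the event without its own integrity_hash.

    Simplified canonical form (keys sorted at every level, no insignificant whitespace);
    it is not a full implementation of RFC 8785.
    """
    body = {key: value for key, value in event.items() if key != "integrity_hash"}
    return json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def compute_integrity_hash(event: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(event)).hexdigest()


def verify_integrity(event: dict) -> bool:
    """True only if the stored hash matches a recomputation over the current content."""
    return event.get("integrity_hash") == compute_integrity_hash(event)


event = {"event_id": "evt_1", "event_type": "ai.approval.decided",
         "data": {"decision": "approved_with_edit", "approval_id": "appr_kyc_7721"}}
sealed = dict(event, integrity_hash=compute_integrity_hash(event))
print(verify_integrity(sealed))
```

Sorting keys makes the hash independent of producer-side field order. Any change to a value, such as flipping decision to rejected, makes verification fail.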

7. Causality vs Chronology

Workflow replay must not confuse chronological order with causal explanation.

| Chronology asks | Causality asks |
|---|---|
| What happened before what? | Which event caused, enabled, constrained or justified another event? |
| What was the next timestamp? | What policy, approval, observation or plan step was a required predecessor? |
| Did the tool call occur after approval? | Was this approval scoped to this exact tool input and still valid at execution time? |
| Did the output follow retrieval? | Did the output's material claims rely on retrieved evidence or unsupported model generation? |

Agentic workflow needs both:

  • Chronological timeline for human-readable incident review.
  • Causal graph for control testing, evidence lineage and independent challenge.

Example:

Approval A happened before Tool Call B.
That is chronology.

Approval A authorized input hash H1.
Tool Call B executed input hash H2.
H1 != H2.
That is causal control failure.

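A mature replay architecture should record prev_event_ids, causal_refs, policy_decision_id, approval_scope, input_hash and side_effect_id. These fields let reviewers distinguish "it happened in this order" from "it was correctly authorized".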
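
This distinction can be tested mechanically. The minimal sketch below assumes each event carries event_id, occurred_at, prev_event_ids and a data payload. authorized_input_hash on the approval is a hypothetical field representing H1. The sketch reports chronology, causal reachability and input binding as three separate results, following the Approval A / Tool Call B example.

```python
from typing import Dict, List, Set


def causal_ancestors(event_id: str, by_id: Dict[str, dict]) -> Set[str]:
    """Every event reachable backwards through prev_event_ids (timestamps are not used)."""
    seen: Set[str] = set()
    stack = list(by_id[event_id].get("prev_event_ids", []))
    while stack:
        current = stack.pop()
        # Unknown ids are skipped here; an incident replay should report them as evidence gaps.
        if current in seen or current not in by_id:
            continue
        seen.add(current)
        stack.extend(by_id[current].get("prev_event_ids", []))
    return seen


def check_authorization(tool_event_id: str, approval_event_id: str,
                        events: List[dict]) -> Dict[str, bool]:
    by_id = {e["event_id"]: e for e in events}
    tool, approval = by_id[tool_event_id], by_id[approval_event_id]
    return {
        # Chronology: same-format ISO 8601 UTC strings sort correctly as text.
        "approval_before_tool": approval["occurred_at"] < tool["occurred_at"],
        # Causality: the approval must be a declared predecessor of the tool call.
        "approval_is_causal_predecessor": approval_event_id in causal_ancestors(tool_event_id, by_id),
        # Binding: the authorized input (H1) must equal the executed input (H2).
        "input_hash_bound": approval["data"]["authorized_input_hash"] == tool["data"]["input_hash"],
    }


events = [
    {"event_id": "A", "occurred_at": "2026-06-30T10:00:00Z", "prev_event_ids": [],
     "data": {"authorized_input_hash": "H1"}},
    {"event_id": "P", "occurred_at": "2026-06-30T10:00:05Z", "prev_event_ids": ["A"], "data": {}},
    {"event_id": "B", "occurred_at": "2026-06-30T10:01:00Z", "prev_event_ids": ["P"],
     "data": {"input_hash": "H2"}},
]
print(check_authorization("B", "A", events))
```

The trace supports "correctly authorized" only when all three results are true. A true approval_before_tool on its own is just chronology.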


8. Workflow Replay Architecture

8.1 Logical Architecture

```text
Business workflow UI / queue / API
        |
AI agent orchestrator
  intent classifier | planner | policy engine | tool gateway | HITL workflow
        |
OpenTelemetry traces + agentic process events
        |
Evidence collection layer
  schema validation | redaction | hashing | retention tagging | integrity checks
        |
Event store + trace store + evidence lake
        |
Provenance graph
  entities | activities | agents | causal edges | version set
        |
Replay workbench
  timeline view | causal graph | evidence chain | policy decision view | approval view
        |
Audit query and assurance reporting
  process conformance | sampling | incident replay | control testing | process owner review
```

8.2 Core Components

| Component | Responsibility |
|---|---|
| Agent orchestrator | Emits trace context and events for intent, plan, tool call, observations, approvals and outputs. |
| Policy decision point | Produces versioned allow, block, escalate, approval-required and obligation decisions. |
| Tool gateway | Enforces authorization, idempotency, dry-run, side-effect capture, approval binding and kill switch. |
| HITL workflow service | Records visible evidence, reviewer decision, reason code, edit diff, expiry and SoD result. |
| Evidence collector | Validates event schema, applies redaction profile, computes hashes and assigns retention class. |
| Event store | Stores append-only domain events for workflow replay and conformance analysis. |
| Trace store | Stores spans, timing, cost, latency and dependency context for observability and drilldown. |
| Provenance graph | Links entities, activities and agents across prompt, context, RAG, tool, approval and output objects. |
| Replay workbench | Lets authorized reviewers reconstruct timelines, causal chains, version sets and evidence gaps. |
| Audit query layer | Provides parameterized queries for internal audit, risk, process owner and incident review. |
| Redaction and access layer | Controls who can see raw content, redacted content, hashes, pointers and aggregate metrics. |
| Retention and legal hold layer | Applies retention policy, deletion handling and evidence preservation for incidents or reviews. |

8.3 Replay Modes

| Replay mode | Purpose | Limits |
|---|---|---|
| Technical replay | Reconstruct model, prompt, RAG, policy, tool and runtime version set. | May not reproduce identical LLM output if provider behavior changed. |
| Business replay | Reconstruct workflow states, user actions, approvals, outputs and business record updates. | Requires business systems to retain record ids and state transitions. |
| Control replay | Re-evaluate whether required controls fired and were bound to correct inputs. | Does not itself prove control design is sufficient. |
| Incident replay | Reconstruct events in an incident window and connect root cause, impact and remediation. | Sensitive content may require restricted access and legal guidance. |
| Learning replay | Turn failures, edits, overrides and QA findings into eval cases and process improvements. | Must avoid using feedback data outside purpose and consent boundaries. |

8.4 Reproducibility Limits

For agentic AI, replay architecture must explicitly document reproducibility limits:

| Limit | Mitigation |
|---|---|
| LLM nondeterminism | Preserve prompt/config/model route/output hash and use replay to compare behavior, not guarantee exact output. |
| Third-party model version opacity | Capture vendor metadata, response headers, model alias resolution and contractually available version info. |
| External tool state changes | Store tool request/response hashes, side-effect ids, business record state and compensating action records. |
| Knowledge base drift | Preserve KB version, index version, retrieved chunk ids, effective dates and document lifecycle state. |
| Privacy redaction | Preserve enough hashed/pointer evidence to reconstruct under authorized conditions without broad raw content exposure. |

9. Audit Trail vs Observability

Audit trail and observability overlap but are not the same.

| Dimension | Audit trail | Observability |
|---|---|---|
| Primary question | Can we prove who did what, when, under which authority and evidence? | Can we understand system behavior, performance and failure modes? |
| Main users | Internal audit, risk, compliance, process owners, regulators, legal | SRE, platform, engineering, product, operations |
| Time horizon | Retention aligned to control, business record and regulatory needs | Operational trend and incident analysis horizon |
| Data shape | Events, approvals, version records, evidence objects, control results | Traces, metrics, logs, exemplars, dashboards |
| Quality bar | Completeness, integrity, non-repudiation, access control, chain of evidence | Coverage, latency, sampling, diagnostic usefulness |
| Failure mode | Cannot prove control occurred or output was authorized | Cannot diagnose outage, drift, cost spike or bad dependency |

Mature architecture connects both:

Observability shows that something happened and how the system behaved.
Audit trail shows whether the behavior was authorized, controlled, evidenced and reviewable.

For example, an OTel span may show tool.invoke completed in 250 ms. Audit evidence must additionally show:

  • Which tool contract and schema version.
  • Which user or agent authority was used.
  • Which policy decision allowed it.
  • Whether approval was required and completed.
  • What side effect occurred.
  • Whether the action matched approval scope.
  • How to reverse or compensate if needed.
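
The gap between a span and audit evidence can be checked by looking for the attributes these bullets require. In this sketch the audit.* attribute keys are hypothetical names chosen for illustration; they are not OpenTelemetry semantic conventions. The sketch also assumes span attributes are available as a flat dict.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute keys for illustration; these are NOT OpenTelemetry semantic conventions.
AUDIT_QUESTIONS: Dict[str, Tuple[str, ...]] = {
    "tool contract and schema version": ("audit.tool_name", "audit.tool_schema_version"),
    "authority used": ("audit.actor_authority",),
    "policy decision": ("audit.policy_decision_id",),
    "side effect": ("audit.side_effect_id",),
    "approval scope match": ("audit.approval_scope_match",),
    "reversal or compensation path": ("audit.compensation_ref",),
}


def audit_gaps(span_attributes: Dict[str, object]) -> List[str]:
    """List the audit questions that a span's attributes cannot answer."""
    gaps = [question for question, keys in AUDIT_QUESTIONS.items()
            if any(key not in span_attributes for key in keys)]
    # Approval evidence is conditional: first we must know whether approval was required.
    required = span_attributes.get("audit.approval_required")
    if required is None:
        gaps.append("approval requirement not recorded")
    elif required is True and "audit.approval_id" not in span_attributes:
        gaps.append("approval required but not evidenced")
    return gaps


timing_only_span = {"duration_ms": 250, "audit.tool_name": "payment_repair"}
print(audit_gaps(timing_only_span))
```

A span that only says "completed in 250 ms" fails nearly every question. That is the point of the section: observability data is necessary input for an audit trail, but it is not sufficient on its own.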

10. Evidence Chain

10.1 Evidence Chain Model

Agentic process evidence should be organized as a chain:

```text
Business outcome evidence
  <- process conformance evidence
  <- workflow replay evidence
  <- event evidence
  <- source system evidence
  <- version and control evidence
```

| Evidence type | Example | Assurance use |
|---|---|---|
| Business outcome evidence | alert aging reduction, dispute cycle time, onboarding completion, hardship treatment quality | Shows whether workflow produced intended value without unacceptable harm. |
| Process conformance evidence | state transitions, required approvals, SoD results, exception reasons | Shows whether workflow followed approved process or justified exception. |
| Workflow replay evidence | trace, event sequence, causal graph, version set | Supports reconstruction and independent challenge. |
| Event evidence | plan, action, observation, policy, approval, output, feedback events | Provides granular proof of actions and decisions. |
| Source system evidence | case management records, payment system state, KYC document store, regulatory report data | Anchors AI evidence to systems of record. |
| Version and control evidence | prompt, model, KB, policy, tool schema, release bundle | Shows which approved artifact set governed behavior. |

10.2 Business Outcome Evidence

Business outcome evidence prevents audit architecture from becoming purely technical. For financial retail examples:

| Use case | Outcome evidence | Control counterweight |
|---|---|---|
| AML investigation copilot | reduced alert aging, better narrative completeness, lower reopen rate | no unsupported SAR implication, analyst final disposition retained |
| Payment dispute assistant | faster evidence packet creation, fewer missing-document reworks | chargeback submission requires maker-checker approval |
| KYC onboarding agent | faster first-pass completion, lower document chase volume | no automated rejection, appeal route preserved |
| Collections hardship case agent | better hardship option matching, improved follow-up timeliness | vulnerable customer and fair treatment checks |
| Regulatory reporting narrative drafter | shorter variance explanation cycle, fewer reviewer corrections | no unsupported metric cause, source lineage visible |
| Payment operations repair queue agent | lower repair backlog and fewer duplicate repairs | dual control for irreversible or customer-impacting updates |

11. Exception and Override Handling

Exceptions are expected in real processes. The assurance question is whether they are explicit, justified, owned, time-bound and learned from.

11.1 Exception Types

| Type | Example | Required evidence |
|---|---|---|
| Business exception | KYC case requires non-standard document due to jurisdiction rule | policy citation, reason code, reviewer approval, customer communication record |
| Policy override | Payment dispute response needs supervisor approval despite low automated risk score | override reason, approver role, scope, expiry, evidence set |
| Tool exception | Payment repair API unavailable, manual repair queue used | incident link, manual action record, reconciliation evidence |
| Data exception | RAG source stale for one policy section | affected scope, compensating manual source check, expiry |
| Workflow exception | HITL reviewer queue exceeds SLA, case routed to backup team | capacity signal, escalation decision, customer impact analysis |
| Model exception | Agent confidence below threshold but reviewer proceeds after independent evidence check | confidence band, reviewer rationale, QA sample inclusion |

11.2 Good Override Record

| Field | Strong content |
|---|---|
| override_id | stable id tied to workflow and case |
| baseline rule | the rule, policy or process step being overridden |
| business reason | why standard route does not fit |
| risk impact | customer, financial, compliance, operational and reputational impact |
| approver role | role with authority and independence |
| visible evidence | what the approver saw at decision time |
| compensating control | extra review, sampling, reconciliation, customer notice or temporary limit |
| expiry or closure trigger | when override ends or must be reviewed |
| learning path | whether this becomes process update, eval case, policy clarification or training |
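
A completeness check for this record is straightforward. The sketch below assumes snake_case versions of the field names shown; visible_evidence_ref is a hypothetical pointer field for "visible evidence". An empty gap list means the record is complete. It does not mean the override was justified, which remains a reviewer judgment.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class OverrideRecord:
    """Hypothetical override record; every field after override_id must be filled in."""
    override_id: str
    baseline_rule: str = ""
    business_reason: str = ""
    risk_impact: str = ""
    approver_role: str = ""
    visible_evidence_ref: str = ""
    compensating_control: str = ""
    expiry_or_closure_trigger: str = ""
    learning_path: str = ""


def override_gaps(record: OverrideRecord) -> List[str]:
    """Names of empty fields; [] means complete (not necessarily justified)."""
    return [f.name for f in fields(record) if not str(getattr(record, f.name)).strip()]


weak = OverrideRecord("ovr_001", approver_role="Supervisor")  # "Supervisor approved exception."
print(override_gaps(weak))
```

The weak override in the text fills only the approver, so seven fields come back as gaps. The strong override maps onto all nine fields.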

Weak override:

Supervisor approved exception.

Strong override:

Supervisor approved hardship option deviation for case class H2 because customer submitted verified disaster impact documentation not covered by standard script. AI recommendation was restricted to draft language. Final treatment selected by hardship specialist. Case added to monthly vulnerable-customer QA sample and policy team review.

12. Segregation of Duties

Agentic AI can blur roles because the same platform may propose, execute, document and monitor a process. Segregation of duties must be explicit.

| Risk | Control design |
|---|---|
| Agent proposes and executes high-impact action without independent review | Tool gateway enforces approval for write or customer-impacting actions. |
| Same user requests and approves a payment repair | SoD check blocks self-approval and routes to independent reviewer. |
| Developer changes prompt and approves production release | Release workflow separates change author, reviewer and production approver. |
| Model owner validates their own control effectiveness | Independent challenge by risk, model risk, QA or internal audit partner. |
| Operations team suppresses incident evidence | Incident evidence preservation controlled by platform/security/legal process. |

SoD event evidence should include:

  • requester_role
  • approver_role
  • relationship_check_result
  • same_user_blocked
  • delegated_authority_source
  • approval_scope
  • independent_challenge_required
  • break_glass_reason
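
The SoD fields above could be produced by a check like the following. This is a sketch under stated assumptions. Identities are already tokenized, and allowed_approver_roles comes from a hypothetical authority matrix. Whether break-glass may bypass a failed check is a policy decision for the organization, so here break-glass (like delegation) only forces independent challenge.

```python
from typing import Dict, Optional, Set


def sod_check(requester_id: str, requester_role: str, approver_id: str, approver_role: str,
              allowed_approver_roles: Set[str], approval_scope: str,
              delegated_authority_source: Optional[str] = None,
              break_glass_reason: Optional[str] = None) -> Dict[str, object]:
    """Evaluate one maker-checker pair and return the SoD evidence fields."""
    same_user = requester_id == approver_id
    # Authority comes either from the role matrix or from a recorded delegation.
    role_ok = approver_role in allowed_approver_roles or delegated_authority_source is not None
    passed = (not same_user) and role_ok
    return {
        "requester_role": requester_role,
        "approver_role": approver_role,
        "relationship_check_result": "pass" if passed else "fail",
        "same_user_blocked": same_user,
        "delegated_authority_source": delegated_authority_source,
        "approval_scope": approval_scope,
        # Delegation and break-glass are recorded and always trigger independent challenge.
        "independent_challenge_required": (break_glass_reason is not None
                                           or delegated_authority_source is not None),
        "break_glass_reason": break_glass_reason,
    }


print(sod_check("tok_u1", "ops_analyst", "tok_u1", "ops_supervisor", {"ops_supervisor"}, "repair_123"))
```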

Independent challenge does not mean every event is manually reviewed. It means the evidence architecture allows authorized second-line, third-line, QA or process-owner reviewers to test whether controls operated as designed.


13. Sampling and Testing Approach

13.1 Population Definition

Sampling begins with a population definition. Without it, teams cherry-pick good examples.

| Population | Example |
|---|---|
| All high-risk workflow instances | All KYC onboarding agent cases where customer follow-up was drafted in June 2026. |
| All customer-impacting tool actions | All payment repair queue updates that changed customer-visible status. |
| All overrides | All collections hardship cases where AI recommendation was overridden. |
| All policy blocks | All regulatory narrative drafts blocked for unsupported claim. |
| All incidents or near misses | All AML copilot summaries flagged by QA as material evidence omission. |

13.2 Testing Types

| Test type | Purpose | Example |
|---|---|---|
| Design effectiveness test | Determine whether the control, if operated, would address the risk. | Does tool gateway approval binding prevent mismatched execution input? |
| Operating effectiveness test | Determine whether the control actually operated across samples. | Sample approved chargeback submissions and verify approval hash equals tool input hash. |
| Automated exception query | 100% scan for impossible or prohibited patterns. | Find tool write actions with no policy decision or expired approval. |
| Process conformance test | Compare actual event sequence to approved workflow model. | KYC follow-up must have intent, document observation, policy check, draft, review and output. |
| Outcome reasonableness test | Compare process result with business outcome and control counterweight. | Dispute cycle time improved without higher rework or complaint rate. |
| Replay drill | Reconstruct one case end-to-end under access controls. | Rebuild AML case timeline from trace, events, source records and approvals. |

13.3 Sampling Strategy

Risk-based sampling should combine:

  • 100% automated queries for missing mandatory events.
  • Attribute sampling for required control evidence.
  • Judgmental samples for high-risk, high-value, high-complexity or complaint-linked cases.
  • Stratified samples by channel, segment, language, geography, model route and reviewer.
  • Incident-driven samples from near misses, overrides and QA failures.
  • Negative samples where agent refused, escalated or blocked action.

Sampling should record:

| Field | Meaning |
|---|---|
| population_id | stable query or dataset defining universe |
| sample_method | random, stratified, risk-based, judgmental, exception query |
| sample_period | time window |
| sample_size | count and rationale |
| test_objective | design, operating, conformance, outcome or incident replay |
| pass_criteria | precise evidence expectation |
| exception_classification | justified exception, documentation defect, control failure, process drift |
| remediation_owner | process, product, architecture, operations or control owner |
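
A reproducible stratified draw that emits part of this record can be sketched as follows. The population shape (case_ref plus a stratum attribute), the per-stratum size and the seed are illustrative assumptions. Recording the seed together with population_id lets an independent reviewer redraw the identical sample.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(population: List[dict], stratum_key: str,
                      per_stratum: int, seed: int) -> List[dict]:
    """Draw up to per_stratum items from each stratum, reproducibly for a given seed."""
    strata: Dict[str, List[dict]] = defaultdict(list)
    for item in population:
        strata[item[stratum_key]].append(item)
    rng = random.Random(seed)  # a recorded seed lets a reviewer redraw the same sample
    sample: List[dict] = []
    for name in sorted(strata):  # fixed stratum order keeps the draw deterministic
        members = sorted(strata[name], key=lambda m: m["case_ref"])
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def sampling_record(population_id: str, population: List[dict], stratum_key: str,
                    per_stratum: int, seed: int, period: str,
                    objective: str, pass_criteria: str) -> dict:
    """Bundle the drawn cases with the sampling fields from the table."""
    sample = stratified_sample(population, stratum_key, per_stratum, seed)
    return {
        "population_id": population_id,
        "sample_method": f"stratified by {stratum_key}, seed={seed}",
        "sample_period": period,
        "sample_size": {"count": len(sample),
                        "rationale": f"up to {per_stratum} per {stratum_key} stratum"},
        "test_objective": objective,
        "pass_criteria": pass_criteria,
        "sampled_case_refs": [item["case_ref"] for item in sample],
    }
```

Sorting strata and members before drawing makes the result independent of how the population query happened to order its rows. Without that, the same seed would not guarantee the same sample.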

14. Process Conformance

Process conformance asks whether actual workflow execution matches the approved process model.

Approved model:
intent -> plan -> policy -> tool dry-run -> approval -> tool execution -> output -> feedback

Actual trace:
intent -> plan -> tool execution -> output

Conformance result:
non-conformant because policy and approval events are missing before customer-impacting tool execution.
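
The conformance result above can be derived with a simple predecessor check. This sketch uses a hypothetical REQUIRED_BEFORE rule set and only tests that gated steps were preceded by their required steps. As a result, it does not flag omitted trailing steps such as feedback. Nor can it, without exception records, tell a justified exception from a control failure.

```python
from typing import Dict, List, Set

# Hypothetical rule set: each gated step lists steps that must already have occurred.
REQUIRED_BEFORE: Dict[str, Set[str]] = {
    "plan": {"intent"},
    "tool_execution": {"policy", "tool_dry_run", "approval"},
    "output": {"plan"},
}


def conformance_findings(actual: List[str],
                         rules: Dict[str, Set[str]] = REQUIRED_BEFORE) -> List[str]:
    """Report each step that ran before its required predecessors; [] means no finding."""
    findings: List[str] = []
    seen: Set[str] = set()
    for step in actual:
        missing = rules.get(step, set()) - seen
        if missing:
            findings.append(f"{step} without preceding {', '.join(sorted(missing))}")
        seen.add(step)
    return findings


approved = ["intent", "plan", "policy", "tool_dry_run", "approval", "tool_execution", "output", "feedback"]
actual = ["intent", "plan", "tool_execution", "output"]
print(conformance_findings(actual))
```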

14.1 Conformance vs Justified Exception

| Category | Meaning | Example |
|---|---|---|
| Conformant | Actual event path follows approved process. | KYC agent drafted follow-up only after document evidence and reviewer approval. |
| Justified exception | Process deviated but with authorized reason, owner and compensating control. | Backup reviewer approved due to outage under documented continuity procedure. |
| Control failure | Required control absent, expired, mismatched or bypassed. | Payment repair tool executed without valid approval. |
| Process drift | Repeated deviations show the real process has changed without approval. | Reviewers routinely skip citation check because UI makes it difficult. |
| Model of process wrong | Approved process model omits legitimate operational path. | AML escalation path for multi-jurisdiction cases not modeled. |

Mature review does not punish every deviation. It classifies deviations and updates process, controls, training or tooling based on evidence.

14.2 Audit Queries for Conformance

| Query | Purpose |
|---|---|
| Show all tool write actions without preceding policy decision. | Detect bypassed control. |
| Show all approvals where visible evidence hash is missing. | Detect weak HITL evidence. |
| Show all output deliveries where output hash differs from approved draft hash. | Detect post-approval mutation. |
| Show all overrides by reviewer, reason and case type. | Detect concentration, training need or process ambiguity. |
| Show all workflows where exception expiry passed without closure. | Detect unmanaged residual risk. |
| Show all regulatory narrative drafts with unsupported material claims. | Detect output evidence failure. |
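
For example, the exception-expiry query can be sketched over exception events. The record shape (exception_id, workflow_id, expiry, closure_evidence) follows the exception schema. The as_of parameter and the no_expiry_recorded finding are illustrative assumptions.

```python
from datetime import datetime
from typing import List


def _utc(ts: str) -> datetime:
    # Accept a trailing "Z" on Python versions whose fromisoformat() rejects it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def expired_unclosed_exceptions(exceptions: List[dict], as_of: str) -> List[dict]:
    """Exceptions without closure evidence whose expiry is missing or already past as_of."""
    cutoff = _utc(as_of)
    rows: List[dict] = []
    for ex in exceptions:
        if ex.get("closure_evidence"):
            continue  # closed with evidence: out of scope for this query
        if ex.get("expiry") is None:
            rows.append({**ex, "finding": "no_expiry_recorded"})
        elif _utc(ex["expiry"]) < cutoff:
            days = (cutoff - _utc(ex["expiry"])).days
            rows.append({**ex, "finding": "expired_without_closure", "days_past_expiry": days})
    return sorted(rows, key=lambda row: row["exception_id"])
```

Treating a missing expiry as a finding, rather than as "never expires", reflects the override guidance above: every exception needs an expiry or closure trigger.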

15. Incident Replay

Incident replay reconstructs what happened, why it happened, who or what allowed it, what impact occurred and what changed afterward.

15.1 Incident Replay Packet

| Section | Contents |
|---|---|
| Incident scope | incident id, use case, workflow, time window, affected cases, severity |
| Version set | model, prompt, RAG index, policy, tool schema, release bundle, feature flags |
| Timeline | chronological events and spans |
| Causal graph | required predecessors, policy decisions, approvals, tool actions, outputs |
| Evidence gaps | missing spans, missing event fields, inaccessible source records, redaction limits |
| Customer or business impact | affected customers, cases, funds, reports, timelines, operational backlog |
| Control analysis | which controls worked, failed, were bypassed or were absent |
| Exception analysis | whether deviations were justified, expired or unmanaged |
| Remediation | rollback, compensation, customer action, policy update, prompt update, tool restriction |
| Learning loop | eval cases, regression tests, training, process model update, control improvement |
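The timeline and evidence-gap sections are the most mechanical parts of the packet. Below is a minimal sketch. It assumes events are dictionaries with an ISO-8601 UTC `ts` field in one fixed-width format, and that a hypothetical `REQUIRED_FIELDS` table lists the minimum fields per event type. Impact, control and exception analysis remain human judgment.

```python
from dataclasses import dataclass, field

# Hypothetical minimum fields per event type; a real event dictionary defines these.
REQUIRED_FIELDS = {
    "policy_decision": ("trace_id", "policy_version", "decision"),
    "approval": ("trace_id", "approver_id", "visible_evidence_hash", "scope"),
    "tool_execution": ("trace_id", "input_hash", "side_effect_id", "version_set_id"),
}


@dataclass
class ReplayPacket:
    incident_id: str
    timeline: list = field(default_factory=list)
    evidence_gaps: list = field(default_factory=list)


def build_packet(incident_id, events, window_start, window_end):
    """Assemble timeline and evidence-gap sections for one incident window.

    Timestamps share one fixed-width ISO-8601 UTC format, so string order is time order.
    """
    packet = ReplayPacket(incident_id)
    in_window = [e for e in events if window_start <= e["ts"] <= window_end]
    for event in sorted(in_window, key=lambda e: e["ts"]):
        packet.timeline.append((event["ts"], event["type"], event.get("case_id")))
        for name in REQUIRED_FIELDS.get(event["type"], ()):
            if not event.get(name):
                packet.evidence_gaps.append(f"{event['ts']} {event['type']}: missing {name}")
    return packet


events = [
    {"ts": "2026-06-01T10:00:02Z", "type": "tool_execution", "case_id": "c-7",
     "trace_id": "t1", "input_hash": "h1", "side_effect_id": None, "version_set_id": "vs-12"},
    {"ts": "2026-06-01T10:00:00Z", "type": "policy_decision", "case_id": "c-7",
     "trace_id": "t1", "policy_version": "p-3", "decision": "allow"},
    {"ts": "2026-06-02T09:00:00Z", "type": "approval", "case_id": "c-9", "trace_id": "t2"},
]
packet = build_packet("INC-001", events, "2026-06-01T00:00:00Z", "2026-06-01T23:59:59Z")
print(packet.timeline)
print(packet.evidence_gaps)  # ['2026-06-01T10:00:02Z tool_execution: missing side_effect_id']
```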

15.2 Example: Payment Operations Repair Queue Agent

Incident signal:

Duplicate repair actions increased after a workflow release.

Replay findings:

| Finding | Evidence |
|---|---|
| Duplicate side effects occurred in 18 cases | tool side_effect_id and payment system state transitions |
| Idempotency key was generated from case id only, not repair action id | tool gateway event schema and code release notes |
| Approval existed but was scoped to first repair attempt | approval scope and expiry |
| Retry path bypassed dry-run after timeout | trace timeline and causal graph |
| Customer-visible balances were corrected through compensating entries | compensating action records and reconciliation report |
| Regression test added | eval/control test case for retry idempotency |

The assurance conclusion should be framed carefully:

The replay packet supports process owner, risk and internal audit review of the incident evidence. It does not by itself constitute audit sign-off or regulatory closure.
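The remediation in the findings table (a key scoped to the repair action, backed by a retry regression test) can be sketched as follows. The sketch assumes a hypothetical in-memory `RepairToolGateway` and SHA-256 keys over case id plus repair action id. It illustrates the fix, not the payment system's actual gateway or the full incident mechanism.

```python
import hashlib


def idempotency_key(case_id: str, repair_action_id: str) -> str:
    # Scope the key to the logical repair action: retries of one action share a
    # key, while distinct actions on the same case get different keys.
    return hashlib.sha256(f"{case_id}:{repair_action_id}".encode("utf-8")).hexdigest()


class RepairToolGateway:
    """Hypothetical gateway that records at most one side effect per key."""

    def __init__(self) -> None:
        self._side_effects: dict[str, str] = {}
        self.executions = 0

    def execute_repair(self, case_id: str, repair_action_id: str) -> str:
        key = idempotency_key(case_id, repair_action_id)
        if key in self._side_effects:
            # Replayed request (e.g. retry after timeout): return the recorded
            # side effect instead of executing again.
            return self._side_effects[key]
        self.executions += 1
        side_effect_id = f"se-{self.executions}"
        self._side_effects[key] = side_effect_id
        return side_effect_id


gateway = RepairToolGateway()
first = gateway.execute_repair("case-42", "repair-1")
retry = gateway.execute_repair("case-42", "repair-1")  # retry after timeout
other = gateway.execute_repair("case-42", "repair-2")  # separate repair action
print(first, retry, other, gateway.executions)  # se-1 se-1 se-2 2
```

In a real gateway, the key-to-side-effect record would need to live in durable storage shared by all retry paths. That includes any path that skips the dry-run.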

16. Financial Retail Examples

16.1 AML Investigation Copilot

| Audit focus | Replay evidence |
|---|---|
| Analyst final accountability | final disposition event, analyst approval, no auto-SAR submission |
| Evidence completeness | transaction refs, KYC refs, adverse media refs, source-span map |
| Narrative quality | draft narrative, edit diff, QA sample, reopen reason |
| Sensitive data boundary | redaction profile, access log, SAR-related data retention class |
| Incident learning | omitted-evidence QA finding converted into regression case |

16.2 Payment Dispute Assistant

| Audit focus | Replay evidence |
|---|---|
| Chargeback submission authority | maker-checker approval, tool input hash, policy decision |
| Evidence packet support | transaction timeline, merchant evidence, network rule source |
| Customer communication | approved letter output hash, delivery channel, complaint link |
| Exceptions | provisional credit exception, supervisor reason, expiry |
| Outcome evidence | cycle time, rework, win/loss rate, complaint trend |
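The chargeback submission authority row depends on segregation of duties being enforced before submission, not merely recorded afterward. Below is a minimal sketch. It assumes hypothetical identifiers for human users and agent service principals, plus a hypothetical `dispute_checker` role. Input hash and policy decision binding would be separate checks.

```python
from dataclasses import dataclass

CHECKER_ROLE = "dispute_checker"                # hypothetical role name
AGENT_PRINCIPALS = {"agent:dispute-assistant"}  # hypothetical agent identities


@dataclass(frozen=True)
class SubmissionApproval:
    maker_id: str
    checker_id: str
    checker_roles: frozenset


def maker_checker_violations(approval: SubmissionApproval) -> list[str]:
    """List segregation-of-duties problems before a chargeback submission."""
    violations = []
    if approval.checker_id == approval.maker_id:
        violations.append("checker is the maker")
    if approval.checker_id in AGENT_PRINCIPALS:
        violations.append("checker is an agent principal")
    if CHECKER_ROLE not in approval.checker_roles:
        violations.append(f"checker lacks {CHECKER_ROLE} role")
    return violations


print(maker_checker_violations(
    SubmissionApproval("user:ana", "user:raj", frozenset({"dispute_checker"}))
))  # []
```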

16.3 KYC Onboarding Agent

| Audit focus | Replay evidence |
|---|---|
| No automated rejection | output status, reviewer decision, appeal route evidence |
| Missing evidence detection | document observation events, confidence band, source pointer |
| Customer follow-up | draft, reviewer edit, approved final message |
| High-risk jurisdiction | policy decision, escalation obligation, senior review |
| Process conformance | onboarding state transitions and exception path |

16.4 Collections Hardship Case Agent

| Audit focus | Replay evidence |
|---|---|
| Fair treatment | vulnerability flag handling, policy decision, human specialist approval |
| Option recommendation | hardship facts, source policy, plan rationale |
| Override quality | override reason codes, supervisor sample, customer outcome |
| Communications | approved message hash and record id |
| Learning loop | complaints and QA findings converted to training or policy clarification |

16.5 Regulatory Reporting Narrative Drafter

| Audit focus | Replay evidence |
|---|---|
| Source lineage | metric id, data source, report period, transformation refs |
| Unsupported claim prevention | citation map, policy block for hallucinated cause |
| Maker-checker | reviewer approval, visible evidence, edit diff |
| Attestation boundary | AI draft marked as draft, authorized signer retained |
| Retention | report pack evidence aligned to regulatory record policy |
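The citation map row can be checked before a draft reaches maker-checker review. Below is a minimal sketch. It assumes each drafted claim is tagged as material or not and carries the metric ids it cites. `Claim`, the metric ids and the rule that every material claim must cite an approved metric are illustrative assumptions. The check catches an uncited causal explanation, but it cannot judge whether a cited metric actually supports the wording.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    material: bool
    cited_metric_ids: list = field(default_factory=list)


def unsupported_claims(claims, approved_metric_ids):
    """Return (claim text, reason) for material claims the citation map does not support."""
    flagged = []
    for claim in claims:
        if not claim.material:
            continue
        if not claim.cited_metric_ids:
            flagged.append((claim.text, "no citation"))
            continue
        unknown = [m for m in claim.cited_metric_ids if m not in approved_metric_ids]
        if unknown:
            flagged.append((claim.text, "unapproved metric: " + ", ".join(unknown)))
    return flagged


# Illustrative draft content, not real report data.
draft = [
    Claim("Card fraud losses rose quarter on quarter.", True, ["M-FRAUD-LOSS-Q2"]),
    Claim("The rise was driven by a new merchant category.", True),  # causal claim, no source
    Claim("This section summarizes Q2 movements.", False),
]
print(unsupported_claims(draft, {"M-FRAUD-LOSS-Q2"}))
```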

16.6 Payment Operations Repair Queue Agent

| Audit focus | Replay evidence |
|---|---|
| Repair action authorization | tool risk tier, policy decision, dual control |
| Reversibility | side effect id, idempotency key, compensating action |
| Queue prioritization | plan rationale, SLA, customer impact |
| Exception handling | manual repair route, reconciliation evidence |
| Outcome evidence | backlog, duplicate repair rate, settlement break trend |

17. Operating Model

17.1 RACI

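| Activity | AI PM | AI Architect | BA | Process Owner | Risk / Compliance | Internal Audit Partner | Platform / SRE |
|---|---|---|---|---|---|---|---|
| Process claim definition | A | C | R | A | C | C | I |
| Event schema | C | A | R | C | C | C | R |
| Replay architecture | C | A | C | C | C | C | R |
| Control-to-evidence matrix | C | C | R | A | R | C | I |
| Audit query catalog | C | R | R | C | C | A/C | C |
| Sampling strategy | C | C | C | A | R | C | I |
| Exception register | R | C | R | A | C | C | I |
| Incident replay | R | R | C | A | A/C | C | R |
| Learning loop | A | R | R | A | C | I | C |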

R = Responsible, A = Accountable, C = Consulted, I = Informed.

17.2 Cadence

| Forum | Cadence | Main question | Output |
|---|---|---|---|
| Workflow evidence design review | Before pilot and major release | Are process claims, events, controls and replay needs defined? | evidence contract and release gate |
| Process conformance review | Monthly or risk-based | Do actual traces match the approved process? | conformance report and action log |
| Exception and override review | Weekly for high-risk workflows | Are exceptions justified, aging and closing? | exception register update |
| Incident replay review | Triggered by incident or near miss | What happened, why, with what impact, and what was learned? | replay packet and remediation |
| Assurance management review | Quarterly | Are controls, outcomes and evidence architecture improving? | management action and roadmap |

18. Anti-Patterns

| Anti-pattern | Why it fails | Mature replacement |
|---|---|---|
| Final answer as audit evidence | It hides intent, plan, tool calls, approvals and policy decisions. | Replayable trace with event-sourced workflow evidence. |
| Logging everything raw | Creates privacy, security and retention risk without better assurance. | Minimum sufficient evidence with redaction, hash, pointer and controlled raw access. |
| Chronology treated as causality | "Approval happened before action" does not prove action was approved. | Causal links through input hash, approval scope and policy decision. |
| HITL recorded as yes/no | Review cannot determine what evidence the human saw. | Visible evidence set, decision reason, edit diff and expiry. |
| Exceptions hidden in comments | Cannot distinguish justified business exception from control failure. | Structured exception record with owner, expiry and compensating control. |
| Audit query afterthought | Evidence exists but cannot answer real review questions. | Audit query catalog designed during requirements and architecture. |
| Same team self-certifies all controls | Lacks independent challenge and SoD. | Separate author, approver, reviewer and process owner roles based on risk. |
| Replay promises exact LLM reproduction | Overclaims determinism and ignores vendor/model drift. | Document reproducibility limits and preserve version set plus output hashes. |
| Sampling only successful cases | Misses near misses, blocks, overrides and failures. | Risk-based samples covering negative paths and exceptions. |
| Outcome metrics without control counterweights | Speed gains may hide customer harm or control erosion. | Pair business outcome evidence with quality, risk and conformance evidence. |
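The "minimum sufficient evidence" replacement can be sketched as a transform applied before an event is written to the shared evidence store. Below is a minimal sketch. It assumes a hypothetical field-level redaction profile and a hypothetical `evidence-lake://` pointer scheme. The SHA-256 over canonical JSON lets a reviewer with controlled raw access confirm that the pointer still refers to the same content, without copying sensitive values into logs.

```python
import hashlib
import json

SENSITIVE_FIELDS = {"customer_name", "account_number"}  # hypothetical redaction profile


def to_evidence_record(raw_event: dict, raw_pointer: str) -> dict:
    """Keep redacted content plus a hash and a pointer to the access-controlled raw copy."""
    canonical = json.dumps(raw_event, sort_keys=True, separators=(",", ":"))
    return {
        "payload": {
            key: "[REDACTED]" if key in SENSITIVE_FIELDS else value
            for key, value in raw_event.items()
        },
        "raw_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "raw_pointer": raw_pointer,  # resolved only through controlled raw access
    }


raw = {"case_id": "disp-88", "customer_name": "A. Customer",
       "account_number": "12345678", "decision": "draft_ready"}
record = to_evidence_record(raw, "evidence-lake://disputes/disp-88/raw/1")
print(record["payload"])
```

A plain hash of low-entropy values can be brute-forced. Where raw values are guessable, a keyed hash such as HMAC may be more appropriate.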

19. PM / BA / Architect Implications

19.1 For Senior AI PM

  • Define process claims before building automation.
  • Treat workflow replay as a product capability, not just internal logging.
  • Pair value metrics with control and customer outcome evidence.
  • Require release gates for evidence coverage, replay readiness and exception handling.
  • Write scale recommendations that show uncertainty, residual risk owner and monitoring triggers.

19.2 For CBAP-level BA

  • Model agentic workflow as state, event, decision and evidence objects.
  • Translate stakeholder needs into process claims, acceptance criteria and audit queries.
  • Make exception paths first-class requirements.
  • Distinguish conformance, justified exception, process drift and control failure.
  • Ensure every material process claim has evidence, owner and test approach.

19.3 For AI Architect

  • Design trace context, event schema and provenance graph early.
  • Use event-sourced workflow where replay and conformance matter.
  • Bind approvals to exact action inputs and side effects.
  • Record version sets for model, prompt, RAG, policy, tool schema and workflow.
  • Build redaction, retention and access control into the evidence architecture.
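The core of event-sourced workflow replay is small. Below is a minimal sketch. It assumes hypothetical event types and a `seq` field assigned by the event store. State is never stored directly. Instead, it is rebuilt by folding the append-only log, so a reviewer can ask what the workflow state was at any earlier step.

```python
from dataclasses import dataclass, field


@dataclass
class CaseState:
    status: str = "opened"
    approvals: list = field(default_factory=list)
    output_hashes: list = field(default_factory=list)


def apply(state: CaseState, event: dict) -> CaseState:
    """Fold one event into the case state; unknown event types leave it unchanged."""
    kind = event["type"]
    if kind == "plan_created":
        state.status = "planned"
    elif kind == "approval_granted":
        state.approvals.append(event["approval_id"])
        state.status = "approved"
    elif kind == "tool_executed":
        state.status = "executed"
    elif kind == "output_delivered":
        state.output_hashes.append(event["output_hash"])
        state.status = "closed"
    return state


def replay(events: list, upto_seq=None) -> CaseState:
    """Rebuild state from the append-only log, optionally as of an earlier step."""
    state = CaseState()
    for event in sorted(events, key=lambda e: e["seq"]):
        if upto_seq is not None and event["seq"] > upto_seq:
            break
        state = apply(state, event)
    return state


log = [
    {"seq": 1, "type": "plan_created"},
    {"seq": 2, "type": "approval_granted", "approval_id": "apr-5"},
    {"seq": 3, "type": "tool_executed"},
    {"seq": 4, "type": "output_delivered", "output_hash": "sha256:ab12"},
]
print(replay(log, upto_seq=2).status, replay(log).status)  # approved closed
```

A production event store would also need immutability guarantees, event schema versioning and access control. This sketch shows only the fold.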

19.4 For Internal Audit Partner

  • Challenge whether process claims are testable.
  • Review whether evidence is complete, reliable and independently queryable.
  • Help shape sampling strategy and audit query catalog without owning management controls.
  • Distinguish evidence sufficiency for review from formal audit conclusions.

20. Interview Answers

Q1: How do you make an agentic AI workflow auditable without slowing it down?

30-second version:

I would design auditability into the workflow runtime. Every agentic workflow should emit structured events for intent, plan, observations, policy decisions, tool calls, human approvals, exceptions, outputs and feedback. Those events connect to traces, source records and a provenance graph, so process owners and audit partners can replay cases, test controls and sample exceptions without asking teams to manually reconstruct evidence.

2-minute version:

I start from process claims. For example, a payment dispute assistant may draft evidence packets but cannot submit a chargeback without maker-checker approval. That claim becomes requirements, controls, events and audit queries. The architecture emits a replayable trace: intent classification, plan, RAG evidence, policy decision, tool dry-run, approval with the evidence visible to the approver, tool execution, output and feedback.

To avoid slowing teams, evidence is generated as work happens. The tool gateway enforces approval binding, idempotency and side-effect capture. The HITL workflow records visible evidence and reason codes. OpenTelemetry traces provide operational context, while the event store and provenance graph provide process and causal evidence. Internal audit or process owners can then run queries such as "show all customer-impacting tool actions without valid approval" and sample cases by risk. The result is not automatic audit sign-off, but a stronger evidence architecture for review and assurance.

Q2: What is the difference between observability and audit trail for agentic AI?

30-second version:

Observability explains system behavior: latency, errors, traces, costs and dependencies. Audit trail proves authority and evidence: who requested, what plan was approved, which policy allowed it, what tool action happened, what output was delivered and whether exceptions were justified. Agentic AI needs both connected by trace ids and evidence references.

Q3: Why is causality more important than chronology in workflow replay?

30-second version:

Chronology tells us that one event happened before another. Causality tells us whether the later action was actually authorized, supported and caused by the earlier evidence. An approval before a tool call is not enough; the approval must apply to the same input hash, scope, policy decision and side effect.
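The difference shows up clearly in code. Below is a minimal sketch. It assumes hypothetical approval and tool-call records carrying the fields named in the answer (input hash, scope, policy decision, side effect) plus an expiry. In the example, the approval precedes the call, so a chronology-only check would pass. The binding check still fails because the executed input differs from the approved one.

```python
from datetime import datetime, timezone


def binding_problems(approval: dict, tool_call: dict) -> list[str]:
    """Check that an approval causally covers a tool call, not merely precedes it."""
    problems = []
    if not approval["approved_at"] < tool_call["executed_at"]:
        problems.append("approval not before execution")
    if tool_call["executed_at"] > approval["expires_at"]:
        problems.append("approval expired")
    if approval["input_hash"] != tool_call["input_hash"]:
        problems.append("input hash mismatch")
    if tool_call["action"] not in approval["scope"]:
        problems.append("action outside approval scope")
    if approval["policy_decision_id"] != tool_call["policy_decision_id"]:
        problems.append("different policy decision")
    if approval["side_effect"] != tool_call["side_effect"]:
        problems.append("side effect not the one approved")
    return problems


def utc(hour: int, minute: int) -> datetime:
    return datetime(2026, 6, 1, hour, minute, tzinfo=timezone.utc)


approval = {"approved_at": utc(9, 0), "expires_at": utc(17, 0), "input_hash": "h-1",
            "scope": {"submit_chargeback"}, "policy_decision_id": "pd-7",
            "side_effect": "chargeback_filed"}
tool_call = {"executed_at": utc(9, 5), "input_hash": "h-2", "action": "submit_chargeback",
             "policy_decision_id": "pd-7", "side_effect": "chargeback_filed"}
print(binding_problems(approval, tool_call))  # ['input hash mismatch']
```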

Q4: How would you test control effectiveness for a KYC onboarding agent?

30-second version:

I would define the population, such as all KYC agent cases with customer follow-up drafts in the month. I would run automated exception queries for missing reviewer approval, unsupported document claims and automated rejection. Then I would sample by risk tier, document type, geography and reviewer to verify source evidence, policy decisions, human review, final output and customer outcome. Exceptions would be classified as justified exception, documentation defect, process drift or control failure.
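The two-stage approach (full-population exception queries, then a risk-stratified sample) can be sketched as below. The sketch assumes hypothetical case records with a `risk_tier` and a `flagged` marker set by the automated queries. The per-tier sizes are placeholders, and their rationale belongs in the sampling strategy. Stratifying by risk tier alone simplifies the answer, which also stratifies by document type, geography and reviewer.

```python
import random
from collections import defaultdict

# Hypothetical per-tier sample sizes; a real strategy must justify these.
SAMPLE_SIZE_BY_TIER = {"high": 3, "medium": 2, "low": 1}


def select_for_review(cases: list[dict], seed: int):
    """Return (all query-flagged cases, seeded stratified sample of the rest)."""
    flagged = [c for c in cases if c["flagged"]]
    rng = random.Random(seed)  # fixed seed lets a reviewer re-draw the same sample
    strata = defaultdict(list)
    for case in cases:
        if not case["flagged"]:
            strata[case["risk_tier"]].append(case)
    sample = []
    for tier in sorted(strata):
        members = sorted(strata[tier], key=lambda c: c["case_id"])
        k = min(SAMPLE_SIZE_BY_TIER.get(tier, 1), len(members))
        sample.extend(rng.sample(members, k))
    return flagged, sample


population = [
    {"case_id": f"kyc-{i:03d}", "risk_tier": ("high", "medium", "low")[i % 3],
     "flagged": i in (4, 7)}
    for i in range(30)
]
flagged, sample = select_for_review(population, seed=2026)
print([c["case_id"] for c in flagged], len(sample))  # ['kyc-004', 'kyc-007'] 6
```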

Q5: What evidence is needed for incident replay?

30-second version:

An incident replay packet needs scope, affected cases, version set, event timeline, causal graph, policy decisions, approvals, tool side effects, outputs, business impact, evidence gaps, control analysis, remediation and learning actions. It should also state reproducibility limits and privacy constraints.

Q6: How do you handle reproducibility limits with LLM agents?

30-second version:

I do not promise exact deterministic reproduction unless the stack supports it. I preserve the version set, prompt/config hash, model route, RAG index, retrieved chunks, policy version, tool inputs, approvals, output hash and business state. Replay focuses on reconstructing evidence and control behavior, while documenting model nondeterminism, vendor version limits and redaction boundaries.
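Recording and comparing the version set can be sketched as follows. The sketch assumes hypothetical component names and version strings. The fingerprint lets a replay record which configuration produced the original output. The drift report states which components have changed since then, which is the reproducibility limit the replay should document rather than hide.

```python
import hashlib
import json


def version_set_fingerprint(version_set: dict) -> str:
    """Stable hash of the recorded version set; key order does not matter."""
    canonical = json.dumps(version_set, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def version_drift(recorded: dict, current: dict) -> dict:
    """Components whose versions differ between the original run and replay time."""
    keys = sorted(set(recorded) | set(current))
    return {k: (recorded.get(k), current.get(k))
            for k in keys if recorded.get(k) != current.get(k)}


recorded = {"model_route": "vendor-model-2026-05", "prompt_config_hash": "pc-91",
            "rag_index": "kyc-idx-44", "policy_version": "pol-12",
            "tool_schema": "ts-3", "workflow_version": "wf-8"}
current = {**recorded, "model_route": "vendor-model-2026-06"}
print(version_drift(recorded, current))
# {'model_route': ('vendor-model-2026-05', 'vendor-model-2026-06')}
```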


21. Portfolio Exercise

Build an "Agentic Process Audit and Workflow Replay Pack" for one financial retail use case. Recommended use cases:

| Use case | Suggested process claim |
|---|---|
| AML investigation copilot | Agent prepares sourced timeline and draft narrative, but analyst owns final disposition. |
| Payment dispute assistant | Agent drafts evidence packet, but chargeback submission requires maker-checker approval. |
| KYC onboarding agent | Agent detects missing evidence and drafts follow-up, but cannot reject applicant. |
| Collections hardship case agent | Agent recommends hardship options, but specialist approves customer treatment. |
| Regulatory reporting narrative drafter | Agent drafts variance narrative from approved metrics, but authorized signer remains accountable. |
| Payment operations repair queue agent | Agent suggests repair and executes only reversible low-risk updates under policy and dual control. |

Required Artifacts

  1. Process claim with workflow scope, agent boundary, human boundary, policy boundary and outcome evidence.
  2. Event dictionary covering intent, plan, observation, policy, action, approval, exception, output, feedback and incident.
  3. Plan/action/observation/approval/output schema with required fields and retention class.
  4. Replay architecture diagram showing orchestrator, policy engine, tool gateway, HITL, event store, trace store, evidence lake, provenance graph and replay workbench.
  5. Causal graph for one representative case.
  6. Audit query catalog with at least 10 queries.
  7. Control-to-evidence matrix with at least 12 controls.
  8. Sampling strategy with population definition, method, sample size rationale and pass criteria.
  9. Exception and override register with owner, expiry and compensating control.
  10. Incident replay packet for one failure or near miss.
  11. Process conformance report distinguishing conformant cases, justified exceptions, process drift and control failures.
  12. Executive assurance narrative for process owner review.

Scoring Rubric

| Criterion | Strong evidence |
|---|---|
| Process rigor | Process claim is specific, testable and linked to workflow scope. |
| BA rigor | Events, states, decisions, requirements and evidence are traceable. |
| Architecture rigor | Replay architecture separates traces, events, provenance, retention and access control. |
| Audit usefulness | Audit queries can be run without manual reconstruction. |
| Control thinking | SoD, approval binding, exception handling and sampling are practical. |
| Financial realism | Examples reflect AML, KYC, disputes, collections, reporting or payment operations constraints. |
| Reproducibility honesty | Replay limits are documented without overclaiming deterministic reproduction. |
| Outcome balance | Business value is paired with customer, control and process conformance evidence. |

22. Final Mental Model

Agentic workflow assurance should make five truths visible:

The final answer is not the process.
The timeline is not the cause.
The approval is not valid unless bound to exact evidence and action.
The exception is not acceptable unless owned, justified, expiring and monitored.
The replay is not audit sign-off, but it is the evidence architecture that makes serious review possible.

The senior-level move is to design AI agents as replayable process participants, not opaque automations.