AI Agentic Process Audit / Workflow Replay / Assurance Architecture Playbook
Version: v1.0
Date: 2026-06-30
Audience: Senior AI PM, AI Architect, Internal Audit Partner, Process Owner, CBAP-level BA, AI Governance, Risk / Compliance, Financial Retail Operations, Platform / SRE.
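This is an execution-oriented playbook. Its goal is to help teams design the user intent, plans, tool calls, HITL approvals, policy decisions, exceptions, compensating actions, outputs, feedback and incident learning of an agentic workflow into an evidence architecture that can be queried, replayed, sampled and reviewed. It does not constitute legal advice, regulatory interpretation, an audit opinion, a model validation conclusion, a conclusion on internal control effectiveness or production approval. It provides an evidence design method that supports review by internal audit, process owners, risk, compliance and management.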
1. When To Use This Playbook
Use this playbook when the AI agent has any of the following capabilities:
| Trigger | Why replay evidence matters |
|---|---|
| Agent calls tools that read or write business systems | Need to prove authority, input, result, side effect and reversibility. |
| Agent drafts customer, regulatory or case-record content | Need to prove source support, review, final output and delivery state. |
| Agent routes cases, recommends treatment or prioritizes queues | Need to prove policy boundary, fairness, exception and outcome evidence. |
| Agent requires HITL approval | Need to prove what the human saw, decided, changed and authorized. |
| Agent handles exceptions or compensating actions | Need to distinguish justified exception from control failure. |
| Agent behavior will be reviewed by risk, compliance, internal audit or process owners | Need audit queries, population definitions, sampling and replay packs. |
Recommended starting use cases:
| Financial retail workflow | Agentic scope |
|---|---|
| AML investigation copilot | Build sourced timeline and draft narrative for analyst review. |
| Payment dispute assistant | Draft evidence packet and customer communication for maker-checker approval. |
| KYC onboarding agent | Classify documents, detect missing evidence and draft follow-up. |
| Collections hardship case agent | Recommend hardship options and draft customer notes for specialist approval. |
| Regulatory reporting narrative drafter | Draft variance explanations from approved data lineage. |
| Payment operations repair queue agent | Triage repair queue and execute controlled low-risk updates. |
2. Source Anchors
| Anchor | Official link | Execution translation |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Map workflow risk, measure behavior and manage exceptions through evidence. |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Operate replay evidence as part of an AI management system with review and improvement. |
| ISO/IEC/IEEE 42010 | https://www.iso.org/standard/74393.html | Describe replay architecture through stakeholder concerns and architecture views. |
| ISO/IEC/IEEE 29148 | https://www.iso.org/standard/72089.html | Convert stakeholder needs into requirements, verification, validation and traceability. |
| OpenTelemetry | https://opentelemetry.io/docs/ | Instrument traces, spans, metrics, logs and context propagation. |
| W3C PROV | https://www.w3.org/TR/prov-overview/ | Model evidence as Entity, Activity and Agent relationships. |
| FFIEC IT Handbook | https://ithandbook.ffiec.gov/ | Calibrate governance, IT risk, outsourcing, continuity and control review language for financial institutions. |
3. Delivery Principles
| Principle | Practical rule |
|---|---|
| Evidence by design | Define evidence objects during requirements and architecture, not after an incident. |
| Minimum sufficient evidence | Store enough to reconstruct behavior and controls, while minimizing raw sensitive content. |
| Causal replay over timeline screenshots | Link approval, policy, tool input and side effect through IDs and hashes. |
| Process owner accountability | Replay supports review; it does not transfer business responsibility to the AI team. |
| Risk-tiered depth | Low-risk internal drafts and high-risk customer-impacting actions need different event and retention depth. |
| Exceptions are first-class | Every override has reason, owner, expiry, compensating control and learning path. |
| Independent challenge ready | Evidence must be queryable by authorized reviewers outside the delivery team. |
| Outcome plus control | Speed, cost and adoption metrics must be paired with conformance, quality and customer impact evidence. |
4. Execution Roadmap
Step 1: Define The Process Claim
Use this structure:
For [workflow scope], the agent may [allowed responsibility], must not [prohibited responsibility], requires [human or policy control] before [high-impact action], and success is measured by [business outcome] plus [control counterweight].
Completed example:
For payment dispute cases with reason codes 10.4 and 13.1, the assistant may build a sourced evidence packet and draft customer communication. It must not submit a chargeback without maker-checker approval tied to the exact tool input hash. Success is measured by reduced evidence preparation time, lower rework and no increase in unsupported submission exceptions.
Deliverable:
| Field | Payment dispute assistant example |
|---|---|
| workflow scope | card dispute reason codes 10.4 and 13.1 |
| allowed agent responsibility | evidence packet draft, reason-code checklist, customer letter draft |
| prohibited responsibility | autonomous chargeback submission |
| required human control | maker-checker approval before submission tool call |
| policy boundary | customer-impacting write actions require policy decision and approval |
| outcome evidence | preparation time, rework rate, chargeback acceptance, complaint trend |
| control counterweight | unsupported submission count, approval mismatch count, QA findings |
Step 2: Build The Workflow State And Event Map
| State | Entry event | Exit event | Required evidence |
|---|---|---|---|
| Intent captured | ai.intent.received | ai.intent.classified | user role, channel, case id hash, intent label |
| Plan prepared | ai.plan.generated | ai.plan.accepted | plan id, plan version, risk assessment, rejected options |
| Evidence gathered | ai.observation.received | ai.retrieval.completed | source refs, freshness, entitlement result |
| Policy evaluated | ai.policy.evaluated | ai.policy.decided | policy id, version, decision, obligations |
| Action proposed | ai.action.proposed | ai.tool.dry_run_completed | tool schema, input hash, dry-run result |
| Approval completed | ai.approval.requested | ai.approval.decided | visible evidence hash, reviewer role, reason code |
| Action executed | ai.tool.invoked | ai.tool.completed | side effect id, idempotency key, result pointer |
| Output finalized | ai.output.drafted | ai.output.finalized | output hash, citation map, safety label |
| Feedback learned | ai.feedback.captured | ai.learning.action_created | edit reason, QA finding, eval case id |
Deliverable: one event dictionary per workflow, owned by BA and architect.
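The state map above can be turned into an automated conformance check. The sketch below is illustrative only: it assumes a strictly linear workflow in which every state is required, and the function name `conformance_gaps` is hypothetical. Real workflows with optional or branching states would need a richer model.

```python
# Approved order of exit events, copied from the state table above.
# Assumption: the workflow is linear and every state is mandatory.
APPROVED_SEQUENCE = [
    "ai.intent.classified",
    "ai.plan.accepted",
    "ai.retrieval.completed",
    "ai.policy.decided",
    "ai.tool.dry_run_completed",
    "ai.approval.decided",
    "ai.tool.completed",
    "ai.output.finalized",
    "ai.learning.action_created",
]


def conformance_gaps(observed: list[str]) -> dict[str, list[str]]:
    """Compare one workflow's observed exit events with the approved sequence."""
    known = [e for e in observed if e in APPROVED_SEQUENCE]
    missing = [e for e in APPROVED_SEQUENCE if e not in known]

    # An event is out of order if it arrives after an event that should follow it.
    out_of_order = []
    highest_seen = -1
    for event in known:
        position = APPROVED_SEQUENCE.index(event)
        if position < highest_seen:
            out_of_order.append(event)
        highest_seen = max(highest_seen, position)
    return {"missing": missing, "out_of_order": out_of_order}
```

A trace in which `ai.tool.completed` precedes `ai.approval.decided` would surface the approval as out of order, which is the pattern a process owner would then classify as an exception or a control failure.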
Step 3: Define The Event Contract
Every event family should use a common envelope:
| Field | Required for high-risk workflows |
|---|---|
| event_id | yes |
| event_type | yes |
| schema_version | yes |
| occurred_at and recorded_at | yes |
| trace_id and workflow_id | yes |
| case_id_hash | yes |
| use_case_id | yes |
| risk_tier | yes |
| actor_type and actor_id_hash | yes |
| producer | yes |
| data_class and retention_policy_id | yes |
| redaction_profile | yes |
| prev_event_ids and causal references | yes |
| evidence_refs | yes |
| integrity_hash | yes |
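One way the evidence collector could validate the envelope is sketched below. The field list comes from the table above; the canonicalisation rule (sorted-key compact JSON hashed with SHA-256) and the function names are assumptions for illustration, not a prescribed scheme.

```python
import hashlib
import json

# Field names taken from the envelope table; paired fields are listed separately.
REQUIRED_FIELDS = [
    "event_id", "event_type", "schema_version", "occurred_at", "recorded_at",
    "trace_id", "workflow_id", "case_id_hash", "use_case_id", "risk_tier",
    "actor_type", "actor_id_hash", "producer", "data_class",
    "retention_policy_id", "redaction_profile", "prev_event_ids",
    "evidence_refs", "integrity_hash",
]


def compute_integrity_hash(envelope: dict) -> str:
    """Hash every field except integrity_hash itself (assumed canonical form)."""
    body = {k: v for k, v in envelope.items() if k != "integrity_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_envelope(envelope: dict) -> list[str]:
    """Return a list of problems; an empty list means the envelope passes."""
    problems = [f"missing:{f}" for f in REQUIRED_FIELDS if f not in envelope]
    stored = envelope.get("integrity_hash")
    if stored is not None and stored != compute_integrity_hash(envelope):
        problems.append("integrity_hash_mismatch")
    return problems
```

Hashing a canonical form means that any later change to a stored field, such as `risk_tier`, shows up as a mismatch when the event is re-validated.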
High-risk tool action payload:
{
"event_type": "ai.tool.invoked",
"tool_name": "chargeback_submit",
"tool_schema_version": "2026-06-15",
"action_type": "customer_impacting_write",
"input_hash": "sha256:84bc...",
"dry_run_result_id": "dryrun_7781",
"policy_decision_id": "poldec_4320",
"policy_decision": "approval_required",
"approval_id": "appr_1109",
"approval_scope": "chargeback_submit_input_hash_sha256_84bc",
"side_effect_id": "cb_submit_20260630_9912",
"idempotency_key": "case_9912_reason_104_attempt_01",
"compensating_action_ref": "manual_reversal_procedure_v4"
}
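The payload binds an approval to an input hash. A minimal sketch of how a tool gateway might enforce that binding before executing follows. The `Approval` record shape, the `approved_input_hash` field and the `may_execute` function are hypothetical, and the hash is computed over an assumed canonical JSON form of the tool input.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Approval:
    # Hypothetical approval record; field names are illustrative.
    approval_id: str
    tool_name: str
    approved_input_hash: str
    decision: str  # "approved" or "rejected"


def input_hash(tool_input: dict) -> str:
    """Hash the exact tool input using sorted-key compact JSON (assumed form)."""
    canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_execute(tool_name: str, tool_input: dict,
                approval: Optional[Approval]) -> tuple[bool, str]:
    """Allow execution only if the approval covers this tool and this exact input."""
    if approval is None:
        return False, "approval_missing"
    if approval.decision != "approved":
        return False, "approval_not_granted"
    if approval.tool_name != tool_name:
        return False, "approval_tool_mismatch"
    if approval.approved_input_hash != input_hash(tool_input):
        return False, "approval_input_hash_mismatch"
    return True, "ok"
```

If the agent changes any input value after the reviewer approved, the hash no longer matches and the gateway refuses the call, which is the behavior the binding is meant to guarantee.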
Step 4: Build The Evidence Architecture
Minimum architecture:
Agent UI / workflow queue
|
Agent orchestrator
|
Policy engine + tool gateway + HITL approval workflow
|
OpenTelemetry traces + process events
|
Evidence collector
schema validation | redaction | hashing | retention tagging
|
Trace store + event store + evidence lake
|
Provenance graph
|
Replay workbench + audit query catalog + process conformance dashboard
Architecture decisions to record:
| ADR | Decision |
|---|---|
| ADR-001 | Which workflows require full trace versus sampled trace. |
| ADR-002 | Which raw prompt, output or context content is stored, hashed, pointed to or redacted. |
| ADR-003 | How approvals are bound to exact tool inputs and output drafts. |
| ADR-004 | How model, prompt, RAG, policy, tool and workflow versions are captured. |
| ADR-005 | How evidence access is logged and restricted. |
| ADR-006 | How incident evidence preservation and legal hold are triggered. |
| ADR-007 | How reproducibility limits are documented for third-party models. |
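ADR-004 and ADR-007 can be made concrete by recording a fingerprint of the version set with each workflow run. The sketch below is one possible approach: the key names mirror the version set used in the incident replay packet, but the exact keys and the function name are assumptions. A matching fingerprint shows that two runs used the same recorded versions; it does not make a third-party model's output reproducible.

```python
import hashlib
import json

# Assumed version-set keys: model, prompt, KB, policy, tool schema,
# workflow and feature flags.
VERSION_SET_KEYS = (
    "model", "prompt", "kb_index", "policy",
    "tool_schema", "workflow", "feature_flags",
)


def version_set_fingerprint(version_set: dict) -> str:
    """Return a stable hash of the version set; refuse incomplete sets."""
    missing = [k for k in VERSION_SET_KEYS if k not in version_set]
    if missing:
        raise ValueError(f"incomplete version set, missing: {missing}")
    selected = {k: version_set[k] for k in VERSION_SET_KEYS}
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```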
Step 5: Create Audit Queries Before Release
Audit query catalog:
| Query id | Question | Evidence joins |
|---|---|---|
| AQ-001 | Which customer-impacting tool actions lack valid approval? | tool events, approval events, policy decisions |
| AQ-002 | Which approvals are not bound to the exact execution input hash? | approval visible evidence, tool input hash |
| AQ-003 | Which outputs were delivered after approval but changed from approved draft? | approval hash, output hash, delivery event |
| AQ-004 | Which workflows skipped required policy evaluation? | workflow state events, policy events |
| AQ-005 | Which exceptions expired without closure evidence? | exception register, closure events |
| AQ-006 | Which material claims lack citation or source support? | output events, citation map, retrieval refs |
| AQ-007 | Which reviewers approve unusually high override volume? | approval events, override events, reviewer aggregate |
| AQ-008 | Which incidents have incomplete replay packet fields? | incident events, evidence packet index |
| AQ-009 | Which outputs used retired or stale knowledge sources? | output citation map, KB lifecycle, index version |
| AQ-010 | Which cases show value gain but degraded control counterweight? | outcome metrics, QA results, conformance findings |
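To show what "queryable" means in practice, AQ-001 and AQ-002 are sketched below as SQL over an in-memory SQLite database. The table and column names (`tool_events`, `approval_events`, `approved_input_hash`) are hypothetical; a real event store would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tool_events (
    event_id TEXT, action_type TEXT, input_hash TEXT, approval_id TEXT);
CREATE TABLE approval_events (
    approval_id TEXT, decision TEXT, approved_input_hash TEXT);
""")
conn.executemany("INSERT INTO tool_events VALUES (?, ?, ?, ?)", [
    ("evt_1", "customer_impacting_write", "sha256:aaa", "appr_1"),  # valid
    ("evt_2", "customer_impacting_write", "sha256:bbb", None),      # no approval
    ("evt_3", "customer_impacting_write", "sha256:ccc", "appr_3"),  # hash mismatch
    ("evt_4", "read", "sha256:ddd", None),                          # out of scope
])
conn.executemany("INSERT INTO approval_events VALUES (?, ?, ?)", [
    ("appr_1", "approved", "sha256:aaa"),
    ("appr_3", "approved", "sha256:zzz"),
])

# AQ-001: customer-impacting tool actions with no granted approval.
AQ_001 = """
SELECT t.event_id FROM tool_events t
LEFT JOIN approval_events a
  ON a.approval_id = t.approval_id AND a.decision = 'approved'
WHERE t.action_type = 'customer_impacting_write' AND a.approval_id IS NULL
ORDER BY t.event_id
"""

# AQ-002: approvals not bound to the exact execution input hash.
AQ_002 = """
SELECT t.event_id FROM tool_events t
JOIN approval_events a ON a.approval_id = t.approval_id
WHERE a.decision = 'approved' AND a.approved_input_hash <> t.input_hash
ORDER BY t.event_id
"""


def run_query(sql: str) -> list[str]:
    """Return the event ids that the audit query flags."""
    return [row[0] for row in conn.execute(sql)]
```

Writing the queries before release also tests the event contract: if a query cannot be expressed, a join key or field is missing from the schema.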
Step 6: Set Sampling And Testing
| Test | Execution rule |
|---|---|
| Mandatory event coverage | 100% query for missing required events in high-risk workflows. |
| Approval binding | Monthly sample plus automated mismatch query for approval hash vs action input hash. |
| Exception quality | Review all high-severity exceptions and a risk-based sample of lower severity exceptions. |
| Process conformance | Compare actual traces to approved state model by workflow type. |
| Outcome counterweight | Pair value metrics with complaint, QA, rework and override trends. |
| Incident replay drill | Run at least one replay drill before release and after major architecture changes. |
Sample test record:
| Field | Example |
|---|---|
| population | all payment dispute assistant chargeback submissions from 2026-06-01 to 2026-06-30 |
| sample method | 100% exception query plus 30-case stratified sample by reason code and reviewer |
| test objective | verify maker-checker approval and exact input binding |
| pass criteria | every submission has policy decision, valid approval, matching input hash, side effect id and output record |
| exception classes | justified exception, documentation defect, control failure, process drift |
| owner | Dispute Operations Process Owner |
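The stratified part of the sample method can be made reproducible with a seeded draw. The sketch below assumes an equal allocation per stratum, which is only one of several defensible allocation rules; the function name and case fields are hypothetical. Recording the seed alongside the test record lets a reviewer re-draw the same sample.

```python
import random
from collections import defaultdict


def stratified_sample(population: list[dict], strata_keys: tuple[str, ...],
                      per_stratum: int, seed: int) -> list[dict]:
    """Draw up to per_stratum cases from each stratum, reproducibly."""
    rng = random.Random(seed)  # seeded so the draw can be repeated by a reviewer
    strata: dict[tuple, list[dict]] = defaultdict(list)
    for case in population:
        strata[tuple(case[k] for k in strata_keys)].append(case)

    sample = []
    for key in sorted(strata):  # fixed stratum order keeps the result stable
        cases = strata[key]
        sample.extend(rng.sample(cases, min(per_stratum, len(cases))))
    return sample
```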
Step 7: Prepare Incident Replay
Incident replay packet fields:
| Section | Required content |
|---|---|
| Scope | use case, workflow, incident id, time window, affected case count |
| Trigger | alert, complaint, QA finding, metric threshold or manual report |
| Version set | model, prompt, KB, policy, tool schema, workflow, feature flags |
| Timeline | chronological events and spans |
| Causal graph | required predecessor links and control dependencies |
| Evidence gaps | missing fields, missing spans, inaccessible source systems |
| Impact | customer, regulatory, financial, operational and reputational impact |
| Control analysis | worked, failed, bypassed, absent or weak controls |
| Remediation | rollback, restriction, compensating action, customer repair, control fix |
| Learning | eval case, prompt update, policy clarification, tool gateway change, training |
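A replay packet can be checked for completeness before review, which is what AQ-008 asks for. This is a minimal sketch under the assumption that a packet is a dictionary keyed by section name; the key names and the function are illustrative.

```python
# Section keys mirror the packet table above (names are assumed).
REQUIRED_SECTIONS = [
    "scope", "trigger", "version_set", "timeline", "causal_graph",
    "evidence_gaps", "impact", "control_analysis", "remediation", "learning",
]


def incomplete_sections(packet: dict) -> list[str]:
    """Return sections that are absent or blank.

    An explicit empty list (for example evidence_gaps == []) is accepted,
    because it records that the section was reviewed and nothing was found.
    """
    return [s for s in REQUIRED_SECTIONS if packet.get(s) in (None, "")]
```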
Step 8: Run Process Owner Review
Process owner review should answer:
| Review question | Required evidence |
|---|---|
| Did the workflow follow approved process? | process conformance dashboard, samples, exception list |
| Were exceptions justified and closed? | exception records, owner, expiry, closure evidence |
| Did controls operate as designed? | control test results, automated exception queries |
| Did business outcome improve? | baseline vs post-release metrics |
| Did control counterweights remain acceptable? | QA, complaint, rework, override, incident and conformance trends |
| What should change? | action owner, due date, evidence required for closure |
5. Audit Trail vs Observability Design
Use this design split during architecture review:
| Question | Evidence design |
|---|---|
| What happened technically? | OpenTelemetry trace, spans, metrics and logs. |
| What happened as a business process? | Event-sourced workflow and state transitions. |
| Why was it allowed? | Policy decision event and obligations. |
| Who approved it? | HITL approval event, role, visible evidence, reason code. |
| What did the tool change? | Tool event, side effect id, system-of-record state. |
| What output reached a person or record? | Output hash, record id, delivery event. |
| Can it be independently challenged? | Audit query, provenance graph, source refs and controlled access. |
Operational rule:
No high-risk agentic workflow should pass release readiness unless every high-impact action it can take is covered by at least one audit query.
6. Evidence Chain Template With Completed Example
Use this completed example as the writing standard.
| Evidence chain field | Payment operations repair queue agent |
|---|---|
| Process claim | Agent may suggest and execute reversible low-risk repair updates after policy allow decision; irreversible or customer-impacting updates require dual control. |
| Requirement | Repair action must have tool risk tier, policy decision, approval if required, idempotency key and side effect id. |
| Control objective | Prevent unauthorized, duplicate or unrecoverable payment repair actions. |
| Event evidence | ai.policy.decided, ai.tool.dry_run_completed, ai.approval.decided, ai.tool.completed. |
| Source evidence | payment exception record, repair queue state, ledger or settlement reference. |
| Audit query | Show repair actions where side effect id exists but approval or idempotency evidence is missing. |
| Sampling | 100% duplicate action query plus sample of high-value repairs and manual route cases. |
| Outcome | repair backlog aging, duplicate repair rate, settlement break trend. |
| Control counterweight | unauthorized repair count, reconciliation mismatch, exception aging. |
| Owner | Payment Operations Process Owner. |
7. Exception And Override Runbook
7.1 Classification
| Category | Definition | Example |
|---|---|---|
| Justified exception | Approved deviation with reason, owner, expiry and compensating control. | Backup reviewer approved case under continuity procedure. |
| Documentation defect | Control likely operated, but evidence field is incomplete. | Reviewer reason code missing while visible evidence and approval exist. |
| Control failure | Required control absent, expired, mismatched or bypassed. | Tool write action executed with no valid approval. |
| Process drift | Repeated deviations reveal actual process differs from approved model. | Reviewers consistently bypass source-citation step. |
| Process model gap | Approved model omitted a legitimate path. | AML multi-jurisdiction escalation not represented in workflow. |
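The case-level categories can be expressed as a first-pass triage rule. The sketch below is an assumption about how the definitions might be ordered, not a substitute for reviewer judgment; the `ExceptionRecord` shape and `classify_deviation` name are hypothetical. Process drift and process model gap are left out because they are judged across a population of cases rather than on a single case.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ExceptionRecord:
    # Hypothetical record shape based on the justified-exception definition.
    reason: str
    owner: str
    expires_at: datetime
    compensating_control: str
    approved: bool


def classify_deviation(control_operated: bool, evidence_complete: bool,
                       exception: Optional[ExceptionRecord],
                       as_of: datetime) -> str:
    """First-pass classification of a single case for reviewer triage."""
    if exception is not None:
        complete = all([exception.reason, exception.owner,
                        exception.compensating_control, exception.approved])
        if complete and exception.expires_at > as_of:
            return "justified_exception"
        return "control_failure"  # expired or incompletely authorized exception
    if control_operated and not evidence_complete:
        return "documentation_defect"
    if not control_operated:
        return "control_failure"
    return "conformant"
```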
7.2 Override Record
| Field | Completed AML example |
|---|---|
| override id | ovr_aml_20260630_017 |
| baseline rule | high-risk alert narratives require two source categories before draft save |
| reason | adverse media source temporarily unavailable; transaction and KYC evidence sufficient for internal draft |
| approver | AML supervisor independent from original analyst |
| scope | internal draft only; no SAR filing or customer action |
| compensating control | second analyst QA within 24 hours and adverse media refresh before final disposition |
| expiry | closes when source refresh completes or case reaches final disposition |
| learning path | source availability incident added to AML copilot reliability review |
7.3 Escalation
| Trigger | Escalation path |
|---|---|
| customer-impacting action without approval | stop workflow, preserve evidence, notify process owner and risk |
| expired exception used in production | restrict affected path, review all related cases |
| repeated documentation defects | product and operations fix UI or training |
| process drift above threshold | process owner reviews model, SOP and control design |
| sensitive evidence overexposure | security/privacy incident route and access review |
8. Segregation Of Duties Controls
| Control | Implementation |
|---|---|
| Requester cannot approve own high-risk action | HITL workflow checks requester and approver identity hash and role. |
| Prompt author cannot approve release alone | Release workflow requires separate reviewer and production approver. |
| Tool gateway owner cannot override policy alone | Policy exception requires process owner or risk owner approval based on risk tier. |
| Reviewer must see exact evidence set | Approval event records visible evidence hash and approval scope. |
| Break-glass is monitored | Break-glass event requires reason, expiry, post-action review and sample inclusion. |
| Evidence access is itself auditable | Replay workbench logs query purpose, requester, fields viewed and export approvals. |
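The first and fourth controls in the table lend themselves to an automated check inside the HITL workflow. The sketch below assumes a hypothetical `ApprovalRequest` record and an illustrative role matrix; actual roles and tiers would come from the institution's own access model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovalRequest:
    # Hypothetical fields; hashes stand in for real identities and evidence.
    requester_id_hash: str
    approver_id_hash: str
    approver_role: str
    risk_tier: str
    visible_evidence_hash: str


# Illustrative role matrix (assumption, not taken from any standard).
ALLOWED_APPROVER_ROLES = {
    "high": {"checker", "supervisor"},
    "medium": {"checker", "supervisor", "peer_reviewer"},
    "low": {"checker", "supervisor", "peer_reviewer"},
}


def sod_violations(req: ApprovalRequest) -> list[str]:
    """Return segregation-of-duties problems; an empty list means none found."""
    violations = []
    if req.requester_id_hash == req.approver_id_hash:
        violations.append("requester_is_approver")
    # Unknown risk tiers get an empty role set, so the check fails closed.
    if req.approver_role not in ALLOWED_APPROVER_ROLES.get(req.risk_tier, set()):
        violations.append("approver_role_not_allowed")
    if not req.visible_evidence_hash:
        violations.append("visible_evidence_hash_missing")
    return violations
```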
Independent challenge checklist:
- Can an authorized reviewer query the population without delivery-team manual filtering?
- Can a reviewer trace from output to prompt, policy, approval, tool and source system?
- Can a reviewer identify missing evidence and classify exceptions?
- Can a reviewer see version set and release bundle for the incident window?
- Can sensitive evidence be reviewed under controlled access without broad exposure?
9. Process Conformance Dashboard
Dashboard sections:
| Section | Metrics |
|---|---|
| Workflow volume | cases by workflow type, risk tier, channel, agent version |
| Required event coverage | intent, plan, policy, approval, tool, output, feedback coverage |
| Approval quality | visible evidence present, reason codes, expiry, SoD checks |
| Tool action control | tool calls by risk tier, approval requirement, side effect, idempotency |
| Exception management | open exceptions, aging, expiry breaches, closure evidence |
| Process conformance | conformant, justified exception, control failure, process drift |
| Outcome evidence | cycle time, backlog, rework, quality, adoption |
| Control counterweights | complaints, QA findings, unsupported claims, duplicate actions, incidents |
| Replay readiness | traces with complete version set, event completeness, evidence gaps |
Conformance thresholds should be risk-tiered. Example:
| Workflow tier | Required coverage |
|---|---|
| high-risk customer-impacting | full required event coverage and 100% automated exception queries |
| medium-risk employee-assist | full trace for sampled cases plus mandatory output and approval events |
| low-risk internal draft | trace and output evidence with sampling-based review |
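Required event coverage per tier can be computed directly from the event store. In the sketch below, the required-event sets per tier are illustrative assumptions chosen to match the spirit of the table; each team would define its own sets in the event dictionary.

```python
# Assumed required events per tier; real sets belong in the event dictionary.
REQUIRED_EVENTS_BY_TIER = {
    "high": {"ai.policy.decided", "ai.approval.decided",
             "ai.tool.completed", "ai.output.finalized"},
    "medium": {"ai.approval.decided", "ai.output.finalized"},
    "low": {"ai.output.finalized"},
}


def coverage_report(workflows: list[dict]) -> dict[str, dict]:
    """Count workflows per tier that contain every required event."""
    report = {}
    for tier, required in REQUIRED_EVENTS_BY_TIER.items():
        in_tier = [w for w in workflows if w["risk_tier"] == tier]
        complete = [w for w in in_tier if required <= set(w["events"])]
        report[tier] = {
            "total": len(in_tier),
            "complete": len(complete),
            # None signals "no workflows in this tier" rather than 0% coverage.
            "coverage": len(complete) / len(in_tier) if in_tier else None,
        }
    return report
```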
10. Financial Retail Control Patterns
AML Investigation Copilot
| Control | Evidence |
|---|---|
| Analyst owns disposition | final disposition event by analyst, no auto-close tool permission |
| Source-backed timeline | transaction, KYC, adverse media and policy refs |
| SAR boundary | policy block for SAR filing or final regulatory conclusion |
| QA sample | high-risk typology and edited narratives sampled |
| Incident learning | omitted evidence converted into regression scenario |
Payment Dispute Assistant
| Control | Evidence |
|---|---|
| Maker-checker for submission | approval id tied to chargeback tool input hash |
| Network rule source | citation to rule source and version |
| Customer letter review | approved final output hash and delivery record |
| Provisional credit exception | reason, owner, expiry and customer impact |
| Outcome counterweight | rework, complaint, dispute loss and QA findings |
KYC Onboarding Agent
| Control | Evidence |
|---|---|
| No automated rejection | rejection tool absent or blocked; reviewer decision required |
| Document source span | extracted fields tied to document refs |
| High-risk escalation | policy obligation and senior reviewer approval |
| Customer communication | draft, edit diff, approved final message |
| Appeal and recourse | process output includes route and record |
Collections Hardship Case Agent
| Control | Evidence |
|---|---|
| Vulnerable customer handling | vulnerability indicator, policy obligation, specialist review |
| Treatment recommendation | facts, policy source, plan rationale |
| Override governance | override reason, supervisor review, QA sample |
| Fair outcome monitoring | treatment distribution, complaint and repeat contact |
| Communication boundary | approved message and customer record |
Regulatory Reporting Narrative Drafter
| Control | Evidence |
|---|---|
| Data lineage | metric id, source-of-record, transformation and report period |
| No unsupported claim | material claim citation map and policy block |
| Maker-checker | preparer, reviewer, visible evidence and edit diff |
| Signer boundary | authorized signer remains outside AI workflow automation |
| Retention | report evidence pack and record retention class |
Payment Operations Repair Queue Agent
| Control | Evidence |
|---|---|
| Tool risk tier | read, reversible write, irreversible write classification |
| Dual control | approval for high-value or customer-impacting repair |
| Idempotency | idempotency key, side effect id, retry event |
| Reconciliation | repair action tied to settlement and ledger state |
| Compensating action | reversal or manual correction record |
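The idempotency control in the repair queue pattern can be illustrated with a small in-memory gateway. This is a sketch only: the `RepairGateway` class is hypothetical, and a production gateway would persist keys durably and handle concurrent calls.

```python
from typing import Callable


class RepairGateway:
    """Executes a repair action at most once per idempotency key."""

    def __init__(self) -> None:
        self._side_effects: dict[str, str] = {}  # idempotency key -> side effect id
        self.retry_events: list[str] = []        # keys that were retried

    def execute(self, idempotency_key: str, action: Callable[[], str]) -> str:
        """Run the action once; on retry, return the recorded side effect id."""
        if idempotency_key in self._side_effects:
            # Record the retry as evidence instead of repeating the write.
            self.retry_events.append(idempotency_key)
            return self._side_effects[idempotency_key]
        side_effect_id = action()
        self._side_effects[idempotency_key] = side_effect_id
        return side_effect_id
```

A retried call with the same key returns the original side effect id and leaves a retry event, which gives the duplicate action query something concrete to join against.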
11. Operating Model
11.1 RACI
| Activity | AI PM | Architect | BA | Process Owner | Risk / Compliance | Internal Audit Partner | Platform |
|---|---|---|---|---|---|---|---|
| Process claim | A | C | R | A | C | C | I |
| Event dictionary | C | A | R | C | C | C | R |
| Evidence architecture | C | A | C | C | C | C | R |
| Audit query catalog | C | R | R | C | C | A/C | C |
| Sampling plan | C | C | C | A | R | C | I |
| Exception register | R | C | R | A | C | C | I |
| Incident replay | R | R | C | A | A/C | C | R |
| Management action | A | R | R | A | C | I | C |
11.2 Cadence
| Cadence | Participants | Output |
|---|---|---|
| Pre-pilot evidence design | PM, BA, architect, process owner, risk, platform | process claim, event contract, audit queries |
| Release readiness review | PM, architect, operations, risk, process owner | replay readiness decision and conditions |
| Weekly exception review | process owner, operations, PM, risk | exception aging and closure actions |
| Monthly conformance review | process owner, BA, PM, risk, audit partner as appropriate | conformance report and control actions |
| Incident replay review | incident manager, process owner, platform, risk, legal as needed | replay packet, remediation and learning |
| Quarterly management review | leadership, process owner, AI governance | trend, residual risk, investment and improvement decisions |
12. 30 / 60 / 90 Day Implementation Plan
| Period | Deliverables |
|---|---|
| Days 1-30 | select one high-risk workflow, define process claim, event dictionary, audit queries, event envelope, retention classes and replay readiness criteria |
| Days 31-60 | instrument orchestrator, policy engine, tool gateway and HITL workflow; build event store, trace linkage, redaction profile and conformance dashboard v1 |
| Days 61-90 | run pilot replay drills, execute sampling plan, complete incident replay exercise, close evidence gaps, publish operating cadence and management review pack |
Milestone exit criteria:
| Milestone | Exit criteria |
|---|---|
| Design ready | process claim, events, controls, queries and retention reviewed by process owner and architecture |
| Pilot ready | required events emitted in test, replay workbench can reconstruct sample case |
| Release ready | high-risk audit queries return no blocking evidence gaps; exceptions have owner and expiry |
| Scale ready | conformance, outcome and control counterweight trends support broader use |
13. Anti-Patterns And Corrections
| Anti-pattern | Correction |
|---|---|
| Save only final answer | Save event chain from intent to output and feedback. |
| Treat trace as audit proof | Link trace to policy, approval, source and side-effect evidence. |
| Store raw content everywhere | Use redaction, hashes, pointers and restricted raw evidence zones. |
| Approvals not scoped | Bind approval to visible evidence hash, input hash, output hash or action scope. |
| No negative-path evidence | Capture refusals, blocks, escalations, failed tools and abandoned plans. |
| Sampling only successful workflows | Include overrides, incidents, complaints, policy blocks and high-risk slices. |
| Process owner absent | Make process owner accountable for conformance, exceptions and outcome counterweights. |
| Internal audit treated as control owner | Use internal audit partner for challenge and review input, not management control operation. |
| Replay promises exact reproduction | Preserve version set and evidence, while documenting nondeterministic limits. |
| Exceptions never expire | Every exception has owner, expiry, compensating control and closure evidence. |
14. Interview Answers
Question 1: How would you design auditability for an AI agent that executes workflow actions?
30-second answer:
I would start with process claims and then instrument the agent as an event-sourced workflow. Every high-impact action needs evidence for intent, plan, policy decision, tool input, approval, side effect, output and feedback. I would connect OpenTelemetry traces to a domain event store and provenance graph, then define audit queries and sampling before release.
2-minute answer:
For an agentic workflow, auditability cannot be bolted on by saving chat transcripts. I define the business process claim first, such as a dispute assistant may draft evidence packets but cannot submit a chargeback without maker-checker approval. That claim becomes requirements and controls.
Architecturally, the orchestrator emits events for intent, plan, observations, policy decisions, tool dry-runs, approvals, tool executions, outputs, exceptions and feedback. The tool gateway enforces policy, idempotency, approval binding and side-effect logging. The HITL workflow records what the reviewer saw, what they decided, and the reason. OpenTelemetry gives operational traces, while the event store and provenance graph provide business replay and causal evidence.
Before release, I define audit queries such as all customer-impacting actions without valid approval, all outputs delivered after approval but changed from the approved draft, and all expired exceptions. Then I set a risk-based sampling plan. This does not create audit sign-off by itself, but it gives process owners, risk and internal audit partners a strong evidence base for review and challenge.
Question 2: How do you handle workflow replay when LLM output is not deterministic?
30-second answer:
I avoid promising exact reproduction. I preserve the version set, prompt/config hash, model route, retrieved chunks, policy decision, tool inputs, approval records, output hash and business system state. Replay reconstructs evidence and control behavior, while documenting model nondeterminism and vendor version limits.
Question 3: What is a good audit query for agentic workflows?
30-second answer:
A good audit query tests a process claim or control, not just a log field. For example: show all payment repair tool executions where the action was customer-impacting and the approval was missing, expired, scoped to a different input hash or performed by the requester.
Question 4: How would you distinguish justified exception from control failure?
30-second answer:
A justified exception has an authorized reason, owner, expiry, compensating control and closure evidence. A control failure means a required control was absent, bypassed, expired or mismatched. The replay evidence should support that classification through policy, approval, tool and exception events.
Question 5: What should a process owner see monthly?
30-second answer:
The process owner should see workflow volume, required event coverage, process conformance, open and expired exceptions, approval quality, tool action controls, outcome metrics, control counterweights, incidents, evidence gaps and management actions with owners and due dates.
15. Portfolio Exercise
Build a portfolio-ready replay assurance pack for the KYC onboarding agent.
Completed scenario:
The KYC onboarding agent classifies submitted documents, identifies missing beneficial ownership evidence and drafts customer follow-up messages. It cannot reject an applicant or mark onboarding complete without human review. High-risk jurisdiction cases require senior reviewer approval. Success is measured by reduced document rework and faster first-pass completion, with no increase in complaint, appeal or QA exception rates.
Required artifacts:
| Artifact | Content |
|---|---|
| Process claim | scope, agent boundary, human boundary, policy boundary, outcome evidence |
| Event dictionary | intent, plan, observation, policy, action, approval, exception, output, feedback, incident |
| Event schema | common envelope and five payload schemas |
| Replay architecture | orchestrator, policy, tool gateway, HITL, trace store, event store, evidence lake, provenance graph |
| Causal graph | one onboarding case from intent to customer follow-up |
| Audit query catalog | at least 10 queries tied to process claims |
| Control matrix | at least 12 controls with evidence, owner, frequency and pass criteria |
| Sampling plan | population, method, size rationale, pass criteria and exception classes |
| Exception register | high-risk jurisdiction, stale document source, reviewer override, customer communication issue |
| Incident replay packet | example: incorrect missing-document message sent to customer |
| Dashboard mock | conformance, event coverage, approval quality, exceptions, outcome and counterweights |
| Executive narrative | process owner review memo with decision, evidence, uncertainty and action plan |
Evaluation rubric:
| Criterion | Strong answer |
|---|---|
| Process specificity | Agent and human boundaries are unambiguous. |
| Evidence completeness | Every high-impact step has event, trace and source evidence. |
| Control depth | Approval, SoD, exception, tool and policy controls are testable. |
| Replay quality | Timeline and causal graph can reconstruct case behavior. |
| Privacy discipline | Raw content is minimized and access controlled. |
| Sampling maturity | Includes negative paths, overrides, high-risk slices and incidents. |
| Business realism | KYC outcomes and customer recourse are included. |
16. Self-Check Checklist
| Check | Pass standard |
|---|---|
| Target audience clear | PM, architect, BA, process owner, risk and internal audit partner roles are explicit. |
| Process claim defined | The workflow scope, agent boundary, human boundary and control counterweight are specific. |
| Event schema complete | Intent, plan, action, observation, policy, approval, exception, output, feedback and incident events are covered. |
| Replay architecture complete | Trace store, event store, evidence lake, provenance graph, replay workbench and access controls are included. |
| Audit trail vs observability separated | Operational traces and control evidence have distinct but linked responsibilities. |
| Evidence chain present | Business outcome, process conformance, workflow events, source records and versions are connected. |
| Exception handling operational | Overrides have reason, owner, expiry, compensating control and learning path. |
| SoD addressed | Request, approval, release, evidence access and independent challenge are separated by risk. |
| Sampling executable | Population, method, pass criteria, exception classes and owner are specified. |
| Process conformance usable | Dashboard classifies conformant cases, justified exceptions, control failures and process drift. |
| Incident replay ready | Packet covers scope, version set, timeline, causal graph, impact, control analysis and learning. |
| Financial retail examples included | AML, disputes, KYC, collections, regulatory reporting and payment repair are represented. |
| Assurance language accurate | The playbook supports review and challenge without claiming audit approval. |
17. Closing Synthesis
Agentic workflow replay is mature when a process owner can say:
We know what the agent was allowed to do, what it actually did, what evidence it used, who approved high-impact actions, which exceptions were justified, which controls failed, how outcomes changed and what we learned from incidents.
For AI PMs, architects and CBAP-level BAs, this is the difference between workflow automation and workflow assurance.