AI Continuous Control Monitoring / Assurance Architecture Playbook
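Audience: CBAP+ Business Analyst, AI Product Manager, Product Architect, Enterprise Architect, Control Owner, EvalOps Lead, Model Risk, Operational Risk, Compliance, Internal Audit and financial retail business owners.
Core question: After a financial retail AI system goes live, how do you turn controls, evals, telemetry, incidents, exceptions, audit evidence and control owners into a continuously operating assurance system that proves the controls are not effective only on launch day?
Goal: Build a practical AI continuous control monitoring architecture covering the capability model, operating model, control test taxonomy, event schema, sampling, KRI dashboard, RACI, lifecycle gates, financial retail examples, templates, a 30-day lab and interview answers.
This is not a basic BA tutorial, nor generic audit material. It is written for advanced learners who already understand the AI control library, audit evidence binder, model risk, business processes, requirements traceability and financial retail controls. The focus is not “which controls exist” but “whether controls run continuously, whether they are effective, whether they are weakening, who owns them, when to escalate and how management knows that risk is changing.”
Important: This document is learning, architecture-design and portfolio material and does not constitute legal, regulatory, audit, model validation or compliance advice. In a real financial institution project, the business owner, technology, security, privacy, legal, compliance, model risk, operational risk, third-party risk and internal audit must confirm the approach in line with institutional policy.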
1. Executive Framing
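AI governance typically shows three maturity levels:
| Maturity | Characteristics | Risk |
|---|---|---|
| Static control list | Policies, a control library and a launch checklist exist | No one knows whether the controls actually operate |
| Evidence binder | Launch evidence, approval records and test reports exist | Evidence is mostly static; production degradation is invisible |
| Continuous assurance | Controls are turned into events, tests, metrics, owned responsibilities and reviews | Requires architecture, process and organization to operate together |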
AI continuous control monitoring (CCM):
AI CCM is the continuous process of testing whether AI controls operate as designed, detecting exceptions and KRIs, assigning owners, driving management action, and preserving evidence of control effectiveness over time.
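It answers 10 management questions:
- Which AI controls are treated as material controls?
- Who owns each control?
- How is each control tested: automatically, by sampling or by manual review?
- How is the severity of a control failure defined?
- Which exceptions are open, overdue or recurring?
- Which KRIs show that customer harm, compliance gaps or operational risk are rising?
- Which incidents reveal weaknesses in control design?
- Which management actions have been completed and verified as effective?
- Which evidence proves that controls operate effectively?
- Which GenAI / agentic AI risks are not adequately covered by legacy model risk templates?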
Boundaries with adjacent assets:
| Asset | Primary outputs | Focus of this playbook |
|---|---|---|
| AI Control Library | control objective, activity, risk-control mapping | recurring tests and operating effectiveness |
| AI Audit Evidence Binder | evidence index, traceability | evidence freshness and exception closure |
| AI Release Governance | gate, canary, rollback | post-release control monitoring |
| AI Observability | traces, metrics, logs | telemetry mapped to controls and KRIs |
| Model Risk Management | inventory, validation, monitoring | expanded prompt/RAG/tool/agent/workflow assurance |
Mature framing:
We monitor control effectiveness by linking AI runtime events to control objectives, control owners, test methods, KRIs, exceptions, incidents, action plans and evidence.
2. Source Anchors
| Anchor | Official link | How this playbook uses it |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses the structured AI risk management vocabulary (govern, map, measure, manage) to organize the playbook, with emphasis on continuous governance |
| NIST AIRC AI RMF Functions | https://airc.nist.gov/airmf-resources/airmf/ | Maps Govern / Map / Measure / Manage to owner, context, metric/test and treatment |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Places CCM inside an AI management system: operational control, performance evaluation, management review and improvement |
| Federal Reserve SR 26-2 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm | US model risk governance anchor as of 2026-06-30: SR 26-2 superseded SR 11-7 and SR 21-8 on 2026-04-17 |
SR 26-2 nuance:
- SR 26-2 superseded SR 11-7 and SR 21-8 on 2026-04-17.
- Core practices from traditional model risk management remain valuable: a risk-based approach, inventory, validation, ongoing monitoring, effective challenge and control ownership.
- GenAI, RAG, agentic AI, tool-using AI and multi-agent workflows should be governed under a broader AI governance and control assurance framework.
- NIST AI RMF, ISO/IEC 42001, operational risk, privacy, security, third-party risk, customer harm, incident management and business controls belong in one assurance architecture.
- In interviews and portfolios, do not present SR 11-7 as the current primary reference. It can be mentioned as historical context, but the current anchor is SR 26-2.
Mapping NIST AI RMF functions to CCM:
| Function | CCM interpretation | Example |
|---|---|---|
| Govern | control owners, risk appetite, RACI, management review | AI control registry |
| Map | use case, business process, customer outcome, data flow | lending explanation mapped to adverse action |
| Measure | control tests, evals, sampling, KRIs, dashboards | citation support test |
| Manage | exceptions, incidents, actions, rollback, risk acceptance | disable write tool after control failure |
3. Capability Model
AI CCM is not a single tool but a set of capabilities:
```text
Policy and Control Layer
  -> Control Registry
  -> Control Test Catalog
  -> Telemetry and Evidence Contracts
  -> Automated Control Checks
  -> Sampling and Independent Review
  -> Exception and Action Management
  -> KRI Dashboards
  -> Management Review
  -> Assurance Evidence Archive
```
| Capability | Purpose | Minimum viable implementation | Mature implementation |
|---|---|---|---|
| Control registry | Records controls, owners, cadence and risk mapping | GRC table | Versioned registry linked to risks, requirements and telemetry |
| Control test catalog | Translates each objective into a test method | Manual checklist | Automated and sampled tests with pass/fail logic |
| Evidence contract | Defines the fields the runtime must capture | Trace ID and model version | Control ID, release ID, policy decision, tool version, human action |
| Event collection | Collects AI runtime and workflow events | Logs | Event stream with schema validation and lineage |
| Automated checks | Detects control failures automatically | Scheduled SQL checks | Policy-as-code, eval pipelines, drift checks and alerting |
| Sampling | Inspects high-risk outputs | Weekly manual review | Risk-stratified independent sampling |
| Exception workflow | Records control failures and their treatment | Ticket queue | Severity, owner, due date, root cause, retest, residual risk |
| KRI dashboard | Shows management the health of controls | Basic dashboard | Appetite thresholds, trend, segment, aging and action status |
| Management review | Periodically evaluates control effectiveness | Monthly meeting | Evidence-based review with decisions and escalations |
| Assurance archive | Preserves control test evidence | Shared folder | Access-controlled evidence store with retention and retrieval |
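A control registry row only supports continuous monitoring when each material control has four things: an owner, a risk mapping, at least one catalogued test, and a link to the telemetry that test reads. The Python sketch below is a minimal version of that check. `ControlRegistryEntry` and `registry_gaps` are hypothetical names, and the field names and `RISK-CS-*` ids are illustrative, not a GRC product schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlRegistryEntry:
    """One versioned control record (hypothetical schema, not a GRC product's)."""
    control_id: str
    version: int
    owner_team: str
    cadence: str                              # e.g. "real_time", "daily", "weekly"
    material: bool
    risk_ids: tuple[str, ...] = ()            # risks the control mitigates
    test_ids: tuple[str, ...] = ()            # tests in the control test catalog
    telemetry_fields: tuple[str, ...] = ()    # evidence fields those tests read


def registry_gaps(entries: list[ControlRegistryEntry]) -> list[tuple[str, str]]:
    """Return (control_id, gap) pairs for material controls that cannot yet be monitored."""
    gaps = []
    for entry in entries:
        if not entry.material:
            continue  # in this sketch, non-material controls are out of CCM scope
        if not entry.owner_team:
            gaps.append((entry.control_id, "no owner"))
        if not entry.risk_ids:
            gaps.append((entry.control_id, "no risk mapping"))
        if not entry.test_ids:
            gaps.append((entry.control_id, "no catalogued test"))
        if not entry.telemetry_fields:
            gaps.append((entry.control_id, "no telemetry link"))
    return gaps


registry = [
    ControlRegistryEntry(
        "AI-CS-CRM-003", 2, "Customer Service Operations", "real_time", True,
        risk_ids=("RISK-CS-012",),
        test_ids=("TST-CS-CRM-003-REALTIME",),
        telemetry_fields=("approval_token_id", "approver_role"),
    ),
    ControlRegistryEntry(
        "AI-RAG-CITE-004", 1, "Knowledge Operations", "daily", True,
        risk_ids=("RISK-CS-004",),
    ),
]
print(registry_gaps(registry))
# [('AI-RAG-CITE-004', 'no catalogued test'), ('AI-RAG-CITE-004', 'no telemetry link')]
```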
Architecture:
```text
AI Runtime
  -> event pipeline with schema validation and PII redaction
  -> control test engine with rules, evals, drift checks and evidence checks
  -> assurance operations with exceptions, incidents and actions
  -> dashboards, monthly memos, audit packs and regulator response
```
Design principles:
- Treat controls as operational products.
- Test operating effectiveness, not only model performance.
- Link every KRI to risk appetite.
- Create exceptions automatically where possible.
- Use risk-based sampling.
- Preserve traceability from customer outcome to AI artifact versions.
- Close incidents only after control improvements are retested.
- Review GenAI and agentic AI under broader AI assurance, not legacy MRM alone.
4. Operating Model and RACI
| Role | Responsibility |
|---|---|
| Business Control Owner | Owns business risk, approves objective, accepts residual risk |
| AI Product Manager | Owns product behavior, customer impact and risk-value tradeoffs |
| CBAP+ BA | Maps process, requirements, decisions, exceptions, evidence and owners |
| Product Architect | Designs control plane, telemetry, policy gateway, rollback and evidence architecture |
| EvalOps Lead | Designs evals, regression tests, sampling and performance monitoring |
| Model Risk / AI Risk | Provides independent challenge and monitoring expectations |
| Compliance / Legal | Confirms regulatory interpretation, disclosures and complaint concerns |
| Security / Privacy | Reviews data handling, access, redaction, retention and third-party exposure |
| Operations Lead | Runs queues, human review, exception triage, remediation and training |
| Internal Audit | Reviews control design, operating effectiveness and evidence quality |
| Forum | Frequency | Inputs | Decisions |
|---|---|---|---|
| Daily control triage | Daily for Tier 2/3 | critical exceptions, KRI breaches, incidents | pause, escalate, assign action |
| Weekly assurance review | Weekly | test results, sampling failures, exception aging | remediate, retest, adjust threshold |
| Monthly effectiveness review | Monthly | trend dashboards, repeat failures, action closure | control rating, management actions |
| Quarterly AI management review | Quarterly | portfolio KRIs, audit findings, risk appetite | investment, policy update, risk acceptance |
| Incident-triggered review | Event-driven | incident report, affected traces, root cause | rollback, customer remediation, redesign |
RACI:
| Activity | Business | AI PM | BA | Architect | EvalOps | Risk/Compliance | Security/Privacy | Operations | Audit |
|---|---|---|---|---|---|---|---|---|---|
| Define control objective | A | R | R | C | C | C | C | C | C |
| Map process and decision impact | C | R | A/R | C | C | C | C | R | C |
| Define evidence contract | C | C | R | A/R | R | C | A/R | C | C |
| Build automated test | C | C | C | R | A/R | C | C | C | C |
| Define sampling plan | C | C | R | C | A/R | A/R | C | R | C |
| Review failed test | A | R | R | C | R | C | C | A/R | C |
| Approve residual risk | A | C | C | C | C | A/R | C | C | C |
| Close management action | A | R | C | C | R | C | C | R | C |
| Assess effectiveness | C | C | C | C | C | A/R | C | C | A/R |
Legend: A = accountable, R = responsible, C = consulted.
Control states: Designed -> Implemented -> Testing -> Effective -> Monitoring -> Exception -> Remediation -> Retest -> Effective. Exceptions can also move to Accepted Risk or Escalated; effective controls can be retired when the risk, process or system no longer applies.
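The control states above behave like a small state machine. Enforcing the allowed transitions in code stops a control from jumping from Exception straight back to Effective without a retest. The sketch below is minimal and uses only the transitions named in the text, with two additions. First, the Retest -> Remediation path for a failed retest is an added assumption. Second, Accepted Risk and Escalated are left terminal because the text does not say where they lead.

```python
from enum import Enum


class ControlState(Enum):
    DESIGNED = "Designed"
    IMPLEMENTED = "Implemented"
    TESTING = "Testing"
    EFFECTIVE = "Effective"
    MONITORING = "Monitoring"
    EXCEPTION = "Exception"
    REMEDIATION = "Remediation"
    RETEST = "Retest"
    ACCEPTED_RISK = "Accepted Risk"
    ESCALATED = "Escalated"
    RETIRED = "Retired"


S = ControlState
# Transitions named in the text. States missing from this map (Accepted Risk,
# Escalated, Retired) have no outgoing transitions in this sketch.
ALLOWED_TRANSITIONS = {
    S.DESIGNED: {S.IMPLEMENTED},
    S.IMPLEMENTED: {S.TESTING},
    S.TESTING: {S.EFFECTIVE},
    S.EFFECTIVE: {S.MONITORING, S.RETIRED},
    S.MONITORING: {S.EXCEPTION},
    S.EXCEPTION: {S.REMEDIATION, S.ACCEPTED_RISK, S.ESCALATED},
    S.REMEDIATION: {S.RETEST},
    # Assumption (not in the text): a failed retest returns to remediation.
    S.RETEST: {S.EFFECTIVE, S.REMEDIATION},
}


def transition(current: ControlState, target: ControlState) -> ControlState:
    """Move a control to `target`, rejecting any transition the lifecycle does not allow."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = S.DESIGNED
for nxt in (S.IMPLEMENTED, S.TESTING, S.EFFECTIVE, S.MONITORING,
            S.EXCEPTION, S.REMEDIATION, S.RETEST, S.EFFECTIVE):
    state = transition(state, nxt)
print(state.value)  # Effective
```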
5. Control Test Taxonomy
| Test family | Purpose | AI example |
|---|---|---|
| Completeness | Confirm every required control event exists | Every Tier 3 tool write has approval token |
| Accuracy | Confirm control decision is correct | Policy classifier blocks prohibited advice |
| Authorization | Confirm actor/system has authority | Only approved reviewer releases adverse action explanation |
| Timeliness | Confirm control runs within required time | AML escalation review completed within SLA |
| Threshold | Confirm metric stays within appetite | Unsupported answer rate below approved limit |
| Drift | Detect distribution or behavior shift | Retrieval source mix changed after corpus update |
| Reconciliation | Compare two systems or records | AI action log reconciles to CRM task creation |
| Sampling | Human or independent review of selected population | Weekly review of high-risk transcripts |
| Reperformance | Independent retest of control | Second-line reruns eval on locked dataset |
| Evidence quality | Confirm evidence is complete and usable | Trace has release_id, control_id and versions |
| Exception aging | Confirm open issues are managed | No high severity exception remains open beyond 14 days |
| Action effectiveness | Confirm remediation solved the issue | Post-fix retest shows the failure rate back below threshold |
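The Sampling and Reperformance test families both depend on samples that a second-line reviewer can redraw. Below is a minimal Python sketch of risk-stratified sampling. It assumes each record carries a stratum such as `risk_tier` and that the seed is stored with the sampling evidence. `stratified_sample`, the per-tier sample sizes and the seed value are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, sample_sizes, seed):
    """Draw a reproducible, risk-stratified sample.

    population:   list of records, e.g. transcript or trace dicts
    stratum_of:   function mapping a record to its stratum, e.g. its risk tier
    sample_sizes: {stratum: n}; strata not listed get an empty sample
    seed:         stored with the sampling record so a reviewer can redraw the same sample
    """
    strata = defaultdict(list)
    for record in population:
        strata[stratum_of(record)].append(record)
    rng = random.Random(seed)           # isolated generator, independent of global state
    sample = {}
    for stratum in sorted(strata):      # fixed order keeps the draw reproducible
        k = min(sample_sizes.get(stratum, 0), len(strata[stratum]))
        sample[stratum] = rng.sample(strata[stratum], k)
    return sample


transcripts = [
    {"trace_id": f"trc-{i:04d}", "risk_tier": "Tier3" if i % 10 == 0 else "Tier2"}
    for i in range(100)
]
week_sample = stratified_sample(transcripts, lambda r: r["risk_tier"],
                                {"Tier3": 5, "Tier2": 3}, seed=20260630)
print({tier: len(items) for tier, items in week_sample.items()})  # {'Tier2': 3, 'Tier3': 5}
```

Reproducibility also assumes the population is extracted in a stable order, for example sorted by `trace_id`. The same seed applied to a reordered population yields a different sample.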
By control object:
| Control object | Example test | Cadence |
|---|---|---|
| Prompt | Prompt version belongs to approved registry | Every deploy and daily sample |
| Model route | Route uses approved model and region | Real time |
| RAG corpus | Citation source appears in approved manifest | Hourly and release-triggered |
| Retrieval | Critical policy document recall remains above threshold | Daily |
| Reranker | Compliance documents not demoted below rank 3 | Daily |
| Tool call | Write action has policy token and human approval | Real time |
| Policy gateway | Prohibited action is blocked | Daily |
| Human review | Required reviewer completed review before output | Real time |
| Telemetry | Required evidence fields present and redacted | Hourly |
| Incident | Incident created for critical KRI breach | Real time |
| Vendor | Vendor notice reviewed before material route change | Event-driven |
| Training | Staff completed procedure update | Monthly or release-triggered |
6. Event and Schema Examples
Control test event:
```json
{
  "event_type": "ai.control_test_result",
  "event_time": "2026-06-30T14:15:03Z",
  "control_id": "AI-CS-CRM-003",
  "control_name": "Human confirmation required before CRM write",
  "use_case_id": "customer_service_agent_assist",
  "risk_tier": "Tier3",
  "test_id": "TST-CS-CRM-003-REALTIME",
  "test_method": "automated_completeness_check",
  "population_count": 1,
  "sample_count": 1,
  "pass_count": 1,
  "fail_count": 0,
  "result": "pass",
  "release_id": "rel-2026-06-30-004",
  "trace_id": "trc-9d71f4",
  "evidence_uri": "evidence://ai-control-tests/2026/06/30/trc-9d71f4",
  "owner_team": "Customer Service Operations"
}
```
AI runtime event enriched for control testing:
```json
{
  "event_type": "ai.tool_call",
  "event_time": "2026-06-30T14:14:59Z",
  "trace_id": "trc-9d71f4",
  "customer_segment": "retail_checking",
  "channel": "contact_center",
  "model_version": "approved-route-cs-2026-06",
  "prompt_version": "cs_agent_prompt_v18",
  "rag_index_version": "retail_policy_index_2026_06_28",
  "tool_name": "crm.createFollowUpTask",
  "tool_contract_version": "crm-followup-v3",
  "tool_action_type": "write",
  "policy_decision": "allow_with_human_confirmation",
  "approval_token_id": "appr-6b38",
  "approver_role": "licensed_contact_center_rep",
  "pii_redaction_status": "redacted",
  "control_ids": ["AI-CS-CRM-003", "AI-PRIV-LOG-002", "AI-OPS-HITL-005"]
}
```
Exception event:
```json
{
  "event_type": "ai.control_exception_opened",
  "event_time": "2026-06-30T15:00:00Z",
  "exception_id": "EX-AI-2026-0630-017",
  "control_id": "AI-RAG-CITE-004",
  "severity": "high",
  "reason": "Unsupported citation rate breached approved threshold for complaint policy questions",
  "detected_by": "daily_retrieval_support_test",
  "affected_use_case": "customer_service_agent_assist",
  "affected_release_id": "rel-2026-06-30-004",
  "owner_team": "Knowledge Operations",
  "due_date": "2026-07-05",
  "required_action": "Roll back complaint SOP index snapshot and retest critical complaint sample"
}
```
KRI specification:
```yaml
kri_id: KRI-AI-CS-007
name: Unsupported citation rate for customer-visible answers
risk_theme: Customer harm and misleading information
control_objective: Customer-visible answers must be grounded in approved knowledge sources
population: all customer_service_agent_assist responses with citations
metric_formula: unsupported_citation_count / cited_response_count
# Quoted: an unquoted value starting with ">" would be parsed as a YAML folded block scalar
green_threshold: "<= 0.25%"
amber_threshold: "> 0.25% and <= 0.75%"
red_threshold: "> 0.75%"
segmentation: [channel, product, customer_segment, policy_domain]
owner: Knowledge Operations
review_forum: Weekly AI Assurance Review
management_action_red: pause index ramp, open high severity exception, sample affected answers, retest corpus manifest
evidence_source: ai_response_events and citation_validation_results
retention: 7 years for regulated customer-impacting cases
```
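The green, amber and red bands in the KRI specification can be applied mechanically once counts are aggregated for a review window. The sketch below is minimal and assumes aggregated integer counts. It compares in basis points with integer arithmetic, so a boundary value such as exactly 0.25% falls in the band the specification intends. As an added assumption, it reports an empty population as `no_data` rather than green. `kri_status` is a hypothetical helper.

```python
def kri_status(unsupported_citation_count: int, cited_response_count: int) -> str:
    """Classify KRI-AI-CS-007 against its appetite bands.

    Bands from the spec: green <= 0.25%, amber > 0.25% and <= 0.75%, red > 0.75%.
    count / total <= 0.25% is checked as count * 10_000 <= 25 * total, avoiding floats.
    """
    if cited_response_count <= 0:
        return "no_data"  # assumption: an empty population is reported, not scored green
    scaled = unsupported_citation_count * 10_000
    if scaled <= 25 * cited_response_count:
        return "green"
    if scaled <= 75 * cited_response_count:
        return "amber"
    return "red"


for count in (25, 31, 76):
    print(count, kri_status(count, 10_000))
# 25 green
# 31 amber
# 76 red
```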
SQL-like automated test:
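```sql
-- Daily completeness and authorization test for Tier 3 write tool calls.
-- Returns only days with at least one failing event; an empty result means pass.
-- Assumes risk_tier is stamped on each tool-call event (it is an evidence contract field).
select
  date_trunc('day', event_time) as test_day,
  count(*) as write_events,
  -- Completeness: every write must carry an approval token
  sum(case when approval_token_id is null then 1 else 0 end) as missing_approval,
  -- Authorization: a null role also fails (NOT IN alone evaluates to null, not true)
  sum(case when approver_role is null
             or approver_role not in ('licensed_contact_center_rep', 'supervisor')
           then 1 else 0 end) as invalid_approver
from ai_tool_call_events
where tool_action_type = 'write'
  and risk_tier = 'Tier3'
group by date_trunc('day', event_time)
-- Aggregates are repeated because engines such as PostgreSQL reject select aliases in HAVING
having sum(case when approval_token_id is null then 1 else 0 end) > 0
    or sum(case when approver_role is null
                  or approver_role not in ('licensed_contact_center_rep', 'supervisor')
                then 1 else 0 end) > 0;
```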
7. Dashboards and KRIs
Executive dashboard:
| Widget | Question answered | Red signal |
|---|---|---|
| Control health by use case | Which AI systems have failing material controls? | Any Tier 3 critical control failed |
| Exception aging | Which exceptions are overdue? | High severity open over 14 days |
| Repeat failures | Which controls fail repeatedly? | Same control fails 3 times in 60 days |
| Customer harm signals | Are complaints, reversals or adverse outcomes rising? | Above baseline plus appetite |
| Evidence completeness | Can we prove control operation? | Required evidence coverage below 99% for Tier 3 |
| Action closure | Are actions completed and retested? | Closed without retest evidence |
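Two of the red signals above, repeat failures and exception aging, reduce to simple window and age calculations. The sketch below rests on four assumptions:
- Failures arrive as `(control_id, date)` pairs.
- Exceptions are dicts with `opened_on` and `closed_on` fields.
- The 60-day window includes the as-of date.
- “Over 14 days” means strictly more than 14.

The function names, `EX-DEMO-001` and all dates are illustrative.

```python
from datetime import date, timedelta


def repeat_failures(failures, as_of, window_days=60, threshold=3):
    """Control ids that failed at least `threshold` times in the trailing window.

    failures: iterable of (control_id, failure_date) pairs; the window is
    (as_of - window_days, as_of], so it includes as_of itself.
    """
    start = as_of - timedelta(days=window_days)
    counts = {}
    for control_id, day in failures:
        if start < day <= as_of:
            counts[control_id] = counts.get(control_id, 0) + 1
    return sorted(cid for cid, n in counts.items() if n >= threshold)


def overdue_exceptions(exceptions, as_of, severities=("high",), max_age_days=14):
    """Ids of open exceptions in `severities` that have been open longer than max_age_days."""
    return sorted(
        e["exception_id"]
        for e in exceptions
        if e["severity"] in severities
        and e.get("closed_on") is None
        and (as_of - e["opened_on"]).days > max_age_days
    )


# Illustrative data only.
as_of = date(2026, 7, 20)
failures = [
    ("AI-RAG-CITE-004", date(2026, 6, 1)),
    ("AI-RAG-CITE-004", date(2026, 6, 30)),
    ("AI-RAG-CITE-004", date(2026, 7, 15)),
    ("AI-CS-CRM-003", date(2026, 5, 1)),   # outside the 60-day window
    ("AI-CS-CRM-003", date(2026, 7, 1)),
]
exceptions = [
    {"exception_id": "EX-AI-2026-0630-017", "severity": "high",
     "opened_on": date(2026, 6, 30), "closed_on": None},
    {"exception_id": "EX-AI-LEND-2026-0630-004", "severity": "high",
     "opened_on": date(2026, 6, 30), "closed_on": date(2026, 7, 3)},
    {"exception_id": "EX-DEMO-001", "severity": "medium",
     "opened_on": date(2026, 6, 1), "closed_on": None},
]
print(repeat_failures(failures, as_of))        # ['AI-RAG-CITE-004']
print(overdue_exceptions(exceptions, as_of))   # ['EX-AI-2026-0630-017']
```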
Product and operations KRIs:
| KRI | Why it matters | Segments |
|---|---|---|
| Human override rate | High override means AI may be low quality or mistrusted | product, queue, agent role |
| Manual review backlog | HITL control can fail when queue is overloaded | queue, risk tier, region |
| Tool write reversal rate | Indicates harmful system-of-record actions | tool, field, customer segment |
| Complaint escalation miss rate | Direct customer harm and regulatory risk | complaint type, product, channel |
| Fallback route rate | Fallback may bypass controls or degrade quality | model route, vendor, channel |
| Unsupported answer rate | Measures grounding control effectiveness | policy domain, language, channel |
Risk and compliance KRIs:
| KRI | Interpretation | Action when red |
|---|---|---|
| Policy violation count | Output or action breached guardrail | Freeze affected capability |
| Adverse action reason mismatch | Lending explanation not aligned to decision reason | Stop customer-visible path |
| AML typology miss rate | Copilot misses known scenario | Escalate to AML governance |
| PII telemetry failure | Sensitive data logged outside approved fields | Trigger privacy incident assessment |
| Vendor route drift | Requests routed to unapproved model, region or endpoint | Disable route and review vendor controls |
| Evidence missing rate | Audit trail incomplete for material control | Hold release or issue exception |
Metric design rules:
- Use both lagging indicators and leading KRIs.
- Display trend and baseline.
- Segment by use case, channel, product, customer segment, geography and risk tier.
- Separate control failure from business performance decline.
- Tie each red threshold to a named management action.
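The executive dashboard's customer harm signal turns red when a rate rises “above baseline plus appetite”. Segmenting a KRI and checking each segment against that rule can be sketched as below. The sketch assumes each event records its segment and whether it failed the control, and that baseline and appetite are absolute rates. The helper names and the `chat` channel are hypothetical.

```python
from collections import Counter


def segment_rates(events, segment_key, is_failure):
    """Failure rate per segment value, e.g. unsupported-answer rate per channel."""
    totals, failures = Counter(), Counter()
    for event in events:
        segment = event[segment_key]
        totals[segment] += 1
        if is_failure(event):
            failures[segment] += 1
    return {segment: failures[segment] / totals[segment] for segment in totals}


def segments_above_appetite(rates, baseline, appetite):
    """Segments whose rate exceeds baseline plus the approved appetite margin."""
    return sorted(seg for seg, rate in rates.items() if rate > baseline + appetite)


# Illustrative data: 100 contact-center answers (1 unsupported), 50 chat answers (3 unsupported).
events = (
    [{"channel": "contact_center", "supported": i != 0} for i in range(100)]
    + [{"channel": "chat", "supported": i >= 3} for i in range(50)]
)
rates = segment_rates(events, "channel", lambda e: not e["supported"])
print(rates)                                                          # {'contact_center': 0.01, 'chat': 0.06}
print(segments_above_appetite(rates, baseline=0.01, appetite=0.02))  # ['chat']
```

A portfolio-level average would hide this pattern: the blended rate is 4 / 150, about 2.7%, which sits below 3%, while the chat segment alone is at 6%.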
8. Lifecycle Gates
| Gate | Required decisions | Exit criteria |
|---|---|---|
| Use case intake | Is AI use case in CCM scope? Which controls are material? Who owns them? | Control registry entries, owners, evidence contract, KRI appetite |
| Pilot | Are automated tests running? Does traffic carry metadata? Are exceptions created? | No unresolved critical design gap, rollback tested, evidence retrievable |
| Release | Did evals and control tests pass? Are red KRIs absent or accepted? | Release memo includes test results, dashboard live, evidence archived |
| Scale | Did effectiveness remain stable during ramp? Are segments behaving differently? | 14/30-day review complete, open exceptions have action plans |
| Periodic review | Should controls be enhanced, retired or reclassified? Are KRIs still meaningful? | Monthly memo archived, registry updated, repeat failures analyzed |
| Incident review | Did incident expose control design gap? Are customers affected? | containment, root cause, control redesign, retest and customer remediation |
9. Financial Retail Examples
9.1 AML investigation copilot
Use case:
AI helps analysts summarize alerts, retrieve typology guidance, draft suspicious activity narratives and suggest next investigation steps. Human analysts remain accountable for the final decision.
| Control | Continuous test | KRI |
|---|---|---|
| Approved typology grounding | Draft citations reference approved AML corpus | unsupported typology citation rate |
| No autonomous SAR filing | AI cannot submit SAR or close case | autonomous restricted action count |
| Analyst review completeness | Every AI narrative has analyst review event | missing review rate |
| False negative challenge sample | Closed no-SAR cases sampled against typology rubric | challenge failure rate |
| Alert prioritization drift | Monitor distribution by typology and customer risk | typology drift index |
| Evidence retention | Trace links alert, retrieval, prompt, output, analyst edit | evidence completeness rate |
Actions: pause narrative auto-drafting for typologies with high failure rates, add the missed scenario to evals and training, and require supervisor review for the affected segment until the retest passes.
9.2 Lending decision explanation
Use case:
AI generates applicant-facing or banker-facing explanation text based on approved decision reason codes and policy language.
| Control | Continuous test | KRI |
|---|---|---|
| Reason-code consistency | Output reason matches system decision reason | mismatch rate |
| Prohibited wording block | Output cannot imply guaranteed approval after reapplication | prohibited phrase count |
| Fair lending segment review | Sample explanations by segment and product | segment disparity signal |
| Source policy freshness | Explanation uses current policy version | stale policy citation rate |
| Human escalation | Ambiguous or complaint-sensitive cases route to reviewer | missed escalation rate |
| Adverse action evidence | Trace links decision, reason, prompt, policy and output | audit trace completeness |
Actions: disable customer-visible explanation prompt after mismatch breach, require legal/compliance review, increase sample size for product and segment with complaint spike.
9.3 Payments fraud and dispute support
Use case:
AI supports fraud operations by ranking suspicious payment events, drafting dispute summaries and recommending next-best investigation action.
| Control | Continuous test | KRI |
|---|---|---|
| No unapproved payment block | AI recommendation cannot directly block payment | AI-only block count |
| High-value transaction review | Transactions above threshold require human review | missing high-value review rate |
| Tool permission boundary | AI can create case note but cannot reverse payment | restricted tool call attempts |
| Latency fallback control | Fallback route preserves fraud policy gateway | fallback without policy decision count |
| Customer notification accuracy | Generated dispute message matches case status | message-status mismatch rate |
| Reversal monitoring | Track reversed or corrected AI-assisted actions | reversal rate |
Actions: turn off recommendation display for payment type with high reversal rate, update policy gateway, reconcile AI action log to case management system daily.
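The daily action “reconcile AI action log to case management system” is an instance of the Reconciliation test family. Below is a minimal sketch. It assumes both sides share a hypothetical `action_id` key and that case management records are already filtered to those the system attributes to the AI assistant. All field names and values are illustrative.

```python
def reconcile(ai_log, cms_records, key="action_id", compare=("case_id", "action_type")):
    """Reconcile AI-logged actions against the case management system (CMS).

    cms_records is assumed to be pre-filtered to records attributed to the AI assistant.
    """
    ai = {r[key]: r for r in ai_log}
    cms = {r[key]: r for r in cms_records}
    missing_in_cms = sorted(ai.keys() - cms.keys())    # AI says it acted; CMS has no record
    unlogged_in_cms = sorted(cms.keys() - ai.keys())   # CMS change with no AI trace
    mismatched = sorted(
        k for k in ai.keys() & cms.keys()
        if any(ai[k].get(f) != cms[k].get(f) for f in compare)
    )
    return {
        "missing_in_cms": missing_in_cms,
        "unlogged_in_cms": unlogged_in_cms,
        "mismatched": mismatched,
        "passed": not (missing_in_cms or unlogged_in_cms or mismatched),
    }


ai_log = [
    {"action_id": "act-001", "case_id": "CASE-1", "action_type": "case_note"},
    {"action_id": "act-002", "case_id": "CASE-2", "action_type": "case_note"},
    {"action_id": "act-003", "case_id": "CASE-3", "action_type": "case_note"},
]
cms_records = [
    {"action_id": "act-001", "case_id": "CASE-1", "action_type": "case_note"},
    {"action_id": "act-002", "case_id": "CASE-2", "action_type": "payment_reversal"},
    {"action_id": "act-004", "case_id": "CASE-4", "action_type": "case_note"},
]
result = reconcile(ai_log, cms_records)
print(result)
# {'missing_in_cms': ['act-003'], 'unlogged_in_cms': ['act-004'], 'mismatched': ['act-002'], 'passed': False}
```

Some mismatches also bear on another control. For example, a `payment_reversal` in the CMS where the AI log shows only a `case_note` is evidence for the tool permission boundary control too.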
9.4 Customer service agent assistant
Use case:
AI supports contact center employees with policy answers, call summaries, complaint detection and CRM follow-up task creation.
| Control | Continuous test | KRI |
|---|---|---|
| Approved knowledge source | Customer-facing answer cites approved source | unsupported answer rate |
| Complaint escalation | Complaint indicators route to complaint workflow | missed complaint escalation rate |
| CRM write approval | Human approval token required for CRM task | write without approval count |
| Summary accuracy sample | Summaries reviewed against transcript | material omission rate |
| PII redaction | Telemetry redacts sensitive fields | PII leakage count |
| Staff overreliance | Monitor low-edit acceptance in high-risk scenarios | blind acceptance rate |
Actions: limit the assistant to read-only mode during the complaint SOP index issue, retrain agents when blind acceptance increases, and add scenarios using customers' exact complaint language to evals and sampling.
10. Exception, Action and Evidence Design
Severity:
| Severity | Definition | Required response |
|---|---|---|
| Critical | Customer harm, unauthorized action, regulatory breach or material evidence failure | Immediate containment, executive notification, incident assessment |
| High | Material control failed but impact appears bounded | Owner action within SLA, weekly review until closure |
| Medium | Control weakness or threshold breach with limited impact | Action plan and trend monitoring |
| Low | Documentation, evidence freshness or minor process gap | Batch remediation and monthly review |
Lifecycle:
```text
Detect
  -> classify severity
  -> assign owner
  -> contain if needed
  -> analyze root cause
  -> define action
  -> implement fix
  -> retest control
  -> close with evidence
  -> feed learning into eval, control or training
```
Closure requires:
- Root cause documented.
- Affected population identified.
- Customer or operational impact assessed.
- Corrective action implemented.
- Control retest passes.
- Evidence archived.
- Repeat-failure prevention defined.
- Residual risk accepted by the right owner if not fully remediated.
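The closure criteria above work as a checklist that an exception workflow can enforce before a record is marked closed. Below is a minimal sketch. It assumes the exception record is a dict with hypothetical field names. The residual-risk rule applies only when the record is not fully remediated. The sample record `EX-DEMO-002` is invented for illustration.

```python
CLOSURE_REQUIREMENTS = {
    "root_cause": "Root cause documented",
    "affected_population": "Affected population identified",
    "impact_assessment": "Customer or operational impact assessed",
    "corrective_action": "Corrective action implemented",
    "retest_passed": "Control retest passes",
    "evidence_uri": "Evidence archived",
    "repeat_prevention": "Repeat-failure prevention defined",
}


def unmet_closure_requirements(exception: dict) -> list:
    """Return the closure requirements an exception record does not yet satisfy."""
    unmet = [label for field_name, label in CLOSURE_REQUIREMENTS.items()
             if not exception.get(field_name)]
    if not exception.get("fully_remediated"):
        accepted_by = exception.get("residual_risk_accepted_by")
        # Acceptance must exist and come from the designated owner.
        if not accepted_by or accepted_by != exception.get("risk_acceptance_owner"):
            unmet.append("Residual risk accepted by the right owner")
    return unmet


record = {  # hypothetical record
    "exception_id": "EX-DEMO-002",
    "root_cause": "fallback route skipped telemetry schema validation",
    "affected_population": "read-only answer flow on fallback route",
    "impact_assessment": "no customer-visible impact in sampled traces",
    "corrective_action": "schema enforcement added to fallback route",
    "retest_passed": False,
    "evidence_uri": "evidence://exceptions/EX-DEMO-002/closure-pack",
    "repeat_prevention": "",
    "fully_remediated": False,
    "risk_acceptance_owner": "Head of Customer Service Operations Controls",
    "residual_risk_accepted_by": None,
}
print(unmet_closure_requirements(record))
# ['Control retest passes', 'Repeat-failure prevention defined', 'Residual risk accepted by the right owner']
```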
Evidence categories:
| Category | Examples |
|---|---|
| Design evidence | control objective, risk mapping, process map, policy mapping |
| Implementation evidence | configuration, policy-as-code, tool permission, prompt registry |
| Operating evidence | trace events, control test results, sampling records, dashboard snapshots |
| Exception evidence | exception record, root cause, action plan, approval, retest result |
| Management evidence | review minutes, risk acceptance, investment decision, control rating |
| Audit evidence | independent review, reperformance result, evidence quality score |
Evidence contract fields:
```text
trace_id
event_time
use_case_id
risk_tier
control_id
control_test_id
release_id
model_version
prompt_version
rag_index_version
tool_contract_version
policy_ruleset_version
human_review_status
policy_decision
customer_impact_category
evidence_uri
retention_class
redaction_status
control_owner
```
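An evidence contract is testable: each event either carries the required fields or it does not. The share of complete events then feeds the evidence completeness KRI, and the executive dashboard flags Tier 3 coverage below 99%. The sketch below is minimal and assumes every listed field is mandatory for every event. A real contract would likely make some fields, such as `tool_contract_version`, conditional on event type. The helper names are hypothetical.

```python
EVIDENCE_CONTRACT_FIELDS = (
    "trace_id", "event_time", "use_case_id", "risk_tier", "control_id",
    "control_test_id", "release_id", "model_version", "prompt_version",
    "rag_index_version", "tool_contract_version", "policy_ruleset_version",
    "human_review_status", "policy_decision", "customer_impact_category",
    "evidence_uri", "retention_class", "redaction_status", "control_owner",
)


def missing_fields(event: dict, required=EVIDENCE_CONTRACT_FIELDS) -> list:
    """Required evidence fields that are absent, None or empty in one event."""
    return [f for f in required if event.get(f) in (None, "")]


def completeness_rate(events: list, required=EVIDENCE_CONTRACT_FIELDS):
    """Share of events carrying every required field; None when there are no events."""
    if not events:
        return None
    complete = sum(1 for e in events if not missing_fields(e, required))
    return complete / len(events)


complete_event = {f: "x" for f in EVIDENCE_CONTRACT_FIELDS}  # placeholder values
fallback_event = dict(complete_event, rag_index_version=None)
events = [complete_event] * 3 + [fallback_event]
print(missing_fields(fallback_event))  # ['rag_index_version']
rate = completeness_rate(events)
print(rate, rate >= 0.99)              # 0.75 False
```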
Good CCM does not mean storing every raw prompt forever. Store structured evidence where possible, control raw prompt retention by risk and privacy policy, hash identifiers for analytics, use stricter access for lending/AML/fraud/complaint records, and make sampling artifacts retrievable without exposing unnecessary PII.
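For “hash identifiers for analytics”, a keyed hash is safer than a plain one. Identifiers such as account numbers have a small search space, so an unkeyed SHA-256 can be reversed by hashing every candidate value. The sketch below uses HMAC-SHA256 from Python's standard library. The field names and key handling are assumptions. Pseudonymized values may still count as personal data under applicable privacy rules.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Stable keyed hash (HMAC-SHA256) of an identifier, as a 64-character hex string."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_for_analytics(event: dict, key: bytes,
                         id_fields=("customer_id", "account_number")) -> dict:
    """Copy of `event` with identifier fields replaced by keyed hashes (field names hypothetical)."""
    out = dict(event)  # never mutate the source event
    for field_name in id_fields:
        if out.get(field_name) is not None:
            out[field_name] = pseudonymize(str(out[field_name]), key)
    return out


analytics_key = b"example-only-key"  # hypothetical; in practice load from a secrets manager
event = {"trace_id": "trc-9d71f4", "customer_id": "C-100234", "channel": "contact_center"}
safe = redact_for_analytics(event, analytics_key)
print(safe["customer_id"] == pseudonymize("C-100234", analytics_key))  # True
```

The same identifier and key always yield the same token, so analytics can still join and count by customer. Rotating the key breaks those joins across periods.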
11. Templates with Filled Examples
11.1 Control test card
```yaml
control_id: AI-LEND-EXP-002
control_name: Lending explanation must match approved decision reason codes
business_process: unsecured_personal_loan_adverse_action
risk_statement: Customer receives inaccurate or misleading explanation for credit decision
control_owner: Head of Credit Operations Controls
control_activity: Explanation generator uses only approved reason codes and approved policy text
test_method: automated_reason_code_reconciliation plus weekly stratified human sample
population: all AI-generated lending explanations
cadence: real_time automated check and weekly human review
pass_rule: output_reason_codes are subset of decision_engine_reason_codes and policy citation is current
failure_severity: high for mismatch, critical for customer-visible prohibited explanation
kri_link: KRI-AI-LEND-003 reason-code mismatch rate
evidence_source: ai_explanation_events, decision_engine_events, citation_validation_results
management_action: stop customer-visible explanation path when mismatch rate exceeds red threshold
```
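The card's `pass_rule` translates directly into a per-explanation check. Below is a minimal sketch. It assumes reason codes arrive as lists of strings and policy versions as strings. The reason-code values, version labels and `reason_code_test` are hypothetical.

```python
def reason_code_test(output_reason_codes, decision_engine_reason_codes,
                     cited_policy_version, current_policy_version):
    """Apply the card's pass_rule to one explanation.

    Pass when every reason code in the explanation came from the decision engine
    and the cited policy version is the current one.
    """
    unexpected = sorted(set(output_reason_codes) - set(decision_engine_reason_codes))
    policy_current = cited_policy_version == current_policy_version
    return {
        "passed": not unexpected and policy_current,
        "unexpected_reason_codes": unexpected,
        "policy_current": policy_current,
    }


result = reason_code_test(
    output_reason_codes=["R01_DTI", "R07_BUREAU_INQUIRIES"],
    decision_engine_reason_codes=["R01_DTI", "R03_INCOME"],
    cited_policy_version="lending_policy_2026_06",
    current_policy_version="lending_policy_2026_06",
)
print(result)
# {'passed': False, 'unexpected_reason_codes': ['R07_BUREAU_INQUIRIES'], 'policy_current': True}
```

As written, the subset rule catches reasons the decision engine did not produce, which is the failure recorded in 11.2. It does not flag decision reasons that the explanation leaves out. Detecting omissions would need a separate coverage test.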
11.2 Exception record
```yaml
exception_id: EX-AI-LEND-2026-0630-004
opened_on: 2026-06-30
severity: high
control_id: AI-LEND-EXP-002
detected_by: automated_reason_code_reconciliation
finding: 17 explanations contained one reason not present in the decision engine record
affected_population: unsecured personal loan applications in online channel
immediate_containment: customer-visible AI explanation disabled for affected product
owner: Head of Credit Operations Controls
root_cause: prompt v22 allowed summarization of bureau factors outside approved reason-code list
corrective_action: revert to prompt v21, add reason-code schema validation, expand eval set with 80 cases
retest_result: zero mismatches in 1,200 replayed explanations
closure_evidence: evidence://exceptions/EX-AI-LEND-2026-0630-004/closure-pack
closed_on: 2026-07-03
```
11.3 Monthly control effectiveness memo
```markdown
# Monthly AI Control Effectiveness Review: Customer Service Agent Assistant
Review period: 2026-06-01 to 2026-06-30
System: customer_service_agent_assist
Risk tier: Tier3 for CRM write-enabled workflow
Overall rating: Effective with monitored exceptions
Key results:
- CRM write approval control passed for 100% of 18,442 write events.
- Unsupported citation rate was 0.31%, in the amber band (> 0.25% and <= 0.75%).
- Complaint escalation miss rate improved from 0.09% to 0.04%.
- Evidence completeness was 99.72%, with missing fields concentrated in fallback route events.
Open exception:
- EX-AI-CS-2026-0628-011: fallback route missing rag_index_version in 47 traces, due 2026-07-05.
Management actions:
- Knowledge Operations will retest complaint SOP retrieval after index patch.
- Architecture will add schema enforcement for fallback route telemetry.
- Operations will increase complaint-sensitive call sampling in July.
Conclusion:
Controls operated within appetite for the review period, with one telemetry completeness exception under active remediation.
```
11.4 Quarterly owner attestation
```markdown
# Quarterly AI Control Owner Attestation
Control owner: Head of Customer Service Operations Controls
Quarter: 2026 Q2
Controls covered: AI-CS-CRM-003, AI-CS-RAG-004, AI-CS-COMP-006, AI-PRIV-LOG-002
Attestation:
I reviewed control test results, exceptions, management actions, incident linkages and evidence completeness. Based on the evidence reviewed, the controls operated within approved risk appetite for 2026 Q2 except for the documented exception below.
Exception requiring continued monitoring:
- EX-AI-CS-2026-0628-011: fallback route missing rag_index_version, due 2026-07-05.
Required improvements:
- Enforce telemetry schema for fallback routes.
- Increase complaint-sensitive sample size from 100 to 200 per week for July.
Residual risk:
Accepted through 2026-07-15 for telemetry completeness gap in fallback route, limited to read-only answer flow and excluding CRM write events.
```
12. 30-Day Lab
Goal:
Build a portfolio-ready AI Continuous Control Monitoring pack for one financial retail AI use case.
Recommended scenario:
Customer service agent assistant with RAG answers, call summaries, complaint detection and human-approved CRM follow-up task creation.
| Day | Work | Output |
|---|---|---|
| 1 | Choose use case and process boundary | use case scope memo |
| 2 | Identify customer outcomes and operational decisions | decision and outcome map |
| 3 | List 12 material AI risks | risk register excerpt |
| 4 | Define 10 control objectives | control objective table |
| 5 | Assign control owners and RACI | owner matrix |
| 6 | Map controls to NIST AI RMF functions | Govern / Map / Measure / Manage mapping |
| 7 | Write one-page executive framing | CCM executive summary |
| 8 | Convert controls to test methods | control test catalog |
| 9 | Define runtime, test and exception schemas | schema examples |
| 10 | Define evidence fields and retention classes | evidence contract |
| 11 | Design automated checks for tool write, citation and telemetry completeness | test rule spec |
| 12 | Design stratified sampling plan | sampling methodology |
| 13 | Define pass/fail thresholds and severity | severity matrix |
| 14 | Run tabletop test using 20 synthetic events | test result walkthrough |
| 15 | Define 15 KRIs with owners and actions | KRI catalog |
| 16 | Design executive dashboard | dashboard wireframe in markdown |
| 17 | Design operations dashboard | operational metrics spec |
| 18 | Design risk/compliance dashboard | risk view spec |
| 19 | Write exception lifecycle | exception workflow |
| 20 | Write management action rules | action matrix |
| 21 | Draft monthly effectiveness memo | memo sample |
| 22 | Map lifecycle gates from intake to periodic review | gate checklist |
| 23 | Add AML, lending, payments and customer service variations | scenario appendix |
| 24 | Write quarterly owner attestation | attestation sample |
| 25 | Create evidence archive index | evidence map |
| 26 | Add SR 26-2 nuance and GenAI/agentic governance position | regulatory nuance note |
| 27 | Review distinction from control library and evidence binder | positioning note |
| 28 | Write interview 30-second and 2-minute answers | interview pack |
| 29 | Assemble final portfolio pack | PDF-ready markdown package |
| 30 | Conduct self-review against rubric | final quality checklist |
Rubric:
| Dimension | Strong answer |
|---|---|
| Control thinking | Converts risks into control objectives, not generic metrics |
| Testability | Every material control has population, cadence, pass rule and evidence |
| Financial realism | Handles AML, lending, payments, complaints, privacy and operations |
| Architecture | Shows telemetry, schema, test engine, exception workflow and evidence store |
| Management action | Red thresholds trigger named actions and owners |
| Regulatory nuance | Uses SR 26-2 as the current anchor and broader AI governance for GenAI and agents |
| Interview readiness | Clearly explains the distinction from the control library and the audit evidence binder |
13. Interview Answers
13.1 What is AI continuous control monitoring?
30-second answer:
AI continuous control monitoring is the operating system for proving AI controls remain effective after launch. It links control objectives to runtime events, automated tests, sampling, KRIs, exceptions, owners, management actions and evidence. It is different from a static control library because it tests whether controls actually operate over time.
2-minute answer:
In financial retail, AI risk does not stop at release. A lending explanation prompt can drift, a RAG index can lose a critical policy, a customer service agent can gain write access, or a human review queue can become overloaded. Continuous control monitoring turns these risks into recurring control tests. For each material control, I define the owner, population, cadence, evidence fields, pass/fail rule, threshold, exception severity and management action. Then I instrument runtime events so traces carry model, prompt, index, tool, policy, human review and release metadata. Automated checks catch completeness, authorization, threshold and evidence gaps; risk-based sampling catches nuanced customer harm. Dashboards show KRI trends, exception aging, repeat failures and action closure. The goal is operating effectiveness over time, not a one-time audit screenshot.
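The answer above mentions automated checks. A minimal sketch of one of them is a telemetry-completeness test. It treats the period's runtime events as the test population, applies a pass rule and a threshold, and returns the failing trace IDs as exception candidates. The field names, the `run_completeness_test` helper and the 0.99 threshold are illustrative assumptions. Real thresholds come from each control's risk appetite.

```python
# Hypothetical trace fields mirroring the metadata listed above; real key names
# come from the organization's event schema.
REQUIRED_TRACE_FIELDS = ("model", "prompt_version", "index_version", "tool",
                         "policy_decision", "human_review", "release_id")

def missing_fields(event: dict) -> list[str]:
    """Required fields that are absent, None or empty in one runtime event."""
    return [f for f in REQUIRED_TRACE_FIELDS if event.get(f) in (None, "")]

def run_completeness_test(events: list[dict], threshold: float = 0.99) -> dict:
    """Automated control test: the share of complete events must meet the threshold."""
    if not events:
        # An empty population is an evidence gap in its own right, never a pass.
        return {"result": "exception", "rate": None, "failures": []}
    failures = []
    for e in events:
        gaps = missing_fields(e)
        if gaps:
            failures.append((e.get("trace_id"), gaps))
    rate = 1 - len(failures) / len(events)
    return {"result": "pass" if rate >= threshold else "fail",
            "rate": rate, "failures": failures}

# Synthetic population: 98 complete events plus two with gaps.
# Events without a tool call record tool="none" explicitly rather than omitting it.
base = {"model": "model-a", "prompt_version": "p-7", "index_version": "idx-42",
        "tool": "none", "policy_decision": "allow",
        "human_review": "not_required", "release_id": "rel-12"}
events = [dict(base, trace_id=f"t{i}") for i in range(98)]
events.append(dict(base, trace_id="t98", policy_decision=""))
events.append(dict(base, trace_id="t99", release_id=None))
result = run_completeness_test(events)
# result["result"] == "fail": a rate of 0.98 is below the illustrative 0.99 threshold
```

The failing trace IDs then become the exception records that owners work through. Summary dashboards alone would hide them.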
13.2 How is this different from an audit evidence binder?
An audit evidence binder organizes proof for review. Continuous control monitoring generates and tests that proof continuously. The binder answers “what evidence do we have”; CCM answers “did the control operate, did it pass, what failed, who owns the action and is effectiveness improving or degrading.”
13.3 How would you design control tests for an agentic AI system?
I would test the full behavior chain: prompt, model route, retrieval, policy decision, tool permission, human confirmation, system-of-record write, telemetry and exception handling. For every write-enabled tool, I would require authorization tests, idempotency or compensating control evidence, approval-token completeness, reconciliation to the target system and reversal-rate KRIs.
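The write-enabled tool tests described above can be sketched as one pass over agent write events, reconciled against references from the target system of record. This is a hypothetical example. It assumes an event shape (`trace_id`, `tool`, `role`, `approval_token`, `sor_ref`, `reversed`) and a deny-by-default permission map. It covers authorization, approval-token completeness, two-way reconciliation and the reversal-rate KRI. It does not cover idempotency or compensating-control evidence.

```python
# Hypothetical permission map: which agent roles may call which write-enabled tools.
# Tools not listed here get an empty set, so every call to them is unauthorized.
ALLOWED_WRITERS = {
    "update_address": {"servicing_agent"},
    "waive_fee": {"servicing_agent_supervised"},
}

def run_tool_write_tests(writes: list[dict], sor_ids: set[str]) -> dict:
    """Authorization, approval-token and reconciliation checks plus a reversal-rate KRI.

    writes:  agent-side write events (hypothetical fields listed in the lead-in).
    sor_ids: write references actually present in the target system of record.
    """
    exceptions = []
    for w in writes:
        if w["role"] not in ALLOWED_WRITERS.get(w["tool"], set()):
            exceptions.append((w["trace_id"], "unauthorized_tool_write"))
        if not w.get("approval_token"):
            exceptions.append((w["trace_id"], "missing_approval_token"))
        if w.get("sor_ref") not in sor_ids:
            exceptions.append((w["trace_id"], "unreconciled_write"))
    # Reconcile in the other direction: system-of-record writes with no agent trace.
    traced = {w.get("sor_ref") for w in writes}
    for ref in sorted(sor_ids - traced):
        exceptions.append((ref, "untraced_sor_write"))
    reversed_count = sum(1 for w in writes if w.get("reversed"))
    reversal_rate = reversed_count / len(writes) if writes else 0.0
    return {"exceptions": exceptions, "reversal_rate_kri": reversal_rate}

writes = [
    {"trace_id": "t1", "tool": "update_address", "role": "servicing_agent",
     "approval_token": "ap-1", "sor_ref": "sor-1", "reversed": False},
    {"trace_id": "t2", "tool": "waive_fee", "role": "servicing_agent",
     "approval_token": "", "sor_ref": "sor-2", "reversed": True},
    {"trace_id": "t3", "tool": "update_address", "role": "servicing_agent",
     "approval_token": "ap-3", "sor_ref": "sor-3", "reversed": False},
    {"trace_id": "t4", "tool": "update_address", "role": "servicing_agent",
     "approval_token": "ap-4", "sor_ref": "sor-4", "reversed": False},
]
result = run_tool_write_tests(writes, sor_ids={"sor-1", "sor-2", "sor-4", "sor-9"})
```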
13.4 What KRIs matter for customer-facing financial AI?
I would track unsupported answer rate, complaint spike, adverse action reason mismatch, tool write reversal, missing human review, PII telemetry failure, fallback route without policy decision, override spike, exception aging, evidence completeness and repeat control failures. The point is not to collect many metrics, but to connect each KRI to risk appetite and action.
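To connect each KRI to risk appetite and action, the catalog can be sketched as a small table. Every KRI carries an owner, amber and red thresholds, and a named action for red. The KRI names, owners, thresholds and actions below are hypothetical placeholders, not recommended risk-appetite values. The `higher_is_worse` flag handles KRIs such as evidence completeness, where a falling value is the risk signal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KRI:
    name: str
    owner: str
    amber: float                  # reading at which the owner must review the trend
    red: float                    # reading at which the named management action triggers
    action: str                   # named management action for a red status
    higher_is_worse: bool = True  # False for KRIs such as evidence completeness

def evaluate(kri: KRI, value: float) -> tuple[str, str]:
    """Map one KRI reading to (status, required action)."""
    if kri.higher_is_worse:
        is_red, is_amber = value >= kri.red, value >= kri.amber
    else:
        is_red, is_amber = value <= kri.red, value <= kri.amber
    if is_red:
        return "red", f"{kri.owner}: {kri.action}"
    if is_amber:
        return "amber", f"{kri.owner}: review trend and root cause"
    return "green", "none"

catalog = [
    KRI("unsupported_answer_rate", "GenAI Product Owner", amber=0.02, red=0.05,
        action="route affected intents to human agents"),
    KRI("evidence_completeness", "Control Owner", amber=0.98, red=0.95,
        action="open exception and hold next release", higher_is_worse=False),
]
readings = {"unsupported_answer_rate": 0.031, "evidence_completeness": 0.94}
statuses = {k.name: evaluate(k, readings[k.name]) for k in catalog}
```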
13.5 How should SR 26-2 affect GenAI governance?
SR 26-2 superseded SR 11-7 and SR 21-8 on 2026-04-17, so I would not cite SR 11-7 as the current anchor. I would keep the useful principles of risk-based governance, validation, ongoing monitoring and effective challenge, but govern GenAI and agentic AI through a broader AI assurance architecture that also covers NIST AI RMF, ISO 42001, operational controls, privacy, security, third-party risk, customer harm and incident response.
13.6 What would you show in a portfolio?
I would show a CCM pack for one financial retail AI use case: control registry, test catalog, event schema, KRI dashboard, exception workflow, RACI, lifecycle gates, sample monthly effectiveness memo and one incident-to-control-improvement example. That demonstrates PM, BA and architecture depth because it connects customer outcomes, controls, telemetry and management action.
14. Anti-Patterns and Self-Check
| Anti-pattern | Why it fails | Better practice |
|---|---|---|
| Monitoring only latency and cost | Misses customer harm and control failure | Add control KRIs and evidence completeness |
| Using control library as proof | Design does not prove operation | Run recurring tests and preserve results |
| Dashboard without owners | Red indicators do not create action | Bind each KRI to owner, threshold and action |
| All sampling is random | Rare high-risk failures disappear | Use risk-stratified sampling |
| Closing an incident without a control update | Same failure returns | Feed the root cause back into evals, controls and training |
| Treating HITL as magic | Humans can be overloaded or overtrust AI | Monitor queue, edits, overrides and blind acceptance |
| Ignoring telemetry schema | Evidence cannot support audit or replay | Define event fields before release |
| Forcing agents into old SR 11-7 framing | Tool use, autonomy and workflow risk are under-modeled | Use SR 26-2 plus broader AI control assurance |
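The risk-stratified sampling fix can be sketched as follows. The sketch assumes each event already carries a hypothetical `stratum` label and that per-stratum quotas come from the sampling methodology. A fixed seed keeps the sample reproducible as evidence. Small high-risk strata are taken in full. A stratum with no quota raises an error instead of silently dropping out of the test.

```python
import random
from collections import defaultdict

# Illustrative per-stratum quotas; real sizes come from the sampling methodology.
SAMPLE_PLAN = {"adverse_action": 25, "tool_write": 25, "complaint_flag": 25, "routine": 10}

def stratified_sample(events: list[dict], plan: dict[str, int], seed: int = 0) -> list[dict]:
    """Draw up to plan[stratum] events per stratum, taking small strata in full."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn for audit
    by_stratum: dict[str, list[dict]] = defaultdict(list)
    for e in events:
        by_stratum[e["stratum"]].append(e)
    unplanned = set(by_stratum) - set(plan)
    if unplanned:
        raise ValueError(f"strata without a sampling quota: {sorted(unplanned)}")
    sample: list[dict] = []
    for stratum, quota in plan.items():
        pool = by_stratum.get(stratum, [])
        sample.extend(pool if len(pool) <= quota else rng.sample(pool, quota))
    return sample

# Synthetic population dominated by routine traffic.
events = ([{"id": f"r{i}", "stratum": "routine"} for i in range(1000)]
          + [{"id": f"a{i}", "stratum": "adverse_action"} for i in range(8)]
          + [{"id": f"w{i}", "stratum": "tool_write"} for i in range(40)]
          + [{"id": f"c{i}", "stratum": "complaint_flag"} for i in range(3)])
sample = stratified_sample(events, SAMPLE_PLAN, seed=42)
```

This synthetic population has 1,051 events. A simple random sample of the same 46 events would be expected to contain about 0.13 complaint-flag events (46 × 3 / 1,051). That is why rare high-risk strata get their own quota.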
Before claiming maturity, confirm:
- Every material AI risk has a control objective.
- Every material control has an owner and test method.
- Each test has population, cadence, pass rule and severity.
- Runtime events carry the fields needed for testing.
- Evidence is redacted, retained and retrievable.
- KRIs have thresholds and named management actions.
- Exceptions have due dates, owners, root cause and retest evidence.
- Sampling is risk-stratified for high-impact use cases.
- Incidents feed back into controls, evals and training.
- Management review uses trends, not only point-in-time status.
- SR 26-2 is cited as the current MRM anchor, with GenAI and agentic AI governed through broader AI assurance.
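The exception items in this checklist (owner, due date, root cause, retest evidence) can be enforced in code with a closure rule and an aging report. This is a minimal sketch. It assumes a hypothetical `ControlException` record and a hypothetical `evidence://` reference scheme for the evidence store. The dates are synthetic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ControlException:
    exception_id: str
    control_id: str
    owner: str
    severity: str                              # e.g. "high", "medium", "low"
    opened: date
    due: date
    root_cause: str = ""
    retest_evidence_ref: Optional[str] = None  # hypothetical pointer into the evidence store
    closed: Optional[date] = None

def can_close(exc: ControlException) -> bool:
    """Closure rule: both a root cause and retest evidence are required."""
    return bool(exc.root_cause.strip()) and exc.retest_evidence_ref is not None

def aging_report(exceptions: list[ControlException], as_of: date) -> list[dict]:
    """Open exceptions with age in days and an overdue flag, oldest first."""
    rows = [{"id": e.exception_id, "owner": e.owner, "severity": e.severity,
             "age_days": (as_of - e.opened).days, "overdue": as_of > e.due}
            for e in exceptions if e.closed is None]
    return sorted(rows, key=lambda r: r["age_days"], reverse=True)

excs = [
    ControlException("EX-101", "CTL-TOOL-01", "Ops Control Owner", "high",
                     opened=date(2026, 5, 1), due=date(2026, 5, 15)),
    ControlException("EX-102", "CTL-RAG-02", "Content Owner", "medium",
                     opened=date(2026, 5, 20), due=date(2026, 6, 20),
                     root_cause="policy document dropped from index",
                     retest_evidence_ref="evidence://retest/ex-102"),
    ControlException("EX-103", "CTL-HITL-03", "Operations Lead", "low",
                     opened=date(2026, 4, 2), due=date(2026, 4, 30),
                     root_cause="queue staffing gap",
                     retest_evidence_ref="evidence://retest/ex-103",
                     closed=date(2026, 4, 28)),
]
report = aging_report(excs, as_of=date(2026, 6, 1))
```

The report lists EX-101 first as overdue at 31 days. EX-102 follows and is within its due date. The closed EX-103 is excluded.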
15. One-Page Summary
AI Continuous Control Monitoring is where AI governance becomes operational. The control library says what should be controlled; the evidence binder organizes proof; the release gate decides whether a version can go live. CCM asks every day whether the control ran, passed, failed, triggered an exception, received management action, was retested, and can be proven. For a CBAP-level financial retail PM/BA/architect, the differentiator is translating AI behavior into operational controls, tests, KRIs, exceptions, evidence and management decisions that hold up under product pressure, incident pressure and audit pressure.