AI Continuous Control Monitoring
AI Continuous Control Monitoring / Assurance Architecture: A Walkthrough
Audience: AI Product Architect / AI PM / CBAP+ BA / Controls Owner / EvalOps / Model Risk / Internal Audit.
Core question: AI controls cannot be proven only once before launch. After launch, controls, evals, telemetry, incidents, exceptions, owners and audit evidence must be turned into continuously running control tests.
Learning objective: establish the mechanisms for AI continuous control monitoring, a control test taxonomy, a KRI dashboard, an exception workflow, a sampling strategy and management action.
Source Anchors
| Source | Link | Purpose |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Reference for continuous AI risk governance: turn risk identification, measurement, treatment and review into an operating mechanism |
| NIST AIRC AI RMF Functions | https://airc.nist.gov/airmf-resources/airmf/ | Use Govern / Map / Measure / Manage to organize owners, tests, metrics, exceptions and actions |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Use the AI management system to connect operational controls, performance evaluation, management review and continual improvement |
| Federal Reserve SR 26-2 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm | Model risk governance anchor as of 2026-06-30: SR 26-2 superseded SR 11-7 and SR 21-8 on 2026-04-17 |
Important nuance:
SR 26-2 replaced the old SR 11-7 / SR 21-8 framing on 2026-04-17. For GenAI and agentic AI, do not mechanically reuse the old SR 11-7 language; fold traditional model risk principles into a broader framework that spans AI governance, control assurance, operational risk, privacy, security, third-party risk and business controls.
1. Thesis
In one sentence:
Continuous control monitoring asks whether AI controls are still operating effectively over time, not whether a control library once existed or an audit binder once had evidence.
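How the three assets differ:
| Asset | Question it answers | Typical weakness |
|---|---|---|
| AI Control Library | Which control objectives and control activities should exist? | The control design exists, but nobody knows whether it keeps operating |
| Audit Evidence Binder | Which evidence supports launch, audit or regulatory inquiries? | Evidence may be a static snapshot, disconnected from real-time risk |
| Continuous Control Monitoring | Which controls today passed, failed, weakened, carry exceptions, are overdue or need management action? | Requires controls to be expressed as events and metrics, with clear ownership and a review cadence |
The core of AI CCM is not "building more dashboards" but converting each control objective into an executable recurring test.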
2. Why It Matters
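After a retail financial AI system goes live, risk drifts during operation:
- After a RAG knowledge base update, recall of key policy documents drops.
- After agent tool permissions are widened, a customer service assistant that used to be read-only starts writing to the CRM.
- The AML copilot's false negatives rise on new typologies.
- The credit explanation model's reason-code consistency degrades in one segment.
- The HITL queue is overloaded, so the control exists on paper but fails in operation.
- An incident is closed, but its corrective action never enters the eval set.
Questions that need answers: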
Which controls are expected to operate?
What evidence proves they operated?
What test confirms they were effective?
Which exceptions are open?
Who owns remediation?
How does management know control health is improving or degrading?
3. Core Concepts
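| Concept | Definition | AI example |
|---|---|---|
| Control objective | The risk to reduce or the governance goal to meet | Customer-visible answers must be grounded in approved knowledge sources |
| Control activity | The actual control action | RAG citation checker blocks unsupported answers |
| Control test | The method that verifies the control ran and was effective | Sample 200 answers every hour and check citation support |
| Operating effectiveness | Whether the control operated as designed over a period of time | 99% of traces carry complete version tags over 30 days |
| Design effectiveness | Whether the control design itself covers the risk | Logging alone cannot prevent an unauthorized tool write |
| Exception | A control that failed or was not executed on schedule | HITL review backlog exceeds SLA |
| KRI | An indicator that warns that risk is rising | override spike, complaint spike, tool reversal rate |
| Management action | The formal disposition of an exception | Pause ramp, tighten threshold, add evals |
| Evidence freshness | Whether the evidence is recent enough | Trace coverage dashboard refreshes daily |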
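As a minimal sketch of how the operating effectiveness and evidence freshness rows above could become checks, the snippet below computes the share of traces that carry every required version tag and tests whether a refresh date is recent enough. The tag names, the one-day freshness window and the function names are illustrative assumptions, not part of any specific platform.

```python
from datetime import date, timedelta

# Hypothetical tag names; substitute whatever your evidence contract requires.
REQUIRED_TAGS = ("release_id", "control_id", "model_version")

def operating_effectiveness(traces, required=REQUIRED_TAGS):
    """Return the share of traces with a non-empty value for every required tag."""
    if not traces:
        # No evidence at all is treated as 0% rather than as a vacuous pass.
        return 0.0
    complete = sum(1 for t in traces if all(t.get(tag) for tag in required))
    return complete / len(traces)

def is_fresh(last_refreshed, today, max_age_days=1):
    """Evidence counts as fresh if it was refreshed within max_age_days."""
    return today - last_refreshed <= timedelta(days=max_age_days)

# 99 complete traces plus 1 trace that is missing its model_version tag.
traces = [{"release_id": "r1", "control_id": "c1", "model_version": "m1"}] * 99
traces.append({"release_id": "r1", "control_id": "c1"})

rate = operating_effectiveness(traces)
print(f"{rate:.2%}", rate >= 0.99)
print(is_fresh(date(2026, 5, 1), date(2026, 5, 3)))
```

Treating an empty trace population as 0% is a deliberate choice in this sketch: a control with no evidence should surface as a failure, not as a pass.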
4. Architecture Diagram
Control Library / AI Policy / Risk Appetite
-> control registry
-> control test catalog
-> telemetry and evidence contracts
-> event stream: inference, retrieval, tool, human review, incident, exception
-> automated checks: policy, eval, drift, trace completeness, threshold breach
-> sampling engine: stratified review by use case, segment, channel, risk tier
-> exception manager: severity, owner, due date, residual risk, action
-> KRI dashboards: control health, trend, aging, repeat failure, business impact
-> management review: accept, remediate, suspend, enhance, retire
-> assurance evidence pack: test results, exceptions, actions, sign-offs
Key principle: every control has an owner, test method, cadence, exception path, escalation route and dashboard metric mapped back to a control objective.
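One way to enforce this principle is to reject registry entries that leave any of those attributes blank. The sketch below assumes a flat registry record; the `ControlRecord` class, its field names and the sample values are hypothetical and only mirror the attributes listed above.

```python
from dataclasses import dataclass, fields

@dataclass
class ControlRecord:
    # Hypothetical registry schema mirroring the attributes named in the text.
    control_id: str
    objective: str
    owner: str
    test_method: str
    cadence: str
    exception_path: str
    escalation_route: str
    dashboard_metric: str

def missing_fields(record):
    """List registry fields that are empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]

record = ControlRecord(
    control_id="AI-CS-CRM-003",
    objective="No CRM write without human confirmation",
    owner="",  # deliberately blank to show the check firing
    test_method="approval token present on every CRM write",
    cadence="real_time",
    exception_path="exception manager",
    escalation_route="",  # also blank
    dashboard_metric="write without approval count",
)
print(missing_fields(record))
```

A registry load step could refuse to activate any control for which `missing_fields` returns a non-empty list.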
5. Control Test Taxonomy
| Test type | What it checks | AI example |
|---|---|---|
| Completeness | Whether every control that should have occurred did occur | 100% Tier 3 tool writes have human confirmation |
| Accuracy | Whether the control outcome is correct | policy classifier blocks prohibited advice |
| Timeliness | Whether the control ran on time | critical incident review within 48 hours |
| Threshold | Whether a metric exceeds risk appetite | hallucination complaint rate below limit |
| Drift | Whether input, output or behavior distributions have shifted | retrieval source mix differs from baseline |
| Sampling | Whether sampled items meet the standard | 50 AML narratives reviewed weekly |
| Reperformance | The second line or internal audit reruns the test | independent eval runner reproduces gate |
| Exception aging | Whether exceptions are overdue | high-severity exceptions open over 14 days |
| Evidence quality | Whether evidence is complete and usable | trace contains release_id, control_id, versions |
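To illustrate one row of the taxonomy, here is a rough sketch of an exception aging test that flags high-severity exceptions still open after 14 days. The record layout (`id`, `severity`, `opened_on`, `closed_on`) and the sample data are assumptions made for the example.

```python
from datetime import date

def overdue_exceptions(exceptions, today, max_age_days=14, severity="high"):
    """Return ids of still-open exceptions of the given severity older than max_age_days."""
    return [
        e["id"]
        for e in exceptions
        if e["severity"] == severity
        and e["closed_on"] is None
        and (today - e["opened_on"]).days > max_age_days
    ]

exceptions = [
    {"id": "EX-1", "severity": "high", "opened_on": date(2026, 5, 1), "closed_on": None},
    {"id": "EX-2", "severity": "high", "opened_on": date(2026, 5, 20), "closed_on": None},
    {"id": "EX-3", "severity": "high", "opened_on": date(2026, 4, 1), "closed_on": date(2026, 4, 10)},
    {"id": "EX-4", "severity": "low", "opened_on": date(2026, 4, 1), "closed_on": None},
]
# Only EX-1 is high severity, still open and older than 14 days.
print(overdue_exceptions(exceptions, today=date(2026, 5, 25)))
```

The other test types follow the same shape: a population, a predicate and a list of failing items that feeds the exception manager.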
6. Financial Retail Case
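Scenario:
A retail bank upgrades its customer service RAG assistant into an agentic service assistant. It can answer policy questions, generate next-step recommendations and, after human confirmation, create CRM follow-up tasks.
| Control | Continuous test | KRI |
|---|---|---|
| Approved knowledge only | Daily check that citations come from the approved corpus manifest | unsupported citation rate |
| Human confirmation before write | Every CRM write must carry a linked human approval token | write without approval count |
| Complaint-sensitive routing | Complaint, fee and account closure issues must be escalated to the human queue | missed escalation rate |
| PII redaction in telemetry | Sample traces to confirm they contain no unredacted PII | redaction failure rate |
| Tool reversal monitoring | Track follow-up tasks that were reversed or corrected | reversal rate by queue |
| Control owner action | High-severity exceptions must be closed or escalated within 14 days | exception aging |
The value of the PM/BA lies in translating "better customer service experience" into controls, tests, KRIs, owners and action rules.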
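Two of the KRIs in the table above can be sketched as simple aggregations over tool events. The event fields (`approval_token_id`, `queue`, `reversed`) and the queue names are hypothetical; a real implementation would read them from the telemetry store.

```python
from collections import defaultdict

def write_without_approval_count(writes):
    """Count CRM writes that carry no approval token."""
    return sum(1 for w in writes if not w.get("approval_token_id"))

def reversal_rate_by_queue(tasks):
    """Share of follow-up tasks per queue that were later reversed or corrected."""
    totals, reversed_counts = defaultdict(int), defaultdict(int)
    for t in tasks:
        totals[t["queue"]] += 1
        if t["reversed"]:
            reversed_counts[t["queue"]] += 1
    return {q: reversed_counts[q] / totals[q] for q in totals}

writes = [
    {"trace_id": "t1", "approval_token_id": "tok-1"},
    {"trace_id": "t2", "approval_token_id": None},
]
tasks = [
    {"queue": "billing", "reversed": True},
    {"queue": "billing", "reversed": False},
    {"queue": "billing", "reversed": False},
    {"queue": "billing", "reversed": False},
    {"queue": "cards", "reversed": False},
    {"queue": "cards", "reversed": False},
]
print(write_without_approval_count(writes))
print(reversal_rate_by_queue(tasks))
```

Breaking the reversal rate out by queue keeps a problem in one small queue from being averaged away by a large healthy one.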
7. PM / BA / Architect Checklist
- Identify customer decisions, employee workflows and system-of-record writes the AI can influence.
- Map each material risk to a control objective and control owner.
- Convert each control objective into one or more control tests.
- Define cadence: real-time, hourly, daily, weekly, monthly or release-triggered.
- Define evidence contract: which event fields prove the control operated.
- Define test population: all events, risk-based sample, stratified sample or exception sample.
- Define pass/fail logic with thresholds and severity levels.
- Define exception workflow: owner, due date, residual risk acceptance and escalation.
- Define management actions: pause, rollback, tighten threshold, increase HITL, update eval, retrain staff.
- Validate that GenAI/agentic controls are governed in the broader AI assurance system, not forced into old MRM language alone.
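The "stratified sample" option for the test population in the checklist above could look roughly like this: group events by a risk attribute and draw a fixed quota from each group, so that small high-risk strata are not crowded out. The `risk_tier` field, the quotas and the fixed seed are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(events, key, per_stratum, seed=0):
    """Draw up to per_stratum[name] events from each stratum, without replacement."""
    rng = random.Random(seed)  # fixed seed so a reviewer can reproduce the draw
    strata = defaultdict(list)
    for e in events:
        strata[e[key]].append(e)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        k = min(per_stratum.get(name, 0), len(group))  # cap at the stratum size
        sample.extend(rng.sample(group, k))
    return sample

events = (
    [{"id": f"a{i}", "risk_tier": "tier1"} for i in range(10)]
    + [{"id": f"b{i}", "risk_tier": "tier2"} for i in range(5)]
    + [{"id": f"c{i}", "risk_tier": "tier3"} for i in range(2)]
)
sample = stratified_sample(events, "risk_tier", {"tier1": 2, "tier2": 2, "tier3": 5})
print(Counter(e["risk_tier"] for e in sample))
```

Here the highest tier asks for five items but only has two, so every tier 3 event is reviewed; that is often the intended behavior for rare, high-impact cases.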
8. Code-Lite Experiment
Control test as data:
control_id: AI-CS-CRM-003
control_name: Human confirmation required before CRM write
owner: Customer Service Operations
cadence: real_time
population: all crm.write tool calls
pass_rule: approval_token_present = true and approver_role in allowed_roles
severity_if_failed: high
evidence_fields: [trace_id, release_id, tool_contract_version, approval_token_id, approver_role, customer_segment]
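A minimal sketch of how the `pass_rule` above might be evaluated against a single tool event follows. It assumes "approval token present" means a non-empty `approval_token_id`, and the contents of the allowed-roles set are hypothetical.

```python
# Hypothetical role names; the real list would come from the control registry.
ALLOWED_ROLES = {"service_agent", "team_lead"}

def passes_control(event, allowed_roles=ALLOWED_ROLES):
    """True when the write has an approval token and an approver in an allowed role."""
    has_token = bool(event.get("approval_token_id"))
    return has_token and event.get("approver_role") in allowed_roles

events = [
    {"trace_id": "t1", "approval_token_id": "tok-1", "approver_role": "service_agent"},
    {"trace_id": "t2", "approval_token_id": None, "approver_role": "service_agent"},
    {"trace_id": "t3", "approval_token_id": "tok-3", "approver_role": "intern"},
]
failed = [e["trace_id"] for e in events if not passes_control(e)]
print(failed)
```

Keeping the rule as a pure function of one event makes it usable both in real time and in a second-line reperformance run over historical events.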
SQL-like check:
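-- Days on which at least one CRM write lacked a human approval token
select
  date_trunc('day', action_timestamp) as day,
  count(*) as write_count,
  sum(case when approval_token_id is null then 1 else 0 end) as failed_control_count
from ai_tool_events
where tool_name = 'crm.createFollowUpTask'
group by 1
-- repeat the aggregate: not every SQL dialect allows a select alias in HAVING
having sum(case when approval_token_id is null then 1 else 0 end) > 0;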
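To try the check without a warehouse, the query can be approximated against an in-memory SQLite table. This is a sketch under assumptions: SQLite has no `date_trunc`, so `date()` stands in for it, and the sample rows and the `crm.readCustomer` tool name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ai_tool_events (tool_name text, action_timestamp text, approval_token_id text)"
)
conn.executemany(
    "insert into ai_tool_events values (?, ?, ?)",
    [
        ("crm.createFollowUpTask", "2026-05-01T09:00:00", "tok-1"),
        ("crm.createFollowUpTask", "2026-05-01T10:00:00", None),  # unapproved write
        ("crm.createFollowUpTask", "2026-05-02T09:00:00", "tok-2"),
        ("crm.readCustomer", "2026-05-02T09:05:00", None),  # a read, out of scope
    ],
)

# date() replaces date_trunc('day', ...) because this runs on SQLite.
rows = conn.execute(
    """
    select date(action_timestamp) as day,
           count(*) as write_count,
           sum(case when approval_token_id is null then 1 else 0 end) as failed_control_count
    from ai_tool_events
    where tool_name = 'crm.createFollowUpTask'
    group by 1
    having sum(case when approval_token_id is null then 1 else 0 end) > 0
    """
).fetchall()
print(rows)
conn.close()
```

Only the day containing the unapproved write is returned; days with clean writes and rows for other tools are filtered out.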
Automation:
failed_control_count > 0
-> create high-severity exception
-> notify control owner
-> freeze write-enabled ramp
-> sample affected customer records
-> close only after root cause and corrective action evidence
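The automation chain above can be sketched as a small handler that opens an exception, records the follow-up actions and refuses closure until root cause and corrective evidence are present. The class, field and action names are hypothetical, and the 14-day due date reuses the high-severity window from the financial retail case.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

@dataclass
class ControlException:
    control_id: str
    severity: str
    owner: str
    opened_on: date
    due_on: date
    actions: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None
    corrective_evidence: Optional[str] = None

    def can_close(self):
        """Closure requires both a root cause and corrective action evidence."""
        return bool(self.root_cause) and bool(self.corrective_evidence)

def handle_failed_control(failed_count, control_id, owner, today):
    """Open a high-severity exception when any control failure is detected."""
    if failed_count <= 0:
        return None
    exc = ControlException(control_id, "high", owner, today, today + timedelta(days=14))
    # In this sketch the actions are only recorded, not executed.
    exc.actions += [
        "notify_control_owner",
        "freeze_write_enabled_ramp",
        "sample_affected_customer_records",
    ]
    return exc

exc = handle_failed_control(1, "AI-CS-CRM-003", "Customer Service Operations", date(2026, 5, 1))
print(exc.due_on, exc.can_close())
exc.root_cause = "approval check skipped on retry path"
exc.corrective_evidence = "retest after fix"
print(exc.can_close())
```

In a production setup each recorded action would call the relevant system (paging, feature flag, sampling job); here they are strings so the control logic stays visible.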
9. Interview Questions
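Q1: How does continuous control monitoring differ from an audit evidence binder?
An evidence binder shows which governance evidence existed for a given version at a given point in time. Continuous control monitoring shows whether controls keep operating in production, whether they are effective, whether exceptions have occurred and whether management has dealt with those exceptions. The former is about organizing evidence; the latter is about operational control and assurance cadence.
Q2: How would you design an AI control dashboard?
I would design the dashboard around control objectives rather than stacking metrics by technical component. The core views include control pass rate, critical failure, exception aging, KRI trend, sampling result, human override, incident linkage, evidence completeness and management action closure.
Q3: How should SR 26-2 be applied to GenAI / agentic AI?
SR 26-2 superseded SR 11-7 and SR 21-8 on 2026-04-17, and the principles of risk-based governance, independent challenge, ongoing monitoring and effective controls remain valuable. But GenAI and agentic AI should not be forced into the old SR 11-7 template; combine NIST AI RMF, ISO 42001, operational risk, privacy, security, third-party risk and business control assurance to build a broader governance system.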
10. Pitfalls
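| Pitfall | Consequence | Correction |
|---|---|---|
| Turning CCM into a wall of metrics | Many metrics, nobody accountable | Bind every metric to an objective, owner and action |
| Testing only model metrics | Misses prompt, RAG, tool, workflow and human review | Design tests along the AI behavior chain |
| Looking only at pass rate | Overlooks high-impact, low-frequency failures | Report critical scenario failures separately |
| Sampling without risk stratification | Long-tail, high-risk customers are masked by averages | Stratified sampling by segment and risk tier |
| Exceptions without a closed loop | The same kind of problem keeps recurring | Action plan, aging, root cause, effectiveness retest |
| Unclear control owner | The dashboard is red but nobody acts | Put RACI and the escalation route into the registry |
| Wrapping all AI in old SR 11-7 language | GenAI/agentic risks are misclassified | Use a broader AI governance and assurance architecture |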