
AI Continuous Control Monitoring


AI Continuous Control Monitoring / Assurance Architecture: Reading Notes

Audience: AI Product Architect / AI PM / CBAP+ BA / Controls Owner / EvalOps / Model Risk / Internal Audit.

Core problem: AI controls cannot be proven only once before launch. After go-live, controls, evals, telemetry, incidents, exceptions, owners and audit evidence must be turned into continuously running control tests.

Learning objectives: establish AI continuous control monitoring, a control test taxonomy, a KRI dashboard, an exception workflow, a sampling strategy and a management action mechanism.


Source Anchors

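| Source | Link | Purpose |
| --- | --- | --- |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Reference for continuous AI risk governance: turns risk identification, measurement, treatment and review into an operating mechanism |
| NIST AIRC AI RMF Functions | https://airc.nist.gov/airmf-resources/airmf/ | Uses Govern / Map / Measure / Manage to organize owners, tests, metrics, exceptions and actions |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Uses an AI management system to connect operational controls, performance evaluation, management review and continual improvement |
| Federal Reserve SR 26-2 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm | Model risk governance anchor as of 2026-06-30: SR 26-2 superseded SR 11-7 and SR 21-8 on 2026-04-17 |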

Important nuance:

SR 26-2 replaced the old SR 11-7 / SR 21-8 framing on 2026-04-17. For GenAI and agentic AI, do not mechanically reuse old SR 11-7 language; fold traditional model risk principles into a broader framework that covers AI governance, control assurance, operational risk, privacy, security, third-party risk and business controls.


1. Thesis

In one sentence:

Continuous control monitoring asks whether AI controls are still operating effectively over time, not whether a control library once existed or an audit binder once had evidence.

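How the three assets differ:

| Asset | Question it answers | Typical weakness |
| --- | --- | --- |
| AI Control Library | Which control objectives and control activities should exist? | The control design exists, but nobody knows whether it keeps operating |
| Audit Evidence Binder | Which evidence supports launch, audit or regulatory inquiries? | Evidence may be a static snapshot, disconnected from live risk |
| Continuous Control Monitoring | Which controls today passed, failed, weakened, are under exception, are overdue or need management action? | Requires event instrumentation, metrics, assigned ownership and a review cadence |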

The core of AI CCM is not "building more dashboards" but converting each control objective into an executable recurring test.


2. Why It Matters

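Once a retail financial services AI system is live, its risks drift in operation:

  • After a RAG knowledge base update, recall of key policy documents drops.
  • After agent tool permissions are widened, a customer service assistant that was read-only starts writing to the CRM.
  • The AML copilot's false negative rate rises on new typologies.
  • The credit explanation model's reason-code consistency degrades in a particular segment.
  • The HITL queue is overloaded, so the control exists on paper but fails in operation.
  • An incident is closed, but its corrective action never makes it into the eval set.

Questions that need answers: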

Which controls are expected to operate?
What evidence proves they operated?
What test confirms they were effective?
Which exceptions are open?
Who owns remediation?
How does management know control health is improving or degrading?

3. Core Concepts

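| Concept | Definition | AI example |
| --- | --- | --- |
| Control objective | The risk to reduce or the governance goal to meet | Customer-visible answers must be grounded in approved knowledge sources |
| Control activity | The actual control action | RAG citation checker blocks unsupported answer |
| Control test | The method that verifies a control is operating and effective | Sample 200 answers every hour and check citation support |
| Operating effectiveness | Whether the control operated as designed over a period | 99% of traces carry complete version tags over 30 days |
| Design effectiveness | Whether the control design itself covers the risk | Logging alone cannot prevent an unauthorized tool write |
| Exception | A control that failed or was not executed on schedule | HITL review backlog exceeds SLA |
| KRI | An indicator that warns risk is rising | override spike, complaint spike, tool reversal rate |
| Management action | The formal disposition of an exception | Pause ramp, tighten threshold, add evals |
| Evidence freshness | Whether evidence is recent enough | Trace coverage dashboard refreshes daily |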
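
The operating effectiveness row can be made concrete with a small windowed check. This is a minimal sketch, not a reference implementation: the trace dictionaries, the `day` field and the `model_version` tag are assumptions for illustration (the source only says "complete version tags"), and the 30-day window and 99% target are taken from the table's example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical tag names; the source only requires "complete version tags".
REQUIRED_TAGS = ("release_id", "control_id", "model_version")


@dataclass
class WindowResult:
    total: int
    complete: int
    rate: float
    effective: bool


def operating_effectiveness(traces, as_of, window_days=30, target=0.99):
    """Share of traces in the trailing window that carry every required tag."""
    start = as_of - timedelta(days=window_days)
    in_window = [t for t in traces if start < t["day"] <= as_of]
    complete = sum(1 for t in in_window if all(t.get(k) for k in REQUIRED_TAGS))
    total = len(in_window)
    rate = complete / total if total else 0.0
    # An empty window counts as not effective: no evidence, no assurance.
    return WindowResult(total, complete, rate, total > 0 and rate >= target)
```

Treating an empty window as a failure is a design choice: a control with no evidence in the period should surface as an exception rather than silently pass.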

4. Architecture Diagram

Control Library / AI Policy / Risk Appetite
  -> control registry
  -> control test catalog
  -> telemetry and evidence contracts
  -> event stream: inference, retrieval, tool, human review, incident, exception
  -> automated checks: policy, eval, drift, trace completeness, threshold breach
  -> sampling engine: stratified review by use case, segment, channel, risk tier
  -> exception manager: severity, owner, due date, residual risk, action
  -> KRI dashboards: control health, trend, aging, repeat failure, business impact
  -> management review: accept, remediate, suspend, enhance, retire
  -> assurance evidence pack: test results, exceptions, actions, sign-offs

Key principle: every control has an owner, test method, cadence, exception path, escalation route and dashboard metric mapped back to a control objective.
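
The key principle can itself be run as a recurring test over the control registry. The sketch below assumes a registry shaped as a dictionary keyed by control ID; the field names are hypothetical and simply mirror the attributes listed in the principle.

```python
# Hypothetical field names mirroring the attributes named in the key principle.
REQUIRED_FIELDS = (
    "control_objective",
    "owner",
    "test_method",
    "cadence",
    "exception_path",
    "escalation_route",
    "dashboard_metric",
)


def registry_gaps(registry):
    """Return {control_id: [missing fields]} for incomplete registry entries."""
    gaps = {}
    for control_id, entry in registry.items():
        # A field that is absent, empty or None counts as missing.
        missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
        if missing:
            gaps[control_id] = missing
    return gaps
```

An empty result means every control is fully mapped; any non-empty result is a design gap to resolve before the control's operating tests are worth trusting.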


5. Control Test Taxonomy

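| Test type | What it checks | AI example |
| --- | --- | --- |
| Completeness | Whether a control that should have occurred did occur | 100% Tier 3 tool writes have human confirmation |
| Accuracy | Whether the control outcome is correct | policy classifier blocks prohibited advice |
| Timeliness | Whether the control ran on time | critical incident review within 48 hours |
| Threshold | Whether a metric exceeds appetite | hallucination complaint rate below limit |
| Drift | Whether input, output or behavior distributions have shifted | retrieval source mix differs from baseline |
| Sampling | Whether sampled items meet the standard | 50 AML narratives reviewed weekly |
| Reperformance | Second line or internal audit reruns the test | independent eval runner reproduces gate |
| Exception aging | Whether exceptions are overdue | high-severity exceptions open over 14 days |
| Evidence quality | Whether evidence is complete and usable | trace contains release_id, control_id, versions |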
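
As one illustration of the drift row, the retrieval source mix can be compared against a baseline with total variation distance. The metric is chosen here for simplicity and is not prescribed by the source; the function names and the 0.2 limit are assumptions.

```python
from collections import Counter


def source_mix(sources):
    """Turn a list of retrieved source labels into a probability distribution."""
    if not sources:
        raise ValueError("cannot compute a source mix from zero retrievals")
    counts = Counter(sources)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in labels)


def drift_test(baseline_sources, current_sources, limit=0.2):
    """Pass when the current retrieval mix stays within `limit` of the baseline."""
    tvd = total_variation_distance(source_mix(baseline_sources), source_mix(current_sources))
    return {"tvd": tvd, "passed": tvd <= limit}
```

For example, a baseline of 80% policy documents and 20% FAQ entries against a current mix of 50% each gives a distance of about 0.3, which fails a 0.2 limit.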

6. Financial Retail Case

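Scenario:

A retail bank upgrades its customer service RAG assistant into an agentic service assistant. It can answer policy questions, generate next-step recommendations and, after human confirmation, create CRM follow-up tasks.

| Control | Continuous test | KRI |
| --- | --- | --- |
| Approved knowledge only | Daily check that citations come from the approved corpus manifest | unsupported citation rate |
| Human confirmation before write | Every CRM write must have a linked human approval token | write without approval count |
| Complaint-sensitive routing | Complaint, fee and account closure questions must be escalated to a human queue | missed escalation rate |
| PII redaction in telemetry | Sample traces to check they contain no unredacted PII | redaction failure rate |
| Tool reversal monitoring | Track follow-up tasks that were reversed or corrected | reversal rate by queue |
| Control owner action | High-severity exceptions must be closed or escalated within 14 days | exception aging |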

The value of the PM/BA lies in translating "better customer service experience" into controls, tests, KRIs, owners and action rules.
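
To show what such a translation looks like for the "Approved knowledge only" control, the sketch below computes the unsupported citation rate from a day's answers. The answer structure (a `citations` list of document IDs) and the manifest as a plain collection of IDs are assumptions; counting an answer with no citations as unsupported is also a choice made here, not something the source specifies.

```python
def unsupported_citation_rate(answers, approved_manifest):
    """Fraction of answers with no citation, or with any citation outside the approved corpus."""
    if not answers:
        return 0.0
    approved = set(approved_manifest)
    unsupported = sum(
        1
        for answer in answers
        if not answer["citations"]
        or any(doc_id not in approved for doc_id in answer["citations"])
    )
    return unsupported / len(answers)
```

The resulting rate is the KRI; the control test is the comparison of that rate against whatever limit the risk appetite sets.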


7. PM / BA / Architect Checklist

  • Identify customer decisions, employee workflows and system-of-record writes the AI can influence.
  • Map each material risk to a control objective and control owner.
  • Convert each control objective into one or more control tests.
  • Define cadence: real-time, hourly, daily, weekly, monthly or release-triggered.
  • Define evidence contract: which event fields prove the control operated.
  • Define test population: all events, risk-based sample, stratified sample or exception sample.
  • Define pass/fail logic with thresholds and severity levels.
  • Define exception workflow: owner, due date, residual risk acceptance and escalation.
  • Define management actions: pause, rollback, tighten threshold, increase HITL, update eval, retrain staff.
  • Validate that GenAI/agentic controls are governed in the broader AI assurance system, not forced into old MRM language alone.
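
The "test population" item is where stratified sampling comes in. A minimal sketch, assuming events are dictionaries and the stratum is derived by a caller-supplied key function (for example segment plus risk tier); the fixed per-stratum quota and the seed parameter are illustrative choices.

```python
import random
from collections import defaultdict


def stratified_sample(events, key, per_stratum, seed=0):
    """Draw up to `per_stratum` events from each stratum so small strata are not averaged away."""
    rng = random.Random(seed)  # seeded so a reviewer can reproduce the draw
    strata = defaultdict(list)
    for event in events:
        strata[key(event)].append(event)
    sample = {}
    for name in sorted(strata):
        members = strata[name]
        # Small strata are reviewed in full rather than raising on an oversized request.
        sample[name] = rng.sample(members, min(per_stratum, len(members)))
    return sample
```

With 100 low-tier and 3 high-tier events and a quota of 5, the high tier is reviewed in full while the low tier contributes 5 events, which is the point of stratifying: the rare, high-risk stratum is never diluted.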

8. Code-Lite Experiment

Control test as data:

control_id: AI-CS-CRM-003
control_name: Human confirmation required before CRM write
owner: Customer Service Operations
cadence: real_time
population: all crm.write tool calls
pass_rule: approval_token_present = true and approver_role in allowed_roles
severity_if_failed: high
evidence_fields: [trace_id, release_id, tool_contract_version, approval_token_id, approver_role, customer_segment]
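
The `pass_rule` in this definition can be evaluated per event. The sketch below is one possible reading of it: the event is assumed to be a dictionary using the `evidence_fields` names, and the role names in `ALLOWED_ROLES` are hypothetical placeholders, since the source does not list the allowed roles.

```python
# Hypothetical role names; the source refers to allowed_roles without listing them.
ALLOWED_ROLES = frozenset({"cs_agent", "cs_supervisor"})


def check_crm_write(event, allowed_roles=ALLOWED_ROLES):
    """Evaluate AI-CS-CRM-003 for one crm.write event and report why it failed, if it did."""
    reasons = []
    if not event.get("approval_token_id"):
        reasons.append("missing approval token")
    if event.get("approver_role") not in allowed_roles:
        reasons.append("approver role not allowed")
    passed = not reasons
    return {
        "control_id": "AI-CS-CRM-003",
        "trace_id": event.get("trace_id"),
        "passed": passed,
        # severity_if_failed is "high" in the control definition.
        "severity": None if passed else "high",
        "reasons": reasons,
    }
```

Returning the failure reasons alongside the verdict keeps the result usable as evidence: the exception record can cite exactly which half of the rule was violated.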

SQL-like check:

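-- Daily count of CRM writes that lack a human approval token.
-- Covers only the token half of pass_rule; approver_role needs its own predicate.
select
  date_trunc('day', action_timestamp) as day,
  count(*) as write_count,
  sum(case when approval_token_id is null then 1 else 0 end) as failed_control_count
from ai_tool_events
where tool_name = 'crm.createFollowUpTask'
group by 1
-- Repeat the aggregate: some engines (e.g. PostgreSQL) reject select aliases in HAVING.
having sum(case when approval_token_id is null then 1 else 0 end) > 0;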
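
The check above can be exercised end to end against an in-memory SQLite database. This is a sketch under stated assumptions: SQLite has no `date_trunc`, so `date(action_timestamp)` stands in for it; the three-column table layout is a simplification of `ai_tool_events`; and the aggregate is spelled out in `HAVING` so the same query shape ports to engines that do not accept select aliases there.

```python
import sqlite3

QUERY = """
    select date(action_timestamp) as day,
           count(*) as write_count,
           sum(case when approval_token_id is null then 1 else 0 end) as failed_control_count
    from ai_tool_events
    where tool_name = 'crm.createFollowUpTask'
    group by date(action_timestamp)
    having sum(case when approval_token_id is null then 1 else 0 end) > 0
    order by day
"""


def failed_days(rows):
    """Load (tool_name, action_timestamp, approval_token_id) rows and return failing days."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table ai_tool_events "
            "(tool_name text, action_timestamp text, approval_token_id text)"
        )
        conn.executemany("insert into ai_tool_events values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Each returned tuple is `(day, write_count, failed_control_count)`; days on which every write carried a token do not appear at all.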

Automation:

failed_control_count > 0
  -> create high-severity exception
  -> notify control owner
  -> freeze write-enabled ramp
  -> sample affected customer records
  -> close only after root cause and corrective action evidence
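
The automation chain can be sketched as a small exception object plus a trigger. Everything here is illustrative: the class, field and function names are assumptions, `notify` is a caller-supplied callback rather than a real messaging API, and the "sample affected customer records" step is left out because it depends on the bank's own data access.

```python
from dataclasses import dataclass, field


@dataclass
class ControlException:
    control_id: str
    severity: str
    owner: str
    ramp_frozen: bool = False
    root_cause: str = ""
    corrective_action_evidence: list = field(default_factory=list)
    status: str = "open"

    def close(self):
        """Closing requires both a root cause and corrective action evidence."""
        if not self.root_cause or not self.corrective_action_evidence:
            raise ValueError("root cause and corrective action evidence are required to close")
        self.status = "closed"


def on_failed_control(failed_control_count, control_id, owner, notify):
    """Open a high-severity exception, freeze the ramp and notify the owner on any failure."""
    if failed_control_count <= 0:
        return None
    exception = ControlException(control_id, "high", owner, ramp_frozen=True)
    notify(owner, f"{control_id}: {failed_control_count} failed control event(s)")
    return exception
```

Putting the closure rule inside `close()` means an exception cannot be marked done by status edit alone, which is the behavior the last step of the chain asks for.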

9. Interview Questions

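Q1: What is the difference between continuous control monitoring and an audit evidence binder?

An evidence binder proves which governance evidence existed for a given version at a given point in time. Continuous control monitoring proves whether controls keep operating in production, whether they are effective, whether exceptions have occurred and whether management has dealt with those exceptions. The former is about organizing evidence; the latter is about operational control and assurance cadence.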

Q2: How would you design an AI control dashboard?

I would design the dashboard around control objectives rather than stacking metrics by technical component. The core views are control pass rate, critical failures, exception aging, KRI trend, sampling results, human overrides, incident linkage, evidence completeness and management action closure.
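
Three of those views (pass rate, critical failures and exception aging) can be computed from plain records. A minimal sketch, assuming test results carry `passed` and `critical` flags and exceptions carry `severity`, `opened_on` and `closed_on`; all of these field names are hypothetical, and the 14-day limit is taken from the document's own exception aging example.

```python
from datetime import date


def control_health(results, exceptions, today, aging_limit_days=14):
    """Summarize one control objective: pass rate, critical failures, overdue exceptions."""
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    # Critical failures are counted on their own so a high pass rate cannot hide them.
    critical_failures = sum(1 for r in results if not r["passed"] and r.get("critical"))
    aged = sum(
        1
        for e in exceptions
        if e["severity"] == "high"
        and e["closed_on"] is None
        and (today - e["opened_on"]).days > aging_limit_days
    )
    return {
        "pass_rate": passed / total if total else None,
        "critical_failure_count": critical_failures,
        "aged_high_severity_exceptions": aged,
    }
```

A control with 999 passes and one critical failure reports a 99.9% pass rate and a critical failure count of 1, so the dashboard shows both numbers side by side instead of only the reassuring one.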

Q3: How should SR 26-2 be applied to GenAI / agentic AI?

SR 26-2 superseded SR 11-7 and SR 21-8 on 2026-04-17, and its risk-based governance, independent challenge, ongoing monitoring and effective controls remain valuable. But GenAI and agentic AI should not be forced into the old SR 11-7 template; combine NIST AI RMF, ISO 42001, operational risk, privacy, security, third-party risk and business control assurance to build a broader governance system.


10. Pitfalls

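| Pitfall | Consequence | Correction |
| --- | --- | --- |
| Turning CCM into a metrics wall | Many metrics, nobody accountable | Bind every metric to an objective, owner and action |
| Testing only model metrics | Misses prompt, RAG, tool, workflow and human review | Design tests along the AI behavior chain |
| Looking only at pass rate | Overlooks high-impact, low-frequency failures | Report critical scenario failures separately |
| Sampling without risk stratification | Long-tail high-risk customers are masked by averages | Stratified sampling by segment and risk tier |
| Exceptions without closure | The same kind of problem keeps recurring | Action plan, aging, root cause, effectiveness retest |
| Unclear control owner | Dashboard shows red but nobody acts | Put RACI and escalation route into the registry |
| Wrapping all AI in old SR 11-7 language | GenAI/agentic risks get misclassified | Use a broader AI governance and assurance architecture |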