AI Management Information: Board Reporting Architecture
AI Management Information / Board Reporting Architecture: Explainer

Audience: AI Product Lead / Senior BA / Enterprise Architect / Risk Product Lead / Model Risk Partner.

Core question: The board and audit committee do not need more AI project status; they need Management Information that supports oversight, risk appetite, investment, stop decisions, remediation and accountability.

Learning goal: Turn AI telemetry, value metrics, control effectiveness, incidents, customer harm, model/vendor concentration, adoption and risk appetite into an MI architecture with lineage, thresholds, owners, cadence and decision-usefulness.
Source Anchors
| Source | Link | Purpose |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organizes the AI risk lifecycle, impact, controls and governance reporting. |
| NIST AIRC AI RMF functions | https://airc.nist.gov/airmf-resources/airmf/ | Use Govern / Map / Measure / Manage to design the MI taxonomy, metric owners and action loop. |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Use the AI management system, performance evaluation, management review and continual improvement to connect MI to the AIMS. |
| Federal Reserve SR 26-2 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm | Use the risk-based, materiality, inventory, monitoring and governance thinking of the 2026 revised model-risk guidance to calibrate MI for financial institutions. |
| OCC Model Risk Management Handbook legacy link | https://www.occ.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/index-model-risk-management.html | Legacy context; the current OCC page now redirects, so the current model-risk anchor should shift to SR 26-2 / OCC 2026 guidance. |
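
In one sentence:
An AI Board Reporting Architecture turns the facts, risks, controls, value and action records of the AI portfolio into a traceable management information product; it is not a weekly project report reformatted for the board.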
1. Thesis
The AI governance pack answers "what should the board oversee". The AI management information architecture answers "how are the facts the board relies on generated, validated, traced and turned into action". Key questions:
- What facts feed board oversight?
- How is each metric defined?
- Where did the number come from?
- Which threshold makes it reportable?
- Who owns remediation?
- What decision can be made from it?

Without an MI architecture, the board pack degrades into:
| Weak material | Why it fails |
|---|---|
| Innovation showcase | Has demos, but material risk, customer harm and control failure stay invisible. |
| Project status report | Shows progress, but does not support risk appetite, scale/stop or funding decisions. |
| Manual evidence collage | The PPT is assembled by hand every quarter, metric definitions drift, and audit cannot trace the numbers. |

The chain of a mature MI:
metric contract
-> telemetry / evidence source
-> lineage and quality check
-> risk appetite threshold
-> report view
-> management action
-> board decision / challenge
2. Why It Matters
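The hard part of management information for AI in retail financial services is not "missing data" but "data that cannot be formed into decision-ready facts".
| Breakpoint | Symptom | Consequence |
|---|---|---|
| Telemetry disconnected from value | Tokens, latency and usage are tracked, but not customer outcome or process baseline | Activity metrics stand in for AI success |
| Control effectiveness not measured | Controls exist on the process diagram, but there is no pass rate, sample result or exception aging | The board cannot judge whether controls work |
| Incident taxonomy inconsistent | Security, model, customer service and complaints teams each log events separately | Customer harm and root cause cannot be aggregated |
| Vendor concentration invisible | Each use case passes approval individually, while the portfolio shares one model/vendor/evidence stack | A single-point change triggers portfolio-level risk |
| Adoption overstated | Logins or calls are counted as adoption | Cannot tell whether users produce qualified value events |
| Manual board reporting | Reports rely on interviews and screenshots | Facts cannot be reconstructed for internal audit, the audit committee or regulatory inquiries |

The board carries the oversight responsibility, but the engineering challenge is that reported numbers must be generated traceably from production facts, control tests, business outcomes and action records.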
3. Core Concepts
3.1 Management Information
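Management Information is an information product prepared for management and oversight decisions. It is neither raw data nor a list of KPIs.
| Quality | Meaning |
|---|---|
| Decision-useful | Points to approve, hold, scale, stop, fund, remediate or accept risk. |
| Traceable | Every number traces back to its source system, query, definition, owner and period. |
| Comparable | Definitions are consistent across business lines, risk tiers and reporting periods. |
| Timely | Cadence matches the speed of risk change; high-risk signals do not wait for the quarter. |
| Balanced | Value, risk, control, harm, adoption, cost and resilience are visible together. |
| Action-linked | Every amber/red signal is linked to an action owner, due date and closure evidence. |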
3.2 Metric Contract
| Field | Example |
|---|---|
| Metric name | Unsupported claim rate |
| Decision purpose | Whether to allow customer service RAG to expand to new product lines |
| Numerator | AI-assisted responses sampled as unsupported by approved source |
| Denominator | sampled AI-assisted responses for regulated topics |
| Source systems | model gateway trace, RAG citation log, QA review system |
| Grain | response_id, use_case_id, period |
| Owner | Customer operations QA owner + AI product owner |
| Threshold | Green <= 2%, Amber > 2% and <= 3%, Red > 3% |
| Escalation | Red triggers product-line freeze and risk committee action |
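The contract above can be held as a typed record so that dashboards, board packs and audit extracts all read the same definition. The following is a minimal sketch, not a reference implementation: the class name `MetricContract`, its field names and the `status` method are assumptions made for illustration, while the values come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


@dataclass(frozen=True)
class MetricContract:
    """Hypothetical container for one metric contract row set."""
    name: str
    decision_purpose: str
    numerator_rule: str
    denominator_rule: str
    source_systems: tuple
    grain: tuple
    owners: tuple
    green_max: float  # rate <= green_max is green
    amber_max: float  # green_max < rate <= amber_max is amber, above is red
    red_escalation: str

    def status(self, rate: float) -> Status:
        """Map a measured rate onto the contract's threshold bands."""
        if rate <= self.green_max:
            return Status.GREEN
        if rate <= self.amber_max:
            return Status.AMBER
        return Status.RED


UNSUPPORTED_CLAIM_RATE = MetricContract(
    name="Unsupported claim rate",
    decision_purpose="Allow customer service RAG to expand to new product lines?",
    numerator_rule="AI-assisted responses sampled as unsupported by approved source",
    denominator_rule="sampled AI-assisted responses for regulated topics",
    source_systems=("model gateway trace", "RAG citation log", "QA review system"),
    grain=("response_id", "use_case_id", "period"),
    owners=("Customer operations QA owner", "AI product owner"),
    green_max=0.02,
    amber_max=0.03,
    red_escalation="product-line freeze and risk committee action",
)

print(UNSUPPORTED_CLAIM_RATE.status(0.016))  # Status.GREEN
```

Keeping the thresholds inside the contract, rather than in the dashboard layer, means a threshold change is a change to one versioned object.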
3.3 MI Lineage
business event -> AI trace -> control/eval result -> data quality rule
-> metric calculation -> dashboard tile -> board statement -> management action
If a board pack says "AI customer harm incidents decreased 20%", the institution should show incident taxonomy version, source systems, de-duplication logic, severity rule, customer impact classification, query version, owner sign-off and action status.
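One way to make that evidence list checkable is to attach a lineage record to each board statement and flag whichever items are still missing. The sketch below is illustrative only: `StatementLineage`, its field names and the example values are assumptions, and the SHA-256 fingerprint is simply one possible way to detect that a definition changed between reporting periods.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class StatementLineage:
    """Hypothetical lineage record behind one board statement."""
    statement: str
    taxonomy_version: Optional[str] = None
    source_systems: tuple = ()
    dedup_logic: Optional[str] = None
    severity_rule: Optional[str] = None
    impact_classification: Optional[str] = None
    query_version: Optional[str] = None
    owner_signoff: Optional[str] = None
    action_status: Optional[str] = None

    def missing_evidence(self) -> list:
        """Names of lineage fields that are still empty."""
        return [k for k, v in asdict(self).items() if v in (None, (), "")]

    def fingerprint(self) -> str:
        """Stable hash of the record; it changes when any field changes."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


draft = StatementLineage(
    statement="AI customer harm incidents decreased 20%",
    source_systems=("complaint system", "incident register"),
    query_version="q-v3",  # invented version label
)
print(draft.missing_evidence())
```

A statement whose `missing_evidence()` list is not empty is a candidate for holding back from the board pack until the gaps are closed.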
3.4 2026 SR 26-2 Nuance
SR 26-2, issued in 2026 by the Federal Reserve with OCC and FDIC alignment, supersedes SR 11-7 and SR 21-8. For AI MI:
- It is risk-based and materiality-driven, not a uniform validation checklist.
- It is most relevant to banking organizations over $30B in total assets, while smaller firms may still use it as sound-practice reference.
- It narrows model scope around complex quantitative methods producing quantitative estimates.
- It explicitly leaves generative AI and agentic AI outside formal scope because they are novel and rapidly evolving.
- The carve-out is not a governance free pass; broader AI risk governance still needs telemetry, controls, incidents, evidence and MI.

Implication:
Board MI should separate traditional model-risk MI, non-generative AI MI, and GenAI / agentic AI MI, while still giving directors one consolidated view of AI value, risk, controls, incidents and concentration.
4. Architecture Diagram
AI systems and workflows
-> model gateway / agent gateway telemetry
-> RAG and knowledge source logs
-> tool action and workflow event logs
-> control tests, evals, red-team, QA samples
-> incidents, complaints, customer harm, appeals
-> cost, adoption, value, finance baseline
-> vendor, model, dependency and inventory registry
-> MI data product layer
- metric contracts
- lineage graph
- quality rules
- threshold and risk appetite rules
-> management dashboards
-> board / audit committee pack
-> decision, action, attestation and evidence closure
| Principle | Design implication |
|---|---|
| Report from systems of record | Board number should not be first created in a slide deck. |
| Separate metric logic from presentation | Dashboard, board pack and audit extracts reuse the same metric contracts. |
| Lineage before polish | A beautiful red/amber/green chart without lineage is weak MI. |
| Thresholds are controls | Thresholds need owner, rationale, review cadence and exception process. |
| Action log is part of MI | Reporting a red metric without action ownership is incomplete control operation. |
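The last principle can be enforced mechanically before a pack is released. The following sketch assumes a hypothetical `Signal` record and a hypothetical `action_gaps` helper; the rules it checks (owner, due date, closure evidence for amber/red signals) are the ones named in this document, while the overdue check is an added assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Signal:
    """Hypothetical reported signal with its action-log fields."""
    metric_id: str
    status: str  # "green", "amber" or "red"
    action_owner: Optional[str] = None
    due_date: Optional[date] = None
    closed: bool = False
    closure_evidence: Optional[str] = None


def action_gaps(signals: List[Signal], as_of: date) -> List[str]:
    """List action-log gaps for every amber/red signal."""
    gaps = []
    for s in signals:
        if s.status not in ("amber", "red"):
            continue  # green signals need no action entry
        if s.action_owner is None:
            gaps.append(f"{s.metric_id}: no action owner")
        if s.due_date is None:
            gaps.append(f"{s.metric_id}: no due date")
        elif not s.closed and s.due_date < as_of:
            gaps.append(f"{s.metric_id}: action overdue")
        if s.closed and not s.closure_evidence:
            gaps.append(f"{s.metric_id}: closed without evidence")
    return gaps


# Invented example signals.
signals = [
    Signal("unsupported_claim_rate", "green"),
    Signal("vendor_concentration", "amber", action_owner="AI platform owner"),
    Signal("hitl_bypass_count", "red", "Ops risk lead", date(2025, 1, 31), closed=True),
]
for gap in action_gaps(signals, as_of=date(2025, 3, 1)):
    print(gap)
```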
5. Financial Retail Case
Scenario: A retail bank runs six AI systems.
| System | AI role | Main board concern |
|---|---|---|
| Customer service RAG | draft grounded answer for agent | wrong policy commitment, complaint, stale source |
| Credit memo assistant | summarize documents for underwriter | fair lending, explanation, unsupported recommendation |
| AML copilot | draft case summary and evidence narrative | missed suspicious activity, weak SAR evidence |
| Fraud triage assistant | prioritize cases for analyst | customer friction, false positives, fraud loss |
| Branch knowledge assistant | answer staff policy questions | inconsistent advice, outdated policy |
| AI platform gateway | shared control plane | shadow AI, auditability, vendor concentration |
| Board question | MI metric | Source / lineage |
| --- | --- | --- |
| Are customers being harmed? | AI-attributable complaint rate, appeal overturn rate, remediation count | complaint system + case tags + AI exposure trace |
| Are controls working? | citation completeness, unsupported claim rate, HITL bypass count, control test pass rate | RAG logs + QA samples + workflow approval events |
| Are we getting value? | qualified value events, AHT reduction, backlog age, cost per resolved case | workflow events + finance baseline + AI cost ledger |
| Are we concentrated? | risk-weighted exposure by model/vendor/knowledge source | AI inventory + dependency graph + gateway routing |
| Are we adopting safely? | eligible workflow adoption, override reason mix, review burden | user telemetry + workflow eligibility + review queue |
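The concentration row can be illustrated with a small calculation: sum a risk weight per use case and express each vendor's total as a share of the portfolio. This is a sketch under stated assumptions. The vendor names, the risk weights and the `exposure_share` helper are all invented for illustration; a real weighting scheme would come from the bank's own risk appetite framework.

```python
from collections import defaultdict

# (use case, vendor, risk weight): all values invented for illustration.
INVENTORY = [
    ("customer_service_rag", "vendor_a", 3),
    ("credit_memo_assistant", "vendor_a", 5),
    ("aml_copilot", "vendor_a", 5),
    ("fraud_triage_assistant", "vendor_b", 4),
    ("branch_knowledge_assistant", "vendor_a", 2),
]


def exposure_share(inventory):
    """Return each vendor's share of total risk-weighted exposure."""
    totals = defaultdict(float)
    for _use_case, vendor, weight in inventory:
        totals[vendor] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # empty or zero-weight inventory: nothing to report
    return {vendor: total / grand_total for vendor, total in totals.items()}


for vendor, share in sorted(exposure_share(INVENTORY).items()):
    print(f"{vendor}: {share:.1%}")
```

With these invented weights one vendor carries 15 of 19 weighted units, which is the kind of portfolio-level picture that per-use-case approvals do not show.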

Board slice:
Decision requested: approve limited scale of customer service RAG to two additional product lines.
Evidence: qualified value events 74%; unsupported claim rate 1.6%; source freshness SLA 99.4%; AI-attributable complaints flat to baseline; vendor concentration amber.
Management action: scale low-risk intents only, run cross-use-case regression before credit-card dispute policies, reduce model concentration before direct customer response.
6. PM / BA / Architect Checklist
| Role | Checklist |
|---|---|
| PM | Define decision supported by MI; tie every metric to scale/hold/stop/fund/remediate; reject vanity usage as board evidence. |
| BA | Write metric contracts; define event grain, inclusion/exclusion, source systems, threshold logic, exception flow and action fields. |
| Architect | Design telemetry, lineage, data product, access control, retention, dashboard integration and evidence export. |
| Risk partner | Define risk appetite, severity, escalation, residual risk owner and review cadence. |
| Model risk partner | Separate SR 26-2 in-scope traditional models from GenAI/agentic systems, while aligning inventories and reporting. |
| Internal audit partner | Validate report lineage, evidence integrity, source-of-record control and action closure. |

Minimum artifact pack: AI MI metric catalog; metric contracts; source-to-report lineage diagram; risk appetite threshold matrix; board dashboard sample; management action log; report validation checklist; quarterly attestation statement.
7. Code-Lite Experiment
Goal: build a tiny MI lineage prototype for a customer service RAG board metric.
Input tables:
ai_trace(response_id, use_case_id, model_id, timestamp, source_doc_ids, ai_exposed)
qa_review(response_id, supported_by_source, regulated_topic, reviewer_id, review_date)
complaints(case_id, response_id, severity, ai_attributable, remediation_required)
metric_contract(metric_id, numerator_rule, denominator_rule, threshold_green, threshold_red, owner)
Metric:
unsupported_claim_rate =
count(response_id where regulated_topic = true and supported_by_source = false)
/ count(response_id where regulated_topic = true and qa_review exists)
Lineage output:
metric_id, reporting_period, source_tables, query_version,
denominator_count, numerator_count, threshold_status, action_required
Experiment steps:
- Create 50 synthetic response records.
- Create 20 QA review records.
- Add 3 AI-attributable complaint examples.
- Calculate unsupported_claim_rate and harm count.
- Generate a one-page board tile.
- Change the denominator rule and show the lineage version change.

Learning standard:
You can explain exactly why the board number changed: business reality, metric definition, data quality, source-system delay or threshold update.
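A compact way to run the experiment is plain Python over in-memory rows. The sketch below is one possible prototype rather than a prescribed design: the synthetic rows, the reporting period label, the `v1`/`v2` rule names and the helper functions are all assumptions, and only the thresholds (green up to 2%, amber up to 3%) come from the metric contract example.

```python
# Synthetic data only; every value below is invented for the experiment.
ai_trace = [
    {"response_id": i, "use_case_id": "cs_rag", "ai_exposed": True}
    for i in range(50)
]
# QA reviewed the first 20 responses; even ids are regulated topics, and
# responses 4 and 11 were judged unsupported by an approved source.
qa_review = [
    {
        "response_id": i,
        "regulated_topic": i % 2 == 0,
        "supported_by_source": i not in {4, 11},
    }
    for i in range(20)
]
complaints = [
    {"case_id": f"C{n}", "response_id": rid, "ai_attributable": True}
    for n, rid in enumerate([4, 11, 30], start=1)
]

# Versioned denominator rules: v1 follows the contract, v2 is the altered rule.
DENOMINATOR_RULES = {
    "v1": lambda row: row["regulated_topic"],  # reviewed regulated responses
    "v2": lambda row: True,                    # every reviewed response
}


def threshold_status(rate):
    """Green <= 2%, amber <= 3%, red above 3%."""
    if rate <= 0.02:
        return "green"
    if rate <= 0.03:
        return "amber"
    return "red"


def build_lineage_record(query_version, reporting_period="2025-Q1"):
    """Compute the metric and return it together with its lineage fields."""
    exposed = {t["response_id"] for t in ai_trace if t["ai_exposed"]}
    reviewed = [r for r in qa_review if r["response_id"] in exposed]
    numerator = sum(
        1 for r in reviewed
        if r["regulated_topic"] and not r["supported_by_source"]
    )
    denominator = sum(1 for r in reviewed if DENOMINATOR_RULES[query_version](r))
    if denominator == 0:
        raise ValueError("no reviewed responses match the denominator rule")
    rate = numerator / denominator
    status = threshold_status(rate)
    return {
        "metric_id": "unsupported_claim_rate",
        "reporting_period": reporting_period,
        "source_tables": ["ai_trace", "qa_review"],
        "query_version": query_version,
        "denominator_count": denominator,
        "numerator_count": numerator,
        "rate": rate,
        "threshold_status": status,
        "action_required": status != "green",
    }


harm_count = sum(1 for c in complaints if c["ai_attributable"])

for version in ("v1", "v2"):
    print(build_lineage_record(version))
print("harm_count:", harm_count)
```

On this synthetic data the rate moves from 1/10 under `v1` to 1/20 under `v2` while the numerator stays at 1. The change in the board number is therefore explained by the `query_version` field, not by business reality.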
8. Interview Questions
Q1: How is AI board MI different from AI governance material?
30-second version:
Governance material defines oversight, roles and decision rights. MI architecture defines the facts feeding that oversight: metric contracts, telemetry sources, lineage, thresholds, cadence and action logs. Without MI architecture, board governance becomes narrative instead of evidence-based supervision.
2-minute version:
I separate the board governance pack from the MI architecture. The governance pack says which committees oversee material AI and what decisions they make. MI architecture says how each answer is produced: source systems, metric definition, data quality rule, lineage, risk appetite threshold, owner, reporting cadence and management action. For customer service RAG, the board should not only see unsupported claim rate; it should know the denominator, QA sampling method, source logs, threshold, trend, action owner and whether the metric supports scale or hold.
Q2: What makes a board AI metric decision-useful?
30-second version:
A board metric is decision-useful when it connects to a management action: scale, hold, stop, fund, remediate or accept residual risk. It must have a clear definition, owner, threshold, lineage and escalation path.
Q3: How would you handle SR 26-2 in AI board reporting?
30-second version:
I would not claim SR 26-2 directly governs every GenAI or agentic system. I would report three layers: traditional model-risk systems in scope, non-generative AI where model-risk principles apply, and GenAI/agentic systems governed through broader enterprise AI controls. The board still needs a consolidated AI risk and value view.
9. Pitfalls
| Pitfall | Why it is dangerous | Better practice |
|---|---|---|
| Board pack as slide assembly | Numbers cannot be traced or audited | Build MI data products and metric contracts |
| Usage as value | High usage can mean rework or poor UX | Use qualified value events and outcome evidence |
| Controls without effectiveness metrics | "Control exists" is not evidence | Define pass rate, sample result, exception aging |
| Thresholds without risk appetite | Red/amber/green becomes arbitrary | Link thresholds to approved appetite and stop rules |
| Incident counts without harm taxonomy | Customer impact is hidden | Classify severity, harm, remediation and AI attribution |
| Mixing GenAI with SR 26-2 models carelessly | Scope confusion and weak assurance | Separate scope while aligning inventories and reporting |
| No action log | Reporting does not drive control improvement | Every amber/red signal has owner, due date and closure evidence |

Final memory card:
AI Board MI = telemetry + metric contract + lineage + threshold + action + decision.
The board does not need more AI activity reporting.
It needs evidence that value, risk, control effectiveness, customer harm, concentration and adoption are within appetite or being acted on.