AI Management Information: Board Reporting Architecture

AI Management Information / Board Reporting Architecture: Explained

Audience: AI Product Lead / Senior BA / Enterprise Architect / Risk Product Lead / Model Risk Partner.

Core question: boards and audit committees do not need more AI project status. They need Management Information that supports oversight, risk appetite, investment, stop decisions, remediation and accountability.

Learning objective: turn AI telemetry, value metrics, control effectiveness, incidents, customer harm, model/vendor concentration, adoption and risk appetite into an MI architecture with lineage, thresholds, owners, cadence and decision-usefulness.

Source Anchors

Source	Link	Use
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Organize the AI risk lifecycle, impact, controls and governance reporting.
NIST AIRC AI RMF functions	https://airc.nist.gov/airmf-resources/airmf/	Use Govern / Map / Measure / Manage to design the MI taxonomy, metric owners and action loop.
ISO/IEC 42001	https://www.iso.org/standard/42001	Use the AI management system, performance evaluation, management review and continual improvement to connect MI to the AIMS.
Federal Reserve SR 26-2	https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm	Calibrate financial-institution MI against the risk-based, materiality, inventory, monitoring and governance thinking of the 2026 revised model-risk guidance.
OCC Model Risk Management Handbook legacy link	https://www.occ.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/index-model-risk-management.html	Legacy context only; the OCC page now redirects, so the current model-risk anchor should be SR 26-2 / OCC 2026 guidance.

In one sentence:

AI Board Reporting Architecture turns the facts, risks, controls, value and action records of an AI portfolio into a traceable management information product. It is not a project weekly report reformatted for the board.

1. Thesis

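The AI governance pack answers "what should the board oversee". The AI management information architecture answers "how are the facts the board relies on generated, validated, traced and turned into action". Key questions: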

What facts feed board oversight?
How is each metric defined?
Where did the number come from?
Which threshold makes it reportable?
Who owns remediation?
What decision can be made from it?

Without an MI architecture, the board pack degrades into:

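Weak material	Why it fails
Innovation showcase	Has demos, but material risk, customer harm and control failure stay invisible.
Project status report	Shows progress, but does not support risk appetite, scale/stop or funding decisions.
Manual evidence collage	Slides are assembled by hand each quarter, metric definitions drift, and audit cannot trace the numbers.

The chain in a mature MI setup: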

metric contract
  -> telemetry / evidence source
  -> lineage and quality check
  -> risk appetite threshold
  -> report view
  -> management action
  -> board decision / challenge

2. Why It Matters

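The hard part of management information for retail financial AI is not "missing data" but "data that cannot be turned into decision-ready facts".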

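Breakpoint	Symptom	Consequence
Telemetry disconnected from value	Token, latency and usage data exist, but no customer outcome or process baseline	Activity metrics stand in for AI success
Control effectiveness not measured	Controls exist on process diagrams, but there is no pass rate, sample result or exception aging	The board cannot judge whether controls are effective
Incident taxonomy inconsistent	Security, model, customer service and complaints teams each log events separately	Customer harm and root cause cannot be aggregated
Vendor concentration invisible	Each use case passes approval, yet the portfolio shares the same model/vendor/evidence stack	A single-point change triggers portfolio-level risk
Adoption overstated	Logins or calls are counted as adoption	Nobody can tell whether users produced qualified value events
Manual board reporting	Reports rely on interviews and screenshots	Facts cannot be reconstructed for internal audit, the audit committee or regulatory inquiries

The board carries the oversight responsibility, but the engineering challenge is this: reported numbers must be traceably generated from production facts, control tests, business results and action records.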

3. Core Concepts

3.1 Management Information

Management Information is an information product prepared for management and oversight decisions. It is neither raw data nor a KPI list.

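Quality	Meaning
Decision-useful	Points to a decision: approve, hold, scale, stop, fund, remediate or accept risk.
Traceable	Every number traces back to a source system, query, definition, owner and period.
Comparable	Definitions are consistent across business lines, risk tiers and reporting periods.
Timely	Cadence matches the speed of risk change; high-risk signals do not wait for the quarter.
Balanced	Value, risk, control, harm, adoption, cost and resilience are visible together.
Action-linked	Every amber/red signal links to an action owner, due date and closure evidence.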

3.2 Metric Contract

Field	Example
Metric name	Unsupported claim rate
Decision purpose	Whether to allow customer service RAG to expand to new product lines
Numerator	AI-assisted responses sampled as unsupported by approved source
Denominator	sampled AI-assisted responses for regulated topics
Source systems	model gateway trace, RAG citation log, QA review system
Grain	response_id, use_case_id, period
Owner	Customer operations QA owner + AI product owner
Threshold	Green <= 2%, Amber > 2% and <= 3%, Red > 3%
Escalation	Red triggers product-line freeze and risk committee action
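A metric contract like this can be held as a small typed record, so that dashboards, board packs and audit extracts all evaluate the same definition. The sketch below is only an illustration under stated assumptions: the class name `MetricContract`, its field names and the `status` method are hypothetical, and only the example values are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricContract:
    """Hypothetical record for one metric contract (field names are assumptions)."""
    metric_name: str
    decision_purpose: str
    numerator: str
    denominator: str
    source_systems: tuple[str, ...]
    grain: tuple[str, ...]
    owner: str
    green_max: float  # inclusive upper bound for Green, as a fraction
    amber_max: float  # inclusive upper bound for Amber, as a fraction
    escalation: str

    def __post_init__(self) -> None:
        # A contract whose bands overlap cannot be evaluated consistently.
        if self.green_max > self.amber_max:
            raise ValueError("green_max must not exceed amber_max")

    def status(self, value: float) -> str:
        """Map a measured rate to the contract's red/amber/green band."""
        if value <= self.green_max:
            return "green"
        if value <= self.amber_max:
            return "amber"
        return "red"


# Values copied from the example contract in the table above.
unsupported_claim_rate = MetricContract(
    metric_name="Unsupported claim rate",
    decision_purpose="Whether to expand customer service RAG to new product lines",
    numerator="AI-assisted responses sampled as unsupported by approved source",
    denominator="sampled AI-assisted responses for regulated topics",
    source_systems=("model gateway trace", "RAG citation log", "QA review system"),
    grain=("response_id", "use_case_id", "period"),
    owner="Customer operations QA owner + AI product owner",
    green_max=0.02,
    amber_max=0.03,
    escalation="Red triggers product-line freeze and risk committee action",
)

print(unsupported_claim_rate.status(0.016))
```

Keeping the bands inside the contract, rather than in the dashboard, is one way to make a threshold change show up as a versioned contract change instead of a silent chart edit.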

3.3 MI Lineage

business event -> AI trace -> control/eval result -> data quality rule
  -> metric calculation -> dashboard tile -> board statement -> management action

If a board pack says "AI customer harm incidents decreased 20%", the institution should show incident taxonomy version, source systems, de-duplication logic, severity rule, customer impact classification, query version, owner sign-off and action status.
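One way to make that expectation testable is to treat the listed items as required lineage fields and refuse to publish a board statement while any is missing. The following is a minimal sketch: the field keys, the `missing_lineage` helper and the sample evidence values are all hypothetical names chosen for illustration.

```python
# Required lineage items, taken from the list in the paragraph above.
# The snake_case keys are assumptions, not an established schema.
REQUIRED_LINEAGE_FIELDS = (
    "incident_taxonomy_version",
    "source_systems",
    "dedup_logic",
    "severity_rule",
    "customer_impact_classification",
    "query_version",
    "owner_sign_off",
    "action_status",
)


def missing_lineage(evidence: dict) -> list[str]:
    """Return required lineage fields that are absent or empty."""
    return [field for field in REQUIRED_LINEAGE_FIELDS if not evidence.get(field)]


# Hypothetical evidence bundle behind "AI customer harm incidents decreased 20%".
evidence = {
    "incident_taxonomy_version": "v3",
    "source_systems": ["complaint_system", "incident_register"],
    "dedup_logic": "one incident per case_id",
    "severity_rule": "",                # empty: treated as missing
    "customer_impact_classification": "harm / no-harm",
    "query_version": "q-01",
    "owner_sign_off": None,             # not signed off yet
    # "action_status" is absent entirely
}

gaps = missing_lineage(evidence)
print(gaps)
```

A statement with a non-empty gap list would stay out of the board pack until the owner closes the gaps.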

3.4 2026 SR 26-2 Nuance

SR 26-2, issued in 2026 by the Federal Reserve with OCC and FDIC alignment, supersedes SR 11-7 and SR 21-8. For AI MI:

It is risk-based and materiality-driven, not a uniform validation checklist.
It is most relevant to banking organizations over $30B in total assets, while smaller firms may still use it as sound-practice reference.
It narrows model scope around complex quantitative methods producing quantitative estimates.
It explicitly leaves generative AI and agentic AI outside formal scope because they are novel and rapidly evolving.
The carve-out is not a governance free pass; broader AI risk governance still needs telemetry, controls, incidents, evidence and MI.

Implication:

Board MI should separate traditional model-risk MI, non-generative AI MI, and GenAI / agentic AI MI, while still giving directors one consolidated view of AI value, risk, controls, incidents and concentration.

4. Architecture Diagram

AI systems and workflows
  -> model gateway / agent gateway telemetry
  -> RAG and knowledge source logs
  -> tool action and workflow event logs
  -> control tests, evals, red-team, QA samples
  -> incidents, complaints, customer harm, appeals
  -> cost, adoption, value, finance baseline
  -> vendor, model, dependency and inventory registry
  -> MI data product layer
       - metric contracts
       - lineage graph
       - quality rules
       - threshold and risk appetite rules
  -> management dashboards
  -> board / audit committee pack
  -> decision, action, attestation and evidence closure

Principle	Design implication
Report from systems of record	Board number should not be first created in a slide deck.
Separate metric logic from presentation	Dashboard, board pack and audit extracts reuse the same metric contracts.
Lineage before polish	A beautiful red/amber/green chart without lineage is weak MI.
Thresholds are controls	Thresholds need owner, rationale, review cadence and exception process.
Action log is part of MI	Reporting a red metric without action ownership is incomplete control operation.
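The last principle can be checked mechanically: every amber or red signal should carry an action owner and a due date, and a closed action should carry closure evidence. The sketch below assumes a hypothetical `Signal` record and hypothetical sample signals; it illustrates the check rather than any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Signal:
    """Hypothetical reported signal with its action-log fields."""
    metric_id: str
    status: str  # "green", "amber" or "red"
    action_owner: Optional[str] = None
    due_date: Optional[date] = None
    closed: bool = False
    closure_evidence: Optional[str] = None


def action_log_gaps(signals: list[Signal]) -> dict[str, list[str]]:
    """Return missing action-log fields per amber/red signal."""
    gaps: dict[str, list[str]] = {}
    for s in signals:
        if s.status not in ("amber", "red"):
            continue  # green signals need no action entry
        missing = []
        if not s.action_owner:
            missing.append("action_owner")
        if s.due_date is None:
            missing.append("due_date")
        if s.closed and not s.closure_evidence:
            missing.append("closure_evidence")
        if missing:
            gaps[s.metric_id] = missing
    return gaps


# Hypothetical sample signals.
signals = [
    Signal("unsupported_claim_rate", "green"),
    Signal("vendor_concentration", "amber", "AI platform owner", date(2026, 6, 30)),
    Signal("hitl_bypass_count", "red"),
    Signal("source_freshness_sla", "amber", "Knowledge owner", date(2026, 3, 31), closed=True),
]

print(action_log_gaps(signals))
```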

5. Financial Retail Case

Scenario: A retail bank runs six AI systems.

System	AI role	Main board concern
Customer service RAG	draft grounded answer for agent	wrong policy commitment, complaint, stale source
Credit memo assistant	summarize documents for underwriter	fair lending, explanation, unsupported recommendation
AML copilot	draft case summary and evidence narrative	missed suspicious activity, weak SAR evidence
Fraud triage assistant	prioritize cases for analyst	customer friction, false positives, fraud loss
Branch knowledge assistant	answer staff policy questions	inconsistent advice, outdated policy
AI platform gateway	shared control plane	shadow AI, auditability, vendor concentration

Board question	MI metric	Source / lineage
Are customers being harmed?	AI-attributable complaint rate, appeal overturn rate, remediation count	complaint system + case tags + AI exposure trace
Are controls working?	citation completeness, unsupported claim rate, HITL bypass count, control test pass rate	RAG logs + QA samples + workflow approval events
Are we getting value?	qualified value events, AHT reduction, backlog age, cost per resolved case	workflow events + finance baseline + AI cost ledger
Are we concentrated?	risk-weighted exposure by model/vendor/knowledge source	AI inventory + dependency graph + gateway routing
Are we adopting safely?	eligible workflow adoption, override reason mix, review burden	user telemetry + workflow eligibility + review queue

Board slice:

Decision requested: approve limited scale of customer service RAG to two additional product lines.
Evidence: qualified value events 74%; unsupported claim rate 1.6%; source freshness SLA 99.4%; AI-attributable complaints flat to baseline; vendor concentration amber.
Management action: scale low-risk intents only, run cross-use-case regression before credit-card dispute policies, reduce model concentration before direct customer response.
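The "Are we concentrated?" row can be illustrated with a small calculation of risk-weighted exposure by vendor. This is a sketch under loud assumptions: the vendor names, the integer risk weights and the mapping of systems to vendors are invented for illustration, and a real weighting scheme would come from the approved risk appetite.

```python
from collections import defaultdict

# (use case, vendor, risk weight): all vendors and weights are hypothetical.
inventory = [
    ("customer_service_rag", "vendor_a", 3),
    ("credit_memo_assistant", "vendor_a", 3),
    ("aml_copilot", "vendor_a", 3),
    ("fraud_triage_assistant", "vendor_b", 2),
    ("branch_knowledge_assistant", "vendor_a", 1),
]


def exposure_share(rows):
    """Return each vendor's share of total risk-weighted exposure."""
    totals = defaultdict(float)
    for _use_case, vendor, weight in rows:
        totals[vendor] += weight
    grand_total = sum(totals.values())
    return {vendor: total / grand_total for vendor, total in totals.items()}


shares = exposure_share(inventory)
largest_vendor = max(shares, key=shares.get)
print(largest_vendor, round(shares[largest_vendor], 3))
```

In this made-up portfolio each use case could pass approval on its own while one vendor still carries most of the risk-weighted exposure, which is the pattern the concentration metric is meant to surface.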

6. PM / BA / Architect Checklist

Role	Checklist
PM	Define decision supported by MI; tie every metric to scale/hold/stop/fund/remediate; reject vanity usage as board evidence.
BA	Write metric contracts; define event grain, inclusion/exclusion, source systems, threshold logic, exception flow and action fields.
Architect	Design telemetry, lineage, data product, access control, retention, dashboard integration and evidence export.
Risk partner	Define risk appetite, severity, escalation, residual risk owner and review cadence.
Model risk partner	Separate SR 26-2 in-scope traditional models from GenAI/agentic systems, while aligning inventories and reporting.
Internal audit partner	Validate report lineage, evidence integrity, source-of-record control and action closure.

Minimum artifact pack: AI MI metric catalog; metric contracts; source-to-report lineage diagram; risk appetite threshold matrix; board dashboard sample; management action log; report validation checklist; quarterly attestation statement.

7. Code-Lite Experiment

Goal: build a tiny MI lineage prototype for a customer service RAG board metric.

Input tables:
  ai_trace(response_id, use_case_id, model_id, timestamp, source_doc_ids, ai_exposed)
  qa_review(response_id, supported_by_source, regulated_topic, reviewer_id, review_date)
  complaints(case_id, response_id, severity, ai_attributable, remediation_required)
  metric_contract(metric_id, numerator_rule, denominator_rule, threshold_green, threshold_red, owner)
Metric:
  unsupported_claim_rate =
    count(response_id where regulated_topic = true and supported_by_source = false)
    / count(response_id where regulated_topic = true and qa_review exists)
Lineage output:
  metric_id, reporting_period, source_tables, query_version,
  denominator_count, numerator_count, threshold_status, action_required

Experiment steps: create 50 synthetic response records; create 20 QA review records; add 3 AI-attributable complaint examples; calculate unsupported_claim_rate and harm count; generate a one-page board tile; change the denominator rule and show the lineage version change.

Learning standard:

You can explain exactly why the board number changed: business reality, metric definition, data quality, source-system delay or threshold update.
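A compact way to run the experiment is sketched below in plain Python. All data is synthetic and deterministic, the thresholds reuse the example contract (Green <= 2%, Amber <= 3%), and the names `DENOMINATOR_RULES`, `unsupported_claim_rate`, the `v1`/`v2` labels and the reporting period are assumptions made for the sketch. The `v2` rule is a deliberately changed denominator, used only to show the lineage version moving.

```python
# Synthetic inputs: 50 traced responses, 20 QA reviews, 3 complaints.
ai_trace = [{"response_id": i, "use_case_id": "cs_rag"} for i in range(50)]
qa_review = [
    {
        "response_id": i,
        "regulated_topic": i % 4 != 0,             # 15 of 20 reviews are regulated
        "supported_by_source": i not in (1, 11),   # 2 unsupported, both regulated
    }
    for i in range(20)
]
complaints = [
    {"case_id": f"C{n}", "response_id": rid, "ai_attributable": True}
    for n, rid in enumerate((1, 11, 30), start=1)
]

# Versioned denominator rules; changing the rule changes the query version.
DENOMINATOR_RULES = {
    "v1": lambda r: r["regulated_topic"],  # contract as written above
    "v2": lambda r: True,                  # hypothetical change: all QA reviews
}


def unsupported_claim_rate(query_version, period="2026-Q1"):
    """Compute the metric and return it together with its lineage record."""
    traced = {t["response_id"] for t in ai_trace}
    reviews = [r for r in qa_review if r["response_id"] in traced]
    in_denominator = DENOMINATOR_RULES[query_version]
    denominator = sum(1 for r in reviews if in_denominator(r))
    numerator = sum(
        1 for r in reviews if r["regulated_topic"] and not r["supported_by_source"]
    )
    rate = numerator / denominator
    status = "green" if rate <= 0.02 else "amber" if rate <= 0.03 else "red"
    return {
        "metric_id": "unsupported_claim_rate",
        "reporting_period": period,
        "source_tables": ["ai_trace", "qa_review"],
        "query_version": query_version,
        "denominator_count": denominator,
        "numerator_count": numerator,
        "value": round(rate, 4),
        "threshold_status": status,
        "action_required": status != "green",
    }


harm_count = sum(1 for c in complaints if c["ai_attributable"])
v1 = unsupported_claim_rate("v1")
v2 = unsupported_claim_rate("v2")
for row in (v1, v2):
    print(row)
print("AI-attributable complaints:", harm_count)
```

Comparing the two lineage rows shows the value moving with no change in the underlying responses: only `query_version` and `denominator_count` differ, which points to a definition change rather than a change in business reality.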

8. Interview Questions

Q1: How is AI board MI different from AI governance material?

30-second version:

Governance material defines oversight, roles and decision rights. MI architecture defines the facts feeding that oversight: metric contracts, telemetry sources, lineage, thresholds, cadence and action logs. Without MI architecture, board governance becomes narrative instead of evidence-based supervision.

2-minute version:

I separate the board governance pack from the MI architecture. The governance pack says which committees oversee material AI and what decisions they make. MI architecture says how each answer is produced: source systems, metric definition, data quality rule, lineage, risk appetite threshold, owner, reporting cadence and management action. For customer service RAG, the board should not only see unsupported claim rate; it should know the denominator, QA sampling method, source logs, threshold, trend, action owner and whether the metric supports scale or hold.

Q2: What makes a board AI metric decision-useful?

30-second version:

A board metric is decision-useful when it connects to a management action: scale, hold, stop, fund, remediate or accept residual risk. It must have a clear definition, owner, threshold, lineage and escalation path.

Q3: How would you handle SR 26-2 in AI board reporting?

30-second version:

I would not claim SR 26-2 directly governs every GenAI or agentic system. I would report three layers: traditional model-risk systems in scope, non-generative AI where model-risk principles apply, and GenAI/agentic systems governed through broader enterprise AI controls. The board still needs a consolidated AI risk and value view.

9. Pitfalls

Pitfall	Why it is dangerous	Better practice
Board pack as slide assembly	Numbers cannot be traced or audited	Build MI data products and metric contracts
Usage as value	High usage can mean rework or poor UX	Use qualified value events and outcome evidence
Controls without effectiveness metrics	"Control exists" is not evidence	Define pass rate, sample result, exception aging
Thresholds without risk appetite	Red/amber/green becomes arbitrary	Link thresholds to approved appetite and stop rules
Incident counts without harm taxonomy	Customer impact is hidden	Classify severity, harm, remediation and AI attribution
Mixing GenAI with SR 26-2 models carelessly	Scope confusion and weak assurance	Separate scope while aligning inventories and reporting
No action log	Reporting does not drive control improvement	Every amber/red signal has owner, due date and closure evidence

Final memory card:

AI Board MI = telemetry + metric contract + lineage + threshold + action + decision.
The board does not need more AI activity reporting.
It needs evidence that value, risk, control effectiveness, customer harm, concentration and adoption are within appetite or being acted on.