
AI Management Information: Board Reporting Architecture


207ai-foundations/papers/109-ai-management-information-board-reporting-architecture.md

AI Management Information / Board Reporting Architecture: An Explainer

Audience: AI Product Lead / Senior BA / Enterprise Architect / Risk Product Lead / Model Risk Partner.

Core question: The board and audit committee do not need more AI project status updates. They need Management Information that supports oversight, risk appetite, investment, stop decisions, remediation, and accountability.

Learning objective: Turn AI telemetry, value metrics, control effectiveness, incidents, customer harm, model/vendor concentration, adoption, and risk appetite into an MI architecture with lineage, thresholds, owners, cadence, and decision-usefulness.


Source Anchors

| Source | Link | Purpose |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organize the AI risk lifecycle, impacts, controls, and governance reporting. |
| NIST AIRC AI RMF functions | https://airc.nist.gov/airmf-resources/airmf/ | Use Govern / Map / Measure / Manage to design the MI taxonomy, metric owners, and action loop. |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Connect MI to the AIMS through the AI management system's performance evaluation, management review, and continual improvement. |
| Federal Reserve SR 26-2 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm | Calibrate financial-institution MI against the risk-based, materiality, inventory, monitoring, and governance thinking of the 2026 revised model-risk guidance. |
| OCC Model Risk Management Handbook (legacy link) | https://www.occ.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/index-model-risk-management.html | Legacy context only; the current OCC page redirects, so the current model-risk anchor should shift to SR 26-2 / OCC 2026 guidance. |
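
In one sentence:

AI Board Reporting Architecture turns the facts, risks, controls, value, and action records of an AI portfolio into a traceable management information product. It is not a project weekly report reformatted for the board.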


1. Thesis

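The AI governance pack answers "what should the board oversee?" The AI management information architecture answers "how are the facts the board relies on generated, validated, traced, and turned into action?" Key questions: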

What facts feed board oversight?
How is each metric defined?
Where did the number come from?
Which threshold makes it reportable?
Who owns remediation?
What decision can be made from it?

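Without an MI architecture, the board pack degrades into:

| Weak material | Why it fails |
|---|---|
| Innovation showcase | Has demos, but material risk, customer harm, and control failures stay invisible. |
| Project status report | Shows progress, but does not support risk appetite, scale/stop, or funding decisions. |
| Manual evidence collage | A deck is stitched together by hand every quarter; metric definitions drift and audit cannot trace the numbers. |

The chain in mature MI: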
metric contract
  -> telemetry / evidence source
  -> lineage and quality check
  -> risk appetite threshold
  -> report view
  -> management action
  -> board decision / challenge
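A minimal Python sketch of the chain as a completeness check: a board figure is treated as MI only when every link behind it is populated. The `BoardItem` record and its field names are hypothetical, chosen to mirror the chain above rather than any standard schema.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical record for one board-reported figure; fields mirror the chain above.
@dataclass
class BoardItem:
    metric_contract_id: Optional[str] = None
    evidence_source: Optional[str] = None
    lineage_check_id: Optional[str] = None
    threshold_rule_id: Optional[str] = None
    report_view_id: Optional[str] = None
    management_action_id: Optional[str] = None
    board_decision: Optional[str] = None

def missing_links(item: BoardItem) -> list:
    """Return the names of empty chain links, in chain order."""
    return [f.name for f in fields(item) if not getattr(item, f.name)]

item = BoardItem(
    metric_contract_id="MC-unsupported-claim",
    evidence_source="qa_review",
    threshold_rule_id="TH-cs-rag-01",
    report_view_id="board-tile-q1",
)
print(missing_links(item))  # ['lineage_check_id', 'management_action_id', 'board_decision']
```

Before the meeting `board_decision` is naturally empty; the links that should never be missing once a figure reaches the pack are the upstream ones, such as `lineage_check_id`.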

2. Why It Matters

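In retail financial services, the hard part of AI management information is not "missing data" but "data that cannot be turned into decision-ready facts".

| Break | Symptom | Consequence |
|---|---|---|
| Telemetry disconnected from value | Tokens, latency, and usage are tracked, but there is no customer outcome or process baseline | AI success is replaced by activity metrics |
| Control effectiveness not measured | Controls exist in process diagrams, but there is no pass rate, sample result, or exception aging | The board cannot judge whether controls are effective |
| Incident taxonomy inconsistent | Security, model, customer service, and complaints teams each log incidents separately | Customer harm and root causes cannot be aggregated |
| Vendor concentration invisible | Each use case passes approval, but the portfolio shares the same model/vendor/evidence stack | A single upstream change triggers portfolio-level risk |
| Adoption overstated | Logins or API calls are counted as adoption | Nobody can tell whether users produce qualified value events |
| Manual board reporting | Reports rely on interviews and screenshots | Facts cannot be reconstructed for internal audit, the audit committee, or regulatory inquiries |

The board carries oversight responsibility, but the engineering challenge is that every reported number must be generated traceably from production facts, control tests, business outcomes, and action records.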
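To make the "adoption overstated" break concrete, here is a hedged Python sketch contrasting raw activity with adoption measured against eligible workflows and qualified value events. The record fields (`eligible`, `ai_calls`, `qualified_value`) and the rule for what counts as a qualified value event are assumptions for illustration.

```python
# Hypothetical case records for one AI-assisted workflow.
# eligible: the case is in scope for the AI tool
# ai_calls: number of AI invocations (activity)
# qualified_value: the case met an agreed outcome rule (assumed, e.g. resolved without rework)
cases = [
    {"eligible": True,  "ai_calls": 5, "qualified_value": True},
    {"eligible": True,  "ai_calls": 9, "qualified_value": False},
    {"eligible": True,  "ai_calls": 0, "qualified_value": False},
    {"eligible": False, "ai_calls": 4, "qualified_value": False},
]

def adoption_views(cases):
    raw_calls = sum(c["ai_calls"] for c in cases)           # activity, not adoption
    eligible = [c for c in cases if c["eligible"]]
    adopted = [c for c in eligible if c["ai_calls"] > 0]     # AI used where it is in scope
    qualified = [c for c in adopted if c["qualified_value"]]
    return {
        "raw_calls": raw_calls,
        "eligible_adoption_rate": len(adopted) / len(eligible) if eligible else 0.0,
        "qualified_value_rate": len(qualified) / len(adopted) if adopted else 0.0,
    }

print(adoption_views(cases))
```

The 18 raw calls look healthy, yet only 2 of 3 eligible cases used the tool and only 1 of those 2 produced a qualified value event; the out-of-scope case contributes calls but no adoption.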

3. Core Concepts

3.1 Management Information

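Management Information is an information product prepared for management and oversight decisions. It is neither raw data nor a KPI list.

| Quality | Meaning |
|---|---|
| Decision-useful | Points to approve, hold, scale, stop, fund, remediate, or accept risk. |
| Traceable | Every number traces back to a source system, query, definition, owner, and period. |
| Comparable | Definitions stay consistent across business lines, risk tiers, and reporting periods. |
| Timely | Cadence matches how fast the risk changes; high-risk signals do not wait for the quarter. |
| Balanced | Value, risk, control, harm, adoption, cost, and resilience are visible together. |
| Action-linked | Every amber/red signal links to an action owner, due date, and closure evidence. |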
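The Action-linked quality can be checked mechanically. A minimal sketch, assuming a hypothetical `Signal` record with status, owner, due date, and closure evidence; the three gap rules are illustrative, not a prescribed standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Signal:
    metric_id: str
    status: str                          # "green" | "amber" | "red"
    action_owner: Optional[str] = None
    due_date: Optional[date] = None
    closure_evidence: Optional[str] = None
    closed: bool = False

def action_link_gaps(signals, today):
    """Return (metric_id, gap) for amber/red signals that are not action-linked."""
    gaps = []
    for s in signals:
        if s.status not in ("amber", "red"):
            continue  # green signals need no action
        if not s.action_owner or s.due_date is None:
            gaps.append((s.metric_id, "no owner or due date"))
        elif s.closed and not s.closure_evidence:
            gaps.append((s.metric_id, "closed without evidence"))
        elif not s.closed and s.due_date < today:
            gaps.append((s.metric_id, "overdue"))
    return gaps

signals = [
    Signal("unsupported_claim_rate", "red"),
    Signal("vendor_concentration", "amber", "Platform owner", date(2026, 3, 31), closed=True),
    Signal("source_freshness", "amber", "Knowledge owner", date(2026, 1, 15)),
    Signal("ai_complaint_rate", "green"),
]
print(action_link_gaps(signals, today=date(2026, 2, 1)))
```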

3.2 Metric Contract

| Field | Example |
|---|---|
| Metric name | Unsupported claim rate |
| Decision purpose | Whether to allow customer service RAG to expand to new product lines |
| Numerator | Sampled AI-assisted responses on regulated topics judged unsupported by an approved source |
| Denominator | Sampled AI-assisted responses on regulated topics |
| Source systems | Model gateway trace, RAG citation log, QA review system |
| Grain | response_id, use_case_id, period |
| Owner | Customer operations QA owner + AI product owner |
| Threshold | Green <= 2%, Amber > 2% and <= 3%, Red > 3% |
| Escalation | Red triggers product-line freeze and risk committee action |
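The contract above can be expressed as a small typed object so that the threshold logic lives in one place. A sketch, assuming the Green/Amber bands are inclusive upper bounds exactly as written in the table; the `MetricContract` class and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricContract:
    metric_name: str
    decision_purpose: str
    owner: str
    green_max: float   # inclusive upper bound for Green
    amber_max: float   # inclusive upper bound for Amber; anything above is Red
    escalation: str

    def status(self, value: float) -> str:
        if value <= self.green_max:
            return "green"
        if value <= self.amber_max:
            return "amber"
        return "red"

unsupported_claim = MetricContract(
    metric_name="Unsupported claim rate",
    decision_purpose="Expand customer service RAG to new product lines?",
    owner="Customer operations QA owner + AI product owner",
    green_max=0.02,
    amber_max=0.03,
    escalation="Red triggers product-line freeze and risk committee action",
)

for v in (0.016, 0.02, 0.025, 0.031):
    print(v, unsupported_claim.status(v))
```

Keeping `status()` on the contract means the dashboard, board pack, and audit extract cannot drift into different band definitions.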

3.3 MI Lineage

business event -> AI trace -> control/eval result -> data quality rule
  -> metric calculation -> dashboard tile -> board statement -> management action

If a board pack says "AI customer harm incidents decreased 20%", the institution should show incident taxonomy version, source systems, de-duplication logic, severity rule, customer impact classification, query version, owner sign-off and action status.
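A worked sketch of the "decreased 20%" statement, assuming hypothetical incident rows where one case was logged by two systems. De-duplicating on `case_id` (an assumed de-duplication key) is what makes the figure -20% rather than -10%, which is why the de-duplication logic belongs in the lineage record. All version labels and rule names are placeholders.

```python
# Hypothetical incident rows. Case C0 was logged by both the complaint system
# and the model incident queue, so it appears twice in `current`.
prior = [{"case_id": f"P{i}", "source": "complaints", "ai_attributable": True} for i in range(10)]
current = [{"case_id": f"C{i}", "source": "complaints", "ai_attributable": True} for i in range(8)]
current.append({"case_id": "C0", "source": "model_incidents", "ai_attributable": True})

def harm_count(rows):
    """Distinct AI-attributable cases; the de-duplication key is case_id (assumed)."""
    return len({r["case_id"] for r in rows if r["ai_attributable"]})

p, c = harm_count(prior), harm_count(current)
change = (c - p) / p

# Lineage fields named in the paragraph above; every value is a placeholder.
statement = {
    "claim": f"AI customer harm incidents changed {change:+.0%}",
    "prior_count": p,
    "current_count": c,
    "taxonomy_version": "harm-taxonomy-v3",
    "source_systems": sorted({r["source"] for r in prior + current}),
    "dedup_logic": "distinct case_id where ai_attributable",
    "severity_rule": "sev-rule-v2",
    "customer_impact_classification": "impact-class-v1",
    "query_version": "v1",
    "owner_signoff": None,   # still open: sign-off is part of the lineage
    "action_status": "open",
}
print(statement["claim"])  # AI customer harm incidents changed -20%
```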

3.4 2026 SR 26-2 Nuance

SR 26-2, issued in 2026 by the Federal Reserve with OCC and FDIC alignment, supersedes SR 11-7 and SR 21-8. For AI MI:

  • It is risk-based and materiality-driven, not a uniform validation checklist.
  • It is most relevant to banking organizations over $30B in total assets, while smaller firms may still use it as sound-practice reference.
  • It narrows model scope around complex quantitative methods producing quantitative estimates.
  • It explicitly leaves generative AI and agentic AI outside formal scope because they are novel and rapidly evolving.
  • The carve-out is not a governance free pass; broader AI risk governance still needs telemetry, controls, incidents, evidence and MI.

Implication:

Board MI should separate traditional model-risk MI, non-generative AI MI, and GenAI / agentic AI MI, while still giving directors one consolidated view of AI value, risk, controls, incidents and concentration.
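One way to keep the three layers separate while still producing a consolidated view is to tag each inventory entry with a reporting layer. The sketch below assumes hypothetical inventory flags (`generative`, `agentic`, `quantitative_estimate`) and example systems; the layer rule is an illustrative reading of the split above, not wording from SR 26-2, and actual scope decisions belong to the model risk function.

```python
from collections import Counter

# Hypothetical inventory flags; the rule is an illustrative reading of the
# three-layer split, not text from SR 26-2.
def reporting_layer(system: dict) -> str:
    if system["generative"] or system["agentic"]:
        return "genai_agentic"           # governed via broader enterprise AI controls
    if system["quantitative_estimate"]:
        return "traditional_model_risk"  # candidate for SR 26-2 scope review
    return "non_generative_ai"           # model-risk principles applied where relevant

inventory = [
    {"name": "Credit PD model", "generative": False, "agentic": False, "quantitative_estimate": True},
    {"name": "Email intent classifier", "generative": False, "agentic": False, "quantitative_estimate": False},
    {"name": "Customer service RAG", "generative": True, "agentic": False, "quantitative_estimate": False},
    {"name": "AML copilot", "generative": True, "agentic": False, "quantitative_estimate": False},
]

# Separate layers per system, but one consolidated count for the board.
layers = {s["name"]: reporting_layer(s) for s in inventory}
consolidated = dict(Counter(layers.values()))
print(consolidated)
```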


4. Architecture Diagram

AI systems and workflows
  -> model gateway / agent gateway telemetry
  -> RAG and knowledge source logs
  -> tool action and workflow event logs
  -> control tests, evals, red-team, QA samples
  -> incidents, complaints, customer harm, appeals
  -> cost, adoption, value, finance baseline
  -> vendor, model, dependency and inventory registry
  -> MI data product layer
       - metric contracts
       - lineage graph
       - quality rules
       - threshold and risk appetite rules
  -> management dashboards
  -> board / audit committee pack
  -> decision, action, attestation and evidence closure

| Principle | Design implication |
|---|---|
| Report from systems of record | A board number should never be created for the first time in a slide deck. |
| Separate metric logic from presentation | Dashboard, board pack and audit extracts reuse the same metric contracts. |
| Lineage before polish | A beautiful red/amber/green chart without lineage is weak MI. |
| Thresholds are controls | Thresholds need an owner, rationale, review cadence and exception process. |
| Action log is part of MI | Reporting a red metric without action ownership is incomplete control operation. |
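The "separate metric logic from presentation" principle, sketched in Python: one function computes numerator and denominator, and two hypothetical renderers (a dashboard tile and an audit extract) only format its output. Function names and output formats are assumptions.

```python
# One metric function (the contract logic) reused by two hypothetical renderers.
def unsupported_claim_counts(reviews):
    """Return (numerator, denominator) for regulated-topic QA reviews."""
    regulated = [r for r in reviews if r["regulated_topic"]]
    unsupported = [r for r in regulated if not r["supported_by_source"]]
    return len(unsupported), len(regulated)

def dashboard_tile(num, den):
    # Presentation only: formatting, no metric logic.
    if den == 0:
        return "Unsupported claims: no regulated-topic reviews"
    return f"Unsupported claims: {num / den:.1%} ({num}/{den})"

def audit_extract(num, den, query_version):
    # Same numbers, different shape, plus the version needed for lineage.
    return {"metric_id": "unsupported_claim_rate", "numerator": num,
            "denominator": den, "query_version": query_version}

reviews = [{"regulated_topic": True, "supported_by_source": i != 0} for i in range(50)]
reviews.append({"regulated_topic": False, "supported_by_source": False})

num, den = unsupported_claim_counts(reviews)
print(dashboard_tile(num, den))          # Unsupported claims: 2.0% (1/50)
print(audit_extract(num, den, "v1"))
```

Because both renderers consume the same `(num, den)` pair, a change to the metric definition shows up in the dashboard and the audit extract at the same time.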

5. Financial Retail Case

Scenario: A retail bank runs six AI systems.

| System | AI role | Main board concern |
|---|---|---|
| Customer service RAG | draft grounded answer for agent | wrong policy commitment, complaint, stale source |
| Credit memo assistant | summarize documents for underwriter | fair lending, explanation, unsupported recommendation |
| AML copilot | draft case summary and evidence narrative | missed suspicious activity, weak SAR evidence |
| Fraud triage assistant | prioritize cases for analyst | customer friction, false positives, fraud loss |
| Branch knowledge assistant | answer staff policy questions | inconsistent advice, outdated policy |
| AI platform gateway | shared control plane | shadow AI, auditability, vendor concentration |

| Board question | MI metric | Source / lineage |
|---|---|---|
| Are customers being harmed? | AI-attributable complaint rate, appeal overturn rate, remediation count | complaint system + case tags + AI exposure trace |
| Are controls working? | citation completeness, unsupported claim rate, HITL bypass count, control test pass rate | RAG logs + QA samples + workflow approval events |
| Are we getting value? | qualified value events, AHT reduction, backlog age, cost per resolved case | workflow events + finance baseline + AI cost ledger |
| Are we concentrated? | risk-weighted exposure by model/vendor/knowledge source | AI inventory + dependency graph + gateway routing |
| Are we adopting safely? | eligible workflow adoption, override reason mix, review burden | user telemetry + workflow eligibility + review queue |

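The concentration question can be sketched as a risk-weighted exposure share by vendor. Vendor assignments, risk weights (3 = customer-facing or regulated decision support, 1 = internal staff aid) and the amber/red share limits are all hypothetical values chosen for illustration; a real version would read them from the AI inventory and the approved risk appetite.

```python
from collections import defaultdict

# Hypothetical vendor assignments and risk weights for five of the case systems
# (the shared gateway is the routing layer, so it is left out here).
systems = [
    ("Customer service RAG", "vendor_a", 3),
    ("Credit memo assistant", "vendor_b", 3),
    ("AML copilot", "vendor_a", 3),
    ("Fraud triage assistant", "vendor_b", 2),
    ("Branch knowledge assistant", "vendor_a", 1),
]

def top_vendor_concentration(systems, amber_share=0.5, red_share=0.75):
    """Risk-weighted exposure share of the largest vendor, with an assumed red/amber/green status."""
    exposure = defaultdict(int)
    for _name, vendor, weight in systems:
        exposure[vendor] += weight
    total = sum(exposure.values())
    vendor, weight = max(exposure.items(), key=lambda kv: kv[1])
    share = weight / total
    if share > red_share:
        status = "red"
    elif share > amber_share:
        status = "amber"
    else:
        status = "green"
    return vendor, share, status

print(top_vendor_concentration(systems))  # vendor_a holds 7/12 of weighted exposure -> amber
```

Each system may look acceptable on its own approval, but the portfolio view shows one vendor carrying most of the weighted exposure, which is the "vendor concentration invisible" break from Section 2.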
Board slice:
Decision requested: approve limited scale of customer service RAG to two additional product lines.
Evidence: qualified value events 74%; unsupported claim rate 1.6%; source freshness SLA 99.4%; AI-attributable complaints flat to baseline; vendor concentration amber.
Management action: scale low-risk intents only, run cross-use-case regression before credit-card dispute policies, reduce model concentration before direct customer response.

6. PM / BA / Architect Checklist

| Role | Checklist |
|---|---|
| PM | Define decision supported by MI; tie every metric to scale/hold/stop/fund/remediate; reject vanity usage as board evidence. |
| BA | Write metric contracts; define event grain, inclusion/exclusion, source systems, threshold logic, exception flow and action fields. |
| Architect | Design telemetry, lineage, data product, access control, retention, dashboard integration and evidence export. |
| Risk partner | Define risk appetite, severity, escalation, residual risk owner and review cadence. |
| Model risk partner | Separate SR 26-2 in-scope traditional models from GenAI/agentic systems, while aligning inventories and reporting. |
| Internal audit partner | Validate report lineage, evidence integrity, source-of-record control and action closure. |

Minimum artifact pack: AI MI metric catalog; metric contracts; source-to-report lineage diagram; risk appetite threshold matrix; board dashboard sample; management action log; report validation checklist; quarterly attestation statement.

7. Code-Lite Experiment

Goal: build a tiny MI lineage prototype for a customer service RAG board metric.

Input tables:
  ai_trace(response_id, use_case_id, model_id, timestamp, source_doc_ids, ai_exposed)
  qa_review(response_id, supported_by_source, regulated_topic, reviewer_id, review_date)
  complaints(case_id, response_id, severity, ai_attributable, remediation_required)
  metric_contract(metric_id, numerator_rule, denominator_rule, threshold_green, threshold_red, owner)
Metric:
  unsupported_claim_rate =
    count(response_id where regulated_topic = true and supported_by_source = false)
    / count(response_id where regulated_topic = true and qa_review exists)
Lineage output:
  metric_id, reporting_period, source_tables, query_version,
  denominator_count, numerator_count, threshold_status, action_required
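A minimal Python version of the experiment, using in-memory lists of dicts in place of the four tables. Column names follow the spec above; the synthetic values, the `2026-Q1` period and the `v1` query version are made up. The join to `ai_trace` is an assumed step that keeps only reviews of traced, AI-exposed responses.

```python
# Synthetic tables as lists of dicts; column names follow the spec, values are made up.
ai_trace = [
    {"response_id": f"R{i:03d}", "use_case_id": "cs_rag", "model_id": "model_x",
     "timestamp": "2026-03-15T10:00:00", "source_doc_ids": [f"D{i % 7}"], "ai_exposed": True}
    for i in range(50)
]
# 20 QA reviews: the first 16 on regulated topics, one of them (R003) unsupported.
qa_review = [
    {"response_id": f"R{i:03d}", "supported_by_source": i != 3,
     "regulated_topic": i < 16, "reviewer_id": "qa_01", "review_date": "2026-03-31"}
    for i in range(20)
]
complaints = [
    {"case_id": "K1", "response_id": "R003", "severity": "high", "ai_attributable": True, "remediation_required": True},
    {"case_id": "K2", "response_id": "R010", "severity": "low", "ai_attributable": True, "remediation_required": False},
    {"case_id": "K3", "response_id": "R041", "severity": "medium", "ai_attributable": True, "remediation_required": True},
]
metric_contract = {
    "metric_id": "unsupported_claim_rate",
    "numerator_rule": "regulated_topic AND NOT supported_by_source",
    "denominator_rule": "regulated_topic AND qa_review exists",
    "threshold_green": 0.02,
    "threshold_red": 0.03,
    "owner": "Customer operations QA owner + AI product owner",
}

def threshold_status(rate, green, red):
    if rate is None:
        return "no_data"
    if rate <= green:
        return "green"
    return "amber" if rate <= red else "red"

def unsupported_claim_lineage(contract, period, query_version):
    # Assumed join: keep only reviews of responses that ai_trace marks as AI-exposed.
    traced = {t["response_id"] for t in ai_trace if t["ai_exposed"]}
    in_scope = [r for r in qa_review if r["regulated_topic"] and r["response_id"] in traced]
    numerator = sum(1 for r in in_scope if not r["supported_by_source"])
    denominator = len(in_scope)
    rate = numerator / denominator if denominator else None
    status = threshold_status(rate, contract["threshold_green"], contract["threshold_red"])
    return {
        "metric_id": contract["metric_id"],
        "reporting_period": period,
        "source_tables": ["ai_trace", "qa_review", "metric_contract"],
        "query_version": query_version,
        "denominator_count": denominator,
        "numerator_count": numerator,
        "threshold_status": status,
        "action_required": status in ("amber", "red"),
    }

harm_count = sum(1 for c in complaints if c["ai_attributable"])
row = unsupported_claim_lineage(metric_contract, "2026-Q1", "v1")
print(row)
print("AI-attributable complaints:", harm_count)
```

With only 16 regulated-topic reviews in the sample, a single unsupported answer moves the rate to 6.25% and past the red band, so the QA sample size arguably belongs in the metric contract alongside the thresholds.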

Experiment steps:

  • Create 50 synthetic response records.
  • Create 20 QA review records.
  • Add 3 AI-attributable complaint examples.
  • Calculate unsupported_claim_rate and the harm count.
  • Generate a one-page board tile.
  • Change the denominator rule and show the lineage version change.

Learning standard:

You can explain exactly why the board number changed: business reality, metric definition, data quality, source-system delay or threshold update.
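The learning standard can be partly automated by diffing two lineage rows. A hedged sketch: the attribution rules, and the `threshold_version`, `late_source_rows` and `failed_quality_rules` fields, are assumptions layered on the lineage output defined in the experiment spec. Business reality is only claimed when none of the other four causes is present, because otherwise the causes are confounded.

```python
# Hypothetical attribution rules over two lineage rows for the same metric.
def explain_change(prev, curr):
    reasons = []
    if prev["query_version"] != curr["query_version"]:
        reasons.append("metric definition")
    if prev.get("threshold_version") != curr.get("threshold_version"):
        reasons.append("threshold update")
    if curr.get("late_source_rows", 0) > 0:
        reasons.append("source-system delay")
    if curr.get("failed_quality_rules"):
        reasons.append("data quality")
    counts_moved = ((prev["numerator_count"], prev["denominator_count"])
                    != (curr["numerator_count"], curr["denominator_count"]))
    if counts_moved and not reasons:
        # Only attribute to business reality when no other cause is present.
        reasons.append("business reality")
    return reasons or ["no change"]

q1_v1 = {"query_version": "v1", "threshold_version": "t1", "numerator_count": 1, "denominator_count": 16}
# Same period re-run with the denominator widened to all reviewed responses (query v2).
q1_v2 = {"query_version": "v2", "threshold_version": "t1", "numerator_count": 1, "denominator_count": 20}
# Next period, same definitions.
q2_v1 = {"query_version": "v1", "threshold_version": "t1", "numerator_count": 0, "denominator_count": 18}

print(explain_change(q1_v1, q1_v2))  # ['metric definition']
print(explain_change(q1_v1, q2_v1))  # ['business reality']
```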


8. Interview Questions

Q1: How is AI board MI different from AI governance material?

30-second version:

Governance material defines oversight, roles and decision rights. MI architecture defines the facts feeding that oversight: metric contracts, telemetry sources, lineage, thresholds, cadence and action logs. Without MI architecture, board governance becomes narrative instead of evidence-based supervision.

2-minute version:

I separate the board governance pack from the MI architecture. The governance pack says which committees oversee material AI and what decisions they make. MI architecture says how each answer is produced: source systems, metric definition, data quality rule, lineage, risk appetite threshold, owner, reporting cadence and management action. For customer service RAG, the board should not only see unsupported claim rate; it should know the denominator, QA sampling method, source logs, threshold, trend, action owner and whether the metric supports scale or hold.

Q2: What makes a board AI metric decision-useful?

30-second version:

A board metric is decision-useful when it connects to a management action: scale, hold, stop, fund, remediate or accept residual risk. It must have a clear definition, owner, threshold, lineage and escalation path.

Q3: How would you handle SR 26-2 in AI board reporting?

30-second version:

I would not claim SR 26-2 directly governs every GenAI or agentic system. I would report three layers: traditional model-risk systems in scope, non-generative AI where model-risk principles apply, and GenAI/agentic systems governed through broader enterprise AI controls. The board still needs a consolidated AI risk and value view.


9. Pitfalls

| Pitfall | Why it is dangerous | Better practice |
|---|---|---|
| Board pack as slide assembly | Numbers cannot be traced or audited | Build MI data products and metric contracts |
| Usage as value | High usage can mean rework or poor UX | Use qualified value events and outcome evidence |
| Controls without effectiveness metrics | "Control exists" is not evidence | Define pass rate, sample result, exception aging |
| Thresholds without risk appetite | Red/amber/green becomes arbitrary | Link thresholds to approved appetite and stop rules |
| Incident counts without harm taxonomy | Customer impact is hidden | Classify severity, harm, remediation and AI attribution |
| Mixing GenAI with SR 26-2 models carelessly | Scope confusion and weak assurance | Separate scope while aligning inventories and reporting |
| No action log | Reporting does not drive control improvement | Every amber/red signal has owner, due date and closure evidence |
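For the "controls without effectiveness metrics" pitfall, a sketch of two effectiveness measures named in the table: pass rate over control test samples and aging of open exceptions. The sample data, field names, and the 30-day aging limit are assumptions.

```python
from datetime import date

# Hypothetical test samples for one control (e.g. a citation completeness check):
# every 10th sample fails, giving 4 failures out of 40.
samples = [{"sample_id": i, "passed": i % 10 != 0} for i in range(40)]
exceptions = [
    {"exception_id": "E1", "opened": date(2026, 1, 5), "closed": None},
    {"exception_id": "E2", "opened": date(2026, 2, 20), "closed": date(2026, 3, 1)},
    {"exception_id": "E3", "opened": date(2026, 3, 10), "closed": None},
]

def control_effectiveness(samples, exceptions, as_of, max_age_days=30):
    """Pass rate plus aging of open exceptions; the 30-day limit is an assumption."""
    pass_rate = sum(s["passed"] for s in samples) / len(samples)
    open_ages = {e["exception_id"]: (as_of - e["opened"]).days
                 for e in exceptions if e["closed"] is None}
    aged = sorted(k for k, age in open_ages.items() if age > max_age_days)
    return {"pass_rate": pass_rate, "sample_size": len(samples),
            "open_exception_ages": open_ages, "aged_beyond_limit": aged}

result = control_effectiveness(samples, exceptions, as_of=date(2026, 3, 31))
print(result)
```

A 90% pass rate alone can look acceptable; the 85-day-old open exception E1 is the signal that the control's failures are not being worked.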
Final memory card:
AI Board MI = telemetry + metric contract + lineage + threshold + action + decision.
The board does not need more AI activity reporting.
It needs evidence that value, risk, control effectiveness, customer harm, concentration and adoption are within appetite or being acted on.