AI Management Information / Board Reporting Architecture Playbook
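Intended audience: CBAP-level Financial Retail PM / Senior BA / AI Product Architect / Enterprise Architect / Risk Product Lead / Model Risk Partner / Board Reporting Lead.
Goal: turn AI telemetry, business value, risk appetite, control effectiveness, incidents, customer harm, adoption, cost, vendor concentration and management actions into traceable, verifiable, decision-ready Management Information.
Core view: the board pack is the presentation layer; the MI architecture is the operating system of metric contracts, lineage, thresholds, data quality, cadence, RACI, validation and action log.
Scope note: this document is learning, portfolio and architecture design material. It does not constitute legal, regulatory, audit, model validation, financial confirmation or board governance advice. For formal projects, the applicable requirements must be confirmed jointly by business owner, risk, model risk, legal, compliance, privacy, security, finance, data, architecture, operations and internal audit.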
1. Executive Framing
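A weak AI board update typically reports use case counts, model call volume, satisfaction scores, estimated hours saved and "no major incidents". What the board and audit committee actually need to know is: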
Which AI systems are material?
What value is proven, not estimated?
Which risks are outside appetite?
Which controls are working or failing?
What customer harm occurred?
Where are model/vendor/data dependencies concentrated?
What management actions are overdue?
What decision is required this quarter?
The goal of the AI MI architecture:
AI fact capture
-> metric contract
-> lineage and quality control
-> risk appetite threshold
-> management view
-> board / audit committee decision
-> action log and evidence closure
Governance defines who should challenge and decide. MI architecture defines what facts they can rely on, where those facts came from, and what action follows.
| Principle | Practical meaning |
|---|---|
| Decision-first | every metric must support approve, scale, hold, stop, fund, remediate, accept risk, escalate or retire. |
| Lineage-first | every board number must trace back to system-of-record facts, not slide calculations. |
| Risk-appetite-aware | red/amber/green thresholds must map to appetite, tolerance, stop rule or escalation trigger. |
| Balanced scorecard | value, adoption, risk, control, incident, harm, cost, resilience and concentration are reported together. |
| Action-linked | amber/red without owner, due date and closure evidence is incomplete MI. |
| Cadence-fit | severe incidents and appetite breaches do not wait for quarterly board packs. |
| Audit-ready | audit can reconstruct selected numbers, actions and attestations without relying on oral explanation. |

Executive narrative example:
This quarter, management reports 22 registered AI systems, of which 7 are material.
Portfolio residual risk is Medium with one Red appetite breach in vendor concentration.
No severe AI incident occurred; two medium customer-service harm events were contained and remediated.
Qualified AI value events increased 18%, but only three systems have finance-recognized benefits.
Decision requested: approve targeted platform funding for evidence automation and require remediation before expanding customer-facing GenAI scope.
2. Source Anchors
Access dates are recorded as 2026-06-30.
| Anchor | Official link | Use in this playbook |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses trustworthy-AI and AI risk lifecycle language to define the risk, impact, governance, measurement and management loop of AI MI. |
| NIST AIRC AI RMF functions | https://airc.nist.gov/airmf-resources/airmf/ | Uses Govern / Map / Measure / Manage to design the board MI taxonomy, metric owners and action loop. |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Uses AI management system, performance evaluation, management review and continual improvement to connect MI to the enterprise AIMS. |
| Federal Reserve SR 26-2 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm | Uses the risk-based, materiality, inventory, monitoring, governance and documentation language of the 2026 revised model-risk guidance to calibrate financial institution reporting. |
| OCC Model Risk Management Handbook legacy link | https://www.occ.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/index-model-risk-management.html | Legacy context: this path now redirects and should not be treated as the sole current basis; the current model-risk anchor should shift to SR 26-2 / OCC 2026 guidance. |

SR 26-2 nuance:
| Topic | 2026 implication for MI |
| --- | --- |
| Supersession | SR 26-2 supersedes SR 11-7 and SR 21-8, so board reports should not cite those as current primary anchors without context. |
| Risk-based approach | MI should show materiality, exposure, use, risk tier and proportional oversight, not uniform annual-review theater. |
| Institution relevance | It is most relevant to banking organizations over $30B in total assets, while smaller firms may still adopt it for complex model-risk exposure. |
| Model definition | Traditional model-risk MI should distinguish complex quantitative models from deterministic rules and software without statistical/economic/financial theory. |
| GenAI / agentic AI | The guidance explicitly excludes generative and agentic AI from formal scope because they are novel and rapidly evolving. |
| Not a free pass | Excluded systems still need enterprise governance, controls, telemetry, incidents, evidence and management reporting. |

Do not force every GenAI board metric into SR 26-2 terminology. Do create a parallel AI MI discipline that mirrors its rigor: inventory, materiality, controls, monitoring, incidents, concentration, action tracking and attestation.
| Source lens | MI artifact | Board-useful output |
| --- | --- | --- |
| NIST Govern | owners, policy, accountability, oversight cadence | who owns AI risk and what forum acts |
| NIST Map | use case inventory, impact map, dependency map | what systems and affected stakeholders are in scope |
| NIST Measure | evals, controls, telemetry, incidents, metrics | how management knows risk and value facts |
| NIST Manage | actions, remediation, exceptions, stop rules | what management is doing about amber/red signals |
| ISO 42001 | management review, performance evaluation, continual improvement | recurring AIMS report and improvement loop |
| SR 26-2 | model inventory, materiality, monitoring, validation, governance | model-risk-aligned evidence for quantitative models and reference discipline for broader AI |
3. MI Architecture
AI products / workflows
-> event and telemetry capture
- model / agent gateway
- RAG retrieval and citation logs
- tool action logs
- workflow and human approval events
- eval, QA, red-team and control test results
- incidents, complaints, appeals and remediation
- cost, adoption and value events
- vendor, model and dependency registry
-> MI data product layer
- canonical AI event model
- metric contracts
- data quality checks
- lineage graph
- threshold and appetite rules
- entitlement, retention and privacy controls
-> MI marts / dashboards
- executive AI portfolio
- board and audit committee pack
- risk appetite dashboard
- control effectiveness dashboard
- value realization dashboard
- incident and customer harm dashboard
- concentration and resilience dashboard
- management action log
Canonical AI MI event model:
| Entity | Minimum fields |
|---|---|
| AI system | system_id, name, owner, risk_tier, materiality, stage, approved_boundary, business_capability |
| AI exposure event | event_id, system_id, workflow_id, user_or_case_id, timestamp, AI_role, channel, eligible_flag |
| Model call | call_id, event_id, provider, model_id, model_version, route, latency, cost, fallback_flag |
| Retrieval event | retrieval_id, event_id, source_doc_id, source_version, effective_date, citation_status, ACL_decision |
| Tool action | action_id, event_id, tool_name, permission_scope, side_effect, approval_required, approval_status |
| Human review | review_id, event_id, reviewer_role, decision, override_reason, review_time, escalation_flag |
| Eval / QA result | result_id, system_id, metric_name, sample_id, pass_fail, severity, reviewer, evidence_link |
| Incident / harm | incident_id, system_id, severity, customer_impact, AI_attribution, containment_time, remediation_status |
| Value event | value_event_id, event_id, quality_pass, risk_pass, adoption_signal, business_outcome, finance_status |
| Dependency | dependency_id, type, provider, criticality, substitutability, concentration_group, owner |
| Management action | action_id, metric_id, issue, owner, due_date, status, evidence_link, closure_approver |

Data product responsibilities:
| Layer | Responsibility |
| --- | --- |
| Capture | instrument gateways, workflow systems, QA tools, incident systems and finance baselines |
| Contract | define canonical schemas, event grain, metric contracts and threshold metadata |
| Quality | validate completeness, freshness, uniqueness, referential integrity and threshold logic |
| Lineage | connect metric to source events, query version, transformation and report tile |
| Semantic | expose approved metric names, definitions, dimensions and aggregations |
| Reporting | generate management dashboards, board packs, audit extracts and action reports |
| Assurance | sample test, reconcile, attest and archive MI evidence |

Metric contract fields:
| Field | Required content |
| --- | --- |
| Metric ID and name | stable identifier and name used across dashboards and board packs |
| Decision purpose | scale, hold, stop, fund, remediate, accept risk, escalate |
| Definition | plain language definition |
| Numerator / denominator | exact inclusion rules and population |
| Grain and dimensions | response, case, customer, system, period; business line, risk tier, model, vendor |
| Source systems | authoritative systems and fallback sources |
| Refresh cadence | real time, daily, weekly, monthly, quarterly |
| Quality rules | completeness, freshness, duplication, reconciliation, sample validation |
| Thresholds | green, amber, red, breach and stop rule |
| Owner and evidence | business owner, risk owner, data owner, technology owner; query, sample, sign-off |
| Versioning | definition version, effective date, change approver |
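
A metric contract can be held as a typed record so that empty fields are caught before a metric reaches a board pack. The sketch below is illustrative only: the class name `MetricContract`, the helper `missing_fields` and the Python field names are assumptions that mirror the table above, not a prescribed schema. The sample values echo the lineage example in this playbook, except `decision_purpose`, `refresh_cadence` and `effective_date`, which are invented for the demo.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class MetricContract:
    """Illustrative metric contract; field names mirror the contract table."""
    metric_id: str
    name: str
    decision_purpose: str
    definition: str
    numerator: str
    denominator: str
    grain: str
    source_systems: tuple
    refresh_cadence: str
    quality_rules: tuple
    thresholds: dict
    owner: str
    definition_version: str
    effective_date: date
    change_approver: str


def missing_fields(contract: MetricContract) -> list:
    """Return names of fields left empty (blank string, empty tuple or dict)."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


# Hypothetical draft: quality rules and approver are deliberately left empty.
draft = MetricContract(
    metric_id="MI-AI-QUAL-003",
    name="Unsupported claim rate",
    decision_purpose="scale or hold",
    definition="share of sampled responses judged unsupported by approved source",
    numerator="sampled responses judged unsupported by approved source",
    denominator="regulated-topic AI-assisted responses sampled by QA",
    grain="response",
    source_systems=("model_gateway_trace", "rag_retrieval_log", "qa_review_tool"),
    refresh_cadence="monthly",
    quality_rules=(),
    thresholds={"green": "<= 2%", "amber": "> 2% and <= 3%", "red": "> 3%"},
    owner="Head of Customer Operations QA",
    definition_version="v1.2",
    effective_date=date(2026, 4, 1),
    change_approver="",
)
print(missing_fields(draft))  # ['quality_rules', 'change_approver']
```

A pre-issue check could refuse to publish any board KPI whose contract returns a non-empty list here.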
4. Metric Taxonomy
| Category | Core question | Example metrics |
|---|---|---|
| Portfolio scope | What AI systems are in scope? | registered AI systems, material systems, production systems, shadow AI detections |
| Value realization | Are we getting proven value? | qualified value events, finance-recognized benefit, cost per outcome, backlog reduction |
| Adoption | Is AI used in the right workflow? | eligible workflow adoption, repeat adoption, accepted output rate, override reason mix |
| Quality | Is AI behavior good enough? | groundedness, unsupported claim rate, citation correctness, task success, eval regression pass |
| Customer harm | Are customers harmed or remediated? | AI-attributable complaints, appeal overturns, remediation count, regulated-topic error |
| Control effectiveness | Are controls operating? | control pass rate, HITL bypass, source freshness, logging completeness, exception aging |
| Incidents | What went wrong? | severity count, containment time, repeat root cause, incident-to-action closure |
| Risk appetite | Are we within approved appetite? | red/amber/green by risk category, appetite breaches, accepted residual risk |
| Concentration | Are dependencies over-concentrated? | risk-weighted exposure by model/vendor/RAG/evidence stack, fallback coverage |
| Cost and resilience | Can we operate and recover? | cost per value event, p95 latency, SLO breach, kill switch time, recovery drill pass |
| Governance health | Is management acting? | overdue actions, expired exceptions, attestation coverage, audit finding aging |

Value metrics should replace vanity metrics:
| Weak metric | Strong MI metric |
| --- | --- |
| model calls | qualified AI-assisted workflow completions |
| generated summaries | accepted summaries with no critical QA defect |
| users activated | eligible users reaching repeat workflow adoption |
| estimated hours saved | finance-recognized capacity release or SLA improvement |
| features shipped | use cases passing release, adoption and benefits gate |
AI exposure
-> accepted AI output
-> quality and risk pass
-> workflow outcome improvement
-> cost within boundary
-> finance / business owner recognition
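The funnel above can be read as a chain of gates: an exposure only counts as a qualified value event if every gate passes, and finance recognition is counted only among qualified events. The following is a minimal sketch under that reading; the `ValueEvent` fields, the `cost_cap` parameter and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ValueEvent:
    accepted: bool            # a human accepted the AI output
    quality_pass: bool
    risk_pass: bool
    outcome_improved: bool    # workflow outcome better than the agreed baseline
    cost: float               # cost attributed to this event
    finance_recognized: bool


def is_qualified(event: ValueEvent, cost_cap: float) -> bool:
    """An exposure qualifies only if every gate in the funnel passes."""
    return (
        event.accepted
        and event.quality_pass
        and event.risk_pass
        and event.outcome_improved
        and event.cost <= cost_cap
    )


def funnel(events: list, cost_cap: float) -> dict:
    """Count exposures, qualified events and finance-recognized qualified events."""
    qualified = [e for e in events if is_qualified(e, cost_cap)]
    return {
        "exposures": len(events),
        "qualified": len(qualified),
        "finance_recognized": sum(e.finance_recognized for e in qualified),
    }


# Hypothetical events: the second fails QA, the third exceeds the cost cap.
events = [
    ValueEvent(True, True, True, True, 0.40, True),
    ValueEvent(True, False, True, True, 0.35, False),
    ValueEvent(True, True, True, True, 1.20, False),
    ValueEvent(True, True, True, True, 0.50, False),
]
print(funnel(events, cost_cap=1.00))
# {'exposures': 4, 'qualified': 2, 'finance_recognized': 1}
```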
Risk, control, harm and concentration metrics:
| Area | Metric | Decision use |
|---|---|---|
| Hallucination | unsupported claim rate | scale/hold for RAG and assistant systems |
| Wrong decision support | unsupported recommendation rate | restrict scope or add review |
| Fairness / conduct | segment error or appeal overturn disparity | risk committee challenge |
| Privacy | restricted data exposure count | immediate containment |
| Security | prompt injection successful exploit rate | release block or control remediation |
| Policy drift | superseded document citation rate | knowledge governance remediation |
| Control operation | HITL bypass, source freshness, trace completeness | auditability and scale readiness |
| Customer harm | AI-attributable complaints, remediation count, appeal overturn | board harm and remediation view |
| Concentration | top model/vendor exposure, shared evidence dependency, fallback coverage | portfolio risk appetite and diversification |

Adoption metrics must distinguish active usage from workflow adoption:
| Metric | Better definition |
| --- | --- |
| active users | eligible users who used AI in the target workflow at least N times |
| repeat adoption | users who use AI in target workflow across multiple periods |
| accepted output rate | outputs accepted or lightly edited after quality review |
| override reason mix | why humans reject or change AI output |
| review burden | added review time and queue load caused by AI |
| abandonment | eligible cases where AI was available but intentionally bypassed |
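
To make the first two definitions concrete, the sketch below computes active users and repeat adopters from per-period usage counts. The data shape (user, then period, then count of uses in the target workflow), the function names and the sample values are assumptions; "multiple periods" is interpreted here as at least two periods with any use.

```python
def active_users(usage, eligible, period, min_uses):
    """Eligible users with at least min_uses in the target workflow for a period."""
    return {u for u in eligible if usage.get(u, {}).get(period, 0) >= min_uses}


def repeat_adopters(usage, eligible, min_periods=2):
    """Eligible users who used AI in the target workflow in min_periods or more periods."""
    return {
        u
        for u in eligible
        if sum(1 for n in usage.get(u, {}).values() if n > 0) >= min_periods
    }


# Hypothetical usage counts per user and month.
usage = {
    "u1": {"2026-04": 5, "2026-05": 4},
    "u2": {"2026-05": 1},
    "u3": {"2026-04": 9, "2026-05": 7},  # heavy user, but not eligible
}
eligible = {"u1", "u2"}

print(active_users(usage, eligible, "2026-05", min_uses=3))  # {'u1'}
print(repeat_adopters(usage, eligible))                      # {'u1'}
```

Filtering on the eligible population first is the design point: `u3` is the heaviest user in the raw data but does not count toward workflow adoption.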
5. Data Lineage
Business event
-> AI exposure event
-> model / RAG / tool trace
-> human or system decision
-> QA / eval / incident / outcome record
-> metric calculation
-> dashboard tile
-> board statement
-> management action
-> closure evidence
| Question | Expected answer |
|---|---|
| What is the event grain? | response, case, customer, transaction, model call, value event |
| What source systems feed it? | named systems, tables, APIs, owners |
| What is the calculation version? | query version, semantic-layer metric version |
| What population is excluded? | exclusions by risk tier, channel, geography, test traffic, missing QA |
| How is data quality checked? | completeness, freshness, uniqueness, reconciliation, sample review |
| Who signed off? | business, risk, data and technology owners |
| What changed since last period? | metric definition, threshold, source, product scope, model version |
| Can audit replay a sample? | yes, with trace, evidence and action links |

Example lineage:
Board tile: Unsupported claim rate for Customer Service RAG = 1.6%, Green.
metric_id = MI-AI-QUAL-003
definition_version = v1.2
source systems = model_gateway_trace, rag_retrieval_log, qa_review_tool
denominator = regulated-topic AI-assisted responses sampled by QA
numerator = sampled responses judged unsupported by approved source
quality checks = 99.2% trace completeness, 100% QA reviewer assigned, 0 duplicate response_id
threshold = Green <= 2%, Amber > 2% and <= 3%, Red > 3%
owner = Head of Customer Operations QA
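The status on a board tile should be computed from the contract thresholds, not typed into a slide. A minimal sketch of that calculation for the example above follows; the function names are assumptions, and the sample counts (16 unsupported out of 1,000 sampled) are hypothetical values chosen to reproduce the 1.6% figure.

```python
def unsupported_claim_rate(unsupported: int, sampled: int) -> float:
    """Numerator over denominator, with basic sanity checks on the population."""
    if sampled <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= unsupported <= sampled:
        raise ValueError("numerator must be between 0 and the denominator")
    return unsupported / sampled


def rag_status(rate: float, green_max: float = 0.02, amber_max: float = 0.03) -> str:
    """Map a rate to Green / Amber / Red using the thresholds in the example."""
    if rate <= green_max:
        return "Green"
    if rate <= amber_max:
        return "Amber"
    return "Red"


rate = unsupported_claim_rate(unsupported=16, sampled=1000)
print(f"{rate:.1%}", rag_status(rate))  # 1.6% Green
```

Boundary values follow the thresholds as written: exactly 2% is Green and exactly 3% is Amber.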
| Data quality rule | Example threshold |
|---|---|
| Trace completeness | >= 99% material AI events contain system_id, model_id, timestamp, user/case id, risk tier |
| Source freshness | >= 99% regulated policy documents indexed within SLA |
| Unique event ID | 100% uniqueness for event_id in reporting period |
| Metric denominator reconciliation | dashboard denominator reconciles to workflow system within 1% |
| Incident severity completeness | 100% incidents have severity, attribution, containment and owner |
| Action log completeness | 100% amber/red metrics have action owner or documented risk acceptance |
| Sample evidence availability | >= 95% sampled board metrics have replayable evidence |
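
Three of the rules above (trace completeness, unique event ID and denominator reconciliation) are simple enough to sketch directly. The example below is a hedged illustration: the event dictionaries, the `REQUIRED_FIELDS` tuple and the function names are assumptions, and a real implementation would run against the MI data product layer, not in-memory lists.

```python
from collections import Counter

# Assumed required fields, taken from the trace completeness rule above.
REQUIRED_FIELDS = ("system_id", "model_id", "timestamp", "case_id", "risk_tier")


def trace_completeness(events):
    """Share of events where every required field is present and non-empty."""
    if not events:
        raise ValueError("no events to check")
    complete = sum(
        all(e.get(k) not in (None, "") for k in REQUIRED_FIELDS) for e in events
    )
    return complete / len(events)


def duplicate_event_ids(events):
    """Return event_id values that appear more than once in the period."""
    counts = Counter(e["event_id"] for e in events)
    return {eid for eid, n in counts.items() if n > 1}


def denominator_reconciles(dashboard_count, workflow_count, tolerance=0.01):
    """True when the dashboard denominator is within tolerance of the workflow system."""
    if workflow_count == 0:
        return dashboard_count == 0
    return abs(dashboard_count - workflow_count) / workflow_count <= tolerance


# Hypothetical events: the second lacks model_id, the third reuses an event_id.
events = [
    {"event_id": "e1", "system_id": "s1", "model_id": "m1",
     "timestamp": "2026-06-01T09:00:00Z", "case_id": "c1", "risk_tier": "High"},
    {"event_id": "e2", "system_id": "s1", "model_id": "",
     "timestamp": "2026-06-01T09:05:00Z", "case_id": "c2", "risk_tier": "High"},
    {"event_id": "e2", "system_id": "s1", "model_id": "m1",
     "timestamp": "2026-06-01T09:06:00Z", "case_id": "c3", "risk_tier": "High"},
]
print(trace_completeness(events) >= 0.99)     # False
print(duplicate_event_ids(events))            # {'e2'}
print(denominator_reconciles(9_950, 10_000))  # True
```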

| Lineage failure | Symptom |
| --- | --- |
| slide-only calculation | number exists only in spreadsheet |
| denominator drift | scope changes but trend line continues |
| late incident capture | quarter-end report misses events |
| inconsistent system IDs | same AI use case has multiple names |
| no action lineage | red metric closed informally |
6. Risk Appetite Dashboard
| Appetite category | Board question | Status example |
|---|---|---|
| Customer harm | Are AI systems causing unacceptable customer harm? | Amber: two medium contained events |
| Regulated decision boundary | Is AI making or implying unapproved final decisions? | Green: 0 boundary bypasses |
| Control effectiveness | Are key controls operating within threshold? | Amber: source freshness breach |
| Incident severity | Any severe or repeated incidents? | Green: 0 severe, 2 medium |
| Vendor/model concentration | Is dependency concentration acceptable? | Red: one model supports 72% material GenAI exposure |
| Auditability | Can management reconstruct AI-assisted decisions? | Amber: 94% sample reconstructability |
| Value realization | Are benefits proven enough to scale? | Amber: 3 of 7 material systems have recognized benefit |
| Cost/resilience | Is cost and performance inside boundary? | Green: cost per case under cap, p95 latency stable |

| Metric | Green | Amber |
| --- | ---: | ---: |
| Severe AI incidents | 0 | not applicable |
| Unsupported claim rate | <= 2% | > 2% and <= 3% |
| HITL bypass count | 0 | 1 contained low-risk bypass |
| Audit trace completeness | >= 99% | 95% to 98.9% |
| Vendor concentration | within approved exposure | exceeds soft limit with fallback |
| Finance-recognized benefit | >= 80% target | 50% to 79% target |

Risk appetite statement:
The institution has no appetite for unapproved fully automated final decisions in credit, AML, fraud or wealth workflows; severe customer harm without immediate containment; restricted data exposure to unapproved AI tools; expansion of material AI where audit trace completeness is below 95%; or expired high-risk exceptions without risk committee review.
The institution has limited appetite for contained medium incidents, controlled pilots under stop rules, and temporary concentration risk where fallback and exit roadmap are funded.
Dashboard view:
| Category | Metric | Q1 | Q2 | Status | Action |
|---|---|---|---|---|---|
| Customer harm | AI-attributable medium incidents | 1 | 2 | Amber | Close source freshness remediation |
| Decision boundary | final decision bypass | 0 | 0 | Green | Continue monitoring |
| Control | source freshness SLA | 99.1% | 97.8% | Amber | Daily regulated-policy refresh |
| Auditability | trace completeness | 98.2% | 99.1% | Green | Maintain threshold |
| Concentration | top model material exposure | 61% | 72% | Red | Approve diversification roadmap |
| Value | systems with finance-recognized benefit | 2/6 | 3/7 | Amber | Benefits contract for remaining 4 |
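
The concentration row depends on how exposure is aggregated per model. One possible calculation is sketched below, assuming each material system carries a single `model_id` and a numeric `exposure` weight (for example a risk-weighted interaction volume). The system names, weights and function names are hypothetical and were chosen only so the result matches the 72% shown in the dashboard; the appetite limit itself is not encoded because this playbook does not state its value.

```python
from collections import defaultdict


def exposure_share_by_model(systems):
    """Sum exposure per model_id and return each model's share of the total."""
    totals = defaultdict(float)
    for s in systems:
        totals[s["model_id"]] += s["exposure"]
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total exposure must be positive")
    return {model: value / grand_total for model, value in totals.items()}


def top_model(shares):
    """Return (model_id, share) for the most concentrated model."""
    model = max(shares, key=shares.get)
    return model, shares[model]


# Hypothetical register extract with assumed risk-weighted exposure per system.
systems = [
    {"system_id": "customer-service-rag", "model_id": "model-a", "exposure": 40},
    {"system_id": "aml-copilot", "model_id": "model-a", "exposure": 32},
    {"system_id": "credit-memo-assistant", "model_id": "model-b", "exposure": 28},
]
model, share = top_model(exposure_share_by_model(systems))
print(model, f"{share:.0%}")  # model-a 72%
```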
7. Board Pack Templates
Quarterly AI MI cover page example:
# Quarterly AI Management Information Report
Reporting period: Q2 2026
Prepared for: Board Risk Committee / Audit Committee
Decision requested: approve funding for evidence automation and model diversification.
Executive conclusion: Portfolio residual AI risk is Medium, with one Red concentration breach. Material AI systems: 7. Severe incidents: 0. Medium AI-attributable customer harm events: 2, contained and under remediation. Finance-recognized benefits: 3 systems. Critical open actions: 1 overdue high action with approved extension.
Decisions required: approve platform funding; require remediation before expanding customer-facing GenAI to regulated fee and dispute categories; endorse updated concentration limit.
Portfolio summary:
| System | Stage | Risk tier | AI role | Value | Risk | Control | Decision |
|---|---|---|---|---|---|---|---|
| Customer Service RAG | Production | High | draft answer | Green | Amber | Amber | Scale low-risk intents only |
| Credit Memo Assistant | Pilot | High | draft memo | Amber | High | Amber | Hold before scale |
| AML Copilot | Limited release | High | draft case summary | Green | Medium | Green | Expand to second low-risk queue |
| Fraud Triage | Pilot | High | prioritize | Amber | High | Amber | Continue pilot with appeal monitoring |
| Branch Knowledge | Production | Medium | answer staff | Green | Medium | Green | Continue |
| AI Gateway | Production | Enabling control | route/log/control | Green | Medium | Green | Fund evidence export |

Board decision memo example:
Decision requested: approve limited expansion of Customer Service RAG to two additional low-risk product lines.
Evidence: qualified value events 74%; AHT reduction 18%; unsupported claim rate 1.6%; AI-attributable complaints flat to baseline; source freshness SLA 97.8% amber; top model exposure 72% red.
Conditions: no expansion to regulated fee, dispute or credit explanation categories until source freshness and cross-use-case regression pass; model diversification roadmap funded; weekly MI for 8 weeks.
Stop triggers: unsupported claim rate above 3%; any severe AI-attributable customer harm; HITL bypass in regulated topics; trace completeness below 95%.
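The stop triggers in this memo are explicit enough to evaluate mechanically against each weekly MI snapshot. The sketch below assumes a simple `WeeklyMI` record with one field per trigger; the class, field names and sample values are illustrative, while the thresholds are the ones stated in the memo.

```python
from dataclasses import dataclass


@dataclass
class WeeklyMI:
    unsupported_claim_rate: float   # as a fraction, e.g. 0.016 for 1.6%
    severe_harm_events: int         # severe AI-attributable customer harm events
    regulated_hitl_bypasses: int    # HITL bypasses in regulated topics
    trace_completeness: float       # as a fraction, e.g. 0.991 for 99.1%


def tripped_stop_triggers(mi: WeeklyMI) -> list:
    """Return the label of every stop trigger breached by this snapshot."""
    checks = [
        (mi.unsupported_claim_rate > 0.03, "unsupported claim rate above 3%"),
        (mi.severe_harm_events > 0, "severe AI-attributable customer harm"),
        (mi.regulated_hitl_bypasses > 0, "HITL bypass in regulated topics"),
        (mi.trace_completeness < 0.95, "trace completeness below 95%"),
    ]
    return [label for tripped, label in checks if tripped]


healthy = WeeklyMI(0.016, 0, 0, 0.991)
breached = WeeklyMI(0.035, 0, 1, 0.94)
print(tripped_stop_triggers(healthy))   # []
print(tripped_stop_triggers(breached))
# ['unsupported claim rate above 3%', 'HITL bypass in regulated topics',
#  'trace completeness below 95%']
```

Returning the list of breached triggers, not a single boolean, keeps the reason for a stop visible in the action log.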
Audit committee evidence requests:
| Evidence request | Required MI artifact |
|---|---|
| Show material AI inventory | AI system register with materiality, owner, risk tier and stage |
| Reconstruct sampled board metric | metric contract, lineage, source data, query version, dashboard tile |
| Prove control operation | control test result, sample evidence, owner sign-off, exception log |
| Prove incidents were reported completely | incident taxonomy, case list, severity assignment, late adjustment rule |
| Prove customer harm remediation | complaint link, remediation amount/action, customer communication, closure approval |
| Prove management action closure | action log, evidence link, closure approver, residual risk update |
| Prove vendor concentration | dependency graph, risk-weighted exposure, fallback evidence |

Board question bank:
| Board question | Strong MI answer pattern |
| --- | --- |
| Are we within AI risk appetite? | "Three of eight appetite categories are green, four are amber and one is red. The red is model concentration, with a funded remediation decision requested today." |
| How do we know the numbers are reliable? | "Each board metric has a contract, owner, source lineage, quality checks and audit sample pack. Definition changes are versioned." |
| What harm did AI cause customers? | "Two medium AI-attributable harm events occurred, both due to stale policy citations, affecting 31 customers, with no financial loss and remediation completed." |
| Are controls working? | "Key controls are tested through QA, logs and workflow samples. Citation correctness and HITL approval are green; source freshness is amber." |
| Are we getting value? | "Three systems have finance-recognized benefits. Others remain in pilot or lack sufficient adoption evidence, so no scale decision is requested for them." |
| Where are we over-concentrated? | "One model supports 72% of material GenAI exposure and the evidence path for three systems. This is above hard appetite without additional fallback." |
8. Management Action Log
Management reporting is incomplete if it only shows status.
signal -> issue classification -> management action -> owner and due date
-> evidence of completion -> validation -> residual risk update -> closure or escalation
| Field | Description |
|---|---|
| action_id | stable ID linked to metric, incident, audit finding or risk acceptance |
| source_signal | red/amber metric, incident, audit finding, exception expiry, board challenge |
| issue and severity | clear issue statement with critical, high, medium or low severity |
| owner and due_date | accountable executive or delegate and risk-aligned date |
| status | open, blocked, pending validation, closed, escalated |
| evidence_required | concrete closure evidence |
| validation_owner | risk, audit, data, technology or business validator |
| residual_risk | updated after action |
| escalation_path | forum and date if overdue |

Sample action log:
| ID | Source |
| --- | --- |
| ACT-001 | Source freshness amber |
| ACT-002 | Concentration red |
| ACT-003 | Audit finding |
| ACT-004 | Customer harm incident |

Closure standard: evidence exists in system of record; metric/control retest passes; owner attests completion; validator confirms adequacy; residual risk is updated; next-cycle MI reflects closure.
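
The closure standard can be expressed as a checklist that an action must clear before its status moves to closed. The following is a minimal sketch, assuming an `Action` record whose boolean fields correspond to the first five closure conditions; the field names, due date and evidence path are hypothetical, and the sixth condition (next-cycle MI reflects closure) is left to the next reporting run.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Action:
    action_id: str
    due_date: date
    status: str                    # open, blocked, pending validation, closed, escalated
    evidence_link: str = ""
    retest_passed: bool = False
    owner_attested: bool = False
    validator_confirmed: bool = False
    residual_risk_updated: bool = False


def is_overdue(action: Action, today: date) -> bool:
    """An action is overdue if it is not closed and its due date has passed."""
    return action.status != "closed" and action.due_date < today


def closure_gaps(action: Action) -> list:
    """Return the closure conditions that are not yet satisfied."""
    checks = [
        (bool(action.evidence_link), "evidence in system of record"),
        (action.retest_passed, "metric/control retest"),
        (action.owner_attested, "owner attestation"),
        (action.validator_confirmed, "validator confirmation"),
        (action.residual_risk_updated, "residual risk update"),
    ]
    return [label for ok, label in checks if not ok]


# Hypothetical state for ACT-001; dates and evidence path are invented.
act = Action(
    action_id="ACT-001",
    due_date=date(2026, 6, 15),
    status="pending validation",
    evidence_link="evidence/ACT-001",
    retest_passed=True,
    owner_attested=True,
)
print(is_overdue(act, today=date(2026, 6, 30)))  # True
print(closure_gaps(act))  # ['validator confirmation', 'residual risk update']
```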
9. Report Validation
| Objective | Validation question |
|---|---|
| completeness | Are all material systems and reportable incidents included? |
| accuracy | Do calculations match metric contracts? |
| lineage | Can each board number trace to source data and query version? |
| consistency | Are definitions stable across periods and business lines? |
| threshold correctness | Are red/amber/green statuses computed from approved thresholds? |
| action linkage | Does every breach have action, owner or risk acceptance? |
| evidence integrity | Can audit replay sampled metrics and actions? |

Pre-issue validation:
| Check | Pass condition |
| --- | --- |
| Material AI register reconciliation | all production, pilot and retired-in-period material systems included |
| Metric contract coverage | 100% board KPIs/KRIs have approved contracts |
| Source freshness | source systems refreshed within reporting cut-off |
| Data quality result | no critical quality rule failure unresolved |
| Trend comparability | definition changes annotated or trend restated |
| Threshold approval | all thresholds approved and effective for period |
| Incident reconciliation | incident, complaint and harm sources reconciled |
| Action log reconciliation | all amber/red metrics map to action or risk acceptance |
| Evidence sample | sample pack generated for high-risk metrics |
| Management sign-off | business, risk, tech, data owners sign relevant sections |
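
The action log reconciliation check lends itself to automation: every amber or red metric must map to an open action or an approved risk acceptance. A hedged sketch follows. The function name and the sets of actioned and risk-accepted metric IDs are assumptions, and the statuses shown are hypothetical even though the metric IDs reuse identifiers from this playbook.

```python
def unlinked_breaches(statuses, actioned, risk_accepted):
    """Return amber/red metric IDs with neither an action nor a risk acceptance."""
    return sorted(
        metric_id
        for metric_id, status in statuses.items()
        if status in {"Amber", "Red"}
        and metric_id not in actioned
        and metric_id not in risk_accepted
    )


# Hypothetical period statuses keyed by metric ID.
statuses = {
    "MI-AI-CTRL-001": "Green",
    "MI-AI-CONC-001": "Red",
    "MI-AI-HARM-001": "Amber",
    "MI-AI-AUD-001": "Amber",
}
actioned = {"MI-AI-CONC-001"}        # metrics linked to an open action
risk_accepted = {"MI-AI-AUD-001"}    # metrics covered by approved risk acceptance

print(unlinked_breaches(statuses, actioned, risk_accepted))  # ['MI-AI-HARM-001']
```

A non-empty result would fail the pre-issue check until an owner is assigned or the risk is formally accepted.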

Sample-based validation:
| Sample | Validation |
| --- | --- |
| 5 material AI systems | inventory fields, owner, stage, risk tier, approved boundary |
| 10 AI exposure events | trace completeness, model ID, source docs, workflow link |
| 5 QA failures | severity, reviewer, source evidence, metric inclusion |
| 5 incidents | severity, AI attribution, customer harm, action log link |
| 5 value events | quality pass, risk pass, business outcome, cost inclusion |
| 3 board metrics | metric contract, query version, denominator, threshold |
| 3 closed actions | closure evidence, validation, residual risk update |

Restatement rule:
Restate a board metric when source-system error changes a board-level status, late incident capture changes severity or appetite status, metric definition was applied incorrectly, excluded population materially changes the decision conclusion, or data quality failure invalidates the period result. Record restatement reason, approver, impacted reports and management conclusion.
10. Cadence
| Cadence | Audience | Content |
|---|---|---|
| Real time / same day | incident commander, CRO/CIO, business owner | severe incident, boundary breach, privacy/security event, kill switch |
| Weekly | AI operations, platform, product, risk ops | telemetry, incidents, quality, source freshness, SLO, actions |
| Monthly | AI governance committee, management steering | portfolio dashboard, risk appetite, value, control, concentration, actions |
| Quarterly | board, audit committee, risk committee | decision-focused AI MI pack, attestation, residual risk, investment |
| Semiannual | internal audit, model risk, enterprise architecture | evidence quality, dependency stress tests, report validation review |
| Annual | board strategy and risk appetite review | AI strategy, appetite calibration, operating model maturity, investment roadmap |

Trigger-based reporting:
| Trigger | Report |
| --- | --- |
| severe customer harm | immediate executive and board committee notification |
| unapproved final decision by AI | immediate risk/legal escalation and incident report |
| restricted-data exposure | security/privacy incident report |
| material vendor/model change | model/vendor impact MI and regression status |
| concentration threshold breach | risk committee exception or remediation decision |
| audit evidence integrity failure | audit committee notification if material |
| repeated medium incidents | management and board trend update |

Quarterly cycle:
| Day | Activity |
| ---: | --- |
| -15 | freeze metric contracts and reporting scope |
| -12 | reconcile material AI inventory and dependency register |
| -10 | refresh MI data marts and run quality checks |
| -8 | incident, complaint and customer harm reconciliation |
| -7 | generate risk appetite dashboard and action log |
| -6 | run sample validation and lineage tests |
| -5 | owner review: business, risk, tech, data, finance |
| -4 | management challenge session |
| -3 | draft board pack and audit evidence appendix |
| -2 | executive sign-off and attestation |
| -1 | distribute pre-read |
| 0 | committee discussion and decisions |
| +3 | record board challenges, decisions and new actions |
| +10 | action owners confirm remediation plans |
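
The day offsets in the quarterly cycle can be turned into dated milestones once the committee date is known. The sketch below assumes the offsets are calendar days (the playbook does not say whether business days are intended) and uses a hypothetical committee date; `CYCLE` and `schedule` are illustrative names.

```python
from datetime import date, timedelta

# Day offsets relative to the committee meeting (day 0), from the cycle table.
CYCLE = [
    (-15, "freeze metric contracts and reporting scope"),
    (-12, "reconcile material AI inventory and dependency register"),
    (-10, "refresh MI data marts and run quality checks"),
    (-8, "incident, complaint and customer harm reconciliation"),
    (-7, "generate risk appetite dashboard and action log"),
    (-6, "run sample validation and lineage tests"),
    (-5, "owner review: business, risk, tech, data, finance"),
    (-4, "management challenge session"),
    (-3, "draft board pack and audit evidence appendix"),
    (-2, "executive sign-off and attestation"),
    (-1, "distribute pre-read"),
    (0, "committee discussion and decisions"),
    (3, "record board challenges, decisions and new actions"),
    (10, "action owners confirm remediation plans"),
]


def schedule(committee_date: date, cycle=CYCLE) -> list:
    """Return (calendar date, activity) pairs, assuming offsets are calendar days."""
    return [(committee_date + timedelta(days=offset), activity)
            for offset, activity in cycle]


# Hypothetical committee date.
for when, activity in schedule(date(2026, 7, 30)):
    print(when.isoformat(), activity)
```

If the institution counts business days instead, the offset arithmetic would need a working-day calendar in place of `timedelta`.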
11. RACI
| Activity | Accountable | Responsible | Consulted | Informed |
|---|---|---|---|---|
| AI MI strategy | CRO / CIO jointly | AI Governance Lead | Board reporting, Legal, Audit | Board committee |
| Metric taxonomy | AI Governance Lead | PM / BA / Risk Analytics | Data, Finance, Model Risk | Business owners |
| Metric contracts | Product / Risk owner | Senior BA + Data Product Owner | Architecture, Audit, Compliance | AI committee |
| Telemetry architecture | CTO / Data Platform Owner | Product Architect + Platform Engineering | Security, Privacy, Risk | Product teams |
| Risk appetite thresholds | CRO | Risk, Compliance, Model Risk | Business, Legal, Audit | Board risk committee |
| Value metrics | Business Executive | PM + Finance Partner | Operations, Data | Management steering |
| Control effectiveness metrics | Risk owner | Control owner + QA | Internal Audit, Compliance | AI governance |
| Incident and harm taxonomy | Operational Risk Executive | Incident Management + Customer Ops | Legal, Privacy, Compliance | Board committee as needed |
| Concentration reporting | Enterprise Architect | AI Platform + Third-Party Risk | Procurement, Model Risk | Risk committee |
| Report validation | Data Governance Lead | MI Reporting Team | Internal Audit, Risk, Product | Audit committee |
| Action log | AI Governance Lead | Action owners | Risk and Audit validators | Management and board |
| Attestation | Business / Risk / Tech / Data executives | Control and metric owners | Internal Audit observer | Board / Audit Committee |

| Line | MI responsibility |
| --- | --- |
| First line | operate AI systems, define workflow outcomes, capture telemetry, own value and control operation |
| Second line | define risk appetite, challenge metrics, approve thresholds, review exceptions and residual risk |
| Third line | test MI integrity, lineage, control evidence, report validation and management action closure |

| Role | What excellent looks like |
| --- | --- |
| PM | can explain why each board metric changes investment or scale decision |
| BA | can define the metric contract, event grain, business rule, threshold and exception flow |
| Architect | can show telemetry path, lineage, data product, control integration and evidence export |
| Data owner | can certify source quality, retention, privacy and semantic consistency |
| Risk owner | can map metric status to appetite, residual risk and escalation |
| Audit partner | can sample and reconstruct the report without relying on oral explanation |
12. Templates
AI MI metric catalog:
| Metric ID | Name | Category | Owner | Source | Threshold | Cadence |
|---|---|---|---|---|---|---|
| MI-AI-SCOPE-001 | Material AI system count | Portfolio | AI Governance Lead | AI register | report-only | monthly |
| MI-AI-VALUE-001 | Qualified value events | Value | Product Owner | workflow + gateway | target by system | weekly/monthly |
| MI-AI-HARM-001 | AI-attributable customer harm incident rate | Harm | Customer Ops + Risk | incident + trace | appetite matrix | weekly/quarterly |
| MI-AI-CTRL-001 | HITL bypass count | Control | Operations Control Owner | workflow logs | zero for high-risk | weekly |
| MI-AI-AUD-001 | Trace completeness | Auditability | Platform Owner | gateway logs | green >= 99% | weekly/monthly |
| MI-AI-CONC-001 | Top model exposure | Concentration | Enterprise Architect | dependency graph | appetite limit | monthly |
| MI-AI-ACT-001 | Overdue high actions | Governance | AI Governance Lead | action log | zero | weekly/monthly |

Concrete metric contract example:
| Field | Example |
|---|---|
| Metric ID | MI-AI-HARM-001 |
| Name | AI-attributable customer harm incident rate |
| Decision purpose | determine if customer-facing AI expansion remains within appetite |
| Definition | customer-impacting incidents where AI exposure materially contributed to harm |
| Numerator | confirmed AI-attributable customer harm incidents in reporting period |
| Denominator | eligible AI-assisted customer interactions in reporting period |
| Grain | incident_id and exposure_event_id |
| Source systems | incident platform, complaint platform, model gateway, workflow logs |
| Thresholds | green = no severe and rate within baseline; amber = contained medium incident; red = uncontained severe or repeat trend |
| Owner | Customer operations executive and AI risk owner |
| Evidence | incident record, trace sample, postmortem, remediation record |
| Version | v1.0 effective 2026-07-01, approved by AI Governance Committee |
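
The contract fields above can also be held as data rather than slide text. The following is a minimal sketch, not a reference implementation: the `MetricContract` class, its field names and the `harm_incident_rate` helper are hypothetical, and returning `None` for an empty denominator is an assumption of this sketch rather than something the contract specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MetricContract:
    """Hypothetical container mirroring the contract fields in the table."""
    metric_id: str
    name: str
    numerator_rule: str
    denominator_rule: str
    grain: Tuple[str, ...]
    source_systems: Tuple[str, ...]
    owner: str
    version: str


def harm_incident_rate(confirmed_incidents: int,
                       eligible_interactions: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero.

    None keeps "no eligible interactions" distinct from "no harm observed"
    (an assumption of this sketch).
    """
    if confirmed_incidents < 0 or eligible_interactions < 0:
        raise ValueError("counts must be non-negative")
    if eligible_interactions == 0:
        return None
    return confirmed_incidents / eligible_interactions


contract = MetricContract(
    metric_id="MI-AI-HARM-001",
    name="AI-attributable customer harm incident rate",
    numerator_rule="confirmed AI-attributable customer harm incidents in reporting period",
    denominator_rule="eligible AI-assisted customer interactions in reporting period",
    grain=("incident_id", "exposure_event_id"),
    source_systems=("incident platform", "complaint platform",
                    "model gateway", "workflow logs"),
    owner="Customer operations executive and AI risk owner",
    version="v1.0",
)
```

Freezing the dataclass is a deliberate choice in this sketch: a change to the definition should produce a new contract version, not an in-place edit.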

Concrete attestation example:
For Q2 2026, management attests that the AI Management Information report covers all seven material AI systems in the approved register, applies approved metric contracts and thresholds, discloses the source freshness amber issue, reconciles reportable AI incidents and customer harm events, and links all amber/red appetite signals to management actions or approved risk acceptance. Signers are business owner, risk owner, technology owner, data owner, security/privacy owner and finance owner for benefit claims.
Concrete incident board update example:
Incident ID: AI-INC-2026-014.
Severity: Medium.
Systems affected: Customer Service RAG and Branch Knowledge Assistant.
Customer impact: 31 customer drafts contained stale fee-policy citation; no financial loss; remediation completed.
Root cause: regulated fee document missed daily source refresh.
Controls that worked: agent review prevented direct customer auto-send.
Controls that failed: superseded-document block did not cover one fee schedule.
Management actions: daily regulated-policy ingestion, block retired source IDs, QA retest by 2026-07-10.
Decision requested: no customer-facing direct response until source freshness returns to green for two cycles.
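
The decision requested above is a stop rule that can be checked mechanically. A minimal sketch follows, assuming that "two cycles" means the two most recent consecutive reporting cycles and that the status history is ordered oldest to newest; the function name and the status strings are hypothetical.

```python
from typing import Sequence


def direct_response_allowed(freshness_history: Sequence[str],
                            required_green_cycles: int = 2) -> bool:
    """Return True only if the latest N cycle statuses are all 'green'.

    freshness_history is assumed to be ordered oldest -> newest.
    """
    if required_green_cycles <= 0:
        raise ValueError("required_green_cycles must be positive")
    if len(freshness_history) < required_green_cycles:
        # Not enough history to prove the rule is satisfied.
        return False
    recent = freshness_history[-required_green_cycles:]
    return all(status == "green" for status in recent)


# One green cycle after an amber cycle is not enough to lift the restriction.
print(direct_response_allowed(["amber", "green"]))
# Two consecutive green cycles satisfy the rule.
print(direct_response_allowed(["amber", "green", "green"]))
```

Treating missing history as "not allowed" keeps the rule conservative: the restriction lifts only on positive evidence.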
13. 30-Day Lab
Goal: produce a portfolio-ready AI MI and board reporting architecture artifact for a financial retail institution.
| Day | Theme | Output |
|---|---|---|
| 1 | Select case portfolio | 5 AI systems: customer service, credit, AML, fraud, platform |
| 2 | Define board decisions | scale, hold, stop, fund, remediate, accept risk decision list |
| 3 | Build AI system register | system_id, owner, risk tier, stage, AI role, materiality |
| 4 | Define source systems | telemetry, workflow, QA, incident, complaint, finance, vendor |
| 5 | Design canonical event model | exposure, model call, retrieval, tool, review, incident, value |
| 6 | Draft metric taxonomy | value, adoption, quality, harm, control, incident, concentration |
| 7 | Review week 1 | one-page executive MI architecture narrative |
| 8 | Write 5 metric contracts | unsupported claim, harm incident, trace completeness, value event, concentration |
| 9 | Define thresholds | green/amber/red and stop rules |
| 10 | Define risk appetite statements | no appetite, limited appetite, appetite zones |
| 11 | Draw lineage map | source-to-metric-to-board-to-action |
| 12 | Define data quality rules | completeness, freshness, reconciliation, sample validation |
| 13 | Build action log schema | action fields, severity, escalation, closure |
| 14 | Review week 2 | metric contract and lineage review |
| 15 | Create portfolio dashboard | systems, risk, value, controls, actions |
| 16 | Create risk appetite dashboard | appetite category status and thresholds |
| 17 | Create control effectiveness dashboard | control pass rate, exceptions, evidence |
| 18 | Create harm and incident dashboard | severity, AI attribution, remediation |
| 19 | Create concentration dashboard | model/vendor/RAG/evidence exposure |
| 20 | Draft board cover page | decision requested, executive conclusion, key risks |
| 21 | Review week 3 | board pack coherence test |
| 22 | Build audit evidence checklist | report validation and sample pack |
| 23 | Create quarterly cadence | timeline, cut-off, sign-off, restatement |
| 24 | Build RACI | first/second/third line and PM/BA/architect split |
| 25 | Run synthetic metric experiment | calculate unsupported claim rate with lineage |
| 26 | Write incident board update | stale policy or vendor model-change case |
| 27 | Write management attestation | qualified Q2 2026 version |
| 28 | Write interview answers | 6 advanced Q&A |
| 29 | Self-review | remove vanity metrics, add lineage and action links |
| 30 | Assemble portfolio artifact | final MI architecture pack and 5-minute storyline |

Completion standard: all board-level metrics have contracts; every metric has source lineage and threshold; every amber/red signal has action or risk acceptance; board pack asks for a decision; audit can reconstruct at least three sampled metrics.
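
The completion standard can be turned into a self-check for the lab artifact. This is a rough sketch under stated assumptions: the `BoardMetric` record, its boolean flags and the `completion_gaps` function are hypothetical names, and how each flag gets populated is left to the lab.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoardMetric:
    """Hypothetical summary record for one board-level metric."""
    metric_id: str
    has_contract: bool
    has_lineage: bool
    has_threshold: bool
    status: str  # "green", "amber" or "red"
    action_id: Optional[str] = None
    risk_acceptance_id: Optional[str] = None


def completion_gaps(metrics: List[BoardMetric],
                    decisions_requested: int,
                    reconstructed_samples: int) -> List[str]:
    """List every way the pack falls short of the completion standard."""
    gaps = []
    for m in metrics:
        if not m.has_contract:
            gaps.append(f"{m.metric_id}: missing metric contract")
        if not (m.has_lineage and m.has_threshold):
            gaps.append(f"{m.metric_id}: missing lineage or threshold")
        if m.status in ("amber", "red") and not (m.action_id or m.risk_acceptance_id):
            gaps.append(f"{m.metric_id}: {m.status} without action or risk acceptance")
    if decisions_requested < 1:
        gaps.append("board pack: no decision requested")
    if reconstructed_samples < 3:
        gaps.append("audit: fewer than three sampled metrics reconstructed")
    return gaps


metrics = [
    BoardMetric("MI-AI-AUD-001", True, True, True, "green"),
    BoardMetric("MI-AI-CTRL-001", True, True, True, "red"),
]
gaps = completion_gaps(metrics, decisions_requested=1, reconstructed_samples=3)
print(gaps)  # the red control metric has no linked action or acceptance
```

An empty list means the pack meets the standard as encoded here; any entry names the metric and the missing element.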
14. Financial Retail Case
| AI system | Business process | Risk |
|---|---|---|
| Customer Service RAG | servicing policy questions | wrong commitment, complaint, privacy |
| Credit Memo Assistant | underwriter documentation | fair lending, explanation, unsupported summary |
| AML Copilot | alert investigation | missed suspicious activity, weak evidence |
| Fraud Triage | card fraud queue | false positive, customer friction, loss |
| Branch Knowledge Assistant | staff operational support | inconsistent policy, stale source |
| AI Gateway | shared platform control | shadow AI, trace, cost, vendor concentration |

| Board concern | Metric | Source |
|---|---|---|
| Value | qualified value events, finance-recognized benefit | workflow, cost ledger, finance baseline |
| Adoption | eligible workflow repeat adoption, output acceptance | user telemetry, workflow eligibility |
| Quality | unsupported claim, citation correctness, summary defect | QA, eval, RAG logs |
| Harm | AI-attributable complaint, appeal overturn, remediation | complaints, appeals, incident system |
| Control | HITL bypass, source freshness, trace completeness | workflow logs, source registry, gateway |
| Concentration | top model/vendor exposure, fallback coverage | dependency graph |
| Action | overdue high actions, exception aging | GRC/action log |
Board narrative:
The AI portfolio is producing early value but is not ready for broad customer-facing autonomy.
Customer service and AML show stable value in human-supervised workflows.
Credit memo remains held due to explanation and fairness evidence gaps.
The largest portfolio risk is concentration: the same model and evidence export path support multiple material systems.
Management requests funding for evidence automation, source freshness controls and model fallback.
| System | Decision | Reason |
|---|---|---|
| Customer Service RAG | limited scale | value green, quality green, source freshness amber |
| Credit Memo Assistant | hold | fairness/explanation metrics not stable |
| AML Copilot | limited expansion | value green, risk controls green, review capacity amber |
| Fraud Triage | continue pilot | false positive and appeal metrics need longer baseline |
| Branch Knowledge | continue | medium risk, source freshness improving |
| AI Gateway | fund | improves auditability and concentration controls |
15. Interview Answers
Q1: How do you design AI board reporting that is actually useful?
30-second version:
I start with board decisions, not charts. Each metric must support scale, hold, stop, fund, remediate or accept risk. Then I define metric contracts, source lineage, thresholds, owners, cadence and action logs. A board pack should be the presentation layer over a governed MI data product.
2-minute version:
I first identify the decisions the board or audit committee needs to make: risk appetite, expansion, funding, remediation, residual risk acceptance or incident oversight. For each decision, I define metrics across value, adoption, quality, customer harm, control effectiveness, incidents, concentration, cost and auditability. Each metric gets a contract: numerator, denominator, grain, source systems, threshold, owner, quality rules and evidence. Then I design lineage from AI telemetry and workflow events to dashboard tiles and board statements. Finally, every amber or red metric links to a management action log.
Q2: What is the difference between AI governance and AI MI architecture?
30-second version:
Governance defines who oversees AI and what decisions they own. MI architecture defines the facts, definitions, lineage, thresholds and actions used in that oversight. Governance without MI becomes opinion; MI without governance becomes unused reporting.
Q3: How do you report AI value without falling into vanity metrics?
30-second version:
I avoid using model calls, feature count or generated content as primary value evidence. I use qualified value events: the workflow was eligible, AI was used, output was accepted, quality and risk passed, unit economics were within boundary, and the business or finance owner recognized the outcome.
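
The qualified value event test in this answer is a conjunction of conditions, which can be sketched as a predicate. This is only an illustration: the `ValueEventCandidate` record and its field names are hypothetical, and expressing "unit economics within boundary" as a unit cost at or below a ceiling is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ValueEventCandidate:
    """Hypothetical record for one AI-assisted workflow outcome."""
    workflow_eligible: bool
    ai_used: bool
    output_accepted: bool
    quality_passed: bool
    risk_passed: bool
    unit_cost: float
    unit_cost_ceiling: float
    owner_recognized: bool  # business or finance owner recognized the outcome


def is_qualified_value_event(e: ValueEventCandidate) -> bool:
    """Every condition must hold; usage alone never qualifies."""
    return (
        e.workflow_eligible
        and e.ai_used
        and e.output_accepted
        and e.quality_passed
        and e.risk_passed
        and e.unit_cost <= e.unit_cost_ceiling
        and e.owner_recognized
    )


def count_qualified(events: Iterable[ValueEventCandidate]) -> int:
    """Count only the events that pass the full predicate."""
    return sum(1 for e in events if is_qualified_value_event(e))


events = [
    ValueEventCandidate(True, True, True, True, True, 0.40, 0.50, True),
    # Accepted output that failed the quality check does not count.
    ValueEventCandidate(True, True, True, False, True, 0.40, 0.50, True),
    # Unit cost above the ceiling does not count either.
    ValueEventCandidate(True, True, True, True, True, 0.90, 0.50, True),
]
print(count_qualified(events))
```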
Q4: How should SR 26-2 affect AI board reporting in 2026?
30-second version:
SR 26-2 supersedes SR 11-7 and SR 21-8 and moves toward risk-based, materiality-driven model risk management. It explicitly excludes generative and agentic AI from formal scope, so I would not misclassify every GenAI system as SR 26-2 governed. But I would use its discipline: inventory, materiality, monitoring, governance, documentation and action tracking, alongside broader AI risk frameworks.
Q5: How do you make AI control effectiveness visible to an audit committee?
30-second version:
I turn each control into an operating metric and evidence artifact. For example, "human-in-the-loop" becomes HITL bypass count, reviewer coverage, override reason, review queue age and sample evidence. Audit should be able to reconstruct whether the control operated, not just see that it exists in a policy.
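
Two of the HITL measures named in this answer, bypass count and reviewer coverage, can be derived from workflow log events. The sketch below assumes a hypothetical event shape with three boolean fields (`requires_review`, `reviewed`, `released`) and defines a bypass as an output that required review and was released without it; a real workflow log would need its own mapping.

```python
from typing import Dict, Iterable, Optional, Union


def hitl_metrics(events: Iterable[dict]) -> Dict[str, Optional[Union[int, float]]]:
    """Compute HITL bypass count and reviewer coverage.

    Each event is assumed to carry the boolean keys
    'requires_review', 'reviewed' and 'released' (hypothetical names).
    """
    required = reviewed = bypassed = 0
    for e in events:
        if not e["requires_review"]:
            continue  # out of scope for the control
        required += 1
        if e["reviewed"]:
            reviewed += 1
        elif e["released"]:
            bypassed += 1  # released without the required review
    coverage = reviewed / required if required else None
    return {"hitl_bypass_count": bypassed, "reviewer_coverage": coverage}


workflow_log = [
    {"requires_review": False, "reviewed": False, "released": True},
    {"requires_review": True, "reviewed": True, "released": True},
    {"requires_review": True, "reviewed": False, "released": True},   # bypass
    {"requires_review": True, "reviewed": False, "released": False},  # still queued
]
result = hitl_metrics(workflow_log)
print(result)
```

Queued items lower coverage without counting as bypasses, which keeps "review pending" separate from "control failed".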
Q6: What would you do if a board metric turns red?
30-second version:
A red metric triggers the pre-agreed action path: classify the issue, identify affected systems, assign owner, decide contain/hold/stop/escalate, record due date and evidence, then validate closure. If residual risk remains outside appetite, management needs risk acceptance or scope restriction.
Q7: How do you handle vendor concentration in board MI?
30-second version:
I report concentration as risk-weighted exposure by model, vendor, RAG platform, evidence stack and human review dependency. The board should see whether one dependency supports multiple material systems, whether fallback is tested, and whether exposure exceeds appetite.
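
Risk-weighted exposure can be sketched from a simple dependency list. In this illustration the system records, the `risk_weight` values and the dependency names are all hypothetical, and the weighting scheme (share of total risk weight carried by each dependency) is one possible definition, not a prescribed one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def exposure_by_dependency(systems: List[dict]) -> Dict[str, float]:
    """Share of total portfolio risk weight that relies on each dependency."""
    totals: Dict[str, float] = defaultdict(float)
    total_weight = 0.0
    for s in systems:
        total_weight += s["risk_weight"]
        for dep in set(s["dependencies"]):  # count each dependency once per system
            totals[dep] += s["risk_weight"]
    if total_weight == 0:
        return {}
    return {dep: w / total_weight for dep, w in totals.items()}


def top_exposure(systems: List[dict]) -> Tuple[str, float]:
    """Return the dependency carrying the largest share of risk weight."""
    shares = exposure_by_dependency(systems)
    if not shares:
        raise ValueError("no dependencies to rank")
    return max(shares.items(), key=lambda kv: kv[1])


# Hypothetical portfolio: weights and model names are illustrative only.
portfolio = [
    {"system": "Customer Service RAG", "risk_weight": 2.0, "dependencies": ["model-a"]},
    {"system": "AML Copilot", "risk_weight": 1.0, "dependencies": ["model-a"]},
    {"system": "Fraud Triage", "risk_weight": 1.0, "dependencies": ["model-b"]},
]
shares = exposure_by_dependency(portfolio)
top = top_exposure(portfolio)
print(top)
```

The resulting share can then be compared with an appetite limit; whether fallback is tested would be a separate field and is not modelled here.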
Q8: How do you prove the board report is reliable?
30-second version:
I use report validation: reconcile the AI register, verify metric contracts, run data quality checks, test lineage, sample source records, validate threshold logic, reconcile incidents and actions, and obtain owner sign-offs. Audit should be able to replay sampled numbers end to end.
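
Replaying a sampled number end to end can be sketched as recomputing a rate from source records and comparing it with the reported value. The helper names, the record fields and the tolerance are hypothetical, and the sketch assumes the numerator is counted only among records that are in the denominator.

```python
from typing import Callable, Iterable, Optional


def replay_rate(records: Iterable[dict],
                in_denominator: Callable[[dict], bool],
                in_numerator: Callable[[dict], bool]) -> Optional[float]:
    """Recompute a rate from source records using the contract's rules."""
    denom = num = 0
    for r in records:
        if in_denominator(r):
            denom += 1
            if in_numerator(r):
                num += 1
    return num / denom if denom else None


def reconciles(reported: Optional[float],
               replayed: Optional[float],
               tolerance: float = 1e-6) -> bool:
    """True when the reported number matches the replayed one within tolerance."""
    if reported is None or replayed is None:
        return reported is None and replayed is None
    return abs(reported - replayed) <= tolerance


# Hypothetical gateway sample for a trace completeness check.
gateway_sample = [
    {"in_scope": True, "trace_complete": True},
    {"in_scope": True, "trace_complete": True},
    {"in_scope": True, "trace_complete": False},
    {"in_scope": True, "trace_complete": True},
    {"in_scope": False, "trace_complete": False},  # excluded by the contract
]
replayed = replay_rate(
    gateway_sample,
    in_denominator=lambda r: r["in_scope"],
    in_numerator=lambda r: r["trace_complete"],
)
print(replayed, reconciles(0.75, replayed))
```

Passing the inclusion rules as functions keeps the replay tied to the metric contract rather than to logic hidden in a dashboard.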
16. Common Pitfalls
| Pitfall | Why it fails | Better practice |
|---|---|---|
| Reporting AI projects instead of decisions | board sees activity, not choices | lead with decisions requested |
| Using calls and users as value | usage may be rework or forced adoption | qualified value events and recognized outcomes |
| No denominator clarity | rates and trends are misleading | metric contracts with inclusion/exclusion |
| No MI lineage | audit cannot trust the report | source-to-report lineage and sample packs |
| Controls reported as policies | no evidence of operation | control effectiveness metrics and samples |
| Incidents without harm taxonomy | customer impact hidden | severity, attribution, remediation and reversibility |
| Thresholds copied from another firm | appetite not institution-specific | risk appetite calibration and review |
| GenAI scope confusion under SR 26-2 | overclaim or under-governance | separate formal scope but align governance discipline |
| No restatement rule | errors silently corrected | formal restatement and version history |
| Actions closed by assertion | remediation not proven | evidence, validation and residual risk update |
17. Final Checklist
- AI systems have stable IDs, owners, stages, risk tiers and materiality.
- Board metrics have metric contracts.
- Metric contracts include numerator, denominator, grain, source, threshold, owner and version.
- Telemetry covers model, RAG, tool, human review, workflow, incident, harm, value and dependency events.
- MI data product has quality checks, lineage and semantic layer.
- Dashboards reuse approved metrics instead of slide-local calculations.
- Thresholds map to appetite, stop rules or escalation.
- Every amber/red signal links to action or approved risk acceptance.
- Report validation samples metrics, incidents, actions and trace reconstructability.
- Board pack asks for decisions, not passive awareness.

CBAP+ mastery standard: you can define board-useful AI metrics without vanity measures, write metric contracts at business-rule precision, draw source-to-board MI lineage, map telemetry to value/risk/control/harm, connect thresholds to risk appetite and actions, explain SR 26-2 nuance without overclaiming GenAI scope, design an audit-ready evidence pack, and use MI to recommend scale, hold, stop, fund or remediate.

Final memory card:
AI MI architecture is the control plane behind board reporting.
Board report = decision.
Metric contract = definition.
Lineage = trust.
Threshold = risk appetite.
Action log = management accountability.
Validation = audit readiness.