
AI Risk Quantification: Scenario Loss and Control ROI Architecture


AI Risk Quantification Architecture: Scenario Loss / Control ROI / Residual Risk

Date: 2026-06-30
Status: evergreen
Audience: experienced CBAP / financial retail PM / product architect / solution architect / AI governance lead
Output: a decision architecture that turns an AI risk scenario into an expected loss range, residual risk, control ROI, investment priority and management action.

1. Why Risk Quantification Matters for AI Product / Architecture

Important boundaries:

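  • This article is not legal, compliance, audit, actuarial, capital-measurement or regulatory advice.
  • This article does not repeat the full frameworks for risk appetite policy, board reporting, customer harm, incident liability and continuous control monitoring.
  • It addresses one advanced question only: how to turn AI risk scenarios into economic decision language that can be compared, funded, challenged and reviewed.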

Risk discussions in AI projects usually stall at one of three levels:

High / Medium / Low
Severe / Moderate / Acceptable
Add a guardrail / add a HITL step / do some monitoring

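This helps launch reviews, but it is not enough to support product and architecture investment decisions. What the CRO, CFO, CTO and product owner actually need to answer is: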

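  • What is the loss distribution of this AI scenario, rather than its single-point risk rating?
  • Where do the frequency and severity assumptions come from: historical incidents, expert estimates, eval samples, red-team results or vendor SLAs?
  • How much can a given control reduce frequency, severity, detection delay or recovery cost?
  • Does the reduction in residual loss justify the control spend?
  • With the same budget of 1 million, should the money go to RAG grounding, an agent permission gateway, human review capacity, vendor redundancy or eval coverage?
  • Which residual risks need management acceptance, which must be reduced, and which should be avoided through product scope?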

In one sentence:

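AI Risk Quantification translates “AI might get it wrong” into “which scenarios will cause how much loss at what frequency, which controls reduce how much of that loss at what cost, and whether we should launch, expand, pause or re-architect now.”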

Value for the AI PM / BA / Architect:

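| Role | Traditional statement | Quantified statement |
|---|---|---|
| PM | This feature is risky and needs human review. | Across 120k customer-service suggestions per month, the monthly loss from regulated misinformation is roughly 180k-420k at P95; citation checks plus escalation routing can cut it by 45%-60%, but add 12 seconds of AHT. |
| BA | Add exception flows and approvals. | The scenario register shows that the tail loss of AML missed escalation exceeds the false-positive cost; requirements should prioritize suspicious-pattern recall and analyst override evidence. |
| Architect | We need guardrails and logging. | Control ROI shows that a tool gateway reduces loss on agent write actions more than a prompt-only guardrail; build the deterministic PEP (policy enforcement point) first, then tune the model. |
| Governance lead | We need risk sign-off. | Any scenario whose residual risk exceeds the approved threshold must enter the management action pack: reduce, transfer, accept with expiry, or avoid by scope change. |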

2. Concept Diagram

```mermaid
flowchart LR
  A[AI Use Case Scope] --> B[Scenario Register]
  B --> C[Frequency Assumption]
  B --> D[Severity Assumption]
  C --> E[Gross Loss Range]
  D --> E
  E --> F[Control Portfolio]
  F --> G[Control Effectiveness Estimate]
  G --> H[Residual Loss Range]
  H --> I[Risk Appetite Threshold]
  H --> J[Control ROI]
  I --> K[Management Action]
  J --> L[Investment Prioritization]
  K --> M[ADR / Release Decision]
  L --> M
  M --> N[Evidence Packet]
```

ASCII version:

scenario -> likelihood x severity -> gross loss range
gross loss range -> control effectiveness -> residual loss range
residual loss range -> appetite threshold -> management action
control cost vs loss reduction -> ROI -> architecture/product priority

Key translation chain:

AI failure mode
  -> business scenario
  -> loss event
  -> frequency distribution
  -> severity distribution
  -> gross expected loss / tail loss
  -> control effect
  -> residual expected loss / tail loss
  -> investment decision

3. Core Architecture Model

AI risk quantification is not a spreadsheet but a decision architecture. It contains at least eight objects.

| Object | Purpose | Key fields | Owner |
|---|---|---|---|
| Use case boundary | Bounds the quantification scope | system, channel, user group, decision/action boundary, traffic volume, model/vendor, release stage | PM + Architect |
| Scenario register | Lists quantifiable risk scenarios | scenario id, failure mode, business event, loss event, affected population, trigger | BA + Risk |
| Assumption set | Records frequency and severity assumptions | baseline frequency, stress frequency, severity components, confidence, source, review date | Risk + Finance + Ops |
| Gross loss model | Estimates pre-control loss | expected loss, P50/P90/P95, tail scenario, confidence interval | Risk Quant / Finance |
| Control portfolio | Lists candidate control combinations | control id, type, target mechanism, cost, latency, coverage, owner | Architect + Control Owner |
| Effectiveness model | Estimates how controls reduce loss | frequency reduction, severity reduction, detection reduction, recovery reduction, dependency | EvalOps + Risk |
| Residual risk model | Estimates post-control risk | residual expected loss, residual P95, threshold status, risk owner, expiry | Risk Owner |
| Decision record | Turns conclusions into action | invest / defer / reduce scope / accept / transfer / avoid, rationale, evidence | Governance Forum |
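The objects above can be sketched as plain Python data structures. This is a minimal illustration, not a reference schema: the class names (`Assumption`, `Scenario`, `ScenarioRegister`), the field names and the rule of pairing low-with-low and high-with-high when building a range are all assumptions made for the example, and only three of the eight objects are shown. The 0.08%-0.18% frequency comes from the scenario in section 4.2; the severity range and review date are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Assumption:
    """A range assumption with provenance (hypothetical schema)."""
    low: float
    high: float
    source: str        # e.g. "red team", "shadow mode", "expert elicitation"
    confidence: str    # "low" / "medium" / "high"
    review_date: str   # ISO date when the assumption must be re-checked

    def __post_init__(self) -> None:
        if not 0 <= self.low <= self.high:
            raise ValueError("assumption range must satisfy 0 <= low <= high")


@dataclass
class Scenario:
    """One scenario register entry, written as a loss event rather than a model error."""
    scenario_id: str
    loss_event: str
    monthly_exposure: int
    frequency: Assumption  # defect rate per exposed event
    severity: Assumption   # average loss per defect

    def gross_loss_range(self) -> Tuple[float, float]:
        # Simplification: pair low with low and high with high, giving the widest band.
        return (
            self.monthly_exposure * self.frequency.low * self.severity.low,
            self.monthly_exposure * self.frequency.high * self.severity.high,
        )


@dataclass
class ScenarioRegister:
    scenarios: List[Scenario] = field(default_factory=list)

    def add(self, scenario: Scenario) -> None:
        if any(s.scenario_id == scenario.scenario_id for s in self.scenarios):
            raise ValueError(f"duplicate scenario id: {scenario.scenario_id}")
        self.scenarios.append(scenario)


if __name__ == "__main__":
    advice = Scenario(
        scenario_id="SC-01",  # hypothetical id
        loss_event="regulated advice hallucination reused by advisor",
        monthly_exposure=35_000,
        frequency=Assumption(0.0008, 0.0018, "red team", "low", "2026-09-30"),
        severity=Assumption(1_200, 8_000, "finance estimate", "low", "2026-09-30"),
    )
    register = ScenarioRegister()
    register.add(advice)
    low, high = advice.gross_loss_range()
    print(f"{advice.scenario_id}: {low:,.0f} - {high:,.0f} per month")
```

Keeping provenance (`source`, `confidence`, `review_date`) on every assumption is what later lets the evidence packet show where each number came from.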

3.1 Loss Event Taxonomy

This article deliberately does not cover the full governance of customer harm and incident liability; loss events are used here only as quantification objects.

| Loss event type | Example | Quantification basis |
|---|---|---|
| Direct financial loss | A payment fraud false negative leads to fraud loss | confirmed fraud amount, recovery rate, chargeback / reimbursement |
| Operational cost | Re-investigation, QA and case reopening after an AML missed escalation | analyst hours, QA hours, manager review, backlog cost |
| Compliance remediation cost | Remediation after a KYC wrongful rejection or a regulated advice hallucination | re-review population, notification, external counsel, audit support |
| Revenue / conversion loss | KYC wrongful rejection causes onboarding drop-off | lost margin, expected lifetime value, reactivation rate |
| Resilience cost | A vendor outage degrades the AI channel | manual fallback capacity, SLA penalty, lost service productivity |
| Control friction cost | Manual review, latency and false positives introduced by controls | incremental review cost, abandonment, AHT increase |

3.2 Four Loss Views

Look at every scenario through at least four views:

| View | Formula | Who uses it |
|---|---|---|
| Expected loss | frequency x average severity | PM / Finance / prioritization |
| Stress loss | stress frequency x stress severity | Risk / resilience / incident planning |
| Tail loss | P95 or P99 of the loss distribution | CRO / CTO / executive release decision |
| Net value | business benefit - residual expected loss - control cost | Product portfolio / funding gate |
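The four views reduce to a few lines of arithmetic once the inputs exist. In the sketch below the function names and all input numbers are hypothetical, and the tail view uses a simple nearest-rank percentile over a sample of period losses (historical or simulated), which is one choice among several.

```python
import math
from typing import Sequence


def expected_loss(frequency: float, average_severity: float) -> float:
    """Expected loss = frequency x average severity."""
    return frequency * average_severity


def stress_loss(stress_frequency: float, stress_severity: float) -> float:
    """Stress loss = stress frequency x stress severity."""
    return stress_frequency * stress_severity


def tail_loss(loss_sample: Sequence[float], percentile: float = 0.95) -> float:
    """Nearest-rank percentile of a sample of period losses."""
    if not loss_sample or not 0 < percentile <= 1:
        raise ValueError("need a non-empty sample and 0 < percentile <= 1")
    ordered = sorted(loss_sample)
    rank = math.ceil(percentile * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def net_value(business_benefit: float, residual_expected_loss: float, control_cost: float) -> float:
    """Net value = business benefit - residual expected loss - control cost."""
    return business_benefit - residual_expected_loss - control_cost


if __name__ == "__main__":
    # Hypothetical annual figures: 504 defects/year at 2,500 average severity.
    print(expected_loss(504, 2_500))
    print(stress_loss(1_500, 8_000))
    print(tail_loss([1_000 * i for i in range(1, 21)]))  # P95 of 20 observed periods
    print(net_value(3_000_000, 500_000, 180_000))
```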

3.3 Quantification Grain

AI risk should not be quantified only per system; more importantly, it should be quantified per decision grain:

| Grain | Use |
|---|---|
| per response | customer-service RAG, regulated advice, contact center misinformation |
| per case | AML investigation, KYC review, fraud triage |
| per transaction | payment fraud false negative, payment scam intervention |
| per customer journey | onboarding, complaint, wealth suitability support |
| per vendor dependency hour | model outage, embedding service outage, vector DB outage |
| per release | model upgrade, prompt policy change, tool permission change |

4. Scenario Loss and Control ROI Method

4.1 Step-by-Step Method

1. Define business decision boundary.
2. Write scenario as loss event, not model error.
3. Estimate exposed population and event frequency.
4. Estimate severity distribution by loss component.
5. Calculate gross expected loss and tail loss.
6. Map candidate controls to loss mechanisms.
7. Estimate control effectiveness and control cost.
8. Calculate residual loss range and ROI.
9. Compare against threshold and product value.
10. Record decision, assumptions and evidence.
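Steps 9 and 10 can be sketched as a small decision function that compares residual loss with an appetite threshold and returns one of the management actions named in this article (reduce, transfer, accept with expiry, avoid by scope change). The rule ordering, the 180-day acceptance window and all field names are illustrative assumptions, not a policy recommendation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass
class ResidualPosition:
    residual_expected_loss: float
    residual_p95: float
    expected_loss_threshold: float
    p95_threshold: float
    best_unfunded_control_roi: float  # highest ROI among candidate controls not yet funded


def management_action(
    pos: ResidualPosition, today: date, acceptance_days: int = 180
) -> Tuple[str, Optional[date]]:
    within_expected = pos.residual_expected_loss <= pos.expected_loss_threshold
    within_tail = pos.residual_p95 <= pos.p95_threshold
    if within_expected and within_tail:
        # Acceptance is never open-ended: it carries an expiry date for re-review.
        return "accept with expiry", today + timedelta(days=acceptance_days)
    if pos.best_unfunded_control_roi > 0:
        return "reduce", None                  # a control still pays for itself
    if not within_tail:
        return "avoid by scope change", None   # tail too large, no economic control left
    return "transfer", None                    # only expected loss is over threshold


if __name__ == "__main__":
    # Hypothetical monthly figures for one scenario.
    position = ResidualPosition(90_000, 300_000, 60_000, 400_000, best_unfunded_control_roi=1.2)
    print(management_action(position, date(2026, 7, 1)))
```

The decision record from step 10 would store the returned action together with the assumptions and evidence that produced each input.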

4.2 Scenario Statement Pattern

Weak statement:

The model may hallucinate.

Strong statement:

In the regulated wealth education copilot, the model may produce an unsupported product-specific recommendation that a frontline advisor reuses in a customer conversation. Loss occurs when the customer acts on advice outside approved suitability workflow, requiring remediation review, complaint handling and potential client correction.

As structured fields:

| Field | Content |
|---|---|
| Scenario | regulated advice hallucination reused by advisor |
| Trigger | customer asks product-specific question; answer not grounded in approved source |
| Exposure | 35k advisor-assisted responses / month |
| Frequency assumption | 0.08%-0.18% unsupported high-risk answers before control |
| Severity components | review cost, complaint handling, remediation, potential lost relationship margin |
| Candidate controls | RAG source gating, recommendation phrase blocker, licensed advisor workflow, QA sample, eval gate |
| Decision use | whether to allow customer-specific product discussion in pilot |

4.3 Frequency Estimate

Frequency should be estimated as a range and tied to observable evidence.

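| Evidence source | Use | Caveat |
|---|---|---|
| Historical incidents | Similar customer-service misinformation, AML misses, fraud losses, KYC rejections | Legacy systems are not necessarily equivalent to the AI system |
| Eval / red-team sample | Infer the defect rate from a prompt suite, golden set or adversarial set | Record sample design bias |
| Production shadow mode | AI output is not executed and is compared with human outcomes | Needs sufficient coverage of high-risk segments |
| QA / manual review | Sample checks of output quality and control bypass | Calibrate reviewer consistency |
| Vendor SLA / outage history | Frequency assumption for vendor outage scenarios | Account for shared dependency concentration |
| Expert elicitation | Used for new scenarios with no historical data | Must record confidence and review date |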

Example frequency statements:

| Confidence | Monthly frequency assumption | Notes |
|---|---|---|
| Low | 0.02%-0.20% of exposed events | Red-team and expert estimates only; no shadow mode yet |
| Medium | 0.04%-0.11% of exposed events | 4 weeks of shadow mode plus QA sampling |
| High | 0.05%-0.08% of exposed events | 3 months of production telemetry, review calibration and a stable policy corpus |
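One way to make the confidence column concrete is to attach a binomial confidence interval to an observed defect count. The sketch below uses the Wilson score interval, a standard technique swapped in here for illustration (the article does not prescribe an interval method); the sample sizes and defect counts are hypothetical. With the same 0.08% point estimate, a ten-times larger shadow-mode sample yields a much narrower range, which is the pattern the Low / Medium / High rows describe.

```python
import math
from typing import Tuple


def wilson_interval(defects: int, sample_size: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a defect rate (z = 1.96 gives roughly 95% coverage)."""
    if sample_size <= 0 or not 0 <= defects <= sample_size:
        raise ValueError("need sample_size > 0 and 0 <= defects <= sample_size")
    p = defects / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p + z**2 / (2 * sample_size)) / denom
    half_width = z * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical evidence: a red-team-sized sample vs. several weeks of shadow mode.
    for defects, n in [(4, 5_000), (40, 50_000)]:
        low, high = wilson_interval(defects, n)
        print(f"{defects}/{n}: {low:.3%} - {high:.3%}")
```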

4.4 Severity Estimate

Severity should not be a single dollar amount. The more mature approach is to break it into components.

| Component | Formula example |
|---|---|
| Direct loss | number of affected events x average direct loss x recovery adjustment |
| Review cost | reopened cases x hours per case x fully loaded hourly cost |
| Remediation cost | affected customers x notification / correction / rework cost |
| Opportunity cost | wrongly rejected customers x conversion loss x expected margin |
| Control friction | extra reviews x review cost + additional latency impact |
| Resilience cost | outage hours x manual fallback cost x business criticality multiplier |
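The component formulas can be written as small functions and summed into one severity figure per scenario. This is a sketch under stated assumptions: the function names, the reading of “recovery adjustment” as (1 - recovery rate), and every input value are hypothetical.

```python
def direct_loss(affected_events: int, avg_direct_loss: float, recovery_rate: float) -> float:
    # "Recovery adjustment" is read here as the unrecovered share (1 - recovery_rate).
    return affected_events * avg_direct_loss * (1 - recovery_rate)


def review_cost(reopened_cases: int, hours_per_case: float, hourly_cost: float) -> float:
    return reopened_cases * hours_per_case * hourly_cost


def remediation_cost(affected_customers: int, cost_per_customer: float) -> float:
    return affected_customers * cost_per_customer


def opportunity_cost(wrongly_rejected: int, conversion_loss_rate: float, expected_margin: float) -> float:
    return wrongly_rejected * conversion_loss_rate * expected_margin


def control_friction(extra_reviews: int, cost_per_review: float, latency_impact: float) -> float:
    return extra_reviews * cost_per_review + latency_impact


def resilience_cost(outage_hours: float, fallback_cost_per_hour: float, criticality: float) -> float:
    return outage_hours * fallback_cost_per_hour * criticality


if __name__ == "__main__":
    # Hypothetical monthly inputs for one scenario; every number is illustrative.
    components = {
        "direct": direct_loss(10, 1_000, 0.3),
        "review": review_cost(40, 3, 85),
        "remediation": remediation_cost(25, 120),
        "opportunity": opportunity_cost(0, 0.0, 0.0),
        "friction": control_friction(200, 12, 1_500),
        "resilience": resilience_cost(0, 0, 1.0),
    }
    for name, value in components.items():
        print(f"{name:<12}{value:>10,.0f}")
    print(f"{'total':<12}{sum(components.values()):>10,.0f}")
```

Keeping components separate lets finance own the unit costs while risk owns the counts, and it shows which component dominates when a control is proposed.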

Severity range:

| Percentile | Use |
|---|---|
| P50 | normal budgeting and backlog impact |
| P90 | risk committee and release gate |
| P95 | executive release decision / stress narrative |
| P99 | resilience planning or capital-style sensitivity, not daily prioritization |
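Percentiles require a distribution rather than a point estimate. A common approach, assumed here rather than prescribed by the article, is a frequency-severity Monte Carlo: draw a Poisson event count per month, draw a lognormal severity per event, and read P50/P90/P95/P99 off the simulated monthly totals. The Poisson and lognormal choices, sigma = 1.0, the seed and the function names are illustrative; the base-case inputs (35,000 exposures, 0.12% defect rate, 2,500 mean severity) come from the regulated-advice example in section 5.1.

```python
import math
import random
import statistics
from typing import Dict, List


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method: fine for moderate lambda, slow for very large lambda."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def simulate_monthly_losses(
    exposure: int, defect_rate: float, mean_severity: float, sigma: float, trials: int, seed: int = 7
) -> List[float]:
    rng = random.Random(seed)
    lam = exposure * defect_rate
    mu = math.log(mean_severity) - sigma**2 / 2  # makes the lognormal mean equal mean_severity
    return [
        sum(rng.lognormvariate(mu, sigma) for _ in range(poisson(rng, lam)))
        for _ in range(trials)
    ]


def loss_percentiles(losses: List[float]) -> Dict[str, float]:
    cuts = statistics.quantiles(losses, n=100)  # 99 cut points: cuts[k - 1] is the k-th percentile
    return {"P50": cuts[49], "P90": cuts[89], "P95": cuts[94], "P99": cuts[98]}


if __name__ == "__main__":
    losses = simulate_monthly_losses(35_000, 0.0012, 2_500, sigma=1.0, trials=10_000)
    print(f"mean {statistics.fmean(losses):,.0f}")
    for label, value in loss_percentiles(losses).items():
        print(f"{label} {value:,.0f}")
```

The simulated mean should land near the 105,000 base-case expected loss; the gap between the mean and P95/P99 is the tail the table above asks executives to look at.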

4.5 Control Effectiveness

Control effectiveness must state its mechanism. Do not just write “reduces risk by 70%”.

| Mechanism | Example | How to quantify |
|---|---|---|
| Frequency reduction | RAG citation checker blocks unsupported regulated answers | high-risk unsupported output rate from 0.15% to 0.06% |
| Severity reduction | human approval prevents final customer communication | average severity drops because error stays internal |
| Detection improvement | QA sample finds drift within 48h instead of 30 days | loss duration reduced |
| Recovery improvement | trace evidence accelerates case reconstruction | investigation hours reduced |
| Exposure reduction | product scope excludes product-specific advice | exposed high-risk queries reduced |
| Resilience improvement | vendor failover preserves critical workflow | outage impact hours reduced |
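The exposure, frequency and severity mechanisms can be combined into a residual expected loss as sketched below. Two assumptions are made loudly here: control effects are independent and multiply (shared dependencies break this), and detection and recovery improvements are left out because they need duration and cost inputs not modeled in this snippet. The 60% frequency reduction mirrors the 0.15% to 0.06% example above; the 50% severity reduction and the exposure and severity inputs are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ControlEffect:
    name: str
    exposure_reduction: float = 0.0   # share of exposed events removed (e.g. scope exclusion)
    frequency_reduction: float = 0.0  # relative drop in defect rate
    severity_reduction: float = 0.0   # relative drop in average severity


def residual_expected_loss(
    exposure: float, frequency: float, severity: float, controls: Iterable[ControlEffect]
) -> float:
    for control in controls:
        for share in (control.exposure_reduction, control.frequency_reduction, control.severity_reduction):
            if not 0.0 <= share <= 1.0:
                raise ValueError(f"{control.name}: reductions must be between 0 and 1")
        # Independence assumption: effects multiply. Shared dependencies break this.
        exposure *= 1 - control.exposure_reduction
        frequency *= 1 - control.frequency_reduction
        severity *= 1 - control.severity_reduction
    return exposure * frequency * severity


if __name__ == "__main__":
    citation_checker = ControlEffect("RAG citation checker", frequency_reduction=0.6)  # 0.15% -> 0.06%
    human_approval = ControlEffect("human approval", severity_reduction=0.5)           # hypothetical
    gross = residual_expected_loss(35_000, 0.0015, 2_500, [])
    residual = residual_expected_loss(35_000, 0.0015, 2_500, [citation_checker, human_approval])
    print(f"gross {gross:,.0f} -> residual {residual:,.0f}")
```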

4.6 Control ROI Formula

Base formulas:

```text
gross_expected_loss = exposure x frequency x average_severity
residual_expected_loss = exposure x residual_frequency x residual_average_severity
loss_reduction = gross_expected_loss - residual_expected_loss
control_roi = (loss_reduction - annual_control_cost) / annual_control_cost
```

A formulation better suited to architecture investment:

```text
risk_adjusted_net_value =
  business_benefit
  - residual_expected_loss
  - control_cost
  - control_friction_cost
```
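Both formulations translate directly into code. In this sketch the function names follow the formula variables; the gross and residual figures (1,000,000 and 580,000), the business benefit and the friction cost are hypothetical, chosen so that the 420k loss reduction and 180k annual cost match the low end of the RAG citation enforcement row in the prioritization matrix.

```python
def control_roi(gross_expected_loss: float, residual_expected_loss: float, annual_control_cost: float) -> float:
    """(loss reduction - annual control cost) / annual control cost."""
    if annual_control_cost <= 0:
        raise ValueError("annual_control_cost must be positive")
    loss_reduction = gross_expected_loss - residual_expected_loss
    return (loss_reduction - annual_control_cost) / annual_control_cost


def risk_adjusted_net_value(
    business_benefit: float,
    residual_expected_loss: float,
    control_cost: float,
    control_friction_cost: float,
) -> float:
    return business_benefit - residual_expected_loss - control_cost - control_friction_cost


if __name__ == "__main__":
    print(f"ROI {control_roi(1_000_000, 580_000, 180_000):.2f}")  # 420k reduction on 180k cost
    print(f"net value {risk_adjusted_net_value(2_000_000, 580_000, 180_000, 60_000):,.0f}")
```

An ROI above zero only says the control pays for itself on expected loss; friction cost and tail compression still decide whether it is the best use of the budget.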

Tail-aware narrative:

The control does not only reduce expected loss.
It compresses the P95 tail from a management-action event to a contained operating event.

4.7 Prioritization Matrix

| Control | Annual cost | Expected loss reduction | Tail loss reduction | Friction | Priority |
|---|---|---|---|---|---|
| RAG citation enforcement | 180k | 420k-720k | medium | low | high |
| Agent tool gateway | 350k | 600k-1.4m | high | medium | high |
| Double human review for all cases | 1.2m | 700k-1.1m | high | high | selective |
| Prompt-only safety instruction | 40k | 50k-120k | low | low | supporting |
| Vendor redundancy | 500k | 250k-600k | high for outage | low | critical-service only |
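The ROI range implied by each row of the matrix can be computed and ranked as sketched below. The cost and reduction figures are copied from the matrix; ranking on the worst-case (low-end) ROI is an assumption chosen for illustration so that a wide, optimistic range cannot dominate.

```python
from typing import Dict, List, Tuple

# (annual cost, low loss reduction, high loss reduction), copied from the matrix above.
CONTROLS: Dict[str, Tuple[float, float, float]] = {
    "RAG citation enforcement": (180_000, 420_000, 720_000),
    "Agent tool gateway": (350_000, 600_000, 1_400_000),
    "Double human review for all cases": (1_200_000, 700_000, 1_100_000),
    "Prompt-only safety instruction": (40_000, 50_000, 120_000),
    "Vendor redundancy": (500_000, 250_000, 600_000),
}


def roi(loss_reduction: float, annual_cost: float) -> float:
    return (loss_reduction - annual_cost) / annual_cost


def rank_by_worst_case_roi(controls: Dict[str, Tuple[float, float, float]]) -> List[Tuple[str, float, float]]:
    rows = [(name, roi(low, cost), roi(high, cost)) for name, (cost, low, high) in controls.items()]
    # Sort on the pessimistic end of the range.
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, worst, best in rank_by_worst_case_roi(CONTROLS):
        print(f"{name:<36} ROI {worst:+.2f} to {best:+.2f}")
```

Worst-case ROI alone ranks vendor redundancy last, yet the matrix keeps it for critical services: this ROI ignores tail loss reduction and friction, so the ranking is an input to the Priority column, not a substitute for it.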

5. Financial Retail Scenarios

5.1 Model Hallucination in Regulated Advice

| Dimension | Quantification design |
|---|---|
| Scenario | Wealth or lending assistant generates unsupported product-specific advice that an employee reuses. |
| Exposure | advisor responses or policy-assistant answers involving regulated topics. |
| Gross frequency | unsupported regulated recommendation rate from eval + shadow mode. |
| Severity | case review, correction outreach, complaint handling, advisor retraining, potential revenue loss. |
| Controls | approved corpus manifest, citation support gate, recommendation phrase classifier, licensed advisor handoff, blocked generation for product-specific advice. |
| Residual narrative | residual risk is acceptable only if AI cannot directly send customer communication and all product-specific claims show source support. |
| Tradeoff | Stricter refusal reduces loss frequency but may lower advisor adoption; route high-risk prompts to curated workflow instead of generic chat. |

Example range:

| Item | Conservative | Base | Stress |
|---|---|---|---|
| Monthly exposure | 20,000 | 35,000 | 50,000 |
| Gross defect frequency | 0.06% | 0.12% | 0.25% |
| Avg severity | 1,200 | 2,500 | 8,000 |
| Gross monthly expected loss | 14,400 | 105,000 | 1,000,000 |
| Residual after controls | 5,000-18,000 | 32,000-58,000 | 220,000-420,000 |
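The gross monthly expected loss row is exposure x frequency x severity applied to the three rows above it. A short check, with the function name chosen for this sketch:

```python
from typing import Dict, Tuple

# (monthly exposure, gross defect frequency, average severity) from the example table.
CASES: Dict[str, Tuple[int, float, float]] = {
    "Conservative": (20_000, 0.0006, 1_200),
    "Base": (35_000, 0.0012, 2_500),
    "Stress": (50_000, 0.0025, 8_000),
}


def gross_monthly_expected_loss(exposure: int, frequency: float, severity: float) -> int:
    # Round away floating-point noise; the inputs are ranges, so cents carry no information.
    return round(exposure * frequency * severity)


if __name__ == "__main__":
    for case, inputs in CASES.items():
        print(f"{case:<13}{gross_monthly_expected_loss(*inputs):>12,}")
```

The stress case is roughly ten times the base case because exposure, frequency and severity all move together; that compounding is why the stress column should be argued, not just filled in.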

5.2 AML Missed Escalation

| Dimension | Quantification design |
|---|---|
| Scenario | AML copilot summary or prioritization misses a suspicious pattern, causing delayed escalation. |
| Exposure | alerts where AI summary, typology suggestion or priority score influences analyst attention. |
| Gross frequency | missed red-flag rate by typology from shadow comparison and QA sample. |
| Severity | reopened investigation, suspicious activity escalation delay, QA remediation, analyst backlog, regulatory exam support. |
| Controls | typology-specific eval set, recall floor for high-risk patterns, mandatory red-flag checklist, analyst override reason, no auto-close, scenario sampling. |
| Residual narrative | expected loss may be moderate, but tail scenario is high when missed escalation clusters by typology or policy drift. |
| Tradeoff | Optimizing precision reduces false positives but may increase tail risk; architecture should preserve recall for high-risk typologies and manage review load elsewhere. |

Decision rule:

If high-risk typology recall falls below release floor or confidence interval is wide,
do not expand automation.
Use AI for evidence assembly, not case prioritization, until recall evidence matures.
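The decision rule can be applied per typology as sketched below. Two choices here are assumptions rather than part of the rule: the release floor is checked against the lower bound of the recall interval (a conservative reading of “falls below”), and “wide” is made concrete as a maximum interval width. The floor of 0.90, width of 0.08, typology names and intervals are hypothetical.

```python
from typing import Dict, Tuple


def automation_decision(
    recall_interval: Tuple[float, float], release_floor: float, max_interval_width: float
) -> str:
    low, high = recall_interval
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("recall interval must satisfy 0 <= low <= high <= 1")
    below_floor = low < release_floor               # conservative: judge the lower bound
    too_uncertain = (high - low) > max_interval_width
    if below_floor or too_uncertain:
        return "evidence assembly only"
    return "eligible for prioritization pilot"


def decide_by_typology(
    recall_by_typology: Dict[str, Tuple[float, float]],
    release_floor: float = 0.90,
    max_interval_width: float = 0.08,
) -> Dict[str, str]:
    return {
        typology: automation_decision(interval, release_floor, max_interval_width)
        for typology, interval in recall_by_typology.items()
    }


if __name__ == "__main__":
    # Hypothetical recall intervals from shadow comparison; typology names are illustrative.
    evidence = {
        "structuring": (0.92, 0.97),
        "mule accounts": (0.86, 0.95),
        "trade-based": (0.905, 0.995),
    }
    for typology, decision in decide_by_typology(evidence).items():
        print(f"{typology:<14}{decision}")
```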

5.3 Payment Fraud False Negatives

| Dimension | Quantification design |
|---|---|
| Scenario | AI fraud triage classifies risky transactions as low priority, delaying intervention. |
| Exposure | transactions routed through AI triage or risk explanation workflow. |
| Gross frequency | false negative rate by segment, channel, merchant category and new fraud pattern. |
| Severity | fraud amount, recovery rate, dispute operations, customer contact, downstream losses. |
| Controls | threshold by risk segment, champion/challenger model, rules fallback, velocity checks, risk-based human queue, post-event learning loop. |
| Residual narrative | residual risk must be compared against false-positive friction and customer abandonment, not optimized in isolation. |
| Tradeoff | Lower false negatives can increase false positives; control ROI should include saved fraud losses and additional review / customer friction cost. |

Loss formula:

```text
net_loss_reduction =
  avoided_fraud_loss
  - incremental_false_positive_review_cost
  - customer_friction_cost
  - engineering_and_runtime_cost
```
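The loss formula makes the threshold tradeoff explicit when two candidate thresholds are compared. In this sketch the false-positive review cost is expanded as count x unit cost (an assumption about how that term is built), and all figures are hypothetical annual values.

```python
def net_loss_reduction(
    avoided_fraud_loss: float,
    incremental_false_positives: int,
    review_cost_per_false_positive: float,
    customer_friction_cost: float,
    engineering_and_runtime_cost: float,
) -> float:
    incremental_false_positive_review_cost = incremental_false_positives * review_cost_per_false_positive
    return (
        avoided_fraud_loss
        - incremental_false_positive_review_cost
        - customer_friction_cost
        - engineering_and_runtime_cost
    )


if __name__ == "__main__":
    # Hypothetical annual figures for two triage thresholds on the same segment.
    moderate = net_loss_reduction(900_000, 12_000, 15, 150_000, 200_000)
    aggressive = net_loss_reduction(1_050_000, 30_000, 15, 400_000, 200_000)
    print(f"moderate threshold:   {moderate:>10,.0f}")
    print(f"aggressive threshold: {aggressive:>10,.0f}")
```

In this illustration the aggressive threshold avoids more fraud but nets nothing, because the extra false-positive reviews and customer friction absorb the gain.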

5.4 KYC Wrongful Rejection

| Dimension | Quantification design |
|---|---|
| Scenario | AI document / identity / risk assistant contributes to rejecting legitimate applicants. |
| Exposure | applications where AI recommendation influences decision or manual queue priority. |
| Gross frequency | wrongful rejection rate from appeal uphold, manual audit and shadow adjudication. |
| Severity | lost funded accounts, rework, support contacts, appeal handling, reputational friction. |
| Controls | low-confidence routing, segment fairness test, reason-code consistency, second review for thin-file / document-quality cases, appeal evidence capture. |
| Residual narrative | expected loss includes lost conversion, while tail risk comes from concentrated rejection in a protected or vulnerable segment. |
| Tradeoff | Faster onboarding may increase false rejects if quality gates are too aggressive; product architecture should separate fraud risk escalation from final rejection. |

5.5 Contact Center Misinformation

| Dimension | Quantification design |
|---|---|
| Scenario | Customer service copilot provides incorrect fee, dispute, account closure, promotion or policy information. |
| Exposure | customer-visible or agent-assisted answers on regulated or contractual topics. |
| Gross frequency | unsupported claim rate, stale-source answer rate, agent acceptance rate. |
| Severity | recontact cost, complaint handling, fee corrections, goodwill credit, QA remediation. |
| Controls | approved-source RAG, citation required for policy claims, answer templates for regulated topics, escalation intents, agent confirmation UX. |
| Residual narrative | residual expected loss can be low, but high volume makes small defect rates material. |
| Tradeoff | Full blocking may hurt first-contact resolution; better architecture uses intent risk tiering and source-specific answer modes. |

5.6 Vendor Outage

| Dimension | Quantification design |
|---|---|
| Scenario | Foundation model, embedding, vector database or AI gateway vendor outage degrades a critical service. |
| Exposure | workflows dependent on vendor at runtime. |
| Gross frequency | vendor SLA, historical outage, shared dependency analysis, internal failover test. |
| Severity | manual fallback cost, backlog, SLA breach, customer delay, lost productivity. |
| Controls | graceful degradation, cached responses for stable policy, model fallback, queue prioritization, manual runbook, vendor concentration limit. |
| Residual narrative | expected loss may not justify full redundancy for low-criticality use cases, but P95 outage cost may justify redundancy for AML, fraud or contact center peak periods. |
| Tradeoff | Multi-vendor architecture adds eval, routing, security and consistency cost; use risk-adjusted criticality rather than blanket redundancy. |

6. Metrics / Control / Evidence Model

6.1 Metric Contract

| Metric | Definition | Evidence source | Decision use |
|---|---|---|---|
| Gross scenario exposure | number of AI-influenced events in scope | workflow telemetry, gateway logs, case system | denominator for loss estimate |
| Defect frequency | scenario-specific failure rate before control | eval, red-team, shadow mode, QA | gross loss and control target |
| Control coverage | share of exposed events where control operated | policy engine logs, tool gateway logs, workflow approval | control design and operating check |
| Control effectiveness | observed reduction in frequency / severity / detection delay | A/B, shadow comparison, before-after, expert estimate | residual loss and ROI |
| Residual expected loss | expected loss after controls | model spreadsheet + evidence packet | release / scale / fund decision |
| Tail loss | P95/P99 scenario loss | stress test, incident simulation, expert estimate | executive decision and resilience planning |
| Control friction cost | cost introduced by the control | AHT, review volume, latency, abandonment, infra cost | net value calculation |
| Loss avoided | gross loss minus residual loss | quantification model | ROI and prioritization |

6.2 Control Types by Loss Mechanism

| Control type | Reduces frequency | Reduces severity | Reduces detection delay | Reduces recovery cost |
|---|---|---|---|---|
| RAG source gating | yes | partial | partial | yes |
| Citation verification | yes | partial | yes | yes |
| Agent permission gateway | yes | yes | yes | yes |
| Human approval | partial | yes | partial | partial |
| Eval gate | yes | partial | yes | partial |
| Trace and evidence capture | no direct | partial | yes | yes |
| Vendor failover | no | yes | yes | yes |
| Product scope exclusion | yes | yes | yes | yes |

6.3 Evidence Packet

| Evidence | Required fields |
|---|---|
| Scenario register | scenario id, owner, exposure, trigger, loss event, affected workflow |
| Frequency evidence | sample design, period, population, defect count, confidence, reviewer calibration |
| Severity evidence | loss components, unit cost source, finance owner, range rationale |
| Control evidence | design spec, control owner, event field, test result, coverage |
| Effectiveness evidence | pre/post comparison, A/B or shadow result, assumptions, limitations |
| Residual risk statement | residual range, threshold status, owner, decision, expiry |
| ROI sheet | control cost, friction cost, loss reduction, net value, sensitivity |
| ADR | selected architecture, alternatives, tradeoffs, review date |

7. Anti-Patterns and Failure Modes

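| Anti-pattern | Why it fails | Better design |
|---|---|---|
| High / Medium / Low only | Cannot compare investment priorities or explain why the money is being spent | Use expected loss range + tail scenario + confidence |
| False precision | Writing $123,456.78 implies a precision that does not exist | Use ranges, percentiles, assumption quality and sensitivity |
| Model metric as risk metric | Accuracy / F1 is not business loss | Map model errors to loss events and workflow exposure |
| Control theater | Guardrails are added without estimating effectiveness | For every control, state whether it reduces frequency, severity, detection delay or recovery cost |
| Ignoring friction cost | A control looks high-ROI but actually slows the process and raises abandonment | Include review load, latency, false positives and adoption impact in net value |
| Average-only thinking | Expected loss is low but tail loss is high | Look at P50, expected loss, P95 and the stress narrative together |
| Over-general scenario | “AI gives a wrong answer” is too broad | Split scenarios by business event, channel, customer segment and process stage |
| No confidence rating | All assumptions look equally reliable | Label evidence maturity: expert / eval / shadow / production |
| Control dependency blindness | Two controls depend on the same log or vendor | Build a control dependency map and a common-mode failure scenario |
| ROI without action | Numbers are calculated but never reach the roadmap / ADR | Feed results into the funding gate, release gate and management action |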
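A control dependency map, the fix for control dependency blindness, can start as an inverted index from dependency to controls. The sketch below flags every dependency shared by two or more controls as a common-mode failure candidate; the function name, control inventory and dependency names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def common_mode_dependencies(control_deps: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each shared dependency to the controls that would fail together if it failed."""
    users: Dict[str, Set[str]] = defaultdict(set)
    for control, deps in control_deps.items():
        for dep in deps:
            users[dep].add(control)
    return {dep: sorted(controls) for dep, controls in users.items() if len(controls) > 1}


if __name__ == "__main__":
    # Hypothetical control inventory; dependency names are illustrative.
    inventory = {
        "Citation verification": {"trace log pipeline", "policy corpus index"},
        "QA sample review": {"trace log pipeline"},
        "Model fallback": {"vendor B API"},
        "Eval gate": {"policy corpus index", "eval runner"},
    }
    for dep, controls in common_mode_dependencies(inventory).items():
        print(f"{dep}: {', '.join(controls)}")
```

Each flagged dependency becomes a candidate scenario in its own right, because the residual loss model otherwise treats those controls as independent.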

8. Architecture Mapping to RAG / Agent / Copilot / Eval / Governance

| Architecture area | Quantification focus | Typical controls | Decision tradeoff |
|---|---|---|---|
| RAG | unsupported claim frequency, stale source severity, source coverage | approved corpus, citation verification, freshness SLO, answer abstention | better grounding vs higher latency / refusal |
| Agent | unauthorized action severity, tool misuse frequency, reversibility | permission gateway, scoped tokens, approval workflow, idempotent tools, action ledger | autonomy value vs tail loss compression |
| Copilot | human overreliance, draft reuse, review cost | UI confidence cues, mandatory review for high-risk intent, edit diff, role-based guidance | productivity vs review burden |
| Eval | scenario frequency estimate, control effectiveness evidence | scenario eval set, red-team suite, segment tests, regression gate | eval coverage vs release speed |
| Governance | residual threshold, acceptance owner, action tracking | scenario register, residual risk memo, ADR, evidence packet, management action pack | decision clarity vs governance overhead |

8.1 RAG Example

Risk scenario: contact center copilot gives wrong fee waiver policy.
Control candidate: citation-required answer mode for fee / dispute / account closure intents.
Quant effect: unsupported claim rate drops from 1.8% to 0.4%; AHT increases by 9 seconds.
Decision: apply control to regulated intents only, not all FAQ intents.

8.2 Agent Example

Risk scenario: agent creates or closes a customer case incorrectly.
Control candidate: tool gateway requiring human approval token for write actions.
Quant effect: severity distribution changes because errors become draft-only.
Decision: allow read + draft autonomy, require approval for writes until residual P95 falls below threshold.
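The tool gateway in this example can be sketched as a deterministic policy enforcement point. Everything here is hypothetical: the action names, the token store shape, and binding each approval token to one (action, case_id) pair. A production gateway would also expire and consume tokens, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

READ_OR_DRAFT_ACTIONS = {"read_case", "draft_case_note"}
WRITE_ACTIONS = {"create_case", "close_case"}


@dataclass(frozen=True)
class ToolCall:
    action: str
    case_id: str
    approval_token: Optional[str] = None


def authorize(call: ToolCall, approvals: Dict[str, Tuple[str, str]]) -> bool:
    """Deterministic policy enforcement point for agent tool calls.

    approvals maps an approval token to the (action, case_id) a human approved.
    """
    if call.action in READ_OR_DRAFT_ACTIONS:
        return True
    if call.action in WRITE_ACTIONS:
        if call.approval_token is None:
            return False
        # The token must match this exact action on this exact case.
        return approvals.get(call.approval_token) == (call.action, call.case_id)
    return False  # default deny: unknown tools are never executed


if __name__ == "__main__":
    approvals = {"tok-123": ("close_case", "C-42")}  # hypothetical token store
    print(authorize(ToolCall("read_case", "C-42"), approvals))              # True
    print(authorize(ToolCall("close_case", "C-42"), approvals))             # False
    print(authorize(ToolCall("close_case", "C-42", "tok-123"), approvals))  # True
    print(authorize(ToolCall("close_case", "C-99", "tok-123"), approvals))  # False
```

Because errors on write actions can no longer reach the case system without an approval, they stay draft-only, which is the severity shift this example quantifies.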

8.3 Eval Example

Risk scenario: AML copilot misses new mule-account typology.
Control candidate: typology-specific regression suite and monthly red-team refresh.
Quant effect: confidence in frequency estimate improves; release decision moves from "unknown" to "controlled pilot".
Decision: fund eval coverage before adding more model capacity.

9. ADR Draft

| Field | Content |
|---|---|
| ADR title | Adopt scenario-loss and control-ROI architecture for high-impact AI releases |
| Status | proposed |
| Context | Existing AI release gates classify use cases by risk tier, but investment decisions still rely on qualitative risk statements. Teams cannot consistently compare RAG grounding, agent permission, human review, eval coverage and vendor resilience investments. |
| Decision | For high-impact financial retail AI use cases, require a scenario-loss model before scale approval. Each material scenario must estimate exposure, frequency, severity, gross loss, control effectiveness, residual loss, control cost, friction cost and management action. |
| Scope | Regulated advice copilot, AML copilot, fraud triage, KYC onboarding assistant, contact center RAG, vendor-dependent AI services. |
| Architecture impact | Add scenario register, assumption set, control ROI sheet, residual risk statement and evidence packet to release artifacts. Integrate telemetry fields needed for exposure, control coverage and defect frequency. |
| Product impact | Roadmap prioritization must compare risk-adjusted net value, not only user value or engineering effort. Product scope can be reduced when tail loss exceeds threshold. |
| Alternatives considered | Qualitative risk tier only; control checklist only; full quantitative operational-risk capital model; pure model-metric gate. |
| Consequences | Better investment discipline and executive decision quality; additional effort in assumption gathering; requires finance / risk / operations input; false precision must be controlled through ranges and confidence labels. |
| Review trigger | New high-impact use case, material model/vendor change, incident, major control failure, scale expansion, annual methodology review. |

10. Interview Answer

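30-second version

I don't manage AI risk with High / Medium / Low alone. My method turns AI failure modes into business loss scenarios and estimates exposure, frequency and severity to get gross expected loss and P95 tail loss; then I assess whether each control reduces frequency, severity, detection delay or recovery cost, and calculate residual risk and control ROI. That lets product, architecture, risk and finance decide together: launch, narrow the scope, add controls, accept the residual risk, or stop.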

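2-minute version

The hard part of financial retail AI is not knowing that risk exists, but knowing whether the risk is worth taking and where the control budget should go. I start by defining the use case boundary, for example customer-service RAG, an AML copilot, KYC AI review, payment fraud triage or vendor outage. Then I write model errors as business scenarios: regulated advice hallucination, AML missed escalation, fraud false negative, KYC wrongful rejection, contact center misinformation. For each scenario I estimate exposure, a frequency range and severity components, which produce gross expected loss and tail loss.

Next I map controls to loss mechanisms. A RAG citation gate mainly reduces unsupported answer frequency; an agent tool gateway reduces unauthorized action severity; human review reduces final customer impact; an eval suite improves frequency estimates and change gating; vendor failover reduces outage severity. Finally I compare control cost, friction cost and loss reduction, and output residual risk, ROI, threshold status and management action.

The value of this method is that it turns risk governance into product and architecture decisions: which scenarios must stay assist-only, which can be automated, which investments earn a higher ROI than a model upgrade, and which residual risks need time-limited management acceptance or remediation.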

CTO version

A CTO cares whether engineering investment compresses the real tail risk, not whether the dashboard shows a few more green metrics. I require high-impact AI systems to have a scenario-loss architecture: every release must be able to state the business exposure the AI operates on, the scenario defect rate, control coverage, residual P95 and common-mode dependencies. Architecturally, I prioritize a deterministic control plane: model gateway, RAG source registry, tool permission gateway, trace evidence, eval regression and vendor degradation path.

Investment ranking is not “which control looks more compliant” but “which control compresses the most expected loss and tail loss with the least friction”. For example, in regulated advice, prompt instructions have very low ROI, while citation enforcement and product-scope routing are more effective; for agent write actions, a tool gateway reduces severity more than after-the-fact review; for vendor outages, not every use case needs multi-vendor, but critical services must have graceful degradation. My CTO-level takeaway is: risk quantification should drive architecture runway.


11. 7-Day Practice Plan

| Day | Practice | Output |
|---|---|---|
| 1 | Pick one financial retail AI use case: AML copilot, KYC onboarding, contact center RAG, fraud triage or wealth advice copilot | use case boundary + exposure definition |
| 2 | Write 6 scenarios, each converted from a model error into a business loss event | scenario register |
| 3 | Estimate a frequency range for 2 high-priority scenarios and label evidence maturity | frequency assumption sheet |
| 4 | Break down severity components: direct loss, ops cost, remediation, conversion, resilience, control friction | severity range table |
| 5 | Design 5 controls and state whether each reduces frequency, severity, detection delay or recovery cost | control effectiveness estimate |
| 6 | Calculate gross loss, residual loss, control ROI and sensitivity | ROI model + prioritization matrix |
| 7 | Write a one-page management action pack, one ADR and a 30-second interview answer | portfolio-ready decision artifact |

Scoring rubric:

| Level | Evidence |
|---|---|
| Basic | Has scenarios and High / Medium / Low ratings |
| Strong | Has exposure, frequency, severity, gross / residual loss |
| Advanced | Has control mechanisms, ROI, tail narrative, threshold and management action |
| Portfolio-ready | Has ADR, evidence packet, assumption quality, architecture tradeoffs and interview story |

12. Key Takeaways

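  1. The core of AI risk quantification is not precise prediction but making decision assumptions explicit.
  2. The unit of quantification must be a business loss scenario, not an abstract model failure.
  3. Control ROI must account for loss reduction, control cost, friction cost and tail compression together.
  4. Architecture investments across RAG, Agent, Copilot, Eval and Governance can be compared in the same scenario-loss language.
  5. The most mature statement is not “this AI risk is under control” but “under these assumptions, residual expected loss and P95 tail loss are below threshold; if a metric breaches, the management action is reduce scope / add control / accept with expiry / stop.”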

Source Anchors

| Source | Link | How it is used in this article |
|---|---|---|
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize the identification, measurement, treatment and governance evidence of AI risk scenarios. The NIST page indicates that AI RMF 1.0 is under revision; formal projects should re-check it as of the access date. |
| NIST AI RMF Generative AI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | Used for GenAI-specific scenarios: hallucination, grounding failure, misuse, supply-chain dependency, content risk and evaluation gap. |
| ISO/IEC 23894:2023 | https://www.iso.org/standard/77304.html | Reference for how AI risk management is integrated into organizational activities, the AI lifecycle and the risk treatment process. |
| ISO/IEC 42001:2023 | https://www.iso.org/standard/81230.html | Uses the AI management system perspective to connect policy, objectives, risk treatment, operation, performance evaluation and continual improvement. |
| NIST SP 800-30 Rev. 1 | https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final | Reference for risk assessment, likelihood, impact, residual risk, control mitigation and senior leader decision support. |
| BIS Principles for Operational Resilience | https://www.bis.org/bcbs/publ/d516.htm | Applies operational resilience thinking to vendor outage, technology failure, wide-scale disruption and business service impact. |