
AI Risk Quantification: Scenario Loss and Control ROI Architecture



AI Risk Quantification Architecture: Scenario Loss / Control ROI / Residual Risk

Date: 2026-06-30
Status: evergreen
Audience: experienced CBAP / financial retail PM / product architect / solution architect / AI governance lead
Output: a decision architecture that turns AI risk scenarios into an expected loss range, residual risk, control ROI, investment priority and management action.

1. Why Risk Quantification Matters for AI Product / Architecture

Important boundaries:

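This article is not legal, compliance, audit, actuarial, capital-measurement or regulatory advice.
It does not repeat the full frameworks for risk appetite policy, board reporting, customer harm, incident liability or continuous control monitoring.
It addresses one advanced question: how to turn AI risk scenarios into an economic decision language that is comparable, investable, challengeable and reviewable.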

Risk discussions on AI projects usually stop at one of three levels:

High / Medium / Low
Severe / Moderate / Acceptable
Add a guardrail / add a HITL step / do some monitoring

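This helps in launch reviews, but it is not enough to support product and architecture investment decisions. What the CRO, CFO, CTO and product owner actually need to answer is: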

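What is the loss distribution of this AI scenario, rather than its single-point risk rating?
Where do the frequency and severity assumptions come from: historical incidents, expert estimates, eval samples, red-team results or a vendor SLA?
How much does a given control reduce frequency, severity, detection delay or recovery cost?
Is the reduction in residual loss worth more than the control investment?
With the same 1M budget, should we invest in RAG grounding, an agent permission gateway, human review capacity, vendor redundancy or eval coverage?
Which residual risks need management acceptance, which must be reduced, and which should be avoided through product scope?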

In one sentence:

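AI Risk Quantification translates “AI might get things wrong” into: which scenarios cause how much loss at what frequency, which controls reduce how much of that loss for how much money, and whether it is currently worth launching, expanding, pausing or re-architecting.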

Value for the AI PM / BA / Architect:

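Role	Traditional statement	Quantified statement
PM	This feature is risky and needs human review	Across 120k customer-service suggestions per month, regulated misinformation carries a P95 loss of roughly 180k-420k; citation checking + escalation routing can reduce it by 45%-60%, at the cost of 12 extra seconds of AHT.
BA	We need exception flows and approvals	The scenario register shows that the tail loss of AML missed escalation exceeds the cost of false positives; requirements should prioritize suspicious-pattern recall and analyst override evidence.
Architect	We need guardrails and logging	Control ROI shows that a tool gateway reduces loss from agent write actions more than a prompt-only guardrail does; architecturally, build a deterministic PEP first, then tune the model.
Governance lead	We need risk sign-off	Any scenario whose residual risk exceeds the approved threshold must enter the management action pack: reduce, transfer, accept with expiry, or avoid by scope change.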

2. Concept Diagram

flowchart LR
  A[AI Use Case Scope] --> B[Scenario Register]
  B --> C[Frequency Assumption]
  B --> D[Severity Assumption]
  C --> E[Gross Loss Range]
  D --> E
  E --> F[Control Portfolio]
  F --> G[Control Effectiveness Estimate]
  G --> H[Residual Loss Range]
  H --> I[Risk Appetite Threshold]
  H --> J[Control ROI]
  I --> K[Management Action]
  J --> L[Investment Prioritization]
  K --> M[ADR / Release Decision]
  L --> M
  M --> N[Evidence Packet]

ASCII version:

scenario -> likelihood x severity -> gross loss range
gross loss range -> control effectiveness -> residual loss range
residual loss range -> appetite threshold -> management action
control cost vs loss reduction -> ROI -> architecture/product priority

Key translation chain:

AI failure mode
  -> business scenario
  -> loss event
  -> frequency distribution
  -> severity distribution
  -> gross expected loss / tail loss
  -> control effect
  -> residual expected loss / tail loss
  -> investment decision

3. Core Architecture Model

AI risk quantification is not a spreadsheet but a decision architecture. It contains at least eight objects.

Object	Purpose	Key fields	Owner
Use case boundary	Bounds the quantification scope	system, channel, user group, decision/action boundary, traffic volume, model/vendor, release stage	PM + Architect
Scenario register	Lists quantifiable risk scenarios	scenario id, failure mode, business event, loss event, affected population, trigger	BA + Risk
Assumption set	Records frequency and severity assumptions	baseline frequency, stress frequency, severity components, confidence, source, review date	Risk + Finance + Ops
Gross loss model	Estimates pre-control loss	expected loss, P50/P90/P95, tail scenario, confidence interval	Risk Quant / Finance
Control portfolio	Lists candidate control combinations	control id, type, target mechanism, cost, latency, coverage, owner	Architect + Control Owner
Effectiveness model	Estimates how controls reduce loss	frequency reduction, severity reduction, detection reduction, recovery reduction, dependency	EvalOps + Risk
Residual risk model	Estimates post-control risk	residual expected loss, residual P95, threshold status, risk owner, expiry	Risk Owner
Decision record	Turns conclusions into action	invest / defer / reduce scope / accept / transfer / avoid, rationale, evidence	Governance Forum
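A minimal Python sketch of one of these objects, the assumption set, showing how a table row can become a checkable record. The field names and the review rule (flag an assumption if its review date has passed or its stress frequency is below its baseline) are hypothetical illustrations, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assumption:
    """One row of an assumption set; field names are hypothetical."""
    scenario_id: str
    baseline_frequency: float  # share of exposed events, e.g. 0.0012 = 0.12%
    stress_frequency: float
    confidence: str            # "low" | "medium" | "high"
    source: str                # e.g. "shadow mode", "expert elicitation"
    review_date: date


def assumptions_needing_review(assumptions: list[Assumption], today: date) -> list[str]:
    """Return ids whose review date has passed or whose stress case is below baseline."""
    flagged = []
    for a in assumptions:
        overdue = a.review_date < today
        inconsistent = a.stress_frequency < a.baseline_frequency
        if overdue or inconsistent:
            flagged.append(a.scenario_id)
    return flagged


# Hypothetical register entries.
register = [
    Assumption("S1", 0.0012, 0.0025, "medium", "shadow mode", date(2026, 9, 30)),
    Assumption("S2", 0.0008, 0.0005, "low", "expert elicitation", date(2026, 12, 31)),
    Assumption("S3", 0.0004, 0.0010, "low", "red team", date(2026, 3, 31)),
]
print(assumptions_needing_review(register, today=date(2026, 6, 30)))  # ['S2', 'S3']
```

S2 is flagged because its stress case is lower than its baseline, S3 because its review date has passed.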

3.1 Loss Event Taxonomy

This article deliberately does not cover the full governance of customer harm and incident liability; it uses loss objects only for quantification.

Loss event type	Example	Quantification basis
Direct financial loss	A payment fraud false negative leads to fraud loss	confirmed fraud amount, recovery rate, chargeback / reimbursement
Operational cost	Re-investigation, QA and case reopening after an AML missed escalation	analyst hours, QA hours, manager review, backlog cost
Compliance remediation cost	Remediation after a KYC wrongful rejection or a regulated advice hallucination	re-review population, notification, external counsel, audit support
Revenue / conversion loss	KYC false rejections cause lost account openings	lost margin, expected lifetime value, reactivation rate
Resilience cost	A vendor outage degrades the AI channel	manual fallback capacity, SLA penalty, lost service productivity
Control friction cost	Controls add manual review, latency and false positives	incremental review cost, abandonment, AHT increase

3.2 Four Loss Views

Each scenario should be viewed through at least four lenses:

View	Formula	Audience
Expected loss	frequency x average severity	PM / Finance / prioritization
Stress loss	stress frequency x stress severity	Risk / resilience / incident planning
Tail loss	P95 or P99 severity range	CRO / CTO / executive release decision
Net value	business benefit - residual expected loss - control cost	Product portfolio / funding gate

3.3 Quantification Grain

AI risk should not be quantified only per system; it should be quantified per decision grain:

Grain	Use
per response	customer service RAG, regulated advice, contact center misinformation
per case	AML investigation、KYC review、fraud triage
per transaction	payment fraud false negative、payment scam intervention
per customer journey	onboarding、complaint、wealth suitability support
per vendor dependency hour	model outage、embedding service outage、vector DB outage
per release	model upgrade、prompt policy change、tool permission change

4. Scenario Loss and Control ROI Method

4.1 Step-by-Step Method

1. Define business decision boundary.
2. Write scenario as loss event, not model error.
3. Estimate exposed population and event frequency.
4. Estimate severity distribution by loss component.
5. Calculate gross expected loss and tail loss.
6. Map candidate controls to loss mechanisms.
7. Estimate control effectiveness and control cost.
8. Calculate residual loss range and ROI.
9. Compare against threshold and product value.
10. Record decision, assumptions and evidence.

4.2 Scenario Statement Pattern

Weak statement:

The model may hallucinate.

Strong statement:

In the regulated wealth education copilot, the model may produce an unsupported product-specific recommendation that a frontline advisor reuses in a customer conversation. Loss occurs when the customer acts on advice outside approved suitability workflow, requiring remediation review, complaint handling and potential client correction.

As fields:

Field	Content
Scenario	regulated advice hallucination reused by advisor
Trigger	customer asks product-specific question; answer not grounded in approved source
Exposure	35k advisor-assisted responses / month
Frequency assumption	0.08%-0.18% unsupported high-risk answers before control
Severity components	review cost, complaint handling, remediation, potential lost relationship margin
Candidate controls	RAG source gating, recommendation phrase blocker, licensed advisor workflow, QA sample, eval gate
Decision use	whether to allow customer-specific product discussion in pilot

4.3 Frequency Estimate

Frequency should be estimated as a range and tied to observable evidence.

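Evidence source	Use	Caveat
Historical incidents	Similar customer-service misinformation, AML misses, fraud losses, KYC rejections	Legacy systems are not necessarily equivalent to the AI system
Eval / red-team sample	Infer the defect rate from the prompt suite, golden set and adversarial set	Record sample design bias
Production shadow mode	AI output is not executed and is compared with human outcomes	Needs sufficient coverage of high-risk segments
QA / manual review	Sample checks of output quality and control bypass	Calibrate reviewer consistency
Vendor SLA / outage history	Frequency assumptions for vendor outage scenarios	Account for shared dependency concentration
Expert elicitation	Used for new scenarios with no historical data	Must record confidence and review date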

Example frequency statements:

Confidence	Monthly frequency assumption	Notes
Low	0.02%-0.20% of exposed events	Red-team and expert estimates only; no shadow mode yet
Medium	0.04%-0.11% of exposed events	4 weeks of shadow mode plus QA sampling
High	0.05%-0.08% of exposed events	3 months of production telemetry, review calibration and a stable policy corpus
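One way to turn a sampled defect count into a frequency range is a binomial confidence interval. The sketch below uses the Wilson score interval, a standard choice for small rates (the article itself does not prescribe a method), with a hypothetical shadow-mode result of 40 unsupported answers in 50,000 reviewed events. The interval captures sampling error only; sample-design bias still has to be recorded separately.

```python
import math


def wilson_interval(defects: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a defect rate (z = 1.96 is roughly 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p_hat = defects / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    center = (p_hat + z2 / (2 * sample_size)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / sample_size + z2 / (4 * sample_size**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical shadow-mode result: 40 unsupported answers in 50,000 reviewed events.
lo, hi = wilson_interval(40, 50_000)
print(f"{lo:.3%} - {hi:.3%}")
```

More reviewed events at the same observed rate narrow the range, which is what moves an assumption from Low toward High confidence.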

4.4 Severity Estimate

Severity should not be a single dollar amount. The more advanced approach is to break it into components.

Component	Formula example
Direct loss	number of affected events x average direct loss x recovery adjustment
Review cost	reopened cases x hours per case x fully loaded hourly cost
Remediation cost	affected customers x notification / correction / rework cost
Opportunity cost	wrongly rejected customers x conversion loss x expected margin
Control friction	extra reviews x review cost + additional latency impact
Resilience cost	outage hours x manual fallback cost x business criticality multiplier
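A minimal sketch of the first three component formulas, assuming hypothetical unit costs and interpreting “recovery adjustment” as (1 - recovery rate); the remaining components follow the same pattern. Keeping components separate lets Finance own each unit cost and shows which component dominates.

```python
def severity_components(
    affected_events: int,
    avg_direct_loss: float,
    recovery_rate: float,
    reopened_cases: int,
    hours_per_case: float,
    hourly_cost: float,
    affected_customers: int,
    cost_per_customer: float,
) -> dict[str, float]:
    """Direct, review and remediation components from the table above."""
    return {
        # Assumption: "recovery adjustment" = share of direct loss NOT recovered.
        "direct_loss": affected_events * avg_direct_loss * (1 - recovery_rate),
        "review_cost": reopened_cases * hours_per_case * hourly_cost,
        "remediation_cost": affected_customers * cost_per_customer,
    }


# Hypothetical inputs for one month of one scenario.
parts = severity_components(
    affected_events=40, avg_direct_loss=1_500, recovery_rate=0.25,
    reopened_cases=40, hours_per_case=3, hourly_cost=90,
    affected_customers=40, cost_per_customer=120,
)
total = sum(parts.values())
for name, value in parts.items():
    print(f"{name:17s} {value:>10,.0f}  ({value / total:.0%} of total)")
print(f"{'total':17s} {total:>10,.0f}")
```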

Severity range:

Percentile	Use
P50	Normal budgeting and backlog impact
P90	Risk committee and release gate
P95	Executive release decision / stress narrative
P99	Resilience planning or capital-style sensitivity, not daily prioritization
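Percentiles need a loss distribution, not just a point estimate. A minimal Monte Carlo sketch follows. Poisson event counts and lognormal per-event severity are modeling assumptions the article does not specify. They are calibrated so the mean matches the Base column of the regulated-advice example in 5.1 (42 events x 2,500 = 105,000 per month); the severity dispersion `sigma` is hypothetical.

```python
import math
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small monthly means used here."""
    if lam > 500:
        raise ValueError("use a different sampler for large means")
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_monthly_loss(lam, sev_mu, sev_sigma, trials=10_000, seed=7):
    """Simulate total monthly loss and report the mean plus P50/P90/P95/P99."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        n_events = poisson(lam, rng)
        totals.append(sum(rng.lognormvariate(sev_mu, sev_sigma) for _ in range(n_events)))
    cuts = statistics.quantiles(totals, n=100)  # cuts[k - 1] approximates Pk
    return {"mean": statistics.fmean(totals), "P50": cuts[49], "P90": cuts[89],
            "P95": cuts[94], "P99": cuts[98]}


lam = 35_000 * 0.0012                 # expected unsupported answers per month (42)
sigma = 1.0                           # hypothetical severity dispersion
mu = math.log(2_500) - sigma**2 / 2   # lognormal mean exp(mu + sigma^2 / 2) = 2,500
views = simulate_monthly_loss(lam, mu, sigma)
for name, value in views.items():
    print(f"{name}: {value:,.0f}")
```

Raising `sigma` while re-deriving `mu` keeps expected loss fixed but widens the gap between the mean and P95, which is exactly the gap the tail narrative has to explain.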

4.5 Control Effectiveness

Control effectiveness must state its mechanism of action. Don't just write “reduces risk by 70%”.

Mechanism	Example	Quantification
Frequency reduction	RAG citation checker blocks unsupported regulated answers	high-risk unsupported output rate from 0.15% to 0.06%
Severity reduction	human approval prevents final customer communication	average severity drops because error stays internal
Detection improvement	QA sample finds drift within 48h instead of 30 days	loss duration reduced
Recovery improvement	trace evidence accelerates case reconstruction	investigation hours reduced
Exposure reduction	product scope excludes product-specific advice	exposed high-risk queries reduced
Resilience improvement	vendor failover preserves critical workflow	outage impact hours reduced
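When several controls act on the same mechanism, their effects do not add up. A minimal sketch, assuming controls act independently so that relative reductions combine multiplicatively. The first call reproduces the citation-checker row (0.15% to 0.06% is a 60% relative reduction). The second shows two 50% controls yielding 75%, not 100%. If the controls share a failure mode, the real combined effect is lower than this independence estimate.

```python
def residual_rate(gross_rate: float, relative_reductions: list[float]) -> float:
    """Apply each control's relative reduction in turn (assumes independent controls)."""
    rate = gross_rate
    for reduction in relative_reductions:
        if not 0.0 <= reduction <= 1.0:
            raise ValueError("relative reduction must be between 0 and 1")
        rate *= 1.0 - reduction
    return rate


gross = 0.0015  # 0.15% high-risk unsupported output rate before control
print(f"{residual_rate(gross, [0.60]):.3%}")        # citation checker alone
print(f"{residual_rate(gross, [0.50, 0.50]):.4%}")  # two independent 50% controls
```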

4.6 Control ROI Formula

Basic formulation:

gross_expected_loss = exposure x frequency x average_severity
residual_expected_loss = exposure x residual_frequency x residual_average_severity
loss_reduction = gross_expected_loss - residual_expected_loss
control_roi = (loss_reduction - annual_control_cost) / annual_control_cost

A formulation better suited to architecture investment:

risk_adjusted_net_value =
  business_benefit
  - residual_expected_loss
  - control_cost
  - control_friction_cost
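The formulas above can be written as one small Python function. All inputs are hypothetical annual figures (420k exposed responses per year, defect rate from 0.12% to 0.05%, average severity from 2,500 to 2,000). The extra `roi_with_friction` entry is an illustrative variant, not part of the basic formula; it shows how much the ROI shrinks once friction is counted.

```python
def control_economics(
    exposure: float,
    frequency: float,
    average_severity: float,
    residual_frequency: float,
    residual_average_severity: float,
    annual_control_cost: float,
    business_benefit: float,
    control_friction_cost: float,
) -> dict[str, float]:
    """Gross/residual expected loss, control ROI and risk-adjusted net value."""
    gross_expected_loss = exposure * frequency * average_severity
    residual_expected_loss = exposure * residual_frequency * residual_average_severity
    loss_reduction = gross_expected_loss - residual_expected_loss
    return {
        "gross_expected_loss": gross_expected_loss,
        "residual_expected_loss": residual_expected_loss,
        "loss_reduction": loss_reduction,
        "control_roi": (loss_reduction - annual_control_cost) / annual_control_cost,
        # Illustrative variant: also subtract friction before dividing by cost.
        "roi_with_friction": (loss_reduction - annual_control_cost - control_friction_cost)
        / annual_control_cost,
        "risk_adjusted_net_value": business_benefit
        - residual_expected_loss
        - annual_control_cost
        - control_friction_cost,
    }


result = control_economics(
    exposure=420_000, frequency=0.0012, average_severity=2_500,
    residual_frequency=0.0005, residual_average_severity=2_000,
    annual_control_cost=180_000, business_benefit=2_000_000, control_friction_cost=150_000,
)
for key, value in result.items():
    print(f"{key}: {value:,.2f}")
```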

Tail-aware narrative:

The control does not only reduce expected loss.
It compresses the P95 tail from a management-action event to a contained operating event.

4.7 Prioritization Matrix

Control	Annual cost	Expected loss reduction	Tail loss reduction	Friction	Priority
RAG citation enforcement	180k	420k-720k	medium	low	high
Agent tool gateway	350k	600k-1.4m	high	medium	high
Double human review for all cases	1.2m	700k-1.1m	high	high	selective
Prompt-only safety instruction	40k	50k-120k	low	low	supporting
Vendor redundancy	500k	250k-600k	high for outage	low	critical-service only
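Applying the `control_roi` formula from 4.6 to the matrix turns each loss-reduction range into an ROI range. A minimal sketch, assuming the cost and reduction columns are annual figures in the same currency (values in thousands):

```python
# (annual cost, expected loss reduction low, high), in thousands, from the matrix above
matrix = {
    "RAG citation enforcement": (180, 420, 720),
    "Agent tool gateway": (350, 600, 1_400),
    "Double human review for all cases": (1_200, 700, 1_100),
    "Prompt-only safety instruction": (40, 50, 120),
    "Vendor redundancy": (500, 250, 600),
}


def roi_range(cost: float, reduction_low: float, reduction_high: float) -> tuple[float, float]:
    """control_roi = (loss_reduction - annual_control_cost) / annual_control_cost."""
    return (reduction_low - cost) / cost, (reduction_high - cost) / cost


rois = {name: roi_range(*row) for name, row in matrix.items()}
for name, (low, high) in rois.items():
    print(f"{name:35s} ROI {low:+.2f} to {high:+.2f}")
```

The output shows why the Priority column is not a pure ROI ranking. Double review for all cases has a negative expected-loss ROI across its whole range, consistent with “selective”. Vendor redundancy is positive only in the upper part of its range, so its case rests on outage tail reduction for critical services. Prompt-only instructions can show a high ratio on a tiny base, which is why they are marked “supporting”.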

5. Financial Retail Scenarios

5.1 Model Hallucination in Regulated Advice

Dimension	Quantification design
Scenario	Wealth or lending assistant generates unsupported product-specific advice that an employee reuses.
Exposure	advisor responses or policy-assistant answers involving regulated topics.
Gross frequency	unsupported regulated recommendation rate from eval + shadow mode.
Severity	case review, correction outreach, complaint handling, advisor retraining, potential revenue loss.
Controls	approved corpus manifest, citation support gate, recommendation phrase classifier, licensed advisor handoff, blocked generation for product-specific advice.
Residual narrative	residual risk is acceptable only if AI cannot directly send customer communication and all product-specific claims show source support.
Tradeoff	Stricter refusal reduces loss frequency but may lower advisor adoption; route high-risk prompts to curated workflow instead of generic chat.

Example range:

Item	Conservative	Base	Stress
Monthly exposure	20,000	35,000	50,000
Gross defect frequency	0.06%	0.12%	0.25%
Avg severity	1,200	2,500	8,000
Gross monthly expected loss	14,400	105,000	1,000,000
Residual after controls	5,000-18,000	32,000-58,000	220,000-420,000
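A quick consistency check of the table above: recompute gross loss as exposure x frequency x severity, then derive the relative reduction implied by each residual range. The recomputed gross matches every column, and the Base and Stress residuals imply roughly 45%-70% and 58%-78% reductions. The Conservative residual upper bound (18,000), however, exceeds its gross loss (14,400). That is only coherent if the figure includes something beyond the controlled loss, such as control friction, so it is worth re-checking.

```python
# (monthly exposure, gross defect frequency, avg severity, residual low, residual high)
table = {
    "Conservative": (20_000, 0.0006, 1_200, 5_000, 18_000),
    "Base": (35_000, 0.0012, 2_500, 32_000, 58_000),
    "Stress": (50_000, 0.0025, 8_000, 220_000, 420_000),
}


def check_column(exposure, frequency, severity, residual_low, residual_high):
    """Recompute gross loss and the relative reduction implied by the residual range."""
    gross = exposure * frequency * severity
    return {
        "gross": gross,
        "implied_reduction": (1 - residual_high / gross, 1 - residual_low / gross),
        "residual_exceeds_gross": residual_high > gross,
    }


checks = {name: check_column(*row) for name, row in table.items()}
for name, c in checks.items():
    low, high = c["implied_reduction"]
    flag = "  <- residual above gross" if c["residual_exceeds_gross"] else ""
    print(f"{name:12s} gross={c['gross']:>11,.0f}  reduction {low:.0%} to {high:.0%}{flag}")
```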

5.2 AML Missed Escalation

Dimension	Quantification design
Scenario	AML copilot summary or prioritization misses a suspicious pattern, causing delayed escalation.
Exposure	alerts where AI summary, typology suggestion or priority score influences analyst attention.
Gross frequency	missed red-flag rate by typology from shadow comparison and QA sample.
Severity	reopened investigation, suspicious activity escalation delay, QA remediation, analyst backlog, regulatory exam support.
Controls	typology-specific eval set, recall floor for high-risk patterns, mandatory red-flag checklist, analyst override reason, no auto-close, scenario sampling.
Residual narrative	expected loss may be moderate, but tail scenario is high when missed escalation clusters by typology or policy drift.
Tradeoff	Optimizing precision reduces false positives but may increase tail risk; architecture should preserve recall for high-risk typologies and manage review load elsewhere.

Decision rule:

If high-risk typology recall falls below release floor or confidence interval is wide,
do not expand automation.
Use AI for evidence assembly, not case prioritization, until recall evidence matures.
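The decision rule can be written as a small deterministic check so that it is applied the same way at every release. A minimal sketch: the recall floor (0.95), the maximum interval width (0.05) and the typology labels are hypothetical. In practice, the recall interval would come from the typology-specific eval set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallEvidence:
    typology: str
    recall: float    # point estimate on the typology-specific eval set
    ci_low: float
    ci_high: float


def automation_decision(ev: RecallEvidence, release_floor: float = 0.95,
                        max_ci_width: float = 0.05) -> str:
    """Apply the AML decision rule above; thresholds are hypothetical defaults."""
    below_floor = ev.recall < release_floor
    too_uncertain = (ev.ci_high - ev.ci_low) > max_ci_width
    if below_floor or too_uncertain:
        return "evidence assembly only"
    return "eligible for expanded automation review"


for ev in [
    RecallEvidence("mule account", 0.93, 0.90, 0.95),    # below floor
    RecallEvidence("structuring", 0.97, 0.90, 0.99),     # interval too wide
    RecallEvidence("trade-based", 0.97, 0.955, 0.982),   # passes both checks
]:
    print(f"{ev.typology}: {automation_decision(ev)}")
```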

5.3 Payment Fraud False Negatives

Dimension	Quantification design
Scenario	AI fraud triage classifies risky transactions as low priority, delaying intervention.
Exposure	transactions routed through AI triage or risk explanation workflow.
Gross frequency	false negative rate by segment, channel, merchant category and new fraud pattern.
Severity	fraud amount, recovery rate, dispute operations, customer contact, downstream losses.
Controls	threshold by risk segment, champion/challenger model, rules fallback, velocity checks, risk-based human queue, post-event learning loop.
Residual narrative	residual risk must be compared against false-positive friction and customer abandonment, not optimized in isolation.
Tradeoff	Lower false negatives can increase false positives; control ROI should include saved fraud losses and additional review / customer friction cost.

Loss formula:

net_loss_reduction =
  avoided_fraud_loss
  - incremental_false_positive_review_cost
  - customer_friction_cost
  - engineering_and_runtime_cost
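A minimal sketch of the formula above, comparing two hypothetical threshold settings. All volumes and unit costs are invented for illustration. The point is that the setting that avoids more fraud can still deliver less net value once false-positive review and customer friction are priced in.

```python
def net_loss_reduction(
    avoided_fraud_loss: float,
    extra_false_positives: int,
    review_cost_per_alert: float,
    abandoned_customers: int,
    margin_per_customer: float,
    engineering_and_runtime_cost: float,
) -> float:
    """avoided fraud minus false-positive review, customer friction and run cost."""
    incremental_false_positive_review_cost = extra_false_positives * review_cost_per_alert
    customer_friction_cost = abandoned_customers * margin_per_customer
    return (avoided_fraud_loss
            - incremental_false_positive_review_cost
            - customer_friction_cost
            - engineering_and_runtime_cost)


moderate = net_loss_reduction(900_000, 12_000, 15, 300, 400, 250_000)
aggressive = net_loss_reduction(1_100_000, 30_000, 15, 900, 400, 250_000)
print(f"moderate threshold:   {moderate:,.0f}")    # 350,000
print(f"aggressive threshold: {aggressive:,.0f}")  # 40,000
```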

5.4 KYC Wrongful Rejection

Dimension	Quantification design
Scenario	AI document / identity / risk assistant contributes to rejecting legitimate applicants.
Exposure	applications where AI recommendation influences decision or manual queue priority.
Gross frequency	wrongful rejection rate from appeal uphold, manual audit and shadow adjudication.
Severity	lost funded accounts, rework, support contacts, appeal handling, reputational friction.
Controls	low-confidence routing, segment fairness test, reason-code consistency, second review for thin-file / document-quality cases, appeal evidence capture.
Residual narrative	expected loss includes lost conversion, while tail risk comes from concentrated rejection in a protected or vulnerable segment.
Tradeoff	Faster onboarding may increase false rejects if quality gates are too aggressive; product architecture should separate fraud risk escalation from final rejection.

5.5 Contact Center Misinformation

Dimension	Quantification design
Scenario	Customer service copilot provides incorrect fee, dispute, account closure, promotion or policy information.
Exposure	customer-visible or agent-assisted answers on regulated or contractual topics.
Gross frequency	unsupported claim rate, stale-source answer rate, agent acceptance rate.
Severity	recontact cost, complaint handling, fee corrections, goodwill credit, QA remediation.
Controls	approved-source RAG, citation required for policy claims, answer templates for regulated topics, escalation intents, agent confirmation UX.
Residual narrative	residual expected loss can be low, but high volume makes small defect rates material.
Tradeoff	Full blocking may hurt first-contact resolution; better architecture uses intent risk tiering and source-specific answer modes.

5.6 Vendor Outage

Dimension	Quantification design
Scenario	Foundation model, embedding, vector database or AI gateway vendor outage degrades a critical service.
Exposure	workflows dependent on vendor at runtime.
Gross frequency	vendor SLA, historical outage, shared dependency analysis, internal failover test.
Severity	manual fallback cost, backlog, SLA breach, customer delay, lost productivity.
Controls	graceful degradation, cached responses for stable policy, model fallback, queue prioritization, manual runbook, vendor concentration limit.
Residual narrative	expected loss may not justify full redundancy for low-criticality use cases, but P95 outage cost may justify redundancy for AML, fraud or contact center peak periods.
Tradeoff	Multi-vendor architecture adds eval, routing, security and consistency cost; use risk-adjusted criticality rather than blanket redundancy.

6. Metrics / Control / Evidence Model

6.1 Metric Contract

Metric	Definition	Evidence source	Decision use
Gross scenario exposure	number of AI-influenced events in scope	workflow telemetry, gateway logs, case system	denominator for loss estimate
Defect frequency	scenario-specific failure rate before control	eval, red-team, shadow mode, QA	gross loss and control target
Control coverage	share of exposed events where control operated	policy engine logs, tool gateway logs, workflow approval	control design and operating check
Control effectiveness	observed reduction in frequency / severity / detection delay	A/B, shadow comparison, before-after, expert estimate	residual loss and ROI
Residual expected loss	expected loss after controls	model spreadsheet + evidence packet	release / scale / fund decision
Tail loss	P95/P99 scenario severity	stress test, incident simulation, expert estimate	executive decision and resilience planning
Control friction cost	cost introduced by the control	AHT, review volume, latency, abandonment, infra cost	net value calculation
Loss avoided	gross loss minus residual loss	quantification model	ROI and prioritization

6.2 Control Types by Loss Mechanism

Control type	Reduces frequency	Reduces severity	Reduces detection delay	Reduces recovery cost
RAG source gating	yes	partial	partial	yes
Citation verification	yes	partial	yes	yes
Agent permission gateway	yes	yes	yes	yes
Human approval	partial	yes	partial	partial
Eval gate	yes	partial	yes	partial
Trace and evidence capture	no direct	partial	yes	yes
Vendor failover	no	yes	yes	yes
Product scope exclusion	yes	yes	yes	yes

6.3 Evidence Packet

Evidence	Required fields
Scenario register	scenario id、owner、exposure、trigger、loss event、affected workflow
Frequency evidence	sample design、period、population、defect count、confidence、reviewer calibration
Severity evidence	loss components、unit cost source、finance owner、range rationale
Control evidence	design spec、control owner、event field、test result、coverage
Effectiveness evidence	pre/post comparison、A/B or shadow result、assumptions、limitations
Residual risk statement	residual range、threshold status、owner、decision、expiry
ROI sheet	control cost、friction cost、loss reduction、net value、sensitivity
ADR	selected architecture、alternatives、tradeoffs、review date

7. Anti-Patterns and Failure Modes

Anti-pattern	Why it fails	Better design
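High / Medium / Low only	Cannot compare investment priorities or explain why the money is worth spending	Use an expected loss range + tail scenario + confidence
False precision	Writing $123,456.78 makes people believe the estimate is precise	Use ranges, percentiles, assumption quality and sensitivity
Model metric as risk metric	Accuracy / F1 is not business loss	Map model error to loss event and workflow exposure
Control theater	Guardrails are added without estimating their effectiveness	For each control, state the mechanism by which it reduces frequency, severity, detection delay or recovery cost
Ignoring friction cost	The control looks high-ROI but actually slows the process and raises abandonment	Include review load, latency, false positives and adoption impact in net value
Average-only thinking	Expected loss is low while tail loss is high	Look at P50, expected, P95 and the stress narrative together
Over-general scenario	“AI gives a wrong answer” is too broad	Split scenarios by business event, channel, customer segment and process stage
No confidence rating	Every assumption looks equally reliable	Label evidence maturity: expert / eval / shadow / production
Control dependency blindness	Two controls depend on the same log or vendor	Build a control dependency map and a common-mode failure scenario
ROI without action	The numbers are computed but never reach the roadmap / ADR	Feed results into the funding gate, release gate and management action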
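The control dependency map recommended for “control dependency blindness” can start as a simple inverted index from dependency to controls. A minimal sketch with hypothetical control and dependency names; any dependency used by more than one control is a candidate common-mode failure scenario.

```python
from collections import defaultdict

# Hypothetical control -> dependency mapping.
controls = {
    "citation_verification": {"trace_log_pipeline", "vector_db_vendor"},
    "qa_sampling": {"trace_log_pipeline"},
    "rag_source_gating": {"vector_db_vendor", "policy_corpus"},
    "agent_permission_gateway": {"iam_service"},
}


def shared_dependencies(control_deps: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert control -> dependencies and keep dependencies used by 2+ controls."""
    users: dict[str, set[str]] = defaultdict(set)
    for control, deps in control_deps.items():
        for dep in deps:
            users[dep].add(control)
    return {dep: sorted(cs) for dep, cs in sorted(users.items()) if len(cs) > 1}


shared = shared_dependencies(controls)
for dep, users_of_dep in shared.items():
    print(f"common-mode candidate: {dep} -> {', '.join(users_of_dep)}")
```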

8. Architecture Mapping to RAG / Agent / Copilot / Eval / Governance

Architecture area	Quantification focus	Typical controls	Decision tradeoff
RAG	unsupported claim frequency, stale source severity, source coverage	approved corpus, citation verification, freshness SLO, answer abstention	better grounding vs higher latency / refusal
Agent	unauthorized action severity, tool misuse frequency, reversibility	permission gateway, scoped tokens, approval workflow, idempotent tools, action ledger	autonomy value vs tail loss compression
Copilot	human overreliance, draft reuse, review cost	UI confidence cues, mandatory review for high-risk intent, edit diff, role-based guidance	productivity vs review burden
Eval	scenario frequency estimate, control effectiveness evidence	scenario eval set, red-team suite, segment tests, regression gate	eval coverage vs release speed
Governance	residual threshold, acceptance owner, action tracking	scenario register, residual risk memo, ADR, evidence packet, management action pack	decision clarity vs governance overhead

8.1 RAG Example

Risk scenario: contact center copilot gives wrong fee waiver policy.
Control candidate: citation-required answer mode for fee / dispute / account closure intents.
Quant effect: unsupported claim rate drops from 1.8% to 0.4%; AHT increases by 9 seconds.
Decision: apply control to regulated intents only, not all FAQ intents.

8.2 Agent Example

Risk scenario: agent creates or closes a customer case incorrectly.
Control candidate: tool gateway requiring human approval token for write actions.
Quant effect: severity distribution changes because errors become draft-only.
Decision: allow read + draft autonomy, require approval for writes until residual P95 falls below threshold.
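A deterministic gateway check for this decision might look like the sketch below. Everything in it is hypothetical: the action labels, the `ToolCall` shape and the token check stand in for a real policy enforcement point. In a real PEP, an approval token would also be bound to the specific call, expire and be single-use.

```python
from dataclasses import dataclass
from typing import Optional

AUTONOMOUS_ACTIONS = {"read", "draft"}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    action: str                           # "read" | "draft" | "write"
    approval_token: Optional[str] = None


def authorize(call: ToolCall, valid_tokens: set[str]) -> bool:
    """Allow read/draft autonomously; require a known approval token for writes."""
    if call.action in AUTONOMOUS_ACTIONS:
        return True
    if call.action == "write":
        return call.approval_token is not None and call.approval_token in valid_tokens
    return False  # deny unknown actions by default


approved = {"tok-123"}
print(authorize(ToolCall("case_api", "draft"), approved))             # True
print(authorize(ToolCall("case_api", "write"), approved))             # False
print(authorize(ToolCall("case_api", "write", "tok-123"), approved))  # True
```

Because the check is deterministic and sits outside the model, it changes severity (errors become drafts) regardless of how the model behaves.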

8.3 Eval Example

Risk scenario: AML copilot misses new mule-account typology.
Control candidate: typology-specific regression suite and monthly red-team refresh.
Quant effect: confidence in frequency estimate improves; release decision moves from "unknown" to "controlled pilot".
Decision: fund eval coverage before adding more model capacity.

9. ADR Draft

Field	Content
ADR title	Adopt scenario-loss and control-ROI architecture for high-impact AI releases
Status	proposed
Context	Existing AI release gates classify use cases by risk tier, but investment decisions still rely on qualitative risk statements. Teams cannot consistently compare RAG grounding, agent permission, human review, eval coverage and vendor resilience investments.
Decision	For high-impact financial retail AI use cases, require a scenario-loss model before scale approval. Each material scenario must estimate exposure, frequency, severity, gross loss, control effectiveness, residual loss, control cost, friction cost and management action.
Scope	Regulated advice copilot, AML copilot, fraud triage, KYC onboarding assistant, contact center RAG, vendor-dependent AI services.
Architecture impact	Add scenario register, assumption set, control ROI sheet, residual risk statement and evidence packet to release artifacts. Integrate telemetry fields needed for exposure, control coverage and defect frequency.
Product impact	Roadmap prioritization must compare risk-adjusted net value, not only user value or engineering effort. Product scope can be reduced when tail loss exceeds threshold.
Alternatives considered	Qualitative risk tier only; control checklist only; full quantitative operational-risk capital model; pure model-metric gate.
Consequences	Better investment discipline and executive decision quality; additional effort in assumption gathering; requires finance / risk / operations input; false precision must be controlled through ranges and confidence labels.
Review trigger	New high-impact use case, material model/vendor change, incident, major control failure, scale expansion, annual methodology review.

10. Interview Answer

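30-second version

I don't manage AI risk with High / Medium / Low alone. My method is to turn AI failure modes into business loss scenarios and estimate exposure, frequency and severity to get gross expected loss and P95 tail loss. Then I assess whether each control reduces frequency, severity, detection delay or recovery cost, and calculate residual risk and control ROI. That way product, architecture, risk and finance can jointly decide whether to launch, narrow scope, add controls, accept residual risk or stop.

2-minute version

The hard part of financial retail AI is not knowing that there is risk; it is knowing whether the risk is worth taking and where the control budget should go. I first define the use case boundary, for example customer-service RAG, AML copilot, KYC AI review, payment fraud triage or vendor outage. Then I rewrite model errors as business scenarios: regulated advice hallucination, AML missed escalation, fraud false negative, KYC wrongful rejection, contact center misinformation. For each scenario I estimate exposure, a frequency range and severity components to produce gross expected loss and tail loss.

Next, I map controls to loss mechanisms. A RAG citation gate mainly reduces unsupported answer frequency; an agent tool gateway reduces unauthorized action severity; human review reduces final customer impact; an eval suite sharpens the frequency estimate and gates changes; vendor failover reduces outage severity. Finally, I compare control cost, friction cost and loss reduction, and output residual risk, ROI, threshold status and management action.

The value of this method is that it turns risk governance into product and architecture decisions: which scenarios must stay assist-only, which can be automated, which investments earn a higher ROI than a model upgrade, and which residual risks need time-limited management acceptance or remediation.

CTO version

A CTO cares whether engineering investment compresses the real tail risk, not whether the dashboard shows a few more green metrics. I require high-impact AI systems to have a scenario-loss architecture: every release must be able to state the business exposure of its AI runs, the scenario defect rate, control coverage, residual P95 and common-mode dependencies. Architecturally, I prioritize a deterministic control plane: model gateway, RAG source registry, tool permission gateway, trace evidence, eval regression and a vendor degradation path.

Investment ranking is not about “which control looks more compliant” but “which control compresses the most expected loss and tail loss with the least friction”. For example, in the regulated advice scenario, prompt instructions have very low ROI, while citation enforcement and product-scope routing are more effective; in the agent write-action scenario, a tool gateway reduces severity more than after-the-fact review; in the vendor outage scenario, not every use case needs multi-vendor, but critical services must have graceful degradation. My CTO-level takeaway: risk quantification should drive the architecture runway.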

11. 7-Day Practice Plan

Day	Practice	Output
1	Pick one financial retail AI use case: AML copilot, KYC onboarding, contact center RAG, fraud triage or wealth advice copilot	use case boundary + exposure definition
2	Write 6 scenarios, converting each from a model error into a business loss event	scenario register
3	Estimate frequency ranges for 2 high-priority scenarios and label their evidence maturity	frequency assumption sheet
4	Break down severity components: direct loss, ops cost, remediation, conversion, resilience, control friction	severity range table
5	Design 5 controls and state whether each reduces frequency, severity, detection or recovery	control effectiveness estimate
6	Calculate gross loss, residual loss, control ROI and sensitivity	ROI model + prioritization matrix
7	Write a 1-page management action pack + 1 ADR + a 30-second interview answer	portfolio-ready decision artifact

Scoring rubric:

Level	Evidence
Basic	Has scenarios and High / Medium / Low ratings
Strong	Has exposure, frequency, severity, gross / residual loss
Advanced	Has control mechanisms, ROI, tail narrative, threshold and management action
Portfolio-ready	Has ADR, evidence packet, assumption quality, architecture tradeoff and interview story

12. Key Takeaways

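The core of AI risk quantification is not precise prediction but making decision assumptions explicit.
The unit of quantification must be a business loss scenario, not an abstract model failure.
Control ROI must weigh loss reduction, control cost, friction cost and tail compression together.
Architecture investments in RAG, Agent, Copilot, Eval and Governance can be compared in the same scenario-loss language.
The most mature statement is not “this AI risk is under control” but “under these assumptions, residual expected loss and P95 tail loss are below threshold; if the metrics breach it, the management action is reduce scope / add control / accept with expiry / stop”.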

Source Anchors

Source	Link	Use in this article
NIST AI Risk Management Framework	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize the identification, measurement, treatment and governance evidence for AI risk scenarios. The NIST page indicates that AI RMF 1.0 is under revision; formal projects should re-check it as of their access date.
NIST AI RMF Generative AI Profile	https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence	Used for GenAI-specific scenarios: hallucination, grounding failure, misuse, supply-chain dependency, content risk and evaluation gap.
ISO/IEC 23894:2023	https://www.iso.org/standard/77304.html	Reference for how AI risk management enters organizational activities, the AI lifecycle and the risk treatment process.
ISO/IEC 42001:2023	https://www.iso.org/standard/81230.html	Uses the AI management system perspective to connect policy, objectives, risk treatment, operation, performance evaluation and continual improvement.
NIST SP 800-30 Rev. 1	https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final	Reference for risk assessment, likelihood, impact, residual risk, control mitigation and senior leader decision support.
BIS Principles for Operational Resilience	https://www.bis.org/bcbs/publ/d516.htm	Uses operational resilience thinking to address vendor outage, technology failure, wide-scale disruption and business service impact.