AI Risk Quantification / Scenario Loss / Control ROI Playbook

Positioning: an execution handbook for experienced CBAP / financial retail PM / product architect / solution architect / AI governance lead practitioners. The goal is to turn an AI risk scenario into an expected loss range, control effectiveness, residual risk, control ROI, a management action and a release decision.

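Important: this document is not legal, compliance, audit, actuarial, capital-measurement or regulatory advice. Formal projects must be confirmed by Legal, Compliance, Risk, Model Risk, Finance, Operations, Security, Privacy, Technology, Internal Audit and business management.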

1. Purpose and When to Use

1.1 Purpose

This playbook is designed to answer one advanced question:

Given an AI use case and a set of risk scenarios,
which controls are economically and architecturally justified,
what residual risk remains,
and what management action should be taken?

It moves the AI risk discussion from qualitative ratings to a decision architecture:

scenario -> exposure -> frequency -> severity -> gross loss
-> control effectiveness -> residual loss
-> ROI / threshold / management action

1.2 When to Use

Situation	Use this playbook to decide
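New high-impact AI use case	Whether to enter pilot, shadow mode, assist-only or full release
Scale decision	Whether to expand channels, customer segments, business lines or automation level
Architecture funding	Whether to invest in RAG grounding, agent gateway, eval suite, human review capacity or vendor redundancy
Risk exception	Whether to accept residual risk, for how long, and which compensating controls are required
Incident learning	After an incident, which control investments actually reduce expected loss or tail loss
Vendor concentration	Whether to build failover or graceful degradation for specific critical AI services
Product roadmap tradeoff	Whether a high-value feature must be downgraded, phased or re-scoped because its tail risk is too high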

1.3 When Not to Use

Situation	Better artifact
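Defining the organization's overall AI risk appetite	risk appetite / policy product management
Portfolio-level reporting to the board	board MI / management information architecture
Handling customer redress and complaints	customer harm / redress architecture
Day-to-day operating tests of controls	continuous control monitoring
Legal liability determinations	legal / compliance / external counsel process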

2. Source Anchors

Source	Link	How this playbook uses it
NIST AI Risk Management Framework	https://www.nist.gov/itl/ai-risk-management-framework	Organizes AI risk scenarios, measurement, risk treatment and evidence around Govern / Map / Measure / Manage.
NIST AI RMF Generative AI Profile	https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence	Scenario design for GenAI hallucination, grounding failure, misuse, supply-chain dependency, evaluation gaps and content risk.
ISO/IEC 23894:2023	https://www.iso.org/standard/77304.html	Reference for how AI risk management is embedded in the AI lifecycle, risk treatment and organizational governance processes.
ISO/IEC 42001:2023	https://www.iso.org/standard/81230.html	Uses AI management system language to connect policy, objectives, operation, performance evaluation and continual improvement.
NIST SP 800-30 Rev. 1	https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final	Organizes quantification assumptions around likelihood, impact, residual risk and risk response.
BIS Principles for Operational Resilience	https://www.bis.org/bcbs/publ/d516.htm	Applied to vendor outage, critical service impact, manual fallback and resilience control ROI.

3. Operating Model

3.1 Roles

Role	Responsibility
Business owner	Defines business objectives, value, acceptable service impact and management actions
Product owner	Defines use case boundary, roadmap options, scope reduction options and the release ask
Senior BA / CBAP	Structures workflow, decision points, exceptions, evidence and the scenario register
Solution / product architect	Designs RAG, agent, copilot, eval, gateway, telemetry and failover controls
Risk partner	Challenges scenarios, frequency, severity, residual thresholds and acceptance
Finance partner	Calibrates unit costs, loss components, ROI and benefit realization
Operations owner	Provides volume, review capacity, SLA, manual fallback and friction cost
EvalOps / model quality	Provides defect frequency, control effectiveness, confidence intervals and regression evidence
Security / privacy / third-party	Provides data boundary, vendor dependency, access, resilience and concentration views
Governance forum	Makes invest / scale / hold / reduce / accept / avoid decisions

3.2 Workflow

1. Scope use case and decision boundary.
2. Build scenario register.
3. Estimate exposure, frequency and severity.
4. Estimate gross expected loss and P95 tail loss.
5. Map candidate controls to loss mechanisms.
6. Estimate control effectiveness, cost and friction.
7. Calculate residual risk and control ROI.
8. Compare against thresholds and business value.
9. Prepare management action pack.
10. Attach evidence packet to ADR / release decision.
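Step 4 asks for both an expected loss and a P95 tail, but the playbook does not prescribe how the tail is estimated. One common option is a Monte Carlo simulation of monthly aggregate loss. The sketch below assumes Poisson event counts and lognormal per-event severity (both modelling assumptions, not from the playbook) and uses the KYC wrongful-rejection inputs from the gross loss calculation: 80,000 applications, 0.08% frequency and an average severity of 950. The severity spread `sigma = 1.0`, the trial count and the seed are hypothetical and would need calibration.

```python
import math
import random
import statistics


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Poisson count: number of exponential inter-arrival times that fit in one unit interval."""
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count


def simulate_monthly_loss(exposure: int, frequency: float, mean_severity: float,
                          sigma: float, trials: int, seed: int = 7) -> list[float]:
    """Aggregate monthly loss per trial: Poisson event count, lognormal severity per event."""
    rng = random.Random(seed)
    lam = exposure * frequency                      # expected loss events per month
    mu = math.log(mean_severity) - sigma ** 2 / 2   # makes E[severity] equal mean_severity
    losses = []
    for _ in range(trials):
        n_events = poisson_draw(rng, lam)
        losses.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return losses


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


# KYC wrongful rejection inputs; sigma = 1.0 is a hypothetical, uncalibrated spread.
losses = simulate_monthly_loss(80_000, 0.0008, 950.0, sigma=1.0, trials=2_000)
expected = statistics.fmean(losses)
p95 = percentile(losses, 95)
print(f"simulated expected loss ~ {expected:,.0f}; simulated P95 ~ {p95:,.0f}")
```

With these inputs the simulated P95 lands well below the 180,000-320,000 KYC range in the gross loss table. An independent-events model ignores correlated failures, such as a policy update or model regression that affects many decisions at once, so it is better read as a lower bound on the tail than as a replacement for the stress frequency and scenario sensitivity steps.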

3.3 Decision Rights

Decision	Recommended owner
Business value baseline	Business owner + Finance
Scenario materiality	Risk + Business owner
Frequency assumption	EvalOps + Risk + Operations
Severity assumption	Finance + Risk + Operations
Control design	Architect + Control owner
Control cost	Technology + Operations + Finance
Residual risk acceptance	Named risk owner / governance forum
Release action	Product owner + governance forum
Architecture investment	CTO / platform owner / portfolio forum

4. Templates

4.1 Scenario Register Template

Field	Guidance	Example
Scenario ID	Stable id	AI-RQ-AML-001
Use case	Specific system / workflow	AML Copilot investigation summary
Business decision boundary	What AI influences	analyst prioritization and red-flag checklist
Failure mode	Model / workflow failure	misses mule-account red flag
Loss event	Business consequence	delayed escalation and reopened investigation
Exposure population	Events in scope	18,000 alerts / month in pilot queue
Affected segment	Segment / channel / geography	high-risk retail payments alerts
Gross risk hypothesis	Pre-control story	summary omission reduces analyst attention
Candidate controls	Controls to evaluate	typology eval, mandatory checklist, no auto-close
Evidence maturity	expert / eval / shadow / production	shadow + QA sample
Owner	Accountable person	AML operations owner

Compact version:

Scenario ID	Use case	Loss event	Exposure	Frequency evidence	Severity driver	Candidate controls	Owner
AI-RQ-ADV-001	Regulated advice copilot	unsupported product-specific recommendation reused by advisor	35k answers / month	red-team + shadow	review, complaint, correction	approved corpus, citation gate, handoff	Wealth PM
AI-RQ-AML-001	AML copilot	missed suspicious pattern escalation	18k alerts / month	QA + shadow	reopened case, escalation delay	typology eval, checklist, no auto-close	AML Ops
AI-RQ-FRD-001	Payment fraud triage	false negative delays intervention	2.4m transactions / month	model validation + production	fraud amount, recovery	threshold, rules fallback, human queue	Fraud Strategy
AI-RQ-KYC-001	KYC onboarding	legitimate applicant wrongfully rejected	80k applications / month	appeal uphold + audit	lost margin, rework	low-confidence route, second review	Onboarding PM
AI-RQ-CS-001	Contact center RAG	incorrect policy answer accepted by agent	420k answers / month	QA sample	recontact, correction, goodwill	source gate, regulated templates	Contact Center
AI-RQ-VND-001	AI vendor dependency	vendor outage degrades critical workflow	600 runtime hours / month	SLA + resilience test	manual fallback, backlog	failover, cache, degradation	Platform Owner

4.2 Severity / Frequency Estimate Template

Frequency Estimate

Field	Guidance	Example
Exposure denominator	Count of AI-influenced events	420,000 customer-service assisted answers / month
Gross defect frequency range	Before additional controls	0.25%-0.60% unsupported regulated-topic answers
Basis	Data source	4-week QA sample of 8,000 answers + red-team
Confidence	low / medium / high	medium
Segment adjustment	Higher-risk segment multiplier	fees/disputes x 1.8
Stress frequency	Adverse but plausible	1.2% during policy update month
Review trigger	When to refresh	corpus update, prompt change, complaint spike
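A frequency range is easier to defend when it is derived from the sample rather than asserted. One option, assuming the QA sample is a simple random sample of the exposed answers, is a Wilson score interval on the observed defect rate, then scaling by the exposure denominator and the segment multiplier. The 8,000-answer sample, the 420,000 monthly exposure and the x1.8 fees/disputes multiplier come from the template; the count of 28 defects is hypothetical.

```python
import math


def wilson_interval(defects: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial defect rate (about 95% coverage at z = 1.96)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = defects / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    center = (p + z2 / (2 * sample_size)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / sample_size + z2 / (4 * sample_size ** 2))
    return max(0.0, center - half), min(1.0, center + half)


SAMPLE_SIZE = 8_000        # 4-week QA sample from the template
DEFECTS = 28               # hypothetical count of unsupported regulated-topic answers
EXPOSURE = 420_000         # assisted answers per month
SEGMENT_MULTIPLIER = 1.8   # fees/disputes adjustment from the template

low, high = wilson_interval(DEFECTS, SAMPLE_SIZE)
print(f"all intents: {low:.2%}-{high:.2%}, "
      f"{EXPOSURE * low:,.0f}-{EXPOSURE * high:,.0f} events / month")
print(f"fees/disputes: {low * SEGMENT_MULTIPLIER:.2%}-{high * SEGMENT_MULTIPLIER:.2%}")
```

With 28 defects the interval is roughly 0.24%-0.51%. Red-team prompts are chosen adversarially rather than sampled, so red-team findings fit the stress frequency better than this interval.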

Severity Estimate

Component	Unit	Low	Base	Stress	Evidence source
Direct financial correction	per confirmed event	25	75	250	fee correction history
Recontact / service cost	per event	8	18	35	contact center cost model
QA / remediation review	per event	30	60	140	QA staffing model
Goodwill / relationship cost	per event	0	50	300	historical complaint settlement
Management / audit support	batch	5,000	15,000	60,000	prior remediation project
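Because the table mixes per-event components with a per-batch component, the components cannot simply be added into a single unit cost. A minimal sketch, assuming one remediation batch per month and a hypothetical 500 confirmed events; the low / base / stress values are the table's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    per_event: bool   # True: cost per confirmed event; False: cost per batch
    low: float
    base: float
    stress: float


COMPONENTS = [
    Component("direct financial correction", True, 25, 75, 250),
    Component("recontact / service cost", True, 8, 18, 35),
    Component("QA / remediation review", True, 30, 60, 140),
    Component("goodwill / relationship cost", True, 0, 50, 300),
    Component("management / audit support", False, 5_000, 15_000, 60_000),
]


def severity_cost(events: int, case: str, batches: int = 1) -> float:
    """Total loss for a number of confirmed events under the 'low', 'base' or 'stress' case."""
    total = 0.0
    for component in COMPONENTS:
        unit = getattr(component, case)
        total += unit * (events if component.per_event else batches)
    return total


CONFIRMED_EVENTS = 500   # hypothetical monthly count of confirmed events
for case in ("low", "base", "stress"):
    cost = severity_cost(CONFIRMED_EVENTS, case)
    print(f"{case:>6}: total {cost:,.0f}, per event incl. batch {cost / CONFIRMED_EVENTS:,.0f}")
```

In the base case this gives 203 per confirmed event before the batch cost, which is above the 110 average severity used in the gross loss calculation. If both figures stand, the share of defects that become confirmed events should be recorded as an explicit assumption in the evidence packet.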

Gross Loss Calculation

Scenario	Exposure	Frequency	Avg severity	Expected monthly loss	P95 monthly loss
Contact center misinformation	420,000	0.35%	110	161,700	420,000-680,000
KYC wrongful rejection	80,000	0.08%	950	60,800	180,000-320,000
Fraud false negative	2,400,000	0.012%	1,400	403,200	1.1m-2.4m
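Expected monthly loss in the table is exposure x frequency x average severity. A minimal sketch that reproduces the three rows, using `Decimal` so that percentage inputs do not pick up binary rounding error. The P95 column cannot be derived from this product; it needs a separate tail estimate.

```python
from decimal import Decimal


def expected_monthly_loss(exposure: int, frequency_pct: str, avg_severity: int) -> Decimal:
    """Exposure x frequency x average severity; frequency is given as a percentage string."""
    frequency = Decimal(frequency_pct) / 100
    return exposure * frequency * avg_severity


# (exposure per month, frequency in %, average severity) from the gross loss table
SCENARIOS = {
    "Contact center misinformation": (420_000, "0.35", 110),
    "KYC wrongful rejection": (80_000, "0.08", 950),
    "Fraud false negative": (2_400_000, "0.012", 1_400),
}

for name, (exposure, freq, severity) in SCENARIOS.items():
    loss = expected_monthly_loss(exposure, freq, severity)
    print(f"{name}: {loss:,.0f} / month, {loss * 12:,.0f} / year")
```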

4.3 Control Effectiveness Estimate Template

Control ID	Control	Target mechanism	Gross assumption	Residual assumption	Evidence	Confidence	Friction
CTRL-RAG-001	citation-required answer mode	frequency reduction	unsupported rate 0.35%	0.11%-0.18%	QA before/after	medium	+9 sec AHT
CTRL-AGT-002	human approval token for writes	severity reduction	write error reaches system of record	error remains draft until approval	workflow test	high	review queue load
CTRL-EVL-003	typology regression eval	frequency reduction + detection	AML miss detected after QA cycle	miss detected at release gate	eval result	medium	release delay
CTRL-VND-004	model fallback route	outage severity reduction	80% workflow disruption	25%-40% disruption	resilience drill	low-medium	extra platform cost

Control effectiveness narrative:

A control reduces risk by changing the loss mechanism, not by improving the model in general.
Citation-required answer mode blocks or escalates unsupported regulated claims, so frequency falls.
Human approval token changes severity because draft errors do not directly update systems of record.
Vendor fallback reduces outage severity and duration but does not reduce hallucination frequency.
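The narrative above treats each control as acting on one term of the loss equation. A minimal sketch, assuming expected loss = exposure x frequency x severity and expressing each control as a multiplier on the single term it targets. The 0.14% residual frequency is one point inside the 0.11%-0.18% range for CTRL-RAG-001. The outage model's 600 runtime hours come from the vendor scenario, while its 1% outage share and 10,000 hourly severity are hypothetical; the 0.5 multiplier takes the 40% end of CTRL-VND-004's 25%-40% disruption range.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LossModel:
    exposure: float    # AI-influenced events (or runtime hours) per month
    frequency: float   # share of exposure that becomes a loss event
    severity: float    # average loss per loss event

    def expected_loss(self) -> float:
        return self.exposure * self.frequency * self.severity


@dataclass(frozen=True)
class Control:
    control_id: str
    mechanism: str     # "frequency" or "severity"
    multiplier: float  # residual value / gross value for the targeted term

    def apply(self, model: LossModel) -> LossModel:
        """Return a new model with only the targeted term scaled."""
        if self.mechanism == "frequency":
            return replace(model, frequency=model.frequency * self.multiplier)
        if self.mechanism == "severity":
            return replace(model, severity=model.severity * self.multiplier)
        raise ValueError(f"unknown mechanism: {self.mechanism}")


# Contact center misinformation: citation gate moves 0.35% to 0.14%.
gross = LossModel(exposure=420_000, frequency=0.0035, severity=110)
citation_gate = Control("CTRL-RAG-001", "frequency", 0.0014 / 0.0035)
residual = citation_gate.apply(gross)

# Hypothetical outage model: 1% of 600 runtime hours down, 10,000 per hour at 80% disruption.
outage = LossModel(exposure=600, frequency=0.01, severity=10_000)
fallback = Control("CTRL-VND-004", "severity", 0.40 / 0.80)   # 80% -> 40% disruption
fallback_residual = fallback.apply(outage)

print(round(gross.expected_loss()), round(residual.expected_loss()))            # 161700 64680
print(round(outage.expected_loss()), round(fallback_residual.expected_loss()))  # 60000 30000
```

Because `apply` changes only the targeted term, a severity control such as the fallback leaves frequency untouched, so it cannot be credited with reducing hallucination frequency.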

4.4 Residual Risk Template

Field	Guidance	Example
Scenario ID	Link to scenario register	AI-RQ-CS-001
Gross expected loss	Pre-control range	160k-290k / month
Gross P95 loss	Tail estimate	420k-680k / month
Selected controls	Control ids	CTRL-RAG-001, CTRL-CS-005, CTRL-EVL-002
Residual expected loss	Post-control range	45k-92k / month
Residual P95 loss	Post-control tail	130k-240k / month
Residual threshold status	green / amber / red	amber
Residual risk owner	Named owner	Contact Center Risk Owner
Acceptance condition	Conditions and expiry	accept for 90-day pilot, no scale to disputes until P95 below 180k
Management action	reduce / invest / accept / avoid / transfer	invest in regulated answer templates and weekly eval refresh
Review trigger	Event	policy corpus update, complaint rate increase, unsupported claim rate > 0.2%

Residual risk language:

Residual risk is not "low".
Residual expected loss is estimated at 45k-92k per month, with P95 at 130k-240k.
This is inside pilot threshold but above scale threshold.
Management action is controlled pilot plus targeted control investment before expansion.
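"Inside pilot threshold but above scale threshold" is a comparison of the residual P95 range with two limits. A minimal sketch, assuming the status is set on the upper end of the range (a conservative choice): green within the scale threshold, amber within the pilot threshold only, red above it. The 180k scale limit is borrowed from the disputes acceptance condition in the template; the 300k pilot limit is hypothetical.

```python
from enum import Enum


class Status(Enum):
    GREEN = "green"   # within scale threshold
    AMBER = "amber"   # within pilot threshold only
    RED = "red"       # above pilot threshold


def threshold_status(p95_range: tuple[float, float], pilot_limit: float,
                     scale_limit: float) -> Status:
    """Classify on the upper end of the residual P95 range."""
    if scale_limit > pilot_limit:
        raise ValueError("scale threshold should not be looser than pilot threshold")
    upper = max(p95_range)
    if upper <= scale_limit:
        return Status.GREEN
    if upper <= pilot_limit:
        return Status.AMBER
    return Status.RED


PILOT_LIMIT = 300_000   # hypothetical pilot threshold
SCALE_LIMIT = 180_000   # from the disputes acceptance condition

status = threshold_status((130_000, 240_000), PILOT_LIMIT, SCALE_LIMIT)
print(status.value)  # amber
```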

4.5 Control ROI Template

Field	Formula / guidance
Gross expected annual loss	monthly gross expected loss x 12
Residual expected annual loss	monthly residual expected loss x 12
Expected loss reduction	gross expected annual loss - residual expected annual loss
Annual control cost	engineering amortization + runtime + operations + license
Annual friction cost	review time + latency / abandonment + false positive cost
Net risk reduction	expected loss reduction - annual friction cost
Control ROI	(net risk reduction - annual control cost) / annual control cost
Tail impact	P95 / P99 loss compression narrative
Strategic note	reuse across use cases, resilience benefit, regulatory confidence, architecture runway

Example:

Control	Gross annual expected loss	Residual annual expected loss	Loss reduction	Control cost	Friction cost	ROI	Tail impact
Contact center citation gate	2.4m	780k	1.62m	260k	180k	4.5x	P95 monthly loss from 680k to 240k
Agent write gateway	1.8m	420k	1.38m	420k	220k	1.8x	converts system-of-record errors to draft errors
Vendor dual-run fallback	900k	360k	540k	520k	60k	-0.08x	weak expected ROI, strong critical-service tail reduction
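The ROI formulas can be checked against the example rows. A minimal sketch with annual figures in thousands; the table rounds ROI to one decimal place except for the vendor row.

```python
def control_roi(gross: float, residual: float, control_cost: float, friction_cost: float) -> float:
    """(net risk reduction - annual control cost) / annual control cost."""
    if control_cost <= 0:
        raise ValueError("control_cost must be positive")
    loss_reduction = gross - residual
    net_risk_reduction = loss_reduction - friction_cost
    return (net_risk_reduction - control_cost) / control_cost


# (gross, residual, control cost, friction cost), annual, in thousands, from the example table
EXAMPLES = {
    "Contact center citation gate": (2_400, 780, 260, 180),
    "Agent write gateway": (1_800, 420, 420, 220),
    "Vendor dual-run fallback": (900, 360, 520, 60),
}

for name, args in EXAMPLES.items():
    print(f"{name}: ROI {control_roi(*args):.2f}x")
```

The formula ignores the tail impact column, which is why the vendor row needs the separate tail argument made in the interpretation.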

Interpretation:

Positive ROI is not automatically enough; the control may still create unacceptable customer friction.
Negative expected ROI can still be justified for critical resilience if P95 / P99 loss is intolerable.
Reusable platform controls should allocate cost across use cases rather than penalizing the first adopter.
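Allocating a shared control's cost across the use cases that consume it changes the ROI each one sees. A minimal sketch, assuming exposure-proportional allocation (usage- or risk-weighted keys are equally defensible) and entirely hypothetical figures for a shared agent write gateway used by three agent use cases.

```python
def allocate_cost(platform_cost: float, exposure: dict[str, float]) -> dict[str, float]:
    """Split a shared control's annual cost in proportion to each use case's exposure."""
    total = sum(exposure.values())
    if total <= 0:
        raise ValueError("total exposure must be positive")
    return {name: platform_cost * volume / total for name, volume in exposure.items()}


def control_roi(net_risk_reduction: float, control_cost: float) -> float:
    """(net risk reduction - annual control cost) / annual control cost."""
    return (net_risk_reduction - control_cost) / control_cost


# All figures are hypothetical. Exposure is monthly events; net risk reduction is annual,
# already net of friction cost.
PLATFORM_COST = 420_000
EXPOSURE = {"servicing agent": 300_000, "collections agent": 100_000, "onboarding agent": 100_000}
NET_REDUCTION = {"servicing agent": 380_000, "collections agent": 150_000,
                 "onboarding agent": 120_000}

first_adopter_roi = control_roi(NET_REDUCTION["servicing agent"], PLATFORM_COST)
shares = allocate_cost(PLATFORM_COST, EXPOSURE)

print(f"servicing agent charged full cost: ROI {first_adopter_roi:.2f}x")
for name, share in shares.items():
    print(f"{name}: allocated {share:,.0f}, ROI {control_roi(NET_REDUCTION[name], share):.2f}x")
```

With these numbers the first adopter looks ROI-negative when charged the full cost and ROI-positive once the cost is shared.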

4.6 Management Action Pack Template

Section	Content
Decision requested	approve pilot / scale / hold / reduce scope / fund control / accept residual risk
Use case boundary	workflow, users, channels, automation level, excluded scope
Top scenarios	3-6 material scenarios with gross and residual loss ranges
Control portfolio	selected controls, rejected controls, rationale
Risk-adjusted value	business benefit minus residual loss and control / friction cost
Threshold status	which scenarios are green / amber / red and why
Management action	invest, reduce, accept with expiry, avoid, transfer, monitor
Conditions	volume cap, channel cap, excluded intents, manual review, review date
Evidence packet	links or references to assumptions, eval, QA, architecture and finance evidence
ADR summary	architecture decision and tradeoffs

One-page format:

Decision	Recommendation
Release ask	Controlled pilot for contact center RAG on servicing intents, excluding disputes and account closure
Value	8%-12% AHT reduction; 1.1m annual productivity value after adoption adjustment
Material risk	misinformation on regulated policy topics
Gross loss	2.0m-3.5m annual expected loss before controls
Controls	approved corpus, citation gate, regulated templates, escalation intent classifier
Residual loss	540k-1.1m annual expected loss; P95 monthly 130k-240k
ROI	citation gate + templates estimated 3.2x-4.8x after friction
Action	approve pilot; do not scale to dispute / account closure until residual P95 below threshold
Evidence	QA sample, red-team, architecture diagram, trace coverage, finance unit-cost model

4.7 Evidence Packet Template

Evidence artifact	Required content	Owner
Use case boundary	workflow, channel, user, customer segment, automation level, exclusions	Product owner
Architecture diagram	RAG / model / gateway / tool / workflow / evidence boundary	Architect
Scenario register	material scenarios, loss events, exposure, owners	BA + Risk
Frequency estimate	sample, period, defect count, confidence, segment adjustment	EvalOps
Severity estimate	unit costs, loss components, finance sign-off	Finance
Control design	control objective, mechanism, event fields, owner	Architect + Control owner
Effectiveness evidence	before / after, A/B, shadow, QA or expert basis	EvalOps + Risk
Residual risk memo	residual ranges, threshold status, action, expiry	Risk owner
ROI model	gross loss, residual loss, cost, friction, sensitivity	Finance + Product
ADR	decision, alternatives, consequences, review trigger	Architect

5. PM / BA / Architecture Questions

5.1 PM Questions

Question	Why it matters
What business decision does this AI influence?	Loss must attach to a decision or workflow, not a model in isolation.
What is the exposed population per month?	Frequency without denominator is unusable.
Which scenarios change the release decision?	Focus on material scenarios, not every theoretical failure.
Which control creates unacceptable user friction?	High control effectiveness can still destroy product value.
What scope reduction would avoid the tail scenario?	Sometimes product boundary beats technical control.
What value remains after residual loss and control cost?	AI ROI must be risk-adjusted.
Which scenario needs explicit management acceptance?	PM should not silently carry residual risk.

5.2 BA / CBAP Questions

Question	Why it matters
Where exactly does AI enter the process?	Determines exposure and control point.
Is AI searching, drafting, recommending, deciding or executing?	Automation level changes severity.
What is the exception path when AI confidence is low?	Low-confidence handling often determines residual risk.
Which data fields prove a control operated?	Evidence must be designed into workflow events.
Which manual review outcomes can calibrate frequency assumptions?	Human workflow is a source of quant evidence.
What segments need separate estimates?	Average risk can hide KYC, fraud or AML concentration.
Which assumptions will be challenged by risk or audit?	Explicit assumptions are easier to defend and update.

5.3 Architecture Questions

Question	Why it matters
Which controls are deterministic vs model-dependent?	Deterministic controls often compress tail risk better.
Does the control reduce frequency, severity, detection delay or recovery cost?	Controls should map to loss mechanisms.
What telemetry is required to calculate exposure and coverage?	Quantification fails without event contracts.
Can the system degrade safely when the model, vector DB or vendor is down?	Vendor outage is a resilience scenario, not only an IT incident.
Are controls reusable across use cases?	Platform ROI may be high even when single-use ROI is weak.
Does the architecture support per-scenario thresholds?	Different intents and tools need different risk treatment.
What common-mode failure could defeat multiple controls?	A shared dependency can cause residual risk to be understated.
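Counting controls independently multiplies their leak-through rates, which understates residual frequency when the controls share a dependency. A minimal sketch, assuming a simple mixture model (not from the playbook): a fraction `p_common` of defects arrive while a shared dependency, such as the retrieval index both controls rely on, is failing, and none of the controls catch them. The gross rate is the contact center 0.35%; the catch rates and `p_common` are hypothetical.

```python
def leak_rate_independent(effectiveness: list[float]) -> float:
    """Share of defects that pass every control, assuming controls fail independently."""
    leak = 1.0
    for catch_rate in effectiveness:
        leak *= 1.0 - catch_rate
    return leak


def leak_rate_common_mode(effectiveness: list[float], p_common: float) -> float:
    """A fraction p_common of defects bypass all controls at once; the rest leak independently."""
    return p_common + (1.0 - p_common) * leak_rate_independent(effectiveness)


GROSS_FREQUENCY = 0.0035   # contact center gross defect rate
CONTROLS = [0.60, 0.50]    # hypothetical catch rates for two controls
P_COMMON = 0.05            # hypothetical share of defects arriving during a shared failure

independent = GROSS_FREQUENCY * leak_rate_independent(CONTROLS)
correlated = GROSS_FREQUENCY * leak_rate_common_mode(CONTROLS, P_COMMON)
print(f"independent: {independent:.4%}, with common mode: {correlated:.4%}")
```

Here the independent estimate is 0.07% and the common-mode estimate is 0.084%, so a 5% shared-failure share raises residual frequency by a fifth.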

6. Release Checklist

6.1 Minimum Before Pilot

Check	Pass condition
Use case boundary	Documented scope, exclusions, automation level and owner
Scenario register	At least the top 5 material scenarios, each with an owner
Exposure denominator	Monthly volume estimate by workflow / channel / segment
Frequency estimate	Range and evidence maturity for top scenarios
Severity estimate	Loss components and finance / ops basis
Control mapping	Candidate controls mapped to loss mechanisms
Residual risk	Expected and tail residual range for top scenarios
Management action	Pilot decision, conditions, threshold and review trigger
Evidence packet	Traceable artifacts stored with owner and date

6.2 Minimum Before Scale

Check	Pass condition
Production evidence	Pilot telemetry validates or updates assumptions
Control coverage	Controls operate on required share of exposed events
Effectiveness evidence	Before / after or shadow evidence supports reduction estimate
Friction cost	AHT, review load, false positive and abandonment effects included
Residual threshold	Residual expected loss and P95 tail within scale threshold or explicitly accepted
Architecture ADR	Selected controls and rejected alternatives documented
Scenario sensitivity	Stress case run for volume, defect rate, severity and outage
Management sign-off	Named owner accepts residual risk with expiry and review conditions
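The scenario sensitivity check can start as a one-at-a-time stress of each driver against the base case. A minimal sketch using the contact center base case from the gross loss calculation and the 1.2% stress frequency from the frequency template; the x1.5 volume and x2 severity stresses are hypothetical, and outage stress belongs to the separate vendor scenario.

```python
BASE = {"volume": 420_000.0, "frequency": 0.0035, "severity": 110.0}

# Stress values: frequency from the frequency template; volume and severity multipliers are hypothetical.
STRESS = {"volume": 420_000.0 * 1.5, "frequency": 0.012, "severity": 110.0 * 2}


def expected_loss(drivers: dict[str, float]) -> float:
    """Monthly expected loss = volume x frequency x severity."""
    return drivers["volume"] * drivers["frequency"] * drivers["severity"]


def one_at_a_time(base: dict[str, float], stress: dict[str, float]) -> dict[str, float]:
    """Expected loss with each driver moved alone to its stress value, then all together."""
    results = {driver: expected_loss({**base, driver: value}) for driver, value in stress.items()}
    results["all drivers"] = expected_loss({**base, **stress})
    return results


base_loss = expected_loss(BASE)
for driver, loss in one_at_a_time(BASE, STRESS).items():
    print(f"{driver:>12}: {loss:>12,.0f}  ({loss / base_loss:.1f}x base)")
```

In this sketch frequency is the dominant single driver (about 3.4x base on its own), which points evidence effort at the defect rate during policy update months.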

6.3 Stop / Hold Triggers

Trigger	Default action
Frequency estimate confidence remains low for material scenario	hold scale; extend shadow mode or sampling
Residual P95 exceeds threshold	reduce scope or add control before release
Control friction eliminates business value	redesign control or change product boundary
Control effectiveness is unproven	treat as assumption, not as residual risk reduction
Vendor dependency lacks graceful degradation for critical service	block scale for critical workflow
Common-mode failure defeats multiple controls	redesign architecture; do not count controls independently
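The stop / hold table can be encoded as rules that are all evaluated at each review, so the defaults are applied consistently. A minimal sketch; the `ScenarioStatus` fields are hypothetical names for the conditions in the table, and the governance forum can still override any default action.

```python
from dataclasses import dataclass


@dataclass
class ScenarioStatus:
    # Hypothetical fields summarizing one material scenario at review time.
    frequency_confidence: str   # "low" | "medium" | "high"
    residual_p95: float
    p95_threshold: float
    friction_exceeds_value: bool
    effectiveness_proven: bool
    critical_vendor_without_degradation: bool
    common_mode_defeats_controls: bool


def hold_actions(s: ScenarioStatus) -> list[str]:
    """Return every default action triggered by the stop / hold table."""
    actions = []
    if s.frequency_confidence == "low":
        actions.append("hold scale; extend shadow mode or sampling")
    if s.residual_p95 > s.p95_threshold:
        actions.append("reduce scope or add control before release")
    if s.friction_exceeds_value:
        actions.append("redesign control or change product boundary")
    if not s.effectiveness_proven:
        actions.append("treat as assumption, not as residual risk reduction")
    if s.critical_vendor_without_degradation:
        actions.append("block scale for critical workflow")
    if s.common_mode_defeats_controls:
        actions.append("redesign architecture; do not count controls independently")
    return actions


status = ScenarioStatus("medium", residual_p95=240_000, p95_threshold=180_000,
                        friction_exceeds_value=False, effectiveness_proven=True,
                        critical_vendor_without_degradation=False,
                        common_mode_defeats_controls=False)
print(hold_actions(status))  # ['reduce scope or add control before release']
```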

7. Executive Narrative

7.1 Standard Narrative

We quantified the material AI risk scenarios for this release rather than relying on qualitative risk ratings.
The highest-risk scenarios are [A], [B] and [C].
For each scenario, we estimated exposure, frequency and severity using [shadow mode / QA / eval / production / expert] evidence.

Before controls, expected annual loss is estimated at [range], with P95 tail at [range].
The recommended controls reduce risk by [mechanism]: [frequency / severity / detection / recovery].
After controls, residual expected loss is [range], with P95 at [range].

The control portfolio costs [amount] annually and adds [friction].
Net risk reduction is [amount], producing [ROI] and compressing the main tail scenario from [before] to [after].

Recommendation: [approve pilot / scale / hold / reduce scope / fund control].
Conditions: [volume cap, excluded scope, threshold, review date, owner].

7.2 CFO Version

The business case is risk-adjusted.
We are not presenting productivity value alone.
We subtract residual expected loss, control operating cost and workflow friction.
The highest ROI controls are [X] and [Y].
The vendor redundancy control has weak expected ROI but is justified only for critical workflows because it compresses P95 outage loss.

7.3 CRO Version

The residual risk statement is scenario-specific.
We are not asking for blanket acceptance of AI risk.
For each material scenario, the pack shows gross loss, selected controls, residual loss, threshold status, owner, expiry and review trigger.
Scenarios above threshold are not approved for scale unless scope is reduced or controls are funded.

7.4 CTO Version

The architecture investment is prioritized by loss mechanism.
Prompt-only controls have low tail impact.
Deterministic controls such as source gating, tool gateway, scoped authorization, trace evidence and graceful degradation compress the loss distribution more reliably.
Reusable platform controls should be funded as architecture runway, not charged only to one product.

8. Interview Drills

Drill 1: 30-Second Answer

Question:

How do you quantify AI risk for a financial retail AI product?

Answer:

I start with business loss scenarios, not model metrics. For each scenario I define exposure, frequency and severity, then estimate gross expected loss and a P95 tail range. I map controls to the loss mechanism they reduce: frequency, severity, detection delay or recovery cost. After estimating control cost and friction, I calculate residual risk and control ROI. The output is a management action: approve, scale, reduce scope, invest in controls, accept residual risk with expiry or stop.

Drill 2: Regulated Advice Hallucination

Question:

A wealth copilot sometimes produces unsupported product-specific advice. What do you do?

Strong answer:

I would not start with a generic hallucination metric. I would quantify the scenario where unsupported product-specific advice is reused in a customer conversation. The denominator is regulated-topic advisor responses. Frequency comes from red-team, shadow mode and QA. Severity includes review, correction, complaint handling and potential relationship impact. Controls include approved-source RAG, citation enforcement, recommendation phrase blocking and licensed-advisor handoff. The release decision depends on residual P95 and adoption friction: education content may proceed, product-specific advice may stay out of scope until residual risk is inside threshold.

Drill 3: AML Missed Escalation

Question:

How would you compare productivity gain from an AML copilot with missed escalation risk?

Strong answer:

I separate productivity value from tail risk. Productivity may reduce investigation time, but missed escalation has asymmetric downside. I would estimate typology-specific false negative or omission frequency using shadow comparison and QA, then calculate severity as reopened investigations, escalation delay, backlog and exam support. Controls should preserve recall for high-risk typologies: mandatory red-flag checklist, no auto-close, typology eval floor and analyst override evidence. If tail risk remains above threshold, the copilot can summarize evidence but should not prioritize or de-prioritize alerts.

Drill 4: Fraud False Negatives vs False Positives

Question:

Fraud wants a lower threshold, product worries about false positives. How do you decide?

Strong answer:

I would use net risk reduction. Lowering the threshold reduces fraud false-negative loss, but increases review cost, customer friction and possible abandonment. The model should estimate avoided fraud loss minus incremental false-positive cost and service friction. I would segment by transaction risk, recovery rate and customer value rather than applying one threshold globally. Controls with strong ROI may include risk-tiered queues, velocity rules fallback and post-event learning, while low-value segments may keep higher automation.
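The net risk reduction comparison in this answer can be sketched per segment. A minimal sketch, assuming each threshold option is summarized by the fraud loss it still misses plus the false-positive review and friction cost it creates; all figures are hypothetical, chosen to show that the same threshold change can pay off in one segment and not in another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdOption:
    name: str
    missed_fraud_loss: float     # fraud loss still missed per month
    false_positive_cost: float   # review cost of alerts raised per month
    friction_cost: float         # customer friction / abandonment cost per month


def net_risk_reduction(current: ThresholdOption, candidate: ThresholdOption) -> float:
    """Avoided fraud loss minus incremental false-positive and friction cost."""
    avoided_fraud = current.missed_fraud_loss - candidate.missed_fraud_loss
    extra_false_positive = candidate.false_positive_cost - current.false_positive_cost
    extra_friction = candidate.friction_cost - current.friction_cost
    return avoided_fraud - extra_false_positive - extra_friction


# Hypothetical monthly figures: (current threshold, lower threshold) per segment.
SEGMENTS = {
    "high-risk transfers": (
        ThresholdOption("current", 300_000, 20_000, 10_000),
        ThresholdOption("lower", 150_000, 60_000, 30_000),
    ),
    "low-value card payments": (
        ThresholdOption("current", 40_000, 15_000, 5_000),
        ThresholdOption("lower", 25_000, 45_000, 25_000),
    ),
}

for segment, (current, lower) in SEGMENTS.items():
    net = net_risk_reduction(current, lower)
    decision = "lower the threshold" if net > 0 else "keep the current threshold"
    print(f"{segment}: net risk reduction {net:,.0f} / month -> {decision}")
```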

Drill 5: Vendor Outage

Question:

Should every AI system have a second model vendor?

Strong answer:

No. I would quantify the outage scenario by business criticality. For low-risk internal drafting, manual fallback may be cheaper than multi-vendor complexity. For critical AML, fraud or contact-center peak workflows, vendor outage severity can justify graceful degradation, cached policy answers or fallback routing. The ROI may be weak on expected loss but justified on P95 tail compression. The architecture decision should be risk-tiered, not blanket multi-vendor.

Drill 6: Control ROI Challenge

Question:

How do you prove an AI control is worth the cost?

Strong answer:

I prove mechanism and economics. Mechanism means the control clearly reduces frequency, severity, detection delay or recovery cost. Economics means gross loss minus residual loss exceeds control and friction cost, or the control compresses a tail scenario management will not accept. I would show assumptions, evidence maturity, sensitivity and residual threshold status. If control ROI is low but the control is required for critical tail risk or policy boundary, I would say that explicitly rather than forcing a fake ROI.

9. Portfolio Artifact Checklist

Use this checklist before publishing the artifact as a portfolio note:

Check	Standard
Advanced audience	Assumes PM / BA / architect experience; avoids basic risk definitions
Scenario examples	Includes regulated advice, AML, fraud, KYC, contact center and vendor outage
Quant method	Has exposure, frequency, severity, gross loss, residual loss and tail loss
Control ROI	Includes cost, friction, effectiveness and ROI formula
Architecture relevance	Maps to RAG, agent, copilot, eval, gateway, telemetry and resilience
Decision output	Produces management action, ADR and release conditions
Evidence	Shows what evidence supports each assumption
No false precision	Uses ranges, confidence and sensitivity rather than fake certainty