AI Risk Quantification / Scenario Loss / Control ROI Playbook

527AI_RISK_QUANTIFICATION_SCENARIO_LOSS_CONTROL_ROI_PLAYBOOK.md

Positioning: an execution handbook for experienced CBAP / financial retail PM / product architect / solution architect / AI governance lead readers. The goal is to turn an AI risk scenario into an expected loss range, control effectiveness, residual risk, control ROI, management action and release decision.

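Important note: this document is not legal, compliance, audit, actuarial, capital measurement or regulatory advice. Formal projects must be confirmed by Legal, Compliance, Risk, Model Risk, Finance, Operations, Security, Privacy, Technology, Internal Audit and business management.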


1. Purpose and When to Use

1.1 Purpose

This playbook answers one advanced question:

Given an AI use case and a set of risk scenarios,
which controls are economically and architecturally justified,
what residual risk remains,
and what management action should be taken?

It upgrades the AI risk discussion from qualitative rating to decision architecture:

scenario -> exposure -> frequency -> severity -> gross loss
-> control effectiveness -> residual loss
-> ROI / threshold / management action

1.2 When to Use

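| Situation | Use this playbook to decide |
| --- | --- |
| New high-impact AI use case | Whether to enter pilot, shadow mode, assist-only or full release |
| Scale decision | Whether to expand channels, customer segments, business lines or automation level |
| Architecture funding | Whether to invest in RAG grounding, agent gateway, eval suite, human review capacity or vendor redundancy |
| Risk exception | Whether to accept residual risk, for how long, and with which compensating controls |
| Incident learning | Which control investments actually reduce expected loss or tail loss after an incident |
| Vendor concentration | Whether to build failover or graceful degradation for certain critical AI services |
| Product roadmap tradeoff | Whether a high-value feature must be downgraded, phased or rescoped because its tail risk is too high |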

1.3 When Not to Use

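| Situation | Better artifact |
| --- | --- |
| Defining the organization's overall AI risk appetite | risk appetite / policy product management |
| Portfolio-level reporting to the board | board MI / management information architecture |
| Handling customer remediation and complaints | customer harm / redress architecture |
| Routine operational testing of controls | continuous control monitoring |
| Making legal liability determinations | legal / compliance / external counsel process |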

2. Source Anchors

| Source | Link | How this playbook uses it |
| --- | --- | --- |
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize AI risk scenarios, measurement, risk treatment and evidence. |
| NIST AI RMF Generative AI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | Used for scenario design covering GenAI hallucination, grounding failure, misuse, supply-chain dependency, evaluation gap and content risk. |
| ISO/IEC 23894:2023 | https://www.iso.org/standard/77304.html | Reference for how AI risk management enters the AI lifecycle, risk treatment and organizational governance processes. |
| ISO/IEC 42001:2023 | https://www.iso.org/standard/81230.html | Uses AI management system language to connect policy, objective, operation, performance evaluation and continual improvement. |
| NIST SP 800-30 Rev. 1 | https://csrc.nist.gov/publications/detail/sp/800-30/rev-1/final | Uses likelihood, impact, residual risk and risk response thinking to organize quantification assumptions. |
| BIS Principles for Operational Resilience | https://www.bis.org/bcbs/publ/d516.htm | Used for vendor outage, critical service impact, manual fallback and resilience control ROI. |

3. Operating Model

3.1 Roles

| Role | Responsibility |
| --- | --- |
| Business owner | Defines business goals, value, acceptable service impact and management actions |
| Product owner | Defines use case boundary, roadmap options, scope reduction options and release ask |
| Senior BA / CBAP | Structures workflow, decision points, exceptions, evidence and the scenario register |
| Solution / product architect | Designs RAG, agent, copilot, eval, gateway, telemetry and failover controls |
| Risk partner | Challenges scenario, frequency, severity, residual threshold and acceptance |
| Finance partner | Calibrates unit cost, loss components, ROI and benefit realization |
| Operations owner | Provides volume, review capacity, SLA, manual fallback and friction cost |
| EvalOps / model quality | Provides defect frequency, control effectiveness, confidence intervals and regression evidence |
| Security / privacy / third-party | Provides data boundary, vendor dependency, access, resilience and concentration view |
| Governance forum | Makes invest / scale / hold / reduce / accept / avoid decisions |

3.2 Workflow

1. Scope use case and decision boundary.
2. Build scenario register.
3. Estimate exposure, frequency and severity.
4. Estimate gross expected loss and P95 tail loss.
5. Map candidate controls to loss mechanisms.
6. Estimate control effectiveness, cost and friction.
7. Calculate residual risk and control ROI.
8. Compare against thresholds and business value.
9. Prepare management action pack.
10. Attach evidence packet to ADR / release decision.
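
Steps 3 and 4 can be sketched as a small simulation. The playbook does not state how its P95 figures are derived, so the approach below is an assumption: frequency uncertainty is modelled with a triangular distribution and severity is held at its average. The function name, trial count and seed are illustrative; the inputs reuse the contact center figures from section 4.2. Because the method is assumed, the output is not expected to reproduce the P95 ranges quoted later in this playbook.

```python
import random

def simulate_monthly_loss(exposure, freq_low, freq_mode, freq_high,
                          avg_severity, trials=20_000, seed=7):
    """Return (mean, p95) of simulated monthly loss.

    Assumption: frequency follows a triangular distribution and
    severity is fixed at its average value.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode) in that order.
        freq = rng.triangular(freq_low, freq_high, freq_mode)
        losses.append(exposure * freq * avg_severity)
    losses.sort()
    mean = sum(losses) / trials
    p95 = losses[int(0.95 * trials)]  # empirical 95th percentile
    return mean, p95

mean_loss, p95_loss = simulate_monthly_loss(
    exposure=420_000,      # assisted answers per month
    freq_low=0.0025,       # 0.25% gross defect rate, low end
    freq_mode=0.0035,      # 0.35% used as the most likely value
    freq_high=0.0060,      # 0.60% gross defect rate, high end
    avg_severity=110,
)
print(f"mean={mean_loss:,.0f}  p95={p95_loss:,.0f}")
```

Adding a severity distribution or a stress-frequency month would widen the tail; which of those belongs in the model is a decision for the Risk and Finance partners.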

3.3 Decision Rights

| Decision | Recommended owner |
| --- | --- |
| Business value baseline | Business owner + Finance |
| Scenario materiality | Risk + Business owner |
| Frequency assumption | EvalOps + Risk + Operations |
| Severity assumption | Finance + Risk + Operations |
| Control design | Architect + Control owner |
| Control cost | Technology + Operations + Finance |
| Residual risk acceptance | Named risk owner / governance forum |
| Release action | Product owner + governance forum |
| Architecture investment | CTO / platform owner / portfolio forum |

4. Templates

4.1 Scenario Register Template

| Field | Guidance | Example |
| --- | --- | --- |
| Scenario ID | Stable id | AI-RQ-AML-001 |
| Use case | Specific system / workflow | AML Copilot investigation summary |
| Business decision boundary | What AI influences | analyst prioritization and red-flag checklist |
| Failure mode | Model / workflow failure | misses mule-account red flag |
| Loss event | Business consequence | delayed escalation and reopened investigation |
| Exposure population | Events in scope | 18,000 alerts / month in pilot queue |
| Affected segment | Segment / channel / geography | high-risk retail payments alerts |
| Gross risk hypothesis | Pre-control story | summary omission reduces analyst attention |
| Candidate controls | Controls to evaluate | typology eval, mandatory checklist, no auto-close |
| Evidence maturity | expert / eval / shadow / production | shadow + QA sample |
| Owner | Accountable person | AML operations owner |

Compact version:

| Scenario ID | Use case | Loss event | Exposure | Frequency evidence | Severity driver | Candidate controls | Owner |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AI-RQ-ADV-001 | Regulated advice copilot | unsupported product-specific recommendation reused by advisor | 35k answers / month | red-team + shadow | review, complaint, correction | approved corpus, citation gate, handoff | Wealth PM |
| AI-RQ-AML-001 | AML copilot | missed suspicious pattern escalation | 18k alerts / month | QA + shadow | reopened case, escalation delay | typology eval, checklist, no auto-close | AML Ops |
| AI-RQ-FRD-001 | Payment fraud triage | false negative delays intervention | 2.4m transactions / month | model validation + production | fraud amount, recovery | threshold, rules fallback, human queue | Fraud Strategy |
| AI-RQ-KYC-001 | KYC onboarding | legitimate applicant wrongfully rejected | 80k applications / month | appeal uphold + audit | lost margin, rework | low-confidence route, second review | Onboarding PM |
| AI-RQ-CS-001 | Contact center RAG | incorrect policy answer accepted by agent | 420k answers / month | QA sample | recontact, correction, goodwill | source gate, regulated templates | Contact Center |
| AI-RQ-VND-001 | AI vendor dependency | vendor outage degrades critical workflow | 600 runtime hours / month | SLA + resilience test | manual fallback, backlog | failover, cache, degradation | Platform Owner |
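
If the register is kept as structured data rather than a document table, one possible shape is a small record type. The sketch below is an assumption about representation, not part of the template: the class name `ScenarioRecord`, its field names and the `missing_fields` helper are hypothetical, and the values are taken from the AI-RQ-AML-001 example above.

```python
from dataclasses import dataclass, field, fields, replace

@dataclass
class ScenarioRecord:
    """One row of the scenario register (field names are illustrative)."""
    scenario_id: str
    use_case: str
    decision_boundary: str
    failure_mode: str
    loss_event: str
    exposure_per_month: int
    exposure_unit: str
    affected_segment: str
    gross_risk_hypothesis: str
    candidate_controls: list = field(default_factory=list)
    evidence_maturity: list = field(default_factory=list)
    owner: str = ""

    def missing_fields(self):
        """Names of fields that are still empty, e.g. a scenario without an owner."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

aml = ScenarioRecord(
    scenario_id="AI-RQ-AML-001",
    use_case="AML Copilot investigation summary",
    decision_boundary="analyst prioritization and red-flag checklist",
    failure_mode="misses mule-account red flag",
    loss_event="delayed escalation and reopened investigation",
    exposure_per_month=18_000,
    exposure_unit="alerts in pilot queue",
    affected_segment="high-risk retail payments alerts",
    gross_risk_hypothesis="summary omission reduces analyst attention",
    candidate_controls=["typology eval", "mandatory checklist", "no auto-close"],
    evidence_maturity=["shadow", "QA sample"],
    owner="AML operations owner",
)

# A copy without an owner shows how incomplete rows could be flagged.
incomplete = replace(aml, owner="")
print(aml.missing_fields(), incomplete.missing_fields())
```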

4.2 Severity / Frequency Estimate Template

Frequency Estimate

| Field | Guidance | Example |
| --- | --- | --- |
| Exposure denominator | Count of AI-influenced events | 420,000 customer-service assisted answers / month |
| Gross defect frequency range | Before additional controls | 0.25%-0.60% unsupported regulated-topic answers |
| Basis | Data source | 4-week QA sample of 8,000 answers + red-team |
| Confidence | low / medium / high | medium |
| Segment adjustment | Higher-risk segment multiplier | fees/disputes x 1.8 |
| Stress frequency | Adverse but plausible | 1.2% during policy update month |
| Review trigger | When to refresh | corpus update, prompt change, complaint spike |
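
One way to turn a QA sample into a frequency range is a binomial confidence interval. The sketch below uses the Wilson score interval, which is an assumed choice rather than a method named by this template. The sample size of 8,000 comes from the example above, while the defect count of 28 is hypothetical, picked only because it corresponds to a 0.35% observed rate.

```python
import math

def wilson_interval(defects, sample_size, z=1.96):
    """Wilson score interval for a defect rate (z=1.96 gives roughly 95%)."""
    p = defects / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    center = (p + z2 / (2 * sample_size)) / denom
    margin = z * math.sqrt(
        p * (1 - p) / sample_size + z2 / (4 * sample_size ** 2)
    ) / denom
    return center - margin, center + margin

def expected_events(exposure, rate):
    """Monthly defect count implied by an exposure denominator and a rate."""
    return exposure * rate

# Hypothetical: 28 unsupported answers found in the 8,000-answer QA sample.
low, high = wilson_interval(defects=28, sample_size=8_000)
print(f"defect rate interval: {low:.2%} to {high:.2%}")

# Event counts implied by the template's gross range and stress frequency.
for label, rate in [("low", 0.0025), ("high", 0.0060), ("stress", 0.012)]:
    print(label, round(expected_events(420_000, rate)))
```

An interval this wide from a single sample is one reason the example records confidence as medium rather than high.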

Severity Estimate

| Component | Unit | Low | Base | Stress | Evidence source |
| --- | --- | --- | --- | --- | --- |
| Direct financial correction | per confirmed event | 25 | 75 | 250 | fee correction history |
| Recontact / service cost | per event | 8 | 18 | 35 | contact center cost model |
| QA / remediation review | per event | 30 | 60 | 140 | QA staffing model |
| Goodwill / relationship cost | per event | 0 | 50 | 300 | historical complaint settlement |
| Management / audit support | batch | 5,000 | 15,000 | 60,000 | prior remediation project |

Gross Loss Calculation

| Scenario | Exposure | Frequency | Avg severity | Expected monthly loss | P95 monthly loss |
| --- | --- | --- | --- | --- | --- |
| Contact center misinformation | 420,000 | 0.35% | 110 | 161,700 | 420,000-680,000 |
| KYC wrongful rejection | 80,000 | 0.08% | 950 | 60,800 | 180,000-320,000 |
| Fraud false negative | 2,400,000 | 0.012% | 1,400 | 403,200 | 1.1m-2.4m |
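
The expected monthly loss column is exposure x frequency x average severity. A minimal Python sketch that reproduces it is shown here; the function and variable names are illustrative, and the P95 column is left out because its derivation is not specified in this playbook.

```python
def expected_monthly_loss(exposure, frequency, avg_severity):
    """Gross expected loss = exposed events x defect rate x average cost per event."""
    return exposure * frequency * avg_severity

# (scenario, monthly exposure, frequency as a fraction, average severity)
scenarios = [
    ("Contact center misinformation", 420_000, 0.0035, 110),
    ("KYC wrongful rejection", 80_000, 0.0008, 950),
    ("Fraud false negative", 2_400_000, 0.00012, 1_400),
]

gross = {
    name: round(expected_monthly_loss(exposure, freq, severity))
    for name, exposure, freq, severity in scenarios
}

for name, loss in gross.items():
    print(f"{name}: {loss:,}")
```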

4.3 Control Effectiveness Estimate Template

| Control ID | Control | Target mechanism | Gross assumption | Residual assumption | Evidence | Confidence | Friction |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTRL-RAG-001 | citation-required answer mode | frequency reduction | unsupported rate 0.35% | 0.11%-0.18% | QA before/after | medium | +9 sec AHT |
| CTRL-AGT-002 | human approval token for writes | severity reduction | write error reaches system of record | error remains draft until approval | workflow test | high | review queue load |
| CTRL-EVL-003 | typology regression eval | frequency reduction + detection | AML miss detected after QA cycle | miss detected at release gate | eval result | medium | release delay |
| CTRL-VND-004 | model fallback route | outage severity reduction | 80% workflow disruption | 25%-40% disruption | resilience drill | low-medium | extra platform cost |

Control effectiveness narrative:

The control reduces risk by changing the loss mechanism, not by improving the model in general.
Citation-required answer mode blocks or escalates unsupported regulated claims, so frequency falls.
Human approval token changes severity because draft errors do not directly update systems of record.
Vendor fallback reduces outage severity and duration but does not reduce hallucination frequency.
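
The narrative above can be sketched by letting each control act only on the loss mechanism it targets. In this illustration the class `ControlEffect` and its fields are hypothetical; the exposure, frequency and severity figures are the contact center values from section 4.2, and the residual frequencies are the CTRL-RAG-001 range from the table above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ControlEffect:
    """How one control changes the loss mechanism (illustrative structure)."""
    control_id: str
    residual_frequency: Optional[float] = None  # replaces gross frequency when set
    severity_multiplier: float = 1.0            # scales average severity

def residual_loss(exposure, gross_frequency, avg_severity, effects):
    """Apply each control to frequency or severity, then recompute expected loss."""
    freq, severity = gross_frequency, avg_severity
    for effect in effects:
        if effect.residual_frequency is not None:
            freq = min(freq, effect.residual_frequency)
        severity *= effect.severity_multiplier
    return exposure * freq * severity

citation_low = ControlEffect("CTRL-RAG-001", residual_frequency=0.0011)
citation_high = ControlEffect("CTRL-RAG-001", residual_frequency=0.0018)
# Vendor fallback targets outage severity, so it leaves this scenario untouched.
vendor_fallback = ControlEffect("CTRL-VND-004")

best = round(residual_loss(420_000, 0.0035, 110, [citation_low]))
worst = round(residual_loss(420_000, 0.0035, 110, [citation_high]))
unchanged = round(residual_loss(420_000, 0.0035, 110, [vendor_fallback]))
print(best, worst, unchanged)
```

Under these inputs the citation gate alone brings monthly expected loss to roughly 51k-83k, while the vendor fallback leaves the misinformation scenario at its gross level.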

4.4 Residual Risk Template

| Field | Guidance | Example |
| --- | --- | --- |
| Scenario ID | Link to scenario register | AI-RQ-CS-001 |
| Gross expected loss | Pre-control range | 160k-290k / month |
| Gross P95 loss | Tail estimate | 420k-680k / month |
| Selected controls | Control ids | CTRL-RAG-001, CTRL-CS-005, CTRL-EVL-002 |
| Residual expected loss | Post-control range | 45k-92k / month |
| Residual P95 loss | Post-control tail | 130k-240k / month |
| Residual threshold status | green / amber / red | amber |
| Residual risk owner | Named owner | Contact Center Risk Owner |
| Acceptance condition | Conditions and expiry | accept for 90-day pilot, no scale to disputes until P95 below 180k |
| Management action | reduce / invest / accept / avoid / transfer | invest in regulated answer templates and weekly eval refresh |
| Review trigger | Event | policy corpus update, complaint rate increase, unsupported claim rate > 0.2% |

Residual risk language:

Residual risk is not "low".
Residual expected loss is estimated at 45k-92k per month, with P95 at 130k-240k.
This is inside pilot threshold but above scale threshold.
Management action is controlled pilot plus targeted control investment before expansion.
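
The threshold status in this statement could be derived mechanically. The sketch below is one possible rule, not a rule defined by the playbook: it compares the upper end of the residual P95 range with two thresholds. The 180k scale threshold comes from the acceptance condition in the template above; the 250k pilot threshold is a hypothetical value, since the source does not give one.

```python
def threshold_status(p95_upper, pilot_threshold, scale_threshold):
    """Classify residual tail risk against scale and pilot thresholds.

    green: inside the scale threshold
    amber: above scale threshold but inside pilot threshold
    red:   above the pilot threshold
    """
    if p95_upper <= scale_threshold:
        return "green"
    if p95_upper <= pilot_threshold:
        return "amber"
    return "red"

status = threshold_status(
    p95_upper=240_000,        # upper end of residual P95 (130k-240k)
    pilot_threshold=250_000,  # hypothetical pilot threshold
    scale_threshold=180_000,  # from the acceptance condition
)
print(status)  # amber
```

Using the upper end of the range is the conservative choice; a forum could equally decide to test the midpoint, as long as the rule is written down.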

4.5 Control ROI Template

| Field | Formula / guidance |
| --- | --- |
| Gross expected annual loss | monthly gross expected loss x 12 |
| Residual expected annual loss | monthly residual expected loss x 12 |
| Expected loss reduction | gross expected annual loss - residual expected annual loss |
| Annual control cost | engineering amortization + runtime + operations + license |
| Annual friction cost | review time + latency / abandonment + false positive cost |
| Net risk reduction | expected loss reduction - annual friction cost |
| Control ROI | (net risk reduction - annual control cost) / annual control cost |
| Tail impact | P95 / P99 loss compression narrative |
| Strategic note | reuse across use cases, resilience benefit, regulatory confidence, architecture runway |

Example:

| Control | Gross annual expected loss | Residual annual expected loss | Loss reduction | Control cost | Friction cost | ROI | Tail impact |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Contact center citation gate | 2.4m | 780k | 1.62m | 260k | 180k | 4.5x | P95 monthly loss from 680k to 240k |
| Agent write gateway | 1.8m | 420k | 1.38m | 420k | 220k | 1.8x | converts system-of-record errors to draft errors |
| Vendor dual-run fallback | 900k | 360k | 540k | 520k | 60k | -0.08x | weak expected ROI, strong critical-service tail reduction |

Interpretation:

  • Positive ROI is not automatically enough; control may still create unacceptable customer friction.
  • Negative expected ROI can still be justified for critical resilience if P95 / P99 loss is intolerable.
  • Reusable platform controls should allocate cost across use cases rather than penalizing the first adopter.
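
The ROI column in the example follows directly from the template formulas. A short sketch that recomputes it from the table inputs is given here; the function and dictionary names are illustrative.

```python
def control_roi(gross_annual, residual_annual, control_cost, friction_cost):
    """Control ROI = (net risk reduction - control cost) / control cost."""
    loss_reduction = gross_annual - residual_annual
    net_risk_reduction = loss_reduction - friction_cost
    return (net_risk_reduction - control_cost) / control_cost

# (gross annual loss, residual annual loss, control cost, friction cost)
controls = {
    "Contact center citation gate": (2_400_000, 780_000, 260_000, 180_000),
    "Agent write gateway": (1_800_000, 420_000, 420_000, 220_000),
    "Vendor dual-run fallback": (900_000, 360_000, 520_000, 60_000),
}

roi = {name: control_roi(*inputs) for name, inputs in controls.items()}

for name, value in roi.items():
    print(f"{name}: {value:.2f}x")
```

The vendor fallback comes out slightly negative on expected loss alone, which is why its case rests on tail compression rather than on this number.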

4.6 Management Action Pack Template

| Section | Content |
| --- | --- |
| Decision requested | approve pilot / scale / hold / reduce scope / fund control / accept residual risk |
| Use case boundary | workflow, users, channels, automation level, excluded scope |
| Top scenarios | 3-6 material scenarios with gross and residual loss ranges |
| Control portfolio | selected controls, rejected controls, rationale |
| Risk-adjusted value | business benefit minus residual loss and control / friction cost |
| Threshold status | which scenarios are green / amber / red and why |
| Management action | invest, reduce, accept with expiry, avoid, transfer, monitor |
| Conditions | volume cap, channel cap, excluded intents, manual review, review date |
| Evidence packet | links or references to assumptions, eval, QA, architecture and finance evidence |
| ADR summary | architecture decision and tradeoffs |

One-page format:

| Decision | Recommendation |
| --- | --- |
| Release ask | Controlled pilot for contact center RAG on servicing intents, excluding disputes and account closure |
| Value | 8%-12% AHT reduction; 1.1m annual productivity value after adoption adjustment |
| Material risk | misinformation on regulated policy topics |
| Gross loss | 2.0m-3.5m annual expected loss before controls |
| Controls | approved corpus, citation gate, regulated templates, escalation intent classifier |
| Residual loss | 540k-1.1m annual expected loss; P95 monthly 130k-240k |
| ROI | citation gate + templates estimated 3.2x-4.8x after friction |
| Action | approve pilot; do not scale to dispute / account closure until residual P95 below threshold |
| Evidence | QA sample, red-team, architecture diagram, trace coverage, finance unit-cost model |

4.7 Evidence Packet Template

| Evidence artifact | Required content | Owner |
| --- | --- | --- |
| Use case boundary | workflow, channel, user, customer segment, automation level, exclusions | Product owner |
| Architecture diagram | RAG / model / gateway / tool / workflow / evidence boundary | Architect |
| Scenario register | material scenarios, loss events, exposure, owners | BA + Risk |
| Frequency estimate | sample, period, defect count, confidence, segment adjustment | EvalOps |
| Severity estimate | unit costs, loss components, finance sign-off | Finance |
| Control design | control objective, mechanism, event fields, owner | Architect + Control owner |
| Effectiveness evidence | before / after, A/B, shadow, QA or expert basis | EvalOps + Risk |
| Residual risk memo | residual ranges, threshold status, action, expiry | Risk owner |
| ROI model | gross loss, residual loss, cost, friction, sensitivity | Finance + Product |
| ADR | decision, alternatives, consequences, review trigger | Architect |

5. PM / BA / Architecture Questions

5.1 PM Questions

| Question | Why it matters |
| --- | --- |
| What business decision does this AI influence? | Loss must attach to a decision or workflow, not a model in isolation. |
| What is the exposed population per month? | Frequency without denominator is unusable. |
| Which scenarios change the release decision? | Focus on material scenarios, not every theoretical failure. |
| Which control creates unacceptable user friction? | High control effectiveness can still destroy product value. |
| What scope reduction would avoid the tail scenario? | Sometimes product boundary beats technical control. |
| What value remains after residual loss and control cost? | AI ROI must be risk-adjusted. |
| Which scenario needs explicit management acceptance? | PM should not silently carry residual risk. |

5.2 BA / CBAP Questions

| Question | Why it matters |
| --- | --- |
| Where exactly does AI enter the process? | Determines exposure and control point. |
| Is AI searching, drafting, recommending, deciding or executing? | Automation level changes severity. |
| What is the exception path when AI confidence is low? | Low-confidence handling often determines residual risk. |
| Which data fields prove a control operated? | Evidence must be designed into workflow events. |
| Which manual review outcomes can calibrate frequency assumptions? | Human workflow is a source of quant evidence. |
| What segments need separate estimates? | Average risk can hide KYC, fraud or AML concentration. |
| Which assumptions will be challenged by risk or audit? | Explicit assumptions are easier to defend and update. |

5.3 Architecture Questions

| Question | Why it matters |
| --- | --- |
| Which controls are deterministic vs model-dependent? | Deterministic controls often compress tail risk better. |
| Does the control reduce frequency, severity, detection delay or recovery cost? | Controls should map to loss mechanisms. |
| What telemetry is required to calculate exposure and coverage? | Quantification fails without event contracts. |
| Can the system degrade safely when model / vector DB / vendor is down? | Vendor outage is a resilience scenario, not only an IT incident. |
| Are controls reusable across use cases? | Platform ROI may be high even when single-use ROI is weak. |
| Does the architecture support per-scenario thresholds? | Different intents and tools need different risk treatment. |
| What common-mode failure could defeat multiple controls? | Shared dependency can make residual risk understated. |
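
The last question can be made concrete with a small calculation. The sketch below is a simplified model assumed for illustration, not one specified by this playbook: controls are treated as independent except for a shared dependency that, with some probability, defeats all of them at once. The function name and all numbers are hypothetical.

```python
def combined_residual_fraction(effectiveness, common_mode_prob=0.0):
    """Fraction of gross loss that remains after stacking controls.

    effectiveness:    per-control loss reduction, each between 0 and 1
    common_mode_prob: chance that a shared dependency disables every control
    """
    independent = 1.0
    for e in effectiveness:
        independent *= (1.0 - e)
    # When the common-mode failure occurs, the full gross loss passes through.
    return common_mode_prob * 1.0 + (1.0 - common_mode_prob) * independent

# Hypothetical: two controls that reduce loss by 60% and 50% respectively.
naive = combined_residual_fraction([0.6, 0.5])
shared = combined_residual_fraction([0.6, 0.5], common_mode_prob=0.1)
print(f"independent: {naive:.2f}, with shared dependency: {shared:.2f}")
```

In this example a 10% chance of a shared failure lifts the residual fraction from 0.20 to 0.28, which is the kind of understatement the question is probing for.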

6. Release Checklist

6.1 Minimum Before Pilot

| Check | Pass condition |
| --- | --- |
| Use case boundary | Documented scope, exclusions, automation level and owner |
| Scenario register | At least top 5 material scenarios with owners |
| Exposure denominator | Monthly volume estimate by workflow / channel / segment |
| Frequency estimate | Range and evidence maturity for top scenarios |
| Severity estimate | Loss components and finance / ops basis |
| Control mapping | Candidate controls mapped to loss mechanisms |
| Residual risk | Expected and tail residual range for top scenarios |
| Management action | Pilot decision, conditions, threshold and review trigger |
| Evidence packet | Traceable artifacts stored with owner and date |

6.2 Minimum Before Scale

| Check | Pass condition |
| --- | --- |
| Production evidence | Pilot telemetry validates or updates assumptions |
| Control coverage | Controls operate on required share of exposed events |
| Effectiveness evidence | Before / after or shadow evidence supports reduction estimate |
| Friction cost | AHT, review load, false positive and abandonment effects included |
| Residual threshold | Residual expected loss and P95 tail within scale threshold or explicitly accepted |
| Architecture ADR | Selected controls and rejected alternatives documented |
| Scenario sensitivity | Stress case run for volume, defect rate, severity and outage |
| Management sign-off | Named owner accepts residual risk with expiry and review conditions |
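
The scenario sensitivity check can be run as a one-at-a-time stress sweep. In the sketch below the base case and the 1.2% stress frequency come from the contact center example in section 4.2; the stressed volume (+25%) and stressed severity (+50%) are hypothetical values, and the outage dimension is omitted for brevity.

```python
def monthly_loss(case):
    """Expected monthly loss for one set of assumptions."""
    return case["exposure"] * case["frequency"] * case["severity"]

base = {"exposure": 420_000, "frequency": 0.0035, "severity": 110}
stress = {
    "exposure": 525_000,   # hypothetical: +25% volume
    "frequency": 0.012,    # 1.2% during a policy update month
    "severity": 165,       # hypothetical: +50% average severity
}

def one_at_a_time(base, stress):
    """Stress each driver alone, then all drivers together."""
    results = {"base": round(monthly_loss(base))}
    for key in base:
        case = dict(base, **{key: stress[key]})
        results[f"stress_{key}"] = round(monthly_loss(case))
    results["stress_all"] = round(monthly_loss(stress))
    return results

sensitivity = one_at_a_time(base, stress)
for name, loss in sensitivity.items():
    print(f"{name}: {loss:,}")
```

Under these assumptions the defect rate is the dominant driver, which suggests where additional evidence would be most valuable before scale.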

6.3 Stop / Hold Triggers

| Trigger | Default action |
| --- | --- |
| Frequency estimate confidence remains low for material scenario | hold scale; extend shadow mode or sampling |
| Residual P95 exceeds threshold | reduce scope or add control before release |
| Control friction eliminates business value | redesign control or change product boundary |
| Control effectiveness is unproven | treat as assumption, not as residual risk reduction |
| Vendor dependency lacks graceful degradation for critical service | block scale for critical workflow |
| Common-mode failure defeats multiple controls | redesign architecture; do not count controls independently |
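
These triggers lend themselves to a simple rule check in a release gate. The sketch below is an illustrative encoding only: the `ReleaseState` class, its field names and the way each trigger is tested are assumptions, while the action strings are the default actions from the table above.

```python
from dataclasses import dataclass

@dataclass
class ReleaseState:
    """Inputs to the stop / hold check (field names are hypothetical)."""
    frequency_confidence: str     # "low", "medium" or "high"
    residual_p95: float
    p95_threshold: float
    value_after_friction: float   # business value net of control friction
    effectiveness_proven: bool
    critical_service: bool
    graceful_degradation: bool
    common_mode_failure: bool

def default_actions(state):
    """Return the default action for every trigger that fires."""
    actions = []
    if state.frequency_confidence == "low":
        actions.append("hold scale; extend shadow mode or sampling")
    if state.residual_p95 > state.p95_threshold:
        actions.append("reduce scope or add control before release")
    if state.value_after_friction <= 0:
        actions.append("redesign control or change product boundary")
    if not state.effectiveness_proven:
        actions.append("treat as assumption, not as residual risk reduction")
    if state.critical_service and not state.graceful_degradation:
        actions.append("block scale for critical workflow")
    if state.common_mode_failure:
        actions.append("redesign architecture; do not count controls independently")
    return actions

# Example: only the P95 trigger fires (240k residual vs 180k threshold).
state = ReleaseState("medium", 240_000, 180_000, 500_000, True, True, True, False)
print(default_actions(state))
```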

7. Executive Narrative

7.1 Standard Narrative

We quantified the material AI risk scenarios for this release rather than relying on qualitative risk ratings.
The highest-risk scenarios are [A], [B] and [C].
For each scenario, we estimated exposure, frequency and severity using [shadow mode / QA / eval / production / expert] evidence.

Before controls, expected annual loss is estimated at [range], with P95 tail at [range].
The recommended controls reduce risk by [mechanism]: [frequency / severity / detection / recovery].
After controls, residual expected loss is [range], with P95 at [range].

The control portfolio costs [amount] annually and adds [friction].
Net risk reduction is [amount], producing [ROI] and compressing the main tail scenario from [before] to [after].

Recommendation: [approve pilot / scale / hold / reduce scope / fund control].
Conditions: [volume cap, excluded scope, threshold, review date, owner].

7.2 CFO Version

The business case is risk-adjusted.
We are not presenting productivity value alone.
We subtract residual expected loss, control operating cost and workflow friction.
The highest ROI controls are [X] and [Y].
The vendor redundancy control has weak expected ROI; it is justified only for critical workflows, where it compresses P95 outage loss.

7.3 CRO Version

The residual risk statement is scenario-specific.
We are not asking for blanket acceptance of AI risk.
For each material scenario, the pack shows gross loss, selected controls, residual loss, threshold status, owner, expiry and review trigger.
Scenarios above threshold are not approved for scale unless scope is reduced or controls are funded.

7.4 CTO Version

The architecture investment is prioritized by loss mechanism.
Prompt-only control has low tail impact.
Deterministic controls such as source gating, tool gateway, scoped authorization, trace evidence and graceful degradation compress the loss distribution more reliably.
Reusable platform controls should be funded as architecture runway, not charged only to one product.

8. Interview Drills

Drill 1: 30-Second Answer

Question:

How do you quantify AI risk for a financial retail AI product?

Answer:

I start with business loss scenarios, not model metrics. For each scenario I define exposure, frequency and severity, then estimate gross expected loss and a P95 tail range. I map controls to the loss mechanism they reduce: frequency, severity, detection delay or recovery cost. After estimating control cost and friction, I calculate residual risk and control ROI. The output is a management action: approve, scale, reduce scope, invest in controls, accept residual risk with an expiry date, or stop.

Drill 2: Regulated Advice Hallucination

Question:

A wealth copilot sometimes produces unsupported product-specific advice. What do you do?

Strong answer:

I would not start with a generic hallucination metric. I would quantify the scenario where unsupported product-specific advice is reused in a customer conversation. The denominator is regulated-topic advisor responses. Frequency comes from red-team, shadow mode and QA. Severity includes review, correction, complaint handling and potential relationship impact. Controls include approved-source RAG, citation enforcement, recommendation phrase blocking and licensed-advisor handoff. The release decision depends on residual P95 and adoption friction: education content may proceed, product-specific advice may stay out of scope until residual risk is inside threshold.

Drill 3: AML Missed Escalation

Question:

How would you compare productivity gain from an AML copilot with missed escalation risk?

Strong answer:

I separate productivity value from tail risk. Productivity may reduce investigation time, but missed escalation has asymmetric downside. I would estimate typology-specific false negative or omission frequency using shadow comparison and QA, then calculate severity as reopened investigations, escalation delay, backlog and exam support. Controls should preserve recall for high-risk typologies: mandatory red-flag checklist, no auto-close, typology eval floor and analyst override evidence. If tail risk remains above threshold, the copilot can summarize evidence but should not prioritize or de-prioritize alerts.

Drill 4: Fraud False Negatives vs False Positives

Question:

Fraud wants a lower threshold, product worries about false positives. How do you decide?

Strong answer:

I would use net risk reduction. Lowering the threshold reduces fraud false-negative loss, but increases review cost, customer friction and possible abandonment. The model should estimate avoided fraud loss minus incremental false-positive cost and service friction. I would segment by transaction risk, recovery rate and customer value rather than applying one threshold globally. Controls with strong ROI may include risk-tiered queues, velocity rules fallback and post-event learning, while low-value segments may keep higher automation.
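
The segment-level comparison in this answer can be sketched in a few lines. Every figure below is hypothetical and exists only to show the shape of the calculation; the segment names and the single cost-per-false-positive input (standing in for review cost plus customer friction) are assumptions.

```python
def net_risk_reduction(avoided_fraud_loss, extra_false_positives,
                       cost_per_false_positive):
    """Avoided fraud loss minus the incremental cost of extra false positives."""
    return avoided_fraud_loss - extra_false_positives * cost_per_false_positive

# Hypothetical monthly effect of lowering the threshold, per segment:
# (avoided fraud loss, extra false positives, cost per false positive)
segments = {
    "high_value_transfers": (300_000, 4_000, 25),
    "low_value_card": (40_000, 6_000, 12),
}

decisions = {}
for name, (avoided, extra_fp, fp_cost) in segments.items():
    net = net_risk_reduction(avoided, extra_fp, fp_cost)
    decisions[name] = ("lower threshold" if net > 0 else "keep threshold", net)

for name, (decision, net) in decisions.items():
    print(f"{name}: {decision} (net {net:,})")
```

With these made-up inputs the threshold change pays off for high-value transfers but not for low-value card transactions, which is the argument for segmenting rather than applying one global threshold.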

Drill 5: Vendor Outage

Question:

Should every AI system have a second model vendor?

Strong answer:

No. I would quantify outage scenario by business criticality. For low-risk internal drafting, manual fallback may be cheaper than multi-vendor complexity. For critical AML, fraud or contact-center peak workflows, vendor outage severity can justify graceful degradation, cached policy answers or fallback routing. The ROI may be weak on expected loss but justified on P95 tail compression. The architecture decision should be risk-tiered, not blanket multi-vendor.

Drill 6: Control ROI Challenge

Question:

How do you prove an AI control is worth the cost?

Strong answer:

I prove mechanism and economics. Mechanism means the control clearly reduces frequency, severity, detection delay or recovery cost. Economics means gross loss minus residual loss exceeds control and friction cost, or the control compresses a tail scenario management will not accept. I would show assumptions, evidence maturity, sensitivity and residual threshold status. If control ROI is low but the control is required for critical tail risk or policy boundary, I would say that explicitly rather than forcing a fake ROI.

9. Portfolio Artifact Checklist

Use this checklist before publishing the artifact as a portfolio note:

| Check | Standard |
| --- | --- |
| Advanced audience | Assumes PM / BA / architect experience; avoids basic risk definitions |
| Scenario examples | Includes regulated advice, AML, fraud, KYC, contact center and vendor outage |
| Quant method | Has exposure, frequency, severity, gross loss, residual loss and tail loss |
| Control ROI | Includes cost, friction, effectiveness and ROI formula |
| Architecture relevance | Maps to RAG, agent, copilot, eval, gateway, telemetry and resilience |
| Decision output | Produces management action, ADR and release conditions |
| Evidence | Shows what evidence supports each assumption |
| No false precision | Uses ranges, confidence and sensitivity rather than fake certainty |