
AI Exception / Risk Acceptance / Waiver Playbook

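The AI Exception / Risk Acceptance / Waiver Architecture is a governance mechanism that turns "we temporarily cannot meet a standard AI control" into a decision with a business rationale, a risk owner, compensating controls, an expiry date, evidence, an escalation path and hard stop conditions.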


AI Exception / Risk Acceptance / Waiver Architecture Playbook

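Audience: senior AI Product Manager / AI BA / Product Architect / Enterprise Architect / Risk Partner / Model Risk Lead / Operational Risk Lead / Compliance / Privacy / Security / Third-Party Risk / Internal Audit / Board reporting owner.

Purpose: train retail financial services AI teams to manage policy exceptions, temporary waivers, residual risk acceptance, compensating controls, expiry and renewal, evidence retention, escalation paths and hard stop conditions. The focus is not basic BA work. It is turning AI governance into an exception control system that can be operated, monitored, audited and explained to executives and the audit committee.

Core idea: Risk appetite defines the organization's AI risk boundary. The exception / waiver architecture manages something narrower: once risk appetite and standard controls are defined, a use case's time-boxed, scope-bound, evidence-backed deviation from a standard control. An exception must never become permanent shadow policy.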

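Important note: this document is learning, portfolio and governance design material. It is not legal, audit, regulatory, model validation or compliance advice. A real project must be confirmed by the business owner, legal, compliance, model risk, operational risk, security, privacy, third-party risk, technology, internal audit and management, in light of the institution, jurisdiction, regulatory relationships and internal policy.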

1. Executive Framing

1.1 One-sentence positioning

An AI waiver is not permission to ignore controls.
It is a time-boxed, scope-bound, evidence-backed acceptance of residual risk.

1.2 Distinction from risk appetite

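Layer	Problem it solves	Output
Risk appetite	Which AI risks the organization will take on, which uses are prohibited, and which are conditionally allowed	risk appetite statement, risk tier, standard control baseline
Standard control catalog	Which controls each risk tier requires by default	eval, model validation, HITL, DLP, tool gateway, monitoring, evidence
Exception / waiver	What to do when a use case temporarily deviates from a standard control	exception memo, risk acceptance, compensating controls, expiry, hard stop
Issue / incident	What to do when a control has failed or harm has already occurred	incident response, RCA, remediation, customer correction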

Executive phrasing: Risk appetite is the baseline. Waiver management is the controlled deviation from that baseline. If the same waiver keeps renewing, the organization has unresolved control debt or an outdated policy baseline.

1.3 Why AI exception handling is special

AI behavior is determined jointly by the model route, prompt, RAG corpus, source registry, eval rubric, tool permissions, agent autonomy, human review capacity, privacy logging, security gateway, vendor terms, customer disclosure and remediation process.

An exception for GenAI / agentic AI cannot be filed under model risk alone. It has to combine model risk, operational risk, consumer compliance, privacy, security, third-party, technology resilience, customer harm and audit evidence.

1.4 Executive question set

Executive question	Good answer requires
Which AI controls are being waived?	control id, policy baseline, use case, risk tier
Why is the waiver necessary?	business value, timing pressure, control debt, alternatives considered
Who accepts the residual risk?	named role, delegated authority, approval record
How narrow is the exception?	user, channel, geography, data, tool, model, traffic, duration
What compensating controls operate now?	tested workflow controls, monitoring, review, fallback
What would force an immediate stop?	hard stop triggers, kill switch, owner, runbook
When does it expire?	fixed expiry date, renewal criteria, exit path
Is this becoming shadow policy?	aging, renewals, repeat reason, remediation backlog
Can audit reconstruct the decision?	evidence binder with versions, signoffs, logs and KRI history

2. Source Anchors

The sources below anchor the governance language and evidence structure. This playbook translates them into AI product, architecture, BA and retail financial services governance practice. Access date recorded as 2026-06-30.

Anchor	Official link	How this playbook uses it
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize an AI exception's context, risk measurement, compensating controls, continuous monitoring, governance escalation and evidence loop.
ISO/IEC 42001	https://www.iso.org/standard/42001	Uses the AI management system view to bring exceptions into scope, role, operation, performance evaluation, management review and continual improvement.
Federal Reserve SR 26-2	https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm	Serves as the new 2026 anchor for model risk management. SR 26-2 replaced SR 11-7 and SR 21-8 on 2026-04-17; it is a practical reference for intended use, risk tier, effective challenge, monitoring and governance of AI / ML / GenAI / agentic AI.
FFIEC IT Examination Handbook Management booklet	https://ithandbook.ffiec.gov/it-booklets/management.aspx	Uses the board oversight, IT governance, risk management, third-party, change management, audit and management reporting perspectives to organize exception management.

2.1 Current nuance

SR 26-2 replaces SR 11-7 and SR 21-8, so model risk discussions from 2026 onward cannot rest on the older letters alone.
SR 26-2 pushes model risk management toward more explicit risk-based tailoring, intended use, model inventory, monitoring, validation, effective challenge and governance.
For GenAI / agentic AI, exception handling must address operational risk, consumer compliance, privacy, security, third-party, tool autonomy and evidence gaps together.
Any exception that keeps renewing over the long term must be escalated to a policy review, a control investment or a formal residual risk decision.

3. Exception Taxonomy

3.1 Baseline structure

Risk appetite
-> risk tier
-> standard control baseline
-> control gap
-> exception request

Vague phrasing: We need an exception for the AI assistant.

Acceptable phrasing:

We request a 45-day exception from CTRL-EVAL-HIGH-003,
which requires full regression coverage for complaint and fee-dispute slices,
for a 5% employee-only pilot of the retail service copilot.
The exception excludes customer auto-send and all write-enabled tools.

3.2 Taxonomy by control domain

Exception type	Control gap	Example	Typical compensating control
Eval coverage	eval suite incomplete or too few samples for a new scenario	complaint slice has too few samples	smaller pilot, daily QA, failed-case capture
Model validation	independent validation / challenge not complete	validation report due in 30 days	shadow mode, traffic cap, validation milestone
Model inventory	model / prompt / RAG / tool not fully registered	prompt variants not in registry	freeze variants, manual register, block expansion
Data boundary	insufficient evidence for data classification, redaction or retention	prompt log redaction proof missing	no sensitive data, DLP monitoring
Privacy	consent, purpose, retention or vendor terms not fully evidenced	transcript retention policy pending approval	restricted dataset, no export
Security	control gap in gateway, RBAC, SIEM, prompt injection or tool permission	missing deny reason in tool log	read-only mode, extra SIEM alert
Third-party	gap in vendor evidence, SLA, DPA, exit path or region routing	vendor SOC report renewal in progress	fallback route, lower traffic
Consumer compliance	gap related to disclosure, adverse action, UDAAP, complaint or recourse	customer disclosure copy not fully approved	employee-only release
Operational readiness	human review queue, training, SOP or capacity below standard	review backlog risk	lower volume, queue KRI
Evidence	evidence binder automation or trace coverage incomplete	trace tags cover 92%, standard is 99%	manual evidence pack
Change governance	gap in artifact versioning, rollback or release approval	RAG rollback drill not completed	no auto-refresh, manual snapshot
Board visibility	material exception not included in the MI pack	high-tier exception approved locally	immediate escalation

3.3 Taxonomy by AI autonomy

Autonomy level	Exception sensitivity	Typical rule
Retrieve / summarize	Lower but depends on data and citation	Short waivers possible if employee-facing and low impact
Recommend	Medium to high	Waiver must preserve human decision and explainability
Draft customer communication	High	Disclosure, approved language and human send controls are difficult to waive
Decide	High to critical	Exceptions rare; prohibited uses remain no-go
Execute tool action	High to critical	Write-enabled waiver requires tool gateway, approval token and rollback
Multi-agent orchestration	Critical when autonomous	Waiver must cover delegation, identity, tool chain, monitoring and emergency stop

3.4 Non-waivable conditions

Do not approve when the request involves prohibited use, no accountable residual risk owner, no enforceable scope, no stop capability, no evidence path, known active harm without containment, unapproved sensitive-data exposure, or hidden final-decision automation.

4. Waiver Lifecycle

4.1 Lifecycle map

Exception trigger
-> intake
-> control gap classification
-> residual risk assessment
-> compensating control design
-> approval routing
-> restricted operation
-> KRI monitoring
-> evidence capture
-> expiry review
-> close / renew / remediate / policy update / stop

4.2 Trigger examples

Trigger	Example
Product release pressure	Pilot date arrives before full eval coverage
Control maturity gap	policy engine supports deny but not reason logging
Vendor dependency	model provider updates data-processing evidence late
New use case	policy does not yet classify agentic workflow pattern
Operational constraint	human review capacity below high-tier target
Incident remediation	system can restart only with temporary traffic cap
Regulatory or audit finding	control gap must be tracked until remediation completes

4.3 Intake fields

Field	Strong content
Exception ID	stable id used in release, telemetry, evidence and dashboard
Use case	product, business domain, channel, customer/employee impact
Risk tier	low, medium, high, critical with rationale
Standard control	control id, policy name, required baseline
Requested deviation	exact control gap, not vague description
Business reason	why the deviation is needed now
Alternatives considered	wait, reduce scope, manual process, different vendor, redesign
Scope	users, channels, geography, data, model, tools, traffic, time
Residual risk	what risk remains after compensating controls
Compensating controls	preventive, detective, corrective controls
Hard stop	measurable conditions that end the waiver immediately
Evidence plan	what proves controls operate
Expiry	specific date and review forum
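
The intake fields above can be captured as a structured record so that thin requests are caught before routing. The following is a minimal sketch, not a prescribed schema: the class name `ExceptionIntake`, the function `intake_gaps` and the rule that every scope dimension must be filled in are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Scope dimensions taken from the "Scope" intake field above.
SCOPE_KEYS = ("users", "channels", "geography", "data", "model", "tools", "traffic", "time")
RISK_TIERS = ("low", "medium", "high", "critical")


@dataclass
class ExceptionIntake:
    exception_id: str
    use_case: str
    risk_tier: str
    standard_control: str          # control id, e.g. "CTRL-EVAL-HIGH-003"
    requested_deviation: str
    business_reason: str
    alternatives_considered: List[str]
    scope: Dict[str, str]
    residual_risk: str
    compensating_controls: List[str]
    hard_stops: List[str]
    evidence_plan: List[str]
    expiry: Optional[date]
    review_forum: str


def intake_gaps(req: ExceptionIntake) -> List[str]:
    """Return the intake fields that are missing or empty."""
    gaps = []
    if req.risk_tier not in RISK_TIERS:
        gaps.append("risk_tier")
    # Every text and list field must carry content.
    for name in ("exception_id", "use_case", "standard_control", "requested_deviation",
                 "business_reason", "alternatives_considered", "residual_risk",
                 "compensating_controls", "hard_stops", "evidence_plan", "review_forum"):
        if not getattr(req, name):
            gaps.append(name)
    # Assumption: each scope dimension must be addressed, even if only as "not applicable".
    gaps += [f"scope.{key}" for key in SCOPE_KEYS if not req.scope.get(key)]
    if req.expiry is None:
        gaps.append("expiry")
    return gaps
```

An empty result only means the form is complete. Whether the content is specific enough (an exact control gap rather than a vague description) still needs human review.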

4.4 Classification and approval

Dimension	Low signal	High signal
Customer impact	employee-only internal draft	customer-facing or customer-impacting output
Automation	read-only assistant	decision or write-enabled tool
Data sensitivity	public or approved internal knowledge	PII, account, credit, wealth, AML, complaint data
Regulatory relevance	internal productivity	credit, AML, privacy, complaint, advice, unfair practice
Reversibility	easy rollback, no external record	irreversible record, notice, funds, account status
Control gap severity	evidence formatting gap	missing validation, HITL or data approval

Risk tier	Approval pattern
Low	Product owner and control owner; evidence owner informed
Medium	Product, business owner, control owner, risk partner, security/privacy if affected
High	Business executive delegate, product, model risk, compliance, operational risk, security/privacy, architecture, operations
Critical	Executive risk owner, formal governance forum, legal/compliance/model risk, CISO/privacy/third-party as relevant, board/audit visibility if material

Authority principle: The person who wants the value should not be the only person accepting the residual risk.
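
The approval pattern and the authority principle can be checked mechanically before a waiver is marked approved. This is a sketch under assumptions: the role identifiers are hypothetical labels condensed from the table above, roles marked "if affected" or "as relevant" are passed in by the caller, and `approval_gaps` is an illustrative name.

```python
from typing import Dict, List, Set

# Baseline approver roles per tier, condensed from the approval pattern table.
# Conditional roles ("if affected", "as relevant") are supplied per request.
BASE_APPROVERS: Dict[str, Set[str]] = {
    "low": {"product_owner", "control_owner"},
    "medium": {"product", "business_owner", "control_owner", "risk_partner"},
    "high": {"business_executive_delegate", "product", "model_risk", "compliance",
             "operational_risk", "security_privacy", "architecture", "operations"},
    "critical": {"executive_risk_owner", "governance_forum", "legal",
                 "compliance", "model_risk"},
}


def approval_gaps(risk_tier: str, signoffs: Dict[str, str], requester: str,
                  conditional_roles: Set[str] = frozenset()) -> List[str]:
    """signoffs maps an approver role to the name of the person who signed."""
    required = BASE_APPROVERS[risk_tier] | set(conditional_roles)
    gaps = sorted(f"missing signoff: {role}" for role in required - set(signoffs))
    # Authority principle: the requester cannot be the only person accepting the risk.
    if signoffs and set(signoffs.values()) <= {requester}:
        gaps.append("requester is the only approver")
    return gaps
```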

4.5 Restricted operation

Restriction	Architecture implementation
Traffic cap	feature flag, release orchestrator, route control
User scope	IAM, group entitlement, role policy
Channel scope	API gateway, UI route, deployment config
Data scope	data gateway, retriever filter, DLP, purpose tag
Tool scope	tool gateway, allowlist, deny-by-default
Model scope	model route policy, region route, vendor allowlist
Time scope	scheduled expiry flag, release calendar lock
Evidence scope	trace attribute `exception_id`, evidence binder rule

4.6 Expiry outcomes

Outcome	Meaning
Close	standard control is satisfied; waiver ends
Remediate then close	control gap fixed before continued operation
Renew with evidence	short renewal approved with new evidence and stronger conditions
Convert to policy change	repeated waiver reveals baseline policy/control must be updated formally
Stop	value no longer justifies residual risk or control gap persists

No silent extension.

5. Risk Acceptance Memo

5.1 Memo purpose

The memo proves what control is missing or weakened, why the business still wants limited operation, what risk remains, who accepts it, what controls compensate, what evidence will be reviewed, when the acceptance ends and what triggers immediate stop.

5.2 Memo template

Section	Example content
Decision summary	AI-WVR-2026-0042; Retail Service Low-Risk FAQ Pilot; temporary exception from CTRL-CITATION-PDF-004; high risk; limited approval for 30 days; expiry 2026-07-30
Business rationale	pilot reduces call-center FAQ volume, tests self-service demand, stays limited to low-risk FAQ, excludes complaint / fee waiver / credit commitment / account closure
Alternatives	wait for full checker, extend employee-only pilot, use manual FAQ update
Control baseline and gap	customer-facing high-tier answers require automated citation support check; PDF table citation checker does not cover every fee table structure
Non-waivable boundaries	no write tools, no customer-specific advice, no complaint handling, no credit decision
Residual risk	customer may receive incomplete source support in low-risk FAQ; reduced by source allowlist, no-answer fallback and daily QA
Residual risk owner	Head of Retail Service, with Risk Partner concurrence
Compensating controls	5% traffic cap, intent exclusions, daily 100-case QA, trace tags, route-to-human, feature flag disable
Hard stops	wrong fee commitment > 0; complaint miss > 0; unsupported citation rate > 2%; trace completeness < 98%; sensitive data exposure
Evidence	approval record, source registry, daily QA, KRI dashboard, trace report, incident log, stop-rule test
Expiry and exit	review on 2026-07-25; close after checker regression passes; stop if remediation evidence is insufficient

5.3 Memo quality bar

Weak memo: We need a temporary exception because the business needs to launch quickly.

Strong memo: We request a 30-day exception from a specific citation control for a 5% low-risk FAQ pilot. The waiver excludes high-risk intents and write tools, uses daily sample review and hard stop triggers, and expires before the next risk committee review.

6. Compensating Controls

6.1 Control design principles

Compensating controls must be tied to the specific gap, stronger where uncertainty is higher, feasible for operations, instrumented with evidence, time-boxed, reviewed at expiry and mapped to an owner.

6.2 Control catalog

Gap	Preventive control	Detective control	Corrective control
Eval coverage incomplete	narrow scope, excluded intents, traffic cap	daily QA sample, failed-case review	expand eval set, rerun gate, pause ramp
Model validation pending	shadow mode, no customer impact	validation checkpoint, challenger review	hold expansion, rollback candidate
Citation checker incomplete	source allowlist, no-answer fallback	citation QA, unsupported claim metric	fix parser, remove source format
HITL capacity below standard	lower traffic, queue limit	queue aging KRI, reviewer load dashboard	route to manual backlog, stop release
Tool logging gap	read-only mode, deny write tools	tool deny audit, SIEM rule	disable tool, add log fields
Privacy evidence gap	data minimization, redaction	DLP sample, privacy review	purge logs, tighten route
Vendor evidence gap	fallback vendor, reduced data	SLA and incident monitoring	switch route, stop vendor use
Evidence automation gap	manual evidence pack owner	daily completeness check	block renewal until automated

6.3 Compensating controls for agentic AI

Agent risk	Compensating control
Tool chain expands beyond scope	tool gateway with exception-specific allowlist
Agent delegates to another agent	identity propagation, delegation policy, trace parent id
Multi-step plan hides risk	plan approval before action, step-level policy checks
Write action causes irreversible change	dry-run, human approval token, idempotency and reversal path
Prompt injection changes tool use	instruction hierarchy, content isolation, policy post-check
Agent loops or escalates cost	budget cap, step cap, timeout, emergency stop
Third-party tool changes behavior	contract pinning, vendor change notice, fallback

6.4 Control evidence

Control	Evidence
Traffic cap	release config, exposure report
Excluded intents	policy test, production deny logs
Human sample review	sample list, reviewer, outcome, defect reason
Source allowlist	registry snapshot, retrieval trace
No write tools	tool gateway config, deny logs
DLP route	redaction report, blocked request log
Hard stop test	kill switch drill or feature flag proof
Expiry enforcement	scheduled review, automated disable config
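
A control-to-evidence mapping like the one above lends itself to a simple completeness check on the evidence binder. The sketch below covers only four of the listed controls, and the artifact identifiers and the function name `evidence_completeness` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Control -> required evidence artifacts, a subset of the table above.
REQUIRED_EVIDENCE: Dict[str, Set[str]] = {
    "traffic_cap": {"release_config", "exposure_report"},
    "excluded_intents": {"policy_test", "production_deny_logs"},
    "source_allowlist": {"registry_snapshot", "retrieval_trace"},
    "no_write_tools": {"tool_gateway_config", "tool_deny_logs"},
}


def evidence_completeness(controls: List[str], binder: Set[str]) -> Tuple[float, List[str]]:
    """Return (share of required artifacts present, list of missing artifacts)."""
    required = [(c, item) for c in controls for item in sorted(REQUIRED_EVIDENCE[c])]
    missing = [f"{c}: {item}" for c, item in required if item not in binder]
    ratio = 1.0 if not required else (len(required) - len(missing)) / len(required)
    return ratio, missing
```

Presence of an artifact is not proof that the control operated. The check only shows which items a reviewer still has to ask for.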

7. Expiry and Renewal

7.1 Expiry rule

Every waiver needs an expiry date, review owner, forum, renewal criteria, closure criteria, stop criteria and evidence bundle. No expiry means no waiver approval.

7.2 Recommended maximums

Risk tier	Normal maximum duration	Renewal posture
Low	60-90 days	one renewal with owner approval
Medium	30-60 days	renewal requires evidence and control owner
High	14-45 days	renewal requires cross-functional approval
Critical	shortest practical window	renewal discouraged; executive review required
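
The expiry rule and the duration maximums can be enforced at approval time. A minimal sketch, assuming the upper end of each range is the hard limit and that the critical tier, which has no numeric bound in the table, is always escalated; `duration_finding` is an illustrative name.

```python
from datetime import date
from typing import Optional

# Upper bounds of the "normal maximum duration" ranges above, in days.
# Critical has no numeric bound in the table, so it is modelled as None
# and always escalated (an assumption of this sketch).
MAX_DAYS = {"low": 90, "medium": 60, "high": 45, "critical": None}


def duration_finding(risk_tier: str, approved: date, expiry: Optional[date]) -> str:
    if expiry is None:
        return "reject: no expiry means no waiver approval"
    if expiry <= approved:
        return "reject: expiry must fall after approval"
    limit = MAX_DAYS[risk_tier]
    if limit is None:
        return "escalate: critical tier needs executive review of the window"
    days = (expiry - approved).days
    if days > limit:
        return f"reject: {days} days exceeds the {limit}-day maximum for {risk_tier}"
    return f"ok: {days} days within the {limit}-day maximum for {risk_tier}"
```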

7.3 Renewal criteria

Renewal should require: business rationale still valid; no hard stop triggered; KRI stable; compensating controls operated as designed; remediation progress; scope not expanded; residual risk owner re-accepts risk; new expiry is shorter or tied to a concrete delivery date.

7.4 Repeat renewal escalation

Signal	Required action
second renewal	risk forum review and remediation funding decision
third renewal	executive review, policy/control baseline reassessment
over 90 days active for high-tier	board/audit committee visibility if material
same reason across multiple teams	platform control investment or policy update
expired but active	incident or management issue, not administrative delay
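
The escalation signals above can be turned into a rule that runs on every renewal request and on a daily registry sweep. The sketch below is one possible reading: "high-tier" is taken to mean high or critical, a third or later renewal supersedes the second-renewal action, and the parameter names are hypothetical.

```python
from typing import List


def renewal_escalations(renewal_count: int, days_active: int, risk_tier: str,
                        expired_but_active: bool, teams_with_same_reason: int,
                        material: bool) -> List[str]:
    """Map the repeat-renewal signals to the required actions."""
    actions = []
    if expired_but_active:
        # Treated as a governance breach, not an administrative delay.
        actions.append("raise incident or management issue")
    if renewal_count >= 3:
        actions.append("executive review and policy/control baseline reassessment")
    elif renewal_count == 2:
        actions.append("risk forum review and remediation funding decision")
    if risk_tier in ("high", "critical") and days_active > 90 and material:
        actions.append("board/audit committee visibility")
    if teams_with_same_reason > 1:
        actions.append("platform control investment or policy update")
    return actions
```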

7.5 Shadow policy test

Ask whether the same exception is active for more than one review cycle, teams design around the waiver as normal practice, the reason recurs across products, management stops discussing exit, controls remain unfunded or audit would see a gap between written policy and operations.

8. No-Go Criteria and Hard Stops

8.1 No-go criteria

No-go condition	Reason
prohibited use is implicated	exceptions cannot override prohibited uses
residual risk owner lacks authority	accountability mismatch
scope cannot be enforced	waiver may spread beyond approval
hard stop cannot be executed	risk cannot be contained
customer harm is ongoing	issue/incident path required
sensitive data route unapproved	privacy/security/third-party risk unacceptable
control gap hides final decision automation	automation boundary not transparent
evidence cannot be retained	audit and model risk cannot reconstruct decision
renewal history shows chronic non-remediation	waiver is becoming policy
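
The no-go table can serve as a screening gate that runs before any approval routing. A minimal sketch: the flag names are hypothetical, and treating an unassessed condition as a finding is a design assumption that keeps the gate from passing by omission.

```python
from typing import Dict, List

# Flag name -> reason, following the no-go table above.
NO_GO_REASONS: Dict[str, str] = {
    "prohibited_use": "exceptions cannot override prohibited uses",
    "owner_lacks_authority": "accountability mismatch",
    "scope_not_enforceable": "waiver may spread beyond approval",
    "hard_stop_not_executable": "risk cannot be contained",
    "ongoing_customer_harm": "issue/incident path required",
    "unapproved_sensitive_data_route": "privacy/security/third-party risk unacceptable",
    "hidden_decision_automation": "automation boundary not transparent",
    "evidence_not_retainable": "audit and model risk cannot reconstruct decision",
    "chronic_non_remediation": "waiver is becoming policy",
}


def no_go_findings(flags: Dict[str, bool]) -> List[str]:
    """Return one finding per no-go condition that is true or was never assessed."""
    findings = []
    for name, reason in NO_GO_REASONS.items():
        if name not in flags:
            findings.append(f"{name}: not assessed")
        elif flags[name]:
            findings.append(f"{name}: {reason}")
    return findings
```

Any non-empty result means the request does not proceed to approval in its current form.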

8.2 Hard stop examples

Use case	Hard stop
Customer service FAQ	confirmed wrong fee commitment > 0
Complaint triage	complaint escalation miss > 0
Credit memo copilot	adverse action reason mismatch in any customer-impacting path
Wealth assistant	personalized recommendation breach > 0
AML investigation copilot	AI-generated SAR / no-SAR conclusion observed
Fraud agent	tool action changes account state without approval token
RAG policy assistant	unsupported citation in high-risk slice > threshold
Agent workflow	tool loop, cost spike, or delegated action outside allowlist
Privacy-sensitive assistant	PII sent to unapproved model route
Third-party model route	vendor SLA or data-processing evidence becomes invalid
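
Hard stops are only useful if they are measurable, so they can be expressed as thresholds evaluated against daily metrics. The sketch below uses the thresholds from the memo example in 5.2 (AI-WVR-2026-0042). The metric names are hypothetical, rates are expressed as fractions, and treating a missing metric as a trigger is an assumption that makes the check fail closed.

```python
from typing import Callable, Dict, List

# Thresholds from the hard stops in the 5.2 memo example.
# Rates are fractions: 0.02 means 2%, 0.98 means 98%.
HARD_STOPS: Dict[str, Callable[[float], bool]] = {
    "wrong_fee_commitments": lambda v: v > 0,
    "complaint_misses": lambda v: v > 0,
    "unsupported_citation_rate": lambda v: v > 0.02,
    "trace_completeness": lambda v: v < 0.98,
    "sensitive_data_exposures": lambda v: v > 0,
}


def triggered_hard_stops(metrics: Dict[str, float]) -> List[str]:
    """Return triggered stops; a missing metric counts as a trigger (fail closed)."""
    triggered = []
    for name, breached in HARD_STOPS.items():
        if name not in metrics:
            triggered.append(f"{name}: metric missing")
        elif breached(metrics[name]):
            triggered.append(f"{name}: {metrics[name]}")
    return triggered
```

With the values from the example dashboard row in 9.4 (98.9% trace completeness, no defects), the function returns an empty list. A non-empty result would feed the stop rule runbook.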

8.3 Stop rule runbook

Trigger detected
-> classify severity
-> pause or cap feature by exception_id
-> disable affected tool/model/channel if needed
-> notify product, risk, operations, compliance, security/privacy as relevant
-> preserve traces and evidence
-> review affected customers/cases
-> decide rollback, remediation, customer correction or incident escalation
-> update waiver status

8.4 Stop authority

Role	Stop authority
Release owner	pause ramp or traffic cap
Operations lead	route to manual queue
Security lead	disable unsafe tool or route
Privacy lead	stop data flow
Risk/compliance lead	stop customer-facing use
Executive owner	terminate material waiver

9. Dashboards and KRIs

9.1 Exception dashboard

Metric	Meaning
Active exceptions by risk tier	overall residual risk exposure
Exceptions by domain	credit, wealth, AML, fraud, service, operations
Exceptions by control domain	eval, model risk, privacy, security, third-party, operations
Aging	days active and days to expiry
Renewal count	shadow policy signal
Expired but active	governance breach
Hard stops triggered	containment effectiveness
KRI breaches	risk is outside waiver conditions
Remediation progress	whether control debt is closing
Evidence completeness	audit readiness
Exceptions linked to incidents	whether waivers contributed to harm
Board-reportable exceptions	material residual risk visibility

9.2 KRI catalog

KRI	Definition	Management use
Exception aging	active days since approval	spot stalled remediation
Repeat renewal rate	renewals / active exceptions	identify shadow policy
Expired active exceptions	expired exceptions still in production	immediate escalation
Control gap concentration	same control waived across teams	platform investment need
High-tier exception count	high/critical open waivers	risk appetite pressure
Hard stop trigger count	triggered stops by use case	system risk and control quality
Evidence completeness	required evidence present and current	audit readiness
Remediation slippage	missed control fix dates	governance effectiveness
Exception incident linkage	incidents linked to active waivers	risk acceptance quality
Third-party exception exposure	vendor-related open exceptions	supplier risk
Consumer harm signal	complaints, appeals, upheld cases linked to waiver scope	customer protection
Human review overload	review SLA breach under waiver	compensating control failure
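
Several of these KRIs can be computed directly from the exception registry. The sketch below covers four of them. The record fields are hypothetical, "active" is read as "still in production", and the repeat renewal rate follows the definition above literally, as total renewals divided by active exceptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class WaiverRecord:
    exception_id: str
    risk_tier: str
    approved: date
    expiry: date
    renewals: int
    in_production: bool


def portfolio_kris(records: List[WaiverRecord], today: date) -> Dict[str, object]:
    active = [r for r in records if r.in_production]
    return {
        # Exception aging: active days since approval.
        "exception_aging_days": {r.exception_id: (today - r.approved).days for r in active},
        # Repeat renewal rate: renewals / active exceptions.
        "repeat_renewal_rate": sum(r.renewals for r in active) / len(active) if active else 0.0,
        # Expired active exceptions: past expiry but still in production.
        "expired_active": [r.exception_id for r in active if r.expiry < today],
        # High-tier exception count: high/critical open waivers.
        "high_tier_count": sum(r.risk_tier in ("high", "critical") for r in active),
    }
```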

9.3 Board and audit committee view

Board question	Dashboard answer
Are we operating outside approved AI controls?	count and severity of active high/critical waivers
Are exceptions temporary?	aging, expiry and renewal pattern
Are customers exposed?	customer-facing scope, complaint and harm signals
Who accepted residual risk?	accountable executive roles
Are controls compensating effectively?	KRI status and evidence completeness
Are repeat exceptions creating shadow policy?	repeat reasons and policy update decisions
Are GenAI/agentic risks covered cross-functionally?	model, operational, privacy, security, third-party and compliance view

9.4 Example dashboard row

Field	Example
Exception ID	AI-WVR-2026-0042
Use case	Retail Service Low-Risk FAQ Pilot
Risk tier	High
Control gap	PDF table citation checker incomplete
Scope	5% customer FAQ traffic, low-risk intents only
Residual risk owner	Head of Retail Service
Compensating controls	daily QA, source allowlist, no write tools, complaint hard-route
Expiry	2026-07-30
KRI status	green, no hard stops
Evidence status	98.9% trace completeness, daily QA complete
Renewal count	0
Board visibility	included in monthly AI risk MI if extended

10. RACI

10.1 Core roles

Role	Accountability
Business owner	owns business value and accepts business residual risk within authority
AI Product Manager	defines scope, value, user impact, release conditions and product guardrails
AI BA	maps policy/control gap to requirements, workflow, evidence and stakeholder decisions
Product Architect	maps waiver conditions to runtime architecture and controls
Model Risk	evaluates model intended use, validation gap, monitoring and effective challenge
Operational Risk	evaluates process, human control, capacity, incident and control operation
Compliance	evaluates consumer, regulatory, disclosure, complaint and record implications
Privacy	evaluates data purpose, minimization, consent, retention and rights
Security	evaluates access, gateway, tool abuse, logging, SIEM and incident path
Third-Party Risk	evaluates vendor evidence, SLA, data terms, resilience and exit
Operations	runs human review, QA sampling, queues and fallback processes
Release Governance	ensures approval routing, evidence, expiry and dashboard
Internal Audit	assesses design and evidence quality without owning management risk
Board / Audit Committee	receives material exception visibility and challenges chronic exposure

10.2 RACI shorthand

Activity	Accountable	Responsible	Consulted
Intake and residual risk memo	Business owner	PM / BA / Release Governance	Architect, Risk, Compliance, Privacy, Security, TPRM, Ops
Control gap and compensating controls	Architect / Operational Risk	BA / PM / Ops	Model Risk, Compliance, Privacy, Security
Approval routing and expiry review	Release Governance	PM / BA	all required control owners by risk tier
Runtime enforcement and KRI dashboard	Architect / Operations	Platform / Ops / Release Governance	Product, Risk, Security, Privacy
Board reporting	Business owner / Release Governance	Risk reporting owner	Product, Architecture, Control owners

10.3 Forum design

Forum	Scope	Cadence
Exception triage	low/medium intake, expiring exceptions, evidence gaps	daily or twice weekly
Weekly AI risk review	high-tier waivers, KRI trends, renewal requests	weekly
Material AI governance forum	critical waivers, customer-impacting exceptions, cross-domain disputes	as needed
Monthly management information review	portfolio exposure, aging, repeat exceptions, remediation funding	monthly
Quarterly board/audit package	material residual risk, policy drift, chronic exceptions, audit findings	quarterly

11. Financial Retail Examples

11.1 Credit policy copilot

Scenario: AI drafts credit policy memos for underwriters; standard control requires full fair-lending regression before expanded pilot; a small-business segment is incomplete.

Area	Decision
Scope	employee-only memo draft, no final credit decision, no adverse action notice
Duration	30 days
Residual risk	underwriter may over-trust draft in new segment
Compensating controls	mandatory underwriter attestation, second-line sample review, no customer communication
Hard stop	any reason-code mismatch in customer-impacting path
Evidence	underwriter review logs, sample review, eval expansion plan

No-go: AI automatically generates final adverse action reasons under incomplete validation.

11.2 Wealth guidance assistant

Scenario: AI supports advisor preparation and client education; advice boundary classifier has not completed edge-case testing for retirement product prompts.

Area	Decision
Scope	advisor-facing only, meeting prep and educational summaries
Exclusion	no direct client personalized recommendation
Compensating controls	licensed advisor final review, approved educational content, advice breach monitoring
Hard stop	personalized buy/sell recommendation generated without advisor mediation
Expiry	classifier edge-case test completion date

No-go: customer-facing robo-advice with incomplete advice boundary.

11.3 Customer service AI FAQ

Scenario: Customer-facing low-risk FAQ pilot has incomplete PDF table citation automation.

Area	Decision
Scope	5% low-risk FAQ sessions
Exclusion	complaints, fee waivers, credit commitments, account closure
Compensating controls	source allowlist, no-answer fallback, daily QA, route to human
Hard stop	wrong fee commitment > 0 or complaint miss > 0
Evidence	citation QA, trace samples, customer escalation log

11.4 AML investigation copilot

Scenario: AI summarizes transaction patterns and drafts narrative; tool audit log lacks one required attribute for analyst override reason.

Area	Decision
Scope	internal draft only, no SAR/no-SAR decision
Compensating controls	analyst-in-control, manual override reason field, weekly QA
Hard stop	AI-generated final SAR conclusion or missing case trace
Evidence	case review records, manual override extract, tool trace

No-go: AI submits SAR or decides no-SAR with incomplete evidence controls.

11.5 Fraud operations agent

Scenario: Agent can recommend fraud queue actions; write-enabled account restriction tool is technically available but reversal process is not tested.

Area	Decision
Scope	read-only and recommendation mode
Not approved	account restriction write action
Compensating controls	fraud analyst approval, dry-run tool output, queue monitoring
Hard stop	any account state change without approval token
Exit	complete reversal test and dual-control design

11.6 Third-party model route

Scenario: Vendor model performs better for Spanish-language support; updated vendor evidence for data retention is due but not yet received.

Condition	Required control
Data minimized	no sensitive account details in prompt
Region controlled	approved endpoint only
Duration limited	short expiry aligned to vendor evidence date
Fallback available	route to existing approved model
Monitoring active	data route and vendor SLA dashboard

No-go: PII or restricted data sent to unapproved or unverified route.

12. Templates

12.1 Exception intake form

Field	Example
Identification	AI-WVR-2026-0042; Retail Service FAQ Pilot; Customer Service; request owner AI PM; business owner Head of Retail Service
Baseline	high risk because output is customer-facing; CTRL-CITATION-PDF-004 requires automated citation support check
Deviation	PDF fee table citation support incomplete; 30-day waiver requested; 5% traffic cap; complaints, fee waivers, credit commitments and account closure excluded
Residual risk	customer may receive incomplete source support in low-risk FAQ
Controls	source allowlist, no-answer fallback, daily 100-case QA, route-to-human, no write tools
Hard stops	wrong fee commitment > 0; complaint miss > 0; unsupported citation sample > 2%; trace completeness < 98%
Evidence and exit	release config, source registry, QA, trace dashboard, incident log; close after checker regression or stop pilot

12.2 Residual risk acceptance table

Field	Example
Risk event	Low-risk FAQ answer contains unsupported citation
Potential impact	customer confusion, trust impact, complaint
Inherent severity	medium
Compensating controls	source allowlist, no-answer fallback, daily QA
Residual severity	low to medium within limited scope
Residual risk owner	Head of Retail Service
Acceptance period	2026-07-01 to 2026-07-30
Review cadence	weekly KRI, daily QA
Hard stop	unsupported citation sample rate > 2%
Evidence	QA log, trace report, KRI dashboard

12.3 Compensating control matrix

Control gap	Risk	Compensating control	Owner	Evidence	Frequency
PDF citation parser incomplete	unsupported source claim	daily sample review and no-answer fallback	Ops QA Lead	sample log and fallback metric	daily
trace completeness below standard	audit reconstruction gap	daily trace completeness check	Observability Owner	dashboard export	daily
review queue capacity uncertain	delayed human escalation	traffic cap and queue aging alert	Operations Lead	queue report	hourly
vendor evidence pending	data-processing uncertainty	data minimization and fallback route	TPRM Owner	route log and vendor tracker	weekly

12.4 Waiver approval record

Field	Example
Decision	limited approval for AI-WVR-2026-0042
Approved / not approved	approved 5% low-risk FAQ traffic; not approved high-risk intents, customer-specific advice or write-enabled tools
Dates	approval 2026-07-01; expiry 2026-07-30
Approvers	Business owner, Risk partner, Compliance, Privacy, Security, Product architect, Release Governance
Conditions	daily QA before next-day ramp, no hard stop, weekly expiry review, daily evidence binder update

12.5 Board/audit committee summary

Field	Example
Portfolio exposure	active high/critical exceptions: 4; customer-facing: 2; critical: 0; expired but active: 0; exceptions with two or more renewals: 1
Material exception	AI-WVR-2026-0042, Retail Service Low-Risk FAQ Pilot, Head of Retail Service owns residual risk
Control gap and scope	PDF table citation automation incomplete; 5% low-risk FAQ traffic; expiry 2026-07-30
Status	KRI green; no hard stops; pilot stops if citation checker regression is not completed
Management attention	recurring citation-control waiver across service products indicates platform investment need

12.6 Expiry review decision

Field	Example
Review	AI-WVR-2026-0042; review 2026-07-25; expiry 2026-07-30
Evidence	KRI dashboard, QA sample logs, trace completeness, complaint linkage, citation checker remediation
Options	close, renew, stop, convert to formal policy/control baseline review
Decision	close after checker regression passes and production trace confirms coverage
Conditions	no traffic expansion until standard gate; regression failures enter remediation; evidence retained

13. Architecture Pattern

13.1 Exception control plane

Component	Purpose
Exception registry	source of truth for waiver id, scope, owner, expiry, approvals
Control catalog	maps risk tier to required controls and waivable/non-waivable status
Policy engine	enforces scope, deny rules, hard exclusions
Release orchestrator	controls traffic cap, channel, model route, feature flag
Tool gateway	enforces read/write permission and approval token
Model gateway	enforces vendor/model/data route and logging requirements
EvalOps pipeline	runs exception-specific regression and sample review
Observability layer	emits exception_id and control-gap telemetry
Evidence binder	stores memo, approval, KRI, logs, review and expiry decisions
Incident integration	links hard stops to incident and remediation workflow
Management dashboard	reports aging, renewal, KRI, board/audit visibility
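
To illustrate how the exception registry and control catalog interact, the sketch below admits a waiver only when the catalog marks the control as waivable and the record has an owner and approvers. The control ids, the `ExceptionRecord` fields and the `register` function are hypothetical names chosen for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical control catalog entries; the control ids are illustrative.
CONTROL_CATALOG = {
    "citation_checker": {"waivable": True},
    "sensitive_data_route_approval": {"waivable": False},
}


@dataclass(frozen=True)
class ExceptionRecord:
    waiver_id: str
    control_id: str
    scope: str
    residual_risk_owner: str
    expiry: date
    approvers: tuple


def register(registry: dict, record: ExceptionRecord) -> None:
    """Admit a waiver to the registry only if the catalog allows it."""
    control = CONTROL_CATALOG.get(record.control_id)
    if control is None:
        raise ValueError(f"unknown control: {record.control_id}")
    if not control["waivable"]:
        raise ValueError(f"non-waivable control: {record.control_id}")
    if not record.residual_risk_owner or not record.approvers:
        raise ValueError("waiver needs an owner and at least one approver")
    registry[record.waiver_id] = record
```

Rejecting non-waivable controls at registration keeps prohibited uses from entering the workflow as ordinary exception requests.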

13.2 Runtime tagging

ai.exception_id, ai.exception_scope, ai.risk_tier, ai.control_gap_id, ai.residual_risk_owner, ai.expiry_date, ai.model_version, ai.prompt_version, ai.rag.source_registry_version, ai.tool.policy_version, ai.human_review_required, ai.hard_stop_profile
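
A trace completeness check can verify that every span under a waiver carries the tags listed above. The following is a minimal sketch, assuming span attributes arrive as a plain dictionary; the `missing_tags` helper is a hypothetical name and is not tied to any specific tracing library.

```python
# Tag names copied from the runtime tagging list in this section.
REQUIRED_TAGS = (
    "ai.exception_id",
    "ai.exception_scope",
    "ai.risk_tier",
    "ai.control_gap_id",
    "ai.residual_risk_owner",
    "ai.expiry_date",
    "ai.model_version",
    "ai.prompt_version",
    "ai.rag.source_registry_version",
    "ai.tool.policy_version",
    "ai.human_review_required",
    "ai.hard_stop_profile",
)


def missing_tags(span_attributes: dict) -> list:
    """Return required tags that are absent or empty on a trace span.

    A boolean False (for example on ai.human_review_required) counts as
    present, because it is a real value rather than a gap.
    """
    return [
        tag for tag in REQUIRED_TAGS
        if span_attributes.get(tag) in (None, "")
    ]
```

The share of sampled spans for which `missing_tags` returns an empty list is one possible way to compute the trace completeness metric.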

13.3 Enforcement flow

request enters AI gateway
-> check exception_id active
-> check current date before expiry
-> check user/channel/data/tool/model in approved scope
-> apply exception-specific policy bundle
-> emit trace tags
-> route allowed request
-> deny or route to human if outside scope
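
The flow above can be sketched as a single gateway decision function. This is an illustrative sketch under stated assumptions: `Waiver`, `Request` and `enforce` are hypothetical names, user and data scope checks and the policy bundle step are omitted for brevity, and the choice to deny inactive or expired waivers but route out-of-scope requests to a human is one possible reading of the final step.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Waiver:
    # Hypothetical registry entry; field names are illustrative assumptions.
    exception_id: str
    active: bool
    expiry: date
    channels: frozenset
    intents: frozenset
    tools: frozenset      # an empty set means no tools are approved
    models: frozenset


@dataclass(frozen=True)
class Request:
    exception_id: str
    channel: str
    intent: str
    model: str
    tool: Optional[str] = None


def enforce(request: Request, registry: dict, today: date) -> tuple:
    """Return (decision, trace_tags) for one request under a waiver."""
    waiver = registry.get(request.exception_id)
    tags = {"ai.exception_id": request.exception_id}
    # The waiver must exist, be active and not yet expired.
    # Treating the expiry date itself as expired is a policy assumption.
    if waiver is None or not waiver.active or today >= waiver.expiry:
        return "deny", tags
    tags["ai.expiry_date"] = waiver.expiry.isoformat()
    # Every checked dimension must sit inside the approved scope.
    in_scope = (
        request.channel in waiver.channels
        and request.intent in waiver.intents
        and request.model in waiver.models
        and (request.tool is None or request.tool in waiver.tools)
    )
    if not in_scope:
        return "route_to_human", tags
    return "allow", tags
```

Returning the tags alongside every decision, including denials, means blocked requests also leave evidence in the trace.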

14. 30-Day Lab

Days	Focus	Deliverables
1-3	Select use case and baseline	use case one-pager, risk appetite statement, risk tier, standard control baseline
4-6	Identify exception scenario	control gap memo, non-waivable boundary list, alternatives considered
7-9	Draft risk acceptance memo	business rationale, residual risk, scope, controls, approval roles, hard stops, expiry
10-12	Design compensating controls	preventive/detective/corrective matrix, owner, evidence, workflow diagram
13-15	Design architecture enforcement	exception registry fields, runtime tags, policy checks, tool restrictions, kill switch
16-18	Build dashboard specification	active exceptions, aging, renewals, expired active, KRI status, evidence completeness
19-21	Write templates	intake form, approval record, expiry review memo, board summary, evidence binder
22-24	Simulate hard stop	incident timeline, stop action, customer/case review, remediation update
25-27	Conduct expiry review	close, renew, remediate, convert to policy update or stop decision
28-30	Interview and portfolio pack	30-second answer, 2-minute answer, CRO version, architect version, board/audit explanation

Success criteria: no open-ended waiver; no unowned residual risk; no missing hard stop; no unenforceable scope; no evidence gap; clear explanation of why the exception is not shadow policy.
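
The first five success criteria are mechanical enough to check as a lint step over a draft waiver. The following is a minimal sketch, assuming the waiver is held as a dictionary with hypothetical keys (`expiry`, `residual_risk_owner`, `hard_stops`, `enforcement_points`, `evidence_plan`). The last criterion, explaining why the exception is not shadow policy, is qualitative and is not covered.

```python
def lint_waiver(waiver: dict) -> list:
    """Return the success criteria that a draft waiver fails.

    Keys are illustrative assumptions; a missing, None or empty value
    counts as a finding.
    """
    checks = {
        "open-ended waiver": waiver.get("expiry"),
        "unowned residual risk": waiver.get("residual_risk_owner"),
        "missing hard stop": waiver.get("hard_stops"),
        # Scope counts as enforceable only if it maps to a runtime control
        # such as a feature flag, policy engine rule or gateway check.
        "unenforceable scope": waiver.get("enforcement_points"),
        "evidence gap": waiver.get("evidence_plan"),
    }
    return [finding for finding, value in checks.items() if not value]
```

An empty result means the draft passes the mechanical criteria; it does not replace the approval review.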

15. Interview Answers

15.1 30-second answer

AI risk appetite defines the baseline; waiver management governs controlled deviations from that baseline. For every AI exception, I require a specific control gap, narrow scope, residual risk owner, compensating controls, expiry date, hard stop conditions, evidence plan and renewal criteria. I also track aging and repeat renewals, because recurring waivers can become shadow policy. For GenAI and agentic AI, I do not treat this as only model risk; I combine model risk, operational risk, consumer compliance, privacy, security and third-party controls.

15.2 2-minute answer

I would manage AI exceptions as a formal risk acceptance lifecycle. First, I start from the approved risk appetite and control catalog. If a use case cannot satisfy a standard control, the team must identify the exact policy or control being waived, not just say “we need an exception.” Then I classify the exception by customer impact, automation level, data sensitivity, regulatory relevance, reversibility and control gap severity.

Second, I write a residual risk memo. It explains the business reason, alternatives considered, limited scope, what harm could occur, who accepts the residual risk and for how long. The waiver must include compensating controls, such as traffic caps, employee-only scope, source allowlists, human review, QA sampling, no write tools, extra monitoring or fallback routing.

Third, I make the waiver operational. The exception registry connects to feature flags, policy engine, model gateway, tool gateway, telemetry and evidence binder. Every request under the waiver carries an exception id, scope, risk tier, control gap and expiry. Hard stops are pre-approved: for example, a wrong fee commitment, a missed complaint escalation, PII routed to an unapproved model, or a tool write without approval immediately pauses the feature.

Finally, I manage expiry. A waiver can close, renew with new evidence, remediate, convert to a formal policy update or stop. It cannot silently continue. Repeat renewal, expired active exceptions and recurring control gaps are escalated to management and, when material, board or audit committee reporting. That prevents exceptions from becoming permanent shadow policy.

15.3 CRO version

I would focus on residual risk accountability and aggregate exposure. The CRO should see which AI controls are being waived, which high/critical use cases are affected, who accepted residual risk, what compensating controls are operating, which KRIs are near breach, which exceptions are aging and whether repeat waivers indicate policy drift or underfunded controls. I would also distinguish waivable control gaps from prohibited uses. A waiver cannot be used to approve an unauthorized final decision, unapproved sensitive-data route or uncontrollable agent execution.

15.4 Chief Product Officer version

I would frame waivers as a way to learn safely, not as a way to bypass governance. Product teams can run limited pilots when the control gap is specific, the scope is narrow and the residual risk is accepted. But the waiver must shape the roadmap: if multiple teams keep requesting the same exception, that becomes a platform investment or policy decision. The product leader should track exception debt the same way they track technical debt, because unmanaged waivers slow down future releases and create audit risk.

15.5 Chief Architect version

I would implement exception management as a control plane. The exception registry should feed policy engine, model gateway, tool gateway, release orchestrator, observability and evidence binder. Runtime checks should enforce expiry, user/channel/data/tool/model scope and hard exclusions. Every trace should carry exception id, risk tier, control gap, model/prompt/source/tool versions and review status. The architecture must support artifact-level rollback, not just code rollback, because AI behavior can change through prompt, RAG, model route, tool schema or vendor configuration.

15.6 Internal audit version

I would ask whether management can reconstruct the decision and prove the waiver operated within approved boundaries. Evidence should include the policy baseline, control gap, approval roles, residual risk memo, compensating control tests, release configuration, trace samples, KRI history, hard stop drill and expiry decision. I would pay special attention to expired active exceptions, repeat renewals and cases where the same control is waived across multiple products, because those indicate shadow policy risk.

15.7 Board/audit committee version

At board or audit committee level, I would not show every low-risk waiver. I would show material exposure: active high/critical exceptions, customer-facing exceptions, exceptions linked to incidents, expired active items, repeat renewals, control-gap concentration and remediation progress. The key message is whether AI operations remain within approved risk appetite or whether exceptions are becoming the real operating model.

16. Common Failure Modes

Failure mode	Symptom	Better practice
Waiver without expiry	“temporary” approval remains active for months	fixed expiry and automatic escalation
Waiver without control id	nobody knows what is being waived	map to policy/control catalog
Business-only approval	value owner approves own residual risk	tiered cross-functional approval
No runtime enforcement	scope exists only in memo	feature flag, policy engine, gateway control
No hard stop	team debates during breach	pre-approved stop conditions
Weak compensating controls	“manual monitoring” with no sample plan	defined owner, frequency and evidence
Evidence afterthought	audit pack assembled manually months later	evidence generated as workflow operates
Repeated renewal	same gap extended repeatedly	platform investment or policy review
Model-risk-only lens	privacy, security, operations and third party omitted	cross-domain review
Agent tool gap	waiver ignores delegated tool action	tool gateway and identity propagation
Expired active waiver	production still runs after expiry	incident escalation and disable path
Shadow policy	exceptions become standard practice	management review and formal baseline decision

17. Final Memory Card

Concept	One line
Risk appetite	baseline boundary for AI risk-taking
Waiver	time-boxed deviation from a standard control
Risk acceptance	accountable acceptance of residual risk
Compensating control	substitute control that reduces risk while gap exists
Expiry	date when waiver must close, renew, remediate, convert or stop
Hard stop	pre-approved condition that immediately pauses or rolls back use
Shadow policy	repeated or indefinite exception that becomes the real operating model
Board visibility	material exposure, aging, renewal, breach and remediation view
Agentic AI nuance	exception must cover model, tool, identity, delegation, security, privacy, operations and vendor risk

Most important sentence:

A mature AI organization does not pretend exceptions will disappear; it designs them as controlled, temporary, evidenced residual-risk decisions and escalates them before they become shadow policy.