AI Exception / Risk Acceptance / Waiver Architecture Playbook
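Audience: senior AI Product Manager / AI BA / Product Architect / Enterprise Architect / Risk Partner / Model Risk Lead / Operational Risk Lead / Compliance / Privacy / Security / Third-Party Risk / Internal Audit / Board reporting owner.
Purpose: train financial retail AI teams to manage policy exceptions, temporary waivers, residual risk acceptance, compensating controls, expiry and renewal, evidence retention, escalation paths and hard stop conditions. The focus is not basic BA work but turning AI governance into an exception control system that can be operated, monitored, audited and explained to executives and the audit committee.
Core idea: Risk appetite defines the organization's AI risk boundary. The exception / waiver architecture manages a use case's time-limited, scope-limited, evidence-backed deviation from a standard control once risk appetite and the standard controls have been defined. An exception must not become a permanent shadow policy.
Important note: This document is learning, portfolio and governance design material, not legal, audit, regulatory, model validation or compliance advice. A formal project must be confirmed by the business owner, legal, compliance, model risk, operational risk, security, privacy, third-party risk, technology, internal audit and management in light of the institution, jurisdiction, regulatory relationships and internal policies.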
1. Executive Framing
1.1 One-sentence positioning
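AI Exception / Risk Acceptance / Waiver Architecture is a governance mechanism that turns "we temporarily cannot meet a standard AI control" into a decision with a business rationale, a risk owner, compensating controls, an expiry date, evidence, an escalation path and hard stop conditions.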
An AI waiver is not permission to ignore controls.
It is a time-boxed, scope-bound, evidence-backed acceptance of residual risk.
1.2 Distinction from risk appetite
| Layer | Question it answers | Output |
|---|---|---|
| Risk appetite | Which AI risks the organization is willing to take, which uses are prohibited, which uses are conditionally allowed | risk appetite statement, risk tier, standard control baseline |
| Standard control catalog | Which controls each risk tier requires by default | eval, model validation, HITL, DLP, tool gateway, monitoring, evidence |
| Exception / waiver | What to do when a use case temporarily deviates from a standard control | exception memo, risk acceptance, compensating controls, expiry, hard stop |
| Issue / incident | What to do when a control has failed or harm has already occurred | incident response, RCA, remediation, customer correction |
Senior-level framing: Risk appetite is the baseline. Waiver management is the controlled deviation from that baseline. If the same waiver keeps renewing, the organization has unresolved control debt or an outdated policy baseline.
1.3 Why AI exception handling is special
AI behavior is jointly determined by the model route, prompt, RAG corpus, source registry, eval rubric, tool permissions, agent autonomy, human review capacity, privacy logging, security gateway, vendor terms, customer disclosure and remediation process.
Exceptions for GenAI / agentic AI cannot be filed under model risk alone. They need to combine model risk, operational risk, consumer compliance, privacy, security, third-party, technology resilience, customer harm and audit evidence.
1.4 Executive question set
| Executive question | Good answer requires |
|---|---|
| Which AI controls are being waived? | control id, policy baseline, use case, risk tier |
| Why is the waiver necessary? | business value, timing pressure, control debt, alternatives considered |
| Who accepts the residual risk? | named role, delegated authority, approval record |
| How narrow is the exception? | user, channel, geography, data, tool, model, traffic, duration |
| What compensating controls operate now? | tested workflow controls, monitoring, review, fallback |
| What would force an immediate stop? | hard stop triggers, kill switch, owner, runbook |
| When does it expire? | fixed expiry date, renewal criteria, exit path |
| Is this becoming shadow policy? | aging, renewals, repeat reason, remediation backlog |
| Can audit reconstruct the decision? | evidence binder with versions, signoffs, logs and KRI history |
2. Source Anchors
The sources below serve as anchors for governance language and evidence structure. This playbook translates them into AI product, architecture, BA and financial retail governance practice. Access date recorded as 2026-06-30.
| Anchor | Official link | How this playbook uses it |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize the context, risk measurement, compensating controls, continuous monitoring, governance escalation and evidence loop of an AI exception. |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Uses the AI management system perspective to bring exceptions into scope, roles, operation, performance evaluation, management review and continual improvement. |
| Federal Reserve SR 26-2 | https://www.federalreserve.gov/supervisionreg/srletters/SR2602.htm | Serves as the new model risk management anchor for 2026. SR 26-2 replaced SR 11-7 and SR 21-8 on 2026-04-17; it is a practical reference for intended use, risk tier, effective challenge, monitoring and governance of AI / ML / GenAI / agentic AI. |
| FFIEC IT Examination Handbook Management booklet | https://ithandbook.ffiec.gov/it-booklets/management.aspx | Uses the board oversight, IT governance, risk management, third-party, change management, audit and management reporting perspectives to organize exception management. |
2.1 Current nuance
- SR 26-2 replaces SR 11-7 and SR 21-8, so model risk discussions from 2026 onward cannot rest on the old letters alone.
- SR 26-2 pushes model risk management toward more explicit risk-based tailoring, intended use, model inventory, monitoring, validation, effective challenge and governance.
- For GenAI / agentic AI, exception handling must address operational risk, consumer compliance, privacy, security, third-party, tool autonomy and evidence gaps together.
- Any exception that is renewed over a long period must be escalated to a policy review, a control investment or a formal residual risk decision.
3. Exception Taxonomy
3.1 Baseline structure
Risk appetite
-> risk tier
-> standard control baseline
-> control gap
-> exception request
Vague request: We need an exception for the AI assistant.
Acceptable request:
We request a 45-day exception from CTRL-EVAL-HIGH-003,
which requires full regression coverage for complaint and fee-dispute slices,
for a 5% employee-only pilot of the retail service copilot.
The exception excludes customer auto-send and all write-enabled tools.
3.2 Taxonomy by control domain
| Exception type | Control gap | Example | Typical compensating control |
|---|---|---|---|
| Eval coverage | eval suite incomplete or too few samples for a new scenario | too few samples in the complaint slice | smaller pilot, daily QA, failed-case capture |
| Model validation | independent validation / challenge not completed | validation report due in 30 days | shadow mode, traffic cap, validation milestone |
| Model inventory | model / prompt / RAG / tool not fully registered | prompt variants not in registry | freeze variants, manual register, block expansion |
| Data boundary | insufficient evidence for data classification, redaction or retention | prompt log redaction proof missing | no sensitive data, DLP monitoring |
| Privacy | consent, purpose, retention or vendor terms not fully evidenced | transcript retention policy pending approval | restricted dataset, no export |
| Security | control gap in gateway, RBAC, SIEM, prompt injection or tool permissions | missing deny reason in tool log | read-only mode, extra SIEM alert |
| Third-party | gap in vendor evidence, SLA, DPA, exit path or region routing | vendor SOC report renewal in progress | fallback route, lower traffic |
| Consumer compliance | gap related to disclosure, adverse action, UDAAP, complaint or recourse | customer disclosure copy not fully approved | employee-only release |
| Operational readiness | human review queue, training, SOP or capacity below standard | review backlog risk | lower volume, queue KRI |
| Evidence | evidence binder automation or trace coverage incomplete | trace tags cover 92%, standard is 99% | manual evidence pack |
| Change governance | gap in artifact versioning, rollback or release approval | RAG rollback drill not completed | no auto-refresh, manual snapshot |
| Board visibility | material exception not included in the MI pack | high-tier exception approved locally | immediate escalation |
3.3 Taxonomy by AI autonomy
| Autonomy level | Exception sensitivity | Typical rule |
|---|---|---|
| Retrieve / summarize | Lower but depends on data and citation | Short waivers possible if employee-facing and low impact |
| Recommend | Medium to high | Waiver must preserve human decision and explainability |
| Draft customer communication | High | Disclosure, approved language and human send controls are difficult to waive |
| Decide | High to critical | Exceptions rare; prohibited uses remain no-go |
| Execute tool action | High to critical | Write-enabled waiver requires tool gateway, approval token and rollback |
| Multi-agent orchestration | Critical when autonomous | Waiver must cover delegation, identity, tool chain, monitoring and emergency stop |
3.4 Non-waivable conditions
Do not approve a request that involves a prohibited use; that lacks an accountable residual risk owner, an enforceable scope, a stop capability or an evidence path; or that involves known active harm without containment, unapproved sensitive-data exposure or hidden final-decision automation.
4. Waiver Lifecycle
4.1 Lifecycle map
Exception trigger
-> intake
-> control gap classification
-> residual risk assessment
-> compensating control design
-> approval routing
-> restricted operation
-> KRI monitoring
-> evidence capture
-> expiry review
-> close / renew / remediate / policy update / stop
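The lifecycle map can be read as a small state machine. The sketch below is one hedged way to encode it, assuming stages advance one step at a time and that an outcome can only be recorded at the expiry review. The names `Stage`, `next_stage`, `advance` and `conclude` are illustrative and not part of any standard; a real system would also let a hard stop end the waiver from any stage.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    TRIGGER = "exception trigger"
    INTAKE = "intake"
    GAP_CLASSIFICATION = "control gap classification"
    RISK_ASSESSMENT = "residual risk assessment"
    CONTROL_DESIGN = "compensating control design"
    APPROVAL = "approval routing"
    RESTRICTED_OPERATION = "restricted operation"
    KRI_MONITORING = "KRI monitoring"
    EVIDENCE_CAPTURE = "evidence capture"
    EXPIRY_REVIEW = "expiry review"


# Outcomes of the expiry review, taken from the last line of the map.
OUTCOMES = frozenset({"close", "renew", "remediate", "policy update", "stop"})

_ORDER = list(Stage)  # Enum iteration follows definition order


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage after `current`, or None once expiry review is reached."""
    idx = _ORDER.index(current)
    return _ORDER[idx + 1] if idx + 1 < len(_ORDER) else None


def advance(current: Stage, target: Stage) -> Stage:
    """Allow only single-step forward moves (assumption: no stage is skipped)."""
    if next_stage(current) is not target:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def conclude(current: Stage, outcome: str) -> str:
    """Record an expiry outcome; valid only at the expiry review stage."""
    if current is not Stage.EXPIRY_REVIEW or outcome not in OUTCOMES:
        raise ValueError(f"invalid outcome {outcome!r} at {current.value}")
    return outcome
```

Rejecting skipped stages is a design choice: it makes "approved without a residual risk assessment" an error the workflow tool raises rather than something audit discovers later.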
4.2 Trigger examples
| Trigger | Example |
|---|---|
| Product release pressure | Pilot date arrives before full eval coverage |
| Control maturity gap | policy engine supports deny but not reason logging |
| Vendor dependency | model provider updates data-processing evidence late |
| New use case | policy does not yet classify agentic workflow pattern |
| Operational constraint | human review capacity below high-tier target |
| Incident remediation | system can restart only with temporary traffic cap |
| Regulatory or audit finding | control gap must be tracked until remediation completes |
4.3 Intake fields
| Field | Strong content |
|---|---|
| Exception ID | stable id used in release, telemetry, evidence and dashboard |
| Use case | product, business domain, channel, customer/employee impact |
| Risk tier | low, medium, high, critical with rationale |
| Standard control | control id, policy name, required baseline |
| Requested deviation | exact control gap, not vague description |
| Business reason | why the deviation is needed now |
| Alternatives considered | wait, reduce scope, manual process, different vendor, redesign |
| Scope | users, channels, geography, data, model, tools, traffic, time |
| Residual risk | what risk remains after compensating controls |
| Compensating controls | preventive, detective, corrective controls |
| Hard stop | measurable conditions that end the waiver immediately |
| Evidence plan | what proves controls operate |
| Expiry | specific date and review forum |
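As an illustration, the intake fields can be captured in a typed record so that an incomplete request is rejected before routing. This is a minimal sketch; the class name `ExceptionIntake`, the field names and the `missing_fields` helper are assumptions made for this example, not a prescribed schema.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class ExceptionIntake:
    exception_id: str
    use_case: str
    risk_tier: str                 # low, medium, high or critical
    standard_control: str          # control id from the baseline
    requested_deviation: str
    business_reason: str
    alternatives_considered: list[str]
    scope: dict[str, str]          # users, channels, data, tools, traffic, time
    residual_risk: str
    compensating_controls: list[str]
    hard_stops: list[str]
    evidence_plan: list[str]
    expiry: Optional[date]


def missing_fields(intake: ExceptionIntake) -> list[str]:
    """Return the names of fields that are empty or None, in declaration order."""
    return [f.name for f in fields(intake) if not getattr(intake, f.name)]


draft = ExceptionIntake(
    exception_id="AI-WVR-2026-0042",
    use_case="Retail Service Low-Risk FAQ Pilot",
    risk_tier="high",
    standard_control="CTRL-CITATION-PDF-004",
    requested_deviation="PDF fee table citation support incomplete",
    business_reason="test self-service demand on low-risk FAQ",
    alternatives_considered=["wait for full checker", "manual FAQ update"],
    scope={"traffic": "5%", "intents": "low-risk FAQ only"},
    residual_risk="incomplete source support in low-risk FAQ",
    compensating_controls=["source allowlist", "daily 100-case QA"],
    hard_stops=[],                 # not yet defined
    evidence_plan=["QA log", "trace report"],
    expiry=None,                   # not yet set
)
```

Here `missing_fields(draft)` flags `hard_stops` and `expiry`, the two fields the playbook treats as conditions for approval.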
4.4 Classification and approval
| Dimension | Low signal | High signal |
|---|---|---|
| Customer impact | employee-only internal draft | customer-facing or customer-impacting output |
| Automation | read-only assistant | decision or write-enabled tool |
| Data sensitivity | public or approved internal knowledge | PII, account, credit, wealth, AML, complaint data |
| Regulatory relevance | internal productivity | credit, AML, privacy, complaint, advice, unfair practice |
| Reversibility | easy rollback, no external record | irreversible record, notice, funds, account status |
| Control gap severity | evidence formatting gap | missing validation, HITL or data approval |
Approval pattern by risk tier:
| Risk tier | Approval pattern |
|---|---|
| Low | Product owner and control owner; evidence owner informed |
| Medium | Product, business owner, control owner, risk partner, security/privacy if affected |
| High | Business executive delegate, product, model risk, compliance, operational risk, security/privacy, architecture, operations |
| Critical | Executive risk owner, formal governance forum, legal/compliance/model risk, CISO/privacy/third-party as relevant, board/audit visibility if material |
Authority principle: The person who wants the value should not be the only person accepting the residual risk.
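The routing table and the authority principle can be checked mechanically. The sketch below is a simplification under stated assumptions: `REQUIRED_APPROVERS` lists only the unconditional roles for the low, medium and high tiers (conditional approvers such as security/privacy "if affected" and the critical tier are left to the governance forum), and `approval_gaps` is a hypothetical helper name.

```python
# Unconditional approver roles per tier, simplified from the table above.
REQUIRED_APPROVERS = {
    "low": {"product owner", "control owner"},
    "medium": {"product", "business owner", "control owner", "risk partner"},
    "high": {
        "business executive delegate", "product", "model risk", "compliance",
        "operational risk", "security/privacy", "architecture", "operations",
    },
}


def approval_gaps(tier: str, requester: str, approvals: dict[str, str]) -> list[str]:
    """List what blocks approval. `approvals` maps role -> name of the approver."""
    missing = REQUIRED_APPROVERS[tier] - set(approvals)
    issues = [f"missing approval: {role}" for role in sorted(missing)]
    # Authority principle: the requester must not be the sole acceptor of risk.
    if approvals and set(approvals.values()) == {requester}:
        issues.append("requester is the only approver")
    return issues
```

For example, a low-tier request where the same person signs as both product owner and control owner passes the role check but still fails the authority principle.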
4.5 Restricted operation
| Restriction | Architecture implementation |
|---|---|
| Traffic cap | feature flag, release orchestrator, route control |
| User scope | IAM, group entitlement, role policy |
| Channel scope | API gateway, UI route, deployment config |
| Data scope | data gateway, retriever filter, DLP, purpose tag |
| Tool scope | tool gateway, allowlist, deny-by-default |
| Model scope | model route policy, region route, vendor allowlist |
| Time scope | scheduled expiry flag, release calendar lock |
| Evidence scope | trace attribute exception_id, evidence binder rule |
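A hedged sketch of how these restrictions might be evaluated together at request time. `WaiverScope` and `admit` are illustrative names; the traffic bucket is assumed to be a stable value in [0, 1) supplied by the release orchestrator, and the waiver is assumed to remain valid through its expiry date inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WaiverScope:
    exception_id: str
    allowed_groups: frozenset[str]
    allowed_channels: frozenset[str]
    allowed_tools: frozenset[str]  # deny-by-default: empty set means no tools
    traffic_cap: float             # fraction of traffic, e.g. 0.05 for 5%
    expiry: date


def admit(scope: WaiverScope, *, group: str, channel: str, tool: Optional[str],
          bucket: float, today: date) -> tuple[bool, str]:
    """Return (allowed, reason). Every check must pass for the request to run."""
    if today > scope.expiry:
        return False, "waiver expired"
    if group not in scope.allowed_groups:
        return False, "user group out of scope"
    if channel not in scope.allowed_channels:
        return False, "channel out of scope"
    if tool is not None and tool not in scope.allowed_tools:
        return False, "tool not on allowlist"
    if bucket >= scope.traffic_cap:
        return False, "outside traffic cap"
    # The exception id in the reason can be attached to the trace as exception_id.
    return True, f"admitted under {scope.exception_id}"


faq_scope = WaiverScope(
    exception_id="AI-WVR-2026-0042",
    allowed_groups=frozenset({"retail-faq-pilot"}),   # hypothetical group name
    allowed_channels=frozenset({"web-faq"}),          # hypothetical channel name
    allowed_tools=frozenset(),                        # no write tools, no tools at all
    traffic_cap=0.05,
    expiry=date(2026, 7, 30),
)
```

Returning the denial reason alongside the decision gives the deny logs the "reason" attribute that several evidence rows in this playbook depend on.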
4.6 Expiry outcomes
| Outcome | Meaning |
|---|---|
| Close | standard control is satisfied; waiver ends |
| Remediate then close | control gap fixed before continued operation |
| Renew with evidence | short renewal approved with new evidence and stronger conditions |
| Convert to policy change | repeated waiver reveals baseline policy/control must be updated formally |
| Stop | value no longer justifies residual risk or control gap persists |
No silent extension.
5. Risk Acceptance Memo
5.1 Memo purpose
The memo proves what control is missing or weakened, why the business still wants limited operation, what risk remains, who accepts it, what controls compensate, what evidence will be reviewed, when the acceptance ends and what triggers immediate stop.
5.2 Memo template
| Section | Example content |
|---|---|
| Decision summary | AI-WVR-2026-0042; Retail Service Low-Risk FAQ Pilot; temporary exception from CTRL-CITATION-PDF-004; high risk; limited approval for 30 days; expiry 2026-07-30 |
| Business rationale | pilot reduces call-center FAQ volume, tests self-service demand, stays limited to low-risk FAQ, excludes complaint / fee waiver / credit commitment / account closure |
| Alternatives | wait for full checker, extend employee-only pilot, use manual FAQ update |
| Control baseline and gap | customer-facing high-tier answers require automated citation support check; PDF table citation checker does not cover every fee table structure |
| Non-waivable boundaries | no write tools, no customer-specific advice, no complaint handling, no credit decision |
| Residual risk | customer may receive incomplete source support in low-risk FAQ; reduced by source allowlist, no-answer fallback and daily QA |
| Residual risk owner | Head of Retail Service, with Risk Partner concurrence |
| Compensating controls | 5% traffic cap, intent exclusions, daily 100-case QA, trace tags, route-to-human, feature flag disable |
| Hard stops | wrong fee commitment > 0; complaint miss > 0; unsupported citation rate > 2%; trace completeness < 98%; sensitive data exposure |
| Evidence | approval record, source registry, daily QA, KRI dashboard, trace report, incident log, stop-rule test |
| Expiry and exit | review on 2026-07-25; close after checker regression passes; stop if remediation evidence is insufficient |
5.3 Memo quality bar
Weak memo: We need a temporary exception because the business needs to launch quickly.
Strong memo: We request a 30-day exception from a specific citation control for a 5% low-risk FAQ pilot. The waiver excludes high-risk intents and write tools, uses daily sample review and hard stop triggers, and expires before the next risk committee review.
6. Compensating Controls
6.1 Control design principles
Compensating controls must be tied to the specific gap, stronger where uncertainty is higher, feasible for operations, instrumented with evidence, time-boxed, reviewed at expiry and mapped to an owner.
6.2 Control catalog
| Gap | Preventive control | Detective control | Corrective control |
|---|---|---|---|
| Eval coverage incomplete | narrow scope, excluded intents, traffic cap | daily QA sample, failed-case review | expand eval set, rerun gate, pause ramp |
| Model validation pending | shadow mode, no customer impact | validation checkpoint, challenger review | hold expansion, rollback candidate |
| Citation checker incomplete | source allowlist, no-answer fallback | citation QA, unsupported claim metric | fix parser, remove source format |
| HITL capacity below standard | lower traffic, queue limit | queue aging KRI, reviewer load dashboard | route to manual backlog, stop release |
| Tool logging gap | read-only mode, deny write tools | tool deny audit, SIEM rule | disable tool, add log fields |
| Privacy evidence gap | data minimization, redaction | DLP sample, privacy review | purge logs, tighten route |
| Vendor evidence gap | fallback vendor, reduced data | SLA and incident monitoring | switch route, stop vendor use |
| Evidence automation gap | manual evidence pack owner | daily completeness check | block renewal until automated |
6.3 Compensating controls for agentic AI
| Agent risk | Compensating control |
|---|---|
| Tool chain expands beyond scope | tool gateway with exception-specific allowlist |
| Agent delegates to another agent | identity propagation, delegation policy, trace parent id |
| Multi-step plan hides risk | plan approval before action, step-level policy checks |
| Write action causes irreversible change | dry-run, human approval token, idempotency and reversal path |
| Prompt injection changes tool use | instruction hierarchy, content isolation, policy post-check |
| Agent loops or escalates cost | budget cap, step cap, timeout, emergency stop |
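The "budget cap, step cap, timeout, emergency stop" row can be illustrated with a small guard object that an agent loop calls before each step. This is a sketch, not a reference to any agent framework: `AgentGuard`, `AgentBudgetExceeded` and the cost unit are assumptions for this example.

```python
import time
from typing import Callable


class AgentBudgetExceeded(RuntimeError):
    """Raised when an agent run breaches a cap set by the waiver."""


class AgentGuard:
    def __init__(self, max_steps: int, max_cost: float, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_steps = max_steps
        self.max_cost = max_cost
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the timeout can be tested
        self._started = clock()
        self.steps = 0
        self.cost = 0.0

    def charge(self, cost: float) -> None:
        """Record one step and its cost; raise as soon as any cap is breached."""
        self.steps += 1
        self.cost += cost
        if self.steps > self.max_steps:
            raise AgentBudgetExceeded("step cap")
        if self.cost > self.max_cost:
            raise AgentBudgetExceeded("budget cap")
        if self._clock() - self._started > self.timeout_s:
            raise AgentBudgetExceeded("timeout")
```

The orchestrator would catch `AgentBudgetExceeded`, halt the run and log the breach against the waiver's exception id, which is how the cap becomes both a preventive control and a piece of evidence.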
| Third-party tool changes behavior | contract pinning, vendor change notice, fallback |
6.4 Control evidence
| Control | Evidence |
|---|---|
| Traffic cap | release config, exposure report |
| Excluded intents | policy test, production deny logs |
| Human sample review | sample list, reviewer, outcome, defect reason |
| Source allowlist | registry snapshot, retrieval trace |
| No write tools | tool gateway config, deny logs |
| DLP route | redaction report, blocked request log |
| Hard stop test | kill switch drill or feature flag proof |
| Expiry enforcement | scheduled review, automated disable config |
7. Expiry and Renewal
7.1 Expiry rule
Every waiver needs an expiry date, review owner, forum, renewal criteria, closure criteria, stop criteria and evidence bundle. No expiry means no waiver approval.
7.2 Recommended maximums
| Risk tier | Normal maximum duration | Renewal posture |
|---|---|---|
| Low | 60-90 days | one renewal with owner approval |
| Medium | 30-60 days | renewal requires evidence and control owner |
| High | 14-45 days | renewal requires cross-functional approval |
| Critical | shortest practical window | renewal discouraged; executive review required |
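The maximums above can be enforced when an expiry date is proposed. The sketch below assumes the upper bound of each range is the hard limit and that the critical tier has no numeric default, so it always goes to manual review; `MAX_DAYS`, `max_expiry` and `duration_ok` are illustrative names.

```python
from datetime import date, timedelta
from typing import Optional

# Upper bound of each recommended range; critical has no numeric default.
MAX_DAYS = {"low": 90, "medium": 60, "high": 45, "critical": None}


def max_expiry(tier: str, approval: date) -> Optional[date]:
    """Latest permitted expiry date, or None when the tier needs manual review."""
    days = MAX_DAYS[tier]  # unknown tiers raise KeyError rather than pass silently
    return approval + timedelta(days=days) if days is not None else None


def duration_ok(tier: str, approval: date, expiry: date) -> bool:
    """True only if the expiry falls after approval and within the tier maximum."""
    limit = max_expiry(tier, approval)
    return limit is not None and approval < expiry <= limit
```

With the FAQ pilot's dates, `duration_ok("high", date(2026, 7, 1), date(2026, 7, 30))` holds, while the same call for the critical tier returns False and forces an explicit executive decision.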
7.3 Renewal criteria
Renewal should require: business rationale still valid; no hard stop triggered; KRI stable; compensating controls operated as designed; remediation progress; scope not expanded; residual risk owner re-accepts risk; new expiry is shorter or tied to a concrete delivery date.
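These criteria lend themselves to a checklist that returns every blocker rather than a single yes/no. A minimal sketch, with `RenewalRequest` and `renewal_blockers` as hypothetical names and "shorter" interpreted as fewer days than the previous approval period:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenewalRequest:
    rationale_still_valid: bool
    hard_stop_triggered: bool
    kri_stable: bool
    controls_operated: bool
    remediation_progress: bool
    scope_expanded: bool
    owner_reaccepted: bool
    previous_days: int
    requested_days: int
    tied_to_delivery_date: bool


def renewal_blockers(r: RenewalRequest) -> list[str]:
    """Return every unmet renewal criterion; an empty list means eligible."""
    checks = [
        (not r.rationale_still_valid, "business rationale no longer valid"),
        (r.hard_stop_triggered, "hard stop triggered"),
        (not r.kri_stable, "KRI not stable"),
        (not r.controls_operated, "compensating controls did not operate as designed"),
        (not r.remediation_progress, "no remediation progress"),
        (r.scope_expanded, "scope expanded"),
        (not r.owner_reaccepted, "residual risk owner has not re-accepted"),
        (r.requested_days >= r.previous_days and not r.tied_to_delivery_date,
         "new expiry is neither shorter nor tied to a delivery date"),
    ]
    return [reason for failed, reason in checks if failed]
```

Listing all blockers at once lets the review forum see the full gap in one pass instead of discovering criteria one rejection at a time.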
7.4 Repeat renewal escalation
| Signal | Required action |
|---|---|
| second renewal | risk forum review and remediation funding decision |
| third renewal | executive review, policy/control baseline reassessment |
| over 90 days active for high-tier | board/audit committee visibility if material |
| same reason across multiple teams | platform control investment or policy update |
| expired but active | incident or management issue, not administrative delay |
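The escalation signals can be turned into a rule function that a dashboard job runs per waiver. This sketch makes two assumptions that the table leaves open: a third or later renewal supersedes the second-renewal action, and the 90-day rule is applied to the high tier only. The function name `escalation_actions` is illustrative.

```python
def escalation_actions(*, renewal_count: int, days_active: int, risk_tier: str,
                       expired_but_active: bool,
                       teams_with_same_reason: int) -> list[str]:
    """Map the signals in the table above to their required actions."""
    actions = []
    if expired_but_active:
        actions.append("treat as incident or management issue")
    if renewal_count >= 3:
        actions.append("executive review and policy/control baseline reassessment")
    elif renewal_count == 2:
        actions.append("risk forum review and remediation funding decision")
    if risk_tier == "high" and days_active > 90:
        actions.append("board/audit committee visibility if material")
    if teams_with_same_reason > 1:
        actions.append("platform control investment or policy update")
    return actions
```

A waiver on its first approval with no other signals returns an empty list, so anything the function returns is by construction something a forum has to act on.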
7.5 Shadow policy test
Ask whether any of the following is true: the same exception is active for more than one review cycle; teams design around the waiver as normal practice; the reason recurs across products; management stops discussing exit; controls remain unfunded; or audit would see a gap between written policy and operations.
8. No-Go Criteria and Hard Stops
8.1 No-go criteria
| No-go condition | Reason |
|---|---|
| prohibited use is implicated | exceptions cannot override prohibited uses |
| residual risk owner lacks authority | accountability mismatch |
| scope cannot be enforced | waiver may spread beyond approval |
| hard stop cannot be executed | risk cannot be contained |
| customer harm is ongoing | issue/incident path required |
| sensitive data route unapproved | privacy/security/third-party risk unacceptable |
| control gap hides final decision automation | automation boundary not transparent |
| evidence cannot be retained | audit and model risk cannot reconstruct decision |
| renewal history shows chronic non-remediation | waiver is becoming policy |
8.2 Hard stop examples
| Use case | Hard stop |
|---|---|
| Customer service FAQ | confirmed wrong fee commitment > 0 |
| Complaint triage | complaint escalation miss > 0 |
| Credit memo copilot | adverse action reason mismatch in any customer-impacting path |
| Wealth assistant | personalized recommendation breach > 0 |
| AML investigation copilot | AI-generated SAR / no-SAR conclusion observed |
| Fraud agent | tool action changes account state without approval token |
| RAG policy assistant | unsupported citation in high-risk slice > threshold |
| Agent workflow | tool loop, cost spike, or delegated action outside allowlist |
| Privacy-sensitive assistant | PII sent to unapproved model route |
| Third-party model route | vendor SLA or data-processing evidence becomes invalid |
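Hard stops only work if they are measurable. The sketch below encodes the customer service FAQ thresholds from the memo template in section 5.2 (wrong fee commitment > 0, complaint miss > 0, unsupported citation rate > 2%, trace completeness < 98%) and evaluates them against a metrics snapshot. The metric keys and the `HardStop` and `triggered` names are assumptions, as is the choice to treat a missing metric as a breach.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class HardStop:
    metric: str
    breach: Callable[[float, float], bool]  # breach(value, threshold)
    threshold: float


FAQ_HARD_STOPS = [
    HardStop("wrong_fee_commitments", operator.gt, 0),
    HardStop("complaint_misses", operator.gt, 0),
    HardStop("unsupported_citation_rate", operator.gt, 0.02),
    HardStop("trace_completeness", operator.lt, 0.98),
]


def triggered(stops: list[HardStop], metrics: Mapping[str, float]) -> list[str]:
    """Return the metrics whose hard stop fired.

    Assumption: a metric missing from the snapshot counts as a breach,
    because an unmeasured hard stop cannot be shown to be green.
    """
    fired = []
    for stop in stops:
        value = metrics.get(stop.metric)
        if value is None or stop.breach(value, stop.threshold):
            fired.append(stop.metric)
    return fired
```

Any non-empty result would start the stop rule runbook in 8.3; an empty list corresponds to the "green, no hard stops" status used in the dashboard example.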
8.3 Stop rule runbook
Trigger detected
-> classify severity
-> pause or cap feature by exception_id
-> disable affected tool/model/channel if needed
-> notify product, risk, operations, compliance, security/privacy as relevant
-> preserve traces and evidence
-> review affected customers/cases
-> decide rollback, remediation, customer correction or incident escalation
-> update waiver status
8.4 Stop authority
| Role | Stop authority |
|---|---|
| Release owner | pause ramp or traffic cap |
| Operations lead | route to manual queue |
| Security lead | disable unsafe tool or route |
| Privacy lead | stop data flow |
| Risk/compliance lead | stop customer-facing use |
| Executive owner | terminate material waiver |
9. Dashboards and KRIs
9.1 Exception dashboard
| Metric | Meaning |
|---|---|
| Active exceptions by risk tier | overall residual risk exposure |
| Exceptions by domain | credit, wealth, AML, fraud, service, operations |
| Exceptions by control domain | eval, model risk, privacy, security, third-party, operations |
| Aging | days active and days to expiry |
| Renewal count | shadow policy signal |
| Expired but active | governance breach |
| Hard stops triggered | containment effectiveness |
| KRI breaches | risk is outside waiver conditions |
| Remediation progress | whether control debt is closing |
| Evidence completeness | audit readiness |
| Exceptions linked to incidents | whether waivers contributed to harm |
| Board-reportable exceptions | material residual risk visibility |
9.2 KRI catalog
| KRI | Definition | Management use |
|---|---|---|
| Exception aging | active days since approval | spot stalled remediation |
| Repeat renewal rate | renewals / active exceptions | identify shadow policy |
| Expired active exceptions | expired exceptions still in production | immediate escalation |
| Control gap concentration | same control waived across teams | platform investment need |
| High-tier exception count | high/critical open waivers | risk appetite pressure |
| Hard stop trigger count | triggered stops by use case | system risk and control quality |
| Evidence completeness | required evidence present and current | audit readiness |
| Remediation slippage | missed control fix dates | governance effectiveness |
| Exception incident linkage | incidents linked to active waivers | risk acceptance quality |
| Third-party exception exposure | vendor-related open exceptions | supplier risk |
| Consumer harm signal | complaints, appeals, upheld cases linked to waiver scope | customer protection |
| Human review overload | review SLA breach under waiver | compensating control failure |
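Several of these KRIs can be derived directly from the exception register. The following sketch computes four of them from a list of waiver records. The `Waiver` record, the `kri_snapshot` function and the second exception id are hypothetical; repeat renewal rate follows the table's definition literally (total renewals divided by active exceptions).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Waiver:
    exception_id: str
    risk_tier: str
    approved: date
    expiry: date
    renewals: int
    in_production: bool


def kri_snapshot(waivers: list[Waiver], today: date) -> dict:
    """Compute a subset of the KRI catalog for waivers still in production."""
    active = [w for w in waivers if w.in_production]
    return {
        # Exception aging: days since approval, worst case across the portfolio.
        "max_aging_days": max(((today - w.approved).days for w in active), default=0),
        # Repeat renewal rate: renewals / active exceptions.
        "repeat_renewal_rate": sum(w.renewals for w in active) / len(active) if active else 0.0,
        # Expired active exceptions: past expiry but still running.
        "expired_active": [w.exception_id for w in active if today > w.expiry],
        # High-tier exception count: open high or critical waivers.
        "high_tier_count": sum(w.risk_tier in {"high", "critical"} for w in active),
    }


register = [
    Waiver("AI-WVR-2026-0042", "high", date(2026, 7, 1), date(2026, 7, 30), 0, True),
    Waiver("AI-WVR-2026-0007", "medium", date(2026, 5, 1), date(2026, 6, 30), 2, True),  # hypothetical
]
```

Returning the ids of expired but active waivers, not just a count, lets the dashboard link straight to the records that need immediate escalation.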
9.3 Board and audit committee view
| Board question | Dashboard answer |
|---|---|
| Are we operating outside approved AI controls? | count and severity of active high/critical waivers |
| Are exceptions temporary? | aging, expiry and renewal pattern |
| Are customers exposed? | customer-facing scope, complaint and harm signals |
| Who accepted residual risk? | accountable executive roles |
| Are controls compensating effectively? | KRI status and evidence completeness |
| Are repeat exceptions creating shadow policy? | repeat reasons and policy update decisions |
| Are GenAI/agentic risks covered cross-functionally? | model, operational, privacy, security, third-party and compliance view |
9.4 Example dashboard row
| Field | Example |
|---|---|
| Exception ID | AI-WVR-2026-0042 |
| Use case | Retail Service Low-Risk FAQ Pilot |
| Risk tier | High |
| Control gap | PDF table citation checker incomplete |
| Scope | 5% customer FAQ traffic, low-risk intents only |
| Residual risk owner | Head of Retail Service |
| Compensating controls | daily QA, source allowlist, no write tools, complaint hard-route |
| Expiry | 2026-07-30 |
| KRI status | green, no hard stops |
| Evidence status | 98.9% trace completeness, daily QA complete |
| Renewal count | 0 |
| Board visibility | included in monthly AI risk MI if extended |
10. RACI
10.1 Core roles
| Role | Accountability |
|---|---|
| Business owner | owns business value and accepts business residual risk within authority |
| AI Product Manager | defines scope, value, user impact, release conditions and product guardrails |
| AI BA | maps policy/control gap to requirements, workflow, evidence and stakeholder decisions |
| Product Architect | maps waiver conditions to runtime architecture and controls |
| Model Risk | evaluates model intended use, validation gap, monitoring and effective challenge |
| Operational Risk | evaluates process, human control, capacity, incident and control operation |
| Compliance | evaluates consumer, regulatory, disclosure, complaint and record implications |
| Privacy | evaluates data purpose, minimization, consent, retention and rights |
| Security | evaluates access, gateway, tool abuse, logging, SIEM and incident path |
| Third-Party Risk | evaluates vendor evidence, SLA, data terms, resilience and exit |
| Operations | runs human review, QA sampling, queues and fallback processes |
| Release Governance | ensures approval routing, evidence, expiry and dashboard |
| Internal Audit | assesses design and evidence quality without owning management risk |
| Board / Audit Committee | receives material exception visibility and challenges chronic exposure |
10.2 RACI shorthand
| Activity | Accountable | Responsible | Consulted |
|---|---|---|---|
| Intake and residual risk memo | Business owner | PM / BA / Release Governance | Architect, Risk, Compliance, Privacy, Security, TPRM, Ops |
| Control gap and compensating controls | Architect / Operational Risk | BA / PM / Ops | Model Risk, Compliance, Privacy, Security |
| Approval routing and expiry review | Release Governance | PM / BA | all required control owners by risk tier |
| Runtime enforcement and KRI dashboard | Architect / Operations | Platform / Ops / Release Governance | Product, Risk, Security, Privacy |
| Board reporting | Business owner / Release Governance | Risk reporting owner | Product, Architecture, Control owners |
10.3 Forum design
| Forum | Scope | Cadence |
|---|---|---|
| Daily exception triage | low/medium intake, expiring exceptions, evidence gaps | daily or twice weekly |
| Weekly AI risk review | high-tier waivers, KRI trends, renewal requests | weekly |
| Material AI governance forum | critical waivers, customer-impacting exceptions, cross-domain disputes | as needed |
| Monthly management information review | portfolio exposure, aging, repeat exceptions, remediation funding | monthly |
| Quarterly board/audit package | material residual risk, policy drift, chronic exceptions, audit findings | quarterly |
11. Financial Retail Examples
11.1 Credit policy copilot
Scenario: AI drafts credit policy memos for underwriters; standard control requires full fair-lending regression before expanded pilot; a small-business segment is incomplete.
| Area | Decision |
|---|---|
| Scope | employee-only memo draft, no final credit decision, no adverse action notice |
| Duration | 30 days |
| Residual risk | underwriter may over-trust draft in new segment |
| Compensating controls | mandatory underwriter attestation, second-line sample review, no customer communication |
| Hard stop | any reason-code mismatch in customer-impacting path |
| Evidence | underwriter review logs, sample review, eval expansion plan |
No-go: AI automatically generates final adverse action reasons under incomplete validation.
11.2 Wealth guidance assistant
Scenario: AI supports advisor preparation and client education; advice boundary classifier has not completed edge-case testing for retirement product prompts.
| Area | Decision |
|---|---|
| Scope | advisor-facing only, meeting prep and educational summaries |
| Exclusion | no direct client personalized recommendation |
| Compensating controls | licensed advisor final review, approved educational content, advice breach monitoring |
| Hard stop | personalized buy/sell recommendation generated without advisor mediation |
| Expiry | classifier edge-case test completion date |
No-go: customer-facing robo-advice with incomplete advice boundary.
11.3 Customer service AI FAQ
Scenario: Customer-facing low-risk FAQ pilot has incomplete PDF table citation automation.
| Area | Decision |
|---|---|
| Scope | 5% low-risk FAQ sessions |
| Exclusion | complaints, fee waivers, credit commitments, account closure |
| Compensating controls | source allowlist, no-answer fallback, daily QA, route to human |
| Hard stop | wrong fee commitment > 0 or complaint miss > 0 |
| Evidence | citation QA, trace samples, customer escalation log |
11.4 AML investigation copilot
Scenario: AI summarizes transaction patterns and drafts narrative; tool audit log lacks one required attribute for analyst override reason.
| Area | Decision |
|---|---|
| Scope | internal draft only, no SAR/no-SAR decision |
| Compensating controls | analyst-in-control, manual override reason field, weekly QA |
| Hard stop | AI-generated final SAR conclusion or missing case trace |
| Evidence | case review records, manual override extract, tool trace |
No-go: AI submits SAR or decides no-SAR with incomplete evidence controls.
11.5 Fraud operations agent
Scenario: Agent can recommend fraud queue actions; write-enabled account restriction tool is technically available but reversal process is not tested.
| Area | Decision |
|---|---|
| Scope | read-only and recommendation mode |
| Not approved | account restriction write action |
| Compensating controls | fraud analyst approval, dry-run tool output, queue monitoring |
| Hard stop | any account state change without approval token |
| Exit | complete reversal test and dual-control design |
11.6 Third-party model route
Scenario: Vendor model performs better for Spanish-language support; updated vendor evidence for data retention is due but not yet received.
| Condition | Required control |
|---|---|
| Data minimized | no sensitive account details in prompt |
| Region controlled | approved endpoint only |
| Duration limited | short expiry aligned to vendor evidence date |
| Fallback available | route to existing approved model |
| Monitoring active | data route and vendor SLA dashboard |
No-go: PII or restricted data sent to unapproved or unverified route.
12. Templates
12.1 Exception intake form
| Field | Example |
|---|---|
| Identification | AI-WVR-2026-0042; Retail Service FAQ Pilot; Customer Service; request owner AI PM; business owner Head of Retail Service |
| Baseline | high risk because output is customer-facing; CTRL-CITATION-PDF-004 requires automated citation support check |
| Deviation | PDF fee table citation support incomplete; 30-day waiver requested; 5% traffic cap; complaints, fee waivers, credit commitments and account closure excluded |
| Residual risk | customer may receive incomplete source support in low-risk FAQ |
| Controls | source allowlist, no-answer fallback, daily 100-case QA, route-to-human, no write tools |
| Hard stops | wrong fee commitment > 0; complaint miss > 0; unsupported citation sample > 2%; trace completeness < 98% |
| Evidence and exit | release config, source registry, QA, trace dashboard, incident log; close after checker regression or stop pilot |
12.2 Residual risk acceptance table
| Field | Example |
|---|---|
| Risk event | Low-risk FAQ answer contains unsupported citation |
| Potential impact | customer confusion, trust impact, complaint |
| Inherent severity | medium |
| Compensating controls | source allowlist, no-answer fallback, daily QA |
| Residual severity | low to medium within limited scope |
| Residual risk owner | Head of Retail Service |
| Acceptance period | 2026-07-01 to 2026-07-30 |
| Review cadence | weekly KRI, daily QA |
| Hard stop | unsupported citation sample rate > 2% |
| Evidence | QA log, trace report, KRI dashboard |
12.3 Compensating control matrix
| Control gap | Risk | Compensating control | Owner | Evidence | Frequency |
|---|---|---|---|---|---|
| PDF citation parser incomplete | unsupported source claim | daily sample review and no-answer fallback | Ops QA Lead | sample log and fallback metric | daily |
| trace completeness below standard | audit reconstruction gap | daily trace completeness check | Observability Owner | dashboard export | daily |
| review queue capacity uncertain | delayed human escalation | traffic cap and queue aging alert | Operations Lead | queue report | hourly |
| vendor evidence pending | data-processing uncertainty | data minimization and fallback route | TPRM Owner | route log and vendor tracker | weekly |
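Each row of the matrix pairs a control with an evidence frequency, which makes it possible to detect a compensating control that has quietly stopped producing evidence. The following Python sketch illustrates the idea; the record fields (`control`, `frequency`, `last_evidence_at`) and the maximum ages per frequency are assumptions, not part of the templates above.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumed maximum evidence age for each frequency used in the matrix.
MAX_EVIDENCE_AGE = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
}


def stale_controls(controls: List[Dict], now: datetime) -> List[str]:
    """Return control names whose latest evidence is older than the frequency allows."""
    return [
        c["control"]
        for c in controls
        if now - c["last_evidence_at"] > MAX_EVIDENCE_AGE[c["frequency"]]
    ]


# Illustrative records; timestamps are made up for the example.
controls = [
    {"control": "daily sample review", "frequency": "daily",
     "last_evidence_at": datetime(2026, 7, 9, 10, 0)},
    {"control": "queue aging alert", "frequency": "hourly",
     "last_evidence_at": datetime(2026, 7, 10, 6, 30)},
    {"control": "vendor tracker", "frequency": "weekly",
     "last_evidence_at": datetime(2026, 7, 6, 9, 0)},
]
print(stale_controls(controls, now=datetime(2026, 7, 10, 9, 0)))
```

In this example only the hourly queue aging alert is reported as stale, because its last evidence is two and a half hours old.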
12.4 Waiver approval record
| Field | Example |
|---|---|
| Decision | limited approval for AI-WVR-2026-0042 |
| Approved / not approved | approved 5% low-risk FAQ traffic; not approved high-risk intents, customer-specific advice or write-enabled tools |
| Dates | approval 2026-07-01; expiry 2026-07-30 |
| Approvers | Business owner, Risk partner, Compliance, Privacy, Security, Product architect, Release Governance |
| Conditions | daily QA before next-day ramp, no hard stop triggered, weekly expiry review, daily evidence binder update |
12.5 Board/audit committee summary
| Field | Example |
|---|---|
| Portfolio exposure | 4 active high/critical exceptions; 2 customer-facing; 0 critical; 0 expired but still active; 1 with two or more renewals |
| Material exception | AI-WVR-2026-0042, Retail Service Low-Risk FAQ Pilot, Head of Retail Service owns residual risk |
| Control gap and scope | PDF table citation automation incomplete; 5% low-risk FAQ traffic; expiry 2026-07-30 |
| Status | KRI green; no hard stops triggered; pilot stops if citation checker regression is not completed |
| Management attention | recurring citation-control waiver across service products indicates platform investment need |
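The portfolio exposure line in this summary is an aggregation over the exception registry. A minimal Python sketch of that roll-up follows, assuming a hypothetical registry record with `status`, `risk_tier`, `customer_facing`, `expiry` and `renewals` fields; the identifiers other than AI-WVR-2026-0042 are invented for the example.

```python
from datetime import date
from typing import Dict, List


def portfolio_exposure(exceptions: List[Dict], today: date) -> Dict[str, int]:
    """Count the board-level exposure figures over active exceptions only."""
    active = [e for e in exceptions if e["status"] == "active"]
    return {
        "active_high_or_critical": sum(e["risk_tier"] in {"high", "critical"} for e in active),
        "customer_facing": sum(bool(e["customer_facing"]) for e in active),
        "critical": sum(e["risk_tier"] == "critical" for e in active),
        # Still marked active although the expiry date has passed.
        "expired_active": sum(e["expiry"] < today for e in active),
        "repeat_renewals": sum(e["renewals"] >= 2 for e in active),
    }


# Illustrative registry extract (hypothetical records).
registry = [
    {"id": "AI-WVR-2026-0042", "status": "active", "risk_tier": "high",
     "customer_facing": True, "expiry": date(2026, 7, 30), "renewals": 0},
    {"id": "EXAMPLE-B", "status": "active", "risk_tier": "high",
     "customer_facing": False, "expiry": date(2026, 7, 5), "renewals": 2},
    {"id": "EXAMPLE-C", "status": "closed", "risk_tier": "low",
     "customer_facing": True, "expiry": date(2026, 6, 1), "renewals": 0},
]
print(portfolio_exposure(registry, today=date(2026, 7, 10)))
```

Counting only records with `status == "active"` is a simplification; a real registry would likely also distinguish pending, suspended and stopped waivers.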
12.6 Expiry review decision
| Field | Example |
|---|---|
| Review | AI-WVR-2026-0042; review 2026-07-25; expiry 2026-07-30 |
| Evidence | KRI dashboard, QA sample logs, trace completeness, complaint linkage, citation checker remediation |
| Options | close, renew, remediate, stop, convert to formal policy/control baseline review |
| Decision | close after checker regression passes and production trace confirms coverage |
| Conditions | no traffic expansion until standard gate; regression failures enter remediation; evidence retained |
13. Architecture Pattern
13.1 Exception control plane
| Component | Purpose |
|---|---|
| Exception registry | source of truth for waiver id, scope, owner, expiry, approvals |
| Control catalog | maps risk tier to required controls and waivable/non-waivable status |
| Policy engine | enforces scope, deny rules, hard exclusions |
| Release orchestrator | controls traffic cap, channel, model route, feature flag |
| Tool gateway | enforces read/write permission and approval token |
| Model gateway | enforces vendor/model/data route and logging requirements |
| EvalOps pipeline | runs exception-specific regression and sample review |
| Observability layer | emits exception_id and control-gap telemetry |
| Evidence binder | stores memo, approval, KRI, logs, review and expiry decisions |
| Incident integration | links hard stops to incident and remediation workflow |
| Management dashboard | reports aging, renewal, KRI, board/audit visibility |
13.2 Runtime tagging
ai.exception_id, ai.exception_scope, ai.risk_tier, ai.control_gap_id, ai.residual_risk_owner, ai.expiry_date, ai.model_version, ai.prompt_version, ai.rag.source_registry_version, ai.tool.policy_version, ai.human_review_required, ai.hard_stop_profile
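Because trace completeness is itself a hard-stop metric in the templates, it helps to define it against this tag list. The Python sketch below shows one possible definition, assuming each trace is available as a flat dictionary of attributes; the function names and the rule that an empty string counts as missing are assumptions for illustration.

```python
from typing import Dict, List

# Tag names copied from the runtime tagging list above.
REQUIRED_TAGS = (
    "ai.exception_id", "ai.exception_scope", "ai.risk_tier", "ai.control_gap_id",
    "ai.residual_risk_owner", "ai.expiry_date", "ai.model_version", "ai.prompt_version",
    "ai.rag.source_registry_version", "ai.tool.policy_version",
    "ai.human_review_required", "ai.hard_stop_profile",
)


def missing_tags(attributes: Dict) -> List[str]:
    """Return required tags that are absent, None or empty strings.

    A value of False (for example ai.human_review_required) counts as present.
    """
    return [
        tag for tag in REQUIRED_TAGS
        if attributes.get(tag) is None or attributes.get(tag) == ""
    ]


def trace_completeness(traces: List[Dict]) -> float:
    """Share of traces that carry every required tag; 0.0 when there are no traces."""
    if not traces:
        return 0.0
    complete = sum(1 for attributes in traces if not missing_tags(attributes))
    return complete / len(traces)
```

Returning 0.0 for an empty sample is a deliberate fail-closed choice, so that a silent telemetry outage reads as a breach of the 98% threshold rather than as a pass.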
13.3 Enforcement flow
request enters AI gateway
-> check exception_id active
-> check current date before expiry
-> check user/channel/data/tool/model in approved scope
-> apply exception-specific policy bundle
-> emit trace tags
-> route allowed request
-> deny or route to human if any check fails (inactive, expired or outside scope)
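The checks in this flow can be sketched as a single gateway decision function. The Python below is an illustration under stated assumptions, not a reference implementation: the `Waiver` and `Request` types, the reason codes, the choice to route out-of-scope intents to a human while denying other scope violations, and the treatment of the expiry date as the last valid day are all hypothetical. It covers only the activity, expiry and scope checks, not the policy bundle or trace emission.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Waiver:
    exception_id: str
    active: bool
    expiry: date                 # assumed to be the last day the waiver is valid
    channels: FrozenSet[str]
    intents: FrozenSet[str]
    models: FrozenSet[str]
    tools: FrozenSet[str]        # an empty set means no tools are approved


@dataclass(frozen=True)
class Request:
    channel: str
    intent: str
    model: str
    tools: FrozenSet[str]


def decide(waiver: Waiver, request: Request, today: date) -> Tuple[str, str]:
    """Return (action, reason) for a request running under a waiver."""
    if not waiver.active:
        return ("deny", "waiver_inactive")
    if today > waiver.expiry:
        return ("deny", "waiver_expired")
    if request.channel not in waiver.channels:
        return ("deny", "channel_out_of_scope")
    if request.model not in waiver.models:
        return ("deny", "model_out_of_scope")
    if not request.tools <= waiver.tools:
        return ("deny", "tool_out_of_scope")
    if request.intent not in waiver.intents:
        # Assumption: excluded intents (e.g. complaints) still need an answer.
        return ("route_to_human", "intent_out_of_scope")
    return ("allow", "in_scope")
```

Checking activity and expiry before scope keeps the cheapest and most decisive checks first, and returning a reason code gives the observability layer something concrete to record for each denial.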
14. 30-Day Lab
| Days | Focus | Deliverables |
|---|---|---|
| 1-3 | Select use case and baseline | use case one-pager, risk appetite statement, risk tier, standard control baseline |
| 4-6 | Identify exception scenario | control gap memo, non-waivable boundary list, alternatives considered |
| 7-9 | Draft risk acceptance memo | business rationale, residual risk, scope, controls, approval roles, hard stops, expiry |
| 10-12 | Design compensating controls | preventive/detective/corrective matrix, owner, evidence, workflow diagram |
| 13-15 | Design architecture enforcement | exception registry fields, runtime tags, policy checks, tool restrictions, kill switch |
| 16-18 | Build dashboard specification | active exceptions, aging, renewals, expired active, KRI status, evidence completeness |
| 19-21 | Write templates | intake form, approval record, expiry review memo, board summary, evidence binder |
| 22-24 | Simulate hard stop | incident timeline, stop action, customer/case review, remediation update |
| 25-27 | Conduct expiry review | close, renew, remediate, convert to policy update or stop decision |
| 28-30 | Interview and portfolio pack | 30-second answer, 2-minute answer, CRO version, architect version, board/audit explanation |
Success criteria: no open-ended waiver; no unowned residual risk; no missing hard stop; no unenforceable scope; no evidence gap; clear explanation of why the exception is not shadow policy.
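Most of these success criteria can be checked as a lint over a waiver record before it is submitted for approval. The Python sketch below shows one way to do that, assuming a hypothetical record with `expiry`, `residual_risk_owner`, `hard_stops`, `scope_enforced_by` and `evidence` fields. The final criterion, a clear explanation of why the exception is not shadow policy, is a judgment call and is deliberately left to human review.

```python
from typing import Dict, List


def waiver_findings(waiver: Dict) -> List[str]:
    """Return the success criteria that a waiver record fails.

    Field names are illustrative; a missing or empty field counts as a failure.
    """
    checks = [
        ("open-ended waiver", bool(waiver.get("expiry"))),
        ("unowned residual risk", bool(waiver.get("residual_risk_owner"))),
        ("missing hard stop", bool(waiver.get("hard_stops"))),
        # Scope is enforceable only if at least one runtime control applies it.
        ("unenforceable scope", bool(waiver.get("scope_enforced_by"))),
        ("evidence gap", bool(waiver.get("evidence"))),
    ]
    return [finding for finding, passed in checks if not passed]


# Illustrative draft record based on the templates in section 12.
draft = {
    "expiry": "2026-07-30",
    "residual_risk_owner": "Head of Retail Service",
    "hard_stops": ["unsupported citation sample rate > 2%"],
    "scope_enforced_by": ["feature flag", "policy engine"],
    "evidence": ["QA log", "trace report", "KRI dashboard"],
}
print(waiver_findings(draft))
```

An empty result means the record passes the mechanical checks; it does not mean the waiver should be approved.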
15. Interview Answers
15.1 30-second answer
AI risk appetite defines the baseline; waiver management governs controlled deviations from that baseline. For every AI exception, I require a specific control gap, narrow scope, residual risk owner, compensating controls, expiry date, hard stop conditions, evidence plan and renewal criteria. I also track aging and repeat renewals, because recurring waivers can become shadow policy. For GenAI and agentic AI, I do not treat this as only model risk; I combine model risk, operational risk, consumer compliance, privacy, security and third-party controls.
15.2 2-minute answer
I would manage AI exceptions as a formal risk acceptance lifecycle. First, I start from the approved risk appetite and control catalog. If a use case cannot satisfy a standard control, the team must identify the exact policy or control being waived, not just say “we need an exception.” Then I classify the exception by customer impact, automation level, data sensitivity, regulatory relevance, reversibility and control gap severity.
Second, I write a residual risk memo. It explains the business reason, alternatives considered, limited scope, what harm could occur, who accepts the residual risk and for how long. The waiver must include compensating controls, such as traffic caps, employee-only scope, source allowlists, human review, QA sampling, no write tools, extra monitoring or fallback routing.
Third, I make the waiver operational. The exception registry connects to feature flags, the policy engine, model gateway, tool gateway, telemetry and evidence binder. Every request under the waiver carries an exception id, scope, risk tier, control gap and expiry. Hard stops are pre-approved: for example, a wrong fee commitment, a missed complaint escalation, PII routed to an unapproved model, or a tool write without approval immediately pauses the feature.
Finally, I manage expiry. At expiry, a waiver is closed, renewed with new evidence, remediated, converted to a formal policy update or stopped. It cannot silently continue. Repeat renewals, expired active exceptions and recurring control gaps are escalated to management and, when material, included in board or audit committee reporting. That prevents exceptions from becoming permanent shadow policy.
15.3 CRO version
I would focus on residual risk accountability and aggregate exposure. The CRO should see which AI controls are being waived, which high/critical use cases are affected, who accepted residual risk, what compensating controls are operating, which KRIs are near breach, which exceptions are aging and whether repeat waivers indicate policy drift or underfunded controls. I would also distinguish waivable control gaps from prohibited uses. A waiver cannot be used to approve an unauthorized final decision, an unapproved sensitive-data route or uncontrollable agent execution.
15.4 Chief Product Officer version
I would frame waivers as a way to learn safely, not as a way to bypass governance. Product teams can run limited pilots when the control gap is specific, the scope is narrow and the residual risk is accepted. But the waiver must shape the roadmap: if multiple teams keep requesting the same exception, that becomes a platform investment or policy decision. The product leader should track exception debt the same way they track technical debt, because unmanaged waivers slow down future releases and create audit risk.
15.5 Chief Architect version
I would implement exception management as a control plane. The exception registry should feed the policy engine, model gateway, tool gateway, release orchestrator, observability layer and evidence binder. Runtime checks should enforce expiry, user/channel/data/tool/model scope and hard exclusions. Every trace should carry the exception id, risk tier, control gap, model/prompt/source/tool versions and review status. The architecture must support artifact-level rollback, not just code rollback, because AI behavior can change through the prompt, RAG corpus, model route, tool schema or vendor configuration.
15.6 Internal audit version
I would ask whether management can reconstruct the decision and prove the waiver operated within approved boundaries. Evidence should include the policy baseline, control gap, approval roles, residual risk memo, compensating control tests, release configuration, trace samples, KRI history, hard stop drill and expiry decision. I would pay special attention to expired active exceptions, repeat renewals and cases where the same control is waived across multiple products, because those indicate shadow policy risk.
15.7 Board/audit committee version
At board or audit committee level, I would not show every low-risk waiver. I would show material exposure: active high/critical exceptions, customer-facing exceptions, exceptions linked to incidents, expired active items, repeat renewals, control-gap concentration and remediation progress. The key message is whether AI operations remain within approved risk appetite or whether exceptions are becoming the real operating model.
16. Common Failure Modes
| Failure mode | Symptom | Better practice |
|---|---|---|
| Waiver without expiry | “temporary” approval remains active for months | fixed expiry and automatic escalation |
| Waiver without control id | nobody knows what is being waived | map to policy/control catalog |
| Business-only approval | value owner approves own residual risk | tiered cross-functional approval |
| No runtime enforcement | scope exists only in memo | feature flag, policy engine, gateway control |
| No hard stop | team debates during breach | pre-approved stop conditions |
| Weak compensating controls | “manual monitoring” with no sample plan | defined owner, frequency and evidence |
| Evidence afterthought | audit pack assembled manually months later | evidence generated as workflow operates |
| Repeated renewal | same gap extended repeatedly | platform investment or policy review |
| Model-risk-only lens | privacy, security, operations and third party omitted | cross-domain review |
| Agent tool gap | waiver ignores delegated tool action | tool gateway and identity propagation |
| Expired active waiver | production still runs after expiry | incident escalation and disable path |
| Shadow policy | exceptions become standard practice | management review and formal baseline decision |
17. Final Memory Card
| Concept | One line |
|---|---|
| Risk appetite | baseline boundary for AI risk-taking |
| Waiver | time-boxed deviation from a standard control |
| Risk acceptance | accountable acceptance of residual risk |
| Compensating control | substitute control that reduces risk while gap exists |
| Expiry | date when waiver must close, renew, remediate, convert or stop |
| Hard stop | pre-approved condition that immediately pauses or rolls back use |
| Shadow policy | repeated or indefinite exception that becomes the real operating model |
| Board visibility | material exposure, aging, renewal, breach and remediation view |
| Agentic AI nuance | exception must cover model, tool, identity, delegation, security, privacy, operations and vendor risk |
Most important sentence:
A mature AI organization does not pretend exceptions will disappear; it designs them as controlled, temporary, evidenced residual-risk decisions and escalates them before they become shadow policy.