AI Uncertainty UX / Abstention / Confidence / Escalation Playbook
Positioning: An actionable playbook for CBAP+ Senior BAs, retail-finance AI PMs, Product Architects, Solution Architects, AI Governance Leads, Model Risk, Compliance Technology, Contact Center, Complaint Operations and Financial Crime Ops.

Core goal: Elevate AI uncertainty from a "confidence score" into a runnable system of product controls, experience language, human escalation, evidence packets, monitoring and governance.

Scope: credit eligibility assistant, wealth education / advice boundary, payment dispute claim assistant, AML analyst assistant, KYC document extraction, complaint intelligence, contact center copilot, RAG customer service, agentic workflow and employee copilot.
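Important note: This playbook is material for learning, portfolio work, architecture training and internal governance discussion. It does not constitute legal advice, a compliance determination, investment advice, a credit approval recommendation, an AML/SAR decision, a complaint-handling opinion, a model validation report or an audit conclusion. Formal projects must be confirmed by Legal, Compliance, Risk, Model Risk, Privacy, Security, Operations, the Business Owner, the Technology Owner and management, in light of institution type, jurisdiction, product, customer base, channel and internal policy.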
Source Anchors
| Source | Link | How this playbook uses it |
|---|---|---|
| Conformal prediction tutorial | https://arxiv.org/abs/2107.07511 | A technical anchor for uncertainty quantification; model confidence intervals are not treated as equivalent to product experience or escalation strategy |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize risk, control, monitoring and governance evidence |
| NIST GenAI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | Uses a generative-AI risk lens to design controls for hallucination, misuse, overreliance, incidents and human oversight |
| Microsoft Guidelines for Human-AI Interaction | https://www.microsoft.com/en-us/research/project/guidelines-for-human-ai-interaction/ | Draws on human-AI interaction principles for capability boundaries, uncertainty, feedback and recovery, and translates them into retail-finance runtime controls |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Uses the AI management system view to organize policy, roles, operation, performance evaluation, audit and continual improvement |
| ISO/IEC 23894 | https://www.iso.org/standard/77304.html | Uses the AI risk management view to organize risk identification, analysis, evaluation, treatment and review |
1. Purpose And When To Use
Use this playbook when an AI system may influence:
- customer understanding of eligibility, fees, disputes, complaints, account status or next steps;
- employee handling of regulated, sensitive or customer-impacting workflows;
- extraction, classification, summarization or recommendation used in operations;
- routing to human review, specialist queue, licensed advisor, AML investigator or complaint team;
- customer trust, harm prevention, support handoff, appeal or regulatory evidence.
Do not use uncertainty UX as a cosmetic disclaimer. Use it when the product must decide:
- Can AI answer?
- Can AI answer only part of the question?
- Should AI ask for more information?
- Should AI refuse safely?
- Should AI route to a human?
- Should AI block the action and trigger control review?
- What should the customer or employee see?
- What evidence proves the decision was appropriate?
1.1 Outcome
By the end of design, each use case should have:
| Artifact | Purpose |
|---|---|
| Uncertainty taxonomy | Names the reasons AI cannot provide a normal answer |
| Abstention rules | Maps triggers to answer / ask / refuse / escalate / block |
| Escalation matrix | Defines owner, SLA, handoff payload and customer message |
| Confidence language guide | Controls how uncertainty is communicated by audience |
| Customer communication test | Prevents false promises, over-refusal and misleading disclaimers |
| Evidence packet | Enables replay, audit, complaint review and governance |
| Monitoring dashboard | Tracks harm, friction, quality and control health |
2. Operating Model
2.1 Roles
| Role | Owns |
|---|---|
| Product Owner | Use-case boundaries, value, customer journey, action classes, release trade-offs |
| Senior BA | Decision taxonomy, scenarios, exception paths, evidence requirements, acceptance criteria |
| Solution Architect | Runtime architecture, integration, state machine, logging, workflow and resilience |
| AI / ML Lead | Model confidence, retrieval confidence, eval, drift and technical error analysis |
| Risk / Compliance | Policy boundary, approved language, disclosure, escalation triggers and residual risk |
| Operations Owner | Queue design, SLA, staffing, handoff quality, manual review and closure |
| CX / Content Design | Customer-safe language, accessibility, trust calibration and comprehension testing |
| Model Risk / Audit | Validation evidence, monitoring, control effectiveness and replayability |
| Data Governance / Privacy | Source permission, retention, consent, minimization and sensitive data controls |
2.2 Decision Cadence
| Cadence | Meeting | Decisions |
|---|---|---|
| Design | Uncertainty design workshop | taxonomy, action classes, boundary copy, escalation |
| Pre-release | Control readiness review | eval results, threshold evidence, operational capacity, residual risk |
| Weekly during launch | Launch monitoring standup | abnormal abstention, queue load, complaints, overrides, incidents |
| Monthly | Governance review | policy drift, segment disparity, complaint themes, CAPA |
| Quarterly | Management review | risk acceptance, investment, control maturity, audit findings |
2.3 Runtime Flow
    intent detected
      -> impact and sensitivity classified
      -> evidence state assessed
      -> policy boundary checked
      -> model / retrieval / tool confidence aggregated
      -> action class selected
      -> approved language applied
      -> handoff or answer delivered
      -> evidence packet recorded
      -> monitoring and learning loop updated
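The flow above can be sketched as a small deterministic selector. This is a minimal illustration under stated assumptions, not a reference implementation: the `Request` fields, the tier and band labels, and the precedence of the checks are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical signals gathered by the earlier steps of the flow."""
    intent: str
    impact_tier: str        # assumed labels: "low" or "high"
    evidence_complete: bool
    within_policy: bool
    tools_available: bool
    confidence_band: str    # assumed labels: "low", "medium" or "high"


def select_action_class(req: Request) -> str:
    """Return one action class; the first matching check wins (assumed precedence)."""
    if not req.within_policy:
        return "refuse"       # a policy boundary outranks every other signal
    if not req.tools_available:
        return "escalate"     # never guess source-of-record state
    if not req.evidence_complete:
        return "ask_more"
    if req.impact_tier == "high" and req.confidence_band != "high":
        return "escalate"     # high impact with less than high confidence
    return "answer"


# Example: a dispute question where merchant evidence is still missing.
example = Request("dispute_outcome_prediction", "high", False, True, True, "medium")
print(select_action_class(example))  # ask_more
```

Keeping the selector as plain code rather than prompt text means the same inputs always produce the same action class, which is what makes later replay possible.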
2.4 Minimum Viable Control Plane
| Capability | Minimum requirement |
|---|---|
| Use-case registry | Intended use, prohibited use, impact tier and owner recorded |
| Policy decision service | Deterministic rules for answer / ask / refuse / escalate / block |
| Approved language library | Versioned customer and employee language fragments |
| Evidence ledger | Runtime decision event with source, policy, model and output refs |
| Handoff workflow | Queue, SLA, payload and closure status |
| Eval suite | Scenario tests for uncertainty actions, not only factual accuracy |
| Monitoring | Abstention, escalation, override, complaint, appeal and harm metrics |
3. Template: Uncertainty Taxonomy
| Class | Definition | Example | Default action | Customer / employee message pattern |
|---|---|---|---|---|
| Evidence missing | Required fact, document, transaction or consent is absent | Dispute case lacks cancellation date | Ask more or partial answer | "I can explain the process, but I need the cancellation date for the case-specific next step." |
| Evidence conflict | Sources disagree or document values conflict | KYC document address differs from application | Escalate review | "The information does not match, so a reviewer needs to verify it before we rely on it." |
| Policy boundary | Requested output exceeds permitted AI/channel/role scope | Customer asks for investment buy/sell recommendation | Safe refusal + handoff | "I can provide general information. A licensed advisor is required for personal recommendations." |
| Authorization boundary | User, employee or system lacks permission or consent | Agent asks for restricted AML rationale | Refuse or route to authorized owner | "I cannot show that information in this channel." |
| High-impact low-certainty | Potential impact on funds, eligibility, account access or complaint outcome while confidence or evidence is weak | Low-confidence credit reason explanation | Escalate | "This may affect your account or eligibility, so it needs specialist review." |
| Harm or vulnerability | Signals of distress, financial hardship, fraud victimization, accessibility need or complaint | Customer says fee caused missed rent | Escalate support path | "I am routing this to a specialist and preserving the details you shared." |
| Tool or source failure | Core system, retrieval index, payment API or workflow service unavailable | Account status API timeout | Safe delay / case creation | "I cannot verify the status right now, and I will not guess." |
| Unsafe instruction | Request to deceive, bypass controls, hide facts or generate harmful guidance | User asks how to phrase false dispute claim | Refuse / incident | "I cannot help create or submit misleading information." |
| Monitoring hold | Capability paused due to degraded data, model issue or incident | RAG source freshness breach | Disable answer / escalate | "This request needs manual review because the automated source is unavailable." |
Design rule:
Every uncertainty class must map to:
trigger -> action class -> message -> owner -> evidence -> metric.
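One way to enforce this design rule is a completeness check over the taxonomy before release. The sketch below assumes the taxonomy is held as a dictionary keyed by uncertainty class; the key names follow the rule above, while the sample entries, owners and metric names are placeholders.

```python
REQUIRED_KEYS = ("trigger", "action_class", "message", "owner", "evidence", "metric")


def missing_mappings(taxonomy: dict) -> dict:
    """Map each uncertainty class to the design-rule keys that are absent or empty."""
    gaps = {}
    for class_name, entry in taxonomy.items():
        missing = [key for key in REQUIRED_KEYS if not entry.get(key)]
        if missing:
            gaps[class_name] = missing
    return gaps


# Illustrative entries only; owners and metric names are placeholders.
taxonomy = {
    "evidence_missing": {
        "trigger": "required field absent",
        "action_class": "ask_more",
        "message": "dispute_missing_evidence_v4",
        "owner": "dispute_operations",
        "evidence": ["missing_evidence_list"],
        "metric": "ask_more_rate",
    },
    "tool_or_source_failure": {
        "trigger": "account status API timeout",
        "action_class": "safe_delay",
        "message": "",          # no approved copy yet, so this counts as a gap
        "owner": "operations",
    },
}

print(missing_mappings(taxonomy))
# {'tool_or_source_failure': ['message', 'evidence', 'metric']}
```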
4. Template: Abstention Rule
Use this table as a design artifact. Rows below are concrete examples ready to adapt into a control library.
| Rule id | Scenario | Trigger | Action class | UX behavior | Owner | Evidence required |
|---|---|---|---|---|---|---|
| UXR-001 | Credit eligibility | User asks if they will be approved and no official prequalification result exists | Qualified refusal + education | Explain factors and final decision boundary; do not predict approval | Lending Product / Compliance | product rule version, approved copy id, customer question |
| UXR-002 | Wealth advice | User asks whether to buy/sell/hold a specific investment | Safe refusal + licensed handoff | Provide neutral education and offer advisor route | Wealth Compliance / Advisor Ops | intent classification, channel, role, boundary policy |
| UXR-003 | Payment dispute | User asks if dispute will succeed while merchant evidence is missing | Partial answer + ask more | Explain process and request missing evidence | Dispute Ops | transaction refs, dispute policy, missing field list |
| UXR-004 | AML assistant | Analyst asks AI to decide SAR filing | Refuse decision + investigation support | Summarize facts and list open questions; require authorized decision | BSA / AML Owner | alert id, source transaction refs, reviewer id |
| UXR-005 | KYC extraction | Document field confidence below threshold or validation mismatch | Field-level abstention + review | Highlight field, route reviewer or request re-upload | KYC Ops / Identity Platform | document ref, field confidence, validation failure |
| UXR-006 | Complaint | AI detects possible complaint but category confidence is low | Preserve and route | Create complaint-intake review, retain original narrative | Complaint Ops | original text, channel, harm signal, AI trace |
| UXR-007 | Contact center | Suggested script contains unsupported promise or prohibited phrase | Block suggestion + supervisor alert | Do not show script or show corrected safe version | Contact Center QA / Compliance | draft output hash, scanner result, policy id |
4.1 Rule Design Pattern
    rule_id: UXR-003
    scenario: payment_dispute
    trigger:
      intent: dispute_outcome_prediction
      evidence_state: merchant_evidence_missing
      customer_impact: high
    decision:
      action_class: partial_answer_ask_more
      prohibited_outputs:
        - promise_success
        - assign_fault_without_investigation
      allowed_outputs:
        - explain_process
        - request_missing_evidence
        - create_case
    message:
      customer_language_id: dispute_missing_evidence_v4
    owner:
      business: dispute_operations
      control: payments_compliance
    evidence:
      required:
        - transaction_ref
        - dispute_policy_version
        - missing_evidence_list
    monitoring:
      tags:
        - high_impact
        - partial_answer
        - evidence_missing
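A rule in this shape could be evaluated along the following lines. This is a hedged sketch: the rule is mirrored as a plain dictionary so the example has no YAML dependency, and `rule_matches` and `screen_outputs` are hypothetical helper names. It also assumes an allow-list stance, where any output not explicitly allowed is rejected.

```python
# Mirror of the trigger and decision parts of UXR-003 as a plain dict.
RULE = {
    "rule_id": "UXR-003",
    "trigger": {
        "intent": "dispute_outcome_prediction",
        "evidence_state": "merchant_evidence_missing",
        "customer_impact": "high",
    },
    "decision": {
        "action_class": "partial_answer_ask_more",
        "prohibited_outputs": ["promise_success", "assign_fault_without_investigation"],
        "allowed_outputs": ["explain_process", "request_missing_evidence", "create_case"],
    },
}


def rule_matches(rule: dict, context: dict) -> bool:
    """A rule fires only when every trigger field equals the runtime context value."""
    return all(context.get(key) == value for key, value in rule["trigger"].items())


def screen_outputs(rule: dict, proposed: list) -> tuple:
    """Split proposed outputs into (kept, rejected) using the rule's allow-list."""
    allowed = set(rule["decision"]["allowed_outputs"])
    kept = [item for item in proposed if item in allowed]
    rejected = [item for item in proposed if item not in allowed]
    return kept, rejected


context = {
    "intent": "dispute_outcome_prediction",
    "evidence_state": "merchant_evidence_missing",
    "customer_impact": "high",
    "channel": "mobile_chat",  # extra context fields are ignored by the matcher
}
if rule_matches(RULE, context):
    print(screen_outputs(RULE, ["explain_process", "promise_success"]))
    # (['explain_process'], ['promise_success'])
```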
5. Template: Escalation Matrix
| Trigger | Escalation route | SLA | Customer / employee message | Handoff payload | Closure evidence |
|---|---|---|---|---|---|
| Credit application exception, conflicting reason codes or vulnerable customer signal | Lending specialist / credit ops | Same business day for active application | "A specialist needs to review this before we can explain the next step." | application id, reason codes, evidence conflict, customer question | specialist decision, customer communication, reason code owner |
| Personalized wealth recommendation request | Licensed advisor / wealth service team | Channel-specific appointment or warm transfer target | "A licensed advisor can discuss your personal situation." | customer intent, product mentioned, profile freshness, boundary reason | advisor referral status, disclosure shown |
| High-value payment dispute, fraud signal or repeated failed dispute | Dispute ops / fraud team | Urgent queue for active fraud or vulnerable customer | "I am routing this for specialist review and preserving the details." | transaction refs, customer narrative, evidence list, risk flag | case id, reviewer notes, outcome notice |
| AML case decision request | AML investigator / BSA officer | Internal investigation SLA | "AI can support analysis, but an authorized reviewer makes this decision." | alert id, transaction graph, entity list, missing evidence | reviewer disposition, SAR-sensitive evidence controls |
| KYC document mismatch | KYC review queue | Identity platform exception SLA | "The information needs review because the document and application do not match." | doc refs, extracted fields, validation failures, source images | reviewer confirmation, re-upload or exception disposition |
| Complaint or legal threat | Complaint operations / legal-compliance review | Complaint policy SLA starts at intake | "I have recorded this for review and the team will follow the complaint process." | original narrative, transcript, product, harm signal, AI touchpoint | complaint id, acknowledgement, resolution communication |
| Contact center unsafe script or agent override | Supervisor / QA / compliance surveillance | Real-time or next QA cycle by severity | "This suggested response requires review before use." | transcript, script, scanner flags, agent action | QA disposition, coaching or control update |
Escalation quality test:
- The customer is not forced to restart the story.
- The employee receives evidence, not just a risk label.
- The receiving team has authority to act.
- SLA starts and closes in a system of record.
- The AI trace links to the case without overexposing sensitive data.
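The second test item (evidence, not just a risk label) lends itself to an automated gate on the handoff payload. The sketch below is illustrative: the route keys and field names are adapted from the matrix rows above but are not a fixed schema, and `handoff_gaps` is a hypothetical helper.

```python
# Required payload fields per route; names are illustrative, adapted from the matrix.
REQUIRED_PAYLOAD = {
    "kyc_review": ("doc_refs", "extracted_fields", "validation_failures", "source_images"),
    "dispute_ops": ("transaction_refs", "customer_narrative", "evidence_list", "risk_flag"),
}


def handoff_gaps(route: str, payload: dict) -> list:
    """Return the required payload fields that are missing or empty for the route."""
    if route not in REQUIRED_PAYLOAD:
        raise ValueError(f"unknown escalation route: {route}")
    empty_values = (None, "", [], {})
    return [field for field in REQUIRED_PAYLOAD[route] if payload.get(field) in empty_values]


payload = {
    "transaction_refs": ["txn_ref_1"],
    "customer_narrative": "Cancelled the order but was still charged.",
    "evidence_list": [],   # nothing collected yet, so this is reported as a gap
    "risk_flag": False,    # an explicit False is a value, not a gap
}
print(handoff_gaps("dispute_ops", payload))  # ['evidence_list']
```

A non-empty result could block queue submission or mark the case as incomplete, so the receiving team never starts from a bare risk label.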
6. Template: Confidence Language Guide
6.1 Customer-Facing
| Situation | Approved pattern | Avoid |
|---|---|---|
| Strong evidence | "Based on your posted transaction and the current dispute process, the next step is..." | "I am 94% sure..." |
| Missing case fact | "I need one more detail to explain the case-specific next step." | "Probably..." |
| Advice boundary | "I can explain general concepts, but I cannot provide a personal recommendation in this channel." | "You should..." |
| Eligibility boundary | "A final decision depends on the official application review." | "You should qualify." |
| System unavailable | "I cannot verify that right now, so I will not guess." | "It looks fine." |
| Harm signal | "This may affect your account or finances, so I am routing it to a specialist." | "Please try again later." |
6.2 Employee-Facing
| Signal | Display |
|---|---|
| Evidence confidence | source status, freshness, contradiction flag |
| Model confidence | banded confidence with explanation of what it measures |
| Missing facts | explicit list and required questions |
| Forbidden language | blocked phrases and safer alternatives |
| Escalation trigger | reason, route, SLA and required payload |
| Review obligation | fields that require human confirmation before action |
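Banded confidence with an explanation of what it measures can be produced by a small mapping function. The cutoffs below are placeholder values for illustration only; in practice they would come from the threshold evidence reviewed before release.

```python
def to_band(score: float, low_cutoff: float = 0.5, high_cutoff: float = 0.85) -> str:
    """Map a raw score in [0, 1] to a band. Cutoffs are placeholders, not recommendations."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high_cutoff:
        return "high"
    if score >= low_cutoff:
        return "medium"
    return "low"


def employee_display(signal_name: str, score: float, measures: str) -> str:
    """Render the band together with a note on what the signal measures."""
    return f"{signal_name}: {to_band(score)} ({measures})"


print(employee_display("address_field", 0.42, "OCR character agreement, not identity match"))
# address_field: low (OCR character agreement, not identity match)
```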
6.3 Governance-Facing
| Audience | Required view |
|---|---|
| Model Risk | eval results by uncertainty class, confidence distribution, threshold evidence |
| Compliance | boundary breaches, prohibited claims, approved language usage |
| Operations | escalation volume, SLA, queue aging, closure quality |
| CX | customer comprehension, abandonment, repeat contact, trust calibration |
| Audit | replayable evidence packet, policy version, output hash, owner approval |
7. Template: Customer Communication Test
Run every customer-facing uncertainty message through this test before release.
| Test | Pass condition | Fail example |
|---|---|---|
| Capability clarity | User understands what AI can and cannot do | "AI-generated answer may be inaccurate" with no boundary |
| Evidence clarity | User knows what the answer is based on | "Based on available information" without naming source type |
| No false promise | Message does not imply approval, refund, eligibility, compensation or advice | "You should be eligible" |
| Useful next step | User knows what to provide or who will respond | "I cannot help with that" |
| Regulated boundary | Legal, investment, credit, AML and complaint boundaries are respected | "This is not a valid complaint" |
| Accessibility and vulnerability | Message avoids blame, pressure and confusing jargon | "You failed verification" without alternatives |
| Recourse path | Impacted customer has route to review, appeal, complaint or specialist | no route after adverse or uncertain outcome |
| Evidence retention | Message id and output hash are retained | unversioned free-form copy |
Example revised message:
I can explain the dispute process and record the facts you provide. I cannot promise the outcome because the merchant evidence and network review are not complete. Please upload the cancellation confirmation or tell me the date you contacted the merchant. If this transaction is fraud or is causing immediate hardship, I can route it to a specialist now.
Why it passes:
- It gives scope.
- It refuses to promise an outcome.
- It asks for specific evidence.
- It provides escalation triggers.
- It avoids raw confidence and vague disclaimer language.
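Part of this test can be automated with a phrase scanner that runs before copy is approved. The sketch below is a minimal example: the patterns are adapted from the "Avoid" column of the customer-facing guide and are not exhaustive, and `scan_message` is a hypothetical helper. Checks such as comprehension, accessibility and recourse still need human review.

```python
import re

# Illustrative patterns only, adapted from the "Avoid" examples in the language guide.
BLOCKED_PATTERNS = {
    "raw_confidence": re.compile(r"\b\d{1,3}\s?%\s*(?:sure|confident|certain)\b", re.IGNORECASE),
    "directive_advice": re.compile(r"\byou should\b", re.IGNORECASE),
    "vague_hedge": re.compile(r"\bprobably\b", re.IGNORECASE),
}


def scan_message(text: str) -> list:
    """Return the names of blocked patterns found in the message, sorted by name."""
    return sorted(name for name, pattern in BLOCKED_PATTERNS.items() if pattern.search(text))


revised = (
    "I can explain the dispute process and record the facts you provide. "
    "I cannot promise the outcome because the merchant evidence and network review "
    "are not complete."
)
print(scan_message("I am 94% sure you should qualify."))  # ['directive_advice', 'raw_confidence']
print(scan_message(revised))                              # []
```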
8. Template: Evidence Packet
Each uncertainty decision should create a compact event that can be joined to conversation, case, model, policy and workflow records.
| Field group | Fields |
|---|---|
| Identity | uncertainty_event_id, conversation_id, case_id, customer_token, employee_id, channel |
| Use case | use_case_id, intent, impact_tier, regulated_boundary, vulnerable_signal |
| Evidence state | source_refs, source_versions, freshness, contradictions, missing_fields |
| Confidence | model_confidence_band, retrieval_confidence_band, tool_validation_status, calibration_segment |
| Policy | policy_decision_id, abstention_class, action_class, threshold_version, approved_language_ids |
| Output | response_template_id, output_hash, citations, disclosure_ids, blocked_phrase_scan |
| Handoff | escalation_required, queue, SLA, handoff_reason, payload_refs, closure_status |
| Human action | reviewer_id, override, override_reason, final_disposition, customer_message_ref |
| Monitoring | tags, complaint_link, appeal_link, QA_result, incident_id, CAPA_id |
| Governance | retention_class, legal_hold_flag, audit_ready, residual_risk_acceptance |
Concrete JSON-style example:
    {
      "uncertainty_event_id": "uxr-2026-06-30-169-001",
      "use_case_id": "kyc_document_extraction",
      "intent": "extract_identity_fields",
      "impact_tier": "high",
      "abstention_class": "evidence_conflict",
      "action_class": "field_level_review",
      "evidence_state": {
        "source_refs": ["doc_front_image_ref", "application_address_ref"],
        "contradictions": ["address_mismatch"],
        "missing_fields": []
      },
      "confidence": {
        "name_field": "high",
        "dob_field": "high",
        "address_field": "low"
      },
      "policy_decision_id": "kyc_field_abstention_v3",
      "approved_language_ids": ["kyc_address_review_message_v2"],
      "handoff": {
        "escalation_required": true,
        "queue": "kyc_review",
        "handoff_reason": "document_application_address_mismatch"
      },
      "monitoring_tags": ["high_impact", "field_level_abstention", "manual_review"]
    }
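A packet like this could be assembled and validated at write time. The sketch below is a minimal example using only the Python standard library: the `REQUIRED_FIELDS` subset is an assumption chosen for illustration, and SHA-256 over the UTF-8 response text is one reasonable way to fill `output_hash`, not a mandated one.

```python
import hashlib
import json

# Assumed minimum subset for the example; a real schema would be agreed with governance.
REQUIRED_FIELDS = (
    "uncertainty_event_id",
    "use_case_id",
    "abstention_class",
    "action_class",
    "policy_decision_id",
    "approved_language_ids",
)


def output_hash(text: str) -> str:
    """Hash the delivered response so it can be compared during replay."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_packet(event: dict, response_text: str) -> dict:
    """Reject incomplete events, then attach the output hash to a copy of the event."""
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        raise ValueError(f"evidence packet missing fields: {missing}")
    packet = dict(event)
    packet["output_hash"] = output_hash(response_text)
    return packet


event = {
    "uncertainty_event_id": "uxr-2026-06-30-169-001",
    "use_case_id": "kyc_document_extraction",
    "abstention_class": "evidence_conflict",
    "action_class": "field_level_review",
    "policy_decision_id": "kyc_field_abstention_v3",
    "approved_language_ids": ["kyc_address_review_message_v2"],
}
packet = build_packet(event, "The information needs review because the document and application do not match.")
print(json.dumps(packet, indent=2, sort_keys=True))
```

Hashing the delivered text rather than storing it in every downstream record keeps the packet compact while still letting an auditor confirm that the archived response is the one the customer saw.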
9. PM / BA / Architecture Questions
9.1 PM Questions
| Question | Strong answer |
|---|---|
| What customer job is AI helping with? | Specific journey step and value, not generic chatbot scope |
| What harm can occur if AI is overconfident? | Financial loss, wrong eligibility belief, missed complaint, unsuitable advice, delayed fraud response |
| When is partial answer better than refusal? | When process education is safe but case-specific conclusion lacks evidence |
| What is the trust calibration goal? | User understands AI scope and still has a path forward |
| What business KPI could conflict with safe abstention? | Containment, conversion, AHT, approval rate or manual review cost |
9.2 BA Questions
| Question | Strong answer |
|---|---|
| What are the decision states? | answer, qualify, partial answer, ask more, refuse, escalate, block |
| What inputs are required for each state? | Evidence, role, policy, identity, consent and source-of-record fields |
| What exception paths exist? | missing docs, conflicting sources, tool failure, vulnerable customer, complaint, legal threat |
| What reason codes are customer-safe? | Approved reason taxonomy mapped to internal reason without over-disclosure |
| What evidence must be preserved? | Original user text, source refs, policy ids, model/tool versions, output hash and human override |
9.3 Architecture Questions
| Question | Strong answer |
|---|---|
| Where does the action decision happen? | In policy decision service, not only in the prompt |
| How are confidence signals aggregated? | Model, retrieval, tool validation, evidence completeness and policy certainty kept distinct |
| How are approved messages controlled? | Versioned language service with channel, role, product and region filters |
| How does handoff work? | Case creation, queue, SLA, payload refs and closure event |
| How can audit replay a case? | Evidence ledger joins conversation, model, RAG, policy, workflow and human action records |
| How is privacy preserved? | Tokenized IDs, role-based evidence, retention class and source minimization |
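Keeping the confidence signals distinct can be as simple as carrying them in separate fields and reporting the weakest one, rather than averaging them into a single score. The sketch below is one possible design under that assumption; the band labels and the "weakest signal" policy are illustrative and not prescribed by this playbook.

```python
from dataclasses import asdict, dataclass

BAND_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class ConfidenceSignals:
    """Each signal keeps its own band so reviewers can see which one is weak."""
    model: str
    retrieval: str
    tool_validation: str
    evidence_completeness: str
    policy_certainty: str


def weakest_signal(signals: ConfidenceSignals) -> tuple:
    """Return (name, band) of the lowest-banded signal; ties resolve in field order."""
    return min(asdict(signals).items(), key=lambda item: BAND_ORDER[item[1]])


signals = ConfidenceSignals(
    model="high",
    retrieval="low",
    tool_validation="high",
    evidence_completeness="medium",
    policy_certainty="high",
)
print(weakest_signal(signals))  # ('retrieval', 'low')
```

Averaging would hide the fact that a fluent, high-confidence model answer rests on a stale or weak retrieval result; reporting the weakest signal by name keeps that visible to the policy decision service and to reviewers.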
10. Release Checklist
10.1 Product And Policy
| Check | Pass condition |
|---|---|
| Intended use registered | Use case, owner, impact tier and prohibited use documented |
| Action classes approved | answer / partial / ask / refuse / escalate / block mapped to scenarios |
| Boundary policy approved | credit, wealth, AML, complaint, KYC and dispute boundaries reviewed |
| Approved language loaded | customer, employee and escalation copy versioned |
| Recourse path defined | review, appeal, complaint, specialist or advisor route exists |
10.2 Architecture And Operations
| Check | Pass condition |
|---|---|
| Policy engine integrated | Runtime can enforce action class and copy constraints |
| Evidence ledger live | Every uncertainty decision records required fields |
| Handoff tested | Queue receives payload and can close with disposition |
| Tool failure behavior tested | System does not hallucinate source-of-record state |
| Capacity assessed | Escalation volume does not exceed staffed SLA assumptions |
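The capacity check reduces to simple arithmetic on expected escalation volume and staffed review time. The sketch below is a deliberately rough estimate: all inputs are hypothetical planning figures, and it ignores arrival peaks, queue priorities and rework.

```python
def queue_utilization(
    daily_escalations: int,
    minutes_per_case: float,
    reviewers: int,
    productive_minutes_per_reviewer: float,
) -> float:
    """Return demand divided by supply; a value above 1.0 means the queue is under-staffed."""
    demand = daily_escalations * minutes_per_case
    supply = reviewers * productive_minutes_per_reviewer
    if supply <= 0:
        raise ValueError("staffed capacity must be positive")
    return demand / supply


# 120 escalations at 12 minutes each against 4 reviewers with 360 productive minutes each.
print(queue_utilization(120, 12.0, 4, 360.0))  # 1.0
```

A utilization at or near 1.0 leaves no headroom for spikes, so a launch constraint such as a lower traffic cap may be safer than assuming the SLA will hold.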
10.3 Eval And Monitoring
| Check | Pass condition |
|---|---|
| Scenario eval complete | Missing evidence, conflict, boundary, harm, tool outage and adversarial pressure cases pass |
| Segment review complete | Abstention and friction rates checked for material disparity |
| Red-team completed | Prompts attempting overreach, false claims and control bypass are blocked |
| Dashboard ready | Metrics for abstention, escalation, override, complaint, appeal and evidence completeness |
| Incident playbook ready | Owner, severity, containment, customer remediation and CAPA path defined |
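The segment review and dashboard checks both depend on rates computed from evidence events. The sketch below shows one way to compute abstention rate per segment; the event shape, the segment labels and the choice to count every action except "answer" as an abstention are assumptions for the example.

```python
from collections import Counter

# Assumption: every action class other than a normal answer counts as abstention.
ABSTAIN_ACTIONS = {"ask_more", "refuse", "escalate", "block"}


def abstention_rate_by_segment(events: list) -> dict:
    """Return the share of events per segment whose action class is an abstention."""
    totals = Counter()
    abstained = Counter()
    for event in events:
        segment = event["segment"]
        totals[segment] += 1
        if event["action_class"] in ABSTAIN_ACTIONS:
            abstained[segment] += 1
    return {segment: abstained[segment] / totals[segment] for segment in totals}


events = [
    {"segment": "segment_a", "action_class": "answer"},
    {"segment": "segment_a", "action_class": "escalate"},
    {"segment": "segment_b", "action_class": "refuse"},
    {"segment": "segment_b", "action_class": "ask_more"},
    {"segment": "segment_b", "action_class": "escalate"},
    {"segment": "segment_b", "action_class": "answer"},
]
rates = abstention_rate_by_segment(events)
print(rates)  # {'segment_a': 0.5, 'segment_b': 0.75}
print(max(rates.values()) - min(rates.values()))  # 0.25
```

The gap between the highest and lowest segment rate is only a screening signal; whether a disparity is material is a judgment for the governance review, not for the metric.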
10.4 Governance
| Check | Pass condition |
|---|---|
| Risk acceptance recorded | Residual risk and launch constraints approved by named owner |
| Model risk evidence available | Eval, calibration, uncertainty behavior and monitoring plan retained |
| Compliance sign-off recorded | Boundary, disclosure and approved copy reviewed |
| Audit sample replay passed | At least one end-to-end case replayed from evidence packet |
| Post-launch review scheduled | Date, metrics, owners and decision rights set |
11. Scenario Design Packs
11.1 Credit Eligibility
| Design element | Decision |
|---|---|
| Allowed | Explain general criteria, required documents, application status if sourced |
| Abstain | Predict approval without official prequalification or decision engine |
| Ask more | Missing income, identity, product selection or consent |
| Escalate | Conflicting reason codes, vulnerable customer, adverse action confusion, complaint |
| Evidence | application refs, reason code policy, approved copy, decision owner |
11.2 Wealth Advice Boundary
| Design element | Decision |
|---|---|
| Allowed | Neutral education, risk concepts, fee explanation, advisor appointment |
| Abstain | Personalized buy/sell/hold, allocation, tax or guaranteed outcome |
| Ask more | If education context is unclear, ask whether user wants general information |
| Escalate | Personal recommendation request, complex product, vulnerable investor |
| Evidence | intent, role/channel, profile freshness, boundary policy, referral status |
11.3 Payment Dispute Claims
| Design element | Decision |
|---|---|
| Allowed | Process explanation, fact collection, document checklist, status from system |
| Abstain | Outcome promise, merchant fault conclusion, unsupported chargeback advice |
| Ask more | merchant contact, cancellation proof, delivery evidence, fraud status |
| Escalate | high value, fraud, hardship, repeated dispute, legal/regulator mention |
| Evidence | transaction refs, customer narrative, dispute policy, case id |
11.4 AML Analyst Assistant
| Design element | Decision |
|---|---|
| Allowed | Alert summary, entity extraction, transaction chronology, typology hints |
| Abstain | SAR decision, case closure, customer-facing explanation of suspicious activity |
| Ask more | missing KYC, missing source-of-funds, unclear beneficial ownership |
| Escalate | high-risk typology, sanctions overlap, SAR-sensitive issue, reviewer conflict |
| Evidence | alert id, graph refs, transaction refs, reviewer notes, disposition |
11.5 KYC Document Extraction
| Design element | Decision |
|---|---|
| Allowed | Field extraction, confidence by field, source crop, validation check |
| Abstain | Identity verification conclusion from low-quality or conflicting evidence |
| Ask more | re-upload, alternate document, missing backside, address proof |
| Escalate | mismatch, suspected tampering, accessibility issue, repeated failure |
| Evidence | document refs, field confidence, validation result, reviewer confirmation |
11.6 Complaints
| Design element | Decision |
|---|---|
| Allowed | Preserve narrative, classify, summarize with citations, route |
| Abstain | Legal conclusion, dismissal, compensation promise, complaint suppression |
| Ask more | product, date, transaction, impact, desired resolution |
| Escalate | regulatory source, legal threat, vulnerable customer, repeated harm |
| Evidence | original text, transcript, AI touchpoint, harm signal, complaint id |
11.7 Contact Center
| Design element | Decision |
|---|---|
| Allowed | Suggested questions, safe draft, summary, next-best-action within policy |
| Abstain | Unsafe script, sales push in hardship, unsupported promise |
| Ask more | agent needs to confirm facts or consent before script |
| Escalate | supervisor, specialist, complaint, fraud, vulnerable customer |
| Evidence | transcript, script hash, scanner flags, agent accept/override |
12. Executive Narrative
Executive message:
We are not launching AI that simply says "I may be wrong."
We are launching a decision-control architecture that knows when to answer,
when to ask for more evidence, when to refuse, when to route to specialists,
and how to prove the decision later.
Why it matters:
- Customer trust improves when uncertainty is specific and paired with a next step.
- Customer harm decreases when high-impact, low-evidence and regulated-boundary cases are not guessed.
- Operations improve when handoff includes evidence, owner, reason and SLA.
- Compliance improves when approved language and policy ids are enforced at runtime.
- Model risk improves when uncertainty behavior is measured, not hidden.
- Executive governance improves when incidents, complaints and overrides feed back into CAPA.
Board-level framing:
| Board concern | Response |
|---|---|
| Are we over-automating regulated interactions? | High-impact and advice-boundary scenarios require policy-gated action and human handoff |
| Can we prove what AI did? | Evidence packet joins model, source, policy, output and human action |
| Will this worsen customer friction? | Metrics track ask-more, escalation, abandonment and segment disparity |
| Will this reduce operational efficiency? | Escalation precision, queue SLA and handoff completeness manage workload |
| How do we learn from failures? | Complaints, overrides, QA and incidents feed CAPA and eval updates |
13. Interview Drills
Drill 1: Explain The Difference Between Confidence And Evidence
Strong answer:
Confidence is a signal about model or retrieval behavior. Evidence is what supports a specific claim in a specific customer context. In finance, an answer with high model confidence can still be unsafe if the source is stale, the policy boundary forbids the claim, the user lacks authorization, or the action affects customer rights. I separate model confidence, evidence confidence, policy certainty and impact severity, then use those signals to choose answer, partial answer, ask-more, refusal or escalation.
Drill 2: Design Uncertainty UX For Payment Disputes
Strong answer:
I would not show a probability that the customer will win. The AI can explain the dispute process, collect facts, identify missing evidence and create a case. It should abstain from outcome promises and escalate fraud, hardship, high-value, repeat or complaint cases. The handoff packet should include transaction refs, customer narrative, evidence gaps, policy version and AI uncertainty reason.
Drill 3: Handle Wealth Advice Boundary
Strong answer:
If the user asks whether to buy or sell a product, uncertainty is not just technical. It is a licensing, suitability and conduct boundary. The AI should provide neutral education, disclose that personal recommendations require an authorized channel, and offer a licensed advisor handoff. It should log the intent, channel, profile freshness and boundary policy.
Drill 4: CTO Architecture Question
Strong answer:
I would create a policy decision service that receives intent, impact tier, evidence state, authorization, confidence signals, tool status and reversibility. It returns an action class and copy constraints. The LLM does not independently decide regulated refusal or escalation. Each event writes an evidence packet with model, RAG source, policy id, output hash and handoff status. Monitoring then tracks abstention class, escalation precision, override, complaint and segment friction.
Drill 5: Governance Lead Question
Strong answer:
I would require every customer-impacting uncertainty event to be replayable. That means source refs, policy versions, approved language ids, model/tool versions, action class, handoff route, human override and downstream complaint or appeal link. Governance reviews should look at not only model accuracy, but also unsupported claims, wrong abstentions, missed escalations, customer harm and CAPA closure.
14. Portfolio Summary
AI uncertainty architecture is the difference between a fluent assistant and a governed service system.
The mature design question is not:
How confident is the model?
The mature design question is:
Given this user, intent, evidence, policy boundary, impact level and operational capacity,
what is the safest useful next action, what should we say, who owns it,
and what evidence proves we handled it correctly?
That question is where advanced PM, BA, architecture and AI governance work meet.