AI Explainability: Explainable and Contestable Architecture
Summary:
A guide to AI Explainability / Contestability / Adverse Action Architecture
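Audience: AI PM / Product Architect / Compliance Product Lead / Senior BA / Model Risk / Customer Experience Risk.
Core problem: Explanation does not mean showing the model's chain-of-thought to users. Regulated financial AI has to treat user-facing explanations, internal evidence, reason codes, appeal paths, human review, and audit queries as product architecture.
Learning goals: Design explanation layers, an adverse action reason architecture, a contestability workflow, explanation evals, and an audit evidence pack.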
Source Anchors
| Source | Link | Use |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Reference for transparency, accountability, measurement, and risk management |
| EU AI Act | https://eur-lex.europa.eu/eli/reg/2024/1689/oj | Reference for transparency, human oversight, and high-risk AI obligations |
| CFPB Circular 2022-03 | https://www.consumerfinance.gov/compliance/circulars/circular-2022-03-adverse-action-notification-requirements-in-connection-with-credit-decisions-based-on-complex-algorithms/ | Reference for adverse action notification requirements in credit decisions based on complex algorithms |
| Regulation B section 1002.9 | https://www.ecfr.gov/current/title-12/chapter-X/part-1002/section-1002.9 | Reference for the regulatory context of adverse action notices |
| W3C PROV | https://www.w3.org/TR/prov-overview/ | Provenance model for organizing explanation sources and the evidence chain |
In one sentence:
Explanation is a governed product interface, not raw model reasoning.
1. Explanation Layers
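| Layer | Audience | Content | Must not include |
|---|---|---|---|
| User-facing explanation | Customers / employees | Concise reason, evidence, next step, appeal path | raw chain-of-thought |
| Regulator/auditor evidence | Audit / regulators | decision record, source, reason code, control evidence | Unverifiable model guesses |
| Model/system validation | Model risk / validation | eval results, segment behavior, failure modes | Excessive exposure of sensitive customer source text |
| Engineering debug trace | Engineering team | config, prompt, retrieval, tool, logs | Unauthorized PII |
| Internal reasoning trace | System internal | Reasoning support material | Direct use as a customer explanation |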
Core boundary:
internal reasoning is not customer explanation
A customer explanation must be bound to facts, policy, a decision code, verifiable sources, and a correction path.
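One way to enforce the layer boundary is to project a single decision record into per-audience views through an allow-list. The sketch below is illustrative only: the field names, the layer keys, and the `project` helper are assumptions made for this example, not part of any cited framework.

```python
# Hypothetical allow-lists per explanation layer. Real lists would come from
# policy and compliance review, not from this sketch.
LAYER_FIELDS = {
    "user": {"reason_text", "evidence_summary", "next_step", "appeal_path"},
    "auditor": {"decision_id", "reason_code", "sources", "policy_version",
                "control_evidence"},
    "engineering": {"decision_id", "config_version", "prompt_id",
                    "retrieval_ids", "tool_calls"},
}


def project(record: dict, layer: str) -> dict:
    """Return only the fields the given layer is allowed to see."""
    allowed = LAYER_FIELDS[layer]
    return {key: value for key, value in record.items() if key in allowed}


# Hypothetical decision record used only for illustration.
record = {
    "decision_id": "D-001",
    "reason_code": "DTI_HIGH",
    "reason_text": "Your debt-to-income ratio is above the product limit.",
    "next_step": "You may submit updated income documents.",
    "appeal_path": "Request a review within the stated appeal window.",
    "policy_version": "2024-06",
    "prompt_id": "p-17",
    "internal_reasoning": "model scratchpad",  # in no allow-list, so never projected
}

user_view = project(record, "user")
auditor_view = project(record, "auditor")
```

Because the internal reasoning trace appears in no allow-list, it cannot leak into any view by accident; adding a field to a layer becomes an explicit, reviewable change.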
2. Adverse Action Reason Architecture
High-risk anti-pattern:
LLM reads application + invents plausible denial reason
Correct architecture:
decision engine / policy system
-> canonical reason code
-> approved reason language
-> evidence fields
-> user-facing explanation template
-> audit record
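| Component | Role |
|---|---|
| Reason code source of truth | Prevents the model from inventing reasons |
| Evidence mapping | Binds every reason to fact fields |
| Approved templates | Control wording, clarity, and compliance phrasing |
| Explanation generator | Only formats and personalizes for readability |
| Faithfulness evaluator | Checks that the explanation matches the code/evidence |
| Contestability workflow | Lets the customer correct errors, supply missing documents, and appeal |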
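The pipeline above can be sketched as a renderer that refuses to produce text unless the reason code has approved language and every required evidence field is present. The reason code `DTI_HIGH`, the template wording, and the field names are hypothetical; this is a minimal sketch of the pattern, not a compliant notice generator.

```python
from string import Template

# Hypothetical approved-language catalog, keyed by canonical reason code.
APPROVED_TEMPLATES = {
    "DTI_HIGH": Template(
        "Your debt-to-income ratio of $dti_pct% is above the maximum "
        "of $max_pct% allowed for this product."
    ),
}

# Hypothetical evidence mapping: fact fields each reason code must cite.
REQUIRED_EVIDENCE = {"DTI_HIGH": ("dti_pct", "max_pct")}


def render_explanation(reason_code: str, evidence: dict) -> str:
    """Format an approved template; never invent a reason or a fact."""
    if reason_code not in APPROVED_TEMPLATES:
        raise KeyError(f"no approved language for reason code {reason_code!r}")
    missing = [f for f in REQUIRED_EVIDENCE[reason_code] if f not in evidence]
    if missing:
        raise ValueError(f"missing evidence fields: {missing}")
    return APPROVED_TEMPLATES[reason_code].substitute(evidence)


# The reason code comes from the decision engine, not from the model.
text = render_explanation("DTI_HIGH", {"dti_pct": 48, "max_pct": 43})
```

If an LLM is used at all in this design, it would sit after `render_explanation` and only adjust readability, with a faithfulness check comparing its output back to the code and evidence.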
3. Contestability Architecture
customer receives explanation
-> understands reason and next action
-> submits correction / appeal / missing evidence
-> human review
-> data correction or decision reconsideration
-> outcome notification
-> evidence retained
-> monitoring updates
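Contestability is not "contact customer service." It must specify:
- Which decisions can be appealed.
- How the customer submits new evidence.
- Who performs the review.
- Whether rules/models are rerun.
- How long the customer waits for a result.
- How a maintained or changed decision is explained.
- How outcomes feed into quality and fairness monitoring.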
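The workflow can be modeled as a small state machine so that no case reaches outcome notification without passing through human review. The state names and the `Case` class below are assumptions chosen for illustration; a real system would also record reviewer identity, timestamps, and service-level deadlines.

```python
from enum import Enum


class State(Enum):
    EXPLAINED = "explained"
    SUBMITTED = "submitted"                      # correction / appeal / new evidence
    IN_REVIEW = "in_review"                      # human review
    DECISION_CHANGED = "decision_changed"        # data corrected or decision reconsidered
    DECISION_MAINTAINED = "decision_maintained"
    NOTIFIED = "notified"                        # outcome notification sent
    CLOSED = "closed"                            # evidence retained, monitoring updated


# Allowed transitions; anything else is rejected.
ALLOWED = {
    State.EXPLAINED: {State.SUBMITTED},
    State.SUBMITTED: {State.IN_REVIEW},
    State.IN_REVIEW: {State.DECISION_CHANGED, State.DECISION_MAINTAINED},
    State.DECISION_CHANGED: {State.NOTIFIED},
    State.DECISION_MAINTAINED: {State.NOTIFIED},
    State.NOTIFIED: {State.CLOSED},
    State.CLOSED: set(),
}


class Case:
    def __init__(self, case_id: str):
        self.case_id = case_id
        self.state = State.EXPLAINED
        self.history = [State.EXPLAINED]  # retained as audit evidence

    def advance(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} not allowed")
        self.state = new_state
        self.history.append(new_state)


case = Case("C-001")
for step in (State.SUBMITTED, State.IN_REVIEW, State.DECISION_CHANGED,
             State.NOTIFIED, State.CLOSED):
    case.advance(step)
```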
4. Retail Finance Cases
4.1 Credit denial explanation assistant
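AI may:
- Translate the official reason code into plain language.
- Explain the next steps the customer can take.
- Check whether a required disclosure is missing.
AI may not:
- Invent a denial reason.
- Modify the official reason code.
- Give inconsistent reapplication advice.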
4.2 Overdraft fee explanation
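AI may:
- Cite the fee schedule.
- Explain the transaction timeline.
- Guide the customer through raising a dispute.
Risks:
- Overconfident explanations of complex transaction ordering.
- Ignoring customer complaints and vulnerable customer signals.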
4.3 Fraud false positive
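AI may:
- Explain the security verification steps.
- Give the customer a path to restore access.
Risks:
- Revealing fraud detection rules.
- Placing an excessive burden on the customer.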
5. Explanation Faithfulness Eval
| Eval item | Pass criteria |
|---|---|
| Reason-code consistency | explanation reason matches source of truth |
| Evidence grounding | all factual claims link to evidence |
| No invented reason | no unsupported denial / fee / fraud cause |
| Policy version | uses current approved policy |
| Customer readability | understandable at target reading level |
| Appeal path | next action / recourse path included |
| Segment consistency | explanation quality not worse by language/channel/customer group |
| Prohibited wording | avoids blame, guarantees, legal overstatements |
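Several rows of this table can be automated as deterministic checks before any human or model-based grading. The sketch below covers five of them; the dictionary shapes, the `PROHIBITED` phrase list, and the number-matching heuristic for evidence grounding are assumptions, and the heuristic is a crude proxy rather than a full grounding check. Readability and segment consistency need separate measurement and are left out.

```python
import re

# Hypothetical prohibited phrases; a real list would be compliance-approved.
PROHIBITED = ("guarantee", "your fault")


def evaluate(explanation: dict, source: dict, current_policy: str) -> dict:
    """Return pass/fail per eval item for one generated explanation."""
    text = explanation["text"]
    # Crude grounding proxy: every number quoted in the text must appear
    # among the evidence field values.
    numbers_in_text = set(re.findall(r"\d+(?:\.\d+)?", text))
    evidence_values = {str(v) for v in source["evidence"].values()}
    return {
        "reason_code_consistency": explanation["reason_code"] == source["reason_code"],
        "evidence_grounding": numbers_in_text <= evidence_values,
        "policy_version": explanation["policy_version"] == current_policy,
        "appeal_path": bool(explanation.get("appeal_path")),
        "prohibited_wording": not any(p in text.lower() for p in PROHIBITED),
    }


source = {"reason_code": "DTI_HIGH", "evidence": {"dti_pct": 48, "max_pct": 43}}

good = {
    "reason_code": "DTI_HIGH",
    "text": "Your debt-to-income ratio of 48% is above the 43% limit.",
    "policy_version": "2024-06",
    "appeal_path": "Request a review through the appeals form.",
}
bad = {
    "reason_code": "LOW_SCORE",  # does not match the source of truth
    "text": "Your credit score of 590 is too low. We guarantee approval next time.",
    "policy_version": "2023-01",
    "appeal_path": "",
}

good_result = evaluate(good, source, current_policy="2024-06")
bad_result = evaluate(bad, source, current_policy="2024-06")
```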
6. Adverse Action Evidence Pack
# Adverse Action Explanation Evidence Pack
Customer / case id:
Decision date:
Decision system:
Reason code:
Evidence fields:
Policy version:
Explanation template version:
Generated text hash:
Human review:
Customer delivery channel:
Appeal / correction path:
Contestability outcome:
Monitoring tags:
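The template above maps naturally onto an immutable record. The following sketch covers a subset of the fields and shows one way to fill "Generated text hash" with SHA-256 so the delivered text can later be verified against the stored pack; the class name, field names, and sample values are assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidencePack:
    # Subset of the template fields; names are illustrative.
    case_id: str
    decision_date: str
    decision_system: str
    reason_code: str
    evidence_fields: dict
    policy_version: str
    template_version: str
    generated_text_hash: str
    delivery_channel: str
    appeal_path: str


def text_hash(text: str) -> str:
    """SHA-256 hex digest of the exact text delivered to the customer."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


delivered = "Your debt-to-income ratio of 48% is above the 43% limit."
pack = EvidencePack(
    case_id="C-001",
    decision_date="2024-06-30",
    decision_system="credit-policy-engine",  # hypothetical system name
    reason_code="DTI_HIGH",
    evidence_fields={"dti_pct": 48, "max_pct": 43},
    policy_version="2024-06",
    template_version="v3",
    generated_text_hash=text_hash(delivered),
    delivery_channel="secure-message",
    appeal_path="appeals-form",
)

# Stable serialization for storage and later audit queries.
serialized = json.dumps(asdict(pack), sort_keys=True)
```

Storing a hash rather than only the text lets an auditor confirm that what the customer received is byte-for-byte what was generated, without the pack itself being the sole copy.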
7. Metrics / KRIs
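| Metric | Risk |
|---|---|
| Unsupported reason rate | Model invents or misuses reasons |
| Explanation correction rate | Explanation quality problems |
| Appeal upheld rate | Decision or explanation errors |
| Segment explanation disparity | Explanation quality differs across customer groups |
| Stale policy citation | Outdated policy in use |
| Human override rate | AI explanations are unreliable |
| Complaint linkage | Customers do not understand or accept the explanation |
| Missing appeal path | contestability failure |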
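Most of these KRIs reduce to a rate over case records once the denominator is fixed. The sketch below computes three of them from a toy list; the record fields and the choice of denominators (for example, appeal upheld rate over appealed cases only) are assumptions that a real monitoring spec would need to define explicitly.

```python
def rate(records, predicate) -> float:
    """Share of records satisfying the predicate; 0.0 for an empty set."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)


# Toy case records with hypothetical field names.
cases = [
    {"unsupported_reason": False, "appealed": True, "appeal_upheld": True, "has_appeal_path": True},
    {"unsupported_reason": True, "appealed": False, "appeal_upheld": False, "has_appeal_path": True},
    {"unsupported_reason": False, "appealed": True, "appeal_upheld": False, "has_appeal_path": False},
    {"unsupported_reason": False, "appealed": False, "appeal_upheld": False, "has_appeal_path": True},
]

unsupported_reason_rate = rate(cases, lambda r: r["unsupported_reason"])
# Denominator: appealed cases only.
appeal_upheld_rate = rate(
    [c for c in cases if c["appealed"]], lambda r: r["appeal_upheld"]
)
missing_appeal_path_rate = rate(cases, lambda r: not r["has_appeal_path"])
```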
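8. Interview Talking Points
30-second version:
I would design explanation as a governed product interface rather than exposing model reasoning. In financial services, a user-facing explanation must be bound to a reason code, evidence fields, a policy version, and an appeal path. The model can help generate readable text, but it cannot invent denial reasons or replace the decision system. Contestability needs a closed loop of customer correction, human review, rerun or correction, and outcome notification.
2-minute version:
I would layer the explanation architecture: user-facing explanation, audit/regulator evidence, model validation evidence, engineering debug trace, and internal reasoning material. The customer sees a concise, verifiable, compliant explanation; the auditor sees the decision record, reason code, source, policy version, and control evidence.
For adverse action, the reason code must come from the decision engine or policy system; the LLM may only produce readable text from approved templates and evidence fields. A faithfulness eval must check reason-code consistency, evidence grounding, no invented reason, policy freshness, appeal path, and segment consistency.
Contestability is not a line saying "contact customer service if you have questions." It is an executable workflow: the customer submits a correction or new evidence, a human reviews it, rules/models are rerun when necessary, the customer is notified of the outcome, and upheld appeals flow back into monitoring and evals.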
9. Portfolio Exercise
Design the following for a credit explanation assistant:
- Explanation layer map.
- Reason-code mapping table.
- Explanation policy.
- Faithfulness eval set.
- Contestability workflow.
- Adverse action evidence pack.
Deliverables:
- Explainability & Contestability Pack.
- One-page adverse action architecture memo.
- Two-minute interview answer.