
AI Explainability: Explainable and Contestable Architecture


AI Explainability / Contestability / Adverse Action Architecture: An Explainer

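Audience: AI PM / Product Architect / Compliance Product Lead / Senior BA / Model Risk / Customer Experience Risk.
Core issue: Explanation does not mean showing the model's chain-of-thought to users. Regulated financial AI must treat user-facing explanations, internal evidence, reason codes, appeal paths, human review, and audit queries as parts of the product architecture.
Learning objectives: design explanation layers, an adverse action reason architecture, a contestability workflow, explanation evals, and an audit evidence pack.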


Source Anchors

Source | Link | Purpose
NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Reference for transparency, accountability, measurement, and risk management
EU AI Act | https://eur-lex.europa.eu/eli/reg/2024/1689/oj | Reference for transparency, human oversight, and high-risk AI obligations
CFPB Circular 2022-03 | https://www.consumerfinance.gov/compliance/circulars/circular-2022-03-adverse-action-notification-requirements-in-connection-with-credit-decisions-based-on-complex-algorithms/ | Reference for adverse action notice requirements when credit decisions rely on complex algorithms
Regulation B section 1002.9 | https://www.ecfr.gov/current/title-12/chapter-X/part-1002/section-1002.9 | Regulatory context for adverse action notices
W3C PROV | https://www.w3.org/TR/prov-overview/ | Use provenance thinking to organize explanation sources and evidence chains

In one sentence:

Explanation is a governed product interface, not raw model reasoning.


1. Explanation Layers

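Layer | Audience | Content | Must not include
User-facing explanation | Customers / staff | Concise reason, evidence, next steps, appeal path | Raw chain-of-thought
Regulator/auditor evidence | Audit / regulators | Decision record, source, reason code, control evidence | Unverifiable model guesses
Model/system validation | Model risk / validation | Eval results, segment behavior, failure modes | Excessive exposure of raw sensitive customer text
Engineering debug trace | Engineering team | Config, prompt, retrieval, tool calls, logs | Unauthorized PII
Internal reasoning trace | Internal to the system | Material that supports the system's reasoning | Direct use as a customer explanation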

Core boundary:

internal reasoning is not customer explanation

A customer explanation must be bound to facts, policy, a decision code, verifiable sources, and a correction path.
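A minimal Python sketch of this boundary, assuming the decision record is a flat dict and each audience reads it through a hypothetical field allowlist. The layer keys and field names below are illustrative and not taken from any standard. Because each view is built from an allowlist rather than a blocklist, the internal reasoning field reaches no audience unless someone deliberately adds it.

```python
from enum import Enum


class Layer(Enum):
    USER = "user_facing_explanation"
    AUDIT = "regulator_auditor_evidence"
    VALIDATION = "model_system_validation"
    DEBUG = "engineering_debug_trace"


# Hypothetical allowlists: each layer may read only the fields listed for it.
# "internal_reasoning" is deliberately absent from every allowlist, so no
# audience-facing view can expose it.
LAYER_FIELDS = {
    Layer.USER: {"reason_text", "evidence_summary", "next_steps", "appeal_path"},
    Layer.AUDIT: {"decision_id", "reason_code", "evidence_fields", "policy_version",
                  "source", "control_evidence"},
    Layer.VALIDATION: {"decision_id", "reason_code", "eval_results", "segment"},
    Layer.DEBUG: {"decision_id", "config", "prompt_version", "retrieval_ids", "tool_calls"},
}


def project(record: dict, layer: Layer) -> dict:
    """Return the view of a decision record that one audience is allowed to see."""
    allowed = LAYER_FIELDS[layer]
    return {key: value for key, value in record.items() if key in allowed}
```

Allowlisting is the conservative choice here. A new field added to the record stays invisible to every audience until a reviewer assigns it to a layer.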


2. Adverse Action Reason Architecture

High-risk mistake:

LLM reads application + invents plausible denial reason

Correct architecture:

decision engine / policy system
  -> canonical reason code
  -> approved reason language
  -> evidence fields
  -> user-facing explanation template
  -> audit record
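
Component | Role
Reason code source of truth | Prevents the model from inventing reasons
Evidence mapping | Binds each reason to specific fact fields
Approved templates | Control wording, clarity, and compliance framing
Explanation generator | Only formats the text and personalizes it for readability
Faithfulness evaluator | Checks that the explanation matches the reason code and evidence
Contestability workflow | Lets customers correct errors, supply missing documents, and appeal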
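The pipeline above can be sketched in Python. This assumes a hypothetical reason-code catalog that holds approved templates and required evidence fields. The codes, wording, and field names are invented for illustration and are not real Regulation B reason language. The generator only fills the approved template from evidence fields. It rejects any reason code that is not in the catalog and any missing evidence, and it records a SHA-256 hash of the text for the audit record.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedReason:
    code: str
    template: str                      # approved customer-facing wording
    evidence_fields: tuple[str, ...]   # facts the wording is bound to
    template_version: str


# Hypothetical catalog owned by the decision engine / policy system, not by the LLM.
CATALOG = {
    "R01": ApprovedReason(
        "R01",
        "Your reported monthly income ({income}) was below the product minimum ({minimum}).",
        ("income", "minimum"),
        "t-2024.1",
    ),
    "R02": ApprovedReason(
        "R02",
        "We could not verify your identity with the documents provided.",
        (),
        "t-2024.1",
    ),
}


def render_explanation(reason_code: str, evidence: dict, policy_version: str) -> dict:
    """Fill the approved template for a canonical reason code and build its audit record."""
    if reason_code not in CATALOG:
        # The generator cannot introduce a reason the decision system did not emit.
        raise ValueError(f"unknown reason code {reason_code!r}")
    reason = CATALOG[reason_code]
    missing = [f for f in reason.evidence_fields if f not in evidence]
    if missing:
        raise ValueError(f"missing evidence fields for {reason_code}: {missing}")
    # Only the fields bound to this reason are used, so unrelated data never reaches the text.
    bound = {f: evidence[f] for f in reason.evidence_fields}
    text = reason.template.format(**bound)
    return {
        "text": text,
        "audit": {
            "reason_code": reason_code,
            "evidence_fields": bound,
            "policy_version": policy_version,
            "template_version": reason.template_version,
            "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        },
    }
```

Template filling is deterministic here. If an LLM rewrites the filled template for readability, that step sits after this function, and the faithfulness evaluator compares its output against the same reason code and evidence.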

3. Contestability Architecture

customer receives explanation
  -> understands reason and next action
  -> submits correction / appeal / missing evidence
  -> human review
  -> data correction or decision reconsideration
  -> outcome notification
  -> evidence retained
  -> monitoring updates

Contestability is not just "contact customer service." It must specify:

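  • Which decisions can be appealed.
  • How customers submit new evidence.
  • Who reviews the case.
  • Whether the rules/model will be re-run.
  • How long the customer waits for a result.
  • How the decision to maintain or change the outcome is explained.
  • How outcomes feed into quality and fairness monitoring.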
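One way to make these questions explicit is to encode the workflow as a state machine. The case record then has to answer each question before the case can advance. Below is a minimal Python sketch. The states mirror the flow above. The field names and the 30-day review SLA are hypothetical placeholders, not regulatory deadlines.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum, auto
from typing import Optional


class State(Enum):
    EXPLAINED = auto()   # customer has received the explanation
    SUBMITTED = auto()   # correction / appeal / missing evidence received
    IN_REVIEW = auto()   # assigned to a named human reviewer
    RESOLVED = auto()    # data corrected or decision reconsidered
    NOTIFIED = auto()    # outcome sent to the customer


ALLOWED = {
    State.EXPLAINED: {State.SUBMITTED},
    State.SUBMITTED: {State.IN_REVIEW},
    State.IN_REVIEW: {State.RESOLVED},
    State.RESOLVED: {State.NOTIFIED},
    State.NOTIFIED: set(),
}


@dataclass
class AppealCase:
    case_id: str
    appealable: bool                   # which decisions can be appealed
    reviewer: Optional[str] = None     # who reviews
    rerun_decision: bool = False       # whether rules/models are re-run
    outcome: Optional[str] = None      # "appeal_upheld" or "decision_maintained"
    sla_days: int = 30                 # hypothetical time-to-outcome target
    submitted_on: Optional[date] = None
    state: State = State.EXPLAINED
    history: list = field(default_factory=list)  # retained evidence trail

    def advance(self, new_state: State, on: date) -> None:
        if not self.appealable:
            raise ValueError("decision is not appealable")
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        if new_state is State.IN_REVIEW and not self.reviewer:
            raise ValueError("a named human reviewer is required")
        if new_state is State.NOTIFIED and self.outcome is None:
            raise ValueError("cannot notify the customer without an outcome")
        if new_state is State.SUBMITTED:
            self.submitted_on = on
        self.history.append((on, self.state.name, new_state.name))
        self.state = new_state

    def overdue(self, today: date) -> bool:
        """True if the case has passed its SLA without the customer being notified."""
        return (self.submitted_on is not None
                and self.state is not State.NOTIFIED
                and today > self.submitted_on + timedelta(days=self.sla_days))
```

The monitoring step is left out of this sketch. In practice, each resolved case would also emit an event, so that upheld appeals show up in the metrics in section 7.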

4. Retail Finance Cases

4.1 Credit denial explanation assistant

AI may:

  • Translate the formal reason code into plain language.
  • Explain the next steps the customer can take.
  • Check whether any required disclosure is missing.

AI must not:

  • Invent denial reasons.
  • Modify the formal reason code.
  • Give inconsistent advice about reapplying.
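These may/must-not lists can be enforced as a check that runs after generation. Below is a minimal Python sketch. It assumes the assistant returns structured output containing the reason codes it cites, the customer text, and a reapplication-advice key. The disclosure phrases and advice keys are hypothetical placeholders. Substring matching is only a crude proxy for a real disclosure review.

```python
def check_assistant_output(
    official_codes: list[str],
    output: dict,
    required_disclosures: list[str],
    approved_advice: set[str],
) -> list[str]:
    """Return policy violations for a draft; an empty list means it may proceed to delivery."""
    violations = []
    cited = set(output.get("reason_codes", []))
    official = set(official_codes)
    if cited - official:
        violations.append(f"invented reason codes: {sorted(cited - official)}")
    if official - cited:
        violations.append(f"omitted or altered reason codes: {sorted(official - cited)}")
    text = output.get("text", "").lower()
    for phrase in required_disclosures:
        # Crude proxy: a real check would use reviewed disclosure blocks, not substrings.
        if phrase.lower() not in text:
            violations.append(f"missing required disclosure: {phrase!r}")
    advice = output.get("reapply_advice")
    if advice is not None and advice not in approved_advice:
        violations.append(f"unapproved reapplication advice: {advice!r}")
    return violations
```

Returning a list of violations instead of a single boolean lets a reviewer see every problem in a draft at once, and each entry can be counted toward the KRIs in section 7.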

4.2 Overdraft fee explanation

AI may:

  • Cite the fee schedule.
  • Explain the transaction timeline.
  • Guide the customer to open a dispute.

Risks:

  • Explaining a complex transaction ordering with unwarranted confidence.
  • Ignoring customer complaints and vulnerable customer signals.
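To avoid overconfident timeline explanations, the assistant can build the timeline only from posted transactions. It then hands off to a human when the ordering is genuinely ambiguous or a complaint or vulnerability signal is present. Below is a minimal Python sketch under simplifying assumptions: amounts are in integer cents, `posted_at is None` means pending, and identical posting timestamps count as an ambiguous order. Real posting order depends on the bank's own processing rules, which this sketch does not model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Txn:
    txn_id: str
    posted_at: Optional[datetime]   # None means the transaction is still pending
    amount_cents: int               # negative for debits
    description: str


def build_timeline(opening_balance_cents: int, txns: list[Txn],
                   vulnerable_signal: bool, open_complaint: bool = False) -> dict:
    """Build a posted-transaction timeline and decide whether a human should explain it."""
    pending = [t.txn_id for t in txns if t.posted_at is None]
    posted = sorted((t for t in txns if t.posted_at is not None),
                    key=lambda t: (t.posted_at, t.txn_id))
    timestamps = [t.posted_at for t in posted]
    balance = opening_balance_cents
    timeline = []
    for t in posted:
        balance += t.amount_cents
        timeline.append((t.posted_at.isoformat(), t.description, t.amount_cents, balance))
    handoff_reasons = []
    if pending:
        handoff_reasons.append("pending transactions")
    if len(timestamps) != len(set(timestamps)):
        # Same posting time: the order shown is a tie-break, not a fact to assert.
        handoff_reasons.append("transactions share a posting time")
    if vulnerable_signal:
        handoff_reasons.append("vulnerable customer signal")
    if open_complaint:
        handoff_reasons.append("open customer complaint")
    return {"timeline": timeline, "handoff_to_human": bool(handoff_reasons),
            "handoff_reasons": handoff_reasons}
```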

4.3 Fraud false positive

AI may:

  • Explain the security verification steps.
  • Give the customer a path to restore access.

Risks:

  • Revealing fraud detection rules.
  • Placing an excessive burden on the customer.
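One way to address both risks is to map internal detection signals to a small set of customer-safe verification steps. Signal names and thresholds then never appear in customer text. Below is a minimal Python sketch. The signal names, step wording, and the cap of two steps are hypothetical placeholders for values a fraud team would own.

```python
# Hypothetical mapping: internal detection signal -> customer-safe verification step.
# Customer text never includes signal names, thresholds, or scores.
SAFE_STEPS = {
    "velocity_rule_12": "Confirm the recent transactions in your app.",
    "device_mismatch": "Verify your identity with a one-time passcode.",
    "geo_anomaly": "Verify your identity with a one-time passcode.",
}
GENERIC_STEP = "Call the number on the back of your card to verify your identity."
MAX_STEPS = 2  # hypothetical cap to limit customer burden


def customer_steps(internal_signals: list[str]) -> list[str]:
    """Translate internal signals into deduplicated, capped customer-facing steps."""
    steps: list[str] = []
    for signal in internal_signals:
        step = SAFE_STEPS.get(signal, GENERIC_STEP)  # unknown signals get a generic step
        if step not in steps:
            steps.append(step)
    return steps[:MAX_STEPS]
```

Deduplicating the steps keeps several correlated signals from turning into a long checklist for the customer. Whether a cap is acceptable, and which steps take priority, remain decisions for the fraud and CX risk owners.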

5. Explanation Faithfulness Eval

Eval item | Pass criteria
Reason-code consistency | Explanation reason matches the source of truth
Evidence grounding | All factual claims link to evidence
No invented reason | No unsupported denial / fee / fraud cause
Policy version | Uses the current approved policy
Customer readability | Understandable at the target reading level
Appeal path | Next action / recourse path included
Segment consistency | Explanation quality is not worse for any language / channel / customer group
Prohibited wording | Avoids blame, guarantees, and legal overstatements
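Several of these items can be scored automatically for each case. Below is a minimal Python sketch. It assumes each case stores the source-of-truth reason codes and evidence next to a structured generated explanation, which is a hypothetical schema. Customer readability is left out because it needs a reading-level measure or a human rating. The prohibited phrase list is a placeholder.

```python
from collections import defaultdict

# Hypothetical list; a real list would come from compliance-approved wording rules.
PROHIBITED = ("your fault", "guaranteed", "we promise")


def eval_explanation(case: dict, current_policy: str) -> dict[str, bool]:
    """Score one generated explanation against its source-of-truth record."""
    gen = case["generated"]
    cited, official = set(gen["reason_codes"]), set(case["reason_codes"])
    evidence = case["evidence"]
    text = gen["text"].lower()
    return {
        "reason_code_consistency": cited == official,
        "evidence_grounding": all(
            c["evidence_key"] in evidence and evidence[c["evidence_key"]] == c["value"]
            for c in gen["claims"]
        ),
        "no_invented_reason": cited <= official,
        "policy_version": gen["policy_version"] == current_policy,
        "appeal_path": bool(gen.get("appeal_path")),
        "prohibited_wording": not any(p in text for p in PROHIBITED),
    }


def segment_pass_rates(cases: list[dict], current_policy: str) -> dict[str, float]:
    """Share of cases passing every scored item, per segment (e.g. language or channel)."""
    totals, passes = defaultdict(int), defaultdict(int)
    for case in cases:
        seg = case["segment"]
        totals[seg] += 1
        passes[seg] += all(eval_explanation(case, current_policy).values())
    return {seg: passes[seg] / totals[seg] for seg in totals}
```

Segment consistency can then be read off the per-segment pass rates, for example by alerting when the gap between the best and worst segment exceeds a tolerance the team sets.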

6. Adverse Action Evidence Pack

# Adverse Action Explanation Evidence Pack

Customer / case id:
Decision date:
Decision system:
Reason code:
Evidence fields:
Policy version:
Explanation template version:
Generated text hash:
Human review:
Customer delivery channel:
Appeal / correction path:
Contestability outcome:
Monitoring tags:
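The template's generated text hash lets an auditor confirm later that the text stored in the pack is the text the customer received. Below is a minimal Python sketch of assembling a pack. It assumes snake_case versions of the template's field names and uses SHA-256 as the hash. Which fields count as required at delivery time is also an assumption.

```python
import hashlib

# Assumed required at delivery; human review, contestability outcome, and
# monitoring tags may legitimately be empty until later in the case lifecycle.
REQUIRED = ["case_id", "decision_date", "decision_system", "reason_code", "evidence_fields",
            "policy_version", "explanation_template_version", "generated_text_hash",
            "customer_delivery_channel", "appeal_path"]


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_evidence_pack(fields: dict, generated_text: str) -> dict:
    """Assemble a pack, hash the delivered text, and refuse to emit an incomplete pack."""
    pack = dict(fields)
    pack["generated_text_hash"] = sha256_text(generated_text)
    pack.setdefault("human_review", None)
    pack.setdefault("contestability_outcome", None)
    pack.setdefault("monitoring_tags", [])
    missing = [key for key in REQUIRED if not pack.get(key)]
    if missing:
        raise ValueError(f"evidence pack incomplete: {missing}")
    return pack


def verify_text(pack: dict, delivered_text: str) -> bool:
    """True if the delivered text is byte-for-byte the text recorded in the pack."""
    return sha256_text(delivered_text) == pack["generated_text_hash"]
```

Rejecting a pack with an empty appeal path at build time turns the "missing appeal path" KRI into a hard control instead of a metric that is only found after delivery.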

7. Metrics / KRIs

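Metric | Risk it signals
Unsupported reason rate | Model invents or misapplies reasons
Explanation correction rate | Explanation quality problems
Appeal upheld rate | Decision or explanation errors
Segment explanation disparity | Explanation quality differs across customer groups
Stale policy citation | Outdated policy in use
Human override rate | AI explanations are unreliable
Complaint linkage | Customers do not understand or accept the explanation
Missing appeal path | Contestability failure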
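Most of these KRIs are simple ratios over case records. Below is a minimal Python sketch that assumes one dict per case with hypothetical boolean flags. Here, segment explanation disparity is defined as the gap in explanation correction rate between the best and worst segment. This is one possible definition, not a standard one. Complaint linkage is omitted because it requires joining to a complaints system.

```python
def kri_summary(cases: list[dict], current_policy: str) -> dict[str, float]:
    """Compute KRI ratios over a batch of case records (hypothetical schema)."""
    if not cases:
        raise ValueError("no cases to summarize")
    n = len(cases)
    appeals = [c for c in cases if c["appealed"]]
    correction_by_segment = {}
    for seg in {c["segment"] for c in cases}:
        seg_cases = [c for c in cases if c["segment"] == seg]
        correction_by_segment[seg] = (
            sum(c["explanation_corrected"] for c in seg_cases) / len(seg_cases))
    return {
        "unsupported_reason_rate": sum(c["unsupported_reason"] for c in cases) / n,
        "explanation_correction_rate": sum(c["explanation_corrected"] for c in cases) / n,
        # Denominator is appealed cases only, not all decisions.
        "appeal_upheld_rate": (sum(c["appeal_upheld"] for c in appeals) / len(appeals)
                               if appeals else 0.0),
        "segment_explanation_disparity": (max(correction_by_segment.values())
                                          - min(correction_by_segment.values())),
        "stale_policy_citation_rate": (
            sum(c["policy_version"] != current_policy for c in cases) / n),
        "human_override_rate": sum(c["human_override"] for c in cases) / n,
        "missing_appeal_path_rate": sum(not c["appeal_path"] for c in cases) / n,
    }
```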

8. Interview Talking Points

30-second version:

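I would design explanation as a governed product interface rather than exposing model reasoning. In financial services, a user-facing explanation must be bound to a reason code, evidence fields, a policy version, and an appeal path. The model can help generate readable text, but it must not invent denial reasons or replace the decision system. Contestability needs a closed loop: customer correction, human review, a re-run or correction, and outcome notification.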

2-minute version:

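I would split the explanation architecture into layers: user-facing explanations, audit/regulatory evidence, model validation evidence, engineering debug traces, and internal reasoning material. Customers see a concise, verifiable, compliant explanation. Auditors see the decision record, reason code, source, policy version, and control evidence.

For adverse action, the reason code must come from the decision engine or policy system, and the LLM may only generate readable text from approved templates and evidence fields. A faithfulness eval must check reason-code consistency, evidence grounding, no invented reason, policy freshness, appeal path, and segment consistency.

Contestability is not a single line saying "if you have questions, contact customer service." It is an executable workflow: the customer submits a correction or new evidence, a human reviews it, the rules/model are re-run when necessary, the customer is notified of the outcome, and upheld appeals feed back into monitoring and evals.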


9. Portfolio Exercise

Design the following for a credit explanation assistant:

  1. Explanation layer map.
  2. Reason-code mapping table.
  3. Explanation policy.
  4. Faithfulness eval set.
  5. Contestability workflow.
  6. Adverse action evidence pack.

Deliverables:

  • Explainability & Contestability Pack.
  • 1-page adverse action architecture memo.
  • 2-minute interview answer.