
AI Explainability: Explainable and Contestable Architecture


AI Explainability / Contestability / Adverse Action Architecture: A Guided Reading

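Audience: AI PM / Product Architect / Compliance Product Lead / Senior BA / Model Risk / Customer Experience Risk.

Core question: explanation does not mean showing users the model's chain-of-thought. Regulated financial AI must design user-facing explanations, internal evidence, reason codes, appeal paths, human review, and audit queries as product architecture.

Learning objectives: design explanation layers, an adverse action reason architecture, a contestability workflow, explanation evals, and an audit evidence pack.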

Source Anchors

Source	Link	Purpose
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Reference for transparency, accountability, measurement, and risk management
EU AI Act	https://eur-lex.europa.eu/eli/reg/2024/1689/oj	Reference for transparency, human oversight, and high-risk AI obligations
CFPB Circular 2022-03	https://www.consumerfinance.gov/compliance/circulars/circular-2022-03-adverse-action-notification-requirements-in-connection-with-credit-decisions-based-on-complex-algorithms/	Reference for adverse action notice requirements in credit decisions based on complex algorithms
Regulation B section 1002.9	https://www.ecfr.gov/current/title-12/chapter-X/part-1002/section-1002.9	Reference for the regulatory context of adverse action notices
W3C PROV	https://www.w3.org/TR/prov-overview/	Apply provenance thinking to organize explanation sources and the evidence chain

In one sentence:

Explanation is a governed product interface, not raw model reasoning.

1. Explanation Layers

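Layer	Audience	Content	Must not contain
User-facing explanation	Customers / staff	Concise reasons, evidence, next steps, appeal path	Raw chain-of-thought
Regulator/auditor evidence	Auditors / regulators	Decision record, source, reason code, control evidence	Unverifiable model guesses
Model/system validation	Model risk / validation	Eval results, segment behavior, failure modes	Excessive exposure of sensitive customer text
Engineering debug trace	Engineering team	Config, prompt, retrieval, tools, logs	Unauthorized PII
Internal reasoning trace	Internal to the system	Reasoning aids	Direct use as the customer explanation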
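One way to enforce these layer boundaries is a default-deny field projection. Each audience gets an explicit allow-list, and anything not listed, including internal reasoning, never leaves the record. The sketch below assumes a flat dict per decision. The layer names and field names (`reason_text`, `internal_reasoning`, and so on) are hypothetical illustrations, not a standard schema.

```python
from enum import Enum

# Hypothetical audience layers mirroring the table; field names are illustrative.
class Layer(Enum):
    USER = "user_facing"
    AUDIT = "regulator_auditor"
    VALIDATION = "model_validation"
    DEBUG = "engineering_debug"

# Default-deny allow-lists: a field reaches a layer only if listed here.
# "internal_reasoning" is deliberately absent from every layer in this sketch.
ALLOWED_FIELDS = {
    Layer.USER: {"reason_text", "evidence_summary", "next_steps", "appeal_path"},
    Layer.AUDIT: {"decision_record", "reason_code", "sources", "policy_version", "control_evidence"},
    Layer.VALIDATION: {"eval_results", "segment_metrics", "failure_modes"},
    Layer.DEBUG: {"config", "prompt_version", "retrieval_ids", "tool_calls", "log_refs"},
}

def project(record: dict, layer: Layer) -> dict:
    """Return only the fields the given audience layer is allowed to see."""
    allowed = ALLOWED_FIELDS[layer]
    return {k: v for k, v in record.items() if k in allowed}
```

The design uses an allow-list rather than a deny-list. A newly added field, such as a new trace attribute, therefore stays hidden from customers until someone deliberately approves it for that layer.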

Core boundary:

internal reasoning is not customer explanation

A customer explanation must be bound to facts, policy, a decision code, verifiable sources, and a path for correcting errors.

2. Adverse Action Reason Architecture

High-risk failure mode:

LLM reads application + invents plausible denial reason

Correct architecture:

decision engine / policy system
  -> canonical reason code
  -> approved reason language
  -> evidence fields
  -> user-facing explanation template
  -> audit record

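Component	Role
Reason code source of truth	Prevents the model from inventing reasons
Evidence mapping	Binds every reason to factual fields
Approved templates	Control wording, clarity, and compliance language
Explanation generator	Only formats and personalizes text for readability
Faithfulness evaluator	Checks that the explanation matches the code and evidence
Contestability workflow	Lets customers correct errors, supply documents, and appeal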
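Here is a minimal sketch of this pipeline, assuming the decision engine already emits reason codes. The catalog is the only place reason wording lives, every code must have its evidence fields present, and the rendered text is hashed into the audit record. `REASON_CATALOG`, the codes `R01`/`R02`, and their wording are hypothetical. A deterministic template stands in for the explanation generator. An LLM rewrite step for readability could be added after rendering, but in this design it would still have to pass the faithfulness check rather than add reasons.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical reason-code catalog: the single source of truth for denial reasons.
# Codes, wording, and evidence fields are illustrative, not regulatory text.
REASON_CATALOG = {
    "R01": {"text": "Debt-to-income ratio above our lending policy limit",
            "evidence": ["debt_to_income"]},
    "R02": {"text": "Insufficient length of credit history",
            "evidence": ["credit_history_months"]},
}

@dataclass(frozen=True)
class Decision:
    case_id: str
    reason_codes: tuple   # emitted by the decision engine, never by the LLM
    facts: dict           # evidence fields from application / bureau data
    policy_version: str

def build_explanation(decision: Decision, template_version: str = "tmpl-v1") -> dict:
    lines = []
    for code in decision.reason_codes:
        entry = REASON_CATALOG.get(code)
        if entry is None:
            # Fail closed: an unmapped code is an error, not an invitation to improvise.
            raise ValueError(f"unmapped reason code {code}")
        missing = [f for f in entry["evidence"] if f not in decision.facts]
        if missing:
            raise ValueError(f"{code} lacks evidence fields {missing}")
        lines.append(entry["text"])  # approved language only
    text = "Key reasons for this decision:\n" + "\n".join(f"- {t}" for t in lines)
    audit = {
        "case_id": decision.case_id,
        "reason_codes": list(decision.reason_codes),
        # Evidence mapping: each code carries the exact fact values it relied on.
        "evidence": {c: {f: decision.facts[f] for f in REASON_CATALOG[c]["evidence"]}
                     for c in decision.reason_codes},
        "policy_version": decision.policy_version,
        "template_version": template_version,
        "text_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    return {"text": text, "audit": audit}
```

Unknown codes and missing evidence raise an error instead of falling back to free text. This is the fail-closed behavior the reason-code source of truth exists to provide.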

3. Contestability Architecture

customer receives explanation
  -> understands reason and next action
  -> submits correction / appeal / missing evidence
  -> human review
  -> data correction or decision reconsideration
  -> outcome notification
  -> evidence retained
  -> monitoring updates
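The workflow above can be modeled as a small state machine. No case can then skip human review, close without notifying the customer, or end before evidence is retained and monitoring is updated. The stage names and allowed transitions are assumptions read off the flow above. For instance, the sketch assumes a data correction may lead either to reconsideration or directly to notification. The transitions are not taken from any regulation.

```python
from enum import Enum, auto

class Stage(Enum):
    EXPLANATION_SENT = auto()
    SUBMITTED = auto()           # correction / appeal / missing evidence received
    HUMAN_REVIEW = auto()
    DATA_CORRECTED = auto()
    RECONSIDERED = auto()        # rules/model re-run or decision re-examined
    OUTCOME_NOTIFIED = auto()
    EVIDENCE_RETAINED = auto()
    MONITORING_UPDATED = auto()

# Hypothetical allowed transitions; every path passes through human review.
TRANSITIONS = {
    Stage.EXPLANATION_SENT: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.HUMAN_REVIEW},
    Stage.HUMAN_REVIEW: {Stage.DATA_CORRECTED, Stage.RECONSIDERED},
    Stage.DATA_CORRECTED: {Stage.RECONSIDERED, Stage.OUTCOME_NOTIFIED},
    Stage.RECONSIDERED: {Stage.OUTCOME_NOTIFIED},
    Stage.OUTCOME_NOTIFIED: {Stage.EVIDENCE_RETAINED},
    Stage.EVIDENCE_RETAINED: {Stage.MONITORING_UPDATED},
    Stage.MONITORING_UPDATED: set(),
}

class ContestCase:
    def __init__(self, case_id: str):
        self.case_id = case_id
        self.history = [Stage.EXPLANATION_SENT]

    def advance(self, nxt: Stage) -> None:
        cur = self.history[-1]
        if nxt not in TRANSITIONS[cur]:
            raise ValueError(f"{self.case_id}: illegal transition {cur.name} -> {nxt.name}")
        self.history.append(nxt)

    @property
    def closed(self) -> bool:
        # A case closes only once its outcome has fed monitoring, not at notification.
        return self.history[-1] is Stage.MONITORING_UPDATED
```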

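Contestability is not "contact customer service." It must specify:

Which decisions can be appealed.
How customers submit new evidence.
Who reviews.
Whether rules/models are re-run.
How long until an outcome is delivered.
How a decision that is upheld or changed is explained.
How outcomes feed quality and fairness monitoring.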
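The checklist can be turned into a policy spec that is validated before a contestability flow goes live. The sketch below is hypothetical. It defines one field per question above and a `gaps` helper that reports unanswered items. `False` for re-running rules/models counts as a valid answer. An empty reviewer or a non-positive SLA does not. Any real SLA value is a business and legal decision, and nothing here implies a specific deadline.

```python
from dataclasses import dataclass, fields

@dataclass
class ContestabilityPolicy:
    # One hypothetical field per checklist question.
    appealable_decisions: tuple         # which decisions can be contested
    evidence_channels: tuple            # how customers submit new evidence
    reviewer_role: str                  # who reviews
    rerun_rules_or_model: bool          # whether rules/models are re-run
    response_sla_days: int              # how long until an outcome
    outcome_explanation_template: str   # how upheld/changed decisions are explained
    monitoring_feed: str                # where outcomes enter quality/fairness monitoring

def gaps(policy: ContestabilityPolicy) -> list:
    """List checklist items left unanswered; 'contact customer service' alone fails."""
    missing = []
    for f in fields(policy):
        value = getattr(policy, f.name)
        if value is None or value == "" or value == ():
            missing.append(f.name)
    if isinstance(policy.response_sla_days, int) and policy.response_sla_days <= 0:
        missing.append("response_sla_days")
    return missing
```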

4. Retail Financial Services Cases

4.1 Credit denial explanation assistant

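AI may:

Translate formal reason codes into plain language.
Explain the next steps the customer can take.
Check for missing required disclosures.

AI must not:

Invent denial reasons.
Modify formal reason codes.
Give inconsistent reapplication advice.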

4.2 Overdraft fee explanation

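AI may:

Cite the fee schedule.
Explain the transaction timeline.
Guide the customer to open a dispute.

Risks:

Overconfident explanations of complex transaction ordering.
Ignoring customer complaints and vulnerable-customer signals.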

4.3 Fraud false positive

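AI may:

Explain the security verification steps.
Give the customer a path to restore access.

Risks:

Revealing fraud detection rules.
Placing an excessive burden on the customer.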

5. Explanation Faithfulness Eval

Eval item	Pass criteria
Reason-code consistency	explanation reason matches source of truth
Evidence grounding	all factual claims link to evidence
No invented reason	no unsupported denial / fee / fraud cause
Policy version	uses current approved policy
Customer readability	understandable at target reading level
Appeal path	next action / recourse path included
Segment consistency	explanation quality does not degrade by language, channel, or customer group
Prohibited wording	avoids blame, guarantees, legal overstatements
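Several of these criteria can be checked deterministically. This requires the generator to return structured citations, meaning the reason codes and evidence fields it relied on, alongside its text. That requirement, the `SourceOfTruth`/`Generated` shapes, and the keyword lists below are assumptions for illustration. Readability and segment consistency are left out. They need a readability measure and aggregation across many cases rather than a per-explanation check.

```python
import re
from dataclasses import dataclass

@dataclass
class SourceOfTruth:          # hypothetical: what the decision system recorded
    reason_codes: set
    evidence_fields: set
    current_policy_version: str

@dataclass
class Generated:              # hypothetical: generator output with structured citations
    text: str
    cited_reason_codes: set
    cited_evidence_fields: set
    policy_version: str

# Illustrative keyword lists only; a real prohibited-wording policy is much broader.
PROHIBITED = [r"\bguarantee", r"\byour fault\b", r"\billegal\b"]
APPEAL_MARKERS = ["appeal", "request a review", "dispute"]

def evaluate(src: SourceOfTruth, gen: Generated) -> dict:
    lower = gen.text.lower()
    return {
        "reason_code_consistency": gen.cited_reason_codes == src.reason_codes,
        "no_invented_reason": gen.cited_reason_codes <= src.reason_codes,
        # Approximation: only checks that cited fields exist in the source of
        # truth; tying each sentence to a field needs a stronger method.
        "evidence_grounding": bool(gen.cited_evidence_fields)
                              and gen.cited_evidence_fields <= src.evidence_fields,
        "policy_version": gen.policy_version == src.current_policy_version,
        "appeal_path": any(m in lower for m in APPEAL_MARKERS),
        "prohibited_wording": not any(re.search(p, lower) for p in PROHIBITED),
    }
```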

6. Adverse Action Evidence Pack

# Adverse Action Explanation Evidence Pack

Customer / case id:
Decision date:
Decision system:
Reason code:
Evidence fields:
Policy version:
Explanation template version:
Generated text hash:
Human review:
Customer delivery channel:
Appeal / correction path:
Contestability outcome:
Monitoring tags:
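The template maps naturally onto an immutable record. In this sketch, the field names are hypothetical renderings of the template lines. The generated text is stored as a SHA-256 hash, so a reviewer can later confirm that a stored or re-rendered copy is byte-identical to what the customer received.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class EvidencePack:
    # Hypothetical field names, one per line of the template above.
    case_id: str
    decision_date: date
    decision_system: str
    reason_codes: tuple
    evidence_fields: dict
    policy_version: str
    template_version: str
    generated_text_sha256: str
    human_review: Optional[str]       # reviewer id/notes, or None if not reviewed
    delivery_channel: str
    appeal_path: str
    contestability_outcome: Optional[str] = None
    monitoring_tags: tuple = ()

def text_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify_delivered_text(pack: EvidencePack, delivered_text: str) -> bool:
    """Audit check: does the text the customer received match the recorded hash?"""
    return text_hash(delivered_text) == pack.generated_text_sha256
```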

7. Metrics / KRIs

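Metric	Risk signaled
Unsupported reason rate	Model invents or misuses reasons
Explanation correction rate	Explanation quality problems
Appeal upheld rate	Errors in the decision or the explanation
Segment explanation disparity	Differences in explanation quality across groups
Stale policy citation	Outdated policy in use
Human override rate	AI explanations are unreliable
Complaint linkage	Customers do not understand or accept the explanation
Missing appeal path	Contestability failure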
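Several of these KRIs are simple ratios over per-case outcomes. The sketch below computes a subset, assuming each case is already tagged by the faithfulness eval and the contestability workflow. The `CaseOutcome` fields are hypothetical. Segment disparity is approximated as the gap between the highest and lowest segment on unsupported-reason rate, which is only one possible quality proxy. Explanation correction rate, stale policy citation, and complaint linkage would follow the same pattern with additional tags.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CaseOutcome:
    segment: str               # hypothetical grouping key, e.g. language or channel
    unsupported_reason: bool   # faithfulness eval flagged an invented/misused reason
    appealed: bool
    appeal_upheld: bool
    human_override: bool
    appeal_path_present: bool

def rate(num: int, den: int) -> float:
    return num / den if den else 0.0

def kris(cases: list) -> dict:
    n = len(cases)
    appealed = [c for c in cases if c.appealed]
    by_seg = defaultdict(list)
    for c in cases:
        by_seg[c.segment].append(c)
    seg_unsupported = {s: rate(sum(c.unsupported_reason for c in cs), len(cs))
                       for s, cs in by_seg.items()}
    return {
        "unsupported_reason_rate": rate(sum(c.unsupported_reason for c in cases), n),
        # Denominator is appealed cases only, not all decisions.
        "appeal_upheld_rate": rate(sum(c.appeal_upheld for c in appealed), len(appealed)),
        "human_override_rate": rate(sum(c.human_override for c in cases), n),
        "missing_appeal_path_rate": rate(sum(not c.appeal_path_present for c in cases), n),
        # Disparity proxy: best-to-worst segment gap on unsupported-reason rate.
        "segment_disparity": (max(seg_unsupported.values()) - min(seg_unsupported.values()))
                             if seg_unsupported else 0.0,
    }
```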

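8. Interview Talking Points

30-second version:

I would design explanation as a governed product interface rather than exposing model reasoning. In financial services, a user-facing explanation must be bound to a reason code, evidence fields, a policy version, and an appeal path. The model can help generate readable text, but it must not invent denial reasons or replace the decision system. Contestability needs a closed loop: customer correction, human review, re-run or correction, and outcome notification.

2-minute version:

I would layer the explanation architecture: user-facing explanations, audit/regulatory evidence, model validation evidence, engineering debug traces, and internal reasoning material. Customers see a concise, verifiable, compliant explanation. Auditors see the decision record, reason code, source, policy version, and control evidence. For adverse action, the reason code must come from the decision engine or policy system, and the LLM may only generate readable text from approved templates and evidence fields. A faithfulness eval must check reason-code consistency, evidence grounding, no invented reasons, policy freshness, the appeal path, and segment consistency. Contestability is not a line saying "contact customer service if you have questions." It is an executable workflow. The customer submits a correction or new evidence, a human reviews it, and rules/models are re-run when needed. The outcome is then communicated, and upheld appeals feed back into monitoring and evals.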

9. Portfolio Exercise

Design the following for a credit explanation assistant:

Explanation layer map.
Reason-code mapping table.
Explanation policy.
Faithfulness eval set.
Contestability workflow.
Adverse action evidence pack.

Deliverables:

Explainability & Contestability Pack.
A one-page adverse action architecture memo.
A 2-minute interview answer.