
Human-AI Interaction Guidelines: AI Product Design


Human-AI Interaction Guidelines / AI Product Design: A Reading Guide

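Audience: AI Product Manager / UX Lead / AI BA / AI Architect / Customer-Facing AI Owner.

Core problem: AI products often fail not because the model is incapable, but because users cannot tell when it is reliable and when it is not, how to correct it, how to escalate to a human, or how to regain control. Human-AI Interaction turns model capability into a product system that people can use safely.

Learning goals: understand Amershi et al.'s Human-AI Interaction Guidelines, the Google PAIR People + AI Guidebook, calibrated trust, automation bias, recoverability, feedback, and human-AI collaboration design, and map them to retail-finance customer and employee scenarios.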

Source Anchors

Source	Link	Purpose
Guidelines for Human-AI Interaction	https://www.microsoft.com/en-us/research/publication/guidelines-for-human-ai-interaction/	Understand the 18 HAI design guidelines and how they were validated on AI-infused products
People + AI Guidebook	https://pair.withgoogle.com/guidebook/	Reference for human-centered AI product design methods
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Bring human-AI interaction, transparency, reliability, and risk treatment into AI risk management
HAX Toolkit	https://www.microsoft.com/en-us/haxtoolkit/	Reference tools for designing, testing, and handling failure scenarios in human-centered AI

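In one sentence:

Human-AI Interaction is not about putting a polished interface on AI; it is about designing how people understand, control, correct, trust, reject, and escalate AI.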

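1. The Core Breakpoints in AI Product Design

When users work with AI, failures cluster at four breakpoints:

Breakpoint	Symptom	Risk
Expectation error	Users assume the AI can do things it cannot	Over-reliance, wrong decisions
Confidence error	Users cannot tell when the AI is uncertain	Trust calibration fails
Control error	Users do not know how to correct or undo	Harm from automation spreads
Accountability error	Users do not know who is ultimately responsible	Confusion in audits, complaints, and operations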

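Retail finance is more sensitive:

Customers may read an AI answer as a commitment from the bank.
Employees may treat a Copilot draft as the final judgment.
Risk systems may present model suggestions as facts.
Under pressure, users are more prone to automation bias.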

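2. Turning HAI Design Principles into Product Capabilities

The Microsoft HAI Guidelines map to four product stages:

Stage	Design question	Product capability
Initially	How do users first understand what the AI can do?	capability boundary, examples, stated limitations
During interaction	How does the AI express its state and uncertainty?	confidence UX, progress, evidence, reason
When wrong	How do users correct errors and recover?	edit, undo, appeal, human escalation
Over time	How does the system learn and stay consistent?	feedback loop, preference, monitoring, change notice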

2.1 Setting the Right Expectations

Don't say:

AI understands your banking needs.

Say instead:

AI can summarize policy articles and account service steps.
For credit decisions, complaints, and disputed transactions, it routes to a specialist.
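
The boundary statement above can be backed by an explicit routing rule, so the wording and the behavior stay in sync. The following is a minimal sketch, not a definitive implementation: the intent labels, the `BoundaryDecision` type, and `check_boundary` are hypothetical names introduced for illustration, and a real system would obtain the intent from its own classifier.

```python
from dataclasses import dataclass

# Hypothetical intent labels (assumption); not taken from any real product.
HIGH_RISK_INTENTS = {"credit_decision", "complaint", "disputed_transaction"}
SUPPORTED_INTENTS = {"policy_summary", "account_service_steps"}


@dataclass(frozen=True)
class BoundaryDecision:
    route: str    # "ai_answer", "specialist", or "clarify"
    message: str  # expectation-setting text shown to the user


def check_boundary(intent: str) -> BoundaryDecision:
    """Decide whether the AI may answer, based only on the declared boundary."""
    if intent in HIGH_RISK_INTENTS:
        return BoundaryDecision(
            "specialist", "This request needs a specialist. I am routing you now."
        )
    if intent in SUPPORTED_INTENTS:
        return BoundaryDecision(
            "ai_answer", "I can summarize policy articles and account service steps."
        )
    # Unknown intents are neither answered nor silently dropped.
    return BoundaryDecision(
        "clarify", "I am not sure I can help with that. Could you rephrase, or ask for a person?"
    )
```

Keeping the high-risk check first means a request that matches both sets would still go to a specialist.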

2.2 Expressing Uncertainty

Avoid showing false precision:

92.7% confident

A better way to put it:

The answer is supported by the current fee policy.
This does not apply to credit limit decisions.
Ask a specialist if you want a formal review.
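
One way to avoid raw percentages is to derive the wording from what the system actually knows: whether a supporting source exists and whether the topic is in scope. The sketch below assumes those two signals are already available; `render_answer_note` and its parameters are hypothetical names for illustration.

```python
from typing import Optional


def render_answer_note(source_title: Optional[str], in_scope: bool) -> str:
    """Return evidence-based wording instead of a raw confidence percentage."""
    if not in_scope:
        return "This topic is outside what I can answer. Ask a specialist for a formal review."
    if source_title is None:
        # No supporting document: say so rather than guessing a number.
        return "I could not find a supporting policy for this answer. Ask a specialist to confirm."
    return (
        f"The answer is supported by {source_title}. "
        "Ask a specialist if you want a formal review."
    )
```

For example, `render_answer_note("the current fee policy", True)` produces wording close to the recommended message above.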

2.3 Supporting Correction and Recovery

AI interactions must be recoverable:

Undo.
Edit.
Dispute.
Escalate.
Save evidence.
Explain why.
Report issue.
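
Several of these controls (undo, edit, save evidence) can share one small data structure. The following sketch is an assumption about how that might look, not a prescribed design; `DraftAction` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftAction:
    text: str
    history: List[str] = field(default_factory=list)  # earlier versions, for undo
    audit: List[str] = field(default_factory=list)    # saved evidence of each change

    def edit(self, new_text: str, reason: str) -> None:
        """Replace the AI draft and record why the human overrode it."""
        self.history.append(self.text)
        self.audit.append(f"edit: {reason}")
        self.text = new_text

    def undo(self) -> bool:
        """Restore the previous version; return False when nothing is left to undo."""
        if not self.history:
            return False
        self.text = self.history.pop()
        self.audit.append("undo")
        return True
```

The audit list only grows, so an undo never erases the record that an edit happened.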

3. Human-AI Interaction Architecture

user intent
  -> risk tier
  -> AI capability boundary
  -> model/RAG/agent output
  -> evidence and uncertainty layer
  -> UX policy renderer
  -> user control / feedback / escalation
  -> monitoring and learning loop

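Key components:

Component	Responsibility
Capability boundary service	Defines what the AI can and cannot do
Risk tier classifier	Decides whether a human is required or the wording must be restricted
Evidence layer	Citations, data sources, basis for calculations
Uncertainty layer	confidence, coverage, answerability, OOD
UX policy renderer	Renders wording, buttons, warnings, and human entry points according to risk
Feedback collector	Corrections, ratings, appeals, employee overrides
Escalation workflow	Warm handoff to a person, with a case record
Monitoring	over-reliance, override, complaints, task success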
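
To make the middle of this pipeline concrete, the sketch below shows a UX policy renderer that takes a risk tier plus the evidence and uncertainty signals and decides what the user sees. All names, the tier labels, and the 0.6 threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelOutput:
    answer: str
    citations: List[str]   # evidence layer: sources backing the answer
    answerability: float   # uncertainty layer: 0.0 (cannot answer) to 1.0


def render(risk_tier: str, output: ModelOutput, min_answerability: float = 0.6) -> dict:
    """UX policy renderer: choose what the user sees from risk and uncertainty."""
    if risk_tier == "high":
        # High-risk intents never get a direct AI answer in this sketch.
        return {"mode": "escalate", "text": "Connecting you with a specialist.", "show_handoff": True}
    if not output.citations or output.answerability < min_answerability:
        return {"mode": "clarify", "text": "I need more detail to answer reliably.", "show_handoff": True}
    return {
        "mode": "answer",
        "text": output.answer,
        "sources": output.citations,
        "show_handoff": risk_tier == "medium",  # keep the human entry visible for medium risk
    }
```

Putting the policy in one function, rather than in UI copy, is what lets the same rule be tested and audited.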

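4. Retail Finance Cases

Case A: Customer Service RAG

Risks:

The AI presents a policy explanation as a formal commitment.
It cites an outdated policy.
The customer does not know how to escalate to a human.

Design:

Every answer shows its source and scope of applicability.
Credit, complaints, and disputed transactions go to the human path automatically.
Users can flag an answer as “not applicable / need a human”.
Monitor unsupported answers, escalations, and complaints.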

Case B: AML Analyst Copilot

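Risks:

Analysts over-rely on the summary.
The AI misses key suspicious signals.
The case narrative gets copied as fact.

Design:

AI output separates “observed evidence” from “suggested hypothesis”.
The analyst must confirm key evidence.
Every AI draft retains its provenance.
High-risk cases cannot be submitted with one click.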
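
The Case B design can be sketched as a draft object that keeps evidence and hypotheses in separate fields and gates submission on analyst confirmation. This is an illustrative assumption: `CopilotDraft` is a hypothetical name, and requiring every evidence item to be confirmed is a stricter rule than "key evidence" and was chosen only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CopilotDraft:
    observed_evidence: List[str]      # facts tied to source records
    suggested_hypotheses: List[str]   # model interpretations, not facts
    confirmed: Set[str] = field(default_factory=set)

    def confirm(self, evidence_item: str) -> None:
        """Record that the analyst has checked one evidence item."""
        if evidence_item not in self.observed_evidence:
            raise ValueError("can only confirm listed evidence")
        self.confirmed.add(evidence_item)

    def can_submit(self, high_risk: bool) -> bool:
        """High-risk cases need every evidence item confirmed before submission."""
        if not high_risk:
            return True
        return bool(self.observed_evidence) and self.confirmed == set(self.observed_evidence)
```

Hypotheses can never be confirmed through `confirm`, which keeps them from being promoted to facts by accident.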

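Case C: Credit Employee Assistant

Risks:

Employees mistake a suggestion for the approval outcome.
Customers receive an incomplete explanation.

Design:

The AI may only generate explanation drafts and policy citations.
The formal adverse action reason comes from a controlled rules/model governance process.
Employees must confirm the version and scope of applicability.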

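5. Controlling Automation Bias

Risk	Control
Users trust the AI by default	Show evidence and limitations; avoid absolute language
Employees skip checks	forced review checklist
AI suggestions influence human labels	blind review, or delay showing the suggestion
Correction costs too much	Fast edit, undo, appeal
Human entry point is hidden	Explicit escalation in high-risk scenarios

Design for calibrated trust: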

trust AI when evidence is strong
doubt AI when uncertainty is high
control AI when action is high impact
escalate when responsibility must be human
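
These four rules can conflict, for example when evidence is strong but the action is high impact. The sketch below resolves conflicts by checking the most restrictive rule first; that precedence order is an assumption of this example, and `recommended_stance` is a hypothetical name.

```python
def recommended_stance(
    evidence_strong: bool,
    uncertainty_high: bool,
    high_impact: bool,
    human_accountable: bool,
) -> str:
    """Map the four calibrated-trust rules to one stance, most restrictive first."""
    if human_accountable:
        return "escalate"
    if high_impact:
        return "control"
    if uncertainty_high or not evidence_strong:
        return "doubt"
    return "trust"
```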

6. HAI Eval Metrics

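Metric	Meaning
Task success	Whether the user completed the right task
Appropriate reliance	Trusting the AI when trust is warranted and doubting it when it is not
Override quality	Whether user corrections improve the outcome
Escalation precision	Whether escalations hit high-risk or uncertain cases
Recovery time	How long recovery takes after an error
Complaint / appeal	Whether customers were misled or harmed
Trust calibration	How well users understand the AI's capability boundary
Human workload	Whether the AI really reduces work or adds checking burden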
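
Two of these metrics can be computed from simple interaction logs. The sketch below uses one plausible operationalization, which is an assumption rather than a standard definition: appropriate reliance is the share of interactions where the user accepted a correct output or rejected an incorrect one, and escalation precision is the share of escalated cases that truly needed a human.

```python
from typing import Iterable, Tuple


def appropriate_reliance(events: Iterable[Tuple[bool, bool]]) -> float:
    """events: (ai_was_correct, user_accepted) pairs from reviewed interactions."""
    events = list(events)
    if not events:
        return 0.0
    # Reliance is appropriate when acceptance matches correctness.
    good = sum(1 for ai_correct, accepted in events if ai_correct == accepted)
    return good / len(events)


def escalation_precision(escalations: Iterable[bool]) -> float:
    """escalations: for each escalated case, whether it truly needed a human."""
    escalations = list(escalations)
    if not escalations:
        return 0.0
    return sum(escalations) / len(escalations)
```

Both functions need ground-truth labels (was the AI correct, did the case need a human), which in practice come from later review rather than from the live interaction.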

7. Product Requirement Patterns

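Requirement	Senior-level wording
Capability boundary	For high-risk intents, the system must state that it cannot give a formal decision and must offer a human path
Evidence display	Every policy answer must show source, version, and effective date
Uncertainty handling	When answerability falls below the threshold, the system must ask a clarifying question or decline to answer
Recoverability	Users can undo AI-generated actions; employees can edit and record an override reason
Feedback	Feedback must go into a QA queue and must not train the model directly
Change notice	Major changes in AI behavior must be communicated to employees, with training updated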
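
Requirements written this way are testable. As a sketch under stated assumptions, the evidence display and feedback rows could be checked like this; the field names, the `pending_qa` status, and both function names are hypothetical.

```python
from collections import deque

# Fields every policy answer must carry (from the evidence display requirement).
REQUIRED_EVIDENCE_FIELDS = ("source", "version", "effective_date")


def missing_evidence_fields(evidence: dict) -> list:
    """Return required evidence fields that are absent or empty."""
    return [name for name in REQUIRED_EVIDENCE_FIELDS if not evidence.get(name)]


def submit_feedback(queue: deque, answer_id: str, label: str) -> None:
    """Queue feedback for QA review instead of using it as a training label."""
    queue.append({"answer_id": answer_id, "label": label, "status": "pending_qa"})
```

An answer whose `missing_evidence_fields` result is non-empty would be blocked from display or sent for clarification, depending on the risk tier.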

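8. Interview Talking Points

30-Second Version

The core of Human-AI Interaction is calibrating user trust. An AI product must let users know what the system can do, what the evidence is, when it is uncertain, how to correct it, and how to escalate to a human. In retail finance I would not design only the “answer interface”; I would design the capability boundary, confidence/evidence UX, recoverability, feedback, and escalation.

2-Minute Version

Take customer service RAG as an example: the model can answer questions about fee policy, but it cannot replace a formal credit decision or complaint handling. I would first define the capability boundary by risk tier, then connect retrieved evidence, cited versions, and uncertainty to the UX. Low-risk FAQs can be answered directly; when risk is high or evidence is insufficient, the system asks for clarification, declines to answer, or hands off to a human. Users must be able to flag an answer as not applicable, and employees must be able to see the evidence and override. For monitoring I would track task success, appropriate reliance, unsupported answers, complaints, and escalation precision. The goal of this design is not to make users trust the AI blindly, but to let them know when to trust it and when to stop.

CTO Follow-Up

If asked why HAI is an architecture problem, I would answer: the interface only handles presentation; behind it you need a risk tier, evidence layer, uncertainty layer, policy renderer, feedback pipeline, and human escalation workflow. Without that architecture, UX copy cannot be guaranteed to be consistent, safe, and auditable.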

9. Portfolio Task

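Build a “Human-AI Interaction Product Pack”:

Artifact	Contents
Interaction risk map	Low/medium/high-risk intents and the actions allowed for each
Capability boundary spec	AI capabilities, limitations, and prohibited commitments
Confidence UX matrix	Wording for evidence, uncertainty, refusal, and escalation
Recovery flow	undo, edit, appeal, human handoff
Feedback loop	feedback taxonomy, QA queue, label governance
HAI eval report	appropriate reliance, override, complaint, task success

In the end you should be able to explain clearly: the experience quality of an AI product is not about “seeming human”; it is about making people efficient when the AI is right, cautious when it is uncertain, and able to recover when it is wrong.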