AIPA Day 93

AI Act Mapping II: Articles 13-15 Landed on the HITL Gateway, Eval Suite, and Red Team

2026-09-15


Date: 2026-09-15 Phase: Phase 3 - AML Investigation Copilot Tags: #eu-ai-act #human-oversight #automation-bias #robustness

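Core Questions

Day 92 mapped Articles 9-12 (risk management, data governance, technical documentation, record-keeping) onto the failureTaxonomy, data lineage, and the trace foundation. Today covers the second half, Articles 13-15: transparency (13), human oversight (14), and accuracy, robustness and cybersecurity (15). These three cut closer to the bone than 9-12, because they directly define how the AI and humans collaborate and how the system holds up under attack.

Today answers three questions:

Article 14 human oversight: is this project's HITL actually compliant? Day 49 argued that HITL triggers the Article 50(4) editorial-responsibility exemption. But Article 14's requirements for high-risk systems are far stricter than "someone reviews it": it explicitly targets automation bias. Today argues a counterintuitive point: "the analyst approves every single draft" is, in the Article 14 sense, a signal of oversight failure, not of success.
What does Article 15's "robustness and cybersecurity" require? It names data poisoning, adversarial examples, and model evasion, exactly the attack surfaces targeted by the P2 "MCPTox red team" (Day 52). This is the third isomorphism between a compliance requirement and red-team engineering.
How does data residency plug in? High-risk status plus financial data → data-residency routing (EU data stays in the EU), wired into the P2 gateway.

Today maps Articles 13-15 onto components and finalizes the gap list drafted on Day 92.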

Key Content

A. Article 14 human oversight → HITL gateway, but automation bias is the real test

A close reading of Article 14 (transcription at artificialintelligenceact.eu). 14(1):

"High-risk AI systems shall be designed and developed in such a way...that they can be effectively overseen by natural persons during the period in which they are in use."

The crux is the five specific capabilities in 14(4) that the overseer must be enabled to exercise:

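14(4)	Legal text (excerpt)	This project's HITL counterpart
(a) Understand capacities and limitations	"properly understand the relevant capacities and limitations"	UI explicitly shows AI confidence and limitations (Day 5 HITL UX)
(b) Stay alert to automation bias	"remain aware of the possible tendency of automatically relying or over-relying"	Confidence signals + mandatory-review tier (see Insight ①)
(c) Interpret output correctly	"correctly interpret...taking into account interpretation tools"	Transparent reasoning: show the evidence chain, not just the conclusion (Day 5)
(d) Disregard or override output	"disregard, override or reverse the output"	HITL can edit or reject the SAR draft; no one-click accept-all
(e) Intervene or stop	"intervene...or interrupt through a stop button"	risk gateway circuit breaker (Day 53) + submission pause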

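14(4)(b) is both the most overlooked and the hardest of the five: it requires the overseer to be aware of the tendency to over-rely on AI output. Scholarship (Melanie Fink, SSRN 2026; arXiv 2502.10036, "Automation Bias in the AI Act") and the regulatory view agree:

"Regulators look at override rates: if an operator never overrides the AI, their 'oversight' is not meaningful."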

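Counterintuitive Insight ① ("analysts agree with the AI 100% of the time" is a signal of oversight failure, not proof of quality): Intuition says that if analysts approve every AI-drafted SAR, the AI must be good and HITL must be running smoothly. Through the automation-bias lens of Article 14(4)(b), however, a zero override rate is precisely the red flag for failed oversight: it means analysts have degraded into rubber stamps, and the "effective oversight" that 14(1) demands exists in name only. Regulators look at override rates; a rate that stays near 0 over time shows that people are not really reviewing. This flips a product metric (high adoption = good) into a compliance risk metric (100% adoption = oversight failure). Design implication: the HITL gateway must instrument override/edit rates and alert on "N consecutive zero-edit acceptances". The goal is not to make analysts edit more, but to make the meaningfulness of oversight auditable.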
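The telemetry this insight calls for can be sketched in a few lines. This is a minimal sketch, assuming the event shape planned for the SAR review component ({ caseId, aiDraft, humanEdited, editDistance, action }); the names oversightStats and isZeroEditAccept, the rule that any edit or rejection counts as an override, and the streak threshold are hypothetical choices, not the project's code.

```typescript
// Hypothetical sketch of HITL override telemetry (not the project's implementation).
type ReviewAction = 'accept' | 'edit' | 'reject';

// Event shape taken from the planned SAR-review telemetry.
interface ReviewEvent {
  caseId: string;
  aiDraft: string;
  humanEdited: boolean;
  editDistance: number;
  action: ReviewAction;
}

interface OversightStats {
  total: number;
  overrideRate: number; // share of events the analyst edited or rejected
  longestZeroEditStreak: number; // longest run of consecutive unmodified accepts
  alert: boolean; // true once that run reaches the threshold
}

// Zero-edit acceptance: the AI draft was accepted exactly as written.
function isZeroEditAccept(e: ReviewEvent): boolean {
  return e.action === 'accept' && !e.humanEdited && e.editDistance === 0;
}

function oversightStats(events: ReviewEvent[], streakThreshold: number): OversightStats {
  let overrides = 0;
  let run = 0;
  let longest = 0;
  for (const e of events) {
    // Assumption: any edit or rejection counts as an override.
    if (e.action !== 'accept' || e.humanEdited) overrides++;
    run = isZeroEditAccept(e) ? run + 1 : 0;
    longest = Math.max(longest, run);
  }
  return {
    total: events.length,
    overrideRate: events.length === 0 ? 0 : overrides / events.length,
    longestZeroEditStreak: longest,
    alert: longest >= streakThreshold,
  };
}

// Usage: five untouched accepts in a row trip an alert with threshold 5.
const demo: ReviewEvent[] = ['c1', 'c2', 'c3', 'c4', 'c5'].map((caseId) => ({
  caseId,
  aiDraft: 'draft',
  humanEdited: false,
  editDistance: 0,
  action: 'accept' as const,
}));
console.log(oversightStats(demo, 5)); // total 5, overrideRate 0, longestZeroEditStreak 5, alert true
```

The alert keys on a streak rather than on the aggregate rate alone, because a healthy long-run override rate can still hide a recent stretch of rubber-stamping.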

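Design levers against automation bias: (1) mandatory dual review for high-value or high-risk SARs (echoing the spirit of the two-person verification that 14(5) requires for biometric identification; the AML threshold is self-set, e.g. the CTR figure of $10,000 and above); (2) deliberately expose the AI's uncertainty (low-confidence fields are highlighted in yellow and the analyst must fill them in); (3) put the override rate on the compliance dashboard.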
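Levers (1) and (2) amount to a routing policy applied before a draft reaches the reviewer. A minimal sketch, assuming a draft carries an amount, a risk tier, and per-field confidence scores; the field names, the planReview function, the 0.7 confidence cut-off, and treating the $10,000 figure as an at-or-above dual-review threshold are illustrative assumptions, not the project's configuration.

```typescript
// Hypothetical review policy: dual review for high-value/high-risk SARs,
// and low-confidence fields handed to the analyst to fill.
interface DraftField {
  name: string;
  value: string;
  confidence: number; // model confidence in [0, 1]
}

interface SarDraft {
  caseId: string;
  amountUsd: number;
  riskTier: 'low' | 'medium' | 'high';
  fields: DraftField[];
}

interface ReviewPlan {
  requiredReviewers: 1 | 2;
  analystMustFill: string[]; // fields highlighted and left for manual entry
}

const DUAL_REVIEW_AMOUNT_USD = 10000; // self-set threshold echoing the CTR figure
const LOW_CONFIDENCE = 0.7; // illustrative cut-off

function planReview(draft: SarDraft): ReviewPlan {
  const dual = draft.amountUsd >= DUAL_REVIEW_AMOUNT_USD || draft.riskTier === 'high';
  return {
    requiredReviewers: dual ? 2 : 1,
    analystMustFill: draft.fields
      .filter((f) => f.confidence < LOW_CONFIDENCE)
      .map((f) => f.name),
  };
}

// Usage: a $12,000 case needs two reviewers, and its low-confidence narrative is analyst-written.
const plan = planReview({
  caseId: 'case-1',
  amountUsd: 12000,
  riskTier: 'medium',
  fields: [
    { name: 'subject', value: 'ACME Ltd', confidence: 0.92 },
    { name: 'narrative', value: '(AI draft)', confidence: 0.41 },
  ],
});
console.log(plan.requiredReviewers, plan.analystMustFill); // 2 [ 'narrative' ]
```

Handing the uncertain field to the analyst, rather than pre-filling it, is the design point: the reviewer cannot rubber-stamp exactly the content the model was least sure about.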

B. Article 15 robustness and cybersecurity → eval suite + MCPTox red team

15(1) requires an "appropriate level of accuracy, robustness and cybersecurity...perform consistently throughout their lifecycle". The three dimensions map one by one:

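Accuracy: 15(3): "accuracy metrics shall be declared in the accompanying instructions". Mapping: this project's evalBaseline.ts (recall/FPR) provides exactly the accuracy metrics to be declared; writing them into the model registry (Day 92) and carrying them into the instructions for use covers this requirement.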
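The two declared metrics are simple ratios over a confusion matrix. A minimal sketch, assuming evalBaseline produces true/false positive and negative counts per evaluation run; the DeclaredAccuracy shape, the function names, and the model and eval-set labels are hypothetical, and the counts in the usage line are illustrative, not project results.

```typescript
// Hypothetical: computing the recall/FPR pair that would be declared.
interface ConfusionCounts {
  tp: number; // suspicious cases correctly flagged
  fn: number; // suspicious cases missed
  fp: number; // benign cases wrongly flagged
  tn: number; // benign cases correctly cleared
}

interface DeclaredAccuracy {
  modelId: string;
  evalSet: string;
  recall: number; // tp / (tp + fn)
  falsePositiveRate: number; // fp / (fp + tn)
  declaredAt: string; // ISO-8601 timestamp
}

// Simplification: an empty denominator yields 0 instead of refusing to declare.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

function declareAccuracy(modelId: string, evalSet: string, c: ConfusionCounts, at: Date): DeclaredAccuracy {
  return {
    modelId,
    evalSet,
    recall: ratio(c.tp, c.tp + c.fn),
    falsePositiveRate: ratio(c.fp, c.fp + c.tn),
    declaredAt: at.toISOString(),
  };
}

// Illustrative counts: 45 of 50 suspicious cases caught, 10 of 100 benign cases flagged.
const entry = declareAccuracy('sar-drafter', 'gold-typologies', { tp: 45, fn: 5, fp: 10, tn: 90 }, new Date(0));
console.log(entry.recall, entry.falsePositiveRate); // 0.9 0.1
```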

Robustness: 15(4): the system must be resilient to "errors, faults or inconsistencies" and must also handle feedback loops:

"eliminate or reduce as far as possible the risk of possibly biased outputs influencing input for future operations (feedback loops)."

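What this means precisely for AML: if the AI misjudges a case → the generated SAR enters the training/retrieval corpus → it contaminates future judgments, and the bias reinforces itself. Mapping: the RAG knowledge base (P2 hybridSearch) must be kept separate from artifacts generated automatically by the AI; only human-reviewed SARs may flow back in, otherwise the feedback loop amplifies bias. (In the legal text this clause targets systems that continue to learn after being placed on the market or put into service; treating RAG write-back as the same risk is this project's engineering reading.)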
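The isolation rule reduces to a provenance check on the write path. A minimal sketch, assuming each candidate document carries the provenance tag planned for src/agent/rag ('human-verified' | 'ai-generated'); KbDocument and gateWriteBack are hypothetical names. AI-generated documents are quarantined rather than discarded, so they stay available for later human review.

```typescript
// Hypothetical write-back gate for the RAG knowledge base.
type Provenance = 'human-verified' | 'ai-generated';

interface KbDocument {
  id: string;
  text: string;
  provenance: Provenance;
}

interface GateResult {
  admitted: KbDocument[]; // may be indexed and retrieved in future cases
  quarantined: KbDocument[]; // held out until a human verifies them
}

function gateWriteBack(candidates: KbDocument[]): GateResult {
  const result: GateResult = { admitted: [], quarantined: [] };
  for (const doc of candidates) {
    if (doc.provenance === 'human-verified') {
      result.admitted.push(doc);
    } else {
      result.quarantined.push(doc);
    }
  }
  return result;
}

// Usage: an unreviewed AI draft never reaches the retrieval index.
const { admitted, quarantined } = gateWriteBack([
  { id: 'sar-101', text: 'reviewed SAR', provenance: 'human-verified' },
  { id: 'sar-102', text: 'raw AI draft', provenance: 'ai-generated' },
]);
console.log(admitted.map((d) => d.id), quarantined.map((d) => d.id)); // [ 'sar-101' ] [ 'sar-102' ]
```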

Cybersecurity: 15(5): the system must withstand

"data poisoning, model poisoning, adversarial examples, model evasion, confidentiality attacks."

These hit, item by item, the attack surfaces covered by the P2 MCPTox red team (Day 52):

  Article 15(5) threat      Project P2 red team / defense
  ────────────────────      ─────────────────────────────
  adversarial examples ◄──► prompt-injection red team (MCPTox, Day 52)
  model evasion        ◄──► tests with evasive transaction patterns
  data poisoning       ◄──► RAG write validation (blocks injected malicious typologies)
  confidentiality      ◄──► risk gateway data redaction (Day 53)
  → Defenses: risk gateway (Day 53) + red-team regression (Day 54)
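Keeping this mapping as data lets CI notice when a threat named in the article has no counterpart. A minimal sketch: the threat list follows the 15(5) wording quoted above, the coverage entries transcribe the diagram's rows, and the labels and function names are hypothetical. Taken at face value, the diagram has no row for model poisoning, and the check surfaces exactly that.

```typescript
// Threat categories as quoted from Article 15(5) in this note.
const ARTICLE_15_5_THREATS = [
  'data poisoning',
  'model poisoning',
  'adversarial examples',
  'model evasion',
  'confidentiality attacks',
] as const;

type Threat = (typeof ARTICLE_15_5_THREATS)[number];

// Rows from the diagram; values are descriptive labels, not real suite names.
const redTeamCoverage: Partial<Record<Threat, string>> = {
  'adversarial examples': 'prompt-injection red team (MCPTox, Day 52)',
  'model evasion': 'evasive transaction-pattern tests',
  'data poisoning': 'RAG write validation',
  'confidentiality attacks': 'risk gateway redaction (Day 53)',
};

// Returns every named threat that has no mapped defense.
function uncoveredThreats(coverage: Partial<Record<Threat, string>>): Threat[] {
  return ARTICLE_15_5_THREATS.filter((t) => coverage[t] === undefined);
}

console.log(uncoveredThreats(redTeamCoverage)); // [ 'model poisoning' ]
```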

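Counterintuitive Insight ② (Article 15(5) upgrades red teaming from an optional security exercise to a compliance requirement): Intuition treats red teaming as extra diligence by the security team, nice to have but skippable. But 15(5) names adversarial examples, model evasion, and data poisoning one by one, so for a high-risk system red teaming stops being nice-to-have and becomes robustness evidence for countermeasures the legal text explicitly requires. More counterintuitive still: in AML the adversary already exists. Money launderers actively design transaction patterns that evade detection (structuring, i.e. splitting amounts to stay below reporting thresholds, and trade-based disguise), which is exactly model evasion in the 15(5) sense. So this project's red team is not simulating hypothetical attacks; it is modeling real opponents. Day 52's MCPTox red team shifts from "engineering prudence" to "Article 15(5) compliance evidence". This is the third isomorphism between compliance requirements and red-team engineering (the first two: Article 9 ↔ the evals loop, Article 10 ↔ eval-set governance).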
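Structuring shows concretely why model evasion belongs in the red-team suite. A minimal sketch, assuming a $10,000 reporting threshold and a one-day aggregation window (both illustrative): the hypothetical structure() generator splits a total into sub-threshold deposits, which a per-transaction rule misses and a per-account daily aggregation catches. All names are hypothetical, and real structuring detection uses far richer windows and features.

```typescript
// Hypothetical evasion test case: structured deposits vs. two detectors.
interface Txn {
  account: string;
  amountUsd: number;
  day: number;
}

const THRESHOLD_USD = 10000; // illustrative reporting threshold

// Split totalUsd into deposits of at most ceilingUsd (kept below the threshold).
function structure(account: string, totalUsd: number, day: number, ceilingUsd: number): Txn[] {
  if (ceilingUsd <= 0) throw new Error('ceilingUsd must be positive');
  const txns: Txn[] = [];
  for (let remaining = totalUsd; remaining > 0; remaining -= ceilingUsd) {
    txns.push({ account, amountUsd: Math.min(remaining, ceilingUsd), day });
  }
  return txns;
}

// Naive rule: looks at each transaction in isolation.
function perTransactionFlags(txns: Txn[]): Txn[] {
  return txns.filter((t) => t.amountUsd >= THRESHOLD_USD);
}

// Aggregating rule: sums each account's deposits per day before comparing.
function dailyAggregateFlags(txns: Txn[]): string[] {
  const totals = new Map<string, number>();
  for (const t of txns) {
    const key = `${t.account}@day${t.day}`;
    totals.set(key, (totals.get(key) ?? 0) + t.amountUsd);
  }
  return Array.from(totals.entries())
    .filter(([, sum]) => sum >= THRESHOLD_USD)
    .map(([key]) => key);
}

// Usage: $27,000 split into three $9,000 deposits on one day.
const attack = structure('acct-7', 27000, 1, 9000);
console.log(perTransactionFlags(attack).length); // 0 (evaded)
console.log(dailyAggregateFlags(attack)); // [ 'acct-7@day1' ]
```

A red-team case like this is the adversary's own playbook turned into a regression test: it stays in the suite so a future detector change cannot silently reopen the gap.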

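C. Article 13 transparency + data-residency routing

Article 13 requires high-risk systems to be "sufficiently transparent to enable deployers to interpret a system's output and use it appropriately", accompanied by instructions for use. Mapping: Day 5's "transparent reasoning" (showing the evidence chain), Day 49's "AI-drafted" disclosure, and the capability/limitation statements in the model registry together satisfy 13. Note that 13 is transparency from provider to deployer (unlike Article 50, which is transparency toward end users).

Data residency: the AI Act itself does not directly govern data residency (that is a matter for GDPR and local financial law), but high-risk status plus financial customer data means that, in practice, EU data must not leave the EU and model calls must go through EU-region endpoints. This maps onto the P2 gateway (src/agent/gateway/): when routing across multiple providers, choose the regional endpoint by data classification: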

  Routing decision (pseudocode, planned):
  function routeModel(req):
    if req.dataClass == 'EU_PII_FINANCIAL':
      return pickEndpoint(region='eu', providers=euResidentModels)
    else:
      return pickEndpoint(region='any', providers=allModels)  // Pareto-optimal choice (Day 47)

This routing and Day 94's DORA topic ("ICT third-party risk → multiple providers") are two constraint dimensions of the same gateway: region and provider redundancy.
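The pseudocode above, combined with the "residency first, then Pareto" layering planned for routeByDataClass(), can be sketched as runnable TypeScript. This is a minimal sketch, assuming each endpoint exposes a region, a cost, and a quality score; the catalogue entries, their numbers, and the helper names are invented for illustration. The function returns the whole Pareto front and leaves the final pick to the Day 47 policy.

```typescript
// Hypothetical residency-first router: filter by data class, then keep the cost/quality Pareto front.
type Region = 'eu' | 'us' | 'apac';
type DataClass = 'EU_PII_FINANCIAL' | 'GENERAL';

interface Endpoint {
  provider: string;
  region: Region;
  costPer1kTokens: number; // lower is better
  quality: number; // eval score, higher is better
}

// Step 1: hard residency constraint.
function residencyFilter(dataClass: DataClass, endpoints: Endpoint[]): Endpoint[] {
  return dataClass === 'EU_PII_FINANCIAL' ? endpoints.filter((e) => e.region === 'eu') : endpoints;
}

// a dominates b: no worse on both axes, strictly better on at least one.
function dominates(a: Endpoint, b: Endpoint): boolean {
  const noWorse = a.costPer1kTokens <= b.costPer1kTokens && a.quality >= b.quality;
  const better = a.costPer1kTokens < b.costPer1kTokens || a.quality > b.quality;
  return noWorse && better;
}

// Step 2: Pareto front computed only over the compliant set.
function routeByDataClass(dataClass: DataClass, endpoints: Endpoint[]): Endpoint[] {
  const compliant = residencyFilter(dataClass, endpoints);
  if (compliant.length === 0) throw new Error(`no compliant endpoint for ${dataClass}`);
  return compliant.filter((e) => !compliant.some((o) => dominates(o, e)));
}

// Invented catalogue: the cheapest, best endpoint sits outside the EU.
const catalogue: Endpoint[] = [
  { provider: 'p1', region: 'eu', costPer1kTokens: 2, quality: 0.8 },
  { provider: 'p2', region: 'eu', costPer1kTokens: 3, quality: 0.85 },
  { provider: 'p3', region: 'us', costPer1kTokens: 1, quality: 0.9 },
  { provider: 'p4', region: 'eu', costPer1kTokens: 3, quality: 0.78 },
];

console.log(routeByDataClass('EU_PII_FINANCIAL', catalogue).map((e) => e.provider)); // [ 'p1', 'p2' ]
console.log(routeByDataClass('GENERAL', catalogue).map((e) => e.provider)); // [ 'p3' ]
```

The order of the two steps matters: computing the front over all endpoints first would keep only p3, and filtering that result by region would then leave no EU option at all.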

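D. Final gap list (Articles 9-15: implemented vs. gaps)

Merging Day 92 (9-12) with today (13-15) into the final version:

Article	Requirement	Component	Status
9 Risk management	Continuous iterative process	failureTaxonomy + eval gate	✅ Implemented (documentation pending)
9(9) Vulnerable groups	Fairness impact assessment	failureTaxonomy population-bias class	🔴 Gap
10 Data governance	Eval-set bias audit	66-case gold-standard typology coverage	🟡 Partial (coverage audit pending)
11 Technical documentation	Annex IV documentation	model registry	🟡 Partial (interface to be built)
12 Record-keeping	Lifetime logging	trace + attributeMap	✅ Implemented (tiered retention pending)
13 Transparency	Interpretable output + instructions for use	Transparent reasoning + model registry	🟡 Partial
14 Human oversight	Effective + anti automation bias	HITL gateway + override telemetry	🟡 Partial (override-rate telemetry missing)
15(3) Accuracy	Declare metrics	evalBaseline	✅ Implemented (registry write pending)
15(4) Robustness	Feedback-loop protection	RAG write isolation	🔴 Gap
15(5) Cybersecurity	Adversarial attacks	MCPTox red team + risk gateway	✅ Implemented (P2)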

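Three red or not-yet-built gaps: fairness assessment (9(9)), feedback-loop isolation (15(4)), and override-rate telemetry (14(4)(b)). These are the compliance priorities P3 still has to close.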
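The final table translates directly into the structured form planned for aiActMapping.ts. A minimal sketch of gapReport() and the CI-style checks, assuming a Gap record with article/requirement/component/status fields; only gapReport and the three status values come from this note, while the field names, the statusOf and openGaps helpers, and splitting 14(4)(b) into its own row are hypothetical.

```typescript
type Status = 'implemented' | 'partial' | 'gap';

interface Gap {
  article: string;
  requirement: string;
  component: string;
  status: Status;
}

// Rows transcribed from the final table; 14(4)(b) is split out so its missing telemetry is a tracked gap.
const ROWS: Gap[] = [
  { article: '9', requirement: 'continuous iterative process', component: 'failureTaxonomy + eval gate', status: 'implemented' },
  { article: '9(9)', requirement: 'fairness impact assessment', component: 'failureTaxonomy population-bias class', status: 'gap' },
  { article: '10', requirement: 'eval-set bias audit', component: 'gold-standard typology coverage', status: 'partial' },
  { article: '11', requirement: 'Annex IV documentation', component: 'model registry', status: 'partial' },
  { article: '12', requirement: 'lifetime logging', component: 'trace + attributeMap', status: 'implemented' },
  { article: '13', requirement: 'interpretability + instructions for use', component: 'transparent reasoning + model registry', status: 'partial' },
  { article: '14', requirement: 'effective oversight', component: 'HITL gateway', status: 'partial' },
  { article: '14(4)(b)', requirement: 'override-rate telemetry', component: 'HITL gateway', status: 'gap' },
  { article: '15(3)', requirement: 'declared accuracy metrics', component: 'evalBaseline', status: 'implemented' },
  { article: '15(4)', requirement: 'feedback-loop protection', component: 'RAG write isolation', status: 'gap' },
  { article: '15(5)', requirement: 'adversarial attacks', component: 'MCPTox red team + risk gateway', status: 'implemented' },
];

// Whole table as structured data (copies, so callers cannot mutate the source rows).
function gapReport(): Gap[] {
  return ROWS.map((r) => ({ ...r }));
}

function statusOf(article: string): Status | undefined {
  return gapReport().find((r) => r.article === article)?.status;
}

function openGaps(): string[] {
  return gapReport().filter((r) => r.status === 'gap').map((r) => r.article);
}

// CI-style assertions from the plan; update them when a gap closes.
if (statusOf('15(5)') !== 'implemented') throw new Error('15(5) should be implemented');
if (statusOf('14(4)(b)') !== 'gap') throw new Error('14(4)(b) should still be a gap');
console.log(openGaps()); // [ '9(9)', '14(4)(b)', '15(4)' ]
```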

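Design Points / Decision Table

Point	Decision	Rationale
Article 14 oversight	HITL gateway instruments override/edit rate + zero-edit acceptance alert	14(4)(b) anti automation bias; override rate = audit metric for meaningful oversight
High-value SARs	Mandatory dual review (self-set threshold, e.g. CTR $10K+)	Spirit of the 14(5) two-person verification; counters single-person rubber-stamping
Article 15(4) feedback loop	RAG store isolates AI-generated artifacts; only human-reviewed ones flow back	Prevents self-reinforcing bias (biased output → future input)
Article 15(5) cybersecurity	MCPTox red team classified as mandatory compliance evidence	The text names adversarial examples, evasion, and poisoning one by one
Data residency	gateway routes to EU-region endpoints by data classification	Financial PII stays in the EU (GDPR/local financial law)
Gap list finalized	fairness/feedback loop/override telemetry listed as P3 to-dos	Red gaps under Articles 9(9)/15(4)/14(4)(b)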

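Landing It in This Project

Extend src/aml/compliance/aiActMapping.ts (created Day 92): add mapping entries for Articles 13-15 and export gapReport() → Gap[], emitting the Section D gap table as structured data (fairness/feedback loop/override telemetry as 'gap', the rest 'implemented'/'partial'). CI asserts "15(5) status='implemented' (covered by the P2 red team)" and "14(4)(b) override telemetry status='gap'", so gaps are trackable in tests and change status once closed.
HITL override telemetry: in the SAR review component (one of the three screens under src/components/aml/), plan an event { caseId, aiDraft, humanEdited: boolean, editDistance, action: 'accept'|'edit'|'reject' }, compute override/edit rates, and alert on "N consecutive zero-edit acceptances". This is the compliance implementation of Article 14(4)(b) and also the observable signal of automation bias (planned, not implemented).
RAG feedback-loop isolation: in the write path of src/agent/rag, plan a provenance tag 'human-verified' | 'ai-generated'; knowledge-base write-back accepts only human-verified documents, satisfying Article 15(4)'s guard against self-reinforcing bias. For now P3 only defines the field; the actual isolation is built when the RAG write policy is designed.
gateway data-residency routing: plan routeByDataClass() in src/agent/gateway/ to choose EU or any-region endpoints by data classification, layered on Day 47's Pareto routing (cost/quality optimum) as "satisfy the residency constraint first, then take the Pareto optimum within the compliant endpoint set". It shares this gateway with Day 94's DORA multi-provider redundancy.
Honest labeling: continue the header comment in aiActMapping.ts. The Articles 13-15 mapping is likewise a compliance architecture mapping, not legal advice; override telemetry, feedback-loop isolation, and fairness assessment are planned P3 gaps and not implemented; the 15(5) red team is already live in P2 (Day 52/54), but its status as "compliance evidence" depends on the final high-risk determination; the data-residency requirement comes from GDPR/local financial law, not from the AI Act itself, and citations must keep the sources distinct.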
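The editDistance field in the planned telemetry event needs a concrete definition. One option, assumed here rather than specified by the project, is character-level Levenshtein distance between the AI draft and the submitted text; the function names are hypothetical. The distance counts UTF-16 code units, which matches characters for CJK text in the Basic Multilingual Plane.

```typescript
// Levenshtein distance with a rolling row: O(|a|*|b|) time, O(|b|) memory.
function editDistance(a: string, b: string): number {
  // Before iteration i, prev[j] is the distance between a's first i-1 chars and b's first j chars.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const substitution = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i-1]
        curr[j - 1] + 1, // insert b[j-1]
        prev[j - 1] + substitution, // keep or substitute
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Derive the two telemetry fields from the draft and what the analyst submitted.
function editFields(aiDraft: string, submitted: string): { humanEdited: boolean; editDistance: number } {
  const d = editDistance(aiDraft, submitted);
  return { humanEdited: d > 0, editDistance: d };
}

console.log(editDistance('kitten', 'sitting')); // 3
console.log(editFields('SAR draft', 'SAR draft')); // { humanEdited: false, editDistance: 0 }
```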

References

artificialintelligenceact.eu, "Article 14: Human Oversight": 14(1) "effectively overseen by natural persons"; 14(4)(a) understand capacities and limitations; 14(4)(b) "remain aware of...automatically relying or over-relying" (automation bias); 14(4)(d) override or reverse; 14(4)(e) stop button; 14(5) two-person verification for biometric identification (transcription, continuously updated)
artificialintelligenceact.eu, "Article 15: Accuracy, Robustness and Cybersecurity": 15(1) consistent throughout lifecycle; 15(3) accuracy metrics declared in instructions; 15(4) feedback loops, "biased outputs influencing input for future operations"; 15(5) "data poisoning, model poisoning, adversarial examples, model evasion, confidentiality attacks" (continuously updated)
artificialintelligenceact.eu, "Article 13: Transparency and Provision of Information to Deployers": "sufficiently transparent to enable deployers to interpret a system's output" + instructions for use (continuously updated)
Melanie Fink, "Human Oversight under Article 14 of the EU AI Act" (SSRN): oversight must be meaningful rather than formal; a zero override rate = failed oversight; Article 26(2) requires deployers to assign competent personnel (2026)
arXiv 2502.10036, "Automation Bias in the AI Act: On the Legal Implications of Attempting to De-Bias Human Oversight of AI": cognitive limits and legal implications of the 14(4)(b) anti-automation-bias requirement (2025-02)
This repository: src/aml/compliance/aiActMapping.ts (created Day 92, Articles 13-15 pending), src/components/aml/ (three HITL screens, override telemetry pending), src/agent/rag (feedback-loop isolation), src/agent/gateway/ (data-residency routing), Day 52/54 MCPTox red team (15(5) evidence) (2026-06)

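SOTA Check (2026-06-11)

Articles 13-15 text is stable as of 2026-06: the high-risk requirements have not changed since the AI Act was adopted (2024-06); the Omnibus (2026-05-07) only pushes back the application timeline (standalone Annex III systems move to 2027-12-02, see Day 92), and the substantive obligations are unchanged.
Automation bias is a 2025-2026 hotspot for scholars and regulators: the anti-automation-bias requirement in Article 14(4)(b) has prompted a wave of research (Fink, SSRN 2026; arXiv 2502.10036, 2025-02), centered on whether human oversight can cognitively correct AI at all. This note's Insight ① (override rate = indicator of oversight effectiveness) is an engineering application of that debate, and it is live: most teams still treat a high adoption rate as good news and do not instrument it as a compliance risk.
Adversarial-robustness standards for 15(5) are still being drafted: under Article 15(2) the Commission, with relevant stakeholders, is to encourage the development of benchmarks and measurement methodologies, and concrete test standards (such as adversarial-example evaluation suites) are not yet final. This project's MCPTox red team is a stopgap to be aligned once the standards settle, a deliberate choice not to bet on an unfinished standard.
Data residency is not governed directly by the AI Act, and the distinction matters: residency requirements come from GDPR plus local financial supervision (e.g. EBA/DORA); the AI Act itself does not prescribe them. Wiring residency into the gateway here is practical engineering integration, and citations label the source to avoid attributing GDPR obligations to the AI Act.
To track: finalization of official benchmarks for Article 15(5) adversarial robustness (determines how the red-team evals align); backfilling the gap table once override telemetry ships (14(4)(b) from gap → implemented); progress on closing the two red gaps, fairness assessment (9(9)) and feedback-loop isolation (15(4)); the final high-risk determination (which decides whether the 15(5) red team qualifies as "compliance evidence").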