AI Traceability Graph: Requirements-Eval-Control Tracing
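AI Traceability Graph / Requirements-Eval-Control-Evidence Explained
Audience: AI BA / AI Product Manager / AI Architect / Model Risk Lead / Audit Evidence Owner.
Core problem: If AI requirements live only in user stories or PRD paragraphs, it is hard to prove that the system, once live, actually meets its business goals, risk controls, eval thresholds, and audit requirements. Advanced AI requirements engineering connects business outcome, requirement, eval, control, ADR, implementation, telemetry, and evidence into a traceability graph.
Learning objective: Upgrade BA traceability skills to the level of an AI system, closing the loop from requirements to evals, controls, runtime evidence, and audit Q&A.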
Source Anchors
| Source | Link | Purpose |
|---|---|---|
| W3C PROV Overview | https://www.w3.org/TR/prov-overview/ | Borrow the provenance concepts of entity, activity, and agent to model where evidence comes from and how it was generated |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Connect traceability to AI risk governance, measurement, and management |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Reference for AI management systems, accountability, controls, continual improvement, and evidence requirements |
| OpenTelemetry | https://opentelemetry.io/docs/ | Use traces, metrics, and logs to link runtime behavior to requirements and control evidence |
| JSON Schema | https://json-schema.org/ | Provide structural constraints for eval results, control evidence, structured output, and evidence objects |
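In one sentence:
An AI Traceability Graph connects "why we build it, what it must satisfy, how it is evaluated, how it is controlled, how it is implemented, what happened in production, and where the evidence is" into one queryable graph of relationships.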
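1. Why AI Requirements Cannot Stop at the User Story
A typical user story:
As a customer service agent, I want AI to answer customer policy questions, so that I can respond faster.
That may be enough to start a traditional feature, but it falls far short for an AI system:
| Gap | What an AI system must additionally answer |
|---|---|
| Quality uncertainty | What counts as a good answer, and which samples and thresholds prove it |
| Risk boundary | Which questions must be refused or escalated to a human |
| Data/knowledge sources | Which approved sources answers must cite, and how stale sources are handled |
| Controls | Which component enforces policy, and which role approves exceptions |
| Runtime drift | How quality, cost, risk, and adoption are monitored after launch |
| Audit evidence | What evidence to provide when a regulator or internal audit asks "why was this allowed to go live" |
| Change impact | Which requirements and controls a prompt/model/retriever/tool change affects |
So advanced AI requirements engineering should be written as: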
Business outcome
-> requirement
-> acceptance criterion
-> eval case / metric / threshold
-> control
-> architecture decision
-> implementation/config
-> telemetry
-> evidence
-> release decision
This is not a repeat of the basic BA traceability matrix; it brings the uncertainty, risk, and runtime evidence of AI into the same graph.
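A minimal sketch of checking one requirement against the chain, assuming each requirement's links are kept in a plain dict keyed by the stage names below (the stage keys and IDs such as `AC-001` are hypothetical). It reports the first stage where the chain breaks:

```python
from typing import Optional

# Hypothetical stage keys mirroring the chain above, in order.
CHAIN = [
    "business_outcome",
    "requirement",
    "acceptance_criterion",
    "eval",
    "control",
    "architecture_decision",
    "implementation_config",
    "telemetry",
    "evidence",
    "release_decision",
]


def first_missing_stage(record: dict) -> Optional[str]:
    """Return the first chain stage with no linked artifact, or None if the chain is complete."""
    for stage in CHAIN:
        if not record.get(stage):  # a missing key, None, or an empty list all count as a gap
            return stage
    return None


req = {
    "business_outcome": "BO-001",
    "requirement": "REQ-001",
    "acceptance_criterion": "AC-001",
    "eval": ["EVAL-001"],
    "control": [],  # no control linked yet
}
print(first_missing_stage(req))  # control
```

Stopping at the first gap is a design choice: the later stages usually depend on the earlier ones, so the earliest break is the most actionable one.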
2. Traceability Graph Nodes
An AI traceability graph can be expressed as nodes and edges.
2.1 Node Types
| Node | Example |
|---|---|
| Business outcome | Reduce average customer service handling time by 20% without increasing complaints |
| Stakeholder concern | Accuracy, trust, compliance, cost, human workload |
| Requirement | Answers must be grounded in approved policy knowledge sources |
| Risk | Unauthorized advice, incorrect fee explanation, PII leakage |
| Acceptance criterion | customer-facing answer must cite source |
| Eval case | Fee policy Q&A golden set |
| Metric | citation correctness, unsupported claim rate |
| Threshold | citation correctness >= 95%, unsupported claim rate = 0 for regulated answers |
| Control | source allowlist, policy classifier, HITL |
| ADR | choose RAG over fine-tuning for policy knowledge |
| Component | retrieval service, policy engine, model gateway |
| Config | prompt version, retriever version, policy profile |
| Runtime trace | request span, retrieval span, policy decision span |
| Evidence | eval report, release memo, approval, trace sample |
| Decision | release, scale, hold, rollback, exception |
| Incident | wrong answer, policy bypass, data exposure |
| Remediation | source update, eval expansion, policy rule change |
2.2 Edge Types
| Edge | Meaning |
|---|---|
| satisfies | requirement satisfies business outcome |
| mitigates | control mitigates risk |
| measured_by | requirement is measured by eval/metric |
| implemented_by | requirement is implemented by component/config |
| decided_by | architecture choice is decided by, and explained in, an ADR |
| emits | component emits telemetry |
| evidenced_by | decision/control is supported by evidence |
| violated_by | requirement/control is violated by an incident |
| remediated_by | incident is remediated by a change to control/config/eval |
| supersedes | new ADR/config/eval replaces the previous one |
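One way to make the node and edge vocabulary enforceable is to type it. The following is a minimal Python sketch, assuming directed edges and assuming that each edge type may only connect certain node types. The `ALLOWED` pairs are an illustrative reading of the Meaning column and cover only a subset of the edge types.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    BUSINESS_OUTCOME = "business_outcome"
    REQUIREMENT = "requirement"
    RISK = "risk"
    EVAL = "eval"
    CONTROL = "control"
    ADR = "adr"
    COMPONENT = "component"
    CONFIG = "config"
    TELEMETRY = "telemetry"
    EVIDENCE = "evidence"
    DECISION = "decision"
    INCIDENT = "incident"
    REMEDIATION = "remediation"


N = NodeType
# Illustrative (from_type, to_type) pairs per edge type, read from the Meaning column.
ALLOWED = {
    "satisfies": {(N.REQUIREMENT, N.BUSINESS_OUTCOME)},
    "mitigates": {(N.CONTROL, N.RISK)},
    "measured_by": {(N.REQUIREMENT, N.EVAL)},
    "implemented_by": {(N.REQUIREMENT, N.COMPONENT), (N.REQUIREMENT, N.CONFIG)},
    "emits": {(N.COMPONENT, N.TELEMETRY)},
    "evidenced_by": {(N.DECISION, N.EVIDENCE), (N.CONTROL, N.EVIDENCE)},
    "violated_by": {(N.REQUIREMENT, N.INCIDENT), (N.CONTROL, N.INCIDENT)},
}


@dataclass(frozen=True)
class Node:
    id: str
    type: NodeType


@dataclass(frozen=True)
class Edge:
    src: Node
    kind: str
    dst: Node

    def is_valid(self) -> bool:
        """True if the edge type is known and connects an allowed pair of node types."""
        return (self.src.type, self.dst.type) in ALLOWED.get(self.kind, set())


req = Node("REQ-001", N.REQUIREMENT)
ctrl = Node("CTRL-001", N.CONTROL)
risk = Node("RISK-001", N.RISK)
print(Edge(ctrl, "mitigates", risk).is_valid())  # True
print(Edge(req, "mitigates", risk).is_valid())   # False: a requirement does not mitigate a risk
```

Rejecting an unknown edge type, rather than accepting it silently, keeps the vocabulary from drifting as more teams add links.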
3. Traceability Graph Example
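Scenario: a customer-facing fee policy assistant. Some rows use inverse or finer-grained edge names (for example satisfied_by, mitigated_by, violates); read every row in the From → To direction.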
| From | Edge | To |
|---|---|---|
| Reduce call handling time without increasing complaints | satisfied_by | AI policy assistant capability |
| AI policy assistant capability | decomposed_into | Fee explanation requirement |
| Fee explanation requirement | measured_by | Fee policy golden set |
| Fee policy golden set | uses_metric | Citation correctness |
| Citation correctness | has_threshold | >= 95% |
| Fee explanation requirement | constrained_by | No personalized financial advice risk |
| No personalized financial advice risk | mitigated_by | Advice boundary policy |
| Advice boundary policy | implemented_by | Policy engine profile retail-advice-boundary-v3 |
| Fee explanation requirement | implemented_by | RAG retrieval service |
| RAG retrieval service | uses_config | Source allowlist retail-policy-approved |
| RAG design | decided_by | ADR-001 RAG over fine-tuning |
| Production request | emits | OpenTelemetry trace |
| Release decision | evidenced_by | Eval report, ADR, risk signoff, trace sample |
| Wrong fee answer incident | violates | Fee explanation requirement |
| Wrong fee answer incident | remediated_by | Eval set expansion and source version update |
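The graph does not need a graph database. At the start, a table, a YAML file, a spreadsheet, or a markdown matrix is enough. What matters is that the relationships are explicit and that they can answer queries.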
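To show that plain data is enough, here is a minimal sketch that stores a few rows of the example table as `(from, edge, to)` triples in a Python list, builds forward and reverse indexes, and walks them. The helper name `downstream` is hypothetical.

```python
from collections import defaultdict

# A subset of the fee-policy example, stored as (from, edge, to) triples.
TRIPLES = [
    ("Fee explanation requirement", "measured_by", "Fee policy golden set"),
    ("Fee policy golden set", "uses_metric", "Citation correctness"),
    ("Citation correctness", "has_threshold", ">= 95%"),
    ("Fee explanation requirement", "constrained_by", "No personalized financial advice risk"),
    ("No personalized financial advice risk", "mitigated_by", "Advice boundary policy"),
    ("Fee explanation requirement", "implemented_by", "RAG retrieval service"),
    ("Wrong fee answer incident", "violates", "Fee explanation requirement"),
]

# Forward and reverse adjacency indexes.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for src, edge, dst in TRIPLES:
    outgoing[src].append((edge, dst))
    incoming[dst].append((edge, src))


def downstream(node: str) -> set:
    """All nodes reachable from `node` by following edges forward (iterative depth-first search)."""
    seen, stack = set(), [node]
    while stack:
        for _, nxt in outgoing[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(sorted(downstream("Fee policy golden set")))
# ['>= 95%', 'Citation correctness']
print(incoming["Fee explanation requirement"])
# [('violates', 'Wrong fee answer incident')]
```

The same triples could be exported from a spreadsheet or YAML file. The query logic does not change when storage later moves to a graph database.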
4. Requirements-to-Eval
An AI requirement must be written so that it can be evaluated:
| Requirement pattern | Eval translation |
|---|---|
| Must answer using approved source | source-grounded eval, citation check |
| Must refuse unsupported question | no-answer eval, refusal correctness |
| Must escalate high-risk scenario | escalation eval, HITL routing test |
| Must not expose PII | redaction eval, privacy test |
| Must call correct tool | trajectory eval, tool selection accuracy |
| Must complete under time budget | latency SLO, load test |
| Must be cost efficient | cost per successful task |
| Must be understandable to user | human rating, comprehension test |
Advanced requirements:
- Eval cases must cover normal, edge, adversarial, regulatory, and segment-specific cases.
- Eval results must map back to a requirement.
- Eval failures must be traceable to a component/config/source/prompt.
- Release decisions must cite eval evidence.
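A minimal sketch of the first three rules, assuming each eval case record carries a `requirement_id`, a `category`, a pass/fail flag, and a hypothetical `component` field naming the component/config/source/prompt that produced the output. All IDs and version strings are illustrative.

```python
from dataclasses import dataclass

REQUIRED_CATEGORIES = {"normal", "edge", "adversarial", "regulatory", "segment_specific"}


@dataclass
class EvalCase:
    case_id: str
    requirement_id: str  # each eval case maps back to one requirement in this sketch
    category: str
    passed: bool
    component: str       # hypothetical: component/config/source/prompt that produced the output


def missing_categories(cases, requirement_id):
    """Required categories with no eval case for this requirement."""
    covered = {c.category for c in cases if c.requirement_id == requirement_id}
    return REQUIRED_CATEGORIES - covered


def failures_by_component(cases):
    """Group failed eval case ids by the component they trace to."""
    out = {}
    for c in cases:
        if not c.passed:
            out.setdefault(c.component, []).append(c.case_id)
    return out


cases = [
    EvalCase("E1", "REQ-001", "normal", True, "retriever:v4"),
    EvalCase("E2", "REQ-001", "adversarial", False, "prompt:v12"),
    EvalCase("E3", "REQ-001", "regulatory", False, "prompt:v12"),
]
print(sorted(missing_categories(cases, "REQ-001")))  # ['edge', 'segment_specific']
print(failures_by_component(cases))                   # {'prompt:v12': ['E2', 'E3']}
```

Grouping failures by component points the remediation at a specific prompt, config, or source version rather than at "the model".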
5. Requirements-to-Control
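AI requirements must also be traced through to controls:
| Requirement / Risk | Control | Evidence |
|---|---|---|
| Must not give personalized investment advice | advice boundary classifier + refusal template | policy decision logs, red-team eval |
| Tool calls must be authorized | tool gateway + RBAC/ABAC/ReBAC | tool audit span |
| High-risk cases must be reviewed by a human | HITL queue + approval workflow | approval id, review record |
| Knowledge must come from approved sources | source registry + retrieval filter | citation source version |
| Prompt context must not be kept beyond its retention period | retention policy + deletion job | retention audit report |
| Changes must be traceable | ADR + version registry + release bundle | ADR log, release memo |
A control should not be just "someone reviews it." A mature control states:
- Where it operates.
- Whether it is preventive, detective, or corrective.
- Whether it is automated.
- Who owns it.
- How its effectiveness is tested.
- What evidence it produces.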
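The six attributes above can be captured as a record and checked for gaps. This is a minimal sketch. The field names and the `gaps` helper are hypothetical, and `None` means "not yet stated".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Control:
    control_id: str
    location: Optional[str] = None            # where it operates, e.g. "HITL queue"
    nature: Optional[str] = None              # "preventive" | "detective" | "corrective"
    automated: Optional[bool] = None
    owner: Optional[str] = None
    effectiveness_test: Optional[str] = None
    evidence: Optional[str] = None


VALID_NATURES = {"preventive", "detective", "corrective"}


def gaps(control: Control) -> list:
    """Names of unset attributes, plus a note if `nature` holds an unrecognized value."""
    missing = [f.name for f in fields(control)
               if f.name != "control_id" and getattr(control, f.name) is None]
    if control.nature is not None and control.nature not in VALID_NATURES:
        missing.append("nature (invalid)")
    return missing


hitl = Control("CTRL-001", location="HITL queue", nature="preventive",
               automated=False, owner="Risk/Ops")
print(gaps(hitl))  # ['effectiveness_test', 'evidence']
```

`automated=False` counts as stated, because a manual control is legitimate. Only an absent answer counts as a gap.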
6. Requirements-to-Architecture
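A traceability graph shows whether an architecture decision is actually needed.
| Requirement pressure | Architecture response |
|---|---|
| Needs citations and fast policy updates | RAG + source registry, rather than pure fine-tuning to store policy knowledge |
| Needs control over tool side effects | Tool gateway + policy engine + HITL |
| Needs an auditable go-live rationale | Evidence binder + ADR + release bundle |
| Needs production drift monitoring | OpenTelemetry trace + eval sampling + dashboard |
| Needs lower cost | model routing + cache + context budget |
| Needs cross-team reuse | service catalog + golden paths |
If a requirement cannot be traced to any architecture element, it may be only a wish. If an architecture element cannot be traced to any requirement or risk, it may be over-engineering.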
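Both orphan checks in that paragraph reduce to set differences. This is a minimal sketch, assuming links are stored as `(requirement_or_risk_id, component_id)` pairs. The component names and IDs are hypothetical.

```python
def trace_orphans(requirements, components, links):
    """links: iterable of (requirement_or_risk_id, component_id) pairs.

    Returns (requirements with no architecture element, components with no requirement/risk).
    """
    links = list(links)
    linked_reqs = {r for r, _ in links}
    linked_comps = {c for _, c in links}
    wishes = sorted(set(requirements) - linked_reqs)
    over_engineered = sorted(set(components) - linked_comps)
    return wishes, over_engineered


requirements = ["REQ-001", "REQ-002", "REQ-003"]
components = ["retrieval-service", "policy-engine", "graph-cache"]
links = [
    ("REQ-001", "retrieval-service"),
    ("RISK-001", "policy-engine"),  # a link to a risk also justifies a component
    ("REQ-002", "policy-engine"),
]
print(trace_orphans(requirements, components, links))
# (['REQ-003'], ['graph-cache'])
```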
7. Financial Retail Case: AML Investigation Copilot
Goal: help AML analysts summarize evidence on customers, transactions, entity relationships, and regulatory rules, and generate a draft investigation brief. The AI does not make the final SAR decision.
7.1 Traceability Matrix
| Business outcome | Requirement | Eval | Control | Evidence |
|---|---|---|---|---|
| Shorten investigation preparation time | Copilot summarizes relevant transactions and entity relationships | analyst task completion benchmark | graph/data source allowlist | benchmark report, trace sample |
| Improve evidence completeness | Brief must cite transactions, rules, and case history | evidence coverage eval | citation required, source versioning | eval report, source lineage |
| Prevent incorrect regulatory conclusions | AI must not decide SAR filing | refusal/escalation eval | decision boundary policy, human signoff | policy log, reviewer record |
| Protect sensitive data | Do not send restricted data to unauthorized models | privacy test | model route data boundary | route logs, DLP report |
| Support audit | Every brief records version, inputs, and reviewer | evidence completeness query | evidence binder | release and case evidence |
7.2 Graph Queries
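An advanced BA/Architect should be able to answer:
| Query | Purpose |
|---|---|
| Which high-risk requirements have no eval coverage | Find pre-launch gaps |
| Which controls have no runtime evidence | Find audit risk |
| Which incidents violate the same requirement | Find systemic issues |
| Which requirements a given prompt version affects | Change impact analysis |
| Which eval cases a given source update affects | Regression testing after knowledge updates |
| Which requirements are not linked to any business outcome | Prune low-value requirements |
| Which architecture components are not linked to any requirement/risk | Identify over-engineering |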
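Three of these queries can be written directly over an in-memory edge list. This is a minimal sketch. All IDs, the `NODE_TYPE` lookup, and the choice of `implemented_by` to link a requirement to a prompt version are illustrative assumptions.

```python
# Hypothetical graph data: (from, edge, to) triples plus a few node attributes.
EDGES = [
    ("REQ-1", "measured_by", "EVAL-1"),
    ("RISK-1", "mitigated_by", "CTRL-1"),
    ("CTRL-1", "evidenced_by", "TRACE-1"),
    ("CTRL-2", "evidenced_by", "MEMO-1"),
    ("REQ-1", "implemented_by", "PROMPT-v7"),
    ("REQ-3", "implemented_by", "PROMPT-v7"),
]
NODE_TYPE = {"TRACE-1": "runtime_trace", "MEMO-1": "release_memo"}
HIGH_RISK = {"REQ-1", "REQ-2"}
CONTROLS = {"CTRL-1", "CTRL-2"}


def targets(src, edge):
    """Nodes reached from `src` over edges of type `edge`."""
    return {d for s, e, d in EDGES if s == src and e == edge}


def sources(edge, dst):
    """Nodes that point at `dst` over edges of type `edge`."""
    return {s for s, e, d in EDGES if e == edge and d == dst}


# Which high-risk requirements have no eval coverage?
no_eval = sorted(r for r in HIGH_RISK if not targets(r, "measured_by"))
# Which controls have no runtime evidence (as opposed to documents only)?
no_runtime = sorted(
    c for c in CONTROLS
    if not any(NODE_TYPE.get(ev) == "runtime_trace" for ev in targets(c, "evidenced_by"))
)
# Which requirements does prompt version PROMPT-v7 affect?
impacted = sorted(sources("implemented_by", "PROMPT-v7"))
print(no_eval, no_runtime, impacted)
# ['REQ-2'] ['CTRL-2'] ['REQ-1', 'REQ-3']
```

CTRL-2 is flagged even though it has evidence, because a release memo alone does not show that the control operates at runtime.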
8. Evidence Objects
An evidence object should have this structure:
| Field | Content |
|---|---|
| evidence_id | Unique identifier |
| evidence_type | eval_report, trace_sample, approval, ADR, policy_log, incident |
| produced_by | System, human, eval tool, or approval workflow |
| produced_at | Timestamp |
| related_requirement | requirement id |
| related_control | control id |
| related_release | release id |
| source_version | prompt/model/retriever/tool/source version |
| retention | Retention period |
| integrity | hash, immutable store, or access log |
| reviewer | If human confirmation is required, record the reviewer |
This turns evidence from an attachment into a queryable, reusable, auditable governance asset.
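A minimal sketch of the evidence object as a Python dataclass. It uses the hash option from the `integrity` row: a SHA-256 digest over the other fields, serialized as canonical JSON. The ISO 8601 duration format for `retention` (`"P7Y"`) and the `seal`/`verify` helpers are assumptions for illustration. In practice, an immutable store or access log could serve instead of, or alongside, the hash.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

EVIDENCE_TYPES = {"eval_report", "trace_sample", "approval", "ADR", "policy_log", "incident"}


@dataclass
class EvidenceObject:
    evidence_id: str
    evidence_type: str
    produced_by: str
    produced_at: str          # ISO 8601 timestamp
    related_requirement: str
    related_control: str
    related_release: str
    source_version: dict      # e.g. {"prompt": "v12", "retriever": "v4"}
    retention: str            # assumed convention: ISO 8601 duration such as "P7Y"
    reviewer: Optional[str] = None
    integrity: str = field(default="", compare=False)

    def __post_init__(self):
        if self.evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence_type: {self.evidence_type}")

    def content_hash(self) -> str:
        """SHA-256 over every field except `integrity`, serialized as canonical JSON."""
        body = asdict(self)
        body.pop("integrity")
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def seal(self) -> "EvidenceObject":
        self.integrity = self.content_hash()
        return self

    def verify(self) -> bool:
        return self.integrity == self.content_hash()


ev = EvidenceObject(
    evidence_id="EVD-001", evidence_type="eval_report", produced_by="EvalOps pipeline",
    produced_at=datetime(2026, 6, 29, tzinfo=timezone.utc).isoformat(),
    related_requirement="REQ-001", related_control="CTRL-001", related_release="REL-001",
    source_version={"prompt": "v12", "retriever": "v4"}, retention="P7Y",
).seal()
print(ev.verify())               # True
ev.related_release = "REL-002"   # tampering after sealing
print(ev.verify())               # False
```

Sorting keys before hashing makes the digest independent of field order, so the same evidence always yields the same hash.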
9. Release Decision Memo
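The release memo should state clearly:
| Section | Content |
|---|---|
| Business objective | Why we are releasing |
| Scope | Which users, channels, regions, and scenarios the release covers |
| Requirements coverage | Key requirements and their coverage status |
| Eval results | Metrics, thresholds, failures, residual risk |
| Control coverage | Risks, controls, control tests, evidence |
| Architecture decisions | Key ADRs and their reversal conditions |
| Runtime readiness | telemetry, SLO, runbook, incident path |
| Exceptions | Exceptions, compensating controls, expiry dates |
| Decision | release, limited pilot, hold, rollback |
| Signoff | product, architecture, risk, ops owner |
This memo is the human-readable summary of the traceability graph.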
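Before signoff, the memo can be checked for completeness. This is a minimal sketch. The section keys, the four signoff roles, and the approval IDs come from the table above or are hypothetical, and the helper lists blockers instead of deciding anything on its own.

```python
REQUIRED_SECTIONS = [
    "business_objective", "scope", "requirements_coverage", "eval_results",
    "control_coverage", "architecture_decisions", "runtime_readiness",
    "exceptions", "decision", "signoff",
]
REQUIRED_SIGNOFFS = {"product", "architecture", "risk", "ops"}
DECISIONS = {"release", "limited pilot", "hold", "rollback"}


def memo_blockers(memo: dict) -> list:
    """Reasons the memo is not ready for signoff; an empty list means complete."""
    # Presence, not non-emptiness: an empty "exceptions" list is a valid answer.
    blockers = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in memo]
    if "decision" in memo and memo["decision"] not in DECISIONS:
        blockers.append(f"unknown decision: {memo['decision']}")
    signed = set(memo.get("signoff", {}))
    blockers += [f"missing signoff: {role}" for role in sorted(REQUIRED_SIGNOFFS - signed)]
    return blockers


memo = {s: "see evidence binder" for s in REQUIRED_SECTIONS}
memo.update(exceptions=[], decision="limited pilot",
            signoff={"product": "APR-101", "risk": "APR-102"})  # hypothetical approval ids
print(memo_blockers(memo))  # ['missing signoff: architecture', 'missing signoff: ops']
```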
10. Common Failure Modes
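| Failure mode | Symptom | Fix |
|---|---|---|
| User story only | Requirements cannot be traced to eval/control/evidence | Write a requirement-eval-control matrix |
| Eval orphan | Eval sets exist, but nobody knows which requirements they cover | Every eval case must link to a requirement id |
| Control orphan | Controls exist but have no tests or evidence | Every control must link to test/evidence |
| Evidence pile | Plenty of material, but none of it is queryable | evidence object metadata |
| No runtime link | After launch, traces are disconnected from requirements | Telemetry spans carry use case and requirement/control tags |
| Change blindness | The impact of prompt/model/source/tool changes is invisible | version registry + impact query |
| Audit scramble | Material is gathered ad hoc when internal audit arrives | Release bundle and evidence binder by default |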
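A stdlib-only sketch of the "No runtime link" fix. It builds a span attribute dict that carries the use case and requirement/control IDs, then flags spans that lack them. The `app.*` keys are a hypothetical naming convention, not OpenTelemetry semantic conventions. With the OpenTelemetry Python API, such a dict could be passed to `span.set_attributes(...)` on the active span, since attribute values may be strings or sequences of strings.

```python
from typing import Mapping, Sequence


def traceability_attributes(use_case: str, requirement_ids: Sequence[str],
                            control_ids: Sequence[str]) -> dict:
    """Span attributes that link a runtime span to requirements and controls.

    The "app.*" keys are a hypothetical convention, not OpenTelemetry semantic conventions.
    """
    return {
        "app.use_case": use_case,
        "app.requirement_ids": list(requirement_ids),
        "app.control_ids": list(control_ids),
    }


def unlinked_spans(spans: Sequence[Mapping]) -> list:
    """Names of spans whose attributes carry no requirement id (the 'No runtime link' failure)."""
    return [s["name"] for s in spans if not s.get("attributes", {}).get("app.requirement_ids")]


spans = [
    {"name": "policy_decision",
     "attributes": traceability_attributes("fee-assistant", ["REQ-001"], ["CTRL-001"])},
    {"name": "retrieval", "attributes": {"app.use_case": "fee-assistant"}},
]
print(unlinked_spans(spans))  # ['retrieval']
```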
11. Templates
11.1 Traceability Graph Table
| ID | Node type | Name | Owner | Links |
|---|---|---|---|---|
| BO-001 | Business outcome | Reduce dispute triage cycle time | Product | REQ-001, MET-001 |
| REQ-001 | Requirement | Agent recommends next best action with evidence | Product/BA | EVAL-001, CTRL-001, ADR-002 |
| EVAL-001 | Eval | Dispute trajectory eval | QA/EvalOps | MET-001, EVD-001 |
| CTRL-001 | Control | High-risk action requires HITL | Risk/Ops | EVD-002 |
| ADR-002 | ADR | Use tool gateway for case actions | Architect | CTRL-001, COMP-003 |
| EVD-001 | Evidence | Eval run 2026-06-29 | EvalOps | REL-001 |
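Because the template is kept as a markdown table, it can be parsed and checked for links that point at rows that do not exist. This is a minimal sketch. The table is an excerpt, so several links (MET-001, EVD-002, COMP-003, REL-001) refer to nodes that are not shown, and the check reports exactly those.

```python
TABLE = """\
| ID | Node type | Name | Owner | Links |
|---|---|---|---|---|
| BO-001 | Business outcome | Reduce dispute triage cycle time | Product | REQ-001, MET-001 |
| REQ-001 | Requirement | Agent recommends next best action with evidence | Product/BA | EVAL-001, CTRL-001, ADR-002 |
| EVAL-001 | Eval | Dispute trajectory eval | QA/EvalOps | MET-001, EVD-001 |
| CTRL-001 | Control | High-risk action requires HITL | Risk/Ops | EVD-002 |
| ADR-002 | ADR | Use tool gateway for case actions | Architect | CTRL-001, COMP-003 |
| EVD-001 | Evidence | Eval run 2026-06-29 | EvalOps | REL-001 |
"""


def parse_rows(md: str) -> list:
    """Parse a simple pipe table (no escaped pipes) into a list of dicts keyed by header."""
    lines = [line.strip() for line in md.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and separator rows
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def dangling_links(rows) -> dict:
    """Map each node ID to the linked IDs that have no row of their own."""
    ids = {r["ID"] for r in rows}
    out = {}
    for r in rows:
        links = [x.strip() for x in r["Links"].split(",") if x.strip()]
        missing = [x for x in links if x not in ids]
        if missing:
            out[r["ID"]] = missing
    return out


print(dangling_links(parse_rows(TABLE)))
# {'BO-001': ['MET-001'], 'EVAL-001': ['MET-001'], 'CTRL-001': ['EVD-002'], 'ADR-002': ['COMP-003'], 'EVD-001': ['REL-001']}
```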
11.2 Coverage Matrix
| Requirement | Eval coverage | Control coverage | Runtime evidence | Status |
|---|---|---|---|---|
| REQ-001 | EVAL-001 | CTRL-001 | trace tag req=REQ-001 | covered |
| REQ-002 | EVAL-002 | CTRL-002 | policy logs | covered |
| REQ-003 | EVAL-003 | CTRL-003 | HITL records | covered |
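The Status column can be derived from the other columns instead of typed by hand. This is a minimal sketch. It assumes the rule that a row is `covered` only when eval, control, and runtime evidence are all present, and `REQ-004` is a hypothetical row with gaps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoverageRow:
    requirement: str
    eval_coverage: Optional[str]
    control_coverage: Optional[str]
    runtime_evidence: Optional[str]

    @property
    def status(self) -> str:
        """'covered' only when eval, control, and runtime evidence are all present (assumed rule)."""
        gaps = [name for name, value in (
            ("eval", self.eval_coverage),
            ("control", self.control_coverage),
            ("runtime", self.runtime_evidence),
        ) if not value]
        return "covered" if not gaps else "gap: " + ", ".join(gaps)


rows = [
    CoverageRow("REQ-001", "EVAL-001", "CTRL-001", "trace tag req=REQ-001"),
    CoverageRow("REQ-004", "EVAL-004", None, None),  # hypothetical row with gaps
]
for row in rows:
    print(row.requirement, row.status)
# REQ-001 covered
# REQ-004 gap: control, runtime
```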
11.3 Evidence Query Examples
| Query | Expected result |
|---|---|
| Show all requirements for release REL-001 without eval evidence | Empty set before release |
| Show all policy exceptions expiring in 30 days | Exception review backlog |
| Show all incidents linked to advice boundary requirement | Incident trend and remediation |
| Show all prompt versions used by customer-facing release | Version and eval report |
| Show all controls with no runtime trace | Control implementation gap |
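The first two queries are sketched below over small in-memory structures. All release, requirement, evidence, and exception records are hypothetical sample data. Exceptions that have already expired fall outside the window and would need a separate query.

```python
from datetime import date, timedelta

RELEASE_REQUIREMENTS = {"REL-001": ["REQ-001", "REQ-002", "REQ-003"]}
EVAL_EVIDENCE = {"REQ-001": ["EVD-001"], "REQ-002": ["EVD-004"]}  # REQ-003 has none yet
EXCEPTIONS = [
    {"id": "EXC-01", "control": "CTRL-002", "expires": date(2026, 7, 10)},
    {"id": "EXC-02", "control": "CTRL-003", "expires": date(2026, 9, 1)},
]


def requirements_without_eval(release_id: str) -> list:
    """Requirements in the release with no eval evidence; should be empty before release."""
    return [r for r in RELEASE_REQUIREMENTS.get(release_id, []) if not EVAL_EVIDENCE.get(r)]


def exceptions_expiring(within_days: int, today: date) -> list:
    """Exception ids whose expiry falls between today and today + within_days (inclusive)."""
    horizon = today + timedelta(days=within_days)
    return [e["id"] for e in EXCEPTIONS if today <= e["expires"] <= horizon]


print(requirements_without_eval("REL-001"))             # ['REQ-003']
print(exceptions_expiring(30, today=date(2026, 6, 29)))  # ['EXC-01']
```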
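12. Interview Talking Points
30-second version:
I would upgrade AI requirements from user stories into a traceability graph. Each business outcome links to requirements, and each requirement links to evals, thresholds, controls, ADRs, implementation/config, runtime telemetry, and evidence. At release, this proves why the system can ship. When something goes wrong in production, it traces the impact. During an audit, it answers how each high-risk requirement was tested and controlled.
2-minute version:
Take an AML investigation copilot as an example. The business goal is to shorten investigation preparation time and improve evidence completeness. The requirement is not simply "AI generates an investigation summary." The summary must cite transactions, rules, and case history. The AI must not decide SAR filing. Restricted data must not be sent to unapproved models. Every brief has a reviewer and an evidence id. Each requirement has an eval, a control, and evidence. The release memo summarizes coverage, production traces carry requirement/control tags, and incidents link back to the violated requirement and its remediation. BA, product, architecture, risk, and audit all work on the same evidence graph.
Follow-up questions:
| Follow-up | Key points |
|---|---|
| How does a traceability graph differ from a traditional traceability matrix | A graph can express many-to-many relationships, versions, evidence, incidents, and runtime telemetry |
| How to avoid documentation overhead | Start with high-risk/material requirements and collect evidence automatically |
| How to connect to evals | Link every eval case to a requirement/risk/control |
| How to connect to architecture | Link ADRs, components, and configs to requirements/risks |
| How to support audit | evidence object + release memo + queryable binder |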
13. Practice Assignment
Pick an AI use case and build a traceability graph with:
- 3 business outcomes.
- 8 requirements.
- 12 eval cases.
- 8 controls.
- 5 ADR/config/component links.
- 6 evidence object examples.
- 5 evidence queries.
- 1 release decision memo summary.
Completion criteria:
- Every high-risk requirement has an eval and a control.
- Every control has evidence.
- Every release decision can be traced to evals and signoff.
- At least 2 incident/remediation examples link back to a requirement.
- You can explain in 2 minutes how the graph helps product, architecture, and audit make decisions together.