AI Traceability Graph: Requirements-Eval-Control Traceability

360ai-foundations/papers/90-ai-traceability-requirements-eval-control-graph.md

AI Traceability Graph / Requirements-Eval-Control-Evidence Explained

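Audience: AI BA / AI Product Manager / AI Architect / Model Risk Lead / Audit Evidence Owner.

Core question: If AI requirements stay at the level of a user story or a PRD paragraph, it is hard to prove that the released system actually meets its business goals, risk controls, eval thresholds, and audit requirements. Advanced AI requirements engineering connects business outcome, requirement, eval, control, ADR, implementation, telemetry, and evidence into a traceability graph.

Learning goal: Raise the BA's traceability skill to the AI-system level, closing the loop from requirements through evals, controls, and runtime evidence to audit Q&A.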


Source Anchors

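| Source | Link | Purpose |
| --- | --- | --- |
| W3C PROV Overview | https://www.w3.org/TR/prov-overview/ | Borrow the provenance concepts of entity, activity, and agent to design evidence sources and generation relationships |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Connect traceability to AI risk governance, measurement, and management |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Reference for AI management system, accountability, controls, continual improvement, and evidence requirements |
| OpenTelemetry | https://opentelemetry.io/docs/ | Use traces, metrics, and logs to connect runtime behavior to requirements and control evidence |
| JSON Schema | https://json-schema.org/ | Provide structural constraints for eval results, control evidence, structured output, and evidence objects |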

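In one sentence:

An AI Traceability Graph connects "why we are building this, what it must satisfy, how it is evaluated, how it is controlled, how it is implemented, what happened in production, and where the evidence is" into a single queryable relationship graph.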


1. Why AI Requirements Cannot Stop at the User Story

A typical user story:

As a customer service agent, I want AI to answer customer policy questions, so that I can respond faster.

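This may be an adequate starting point for a traditional feature, but it falls far short for an AI system:

| Gap | Additional question an AI system must answer |
| --- | --- |
| Quality uncertainty | What counts as a good answer, and which samples and thresholds prove it |
| Risk boundary | Which questions must be refused or escalated to a human |
| Data/knowledge source | Which approved sources answers must cite, and how stale sources are handled |
| Control | Which component enforces policy, and which role approves exceptions |
| Runtime drift | How quality, cost, risk, and adoption are monitored after release |
| Audit evidence | What evidence is given when a regulator or internal audit asks "why was this allowed to go live" |
| Change impact | Which requirements and controls a prompt/model/retriever/tool change affects |

So advanced AI requirements engineering should be written as a chain: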

Business outcome
  -> requirement
  -> acceptance criterion
  -> eval case / metric / threshold
  -> control
  -> architecture decision
  -> implementation/config
  -> telemetry
  -> evidence
  -> release decision

This does not repeat the basic BA traceability matrix; it brings AI uncertainty, risk, and runtime evidence into the same graph.


2. Traceability Graph Nodes

An AI traceability graph can be expressed as nodes and edges.

2.1 Node Types

| Node | Example |
| --- | --- |
| Business outcome | Reduce average customer-service handling time by 20% without increasing complaints |
| Stakeholder concern | Accuracy, trust, compliance, cost, human workload |
| Requirement | Answers must be grounded in approved policy knowledge sources |
| Risk | Unauthorized advice, incorrect fee explanation, PII leakage |
| Acceptance criterion | customer-facing answer must cite source |
| Eval case | Fee policy Q&A golden set |
| Metric | citation correctness, unsupported claim rate |
| Threshold | citation correctness >= 95%, unsupported claim = 0 for regulated answer |
| Control | source allowlist, policy classifier, HITL |
| ADR | choose RAG over fine-tuning for policy knowledge |
| Component | retrieval service, policy engine, model gateway |
| Config | prompt version, retriever version, policy profile |
| Runtime trace | request span, retrieval span, policy decision span |
| Evidence | eval report, release memo, approval, trace sample |
| Decision | release, scale, hold, rollback, exception |
| Incident | wrong answer, policy bypass, data exposure |
| Remediation | source update, eval expansion, policy rule change |

2.2 Edge Types

| Edge | Meaning |
| --- | --- |
| satisfies | requirement satisfies business outcome |
| mitigates | control mitigates risk |
| measured_by | requirement measured by eval/metric |
| implemented_by | requirement implemented by component/config |
| decided_by | ADR explains architecture choice |
| emits | component emits telemetry |
| evidenced_by | decision/control supported by evidence |
| violated_by | incident violates requirement/control |
| remediated_by | remediation changes control/config/eval |
| supersedes | new ADR/config/eval replaces previous one |

3. Traceability Graph Example

Scenario: Customer-facing fee policy assistant.

| From | Edge | To |
| --- | --- | --- |
| Reduce call handling time without increasing complaints | satisfied_by | AI policy assistant capability |
| AI policy assistant capability | decomposed_into | Fee explanation requirement |
| Fee explanation requirement | measured_by | Fee policy golden set |
| Fee policy golden set | uses_metric | Citation correctness |
| Citation correctness | has_threshold | >= 95% |
| Fee explanation requirement | constrained_by | No personalized financial advice risk |
| No personalized financial advice risk | mitigated_by | Advice boundary policy |
| Advice boundary policy | implemented_by | Policy engine profile retail-advice-boundary-v3 |
| Fee explanation requirement | implemented_by | RAG retrieval service |
| RAG retrieval service | uses_config | Source allowlist retail-policy-approved |
| RAG design | decided_by | ADR-001 RAG over fine-tuning |
| Production request | emits | OpenTelemetry trace |
| Release decision | evidenced_by | Eval report, ADR, risk signoff, trace sample |
| Wrong fee answer incident | violates | Fee explanation requirement |
| Wrong fee answer incident | remediated_by | Eval set expansion and source version update |

The graph does not have to be implemented in a graph database. Early on, a table, YAML, a spreadsheet, or a markdown matrix is enough. What matters is that the relationships are clear and can answer queries.
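As a minimal sketch of how far a plain edge list goes without a graph database, the snippet below stores a few of the example edges as (from, edge, to) triples and answers a one-hop query. The helper names `build_index` and `targets` are illustrative assumptions, not part of any standard.

```python
from collections import defaultdict

# A subset of the example edges, stored as (from, edge, to) triples.
EDGES = [
    ("Fee explanation requirement", "measured_by", "Fee policy golden set"),
    ("Fee policy golden set", "uses_metric", "Citation correctness"),
    ("Citation correctness", "has_threshold", ">= 95%"),
    ("Fee explanation requirement", "constrained_by",
     "No personalized financial advice risk"),
    ("No personalized financial advice risk", "mitigated_by",
     "Advice boundary policy"),
    ("Fee explanation requirement", "implemented_by", "RAG retrieval service"),
    ("Wrong fee answer incident", "violates", "Fee explanation requirement"),
]


def build_index(edges):
    """Index outgoing edges by source node: node -> [(edge, target), ...]."""
    index = defaultdict(list)
    for source, edge, target in edges:
        index[source].append((edge, target))
    return index


def targets(index, node, edge):
    """Return the nodes reachable from `node` over a single `edge` hop."""
    return [t for e, t in index.get(node, []) if e == edge]


index = build_index(EDGES)
print(targets(index, "Fee explanation requirement", "measured_by"))
```

The same triples could live in a spreadsheet or YAML file; the point is only that each row names both endpoints and the relationship type, so queries stay mechanical.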


4. Requirements-to-Eval

AI requirements must be written so that they can be evaluated:

| Requirement pattern | Eval translation |
| --- | --- |
| Must answer using approved source | source-grounded eval, citation check |
| Must refuse unsupported question | no-answer eval, refusal correctness |
| Must escalate high-risk scenario | escalation eval, HITL routing test |
| Must not expose PII | redaction eval, privacy test |
| Must call correct tool | trajectory eval, tool selection accuracy |
| Must complete under time budget | latency SLO, load test |
| Must be cost efficient | cost per successful task |
| Must be understandable to user | human rating, comprehension test |

Advanced requirements:

  • Eval cases must cover normal, edge, adversarial, regulatory, and segment-specific cases.
  • Every eval result must map back to a requirement.
  • Every eval failure must be traceable to a component/config/source/prompt.
  • Every release decision must cite eval evidence.
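One way to make "every eval result maps back to a requirement" concrete is to carry the requirement id on each result and derive the failing requirements from it. The sketch below assumes a hypothetical `EvalResult` record; the metric values are made up for illustration and are not results from any real eval run.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    requirement_id: str            # requirement this result maps back to
    metric: str
    value: float
    threshold: float
    higher_is_better: bool = True  # False for rates that should stay low

    def passed(self) -> bool:
        """Compare the measured value against the threshold."""
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


def failing_requirements(results):
    """Return sorted requirement ids with at least one failed eval result."""
    return sorted({r.requirement_id for r in results if not r.passed()})


# Illustrative values only.
results = [
    EvalResult("REQ-001", "citation_correctness", 0.97, 0.95),
    EvalResult("REQ-001", "unsupported_claim_rate", 0.0, 0.0,
               higher_is_better=False),
    EvalResult("REQ-002", "refusal_correctness", 0.88, 0.90),
]
print(failing_requirements(results))  # ['REQ-002']
```

A release gate could then refuse to proceed while this list is non-empty for any high-risk requirement.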

5. Requirements-to-Control

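AI requirements must also be written down to the control level:

| Requirement / Risk | Control | Evidence |
| --- | --- | --- |
| No personalized investment advice | advice boundary classifier + refusal template | policy decision logs, red-team eval |
| Tool calls must be authorized | tool gateway + RBAC/ABAC/ReBAC | tool audit span |
| High-risk cases must be reviewed by a human | HITL queue + approval workflow | approval id, review record |
| Knowledge must come from approved sources | source registry + retrieval filter | citation source version |
| Prompt context must not be retained beyond the retention period | retention policy + deletion job | retention audit report |
| Changes must be traceable | ADR + version registry + release bundle | ADR log, release memo |

A control should be more than "someone reviews it." A mature control states:

  • Where it takes effect.
  • Whether it is preventive, detective, or corrective.
  • Whether it is automated.
  • Who owns it.
  • How its effectiveness is tested.
  • What evidence it produces.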
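The six attributes above can be captured as a record so that an incomplete control is detectable rather than a matter of opinion. This is a sketch under assumptions: the `ControlRecord` class and its field names are hypothetical, chosen to mirror the list.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ControlRecord:
    control_id: str
    location: Optional[str] = None            # where the control takes effect
    control_type: Optional[str] = None        # preventive / detective / corrective
    automated: Optional[bool] = None          # None means "not yet stated"
    owner: Optional[str] = None
    effectiveness_test: Optional[str] = None  # how effectiveness is tested
    evidence: Optional[str] = None            # what evidence it produces


def missing_attributes(control):
    """List the attributes that have not been filled in yet."""
    return [f.name for f in fields(control) if getattr(control, f.name) is None]


ctrl = ControlRecord(
    "CTRL-001",
    location="tool gateway",
    control_type="preventive",
    automated=True,
    owner="Risk/Ops",
)
print(missing_attributes(ctrl))  # ['effectiveness_test', 'evidence']
```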

6. Requirements-to-Architecture

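A traceability graph exposes why architecture decisions are needed.

| Requirement pressure | Architecture response |
| --- | --- |
| Need citations and fast policy updates | RAG + source registry; do not store policy knowledge through fine-tuning alone |
| Need to control tool side effects | Tool gateway + policy engine + HITL |
| Need to audit the basis for release | Evidence binder + ADR + release bundle |
| Need production drift monitoring | OpenTelemetry trace + eval sampling + dashboard |
| Need to reduce cost | model routing + cache + context budget |
| Need cross-team reuse | service catalog + golden paths |

If a requirement cannot be traced to any architecture element, it may be only a wish. If an architecture element cannot be traced to any requirement or risk, it may be over-engineering.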
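Both checks in the previous paragraph reduce to set differences over the link table. The sketch below uses hypothetical ids (`REQ-*`, `RISK-*`, `COMP-*`) and a hypothetical `find_orphans` helper.

```python
# Hypothetical ids for illustration.
REQUIREMENTS = {"REQ-001", "REQ-002", "REQ-003"}
COMPONENTS = {"COMP-001", "COMP-002", "COMP-003"}

# (requirement_or_risk, component) links.
IMPLEMENTED_BY = [
    ("REQ-001", "COMP-001"),
    ("REQ-002", "COMP-001"),
    ("RISK-001", "COMP-002"),
]


def find_orphans(requirements, components, links):
    """Find requirements with no component and components with no driver."""
    linked_sources = {source for source, _ in links}
    linked_components = {component for _, component in links}
    return {
        "unimplemented_requirements": sorted(requirements - linked_sources),
        "unjustified_components": sorted(components - linked_components),
    }


print(find_orphans(REQUIREMENTS, COMPONENTS, IMPLEMENTED_BY))
```

Here `REQ-003` has no architecture element behind it, and `COMP-003` is not justified by any requirement or risk.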


7. Financial Retail Case: AML Investigation Copilot

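Goal: Help AML analysts consolidate evidence on customers, transactions, entity relationships, and regulatory rules, and generate a draft investigation brief. The AI does not make the final SAR decision.

7.1 Traceability Matrix

| Business outcome | Requirement | Eval | Control | Evidence |
| --- | --- | --- | --- | --- |
| Shorten investigation preparation time | Copilot summarizes relevant transactions and entity relationships | analyst task completion benchmark | graph/data source allowlist | benchmark report, trace sample |
| Improve evidence completeness | Brief must cite transactions, rules, and case history | evidence coverage eval | citation required, source versioning | eval report, source lineage |
| Prevent incorrect regulatory conclusions | AI must not decide SAR filing | refusal/escalation eval | decision boundary policy, human signoff | policy log, reviewer record |
| Protect sensitive data | Do not send restricted data to unauthorized models | privacy test | model route data boundary | route logs, DLP report |
| Support audit | Every brief has version, inputs, reviewer | evidence completeness query | evidence binder | release and case evidence |

7.2 Graph Queries

A senior BA/Architect should be able to answer:

| Query | Purpose |
| --- | --- |
| Which high-risk requirements have no eval coverage | Find pre-release gaps |
| Which controls have no runtime evidence | Find audit risk |
| Which incidents violate the same requirement | Find systemic problems |
| Which requirements a given prompt version affects | Change impact analysis |
| Which eval cases a given source update affects | Knowledge-update regression |
| Which requirements have no linked business outcome | Prune low-value requirements |
| Which architecture components have no linked requirement/risk | Identify over-engineering |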
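The change-impact queries in this table are reverse reachability questions: starting from the changed artifact, walk back to everything that depends on it. A minimal sketch, assuming hypothetical node ids and a `DEPENDS_ON` edge list directed from the dependent node to its dependency:

```python
from collections import defaultdict, deque

# Hypothetical edges: (dependent node, node it depends on).
DEPENDS_ON = [
    ("REQ-001", "COMP-rag"),
    ("REQ-002", "COMP-policy"),
    ("COMP-rag", "prompt-v7"),
    ("COMP-rag", "retriever-v3"),
    ("EVAL-001", "REQ-001"),
]


def impacted_by(changed, edges):
    """Return every node that transitively depends on `changed`."""
    dependents = defaultdict(set)
    for node, dependency in edges:
        dependents[dependency].add(node)

    seen, queue = set(), deque([changed])
    while queue:  # breadth-first walk over reversed edges
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return sorted(seen)


print(impacted_by("prompt-v7", DEPENDS_ON))
# ['COMP-rag', 'EVAL-001', 'REQ-001']
```

Changing `prompt-v7` touches the RAG component, the requirement it implements, and the eval that measures that requirement, which is the regression set to rerun.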

8. Evidence Objects

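An evidence object should have a defined structure:

| Field | Content |
| --- | --- |
| evidence_id | Unique identifier |
| evidence_type | eval_report, trace_sample, approval, ADR, policy_log, incident |
| produced_by | System, human, eval tool, or approval workflow |
| produced_at | Timestamp |
| related_requirement | requirement id |
| related_control | control id |
| related_release | release id |
| source_version | prompt/model/retriever/tool/source version |
| retention | Retention period |
| integrity | hash, immutable store, or access log |
| reviewer | Reviewer, recorded when human confirmation is required |

This makes evidence more than an attachment: a governance asset that can be queried, reused, and audited.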
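The field list above maps directly onto a record type. The sketch below is one possible shape, not a prescribed schema: it uses a frozen dataclass and derives the `integrity` field as a SHA-256 hash over a canonical JSON form. The `source_version` and `retention` values are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceObject:
    evidence_id: str
    evidence_type: str        # e.g. eval_report, trace_sample, approval
    produced_by: str
    produced_at: str          # ISO 8601 date or timestamp
    related_requirement: str
    related_control: str
    related_release: str
    source_version: str
    retention: str
    reviewer: Optional[str] = None

    def integrity_hash(self) -> str:
        """SHA-256 over a canonical JSON form (sorted keys, no whitespace)."""
        canonical = json.dumps(asdict(self), sort_keys=True,
                               separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


evidence = EvidenceObject(
    evidence_id="EVD-001",
    evidence_type="eval_report",
    produced_by="EvalOps",
    produced_at="2026-06-29",
    related_requirement="REQ-001",
    related_control="CTRL-001",
    related_release="REL-001",
    source_version="prompt-v7",  # placeholder version label
    retention="7y",              # placeholder retention period
)
print(evidence.integrity_hash())
```

Storing the hash alongside the object lets a later audit detect whether any field was altered after the fact; an immutable store or access log, as the table notes, are alternatives.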


9. Release Decision Memo

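A release memo should state:

| Section | Content |
| --- | --- |
| Business objective | Why the release is needed |
| Scope | Which users, channels, regions, and scenarios the release covers |
| Requirements coverage | Key requirements and their coverage status |
| Eval results | Metrics, thresholds, failures, residual risk |
| Control coverage | Risks, controls, control tests, evidence |
| Architecture decisions | Key ADRs and reversal conditions |
| Runtime readiness | telemetry, SLO, runbook, incident path |
| Exceptions | Exceptions, compensating controls, expiry dates |
| Decision | release, limited pilot, hold, rollback |
| Signoff | product, architecture, risk, ops owner |

This memo is the human-readable summary of the traceability graph.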


10. Common Failure Modes

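| Failure mode | Symptom | Fix |
| --- | --- | --- |
| User story only | Requirements cannot be traced to eval/control/evidence | Write a requirement-eval-control matrix |
| Eval orphan | An eval set exists but nobody knows which requirements it covers | Every eval case must link to a requirement id |
| Control orphan | A control exists but has no test or evidence | Every control must link to a test/evidence |
| Evidence pile | Plenty of material, none of it queryable | evidence object metadata |
| No runtime link | After release, traces are disconnected from requirements | telemetry spans carry use case and requirement/control tags |
| Change blindness | The impact of prompt/model/source/tool changes is invisible | version registry + impact query |
| Audit scramble | Material is gathered ad hoc when internal audit arrives | release bundle and evidence binder by default |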

11. Templates

11.1 Traceability Graph Table

| ID | Node type | Name | Owner | Links |
| --- | --- | --- | --- | --- |
| BO-001 | Business outcome | Reduce dispute triage cycle time | Product | REQ-001, MET-001 |
| REQ-001 | Requirement | Agent recommends next best action with evidence | Product/BA | EVAL-001, CTRL-001, ADR-002 |
| EVAL-001 | Eval | Dispute trajectory eval | QA/EvalOps | MET-001, EVD-001 |
| CTRL-001 | Control | High-risk action requires HITL | Risk/Ops | EVD-002 |
| ADR-002 | ADR | Use tool gateway for case actions | Architect | CTRL-001, COMP-003 |
| EVD-001 | Evidence | Eval run 2026-06-29 | EvalOps | REL-001 |

11.2 Coverage Matrix

| Requirement | Eval coverage | Control coverage | Runtime evidence | Status |
| --- | --- | --- | --- | --- |
| REQ-001 | EVAL-001 | CTRL-001 | trace tag req=REQ-001 | covered |
| REQ-002 | EVAL-002 | CTRL-002 | policy logs | covered |
| REQ-003 | EVAL-003 | CTRL-003 | HITL records | covered |

11.3 Evidence Query Examples

| Query | Expected result |
| --- | --- |
| Show all requirements for release REL-001 without eval evidence | Empty set before release |
| Show all policy exceptions expiring in 30 days | Exception review backlog |
| Show all incidents linked to advice boundary requirement | Incident trend and remediation |
| Show all prompt versions used by customer-facing release | Version and eval report |
| Show all controls with no runtime trace | Control implementation gap |
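Two of these queries can be sketched over plain in-memory records. Everything below is illustrative: the ids, dates, and the helper names `requirements_without_eval_evidence` and `exceptions_expiring` are assumptions, and a real binder would read from whatever store holds the evidence objects.

```python
from datetime import date, timedelta

# Hypothetical records for illustration.
RELEASE_REQUIREMENTS = {"REL-001": ["REQ-001", "REQ-002", "REQ-003"]}
EVAL_EVIDENCE = {"REQ-001": ["EVD-001"], "REQ-002": []}
EXCEPTIONS = [
    {"id": "EXC-001", "expires": date(2026, 7, 15)},
    {"id": "EXC-002", "expires": date(2026, 12, 1)},
]


def requirements_without_eval_evidence(release_id):
    """Requirements in a release with no linked eval evidence."""
    return [
        req for req in RELEASE_REQUIREMENTS.get(release_id, [])
        if not EVAL_EVIDENCE.get(req)  # missing key or empty list
    ]


def exceptions_expiring(today, days=30):
    """Ids of exceptions that expire within `days` of `today`."""
    cutoff = today + timedelta(days=days)
    return [e["id"] for e in EXCEPTIONS if today <= e["expires"] <= cutoff]


print(requirements_without_eval_evidence("REL-001"))  # ['REQ-002', 'REQ-003']
print(exceptions_expiring(date(2026, 6, 29)))         # ['EXC-001']
```

With this sample data the first query is not empty, so under the table's expectation REL-001 would not be ready to release.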

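12. Interview Talking Points

30-second version:

I would upgrade AI requirements from user stories to a traceability graph. Each business outcome connects to requirements, and each requirement connects to eval, threshold, control, ADR, implementation/config, runtime telemetry, and evidence. At release we can then prove why the system may ship, during a production problem we can trace the impact, and in an audit we can answer how every high-risk requirement was tested and controlled.

2-minute version:

Take the AML investigation copilot as an example. The business goal is to shorten investigation preparation time and improve evidence completeness. The requirement is not simply "AI generates an investigation summary": the summary must cite transactions, rules, and case history; the AI must not decide SAR filing; restricted data must not be sent to unapproved models; and every brief has a reviewer and an evidence id. Every requirement has an eval, a control, and evidence. The release memo summarizes coverage, production traces carry requirement/control tags, and incidents link back to the violated requirement and its remediation. BA, product, architecture, risk, and audit then work from the same evidence graph.

Follow-up questions:

| Follow-up | Key points |
| --- | --- |
| How a traceability graph differs from a traditional traceability matrix | A graph can express many-to-many relationships, versions, evidence, incidents, and runtime telemetry |
| How to avoid documentation burden | Start with high-risk/material requirements and collect evidence automatically |
| How to connect to evals | Every eval case links to a requirement/risk/control |
| How to connect to architecture | ADRs, components, and configs all link to a requirement/risk |
| How to support audit | evidence object + release memo + queryable binder |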

13. Practice Assignment

Pick one AI use case and build a traceability graph:

  1. 3 business outcomes.
  2. 8 requirements.
  3. 12 eval cases.
  4. 8 controls.
  5. 5 ADR/config/component links.
  6. 6 evidence object examples.
  7. 5 evidence queries.
  8. 1 release decision memo summary.

Completion criteria:

  • Every high-risk requirement has an eval and a control.
  • Every control has evidence.
  • Every release decision can be traced to an eval and a signoff.
  • At least 2 incident/remediation examples link back to a requirement.
  • You can explain in 2 minutes how the graph helps product, architecture, and audit decide together.