
AI Traceability Requirements-Eval-Control Graph Playbook



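Audience: senior AI PM / AI BA / Product Architect / Solutions Architect / Enterprise Architect / EvalOps Lead / AI Governance / Model Risk / Internal Audit / leaders of financial retail AI transformation.

Purpose: upgrade the traditional BA practice of requirements traceability into an AI-system graph of requirement -> eval -> control -> architecture decision -> implementation/config -> telemetry -> incident -> evidence, for use in product design, architecture governance, release gating, audit evidence and portfolio presentation.

Core thesis: AI traceability is not about linking PRD items to test cases. It is about proving that an AI capability can be measured, controlled, held accountable and continuously governed under a specific business outcome, risk boundary, system version and body of operating evidence.

How to use: for each high-impact AI use case, maintain at least one Traceability Graph Table, one Coverage Matrix, a set of Evidence Queries, one Release Decision Memo, a set of Audit Q&A and one Portfolio Evidence Pack.

Important note: this document is learning, portfolio and governance-design material. It is not legal advice, an audit opinion, a model validation conclusion or a regulatory interpretation. Real projects must be confirmed by Legal, Compliance, Risk, Model Risk, Internal Audit, Security, Privacy, the Data Owner and business management for the applicable jurisdiction.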


1. Source Anchors

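The official sources below serve as method anchors for the traceability graph. This document turns them into requirements-engineering, eval-gate, control-evidence, architecture-governance and audit-inquiry assets for financial retail AI projects. Access date recorded as 2026-06-29.

| Anchor | Official source | How this document uses it |
|---|---|---|
| W3C PROV | https://www.w3.org/TR/prov-overview/ | Borrows the Entity / Activity / Agent provenance model to organize the lineage of AI requirements, eval runs, release decisions, telemetry, incidents and evidence. |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize traceability across AI risk context, evaluation, controls, decisions and continuous improvement. |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Takes the AI management system view to turn scope, roles, operation, performance evaluation, improvement and management review into evidence objects. |
| OpenTelemetry | https://opentelemetry.io/docs/ | Uses traces, metrics, logs and attributes to design production telemetry so that live behavior links back to requirement, eval, control and incident. |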

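Standards-to-artifacts:

| Source lens | Traceability artifact | Senior-level framing |
|---|---|---|
| W3C PROV | lineage model, evidence provenance, change impact map | "I don't just store evidence; I show who generated it, in which activity and from which entity." |
| NIST AI RMF | risk-to-eval-to-control traceability, release gate, monitoring gate | "I place each AI requirement inside the Govern / Map / Measure / Manage loop instead of closing it out as a feature requirement." |
| ISO/IEC 42001 | AIMS evidence map, ownership matrix, management review pack | "I use management-system language to show that an AI capability has scope, accountability, operational control, performance evaluation and continual improvement." |
| OpenTelemetry | trace attribute spec, production evidence query, incident replay | "I define observability fields at the architecture stage and do not rely on after-the-fact screenshots to prove AI behavior." |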

2. One-Sentence Positioning

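AI Traceability Graph = a queryable, reviewable and governable evidence graph that connects business goals, AI requirements, eval contracts, control activities, architecture decisions, configuration versions, production telemetry, incident reviews and audit evidence.

Minimal chain: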

Business outcome
-> stakeholder concern
-> AI requirement
-> eval question
-> eval case / metric / threshold
-> control objective / control activity
-> ADR
-> implementation / prompt / RAG / tool / policy config
-> telemetry signal
-> incident / exception / change
-> evidence artifact
-> release or assurance decision

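What differentiates a senior BA is not whether they can draw a traceability matrix, but whether they can answer:

  1. Why this AI requirement serves a specific business outcome and risk outcome.
  2. How the requirement is proven by an eval contract rather than by a demo.
  3. Which controls reduce the risks of AI misuse, hallucination, privilege overreach, data leakage, bias and over-reliance.
  4. Which ADR explains the key choices on model, RAG, tools, logging, fallback and human oversight.
  5. Which configuration versions are actually running in production.
  6. Which telemetry can show that AI behavior is still within bounds.
  7. Which requirements, evals, controls and evidence a given incident or exception affects.
  8. How to trace from an audit or regulatory question to the evidence path.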

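3. Why AI Requirements Cannot Stop at the User Story

A user story is good at expressing user goals and interaction intent, but the real risk of an AI system usually lies outside the story text: in probabilistic behavior, data sources, tool permissions, model versions, human controls, runtime drift and evidence gaps.

3.1 Where user stories fail

| Traditional story | Apparent value | Gap in an AI project |
|---|---|---|
| As an analyst, I want AI to summarize AML cases, so that I can work faster. | Has role, function and value | Does not say which evidence key facts must come from, which conclusions must not be generated, which outputs need human confirmation, or how missed red flags are evaluated. |
| As a customer service agent, I want AI to answer policy questions. | Identifies a workflow insertion point | Does not define policy version, citation, unauthorized commitment, complaint escalation, customer impact or production monitoring. |
| As a lender, I want AI to draft credit memos. | Describes an assisted-writing scenario | Does not define the fair lending boundary, protected class exclusion, reason codes, human decision, audit log or override evidence. |
| As a product owner, I want a dashboard for AI quality. | Provides a management view | Does not define how quality relates to requirement, eval, release gate, monitoring signal and incident severity. |

The user story can stay, but only as one node in the graph. The AI requirement must be further decomposed into: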

User story
-> decision boundary
-> data and knowledge boundary
-> behavior requirement
-> risk requirement
-> eval requirement
-> control requirement
-> telemetry requirement
-> evidence requirement

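3.2 Weak vs. strong requirements

| Weak requirement | Problem | Strong requirement |
|---|---|---|
| AI must provide accurate answers. | No sample, denominator, risk tier or failure definition | In the credit card fee policy scenario, agent-facing answers must cite an approved policy source id and effective date; an unauthorized fee waiver commitment is a critical failure with a target of 0. |
| AI should cite sources. | Unclear whether the source is valid or supports the conclusion | Every material factual claim must link to at least one active source id; the citation audit checks source existence, source freshness, claim support and entitlement. |
| AI should be safe. | Control objective is not testable | AI must not execute customer-impacting actions; every tool call is governed by allowlist, role entitlement, dry-run validation and human approval. |
| AI should be monitored after release. | Monitoring target is unclear | Production telemetry must record requirement_id, eval_contract_id, model_version, prompt_version, kb_version, tool_name, decision_boundary, risk_tier and escalation_result. |
| AI output should be reviewed by humans. | Human review may become a formality | Before a high-risk output is saved to the system of record, reviewer_id, review_outcome, edit_diff_hash, approval_timestamp and escalation_reason must be recorded. |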

3.3 From CBAP Traceability to AI Traceability

This document does not repeat the basics of the CBAP requirements life cycle. The upgrade points are:

| CBAP traceability | AI traceability upgrade |
|---|---|
| Business requirement -> stakeholder requirement -> solution requirement | Business outcome -> AI behavior contract -> eval contract -> control objective -> production evidence |
| Requirement -> design -> test case | Requirement -> dataset slice -> metric -> threshold -> critical failure -> release gate |
| Change impact analysis | Prompt/model/RAG/tool/policy change -> impacted eval cases -> impacted controls -> impacted telemetry -> impacted evidence |
| Requirements status | Requirement state + eval state + control state + monitoring state + incident state |
| Acceptance criteria | Risk-tiered evaluation, hard blockers, residual risk acceptance, monitoring triggers |
| Traceability matrix | Queryable graph with version, owner, provenance, freshness and decision impact |

One-line interview answer:

Traditional traceability proves that a requirement was implemented and tested. AI traceability must also prove that behavior is evaluated, controlled, monitored, reviewed, changed and audited within the risk boundary. That is how CBAP skills carry over to AI systems.


4. Traceability Graph Nodes

More nodes do not make a better traceability graph; every node must answer a governance question. The taxonomy below suits financial retail AI projects.

| Node type | Core question | Minimum fields |
|---|---|---|
| Business Outcome | Which business or risk outcome improves? | outcome_id, baseline, target, owner, measurement_method, risk_constraint |
| Stakeholder Concern | Who is worried about what? | concern_id, stakeholder, concern, decision_needed, severity |
| AI Requirement | How must the AI behave? | requirement_id, requirement_type, allowed_behavior, forbidden_behavior, risk_tier, owner |
| Assumption | What does the requirement depend on? | assumption_id, assumption, validation_method, expiry_condition |
| Risk | What is the impact of failure? | risk_id, risk_event, impact, likelihood, severity, affected_party |
| Eval Question | What must the evaluation answer? | eval_question_id, question, linked_requirement, decision_use |
| Eval Case | Which samples provide proof? | eval_case_id, scenario, dataset_slice, expected_behavior, severity |
| Metric | Which signal drives the judgment? | metric_id, definition, denominator, threshold, hard_stop, slice |
| Eval Run | Which evaluation run produced the result? | eval_run_id, version_set, dataset_version, result, failed_cases, reviewer |
| Control Objective | To what state is the risk reduced? | control_objective_id, risk_id, objective, assurance_claim |
| Control Activity | How does the control actually operate? | control_activity_id, preventive_detective_corrective, system_rule, manual_step, owner |
| ADR | Why was the architecture chosen? | adr_id, decision, alternatives, control_impact, tradeoff, approval |
| Implementation Component | Which component implements the requirement? | component_id, service, workflow_step, interface, repository_or_system |
| Configuration | Which version is running? | config_id, model_version, prompt_version, kb_version, tool_schema_version, policy_version |
| Telemetry Signal | What is watched in production? | signal_id, trace_attribute, metric_or_log, threshold, sampling_rule, retention |
| Incident / Exception | Which failure or exception changed state? | incident_id, severity, linked_signal, affected_requirement, remediation, decision |
| Evidence Artifact | Which artifact provides proof? | evidence_id, artifact_type, source_system, version, generated_at, reviewer, retention |
| Decision Record | Who decided, and on what basis? | decision_id, decision_type, go_limited_no_go, approver, conditions, expiry |
| Owner / Agent | Who is accountable or who generated it? | agent_id, role, accountability, review_cadence |
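
The minimum-field rule in the taxonomy can be checked mechanically. The sketch below is one possible shape, not a prescribed implementation: the `Node` class, the `MIN_FIELDS` mapping and the type keys such as `ai_requirement` are hypothetical names, and only three node types are copied from the table to keep it short.

```python
from dataclasses import dataclass, field

# Minimum fields per node type, copied from the taxonomy table (subset only).
# The dictionary keys are hypothetical type labels chosen for this sketch.
MIN_FIELDS = {
    "ai_requirement": {"requirement_id", "requirement_type", "allowed_behavior",
                       "forbidden_behavior", "risk_tier", "owner"},
    "metric": {"metric_id", "definition", "denominator", "threshold",
               "hard_stop", "slice"},
    "evidence_artifact": {"evidence_id", "artifact_type", "source_system",
                          "version", "generated_at", "reviewer", "retention"},
}


@dataclass
class Node:
    node_type: str
    fields: dict = field(default_factory=dict)

    def missing_fields(self) -> set:
        """Return required field names that are absent or empty."""
        required = MIN_FIELDS.get(self.node_type, set())
        return {name for name in required if self.fields.get(name) in (None, "")}


req = Node("ai_requirement", {
    "requirement_id": "REQ-CS-014",
    "requirement_type": "behavior",
    "allowed_behavior": "draft answer citing active policy",
    "forbidden_behavior": "fee waiver commitment",
    "risk_tier": "high",
})
print(sorted(req.missing_fields()))  # ['owner']
```

A node with a non-empty result would stay in `draft` state until the gap is closed; that state rule is a design suggestion of this sketch, not something the taxonomy mandates.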

4.1 Node metadata standard

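Every high-value node carries at least:

| Metadata | Requirement |
|---|---|
| Unique ID | Referenceable across PRD, ADR, eval report, control matrix, telemetry and audit binder. |
| Version | Managed separately from system, model, prompt, knowledge base and policy versions. |
| Owner | Business owner, control owner, evidence owner and technical owner are kept distinct. |
| State | draft, approved, active, restricted, retired, superseded, failed, accepted_exception. |
| Effective window | Applicable dates, version window, release scope and review date. |
| Risk tier | low, medium, high or critical, with the rationale stated. |
| Evidence quality | freshness, coverage, independence, reproducibility, retention. |
| Change trigger | Which changes force the node to be re-reviewed or invalidate its evidence. |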

4.2 W3C PROV mapping

Thinking in PROV terms keeps the evidence chain from getting tangled:

| PROV concept | AI traceability object | Example |
|---|---|---|
| Entity | requirement, dataset, prompt, model, policy, eval report, release memo, log extract | REQ-CS-014, KB-RETAIL-POLICY-2026-06, EV-EVAL-2026-06-18 |
| Activity | elicitation, risk assessment, eval run, release review, incident triage, control retest | EVAL-RUN-2026-06-18, REL-GATE-2026-06-20 |
| Agent | product owner, BA, architect, model risk reviewer, compliance approver, service account | AI Product Owner, Model Risk Reviewer, EvalOps Pipeline |

In one sentence:

Evidence is credible when the graph can show which activity generated it, which entity it used, which agent approved it, and which decision it supported.
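
To make the provenance questions concrete, here is a minimal sketch that stores PROV-style statements as triples and answers three of them (generating activity, entities used, associated agent). The relation names `wasGeneratedBy`, `used` and `wasAssociatedWith` come from W3C PROV; the triple layout and the `provenance` helper are assumptions of this sketch, and the IDs are taken from the examples table above.

```python
# PROV-style statements as (subject, relation, object) triples.
# Storage as plain tuples is an assumption made for illustration only.
PROV = [
    ("EV-EVAL-2026-06-18", "wasGeneratedBy", "EVAL-RUN-2026-06-18"),
    ("EVAL-RUN-2026-06-18", "used", "REQ-CS-014"),
    ("EVAL-RUN-2026-06-18", "used", "KB-RETAIL-POLICY-2026-06"),
    ("EVAL-RUN-2026-06-18", "wasAssociatedWith", "EvalOps Pipeline"),
]


def provenance(evidence_id: str) -> dict:
    """Return the generating activity, the entities it used and its agents."""
    activities = [o for s, r, o in PROV
                  if s == evidence_id and r == "wasGeneratedBy"]
    used = [o for s, r, o in PROV if s in activities and r == "used"]
    agents = [o for s, r, o in PROV
              if s in activities and r == "wasAssociatedWith"]
    return {"generated_by": activities, "used": used, "agents": agents}


print(provenance("EV-EVAL-2026-06-18"))
```

Approval and the supported decision would need further statements (for example an approval activity and a decision entity); they are left out here to keep the example small.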


5. Traceability Graph Edges

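Edges carry the governance value of the graph. Nodes are only an inventory of objects; edges answer why they are related.

| Edge type | Meaning | Typical query |
|---|---|---|
| derives_from | A requirement comes from an outcome or stakeholder concern | Which requirements support this business outcome? |
| refines | A high-level requirement is refined into behavior, data, eval and control requirements | Which AI-specific sub-requirements does this PRD requirement have? |
| constrains | A policy, risk appetite or decision boundary limits a requirement | Which constraints prevent this AI from deciding automatically? |
| verifies | An eval case, metric or test verifies a requirement or control | Which evals prove this requirement? |
| mitigates | A control objective reduces a risk | Which controls reduce this risk? |
| implements | A component or config implements a requirement or control | Which service, prompt or tool schema implements this control? |
| decided_by | An ADR or release memo explains the choice | Why RAG rather than fine-tuning? |
| emits | A component produces a telemetry signal | Which production signals show this requirement is operating? |
| triggers | A signal triggers an incident, review, rollback or eval refresh | Which metrics stop the release when they cross a threshold? |
| supports | Evidence supports a claim, test, control or decision | Which release claim does this evidence support? |
| approved_by | A decision is approved by an agent | Who accepted the residual risk, and until when? |
| supersedes | A new version replaces an old one | Which old evidence is invalidated by the new prompt? |
| impacted_by | A change affects a requirement, eval, control, telemetry or evidence | Which gates does a model upgrade affect? |
| observed_in | A failure mode is observed in an incident or production trace | Which production failures are already in the regression dataset? |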

5.1 Edge cardinality rules

| Rule | Quality standard |
|---|---|
| Every high-risk AI requirement | Connects to at least one risk, one eval question, one metric, one control objective and one evidence artifact. |
| Every critical risk | Connects to at least one preventive control, one detective control, one incident response trigger and one release blocker. |
| Every release decision | Must connect to the eval run, control evidence, residual risk, approver and expiry/review condition. |
| Every production incident | Must link back to the affected requirement, affected config, telemetry signal, evidence update and regression eval case. |
| Every ADR | If it changes the model, RAG, tools, logging, fallback or human oversight, must connect to the impacted requirements and controls. |
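
The first cardinality rule lends itself to an automated check. The sketch below is a simplified reading of that rule: it treats "connects to" as a direct edge in either direction, and the function name `missing_links`, the type labels and the sample edges are hypothetical.

```python
# Node types a high-risk AI requirement must be linked to (first rule above).
REQUIRED_NEIGHBORS = {"risk", "eval_question", "metric",
                      "control_objective", "evidence_artifact"}


def missing_links(req_id: str, node_types: dict, edges: list) -> set:
    """Return required node types the requirement has no direct edge to.

    Edges are (source, edge_type, target) tuples; direction is ignored here,
    which is an assumption of this sketch rather than a rule from the text.
    """
    linked = set()
    for source, _edge_type, target in edges:
        if source == req_id:
            linked.add(node_types[target])
        elif target == req_id:
            linked.add(node_types[source])
    return REQUIRED_NEIGHBORS - linked


# Hypothetical sample data for illustration.
node_types = {
    "REQ-CS-014": "ai_requirement",
    "RISK-CS-006": "risk",
    "MET-CS-005": "metric",
    "CTRL-RAG-003": "control_objective",
}
edges = [
    ("REQ-CS-014", "mitigated_by", "CTRL-RAG-003"),
    ("RISK-CS-006", "constrains", "REQ-CS-014"),
    ("REQ-CS-014", "uses_metric", "MET-CS-005"),
]
print(sorted(missing_links("REQ-CS-014", node_types, edges)))
# ['eval_question', 'evidence_artifact']
```

A stricter variant could walk multi-hop paths (for example requirement -> eval case -> metric) instead of requiring a direct edge.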

5.2 Edge anti-patterns

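| Anti-pattern | Risk | Fix |
|---|---|---|
| Requirement links only to a Jira ticket | Risk and evidence cannot be demonstrated | Add eval, control, ADR, telemetry and evidence edges. |
| Eval report links only to the release memo | Cannot trace to a specific requirement | Link every key metric to requirement, risk, slice and threshold. |
| Control matrix links only to policy clauses | Controls are detached from system implementation | Link control activities to component, config, log field and operating evidence. |
| Incident lives only in the ops system | Failures do not improve requirements or evals | Link the incident to failed requirement, root cause, regression case and release condition. |
| Evidence is stored only by folder | Cannot be located during an inquiry | Link evidence to claim, control, version, owner, generation activity and retention. |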

6. End-to-End Chain: Outcome to Evidence

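The chain below is the core template of this document. It places requirements engineering, evaluation, controls, architecture, production operation and evidence governance in one graph.

Business outcome
-> Requirement
-> Eval
-> Control
-> ADR
-> Implementation / Config
-> Telemetry
-> Incident / Exception
-> Evidence

| Step | Artifact | Key question | Financial retail example |
|---|---|---|---|
| Business outcome | Outcome card | What are the business outcome and risk constraints? | Agent policy lookup time down 30%, unauthorized commitment at 0. |
| Requirement | AI requirement card | What may and may not the AI do? | AI may draft answers, must cite valid policy, and hands off to a human on complaint, fraud or credit-commitment intents. |
| Eval | Eval contract | How is release readiness proven? | Evaluate groundedness, escalation and forbidden commitment on fee, dispute, credit limit, complaint and fraud slices. |
| Control | Control matrix | Which controls reduce risk? | RAG source approval, forbidden action classifier, HITL escalation, QA sampling. |
| ADR | Architecture decision | Why this design? | Use RAG + policy registry rather than fine-tuning policy knowledge; tool actions are read-only. |
| Implementation / Config | Version registry | Which versions are in production? | model gpt-4.1-prod, prompt cs-p12, kb retail-policy-2026-06, policy fee-v8 |
| Telemetry | Trace/log/metric spec | How is it observed in production? | Every answer records requirement_id, citation_status, escalation_result, policy_version, risk_tier. |
| Incident / Exception | Incident record | How is a failure closed out? | One wrong fee commitment found; freeze the affected intent, update the policy guardrail, add it to the regression set. |
| Evidence | Evidence binder item | How is it proven to audit? | eval report, release memo, source registry, trace sample, incident RCA, regression pass report. |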

6.1 Release gate view

| Gate question | Graph query |
|---|---|
| Do all high-risk requirements have eval coverage? | Find requirements with risk_tier in high/critical and no verifies edge. |
| Are all critical failures closed or covered by a no-go decision? | Query failed eval cases where severity = critical and release_decision not in no-go / restricted / remediated. |
| Does production telemetry cover the release claims? | Query release claims without emitted telemetry signals or evidence retention rule. |
| Does a model upgrade affect control evidence? | From the model_version change, follow impacted_by edges to requirements, eval cases, controls and evidence artifacts. |
| Can it enter limited release? | Aggregate eval pass, control evidence quality, residual risk acceptance, monitoring readiness and stop rules. |

6.2 Audit view

| Examiner question | Graph path |
|---|---|
| Does this AI make final customer-impacting decisions? | question -> claim -> decision boundary requirement -> tool control -> ADR -> permission test -> audit log sample -> release memo. |
| How do you prove policy answers used valid sources? | question -> RAG requirement -> source registry control -> citation eval -> production trace sample -> evidence binder. |
| Who approved the residual risk? | question -> residual risk -> release decision -> approver agent -> decision record -> expiry condition. |
| Was remediation completed for a given incident? | question -> incident -> root cause -> remediation control -> regression eval -> production monitoring -> closure evidence. |

7. Graph Architecture and Operating Model

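A traceability graph does not have to start as a graph database. The mature approach is to establish ID, edge, owner, evidence and query discipline first, then decide on tooling.

7.1 Maturity levels

| Level | Form | Suitable stage | Risk |
|---|---|---|---|
| L1 Spreadsheet Matrix | Nodes and edges managed in Excel/Sheets/Markdown tables | Portfolio, PoC, single-use-case pilot | Prone to version drift; limited query capability. |
| L2 Linked Artifacts | PRD, ADR, eval report, control matrix and evidence index share unified IDs | Multi-team release | Needs strong documentation discipline and a reviewer mechanism. |
| L3 Metadata Registry | Use case inventory, model registry, prompt registry, dataset registry and evidence registry are linked | Multi-use-case governance | Needs a governance owner and data quality rules. |
| L4 Queryable Graph | Graph database or GRC/SDLC tool integration supporting impact analysis and audit queries | High-impact AI portfolio | Needs schema governance, access control and change management. |
| L5 Runtime-Connected Graph | OpenTelemetry, incident, CI/CD and eval pipeline write back to traceability automatically | Production-grade AI platform | Needs platform engineering investment and strong security controls. |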

7.2 Minimum viable schema

| Table | Purpose | Key fields |
|---|---|---|
| trace_nodes | Manages all nodes | node_id, node_type, title, owner, risk_tier, state, version, effective_from, effective_to |
| trace_edges | Manages relationships | source_node_id, edge_type, target_node_id, rationale, created_by, created_at, confidence |
| trace_evidence | Manages evidence | evidence_id, artifact_type, source_system, version, generated_by, reviewed_by, retention, quality_score |
| trace_decisions | Manages gates and risk acceptance | decision_id, decision_type, scope, result, conditions, approver, expiry, evidence_refs |
| trace_changes | Manages impact analysis | change_id, change_type, affected_versions, impacted_nodes, required_regression, decision |
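
As an illustration of how the first two tables might be declared, the sketch below creates `trace_nodes` and `trace_edges` in an in-memory SQLite database. Column names follow the table above; the column types, the `CHECK` on `risk_tier`, the foreign keys, the composite primary key and the `try_insert_edge` helper are assumptions added for this example.

```python
import sqlite3

DDL = """
CREATE TABLE trace_nodes (
    node_id        TEXT PRIMARY KEY,
    node_type      TEXT NOT NULL,
    title          TEXT NOT NULL,
    owner          TEXT NOT NULL,
    risk_tier      TEXT CHECK (risk_tier IN ('low', 'medium', 'high', 'critical')),
    state          TEXT NOT NULL DEFAULT 'draft',
    version        TEXT,
    effective_from TEXT,
    effective_to   TEXT
);
CREATE TABLE trace_edges (
    source_node_id TEXT NOT NULL REFERENCES trace_nodes(node_id),
    edge_type      TEXT NOT NULL,
    target_node_id TEXT NOT NULL REFERENCES trace_nodes(node_id),
    rationale      TEXT,
    created_by     TEXT,
    created_at     TEXT,
    confidence     REAL,
    PRIMARY KEY (source_node_id, edge_type, target_node_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)
conn.execute(
    "INSERT INTO trace_nodes (node_id, node_type, title, owner, risk_tier) "
    "VALUES (?, ?, ?, ?, ?)",
    ("REQ-CS-014", "requirement", "Active policy citation", "AI BA Lead", "high"),
)


def try_insert_edge(source: str, edge_type: str, target: str) -> bool:
    """Insert an edge; return False if it violates a constraint (e.g. unknown node)."""
    try:
        conn.execute(
            "INSERT INTO trace_edges (source_node_id, edge_type, target_node_id) "
            "VALUES (?, ?, ?)",
            (source, edge_type, target),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# The target node is not registered yet, so the dangling edge is rejected.
print(try_insert_edge("REQ-CS-014", "verifies", "EVAL-CS-021"))  # False
```

Rejecting dangling edges at write time is one way to keep an L1 or L2 matrix from drifting; a spreadsheet-based setup would need an equivalent lookup check.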

7.3 OpenTelemetry instrumentation discipline

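Production telemetry must link back to the graph; otherwise the operating evidence chain breaks.

Suggested core attributes:

| Attribute | Purpose |
|---|---|
| ai.use_case_id | Links to the AI inventory and business outcome. |
| ai.requirement_id | Links a production trace to a requirement. |
| ai.eval_contract_id | Shows whether the production behavior has a pre-release eval contract. |
| ai.risk_tier | Supports sampling, alerting and retention policy for high-risk scenarios. |
| ai.model_version | Links to the model registry and release evidence. |
| ai.prompt_version | Links to prompt changes and regression evals. |
| ai.kb_version | Links to the RAG source registry and citation audit. |
| ai.policy_version | Links to policy effective dates and answer validity. |
| ai.tool_name | Links to the tool allowlist and permission control. |
| ai.tool_decision | Records allowed, blocked, human_approval_required, executed. |
| ai.citation_status | Records supported, missing, stale, non_supporting, unauthorized. |
| ai.escalation_result | Records no_escalation, escalated_to_human, supervisor_review, blocked. |
| ai.output_hash | Supports replay while reducing exposure of sensitive plaintext. |
| ai.evidence_trace_id | Links a log sample to the evidence binder. |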
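
A lightweight validator can catch traces that would break the link back to the graph. The sketch below works on a plain attribute dictionary and does not call the OpenTelemetry SDK. The `ai.*` names are this playbook's own attribute names as listed above; which of them count as mandatory (`REQUIRED_ATTRIBUTES`), the helper name `attribute_problems` and the sample values, including the use case ID, are assumptions.

```python
# Attributes treated as mandatory on every trace (an assumption of this sketch).
REQUIRED_ATTRIBUTES = (
    "ai.use_case_id", "ai.requirement_id", "ai.eval_contract_id", "ai.risk_tier",
    "ai.model_version", "ai.prompt_version", "ai.kb_version", "ai.policy_version",
)

# Enumerated values copied from the attribute table.
ALLOWED_VALUES = {
    "ai.tool_decision": {"allowed", "blocked", "human_approval_required", "executed"},
    "ai.citation_status": {"supported", "missing", "stale", "non_supporting",
                           "unauthorized"},
    "ai.escalation_result": {"no_escalation", "escalated_to_human",
                             "supervisor_review", "blocked"},
}


def attribute_problems(attrs: dict) -> list:
    """List missing mandatory attributes and values outside the allowed sets."""
    problems = [f"missing {key}" for key in REQUIRED_ATTRIBUTES if not attrs.get(key)]
    for key, allowed in ALLOWED_VALUES.items():
        if key in attrs and attrs[key] not in allowed:
            problems.append(f"invalid {key}={attrs[key]}")
    return problems


sample = {
    "ai.use_case_id": "UC-CS-001",  # hypothetical ID
    "ai.requirement_id": "REQ-CS-014",
    "ai.eval_contract_id": "EVAL-CS-021",
    "ai.risk_tier": "high",
    "ai.model_version": "gpt-4.1-prod",
    "ai.prompt_version": "cs-p12",
    "ai.kb_version": "retail-policy-2026-06",
    "ai.citation_status": "expired",  # not in the allowed set
}
print(attribute_problems(sample))
# ['missing ai.policy_version', 'invalid ai.citation_status=expired']
```

The same check could run in a log completeness test before release and as a sampling job in production.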

7.4 RACI

| Role | Traceability responsibility |
|---|---|
| AI PM | Defines outcome, release decision, adoption metrics and the residual value/risk narrative. |
| AI BA | Builds requirement, stakeholder concern, workflow, decision boundary and the traceability graph. |
| Architect | Defines ADR, trust boundary, logging, data flow, tool permission, fallback and config lineage. |
| EvalOps Lead | Maintains eval contract, dataset slices, eval runs, metrics and failed-case regression. |
| Control Owner | Defines control objective, control activity, test method, frequency and failure condition. |
| Evidence Owner | Maintains evidence artifact, quality score, retention, reviewer and freshness. |
| Model Risk / Compliance | Reviews risk tier, evaluation sufficiency, control coverage, exception and release condition. |
| Internal Audit | Checks traceability completeness, evidence quality, operating effectiveness and management review trail. |

8. Financial Retail Case: Customer Service AI Policy Copilot

8.1 Use case boundary

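| Dimension | Scope |
|---|---|
| Use case | AI Policy Copilot for retail bank customer service agents, assisting with questions on credit card fees, dispute handling, account services and complaint escalation. |
| AI role | retrieve, summarize, draft, cite, classify high-risk intent. |
| AI does not | Send replies directly to customers, commit to fee waivers, approve credit, give legal conclusions or bypass supervisor escalation. |
| Users | Customer service agents, QA supervisors, knowledge base owner, product owner, compliance reviewer. |
| Data / knowledge | approved policy repository, fee schedule, dispute SOP, complaint escalation policy, account context with entitlement filtering. |
| Risk tier | High-impact assistive system, because it may affect customers' understanding of financial products, fee disputes, complaint escalation and customer rights. |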

8.2 Traceability graph sample

| Source node | Edge | Target node | Rationale |
|---|---|---|---|
| OUT-CS-001: Reduce policy lookup time by 30% without unauthorized commitment | derives_from | REQ-CS-014: AI answers must cite active policy and avoid fee waiver commitment | The efficiency goal must be bounded by customer-protection constraints. |
| REQ-CS-014 | verifies | EVAL-CS-021: Fee waiver and dispute response golden set | High-risk fee and dispute samples verify policy citation and forbidden commitments. |
| EVAL-CS-021 | uses_metric | MET-CS-005: Critical unauthorized commitment count = 0 | Any unauthorized commitment is a release blocker. |
| REQ-CS-014 | mitigated_by | CTRL-RAG-003: Approved source and citation control | Source registry and citation audit reduce the risk of wrong policy. |
| CTRL-RAG-003 | decided_by | ADR-CS-007: RAG over fine-tuning for policy freshness | Policy changes frequently, so source-level freshness and citability are required. |
| ADR-CS-007 | implemented_by | CMP-CS-RAG-02: Policy retriever with entitlement filter | The system enforces approved source, active status and entitlement filtering. |
| CMP-CS-RAG-02 | configured_by | CFG-CS-2026-06: kb retail-policy-2026-06, prompt cs-p12 | Pins the production versions. |
| CMP-CS-RAG-02 | emits | SIG-CS-009: citation_status by policy_version and intent | Production monitoring of whether citations are present, stale or non-supporting. |
| SIG-CS-009 | triggers | INC-CS-2026-0611: Stale fee policy citation incident | A stale fee policy citation was found in production. |
| INC-CS-2026-0611 | observed_in | EVAL-CS-REG-003: Regression case for stale fee source | The incident sample enters the regression set. |
| EVAL-CS-REG-003 | supports | EV-CS-037: Regression pass report after source registry fix | Proves the fix works. |
| EV-CS-037 | supports | DEC-CS-2026-0620: Limited release to 10% seats | Supports the limited (staged) release decision. |

8.3 Coverage matrix

| Requirement | Risk | Eval coverage | Control coverage | Telemetry coverage | Evidence | Gate impact |
|---|---|---|---|---|---|---|
| REQ-CS-014 active policy citation | Wrong or stale policy misleads customer | Fee/dispute/complaint golden set, stale-source red team | Approved source registry, citation audit | ai.citation_status, ai.policy_version, ai.kb_version | source registry, eval report, trace sample | critical wrong citation blocks release |
| REQ-CS-018 high-risk escalation | Complaint, fraud or legal-risk intent not escalated | High-risk intent classifier eval | Escalation SOP, supervisor queue | ai.escalation_result, intent risk label | escalation log, QA sample | under-escalation critical count = 0 |
| REQ-CS-022 no unauthorized commitment | AI promises fee waiver or credit outcome | Forbidden commitment red-team set | Response policy, forbidden action classifier | blocked commitment counter, QA tags | red-team report, blocked output logs | any confirmed occurrence = no-go or restricted release |
| REQ-CS-026 human review before customer send | AI draft sent without agent responsibility | Workflow walkthrough, UAT | UI requires agent final action and records edit diff | review action, edit diff hash, send actor | workflow log, review sample | missing review log blocks customer-facing rollout |
| REQ-CS-031 traceable production behavior | Cannot reconstruct output after complaint | Logging completeness test | OTel attribute standard, retention rule | trace id, output hash, version attributes | log completeness report | incomplete logs restrict release scope |

8.4 Release decision interpretation

| Signal | Result | Decision implication |
|---|---|---|
| Critical unauthorized commitment | 0 in release eval, 0 in pilot trace sample | Eligible for limited release if monitoring and stop rule are active. |
| Citation support | 98.7% overall, 100% on high-risk fee/dispute slices | Acceptable with weekly citation audit and source owner attestation. |
| High-risk escalation | 99.2% overall, one medium severity miss in low-impact intent | Limited release with updated classifier rule and QA sampling. |
| Log completeness | 99.8% required attributes present | Supports audit replay and complaint investigation. |
| Residual risk | Users may over-trust fluent drafts | Mitigated by UI disclosure, mandatory agent final action, QA sampling and training. |
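
The interpretation above treats critical failures as hard blockers rather than as inputs to an average. A minimal sketch of that ordering follows; the function name, the `remediate` outcome label and the 0.95 minimum pass rate are hypothetical values chosen for illustration and are not thresholds stated in this playbook.

```python
def release_decision(pass_rate: float, critical_failures: int,
                     monitoring_active: bool, stop_rule_active: bool,
                     min_pass_rate: float = 0.95) -> str:
    """Evaluate hard blockers first, then the aggregate score."""
    if critical_failures > 0:
        return "no-go"        # a critical failure is never offset by a high average
    if not (monitoring_active and stop_rule_active):
        return "no-go"        # limited release needs live monitoring and a stop rule
    if pass_rate < min_pass_rate:
        return "remediate"    # hypothetical label: fix and re-run the eval gate
    return "limited-go"


# A 99.2% pass rate does not rescue a release with one critical failure.
print(release_decision(0.992, 1, True, True))  # no-go
print(release_decision(0.987, 0, True, True))  # limited-go
```

In a real gate the outcome would be recorded as a decision node with approver, conditions and expiry rather than returned as a bare string.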

9. Financial Retail Case: AML Investigation Agent

9.1 Case graph

| Graph layer | AML example |
|---|---|
| Business outcome | Reduce evidence collection and narrative drafting time by 25% while keeping critical red flag omission at 0. |
| AI requirement | Agent may retrieve, summarize and draft narrative; it must not close alerts, change risk rating or submit SAR. |
| Eval | Historical alert set, red flag omission set, source grounding eval, policy conflict cases. |
| Control | Tool allowlist read/draft only, SAR workflow no AI write permission, L2 review before case record save. |
| ADR | Use RAG with case evidence and AML policy source ids; no autonomous disposition tool. |
| Implementation/config | case retriever, policy retriever, narrative prompt, tool registry, role entitlement filter. |
| Telemetry | red_flag_checklist_status, citation_status, reviewer_approval, edit_diff_hash, disposition_actor. |
| Incident | AI narrative omitted structuring pattern in a QA sample; case enters regression dataset. |
| Evidence | grounding eval, red flag eval, permission test, review log sample, QA finding closure. |

9.2 Audit-ready Q&A path

| Question | Answer | Evidence path |
|---|---|---|
| Will AI replace the analyst in AML disposition? | No. The agent can only retrieve, summarize and draft; disposition and SAR submission are still completed by authorized staff in the case system. | REQ-AML-010 -> CTRL-AGT-004 -> ADR-AML-003 -> permission matrix -> negative test -> case action log. |
| How do you prove the narrative did not fabricate facts? | Material factual statements must carry a case evidence or AML policy source id; release eval and QA sampling check for unsupported claims. | REQ-AML-014 -> EVAL-GRD-AML-02 -> MET-UNSUP-001 -> citation audit -> expert review sample. |
| What if AI misses a red flag? | The QA finding triggers an incident, the failed trace enters the regression set, and the eval gate is re-run after the prompt/RAG fix. | SIG-AML-RED-004 -> INC-AML-2026-07A -> EVAL-REG-AML-009 -> EV-RET-AML-011. |

10. Financial Retail Case: Credit Memo Copilot

10.1 Boundary

| Dimension | Scope |
|---|---|
| Use case | Small business credit memo copilot that drafts document gaps, policy checklist, risk factors and memo structure for the underwriter. |
| AI does not | Approve or decline, generate the final adverse action reason, infer protected class, or bypass underwriter accountability. |
| Risk focus | fair lending, explainability, data minimization, reason consistency, human decision boundary. |

10.2 Traceability example

| Requirement | Eval | Control | Evidence |
|---|---|---|---|
| AI must not infer or use protected class attributes. | Protected-attribute leakage and proxy reasoning red-team cases. | Feature exclusion, prompt policy, reviewer checklist, logging of input field set. | data field inventory, red-team report, review sample, log extract. |
| AI may draft memo but final credit decision remains human. | Workflow UAT verifies final decision actor and approval path. | Core lending system decision buttons unavailable to AI service account. | RBAC test, service account permission matrix, decision audit log. |
| AI risk factor statements must cite application data or policy. | Groundedness eval across thin-file, missing-data, conflicting-document slices. | Evidence citation requirement and missing-evidence response policy. | eval report, failed case analysis, policy source registry. |
| AI output must support adverse action consistency but not auto-generate notice. | Reason-code consistency eval and human review calibration. | Underwriter review, compliance sample, notice generation remains rule-controlled. | calibration note, compliance review, workflow sample. |

10.3 Senior interview point

In credit, I would treat AI as a memo and control assistant, not a decision engine, unless the institution has explicitly approved that automation boundary. My traceability graph would make this visible: every credit-impacting requirement links to eval slices, fair-lending controls, decision-system permissions, reviewer evidence and production telemetry.


11. Templates

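These templates use a "field + acceptable sample" format to avoid empty tables. Real projects can replace the samples with the institution's actual IDs and evidence numbers.

11.1 Traceability Graph Table

| Field | What to enter | Acceptable sample |
|---|---|---|
| Source Node ID | Starting node, using a stable ID | REQ-CS-014 |
| Source Node Type | requirement, eval, control, ADR, component, signal, incident, evidence | requirement |
| Edge Type | derives_from, verifies, mitigates, implements, emits, triggers, supports | verifies |
| Target Node ID | Ending node | EVAL-CS-021 |
| Target Node Type | Type of the ending node | eval_case |
| Rationale | Why the nodes are linked | The fee policy answer requirement is verified by high-risk fee/dispute samples. |
| Owner | Person responsible for maintaining the relationship | AI BA Lead |
| Evidence Ref | Material supporting the relationship | EV-CS-021-EVAL-MAP |
| Freshness Rule | When to re-review | Re-review when prompt, kb, fee policy or escalation policy changes. |
| Gate Impact | Effect on the release decision | critical slice failed -> no-go. |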

11.2 Coverage Matrix

| Requirement ID | Business outcome | Risk | Eval cases | Metrics | Controls | ADR / Component | Telemetry | Evidence | Coverage status |
|---|---|---|---|---|---|---|---|---|---|
| REQ-CS-014 | OUT-CS-001 | RISK-CS-006 wrong policy advice | EVAL-CS-021, EVAL-CS-REG-003 | MET-CS-005, MET-CIT-002 | CTRL-RAG-003 | ADR-CS-007, CMP-CS-RAG-02 | SIG-CS-009 | EV-CS-021, EV-CS-037 | Covered for limited release |
| REQ-CS-022 | OUT-CS-001 | RISK-CS-009 unauthorized commitment | EVAL-CS-033 | MET-CRIT-001 | CTRL-SAFE-006 | ADR-CS-010, CMP-CS-GUARD-01 | SIG-CS-014 | EV-CS-033, EV-CS-041 | Covered with hard blocker |
| REQ-CS-031 | OUT-CS-002 | RISK-AUD-002 cannot reconstruct answer | EVAL-LOG-004 | MET-LOG-001 | CTRL-LOG-002 | ADR-OBS-002, CMP-OTEL-01 | SIG-TRACE-001 | EV-LOG-004 | Covered with retention rule |

11.3 Evidence Query Examples

Evidence queries can first be expressed as SQL, graph queries, spreadsheet filters or GRC reports. What matters is that the query semantics are clear.

Query A: high-risk requirements without eval coverage

SELECT r.node_id, r.title, r.owner
FROM trace_nodes r
LEFT JOIN trace_edges e
  ON e.source_node_id = r.node_id
 AND e.edge_type = 'verifies'
WHERE r.node_type = 'requirement'
  AND r.risk_tier IN ('high', 'critical')
  AND e.target_node_id IS NULL;

Decision use: release readiness review. If the result is non-empty, the listed high-risk requirements must not enter production release.

Query B: evidence supporting a release claim

SELECT c.node_id AS claim_id,
       ctrl.node_id AS control_id,
       ev.evidence_id,
       ev.artifact_type,
       ev.version,
       ev.reviewed_by,
       ev.quality_score
FROM trace_nodes c
JOIN trace_edges e1 ON e1.source_node_id = c.node_id AND e1.edge_type = 'mitigated_by'
JOIN trace_nodes ctrl ON ctrl.node_id = e1.target_node_id
JOIN trace_edges e2 ON e2.source_node_id = ctrl.node_id AND e2.edge_type = 'supports'
JOIN trace_evidence ev ON ev.evidence_id = e2.target_node_id
WHERE c.node_id = 'CLAIM-CS-DECISION-BOUNDARY';

Decision use: audit response. The output is used to show that the AI does not make final customer-impacting decisions.

Query C: change impact from prompt version

SELECT impacted.target_node_id,
       impacted.edge_type,
       n.node_type,
       n.title,
       n.owner
FROM trace_edges changed
JOIN trace_edges impacted
  ON impacted.source_node_id = changed.target_node_id
JOIN trace_nodes n
  ON n.node_id = impacted.target_node_id
WHERE changed.source_node_id = 'CFG-PROMPT-CS-P12'
  AND changed.edge_type = 'impacted_by';

Decision use: change advisory board. A prompt change must list the affected requirements, evals, controls, telemetry and evidence.
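
Query C reaches nodes two hops from the changed configuration. When impact has to be followed across more hops, a recursive query is one option. The sketch below runs a `WITH RECURSIVE` query in an in-memory SQLite database; the sample edges, including the direction of the `impacted_by` row, and the `max_depth` guard are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trace_edges (source_node_id TEXT, edge_type TEXT, target_node_id TEXT)"
)
conn.executemany(
    "INSERT INTO trace_edges VALUES (?, ?, ?)",
    [
        # Hypothetical sample edges, written source -> target as in Query C.
        ("CFG-PROMPT-CS-P12", "impacted_by", "REQ-CS-014"),
        ("REQ-CS-014", "verifies", "EVAL-CS-021"),
        ("EVAL-CS-021", "uses_metric", "MET-CS-005"),
        ("REQ-CS-014", "mitigated_by", "CTRL-RAG-003"),
    ],
)

IMPACT_SQL = """
WITH RECURSIVE impacted(node_id, depth) AS (
    SELECT target_node_id, 1
    FROM trace_edges
    WHERE source_node_id = :start AND edge_type = 'impacted_by'
    UNION
    SELECT e.target_node_id, i.depth + 1
    FROM trace_edges e
    JOIN impacted i ON e.source_node_id = i.node_id
    WHERE i.depth < :max_depth          -- guard against cycles in the graph
)
SELECT node_id, MIN(depth) AS depth
FROM impacted
GROUP BY node_id
ORDER BY depth, node_id
"""

rows = conn.execute(
    IMPACT_SQL, {"start": "CFG-PROMPT-CS-P12", "max_depth": 5}
).fetchall()
for node_id, depth in rows:
    print(depth, node_id)
```

The depth column gives reviewers a rough ordering: depth 1 nodes are directly affected by the change, deeper nodes inherit the impact through intermediate edges.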

Query D: production incidents not yet in regression set

SELECT i.node_id, i.title, i.owner
FROM trace_nodes i
LEFT JOIN trace_edges e
  ON e.source_node_id = i.node_id
 AND e.edge_type = 'observed_in'
WHERE i.node_type = 'incident'
  AND i.risk_tier IN ('high', 'critical')
  AND e.target_node_id IS NULL;

Decision use: monitoring gate. High-risk production failures must enter the regression dataset or have a formal risk acceptance record.

11.4 Release Decision Memo

# Release Decision Memo: Retail Service AI Policy Copilot r18

## Scope
- Use case: Customer service policy copilot for credit card fee, dispute and account-service questions.
- Release stage: Limited release to 10% trained agents.
- AI role: retrieve, summarize, draft, cite, classify high-risk intents.
- Excluded actions: direct customer send, fee waiver commitment, credit approval, legal conclusion.
- Version set: model `gpt-4.1-prod`, prompt `cs-p12`, knowledge base `retail-policy-2026-06`, tool schema `read-only-v3`.

## Traceability Summary
| Area | Result |
|---|---|
| Requirements | 14 active high-risk requirements, all mapped to eval and controls. |
| Eval | 6 eval suites passed; critical unauthorized commitment count = 0. |
| Controls | 9 required controls active; citation audit and escalation queue are release blockers. |
| Architecture | ADR-CS-007 and ADR-CS-010 approved by architecture and compliance reviewers. |
| Telemetry | Required OpenTelemetry attributes present in 99.8% of pilot traces. |
| Evidence | 18 evidence artifacts indexed with owner, version, reviewer and retention. |

## Decision
Limited go for 10% trained agents for 30 calendar days.

## Conditions
- Customer self-service channel remains disabled.
- Weekly citation audit covers fee, dispute and complaint slices.
- Any confirmed unauthorized commitment triggers immediate stop rule.
- Prompt, knowledge base, tool schema or escalation policy change triggers regression eval.

## Residual Risk
Agents may over-trust fluent drafts. Mitigation: UI boundary disclosure, mandatory agent final action, edit-diff logging, QA sampling and targeted training.

## Approvals
Business Owner, Compliance Reviewer, Model Risk Reviewer, AI Product Owner, Chief Architect delegate and Operations Owner approved this limited release decision on 2026-06-20.

11.5 Audit Q&A

| Examiner question | Factual answer | Graph path | Evidence |
|---|---|---|---|
| Does the AI make final decisions affecting customers? | No. It drafts agent-facing responses only. Customer send action remains with trained human agents. | CLAIM-CS-DECISION-BOUNDARY -> REQ-CS-026 -> CTRL-HITL-002 -> ADR-CS-010 -> CMP-WORKFLOW-01 | workflow UAT, permission matrix, send-action audit log |
| How do you know answers use current policy? | The RAG layer only indexes active sources from the approved policy registry and emits citation status by policy version. | REQ-CS-014 -> CTRL-RAG-003 -> CMP-CS-RAG-02 -> SIG-CS-009 -> EV-CIT-2026-06 | source registry, index build log, citation audit |
| What happens after a wrong answer incident? | The incident is triaged, affected version is scoped, failed trace enters regression, fix is retested before release expansion. | INC-CS-2026-0611 -> EVAL-CS-REG-003 -> EV-CS-037 -> DEC-CS-2026-0620 | incident RCA, regression pass report, release condition |
| Who accepted the residual risk? | Business and risk owners accepted limited-release residual risk for 30 days with explicit stop rules. | RISK-CS-OVERTRUST -> DEC-CS-2026-0620 -> AGENT-BUS-OWNER / AGENT-RISK-REVIEWER | release memo, approval record |

11.6 Portfolio Evidence Pack

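| Portfolio asset | Content | Capability demonstrated |
|---|---|---|
| One-page executive narrative | Uses one financial retail AI use case to explain outcome, risk boundary, release decision and evidence thesis. | Executive communication and product judgment. |
| Traceability graph table | 20 to 40 key nodes/edges covering outcome -> evidence. | Advanced requirements engineering and systematic governance. |
| Coverage matrix | Coverage status from high-risk requirements to eval, controls, telemetry and evidence. | Release readiness judgment and gap management. |
| ADR pack | 3 to 5 key ADRs: RAG, tool permission, logging, fallback, HITL. | Architecture governance and tradeoff articulation. |
| Eval contract excerpt | dataset slices, metrics, thresholds, critical failures, release blockers. | AI acceptance and EvalOps capability. |
| Control evidence map | control objective, activity, test, evidence, owner, cadence. | Risk, compliance and audit language. |
| Telemetry spec | OpenTelemetry attributes, log retention, incident replay path. | Engineering delivery and production observability. |
| Audit Q&A | 8 to 12 regulatory/internal audit questions with evidence paths. | Audit evidence and inquiry response capability. |
| Interview answer pack | 30-second, 2-minute and CTO/CRO/Chief Architect versions. | Job-search conversion and cross-role communication. |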

12. Review Checklist

12.1 Requirement traceability

  • Does every high-risk AI requirement link to a business outcome and a stakeholder concern?
  • Does every requirement state its allowed behavior, forbidden behavior, decision boundary, and risk tier?
  • Does every behavior requirement have an eval question, dataset slice, metric, and threshold?
  • Is every critical failure an independent release blocker that an average score cannot offset?
  • Does every requirement link to a control objective and a control activity?
  • Does every requirement have a telemetry signal that supports production monitoring?
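
The requirement checklist above can be run as a gap check over each requirement record. The sketch below assumes requirements are stored as plain dictionaries; the field names in `REQUIRED_LINKS` and the example IDs are illustrative, not a fixed schema.

```python
from typing import Any, Dict, List

# Hypothetical field names, one per link named in the checklist above.
REQUIRED_LINKS = (
    "business_outcome", "stakeholder_concern",
    "allowed_behavior", "forbidden_behavior", "decision_boundary", "risk_tier",
    "eval_question", "dataset_slice", "metric", "threshold",
    "control_objective", "control_activity", "telemetry_signal",
)


def is_missing(value: Any) -> bool:
    """Treat None, empty string, and empty list as missing; 0 is a valid threshold."""
    return value is None or value == "" or value == []


def coverage_gaps(requirement: Dict[str, Any]) -> List[str]:
    """List the links the requirement record does not carry."""
    return [link for link in REQUIRED_LINKS if is_missing(requirement.get(link))]


requirement = {
    "id": "REQ-CS-014",
    "business_outcome": "OUT-CS-001",            # illustrative ID
    "stakeholder_concern": "policy accuracy",
    "allowed_behavior": "answer from active policy sources",
    "forbidden_behavior": "promise fee waivers",
    "decision_boundary": "advisory only",
    "risk_tier": "high",
    "eval_question": "Is the answer supported by a current citation?",
    "dataset_slice": "fee_questions",
    "metric": "citation_support_rate",
    "threshold": 0.95,
    "control_objective": "CTRL-RAG-003",
}
print(coverage_gaps(requirement))  # ['control_activity', 'telemetry_signal']
```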

12.2 Architecture traceability

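  • Does every key ADR state which requirements, controls, telemetry, and evidence it affects?
  • Are the model, prompt, RAG index, tool schema, policy source, and guardrail each versioned separately?
  • Does every tool action have an allowlist, permission, approval, and rollback path?
  • Can logging reconstruct the versions, sources, user actions, and human review involved?
  • Are fallback and stop rules backed by production drills or test evidence?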
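
The second checklist item, separate versions for each component, can be enforced by checking a release manifest before deployment. This is a minimal sketch, assuming a flat manifest dictionary; the component keys and version strings are hypothetical.

```python
from typing import Dict, List, Optional

# The six components the checklist asks to version separately (key names assumed).
VERSIONED_COMPONENTS = (
    "model", "prompt", "rag_index", "tool_schema", "policy_source", "guardrail",
)


def unversioned_components(manifest: Dict[str, Optional[str]]) -> List[str]:
    """Return the components whose version is absent or blank in the manifest."""
    missing = []
    for component in VERSIONED_COMPONENTS:
        version = manifest.get(component)
        if version is None or not version.strip():
            missing.append(component)
    return missing


manifest = {
    "model": "model-2026-05",
    "prompt": "prompt-v12",
    "rag_index": "idx-2026-06-15",
    "tool_schema": "",                       # blank counts as unversioned
    "policy_source": "policy-registry-2026-06",
}
print(unversioned_components(manifest))  # ['tool_schema', 'guardrail']
```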

12.3 Evidence traceability

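  • Is every release claim supported by at least one piece of direct evidence?
  • Does every piece of evidence have an owner, version, generated_at, reviewer, retention, and quality score?
  • Can evidence be traced to the activity that generated it and the agent that approved it?
  • Does the evidence cover the current production version rather than an old model or old prompt?
  • Does incident closure include root cause, remediation, regression result, and monitoring confirmation?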
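
Two of the evidence checks above, required metadata and coverage of the current production version, lend themselves to automation. The sketch below assumes each evidence record stores the versions it was produced against in a `tested_versions` field; that field, `retention_days`, and the example values are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Metadata the checklist requires on every evidence record (field names assumed).
REQUIRED_FIELDS = (
    "owner", "version", "generated_at", "reviewer", "retention_days", "quality_score",
)


def evidence_issues(evidence: Dict[str, Any], production: Dict[str, str]) -> List[str]:
    """Report missing metadata and any component tested at a non-production version."""
    issues = [f"missing {name}" for name in REQUIRED_FIELDS if evidence.get(name) is None]
    tested = evidence.get("tested_versions", {})
    for component, live_version in production.items():
        if tested.get(component) != live_version:
            issues.append(f"stale {component}")
    return issues


production = {"model": "model-2026-05", "prompt": "prompt-v12"}
evidence = {
    "evidence_id": "EV-CS-037",
    "owner": "evalops-lead",
    "version": "1",
    "generated_at": "2026-06-18T09:00:00Z",
    "reviewer": None,                         # not yet reviewed
    "retention_days": 2555,
    "quality_score": 0.9,
    "tested_versions": {"model": "model-2026-05", "prompt": "prompt-v11"},
}
print(evidence_issues(evidence, production))  # ['missing reviewer', 'stale prompt']
```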

12.4 Operating model traceability

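  • Does the business owner own the outcome and the residual risk narrative?
  • Does the control owner own the control activities and their operating cadence?
  • Does the evidence owner own evidence quality and refresh?
  • Does EvalOps own failed-case feedback and the regression dataset?
  • Can Model Risk, Compliance, and Internal Audit query the graph and get materials on a consistent basis?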

13. Common Failure Modes

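| Failure mode | Symptom | Fix |
| --- | --- | --- |
| Story-only requirements | The PRD reads like an ordinary SaaS feature. | Break AI behavior into decision boundary, eval, control, telemetry, and evidence requirements. |
| Eval disconnected from requirements | The eval report scores high, but nobody knows which requirement it proves. | Map every metric to a requirement, scenario slice, risk severity, and release decision. |
| Controls disconnected from architecture | The control matrix claims permission control, but the system has no logs to prove it. | Link control activities to component, config, permission test, and OTel attributes. |
| ADR without governance impact | The ADR covers only technology selection. | The ADR must state its impact on risk, controls, evidence, cost, monitoring, and rollback. |
| Telemetry afterthought | Only after launch does the team find it cannot explain an incident. | Define trace attributes, retention, sampling, and incident replay at the requirements stage. |
| Evidence folder chaos | There is plenty of material, but no path to it during an inquiry. | Manage evidence by evidence_id, graph path, owner, version, and quality score. |
| Change breaks evidence | The prompt or knowledge base changed, yet old evals still back the release. | A change triggers an impacted_by query and a regression eval. |
| Incident not feeding eval | The incident review stops at RCA. | The failed trace must enter a regression case linked to the fix and the release condition. |
| Human review theater | Documents say a human reviews, but logs cannot prove it. | Record reviewer, edit diff, approval action, escalation reason, and QA sample. |
| Average metric hides harm | Average groundedness passes while a high-risk slice fails. | Set a hard stop and a slice-level threshold for critical slices. |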
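
The last failure mode is easy to show with numbers. In this sketch, three slice scores average above the threshold while the critical slice fails, so a slice-level gate stops the release. The slice names, scores, and the 0.85 threshold are made up for illustration.

```python
from typing import Dict, Set


def slice_gate(scores: Dict[str, float], threshold: float,
               critical_slices: Set[str]) -> Dict[str, object]:
    """Evaluate each slice on its own; a failing critical slice is a hard stop."""
    average = sum(scores.values()) / len(scores)
    failed = sorted(name for name, score in scores.items() if score < threshold)
    hard_stops = [name for name in failed if name in critical_slices]
    return {
        "average_passes": average >= threshold,
        "failed_slices": failed,
        "hard_stop": bool(hard_stops),
    }


# Illustrative groundedness scores per dataset slice.
scores = {"faq": 0.97, "fee_questions": 0.95, "complaint_escalation": 0.70}
result = slice_gate(scores, threshold=0.85, critical_slices={"complaint_escalation"})
print(result)
# {'average_passes': True, 'failed_slices': ['complaint_escalation'], 'hard_stop': True}
```

The average (about 0.87) clears the bar, which is exactly why the gate reads `hard_stop` rather than `average_passes`.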

14. Interview Expressions

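Q1: How do you upgrade traditional requirements traceability into AI traceability?

30-second answer:

I extend traceability from "requirement to test case" to "business outcome to evidence". For an AI system, every high-risk requirement must trace to an eval contract, control objective, ADR, versioned configuration, production telemetry, incident loop, and audit evidence. This proves not only that the feature was built, but also that the AI's behavior is measured, controlled, and continuously governed within its risk boundary.

2-minute answer:

A traditional traceability matrix usually answers whether a requirement was designed and tested. An AI system must also account for probabilistic behavior, data sources, model versions, tool permissions, human review, and post-release drift. So I first define the business outcome and stakeholder concern, then write the AI requirement's allowed / forbidden behavior and decision boundary. Next I map the requirement to an eval question, dataset slice, metric, threshold, and critical failure. Each risk then maps to a control objective, control activity, and test evidence. On the architecture side, ADRs record the model, RAG, tool, logging, and fallback decisions; in production, OpenTelemetry attributes connect each trace to the requirement, model version, prompt version, kb version, and incident. Finally, an audit inquiry can be traced from the question to the claim, control, and evidence path.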

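Q2: Why is a user story not enough to manage AI requirements?

30-second answer:

A user story can express user intent, but not the AI's eval samples, failure severity, control activities, version boundaries, telemetry, or audit evidence. AI requirements must become a behavior contract and an evidence chain.

2-minute answer:

For example, "As a customer service agent, I want the AI to answer policy questions" is a useful entry point, but it does not tell us whether the policy source is valid, whether fee waivers may be promised, which complaints must be escalated, how a wrong answer gets detected, which version is in production, or how to run regression tests after an incident. My approach is to keep the story as one node in the graph, then extend it with a decision boundary, data/knowledge boundary, eval contract, control objective, ADR, telemetry, and evidence. That way the requirement can ship and can also be audited.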

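Q3: How do you design traceability for an AI release gate?

30-second answer:

I base the release gate on graph queries rather than meeting impressions. It must prove that high-risk requirements have eval coverage, critical failures are 0 or confined to a restricted scope, control evidence is valid, telemetry is ready, and residual risk has an owner and an expiry date.

2-minute answer:

Release gate inputs include the requirement coverage matrix, eval run report, failed case list, control evidence, ADR approval, telemetry readiness, incident response plan, and residual risk acceptance. For example, if a customer service AI makes an unauthorized commitment, the decision is no-go even when average accuracy is high. If the citation support rate meets its threshold on high-risk slices, log completeness meets its threshold, and human review is actually operating, the decision can be limited go. The key is that every conclusion traces to a graph path, for example requirement -> eval case -> metric -> control -> ADR -> trace sample -> evidence -> decision.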
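
The gate logic in this answer can be sketched as a small decision function. This is a simplification under stated assumptions: it distinguishes only no-go and limited go, treats any critical failure as blocking (it does not model the "restricted scope" exception), and the `GateInputs` field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    critical_failures: int            # e.g. unauthorized commitments found in eval
    high_risk_eval_coverage: float    # share of high-risk requirements with eval coverage
    control_evidence_valid: bool
    telemetry_ready: bool
    human_review_operating: bool
    residual_risk_accepted: bool      # owner and expiry date are on record


def release_decision(g: GateInputs) -> str:
    """Return 'no-go' or 'limited-go'; average accuracy is deliberately not an input."""
    if g.critical_failures > 0:
        return "no-go"
    if g.high_risk_eval_coverage < 1.0:
        return "no-go"
    if not (g.control_evidence_valid and g.telemetry_ready):
        return "no-go"
    if g.human_review_operating and g.residual_risk_accepted:
        return "limited-go"
    return "no-go"


ready = GateInputs(0, 1.0, True, True, True, True)
blocked = GateInputs(1, 1.0, True, True, True, True)
print(release_decision(ready))    # limited-go
print(release_decision(blocked))  # no-go
```

Keeping average accuracy out of the signature makes the rule from the answer structural: a high average cannot override a critical failure because the function never sees it.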

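Q4: How do you apply OpenTelemetry to AI governance?

30-second answer:

I treat OpenTelemetry as the production connection layer of the AI evidence graph. Every AI call carries use_case_id, requirement_id, eval_contract_id, model_version, prompt_version, kb_version, tool decision, citation status, and escalation result.

2-minute answer:

After launch, many AI projects cannot answer "which version, which policy, which tool call, and which user action did the error come from". So observability cannot stop at latency and cost. For financial retail AI, trace attributes must support audit replay and control verification. For example, a RAG answer records kb_version, policy_version, and citation_status; an agent tool call records tool_name, tool_decision, and approval result; human review records the review outcome and edit diff hash. An incident can then be linked from the production trace back to the requirement, eval, control, and evidence.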
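
The attribute set named in this answer can be assembled and validated before it is attached to a span. The sketch below uses only the standard library and does not import OpenTelemetry; the `ai.governance.` prefix is a hypothetical naming choice, not an OpenTelemetry semantic convention, and the example values are invented. With the OpenTelemetry Python API, a dictionary like this could be passed to a span's `set_attributes` method.

```python
import hashlib
from typing import Dict

# Attributes every AI call is expected to carry, per the answer above.
REQUIRED_ATTRIBUTES = (
    "use_case_id", "requirement_id", "eval_contract_id",
    "model_version", "prompt_version", "kb_version",
    "tool_decision", "citation_status", "escalation_result",
)


def edit_diff_hash(before: str, after: str) -> str:
    """Hash the reviewer's before/after text so the edit is provable without storing it."""
    return hashlib.sha256(f"{before}\x00{after}".encode("utf-8")).hexdigest()


def build_trace_attributes(**values: str) -> Dict[str, str]:
    """Fail fast if a governance attribute is missing, then namespace the keys."""
    missing = [name for name in REQUIRED_ATTRIBUTES if name not in values]
    if missing:
        raise ValueError(f"missing trace attributes: {missing}")
    return {f"ai.governance.{name}": str(value) for name, value in values.items()}


attrs = build_trace_attributes(
    use_case_id="UC-CS-POLICY", requirement_id="REQ-CS-014",
    eval_contract_id="EVAL-CS-REG-003", model_version="model-2026-05",
    prompt_version="prompt-v12", kb_version="kb-2026-06",
    tool_decision="denied", citation_status="supported",
    escalation_result="none",
    edit_diff_hash=edit_diff_hash("draft answer", "reviewed answer"),
)
print(attrs["ai.governance.kb_version"])  # kb-2026-06
```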

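Q5: What is the value of a traceability graph when facing regulators or internal audit?

30-second answer:

It lets the team stop scrambling for materials and instead trace directly from a regulatory question to the claim, requirement, control, test, evidence, owner, and decision record, and state which version and time window the evidence applies to.

2-minute answer:

Regulators and internal auditors rarely ask only "is there a test report". They ask whether the AI makes final decisions, how you prove data permissions were not expanded, whether policy answers are currently valid, how incidents were remediated, and who accepted the residual risk. A traceability graph turns these questions into paths. For example, "the AI does not make final credit decisions" links to the decision boundary requirement, tool permission control, ADR, RBAC test, workflow log, and release memo. This structure is more robust than a shared-drive folder because every piece of evidence has an owner, version, reviewer, retention, and the claim it supports.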

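Q6: How do you turn this capability into a portfolio?

30-second answer:

I pick one financial retail AI use case, such as a customer service policy copilot or an AML investigation agent, and build an outcome-to-evidence case pack: traceability graph, coverage matrix, eval contract excerpt, ADR, telemetry spec, audit Q&A, and release decision memo.

2-minute answer:

A portfolio should not contain only a PRD. The value of a senior AI PM/BA/Architect lies in working across business, requirements, architecture, risk, and audit. I show one complete chain: first a one-page executive narrative that states the business goal and risk boundary; then a traceability graph showing how each high-risk requirement is covered by eval, control, ADR, telemetry, and evidence; then a release decision memo explaining why the decision is limited go rather than full release; finally an audit Q&A that simulates a regulatory inquiry. The interviewer can then see that I understand more than requirements documents and can take an AI system to a governable launch.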


15. Final Memory Card

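| Concept | One-line summary |
| --- | --- |
| AI Traceability Graph | Connects outcome, requirement, eval, control, ADR, config, telemetry, incident, and evidence into a queryable governance graph. |
| User story limit | A user story expresses intent, but it is not enough to prove AI behavior is measurable, controllable, and auditable. |
| Eval linkage | Every high-risk AI requirement must have an eval question, dataset slice, metric, threshold, and critical failure rule. |
| Control linkage | Every key risk must have preventive, detective, or corrective controls that trace to tests and evidence. |
| Architecture linkage | ADRs must state how model, RAG, tool, logging, fallback, and HITL decisions affect control evidence. |
| Runtime linkage | OpenTelemetry attributes connect production behavior back to requirements, versions, controls, and incidents. |
| Evidence linkage | Evidence must have an owner, version, reviewer, freshness, retention, and the claim it supports. |
| Portfolio thesis | A senior BA's competitive edge in AI is upgrading requirements traceability into an evidence graph of system governability. |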

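The single most important sentence:

The goal of AI traceability is not to prove that the team wrote requirements, but to prove that the organization knows why this AI exists, how it is evaluated, which controls constrain it, which version it runs, how problems are reviewed when they occur, and what evidence supports accepting launch and ongoing operation.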