AI Semantic Layer / Metrics Architecture Playbook

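Audience: AI PMs, AI Architects, Data Product Managers, EvalOps Leads, AI Value Office Leads, and heads of data platforms in retail financial services.
Core question: When AI assistants, natural-language analytics, EvalOps, business dashboards, and executive portfolio governance all depend on "metrics", how do you ensure that every metric has a consistent definition, traceable lineage, a clear owner, quality SLOs, release gates, and risk controls?
Learning goal: Elevate metric definitions from "report field descriptions" to "product contracts", and design a semantic layer / metrics architecture that BI, LLMs, AI agents, eval harnesses, and governance processes can all consume.
Scope: This playbook does not cover basic BA requirements analysis and is not a tool tutorial. It focuses on architecture decisions, product governance, metric risk, platform selection, and an evidence package suitable for a portfolio.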

Source Anchors

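These sources serve as architecture and governance anchors; they do not constitute legal, regulatory, or vendor-selection advice. The table keeps only official entry points that are currently accessible, so that the portfolio can cite them over the long term.

Source	Official / primary source	How this playbook uses it
dbt Semantic Layer	https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl	Product and platform anchor for "the metric semantic layer exposes metrics as a unified consumption interface".
dbt MetricFlow	https://docs.getdbt.com/docs/build/about-metricflow	Design anchor for semantic models, metrics, dimensions, entities, SQL generation, and centralized definitions.
dbt Semantic Models	https://docs.getdbt.com/docs/build/semantic-models	Modeling anchor for entities, dimensions, measures, and the semantic graph.
OpenLineage	https://openlineage.io/docs/	Lineage anchor for datasets, jobs, runs, facets, and cross-tool lineage events.
OpenMetadata Data Contracts	https://docs.open-metadata.org/v1.12.x/how-to-guides/data-contracts	Data contract anchor for schema, semantics, data quality, SLA, and execution history.
W3C DCAT 3	https://www.w3.org/TR/vocab-dcat-3/	Catalog standard anchor for datasets, data services, catalogs, and metadata interoperability.
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize AI metric risk, eval, monitoring, and governance evidence.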

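1. Positioning: The Metric Semantic Layer Is the Fact Control Plane for AI Products

Traditional metric governance usually sits inside BI, data warehouse, or data governance teams. In the AI era that boundary is no longer sufficient, because the same set of metrics is consumed by more systems:

Executives use the AI Value Office dashboard to decide which AI use cases to keep funding, stop, or scale.
AI PMs use business metrics and eval metrics to show whether an agent actually improves a workflow.
LLM analytics assistants use natural language to generate queries, explain trends, and answer "why did this drop?".
EvalOps uses metric slices to judge model quality differences across regions, channels, customer segments, and risk tiers.
Risk, audit, and regulatory communication need visibility into metric definitions, lineage, change records, and release gates.

In one sentence: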

AI semantic layer is the governed contract between business meaning, data implementation, AI evaluation and executive decision-making.

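If metrics are not turned into contracts, AI systems amplify ambiguous definitions:

Common symptom	Risk once amplified by AI	Architectural root cause
The same KPI shows different numbers on different dashboards	The AI assistant gives contradictory business explanations	No unified semantic model or metric registry
Metric definitions live in report notes	LLM-generated SQL bypasses the definition	The metric is not a machine-executable contract
Metric lineage is unclear	After an incident, nobody can tell whether the fault lies in the source system, the transformation, permissions, or model interpretation	No dataset/job/run-level lineage
Metric owner is unclear	Definition changes go unapproved and AI eval gates stop working	Missing RACI and approval workflow
Only model scores are tracked	The AI system passes eval but business value does not improve	Eval metrics are not connected to workflow metrics and business metrics
Natural-language analytics queries the warehouse directly	Permission bypass, wrong joins, aggregation leakage, hallucinated SQL	Missing semantic API and LLM SQL guardrails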

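2. High-Level Framework: Metric Contract Stack

A metric architecture that works for AI products and governance is more than a semantic layer tool; it is a nine-layer contract stack.

Layer	Core object	Key question	Artifact
1. Business outcome	Business result	Which decision, process, or risk control does this metric serve?	Outcome map, decision memo
2. Metric contract	Metric definition	Are the formula, grain, time window, exclusions, and source-of-truth explicit?	Metric Contract
3. Semantic model	Semantic model	Are entities, dimensions, measures, metrics, and join paths stable?	Semantic Model Spec
4. Data contract	Input data contract	Are schema, semantics, quality, freshness, and SLA enforceable?	Data Contract
5. Lineage	Lineage	Can the impact be traced from source record to metric output?	Lineage Map
6. Consumption policy	Consumption policy	What uses are allowed for BI, AI, LLM SQL, eval, export, and API?	Consumption Policy
7. Eval linkage	Eval linkage	How do business metrics, workflow metrics, eval metrics, and feature metrics explain one another?	Eval-to-Business Matrix
8. Release gates	Release gates	Which metric changes block a model, dashboard, or natural-language analytics release?	Release Gate Memo
9. Observability	Operational observability	Are metric freshness, accuracy, drift, usage, and incidents continuously monitored?	Metric Quality Dashboard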

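2.1 Decision Principles

Principle	Meaning	Counterexample
Metric is a product contract	A metric definition has an owner, version, SLO, change process, and consumer list	The metric is just a piece of SQL copied into multiple dashboards
Semantic model before natural language	The LLM may only consume approved semantic objects rather than interpret raw tables directly	Letting the LLM guess the difference between `customer_id`, `client_id`, and `party_id`
Lineage before blame	On a metric anomaly, locate lineage and changes first, then judge model or business causes	Asking the data team to check SQL ad hoc whenever a dashboard fluctuates
Eval metric must map to business metric	Offline quality scores must explain process improvement or risk reduction	The judge score rises but complaint rate, AHT, and case defects do not change
Drift is a governance event	Metric definitions, data distributions, user behavior, and policy changes can all drift	Treating drift only as a model monitoring problem
AI consumption needs policy	Each metric must declare whether it may be explained by an LLM, used to generate SQL, exported, or used for training/eval	All metrics enter the chat assistant's context by default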

3. Core Objects: Do Not Conflate Metric, Measure, Feature, and Eval

In AI products the word "metric" is often stretched too far. In interviews and architecture reviews, state the object boundaries clearly.

Object	Definition	Examples	Owner	Main consumers
Entity	Business entity or join anchor	customer, account, case, complaint, loan_application	Domain data owner	semantic layer, BI, AI retrieval
Dimension	Slicing dimension	product_line, region, risk_tier, channel, complaint_category	Data steward	BI, eval slicing, AI explanation
Measure	Aggregatable base quantity	transaction_amount, case_count, handle_minutes	Data engineering / analytics	metrics, semantic model
Metric	Metric with a business definition	AML case cycle time, AI adoption rate, complaint escalation defect rate	Business metric owner	dashboard, AI assistant, Value Office
Feature	Model input signal	customer_txn_30d_count, income_verification_confidence	ML / risk analytics owner	model, routing, risk scoring
Eval metric	AI behavior quality metric	citation correctness, unsupported claim rate, policy adherence	EvalOps / AI PM	release gate, monitoring
Business metric	Business outcome metric	cost per case, first contact resolution, loss prevented	Business owner / finance	portfolio decision, funding gate
Guardrail metric	Safety / compliance threshold	PII leakage = 0, unauthorized decision = 0	Risk / compliance	release gate, incident

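3.1 Key Separations

Easily confused	How to separate them	Why it matters for decisions
Model accuracy vs business value	Accuracy is AI behavior quality; value depends on workflow metrics and finance sign-off	Prevents "the model got better but the business did not change"
Feature freshness vs metric freshness	Feature freshness affects model inputs; metric freshness affects business decisions	The two trigger different degradation actions
Metric definition vs SQL implementation	The definition is the product contract; SQL is the implementation	Change approval reviews the definition; regression tests check the implementation
Dashboard count vs official KPI	Dashboards can be exploratory; an official KPI needs a contract, owner, and audit	Prevents AI from treating exploratory metrics as official facts
Eval slice vs business segment	Eval slices exist to find model failures; business segments exist for business decisions	The names may look alike, but approval and interpretation differ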

4. Reference Architecture: AI Metrics Control Plane

flowchart TB
  subgraph Sources[Source Systems]
    CORE[Core banking / ledger]
    CRM[CRM / customer profile]
    AML[AML case system]
    COMPLAINT[Complaint / contact center]
    CREDIT[Credit underwriting]
  end

  subgraph Contracts[Data Contracts and Catalog]
    DC[Data contracts]
    CAT[Catalog / DCAT metadata]
    LIN[OpenLineage events]
  end

  subgraph DataLayer[Curated Data Layer]
    CUR[Curated marts]
    DQ[Quality checks]
    PII[PII / access policy]
  end

  subgraph Semantic[Semantic Layer]
    SM[Semantic models]
    MR[Metric registry]
    GL[Metric glossary]
    API[Metrics API / governed SQL]
  end

  subgraph AI[AI Consumption]
    NLA[Natural-language analytics]
    EVAL[Eval harness]
    AGENT[AI assistants / agents]
    VALUE[AI Value Office dashboard]
  end

  subgraph Ops[Governance and Observability]
    RACI[Ownership / RACI]
    GATE[Approval and release gates]
    OBS[Metric quality observability]
    INC[Incident loop]
  end

  Sources --> DC
  Sources --> LIN
  DC --> CUR
  CAT --> SM
  LIN --> SM
  CUR --> DQ
  DQ --> SM
  PII --> API
  SM --> MR
  MR --> GL
  MR --> API
  API --> NLA
  API --> EVAL
  API --> AGENT
  API --> VALUE
  RACI --> GATE
  GATE --> MR
  OBS --> INC
  OBS --> GATE
  LIN --> OBS

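4.1 Platform Capability Checklist

Capability	What it must provide	Signs of immaturity
Metric registry	Metric definition, owner, version, status, consumers, change history	Metrics are scattered across Confluence, SQL, and dashboard annotations
Semantic API	Queries by metric, dimension, time, and filter without exposing raw tables	The LLM generates arbitrary warehouse SQL directly
Data contract enforcement	Schema, semantics, quality, freshness, and SLA are testable	An upstream column rename is discovered only when dashboards and evals break
Lineage capture	source -> job -> dataset -> semantic model -> metric -> consumer	Lineage reaches tables only, not metrics or AI outputs
Glossary	Business definitions that are both human-readable and machine-searchable	Customer service, risk, and finance each interpret the same metric differently
Approval workflow	Full draft, review, release, deprecate, and incident flow	New metrics go live on a Slack approval
Metric observability	freshness, completeness, reconciliation, drift, usage, cost	Only pipeline success or failure is monitored
AI guardrails	Metric consumption policies for NL analytics, LLM SQL, RAG citation, and eval gates	AI can answer with unapproved metrics or sensitive slices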

5. Metric Definition as Product Contract

A metric contract is not a field dictionary. It must answer "who may make which decision based on this number, and who is responsible if the number is wrong".

5.1 Minimal Metric Contract

Contract field	Required content	Example: AML Evidence Completeness Rate
Metric name	Stable name and namespace	`aml.evidence_completeness_rate`
Business decision	The decision this metric supports	Decide whether AML Copilot can expand to more analyst teams
Definition	Business definition	Share of completed cases whose key evidence checklist is fully satisfied
Formula	Executable formula	`complete_cases / reviewed_cases`
Numerator	Numerator definition	Cases in `reviewed_cases` where every mandatory evidence item has a source record and has passed QA
Denominator	Denominator definition	Cases that completed analyst review and entered the QA sample within the reporting period
Grain	Reporting grain	case_id + review_date
Time window	Time basis	By review completion date, rolling weekly / monthly
Exclusions	Exclusions	legally restricted cases, migrated legacy cases, test cases
Dimensions	Allowed slices	jurisdiction, risk_tier, typology, analyst_team, AI_assisted_flag
Source-of-truth	Authoritative source	AML case system + evidence checklist service
Data contract	Input contract	case_id is 100% non-null; checklist version must match review_date
Quality SLO	Quality target	freshness P95 < 2 hours; source linkage accuracy >= 99%
AI consumption	How AI may use it	Allowed for dashboard, eval report, and Value Office summary; not allowed for automatic case closure
Release gate	Threshold	AI expansion gate fails when evidence completeness in high-risk jurisdictions is < 95%
Owner	Responsibility	AML operations accountable; financial crime data owner responsible
Change policy	Change rules	A change to mandatory items is a major version and requires risk approval
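
To make "machine-executable contract" concrete, here is a minimal Python sketch of a subset of the contract fields in the table above. The class name `MetricContract`, the helper `validate_contract`, the chosen subset of fields, and the validation rules are illustrative assumptions, not part of dbt, OpenMetadata, or any other tool named in this playbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricContract:
    """Hypothetical machine-readable subset of the minimal metric contract."""
    name: str
    business_decision: str
    formula: str
    grain: tuple
    owner: str
    version: str = "1.0.0"
    allowed_dimensions: frozenset = frozenset()
    allowed_ai_use: frozenset = frozenset()
    prohibited_ai_use: frozenset = frozenset()


def validate_contract(contract):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Assumed rule: names follow the domain.metric_name convention.
    if "." not in contract.name:
        problems.append("name must be namespaced, e.g. domain.metric_name")
    for field_name in ("business_decision", "formula", "owner"):
        if not getattr(contract, field_name).strip():
            problems.append(f"{field_name} is empty")
    if not contract.grain:
        problems.append("grain is empty")
    # A use cannot be both allowed and prohibited for AI consumers.
    overlap = contract.allowed_ai_use & contract.prohibited_ai_use
    if overlap:
        problems.append(f"use both allowed and prohibited: {sorted(overlap)}")
    return problems


evidence_completeness = MetricContract(
    name="aml.evidence_completeness_rate",
    business_decision="Expand AML Copilot to more analyst teams",
    formula="complete_cases / reviewed_cases",
    grain=("case_id", "review_date"),
    owner="AML operations",
    allowed_dimensions=frozenset(
        {"jurisdiction", "risk_tier", "typology", "analyst_team", "AI_assisted_flag"}
    ),
    allowed_ai_use=frozenset({"dashboard", "eval_report", "value_office_summary"}),
    prohibited_ai_use=frozenset({"automatic_case_closure"}),
)

print(validate_contract(evidence_completeness))
```

A registry could run a check like this in CI before a contract moves from draft to review; a real contract would carry the remaining fields (exclusions, SLOs, release gate, change policy) as well.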

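5.2 What a Metric Contract Adds Beyond an Ordinary KPI Definition

Added capability	Why AI scenarios require it	Evidence
Machine-readable formula	LLMs and eval harnesses need stable queries rather than natural-language guesses	semantic model / MetricFlow spec
Consumption policy	Different AI scenarios carry different risks	allowed AI use, prohibited use
Lineage	Locate the blast radius after an error	OpenLineage events, consumer map
Release gate	A failing metric must block release or expansion	gate memo, quality dashboard
Drift policy	Definitions and distributions change over time	drift signal, review cadence
RACI	Metrics are not the data team's responsibility alone	ownership table, approval history
Incident playbook	Metric errors affect AI decisions	severity, rollback, communication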

6. Semantic Model Design Patterns

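6.1 Core Elements of a Semantic Model

Element	Design focus	Retail finance examples
Entity	Fix the join anchor and uniqueness	`customer`, `account`, `transaction`, `case`, `complaint`, `loan_application`
Measure	Keep it simple, aggregatable, and low in semantic dispute	`case_count`, `handle_minutes`, `dispute_amount`
Dimension	Govern high-value slices first	`risk_tier`, `channel`, `product_line`, `jurisdiction`
Metric	Define complex logic through a business contract	`first_contact_resolution_rate`, `aml_false_positive_reduction`
Time spine	Unify time windows and calendars	fiscal month, business day, regulatory reporting period
Join path	Control many-to-many joins and double counting	customer-account-transaction, complaint-contact-case
Access policy	Metric-, dimension-, row-, and column-level policies	High-net-worth customers, sensitive regions, data on minors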

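6.2 Semantic Layer Architecture Options

Option	Best fit	Strengths	Risks	Recommendation
Central semantic layer	Enterprise-wide KPIs, AI Value Office, regulatory metrics	Strong consistency; simple AI consumption	The central team becomes a bottleneck	Use for official metrics and high-risk AI scenarios
Domain-owned semantic models	Fast iteration across multiple business domains	Close to business owners; accurate domain semantics	Naming and dimensions may fragment	Constrain with federated governance and a shared glossary
BI-tool semantic model	Consumption is mainly in a single BI tool	Fast to deliver	Weak for AI, eval, and API consumption	Not suitable as the sole enterprise AI semantic layer
Custom metric API	Heavy customization, complex permissions, low latency	Highly controllable	High maintenance cost; reinvents the wheel	Use only for regulatory, real-time risk, or special SLA scenarios
Hybrid	The enterprise norm	Official metrics centralized, domain metrics distributed	Requires strong governance	Recommended: central registry + domain semantic model + shared contracts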

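6.3 Design Decision Table

Decision	Recommended default	Exception
Metric naming	`domain.metric_name`	Cross-domain metrics use `enterprise.metric_name`
Time basis	Each metric contract explicitly declares its time basis (event date, posted date, or review date)	Regulatory reporting uses the regulatory reporting date
Status filter	Only business-terminal states count toward official metrics	Operational process monitoring may use in-progress metrics
Dimension authorization	Low-risk dimensions are queryable by default; highly sensitive dimensions need a policy	customer-level, protected attribute, small cell size
Experimental metrics	`experimental` status; excluded from the default AI answer context	May enter pilot eval with owner approval
Deprecation	At least one reporting cycle of notice, with a replacement provided	An erroneous metric may be withdrawn immediately with a published incident note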

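7. Metric Lineage: The Evidence Chain from Source Records to AI Output

Metric lineage cannot stop at the table level. AI scenarios need to answer:

Which source records affected this metric?
Which data contract, pipeline job, semantic model version, and metric formula took part in the calculation?
Which dashboards, agents, eval reports, and natural-language answers used this metric?
Which release gates, portfolio decisions, and regulatory reports would a definition change affect?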

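7.1 Lineage Granularity

Granularity	What is recorded	Use
Source record	Original business record, update time, system version	Fact disputes, audit sampling
Dataset	table, view, file, topic, index	Data quality, impact analysis
Job / run	Transformation job, run instance, parameters, status	pipeline incidents, replay
Semantic model	entity, dimension, measure, join path	Definition consistency, join risk
Metric	formula, version, approval status	KPI governance, AI answer grounding
Consumer	dashboard, API, LLM assistant, eval report	blast radius, deprecation notices
Decision	release gate, funding memo, risk acceptance	Executive and audit evidence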

7.2 Metric Lineage Map Example

Lineage node	Example	Key metadata
Source system	AML case management	system owner, SLA, access class
Source table	`aml_cases`	schema version, contract version
Pipeline job	`curate_aml_case_daily`	run_id, code_version, input/output datasets
Curated dataset	`mart_aml_case_review`	freshness, completeness, row count anomaly
Semantic model	`semantic_aml_case`	entities: case, customer; dimensions: typology, risk_tier
Metric	`aml.case_cycle_time_p50`	formula, grain, filters, version
Eval report	AML Copilot release gate	threshold, failed slices, owner
Dashboard	AI Value Office dashboard	refresh cadence, executive owner
AI answer	Monthly portfolio summary	metric citations, query id, prompt version
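
As a rough illustration of how a lineage map supports blast-radius questions, the sketch below stores the example nodes as a directed graph and walks it breadth-first. The node names come from the table above; the specific edges (for example, that the monthly summary reads from the dashboard) and the function name `downstream` are assumptions made for this example, not an OpenLineage API.

```python
from collections import deque

# Assumed edges between the example lineage nodes: upstream -> direct consumers.
LINEAGE_EDGES = {
    "aml_cases": ["curate_aml_case_daily"],
    "curate_aml_case_daily": ["mart_aml_case_review"],
    "mart_aml_case_review": ["semantic_aml_case"],
    "semantic_aml_case": ["aml.case_cycle_time_p50"],
    "aml.case_cycle_time_p50": [
        "AML Copilot release gate",
        "AI Value Office dashboard",
    ],
    "AI Value Office dashboard": ["Monthly portfolio summary"],
}


def downstream(node, edges):
    """Return every node reachable from `node`, in breadth-first order."""
    reached = []
    visited = set()
    queue = deque(edges.get(node, []))
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        reached.append(current)
        queue.extend(edges.get(current, []))
    return reached


# Blast radius if the curated dataset breaks its contract.
for consumer in downstream("mart_aml_case_review", LINEAGE_EDGES):
    print(consumer)
```

In practice the edges would be derived from captured lineage events rather than written by hand; the traversal itself stays the same.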

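7.3 Impact Analysis Gate

Every major-version change to a metric must run an impact analysis:

Check	Pass criterion	Fail action
Consumer inventory	All consumers have been identified	Block the change release
Backtest delta	Differences between the old and new definitions are explained across the main dimensions	Mark as breaking change
Eval dependency	Release gates that depend on the metric have been updated	Freeze related model releases
Executive decision dependency	The most recent funding / scale memo has been flagged with the definition change	Re-sign the value judgment
Regulatory dependency	Regulatory or audit metrics are unaffected, or the change is approved	Enter risk review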
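
The gate above can be sketched as a small function that maps failed checks to their fail actions. The check keys, the `impact_gate` name, and the choice to treat a missing check result as a failure are assumptions for illustration.

```python
# Check name -> fail action, mirroring the impact analysis table.
IMPACT_CHECKS = [
    ("consumer_inventory", "block the change release"),
    ("backtest_delta", "mark as breaking change"),
    ("eval_dependency", "freeze related model releases"),
    ("executive_decision_dependency", "re-sign the value judgment"),
    ("regulatory_dependency", "enter risk review"),
]


def impact_gate(results):
    """Evaluate check results (name -> bool) for a major-version change.

    Assumption: a check with no recorded result counts as failed, so an
    incomplete analysis can never pass the gate.
    """
    fail_actions = [
        action for check, action in IMPACT_CHECKS if not results.get(check, False)
    ]
    return {"passed": not fail_actions, "fail_actions": fail_actions}


outcome = impact_gate(
    {
        "consumer_inventory": True,
        "backtest_delta": True,
        "eval_dependency": False,
        "executive_decision_dependency": True,
        # regulatory_dependency was not assessed
    }
)
print(outcome)
```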

8. Feature / Eval / Business Metric Relationship

The most common mistake AI PMs and Architects make is treating eval metrics as proof of business value. The correct approach is to build a metric ladder.

Business outcome
-> Workflow metric
-> AI behavior metric
-> Data / feature metric
-> Platform metric
-> Guardrail metric

8.1 Metric Ladder Examples

Scenario	Business outcome	Workflow metric	AI eval metric	Data / feature metric	Platform metric	Guardrail
AML Copilot	Lower handling cost per high-quality case	case review cycle time, QA defect rate	evidence recall, unsupported claim rate, typology coverage	transaction freshness, alert-case linkage accuracy	retrieval latency, judge cost	SAR final decision by AI = 0
Customer complaint AI	Lower compliance complaint rate and higher first-contact resolution	first contact resolution, complaint reopen rate	policy citation correctness, tone compliance, escalation correctness	approved policy freshness, customer context completeness	p95 latency, fallback rate	customer harm / false promise = 0
Credit assistant	More consistent memos and more efficient manual approval	analyst memo turnaround, exception review time	policy adherence, reason code consistency, decision boundary compliance	income verification completeness, policy version match	route-to-human rate, cost per memo	automated credit decision = 0
AI Value Office	Funding flows to high-value use cases with low risk of loss of control	scale/stop decision cycle time, benefit sign-off rate	release evidence completeness, risk gate pass rate	metric contract coverage, baseline quality	dashboard freshness, metric API uptime	unapproved KPI in exec summary = 0

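8.2 Interpretation Chain

If you see	Do not directly infer	Check instead
eval score improves	Business value necessarily improves	adoption, workflow metric, business baseline, sample coverage
AHT decreases	Customer experience has improved	reopen, complaint, wrong answer, escalation defect
false positives decrease	Risk is lower	missed suspicious patterns, QA defect, regulatory feedback
adoption rises	The product is successful	override rate, quality trend, mandatory-use policy
cost per request decreases	Unit economics have improved	successful task rate, fallback, human review cost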

9. Metric Drift: Drift Is a Governance Event

AI metric drift is not a standalone ML monitoring problem. It can come from the business, data, process, policy, model, user behavior, or the metric definition itself.

Drift type	Description	Signals	Response
Definition drift	Metric definition or exclusion rules change	contract version delta, backtest gap	major version, owner approval, consumer notice
Data drift	Input data distribution changes	null rate, range, category distribution, row count anomaly	data incident, pipeline check, source owner review
Source drift	Upstream field or status semantics change	data contract breach, schema diff	block release, update contract, run backfill
Process drift	Manual process or business rules change	handling time step change, new status path	workflow review, metric contract update
Population drift	Customer segment, channel, or product mix changes	segment share shift	slice-level eval, business explanation
Policy drift	Policy, regulation, or product terms change	effective date mismatch, policy version conflict	knowledge and metric re-index, release gate rerun
Model behavior drift	AI output behavior changes	judge score trend, override reason shift	eval refresh, prompt/model rollback
User behavior drift	Users start bypassing or over-relying on AI	adoption spike, manual review drop, override decline	HITL audit, training, UI control

9.1 Metric Drift Triage

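Triage question	Decision logic	Owner
Did the number really change?	Check pipeline, contract, freshness, and reconciliation first	Data owner
Did the definition change?	Check metric version, semantic model, filters, and dimensions	Metric owner
Did the business change?	Check policy, process, customer segments, channels, and campaigns	Business owner
Did AI cause the change?	Check AI-assisted flag, adoption, override, and eval trend	AI PM / EvalOps
Does the change affect decisions?	Check release gate, funding memo, and risk threshold	AI Value Office / Risk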

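10. Natural-Language Analytics Risks

Natural-language analytics is not "friendlier BI". In retail finance it is an AI product that generates queries, explains metrics, and influences decisions.

10.1 Main Risks

Risk	Typical manifestation	Consequence
Metric ambiguity	"Active users" is interpreted under three different definitions: login, transaction, and AI usage	Executive decisions rest on the wrong KPI
Unapproved metric	The LLM improvises a metric from raw tables	An unofficial metric is treated as enterprise fact
Wrong grain	customer-level and account-level figures are mixed	Double counting or omissions
Time misinterpretation	posted date, event date, and review date are confused	Wrong trend explanations
Join hallucination	The LLM guesses the join path	Severely skewed numbers
Row-level security bypass	Generated SQL bypasses entitlements	Data leakage
Small-cell disclosure	Overly fine slices expose sensitive groups	Privacy and compliance risk
Prompt injection	A user or document induces the model to ignore policy	Unauthorized queries or wrong explanations
Cost explosion	Full scans of wide tables, unbounded group-by dimensions	Platform cost runs out of control
False causality	Correlation is explained as causation	Wrong business actions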

10.2 LLM-Generated SQL Guardrails

Guardrail	Design requirement	Release evidence
Semantic-only query	The LLM may only call the metric API or approved semantic objects	raw table access disabled for assistant
Intent disambiguation	The assistant must ask a clarifying question when the metric name, time, dimensions, or filters are unclear	ambiguity eval pass >= 95%
SQL AST validation	Parse the generated SQL; restrict it to SELECT, approved functions, and approved joins	AST policy test suite
Row/column policy	Enforce RLS/CLS and small-cell suppression both before and after the query	entitlement test 100% pass
Query plan check	Limit scan volume, join count, time range, and group cardinality	cost guardrail dashboard
Metric citation	Answers must cite the metric contract version and query id	citation coverage >= 98%
Result sanity check	Compare against historical ranges, totals, and reconciliation metrics	anomaly block / warning
Explanation policy	Distinguish "data observation", "possible cause", and "verified cause"	causality language rubric
Saved answer review	Executive summaries and regulatory-sensitive explanations go to human review	executive summary review log
Audit trace	Store prompt, semantic objects, query, result hash, and response	trace retention policy
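
Three of these guardrails (semantic-only query, intent disambiguation, and small-cell suppression) can be sketched without any SQL at all, by having the assistant emit a structured metric request that is checked against an approved registry. Everything named here is hypothetical: the `APPROVED` registry, the `validate_request` and `suppress_small_cells` helpers, and the minimum cell size of 10 are assumptions for illustration, not a product API or a regulatory threshold.

```python
# Hypothetical registry: approved metric -> dimensions the assistant may slice by.
APPROVED = {
    "aml.case_review_cycle_time_p50": {"risk_tier", "typology", "AI_assisted_flag"},
}
MIN_CELL_SIZE = 10  # assumed suppression threshold


def validate_request(metric, dimensions, time_range):
    """Return (decision, reason) for a structured metric request."""
    if metric not in APPROVED:
        return ("reject", "unapproved metric")
    not_allowed = set(dimensions) - APPROVED[metric]
    if not_allowed:
        return ("reject", f"dimensions not allowed: {sorted(not_allowed)}")
    if time_range is None:
        # Ambiguous intent: ask the user instead of guessing a window.
        return ("clarify", "time range missing")
    return ("allow", "")


def suppress_small_cells(rows, min_size=MIN_CELL_SIZE):
    """Blank out values for slices backed by fewer than `min_size` records."""
    output = []
    for row in rows:
        if row["n"] < min_size:
            output.append(dict(row, value=None, suppressed=True))
        else:
            output.append(dict(row, suppressed=False))
    return output


print(validate_request("aml.case_review_cycle_time_p50", ["risk_tier"], "2024-01"))
print(validate_request("aml.case_review_cycle_time_p50", ["customer_id"], "2024-01"))
print(suppress_small_cells([{"slice": "high", "n": 3, "value": 41.0}]))
```

The design choice is that the LLM never writes SQL against raw tables; it fills in a request, and deterministic code decides whether that request may run.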

10.3 Natural-Language Analytics Release Gate

Gate	Threshold	On failure
Unauthorized metric usage	0	Block the release; tighten the metric retrieval scope
RLS / CLS breach	0	Disable the assistant; open a security incident
Ambiguous question answered without clarification	<= 2%	Adjust the intent classifier and prompt
Wrong metric definition citation	0 high-risk cases	Fix glossary retrieval and citation
Unsafe causality claim	0 executive / regulatory summaries	Add a causal language guardrail
Cost per successful query	Within the budget threshold	Optimize the query plan or restrict the query scope
User trust defect rate	Below the agreed threshold in QA sampling	Narrow the beta scope; add training
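
A minimal sketch of how the numeric gates above might be evaluated in a release pipeline follows. Only the five gates with explicit numeric thresholds are encoded; the key names, the `evaluate_gates` function, and the rule that an unmeasured gate blocks release are assumptions. The budget and trust thresholds are left out because the table does not fix their values.

```python
# Maximum allowed value per gate, taken from the thresholds in the table.
GATE_LIMITS = {
    "unauthorized_metric_usage": 0,
    "rls_cls_breach": 0,
    "ambiguous_answered_without_clarification_rate": 0.02,
    "wrong_definition_citation_high_risk": 0,
    "unsafe_causality_claim_exec_summaries": 0,
}


def evaluate_gates(observed):
    """Compare observed eval results with the limits.

    Assumption: a gate with no observation blocks the release, because an
    unmeasured gate cannot be shown to pass.
    """
    failures = {
        gate: value
        for gate, value in observed.items()
        if gate in GATE_LIMITS and value > GATE_LIMITS[gate]
    }
    missing = sorted(set(GATE_LIMITS) - set(observed))
    return {
        "release": not failures and not missing,
        "failures": failures,
        "missing": missing,
    }


result = evaluate_gates(
    {
        "unauthorized_metric_usage": 0,
        "rls_cls_breach": 0,
        "ambiguous_answered_without_clarification_rate": 0.035,
        "wrong_definition_citation_high_risk": 0,
        "unsafe_causality_claim_exec_summaries": 0,
    }
)
print(result)
```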

11. Metrics Governance Operating Model

11.1 RACI

R = Responsible, A = Accountable, C = Consulted, I = Informed.

Activity	Business Metric Owner	Data Owner	Semantic Layer Steward	AI PM	Data Product Manager	EvalOps	Architect	Risk / Compliance	Finance
Define business decision	A/R	C	I	R	C	C	C	C	C
Approve metric contract	A	C	R	R	C	C	C	C	C
Define semantic model	C	C	A/R	C	R	I	R	I	I
Approve source-of-truth	C	A/R	C	I	R	I	C	C	I
Enforce data contract	C	A/R	C	I	R	I	R	C	I
Define eval linkage	C	C	C	A/R	C	R	C	C	I
Approve AI consumption policy	C	C	R	A/R	C	C	R	A/R	I
Set release gates	A	C	C	R	C	A/R	C	A/R	C
Monitor metric quality	C	R	A/R	C	R	C	C	I	I
Run metric incident review	A	R	R	R	C	C	C	C	I
Approve value reporting	C	C	I	R	C	C	I	I	A/R

11.2 Metric Approval Workflow

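Stage	Entry condition	Core checks	Exit evidence
Draft	A clear business decision and owner exist	Whether a similar metric already exists; whether official status is needed	draft contract
Design review	Formula, grain, dimensions, and time basis are complete	source-of-truth, semantic model, access policy	review notes
Data contract check	Input data sources are confirmed	schema, semantics, freshness, quality, SLA	data contract version
Backtest	Historical data can be computed	Reconciliation with existing metrics; explanation of anomalies	backtest report
AI consumption review	The metric will be used by AI / NL analytics / eval	allowed use, prohibited use, guardrails	consumption policy
Release gate	Quality and risk meet the bar	SLO, lineage, RACI, monitoring	release memo
Operate	The metric enters production	freshness, drift, usage, incident	quality dashboard
Deprecate	A replacement metric exists or the metric is withdrawn for risk	consumer impact, migration, communication	deprecation record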

11.3 Release Gate Tiers

Metric tier	Examples	Gate strength
T0 Critical	Regulatory reporting, credit decision boundaries, customer funds, AI risk gate	risk/compliance approval, zero critical defect, lineage complete
T1 Executive	AI Value Office ROI, board dashboard, portfolio funding KPI	finance sign-off, backtest, glossary, owner approval
T2 Operational	AML case productivity, contact center efficiency, model operations	owner approval, quality SLO, monitoring
T3 Exploratory	analyst ad hoc analysis, pilot hypothesis	Marked experimental; not allowed as an executive answer

12. Observability for Metric Quality

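Metric observability must cover three questions: is the number right, is the explanation right, and is AI consuming the metric correctly.

Signal	Definition	Example trigger threshold	Action
Freshness	Delay between the metric and its source-of-truth	p95 > contract target	Show a stale warning; block high-risk answers
Completeness	Coverage of key fields and entities	critical field completeness < 99%	Pause official metric refresh
Accuracy / reconciliation	Reconciliation against the authoritative system or control ledger	Difference > 0.5%, or amount difference over threshold	incident review
Semantic consistency	The same metric yields consistent results across tools	BI, API, SQL diff > tolerance	Disable downstream cache
Lineage completeness	Whether source, job, dataset, metric, and consumer are all captured	lineage missing for T0/T1 metric	release gate fail
Dimension cardinality anomaly	Abnormal change in dimension values	New categories or sudden share shifts	source/process review
Metric drift	Definition, distribution, process, or policy drift	slice-level trend break	drift triage
AI query defect	The LLM uses the wrong metric or the wrong dimension	high-risk defect > 0	assistant rollback
Usage anomaly	Abnormal consumers, query volume, or cost	executive metric sudden spike	audit trace review
Decision dependency	Which decisions depend on the metric	T0/T1 incident	decision owner notification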
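
The freshness and reconciliation signals lend themselves to simple arithmetic checks. The sketch below uses a nearest-rank p95 and a relative-difference test; the function names, the nearest-rank method, and the default thresholds (120 minutes, 0.5%) are assumptions chosen to match the example thresholds in this playbook.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def freshness_breached(delays_minutes, target_minutes=120):
    """True when the p95 delay to source-of-truth exceeds the contract target."""
    return p95(delays_minutes) > target_minutes


def reconciliation_breached(metric_total, control_total, tolerance=0.005):
    """True when the relative gap to the control total exceeds the tolerance."""
    if control_total == 0:
        return metric_total != 0
    return abs(metric_total - control_total) / abs(control_total) > tolerance


delays = [30] * 90 + [100] * 6 + [400] * 4  # minutes behind source, 100 refreshes
print(p95(delays), freshness_breached(delays))
print(reconciliation_breached(1006, 1000))
```

With these sample delays the p95 is 100 minutes, so the four very late refreshes do not breach a p95 target on their own; whether that is acceptable is a contract decision, not a property of the code.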

12.1 Metric Quality Dashboard Structure

Panel	Metrics	Interpretation
Contract health	contract coverage, owner coverage, approved status	Whether the metric governance foundation is complete
Data health	freshness, null rate, schema breach, SLA breach	Whether input data meets the contract
Semantic health	duplicate metrics, join path errors, dimension policy violations	Whether the semantic model is reliable
Lineage health	lineage completeness, consumer count, impact map	Whether the impact scope can be explained
AI consumption health	NL query pass rate, metric citation rate, guardrail blocks	Whether AI consumes metrics safely
Eval linkage	eval metric to business metric coverage, release gate failures	Whether evaluation connects to value and risk
Drift	definition drift, distribution drift, policy drift	Whether definitions or processes need review
Incident	open incidents, MTTR, repeat incident count	Whether operational capability is mature

13. Retail Finance Case Library

13.1 AI Value Office Dashboard

Aspect	Design
Decision question	Which AI use cases should continue to be funded, stopped, scaled, or turned into platform capabilities?
Core metrics	funded use cases, release gate pass rate, benefit sign-off rate, adoption by target role, cost per successful task, critical incident count, platform reuse rate
Source-of-truth	portfolio backlog, finance benefit register, AI observability traces, risk gate records, platform billing
Metric contract risks	"Benefit" uses only estimated hours with no finance sign-off; "adoption" counts logins only, not workflow completion
Semantic layer notes	use_case, business_unit, risk_tier, stage, platform_capability, and benefit_type serve as conformed dimensions
Eval linkage	release gate pass rate must connect to eval report completeness, critical failure count, and incident history
Release gate	T1 executive metrics require joint approval from finance, risk, and the business owner
Portfolio artifact	AI portfolio metric contract pack + executive dashboard lineage map

Example metrics:

Metric	Definition	Gate
`ai_value.benefit_signed_off_rate`	Use cases whose benefit evidence is signed off by a finance delegate / live use cases	Portfolio ROI must not be claimed when < 80%
`ai_value.cost_per_successful_task`	Total AI run cost / number of tasks that pass the quality bar and are adopted by users	Two consecutive periods of increase require a route/cost review
`ai_value.platform_reuse_rate`	Use cases reusing the standard gateway, RAG, eval, and observability / active use cases	Check platform product-market fit when < 60%
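
As a worked example of the first metric and its gate, the sketch below computes `ai_value.benefit_signed_off_rate` from a list of use-case records. The record fields `live` and `finance_signed_off`, the sample data, and both function names are hypothetical.

```python
def benefit_signed_off_rate(use_cases):
    """Finance-signed-off live use cases divided by all live use cases.

    Returns None when nothing is live, so an empty portfolio is never
    reported as 0% or 100%.
    """
    live = [u for u in use_cases if u["live"]]
    if not live:
        return None
    signed = sum(1 for u in live if u["finance_signed_off"])
    return signed / len(live)


def may_claim_portfolio_roi(rate, threshold=0.80):
    """Gate from the table: no portfolio ROI claim below 80% sign-off."""
    return rate is not None and rate >= threshold


portfolio = [  # illustrative records only
    {"name": "aml_copilot", "live": True, "finance_signed_off": True},
    {"name": "complaint_ai", "live": True, "finance_signed_off": True},
    {"name": "credit_assistant", "live": True, "finance_signed_off": True},
    {"name": "nl_analytics", "live": True, "finance_signed_off": False},
    {"name": "kyc_pilot", "live": False, "finance_signed_off": False},
]

rate = benefit_signed_off_rate(portfolio)
print(rate, may_claim_portfolio_roi(rate))
```

Here three of four live use cases are signed off, so the rate is 0.75 and the gate blocks an ROI claim; the pilot that is not live is excluded from the denominator.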

13.2 AML Efficiency Metrics

Aspect	Design
Decision question	Does AML Copilot reduce analyst workload without lowering risk-detection quality?
Core metrics	case review cycle time, evidence completeness rate, false positive reduction, QA defect rate, missed red flag rate, unsupported narrative claim rate
Source-of-truth	AML alerts, case management, transaction monitoring, QA review, SAR narrative workflow
Metric contract risks	Without a case-complexity slice, AI appears to lower average handling time while it only handles simple cases
Semantic layer notes	typology, risk_tier, jurisdiction, case_complexity, and AI_assisted_flag must be controlled dimensions
Eval linkage	evidence recall, citation correctness, and narrative factuality map to QA defect and review cycle time
Guardrail	AI final SAR decision = 0; unsupported high-risk claim = 0
Portfolio artifact	AML metric ladder + evidence completeness contract + release gate memo

Example metric matrix:

Metric	Definition	Required slices	Bad interpretation
`aml.case_review_cycle_time_p50`	Median time from case assigned to analyst review complete	risk_tier, typology, AI_assisted_flag	Claiming efficiency gains without slicing by complexity
`aml.evidence_completeness_rate`	Share of QA sample cases with all mandatory evidence satisfied	jurisdiction, typology, analyst_team	Checking only that attachments exist, not whether the evidence supports the conclusion
`aml.unsupported_narrative_claim_rate`	Share of key claims in AI drafts that have no supporting source record	risk_tier, model_version, prompt_version	Treating fluent language as narrative quality

13.3 Customer-Facing AI Complaint Metrics

Aspect	Design
Decision question	Does customer-facing AI reduce complaints and raise resolution rates without creating false promises or regulatory risk?
Core metrics	complaint rate per AI interaction, first contact resolution, reopen rate, escalation correctness, false promise rate, policy citation correctness, vulnerable customer escalation rate
Source-of-truth	contact center platform, complaint management, approved policy repository, customer interaction logs, QA reviews
Metric contract risks	Complaints are recorded by different channels, and the web chat, call center, and branch complaint definitions are inconsistent
Semantic layer notes	complaint_category, channel, product_line, customer_vulnerability_flag, AI_touchpoint, resolution_status
Eval linkage	tone compliance, policy adherence, and escalation correctness map to complaint reopen and QA defect
Guardrail	false promise = 0; internal-only policy exposure = 0; vulnerable customer escalation miss = 0
Portfolio artifact	Customer-facing AI complaint metric glossary + NL analytics guardrail report

Example metrics:

Metric	Definition	Release use
`cx.ai_complaint_rate`	Share of AI interactions followed by an attributable complaint within 7 days	Monitored weekly after launch; expansion is frozen automatically when the rate exceeds the control-group threshold
`cx.escalation_correctness_rate`	Share of scenarios requiring human escalation in which the AI escalated correctly	High-risk release gate
`cx.false_promise_rate`	Share of AI responses that make unauthorized promises about fee waivers, credit approval, refunds, or deadlines	T0 guardrail; must be 0

13.4 Credit Assistant Eval Metrics

Aspect	Design
Decision question	Does the credit assistant improve memo quality and analyst efficiency without crossing manual-approval and model-risk boundaries?
Core metrics	memo turnaround time, policy checklist completeness, reason code consistency, unsupported credit rationale rate, human override quality, decision boundary violation
Source-of-truth	loan application system, credit bureau attributes, income verification, credit policy repository, underwriting decision records
Metric contract risks	Historical approval outcomes are treated as the "correct answer", entrenching old bias; immature outcome windows inflate eval results
Semantic layer notes	loan_product, risk_band, channel, policy_version, exception_type, protected_attribute_review_slice
Eval linkage	policy adherence, reason code consistency, and factual grounding connect to memo QA defect and exception review time
Guardrail	AI automated approval / decline = 0; prohibited feature usage = 0
Portfolio artifact	Credit assistant eval-to-business matrix + model risk metric pack

Example metrics:

Metric	Definition	Review note
`credit.policy_adherence_score`	Expert score for whether the AI memo correctly applies the policy version in effect on the application date	The policy effective date must be part of the semantic model
`credit.decision_boundary_violation_count`	Number of times AI output contains out-of-bounds behavior such as approvals, declines, or credit-limit commitments	T0 guardrail
`credit.reason_code_consistency_rate`	Share of AI-drafted reason codes consistent with the final human memo / policy rule	Must not be equated with loan-performance prediction accuracy

14. Templates / Portfolio Artifacts

The templates below use concrete example values; the goal is to produce portfolio evidence, not blank forms.

14.1 Metric Contract Artifact

# Metric Contract: aml.evidence_completeness_rate

## Identity
- Domain: Financial Crime / AML
- Metric tier: T1 Operational Risk
- Status: production
- Version: 2.1.0
- Accountable owner: AML Operations Director
- Responsible owner: Financial Crime Data Product Manager
- Review cadence: monthly, and immediately after AML policy change

## Business Decision
- Used to decide whether AML Copilot can expand from pilot analyst team to all Level 1 alert review teams.
- Used in AI release gate with `aml.unsupported_narrative_claim_rate` and `aml.case_review_cycle_time_p50`.

## Definition
- A reviewed AML case is complete when every mandatory evidence item in the applicable checklist version has a linked source record, a valid timestamp, and QA pass status.
- Metric formula: complete reviewed cases / total reviewed cases in QA sample.
- Grain: case_id + review_completion_date.
- Time basis: review completion date in business timezone.

## Inclusion / Exclusion
- Include: production AML alert cases completed by Level 1 analyst workflow.
- Exclude: test cases, migrated legacy cases without checklist version, legally restricted cases unavailable to AI.

## Dimensions
- Allowed: jurisdiction, risk_tier, typology, analyst_team, AI_assisted_flag, checklist_version.
- Restricted: customer_id, account_id, beneficial_owner_id.

## Source and Lineage
- Source-of-truth: AML case management system and evidence checklist service.
- Input datasets: aml_cases, aml_evidence_items, aml_checklist_versions, qa_review_results.
- Lineage required: source dataset -> curation job -> semantic model -> metric -> release gate report.

## Quality SLO
- Freshness P95: under 2 hours.
- Mandatory field completeness: 99.5% or higher.
- Case-to-evidence linkage accuracy: 99% or higher in QA sample.
- Lineage completeness: 100% for production report.

## AI Consumption Policy
- Allowed: AI Value Office dashboard, AML Copilot release report, analyst productivity dashboard.
- Prohibited: automatic SAR filing, automatic case closure, customer-level natural-language exploration.

## Release Gate
- Expansion gate fails if high-risk jurisdiction slice drops below 95%.
- Critical gate fails if unsupported high-risk narrative claim count is greater than 0.

## Incident
- Severity 1: metric used in release decision with incorrect numerator or denominator.
- Immediate action: freeze Copilot expansion, notify AML owner, rerun backtest, publish correction note.

14.2 Metric Glossary Entry

Field	Example
Business term	First Contact Resolution
Approved definition	Share of interactions in which the customer needs no second contact within 7 days of the first contact and the case resolution status is resolved.
Not the same as	Chatbot containment, agent handle time, same-day close rate
Official metric	`cx.first_contact_resolution_rate`
Time basis	initial interaction timestamp
Allowed dimensions	channel, product_line, issue_category, AI_touchpoint
Restricted dimensions	customer_id, protected vulnerability details
AI explanation rule	AI may explain trends but must distinguish correlation from verified root cause.
Owner	Customer Operations
Review trigger	New complaint categories, new channel launches, changes to regulatory definitions

14.3 Data Contract for Metric Inputs

data_contract:
  asset: mart_aml_case_review
  version: 3.4.0
  owner: financial_crime_data_owner
  consumers:
    - metric: aml.evidence_completeness_rate
    - metric: aml.case_review_cycle_time_p50
    - eval_report: aml_copilot_release_gate
  schema:
    case_id:
      type: string
      required: true
      uniqueness: true
    review_completion_ts:
      type: timestamp
      required: true
    checklist_version:
      type: string
      required: true
    mandatory_evidence_count:
      type: integer
      required: true
      min: 1
    linked_evidence_pass_count:
      type: integer
      required: true
      min: 0
  semantics:
    review_completion_ts: analyst review completion, not QA completion
    checklist_version: version effective on review_completion_ts
  quality:
    freshness_p95_minutes: 120
    case_id_null_rate: 0
    checklist_version_null_rate: 0
    evidence_count_reconciliation_tolerance: 0.005
  governance:
    pii_class: no direct identifiers in this mart
    row_level_policy: analyst team and jurisdiction filters enforced upstream
    allowed_ai_use:
      - release_gate
      - portfolio_dashboard
      - operational_analytics
    prohibited_ai_use:
      - automatic_case_closure
      - customer_level_freeform_query
  breach_action:
    severity_1:
      - freeze AI expansion gates using this metric
      - notify AML Operations and AI PM
      - rerun affected reports after remediation
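
To show how the schema portion of a contract like this could be enforced, the sketch below mirrors the `schema` block as a Python dictionary and checks individual rows. The dictionary layout, the helper names `check_row` and `duplicate_case_ids`, and the sample row are assumptions; a real pipeline would more likely load the YAML and use a data quality tool.

```python
from datetime import datetime

# Hand-written mirror of the contract's schema block (assumed representation).
SCHEMA = {
    "case_id": {"type": str, "required": True},
    "review_completion_ts": {"type": datetime, "required": True},
    "checklist_version": {"type": str, "required": True},
    "mandatory_evidence_count": {"type": int, "required": True, "min": 1},
    "linked_evidence_pass_count": {"type": int, "required": True, "min": 0},
}


def check_row(row):
    """Return contract violations for one row; an empty list means it passes."""
    errors = []
    for column, rule in SCHEMA.items():
        value = row.get(column)
        if value is None:
            if rule["required"]:
                errors.append(f"{column}: required value is missing")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{column}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{column}: {value} is below min {rule['min']}")
    return errors


def duplicate_case_ids(rows):
    """Return case_id values that appear more than once (uniqueness rule)."""
    seen, duplicates = set(), set()
    for row in rows:
        case_id = row.get("case_id")
        if case_id is None:
            continue
        if case_id in seen:
            duplicates.add(case_id)
        seen.add(case_id)
    return sorted(duplicates)


good_row = {
    "case_id": "C-1",
    "review_completion_ts": datetime(2024, 1, 15, 9, 30),
    "checklist_version": "v7",
    "mandatory_evidence_count": 5,
    "linked_evidence_pass_count": 5,
}
print(check_row(good_row))
print(check_row(dict(good_row, case_id=None, mandatory_evidence_count=0)))
```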

14.4 Eval-to-Business Metric Matrix

Use case	Eval metric	Workflow metric	Business metric	Interpretation rule
AML Copilot	evidence recall >= 95%	QA defect rate decreases	cost per complete case decreases	A cycle time decrease counts as a benefit only when adoption is stable and QA defect does not rise
Customer AI	policy citation correctness >= 95%	reopen rate decreases	complaint cost decreases	Rising containment together with rising complaints is judged a quality failure
Credit Assistant	decision boundary violation = 0	memo turnaround decreases	analyst capacity increases	Faster approval must not be interpreted as lower credit risk
Value Office	release evidence completeness >= 98%	scale/stop cycle time decreases	capital allocation quality improves	ROI without finance sign-off does not enter executive benefit

14.5 Natural-Language Analytics Risk Assessment

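Question class	Example	Allowed response	Guardrail
Approved KPI	"Why did AML case cycle time drop this month?"	Use the official metric; cite the version and the main slices	metric citation + causality caveat
Ambiguous KPI	"Is our AI adoption good?"	First ask whether adoption means login, weekly active user, accepted suggestion, or workflow completion	clarification required
Restricted slice	"List the regions and customers with the highest complaint rates among high-net-worth customers"	Refuse customer-level output; provide a compliant aggregated explanation	small-cell suppression + RLS
Exploratory analysis	"Did the new prompt reduce unsupported claims?"	Allow the pilot metric and label it experimental	beta label + no executive use
Causal claim	"Did complaints drop because of AI?"	State that a control group or intervention analysis is needed; offer correlational observations	causality language policy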

14.6 Metric Release Gate Memo

Section	Example content
Release item	`cx.ai_complaint_rate` enters production for customer-facing AI weekly dashboard
Decision	Approved for limited release to Customer Operations and AI Value Office
Evidence	Metric contract v1.0 approved; 12-week backtest complete; complaint source reconciliation difference < 0.2%; RLS tests 100% pass
Risk	Complaint attribution window may over-credit AI when multiple channels touch same customer
Mitigation	Use `AI_touchpoint` and `last_touch_channel`; publish attribution caveat in glossary
Gate	false promise rate = 0; escalation correctness >= 98%; policy citation correctness >= 95%
Owner	Customer Operations accountable; CX Data Product responsible; Risk consulted
Next review	First monthly review after two reporting cycles
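
The Gate row of the memo can be checked mechanically before sign-off. The sketch below uses the three thresholds stated in the memo; the metric keys, the `evaluate_gate` function, and the choice to treat a missing metric as a failure are assumptions for illustration.

```python
# Thresholds copied from the Gate row; the key names are hypothetical.
GATE_THRESHOLDS = {
    "false_promise_rate": ("==", 0.0),
    "escalation_correctness": (">=", 0.98),
    "policy_citation_correctness": (">=", 0.95),
}


def evaluate_gate(observed: dict) -> tuple:
    """Return (passed, failures) for the observed gate metrics."""
    failures = []
    for name, (op, threshold) in GATE_THRESHOLDS.items():
        value = observed.get(name)
        if value is None:
            # A metric without evidence fails the gate rather than being skipped.
            failures.append(f"{name}: missing")
            continue
        passed = value == threshold if op == "==" else value >= threshold
        if not passed:
            failures.append(f"{name}: {value} violates {op} {threshold}")
    return (not failures, failures)


ok, failures = evaluate_gate({
    "false_promise_rate": 0.0,
    "escalation_correctness": 0.985,
    "policy_citation_correctness": 0.96,
})
print(ok, failures)
```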

14.7 Portfolio Evidence Package

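A portfolio can present one complete case rather than only charts:

Artifact	Capability demonstrated
Metric glossary	Can turn business language into governable metric language
Semantic model spec	Can design entities, dimensions, measures, metrics, and join paths
Metric contract pack	Can define metrics as product contracts
Data contract	Can manage upstream changes, quality, and SLAs
Lineage map	Can explain the evidence chain from source systems to AI output
Eval-to-business matrix	Can connect AI quality, workflow metrics, and business value
NL analytics risk assessment	Can identify the risks of LLM-generated SQL and natural-language explanations
Release gate memo	Can bring metric quality into launch and scale-up decisions
Metric quality dashboard mock	Can operate freshness, drift, incidents, and consumer impact
Interview narrative	Can explain architecture, governance, and product trade-offs through a retail financial services case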

15. 30-Day Lab: Build a Portfolio-Grade Metric Architecture Case

Goal: complete an “AI Semantic Layer / Metrics Architecture” portfolio package within 30 days. Recommended scenarios: AML Copilot, Customer-facing Complaint AI, Credit Assistant, or AI Value Office Dashboard.

Period	Focus	Deliverables
Week 1	Choose a scenario; define the business decision, metric ladder, and core official metrics	business decision map, Eval-to-Business Metric Matrix
Week 2	Design the semantic model, metric contracts, glossary, and data contract	semantic model spec, 3 metric contracts, glossary
Week 3	Design lineage, NL analytics guardrails, and release gates	lineage map, LLM SQL guardrail policy, release gate memo
Week 4	Design observability, drift triage, RACI, and the portfolio narrative	metric quality dashboard, RACI, case study deck outline

30-day acceptance criteria:

Capability	Verifiable evidence
Metric contracts	Every official metric has an owner, formula, grain, dimensions, quality SLO, and AI consumption policy
Semantic modeling	Can explain the trade-offs behind entities, dimensions, measures, metrics, and join paths
Lineage governance	Can trace from the source system to the dashboard, eval report, and AI answer
EvalOps linkage	Every eval metric maps to a workflow metric and a business metric
Risk control	NL analytics and LLM SQL have explicit guardrails and release gates
Operations	Has a metric quality dashboard, drift triage, and incident response
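
The first acceptance criterion (every official metric carries an owner, formula, grain, dimensions, quality SLO, and AI consumption policy) can be self-checked with a small completeness test. This is a sketch under assumptions: the dictionary layout and key names are hypothetical, and the sample contract values are placeholders, not real definitions.

```python
# The six required fields come from the acceptance criteria; key names are assumed.
REQUIRED_FIELDS = (
    "owner",
    "formula",
    "grain",
    "dimensions",
    "quality_slo",
    "ai_consumption_policy",
)


def missing_contract_fields(contract: dict) -> list:
    """Return required fields that are absent or empty in a metric contract."""
    return [name for name in REQUIRED_FIELDS if not contract.get(name)]


# Placeholder contract for illustration only.
draft_contract = {
    "name": "cx.ai_complaint_rate",
    "owner": "Customer Operations",
    "formula": "ai_complaints / ai_contacts",
    "grain": "week",
    "dimensions": ["region", "channel"],
    "quality_slo": "",  # left empty on purpose to show a failing field
}

print(missing_contract_fields(draft_contract))
```

Running a check like this over every contract in the lab gives reviewable evidence that no official metric is only partially defined.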

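16. Interview Talking Points

16.1 30-Second Version

In the AI era, the semantic layer is not an upgraded BI semantic layer; it is the control plane for metric facts. It turns a metric definition into a product contract with an owner, formula, grain, time semantics, dimensions, source of truth, data contract, lineage, quality SLO, AI consumption policy, and release gate. Only then can BI, natural-language analytics, AI assistants, eval harnesses, and the AI Value Office dashboard consume the same set of trusted metrics.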

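16.2 2-Minute Version

I split the metric architecture into three threads.

The first is the semantic thread: confirm the business decision and the official metric first, then design the semantic model, including entities, dimensions, measures, metrics, time spine, join paths, and glossary. A metric cannot live only in dashboard SQL; it must enter the metric registry with versioning and an approval workflow.

The second is the evidence thread: every metric must connect to a data contract and lineage. Upstream data needs schema, semantics, freshness, quality, and SLA; lineage must trace from source record, pipeline run, curated dataset, and semantic model through to metric, dashboard, eval report, and AI answer. When a metric goes wrong, impact analysis can then replace manual guesswork.

The third is the AI consumption thread: natural-language analytics and LLM-generated SQL may use only approved semantic objects, and must pass SQL AST validation, RLS/CLS, small-cell suppression, query plan checks, metric citation, and the causality language policy. EvalOps must also connect AI behavior metrics to workflow metrics and business metrics, to guard against model scores rising while the business does not improve.

In retail financial services, I would set stronger release gates for scenarios such as AML, complaints, and credit. For example, guardrails such as unsupported high-risk claim, false promise, and automated credit decision must be 0.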
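
The "approved semantic objects only" rule in the third thread can be illustrated with a structured query request that is validated before any SQL is generated. This is a simplified allowlist check, not SQL AST validation, and it does not cover RLS/CLS or small-cell suppression. The registry contents, the `MetricQuery` type, and the dimension names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical registry: approved metric name -> dimensions it may be sliced by.
APPROVED_METRICS = {
    "cx.ai_complaint_rate": {"region", "channel", "week"},
}


@dataclass
class MetricQuery:
    """Structured request an LLM must produce instead of free-form SQL."""
    metric: str
    dimensions: list = field(default_factory=list)


def validate_query(query: MetricQuery) -> list:
    """Return a list of violations; an empty list means the request is allowed."""
    allowed_dimensions = APPROVED_METRICS.get(query.metric)
    if allowed_dimensions is None:
        return [f"metric not approved: {query.metric}"]
    return [
        f"dimension not approved for {query.metric}: {dim}"
        for dim in query.dimensions
        if dim not in allowed_dimensions
    ]


print(validate_query(MetricQuery("cx.ai_complaint_rate", ["region", "customer_id"])))
```

Forcing the model to emit a structured request narrows what the later AST and policy checks have to reason about, since raw tables never appear in the request.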

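16.3 Quick Answers to Advanced Follow-Up Questions

Question	Key points
What is the difference between a semantic layer and a metric store?	The semantic layer owns business semantics, entities, dimensions, measures, join paths, and the metrics API; a metric store leans toward metric definitions and query serving. Enterprise AI scenarios usually need a registry, glossary, data contracts, lineage, policy, and observability together to form the control plane.
Why is a metric definition a product contract?	Because it commits to who may use a number for which decision, and it covers owner, formula, grain, time, exclusions, quality SLO, change process, AI usage boundary, and incident response.
How do you prevent the LLM from writing wrong SQL?	Do not let the LLM query raw tables directly. Have it query approved metrics through the semantic API, then add intent disambiguation, AST allowlist, RLS/CLS, query plan limits, metric citation, result sanity checks, and an audit trace.
How do you connect eval metrics and business metrics?	Use a metric ladder: business outcome -> workflow metric -> AI eval metric -> data/feature metric -> platform metric -> guardrail. For example, AML evidence recall should connect to QA defect, case cycle time, and cost per complete case.
How do you handle metric drift?	First distinguish definition, data, source, process, population, policy, model behavior, and user behavior drift. Each drift type has a different owner and action; do not attribute it to the model by default.
How do you decide build vs buy?	Official metrics and high-risk AI need a stable semantic layer, data contracts, lineage, and governance. Prefer reusing mature tools; a custom metric API is needed only for real-time, complex-permission, regulatory, or low-latency scenarios.
What matters most in complaint metrics for customer-facing AI?	Look beyond complaint volume to attribution window, first contact resolution, reopen, false promise, policy citation, escalation correctness, and vulnerable customer handling. False promise and internal policy leakage are hard guardrails.
Why can't a credit assistant be judged on accuracy alone?	A credit assistant is usually decision support and should not approve automatically. Look at policy adherence, reason code consistency, factual grounding, decision boundary violation, protected attribute review slice, and the quality of human overrides.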
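
The drift answer above can be backed by a routing table that sends each drift type to a named owner. The eight drift types are the ones listed in the table; every owner in the sketch below is a hypothetical placeholder and should be replaced by the roles in the actual RACI.

```python
# Drift types from the table above; the owners are hypothetical placeholders.
DRIFT_ROUTING = {
    "definition": "metric owner",
    "data": "data owner",
    "source": "source system owner",
    "process": "business process owner",
    "population": "business owner",
    "policy": "risk and compliance",
    "model_behavior": "EvalOps",
    "user_behavior": "AI PM",
}


def route_drift(drift_type: str) -> str:
    """Return the owner for a drift type, or raise if the type is unknown."""
    key = drift_type.strip().lower().replace(" ", "_")
    if key not in DRIFT_ROUTING:
        # Unknown drift is surfaced for triage instead of defaulting to the model.
        raise ValueError(f"unclassified drift type: {drift_type!r}")
    return DRIFT_ROUTING[key]


print(route_drift("model behavior"))
```

Raising on an unknown type, rather than falling back to the ML team, keeps the "do not attribute it to the model by default" rule explicit in code.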

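17. Common Pitfalls

Pitfall	Why it is dangerous	Better practice
Treating the semantic layer as a report acceleration layer	Ignores AI consumption, eval, risk, and lineage	Design it as a metrics control plane
Writing metric definitions only in natural language	LLMs and APIs cannot execute them reliably	Natural-language definition + machine-readable formula + semantic model
Letting the LLM query the warehouse directly	Join, permission, definition, and cost risks become uncontrollable	semantic-only query + guardrails
Checking only that dashboard numbers match	AI also needs to cite the contract, explain the definition, and trace lineage	Establish metric citation and trace
Governing only executive KPIs	AI eval and workflow metrics also drive release decisions	Tiered governance across T0/T1/T2
Handing drift only to the ML team	Metric drift can come from policy, process, data, or user behavior	Drift triage with an owner per drift type
ROI metrics without finance sign-off	Undermines credibility in the portfolio or executive reporting	Establish a benefit signed-off metric
Using historical credit outcomes as the only eval truth	May entrench historical bias and immature outcomes	Separate policy adherence, decision boundary, and long-term outcome
Judging customer service AI only on containment	May sacrifice customer experience and compliance	Also track complaints, reopen, false promise, and escalation
No deprecation process	AI keeps citing old metrics	replacement, consumer notice, incident withdrawal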

18. One-Page Summary

The core of AI Semantic Layer / Metrics Architecture is upgrading a metric from a “query result” to a product contract that is governable, observable, and safe for AI to consume.

From	To
Dashboard SQL	Metric contract + semantic model
Field descriptions	Metric glossary + machine-readable formula
Table-level lineage	Source -> job -> dataset -> semantic model -> metric -> AI answer
Data quality checks	Metric quality SLO + drift + incident
Natural-language data querying	Governed NL analytics with semantic API and guardrails
Model scores	Eval-to-business metric ladder
One-off launch	Approval workflow + release gate + observability
Data team as the single owner	RACI across business owner, data owner, AI PM, EvalOps, Risk, and Finance

The portfolio message can be condensed into one sentence:

I design metrics as governed product contracts, so AI systems can reason over business performance without inventing definitions, bypassing controls or breaking trust.