AI Semantic Layer / Metrics Architecture Playbook
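Audience: AI PMs, AI Architects, Data Product Managers, EvalOps Leads, AI Value Office Leads, and heads of data platforms in retail financial services.
Core question: When AI assistants, natural-language analytics, EvalOps, business dashboards, and executive portfolio governance all depend on "metrics", how do you ensure that every metric has a consistent definition, traceable lineage, a clear owner, quality SLOs, release gates, and risk controls?
Learning objective: Upgrade the metric definition from a "report field description" to a "product contract", and design a semantic layer / metrics architecture that BI, LLMs, AI agents, eval harnesses, and governance processes can all consume.
Scope: This playbook does not cover basic BA requirements analysis and is not a tool tutorial. It focuses on architecture decisions, product governance, metric risk, platform selection, and evidence packages suitable for a portfolio.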
Source Anchors
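These sources serve as architecture and governance anchors; they do not constitute legal, regulatory, or vendor-selection advice. The table keeps only currently accessible official entry points so that a portfolio can cite them over the long term.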
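1. Positioning: The Metric Semantic Layer Is the Fact Control Plane for AI Products
Traditional metric governance usually sits inside BI, data warehouse, or data governance teams. In the AI era that boundary is no longer sufficient, because the same set of metrics is consumed by more systems:
- Executives use the AI Value Office dashboard to decide which AI use cases to keep funding, stop, or scale.
- AI PMs use business metrics and eval metrics to prove whether an agent really improves a process.
- The LLM analytics assistant uses natural language to generate queries, explain trends, and answer "why did it drop".
- EvalOps uses metric slices to judge model quality differences across region, channel, customer segment, and risk tier.
- Risk, audit, and regulatory communication need to see metric definitions, lineage, change records, and release gates.
In one sentence: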
AI semantic layer is the governed contract between business meaning, data implementation, AI evaluation and executive decision-making.
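Without contractualized metrics, AI systems amplify ambiguous definitions:

| Common symptom | Risk after AI amplification | Architectural root cause |
| --- | --- | --- |
| The same KPI shows different numbers on different dashboards | The AI assistant gives contradictory business explanations | No unified semantic model and metric registry |
| Metric definitions live in report notes | The LLM bypasses the definition when generating SQL | The metric is not a machine-executable contract |
| Metric lineage is unclear | After an incident, nobody can tell whether the fault lies in the source system, the transformation, entitlements, or the model's interpretation | No dataset/job/run-level lineage |
| Metric owner is unclear | Definition changes go unapproved and AI eval gates stop working | Missing RACI and approval workflow |
| Only model scores are tracked | The AI system passes eval but business value does not improve | Eval metrics are not connected to workflow metrics and business metrics |
| Natural-language analytics queries the warehouse directly | Entitlement bypass, wrong joins, aggregation leakage, hallucinated SQL | Missing semantic API and LLM SQL guardrails |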
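2. High-Level Framework: Metric Contract Stack
A metric architecture that supports AI products and governance is more than a semantic layer tool; it is a nine-layer contract stack.

| Layer | Core object | Key question | Artifact |
| --- | --- | --- | --- |
| 1. Business outcome | Business result | Which decision, process, or risk control does this metric serve? | Outcome map, decision memo |
| 2. Metric contract | Metric definition | Are the formula, grain, time window, exclusions, and source-of-truth explicit? | Metric Contract |
| 3. Semantic model | Semantic model | Are entity, dimension, measure, metric, and join path stable? | Semantic Model Spec |
| 4. Data contract | Input data contract | Are schema, semantics, quality, freshness, and SLA enforceable? | Data Contract |
| 5. Lineage | Lineage | Can the impact scope be traced from source record to metric output? | Lineage Map |
| 6. Consumption policy | Consumption policy | What are the permitted uses across BI, AI, LLM SQL, eval, export, and API? | Consumption Policy |
| 7. Eval linkage | Eval linkage | How do business metrics, workflow metrics, eval metrics, and feature metrics explain one another? | Eval-to-Business Matrix |
| 8. Release gates | Release gates | Which metric changes block a model, dashboard, or natural-language analytics release? | Release Gate Memo |
| 9. Observability | Operational observability | Are metric freshness, accuracy, drift, usage, and incidents continuously monitored? | Metric Quality Dashboard |

2.1 Decision Principles

| Principle | Meaning | Anti-pattern |
| --- | --- | --- |
| Metric is a product contract | A metric definition has an owner, version, SLO, change process, and consumer list | The metric is just a piece of SQL copied into multiple dashboards |
| Semantic model before natural language | The LLM may consume only approved semantic objects, not interpret raw tables directly | Letting the LLM guess the difference between customer_id, client_id, and party_id |
| Lineage before blame | On a metric anomaly, first locate lineage and changes, then judge model or business causes | Asking the data team to debug SQL ad hoc whenever a dashboard fluctuates |
| Eval metric must map to business metric | Offline quality scores must explain business process improvement or risk reduction | The judge score rises but complaint rate, AHT, and case defects do not change |
| Drift is a governance event | Metric definitions, data distributions, user behavior, and policy changes can all drift | Treating drift only as a model monitoring problem |
| AI consumption needs policy | Each metric must declare whether it may be explained by an LLM, used to generate SQL, exported, or used for training/eval | All metrics enter the chat assistant context by default |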
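3. Core Objects: Do Not Conflate Metric, Measure, Feature, and Eval
In AI products the word "metric" is often stretched too far. In interviews and architecture reviews, state the object boundaries clearly.

| Object | Definition | Example | Owner | Primary consumers |
| --- | --- | --- | --- | --- |
| Entity | Business entity or join anchor | customer, account, case, complaint, loan_application | Domain data owner | semantic layer, BI, AI retrieval |
| Dimension | Slicing dimension | product_line, region, risk_tier, channel, complaint_category | Data steward | BI, eval slicing, AI explanation |
| Measure | Aggregatable base quantity | transaction_amount, case_count, handle_minutes | Data engineering / analytics | metrics, semantic model |
| Metric | Metric with a business definition | AML case cycle time, AI adoption rate, complaint escalation defect rate | Business metric owner | dashboard, AI assistant, Value Office |
| Feature | Model input signal | customer_txn_30d_count, income_verification_confidence | ML / risk analytics owner | model, routing, risk scoring |
| Eval metric | AI behavior quality metric | citation correctness, unsupported claim rate, policy adherence | EvalOps / AI PM | release gate, monitoring |
| Business metric | Business outcome metric | cost per case, first contact resolution, loss prevented | Business owner / finance | portfolio decision, funding gate |
| Guardrail metric | Safety/compliance threshold | PII leakage = 0, unauthorized decision = 0 | Risk / compliance | release gate, incident |

3.1 Key Separations

| Easily confused | How to separate | Decision significance |
| --- | --- | --- |
| Model accuracy vs business value | Accuracy is AI behavior quality; value depends on process metrics and finance sign-off | Prevents "the model is better but the business did not change" |
| Feature freshness vs metric freshness | Feature freshness affects model inputs; metric freshness affects business decisions | The two trigger different degradation actions |
| Metric definition vs SQL implementation | The definition is the product contract; SQL is the implementation | Change approval reviews the definition; regression testing covers the implementation |
| Dashboard count vs official KPI | Dashboards may be exploratory; an official KPI needs a contract, owner, and audit | Prevents AI from treating exploratory metrics as official facts |
| Eval slice vs business segment | Eval slices exist to find model failures; business segments exist for business decisions | Names may be similar, but approval and interpretation differ |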
4. Reference Architecture: AI Metrics Control Plane
flowchart TB
subgraph Sources[Source Systems]
CORE[Core banking / ledger]
CRM[CRM / customer profile]
AML[AML case system]
COMPLAINT[Complaint / contact center]
CREDIT[Credit underwriting]
end
subgraph Contracts[Data Contracts and Catalog]
DC[Data contracts]
CAT[Catalog / DCAT metadata]
LIN[OpenLineage events]
end
subgraph DataLayer[Curated Data Layer]
CUR[Curated marts]
DQ[Quality checks]
PII[PII / access policy]
end
subgraph Semantic[Semantic Layer]
SM[Semantic models]
MR[Metric registry]
GL[Metric glossary]
API[Metrics API / governed SQL]
end
subgraph AI[AI Consumption]
NLA[Natural-language analytics]
EVAL[Eval harness]
AGENT[AI assistants / agents]
VALUE[AI Value Office dashboard]
end
subgraph Ops[Governance and Observability]
RACI[Ownership / RACI]
GATE[Approval and release gates]
OBS[Metric quality observability]
INC[Incident loop]
end
Sources --> DC
Sources --> LIN
DC --> CUR
CAT --> SM
LIN --> SM
CUR --> DQ
DQ --> SM
PII --> API
SM --> MR
MR --> GL
MR --> API
API --> NLA
API --> EVAL
API --> AGENT
API --> VALUE
RACI --> GATE
GATE --> MR
OBS --> INC
OBS --> GATE
LIN --> OBS
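4.1 Platform Capability Checklist

| Capability | Must provide | Immaturity signal |
| --- | --- | --- |
| Metric registry | Metric definition, owner, version, status, consumers, change history | Metrics scattered across Confluence, SQL, and dashboard annotations |
| Semantic API | Query by metric, dimension, time, and filter without exposing raw tables | The LLM directly generates arbitrary warehouse SQL |
| Data contract enforcement | Testable schema, semantics, quality, freshness, and SLA | An upstream field rename is discovered only when dashboards and evals break |
| Lineage capture | source -> job -> dataset -> semantic model -> metric -> consumer | Lineage reaches tables only, not metrics or AI outputs |
| Glossary | Human-readable and machine-retrievable business definitions | Customer service, risk, and finance each interpret the same metric differently |
| Approval workflow | Full lifecycle: draft, review, release, deprecate, incident | New metrics go live on a Slack approval |
| Metric observability | freshness, completeness, reconciliation, drift, usage, cost | Only pipeline success or failure is monitored |
| AI guardrails | Metric consumption policy for NL analytics, LLM SQL, RAG citation, and eval gates | AI can answer with unapproved metrics or sensitive slices |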
5. Metric Definition as Product Contract
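A metric contract is not a field dictionary. It must answer: who may make which decision based on this number, and who is responsible if the number is wrong.
5.1 Minimal Metric Contract

| Contract field | Required content | Example: AML Evidence Completeness Rate |
| --- | --- | --- |
| Metric name | Stable name and namespace | aml.evidence_completeness_rate |
| Business decision | The decision this metric supports | Decide whether AML Copilot can expand to more analyst teams |
| Definition | Business definition | Share of completed cases in which every key evidence checklist item is satisfied |
| Formula | Executable formula | complete_cases / reviewed_cases |
| Numerator | Numerator definition | Cases in reviewed_cases where every mandatory evidence item has a source record and passed QA |
| Denominator | Denominator definition | Cases that completed analyst review within the reporting period and entered the QA sample |
| Grain | Reporting grain | case_id + review_date |
| Time window | Time basis | By review completion date, weekly/monthly rolling |
| Exclusions | Excluded items | legally restricted cases, migrated legacy cases, test cases |
| Dimensions | Allowed slices | jurisdiction, risk_tier, typology, analyst_team, AI_assisted_flag |
| Source-of-truth | Authoritative source | AML case system + evidence checklist service |
| Data contract | Input contract | case_id 100% non-null; checklist version must match review_date |
| Quality SLO | Quality target | freshness P95 < 2 hours; source linkage accuracy >= 99% |
| AI consumption | How AI may use it | Allowed for dashboard, eval report, Value Office summary; not allowed for automatic case closure |
| Release gate | Threshold | AI expansion gate fails when evidence completeness in high-risk jurisdictions is < 95% |
| Owner | Responsibility | AML operations accountable; financial crime data owner responsible |
| Change policy | Change rule | A change to mandatory items is a major version and requires risk approval |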
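
The contract fields above can be sketched as a typed object so that a registry or eval harness could check them mechanically. This is a minimal illustration, not a standard schema: the class name `MetricContract`, its attribute names, and the `gate_threshold` field are assumptions chosen to mirror the table.

```python
from dataclasses import dataclass

# Hypothetical subset of fields treated as mandatory; mirrors the table, not a standard.
REQUIRED_FIELDS = ("name", "business_decision", "formula", "grain", "owner")


@dataclass(frozen=True)
class MetricContract:
    name: str
    business_decision: str
    formula: str
    grain: str
    owner: str
    allowed_dimensions: tuple = ()
    allowed_ai_use: tuple = ()
    prohibited_ai_use: tuple = ()
    gate_threshold: float = 0.0  # minimum acceptable value for the gated slice

    def missing_fields(self) -> list:
        """Return the mandatory fields that are empty."""
        return [f for f in REQUIRED_FIELDS if not getattr(self, f)]

    def gate_passes(self, slice_value: float) -> bool:
        """The gate fails when the gated slice drops below the threshold."""
        return slice_value >= self.gate_threshold


def completeness_rate(complete_cases: int, reviewed_cases: int) -> float:
    """complete_cases / reviewed_cases, rejecting an empty denominator."""
    if reviewed_cases <= 0:
        raise ValueError("reviewed_cases must be positive")
    return complete_cases / reviewed_cases


contract = MetricContract(
    name="aml.evidence_completeness_rate",
    business_decision="Expand AML Copilot to more analyst teams",
    formula="complete_cases / reviewed_cases",
    grain="case_id + review_date",
    owner="AML operations",
    allowed_dimensions=("jurisdiction", "risk_tier", "typology"),
    allowed_ai_use=("dashboard", "eval_report"),
    prohibited_ai_use=("automatic_case_closure",),
    gate_threshold=0.95,
)
```

With these invented counts, a high-risk jurisdiction slice of 188 complete cases out of 200 reviewed gives 0.94, so `contract.gate_passes(...)` would return `False` and the expansion gate would fail.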
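5.2 What a Metric Contract Adds Beyond an Ordinary KPI Definition

| Added capability | Why AI scenarios require it | Evidence |
| --- | --- | --- |
| Machine-readable formula | LLMs and eval harnesses need stable queries rather than natural-language guessing | semantic model / MetricFlow spec |
| Consumption policy | Different AI scenarios carry different risks | allowed AI use, prohibited use |
| Lineage | Locate the blast radius after an error | OpenLineage events, consumer map |
| Release gate | A failed metric must block release or expansion | gate memo, quality dashboard |
| Drift policy | Definitions and distributions change | drift signal, review cadence |
| RACI | Metrics are not the data team's sole responsibility | ownership table, approval history |
| Incident playbook | Metric errors affect AI decisions | severity, rollback, communication |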
6. Semantic Model Design Patterns
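6.1 Core Elements of the Semantic Model

| Element | Design focus | Retail financial services example |
| --- | --- | --- |
| Entity | Establish the join anchor and uniqueness | customer, account, transaction, case, complaint, loan_application |
| Measure | Keep it simple, aggregatable, and low in semantic dispute | case_count, handle_minutes, dispute_amount |
| Dimension | Govern high-value slices first | risk_tier, channel, product_line, jurisdiction |
| Metric | Define complex logic through business contracts | first_contact_resolution_rate, aml_false_positive_reduction |
| Time spine | Unify time windows and calendars | fiscal month, business day, regulatory reporting period |
| Join path | Control many-to-many joins and double counting | customer-account-transaction, complaint-contact-case |
| Access policy | Metric, dimension, row-level, and column-level policies | high-net-worth customers, sensitive regions, data on minors |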
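6.2 Semantic Layer Architecture Options

| Option | Fit | Strengths | Risks | Recommendation |
| --- | --- | --- | --- | --- |
| Central semantic layer | Enterprise-wide KPIs, AI Value Office, regulatory metrics | Strong consistency; simple AI consumption | The central team becomes a bottleneck | Use for official metrics and high-risk AI scenarios |
| Domain-owned semantic models | Fast iteration across multiple business domains | Close to business owners; accurate domain semantics | Naming and dimensions may fragment | Constrain with federated governance and a shared glossary |
| BI-tool semantic model | Consumption mainly in a single BI tool | Fast to deploy | Weak for AI, eval, and API consumption | Not suitable as the sole enterprise AI semantic layer |
| Custom metric API | Highly customized, complex entitlements, low latency | Strong control | High maintenance cost; reinvents the wheel | Use only for regulatory, real-time risk, or special-SLA scenarios |
| Hybrid | The enterprise norm | Official metrics centralized, domain metrics distributed | Requires strong governance | Recommended: central registry + domain semantic model + shared contracts |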
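6.3 Design Decision Table

| Decision | Recommended default | Exception |
| --- | --- | --- |
| Metric naming | domain.metric_name | Cross-domain metrics use enterprise.metric_name |
| Time basis | Each metric contract explicitly declares event date, posted date, or review date | Regulatory reporting uses the regulatory reporting date |
| Status filtering | Only business terminal states count toward official metrics | Operational process monitoring may use in-progress metrics |
| Dimension authorization | Low-risk dimensions are queryable by default; highly sensitive dimensions require a policy | customer-level, protected attribute, small cell size |
| Experimental metrics | experimental status; excluded from the default AI answer context | May enter pilot eval with owner approval |
| Deprecation | At least one reporting cycle of notice, with a replacement provided | An erroneous metric may be withdrawn immediately with a published incident note |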
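
Two of the defaults above, metric naming and the handling of experimental metrics, lend themselves to a mechanical check. The sketch below assumes lowercase snake_case names and invents the scope labels `ai_answer`, `pilot_eval`, and `none`; neither the regular expression nor the labels come from the source.

```python
import re

# Assumed convention: lowercase snake_case namespace and name joined by a single dot.
METRIC_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """True for names shaped like domain.metric_name or enterprise.metric_name."""
    return METRIC_NAME.fullmatch(name) is not None


def consumption_scope(status: str, owner_approved: bool = False) -> str:
    """Map a metric status to the widest AI scope it may enter (hypothetical labels)."""
    if status == "production":
        return "ai_answer"
    if status == "experimental":
        # Experimental metrics stay out of the default AI answer context.
        return "pilot_eval" if owner_approved else "none"
    return "none"  # draft, deprecated, or unknown status
```

A registry could run `is_valid_metric_name` at draft time and call `consumption_scope` when assembling the assistant's retrieval context.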
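7. Metric Lineage: The Evidence Chain from Source Record to AI Output
Metric lineage cannot stop at the table level. In AI scenarios it must be able to answer:
- Which source records affected this metric?
- Which data contract, pipeline job, semantic model version, and metric formula took part in the calculation?
- Which dashboards, agents, eval reports, and natural-language answers used this metric?
- Which release gates, portfolio decisions, and regulatory reports would a definition change affect?
7.1 Lineage Granularity

| Granularity | What is recorded | Use |
| --- | --- | --- |
| Source record | Original business record, update time, system version | Fact disputes, audit sampling |
| Dataset | Table, view, file, topic, index | Data quality, impact analysis |
| Job / run | Transformation job, run instance, parameters, status | pipeline incident, replay |
| Semantic model | entity, dimension, measure, join path | Definition consistency, join risk |
| Metric | formula, version, approval status | KPI governance, AI answer grounding |
| Consumer | dashboard, API, LLM assistant, eval report | blast radius, decommission notices |
| Decision | release gate, funding memo, risk acceptance | Executive and audit evidence |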
7.2 Metric Lineage Map Example

| Lineage node | Example | Key metadata |
| --- | --- | --- |
| Source system | AML case management system | owner, SLA, access class |
| Source table | aml_cases | schema version, contract version |
| Pipeline job | curate_aml_case_daily | run_id, code_version, input/output datasets |
| Curated dataset | mart_aml_case_review | freshness, completeness, row count anomaly |
| Semantic model | semantic_aml_case | entities: case, customer; dimensions: typology, risk_tier |
| Metric | aml.case_cycle_time_p50 | formula, grain, filters, version |
| Eval report | AML Copilot release gate | threshold, failed slices, owner |
| Dashboard | AI Value Office dashboard | refresh cadence, executive owner |
| AI answer | Monthly portfolio summary | metric citations, query id, prompt version |
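
A lineage map like this one becomes useful for blast-radius questions once it is stored as a graph. The sketch below uses the node names from the example; the edges between them (for instance, that the dashboard feeds the monthly summary) are assumptions made for illustration, and `downstream` is a hypothetical helper, not an OpenLineage API.

```python
from collections import deque

# Node names follow the example map; the edge structure is an assumption.
LINEAGE = {
    "aml_cases": ["curate_aml_case_daily"],
    "curate_aml_case_daily": ["mart_aml_case_review"],
    "mart_aml_case_review": ["semantic_aml_case"],
    "semantic_aml_case": ["aml.case_cycle_time_p50"],
    "aml.case_cycle_time_p50": [
        "AML Copilot release gate",
        "AI Value Office dashboard",
    ],
    "AI Value Office dashboard": ["Monthly portfolio summary"],
}


def downstream(node: str, graph: dict = LINEAGE) -> list:
    """Breadth-first list of everything that depends on the given node."""
    seen = {node}
    queue = deque([node])
    order = []
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

Calling `downstream("mart_aml_case_review")` lists the semantic model, the metric, the release gate, the dashboard, and the AI answer, which is the consumer inventory an incident or version change would need to notify.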
7.3 Impact Analysis Gate
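Every metric major version change must run an impact analysis:

| Check | Pass criterion | Fail action |
| --- | --- | --- |
| Consumer inventory | All consumers identified | Block the change release |
| Backtest delta | Differences between old and new definitions are explained across the major dimensions | Mark as breaking change |
| Eval dependency | Release gates that depend on the metric have been updated | Freeze related model releases |
| Executive decision dependency | The most recent funding / scale memo has been flagged with the definition change | Re-sign the value judgment |
| Regulatory dependency | Regulatory or audit metrics are unaffected or the change is approved | Enter risk review |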
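
The gate above reduces to a lookup from failed checks to required actions. A minimal sketch, assuming the check results arrive as a dictionary of booleans keyed by hypothetical snake_case check names:

```python
# Fail actions come from the table above; the key names are assumptions.
FAIL_ACTIONS = {
    "consumer_inventory": "block change release",
    "backtest_delta": "mark as breaking change",
    "eval_dependency": "freeze related model releases",
    "executive_decision_dependency": "re-sign value judgment",
    "regulatory_dependency": "enter risk review",
}


def impact_analysis(results: dict) -> list:
    """Return the fail action for every check that is missing or not passed."""
    return [
        action
        for check, action in FAIL_ACTIONS.items()
        if not results.get(check, False)
    ]
```

Treating a missing result as a failure is a deliberate choice here: a major version change should not pass simply because a check was never run.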
8. Feature / Eval / Business Metric Relationship
The most common mistake AI PMs and Architects make is treating eval metrics as proof of business value. The right approach is to build a metric ladder.
Business outcome
-> Workflow metric
-> AI behavior metric
-> Data / feature metric
-> Platform metric
-> Guardrail metric
8.1 Metric Ladder Examples

| Scenario | Business outcome | Workflow metric | AI eval metric | Data / feature metric | Platform metric | Guardrail |
| --- | --- | --- | --- | --- | --- | --- |
| AML Copilot | Lower handling cost per high-quality case | case review cycle time, QA defect rate | evidence recall, unsupported claim rate, typology coverage | transaction freshness, alert-case linkage accuracy | retrieval latency, judge cost | SAR final decision by AI = 0 |
| Customer complaint AI | Lower compliance complaint rate, higher first contact resolution | first contact resolution, complaint reopen rate | policy citation correctness, tone compliance, escalation correctness | approved policy freshness, customer context completeness | p95 latency, fallback rate | customer harm / false promise = 0 |
| Credit assistant | Better memo consistency, higher manual approval efficiency | analyst memo turnaround, exception review time | policy adherence, reason code consistency, decision boundary compliance | income verification completeness, policy version match | route-to-human rate, cost per memo | automated credit decision = 0 |
| AI Value Office | Funding flows to high-value use cases with low loss-of-control risk | scale/stop decision cycle time, benefit sign-off rate | release evidence completeness, risk gate pass rate | metric contract coverage, baseline quality | dashboard freshness, metric API uptime | unapproved KPI in exec summary = 0 |

8.2 Interpretation Chain

| If you see | Do not directly infer | Check instead |
| --- | --- | --- |
| Eval score improves | Business value necessarily improves | adoption, workflow metric, business baseline, sample coverage |
| AHT decreases | Customer experience improved | reopen, complaint, wrong answer, escalation defect |
| False positives decrease | Risk is lower | missed suspicious patterns, QA defect, regulatory feedback |
| Adoption rises | The product is successful | override rate, quality trend, mandatory-use policies |
| Cost per request decreases | Unit economics improved | successful task rate, fallback, human review cost |
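9. Metric Drift: Metric Drift Is a Governance Event
AI metric drift is not a standalone ML monitoring problem. It can originate in the business, data, process, policy, model, user behavior, or the metric definition itself.

| Drift type | Description | Signal | Action |
| --- | --- | --- | --- |
| Definition drift | Metric definition or exclusion rules change | contract version delta, backtest gap | major version, owner approval, consumer notice |
| Data drift | Input data distribution changes | null rate, range, category distribution, row count anomaly | data incident, pipeline check, source owner review |
| Source drift | Upstream system field or status semantics change | data contract breach, schema diff | block release, update contract, run backfill |
| Process drift | Manual process or business rules change | handling time step change, new status path | workflow review, metric contract update |
| Population drift | Customer, channel, or product mix changes | segment share shift | slice-level eval, business explanation |
| Policy drift | Policy, regulation, or product terms change | effective date mismatch, policy version conflict | knowledge and metric re-index, release gate rerun |
| Model behavior drift | AI output behavior changes | judge score trend, override reason shift | eval refresh, prompt/model rollback |
| User behavior drift | Users start bypassing or over-relying on AI | adoption spike, manual review drop, override decline | HITL audit, training, UI control |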
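
The "segment share shift" signal for population drift can be quantified in several ways. The sketch below uses total variation distance between two segment mixes, which is one reasonable choice and not a method prescribed by the source; the function names and the channel counts in the usage note are invented.

```python
def segment_shares(counts: dict) -> dict:
    """Convert per-segment counts into shares that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("segment counts must sum to a positive number")
    return {segment: n / total for segment, n in counts.items()}


def share_shift(baseline: dict, current: dict) -> float:
    """Total variation distance between two mixes: 0 is identical, 1 is disjoint."""
    b = segment_shares(baseline)
    c = segment_shares(current)
    segments = set(b) | set(c)
    return 0.5 * sum(abs(b.get(s, 0.0) - c.get(s, 0.0)) for s in segments)
```

For example, a channel mix moving from 50 branch / 50 web to 25 branch / 75 web yields a shift of 0.25. Which threshold should trigger slice-level eval is a governance decision for the metric owner, not something this calculation settles.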
9.1 Metric Drift Triage
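| Triage question | Decision logic | Owner |
| --- | --- | --- |
| Did the number really change? | First check pipeline, contract, freshness, reconciliation | Data owner |
| Did the definition change? | Check metric version, semantic model, filters, dimensions | Metric owner |
| Did the business change? | Check policy, process, customer mix, channel, campaigns | Business owner |
| Did AI cause the change? | Check AI-assisted flag, adoption, override, eval trend | AI PM / EvalOps |
| Does the change affect decisions? | Check release gate, funding memo, risk threshold | AI Value Office / Risk |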
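10. Natural-Language Analytics Risks
Natural-language analytics is not "friendlier BI". In retail financial services, it is an AI product that generates queries, explains metrics, and influences decisions.
10.1 Main Risks

| Risk | Typical manifestation | Consequence |
| --- | --- | --- |
| Metric ambiguity | "Active user" is interpreted under three definitions: login, transaction, AI usage | Executive decisions rest on the wrong KPI |
| Unapproved metric | The LLM improvises a metric from raw tables | An unofficial metric is treated as enterprise fact |
| Wrong grain | customer-level and account-level are mixed in one calculation | Double counting or omissions |
| Time misinterpretation | posted date, event date, and review date are confused | Wrong trend explanation |
| Join hallucination | The LLM guesses the join path | Severely wrong numbers |
| Row-level security bypass | Generated SQL bypasses entitlements | Data leakage |
| Small-cell disclosure | Slices that are too fine expose sensitive groups | Privacy and compliance risk |
| Prompt injection | A user or document induces the model to ignore policy | Unauthorized queries or wrong explanations |
| Cost explosion | Full scans of wide tables, group by over unbounded dimensions | Platform cost out of control |
| False causality | Correlation is explained as causation | Wrong business action |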
10.2 LLM-Generated SQL Guardrails
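| Guardrail | Design requirement | Release evidence |
| --- | --- | --- |
| Semantic-only query | The LLM may call only the metric API or approved semantic objects | raw table access disabled for assistant |
| Intent disambiguation | Must ask a clarifying question when the metric name, time, dimension, or filter is ambiguous | ambiguity eval pass >= 95% |
| SQL AST validation | Parse the generated SQL; restrict it to SELECT, approved functions, and approved joins | AST policy test suite |
| Row/column policy | Enforce RLS/CLS and small-cell suppression both before and after the query | entitlement test 100% pass |
| Query plan check | Limit scan volume, join count, time range, and group cardinality | cost guardrail dashboard |
| Metric citation | Answers must cite the metric contract version and query id | citation coverage >= 98% |
| Result sanity check | Compare against historical ranges, totals, and reconciliation metrics | anomaly block / warning |
| Explanation policy | Distinguish "data observation", "possible cause", and "verified cause" | causality language rubric |
| Saved answer review | Executive summaries and regulator-sensitive explanations go to human review | executive summary review log |
| Audit trace | Retain prompt, semantic objects, query, result hash, and response | trace retention policy |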
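
Two of these guardrails, semantic-only query and small-cell suppression, can be sketched without a SQL parser by validating a structured request instead of free-form SQL. Everything named below (`MetricPolicy`, `REGISTRY`, `validate_request`, `suppress_small_cells`, and the minimum cell size of 10) is a hypothetical illustration; the source does not specify a threshold or an API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPolicy:
    status: str
    allowed_dimensions: frozenset
    restricted_dimensions: frozenset


# Hypothetical registry snapshot; a real one would be served by the metric registry.
REGISTRY = {
    "aml.evidence_completeness_rate": MetricPolicy(
        status="production",
        allowed_dimensions=frozenset({"jurisdiction", "risk_tier", "typology"}),
        restricted_dimensions=frozenset({"customer_id", "account_id"}),
    ),
}
MIN_CELL_SIZE = 10  # assumed suppression threshold, not from the source


def validate_request(metric: str, dimensions: list) -> list:
    """Return policy violations; an empty list means the request may run."""
    policy = REGISTRY.get(metric)
    if policy is None:
        return [f"unapproved metric: {metric}"]
    violations = []
    if policy.status != "production":
        violations.append(f"metric not in production: {metric}")
    for dim in dimensions:
        if dim in policy.restricted_dimensions:
            violations.append(f"restricted dimension: {dim}")
        elif dim not in policy.allowed_dimensions:
            violations.append(f"unknown dimension: {dim}")
    return violations


def suppress_small_cells(rows: list, min_size: int = MIN_CELL_SIZE) -> list:
    """Mask the value of any group with fewer than min_size cases."""
    return [
        {**row, "value": None} if row["case_count"] < min_size else row
        for row in rows
    ]
```

The design choice is that the LLM emits a metric name and a dimension list, never SQL text, so the validator has a small closed vocabulary to check. AST validation of generated SQL would be a separate, heavier control.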
10.3 Natural-Language Analytics Release Gate
| Gate | Threshold | On failure |
| --- | --- | --- |
| Unauthorized metric usage | 0 | Block the release; tighten the metric retrieval scope |
| RLS / CLS breach | 0 | Disable the assistant; open a security incident |
| Ambiguous question answered without clarification | <= 2% | Tune the intent classifier and prompt |
| Wrong metric definition citation | 0 high-risk cases | Fix glossary retrieval and citation |
| Unsafe causality claim | 0 in executive / regulatory summaries | Add a causal language guardrail |
| Cost per successful query | Within the budget threshold | Optimize the query plan or restrict query scope |
| User trust defect rate | Below the agreed threshold in QA sampling | Narrow the beta scope; add training |
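
The numeric gates in this table can be evaluated mechanically from an eval run summary. In this sketch the thresholds come from the table, while the snake_case keys and the shape of the `observed` dictionary are assumptions; the two gates without a fixed number (cost and user trust) are left out.

```python
# Maximum allowed value per gate; keys are hypothetical names for the table rows.
GATES = {
    "unauthorized_metric_usage": 0,
    "rls_cls_breach": 0,
    "ambiguous_answered_without_clarification_rate": 0.02,
    "wrong_definition_citation_high_risk": 0,
    "unsafe_causality_claim_exec": 0,
}


def failed_gates(observed: dict) -> list:
    """A gate fails when its observed value exceeds the limit or is missing."""
    return sorted(
        name
        for name, limit in GATES.items()
        if observed.get(name, float("inf")) > limit
    )
```

An empty result means the release may proceed on these gates; any entry maps back to the failure handling in the table.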
11. Metrics Governance Operating Model
11.1 RACI
R = Responsible, A = Accountable, C = Consulted, I = Informed.

| Activity | Business Metric Owner | Data Owner | Semantic Layer Steward | AI PM | Data Product Manager | EvalOps | Architect | Risk / Compliance | Finance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Define business decision | A/R | C | I | R | C | C | C | C | C |
| Approve metric contract | A | C | R | R | C | C | C | C | C |
| Define semantic model | C | C | A/R | C | R | I | R | I | I |
| Approve source-of-truth | C | A/R | C | I | R | I | C | C | I |
| Enforce data contract | C | A/R | C | I | R | I | R | C | I |
| Define eval linkage | C | C | C | A/R | C | R | C | C | I |
| Approve AI consumption policy | C | C | R | A/R | C | C | R | A/R | I |
| Set release gates | A | C | C | R | C | A/R | C | A/R | C |
| Monitor metric quality | C | R | A/R | C | R | C | C | I | I |
| Run metric incident review | A | R | R | R | C | C | C | C | I |
| Approve value reporting | C | C | I | R | C | C | I | I | A/R |
11.2 Metric Approval Workflow
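| Stage | Entry condition | Core checks | Exit evidence |
| --- | --- | --- | --- |
| Draft | A clear business decision and owner exist | Does a similar metric already exist? Is official status needed? | draft contract |
| Design review | Formula, grain, dimensions, and time basis are complete | source-of-truth, semantic model, access policy | review notes |
| Data contract check | Input data sources are confirmed | schema, semantics, freshness, quality, SLA | data contract version |
| Backtest | Historical data can be computed | Reconciliation with existing metrics, anomaly explanation | backtest report |
| AI consumption review | The metric will be used by AI / NL analytics / eval | allowed use, prohibited use, guardrails | consumption policy |
| Release gate | Quality and risk criteria are met | SLO, lineage, RACI, monitoring | release memo |
| Operate | The metric enters production | freshness, drift, usage, incident | quality dashboard |
| Deprecate | A replacement metric exists or the metric is withdrawn for risk reasons | consumer impact, migration, communication | deprecation record |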
11.3 Release Gate Tiers

| Metric tier | Examples | Gate strength |
| --- | --- | --- |
| T0 Critical | Regulatory reporting, credit decision boundaries, customer funds, AI risk gate | risk/compliance approval, zero critical defect, lineage complete |
| T1 Executive | AI Value Office ROI, board dashboard, portfolio funding KPI | finance sign-off, backtest, glossary, owner approval |
| T2 Operational | AML case productivity, contact center efficiency, model operations | owner approval, quality SLO, monitoring |
| T3 Exploratory | analyst ad hoc analysis, pilot hypothesis | Marked experimental; may not be used as an executive answer |
12. Observability for Metric Quality
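Metric observability must cover three questions: is the number accurate, is the explanation accurate, and is AI consuming the metric correctly?

| Signal | Definition | Example trigger threshold | Action |
| --- | --- | --- | --- |
| Freshness | Metric delay relative to the source-of-truth | p95 > contract target | Show a stale warning; block high-risk answers |
| Completeness | Coverage of key fields and entities | critical field completeness < 99% | Pause official metric refresh |
| Accuracy / reconciliation | Reconciliation against authoritative systems or the control ledger | Difference > 0.5%, or amount difference above threshold | incident review |
| Semantic consistency | The same metric yields consistent results across tools | BI, API, SQL diff > tolerance | Disable downstream cache |
| Lineage completeness | Whether source, job, dataset, metric, and consumer are all captured | lineage missing for T0/T1 metric | release gate fail |
| Dimension cardinality anomaly | Abnormal change in dimension values | New category or sudden share change | source/process review |
| Metric drift | Definition, distribution, process, or policy drift | slice-level trend break | drift triage |
| AI query defect | The LLM uses the wrong metric or dimension | high-risk defect > 0 | assistant rollback |
| Usage anomaly | Abnormal consumers, query volume, or cost | executive metric sudden spike | audit trace review |
| Decision dependency | Which decisions depend on the metric | T0/T1 incident | decision owner notification |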
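
The freshness and reconciliation signals are simple enough to sketch directly. The nearest-rank percentile below is one common definition of p95 and is an assumption, as are the function names; the 0.5% tolerance and the 120-minute target echo figures used elsewhere in this playbook.

```python
def p95(values: list) -> float:
    """Nearest-rank 95th percentile of the observed delays."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no observations")
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def freshness_breach(delays_minutes: list, target_minutes: float = 120) -> bool:
    """True when the p95 delay exceeds the contract target."""
    return p95(delays_minutes) > target_minutes


def reconciliation_breach(
    metric_total: float, control_total: float, tolerance: float = 0.005
) -> bool:
    """True when the relative difference to the control total exceeds tolerance."""
    if control_total == 0:
        raise ValueError("control total must be non-zero")
    return abs(metric_total - control_total) / abs(control_total) > tolerance
```

Under these definitions a metric total of 1006 against a control total of 1000 breaches the 0.5% tolerance, while 1004 does not.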
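12.1 Metric Quality Dashboard Structure

| Panel | Metrics | Interpretation |
| --- | --- | --- |
| Contract health | contract coverage, owner coverage, approved status | Is the metric governance foundation complete? |
| Data health | freshness, null rate, schema breach, SLA breach | Do input data satisfy the contract? |
| Semantic health | duplicate metrics, join path errors, dimension policy violations | Is the semantic model reliable? |
| Lineage health | lineage completeness, consumer count, impact map | Is the impact scope explainable? |
| AI consumption health | NL query pass rate, metric citation rate, guardrail blocks | Is AI consuming metrics safely? |
| Eval linkage | eval metric to business metric coverage, release gate failures | Do evals connect to value and risk? |
| Drift | definition drift, distribution drift, policy drift | Do definitions or processes need review? |
| Incident | open incidents, MTTR, repeat incident count | Is operational capability mature? |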
13. Retail Financial Services Case Library
13.1 AI Value Office Dashboard
| Dimension | Design |
| --- | --- |
| Decision question | Which AI use cases should keep funding, stop, scale, or become platform capabilities? |
| Core metrics | funded use cases, release gate pass rate, benefit sign-off rate, adoption by target role, cost per successful task, critical incident count, platform reuse rate |
| Source-of-truth | portfolio backlog, finance benefit register, AI observability traces, risk gate records, platform billing |
| Metric contract risks | "Benefit" uses only estimated hours with no finance sign-off; "adoption" counts logins only, not workflow completion |
| Semantic layer notes | use_case, business_unit, risk_tier, stage, platform_capability, and benefit_type serve as conformed dimensions |
| Eval linkage | release gate pass rate must connect to eval report completeness, critical failure count, and incident history |
| Release gate | T1 executive metrics require joint approval from finance, risk, and the business owner |
| Portfolio artifact | AI portfolio metric contract pack + executive dashboard lineage map |

Example metrics:

| Metric | Definition | Gate |
| --- | --- | --- |
| ai_value.benefit_signed_off_rate | Use cases whose benefit evidence is signed by a finance delegate / live use cases | Below 80%, portfolio ROI may not be claimed |
| ai_value.cost_per_successful_task | Total AI run cost / tasks that passed quality standards and were adopted by users | Two consecutive periods of increase require a route/cost review |
| ai_value.platform_reuse_rate | Use cases reusing the standard gateway, RAG, eval, and observability / active use cases | Below 60%, review platform product-market fit |
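
The first two definitions above can be written out as plain functions to make the numerator and denominator explicit. The record shapes (`live`, `finance_signed_off`, `passed_quality`, `adopted`) are hypothetical field names, and the figures in the usage note are invented.

```python
def benefit_signed_off_rate(use_cases: list) -> float:
    """Live use cases with finance-signed benefit evidence / live use cases."""
    live = [u for u in use_cases if u["live"]]
    if not live:
        raise ValueError("no live use cases")
    return sum(1 for u in live if u["finance_signed_off"]) / len(live)


def cost_per_successful_task(total_run_cost: float, tasks: list) -> float:
    """Total AI run cost / tasks that passed quality and were adopted."""
    successful = sum(1 for t in tasks if t["passed_quality"] and t["adopted"])
    if successful == 0:
        raise ValueError("no successful tasks")
    return total_run_cost / successful


def may_claim_portfolio_roi(rate: float, threshold: float = 0.80) -> bool:
    """Gate from the table: below 80% sign-off, ROI may not be claimed."""
    return rate >= threshold
```

With four live use cases of which three are signed off, the rate is 0.75 and `may_claim_portfolio_roi` returns `False`. Note that a signed-off use case that is not live does not count toward either side of the ratio.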
13.2 AML Efficiency Metrics
| Dimension | Design |
| --- | --- |
| Decision question | Does AML Copilot reduce analyst workload without degrading risk detection quality? |
| Core metrics | case review cycle time, evidence completeness rate, false positive reduction, QA defect rate, missed red flag rate, unsupported narrative claim rate |
| Source-of-truth | AML alerts, case management, transaction monitoring, QA review, SAR narrative workflow |
| Metric contract risks | Case complexity is not sliced, so AI appears to lower average duration while handling only simple cases |
| Semantic layer notes | typology, risk_tier, jurisdiction, case_complexity, and AI_assisted_flag must be governed dimensions |
| Eval linkage | evidence recall, citation correctness, and narrative factuality map to QA defect and review cycle time |
| Guardrail | AI final SAR decision = 0; unsupported high-risk claim = 0 |
| Portfolio artifact | AML metric ladder + evidence completeness contract + release gate memo |

Example metric matrix:

| Metric | Definition | Required slices | Bad interpretation |
| --- | --- | --- | --- |
| aml.case_review_cycle_time_p50 | Median time from case assigned to analyst review complete | risk_tier, typology, AI_assisted_flag | Claiming efficiency gains without slicing by complexity |
| aml.evidence_completeness_rate | Share of QA sample cases with all mandatory evidence satisfied | jurisdiction, typology, analyst_team | Checking only that attachments exist, not whether the evidence supports the conclusion |
| aml.unsupported_narrative_claim_rate | Share of key claims in AI drafts that have no source record support | risk_tier, model_version, prompt_version | Treating language fluency as narrative quality |
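
The "bad interpretation" for cycle time is a case-mix effect that a small calculation makes concrete. The sketch below computes the median by slice on invented data in which AI-assisted cases are mostly simple ones; `p50_by_slice`, the `cycle_hours` field, and all values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def p50_by_slice(cases: list, keys: list) -> dict:
    """Median cycle time per combination of the given slice keys."""
    groups = defaultdict(list)
    for case in cases:
        groups[tuple(case[k] for k in keys)].append(case["cycle_hours"])
    return {slice_key: median(hours) for slice_key, hours in groups.items()}


def _case(complexity: str, ai_assisted: bool, hours: float) -> dict:
    return {
        "case_complexity": complexity,
        "AI_assisted_flag": ai_assisted,
        "cycle_hours": hours,
    }


# Invented data: AI handles mostly simple cases, analysts mostly complex ones.
CASES = (
    [_case("simple", True, 2)] * 3
    + [_case("complex", True, 10)]
    + [_case("simple", False, 2)]
    + [_case("complex", False, 10)] * 3
)

by_ai_flag = p50_by_slice(CASES, ["AI_assisted_flag"])
by_complexity_and_ai = p50_by_slice(CASES, ["case_complexity", "AI_assisted_flag"])
```

Sliced by `AI_assisted_flag` alone, the AI-assisted median is 2 hours against 10 hours without AI. Sliced by complexity as well, the medians are identical within each complexity level, so in this invented data the apparent gain is entirely case mix.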
13.3 Customer-Facing AI Complaint Metrics
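| Dimension | Design |
| --- | --- |
| Decision question | Does customer-facing AI reduce complaints and raise resolution rates without creating false promises or regulatory risk? |
| Core metrics | complaint rate per AI interaction, first contact resolution, reopen rate, escalation correctness, false promise rate, policy citation correctness, vulnerable customer escalation rate |
| Source-of-truth | contact center platform, complaint management, approved policy repository, customer interaction logs, QA reviews |
| Metric contract risks | Complaints are recorded in different channels; web chat, call center, and branch complaint definitions are inconsistent |
| Semantic layer notes | complaint_category, channel, product_line, customer_vulnerability_flag, AI_touchpoint, resolution_status |
| Eval linkage | tone compliance, policy adherence, and escalation correctness map to complaint reopen and QA defect |
| Guardrail | false promise = 0; internal-only policy exposure = 0; vulnerable customer escalation miss = 0 |
| Portfolio artifact | Customer-facing AI complaint metric glossary + NL analytics guardrail report |

Example metrics:

| Metric | Definition | Release use |
| --- | --- | --- |
| cx.ai_complaint_rate | Share of AI interactions followed by an attributable complaint within 7 days | Monitored weekly after launch; exceeding the control group threshold automatically freezes expansion |
| cx.escalation_correctness_rate | Share of scenarios requiring human escalation in which AI escalated correctly | High-risk release gate |
| cx.false_promise_rate | Share of interactions in which AI makes unauthorized commitments on fee waivers, credit, refunds, or deadlines | T0 guardrail; must be 0 |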
13.4 Credit Assistant Eval Metrics
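| Dimension | Design |
| --- | --- |
| Decision question | Does the credit assistant improve memo quality and analyst efficiency without crossing manual approval and model risk boundaries? |
| Core metrics | memo turnaround time, policy checklist completeness, reason code consistency, unsupported credit rationale rate, human override quality, decision boundary violation |
| Source-of-truth | loan application system, credit bureau attributes, income verification, credit policy repository, underwriting decision records |
| Metric contract risks | Historical approval outcomes are treated as the "correct answer", entrenching old bias; immature outcome windows inflate eval results |
| Semantic layer notes | loan_product, risk_band, channel, policy_version, exception_type, protected_attribute_review_slice |
| Eval linkage | policy adherence, reason code consistency, and factual grounding connect to memo QA defect and exception review time |
| Guardrail | AI automated approval / decline = 0; prohibited feature usage = 0 |
| Portfolio artifact | Credit assistant eval-to-business matrix + model risk metric pack |

Example metrics:

| Metric | Definition | Review note |
| --- | --- | --- |
| credit.policy_adherence_score | Expert score of whether the AI memo correctly applies the policy version in effect on the application date | The policy effective date must be in the semantic model |
| credit.decision_boundary_violation_count | Count of out-of-bounds behaviors in AI output, such as approval, decline, or credit-limit commitments | T0 guardrail |
| credit.reason_code_consistency_rate | Share of AI-drafted reason codes consistent with the final human memo / policy rule | Must not be equated with loan performance prediction accuracy |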
14. Templates / Portfolio Artifacts
The templates below use concrete example values; the goal is to build portfolio evidence, not to provide blank forms.
14.1 Metric Contract Artifact
# Metric Contract: aml.evidence_completeness_rate
## Identity
- Domain: Financial Crime / AML
- Metric tier: T1 Operational Risk
- Status: production
- Version: 2.1.0
- Accountable owner: AML Operations Director
- Responsible owner: Financial Crime Data Product Manager
- Review cadence: monthly, and immediately after AML policy change
## Business Decision
- Used to decide whether AML Copilot can expand from pilot analyst team to all Level 1 alert review teams.
- Used in AI release gate with `aml.unsupported_narrative_claim_rate` and `aml.case_review_cycle_time_p50`.
## Definition
- A reviewed AML case is complete when every mandatory evidence item in the applicable checklist version has a linked source record, a valid timestamp, and QA pass status.
- Metric formula: complete reviewed cases / total reviewed cases in QA sample.
- Grain: case_id + review_completion_date.
- Time basis: review completion date in business timezone.
## Inclusion / Exclusion
- Include: production AML alert cases completed by Level 1 analyst workflow.
- Exclude: test cases, migrated legacy cases without checklist version, legally restricted cases unavailable to AI.
## Dimensions
- Allowed: jurisdiction, risk_tier, typology, analyst_team, AI_assisted_flag, checklist_version.
- Restricted: customer_id, account_id, beneficial_owner_id.
## Source and Lineage
- Source-of-truth: AML case management system and evidence checklist service.
- Input datasets: aml_cases, aml_evidence_items, aml_checklist_versions, qa_review_results.
- Lineage required: source dataset -> curation job -> semantic model -> metric -> release gate report.
## Quality SLO
- Freshness P95: under 2 hours.
- Mandatory field completeness: 99.5% or higher.
- Case-to-evidence linkage accuracy: 99% or higher in QA sample.
- Lineage completeness: 100% for production report.
## AI Consumption Policy
- Allowed: AI Value Office dashboard, AML Copilot release report, analyst productivity dashboard.
- Prohibited: automatic SAR filing, automatic case closure, customer-level natural-language exploration.
## Release Gate
- Expansion gate fails if high-risk jurisdiction slice drops below 95%.
- Critical gate fails if unsupported high-risk narrative claim count is greater than 0.
## Incident
- Severity 1: metric used in release decision with incorrect numerator or denominator.
- Immediate action: freeze Copilot expansion, notify AML owner, rerun backtest, publish correction note.
14.2 Metric Glossary Entry
| Field | Example |
| --- | --- |
| Business term | First Contact Resolution |
| Approved definition | Share of interactions that require no second contact within 7 days of the customer's first contact and whose case resolution status is resolved. |
| Not the same as | Chatbot containment, agent handle time, same-day close rate |
| Official metric | cx.first_contact_resolution_rate |
| Time basis | initial interaction timestamp |
| Allowed dimensions | channel, product_line, issue_category, AI_touchpoint |
| Restricted dimensions | customer_id, protected vulnerability details |
| AI explanation rule | AI may explain trends but must distinguish correlation from verified root cause. |
| Owner | Customer Operations |
| Review trigger | New complaint categories, new channel launches, changes in regulatory definitions |
14.3 Data Contract
data_contract:
  asset: mart_aml_case_review
  version: 3.4.0
  owner: financial_crime_data_owner
  consumers:
    - metric: aml.evidence_completeness_rate
    - metric: aml.case_review_cycle_time_p50
    - eval_report: aml_copilot_release_gate
  schema:
    case_id:
      type: string
      required: true
      uniqueness: true
    review_completion_ts:
      type: timestamp
      required: true
    checklist_version:
      type: string
      required: true
    mandatory_evidence_count:
      type: integer
      required: true
      min: 1
    linked_evidence_pass_count:
      type: integer
      required: true
      min: 0
  semantics:
    review_completion_ts: analyst review completion, not QA completion
    checklist_version: version effective on review_completion_ts
  quality:
    freshness_p95_minutes: 120
    case_id_null_rate: 0
    checklist_version_null_rate: 0
    evidence_count_reconciliation_tolerance: 0.005
  governance:
    pii_class: no direct identifiers in this mart
    row_level_policy: analyst team and jurisdiction filters enforced upstream
    allowed_ai_use:
      - release_gate
      - portfolio_dashboard
      - operational_analytics
    prohibited_ai_use:
      - automatic_case_closure
      - customer_level_freeform_query
  breach_action:
    severity_1:
      - freeze AI expansion gates using this metric
      - notify AML Operations and AI PM
      - rerun affected reports after remediation
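
The schema rules in the contract above (required fields, minimum values, and `case_id` uniqueness) can be enforced with a small row-level check. This is a stdlib-only sketch under the assumption that rows arrive as dictionaries; `contract_violations` is a hypothetical helper, and a production setup would more likely use a dedicated data quality tool.

```python
# Rules transcribed from the contract's schema section.
REQUIRED = (
    "case_id",
    "review_completion_ts",
    "checklist_version",
    "mandatory_evidence_count",
    "linked_evidence_pass_count",
)
MINIMUMS = {"mandatory_evidence_count": 1, "linked_evidence_pass_count": 0}


def contract_violations(rows: list) -> list:
    """Return (row_index, message) pairs for every schema rule a row breaks."""
    violations = []
    seen_case_ids = set()
    for i, row in enumerate(rows):
        for field in REQUIRED:
            if row.get(field) is None:
                violations.append((i, f"{field} is required"))
        for field, minimum in MINIMUMS.items():
            value = row.get(field)
            if value is not None and value < minimum:
                violations.append((i, f"{field} below minimum {minimum}"))
        case_id = row.get("case_id")
        if case_id is not None:
            if case_id in seen_case_ids:
                violations.append((i, "case_id is not unique"))
            seen_case_ids.add(case_id)
    return violations
```

Any non-empty result would count as a contract breach; whether a single bad row triggers the severity_1 actions is a policy decision that the contract would need to state.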
14.4 Eval-to-Business Metric Matrix
| Use case | Eval metric | Workflow metric | Business metric | Interpretation rule |
| --- | --- | --- | --- | --- |
| AML Copilot | evidence recall >= 95% | QA defect rate decreases | cost per complete case decreases | A cycle time decrease counts as a benefit only if adoption is stable and QA defects do not rise |
| Customer AI | policy citation correctness >= 95% | reopen rate decreases | complaint cost decreases | Containment rising while complaints rise is judged a quality failure |
| Credit Assistant | decision boundary violation = 0 | memo turnaround decreases | analyst capacity increases | Faster approval must not be interpreted as lower credit risk |
| Value Office | release evidence completeness >= 98% | scale/stop cycle time decreases | capital allocation quality improves | ROI without finance sign-off does not enter executive benefit |
14.5 Natural-Language Analytics Risk Assessment
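| Question class | Example | Allowed response | Guardrail |
| --- | --- | --- | --- |
| Approved KPI | "Why did AML case cycle time drop this month?" | Use the official metric; cite the version and main slices | metric citation + causality caveat |
| Ambiguous KPI | "Is our AI adoption good?" | First ask whether adoption means login, weekly active user, accepted suggestion, or workflow completion | clarification required |
| Restricted slice | "List the regions and customers with the highest complaint rates among high-net-worth customers" | Refuse customer-level output; provide a compliant aggregate explanation | small-cell suppression + RLS |
| Exploratory analysis | "Did the new prompt reduce unsupported claims?" | Allow the pilot metric, marked experimental | beta label + no executive use |
| Causal claim | "Did complaints fall because of AI?" | State that a control group or intervention analysis is needed; offer correlational observations | causality language policy |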
14.6 Metric Release Gate Memo
| Section | Example content |
| --- | --- |
| Release item | cx.ai_complaint_rate enters production for customer-facing AI weekly dashboard |
| Decision | Approved for limited release to Customer Operations and AI Value Office |
| Evidence | Metric contract v1.0 approved; 12-week backtest complete; complaint source reconciliation difference < 0.2%; RLS tests 100% pass |
| Risk | Complaint attribution window may over-credit AI when multiple channels touch same customer |
| Mitigation | Use AI_touchpoint and last_touch_channel; publish attribution caveat in glossary |
| Gate | false promise rate = 0; escalation correctness >= 98%; policy citation correctness >= 95% |
| Owner | Customer Operations accountable; CX Data Product responsible; Risk consulted |
| Next review | First monthly review after two reporting cycles |
14.7 Portfolio Evidence Package
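A portfolio can present one complete case rather than only charts:

| Artifact | Capability it demonstrates |
| --- | --- |
| Metric glossary | Can turn business language into governable metric language |
| Semantic model spec | Can design entity, dimension, measure, metric, and join path |
| Metric contract pack | Can define metrics as product contracts |
| Data contract | Can manage upstream change, quality, and SLA |
| Lineage map | Can explain the evidence chain from source system to AI output |
| Eval-to-business matrix | Can connect AI quality, process metrics, and business value |
| NL analytics risk assessment | Can identify the risks of LLM-generated SQL and natural-language explanation |
| Release gate memo | Can bring metric quality into launch and expansion decisions |
| Metric quality dashboard mock | Can operate freshness, drift, incident, and consumer impact |
| Interview narrative | Can use retail financial services cases to explain architecture, governance, and product trade-offs |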
15. 30-Day Lab:做一个作品集级 Metric Architecture Case
目标:30 天内完成一个“AI Semantic Layer / Metrics Architecture”作品集包。推荐选 AML Copilot、Customer-facing Complaint AI、Credit Assistant 或 AI Value Office Dashboard。
周期 训练重点 产出 Week 1 选择场景,定义业务决策、metric ladder、核心 official metrics business decision map、Eval-to-Business Metric Matrix Week 2 设计 semantic model、metric contracts、glossary、data contract semantic model spec、3 个 metric contracts、glossary Week 3 设计 lineage、NL analytics guardrails、release gates lineage map、LLM SQL guardrail policy、release gate memo Week 4 设计 observability、drift triage、RACI、portfolio narrative metric quality dashboard、RACI、case study deck outline
30-day acceptance criteria:
Capability | Verifiable evidence
Metric contract | Every official metric has owner, formula, grain, dimensions, quality SLO and AI consumption policy
Semantic modeling | Can explain the trade-offs among entity, dimension, measure, metric and join path
Lineage governance | Can trace from the source system to the dashboard, eval report and AI answer
EvalOps linkage | Every eval metric maps to a workflow metric and a business metric
Risk control | NL analytics and LLM SQL have explicit guardrails and release gates
Operations | Has a metric quality dashboard, drift triage and incident response
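The metric contract criterion lends itself to an automated completeness check. The following is a minimal sketch under assumptions: the field names mirror the criterion, but the dict-based contract layout, the formula string and the SLO value are hypothetical and are not taken from a real registry.

```python
# Required fields mirror the metric contract acceptance criterion.
REQUIRED_FIELDS = (
    "owner",
    "formula",
    "grain",
    "dimensions",
    "quality_slo",
    "ai_consumption_policy",
)


def missing_contract_fields(contract):
    """List required fields that are absent or empty in a metric contract."""
    return [name for name in REQUIRED_FIELDS if not contract.get(name)]


# Hypothetical contract; formula and SLO values are illustrative assumptions.
contract = {
    "name": "cx.ai_complaint_rate",
    "owner": "Customer Operations",
    "formula": "complaints / ai_touchpoints",
    "grain": "week",
    "dimensions": ["region", "channel"],
    "quality_slo": {"freshness_hours": 24},
    # ai_consumption_policy is intentionally left out to show a failing check.
}
missing = missing_contract_fields(contract)
```

A check like this could run in the approval workflow so that an incomplete contract never reaches official status.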
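16. Interview Talking Points
16.1 30-Second Version
In the AI era, the semantic layer is not an upgraded BI semantic layer; it is the control plane for metric facts. It turns a metric definition into a product contract with an owner, formula, grain, time convention, dimensions, source-of-truth, data contract, lineage, quality SLO, AI consumption policy and release gate. Only then can BI, natural-language analytics, AI assistants, eval harnesses and the AI Value Office dashboard consume the same set of trusted metrics.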
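16.2 2-Minute Version
I would split the metric architecture into three threads.
The first is the semantic thread: confirm the business decision and the official metric first, then design the semantic model, including entity, dimension, measure, metric, time spine, join path and glossary. A metric cannot live only in dashboard SQL; it has to enter the metric registry with versioning and an approval workflow.
The second is the evidence thread: every metric must connect to a data contract and lineage. Upstream data needs schema, semantics, freshness, quality and SLA; lineage must trace from source record, pipeline run, curated dataset and semantic model through to metric, dashboard, eval report and AI answer. When a metric behaves abnormally, this enables impact analysis instead of manual guesswork.
The third is the AI consumption thread: natural-language analytics and LLM-generated SQL may use only approved semantic objects, and must pass SQL AST validation, RLS/CLS, small-cell suppression, query plan check, metric citation and causality language policy. EvalOps must also connect AI behavior metrics to workflow metrics and business metrics, so that a rising model score with no business improvement does not go unnoticed.
In financial retail, I would set stronger release gates for scenarios such as AML, complaints and credit; for example, guardrails such as unsupported high-risk claim, false promise and automated credit decision must be 0.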
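To make the evidence thread concrete, the impact analysis it describes can be sketched as a breadth-first walk over a lineage graph. The node names and edges below are hypothetical; a real deployment would read them from a lineage store rather than a hard-coded dict.

```python
from collections import deque

# Hypothetical lineage edges: upstream node -> direct downstream consumers.
LINEAGE = {
    "source:complaints_crm": ["job:load_complaints"],
    "job:load_complaints": ["dataset:curated_complaints"],
    "dataset:curated_complaints": ["semantic:complaint_model"],
    "semantic:complaint_model": ["metric:cx.ai_complaint_rate"],
    "metric:cx.ai_complaint_rate": [
        "dashboard:ai_value_office",
        "eval_report:weekly",
        "ai_answer:nl_analytics",
    ],
}


def downstream_impact(start, lineage=LINEAGE):
    """Return every node reachable downstream of `start`, in BFS order."""
    impacted = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in visited:
                visited.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Which consumers are affected if the curated dataset is late or wrong?
impacted = downstream_impact("dataset:curated_complaints")
```

Starting the walk at the curated dataset lists the semantic model, the metric and all three consumers, which is the list an incident owner would need to notify.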
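16.3 Quick Answers to Advanced Follow-up Questions
Question | Key points
What is the difference between a semantic layer and a metric store? | The semantic layer handles business semantics, entity, dimension, measure, join path and the metrics API; a metric store leans toward metric definition and query serving. Enterprise AI scenarios usually need registry, glossary, data contract, lineage, policy and observability together to form a control plane.
Why is a metric definition a product contract? | Because it commits to who may use a number for which decision, and covers owner, formula, grain, time, exclusions, quality SLO, change process, AI usage boundary and incident response.
How do you prevent the LLM from writing wrong SQL? | Do not let the LLM query raw tables directly. Have it query approved metrics through the semantic API; then add intent disambiguation, AST allowlist, RLS/CLS, query plan limit, metric citation, result sanity check and audit trace.
How do eval metrics connect to business metrics? | Use a metric ladder: business outcome -> workflow metric -> AI eval metric -> data/feature metric -> platform metric -> guardrail. For example, AML evidence recall should connect to QA defect, case cycle time and cost per complete case.
How do you handle metric drift? | First distinguish definition, data, source, process, population, policy, model behavior and user behavior drift. Each type has a different owner and action; do not attribute drift to the model by default.
How do you decide build vs buy? | Official metrics and high-risk AI need a stable semantic layer, data contracts, lineage and governance. Reuse mature tools first; a custom metric API is needed only in real-time, complex-permission, regulatory or low-latency scenarios.
What matters most in complaint metrics for customer-facing AI? | Look beyond complaint volume to attribution window, first contact resolution, reopen, false promise, policy citation, escalation correctness and vulnerable customer handling. False promise and internal policy leakage are hard guardrails.
Why can a credit assistant not be judged on accuracy alone? | A credit assistant is usually decision support and should not approve automatically. Look at policy adherence, reason code consistency, factual grounding, decision boundary violation, protected attribute review slice and the quality of human overrides.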
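The metric ladder answer can be backed by a simple linkage check that flags eval metrics with no workflow or business metric attached. This is a sketch under assumptions: the metric identifiers are invented for illustration, and assigning QA defect and case cycle time to the workflow level and cost per complete case to the business level is one plausible reading of the AML example, not something the playbook fixes.

```python
from dataclasses import dataclass, field


@dataclass
class EvalMetricLink:
    eval_metric: str
    workflow_metrics: list = field(default_factory=list)
    business_metrics: list = field(default_factory=list)


def unlinked_eval_metrics(links):
    """Return eval metrics missing a workflow link or a business link."""
    return [
        link.eval_metric
        for link in links
        if not link.workflow_metrics or not link.business_metrics
    ]


# Hypothetical identifiers based on the AML example in the table.
links = [
    EvalMetricLink(
        "aml.evidence_recall",
        workflow_metrics=["aml.qa_defect_rate", "aml.case_cycle_time"],
        business_metrics=["aml.cost_per_complete_case"],
    ),
    # No business metric attached yet, so this one should be flagged.
    EvalMetricLink(
        "aml.unsupported_claim_rate",
        workflow_metrics=["aml.qa_defect_rate"],
    ),
]
orphans = unlinked_eval_metrics(links)
```

An eval metric that shows up in `orphans` can improve without anyone being able to say which workflow or business number should move with it.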
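17. Common Misconceptions
Misconception | Why it is dangerous | Better practice
Treating the semantic layer as a report acceleration layer | Ignores AI consumption, eval, risk and lineage | Design it as a metrics control plane
Writing metric definitions only in natural language | LLMs and APIs cannot execute them reliably | Natural-language definition + machine-readable formula + semantic model
Letting the LLM query the warehouse directly | Join, permission, definition and cost risks become uncontrollable | Semantic-only query + guardrails
Checking only that dashboard numbers agree | AI also needs to cite the contract, explain the definition and trace lineage | Establish metric citation and trace
Governing only executive KPIs | AI eval and workflow metrics also affect release decisions | Tiered governance across T0/T1/T2
Leaving drift to the ML team alone | Metric drift can come from policy, process, data or user behavior | Drift triage with an owner per drift type
ROI metrics without finance sign-off | Portfolio or executive reporting lacks credibility | Establish benefit signed-off metrics
Using historical credit outcomes as the only eval truth | May entrench historical bias and immature outcomes | Separate policy adherence, decision boundary and long-term outcome
Judging customer service AI on containment alone | May sacrifice customer experience and compliance | Also track complaints, reopen, false promise and escalation
No deprecation process | Old metrics keep being cited by AI | Replacement, consumer notice, incident withdrawal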
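18. One-Page Summary
The core of AI Semantic Layer / Metrics Architecture is upgrading metrics from “query results” to “product contracts that are governable, observable and safe for AI to consume”.
From | To
Dashboard SQL | Metric contract + semantic model
Field descriptions | Metric glossary + machine-readable formula
Table-level lineage | Source -> job -> dataset -> semantic model -> metric -> AI answer
Data quality checks | Metric quality SLO + drift + incident
Natural-language data lookup | Governed NL analytics with semantic API and guardrails
Model scores | Eval-to-business metric ladder
One-off launch | Approval workflow + release gate + observability
Data team as sole owner | RACI across business owner, data owner, AI PM, EvalOps, Risk and Finance
The portfolio message can be condensed into one sentence: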
I design metrics as governed product contracts, so AI systems can reason over business performance without inventing definitions, bypassing controls or breaking trust.