
AI Semantic Layer / Metrics Architecture Playbook


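Audience: AI PMs, AI Architects, Data Product Managers, EvalOps Leads, AI Value Office Leads, and heads of retail banking data platforms.

Core question: When AI assistants, natural-language analytics, EvalOps, business dashboards and executive portfolio governance all depend on "metrics", how do you guarantee that every metric has a consistent definition, traceable lineage, a clear owner, a quality SLO, release gates and risk controls?

Learning goal: Elevate metric definitions from "report field descriptions" to "product contracts", and design a semantic layer / metrics architecture that BI, LLMs, AI agents, eval harnesses and governance processes can all consume.

Scope: This playbook does not cover basic BA requirements analysis and is not a tool tutorial. It focuses on architecture decisions, product governance, metric risk, platform selection and evidence packs you can put in a portfolio.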


Source Anchors

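These sources serve as architecture and governance anchors; they do not constitute legal, regulatory or vendor-selection advice. The table keeps only official entry points that are currently accessible, so that a portfolio can cite them over the long term.

| Source | Official / primary source | How this playbook uses it |
| --- | --- | --- |
| dbt Semantic Layer | https://docs.getdbt.com/docs/use-dbt-semantic-layer/dbt-sl | Product and platform anchor for "a metrics semantic layer exposes metrics as a unified consumption interface". |
| dbt MetricFlow | https://docs.getdbt.com/docs/build/about-metricflow | Design anchor for semantic models, metrics, dimensions, entities, SQL generation and centralized definitions. |
| dbt Semantic Models | https://docs.getdbt.com/docs/build/semantic-models | Modeling anchor for entities, dimensions, measures and the semantic graph. |
| OpenLineage | https://openlineage.io/docs/ | Lineage anchor for datasets, jobs, runs, facets and cross-tool lineage events. |
| OpenMetadata Data Contracts | https://docs.open-metadata.org/v1.12.x/how-to-guides/data-contracts | Data contract anchor for schema, semantics, data quality, SLA and execution history. |
| W3C DCAT 3 | https://www.w3.org/TR/vocab-dcat-3/ | Catalog-standard anchor for datasets, data services, catalogs and metadata interoperability. |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize AI metric risk, evals, monitoring and governance evidence. |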

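1. Positioning: the metrics semantic layer is the fact control plane for AI products

Traditional metrics governance usually lives inside BI, data warehouse or data governance teams. In the AI era that boundary is no longer enough, because the same metrics are now consumed by many more systems:

  • Executives use the AI Value Office dashboard to decide which AI use cases to keep funding, stop or scale.
  • AI PMs use business metrics and eval metrics to prove whether an agent actually improves a workflow.
  • An LLM analytics assistant generates queries from natural language, explains trends and answers "why did this drop?".
  • EvalOps uses metric slices to find quality differences in the model across regions, channels, customer segments and risk tiers.
  • Risk, audit and regulatory communication need to see metric definitions, lineage, change history and release gates.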

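In one sentence:

An AI semantic layer is the governed contract between business meaning, data implementation, AI evaluation and executive decision-making.

Without metric contracts, AI systems amplify ambiguous definitions:

| Common symptom | Risk once AI amplifies it | Architectural root cause |
| --- | --- | --- |
| The same KPI shows different numbers on different dashboards | The AI assistant gives contradictory business explanations | No unified semantic model or metric registry |
| Metric definitions live in report descriptions | LLM-generated SQL bypasses the official definition | Metrics are not machine-executable contracts |
| Metric lineage is unclear | After an incident, nobody can tell whether the source system, a transformation, permissions or the model's interpretation was wrong | No dataset/job/run-level lineage |
| Metric ownership is unclear | Definition changes go unapproved and the AI eval gate stops working | No RACI or approval workflow |
| Only model scores are tracked | The AI system passes evals but business value does not improve | Eval metrics are not linked to workflow metrics and business metrics |
| Natural-language analytics queries the warehouse directly | Permission bypass, wrong joins, aggregation leaks, hallucinated SQL | No semantic API or LLM SQL guardrails |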

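2. Advanced framework: the Metric Contract Stack

A metrics architecture that serves AI products and governance is not just a semantic layer tool; it is a nine-layer contract stack.

| Layer | Core object | Key question | Artifact |
| --- | --- | --- | --- |
| 1. Business outcome | Business result | Which decision, workflow or risk control does this metric serve? | Outcome map, decision memo |
| 2. Metric contract | Metric definition | Are the formula, grain, time window, exclusions and source-of-truth explicit? | Metric Contract |
| 3. Semantic model | Semantic model | Are entities, dimensions, measures, metrics and join paths stable? | Semantic Model Spec |
| 4. Data contract | Input data contract | Are schema, semantics, quality, freshness and SLA enforceable? | Data Contract |
| 5. Lineage | Lineage | Can the impact scope be traced from source record to metric output? | Lineage Map |
| 6. Consumption policy | Consumption policy | Which uses are allowed for BI, AI, LLM SQL, eval, export and API? | Consumption Policy |
| 7. Eval linkage | Eval linkage | How do business, workflow, eval and feature metrics explain one another? | Eval-to-Business Matrix |
| 8. Release gates | Release gates | Which metric changes block a model, dashboard or natural-language analytics release? | Release Gate Memo |
| 9. Observability | Operational observability | Are metric freshness, accuracy, drift, usage and incidents monitored continuously? | Metric Quality Dashboard |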

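2.1 Decision principles

| Principle | Meaning | Anti-pattern |
| --- | --- | --- |
| Metric is a product contract | A metric definition has an owner, version, SLO, change process and consumer list | The metric is just a SQL snippet copied into several dashboards |
| Semantic model before natural language | The LLM may only consume approved semantic objects, never interpret raw tables | Letting the LLM guess the difference between customer_id, client_id and party_id |
| Lineage before blame | When a metric looks wrong, trace lineage and changes first, then decide whether the model or the business is the cause | Asking the data team to dig through SQL ad hoc every time a dashboard fluctuates |
| Eval metric must map to business metric | An offline quality score must explain a workflow improvement or a risk reduction | The judge score goes up but complaint rate, AHT and case defects do not move |
| Drift is a governance event | Metric definitions, data distributions, user behavior and policy can all drift | Treating drift purely as a model-monitoring problem |
| AI consumption needs policy | Each metric declares whether it may be explained by an LLM, used to generate SQL, exported, or used for training/eval | Every metric enters the chat assistant's context by default |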

3. Core objects: do not conflate Metric, Measure, Feature and Eval

In AI products the word "metric" is often stretched too far. In interviews and architecture reviews, state the boundaries between these objects clearly.

| Object | Definition | Examples | Owner | Main consumers |
| --- | --- | --- | --- | --- |
| Entity | Business entity or join anchor | customer, account, case, complaint, loan_application | Domain data owner | semantic layer, BI, AI retrieval |
| Dimension | Slicing dimension | product_line, region, risk_tier, channel, complaint_category | Data steward | BI, eval slicing, AI explanation |
| Measure | Aggregatable base quantity | transaction_amount, case_count, handle_minutes | Data engineering / analytics | metrics, semantic model |
| Metric | Quantity with a business definition | AML case cycle time, AI adoption rate, complaint escalation defect rate | Business metric owner | dashboard, AI assistant, Value Office |
| Feature | Model input signal | customer_txn_30d_count, income_verification_confidence | ML / risk analytics owner | model, routing, risk scoring |
| Eval metric | AI behavior quality metric | citation correctness, unsupported claim rate, policy adherence | EvalOps / AI PM | release gate, monitoring |
| Business metric | Business outcome metric | cost per case, first contact resolution, loss prevented | Business owner / finance | portfolio decision, funding gate |
| Guardrail metric | Safety / compliance threshold | PII leakage = 0, unauthorized decision = 0 | Risk / compliance | release gate, incident |

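3.1 Key separations

| Often confused | How to separate them | Why it matters for decisions |
| --- | --- | --- |
| Model accuracy vs business value | Accuracy measures AI behavior quality; value requires workflow metrics and finance sign-off | Prevents "the model got better but the business didn't change" |
| Feature freshness vs metric freshness | Feature freshness affects model inputs; metric freshness affects business decisions | The two trigger different degradation actions |
| Metric definition vs SQL implementation | The definition is a product contract; the SQL is its implementation | Change approval reviews the definition; regression tests check the implementation |
| Dashboard count vs official KPI | Dashboards are for exploration; an official KPI needs a contract, an owner and an audit trail | Stops exploratory numbers from being treated by AI as official facts |
| Eval slice vs business segment | Eval slices exist to find model failures; business segments exist for business decisions | Names may look alike, but approval and interpretation differ |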

4. Reference architecture: AI Metrics Control Plane

```mermaid
flowchart TB
  subgraph Sources[Source Systems]
    CORE[Core banking / ledger]
    CRM[CRM / customer profile]
    AML[AML case system]
    COMPLAINT[Complaint / contact center]
    CREDIT[Credit underwriting]
  end

  subgraph Contracts[Data Contracts and Catalog]
    DC[Data contracts]
    CAT[Catalog / DCAT metadata]
    LIN[OpenLineage events]
  end

  subgraph DataLayer[Curated Data Layer]
    CUR[Curated marts]
    DQ[Quality checks]
    PII[PII / access policy]
  end

  subgraph Semantic[Semantic Layer]
    SM[Semantic models]
    MR[Metric registry]
    GL[Metric glossary]
    API[Metrics API / governed SQL]
  end

  subgraph AI[AI Consumption]
    NLA[Natural-language analytics]
    EVAL[Eval harness]
    AGENT[AI assistants / agents]
    VALUE[AI Value Office dashboard]
  end

  subgraph Ops[Governance and Observability]
    RACI[Ownership / RACI]
    GATE[Approval and release gates]
    OBS[Metric quality observability]
    INC[Incident loop]
  end

  Sources --> DC
  Sources --> LIN
  DC --> CUR
  CAT --> SM
  LIN --> SM
  CUR --> DQ
  DQ --> SM
  PII --> API
  SM --> MR
  MR --> GL
  MR --> API
  API --> NLA
  API --> EVAL
  API --> AGENT
  API --> VALUE
  RACI --> GATE
  GATE --> MR
  OBS --> INC
  OBS --> GATE
  LIN --> OBS
```

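4.1 Platform capability checklist

| Capability | What it must provide | Signs of immaturity |
| --- | --- | --- |
| Metric registry | Metric definition, owner, version, status, consumers, change history | Metrics scattered across Confluence, SQL and dashboard annotations |
| Semantic API | Queries by metric, dimension, time and filter without exposing raw tables | The LLM generates arbitrary warehouse SQL |
| Data contract enforcement | Testable schema, semantics, quality, freshness and SLA | An upstream field rename is discovered only when dashboards and evals break |
| Lineage capture | source -> job -> dataset -> semantic model -> metric -> consumer | Lineage reaches tables but not metrics or AI outputs |
| Glossary | Business definitions that are human-readable and machine-retrievable | Customer service, risk and finance each interpret the same metric differently |
| Approval workflow | Full lifecycle: draft, review, release, deprecate, incident | New metrics go live on a Slack approval |
| Metric observability | freshness, completeness, reconciliation, drift, usage, cost | Only pipeline success/failure is monitored |
| AI guardrails | Metric consumption policies for NL analytics, LLM SQL, RAG citation and eval gates | AI can answer with unapproved metrics or sensitive slices |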

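5. Metric Definition as Product Contract

A metric contract is not a field dictionary. It must answer: who may make which decisions based on this number, and who is responsible if the number is wrong?

5.1 Minimal Metric Contract

| Contract field | Required content | Example: AML Evidence Completeness Rate |
| --- | --- | --- |
| Metric name | Stable name and namespace | aml.evidence_completeness_rate |
| Business decision | The decision this metric supports | Whether AML Copilot can expand to more analyst teams |
| Definition | Business definition | Share of completed cases in which every item on the key evidence checklist is satisfied |
| Formula | Executable formula | complete_cases / reviewed_cases |
| Numerator | Numerator definition | Cases within reviewed_cases where every mandatory evidence item has a source record and has passed QA |
| Denominator | Denominator definition | Cases that completed analyst review within the period and entered the QA sample |
| Grain | Grain | case_id + review_date |
| Time window | Time basis | By review completion date, rolling weekly / monthly |
| Exclusions | Exclusions | legally restricted cases, migrated legacy cases, test cases |
| Dimensions | Allowed slices | jurisdiction, risk_tier, typology, analyst_team, AI_assisted_flag |
| Source-of-truth | Authoritative source | AML case system + evidence checklist service |
| Data contract | Input contract | case_id 100% non-null; checklist version must match review_date |
| Quality SLO | Quality target | freshness P95 < 2 hours, source linkage accuracy >= 99% |
| AI consumption | Permitted AI uses | May be used in dashboards, eval reports and Value Office summaries; must not be used to close cases automatically |
| Release gate | Threshold | The AI expansion gate fails when evidence completeness in high-risk jurisdictions is < 95% |
| Owner | Accountability | AML operations accountable; financial crime data owner responsible |
| Change policy | Change rule | A change to mandatory items is a major version and requires risk approval |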
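To make the contract fields above concrete, here is a minimal Python sketch that stores a few of them and computes `complete_cases / reviewed_cases` after applying the exclusions. The class and field names (`MetricContract`, `ReviewedCase`, `items_with_source_and_qa_pass`) and the toy cases are hypothetical illustrations, not a registry API. For brevity the 95% gate is applied to the whole toy sample rather than only to the high-risk jurisdiction slice the contract names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class MetricContract:
    # Hypothetical in-memory shape for a few contract fields; not a real registry schema.
    name: str
    formula: str
    grain: tuple
    allowed_dimensions: frozenset
    excluded_case_types: frozenset
    release_gate_min: float  # expansion gate fails below this rate

@dataclass(frozen=True)
class ReviewedCase:
    case_id: str
    review_date: date
    case_type: str                     # e.g. "production", "test", "migrated_legacy"
    in_qa_sample: bool
    mandatory_items: int
    items_with_source_and_qa_pass: int

def evidence_completeness_rate(contract: MetricContract,
                               cases: list[ReviewedCase]) -> Optional[float]:
    """complete_cases / reviewed_cases, after applying the contract's exclusions."""
    denominator = [c for c in cases
                   if c.in_qa_sample and c.case_type not in contract.excluded_case_types]
    if not denominator:
        return None  # undefined: report "no data" rather than 0% or 100%
    complete = sum(1 for c in denominator
                   if c.items_with_source_and_qa_pass == c.mandatory_items)
    return complete / len(denominator)

contract = MetricContract(
    name="aml.evidence_completeness_rate",
    formula="complete_cases / reviewed_cases",
    grain=("case_id", "review_date"),
    allowed_dimensions=frozenset({"jurisdiction", "risk_tier", "typology",
                                  "analyst_team", "AI_assisted_flag"}),
    excluded_case_types=frozenset({"legally_restricted", "migrated_legacy", "test"}),
    release_gate_min=0.95,
)
cases = [  # toy data
    ReviewedCase("C1", date(2024, 5, 6), "production", True, 4, 4),   # complete
    ReviewedCase("C2", date(2024, 5, 6), "production", True, 4, 3),   # one item missing
    ReviewedCase("C3", date(2024, 5, 7), "test", True, 4, 4),         # excluded: test case
    ReviewedCase("C4", date(2024, 5, 7), "production", False, 4, 4),  # not in QA sample
]
rate = evidence_completeness_rate(contract, cases)
gate_passes = rate is not None and rate >= contract.release_gate_min
```

Returning `None` for an empty denominator is a deliberate choice in this sketch: the gate then fails closed on "no data" instead of reading an undefined rate as 0% or 100%.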

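5.2 What a metric contract adds beyond an ordinary KPI definition

| Added capability | Why AI scenarios need it | Evidence |
| --- | --- | --- |
| Machine-readable formula | LLMs and eval harnesses must query consistently instead of guessing from natural language | semantic model / MetricFlow spec |
| Consumption policy | Different AI scenarios carry different risks | allowed AI use, prohibited use |
| Lineage | Locate the blast radius after an error | OpenLineage events, consumer map |
| Release gate | A failing metric must block a release or an expansion | gate memo, quality dashboard |
| Drift policy | Definitions and distributions change over time | drift signal, review cadence |
| RACI | A metric is not the data team's sole responsibility | ownership table, approval history |
| Incident playbook | A wrong metric affects AI decisions | severity, rollback, communication |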

6. Semantic Model Design Patterns

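6.1 Core elements of a semantic model

| Element | Design focus | Retail banking example |
| --- | --- | --- |
| Entity | Define join anchors and uniqueness | customer, account, transaction, case, complaint, loan_application |
| Measure | Keep measures simple, aggregatable and semantically uncontroversial | case_count, handle_minutes, dispute_amount |
| Dimension | Govern high-value slices first | risk_tier, channel, product_line, jurisdiction |
| Metric | Define complex logic through the business contract | first_contact_resolution_rate, aml_false_positive_reduction |
| Time spine | Unified time windows and calendar | fiscal month, business day, regulatory reporting period |
| Join path | Control many-to-many joins and double counting | customer-account-transaction, complaint-contact-case |
| Access policy | Metric-, dimension-, row- and column-level policies | high-net-worth customers, sensitive regions, data on minors |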
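The "Join path" row is easiest to see with a toy fan-out. If you join a measure recorded once per account onto a table with one row per transaction, the account value repeats once per transaction. The sketch below uses made-up in-memory rows, with no warehouse or semantic-layer API involved. It shows the inflated sum, then one safe alternative: aggregate each measure at its own grain before rolling up to the customer.

```python
from collections import defaultdict

# Toy rows (hypothetical): dispute_amount is recorded once per account (grain = account_id) ...
accounts = [
    {"account_id": "A1", "customer_id": "K1", "dispute_amount": 100.0},
    {"account_id": "A2", "customer_id": "K1", "dispute_amount": 50.0},
]
# ... while transactions have many rows per account (grain = txn_id).
transactions = [
    {"txn_id": "T1", "account_id": "A1"},
    {"txn_id": "T2", "account_id": "A1"},
    {"txn_id": "T3", "account_id": "A1"},
    {"txn_id": "T4", "account_id": "A2"},
]

def naive_join_sum(accounts, transactions):
    """Join first, then sum: each account's amount is repeated once per transaction."""
    by_account = {a["account_id"]: a for a in accounts}
    return sum(by_account[t["account_id"]]["dispute_amount"] for t in transactions)

def grain_safe_customer_totals(accounts, transactions):
    """Aggregate each measure at its own grain, then roll both up to customer level."""
    owner = {a["account_id"]: a["customer_id"] for a in accounts}
    disputes, txn_counts = defaultdict(float), defaultdict(int)
    for a in accounts:
        disputes[a["customer_id"]] += a["dispute_amount"]
    for t in transactions:
        txn_counts[owner[t["account_id"]]] += 1
    return dict(disputes), dict(txn_counts)

inflated = naive_join_sum(accounts, transactions)  # 100 * 3 + 50 * 1
disputes, txn_counts = grain_safe_customer_totals(accounts, transactions)
```

A semantic layer prevents this by declaring the grain of each measure and the allowed join paths, so the query generator never produces the naive form.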

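6.2 Semantic layer architecture options

| Option | Best fit | Strengths | Risks | Recommendation |
| --- | --- | --- | --- | --- |
| Central semantic layer | Enterprise-wide KPIs, AI Value Office, regulatory metrics | Strong consistency; simple for AI to consume | The central team becomes a bottleneck | Use for official metrics and high-risk AI scenarios |
| Domain-owned semantic models | Fast iteration across many business domains | Close to business owners; accurate domain semantics | Naming and dimensions may fragment | Constrain with federated governance and a shared glossary |
| BI-tool semantic model | Consumption happens mainly in a single BI tool | Fast to deliver | Weak support for AI, eval and API consumption | Not suitable as the sole semantic layer for enterprise AI |
| Custom metric API | Highly customized needs, complex permissions, low latency | High degree of control | High maintenance cost; reinvents the wheel | Use only for regulatory, real-time risk or special-SLA scenarios |
| Hybrid | The enterprise norm | Official metrics centralized, domain metrics distributed | Requires strong governance | Recommended: central registry + domain semantic models + shared contracts |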

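6.3 Design decision table

| Decision | Recommended default | Exception |
| --- | --- | --- |
| Metric naming | domain.metric_name | Cross-domain metrics use enterprise.metric_name |
| Time basis | Every metric contract explicitly declares event date, posted date or review date | Regulatory reporting uses the regulatory reporting date |
| Status filter | Only business-final states enter an official metric | Operational process monitoring may use in-progress metrics |
| Dimension authorization | Low-risk dimensions are queryable by default; sensitive dimensions require a policy | customer-level, protected attribute, small cell size |
| Experimental metrics | experimental status; excluded from the default context of AI answers | May enter a pilot eval with owner approval |
| Deprecation | At least one reporting cycle, with a replacement provided | A wrong metric can be withdrawn immediately with an incident note |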
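The naming and experimental-status rules can be enforced mechanically when a metric is registered. The sketch below makes two assumptions, neither a standard: names are lowercase snake_case of the form `domain.metric_name`, and only the `production` status enters the default context of an AI answer. The helper names are hypothetical.

```python
import re

# Assumed naming rule: lowercase snake_case "domain.metric_name"
# (cross-domain metrics simply use "enterprise" as the domain).
METRIC_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")

# Assumed status vocabulary: only production metrics reach the AI default context.
AI_DEFAULT_CONTEXT_STATUSES = {"production"}

def naming_violations(name: str) -> list[str]:
    if METRIC_NAME.fullmatch(name):
        return []
    return [f"'{name}' is not lowercase domain.metric_name"]

def default_ai_context(registry: dict[str, str]) -> list[str]:
    """registry maps metric name -> lifecycle status; experimental or
    deprecated metrics and badly named entries are left out."""
    return sorted(name for name, status in registry.items()
                  if status in AI_DEFAULT_CONTEXT_STATUSES and not naming_violations(name))
```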

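7. Metric Lineage: the evidence chain from source record to AI output

Metric lineage cannot stop at the table level. AI scenarios need to answer:

  • Which source records affected this metric?
  • Which data contract, pipeline job, semantic model version and metric formula took part in the calculation?
  • Which dashboards, agents, eval reports and natural-language answers used this metric?
  • Which release gates, portfolio decisions and regulatory reports would a definition change affect?

7.1 Lineage granularity

| Granularity | What is recorded | Use |
| --- | --- | --- |
| Source record | Original business record, update time, system version | Factual disputes, audit sampling |
| Dataset | Tables, views, files, topics, indexes | Data quality, impact analysis |
| Job / run | Transformation job, run instance, parameters, status | Pipeline incidents, replay |
| Semantic model | entity, dimension, measure, join path | Definition consistency, join risk |
| Metric | formula, version, approval status | KPI governance, AI answer grounding |
| Consumer | dashboard, API, LLM assistant, eval report | Blast radius, deprecation notices |
| Decision | release gate, funding memo, risk acceptance | Evidence for executives and auditors |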

7.2 Metric Lineage Map example

| Lineage node | Example | Key metadata |
| --- | --- | --- |
| Source system | AML case management | system owner, SLA, access class |
| Source table | aml_cases | schema version, contract version |
| Pipeline job | curate_aml_case_daily | run_id, code_version, input/output datasets |
| Curated dataset | mart_aml_case_review | freshness, completeness, row count anomaly |
| Semantic model | semantic_aml_case | entities: case, customer; dimensions: typology, risk_tier |
| Metric | aml.case_review_cycle_time_p50 | formula, grain, filters, version |
| Eval report | AML Copilot release gate | threshold, failed slices, owner |
| Dashboard | AI Value Office dashboard | refresh cadence, executive owner |
| AI answer | Monthly portfolio summary | metric citations, query id, prompt version |
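A lineage map only supports impact analysis once you can walk it. This sketch hard-codes the nodes from the table as an upstream-to-downstream adjacency list. In practice the edges would be assembled from captured lineage events such as OpenLineage run events; this code does not parse those events. A breadth-first search then lists everything downstream of a node, which is its blast radius. The extra edge to `aml.evidence_completeness_rate` is an illustrative assumption.

```python
from collections import deque

# Hypothetical edges (upstream -> downstream) mirroring the lineage map above.
EDGES = {
    "aml_cases": ["curate_aml_case_daily"],
    "curate_aml_case_daily": ["mart_aml_case_review"],
    "mart_aml_case_review": ["semantic_aml_case"],
    "semantic_aml_case": ["aml.case_review_cycle_time_p50", "aml.evidence_completeness_rate"],
    "aml.case_review_cycle_time_p50": ["AML Copilot release gate", "AI Value Office dashboard"],
    "aml.evidence_completeness_rate": ["AML Copilot release gate"],
    "AI Value Office dashboard": ["Monthly portfolio summary"],
}

def blast_radius(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk returning every node reachable downstream of `start`."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Running it from a source table answers "what breaks if this table changes"; running it from a metric gives the consumer inventory that the impact analysis gate asks for.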

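7.3 Impact Analysis Gate

Every major version change to a metric must go through an impact analysis:

| Check | Pass criterion | Action on fail |
| --- | --- | --- |
| Consumer inventory | All consumers are identified | Block the change from being released |
| Backtest delta | The difference between the old and new definitions is explained along the main dimensions | Mark it as a breaking change |
| Eval dependency | Release gates that depend on this metric have been updated | Freeze related model releases |
| Executive decision dependency | The latest funding / scale memo has been flagged with the definition change | Re-sign the value judgment |
| Regulatory dependency | Regulatory or audit metrics are unaffected, or the change has been approved | Escalate to risk review |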
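The gate can be expressed as a small function that maps failed checks to their required actions. This is a sketch under two assumptions. First, each check is reduced to a pass/fail boolean, whereas real reviews carry evidence links. Second, "major version" follows semantic-versioning convention, meaning the leading number changes, which matches the `Version: 2.1.0` style used in the metric contract template. `ImpactAnalysis` and `required_actions` are hypothetical names.

```python
from dataclasses import dataclass

# Check names and fail actions taken from the impact analysis table.
FAIL_ACTIONS = {
    "consumer_inventory": "block change release",
    "backtest_delta": "mark as breaking change",
    "eval_dependency": "freeze related model releases",
    "executive_decision_dependency": "re-sign value judgment",
    "regulatory_dependency": "enter risk review",
}

@dataclass
class ImpactAnalysis:
    metric: str
    old_version: str
    new_version: str
    checks: dict[str, bool]  # check name -> passed?

def is_major_change(old: str, new: str) -> bool:
    """Semantic-versioning assumption: a different leading number is a major change."""
    return old.split(".")[0] != new.split(".")[0]

def required_actions(ia: ImpactAnalysis) -> list[str]:
    if not is_major_change(ia.old_version, ia.new_version):
        return []  # the gate in this playbook targets major versions
    # A check that was never recorded counts as failed (fail closed).
    return [action for check, action in FAIL_ACTIONS.items()
            if not ia.checks.get(check, False)]
```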

8. Feature / Eval / Business Metric Relationship

The most common mistake AI PMs and architects make is treating eval metrics as proof of business value. The right approach is to build a metric ladder.

Business outcome
-> Workflow metric
-> AI behavior metric
-> Data / feature metric
-> Platform metric
-> Guardrail metric

8.1 Metric Ladder example

| Scenario | Business outcome | Workflow metric | AI eval metric | Data / feature metric | Platform metric | Guardrail |
| --- | --- | --- | --- | --- | --- | --- |
| AML Copilot | Lower handling cost per high-quality case | case review cycle time, QA defect rate | evidence recall, unsupported claim rate, typology coverage | transaction freshness, alert-case linkage accuracy | retrieval latency, judge cost | SAR final decision by AI = 0 |
| Customer complaint AI | Fewer complaints, handled compliantly, and higher first-contact resolution | first contact resolution, complaint reopen rate | policy citation correctness, tone compliance, escalation correctness | approved policy freshness, customer context completeness | p95 latency, fallback rate | customer harm / false promise = 0 |
| Credit assistant | More consistent memos and more efficient manual approval | analyst memo turnaround, exception review time | policy adherence, reason code consistency, decision boundary compliance | income verification completeness, policy version match | route-to-human rate, cost per memo | automated credit decision = 0 |
| AI Value Office | Funding flows to high-value use cases with a low risk of running out of control | scale/stop decision cycle time, benefit sign-off rate | release evidence completeness, risk gate pass rate | metric contract coverage, baseline quality | dashboard freshness, metric API uptime | unapproved KPI in exec summary = 0 |

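8.2 Interpretation chain

| If you see | Do not directly infer | Check instead |
| --- | --- | --- |
| Eval score goes up | Business value must have gone up | adoption, workflow metrics, business baseline, sample coverage |
| AHT goes down | Customer experience improved | reopens, complaints, wrong answers, escalation defects |
| False positives go down | Risk is lower | missed suspicious patterns, QA defects, regulatory feedback |
| Adoption goes up | The product is a success | override rate, quality trend, mandatory-use policy |
| Cost per request goes down | Unit economics improved | successful task rate, fallback, human review cost |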

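9. Metric Drift: metric drift is a governance event

AI metric drift is not just an ML monitoring problem. It can originate in the business, the data, processes, policy, the model, user behavior, or the metric definition itself.

| Drift type | Description | Signals | Response |
| --- | --- | --- | --- |
| Definition drift | The metric definition or exclusion rules change | contract version delta, backtest gap | major version, owner approval, consumer notice |
| Data drift | The input data distribution changes | null rate, range, category distribution, row count anomaly | data incident, pipeline check, source owner review |
| Source drift | Upstream field or status semantics change | data contract breach, schema diff | block release, update contract, run backfill |
| Process drift | Manual workflow or business rules change | handling time step change, new status path | workflow review, metric contract update |
| Population drift | The customer, channel or product mix changes | segment share shift | slice-level eval, business explanation |
| Policy drift | Policy, regulation or product terms change | effective date mismatch, policy version conflict | knowledge and metric re-index, release gate rerun |
| Model behavior drift | AI output behavior changes | judge score trend, override reason shift | eval refresh, prompt/model rollback |
| User behavior drift | Users start bypassing or over-relying on AI | adoption spike, manual review drop, override decline | HITL audit, training, UI control |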
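The table names "segment share shift" as the population drift signal but does not prescribe a statistic. One common choice, swapped in here as an assumption, is the Population Stability Index over segment shares: PSI = Σ (cur_i - base_i) · ln(cur_i / base_i). The 0.1 and 0.25 bands below are widely quoted rules of thumb, not thresholds from this playbook, and the channel counts are toy data.

```python
import math

def segment_shares(counts: dict[str, int]) -> dict[str, float]:
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}

def population_stability_index(baseline: dict[str, int], current: dict[str, int],
                               floor: float = 1e-4) -> float:
    """PSI = sum over segments of (cur - base) * ln(cur / base).
    `floor` stops a segment that is absent in one period from producing log(0)."""
    base, cur = segment_shares(baseline), segment_shares(current)
    psi = 0.0
    for seg in base.keys() | cur.keys():
        b = max(base.get(seg, 0.0), floor)
        c = max(cur.get(seg, 0.0), floor)
        psi += (c - b) * math.log(c / b)
    return psi

def drift_label(psi: float) -> str:
    # Rule-of-thumb bands (assumption); a metric contract could set its own.
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "review"
    return "drift: run slice-level eval"

# Toy channel mix: branch volume moves to the app between two periods.
baseline = {"branch": 500, "app": 300, "call_center": 200}
current = {"branch": 200, "app": 600, "call_center": 200}
psi = population_stability_index(baseline, current)
```

A high PSI does not say the AI got worse. It says that slice-level evals and a business explanation are needed before anyone interprets an aggregate trend.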

9.1 Metric Drift Triage

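| Triage question | How to decide | Owner |
| --- | --- | --- |
| Did the number really change? | Check pipeline, contract, freshness and reconciliation first | Data owner |
| Did the definition change? | Check metric version, semantic model, filters and dimensions | Metric owner |
| Did the business change? | Check policy, process, customer mix, channels and campaigns | Business owner |
| Did AI cause the change? | Check the AI-assisted flag, adoption, overrides and eval trend | AI PM / EvalOps |
| Does the change affect decisions? | Check release gates, funding memos and risk thresholds | AI Value Office / Risk |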
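The triage table is ordered on purpose, and a small routing function makes that order explicit. This sketch makes one assumption beyond the table. If the data check finds a pipeline or contract problem, it stops before the definition, business and AI questions, because a number that did not really change should not be explained. The decision-impact check still runs. `TriageFindings` and its flags are hypothetical names for the answers each owner records.

```python
from dataclasses import dataclass

@dataclass
class TriageFindings:
    # Answers recorded after checking the evidence listed in the triage table.
    data_pipeline_issue: bool   # pipeline / contract / freshness / reconciliation problem
    definition_changed: bool    # metric version, semantic model, filters, dimensions
    business_changed: bool      # policy, process, customer mix, channel, campaign
    ai_attributable: bool       # AI-assisted flag, adoption, override, eval trend
    affects_decisions: bool     # release gate, funding memo, risk threshold

STEPS = [
    ("data_pipeline_issue", "Data owner", "treat as data incident before interpreting"),
    ("definition_changed", "Metric owner", "compare versions and notify consumers"),
    ("business_changed", "Business owner", "document the business explanation"),
    ("ai_attributable", "AI PM / EvalOps", "run slice-level eval and override review"),
]

def triage(f: TriageFindings) -> list[tuple[str, str]]:
    """Route findings to owners in table order; stop interpreting after a data problem."""
    routed = []
    for attr, owner, action in STEPS:
        if getattr(f, attr):
            routed.append((owner, action))
            if attr == "data_pipeline_issue":
                break  # assumption: fix the data before explaining the number
    if f.affects_decisions:
        routed.append(("AI Value Office / Risk", "re-check dependent gates and memos"))
    return routed
```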

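10. Natural-Language Analytics risks

Natural-language analytics is not just "friendlier BI". In retail banking it is an AI product that generates queries, explains metrics and influences decisions.

10.1 Main risks

| Risk | Typical symptom | Consequence |
| --- | --- | --- |
| Metric ambiguity | "Active users" is read as login, transaction or AI usage | Executive decisions rest on the wrong KPI |
| Unapproved metric | The LLM improvises a metric from raw tables | An informal metric is treated as an enterprise fact |
| Wrong grain | Customer-level and account-level rows are mixed in one calculation | Double counting or omissions |
| Time misinterpretation | posted date, event date and review date are confused | Wrong trend explanations |
| Join hallucination | The LLM guesses the join path | Badly distorted numbers |
| Row-level security bypass | Generated SQL bypasses entitlements | Data leakage |
| Small-cell disclosure | Slices are so fine that they expose sensitive groups | Privacy and compliance risk |
| Prompt injection | A user or a document induces the model to ignore policy | Unauthorized queries or wrong explanations |
| Cost explosion | Full scans of wide tables, unbounded group-by dimensions | Platform costs spiral |
| False causality | Correlation is explained as causation | Wrong business actions |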

10.2 LLM-Generated SQL Guardrails

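| Guardrail | Design requirement | Release evidence |
| --- | --- | --- |
| Semantic-only query | The LLM may only call the metric API or approved semantic objects | raw table access disabled for assistant |
| Intent disambiguation | The assistant must ask a clarifying question when the metric name, time, dimensions or filters are unclear | ambiguity eval pass >= 95% |
| SQL AST validation | Parse generated SQL and restrict it to SELECT, approved functions and approved joins | AST policy test suite |
| Row/column policy | Enforce RLS/CLS and small-cell suppression both before and after the query | entitlement test 100% pass |
| Query plan check | Limit scan volume, number of joins, time range and group cardinality | cost guardrail dashboard |
| Metric citation | Answers must cite the metric contract version and the query id | citation coverage >= 98% |
| Result sanity check | Compare against historical ranges, totals and reconciliation metrics | anomaly block / warning |
| Explanation policy | Distinguish "data observation", "possible cause" and "verified cause" | causality language rubric |
| Saved answer review | Executive summaries and regulatory-sensitive explanations go to human review | executive summary review log |
| Audit trace | Store prompt, semantic objects, query, result hash and response | trace retention policy |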
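The semantic-only and row/column guardrails can be combined into one pattern. The assistant emits a structured metric request instead of SQL, and the request is checked against the registry before anything runs. The sketch uses hypothetical names throughout: `MetricQuery`, the `APPROVED` registry slice, a group-by limit of 3 and a minimum cell size of 10 are all assumptions. It does not parse SQL, so it does not replace the AST validation row.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical registry slice: the dimensions each approved metric may be grouped by.
APPROVED = {
    "aml.evidence_completeness_rate": {"jurisdiction", "risk_tier", "typology",
                                       "analyst_team", "AI_assisted_flag"},
    "cx.ai_complaint_rate": {"channel", "product_line", "complaint_category"},
}
RESTRICTED_DIMENSIONS = {"customer_id", "account_id", "beneficial_owner_id"}
MAX_GROUP_BY = 3     # assumed query-plan limit
MIN_CELL_SIZE = 10   # assumed small-cell suppression threshold

@dataclass
class MetricQuery:
    """What the assistant emits instead of free-form SQL."""
    metric: str
    group_by: list[str] = field(default_factory=list)
    start: Optional[str] = None  # ISO dates; the time basis itself comes from the contract
    end: Optional[str] = None

def validate(q: MetricQuery) -> list[str]:
    if q.metric not in APPROVED:
        return [f"unapproved metric: {q.metric}"]
    errors = []
    if q.start is None or q.end is None:
        errors.append("time range missing: ask a clarifying question")
    for dim in q.group_by:
        if dim in RESTRICTED_DIMENSIONS:
            errors.append(f"restricted dimension: {dim}")
        elif dim not in APPROVED[q.metric]:
            errors.append(f"dimension not approved for {q.metric}: {dim}")
    if len(q.group_by) > MAX_GROUP_BY:
        errors.append(f"more than {MAX_GROUP_BY} group-by dimensions")
    return errors

def suppress_small_cells(rows: list[dict], count_key: str = "n") -> list[dict]:
    """Blank out value and count for cells below the threshold before the LLM sees them."""
    return [r if r[count_key] >= MIN_CELL_SIZE
            else {**r, "value": None, count_key: None, "suppressed": True}
            for r in rows]
```

The count is blanked along with the value because a small count is itself a disclosure.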

10.3 Natural-Language Analytics Release Gate

| Gate | Threshold | Action on fail |
| --- | --- | --- |
| Unauthorized metric usage | 0 | Block the release; tighten the metric retrieval scope |
| RLS / CLS breach | 0 | Disable the assistant; open a security incident |
| Ambiguous question answered without clarification | <= 2% | Adjust the intent classifier and prompt |
| Wrong metric definition citation | 0 high-risk cases | Fix glossary retrieval and citation |
| Unsafe causality claim | 0 in executive / regulatory summaries | Add a causal-language guardrail |
| Cost per successful query | Within the budget threshold | Optimize the query plan or limit the query scope |
| User trust defect rate | Below the agreed threshold in QA sampling | Narrow the beta scope; add training |
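The table translates directly into a fail-closed evaluator. This sketch encodes six of the gates with the thresholds above. The cost budget of 0.50 per successful query is a placeholder with an unspecified currency unit. The user trust gate is left out because its threshold is agreed per program. Any gate with no measurement counts as failed. The gate names are hypothetical keys.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Gate:
    name: str
    passes: Callable[[float], bool]
    on_fail: str

COST_BUDGET_PER_SUCCESSFUL_QUERY = 0.50  # placeholder assumption

GATES = [
    Gate("unauthorized_metric_usage", lambda v: v == 0,
         "block release; tighten metric retrieval scope"),
    Gate("rls_cls_breach", lambda v: v == 0,
         "disable assistant; open security incident"),
    Gate("ambiguous_answered_without_clarification", lambda v: v <= 0.02,
         "adjust intent classifier and prompt"),
    Gate("wrong_definition_citation_high_risk", lambda v: v == 0,
         "fix glossary retrieval and citation"),
    Gate("unsafe_causality_claim_exec_or_regulatory", lambda v: v == 0,
         "add causal-language guardrail"),
    Gate("cost_per_successful_query", lambda v: v <= COST_BUDGET_PER_SUCCESSFUL_QUERY,
         "optimize query plan or limit query scope"),
]

def evaluate(measured: dict[str, float]) -> tuple[bool, list[str]]:
    """Returns (release_ok, actions). A gate without a measurement fails closed."""
    actions = [f"{g.name}: {g.on_fail}" for g in GATES
               if g.name not in measured or not g.passes(measured[g.name])]
    return (not actions, actions)
```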

11. Metrics Governance Operating Model

11.1 RACI

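R = Responsible, A = Accountable, C = Consulted, I = Informed.

| Activity | Business Metric Owner | Data Owner | Semantic Layer Steward | AI PM | Data Product Manager | EvalOps | Architect | Risk / Compliance | Finance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Define business decision | A/R | C | I | R | C | C | C | C | C |
| Approve metric contract | A | C | R | R | C | C | C | C | C |
| Define semantic model | C | C | A/R | C | R | I | R | I | I |
| Approve source-of-truth | C | A/R | C | I | R | I | C | C | I |
| Enforce data contract | C | A/R | C | I | R | I | R | C | I |
| Define eval linkage | C | C | C | A/R | C | R | C | C | I |
| Approve AI consumption policy | C | C | R | A/R | C | C | R | A/R | I |
| Set release gates | A | C | C | R | C | A/R | C | A/R | C |
| Monitor metric quality | C | R | A/R | C | R | C | C | I | I |
| Run metric incident review | A | R | R | R | C | C | C | C | I |
| Approve value reporting | C | C | I | R | C | C | I | I | A/R |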

11.2 Metric Approval Workflow

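| Stage | Entry criteria | Core checks | Exit evidence |
| --- | --- | --- | --- |
| Draft | A clear business decision and owner exist | Does a similar metric already exist? Does it need official status? | draft contract |
| Design review | Formula, grain, dimensions and time basis are complete | source-of-truth, semantic model, access policy | review notes |
| Data contract check | Input data sources are confirmed | schema, semantics, freshness, quality, SLA | data contract version |
| Backtest | The metric can be computed on historical data | Reconciliation with existing metrics, explanation of anomalies | backtest report |
| AI consumption review | The metric will be used by AI / NL analytics / eval | allowed use, prohibited use, guardrails | consumption policy |
| Release gate | Quality and risk targets are met | SLO, lineage, RACI, monitoring | release memo |
| Operate | The metric is in production | freshness, drift, usage, incident | quality dashboard |
| Deprecate | A replacement exists, or the metric is withdrawn for risk reasons | consumer impact, migration, communication | deprecation record |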
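The workflow behaves like a linear state machine with one conditional stage. The sketch below models it under stated assumptions. "AI consumption review" is skipped when the metric will not be consumed by AI. A transition is allowed only when the current stage's exit evidence, named as in the table's last column, has been recorded. The sketch omits the emergency withdrawal path from the design decision table. The function names are hypothetical.

```python
ORDER = ["draft", "design_review", "data_contract_check", "backtest",
         "ai_consumption_review", "release_gate", "operate", "deprecate"]

# Exit evidence per stage, from the last column of the workflow table.
EXIT_EVIDENCE = {
    "draft": "draft contract",
    "design_review": "review notes",
    "data_contract_check": "data contract version",
    "backtest": "backtest report",
    "ai_consumption_review": "consumption policy",
    "release_gate": "release memo",
    "operate": "quality dashboard",
}

def next_stages(stage: str, ai_consumed: bool) -> list[str]:
    if stage == "deprecate":
        return []
    nxt = ORDER[ORDER.index(stage) + 1]
    if nxt == "ai_consumption_review" and not ai_consumed:
        nxt = "release_gate"  # the AI review only applies to AI-consumed metrics
    return [nxt]

def advance(stage: str, target: str, ai_consumed: bool, evidence: set[str]) -> str:
    """Move to `target` only if it is a legal next stage and exit evidence exists."""
    if target not in next_stages(stage, ai_consumed):
        raise ValueError(f"illegal transition {stage} -> {target}")
    if EXIT_EVIDENCE[stage] not in evidence:
        raise ValueError(f"missing exit evidence for {stage}: {EXIT_EVIDENCE[stage]}")
    return target
```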

11.3 Release Gate tiers

| Metric tier | Examples | Gate strength |
| --- | --- | --- |
| T0 Critical | Regulatory reporting, credit decision boundaries, customer funds, AI risk gates | risk/compliance approval, zero critical defect, lineage complete |
| T1 Executive | AI Value Office ROI, board dashboard, portfolio funding KPI | finance sign-off, backtest, glossary, owner approval |
| T2 Operational | AML case productivity, contact center efficiency, model operations | owner approval, quality SLO, monitoring |
| T3 Exploratory | analyst ad hoc analysis, pilot hypothesis | Marked experimental; may not be used as an executive answer |

12. Observability for Metric Quality

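Metric observability must cover three questions: is the number right, is the explanation right, and is AI consuming the metric correctly?

| Signal | Definition | Example trigger threshold | Action |
| --- | --- | --- | --- |
| Freshness | Delay of the metric relative to the source-of-truth | p95 > contract target | Show a stale warning; block high-risk answers |
| Completeness | Coverage of critical fields and entities | critical field completeness < 99% | Pause official metric refresh |
| Accuracy / reconciliation | Reconciliation against the authoritative system or control ledger | difference > 0.5%, or amount difference above threshold | incident review |
| Semantic consistency | The same metric gives the same result across tools | BI, API, SQL diff > tolerance | Disable the downstream cache |
| Lineage completeness | Whether source, job, dataset, metric and consumer are all recorded | lineage missing for T0/T1 metric | release gate fail |
| Dimension cardinality anomaly | Abnormal change in dimension values | New categories or sudden share shifts | source/process review |
| Metric drift | Definition, distribution, process or policy drift | slice-level trend break | drift triage |
| AI query defect | The LLM uses the wrong metric or dimension | high-risk defect > 0 | assistant rollback |
| Usage anomaly | Abnormal consumers, query volume or cost | executive metric sudden spike | audit trace review |
| Decision dependency | Which decisions depend on the metric | T0/T1 incident | decision owner notification |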

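12.1 Metric Quality Dashboard structure

| Panel | Metrics | What it tells you |
| --- | --- | --- |
| Contract health | contract coverage, owner coverage, approved status | Whether the governance foundation is complete |
| Data health | freshness, null rate, schema breach, SLA breach | Whether input data meets its contracts |
| Semantic health | duplicate metrics, join path errors, dimension policy violations | Whether the semantic model is reliable |
| Lineage health | lineage completeness, consumer count, impact map | Whether the impact scope can be explained |
| AI consumption health | NL query pass rate, metric citation rate, guardrail blocks | Whether AI consumes metrics safely |
| Eval linkage | eval metric to business metric coverage, release gate failures | Whether evals are connected to value and risk |
| Drift | definition drift, distribution drift, policy drift | Whether definitions or processes need review |
| Incident | open incidents, MTTR, repeat incident count | Whether operational capability is mature |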

13. Retail banking case library

13.1 AI Value Office Dashboard

| Aspect | Design |
| --- | --- |
| Decision question | Which AI use cases should keep being funded, be stopped, scale up or become platform capabilities? |
| Core metrics | funded use cases, release gate pass rate, benefit sign-off rate, adoption by target role, cost per successful task, critical incident count, platform reuse rate |
| Source-of-truth | portfolio backlog, finance benefit register, AI observability traces, risk gate records, platform billing |
| Metric contract risks | "Benefit" counts only estimated hours without finance sign-off; "adoption" counts logins instead of workflow completion |
| Semantic layer notes | use_case, business_unit, risk_tier, stage, platform_capability and benefit_type as conformed dimensions |
| Eval linkage | release gate pass rate must link to eval report completeness, critical failure count and incident history |
| Release gate | T1 executive metrics require joint approval by finance, risk and the business owner |
| Portfolio artifact | AI portfolio metric contract pack + executive dashboard lineage map |

Example metrics:

| Metric | Definition | Gate |
| --- | --- | --- |
| ai_value.benefit_signed_off_rate | Use cases whose benefit evidence has been signed off by a finance delegate / launched use cases | Below 80%, do not claim portfolio ROI |
| ai_value.cost_per_successful_task | Total AI run cost / tasks that meet the quality bar and are adopted by users | Rising for two consecutive periods triggers a route/cost review |
| ai_value.platform_reuse_rate | Use cases that reuse the standard gateway, RAG, eval and observability / active use cases | Below 60%, examine the platform's product-market fit |

13.2 AML Efficiency Metrics

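| Aspect | Design |
| --- | --- |
| Decision question | Does AML Copilot reduce analyst workload without lowering the quality of risk detection? |
| Core metrics | case review cycle time, evidence completeness rate, false positive reduction, QA defect rate, missed red flag rate, unsupported narrative claim rate |
| Source-of-truth | AML alerts, case management, transaction monitoring, QA review, SAR narrative workflow |
| Metric contract risks | Without slicing by case complexity, AI can appear to cut average handling time while only handling simple cases |
| Semantic layer notes | typology, risk_tier, jurisdiction, case_complexity and AI_assisted_flag must be controlled dimensions |
| Eval linkage | evidence recall, citation correctness and narrative factuality map to QA defects and review cycle time |
| Guardrail | AI final SAR decision = 0; unsupported high-risk claim = 0 |
| Portfolio artifact | AML metric ladder + evidence completeness contract + release gate memo |

Example metric matrix:

| Metric | Definition | Required slices | Bad interpretation |
| --- | --- | --- | --- |
| aml.case_review_cycle_time_p50 | Median time from case assigned to analyst review complete | risk_tier, typology, AI_assisted_flag | Claiming an efficiency gain without slicing by complexity |
| aml.evidence_completeness_rate | Share of QA-sample cases in which all mandatory evidence is satisfied | jurisdiction, typology, analyst_team | Checking only that attachments exist, not whether the evidence supports the conclusion |
| aml.unsupported_narrative_claim_rate | Share of key claims in AI drafts that no source record supports | risk_tier, model_version, prompt_version | Treating fluent language as narrative quality |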

13.3 Customer-Facing AI Complaint Metrics

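| Aspect | Design |
| --- | --- |
| Decision question | Does customer-facing AI reduce complaints and raise resolution rates without creating false promises or regulatory risk? |
| Core metrics | complaint rate per AI interaction, first contact resolution, reopen rate, escalation correctness, false promise rate, policy citation correctness, vulnerable customer escalation rate |
| Source-of-truth | contact center platform, complaint management, approved policy repository, customer interaction logs, QA reviews |
| Metric contract risks | Different channels record complaints differently, so web chat, call center and branch complaints use inconsistent definitions |
| Semantic layer notes | complaint_category, channel, product_line, customer_vulnerability_flag, AI_touchpoint, resolution_status |
| Eval linkage | tone compliance, policy adherence and escalation correctness map to complaint reopens and QA defects |
| Guardrail | false promise = 0; internal-only policy exposure = 0; vulnerable customer escalation miss = 0 |
| Portfolio artifact | Customer-facing AI complaint metric glossary + NL analytics guardrail report |

Example metrics:

| Metric | Definition | Release use |
| --- | --- | --- |
| cx.ai_complaint_rate | Share of AI interactions followed within 7 days by an attributable complaint | Monitored weekly after launch; exceeding the control-group threshold automatically freezes expansion |
| cx.escalation_correctness_rate | Among cases that should be escalated to a human, the share the AI escalated correctly | High-risk release gate |
| cx.false_promise_rate | Share of interactions in which the AI makes an unauthorized promise about fee waivers, credit, refunds or deadlines | T0 guardrail; must be 0 |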
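The 7-day window in `cx.ai_complaint_rate` needs an explicit attribution rule. This sketch assumes a complaint is attributable when it comes from the same customer within 7 days after the AI interaction and its `last_touch_channel` is the AI channel. That is the field the release gate memo proposes for limiting multi-channel over-crediting. The dataclasses, the `"ai"` channel value and the toy timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(days=7)

@dataclass(frozen=True)
class Interaction:
    interaction_id: str
    customer_id: str
    ts: datetime

@dataclass(frozen=True)
class Complaint:
    customer_id: str
    ts: datetime
    last_touch_channel: str  # assumed field used to limit multi-channel over-crediting

def ai_complaint_rate(interactions: list[Interaction],
                      complaints: list[Complaint]) -> Optional[float]:
    """Share of AI interactions followed within 7 days by an attributable complaint."""
    if not interactions:
        return None
    attributable = [c for c in complaints if c.last_touch_channel == "ai"]
    hits = sum(
        1 for i in interactions
        if any(c.customer_id == i.customer_id and i.ts <= c.ts <= i.ts + WINDOW
               for c in attributable)
    )
    return hits / len(interactions)

interactions = [
    Interaction("I1", "K1", datetime(2024, 5, 1, 9)),
    Interaction("I2", "K2", datetime(2024, 5, 1, 9)),
    Interaction("I3", "K3", datetime(2024, 5, 1, 9)),
    Interaction("I4", "K1", datetime(2024, 5, 20, 9)),
]
complaints = [
    Complaint("K1", datetime(2024, 5, 3, 10), "ai"),      # inside I1's window
    Complaint("K2", datetime(2024, 5, 10, 10), "ai"),     # 9 days later: outside window
    Complaint("K3", datetime(2024, 5, 2, 10), "branch"),  # not attributable to AI
]
rate = ai_complaint_rate(interactions, complaints)
```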

13.4 Credit Assistant Eval Metrics

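| Aspect | Design |
| --- | --- |
| Decision question | Does the credit assistant improve memo quality and analyst efficiency without crossing manual-approval and model-risk boundaries? |
| Core metrics | memo turnaround time, policy checklist completeness, reason code consistency, unsupported credit rationale rate, human override quality, decision boundary violation |
| Source-of-truth | loan application system, credit bureau attributes, income verification, credit policy repository, underwriting decision records |
| Metric contract risks | Historical approval outcomes are treated as the "right answer", which locks in old biases; immature outcome windows inflate eval results |
| Semantic layer notes | loan_product, risk_band, channel, policy_version, exception_type, protected_attribute_review_slice |
| Eval linkage | policy adherence, reason code consistency and factual grounding link to memo QA defects and exception review time |
| Guardrail | AI automated approval / decline = 0; prohibited feature usage = 0 |
| Portfolio artifact | Credit assistant eval-to-business matrix + model risk metric pack |

Example metrics:

| Metric | Definition | Review note |
| --- | --- | --- |
| credit.policy_adherence_score | Expert score of whether the AI memo correctly applies the policy version in effect on the application date | The policy effective date must be part of the semantic model |
| credit.decision_boundary_violation_count | Number of out-of-bounds AI outputs such as approvals, declines or credit limit commitments | T0 guardrail |
| credit.reason_code_consistency_rate | Share of AI-drafted reason codes consistent with the final human memo / policy rule | Not equivalent to accuracy in predicting loan performance |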
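The review note on `credit.policy_adherence_score` implies an as-of lookup: each application must be scored against the policy version in effect on its application date. Here is a minimal sketch using the standard-library `bisect` module on a made-up effective-dated policy table. The version names and dates are hypothetical. The same pattern fits `checklist_version` versus `review_completion_ts` in the AML data contract.

```python
import bisect
from datetime import date

# Hypothetical effective-dated policy table, sorted by effective date.
POLICY_VERSIONS = [
    (date(2023, 1, 1), "credit_policy_v7"),
    (date(2023, 9, 1), "credit_policy_v8"),
    (date(2024, 4, 15), "credit_policy_v9"),
]
_EFFECTIVE_DATES = [effective for effective, _ in POLICY_VERSIONS]

def policy_in_effect(application_date: date) -> str:
    """Latest policy whose effective date is on or before the application date."""
    idx = bisect.bisect_right(_EFFECTIVE_DATES, application_date) - 1
    if idx < 0:
        raise ValueError(f"no policy in effect on {application_date}")
    return POLICY_VERSIONS[idx][1]
```

Using `bisect_right` means an application dated exactly on an effective date picks up the new version.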

14. Templates / Portfolio Artifacts

The templates below use concrete example values. The goal is portfolio evidence, not blank forms.

14.1 Metric Contract Artifact

```markdown
# Metric Contract: aml.evidence_completeness_rate

## Identity
- Domain: Financial Crime / AML
- Metric tier: T1 Operational Risk
- Status: production
- Version: 2.1.0
- Accountable owner: AML Operations Director
- Responsible owner: Financial Crime Data Product Manager
- Review cadence: monthly, and immediately after AML policy change

## Business Decision
- Used to decide whether AML Copilot can expand from pilot analyst team to all Level 1 alert review teams.
- Used in AI release gate with `aml.unsupported_narrative_claim_rate` and `aml.case_review_cycle_time_p50`.

## Definition
- A reviewed AML case is complete when every mandatory evidence item in the applicable checklist version has a linked source record, a valid timestamp, and QA pass status.
- Metric formula: complete reviewed cases / total reviewed cases in QA sample.
- Grain: case_id + review_completion_date.
- Time basis: review completion date in business timezone.

## Inclusion / Exclusion
- Include: production AML alert cases completed by Level 1 analyst workflow.
- Exclude: test cases, migrated legacy cases without checklist version, legally restricted cases unavailable to AI.

## Dimensions
- Allowed: jurisdiction, risk_tier, typology, analyst_team, AI_assisted_flag, checklist_version.
- Restricted: customer_id, account_id, beneficial_owner_id.

## Source and Lineage
- Source-of-truth: AML case management system and evidence checklist service.
- Input datasets: aml_cases, aml_evidence_items, aml_checklist_versions, qa_review_results.
- Lineage required: source dataset -> curation job -> semantic model -> metric -> release gate report.

## Quality SLO
- Freshness P95: under 2 hours.
- Mandatory field completeness: 99.5% or higher.
- Case-to-evidence linkage accuracy: 99% or higher in QA sample.
- Lineage completeness: 100% for production report.

## AI Consumption Policy
- Allowed: AI Value Office dashboard, AML Copilot release report, analyst productivity dashboard.
- Prohibited: automatic SAR filing, automatic case closure, customer-level natural-language exploration.

## Release Gate
- Expansion gate fails if high-risk jurisdiction slice drops below 95%.
- Critical gate fails if unsupported high-risk narrative claim count is greater than 0.

## Incident
- Severity 1: metric used in release decision with incorrect numerator or denominator.
- Immediate action: freeze Copilot expansion, notify AML owner, rerun backtest, publish correction note.
```

14.2 Metric Glossary Entry

| Field | Example |
| --- | --- |
| Business term | First Contact Resolution |
| Approved definition | Share of interactions in which the customer needs no second contact within 7 days of the first contact and the case resolution status is resolved. |
| Not the same as | Chatbot containment, agent handle time, same-day close rate |
| Official metric | cx.first_contact_resolution_rate |
| Time basis | initial interaction timestamp |
| Allowed dimensions | channel, product_line, issue_category, AI_touchpoint |
| Restricted dimensions | customer_id, protected vulnerability details |
| AI explanation rule | AI may explain trends but must distinguish correlation from a verified root cause. |
| Owner | Customer Operations |
| Review trigger | New complaint categories, new channel launches, changes to regulatory definitions |
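The glossary definition leaves one modelling choice open: what counts as a "second contact". This sketch assumes a repeat contact is one from the same customer on the same `issue_category` within 7 days. It also treats a contact that falls within 7 days of an earlier one for the same key as a repeat, not as a new initial interaction. The `Contact` class and the toy data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

FCR_WINDOW = timedelta(days=7)

@dataclass(frozen=True)
class Contact:
    customer_id: str
    issue_category: str
    ts: datetime
    resolution_status: str

def first_contact_resolution_rate(contacts: list[Contact]) -> Optional[float]:
    """Initial interactions resolved with no repeat contact (same customer and
    issue_category, an assumption) within 7 days of the initial timestamp."""
    ordered = sorted(contacts, key=lambda c: c.ts)
    initial = resolved_first_time = 0
    for idx, c in enumerate(ordered):
        key = (c.customer_id, c.issue_category)
        is_repeat = any((p.customer_id, p.issue_category) == key and c.ts - p.ts <= FCR_WINDOW
                        for p in ordered[:idx])
        if is_repeat:
            continue  # a follow-up contact, not an initial interaction
        initial += 1
        followed_up = any((n.customer_id, n.issue_category) == key and n.ts - c.ts <= FCR_WINDOW
                          for n in ordered[idx + 1:])
        if c.resolution_status == "resolved" and not followed_up:
            resolved_first_time += 1
    return resolved_first_time / initial if initial else None

contacts = [
    Contact("K1", "billing", datetime(2024, 5, 1), "resolved"),  # followed up on day 4
    Contact("K1", "billing", datetime(2024, 5, 4), "resolved"),  # repeat contact
    Contact("K2", "card", datetime(2024, 5, 1), "resolved"),     # FCR
    Contact("K3", "card", datetime(2024, 5, 1), "open"),         # not resolved
    Contact("K2", "card", datetime(2024, 5, 10), "resolved"),    # 9 days later: new initial, FCR
]
fcr = first_contact_resolution_rate(contacts)
```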

14.3 Data Contract for Metric Inputs

```yaml
data_contract:
  asset: mart_aml_case_review
  version: 3.4.0
  owner: financial_crime_data_owner
  consumers:
    - metric: aml.evidence_completeness_rate
    - metric: aml.case_review_cycle_time_p50
    - eval_report: aml_copilot_release_gate
  schema:
    case_id:
      type: string
      required: true
      uniqueness: true
    review_completion_ts:
      type: timestamp
      required: true
    checklist_version:
      type: string
      required: true
    mandatory_evidence_count:
      type: integer
      required: true
      min: 1
    linked_evidence_pass_count:
      type: integer
      required: true
      min: 0
  semantics:
    review_completion_ts: analyst review completion, not QA completion
    checklist_version: version effective on review_completion_ts
  quality:
    freshness_p95_minutes: 120
    case_id_null_rate: 0
    checklist_version_null_rate: 0
    evidence_count_reconciliation_tolerance: 0.005
  governance:
    pii_class: no direct identifiers in this mart
    row_level_policy: analyst team and jurisdiction filters enforced upstream
    allowed_ai_use:
      - release_gate
      - portfolio_dashboard
      - operational_analytics
    prohibited_ai_use:
      - automatic_case_closure
      - customer_level_freeform_query
  breach_action:
    severity_1:
      - freeze AI expansion gates using this metric
      - notify AML Operations and AI PM
      - rerun affected reports after remediation
```
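To show how such a contract becomes executable, here is a hypothetical row checker for part of the `schema` block. The contract subset is hand-copied into a Python dict so the sketch needs no YAML parser, and the timestamp column is omitted for brevity. Treating every breach as `severity_1` is an assumption. This is not the OpenMetadata data contract API.

```python
# Hand-copied subset of the YAML contract above (hypothetical in-code form).
CONTRACT = {
    "schema": {
        "case_id": {"type": str, "required": True, "uniqueness": True},
        "checklist_version": {"type": str, "required": True},
        "mandatory_evidence_count": {"type": int, "required": True, "min": 1},
        "linked_evidence_pass_count": {"type": int, "required": True, "min": 0},
    },
    "breach_action": {"severity_1": [
        "freeze AI expansion gates using this metric",
        "notify AML Operations and AI PM",
        "rerun affected reports after remediation",
    ]},
}

def check_rows(rows: list[dict], contract: dict = CONTRACT) -> list[str]:
    breaches: list[str] = []
    seen: dict[str, set] = {}
    for i, row in enumerate(rows):
        for col, rule in contract["schema"].items():
            value = row.get(col)
            if value is None:
                if rule.get("required"):
                    breaches.append(f"row {i}: {col} is null")
                continue
            if not isinstance(value, rule["type"]):
                breaches.append(f"row {i}: {col} has type {type(value).__name__}")
                continue
            if "min" in rule and value < rule["min"]:
                breaches.append(f"row {i}: {col} below min {rule['min']}")
            if rule.get("uniqueness"):
                if value in seen.setdefault(col, set()):
                    breaches.append(f"row {i}: duplicate {col} {value}")
                seen[col].add(value)
    return breaches

def breach_actions(breaches: list[str], contract: dict = CONTRACT) -> list[str]:
    # Assumption: any breach is handled as severity_1.
    return list(contract["breach_action"]["severity_1"]) if breaches else []
```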

14.4 Eval-to-Business Metric Matrix

| Use case | Eval metric | Workflow metric | Business metric | Interpretation rule |
| --- | --- | --- | --- | --- |
| AML Copilot | evidence recall >= 95% | QA defect rate decreases | cost per complete case decreases | A drop in cycle time counts as a benefit only if adoption is stable and QA defects do not rise |
| Customer AI | policy citation correctness >= 95% | reopen rate decreases | complaint cost decreases | If containment rises while complaints also rise, judge it a quality failure |
| Credit Assistant | decision boundary violation = 0 | memo turnaround decreases | analyst capacity increases | Faster approvals must not be read as lower credit risk |
| Value Office | release evidence completeness >= 98% | scale/stop cycle time decreases | capital allocation quality improves | ROI without finance sign-off does not enter the executive benefit figure |
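The interpretation rules are easy to encode so a dashboard cannot report a "benefit" that the matrix would reject. This sketch covers the AML Copilot and Customer AI rows. "Adoption is stable" is assumed to mean an absolute change of at most 5%. Deltas are period-over-baseline changes where negative means a decrease. The names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PeriodDelta:
    # Change versus baseline; negative means the quantity decreased.
    cycle_time: float
    qa_defect_rate: float
    adoption: float

def aml_benefit_counts(d: PeriodDelta, adoption_tolerance: float = 0.05) -> bool:
    """AML Copilot rule: a cycle-time drop counts only if adoption is stable and
    QA defects did not rise. The 5% stability tolerance is an assumption."""
    return d.cycle_time < 0 and d.qa_defect_rate <= 0 and abs(d.adoption) <= adoption_tolerance

def customer_ai_quality_failure(containment_delta: float, complaint_delta: float) -> bool:
    """Customer AI rule: containment up while complaints also go up is a quality failure."""
    return containment_delta > 0 and complaint_delta > 0
```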

14.5 Natural-Language Analytics Risk Assessment

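| Question class | Example | Allowed response | Guardrail |
| --- | --- | --- | --- |
| Approved KPI | "Why did AML case cycle time drop this month?" | Use the official metric; cite its version and the main slices | metric citation + causality caveat |
| Ambiguous KPI | "Is our AI adoption good?" | First ask whether adoption means login, weekly active users, accepted suggestions or workflow completion | clarification required |
| Restricted slice | "List the regions and customers with the highest complaint rates among high-net-worth customers" | Refuse customer-level output; give a compliant aggregate explanation | small-cell suppression + RLS |
| Exploratory analysis | "Did the new prompt reduce unsupported claims?" | Allow the pilot metric and label it experimental | beta label + no executive use |
| Causal claim | "Did complaints fall because of AI?" | Explain that a control group or intervention analysis is needed; offer the correlational observation | causality language policy |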

14.6 Metric Release Gate Memo

| Section | Example content |
| --- | --- |
| Release item | cx.ai_complaint_rate enters production for the customer-facing AI weekly dashboard |
| Decision | Approved for limited release to Customer Operations and the AI Value Office |
| Evidence | Metric contract v1.0 approved; 12-week backtest complete; complaint source reconciliation difference < 0.2%; RLS tests 100% pass |
| Risk | The complaint attribution window may over-credit AI when multiple channels touch the same customer |
| Mitigation | Use AI_touchpoint and last_touch_channel; publish an attribution caveat in the glossary |
| Gate | false promise rate = 0; escalation correctness >= 98%; policy citation correctness >= 95% |
| Owner | Customer Operations accountable; CX Data Product responsible; Risk consulted |
| Next review | First monthly review after two reporting cycles |

14.7 Portfolio Evidence Package

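A portfolio can present one complete case rather than charts alone:

| Artifact | Capability it demonstrates |
| --- | --- |
| Metric glossary | Turning business language into governable metric language |
| Semantic model spec | Designing entities, dimensions, measures, metrics and join paths |
| Metric contract pack | Defining metrics as product contracts |
| Data contract | Managing upstream change, quality and SLA |
| Lineage map | Explaining the evidence chain from source system to AI output |
| Eval-to-business matrix | Connecting AI quality, workflow metrics and business value |
| NL analytics risk assessment | Identifying the risks of LLM-generated SQL and natural-language explanations |
| Release gate memo | Building metric quality into launch and expansion decisions |
| Metric quality dashboard mock | Operating freshness, drift, incidents and consumer impact |
| Interview narrative | Explaining architecture, governance and product trade-offs through retail banking cases |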

15. 30-Day Lab: Build a Portfolio-Grade Metric Architecture Case

Goal: complete an "AI Semantic Layer / Metrics Architecture" portfolio package within 30 days. Recommended scenarios: AML Copilot, Customer-facing Complaint AI, Credit Assistant, or AI Value Office Dashboard.

| Period | Training focus | Deliverables |
| --- | --- | --- |
| Week 1 | Choose a scenario; define the business decisions, metric ladder, and core official metrics | business decision map, Eval-to-Business Metric Matrix |
| Week 2 | Design the semantic model, metric contracts, glossary, and data contracts | semantic model spec, 3 metric contracts, glossary |
| Week 3 | Design lineage, NL analytics guardrails, and release gates | lineage map, LLM SQL guardrail policy, release gate memo |
| Week 4 | Design observability, drift triage, RACI, and the portfolio narrative | metric quality dashboard, RACI, case study deck outline |

30-day acceptance criteria:

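| Capability | Verifiable evidence |
| --- | --- |
| Metric contracts | Every official metric has an owner, formula, grain, dimensions, quality SLO, and AI consumption policy |
| Semantic modeling | Can explain the trade-offs among entities, dimensions, measures, metrics, and join paths |
| Lineage governance | Can trace from a source system to the dashboard, eval report, and AI answer |
| EvalOps linkage | Every eval metric maps to a workflow metric and a business metric |
| Risk control | NL analytics and LLM SQL have explicit guardrails and a release gate |
| Operations | A metric quality dashboard, drift triage, and incident response are in place |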
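The first criterion is a checklist over every official metric, which makes it easy to verify automatically. A minimal Python sketch, assuming contracts are held as simple records; the `MetricContract` class is hypothetical and carries only the six fields the criterion names, whereas a full contract would also include time basis, exclusions, version, and so on.

```python
from dataclasses import dataclass, field


@dataclass
class MetricContract:
    """Hypothetical record with only the fields the acceptance criterion names."""
    name: str
    owner: str = ""
    formula: str = ""
    grain: str = ""
    dimensions: list[str] = field(default_factory=list)
    quality_slo: str = ""
    ai_consumption_policy: str = ""


REQUIRED_FIELDS = ("owner", "formula", "grain", "dimensions", "quality_slo", "ai_consumption_policy")


def missing_fields(contract: MetricContract) -> list[str]:
    # Empty strings and empty lists both count as missing.
    return [name for name in REQUIRED_FIELDS if not getattr(contract, name)]


def acceptance_report(contracts: list[MetricContract]) -> dict[str, list[str]]:
    """Map each incomplete official metric to its missing fields; {} means the criterion is met."""
    report = {}
    for contract in contracts:
        gaps = missing_fields(contract)
        if gaps:
            report[contract.name] = gaps
    return report
```

Running this in CI against the metric registry turns "every official metric has an owner" from a review comment into a failing check.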

16. Interview Talking Points

16.1 30-Second Version

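In the AI era, the semantic layer is not just an upgraded BI semantic layer; it is the control plane for metric facts. It turns each metric definition into a product contract with an owner, formula, grain, time basis, dimensions, source of truth, data contract, lineage, quality SLO, AI consumption policy, and release gate. Only then can BI, natural-language analytics, AI assistants, eval harnesses, and the AI Value Office dashboard all consume the same set of trusted metrics.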

16.2 2-Minute Version

I split metrics architecture into three threads.

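The first is the semantic thread: confirm the business decisions and official metrics first, then design the semantic model, including entities, dimensions, measures, metrics, the time spine, join paths, and the glossary. Metrics cannot live only in dashboard SQL; they must be entered into a metric registry with versioning and an approval workflow.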
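To make "registry with versioning and approval" concrete, here is a minimal in-memory Python sketch. The `MetricRegistry` class and its draft/approved/deprecated lifecycle are hypothetical and not the API of any particular tool; the point it shows is that AI consumers resolve only the currently approved version, so a deprecated definition cannot keep being cited (the deprecation pitfall in section 17).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MetricVersion:
    name: str
    version: str
    formula: str
    status: str = "draft"  # assumed lifecycle: draft -> approved -> deprecated


class MetricRegistry:
    """In-memory sketch; a real registry would persist versions and record who approved what."""

    def __init__(self) -> None:
        self._versions: dict[str, list[MetricVersion]] = {}

    def register(self, mv: MetricVersion) -> None:
        self._versions.setdefault(mv.name, []).append(mv)

    def approve(self, name: str, version: str) -> None:
        """Approve one version and deprecate whichever version was approved before it."""
        history = self._versions[name]
        if not any(mv.version == version for mv in history):
            raise KeyError(f"{name} has no version {version}")
        updated = []
        for mv in history:
            if mv.version == version:
                updated.append(replace(mv, status="approved"))
            elif mv.status == "approved":
                updated.append(replace(mv, status="deprecated"))
            else:
                updated.append(mv)
        self._versions[name] = updated

    def history(self, name: str) -> list[MetricVersion]:
        return list(self._versions.get(name, []))

    def resolve_for_ai(self, name: str) -> MetricVersion:
        """AI consumers may bind only to the approved version, never to drafts or deprecated ones."""
        approved = [mv for mv in self._versions.get(name, []) if mv.status == "approved"]
        if not approved:
            raise LookupError(f"{name} has no approved version; AI consumers may not use it")
        return approved[0]
```

Because `approve` deprecates the previous version in the same step, there is never more than one approved definition for an AI answer to cite.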

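The second is the evidence thread: every metric must be connected to a data contract and to lineage. Upstream data needs a schema, semantics, freshness, quality, and SLAs; lineage must trace from source record, pipeline run, curated dataset, and semantic model through to metric, dashboard, eval report, and AI answer. That way, when a metric behaves abnormally, we can run an impact analysis instead of relying on guesswork.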
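Impact analysis over that chain is a downstream traversal of a lineage graph. A minimal Python sketch, assuming the edges are already available as an upstream-to-downstream adjacency dict; in practice they would be assembled from lineage events (for example OpenLineage run events), and every node name below is hypothetical.

```python
from collections import deque

# Hypothetical lineage edges (upstream -> downstream) following the chain described above.
LINEAGE = {
    "src.complaints": ["job.complaints_ingest.run_42"],
    "job.complaints_ingest.run_42": ["curated.complaints"],
    "curated.complaints": ["sem.complaints"],
    "sem.complaints": ["metric.cx.ai_complaint_rate"],
    "metric.cx.ai_complaint_rate": [
        "dash.cx_ai_weekly",
        "eval.customer_ai_report",
        "ai_answer.trace_9001",
    ],
}


def downstream_impact(node: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first search: every asset that depends, directly or indirectly, on `node`."""
    seen: set[str] = set()
    impacted: list[str] = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted
```

If `curated.complaints` fails its data contract, `downstream_impact("curated.complaints", LINEAGE)` lists the semantic model, the metric, the dashboard, the eval report, and the AI answer trace that all need review.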

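The third is the AI consumption thread: natural-language analytics and LLM-generated SQL may use only approved semantic objects, and must pass SQL AST validation, RLS/CLS, small-cell suppression, query plan checks, metric citation, and the causality language policy. EvalOps must also connect AI behavior metrics to workflow metrics and business metrics, so that model scores cannot rise while the business fails to improve.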
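Two of those controls can be sketched without any SQL: validating a structured semantic request against an allowlist, and suppressing small cells in the result. The Python below is a minimal sketch under stated assumptions. It checks a structured request rather than parsing a SQL AST, the metric and dimension names are hypothetical, the threshold of 10 is an assumed policy value, and RLS itself would still be enforced by the query engine.

```python
from dataclasses import dataclass

# Hypothetical allowlists; in practice they would be generated from the approved semantic model.
APPROVED_METRICS = {"aml_case_cycle_time", "cx_ai_complaint_rate"}
APPROVED_DIMENSIONS = {"region", "channel", "risk_tier", "week"}
RESTRICTED_DIMENSIONS = {"customer_id"}  # column-level restriction: never exposed to the assistant
MIN_CELL_SIZE = 10  # assumed suppression threshold; set it from your privacy policy


@dataclass
class SemanticQuery:
    metrics: list[str]
    group_by: list[str]


def validate(query: SemanticQuery) -> list[str]:
    """Return a list of violations; an empty list means the request may be executed."""
    errors = [f"unapproved metric: {m}" for m in query.metrics if m not in APPROVED_METRICS]
    for d in query.group_by:
        if d in RESTRICTED_DIMENSIONS:
            errors.append(f"restricted dimension: {d}")
        elif d not in APPROVED_DIMENSIONS:
            errors.append(f"unknown dimension: {d}")
    return errors


def suppress_small_cells(rows: list[dict], count_key: str = "n") -> list[dict]:
    """Mask the value and count of any group whose population is below MIN_CELL_SIZE."""
    out = []
    for row in rows:
        if row[count_key] < MIN_CELL_SIZE:
            out.append({**row, "value": None, count_key: None, "suppressed": True})
        else:
            out.append({**row, "suppressed": False})
    return out
```

Because the assistant can only name approved metrics and dimensions, join logic, definitions, and column restrictions stay inside the semantic layer instead of being re-invented per prompt.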

In retail financial services, I set stronger release gates for scenarios such as AML, complaints, and credit; for example, guardrails such as unsupported high-risk claims, false promises, and automated credit decisions must be 0.

16.3 Quick Answers to Advanced Follow-Up Questions

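| Question | Key points |
| --- | --- |
| What is the difference between a semantic layer and a metric store? | The semantic layer handles business semantics, entities, dimensions, measures, join paths, and the metrics API; a metric store focuses more on metric definitions and query serving. Enterprise AI scenarios usually need a registry, glossary, data contracts, lineage, policy, and observability working together as a control plane. |
| Why is a metric definition a product contract? | Because it promises who may use a number for which decisions, covering owner, formula, grain, time basis, exclusions, quality SLO, change process, AI usage boundaries, and incident response. |
| How do you stop an LLM from writing wrong SQL? | Do not let the LLM query raw tables. Have it query approved metrics through a semantic API, then add intent disambiguation, an AST allowlist, RLS/CLS, query plan limits, metric citation, result sanity checks, and an audit trace. |
| How do eval metrics connect to business metrics? | Through a metric ladder: business outcome -> workflow metric -> AI eval metric -> data/feature metric -> platform metric -> guardrail. For example, AML evidence recall should connect to QA defects, case cycle time, and cost per complete case. |
| How do you handle metric drift? | First distinguish definition, data, source, process, population, policy, model behavior, and user behavior drift. Each type has a different owner and action; do not attribute drift to the model by default. |
| How do you decide build vs buy? | Official metrics and high-risk AI need a stable semantic layer, data contracts, lineage, and governance. Prefer reusing mature tools; build a custom metric API only for real-time, complex-permission, regulatory, or low-latency scenarios. |
| What matters most for complaint metrics in customer-facing AI? | Look beyond complaint volume to the attribution window, first contact resolution, reopens, false promises, policy citation, escalation correctness, and vulnerable customer handling. False promises and internal policy leakage are hard guardrails. |
| Why can't a credit assistant be judged on accuracy alone? | A credit assistant is usually decision support and should not approve credit automatically. Evaluate policy adherence, reason code consistency, factual grounding, decision boundary violations, protected attribute review slices, and the quality of human overrides. |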
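The drift answer above ("do not attribute it to the model by default") can be made operational as an ordered triage that rules out non-model causes first. A minimal Python sketch, assuming hypothetical boolean signals produced by upstream monitors; the signal names, the check order, and the owner assignments are illustrative, not a prescribed RACI. When several signals fire at once, this first-pass version returns only the first match, so a human still reviews the rest.

```python
from enum import Enum
from typing import Optional


class DriftType(Enum):
    DEFINITION = "definition"
    DATA = "data"
    SOURCE = "source"
    PROCESS = "process"
    POPULATION = "population"
    POLICY = "policy"
    MODEL_BEHAVIOR = "model_behavior"
    USER_BEHAVIOR = "user_behavior"


# Illustrative owner assignments; the real mapping belongs in your RACI.
OWNERS = {
    DriftType.DEFINITION: "metric owner",
    DriftType.DATA: "data product owner",
    DriftType.SOURCE: "source system owner",
    DriftType.PROCESS: "operations owner",
    DriftType.POPULATION: "metric owner",
    DriftType.POLICY: "Risk / policy owner",
    DriftType.MODEL_BEHAVIOR: "AI PM + EvalOps",
    DriftType.USER_BEHAVIOR: "AI PM",
}

# Hypothetical signals, checked in order: non-model causes are ruled out first,
# so a model change is blamed only when nothing upstream explains the shift.
CHECK_ORDER = [
    ("contract_version_changed", DriftType.DEFINITION),
    ("data_contract_violation", DriftType.DATA),
    ("source_system_changed", DriftType.SOURCE),
    ("workflow_changed", DriftType.PROCESS),
    ("policy_changed", DriftType.POLICY),
    ("population_shift_detected", DriftType.POPULATION),
    ("user_behavior_shift_detected", DriftType.USER_BEHAVIOR),
    ("model_version_changed", DriftType.MODEL_BEHAVIOR),
]


def triage(signals: dict[str, bool]) -> Optional[tuple[DriftType, str]]:
    """Return (drift type, owner) for the first matching signal, or None for manual triage."""
    for signal, drift in CHECK_ORDER:
        if signals.get(signal):
            return drift, OWNERS[drift]
    return None
```

For instance, if a model release and a contract version change land in the same week, `triage` routes the incident to the metric owner first, because a changed definition alone can explain the shift.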

17. Common Pitfalls

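| Pitfall | Why it is dangerous | Better practice |
| --- | --- | --- |
| Treating the semantic layer as a report acceleration layer | Ignores AI consumption, eval, risk, and lineage | Design it as a metrics control plane |
| Writing metric definitions only in natural language | LLMs and APIs cannot execute them reliably | Natural-language definition + machine-readable formula + semantic model |
| Letting the LLM query the warehouse directly | Join, permission, definition, and cost risks become uncontrollable | semantic-only queries + guardrails |
| Checking only that dashboard numbers match | AI must also cite the contract, explain the definition, and trace lineage | Establish metric citation and trace |
| Governing only executive KPIs | AI eval and workflow metrics also drive release decisions | Tiered governance (T0/T1/T2) |
| Leaving drift to the ML team alone | Metric drift can come from policy, process, data, or user behavior | Drift triage with an owner per drift type |
| ROI metrics without finance sign-off | Undermines the credibility of the portfolio or executive reporting | Establish finance-signed-off benefit metrics |
| Using historical credit outcomes as the only eval ground truth | May entrench historical bias and rely on immature outcomes | Separate policy adherence, decision boundary, and long-term outcome |
| Judging customer-service AI by containment alone | May sacrifice customer experience and compliance | Track complaints, reopens, false promises, and escalation together |
| No deprecation process | AI keeps citing retired metrics | replacement, consumer notice, incident withdrawal |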

18. One-Page Summary

The core of AI Semantic Layer / Metrics Architecture is upgrading metrics from "query results" to "product contracts that are governable, observable, and safe for AI to consume".

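| From | To |
| --- | --- |
| Dashboard SQL | Metric contract + semantic model |
| Field descriptions | Metric glossary + machine-readable formula |
| Table-level lineage | Source -> job -> dataset -> semantic model -> metric -> AI answer |
| Data quality checks | Metric quality SLO + drift + incident |
| Natural-language data lookup | Governed NL analytics with semantic API and guardrails |
| Model scores | Eval-to-business metric ladder |
| One-off launch | Approval workflow + release gate + observability |
| Data team as the single point of accountability | RACI across business owner, data owner, AI PM, EvalOps, Risk, and Finance |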

The portfolio pitch can be condensed into one sentence:

I design metrics as governed product contracts, so AI systems can reason over business performance without inventing definitions, bypassing controls or breaking trust.