AI MLOps Continuous Delivery / Release Playbook

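Audience: AI PMs, AI architects, platform PMs, MLOps leads, model risk, AI governance, release managers, and technology leaders in retail financial services.

Core question: when an AI system's code, data, features, model, prompts, RAG index, tool schemas, policies, and evals all change at once, how does a team build a continuous delivery system that is reproducible, auditable, reversible, and able to ramp traffic in stages?

Learning goal: this playbook skips business-analysis basics and does not stop at "train a model". It trains senior roles to design the CD4ML / MLOps continuous delivery architecture, release gates, promotion workflow, risk-tiered rollout, governance evidence, and a presentable portfolio.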

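Important: this document is learning, architecture-design, and portfolio material. It is not legal, regulatory, model-validation, or formal compliance advice. In a real retail financial services project, the business owner, technology, security, privacy, legal, compliance, model risk, operational risk, and internal audit must jointly confirm the applicable requirements, approval authority, evidence retention, and release boundaries.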

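1. One-Sentence Positioning

CD4ML / MLOps continuous delivery is not about automatically deploying a notebook as an API. It is about:

Using a controlled pipeline to connect code, data, features, models, prompts, evaluation, release, monitoring, rollback, and governance evidence into one repeatable AI release system.

In traditional software, the core release object is usually a code package. The release object of an AI system is more complex: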

AI release =
  code version
  + data snapshot
  + feature schema and transformation
  + model artifact
  + prompt / policy / tool config
  + eval dataset and result
  + deployment route
  + monitoring and rollback plan
  + governance evidence

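This playbook trains three senior capabilities:

Capability	What senior-level practice looks like	Portfolio asset
Release architecture	Can combine ML pipelines, CI/CD, continuous training, model registry, feature store, eval gates, and deployment strategy into a production architecture	CD4ML reference architecture
Release decision	Can make go / limited go / no-go / rollback calls based on business risk, model performance, data quality, drift, cost, latency, and human control	Risk-tiered release gate memo
Governance evidence	For every release, can reproduce the model's origin, training data, feature versions, eval results, approvals, exceptions, ramp, and rollback records	AI release evidence binder

Core principles:

A model without lineage must not be released.
A model without an eval gate must not be ramped.
A model without a rollback path must not enter a high-risk process.
A model without an evidence binder will not pass a retail financial services audit.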

2. Source Anchors

The sources below serve as learning anchors and terminology calibration. A real project must re-verify, as of the access date, the latest version, product status, regional availability, contract terms, regulatory requirements, and internal policies.

Anchor	Link	How this playbook uses it
Martin Fowler / Thoughtworks CD4ML	https://martinfowler.com/articles/cd4ml.html	Core ideas of Continuous Delivery for Machine Learning: treat ML delivery as a cross-team, cross-artifact, cross-environment continuous delivery problem rather than a one-off model training exercise.
Google Cloud MLOps continuous delivery and automation pipelines	https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning	The layered framework for MLOps automation levels, CI/CD/CT, pipeline orchestration, model deployment, and continuous training.
TensorFlow TFX	https://www.tensorflow.org/tfx	Component thinking for production ML pipelines: data ingestion, statistics, schema, transform, trainer, evaluator, pusher, metadata, and pipeline orchestration.
NIST AI Risk Management Framework	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize risk identification, assessment, treatment, monitoring, and governance evidence for AI releases.
AI EvalOps Platform Architecture Playbook	`docs/AI_EVALOPS_PLATFORM_ARCHITECTURE_PLAYBOOK.md`	Connects eval datasets, evaluators, experiments, release gates, and production monitoring to this playbook's release process.
AI Model Risk Management Playbook	`docs/AI_MODEL_RISK_MANAGEMENT_PLAYBOOK.md`	Connects inventory, validation, ongoing monitoring, change management, and effective challenge to MLOps release governance.
AI Audit Evidence Binder Playbook	`docs/AI_AUDIT_EVIDENCE_BINDER_PLAYBOOK.md`	Turns every training run, evaluation, release, approval, exception, monitoring record, and rollback into an audit evidence package.
AI Incident / Postmortem / Reliability Playbook	`docs/AI_INCIDENT_POSTMORTEM_RELIABILITY_PLAYBOOK.md`	Connects rollback, containment, incident triggers, postmortems, and regression evidence to release engineering.

3. CD4ML Architecture

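3.1 Reference Architecture

The goal of a CD4ML architecture is not to "get the training script running". It is to make every AI release able to answer six questions:

What did this change modify?
Which data, features, prompts, models, and configuration does it depend on?
Which datasets and evaluators prove its quality?
At which risk tier is it allowed to go live?
How is it ramped and monitored in stages?
If something goes wrong, how is it rolled back, contained, and reproduced?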

flowchart TB
  A[Change Request] --> B[Risk Tiering and Release Scope]
  B --> C[Source Control]
  C --> D[CI: Code, Config, Prompt, Schema Tests]
  D --> E[Data and Feature Validation]
  E --> F[Pipeline Orchestrator]
  F --> G[Training / Fine-tuning / Prompt Build]
  G --> H[Model and Artifact Registry]
  H --> I[Offline Eval and Validation]
  I --> J[Release Gate]
  J --> K[Staging / Shadow]
  K --> L[Canary / Ramp]
  L --> M[Production Route]
  M --> N[Monitoring and Online Eval]
  N --> O[Incident / Rollback / Retraining Trigger]
  O --> B

  H --> P[Lineage and Metadata Store]
  I --> P
  J --> Q[Governance Evidence Binder]
  M --> Q
  N --> Q

3.2 Eight Core Control Objects

Control object	What it governs	Version granularity	Key evidence
Code	pipeline code, training code, serving code, feature transformation, policy code	commit SHA, build ID, container digest	CI result, test report, dependency scan, review approval
Data	raw data, training set, validation set, production sample, label set	dataset snapshot, partition, query hash, retention tag	dataset card, lineage, quality report, access approval
Feature	feature definition, schema, transformation, feature store view	feature view version, schema version, transform hash	feature validation, training-serving skew report
Model	trained artifact, fine-tuned model, adapter, calibration layer	model version, artifact hash, training run ID	model card, metrics, registry approval, signature
Prompt / Policy	system prompt, few-shot, tool instruction, guardrail policy, decision policy	prompt version, policy version, config hash	prompt diff, policy test, approved use boundary
Eval	eval dataset, evaluator, rubric, judge, threshold	eval run ID, dataset version, evaluator version	experiment report, slice analysis, gate decision
Deployment	endpoint, route, traffic split, feature flag, environment	release ID, deployment manifest, route config	deployment record, canary metrics, rollback plan
Governance	risk tier, approval, exception, monitoring, incident linkage	evidence binder version, decision log ID	gate memo, attestation, issue log, audit sample

3.3 Model / Data / Code / Prompt Version Coupling

AI releases most often get out of control when a team versions only the model artifact and does not bind it to the data, features, prompt, eval, and serving route.

Define each released version as a release bundle:

release_id: AIREL-AML-COPILOT-2026-007
use_case_id: AML-COPILOT-001
risk_tier: Tier 1
code:
  repo_commit: 9f2c81a
  pipeline_image: registry/aml-pipeline@sha256:4b7...
  serving_image: registry/aml-serving@sha256:2c1...
data:
  training_snapshot: AML-TRAIN-2026Q2-v3
  validation_snapshot: AML-VALIDATION-2026Q2-v2
  label_policy: AML-LABEL-GUIDE-v1.4
features:
  feature_view: aml-alert-features-v6
  transform_hash: 72e4...
model:
  model_id: aml-narrative-ranker
  model_version: 1.8.0
  artifact_hash: c55a...
prompt_policy:
  prompt_version: aml-summary-system-prompt-v2.3
  guardrail_policy: aml-output-policy-v1.2
  tool_schema: case-evidence-tool-v1.1
eval:
  dataset_versions:
    - AML-GOLDEN-2026Q2-v2
    - AML-REGRESSION-2026Q2-v4
    - AML-REDTEAM-2026Q2-v1
  evaluator_versions:
    - RULE-CITATION-v2
    - JUDGE-GROUNDEDNESS-v1.3
deployment:
  route: aml-copilot-prod
  traffic_start: shadow
  rollback_target: AIREL-AML-COPILOT-2026-006

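If a release bundle is missing any key object, the risk is not "incomplete documentation". It is:

Missing	Actual risk
No data snapshot	Training results cannot be reproduced, drift cannot be explained, and validation disputes cannot be traced
No feature version	Training and online features diverge, and model performance degrades suddenly
No prompt / policy version	The same model version behaves differently, and incident review cannot locate the cause
No eval dataset version	Metrics are not comparable, and the release gate can be adjusted by hand
No deployment manifest	The version actually running in production differs from the approved version
No rollback target	During an incident the only option is an ad hoc shutdown, with no orderly recovery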
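
The bundle definition and the missing-object risks above suggest a simple completeness check that could run before any gate. The sketch below is a minimal illustration in Python, assuming the bundle has already been parsed into a dict. The `REQUIRED_FIELDS` map and the function name `missing_bundle_fields` are hypothetical and would need to match an organization's own bundle schema.

```python
# Hypothetical required fields per section, mirroring the release bundle example.
REQUIRED_FIELDS = {
    "code": ["repo_commit", "serving_image"],
    "data": ["training_snapshot", "validation_snapshot"],
    "features": ["feature_view"],
    "model": ["model_version", "artifact_hash"],
    "prompt_policy": ["prompt_version", "guardrail_policy"],
    "eval": ["dataset_versions", "evaluator_versions"],
    "deployment": ["route", "rollback_target"],
}


def missing_bundle_fields(bundle: dict) -> list[str]:
    """Return dotted paths of required fields that are absent or empty."""
    missing = []
    for section, keys in REQUIRED_FIELDS.items():
        values = bundle.get(section) or {}
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
    return missing


# A bundle that versions the model but has no eval datasets and no rollback target.
incomplete_bundle = {
    "code": {"repo_commit": "9f2c81a",
             "serving_image": "registry/aml-serving@sha256:2c1..."},
    "data": {"training_snapshot": "AML-TRAIN-2026Q2-v3",
             "validation_snapshot": "AML-VALIDATION-2026Q2-v2"},
    "features": {"feature_view": "aml-alert-features-v6"},
    "model": {"model_version": "1.8.0", "artifact_hash": "c55a..."},
    "prompt_policy": {"prompt_version": "aml-summary-system-prompt-v2.3",
                      "guardrail_policy": "aml-output-policy-v1.2"},
    "eval": {"dataset_versions": [], "evaluator_versions": ["RULE-CITATION-v2"]},
    "deployment": {"route": "aml-copilot-prod"},
}

gaps = missing_bundle_fields(incomplete_bundle)
print(gaps)
```

A non-empty result would map directly to a row in the table above, which makes it a natural input to the G0/G2 checks rather than a documentation lint.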

3.4 Registry and Metadata Layer

A CD4ML architecture needs at least six kinds of registry / metadata capability:

Registry	Managed objects	Must support
Model registry	model artifact, signature, metric, approval status, deployment stage	versioning, stage promotion, owner, risk tag, rollback target
Dataset registry	training / validation / eval / production sample	source lineage, snapshot, schema, classification, retention, allowed use
Feature registry / feature store	feature definition, transform, serving view	training-serving parity, freshness, schema validation, owner
Prompt / policy registry	prompt, rubric, tool instruction, guardrail config	diff, approval, hash, environment promotion, rollback
Experiment / eval registry	training run, eval run, slice metric, judge result	baseline comparison, critical failure, confidence interval, report link
Deployment registry	release bundle, route, traffic split, environment, rollback	immutable manifest, change approval, canary result, incident linkage

3.5 Pipeline Types

Not every AI system needs the same pipeline. A senior architect chooses the pipeline by system type.

System type	Pipeline focus	Release risk
Traditional ML classifier / ranker	data validation, feature engineering, training, model evaluation, serving parity	data drift, feature skew, threshold shift, label leakage
RAG assistant	source ingestion, chunking, embedding, index build, retrieval eval, answer eval	stale docs, ACL failure, wrong citation, index rollback
LLM prompt product	prompt registry, golden set, judge calibration, policy tests, route config	prompt drift, over-refusal, unsafe answer, vendor model change
Agent workflow	tool schema, permission matrix, simulation, state tests, side-effect audit	tool misuse, approval bypass, loop, irreversible action
Decision automation	model + rules + workflow + human override	customer impact, fairness, adverse action, regulatory audit

4. Release Gate Model

4.1 Gate Philosophy

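A release gate is not an approval form that slows the project down. It turns "is the release risk acceptable?" into a repeatable, auditable decision.

Split gates into three types:

Gate type	Purpose	Decision output
Engineering gate	Proves the artifact can be built, tested, reproduced, and deployed	pass / fail
Quality gate	Proves the model, prompt, RAG, and tools meet thresholds on the target task	go / fix / compare again
Risk gate	Proves residual risk matches the business scenario, human controls, and ramp scope	go / limited go / no-go / risk acceptance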

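4.2 Risk-Tiered Release Model

Risk tier	Typical AI use cases	Gate strength	Ramp strategy	Approval requirements
Tier 0 - Prohibited or restricted	Direct automated loan denial, unapproved fund movements, automated regulatory report submission	Not allowed by default; if regulation and internal policy permit it, requires the highest level of review	Does not enter production automation	Executive, Legal, Compliance, Risk, Board-level evidence
Tier 1 - High impact	AML / KYC copilot, fraud review, credit policy RAG, customer-visible answers about fees / rights	Full gate: data, feature, eval, security, privacy, model risk, business sign-off	shadow -> staff pilot -> canary -> limited ramp -> full scale	Business, Model Risk, Compliance, Security, Data Owner
Tier 2 - Medium impact	Internal staff analytics, operations summaries, low-risk recommendations, assistance for non-customer-visible processes	Standard gate: CI, eval, data quality, owner approval, monitoring	staging -> canary -> ramp	Product, Platform, Data Owner, Risk consult
Tier 3 - Low impact	Internal copywriting, low-risk summaries, personal productivity tools	Lightweight gate: security, data classification, basic eval, usage monitoring	direct limited release with monitoring	Product owner, Security policy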

4.3 Gate Stack

Gate	Entry condition	What is checked	Action on failure
G0 Scope and Risk	Use case is registered	approved use, prohibited use, risk tier, customer impact, human role	Return to intake; restrict the use or refuse entry to development
G1 Source and Data Readiness	Data sources are available	data owner, classification, lineage, label policy, schema, sampling, retention	Block training or restrict to sandbox
G2 Build and CI	Code, config, and prompts can be built	unit tests, schema tests, prompt tests, dependency scan, container build	Fix and rebuild
G3 Feature Validation	Features or index have been generated	schema drift, missingness, range, freshness, training-serving skew, ACL	Block training or roll back the feature / index
G4 Offline Eval	Candidate artifact has been produced	golden set, regression set, red-team, slice analysis, cost, latency	no-go or targeted remediation
G5 Risk and Security Review	Eval passes engineering thresholds	privacy, security, model risk, fairness, explainability, human control	limited go, risk acceptance, or no-go
G6 Shadow Readiness	Staging is runnable	production-like traffic replay, no side effect, logging, monitoring dashboard	Stay in shadow and fix observability
G7 Canary	Small share of production traffic	quality, latency, cost, feedback, override, critical failure	Automatic rollback or pause the ramp
G8 Ramp and Scale	Canary is stable	segment expansion, capacity, support readiness, issue aging	Reduce traffic or restrict scenarios
G9 Post-Release Review	Has run in production for a period	expected vs actual, incident, drift, business value, evidence completeness	Update gates, datasets, controls, and training plan

4.4 Hard Gates vs Soft Gates

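Gate item	Hard gate example	Soft gate example
Critical failure	PII leakage, unauthorized tool actions, incorrect customer commitments, and unsupported regulated claims must all be 0	Tone score slightly low on low-risk copy
Data quality	Missing rate of a key feature exceeds threshold; training-serving schema mismatch	Slight freshness delay on a non-critical feature
Eval coverage	A high-risk slice has no sample coverage	Long-tail low-risk intents are under-covered but not yet ramped
Security	Approval missing for a write tool; secret appears in logs	Low- to medium-risk CVE in a dependency, with a patch plan
Monitoring	Production traces cannot be tracked by release_id	A non-critical dashboard tile refreshes late
Rollback	No executable rollback target	Rollback drill takes longer than target but is still executable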

4.5 Release Gate Decision

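Decision	Meaning	Applicable scenario
Go	Meets every hard gate for the target risk tier and release scope	May enter the next stage or production
Limited go	Hard gates pass, but users, regions, products, traffic, features, or human review must be restricted	High-risk slice coverage is insufficient, human review capacity is limited, or some monitoring still needs strengthening
No-go	A critical failure, major regression, missing evidence, or unavailable control exists	Fix and re-run the gate
Rollback	A live version trips a quality, risk, cost, latency, or incident threshold	Switch back to the previous approved release bundle
Risk acceptance	A known residual risk is explicitly accepted, with time limit, scope, compensating controls, and approval	Short-term business need is high and the risk is controllable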
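
The hard / soft distinction and the decision table above can be combined into a small decision function. The sketch below is illustrative only: the `GateCheck` type, the check names, and the rule that a soft-gate failure maps to "limited go" are assumptions made for this example, and a real gate board would also weigh risk acceptance and scope.

```python
from dataclasses import dataclass


@dataclass
class GateCheck:
    name: str
    passed: bool
    hard: bool  # True for a hard gate, False for a soft gate


def gate_decision(checks: list[GateCheck]) -> tuple[str, list[str]]:
    """Map check results to a decision plus the names of the failing checks."""
    hard_failures = [c.name for c in checks if c.hard and not c.passed]
    soft_failures = [c.name for c in checks if not c.hard and not c.passed]
    if hard_failures:
        # Any failed hard gate blocks the release.
        return "no-go", hard_failures
    if soft_failures:
        # Assumption: soft failures restrict scope instead of blocking.
        return "limited go", soft_failures
    return "go", []


checks = [
    GateCheck("pii_leakage_is_zero", passed=True, hard=True),
    GateCheck("rollback_target_exists", passed=True, hard=True),
    GateCheck("long_tail_intent_coverage", passed=False, hard=False),
]
decision, reasons = gate_decision(checks)
print(decision, reasons)
```

Returning the failing check names alongside the decision keeps the output usable as a line item in a gate memo.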

5. Promotion Workflow

5.1 Environments and Promotion Path

local / notebook
  -> experiment workspace
  -> controlled training pipeline
  -> staging
  -> shadow
  -> canary
  -> limited production
  -> scaled production
  -> post-release review

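Each environment has a different job:

Environment	Purpose	Allowed	Prohibited
Local / notebook	Exploration, feature hypotheses, quick experiments	Use de-identified samples; generate experiment ideas	Connecting directly to production data or production tools
Experiment workspace	Controlled experiments and baseline comparison	Record runs, datasets, parameters, and metrics	Manually copying artifacts to production
Training pipeline	Reproducible training and evaluation	Pin data snapshot, feature version, and container	Running unregistered data or unreviewed dependencies
Staging	Validation in a production-like environment	Use approved config and mock / replay traffic	Causing side effects for real customers
Shadow	Online side-path evaluation	Read real requests; outputs do not affect users or system state	Triggering write tools or customer-visible content
Canary	Small share of real traffic	Limit by user, product, region, and business hours	Expanding without monitoring and automatic rollback
Limited production	Controlled ramp	Expand by segment, with human review and enhanced monitoring	Crossing into unevaluated scenarios
Scaled production	Operation at scale	Routine monitoring, drift detection, periodic gates	Vendor or policy changes bypassing the gate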

5.2 CI / CD / CT for ML

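Capability	Meaning in ordinary software	Extension in ML / AI
CI	Code build, unit tests, static checks	Tests cover pipeline code, training code, serving code, feature transforms, prompts, policies, schemas, and eval code
CD	Automatically deploy artifacts that pass the gate	Model / prompt / index / policy / route are deployed in stages as a release bundle
CT	Usually does not exist	Training and re-validation are triggered by data drift, label arrival, business rule changes, model degradation, or a schedule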

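5.3 CI Checklist

Check	Example
Pipeline unit tests	Data ingestion, join, sampling, split, and label transform introduce no leakage
Feature transform tests	Feature computation is stable on boundary, missing, and outlier values
Schema tests	Training, validation, and serving schemas are compatible
Prompt tests	Prohibited use, output format, tool-calling rules, and refusal rules pass smoke cases
Policy tests	Guardrail, DLP, permission, and threshold rules are explainable and regression-testable
Eval code tests	Evaluators do not depend on implicit global state; metric computation is reproducible
Reproducibility tests	With snapshot, container, and parameters pinned, a rerun produces explainable differences
Security tests	Scans of dependencies, secrets, containers, data egress, and tool permissions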

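5.4 Continuous Training Triggers

Continuous training must not be equated with "new data arrives, so deploy automatically". Triggering training and triggering a release are two separate things.

Trigger	Training action	Release action
Data drift	Train a candidate on the new distribution, or recalibrate	Must pass offline eval, slice comparison, and canary
Label arrival	Backtest and retrain with the new labels	Enters the release gate if the gain is significant and there is no regression
Business rule change	Update labeling rules, features, prompt, or policy	Decide by materiality whether it is a major release
Performance degradation	Produce a fix candidate	Open an incident / issue; the fix must be proven before release
Scheduled refresh	Periodic training	Low-risk systems may auto-generate candidates; high-risk systems still need approval
Vendor model change	Re-evaluate the route or pin the old version	Pointing at latest without a gate is not allowed
RAG source update	Rebuild the index / embeddings	Promote only after retrieval eval, freshness check, and ACL check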
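
The separation between "trigger training" and "trigger release" can be made explicit in the object a trigger handler returns. The following is a minimal sketch under stated assumptions: the trigger identifiers, the `plan_for_trigger` function, and the returned dict shape are hypothetical, while the tier-to-gate mapping follows the risk-tier table in section 4.2.

```python
# Hypothetical identifiers for the triggers listed in the table above.
KNOWN_TRIGGERS = {
    "data_drift", "label_arrival", "business_rule_change",
    "performance_degradation", "scheduled_refresh",
    "vendor_model_change", "rag_source_update",
}

# Gate strength per tier, as in section 4.2. Tier 0 is excluded on purpose.
GATE_BY_TIER = {1: "full gate", 2: "standard gate", 3: "lightweight gate"}


def plan_for_trigger(trigger: str, risk_tier: int) -> dict:
    """Describe what a trigger may start. Promotion is never automatic here."""
    if trigger not in KNOWN_TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    if risk_tier not in GATE_BY_TIER:
        raise ValueError(f"tier {risk_tier} is not eligible for candidate builds")
    return {
        "trigger": trigger,
        "build_candidate": True,   # training may start automatically
        "auto_promote": False,     # release always goes through a gate
        "required_gate": GATE_BY_TIER[risk_tier],
    }


plan = plan_for_trigger("data_drift", risk_tier=1)
print(plan)
```

Keeping `auto_promote` as an explicit, always-false field is a design choice for auditability: a reviewer can see in the record that the trigger produced a candidate and nothing more.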

5.5 Feature Validation

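Feature validation is a hard gate for MLOps releases. It is more specific than "data quality" because it directly affects model behavior.

Validation dimension	Question	Failure example
Schema	Are field names, types, enums, and units consistent?	`income_amount` changes from monthly income to annual income
Distribution	Has the distribution shifted abnormally from the training baseline?	Transaction amount distribution for the fraud model suddenly shifts right
Missingness	Does the missing rate exceed the threshold?	Missing rate of KYC document OCR fields rises from 4% to 22%
Freshness	Are features updated within SLA?	AML transaction velocity features are 2 days late
Range	Are numeric ranges plausible?	A high share of anomalies such as negative ages or transaction amounts of 0
Cardinality	Have category values exploded or gone missing?	Merchant category gains a large number of unknown values
Label leakage	Do features contain future information or a proxy for the target?	Chargeback outcome enters the training features
Training-serving skew	Is online feature computation consistent with training?	Training uses net amount, serving uses gross amount
Access / ACL	Is the feature permitted for this use case?	Sensitive customer attributes enter a recommendation model where they are not permitted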
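
Two of the dimensions above, missingness and range, are simple enough to sketch in plain Python. The example below is a minimal illustration, assuming rows arrive as dicts and the feature is numeric. The function name `validate_feature`, the thresholds, and the sample data are hypothetical; a production pipeline would more likely use a dedicated validation library.

```python
def validate_feature(rows, field, max_missing_rate, low, high):
    """Return a list of human-readable failures for one numeric feature."""
    if not rows:
        return ["no rows to validate"]
    failures = []
    values = [row.get(field) for row in rows]

    # Missingness: share of rows where the feature is absent or None.
    missing_rate = sum(v is None for v in values) / len(values)
    if missing_rate > max_missing_rate:
        failures.append(
            f"missingness {missing_rate:.2f} exceeds {max_missing_rate:.2f}"
        )

    # Range: present values must fall inside the plausible interval.
    out_of_range = [v for v in values if v is not None and not (low <= v <= high)]
    if out_of_range:
        failures.append(
            f"range: {len(out_of_range)} value(s) outside [{low}, {high}]"
        )
    return failures


sample = [{"age": 34}, {"age": None}, {"age": -2}, {"age": 51}]
failures = validate_feature(sample, "age", max_missing_rate=0.10, low=0, high=120)
print(failures)
```

An empty list means the feature passed both checks. Any entry would block G3 under the hard-gate rule stated above.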

5.6 Shadow / Canary / Ramp

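Stage	Goal	Traffic	Success criteria	Stop conditions
Shadow	Validate behavior on real inputs without affecting users	0% customer impact	Output quality, trace, latency, cost, and policy checks pass	Critical failure, missing logs, abnormal cost
Canary 1	Small group of staff or a low-risk segment	1%-5%	No critical failure; human override rate within threshold	Any high-risk failure or SLO breach
Canary 2	Expand to representative segments	5%-20%	Slices across products, regions, and channels are stable	A slice degrades significantly
Controlled ramp	Expand in stages	20%-50%	Quality, business value, support queue, and cost are stable	Incident, complaints, human review backlog
Full scale	Routine production	Target traffic	Continuous monitoring and periodic review	drift, vendor change, policy change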
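
The staged ramp above behaves like a small state machine. The sketch below shows one possible controller, assuming two simplified inputs per review: a count of critical failures and a single SLO flag. The stage identifiers, the `next_action` function, and the choice to hold (not roll back) on an SLO breach are assumptions for illustration.

```python
STAGES = ["shadow", "canary_1", "canary_2", "controlled_ramp", "full_scale"]


def next_action(stage: str, critical_failures: int, slo_ok: bool) -> tuple[str, str]:
    """Return (action, stage): roll back, hold at the stage, or promote."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if critical_failures > 0:
        # Any critical failure stops the ramp at the stage where it occurred.
        return "rollback", stage
    if not slo_ok or stage == STAGES[-1]:
        # Assumption: an SLO breach pauses the ramp; full scale has no next stage.
        return "hold", stage
    return "promote", STAGES[STAGES.index(stage) + 1]


print(next_action("canary_1", critical_failures=0, slo_ok=True))
print(next_action("canary_2", critical_failures=1, slo_ok=True))
```

In practice each stage would carry its own success criteria and stop conditions from the table, so the two inputs here stand in for a richer set of metrics.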

5.7 Rollback Strategy

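AI rollback is more than "rolling back code". It must be possible per component.

Rollback type	Applicable scenario	Action
Route rollback	A new model, prompt, or policy degrades	Model gateway or feature flag switches back to the previous approved release bundle
Model rollback	Model artifact introduces a regression	Revert the registry stage; endpoint points to the old model version
Prompt rollback	Prompt change causes refusals, overstepping, or tone degradation	Switch the prompt registry pointer back to the old version and record an incident
Index rollback	New RAG index retrieves stale, wrong, or low-privilege documents	Restore the old index snapshot and pause source ingest
Feature rollback	Feature computation error or schema drift	Switch back to the old feature view or disable the affected feature
Tool rollback	Agent misuses a write tool or approval is missing	Disable write tools; switch to read-only / draft-only mode
Policy rollback	Guardrail over-blocks or under-blocks	Restore the old policy config and strengthen human review
Batch quarantine	Batch outputs may be wrong	Pause downstream consumption, mark for review, and keep wrong results out of customer processes
Compensation	A business side effect has already occurred	Reversal, correcting entry, customer remediation, regulatory or audit record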
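
Route rollback, the first row above, depends on knowing the previous approved release bundle for each route. The sketch below models that as an in-memory history per route. The `RouteTable` class and its methods are hypothetical; a real system would persist this in the deployment registry and record each change as evidence.

```python
class RouteTable:
    """Tracks the approved release history per route (illustrative only)."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def promote(self, route: str, release_id: str) -> None:
        """Make release_id the active release for the route."""
        self._history.setdefault(route, []).append(release_id)

    def active(self, route: str) -> str:
        return self._history[route][-1]

    def rollback(self, route: str) -> str:
        """Return to the previous approved release, or fail loudly."""
        history = self._history.get(route, [])
        if len(history) < 2:
            raise RuntimeError(f"no rollback target for route {route!r}")
        history.pop()
        return history[-1]


routes = RouteTable()
routes.promote("aml-copilot-prod", "AIREL-AML-COPILOT-2026-006")
routes.promote("aml-copilot-prod", "AIREL-AML-COPILOT-2026-007")
restored = routes.rollback("aml-copilot-prod")
print(restored)
```

The explicit error when no earlier release exists mirrors the hard gate in section 4.4: a release with no executable rollback target should not have been promoted in the first place.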

5.8 Promotion Workflow Example

1. The product owner submits an AI release change request.
2. The release manager confirms risk tier, scope, and affected components.
3. The pipeline pins the code commit, data snapshot, feature version, and prompt version.
4. CI runs code, config, schema, prompt, policy, and security tests.
5. The training pipeline produces the candidate model / prompt / index artifact.
6. The eval runner evaluates against the golden, regression, red-team, and slice sets.
7. The model registry records artifact, metrics, lineage, and approval status.
8. The release gate board reviews eval, risk, security, rollback, and monitoring.
9. The release enters staging and shadow to confirm trace, latency, cost, policy, and fallback.
10. The canary goes live on a small share of traffic, with automatic monitoring of critical failures, overrides, complaints, and cost.
11. The ramp expands by segment, and every stage produces evidence.
12. The post-release review updates datasets, controls, the issue log, and the portfolio record.

6. Financial Retail Examples

6.1 Credit Policy RAG Assistant

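Dimension	Design
Use case	Answers internal policy questions for credit operations and underwriting staff; must cite approved policy clauses
Release bundle	prompt, embedding index, policy document snapshot, retriever, reranker, answerability gate, citation judge
High-risk failures	Citing stale policy, answering without evidence, presenting internal advice as a customer decision, incorrect statements related to adverse action
Gate	source freshness, ACL, retrieval recall, citation correctness, regulated refusal, SME review
Shadow / canary	Replay historical questions first, then open to a small group of trained underwriters
Rollback	Roll back the index, disable free-form generation, fall back to policy portal search and SME escalation
Evidence	policy source manifest, index lineage, eval report, SME sign-off, release gate memo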

6.2 AML Alert Narrative Copilot

Dimension	Design
Use case	Helps AML analysts summarize transaction evidence and draft the investigation narrative
Release bundle	model route, prompt, case evidence tool, transaction feature snapshot, groundedness judge, red-team set
High-risk failures	Unsupported suspicious activity conclusion, incorrect entity merging, omitted key evidence, improper PII handling
Gate	Critical failures are 0; historical incident regression passes 100%; SMEs review the high-risk slice
Shadow / canary	Generates analyst drafts only; never writes to a SAR or closes a case automatically
Rollback	Disable narrative generation, keep the evidence summary, allow manual drafting only
Evidence	trace sample, tool audit, human edit rate, model risk validation note

6.3 Fraud Decisioning Model

Dimension	Design
Use case	Real-time transaction fraud scoring that supports approve / challenge / decline routing
Release bundle	feature view, training data snapshot, model artifact, threshold config, decision policy, monitoring dashboard
High-risk failures	False positives wrongly block customer transactions; false negatives increase losses; feature delays distort scores
Gate	ROC / PR, cost-based threshold, segment fairness, latency, feature freshness, shadow backtest
Shadow / canary	Shadow first, recording recommendations without affecting transactions, then canary on a low-risk segment
Rollback	threshold rollback, model rollback, fallback rules, human review queue
Evidence	threshold decision memo, business loss simulation, segment analysis, rollback rehearsal

6.4 Payment Dispute Agent

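Dimension	Design
Use case	Reads dispute materials, suggests next steps, and drafts customer communication; may create a case note after approval
Release bundle	prompt, tool schema, approval workflow, idempotency policy, dispute document RAG, eval set
High-risk failures	Duplicate provisional credit, wrongly closed dispute, incorrect customer commitment, regulatory complaint not escalated
Gate	tool simulation, approval UI test, idempotency test, regulated phrase block, complaint escalation test
Shadow / canary	Draft-only with write tools off; then enable post-approval case note writing for trained agents
Rollback	Disable write tools, keep the read-only summary, freeze affected dispute outputs
Evidence	tool call ledger, approval record, customer communication review, incident drill result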

6.5 Customer-Facing Service AI

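Dimension	Design
Use case	Customers ask about accounts, fees, benefits, disputes, and service processes
Release bundle	model route, prompt, policy guardrail, answer templates, handoff rule, DLP, conversation eval
High-risk failures	Fabricated fee-waiver promises, incorrect complaint rights, PII leakage, failure to hand off to a human
Gate	Customer-visible critical failures are 0; tone, disclosure, handoff, and policy compliance pass
Shadow / canary	Internal agent assist -> small share of customer conversations -> ramp by topic
Rollback	Disable free-form answers, keep templated FAQ and human handoff
Evidence	conversation QA, complaint keyword monitoring, DLP report, handoff SLA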

7. Controls and Evidence

7.1 Control Objective Map

Control objective	Control question	Evidence
Release scope is approved	Are the AI use case, risk tier, release scope, and prohibited uses clear?	use case card, risk tier memo, approved use record
Artifact is reproducible	Can training and evaluation be rerun on the same snapshot?	release bundle, pipeline run ID, container digest, dataset snapshot
Data is governed	Are data source, classification, lineage, quality, and access under control?	dataset card, data quality report, access approval, retention rule
Features are valid	Do features meet schema, freshness, range, and serving parity requirements?	feature validation report, skew report, feature owner approval
Model is validated	Does the model meet quality, stability, fairness, cost, and latency requirements?	experiment report, slice analysis, model card, validation memo
Prompt / policy is controlled	Are prompts, guardrails, and tool instructions versioned and approved?	prompt diff, policy test result, config hash
Eval is meaningful	Do the eval data and evaluators cover the target risks?	eval contract, dataset card, evaluator card, calibration report
Release is risk-tiered	Do gate strength and ramp strategy match the risk?	gate memo, approval trail, limited-go conditions
Deployment is observable	Can production track release_id, artifact, trace, and business impact?	deployment manifest, trace schema, dashboard
Rollback is executable	Can the system return quickly to a controlled state after a failure?	rollback runbook, rollback rehearsal, previous release bundle
Governance is auditable	Can decisions, exceptions, issues, and monitoring be audited?	evidence binder, issue log, attestation, review minutes

7.2 Reproducibility Minimum Evidence

Every model or AI system release must retain:

Evidence	Minimum content
Code snapshot	repo, commit, branch protection, reviewer, build ID
Runtime environment	container digest, base image, Python / package lock, hardware profile
Data snapshot	source systems, query, partition, time window, sampling, row count, hash
Label snapshot	labeling guide, label source, reviewer role, quality check, label version
Feature snapshot	feature definition, transform code, feature store version, freshness
Training run	parameters, seed, algorithm, training metrics, resource usage
Model artifact	artifact URI, hash, signature, input/output schema, calibration
Prompt / policy	prompt text hash, policy config, tool schema, guardrail version
Eval run	dataset version, evaluator version, thresholds, slice metrics, raw failures
Deployment	route config, traffic split, feature flags, environment variables
Approval	decision maker, decision date, conditions, exceptions, expiry

7.3 Artifact Lineage

Every production prediction / response should be traceable to:

request_id
release_id
use_case_id
risk_tier
model_version
prompt_version
policy_version
feature_view_version
dataset_or_index_version
tool_schema_version
eval_gate_id
deployment_route
human_review_status
monitoring_tags

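For high-risk retail financial services AI, the trace should also retain:

Trace element	Why it matters
input classification	Determines whether the input contains sensitive categories such as PII, PCI, AML, credit, complaint, or vulnerable customer
retrieved evidence	Whether the RAG answer is supported by evidence, and whether stale or low-privilege documents were used
feature values	Whether a traditional ML decision can be explained and recomputed
policy decisions	Why guardrail, DLP, handoff, or tool permission allowed or blocked an action
tool call ledger	Whether the agent triggered side effects, and whether they went through approval and idempotency controls
human override	Whether a human accepted, edited, or rejected the AI output
customer impact flag	Whether the output is customer-visible and whether it affects accounts, transactions, complaints, credit, or regulatory processes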
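
One way to make the lineage fields above hard to omit is to define the trace record as a typed structure and serialize it on every response. The sketch below covers only a subset of the listed fields. The `TraceRecord` class, the `to_log_line` helper, the default review status, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Subset of the lineage fields; every field without a default is mandatory."""
    request_id: str
    release_id: str
    use_case_id: str
    risk_tier: str
    model_version: str
    prompt_version: str
    policy_version: str
    human_review_status: str = "pending"  # assumed default, not from the source


def to_log_line(record: TraceRecord) -> str:
    """Serialize with sorted keys so log lines are stable and easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)


record = TraceRecord(
    request_id="req-0001",
    release_id="AIREL-AML-COPILOT-2026-007",
    use_case_id="AML-COPILOT-001",
    risk_tier="Tier 1",
    model_version="1.8.0",
    prompt_version="aml-summary-system-prompt-v2.3",
    policy_version="aml-output-policy-v1.2",
)
line = to_log_line(record)
print(line)
```

Because the dataclass has no defaults for the version fields, constructing a record without a `release_id` fails at call time, which is the behavior the monitoring hard gate in section 4.4 relies on.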

7.4 Model Registry Minimum Fields

Field	Example
model_id	`fraud-realtime-score-v3`
model_version	`3.4.2`
artifact_hash	`sha256:7b8f...`
training_run_id	`TRN-FRAUD-2026-0521`
training_dataset	`FRAUD-TRAIN-2026Q2-v5`
feature_view	`fraud-auth-features-v11`
model_signature	input schema, output schema, score range
owner	Fraud Analytics Lead
risk_tier	Tier 1
approved_stage	staging / shadow / canary / production
approval_conditions	low-risk card-present segment, human review for high-value decline
eval_report	`EVAL-FRAUD-2026-014`
rollback_target	`fraud-realtime-score-v3:3.3.9`
monitoring_dashboard	production quality and drift dashboard
review_expiry	next quarterly validation date

7.5 Governance Evidence Binder

Generate an evidence binder for every Tier 1 / Tier 2 release:

Section	Evidence
Executive summary	release scope, risk tier, decision, conditions, rollback target
Architecture	pipeline diagram, component map, data flow, tool boundary, human role
Data and feature	dataset card, feature validation, lineage, privacy classification
Model / prompt / policy	model card, prompt diff, policy config, tool schema, model registry record
Eval	eval contract, golden / regression / red-team result, slice analysis, failure review
Security / privacy	DLP, access control, threat model, dependency scan, egress control
Release decision	gate memo, approval trail, exceptions, limited-go conditions
Deployment	manifest, traffic plan, shadow/canary metrics, runbook
Monitoring	dashboard snapshot, alert rules, sampling plan, human review queue
Rollback	rollback plan, rollback target, drill evidence, incident trigger
Post-release	actual outcomes, issues, remediation, dataset updates, review minutes

8. Templates

8.1 AI Release Gate Memo

# AI Release Gate Memo: AML Copilot Narrative v2.3

## Decision
- decision: Limited go to canary
- release_id: AIREL-AML-COPILOT-2026-007
- use_case_id: AML-COPILOT-001
- risk_tier: Tier 1
- decision date: 2026-06-29
- rollback target: AIREL-AML-COPILOT-2026-006

## Scope
- approved use: analyst-facing draft narrative and evidence summary
- prohibited use: final SAR decision, automatic regulatory submission, automatic case closure
- canary scope: 15 trained Tier 2 AML analysts, high-risk alerts excluded from first 72 hours
- human control: analyst must review and edit before saving to case record

## Release Bundle
- code commit: 9f2c81a
- serving image: registry/aml-serving@sha256:2c1...
- model version: aml-narrative-ranker 1.8.0
- prompt version: aml-summary-system-prompt v2.3
- evidence tool schema: case-evidence-tool v1.1
- transaction feature view: aml-alert-features v6
- RAG index: aml-policy-index-2026Q2-v2

## Eval Results
| Gate metric | Result | Threshold | Decision |
|---|---:|---:|---|
| Critical unsupported claim | 0 | 0 | pass |
| PII leakage | 0 | 0 | pass |
| Citation correctness | 96.8% | 95.0% | pass |
| Historical incident regression | 100% | 100% | pass |
| High-risk slice groundedness | 97.1% | 97.0% | pass |
| P95 latency | 7.8s | 9.0s | pass |

## Conditions
- first 72 hours: all generated narratives reviewed by AML Quality Lead sample queue
- high-risk entity ambiguity cases remain disabled until targeted sample count reaches approved coverage
- daily release check for citation correctness, unsupported claim, analyst major rewrite rate and latency

## Rollback
- automatic rollback trigger: any unsupported suspicious activity conclusion saved to case record
- manual rollback trigger: citation correctness below threshold for two consecutive daily reviews
- rollback action: route pointer returns to AIREL-AML-COPILOT-2026-006 and narrative generation is disabled for affected alerts

## Approval
- Business owner: AML Operations Director
- Model Risk: Model Risk VP
- Compliance: BSA/AML Compliance Lead
- Platform: AI Platform Owner
- Security / Privacy: approved for canary scope
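
The Eval Results table in the memo above can be checked mechanically before a human signs it. The sketch below copies the memo's results and thresholds. The metric identifiers, the `evaluate_gate` function, and the direction of each comparison ("max" for counts and latency, "min" for percentages) are assumptions, since the memo does not state directions explicitly.

```python
# (metric, result, threshold, direction) taken from the memo's Eval Results table.
# "max": the result must not exceed the threshold. "min": it must reach it.
EVAL_RESULTS = [
    ("critical_unsupported_claim", 0, 0, "max"),
    ("pii_leakage", 0, 0, "max"),
    ("citation_correctness_pct", 96.8, 95.0, "min"),
    ("historical_incident_regression_pct", 100.0, 100.0, "min"),
    ("high_risk_slice_groundedness_pct", 97.1, 97.0, "min"),
    ("p95_latency_s", 7.8, 9.0, "max"),
]


def evaluate_gate(rows) -> dict[str, str]:
    """Return 'pass' or 'fail' per metric, based on its comparison direction."""
    outcome = {}
    for name, result, threshold, direction in rows:
        if direction == "max":
            passed = result <= threshold
        elif direction == "min":
            passed = result >= threshold
        else:
            raise ValueError(f"unknown direction for {name}: {direction}")
        outcome[name] = "pass" if passed else "fail"
    return outcome


outcome = evaluate_gate(EVAL_RESULTS)
print(outcome)
```

Storing the direction next to each threshold avoids the common mistake of treating every metric as "higher is better" when latency and failure counts run the other way.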

8.2 Model Promotion Card

Field	Example
Promotion ID	PROMO-FRAUD-2026-018
From stage	shadow
To stage	canary
Model	fraud-realtime-score-v3 version 3.4.2
Release bundle	AIREL-FRAUD-2026-018
Baseline	production version 3.3.9
Candidate benefit	4.2% improvement in fraud capture at same false positive cost
Critical risks	false positive in vulnerable customer segment, feature freshness delay
Required controls	feature freshness alert, high-value decline human review, threshold rollback
Decision	promote to 5% low-risk card-present traffic
Expiry	decision expires after 14 days if ramp not completed

8.3 Dataset and Feature Validation Card

Field	Example
Validation ID	DATA-FEATURE-AML-2026-011
Dataset snapshot	AML-TRAIN-2026Q2-v3
Feature view	aml-alert-features-v6
Owner	AML Data Product Owner
Source systems	transaction monitoring warehouse, KYC profile store, case management
Sampling method	stratified by alert type, risk score band, entity type and investigation outcome
Schema result	pass, 0 incompatible fields
Missingness result	pass, all critical features below threshold
Freshness result	pass, transaction velocity features updated within SLA
Leakage check	pass, post-investigation outcome fields excluded
Access control	pass, restricted AML fields available only in approved environment
Decision	approved for Tier 1 candidate training and offline eval

8.4 Experiment Comparison Report

Section	Content
Experiment ID	EXP-CREDIT-RAG-2026-009
Baseline	prompt v1.7 + index 2026Q2-v1 + model route A
Candidate	prompt v1.8 + index 2026Q2-v2 + model route A
Dataset	CREDIT-GOLDEN-2026Q2-v3, CREDIT-REGRESSION-2026Q2-v2, CREDIT-REDTEAM-2026Q2-v1
Primary outcome	citation correctness improved from 93.4% to 96.2%
Critical failures	0 in baseline, 0 in candidate
Material regression	candidate over-refusal increased in small business lending policy slice
Cost / latency	latency +0.6s due to reranker; cost within approved budget
Decision	limited go for consumer credit policy; small business slice remains on baseline
Evidence	eval run report, failure sample review, SME sign-off, prompt diff

8.5 Canary Plan

Field	Example
Canary ID	CANARY-CX-AI-2026-004
Use case	customer-facing fee and account service assistant
Entry condition	shadow passed for 10 business days, 0 customer-visible critical failures
Traffic	2% authenticated web chat, excluding complaint, hardship and vulnerable customer tags
Duration	5 business days before ramp decision
Monitored metrics	policy violation, incorrect fee statement, handoff failure, DLP hit, customer negative feedback, p95 latency
Auto rollback	any incorrect fee waiver commitment, any PII leak, handoff failure above threshold
Manual review	daily QA sample of 100 conversations plus all negative feedback with fee / complaint keywords
Exit decision	go to 10% ramp, extend canary, rollback or no-go

8.6 Rollback Runbook

Step	Owner	Action	Evidence
1	Incident Commander	Declare rollback trigger and freeze ramp	incident / decision log
2	Platform Owner	Set route to previous approved release bundle	deployment event, route diff
3	Model Registry Owner	Confirm production stage points to rollback target	registry audit record
4	Prompt / Policy Owner	Revert prompt and guardrail config if affected	prompt registry version diff
5	Data / RAG Owner	Revert index or feature view if affected	index manifest, feature view version
6	Business Owner	Activate fallback workflow and user communication	operations notice
7	Risk / Compliance	Confirm impacted population query and evidence preservation	impact query result
8	EvalOps Owner	Add failure cases to regression set and validate fix	regression run ID
9	Release Manager	Prepare restart gate memo	restart decision record

8.7 Continuous Training Trigger Record

Field	Example
Trigger ID	CT-FRAUD-2026-021
Trigger source	production drift dashboard
Trigger condition	card-not-present merchant category distribution shifted beyond approved threshold
Candidate action	train fraud model candidate using 2026Q2-v6 snapshot
Release rule	candidate generation is automatic; production promotion requires full Tier 1 gate
Required eval	segment performance, false positive cost, vulnerable customer slice, latency, feature freshness
Owner	Fraud Analytics Lead
Decision	candidate trained and held in staging pending model risk review

8.8 Evidence Binder Index

Binder section	Artifact
01 Scope	use case card, risk tier memo, approved/prohibited use
02 Architecture	CD4ML diagram, component map, data flow, rollback map
03 Data	dataset card, feature validation, lineage and access evidence
04 Build	CI report, container digest, dependency and security scan
05 Model	model card, training run, registry entry, calibration record
06 Prompt / Policy	prompt diff, policy tests, tool schema review
07 Eval	eval contract, experiment report, slice failures, SME review
08 Gate	release gate memo, approvals, limited-go conditions
09 Deployment	manifest, shadow result, canary plan, ramp log
10 Monitoring	dashboard, alert rules, human review sample, drift report
11 Rollback	rollback target, runbook, drill result, actual rollback record
12 Post-release	review minutes, issues, remediation, regression updates

9. 30-Day Training Plan

Goal: within 30 days, build a presentable CD4ML / MLOps release engineering portfolio around one retail financial services AI use case. Recommended main threads: Credit Policy RAG, AML Copilot, Fraud Scoring Model, Payment Dispute Agent, or Customer-Facing Service AI.

Day	Task	Artifact
1	Pick a use case; define the business goal, users, customer impact, and prohibited uses	Use Case Card
2	Determine the risk tier and write down why it is Tier 1 / Tier 2 / Tier 3	Risk Tier Memo
3	Draw the AS-IS / TO-BE AI release lifecycle	Release Lifecycle Map
4	Design the CD4ML reference architecture	Architecture Diagram
5	Break down the release bundle: code, data, feature, model, prompt, policy, eval, deployment	Release Bundle Spec
6	Design model registry fields and stage promotion	Model Registry Spec
7	Design the dataset registry and dataset card	Dataset Governance Pack
8	Design feature validation checks	Feature Validation Card
9	Design the prompt / policy registry and the prompt diff process	Prompt Governance Spec
10	Design CI checks: code, schema, prompt, policy, security	CI Checklist
11	Design the training pipeline and reproducibility evidence	Training Pipeline Spec
12	Design eval datasets: golden, regression, red-team, slice	Eval Dataset Plan
13	Design evaluators: deterministic, human, judge, business metric	Evaluator Card
14	Write a baseline vs candidate experiment report	Experiment Report
15	Design the release gate stack: G0 through G9	Gate Model
16	Write a sample Tier 1 release gate memo	Gate Memo
17	Design the shadow process and production-like replay	Shadow Plan
18	Design the canary and ramp strategy	Canary Plan
19	Design the rollback matrix: model, prompt, index, feature, tool, route	Rollback Matrix
20	Design continuous training triggers and release constraints	CT Trigger Policy
21	Design the monitoring dashboard: quality, drift, cost, latency, human override	Monitoring Spec
22	Design artifact lineage and the trace schema	Lineage Spec
23	Design the governance evidence binder	Evidence Binder Index
24	Write retail financial services case 1: Credit Policy RAG	Case Study 1
25	Write retail financial services case 2: AML / Fraud / Payment Agent	Case Study 2
26	Run a release tabletop: candidate fail, canary rollback, vendor change	Tabletop Decision Log
27	Write the post-release review template and issue loop	Post-Release Review
28	Write a build vs buy ADR: TFX / managed MLOps / internal platform	Architecture Decision Record
29	Assemble a 15-page portfolio pack	Portfolio Deck
30	Prepare 8-10 interview answers and a 5-minute talk	Interview Pack

10. Interview Answers

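Q1: What is the biggest difference between CD4ML and ordinary CI/CD?

Version	Answer
30 seconds	Ordinary CI/CD mainly releases code; CD4ML releases a combination of code, data, feature, model, prompt, eval, and deployment route. It must not only deploy, but also reproduce training, demonstrate quality, ramp in stages, monitor drift, and roll back to a controlled state.
2 minutes	I think of CD4ML as AI release engineering. In traditional software, a single commit builds an artifact, and once it passes tests it can be deployed. In ML/AI, model behavior depends on training data, labels, features, parameters, prompts, the RAG index, tool schemas, and the production distribution. So a release has to be a bundle: code commit, data snapshot, feature view, model artifact, prompt/policy version, eval run, deployment manifest, and rollback target. In financial retail, this evidence also has to go into the release gate and the audit binder.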

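Q2: Why does a model release need model/data/code/prompt version coupling?

Version	Answer
30 seconds	Because model behavior is not determined by the model artifact alone. A change to data, features, prompt, policy, RAG index, or serving route can each change the output. Without version coupling, you cannot reproduce, audit, roll back, or explain an incident.
2 minutes	I manage the coupling with a release bundle. For example, one version of an AML Copilot includes not just the model but also the transaction features, case evidence tool, prompt, guardrail policy, RAG index, eval dataset, and judge version. During an incident, knowing only the model version does not tell you whether a new prompt caused overconfidence, the index cited an outdated policy, or the tool schema allowed a wrong action. Version coupling is what makes lineage, gating, rollback, and model risk evidence possible.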
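
The release bundle idea in Q2 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `ReleaseBundle` class, its field names, the `release_id` scheme, and all version strings are assumptions made up for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ReleaseBundle:
    """Pins every component version that shapes behavior (field names are illustrative)."""
    code_commit: str
    data_snapshot: str
    feature_view: str
    model_artifact: str
    prompt_version: str
    policy_version: str
    rag_index: str
    tool_schema: str
    eval_run: str
    rollback_target: str

    def release_id(self) -> str:
        # Hash the sorted JSON form so identical component sets always get the same id.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def diff_bundles(old: ReleaseBundle, new: ReleaseBundle) -> dict:
    """Return {field: (old_value, new_value)} for every component that changed."""
    old_d, new_d = asdict(old), asdict(new)
    return {k: (old_d[k], new_d[k]) for k in old_d if old_d[k] != new_d[k]}


# Made-up version strings for a hypothetical AML Copilot release.
baseline = ReleaseBundle(
    code_commit="a1b2c3d",
    data_snapshot="txn-2024-05",
    feature_view="aml_velocity_v3",
    model_artifact="aml-copilot-1.4",
    prompt_version="p-17",
    policy_version="guardrail-9",
    rag_index="policy-idx-41",
    tool_schema="case-evidence-v2",
    eval_run="eval-220",
    rollback_target="rel-previous",
)
# The candidate changes only the prompt and the RAG index.
candidate = replace(baseline, prompt_version="p-18", rag_index="policy-idx-42")

print(diff_bundles(baseline, candidate))
```

During an incident, the diff between the live bundle and its rollback target narrows the investigation to the components that actually changed, which is the point the 2-minute answer makes about prompt, index, and tool schema.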

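Q3: How do CI, CD, and CT divide the work in MLOps?

Version	Answer
30 seconds	CI validates code, schema, features, prompts, policy, and security; CD deploys a release bundle that has passed the gates in stages; CT generates candidate models in response to data drift, label arrival, business rule changes, or performance degradation, but a candidate cannot bypass the release gate and go to production automatically.
2 minutes	CI is the engineering quality entry point: pipeline tests, feature transform tests, schema tests, prompt tests, policy tests, eval code tests, and dependency scans. CD is deployment orchestration: registry promotion, staging, shadow, canary, ramp, and rollback. CT is continuous training; triggers can be data drift, label arrival, model degradation, or a periodic refresh. The key point is that CT only generates a candidate automatically; it does not release automatically. High-risk financial use cases must pass eval, model risk, business approval, and canary.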
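
The rule that CT produces candidates but never releases them can be expressed as a small guard. The sketch below is illustrative only: `Candidate`, `on_ct_trigger`, `promote`, and the gate names in `REQUIRED_GATES_TIER1` are hypothetical names chosen to mirror the answer above, not part of any real MLOps tool.

```python
from dataclasses import dataclass, field
from typing import Set

# Triggers named in the answer above.
CT_TRIGGERS = {"data_drift", "label_arrival", "model_degradation", "periodic_refresh"}

# Hypothetical gate set for a high-risk use case.
REQUIRED_GATES_TIER1 = {"eval", "model_risk", "business_approval", "canary"}


@dataclass
class Candidate:
    model_id: str
    trigger: str
    stage: str = "candidate"
    passed_gates: Set[str] = field(default_factory=set)


def on_ct_trigger(trigger: str, next_model_id: str) -> Candidate:
    """CT reacts to a trigger by registering a candidate; it never touches production."""
    if trigger not in CT_TRIGGERS:
        raise ValueError(f"unknown CT trigger: {trigger}")
    return Candidate(model_id=next_model_id, trigger=trigger)


def promote(candidate: Candidate, required: Set[str]) -> Candidate:
    """CD promotes only when every required gate has been recorded as passed."""
    missing = required - candidate.passed_gates
    if missing:
        raise PermissionError(
            f"cannot promote {candidate.model_id}; missing gates: {sorted(missing)}"
        )
    candidate.stage = "production"
    return candidate
```

A retrained model enters as `stage="candidate"`, and calling `promote` before all four gates are recorded raises `PermissionError`, so the only path to production runs through the gate evidence.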

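Q4: Why is feature validation a hard requirement in the release gate?

Version	Answer
30 seconds	Features are the business world the model actually sees. An error in schema, freshness, missingness, range, label leakage, or training-serving skew lets the model make wrong decisions even though the deployment looks successful.
2 minutes	Traditional API tests may only prove the service is available, not that the model inputs are correct. For example, if a fraud model is served gross transaction amounts but was trained on net amounts, it will be systematically biased; if an AML velocity feature lags by two days, it will miss key behavior; if a credit model is trained with future outcomes, that is label leakage. Feature validation should check schema, distribution, missingness, freshness, range, cardinality, access control, and training-serving skew. A Tier 1 model that fails these checks should not enter canary.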
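
A few of the checks listed in Q4 can be sketched with the standard library. Everything here is an assumption for illustration: the function name `validate_feature`, the default thresholds, and the sample values are invented, and the relative mean shift is only a crude stand-in for a proper distribution test.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import List, Optional


def validate_feature(
    serving_values: List[Optional[float]],
    training_values: List[float],
    last_updated: datetime,
    now: datetime,
    min_value: float,
    max_value: float,
    max_missing_rate: float = 0.05,                  # illustrative threshold
    max_staleness: timedelta = timedelta(hours=24),  # illustrative threshold
    max_mean_shift: float = 0.25,                    # illustrative threshold
) -> List[str]:
    """Return the names of the failed checks; an empty list means the feature passed."""
    if not serving_values:
        return ["no_serving_data"]
    failures = []
    present = [v for v in serving_values if v is not None]
    # Missingness: share of serving rows with no value.
    if 1 - len(present) / len(serving_values) > max_missing_rate:
        failures.append("missingness")
    # Range: every present value must fall inside the agreed bounds.
    if any(v < min_value or v > max_value for v in present):
        failures.append("range")
    # Freshness: the feature must have been refreshed recently enough.
    if now - last_updated > max_staleness:
        failures.append("freshness")
    # Training-serving skew: relative shift of the serving mean from the training mean.
    train_mean = mean(training_values)
    if present and train_mean != 0:
        if abs(mean(present) - train_mean) / abs(train_mean) > max_mean_shift:
            failures.append("training_serving_skew")
    return failures


now = datetime(2024, 6, 3, tzinfo=timezone.utc)
# Trained on net amounts, served larger gross amounts, refreshed two days ago.
print(validate_feature(
    serving_values=[120.0, 240.0, 60.0, 360.0],
    training_values=[90.0, 180.0, 45.0, 270.0],
    last_updated=now - timedelta(days=2),
    now=now,
    min_value=0.0,
    max_value=10_000.0,
))
```

In this made-up case the check reports a freshness failure and a training-serving skew failure, which under the Q4 rule would keep a Tier 1 model out of canary.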

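Q5: How do you design an AI release gate?

Version	Answer
30 seconds	I design the gate stack by risk tier: scope, data, CI, feature validation, offline eval, security/privacy, shadow, canary, ramp, post-release review. For high-risk use cases, critical failures must be 0 and there must be a rollback target.
2 minutes	A release gate should look at engineering, quality, and risk separately. Engineering covers build, tests, schema, container, and deployment manifest. Quality covers the golden set, regression set, red-team, slice analysis, cost, and latency. Risk covers customer impact, regulated processes, human control, privacy, security, model risk, and residual risk. The decision can be go, limited go, no-go, rollback, or risk acceptance. In financial retail, customer-visible false commitments, PII leakage, unauthorized tool actions, and unsupported regulated claims are all hard stops.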
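
The decision logic in Q5 can be written down as a small function so that the hard stops are not negotiable. This is a sketch under stated assumptions: `GateInput`, `decide`, and the event labels in `HARD_STOPS` are hypothetical, and the treatment of lower tiers (critical failures lead to limited go rather than no-go) is an assumption the source does not spell out.

```python
from dataclasses import dataclass, field
from typing import List

# Hard stops named in the answer above; the string labels are illustrative.
HARD_STOPS = {
    "false_customer_commitment",
    "pii_leak",
    "unauthorized_tool_action",
    "unsupported_regulated_claim",
}


@dataclass
class GateInput:
    risk_tier: int                # 1 = highest risk
    engineering_passed: bool      # build, tests, schema, container, manifest
    critical_failures: int        # from golden / regression / red-team eval
    has_rollback_target: bool
    degraded_slices: List[str] = field(default_factory=list)
    hard_stop_events: List[str] = field(default_factory=list)


def decide(g: GateInput) -> str:
    """Return 'go', 'limited go', or 'no-go' for a candidate release."""
    # Any hard stop blocks the release regardless of tier.
    if HARD_STOPS & set(g.hard_stop_events):
        return "no-go"
    if not g.engineering_passed:
        return "no-go"
    # High-risk use cases need zero critical failures and a rollback target.
    if g.risk_tier == 1 and (g.critical_failures > 0 or not g.has_rollback_target):
        return "no-go"
    # Assumption: remaining quality concerns restrict scope instead of blocking.
    if g.degraded_slices or g.critical_failures > 0:
        return "limited go"
    return "go"
```

Rollback and risk acceptance are left out on purpose: both are decisions made by named approvers after go-live or after a no-go, so they do not belong in an automated pre-release check.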

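Q6: How should shadow, canary, and ramp be used?

Version	Answer
30 seconds	Shadow evaluates real inputs on a side path without affecting users; canary validates quality, cost, latency, and human feedback with a small share of real user traffic; ramp expands by segment. Every step needs success criteria and automatic stop conditions.
2 minutes	I first replay real requests in shadow to confirm trace, policy, latency, cost, and output quality, without triggering write tools or customer-visible actions. Canary starts with a low-risk segment or trained internal users, for example 1%-5% of traffic, while monitoring critical failures, human override, complaints, DLP, latency, and cost. Ramp should not expand mechanically by percentage; it should be staged by product, region, user, risk level, and support capacity. Any degradation in a high-risk slice should pause expansion.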
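
The automatic stop conditions and segment-based ramp from Q6 might look like the following. All names and numbers are assumptions for illustration: the limits in `STOP_LIMITS`, the segment order in `RAMP_PLAN`, and the helper functions are not taken from any real rollout tool.

```python
from typing import Dict, List

# Illustrative stop limits; real values come from the release gate memo.
STOP_LIMITS: Dict[str, float] = {
    "critical_failures": 0,
    "human_override_rate": 0.15,
    "p95_latency_ms": 2000,
    "cost_per_request_usd": 0.05,
}

# Ramp by segment rather than by raw percentage (segment names are hypothetical).
RAMP_PLAN: List[str] = [
    "internal_users",
    "low_risk_segment",
    "standard_segment",
    "high_risk_segment",
]


def breached(metrics: Dict[str, float]) -> List[str]:
    """List every limit that is exceeded; a missing metric counts as a breach."""
    return [
        name for name, limit in STOP_LIMITS.items()
        if metrics.get(name, float("inf")) > limit
    ]


def next_step(current_segment: str, metrics: Dict[str, float]) -> str:
    """Decide whether to pause at the current segment or expand to the next one."""
    violations = breached(metrics)
    if violations:
        return f"pause at {current_segment}: " + ", ".join(violations)
    position = RAMP_PLAN.index(current_segment)
    if position == len(RAMP_PLAN) - 1:
        return "ramp complete"
    return f"expand to {RAMP_PLAN[position + 1]}"
```

Treating a missing metric as a breach is a deliberate design choice in this sketch: if monitoring is not reporting, the ramp should pause rather than assume the canary is healthy.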

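Q7: How do you design rollback for an AI system?

Version	Answer
30 seconds	AI rollback must be designed per component: model route, prompt, RAG index, feature view, tool permission, policy config, and batch output may each need to be rolled back separately. A high-risk system without a rollback target should not go live.
2 minutes	I specify the rollback target in the release bundle. For example, if Credit Policy RAG cites outdated content, roll back the index; if a prompt causes wrong refusals, roll back the prompt; if fraud model false positives rise, roll back the model or threshold; if a Payment Agent misuses a tool, disable write tools first and keep the read-only summary. Rollback also has to include an impacted population query, evidence preservation, customer remediation, and regression case updates.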
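
The per-component rollback in Q7 is essentially a lookup from symptom to component, action, and target. The sketch below encodes the four examples from the answer; the symptom keys, the `plan_rollback` helper, and the version strings are hypothetical.

```python
from typing import Dict, List

# Symptom -> affected component and first action, following the examples above.
ROLLBACK_MATRIX: Dict[str, Dict[str, str]] = {
    "stale_citation": {
        "component": "rag_index",
        "action": "restore the previous index snapshot",
    },
    "wrong_refusal": {
        "component": "prompt",
        "action": "restore the previous prompt version",
    },
    "false_positive_spike": {
        "component": "model",
        "action": "route traffic to the previous model or threshold",
    },
    "tool_misuse": {
        "component": "tool_permission",
        "action": "disable write tools and keep the read-only summary",
    },
}

# Steps that follow every rollback, regardless of component.
FOLLOW_UP_STEPS: List[str] = [
    "impacted population query",
    "evidence preservation",
    "customer remediation",
    "regression case update",
]


def plan_rollback(symptom: str, rollback_targets: Dict[str, str]) -> Dict[str, object]:
    """Build a rollback plan; fail loudly when the bundle lacks a target for the component."""
    if symptom not in ROLLBACK_MATRIX:
        raise KeyError(f"no rollback rule for symptom: {symptom}")
    rule = ROLLBACK_MATRIX[symptom]
    component = rule["component"]
    if component not in rollback_targets:
        raise LookupError(f"release bundle has no rollback target for {component}")
    return {
        "component": component,
        "action": rule["action"],
        "target": rollback_targets[component],
        "follow_up": list(FOLLOW_UP_STEPS),
    }
```

Raising an error when a target is missing makes the gap visible during a rollback drill, before a production incident exposes it.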

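Q8: What value does a model registry bring to governance?

Version	Answer
30 seconds	A model registry is not a file store; it is the control plane for model releases. It records artifact, lineage, metrics, risk tier, approval status, deployment stage, rollback target, and review expiry.
2 minutes	In financial retail, the model registry has to serve both engineering and governance. Engineering needs the artifact hash, signature, container, endpoint, and stage. Governance needs training data, feature view, eval result, owner, risk tier, approval, limitations, monitoring dashboard, and the next validation date. That way model promotion is not just "copy the model to production" but an auditable stage transition.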
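
An "auditable stage transition" can be sketched as a registry entry that refuses to change stage without an approver and an evidence reference, and that appends every change to its own history. The `RegistryEntry` fields, the stage names in `ALLOWED_TRANSITIONS`, and the `transition` function are assumptions for illustration, not the API of any real registry product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical stage graph: each stage may only move to the listed next stages.
ALLOWED_TRANSITIONS: Dict[str, set] = {
    "candidate": {"staging"},
    "staging": {"shadow"},
    "shadow": {"canary"},
    "canary": {"production"},
}


@dataclass
class RegistryEntry:
    model_id: str
    artifact_hash: str
    risk_tier: int
    owner: str
    rollback_target: str
    stage: str = "candidate"
    history: List[Dict[str, str]] = field(default_factory=list)


def transition(entry: RegistryEntry, to_stage: str, approver: str, evidence_ref: str) -> RegistryEntry:
    """Move the entry to a new stage and record who approved it and on what evidence."""
    if to_stage not in ALLOWED_TRANSITIONS.get(entry.stage, set()):
        raise ValueError(f"transition {entry.stage} -> {to_stage} is not allowed")
    if not approver or not evidence_ref:
        raise PermissionError("a stage transition needs an approver and an evidence reference")
    entry.history.append({
        "from": entry.stage,
        "to": to_stage,
        "approver": approver,
        "evidence": evidence_ref,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    entry.stage = to_stage
    return entry
```

The `history` list is what turns promotion into evidence: each record links a stage change to a named approver and to a document in the evidence binder.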

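Q9: How do you connect NIST AI RMF to an MLOps release?

Version	Answer
30 seconds	I align the release with Govern / Map / Measure / Manage: Govern defines accountability and evidence, Map defines the use case and its risks, Measure quantifies risk through eval and monitoring, and Manage treats risk through gates, rollback, issue remediation, and risk acceptance.
2 minutes	NIST AI RMF can serve as a cross-functional communication language. Map corresponds to use case intake, risk tier, data, and customer impact. Measure corresponds to offline eval, feature validation, red-team, and shadow/canary monitoring. Manage corresponds to release gate, rollback, fallback, incident response, and remediation. Govern runs throughout, covering owner, policy, approval, evidence binder, and review cadence. That makes MLOps not just a technical pipeline but the execution mechanism for AI risk management.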

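Q10: How do you turn CD4ML into a portfolio?

Version	Answer
30 seconds	I pick one high-risk financial retail use case and show the release architecture, release bundle, gate model, eval report, shadow/canary plan, rollback runbook, and evidence binder, so the interviewer sees that I can take AI from experiment to controlled production.
2 minutes	A portfolio should not only show model performance. I use Credit Policy RAG or AML Copilot as the main thread: first explain the business process and risk tier, then show the CD4ML architecture: source control, data validation, feature store, training pipeline, model/prompt registry, eval gate, deployment, monitoring, rollback. Then I show one candidate release: baseline vs candidate, slice analysis, limited-go decision, canary plan, rollback trigger, and post-release review. Finally I organize all the evidence into a binder that demonstrates product, architecture, risk, and governance working as one.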

11. Portfolio Package

A senior-level CD4ML / MLOps release engineering portfolio should run 15-20 pages; do not just paste pipeline screenshots.

Page	Content	Capability shown
1	Executive summary: why financial retail AI needs release engineering	Executive communication
2	Source anchors: CD4ML, Google MLOps, TFX, NIST AI RMF	Learning anchors and methodology sources
3	Use case: Credit Policy RAG / AML Copilot / Fraud Model	Business understanding
4	Risk tier and approved use	Risk tiering and boundaries
5	CD4ML reference architecture	Architecture design
6	Release bundle: code/data/feature/model/prompt/eval/deployment	Version coupling
7	Data and feature validation	Data product and feature governance
8	Model / prompt / policy registry	Asset control
9	CI/CD/CT workflow	Engineering system
10	Eval gate: golden, regression, red-team, slice	Quality gating
11	Release gate memo	Decision communication
12	Shadow / canary / ramp plan	Rollout strategy
13	Rollback and incident trigger	Reliability
14	Monitoring dashboard	Production operations
15	Governance evidence binder	Audit and model risk evidence
16	Financial retail case examples	Industry framing
17	Build vs buy ADR	Platform product judgment
18	Interview story	Job-search conversion

11.1 Sample Portfolio Titles

CD4ML Release Engineering Pack:
Controlled Production Rollout for Credit Policy RAG

MLOps Continuous Delivery Evidence Binder:
AML Copilot Model / Prompt / Data / Eval Release Governance

Risk-Tiered AI Release System:
Fraud Scoring Continuous Training, Canary and Rollback Design

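11.2 5-Minute Walkthrough Structure

1. I chose a high-risk financial retail AI use case.
2. I did not start from the model; I first defined approved use, prohibited use, risk tier, and customer impact.
3. I defined an AI release as a bundle of code, data, feature, model, prompt, eval, and deployment.
4. I designed CI/CD/CT, but CT only generates candidates and does not bypass the release gate.
5. I used feature validation, offline eval, security/privacy review, and a model risk gate to control go-live.
6. I used shadow, canary, and ramp to reduce production risk.
7. I designed separate rollbacks for model, prompt, index, feature, tool, and route.
8. I organized lineage, approval, monitoring, incident, and post-release review into an evidence binder.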

11.3 Self-Check Checklist

Check	Pass criteria
Architecture	An end-to-end CD4ML pipeline, not just a training script
Version coupling	The release bundle binds code, data, feature, model, prompt, policy, eval, deployment
CI/CD/CT	Explains how CI, CD, and CT divide the work, and CT does not automatically bypass the gate
Feature validation	Covers schema, distribution, missingness, freshness, leakage, serving skew
Eval gate	Covers golden, regression, red-team, slice, cost, latency, critical failure
Promotion workflow	Includes staging, shadow, canary, ramp, post-release review
Rollback	Can roll back by model, prompt, index, feature, tool, policy, route
Registry	The model / dataset / feature / prompt / eval / deployment registries have minimum fields
Reproducibility	Training and evaluation can be reproduced; container, snapshot, parameters, and hash are retained
Lineage	A production trace can be traced back to release_id and key component versions
Governance	Has gate memo, approval, exception, monitoring, issue, evidence binder
Financial retail	Cases reflect the risk differences across AML, KYC, fraud, credit, payment, and customer-facing AI

12. Final Principle

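The maturity of AI release engineering can be tested with one question:

When the model, prompt, data, features, RAG index, or tool permissions change tomorrow, can the team, within the same day, build a candidate version, reproduce the training provenance, complete the risk-tiered eval, make a gate decision, ramp under control, monitor in real time, roll back quickly, and produce complete governance evidence?

If the answer is yes, MLOps is no longer just model engineering; it is the production operating system for scaling financial retail AI.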