CD4ML / MLOps: Continuous Delivery and Release Engineering for AI

CD4ML / MLOps / Continuous Delivery for AI Release: A Reading Guide

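Audience: AI Platform PM / AI Architect / Release Governance Lead / Model Risk Partner / AI delivery owner in retail financial services.

Core problem: An AI release is not just deploying a model file. Production AI changes code, data, model, prompt, features, threshold, policy, and knowledge base at the same time; without continuous delivery and release gates, teams get stuck in a state that cannot be reproduced, rolled back, or audited.

Learning goals: Understand Continuous Delivery for Machine Learning, CI/CD/CT, model registry, pipeline automation, release gate, shadow/canary/ramp, rollback, and artifact lineage, and turn them into a release system for AI in retail financial services.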

Source Anchors

Source	Link	Purpose
Continuous Delivery for Machine Learning	https://martinfowler.com/articles/cd4ml.html	Understand how CD4ML extends continuous delivery to changes along three axes: code, data, and model
Google Cloud MLOps: Continuous delivery and automation pipelines	https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning	Reference for MLOps automation levels, pipelines, CI/CD/CT, and production ML workflows
TensorFlow Extended	https://www.tensorflow.org/tfx	Reference for the data validation, training, evaluation, and deployment components of a production ML pipeline
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Bring AI release gates, risk measurement, and production monitoring into a governance framework

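In one sentence:

CD4ML is the engineering and governance capability that keeps an AI system in a state where it can be released safely, reproducibly, and auditably at any time.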

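1. The Three Axes of Change in an AI Release

Traditional software mainly manages code changes. An AI system has at least three axes of change: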

code changes
data changes
model changes

GenAI/Agent systems add:

prompt changes
retriever / index changes
knowledge source changes
tool permission changes
policy / guardrail changes
evaluator / rubric changes
cost and latency routing changes

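Release risks in retail financial services:

Change	Risk
Fraud model upgrade	False blocks, missed fraud, a surge in the manual review queue
RAG knowledge base update	Customers receive outdated policy or incorrect citations
Prompt rewrite	Changes in refusal rate, tone, and boundary disclaimers
Threshold adjustment	Affects credit approval, KYC pass rates, and AML escalation
Vendor model upgrade	Changes in behavior, cost, latency, and privacy path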

2. CD4ML Pipeline

source control
  -> data validation
  -> feature / prompt / policy validation
  -> training or model selection
  -> offline eval
  -> bias / safety / robustness eval
  -> package release bundle
  -> staging shadow
  -> canary / ramp
  -> production monitoring
  -> rollback / retrain / recalibrate
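
The flow above can be read as an ordered list of stages in which a failure stops everything downstream. The following is a minimal sketch of that idea, not a real orchestrator: `run_pipeline`, the stage checks, and every threshold (`missing_rate`, `auc`, `unsafe_rate`) are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a name plus a check that returns True when the stage passes.
Stage = Tuple[str, Callable[[Dict], bool]]


def run_pipeline(stages: List[Stage], context: Dict) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure."""
    log: List[str] = []
    for name, check in stages:
        passed = check(context)
        log.append(f"{name}: {'pass' if passed else 'fail'}")
        if not passed:
            return False, log  # later stages never run
    return True, log


# Hypothetical checks and thresholds, for illustration only.
stages: List[Stage] = [
    ("data validation", lambda c: c["missing_rate"] <= 0.02),
    ("offline eval", lambda c: c["auc"] >= 0.80),
    ("safety eval", lambda c: c["unsafe_rate"] <= 0.001),
    ("package release bundle", lambda c: True),
]

ok, log = run_pipeline(
    stages, {"missing_rate": 0.01, "auc": 0.75, "unsafe_rate": 0.0}
)
```

In this run the candidate passes data validation but fails offline eval, so the safety eval and bundle packaging stages are never reached.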

2.1 Release Bundle

An AI release cannot record only the model version. It needs a release bundle:

Artifact	Example
Code version	service, pipeline, feature transforms
Data version	training set, eval set, calibration set
Model version	base model, fine-tune, adapter, vendor model
Prompt version	system prompt, tool instructions, templates
Retriever version	index, embedding model, reranker, chunking
Policy version	risk routing, guardrails, HITL thresholds
Eval version	golden set, judge, rubric, metrics
Config version	thresholds, model routing, cache, rate limits

Without a bundle, a release can be neither reproduced nor rolled back.
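
One way to make the bundle concrete is a manifest that pins all eight artifact versions and derives a single identifier from them. The sketch below assumes a hypothetical `ReleaseBundle` class; the field names mirror the table above, and the version strings are made up.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ReleaseBundle:
    """Pins every artifact version that ships together (illustrative fields)."""
    code_version: str
    data_version: str
    model_version: str
    prompt_version: str
    retriever_version: str
    policy_version: str
    eval_version: str
    config_version: str

    def bundle_id(self) -> str:
        # Hash the canonical JSON form so identical versions give an identical id.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

    def diff(self, previous: "ReleaseBundle") -> dict:
        # Only the artifacts whose version changed, as {name: (old, new)}.
        old, new = asdict(previous), asdict(self)
        return {k: (old[k], new[k]) for k in new if old[k] != new[k]}


# Made-up version strings for a fraud-model release.
v1 = ReleaseBundle(
    code_version="svc@a1b2c3", data_version="train-2024-05",
    model_version="fraud-gbm-7", prompt_version="n/a",
    retriever_version="n/a", policy_version="policy-12",
    eval_version="golden-v3", config_version="cfg-40",
)
# A threshold-only change still produces a new bundle and a new id.
v2 = replace(v1, config_version="cfg-41")
changed = v2.diff(v1)
```

Here `changed` contains only `config_version`, which is the kind of entry a change log or rollback runbook would be built from.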

2.2 Continuous Training

Continuous training does not mean automatically pushing a new model to production.

The correct path:

new data
  -> validate data
  -> retrain candidate
  -> compare with champion
  -> evaluate risk segments
  -> shadow or canary
  -> approve or reject

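High-risk financial AI should treat these as separate steps:

Automated training
Automated evaluation
Automated recommendation
Human approval
Automated release

The closer a change is to customer rights and monetary impact, the less it can be released automatically without approval.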
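
The separation between automated recommendation and human approval can be sketched as a small promotion policy. This is an illustration under assumptions: `promotion_action`, its parameters, the 0.01 segment tolerance, and the returned labels are all hypothetical, and a real policy would look at more than one score.

```python
def promotion_action(
    champion_score: float,
    candidate_score: float,
    worst_segment_delta: float,
    high_risk: bool,
    max_segment_drop: float = 0.01,
) -> str:
    """Decide what happens to a retrained candidate (illustrative policy)."""
    # The candidate must beat the champion overall.
    if candidate_score <= champion_score:
        return "reject"
    # worst_segment_delta is candidate minus champion on the weakest segment;
    # a negative value beyond the tolerance is treated as a regression.
    if worst_segment_delta < -max_segment_drop:
        return "reject"
    # High-risk paths stop at a recommendation; a human approves the release.
    if high_risk:
        return "recommend for human approval"
    return "auto-release to canary"


credit_action = promotion_action(0.81, 0.84, -0.002, high_risk=True)
internal_tool_action = promotion_action(0.81, 0.84, -0.002, high_risk=False)
```

With identical metrics, the credit model ends at a recommendation while the low-risk internal tool proceeds to canary automatically, which is the risk tiering described above.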

3. Release Gate Model

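Gate	Goal	Action on failure
Data validation	schema, range, freshness, missing, drift	Block training or release
Training reproducibility	The same versions can be rerun	Fix the pipeline
Offline eval	quality, safety, fairness, latency, cost	Do not enter staging
Segment gate	Performance on key customer segments, channels, and product lines	Narrow the scope or reduce automation
Human review	Spot checks of high-risk outputs	Fix prompt/model/policy
Shadow gate	Observe production traffic on a side path	No customer impact; collect differences
Canary gate	Live release on a small share of traffic	Automatic rollback
Monitoring gate	drift, complaint, override, SLO	Incident or freeze

An AI release decision should be written as one of: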

approve pilot / approve limited ramp / rollback / hold / approve with controls
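
To show how gate results could roll up into one of these decision labels, here is a minimal sketch. The mapping is an assumption made for illustration, not something the sources prescribe: failures in live gates lead to rollback, failures in pre-deployment gates lead to hold, and other failures lead to approval with controls. `GateResult`, `release_decision`, and the gate names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed grouping of gates; adjust to your own gate model.
PRE_DEPLOY = {"data_validation", "training_reproducibility", "offline_eval"}
LIVE = {"canary", "monitoring"}


@dataclass
class GateResult:
    gate: str
    passed: bool


def release_decision(results: List[GateResult]) -> str:
    failed = {r.gate for r in results if not r.passed}
    ran = {r.gate for r in results}
    if failed & LIVE:
        return "rollback"  # customers are already exposed
    if failed & PRE_DEPLOY:
        return "hold"  # nothing should reach staging
    if failed:
        # e.g. segment or human-review findings: ship with a narrower scope
        return "approve with controls"
    if "canary" in ran:
        return "approve limited ramp"  # canary evidence exists
    return "approve pilot"


decision = release_decision([
    GateResult("data_validation", True),
    GateResult("offline_eval", True),
    GateResult("segment", False),
])
```

In this example the only failure is the segment gate, so the sketch returns "approve with controls" rather than blocking the release outright.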

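4. Shadow, Canary, Ramp

Mode	Usage	Notes
Shadow	The new model sees real requests but does not affect outcomes	Suited to initial testing of high-risk changes
Canary	A small percentage of real traffic	Needs automatic rollback and customer-impact monitoring
Ramp	Expand scope in stages	Every stage has a stop rule
Champion-Challenger	Compare new and old strategies side by side	Watch fairness, sampling, and business cycles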
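
Shadow mode is the easiest of these to get subtly wrong, so a small sketch may help. The assumption here is that both models are plain callables and that disagreements are appended to an in-memory list; `serve_with_shadow`, the request fields, and the toy approval rules are hypothetical.

```python
from typing import Callable, Dict, List


def serve_with_shadow(
    request: Dict,
    champion: Callable[[Dict], str],
    challenger: Callable[[Dict], str],
    diff_log: List[Dict],
) -> str:
    """Return the champion's decision; run the challenger only for comparison."""
    decision = champion(request)
    try:
        shadow_decision = challenger(request)
        if shadow_decision != decision:
            diff_log.append({
                "request_id": request["id"],
                "champion": decision,
                "challenger": shadow_decision,
            })
    except Exception as exc:
        # A challenger failure must never affect the customer-facing result.
        diff_log.append({"request_id": request["id"], "challenger_error": repr(exc)})
    return decision


# Toy rules standing in for real models.
champion = lambda r: "approve" if r["amount"] < 1000 else "review"
challenger = lambda r: "approve" if r["amount"] < 500 else "review"

diffs: List[Dict] = []
outcomes = [
    serve_with_shadow({"id": i, "amount": a}, champion, challenger, diffs)
    for i, a in enumerate([100, 700, 2000])
]
```

The customer-facing outcomes come only from the champion, while `diffs` records the one request (amount 700) where the challenger would have decided differently.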

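Examples in retail financial services:

Credit: shadow first; do not directly affect approvals.
Fraud: canary on the low-amount segment; high amounts stay under manual review.
Customer-service RAG: shadow with internal employees first, then ramp on low-risk FAQs.
AML: the challenger gives suggestions to analysts and does not directly decide filing.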
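
A ramp with a stop rule at every stage can be sketched as a function that either advances traffic or drops it to zero. The traffic steps, metric names, and limits below are hypothetical; the one deliberate design choice is that a missing metric counts as a breach, so the ramp fails closed.

```python
from typing import Dict, List, Tuple

# Hypothetical ramp schedule (percent of traffic) and stop-rule limits.
RAMP_STEPS = [1, 5, 25, 100]
STOP_RULES = {"override_rate": 0.05, "complaint_rate": 0.002, "p95_latency_ms": 800}


def breached_rules(metrics: Dict[str, float]) -> List[str]:
    """Names of stop rules that are violated; a missing metric counts as a breach."""
    return sorted(
        name for name, limit in STOP_RULES.items()
        if metrics.get(name) is None or metrics[name] > limit
    )


def next_traffic_percent(current: int, metrics: Dict[str, float]) -> Tuple[int, List[str]]:
    """Advance one ramp step if healthy, otherwise roll back to 0 percent."""
    violations = breached_rules(metrics)
    if violations:
        return 0, violations
    idx = RAMP_STEPS.index(current)  # raises ValueError for an unknown step
    return RAMP_STEPS[min(idx + 1, len(RAMP_STEPS) - 1)], []


healthy = {"override_rate": 0.01, "complaint_rate": 0.0, "p95_latency_ms": 400}
unhealthy = {"override_rate": 0.09, "complaint_rate": 0.0, "p95_latency_ms": 400}

advance = next_traffic_percent(5, healthy)
rollback = next_traffic_percent(5, unhealthy)
```

With healthy metrics the release moves from 5 to 25 percent; with the override rate above its limit it returns to 0 percent and names the rule that fired.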

5. AI Release Evidence

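Evidence	Content
Release manifest	Versions of every artifact in the bundle
Evaluation report	Metrics, thresholds, failure cases, segment analysis
Risk acceptance	Residual risk and approver
Rollback plan	How to return to the previous version
Monitoring plan	Metrics, thresholds, owner, frequency
Customer impact assessment	Customers who may be affected and remediation paths
Change log	What changed relative to the previous version
Incident triggers	When to pause, roll back, or degrade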
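
The evidence table can double as a completeness check before an approval request is filed. The following is a small sketch in which the key names are simply the table rows in snake_case, and the binder is assumed to be a dictionary of document references; both are illustrative choices.

```python
from typing import Dict, List

# One key per row of the evidence table above (names are illustrative).
REQUIRED_EVIDENCE = [
    "release_manifest",
    "evaluation_report",
    "risk_acceptance",
    "rollback_plan",
    "monitoring_plan",
    "customer_impact_assessment",
    "change_log",
    "incident_triggers",
]


def missing_evidence(binder: Dict[str, str]) -> List[str]:
    """Evidence items that are absent or empty in the binder."""
    return [key for key in REQUIRED_EVIDENCE if not binder.get(key)]


# Hypothetical binder with document references as values.
binder = {
    "release_manifest": "manifest.json",
    "evaluation_report": "eval-report.pdf",
    "risk_acceptance": "",  # not signed yet
    "rollback_plan": "runbook.md",
    "monitoring_plan": "monitoring.md",
    "change_log": "changes.md",
    "incident_triggers": "triggers.md",
}
gaps = missing_evidence(binder)
```

Here the check flags the unsigned risk acceptance and the absent customer impact assessment, so the release would not yet be ready for approval.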

6. Typical Architecture

git / model registry / dataset registry / prompt registry
  -> pipeline orchestrator
  -> validation components
  -> training / eval jobs
  -> release bundle builder
  -> approval workflow
  -> deployment controller
  -> monitoring and incident system

Key platform capabilities:

Dataset registry
Model registry
Prompt registry
EvalOps
Feature store
Policy gateway
Observability
Evidence binder
Rollback controller
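
Of these capabilities, the rollback controller is the simplest to illustrate. The sketch below assumes releases are identified by an opaque bundle id and that deployment history lives in memory; `RollbackController` and the ids are hypothetical, and a real controller would also have to switch traffic and restore configuration.

```python
from typing import List, Optional


class RollbackController:
    """Tracks deployed bundle ids so the previous release can be restored."""

    def __init__(self) -> None:
        self._history: List[str] = []

    def deploy(self, bundle_id: str) -> None:
        self._history.append(bundle_id)

    @property
    def live(self) -> Optional[str]:
        # The most recently deployed bundle, or None before the first deploy.
        return self._history[-1] if self._history else None

    def rollback(self) -> str:
        # Rolling back needs at least one earlier release to return to.
        if len(self._history) < 2:
            raise RuntimeError("no previous release to roll back to")
        self._history.pop()
        return self._history[-1]


controller = RollbackController()
controller.deploy("bundle-001")
controller.deploy("bundle-002")
restored = controller.rollback()
```

Because the controller tracks bundle ids rather than model versions alone, a rollback restores the prompt, thresholds, retriever, and policy that shipped with the earlier release.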

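7. Common Failure Modes

Failure mode	Symptom	Fix
Notebook release	Training and evaluation cannot be reproduced	Move training and eval into pipelines
Versioning only the model	prompt, data, and thresholds are not bound to the release	release bundle
Contaminated eval set	Metrics appear to improve	eval set governance
No shadow/canary	A full-traffic incident immediately at launch	Staged release
No rollback	A hotfix is the only option when something breaks	rollback package
Monitoring lags release	Incidents are discovered through customer complaints	pre-defined monitoring gate
Offline risk approval	Evidence scattered across email	evidence binder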

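8. Interview Talking Points

30-second version

CD4ML extends continuous delivery to AI: a release covers not only code but also data, model, prompt, features, thresholds, policy, and eval. My release plan uses a release bundle, data validation, offline eval, segment gate, shadow/canary/ramp, monitoring, and rollback to keep the AI system reproducible, auditable, and safe to roll back.

2-minute version

A financial AI release cannot depend on the model team shipping a pkl file or swapping an API endpoint. Take RAG customer service as an example: the release bundle must bind the model version, prompt, knowledge base index, embedding, reranker, citation policy, refusal policy, eval set, and monitoring thresholds. Before launch, run data and knowledge-source validation, offline eval, red teaming, and human spot checks; at launch, start with shadow, then a low-risk canary, then a staged ramp. Every step has stop rules, such as unsupported claims, human overrides, complaints, and latency or cost above threshold. All evidence goes into the release memo and evidence binder. That makes the release decision something business and risk can approve, rather than an engineering black box.

CTO follow-up

If asked whether continuous training should be fully automated, I would answer: training and evaluation can be automated, but production release of high-risk financial AI usually cannot be automated without approval. Risk tiering is key: low-risk internal tools can be highly automated, while high-risk paths that touch customer rights and money need human approval, shadow/canary, and rollback gates.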

9. Portfolio Task

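Build an "AI Release Engineering Pack":

Artifact	Content
Release pipeline	Flow diagram from code/data/model/prompt/eval/policy to production
Release bundle spec	Artifacts that every release must bind
Gate checklist	data, eval, risk, segment, shadow, canary, monitoring
Rollback runbook	Roll back model, prompt, thresholds, retriever, policy
Evidence memo	Release decision, residual risk, approver, monitoring
Dashboard	release health, quality, cost, latency, complaints, override

In the end you should be able to explain clearly: an AI release is not a model deployment but a controlled change system spanning multiple artifacts, risks, and teams.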