AI Value Stream Management / Flow Metrics Playbook

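Audience: AI PM, AI Product Operations Lead, AI Portfolio Lead, AI Product Architect, Enterprise Architect, Value Office Lead, and leaders of AI transformation in financial retail.

Goal: manage the flow of AI use cases from idea to safe release to adoption / value realization, and connect DORA / SPACE, Flow Metrics, risk gates, platform runway, portfolio funding and business value into one operable value stream management system.

Core view: AI Value Stream Management is not drawing process diagrams. It is the continuous management of value flow, risk evidence, platform capability, organizational capacity and benefits realization.

Scope note: this document is learning, architecture-design and work-portfolio material. It does not constitute legal, regulatory, audit, model validation, financial confirmation or vendor selection advice. For formal financial retail projects, the applicable requirements must be confirmed jointly by business owner, risk, model risk, legal, compliance, privacy, security, finance, architecture, operations and internal audit.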

1. Source Anchors

These sources serve as anchors for terminology and method. Access date recorded as 2026-06-29.

Anchor	Official / primary source	Use in this playbook
APQC Process Classification Framework	https://www.apqc.org/process-frameworks	Uses PCF process classification and process performance language to connect business capability, end-to-end process, benchmark, process owner and AI value stream.
Flow Framework	https://flowframework.org/	Uses Value Stream Management, Flow Metrics and a business-technology common language to connect project-to-product, portfolio flow, product value stream and flow bottleneck.
DORA	https://dora.dev/	Uses delivery performance, throughput, stability, lead time, deployment frequency, change failure, recovery and continuous improvement to frame AI SDLC flow efficiency.
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize AI risk gates, evidence, monitoring, incidents and continuous governance.

Source-to-artifact mapping:

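Source lens	Artifacts it can produce	Senior-level framing
APQC PCF	AI process-to-capability map, process owner map, benchmark baseline	"I first attach the AI use case to a business process and capability, so that requirements are not reverse-engineered from model capability."
Flow Framework	AI value stream map, flow metrics dashboard, blocked work taxonomy	"I use flow to manage how business value moves from idea to production and benefits, not just project status."
DORA	AI SDLC delivery dashboard, release stability scorecard, change recovery review	"AI delivery has to get faster and more stable at the same time; demo speed must not mask release risk."
NIST AI RMF	AI risk gate checklist, evidence binder, monitoring and response loop	"Every flow stage carries risk evidence; it is not a one-off approval before go-live."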

2. One-Sentence Positioning

AI Value Stream Management puts AI opportunity, delivery work, risk evidence, platform capability, adoption behavior and realized business value into a single measurable value stream, so the organization can decide to fund, accelerate, block, scale, hold, stop or retire.

Shorter interview version:

I manage AI use cases as value streams, not feature queues:
from idea evidence to safe release, adoption, risk-controlled operation and finance-recognized value.

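Management-level version:

I do not judge AI success by "how many AI features shipped" or "how many times the model was called". I look at whether AI has passed through the full value stream: whether the business problem is real, whether data and risk evidence hold up, whether platform capability is reused, whether the release is safe, whether users adopt it, whether business benefits are recognized by finance and the business owner, and whether risk stays stable after scale.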

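3. Purpose / Audience / Core View

3.1 Purpose

This document addresses five senior-level questions:

How an AI use case flows from the idea funnel to production, rather than stalling at a PoC or dashboard.
How to use Flow Metrics to find waiting, rework, risk blockage and benefit breakpoints in AI delivery and AI-enabled operations.
How to connect the engineering productivity language of DORA / SPACE with portfolio funding, platform runway, risk gates and business outcomes.
How to show that AI value is not activity, usage, demo or model score, but adoption and benefits realization under controlled risk.
How to turn financial retail processes such as AML, customer service, credit, branch, complaints and risk control into investable, governable and measurable AI value streams.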

3.2 Audience

Role	Key concern	Output from this document
AI PM / AI Product Ops	How to manage an AI product beyond feature delivery through to adoption and value realization	value stream map, dashboard, benefits loop
AI Portfolio Lead	How to compare multiple AI investment opportunities and control WIP	portfolio-to-product trace, flow load, evidence gates
Product Architect / Enterprise Architect	How to connect an AI use case to capability, platform, data and risk controls	capability map, platform runway, risk gate architecture
Engineering Productivity / Platform PM	Whether AI tools and platforms improve delivery flow and stability	DORA / SPACE / Flow metric integration
Value Office / Finance Partner	How AI benefits are quantified, attributed, realized and reviewed	benefits realization loop, portfolio evidence pack
Risk / Compliance / Model Risk	How AI risk runs through idea, pilot, release, scale and operate	NIST AI RMF-aligned gate evidence

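3.3 Core View

Mature AI VSM is not "process visualization"; it is four systems working as one:

System	Managed object	Key question
Value flow system	idea -> evidence -> release -> adoption -> benefits	Is value flowing, or stuck at PoC, approval, integration or adoption?
Risk flow system	risk hypothesis -> control -> evidence -> monitoring -> incident learning	Does risk evidence flow along with value?
Platform flow system	reusable gateway, eval, RAG, observability, policy, review workflow	Does each use case leave behind platform capability, or create a one-off burden?
Funding flow system	capacity, budget, SME review, risk assurance, change management	Are funding and capacity flowing to the AI bets most worth validating and scaling?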

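4. Why AI Success Cannot Be Judged by Features / Call Volume Alone

Common ways AI products get reported:

Shipped 12 AI features.
LLM API calls reached 2 million.
Generated 500,000 summaries.
Model accuracy reached 92%.
User satisfaction survey scored 4.6/5.
Estimated to save 30% of manual time.

These are all signals, but not complete value evidence.

Metric	What it shows	What it does not show	How VSM strengthens it
Feature count	The team produced visible features	Whether it solves a high-value process problem	Link to value stream stage and business outcome
API call / token count	The system is being called	Whether calls produce qualified business results	Replace with qualified value event throughput
Generated output count	AI is very active	Whether output is adopted, correct, compliant and auditable	Add acceptance, quality pass, risk guardrail
Model accuracy	Offline task performance	Whether results improve in the production workflow	Connect to workflow metric, decision quality, online monitoring
User rating	Experience sentiment	Whether there is selection bias, risk transfer or short-term novelty	Track by cohort, risk tier and workflow step
Estimated time saved	Potential efficiency	Whether the time is released, redeployed or recognized by finance	Build a benefits realization loop and finance sign-off

A mature AI value narrative: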

Model capability
-> AI behavior evidence
-> workflow adoption
-> qualified value event
-> risk-adjusted business outcome
-> finance-recognized benefit
-> portfolio scale / stop decision

In one sentence:

The subject of an AI value stream is not the model, but the business process and decision flow that AI changes.

4.1 Migrating from Vanity Metrics to Flow Metrics

Vanity metric	More mature flow / value metric
AI features shipped	use cases passing release gate and adoption gate
Prompts created	prompts with eval coverage, owner and production trace
Model calls	qualified AI-assisted workflow completions
Users activated	eligible users reaching repeat adoption in target workflow
Time saved estimate	finance-recognized capacity release or SLA improvement
PoCs completed	PoCs converted to release, platform capability, or stop decision
Accuracy score	risk-tiered eval pass plus online quality and guardrail stability
Platform integrations	production workflows reusing approved platform controls

4.2 Defining the AI Value Event

An AI value event is not "AI was called"; it is a business event constrained by quality, risk, adoption and economics.

Qualified AI value event =
  eligible workflow item
  + AI exposure or AI-assisted action
  + user / system adoption signal
  + quality threshold pass
  + risk guardrail pass
  + auditable trace
  + unit economics within boundary
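
The definition above can be read as a conjunction of checks. The following is a minimal sketch under that reading; the `WorkflowEvent` fields, the cost ceiling and the sample values are illustrative assumptions, not part of any real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowEvent:
    """One AI-touched workflow item. All field names are hypothetical."""
    eligible: bool            # eligible workflow item
    ai_assisted: bool         # AI exposure or AI-assisted action
    adopted: bool             # user / system adoption signal
    quality_passed: bool      # quality threshold pass
    guardrail_passed: bool    # risk guardrail pass
    trace_id: Optional[str]   # auditable trace reference, None if missing
    unit_cost: float          # cost attributed to this item
    cost_ceiling: float       # unit economics boundary (assumed to be agreed upfront)


def is_qualified_value_event(e: WorkflowEvent) -> bool:
    """True only when every condition in the definition holds."""
    return all([
        e.eligible,
        e.ai_assisted,
        e.adopted,
        e.quality_passed,
        e.guardrail_passed,
        e.trace_id is not None,
        e.unit_cost <= e.cost_ceiling,
    ])


# Made-up sample events for illustration only.
events = [
    WorkflowEvent(True, True, True, True, True, "trace-001", 0.40, 0.50),
    WorkflowEvent(True, True, True, True, False, "trace-002", 0.40, 0.50),  # guardrail failed
    WorkflowEvent(True, True, False, True, True, None, 0.40, 0.50),         # not adopted, no trace
]
qualified = [e for e in events if is_qualified_value_event(e)]
print(len(qualified), "of", len(events), "events qualify")
```

Counting only events that pass every check is what separates qualified value throughput from raw call volume.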

Financial retail examples:

Scenario	Immature event	Qualified AI value event
AML alert triage	generated summaries	analyst-reviewed case summaries that reduce investigation time with no critical evidence defect
Customer service copilot	answer generated	grounded answer accepted by agent, customer issue resolved, no reopen or policy breach
Credit memo assistant	draft created	underwriter-accepted memo section with policy citation pass and no prohibited recommendation
Branch knowledge assistant	employee question asked	cited policy answer accepted by employee and no escalation caused by stale knowledge
AI platform	applications connected	production AI workflows passing value, risk, reliability and cost gates through shared platform

5. Value Stream Map: AI Delivery

The goal of the AI delivery value stream is to manage an AI idea from "idea" to "controlled production value", rather than to manage a task list.

5.1 End-to-End AI Delivery Flow

Strategic theme
  -> Idea intake
  -> Discovery evidence
  -> Value stream design
  -> Data / eval readiness
  -> Build and integration
  -> Risk and release gate
  -> Limited release
  -> Adoption and operating model
  -> Benefits realization
  -> Scale / hold / stop / retire

5.2 Stage-by-Stage Map

Stage	Primary question	Key evidence	Flow metric	Gate decision
Strategic theme	Does this AI investment direction fit strategy and risk appetite?	portfolio thesis, capability map, process baseline	strategy-to-intake lead time	fund theme / narrow theme / park
Idea intake	Is it worth entering discovery?	problem owner, APQC process, baseline hypothesis, risk hypothesis	idea acceptance rate, intake queue age	accept / reject / merge
Discovery evidence	Is the problem real and a fit for AI?	workflow map, no-AI option, data readiness, AI fit	idea-to-evidence lead time	fund pilot / refine / stop
Value stream design	Which process step does AI enter, and which decision does it change?	current-state / target-state map, RACI, decision boundary	flow design cycle time	approve value stream / redesign
Data / eval readiness	Are data, knowledge, eval and telemetry available?	source owner, freshness, label quality, golden set, metric contract	data wait time, eval design lead time	build / data investment / stop
Build and integration	Can it be integrated safely with production systems and the platform?	architecture sketch, model gateway, RAG, tool permissions, tests	build flow time, review turnaround	release candidate / rework
Risk and release gate	Can it go live under control?	eval result, risk control, rollback, monitoring, audit trace	gate queue time, evidence aging	go / limited go / no-go
Limited release	Is it safe and effective in the real workflow?	cohort results, usage, quality, incident, cost	release-to-adoption lead time	continue / restrict / rollback
Adoption and operating model	Do users keep adopting it in the right workflow?	training, support, manager cadence, override analysis	adoption flow time, support load	scale / change plan / hold
Benefits realization	Are business benefits realized and recognized?	baseline, incremental effect, cost, risk adjustment, finance sign-off	benefit recognition cycle time	scale / hold / stop
Scale / hold / stop / retire	Should it be expanded, restricted, stopped, or folded into platform capability?	realized benefits, risk trend, platform capacity, unit economics	scale cycle time, retire cycle time	scale / hold / retire / stop

5.3 Current-State vs Target-State AI VSM

The current-state map should expose the real blockers:

Current-state symptom	Likely real blocker	Metrics to measure
PoC is fast, production is slow	eval, security, data access, integration and risk approval are left until late	PoC-to-release lead time, gate queue time
Business submits many AI ideas	no problem baseline or process owner	idea rejection reason, discovery WIP
Model performs well but goes unused	mismatch in workflow fit, trust, training or manager incentive	adoption flow time, accepted output rate
Platform is built but use cases are still slow	platform capability does not cover data, eval, risk evidence or operations	platform reuse lead time, self-service completion
Value is unclear after go-live	no baseline, telemetry, causal design or finance owner	benefit recognition cycle time, metric contract coverage
Approvals keep multiplying	risk evidence is not reusable and gates rely only on manual meetings	evidence reuse rate, risk review capacity load

The target-state map should define three improvement directions:

Reduce waiting: data access, risk review, SME review, eval queue, release approval.
Reduce rework: unclear problem, weak eval, ambiguous decision boundary, poor telemetry.
Increase value throughput: qualified value events, benefit realization, platform reuse.

6. Value Stream Map: AI-Enabled Operations

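AI-enabled operations is the business operations value stream after AI goes live. It answers one question: has AI actually changed day-to-day workflow, and is it creating value within real risk boundaries?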

6.1 Generic AI-Enabled Operations Flow

Business event / case
  -> eligibility check
  -> context and knowledge retrieval
  -> AI draft / recommendation / action proposal
  -> human judgment or policy automation
  -> workflow action
  -> QA / sampling / exception handling
  -> customer / operations outcome
  -> monitoring and incident response
  -> benefits and learning feedback

6.2 Operations Flow Metrics

Flow step	AI control point	Metric	Risk guardrail
Case eligibility	AI only touches approved workflow items	eligible case coverage, exclusion accuracy	high-risk case wrongly included
Context retrieval	AI uses permitted and fresh sources	retrieval success, citation correctness, knowledge freshness	unauthorized data access, stale policy
AI output	Output fits task and boundary	groundedness, format validity, answerability, tool call correctness	hallucination, unsupported claim, overconfident answer
Human judgment	Accountability remains clear	acceptance rate, override rate, review time	blind acceptance, review fatigue
Workflow action	Action improves flow	cycle time, touches per case, queue age, rework	wrong action, customer harm
QA / exception	Quality feedback is captured	QA defect rate, sample coverage, failure taxonomy closure	critical defect not escalated
Customer / ops outcome	Business result improves	FCR, AHT, backlog age, loss avoided, complaint rate	unfair outcome, policy breach
Monitoring	Drift and incidents are visible	incident detection time, cost drift, model / data drift	unmonitored degradation
Learning feedback	Evidence changes product and platform	eval case added, runbook updated, control improved	repeated failure

6.3 Delivery Flow vs Operations Flow

Dimension	AI delivery value stream	AI-enabled operations value stream
Primary object	AI change from idea to release	Business work item from intake to outcome
Main bottleneck	discovery, data, eval, risk gate, integration	adoption, trust, exception handling, QA, operating capacity
Main metrics	lead time, WIP, blocked time, gate queue, release stability	qualified value events, cycle time, quality, guardrail, benefit
Main owner	AI PM, product architect, engineering, platform	business ops owner, product owner, risk owner
Scale risk	shipping unsafe or low-value capabilities	amplifying wrong workflow behavior at scale
Feedback loop	production telemetry updates SDLC gates	operational outcomes update product, policy, eval and training

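Senior-level judgment:

Delivery VSM proves AI can go live safely; Operations VSM proves AI keeps creating value in real business workflows.

7. Flow Metrics and AI-Specific Extensions

The value of Flow Metrics is treating the value stream as a system, rather than looking at individual team output. AI scenarios need extensions for data, eval, risk, platform, adoption and benefits realization.

7.1 Core Flow Metrics

Metric	AI VSM interpretation	Typical question
Flow time	Total time for an AI work item from entering the value stream to completing a given value stage	Why does idea to release take 6 months?
Flow velocity	Number of AI value items completed per unit of time	How many use cases pass the release gate or adoption gate each month?
Flow efficiency	Active work time / total flow time	Is time spent on build, or stuck in data, review and approval?
Flow load	Number of AI work items in progress	Is portfolio WIP so high that everything slows down?
Flow distribution	Proportion of work item types	Is it all features, with no risk work, platform runway, debt or defects?
Flow predictability	Whether flow time / delivery outcome is stable	Are release and benefits predictable?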
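
The first four metrics in the table can be computed from a simple work item log. The sketch below assumes each item records an entry date, an optional completion date and a count of active work days; the `WorkItem` shape and the sample dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    name: str
    entered: date               # date the item entered the value stream
    completed: Optional[date]   # None while the item is still in progress
    active_days: int            # days of hands-on work (assumed to be tracked)


def flow_time_days(item: WorkItem) -> int:
    """Elapsed calendar days from entry to completion (completed items only)."""
    return (item.completed - item.entered).days


def flow_efficiency(item: WorkItem) -> float:
    """Active work time divided by total flow time."""
    return item.active_days / flow_time_days(item)


def flow_velocity(items, start: date, end: date) -> int:
    """Number of items completed within [start, end]."""
    return sum(1 for i in items if i.completed and start <= i.completed <= end)


def flow_load(items, on: date) -> int:
    """Number of items entered but not yet completed on the given date."""
    return sum(1 for i in items
               if i.entered <= on and (i.completed is None or i.completed > on))


# Hypothetical sample data.
items = [
    WorkItem("aml-triage", date(2026, 1, 5), date(2026, 3, 6), 15),
    WorkItem("credit-memo", date(2026, 2, 2), None, 10),
    WorkItem("branch-knowledge", date(2026, 2, 16), date(2026, 3, 18), 12),
]

print(flow_time_days(items[0]))                                   # 60
print(flow_efficiency(items[0]))                                  # 0.25
print(flow_velocity(items, date(2026, 3, 1), date(2026, 3, 31)))  # 2
print(flow_load(items, date(2026, 3, 10)))                        # 2
```

A flow efficiency of 0.25 here means three quarters of the elapsed time was waiting, which is the kind of signal the "stuck in data, review and approval" question is after.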

7.2 AI-Specific Flow Metrics

Metric	Definition	Owner	Decision it supports
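Idea-to-evidence lead time	Time from idea intake until discovery evidence is usable for a decision	AI PM / BA	Is discovery capacity sufficient?
Evidence conversion rate	Share of ideas entering discovery that convert into a pilot, a platform investment or a stop decision	Portfolio owner	Quality of the idea funnel
Data readiness wait time	Time a work item waits on data owner, access, quality, lineage or retention decisions	Data owner / Architect	Is a data product investment needed?
Eval design lead time	Time from requirement confirmation until the eval contract is runnable	EvalOps / AI PM	Is eval done up front?
Risk gate queue time	Time a work item waits for risk, privacy, security, model risk or legal review	Risk governance owner	Is the gate supported by evidence automation?
Evidence aging	Age of key gate evidence relative to the latest production configuration or policy version	Product architect / Risk	Is revalidation needed?
Platform reuse lead time	Time from a use case needing a capability until integration through the shared platform is complete	Platform PM	Does the platform actually lower integration cost?
Human review capacity load	Queueing and utilization of SME, risk reviewer and QA reviewer	Ops / Risk	Is review capacity becoming the scale bottleneck?
Release-to-adoption lead time	Time from limited release until the target cohort reaches repeat adoption	Product Ops / Business ops	Is change management effective?
Qualified value throughput	AI value events per unit of time that pass quality and risk thresholds	Product / Value Office	Is AI producing real business value?
Benefit recognition cycle time	Time from go-live until finance / business owner confirms the benefit	Value Office / Finance	Is benefit realization too slow?
Risk-adjusted flow velocity	Value event throughput adjusted for risk, quality and cost	Portfolio owner	Which use cases are worth scaling?
Stop decision latency	Time from a stop signal appearing until formal stop or restriction	Portfolio governance	Is the organization willing to stop low-quality investments promptly?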
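
As one example from the table, evidence aging can be sketched as the number of days by which a piece of gate evidence predates the latest production configuration or policy change. This is one possible reading of the definition; the 30-day tolerance, the evidence names and the dates below are assumptions for illustration.

```python
from datetime import date


def evidence_aging_days(evidence_date: date, latest_change_date: date) -> int:
    """Days by which the evidence predates the latest change; 0 if the evidence is newer."""
    return max(0, (latest_change_date - evidence_date).days)


# Hypothetical evidence binder entries and change date.
evidence = {
    "eval result": date(2026, 5, 4),
    "privacy review": date(2026, 3, 2),
    "rollback drill": date(2026, 6, 1),
}
latest_change = date(2026, 5, 20)   # latest production config or policy version
max_age_days = 30                   # assumed tolerance, to be set per risk tier

needs_revalidation = [
    name for name, produced_on in evidence.items()
    if evidence_aging_days(produced_on, latest_change) > max_age_days
]
print(needs_revalidation)  # ['privacy review']
```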

7.3 DORA Extensions for AI Flow

DORA lens	AI extension	Interpretation
Change lead time	intent-to-production lead time, spec-to-eval lead time, eval-to-release lead time	AI does not start at commit; it starts at business intent and an evaluable requirement
Deployment frequency	risk-tiered release frequency for code, prompt, model, data, RAG index, policy, tool schema	Frequency must be interpreted by artifact type and risk tier
Change fail rate	AI changes causing rollback, hotfix, eval regression, policy breach, customer impact or manual remediation	Includes behavioral failures, not just service crashes
Failed deployment recovery time	time to disable AI path, rollback prompt / index / model, switch to fallback or human-only mode	AI recovery often requires process and operations to recover together
Deployment rework rate	unplanned releases caused by AI incident, eval miss, data drift or control failure	Measures whether gates catch problems early
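
To illustrate the layered reading of change fail rate, the sketch below groups a change log by artifact type or by risk tier. The record shape, the outcome identifiers and the sample log are hypothetical; the failure outcomes mirror the list in the table.

```python
from collections import defaultdict

# Outcomes the table counts as failed AI changes (identifiers are illustrative).
FAILURE_OUTCOMES = {"rollback", "hotfix", "eval_regression", "policy_breach",
                    "customer_impact", "manual_remediation"}

# Made-up change log.
changes = [
    {"artifact": "prompt", "risk_tier": "low", "outcome": "ok"},
    {"artifact": "prompt", "risk_tier": "low", "outcome": "eval_regression"},
    {"artifact": "prompt", "risk_tier": "high", "outcome": "ok"},
    {"artifact": "prompt", "risk_tier": "low", "outcome": "ok"},
    {"artifact": "rag_index", "risk_tier": "high", "outcome": "rollback"},
    {"artifact": "rag_index", "risk_tier": "low", "outcome": "ok"},
    {"artifact": "code", "risk_tier": "low", "outcome": "ok"},
    {"artifact": "code", "risk_tier": "high", "outcome": "ok"},
]


def change_fail_rate_by(changes, key):
    """Failed changes divided by total changes, grouped by the given field."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for change in changes:
        group = change[key]
        totals[group] += 1
        if change["outcome"] in FAILURE_OUTCOMES:
            failures[group] += 1
    return {group: failures[group] / totals[group] for group in totals}


by_artifact = change_fail_rate_by(changes, "artifact")
by_tier = change_fail_rate_by(changes, "risk_tier")
print(by_artifact)  # {'prompt': 0.25, 'rag_index': 0.5, 'code': 0.0}
```

A single blended rate would hide that the index changes fail twice as often as prompt changes in this sample.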

7.4 SPACE Signals for AI VSM

SPACE should not be used to rank individuals. It is used to explain whether the people and collaboration in an AI value stream are healthy.

SPACE dimension	AI VSM signal	Why it matters
Satisfaction and well-being	reviewer load, SME fatigue, trust in AI outputs, change fatigue	A value stream cannot buy speed by burning out experts and frontline staff
Performance	quality-passed outcomes, risk-adjusted value, customer / ops result	Rising activity does not equal improved performance
Activity	accepted AI-assisted work, reviewed outputs, evidence updates	Activity serves only as diagnostic input
Communication and collaboration	handoff clarity, decision log quality, cross-owner response time	AI use cases span product, data, risk, ops and engineering
Efficiency and flow	blocked time, context switching, review turnaround, work item aging	The most common losses in AI teams come from waiting and rework

7.5 Flow Distribution for AI Portfolio

If an AI portfolio is filled only with features, production gets slower and slower. A suggested split is six work item types:

Work type	Description	Healthy portfolio signal
Value feature	AI capability that directly improves a business process	bound to a portfolio theme and process baseline
Risk and assurance	eval, red-team, policy, privacy, security, model risk, audit evidence	high-risk use cases are matched with sufficient capacity
Platform runway	model gateway, RAG, eval, observability, policy-as-code, review workflow	reusable across multiple use cases
Data / knowledge product	data quality, knowledge ownership, taxonomy, lineage, freshness	resolves recurring data readiness blockers
Adoption / change	training, manager cadence, support, process redesign, role redesign	adoption does not happen on its own after release
Operational debt	legacy bot retirement, duplicate prompt cleanup, stale index remediation, runbook gaps	keeps the AI landscape from becoming ungovernable
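
A flow distribution over these six work types can be sketched as a share-per-type calculation. The type identifiers and the sample WIP list are illustrative assumptions.

```python
from collections import Counter

# Identifiers for the six work types in the table (names are illustrative).
WORK_TYPES = [
    "value_feature", "risk_assurance", "platform_runway",
    "data_knowledge", "adoption_change", "operational_debt",
]


def flow_distribution(work_items):
    """Share of in-progress work per type; types with no items get 0.0."""
    if not work_items:
        raise ValueError("no work items to distribute")
    counts = Counter(work_items)
    total = len(work_items)
    return {work_type: counts.get(work_type, 0) / total for work_type in WORK_TYPES}


# Hypothetical portfolio WIP: mostly features.
wip = ["value_feature"] * 7 + ["risk_assurance"] * 1 + ["platform_runway"] * 2
distribution = flow_distribution(wip)
unfunded_types = [t for t, share in distribution.items() if share == 0]

print(distribution["value_feature"])  # 0.7
print(unfunded_types)                 # ['data_knowledge', 'adoption_change', 'operational_debt']
```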

8. Connecting Flow, DORA / SPACE, Risk Gates and Business Value

A mature AI VSM dashboard should not be a single-layer report, but a multi-layer operating system.

Portfolio thesis
  -> value stream and process baseline
  -> delivery flow metrics
  -> DORA / SPACE signals
  -> risk gate evidence
  -> operational adoption metrics
  -> qualified value events
  -> benefits realization
  -> portfolio funding and scale / stop decisions

8.1 Metric Stack

Layer	Key metrics	Decisions
Portfolio	WIP by stage, risk tier distribution, flow load, evidence conversion, scale / stop ratio	fund, hold, stop, allocate capacity
Product value stream	idea-to-evidence, release-to-adoption, qualified value throughput, benefit recognition	prioritize, redesign workflow, scale
Engineering / platform	DORA metrics, platform reuse lead time, self-service success, incident recovery	invest in platform runway, reduce bottleneck
Team / collaboration	SPACE signals, review load, SME fatigue, blocked time, handoff age	adjust WIP, staff review capacity, change cadence
Risk / governance	gate queue, evidence aging, critical failure, control coverage, incident trend	approve, limit, rollback, increase assurance
Business value	AHT, FCR, loss avoided, backlog age, complaint, finance-recognized benefit	scale, redeploy capacity, change funding

8.2 Operating Review Cadence

Cadence	Participants	Focus	Decisions
Daily / twice-weekly flow clearing	AI PM, tech lead, data owner, risk liaison, ops lead	blocked work, aging items, review queue, failed eval	unblock, split work, route owner, reduce WIP
Weekly value stream review	Product, engineering, platform, risk, ops, data	flow metrics, DORA trend, eval status, adoption signal	select one bottleneck and one improvement bet
Biweekly release / adoption review	Product Ops, business owner, risk, support, QA	limited release, training, support tickets, override, guardrail	continue, restrict, rollback, expand cohort
Monthly benefits review	Value Office, finance, product, business owner, risk	baseline, incremental effect, cost, risk adjustment	recognize benefit, revise model, stop weak investment
Quarterly portfolio review	Executive sponsor, portfolio owner, CTO / CPO, risk, finance	stage distribution, capacity allocation, platform runway, scale / stop	fund, scale, retire, increase assurance, shift capacity

8.3 Metric Tension Pairs

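AI VSM must watch positive metrics and their counter-metrics together to prevent local optimization.

Positive metric	Tension metric	Why it matters
Flow velocity	change fail rate, critical failure rate	Speed must not sacrifice safety
Deployment frequency	risk-tiered approval and incident trend	High-frequency release must be interpreted in context
Adoption rate	override quality, blind acceptance signal	Adoption may come from over-trust
AHT reduction	rework, complaint, QA defect	Speed must not lower quality
Platform reuse	platform wait time, custom exception rate	Platform reuse must not become a new bottleneck
Qualified value throughput	benefit recognition and risk-adjusted value	Event volume must convert into benefit
SME review coverage	reviewer fatigue and queue age	Risk control cannot rest on an unsustainable manual workload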
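
One way to operationalize tension pairs is to flag any positive metric that improved while its paired tension metric got worse. The sketch below covers three of the pairs; the metric keys and the before / after numbers are made up, and it assumes higher is better for the positive metric and worse for the tension metric.

```python
# Positive metric -> tension metric (a subset of the table; keys are illustrative).
TENSION_PAIRS = {
    "flow_velocity": "change_fail_rate",
    "adoption_rate": "blind_acceptance_rate",
    "aht_reduction": "qa_defect_rate",
}


def unguarded_improvements(before: dict, after: dict) -> list:
    """Positive metrics that improved while their tension metric got worse."""
    flagged = []
    for positive, tension in TENSION_PAIRS.items():
        improved = after[positive] > before[positive]
        worsened = after[tension] > before[tension]
        if improved and worsened:
            flagged.append(positive)
    return flagged


# Hypothetical readings from two review periods.
before = {"flow_velocity": 4, "change_fail_rate": 0.10,
          "adoption_rate": 0.50, "blind_acceptance_rate": 0.05,
          "aht_reduction": 0.08, "qa_defect_rate": 0.03}
after = {"flow_velocity": 6, "change_fail_rate": 0.18,
         "adoption_rate": 0.70, "blind_acceptance_rate": 0.05,
         "aht_reduction": 0.12, "qa_defect_rate": 0.02}

print(unguarded_improvements(before, after))  # ['flow_velocity']
```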

9. Portfolio-to-Platform-to-Product Trace

The advanced capability of AI VSM is tracing a use case from portfolio thesis all the way to product telemetry and the benefit ledger.

9.1 Trace Chain

Portfolio theme
  -> business capability / APQC process
  -> end-to-end value stream
  -> AI use case hypothesis
  -> platform capability dependency
  -> product workflow intervention
  -> release gate evidence
  -> adoption telemetry
  -> qualified value event
  -> benefits realization
  -> scale / stop decision

9.2 Trace Table

Trace element	Example	Owner	Evidence
Portfolio theme	Regulated operations intelligence	Portfolio owner	investment thesis, capacity allocation
Business capability	Financial crime operations, customer servicing, credit operations	Enterprise architect / business owner	capability map, process baseline
Value stream	Alert intake to investigation closure	Business ops owner	current-state and target-state VSM
Use case hypothesis	AI summary and checklist reduce low-risk alert handling time without quality loss	AI PM	discovery brief, no-AI alternative
Platform dependency	RAG, model gateway, eval harness, trace logging, human review queue	Platform PM	platform service contract
Product intervention	Analyst sees cited summary and next-step checklist in case workflow	Product owner	UX / workflow spec, decision boundary
Release evidence	eval pass, SME review, privacy approval, rollback, monitoring	Risk / release owner	evidence binder
Adoption telemetry	weekly active analysts, accepted summaries, edits, overrides	Product Ops	telemetry contract
Value event	quality-approved AI-assisted investigations completed within SLA	Value Office	metric contract, dashboard
Benefit	AHT reduction, backlog age reduction, QA defect stability	Finance / business owner	benefits register
Decision	scale to second queue, hold for SME load, or stop	Portfolio governance	scale / stop memo
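
A trace like this can be checked for gaps mechanically. The sketch below treats the trace as a mapping from trace element to an evidence reference and lists the elements with nothing recorded; the key names and the partially filled `aml_trace` are illustrative assumptions.

```python
# Trace elements in chain order (identifiers are illustrative).
TRACE_CHAIN = [
    "portfolio_theme", "business_capability", "value_stream",
    "use_case_hypothesis", "platform_dependency", "product_intervention",
    "release_evidence", "adoption_telemetry", "value_event",
    "benefit", "decision",
]


def trace_gaps(trace: dict) -> list:
    """Trace elements with no evidence reference recorded, in chain order."""
    return [element for element in TRACE_CHAIN if not trace.get(element)]


# Hypothetical trace for the AML use case, with the last two links not yet evidenced.
aml_trace = {
    "portfolio_theme": "investment thesis",
    "business_capability": "capability map",
    "value_stream": "current-state and target-state VSM",
    "use_case_hypothesis": "discovery brief",
    "platform_dependency": "platform service contract",
    "product_intervention": "workflow spec",
    "release_evidence": "evidence binder",
    "adoption_telemetry": "telemetry contract",
    "value_event": "metric contract",
    "benefit": "",  # benefits register not yet populated
}

print(trace_gaps(aml_trace))  # ['benefit', 'decision']
```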

9.3 Platform Runway Link

AI VSM should show whether a use case creates platform leverage.

Platform capability	Flow bottleneck it removes	Metric
Model gateway	repeated model approval, cost logging, vendor switch friction	model access lead time, policy compliance coverage
Eval harness	manual release evidence, inconsistent thresholds	eval design lead time, regression coverage
RAG / knowledge service	duplicate indexes, stale knowledge, weak citation	retrieval integration lead time, freshness SLA
Observability and trace	value and incident evidence missing	trace coverage, incident detection time
Policy-as-code	slow manual risk review	gate automation rate, policy exception aging
Human review workflow	SME review queue hidden and unmanaged	review queue age, review capacity load
Evidence binder	audit response depends on manual reconstruction	evidence completeness, audit response effort

9.4 Portfolio Funding Implication

The funding question changes from:

Should we fund this AI feature?

to:

Should we fund this value stream improvement,
including product workflow, platform runway, risk assurance,
adoption support and benefits measurement?

Funding buckets:

Bucket	Why it matters for flow	Example investment
Use case delivery	creates near-term business value	AML triage assistant limited release
Platform runway	reduces future flow time and risk cost	shared eval, RAG, audit trace
Risk assurance	prevents late gate blockage and unsafe scale	red-team, policy controls, model risk evidence
Data / knowledge readiness	removes repeated upstream blockers	policy knowledge ownership, case label quality
Adoption and change	converts release into operational value	training, manager cadence, support playbook
Benefits measurement	turns activity into finance-recognized value	baseline, telemetry, causal design, benefit register

10. Risk Gates and Business Value Gates

AI VSM treats gates as flow control points. A good gate can block, limit, route, accelerate or retire work based on evidence.

10.1 Gate Architecture

Gate	Purpose	Evidence	Decision
Intake gate	prevent weak AI ideas from entering delivery	business problem, process owner, baseline hypothesis, risk hypothesis	accept / reject / merge / park
Discovery gate	prove the problem is worth pilot capacity	workflow map, AI fit, no-AI option, data readiness, risk tier	fund pilot / refine / stop
Eval readiness gate	make quality measurable before build completion	eval contract, golden set, failure taxonomy, metric owner	build / revise eval / data work
Architecture gate	avoid one-off design and unsafe integration	target architecture, platform dependency, data boundary, rollback sketch	approve / redesign / use platform
Risk gate	ensure control path fits risk tier	privacy, security, model risk, compliance, human oversight, monitoring	go / limited go / no-go
Release gate	validate production readiness	release bundle, versions, SLO, incident route, evidence binder	release / conditional release / rollback
Adoption gate	prove real users changed workflow	cohort usage, acceptance, override, support load, training	scale / improve change plan / restrict
Benefits gate	prove value is real and recognized	baseline, incremental effect, cost, risk adjustment, finance sign-off	scale / hold / stop
Scale / retire gate	decide where capacity goes next	risk trend, platform capacity, unit economics, adoption durability	scale / hold / retire / stop

10.2 Gate Principles

Gate evidence should be created during work, not reconstructed for meetings.
High-risk use cases need earlier gates, not only stricter release gates.
Gate queue time is a flow metric. If it grows, the operating system is under-designed.
A gate without a stop or limit decision is just paperwork.
A gate should distinguish "not ready yet" from "not worth doing" from "convert this into platform or data investment".
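
The last principle can be sketched as a small decision function with an explicit outcome type. The three boolean inputs are a simplification assumed for illustration; a real gate would weigh the evidence listed in 10.1.

```python
from enum import Enum


class GateOutcome(Enum):
    PROCEED = "proceed"
    NOT_READY_YET = "not ready yet"
    NOT_WORTH_DOING = "not worth doing"
    CONVERT_TO_INVESTMENT = "convert into platform or data investment"


def gate_decision(value_case_holds: bool,
                  evidence_complete: bool,
                  blocker_is_shared: bool) -> GateOutcome:
    """Map three hypothetical gate inputs to one explicit outcome."""
    if not value_case_holds:
        # No credible value case: stop rather than keep the item in WIP.
        return GateOutcome.NOT_WORTH_DOING
    if evidence_complete:
        return GateOutcome.PROCEED
    if blocker_is_shared:
        # The missing piece blocks several use cases, so fund it as shared capability.
        return GateOutcome.CONVERT_TO_INVESTMENT
    return GateOutcome.NOT_READY_YET


print(gate_decision(True, False, True).value)   # convert into platform or data investment
print(gate_decision(True, False, False).value)  # not ready yet
```

Keeping the outcomes as separate enum members makes it harder for a gate to collapse every non-approval into a generic "rejected".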

10.3 AI Risk Gate Checklist

Risk dimension	Gate question	Example evidence
Customer harm	Could AI mislead, deny, delay, overcharge, expose or unfairly treat a customer	customer impact assessment, complaint trigger
Decision authority	Is AI search, draft, recommend, triage, decision support or automated action	AI authority statement, human oversight RACI
Data and privacy	Are data rights, retention, PII handling and access boundaries clear	data inventory, privacy review, access log
Model behavior	Are hallucination, unsupported claim, toxicity, calibration and answerability controlled	eval results, failure taxonomy
Security	Are prompt injection, tool abuse, secret exposure and supply chain risks addressed	security test, tool permission matrix
Compliance	Are regulated statements, advice, KYC / AML / credit policies controlled	policy tests, compliance sign-off
Operations	Can users handle exceptions, overrides, appeals and support load	runbook, support model, QA sampling plan
Reliability	Can the system fail safely and recover quickly	fallback, feature flag, rollback drill
Auditability	Can decisions and outputs be explained and reconstructed	trace, versions, citations, approval records
Benefits integrity	Can value claims be measured and attributed	metric contract, baseline, benefits register

11. Financial Retail Case: Regulated Operations Intelligence

11.1 Context

A financial retail institution wants to use AI across AML operations, customer servicing and branch knowledge. The executive goal is not "launch more AI", but:

Reduce regulated operations workload,
improve decision quality,
shorten customer and employee waiting time,
and build reusable AI platform controls
without increasing compliance, privacy, fairness or operational risk.

11.2 Portfolio Theme and Value Streams

Portfolio theme	Business capability	Value stream	Candidate AI intervention
Regulated operations intelligence	Financial crime operations	Alert intake -> investigation -> QA -> closure / escalation	AML alert triage assistant
Regulated operations intelligence	Customer servicing	Contact intake -> intent -> answer -> resolution -> QA	Customer service policy copilot
Regulated operations intelligence	Branch operations	Employee query -> policy retrieval -> customer conversation -> follow-up	Branch knowledge assistant
Regulated operations intelligence	Credit operations	Document intake -> policy check -> memo -> underwriting review	Credit memo assistant

11.3 Current-State Observations

Observation	Flow implication	Evidence to collect
AML analysts spend significant time assembling context before judgment	high wait and search time inside operations flow	time study, case system log, analyst interview
Customer agents search multiple knowledge bases during calls	high context switching and inconsistent answers	desktop telemetry, transfer reason, QA defects
Branch staff ask support teams repetitive policy questions	hidden demand and slow knowledge flow	support ticket volume, policy query categories
Credit memo quality varies by underwriter and source documents	high rework and policy citation gaps	memo QA, exception reason, cycle time
AI PoCs depend on different RAG and logging patterns	platform fragmentation	architecture inventory, reuse gap

11.4 Target Value Stream: AML Alert Triage Assistant

Alert created
  -> eligibility check
  -> retrieve case history, transaction context and policy guidance
  -> AI produces cited summary and investigation checklist
  -> analyst reviews, edits and decides next action
  -> QA samples output and decision evidence
  -> case closed or escalated
  -> telemetry updates eval set, risk controls and benefits register

11.5 Metrics for the AML Stream

Metric layer	Metric	Baseline / target logic
Flow	alert-to-first-action time, case cycle time, queue age	reduce waiting and context assembly time
Adoption	eligible analyst weekly active usage, summary acceptance, edit distance, override reason	prove analyst workflow fit
AI quality	citation correctness, unsupported claim rate, missing key fact rate, checklist relevance	release and regression evidence
Risk guardrail	critical evidence defect, wrong policy reference, unauthorized data retrieval, SAR escalation error	hard stop or scale restriction
DORA / platform	prompt / index change lead time, eval-to-release time, rollback time, change fail rate	govern AI artifact changes
SPACE	analyst trust pulse, SME review load, review queue age, change fatigue	ensure control system is sustainable
Business value	AHT reduction, backlog age, QA defect stability, capacity redeployment	benefits realization
Unit economics	AI cost per case, QA cost per case, platform support cost	scale economics

11.6 90-Day Decision Example

Evidence	Result	Interpretation
Summary acceptance	72% of eligible pilot alerts	adoption signal positive
Case handling time	median handling time down 12% in pilot queue	value signal positive
QA defect	no increase in critical defects	quality guardrail stable
SME review load	review queue age grew from 1 day to 4 days	scale bottleneck
Citation correctness	96% pass on QA sample	release evidence acceptable
Cost per case	within approved pilot envelope	economics acceptable for limited scale
Benefit recognition	finance accepts capacity release only for low-risk queue	benefits are partial but credible

Decision:

Hold broad scale.
Scale only to a second low-risk queue after human review workflow is improved.
Fund platform runway for reusable SME sampling, evidence binder and eval case management.
Do not expand to high-risk typologies until critical failure taxonomy and review capacity are stronger.
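
The reasoning behind this decision can be sketched as a rule over the pilot evidence. The 4-day review queue age comes from the table in 11.6 and the 3-day limit from the canvas in 12.1; the field names, the rule order and the "restrict or stop" branch are assumptions for illustration.

```python
# Pilot evidence from the 90-day table, encoded with hypothetical field names.
pilot = {
    "critical_defect_increase": False,   # QA: no increase in critical defects
    "cost_within_envelope": True,        # cost per case within approved pilot envelope
    "sme_review_queue_age_days": 4,      # grew from 1 day to 4 days
}

SME_QUEUE_LIMIT_DAYS = 3  # "must not exceed 3 days during pilot" in the canvas


def scale_decision(evidence: dict) -> str:
    """Return a coarse scale decision; hard guardrails are checked first."""
    if evidence["critical_defect_increase"] or not evidence["cost_within_envelope"]:
        return "restrict or stop"
    if evidence["sme_review_queue_age_days"] > SME_QUEUE_LIMIT_DAYS:
        return "hold broad scale"
    return "scale"


print(scale_decision(pilot))  # hold broad scale
```

Positive adoption and value signals do not appear in the rule at all: in this sketch they are necessary to reach the gate, but the review bottleneck alone is enough to hold scale.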

11.7 Portfolio Flow Snapshot

Stage	Use cases	Governance focus
Idea	collections contact strategy	risk and fairness pathway unclear, keep outside delivery WIP
Discovery	credit memo assistant, complaint classification	data readiness, policy boundaries, baseline
Pilot	AML alert triage assistant	eval, SME review, limited cohort, benefit signal
Release	customer service policy copilot	monitoring, training, wrong-policy guardrail
Scale	branch knowledge assistant	knowledge freshness, platform reuse, support deflection
Retire	legacy FAQ bot	migrate content to governed knowledge service

11.8 What This Case Shows in an Interview

Signal	What it demonstrates
AI use case tied to APQC-like process and business capability	enterprise architecture maturity
Delivery VSM plus operations VSM	ability to connect build, release, adoption and business outcomes
Flow Metrics plus DORA / SPACE	advanced product operations and engineering productivity thinking
Risk gate and benefit gate	regulated AI governance maturity
Platform runway investment	architecture leverage and portfolio economics
Scale held due to SME bottleneck	mature decision discipline, not AI hype

12. Templates

12.1 AI Value Stream Canvas

# AI Value Stream Canvas

Value stream name: AML alert triage to investigation closure
Portfolio theme: Regulated operations intelligence
Business capability: Financial crime operations
Business owner: Head of AML Operations
Product owner: AI Operations Product Lead
Risk owner: Financial Crime Risk
Platform owner: AI Platform Lead
Review date: 2026-06-29

## 1. Business outcome
Reduce low-risk alert handling time and backlog age while maintaining investigation quality and auditability.

## 2. Current-state flow
Alert created -> analyst searches customer and transaction context -> analyst checks policy and typology notes -> analyst drafts case narrative -> QA sample -> closure or escalation.

## 3. Current bottlenecks
- Context assembly consumes analyst time.
- Policy search is inconsistent across teams.
- QA defects often relate to missing evidence or weak narrative support.
- Review capacity is limited for pilot expansion.

## 4. AI intervention
AI retrieves approved case context and policy guidance, drafts a cited summary, and recommends an investigation checklist.
AI does not close alerts, file SARs, downgrade risk, or make final compliance decisions.

## 5. Target-state flow
Alert created -> eligibility check -> approved context retrieval -> AI cited summary and checklist -> analyst review and decision -> QA sample -> telemetry feedback to eval and benefits register.

## 6. Flow metrics
| Metric | Baseline | Target / decision rule |
|---|---:|---|
| Idea-to-evidence lead time | 18 business days | <= 15 business days for similar future use cases |
| Release-to-adoption lead time | 6 weeks | repeat adoption by >= 65% eligible analysts within 6 weeks |
| Qualified value throughput | 0 | quality-approved assisted alerts per week after limited release |
| SME review queue age | 1 day | must not exceed 3 days during pilot |

## 7. Risk gates
| Gate | Evidence |
|---|---|
| Eval readiness | 500 historical case golden set, critical failure taxonomy, citation rubric |
| Release | rollback path, trace logging, human approval, QA sample plan |
| Scale | no critical evidence defect, stable QA, review capacity within threshold |

## 8. Benefits realization
Recognize benefit only for eligible low-risk alerts where AI was exposed, analyst accepted or materially used the output, QA passed, and case handling time improved versus baseline or control.

## 9. Scale / stop rule
Scale to another low-risk queue if AHT improves >= 10%, QA critical defects remain 0, citation pass >= 95%, review queue age <= 3 days, and cost per case stays within approved ceiling.
Stop expansion if any critical privacy, evidence or regulated decision boundary breach occurs.
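The scale / stop rule in section 9 of the canvas can be expressed as a small check. This is a minimal sketch, not a definitive implementation: the names `PilotEvidence` and `scale_decision` are hypothetical, and the "hold" outcome is an assumed label for the case where no breach occurred but at least one scale threshold is missed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PilotEvidence:
    aht_improvement_pct: float     # AHT improvement versus baseline or control
    qa_critical_defects: int
    citation_pass_pct: float
    review_queue_age_days: float
    cost_within_ceiling: bool
    boundary_breach: bool = False  # critical privacy, evidence or regulated decision boundary breach


def scale_decision(e: PilotEvidence) -> Tuple[str, List[str]]:
    """Return ("scale" | "hold" | "stop", reasons) using the canvas thresholds."""
    if e.boundary_breach:
        return "stop", ["critical boundary breach"]
    failed = []
    if e.aht_improvement_pct < 10:
        failed.append("AHT improvement below 10%")
    if e.qa_critical_defects != 0:
        failed.append("QA critical defects above 0")
    if e.citation_pass_pct < 95:
        failed.append("citation pass below 95%")
    if e.review_queue_age_days > 3:
        failed.append("review queue age above 3 days")
    if not e.cost_within_ceiling:
        failed.append("cost per case above approved ceiling")
    return ("hold", failed) if failed else ("scale", [])


# Pilot figures reported for the AML case: AHT down 12%, no critical defects,
# citation pass 96%, review queue age 4 days, cost within the approved envelope.
pilot = PilotEvidence(12.0, 0, 96.0, 4.0, True)
print(scale_decision(pilot))
```

With the AML pilot figures, the only failing condition is review queue age, which matches the decision to hold broad scale until the human review workflow improves.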

12.2 Flow Metrics Dashboard

# AI Flow Metrics Dashboard

Audience: AI portfolio review, product operations, platform leadership, risk governance, Value Office
Cadence: weekly for flow, monthly for benefits, quarterly for portfolio funding

## Executive summary
| Decision area | Metric | Current signal | Decision |
|---|---|---|---|
| Portfolio WIP | active use cases by stage and risk tier | Pilot WIP exceeds review capacity | freeze new high-risk pilots |
| Value flow | qualified value throughput | customer service copilot improving | prepare scale memo |
| Bottleneck | SME review queue age | AML review queue aging | fund review workflow automation |
| Risk | critical guardrail breaches | none in limited release | continue release with monitoring |
| Benefits | finance-recognized benefit | partial recognition for service queue | expand causal measurement |

## Portfolio flow
| Metric | Definition | Slice |
|---|---|---|
| Flow load | active AI work items in idea, discovery, pilot, release, scale | stage, risk tier, business capability |
| Flow distribution | percent of work in feature, platform, risk, data, adoption, debt | portfolio theme |
| Evidence conversion | percent of discovery items converted to pilot, platform investment or stop decision | business unit |
| Stop decision latency | days from stop signal to decision | owner, risk tier |

## Product value stream
| Metric | Definition | Slice |
|---|---|---|
| Idea-to-evidence lead time | intake accepted to discovery evidence ready | use case, capability |
| Release-to-adoption lead time | limited release to repeat adoption threshold | cohort, role |
| Qualified value throughput | quality and risk-passed AI value events per week | workflow, risk tier |
| Benefit recognition cycle time | production release to finance-recognized benefit | business unit |

## Engineering and platform
| Metric | Definition | Slice |
|---|---|---|
| AI change lead time | spec / eval / build / release time for AI artifacts | code, prompt, model, index, policy |
| Change fail rate | AI changes requiring rollback, hotfix or control intervention | service, risk tier |
| Platform reuse lead time | request to production use of shared platform capability | capability |
| Eval queue time | eval submitted to gate decision | use case, test suite |

## Risk and operations
| Metric | Definition | Slice |
|---|---|---|
| Evidence aging | days since evidence matched current production config | evidence type |
| Critical failure rate | no-go failures per evaluated workflow item | risk category |
| Human review load | reviewer capacity used and queue age | SME, risk, QA |
| Incident recovery time | detection to fallback / rollback / stable operation | incident type |

## Adoption and SPACE
| Metric | Definition | Slice |
|---|---|---|
| Repeat adoption | eligible users using AI in target workflow over repeated periods | role, team |
| Override reason distribution | why users reject or edit AI output | workflow step |
| Trust pulse | user trust score with free-text reason coding | cohort |
| Review fatigue signal | reviewer load, after-hours review, queue stress | reviewer group |
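Two of the portfolio flow metrics in the dashboard, flow load and flow distribution, can be computed directly from a list of work items. The sketch below assumes each work item is a dict with hypothetical `stage` and `work_type` keys; the sample items are illustrative and are not taken from the playbook's cases.

```python
from collections import Counter

# Stages counted as active, following the flow load definition in the dashboard.
ACTIVE_STAGES = ("idea", "discovery", "pilot", "release", "scale")


def flow_load(items):
    """Count active work items per stage; retired or stopped items are excluded."""
    counts = Counter(i["stage"] for i in items if i["stage"] in ACTIVE_STAGES)
    return {stage: counts.get(stage, 0) for stage in ACTIVE_STAGES}


def flow_distribution(items):
    """Percent of active work items per work type (feature, platform, risk, ...)."""
    active = [i for i in items if i["stage"] in ACTIVE_STAGES]
    if not active:
        return {}
    counts = Counter(i["work_type"] for i in active)
    return {t: round(100 * n / len(active), 1) for t, n in counts.items()}


# Hypothetical work items for illustration only.
items = [
    {"stage": "pilot", "work_type": "feature"},
    {"stage": "pilot", "work_type": "risk"},
    {"stage": "discovery", "work_type": "data"},
    {"stage": "release", "work_type": "feature"},
    {"stage": "retire", "work_type": "debt"},
]
print(flow_load(items))
print(flow_distribution(items))
```

In practice the same functions would be applied per risk tier or business capability, since the dashboard slices flow load by those dimensions.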

12.3 Blocked Work Taxonomy

Block category	Symptom	Metric	Owner	Response
Business problem unclear	idea keeps changing, no baseline	discovery rework count	business owner / AI PM	rewrite problem statement, freeze baseline hypothesis
Process owner missing	no one owns workflow change	intake aging	portfolio owner	require process owner before discovery
Data access / rights	waiting for permissions or data retention decision	data readiness wait time	data owner / privacy	create data decision record and approved access path
Data quality / labels	eval or pilot cannot trust labels	label defect rate, golden set aging	data product owner	fund data product or reduce scope
Knowledge freshness	RAG answers cite stale policy	freshness breach count	knowledge owner	assign content owner and freshness SLA
Eval ambiguity	teams debate quality after build	eval design lead time, eval rework	EvalOps / AI PM	define eval contract before release candidate
Risk review queue	gate waits on manual review	risk gate queue time	risk governance owner	add evidence checklist, risk liaison, pre-review
Security / privacy boundary	unclear PII, tool or prompt injection exposure	security exception age	security / privacy	reduce tool scope, add controls, retest
Architecture fit	one-off solution bypasses platform	platform exception rate	architect / platform PM	route through shared capability or approve bounded exception
Human review capacity	SME queue grows during pilot	review queue age, review utilization	ops / risk	cap WIP, adjust sampling, fund workflow
Adoption friction	users ignore AI after release	release-to-adoption lead time	Product Ops / business ops	redesign workflow, training, manager cadence
Telemetry gap	value cannot be measured	metric contract coverage	analytics owner	instrument assignment, exposure, action, outcome
Finance recognition	benefit claim not accepted	benefit recognition cycle time	Value Office / finance	agree baseline, effect method, cost treatment
Vendor / procurement	model or tool contract delays release	vendor wait time	procurement / platform	use approved gateway or define exit path

12.4 Benefits Realization Loop

Baseline
  -> eligibility and exposure logging
  -> adoption and action telemetry
  -> quality and risk qualification
  -> incremental effect estimate
  -> cost and risk adjustment
  -> finance / business sign-off
  -> capacity redeployment or value capture
  -> post-scale audit
  -> portfolio scale / stop decision

Template:

# AI Benefits Realization Loop

Use case: Customer service policy copilot
Business owner: Contact Center Operations
Finance owner: FP&A partner
Risk owner: Customer Conduct Risk
Review period: 2026 Q3 month 2

## Baseline
Eligible contacts: 620,000 per month
Baseline AHT: P50 7.8 minutes
Baseline reopen rate: 11.2%
Baseline complaint rate: 0.42%
Baseline QA policy defect rate: 3.1%

## Exposure and adoption
Treatment cohort: servicing queues A and B
Eligible exposed contacts: 184,000
Accepted AI answer: 118,000
Repeat adoption: 68% of eligible agents

## Quality and guardrail
Citation QA pass: 96.5%
Critical wrong policy answer: 0
PII leakage: 0
Reopen rate: no credible increase versus control
Complaint rate: stable within approved threshold

## Incremental effect
Cluster rollout estimate: AHT reduction of 0.6 minutes per accepted exposed resolved contact.
Effect recognized only for contacts passing quality and guardrail filters.

## Cost and risk adjustment
Costs included: model, retrieval, platform support, QA sampling, training, monitoring.
Risk adjustment: high-risk intents excluded from recognition until separate gate.

## Realized benefit
Recognized benefit: capacity redeployed to peak-hour backlog and training coverage.
Finance treatment: limited-scale capacity benefit accepted for Q3 operating review.

## Next decision
Scale to two additional low-risk servicing queues.
Hold complaints, hardship and vulnerable customer intents until policy and escalation controls pass separate gate.
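The arithmetic behind the template can be sketched as follows. The function names are hypothetical. The template does not state how many accepted contacts were resolved and passed the quality and guardrail filters, so the capacity figure below is an upper bound under the assumption that all accepted contacts qualify.

```python
def acceptance_rate(accepted: int, exposed: int) -> float:
    """Share of eligible exposed contacts where the AI answer was accepted."""
    return accepted / exposed if exposed else 0.0


def recognized_capacity_hours(qualified_contacts: int, minutes_saved_per_contact: float) -> float:
    """Capacity hours from contacts that passed quality and guardrail filters."""
    return qualified_contacts * minutes_saved_per_contact / 60


exposed, accepted = 184_000, 118_000  # figures from the template
effect_minutes = 0.6                  # cluster rollout estimate from the template

print(f"acceptance rate: {acceptance_rate(accepted, exposed):.1%}")
# Upper bound only: assumes every accepted contact was resolved and passed all filters.
print(f"upper-bound capacity hours: {recognized_capacity_hours(accepted, effect_minutes):,.0f}")
```

On the template figures this gives an acceptance rate of about 64.1% and an upper bound of about 1,180 capacity hours for the review period. The recognized figure would be lower once the resolved, quality and guardrail filters are applied.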

12.5 Portfolio Evidence Pack

# AI Portfolio Evidence Pack

Portfolio theme: Regulated operations intelligence
Quarter: 2026 Q3
Decision meeting: Quarterly AI portfolio review

## 1. Portfolio thesis
Invest in AI capabilities that reduce regulated operations workload, improve quality, reuse approved platform controls, and produce measurable value evidence within one quarter.

## 2. Stage distribution
| Stage | Items | Risk profile | Decision need |
|---|---|---|---|
| Idea | collections contact strategy | high fairness and conduct risk | park until risk pathway matures |
| Discovery | credit memo assistant | high credit policy risk | continue data readiness work |
| Pilot | AML alert triage assistant | high AML evidence risk | limited scale only if review load is controlled |
| Release | customer service policy copilot | medium conduct risk | scale low-risk intents |
| Scale | branch knowledge assistant | low / medium policy risk | expand through knowledge platform |
| Retire | legacy FAQ bot | audit and content duplication risk | migrate and shut down |

## 3. Flow metrics
| Metric | Portfolio signal | Decision |
|---|---|---|
| Flow load | high-risk pilots exceed SME capacity | freeze new high-risk pilot starts |
| Gate queue time | privacy and model risk review stable, SME review aging | fund review workflow |
| Platform reuse | RAG and trace reused by 3 use cases | continue platform runway |
| Benefit recognition | 2 of 4 production use cases have finance-recognized benefit | strengthen telemetry for others |

## 4. Risk evidence
- No critical PII leakage in production AI workflows.
- Two policy citation defects added to eval regression set.
- One tool permission exception closed before release.
- SME review capacity is the primary scale constraint.

## 5. Funding recommendation
- Continue customer service copilot scale for low-risk intents.
- Hold AML broad scale until review queue and sampling workflow improve.
- Convert repeated RAG and evidence needs into platform runway funding.
- Stop independent FAQ bot maintenance and migrate content.

12.6 Scale / Stop Memo

# AI Scale / Stop Memo

Use case: AML alert triage assistant
Current stage: Limited release
Decision requested: Limited scale to second low-risk queue
Decision date: 2026-06-29

## Value evidence
- Median handling time decreased 12% in pilot queue.
- Backlog age decreased 9% for eligible low-risk alerts.
- Analyst repeat adoption reached 72%.
- Summary acceptance reached 76% after week 4.

## Quality and risk evidence
- Critical evidence defect: 0.
- Citation correctness: 96%.
- Unsupported claim rate: below approved threshold.
- AI did not close alerts, file SARs or downgrade risk.

## Flow evidence
- Eval-to-release lead time: 9 business days.
- Risk gate queue time: stable.
- SME review queue age increased to 4 days, above target.

## Unit economics
- AI cost per case inside approved limited-release ceiling.
- QA cost increased due to manual sampling.
- Platform trace and RAG components reused by customer service copilot.

## Decision
Limited scale to one additional low-risk queue.
Condition: review queue age must return to <= 3 days before the second queue goes live and before any further expansion.
Fund shared review workflow and evidence binder as platform runway.
Do not expand to high-risk typologies until review capacity, critical failure taxonomy and escalation controls pass scale gate.

## Stop triggers
- Any confirmed critical privacy breach.
- Any AI-attributable alert closure or SAR escalation boundary breach.
- Two consecutive monthly reviews with no handling-time or backlog benefit.
- SME review queue age above 5 business days for two consecutive review cycles.
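The stop triggers above mix one-off events with conditions over consecutive review cycles. A minimal sketch of that logic follows; the function names and the input shapes (a list of booleans per monthly review, a list of queue ages per review cycle) are assumptions for illustration.

```python
from typing import Callable, List, Sequence


def last_n_all(values: Sequence, n: int, predicate: Callable) -> bool:
    """True if there are at least n values and the latest n all satisfy predicate."""
    return len(values) >= n and all(predicate(v) for v in values[-n:])


def fired_stop_triggers(
    privacy_breach: bool,
    boundary_breach: bool,
    monthly_benefit_seen: Sequence[bool],  # True if a review showed handling-time or backlog benefit
    queue_age_days: Sequence[float],       # SME review queue age per review cycle, in business days
) -> List[str]:
    fired = []
    if privacy_breach:
        fired.append("confirmed critical privacy breach")
    if boundary_breach:
        fired.append("alert closure or SAR escalation boundary breach")
    if last_n_all(monthly_benefit_seen, 2, lambda seen: not seen):
        fired.append("two consecutive monthly reviews with no benefit")
    if last_n_all(queue_age_days, 2, lambda days: days > 5):
        fired.append("review queue age above 5 business days for two cycles")
    return fired


# Current memo state: benefit observed, queue age grew from 1 to 4 days.
print(fired_stop_triggers(False, False, [True, True], [1, 4]))
```

For the current memo state no trigger fires: a queue age of 4 days breaches the 3-day scale condition but not the 5-day stop threshold.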

13. Review Checklists

13.1 AI VSM Design Checklist

Is the AI use case tied to a named business capability and end-to-end process?
Is there a current-state and target-state value stream, not only a feature description?
Are flow metrics defined from idea to release and from release to value realization?
Are delivery flow and operations flow separated but connected?
Are DORA metrics adapted to code, prompt, model, data, index, policy and tool changes?
Are SPACE signals used at team / system level, not for individual ranking?
Are WIP limits and flow load visible by risk tier and value stream stage?
Is blocked work classified with owner and response path?
Are risk gates integrated into flow rather than attached at the end?
Is benefits realization designed before scale?

13.2 Flow Metrics Review

Does flow time include waiting, review, gate queue and adoption delay?
Is flow efficiency exposing delay and handoff, not blaming teams?
Is flow velocity adjusted for risk, quality and qualified value events?
Is flow distribution balanced across feature, platform, risk, data, adoption and debt?
Is flow predictability interpreted by risk tier and service context?
Are stop decisions tracked as healthy portfolio outcomes?
Are metrics paired with tension indicators to prevent gaming?
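The first two questions can be checked with a simple calculation: flow time is the sum of all segments including waiting, and flow efficiency is the share of that time spent in active work. The sketch below assumes a hypothetical segment format of `(name, days, is_active)`; the sample durations are illustrative and are not taken from the playbook's cases.

```python
def flow_time_days(segments):
    """Total elapsed days, including waiting, review, gate queue and adoption delay."""
    return sum(days for _, days, _ in segments)


def flow_efficiency(segments):
    """Active time divided by total flow time; 0.0 for an empty stream."""
    total = flow_time_days(segments)
    active = sum(days for _, days, is_active in segments if is_active)
    return active / total if total else 0.0


# Hypothetical segments: (name, days, is_active).
segments = [
    ("build", 6, True),
    ("eval queue", 4, False),
    ("risk review", 3, True),
    ("gate queue", 5, False),
    ("adoption delay", 2, False),
]
print(flow_time_days(segments), round(flow_efficiency(segments), 2))
```

Listing the waiting segments by name keeps the metric pointed at delays and handoffs rather than at the teams doing the active work.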

13.3 Risk and Gate Review

Does each gate have a clear go, limited go, no-go, rollback or stop decision?
Are privacy, security, compliance, model risk and operational risk addressed before release?
Is AI authority explicit: search, draft, recommend, triage, decision support or automated action?
Is human accountability visible in workflow and telemetry?
Are eval results connected to production monitoring and incident learning?
Are critical failures added to regression, policy or runbook updates?
Is evidence aging tracked after model, prompt, data, index or policy changes?

13.4 Benefits Realization Review

Is there a pre-release baseline?
Are assignment, exposure, adoption, action, outcome and guardrail logged?
Is value counted only for quality-passed and risk-passed events?
Are model, platform, QA, training, monitoring, governance and risk costs included?
Is finance treatment explicit: cost reduction, capacity redeployment, loss avoided, SLA improvement or revenue protection?
Is post-scale audit planned to detect benefit decay?
Is the scale decision based on realized value and risk stability, not only release success?

14. Anti-Patterns

Anti-pattern	Why it is dangerous	Better practice
Counting AI features as value	encourages shipping without workflow impact	count qualified value events and benefits
Counting calls or tokens as adoption	usage can be waste, rework or curiosity	measure eligible workflow adoption and accepted action
Treating PoC completion as success	PoC may avoid production controls	track PoC-to-release and PoC-to-stop conversion
Running risk gates only at the end	late rejection creates rework and political pressure	use risk tiering and evidence gates from intake
Building every use case as custom	portfolio flow slows and audit burden grows	fund platform runway for repeated capabilities
Averaging high and low risk use cases	masks risk and flow differences	slice by risk tier, process, channel and artifact type
Using DORA as a ranking tool	damages collaboration and distorts behavior	use DORA for team / service improvement
Ignoring SPACE signals	reviewers, SMEs and users become hidden bottlenecks	monitor review load, trust, fatigue and handoff quality
Skipping no-AI alternatives	AI becomes solution-first theater	compare rules, UI, workflow redesign, training and automation
Claiming time saved without release path	finance cannot recognize vague savings	define capacity redeployment or cost treatment
Scaling before adoption is stable	production cost grows without realized value	add adoption gate before scale
Treating stop as failure	weak use cases consume scarce capacity	make stop, retire and merge normal portfolio decisions

15. 30-Day Training Plan

Goal: within 30 days, produce a portfolio-ready AI Value Stream Management / Flow Metrics evidence pack.

Day	Theme	Output
1	Select financial retail AI value stream	use case brief and business capability map
2	Map current-state process	current-state AI VSM
3	Define target-state AI intervention	target-state VSM and decision boundary
4	Identify process owner, risk owner, platform owner	RACI
5	Define problem baseline	baseline metric table
6	Define qualified AI value event	value event contract
7	Review week 1	one-page value stream narrative
8	Build delivery flow map	idea-to-release flow
9	Build operations flow map	business event-to-outcome flow
10	Define core Flow Metrics	flow time, velocity, efficiency, load, distribution
11	Extend metrics for AI	data, eval, risk, platform, adoption, benefits
12	Map DORA metrics to AI artifacts	DORA extension sheet
13	Map SPACE signals	reviewer, SME, user and team flow signals
14	Review week 2	metrics stack summary
15	Design risk gates	intake, discovery, eval, release, adoption, benefits
16	Design blocked work taxonomy	blocker table with owner and response
17	Design flow dashboard	executive, portfolio, product, platform, risk views
18	Design benefits realization loop	baseline to finance sign-off
19	Build portfolio-to-platform-to-product trace	traceability table
20	Define platform runway linkage	platform capability map
21	Review week 3	operating model narrative
22	Write AML case	case flow and metrics
23	Write customer service case	value event and dashboard
24	Write credit ops case	risk gate and benefits model
25	Write branch knowledge case	knowledge freshness and platform reuse
26	Build scale / stop memo	decision memo
27	Write interview answers	6-8 advanced answers
28	Assemble portfolio pack	artifacts list and story
29	Self-review against checklist	gap fixes
30	Final executive narrative	5-minute storyline

Completion criteria:

One current-state and one target-state AI value stream map.
Flow Metrics dashboard with portfolio, product, platform, risk, adoption and benefits layers.
DORA / SPACE integration without vanity metrics.
Risk gates and benefits gates connected to scale / stop decisions.
At least one financial retail case with qualified value event and evidence pack.
Clear portfolio-to-platform-to-product trace.

16. Interview Answers

Q1: How do you use VSM to manage AI use cases?

30-second version:

I treat each AI use case as a value stream item and manage it end to end: from idea through discovery, data / eval readiness, risk gate, limited release, adoption and benefits realization to scale / stop. On metrics, I look beyond feature launches and call volume to flow time, blocked time, gate queue, qualified value events, risk guardrails and finance-recognized benefits.

2-minute version:

I first tie the use case to a business capability and an end-to-end process, for example AML alert triage or customer servicing. Then I draw the current-state and target-state value streams and make explicit which step AI enters, which decision it changes and who holds accountability. Next I define the Flow Metrics: idea-to-evidence lead time, data readiness wait time, eval design lead time, risk gate queue time, release-to-adoption lead time, qualified value throughput and benefit recognition cycle time. On the engineering side I use DORA to track lead time, release frequency, change fail rate and recovery for AI artifact changes; on the team side I use SPACE to track review load, SME fatigue, trust and collaboration. Finally, the risk gate and benefits gate decide whether to scale, hold, stop or convert the work into a platform investment.

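Q2: Why can't AI call volume prove AI value?

30-second version:

Call volume only shows that AI was used or triggered by a system; it does not show that AI produced a qualified business outcome. In financial retail I look at the qualified AI value event instead: the target workflow is eligible, AI was exposed, the user adopted the output, quality passed, the risk threshold passed, the event is auditable and the unit economics hold.

2-minute version:

For example, rising call volume for a customer service copilot may mean the knowledge base is hard to use, agents are asking repeatedly, answers are inaccurate or users are simply curious. A real value event looks like this: on an eligible contact, AI gives a cited answer; the agent accepts it or edits it reasonably; the customer's issue is resolved on first contact; there is no reopen within 7 days; there is no wrong policy answer, PII leakage or rise in complaints; and cost per resolved contact stays within bounds. Only such events enter benefits realization. Otherwise API calls and tokens consumed are just activity, and may even represent cost and risk.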
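The qualified value event described in this answer can be written as a predicate over a single contact record. This is a sketch under stated assumptions: the `ContactEvent` fields and the `cost_ceiling` parameter are hypothetical names, and complaint and cost signals, which the answer describes at aggregate level, are simplified here to per-event fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactEvent:
    eligible: bool
    cited_answer: bool
    accepted_or_edited: bool
    resolved_first_contact: bool
    reopened_within_7_days: bool
    wrong_policy_answer: bool
    pii_leakage: bool
    complaint: bool
    cost: float  # cost attributed to this resolved contact (assumed unit: currency)


def is_qualified_value_event(e: ContactEvent, cost_ceiling: float) -> bool:
    """True only when every workflow, quality, risk and cost condition holds."""
    return (
        e.eligible
        and e.cited_answer
        and e.accepted_or_edited
        and e.resolved_first_contact
        and not e.reopened_within_7_days
        and not e.wrong_policy_answer
        and not e.pii_leakage
        and not e.complaint
        and e.cost <= cost_ceiling
    )


good = ContactEvent(True, True, True, True, False, False, False, False, 0.40)
reopened = ContactEvent(True, True, True, True, True, False, False, False, 0.40)
print(is_qualified_value_event(good, cost_ceiling=0.50))
print(is_qualified_value_event(reopened, cost_ceiling=0.50))
```

Counting only events that pass this predicate is what separates qualified value throughput from raw call volume: the reopened contact above consumed the same tokens but does not count.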

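Q3: How do Flow Metrics, DORA and SPACE fit together?

30-second version:

Flow Metrics show whether the value stream flows smoothly, DORA shows whether the delivery system is getting faster and more stable, and SPACE shows whether people and collaboration are healthy. AI scenarios add eval, risk gate, platform reuse and benefits realization on top.

2-minute version:

Flow Metrics apply to the portfolio and product value streams: flow time, velocity, efficiency, load, distribution and predictability. DORA applies to the AI SDLC: change lead time, deployment frequency, change fail rate, recovery and rework for prompt, model, index, policy, tool schema and code changes. SPACE explains why flow is improving or degrading, for example through reviewer load, SME fatigue, context switching, trust and collaboration. All three connect to business value: only when delivery is faster and more stable, teams are not overwhelmed by review and governance, and adoption and qualified value events improve after release can we say the AI operating system really works.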
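As one illustration of adapting DORA to AI artifacts, change fail rate can be sliced by artifact type. The sketch below follows the playbook's dashboard definition (changes requiring rollback, hotfix or control intervention); the record keys and sample changes are hypothetical.

```python
from collections import defaultdict


def change_fail_rate(changes):
    """Failed changes divided by total changes, per artifact type."""
    totals = defaultdict(int)
    failed = defaultdict(int)
    for c in changes:
        totals[c["artifact"]] += 1
        if c["rollback"] or c["hotfix"] or c["control_intervention"]:
            failed[c["artifact"]] += 1
    return {artifact: failed[artifact] / totals[artifact] for artifact in totals}


# Hypothetical change log for illustration only.
changes = [
    {"artifact": "prompt", "rollback": False, "hotfix": False, "control_intervention": False},
    {"artifact": "prompt", "rollback": True, "hotfix": False, "control_intervention": False},
    {"artifact": "index", "rollback": False, "hotfix": False, "control_intervention": False},
    {"artifact": "policy", "rollback": False, "hotfix": False, "control_intervention": True},
]
print(change_fail_rate(changes))
```

Slicing by artifact type avoids averaging prompt, index and policy changes into a single number that hides where instability comes from.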

Q4: How do you run release and scale gates for high-risk financial AI?

30-second version:

For high-risk AI, pilot approval must not be treated as scale approval. I separate the intake, discovery, eval readiness, architecture, risk, release, adoption, benefits and scale gates, and each gate has its own evidence and an explicit go / limited go / no-go / rollback / stop decision.

2-minute version:

Take AML alert triage as an example. The intake gate must establish the business owner, process baseline and risk hypothesis. The discovery gate needs a workflow map, AI fit, data readiness and a no-AI alternative. The eval gate needs a golden set, failure taxonomy and citation rubric. The risk gate must confirm that AI does not close alerts automatically, does not make SAR decisions, is subject to human review and leaves traceable logs. The release gate needs rollback, monitoring and an evidence binder. The adoption gate looks at analyst acceptance, override and support load. The benefits gate looks at AHT, backlog, QA defect and finance treatment. Scale is approved only when value, risk, review capacity, platform capacity and unit economics all hold at the same time.
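The gate sequence in this answer can be modeled as an ordered list with an explicit decision per gate. This is a minimal sketch: the strict ordering, the rule that only "go" and "limited go" let a use case advance, and the name `next_open_gate` are assumptions for illustration.

```python
from enum import Enum
from typing import Dict, Optional

# Gate order as listed in the 30-second answer.
GATES = [
    "intake", "discovery", "eval readiness", "architecture",
    "risk", "release", "adoption", "benefits", "scale",
]


class Decision(Enum):
    GO = "go"
    LIMITED_GO = "limited go"
    NO_GO = "no-go"
    ROLLBACK = "rollback"
    STOP = "stop"


PASSING = {Decision.GO, Decision.LIMITED_GO}


def next_open_gate(decisions: Dict[str, Decision]) -> Optional[str]:
    """Return the first gate without a passing decision, or None if all gates passed."""
    for gate in GATES:
        if decisions.get(gate) not in PASSING:
            return gate
    return None


# A pilot that passed through release with a limited go has not yet earned scale approval.
pilot = {g: Decision.GO for g in GATES[:5]}
pilot["release"] = Decision.LIMITED_GO
print(next_open_gate(pilot))
```

Here the pilot still faces the adoption, benefits and scale gates, which is the point of the answer: pilot approval is not scale approval.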

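Q5: How do you explain the value of AI VSM to a CFO?

30-second version:

I translate AI VSM from process management into portfolio efficiency: faster detection of ineffective use cases, less late-stage rework, higher qualified value throughput, a shorter benefit recognition cycle and clearer risk-adjusted returns.

2-minute version:

A CFO does not need to see AI call volume; a CFO needs to see whether funding and capacity flow toward value that can be realized. I show the portfolio flow: how many ideas became effective pilots, how many were stopped in time, which platform capabilities lower the cost of onboarding later use cases and which live use cases produce finance-recognized benefit. On the benefit side I look at capacity redeployment, loss avoided, SLA improvement and cost per value event; on the cost side I include model, platform, QA, training, monitoring, governance and risk costs. That way the AI budget is not a pile of project requests but an operating system that keeps learning, amplifies effective investments and stops low-quality ones.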

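Q6: How do you show that platform runway is more than a technology cost?

30-second version:

The value of platform runway has to be shown through flow and risk economics: whether it shortens platform reuse lead time, eval design lead time, risk gate queue time and release recovery time, and whether it raises evidence reuse and qualified value throughput.

2-minute version:

If the model gateway, eval harness, RAG service, observability, policy-as-code, human review workflow and evidence binder are reported only as technical components, the CFO and the business will struggle to see their value. I map them to value stream bottlenecks: the gateway reduces model approval and vendor switching cost; the eval harness reduces release gate rework; the RAG service reduces duplicate indexing and citation errors; the evidence binder lowers the cost of reconstructing audit evidence; the review workflow addresses the SME capacity bottleneck. The KPI for platform investment is not how many models are connected, but whether more AI use cases pass the release and benefits gates faster, more stably and more auditably.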

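Q7: If an AI use case is live but adoption is low, what do you do?

30-second version:

I would not immediately conclude that the model failed. I look at release-to-adoption lead time, eligible exposure, workflow fit, trust, manager cadence, training, support load, override reason and incentive mismatch, and then decide whether to redesign the workflow, adjust the change plan, narrow the scope or stop.

2-minute version:

Low adoption usually has several possible causes: AI is not in the user's natural workflow, the output lacks citations, users do not trust it, managers have not built the new process into day-to-day management, or using AI adds QA burden. I measure adoption as a segment of the operations value stream: whether eligible users were exposed, whether they used it a first time, whether they used it repeatedly, whether the output was accepted or edited, and whether the workflow outcome improved after acceptance. If adoption is low but quality is good, the fix is likely product and change management; if adoption is low and overrides point to quality problems, go back to eval and RAG; if adoption stays low over time and the benefit does not hold, stop or narrow the scope.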
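The adoption segment described in this answer can be measured as a funnel, with step-to-step conversion showing where users drop off. The sketch below uses hypothetical step names and illustrative counts; it only locates the weakest step and does not diagnose the cause.

```python
from typing import Dict, List, Tuple

# Funnel steps following the order given in the answer.
FUNNEL = ["eligible", "exposed", "first_use", "repeat_use", "accepted_or_edited"]


def step_conversion(counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Conversion rate of each step relative to the previous step."""
    rates = []
    for prev, step in zip(FUNNEL, FUNNEL[1:]):
        base = counts[prev]
        rates.append((step, counts[step] / base if base else 0.0))
    return rates


def weakest_step(counts: Dict[str, int]) -> Tuple[str, float]:
    """Step with the lowest conversion from the step before it."""
    return min(step_conversion(counts), key=lambda pair: pair[1])


# Hypothetical user counts for illustration only.
counts = {
    "eligible": 200,
    "exposed": 180,
    "first_use": 150,
    "repeat_use": 60,
    "accepted_or_edited": 48,
}
print(weakest_step(counts))
```

In this illustrative data the drop is between first use and repeat use, which would point toward workflow fit, trust or manager cadence before any conclusion about model quality.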

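Q8: How does VSM help you with AI portfolio governance?

30-second version:

VSM turns the portfolio review from status reporting into funding and capacity decisions. We can see WIP, stage aging, risk tier, blocked work, platform dependency, benefits status and stop signals, and then decide to fund, scale, hold, stop, retire or invest in platform runway.

2-minute version:

The biggest problem in an AI portfolio is usually not too few ideas, but too many ideas competing for product, engineering, data, risk, SME and platform capacity. VSM shows which use cases are stuck on data readiness, which are stuck at the risk gate, which see no adoption movement after release and which already deliver benefit but are held back from scale by review capacity. The quarterly review then stops asking "how is each project progressing" and starts asking "which value stream deserves acceleration, which blockers need platform investment, which high-risk use cases should be limited and which low-value work should stop". That is the portfolio-to-platform-to-product governance loop.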

17. Portfolio Package

A senior-level AI VSM portfolio can include:

Artifact	Contents	Capability demonstrated
Executive one-pager	problem, goal, core value stream, metric stack, governance principle	executive communication
AI value stream map	current-state, target-state, AI intervention, bottleneck	architecture and process analysis
Flow metrics dashboard	portfolio, product, platform, risk, adoption and benefits layers	product operations metrics
DORA / SPACE integration	AI SDLC delivery and team health metrics	engineering productivity
Risk gate pack	intake, discovery, eval, release, adoption, benefits, scale gates	trustworthy AI governance
Blocked work taxonomy	blocker categories, metric, owner, response	flow improvement
Portfolio-to-platform trace	theme -> capability -> use case -> platform -> value event -> benefit	portfolio and architecture linkage
Benefits realization loop	baseline, exposure, quality, effect, cost, finance sign-off	value proof
Financial retail case	AML / customer service / credit / branch use case	industry application
Scale / stop memo	evidence, decision, conditions, stop triggers	mature governance judgment
Interview answer bank	answers to 6-8 advanced questions	interview communication

5-minute narrative structure:

0:00-0:40  Problem definition
AI success cannot be measured by features shipped or model calls.

0:40-1:30  Approach
I manage AI as value streams: idea, evidence, safe release, adoption, benefits and scale / stop.

1:30-2:30  Metrics
I combine Flow Metrics, DORA, SPACE, risk gates and qualified value events.

2:30-3:30  Financial retail case
Use AML or customer service to show delivery flow, operations flow, risk controls and benefits.

3:30-4:30  Portfolio and platform
Show how repeated blockers become platform runway and how funding decisions change.

4:30-5:00  Close
The goal is not more AI activity. The goal is faster, safer, reusable and finance-recognized AI value.

Final memory card:

AI VSM = portfolio flow + product flow + platform flow + risk flow + benefit flow.

Do not manage AI by feature count or model calls.
Manage the path from idea evidence to safe release,
from release to adoption,
from adoption to qualified value events,
from value events to benefits realization,
and from benefits to portfolio scale / stop decisions.