
AI Value Stream Management / Flow Metrics Playbook


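Audience: AI PMs, AI Product Operations Leads, AI Portfolio Leads, AI Product Architects, Enterprise Architects, Value Office Leads, and leaders of AI transformation in financial retail.

Goal: manage the flow of AI use cases from idea to safe release to adoption / value realization, and connect DORA / SPACE, Flow Metrics, risk gates, platform runway, portfolio funding and business value into one operable value stream management system.

Core view: AI Value Stream Management is not about drawing process diagrams. It is the continuous management of value flow, risk evidence, platform capability, organizational capacity and benefits realization.

Scope note: this document is learning, architecture design and work-portfolio material. It does not constitute legal, regulatory, audit, model validation, financial confirmation or vendor selection advice. For formal financial retail projects, the applicable requirements must be confirmed jointly by the business owner, risk, model risk, legal, compliance, privacy, security, finance, architecture, operations and internal audit.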


1. Source Anchors

These sources serve as anchors for terminology and method. Access dates are recorded as 2026-06-29.

| Anchor | Official / primary source | How this playbook uses it |
|---|---|---|
| APQC Process Classification Framework | https://www.apqc.org/process-frameworks | Uses PCF's process classification and process performance language to connect business capability, end-to-end process, benchmark, process owner and AI value stream. |
| Flow Framework | https://flowframework.org/ | Uses Value Stream Management, Flow Metrics and the business-technology common language to connect project-to-product, portfolio flow, product value stream and flow bottleneck. |
| DORA | https://dora.dev/ | Uses delivery performance, throughput, stability, lead time, deployment frequency, change failure, recovery and continuous improvement to connect to AI SDLC flow efficiency. |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize AI risk gates, evidence, monitoring, incidents and ongoing governance. |

Source-to-artifact mapping:

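| Source lens | Artifacts it can produce | Senior-level phrasing |
|---|---|---|
| APQC PCF | AI process-to-capability map, process owner map, benchmark baseline | "I first attach the AI use case to a business process and capability, so that requirements are not reverse-engineered from model capability." |
| Flow Framework | AI value stream map, flow metrics dashboard, blocked work taxonomy | "I use flow to manage how business value moves from idea to production and benefits, not just project status." |
| DORA | AI SDLC delivery dashboard, release stability scorecard, change recovery review | "AI delivery has to be both faster and more stable; demo speed must not mask release risk." |
| NIST AI RMF | AI risk gate checklist, evidence binder, monitoring and response loop | "Every flow stage carries risk evidence; it is not a one-time approval before launch." |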

2. One-Sentence Positioning

AI Value Stream Management puts AI opportunity, delivery work, risk evidence, platform capability, adoption behavior and realized business value into a single measurable value stream, so that the organization can decide to fund, accelerate, block, scale, hold, stop or retire.

Shorter interview version:

I manage AI use cases as value streams, not feature queues:
from idea evidence to safe release, adoption, risk-controlled operation and finance-recognized value.

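Management-facing version:

I do not judge AI success by "how many AI features shipped" or "how many model calls were made". I look at whether the AI has passed through the full value stream: whether the business problem is real, whether the data and risk evidence hold up, whether platform capability is reused, whether the release is safe, whether users adopt it, whether finance and the business owner recognize the business benefit, and whether risk stays stable after scale.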


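3. Purpose / Audience / Core View

3.1 Purpose

This document addresses five senior-level questions:

  1. How an AI use case flows from the idea funnel to production instead of stalling at a PoC or a dashboard.
  2. How to use Flow Metrics to find waiting, rework, risk blockage and benefit breakpoints in AI delivery and AI-enabled operations.
  3. How to connect the engineering productivity language of DORA / SPACE to portfolio funding, platform runway, risk gates and business outcomes.
  4. How to show that AI value is not activity, usage, demos or model scores, but adoption and benefits realization under controlled risk.
  5. How to turn financial retail processes such as AML, customer service, credit, branch, complaints and risk control into AI value streams that can be funded, governed and measured.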

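3.2 Audience

| Role | Question they care about | What this document provides |
|---|---|---|
| AI PM / AI Product Ops | How to manage an AI product beyond feature delivery, through to adoption and value realization | value stream map, dashboard, benefits loop |
| AI Portfolio Lead | How to compare multiple AI investment opportunities and control WIP | portfolio-to-product trace, flow load, evidence gates |
| Product Architect / Enterprise Architect | How to connect AI use cases to capability, platform, data and risk controls | capability map, platform runway, risk gate architecture |
| Engineering Productivity / Platform PM | Whether AI tools and platforms improve delivery flow and stability | DORA / SPACE / Flow metric integration |
| Value Office / Finance Partner | How AI benefits are quantified, attributed, realized and reviewed | benefits realization loop, portfolio evidence pack |
| Risk / Compliance / Model Risk | How AI risk runs through idea, pilot, release, scale and operate | NIST AI RMF-aligned gate evidence |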

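3.3 Core View

Mature AI VSM is not "process visualization"; it is four systems working as one:

| System | What it manages | Key question |
|---|---|---|
| Value flow system | idea -> evidence -> release -> adoption -> benefits | Is value flowing, or is it stuck at PoC, approval, integration or adoption? |
| Risk flow system | risk hypothesis -> control -> evidence -> monitoring -> incident learning | Does risk evidence flow together with value? |
| Platform flow system | reusable gateway, eval, RAG, observability, policy, review workflow | Does each use case leave behind platform capability, or create a one-off burden? |
| Funding flow system | capacity, budget, SME review, risk assurance, change management | Do funding and capacity flow to the AI bets most worth validating and scaling? |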

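4. Why AI Success Cannot Be Judged by Features or Call Volume Alone

Common ways AI products are reported:

  • Shipped 12 AI features.
  • LLM API calls reached 2 million.
  • Generated 500,000 summaries.
  • Model accuracy reached 92%.
  • User satisfaction survey scored 4.6/5.
  • Estimated 30% savings in manual time.

These are all signals, but they are not complete value evidence.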

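| Metric | What it shows | What it does not show | How VSM strengthens it |
|---|---|---|---|
| Feature count | The team produced visible features | Whether they solve a high-value process problem | Connect to a value stream stage and business outcome |
| API call / token count | The system is being called | Whether calls produce qualified business results | Replace with qualified value event throughput |
| Generated output count | The AI is active | Whether outputs are adopted, correct, compliant and auditable | Add acceptance, quality pass and risk guardrail |
| Model accuracy | Offline task performance | Whether results improve in the production workflow | Connect to workflow metrics, decision quality and online monitoring |
| User rating | Experience sentiment | Whether there is selection bias, risk transfer or short-term novelty | Track by cohort, risk tier and workflow step |
| Estimated time saved | Potential efficiency | Whether time is actually released, redeployed or recognized by finance | Build a benefits realization loop with finance sign-off |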

A mature AI value narrative:

Model capability
-> AI behavior evidence
-> workflow adoption
-> qualified value event
-> risk-adjusted business outcome
-> finance-recognized benefit
-> portfolio scale / stop decision

In one sentence:

The subject of an AI value stream is not the model, but the business process and decision flow that AI changes.

4.1 From Vanity Metrics to Flow Metrics

| Vanity metric | More mature flow / value metric |
|---|---|
| AI features shipped | use cases passing release gate and adoption gate |
| Prompts created | prompts with eval coverage, owner and production trace |
| Model calls | qualified AI-assisted workflow completions |
| Users activated | eligible users reaching repeat adoption in target workflow |
| Time saved estimate | finance-recognized capacity release or SLA improvement |
| PoCs completed | PoCs converted to release, platform capability, or stop decision |
| Accuracy score | risk-tiered eval pass plus online quality and guardrail stability |
| Platform integrations | production workflows reusing approved platform controls |

4.2 Defining the AI Value Event

An AI value event is not "the AI was called". It is a business event constrained by quality, risk, adoption and economics.

Qualified AI value event =
  eligible workflow item
  + AI exposure or AI-assisted action
  + user / system adoption signal
  + quality threshold pass
  + risk guardrail pass
  + auditable trace
  + unit economics within boundary
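
The definition above can be read as a conjunction of checks. The following is a minimal sketch under stated assumptions: the `WorkflowEvent` fields, the numeric quality score and the per-item cost cap are hypothetical names chosen for illustration, not a schema taken from any of the source frameworks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowEvent:
    """One AI-touched workflow item; all field names are illustrative."""
    eligible: bool            # eligible workflow item
    ai_exposed: bool          # AI exposure or AI-assisted action
    adopted: bool             # user / system adoption signal
    quality_score: float      # hypothetical QA or eval score in [0, 1]
    guardrail_passed: bool    # risk guardrail pass
    trace_id: Optional[str]   # auditable trace reference, None if missing
    cost: float               # AI cost attributed to this item


def is_qualified_value_event(event, quality_threshold, max_cost):
    """Return True only when every condition in the definition holds."""
    return (
        event.eligible
        and event.ai_exposed
        and event.adopted
        and event.quality_score >= quality_threshold
        and event.guardrail_passed
        and event.trace_id is not None
        and event.cost <= max_cost
    )


def qualified_throughput(events, quality_threshold, max_cost):
    """Count qualified value events in a batch, e.g. one reporting week."""
    return sum(
        is_qualified_value_event(e, quality_threshold, max_cost) for e in events
    )
```

Because the checks are combined with `and`, a high-usage item with a failed guardrail or a missing trace contributes nothing to throughput, which is the point of separating value events from raw call counts.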

Financial retail examples:

| Scenario | Immature event | Qualified AI value event |
|---|---|---|
| AML alert triage | generated summaries | analyst-reviewed case summaries that reduce investigation time with no critical evidence defect |
| Customer service copilot | answer generated | grounded answer accepted by agent, customer issue resolved, no reopen or policy breach |
| Credit memo assistant | draft created | underwriter-accepted memo section with policy citation pass and no prohibited recommendation |
| Branch knowledge assistant | employee question asked | cited policy answer accepted by employee and no escalation caused by stale knowledge |
| AI platform | applications connected | production AI workflows passing value, risk, reliability and cost gates through shared platform |

5. Value Stream Map: AI Delivery

The goal of the AI delivery value stream is to manage an AI idea from "idea" to "controlled production value", not to manage a task list.

5.1 End-to-End AI Delivery Flow

Strategic theme
  -> Idea intake
  -> Discovery evidence
  -> Value stream design
  -> Data / eval readiness
  -> Build and integration
  -> Risk and release gate
  -> Limited release
  -> Adoption and operating model
  -> Benefits realization
  -> Scale / hold / stop / retire

5.2 Stage-by-Stage Map

| Stage | Primary question | Key evidence | Flow metric | Gate decision |
|---|---|---|---|---|
| Strategic theme | Does this AI investment direction fit strategy and risk appetite? | portfolio thesis, capability map, process baseline | strategy-to-intake lead time | fund theme / narrow theme / park |
| Idea intake | Is it worth entering discovery? | problem owner, APQC process, baseline hypothesis, risk hypothesis | idea acceptance rate, intake queue age | accept / reject / merge |
| Discovery evidence | Is the problem real and suitable for AI? | workflow map, no-AI option, data readiness, AI fit | idea-to-evidence lead time | fund pilot / refine / stop |
| Value stream design | Which process step does AI enter, and which decision does it change? | current-state / target-state map, RACI, decision boundary | flow design cycle time | approve value stream / redesign |
| Data / eval readiness | Are data, knowledge, eval and telemetry available? | source owner, freshness, label quality, golden set, metric contract | data wait time, eval design lead time | build / data investment / stop |
| Build and integration | Can it be integrated safely into production systems and the platform? | architecture sketch, model gateway, RAG, tool permissions, tests | build flow time, review turnaround | release candidate / rework |
| Risk and release gate | Can it go live under control? | eval result, risk control, rollback, monitoring, audit trace | gate queue time, evidence aging | go / limited go / no-go |
| Limited release | Is it safe and effective in the real workflow? | cohort results, usage, quality, incident, cost | release-to-adoption lead time | continue / restrict / rollback |
| Adoption and operating model | Do users keep adopting it in the right workflow? | training, support, manager cadence, override analysis | adoption flow time, support load | scale / change plan / hold |
| Benefits realization | Are business benefits realized and recognized? | baseline, incremental effect, cost, risk adjustment, finance sign-off | benefit recognition cycle time | scale / hold / stop |
| Scale / stop / retire | Should it be expanded, restricted, stopped, or folded into platform capability? | realized benefits, risk trend, platform capacity, unit economics | scale cycle time, retire cycle time | scale / hold / retire / stop |

5.3 Current-State vs Target-State AI VSM

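The current-state map should expose the real blockers:

| Current-state symptom | Likely real blocker | Metrics to measure |
|---|---|---|
| PoC is fast, production is slow | eval, security, data access, integration and risk approval are deferred to late stages | PoC-to-release lead time, gate queue time |
| Business submits many AI ideas | no problem baseline or process owner | idea rejection reason, discovery WIP |
| Model performs well but is not used | workflow fit, trust, training or manager incentives do not match | adoption flow time, accepted output rate |
| Platform is built but use cases are still slow | platform capability does not cover data, eval, risk evidence or operations | platform reuse lead time, self-service completion |
| Value is unclear after launch | no baseline, telemetry, causal design or finance owner | benefit recognition cycle time, metric contract coverage |
| Approvals keep multiplying | risk evidence is not reusable, and gates rely only on manual meetings | evidence reuse rate, risk review capacity load |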

The target-state map should define three improvement directions:

  1. Reduce waiting: data access, risk review, SME review, eval queue, release approval.
  2. Reduce rework: unclear problem, weak eval, ambiguous decision boundary, poor telemetry.
  3. Increase value throughput: qualified value events, benefit realization, platform reuse.

6. Value Stream Map: AI-Enabled Operations

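AI-enabled operations is the business operations value stream after AI goes live. It answers one question: has AI actually changed day-to-day workflow, and does it create value within real risk boundaries?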

6.1 Generic AI-Enabled Operations Flow

Business event / case
  -> eligibility check
  -> context and knowledge retrieval
  -> AI draft / recommendation / action proposal
  -> human judgment or policy automation
  -> workflow action
  -> QA / sampling / exception handling
  -> customer / operations outcome
  -> monitoring and incident response
  -> benefits and learning feedback

6.2 Operations Flow Metrics

| Flow step | AI control point | Metric | Risk guardrail |
|---|---|---|---|
| Case eligibility | AI only touches approved workflow items | eligible case coverage, exclusion accuracy | high-risk case wrongly included |
| Context retrieval | AI uses permitted and fresh sources | retrieval success, citation correctness, knowledge freshness | unauthorized data access, stale policy |
| AI output | Output fits task and boundary | groundedness, format validity, answerability, tool call correctness | hallucination, unsupported claim, overconfident answer |
| Human judgment | Accountability remains clear | acceptance rate, override rate, review time | blind acceptance, review fatigue |
| Workflow action | Action improves flow | cycle time, touches per case, queue age, rework | wrong action, customer harm |
| QA / exception | Quality feedback is captured | QA defect rate, sample coverage, failure taxonomy closure | critical defect not escalated |
| Customer / ops outcome | Business result improves | FCR, AHT, backlog age, loss avoided, complaint rate | unfair outcome, policy breach |
| Monitoring | Drift and incidents are visible | incident detection time, cost drift, model / data drift | unmonitored degradation |
| Learning feedback | Evidence changes product and platform | eval case added, runbook updated, control improved | repeated failure |

6.3 Delivery Flow vs Operations Flow

| Dimension | AI delivery value stream | AI-enabled operations value stream |
|---|---|---|
| Primary object | AI change from idea to release | Business work item from intake to outcome |
| Main bottleneck | discovery, data, eval, risk gate, integration | adoption, trust, exception handling, QA, operating capacity |
| Main metrics | lead time, WIP, blocked time, gate queue, release stability | qualified value events, cycle time, quality, guardrail, benefit |
| Main owner | AI PM, product architect, engineering, platform | business ops owner, product owner, risk owner |
| Scale risk | shipping unsafe or low-value capabilities | amplifying wrong workflow behavior at scale |
| Feedback loop | production telemetry updates SDLC gates | operational outcomes update product, policy, eval and training |

Senior-level judgment:

The delivery VSM proves that AI can be released safely; the operations VSM proves that AI keeps creating value in real business workflows.


7. Flow Metrics and AI-Specific Extensions

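The value of Flow Metrics is that they treat the value stream as a system rather than looking at a single team's output. AI scenarios need to extend them to data, eval, risk, platform, adoption and benefits realization.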

7.1 Core Flow Metrics

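| Metric | AI VSM interpretation | Typical question |
|---|---|---|
| Flow time | Total time for an AI work item from entering the value stream to completing a given value stage | Why does idea to release take 6 months? |
| Flow velocity | Number of AI value items completed per unit of time | How many use cases pass the release gate or adoption gate each month? |
| Flow efficiency | Active work time / total flow time | Is time spent on build, or stuck in data, review and approval? |
| Flow load | Number of AI work items in progress | Is portfolio WIP so high that everything slows down? |
| Flow distribution | Proportion of work item types | Is it all features, with no risk work, platform runway, debt or defects? |
| Flow predictability | Whether flow time / delivery outcome is stable | Are release and benefits predictable? |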
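
As a rough illustration of how the first five metrics in the table could be computed from work item records, here is a minimal sketch. The `WorkItem` fields, the use of calendar days, and the choice to measure efficiency only over completed items are assumptions for this example; real tooling would define these through its own metric contract.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical AI work item record."""
    work_type: str            # e.g. "feature", "risk", "platform"
    started: date             # date the item entered the value stream
    finished: Optional[date]  # None while the item is still in progress
    active_days: int          # days of hands-on work, excluding waiting


def flow_summary(items, period_start, period_end):
    """Summarize flow metrics for items completed within the period."""
    done = [
        i for i in items
        if i.finished is not None and period_start <= i.finished <= period_end
    ]
    total_flow_days = sum((i.finished - i.started).days for i in done)
    in_progress = [
        i for i in items
        if i.started <= period_end
        and (i.finished is None or i.finished > period_end)
    ]
    return {
        "flow_velocity": len(done),
        "avg_flow_time_days": total_flow_days / len(done) if done else None,
        "flow_efficiency": (
            sum(i.active_days for i in done) / total_flow_days
            if total_flow_days else None
        ),
        "flow_load": len(in_progress),
        "flow_distribution": dict(Counter(i.work_type for i in done)),
    }
```

A low `flow_efficiency` alongside a high `flow_load` is the pattern the table warns about: most elapsed time is waiting, and too many items are open at once.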

7.2 AI-Specific Flow Metrics

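| Metric | Definition | Owner | Decision it supports |
|---|---|---|---|
| Idea-to-evidence lead time | Time from idea intake until discovery evidence is ready to support a decision | AI PM / BA | Whether discovery capacity is sufficient |
| Evidence conversion rate | Share of ideas entering discovery that convert into a pilot, platform investment or stop decision | Portfolio owner | Quality of the idea funnel |
| Data readiness wait time | Time a work item waits for a data owner, access, quality, lineage or retention decision | Data owner / Architect | Whether data product investment is needed |
| Eval design lead time | Time from requirement confirmation until the eval contract is runnable | EvalOps / AI PM | Whether eval is done up front |
| Risk gate queue time | Time a work item waits for risk, privacy, security, model risk or legal review | Risk governance owner | Whether gates are supported by evidence automation |
| Evidence aging | Age of key gate evidence relative to the latest production configuration or policy version | Product architect / Risk | Whether revalidation is needed |
| Platform reuse lead time | Time from a use case needing a capability until it is onboarded through the shared platform | Platform PM | Whether the platform actually lowers onboarding cost |
| Human review capacity load | Queueing and utilization of SME, risk and QA reviewers | Ops / Risk | Whether review capacity becomes the scale bottleneck |
| Release-to-adoption lead time | Time from limited release until the target cohort reaches repeat adoption | Product Ops / Business ops | Whether change management is effective |
| Qualified value throughput | AI value events passing quality and risk thresholds per unit of time | Product / Value Office | Whether AI produces real business value |
| Benefit recognition cycle time | Time from launch until finance / business owner confirms benefits | Value Office / Finance | Whether benefits realization is too slow |
| Risk-adjusted flow velocity | Value event throughput adjusted for risk, quality and cost | Portfolio owner | Which use cases are worth scaling |
| Stop decision latency | Time from a stop signal appearing until a formal stop or restriction | Portfolio governance | Whether weak investments are stopped promptly |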
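
Evidence aging is the least conventional metric in this table, so a small sketch may help. It takes one possible reading of the definition: evidence is considered aged once a production configuration or policy change lands after it was produced, and its age is the gap to the most recent such change. The function names and the `max_age_days` threshold are hypothetical.

```python
from datetime import date


def evidence_aging_days(evidence_date, change_dates):
    """Days between the evidence and the latest change made after it.

    Returns 0 when no configuration or policy change postdates the evidence.
    """
    later_changes = [d for d in change_dates if d > evidence_date]
    if not later_changes:
        return 0
    return (max(later_changes) - evidence_date).days


def needs_revalidation(evidence_date, change_dates, max_age_days):
    """Flag evidence whose aging exceeds a hypothetical tolerance."""
    return evidence_aging_days(evidence_date, change_dates) > max_age_days
```

Under this reading, a gate pack that was signed off before several prompt or index changes would surface as aged even if it was produced recently in calendar terms.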

7.3 DORA Extensions for AI Flow

| DORA lens | AI extension | Interpretation |
|---|---|---|
| Change lead time | intent-to-production lead time, spec-to-eval lead time, eval-to-release lead time | AI does not start at commit; it starts from business intent and evaluable requirements |
| Deployment frequency | risk-tiered release frequency for code, prompt, model, data, RAG index, policy, tool schema | Frequency must be interpreted per artifact type and risk tier |
| Change fail rate | AI changes causing rollback, hotfix, eval regression, policy breach, customer impact or manual remediation | Includes behavioral failures, not only service crashes |
| Failed deployment recovery time | time to disable AI path, rollback prompt / index / model, switch to fallback or human-only mode | AI recovery often requires process and operations to recover together |
| Deployment rework rate | unplanned releases caused by AI incident, eval miss, data drift or control failure | Measures whether gates shift problems left |
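
One way to make the extended change fail rate concrete is to count a change as failed when any behavioral failure signal follows it, and to report the rate per artifact type and risk tier. This is a sketch under assumptions: the signal labels, the `Change` record and the grouping key are illustrative names, not DORA definitions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Failure signals taken from the table row above; the labels are illustrative.
FAILURE_SIGNALS = frozenset({
    "rollback", "hotfix", "eval_regression",
    "policy_breach", "customer_impact", "manual_remediation",
})


@dataclass(frozen=True)
class Change:
    """Hypothetical record of one released AI artifact change."""
    artifact_type: str   # "code", "prompt", "model", "data", "rag_index", ...
    risk_tier: str       # e.g. "low", "high"
    outcomes: frozenset  # signals observed after the release


def change_fail_rate(changes):
    """Return the failure rate keyed by (artifact_type, risk_tier)."""
    totals = defaultdict(int)
    failed = defaultdict(int)
    for change in changes:
        key = (change.artifact_type, change.risk_tier)
        totals[key] += 1
        # Any overlap with the failure signals marks the change as failed.
        if change.outcomes & FAILURE_SIGNALS:
            failed[key] += 1
    return {key: failed[key] / totals[key] for key in totals}
```

Grouping by artifact type keeps a stable code pipeline from hiding a fragile prompt or index release path.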

7.4 SPACE Signals for AI VSM

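SPACE should not be used to rank individuals. It is used to explain whether the people and collaboration in the AI value stream are healthy.

| SPACE dimension | AI VSM signals | Why it matters |
|---|---|---|
| Satisfaction and well-being | reviewer load, SME fatigue, trust in AI outputs, change fatigue | The value stream cannot buy speed by burning out experts and frontline staff |
| Performance | quality-passed outcomes, risk-adjusted value, customer / ops result | Rising activity does not equal improved performance |
| Activity | accepted AI-assisted work, reviewed outputs, evidence updates | Activity serves only as diagnostic input |
| Communication and collaboration | handoff clarity, decision log quality, cross-owner response time | AI use cases span product, data, risk, ops and engineering |
| Efficiency and flow | blocked time, context switching, review turnaround, work item aging | The most common losses for AI teams come from waiting and rework |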

7.5 Flow Distribution for AI Portfolio

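If the AI portfolio is packed only with features, production gets slower and slower. A suggested split is six work item types:

| Work type | Description | Healthy portfolio signal |
|---|---|---|
| Value feature | AI capability that directly improves a business process | Tied to a portfolio theme and process baseline |
| Risk and assurance | eval, red-team, policy, privacy, security, model risk, audit evidence | High-risk use cases must come with sufficient capacity |
| Platform runway | model gateway, RAG, eval, observability, policy-as-code, review workflow | Reusable by multiple use cases |
| Data / knowledge product | data quality, knowledge ownership, taxonomy, lineage, freshness | Resolves recurring data readiness blockers |
| Adoption / change | training, manager cadence, support, process redesign, role redesign | Adoption does not happen by itself after release |
| Operational debt | legacy bot retirement, duplicate prompt cleanup, stale index remediation, runbook gaps | Keeps the AI landscape from becoming ungovernable |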

8. Connecting Flow, DORA / SPACE, Risk Gates and Business Value

A mature AI VSM dashboard should not be a single-layer report, but a multi-layer operating system.

Portfolio thesis
  -> value stream and process baseline
  -> delivery flow metrics
  -> DORA / SPACE signals
  -> risk gate evidence
  -> operational adoption metrics
  -> qualified value events
  -> benefits realization
  -> portfolio funding and scale / stop decisions

8.1 Metric Stack

| Layer | Key metrics | Decisions |
|---|---|---|
| Portfolio | WIP by stage, risk tier distribution, flow load, evidence conversion, scale / stop ratio | fund, hold, stop, allocate capacity |
| Product value stream | idea-to-evidence, release-to-adoption, qualified value throughput, benefit recognition | prioritize, redesign workflow, scale |
| Engineering / platform | DORA metrics, platform reuse lead time, self-service success, incident recovery | invest in platform runway, reduce bottleneck |
| Team / collaboration | SPACE signals, review load, SME fatigue, blocked time, handoff age | adjust WIP, staff review capacity, change cadence |
| Risk / governance | gate queue, evidence aging, critical failure, control coverage, incident trend | approve, limit, rollback, increase assurance |
| Business value | AHT, FCR, loss avoided, backlog age, complaint, finance-recognized benefit | scale, redeploy capacity, change funding |

8.2 Operating Review Cadence

| Cadence | Participants | Focus | Decisions |
|---|---|---|---|
| Daily / twice-weekly flow clearing | AI PM, tech lead, data owner, risk liaison, ops lead | blocked work, aging items, review queue, failed eval | unblock, split work, route owner, reduce WIP |
| Weekly value stream review | Product, engineering, platform, risk, ops, data | flow metrics, DORA trend, eval status, adoption signal | select one bottleneck and one improvement bet |
| Biweekly release / adoption review | Product Ops, business owner, risk, support, QA | limited release, training, support tickets, override, guardrail | continue, restrict, rollback, expand cohort |
| Monthly benefits review | Value Office, finance, product, business owner, risk | baseline, incremental effect, cost, risk adjustment | recognize benefit, revise model, stop weak investment |
| Quarterly portfolio review | Executive sponsor, portfolio owner, CTO / CPO, risk, finance | stage distribution, capacity allocation, platform runway, scale / stop | fund, scale, retire, increase assurance, shift capacity |

8.3 Metric Tension Pairs

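AI VSM must look at positive metrics and their counter-metrics together to prevent local optimization.

| Positive metric | Tension metric | Why it matters |
|---|---|---|
| Flow velocity | change fail rate, critical failure rate | Speed must not sacrifice safety |
| Deployment frequency | risk-tiered approval and incident trend | High-frequency release must be interpreted in context |
| Adoption rate | override quality, blind acceptance signal | Adoption may come from over-trust |
| AHT reduction | rework, complaint, QA defect | Speed must not lower quality |
| Platform reuse | platform wait time, custom exception rate | Platform reuse must not become a new bottleneck |
| Qualified value throughput | benefit recognition and risk-adjusted value | Event counts must convert into benefits |
| SME review coverage | reviewer fatigue and queue age | Risk control cannot rest on an unsustainable manual workload |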
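
A dashboard can enforce the pairing mechanically by refusing to report a positive metric without checking its tension metrics. The sketch below assumes period-over-period deltas where a positive delta on a tension metric means it got worse; the metric keys and the subset of pairs shown are illustrative, and tension metrics where a decrease is the bad direction would need an inverted sign before being passed in.

```python
# Subset of the pairs from the table; keys are hypothetical metric names.
TENSION_PAIRS = {
    "flow_velocity": ["change_fail_rate", "critical_failure_rate"],
    "adoption_rate": ["blind_acceptance_signal"],
    "aht_reduction": ["rework_rate", "complaint_rate", "qa_defect_rate"],
    "platform_reuse": ["platform_wait_time", "custom_exception_rate"],
    "sme_review_coverage": ["reviewer_fatigue", "review_queue_age"],
}


def tension_warnings(deltas, tolerance=0.0):
    """Return (positive, tension) pairs where both moved upward.

    `deltas` maps a metric name to its change versus the previous period.
    Missing metrics are treated as unchanged.
    """
    warnings = []
    for positive, tensions in TENSION_PAIRS.items():
        if deltas.get(positive, 0) <= 0:
            continue  # no improvement claimed, so nothing to challenge
        for tension in tensions:
            if deltas.get(tension, 0) > tolerance:
                warnings.append((positive, tension))
    return warnings
```

For example, a period with higher flow velocity and a higher change fail rate yields one warning for that pair, which would be raised in the weekly value stream review rather than celebrated as a speed gain.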

9. Portfolio-to-Platform-to-Product Trace

The advanced capability of AI VSM is tracing a use case from the portfolio thesis all the way to product telemetry and the benefit ledger.

9.1 Trace Chain

Portfolio theme
  -> business capability / APQC process
  -> end-to-end value stream
  -> AI use case hypothesis
  -> platform capability dependency
  -> product workflow intervention
  -> release gate evidence
  -> adoption telemetry
  -> qualified value event
  -> benefits realization
  -> scale / stop decision

9.2 Trace Table

| Trace element | Example | Owner | Evidence |
|---|---|---|---|
| Portfolio theme | Regulated operations intelligence | Portfolio owner | investment thesis, capacity allocation |
| Business capability | Financial crime operations, customer servicing, credit operations | Enterprise architect / business owner | capability map, process baseline |
| Value stream | Alert intake to investigation closure | Business ops owner | current-state and target-state VSM |
| Use case hypothesis | AI summary and checklist reduce low-risk alert handling time without quality loss | AI PM | discovery brief, no-AI alternative |
| Platform dependency | RAG, model gateway, eval harness, trace logging, human review queue | Platform PM | platform service contract |
| Product intervention | Analyst sees cited summary and next-step checklist in case workflow | Product owner | UX / workflow spec, decision boundary |
| Release evidence | eval pass, SME review, privacy approval, rollback, monitoring | Risk / release owner | evidence binder |
| Adoption telemetry | weekly active analysts, accepted summaries, edits, overrides | Product Ops | telemetry contract |
| Value event | quality-approved AI-assisted investigations completed within SLA | Value Office | metric contract, dashboard |
| Benefit | AHT reduction, backlog age reduction, QA defect stability | Finance / business owner | benefits register |
| Decision | scale to second queue, hold for SME load, or stop | Portfolio governance | scale / stop memo |

AI VSM should show whether a use case creates platform leverage.

| Platform capability | Flow bottleneck it removes | Metric |
|---|---|---|
| Model gateway | repeated model approval, cost logging, vendor switch friction | model access lead time, policy compliance coverage |
| Eval harness | manual release evidence, inconsistent thresholds | eval design lead time, regression coverage |
| RAG / knowledge service | duplicate indexes, stale knowledge, weak citation | retrieval integration lead time, freshness SLA |
| Observability and trace | value and incident evidence missing | trace coverage, incident detection time |
| Policy-as-code | slow manual risk review | gate automation rate, policy exception aging |
| Human review workflow | SME review queue hidden and unmanaged | review queue age, review capacity load |
| Evidence binder | audit response depends on manual reconstruction | evidence completeness, audit response effort |

9.4 Portfolio Funding Implication

The funding question changes from:

Should we fund this AI feature?

to:

Should we fund this value stream improvement,
including product workflow, platform runway, risk assurance,
adoption support and benefits measurement?

Funding buckets:

| Bucket | Why it matters for flow | Example investment |
|---|---|---|
| Use case delivery | creates near-term business value | AML triage assistant limited release |
| Platform runway | reduces future flow time and risk cost | shared eval, RAG, audit trace |
| Risk assurance | prevents late gate blockage and unsafe scale | red-team, policy controls, model risk evidence |
| Data / knowledge readiness | removes repeated upstream blockers | policy knowledge ownership, case label quality |
| Adoption and change | converts release into operational value | training, manager cadence, support playbook |
| Benefits measurement | turns activity into finance-recognized value | baseline, telemetry, causal design, benefit register |

10. Risk Gates and Business Value Gates

AI VSM treats gates as flow control points. A good gate can block, limit, route, accelerate or retire work based on evidence.

10.1 Gate Architecture

| Gate | Purpose | Evidence | Decision |
|---|---|---|---|
| Intake gate | prevent weak AI ideas from entering delivery | business problem, process owner, baseline hypothesis, risk hypothesis | accept / reject / merge / park |
| Discovery gate | prove the problem is worth pilot capacity | workflow map, AI fit, no-AI option, data readiness, risk tier | fund pilot / refine / stop |
| Eval readiness gate | make quality measurable before build completion | eval contract, golden set, failure taxonomy, metric owner | build / revise eval / data work |
| Architecture gate | avoid one-off design and unsafe integration | target architecture, platform dependency, data boundary, rollback sketch | approve / redesign / use platform |
| Risk gate | ensure control path fits risk tier | privacy, security, model risk, compliance, human oversight, monitoring | go / limited go / no-go |
| Release gate | validate production readiness | release bundle, versions, SLO, incident route, evidence binder | release / conditional release / rollback |
| Adoption gate | prove real users changed workflow | cohort usage, acceptance, override, support load, training | scale / improve change plan / restrict |
| Benefits gate | prove value is real and recognized | baseline, incremental effect, cost, risk adjustment, finance sign-off | scale / hold / stop |
| Scale / retire gate | decide where capacity goes next | risk trend, platform capacity, unit economics, adoption durability | scale / hold / retire / stop |

10.2 Gate Principles

  1. Gate evidence should be created during work, not reconstructed for meetings.
  2. High-risk use cases need earlier gates, not only stricter release gates.
  3. Gate queue time is a flow metric. If it grows, the operating system is under-designed.
  4. A gate without a stop or limit decision is just paperwork.
  5. A gate should distinguish "not ready yet" from "not worth doing" from "convert this into platform or data investment".
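
Principles 4 and 5 imply that a gate returns one of several distinct outcomes rather than a pass / fail flag. The sketch below shows one way to encode that distinction. The three inputs (`value_case_holds`, `missing_evidence`, `blocker_is_shared`) are hypothetical simplifications of what a real gate review would weigh.

```python
from enum import Enum


class GateDecision(Enum):
    GO = "go"
    NOT_READY = "not ready yet"
    NOT_WORTH_DOING = "not worth doing"
    CONVERT = "convert into platform or data investment"


def decide(value_case_holds, missing_evidence, blocker_is_shared):
    """Return a gate decision from three illustrative inputs.

    value_case_holds: whether the business case still stands.
    missing_evidence: list of evidence items the gate still lacks.
    blocker_is_shared: whether the gap also blocks other use cases.
    """
    if not value_case_holds:
        return GateDecision.NOT_WORTH_DOING
    if not missing_evidence:
        return GateDecision.GO
    if blocker_is_shared:
        # A gap that blocks several use cases is funded once, as runway.
        return GateDecision.CONVERT
    return GateDecision.NOT_READY
```

The ordering is a design choice: the value case is checked first so that a weak idea is stopped rather than parked as "not ready" indefinitely.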

10.3 AI Risk Gate Checklist

| Risk dimension | Gate question | Example evidence |
|---|---|---|
| Customer harm | Could AI mislead, deny, delay, overcharge, expose or unfairly treat a customer | customer impact assessment, complaint trigger |
| Decision authority | Is AI search, draft, recommend, triage, decision support or automated action | AI authority statement, human oversight RACI |
| Data and privacy | Are data rights, retention, PII handling and access boundaries clear | data inventory, privacy review, access log |
| Model behavior | Are hallucination, unsupported claim, toxicity, calibration and answerability controlled | eval results, failure taxonomy |
| Security | Are prompt injection, tool abuse, secret exposure and supply chain risks addressed | security test, tool permission matrix |
| Compliance | Are regulated statements, advice, KYC / AML / credit policies controlled | policy tests, compliance sign-off |
| Operations | Can users handle exceptions, overrides, appeals and support load | runbook, support model, QA sampling plan |
| Reliability | Can the system fail safely and recover quickly | fallback, feature flag, rollback drill |
| Auditability | Can decisions and outputs be explained and reconstructed | trace, versions, citations, approval records |
| Benefits integrity | Can value claims be measured and attributed | metric contract, baseline, benefits register |

11. Financial Retail Case: Regulated Operations Intelligence

11.1 Context

A financial retail institution wants to use AI across AML operations, customer servicing and branch knowledge. The executive goal is not "launch more AI", but:

Reduce regulated operations workload,
improve decision quality,
shorten customer and employee waiting time,
and build reusable AI platform controls
without increasing compliance, privacy, fairness or operational risk.

11.2 Portfolio Theme and Value Streams

| Portfolio theme | Business capability | Value stream | Candidate AI intervention |
|---|---|---|---|
| Regulated operations intelligence | Financial crime operations | Alert intake -> investigation -> QA -> closure / escalation | AML alert triage assistant |
| Regulated operations intelligence | Customer servicing | Contact intake -> intent -> answer -> resolution -> QA | Customer service policy copilot |
| Regulated operations intelligence | Branch operations | Employee query -> policy retrieval -> customer conversation -> follow-up | Branch knowledge assistant |
| Regulated operations intelligence | Credit operations | Document intake -> policy check -> memo -> underwriting review | Credit memo assistant |

11.3 Current-State Observations

| Observation | Flow implication | Evidence to collect |
|---|---|---|
| AML analysts spend significant time assembling context before judgment | high wait and search time inside operations flow | time study, case system log, analyst interview |
| Customer agents search multiple knowledge bases during calls | high context switching and inconsistent answers | desktop telemetry, transfer reason, QA defects |
| Branch staff ask support teams repetitive policy questions | hidden demand and slow knowledge flow | support ticket volume, policy query categories |
| Credit memo quality varies by underwriter and source documents | high rework and policy citation gaps | memo QA, exception reason, cycle time |
| AI PoCs depend on different RAG and logging patterns | platform fragmentation | architecture inventory, reuse gap |

11.4 Target Value Stream: AML Alert Triage Assistant

Alert created
  -> eligibility check
  -> retrieve case history, transaction context and policy guidance
  -> AI produces cited summary and investigation checklist
  -> analyst reviews, edits and decides next action
  -> QA samples output and decision evidence
  -> case closed or escalated
  -> telemetry updates eval set, risk controls and benefits register

11.5 Metrics for the AML Stream

| Metric layer | Metric | Baseline / target logic |
|---|---|---|
| Flow | alert-to-first-action time, case cycle time, queue age | reduce waiting and context assembly time |
| Adoption | eligible analyst weekly active usage, summary acceptance, edit distance, override reason | prove analyst workflow fit |
| AI quality | citation correctness, unsupported claim rate, missing key fact rate, checklist relevance | release and regression evidence |
| Risk guardrail | critical evidence defect, wrong policy reference, unauthorized data retrieval, SAR escalation error | hard stop or scale restriction |
| DORA / platform | prompt / index change lead time, eval-to-release time, rollback time, change fail rate | govern AI artifact changes |
| SPACE | analyst trust pulse, SME review load, review queue age, change fatigue | ensure control system is sustainable |
| Business value | AHT reduction, backlog age, QA defect stability, capacity redeployment | benefits realization |
| Unit economics | AI cost per case, QA cost per case, platform support cost | scale economics |

11.6 90-Day Decision Example

| Evidence | Result | Interpretation |
|---|---|---|
| Summary acceptance | 72% of eligible pilot alerts | adoption signal positive |
| Case handling time | median handling time down 12% in pilot queue | value signal positive |
| QA defect | no increase in critical defects | quality guardrail stable |
| SME review load | review queue age grew from 1 day to 4 days | scale bottleneck |
| Citation correctness | 96% pass on QA sample | release evidence acceptable |
| Cost per case | within approved pilot envelope | economics acceptable for limited scale |
| Benefit recognition | finance accepts capacity release only for low-risk queue | benefits are partial but credible |

Decision:

Hold broad scale.
Scale only to a second low-risk queue after human review workflow is improved.
Fund platform runway for reusable SME sampling, evidence binder and eval case management.
Do not expand to high-risk typologies until critical failure taxonomy and review capacity are stronger.
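
The reasoning behind this decision can be sketched as a small rule: positive value and adoption signals do not justify broad scale while a capacity guardrail is breached. The 3-day review queue limit mirrors the pilot threshold in the canvas template in section 12.1; the dictionary keys and the "restrict" branch are assumptions added for illustration, not rules stated in the case.

```python
def scale_decision(evidence, max_review_queue_days=3):
    """Map pilot evidence to a coarse scale decision (illustrative rules)."""
    # Hard guardrails: a quality or economics failure restricts the pilot.
    if evidence["critical_defect_increase"] or not evidence["cost_within_envelope"]:
        return "restrict"
    # Positive signals cannot scale past a review capacity bottleneck.
    if evidence["review_queue_age_days"] > max_review_queue_days:
        return "hold broad scale"
    return "scale"


# Values taken from the 90-day evidence table above.
pilot = {
    "critical_defect_increase": False,
    "cost_within_envelope": True,
    "review_queue_age_days": 4,
}
print(scale_decision(pilot))  # prints "hold broad scale"
```

Encoding the rule this way makes the bottleneck explicit: once the review queue age returns to the threshold, the same evidence would support scaling to the second low-risk queue.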

11.7 Portfolio Flow Snapshot

| Stage | Use cases | Governance focus |
|---|---|---|
| Idea | collections contact strategy | risk and fairness pathway unclear, keep outside delivery WIP |
| Discovery | credit memo assistant, complaint classification | data readiness, policy boundaries, baseline |
| Pilot | AML alert triage assistant | eval, SME review, limited cohort, benefit signal |
| Release | customer service policy copilot | monitoring, training, wrong-policy guardrail |
| Scale | branch knowledge assistant | knowledge freshness, platform reuse, support deflection |
| Retire | legacy FAQ bot | migrate content to governed knowledge service |

11.8 What This Case Shows in an Interview

| Signal | What it demonstrates |
|---|---|
| AI use case tied to APQC-like process and business capability | enterprise architecture maturity |
| Delivery VSM plus operations VSM | ability to connect build, release, adoption and business outcomes |
| Flow Metrics plus DORA / SPACE | advanced product operations and engineering productivity thinking |
| Risk gate and benefit gate | regulated AI governance maturity |
| Platform runway investment | architecture leverage and portfolio economics |
| Scale held due to SME bottleneck | mature decision discipline, not AI hype |

12. Templates

12.1 AI Value Stream Canvas

# AI Value Stream Canvas

Value stream name: AML alert triage to investigation closure
Portfolio theme: Regulated operations intelligence
Business capability: Financial crime operations
Business owner: Head of AML Operations
Product owner: AI Operations Product Lead
Risk owner: Financial Crime Risk
Platform owner: AI Platform Lead
Review date: 2026-06-29

## 1. Business outcome
Reduce low-risk alert handling time and backlog age while maintaining investigation quality and auditability.

## 2. Current-state flow
Alert created -> analyst searches customer and transaction context -> analyst checks policy and typology notes -> analyst drafts case narrative -> QA sample -> closure or escalation.

## 3. Current bottlenecks
- Context assembly consumes analyst time.
- Policy search is inconsistent across teams.
- QA defects often relate to missing evidence or weak narrative support.
- Review capacity is limited for pilot expansion.

## 4. AI intervention
AI retrieves approved case context and policy guidance, drafts a cited summary, and recommends an investigation checklist.
AI does not close alerts, file SARs, downgrade risk, or make final compliance decisions.

## 5. Target-state flow
Alert created -> eligibility check -> approved context retrieval -> AI cited summary and checklist -> analyst review and decision -> QA sample -> telemetry feedback to eval and benefits register.

## 6. Flow metrics
| Metric | Baseline | Target / decision rule |
|---|---:|---|
| Idea-to-evidence lead time | 18 business days | <= 15 business days for similar future use cases |
| Release-to-adoption lead time | 6 weeks | repeat adoption by >= 65% eligible analysts within 6 weeks |
| Qualified value throughput | 0 | quality-approved assisted alerts per week after limited release |
| SME review queue age | 1 day | must not exceed 3 days during pilot |

## 7. Risk gates
| Gate | Evidence |
|---|---|
| Eval readiness | 500 historical case golden set, critical failure taxonomy, citation rubric |
| Release | rollback path, trace logging, human approval, QA sample plan |
| Scale | no critical evidence defect, stable QA, review capacity within threshold |

## 8. Benefits realization
Recognize benefit only for eligible low-risk alerts where AI was exposed, analyst accepted or materially used the output, QA passed, and case handling time improved versus baseline or control.

## 9. Scale / stop rule
Scale to another low-risk queue if AHT improves >= 10%, QA critical defects remain 0, citation pass >= 95%, review queue age <= 3 days, and cost per case stays within approved ceiling.
Stop expansion if any critical privacy, evidence or regulated decision boundary breach occurs.
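
The scale / stop rule in section 9 of the canvas can be written as an explicit check so that every review applies the same thresholds. The following is a minimal sketch, not a prescribed implementation: the thresholds are copied from the canvas, while `PilotSnapshot`, its field names and the `hold` outcome for unmet conditions are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotSnapshot:
    """Hypothetical review-period snapshot; field names are illustrative."""
    aht_improvement: float         # fractional AHT improvement vs baseline, e.g. 0.12
    qa_critical_defects: int
    citation_pass_rate: float      # between 0 and 1
    review_queue_age_days: float
    cost_per_case: float
    cost_ceiling: float            # approved ceiling, same unit as cost_per_case
    boundary_breach: bool          # critical privacy, evidence or regulated-decision breach


def scale_decision(s: PilotSnapshot) -> tuple[str, list[str]]:
    """Return ("stop" | "scale" | "hold", reasons) using the canvas thresholds."""
    if s.boundary_breach:
        return "stop", ["critical boundary breach"]
    failed = []
    if s.aht_improvement < 0.10:
        failed.append("AHT improvement below 10%")
    if s.qa_critical_defects != 0:
        failed.append("QA critical defects present")
    if s.citation_pass_rate < 0.95:
        failed.append("citation pass below 95%")
    if s.review_queue_age_days > 3:
        failed.append("review queue age above 3 days")
    if s.cost_per_case > s.cost_ceiling:
        failed.append("cost per case above ceiling")
    return ("scale", []) if not failed else ("hold", failed)


# Illustrative values only; cost figures are placeholders.
snapshot = PilotSnapshot(
    aht_improvement=0.12,
    qa_critical_defects=0,
    citation_pass_rate=0.96,
    review_queue_age_days=4,
    cost_per_case=1.8,
    cost_ceiling=2.0,
    boundary_breach=False,
)
print(scale_decision(snapshot))
```

Returning the list of failed conditions, rather than a bare boolean, keeps the decision explainable in the scale / stop memo.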

12.2 Flow Metrics Dashboard

# AI Flow Metrics Dashboard

Audience: AI portfolio review, product operations, platform leadership, risk governance, Value Office
Cadence: weekly for flow, monthly for benefits, quarterly for portfolio funding

## Executive summary
| Decision area | Metric | Current signal | Decision |
|---|---|---|---|
| Portfolio WIP | active use cases by stage and risk tier | Pilot WIP exceeds review capacity | freeze new high-risk pilots |
| Value flow | qualified value throughput | customer service copilot improving | prepare scale memo |
| Bottleneck | SME review queue age | AML review queue aging | fund review workflow automation |
| Risk | critical guardrail breaches | none in limited release | continue release with monitoring |
| Benefits | finance-recognized benefit | partial recognition for service queue | expand causal measurement |

## Portfolio flow
| Metric | Definition | Slice |
|---|---|---|
| Flow load | active AI work items in idea, discovery, pilot, release, scale | stage, risk tier, business capability |
| Flow distribution | percent of work in feature, platform, risk, data, adoption, debt | portfolio theme |
| Evidence conversion | percent of discovery items converted to pilot, platform investment or stop decision | business unit |
| Stop decision latency | days from stop signal to decision | owner, risk tier |

## Product value stream
| Metric | Definition | Slice |
|---|---|---|
| Idea-to-evidence lead time | intake accepted to discovery evidence ready | use case, capability |
| Release-to-adoption lead time | limited release to repeat adoption threshold | cohort, role |
| Qualified value throughput | quality and risk-passed AI value events per week | workflow, risk tier |
| Benefit recognition cycle time | production release to finance-recognized benefit | business unit |

## Engineering and platform
| Metric | Definition | Slice |
|---|---|---|
| AI change lead time | spec / eval / build / release time for AI artifacts | code, prompt, model, index, policy |
| Change fail rate | AI changes requiring rollback, hotfix or control intervention | service, risk tier |
| Platform reuse lead time | request to production use of shared platform capability | capability |
| Eval queue time | eval submitted to gate decision | use case, test suite |

## Risk and operations
| Metric | Definition | Slice |
|---|---|---|
| Evidence aging | days since evidence matched current production config | evidence type |
| Critical failure rate | no-go failures per evaluated workflow item | risk category |
| Human review load | reviewer capacity used and queue age | SME, risk, QA |
| Incident recovery time | detection to fallback / rollback / stable operation | incident type |

## Adoption and SPACE
| Metric | Definition | Slice |
|---|---|---|
| Repeat adoption | eligible users using AI in target workflow over repeated periods | role, team |
| Override reason distribution | why users reject or edit AI output | workflow step |
| Trust pulse | user trust score with free-text reason coding | cohort |
| Review fatigue signal | reviewer load, after-hours review, queue stress | reviewer group |
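
Three of the portfolio flow metrics in the dashboard (flow load, flow distribution and evidence conversion) reduce to simple counts over a work-item register. The sketch below assumes a hypothetical register of dictionaries with `stage` and `work_type` keys; the record layout, the work-type labels and the outcome labels are illustrative, not part of the playbook.

```python
from collections import Counter

# Stages counted as active flow load, per the dashboard definition.
ACTIVE_STAGES = ("idea", "discovery", "pilot", "release", "scale")
# Discovery outcomes that count as converted evidence.
CONVERTED = ("pilot", "platform_investment", "stop")


def flow_load(items):
    """Number of active work items per stage (retired items are ignored)."""
    return dict(Counter(i["stage"] for i in items if i["stage"] in ACTIVE_STAGES))


def flow_distribution(items):
    """Percent of active work items per work type."""
    active = [i for i in items if i["stage"] in ACTIVE_STAGES]
    if not active:
        return {}
    counts = Counter(i["work_type"] for i in active)
    return {k: round(100 * v / len(active), 1) for k, v in counts.items()}


def evidence_conversion(discovery_outcomes):
    """Percent of discovery items that ended in a pilot, platform investment or stop decision."""
    if not discovery_outcomes:
        return 0.0
    converted = sum(1 for o in discovery_outcomes if o in CONVERTED)
    return round(100 * converted / len(discovery_outcomes), 1)


# Illustrative records only; stage and work-type assignments are assumed.
items = [
    {"name": "credit memo assistant", "stage": "discovery", "work_type": "feature"},
    {"name": "AML alert triage assistant", "stage": "pilot", "work_type": "feature"},
    {"name": "evidence binder", "stage": "pilot", "work_type": "platform"},
    {"name": "review workflow", "stage": "discovery", "work_type": "risk"},
    {"name": "legacy FAQ bot", "stage": "retire", "work_type": "debt"},
]
print(flow_load(items))
print(flow_distribution(items))
print(evidence_conversion(["pilot", "stop", "open", "open"]))
```

Counting a stop decision as a conversion mirrors the dashboard definition: a timely stop is evidence that discovery did its job.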

12.3 Blocked Work Taxonomy

| Block category | Symptom | Metric | Owner | Response |
|---|---|---|---|---|
| Business problem unclear | idea keeps changing, no baseline | discovery rework count | business owner / AI PM | rewrite problem statement, freeze baseline hypothesis |
| Process owner missing | no one owns workflow change | intake aging | portfolio owner | require process owner before discovery |
| Data access / rights | waiting for permissions or data retention decision | data readiness wait time | data owner / privacy | create data decision record and approved access path |
| Data quality / labels | eval or pilot cannot trust labels | label defect rate, golden set aging | data product owner | fund data product or reduce scope |
| Knowledge freshness | RAG answers cite stale policy | freshness breach count | knowledge owner | assign content owner and freshness SLA |
| Eval ambiguity | teams debate quality after build | eval design lead time, eval rework | EvalOps / AI PM | define eval contract before release candidate |
| Risk review queue | gate waits on manual review | risk gate queue time | risk governance owner | add evidence checklist, risk liaison, pre-review |
| Security / privacy boundary | unclear PII, tool or prompt injection exposure | security exception age | security / privacy | reduce tool scope, add controls, retest |
| Architecture fit | one-off solution bypasses platform | platform exception rate | architect / platform PM | route through shared capability or approve bounded exception |
| Human review capacity | SME queue grows during pilot | review queue age, review utilization | ops / risk | cap WIP, adjust sampling, fund workflow |
| Adoption friction | users ignore AI after release | release-to-adoption lead time | Product Ops / business ops | redesign workflow, training, manager cadence |
| Telemetry gap | value cannot be measured | metric contract coverage | analytics owner | instrument assignment, exposure, action, outcome |
| Finance recognition | benefit claim not accepted | benefit recognition cycle time | Value Office / finance | agree baseline, effect method, cost treatment |
| Vendor / procurement | model or tool contract delays release | vendor wait time | procurement / platform | use approved gateway or define exit path |

12.4 Benefits Realization Loop

Baseline
  -> eligibility and exposure logging
  -> adoption and action telemetry
  -> quality and risk qualification
  -> incremental effect estimate
  -> cost and risk adjustment
  -> finance / business sign-off
  -> capacity redeployment or value capture
  -> post-scale audit
  -> portfolio scale / stop decision

Template:

# AI Benefits Realization Loop

Use case: Customer service policy copilot
Business owner: Contact Center Operations
Finance owner: FP&A partner
Risk owner: Customer Conduct Risk
Review period: 2026 Q3 month 2

## Baseline
Eligible contacts: 620,000 per month
Baseline AHT: P50 7.8 minutes
Baseline reopen rate: 11.2%
Baseline complaint rate: 0.42%
Baseline QA policy defect rate: 3.1%

## Exposure and adoption
Treatment cohort: servicing queues A and B
Eligible exposed contacts: 184,000
Accepted AI answer: 118,000
Repeat adoption: 68% of eligible agents

## Quality and guardrail
Citation QA pass: 96.5%
Critical wrong policy answer: 0
PII leakage: 0
Reopen rate: no credible increase versus control
Complaint rate: stable within approved threshold

## Incremental effect
Cluster rollout estimate: AHT reduction of 0.6 minutes per accepted exposed resolved contact.
Effect recognized only for contacts passing quality and guardrail filters.

## Cost and risk adjustment
Costs included: model, retrieval, platform support, QA sampling, training, monitoring.
Risk adjustment: high-risk intents excluded from recognition until separate gate.

## Realized benefit
Recognized benefit: capacity redeployed to peak-hour backlog and training coverage.
Finance treatment: limited-scale capacity benefit accepted for Q3 operating review.

## Next decision
Scale to two additional low-risk servicing queues.
Hold complaints, hardship and vulnerable customer intents until policy and escalation controls pass separate gate.
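
The funnel and effect figures in this template can be sanity-checked with a few lines of arithmetic. This is a sketch under a stated assumption: the template does not give the number of accepted contacts that were also resolved and passed the quality and guardrail filters, so the calculation uses all accepted contacts and therefore yields an upper bound on released capacity, not the finance-recognized figure. The helper names are hypothetical.

```python
def funnel_rate(numerator: int, denominator: int) -> float:
    """Share of the denominator that reached the next funnel step."""
    return numerator / denominator if denominator else 0.0


def capacity_hours(qualified_contacts: int, minutes_saved_per_contact: float) -> float:
    """Capacity released, in hours, for contacts that passed all filters."""
    return qualified_contacts * minutes_saved_per_contact / 60


# Figures taken from the template above.
eligible, exposed, accepted = 620_000, 184_000, 118_000
effect_minutes = 0.6  # AHT reduction per accepted exposed resolved contact

# Assumption: every accepted contact was resolved and passed quality and
# guardrail filters, so this is an upper bound, not the recognized benefit.
upper_bound_hours = capacity_hours(accepted, effect_minutes)

print(f"exposure rate:   {funnel_rate(exposed, eligible):.1%}")
print(f"acceptance rate: {funnel_rate(accepted, exposed):.1%}")
print(f"upper-bound capacity released: {upper_bound_hours:,.0f} hours")
```

On these inputs the exposure rate is about 29.7%, the acceptance rate about 64.1%, and the upper bound is 118,000 x 0.6 / 60 = 1,180 hours for the review period. The recognized benefit will be lower once the resolved, quality and guardrail filters are applied.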

12.5 Portfolio Evidence Pack

# AI Portfolio Evidence Pack

Portfolio theme: Regulated operations intelligence
Quarter: 2026 Q3
Decision meeting: Quarterly AI portfolio review

## 1. Portfolio thesis
Invest in AI capabilities that reduce regulated operations workload, improve quality, reuse approved platform controls, and produce measurable value evidence within one quarter.

## 2. Stage distribution
| Stage | Items | Risk profile | Decision need |
|---|---|---|---|
| Idea | collections contact strategy | high fairness and conduct risk | park until risk pathway matures |
| Discovery | credit memo assistant | high credit policy risk | continue data readiness work |
| Pilot | AML alert triage assistant | high AML evidence risk | limited scale only if review load is controlled |
| Release | customer service policy copilot | medium conduct risk | scale low-risk intents |
| Scale | branch knowledge assistant | low / medium policy risk | expand through knowledge platform |
| Retire | legacy FAQ bot | audit and content duplication risk | migrate and shut down |

## 3. Flow metrics
| Metric | Portfolio signal | Decision |
|---|---|---|
| Flow load | high-risk pilots exceed SME capacity | freeze new high-risk pilot starts |
| Gate queue time | privacy and model risk review stable, SME review aging | fund review workflow |
| Platform reuse | RAG and trace reused by 3 use cases | continue platform runway |
| Benefit recognition | 2 of 4 production use cases have finance-recognized benefit | strengthen telemetry for others |

## 4. Risk evidence
- No critical PII leakage in production AI workflows.
- Two policy citation defects added to eval regression set.
- One tool permission exception closed before release.
- SME review capacity is the primary scale constraint.

## 5. Funding recommendation
- Continue customer service copilot scale for low-risk intents.
- Hold AML broad scale until review queue and sampling workflow improve.
- Convert repeated RAG and evidence needs into platform runway funding.
- Stop independent FAQ bot maintenance and migrate content.

12.6 Scale / Stop Memo

# AI Scale / Stop Memo

Use case: AML alert triage assistant
Current stage: Limited release
Decision requested: Limited scale to second low-risk queue
Decision date: 2026-06-29

## Value evidence
- Median handling time decreased 12% in pilot queue.
- Backlog age decreased 9% for eligible low-risk alerts.
- Analyst repeat adoption reached 72%.
- Summary acceptance reached 76% after week 4.

## Quality and risk evidence
- Critical evidence defect: 0.
- Citation correctness: 96%.
- Unsupported claim rate: below approved threshold.
- AI did not close alerts, file SARs or downgrade risk.

## Flow evidence
- Eval-to-release lead time: 9 business days.
- Risk gate queue time: stable.
- SME review queue age increased to 4 days, above target.

## Unit economics
- AI cost per case inside approved limited-release ceiling.
- QA cost increased due to manual sampling.
- Platform trace and RAG components reused by customer service copilot.

## Decision
Limited scale to one additional low-risk queue.
Condition: review queue age must return to <= 3 days before further expansion.
Fund shared review workflow and evidence binder as platform runway.
Do not expand to high-risk typologies until review capacity, critical failure taxonomy and escalation controls pass scale gate.

## Stop triggers
- Any confirmed critical privacy breach.
- Any AI-attributable alert closure or SAR escalation boundary breach.
- Two consecutive monthly reviews with no handling-time or backlog benefit.
- SME review queue age above 5 business days for two consecutive review cycles.
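
The stop triggers above mix single-event conditions with "two consecutive" conditions, which are easy to misread in a review meeting. A minimal sketch of how they might be evaluated follows. It assumes a hypothetical list of review records, oldest first, that serves both the monthly review and the review cycle; the field names are illustrative, and queue age is assumed to be recorded in business days.

```python
def trailing_run(flags):
    """Length of the unbroken run of True values at the end of the sequence."""
    run = 0
    for flag in reversed(flags):
        if not flag:
            break
        run += 1
    return run


def fired_stop_triggers(reviews):
    """Return the stop triggers fired by a list of review records, oldest first."""
    fired = []
    if any(r["critical_privacy_breach"] for r in reviews):
        fired.append("critical privacy breach")
    if any(r["decision_boundary_breach"] for r in reviews):
        fired.append("alert closure or SAR boundary breach")
    if trailing_run([not r["benefit_observed"] for r in reviews]) >= 2:
        fired.append("no benefit for two consecutive reviews")
    if trailing_run([r["review_queue_age_days"] > 5 for r in reviews]) >= 2:
        fired.append("review queue age above 5 days for two consecutive cycles")
    return fired


# Illustrative history: queue age breaches the limit in the last two cycles.
history = [
    {"critical_privacy_breach": False, "decision_boundary_breach": False,
     "benefit_observed": True, "review_queue_age_days": 4},
    {"critical_privacy_breach": False, "decision_boundary_breach": False,
     "benefit_observed": True, "review_queue_age_days": 6},
    {"critical_privacy_breach": False, "decision_boundary_breach": False,
     "benefit_observed": True, "review_queue_age_days": 7},
]
print(fired_stop_triggers(history))
```

Only the most recent unbroken run counts toward the consecutive triggers, so one healthy review resets the counter.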

13. Review Checklists

13.1 AI VSM Design Checklist

  • Is the AI use case tied to a named business capability and end-to-end process?
  • Is there a current-state and target-state value stream, not only a feature description?
  • Are flow metrics defined from idea to release and from release to value realization?
  • Are delivery flow and operations flow separated but connected?
  • Are DORA metrics adapted to code, prompt, model, data, index, policy and tool changes?
  • Are SPACE signals used at team / system level, not for individual ranking?
  • Are WIP limits and flow load visible by risk tier and value stream stage?
  • Is blocked work classified with owner and response path?
  • Are risk gates integrated into flow rather than attached at the end?
  • Is benefits realization designed before scale?

13.2 Flow Metrics Review

  • Does flow time include waiting, review, gate queue and adoption delay?
  • Is flow efficiency exposing delay and handoff, not blaming teams?
  • Is flow velocity adjusted for risk, quality and qualified value events?
  • Is flow distribution balanced across feature, platform, risk, data, adoption and debt?
  • Is flow predictability interpreted by risk tier and service context?
  • Are stop decisions tracked as healthy portfolio outcomes?
  • Are metrics paired with tension indicators to prevent gaming?
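
The first two checks in this list can be made concrete with a small calculation. The sketch below takes flow efficiency to be active time divided by total flow time, where flow time includes every wait segment. The segment names, durations and the `flow_summary` helper are illustrative assumptions, not measured data.

```python
def flow_summary(segments):
    """Summarize a work item's flow.

    segments: list of (name, kind, days), where kind is "active" or "wait".
    """
    total = sum(days for _, _, days in segments)
    active = sum(days for _, kind, days in segments if kind == "active")
    waits = [(name, days) for name, kind, days in segments if kind == "wait"]
    longest_wait = max(waits, key=lambda w: w[1])[0] if waits else None
    return {
        "flow_time_days": total,
        "flow_efficiency": active / total if total else 0.0,
        "longest_wait": longest_wait,
    }


# Illustrative segments for one AI use case, in business days.
segments = [
    ("build and eval design", "active", 5),
    ("eval queue", "wait", 3),
    ("risk gate queue", "wait", 6),
    ("release", "active", 1),
    ("adoption delay", "wait", 5),
]
print(flow_summary(segments))
```

Reporting the longest wait alongside the ratio points the review at the handoff or queue to fix rather than at the team doing the active work.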

13.3 Risk and Gate Review

  • Does each gate have a clear go, limited go, no-go, rollback or stop decision?
  • Are privacy, security, compliance, model risk and operational risk addressed before release?
  • Is AI authority explicit: search, draft, recommend, triage, decision support or automated action?
  • Is human accountability visible in workflow and telemetry?
  • Are eval results connected to production monitoring and incident learning?
  • Are critical failures added to regression, policy or runbook updates?
  • Is evidence aging tracked after model, prompt, data, index or policy changes?

13.4 Benefits Realization Review

  • Is there a pre-release baseline?
  • Are assignment, exposure, adoption, action, outcome and guardrail logged?
  • Is value counted only for quality-passed and risk-passed events?
  • Are model, platform, QA, training, monitoring, governance and risk costs included?
  • Is finance treatment explicit: cost reduction, capacity redeployment, loss avoided, SLA improvement or revenue protection?
  • Is post-scale audit planned to detect benefit decay?
  • Is the scale decision based on realized value and risk stability, not only release success?

14. Anti-Patterns

| Anti-pattern | Why it is dangerous | Better practice |
|---|---|---|
| Counting AI features as value | encourages shipping without workflow impact | count qualified value events and benefits |
| Counting calls or tokens as adoption | usage can be waste, rework or curiosity | measure eligible workflow adoption and accepted action |
| Treating PoC completion as success | PoC may avoid production controls | track PoC-to-release and PoC-to-stop conversion |
| Running risk gates only at the end | late rejection creates rework and political pressure | use risk tiering and evidence gates from intake |
| Building every use case as custom | portfolio flow slows and audit burden grows | fund platform runway for repeated capabilities |
| Averaging high and low risk use cases | masks risk and flow differences | slice by risk tier, process, channel and artifact type |
| Using DORA as a ranking tool | damages collaboration and distorts behavior | use DORA for team / service improvement |
| Ignoring SPACE signals | reviewers, SMEs and users become hidden bottlenecks | monitor review load, trust, fatigue and handoff quality |
| Skipping no-AI alternatives | AI becomes solution-first theater | compare rules, UI, workflow redesign, training and automation |
| Claiming time saved without release path | finance cannot recognize vague savings | define capacity redeployment or cost treatment |
| Scaling before adoption is stable | production cost grows without realized value | add adoption gate before scale |
| Treating stop as failure | weak use cases consume scarce capacity | make stop, retire and merge normal portfolio decisions |

15. 30-Day Training Plan

Goal: within 30 days, produce a portfolio-ready AI Value Stream Management / Flow Metrics evidence pack.

| Day | Theme | Output |
|---:|---|---|
| 1 | Select financial retail AI value stream | use case brief and business capability map |
| 2 | Map current-state process | current-state AI VSM |
| 3 | Define target-state AI intervention | target-state VSM and decision boundary |
| 4 | Identify process owner, risk owner, platform owner | RACI |
| 5 | Define problem baseline | baseline metric table |
| 6 | Define qualified AI value event | value event contract |
| 7 | Review week 1 | one-page value stream narrative |
| 8 | Build delivery flow map | idea-to-release flow |
| 9 | Build operations flow map | business event-to-outcome flow |
| 10 | Define core Flow Metrics | flow time, velocity, efficiency, load, distribution |
| 11 | Extend metrics for AI | data, eval, risk, platform, adoption, benefits |
| 12 | Map DORA metrics to AI artifacts | DORA extension sheet |
| 13 | Map SPACE signals | reviewer, SME, user and team flow signals |
| 14 | Review week 2 | metrics stack summary |
| 15 | Design risk gates | intake, discovery, eval, release, adoption, benefits |
| 16 | Design blocked work taxonomy | blocker table with owner and response |
| 17 | Design flow dashboard | executive, portfolio, product, platform, risk views |
| 18 | Design benefits realization loop | baseline to finance sign-off |
| 19 | Build portfolio-to-platform-to-product trace | traceability table |
| 20 | Define platform runway linkage | platform capability map |
| 21 | Review week 3 | operating model narrative |
| 22 | Write AML case | case flow and metrics |
| 23 | Write customer service case | value event and dashboard |
| 24 | Write credit ops case | risk gate and benefits model |
| 25 | Write branch knowledge case | knowledge freshness and platform reuse |
| 26 | Build scale / stop memo | decision memo |
| 27 | Write interview answers | 6 advanced answers |
| 28 | Assemble portfolio pack | artifacts list and story |
| 29 | Self-review against checklist | gap fixes |
| 30 | Final executive narrative | 5-minute storyline |

Completion criteria:

  • One current-state and one target-state AI value stream map.
  • Flow Metrics dashboard with portfolio, product, platform, risk, adoption and benefits layers.
  • DORA / SPACE integration without vanity metrics.
  • Risk gates and benefits gates connected to scale / stop decisions.
  • At least one financial retail case with qualified value event and evidence pack.
  • Clear portfolio-to-platform-to-product trace.

16. Interview Answers

Q1: How do you use VSM to manage AI use cases?

30-second version:

I treat each AI use case as a value stream item and manage it end to end: from idea through discovery, data / eval readiness, risk gate, limited release, adoption, benefits realization and scale / stop. On metrics, I do not look only at features shipped and call volume; I look at flow time, blocked time, gate queue, qualified value events, risk guardrails and finance-recognized benefits.

2-minute version:

I first tie the use case to a business capability and an end-to-end process, for example AML alert triage or customer servicing. Then I draw the current-state and target-state value streams and make explicit which step AI enters, which decision it changes, and who carries accountability. Next I define the Flow Metrics: idea-to-evidence lead time, data readiness wait time, eval design lead time, risk gate queue time, release-to-adoption lead time, qualified value throughput and benefit recognition cycle time. On the engineering side I use DORA to track lead time, release frequency, change fail rate and recovery for AI artifact changes; on the team side I use SPACE to track review load, SME fatigue, trust and collaboration. Finally, the risk gate and benefits gate decide whether to scale, hold, stop or convert the work into a platform investment.

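Q2: Why can't AI call volume prove AI value?

30-second version:

Call volume only shows that AI was used or triggered by a system; it does not show that AI produced a qualified business outcome. In financial retail I look at the qualified AI value event instead: the target workflow item is eligible, AI was exposed, the user adopted the output, quality passed, the risk threshold passed, the event is auditable, and the unit economics hold.

2-minute version:

For example, rising call volume for a customer service copilot may mean the knowledge base is hard to use, agents are asking repeatedly, answers are inaccurate, or users are simply curious. A real value event looks like this: for an eligible contact, AI gives a cited answer, the agent accepts it or edits it reasonably, the customer's issue is resolved on first contact, there is no reopen within 7 days, there is no wrong policy answer, PII leakage or rise in complaints, and cost per resolved contact stays within bounds. Only such events enter benefits realization. Otherwise API calls and tokens consumed are just activity, and may even represent cost and risk.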
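
The gap between activity and value in this answer can be shown as a predicate over per-contact telemetry. The following is a minimal sketch: `ContactEvent` and its field names are hypothetical, and the complaint-rate condition is left out because it is a cohort-level guardrail rather than a per-contact flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactEvent:
    """Hypothetical per-contact telemetry record; field names are illustrative."""
    eligible: bool
    ai_called: bool
    cited_answer: bool
    accepted_or_edited: bool
    resolved_first_contact: bool
    reopened_within_7_days: bool
    wrong_policy_answer: bool
    pii_leakage: bool
    cost: float


def is_qualified_value_event(e: ContactEvent, cost_ceiling: float) -> bool:
    """True only when every eligibility, adoption, quality and cost condition holds."""
    return (
        e.eligible and e.ai_called and e.cited_answer and e.accepted_or_edited
        and e.resolved_first_contact and not e.reopened_within_7_days
        and not e.wrong_policy_answer and not e.pii_leakage
        and e.cost <= cost_ceiling
    )


def summarize(events, cost_ceiling):
    """Contrast raw AI calls with qualified value events."""
    return {
        "ai_calls": sum(e.ai_called for e in events),
        "qualified_value_events": sum(
            is_qualified_value_event(e, cost_ceiling) for e in events
        ),
    }


# Illustrative events: three calls, only one of which qualifies.
good = ContactEvent(True, True, True, True, True, False, False, False, 0.4)
reopened = ContactEvent(True, True, True, True, True, True, False, False, 0.4)
ignored = ContactEvent(True, True, True, False, True, False, False, False, 0.4)
print(summarize([good, reopened, ignored], cost_ceiling=0.5))
```

Reporting both counts side by side makes it visible when call volume grows while qualified value events stay flat.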

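Q3: How do Flow Metrics, DORA and SPACE fit together?

30-second version:

Flow Metrics show whether the value stream flows smoothly, DORA shows whether the delivery system is getting faster and more stable, and SPACE shows whether people and collaboration are healthy. AI scenarios add eval, risk gate, platform reuse and benefits realization on top.

2-minute version:

Flow Metrics apply to the portfolio and the product value stream: flow time, velocity, efficiency, load, distribution and predictability. DORA applies to the AI SDLC: change lead time, deployment frequency, change fail rate, recovery and rework for prompt, model, index, policy, tool schema and code. SPACE explains why flow improves or degrades, for example reviewer load, SME fatigue, context switching, trust and collaboration. All three connect to business value: only when delivery is faster and more stable, teams are not crushed by review and governance, and adoption and qualified value events improve after release can we say the AI operating system actually works.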

Q4: How do you run release and scale gates for high-risk financial AI?

30-second version:

High-risk AI cannot treat pilot approval as scale approval. I separate intake, discovery, eval readiness, architecture, risk, release, adoption, benefits and scale gates, and each gate has its own evidence and an explicit go / limited go / no-go / rollback / stop decision.

2-minute version:

Take AML alert triage. The intake gate must establish the business owner, process baseline and risk hypothesis. The discovery gate needs a workflow map, AI fit, data readiness and a no-AI alternative. The eval gate needs a golden set, failure taxonomy and citation rubric. The risk gate must confirm that AI does not auto-close alerts or make SAR decisions, that a human reviews the output, and that logs are traceable. The release gate needs rollback, monitoring and an evidence binder. The adoption gate looks at analyst acceptance, overrides and support load. The benefits gate looks at AHT, backlog, QA defects and finance treatment. Scale is approved only when value, risk, review capacity, platform capacity and unit economics all hold at the same time.
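
The gate sequence in this answer can be modelled as an ordered list with one recorded decision per gate, which makes the point that a release approval does not imply a scale approval. This is a sketch under assumptions: the gate and decision names follow the answer, while the `first_open_gate` and `scale_approved` helpers and the choice to treat "limited go" as passing for sequencing purposes are illustrative.

```python
from typing import Optional

GATES = ["intake", "discovery", "eval readiness", "architecture", "risk",
         "release", "adoption", "benefits", "scale"]
DECISIONS = {"go", "limited go", "no-go", "rollback", "stop"}
# Assumption: "limited go" lets the item move on, within its limited scope.
PASSING = {"go", "limited go"}


def first_open_gate(decisions: dict) -> Optional[str]:
    """Return the earliest gate without a passing decision, or None if all passed."""
    for gate in GATES:
        decision = decisions.get(gate)
        if decision is not None and decision not in DECISIONS:
            raise ValueError(f"unknown decision for {gate}: {decision}")
        if decision not in PASSING:
            return gate
    return None


def scale_approved(decisions: dict) -> bool:
    """Scale is approved only when every gate, including scale itself, has passed."""
    return first_open_gate(decisions) is None


# Illustrative record: released to a limited cohort, adoption gate not yet decided.
aml_decisions = {
    "intake": "go",
    "discovery": "go",
    "eval readiness": "go",
    "architecture": "go",
    "risk": "limited go",
    "release": "limited go",
}
print(first_open_gate(aml_decisions))
print(scale_approved(aml_decisions))
```

Keeping decisions per gate, rather than a single status field, also records which evidence step the use case is waiting on.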

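Q5: How do you explain the value of AI VSM to a CFO?

30-second version:

I translate AI VSM from process management into portfolio efficiency: finding ineffective use cases sooner, less late-stage rework, higher qualified value throughput, a shorter benefit recognition cycle, and clearer risk-adjusted returns.

2-minute version:

A CFO does not need to see AI call volume; a CFO needs to see whether funding and capacity flow toward value that can be realized. I show the portfolio flow: how many ideas became effective pilots, how many were stopped in time, which platform capabilities lower the onboarding cost of later use cases, and which released use cases produce finance-recognized benefit. The benefit side covers capacity redeployment, loss avoided, SLA improvement and cost per value event; the cost side includes model, platform, QA, training, monitoring, governance and risk costs. The AI budget then stops being a pile of project requests and becomes an operating system that keeps learning, amplifies effective investments and stops low-quality ones.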

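Q6: How do you prove that platform runway is more than a technology cost?

30-second version:

The value of platform runway has to be shown through flow and risk economics: whether it shortens platform reuse lead time, eval design lead time, risk gate queue time and release recovery time, and whether it raises evidence reuse and qualified value throughput.

2-minute version:

If the model gateway, eval harness, RAG service, observability, policy-as-code, human review workflow and evidence binder are reported only as technical components, the CFO and the business struggle to see their value. I map them to value stream bottlenecks: the gateway reduces model approval and vendor switching costs; the eval harness reduces release gate rework; the RAG service reduces duplicate indexing and citation errors; the evidence binder lowers the cost of reconstructing audit evidence; the review workflow relieves the SME capacity bottleneck. The KPI for platform investment is not how many models are connected, but whether more AI use cases pass the release and benefits gates faster, more stably and more auditably.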

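Q7: If an AI use case is released but adoption is low, what do you do?

30-second version:

I do not immediately conclude that the model failed. I look at release-to-adoption lead time, eligible exposure, workflow fit, trust, manager cadence, training, support load, override reasons and incentive mismatch, and then decide whether to redesign the workflow, adjust the change plan, narrow the scope or stop.

2-minute version:

Low adoption usually has several causes: AI is not in the user's natural workflow, outputs lack citations, users do not trust it, managers have not built the new process into daily management, or using AI adds QA burden. I measure adoption as a segment of the operations value stream: whether eligible users were exposed, whether they used it a first time, whether they use it repeatedly, whether outputs are accepted or edited, and whether the workflow outcome improves after acceptance. If adoption is low but quality is good, the fix is likely product and change management; if adoption is low and overrides point to quality problems, go back to eval and RAG; if adoption stays low and the benefit case does not hold, stop or narrow the scope.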

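Q8: How does VSM help with AI portfolio governance?

30-second version:

VSM turns the portfolio review from status reporting into funding and capacity decisions. We can see WIP, stage aging, risk tier, blocked work, platform dependency, benefits status and stop signals, and then decide to fund, scale, hold, stop, retire or invest in platform runway.

2-minute version:

The biggest problem in an AI portfolio is usually not a shortage of ideas but too many ideas competing for product, engineering, data, risk, SME and platform capacity. VSM shows which use cases are stuck at data readiness, which are stuck at the risk gate, which show no adoption movement after release, and which already deliver benefit but are held back from scale by review capacity. The quarterly review then stops asking "how is each project progressing" and asks "which value stream deserves acceleration, which blockers need platform investment, which high-risk use cases should be constrained, and which low-value work should stop". That is the portfolio-to-platform-to-product governance loop.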


17. Portfolio Package

A senior-level AI VSM portfolio can include:

| Artifact | Contents | Capability demonstrated |
|---|---|---|
| Executive one-pager | problem, goal, core value stream, metric stack, governance principle | executive communication |
| AI value stream map | current-state, target-state, AI intervention, bottleneck | architecture and process analysis |
| Flow metrics dashboard | portfolio, product, platform, risk, adoption and benefits layers | product operations metrics |
| DORA / SPACE integration | AI SDLC delivery and team health metrics | engineering productivity |
| Risk gate pack | intake, discovery, eval, release, adoption, benefits, scale gates | trustworthy AI governance |
| Blocked work taxonomy | blocker categories, metric, owner, response | flow improvement |
| Portfolio-to-platform trace | theme -> capability -> use case -> platform -> value event -> benefit | connecting portfolio and architecture |
| Benefits realization loop | baseline, exposure, quality, effect, cost, finance sign-off | value proof |
| Financial retail case | AML / customer service / credit / branch use case | industry application |
| Scale / stop memo | evidence, decision, conditions, stop triggers | mature governance judgment |
| Interview answer bank | answers to 6-8 advanced questions | interview communication |

5-minute narrative structure:

0:00-0:40  Problem definition
AI success cannot be measured by features shipped or model calls.

0:40-1:30  Method
I manage AI as value streams: idea, evidence, safe release, adoption, benefits and scale / stop.

1:30-2:30  Metrics
I combine Flow Metrics, DORA, SPACE, risk gates and qualified value events.

2:30-3:30  Financial retail case
Use AML or customer service to show delivery flow, operations flow, risk controls and benefits.

3:30-4:30  Portfolio and platform
Show how repeated blockers become platform runway and how funding decisions change.

4:30-5:00  Close
The goal is not more AI activity. The goal is faster, safer, reusable and finance-recognized AI value.

Final memory card:

AI VSM = portfolio flow + product flow + platform flow + risk flow + benefit flow.

Do not manage AI by feature count or model calls.
Manage the path from idea evidence to safe release,
from release to adoption,
from adoption to qualified value events,
from value events to benefits realization,
and from benefits to portfolio scale / stop decisions.