AI Value Stream Management / Flow Metrics Playbook
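Audience: AI PMs, AI Product Operations Leads, AI Portfolio Leads, AI Product Architects, Enterprise Architects, Value Office Leads, and AI transformation leaders in financial retail.
Goal: manage the flow of AI use cases from idea to safe release to adoption / value realization, and connect DORA / SPACE, Flow Metrics, risk gates, platform runway, portfolio funding and business value into one operable value stream management system.
Core view: AI Value Stream Management is not drawing process diagrams; it is the continuous management of "value flow, risk evidence, platform capability, organizational capacity and benefits realization".
Scope: This document is learning, architecture design and career portfolio material. It does not constitute legal, regulatory, audit, model validation, financial recognition or vendor selection advice. For formal financial retail projects, applicable requirements must be confirmed jointly by the business owner, risk, model risk, legal, compliance, privacy, security, finance, architecture, operations and internal audit.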
1. Source Anchors
These sources serve as anchors for terminology and method. Access date recorded as 2026-06-29.
| Anchor | Official / primary source | Use in this playbook |
|---|---|---|
| APQC Process Classification Framework | https://www.apqc.org/process-frameworks | Use PCF's process classification and process performance language to connect business capability, end-to-end process, benchmark, process owner and AI value stream. |
| Flow Framework | https://flowframework.org/ | Use Value Stream Management, Flow Metrics and a shared business-technology language to connect project-to-product, portfolio flow, product value stream and flow bottlenecks. |
| DORA | https://dora.dev/ | Use delivery performance, throughput, stability, lead time, deployment frequency, change failure, recovery and continuous improvement to reason about flow efficiency across the AI SDLC. |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Use Govern / Map / Measure / Manage to organize AI risk gates, evidence, monitoring, incidents and continuous governance. |
Source-to-artifact mapping:
| Source lens | Artifacts it can produce | Senior-level phrasing |
|---|---|---|
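| APQC PCF | AI process-to-capability map, process owner map, benchmark baseline | "I first anchor each AI use case to a business process and capability, so we never reverse-engineer requirements from model capability." |
| Flow Framework | AI value stream map, flow metrics dashboard, blocked work taxonomy | "I use flow to manage how business value moves from idea to production and benefit, not just project status." |
| DORA | AI SDLC delivery dashboard, release stability scorecard, change recovery review | "AI delivery must get both faster and more stable; demo speed cannot mask release risk." |
| NIST AI RMF | AI risk gate checklist, evidence binder, monitoring and response loop | "Every flow stage carries risk evidence; it is not a one-time approval before launch." |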
2. One-Sentence Positioning
AI Value Stream Management places AI opportunity, delivery work, risk evidence, platform capability, adoption behavior and realized business value in a single measurable value stream, so the organization can decide to fund, accelerate, block, scale, hold, stop or retire.
Shorter interview version:
I manage AI use cases as value streams, not feature queues:
from idea evidence to safe release, adoption, risk-controlled operation and finance-recognized value.
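Executive version:
I do not judge AI success by "how many AI features went live" or "how many model calls were made". I look at whether AI has passed through the full value stream: whether the business problem is real, whether data and risk evidence hold up, whether platform capabilities are reused, whether the release is safe, whether users adopt it, whether business benefits are recognized by finance and the business owner, and whether risk stays stable after scale.
3. Purpose / Audience / Core View
3.1 Purpose
This document addresses five senior-level questions:
- How AI use cases flow from the idea funnel into production instead of stalling at the PoC or dashboard stage.
- How to use Flow Metrics to find waiting, rework, risk blockage and benefit breakpoints in AI delivery and AI-enabled operations.
- How to connect the engineering productivity language of DORA / SPACE with portfolio funding, platform runway, risk gates and business outcomes.
- How to prove that AI value is not activity, usage, demos or model scores, but adoption and benefits realization under controlled risk.
- How to turn financial retail processes such as AML, customer service, credit, branch operations, complaints and risk control into investable, governable and measurable AI value streams.
3.2 Audience
| Role | Key question | Output from this document |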
|---|---|---|
| AI PM / AI Product Ops | How to manage an AI product from feature delivery through adoption and value realization | value stream map, dashboard, benefits loop |
| AI Portfolio Lead | How to compare multiple AI investment opportunities and control WIP | portfolio-to-product trace, flow load, evidence gates |
| Product Architect / Enterprise Architect | How to connect AI use cases to capability, platform, data and risk controls | capability map, platform runway, risk gate architecture |
| Engineering Productivity / Platform PM | Whether AI tools and platforms improve delivery flow and stability | DORA / SPACE / Flow metric integration |
| Value Office / Finance Partner | How AI benefits are quantified, attributed, realized and reviewed | benefits realization loop, portfolio evidence pack |
| Risk / Compliance / Model Risk | How AI risk is carried through idea, pilot, release, scale and operate | NIST AI RMF-aligned gate evidence |
3.3 Core View
A mature AI VSM is not "process visualization" but four systems combined into one:
| System | Managed object | Key question |
|---|---|---|
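| Value flow system | idea -> evidence -> release -> adoption -> benefits | Is value flowing, or stuck in PoC, approval, integration or adoption? |
| Risk flow system | risk hypothesis -> control -> evidence -> monitoring -> incident learning | Does risk evidence flow together with value? |
| Platform flow system | reusable gateway, eval, RAG, observability, policy, review workflow | Does each use case build up platform capability, or create a one-off burden? |
| Funding flow system | capacity, budget, SME review, risk assurance, change management | Do funds and capacity flow to the AI bets most worth validating and scaling? |
4. Why AI Success Cannot Be Judged Only by Features / Call Volume
Common ways AI products are reported:
- 12 AI features launched.
- LLM API calls reached 2 million.
- 500,000 summaries generated.
- Model accuracy reached 92%.
- User satisfaction survey score reached 4.6/5.
- An estimated 30% of manual time saved.
These are all signals, but not complete value evidence.
| Metric | What it shows | What it does not show | How VSM strengthens it |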
|---|---|---|---|
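| Feature count | The team produced visible features | Whether it solves a high-value process problem | Link to value stream stage and business outcome |
| API call / token count | The system is being called | Whether calls produce qualified business results | Replace with qualified value event throughput |
| Generated output count | AI is very active | Whether outputs are adopted, correct, compliant and auditable | Add acceptance, quality pass and risk guardrail checks |
| Model accuracy | Offline task performance | Whether outcomes improve in the production process | Connect workflow metrics, decision quality and online monitoring |
| User rating | Experience sentiment | Whether there is selection bias, risk shifting or short-term novelty | Track by cohort, risk tier and workflow step |
| Estimated time saved | Potential efficiency | Whether time is actually freed, redeployed or recognized by finance | Build a benefits realization loop with finance sign-off |
Mature AI value narrative: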
Model capability
-> AI behavior evidence
-> workflow adoption
-> qualified value event
-> risk-adjusted business outcome
-> finance-recognized benefit
-> portfolio scale / stop decision
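In one sentence:
The subject of an AI value stream is not the model, but the business processes and decision flows that AI changes.
4.1 Migrating from Vanity Metrics to Flow Metrics
| Vanity metric | More mature flow / value metric |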
|---|---|
| AI features shipped | use cases passing release gate and adoption gate |
| Prompts created | prompts with eval coverage, owner and production trace |
| Model calls | qualified AI-assisted workflow completions |
| Users activated | eligible users reaching repeat adoption in target workflow |
| Time saved estimate | finance-recognized capacity release or SLA improvement |
| PoCs completed | PoCs converted to release, platform capability, or stop decision |
| Accuracy score | risk-tiered eval pass plus online quality and guardrail stability |
| Platform integrations | production workflows reusing approved platform controls |
4.2 Defining an AI Value Event
An AI value event is not "AI was called", but a business event constrained by quality, risk, adoption and economics.
Qualified AI value event =
eligible workflow item
+ AI exposure or AI-assisted action
+ user / system adoption signal
+ quality threshold pass
+ risk guardrail pass
+ auditable trace
+ unit economics within boundary
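The definition above is a conjunction: an item counts only if every term holds. A minimal Python sketch, assuming hypothetical field names and illustrative thresholds (`quality_threshold`, `cost_ceiling`) that a real metric contract would set per use case:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowItem:
    # Hypothetical fields; a real metric contract would define these per use case.
    eligible: bool             # item belongs to an approved, in-scope workflow
    ai_assisted: bool          # AI exposure or AI-assisted action occurred
    adopted: bool              # user / system adoption signal (e.g. output accepted)
    quality_score: float       # eval or QA score in [0, 1]
    guardrail_passed: bool     # no risk guardrail breach recorded
    trace_id: Optional[str]    # auditable trace reference; None if missing
    unit_cost: float           # AI cost attributed to this item


def is_qualified_value_event(item: WorkflowItem,
                             quality_threshold: float = 0.9,
                             cost_ceiling: float = 1.0) -> bool:
    """Return True only if every term of the definition holds."""
    return (
        item.eligible
        and item.ai_assisted
        and item.adopted
        and item.quality_score >= quality_threshold
        and item.guardrail_passed
        and bool(item.trace_id)
        and item.unit_cost <= cost_ceiling
    )


def qualified_value_throughput(items, **thresholds) -> int:
    """Count qualified events; raw call or output counts are deliberately ignored."""
    return sum(is_qualified_value_event(i, **thresholds) for i in items)
```

An item that was AI-assisted and adopted but has no auditable trace does not count, which is exactly the gap between "AI was called" and a value event.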
Financial retail examples:
| Scenario | Immature event | Qualified AI value event |
|---|---|---|
| AML alert triage | generated summaries | analyst-reviewed case summaries that reduce investigation time with no critical evidence defect |
| Customer service copilot | answer generated | grounded answer accepted by agent, customer issue resolved, no reopen or policy breach |
| Credit memo assistant | draft created | underwriter-accepted memo section with policy citation pass and no prohibited recommendation |
| Branch knowledge assistant | employee question asked | cited policy answer accepted by employee and no escalation caused by stale knowledge |
| AI platform | applications connected | production AI workflows passing value, risk, reliability and cost gates through shared platform |
5. Value Stream Map: AI Delivery
The goal of the AI delivery value stream is to take an AI idea from "idea" to "controlled production value", rather than to manage a task list.
5.1 End-to-End AI Delivery Flow
Strategic theme
-> Idea intake
-> Discovery evidence
-> Value stream design
-> Data / eval readiness
-> Build and integration
-> Risk and release gate
-> Limited release
-> Adoption and operating model
-> Benefits realization
-> Scale / hold / stop / retire
5.2 Stage-by-Stage Map
| Stage | Primary question | Key evidence | Flow metric | Gate decision |
|---|---|---|---|---|
| Strategic theme | Does this AI investment direction fit strategy and risk appetite? | portfolio thesis, capability map, process baseline | strategy-to-intake lead time | fund theme / narrow theme / park |
| Idea intake | Is it worth entering discovery? | problem owner, APQC process, baseline hypothesis, risk hypothesis | idea acceptance rate, intake queue age | accept / reject / merge |
| Discovery evidence | Is the problem real and suitable for AI? | workflow map, no-AI option, data readiness, AI fit | idea-to-evidence lead time | fund pilot / refine / stop |
| Value stream design | Which process step does AI plug into, and which decision does it change? | current-state / target-state map, RACI, decision boundary | flow design cycle time | approve value stream / redesign |
| Data / eval readiness | Are data, knowledge, eval and telemetry available? | source owner, freshness, label quality, golden set, metric contract | data wait time, eval design lead time | build / data investment / stop |
| Build and integration | Can it connect safely to production systems and the platform? | architecture sketch, model gateway, RAG, tool permissions, tests | build flow time, review turnaround | release candidate / rework |
| Risk and release gate | Can it go live under control? | eval result, risk control, rollback, monitoring, audit trace | gate queue time, evidence aging | go / limited go / no-go |
| Limited release | Is it safe and effective in the real process? | cohort results, usage, quality, incident, cost | release-to-adoption lead time | continue / restrict / rollback |
| Adoption and operating model | Do users keep adopting it in the right process? | training, support, manager cadence, override analysis | adoption flow time, support load | scale / change plan / hold |
| Benefits realization | Are business benefits realized and recognized? | baseline, incremental effect, cost, risk adjustment, finance sign-off | benefit recognition cycle time | scale / hold / stop |
| Scale / hold / stop / retire | Should it be scaled, restricted, stopped, or captured as a platform capability? | realized benefits, risk trend, platform capacity, unit economics | scale cycle time, retire cycle time | scale / hold / retire / stop |
5.3 Current-State vs Target-State AI VSM
The current-state map must expose the real blockers:
| Current-state symptom | Likely real blocker | Metrics to measure |
|---|---|---|
| PoCs are fast, production is slow | eval, security, data access, integration and risk approval come late | PoC-to-release lead time, gate queue time |
| The business raises many AI ideas | no problem baseline or process owner | idea rejection reason, discovery WIP |
| The model performs well but is not used | workflow fit, trust, training or manager incentives do not match | adoption flow time, accepted output rate |
| The platform is built but use cases are still slow | platform capabilities do not cover data, eval, risk evidence or operations | platform reuse lead time, self-service completion |
| Value is unclear after launch | no baseline, telemetry, causal design or finance owner | benefit recognition cycle time, metric contract coverage |
| Approvals keep multiplying | risk evidence is not reusable, and gates rely only on manual meetings | evidence reuse rate, risk review capacity load |
The target-state map should define three improvement directions:
- Reduce waiting: data access, risk review, SME review, eval queue, release approval.
- Reduce rework: unclear problem, weak eval, ambiguous decision boundary, poor telemetry.
- Increase value throughput: qualified value events, benefit realization, platform reuse.
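Finding where to reduce waiting starts with splitting each stage's elapsed time into active and waiting time. A minimal sketch, assuming a hypothetical stage log of `(work_item_id, stage, active_days, waiting_days)` tuples exported from whatever tracker the team uses; the sample rows are illustrative, not measurements:

```python
from collections import defaultdict


def stage_wait_summary(stage_log):
    """Aggregate active vs waiting days per stage and compute stage flow efficiency."""
    totals = defaultdict(lambda: [0.0, 0.0])  # stage -> [active_days, waiting_days]
    for _item_id, stage, active_days, waiting_days in stage_log:
        totals[stage][0] += active_days
        totals[stage][1] += waiting_days
    summary = {}
    for stage, (active, waiting) in totals.items():
        elapsed = active + waiting
        summary[stage] = {
            "active_days": active,
            "waiting_days": waiting,
            "flow_efficiency": active / elapsed if elapsed else 0.0,
        }
    return summary


def biggest_waiting_stage(stage_log):
    """The stage with the most accumulated waiting is the first candidate bottleneck."""
    summary = stage_wait_summary(stage_log)
    return max(summary, key=lambda s: summary[s]["waiting_days"])


# Illustrative rows only (hypothetical use cases uc-1 and uc-2).
log = [
    ("uc-1", "Data / eval readiness", 3, 12),
    ("uc-1", "Risk and release gate", 2, 20),
    ("uc-2", "Data / eval readiness", 5, 8),
    ("uc-2", "Risk and release gate", 1, 15),
    ("uc-2", "Build and integration", 10, 2),
]
print(biggest_waiting_stage(log))  # Risk and release gate
```

Accumulated waiting, not low efficiency alone, is used to pick the bottleneck, because a short stage with poor efficiency may matter less than a long queue.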
6. Value Stream Map: AI-Enabled Operations
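AI-enabled operations is the business operations value stream after AI goes live. It answers: has AI actually changed day-to-day processes and created value within real risk boundaries?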
6.1 Generic AI-Enabled Operations Flow
Business event / case
-> eligibility check
-> context and knowledge retrieval
-> AI draft / recommendation / action proposal
-> human judgment or policy automation
-> workflow action
-> QA / sampling / exception handling
-> customer / operations outcome
-> monitoring and incident response
-> benefits and learning feedback
6.2 Operations Flow Metrics
| Flow step | AI control point | Metric | Risk guardrail |
|---|---|---|---|
| Case eligibility | AI only touches approved workflow items | eligible case coverage, exclusion accuracy | high-risk case wrongly included |
| Context retrieval | AI uses permitted and fresh sources | retrieval success, citation correctness, knowledge freshness | unauthorized data access, stale policy |
| AI output | Output fits task and boundary | groundedness, format validity, answerability, tool call correctness | hallucination, unsupported claim, overconfident answer |
| Human judgment | Accountability remains clear | acceptance rate, override rate, review time | blind acceptance, review fatigue |
| Workflow action | Action improves flow | cycle time, touches per case, queue age, rework | wrong action, customer harm |
| QA / exception | Quality feedback is captured | QA defect rate, sample coverage, failure taxonomy closure | critical defect not escalated |
| Customer / ops outcome | Business result improves | FCR, AHT, backlog age, loss avoided, complaint rate | unfair outcome, policy breach |
| Monitoring | Drift and incidents are visible | incident detection time, cost drift, model / data drift | unmonitored degradation |
| Learning feedback | Evidence changes product and platform | eval case added, runbook updated, control improved | repeated failure |
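Most of these operations metrics reduce to simple ratios over review events. A minimal sketch for the human judgment and QA rows, assuming a hypothetical `ReviewEvent` record; the 10-second "blind acceptance" threshold is an illustrative heuristic, not a standard:

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewEvent:
    accepted: bool                     # reviewer kept the AI output (possibly edited)
    overridden: bool                   # reviewer replaced the AI recommendation
    review_seconds: float              # time spent reviewing
    qa_defect: Optional[bool] = None   # None = not sampled by QA


def human_judgment_metrics(events, blind_review_seconds=10.0):
    """Acceptance, override, review time, QA coverage and a blind-acceptance proxy."""
    n = len(events)
    if n == 0:
        return {}
    accepted = [e for e in events if e.accepted]
    sampled = [e for e in events if e.qa_defect is not None]
    return {
        "acceptance_rate": len(accepted) / n,
        "override_rate": sum(e.overridden for e in events) / n,
        "median_review_seconds": statistics.median(e.review_seconds for e in events),
        # Hypothetical heuristic: acceptance after a very short review suggests blind acceptance.
        "blind_acceptance_share": (
            sum(e.review_seconds < blind_review_seconds for e in accepted) / len(accepted)
            if accepted else 0.0
        ),
        "qa_sample_coverage": len(sampled) / n,
        "qa_defect_rate": sum(e.qa_defect for e in sampled) / len(sampled) if sampled else None,
    }
```

Reporting acceptance next to blind acceptance and QA defects keeps a rising acceptance rate from being read as success on its own.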
6.3 Delivery Flow vs Operations Flow
| Dimension | AI delivery value stream | AI-enabled operations value stream |
|---|---|---|
| Primary object | AI change from idea to release | Business work item from intake to outcome |
| Main bottleneck | discovery, data, eval, risk gate, integration | adoption, trust, exception handling, QA, operating capacity |
| Main metrics | lead time, WIP, blocked time, gate queue, release stability | qualified value events, cycle time, quality, guardrail, benefit |
| Main owner | AI PM, product architect, engineering, platform | business ops owner, product owner, risk owner |
| Scale risk | shipping unsafe or low-value capabilities | amplifying wrong workflow behavior at scale |
| Feedback loop | production telemetry updates SDLC gates | operational outcomes update product, policy, eval and training |
Key judgment:
The delivery VSM proves AI can go live safely; the operations VSM proves AI keeps creating value in real business processes.
7. Flow Metrics and AI-Specific Extensions
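Flow Metrics are valuable because they treat the value stream as a system rather than looking at a single team's output. AI scenarios need extensions into data, eval, risk, platform, adoption and benefits realization.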
7.1 Core Flow Metrics
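| Metric | AI VSM interpretation | Typical question |
|---|---|---|
| Flow time | Total time for an AI work item from entering the value stream to completing a given value stage | Why does idea-to-release take 6 months? |
| Flow velocity | Number of AI value items completed per unit of time | How many use cases pass the release gate or adoption gate each month? |
| Flow efficiency | Active work time / total flow time | Is time spent on build, or stuck in data, review or approval? |
| Flow load | Number of AI work items in progress | Is portfolio WIP so high that everything slows down? |
| Flow distribution | Mix of work item types | Is it all features, with no risk work, platform runway, debt or defects? |
| Flow predictability | Whether flow time / delivery outcomes are stable | Are releases and benefits predictable? |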
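All six core metrics can be computed from work items that carry a type, a start date, a finish date and active time. A minimal sketch, assuming a hypothetical `FlowItem` export with flow time in calendar days; the predictability line uses the coefficient of variation of flow time as an assumed proxy, and teams may define predictability differently (for example planned vs delivered outcomes):

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev
from typing import Optional


@dataclass
class FlowItem:
    item_type: str            # e.g. "feature", "risk", "debt", "defect"
    started: date
    finished: Optional[date]  # None while still in progress
    active_days: float        # days of actual work, excluding waiting


def flow_metrics(items, window_start: date, window_end: date) -> dict:
    done = [i for i in items
            if i.finished is not None and window_start <= i.finished <= window_end]
    in_progress = [i for i in items
                   if i.started <= window_end and (i.finished is None or i.finished > window_end)]
    flow_times = [(i.finished - i.started).days for i in done]
    total_flow = sum(flow_times)
    avg = mean(flow_times) if flow_times else None
    return {
        "flow_velocity": len(done),                    # items completed in the window
        "flow_time_mean_days": avg,
        "flow_efficiency": sum(i.active_days for i in done) / total_flow if total_flow else None,
        "flow_load": len(in_progress),                 # WIP at the end of the window
        "flow_distribution": {t: c / len(done)
                              for t, c in Counter(i.item_type for i in done).items()},
        # Assumed predictability proxy: lower coefficient of variation = more predictable.
        "flow_time_cv": pstdev(flow_times) / avg if avg and len(flow_times) > 1 else None,
    }
```

Distribution here is computed over completed items; computing it over in-progress items instead answers a slightly different question (what the portfolio is working on now rather than what it delivered).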
7.2 AI-Specific Flow Metrics
| Metric | Definition | Owner | Decision it supports |
|---|---|---|---|
| Idea-to-evidence lead time | Time from idea intake until discovery evidence is usable for a decision | AI PM / BA | Is discovery capacity sufficient? |
| Evidence conversion rate | Share of ideas entering discovery that convert into a pilot, platform investment or stop decision | Portfolio owner | Quality of the idea funnel |
| Data readiness wait time | Time a work item waits for data owner, access, quality, lineage or retention decisions | Data owner / Architect | Is data product investment needed? |
| Eval design lead time | Time from requirement confirmation until the eval contract is runnable | EvalOps / AI PM | Is eval designed early enough? |
| Risk gate queue time | Time a work item waits for risk, privacy, security, model risk or legal review | Risk governance owner | Is the gate supported by evidence automation? |
| Evidence aging | Age of key gate evidence relative to the latest production configuration or policy version | Product architect / Risk | Is revalidation needed? |
| Platform reuse lead time | Time from a use case needing a capability to completing integration through the shared platform | Platform PM | Does the platform actually lower integration cost? |
| Human review capacity load | Queueing and utilization of SME, risk reviewer and QA reviewer capacity | Ops / Risk | Is review capacity becoming a scale bottleneck? |
| Release-to-adoption lead time | Time from limited release until the target cohort reaches repeat adoption | Product Ops / Business ops | Is change management effective? |
| Qualified value throughput | AI value events passing quality and risk thresholds per unit of time | Product / Value Office | Is AI producing real business value? |
| Benefit recognition cycle time | Time from launch until finance / the business owner confirms the benefit | Value Office / Finance | Is benefits realization too slow? |
| Risk-adjusted flow velocity | Value event throughput adjusted for risk, quality and cost | Portfolio owner | Which use cases are worth scaling? |
| Stop decision latency | Time from a stop signal appearing to a formal stop or restriction | Portfolio governance | Are low-quality investments stopped in time? |
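Several of these metrics are date arithmetic once the right timestamps are captured. A minimal sketch, assuming the dates come from gate and decision logs; the `risk_adjusted_flow_velocity` weighting is a hypothetical formula, since the table does not prescribe one:

```python
from datetime import date


def evidence_aging_days(evidence_date: date, latest_config_date: date,
                        latest_policy_date: date) -> int:
    """Days by which gate evidence predates the newest production config or policy version.
    0 means the evidence was produced after the latest change."""
    latest_change = max(latest_config_date, latest_policy_date)
    return max(0, (latest_change - evidence_date).days)


def stop_decision_latency_days(stop_signal_date: date, decision_date: date) -> int:
    """Days from the first stop signal to the formal stop or restriction."""
    return (decision_date - stop_signal_date).days


def risk_adjusted_flow_velocity(qualified_events: int, quality_pass_rate: float,
                                critical_failure_rate: float, cost_per_event: float,
                                cost_ceiling: float) -> float:
    """Hypothetical adjustment: discount throughput by quality and critical failures,
    and treat it as zero when unit cost breaks the approved ceiling."""
    if cost_per_event > cost_ceiling:
        return 0.0
    return qualified_events * quality_pass_rate * (1.0 - critical_failure_rate)
```

Any non-zero evidence age is a prompt to ask whether the change touched the behavior the evidence covered, not an automatic revalidation trigger.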
7.3 DORA Extensions for AI Flow
| DORA lens | AI extension | Interpretation |
|---|---|---|
| Change lead time | intent-to-production lead time, spec-to-eval lead time, eval-to-release lead time | AI work does not start at commit; it starts with business intent and evaluable requirements |
| Deployment frequency | risk-tiered release frequency for code, prompt, model, data, RAG index, policy, tool schema | Frequency must be interpreted by artifact type and risk tier |
| Change fail rate | AI changes causing rollback, hotfix, eval regression, policy breach, customer impact or manual remediation | Includes behavioral failures, not just service outages |
| Failed deployment recovery time | time to disable the AI path, roll back prompt / index / model, switch to fallback or human-only mode | AI recovery often requires restoring process and operations together |
| Deployment rework rate | unplanned releases caused by AI incident, eval miss, data drift or control failure | Measures whether gates catch problems earlier |
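Reading change fail rate per artifact type and risk tier, and counting behavioral failures, is mostly a matter of how changes are labeled. A minimal sketch, assuming a hypothetical `AIChange` record whose outcome labels mirror the table above:

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Post-release outcomes that count as a failed AI change (behavioral as well as technical).
FAILURE_OUTCOMES = {"rollback", "hotfix", "eval_regression", "policy_breach",
                    "customer_impact", "manual_remediation"}


@dataclass
class AIChange:
    artifact: str        # "code", "prompt", "model", "data", "rag_index", "policy", "tool_schema"
    risk_tier: str       # e.g. "low", "high"
    outcomes: set = field(default_factory=set)   # observed outcomes; empty if clean
    recovery_minutes: Optional[float] = None     # time to disable / roll back / fall back

    @property
    def failed(self) -> bool:
        return bool(self.outcomes & FAILURE_OUTCOMES)


def change_fail_rate_by_segment(changes):
    """Change fail rate per (artifact type, risk tier), so failures are read in context."""
    totals, failures = defaultdict(int), defaultdict(int)
    for c in changes:
        key = (c.artifact, c.risk_tier)
        totals[key] += 1
        failures[key] += c.failed
    return {key: failures[key] / totals[key] for key in totals}


def median_recovery_minutes(changes) -> Optional[float]:
    """Median recovery time over failed changes that recorded one."""
    times = [c.recovery_minutes for c in changes if c.failed and c.recovery_minutes is not None]
    return statistics.median(times) if times else None
```

Segmenting by `(artifact, risk_tier)` avoids a frequent low-risk prompt change diluting the fail rate of rare high-risk model changes.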
7.4 SPACE Signals for AI VSM
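SPACE should not be used to rank individuals. It is used to explain whether the people and collaboration in the AI value stream are healthy.
| SPACE dimension | AI VSM signal | Why it matters |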
|---|---|---|
| Satisfaction and well-being | reviewer load, SME fatigue, trust in AI outputs, change fatigue | A value stream cannot buy speed by burning out experts and frontline staff |
| Performance | quality-passed outcomes, risk-adjusted value, customer / ops result | Rising activity does not mean improved performance |
| Activity | accepted AI-assisted work, reviewed outputs, evidence updates | Activity serves only as a diagnostic input |
| Communication and collaboration | handoff clarity, decision log quality, cross-owner response time | AI use cases span product, data, risk, ops and engineering |
| Efficiency and flow | blocked time, context switching, review turnaround, work item aging | The most common losses in AI teams come from waiting and rework |
7.5 Flow Distribution for AI Portfolio
If an AI portfolio is filled only with features, production will keep getting slower. Split work items into six types:
| Work type | Description | Healthy portfolio signal |
|---|---|---|
| Value feature | AI capability that directly improves a business process | Tied to a portfolio theme and process baseline |
| Risk and assurance | eval, red-team, policy, privacy, security, model risk, audit evidence | High-risk use cases must come with sufficient assurance capacity |
| Platform runway | model gateway, RAG, eval, observability, policy-as-code, review workflow | Reusable across multiple use cases |
| Data / knowledge product | data quality, knowledge ownership, taxonomy, lineage, freshness | Resolves recurring data readiness blockers |
| Adoption / change | training, manager cadence, support, process redesign, role redesign | Adoption does not happen automatically after release |
| Operational debt | legacy bot retirement, duplicate prompt cleanup, stale index remediation, runbook gaps | Prevents the AI landscape from becoming ungovernable |
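A quick health check on the six-way split can be automated. A minimal sketch, assuming each in-flight item carries one of six hypothetical type labels; the 60% feature ceiling and 15% assurance floor are illustrative thresholds, not recommendations from this playbook:

```python
from collections import Counter

WORK_TYPES = ("value_feature", "risk_assurance", "platform_runway",
              "data_knowledge", "adoption_change", "operational_debt")


def flow_distribution(work_types):
    """Share of in-flight work items per type; rejects unknown labels."""
    counts = Counter(work_types)
    unknown = set(counts) - set(WORK_TYPES)
    if unknown:
        raise ValueError(f"unknown work types: {sorted(unknown)}")
    total = sum(counts.values())
    return {t: counts[t] / total if total else 0.0 for t in WORK_TYPES}


def distribution_warnings(dist, has_high_risk_use_case, max_feature_share=0.6,
                          min_assurance_share=0.15):
    """Hypothetical thresholds; a portfolio team would calibrate them."""
    warnings = []
    if dist["value_feature"] > max_feature_share:
        warnings.append("feature-heavy portfolio: expect slower production flow")
    if has_high_risk_use_case and dist["risk_assurance"] < min_assurance_share:
        warnings.append("high-risk use case without enough risk and assurance capacity")
    for t in WORK_TYPES:
        if dist[t] == 0.0:
            warnings.append(f"no capacity allocated to {t}")
    return warnings
```

The check only surfaces questions for the portfolio review; a zero share can be a deliberate choice for a quarter.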
8. Connecting Flow, DORA / SPACE, Risk Gates and Business Value
A mature AI VSM dashboard should not be a single-layer report but a multi-layer operating system.
Portfolio thesis
-> value stream and process baseline
-> delivery flow metrics
-> DORA / SPACE signals
-> risk gate evidence
-> operational adoption metrics
-> qualified value events
-> benefits realization
-> portfolio funding and scale / stop decisions
8.1 Metric Stack
| Layer | Key metrics | Decisions |
|---|---|---|
| Portfolio | WIP by stage, risk tier distribution, flow load, evidence conversion, scale / stop ratio | fund, hold, stop, allocate capacity |
| Product value stream | idea-to-evidence, release-to-adoption, qualified value throughput, benefit recognition | prioritize, redesign workflow, scale |
| Engineering / platform | DORA metrics, platform reuse lead time, self-service success, incident recovery | invest in platform runway, reduce bottleneck |
| Team / collaboration | SPACE signals, review load, SME fatigue, blocked time, handoff age | adjust WIP, staff review capacity, change cadence |
| Risk / governance | gate queue, evidence aging, critical failure, control coverage, incident trend | approve, limit, rollback, increase assurance |
| Business value | AHT, FCR, loss avoided, backlog age, complaint, finance-recognized benefit | scale, redeploy capacity, change funding |
8.2 Operating Review Cadence
| Cadence | Participants | Focus | Decisions |
|---|---|---|---|
| Daily / twice-weekly flow clearing | AI PM, tech lead, data owner, risk liaison, ops lead | blocked work, aging items, review queue, failed eval | unblock, split work, route owner, reduce WIP |
| Weekly value stream review | Product, engineering, platform, risk, ops, data | flow metrics, DORA trend, eval status, adoption signal | select one bottleneck and one improvement bet |
| Biweekly release / adoption review | Product Ops, business owner, risk, support, QA | limited release, training, support tickets, override, guardrail | continue, restrict, rollback, expand cohort |
| Monthly benefits review | Value Office, finance, product, business owner, risk | baseline, incremental effect, cost, risk adjustment | recognize benefit, revise model, stop weak investment |
| Quarterly portfolio review | Executive sponsor, portfolio owner, CTO / CPO, risk, finance | stage distribution, capacity allocation, platform runway, scale / stop | fund, scale, retire, increase assurance, shift capacity |
8.3 Metric Tension Pairs
AI VSM must track positive metrics and their counter metrics together to prevent local optimization.
| Positive metric | Tension metric | Why it matters |
|---|---|---|
| Flow velocity | change fail rate, critical failure rate | Speed must not come at the cost of safety |
| Deployment frequency | risk-tiered approval and incident trend | High release frequency must be interpreted in context |
| Adoption rate | override quality, blind acceptance signal | Adoption may come from over-trust |
| AHT reduction | rework, complaint, QA defect | Speed must not lower quality |
| Platform reuse | platform wait time, custom exception rate | Platform reuse must not become a new bottleneck |
| Qualified value throughput | benefit recognition and risk-adjusted value | Event counts must convert into benefits |
| SME review coverage | reviewer fatigue and queue age | Risk control cannot rest on an unsustainable manual workload |
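Checking the pairs together can be a simple before/after comparison. A minimal sketch, assuming a metrics snapshot keyed by hypothetical metric names drawn from the table, with every tension metric chosen so that higher is worse; the deployment-frequency and value-throughput pairs are left out because their tension side is not a single number:

```python
# (positive metric, tension metric); names are hypothetical snapshot keys.
# For every tension metric listed here, a higher value is worse.
TENSION_PAIRS = [
    ("flow_velocity", "change_fail_rate"),
    ("adoption_rate", "blind_acceptance_share"),
    ("aht_reduction", "qa_defect_rate"),
    ("platform_reuse", "platform_wait_days"),
    ("sme_review_coverage", "review_queue_age_days"),
]


def tension_alerts(before: dict, after: dict, tolerance: float = 0.0):
    """Flag pairs where the positive metric improved while its tension metric got worse."""
    alerts = []
    for positive, tension in TENSION_PAIRS:
        if not all(k in before and k in after for k in (positive, tension)):
            continue  # skip pairs the snapshot does not cover
        improved = after[positive] > before[positive]
        worsened = after[tension] > before[tension] + tolerance
        if improved and worsened:
            alerts.append((positive, tension))
    return alerts
```

The `tolerance` parameter is there so that normal noise in a tension metric does not raise an alert every review cycle.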
9. Portfolio-to-Platform-to-Product Trace
An advanced capability of AI VSM is tracing a use case all the way from the portfolio thesis to product telemetry and the benefit ledger.
9.1 Trace Chain
Portfolio theme
-> business capability / APQC process
-> end-to-end value stream
-> AI use case hypothesis
-> platform capability dependency
-> product workflow intervention
-> release gate evidence
-> adoption telemetry
-> qualified value event
-> benefits realization
-> scale / stop decision
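Because the chain is ordered, the most useful check is the first broken link. A minimal sketch, assuming each use case's trace is stored as a dict keyed by hypothetical link names that mirror the chain:

```python
from typing import Optional

# One key per link in the trace chain (names are illustrative).
TRACE_CHAIN = (
    "portfolio_theme", "business_capability", "value_stream", "use_case_hypothesis",
    "platform_dependency", "product_intervention", "release_evidence",
    "adoption_telemetry", "qualified_value_event", "benefits_realization", "decision",
)


def first_trace_gap(trace: dict) -> Optional[str]:
    """Return the earliest missing link, or None if the chain is complete."""
    for link in TRACE_CHAIN:
        if not trace.get(link):
            return link
    return None


def trace_coverage(traces) -> float:
    """Share of use cases with an unbroken chain from portfolio theme to decision."""
    if not traces:
        return 0.0
    return sum(first_trace_gap(t) is None for t in traces) / len(traces)
```

Reporting the earliest gap, rather than a count of filled fields, points the owner at the link that blocks everything downstream.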
9.2 Trace Table
| Trace element | Example | Owner | Evidence |
|---|---|---|---|
| Portfolio theme | Regulated operations intelligence | Portfolio owner | investment thesis, capacity allocation |
| Business capability | Financial crime operations, customer servicing, credit operations | Enterprise architect / business owner | capability map, process baseline |
| Value stream | Alert intake to investigation closure | Business ops owner | current-state and target-state VSM |
| Use case hypothesis | AI summary and checklist reduce low-risk alert handling time without quality loss | AI PM | discovery brief, no-AI alternative |
| Platform dependency | RAG, model gateway, eval harness, trace logging, human review queue | Platform PM | platform service contract |
| Product intervention | Analyst sees cited summary and next-step checklist in case workflow | Product owner | UX / workflow spec, decision boundary |
| Release evidence | eval pass, SME review, privacy approval, rollback, monitoring | Risk / release owner | evidence binder |
| Adoption telemetry | weekly active analysts, accepted summaries, edits, overrides | Product Ops | telemetry contract |
| Value event | quality-approved AI-assisted investigations completed within SLA | Value Office | metric contract, dashboard |
| Benefit | AHT reduction, backlog age reduction, QA defect stability | Finance / business owner | benefits register |
| Decision | scale to second queue, hold for SME load, or stop | Portfolio governance | scale / stop memo |
9.3 Platform Runway Link
AI VSM should show whether a use case creates platform leverage.
| Platform capability | Flow bottleneck it removes | Metric |
|---|---|---|
| Model gateway | repeated model approval, cost logging, vendor switch friction | model access lead time, policy compliance coverage |
| Eval harness | manual release evidence, inconsistent thresholds | eval design lead time, regression coverage |
| RAG / knowledge service | duplicate indexes, stale knowledge, weak citation | retrieval integration lead time, freshness SLA |
| Observability and trace | value and incident evidence missing | trace coverage, incident detection time |
| Policy-as-code | slow manual risk review | gate automation rate, policy exception aging |
| Human review workflow | SME review queue hidden and unmanaged | review queue age, review capacity load |
| Evidence binder | audit response depends on manual reconstruction | evidence completeness, audit response effort |
9.4 Portfolio Funding Implication
The funding question changes from:
Should we fund this AI feature?
to:
Should we fund this value stream improvement,
including product workflow, platform runway, risk assurance,
adoption support and benefits measurement?
Funding buckets:
| Bucket | Why it matters for flow | Example investment |
|---|---|---|
| Use case delivery | creates near-term business value | AML triage assistant limited release |
| Platform runway | reduces future flow time and risk cost | shared eval, RAG, audit trace |
| Risk assurance | prevents late gate blockage and unsafe scale | red-team, policy controls, model risk evidence |
| Data / knowledge readiness | removes repeated upstream blockers | policy knowledge ownership, case label quality |
| Adoption and change | converts release into operational value | training, manager cadence, support playbook |
| Benefits measurement | turns activity into finance-recognized value | baseline, telemetry, causal design, benefit register |
10. Risk Gates and Business Value Gates
AI VSM treats gates as flow control points. A good gate can block, limit, route, accelerate or retire work based on evidence.
10.1 Gate Architecture
| Gate | Purpose | Evidence | Decision |
|---|---|---|---|
| Intake gate | prevent weak AI ideas from entering delivery | business problem, process owner, baseline hypothesis, risk hypothesis | accept / reject / merge / park |
| Discovery gate | prove the problem is worth pilot capacity | workflow map, AI fit, no-AI option, data readiness, risk tier | fund pilot / refine / stop |
| Eval readiness gate | make quality measurable before build completion | eval contract, golden set, failure taxonomy, metric owner | build / revise eval / data work |
| Architecture gate | avoid one-off design and unsafe integration | target architecture, platform dependency, data boundary, rollback sketch | approve / redesign / use platform |
| Risk gate | ensure control path fits risk tier | privacy, security, model risk, compliance, human oversight, monitoring | go / limited go / no-go |
| Release gate | validate production readiness | release bundle, versions, SLO, incident route, evidence binder | release / conditional release / rollback |
| Adoption gate | prove real users changed workflow | cohort usage, acceptance, override, support load, training | scale / improve change plan / restrict |
| Benefits gate | prove value is real and recognized | baseline, incremental effect, cost, risk adjustment, finance sign-off | scale / hold / stop |
| Scale / retire gate | decide where capacity goes next | risk trend, platform capacity, unit economics, adoption durability | scale / hold / retire / stop |
10.2 Gate Principles
- Gate evidence should be created during work, not reconstructed for meetings.
- High-risk use cases need earlier gates, not only stricter release gates.
- Gate queue time is a flow metric. If it grows, the operating system is under-designed.
- A gate without a stop or limit decision is just paperwork.
- A gate should distinguish "not ready yet" from "not worth doing" from "convert this into platform or data investment".
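The last principle can be made explicit so that every gate returns a decision rather than a meeting note. A minimal sketch, assuming four hypothetical yes/no inputs and an illustrative rule order (value first, then evidence, then scope):

```python
from enum import Enum


class GateOutcome(Enum):
    PASS = "go"
    LIMIT = "limited go"
    NOT_READY = "not ready yet"
    CONVERT = "convert into platform or data investment"
    STOP = "not worth doing"


def gate_decision(value_case_holds: bool, evidence_complete: bool,
                  blocker_shared_across_use_cases: bool,
                  needs_restriction: bool) -> GateOutcome:
    """Hypothetical rule order: worth first, then readiness, then scope."""
    if not value_case_holds:
        return GateOutcome.STOP
    if not evidence_complete:
        # A blocker that recurs across use cases (e.g. a missing data product) is a
        # platform or data investment, not a defect of this single use case.
        return GateOutcome.CONVERT if blocker_shared_across_use_cases else GateOutcome.NOT_READY
    return GateOutcome.LIMIT if needs_restriction else GateOutcome.PASS
```

Putting the value check first means a weak idea is stopped even when its evidence is complete, which keeps gate capacity for work worth doing.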
10.3 AI Risk Gate Checklist
| Risk dimension | Gate question | Example evidence |
|---|---|---|
| Customer harm | Could AI mislead, deny, delay, overcharge, expose or unfairly treat a customer? | customer impact assessment, complaint trigger |
| Decision authority | Is the AI's role search, draft, recommendation, triage, decision support or automated action? | AI authority statement, human oversight RACI |
| Data and privacy | Are data rights, retention, PII handling and access boundaries clear? | data inventory, privacy review, access log |
| Model behavior | Are hallucination, unsupported claims, toxicity, calibration and answerability controlled? | eval results, failure taxonomy |
| Security | Are prompt injection, tool abuse, secret exposure and supply chain risks addressed? | security test, tool permission matrix |
| Compliance | Are regulated statements, advice, and KYC / AML / credit policies controlled? | policy tests, compliance sign-off |
| Operations | Can users handle exceptions, overrides, appeals and support load? | runbook, support model, QA sampling plan |
| Reliability | Can the system fail safely and recover quickly? | fallback, feature flag, rollback drill |
| Auditability | Can decisions and outputs be explained and reconstructed? | trace, versions, citations, approval records |
| Benefits integrity | Can value claims be measured and attributed? | metric contract, baseline, benefits register |
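The checklist works best as a binder completeness check that runs whenever evidence is added, rather than just before a meeting. A minimal sketch, assuming the example evidence in the table is treated as the required set (a real risk tier would change these sets) and a hypothetical binder stored as a dict of sets:

```python
# Example evidence per risk dimension, taken from the checklist above.
RISK_EVIDENCE = {
    "customer_harm": {"customer impact assessment", "complaint trigger"},
    "decision_authority": {"AI authority statement", "human oversight RACI"},
    "data_and_privacy": {"data inventory", "privacy review", "access log"},
    "model_behavior": {"eval results", "failure taxonomy"},
    "security": {"security test", "tool permission matrix"},
    "compliance": {"policy tests", "compliance sign-off"},
    "operations": {"runbook", "support model", "QA sampling plan"},
    "reliability": {"fallback", "feature flag", "rollback drill"},
    "auditability": {"trace", "versions", "citations", "approval records"},
    "benefits_integrity": {"metric contract", "baseline", "benefits register"},
}


def missing_risk_evidence(submitted: dict) -> dict:
    """Map each risk dimension to the example evidence not yet in the binder."""
    gaps = {}
    for dimension, required in RISK_EVIDENCE.items():
        missing = required - set(submitted.get(dimension, ()))
        if missing:
            gaps[dimension] = missing
    return gaps
```

An empty result means the binder matches the example set, not that the risk is acceptable; the judgment still sits with the named risk owners.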
11. Financial Retail Case: Regulated Operations Intelligence
11.1 Context
A financial retail institution wants to use AI across AML operations, customer servicing and branch knowledge. The executive goal is not "launch more AI", but:
Reduce regulated operations workload,
improve decision quality,
shorten customer and employee waiting time,
and build reusable AI platform controls
without increasing compliance, privacy, fairness or operational risk.
11.2 Portfolio Theme and Value Streams
| Portfolio theme | Business capability | Value stream | Candidate AI intervention |
|---|---|---|---|
| Regulated operations intelligence | Financial crime operations | Alert intake -> investigation -> QA -> closure / escalation | AML alert triage assistant |
| Regulated operations intelligence | Customer servicing | Contact intake -> intent -> answer -> resolution -> QA | Customer service policy copilot |
| Regulated operations intelligence | Branch operations | Employee query -> policy retrieval -> customer conversation -> follow-up | Branch knowledge assistant |
| Regulated operations intelligence | Credit operations | Document intake -> policy check -> memo -> underwriting review | Credit memo assistant |
11.3 Current-State Observations
| Observation | Flow implication | Evidence to collect |
|---|---|---|
| AML analysts spend significant time assembling context before judgment | high wait and search time inside operations flow | time study, case system log, analyst interview |
| Customer agents search multiple knowledge bases during calls | high context switching and inconsistent answers | desktop telemetry, transfer reason, QA defects |
| Branch staff ask support teams repetitive policy questions | hidden demand and slow knowledge flow | support ticket volume, policy query categories |
| Credit memo quality varies by underwriter and source documents | high rework and policy citation gaps | memo QA, exception reason, cycle time |
| AI PoCs depend on different RAG and logging patterns | platform fragmentation | architecture inventory, reuse gap |
11.4 Target Value Stream: AML Alert Triage Assistant
Alert created
-> eligibility check
-> retrieve case history, transaction context and policy guidance
-> AI produces cited summary and investigation checklist
-> analyst reviews, edits and decides next action
-> QA samples output and decision evidence
-> case closed or escalated
-> telemetry updates eval set, risk controls and benefits register
11.5 Metrics for the AML Stream
| Metric layer | Metric | Baseline / target logic |
|---|---|---|
| Flow | alert-to-first-action time, case cycle time, queue age | reduce waiting and context assembly time |
| Adoption | eligible analyst weekly active usage, summary acceptance, edit distance, override reason | prove analyst workflow fit |
| AI quality | citation correctness, unsupported claim rate, missing key fact rate, checklist relevance | release and regression evidence |
| Risk guardrail | critical evidence defect, wrong policy reference, unauthorized data retrieval, SAR escalation error | hard stop or scale restriction |
| DORA / platform | prompt / index change lead time, eval-to-release time, rollback time, change fail rate | govern AI artifact changes |
| SPACE | analyst trust pulse, SME review load, review queue age, change fatigue | ensure control system is sustainable |
| Business value | AHT reduction, backlog age, QA defect stability, capacity redeployment | benefits realization |
| Unit economics | AI cost per case, QA cost per case, platform support cost | scale economics |
11.6 90-Day Decision Example
| Evidence | Result | Interpretation |
|---|---|---|
| Summary acceptance | 72% of eligible pilot alerts | adoption signal positive |
| Case handling time | median handling time down 12% in pilot queue | value signal positive |
| QA defect | no increase in critical defects | quality guardrail stable |
| SME review load | review queue age grew from 1 day to 4 days | scale bottleneck |
| Citation correctness | 96% pass on QA sample | release evidence acceptable |
| Cost per case | within approved pilot envelope | economics acceptable for limited scale |
| Benefit recognition | finance accepts capacity release only for low-risk queue | benefits are partial but credible |
Decision:
Hold broad scale.
Scale only to a second low-risk queue after human review workflow is improved.
Fund platform runway for reusable SME sampling, evidence binder and eval case management.
Do not expand to high-risk typologies until critical failure taxonomy and review capacity are stronger.
11.7 Portfolio Flow Snapshot
| Stage | Use cases | Governance focus |
|---|---|---|
| Idea | collections contact strategy | risk and fairness pathway unclear, keep outside delivery WIP |
| Discovery | credit memo assistant、complaint classification | data readiness, policy boundaries, baseline |
| Pilot | AML alert triage assistant | eval, SME review, limited cohort, benefit signal |
| Release | customer service policy copilot | monitoring, training, wrong-policy guardrail |
| Scale | branch knowledge assistant | knowledge freshness, platform reuse, support deflection |
| Retire | legacy FAQ bot | migrate content to governed knowledge service |
11.8 What This Case Shows in an Interview
| Signal | What it demonstrates |
|---|---|
| AI use case tied to APQC-like process and business capability | enterprise architecture maturity |
| Delivery VSM plus operations VSM | ability to connect build, release, adoption and business outcomes |
| Flow Metrics plus DORA / SPACE | advanced product operations and engineering productivity thinking |
| Risk gate and benefit gate | regulated AI governance maturity |
| Platform runway investment | architecture leverage and portfolio economics |
| Scale held due to SME bottleneck | mature decision discipline, not AI hype |
12. Templates
12.1 AI Value Stream Canvas
# AI Value Stream Canvas
Value stream name: AML alert triage to investigation closure
Portfolio theme: Regulated operations intelligence
Business capability: Financial crime operations
Business owner: Head of AML Operations
Product owner: AI Operations Product Lead
Risk owner: Financial Crime Risk
Platform owner: AI Platform Lead
Review date: 2026-06-29
## 1. Business outcome
Reduce low-risk alert handling time and backlog age while maintaining investigation quality and auditability.
## 2. Current-state flow
Alert created -> analyst searches customer and transaction context -> analyst checks policy and typology notes -> analyst drafts case narrative -> QA sample -> closure or escalation.
## 3. Current bottlenecks
- Context assembly consumes analyst time.
- Policy search is inconsistent across teams.
- QA defects often relate to missing evidence or weak narrative support.
- Review capacity is limited for pilot expansion.
## 4. AI intervention
AI retrieves approved case context and policy guidance, drafts a cited summary, and recommends an investigation checklist.
AI does not close alerts, file SARs, downgrade risk, or make final compliance decisions.
## 5. Target-state flow
Alert created -> eligibility check -> approved context retrieval -> AI cited summary and checklist -> analyst review and decision -> QA sample -> telemetry feedback to eval and benefits register.
## 6. Flow metrics
| Metric | Baseline | Target / decision rule |
|---|---:|---|
| Idea-to-evidence lead time | 18 business days | <= 15 business days for similar future use cases |
| Release-to-adoption lead time | 6 weeks | repeat adoption by >= 65% of eligible analysts within 6 weeks |
| Qualified value throughput | 0 | quality-approved assisted alerts per week after limited release |
| SME review queue age | 1 day | must not exceed 3 days during pilot |
## 7. Risk gates
| Gate | Evidence |
|---|---|
| Eval readiness | 500 historical case golden set, critical failure taxonomy, citation rubric |
| Release | rollback path, trace logging, human approval, QA sample plan |
| Scale | no critical evidence defect, stable QA, review capacity within threshold |
## 8. Benefits realization
Recognize benefit only for eligible low-risk alerts where AI was exposed, analyst accepted or materially used the output, QA passed, and case handling time improved versus baseline or control.
## 9. Scale / stop rule
Scale to another low-risk queue if AHT improves >= 10%, QA critical defects remain 0, citation pass >= 95%, review queue age <= 3 days, and cost per case stays within approved ceiling.
Stop expansion if any critical privacy, evidence or regulated decision boundary breach occurs.
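The scale / stop rule in section 9 of the canvas is a conjunction of thresholds with a breach override, so it can be written as a small decision function. In the sketch below, `ScaleEvidence`, its field names, the return labels and the cost figures are hypothetical. The thresholds are the ones stated in the canvas.

```python
from dataclasses import dataclass


@dataclass
class ScaleEvidence:
    aht_improvement: float        # fractional AHT improvement vs baseline or control (0.12 = 12%)
    qa_critical_defects: int
    citation_pass_rate: float     # 0.96 = 96%
    review_queue_age_days: float
    cost_per_case: float
    approved_cost_ceiling: float
    critical_breach: bool         # privacy, evidence or regulated decision boundary breach


def scale_decision(e: ScaleEvidence) -> str:
    """Apply the canvas scale / stop rule; a critical breach overrides every other signal."""
    if e.critical_breach:
        return "stop_expansion"
    conditions = [
        e.aht_improvement >= 0.10,
        e.qa_critical_defects == 0,
        e.citation_pass_rate >= 0.95,
        e.review_queue_age_days <= 3,
        e.cost_per_case <= e.approved_cost_ceiling,
    ]
    return "scale_to_next_low_risk_queue" if all(conditions) else "hold"


# Illustrative inputs loosely mirroring the 90-day example; cost figures are hypothetical.
pilot = ScaleEvidence(aht_improvement=0.12, qa_critical_defects=0, citation_pass_rate=0.96,
                      review_queue_age_days=4, cost_per_case=1.0, approved_cost_ceiling=1.2,
                      critical_breach=False)
print(scale_decision(pilot))  # hold: a 4-day review queue exceeds the 3-day limit
```

The rule returns "hold" rather than "stop" when a threshold is missed. This mirrors the playbook's distinction between pausing expansion on a capacity signal and stopping on a boundary breach.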
12.2 Flow Metrics Dashboard
# AI Flow Metrics Dashboard
Audience: AI portfolio review, product operations, platform leadership, risk governance, Value Office
Cadence: weekly for flow, monthly for benefits, quarterly for portfolio funding
## Executive summary
| Decision area | Metric | Current signal | Decision |
|---|---|---|---|
| Portfolio WIP | active use cases by stage and risk tier | Pilot WIP exceeds review capacity | freeze new high-risk pilots |
| Value flow | qualified value throughput | customer service copilot improving | prepare scale memo |
| Bottleneck | SME review queue age | AML review queue aging | fund review workflow automation |
| Risk | critical guardrail breaches | none in limited release | continue release with monitoring |
| Benefits | finance-recognized benefit | partial recognition for service queue | expand causal measurement |
## Portfolio flow
| Metric | Definition | Slice |
|---|---|---|
| Flow load | active AI work items in idea, discovery, pilot, release, scale | stage, risk tier, business capability |
| Flow distribution | percent of work in feature, platform, risk, data, adoption, debt | portfolio theme |
| Evidence conversion | percent of discovery items converted to pilot, platform investment or stop decision | business unit |
| Stop decision latency | days from stop signal to decision | owner, risk tier |
## Product value stream
| Metric | Definition | Slice |
|---|---|---|
| Idea-to-evidence lead time | intake accepted to discovery evidence ready | use case, capability |
| Release-to-adoption lead time | limited release to repeat adoption threshold | cohort, role |
| Qualified value throughput | quality and risk-passed AI value events per week | workflow, risk tier |
| Benefit recognition cycle time | production release to finance-recognized benefit | business unit |
## Engineering and platform
| Metric | Definition | Slice |
|---|---|---|
| AI change lead time | spec / eval / build / release time for AI artifacts | code, prompt, model, index, policy |
| Change fail rate | AI changes requiring rollback, hotfix or control intervention | service, risk tier |
| Platform reuse lead time | request to production use of shared platform capability | capability |
| Eval queue time | eval submitted to gate decision | use case, test suite |
## Risk and operations
| Metric | Definition | Slice |
|---|---|---|
| Evidence aging | days since evidence matched current production config | evidence type |
| Critical failure rate | no-go failures per evaluated workflow item | risk category |
| Human review load | reviewer capacity used and queue age | SME, risk, QA |
| Incident recovery time | detection to fallback / rollback / stable operation | incident type |
## Adoption and SPACE
| Metric | Definition | Slice |
|---|---|---|
| Repeat adoption | eligible users using AI in target workflow over repeated periods | role, team |
| Override reason distribution | why users reject or edit AI output | workflow step |
| Trust pulse | user trust score with free-text reason coding | cohort |
| Review fatigue signal | reviewer load, after-hours review, queue stress | reviewer group |
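Several dashboard metrics (flow load, flow distribution and the flow efficiency referenced in the review checklist) can be computed from a simple work-item record. The sketch below rests on two assumptions. First, `WorkItem` and its fields are hypothetical. Second, flow time is approximated as active days plus wait days. Flow efficiency is the share of flow time spent in active work, so low values point at queues and handoffs rather than at team effort.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WorkItem:
    item_id: str
    work_type: str      # feature, platform, risk, data, adoption, debt
    stage: str          # idea, discovery, pilot, release, scale
    risk_tier: str      # e.g. low, medium, high
    active_days: float  # days with someone actively working on the item
    wait_days: float    # days queued or blocked (gates, reviews, data access)
    done: bool = False


def flow_load(items):
    """Active (not done) items per (stage, risk tier)."""
    return Counter((i.stage, i.risk_tier) for i in items if not i.done)


def flow_distribution(items):
    """Share of active items by work type."""
    active = [i for i in items if not i.done]
    if not active:
        return {}
    counts = Counter(i.work_type for i in active)
    return {work_type: n / len(active) for work_type, n in counts.items()}


def flow_efficiency(item):
    """Active time as a share of total flow time; low values expose waiting, not slow teams."""
    flow_time = item.active_days + item.wait_days
    return item.active_days / flow_time if flow_time else None


items = [
    WorkItem("aml-triage", "feature", "pilot", "high", 12, 36),
    WorkItem("review-workflow", "platform", "discovery", "high", 5, 5),
    WorkItem("faq-bot-retire", "debt", "retire", "low", 3, 1, done=True),
]
print(flow_distribution(items))    # {'feature': 0.5, 'platform': 0.5}
print(flow_efficiency(items[0]))   # 0.25
```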
12.3 Blocked Work Taxonomy
| Block category | Symptom | Metric | Owner | Response |
|---|---|---|---|---|
| Business problem unclear | idea keeps changing, no baseline | discovery rework count | business owner / AI PM | rewrite problem statement, freeze baseline hypothesis |
| Process owner missing | no one owns workflow change | intake aging | portfolio owner | require process owner before discovery |
| Data access / rights | waiting for permissions or data retention decision | data readiness wait time | data owner / privacy | create data decision record and approved access path |
| Data quality / labels | eval or pilot cannot trust labels | label defect rate, golden set aging | data product owner | fund data product or reduce scope |
| Knowledge freshness | RAG answers cite stale policy | freshness breach count | knowledge owner | assign content owner and freshness SLA |
| Eval ambiguity | teams debate quality after build | eval design lead time、eval rework | EvalOps / AI PM | define eval contract before release candidate |
| Risk review queue | gate waits on manual review | risk gate queue time | risk governance owner | add evidence checklist, risk liaison, pre-review |
| Security / privacy boundary | unclear PII, tool or prompt injection exposure | security exception age | security / privacy | reduce tool scope, add controls, retest |
| Architecture fit | one-off solution bypasses platform | platform exception rate | architect / platform PM | route through shared capability or approve bounded exception |
| Human review capacity | SME queue grows during pilot | review queue age, review utilization | ops / risk | cap WIP, adjust sampling, fund workflow |
| Adoption friction | users ignore AI after release | release-to-adoption lead time | Product Ops / business ops | redesign workflow, training, manager cadence |
| Telemetry gap | value cannot be measured | metric contract coverage | analytics owner | instrument assignment, exposure, action, outcome |
| Finance recognition | benefit claim not accepted | benefit recognition cycle time | Value Office / finance | agree baseline, effect method, cost treatment |
| Vendor / procurement | model or tool contract delays release | vendor wait time | procurement / platform | use approved gateway or define exit path |
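The taxonomy becomes operational once blocked time is logged against its categories and ranked, so that the largest blockers surface with their owners. The sketch below assumes blocks are exported as `(category, blocked_days)` pairs. `BLOCK_OWNERS` is a hypothetical subset of the table above. Unclassified categories are rejected so that every block has an owner and a response path.

```python
from collections import defaultdict

# Hypothetical subset of the blocked work taxonomy: category -> accountable owner.
BLOCK_OWNERS = {
    "Data access / rights": "data owner / privacy",
    "Risk review queue": "risk governance owner",
    "Human review capacity": "ops / risk",
    "Telemetry gap": "analytics owner",
}


def top_blockers(block_events, n=3):
    """Rank block categories by total blocked days.

    block_events: iterable of (category, blocked_days) pairs, e.g. from a tracker export.
    Unclassified categories raise an error, so every block has an owner and response path.
    """
    totals = defaultdict(float)
    for category, days in block_events:
        if category not in BLOCK_OWNERS:
            raise ValueError(f"unclassified block category: {category!r}")
        totals[category] += days
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(category, days, BLOCK_OWNERS[category]) for category, days in ranked[:n]]


events = [("Human review capacity", 4), ("Risk review queue", 2),
          ("Human review capacity", 3), ("Telemetry gap", 1)]
print(top_blockers(events))
# [('Human review capacity', 7.0, 'ops / risk'), ('Risk review queue', 2.0, 'risk governance owner'), ('Telemetry gap', 1.0, 'analytics owner')]
```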
12.4 Benefits Realization Loop
Baseline
-> eligibility and exposure logging
-> adoption and action telemetry
-> quality and risk qualification
-> incremental effect estimate
-> cost and risk adjustment
-> finance / business sign-off
-> capacity redeployment or value capture
-> post-scale audit
-> portfolio scale / stop decision
Template:
# AI Benefits Realization Loop
Use case: Customer service policy copilot
Business owner: Contact Center Operations
Finance owner: FP&A partner
Risk owner: Customer Conduct Risk
Review period: 2026 Q3 month 2
## Baseline
Eligible contacts: 620,000 per month
Baseline AHT: P50 7.8 minutes
Baseline reopen rate: 11.2%
Baseline complaint rate: 0.42%
Baseline QA policy defect rate: 3.1%
## Exposure and adoption
Treatment cohort: servicing queues A and B
Eligible exposed contacts: 184,000
Accepted AI answer: 118,000
Repeat adoption: 68% of eligible agents
## Quality and guardrail
Citation QA pass: 96.5%
Critical wrong policy answer: 0
PII leakage: 0
Reopen rate: no credible increase versus control
Complaint rate: stable within approved threshold
## Incremental effect
Cluster rollout estimate: AHT reduction of 0.6 minutes per exposed contact where the agent accepted the AI answer and the contact was resolved.
Effect recognized only for contacts passing quality and guardrail filters.
## Cost and risk adjustment
Costs included: model, retrieval, platform support, QA sampling, training, monitoring.
Risk adjustment: high-risk intents excluded from recognition until separate gate.
## Realized benefit
Recognized benefit: capacity redeployed to peak-hour backlog and training coverage.
Finance treatment: limited-scale capacity benefit accepted for Q3 operating review.
## Next decision
Scale to two additional low-risk servicing queues.
Hold complaints, hardship and vulnerable customer intents until policy and escalation controls pass separate gate.
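The template recognizes the incremental effect only for contacts that pass quality and guardrail filters and excludes high-risk intents. The sketch below converts a per-contact effect into recognized capacity hours under those rules. `AssistedContact` and its boolean fields are hypothetical. Cost adjustment is left out because the template gives no cost figures.

```python
from dataclasses import dataclass


@dataclass
class AssistedContact:
    accepted: bool          # agent accepted the AI answer
    resolved: bool          # contact resolved
    quality_passed: bool    # citation QA and quality filters passed
    guardrail_passed: bool  # no wrong policy answer, PII leakage or reopen signal
    high_risk_intent: bool  # excluded from recognition until a separate gate


def recognized_capacity_hours(contacts, effect_minutes_per_contact):
    """Capacity hours recognized by applying the effect only to qualified contacts."""
    qualified = sum(
        1
        for c in contacts
        if c.accepted
        and c.resolved
        and c.quality_passed
        and c.guardrail_passed
        and not c.high_risk_intent
    )
    return qualified * effect_minutes_per_contact / 60
```

With the template's figures, the 118,000 accepted answers cap the recognizable capacity at 118,000 × 0.6 / 60 = 1,180 hours for the review period. The resolution, quality and guardrail filters can only lower that number.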
12.5 Portfolio Evidence Pack
# AI Portfolio Evidence Pack
Portfolio theme: Regulated operations intelligence
Quarter: 2026 Q3
Decision meeting: Quarterly AI portfolio review
## 1. Portfolio thesis
Invest in AI capabilities that reduce regulated operations workload, improve quality, reuse approved platform controls, and produce measurable value evidence within one quarter.
## 2. Stage distribution
| Stage | Items | Risk profile | Decision need |
|---|---|---|---|
| Idea | collections contact strategy | high fairness and conduct risk | park until risk pathway matures |
| Discovery | credit memo assistant | high credit policy risk | continue data readiness work |
| Pilot | AML alert triage assistant | high AML evidence risk | limited scale only if review load is controlled |
| Release | customer service policy copilot | medium conduct risk | scale low-risk intents |
| Scale | branch knowledge assistant | low / medium policy risk | expand through knowledge platform |
| Retire | legacy FAQ bot | audit and content duplication risk | migrate and shut down |
## 3. Flow metrics
| Metric | Portfolio signal | Decision |
|---|---|---|
| Flow load | high-risk pilots exceed SME capacity | freeze new high-risk pilot starts |
| Gate queue time | privacy and model risk review stable, SME review aging | fund review workflow |
| Platform reuse | RAG and trace reused by 3 use cases | continue platform runway |
| Benefit recognition | 2 of 4 production use cases have finance-recognized benefit | strengthen telemetry for others |
## 4. Risk evidence
- No critical PII leakage in production AI workflows.
- Two policy citation defects added to eval regression set.
- One tool permission exception closed before release.
- SME review capacity is the primary scale constraint.
## 5. Funding recommendation
- Continue customer service copilot scale for low-risk intents.
- Hold AML broad scale until review queue and sampling workflow improve.
- Convert repeated RAG and evidence needs into platform runway funding.
- Stop independent FAQ bot maintenance and migrate content.
12.6 Scale / Stop Memo
# AI Scale / Stop Memo
Use case: AML alert triage assistant
Current stage: Limited release
Decision requested: Limited scale to a second low-risk queue
Decision date: 2026-06-29
## Value evidence
- Median handling time decreased 12% in pilot queue.
- Backlog age decreased 9% for eligible low-risk alerts.
- Analyst repeat adoption reached 72%.
- Summary acceptance reached 76% after week 4.
## Quality and risk evidence
- Critical evidence defect: 0.
- Citation correctness: 96%.
- Unsupported claim rate: below approved threshold.
- AI did not close alerts, file SARs or downgrade risk.
## Flow evidence
- Eval-to-release lead time: 9 business days.
- Risk gate queue time: stable.
- SME review queue age increased to 4 days, above target.
## Unit economics
- AI cost per case inside approved limited-release ceiling.
- QA cost increased due to manual sampling.
- Platform trace and RAG components reused by customer service copilot.
## Decision
Limited scale to one additional low-risk queue.
Condition: review queue age must return to <= 3 days before further expansion.
Fund shared review workflow and evidence binder as platform runway.
Do not expand to high-risk typologies until review capacity, critical failure taxonomy and escalation controls pass scale gate.
## Stop triggers
- Any confirmed critical privacy breach.
- Any AI-attributable alert closure or SAR escalation boundary breach.
- Two consecutive monthly reviews with no handling-time or backlog benefit.
- SME review queue age above 5 business days for two consecutive review cycles.
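Two of the stop triggers are absolute and two depend on consecutive review cycles, which is easy to get wrong in a spreadsheet. The sketch below evaluates all four over a list of reviews ordered oldest first. It rests on a simplifying assumption: a "review cycle" is the same monthly review used for the benefit trigger. `MonthlyReview` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonthlyReview:
    critical_privacy_breach: bool
    ai_boundary_breach: bool       # AI-attributable alert closure or SAR escalation breach
    handling_time_benefit: bool
    backlog_benefit: bool
    review_queue_age_days: float   # SME review queue age, business days


def stop_triggers(reviews):
    """Return the stop triggers fired by a list of reviews ordered oldest first."""
    reasons = []
    if any(r.critical_privacy_breach for r in reviews):
        reasons.append("confirmed critical privacy breach")
    if any(r.ai_boundary_breach for r in reviews):
        reasons.append("AI alert closure or SAR escalation boundary breach")
    last_two = reviews[-2:]
    if len(last_two) == 2:
        if not any(r.handling_time_benefit or r.backlog_benefit for r in last_two):
            reasons.append("no handling-time or backlog benefit in two consecutive reviews")
        if all(r.review_queue_age_days > 5 for r in last_two):
            reasons.append("SME review queue age above 5 business days for two consecutive cycles")
    return reasons
```

An empty list means no trigger has fired. It does not by itself justify expansion, which still needs the scale conditions above.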
13. Review Checklists
13.1 AI VSM Design Checklist
- Is the AI use case tied to a named business capability and end-to-end process?
- Is there a current-state and target-state value stream, not only a feature description?
- Are flow metrics defined from idea to release and from release to value realization?
- Are delivery flow and operations flow separated but connected?
- Are DORA metrics adapted to code, prompt, model, data, index, policy and tool changes?
- Are SPACE signals used at team / system level, not for individual ranking?
- Are WIP limits and flow load visible by risk tier and value stream stage?
- Is blocked work classified with owner and response path?
- Are risk gates integrated into flow rather than attached at the end?
- Is benefits realization designed before scale?
13.2 Flow Metrics Review
- Does flow time include waiting, review, gate queue and adoption delay?
- Is flow efficiency exposing delay and handoff, not blaming teams?
- Is flow velocity adjusted for risk, quality and qualified value events?
- Is flow distribution balanced across feature, platform, risk, data, adoption and debt?
- Is flow predictability interpreted by risk tier and service context?
- Are stop decisions tracked as healthy portfolio outcomes?
- Are metrics paired with tension indicators to prevent gaming?
13.3 Risk and Gate Review
- Does each gate have a clear go, limited go, no-go, rollback or stop decision?
- Are privacy, security, compliance, model risk and operational risk addressed before release?
- Is AI authority explicit: search, draft, recommend, triage, decision support or automated action?
- Is human accountability visible in workflow and telemetry?
- Are eval results connected to production monitoring and incident learning?
- Are critical failures added to regression, policy or runbook updates?
- Is evidence aging tracked after model, prompt, data, index or policy changes?
13.4 Benefits Realization Review
- Is there a pre-release baseline?
- Are assignment, exposure, adoption, action, outcome and guardrail logged?
- Is value counted only for quality-passed and risk-passed events?
- Are model, platform, QA, training, monitoring, governance and risk costs included?
- Is finance treatment explicit: cost reduction, capacity redeployment, loss avoided, SLA improvement or revenue protection?
- Is post-scale audit planned to detect benefit decay?
- Is the scale decision based on realized value and risk stability, not only release success?
14. Anti-Patterns
| Anti-pattern | Why it is dangerous | Better practice |
|---|---|---|
| Counting AI features as value | encourages shipping without workflow impact | count qualified value events and benefits |
| Counting calls or tokens as adoption | usage can be waste, rework or curiosity | measure eligible workflow adoption and accepted action |
| Treating PoC completion as success | PoC may avoid production controls | track PoC-to-release and PoC-to-stop conversion |
| Running risk gates only at the end | late rejection creates rework and political pressure | use risk tiering and evidence gates from intake |
| Building every use case as custom | portfolio flow slows and audit burden grows | fund platform runway for repeated capabilities |
| Averaging high and low risk use cases | masks risk and flow differences | slice by risk tier, process, channel and artifact type |
| Using DORA as a ranking tool | damages collaboration and distorts behavior | use DORA for team / service improvement |
| Ignoring SPACE signals | reviewers, SMEs and users become hidden bottlenecks | monitor review load, trust, fatigue and handoff quality |
| Skipping no-AI alternatives | AI becomes solution-first theater | compare rules, UI, workflow redesign, training and automation |
| Claiming time saved without release path | finance cannot recognize vague savings | define capacity redeployment or cost treatment |
| Scaling before adoption is stable | production cost grows without realized value | add adoption gate before scale |
| Treating stop as failure | weak use cases consume scarce capacity | make stop, retire and merge normal portfolio decisions |
15. 30-Day Training Plan
Goal: within 30 days, produce a portfolio-ready AI Value Stream Management / Flow Metrics evidence pack.
| Day | Theme | Output |
|---|---|---|
| 1 | Select financial retail AI value stream | use case brief and business capability map |
| 2 | Map current-state process | current-state AI VSM |
| 3 | Define target-state AI intervention | target-state VSM and decision boundary |
| 4 | Identify process owner, risk owner, platform owner | RACI |
| 5 | Define problem baseline | baseline metric table |
| 6 | Define qualified AI value event | value event contract |
| 7 | Review week 1 | one-page value stream narrative |
| 8 | Build delivery flow map | idea-to-release flow |
| 9 | Build operations flow map | business event-to-outcome flow |
| 10 | Define core Flow Metrics | flow time, velocity, efficiency, load, distribution |
| 11 | Extend metrics for AI | data, eval, risk, platform, adoption, benefits |
| 12 | Map DORA metrics to AI artifacts | DORA extension sheet |
| 13 | Map SPACE signals | reviewer, SME, user and team flow signals |
| 14 | Review week 2 | metrics stack summary |
| 15 | Design risk gates | intake, discovery, eval, release, adoption, benefits |
| 16 | Design blocked work taxonomy | blocker table with owner and response |
| 17 | Design flow dashboard | executive, portfolio, product, platform, risk views |
| 18 | Design benefits realization loop | baseline to finance sign-off |
| 19 | Build portfolio-to-platform-to-product trace | traceability table |
| 20 | Define platform runway linkage | platform capability map |
| 21 | Review week 3 | operating model narrative |
| 22 | Write AML case | case flow and metrics |
| 23 | Write customer service case | value event and dashboard |
| 24 | Write credit ops case | risk gate and benefits model |
| 25 | Write branch knowledge case | knowledge freshness and platform reuse |
| 26 | Build scale / stop memo | decision memo |
| 27 | Write interview answers | 6 advanced answers |
| 28 | Assemble portfolio pack | artifacts list and story |
| 29 | Self-review against checklist | gap fixes |
| 30 | Final executive narrative | 5-minute storyline |
Completion criteria:
- One current-state and one target-state AI value stream map.
- Flow Metrics dashboard with portfolio, product, platform, risk, adoption and benefits layers.
- DORA / SPACE integration without vanity metrics.
- Risk gates and benefits gates connected to scale / stop decisions.
- At least one financial retail case with qualified value event and evidence pack.
- Clear portfolio-to-platform-to-product trace.
16. Interview Answers
Q1: How do you use VSM to manage AI use cases?
30-second version:
I treat each AI use case as a value stream item and manage it end to end: idea, discovery, data / eval readiness, risk gate, limited release, adoption, benefits realization and scale / stop. For metrics, I look beyond feature launches and call volume to flow time, blocked time, gate queue, qualified value events, risk guardrails and finance-recognized benefits.
2-minute version:
I first anchor the use case to a business capability and an end-to-end process, such as AML alert triage or customer servicing. Then I map the current-state and target-state value streams, making explicit which step AI enters, which decision it changes and who holds accountability. Next I define Flow Metrics: idea-to-evidence lead time, data readiness wait time, eval design lead time, risk gate queue time, release-to-adoption lead time, qualified value throughput and benefit recognition cycle time. On the engineering side, I use DORA to track lead time, release frequency, change fail rate and recovery for AI artifact changes; on the team side, I use SPACE to track review load, SME fatigue, trust and collaboration. Finally, risk gates and benefits gates decide whether to scale, hold, stop or convert the work into platform investment.
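Q2: Why can't AI call volume prove AI value?
30-second version:
Call volume only shows that AI was used or triggered by a system; it does not show that AI produced a qualified business outcome. In financial retail, I focus on the qualified AI value event: the target workflow is eligible, the AI output was exposed, the user adopted it, quality passed, risk thresholds passed, the event is auditable and the unit economics hold.
2-minute version:
For example, rising call volume for a customer service copilot may simply mean the knowledge base is hard to use, agents are asking repeatedly, answers are inaccurate or users are curious. A real value event looks like this: for an eligible contact, AI gives a cited answer; the agent accepts it or edits it reasonably; the customer issue is resolved on first contact with no reopen within 7 days; there is no wrong policy answer, PII leakage or complaint increase; and cost per resolved contact stays within its limit. Only such events enter benefits realization. Otherwise, API calls and tokens consumed are just activity, and may even represent cost and risk.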
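The qualified value event for the customer service copilot is a strict conjunction of workflow, quality, risk and cost conditions. The sketch below encodes it as a single predicate. Two simplifying assumptions apply. First, "no complaint increase" and "cost per resolved contact" are aggregate measures, and here they are approximated by per-contact fields. Second, any agent edit is treated as reasonable. `ContactEvent`, its fields and `ACCEPTED_ACTIONS` are hypothetical.

```python
from dataclasses import dataclass

ACCEPTED_ACTIONS = {"accepted", "edited"}  # assumption: any edit counts as reasonable


@dataclass
class ContactEvent:
    eligible: bool
    ai_answer_cited: bool
    agent_action: str               # "accepted", "edited", "rejected" or "ignored"
    first_contact_resolved: bool
    reopened_within_7_days: bool
    wrong_policy_answer: bool
    pii_leakage: bool
    complaint_raised: bool          # per-contact proxy for "no complaint increase"
    cost: float                     # per-contact proxy for cost per resolved contact


def is_qualified_value_event(e: ContactEvent, cost_ceiling: float) -> bool:
    """True only if the contact passes every workflow, quality, risk and cost condition."""
    return (
        e.eligible
        and e.ai_answer_cited
        and e.agent_action in ACCEPTED_ACTIONS
        and e.first_contact_resolved
        and not e.reopened_within_7_days
        and not (e.wrong_policy_answer or e.pii_leakage or e.complaint_raised)
        and e.cost <= cost_ceiling
    )
```

Counting only events for which this predicate is true, instead of API calls, is what keeps curiosity, rework and retries out of the benefit numbers.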
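Q3: How do Flow Metrics, DORA and SPACE fit together?
30-second version:
Flow Metrics show whether the value stream flows smoothly, DORA shows whether the delivery system is getting faster and more stable, and SPACE shows whether people and collaboration are healthy. For AI, layer eval, risk gates, platform reuse and benefits realization on top.
2-minute version:
Flow Metrics apply to the portfolio and product value streams: flow time, velocity, efficiency, load, distribution and predictability. DORA applies to the AI SDLC: change lead time, deployment frequency, change fail rate, recovery and rework for prompt, model, index, policy, tool schema and code changes. SPACE explains why flow improves or degrades, for example through reviewer load, SME fatigue, context switching, trust and collaboration. All three connect to business value: only when delivery is faster and more stable, the team is not overwhelmed by review and governance, and adoption and qualified value events improve after release can we say the AI operating system actually works.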
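Applying DORA to the AI SDLC mostly means slicing the usual delivery metrics by artifact type, so that a fragile prompt or index pipeline is not hidden inside a healthy code pipeline. In the sketch below, `AIChange` and its fields are hypothetical. A "failure" is any change that needed a rollback, hotfix or control intervention, matching the dashboard definition. Deployment frequency is expressed per week over the reporting period.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class AIChange:
    artifact: str                           # code, prompt, model, index, policy or tool_schema
    lead_time_hours: float                  # change start to production
    failed: bool                            # needed rollback, hotfix or control intervention
    recovery_hours: Optional[float] = None  # time to restore stable service, if failed


def dora_by_artifact(changes, period_days: int) -> dict:
    """DORA-style delivery metrics, sliced by AI artifact type."""
    groups = defaultdict(list)
    for change in changes:
        groups[change.artifact].append(change)
    report = {}
    for artifact, items in groups.items():
        failures = [c for c in items if c.failed]
        recoveries = [c.recovery_hours for c in failures if c.recovery_hours is not None]
        report[artifact] = {
            "deploys_per_week": len(items) * 7 / period_days,
            "median_lead_time_h": median([c.lead_time_hours for c in items]),
            "change_fail_rate": len(failures) / len(items),
            "median_recovery_h": median(recoveries) if recoveries else None,
        }
    return report


changes = [AIChange("prompt", 10, False), AIChange("prompt", 20, True, 2.0),
           AIChange("prompt", 30, False), AIChange("prompt", 40, False),
           AIChange("index", 48, False)]
print(dora_by_artifact(changes, period_days=28)["prompt"])
# {'deploys_per_week': 1.0, 'median_lead_time_h': 25.0, 'change_fail_rate': 0.25, 'median_recovery_h': 2.0}
```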
Q4: How do you run release and scale gates for high-risk financial AI?
30-second version:
For high-risk AI, pilot approval must not be treated as scale approval. I split the path into intake, discovery, eval readiness, architecture, risk, release, adoption, benefits and scale gates, each with its own evidence and an explicit go / limited go / no-go / rollback / stop decision.
2-minute version:
Take AML alert triage as an example. The intake gate must establish the business owner, process baseline and risk hypothesis; the discovery gate needs a workflow map, AI fit, data readiness and a no-AI alternative; the eval gate needs a golden set, failure taxonomy and citation rubric; the risk gate must confirm that AI does not auto-close alerts or make SAR decisions, that a human reviews the output and that logs are traceable; the release gate needs rollback, monitoring and an evidence binder; the adoption gate looks at analyst acceptance, overrides and support load; the benefits gate looks at AHT, backlog, QA defects and finance treatment. Scale is approved only when value, risk, review capacity, platform capacity and unit economics all hold at the same time.
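Q5: How do you explain the value of AI VSM to a CFO?
30-second version:
I translate AI VSM from process management into portfolio efficiency: ineffective use cases identified sooner, less late-stage rework, higher qualified value throughput, a shorter benefit recognition cycle and clearer risk-adjusted returns.
2-minute version:
A CFO does not need AI call volume; they need to see whether funding and capacity flow toward realizable value. I show portfolio flow: how many ideas became valid pilots, how many were stopped in time, which platform capabilities lowered the cost of onboarding later use cases, and which released use cases produced finance-recognized benefit. On the benefit side I show capacity redeployment, loss avoided, SLA improvement and cost per value event; on the cost side I include model, platform, QA, training, monitoring, governance and risk costs. The AI budget then stops being a pile of project requests and becomes an operating system that keeps learning, amplifies effective investments and stops low-quality ones.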
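Q6: How do you prove platform runway is more than a technology cost?
30-second version:
Platform runway value must be proven with flow and risk economics: does it shorten platform reuse lead time, eval design lead time, risk gate queue time and release recovery time, and does it increase evidence reuse and qualified value throughput?
2-minute version:
If the model gateway, eval harness, RAG service, observability, policy-as-code, human review workflow and evidence binder are reported only as technical components, the CFO and the business will struggle to see their value. I map each one to a value stream bottleneck: the gateway reduces model approval and vendor-switching cost; the eval harness reduces release gate rework; the RAG service reduces duplicate indexing and citation errors; the evidence binder lowers the cost of reconstructing audit evidence; the review workflow addresses the SME capacity bottleneck. The KPI for platform investment is not how many models are connected, but how many more AI use cases pass the release and benefits gates faster, more stably and more auditably.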
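Q7: If an AI use case is live but adoption is low, what do you do?
30-second version:
I do not immediately conclude that the model failed. I look at release-to-adoption lead time, eligible exposure, workflow fit, trust, manager cadence, training, support load, override reasons and incentive mismatch, then decide whether to redesign the workflow, adjust the change plan, narrow the scope or stop.
2-minute version:
Low adoption usually has several causes: AI is not in the user's natural workflow, outputs lack citations, users do not trust it, managers have not built the new process into daily management, or using AI adds QA burden. I measure adoption as a segment of the operations value stream: were eligible users exposed, did they try it for the first time, do they use it repeatedly, was the output accepted or edited, and did the workflow outcome improve after acceptance? If adoption is low but quality is good, the fix is likely product and change management; if adoption is low and override reasons point to quality, go back to eval and RAG; if adoption stays low and the benefit does not materialize, stop or narrow the scope.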
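Q8: How does VSM help with AI portfolio governance?
30-second version:
VSM turns portfolio review from status reporting into funding and capacity decisions. We can see WIP, stage aging, risk tier, blocked work, platform dependencies, benefits status and stop signals, then decide to fund, scale, hold, stop, retire or invest in platform runway.
2-minute version:
The biggest AI portfolio problem is usually not too few ideas but too many ideas competing for product, engineering, data, risk, SME and platform capacity. VSM shows which use cases are stuck in data readiness, which are stuck at a risk gate, which have flat adoption after release, and which already show benefit but are held back from scale by review capacity. The quarterly review then stops asking "how is each project progressing?" and asks "which value streams deserve acceleration, which blockers need platform investment, which high-risk use cases should be constrained and which low-value work should stop?" That is the portfolio-to-platform-to-product governance loop.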
17. Portfolio Package
An advanced AI VSM portfolio can include:
| Artifact | Content | Capability demonstrated |
|---|---|---|
| Executive one-pager | problem, goal, core value stream, metric stack, governance principle | executive communication |
| AI value stream map | current-state, target-state, AI intervention, bottleneck | architecture and process analysis |
| Flow metrics dashboard | portfolio, product, platform, risk, adoption and benefits layers | product operations metrics |
| DORA / SPACE integration | AI SDLC delivery and team health metrics | engineering productivity |
| Risk gate pack | intake, discovery, eval, release, adoption, benefits, scale gates | trustworthy AI governance |
| Blocked work taxonomy | blocker categories, metric, owner, response | flow improvement |
| Portfolio-to-platform trace | theme -> capability -> use case -> platform -> value event -> benefit | linking portfolio and architecture |
| Benefits realization loop | baseline, exposure, quality, effect, cost, finance sign-off | proof of value |
| Financial retail case | AML / customer service / credit / branch use case | industry application |
| Scale / stop memo | evidence, decision, conditions, stop triggers | mature governance judgment |
| Interview answer bank | answers to 6-8 advanced questions | interview communication |
5-minute narrative structure:
0:00-0:40 Problem definition
AI success cannot be measured by features shipped or model calls.
0:40-1:30 Approach
I manage AI as value streams: idea, evidence, safe release, adoption, benefits and scale / stop.
1:30-2:30 Metrics
I combine Flow Metrics, DORA, SPACE, risk gates and qualified value events.
2:30-3:30 Financial retail case
Use AML or customer service to show delivery flow, operations flow, risk controls and benefits.
3:30-4:30 Portfolio and platform
Show how repeated blockers become platform runway and how funding decisions change.
4:30-5:00 Close
The goal is not more AI activity. The goal is faster, safer, reusable and finance-recognized AI value.
Final memory card:
AI VSM = portfolio flow + product flow + platform flow + risk flow + benefit flow.
Do not manage AI by feature count or model calls.
Manage the path from idea evidence to safe release,
from release to adoption,
from adoption to qualified value events,
from value events to benefits realization,
and from benefits to portfolio scale / stop decisions.