AI Adoption Analytics / Behavior Change / Value Realization Playbook

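Goal: give Senior AI PMs, AI Architects and CBAP-level BAs an executable method for proving that AI in financial retail settings is genuinely adopted, changes how work is done, improves process quality and produces sustainable net value. Core principle: AI adoption is not training completion rate, login counts, prompt counts or license activation rate; adoption is evidenced change in work behavior plus realized value. Applicable scenarios: AML investigator adoption, contact-center agent-assist, KYC onboarding, credit ops, branch / relationship manager copilot, internal operations copilots and AI workflow automation.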

1. Target Audience

Audience	What they use this playbook to accomplish
Senior AI PM	Define adoption success criteria, the behavior funnel, scale/stop gates and the product improvement loop
AI Architect	Design telemetry, traces, schema, identity, workflow outcome joins and the evidence store
CBAP-level BA	Build the work-as-done baseline, change impact map, resistance taxonomy and process outcome model
AI Value Office	Feed adoption evidence into portfolio value realization, funding gates and finance sign-off
Operations Leader	Use adoption evidence to manage manager coaching, SOP adjustments, queue load and service quality
Risk / Control Partner	Review over-reliance, control override, human review load, complaint and exception evidence

2. Learning Objectives

After completing this playbook, you should be able to:

Turn AI usage metrics into an adoption event taxonomy.
Use a work-as-done baseline to identify where behavior actually changes in real work.
Design a telemetry schema that connects user action, workflow context, model version, control action and outcome.
Build a leading / lagging / risk / value / durability metric hierarchy.
Diagnose adoption with behavior funnels, cohort analysis and resistance signals.
Use ADKAR thinking to build a behavior-change operating model, without treating training as adoption.
Calculate value leakage, including human review load, rework, support, latency, control overhead and customer harm adjustment.
Build an evidence pack and a scale/stop decision memo for an AI use case.

3. Executive Summary

After an AI project goes live, the typical report looks like this:

licenses activated: 1,200
weekly active users: 870
prompts submitted: 42,000
average satisfaction: 4.2/5

These numbers do not prove that AI created value; they only prove that people touched the tool. Mature adoption analytics must prove the full chain:

Eligible workflow population
  -> real exposure
  -> qualified task use
  -> trust-calibrated human action
  -> changed work artifact or decision
  -> improved process flow / quality / control
  -> realized net value
  -> reinforced behavior over time

This playbook breaks adoption analytics into 12 execution assets:

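Asset	Purpose
Work-as-done baseline	Capture the real process and the current value baseline
Adoption event taxonomy	Define what counts as real adoption
Telemetry schema	Make adoption measurable, traceable and reviewable
Metrics hierarchy	Prevent usage metrics from masquerading as business value
Behavior funnel	Locate adoption drop-off
Cohort analysis	Identify differences by role, manager, case type and risk tier
Resistance signal map	Explain why users do not use, misuse or work around AI
Change saturation review	Judge whether the organization has capacity to absorb change
Outcome attribution model	Explain how outcome changes relate to AI
Value leakage model	Move from gross benefit to net realized value
Risk/control pack	Monitor over-reliance, override, review load and customer impact
Operating review loop	Turn evidence into product, process, control and management actions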

4. Source Anchors

These sources align the playbook's language on governance, management systems, change management, observability and engineering performance. This document does not provide legal advice; all governance content is framed as product, architecture and management evidence design.

Source	Link	Playbook use
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Organize AI adoption risk and evidence under Govern / Map / Measure / Manage
NIST AI RMF Playbook	https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook	Use its action-oriented review pattern to design evidence and monitoring routines
ISO/IEC 42001	https://www.iso.org/standard/81230.html	Borrow the AI management system language of objectives, operation, performance evaluation and improvement
Prosci ADKAR	https://www.prosci.com/blog/adkar-model	Diagnose behavior change through Awareness, Desire, Knowledge, Ability and Reinforcement
OpenTelemetry	https://opentelemetry.io/docs/	Apply traces, metrics and logs thinking to adoption observability design
DORA	https://dora.dev/	Use engineering performance and reliability thinking to connect release, learning and restore
FFIEC Management IT Handbook	https://ithandbook.ffiec.gov/it-booklets/management.aspx	Align with financial institution governance, risk identification, monitoring and reporting evidence

5. Conceptual Model

5.1 Adoption-to-Value Chain

Problem and baseline
  -> AI intervention
  -> exposure
  -> qualified use
  -> human trust action
  -> behavior change
  -> process quality change
  -> business outcome
  -> net value
  -> reinforcement

5.2 Definitions

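Term	Definition
Exposure	Target users have a real opportunity to see or invoke AI at an actual work step
Qualified use	The user applies AI to the target task, in the target case type and at the target process stage
Trust-calibrated action	The user correctly accepts, edits, rejects, escalates or overrides AI output
Behavior change	An observable change in work sequence, artifacts, decisions, handoffs or control execution
Workflow outcome	Process results such as cycle time, quality, rework, queue, customer experience and risk control
Net realized value	Benefit remaining after run, review, rework, support, risk and change costs are deducted
Durability	Adoption and outcomes persist after the novelty effect wears off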

5.3 The Senior Test

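If an AI use case cannot answer the following 6 questions, it should not proceed to scale:

What is the current work-as-done baseline?
Which events prove that users genuinely adopted AI in the target workflow?
Which artifact, judgment, handoff or control did that adoption change?
Which leading and lagging indicators prove that the process improved?
Do human review load, overrides, rework and cost-to-serve consume the value?
How will the organization reinforce the new behavior through manager cadence, SOPs, training and product improvement?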

6. Architecture Components

Component	Owner	Execution details
Workflow map	BA / Ops	AS-IS, work-as-done, exception path, control point, artifact map
Event taxonomy	AI PM / BA	Define exposure, intent, output, response, influence, control, outcome events
Instrumentation SDK	Architect / Engineering	Emit events with workflow context, model version and user action
Identity and cohort layer	Architect / Analytics	Role, team, manager, training wave, region, risk entitlement
Model and prompt registry	Platform / Architect	model_id, prompt_version, tool version, policy pack
Outcome connector	Data / Analytics	Join events to AHT, cycle time, quality, rework, STP, complaint, loss
Control evidence store	Risk / Architect	Overrides, escalations, QA defects, dual review, policy boundary hits
Adoption mart	Analytics	Curated tables for funnel, cohort, attribution and value leakage
Dashboard and evidence pack	PM / Value Office	Monthly operating pack and scale/stop memo
Operational learning loop	PM / Ops	Product backlog, SOP update, coaching, training, control tuning

6.1 Reference Data Flow

AI surface
  -> adoption event stream
  -> workflow context resolver
  -> event validation and privacy filtering
  -> adoption mart
  -> outcome and control joins
  -> behavior funnel / cohort / value analytics
  -> operating review and action backlog

6.2 Architecture Principles

Principle	Design implication
Context first	Every event carries workflow_id, stage, role, case type and risk tier
Version everything	model_id, prompt_version, policy_pack_version, SOP_version and feature_flag
Capture human judgment	accept/edit/reject/ignore/regenerate/override/escalate are first-class
Do not over-collect payload	Store event facts and references, not unnecessary customer content
Link to outcomes	Adoption metrics without outcome join are not value evidence
Preserve negative evidence	Rejection, complaint, defect and bypass signals drive learning
Reviewability	Metric definitions, lineage and sample drilldown must be inspectable

7. Work-as-Done Baseline Template

Use this before instrumentation. Do not start with the AI tool; start with how work happens today.

Field	Questions	Example: KYC onboarding
Workflow	Which end-to-end process?	New SMB account onboarding
Trigger	What starts the work?	Application submitted with documents
Actor	Who does the work?	KYC analyst, RM, onboarding ops, QA
Case mix	What types and complexity?	Sole proprietor, LLC, high-risk country exposure
Systems	Which systems are used?	CRM, document store, screening, core banking
Artifacts	What records are created?	deficiency notice, review note, approval record
Controls	Which control points matter?	sanctions, beneficial ownership, risk rating
Pain points	Where is work slow or poor?	repeated customer document chase
Informal work	What unofficial workarounds exist?	analyst checklist spreadsheet
Current metrics	What is baseline?	cycle time, first-pass completeness, rework
Failure modes	What causes defects?	outdated policy, missing doc, unclear ownership
Change capacity	What else is changing?	new onboarding policy and CRM migration

7.1 Baseline Evidence Sources

Source	What it proves
SME observation	Actual sequence, friction and judgment
Process logs	Timing, queue, handoff and rework
Case notes	Artifact quality and evidence gaps
QA samples	Defect type and severity
Manager coaching logs	Behavioral patterns and recurring issues
Complaint records	Customer harm or confusion
Policy and SOP	Expected controls and business rules
Informal tools review	Workarounds not visible in system logs

8. Adoption Event Taxonomy

8.1 Event Classes

Class	Required events	Why it matters
Exposure	AI panel shown, suggestion presented, feature available in eligible case	Proves opportunity to use
Intent	User opens assistant, asks task-specific question, requests summary	Proves user pull
Output	Summary, classification, recommendation, draft, next action generated	Proves AI response existed
Human response	Accept, edit, reject, ignore, regenerate	Proves trust and fit
Decision influence	Used in note, customer response, disposition, package, handoff	Proves workflow impact
Control action	Override, escalation, dual review, policy boundary hit	Proves governed use
Learning signal	Feedback reason, defect report, manager comment	Proves improvement signal
Outcome	Case closed, call completed, application approved, package passed QA	Proves process link
Reinforcement	Manager coaching, SOP update, training wave, team review	Proves behavior support

8.2 Event Naming Convention

Use this pattern:

<workflow>.<stage>.<ai_surface>.<event_action>

Examples:

Event name	Meaning
`aml.triage.case_summary.generated`	AML summary produced during triage
`aml.investigation.narrative.accepted_with_edit`	Investigator used AI narrative with edits
`contact_center.customer_response.suggestion.rejected`	Agent rejected suggested response
`kyc.document_review.completeness_flag.overridden`	Analyst overrode AI document flag
`credit_ops.package_review.condition.extracted`	Credit condition extracted into review package
`branch.rm_prep.next_action.escalated`	RM escalated AI next action due to policy boundary
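A naming convention is easier to keep when it is checked at ingestion or in CI. The following is a minimal Python sketch. It assumes every segment is lowercase snake_case and that there are exactly four segments. The regex and the helper names `is_valid_event_name` and `split_event_name` are hypothetical and are not part of any library.

```python
import re

# Exactly four lowercase snake_case segments:
# <workflow>.<stage>.<ai_surface>.<event_action>
EVENT_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*){3}")


def is_valid_event_name(name: str) -> bool:
    """Return True if the name follows the four-part naming convention."""
    return EVENT_NAME_PATTERN.fullmatch(name) is not None


def split_event_name(name: str) -> dict:
    """Split a valid event name into its four named parts."""
    if not is_valid_event_name(name):
        raise ValueError(f"event name does not follow the convention: {name!r}")
    workflow, stage, ai_surface, event_action = name.split(".")
    return {"workflow": workflow, "stage": stage,
            "ai_surface": ai_surface, "event_action": event_action}


print(split_event_name("kyc.document_review.completeness_flag.overridden"))
```

All six example names in the table pass this check. A three-segment name such as `aml.triage.generated`, or any name containing uppercase letters, is rejected.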

8.3 Qualified Adoption Event

Recommended definition:

A qualified adoption event occurs when an eligible user, in an eligible workflow stage and case type, uses an AI output to influence a governed work artifact, decision, handoff or customer interaction, with human action and control outcome captured.
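The definition can be turned into a testable predicate. This is a minimal Python sketch that assumes events are dicts using field names from the canonical event contract in section 9.1. The `Eligibility` class is hypothetical. The sketch also makes two readings that are assumptions: only `accept` and `edit` count as using the output, and a non-null `control_point_id` stands in for "control outcome captured".

```python
from dataclasses import dataclass

# Assumption: only these human actions mean the AI output was actually used.
INFLUENCING_ACTIONS = {"accept", "edit"}


@dataclass(frozen=True)
class Eligibility:
    """Hypothetical per-use-case eligibility scope."""
    roles: frozenset
    stages: frozenset
    case_types: frozenset


def is_qualified_adoption(event: dict, scope: Eligibility) -> bool:
    """Return True if one event record meets the qualified adoption definition."""
    # Eligible user, in an eligible workflow stage and case type.
    eligible = (
        event.get("role") in scope.roles
        and event.get("workflow_stage") in scope.stages
        and event.get("case_type") in scope.case_types
    )
    # AI output was accepted or edited into a governed downstream artifact.
    influenced = (
        event.get("user_action") in INFLUENCING_ACTIONS
        and event.get("downstream_artifact_id") is not None
    )
    # Control outcome captured (assumption: a linked control point id).
    control_captured = event.get("control_point_id") is not None
    return eligible and influenced and control_captured
```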

9. Data / Telemetry Schema

9.1 Canonical Event Contract

Field	Required	Description
event_id	Yes	Unique event id
event_time	Yes	Event timestamp
event_name	Yes	Taxonomy event name
event_class	Yes	Exposure, intent, output, response, influence, control, learning, outcome, reinforcement
user_id_hash	Yes	Pseudonymous worker id
role	Yes	Agent, investigator, analyst, RM, manager, QA
team_id	Yes	Team, branch, region or operations unit
manager_id_hash	Recommended	Enables manager effect analysis
cohort_id	Yes	Pilot wave, training wave or feature flag cohort
workflow_id	Yes	AML, KYC, contact center, credit ops, branch RM
workflow_stage	Yes	Triage, document review, customer response, QA, decision
case_id_hash	Yes	Pseudonymous case id
case_type	Yes	Alert type, call reason, product, onboarding type
case_complexity	Recommended	Low, medium, high or scoring band
risk_tier	Yes	Business risk tier
ai_surface	Yes	Panel, inline suggestion, draft generator, policy search
model_id	Yes	Model registry id
prompt_version	Yes	Prompt or policy pack version
tool_ids	Recommended	Tools or connectors invoked
output_type	Yes	Summary, recommendation, draft, classification, next action
user_action	Yes	Accept, edit, reject, ignore, regenerate, override, escalate
edit_distance_band	Recommended	None, light, material, rewrite
reason_code	Recommended	Useful, inaccurate, incomplete, unsafe, policy unclear, slow, irrelevant
control_point_id	Recommended	Link to control or policy boundary
override_reason	Conditional	Required when override occurs
human_review_required	Yes	True or false
human_review_minutes	Recommended	Review load
downstream_artifact_id	Recommended	Note, letter, case record or decision package
outcome_event_id	Recommended	Link to process outcome
latency_ms	Recommended	Response latency
cost_estimate	Recommended	Unit cost estimate
privacy_class	Yes	Event-only, sensitive-reference, restricted
retention_class	Yes	Analytics, business-record-link, control-evidence
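The following is a minimal validation sketch for this contract, assuming events arrive as Python dicts. It enforces the fields marked Yes and the single Conditional rule: `override_reason` must be present when an override occurs. The lowercase value sets are an assumption derived from the Description column. `validate_event` is a hypothetical helper.

```python
# Fields marked "Yes" in the contract table.
REQUIRED_FIELDS = (
    "event_id", "event_time", "event_name", "event_class", "user_id_hash",
    "role", "team_id", "cohort_id", "workflow_id", "workflow_stage",
    "case_id_hash", "case_type", "risk_tier", "ai_surface", "model_id",
    "prompt_version", "output_type", "user_action", "human_review_required",
    "privacy_class", "retention_class",
)
USER_ACTIONS = {"accept", "edit", "reject", "ignore", "regenerate", "override", "escalate"}
PRIVACY_CLASSES = {"event-only", "sensitive-reference", "restricted"}


def validate_event(event: dict) -> list:
    """Return contract violations for one event; an empty list means it passes."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if event.get(name) in (None, "")]
    action = event.get("user_action")
    if action and action not in USER_ACTIONS:
        errors.append(f"unknown user_action: {action}")
    privacy = event.get("privacy_class")
    if privacy and privacy not in PRIVACY_CLASSES:
        errors.append(f"unknown privacy_class: {privacy}")
    # Conditional field: an override must always carry a reason.
    if action == "override" and not event.get("override_reason"):
        errors.append("override_reason is required when user_action is override")
    return errors
```

The function returns a list of violations instead of raising on the first error. That way, schema drift and missing-context rates can both be reported from the same check.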

9.2 OpenTelemetry Mapping

Adoption concept	Observability mapping
Case journey	Trace
Workflow step	Span
AI call	Span with model and prompt attributes
User action	Event on span
Control override	Event with control attributes
Outcome	Linked span or downstream event
Aggregate adoption	Metric
Defect or complaint	Log/event with trace link

Example trace structure:

trace: kyc_onboarding_case_abc
  span: document_review
    event: ai_completeness_check.generated
    event: analyst_flag.accepted
  span: customer_deficiency_notice
    event: ai_notice_draft.edited
  span: qa_review
    event: qa_passed
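Before every surface emits native OpenTelemetry spans, the same trace shape can be approximated from flat adoption events. This is a minimal pure-Python sketch. It assumes each event carries `case_id_hash`, `workflow_stage`, `event_name` and a sortable `event_time`. `build_trace_view` is a hypothetical helper, not an OpenTelemetry API.

```python
from collections import defaultdict


def build_trace_view(events: list) -> dict:
    """Nest flat adoption events as case (trace) -> workflow stage (span) -> event names."""
    traces = defaultdict(lambda: defaultdict(list))
    # Sorting by time first means spans appear in the order their first event occurred.
    for ev in sorted(events, key=lambda e: e["event_time"]):
        traces[ev["case_id_hash"]][ev["workflow_stage"]].append(ev["event_name"])
    return {case: dict(spans) for case, spans in traces.items()}
```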

10. Metrics Hierarchy

10.1 Metric Stack

Layer	Metrics	Owner
Telemetry quality	event completeness, missing context, join rate, schema drift	Architect / Analytics
Exposure	eligible users exposed, eligible case exposure, workflow placement coverage	PM / Analytics
Qualified adoption	qualified use rate, returning qualified use, case penetration	PM
Trust and behavior	accept/edit/reject mix, edit distance, override, escalation, artifact reuse	PM / BA
Flow and quality	cycle time, AHT, queue aging, first-pass quality, rework, QA defects	Ops
Risk and control	over-reliance, under-reliance, policy boundary hits, complaint linkage	Risk / QA
Value	net hours released, cost-to-serve, loss reduction, conversion, complaint reduction	Value Office / Finance
Durability	4/8/12-week retention, manager variance, post-release stability	PM / Ops
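Telemetry quality comes first in the stack because every layer above it depends on it. The following minimal sketch covers two of the listed measures. It assumes the context fields named in the architecture principles (6.2) and a set of known outcome ids from the outcome connector. `telemetry_quality` is hypothetical. Using all events as the join-rate denominator is also an assumption; callers may prefer to pre-filter to influence events.

```python
def telemetry_quality(events: list, outcome_ids: set,
                      context_fields=("workflow_id", "workflow_stage", "role",
                                      "case_type", "risk_tier")) -> dict:
    """Context completeness and outcome join rate for a batch of adoption events."""
    if not events:
        return {"context_completeness": 0.0, "outcome_join_rate": 0.0}
    # An event is complete only if every context field is present and non-empty.
    complete = sum(
        all(ev.get(field) not in (None, "") for field in context_fields)
        for ev in events
    )
    # An event joins if its outcome_event_id matches a known outcome record.
    joined = sum(
        ev.get("outcome_event_id") is not None and ev["outcome_event_id"] in outcome_ids
        for ev in events
    )
    return {"context_completeness": complete / len(events),
            "outcome_join_rate": joined / len(events)}
```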

10.2 Leading and Lagging Indicators

Indicator type	Example	Decision use
Leading	eligible exposure, qualified use, light-edit acceptance, feedback density	Improve product and enablement
Behavior	artifact reuse, reduced manual search, correct escalation, SOP-aligned action	Confirm work is changing
Flow	AHT, cycle time, queue depth, first-pass quality, rework	Confirm process improvement
Risk	defect rate, control override, complaint linkage, review burden	Confirm governed adoption
Lagging	cost reduction, revenue lift, loss reduction, customer retention	Confirm value realization
Durability	cohort retention after 8-12 weeks, manager variance, version stability	Confirm scale readiness
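Durability retention can be computed per rollout cohort. This minimal sketch rests on three assumptions. Week 0 is each user's first rollout week. A user counts as active in a week if they logged at least one qualified adoption event that week. Retention is measured at each checkpoint week rather than over a rolling window. `cohort_retention` is hypothetical.

```python
def cohort_retention(active_weeks: dict, checkpoints=(4, 8, 12)) -> dict:
    """Share of users active in week 0 who are still active at each checkpoint week."""
    starters = [user for user, weeks in active_weeks.items() if 0 in weeks]
    if not starters:
        return {week: 0.0 for week in checkpoints}
    return {
        week: sum(week in active_weeks[user] for user in starters) / len(starters)
        for week in checkpoints
    }
```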

10.3 Metric Guardrails

Metric	Must not be interpreted alone
High prompt count	Could mean confusion or poor output
High accept rate	Could mean automation bias
Low override rate	Could mean users do not understand controls
AHT reduction	Could hide repeat contact or QA rework
Time saved survey	Could ignore review load and support cost
High NPS from users	Could coexist with customer harm

11. Behavior Change Model

11.1 ADKAR-to-Evidence Map

ADKAR stage	Execution evidence	Analytics signal
Awareness	Managers communicate why workflow changes	Awareness pulse, team briefing completion
Desire	Users believe AI helps their work and does not punish them	opt-in demand, low resistance, champion pull
Knowledge	Users know when to use, avoid, escalate and override	correct reason codes, policy quiz, guidance views
Ability	Users perform the new workflow in real cases	qualified completion, light-edit acceptance, reduced rework
Reinforcement	Managers, SOP and metrics reinforce new behavior	returning use, coaching logs, SOP_version adoption

11.2 Resistance Signal Taxonomy

Signal	Diagnostic question	Response
Ignore	Is AI shown at the wrong time?	Move trigger closer to decision point
Reject	Is output inaccurate, irrelevant or untrusted?	Improve retrieval, prompt, source evidence
Regenerate	Is user trying to force a better answer?	Add structured task templates
Heavy edit	Is output format mismatched to artifact?	Redesign output contract
Override	Is user bypassing control or correcting AI?	Require reason and review patterns
Shadow AI	Is sanctioned tool missing a real need?	Bring unmet need into roadmap
Low returning use	Was initial experience poor or reinforcement absent?	Fix first-run quality and manager coaching
Team variance	Is adoption manager-led?	Add manager enablement and peer learning
Complaint rise	Is AI improving internal speed at customer expense?	Stop or restrict affected scenario
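The action-level signals in this table can be read directly from telemetry. This minimal sketch assumes `user_action` and `edit_distance_band` values from the event contract in 9.1. It treats the `material` and `rewrite` bands as heavy edits and uses a hypothetical 20% share threshold. The response text is copied from the table. Shadow AI, low returning use, team variance and complaint signals need other data sources, so they are left out.

```python
from collections import Counter

# Action-level responses copied from the resistance signal table.
RESPONSES = {
    "ignore": "Move trigger closer to decision point",
    "reject": "Improve retrieval, prompt, source evidence",
    "regenerate": "Add structured task templates",
    "heavy_edit": "Redesign output contract",
    "override": "Require reason and review patterns",
}


def to_signal(event: dict):
    """Map one event to a resistance signal name, or None if it is not one."""
    action = event.get("user_action")
    if action == "edit" and event.get("edit_distance_band") in {"material", "rewrite"}:
        return "heavy_edit"
    return action if action in RESPONSES else None


def resistance_report(events: list, threshold: float = 0.2) -> list:
    """Return (signal, share, response) for signals at or above threshold, largest first."""
    if not events:
        return []
    counts = Counter(signal for signal in map(to_signal, events) if signal)
    rows = [(signal, n / len(events), RESPONSES[signal])
            for signal, n in counts.items() if n / len(events) >= threshold]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```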

11.3 Change Saturation Review

Score each factor from 1 to 5 before scaling:

Factor	Evidence
Concurrent initiatives	Core migration, CRM change, policy update, org redesign
Queue pressure	Backlog, staffing shortage, overtime
Risk pressure	Active audit issue, recent incident, regulatory remediation
Manager capacity	Manager span, turnover, coaching time
Training load	Other mandatory training, certification, release fatigue
Process stability	SOP changes, exception volume, policy ambiguity

Decision rule:

High value + high change saturation = controlled rollout with manager capacity plan, not broad launch.
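Writing the scoring and the decision rule down keeps reviews consistent across use cases. This minimal sketch assumes each factor is scored so that 5 means the most strain on adoption; for manager capacity, 5 therefore means the least capacity. The thresholds (a mean of 3.5, or any single factor at 5) are hypothetical and should be calibrated locally.

```python
SATURATION_FACTORS = ("concurrent_initiatives", "queue_pressure", "risk_pressure",
                      "manager_capacity", "training_load", "process_stability")


def saturation_level(scores: dict) -> str:
    """Classify change saturation from 1-5 factor scores, where 5 means most strain."""
    values = [scores[factor] for factor in SATURATION_FACTORS]  # KeyError if unscored
    if any(not 1 <= value <= 5 for value in values):
        raise ValueError("each factor must be scored from 1 to 5")
    mean = sum(values) / len(values)
    # Hypothetical thresholds: any factor at 5, or a mean of 3.5 or more, is "high".
    if max(values) == 5 or mean >= 3.5:
        return "high"
    return "moderate" if mean >= 2.5 else "low"


def requires_controlled_rollout(high_value: bool, saturation: str) -> bool:
    """Decision rule: high value plus high saturation means controlled rollout, not broad launch."""
    return high_value and saturation == "high"
```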

12. Behavior Funnel

12.1 Standard Funnel

Step	Measure	Drop-off interpretation
Eligible	Users and cases in target population	Scope and entitlement
Exposed	AI shown in target workflow	Integration and placement
Engaged	User opens or responds	Initial relevance and desire
Assisted	AI returns usable output	Model and retrieval quality
Influenced	User accepts, edits, uses in artifact or decision	Trust and workflow fit
Completed	Task completed in governed process	Downstream process compatibility
Improved	Flow, quality or control improves	Business value potential
Repeated	User returns in next eligible case	Durability
Reinforced	Manager or SOP supports behavior	Organizational adoption
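Step conversion and cumulative conversion show where the funnel leaks. This minimal sketch assumes every step is counted in the same unit (for example, eligible cases) so that the steps are nested. The step names mirror the table. `funnel_conversion` and `biggest_drop_off` are hypothetical helpers.

```python
FUNNEL_STEPS = ("eligible", "exposed", "engaged", "assisted", "influenced",
                "completed", "improved", "repeated", "reinforced")


def funnel_conversion(counts: dict) -> list:
    """Return (step, count, step_conversion, cumulative_conversion) rows in funnel order."""
    top = counts.get(FUNNEL_STEPS[0], 0)
    previous = top
    rows = []
    for step in FUNNEL_STEPS:
        n = counts.get(step, 0)
        step_rate = n / previous if previous else 0.0
        cumulative = n / top if top else 0.0
        rows.append((step, n, step_rate, cumulative))
        previous = n
    return rows


def biggest_drop_off(rows: list) -> str:
    """Name the step with the lowest conversion from the step before it."""
    return min(rows[1:], key=lambda row: row[2])[0]
```

The lowest step-to-step rate matters more than the lowest cumulative rate. It points at the single transition where a product, placement or enablement fix is most likely to help.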

12.2 Funnel Example: Contact-Center Agent Assist

Step	Metric
Eligible	Calls in target call reasons
Exposed	Percent of eligible calls where suggestion panel appears before response point
Engaged	Agent views or expands suggested response
Assisted	Suggestion returned with policy citation within latency SLO
Influenced	Agent accepts or edits suggestion into response
Completed	Call completes without transfer caused by AI confusion
Improved	AHT and after-call work decrease while FCR and QA scores do not decline
Repeated	Agent uses assist in next 5 eligible calls
Reinforced	Team manager reviews quality and coaching signals weekly
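The Repeated rule can be read two ways: at least one use, or use in all of the next five eligible calls. This minimal sketch makes the choice explicit through a `min_uses` parameter. It assumes a chronological list of booleans per agent, where True means assist was used on that eligible call. The function name is hypothetical.

```python
def repeated_within_window(used_assist: list, window: int = 5, min_uses: int = 1) -> bool:
    """Check repeat use in the `window` eligible calls after the first assisted call."""
    if True not in used_assist:
        return False  # never used assist, so there is nothing to repeat
    first = used_assist.index(True)
    next_calls = used_assist[first + 1:first + 1 + window]
    return sum(next_calls) >= min_uses
```

Setting `min_uses=5` gives the strict reading.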

13. Cohort Analysis

13.1 Required Cohorts

Cohort	Why it matters
Role	Agent, investigator, analyst, RM and manager have different adoption paths
Experience	New users may need guidance; experts need precision and speed
Manager/team	Reinforcement is often manager-mediated
Region/branch	Local processes, customers and capacity differ
Case type	Adoption in easy cases does not prove hard-case adoption
Risk tier	High-risk adoption needs stronger control evidence
Training wave	Separates enablement from product quality
Feature flag	Supports staged rollout and attribution
Model/prompt version	Detects quality changes and trust debt

13.2 Cohort Dashboard Rows

Row	Columns
Team	eligible cases, exposed cases, qualified use, influence rate, quality, risk, value
Manager	returning use, resistance reasons, coaching completion, defect trend
Experience	time to first qualified use, edit mix, rework, confidence
Case type	penetration, accept/edit/reject, cycle time, QA defect, complaint
Risk tier	adoption, override, escalation, dual review, defect severity
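Cohort rows are group-by aggregations over the adoption mart. This minimal sketch assumes one record per eligible case, each with a cohort attribute and a boolean `qualified_use` flag. `rate_by_cohort` and `cohort_spread` are hypothetical. A wide spread is a prompt for investigation, not proof of a manager effect.

```python
from collections import defaultdict


def rate_by_cohort(cases: list, cohort_key: str) -> dict:
    """Qualified-use rate per cohort value, e.g. per manager or per case_type."""
    totals = defaultdict(int)
    qualified = defaultdict(int)
    for case in cases:
        key = case[cohort_key]
        totals[key] += 1
        qualified[key] += bool(case["qualified_use"])
    return {key: qualified[key] / totals[key] for key in totals}


def cohort_spread(rates: dict) -> float:
    """Gap between the best and worst cohort rate."""
    return max(rates.values()) - min(rates.values()) if rates else 0.0
```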

14. Outcome Attribution

14.1 Evidence Options

Method	Use when	Required evidence
Matched cohort	Similar teams or cases exist	matching variables, baseline balance
Stepped rollout	Rollout can be phased	rollout schedule, no major confounder
Difference-in-differences	Pre/post and comparison group exist	trend check, external change log
Interrupted time series	Long stable history exists	seasonality, policy and staffing changes
Shadow mode	Need pre-production quality evidence	historical case set, expert judgment
Workflow replay	Need to compare the AI-assisted path with the current path	replay rules, representative samples
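Difference-in-differences compares the change in the AI-exposed group with the change in a comparison group over the same window. The following is a minimal sketch on group means, using hypothetical cycle-time figures in hours. The estimate is only credible if pre-rollout trends are parallel. The `pre_trend_gap` helper checks that condition in a deliberately crude way, using the average change per period.

```python
def diff_in_diff(treated_pre: float, treated_post: float,
                 comparison_pre: float, comparison_post: float) -> float:
    """DiD estimate: treated change minus comparison change (negative = reduction)."""
    return (treated_post - treated_pre) - (comparison_post - comparison_pre)


def mean_change(series: list) -> float:
    """Average period-over-period change across a pre-rollout series."""
    if len(series) < 2:
        raise ValueError("need at least two periods")
    return (series[-1] - series[0]) / (len(series) - 1)


def pre_trend_gap(treated_series: list, comparison_series: list) -> float:
    """Gap in pre-period trends; a value near zero is consistent with parallel trends."""
    return mean_change(treated_series) - mean_change(comparison_series)
```

Suppose AI-assisted teams moved from 48 to 40 hours and comparison teams moved from 47 to 45. Then `diff_in_diff(48, 40, 47, 45)` gives -6. Roughly 6 of the 8-hour drop is not explained by the shared trend.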

14.2 Attribution Checklist

Item	Question
Baseline period	Is baseline long enough and representative?
Eligibility	Who and what cases were included?
Exposure	Did eligible users actually see AI?
Confounders	Staffing, campaign, policy, queue, seasonality, system changes?
Quality guardrail	Did quality, complaint or risk degrade?
Cost adjustment	Are review load and AI run costs included?
Durability	Does effect persist after initial novelty?
Decision	Does evidence support scale, redesign, restrict or stop?

15. Value Realization and Leakage

15.1 Benefits Register

Field	Definition
use_case_id	Unique AI use case
workflow_id	Target workflow
baseline_metric	Current performance
target_metric	Expected improvement
benefit_type	Cost, revenue, risk reduction, quality, capacity, customer experience
attribution_method	Cohort, rollout, time series, expert-reviewed
gross_benefit	Benefit before cost and leakage
AI_run_cost	Model, infra, license, vendor
review_load_cost	Human review and QA time
rework_cost	Corrections, reopened cases, repeat contact
support_change_cost	Training, manager coaching, support tickets
risk_adjustment	Control remediation, complaint, incident
net_realized_value	Gross benefit minus costs and leakage
finance_status	Draft, challenged, accepted, rejected
scale_decision	Scale, continue pilot, redesign, restrict, stop

15.2 Value Leakage Patterns

Leakage	Detection	Action
Review load	Reviewer minutes per AI-assisted case rise	Improve confidence, sampling strategy, output quality
Rework	Reopen or correction rate rises	Fix artifact contract and source evidence
Support burden	Help desk or manager questions rise	Improve in-product guidance
Latency	User abandons or works around AI	Optimize model route or precompute
Control friction	Overrides and false positives rise	Tune policy boundary and escalation
Customer harm	Complaint or repeat contact rises	Restrict scenario and review content
Cost creep	Unit cost grows with scale	Cost guardrails and routing
Trust debt	Adoption drops after incident	Recovery plan and quality evidence

15.3 Net Value Formula

net_realized_value
= gross_benefit
- AI_run_cost
- human_review_load_cost
- rework_cost
- support_and_change_cost
- risk_and_control_cost
- customer_harm_adjustment
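The formula can be kept as one ledger per period. This minimal sketch uses hypothetical amounts and the term names from the formula (lowercased for Python). `leakage_ratio` is an added convenience measure, the share of gross benefit consumed by costs and leakage; it is not a metric defined elsewhere in this playbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueLedger:
    """One period of benefits-register amounts, all in the same currency."""
    gross_benefit: float
    ai_run_cost: float
    human_review_load_cost: float
    rework_cost: float
    support_and_change_cost: float
    risk_and_control_cost: float
    customer_harm_adjustment: float

    def total_leakage(self) -> float:
        """Sum of every cost and leakage term subtracted in the formula."""
        return (self.ai_run_cost + self.human_review_load_cost + self.rework_cost
                + self.support_and_change_cost + self.risk_and_control_cost
                + self.customer_harm_adjustment)

    def net_realized_value(self) -> float:
        """Gross benefit minus costs and leakage."""
        return self.gross_benefit - self.total_leakage()

    def leakage_ratio(self) -> float:
        """Share of gross benefit consumed by costs and leakage (0.0 if no benefit)."""
        return self.total_leakage() / self.gross_benefit if self.gross_benefit else 0.0


# Hypothetical pilot figures, for illustration only.
pilot = ValueLedger(500_000, 60_000, 120_000, 40_000, 30_000, 20_000, 10_000)
print(pilot.net_realized_value(), pilot.leakage_ratio())
```

With these figures, 280,000 of the 500,000 gross benefit leaks away. Human review load is the largest single term, which is exactly the pattern section 15.2 warns about.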

16. Risk / Control Framework

16.1 Risk Signals

Risk	Signal	Control
Over-reliance	High accept, low edit, rising defects	sampling QA, rationale, high-risk friction
Under-reliance	High reject despite good quality	trust evidence, workflow placement, coaching
Control bypass	Override without reason	mandatory reason, manager review
Hidden review burden	Review queue grows	end-to-end capacity dashboard
Policy boundary drift	AI answers outside allowed domain	policy engine, refusal, escalation
Customer harm	Complaint, repeat contact, correction	scenario restriction, content review
Uneven access	Low exposure in certain branches	entitlement and training remediation
Model version trust decay	Adoption drops after release	version rollback and communication
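The first two rows of this table can be automated as review flags. This minimal sketch assumes three inputs for one cohort and period: counts of `user_action` values, a QA pass rate, and defect rates for two consecutive periods. Every threshold in it is hypothetical and should be set from baseline data.

```python
def reliance_signals(action_counts: dict, defect_rate_now: float,
                     defect_rate_prev: float, qa_pass_rate: float) -> list:
    """Flag over-reliance and under-reliance for one cohort and period."""
    total = sum(action_counts.values())
    if total == 0:
        return []
    share = {action: n / total for action, n in action_counts.items()}
    signals = []
    # Over-reliance: high accept, low edit, rising defects (hypothetical thresholds).
    if (share.get("accept", 0) >= 0.8 and share.get("edit", 0) <= 0.1
            and defect_rate_now > defect_rate_prev):
        signals.append("over_reliance")
    # Under-reliance: high reject despite good measured quality (hypothetical thresholds).
    if share.get("reject", 0) >= 0.4 and qa_pass_rate >= 0.95:
        signals.append("under_reliance")
    return signals
```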

16.2 Control Override Classification

Classification	Meaning	Review action
Corrective override	User corrected AI error	Feed defect into model/product backlog
Risk override	User bypassed control	Manager/risk review
Policy ambiguity	User could not determine boundary	Clarify SOP and policy evidence
Workflow mismatch	AI suggestion did not fit actual process	Redesign output or trigger
Emergency override	Used due to service or customer urgency	Review exception governance

16.3 Human Review Load

Human-in-the-loop is not free. Track:

Metric	Why
review_minutes_per_case	Measures hidden labor
reviewer_queue_depth	Detects backlog transfer
review_defect_yield	Shows whether review finds real issues
review_sampling_rate	Controls auditability and cost
review_escalation_rate	Shows uncertainty and boundary issues
review_reversal_rate	Shows AI or user judgment quality
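Most of these measures can be derived from review records. This minimal sketch assumes one record per reviewed AI-assisted case, holding review minutes and three boolean outcomes. Review minutes are spread over all AI-assisted cases, so a lower sampling rate lowers per-case load; this is an assumption. `reviewer_queue_depth` is a point-in-time count from the queue system and is left out. The helper name is hypothetical.

```python
def review_load_metrics(reviews: list, ai_assisted_cases: int) -> dict:
    """Review-load measures from per-review records for one period."""
    if not reviews or ai_assisted_cases <= 0:
        raise ValueError("need at least one review and one AI-assisted case")
    n = len(reviews)
    return {
        # Spread over all AI-assisted cases, not only the reviewed ones.
        "review_minutes_per_case": sum(r["minutes"] for r in reviews) / ai_assisted_cases,
        "review_defect_yield": sum(r["found_defect"] for r in reviews) / n,
        "review_sampling_rate": n / ai_assisted_cases,
        "review_escalation_rate": sum(r["escalated"] for r in reviews) / n,
        "review_reversal_rate": sum(r["reversed"] for r in reviews) / n,
    }
```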

17. Operating Model

17.1 Review Cadence

Cadence	Forum	Decision
Daily	Ops pulse	Blockers, latency, incidents, support questions
Weekly	Adoption working session	Funnel drop-off, resistance, product fixes
Biweekly	Risk/control review	Overrides, defects, complaints, review load
Monthly	Value realization review	Benefit, leakage, finance challenge, scale/stop
Quarterly	Architecture and portfolio review	Platform reuse, telemetry maturity, lifecycle

17.2 RACI

Activity	AI PM	BA	Architect	Ops	Risk	Analytics	Finance
Define adoption taxonomy	A/R	R	C	C	C	C	I
Build work-as-done baseline	C	A/R	I	R	C	C	I
Implement telemetry	C	C	A/R	I	C	R	I
Validate data quality	C	C	R	I	C	A/R	I
Run behavior funnel review	A/R	R	I	R	C	R	I
Manage resistance actions	A/R	R	I	R	I	C	I
Review risk/control	C	C	C	R	A/R	C	I
Calculate value	R	C	I	C	C	C	A/R
Decide scale/stop	A/R	C	C	C	C	C	C
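A common RACI convention is exactly one Accountable role per activity. This minimal sketch transcribes the table above and checks that convention. The data layout, one space-separated code per role in `ROLES` order, is an assumption for illustration.

```python
ROLES = ("AI PM", "BA", "Architect", "Ops", "Risk", "Analytics", "Finance")

# Rows transcribed from the RACI table, one code per role in ROLES order.
RACI = {
    "Define adoption taxonomy": "A/R R C C C C I",
    "Build work-as-done baseline": "C A/R I R C C I",
    "Implement telemetry": "C C A/R I C R I",
    "Validate data quality": "C C R I C A/R I",
    "Run behavior funnel review": "A/R R I R C R I",
    "Manage resistance actions": "A/R R I R I C I",
    "Review risk/control": "C C C R A/R C I",
    "Calculate value": "R C I C C C A/R",
    "Decide scale/stop": "A/R C C C C C C",
}


def accountable_roles(codes: str) -> list:
    """Roles whose RACI code includes A (Accountable)."""
    return [role for role, code in zip(ROLES, codes.split()) if "A" in code.split("/")]


def accountability_gaps(raci: dict) -> list:
    """Activities with a malformed row or not exactly one Accountable role."""
    return [activity for activity, codes in raci.items()
            if len(codes.split()) != len(ROLES) or len(accountable_roles(codes)) != 1]
```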

17.3 Operational Learning Loop

Step	Output
Observe	Adoption telemetry, quality, review and outcome evidence
Diagnose	Funnel drop-off, cohort variance, resistance signal
Decide	Product, workflow, control, training or manager action
Change	Release, SOP update, coaching, control tuning
Reinforce	Team cadence and performance conversation
Measure	Same metrics after change
Record	Decision log and evidence pack update

18. Evidence Pack

18.1 Evidence Pack Structure

Section	Content
Executive summary	Adoption, behavior, risk, value, decision
Problem and baseline	Work-as-done, pain points, baseline metrics
Intervention	AI capability, workflow integration, model/prompt version
Event taxonomy	Qualified adoption definition and events
Telemetry quality	Completeness, join rate, known limitations
Behavior funnel	Step conversion and drop-off
Cohort analysis	Role, manager, team, case type, risk tier
Outcome attribution	Method, baseline, confounders, confidence
Value realization	Gross benefit, leakage, net value
Risk/control	Over-reliance, override, review load, defects, complaints
User trust	Reason codes, qualitative themes, sentiment
Operating actions	Backlog, SOP, training, manager coaching
Decision	Scale, continue pilot, redesign, restrict or stop

18.2 Evidence Quality Rubric

Level	Meaning
Weak	Usage-only, no baseline, no outcome join
Developing	Baseline and adoption funnel exist, limited cohort analysis
Strong	Cohort, outcome, risk, review load and value leakage included
Executive-ready	Finance-challenged value, risk-reviewed controls, clear scale/stop action
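The rubric is cumulative, so it can be expressed as ordered checks. This minimal sketch assumes the evidence pack is summarized as boolean flags. The flag names are hypothetical shorthand for the items in the Meaning column.

```python
def _has(pack: dict, *keys: str) -> bool:
    """True if every named evidence item is present and truthy."""
    return all(pack.get(key) for key in keys)


def evidence_level(pack: dict) -> str:
    """Map evidence-pack contents to the rubric level; each level requires the previous one."""
    if not _has(pack, "baseline", "adoption_funnel"):
        return "Weak"
    if not _has(pack, "cohort_analysis", "outcome_join", "risk_review",
                "review_load", "value_leakage"):
        return "Developing"
    if not _has(pack, "finance_challenged_value", "risk_reviewed_controls",
                "scale_stop_action"):
        return "Strong"
    return "Executive-ready"
```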

19. Execution Roadmap

Days 1-15: Baseline and Taxonomy

Day range	Work
1-3	Select one high-value workflow and define business owner
4-6	Build work-as-done baseline from observation, logs, QA and SME review
7-9	Define adoption event taxonomy and qualified adoption event
10-12	Define metrics hierarchy and risk/control signals
13-15	Review with ops, risk, architecture and finance

Days 16-35: Instrumentation and First Dashboard

Day range	Work
16-20	Implement event contract with workflow context and model version
21-24	Connect identity, cohort, feature flag and training wave
25-28	Join process outcome and QA/control data
29-32	Build behavior funnel and cohort dashboard
33-35	Validate telemetry completeness and metric definitions

Days 36-60: Pilot Evidence

Day range	Work
36-42	Run pilot with manager reinforcement and support path
43-48	Analyze resistance signals, edit/reject/override reasons
49-53	Calculate review load, rework, cost and early value leakage
54-57	Run risk/control review
58-60	Produce pilot evidence pack and decision recommendation

Days 61-90: Scale, Redesign or Stop

Day range	Work
61-70	Expand only if risk and value evidence meet the scale gate
71-78	Add attribution method and durability monitoring
79-84	Update SOP, training, manager coaching and product backlog
85-88	Finance challenge net value and unit economics
89-90	Finalize scale/stop memo and portfolio recommendation

20. Financial Retail Examples

20.1 AML Investigator Copilot

Area	Execution
Qualified adoption	Investigation summary used in narrative or evidence review for eligible alert
Leading indicators	Summary request rate, source citation view, light-edit narrative acceptance
Lagging indicators	Alert aging, QA correction, SAR prep quality, re-open rate
Risk controls	High-risk alert dual review, source verification, override reason
Value leakage	Senior reviewer time, false comfort, rework due to incomplete narrative
Scale rule	Scale only if aging improves and QA/control defects do not rise
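Scale rules like this one are simple comparisons, and writing them down removes ambiguity at review time. This minimal sketch assumes every metric is lower-is-better (alert aging in days, defect rates). It also assumes the before and after values come from the attribution method chosen in section 14. Metric names are hypothetical.

```python
def passes_scale_gate(before: dict, after: dict,
                      must_improve=(), must_not_rise=()) -> bool:
    """True if every must_improve metric fell and no must_not_rise metric increased."""
    improved = all(after[metric] < before[metric] for metric in must_improve)
    guarded = all(after[metric] <= before[metric] for metric in must_not_rise)
    return improved and guarded
```

For the AML rule, `must_improve` would hold alert aging, and `must_not_rise` would hold the QA and control defect rates. The same function can encode the other scale rules in this section once their metrics are expressed as lower-is-better.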

20.2 Contact-Center Agent Assist

Area	Execution
Qualified adoption	Suggested response used in target call reason with policy citation
Leading indicators	Suggestion view, accept-with-edit, latency under SLO
Lagging indicators	AHT, FCR, repeat contact, QA score, complaint
Risk controls	Sensitive topic handoff, script boundary, supervisor sampling
Value leakage	After-call correction, customer repeat contact, QA review burden
Scale rule	Scale by call reason, not across the whole contact center

20.3 KYC Onboarding Assistant

Area	Execution
Qualified adoption	AI document completeness flag used before customer chase
Leading indicators	Completeness check rate, deficiency draft edit rate
Lagging indicators	First-pass completeness, cycle time, customer chase count
Risk controls	High-risk entity review, sanctions and beneficial ownership controls
Value leakage	False deficiency notices, analyst re-review, customer frustration
Scale rule	Scale only where false deficiency rate and remediation do not rise

20.4 Credit Ops Reviewer

Area	Execution
Qualified adoption	AI extraction used in package review, not final credit judgment
Leading indicators	Extracted condition reviewed, collateral summary edited
Lagging indicators	First-pass package quality, approval rework, condition miss rate
Risk controls	Human decision owner, exception rationale, QA sampling
Value leakage	Analyst over-review, downstream correction, risk review escalation
Scale rule	Scale after package quality improves across product cohorts

20.5 Branch / Relationship Manager Copilot

Area	Execution
Qualified adoption	RM uses permitted insight to prepare client follow-up
Leading indicators	Meeting prep use, next-action capture, follow-up completion
Lagging indicators	Retention, qualified referral, customer satisfaction, complaint
Risk controls	Advice boundary, disclosure, approved product language
Value leakage	Compliance review, unsuitable suggestion correction, trust damage
Scale rule	Scale by relationship segment and product boundary

21. Templates

21.1 Adoption Event Card

Field	Fill with concrete value
Event name	`workflow.stage.surface.action`
Event class	Exposure / intent / output / response / influence / control / learning / outcome / reinforcement
Workflow	Named workflow
Stage	Exact process step
Eligible users	Roles and cohorts
Eligible cases	Case types and risk tiers
Human action	Accept, edit, reject, ignore, regenerate, override, escalate
Business artifact	Note, decision, response, package, handoff
Control point	Policy or control id
Outcome link	Downstream result
Misinterpretation risk	How this event could be over-read

21.2 Behavior Funnel Dashboard

Section	Metrics
Population	eligible users, eligible cases, cohort mix
Exposure	surface shown, workflow availability
Engagement	open, view, ask, response
Influence	accept, edit, reject, artifact use
Completion	task completed, handoff completed
Improvement	cycle, quality, rework, customer result
Risk	override, escalation, defect, complaint
Durability	4/8/12-week repeat use

21.3 Monthly Operating Review Agenda

Agenda item	Decision
Telemetry quality	Can we trust the data?
Funnel drop-off	What is the biggest adoption bottleneck?
Cohort variance	Which manager/team/case type needs action?
Resistance signals	Product, workflow, trust, control or incentive issue?
Risk/control	Any over-reliance, override or complaint trend?
Value leakage	Is review load or rework consuming benefit?
Product backlog	What changes ship next?
Ops and manager actions	What coaching, SOP or process changes happen?
Scale/stop	Continue, scale, redesign, restrict or stop?

21.4 Scale / Stop Memo

# AI Adoption Scale / Stop Memo

## Decision requested
Scale / continue pilot / redesign / restrict / stop.

## Workflow and target population
Named workflow, users, case types, risk tiers and rollout cohort.

## Baseline
Work-as-done summary and baseline metrics.

## Adoption evidence
Qualified adoption, behavior funnel, cohort findings and durability.

## Outcome evidence
Flow, quality, customer and risk/control outcomes.

## Value evidence
Gross benefit, cost, human review load, rework, support, risk adjustment and net realized value.

## Risk/control evidence
Override, escalation, defects, complaints, over-reliance and under-reliance.

## Recommendation
Decision, rationale, constraints and next review date.

22. Anti-Patterns

Anti-pattern	Consequence	Replacement
Reporting MAU as adoption	Hides whether work changed	Qualified adoption event
Counting prompts as value	Rewards friction	Outcome-linked behavior metrics
Treating training as adoption	Ignores real workflow	Work-as-done and behavior funnel
Celebrating high accept rate	Encourages automation bias	Accept/edit/reject with quality and defects
Ignoring rejection reasons	Misses product and trust issues	Structured reason codes
Using averages only	Hides manager and case mix effects	Cohort analysis
Not measuring review load	Overstates benefit	Human review load and value leakage
No control override taxonomy	Confuses healthy challenge with bypass	Override classification
No change saturation view	Overloads teams	Rollout capacity review
Dashboard without action	Creates reporting theater	Operating learning loop and decision log

23. Interview Answers

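Q1: How would you design AI adoption analytics?

I would start by building a work-as-done baseline that pins down the target workflow, target users, case types, current cycle time, quality, rework, control points and informal workarounds. Next I would define an adoption event taxonomy that separates exposure, qualified use, human action, decision influence, control override and outcome link. Architecturally, a telemetry schema connects user, workflow, model version, prompt version, case risk, artifact and outcome. Metrics are layered into leading, behavior, flow, risk, value and durability. Finally, the behavior funnel, cohort analysis, value leakage analysis and the monthly operating review turn that evidence into scale, redesign, restrict or stop decisions.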

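Q2: Why can't prompt count prove AI value?

Prompt count only shows the number of interactions. It does not show whether AI was adopted for the right task, nor whether the workflow improved. A high prompt count may mean users are repeatedly correcting errors, outputs are unstable, policy is unclear or the system adds friction. Real value requires linking qualified adoption, work-artifact influence, cycle time, quality, risk/control, human review load and net realized value. AI adoption is about behavior and outcomes, not chat volume.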

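Q3: How do you tell whether low adoption is user resistance or a product problem?

I would look at the behavior funnel and resistance signals. Low exposure may point to a workflow placement or entitlement problem. Low engagement may mean the value is unclear or the entry point gets in the way. High reject, regenerate or heavy-edit rates usually indicate output quality, format or trust issues. If some teams show high adoption while others are low, the cause may be manager reinforcement or change saturation. I would also review reason codes, interviews, QA results and support tickets. Do not blame users first; diagnose product, process, control, incentive and organizational capacity together.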
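The diagnostic logic in this answer can be written as explicit rules, which keeps the review from defaulting to "user resistance." This is a sketch only: the signal names, the `diagnose` function and every threshold are hypothetical and would need calibrating against baseline and pilot data. The output is a list of hypotheses to test with reason codes, interviews, QA and support tickets, not a verdict.

```python
from typing import Dict, List, Optional

# Hypothetical thresholds; calibrate per workflow from baseline and pilot data.
DEFAULT_THRESHOLDS = {
    "exposure_rate": 0.60,           # exposed / eligible
    "engagement_rate": 0.40,         # engaged / exposed
    "negative_response_rate": 0.30,  # (reject + regenerate + heavy edit) / responses
    "team_spread": 0.25,             # max minus min team adoption rate
}


def diagnose(signals: Dict[str, float], team_rates: List[float],
             thresholds: Optional[Dict[str, float]] = None) -> List[str]:
    """Map funnel and resistance signals to hypotheses to investigate, not verdicts."""
    t = thresholds or DEFAULT_THRESHOLDS
    hypotheses = []
    if signals["exposure_rate"] < t["exposure_rate"]:
        hypotheses.append("workflow placement or entitlement")
    if signals["engagement_rate"] < t["engagement_rate"]:
        hypotheses.append("unclear value or disruptive entry point")
    if signals["negative_response_rate"] > t["negative_response_rate"]:
        hypotheses.append("output quality, format or trust")
    if team_rates and max(team_rates) - min(team_rates) > t["team_spread"]:
        hypotheses.append("manager reinforcement or change saturation")
    return hypotheses


# Illustrative signals only.
result = diagnose(
    {"exposure_rate": 0.9, "engagement_rate": 0.7, "negative_response_rate": 0.45},
    team_rates=[0.8, 0.35, 0.6],
)
print(result)  # → ['output quality, format or trust', 'manager reinforcement or change saturation']
```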

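Q4: How do you prove that AI benefits are attributable?

I would prefer a staggered rollout, matched cohorts, difference-in-differences or interrupted time series, choosing among them based on business and risk constraints. The report states the baseline period, eligible population, exposure, case mix, manager/team, model version, and any concurrent policy, staffing or system changes. Benefits must also be net of AI run cost, human review load, rework, support and risk adjustment. Finally, I would state a business confidence level rather than dressing up correlation as definitive causation.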
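For the difference-in-differences option, the point estimate is the pilot cohort's change minus the comparison cohort's change over the same period. The sketch below shows only that arithmetic on hypothetical prep-time samples. It assumes parallel trends and no concurrent policy, staffing or system change; a real analysis would add case-mix controls and a confidence interval, for example via regression.

```python
from statistics import mean
from typing import Sequence


def diff_in_diff(pilot_pre: Sequence[float], pilot_post: Sequence[float],
                 comp_pre: Sequence[float], comp_post: Sequence[float]) -> float:
    """Change in the pilot cohort minus change in the comparison cohort.

    Negative values mean the metric (e.g. prep minutes) fell more in the pilot.
    Valid only under the parallel-trends assumption.
    """
    pilot_change = mean(pilot_post) - mean(pilot_pre)
    comparison_change = mean(comp_post) - mean(comp_pre)
    return float(pilot_change - comparison_change)


# Illustrative case prep times in minutes, not real data.
effect = diff_in_diff(
    pilot_pre=[60, 62, 58], pilot_post=[48, 50, 46],
    comp_pre=[61, 59, 60], comp_post=[57, 55, 56],
)
print(effect)  # → -8.0
```

The pilot improved by 12 minutes, but the comparison cohort also improved by 4 minutes without AI, so only about 8 minutes is a candidate AI effect before costs are netted out.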

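Q5: Why should AI adoption analytics track control override?

Control override is key to judging whether AI is being trusted appropriately. A user overriding AI may reflect sound professional judgment, may be a bypass of a control, or may indicate that the AI output does not fit the real workflow. Without override reasons and review, management cannot tell these cases apart. In retail financial services, overrides also bear directly on customer impact, audit evidence, QA and risk boundaries. I therefore treat override as a first-class event, linked to role, case risk, workflow stage, reason, defect and manager review.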
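One way to make override a first-class event is to classify each override once its reason code and review result are known. The reason codes, category labels and `classify_override` function below are hypothetical. The point is the ordering: an override without a reason cannot be interpreted, a skipped mandated step is treated as a bypass regardless of outcome, and only reviewed, upheld overrides count as healthy challenge or an AI fit gap.

```python
from typing import Optional

# Hypothetical reason codes indicating the AI output did not fit the real workflow.
AI_FIT_REASONS = {"missing_context", "wrong_format", "outdated_policy", "factual_error"}


def classify_override(reason_code: Optional[str], followed_required_steps: bool,
                      reviewer_upheld: Optional[bool]) -> str:
    """Classify one override once its reason and review status are known."""
    if not reason_code:
        return "unexplained"           # no reason code: cannot be interpreted
    if not followed_required_steps:
        return "control_bypass"        # override skipped a mandated step
    if reviewer_upheld is None:
        return "pending_review"        # not yet reviewed by a manager or QA
    if not reviewer_upheld:
        return "unjustified_override"  # reviewer disagreed with the user
    if reason_code in AI_FIT_REASONS:
        return "ai_fit_gap"            # user was right; feeds the product backlog
    return "healthy_challenge"         # user was right on professional grounds


print(classify_override("missing_context", followed_required_steps=True, reviewer_upheld=True))  # → ai_fit_gap
```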

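Q6: How would you explain value leakage to a business owner?

I would explain that AI's gross saving is not the final benefit. For example, agent assist might save 30 seconds per call, but if repeat contacts rise, QA review grows, complaints increase or after-call corrections increase, net value falls. Value leakage is these shifted or newly added costs. Mature value realization must include review load, rework, support, latency, model cost, control cost and customer impact in net realized value.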
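The agent-assist example can be expressed as a simple monthly calculation. All field names and input values below are hypothetical and chosen for easy arithmetic. Latency is not modeled separately here and would need its own cost line if it slows calls or queues. What matters is the structure: gross benefit minus itemized leakage.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AgentAssistMonth:
    calls: int
    seconds_saved_per_call: float
    loaded_cost_per_hour: float      # fully loaded agent / reviewer cost
    extra_repeat_contacts: int
    cost_per_contact: float
    extra_qa_review_hours: float     # human review load added by AI
    correction_hours: float          # after-call correction / rework
    extra_complaints: int
    cost_per_complaint: float        # customer-impact proxy
    ai_run_cost: float               # model, hosting and licence cost
    support_cost: float
    control_cost: float              # added monitoring / control overhead


def net_realized_value(m: AgentAssistMonth) -> Tuple[float, Dict[str, float], float]:
    """Return (gross benefit, leakage by category, net realized value)."""
    gross = m.calls * m.seconds_saved_per_call / 3600 * m.loaded_cost_per_hour
    leakage = {
        "repeat_contacts": m.extra_repeat_contacts * m.cost_per_contact,
        "review_load": m.extra_qa_review_hours * m.loaded_cost_per_hour,
        "rework": m.correction_hours * m.loaded_cost_per_hour,
        "complaints": m.extra_complaints * m.cost_per_complaint,
        "ai_run": m.ai_run_cost,
        "support": m.support_cost,
        "control": m.control_cost,
    }
    return gross, leakage, gross - sum(leakage.values())


# Illustrative inputs only.
month = AgentAssistMonth(
    calls=120_000, seconds_saved_per_call=30, loaded_cost_per_hour=40,
    extra_repeat_contacts=2_000, cost_per_contact=6, extra_qa_review_hours=150,
    correction_hours=100, extra_complaints=20, cost_per_complaint=150,
    ai_run_cost=5_000, support_cost=2_000, control_cost=1_000,
)
gross, leakage, net = net_realized_value(month)
print(gross, net)  # → 40000.0 7000.0
```

Itemizing leakage by category, rather than reporting only the net figure, shows the business owner which cost (here, repeat contacts) is consuming most of the gross saving.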

Q7: How would you set the scale gate for an AML investigator copilot?

I would scale in stages by alert type and risk tier. Gates include qualified case penetration, narrative quality, source verification, QA defects, review load, override reasons, alert aging and the appropriateness of high-risk escalations. Only when aging or prep time improves, QA/control defects do not rise, human review load stays manageable and investigators are still using the tool at 8-12 weeks would I expand to more complex alert cohorts.
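The final condition in this answer can be checked per alert-type and risk-tier cohort. This is a hypothetical sketch: the `CohortEvidence` fields, the 10-minute review cap and the 60% durability floor are placeholders, and the remaining gates (narrative quality, source verification, override reasons, escalation appropriateness) would be added as further checks. A production gate would also test whether a defect increase is statistically meaningful rather than comparing raw rates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CohortEvidence:
    alert_type: str
    risk_tier: str
    baseline_aging_days: float
    current_aging_days: float
    baseline_prep_minutes: float
    current_prep_minutes: float
    baseline_defect_rate: float      # QA / control defects per case
    current_defect_rate: float
    review_minutes_per_case: float   # human review load
    active_share_week8: float        # share of investigators still using the tool
    active_share_week12: float


def scale_gate(e: CohortEvidence, max_review_minutes: float = 10.0,
               min_durable_share: float = 0.6) -> Tuple[bool, List[str]]:
    """Return (may_scale, failed_conditions) for one alert-type / risk-tier cohort."""
    failures = []
    if not (e.current_aging_days < e.baseline_aging_days
            or e.current_prep_minutes < e.baseline_prep_minutes):
        failures.append("no improvement in alert aging or prep time")
    if e.current_defect_rate > e.baseline_defect_rate:
        failures.append("QA/control defect rate above baseline")
    if e.review_minutes_per_case > max_review_minutes:
        failures.append("human review load above cap")
    if min(e.active_share_week8, e.active_share_week12) < min_durable_share:
        failures.append("use not sustained at 8-12 weeks")
    return (not failures, failures)


# Illustrative cohort evidence only.
pilot = CohortEvidence("structuring", "medium", 12, 9, 55, 40, 0.04, 0.035, 6, 0.75, 0.7)
print(scale_gate(pilot))  # → (True, [])
```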

24. Portfolio Exercise

24.1 Assignment

Build an AI Adoption Analytics Portfolio Pack for a retail financial services firm. Choose 3 of the following use cases:

Use case	Suggested focus
AML investigator copilot	Risk, quality and review load
Contact-center agent assist	AHT, FCR, QA and customer impact
KYC onboarding assistant	Cycle time, rework and document completeness
Credit ops reviewer	First-pass package quality and decision support boundary
Branch / RM copilot	Trust, compliance boundary and relationship action quality

24.2 Deliverables

Deliverable	Required content
Portfolio adoption map	Use cases, users, workflows, risk tiers, value hypotheses
Work-as-done baseline	One detailed baseline and two lighter baselines
Event taxonomy	At least 15 events across exposure, response, influence, control and outcome
Telemetry schema	Required fields and privacy/retention class
Metrics hierarchy	Leading, behavior, flow, risk, value and durability
Behavior funnel	Funnel and drop-off diagnosis per use case
Cohort plan	Role, team, manager, case type, risk tier, training wave
Value leakage model	Human review load and net realized value formula
Risk/control pack	Override, over-reliance, complaint, QA and escalation
Operating model	Review cadence, RACI and decision log
Scale/stop memo	Recommendation for each use case

24.3 Scoring Rubric

Criterion	Strong answer
Senior framing	Treats adoption as behavior and operating model change
BA rigor	Captures work-as-done, exceptions, controls and artifacts
Architecture rigor	Provides event schema, traceability and outcome joins
Product rigor	Defines funnel, cohorts, resistance signals and backlog actions
Risk rigor	Includes override, review load, over-reliance and customer impact
Value rigor	Calculates net realized value and value leakage
Executive readiness	Produces scale/stop decisions with evidence

25. Quality Bar

An AI adoption analytics pack is ready for senior review only if:

Qualified adoption is defined separately from exposure and usage.
Work-as-done baseline includes real exceptions and informal work.
Telemetry joins workflow context, model version, human action, control and outcome.
Metrics include leading, lagging, risk, value and durability.
Cohort analysis can explain manager, role, case type and risk-tier differences.
Value realization subtracts human review load, rework, support and AI run cost.
Risk/control evidence includes override, escalation, defect and complaint.
Operating review produces actions, not just reporting.
Scale/stop recommendation is explicit and evidence-backed.

Final principle:

Do not ask whether users used AI.
Ask whether AI changed governed work in a way that improved durable net outcomes.