
AI Adoption Analytics / Behavior Change / Value Realization Playbook


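Goal: Give Senior AI PMs, AI Architects, and CBAP-level BAs an executable method for proving that AI in financial retail settings is genuinely adopted, changes how work gets done, improves process quality, and produces sustainable net value.

Core principle: AI adoption is not training completion rate, login count, prompt count, or license activation rate; adoption is evidenced change in work behavior and the value it realizes.

Applicable scenarios: AML investigator adoption, contact-center agent-assist, KYC onboarding, credit ops, branch / relationship manager copilots, internal operations copilots, and AI workflow automation.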


1. Target Audience

| Audience | What they use this playbook for |
|---|---|
| Senior AI PM | Define adoption success criteria, the behavior funnel, scale/stop gates, and the product improvement loop |
| AI Architect | Design telemetry, traces, schema, identity, workflow-outcome joins, and the evidence store |
| CBAP-level BA | Build the work-as-done baseline, change impact map, resistance taxonomy, and process outcome model |
| AI Value Office | Feed adoption evidence into portfolio value realization, funding gates, and finance sign-off |
| Operations Leader | Use adoption evidence to manage manager coaching, SOP adjustments, queue load, and service quality |
| Risk / Control Partner | Review evidence on over-reliance, control overrides, human review load, complaints, and exceptions |

2. Learning Objectives

After completing this playbook, you should be able to:

  1. Turn AI usage metrics into an adoption event taxonomy.
  2. Use a work-as-done baseline to identify where behavior actually changes in real work.
  3. Design a telemetry schema that connects user action, workflow context, model version, control action, and outcome.
  4. Build a leading / lagging / risk / value / durability metrics hierarchy.
  5. Diagnose adoption with the behavior funnel, cohort analysis, and resistance signals.
  6. Use ADKAR thinking to build an operating model for behavior change, without treating training as adoption.
  7. Calculate value leakage, including human review load, rework, support, latency, control overhead, and customer harm adjustment.
  8. Build an evidence pack and a scale/stop decision memo for an AI use case.

3. Executive Summary

After an AI project goes live, a typical report looks like this:

```
licenses activated: 1,200
weekly active users: 870
prompts submitted: 42,000
average satisfaction: 4.2/5
```

These numbers do not prove that AI created value; they only prove that people touched the tool. Mature adoption analytics must demonstrate the full chain:

```
Eligible workflow population
  -> real exposure
  -> qualified task use
  -> trust-calibrated human action
  -> changed work artifact or decision
  -> improved process flow / quality / control
  -> realized net value
  -> reinforced behavior over time
```

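This playbook breaks adoption analytics into 12 execution assets:

| Asset | Purpose |
|---|---|
| Work-as-done baseline | Capture the real process and the current value baseline |
| Adoption event taxonomy | Define what counts as real adoption |
| Telemetry schema | Make adoption measurable, traceable, and reviewable |
| Metrics hierarchy | Prevent usage metrics from masquerading as business value |
| Behavior funnel | Locate adoption drop-off |
| Cohort analysis | Identify differences across role, manager, case type, and risk tier |
| Resistance signal map | Explain why users do not use, misuse, or work around the AI |
| Change saturation review | Judge whether the organization has capacity to absorb the change |
| Outcome attribution model | Explain how changes in outcomes relate to the AI |
| Value leakage model | Get from gross benefit to net realized value |
| Risk/control pack | Monitor over-reliance, overrides, review load, and customer impact |
| Operating review loop | Turn evidence into product, process, control, and management actions |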

4. Source Anchors

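These sources align the playbook's language on governance, management systems, change management, observability, and engineering performance. This document does not provide legal advice; all governance content is framed as product, architecture, and management evidence design.

| Source | Link | Use in this playbook |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Use Govern / Map / Measure / Manage to organize AI adoption risk and evidence |
| NIST AI RMF Playbook | https://airc.nist.gov/AI_RMF_Knowledge_Base/Playbook | Use its action-oriented review pattern to design evidence and monitoring routines |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Use AI management system language for objectives, operation, performance evaluation, and improvement |
| Prosci ADKAR | https://www.prosci.com/blog/adkar-model | Use Awareness, Desire, Knowledge, Ability, Reinforcement to diagnose behavior change |
| OpenTelemetry | https://opentelemetry.io/docs/ | Apply traces, metrics, and logs thinking to adoption observability |
| DORA | https://dora.dev/ | Use engineering performance and reliability thinking to connect release, learning, and restore |
| FFIEC Management IT Handbook | https://ithandbook.ffiec.gov/it-booklets/management.aspx | Align with financial-institution evidence on governance, risk identification, monitoring, and reporting |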

5. Conceptual Model

5.1 Adoption-to-Value Chain

```
Problem and baseline
  -> AI intervention
  -> exposure
  -> qualified use
  -> human trust action
  -> behavior change
  -> process quality change
  -> business outcome
  -> net value
  -> reinforcement
```

5.2 Definitions

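| Term | Definition |
|---|---|
| Exposure | The target user has the opportunity to see or invoke the AI during a real work step |
| Qualified use | The user uses the AI in the target task, target case type, and target process stage |
| Trust-calibrated action | The user correctly accepts, edits, rejects, escalates, or overrides the AI output |
| Behavior change | An observable change in work sequence, artifacts, decisions, handoffs, or control execution |
| Workflow outcome | Process results such as cycle time, quality, rework, queues, customer experience, and risk control |
| Net realized value | Benefit after deducting run, review, rework, support, risk, and change costs |
| Durability | Adoption and outcomes persist after the novelty effect wears off |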

5.3 The Senior Test

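If an AI use case cannot answer the following six questions, it should not move to scale:

  1. What is the current work-as-done baseline?
  2. Which events prove that users are genuinely adopting the AI in the target workflow?
  3. Which artifact, judgment, handoff, or control does the adopted behavior change?
  4. Which leading and lagging indicators prove that the process has improved?
  5. Do human review load, overrides, rework, and cost-to-serve consume the value?
  6. How will the organization reinforce the new behavior through manager cadence, SOPs, training, and product improvement?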

6. Architecture Components

| Component | Owner | Execution details |
|---|---|---|
| Workflow map | BA / Ops | AS-IS, work-as-done, exception path, control point, artifact map |
| Event taxonomy | AI PM / BA | Define exposure, intent, output, response, influence, control, outcome events |
| Instrumentation SDK | Architect / Engineering | Emit events with workflow context, model version and user action |
| Identity and cohort layer | Architect / Analytics | Role, team, manager, training wave, region, risk entitlement |
| Model and prompt registry | Platform / Architect | model_id, prompt_version, tool version, policy pack |
| Outcome connector | Data / Analytics | Join events to AHT, cycle time, quality, rework, STP, complaint, loss |
| Control evidence store | Risk / Architect | Overrides, escalations, QA defects, dual review, policy boundary hits |
| Adoption mart | Analytics | Curated tables for funnel, cohort, attribution and value leakage |
| Dashboard and evidence pack | PM / Value Office | Monthly operating pack and scale/stop memo |
| Operational learning loop | PM / Ops | Product backlog, SOP update, coaching, training, control tuning |

6.1 Reference Data Flow

```
AI surface
  -> adoption event stream
  -> workflow context resolver
  -> event validation and privacy filtering
  -> adoption mart
  -> outcome and control joins
  -> behavior funnel / cohort / value analytics
  -> operating review and action backlog
```

6.2 Architecture Principles

| Principle | Design implication |
|---|---|
| Context first | Every event carries workflow_id, stage, role, case type and risk tier |
| Version everything | model_id, prompt_version, policy_pack_version, SOP_version and feature_flag |
| Capture human judgment | accept/edit/reject/ignore/regenerate/override/escalate are first-class |
| Do not over-collect payload | Store event facts and references, not unnecessary customer content |
| Link to outcomes | Adoption metrics without outcome join are not value evidence |
| Preserve negative evidence | Rejection, complaint, defect and bypass signals drive learning |
| Reviewability | Metric definitions, lineage and sample drilldown must be inspectable |

7. Work-as-Done Baseline Template

Use this before instrumentation. Do not start with the AI tool; start with how work happens today.

| Field | Questions | Example: KYC onboarding |
|---|---|---|
| Workflow | Which end-to-end process? | New SMB account onboarding |
| Trigger | What starts the work? | Application submitted with documents |
| Actor | Who does the work? | KYC analyst, RM, onboarding ops, QA |
| Case mix | What types and complexity? | Sole proprietor, LLC, high-risk country exposure |
| Systems | Which systems are used? | CRM, document store, screening, core banking |
| Artifacts | What records are created? | Deficiency notice, review note, approval record |
| Controls | Which control points matter? | Sanctions, beneficial ownership, risk rating |
| Pain points | Where is work slow or low-quality? | Repeated customer document chase |
| Informal work | What unofficial workarounds exist? | Analyst checklist spreadsheet |
| Current metrics | What is the baseline? | Cycle time, first-pass completeness, rework |
| Failure modes | What causes defects? | Outdated policy, missing document, unclear ownership |
| Change capacity | What else is changing? | New onboarding policy and CRM migration |

7.1 Baseline Evidence Sources

| Source | What it proves |
|---|---|
| SME observation | Actual sequence, friction and judgment |
| Process logs | Timing, queue, handoff and rework |
| Case notes | Artifact quality and evidence gaps |
| QA samples | Defect type and severity |
| Manager coaching logs | Behavioral patterns and recurring issues |
| Complaint records | Customer harm or confusion |
| Policy and SOP | Expected controls and business rules |
| Informal tools review | Workarounds not visible in system logs |

8. Adoption Event Taxonomy

8.1 Event Classes

| Class | Required events | Why it matters |
|---|---|---|
| Exposure | AI panel shown, suggestion presented, feature available in eligible case | Proves opportunity to use |
| Intent | User opens assistant, asks task-specific question, requests summary | Proves user pull |
| Output | Summary, classification, recommendation, draft, next action generated | Proves an AI response existed |
| Human response | Accept, edit, reject, ignore, regenerate | Shows trust and fit |
| Decision influence | Used in note, customer response, disposition, package, handoff | Proves workflow impact |
| Control action | Override, escalation, dual review, policy boundary hit | Proves governed use |
| Learning signal | Feedback reason, defect report, manager comment | Supplies improvement signal |
| Outcome | Case closed, call completed, application approved, package passed QA | Proves process link |
| Reinforcement | Manager coaching, SOP update, training wave, team review | Proves behavior support |

8.2 Event Naming Convention

Use this pattern:

```
<workflow>.<stage>.<ai_surface>.<event_action>
```

Examples:

| Event name | Meaning |
|---|---|
| aml.triage.case_summary.generated | AML summary produced during triage |
| aml.investigation.narrative.accepted_with_edit | Investigator used AI narrative with edits |
| contact_center.customer_response.suggestion.rejected | Agent rejected suggested response |
| kyc.document_review.completeness_flag.overridden | Analyst overrode AI document flag |
| credit_ops.package_review.condition.extracted | Credit condition extracted into review package |
| branch.rm_prep.next_action.escalated | RM escalated AI next action due to policy boundary |
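The naming pattern can be enforced when events are ingested, so malformed names never reach the adoption mart. A minimal sketch, assuming each segment is lowercase snake_case and that a valid name has exactly four dot-separated parts; `parse_event_name` and `EventName` are hypothetical helpers, not part of any existing SDK.

```python
import re
from typing import NamedTuple, Optional

# Assumption: every segment is lowercase snake_case and starts with a letter.
_SEGMENT = r"[a-z][a-z0-9_]*"
EVENT_NAME_RE = re.compile(
    rf"(?P<workflow>{_SEGMENT})\.(?P<stage>{_SEGMENT})\."
    rf"(?P<ai_surface>{_SEGMENT})\.(?P<event_action>{_SEGMENT})"
)


class EventName(NamedTuple):
    workflow: str
    stage: str
    ai_surface: str
    event_action: str


def parse_event_name(name: str) -> Optional[EventName]:
    """Split <workflow>.<stage>.<ai_surface>.<event_action>; None if malformed."""
    match = EVENT_NAME_RE.fullmatch(name)
    return EventName(**match.groupdict()) if match else None


for candidate in (
    "aml.investigation.narrative.accepted_with_edit",  # from the examples table
    "AML.Triage.Summary",  # wrong case and only three parts
):
    print(candidate, "->", parse_event_name(candidate))
```

Rejecting names at ingestion, rather than cleaning them in the mart, keeps funnel and cohort queries free of one-off aliases.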

8.3 Qualified Adoption Event

Recommended definition:

A qualified adoption event occurs when an eligible user, in an eligible workflow stage and case type, uses an AI output to influence a governed work artifact, decision, handoff or customer interaction, with human action and control outcome captured.
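The definition can be expressed as a predicate over a single event. A minimal sketch, assuming hypothetical eligibility sets for a KYC onboarding pilot and a hypothetical `control_outcome` field; the other field names follow the canonical event contract in section 9.1. Real eligibility rules would come from the work-as-done baseline and entitlement data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical eligibility rules for a KYC onboarding pilot.
ELIGIBLE_ROLES = {"analyst"}
ELIGIBLE_STAGES = {"document_review", "customer_deficiency_notice"}
ELIGIBLE_CASE_TYPES = {"sole_proprietor", "llc"}
# Assumption: only accept and edit mean the AI output was actually used;
# override and escalate are recorded as control events instead.
USING_ACTIONS = {"accept", "edit"}


@dataclass
class AdoptionEvent:
    role: str
    workflow_stage: str
    case_type: str
    user_action: str
    downstream_artifact_id: Optional[str]  # governed artifact the output fed into
    control_outcome: Optional[str]  # hypothetical field, e.g. "qa_passed"


def is_qualified_adoption(event: AdoptionEvent) -> bool:
    eligible = (
        event.role in ELIGIBLE_ROLES
        and event.workflow_stage in ELIGIBLE_STAGES
        and event.case_type in ELIGIBLE_CASE_TYPES
    )
    influenced = (
        event.user_action in USING_ACTIONS
        and event.downstream_artifact_id is not None
    )
    governed = event.control_outcome is not None  # control outcome was captured
    return eligible and influenced and governed


print(is_qualified_adoption(
    AdoptionEvent("analyst", "document_review", "llc", "edit", "notice-17", "qa_passed")
))
```

Each of the three conditions maps to one clause of the definition, which makes it easy to report why an event failed to qualify.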

9. Data / Telemetry Schema

9.1 Canonical Event Contract

| Field | Required | Description |
|---|---|---|
| event_id | Yes | Unique event id |
| event_time | Yes | Event timestamp |
| event_name | Yes | Taxonomy event name |
| event_class | Yes | Exposure, intent, output, response, influence, control, learning, outcome, reinforcement |
| user_id_hash | Yes | Pseudonymous worker id |
| role | Yes | Agent, investigator, analyst, RM, manager, QA |
| team_id | Yes | Team, branch, region or operations unit |
| manager_id_hash | Recommended | Enables manager effect analysis |
| cohort_id | Yes | Pilot wave, training wave or feature flag cohort |
| workflow_id | Yes | AML, KYC, contact center, credit ops, branch RM |
| workflow_stage | Yes | Triage, document review, customer response, QA, decision |
| case_id_hash | Yes | Pseudonymous case id |
| case_type | Yes | Alert type, call reason, product, onboarding type |
| case_complexity | Recommended | Low, medium, high or scoring band |
| risk_tier | Yes | Business risk tier |
| ai_surface | Yes | Panel, inline suggestion, draft generator, policy search |
| model_id | Yes | Model registry id |
| prompt_version | Yes | Prompt or policy pack version |
| tool_ids | Recommended | Tools or connectors invoked |
| output_type | Yes | Summary, recommendation, draft, classification, next action |
| user_action | Yes | Accept, edit, reject, ignore, regenerate, override, escalate |
| edit_distance_band | Recommended | None, light, material, rewrite |
| reason_code | Recommended | Useful, inaccurate, incomplete, unsafe, policy unclear, slow, irrelevant |
| control_point_id | Recommended | Link to control or policy boundary |
| override_reason | Conditional | Required when override occurs |
| human_review_required | Yes | True or false |
| human_review_minutes | Recommended | Review load |
| downstream_artifact_id | Recommended | Note, letter, case record or decision package |
| outcome_event_id | Recommended | Link to process outcome |
| latency_ms | Recommended | Response latency |
| cost_estimate | Recommended | Unit cost estimate |
| privacy_class | Yes | Event-only, sensitive-reference, restricted |
| retention_class | Yes | Analytics, business-record-link, control-evidence |
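A contract is only useful if it is checked. A minimal validation sketch, assuming events arrive as Python dicts; it checks the `Yes` fields, the allowed `event_class` values, the boolean `human_review_required`, and the conditional `override_reason` rule. It is not a complete schema implementation; a production contract would more likely be expressed in a schema language such as JSON Schema, Avro, or Protobuf.

```python
from typing import Any

# The fields marked "Yes" in the contract table.
REQUIRED_FIELDS = (
    "event_id", "event_time", "event_name", "event_class", "user_id_hash", "role",
    "team_id", "cohort_id", "workflow_id", "workflow_stage", "case_id_hash",
    "case_type", "risk_tier", "ai_surface", "model_id", "prompt_version",
    "output_type", "user_action", "human_review_required", "privacy_class",
    "retention_class",
)
EVENT_CLASSES = {
    "exposure", "intent", "output", "response", "influence",
    "control", "learning", "outcome", "reinforcement",
}


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return contract violations; an empty list means the event passes."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS
              if event.get(f) in (None, "")]
    if event.get("event_class") not in EVENT_CLASSES:
        errors.append(f"unknown event_class: {event.get('event_class')!r}")
    if not isinstance(event.get("human_review_required"), bool):
        errors.append("human_review_required must be true or false")
    # Conditional rule from the contract: overrides must carry a reason.
    if event.get("user_action") == "override" and not event.get("override_reason"):
        errors.append("override_reason is required when user_action is override")
    return errors


sample = {f: "x" for f in REQUIRED_FIELDS}
sample.update(event_class="response", user_action="override", human_review_required=True)
print(validate_event(sample))  # flags the missing override_reason
```

Returning a list of violations rather than raising on the first one lets the telemetry-quality layer count defects per field, which feeds the completeness and schema-drift metrics.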

9.2 OpenTelemetry Mapping

| Adoption concept | Observability mapping |
|---|---|
| Case journey | Trace |
| Workflow step | Span |
| AI call | Span with model and prompt attributes |
| User action | Event on span |
| Control override | Event with control attributes |
| Outcome | Linked span or downstream event |
| Aggregate adoption | Metric |
| Defect or complaint | Log/event with trace link |

Example trace structure:

```
trace: kyc_onboarding_case_abc
  span: document_review
    event: ai_completeness_check.generated
    event: analyst_flag.accepted
  span: customer_deficiency_notice
    event: ai_notice_draft.edited
  span: qa_review
    event: qa_passed
```

10. Metrics Hierarchy

10.1 Metric Stack

| Layer | Metrics | Owner |
|---|---|---|
| Telemetry quality | event completeness, missing context, join rate, schema drift | Architect / Analytics |
| Exposure | eligible users exposed, eligible case exposure, workflow placement coverage | PM / Analytics |
| Qualified adoption | qualified use rate, returning qualified use, case penetration | PM |
| Trust and behavior | accept/edit/reject mix, edit distance, override, escalation, artifact reuse | PM / BA |
| Flow and quality | cycle time, AHT, queue aging, first-pass quality, rework, QA defects | Ops |
| Risk and control | over-reliance, under-reliance, policy boundary hits, complaint linkage | Risk / QA |
| Value | net hours released, cost-to-serve, loss reduction, conversion, complaint reduction | Value Office / Finance |
| Durability | 4/8/12-week retention, manager variance, post-release stability | PM / Ops |

10.2 Leading and Lagging Indicators

| Indicator type | Example | Decision use |
|---|---|---|
| Leading | eligible exposure, qualified use, light-edit acceptance, feedback density | Improve product and enablement |
| Behavior | artifact reuse, reduced manual search, correct escalation, SOP-aligned action | Confirm work is changing |
| Flow | AHT, cycle time, queue depth, first-pass quality, rework | Confirm process improvement |
| Risk | defect rate, control override, complaint linkage, review burden | Confirm governed adoption |
| Lagging | cost reduction, revenue lift, loss reduction, customer retention | Confirm value realization |
| Durability | cohort retention after 8-12 weeks, manager variance, version stability | Confirm scale readiness |

10.3 Metric Guardrails

| Metric | Why it must not be interpreted alone |
|---|---|
| High prompt count | Could mean confusion or poor output |
| High accept rate | Could mean automation bias |
| Low override rate | Could mean users do not understand controls |
| AHT reduction | Could hide repeat contact or QA rework |
| Time saved survey | Could ignore review load and support cost |
| High NPS from users | Could coexist with customer harm |

11. Behavior Change Model

11.1 ADKAR-to-Evidence Map

| ADKAR stage | Execution evidence | Analytics signal |
|---|---|---|
| Awareness | Managers communicate why the workflow is changing | Awareness pulse, team briefing completion |
| Desire | Users believe AI helps their work and does not punish them | opt-in demand, low resistance, champion pull |
| Knowledge | Users know when to use, avoid, escalate and override | correct reason codes, policy quiz, guidance views |
| Ability | Users perform the new workflow in real cases | qualified completion, light-edit acceptance, reduced rework |
| Reinforcement | Managers, SOPs and metrics reinforce the new behavior | returning use, coaching logs, SOP_version adoption |

11.2 Resistance Signal Taxonomy

| Signal | Diagnostic question | Response |
|---|---|---|
| Ignore | Is the AI shown at the wrong time? | Move the trigger closer to the decision point |
| Reject | Is the output inaccurate, irrelevant or untrusted? | Improve retrieval, prompt, source evidence |
| Regenerate | Is the user trying to force a better answer? | Add structured task templates |
| Heavy edit | Is the output format mismatched to the artifact? | Redesign the output contract |
| Override | Is the user bypassing a control or correcting the AI? | Require a reason and review patterns |
| Shadow AI | Is the sanctioned tool missing a real need? | Bring the unmet need into the roadmap |
| Low returning use | Was the initial experience poor or reinforcement absent? | Fix first-run quality and manager coaching |
| Team variance | Is adoption manager-led? | Add manager enablement and peer learning |
| Complaint rise | Is the AI improving internal speed at the customer's expense? | Stop or restrict the affected scenario |

11.3 Change Saturation Review

Score each factor from 1 to 5 before scaling:

| Factor | Evidence |
|---|---|
| Concurrent initiatives | Core migration, CRM change, policy update, org redesign |
| Queue pressure | Backlog, staffing shortage, overtime |
| Risk pressure | Active audit issue, recent incident, regulatory remediation |
| Manager capacity | Manager span, turnover, coaching time |
| Training load | Other mandatory training, certification, release fatigue |
| Process stability | SOP changes, exception volume, policy ambiguity |

Decision rule:

High value + high change saturation = controlled rollout with manager capacity plan, not broad launch.
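The decision rule can be made explicit so that reviews apply it consistently. A minimal sketch under stated assumptions: 5 means the highest pressure on every factor (so for manager capacity and process stability, 5 means the least capacity or stability), "high" saturation is an assumed mean score of 3.5 or more, and the low-value branch is an assumption, since the playbook only states the high-value rule. `rollout_recommendation` is a hypothetical helper.

```python
from statistics import mean

FACTORS = (
    "concurrent_initiatives", "queue_pressure", "risk_pressure",
    "manager_capacity", "training_load", "process_stability",
)
# Assumed cut-off on the 1-5 scale; calibrate against past rollouts.
HIGH_SATURATION = 3.5


def rollout_recommendation(scores: dict[str, int], high_value: bool) -> str:
    missing = [f for f in FACTORS if f not in scores]
    if missing:
        raise ValueError(f"unscored factors: {missing}")
    if any(not 1 <= scores[f] <= 5 for f in FACTORS):
        raise ValueError("each factor must be scored from 1 to 5")
    saturated = mean(scores[f] for f in FACTORS) >= HIGH_SATURATION
    if not high_value:
        # Assumption: without a strong value case, saturation is not the question.
        return "revisit the value case before scaling"
    if saturated:
        # The rule from the playbook: high value + high saturation.
        return "controlled rollout with manager capacity plan"
    return "broad launch can be considered"


scores = {"concurrent_initiatives": 5, "queue_pressure": 4, "risk_pressure": 3,
          "manager_capacity": 4, "training_load": 4, "process_stability": 3}
print(rollout_recommendation(scores, high_value=True))
```

A mean hides a single extreme factor; teams that want a veto (for example, any factor at 5 forces a controlled rollout) can add it as an explicit extra rule.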

12. Behavior Funnel

12.1 Standard Funnel

| Step | Measure | Drop-off interpretation |
|---|---|---|
| Eligible | Users and cases in target population | Scope and entitlement |
| Exposed | AI shown in target workflow | Integration and placement |
| Engaged | User opens or responds | Initial relevance and desire |
| Assisted | AI returns usable output | Model and retrieval quality |
| Influenced | User accepts, edits, uses in artifact or decision | Trust and workflow fit |
| Completed | Task completed in governed process | Downstream process compatibility |
| Improved | Flow, quality or control improves | Business value potential |
| Repeated | User returns in next eligible case | Durability |
| Reinforced | Manager or SOP supports behavior | Organizational adoption |
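Step-to-step conversion is what localizes the drop-off, because a single end-to-end rate cannot say where users fall out. A minimal sketch, assuming every funnel step has been reduced to a count of eligible cases that reached it (the Improved step in particular needs an agreed case-level rule). `funnel_report` and `biggest_drop_off` are hypothetical helpers, and the counts are invented for illustration, not results.

```python
FUNNEL_STEPS = (
    "eligible", "exposed", "engaged", "assisted", "influenced",
    "completed", "improved", "repeated", "reinforced",
)


def funnel_report(counts: dict[str, int]) -> list[tuple[str, float]]:
    """Conversion from the previous step, for every step after 'eligible'."""
    report = []
    for previous, step in zip(FUNNEL_STEPS, FUNNEL_STEPS[1:]):
        base = counts[previous]
        report.append((step, counts[step] / base if base else 0.0))
    return report


def biggest_drop_off(counts: dict[str, int]) -> tuple[str, float]:
    """The step with the lowest step-to-step conversion."""
    return min(funnel_report(counts), key=lambda row: row[1])


# Hypothetical case counts for one pilot week, for illustration only.
week = {"eligible": 1000, "exposed": 800, "engaged": 600, "assisted": 580,
        "influenced": 290, "completed": 280, "improved": 250,
        "repeated": 200, "reinforced": 180}
for step, rate in funnel_report(week):
    print(f"{step:>10}: {rate:.0%}")
print("biggest drop-off:", biggest_drop_off(week))
```

With these illustrative counts the weakest step is Influenced (50%), which the table reads as a trust and workflow-fit problem rather than a model-quality problem.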

12.2 Funnel Example: Contact-Center Agent Assist

| Step | Metric |
|---|---|
| Eligible | Calls in target call reasons |
| Exposed | Percent of eligible calls where the suggestion panel appears before the response point |
| Engaged | Agent views or expands suggested response |
| Assisted | Suggestion returned with policy citation within latency SLO |
| Influenced | Agent accepts or edits suggestion into response |
| Completed | Call completes without transfer caused by AI confusion |
| Improved | AHT and after-call work decrease while FCR and QA scores do not decline |
| Repeated | Agent uses assist in next 5 eligible calls |
| Reinforced | Team manager reviews quality and coaching signals weekly |

13. Cohort Analysis

13.1 Required Cohorts

| Cohort | Why it matters |
|---|---|
| Role | Agent, investigator, analyst, RM and manager have different adoption paths |
| Experience | New users may need guidance; experts need precision and speed |
| Manager/team | Reinforcement is often manager-mediated |
| Region/branch | Local processes, customers and capacity differ |
| Case type | Adoption in easy cases does not prove hard-case adoption |
| Risk tier | High-risk adoption needs stronger control evidence |
| Training wave | Separates enablement from product quality |
| Feature flag | Supports staged rollout and attribution |
| Model/prompt version | Detects quality changes and trust debt |

13.2 Cohort Dashboard Rows

| Row | Columns |
|---|---|
| Team | eligible cases, exposed cases, qualified use, influence rate, quality, risk, value |
| Manager | returning use, resistance reasons, coaching completion, defect trend |
| Experience | time to first qualified use, edit mix, rework, confidence |
| Case type | penetration, accept/edit/reject, cycle time, QA defect, complaint |
| Risk tier | adoption, override, escalation, dual review, defect severity |
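Averages hide manager and case-mix effects, so each dashboard row starts from a per-cohort rate. A minimal sketch, assuming each input row is one eligible case tagged with a cohort key (such as `team_id` or `manager_id_hash`) and a boolean `qualified_use` flag derived from the qualified adoption rule; the range and population standard deviation are one simple way to surface variance, not a prescribed statistic. Both helper names are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def rate_by_cohort(cases: list[dict], key: str) -> dict[str, float]:
    """Qualified-use rate per cohort value, e.g. per team_id or manager_id_hash."""
    totals: dict[str, int] = defaultdict(int)
    qualified: dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case[key]] += 1
        qualified[case[key]] += int(bool(case["qualified_use"]))
    return {cohort: qualified[cohort] / totals[cohort] for cohort in totals}


def cohort_spread(rates: dict[str, float]) -> dict[str, float]:
    """Spread across cohorts; a wide range hints at manager-led adoption."""
    values = list(rates.values())
    if not values:
        return {"range": 0.0, "stdev": 0.0}
    return {"range": max(values) - min(values), "stdev": pstdev(values)}


# Hypothetical eligible cases for two teams.
cases = [
    {"team_id": "team_a", "qualified_use": True},
    {"team_id": "team_a", "qualified_use": False},
    {"team_id": "team_b", "qualified_use": True},
    {"team_id": "team_b", "qualified_use": True},
]
rates = rate_by_cohort(cases, "team_id")
print(rates, cohort_spread(rates))
```

The same function serves every row in the table by changing `key`; small cohorts should be flagged before anyone reads their rates, since a team with three eligible cases can swing from 0% to 100%.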

14. Outcome Attribution

14.1 Evidence Options

| Method | Use when | Required evidence |
|---|---|---|
| Matched cohort | Similar teams or cases exist | matching variables, baseline balance |
| Stepped rollout | Rollout can be phased | rollout schedule, no major confounder |
| Difference-in-differences | Pre/post data and a comparison group exist | trend check, external change log |
| Interrupted time series | Long stable history exists | seasonality, policy and staffing changes |
| Shadow mode | Pre-production quality evidence is needed | historical case set, expert judgment |
| Workflow replay | The AI-assisted path needs to be compared with the current path | replay rules, representative samples |
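For the difference-in-differences option, the point estimate is the change in the pilot group minus the change in the comparison group. A minimal sketch on hypothetical per-case cycle times in hours; it assumes parallel pre-period trends (which the trend-check evidence has to support) and produces no standard errors or confidence interval, so on its own it is not enough for a finance sign-off.

```python
from statistics import mean


def diff_in_diff(
    treated_pre: list[float],
    treated_post: list[float],
    control_pre: list[float],
    control_post: list[float],
) -> float:
    """(change in pilot group) - (change in comparison group), on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical cycle times in hours per case.
estimate = diff_in_diff(
    treated_pre=[10, 12], treated_post=[8, 8],
    control_pre=[10, 10], control_post=[9, 9],
)
print(f"estimated effect on cycle time: {estimate:+.1f} hours per case")
```

Subtracting the comparison group's change removes shifts that hit both groups (seasonality, a policy update), which is why the external change log matters: any change that hit only the pilot group still contaminates the estimate.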

14.2 Attribution Checklist

| Item | Question |
|---|---|
| Baseline period | Is the baseline long enough and representative? |
| Eligibility | Which users and cases were included? |
| Exposure | Did eligible users actually see the AI? |
| Confounders | Staffing, campaign, policy, queue, seasonality, system changes? |
| Quality guardrail | Did quality, complaint or risk degrade? |
| Cost adjustment | Are review load and AI run costs included? |
| Durability | Does the effect persist after the initial novelty? |
| Decision | Does the evidence support scale, redesign, restrict or stop? |

15. Value Realization and Leakage

15.1 Benefits Register

| Field | Definition |
|---|---|
| use_case_id | Unique AI use case id |
| workflow_id | Target workflow |
| baseline_metric | Current performance |
| target_metric | Expected improvement |
| benefit_type | Cost, revenue, risk reduction, quality, capacity, customer experience |
| attribution_method | Cohort, rollout, time series, expert-reviewed |
| gross_benefit | Benefit before cost and leakage |
| AI_run_cost | Model, infra, license, vendor |
| review_load_cost | Human review and QA time |
| rework_cost | Corrections, reopened cases, repeat contact |
| support_change_cost | Training, manager coaching, support tickets |
| risk_adjustment | Control remediation, complaint, incident |
| net_realized_value | Gross benefit minus costs and leakage |
| finance_status | Draft, challenged, accepted, rejected |
| scale_decision | Scale, continue pilot, redesign, restrict, stop |

15.2 Value Leakage Patterns

| Leakage | Detection | Action |
|---|---|---|
| Review load | Reviewer minutes per AI-assisted case rise | Improve confidence, sampling strategy, output quality |
| Rework | Reopen or correction rate rises | Fix artifact contract and source evidence |
| Support burden | Help desk or manager questions rise | Improve in-product guidance |
| Latency | User abandons or works around AI | Optimize model route or precompute |
| Control friction | Overrides and false positives rise | Tune policy boundary and escalation |
| Customer harm | Complaint or repeat contact rises | Restrict scenario and review content |
| Cost creep | Unit cost grows with scale | Cost guardrails and routing |
| Trust debt | Adoption drops after incident | Recovery plan and quality evidence |

15.3 Net Value Formula

```
net_realized_value
= gross_benefit
- AI_run_cost
- human_review_load_cost
- rework_cost
- support_and_change_cost
- risk_and_control_cost
- customer_harm_adjustment
```
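The formula translates directly into code. A minimal sketch, assuming every term is already expressed in the same currency and period; field names are lowercased versions of the formula terms, and the benefits-register columns (`AI_run_cost`, `review_load_cost`, `support_change_cost`, `risk_adjustment`) would need an explicit mapping onto them. The `leakage_ratio` helper is an addition for the value leakage review, not part of the formula.

```python
from dataclasses import dataclass


@dataclass
class NetValueInputs:
    gross_benefit: float
    ai_run_cost: float
    human_review_load_cost: float
    rework_cost: float
    support_and_change_cost: float
    risk_and_control_cost: float
    customer_harm_adjustment: float

    def deductions(self) -> float:
        """Everything the formula subtracts from gross benefit."""
        return (
            self.ai_run_cost
            + self.human_review_load_cost
            + self.rework_cost
            + self.support_and_change_cost
            + self.risk_and_control_cost
            + self.customer_harm_adjustment
        )

    def net_realized_value(self) -> float:
        return self.gross_benefit - self.deductions()

    def leakage_ratio(self) -> float:
        """Share of gross benefit consumed by costs and leakage."""
        if self.gross_benefit <= 0:
            raise ValueError("leakage ratio needs a positive gross benefit")
        return self.deductions() / self.gross_benefit


# Hypothetical quarterly figures for one use case, in one currency.
q = NetValueInputs(100_000, 10_000, 20_000, 5_000, 8_000, 2_000, 5_000)
print(q.net_realized_value(), f"{q.leakage_ratio():.0%}")
```

Reporting the leakage ratio next to net value makes it visible when a use case keeps a positive net only because gross benefit is large, while review load or rework is quietly growing.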

16. Risk / Control Framework

16.1 Risk Signals

| Risk | Signal | Control |
|---|---|---|
| Over-reliance | High accept, low edit, rising defects | sampling QA, rationale, high-risk friction |
| Under-reliance | High reject despite good quality | trust evidence, workflow placement, coaching |
| Control bypass | Override without reason | mandatory reason, manager review |
| Hidden review burden | Review queue grows | end-to-end capacity dashboard |
| Policy boundary drift | AI answers outside allowed domain | policy engine, refusal, escalation |
| Customer harm | Complaint, repeat contact, correction | scenario restriction, content review |
| Uneven access | Low exposure in certain branches | entitlement and training remediation |
| Model version trust decay | Adoption drops after release | version rollback and communication |
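The first two signals combine behavior and quality, so neither can be read from the accept rate alone. A minimal sketch of a weekly check, assuming hypothetical thresholds that each team would calibrate against its own QA sampling, and simplifying "rising defects" to "the latest week is worse than the first week in the window". `WeeklyMix` and `reliance_flags` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed thresholds; calibrate them against QA sampling for each workflow.
HIGH_ACCEPT = 0.80   # share of outputs accepted without edit
LOW_EDIT = 0.10      # share of outputs accepted with edits
HIGH_REJECT = 0.40   # share of outputs rejected
GOOD_QUALITY = 0.90  # share of sampled outputs that QA judged correct


@dataclass
class WeeklyMix:
    accept_rate: float
    edit_rate: float
    reject_rate: float
    defect_rate: float             # QA defects per AI-assisted case
    sampled_output_quality: float  # QA-judged correctness of sampled AI outputs


def reliance_flags(weeks: list[WeeklyMix]) -> set[str]:
    """Flag over- and under-reliance from the latest week and the defect trend."""
    if not weeks:
        return set()
    latest = weeks[-1]
    flags: set[str] = set()
    # Simplification: "rising" means the latest week is worse than the first.
    defects_rising = len(weeks) >= 2 and latest.defect_rate > weeks[0].defect_rate
    if latest.accept_rate >= HIGH_ACCEPT and latest.edit_rate <= LOW_EDIT and defects_rising:
        flags.add("over_reliance")
    if latest.reject_rate >= HIGH_REJECT and latest.sampled_output_quality >= GOOD_QUALITY:
        flags.add("under_reliance")
    return flags


history = [WeeklyMix(0.70, 0.20, 0.10, 0.02, 0.85),
           WeeklyMix(0.85, 0.05, 0.10, 0.05, 0.85)]
print(reliance_flags(history))
```

Under-reliance depends on an independent quality sample: without `sampled_output_quality`, a high reject rate cannot be told apart from a genuinely poor model.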

16.2 Control Override Classification

| Classification | Meaning | Review action |
|---|---|---|
| Corrective override | User corrected AI error | Feed defect into model/product backlog |
| Risk override | User bypassed control | Manager/risk review |
| Policy ambiguity | User could not determine boundary | Clarify SOP and policy evidence |
| Workflow mismatch | AI suggestion did not fit actual process | Redesign output or trigger |
| Emergency override | Used due to service or customer urgency | Review exception governance |

16.3 Human Review Load

Human-in-the-loop is not free. Track:

| Metric | Why |
|---|---|
| review_minutes_per_case | Measures hidden labor |
| reviewer_queue_depth | Detects backlog transfer |
| review_defect_yield | Shows whether review finds real issues |
| review_sampling_rate | Controls auditability and cost |
| review_escalation_rate | Shows uncertainty and boundary issues |
| review_reversal_rate | Shows AI or user judgment quality |
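These metrics can be derived from one record per AI-assisted case. A minimal sketch, assuming a hypothetical `ReviewRecord` shape; the denominators are assumptions: minutes are spread over all AI-assisted cases so hidden labor shows up per case, while yield, escalation, and reversal rates are shares of cases routed to review.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    case_id_hash: str
    reviewed: bool           # case was routed to human review
    review_minutes: float
    found_defect: bool
    escalated: bool
    reversed_decision: bool  # reviewer overturned the AI-assisted outcome
    in_queue: bool           # review requested but not yet completed


def review_load_metrics(records: list[ReviewRecord]) -> dict[str, float]:
    total = len(records)
    reviewed = [r for r in records if r.reviewed]
    n = len(reviewed)

    def share(flag: str) -> float:
        """Fraction of reviewed cases where the named boolean field is set."""
        return sum(getattr(r, flag) for r in reviewed) / n if n else 0.0

    minutes = sum(r.review_minutes for r in reviewed)
    return {
        "review_minutes_per_case": minutes / total if total else 0.0,
        "reviewer_queue_depth": float(sum(r.in_queue for r in reviewed)),
        "review_defect_yield": share("found_defect"),
        "review_sampling_rate": n / total if total else 0.0,
        "review_escalation_rate": share("escalated"),
        "review_reversal_rate": share("reversed_decision"),
    }


records = [
    ReviewRecord("c1", True, 10, True, False, False, False),
    ReviewRecord("c2", True, 6, False, True, True, True),
    ReviewRecord("c3", False, 0, False, False, False, False),
]
print(review_load_metrics(records))
```

Tracking `review_defect_yield` next to `review_sampling_rate` supports the sampling decision: a falling yield at a high sampling rate suggests the review is costing more than it finds.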

17. Operating Model

17.1 Review Cadence

| Cadence | Forum | Decision |
|---|---|---|
| Daily | Ops pulse | Blockers, latency, incidents, support questions |
| Weekly | Adoption working session | Funnel drop-off, resistance, product fixes |
| Biweekly | Risk/control review | Overrides, defects, complaints, review load |
| Monthly | Value realization review | Benefit, leakage, finance challenge, scale/stop |
| Quarterly | Architecture and portfolio review | Platform reuse, telemetry maturity, lifecycle |

17.2 RACI

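| Activity | AI PM | BA | Architect | Ops | Risk | Analytics | Finance |
|---|---|---|---|---|---|---|---|
| Define adoption taxonomy | A/R | R | C | C | C | C | I |
| Build work-as-done baseline | C | A/R | I | R | C | C | I |
| Implement telemetry | C | C | A/R | I | C | R | I |
| Validate data quality | C | C | R | I | C | A/R | I |
| Run behavior funnel review | A/R | R | I | R | C | R | I |
| Manage resistance actions | A/R | R | I | R | I | C | I |
| Review risk/control | C | C | C | R | A/R | C | I |
| Calculate value | R | C | I | C | C | C | A/R |
| Decide scale/stop | A/R | C | C | C | C | C | C |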

17.3 Operational Learning Loop

| Step | Output |
|---|---|
| Observe | Adoption telemetry, quality, review and outcome evidence |
| Diagnose | Funnel drop-off, cohort variance, resistance signal |
| Decide | Product, workflow, control, training or manager action |
| Change | Release, SOP update, coaching, control tuning |
| Reinforce | Team cadence and performance conversation |
| Measure | Same metrics after change |
| Record | Decision log and evidence pack update |

18. Evidence Pack

18.1 Evidence Pack Structure

| Section | Content |
|---|---|
| Executive summary | Adoption, behavior, risk, value, decision |
| Problem and baseline | Work-as-done, pain points, baseline metrics |
| Intervention | AI capability, workflow integration, model/prompt version |
| Event taxonomy | Qualified adoption definition and events |
| Telemetry quality | Completeness, join rate, known limitations |
| Behavior funnel | Step conversion and drop-off |
| Cohort analysis | Role, manager, team, case type, risk tier |
| Outcome attribution | Method, baseline, confounders, confidence |
| Value realization | Gross benefit, leakage, net value |
| Risk/control | Over-reliance, override, review load, defects, complaints |
| User trust | Reason codes, qualitative themes, sentiment |
| Operating actions | Backlog, SOP, training, manager coaching |
| Decision | Scale, continue pilot, redesign, restrict or stop |

18.2 Evidence Quality Rubric

| Level | Meaning |
|---|---|
| Weak | Usage-only, no baseline, no outcome join |
| Developing | Baseline and adoption funnel exist, limited cohort analysis |
| Strong | Cohort, outcome, risk, review load and value leakage included |
| Executive-ready | Finance-challenged value, risk-reviewed controls, clear scale/stop action |
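The rubric can be applied as a checklist so that packs are graded the same way every month. A minimal sketch, assuming the levels are cumulative (each level requires everything in the level below it), which is an interpretation of the rubric rather than something it states; `EvidenceCheck` and `evidence_level` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class EvidenceCheck:
    baseline: bool
    adoption_funnel: bool
    outcome_join: bool
    cohort_analysis: bool
    risk_and_review_load: bool  # risk signals and human review load measured
    value_leakage: bool
    finance_challenged: bool
    risk_reviewed_controls: bool
    scale_stop_action: bool


def evidence_level(e: EvidenceCheck) -> str:
    developing = e.baseline and e.adoption_funnel
    strong = (developing and e.outcome_join and e.cohort_analysis
              and e.risk_and_review_load and e.value_leakage)
    executive = (strong and e.finance_challenged
                 and e.risk_reviewed_controls and e.scale_stop_action)
    if executive:
        return "Executive-ready"
    if strong:
        return "Strong"
    if developing:
        return "Developing"
    return "Weak"


all_false = {f.name: False for f in fields(EvidenceCheck)}
pilot_pack = EvidenceCheck(**dict(all_false, baseline=True, adoption_funnel=True))
print(evidence_level(pilot_pack))
```

Treating the levels as cumulative prevents a pack from being graded Executive-ready on finance sign-off alone while the outcome join is still missing.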

19. Execution Roadmap

Days 1-15: Baseline and Taxonomy

| Day range | Work |
|---|---|
| 1-3 | Select one high-value workflow and define business owner |
| 4-6 | Build work-as-done baseline from observation, logs, QA and SME review |
| 7-9 | Define adoption event taxonomy and qualified adoption event |
| 10-12 | Define metrics hierarchy and risk/control signals |
| 13-15 | Review with ops, risk, architecture and finance |

Days 16-35: Instrumentation and First Dashboard

| Day range | Work |
|---|---|
| 16-20 | Implement event contract with workflow context and model version |
| 21-24 | Connect identity, cohort, feature flag and training wave |
| 25-28 | Join process outcome and QA/control data |
| 29-32 | Build behavior funnel and cohort dashboard |
| 33-35 | Validate telemetry completeness and metric definitions |

Days 36-60: Pilot Evidence

| Day range | Work |
|---|---|
| 36-42 | Run pilot with manager reinforcement and support path |
| 43-48 | Analyze resistance signals, edit/reject/override reasons |
| 49-53 | Calculate review load, rework, cost and early value leakage |
| 54-57 | Run risk/control review |
| 58-60 | Produce pilot evidence pack and decision recommendation |

Days 61-90: Scale, Redesign or Stop

| Day range | Work |
|---|---|
| 61-70 | Expand only if risk and value evidence meet the gate |
| 71-78 | Add attribution method and durability monitoring |
| 79-84 | Update SOP, training, manager coaching and product backlog |
| 85-88 | Finance challenges net value and unit economics |
| 89-90 | Finalize scale/stop memo and portfolio recommendation |

20. Financial Retail Examples

20.1 AML Investigator Copilot

| Area | Execution |
|---|---|
| Qualified adoption | Investigation summary used in narrative or evidence review for eligible alert |
| Leading indicators | Summary request rate, source citation view, light-edit narrative acceptance |
| Lagging indicators | Alert aging, QA correction, SAR prep quality, re-open rate |
| Risk controls | High-risk alert dual review, source verification, override reason |
| Value leakage | Senior reviewer time, false comfort, rework due to incomplete narrative |
| Scale rule | Scale only if aging improves and QA/control defects do not rise |

20.2 Contact-Center Agent Assist

| Area | Execution |
|---|---|
| Qualified adoption | Suggested response used in target call reason with policy citation |
| Leading indicators | Suggestion view, accept-with-edit, latency under SLO |
| Lagging indicators | AHT, FCR, repeat contact, QA score, complaint |
| Risk controls | Sensitive topic handoff, script boundary, supervisor sampling |
| Value leakage | After-call correction, customer repeat contact, QA review burden |
| Scale rule | Scale by call reason, not by whole contact center |

20.3 KYC Onboarding Assistant

| Area | Execution |
|---|---|
| Qualified adoption | AI document completeness flag used before customer chase |
| Leading indicators | Completeness check rate, deficiency draft edit rate |
| Lagging indicators | First-pass completeness, cycle time, customer chase count |
| Risk controls | High-risk entity review, sanctions and beneficial ownership controls |
| Value leakage | False deficiency notices, analyst re-review, customer frustration |
| Scale rule | Scale only where false deficiency rate and remediation do not rise |

20.4 Credit Ops Reviewer

| Area | Execution |
|---|---|
| Qualified adoption | AI extraction used in package review, not final credit judgment |
| Leading indicators | Extracted condition reviewed, collateral summary edited |
| Lagging indicators | First-pass package quality, approval rework, condition miss rate |
| Risk controls | Human decision owner, exception rationale, QA sampling |
| Value leakage | Analyst over-review, downstream correction, risk review escalation |
| Scale rule | Scale after package quality improves across product cohorts |

20.5 Branch / Relationship Manager Copilot

| Area | Execution |
|---|---|
| Qualified adoption | RM uses permitted insight to prepare client follow-up |
| Leading indicators | Meeting prep use, next-action capture, follow-up completion |
| Lagging indicators | Retention, qualified referral, customer satisfaction, complaint |
| Risk controls | Advice boundary, disclosure, approved product language |
| Value leakage | Compliance review, unsuitable suggestion correction, trust damage |
| Scale rule | Scale by relationship segment and product boundary |

21. Templates

21.1 Adoption Event Card

| Field | Fill with concrete value |
|---|---|
| Event name | workflow.stage.surface.action |
| Event class | Exposure / intent / output / response / influence / control / learning / outcome / reinforcement |
| Workflow | Named workflow |
| Stage | Exact process step |
| Eligible users | Roles and cohorts |
| Eligible cases | Case types and risk tiers |
| Human action | Accept, edit, reject, ignore, regenerate, override, escalate |
| Business artifact | Note, decision, response, package, handoff |
| Control point | Policy or control id |
| Outcome link | Downstream result |
| Misinterpretation risk | How this event could be over-read |

21.2 Behavior Funnel Dashboard

| Section | Metrics |
|---|---|
| Population | eligible users, eligible cases, cohort mix |
| Exposure | surface shown, workflow availability |
| Engagement | open, view, ask, response |
| Influence | accept, edit, reject, artifact use |
| Completion | task completed, handoff completed |
| Improvement | cycle time, quality, rework, customer result |
| Risk | override, escalation, defect, complaint |
| Durability | 4/8/12-week repeat use |

21.3 Monthly Operating Review Agenda

| Agenda item | Decision |
|---|---|
| Telemetry quality | Can we trust the data? |
| Funnel drop-off | What is the biggest adoption bottleneck? |
| Cohort variance | Which manager/team/case type needs action? |
| Resistance signals | Product, workflow, trust, control or incentive issue? |
| Risk/control | Any over-reliance, override or complaint trend? |
| Value leakage | Is review load or rework consuming benefit? |
| Product backlog | What changes ship next? |
| Ops and manager actions | What coaching, SOP or process changes happen? |
| Scale/stop | Continue, scale, redesign, restrict or stop? |

21.4 Scale / Stop Memo

```markdown
# AI Adoption Scale / Stop Memo

## Decision requested
Scale / continue pilot / redesign / restrict / stop.

## Workflow and target population
Named workflow, users, case types, risk tiers and rollout cohort.

## Baseline
Work-as-done summary and baseline metrics.

## Adoption evidence
Qualified adoption, behavior funnel, cohort findings and durability.

## Outcome evidence
Flow, quality, customer and risk/control outcomes.

## Value evidence
Gross benefit, cost, human review load, rework, support, risk adjustment and net realized value.

## Risk/control evidence
Override, escalation, defects, complaints, over-reliance and under-reliance.

## Recommendation
Decision, rationale, constraints and next review date.
```

22. Anti-Patterns

| Anti-pattern | Consequence | Replacement |
|---|---|---|
| Reporting MAU as adoption | Hides whether work changed | Qualified adoption event |
| Counting prompts as value | Rewards friction | Outcome-linked behavior metrics |
| Treating training as adoption | Ignores the real workflow | Work-as-done baseline and behavior funnel |
| Celebrating high accept rate | Encourages automation bias | Accept/edit/reject with quality and defects |
| Ignoring rejection reasons | Misses product and trust issues | Structured reason codes |
| Using averages only | Hides manager and case-mix effects | Cohort analysis |
| Not measuring review load | Overstates benefit | Human review load and value leakage |
| No control override taxonomy | Confuses healthy challenge with bypass | Override classification |
| No change saturation view | Overloads teams | Rollout capacity review |
| Dashboard without action | Creates reporting theater | Operating learning loop and decision log |
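The replacement for "Celebrating high accept rate" pairs each human action with a quality signal. The sketch below is one possible implementation, not a prescribed method. It assumes that each AI suggestion has a QA defect label, for example from sampled QA review. It raises an automation-bias flag only when a high accept share coincides with an accepted-output defect rate above the pre-AI baseline. The 0.8 threshold and the function names are hypothetical.

```python
from collections import Counter

ACTIONS = ("accept", "edit", "reject")


def action_quality(records: list[tuple[str, bool]]) -> dict[str, dict[str, float]]:
    """records: (human action, QA found a defect) per AI suggestion."""
    total = len(records)
    counts = Counter(action for action, _ in records)
    defects = Counter(action for action, had_defect in records if had_defect)
    return {
        action: {
            "share": counts[action] / total if total else 0.0,
            "defect_rate": defects[action] / counts[action] if counts[action] else 0.0,
        }
        for action in ACTIONS
    }


def automation_bias_flag(quality: dict[str, dict[str, float]], baseline_defect_rate: float,
                         high_accept_share: float = 0.8) -> bool:
    """Flag a high accept rate only when accepted outputs carry more defects than baseline."""
    accepted = quality["accept"]
    return accepted["share"] >= high_accept_share and accepted["defect_rate"] > baseline_defect_rate


# Hypothetical QA sample: 8 accepts (2 with defects), 1 edit, 1 reject.
records = [("accept", False)] * 6 + [("accept", True)] * 2 + [("edit", False), ("reject", False)]
quality = action_quality(records)
print(quality["accept"], automation_bias_flag(quality, baseline_defect_rate=0.10))
```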

23. Interview Answers

Q1: How would you design AI adoption analytics?

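I would start by building a work-as-done baseline: the target process, target users, case types, current cycle time, quality, rework, control points and informal workarounds. Then I would define an adoption event taxonomy that separates exposure, qualified use, human action, decision influence, control override and outcome link. Architecturally, a telemetry schema connects user, workflow, model version, prompt version, case risk, artifact and outcome. Metrics are layered into leading, behavior, flow, risk, value and durability. Finally, the behavior funnel, cohort analysis, value leakage and the monthly operating review turn the evidence into scale, redesign, restrict or stop decisions.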
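The separation between exposure and qualified adoption can be made concrete in the event schema. In this minimal sketch, the event types mirror the taxonomy named in the answer. The rule that a case counts as qualified adoption only when it has qualified use, decision influence and an outcome link is an assumption chosen for illustration. `AdoptionEvent`, `qualified_adoption_cases` and the field values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    EXPOSURE = "exposure"                      # AI surface shown at a workflow step
    QUALIFIED_USE = "qualified_use"            # used on an eligible case at the intended step
    HUMAN_ACTION = "human_action"              # accept / edit / reject
    DECISION_INFLUENCE = "decision_influence"  # AI content carried into a work artifact
    CONTROL_OVERRIDE = "control_override"
    OUTCOME_LINK = "outcome_link"              # case outcome joined back to the trace


@dataclass(frozen=True)
class AdoptionEvent:
    event_type: EventType
    user_id: str
    workflow_id: str
    case_id: str
    case_risk_tier: str
    model_version: str
    prompt_version: str
    artifact_id: Optional[str] = None
    human_action: Optional[str] = None  # "accept" | "edit" | "reject"
    outcome_id: Optional[str] = None


# Assumed definition: exposure alone never counts as adoption.
QUALIFYING = {EventType.QUALIFIED_USE, EventType.DECISION_INFLUENCE, EventType.OUTCOME_LINK}


def qualified_adoption_cases(events: list[AdoptionEvent]) -> set[str]:
    """Cases where AI was used at the intended step, shaped an artifact and links to an outcome."""
    seen: dict[str, set[EventType]] = {}
    for event in events:
        seen.setdefault(event.case_id, set()).add(event.event_type)
    return {case_id for case_id, types in seen.items() if QUALIFYING <= types}


def _event(kind: EventType, case_id: str) -> AdoptionEvent:
    # Hypothetical context values for the example.
    return AdoptionEvent(kind, "u1", "aml_review", case_id, "medium", "model-v1", "prompt-v1")


events = [_event(t, "A") for t in (EventType.EXPOSURE, EventType.QUALIFIED_USE,
                                   EventType.DECISION_INFLUENCE, EventType.OUTCOME_LINK)]
events += [_event(t, "B") for t in (EventType.EXPOSURE, EventType.QUALIFIED_USE)]
print(qualified_adoption_cases(events))  # {'A'}
```

Case B was exposed and used but never shaped an artifact or linked to an outcome, so usage-based reporting would count it while qualified adoption does not.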

Q2: Why can't prompt count prove AI value?

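Prompt count only shows how many interactions took place. It shows neither whether AI was adopted for the right task nor whether it improved the process. A high prompt count may mean users are repeatedly correcting errors, the output is unstable, the policy is unclear or the system creates friction. Real value requires linking qualified adoption, work artifact influence, cycle time, quality, risk/control, human review load and net realized value. AI adoption is about behavior and outcomes, not chat volume.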

Q3: How do you tell whether low adoption reflects user resistance or a product problem?

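I would look at the behavior funnel and resistance signals. If exposure is low, the cause may be workflow placement or entitlement. If engagement is low, the value may be unclear or the entry point disruptive. If reject, regenerate and heavy-edit rates are high, the cause is most likely output quality, format or trust. If some teams show high adoption and others low, the cause may be manager reinforcement or change saturation. I would also review reason codes, interviews, QA results and support tickets. Do not blame users first. Diagnose product, process, control, incentive and organizational capacity together.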
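The diagnostic rules in this answer can be written as a first-pass triage over funnel signals. This is a sketch with hypothetical signal names and thresholds. Its output is a list of candidate causes to investigate, not a verdict; reason codes, interviews, QA and support tickets still have to confirm them. Here `team_adoption_spread` is assumed to be the maximum minus the minimum qualified-adoption rate across teams.

```python
def diagnose_low_adoption(signals: dict[str, float]) -> list[str]:
    """Map funnel and resistance signals to candidate causes (rule-of-thumb thresholds)."""
    findings = []
    if signals["exposure_rate"] < 0.70:
        findings.append("workflow placement or entitlement")
    if signals["engagement_rate"] < 0.50:
        findings.append("unclear value or disruptive entry point")
    if signals["reject_regenerate_heavy_edit_rate"] > 0.30:
        findings.append("output quality, format or trust")
    if signals["team_adoption_spread"] > 0.25:  # max minus min qualified adoption across teams
        findings.append("manager reinforcement or change saturation")
    if not findings:
        findings.append("no dominant signal: review reason codes, interviews, QA and support tickets")
    return findings


signals = {"exposure_rate": 0.9, "engagement_rate": 0.6,
           "reject_regenerate_heavy_edit_rate": 0.45, "team_adoption_spread": 0.3}
print(diagnose_low_adoption(signals))
```

Several causes can fire at once. This is intentional because, as the answer notes, product, process and organizational causes usually overlap.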

Q4: How do you prove that benefits are attributable to AI?

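I would prefer a staggered rollout, matched cohorts, difference-in-differences or an interrupted time series, chosen according to business and risk constraints. The report states the baseline period, eligible population, exposure, case mix, manager/team, model version and any concurrent policy, staffing or system changes. Benefits must also be net of AI run cost, human review load, rework, support and risk adjustment. Finally, I would state a level of business confidence rather than present correlation as definitive causation.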
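Difference-in-differences, one of the designs named above, reduces to simple arithmetic once the cohorts are defined. The sketch below shows that arithmetic only. It assumes parallel trends, meaning the AI and comparison cohorts would have moved together without AI. It returns a point estimate with no case-mix adjustment or confidence interval. The function name and the handle-time figures are hypothetical.

```python
from statistics import fmean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post) -> float:
    """Change in the AI cohort minus change in the comparison cohort over the same periods."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Hypothetical handle time in minutes per case.
effect = difference_in_differences(
    treated_pre=[30, 32, 28], treated_post=[25, 27, 26],
    control_pre=[31, 29, 30], control_post=[29, 28, 30],
)
print(effect)  # -3.0
```

The AI cohort improved by 4 minutes, but the comparison cohort also improved by 1 minute over the same period. Only the remaining 3 minutes is a candidate AI effect, before deducting leakage.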

Q5: Why does AI adoption analysis need to track control overrides?

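Control override is key to judging whether AI is trusted appropriately. A user overriding AI may reflect sound professional judgment, may be a way of bypassing a control, or may indicate that the AI output does not fit the real process. Without override reasons and review, management cannot tell these cases apart. In retail financial services, overrides also bear directly on customer impact, audit evidence, QA and risk boundaries. I therefore treat override as a first-class event, linked to role, case risk, workflow stage, reason, defect and manager review.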
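One way to make override a first-class event is to classify each override by reason code and route the ambiguous or high-risk ones to manager review. The sketch below assumes three reason-code groups that match the three interpretations in the answer. The codes themselves, the rule that a healthy challenge needs attached evidence, and the function names are all hypothetical. A real taxonomy would come from SOPs and control owners.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reason-code groups.
HEALTHY_CHALLENGE = {"contradicting_evidence", "approved_policy_exception"}
POSSIBLE_BYPASS = {"time_pressure", "skipped_required_check"}
AI_UNFIT = {"wrong_format", "missing_source", "irrelevant_output"}


@dataclass(frozen=True)
class OverrideEvent:
    case_id: str
    role: str
    case_risk_tier: str  # "high" | "medium" | "low"
    workflow_stage: str
    reason_code: Optional[str]
    evidence_attached: bool


def classify_override(event: OverrideEvent) -> str:
    if event.reason_code in POSSIBLE_BYPASS:
        return "possible_control_bypass"
    if event.reason_code in AI_UNFIT:
        return "ai_unfit_for_workflow"
    if event.reason_code in HEALTHY_CHALLENGE and event.evidence_attached:
        return "healthy_challenge"
    # Missing reason, unknown code, or a challenge without supporting evidence.
    return "unclassified"


def needs_manager_review(event: OverrideEvent) -> bool:
    """High-risk cases always get review; so do suspected bypasses and unexplained overrides."""
    return event.case_risk_tier == "high" or classify_override(event) in {
        "possible_control_bypass", "unclassified"}


documented = OverrideEvent("c1", "investigator", "medium", "narrative", "contradicting_evidence", True)
unexplained = OverrideEvent("c2", "investigator", "high", "disposition", None, False)
print(classify_override(documented), needs_manager_review(documented))
print(classify_override(unexplained), needs_manager_review(unexplained))
```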

Q6: How do you explain value leakage to a business owner?

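I would explain that AI's gross saving is not the final benefit. For example, agent assist may save 30 seconds per call, but if repeat contacts, QA review, complaints or after-call corrections increase, net value falls. Value leakage is these shifted or newly added costs. Mature value realization must include review load, rework, support, latency, model cost, control cost and customer impact in net realized value.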
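The agent-assist example can be worked through as a simple net realized value calculation. In this sketch, net value is gross time saved at a loaded hourly cost minus the leakage items listed in the answer. Repeat contacts stand in for rework, and latency is omitted for brevity. Every input below is a hypothetical monthly figure chosen to show the mechanics, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class ValueInputs:
    calls: int
    seconds_saved_per_call: float
    loaded_cost_per_hour: float
    qa_review_hours: float           # added human review load
    repeat_contacts: int             # extra contacts attributed to AI-assisted answers
    minutes_per_repeat_contact: float
    support_cost: float
    ai_run_cost: float
    control_cost: float
    customer_harm_adjustment: float


def net_realized_value(v: ValueInputs) -> dict[str, float]:
    gross = v.calls * v.seconds_saved_per_call / 3600 * v.loaded_cost_per_hour
    review = v.qa_review_hours * v.loaded_cost_per_hour
    rework = v.repeat_contacts * v.minutes_per_repeat_contact / 60 * v.loaded_cost_per_hour
    leakage = (review + rework + v.support_cost + v.ai_run_cost
               + v.control_cost + v.customer_harm_adjustment)
    return {"gross": gross, "leakage": leakage, "net": gross - leakage}


month = ValueInputs(calls=120_000, seconds_saved_per_call=30, loaded_cost_per_hour=45,
                    qa_review_hours=200, repeat_contacts=3_000, minutes_per_repeat_contact=6,
                    support_cost=2_000, ai_run_cost=8_000, control_cost=1_500,
                    customer_harm_adjustment=5_000)
print(net_realized_value(month))  # {'gross': 45000.0, 'leakage': 39000.0, 'net': 6000.0}
```

With these illustrative inputs, a 45,000 gross saving shrinks to 6,000 net. That gap is the value leakage the business owner needs to see.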

Q7: How would you set the scale gate for an AML investigator copilot?

I would scale in stages by alert type and risk tier. Gates include qualified case penetration, narrative quality, source verification, QA defects, review load, override reasons, alert aging and the appropriateness of high-risk escalation. Only when aging or prep time improves, QA/control defects do not rise, human review load stays manageable and investigators are still using the copilot at weeks 8-12 would I expand to more complex alert cohorts.
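The closing condition of this answer can be expressed as a per-cohort gate check. The sketch covers only the four conditions in the last sentence: flow improvement, no defect increase, review load within budget, and durability at weeks 8 and 12. Narrative quality, source verification, override reasons and escalation appropriateness would be additional checks. `CohortEvidence`, the 0.6 durability floor and the example values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CohortEvidence:
    alert_type: str
    risk_tier: str
    prep_time_change_pct: float     # negative means faster than baseline
    alert_aging_change_pct: float   # negative means less aging than baseline
    qa_defect_rate: float
    baseline_qa_defect_rate: float
    review_hours_per_case: float
    review_hours_budget: float
    week8_active_share: float       # share of qualified users still qualified at week 8
    week12_active_share: float


def scale_gate(e: CohortEvidence, min_durability: float = 0.6) -> tuple[bool, list[str]]:
    """Return (passes, reasons for failure) for one alert-type / risk-tier cohort."""
    failures = []
    if not (e.prep_time_change_pct < 0 or e.alert_aging_change_pct < 0):
        failures.append("no prep-time or aging improvement")
    if e.qa_defect_rate > e.baseline_qa_defect_rate:
        failures.append("QA/control defects above baseline")
    if e.review_hours_per_case > e.review_hours_budget:
        failures.append("human review load over budget")
    if min(e.week8_active_share, e.week12_active_share) < min_durability:
        failures.append("usage not sustained through weeks 8-12")
    return (not failures, failures)


cohort = CohortEvidence("structuring", "medium", -18.0, -5.0, 0.04, 0.05, 0.3, 0.5, 0.78, 0.72)
print(scale_gate(cohort))                                # (True, [])
print(scale_gate(replace(cohort, qa_defect_rate=0.07)))  # (False, ['QA/control defects above baseline'])
```

Returning the failure reasons, not just a boolean, lets the scale/stop memo cite why a cohort was held back.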


24. Portfolio Exercise

24.1 Assignment

Build an AI Adoption Analytics Portfolio Pack for a retail financial services firm. Choose 3 of the following use cases:

| Use case | Suggested focus |
|---|---|
| AML investigator copilot | Risk, quality and review load |
| Contact-center agent assist | AHT, FCR, QA and customer impact |
| KYC onboarding assistant | Cycle time, rework and document completeness |
| Credit ops reviewer | First-pass package quality and decision support boundary |
| Branch / RM copilot | Trust, compliance boundary and relationship action quality |

24.2 Deliverables

| Deliverable | Required content |
|---|---|
| Portfolio adoption map | Use cases, users, workflows, risk tiers, value hypotheses |
| Work-as-done baseline | One detailed baseline and two lighter baselines |
| Event taxonomy | At least 15 events across exposure, response, influence, control and outcome |
| Telemetry schema | Required fields and privacy/retention class |
| Metrics hierarchy | Leading, behavior, flow, risk, value and durability |
| Behavior funnel | Funnel and drop-off diagnosis per use case |
| Cohort plan | Role, team, manager, case type, risk tier, training wave |
| Value leakage model | Human review load and net realized value formula |
| Risk/control pack | Override, over-reliance, complaint, QA and escalation |
| Operating model | Review cadence, RACI and decision log |
| Scale/stop memo | Recommendation for each use case |
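The event taxonomy deliverable has a checkable minimum: at least 15 events spread across five categories. The sketch below pairs an illustrative taxonomy with a small validator. Every event name is hypothetical, and the duplicate-name check is an extra assumption rather than a stated requirement.

```python
# Hypothetical event names grouped by the five categories the deliverable requires.
EVENT_TAXONOMY: dict[str, list[str]] = {
    "exposure": ["ai_panel_shown", "suggestion_rendered", "entitlement_granted"],
    "response": ["ai_response_viewed", "response_regenerated", "response_rated"],
    "influence": ["suggestion_accepted", "suggestion_edited", "suggestion_rejected",
                  "content_inserted_into_artifact"],
    "control": ["control_override", "escalation_raised", "qa_review_completed"],
    "outcome": ["case_closed", "rework_logged", "complaint_linked"],
}
REQUIRED_CATEGORIES = {"exposure", "response", "influence", "control", "outcome"}


def validate_taxonomy(taxonomy: dict[str, list[str]], min_events: int = 15) -> list[str]:
    """Return a list of problems; an empty list means the deliverable's minimum is met."""
    problems = []
    missing = REQUIRED_CATEGORIES - set(taxonomy)
    if missing:
        problems.append(f"missing categories: {sorted(missing)}")
    names = [name for events in taxonomy.values() for name in events]
    if len(names) < min_events:
        problems.append(f"only {len(names)} events; need at least {min_events}")
    if len(set(names)) != len(names):
        problems.append("duplicate event names")
    return problems


print(validate_taxonomy(EVENT_TAXONOMY))  # []
```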

24.3 Scoring Rubric

| Criterion | Strong answer |
|---|---|
| Senior framing | Treats adoption as behavior and operating model change |
| BA rigor | Captures work-as-done, exceptions, controls and artifacts |
| Architecture rigor | Provides event schema, traceability and outcome joins |
| Product rigor | Defines funnel, cohorts, resistance signals and backlog actions |
| Risk rigor | Includes override, review load, over-reliance and customer impact |
| Value rigor | Calculates net realized value and value leakage |
| Executive readiness | Produces scale/stop decisions with evidence |

25. Quality Bar

An AI adoption analytics pack is ready for senior review only if:

  • Qualified adoption is defined separately from exposure and usage.
  • Work-as-done baseline includes real exceptions and informal work.
  • Telemetry joins workflow context, model version, human action, control and outcome.
  • Metrics include leading, lagging, risk, value and durability.
  • Cohort analysis can explain manager, role, case type and risk-tier differences.
  • Value realization subtracts human review load, rework, support and AI run cost.
  • Risk/control evidence includes override, escalation, defect and complaint.
  • Operating review produces actions, not just reporting.
  • Scale/stop recommendation is explicit and evidence-backed.

Final principle:

Do not ask whether users used AI.
Ask whether AI changed governed work in a way that improved durable net outcomes.