AI Product Operations / Operating Cadence / Outcome Review Playbook

Goal: Give Senior AI PMs, AI Architects and CBAP-level BAs an executable method for establishing an AI product's post-launch operating cadence, evidence review, metric governance, incident learning, cost and capacity management, and roadmap decision loop.

Use cases: contact-center agent assist, complaint intelligence, KYC onboarding, collections hardship, AML triage, personalized pricing governance, internal copilots and regulated workflow automation.

Core principle: AI Product Ops is not a meeting regime. It puts outcome, risk, adoption, quality, cost, release, incident and action closure into a single post-launch operating system.


1. Target Audience

| Audience | What this playbook helps them do |
|---|---|
| Senior AI PM | Run the weekly ops review, monthly value review and roadmap decision loop |
| AI Architect | Design the release calendar, runtime telemetry, SLO, version trace and evidence dashboard |
| CBAP-level BA | Establish the metric contract, assumption ledger, action closure and complaint/incident learning |
| Product Operations Lead | Maintain the operating calendar, review pack, decision log and forum hygiene |
| Operations Leader | Manage adoption, capacity, review queue, coaching and frontline feedback |
| Risk / Compliance Partner | Connect post-launch monitoring to policy drift, customer harm and control effectiveness |
| Data / Analytics Partner | Maintain the outcome join, metric definitions, segmentation and evidence quality |
| Finance / Value Office | Evaluate net realized value, unit economics, portfolio allocation and retirement |

2. Learning Objectives

After completing this playbook, you should be able to:

  1. Establish a 90-day post-launch operating cadence for a launched AI product.
  2. Define the agenda, inputs and outputs of the weekly ops review, monthly value review and quarterly portfolio review.
  3. Write a metric contract covering outcome, adoption, quality, risk, cost, reliability and learning.
  4. Design an evidence review pack that supports scale, restrict, redesign, retire, release and rollback decisions.
  5. Run a model / prompt / data / knowledge / tool / policy release calendar.
  6. Establish an experiment registry, assumption ledger, decision log and action closure register.
  7. Turn incidents, complaints, near misses, policy drift and capacity issues into backlog items.
  8. Design the RACI, dashboards, anti-patterns, interview answers and portfolio exercise for financial retail cases.

3. Executive Summary

A launched AI product keeps changing:

model behavior changes
prompt changes
knowledge changes
tool permissions change
policy changes
workflow behavior changes
cost profile changes
customer and regulator expectations change

The core task of AI Product Ops is therefore to establish an evidence cadence:

Telemetry and samples
  -> metric contract
  -> evidence review pack
  -> weekly / monthly / quarterly decisions
  -> backlog and release calendar
  -> action closure
  -> roadmap and portfolio update

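A mature post-launch operating system contains at least 12 assets:

| Asset | Purpose |
|---|---|
| Operating calendar | Fixes the weekly, monthly, quarterly, release, experiment and incident reviews |
| Metric contract | Defines the metric definition, owner, threshold, guardrail and action rule |
| Evidence review pack | Puts outcome, adoption, quality, risk, cost and incident on the decision page |
| Weekly ops agenda | Handles adoption, quality, SLO, capacity, incident and open actions |
| Monthly value review | Evaluates net value, value leakage, risk trend and scale/stop |
| Quarterly portfolio review | Compares use cases, funding, risk concentration and platform leverage |
| Release calendar | Manages model, prompt, data, knowledge, tool, policy and workflow changes |
| Experiment registry | Manages hypotheses, cohorts, metrics, guardrails, results and decisions |
| Decision log | Records key decisions and their evidence basis |
| Assumption ledger | Tracks whether value, risk, adoption and cost assumptions still hold |
| Incident-to-roadmap loop | Turns complaints, near misses and failures into roadmap learning |
| Action closure register | Prevents meeting actions from being closed without evidence |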

4. Source Anchors

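These sources are used to align the language of governance, management systems, requirements engineering, engineering performance, observability and reliability. This document does not provide legal, regulatory, audit or certification opinions.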

| Source | Link | How this playbook uses it |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Use Govern / Map / Measure / Manage to organize post-launch risk monitoring and action closure |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Use the AI management system's objectives, operation, performance evaluation and improvement to organize the operating cadence |
| ISO/IEC/IEEE 29148 | https://www.iso.org/standard/72089.html | Use requirements traceability and verification thinking to define the metric contract, assumption ledger and decision log |
| DORA | https://dora.dev/ | Use delivery performance and reliability thinking to manage release cadence, change failure and restore |
| OpenTelemetry | https://opentelemetry.io/docs/ | Use traces, metrics and logs thinking to design AI product runtime evidence |
| Google SRE SLO | https://sre.google/sre-book/service-level-objectives/ | Use SLO / threshold / error budget thinking to define AI workflow reliability |

5. Operating Setup

5.1 Entry Criteria

Use this playbook when an AI capability has at least one of the following:

  • controlled pilot with real users.
  • production release for a target workflow.
  • monthly executive value review requirement.
  • regulated customer, operations, risk or cost exposure.
  • repeat changes to model, prompt, data, knowledge, tool or policy.
  • complaints, incidents, near misses or adoption concerns.

5.2 Required Inputs

| Input | Minimum quality |
|---|---|
| Product scope | named workflow, users, case types, customer impact |
| Baseline | pre-AI outcome, quality, cost, risk, capacity |
| Initial release package | model/prompt/data/tool versions, eval results, rollback path |
| Risk tier | customer harm, compliance, autonomy, reversibility |
| Telemetry | user action, workflow context, version, outcome join |
| Owner map | PM, architect, ops, risk, analytics, support, finance |
| Existing incidents | complaint, defect, override, QA, support, reliability history |

5.3 First 10 Days

| Day | Action | Output |
|---|---|---|
| 1 | Confirm scope and use-case owner | product ops charter |
| 2 | List post-launch decisions needed | decision inventory |
| 3 | Draft metric contract for top 10 metrics | metric contract v1 |
| 4 | Build evidence review pack outline | review pack v1 |
| 5 | Create operating calendar | cadence calendar |
| 6 | Create release calendar | release calendar v1 |
| 7 | Create experiment registry | experiment registry v1 |
| 8 | Create incident-to-roadmap taxonomy | incident learning map |
| 9 | Create dashboard wireframe | weekly / monthly / portfolio dashboard |
| 10 | Run first weekly ops review simulation | action closure register |

6. Cadence Architecture

6.1 Cadence Map

| Cadence | Owner | Attendees | Primary decision |
|---|---|---|---|
| Daily signal triage | Ops / Platform | PM, Architect, Ops, Support, Risk by severity | mitigate, rollback, escalate |
| Weekly ops review | AI PM | PM, BA, Architect, Ops, Analytics, Risk, Platform | fix, assign, close, adjust release |
| Monthly value review | AI PM / Sponsor | Sponsor, PM, Finance, Risk, Ops, Analytics | scale, restrict, redesign, retire |
| Quarterly portfolio review | Value Office | Executives, Portfolio, Platform, Risk, Finance | fund, pause, consolidate, retire |
| Release review | Architect / Release Lead | PM, Risk, Platform, Data, Ops | approve, canary, hold, rollback |
| Incident review | Ops / Risk | PM, Architect, Risk, Ops, Support, Data | root cause, corrective action, recurrence prevention |
| Experiment review | PM / Analytics | PM, BA, Data Science, Ops, Risk | continue, scale, stop, redesign |

6.2 Cadence Design Rules

  1. Every recurring forum has a decision type.
  2. Every review pack starts with decision requested.
  3. Every metric in executive review has a contract.
  4. Every action has owner, due date, closure evidence and reopen trigger.
  5. Every model / prompt / data / tool / policy change appears on the release calendar.
  6. Every incident creates or updates at least one backlog, metric, eval, process or policy item.
  7. Every monthly value review updates the assumption ledger.
  8. Every quarterly review can decide to retire, not only expand.

7. Weekly Ops Review

7.1 Purpose

Weekly ops review answers:

What changed in production this week?
What signals require action?
Which actions closed with evidence?
Which release or backlog decision is needed before next review?

7.2 Agenda

| Time | Block | Inputs | Outputs |
|---|---|---|---|
| 5 min | Decision requests | proposed releases, incidents, stuck actions | agenda priority |
| 10 min | Adoption and workflow | qualified use, cohort funnel, rejection reasons | adoption action |
| 10 min | Quality and eval | QA sample, eval delta, defect classes | fix / sample / hold |
| 10 min | Reliability and SLO | latency, availability, fallback, restore | platform action |
| 10 min | Risk / complaints | complaints, overrides, policy breach, near miss | risk action / escalation |
| 10 min | Cost / capacity | cost per case, review queue, support load | capacity / cost action |
| 10 min | Release and experiments | version changes, experiment readout | release / experiment decision |
| 10 min | Action closure | last actions, closure evidence, blockers | close / reopen / escalate |

7.3 Weekly Review Questions

| Domain | Questions |
|---|---|
| Adoption | Are target users using AI in the intended workflow step? Which cohorts are dropping off? |
| Quality | Which failure classes changed? Are defects tied to a release version? |
| Reliability | Are SLOs met for the workflow moment where AI is needed? |
| Risk | Did any complaint, override or QA sample indicate customer harm or policy drift? |
| Cost | Is unit cost aligned to value? Is human review load growing faster than usage? |
| Release | Did any model/prompt/data/tool/policy change create outcome movement or regression? |
| Action | Which actions truly closed, and what evidence proves closure? |

7.4 Weekly Outputs

Use this format for each action:

| action_id | source_signal | action | owner | due_date | closure_evidence | reopen_trigger | forum |
|---|---|---|---|---|---|---|---|
| ACT-001 | contact-center repeat contact up 8% in billing calls | sample 50 edited suggestions and update billing knowledge article | Ops QA Lead | 2026-07-07 | QA sample shows defect class reduced below warning threshold | repeat contact remains above threshold for two weekly reviews | Weekly Ops |

No action should be accepted without closure evidence.
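
The acceptance rule can be checked mechanically before an action enters the register. The following is a minimal sketch, assuming the action row is held as a Python record; the `Action` class and the helper names `missing_fields` and `can_accept` are hypothetical, and the required fields follow cadence design rule 4 (owner, due date, closure evidence, reopen trigger).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Action:
    """One row of the weekly action table; field names mirror its columns."""
    action_id: str
    source_signal: str
    action: str
    owner: str
    due_date: date  # a due date is always present because the type requires it
    closure_evidence: str
    reopen_trigger: str
    forum: str


# Text fields that must be non-blank before the forum accepts the action.
REQUIRED_TEXT_FIELDS = ("owner", "closure_evidence", "reopen_trigger")


def missing_fields(item: Action) -> list[str]:
    """Return the required text fields that are empty or whitespace only."""
    return [name for name in REQUIRED_TEXT_FIELDS if not getattr(item, name).strip()]


def can_accept(item: Action) -> bool:
    """An action is accepted only when no required field is missing."""
    return not missing_fields(item)
```

Returning the list of missing fields, rather than a bare boolean, lets the forum chair send the action back with a specific request.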


8. Monthly Value Review

8.1 Purpose

Monthly value review answers:

Is this AI capability improving business outcomes after adoption, risk, cost, capacity and incident learning are included?

8.2 Required Pack

| Section | Required evidence |
|---|---|
| Decision requested | continue, scale, restrict, redesign, retire |
| Business outcome | baseline, current, trend, cohort, confidence |
| Adoption durability | returning qualified use, manager reinforcement, work-as-done evidence |
| Quality / control | eval trend, QA defects, control overrides, policy breaches |
| Risk / customer impact | complaints, harm signals, vulnerable customer impact, fairness/conduct indicators |
| Cost / capacity | unit cost, review load, support load, platform spend |
| Value leakage | rework, human review, delay, support, remediation, redress |
| Release / experiment impact | changes made and observed effects |
| Assumption ledger update | confirmed, weakened, invalidated assumptions |
| Recommendation | specific decision, next evidence needed, review trigger |

8.3 Decision Matrix

| Evidence pattern | Decision |
|---|---|
| Outcome improves, adoption durable, risk stable, cost acceptable | scale |
| Outcome improves only in low-risk cohort | scale selectively |
| Usage high, outcome flat, cost rising | redesign or restrict |
| Outcome improves, complaints rise | restrict and investigate |
| Adoption low, users cite workflow friction | product / workflow redesign |
| Adoption low, value thesis weak | retire or park |
| Quality defects tied to recent release | rollback or hotfix |
| Value depends on manual review load that cannot scale | redesign operating model before expansion |
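
Part of the decision matrix can be expressed as an ordered rule list, which makes the precedence between patterns explicit. This is only a sketch covering four of the eight rows. The `Signals` fields, the rule order (release defects and rising complaints are checked before "scale") and the fallback text are assumptions for illustration, not part of the playbook.

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class Signals(NamedTuple):
    """Boolean summary of the monthly evidence pack (hypothetical encoding)."""
    outcome_improves: bool
    adoption_durable: bool
    risk_stable: bool
    cost_acceptable: bool
    usage_high: bool = False
    complaints_rising: bool = False
    defect_tied_to_release: bool = False


# (predicate, decision) pairs evaluated top to bottom; the first match wins.
RULES: list[tuple[Callable[[Signals], bool], str]] = [
    (lambda s: s.defect_tied_to_release, "rollback or hotfix"),
    (lambda s: s.outcome_improves and s.complaints_rising, "restrict and investigate"),
    (
        lambda s: s.outcome_improves and s.adoption_durable
        and s.risk_stable and s.cost_acceptable,
        "scale",
    ),
    (
        lambda s: s.usage_high and not s.outcome_improves and not s.cost_acceptable,
        "redesign or restrict",
    ),
]


def recommend(s: Signals) -> str:
    """Return the first matching decision, or hand the case back to the forum."""
    for predicate, decision in RULES:
        if predicate(s):
            return decision
    return "no matching pattern: discuss in review forum"
```

The output is a starting recommendation for the review forum; the matrix rows that need cohort-level or qualitative evidence are deliberately left to human judgment.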

8.4 Monthly Decision Memo

# Monthly AI Value Review Decision Memo

## Capability
- Product:
- Workflow:
- Target population:
- Current version set:

## Decision Requested
- Recommendation:
- Decision deadline:
- Decision owner:

## Outcome Evidence
- Baseline:
- Current:
- Movement:
- Confidence:

## Adoption Evidence
- Qualified use:
- Cohort durability:
- Work-as-done observation:

## Risk / Control Evidence
- Complaints:
- Overrides:
- Policy drift:
- Customer harm signals:

## Cost / Capacity Evidence
- Cost per case:
- Human review load:
- Support load:
- Platform capacity:

## Release / Experiment Evidence
- Recent releases:
- Experiment result:
- Regressions:

## Assumption Ledger Update
- Confirmed:
- Weakened:
- Invalidated:

## Decision and Actions
- Decision:
- Actions:
- Closure evidence:
- Next review trigger:

9. Quarterly Portfolio Review

9.1 Purpose

Quarterly review compares AI capabilities across value, risk, cost, platform leverage and organizational capacity.

9.2 Scorecard

| Dimension | Rating guide |
|---|---|
| Net value | finance-supported benefit after cost and leakage |
| Adoption durability | sustained qualified use across target cohorts |
| Risk stability | risk signals within appetite, controls effective |
| Evidence maturity | telemetry, sampling, experiment and finance evidence quality |
| Platform reuse | reusable model gateway, eval, RAG, tool, policy patterns |
| Capacity burden | SME review, operations queue, platform support, risk review load |
| Strategic fit | alignment to business capability and product strategy |
| Retirement logic | value weak, risk high, platform cost high, better alternative available |

9.3 Portfolio Decisions

| Decision | When to use |
|---|---|
| Fund | strong evidence, scalable economics, strategic fit |
| Scale | proven cohort, acceptable risk, operational capacity available |
| Restrict | value only in specific population or risk high in certain segments |
| Consolidate | multiple teams built similar capability |
| Platformize | repeated pattern should become reusable golden path |
| Pause | evidence insufficient or operating capacity constrained |
| Retire | assumptions failed or risk/cost exceeds value |

9.4 Portfolio Review Output

| capability | decision | evidence_basis | funding_change | platform_dependency | risk_condition | next_review |
|---|---|---|---|---|---|---|
| KYC document assistant | scale selectively | cycle time down in low-risk retail, EDD defects unchanged | fund 2 engineers for workflow integration | document extraction quality dashboard | no expansion to complex entity until false-pass sample remains clean | 2026-Q3 monthly value review |

10. Metric Contract Template

10.1 Contract Fields

# Metric Contract: [metric_id]

## Business Question
- Decision this metric supports:

## Definition
- Numerator:
- Denominator:
- Inclusion rules:
- Exclusion rules:
- Time window:

## Population
- Workflow:
- User roles:
- Case types:
- Channel:
- Risk tier:

## Source and Trace
- Source systems:
- Event names:
- Version fields:
- Join keys:
- Data latency:

## Owner
- Business owner:
- Analytics owner:
- Risk reviewer:

## Thresholds and Action Rules
- Target:
- Warning:
- Breach:
- Stop rule:
- Action when warning:
- Action when breach:

## Guardrails
- Customer harm guardrail:
- Cost guardrail:
- Quality guardrail:
- Risk guardrail:

## Segmentation
- Required cohort cuts:
- Vulnerable customer / protected class proxy review:
- Manager/team cut:

## Evidence Quality
- Level:
- Sampling or reconciliation approach:
- Last reviewed:
- Next review:

10.2 Example: Contact-Center Repeat Contact

| Field | Example |
|---|---|
| metric_id | cc_agent_assist.repeat_contact_7d |
| business question | Did agent assist reduce AHT without pushing customers to call again? |
| numerator | customers with another contact within 7 days for same intent |
| denominator | assisted calls for eligible intents |
| required cuts | intent, team, agent tenure, suggestion acceptance, prompt version |
| guardrail | if repeat contact rises while AHT falls, monthly value cannot claim net improvement |
| action rule | warning triggers 50-call QA sample; breach triggers release hold for affected intent |
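
The repeat-contact contract above can be sketched in code to show how the definition, thresholds, action rule and guardrail fit together. The threshold values are hypothetical placeholders (the contract does not state them), and the function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Warning and breach levels; real values come from the metric contract."""
    warning: float
    breach: float


def repeat_contact_rate(repeat_contacts: int, assisted_calls: int) -> float:
    """Numerator over denominator, as defined in the contract."""
    if assisted_calls <= 0:
        raise ValueError("denominator must be positive")
    return repeat_contacts / assisted_calls


def action_for(rate: float, thresholds: Thresholds) -> str:
    """Map the observed rate to the contract's action rule."""
    if rate >= thresholds.breach:
        return "release hold for affected intent"
    if rate >= thresholds.warning:
        return "50-call QA sample"
    return "no action"


def guardrail_blocks_claim(aht_delta: float, repeat_delta: float) -> bool:
    """True when AHT fell while repeat contact rose, so no net-improvement claim."""
    return aht_delta < 0 and repeat_delta > 0
```

For example, 120 repeat contacts on 1,000 assisted calls gives a rate of 0.12, which under assumed thresholds of 0.10 (warning) and 0.15 (breach) triggers the QA sample but not the release hold.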

10.3 Example: KYC False Pass

| Field | Example |
|---|---|
| metric_id | kyc_ai.document_false_pass_sample_rate |
| business question | Is onboarding speed improvement weakening document completeness controls? |
| numerator | sampled cases where AI marked document complete but QA found material defect |
| denominator | QA sampled AI-complete cases |
| required cuts | document type, entity type, geography, channel, model/prompt version |
| stop rule | any severe false pass in high-risk entity type triggers expansion hold |
| action rule | update eval set, increase human review threshold, review policy pack |

11. Evidence Review Pack Template

11.1 Standard Pack

# AI Product Operations Evidence Review Pack

## Decision Requested
- Forum:
- Requested decision:
- Deadline:
- Owner:

## Product and Version Scope
- Product:
- Workflow:
- Population:
- Model version:
- Prompt version:
- Data / knowledge version:
- Tool version:
- Policy version:

## Outcome Evidence
| metric | baseline | current | trend | evidence quality | interpretation |
|---|---|---|---|---|---|

## Adoption Evidence
| cohort | exposure | qualified use | accept/edit/reject | durability | issue |
|---|---|---|---|---|---|

## Quality Evidence
| failure class | count/rate | affected version | severity | action |
|---|---|---|---|---|

## Risk / Control Evidence
| signal | population | severity | control impact | action |
|---|---|---|---|---|

## Cost / Capacity Evidence
| metric | current | threshold | driver | action |
|---|---|---|---|---|

## Release and Experiment Evidence
| item | release/experiment | population | observed impact | decision |
|---|---|---|---|---|

## Incident and Complaint Learning
| signal | root cause | corrective action | roadmap impact | closure evidence |
|---|---|---|---|---|

## Assumption Ledger Update
| assumption | status | evidence | decision impact |
|---|---|---|---|

## Action Closure
| action | owner | due date | closure evidence | status |
|---|---|---|---|---|

11.2 Evidence Quality Levels

| Level | Name | Meaning |
|---|---|---|
| E1 | anecdotal | useful signal but not decision-grade |
| E2 | sampled | structured QA / complaint / interview sample |
| E3 | instrumented | telemetry joined to workflow and version context |
| E4 | experiment-grade | controlled, matched, cohort or time-series analysis |
| E5 | certified | reconciled with finance, risk, audit or model validation evidence |
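
Because the levels are ordered, they can be modelled as an ordered enumeration so that a review pack can be compared against a minimum level. This is a sketch only: the idea that a pack is rated by its weakest cited evidence, and any minimum level a forum might require, are assumptions rather than rules stated in this playbook.

```python
from enum import IntEnum
from typing import Iterable


class EvidenceLevel(IntEnum):
    """Evidence quality levels E1 to E5 from the table above."""
    E1_ANECDOTAL = 1
    E2_SAMPLED = 2
    E3_INSTRUMENTED = 3
    E4_EXPERIMENT_GRADE = 4
    E5_CERTIFIED = 5


def weakest(levels: Iterable[EvidenceLevel]) -> EvidenceLevel:
    """Rate a pack by its weakest cited evidence (assumption, see lead-in)."""
    return min(levels)  # raises ValueError when no evidence is cited


def meets(level: EvidenceLevel, minimum: EvidenceLevel) -> bool:
    """True when the level is at or above the minimum the forum requires."""
    return level >= minimum
```

A pack citing E3, E2 and E5 evidence would rate as E2 under this convention, which signals that the sampled item is the one to strengthen first.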

12. Release Calendar

12.1 Release Objects

| Object | Examples | Required review |
|---|---|---|
| Model | provider upgrade, fallback model, model class change | eval, latency, cost, risk, rollback |
| Prompt | system prompt, tool instruction, response format | regression eval, policy review by risk tier |
| Data | feature refresh, label update, training sample change | lineage, privacy, bias, validation |
| Knowledge | policy corpus, FAQ, product terms, scripts | freshness, authority, retrieval quality |
| Tool | CRM write, fee waiver, case closure, document request | authorization, side effect, audit, dual control |
| Policy | hardship policy, complaint taxonomy, AML escalation | legal/compliance, frontline comms, eval update |
| Workflow | UI step, routing, human review threshold | adoption, capacity, control impact |
| Dashboard | metric definition, threshold, evidence view | metric owner approval |

12.2 Calendar Fields

| release_id | date | object_type | change | affected_population | risk_tier | evidence_required | owner | canary | rollback | review_date | decision_log |
|---|---|---|---|---|---|---|---|---|---|---|---|

12.3 Release Rules

  • Prompt and knowledge changes are production changes.
  • Tool permission changes require side-effect review.
  • Policy changes require eval and frontline communication updates.
  • Model changes require cost, latency and quality comparison.
  • Data changes require lineage and segmentation review.
  • Every release has a post-release impact review date.
  • Release calendar is reviewed in weekly ops and monthly value review.
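
Several of the release rules can be enforced as a validation step on each calendar entry. The sketch below assumes a simplified `Release` record with only the fields needed for the check; the evidence tags (`cost`, `side_effect`, `frontline_comms` and so on) are hypothetical labels chosen to mirror the rules above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional

# Evidence each object type must carry, taken from the release rules above.
REQUIRED_EVIDENCE = {
    "model": {"cost", "latency", "quality"},
    "tool": {"side_effect"},
    "policy": {"eval", "frontline_comms"},
    "data": {"lineage", "segmentation"},
}


@dataclass(frozen=True)
class Release:
    release_id: str
    release_date: date
    object_type: str
    evidence_required: frozenset[str]
    review_date: Optional[date]


def violations(release: Release) -> list[str]:
    """List the release rules this calendar entry does not satisfy."""
    problems = []
    required = REQUIRED_EVIDENCE.get(release.object_type, set())
    missing = required - release.evidence_required
    if missing:
        problems.append("missing evidence: " + ", ".join(sorted(missing)))
    if release.review_date is None:
        problems.append("no post-release impact review date")
    elif release.review_date <= release.release_date:
        problems.append("review date must fall after the release date")
    return problems
```

An empty result means the entry passes these checks; it does not replace the release review itself.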

13. Experiment Registry

13.1 Registry Fields

| experiment_id | hypothesis | population | treatment | control_or_baseline | primary_metric | guardrails | duration | owner | result | decision |
|---|---|---|---|---|---|---|---|---|---|---|

13.2 Example Experiments

| Use case | Hypothesis | Guardrails |
|---|---|---|
| Agent assist | shorter, policy-cited suggestions reduce AHT and edits | repeat contact, QA script defect, complaint |
| Complaint intelligence | new taxonomy improves regulatory complaint routing | false negative regulatory sample, cycle time |
| KYC onboarding | document completeness assistant reduces customer chase | false pass, abandonment, EDD escalation |
| Collections hardship | vulnerability-aware guidance improves arrangement suitability | complaint, broken promise, vulnerable customer review |
| AML triage | typology-aware retrieval improves case narrative quality | audit sample, escalation miss, reviewer edit distance |
| Pricing governance | reason-code explanation reduces branch overrides | fairness signal, complaint, margin leakage |

13.3 Experiment Review

Experiment review asks:

  1. Did the experiment move the primary metric?
  2. Did guardrails remain within threshold?
  3. Did different cohorts respond differently?
  4. Did cost or capacity change?
  5. Which assumption changed?
  6. What release, backlog or policy decision follows?
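
The first two review questions (primary metric and guardrails) lend themselves to a simple readout helper. This is a sketch under stated assumptions: guardrails are "lower is better" rates with an upper limit, a guardrail with no observed value counts as breached, and the mapping to "stop", "scale" or "continue" is illustrative rather than prescribed by the playbook.

```python
from __future__ import annotations


def breached_guardrails(observed: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Names of guardrails whose observed value exceeds its limit.

    A guardrail without an observed value is treated as breached (assumption).
    """
    return sorted(
        name
        for name, limit in limits.items()
        if observed.get(name, float("inf")) > limit
    )


def readout(
    primary_lift: float,
    minimum_lift: float,
    observed: dict[str, float],
    limits: dict[str, float],
) -> str:
    """Suggest a decision for the experiment review forum."""
    if breached_guardrails(observed, limits):
        return "stop"
    if primary_lift >= minimum_lift:
        return "scale"
    return "continue"
```

Questions 3 to 6 (cohort differences, cost and capacity, assumptions, follow-on decisions) still need the forum's judgment and are not covered by this helper.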

14. Decision Log and Assumption Ledger

14.1 Decision Log

| decision_id | date | forum | decision | options_considered | evidence | owner | review_trigger |
|---|---|---|---|---|---|---|---|

Decision log should capture:

  • scale / restrict / redesign / retire decisions.
  • release approval or rollback.
  • metric definition changes.
  • risk acceptance or risk appetite change.
  • platform investment decision.
  • incident corrective action acceptance.

14.2 Assumption Ledger

| assumption_id | statement | type | owner | evidence | confidence | expiry | status | decision_impact |
|---|---|---|---|---|---|---|---|---|

Assumption types:

| Type | Example |
|---|---|
| Value | reducing average handle time creates net benefit after repeat contact |
| Adoption | agents will use suggestions if citations are visible |
| Risk | human review catches high-impact hallucinations |
| Cost | token cost per case remains below threshold at scale |
| Capacity | QA can sample 2% of assisted cases without queue impact |
| Policy | current hardship guidance remains valid for next quarter |
| Technical | retrieval latency supports live-call workflow |

Monthly value review updates assumption status:

  • confirmed.
  • weakened.
  • invalidated.
  • needs more evidence.
  • expired due to policy, product, model or workflow change.
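
A small helper can surface ledger entries that need attention before the monthly value review. The sketch below assumes a reduced `Assumption` record (a subset of the ledger columns) and treats an entry as overdue when its expiry date has passed and it has not already been invalidated or expired; both choices are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Status values listed above; "expired" stands for the last bullet.
STATUSES = {"confirmed", "weakened", "invalidated", "needs more evidence", "expired"}


@dataclass(frozen=True)
class Assumption:
    assumption_id: str
    statement: str
    type: str
    expiry: date
    status: str

    def __post_init__(self) -> None:
        # Reject statuses that are not part of the ledger vocabulary.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")


def overdue(ledger: list[Assumption], today: date) -> list[str]:
    """IDs of assumptions past expiry that still count as live."""
    closed = {"invalidated", "expired"}
    return [
        a.assumption_id
        for a in ledger
        if a.expiry < today and a.status not in closed
    ]
```

Running this before each monthly review gives the forum a short list of assumptions to re-confirm or retire.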

15. Incident-to-Roadmap Loop

15.1 Severity Model

| Severity | Example | Required action |
|---|---|---|
| Sev 1 | customer harm, regulatory breach, unauthorized tool action | immediate containment, executive/risk escalation, rollback/restrict |
| Sev 2 | material workflow defect, repeated complaint, major SLO breach | weekly incident review, corrective action, metric update |
| Sev 3 | localized quality regression, manageable cost spike | backlog item, targeted release, monitor |
| Sev 4 | nuisance defect, unclear signal | sample, classify, watchlist |

15.2 Incident Record

# AI Product Incident Record

## Signal
- Source:
- Date:
- Severity:
- Affected population:
- Affected versions:

## Impact
- Customer:
- Operations:
- Risk / compliance:
- Cost / capacity:

## Containment
- Immediate action:
- Rollback / restriction:
- Communication:

## Root Cause
- Model:
- Prompt:
- Data:
- Knowledge:
- Tool:
- Workflow:
- Policy:
- Training / adoption:
- Metric / monitoring:

## Corrective Actions
| action | owner | due_date | closure_evidence | reopen_trigger |
|---|---|---|---|---|

## Roadmap Impact
- Backlog item:
- Release calendar update:
- Metric contract update:
- Eval update:
- Policy / SOP update:
- Next review:

15.3 Complaint Learning

Complaint and incident reviews should classify:

| Signal | Product question |
|---|---|
| misleading answer | prompt, knowledge or policy problem |
| inconsistent treatment | segmentation, policy or training problem |
| slow resolution | workflow, routing or capacity problem |
| unfair outcome | risk appetite, guardrail or model problem |
| customer confusion | explanation, UX or frontline script problem |
| repeated escalation | AI boundary or human oversight problem |

16. Backlog Governance

16.1 Backlog Classes

| Class | Example source | Required link |
|---|---|---|
| Outcome improvement | monthly value gap | metric contract |
| Adoption fix | cohort drop-off | adoption funnel |
| Quality fix | eval failure class | QA / eval evidence |
| Risk control | complaint or policy breach | incident / risk signal |
| Cost / capacity | unit cost or queue threshold | cost dashboard |
| Reliability | SLO breach | runtime signal |
| Evidence gap | missing join or weak sample | evidence pack |
| Release dependency | model/prompt/tool/policy change | release calendar |
| Experiment | assumption needs test | experiment registry |
| Retirement | weak value or high risk | portfolio decision |

16.2 Prioritization Formula

Use a simple decision score:

priority_score =
  outcome_impact
  + risk_reduction
  + adoption_unblock
  + cost_capacity_relief
  + evidence_need
  + platform_reuse
  - delivery_effort
  - change_saturation

Risk and customer harm can override numeric score.
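
The score and the override can be sketched together. In this sketch each factor is an integer rating on a team-chosen scale (the scale is an assumption, the formula does not specify one), and the `harm_override` flag is a hypothetical way to let risk and customer harm outrank the numeric score.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    name: str
    outcome_impact: int
    risk_reduction: int
    adoption_unblock: int
    cost_capacity_relief: int
    evidence_need: int
    platform_reuse: int
    delivery_effort: int
    change_saturation: int
    harm_override: bool = False  # set when risk or customer harm must win


def priority_score(item: BacklogItem) -> int:
    """Sum of benefit factors minus effort and change saturation."""
    benefits = (
        item.outcome_impact + item.risk_reduction + item.adoption_unblock
        + item.cost_capacity_relief + item.evidence_need + item.platform_reuse
    )
    return benefits - item.delivery_effort - item.change_saturation


def rank(items: list[BacklogItem]) -> list[BacklogItem]:
    """Overridden items first, then by descending score."""
    return sorted(items, key=lambda i: (not i.harm_override, -priority_score(i)))
```

With this ordering, a low-scoring fix for a customer-harm incident still sorts above a higher-scoring feature, which matches the override rule.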

16.3 Backlog Review Rules

  • No backlog item enters top priority without evidence link.
  • Every roadmap item states expected metric movement.
  • Incident corrective actions outrank cosmetic improvements when severity is high.
  • Evidence gaps outrank scale decisions.
  • Platform reuse can outrank local feature work when multiple products repeat the same problem.
  • Retirement candidates are reviewed quarterly, not hidden at the bottom of the backlog.

17. Role / RACI

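| Activity | AI PM | AI Architect | CBAP BA | Ops | Risk / Compliance | Analytics | Platform | Finance |
|---|---|---|---|---|---|---|---|---|
| Product Ops charter | A/R | C | C | C | C | C | C | I |
| Metric contract | A/R | C | R | C | C | R | C | C |
| Weekly ops review | A/R | R | R | R | C | R | R | I |
| Monthly value review | A/R | C | R | R | C | R | C | R |
| Quarterly portfolio review | R | C | C | C | C | R | C | A/R |
| Release calendar | C | A/R | C | C | C | C | R | I |
| Experiment registry | A/R | C | C | C | C | R | C | I |
| Decision log | A/R | R | R | C | C | C | C | I |
| Assumption ledger | A/R | C | R | C | C | R | C | C |
| Incident learning | R | R | R | A/R | A/R by severity | C | R | I |
| Action closure | A/R | R | R | R | R | R | R | I |
| Dashboard design | A/R | R | R | C | C | R | C | C |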

Legend: A = accountable, R = responsible, C = consulted, I = informed.


18. Dashboard Design

18.1 Dashboard Stack

| Dashboard | Audience | Cadence | Must show |
|---|---|---|---|
| Runtime Signal Board | PM, Architect, Ops, Platform | daily | incidents, SLO, latency, fallback, cost anomaly |
| Weekly Ops Board | PM, BA, Ops, Risk | weekly | adoption, quality, risk, cost, actions |
| Monthly Value Board | Sponsor, PM, Finance, Risk | monthly | outcome, net value, value leakage, decision request |
| Release Impact Board | PM, Architect, Risk | weekly/monthly | version changes, affected population, regression |
| Portfolio Board | Executives, Value Office | quarterly | value, risk, cost, reuse, funding, retirement |
| Evidence Binder View | PM, BA, Risk, Audit | as needed | trace from metric to source, decision, action closure |

18.2 Weekly Ops Board Wireframe

Header
  decision requests | releases pending | incidents open | actions overdue

Row 1: Adoption
  qualified use by cohort | accept/edit/reject | rejection reasons

Row 2: Quality
  eval trend | QA defects | top failure classes | affected versions

Row 3: Risk and Complaints
  complaints | overrides | policy breach | customer harm watchlist

Row 4: Reliability and Cost
  SLO | latency | fallback | cost per case | review queue

Row 5: Action Closure
  open by age | blocked | due this week | reopened

18.3 Monthly Value Board Wireframe

Decision requested
  scale / restrict / redesign / retire / continue

Outcome movement
  baseline vs current | cohort | confidence | source quality

Net value
  benefit | cost | human review | support | rework | risk adjustment

Risk and adoption
  guardrails | complaints | durable qualified use | policy drift

Roadmap implication
  next release | backlog changes | assumptions changed | decision log

18.4 Dashboard Rules

  • Every chart maps to a metric contract.
  • Every threshold maps to an action rule.
  • Every trend can be segmented by cohort and version.
  • Release markers appear on outcome and defect trends.
  • Dashboard includes open action age and closure quality.
  • Executive dashboard shows decisions, not raw operational noise.
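
The first two dashboard rules can be linted automatically when a dashboard definition changes. The sketch below assumes charts are described as a mapping from chart name to `metric_id`, and that thresholds and action rules are keyed by `metric_id`; these structures and the function names are hypothetical.

```python
from __future__ import annotations


def charts_without_contract(charts: dict[str, str], contracts: set[str]) -> list[str]:
    """Chart names whose metric_id has no metric contract."""
    return sorted(
        chart for chart, metric_id in charts.items() if metric_id not in contracts
    )


def thresholds_without_action(
    thresholds: dict[str, float], action_rules: dict[str, str]
) -> list[str]:
    """Metric ids that have a threshold but no (or a blank) action rule."""
    return sorted(
        metric_id
        for metric_id in thresholds
        if not action_rules.get(metric_id, "").strip()
    )
```

Both functions return sorted lists so that the output is stable and can be pasted into the weekly ops pack as a hygiene item.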

19. Financial Retail Execution Patterns

19.1 Contact-Center Agent Assist

| Cadence | Focus |
|---|---|
| Weekly ops | accept/edit/reject by intent, repeat contact, script defects, latency |
| Monthly value | AHT net of repeat contact, QA, complaint, cost per assisted call |
| Release calendar | prompt wording, knowledge article, CRM action, call intent rollout |
| Incident loop | misleading fee explanation, complaint spike, policy article drift |
| Backlog | high-edit intents, citation UI, supervisor coaching, knowledge freshness SLO |

19.2 Complaint Intelligence

| Cadence | Focus |
|---|---|
| Weekly ops | false negative sample, routing delay, taxonomy confusion |
| Monthly value | cycle time, remediation closure, regulatory complaint capture |
| Release calendar | taxonomy update, root cause labels, policy corpus |
| Incident loop | missed regulatory complaint updates eval, routing, QA sample |
| Backlog | explanation of classification, root cause action tracker, dashboard for legal |

19.3 KYC Onboarding

| Cadence | Focus |
|---|---|
| Weekly ops | document completeness, rework, false pass sample, customer chase |
| Monthly value | onboarding cycle time, abandonment, EDD escalation, cost per application |
| Release calendar | document parser, policy rule, threshold, country/entity rollout |
| Incident loop | severe false pass triggers expansion hold and eval update |
| Backlog | missing document reason codes, customer notification wording, reviewer queue |

19.4 Collections Hardship

| Cadence | Focus |
|---|---|
| Weekly ops | arrangement suitability, vulnerability flag, agent override, complaint |
| Monthly value | kept promise rate, broken arrangement, customer harm guardrail |
| Release calendar | hardship policy, conversation script, escalation workflow |
| Incident loop | inappropriate pressure complaint updates prompt and training |
| Backlog | vulnerability-aware guidance, escalation UX, manager coaching report |

19.5 AML Triage

| Cadence | Focus |
|---|---|
| Weekly ops | alert aging, reviewer edit distance, narrative defects, typology feedback |
| Monthly value | analyst throughput, audit sample, escalation quality, SAR prep support |
| Release calendar | typology knowledge, retrieval corpus, explanation template |
| Incident loop | missed typology updates eval, corpus, sampling and risk review |
| Backlog | evidence citation, scenario-specific retrieval, investigator feedback capture |

19.6 Personalized Pricing Governance

| Cadence | Focus |
|---|---|
| Weekly ops | reason-code quality, overrides, complaint, segment impact |
| Monthly value | margin/conversion net of fairness and conduct guardrails |
| Release calendar | pricing model, eligibility policy, explanation wording |
| Incident loop | unfair treatment signal triggers restrict and risk review |
| Backlog | branch override reasons, fairness dashboard, policy drift alerts |

20. 90-Day Rollout Plan

Days 1-15: Foundation

| Day range | Work | Output |
|---|---|---|
| 1-3 | Confirm use-case scope, owners, risk tier, review forums | Product Ops charter |
| 4-6 | Draft metric contracts for top metrics | Metric contract pack |
| 7-9 | Build review pack and action closure register | Evidence pack v1 |
| 10-12 | Create release calendar and experiment registry | Release / experiment system |
| 13-15 | Run first weekly ops review | Decision log and actions |

Days 16-35: Evidence Quality

| Day range | Work | Output |
|---|---|---|
| 16-20 | Join telemetry to workflow outcome and version context | Evidence data map |
| 21-25 | Add complaint, QA, incident and cost feeds | Risk / cost signal board |
| 26-30 | Run first monthly value review | Value decision memo |
| 31-35 | Update assumption ledger and backlog governance | Evidence-driven backlog |

Days 36-60: Release and Incident Learning

| Day range | Work | Output |
|---|---|---|
| 36-42 | Operationalize release calendar | Version trace and review dates |
| 43-48 | Run experiment review cycle | Experiment readout and decisions |
| 49-54 | Run incident tabletop | Incident-to-roadmap test |
| 55-60 | Improve dashboards and action closure | Dashboard v2 and closure quality |

Days 61-90: Portfolio Governance

| Day range | Work | Output |
|---|---|---|
| 61-70 | Build quarterly portfolio scorecard | Portfolio review pack |
| 71-78 | Identify scale, restrict, redesign, retire candidates | Decision recommendations |
| 79-85 | Link platform investment to recurring pain points | Platform roadmap input |
| 86-90 | Run quarterly portfolio review simulation | Funding and roadmap decisions |

21. Templates

21.1 Product Ops Charter

# AI Product Operations Charter

## Product
- Capability:
- Workflow:
- Target users:
- Customer / operations impact:
- Risk tier:

## Operating Cadence
- Daily triage:
- Weekly ops review:
- Monthly value review:
- Quarterly portfolio review:
- Release review:
- Incident review:

## Decision Rights
- Decisions owned by product:
- Decisions requiring risk review:
- Decisions requiring platform review:
- Decisions requiring executive review:

## Evidence Assets
- Metric contracts:
- Review pack:
- Release calendar:
- Experiment registry:
- Decision log:
- Assumption ledger:
- Action closure register:

## Success Criteria
- Outcome:
- Adoption:
- Quality:
- Risk:
- Cost:
- Reliability:
- Learning:

21.2 Action Closure Register

| action_id | source | action | owner | due_date | closure_evidence | reviewer | status | reopen_trigger |
|---|---|---|---|---|---|---|---|---|

21.3 Release Impact Review

| release_id | release_type | expected_effect | observed_effect | regressions | guardrail_status | decision |
|---|---|---|---|---|---|---|

21.4 Assumption Review

| assumption_id | previous_status | new_evidence | new_status | roadmap_impact | next_review |
|---|---|---|---|---|---|

22. Anti-Patterns

| Anti-pattern | Symptom | Correction |
|---|---|---|
| Usage replaces outcome | reports prompt count and MAU only | use outcome chain and net value |
| Cadence without decision | recurring meetings with no decision requested | start every pack with decision |
| Metric without owner | dashboard disputes every month | metric contract with owner and action rule |
| Evidence without trace | cannot link metric to version, workflow, cohort | version-aware telemetry and evidence map |
| Release invisibility | prompt/knowledge/tool changes not reviewed | unified AI release calendar |
| Incident amnesia | same defect returns under new release | incident-to-roadmap loop and recurrence review |
| Backlog politics | loud stakeholder beats evidence | evidence-linked backlog classes |
| Cost blindness | platform spend treated as shared overhead | cost per case and capacity review |
| Risk theater | policy documents exist but no runtime signals | risk metrics, complaint sampling, action closure |
| Portfolio optimism | weak capabilities continue because already funded | quarterly retire / pause decisions |

23. Interview Answers

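Q1: How do you establish a post-launch operating cadence for an AI product?

I would set up daily triage, a weekly ops review, a monthly value review and a quarterly portfolio review. The weekly review handles adoption, quality, SLO, risk, cost, incidents and open actions; the monthly review judges outcome, net value, value leakage and scale/stop; the quarterly review compares portfolio value, risk concentration, platform reuse and funding. Every forum is connected through metric contracts, evidence packs, the decision log and action closure, so a review is not a status report but part of a continuous decision system.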

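Q2: Why does AI Product Ops need a release calendar?

Because production changes to an AI product are not limited to code. Changes to the model, prompt, knowledge base, data, tool permissions, policy and workflow all alter user outcomes and risk. If these changes are not recorded on the same release calendar, incidents and metric shifts cannot be traced back to a cause. A good release calendar records affected population, risk tier, evidence required, canary, rollback, review date and decision log.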
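
The traceability argument can be illustrated with a short sketch: given a unified calendar, list the releases that landed shortly before a metric shift. `ReleaseEntry`, `candidate_releases` and the 14-day lookback are hypothetical choices for this example, and only a subset of the calendar fields named above is modeled.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ReleaseEntry:
    """One entry on a unified AI release calendar (subset of fields, for illustration)."""
    release_id: str
    release_type: str          # e.g. model, prompt, data, knowledge, tool, policy, workflow
    released_on: date
    affected_population: str
    risk_tier: str
    review_date: date


def candidate_releases(calendar: list[ReleaseEntry],
                       metric_shift_on: date,
                       lookback_days: int = 14) -> list[ReleaseEntry]:
    """Return releases inside the lookback window before a metric shift, newest first."""
    start = metric_shift_on - timedelta(days=lookback_days)
    hits = [r for r in calendar if start <= r.released_on <= metric_shift_on]
    return sorted(hits, key=lambda r: r.released_on, reverse=True)
```

If prompt or knowledge changes were tracked elsewhere, they would simply be missing from the returned list, which is the gap a single calendar is meant to close.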

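Q3: How do you keep a dashboard from turning into vanity metrics?

Every dashboard metric must have a metric contract. The contract defines the business question, metric definition, population, source, owner, threshold, guardrail, segmentation and action rule. For example, AHT (average handle time) for agent assist must be read together with repeat contact, QA defect, complaint and cost per assisted call. A metric without an action rule does not go into executive review.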
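
A metric contract with the fields listed above can be sketched as a record plus a completeness check. This is a hedged illustration: `MetricContract`, `missing_fields` and `eligible_for_executive_review` are hypothetical names, and treating any empty field as disqualifying is an assumption that generalizes the "no action rule, no executive review" rule.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class MetricContract:
    """Contract for one dashboard metric (fields follow the list in the answer above)."""
    name: str
    business_question: str
    definition: str
    population: str
    source: str
    owner: str
    threshold: str
    guardrails: tuple[str, ...]
    segmentation: tuple[str, ...]
    action_rule: str


def missing_fields(contract: MetricContract) -> list[str]:
    """Return the names of contract fields that are empty."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


def eligible_for_executive_review(contract: MetricContract) -> bool:
    """Assumption: a metric qualifies only when every contract field is filled in."""
    return not missing_fields(contract)


# Illustrative values only; none of these numbers come from the playbook.
aht = MetricContract(
    name="AHT",
    business_question="Does agent assist shorten calls without harming customers?",
    definition="Average handle time per assisted call",
    population="Assisted inbound calls",
    source="Contact-center telephony logs",
    owner="Operations Leader",
    threshold="Placeholder target agreed in the contract",
    guardrails=("repeat contact", "QA defect", "complaint", "cost per assisted call"),
    segmentation=("queue", "agent tenure"),
    action_rule="Pause rollout for a segment if any guardrail breaches its limit",
)
draft = replace(aht, action_rule="")   # same metric, but no action rule yet
```

Here `eligible_for_executive_review(aht)` is true while the `draft` copy fails the check, because its `action_rule` is empty.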

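Q4: After an AI incident, how do you drive product improvement?

I would first contain or roll back according to severity, then run a root cause analysis and classify the cause as model, prompt, data, knowledge, tool, workflow, policy, training or monitoring. Next I would update the corrective actions, metric contract, eval set, release calendar, backlog and decision log. Finally I would use action closure evidence and a recurrence review to prove that the problem was actually fixed, not merely that the ticket was closed.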

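Q5: What is the difference between the monthly value review and the quarterly portfolio review?

The monthly value review looks at a single capability and judges whether outcome, adoption, risk, cost and value leakage support a scale, restrict, redesign or retire decision. The quarterly portfolio review compares multiple capabilities and decides funding, platform investment, risk concentration, capacity allocation and retirement. The former is a product value judgment; the latter is an investment portfolio judgment.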

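Q6: What advanced value does a CBAP-level BA bring to AI Product Ops?

A BA does more than record requirements. In the post-launch phase, the BA can maintain work-as-done evidence, metric contracts, the assumption ledger, the complaint/incident taxonomy, action closure and policy drift impact. The BA's strength is connecting processes, rules, exceptions, control points and user behavior to outcome evidence, so that the PM and the architect can see how the real work system is changing.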

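Q7: How do you explain action closure to executives?

Action closure does not mean "the owner says it is done". It requires closure evidence for every action, for example the defect rate falling back within threshold, a clean QA sample, a knowledge base version going live, a release impact review passing, or complaint recurrence stopping. Without closure evidence, an action is only activity, not an operational improvement.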


24. Portfolio Exercise

24.1 Assignment

Choose one financial retail AI capability. One of the following is recommended:

  1. Contact-center agent assist.
  2. Complaint intelligence.
  3. KYC onboarding assistant.
  4. Collections hardship guidance.
  5. AML triage assistant.
  6. Personalized pricing governance.

Design a 90-day AI Product Ops operating cadence for it.

24.2 Deliverables

| Deliverable | Minimum content |
|---|---|
| Product Ops Charter | scope, owner, risk tier, cadence, decision rights |
| Operating Calendar | daily, weekly, monthly, quarterly, release, incident, experiment |
| Metric Contract Pack | at least 12 metrics across outcome, adoption, quality, risk, cost, reliability |
| Evidence Review Pack | decision request, outcome, adoption, quality, risk, cost, release, action closure |
| Release Calendar | model, prompt, data, knowledge, tool, policy, workflow changes |
| Experiment Registry | at least 3 experiments with guardrails |
| Incident-to-Roadmap Loop | severity, root cause taxonomy, corrective action, roadmap update |
| Backlog Governance | backlog classes, priority formula, evidence links |
| Dashboard Wireframes | weekly ops, monthly value, quarterly portfolio |
| Interview Narrative | 30-second and 2-minute answer |

24.3 Scoring Rubric

| Dimension | Strong answer |
|---|---|
| Cadence clarity | Different forums have different decisions and evidence |
| Metric rigor | Metrics have contracts, thresholds, guardrails and action rules |
| Evidence quality | Uses telemetry, samples, experiments and finance/risk reconciliation |
| Post-launch realism | Includes release changes, incidents, complaints, cost and action closure |
| Financial retail fit | Handles customer harm, compliance, frontline adoption and capacity |
| Architecture depth | Connects version trace, OpenTelemetry-style observability and SLO thresholds |
| BA maturity | Models assumptions, policy drift, work-as-done and exception paths |
| Executive usefulness | Produces scale, restrict, redesign, retire and funding decisions |

25. Quality Bar

A strong AI Product Ops artifact can answer:

  1. What decision does each forum make?
  2. Which evidence proves the decision?
  3. Which metric contracts define the evidence?
  4. Which releases changed the system?
  5. Which incidents changed the roadmap?
  6. Which actions closed with proof?
  7. Which assumptions are still valid?
  8. Which capability deserves more investment, restriction or retirement?

If any of these answers is unclear, the product is not yet operating. It is only running.