AI Product Operations / Operating Cadence / Outcome Review Playbook

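Goal: give the Senior AI PM, AI Architect and CBAP-level BA an executable method for establishing, after an AI product launches, its operating cadence, evidence review, metric governance, incident learning, cost and capacity management, and a closed roadmap decision loop.

Applicable scenarios: contact-center agent assist, complaint intelligence, KYC onboarding, collections hardship, AML triage, personalized pricing governance, internal copilots and regulated workflow automation.

Core principle: AI Product Ops is not a meeting regime. It puts outcome, risk, adoption, quality, cost, release, incident and action closure into a single post-launch operating system.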

1. Target Audience

Audience	What they use this playbook for
Senior AI PM	Run the weekly ops review, monthly value review and roadmap decision loop
AI Architect	Design the release calendar, runtime telemetry, SLOs, version trace and evidence dashboard
CBAP-level BA	Build the metric contract, assumption ledger, action closure, and complaint/incident learning
Product Operations Lead	Maintain the operating calendar, review pack, decision log and forum hygiene
Operations Leader	Manage adoption, capacity, review queue, coaching and frontline feedback
Risk / Compliance Partner	Connect post-launch monitoring to policy drift, customer harm and control effectiveness
Data / Analytics Partner	Maintain outcome joins, metric definitions, segmentation and evidence quality
Finance / Value Office	Assess net realized value, unit economics, portfolio allocation and retirement

2. Learning Objectives

After completing this playbook, you should be able to:

Establish a 90-day post-launch operating cadence for a launched AI product.
Define the agenda, inputs and outputs of the weekly ops review, monthly value review and quarterly portfolio review.
Write a metric contract covering outcome, adoption, quality, risk, cost, reliability and learning.
Design an evidence review pack that supports scale, restrict, redesign, retire, release and rollback decisions.
Run a model / prompt / data / knowledge / tool / policy release calendar.
Build an experiment registry, assumption ledger, decision log and action closure register.
Turn incidents, complaints, near misses, policy drift and capacity issues into backlog items.
Design the RACI, dashboards, anti-patterns, interview answers and a portfolio exercise for financial retail cases.

3. Executive Summary

A launched AI product keeps changing:

model behavior changes
prompt changes
knowledge changes
tool permissions change
policy changes
workflow behavior changes
cost profile changes
customer and regulator expectations change

The core task of AI Product Ops is therefore to establish an evidence cadence:

Telemetry and samples
  -> metric contract
  -> evidence review pack
  -> weekly / monthly / quarterly decisions
  -> backlog and release calendar
  -> action closure
  -> roadmap and portfolio update

A mature post-launch operating system contains at least 12 assets:

Asset	Purpose
Operating calendar	Fixes the weekly, monthly, quarterly, release, experiment and incident reviews
Metric contract	Defines metric definition, owner, threshold, guardrail and action rule
Evidence review pack	Puts outcome, adoption, quality, risk, cost and incidents on one decision page
Weekly ops agenda	Handles adoption, quality, SLO, capacity, incidents and open actions
Monthly value review	Assesses net value, value leakage, risk trend and scale/stop
Quarterly portfolio review	Compares use cases, funding, risk concentration and platform leverage
Release calendar	Manages model, prompt, data, knowledge, tool, policy and workflow changes
Experiment registry	Manages hypotheses, cohorts, metrics, guardrails, results and decisions
Decision log	Records key decisions and their evidence basis
Assumption ledger	Tracks whether value, risk, adoption and cost assumptions still hold
Incident-to-roadmap loop	Turns complaints, near misses and failures into roadmap learning
Action closure register	Prevents meeting actions from closing without evidence

4. Source Anchors

These sources are used to align the language of governance, management systems, requirements engineering, engineering performance, observability and reliability. This document does not provide legal, regulatory, audit or certification advice.

Source	Link	How this playbook uses it
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize post-launch risk monitoring and action closure
ISO/IEC 42001	https://www.iso.org/standard/81230.html	Uses the AI management system's objectives, operation, performance evaluation and improvement to organize the operating cadence
ISO/IEC/IEEE 29148	https://www.iso.org/standard/72089.html	Uses requirements traceability and verification to define the metric contract, assumption ledger and decision log
DORA	https://dora.dev/	Uses delivery performance and reliability thinking to manage release cadence, change failure rate and time to restore service
OpenTelemetry	https://opentelemetry.io/docs/	Uses traces, metrics and logs to design AI product runtime evidence
Google SRE SLO	https://sre.google/sre-book/service-level-objectives/	Uses SLO / threshold / error budget thinking to define AI workflow reliability

5. Operating Setup

5.1 Entry Criteria

Use this playbook when an AI capability has at least one of the following:

a controlled pilot with real users
a production release for a target workflow
a monthly executive value review requirement
regulated customer, operations, risk or cost exposure
repeated changes to model, prompt, data, knowledge, tool or policy
complaints, incidents, near misses or adoption concerns

5.2 Required Inputs

Input	Minimum quality
Product scope	named workflow, users, case types, customer impact
Baseline	pre-AI outcome, quality, cost, risk, capacity
Initial release package	model/prompt/data/tool versions, eval results, rollback path
Risk tier	customer harm, compliance, autonomy, reversibility
Telemetry	user action, workflow context, version, outcome join
Owner map	PM, architect, ops, risk, analytics, support, finance
Existing incidents	complaint, defect, override, QA, support, reliability history

5.3 First 10 Days

Day	Action	Output
1	Confirm scope and use-case owner	product ops charter
2	List post-launch decisions needed	decision inventory
3	Draft metric contract for top 10 metrics	metric contract v1
4	Build evidence review pack outline	review pack v1
5	Create operating calendar	cadence calendar
6	Create release calendar	release calendar v1
7	Create experiment registry	experiment registry v1
8	Create incident-to-roadmap taxonomy	incident learning map
9	Create dashboard wireframe	weekly / monthly / portfolio dashboard
10	Run first weekly ops review simulation	action closure register

6. Cadence Architecture

6.1 Cadence Map

Cadence	Owner	Attendees	Primary decision
Daily signal triage	Ops / Platform	PM, Architect, Ops, Support, Risk by severity	mitigate, rollback, escalate
Weekly ops review	AI PM	PM, BA, Architect, Ops, Analytics, Risk, Platform	fix, assign, close, adjust release
Monthly value review	AI PM / Sponsor	Sponsor, PM, Finance, Risk, Ops, Analytics	scale, restrict, redesign, retire
Quarterly portfolio review	Value Office	Executives, Portfolio, Platform, Risk, Finance	fund, pause, consolidate, retire
Release review	Architect / Release Lead	PM, Risk, Platform, Data, Ops	approve, canary, hold, rollback
Incident review	Ops / Risk	PM, Architect, Risk, Ops, Support, Data	root cause, corrective action, recurrence prevention
Experiment review	PM / Analytics	PM, BA, Data Science, Ops, Risk	continue, scale, stop, redesign

6.2 Cadence Design Rules

Every recurring forum has a decision type.
Every review pack starts with decision requested.
Every metric in executive review has a contract.
Every action has owner, due date, closure evidence and reopen trigger.
Every model / prompt / data / tool / policy change appears on the release calendar.
Every incident creates or updates at least one backlog, metric, eval, process or policy item.
Every monthly value review updates the assumption ledger.
Every quarterly review can decide to retire, not only expand.

7. Weekly Ops Review

7.1 Purpose

Weekly ops review answers:

What changed in production this week?
What signals require action?
Which actions closed with evidence?
Which release or backlog decision is needed before next review?

7.2 Agenda

Time	Block	Inputs	Outputs
5 min	Decision requests	proposed releases, incidents, stuck actions	agenda priority
10 min	Adoption and workflow	qualified use, cohort funnel, rejection reasons	adoption action
10 min	Quality and eval	QA sample, eval delta, defect classes	fix / sample / hold
10 min	Reliability and SLO	latency, availability, fallback, restore	platform action
10 min	Risk / complaints	complaints, overrides, policy breach, near miss	risk action / escalation
10 min	Cost / capacity	cost per case, review queue, support load	capacity / cost action
10 min	Release and experiments	version changes, experiment readout	release / experiment decision
10 min	Action closure	last actions, closure evidence, blockers	close / reopen / escalate

7.3 Weekly Review Questions

Domain	Questions
Adoption	Are target users using AI in the intended workflow step? Which cohorts are dropping off?
Quality	Which failure classes changed? Are defects tied to a release version?
Reliability	Are SLOs met for the workflow moment where AI is needed?
Risk	Did any complaint, override or QA sample indicate customer harm or policy drift?
Cost	Is unit cost aligned to value? Is human review load growing faster than usage?
Release	Did any model/prompt/data/tool/policy change create outcome movement or regression?
Action	Which actions truly closed, and what evidence proves closure?

7.4 Weekly Outputs

Use this format for each action:

| action_id | source_signal | action | owner | due_date | closure_evidence | reopen_trigger | forum |
|---|---|---|---|---|---|---|---|
| ACT-001 | contact-center repeat contact up 8% in billing calls | sample 50 edited suggestions and update billing knowledge article | Ops QA Lead | 2026-07-07 | QA sample shows defect class reduced below warning threshold | repeat contact remains above threshold for two weekly reviews | Weekly Ops |

No action should be accepted without closure evidence.
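If the action register is kept as structured data, this acceptance rule can be checked mechanically before the review closes. A minimal sketch, assuming each row of the table above is loaded as a dict keyed by the same column names; `validate_action` is a hypothetical helper, not part of any tool named in this playbook.

```python
from datetime import date

# Columns of the weekly action format; each must be filled in before acceptance.
REQUIRED_FIELDS = (
    "action_id", "source_signal", "action", "owner",
    "due_date", "closure_evidence", "reopen_trigger", "forum",
)


def validate_action(row: dict) -> list[str]:
    """Return problems with an action row; an empty list means it can be accepted."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if not str(row.get(name) or "").strip()]
    due = row.get("due_date")
    if due:
        try:
            date.fromisoformat(due)
        except ValueError:
            problems.append(f"due_date is not ISO formatted: {due!r}")
    return problems


act_001 = {
    "action_id": "ACT-001",
    "source_signal": "contact-center repeat contact up 8% in billing calls",
    "action": "sample 50 edited suggestions and update billing knowledge article",
    "owner": "Ops QA Lead",
    "due_date": "2026-07-07",
    "closure_evidence": "QA sample shows defect class reduced below warning threshold",
    "reopen_trigger": "repeat contact remains above threshold for two weekly reviews",
    "forum": "Weekly Ops",
}
print(validate_action(act_001))                               # []
print(validate_action({**act_001, "closure_evidence": " "}))  # ['missing closure_evidence']
```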

8. Monthly Value Review

8.1 Purpose

Monthly value review answers:

Is this AI capability improving business outcomes after adoption, risk, cost, capacity and incident learning are included?

8.2 Required Pack

Section	Required evidence
Decision requested	continue, scale, restrict, redesign, retire
Business outcome	baseline, current, trend, cohort, confidence
Adoption durability	returning qualified use, manager reinforcement, work-as-done evidence
Quality / control	eval trend, QA defects, control overrides, policy breaches
Risk / customer impact	complaints, harm signals, vulnerable customer impact, fairness/conduct indicators
Cost / capacity	unit cost, review load, support load, platform spend
Value leakage	rework, human review, delay, support, remediation, redress
Release / experiment impact	changes made and observed effects
Assumption ledger update	confirmed, weakened, invalidated assumptions
Recommendation	specific decision, next evidence needed, review trigger
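Separating benefit from value leakage lets the pack compute net value instead of asserting it. A minimal sketch of that arithmetic, assuming all figures share one currency and period and that each leakage category has already been priced by Finance; the numbers are invented for illustration only.

```python
# Leakage categories from the "Value leakage" row of the pack.
LEAKAGE = ("rework", "human_review", "delay", "support", "remediation", "redress")


def net_value(gross_benefit: float, run_cost: float, leakage: dict[str, float]) -> float:
    """Gross benefit minus run cost minus every priced leakage category."""
    unknown = set(leakage) - set(LEAKAGE)
    if unknown:
        raise ValueError(f"unrecognized leakage categories: {sorted(unknown)}")
    return gross_benefit - run_cost - sum(leakage.values())


month = net_value(
    gross_benefit=120_000.0,  # illustrative: priced handle-time reduction
    run_cost=30_000.0,        # illustrative: platform and model spend
    leakage={"human_review": 25_000.0, "rework": 10_000.0, "redress": 5_000.0},
)
print(month)  # 50000.0
```

A negative result with a high usage count is exactly the "usage high, outcome flat, cost rising" pattern in the decision matrix.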

8.3 Decision Matrix

Evidence pattern	Decision
Outcome improves, adoption durable, risk stable, cost acceptable	scale
Outcome improves only in low-risk cohort	scale selectively
Usage high, outcome flat, cost rising	redesign or restrict
Outcome improves, complaints rise	restrict and investigate
Adoption low, users cite workflow friction	product / workflow redesign
Adoption low, value thesis weak	retire or park
Quality defects tied to recent release	rollback or hotfix
Value depends on manual review load that cannot scale	redesign operating model before expansion
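The matrix reads naturally as an ordered rule list in which the most protective rule is checked first. A minimal sketch, assuming the monthly pack has already reduced each evidence pattern to a yes/no flag; the flag names and the rule order are illustrative assumptions, and the forum, not the function, owns the final decision.

```python
from dataclasses import dataclass


@dataclass
class MonthlyEvidence:
    # Hypothetical yes/no summaries of the monthly review pack; default is "no".
    outcome_improves: bool = False
    improves_only_low_risk_cohort: bool = False
    adoption_durable: bool = False
    adoption_low: bool = False
    workflow_friction_cited: bool = False
    value_thesis_weak: bool = False
    usage_high: bool = False
    cost_rising: bool = False
    complaints_rise: bool = False
    defects_tied_to_recent_release: bool = False
    review_load_cannot_scale: bool = False
    risk_stable: bool = False
    cost_acceptable: bool = False


def recommend(e: MonthlyEvidence) -> str:
    """Map evidence to a matrix decision; protective rules are checked first."""
    if e.defects_tied_to_recent_release:
        return "rollback or hotfix"
    if e.outcome_improves and e.complaints_rise:
        return "restrict and investigate"
    if e.review_load_cannot_scale:
        return "redesign operating model before expansion"
    if e.adoption_low and e.value_thesis_weak:
        return "retire or park"
    if e.adoption_low and e.workflow_friction_cited:
        return "product / workflow redesign"
    if e.usage_high and not e.outcome_improves and e.cost_rising:
        return "redesign or restrict"
    if e.improves_only_low_risk_cohort:
        return "scale selectively"
    if e.outcome_improves and e.adoption_durable and e.risk_stable and e.cost_acceptable:
        return "scale"
    # No row matched: keep running and name the evidence still needed.
    return "continue and gather more evidence"


healthy = MonthlyEvidence(outcome_improves=True, adoption_durable=True,
                          risk_stable=True, cost_acceptable=True)
print(recommend(healthy))  # scale
print(recommend(MonthlyEvidence(outcome_improves=True, complaints_rise=True)))
# restrict and investigate
```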

8.4 Monthly Decision Memo

# Monthly AI Value Review Decision Memo

## Capability
- Product:
- Workflow:
- Target population:
- Current version set:

## Decision Requested
- Recommendation:
- Decision deadline:
- Decision owner:

## Outcome Evidence
- Baseline:
- Current:
- Movement:
- Confidence:

## Adoption Evidence
- Qualified use:
- Cohort durability:
- Work-as-done observation:

## Risk / Control Evidence
- Complaints:
- Overrides:
- Policy drift:
- Customer harm signals:

## Cost / Capacity Evidence
- Cost per case:
- Human review load:
- Support load:
- Platform capacity:

## Release / Experiment Evidence
- Recent releases:
- Experiment result:
- Regressions:

## Assumption Ledger Update
- Confirmed:
- Weakened:
- Invalidated:

## Decision and Actions
- Decision:
- Actions:
- Closure evidence:
- Next review trigger:

9. Quarterly Portfolio Review

9.1 Purpose

Quarterly review compares AI capabilities across value, risk, cost, platform leverage and organizational capacity.

9.2 Scorecard

Dimension	Rating guide
Net value	finance-supported benefit after cost and leakage
Adoption durability	sustained qualified use across target cohorts
Risk stability	risk signals within appetite, controls effective
Evidence maturity	telemetry, sampling, experiment and finance evidence quality
Platform reuse	reusable model gateway, eval, RAG, tool, policy patterns
Capacity burden	SME review, operations queue, platform support, risk review load
Strategic fit	alignment to business capability and product strategy
Retirement logic	value weak, risk high, platform cost high, better alternative available

9.3 Portfolio Decisions

Decision	When to use
Fund	strong evidence, scalable economics, strategic fit
Scale	proven cohort, acceptable risk, operational capacity available
Restrict	value only in specific population or risk high in certain segments
Consolidate	multiple teams built similar capability
Platformize	repeated pattern should become reusable golden path
Pause	evidence insufficient or operating capacity constrained
Retire	assumptions failed or risk/cost exceeds value

9.4 Portfolio Review Output

| capability | decision | evidence_basis | funding_change | platform_dependency | risk_condition | next_review |
|---|---|---|---|---|---|---|
| KYC document assistant | scale selectively | cycle time down in low-risk retail, EDD defects unchanged | fund 2 engineers for workflow integration | document extraction quality dashboard | no expansion to complex entity until false-pass sample remains clean | 2026-Q3 monthly value review |

10. Metric Contract Template

10.1 Contract Fields

# Metric Contract: [metric_id]

## Business Question
- Decision this metric supports:

## Definition
- Numerator:
- Denominator:
- Inclusion rules:
- Exclusion rules:
- Time window:

## Population
- Workflow:
- User roles:
- Case types:
- Channel:
- Risk tier:

## Source and Trace
- Source systems:
- Event names:
- Version fields:
- Join keys:
- Data latency:

## Owner
- Business owner:
- Analytics owner:
- Risk reviewer:

## Thresholds and Action Rules
- Target:
- Warning:
- Breach:
- Stop rule:
- Action when warning:
- Action when breach:

## Guardrails
- Customer harm guardrail:
- Cost guardrail:
- Quality guardrail:
- Risk guardrail:

## Segmentation
- Required cohort cuts:
- Vulnerable customer / protected class proxy review:
- Manager/team cut:

## Evidence Quality
- Level:
- Sampling or reconciliation approach:
- Last reviewed:
- Next review:
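Stored as data, the "Thresholds and Action Rules" section of the contract becomes executable. A minimal sketch covering only that section, assuming warning and breach are single cut-offs and that the metric direction is known; the owner names and threshold values in the example are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricContract:
    metric_id: str
    business_owner: str
    analytics_owner: str
    warning: float
    breach: float
    action_when_warning: str
    action_when_breach: str
    higher_is_worse: bool = True  # e.g. defect or repeat-contact rates

    def __post_init__(self) -> None:
        # Inverted thresholds would make the warning fire after the breach.
        if self.higher_is_worse and self.warning > self.breach:
            raise ValueError("warning must not exceed breach for a higher-is-worse metric")
        if not self.higher_is_worse and self.warning < self.breach:
            raise ValueError("warning must not be below breach for a higher-is-better metric")

    def _at_or_past(self, value: float, threshold: float) -> bool:
        return value >= threshold if self.higher_is_worse else value <= threshold

    def evaluate(self, value: float) -> tuple[str, str]:
        """Return (status, action) for an observed metric value."""
        if self._at_or_past(value, self.breach):
            return "breach", self.action_when_breach
        if self._at_or_past(value, self.warning):
            return "warning", self.action_when_warning
        return "ok", "none"


contract = MetricContract(
    metric_id="cc_agent_assist.repeat_contact_7d",
    business_owner="Contact Center Ops",  # placeholder
    analytics_owner="CX Analytics",       # placeholder
    warning=0.12, breach=0.15,            # placeholder thresholds
    action_when_warning="50-call QA sample",
    action_when_breach="release hold for affected intent",
)
print(contract.evaluate(0.10))  # ('ok', 'none')
print(contract.evaluate(0.13))  # ('warning', '50-call QA sample')
```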

10.2 Example: Contact-Center Repeat Contact

Field	Example
metric_id	`cc_agent_assist.repeat_contact_7d`
business question	Did agent assist reduce AHT without pushing customers to call again?
numerator	customers with another contact within 7 days for same intent
denominator	assisted calls for eligible intents
required cuts	intent, team, agent tenure, suggestion acceptance, prompt version
guardrail	if repeat contact rises while AHT falls, monthly value cannot claim net improvement
action rule	warning triggers 50-call QA sample; breach triggers release hold for affected intent
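The numerator and denominator translate into a self-join over contact events. A minimal sketch, assuming a flat list of contact records with hypothetical fields; it counts per assisted call rather than per customer (an interpretive assumption, so numerator and denominator share a unit), and it treats a repeat as a same-customer, same-intent contact strictly after the call and within 7 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class Contact:
    customer_id: str
    intent: str
    ts: datetime
    assisted: bool  # agent assist was shown on this contact
    eligible: bool  # the intent is in scope for agent assist


def repeat_contact_7d(contacts: list[Contact]) -> Optional[float]:
    """Share of assisted, eligible calls followed by a same-intent contact within 7 days.

    Quadratic scan: fine for a sketch, not for production volumes.
    """
    assisted = [c for c in contacts if c.assisted and c.eligible]
    if not assisted:
        return None  # undefined rather than zero when nothing was assisted
    repeats = sum(
        1 for call in assisted
        if any(o.customer_id == call.customer_id and o.intent == call.intent
               and call.ts < o.ts <= call.ts + WINDOW
               for o in contacts)
    )
    return repeats / len(assisted)


t0 = datetime(2026, 6, 1, 9, 0)
calls = [
    Contact("c1", "billing", t0, assisted=True, eligible=True),
    Contact("c1", "billing", t0 + timedelta(days=3), assisted=False, eligible=True),
    Contact("c2", "billing", t0, assisted=True, eligible=True),
    Contact("c2", "billing", t0 + timedelta(days=9), assisted=False, eligible=True),
]
print(repeat_contact_7d(calls))  # 0.5 (c1 returned within 7 days, c2 did not)
```

The required cuts (intent, team, tenure, acceptance, prompt version) would be applied by filtering `contacts` before calling the function.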

10.3 Example: KYC False Pass

Field	Example
metric_id	`kyc_ai.document_false_pass_sample_rate`
business question	Is onboarding speed improvement weakening document completeness controls?
numerator	sampled cases where AI marked document complete but QA found material defect
denominator	QA sampled AI-complete cases
required cuts	document type, entity type, geography, channel, model/prompt version
stop rule	any severe false pass in high-risk entity type triggers expansion hold
action rule	update eval set, increase human review threshold, review policy pack
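Because this rate is estimated from a QA sample, it helps to report an interval with it, and to evaluate the stop rule separately, since one severe false pass in a high-risk entity type holds expansion whatever the rate. A minimal sketch using the Wilson score interval, a standard choice for sampled proportions swapped in here as an assumption; the record fields and the `HIGH_RISK_ENTITY_TYPES` set are hypothetical.

```python
import math
from dataclasses import dataclass

HIGH_RISK_ENTITY_TYPES = {"trust", "complex_corporate"}  # hypothetical list


@dataclass(frozen=True)
class QASample:
    entity_type: str
    material_defect: bool  # QA found a material defect in an AI-complete case
    severe: bool = False   # QA graded the defect as severe


def false_pass_rate(samples: list[QASample], z: float = 1.96) -> tuple[float, float, float]:
    """Return (rate, wilson_low, wilson_high) for the sampled false-pass rate."""
    n = len(samples)
    if n == 0:
        raise ValueError("no QA-sampled AI-complete cases")
    p = sum(s.material_defect for s in samples) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


def expansion_hold(samples: list[QASample]) -> bool:
    """Stop rule: any severe false pass in a high-risk entity type holds expansion."""
    return any(s.material_defect and s.severe and s.entity_type in HIGH_RISK_ENTITY_TYPES
               for s in samples)


samples = [QASample("retail_individual", False)] * 98 + [
    QASample("retail_individual", True),
    QASample("trust", True, severe=True),
]
rate, low, high = false_pass_rate(samples)
print(round(rate, 3))           # 0.02
print(expansion_hold(samples))  # True
```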

11. Evidence Review Pack Template

11.1 Standard Pack

# AI Product Operations Evidence Review Pack

## Decision Requested
- Forum:
- Requested decision:
- Deadline:
- Owner:

## Product and Version Scope
- Product:
- Workflow:
- Population:
- Model version:
- Prompt version:
- Data / knowledge version:
- Tool version:
- Policy version:

## Outcome Evidence
| metric | baseline | current | trend | evidence quality | interpretation |
|---|---|---|---|---|---|

## Adoption Evidence
| cohort | exposure | qualified use | accept/edit/reject | durability | issue |
|---|---|---|---|---|---|

## Quality Evidence
| failure class | count/rate | affected version | severity | action |
|---|---|---|---|---|

## Risk / Control Evidence
| signal | population | severity | control impact | action |
|---|---|---|---|---|

## Cost / Capacity Evidence
| metric | current | threshold | driver | action |
|---|---|---|---|---|

## Release and Experiment Evidence
| item | release/experiment | population | observed impact | decision |
|---|---|---|---|---|

## Incident and Complaint Learning
| signal | root cause | corrective action | roadmap impact | closure evidence |
|---|---|---|---|---|

## Assumption Ledger Update
| assumption | status | evidence | decision impact |
|---|---|---|---|

## Action Closure
| action | owner | due date | closure evidence | status |
|---|---|---|---|---|

11.2 Evidence Quality Levels

Level	Name	Meaning
E1	anecdotal	useful signal but not decision-grade
E2	sampled	structured QA / complaint / interview sample
E3	instrumented	telemetry joined to workflow and version context
E4	experiment-grade	controlled, matched, cohort or time-series analysis
E5	certified	reconciled with finance, risk, audit or model validation evidence
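Because the levels are ordered, a review pack can flag any metric whose evidence is below what a forum needs. A minimal sketch; the per-forum minimum levels are illustrative assumptions, since the playbook only states that E1 is not decision-grade.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    E1_ANECDOTAL = 1
    E2_SAMPLED = 2
    E3_INSTRUMENTED = 3
    E4_EXPERIMENT_GRADE = 4
    E5_CERTIFIED = 5


# Hypothetical minimums; each organization sets its own per forum.
MINIMUM_LEVEL = {
    "weekly_ops": EvidenceLevel.E2_SAMPLED,
    "monthly_value": EvidenceLevel.E3_INSTRUMENTED,
    "quarterly_portfolio": EvidenceLevel.E4_EXPERIMENT_GRADE,
}


def below_bar(forum: str, metrics: dict[str, EvidenceLevel]) -> list[str]:
    """Return metric ids whose evidence is too weak for the forum's decision."""
    floor = MINIMUM_LEVEL[forum]
    return sorted(m for m, level in metrics.items() if level < floor)


pack = {
    "repeat_contact_7d": EvidenceLevel.E3_INSTRUMENTED,
    "agent_satisfaction": EvidenceLevel.E1_ANECDOTAL,  # hypothetical metric
}
print(below_bar("monthly_value", pack))  # ['agent_satisfaction']
```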

12. Release Calendar

12.1 Release Objects

Object	Examples	Required review
Model	provider upgrade, fallback model, model class change	eval, latency, cost, risk, rollback
Prompt	system prompt, tool instruction, response format	regression eval, policy review by risk tier
Data	feature refresh, label update, training sample change	lineage, privacy, bias, validation
Knowledge	policy corpus, FAQ, product terms, scripts	freshness, authority, retrieval quality
Tool	CRM write, fee waiver, case closure, document request	authorization, side effect, audit, dual control
Policy	hardship policy, complaint taxonomy, AML escalation	legal/compliance, frontline comms, eval update
Workflow	UI step, routing, human review threshold	adoption, capacity, control impact
Dashboard	metric definition, threshold, evidence view	metric owner approval

12.2 Calendar Fields

| release_id | date | object_type | change | affected_population | risk_tier | evidence_required | owner | canary | rollback | review_date | decision_log |
|---|---|---|---|---|---|---|---|---|---|---|---|

12.3 Release Rules

Prompt and knowledge changes are production changes.
Tool permission changes require side-effect review.
Policy changes require eval and frontline communication updates.
Model changes require cost, latency and quality comparison.
Data changes require lineage and segmentation review.
Every release has a post-release impact review date.
Release calendar is reviewed in weekly ops and monthly value review.
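Several of these rules can be checked against the calendar fields before a release is approved. A minimal sketch, assuming each calendar row is a dict keyed by the 12.2 column names plus a hypothetical `reviews_done` list; the object-to-review mapping paraphrases 12.1 and 12.3 and is not exhaustive, and treating a missing rollback entry as a blocker is an assumption.

```python
from datetime import date

# Paraphrase of 12.1 / 12.3: reviews each object type needs before approval.
REQUIRED_REVIEWS = {
    "prompt": {"regression_eval"},
    "knowledge": {"freshness_check", "retrieval_quality"},
    "tool": {"side_effect_review"},
    "policy": {"eval_update", "frontline_comms"},
    "model": {"cost_comparison", "latency_comparison", "quality_comparison"},
    "data": {"lineage_review", "segmentation_review"},
}


def release_blockers(row: dict) -> list[str]:
    """Return reasons the release cannot be approved; empty means no rule is violated."""
    done = set(row.get("reviews_done", ()))
    missing = REQUIRED_REVIEWS.get(row["object_type"], set()) - done
    blockers = [f"missing {r}" for r in sorted(missing)]
    if not row.get("rollback"):
        blockers.append("no rollback path")
    review_date = row.get("review_date")
    if not review_date:
        blockers.append("no post-release impact review date")
    elif date.fromisoformat(review_date) <= date.fromisoformat(row["date"]):
        blockers.append("impact review date must follow release date")
    return blockers


row = {"release_id": "REL-014", "date": "2026-07-01", "object_type": "tool",
       "rollback": "disable CRM write flag", "review_date": "2026-07-08",
       "reviews_done": []}
print(release_blockers(row))  # ['missing side_effect_review']
```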

13. Experiment Registry

13.1 Registry Fields

| experiment_id | hypothesis | population | treatment | control_or_baseline | primary_metric | guardrails | duration | owner | result | decision |
|---|---|---|---|---|---|---|---|---|---|---|

13.2 Example Experiments

Use case	Hypothesis	Guardrails
Agent assist	shorter, policy-cited suggestions reduce AHT and edits	repeat contact, QA script defect, complaint
Complaint intelligence	new taxonomy improves regulatory complaint routing	false negative regulatory sample, cycle time
KYC onboarding	document completeness assistant reduces customer chase	false pass, abandonment, EDD escalation
Collections hardship	vulnerability-aware guidance improves arrangement suitability	complaint, broken promise, vulnerable customer review
AML triage	typology-aware retrieval improves case narrative quality	audit sample, escalation miss, reviewer edit distance
Pricing governance	reason-code explanation reduces branch overrides	fairness signal, complaint, margin leakage

13.3 Experiment Review

Experiment review asks:

Did the experiment move the primary metric?
Did guardrails remain within threshold?
Did different cohorts respond differently?
Did cost or capacity change?
Which assumption changed?
What release, backlog or policy decision follows?
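The first two questions can be answered mechanically once the registry records a confidence interval for the primary-metric change and thresholds for each guardrail; cohort, cost and assumption questions still need judgment. A minimal sketch, assuming lower-is-better guardrails and an interval already computed for treatment minus control; the names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    name: str
    observed: float
    threshold: float  # breached when observed exceeds it (lower is better)

    @property
    def breached(self) -> bool:
        return self.observed > self.threshold


def readout(primary_ci: tuple[float, float], improve_down: bool,
            guardrails: list[Guardrail]) -> str:
    """Suggest an experiment decision from the primary-metric CI and guardrails.

    primary_ci is a confidence interval for treatment minus control;
    improve_down is True when a decrease is an improvement (e.g. AHT).
    """
    low, high = primary_ci
    breached = [g.name for g in guardrails if g.breached]
    if breached:
        return "stop or redesign: guardrail breached (" + ", ".join(breached) + ")"
    improved = high < 0 if improve_down else low > 0
    worsened = low > 0 if improve_down else high < 0
    if improved:
        return "scale"
    if worsened:
        return "stop: primary metric moved the wrong way"
    return "continue: primary metric not yet moved"


guardrails_ok = [Guardrail("repeat_contact_7d", 0.11, 0.12),
                 Guardrail("qa_script_defect_rate", 0.03, 0.05)]
print(readout((-35.0, -12.0), improve_down=True, guardrails=guardrails_ok))  # scale
```

Checking guardrails before the primary metric mirrors the playbook's stance that a moved metric does not justify scaling if customers are harmed.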

14. Decision Log and Assumption Ledger

14.1 Decision Log

| decision_id | date | forum | decision | options_considered | evidence | owner | review_trigger |
|---|---|---|---|---|---|---|---|

Decision log should capture:

scale / restrict / redesign / retire decisions
release approvals or rollbacks
metric definition changes
risk acceptance or risk appetite changes
platform investment decisions
incident corrective action acceptance

14.2 Assumption Ledger

| assumption_id | statement | type | owner | evidence | confidence | expiry | status | decision_impact |
|---|---|---|---|---|---|---|---|---|

Assumption types:

Type	Example
Value	reducing average handle time creates net benefit after repeat contact
Adoption	agents will use suggestions if citations are visible
Risk	human review catches high-impact hallucinations
Cost	token cost per case remains below threshold at scale
Capacity	QA can sample 2% of assisted cases without queue impact
Policy	current hardship guidance remains valid for next quarter
Technical	retrieval latency supports live-call workflow

Monthly value review updates assumption status:

confirmed
weakened
invalidated
needs more evidence
expired due to a policy, product, model or workflow change
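Of these statuses, "expired" is triggered by changes elsewhere rather than by new evidence on the assumption itself, so it can be applied automatically before each monthly review. A minimal sketch that expires an assumption when its expiry date has passed or when a release touched an object type it depends on; the `depends_on` field is a hypothetical addition to the 14.2 columns, and leaving invalidated assumptions untouched is a design assumption.

```python
from dataclasses import dataclass, replace
from datetime import date

STATUSES = {"confirmed", "weakened", "invalidated", "needs more evidence", "expired"}


@dataclass(frozen=True)
class Assumption:
    assumption_id: str
    statement: str
    status: str
    expiry: date
    depends_on: frozenset[str]  # hypothetical: object types such as "policy", "model"

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}")


def apply_expiry(ledger: list[Assumption], today: date,
                 changed_objects: set[str]) -> list[Assumption]:
    """Expire assumptions by date or by a release to an object they depend on.

    Invalidated assumptions are terminal and are left as they are.
    """
    return [
        replace(a, status="expired")
        if a.status != "invalidated" and (today > a.expiry or a.depends_on & changed_objects)
        else a
        for a in ledger
    ]


ledger = [
    Assumption("A-06", "current hardship guidance remains valid for next quarter",
               "confirmed", date(2026, 9, 30), frozenset({"policy"})),
    Assumption("A-07", "retrieval latency supports live-call workflow",
               "confirmed", date(2026, 12, 31), frozenset({"model", "knowledge"})),
]
updated = apply_expiry(ledger, today=date(2026, 7, 15), changed_objects={"policy"})
print([(a.assumption_id, a.status) for a in updated])
# [('A-06', 'expired'), ('A-07', 'confirmed')]
```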

15. Incident-to-Roadmap Loop

15.1 Severity Model

Severity	Example	Required action
Sev 1	customer harm, regulatory breach, unauthorized tool action	immediate containment, executive/risk escalation, rollback/restrict
Sev 2	material workflow defect, repeated complaint, major SLO breach	weekly incident review, corrective action, metric update
Sev 3	localized quality regression, manageable cost spike	backlog item, targeted release, monitor
Sev 4	nuisance defect, unclear signal	sample, classify, watchlist
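The severity model and the cadence rule that every incident creates or updates at least one backlog, metric, eval, process or policy item can be checked together when an incident is proposed for closure. A minimal sketch; the `Incident` fields are a hypothetical subset of the 15.2 incident record, and the action strings paraphrase the table.

```python
from dataclasses import dataclass, field

REQUIRED_ACTIONS = {
    1: ["immediate containment", "executive/risk escalation", "rollback/restrict"],
    2: ["weekly incident review", "corrective action", "metric update"],
    3: ["backlog item", "targeted release", "monitor"],
    4: ["sample", "classify", "watchlist"],
}

LEARNING_ITEM_TYPES = {"backlog", "metric", "eval", "process", "policy"}


@dataclass
class Incident:
    incident_id: str
    severity: int
    actions_taken: list[str] = field(default_factory=list)
    learning_items: dict[str, str] = field(default_factory=dict)  # item type -> item id


def closure_gaps(inc: Incident) -> list[str]:
    """List what still blocks closing the incident under the severity model."""
    gaps = [f"missing action: {a}" for a in REQUIRED_ACTIONS[inc.severity]
            if a not in inc.actions_taken]
    if not LEARNING_ITEM_TYPES & set(inc.learning_items):
        gaps.append("no backlog, metric, eval, process or policy item linked")
    return gaps


inc = Incident("INC-031", severity=2,
               actions_taken=["weekly incident review", "corrective action"])
print(closure_gaps(inc))
# ['missing action: metric update', 'no backlog, metric, eval, process or policy item linked']
```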

15.2 Incident Record

# AI Product Incident Record

## Signal
- Source:
- Date:
- Severity:
- Affected population:
- Affected versions:

## Impact
- Customer:
- Operations:
- Risk / compliance:
- Cost / capacity:

## Containment
- Immediate action:
- Rollback / restriction:
- Communication:

## Root Cause
- Model:
- Prompt:
- Data:
- Knowledge:
- Tool:
- Workflow:
- Policy:
- Training / adoption:
- Metric / monitoring:

## Corrective Actions
| action | owner | due_date | closure_evidence | reopen_trigger |
|---|---|---|---|---|

## Roadmap Impact
- Backlog item:
- Release calendar update:
- Metric contract update:
- Eval update:
- Policy / SOP update:
- Next review:

15.3 Complaint Learning

Complaint and incident reviews should classify:

Signal	Product question
misleading answer	prompt, knowledge or policy problem
inconsistent treatment	segmentation, policy or training problem
slow resolution	workflow, routing or capacity problem
unfair outcome	risk appetite, guardrail or model problem
customer confusion	explanation, UX or frontline script problem
repeated escalation	AI boundary or human oversight problem

16. Backlog Governance

16.1 Backlog Classes

Class	Example source	Required link
Outcome improvement	monthly value gap	metric contract
Adoption fix	cohort drop-off	adoption funnel
Quality fix	eval failure class	QA / eval evidence
Risk control	complaint or policy breach	incident / risk signal
Cost / capacity	unit cost or queue threshold	cost dashboard
Reliability	SLO breach	runtime signal
Evidence gap	missing join or weak sample	evidence pack
Release dependency	model/prompt/tool/policy change	release calendar
Experiment	assumption needs test	experiment registry
Retirement	weak value or high risk	portfolio decision

16.2 Prioritization Formula

Use a simple decision score:

priority_score =
  outcome_impact
  + risk_reduction
  + adoption_unblock
  + cost_capacity_relief
  + evidence_need
  + platform_reuse
  - delivery_effort
  - change_saturation

Risk and customer harm can override numeric score.
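The score and its override can be sketched in a few lines of Python. A minimal sketch, assuming every term is scored on the same small integer scale (the playbook does not fix one) and that the override is a simple customer-harm flag that sorts ahead of any numeric score; both are assumptions to adapt locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    item_id: str
    outcome_impact: int
    risk_reduction: int
    adoption_unblock: int
    cost_capacity_relief: int
    evidence_need: int
    platform_reuse: int
    delivery_effort: int
    change_saturation: int
    customer_harm: bool = False  # override flag: risk / customer harm

    @property
    def priority_score(self) -> int:
        # Mirrors the formula: six benefit terms minus two cost terms.
        return (self.outcome_impact + self.risk_reduction + self.adoption_unblock
                + self.cost_capacity_relief + self.evidence_need + self.platform_reuse
                - self.delivery_effort - self.change_saturation)


def rank(items: list[BacklogItem]) -> list[str]:
    """Harm-linked items first, then by descending score; ties keep input order."""
    ordered = sorted(items, key=lambda i: (not i.customer_harm, -i.priority_score))
    return [i.item_id for i in ordered]


citation_ui = BacklogItem("citation-ui", 4, 1, 4, 1, 1, 2, 2, 1)  # score 10
fee_fix = BacklogItem("fee-explanation-fix", 2, 4, 0, 0, 1, 0, 3, 1,
                      customer_harm=True)                          # score 3
print(rank([citation_ui, fee_fix]))  # ['fee-explanation-fix', 'citation-ui']
```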

16.3 Backlog Review Rules

No backlog item enters top priority without evidence link.
Every roadmap item states expected metric movement.
Incident corrective actions outrank cosmetic improvements when severity is high.
Evidence gaps outrank scale decisions.
Platform reuse can outrank local feature work when multiple products repeat the same problem.
Retirement candidates are reviewed quarterly, not hidden at the bottom of the backlog.
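The first two rules, together with the link each backlog class requires in 16.1, can serve as an intake gate for the top-priority lane. A minimal sketch; the class-to-link mapping copies 16.1, while the item fields and identifiers are hypothetical.

```python
# Required evidence link per backlog class (from 16.1).
REQUIRED_LINK = {
    "outcome_improvement": "metric_contract",
    "adoption_fix": "adoption_funnel",
    "quality_fix": "qa_eval_evidence",
    "risk_control": "incident_risk_signal",
    "cost_capacity": "cost_dashboard",
    "reliability": "runtime_signal",
    "evidence_gap": "evidence_pack",
    "release_dependency": "release_calendar",
    "experiment": "experiment_registry",
    "retirement": "portfolio_decision",
}


def intake_issues(item: dict) -> list[str]:
    """Return reasons an item cannot enter the top-priority lane."""
    issues = []
    needed = REQUIRED_LINK.get(item.get("class", ""))
    if needed is None:
        issues.append(f"unknown backlog class {item.get('class')!r}")
    elif not item.get("links", {}).get(needed):
        issues.append(f"missing {needed} link")
    if not item.get("expected_metric_movement"):
        issues.append("no expected metric movement stated")
    return issues


item = {
    "id": "BL-207",
    "class": "adoption_fix",
    "links": {"adoption_funnel": "funnel/billing-intent"},
    "expected_metric_movement": "",
}
print(intake_issues(item))  # ['no expected metric movement stated']
```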

17. Role / RACI

Activity	AI PM	AI Architect	CBAP BA	Ops	Risk / Compliance	Analytics	Platform	Finance
Product Ops charter	A/R	C	C	C	C	C	C	I
Metric contract	A/R	C	R	C	C	R	C	C
Weekly ops review	A/R	R	R	R	C	R	R	I
Monthly value review	A/R	C	R	R	C	R	C	R
Quarterly portfolio review	R	C	C	C	C	R	C	A/R
Release calendar	C	A/R	C	C	C	C	R	I
Experiment registry	A/R	C	C	C	C	R	C	I
Decision log	A/R	R	R	C	C	C	C	I
Assumption ledger	A/R	C	R	C	C	R	C	C
Incident learning	R	R	R	A/R	A/R by severity	C	R	I
Action closure	A/R	R	R	R	R	R	R	I
Dashboard design	A/R	R	R	C	C	R	C	C

Legend: A = accountable, R = responsible, C = consulted, I = informed.

18. Dashboard Design

18.1 Dashboard Stack

Dashboard	Audience	Cadence	Must show
Runtime Signal Board	PM, Architect, Ops, Platform	daily	incidents, SLO, latency, fallback, cost anomaly
Weekly Ops Board	PM, BA, Ops, Risk	weekly	adoption, quality, risk, cost, actions
Monthly Value Board	Sponsor, PM, Finance, Risk	monthly	outcome, net value, value leakage, decision request
Release Impact Board	PM, Architect, Risk	weekly/monthly	version changes, affected population, regression
Portfolio Board	Executives, Value Office	quarterly	value, risk, cost, reuse, funding, retirement
Evidence Binder View	PM, BA, Risk, Audit	as needed	trace from metric to source, decision, action closure

18.2 Weekly Ops Board Wireframe

Header
  decision requests | releases pending | incidents open | actions overdue

Row 1: Adoption
  qualified use by cohort | accept/edit/reject | rejection reasons

Row 2: Quality
  eval trend | QA defects | top failure classes | affected versions

Row 3: Risk and Complaints
  complaints | overrides | policy breach | customer harm watchlist

Row 4: Reliability and Cost
  SLO | latency | fallback | cost per case | review queue

Row 5: Action Closure
  open by age | blocked | due this week | reopened

18.3 Monthly Value Board Wireframe

Decision requested
  scale / restrict / redesign / retire / continue

Outcome movement
  baseline vs current | cohort | confidence | source quality

Net value
  benefit | cost | human review | support | rework | risk adjustment

Risk and adoption
  guardrails | complaints | durable qualified use | policy drift

Roadmap implication
  next release | backlog changes | assumptions changed | decision log

18.4 Dashboard Rules

Every chart maps to a metric contract.
Every threshold maps to an action rule.
Every trend can be segmented by cohort and version.
Release markers appear on outcome and defect trends.
Dashboard includes open action age and closure quality.
Executive dashboard shows decisions, not raw operational noise.
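The first three dashboard rules amount to a lint over the dashboard definition and the metric contract catalog. A minimal sketch, assuming charts and contracts are available as plain dictionaries; the chart spec format is hypothetical and not tied to any BI tool.

```python
def lint_dashboard(charts: list[dict], contracts: dict[str, dict]) -> list[str]:
    """Flag charts without a contract, thresholds without an action rule,
    and trends that cannot be segmented by cohort and version."""
    findings = []
    for chart in charts:
        metric_id = chart.get("metric_id")
        contract = contracts.get(metric_id)
        if contract is None:
            findings.append(f"{chart['title']}: no metric contract for {metric_id!r}")
            continue
        for level in chart.get("thresholds", []):
            if not contract.get("actions", {}).get(level):
                findings.append(f"{chart['title']}: {level} threshold has no action rule")
        if not {"cohort", "version"} <= set(chart.get("segment_by", [])):
            findings.append(f"{chart['title']}: cannot segment by cohort and version")
    return findings


contracts = {
    "cc_agent_assist.repeat_contact_7d": {
        "actions": {"warning": "50-call QA sample",
                    "breach": "release hold for affected intent"},
    },
}
charts = [
    {"title": "Repeat contact", "metric_id": "cc_agent_assist.repeat_contact_7d",
     "thresholds": ["warning", "breach"], "segment_by": ["cohort", "version", "intent"]},
    {"title": "Prompt count", "metric_id": "prompts_per_day",
     "thresholds": [], "segment_by": []},
]
print(lint_dashboard(charts, contracts))
# ["Prompt count: no metric contract for 'prompts_per_day'"]
```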

19. Financial Retail Execution Patterns

19.1 Contact-Center Agent Assist

Cadence	Focus
Weekly ops	accept/edit/reject by intent, repeat contact, script defects, latency
Monthly value	AHT net of repeat contact, QA, complaint, cost per assisted call
Release calendar	prompt wording, knowledge article, CRM action, call intent rollout
Incident loop	misleading fee explanation, complaint spike, policy article drift
Backlog	high-edit intents, citation UI, supervisor coaching, knowledge freshness SLO

19.2 Complaint Intelligence

Cadence	Focus
Weekly ops	false negative sample, routing delay, taxonomy confusion
Monthly value	cycle time, remediation closure, regulatory complaint capture
Release calendar	taxonomy update, root cause labels, policy corpus
Incident loop	missed regulatory complaint updates eval, routing, QA sample
Backlog	explanation of classification, root cause action tracker, dashboard for legal

19.3 KYC Onboarding

Cadence	Focus
Weekly ops	document completeness, rework, false pass sample, customer chase
Monthly value	onboarding cycle time, abandonment, EDD escalation, cost per application
Release calendar	document parser, policy rule, threshold, country/entity rollout
Incident loop	severe false pass triggers expansion hold and eval update
Backlog	missing document reason codes, customer notification wording, reviewer queue

19.4 Collections Hardship

Cadence	Focus
Weekly ops	arrangement suitability, vulnerability flag, agent override, complaint
Monthly value	kept promise rate, broken arrangement, customer harm guardrail
Release calendar	hardship policy, conversation script, escalation workflow
Incident loop	inappropriate pressure complaint updates prompt and training
Backlog	vulnerability-aware guidance, escalation UX, manager coaching report

19.5 AML Triage

Cadence	Focus
Weekly ops	alert aging, reviewer edit distance, narrative defects, typology feedback
Monthly value	analyst throughput, audit sample, escalation quality, SAR prep support
Release calendar	typology knowledge, retrieval corpus, explanation template
Incident loop	missed typology updates eval, corpus, sampling and risk review
Backlog	evidence citation, scenario-specific retrieval, investigator feedback capture

19.6 Personalized Pricing Governance

Cadence	Focus
Weekly ops	reason-code quality, overrides, complaint, segment impact
Monthly value	margin/conversion net of fairness and conduct guardrails
Release calendar	pricing model, eligibility policy, explanation wording
Incident loop	unfair treatment signal triggers restrict and risk review
Backlog	branch override reasons, fairness dashboard, policy drift alerts

20. 90-Day Rollout Plan

Days 1-15: Foundation

Day range	Work	Output
1-3	Confirm use-case scope, owners, risk tier, review forums	Product Ops charter
4-6	Draft metric contracts for top metrics	Metric contract pack
7-9	Build review pack and action closure register	Evidence pack v1
10-12	Create release calendar and experiment registry	Release / experiment system
13-15	Run first weekly ops review	Decision log and actions

Days 16-35: Evidence Quality

Day range	Work	Output
16-20	Join telemetry to workflow outcome and version context	Evidence data map
21-25	Add complaint, QA, incident and cost feeds	Risk / cost signal board
26-30	Run first monthly value review	Value decision memo
31-35	Update assumption ledger and backlog governance	Evidence-driven backlog

Days 36-60: Release and Incident Learning

Day range	Work	Output
36-42	Operationalize release calendar	Version trace and review dates
43-48	Run experiment review cycle	Experiment readout and decisions
49-54	Run incident tabletop	Incident-to-roadmap test
55-60	Improve dashboards and action closure	Dashboard v2 and closure quality

Days 61-90: Portfolio Governance

Day range	Work	Output
61-70	Build quarterly portfolio scorecard	Portfolio review pack
71-78	Identify scale, restrict, redesign, retire candidates	Decision recommendations
79-85	Link platform investment to recurring pain points	Platform roadmap input
86-90	Run quarterly portfolio review simulation	Funding and roadmap decisions

21. Templates

21.1 Product Ops Charter

# AI Product Operations Charter

## Product
- Capability:
- Workflow:
- Target users:
- Customer / operations impact:
- Risk tier:

## Operating Cadence
- Daily triage:
- Weekly ops review:
- Monthly value review:
- Quarterly portfolio review:
- Release review:
- Incident review:

## Decision Rights
- Decisions owned by product:
- Decisions requiring risk review:
- Decisions requiring platform review:
- Decisions requiring executive review:

## Evidence Assets
- Metric contracts:
- Review pack:
- Release calendar:
- Experiment registry:
- Decision log:
- Assumption ledger:
- Action closure register:

## Success Criteria
- Outcome:
- Adoption:
- Quality:
- Risk:
- Cost:
- Reliability:
- Learning:

21.2 Action Closure Register

| action_id | source | action | owner | due_date | closure_evidence | reviewer | status | reopen_trigger |
|---|---|---|---|---|---|---|---|---|

21.3 Release Impact Review

| release_id | release_type | expected_effect | observed_effect | regressions | guardrail_status | decision |
|---|---|---|---|---|---|---|
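
The `decision` column can be derived from the other columns once a decision rule is agreed. The sketch below shows one hypothetical rule, checked in order of severity: a guardrail breach triggers rollback, unexplained regressions trigger a hold, and an effect well short of expectation triggers investigation. The numeric encoding of the effects, the outcome labels and the 50% realization cut-off are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ReleaseImpact:
    """One row of the Release Impact Review template."""
    release_id: str
    release_type: str        # model | prompt | data | knowledge | tool | policy | workflow
    expected_effect: float   # assumed: expected signed change in the primary metric
    observed_effect: float   # observed signed change in the same metric
    regressions: int         # assumed: count of secondary metrics that regressed
    guardrail_status: str    # assumed vocabulary: "pass" | "breach"


def decide(r: ReleaseImpact, min_realization: float = 0.5) -> str:
    """Hypothetical decision rule, checked in order of severity."""
    if r.guardrail_status == "breach":
        return "rollback"            # a guardrail breach overrides any benefit
    if r.regressions > 0:
        return "hold"                # keep canary scope until regressions are explained
    if r.expected_effect != 0:
        # Ratio of signed effects works for "lower is better" and "higher is better" metrics;
        # a negative ratio means the metric moved the wrong way.
        realization = r.observed_effect / r.expected_effect
        if realization < min_realization:
            return "investigate"
    return "keep"
```

For example, a prompt release expected to cut AHT by 30 seconds that cuts it by 20 with no regressions and passing guardrails yields `"keep"`.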

21.4 Assumption Review

| assumption_id | previous_status | new_evidence | new_status | roadmap_impact | next_review |
|---|---|---|---|---|---|
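
An assumption review is only useful if status changes carry evidence and downgrades reach the roadmap. The sketch below flags three hygiene problems per row: a status change with no new evidence, a downgrade with no recorded roadmap impact, and an overdue review. The status names in `DOWNGRADED` are an assumed vocabulary, not one defined by the playbook.

```python
from dataclasses import dataclass
from datetime import date

DOWNGRADED = {"weakened", "invalidated"}   # assumed status vocabulary


@dataclass
class AssumptionReview:
    """One row of the Assumption Review template."""
    assumption_id: str
    previous_status: str
    new_evidence: str
    new_status: str
    roadmap_impact: str
    next_review: date


def review_flags(rows: list[AssumptionReview], today: date) -> dict[str, list[str]]:
    """Map assumption_id to its hygiene problems; clean rows are omitted."""
    flags: dict[str, list[str]] = {}
    for r in rows:
        problems: list[str] = []
        changed = r.new_status != r.previous_status
        if changed and not r.new_evidence.strip():
            problems.append("status changed without new_evidence")
        if changed and r.new_status in DOWNGRADED and not r.roadmap_impact.strip():
            problems.append("downgraded without roadmap_impact")
        if r.next_review < today:
            problems.append("review overdue")
        if problems:
            flags[r.assumption_id] = problems
    return flags
```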

22. Anti-Patterns

Anti-pattern	Symptom	Correction
Usage replaces outcome	reports prompt count and MAU only	use outcome chain and net value
Cadence without decision	recurring meetings with no decision requested	start every pack with the decision requested
Metric without owner	dashboard disputes every month	metric contract with owner and action rule
Evidence without trace	cannot link metric to version, workflow, cohort	version-aware telemetry and evidence map
Release invisibility	prompt/knowledge/tool changes not reviewed	unified AI release calendar
Incident amnesia	same defect returns under new release	incident-to-roadmap loop and recurrence review
Backlog politics	loud stakeholder beats evidence	evidence-linked backlog classes
Cost blindness	platform spend treated as shared overhead	cost per case and capacity review
Risk theater	policy documents exist but no runtime signals	risk metrics, complaint sampling, action closure
Portfolio optimism	weak capabilities continue because already funded	quarterly retire / pause decisions
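
The correction for cost blindness is to stop treating platform spend as overhead and push it down to cost per case. The sketch below assumes one simple allocation method: shared platform spend is split by each capability's usage share (for example, token or request volume), added to that capability's direct cost, and divided by the cases it handled. Both the allocation key and the function name are illustrative assumptions. Finance may prefer a different allocation basis.

```python
def cost_per_case(
    platform_spend: float,
    usage_share: dict[str, float],
    direct_cost: dict[str, float],
    cases: dict[str, int],
) -> dict[str, float]:
    """Allocate shared platform spend by usage share, then divide by handled cases."""
    total_share = sum(usage_share.values())
    if total_share <= 0:
        raise ValueError("usage_share must contain positive weights")
    result: dict[str, float] = {}
    for capability, n_cases in cases.items():
        if n_cases <= 0:
            continue  # cost per case is undefined; raise it in the capacity review instead
        allocated = platform_spend * usage_share.get(capability, 0.0) / total_share
        result[capability] = (direct_cost.get(capability, 0.0) + allocated) / n_cases
    return result
```

With a platform spend of 10,000 split 3:1 between agent assist and KYC onboarding, direct costs of 5,000 and 1,000, and 50,000 and 7,000 cases, this gives 0.25 and 0.5 per case. The capability with fewer cases turns out to be twice as expensive per case.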

23. Interview Answers

Q1: How do you establish a post-launch operating cadence for an AI product?

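I would set up daily triage, a weekly ops review, a monthly value review and a quarterly portfolio review. The weekly review handles adoption, quality, SLOs, risk, cost, incidents and open actions. The monthly review judges outcome, net value, value leakage and whether to scale or stop. The quarterly review compares portfolio value, risk concentration, platform reuse and funding. Every cadence is connected through metric contracts, the evidence pack, the decision log and action closure. That way, reviews are not status reports but a continuous decision system.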

Q2: Why does AI Product Ops need a release calendar?

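Because production changes in an AI product are not just code. Models, prompts, knowledge bases, data, tool permissions, policies and workflows all change user outcomes and risk. If these changes are not on a single release calendar, incidents and metric shifts cannot be traced back to their cause. A good release calendar records the affected population, risk tier, evidence required, canary, rollback plan, review date and decision log entry.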
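
A release calendar entry can be checked mechanically against the fields listed in the answer above. The sketch below is a minimal Python version under stated assumptions: the field names mirror that list, `decision_log_ref` is a hypothetical pointer into the decision log, and an entry with any empty field is treated as not ready for release review.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Fields the answer above says a good release calendar records.
REQUIRED_FIELDS = (
    "affected_population", "risk_tier", "evidence_required",
    "canary", "rollback", "review_date", "decision_log_ref",
)


@dataclass
class ReleaseEntry:
    release_id: str
    change_type: str                     # model | prompt | knowledge | data | tool | policy | workflow
    affected_population: str = ""
    risk_tier: str = ""
    evidence_required: str = ""
    canary: str = ""                     # e.g. "5% of agents, 7 days"
    rollback: str = ""                   # e.g. "revert to prompt v11"
    review_date: Optional[date] = None
    decision_log_ref: str = ""           # hypothetical pointer into the decision log


def missing_fields(entry: ReleaseEntry) -> list[str]:
    """Fields left empty; an entry with any gap is not ready for release review."""
    return [name for name in REQUIRED_FIELDS if not getattr(entry, name)]


def overdue_reviews(entries: list[ReleaseEntry], today: date) -> list[str]:
    """Releases whose post-release review date has already passed."""
    return [e.release_id for e in entries
            if e.review_date is not None and e.review_date < today]
```

Because prompt, knowledge and tool changes use the same entry type as model releases, none of them can bypass the calendar.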

Q3: How do you keep dashboards from turning into vanity metrics?

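Every dashboard metric must have a metric contract. The contract defines the business question, definition, population, source, owner, threshold, guardrails, segmentation and action rule. For example, AHT for agent assist must be read together with repeat contact, QA defects, complaints and cost per assisted call. A metric without an action rule does not enter the executive review.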
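
The AHT example above can be expressed as a small metric contract evaluator. This Python sketch covers only part of a full contract: the owner, target, guardrails and action rule. It assumes "lower is better" for the primary metric and "breach when above the maximum" for every guardrail. All metric names and threshold values are hypothetical placeholders. They are not recommended targets.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    metric: str
    max_value: float                 # breach when the observed value > max_value


@dataclass
class MetricContract:
    name: str
    business_question: str
    owner: str
    target: float                    # assumed: lower is better (e.g. AHT in seconds)
    guardrails: list[Guardrail] = field(default_factory=list)
    action_rule: str = ""


def evaluate(contract: MetricContract, observed: dict[str, float]) -> str:
    if not contract.action_rule:
        return "excluded: no action rule"    # not eligible for executive review
    breaches = [g.metric for g in contract.guardrails
                if observed[g.metric] > g.max_value]
    if breaches:
        return "guardrail breach: " + ", ".join(breaches)
    if observed[contract.name] <= contract.target:
        return "on target"
    return "off target: " + contract.action_rule


aht = MetricContract(
    name="aht_seconds",
    business_question="Does agent assist shorten calls without pushing work downstream?",
    owner="contact_center_ops",
    target=420.0,                                  # hypothetical threshold
    guardrails=[Guardrail("repeat_contact_rate", 0.12),
                Guardrail("qa_defect_rate", 0.03),
                Guardrail("complaint_rate", 0.002),
                Guardrail("cost_per_assisted_call", 1.50)],
    action_rule="open a coaching and prompt review action",
)
```

The guardrail check runs before the target check, so a faster AHT bought with more repeat contacts is reported as a breach, not a win.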

Q4: After an AI incident, how do you drive product improvement?

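First I contain or roll back according to severity. Then I run root cause analysis and classify the cause as model, prompt, data, knowledge, tool, workflow, policy, training or monitoring. Next I update the corrective action, metric contract, eval set, release calendar, backlog and decision log. Finally, I use action closure evidence and a recurrence review to prove the problem is actually solved, not merely that the ticket was closed.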
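
The root-cause taxonomy and the recurrence review from the answer above can be encoded directly. In this sketch, `RootCause` uses exactly the nine categories listed. Python's `Enum` lookup by value (`RootCause("vendor")`) raises `ValueError` for anything outside the taxonomy. The `defect_signature` field, a hypothetical key for recognizing "the same defect" across incidents, is an assumption. The recurrence review reopens the fix if any incident with that signature was opened after the fix date.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class RootCause(Enum):
    """Root-cause taxonomy from the incident answer."""
    MODEL = "model"
    PROMPT = "prompt"
    DATA = "data"
    KNOWLEDGE = "knowledge"
    TOOL = "tool"
    WORKFLOW = "workflow"
    POLICY = "policy"
    TRAINING = "training"
    MONITORING = "monitoring"


@dataclass
class Incident:
    incident_id: str
    defect_signature: str      # hypothetical key identifying "the same defect"
    root_cause: RootCause
    opened: date


def recurrence_review(incidents: list[Incident], signature: str,
                      fix_date: date) -> tuple[str, list[str]]:
    """Return ("resolved", []) or ("reopen", ids of incidents that recurred after the fix)."""
    recurred = [i.incident_id for i in incidents
                if i.defect_signature == signature and i.opened > fix_date]
    return ("reopen" if recurred else "resolved", recurred)
```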

Q5: What is the difference between the monthly value review and the quarterly portfolio review?

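The monthly value review looks at a single capability and judges whether outcome, adoption, risk, cost and value leakage support scale, restrict, redesign or retire. The quarterly portfolio review compares multiple capabilities and decides funding, platform investment, risk concentration, capacity allocation and retirement. The former is a product value judgment; the latter is an investment portfolio judgment.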
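
A monthly value review only produces a decision if the mapping from evidence to scale / restrict / redesign / retire is explicit. The function below is a hypothetical heuristic, not a rule stated in this playbook. Risk is checked first. A capability with non-positive net value is retired if adoption is also low and redesigned otherwise. High value leakage (assumed here to mean the fraction of expected value not realized) or low adoption also triggers redesign. Both thresholds are placeholders that each product team would set in its metric contracts.

```python
def monthly_decision(
    net_value: float,
    adoption_rate: float,
    open_risk_breaches: int,
    value_leakage: float,
    adoption_floor: float = 0.4,     # hypothetical threshold
    leakage_ceiling: float = 0.3,    # hypothetical threshold
) -> str:
    """Map one capability's monthly evidence to scale / restrict / redesign / retire."""
    if open_risk_breaches > 0:
        return "restrict"            # risk evidence is checked before value
    if net_value <= 0:
        # Negative value with low adoption suggests no fit; with high adoption, a design problem.
        return "retire" if adoption_rate < adoption_floor else "redesign"
    if adoption_rate < adoption_floor or value_leakage > leakage_ceiling:
        return "redesign"
    return "scale"
```

The quarterly portfolio review would then compare these per-capability outputs across the portfolio and decide funding and retirement. That comparison is outside what this function attempts.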

Q6: What advanced value does a CBAP-level BA bring to AI Product Ops?

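A BA does more than record requirements. After launch, the BA can maintain work-as-done evidence, metric contracts, the assumption ledger, the complaint/incident taxonomy, action closure and policy drift impact analysis. The BA's strength is connecting processes, rules, exceptions, control points and user behavior to outcome evidence, so the PM and architect can see how the real work system is changing.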

Q7: How do you explain action closure to executives?

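Action closure is not "the owner says it is done." It requires closure evidence for every action: for example, a defect rate back within threshold, a clean QA sample, a knowledge base version live in production, a passed release impact review, or complaint recurrence that has stopped. Without closure evidence, an action is just activity, not an operational improvement.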

24. Portfolio Exercise

24.1 Assignment

Choose one financial-retail AI capability. We recommend one of the following:

Contact-center agent assist。
Complaint intelligence。
KYC onboarding assistant。
Collections hardship guidance。
AML triage assistant。
Personalized pricing governance。

Design a 90-day AI Product Ops operating cadence for it.

24.2 Deliverables

Deliverable	Minimum content
Product Ops Charter	scope, owner, risk tier, cadence, decision rights
Operating Calendar	daily, weekly, monthly, quarterly, release, incident, experiment
Metric Contract Pack	at least 12 metrics across outcome, adoption, quality, risk, cost, reliability
Evidence Review Pack	decision request, outcome, adoption, quality, risk, cost, release, action closure
Release Calendar	model, prompt, data, knowledge, tool, policy, workflow changes
Experiment Registry	at least 3 experiments with guardrails
Incident-to-Roadmap Loop	severity, root cause taxonomy, corrective action, roadmap update
Backlog Governance	backlog classes, priority formula, evidence links
Dashboard Wireframes	weekly ops, monthly value, quarterly portfolio
Interview Narrative	30-second and 2-minute answer
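
For the Backlog Governance deliverable, the priority formula and the evidence links can be combined so that unevidenced items cannot be ranked at all, which is the correction for "backlog politics". The sketch below is loosely modeled on RICE, with a backlog-class weight standing in for reach. The class names, the weights and the 1-5 impact scale are all hypothetical choices for illustration. They are not values the playbook prescribes.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical class weights; the playbook names backlog classes but does not fix values.
CLASS_WEIGHT = {
    "incident_corrective": 3.0,
    "risk_control": 2.5,
    "value_expansion": 1.5,
    "platform": 1.0,
}


@dataclass
class BacklogItem:
    item_id: str
    backlog_class: str
    impact: float            # estimated outcome impact on a 1-5 scale
    confidence: float        # 0-1, strength of the linked evidence
    effort: float            # relative effort, must be > 0
    evidence_links: list[str] = field(default_factory=list)  # incident, experiment, complaint IDs


def priority(item: BacklogItem) -> Optional[float]:
    """RICE-like score; items without evidence links are not ranked at all."""
    if not item.evidence_links:
        return None
    return CLASS_WEIGHT[item.backlog_class] * item.impact * item.confidence / item.effort


def rank(items: list[BacklogItem]) -> list[str]:
    """Evidence-linked items ordered by descending priority."""
    rankable = [i for i in items if priority(i) is not None]
    return [i.item_id for i in sorted(rankable, key=priority, reverse=True)]
```

An item requested without any linked incident, experiment or complaint drops out of `rank` entirely, however loudly it is requested, until someone attaches evidence.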

24.3 Scoring Rubric

Dimension	Strong answer
Cadence clarity	Different forums have different decisions and evidence
Metric rigor	Metrics have contracts, thresholds, guardrails and action rules
Evidence quality	Uses telemetry, samples, experiments and finance/risk reconciliation
Post-launch realism	Includes release changes, incidents, complaints, cost and action closure
Financial retail fit	Handles customer harm, compliance, frontline adoption and capacity
Architecture depth	Connects version trace, OpenTelemetry-style observability and SLO thresholds
BA maturity	Models assumptions, policy drift, work-as-done and exception paths
Executive usefulness	Produces scale, restrict, redesign, retire and funding decisions

25. Quality Bar

A strong AI Product Ops artifact can answer:

What decision does each forum make?
Which evidence proves the decision?
Which metric contracts define the evidence?
Which releases changed the system?
Which incidents changed the roadmap?
Which actions closed with proof?
Which assumptions are still valid?
Which capability deserves more investment, restriction or retirement?

If the answer is unclear, the product is not yet operating. It is only running.