AI Product Operations / Operating Cadence / Outcome Review Playbook
Goal: Give Senior AI PMs, AI Architects and CBAP-level BAs an executable method for establishing the post-launch operating cadence, evidence review, metric governance, incident learning, cost and capacity management, and roadmap decision loop of an AI product.
Applicable scenarios: contact-center agent assist, complaint intelligence, KYC onboarding, collections hardship, AML triage, personalized pricing governance, internal copilots and regulated workflow automation.
Core principle: AI Product Ops is not a meeting regime. It puts outcome, risk, adoption, quality, cost, release, incident and action closure into one post-launch operating system.
1. Target Audience
| Audience | What they use this playbook for |
|---|---|
| Senior AI PM | Run the weekly ops review, monthly value review and roadmap decision loop |
| AI Architect | Design the release calendar, runtime telemetry, SLO, version trace and evidence dashboard |
| CBAP-level BA | Establish the metric contract, assumption ledger, action closure, complaint/incident learning |
| Product Operations Lead | Maintain the operating calendar, review pack, decision log and forum hygiene |
| Operations Leader | Manage adoption, capacity, review queue, coaching and frontline feedback |
| Risk / Compliance Partner | Connect post-launch monitoring to policy drift, customer harm, control effectiveness |
| Data / Analytics Partner | Maintain the outcome join, metric definitions, segmentation and evidence quality |
| Finance / Value Office | Evaluate net realized value, unit economics, portfolio allocation and retirement |
2. Learning Objectives
After completing this playbook, you should be able to:
Establish a 90-day post-launch operating cadence for a launched AI product.
Define the agenda, inputs and outputs of the weekly ops review, monthly value review and quarterly portfolio review.
Write a metric contract covering outcome, adoption, quality, risk, cost, reliability and learning.
Design an evidence review pack that supports scale, restrict, redesign, retire, release and rollback decisions.
Run a release calendar for model / prompt / data / knowledge / tool / policy changes.
Establish an experiment registry, assumption ledger, decision log and action closure register.
Turn incidents, complaints, near misses, policy drift and capacity issues into backlog items.
Design RACI, dashboards, anti-patterns, interview answers and a portfolio exercise for financial retail cases.
3. Executive Summary
An AI product keeps changing after launch:
model behavior changes
prompt changes
knowledge changes
tool permissions change
policy changes
workflow behavior changes
cost profile changes
customer and regulator expectations change
The core task of AI Product Ops is therefore to establish an evidence cadence:
Telemetry and samples
-> metric contract
-> evidence review pack
-> weekly / monthly / quarterly decisions
-> backlog and release calendar
-> action closure
-> roadmap and portfolio update
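A mature post-launch operating system contains at least 12 assets:
| Asset | Purpose |
|---|---|
| Operating calendar | Fixes the weekly, monthly, quarterly, release, experiment and incident review schedule |
| Metric contract | Defines the metric definition, owner, threshold, guardrail, action rule |
| Evidence review pack | Puts outcome, adoption, quality, risk, cost, incident on the decision page |
| Weekly ops agenda | Handles adoption, quality, SLO, capacity, incident, open action |
| Monthly value review | Assesses net value, value leakage, risk trend, scale/stop |
| Quarterly portfolio review | Compares use cases, funding, risk concentration, platform leverage |
| Release calendar | Manages model, prompt, data, knowledge, tool, policy, workflow changes |
| Experiment registry | Manages hypothesis, cohort, metric, guardrail, result and decision |
| Decision log | Records key decisions and the evidence behind them |
| Assumption ledger | Tracks whether value, risk, adoption and cost assumptions still hold |
| Incident-to-roadmap loop | Turns complaint, near miss, failure into roadmap learning |
| Action closure register | Prevents meeting actions from ending without evidence-based closure |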
4. Source Anchors
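These sources are used to align the language of governance, management systems, requirements engineering, engineering performance, observability and reliability. This document does not provide legal, regulatory, audit or certification opinions.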
5. Operating Setup
5.1 Entry Criteria
Use this playbook when an AI capability has at least one of the following:
controlled pilot with real users.
production release for a target workflow.
monthly executive value review requirement.
regulated customer, operations, risk or cost exposure.
repeat changes to model, prompt, data, knowledge, tool or policy.
complaints, incidents, near misses or adoption concerns.
| Input | Minimum quality |
|---|---|
| Product scope | named workflow, users, case types, customer impact |
| Baseline | pre-AI outcome, quality, cost, risk, capacity |
| Initial release package | model/prompt/data/tool versions, eval results, rollback path |
| Risk tier | customer harm, compliance, autonomy, reversibility |
| Telemetry | user action, workflow context, version, outcome join |
| Owner map | PM, architect, ops, risk, analytics, support, finance |
| Existing incidents | complaint, defect, override, QA, support, reliability history |
5.3 First 10 Days
| Day | Action | Output |
|---|---|---|
| 1 | Confirm scope and use-case owner | product ops charter |
| 2 | List post-launch decisions needed | decision inventory |
| 3 | Draft metric contract for top 10 metrics | metric contract v1 |
| 4 | Build evidence review pack outline | review pack v1 |
| 5 | Create operating calendar | cadence calendar |
| 6 | Create release calendar | release calendar v1 |
| 7 | Create experiment registry | experiment registry v1 |
| 8 | Create incident-to-roadmap taxonomy | incident learning map |
| 9 | Create dashboard wireframe | weekly / monthly / portfolio dashboard |
| 10 | Run first weekly ops review simulation | action closure register |
6. Cadence Architecture
6.1 Cadence Map
| Cadence | Owner | Attendees | Primary decision |
|---|---|---|---|
| Daily signal triage | Ops / Platform | PM, Architect, Ops, Support, Risk by severity | mitigate, rollback, escalate |
| Weekly ops review | AI PM | PM, BA, Architect, Ops, Analytics, Risk, Platform | fix, assign, close, adjust release |
| Monthly value review | AI PM / Sponsor | Sponsor, PM, Finance, Risk, Ops, Analytics | scale, restrict, redesign, retire |
| Quarterly portfolio review | Value Office | Executives, Portfolio, Platform, Risk, Finance | fund, pause, consolidate, retire |
| Release review | Architect / Release Lead | PM, Risk, Platform, Data, Ops | approve, canary, hold, rollback |
| Incident review | Ops / Risk | PM, Architect, Risk, Ops, Support, Data | root cause, corrective action, recurrence prevention |
| Experiment review | PM / Analytics | PM, BA, Data Science, Ops, Risk | continue, scale, stop, redesign |
6.2 Cadence Design Rules
Every recurring forum has a decision type.
Every review pack starts with decision requested.
Every metric in executive review has a contract.
Every action has owner, due date, closure evidence and reopen trigger.
Every model / prompt / data / tool / policy change appears on the release calendar.
Every incident creates or updates at least one backlog, metric, eval, process or policy item.
Every monthly value review updates the assumption ledger.
Every quarterly review can decide to retire, not only expand.
7. Weekly Ops Review
7.1 Purpose
Weekly ops review answers:
What changed in production this week?
What signals require action?
Which actions closed with evidence?
Which release or backlog decision is needed before next review?
7.2 Agenda
| Time | Block | Inputs | Outputs |
|---|---|---|---|
| 5 min | Decision requests | proposed releases, incidents, stuck actions | agenda priority |
| 10 min | Adoption and workflow | qualified use, cohort funnel, rejection reasons | adoption action |
| 10 min | Quality and eval | QA sample, eval delta, defect classes | fix / sample / hold |
| 10 min | Reliability and SLO | latency, availability, fallback, restore | platform action |
| 10 min | Risk / complaints | complaints, overrides, policy breach, near miss | risk action / escalation |
| 10 min | Cost / capacity | cost per case, review queue, support load | capacity / cost action |
| 10 min | Release and experiments | version changes, experiment readout | release / experiment decision |
| 10 min | Action closure | last actions, closure evidence, blockers | close / reopen / escalate |
7.3 Weekly Review Questions
| Domain | Questions |
|---|---|
| Adoption | Are target users using AI in the intended workflow step? Which cohorts are dropping off? |
| Quality | Which failure classes changed? Are defects tied to a release version? |
| Reliability | Are SLOs met for the workflow moment where AI is needed? |
| Risk | Did any complaint, override or QA sample indicate customer harm or policy drift? |
| Cost | Is unit cost aligned to value? Is human review load growing faster than usage? |
| Release | Did any model/prompt/data/tool/policy change create outcome movement or regression? |
| Action | Which actions truly closed, and what evidence proves closure? |
7.4 Weekly Outputs
Use this format for each action:
| action_id | source_signal | action | owner | due_date | closure_evidence | reopen_trigger | forum |
|---|---|---|---|---|---|---|---|
| ACT-001 | contact-center repeat contact up 8% in billing calls | sample 50 edited suggestions and update billing knowledge article | Ops QA Lead | 2026-07-07 | QA sample shows defect class reduced below warning threshold | repeat contact remains above threshold for two weekly reviews | Weekly Ops |
No action should be accepted without closure evidence.
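The acceptance rule can be enforced mechanically before an action enters the register. The following is a minimal sketch, assuming the register columns above map to a Python dataclass; `Action`, `REQUIRED_FIELDS`, `missing_fields` and the sample action ACT-002 are illustrative names and data, not part of the playbook.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Action:
    action_id: str
    source_signal: str
    action: str
    owner: str
    due_date: date
    closure_evidence: str
    reopen_trigger: str
    forum: str


# Assumption: these three text fields are the ones a forum refuses to leave blank.
REQUIRED_FIELDS = ("owner", "closure_evidence", "reopen_trigger")


def missing_fields(item: Action) -> list[str]:
    """Return required fields that are blank; an empty list means the action can be accepted."""
    return [name for name in REQUIRED_FIELDS if not getattr(item, name).strip()]


# Hypothetical draft action with no closure evidence defined yet.
draft = Action(
    action_id="ACT-002",
    source_signal="fallback rate above warning threshold",
    action="review retrieval timeout settings",
    owner="Platform Lead",
    due_date=date(2026, 7, 14),
    closure_evidence="",
    reopen_trigger="fallback rate above threshold for two weekly reviews",
    forum="Weekly Ops",
)
print(missing_fields(draft))  # ['closure_evidence']
```

A forum chair could run a check like this on every proposed action and send back any item that returns a non-empty list.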
8. Monthly Value Review
8.1 Purpose
Monthly value review answers:
Is this AI capability improving business outcomes after adoption, risk, cost, capacity and incident learning are included?
8.2 Required Pack
| Section | Required evidence |
|---|---|
| Decision requested | continue, scale, restrict, redesign, retire |
| Business outcome | baseline, current, trend, cohort, confidence |
| Adoption durability | returning qualified use, manager reinforcement, work-as-done evidence |
| Quality / control | eval trend, QA defects, control overrides, policy breaches |
| Risk / customer impact | complaints, harm signals, vulnerable customer impact, fairness/conduct indicators |
| Cost / capacity | unit cost, review load, support load, platform spend |
| Value leakage | rework, human review, delay, support, remediation, redress |
| Release / experiment impact | changes made and observed effects |
| Assumption ledger update | confirmed, weakened, invalidated assumptions |
| Recommendation | specific decision, next evidence needed, review trigger |
8.3 Decision Matrix
| Evidence pattern | Decision |
|---|---|
| Outcome improves, adoption durable, risk stable, cost acceptable | scale |
| Outcome improves only in low-risk cohort | scale selectively |
| Usage high, outcome flat, cost rising | redesign or restrict |
| Outcome improves, complaints rise | restrict and investigate |
| Adoption low, users cite workflow friction | product / workflow redesign |
| Adoption low, value thesis weak | retire or park |
| Quality defects tied to recent release | rollback or hotfix |
| Value depends on manual review load that cannot scale | redesign operating model before expansion |
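One way to keep the matrix consistent across reviews is to encode each row as a set of evidence flags. This is only a sketch: the flag names are hypothetical labels for the patterns above, and the function deliberately returns every matching row rather than assuming a precedence the matrix does not state.

```python
# Each matrix row as (required evidence flags, decision).
# Flag names are illustrative labels, not terms defined by the playbook.
DECISION_MATRIX = [
    (frozenset({"outcome_improves", "adoption_durable", "risk_stable", "cost_acceptable"}), "scale"),
    (frozenset({"outcome_improves_low_risk_cohort_only"}), "scale selectively"),
    (frozenset({"usage_high", "outcome_flat", "cost_rising"}), "redesign or restrict"),
    (frozenset({"outcome_improves", "complaints_rise"}), "restrict and investigate"),
    (frozenset({"adoption_low", "workflow_friction"}), "product / workflow redesign"),
    (frozenset({"adoption_low", "value_thesis_weak"}), "retire or park"),
    (frozenset({"defects_tied_to_recent_release"}), "rollback or hotfix"),
    (frozenset({"manual_review_cannot_scale"}), "redesign operating model before expansion"),
]


def matching_decisions(observed: set[str]) -> list[str]:
    """Return every decision whose full evidence pattern is present in the observed flags."""
    return [decision for pattern, decision in DECISION_MATRIX if pattern <= observed]


print(matching_decisions({"outcome_improves", "complaints_rise", "usage_high"}))
# ['restrict and investigate']
```

When more than one row matches, the forum still has to choose. The sketch only makes the candidate decisions explicit.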
8.4 Monthly Decision Memo
# Monthly AI Value Review Decision Memo
## Capability
- Product:
- Workflow:
- Target population:
- Current version set:
## Decision Requested
- Recommendation:
- Decision deadline:
- Decision owner:
## Outcome Evidence
- Baseline:
- Current:
- Movement:
- Confidence:
## Adoption Evidence
- Qualified use:
- Cohort durability:
- Work-as-done observation:
## Risk / Control Evidence
- Complaints:
- Overrides:
- Policy drift:
- Customer harm signals:
## Cost / Capacity Evidence
- Cost per case:
- Human review load:
- Support load:
- Platform capacity:
## Release / Experiment Evidence
- Recent releases:
- Experiment result:
- Regressions:
## Assumption Ledger Update
- Confirmed:
- Weakened:
- Invalidated:
## Decision and Actions
- Decision:
- Actions:
- Closure evidence:
- Next review trigger:
9. Quarterly Portfolio Review
9.1 Purpose
Quarterly review compares AI capabilities across value, risk, cost, platform leverage and organizational capacity.
9.2 Scorecard
| Dimension | Rating guide |
|---|---|
| Net value | finance-supported benefit after cost and leakage |
| Adoption durability | sustained qualified use across target cohorts |
| Risk stability | risk signals within appetite, controls effective |
| Evidence maturity | telemetry, sampling, experiment and finance evidence quality |
| Platform reuse | reusable model gateway, eval, RAG, tool, policy patterns |
| Capacity burden | SME review, operations queue, platform support, risk review load |
| Strategic fit | alignment to business capability and product strategy |
| Retirement logic | value weak, risk high, platform cost high, better alternative available |
9.3 Portfolio Decisions
| Decision | When to use |
|---|---|
| Fund | strong evidence, scalable economics, strategic fit |
| Scale | proven cohort, acceptable risk, operational capacity available |
| Restrict | value only in specific population or risk high in certain segments |
| Consolidate | multiple teams built similar capability |
| Platformize | repeated pattern should become reusable golden path |
| Pause | evidence insufficient or operating capacity constrained |
| Retire | assumptions failed or risk/cost exceeds value |
9.4 Portfolio Review Output
| capability | decision | evidence_basis | funding_change | platform_dependency | risk_condition | next_review |
|---|---|---|---|---|---|---|
| KYC document assistant | scale selectively | cycle time down in low-risk retail, EDD defects unchanged | fund 2 engineers for workflow integration | document extraction quality dashboard | no expansion to complex entity until false-pass sample remains clean | 2026-Q3 monthly value review |
10. Metric Contract Template
10.1 Contract Fields
# Metric Contract: [metric_id]
## Business Question
- Decision this metric supports:
## Definition
- Numerator:
- Denominator:
- Inclusion rules:
- Exclusion rules:
- Time window:
## Population
- Workflow:
- User roles:
- Case types:
- Channel:
- Risk tier:
## Source and Trace
- Source systems:
- Event names:
- Version fields:
- Join keys:
- Data latency:
## Owner
- Business owner:
- Analytics owner:
- Risk reviewer:
## Thresholds and Action Rules
- Target:
- Warning:
- Breach:
- Stop rule:
- Action when warning:
- Action when breach:
## Guardrails
- Customer harm guardrail:
- Cost guardrail:
- Quality guardrail:
- Risk guardrail:
## Segmentation
- Required cohort cuts:
- Vulnerable customer / protected class proxy review:
- Manager/team cut:
## Evidence Quality
- Level:
- Sampling or reconciliation approach:
- Last reviewed:
- Next review:
10.2 Example: Agent Assist Repeat Contact
| Field | Example |
|---|---|
| metric_id | cc_agent_assist.repeat_contact_7d |
| business question | Did agent assist reduce AHT without pushing customers to call again? |
| numerator | customers with another contact within 7 days for same intent |
| denominator | assisted calls for eligible intents |
| required cuts | intent, team, agent tenure, suggestion acceptance, prompt version |
| guardrail | if repeat contact rises while AHT falls, monthly value cannot claim net improvement |
| action rule | warning triggers 50-call QA sample; breach triggers release hold for affected intent |
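To illustrate how a contract's cuts and action rule turn into a check, here is a rough sketch that computes the repeat contact rate by prompt version and maps it to the action rule. The call records, the version labels and the `WARNING` / `BREACH` values are invented for illustration; a real contract would set its own thresholds and join logic.

```python
from collections import defaultdict

# Hypothetical assisted-call records: (prompt_version, repeat_contact_within_7d).
calls = [
    ("p-12", False), ("p-12", True), ("p-12", False), ("p-12", False),
    ("p-13", True), ("p-13", True), ("p-13", False), ("p-13", False),
]

WARNING = 0.30  # assumed threshold, not from the playbook
BREACH = 0.45   # assumed threshold, not from the playbook

# Action rule wording taken from the contract example.
ACTION_RULE = {
    "ok": "no action",
    "warning": "50-call QA sample",
    "breach": "release hold for affected intent",
}


def repeat_contact_rate_by_version(records):
    """Repeat contacts divided by assisted calls, cut by prompt version."""
    totals = defaultdict(int)
    repeats = defaultdict(int)
    for version, repeated in records:
        totals[version] += 1
        repeats[version] += int(repeated)
    return {version: repeats[version] / totals[version] for version in totals}


def status(rate):
    """Classify a rate against the warning and breach thresholds."""
    if rate >= BREACH:
        return "breach"
    if rate >= WARNING:
        return "warning"
    return "ok"


rates = repeat_contact_rate_by_version(calls)
for version in sorted(rates):
    print(version, rates[version], ACTION_RULE[status(rates[version])])
```

Cutting by prompt version is what lets the review tie a movement in the rate to a specific release rather than to the capability as a whole.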
10.3 Example: KYC False Pass
| Field | Example |
|---|---|
| metric_id | kyc_ai.document_false_pass_sample_rate |
| business question | Is onboarding speed improvement weakening document completeness controls? |
| numerator | sampled cases where AI marked document complete but QA found material defect |
| denominator | QA sampled AI-complete cases |
| required cuts | document type, entity type, geography, channel, model/prompt version |
| stop rule | any severe false pass in high-risk entity type triggers expansion hold |
| action rule | update eval set, increase human review threshold, review policy pack |
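The stop rule differs from a threshold: a single qualifying case is enough. A minimal sketch of both the sampled rate and the stop rule follows, assuming a hypothetical `QaSample` record and an assumed set of high-risk entity types; neither the field names nor the classification come from the playbook.

```python
from dataclasses import dataclass


@dataclass
class QaSample:
    entity_type: str
    ai_marked_complete: bool
    material_defect: bool  # QA found a material defect
    severe: bool           # QA graded the defect as severe


# Assumption: which entity types count as high risk is defined by the risk team.
HIGH_RISK_ENTITY_TYPES = {"complex_entity"}


def false_pass_rate(samples):
    """Material-defect cases divided by QA-sampled AI-complete cases; None if nothing was sampled."""
    ai_complete = [s for s in samples if s.ai_marked_complete]
    if not ai_complete:
        return None
    return sum(s.material_defect for s in ai_complete) / len(ai_complete)


def expansion_hold(samples):
    """Stop rule: any severe false pass in a high-risk entity type triggers a hold."""
    return any(
        s.ai_marked_complete and s.material_defect and s.severe
        and s.entity_type in HIGH_RISK_ENTITY_TYPES
        for s in samples
    )


samples = [
    QaSample("retail", True, False, False),
    QaSample("retail", True, False, False),
    QaSample("retail", True, True, False),
    QaSample("complex_entity", True, True, True),
    QaSample("retail", False, False, False),  # not AI-complete, excluded from the rate
]
print(false_pass_rate(samples), expansion_hold(samples))  # 0.5 True
```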
11. Evidence Review Pack Template
11.1 Standard Pack
# AI Product Operations Evidence Review Pack
## Decision Requested
- Forum:
- Requested decision:
- Deadline:
- Owner:
## Product and Version Scope
- Product:
- Workflow:
- Population:
- Model version:
- Prompt version:
- Data / knowledge version:
- Tool version:
- Policy version:
## Outcome Evidence
| metric | baseline | current | trend | evidence quality | interpretation |
|---|---|---|---|---|---|
## Adoption Evidence
| cohort | exposure | qualified use | accept/edit/reject | durability | issue |
|---|---|---|---|---|---|
## Quality Evidence
| failure class | count/rate | affected version | severity | action |
|---|---|---|---|---|
## Risk / Control Evidence
| signal | population | severity | control impact | action |
|---|---|---|---|---|
## Cost / Capacity Evidence
| metric | current | threshold | driver | action |
|---|---|---|---|---|
## Release and Experiment Evidence
| item | release/experiment | population | observed impact | decision |
|---|---|---|---|---|
## Incident and Complaint Learning
| signal | root cause | corrective action | roadmap impact | closure evidence |
|---|---|---|---|---|
## Assumption Ledger Update
| assumption | status | evidence | decision impact |
|---|---|---|---|
## Action Closure
| action | owner | due date | closure evidence | status |
|---|---|---|---|---|
11.2 Evidence Quality Levels
| Level | Name | Meaning |
|---|---|---|
| E1 | anecdotal | useful signal but not decision-grade |
| E2 | sampled | structured QA / complaint / interview sample |
| E3 | instrumented | telemetry joined to workflow and version context |
| E4 | experiment-grade | controlled, matched, cohort or time-series analysis |
| E5 | certified | reconciled with finance, risk, audit or model validation evidence |
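Because the levels are ordered, a review pack can be screened for metrics whose evidence is too weak for the forum in question. The sketch below assumes each forum sets its own minimum level; that minimum, the `below_minimum` helper and the sample pack are illustrative and not prescribed by the playbook.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    E1_ANECDOTAL = 1
    E2_SAMPLED = 2
    E3_INSTRUMENTED = 3
    E4_EXPERIMENT_GRADE = 4
    E5_CERTIFIED = 5


def below_minimum(pack: dict[str, EvidenceLevel], minimum: EvidenceLevel) -> list[str]:
    """Return the metrics whose evidence level is below the minimum a forum requires."""
    return sorted(name for name, level in pack.items() if level < minimum)


# Hypothetical pack for a monthly value review.
pack = {
    "repeat_contact_7d": EvidenceLevel.E3_INSTRUMENTED,
    "complaint_theme": EvidenceLevel.E1_ANECDOTAL,
    "net_value": EvidenceLevel.E4_EXPERIMENT_GRADE,
}
print(below_minimum(pack, EvidenceLevel.E3_INSTRUMENTED))  # ['complaint_theme']
```

Metrics flagged this way can still appear in the pack as signals, but they would be marked as not yet supporting the requested decision.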
12. Release Calendar
12.1 Release Objects
| Object | Examples | Required review |
|---|---|---|
| Model | provider upgrade, fallback model, model class change | eval, latency, cost, risk, rollback |
| Prompt | system prompt, tool instruction, response format | regression eval, policy review by risk tier |
| Data | feature refresh, label update, training sample change | lineage, privacy, bias, validation |
| Knowledge | policy corpus, FAQ, product terms, scripts | freshness, authority, retrieval quality |
| Tool | CRM write, fee waiver, case closure, document request | authorization, side effect, audit, dual control |
| Policy | hardship policy, complaint taxonomy, AML escalation | legal/compliance, frontline comms, eval update |
| Workflow | UI step, routing, human review threshold | adoption, capacity, control impact |
| Dashboard | metric definition, threshold, evidence view | metric owner approval |
12.2 Calendar Fields
| release_id | date | object_type | change | affected_population | risk_tier | evidence_required | owner | canary | rollback | review_date | decision_log |
|---|---|---|---|---|---|---|---|---|---|---|---|
12.3 Release Rules
Prompt and knowledge changes are production changes.
Tool permission changes require side-effect review.
Policy changes require eval and frontline communication updates.
Model changes require cost, latency and quality comparison.
Data changes require lineage and segmentation review.
Every release has a post-release impact review date.
Release calendar is reviewed in weekly ops and monthly value review.
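The rules above lend themselves to a pre-review check on each calendar entry. The following sketch takes the required reviews per object type from section 12.1 and flags entries that lack them or lack a post-release review date. The `evidence_attached` field and the sample entry REL-014 are hypothetical additions for illustration.

```python
# Required reviews per release object, copied from the release objects table.
REQUIRED_REVIEWS = {
    "model": {"eval", "latency", "cost", "risk", "rollback"},
    "prompt": {"regression eval", "policy review by risk tier"},
    "data": {"lineage", "privacy", "bias", "validation"},
    "knowledge": {"freshness", "authority", "retrieval quality"},
    "tool": {"authorization", "side effect", "audit", "dual control"},
    "policy": {"legal/compliance", "frontline comms", "eval update"},
    "workflow": {"adoption", "capacity", "control impact"},
    "dashboard": {"metric owner approval"},
}


def release_gaps(entry: dict) -> list[str]:
    """List what a calendar entry still needs before it can go to release review."""
    required = REQUIRED_REVIEWS[entry["object_type"]]
    # evidence_attached is a hypothetical field listing reviews already completed.
    missing = required - set(entry.get("evidence_attached", []))
    gaps = [f"missing review: {name}" for name in sorted(missing)]
    if not entry.get("review_date"):
        gaps.append("missing post-release review date")
    return gaps


# Hypothetical entry: a tool permission change with partial evidence.
entry = {
    "release_id": "REL-014",
    "object_type": "tool",
    "evidence_attached": ["authorization", "audit"],
    "review_date": None,
}
for gap in release_gaps(entry):
    print(gap)
```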
13. Experiment Registry
13.1 Registry Fields
| experiment_id | hypothesis | population | treatment | control_or_baseline | primary_metric | guardrails | duration | owner | result | decision |
|---|---|---|---|---|---|---|---|---|---|---|
13.2 Example Experiments
| Use case | Hypothesis | Guardrails |
|---|---|---|
| Agent assist | shorter, policy-cited suggestions reduce AHT and edits | repeat contact, QA script defect, complaint |
| Complaint intelligence | new taxonomy improves regulatory complaint routing | false negative regulatory sample, cycle time |
| KYC onboarding | document completeness assistant reduces customer chase | false pass, abandonment, EDD escalation |
| Collections hardship | vulnerability-aware guidance improves arrangement suitability | complaint, broken promise, vulnerable customer review |
| AML triage | typology-aware retrieval improves case narrative quality | audit sample, escalation miss, reviewer edit distance |
| Pricing governance | reason-code explanation reduces branch overrides | fairness signal, complaint, margin leakage |
13.3 Experiment Review
Experiment review asks:
Did the experiment move the primary metric?
Did guardrails remain within threshold?
Did different cohorts respond differently?
Did cost or capacity change?
Which assumption changed?
What release, backlog or policy decision follows?
14. Decision Log and Assumption Ledger
14.1 Decision Log
| decision_id | date | forum | decision | options_considered | evidence | owner | review_trigger |
|---|---|---|---|---|---|---|---|
Decision log should capture:
scale / restrict / redesign / retire decisions.
release approval or rollback.
metric definition changes.
risk acceptance or risk appetite change.
platform investment decision.
incident corrective action acceptance.
14.2 Assumption Ledger
| assumption_id | statement | type | owner | evidence | confidence | expiry | status | decision_impact |
|---|---|---|---|---|---|---|---|---|
Assumption types:
| Type | Example |
|---|---|
| Value | reducing average handle time creates net benefit after repeat contact |
| Adoption | agents will use suggestions if citations are visible |
| Risk | human review catches high-impact hallucinations |
| Cost | token cost per case remains below threshold at scale |
| Capacity | QA can sample 2% of assisted cases without queue impact |
| Policy | current hardship guidance remains valid for next quarter |
| Technical | retrieval latency supports live-call workflow |
Monthly value review updates assumption status:
confirmed.
weakened.
invalidated.
needs more evidence.
expired due to policy, product, model or workflow change.
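The ledger's `expiry` column can drive the monthly update by surfacing assumptions that need a fresh look. This is a rough sketch under one assumption of its own: that expiry is recorded as a date and that only assumptions still supporting decisions need re-review. The `Assumption` class, `LIVE_STATUSES` and the sample ledger entries are illustrative.

```python
from dataclasses import dataclass
from datetime import date

# Assumption of this sketch: these statuses still let an assumption support a decision.
LIVE_STATUSES = {"confirmed", "weakened", "needs more evidence"}


@dataclass
class Assumption:
    assumption_id: str
    statement: str
    status: str
    expiry: date


def due_for_review(ledger, today):
    """Return ids of live assumptions whose expiry date has passed."""
    return [a.assumption_id for a in ledger if a.status in LIVE_STATUSES and a.expiry <= today]


# Hypothetical ledger; statements reuse the assumption type examples.
ledger = [
    Assumption("ASM-01", "agents will use suggestions if citations are visible", "confirmed", date(2026, 9, 30)),
    Assumption("ASM-02", "current hardship guidance remains valid for next quarter", "confirmed", date(2026, 6, 30)),
    Assumption("ASM-03", "token cost per case remains below threshold at scale", "invalidated", date(2026, 6, 30)),
]
print(due_for_review(ledger, today=date(2026, 7, 1)))  # ['ASM-02']
```

Expiry caused by a policy, product, model or workflow change would still need to be set by the review itself; a date check only catches assumptions that have aged out.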
15. Incident-to-Roadmap Loop
15.1 Severity Model
| Severity | Example | Required action |
|---|---|---|
| Sev 1 | customer harm, regulatory breach, unauthorized tool action | immediate containment, executive/risk escalation, rollback/restrict |
| Sev 2 | material workflow defect, repeated complaint, major SLO breach | weekly incident review, corrective action, metric update |
| Sev 3 | localized quality regression, manageable cost spike | backlog item, targeted release, monitor |
| Sev 4 | nuisance defect, unclear signal | sample, classify, watchlist |
15.2 Incident Record
# AI Product Incident Record
## Signal
- Source:
- Date:
- Severity:
- Affected population:
- Affected versions:
## Impact
- Customer:
- Operations:
- Risk / compliance:
- Cost / capacity:
## Containment
- Immediate action:
- Rollback / restriction:
- Communication:
## Root Cause
- Model:
- Prompt:
- Data:
- Knowledge:
- Tool:
- Workflow:
- Policy:
- Training / adoption:
- Metric / monitoring:
## Corrective Actions
| action | owner | due_date | closure_evidence | reopen_trigger |
|---|---|---|---|---|
## Roadmap Impact
- Backlog item:
- Release calendar update:
- Metric contract update:
- Eval update:
- Policy / SOP update:
- Next review:
15.3 Complaint Learning
Complaint and incident reviews should classify:
| Signal | Product question |
|---|---|
| misleading answer | prompt, knowledge or policy problem |
| inconsistent treatment | segmentation, policy or training problem |
| slow resolution | workflow, routing or capacity problem |
| unfair outcome | risk appetite, guardrail or model problem |
| customer confusion | explanation, UX or frontline script problem |
| repeated escalation | AI boundary or human oversight problem |
16. Backlog Governance
16.1 Backlog Classes
| Class | Example source | Required link |
|---|---|---|
| Outcome improvement | monthly value gap | metric contract |
| Adoption fix | cohort drop-off | adoption funnel |
| Quality fix | eval failure class | QA / eval evidence |
| Risk control | complaint or policy breach | incident / risk signal |
| Cost / capacity | unit cost or queue threshold | cost dashboard |
| Reliability | SLO breach | runtime signal |
| Evidence gap | missing join or weak sample | evidence pack |
| Release dependency | model/prompt/tool/policy change | release calendar |
| Experiment | assumption needs test | experiment registry |
| Retirement | weak value or high risk | portfolio decision |
Use a simple decision score:
priority_score =
outcome_impact
+ risk_reduction
+ adoption_unblock
+ cost_capacity_relief
+ evidence_need
+ platform_reuse
- delivery_effort
- change_saturation
Risk and customer harm can override numeric score.
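The score and its override can be sketched in a few lines. The 0-5 rating scale, the `harm_override` flag and the sample backlog items are assumptions made for illustration; the playbook specifies only the additive formula and the principle that risk and customer harm can override it.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    outcome_impact: int
    risk_reduction: int
    adoption_unblock: int
    cost_capacity_relief: int
    evidence_need: int
    platform_reuse: int
    delivery_effort: int
    change_saturation: int
    harm_override: bool = False  # set by risk review, outside the numeric score


def priority_score(item: BacklogItem) -> int:
    """Additive score from the formula: six benefit terms minus two cost terms."""
    return (
        item.outcome_impact + item.risk_reduction + item.adoption_unblock
        + item.cost_capacity_relief + item.evidence_need + item.platform_reuse
        - item.delivery_effort - item.change_saturation
    )


def rank(items):
    """Overridden items first, then descending numeric score."""
    return sorted(items, key=lambda i: (not i.harm_override, -priority_score(i)))


# Hypothetical items rated on an assumed 0-5 scale.
backlog = [
    BacklogItem("citation UI", 3, 1, 4, 1, 1, 2, 2, 1),
    BacklogItem("fee explanation fix", 2, 5, 1, 0, 2, 1, 3, 1, harm_override=True),
    BacklogItem("knowledge freshness SLO", 3, 3, 2, 1, 2, 4, 3, 2),
]
for item in rank(backlog):
    print(item.name, priority_score(item))
```

In this example the fee explanation fix scores lowest numerically (7) but ranks first because the override is applied before the score is compared.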
16.3 Backlog Review Rules
No backlog item enters top priority without evidence link.
Every roadmap item states expected metric movement.
Incident corrective actions outrank cosmetic improvements when severity is high.
Evidence gaps outrank scale decisions.
Platform reuse can outrank local feature work when multiple products repeat the same problem.
Retirement candidates are reviewed quarterly, not hidden at the bottom of the backlog.
17. Role / RACI
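| Activity | AI PM | AI Architect | CBAP BA | Ops | Risk / Compliance | Analytics | Platform | Finance |
|---|---|---|---|---|---|---|---|---|
| Product Ops charter | A/R | C | C | C | C | C | C | I |
| Metric contract | A/R | C | R | C | C | R | C | C |
| Weekly ops review | A/R | R | R | R | C | R | R | I |
| Monthly value review | A/R | C | R | R | C | R | C | R |
| Quarterly portfolio review | R | C | C | C | C | R | C | A/R |
| Release calendar | C | A/R | C | C | C | C | R | I |
| Experiment registry | A/R | C | C | C | C | R | C | I |
| Decision log | A/R | R | R | C | C | C | C | I |
| Assumption ledger | A/R | C | R | C | C | R | C | C |
| Incident learning | R | R | R | A/R | A/R by severity | C | R | I |
| Action closure | A/R | R | R | R | R | R | R | I |
| Dashboard design | A/R | R | R | C | C | R | C | C |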
Legend: A = accountable, R = responsible, C = consulted, I = informed.
18. Dashboard Design
18.1 Dashboard Stack
| Dashboard | Audience | Cadence | Must show |
|---|---|---|---|
| Runtime Signal Board | PM, Architect, Ops, Platform | daily | incidents, SLO, latency, fallback, cost anomaly |
| Weekly Ops Board | PM, BA, Ops, Risk | weekly | adoption, quality, risk, cost, actions |
| Monthly Value Board | Sponsor, PM, Finance, Risk | monthly | outcome, net value, value leakage, decision request |
| Release Impact Board | PM, Architect, Risk | weekly/monthly | version changes, affected population, regression |
| Portfolio Board | Executives, Value Office | quarterly | value, risk, cost, reuse, funding, retirement |
| Evidence Binder View | PM, BA, Risk, Audit | as needed | trace from metric to source, decision, action closure |
18.2 Weekly Ops Board Wireframe
Header
decision requests | releases pending | incidents open | actions overdue
Row 1: Adoption
qualified use by cohort | accept/edit/reject | rejection reasons
Row 2: Quality
eval trend | QA defects | top failure classes | affected versions
Row 3: Risk and Complaints
complaints | overrides | policy breach | customer harm watchlist
Row 4: Reliability and Cost
SLO | latency | fallback | cost per case | review queue
Row 5: Action Closure
open by age | blocked | due this week | reopened
18.3 Monthly Value Board Wireframe
Decision requested
scale / restrict / redesign / retire / continue
Outcome movement
baseline vs current | cohort | confidence | source quality
Net value
benefit | cost | human review | support | rework | risk adjustment
Risk and adoption
guardrails | complaints | durable qualified use | policy drift
Roadmap implication
next release | backlog changes | assumptions changed | decision log
18.4 Dashboard Rules
Every chart maps to a metric contract.
Every threshold maps to an action rule.
Every trend can be segmented by cohort and version.
Release markers appear on outcome and defect trends.
Dashboard includes open action age and closure quality.
Executive dashboard shows decisions, not raw operational noise.
19. Financial Retail Execution Patterns
19.1 Contact-Center Agent Assist
| Cadence | Focus |
|---|---|
| Weekly ops | accept/edit/reject by intent, repeat contact, script defects, latency |
| Monthly value | AHT net of repeat contact, QA, complaint, cost per assisted call |
| Release calendar | prompt wording, knowledge article, CRM action, call intent rollout |
| Incident loop | misleading fee explanation, complaint spike, policy article drift |
| Backlog | high-edit intents, citation UI, supervisor coaching, knowledge freshness SLO |
19.2 Complaint Intelligence
| Cadence | Focus |
|---|---|
| Weekly ops | false negative sample, routing delay, taxonomy confusion |
| Monthly value | cycle time, remediation closure, regulatory complaint capture |
| Release calendar | taxonomy update, root cause labels, policy corpus |
| Incident loop | missed regulatory complaint updates eval, routing, QA sample |
| Backlog | explanation of classification, root cause action tracker, dashboard for legal |
19.3 KYC Onboarding
| Cadence | Focus |
|---|---|
| Weekly ops | document completeness, rework, false pass sample, customer chase |
| Monthly value | onboarding cycle time, abandonment, EDD escalation, cost per application |
| Release calendar | document parser, policy rule, threshold, country/entity rollout |
| Incident loop | severe false pass triggers expansion hold and eval update |
| Backlog | missing document reason codes, customer notification wording, reviewer queue |
19.4 Collections Hardship
| Cadence | Focus |
|---|---|
| Weekly ops | arrangement suitability, vulnerability flag, agent override, complaint |
| Monthly value | kept promise rate, broken arrangement, customer harm guardrail |
| Release calendar | hardship policy, conversation script, escalation workflow |
| Incident loop | inappropriate pressure complaint updates prompt and training |
| Backlog | vulnerability-aware guidance, escalation UX, manager coaching report |
19.5 AML Triage
| Cadence | Focus |
|---|---|
| Weekly ops | alert aging, reviewer edit distance, narrative defects, typology feedback |
| Monthly value | analyst throughput, audit sample, escalation quality, SAR prep support |
| Release calendar | typology knowledge, retrieval corpus, explanation template |
| Incident loop | missed typology updates eval, corpus, sampling and risk review |
| Backlog | evidence citation, scenario-specific retrieval, investigator feedback capture |
19.6 Personalized Pricing Governance
| Cadence | Focus |
|---|---|
| Weekly ops | reason-code quality, overrides, complaint, segment impact |
| Monthly value | margin/conversion net of fairness and conduct guardrails |
| Release calendar | pricing model, eligibility policy, explanation wording |
| Incident loop | unfair treatment signal triggers restrict and risk review |
| Backlog | branch override reasons, fairness dashboard, policy drift alerts |
20. 90-Day Rollout Plan
Days 1-15: Foundation
| Day range | Work | Output |
|---|---|---|
| 1-3 | Confirm use-case scope, owners, risk tier, review forums | Product Ops charter |
| 4-6 | Draft metric contracts for top metrics | Metric contract pack |
| 7-9 | Build review pack and action closure register | Evidence pack v1 |
| 10-12 | Create release calendar and experiment registry | Release / experiment system |
| 13-15 | Run first weekly ops review | Decision log and actions |
Days 16-35: Evidence Quality
| Day range | Work | Output |
|---|---|---|
| 16-20 | Join telemetry to workflow outcome and version context | Evidence data map |
| 21-25 | Add complaint, QA, incident and cost feeds | Risk / cost signal board |
| 26-30 | Run first monthly value review | Value decision memo |
| 31-35 | Update assumption ledger and backlog governance | Evidence-driven backlog |
Days 36-60: Release and Incident Learning
| Day range | Work | Output |
|---|---|---|
| 36-42 | Operationalize release calendar | Version trace and review dates |
| 43-48 | Run experiment review cycle | Experiment readout and decisions |
| 49-54 | Run incident tabletop | Incident-to-roadmap test |
| 55-60 | Improve dashboards and action closure | Dashboard v2 and closure quality |
Days 61-90: Portfolio Governance
| Day range | Work | Output |
|---|---|---|
| 61-70 | Build quarterly portfolio scorecard | Portfolio review pack |
| 71-78 | Identify scale, restrict, redesign, retire candidates | Decision recommendations |
| 79-85 | Link platform investment to recurring pain points | Platform roadmap input |
| 86-90 | Run quarterly portfolio review simulation | Funding and roadmap decisions |
21. Templates
21.1 Product Ops Charter
# AI Product Operations Charter
## Product
- Capability:
- Workflow:
- Target users:
- Customer / operations impact:
- Risk tier:
## Operating Cadence
- Daily triage:
- Weekly ops review:
- Monthly value review:
- Quarterly portfolio review:
- Release review:
- Incident review:
## Decision Rights
- Decisions owned by product:
- Decisions requiring risk review:
- Decisions requiring platform review:
- Decisions requiring executive review:
## Evidence Assets
- Metric contracts:
- Review pack:
- Release calendar:
- Experiment registry:
- Decision log:
- Assumption ledger:
- Action closure register:
## Success Criteria
- Outcome:
- Adoption:
- Quality:
- Risk:
- Cost:
- Reliability:
- Learning:
21.2 Action Closure Register
| action_id | source | action | owner | due_date | closure_evidence | reviewer | status | reopen_trigger |
|---|---|---|---|---|---|---|---|---|
21.3 Release Impact Review
| release_id | release_type | expected_effect | observed_effect | regressions | guardrail_status | decision |
|---|---|---|---|---|---|---|
21.4 Assumption Review
| assumption_id | previous_status | new_evidence | new_status | roadmap_impact | next_review |
|---|---|---|---|---|---|
22. Anti-Patterns
| Anti-pattern | Symptom | Correction |
|---|---|---|
| Usage replaces outcome | reports prompt count and MAU only | use outcome chain and net value |
| Cadence without decision | recurring meetings with no decision requested | start every pack with decision |
| Metric without owner | dashboard disputes every month | metric contract with owner and action rule |
| Evidence without trace | cannot link metric to version, workflow, cohort | version-aware telemetry and evidence map |
| Release invisibility | prompt/knowledge/tool changes not reviewed | unified AI release calendar |
| Incident amnesia | same defect returns under new release | incident-to-roadmap loop and recurrence review |
| Backlog politics | loud stakeholder beats evidence | evidence-linked backlog classes |
| Cost blindness | platform spend treated as shared overhead | cost per case and capacity review |
| Risk theater | policy documents exist but no runtime signals | risk metrics, complaint sampling, action closure |
| Portfolio optimism | weak capabilities continue because already funded | quarterly retire / pause decisions |
23. Interview Answers
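Q1: How do you establish an operating cadence for an AI product after launch?
I would set up daily triage, a weekly ops review, a monthly value review and a quarterly portfolio review. The weekly review handles adoption, quality, SLO, risk, cost, incidents and open actions; the monthly review judges outcome, net value, value leakage and scale/stop; the quarterly review compares portfolio value, risk concentration, platform reuse and funding. Every cadence is connected through metric contracts, evidence packs, the decision log and action closure, so a review is not a status report but part of a continuous decision system.
Q2: Why does AI Product Ops need a release calendar?
Because production changes to an AI product are not just code. Models, prompts, knowledge bases, data, tool permissions, policies and workflows all change user outcomes and risk. If these changes are not on the same release calendar, incidents and metric shifts cannot be traced back to a cause. A good release calendar records affected population, risk tier, evidence required, canary, rollback, review date and decision log.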
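A minimal sketch of how a unified release calendar supports that traceability, assuming each release is stored as one record. The `ReleaseEntry` class, the `candidate_releases` helper, the 14-day lookback and all sample values are hypothetical illustrations, not part of the playbook.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Change types named in the answer above; every one of them is a "release".
RELEASE_TYPES = {"model", "prompt", "knowledge", "data",
                 "tool_permission", "policy", "workflow"}

@dataclass(frozen=True)
class ReleaseEntry:  # hypothetical record shape
    release_id: str
    release_type: str
    released_on: date
    affected_population: str
    risk_tier: str
    evidence_required: str
    canary: bool
    rollback_plan: str
    review_date: date

    def __post_init__(self):
        if self.release_type not in RELEASE_TYPES:
            raise ValueError(f"unknown release_type: {self.release_type}")

def candidate_releases(calendar, shift_date, lookback_days=14):
    """Return releases shipped within lookback_days before a metric shift, newest first."""
    start = shift_date - timedelta(days=lookback_days)
    hits = [r for r in calendar if start <= r.released_on <= shift_date]
    return sorted(hits, key=lambda r: r.released_on, reverse=True)

# Illustrative entries only.
calendar = [
    ReleaseEntry("REL-001", "prompt", date(2024, 3, 4), "all agents", "medium",
                 "offline eval", True, "revert prompt", date(2024, 3, 18)),
    ReleaseEntry("REL-002", "knowledge", date(2024, 3, 11), "hardship queue", "high",
                 "QA sample", True, "restore KB snapshot", date(2024, 3, 25)),
    ReleaseEntry("REL-003", "model", date(2024, 1, 15), "all agents", "high",
                 "canary report", True, "pin previous model", date(2024, 1, 29)),
]

# A metric moved on 12 March: which releases should the review look at first?
suspects = candidate_releases(calendar, date(2024, 3, 12))
print([r.release_id for r in suspects])
```

Because prompt and knowledge changes sit in the same list as model changes, the lookup surfaces them as suspects instead of leaving them invisible to the review.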
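Q3: How do you keep a dashboard from turning into vanity metrics?
Every dashboard metric must have a metric contract. The contract defines the business question, definition, population, source, owner, threshold, guardrail, segmentation and action rule. For example, AHT for agent assist must always be read alongside repeat contact, QA defect, complaint and cost per assisted call. A metric without an action rule does not enter executive review.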
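The contract fields listed above can be sketched as a small record plus a completeness gate. This is an illustration under assumptions: the `MetricContract` class, the helper names and the sample wording are hypothetical, and a real contract would carry typed thresholds rather than free text.

```python
from dataclasses import dataclass, field

@dataclass
class MetricContract:  # hypothetical shape mirroring the fields named in the answer
    name: str
    business_question: str
    definition: str
    population: str
    source: str
    owner: str
    threshold: str
    guardrails: list = field(default_factory=list)
    segmentation: list = field(default_factory=list)
    action_rule: str = ""

def missing_fields(contract):
    """List contract fields that are empty; an empty list means the contract is complete."""
    return [name for name, value in vars(contract).items() if not value]

def eligible_for_executive_review(contract):
    """A metric enters executive review only when every contract field is filled."""
    return not missing_fields(contract)

# Illustrative contracts only.
aht = MetricContract(
    name="AHT",
    business_question="Does agent assist shorten calls without hurting resolution?",
    definition="mean handle time per assisted call",
    population="assisted inbound calls",
    source="telephony plus assist telemetry",
    owner="operations leader",
    threshold="agreed target per queue",
    guardrails=["repeat_contact", "qa_defect", "complaint", "cost_per_assisted_call"],
    segmentation=["queue", "agent_tenure"],
    action_rule="If AHT improves while any guardrail breaches, restrict rollout and sample calls.",
)
mau = MetricContract(
    name="MAU",
    business_question="Are agents using the assistant?",
    definition="distinct agents with at least one assisted call per month",
    population="licensed agents",
    source="usage telemetry",
    owner="product ops",
    threshold="n/a",
)

print(eligible_for_executive_review(aht))  # complete contract
print(missing_fields(mau))                 # what blocks the vanity metric
```

The gate is deliberately strict: a usage count with no guardrails, segmentation or action rule is rejected before it reaches the executive pack.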
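Q4: How do you drive product improvement after an AI incident?
I first contain or roll back according to severity, then run root cause analysis and classify the cause as model, prompt, data, knowledge, tool, workflow, policy, training or monitoring. Next I update the corrective action, metric contract, eval set, release calendar, backlog and decision log. Finally I use action closure evidence and a recurrence review to prove the problem was actually solved, not merely that the ticket was closed.
Q5: What is the difference between the monthly value review and the quarterly portfolio review?
The monthly value review looks at a single capability and judges whether outcome, adoption, risk, cost and value leakage support scale, restrict, redesign or retire. The quarterly portfolio review compares multiple capabilities and decides funding, platform investment, risk concentration, capacity allocation and retirement. The former is a product value judgment; the latter is an investment portfolio judgment.
Q6: What senior-level value does a CBAP-level BA bring to AI Product Ops?
A BA does more than record requirements. In the post-launch phase, the BA can maintain work-as-done evidence, metric contracts, the assumption ledger, the complaint/incident taxonomy, action closure and policy drift impact. The BA's strength is connecting processes, rules, exceptions, control points and user behavior to outcome evidence, so the PM and the architect can see how the real work system is changing.
Q7: How do you explain action closure to executives?
Action closure is not "the owner says it is done". It requires every action to have closure evidence, for example the defect rate falling within threshold, a clean QA sample, a knowledge base version going live, a passed release impact review, or complaint recurrence stopping. Without closure evidence, an action is only activity, not an operational improvement.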
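A minimal sketch of that rule, using the columns of the Action Closure Register in 21.2. The `Action` class and `close_action` function are hypothetical, and the requirement that the reviewer differ from the owner is an added assumption, not something the playbook states.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:  # fields follow the 21.2 register columns
    action_id: str
    source: str
    action: str
    owner: str
    due_date: str
    closure_evidence: Optional[str] = None
    reviewer: Optional[str] = None
    status: str = "open"
    reopen_trigger: Optional[str] = None

def close_action(item):
    """Close an action only when evidence exists and a reviewer has checked it."""
    if not item.closure_evidence:
        raise ValueError(f"{item.action_id}: no closure evidence")
    # Assumption: the reviewer must be someone other than the owner.
    if not item.reviewer or item.reviewer == item.owner:
        raise ValueError(f"{item.action_id}: needs an independent reviewer")
    item.status = "closed"
    return item

# Illustrative register rows only.
with_proof = Action("A-1", "incident", "fix hardship KB article", "ops.lead",
                    "2024-04-01", closure_evidence="QA sample clean",
                    reviewer="risk.partner",
                    reopen_trigger="complaint recurrence")
without_proof = Action("A-2", "complaint", "retrain frontline team", "ops.lead",
                       "2024-04-01")

print(close_action(with_proof).status)
try:
    close_action(without_proof)
except ValueError as err:
    print(err)
```

Raising an error instead of silently closing keeps "owner says done" from counting as closure; the `reopen_trigger` field is where a recurrence review would hook in.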
24. Portfolio Exercise
24.1 Assignment
Choose one financial retail AI capability. Recommended options:
Contact-center agent assist.
Complaint intelligence.
KYC onboarding assistant.
Collections hardship guidance.
AML triage assistant.
Personalized pricing governance.
Design a 90-day AI Product Ops operating cadence for it.
24.2 Deliverables
| Deliverable | Minimum content |
|---|---|
| Product Ops Charter | scope, owner, risk tier, cadence, decision rights |
| Operating Calendar | daily, weekly, monthly, quarterly, release, incident, experiment |
| Metric Contract Pack | at least 12 metrics across outcome, adoption, quality, risk, cost, reliability |
| Evidence Review Pack | decision request, outcome, adoption, quality, risk, cost, release, action closure |
| Release Calendar | model, prompt, data, knowledge, tool, policy, workflow changes |
| Experiment Registry | at least 3 experiments with guardrails |
| Incident-to-Roadmap Loop | severity, root cause taxonomy, corrective action, roadmap update |
| Backlog Governance | backlog classes, priority formula, evidence links |
| Dashboard Wireframes | weekly ops, monthly value, quarterly portfolio |
| Interview Narrative | 30-second and 2-minute answer |
24.3 Scoring Rubric
| Dimension | Strong answer |
|---|---|
| Cadence clarity | Different forums have different decisions and evidence |
| Metric rigor | Metrics have contracts, thresholds, guardrails and action rules |
| Evidence quality | Uses telemetry, samples, experiments and finance/risk reconciliation |
| Post-launch realism | Includes release changes, incidents, complaints, cost and action closure |
| Financial retail fit | Handles customer harm, compliance, frontline adoption and capacity |
| Architecture depth | Connects version trace, OpenTelemetry-style observability and SLO thresholds |
| BA maturity | Models assumptions, policy drift, work-as-done and exception paths |
| Executive usefulness | Produces scale, restrict, redesign, retire and funding decisions |
25. Quality Bar
A strong AI Product Ops artifact can answer:
What decision does each forum make?
Which evidence proves the decision?
Which metric contracts define the evidence?
Which releases changed the system?
Which incidents changed the roadmap?
Which actions closed with proof?
Which assumptions are still valid?
Which capability deserves more investment, restriction or retirement?
If the answer is unclear, the product is not yet operating. It is only running.