AI Shadow Mode / Counterfactual Evaluation / Silent Launch Playbook
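Positioning: an advanced implementation playbook for experienced CBAP practitioners, financial retail PMs, product architects, solution architects and AI governance leads.
Core question: how to use real business context to verify whether an AI decision is ready for assisted mode, limited rollout or formal governance approval, without affecting customers, employee decisions or system state.
Scope: credit line management, AML alert triage, KYC onboarding, payment fraud intervention, collections contact strategy, contact center agent assist.
Boundary: this playbook does not replace legal advice, model validation reports, compliance approval, UAT certification, online experimentation, release governance or adoption analytics.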
Purpose And When To Use
Purpose
The purpose of shadow mode / counterfactual evaluation / silent launch is to generate auditable, comparable, decision-ready evidence before the AI actually affects customers or frontline staff:
| Purpose | What it answers | Artifact |
|---|---|---|
| Prove workflow fit | Whether the AI understands real inputs, policies, exceptions, latency and missing data | shadow run report |
| Estimate counterfactual value | If the AI took part in the decision, how much benefit, harm and operational load it would bring | champion/challenger comparison |
| Detect concentrated harm | Whether it is unfair to a segment, channel, language, region, product or vulnerable customer | fairness/segment scorecard |
| Calibrate human review | Where the AI differs from the SME, analyst, underwriter or agent | disagreement review |
| Prepare rollout decision | Whether the outcome is no-go, continue shadow, assisted mode or limited rollout | gate memo and evidence packet |
When To Use
| Use shadow mode when | Do not use it as a shortcut when |
|---|---|
| AI recommendation may change eligibility, intervention, escalation, customer treatment, prioritization or regulated communication | You already know the workflow is unsafe or prohibited |
| Outcome labels are delayed and offline eval cannot answer business impact | You need production A/B experimentation with customer impact |
| Human review quality and AI comparison matter | You only need technical regression for a non-decision component |
| Fairness, leakage, operational readiness or auditability are material | Logs cannot be retained or reconstructed under policy |
| You need confidence before exposing suggestions to staff | The AI output will secretly influence staff without controls |
Use Case Fit
| Use case | Shadowable decision | Primary risk | Mature outcome |
|---|---|---|---|
| Credit line management | increase / decrease / hold / review | unfair credit treatment, adverse action inconsistency | delinquency, loss, complaint, attrition |
| AML alert triage | close / escalate / prioritize / narrative | missed suspicious activity, analyst bias | SAR decision, QA defect, reopened case |
| KYC onboarding | approve / reject / document request / EDD | false reject, synthetic identity miss, discouragement | fraud hit, closure reason, complaint |
| Payment fraud intervention | allow / step-up / hold / decline | false decline, fraud loss, vulnerable customer harm | confirmed fraud, chargeback, customer confirmation |
| Collections contact strategy | channel / timing / hardship route / no contact | conduct risk, consent breach, vulnerable customer harm | cure, re-default, complaint, violation |
| Contact center agent assist | answer / citation / escalation / summary | hallucinated policy, automation bias, wrong regulated message | QA score, repeat contact, complaint |
Operating Model
1. Roles And Decision Rights
| Role | Responsibilities | Decision rights |
|---|---|---|
| Product owner | defines use case, customer impact, business value, rollout objective | recommends continue / hold / limited go |
| Senior BA / CBAP | maps decision boundary, policy, exception flows, outcome labels, human workflow | accepts business process completeness |
| Product architect | designs shadow operating model, authority boundary, evidence artifacts | approves product architecture readiness |
| Solution architect | designs event routing, logging, snapshot, versioning, access, observability | approves technical readiness |
| AI / ML owner | owns challenger model, prompt, RAG, tool logic, evals and limitations | approves model candidate for shadow |
| Operations owner | owns reviewer capacity, queue impact, training and escalation | approves operational readiness |
| Risk / compliance / model risk | challenges fairness, leakage, customer harm, evidence and residual risk | approves risk acceptance or no-go |
| Data governance | validates feature availability, lineage, retention and access controls | approves data readiness |
| Internal audit liaison | reviews evidence reconstructability and control clarity | gives evidence quality feedback |
2. End-To-End Flow
intake
-> decision boundary
-> population and sampling plan
-> feature/context snapshot design
-> read-only challenger integration
-> counterfactual event logging
-> leakage and evidence checks
-> human comparison
-> delayed outcome join
-> fairness and segment scorecard
-> gate decision
-> assisted mode / continue shadow / no-go
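The read-only challenger step in the flow above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `run_shadow`, the callable signatures and the in-memory `event_log` list are all hypothetical names, and a real integration would likely call the challenger asynchronously so it adds no latency to the champion path.

```python
import copy
from typing import Callable, List, Optional


def run_shadow(
    event: dict,
    champion: Callable[[dict], str],
    challenger: Callable[[dict], str],
    event_log: List[dict],
) -> str:
    """Return the champion decision; record what the challenger would have done."""
    # The challenger gets a deep copy, so it cannot mutate the live event
    # and it never sees the champion decision.
    snapshot = copy.deepcopy(event)
    ai_recommendation: Optional[str] = None
    shadow_error: Optional[str] = None
    try:
        ai_recommendation = challenger(snapshot)
    except Exception as exc:
        # A failing shadow path must never affect the champion path.
        shadow_error = type(exc).__name__

    champion_decision = champion(event)
    event_log.append(
        {
            "ai_recommendation": ai_recommendation,
            "champion_decision": champion_decision,
            "shadow_error": shadow_error,
        }
    )
    # Only the champion decision is returned to the workflow.
    return champion_decision
```

The design choice illustrated here is that the function's return value depends only on the champion, so a challenger error or an attempted mutation shows up in the log rather than in the customer outcome.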
3. Cadence
| Cadence | Meeting | Inputs | Outputs |
|---|---|---|---|
| Daily during first week | Shadow run health check | trace completeness, errors, blocked events, write attempts | run issue log |
| Weekly | Counterfactual review | agreement, disagreement severity, sample reviews, missing labels | tuning and control actions |
| Biweekly or monthly | Gate readiness review | outcome maturity, fairness scorecard, operations capacity, evidence binder | continue / narrow / expand / stop recommendation |
| At label maturity | Outcome review | mature labels, losses, complaints, QA defects, appeals | rollout gate memo |
Shadow Mode Intake Template
| Field | Required content | Example |
|---|---|---|
| Use case | Business decision being shadowed | Payment fraud intervention for high-value card-not-present transactions |
| Business owner | Accountable decision owner | Fraud strategy director |
| Workflow insertion point | Exact event and step | After authorization risk score, before customer step-up |
| Champion path | Current actual decision maker | Fraud rules engine + fraud analyst queue |
| Challenger path | AI candidate behavior | AI recommends allow, step-up, hold or decline with reason |
| Customer impact prohibited in shadow | Actions AI cannot trigger | No customer message, no hold, no decline, no case note write |
| Employee exposure | Whether staff can see AI output | Hidden during L1; SME-only review during L2 |
| Population | Included and excluded traffic | Include domestic card-not-present above threshold; exclude disputed accounts |
| Sampling | How events enter shadow | 20% stratified sample by risk band and merchant category |
| Outcome labels | Mature labels and proxy labels | confirmed fraud, chargeback, customer confirmation, complaint |
| Fairness segments | Approved segments/proxies | region, age band where permitted, language, product, device, vulnerability flag |
| Required evidence | Gate artifacts | event schema, leakage register, scorecard, human comparison, gate memo |
| Initial gate date | First decision date based on label maturity | 30-day readiness gate, 60-day outcome gate |
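The Sampling field in the intake template could be implemented in several ways. One possible sketch, assuming a per-stratum sampling rate and a stable `event_id`, is deterministic hash bucketing. The function name `in_shadow_sample` and the stratum label format are hypothetical, and the 20% rate simply mirrors the example row above.

```python
import hashlib
from typing import Dict


def in_shadow_sample(
    event_id: str,
    stratum: str,
    rates: Dict[str, float],
    default_rate: float = 0.0,
) -> bool:
    """Decide deterministically whether an event enters the shadow sample."""
    rate = rates.get(stratum, default_rate)
    # Hash stratum + event id so the same event always gets the same answer,
    # which keeps the sample reproducible for audit.
    digest = hashlib.sha256(f"{stratum}:{event_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


# Hypothetical strata keyed by risk band and merchant category.
rates = {"high_risk:electronics": 0.20, "low_risk:grocery": 0.20}
selected = in_shadow_sample("shd_pay_20260630_000184", "high_risk:electronics", rates)
```

Strata missing from `rates` fall back to `default_rate`, which is zero here so that unapproved traffic stays out of the shadow population by default.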
Counterfactual Event Schema Template
| Field | Type | Description | Example |
|---|---|---|---|
| event_id | string | Stable shadow event id | shd_pay_20260630_000184 |
| trace_id | string | Observability trace across router, challenger, store, outcome join | trc_75f4b28a |
| use_case_id | string | Registered AI use case | PAY-FRAUD-INTERVENTION-AI |
| event_time | timestamp | Time of business event | 2026-06-30T14:22:09Z |
| decision_time | timestamp | Time challenger produced decision | 2026-06-30T14:22:10Z |
| population_slice | object | Product/channel/region/risk-band descriptors | card-not-present, high-value, mobile |
| feature_snapshot_id | string | Decision-time feature snapshot | fs_pay_v17_20260630_142209 |
| policy_version | string | Policy/rule version visible at decision time | fraud_policy_2026_06_v3 |
| model_version | string | Challenger model version | fraud_challenger_v0.8.2 |
| prompt_version | string | Prompt/system instruction version when applicable | prompt_fraud_reason_v5 |
| rag_corpus_version | string | Knowledge corpus version when applicable | fraud_typology_corpus_2026_06_15 |
| tool_schema_version | string | Tool contract version when applicable | readonly_fraud_tools_v2 |
| ai_recommendation | enum | Proposed decision | step-up |
| ai_confidence | number | Calibrated confidence or score | 0.78 |
| ai_reason_codes | array | Business-readable reasons | unusual merchant, device mismatch, velocity spike |
| ai_citations | array | Policy or evidence references | fraud policy section 4.2 |
| ai_abstained | boolean | Whether AI declined to recommend | false |
| champion_decision | enum | Actual decision made by current process | allow |
| champion_actor | string | Rule, model, human role, or workflow owner | rules_engine_v12 |
| action_taken | string | Actual customer/system impact | authorization approved |
| disagreement_severity | enum | none / low / medium / high / critical | high |
| human_review_label | enum | SME judgment for comparison sample | step-up appropriate |
| outcome_label_status | enum | pending / proxy / mature / unavailable | pending |
| outcome_value | object | Mature label once available | confirmed_fraud=true, chargeback=false |
| fairness_slice_id | string | Approved monitoring slice id | mobile_high_value_region_3 |
| leakage_check_status | enum | pass / fail / exception | pass |
| retention_class | string | Retention and privacy class | regulated_decision_shadow_7y |
| evidence_refs | array | Linked eval, review, issue and gate ids | eval_run_241, gate_pay_2026_08 |
Design rule: every event must reconstruct what the challenger knew, what it would have done, what the champion did, and what later happened.
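The design rule can be spot-checked mechanically. The sketch below assumes events are stored as dictionaries keyed by the field names in the schema table; the choice of which fields count as required for replay (`REPLAY_FIELDS`) is an illustrative assumption, not a list defined by this playbook.

```python
from typing import List

# Assumed minimum set of fields needed to reconstruct a decision.
REPLAY_FIELDS = (
    "event_id",
    "trace_id",
    "event_time",
    "feature_snapshot_id",
    "policy_version",
    "model_version",
    "champion_decision",
    "outcome_label_status",
)


def missing_replay_fields(event: dict) -> List[str]:
    """List the fields that prevent this event from being reconstructed."""
    missing = [name for name in REPLAY_FIELDS if event.get(name) in (None, "")]
    # A recommendation is required unless the AI explicitly abstained.
    if not event.get("ai_abstained", False) and not event.get("ai_recommendation"):
        missing.append("ai_recommendation")
    return missing
```

An empty result means the sampled event carries enough references to be replayed; any non-empty result would be logged as an evidence gap.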
Label/Outcome Plan Template
| Label / outcome | Source system | Maturity window | Decision use | Quality control |
|---|---|---|---|---|
| Champion decision | workflow engine / case system | same day | agreement baseline | reconcile against audit trail |
| Human SME comparison | independent review queue | 3-10 business days | disagreement severity and calibration | blind review for sampled cases |
| Customer complaint | complaint management | 7-60 days | harm proxy | map to event when complaint references decision |
| Confirmed fraud | fraud case system | 7-60 days | payment fraud outcome | exclude unresolved investigations |
| Delinquency / default | credit servicing | 30-180 days | credit line risk outcome | vintage by decision date |
| SAR / QA disposition | AML case management | 30-120 days | AML triage outcome | separate analyst disposition and QA defects |
| KYC fraud hit | identity / fraud ops | 30-180 days | onboarding risk outcome | tag synthetic identity confidence |
| Collections cure / re-default | servicing and collections | 30-120 days | treatment effectiveness | separate payment cure from sustainable cure |
| Contact center QA | QA platform / CRM | 7-45 days | answer quality and resolution | sample by agent, queue and issue type |
Outcome rules:
- Each gate memo must state which outcomes are mature, proxy-only, or unavailable.
- Proxy outcomes can support readiness but cannot prove full business impact.
- Outcome join failures are evidence quality issues that must be logged and resolved, not gaps to ignore.
- Label definitions must be frozen before gate analysis to avoid decision-driven relabeling.
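The first outcome rule, stating which outcomes are mature, proxy-only or unavailable, can be supported by a simple summary over the `outcome_label_status` field. This is a sketch under the assumption that events are dictionaries and that a missing status is counted as unavailable; `outcome_maturity_summary` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable, Union

STATUSES = ("pending", "proxy", "mature", "unavailable")


def outcome_maturity_summary(events: Iterable[dict]) -> Dict[str, Union[int, float]]:
    """Count events per outcome status and compute the share with mature labels."""
    counts = Counter(e.get("outcome_label_status", "unavailable") for e in events)
    total = sum(counts.values())
    summary: Dict[str, Union[int, float]] = {s: counts.get(s, 0) for s in STATUSES}
    # Outcome claims in the gate memo should rest only on the mature share.
    summary["mature_share"] = counts.get("mature", 0) / total if total else 0.0
    return summary
```

A gate memo could quote these counts directly, which makes it visible when a business-impact claim rests on a small mature subset.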
Leakage Controls Template
| Leakage risk | Control design | Evidence | Owner |
|---|---|---|---|
| Future outcome feature enters shadow decision | Feature snapshot only includes values available at event_time | feature availability contract, snapshot audit | data governance |
| Champion decision influences challenger output | Challenger runs before champion decision capture is made available | event ordering logs | solution architect |
| Reviewer sees AI before independent label | Blind review queue hides challenger output | reviewer UI screenshot, assignment log | operations owner |
| Sample excludes hard cases | Population and sampling plan includes risk bands and edge cases | traffic inclusion report | product owner |
| RAG corpus contains post-event policy | Corpus version is locked by decision_time | corpus version manifest | AI owner |
| Labels from remediated cases mix with original decisions | Label maturity registry separates initial and corrected disposition | outcome lineage report | Senior BA |
| Protected attributes leak into runtime decision | Monitoring environment separated from runtime feature set | access control and feature list | risk/compliance |
| Human-in-the-loop becomes influenced pilot | Staff exposure level is documented and separated by L1/L2/L3 | exposure register | product architect |
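The first control in the table, that a feature snapshot only includes values available at `event_time`, lends itself to an automated check. The sketch below assumes each feature in a snapshot carries an `available_at` timestamp; that metadata layout and the function name `leaked_features` are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List


def leaked_features(snapshot: Dict[str, dict], event_time: datetime) -> List[str]:
    """Return feature names whose values only became available after event_time."""
    return sorted(
        name
        for name, meta in snapshot.items()
        if meta["available_at"] > event_time
    )


event_time = datetime(2026, 6, 30, 14, 22, 9, tzinfo=timezone.utc)
snapshot = {
    "device_mismatch": {"available_at": datetime(2026, 6, 30, 14, 22, 8, tzinfo=timezone.utc)},
    # A chargeback flag is only known weeks later, so it counts as future leakage.
    "chargeback_flag": {"available_at": datetime(2026, 7, 20, 9, 0, 0, tzinfo=timezone.utc)},
}
leaks = leaked_features(snapshot, event_time)  # ["chargeback_flag"]
```

Any non-empty result would map to `leakage_check_status = fail` on the event and an entry in the leakage register.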
Fairness/Segment Scorecard Template
| Segment | Population share | AI recommendation rate | Champion rate | High-severity disagreement | False positive proxy | False negative proxy | Outcome harm signal | Action |
|---|---|---|---|---|---|---|---|---|
| Mobile high-value payments | 18% | 11.4% step-up | 6.8% step-up | 4.1% | 2.2% false step-up | 0.9% missed fraud | complaint proxy stable | monitor |
| New-to-bank KYC | 12% | 9.6% EDD | 5.1% EDD | 5.7% | 3.4% false EDD | 1.3% missed fraud | onboarding drop-off elevated | investigate |
| Small business credit line | 8% | 7.0% line decrease | 4.9% line decrease | 3.3% | 1.8% adverse change proxy | 2.5% missed risk | complaint proxy stable | require SME review |
| Vulnerability flag collections | 5% | 2.1% hardship route | 3.8% hardship route | 6.2% | 0.7% over-route | 4.8% under-route | complaint proxy elevated | no-go for this segment |
| Limited-English contact center | 7% | 15.2% escalation | 10.4% escalation | 4.9% | 2.9% unnecessary escalation | 1.5% under-escalation | repeat contact elevated | improve RAG and language eval |
Scorecard rules:
- Show champion rate beside AI rate so the review is about decision change, not raw model output.
- Separate false positive harm and false negative harm; different use cases value them differently.
- Segment thresholds must be approved before reviewing results.
- Any critical segment regression blocks broad rollout even when aggregate metrics improve.
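The scorecard rules can be illustrated with a small calculation over two rows of the table above. The threshold of 6.0% for high-severity disagreement is a made-up value for the example only; in practice thresholds are approved per segment before results are reviewed, as the rules require.

```python
from typing import Dict, List

rows = [
    {"segment": "Mobile high-value payments", "ai_rate": 11.4, "champion_rate": 6.8, "high_sev": 4.1},
    {"segment": "Vulnerability flag collections", "ai_rate": 2.1, "champion_rate": 3.8, "high_sev": 6.2},
]

# Hypothetical pre-approved thresholds (percent), keyed by segment.
approved_max_high_sev: Dict[str, float] = {
    "Mobile high-value payments": 6.0,
    "Vulnerability flag collections": 6.0,
}


def rate_delta(row: dict) -> float:
    """Decision change in percentage points: AI rate minus champion rate."""
    return round(row["ai_rate"] - row["champion_rate"], 1)


def breaching_segments(rows: List[dict], thresholds: Dict[str, float]) -> List[str]:
    """Segments whose high-severity disagreement exceeds the approved threshold."""
    return [r["segment"] for r in rows if r["high_sev"] > thresholds[r["segment"]]]


breaches = breaching_segments(rows, approved_max_high_sev)
# A single breaching segment blocks broad rollout, whatever the aggregate shows.
broad_rollout_blocked = len(breaches) > 0
```

Looking up thresholds with `thresholds[...]` rather than a default is deliberate in this sketch: a segment without an approved threshold raises a `KeyError` instead of silently passing.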
Rollout Gate Template
| Gate item | Evidence required | Pass standard | Decision |
|---|---|---|---|
| Scope and authority | use case card, prohibited action list, authority matrix | AI has no unapproved customer or system impact | pass |
| Shadow stability | trace completeness, error rate, latency/cost report | agreed trace completeness and no champion path impact | pass |
| Leakage | leakage register, failed check log, remediation evidence | no material unresolved leakage | pass |
| Decision performance | agreement, disagreement, SME upheld rate, outcome metrics | challenger improves or supports target decision without critical harm | conditional pass |
| Outcome maturity | label maturity report and join rate | gate uses only mature labels for outcome claims | pass |
| Fairness / segment | scorecard and threshold breaches | no unexplained critical segment regression | investigate if any red slice |
| Human comparison | blind review protocol, reviewer calibration | high-severity disagreements reviewed and dispositioned | pass |
| Operations | reviewer capacity, escalation, fallback, runbook | team can operate assisted mode without control degradation | pass |
| Evidence | evidence packet index and reconstructability sample | sampled decisions can be reconstructed end-to-end | pass |
| Residual risk | risk acceptance, compensating controls, expiry | accountable owner accepts limited rollout risk | limited go only |
Gate outputs:
- No-go: material leakage, prohibited action, unfair segment harm, unreviewed critical disagreement or unreconstructable evidence.
- Continue shadow: evidence incomplete, labels immature, operations not ready, or value unclear.
- Limited go: narrow segment, human approval, explicit rollback triggers, daily monitoring.
- Rollout go: mature evidence supports controlled expansion and risk owners approve.
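The precedence among the gate outputs can be sketched as a small decision function. The finding codes below are shorthand invented for this example from the bullet text, and treating risk-owner approval of expansion as the only difference between limited go and rollout go is a simplifying assumption; a real gate is a committee judgment, not a lookup.

```python
from typing import Set

NO_GO = {
    "material_leakage",
    "prohibited_action",
    "unfair_segment_harm",
    "unreviewed_critical_disagreement",
    "unreconstructable_evidence",
}
CONTINUE_SHADOW = {
    "evidence_incomplete",
    "labels_immature",
    "operations_not_ready",
    "value_unclear",
}


def gate_decision(findings: Set[str], risk_owners_approve_expansion: bool) -> str:
    """Map open findings to a gate output, most restrictive outcome first."""
    unknown = findings - NO_GO - CONTINUE_SHADOW
    if unknown:
        # Fail closed: an unclassified finding must be triaged before deciding.
        raise ValueError(f"unclassified findings: {sorted(unknown)}")
    if findings & NO_GO:
        return "no-go"
    if findings & CONTINUE_SHADOW:
        return "continue shadow"
    if risk_owners_approve_expansion:
        return "rollout go"
    return "limited go"
```

For example, `gate_decision({"labels_immature"}, True)` returns `"continue shadow"`, because immature labels outrank owner approval in this ordering.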
Rollback Trigger Template
| Trigger | Threshold example | Detection source | Immediate action | Owner |
|---|---|---|---|---|
| Write attempt from challenger | any unauthorized write or customer message | access logs / tool sandbox | disable challenger integration | solution architect |
| Trace completeness breach | below agreed threshold for two business days | observability dashboard | pause gate analysis and fix logging | AI platform owner |
| Leakage confirmed | any material future or champion leakage | leakage review | invalidate affected run | data governance |
| Critical disagreement spike | above approved threshold in high-risk slice | daily disagreement report | stop expansion, SME review | product owner |
| Fairness breach | segment false positive/negative disparity above threshold | segment scorecard | no-go for impacted segment | risk/compliance |
| Reviewer overload | queue SLA breach or control backlog | operations dashboard | reduce sample or pause L2/L3 | operations owner |
| Outcome harm signal | complaint, appeal, false decline, QA defect spike | outcome joiner / complaint system | rollback to hidden shadow or no-go | product owner |
| Evidence gap | sampled event cannot reconstruct versions and decision | evidence binder QA | hold gate decision | governance lead |
Rollback rule: a rollback trigger must specify who can stop shadow exposure, how quickly the path is disabled, what evidence is preserved, and how re-entry is approved.
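The four elements of the rollback rule can be captured as a record with a completeness check. This is a sketch only: the `RollbackTrigger` class, its field names and the example values (such as the 15-minute disable time) are hypothetical and would be replaced by the organization's own runbook fields.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RollbackTrigger:
    name: str
    stop_authority: str                 # who can stop shadow exposure
    disable_within_minutes: int         # how quickly the path is disabled
    evidence_preserved: Tuple[str, ...] # what evidence is preserved
    reentry_approver: str               # how re-entry is approved

    def gaps(self) -> List[str]:
        """Return the rule elements that are still unspecified."""
        missing = []
        if not self.stop_authority:
            missing.append("stop_authority")
        if self.disable_within_minutes <= 0:
            missing.append("disable_within_minutes")
        if not self.evidence_preserved:
            missing.append("evidence_preserved")
        if not self.reentry_approver:
            missing.append("reentry_approver")
        return missing


trigger = RollbackTrigger(
    name="Write attempt from challenger",
    stop_authority="solution architect",
    disable_within_minutes=15,  # illustrative value, not a recommended SLA
    evidence_preserved=("access logs", "tool sandbox trace"),
    reentry_approver="risk/compliance",
)
```

A trigger with a non-empty `gaps()` result would not be ready for approval under the release checklist item on rollback triggers.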
Evidence Packet Template
| Evidence item | Description | Minimum content |
|---|---|---|
| Executive summary | Decision recommendation and risk posture | no-go / continue / limited go / go, scope, residual risks |
| Use case card | Business context and authority boundary | customer impact, employee exposure, prohibited actions |
| Architecture diagram | Champion/challenger and logging path | router, snapshot, challenger, event store, outcome join |
| Event schema | Counterfactual data contract | fields, retention class, versioning, examples |
| Data lineage | Input and outcome provenance | feature availability, policy versions, RAG corpus, label sources |
| Leakage register | Leakage assessment and controls | risks, controls, failed checks, remediation |
| Shadow run report | Operational health | volume, errors, latency, cost, trace completeness |
| Human comparison | SME / analyst / agent comparison | blind review method, disagreement severity, calibration |
| Outcome analysis | Mature and proxy outcomes | label maturity, join rate, business metrics, limitations |
| Fairness scorecard | Segment and harm analysis | thresholds, breaches, mitigations, owner decisions |
| Gate memo | Go / no-go decision | criteria results, open issues, risk acceptance, rollout limits |
| Rollback plan | Stop rules and execution path | triggers, owners, communication, re-entry criteria |
| Audit index | Reconstructable evidence references | trace ids, run ids, approvals, issue records |
PM/BA/Architecture Questions
PM Questions
| Question | Why it matters |
|---|---|
| Which customer or business outcome could change if AI were allowed to act? | Defines decision impact and risk tier |
| Is the AI recommending, ranking, drafting, deciding, or executing? | Determines authority boundary |
| Which segment could be harmed even if aggregate performance improves? | Forces fairness and customer harm analysis |
| What is the smallest low-risk surface for limited go? | Avoids broad rollout based on thin evidence |
| What value would justify added operational and governance cost? | Prevents shadow mode from becoming a research exercise |
BA / CBAP Questions
| Question | Why it matters |
|---|---|
| What is the exact business rule, policy or exception AI is shadowing? | Prevents vague “AI assist” scope |
| Which decisions are observable immediately and which outcomes are delayed? | Builds label/outcome plan |
| What must be recorded to replay the decision? | Defines evidence and event schema |
| When does human review need to be blind? | Protects comparison validity |
| Which workflow step creates customer impact? | Separates silent mode from pilot |
Architecture Questions
| Question | Why it matters |
|---|---|
| Can challenger service technically write to any production system? | Confirms read-only control |
| Are feature, policy, prompt, model, RAG and tool versions reconstructable? | Enables audit and root cause analysis |
| How are trace ids propagated from event to outcome join? | Connects observability to evaluation |
| Where are protected attributes or proxies handled for monitoring? | Separates fairness analytics from runtime decisioning |
| What happens if shadow path fails or exceeds latency/cost budget? | Protects champion path and operations |
Release Checklist
This checklist is named “release” because it decides whether the AI can be released from hidden shadow into a more exposed mode. It is not a general software release checklist.
| Check | Done condition |
|---|---|
| Use case registered | use case id, owner, risk tier and decision boundary approved |
| Authority boundary defined | AI cannot trigger customer/system action in L1/L2 shadow |
| Population and sampling approved | inclusion/exclusion and stratification documented |
| Counterfactual schema implemented | event reconstructs input, AI output, champion decision and outcome plan |
| Feature snapshots locked | only decision-time available features used |
| Version lineage complete | model, prompt, RAG, policy, tool and data versions captured |
| Leakage controls passed | no material unresolved leakage |
| Human comparison ready | blind review and SME calibration protocol active |
| Outcome plan active | label sources, maturity windows and join logic approved |
| Fairness scorecard active | segments, thresholds and owners approved |
| Operational readiness proven | reviewer capacity, escalation, fallback and support path ready |
| Rollback triggers approved | stop rules, owners and re-entry criteria documented |
| Evidence packet complete | gate memo can reconstruct sampled events end-to-end |
| Gate decision recorded | no-go, continue shadow, limited go or rollout go approved by accountable owners |
Executive Narrative
1-Minute Steering Committee Version
We are not asking to let the AI change customer outcomes yet. We are asking to continue or expand a controlled shadow mode where the AI sees real business events, produces counterfactual recommendations, and records what it would have done while the existing champion process remains fully in control. This gives us evidence on decision quality, delayed outcomes, fairness, human review differences, operational readiness and audit traceability before any customer-impacting rollout.
3-Minute Risk Committee Version
The control objective is to avoid moving from offline testing directly into customer-impacting AI decisions. In shadow mode, the challenger AI is read-only. It cannot write to production systems, send customer communications, change account status, alter credit lines, close AML alerts, decline payments or direct collections actions. Every recommendation is stored as a counterfactual event with decision-time features, policy and model versions, the actual champion decision and later outcome labels.
We will use this evidence to answer five questions: does AI add decision value, does it create concentrated harm, does it disagree with human experts in high-risk cases, are outcome labels mature enough to support the claim, and can operations and governance reconstruct the decision. If any material leakage, fairness breach, unauthorized action, reviewer overload or evidence gap occurs, the rollout path stops and returns to remediation.
Portfolio Storyline
I designed shadow mode as a pre-decision architecture pattern for regulated financial AI. It combines champion/challenger comparison, counterfactual logging, delayed outcome evaluation, leakage controls, fairness scorecards, human review calibration, rollout gates and rollback triggers. The value is that product, architecture and governance teams can make a disciplined go / no-go decision before the AI is allowed to affect customers.
Interview Drills
Drill 1: Credit Line Management
Question: “How would you validate an AI credit line recommender before launch?”
Strong answer:
I would run it in read-only shadow mode against real line-management events. The current credit policy remains champion, and the AI challenger records increase/decrease/hold/review recommendations with reason codes and decision-time feature snapshots. I would compare against actual decisions, wait for delinquency, utilization, complaint and attrition labels to mature, and run fair lending segment analysis. It can only move to assisted mode if reason-code consistency, high-severity disagreement, fairness and evidence gates pass.
Drill 2: AML Alert Triage
Question: “What makes shadow mode hard for AML?”
Strong answer:
AML labels are delayed and noisy. Analyst disposition, QA review and SAR decision are related but not identical labels. I would separate immediate analyst agreement from mature outcomes, run blind SME comparison on sampled alerts, monitor typology-specific false negatives, and preserve narrative evidence. A model that reduces queue volume but under-escalates high-risk typologies should fail the gate even if aggregate agreement looks good.
Drill 3: KYC Onboarding
Question: “How do you avoid shadow evaluation becoming biased by the current process?”
Strong answer:
I would define the population before sampling, include cases the current process sends to different paths, and keep challenger output separate from champion decision capture. For human comparison, reviewers must be blind to AI output. I would also control policy version and document availability at decision time, because onboarding results can look better if future fraud flags or corrected documents leak into evaluation.
Drill 4: Payment Fraud Intervention
Question: “Why not just A/B test the fraud model?”
Strong answer:
For payment fraud, customer harm from false declines and missed fraud can be immediate. Before any customer-impacting experiment, I want counterfactual evidence using real transaction context. The AI proposes allow, step-up, hold or decline, but the champion path still decides. We then join confirmed fraud, chargebacks, customer confirmations and complaints. Only if false positive and false negative tradeoffs are acceptable by segment would I move to a limited, reversible rollout.
Drill 5: Collections Contact Strategy
Question: “What would block rollout even if cure rate improves?”
Strong answer:
I would block rollout if the AI increases contact pressure on vulnerable customers, violates consent or contact frequency rules, under-routes hardship cases, or concentrates complaints in a protected/proxy segment. Collections AI must be judged on conduct risk and customer treatment, not only cure rate.
Drill 6: Contact Center Agent Assist
Question: “How do you test agent assist without creating automation bias?”
Strong answer:
Start with hidden shadow mode and compare AI suggested answers against actual agent responses and QA outcomes. For sampled cases, run independent review where reviewers do not see the AI output. If we later expose suggestions to agents, that becomes assisted silent launch, and we need acceptance/override tracking, citation quality, repeat contact, complaint and QA monitoring to detect automation bias.
Source Anchors
| Source | Link | How this playbook uses it |
|---|---|---|
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize shadow evidence, risk gates and management communication. |
| NIST AI RMF Resources and TEVV | https://www.nist.gov/itl/ai-risk-management-framework/ai-risk-management-framework-resources | Uses the TEVV framing to organize test, evaluation, verification, validation and independent challenge. |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Uses AI management system thinking to organize the operating model, performance evaluation and continual improvement. |
| ISO/IEC 23894 | https://www.iso.org/standard/77304.html | Uses AI risk management framing to organize risk treatment, monitoring and review. |
| Google Rules of Machine Learning | https://developers.google.com/machine-learning/guides/rules-of-ml | Uses ML engineering rules to calibrate training/serving consistency, monitoring and pre-launch system checks. |
| DORA metrics | https://dora.dev/ | Uses delivery performance and resilience language to connect rollout gates, change failure and restore thinking. |
| OpenTelemetry docs | https://opentelemetry.io/docs/ | Uses traces, metrics, logs and context propagation to support counterfactual event-to-outcome observability. |