AI UAT / Regression Certification: Business Acceptance Architecture

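Important note: This article is learning, portfolio, and internal architecture training material. It does not constitute legal advice, a compliance conclusion, an audit opinion, a model validation conclusion, an information security certification, an accessibility compliance conclusion, or a regulatory interpretation. Specific scope of applicability, control requirements, customer impact judgments, risk acceptance authority, go-live approval, rollback triggers, and external communication are determined by Legal / Compliance / Risk / Model Risk / Information Security / Internal Audit / Busine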

AI UAT / Regression Certification / Business Acceptance Architecture: An Interpretation

Audience: CBAP+ Financial Retail PM / Senior BA / Product Architecture Lead / Solution Architect / QA Governance Lead / AI Governance / Operational Risk / Release Manager.
Core question: How can a financial retail AI system design UAT, regression testing, business acceptance, release certification, risk acceptance, evidence packages, operational readiness, and post-release monitoring as a provable evidence architecture rather than a pre-launch sign-off ceremony?
Learning objective: Build a systems-architecture mental model of business acceptance criteria, golden journey library, synthetic transaction pack, persona/segment coverage, risk/control coverage, AI regression matrix, workflow replay, shadow testing, parallel run, defect triage, release certification, exception acceptance, rollback, and post-release monitoring.

Source Anchors

Source	Official link	How this article uses it
FFIEC Development, Acquisition, and Maintenance IT Handbook	https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/	Uses its language on SDLC, testing, implementation and assessment, maintenance, change management, rollback/back-out, testing data controls, and documentation to organize UAT / regression / release certification.
FFIEC DA&M - V.B Testing	https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/v-development/vb-testing/	Uses testing scope, test results, corrective actions, UAT, regression testing, stress testing, and production-data-in-testing controls to calibrate test evidence.
FFIEC Management IT Handbook	https://ithandbook.ffiec.gov/it-booklets/management	Uses IT governance, risk management, enterprise architecture, project management, and information systems reporting to organize ownership, management reporting, and release oversight.
FFIEC Business Continuity Management IT Handbook	https://ithandbook.ffiec.gov/it-booklets/business-continuity-management	Uses business impact analysis, interdependency analysis, resilience, event management, continuity/recovery, exercises/tests, and maintenance/improvement to support operational readiness, degraded mode, and rollback.
NIST SP 800-218 Secure Software Development Framework	https://csrc.nist.gov/pubs/sp/800/218/final	Uses the lifecycle language of Prepare the Organization, Protect the Software, Produce Well-Secured Software, and Respond to Vulnerabilities to design secure release evidence.
NIST SP 800-53 Rev. 5	https://csrc.nist.gov/pubs/sp/800/53/r5/upd1/final	Uses the control language of the control catalog, assessment, audit/logging, configuration/change, contingency, risk assessment, and system integrity to express evidence objects.
NIST AI Risk Management Framework	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize the context, risk, measurement, treatment, and continuous improvement of AI UAT.
NIST AI RMF Core	https://airc.nist.gov/airmf-resources/airmf/	Uses the function/category thinking of the AI RMF Core to design risk-to-test-to-evidence traceability.
ISO/IEC 42001 AI management systems	https://www.iso.org/standard/81230.html	Uses AI management system, risk/opportunity, operational control, performance evaluation, and improvement to organize acceptance governance and management-system evidence.

In one sentence:

AI UAT is not a ceremony where business signs a release. It is an evidence architecture that proves a business capability is acceptable, controllable, operable, reversible and monitorable under a specific model, prompt, data, workflow and control version.

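1. Thesis: UAT Is Not a Signature, It Is Provable Business Acceptance

Traditional UAT is often misused as a final round of business trial runs before go-live: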

requirements document
  -> QA says tests passed
  -> business users click through screens
  -> defects discussed in daily call
  -> UAT sign-off email
  -> release

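This pattern is not enough for AI systems. AI does not just change the UI or transaction rules; it can change judgments, recommendations, explanations, retrieval, routing, exception handling, manual workload, customer communication, operational risk, and how monitoring is done. If business acceptance only proves that "the screens can be clicked through", it cannot answer:

Senior acceptance question	Why it is sharper in AI scenarios
Is the business outcome acceptable	Model output is probabilistic, so the happy path alone is not enough
Which customers, channels, products, languages, and vulnerable-customer scenarios are covered	A segment gap can cause localized customer harm
Which controls are actually in effect	Prompt guardrails, retrieval filters, the tool gateway, and HITL cannot be proven by looking at the UI
Where is the regression impact	A change to any one of model, prompt, RAG corpus, tool schema, policy rule, or workflow can change behavior
How are exceptions accepted	AI residual risk cannot rest on a verbal "the business agrees"
How is drift detected after release	Passing offline UAT does not mean the production distribution stays stable
How do we roll back when something goes wrong	Model version, prompt, retriever, feature flag, workflow, and manual operations all need a fallback path

A mature AI UAT / Regression Certification architecture should read: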

business objective
  -> acceptance claim
  -> acceptance criteria
  -> golden journeys
  -> synthetic and replay transaction packs
  -> persona / segment / channel coverage
  -> risk and control coverage
  -> model / prompt / RAG / tool / workflow regression
  -> defect triage and residual risk
  -> release certification memo
  -> operational readiness evidence
  -> post-release monitoring and rollback criteria

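The product of UAT is not a "sign-off record" but a body of acceptance evidence that business, risk, architecture, QA, operations, internal audit, and management can all query.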

2. Acceptance Evidence Stack

2.1 From Sign-Off to Claim-Based Acceptance

UAT should first define the claim to be proven:

For release R,
under model M / prompt P / RAG corpus K / tool set T / workflow W / policy C,
the business capability is acceptable for population S,
because evidence E demonstrates criteria A,
with known exceptions X,
accepted by owners O,
and monitored by controls N.
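
The claim template above can be held as a typed record so that an empty slot is visible before anyone signs. This is a minimal sketch, not a prescribed schema: the class name `AcceptanceClaim`, its field names, and the rule that versions, criteria, evidence, owners, and monitors must all be non-empty are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceClaim:
    """One claim-based acceptance statement (field names are illustrative)."""
    release: str                      # R
    versions: dict                    # M / P / K / T / W / C, e.g. {"model": "m-1"}
    population: str                   # S
    criteria: tuple = ()              # A
    evidence_ids: tuple = ()          # E
    exceptions: tuple = ()            # X (may legitimately be empty)
    owners: tuple = ()                # O
    monitors: tuple = ()              # N

    def missing_parts(self):
        """Return the names of claim parts that are still empty."""
        required = {
            "versions": self.versions,
            "criteria": self.criteria,
            "evidence_ids": self.evidence_ids,
            "owners": self.owners,
            "monitors": self.monitors,
        }
        return [name for name, value in required.items() if not value]


claim = AcceptanceClaim(
    release="R1",
    versions={"model": "m-1", "prompt": "p-7"},
    population="mobile channel, English/Spanish",
    criteria=("ACC-1",),
)
print(claim.missing_parts())
```

A claim with missing parts is a draft, not an acceptance; the check only makes the gap explicit and leaves the decision with the accountable owners.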

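Stack layer	Key objects	Must answer
Business claim	Statement that the business capability is acceptable	Which business capability does this release actually prove is ready for production?
Scope	product, journey, persona, channel, region, system, release	Which customers, employees, processes, and technical versions are in scope?
Criteria	business acceptance criteria, risk acceptance criteria, operational criteria	What counts as acceptable, what are the thresholds, and who has authority to judge?
Test assets	golden journey, synthetic pack, workflow replay, shadow sample, parallel run sample	Which reusable assets does the evidence come from?
Regression matrix	model, prompt, RAG, tool, policy, workflow, data, UI/API	Which behaviors could this change affect?
Control evidence	logs, eval report, defect ticket, approval, exception, monitoring config	Did the controls actually run and leave evidence?
Decision record	release certification, hold, limited release, exception release, rollback	What is the final decision, and what is the residual risk?

2.2 Boundaries Between UAT and QA / Model Validation / Audit

Discipline	Focus	Cannot substitute for
QA	Functionality, integration, performance, automated regression, defect-fix verification	Cannot accept residual risk on behalf of the business
UAT / Business Acceptance	Business process, customer/employee journeys, policy boundaries, exception handling, operational readiness	Cannot substitute for independent model validation or security assessment
Model validation / AI risk	Model fitness for purpose, performance, bias, stability, limitations, monitoring	Cannot substitute for real business processes and operational readiness
Information security	Access control, data protection, vulnerabilities, logging, security testing	Cannot confirm whether business outcomes are acceptable
Internal audit	Independent evaluation of control design and operation	Does not own go-live acceptance
Release management	Change window, deployment, rollback, environment readiness	Does not judge whether the business and risk functions accept

The value of a senior BA / PM lies in connecting this evidence into a release certification bundle, rather than letting each team hand in an isolated attachment.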

3. Reference Architecture

Business goals / policies / risk appetite / customer impact
        |
        v
Acceptance criteria registry
  business outcome | journey criteria | risk/control criteria | ops readiness | rollback
        |
        v
Test asset library
  golden journeys | persona packs | synthetic transactions | replay datasets
  accessibility cases | exception scenarios | adversarial prompts | edge cases
        |
        v
AI regression certification plane
  model version | prompt version | retriever/corpus version | tool schema/version
  policy engine | workflow state machine | feature flags | UI/API
        |
        v
Execution and evidence capture
  automated tests | business UAT sessions | workflow replay | shadow mode
  parallel run | runbook drills | monitoring dry run
        |
        v
Defect and exception governance
  severity | root cause | customer impact | control impact | owner | disposition
        |
        v
Release certification
  coverage matrix | evidence index | open exceptions | risk acceptance
  operational readiness | rollback criteria | sign-off ledger
        |
        v
Production monitoring and feedback loop
  quality sampling | drift | complaint/appeal | operational load | incidents
  defect clustering | eval expansion | corrective action

Architecture principles:

No business acceptance without explicit acceptance criteria.
No acceptance criteria without coverage evidence.
No AI release certification without model/prompt/RAG/tool/workflow regression.
No production data in testing without documented need, controls and approval.
No exception without owner, expiry, compensating control and monitoring.
No release without operational readiness and rollback criteria.
No AI-generated UAT evidence summary without human accountable acceptance.

4. Business Acceptance Criteria Contract

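Business acceptance criteria are not shorthand for the "acceptance criteria" of a user story. Business acceptance of an AI system must cover outcomes, boundaries, controls, operations, and evidence.

Contract field	What must be spelled out	Counterexample
Capability	The business capability being accepted, not just the technical feature	"AI is integrated"
Population	Scope of customers / employees / accounts / transactions / regions / languages / channels / products	"All users" with no segment evidence
Decision impact	Whether AI advises, ranks, summarizes, decides automatically, routes, or executes tools	No statement of how AI output affects business action
Success criteria	Accuracy, completion rate, human override rate, error tolerance, latency, cost, customer impact thresholds	"The business thinks the experience is good"
Risk criteria	Prohibited outputs, escalation rules, vulnerable customer protection, exception handling, explainability	Risk written only in a separate risk document
Control criteria	HITL, dual control, policy guardrail, tool approval, logging, access	Controls with no tests and no evidence
Regression scope	model, prompt, RAG, tool, workflow, policy, data, UI/API	UI click regression only
Evidence required	eval report, test run, session record, defect summary, approval, monitoring config	An email reply saying "approved"
Decision owner	business owner, risk owner, operations owner, technology owner	Only the project manager signs on everyone's behalf
Expiry / re-cert trigger	When re-certification is required	Changes to the prompt / knowledge base / tools do not trigger UAT

Example: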

acceptance_id: ACC-KYC-DOC-AI-ONBOARDING-2026Q3
capability: AI-assisted document review for retail account opening
population: mobile channel, English/Spanish, domestic ID + utility bill journeys
decision_impact: AI extracts fields and recommends pass/review; final reject remains human-controlled
success_criteria:
  - golden journey pass rate >= 98%
  - unsupported rejection recommendation = 0 for protected scenarios in test pack
  - manual review queue increase <= 12% versus baseline in parallel run
risk_criteria:
  - vulnerable customer and accessibility scenarios routed without loss of recourse
  - policy uncertainty triggers human review
control_criteria:
  - prompt/model/version logged for every recommendation
  - source document hash and extraction evidence retained
  - override reason required for human reviewer changes
recertification_trigger:
  - model major/minor version change
  - prompt or policy rule change affecting decision boundary
  - document taxonomy or OCR provider change
  - complaint spike or monitoring breach
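
Numeric success criteria like the ones in this contract can be checked mechanically against measured results. The sketch below assumes a hypothetical encoding of each criterion as `(metric, operator, threshold)`; the metric names and the measured values are invented for illustration, and only the thresholds 98%, 0, and 12% come from the example contract.

```python
import operator

# Allowed comparison operators; this mapping is an assumption of the sketch.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}

# Thresholds from the example contract; metric names are hypothetical.
SUCCESS_CRITERIA = [
    ("golden_journey_pass_rate", ">=", 0.98),
    ("unsupported_rejections_protected", "==", 0),
    ("manual_review_queue_increase", "<=", 0.12),
]


def evaluate(criteria, measured):
    """Return {metric: 'pass' | 'fail' | 'no_evidence'} for each criterion."""
    results = {}
    for metric, op, threshold in criteria:
        if metric not in measured:
            # Missing evidence is reported separately and never counts as a pass.
            results[metric] = "no_evidence"
        elif OPS[op](measured[metric], threshold):
            results[metric] = "pass"
        else:
            results[metric] = "fail"
    return results


# Invented measurements from a hypothetical parallel run.
measured = {"golden_journey_pass_rate": 0.985, "manual_review_queue_increase": 0.15}
for metric, status in evaluate(SUCCESS_CRITERIA, measured).items():
    print(f"{metric}: {status}")
```

Keeping `no_evidence` distinct from `fail` matters for the certification packet: one is a defect to triage, the other is a coverage gap to close.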

5. Golden Journey Library

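The golden journey is the core asset of business acceptance. It is not a screenshot script but a reusable case that captures end-to-end business facts, AI behavior, control points, and the evidence path.

5.1 Layers a Golden Journey Should Cover

Journey type	Example	Acceptance focus
Happy path	Normal account opening, normal payment dispute, normal customer-service policy Q&A	End-to-end completion rate, latency, evidence completeness
High-value / high-risk	Large transfers, credit limit adjustments, AML high-risk alerts	Control escalation, human review, logging, and approval
Exception path	Incomplete documents, system timeout, no answer in the knowledge base, tool call failure	Degradation, refusal, remediation, operations queue
Customer harm prevention	False rejection, misleading advice, vulnerable customers, language barriers	recourse, accessibility, clear explanation
Operational stress	Close period, peak transactions, vendor degradation, manual queue congestion	BCM, SLO, degraded mode
Compliance-sensitive	KYC, AML, complaints, credit explanations, fee disputes	Source citation, prohibited conclusions, permission boundaries

5.2 Golden Journey Card

Field	Content
journey_id	Stable ID, for example `GJ-AML-INVESTIGATION-ESCALATE-017`
business objective	The business objective this journey proves
persona / segment	Customer, employee, channel, language, product, risk segment
preconditions	Account state, permissions, data state, model / knowledge base version
steps	Business steps, not just UI actions
AI touchpoints	Model output, RAG retrieval, tool calls, recommendations, summaries
expected behavior	Acceptable output, refusal, escalation, human review
controls	HITL, policy check, logging, data mask, dual approval
evidence	trace id, eval run, screen-free data proof, approval, defect link
pass / fail	Judged against criteria, not the tester's subjective impression

5.3 Architectural Value of Golden Journeys

Architectural value	Description
Change impact	A change to the model, prompt, RAG, tools, or workflow can be traced back to the journeys it affects
Regression reuse	Each release reruns the key journeys instead of reinventing UAT scripts
Evidence continuity	The same journey stays consistent across UAT, shadow, parallel run, and production sampling
Audit readiness	Shows that business acceptance is not arbitrary sampling but risk-based journey coverage
Portfolio learning	Production defects are written back to the journey library, turning incidents into long-term regression assets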
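
The change-impact value can be made concrete with a reverse lookup from changed AI objects to the journeys whose AI touchpoints depend on them. A minimal sketch follows, assuming each journey card is reduced to a set of touchpoint labels; apart from the first journey ID, which reuses the example ID from the card, the IDs and labels are hypothetical.

```python
# Journey cards reduced to the AI objects each journey touches.
# Touchpoint labels and the last two journey IDs are illustrative only.
JOURNEY_TOUCHPOINTS = {
    "GJ-AML-INVESTIGATION-ESCALATE-017": {"model", "rag_corpus", "workflow"},
    "GJ-KYC-DOC-RESUBMIT-003": {"model", "prompt", "tool"},
    "GJ-POLICY-QA-REFUSAL-011": {"prompt", "rag_corpus", "retriever"},
}


def impacted_journeys(changed_objects, journeys=JOURNEY_TOUCHPOINTS):
    """Return the sorted IDs of journeys touching any changed object."""
    changed = set(changed_objects)
    return sorted(
        journey_id
        for journey_id, touchpoints in journeys.items()
        if touchpoints & changed  # non-empty intersection means impact
    )


print(impacted_journeys(["rag_corpus"]))
```

The lookup is only as good as the touchpoint data on each card, so a journey with no declared touchpoints would silently drop out of every regression scope.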

6. Synthetic Transaction Packs

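AI UAT cannot rely on real historical data alone. Financial retail scenarios often need synthetic transaction packs to cover rare, sensitive, boundary, and high-risk situations.

Pack type	Coverage	Design requirement
Boundary pack	Near-threshold values, borderline limits, borderline dates, borderline rules	Can show that both rule boundaries and model boundaries are stable
Rare event pack	Fraud patterns, AML typologies, complaint escalation, disaster recovery scenarios	Cannot wait for these to occur naturally in production
Protected / sensitive segment pack	Language, age band, disability assistance, low digital literacy, vulnerable customers	Used to test fairness, accessibility, and customer harm controls
Privacy pack	PII, masked data, tokenized customer, data minimization	Ensures evidence is useful without exposing real sensitive information
Adversarial pack	prompt injection, policy bypass, jailbreak, malicious document	Validates AI guardrails and the tool gateway
Operational pack	Timeouts, duplicate submissions, third-party unavailability, queue congestion	Validates degradation, recovery, and the operations runbook

Evidence fields for a synthetic pack: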

pack_id
generation_method
source_pattern
privacy_classification
business_rationale
covered_acceptance_criteria
covered_risks_and_controls
expected_outputs
reviewer
version
retention_class

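Key points:

Synthetic does not mean making data up at will; it needs a business rationale, a risk rationale, and expected results.
Synthetic data can reduce testing privacy risk, but generation method, sensitive attributes, re-identification risk, and retention policy still need governance.
For high-impact scenarios, the synthetic expected output should be reviewed by a business, risk, or policy owner, and must not be generated by AI alone.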
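
A pack record can be checked against the evidence fields listed earlier before it is admitted to the test asset library. This is a sketch under stated assumptions: the field list is taken from this section, while the `ai:` prefix used to mark a non-human reviewer and the sample pack values are conventions invented here.

```python
REQUIRED_FIELDS = (
    "pack_id", "generation_method", "source_pattern", "privacy_classification",
    "business_rationale", "covered_acceptance_criteria",
    "covered_risks_and_controls", "expected_outputs", "reviewer", "version",
    "retention_class",
)


def pack_issues(pack, high_impact=False):
    """Return a list of governance issues found in a synthetic pack record."""
    issues = [f"missing field: {name}" for name in REQUIRED_FIELDS if not pack.get(name)]
    # Assumption: reviewers are stored as strings and AI reviewers carry an "ai:" prefix.
    reviewer = str(pack.get("reviewer") or "")
    if high_impact and reviewer.startswith("ai:"):
        issues.append("high-impact pack needs a human business, risk, or policy reviewer")
    return issues


# Hypothetical pack: every field filled, but reviewed only by an AI assistant.
pack = {name: "set" for name in REQUIRED_FIELDS}
pack["pack_id"] = "SYN-BOUNDARY-001"
pack["reviewer"] = "ai:scenario-generator"
print(pack_issues(pack, high_impact=True))
```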

7. Persona / Segment / Channel Coverage

The maturity of AI UAT depends on whether it can prove coverage of the real business distribution and of high-risk edge populations.

Coverage axis	Financial retail example	Acceptance evidence
Customer segment	New customers, existing customers, high net worth, thin credit file, students, retirees, small business owners	segment matrix, sample count, pass/fail
Vulnerability	Elderly customers, customers with disabilities, low English proficiency, financial hardship, exposure to scams	inclusive journey, recourse test, accessibility evidence
Product	checking, credit card, mortgage, personal loan, investment referral	product-specific journey coverage
Channel	mobile, web, branch, contact center, back-office workbench	channel journey result, handoff evidence
Geography / jurisdiction	State, country, cross-border, data residency	routing, language, policy source version
Employee role	front line, supervisor, analyst, operations QA, admin	RBAC, workflow permission, training evidence
Risk segment	AML risk tier, fraud risk score, credit band, complaint severity	control escalation and override evidence
Accessibility	screen reader, keyboard navigation, contrast, plain language	accessibility test, manual assist path

A senior PM / BA does not need to brute-force the Cartesian product of every combination. A better approach is risk-based pairwise coverage + material scenario coverage:

material segments
  + high-risk journey
  + policy-sensitive output
  + known production defect history
  + accessibility / recourse path
  -> required UAT coverage
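
To illustrate why pairwise coverage is cheaper than the full Cartesian product, the sketch below uses a simple greedy selection: it repeatedly picks the combination that covers the most value pairs not yet covered. This is one common heuristic, not the only pairwise algorithm and not necessarily minimal; the axis values and the function names `pairwise_cases` and `required_coverage` are illustrative assumptions.

```python
from itertools import combinations, product


def pairwise_cases(axes):
    """Greedily choose cases until every pair of values across two axes is covered."""
    names = list(axes)

    def pairs_of(case):
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    candidates = [dict(zip(names, values)) for values in product(*axes.values())]
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    chosen = []
    while uncovered:
        # Pick the candidate that removes the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


def required_coverage(axes, material_scenarios):
    """Pairwise cases plus material scenarios that must be tested regardless."""
    cases = pairwise_cases(axes)
    return cases + [s for s in material_scenarios if s not in cases]


axes = {
    "segment": ["new", "thin_file", "retiree"],
    "channel": ["mobile", "branch", "contact_center"],
    "language": ["en", "es"],
}
cases = pairwise_cases(axes)
print(f"{len(cases)} pairwise cases instead of 18 full combinations")
```

Material scenarios (high-risk journeys, known production defects, recourse paths) are added on top through `required_coverage`, because pairwise selection alone has no notion of risk.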

8. Risk / Control Coverage

UAT coverage cannot be measured only by test cases passed. It should measure risk/control coverage.

Risk	Acceptance control	UAT / regression evidence
AI output error leads the customer to take a wrong action	source grounding, confidence threshold, HITL, customer-facing language review	golden Q&A, narrative fact-check, reviewer approval
AI automation amplifies the scale of errors	feature flag, rate limit, kill switch, shadow/parallel run	release config, kill switch drill, monitoring threshold
Business rule bypass	policy engine, tool gateway, denylist/allowlist, dual approval	adversarial pack, tool trajectory test
Privacy leakage	data minimization, masking, test data controls, logging redaction	privacy test, access review, sample log inspection
Unfairness or segment harm	segment eval, threshold review, recourse path	segment matrix, complaint pathway replay
Operations queue cannot absorb the load	capacity model, queue monitor, manual fallback	parallel run workload comparison
Cannot be audited	trace id, version capture, evidence vault, retention	evidence completeness query
Rollback failure	backward-compatible config, data migration plan, manual runbook	rollback rehearsal and decision criteria

Risk/control coverage questions:

Which high-risk acceptance criteria have no test asset?
Which controls exist only as design documents, with no evidence of operation?
Which production incidents have not been written back to the regression pack?
Which release exceptions have no expiry and no compensating control?
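
The four coverage questions become simple filters once criteria, controls, incidents, and exceptions are kept as linked records. The sketch below answers the first two; the record shapes, IDs, and key names are hypothetical stand-ins for whatever registry the team actually uses.

```python
# Hypothetical registry extracts; IDs and keys are illustrative only.
ACCEPTANCE_CRITERIA = [
    {"id": "AC-1", "risk": "high", "test_assets": ["GJ-001"]},
    {"id": "AC-2", "risk": "high", "test_assets": []},
    {"id": "AC-3", "risk": "low", "test_assets": []},
]
CONTROLS = [
    {"id": "CTL-HITL", "design_doc": True, "run_evidence": ["EV-10"]},
    {"id": "CTL-KILL-SWITCH", "design_doc": True, "run_evidence": []},
]


def high_risk_without_assets(criteria):
    """High-risk acceptance criteria that no test asset covers."""
    return [c["id"] for c in criteria if c["risk"] == "high" and not c["test_assets"]]


def design_only_controls(controls):
    """Controls that are documented but have no evidence of operation."""
    return [c["id"] for c in controls if c["design_doc"] and not c["run_evidence"]]


print(high_risk_without_assets(ACCEPTANCE_CRITERIA))
print(design_only_controls(CONTROLS))
```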

9. AI Regression Matrix

AI regression is more than code regression. Every AI release must declare which behavioral assets have changed.

Change object	Possible impact	Required regression
Foundation / vendor model	Output style, reasoning, refusals, tool calls, latency, cost	golden eval, red-team pack, latency/cost, high-risk journeys
Fine-tuned / task model	Per-segment performance, thresholds, error types	segment eval, calibration, backtest, parallel run
Prompt / system instruction	policy boundary, tone, citation, tool choice	prompt regression, adversarial prompt pack
RAG corpus	Answer facts, outdated policies, citation accuracy	retrieval eval, source freshness, citation correctness
Retriever / embedding	recall, ranking, source drift	retrieval benchmark, known-answer set
Tool schema / API	action accuracy, side effect, authorization	tool trajectory test, contract test, negative cases
Policy engine	allow/deny, escalation, geography/product rules	policy decision table regression
Workflow state machine	handoff, queue, exception, retry, timeout	workflow replay, state transition tests
UI / UX	user comprehension, accessibility, operator error	UAT journey, accessibility, training validation
Feature flag / rollout config	population exposure, rollback speed	rollout rehearsal, kill switch verification
Data pipeline / feature	model input distribution, missingness, timeliness	DQ, drift baseline, backfill/replay

An AI regression certificate should declare:

changed_objects
unchanged_but_impacted_objects
test_assets_run
criteria_passed
defects_open
exceptions_accepted
monitoring_updated
rollback_ready
recertification_triggers
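
A certificate can be checked against the regression matrix by comparing `changed_objects` with `test_assets_run`. The sketch below encodes three rows of the matrix; the dictionary keys, the function name `certificate_gaps`, and the rule that a documented unrun rationale closes a gap are assumptions for illustration.

```python
# Three rows of the regression matrix; keys are illustrative identifiers.
REQUIRED_REGRESSION = {
    "prompt": {"prompt regression", "adversarial prompt pack"},
    "rag_corpus": {"retrieval eval", "source freshness", "citation correctness"},
    "tool_schema": {"tool trajectory test", "contract test", "negative cases"},
}


def certificate_gaps(changed_objects, test_assets_run, unrun_rationale=None):
    """Return required regressions that were neither run nor given a rationale."""
    unrun_rationale = unrun_rationale or {}
    gaps = {}
    for obj in changed_objects:
        if obj not in REQUIRED_REGRESSION:
            # An unmapped change object is itself a gap in the matrix.
            gaps[obj] = {"<no regression mapping defined>"}
            continue
        missing = REQUIRED_REGRESSION[obj] - set(test_assets_run) - set(unrun_rationale)
        if missing:
            gaps[obj] = missing
    return gaps


run = ["prompt regression", "adversarial prompt pack", "retrieval eval", "citation correctness"]
print(certificate_gaps(["prompt", "rag_corpus"], run))
```

Whether a rationale is good enough to leave a regression unrun remains a reviewer's judgment; the check only guarantees that the omission is written down.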

10. Workflow Replay, Shadow Testing, Parallel Run

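10.1 How the Three Differ

Pattern	Purpose	Evidence
Workflow replay	Replay end-to-end state transitions using historical or synthetic events	state transition result, tool call trace, exception path
Shadow testing	Run AI alongside production traffic without affecting real decisions	output comparison, risk signal, latency/cost, no-customer-impact proof
Parallel run	Run the old and new processes simultaneously and compare business outcomes	decision delta, workload delta, customer impact assessment, reconciliation

10.2 When to Use Each

Situation	Recommended approach	Reason
High-impact customer decisions	parallel run + human review	Need to compare old and new outcomes against human judgment
Generative customer-service assist	shadow test + sampling QA	Output quality can be observed without facing the customer directly
AML / fraud analyst copilot	workflow replay + analyst benchmark	Proves evidence completeness and operational efficiency
RAG policy Q&A	golden set + retrieval replay	Checks knowledge sources, citations, and refusals
Tool-using agent	workflow replay + sandbox tool trajectory	Proves authorization, parameters, side effects, and failure handling

10.3 Parallel Run Decision Signals

Signal	Green	Amber	Red
Decision delta	Differences match expectations and are fully explained	Differences concentrate in specific segments	Unexplained differences affect high-risk customers or key controls
Manual workload	Within the capacity model	Short-term increase with a plan in place	Queue exceeds SLA or affects customers
Error taxonomy	Error types are acceptable and monitored	Repeated errors exist but compensating controls are effective	Prohibited or unexplainable errors appear
Control adherence	HITL / logging / policy gate complete	Minor evidence gaps with a remediation plan	Key controls lack evidence
Customer impact	No new harm signal	Manageable complaint/appeal signals exist	Customer remediation or regulator-sensitive scenarios are affected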
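
One way to roll the five signals into a single status is a "worst signal wins" rule, sketched below. The aggregation rule and the choice to treat an unrated signal as red are assumptions of this sketch rather than something the table prescribes; the rating of each individual signal still comes from the accountable reviewers.

```python
from enum import IntEnum


class Rag(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2


SIGNALS = (
    "decision_delta", "manual_workload", "error_taxonomy",
    "control_adherence", "customer_impact",
)


def overall_status(ratings):
    """Return (overall rating, list of signals that drive it)."""
    missing = [s for s in SIGNALS if s not in ratings]
    if missing:
        # Assumption: an unrated signal is treated as red until someone rates it.
        return Rag.RED, missing
    worst = max(ratings[s] for s in SIGNALS)
    drivers = [s for s in SIGNALS if ratings[s] == worst and worst != Rag.GREEN]
    return worst, drivers


ratings = {s: Rag.GREEN for s in SIGNALS}
ratings["manual_workload"] = Rag.AMBER
status, drivers = overall_status(ratings)
print(status.name, drivers)
```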

11. Defect Triage Architecture

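AI defects cannot be classified by technical severity alone. They should be organized by business impact + control impact + recurrence + release decision.

Severity	Criteria	Disposition
Critical	Prohibited output, major customer impact, key control failure, privacy leakage, submission/transaction error, inability to roll back	block release, evidence freeze, executive escalation
High	High-risk journey failure, segment harm, major manual workaround, missing monitoring, recurring defect	release hold or exception committee approval
Medium	Localized journey failure with a compensating control, or a non-critical segment performing below threshold	fix before scale or limited release with monitoring
Low	Copy, low-risk UI, non-critical test data issues	normal backlog, no silent exclusion

A defect record must contain: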

defect_id
impacted_acceptance_criteria
impacted_journeys
impacted_segments
ai_object_version
root_cause: model | prompt | RAG | tool | data | workflow | UI | control | training
customer_or_employee_impact
control_impact
fix_or_exception_decision
retest_evidence
release_disposition

AI can help cluster defects, identify recurring root causes, and map defects to requirements/tests, but it cannot automatically lower severity or close defects.

12. Release Certification

Release certification is an evidence bundle, not meeting minutes.

12.1 Certification Packet

Section	Content
Release identity	release id, model/prompt/RAG/tool/workflow/config versions
Scope	products, channels, segments, roles, regions, feature flags
Acceptance criteria result	criteria, threshold, actual result, owner decision
Golden journey coverage	run list, pass/fail, defects, evidence links
Synthetic / replay / shadow / parallel evidence	pack ids, sample counts, comparison results
Regression matrix	changed objects, impacted tests, unrun rationale
Risk/control evidence	high-risk controls, AI guardrails, logging, privacy, accessibility
Defect summary	open defects, severity, disposition, retest
Exceptions	accepted risk, compensating controls, expiry, monitoring
Operational readiness	runbook, training, support model, BCP/degraded mode, capacity
Post-release monitoring	metrics, thresholds, owner, review cadence, rollback triggers
Decision	release, limited pilot, hold, rollback, exception release
Sign-off	business, ops, technology, risk/control, AI governance roles

12.2 Decision Types

Decision	When it applies	Required evidence
Full release	criteria met, no critical/high unresolved, monitoring and rollback ready	complete certification packet
Limited pilot	material criteria met for limited population, residual risk bounded	exposure cap, monitoring, exit criteria
Exception release	known defect accepted by authorized owner	risk acceptance, compensating control, expiry
Hold	critical criteria missing or evidence incomplete	blockers, owners, target remediation
Rollback / no-go	production or pre-prod signal violates stop rule	rollback plan, communication, incident link
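
The decision table can be read as an ordered set of checks. The sketch below produces a recommendation for the human decision owners and is not an approval mechanism. The packet keys, the order of the checks, and the choice to hold on any open critical defect are assumptions layered on top of the table.

```python
def recommend_decision(packet):
    """Suggest a decision type from packet facts; a human owner still decides."""
    if packet["stop_rule_violated"]:
        return "rollback / no-go"
    if not packet["criteria_met"] or not packet["evidence_complete"]:
        return "hold"
    if packet["open_critical"]:
        return "hold"  # assumption: critical defects are never excepted
    if not (packet["monitoring_ready"] and packet["rollback_ready"]):
        return "hold"
    if packet["open_high"]:
        return "exception release" if packet["exception_authorized"] else "hold"
    if packet["limited_population"]:
        return "limited pilot"
    return "full release"


# Hypothetical packet summary with one accepted high-severity defect.
packet = {
    "stop_rule_violated": False, "criteria_met": True, "evidence_complete": True,
    "open_critical": False, "open_high": True, "exception_authorized": True,
    "monitoring_ready": True, "rollback_ready": True, "limited_population": False,
}
print(recommend_decision(packet))
```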

13. Exception / Risk Acceptance

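An AI release exception is not "the business has been informed". It must be a structured risk acceptance.

Field	Content
exception_id	stable id
linked release / criteria / defect	Links the release, acceptance criteria, and defect
residual risk	Business, customer, operational, model, privacy, security, or control risk
impacted population	Customers, employees, transactions, channels, segments
compensating control	Rate limiting, human review, spot checks, monitoring, manual fallback
expiry	Exception expiry date or scale gate
owner	The business / risk / management role authorized to accept this risk
monitoring	Metrics, thresholds, review cadence
closure criteria	Fix and re-certification criteria

Unacceptable exceptions:

A critical control is missing but full release is requested.
A "known issue" with no owner.
A long-running exception with no expiry.
An AI summary used in place of the risk acceptance rationale.
The scope of customer/employee impact is not disclosed.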
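
The unacceptable patterns can be screened automatically before an exception reaches its owner. A minimal sketch follows, assuming exception records are dictionaries with the hypothetical keys shown, that `expiry` is a `datetime.date`, and that a `rationale_source` field records where the acceptance rationale came from.

```python
from datetime import date


def exception_problems(exc, today, requested_decision):
    """Return the reasons an exception record should be rejected as written."""
    problems = []
    if not exc.get("owner"):
        problems.append("no owner")
    expiry = exc.get("expiry")
    if expiry is None:
        problems.append("no expiry")
    elif expiry <= today:
        problems.append("already expired")
    if not exc.get("impacted_population"):
        problems.append("impact scope not disclosed")
    if exc.get("critical_control_missing") and requested_decision == "full release":
        problems.append("critical control missing on a full release")
    if exc.get("rationale_source") == "ai_summary":
        problems.append("rationale is an AI summary, not an owner's reasoning")
    return problems


# Hypothetical record: owned and scoped, but with no expiry date.
exc = {
    "exception_id": "EXC-001",
    "owner": "Head of Retail Onboarding",
    "impacted_population": "mobile, Spanish-language journeys",
    "rationale_source": "owner_memo",
}
print(exception_problems(exc, date(2026, 7, 1), "limited pilot"))
```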

14. Test Data Governance, Privacy, Accessibility

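14.1 Test Data Governance

Concern	Guardrail
Production data use	Record necessity, authorization, de-identification/anonymization, access control, environment protection, retention period
Synthetic data	Record generation logic, sensitive attributes, expected output, reviewer, re-identification risk
Prompt / output logs	Avoid retaining unnecessary plaintext PII; use hash, redaction, structured metadata
Evidence retention	Evidence has an owner, retention class, legal hold path, and access log
Third-party model test	Avoid sending restricted data to unapproved models or vendor environments

14.2 Privacy-by-UAT

UAT needs to prove that:

The AI can complete the task without excessive context.
The test environment and evidence store do not leak sensitive customer information.
Human reviewers see only the information needed to complete the task.
Logs support replay without creating a new data exposure surface.
UAT defects do not spread PII through screenshots, chat logs, or spreadsheets.

14.3 Accessibility and Inclusive Acceptance

Area	Acceptance point
Digital accessibility	screen reader, keyboard, contrast, focus order, error message
Language	plain language, translation consistency, non-English policy accuracy
Cognitive load	Whether AI suggestions cause employees to over-rely or customers to misunderstand
Recourse	Whether customers can appeal, reach a human, and get an explanation
Employee accessibility	Whether the workbench lets employees of differing abilities complete review and override

Accessibility is not a UI attachment; it is part of business acceptance. Any AI that influences decisions or recommendations affecting customers must be shown to be understandable, appealable, and transferable to a human.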

15. Audit Trail and Evidence Objects

Every UAT / certification evidence object should have this structure:

Field	Content
evidence_id	stable id
evidence_type	test_run, UAT_session, eval_report, shadow_report, parallel_run, approval, exception
release_id	release / feature flag / deployment batch
AI object versions	model, prompt, retriever, corpus, tool, policy, workflow
linked criteria	acceptance criteria ids
linked controls	control ids
produced_by	system, tester, business owner, AI assistant, release workflow
produced_at	timestamp
reviewer	accountable human reviewer where applicable
integrity	checksum, immutable store, access log
retention	retention class and legal hold path
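
The `integrity` field can be illustrated with a content checksum computed over a canonical serialization of the evidence object. This sketch uses SHA-256 from Python's standard `hashlib`; the `seal` and `verify` helper names, the sample record, and the choice of sorted-key JSON as the canonical form are assumptions, and a checksum stored next to the record does not by itself replace an immutable store or an access log.

```python
import hashlib
import json


def seal(evidence):
    """Return a copy of the evidence object with a SHA-256 checksum attached."""
    payload = json.dumps(evidence, sort_keys=True, separators=(",", ":")).encode("utf-8")
    checksum = hashlib.sha256(payload).hexdigest()
    return {**evidence, "integrity": {"algorithm": "sha256", "checksum": checksum}}


def verify(sealed):
    """Recompute the checksum over everything except the integrity field."""
    body = {key: value for key, value in sealed.items() if key != "integrity"}
    return seal(body)["integrity"] == sealed.get("integrity")


# Hypothetical evidence object using the fields from the table above.
record = seal({
    "evidence_id": "EV-0001",
    "evidence_type": "eval_report",
    "release_id": "R-2026-03",
    "linked_criteria": ["ACC-1"],
    "reviewer": "qa.lead",
})
tampered = {**record, "reviewer": "someone.else"}
print(verify(record), verify(tampered))
```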

Evidence query examples:

Query	Release readiness value
Show all high-risk criteria without evidence	Find blockers
Show all model/prompt changes without regression	Find change blind spots
Show all golden journeys failed in last 3 releases	Find systemic quality problems
Show all accepted exceptions expiring in 30 days	Manage residual risk
Show all production incidents not mapped to regression pack	Prevent incidents from going uncaptured

16. Operational Readiness and Post-Release Monitoring

If UAT passes but operations is not ready, the release should still not go ahead.

16.1 Operational Readiness

Area	Evidence
Runbook	normal, exception, degraded mode, kill switch, rollback
Training	business users, reviewers, supervisors, support desk training completion
Support model	L1/L2/L3 owner, AI governance escalation, vendor support
Monitoring	quality, latency, cost, drift, tool errors, complaints, queue load
Capacity	manual review queue, back-office SLA, call center handle time
Communication	employee release note, customer communication path where applicable
BCP / resilience	fallback process, manual workaround, recovery exercise

16.2 Post-Release Monitoring Metrics

Metric	Purpose
Business outcome	completion rate, cycle time, conversion, case resolution
Quality	accuracy, citation correctness, unsupported claim, tool action success
Risk	policy violation, high-risk escalation miss, privacy event, complaint
Segment	performance by customer segment, language, channel, product
Operations	manual review volume, queue aging, override rate, training issue
Reliability	latency, error rate, timeout, vendor degradation
Cost	cost per successful task, token spend, human review cost
Adoption	employee usage, acceptance/override pattern, misuse signals

16.3 Rollback Criteria

Rollback criteria must be pre-agreed:

Trigger	Response
critical customer harm signal	disable feature flag, route to manual, incident workflow
prohibited output observed	freeze version, block prompt/model, QA sample expansion
control logging failure	pause release cohort, restore prior version, evidence gap assessment
manual queue exceeds capacity	reduce exposure, switch to assisted-only mode
model or retrieval drift breach	revert model/corpus, run targeted regression
vendor outage / latency	fallback provider or manual process
privacy incident	stop affected flow, preserve evidence, authorized response workflow
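
Pre-agreed rollback criteria can be stored as data and evaluated against live metrics, so that the response is looked up rather than debated during an incident. In the sketch below, the three triggers and their responses are taken from the table, while the metric names, the numeric limits, and the rule that a missing metric is flagged are placeholders to be replaced by the team's own agreed values.

```python
# (trigger, metric, limit, response); metric names and limits are placeholders.
TRIGGERS = [
    ("prohibited output observed", "prohibited_outputs", 0,
     "freeze version, block prompt/model, QA sample expansion"),
    ("manual queue exceeds capacity", "queue_utilization", 1.0,
     "reduce exposure, switch to assisted-only mode"),
    ("control logging failure", "unlogged_recommendations", 0,
     "pause release cohort, restore prior version, evidence gap assessment"),
]


def fired(metrics):
    """Return (trigger, response) pairs whose metric exceeds its limit."""
    out = []
    for trigger, metric, limit, response in TRIGGERS:
        value = metrics.get(metric)
        if value is None:
            # Assumption: a missing metric is surfaced instead of silently passing.
            out.append((trigger, "metric missing: confirm monitoring before continuing"))
        elif value > limit:
            out.append((trigger, response))
    return out


metrics = {"prohibited_outputs": 0, "queue_utilization": 1.2, "unlogged_recommendations": 0}
for trigger, response in fired(metrics):
    print(f"{trigger} -> {response}")
```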

17. AI's Role in UAT

AI can accelerate acceptance work, but cannot own final acceptance.

AI assist use case	Value	Boundary
Generate candidate test cases	expands coverage across journeys and edge cases	human owner approves expected outcome and risk relevance
Find coverage gaps	maps requirements to tests, controls and evidence	deterministic traceability remains source of truth
Cluster defects	identifies recurring themes and root causes	severity/disposition remains accountable human decision
Map requirements to tests	builds traceability graph	cannot invent missing approvals or criteria
Summarize evidence	creates release memo draft	summary must cite evidence ids and be reviewed
Generate synthetic scenarios	improves rare-event coverage	sensitive attributes and expected outcomes need governance
Review UAT transcripts	finds repeated confusion, training gaps	privacy and access controls required
Suggest regression impact	links changed object to tests	architect/release owner confirms final scope

Prohibited:

AI does not sign UAT.
AI does not accept residual risk.
AI does not decide that a defect is immaterial.
AI does not approve release certification.
AI does not replace independent validation or internal audit.
AI does not create evidence after the fact to fill missing controls.

18. PM / BA / Architect Implications

Role	Senior responsibility	Key deliverables
PM	Turn go-live from feature delivery into business capability certification	acceptance strategy, release decision memo, readiness dashboard
Senior BA	Write business objectives, journeys, policy, exceptions, and evidence as an acceptance contract	acceptance criteria contract, coverage matrix, UAT scripts
Solution Architect	Design evidence capture, versioning, rollback, and monitoring into the architecture	reference architecture, traceability model, rollback design
QA / Test Architect	Productize golden journeys, synthetic packs, automation regression, and the defect taxonomy	test asset library, regression suite, quality gates
AI Governance / Model Risk	Define AI-specific acceptance, model/prompt/RAG/tool regression, and monitoring	AI use register, eval thresholds, risk acceptance matrix
Operations Owner	Prove that queues, training, runbook, support, and fallback are ready	operational readiness pack
Risk / Compliance	Confirm control coverage, exception ownership, and policy-sensitive scenarios	control checklist, exception decision record

How to say it in a senior-level interview:

I do not treat AI UAT as a business trial sign-off. I design it as an acceptance evidence architecture: every business acceptance criterion links to golden journeys, synthetic packs, persona/segment coverage, risk/control coverage, AI regression results, defect disposition, exception acceptance, operational readiness, and post-release monitoring. AI can help us generate candidate tests, find coverage gaps, cluster defects, and summarize evidence, but final business acceptance and risk acceptance must be completed by an authorized owner.

19. Anti-Patterns

Anti-pattern	Risk	Fix
UAT as email sign-off	Sign-off is disconnected from criteria, versions, and evidence	structured release certification packet
Happy-path-only UAT	Real production exceptions and high-risk segments are not covered	golden journey + synthetic + risk-based coverage
UI click scripts for AI behavior	Proves only that the screens can be clicked, not model / tool / retrieval behavior	AI regression matrix and trace evidence
Production data copied into UAT uncontrolled	Sensitive information spreads, and test evidence becomes a privacy risk	sanitized/synthetic data and documented exception controls
Defects closed by meeting consensus	No root cause, retest, or customer impact	defect taxonomy and retest evidence
Prompt change treated as minor copy edit	Behavioral boundary changes go without regression	prompt versioning and regression trigger
RAG update without acceptance impact analysis	Changes in factual answers are invisible	corpus versioning and retrieval golden set
Shadow test without comparison criteria	The shadow run happened but cannot support a decision	predefined comparison metrics and thresholds
Parallel run with no workload analysis	The new process overwhelms the manual queue	capacity and queue readiness metrics
Exception with no expiry	Residual risk becomes permanent	owner, expiry, compensating control, monitoring
AI writes certification memo unsourced	The evidence summary may hallucinate or overstate	evidence id citation and human approval

20. Interview Questions

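Q1: What is the biggest difference between UAT for an AI system and traditional UAT?

30-second version: Traditional UAT usually proves that processes and screens meet requirements. AI UAT must prove that a business capability is acceptable under a specific version of the model, prompt, RAG, tools, data, workflow, and controls, and that it covers high-risk journeys, segments, controls, regression, operational readiness, and post-release monitoring.

2-minute version: AI systems carry more uncertainty, and their behavior depends more on data, model, prompt, knowledge base, and tool calls. My approach is to turn UAT from sign-off into an evidence architecture. First define the acceptance claim and criteria, then design the golden journey library, synthetic transaction packs, persona/segment matrix, and risk/control coverage. Then run model/prompt/RAG/tool/workflow regression, and use workflow replay, shadow testing, or parallel run to show that behavioral differences are acceptable. Finally, produce the release certification packet, recording defects, exceptions, risk acceptance, operational readiness, monitoring, and rollback criteria.

Q2: How do you decide whether an AI release can be certified for go-live?

30-second version: I look at six things: whether business acceptance criteria are met, whether high-risk journeys and segments are covered, whether AI regression is complete, whether critical/high defects are closed or accepted with authorization, whether operational readiness is in place, and whether monitoring/rollback are predefined.

2-minute version: Release certification is not about the test pass rate. I need to see the release identity, including model, prompt, RAG, tool, workflow, and feature flag versions. Every high-risk acceptance criterion should link to test evidence and an owner decision. Defects need severity, root cause, impact, retest, and disposition. Exceptions must have a risk owner, expiry, compensating control, and monitoring. On the operations side there must be a runbook, training, support, capacity, and fallback. After release there must be quality, risk, segment, operations, cost, and complaint metrics, plus explicit rollback triggers.

Q3: What is the difference between a golden journey and a synthetic transaction pack?

30-second version: A golden journey is an end-to-end business journey asset used to prove that the real process is acceptable. A synthetic transaction pack is a constructed set of transaction / customer / document / event samples used to cover rare, sensitive, boundary, and high-risk situations.

2-minute version: Take AI KYC document review: golden journeys would cover new-customer account opening, document resubmission, human review, rejection recommendations, and the appeal path. The synthetic pack can construct borderline identity documents, low-quality images, language variations, vulnerable customers, fraudulent documents, system timeouts, and privacy-field scenarios. The two must connect: the synthetic pack supplies the input assets, and the golden journey supplies the process and control context. Qualifying evidence is not a screenshot but trace id, version, expected output, actual output, control result, defect, and reviewer.

Q4: What should AI do in UAT, and what should it not do?

30-second version: AI can generate candidate tests, find coverage gaps, cluster defects, map requirements to tests, and summarize evidence. AI cannot own final acceptance, lower defect severity, accept residual risk, or sign the release.

2-minute version: I would use AI as a testing and evidence copilot. For example, use it to generate test candidates from requirements and policy, compare acceptance criteria against test assets to find gaps, cluster defect tickets and suggest root causes, and draft the release evidence summary. But these outputs must cite evidence ids and be reviewed by the business, QA, risk, or architecture owner. Final acceptance is an act of authorized accountability, not a model output. Otherwise the team will mistake a polished AI-generated report for real controls.

Q5: How do you choose between shadow testing and parallel run?

30-second version: Shadow testing suits observing AI output first without affecting production decisions; parallel run suits comparing the old and new processes against human judgment, especially for high-impact customer decisions. The choice depends on customer impact, decision importance, manual capacity, and evidence needs.

2-minute version: Generative agent assist for customer service can start in shadow, looking at AI suggestions, citations, refusals, latency, and cost without exposing them to customers. Credit, KYC, or fraud decisions are better suited to parallel run, because old and new strategies, human conclusions, segment differences, and queue load all need to be compared. Either way, comparison criteria must be defined in advance, such as decision delta, manual workload, error taxonomy, control adherence, and customer impact. A shadow/parallel run without thresholds is only an observation exercise and cannot support release certification.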

21. Minimum Viable Architecture

Capability	MVP	Scale version
Acceptance registry	Criteria + owner for the top AI use cases	enterprise acceptance graph
Golden journeys	top 20 risk-based journeys	reusable journey library with coverage analytics
Synthetic packs	high-risk edge cases and privacy-safe samples	governed synthetic data factory
AI regression	model/prompt/RAG/tool checklist	automated regression certification plane
Evidence capture	release packet and evidence index	immutable evidence vault with query
Defect governance	severity + root cause + retest	defect analytics and production feedback loop
Operational readiness	runbook + monitoring + rollback	integrated release readiness control tower
Post-release monitoring	quality/risk/ops metrics	closed-loop eval expansion and recertification

The final test:

Can the team reconstruct why this AI release was accepted,
which business criteria it satisfied,
which journeys and segments were tested,
which model/prompt/RAG/tool versions were certified,
which risks were accepted,
who accepted them,
and what would trigger rollback after release?

If the answer is no, the problem is not that the UAT staff are not working hard enough; it is that the business acceptance architecture has not yet been productized.