
AI UAT / Regression Certification: Business Acceptance Architecture



AI UAT / Regression Certification / Business Acceptance Architecture: A Reading

Audience: CBAP+ Financial Retail PM / Senior BA / Product Architecture Lead / Solution Architect / QA Governance Lead / AI Governance / Operational Risk / Release Manager.

Core question: How can a retail financial AI system design UAT, regression testing, business acceptance, release certification, risk acceptance, evidence packs, operational readiness and post-release monitoring as a provable evidence architecture rather than a pre-release sign-off ceremony?

Learning objective: Build a systems-architecture mental model of business acceptance criteria, golden journey library, synthetic transaction pack, persona/segment coverage, risk/control coverage, AI regression matrix, workflow replay, shadow testing, parallel run, defect triage, release certification, exception acceptance, rollback and post-release monitoring.

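Important note: This article is study, portfolio and internal architecture training material. It does not constitute legal advice, a compliance conclusion, an audit opinion, a model validation conclusion, an information security certification, an accessibility compliance conclusion or a regulatory interpretation. The specific scope of applicability, control requirements, customer impact judgments, risk acceptance authority, release approval, rollback triggers and external communication are to be confirmed by Legal / Compliance / Risk / Model Risk / Information Security / Internal Audit / Business Owner / authorized management according to institutional policy and regulatory relationships. Access dates are recorded as 2026-06-30.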


Source Anchors

| Source | Official link | How this article uses it |
| --- | --- | --- |
| FFIEC Development, Acquisition, and Maintenance IT Handbook | https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/ | Uses its language on SDLC, testing, implementation and assessment, maintenance, change management, rollback/back-out, testing data controls and documentation to organize UAT / regression / release certification. |
| FFIEC DA&M - V.B Testing | https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/v-development/vb-testing/ | Uses testing scope, test results, corrective actions, UAT, regression testing, stress testing and production-data-in-testing controls to calibrate test evidence. |
| FFIEC Management IT Handbook | https://ithandbook.ffiec.gov/it-booklets/management | Uses IT governance, risk management, enterprise architecture, project management and information systems reporting to organize ownership, management reporting and release oversight. |
| FFIEC Business Continuity Management IT Handbook | https://ithandbook.ffiec.gov/it-booklets/business-continuity-management | Uses business impact analysis, interdependency analysis, resilience, event management, continuity/recovery, exercises/tests and maintenance/improvement to support operational readiness, degraded mode and rollback. |
| NIST SP 800-218 Secure Software Development Framework | https://csrc.nist.gov/pubs/sp/800/218/final | Uses the lifecycle language of Prepare the Organization, Protect the Software, Produce Well-Secured Software and Respond to Vulnerabilities to design secure release evidence. |
| NIST SP 800-53 Rev. 5 | https://csrc.nist.gov/pubs/sp/800/53/r5/upd1/final | Uses control language such as control catalog, assessment, audit/logging, configuration/change, contingency, risk assessment and system integrity to express evidence objects. |
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize the context, risk, measurement, treatment and continuous improvement of AI UAT. |
| NIST AI RMF Core | https://airc.nist.gov/airmf-resources/airmf/ | Uses the function/category thinking of the AI RMF Core to design risk-to-test-to-evidence traceability. |
| ISO/IEC 42001 AI management systems | https://www.iso.org/standard/81230.html | Uses AI management system, risk/opportunity, operational control, performance evaluation and improvement to organize acceptance governance and management-system evidence. |

In one sentence:

AI UAT is not a ceremony where business signs a release. It is an evidence architecture that proves a business capability is acceptable, controllable, operable, reversible and monitorable under a specific model, prompt, data, workflow and control version.


1. Thesis: UAT Is Not a Signature, It Is Provable Business Acceptance

Traditional UAT is often misused as a final round of business trial before release:

requirements document
  -> QA says tests passed
  -> business users click through screens
  -> defects discussed in daily call
  -> UAT sign-off email
  -> release

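This pattern is not enough for AI systems. AI does not only change the UI or transaction rules; it can change judgment, recommendation, explanation, retrieval, routing, exception handling, manual workload, customer communication, operational risk and the way monitoring works. If business acceptance only proves that "the screens can be clicked through", it cannot answer:

| Senior acceptance question | Why it is sharper in an AI setting |
| --- | --- |
| Is the business outcome acceptable? | Model output is probabilistic, so the happy path alone is not enough |
| Which customers, channels, products, languages and vulnerable-customer scenarios were covered? | A segment gap can cause localized customer harm |
| Which controls actually take effect? | Prompt guardrail, retrieval filter, tool gateway and HITL cannot be proven by looking at the UI |
| Where is the regression impact? | A change to any one of model, prompt, RAG corpus, tool schema, policy rule or workflow can change behavior |
| How are exceptions accepted? | AI residual risk cannot rest on a verbal "the business agrees" |
| How is drift detected after release? | Passing offline UAT does not mean the production distribution stays stable |
| How do we roll back when something goes wrong? | Model version, prompt, retriever, feature flag, workflow and manual operations all need a fallback path |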

A mature AI UAT / Regression Certification architecture should read:

business objective
  -> acceptance claim
  -> acceptance criteria
  -> golden journeys
  -> synthetic and replay transaction packs
  -> persona / segment / channel coverage
  -> risk and control coverage
  -> model / prompt / RAG / tool / workflow regression
  -> defect triage and residual risk
  -> release certification memo
  -> operational readiness evidence
  -> post-release monitoring and rollback criteria

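The product of UAT is not a "sign-off record" but a body of acceptance evidence that business, risk, architecture, QA, operations, internal audit and management can all query.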


2. Acceptance Evidence Stack

2.1 从 Sign-Off 到 Claim-Based Acceptance

UAT should first define the claim to be proven:

For release R,
under model M / prompt P / RAG corpus K / tool set T / workflow W / policy C,
the business capability is acceptable for population S,
because evidence E demonstrates criteria A,
with known exceptions X,
accepted by owners O,
and monitored by controls N.
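
| Stack layer | Key objects | Must answer |
| --- | --- | --- |
| Business claim | Statement that the business capability is acceptable | Which business capability does this release actually prove ready for production? |
| Scope | product, journey, persona, channel, region, system, release | Which customers, employees, processes and technical versions are in scope? |
| Criteria | business acceptance criteria, risk acceptance criteria, operational criteria | What counts as acceptable, what are the thresholds, and who has authority to judge? |
| Test assets | golden journey, synthetic pack, workflow replay, shadow sample, parallel run sample | Which reusable assets does the evidence come from? |
| Regression matrix | model, prompt, RAG, tool, policy, workflow, data, UI/API | Which behaviors could this change affect? |
| Control evidence | logs, eval report, defect ticket, approval, exception, monitoring config | Did the controls actually run and leave evidence? |
| Decision record | release certification, hold, limited release, exception release, rollback | What is the final decision, and what residual risk remains? |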

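2.2 Boundaries Between UAT and QA / Model Validation / Audit

| Discipline | Focus | What it cannot replace |
| --- | --- | --- |
| QA | Functionality, integration, performance, automated regression, defect fix verification | Cannot stand in for business acceptance of residual risk |
| UAT / Business Acceptance | Business processes, customer/employee journeys, policy boundaries, exception handling, operational readiness | Cannot replace independent model validation or security assessment |
| Model validation / AI risk | Model suitability, performance, bias, stability, limitations, monitoring | Cannot replace real business processes and operational readiness |
| Information security | Access control, data protection, vulnerabilities, logging, security testing | Cannot confirm whether the business outcome is acceptable |
| Internal audit | Independent evaluation of control design and operation | Does not own release acceptance responsibility |
| Release management | Change window, deployment, rollback, environment readiness | Does not judge whether business and risk are accepted |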

The value of a senior BA / PM lies in connecting this evidence into a release certification bundle, rather than letting each team hand in an isolated attachment.


3. Reference Architecture

Business goals / policies / risk appetite / customer impact
        |
        v
Acceptance criteria registry
  business outcome | journey criteria | risk/control criteria | ops readiness | rollback
        |
        v
Test asset library
  golden journeys | persona packs | synthetic transactions | replay datasets
  accessibility cases | exception scenarios | adversarial prompts | edge cases
        |
        v
AI regression certification plane
  model version | prompt version | retriever/corpus version | tool schema/version
  policy engine | workflow state machine | feature flags | UI/API
        |
        v
Execution and evidence capture
  automated tests | business UAT sessions | workflow replay | shadow mode
  parallel run | runbook drills | monitoring dry run
        |
        v
Defect and exception governance
  severity | root cause | customer impact | control impact | owner | disposition
        |
        v
Release certification
  coverage matrix | evidence index | open exceptions | risk acceptance
  operational readiness | rollback criteria | sign-off ledger
        |
        v
Production monitoring and feedback loop
  quality sampling | drift | complaint/appeal | operational load | incidents
  defect clustering | eval expansion | corrective action

Architecture principles:

No business acceptance without explicit acceptance criteria.
No acceptance criteria without coverage evidence.
No AI release certification without model/prompt/RAG/tool/workflow regression.
No production data in testing without documented need, controls and approval.
No exception without owner, expiry, compensating control and monitoring.
No release without operational readiness and rollback criteria.
No AI-generated UAT evidence summary without human accountable acceptance.

4. Business Acceptance Criteria Contract

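Business acceptance criteria are not shorthand for a user story's "acceptance criteria". Business acceptance of an AI system must cover outcomes, boundaries, controls, operations and evidence.

| Contract field | Must state clearly | Counter-example |
| --- | --- | --- |
| Capability | The business capability being accepted, not just a technical function | "AI is integrated" |
| Population | Customer / employee / account / transaction / region / language / channel / product scope | "All users" with no segment evidence |
| Decision impact | Whether the AI advises, ranks, summarizes, decides automatically, routes or executes tools | No statement of how AI output affects business action |
| Success criteria | Accuracy, completion rate, human override rate, error tolerance, latency, cost, customer impact thresholds | "The business thinks the experience is good" |
| Risk criteria | Prohibited outputs, escalation rules, vulnerable customer protection, exception handling, explainability | Risk written only in a separate risk document |
| Control criteria | HITL, dual control, policy guardrail, tool approval, logging, access | Controls with no tests and no evidence |
| Regression scope | model, prompt, RAG, tool, workflow, policy, data, UI/API | Only UI click regression |
| Evidence required | eval report, test run, session record, defect summary, approval, monitoring config | An email reply saying "approved" |
| Decision owner | business owner, risk owner, operations owner, technology owner | Only the project manager signs on others' behalf |
| Expiry / re-cert trigger | When recertification is required | Changing the prompt / knowledge base / tools does not trigger UAT |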

Example:

acceptance_id: ACC-KYC-DOC-AI-ONBOARDING-2026Q3
capability: AI-assisted document review for retail account opening
population: mobile channel, English/Spanish, domestic ID + utility bill journeys
decision_impact: AI extracts fields and recommends pass/review; final reject remains human-controlled
success_criteria:
  - golden journey pass rate >= 98%
  - unsupported rejection recommendation = 0 for protected scenarios in test pack
  - manual review queue increase <= 12% versus baseline in parallel run
risk_criteria:
  - vulnerable customer and accessibility scenarios routed without loss of recourse
  - policy uncertainty triggers human review
control_criteria:
  - prompt/model/version logged for every recommendation
  - source document hash and extraction evidence retained
  - override reason required for human reviewer changes
recertification_trigger:
  - model major/minor version change
  - prompt or policy rule change affecting decision boundary
  - document taxonomy or OCR provider change
  - complaint spike or monitoring breach
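
The success criteria in the contract above can be checked mechanically once measured results exist. The following is a minimal sketch, not a prescribed implementation: the `Criterion` class, the metric names and the `no_evidence` status are illustrative assumptions; only the three thresholds come from the example contract.

```python
from dataclasses import dataclass
import operator

# Comparison operators allowed in a criterion (assumed set for this sketch).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}

@dataclass(frozen=True)
class Criterion:
    name: str         # hypothetical metric name
    op: str           # one of the keys in OPS
    threshold: float

def evaluate(criteria, measured):
    """Return {criterion name: 'pass' | 'fail' | 'no_evidence'}."""
    results = {}
    for c in criteria:
        if c.name not in measured:
            # A missing measurement is reported, never silently passed.
            results[c.name] = "no_evidence"
        elif OPS[c.op](measured[c.name], c.threshold):
            results[c.name] = "pass"
        else:
            results[c.name] = "fail"
    return results

# Thresholds taken from success_criteria in the example contract.
CRITERIA = [
    Criterion("golden_journey_pass_rate", ">=", 0.98),
    Criterion("unsupported_rejections_protected", "==", 0),
    Criterion("manual_queue_increase", "<=", 0.12),
]

if __name__ == "__main__":
    print(evaluate(CRITERIA, {"golden_journey_pass_rate": 0.985,
                              "manual_queue_increase": 0.15}))
```

Treating a missing measurement as `no_evidence` rather than as a pass mirrors the principle "No acceptance criteria without coverage evidence"; the accept-or-reject decision itself stays with the named decision owner.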

5. Golden Journey Library

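The golden journey is the core asset of business acceptance. It is not a screenshot script but a reusable case that captures end-to-end business facts, AI behavior, control points and evidence paths.

5.1 Layers a Golden Journey Should Cover

| Journey type | Examples | Acceptance focus |
| --- | --- | --- |
| Happy path | Normal account opening, normal payment dispute, normal customer-service policy Q&A | End-to-end completion rate, latency, evidence completeness |
| High-value / high-risk | Large transfer, credit limit adjustment, AML high-risk alert | Control escalation, human review, logging and approval |
| Exception path | Incomplete documents, system timeout, no answer in knowledge base, tool call failure | Degradation, refusal, remediation, operations queue |
| Customer harm prevention | False rejection, misleading advice, vulnerable customer, language barrier | recourse, accessibility, clear explanation |
| Operational stress | Close period, peak transactions, vendor degradation, manual queue congestion | BCM, SLO, degraded mode |
| Compliance-sensitive | KYC, AML, complaints, credit explanation, fee disputes | Source citation, prohibited conclusions, authority boundaries |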

5.2 Golden Journey Card

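| Field | Content |
| --- | --- |
| journey_id | Stable ID, for example GJ-AML-INVESTIGATION-ESCALATE-017 |
| business objective | The business objective this journey proves |
| persona / segment | Customer, employee, channel, language, product, risk band |
| preconditions | Account state, permissions, data state, model / knowledge base version |
| steps | Business steps, not just UI operations |
| AI touchpoints | Model output, RAG retrieval, tool calls, recommendations, summaries |
| expected behavior | Acceptable output, refusal, escalation, human review |
| controls | HITL, policy check, logging, data mask, dual approval |
| evidence | trace id, eval run, screen-free data proof, approval, defect link |
| pass / fail | Judged against criteria, not the tester's subjective impression |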

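5.3 Architectural Value of Golden Journeys

| Architectural value | Description |
| --- | --- |
| Change impact | A change to model, prompt, RAG, tool or workflow can be traced back to the journeys it affects |
| Regression reuse | Each release reruns the key journeys instead of reinventing UAT scripts |
| Evidence continuity | The same journey stays consistent across UAT, shadow, parallel run and production sampling |
| Audit readiness | Shows that business acceptance rests on risk-driven journey coverage, not arbitrary sampling |
| Portfolio learning | Production defects are written back to the journey library, turning incidents into long-term regression assets |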

6. Synthetic Transaction Packs

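AI UAT cannot rely on real historical data alone. Retail financial scenarios often need synthetic transaction packs to cover rare, sensitive, boundary and high-risk situations.

| Pack type | Coverage | Design requirement |
| --- | --- | --- |
| Boundary pack | Near-threshold values, borderline limits, borderline dates, borderline rules | Must show that both rule boundaries and model boundaries are stable |
| Rare event pack | Fraud patterns, AML typology, complaint escalation, disaster recovery scenarios | Cannot wait for them to occur naturally in production |
| Protected / sensitive segment pack | Language, age band, disability assistance, low digital literacy, vulnerable customers | Used to test fairness, accessibility and customer harm controls |
| Privacy pack | PII, masked data, tokenized customer, data minimization | Keeps evidence useful without exposing real sensitive information |
| Adversarial pack | prompt injection, policy bypass, jailbreak, malicious document | Verifies the AI guardrail and tool gateway |
| Operational pack | Timeout, duplicate submission, third party unavailable, queue congestion | Verifies degradation, recovery and the operations runbook |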

Evidence fields for a synthetic pack:

pack_id
generation_method
source_pattern
privacy_classification
business_rationale
covered_acceptance_criteria
covered_risks_and_controls
expected_outputs
reviewer
version
retention_class

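Key points:

  • Synthetic does not mean arbitrarily made-up data; it needs a business rationale, a risk rationale and expected results.
  • Synthetic data can lower the privacy risk of testing, but the generation method, sensitive attributes, re-identification risk and retention policy still need governance.
  • The synthetic expected output for a high-impact scenario should be reviewed by the business, risk or policy owner, and must not be generated by AI alone.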

7. Persona / Segment / Channel Coverage

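The maturity of AI UAT depends on whether it can show coverage of the real business distribution and of high-risk edge populations.

| Coverage axis | Retail financial examples | Acceptance evidence |
| --- | --- | --- |
| Customer segment | New customer, existing customer, high net worth, thin credit file, student, retiree, small business owner | segment matrix, sample count, pass/fail |
| Vulnerability | Elderly customers, customers with disabilities, low English proficiency, financial hardship, scam exposure risk | inclusive journey, recourse test, accessibility evidence |
| Product | checking, credit card, mortgage, personal loan, investment referral | product-specific journey coverage |
| Channel | mobile, web, branch, contact center, back-office workbench | channel journey result, handoff evidence |
| Geography / jurisdiction | State, country, cross-border, data residency | routing, language, policy source version |
| Employee role | front line, supervisor, analyst, operations QA, admin | RBAC, workflow permission, training evidence |
| Risk segment | AML risk tier, fraud risk score, credit band, complaint severity | control escalation and override evidence |
| Accessibility | screen reader, keyboard navigation, contrast, plain language | accessibility test, manual assist path |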

A senior PM / BA does not need to brute-force the Cartesian product of all combinations. A better approach is risk-based pairwise coverage + material scenario coverage:

material segments
  + high-risk journey
  + policy-sensitive output
  + known production defect history
  + accessibility / recourse path
  -> required UAT coverage
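
The pairwise part of this idea can be checked with a short script. The sketch below only verifies coverage: given coverage axes and a list of planned UAT cases, it reports which value pairs no case exercises. The axis names and values are illustrative assumptions, and the risk weighting and the material scenario list are deliberately left out.

```python
from itertools import combinations, product

def uncovered_pairs(axes, cases):
    """Return the value pairs (across any two axes) that no case covers.

    axes:  {axis name: [values]}
    cases: list of {axis name: value} dicts, one per planned UAT case
    """
    required = set()
    for a, b in combinations(sorted(axes), 2):
        for va, vb in product(axes[a], axes[b]):
            required.add(((a, va), (b, vb)))

    covered = set()
    for case in cases:
        for a, b in combinations(sorted(case), 2):
            covered.add(((a, case[a]), (b, case[b])))

    return required - covered

if __name__ == "__main__":
    axes = {"channel": ["mobile", "branch"], "language": ["en", "es"]}
    cases = [
        {"channel": "mobile", "language": "en"},
        {"channel": "branch", "language": "es"},
    ]
    for pair in sorted(uncovered_pairs(axes, cases)):
        print("missing:", pair)
```

Pairs that remain uncovered are candidates for new cases; whether each one is material enough to require a case is a risk judgment, not something the script decides.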

8. Risk / Control Coverage

UAT coverage cannot be just a count of test cases passed. Risk/control coverage should be calculated.

| Risk | Acceptance control | UAT / regression evidence |
| --- | --- | --- |
| Wrong AI output leads the customer to a wrong action | source grounding, confidence threshold, HITL, customer-facing language review | golden Q&A, narrative fact-check, reviewer approval |
| AI automation amplifies the scale of errors | feature flag, rate limit, kill switch, shadow/parallel run | release config, kill switch drill, monitoring threshold |
| Business rules are bypassed | policy engine, tool gateway, denylist/allowlist, dual approval | adversarial pack, tool trajectory test |
| Privacy leakage | data minimization, masking, test data controls, logging redaction | privacy test, access review, sample log inspection |
| Unfairness or segment harm | segment eval, threshold review, recourse path | segment matrix, complaint pathway replay |
| Operations queue cannot absorb the load | capacity model, queue monitor, manual fallback | parallel run workload comparison |
| Cannot be audited | trace id, version capture, evidence vault, retention | evidence completeness query |
| Rollback fails | backward-compatible config, data migration plan, manual runbook | rollback rehearsal and decision criteria |

Risk/control coverage questions:

  • Which high-risk acceptance criteria have no test asset?
  • Which controls have only a design document and no run evidence?
  • Which production incidents have not been written back to the regression pack?
  • Which release exceptions have no expiry and no compensating control?
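
These four questions translate directly into queries over whatever registry holds criteria, controls, incidents and exceptions. A minimal sketch follows, assuming plain dictionaries with hypothetical field names (`risk`, `test_assets`, `run_evidence`, `regression_pack`, `expiry`, `compensating_control`); a real registry would have its own schema.

```python
def coverage_gaps(criteria, controls, incidents, exceptions):
    """Answer the four risk/control coverage questions as lists of ids."""
    return {
        "high_risk_criteria_without_test_asset": [
            c["id"] for c in criteria
            if c["risk"] == "high" and not c["test_assets"]
        ],
        "controls_without_run_evidence": [
            k["id"] for k in controls if not k["run_evidence"]
        ],
        "incidents_not_in_regression_pack": [
            i["id"] for i in incidents if not i.get("regression_pack")
        ],
        # Flags an exception if either expiry or compensating control is absent.
        "exceptions_missing_expiry_or_control": [
            e["id"] for e in exceptions
            if not e.get("expiry") or not e.get("compensating_control")
        ],
    }

if __name__ == "__main__":
    report = coverage_gaps(
        criteria=[{"id": "ACC-1", "risk": "high", "test_assets": []},
                  {"id": "ACC-2", "risk": "high", "test_assets": ["GJ-001"]}],
        controls=[{"id": "CTL-HITL", "run_evidence": ["EV-9"]},
                  {"id": "CTL-LOG", "run_evidence": []}],
        incidents=[{"id": "INC-7", "regression_pack": None}],
        exceptions=[{"id": "EXC-3", "expiry": "2026-09-30",
                     "compensating_control": ""}],
    )
    for question, ids in report.items():
        print(question, ids)
```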

9. AI Regression Matrix

AI regression is more than code regression. Every AI release must declare which behavior assets have changed.

| Change object | Possible impact | Required regression |
| --- | --- | --- |
| Foundation / vendor model | Output style, reasoning, refusal, tool calls, latency, cost | golden eval, red-team pack, latency/cost, high-risk journeys |
| Fine-tuned / task model | Per-segment performance, thresholds, error types | segment eval, calibration, backtest, parallel run |
| Prompt / system instruction | policy boundary, tone, citation, tool choice | prompt regression, adversarial prompt pack |
| RAG corpus | Factual answers, outdated policy, citation accuracy | retrieval eval, source freshness, citation correctness |
| Retriever / embedding | recall, ranking, source drift | retrieval benchmark, known-answer set |
| Tool schema / API | action accuracy, side effect, authorization | tool trajectory test, contract test, negative cases |
| Policy engine | allow/deny, escalation, geography/product rules | policy decision table regression |
| Workflow state machine | handoff, queue, exception, retry, timeout | workflow replay, state transition tests |
| UI / UX | user comprehension, accessibility, operator error | UAT journey, accessibility, training validation |
| Feature flag / rollout config | population exposure, rollback speed | rollout rehearsal, kill switch verification |
| Data pipeline / feature | model input distribution, missingness, timeliness | DQ, drift baseline, backfill/replay |

An AI regression certificate should declare:

changed_objects
unchanged_but_impacted_objects
test_assets_run
criteria_passed
defects_open
exceptions_accepted
monitoring_updated
rollback_ready
recertification_triggers
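
One way to back the `changed_objects` and `test_assets_run` fields is to derive the required regression from the matrix and compare it with what actually ran. The sketch below encodes a few rows of the matrix as a lookup; the object keys (`vendor_model`, `rag_corpus`, and so on) are hypothetical identifiers, and an unknown change object is surfaced rather than ignored.

```python
# Subset of the regression matrix; keys are illustrative identifiers.
REQUIRED_REGRESSION = {
    "vendor_model": {"golden eval", "red-team pack", "latency/cost",
                     "high-risk journeys"},
    "prompt": {"prompt regression", "adversarial prompt pack"},
    "rag_corpus": {"retrieval eval", "source freshness",
                   "citation correctness"},
    "tool_schema": {"tool trajectory test", "contract test",
                    "negative cases"},
    "workflow": {"workflow replay", "state transition tests"},
}

def regression_gaps(changed_objects, test_assets_run):
    """Compare required regression for the changed objects with what ran."""
    unknown = sorted(o for o in changed_objects
                     if o not in REQUIRED_REGRESSION)
    required = set()
    for obj in changed_objects:
        required |= REQUIRED_REGRESSION.get(obj, set())
    missing = sorted(required - set(test_assets_run))
    return {"missing": missing, "unknown_objects": unknown}

if __name__ == "__main__":
    print(regression_gaps(
        ["prompt", "rag_corpus", "embedding"],
        ["prompt regression", "retrieval eval", "citation correctness"],
    ))
```

An entry under `missing` is not automatically a blocker: the certification packet allows an "unrun rationale", but that rationale has to be written down and owned.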

10. Workflow Replay, Shadow Testing, Parallel Run

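10.1 How the Three Differ

| Pattern | Purpose | Evidence |
| --- | --- | --- |
| Workflow replay | Replay end-to-end state transitions using historical or synthetic events | state transition result, tool call trace, exception path |
| Shadow testing | Run the AI on a side path of production traffic without affecting real decisions | output comparison, risk signal, latency/cost, no-customer-impact proof |
| Parallel run | Run old and new processes at the same time and compare business results | decision delta, workload delta, customer impact assessment, reconciliation |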

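10.2 When to Use Each

| Situation | Recommended approach | Reason |
| --- | --- | --- |
| High-impact customer decisions | parallel run + human review | Old and new results need to be compared with human judgment |
| Generative customer-service assistance | shadow test + sampling QA | Output quality can be observed without facing the customer directly |
| AML / fraud analyst copilot | workflow replay + analyst benchmark | Proves evidence completeness and operating efficiency |
| RAG policy Q&A | golden set + retrieval replay | Checks knowledge sources, citations and refusals |
| Tool-using agent | workflow replay + sandbox tool trajectory | Proves authorization, parameters, side effects and failure handling |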

10.3 Parallel Run Decision Signals

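| Signal | Green | Amber | Red |
| --- | --- | --- | --- |
| Decision delta | Differences match expectations and are fully explained | Differences concentrate in specific segments | Unexplained differences affect high-risk customers or key controls |
| Manual workload | Within the capacity model | Short-term increase with a plan in place | Queue exceeds SLA or affects customers |
| Error taxonomy | Error types are acceptable and monitored | Repeated errors, but compensating controls are effective | Prohibited or unexplainable errors appear |
| Control adherence | HITL / logging / policy gate complete | Minor evidence gaps with a remediation plan | Key controls lack evidence |
| Customer impact | No new harm signal | Manageable complaint / appeal signals exist | Customer remediation or regulator-sensitive scenarios are affected |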
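
A simple way to roll these five signals into one parallel-run status is "worst signal wins", with an unassessed signal treated as an error rather than as green. This aggregation rule is an assumption of the sketch, not something the table prescribes; an institution may weight signals differently.

```python
from enum import IntEnum

class Status(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2

# The five signals from the table above, as hypothetical keys.
SIGNALS = ("decision_delta", "manual_workload", "error_taxonomy",
           "control_adherence", "customer_impact")

def overall_status(signals):
    """Return the worst status across all five signals."""
    missing = [s for s in SIGNALS if s not in signals]
    if missing:
        # An unassessed signal must not be read as green.
        raise ValueError(f"unassessed signals: {missing}")
    return max(signals[s] for s in SIGNALS)

if __name__ == "__main__":
    run = {
        "decision_delta": Status.AMBER,
        "manual_workload": Status.GREEN,
        "error_taxonomy": Status.GREEN,
        "control_adherence": Status.GREEN,
        "customer_impact": Status.GREEN,
    }
    print(overall_status(run).name)
```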

11. Defect Triage Architecture

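AI defects cannot be classified by technical severity alone. They should be organized by business impact + control impact + recurrence + release decision.

| Severity | Criteria | Disposition |
| --- | --- | --- |
| Critical | Prohibited output, major customer impact, key control failure, privacy leakage, submission/transaction error, inability to roll back | block release, evidence freeze, executive escalation |
| High | High-risk journey failure, segment harm, significant manual workaround, missing monitoring, recurring defect | release hold or exception committee approval |
| Medium | Partial journey failure with a compensating control, or a non-critical segment performing below threshold | fix before scale or limited release with monitoring |
| Low | Copy text, low-risk UI, non-critical test data issues | normal backlog, no silent exclusion |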

A defect record must include:

defect_id
impacted_acceptance_criteria
impacted_journeys
impacted_segments
ai_object_version
root_cause: model | prompt | RAG | tool | data | workflow | UI | control | training
customer_or_employee_impact
control_impact
fix_or_exception_decision
retest_evidence
release_disposition

AI can help cluster defects, identify recurring root causes and map defects to requirements/tests, but it cannot automatically lower severity or close defects.
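
This boundary can be enforced in the defect workflow itself rather than left to convention. The sketch below is one possible guard, assuming a defect is a dictionary with a `severity` field and that the caller knows whether the actor is a human or an AI assistant; the function name and the `actor_type` values are hypothetical.

```python
SEVERITY_ORDER = ("low", "medium", "high", "critical")

def propose_change(defect, *, actor_type, new_severity=None, close=False):
    """Return an updated copy of the defect, refusing AI downgrades/closures."""
    if actor_type not in ("human", "ai"):
        raise ValueError(f"unknown actor_type: {actor_type}")

    downgrade = (
        new_severity is not None
        and SEVERITY_ORDER.index(new_severity)
        < SEVERITY_ORDER.index(defect["severity"])
    )
    if actor_type == "ai" and (downgrade or close):
        raise PermissionError(
            "AI may suggest, but an accountable human must lower severity "
            "or close a defect"
        )

    updated = dict(defect)  # leave the original record untouched
    if new_severity is not None:
        updated["severity"] = new_severity
    if close:
        updated["status"] = "closed"
    return updated

if __name__ == "__main__":
    d = {"defect_id": "DEF-42", "severity": "high", "status": "open"}
    print(propose_change(d, actor_type="ai", new_severity="critical"))
```

In this sketch an AI assistant may still raise severity, since escalation errs on the safe side; whether that is desirable is a policy choice.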


12. Release Certification

Release certification is an evidence bundle, not meeting minutes.

12.1 Certification Packet

| Section | Content |
| --- | --- |
| Release identity | release id, model/prompt/RAG/tool/workflow/config versions |
| Scope | products, channels, segments, roles, regions, feature flags |
| Acceptance criteria result | criteria, threshold, actual result, owner decision |
| Golden journey coverage | run list, pass/fail, defects, evidence links |
| Synthetic / replay / shadow / parallel evidence | pack ids, sample counts, comparison results |
| Regression matrix | changed objects, impacted tests, unrun rationale |
| Risk/control evidence | high-risk controls, AI guardrails, logging, privacy, accessibility |
| Defect summary | open defects, severity, disposition, retest |
| Exceptions | accepted risk, compensating controls, expiry, monitoring |
| Operational readiness | runbook, training, support model, BCP/degraded mode, capacity |
| Post-release monitoring | metrics, thresholds, owner, review cadence, rollback triggers |
| Decision | release, limited pilot, hold, rollback, exception release |
| Sign-off | business, ops, technology, risk/control, AI governance roles |

12.2 Decision Types

| Decision | Applicable when | Required evidence |
| --- | --- | --- |
| Full release | criteria met, no critical/high unresolved, monitoring and rollback ready | complete certification packet |
| Limited pilot | material criteria met for limited population, residual risk bounded | exposure cap, monitoring, exit criteria |
| Exception release | known defect accepted by authorized owner | risk acceptance, compensating control, expiry |
| Hold | critical criteria missing or evidence incomplete | blockers, owners, target remediation |
| Rollback / no-go | production or pre-prod signal violates stop rule | rollback plan, communication, incident link |
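
The conditions in this table can be drafted as a small decision helper that produces a recommended decision type for the certification meeting. This is a simplified sketch: the parameter names are hypothetical, the ordering of the checks is an assumption, and the output is only a recommendation, since the decision itself belongs to authorized owners.

```python
def recommend_decision(*, stop_rule_violated, evidence_complete,
                       open_critical, open_high, high_accepted_by_owner,
                       monitoring_ready, rollback_ready, limited_population):
    """Recommend a decision type from the table; a human owner decides."""
    if stop_rule_violated:
        return "rollback / no-go"
    if not evidence_complete or open_critical > 0:
        return "hold"
    if not (monitoring_ready and rollback_ready):
        # No release without operational readiness and rollback criteria.
        return "hold"
    if open_high > 0:
        # A known high defect needs acceptance by an authorized owner.
        return "exception release" if high_accepted_by_owner else "hold"
    if limited_population:
        return "limited pilot"
    return "full release"

if __name__ == "__main__":
    print(recommend_decision(
        stop_rule_violated=False, evidence_complete=True,
        open_critical=0, open_high=1, high_accepted_by_owner=True,
        monitoring_ready=True, rollback_ready=True,
        limited_population=False,
    ))
```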

13. Exception / Risk Acceptance

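An AI release exception is not "the business is aware". It must be a structured risk acceptance.

| Field | Content |
| --- | --- |
| exception_id | stable id |
| linked release / criteria / defect | Links to the release, acceptance criteria and defect |
| residual risk | Business, customer, operational, model, privacy, security or control risk |
| impacted population | Customers, employees, transactions, channels, segments |
| compensating control | Rate limiting, human review, sampling checks, monitoring, manual fallback |
| expiry | Exception expiry date or scale gate |
| owner | The business / risk / management role with authority to accept this risk |
| monitoring | Metrics, thresholds, review cadence |
| closure criteria | Criteria for the fix and for recertification |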

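Unacceptable exceptions:

  • A critical control is missing but a full release is requested.
  • A "known issue" with no owner.
  • A long-running exception with no expiry.
  • An AI summary used in place of the risk acceptance rationale.
  • The scope of customer / employee impact is not disclosed.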
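
Several of these unacceptable cases are structural and can be rejected before an exception ever reaches a committee. The sketch below checks an exception record for them; the field names follow the table in this section, while `critical_control_missing` and the `requested_decision` argument are hypothetical additions for illustration. It cannot judge whether a rationale was written by a person or by an AI summary.

```python
from datetime import date

# Fields that must be present and non-empty (names follow the table above).
REQUIRED_FIELDS = ("owner", "expiry", "compensating_control",
                   "monitoring", "impacted_population")

def exception_problems(exc, today, requested_decision):
    """Return a list of structural problems; an empty list means none found."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not exc.get(f)]

    expiry = exc.get("expiry")
    if expiry is not None and expiry <= today:
        problems.append("expiry is not in the future")

    if exc.get("critical_control_missing") and requested_decision == "full release":
        problems.append("critical control missing but full release requested")
    return problems

if __name__ == "__main__":
    exc = {
        "exception_id": "EXC-3",
        "owner": "",  # a "known issue" with no owner
        "expiry": None,
        "compensating_control": "100% human review",
        "monitoring": "weekly sample review",
        "impacted_population": "mobile channel, Spanish journeys",
        "critical_control_missing": True,
    }
    for p in exception_problems(exc, date(2026, 7, 1), "full release"):
        print(p)
```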

14. Test Data Governance, Privacy, Accessibility

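14.1 Test Data Governance

| Concern | Guardrail |
| --- | --- |
| Production data use | Record the necessity, authorization, masking/anonymization, access control, environment protection and retention period |
| Synthetic data | Record the generation logic, sensitive attributes, expected outputs, reviewer and re-identification risk |
| Prompt / output logs | Avoid retaining unnecessary plaintext PII; use hash, redaction and structured metadata |
| Evidence retention | Evidence has an owner, retention class, legal hold path and access log |
| Third-party model test | Avoid sending restricted data to unapproved models or vendor environments |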

14.2 Privacy-by-UAT

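UAT needs to prove that:

  • The AI can complete the task without excessive context.
  • The test environment and evidence store do not leak sensitive customer information.
  • Human reviewers see only the information needed to complete the task.
  • Logs support replay without creating a new data exposure surface.
  • UAT defects do not spread PII through screenshots, chat logs or spreadsheets.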

14.3 Accessibility and Inclusive Acceptance

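| Area | Acceptance point |
| --- | --- |
| Digital accessibility | screen reader, keyboard, contrast, focus order, error message |
| Language | plain language, translation consistency, non-English policy accuracy |
| Cognitive load | Whether AI suggestions make employees over-reliant or lead customers to misunderstand |
| Recourse | Whether the customer can appeal, reach a human and obtain an explanation |
| Employee accessibility | Whether the workbench lets employees of differing abilities complete review and override |

Accessibility is not a UI attachment; it is part of business acceptance. An AI that influences decisions or recommendations affecting customers must be shown to be understandable, appealable and transferable to a human.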


15. Audit Trail and Evidence Objects

Every UAT / certification evidence object should have this structure:

| Field | Content |
| --- | --- |
| evidence_id | stable id |
| evidence_type | test_run, UAT_session, eval_report, shadow_report, parallel_run, approval, exception |
| release_id | release / feature flag / deployment batch |
| AI object versions | model, prompt, retriever, corpus, tool, policy, workflow |
| linked criteria | acceptance criteria ids |
| linked controls | control ids |
| produced_by | system, tester, business owner, AI assistant, release workflow |
| produced_at | timestamp |
| reviewer | accountable human reviewer where applicable |
| integrity | checksum, immutable store, access log |
| retention | retention class and legal hold path |

Evidence query examples:

| Query | Release readiness value |
| --- | --- |
| Show all high-risk criteria without evidence | Finds blockers |
| Show all model/prompt changes without regression | Finds change blind spots |
| Show all golden journeys failed in last 3 releases | Finds systemic quality problems |
| Show all accepted exceptions expiring in 30 days | Manages residual risk |
| Show all production incidents not mapped to regression pack | Keeps incidents from going unlearned |
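
To make the evidence object and the first query concrete, the sketch below builds an evidence record with a SHA-256 checksum over a canonical JSON payload and then lists high-risk criteria that no evidence object links to. It uses only the Python standard library; the function names and the subset of fields are illustrative assumptions, and a real evidence vault would add immutable storage and access logging.

```python
import hashlib
import json

def _checksum(payload):
    # Canonical JSON (sorted keys, fixed separators) so equal payloads hash equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def make_evidence(evidence_id, evidence_type, release_id, linked_criteria, payload):
    """Build an evidence record with an integrity checksum over its payload."""
    return {
        "evidence_id": evidence_id,
        "evidence_type": evidence_type,
        "release_id": release_id,
        "linked_criteria": list(linked_criteria),
        "checksum": _checksum(payload),
    }

def is_intact(evidence, payload):
    """True if the payload still matches the checksum recorded at capture."""
    return evidence["checksum"] == _checksum(payload)

def high_risk_criteria_without_evidence(criteria, evidence_objects):
    """'Show all high-risk criteria without evidence' as a query."""
    covered = {cid for ev in evidence_objects for cid in ev["linked_criteria"]}
    return sorted(c["id"] for c in criteria
                  if c["risk"] == "high" and c["id"] not in covered)

if __name__ == "__main__":
    payload = {"journey": "GJ-001", "result": "pass"}
    ev = make_evidence("EV-1", "test_run", "REL-2026Q3", ["ACC-1"], payload)
    criteria = [{"id": "ACC-1", "risk": "high"}, {"id": "ACC-2", "risk": "high"}]
    print(is_intact(ev, payload))
    print(high_risk_criteria_without_evidence(criteria, [ev]))
```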

16. Operational Readiness and Post-Release Monitoring

If UAT passes but operations are not ready, the release still should not go ahead.

16.1 Operational Readiness

| Area | Evidence |
| --- | --- |
| Runbook | normal, exception, degraded mode, kill switch, rollback |
| Training | business users, reviewers, supervisors, support desk training completion |
| Support model | L1/L2/L3 owner, AI governance escalation, vendor support |
| Monitoring | quality, latency, cost, drift, tool errors, complaints, queue load |
| Capacity | manual review queue, back-office SLA, call center handle time |
| Communication | employee release note, customer communication path where applicable |
| BCP / resilience | fallback process, manual workaround, recovery exercise |

16.2 Post-Release Monitoring Metrics

| Metric | Purpose |
| --- | --- |
| Business outcome | completion rate, cycle time, conversion, case resolution |
| Quality | accuracy, citation correctness, unsupported claim, tool action success |
| Risk | policy violation, high-risk escalation miss, privacy event, complaint |
| Segment | performance by customer segment, language, channel, product |
| Operations | manual review volume, queue aging, override rate, training issue |
| Reliability | latency, error rate, timeout, vendor degradation |
| Cost | cost per successful task, token spend, human review cost |
| Adoption | employee usage, acceptance/override pattern, misuse signals |

16.3 Rollback Criteria

Rollback criteria must be pre-agreed:

| Trigger | Response |
| --- | --- |
| critical customer harm signal | disable feature flag, route to manual, incident workflow |
| prohibited output observed | freeze version, block prompt/model, QA sample expansion |
| control logging failure | pause release cohort, restore prior version, evidence gap assessment |
| manual queue exceeds capacity | reduce exposure, switch to assisted-only mode |
| model or retrieval drift breach | revert model/corpus, run targeted regression |
| vendor outage / latency | fallback provider or manual process |
| privacy incident | stop affected flow, preserve evidence, authorized response workflow |
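
Pre-agreed criteria are easiest to honor when they exist as configuration that monitoring can evaluate. The sketch below pairs three of the triggers with thresholds and returns the agreed response for each breach. The metric names and threshold values are hypothetical placeholders, and treating a missing metric as an escalation is an assumption of this sketch; only the response texts come from the table.

```python
# (metric name, breach threshold, pre-agreed response). Metric names and
# thresholds are hypothetical; responses are taken from the table above.
ROLLBACK_RULES = [
    ("prohibited_output_count", 0,
     "freeze version, block prompt/model, QA sample expansion"),
    ("manual_queue_utilization", 1.0,
     "reduce exposure, switch to assisted-only mode"),
    ("logging_failure_rate", 0.0,
     "pause release cohort, restore prior version, evidence gap assessment"),
]

def triggered_responses(metrics):
    """Return (metric, response) pairs for every breached or missing metric."""
    fired = []
    for metric, limit, response in ROLLBACK_RULES:
        value = metrics.get(metric)
        if value is None:
            # Assumption: an absent metric is escalated, not treated as healthy.
            fired.append((metric, "metric missing: escalate to release owner"))
        elif value > limit:
            fired.append((metric, response))
    return fired

if __name__ == "__main__":
    observed = {"prohibited_output_count": 1, "manual_queue_utilization": 0.8}
    for metric, response in triggered_responses(observed):
        print(f"{metric}: {response}")
```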

17. AI's Role in UAT

AI can accelerate acceptance work, but cannot own final acceptance.

| AI assist use case | Value | Boundary |
| --- | --- | --- |
| Generate candidate test cases | expands coverage across journeys and edge cases | human owner approves expected outcome and risk relevance |
| Find coverage gaps | maps requirements to tests, controls and evidence | deterministic traceability remains source of truth |
| Cluster defects | identifies recurring themes and root causes | severity/disposition remains accountable human decision |
| Map requirements to tests | builds traceability graph | cannot invent missing approvals or criteria |
| Summarize evidence | creates release memo draft | summary must cite evidence ids and be reviewed |
| Generate synthetic scenarios | improves rare-event coverage | sensitive attributes and expected outcomes need governance |
| Review UAT transcripts | finds repeated confusion, training gaps | privacy and access controls required |
| Suggest regression impact | links changed object to tests | architect/release owner confirms final scope |

Prohibited:

  • AI does not sign UAT.
  • AI does not accept residual risk.
  • AI does not decide that a defect is immaterial.
  • AI does not approve release certification.
  • AI does not replace independent validation or internal audit.
  • AI does not create evidence after the fact to fill missing controls.

18. PM / BA / Architect Implications

| Role | Senior responsibility | Key deliverables |
| --- | --- | --- |
| PM | Turn release from feature delivery into business capability certification | acceptance strategy, release decision memo, readiness dashboard |
| Senior BA | Write business objectives, journeys, policy, exceptions and evidence as an acceptable contract | acceptance criteria contract, coverage matrix, UAT scripts |
| Solution Architect | Design evidence capture, versioning, rollback and monitoring into the architecture | reference architecture, traceability model, rollback design |
| QA / Test Architect | Productize golden journeys, synthetic packs, automation regression and defect taxonomy | test asset library, regression suite, quality gates |
| AI Governance / Model Risk | Define AI-specific acceptance, model/prompt/RAG/tool regression and monitoring | AI use register, eval thresholds, risk acceptance matrix |
| Operations Owner | Prove that queues, training, runbook, support and fallback are ready | operational readiness pack |
| Risk / Compliance | Confirm control coverage, exception ownership and policy-sensitive scenarios | control checklist, exception decision record |

Senior interview phrasing:

I would not treat AI UAT as a business trial sign-off. I would design it as an acceptance evidence architecture: every business acceptance criterion connects to a golden journey, synthetic pack, persona/segment coverage, risk/control coverage, AI regression result, defect disposition, exception acceptance, operational readiness and post-release monitoring. AI can help us generate candidate tests, find coverage gaps, cluster defects and summarize evidence, but final business acceptance and risk acceptance must be completed by an authorized owner.


19. Anti-Patterns

| Anti-pattern | Risk | Fix |
| --- | --- | --- |
| UAT as email sign-off | The signature is disconnected from criteria, versions and evidence | structured release certification packet |
| Happy-path-only UAT | Real production exceptions and high-risk segments are not covered | golden journey + synthetic + risk-based coverage |
| UI click scripts for AI behavior | Only proves the screens can be clicked, not model / tool / retrieval behavior | AI regression matrix and trace evidence |
| Production data copied into UAT uncontrolled | Sensitive information spreads and test evidence becomes a privacy risk | sanitized/synthetic data and documented exception controls |
| Defects closed by meeting consensus | No root cause, retest or customer impact | defect taxonomy and retest evidence |
| Prompt change treated as minor copy edit | Behavior boundary changes go without regression | prompt versioning and regression trigger |
| RAG update without acceptance impact analysis | Changes to factual answers are invisible | corpus versioning and retrieval golden set |
| Shadow test without comparison criteria | The side path ran but cannot support a decision | predefined comparison metrics and thresholds |
| Parallel run with no workload analysis | The new process overwhelms the manual queue | capacity and queue readiness metrics |
| Exception with no expiry | Residual risk becomes permanent | owner, expiry, compensating control, monitoring |
| AI writes certification memo unsourced | The evidence summary may hallucinate or overstate | evidence id citation and human approval |

20. Interview Questions

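Q1: What is the biggest difference between UAT for an AI system and traditional UAT?

30-second version: Traditional UAT usually proves that processes and screens match requirements. AI UAT must prove that a business capability is acceptable under a specific model, prompt, RAG, tool, data, workflow and control version, and that it covers high-risk journeys, segments, controls, regression, operational readiness and post-release monitoring.

2-minute version: AI systems carry more uncertainty, and their behavior depends more heavily on data, model, prompt, knowledge base and tool calls. My approach is to turn UAT from sign-off into an evidence architecture. First define the acceptance claim and criteria, then design the golden journey library, synthetic transaction packs, persona/segment matrix and risk/control coverage. Next run model/prompt/RAG/tool/workflow regression, and use workflow replay, shadow testing or parallel run to show that behavioral differences are acceptable. Finally produce a release certification packet that records defects, exceptions, risk acceptance, operational readiness, monitoring and rollback criteria.

Q2: How do you judge whether an AI release can be certified for go-live?

30-second version: I look at six things: whether business acceptance criteria are met, whether high-risk journeys and segments are covered, whether AI regression is complete, whether critical/high defects are closed or accepted with authorization, whether operational readiness is in place, and whether monitoring/rollback are predefined.

2-minute version: Release certification is not about the test pass rate. I need to see the release identity, including model, prompt, RAG, tool, workflow and feature flag versions. Every high-risk acceptance criterion should link to test evidence and an owner decision. Defects need severity, root cause, impact, retest and disposition. Exceptions must have a risk owner, expiry, compensating control and monitoring. On the operations side there must be a runbook, training, support, capacity and fallback. After release there must be quality, risk, segment, operations, cost and complaint metrics, plus explicit rollback triggers.

Q3: What is the difference between a golden journey and a synthetic transaction pack?

30-second version: A golden journey is an end-to-end business journey asset used to prove that the real process is acceptable. A synthetic transaction pack is a constructed set of transaction / customer / document / event samples used to cover rare, sensitive, boundary and high-risk situations.

2-minute version: Take AI KYC document review. The golden journey covers new customer account opening, document resubmission, human review, rejection recommendation and the appeal path. The synthetic pack can construct borderline ID documents, low-quality images, language differences, vulnerable customers, fraudulent documents, system timeouts and privacy-field scenarios. The two must connect: the synthetic pack supplies input assets, and the golden journey supplies the process and control context. Valid evidence is not a screenshot but a trace id, version, expected output, actual output, control result, defect and reviewer.

Q4: What should AI do in UAT, and what should it not do?

30-second version: AI can generate candidate tests, find coverage gaps, cluster defects, map requirements to tests and summarize evidence. AI cannot own final acceptance, cannot lower defect severity, cannot accept residual risk and cannot sign a release.

2-minute version: I would use AI as a testing and evidence copilot. For example, use it to generate test candidates from requirements and policy, compare acceptance criteria against test assets to find gaps, cluster defect tickets and suggest root causes, and draft the release evidence summary. But these outputs must cite evidence ids and be reviewed by the business, QA, risk or architecture owner. Final acceptance is an act of authorized accountability, not a model output. Otherwise the team will mistake a polished AI-generated report for a real control.

Q5: How do you choose between shadow testing and parallel run?

30-second version: Shadow testing suits observing AI output first without affecting production decisions; parallel run suits comparing old and new processes against human judgment, especially for high-impact customer decisions. The choice depends on customer impact, decision importance, manual capacity and evidence needs.

2-minute version: Generative assistance for customer-service agents can start in shadow, looking at AI suggestions, citations, refusals, latency and cost without exposing them to customers. Credit, KYC or fraud decisions are better suited to a parallel run, because old and new strategies, human conclusions, segment differences and queue load need to be compared. Either way, comparison criteria must be defined in advance, for example decision delta, manual workload, error taxonomy, control adherence and customer impact. A shadow/parallel run without thresholds is only an observation exercise and cannot support release certification.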


21. Minimum Viable Architecture

| Capability | MVP | Scale version |
| --- | --- | --- |
| Acceptance registry | criteria + owner for the top AI use cases | enterprise acceptance graph |
| Golden journeys | top 20 risk-based journeys | reusable journey library with coverage analytics |
| Synthetic packs | high-risk edge cases and privacy-safe samples | governed synthetic data factory |
| AI regression | model/prompt/RAG/tool checklist | automated regression certification plane |
| Evidence capture | release packet and evidence index | immutable evidence vault with query |
| Defect governance | severity + root cause + retest | defect analytics and production feedback loop |
| Operational readiness | runbook + monitoring + rollback | integrated release readiness control tower |
| Post-release monitoring | quality/risk/ops metrics | closed-loop eval expansion and recertification |

Final test:

Can the team reconstruct why this AI release was accepted,
which business criteria it satisfied,
which journeys and segments were tested,
which model/prompt/RAG/tool versions were certified,
which risks were accepted,
who accepted them,
and what would trigger rollback after release?

If the answer is no, the problem is not that the UAT staff are not working hard enough; it is that the business acceptance architecture has not yet been productized.