AI UAT / Regression Certification: Business Acceptance Architecture
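AI UAT / Regression Certification / Business Acceptance Architecture: A Guided Reading
Audience: CBAP+ Financial Retail PM / Senior BA / Product Architecture Lead / Solution Architect / QA Governance Lead / AI Governance / Operational Risk / Release Manager.
Core question: How can a retail-finance AI system design UAT, regression testing, business acceptance, release certification, risk acceptance, evidence packs, operational readiness, and post-release monitoring as a provable evidence architecture rather than a pre-launch sign-off ceremony?
Learning objective: Build a systems-architecture mental model of business acceptance criteria, golden journey library, synthetic transaction pack, persona/segment coverage, risk/control coverage, AI regression matrix, workflow replay, shadow testing, parallel run, defect triage, release certification, exception acceptance, rollback, and post-release monitoring.
Important note: This document is study, portfolio, and internal architecture training material. It does not constitute legal advice, a compliance conclusion, an audit opinion, a model validation conclusion, an information security certification, an accessibility compliance conclusion, or a regulatory interpretation. Specific applicability, control requirements, customer impact judgments, risk acceptance authority, release approval, rollback triggers, and external communication are to be confirmed by Legal / Compliance / Risk / Model Risk / Information Security / Internal Audit / Business Owner / authorized management in line with institutional policy and regulatory relationships. Source access date recorded as 2026-06-30.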
Source Anchors
| Source | Official link | How this document uses it |
|---|---|---|
| FFIEC Development, Acquisition, and Maintenance IT Handbook | https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/ | Uses the language of SDLC, testing, implementation and assessment, maintenance, change management, rollback/back-out, testing data controls, and documentation to structure UAT / regression / release certification. |
| FFIEC DA&M - V.B Testing | https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/v-development/vb-testing/ | Uses testing scope, test results, corrective actions, UAT, regression testing, stress testing, and production-data-in-testing controls to calibrate test evidence. |
| FFIEC Management IT Handbook | https://ithandbook.ffiec.gov/it-booklets/management | Uses IT governance, risk management, enterprise architecture, project management, and information systems reporting to structure ownership, management reporting, and release oversight. |
| FFIEC Business Continuity Management IT Handbook | https://ithandbook.ffiec.gov/it-booklets/business-continuity-management | Uses business impact analysis, interdependency analysis, resilience, event management, continuity/recovery, exercises/tests, and maintenance/improvement to support operational readiness, degraded mode, and rollback. |
| NIST SP 800-218 Secure Software Development Framework | https://csrc.nist.gov/pubs/sp/800/218/final | Uses the lifecycle language of Prepare the Organization, Protect the Software, Produce Well-Secured Software, and Respond to Vulnerabilities to design secure release evidence. |
| NIST SP 800-53 Rev. 5 | https://csrc.nist.gov/pubs/sp/800/53/r5/upd1/final | Uses control language such as control catalog, assessment, audit/logging, configuration/change, contingency, risk assessment, and system integrity to express evidence objects. |
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize the context, risk, measurement, treatment, and continuous improvement of AI UAT. |
| NIST AI RMF Core | https://airc.nist.gov/airmf-resources/airmf/ | Uses the function/category thinking of the AI RMF Core to design risk-to-test-to-evidence traceability. |
| ISO/IEC 42001 AI management systems | https://www.iso.org/standard/81230.html | Uses AI management system, risk/opportunity, operational control, performance evaluation, and improvement to structure acceptance governance and management-system evidence. |
In one sentence:
AI UAT is not a ceremony where business signs a release. It is an evidence architecture that proves a business capability is acceptable, controllable, operable, reversible and monitorable under a specific model, prompt, data, workflow and control version.
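1. Thesis: UAT Is Not a Signature, It Is Provable Business Acceptance
Traditional UAT is often misused as a final round of business piloting before go-live: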
requirements document
-> QA says tests passed
-> business users click through screens
-> defects discussed in daily call
-> UAT sign-off email
-> release
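This pattern is not enough for AI systems. AI does not merely change the UI or transaction rules; it can change judgment, recommendation, explanation, retrieval, triage, exception handling, manual workload, customer communication, operational risk, and monitoring. If business acceptance only proves that "the screens can be clicked through", it cannot answer:
| Advanced acceptance question | Why it is sharper for AI |
|---|---|
| Is the business outcome acceptable? | Model output is probabilistic, so the happy path alone is not enough |
| Which customers, channels, products, languages, and vulnerable-customer scenarios are covered? | A segment gap can cause localized customer harm |
| Which controls are actually effective? | Prompt guardrails, retrieval filters, tool gateways, and HITL cannot be proven by looking at the UI |
| Where is the regression impact? | A change to any one of model, prompt, RAG corpus, tool schema, policy rule, or workflow can change behavior |
| How are exceptions accepted? | AI residual risk cannot rest on a verbal "the business agrees" |
| How is drift detected after release? | Passing offline UAT does not mean the production distribution stays stable |
| How do we roll back when something goes wrong? | Model version, prompt, retriever, feature flag, workflow, and manual operations all need a fallback path |
A mature AI UAT / Regression Certification architecture should read as: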
business objective
-> acceptance claim
-> acceptance criteria
-> golden journeys
-> synthetic and replay transaction packs
-> persona / segment / channel coverage
-> risk and control coverage
-> model / prompt / RAG / tool / workflow regression
-> defect triage and residual risk
-> release certification memo
-> operational readiness evidence
-> post-release monitoring and rollback criteria
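The product of UAT is not a "sign-off record" but a set of acceptance evidence that business, risk, architecture, QA, operations, internal audit, and management can all query.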
2. Acceptance Evidence Stack
2.1 From Sign-Off to Claim-Based Acceptance
UAT should first define the claim to be proven:
For release R,
under model M / prompt P / RAG corpus K / tool set T / workflow W / policy C,
the business capability is acceptable for population S,
because evidence E demonstrates criteria A,
with known exceptions X,
accepted by owners O,
and monitored by controls N.
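| Stack layer | Key objects | Must answer |
|---|---|---|
| Business claim | Statement that the business capability is acceptable | Which business capability does this release prove is ready for production? |
| Scope | product, journey, persona, channel, region, system, release | Which customers, employees, processes, and technical versions are in scope? |
| Criteria | business acceptance criteria, risk acceptance criteria, operational criteria | What counts as acceptable, what are the thresholds, and who has authority to judge? |
| Test assets | golden journey, synthetic pack, workflow replay, shadow sample, parallel run sample | Which reusable assets does the evidence come from? |
| Regression matrix | model, prompt, RAG, tool, policy, workflow, data, UI/API | Which behaviors might this change affect? |
| Control evidence | logs, eval report, defect ticket, approval, exception, monitoring config | Did the controls actually run and leave evidence? |
| Decision record | release certification, hold, limited release, exception release, rollback | What is the final decision, and what is the residual risk? |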
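The claim template can be held as a structured record so that empty parts are visible before anyone signs. The following is a minimal sketch, not a standard schema: the class name `AcceptanceClaim`, its field names, and the sample version strings are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AcceptanceClaim:
    """One claim per release. Field names are illustrative assumptions."""
    release: str
    versions: dict[str, str]   # e.g. model, prompt, rag_corpus, tool_set, workflow, policy
    population: str
    criteria: list[str]
    evidence_ids: list[str]
    exceptions: list[str] = field(default_factory=list)  # may legitimately be empty
    owners: list[str] = field(default_factory=list)
    monitors: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the names of claim parts that are still empty."""
        required = {
            "versions": self.versions,
            "criteria": self.criteria,
            "evidence_ids": self.evidence_ids,
            "owners": self.owners,
            "monitors": self.monitors,
        }
        return [name for name, value in required.items() if not value]

    def is_provable(self) -> bool:
        """A claim is only provable when no required part is empty."""
        return not self.gaps()


# Hypothetical release: criteria exist, but no evidence, owner, or monitor yet.
claim = AcceptanceClaim(
    release="R-2026.09",
    versions={"model": "m-1.4", "prompt": "p-22"},
    population="mobile onboarding",
    criteria=["ACC-1"],
    evidence_ids=[],
)
print(claim.gaps())
```

Known exceptions are deliberately not required: a claim with zero exceptions is valid, while a claim with zero owners is not.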
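2.2 Boundaries Between UAT and QA / Model Validation / Audit
| Discipline | Focus | What it cannot do |
|---|---|---|
| QA | Functionality, integration, performance, automated regression, defect fix verification | Cannot stand in for business acceptance of residual risk |
| UAT / Business Acceptance | Business processes, customer/employee journeys, policy boundaries, exception handling, operational readiness | Cannot replace independent model validation or security assessment |
| Model validation / AI risk | Model fitness for purpose, performance, bias, stability, limitations, monitoring | Cannot replace real business processes and operational readiness |
| Information security | Access control, data protection, vulnerabilities, logging, security testing | Cannot confirm whether business outcomes are acceptable |
| Internal audit | Independent evaluation of control design and operation | Does not own release acceptance accountability |
| Release management | Change windows, deployment, rollback, environment readiness | Does not judge whether business and risk are acceptable |
The value of a senior BA / PM lies in connecting this evidence into a release certification bundle, rather than letting each team hand in an isolated attachment.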
3. Reference Architecture
Business goals / policies / risk appetite / customer impact
|
v
Acceptance criteria registry
business outcome | journey criteria | risk/control criteria | ops readiness | rollback
|
v
Test asset library
golden journeys | persona packs | synthetic transactions | replay datasets
accessibility cases | exception scenarios | adversarial prompts | edge cases
|
v
AI regression certification plane
model version | prompt version | retriever/corpus version | tool schema/version
policy engine | workflow state machine | feature flags | UI/API
|
v
Execution and evidence capture
automated tests | business UAT sessions | workflow replay | shadow mode
parallel run | runbook drills | monitoring dry run
|
v
Defect and exception governance
severity | root cause | customer impact | control impact | owner | disposition
|
v
Release certification
coverage matrix | evidence index | open exceptions | risk acceptance
operational readiness | rollback criteria | sign-off ledger
|
v
Production monitoring and feedback loop
quality sampling | drift | complaint/appeal | operational load | incidents
defect clustering | eval expansion | corrective action
Architecture principles:
No business acceptance without explicit acceptance criteria.
No acceptance criteria without coverage evidence.
No AI release certification without model/prompt/RAG/tool/workflow regression.
No production data in testing without documented need, controls and approval.
No exception without owner, expiry, compensating control and monitoring.
No release without operational readiness and rollback criteria.
No AI-generated UAT evidence summary without human accountable acceptance.
4. Business Acceptance Criteria Contract
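Business acceptance criteria are not shorthand for the "acceptance criteria" on a user story. Business acceptance of an AI system must cover outcomes, boundaries, controls, operations, and evidence.
| Contract field | Must state clearly | Counter-example |
|---|---|---|
| Capability | The business capability being accepted, not just a technical function | "AI has been integrated" |
| Population | Customer / employee / account / transaction / region / language / channel / product scope | "All users" with no segment evidence |
| Decision impact | Whether AI advises, ranks, summarizes, decides automatically, triages, or executes tools | No statement of how AI output affects business action |
| Success criteria | Accuracy, completion rate, human override rate, error tolerance, latency, cost, customer impact thresholds | "The business thinks the experience is good" |
| Risk criteria | Prohibited outputs, escalation rules, vulnerable-customer protection, exception handling, explainability | Risk written only in a separate risk document |
| Control criteria | HITL, dual control, policy guardrail, tool approval, logging, access | Controls with no tests and no evidence |
| Regression scope | model, prompt, RAG, tool, workflow, policy, data, UI/API | UI click regression only |
| Evidence required | eval report, test run, session record, defect summary, approval, monitoring config | An email reply saying "approved" |
| Decision owner | business owner, risk owner, operations owner, technology owner | Only the project manager signs, on behalf of others |
| Expiry / re-cert trigger | When recertification is required | Changing the prompt / knowledge base / tools does not trigger UAT |
Example: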
acceptance_id: ACC-KYC-DOC-AI-ONBOARDING-2026Q3
capability: AI-assisted document review for retail account opening
population: mobile channel, English/Spanish, domestic ID + utility bill journeys
decision_impact: AI extracts fields and recommends pass/review; final reject remains human-controlled
success_criteria:
- golden journey pass rate >= 98%
- unsupported rejection recommendation = 0 for protected scenarios in test pack
- manual review queue increase <= 12% versus baseline in parallel run
risk_criteria:
- vulnerable customer and accessibility scenarios routed without loss of recourse
- policy uncertainty triggers human review
control_criteria:
- prompt/model/version logged for every recommendation
- source document hash and extraction evidence retained
- override reason required for human reviewer changes
recertification_trigger:
- model major/minor version change
- prompt or policy rule change affecting decision boundary
- document taxonomy or OCR provider change
- complaint spike or monitoring breach
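The success criteria in the example contract only become evidence once measured values are compared against them. A minimal sketch of that comparison follows; the thresholds are taken from the example above, while the metric names and the measured values are hypothetical.

```python
import operator

# (metric name, comparison, threshold). Thresholds come from the example contract;
# the metric names are hypothetical labels for this sketch.
SUCCESS_CRITERIA = [
    ("golden_journey_pass_rate", operator.ge, 0.98),
    ("unsupported_rejections_protected", operator.eq, 0),
    ("manual_review_queue_increase", operator.le, 0.12),
]


def evaluate(measured: dict[str, float]) -> dict[str, bool]:
    """Compare each measured value with its threshold; a missing metric fails."""
    results = {}
    for name, compare, threshold in SUCCESS_CRITERIA:
        value = measured.get(name)
        results[name] = value is not None and compare(value, threshold)
    return results


# Hypothetical parallel-run measurements.
measured = {
    "golden_journey_pass_rate": 0.985,
    "unsupported_rejections_protected": 0,
    "manual_review_queue_increase": 0.15,
}
results = evaluate(measured)
for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'FAIL'}")
```

Treating a missing metric as a failure, rather than skipping it, keeps an unmeasured criterion from passing silently.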
5. Golden Journey Library
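A golden journey is the core asset of business acceptance. It is not a screenshot script but a reusable case that captures end-to-end business facts, AI behavior, control points, and the evidence path.
5.1 Layers a Golden Journey Should Cover
| Journey type | Examples | Acceptance focus |
|---|---|---|
| Happy path | Normal account opening, normal payment dispute, normal customer-service policy Q&A | End-to-end completion rate, latency, evidence completeness |
| High-value / high-risk | Large transfers, credit limit adjustments, AML high-risk alerts | Control escalation, human review, logging and approval |
| Exception path | Incomplete documents, system timeout, no answer in the knowledge base, tool call failure | Degradation, refusal, remediation, operations queue |
| Customer harm prevention | Wrongful rejection, misleading advice, vulnerable customers, language barriers | recourse, accessibility, clear explanation |
| Operational stress | Close period, peak transactions, vendor degradation, manual queue congestion | BCM, SLO, degraded mode |
| Compliance-sensitive | KYC, AML, complaints, credit explanations, fee disputes | Source citation, prohibited conclusions, permission boundaries |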
5.2 Golden Journey Card
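| Field | Content |
|---|---|
| journey_id | Stable ID, e.g. GJ-AML-INVESTIGATION-ESCALATE-017 |
| business objective | The business objective this journey proves |
| persona / segment | Customer, employee, channel, language, product, risk segment |
| preconditions | Account state, permissions, data state, model / knowledge base version |
| steps | Business steps, not just UI actions |
| AI touchpoints | Model output, RAG retrieval, tool calls, recommendations, summaries |
| expected behavior | Acceptable output, refusal, escalation, human review |
| controls | HITL, policy check, logging, data mask, dual approval |
| evidence | trace id, eval run, screen-free data proof, approval, defect link |
| pass / fail | Judged against criteria, not the tester's subjective impression |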
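5.3 Architectural Value of Golden Journeys
| Architectural value | Description |
|---|---|
| Change impact | A change to model, prompt, RAG, tools, or process can be traced back to the journeys it affects |
| Regression reuse | Each release reruns the key journeys instead of reinventing UAT scripts |
| Evidence continuity | The same journey stays consistent across UAT, shadow, parallel run, and production sampling |
| Audit readiness | Shows that business acceptance is not arbitrary sampling but risk-based journey coverage |
| Portfolio learning | Production defects are written back to the journey library, turning incidents into long-lived regression assets |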
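The "change impact" row amounts to a reverse lookup from a changed AI object to the journeys that depend on it. A minimal sketch under stated assumptions: only the first journey id appears in this document; the other ids and every version string are hypothetical.

```python
from collections import defaultdict

# journey_id -> AI objects recorded in the journey card's preconditions.
# Only the first id is from this document; the rest are hypothetical.
JOURNEYS = {
    "GJ-AML-INVESTIGATION-ESCALATE-017": {"model": "m-1.4", "rag_corpus": "aml-policy-v7"},
    "GJ-KYC-DOC-RESUBMIT-003": {"model": "m-1.4", "prompt": "kyc-extract-v12"},
    "GJ-FEE-DISPUTE-QA-009": {"rag_corpus": "fees-v3", "prompt": "policy-qa-v5"},
}


def build_index(journeys):
    """Invert the journey cards: (object kind, version) -> set of journey ids."""
    index = defaultdict(set)
    for journey_id, objects in journeys.items():
        for kind, version in objects.items():
            index[(kind, version)].add(journey_id)
    return index


def impacted_journeys(index, changed):
    """Return the sorted journey ids touched by any changed (kind, version) pair."""
    hit = set()
    for obj in changed:
        hit |= index.get(obj, set())
    return sorted(hit)


index = build_index(JOURNEYS)
print(impacted_journeys(index, [("model", "m-1.4")]))
```

A real library would also need to capture indirect dependencies, such as a journey that uses a tool whose schema changed; this sketch only covers dependencies declared on the card.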
6. Synthetic Transaction Packs
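AI UAT cannot rely on real historical data alone. Retail-finance scenarios often need synthetic transaction packs to cover rare, sensitive, boundary, and high-risk situations.
| Pack type | Coverage | Design requirement |
|---|---|---|
| Boundary pack | Near-threshold values, boundary limits, boundary dates, boundary rules | Must show that both rule boundaries and model boundaries are stable |
| Rare event pack | Fraud patterns, AML typologies, complaint escalation, disaster recovery scenarios | Cannot wait for them to occur naturally in production |
| Protected / sensitive segment pack | Language, age band, disability assistance, low digital literacy, vulnerable customers | Used to test fairness, accessibility, and customer harm controls |
| Privacy pack | PII, masked data, tokenized customer, data minimization | Ensures evidence is useful without exposing real sensitive information |
| Adversarial pack | prompt injection, policy bypass, jailbreak, malicious document | Validates AI guardrails and the tool gateway |
| Operational pack | Timeouts, duplicate submissions, third-party unavailability, queue congestion | Validates degradation, recovery, and operations runbooks |
Evidence fields for a synthetic pack: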
pack_id
generation_method
source_pattern
privacy_classification
business_rationale
covered_acceptance_criteria
covered_risks_and_controls
expected_outputs
reviewer
version
retention_class
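Key points:
- Synthetic does not mean making up data at random; it needs a business rationale, a risk rationale, and expected results.
- Synthetic data can lower test privacy risk, but the generation method, sensitive attributes, re-identification risk, and retention policy still need governance.
- For high-impact scenarios, the synthetic expected output should be reviewed by the business, risk, or policy owner; it must not be generated by AI alone.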
7. Persona / Segment / Channel Coverage
The maturity of AI UAT depends on whether it can prove coverage of the real business distribution and of high-risk edge populations.
| Coverage axis | Retail-finance examples | Acceptance evidence |
|---|---|---|
| Customer segment | New customers, existing customers, high net worth, thin credit file, students, retirees, small business owners | segment matrix, sample count, pass/fail |
| Vulnerability | Elderly customers, customers with disabilities, low English proficiency, financial hardship, scam exposure risk | inclusive journey, recourse test, accessibility evidence |
| Product | checking, credit card, mortgage, personal loan, investment referral | product-specific journey coverage |
| Channel | mobile, web, branch, contact center, back-office workbench | channel journey result, handoff evidence |
| Geography / jurisdiction | State, country, cross-border, data residency | routing, language, policy source version |
| Employee role | front line, supervisor, analyst, operations QA, admin | RBAC, workflow permission, training evidence |
| Risk segment | AML risk tier, fraud risk score, credit band, complaint severity | control escalation and override evidence |
| Accessibility | screen reader, keyboard navigation, contrast, plain language | accessibility test, manual assist path |
A senior PM / BA does not need to brute-force the Cartesian product of every combination. A better approach is risk-based pairwise coverage + material scenario coverage:
material segments
+ high-risk journey
+ policy-sensitive output
+ known production defect history
+ accessibility / recourse path
-> required UAT coverage
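The pairwise half of that formula can be checked mechanically: every pair of values across two axes should appear in at least one test case. The sketch below shows that check only, and the axis values are hypothetical. Material scenarios (high-risk journeys, known defect history, recourse paths) would be added on top by their owners and are not derived here.

```python
from itertools import combinations, product

# Hypothetical coverage axes and values.
AXES = {
    "segment": ["new", "thin_file", "retired"],
    "channel": ["mobile", "branch"],
    "language": ["en", "es"],
}


def required_pairs(axes):
    """Every value pair across every two distinct axes, with axes in sorted order."""
    pairs = set()
    for (a, a_vals), (b, b_vals) in combinations(sorted(axes.items()), 2):
        for va, vb in product(a_vals, b_vals):
            pairs.add(((a, va), (b, vb)))
    return pairs


def covered_pairs(cases):
    """Value pairs actually exercised by the given test cases."""
    pairs = set()
    for case in cases:
        for x, y in combinations(sorted(case.items()), 2):
            pairs.add((x, y))
    return pairs


def pairwise_gaps(axes, cases):
    """Pairs that no test case covers yet."""
    return required_pairs(axes) - covered_pairs(cases)


cases = [{"segment": "new", "channel": "mobile", "language": "en"}]
print(len(required_pairs(AXES)), len(pairwise_gaps(AXES, cases)))
```

With these axes the full Cartesian product is 12 cases, while pairwise coverage needs only 16 pairs to be hit, and a single case already covers three of them.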
8. Risk / Control Coverage
UAT coverage cannot be measured only by test cases passed. Risk/control coverage should be calculated as well.
| Risk | Acceptance control | UAT / regression evidence |
|---|---|---|
| AI output error leads the customer to take a wrong action | source grounding, confidence threshold, HITL, customer-facing language review | golden Q&A, narrative fact-check, reviewer approval |
| AI automation amplifies the scale of errors | feature flag, rate limit, kill switch, shadow/parallel run | release config, kill switch drill, monitoring threshold |
| Business rule bypass | policy engine, tool gateway, denylist/allowlist, dual approval | adversarial pack, tool trajectory test |
| Privacy leakage | data minimization, masking, test data controls, logging redaction | privacy test, access review, sample log inspection |
| Unfairness or segment harm | segment eval, threshold review, recourse path | segment matrix, complaint pathway replay |
| Operations queue becomes unsustainable | capacity model, queue monitor, manual fallback | parallel run workload comparison |
| Not auditable | trace id, version capture, evidence vault, retention | evidence completeness query |
| Rollback failure | backward-compatible config, data migration plan, manual runbook | rollback rehearsal and decision criteria |
Risk/control coverage questions:
- Which high-risk acceptance criteria have no test asset?
- Which controls have only a design document and no operating evidence?
- Which production incidents have not been written back to the regression pack?
- Which release exceptions have no expiry and no compensating control?
9. AI Regression Matrix
AI regression is not just code regression. Every AI release must declare which behavioral assets have changed.
| Change object | Possible impact | Required regression |
|---|---|---|
| Foundation / vendor model | Output style, reasoning, refusals, tool calls, latency, cost | golden eval, red-team pack, latency/cost, high-risk journeys |
| Fine-tuned / task model | Segment performance, thresholds, error types | segment eval, calibration, backtest, parallel run |
| Prompt / system instruction | policy boundary, tone, citation, tool choice | prompt regression, adversarial prompt pack |
| RAG corpus | Factual answers, outdated policies, citation accuracy | retrieval eval, source freshness, citation correctness |
| Retriever / embedding | recall, ranking, source drift | retrieval benchmark, known-answer set |
| Tool schema / API | action accuracy, side effect, authorization | tool trajectory test, contract test, negative cases |
| Policy engine | allow/deny, escalation, geography/product rules | policy decision table regression |
| Workflow state machine | handoff, queue, exception, retry, timeout | workflow replay, state transition tests |
| UI / UX | user comprehension, accessibility, operator error | UAT journey, accessibility, training validation |
| Feature flag / rollout config | population exposure, rollback speed | rollout rehearsal, kill switch verification |
| Data pipeline / feature | model input distribution, missingness, timeliness | DQ, drift baseline, backfill/replay |
An AI regression certificate should declare:
changed_objects
unchanged_but_impacted_objects
test_assets_run
criteria_passed
defects_open
exceptions_accepted
monitoring_updated
rollback_ready
recertification_triggers
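The regression matrix can drive the certificate directly: declared changed objects determine the required test assets, and anything not run shows up as a gap. A minimal sketch, assuming a trimmed-down matrix with four rows from the table above; the dictionary keys and the `certifiable` flag are hypothetical names.

```python
# A subset of the regression matrix, keyed by hypothetical object names.
REGRESSION_MATRIX = {
    "prompt": {"prompt regression", "adversarial prompt pack"},
    "rag_corpus": {"retrieval eval", "source freshness", "citation correctness"},
    "policy_engine": {"policy decision table regression"},
    "workflow": {"workflow replay", "state transition tests"},
}


def certificate(changed_objects, test_assets_run):
    """Build a partial regression certificate and flag missing regression."""
    required = set()
    unmapped = []
    for obj in changed_objects:
        if obj in REGRESSION_MATRIX:
            required |= REGRESSION_MATRIX[obj]
        else:
            unmapped.append(obj)  # a changed object with no matrix row is itself a gap
    missing = sorted(required - set(test_assets_run))
    return {
        "changed_objects": sorted(changed_objects),
        "test_assets_run": sorted(test_assets_run),
        "missing_regression": missing,
        "unmapped_objects": sorted(unmapped),
        "certifiable": not missing and not unmapped,
    }


cert = certificate(
    ["prompt", "rag_corpus"],
    ["prompt regression", "retrieval eval", "citation correctness"],
)
print(cert["missing_regression"], cert["certifiable"])
```

The sketch covers only `changed_objects` and `test_assets_run`; fields such as `exceptions_accepted` or `rollback_ready` depend on human decisions and are left out on purpose.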
10. Workflow Replay, Shadow Testing, Parallel Run
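10.1 How the Three Differ
| Pattern | Purpose | Evidence |
|---|---|---|
| Workflow replay | Replay end-to-end state transitions using historical or synthetic events | state transition result, tool call trace, exception path |
| Shadow testing | Run AI alongside production traffic without affecting real decisions | output comparison, risk signal, latency/cost, no-customer-impact proof |
| Parallel run | Run the old and new processes simultaneously and compare business outcomes | decision delta, workload delta, customer impact assessment, reconciliation |
10.2 Usage Scenarios
| Situation | Recommended approach | Rationale |
|---|---|---|
| High-impact customer decision | parallel run + human review | Old and new results need to be compared against human judgment |
| Generative customer-service assist | shadow test + sampling QA | Output quality can be observed without facing the customer directly |
| AML / fraud analyst copilot | workflow replay + analyst benchmark | Proves evidence completeness and operational efficiency |
| RAG policy Q&A | golden set + retrieval replay | Checks knowledge sources, citations, and refusals |
| Tool-using agent | workflow replay + sandbox tool trajectory | Proves authorization, parameters, side effects, and failure handling |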
10.3 Parallel Run Decision Signals
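| Signal | Green | Amber | Red |
|---|---|---|---|
| Decision delta | Differences match expectations and are fully explained | Differences concentrate in specific segments | Unexplained differences affect high-risk customers or key controls |
| Manual workload | Within the capacity model | Short-term increase with a plan | Queue exceeds SLA or affects customers |
| Error taxonomy | Error types are acceptable and monitored | Repeated errors exist but compensating controls are effective | Prohibited or unexplainable errors appear |
| Control adherence | HITL / logging / policy gate complete | A few evidence gaps with a remediation plan | Key controls lack evidence |
| Customer impact | No new harm signal | Manageable complaint / appeal signals | Customer remediation or regulator-sensitive scenarios affected |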
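Once each signal has been rated by its owner, the ratings need a roll-up rule. The sketch below assumes a "worst signal wins" rule, which this document does not prescribe; an institution could reasonably choose a different aggregation. The signal keys are hypothetical names for the five rows above.

```python
from enum import IntEnum


class Status(IntEnum):
    GREEN = 0
    AMBER = 1
    RED = 2


def overall(signals: dict[str, Status]) -> Status:
    """Assumed roll-up rule: the overall status is the worst individual signal."""
    return max(signals.values())


def blocking(signals: dict[str, Status]) -> list[str]:
    """Names of the signals currently rated red."""
    return sorted(name for name, status in signals.items() if status is Status.RED)


# Hypothetical ratings from a parallel run review.
signals = {
    "decision_delta": Status.AMBER,
    "manual_workload": Status.GREEN,
    "error_taxonomy": Status.GREEN,
    "control_adherence": Status.RED,
    "customer_impact": Status.GREEN,
}
print(overall(signals).name, blocking(signals))
```

The rating of each signal stays a human judgment against the table; the code only aggregates what reviewers have already decided.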
11. Defect Triage Architecture
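AI defects cannot be classified by technical severity alone. They should be organized by business impact + control impact + recurrence + release decision.
| Severity | Criteria | Disposition |
|---|---|---|
| Critical | Prohibited output, major customer impact, key control failure, privacy leak, submission/transaction error, inability to roll back | block release, evidence freeze, executive escalation |
| High | High-risk journey failure, segment harm, major manual workaround, missing monitoring, recurring defect | release hold or exception committee approval |
| Medium | Localized journey failure with a compensating control, or a non-critical segment performing below threshold | fix before scale or limited release with monitoring |
| Low | Copy, low-risk UI, non-critical test data issues | normal backlog, no silent exclusion |
A defect record must include: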
defect_id
impacted_acceptance_criteria
impacted_journeys
impacted_segments
ai_object_version
root_cause: model | prompt | RAG | tool | data | workflow | UI | control | training
customer_or_employee_impact
control_impact
fix_or_exception_decision
retest_evidence
release_disposition
AI can help cluster defects, identify recurring root causes, and map defects to requirements/tests, but it must not automatically lower severity or close defects.
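That boundary can be enforced in the defect workflow itself rather than left to convention. A minimal sketch follows; the `Defect` class, the `actor_is_human` flag, and both functions are hypothetical and do not correspond to any particular ticketing tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Defect:
    defect_id: str
    severity: Severity
    status: str = "open"


def change_severity(defect: Defect, new: Severity, actor_is_human: bool) -> bool:
    """Apply a severity change; reject any downgrade proposed by a non-human actor."""
    if new < defect.severity and not actor_is_human:
        return False
    defect.severity = new
    return True


def close(defect: Defect, actor_is_human: bool) -> bool:
    """Only an accountable human may close a defect."""
    if not actor_is_human:
        return False
    defect.status = "closed"
    return True


d = Defect("D-101", Severity.HIGH)
print(change_severity(d, Severity.LOW, actor_is_human=False), d.severity.name)
```

An AI assistant may still raise severity in this sketch, since escalation errs on the safe side; whether that is desirable is a policy choice.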
12. Release Certification
Release certification is an evidence bundle, not meeting minutes.
12.1 Certification Packet
| Section | Content |
|---|---|
| Release identity | release id、model/prompt/RAG/tool/workflow/config versions |
| Scope | products、channels、segments、roles、regions、feature flags |
| Acceptance criteria result | criteria、threshold、actual result、owner decision |
| Golden journey coverage | run list、pass/fail、defects、evidence links |
| Synthetic / replay / shadow / parallel evidence | pack ids、sample counts、comparison results |
| Regression matrix | changed objects、impacted tests、unrun rationale |
| Risk/control evidence | high-risk controls, AI guardrails, logging, privacy, accessibility |
| Defect summary | open defects、severity、disposition、retest |
| Exceptions | accepted risk, compensating controls, expiry, monitoring |
| Operational readiness | runbook、training、support model、BCP/degraded mode、capacity |
| Post-release monitoring | metrics、thresholds、owner、review cadence、rollback triggers |
| Decision | release, limited pilot, hold, rollback, exception release |
| Sign-off | business, ops, technology, risk/control, AI governance roles |
12.2 Decision Types
| Decision | When it applies | Required evidence |
|---|---|---|
| Full release | criteria met, no critical/high unresolved, monitoring and rollback ready | complete certification packet |
| Limited pilot | material criteria met for limited population, residual risk bounded | exposure cap, monitoring, exit criteria |
| Exception release | known defect accepted by authorized owner | risk acceptance, compensating control, expiry |
| Hold | critical criteria missing or evidence incomplete | blockers, owners, target remediation |
| Rollback / no-go | production or pre-prod signal violates stop rule | rollback plan, communication, incident link |
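The decision types can be read as an ordered set of checks. The sketch below is one possible ordering, assumed for illustration: a stop rule wins over everything, incomplete evidence forces a hold, and an exception release is only reachable when an authorized owner has accepted the open defect. The parameter names are hypothetical, and the function recommends a decision type for human review rather than making the decision.

```python
def release_decision(
    *,
    stop_rule_violated: bool,
    evidence_complete: bool,
    criteria_met: bool,
    open_critical_or_high: int,
    exception_accepted: bool,
    limited_population: bool,
    monitoring_and_rollback_ready: bool,
) -> str:
    """Map certification inputs to a candidate decision type (assumed ordering)."""
    if stop_rule_violated:
        return "rollback / no-go"
    if not evidence_complete or not monitoring_and_rollback_ready:
        return "hold"
    if not criteria_met:
        return "hold"
    if open_critical_or_high == 0:
        return "limited pilot" if limited_population else "full release"
    if exception_accepted:
        return "exception release"
    return "hold"


print(release_decision(
    stop_rule_violated=False,
    evidence_complete=True,
    criteria_met=True,
    open_critical_or_high=1,
    exception_accepted=True,
    limited_population=False,
    monitoring_and_rollback_ready=True,
))
```

Keyword-only arguments keep call sites readable, since a positional list of seven booleans would be easy to misorder.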
13. Exception / Risk Acceptance
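An AI release exception is not "the business is aware". It must be a structured risk acceptance.
| Field | Content |
|---|---|
| exception_id | stable id |
| linked release / criteria / defect | Links the release, acceptance criteria, and defect |
| residual risk | Business, customer, operational, model, privacy, security, or control risk |
| impacted population | Customers, employees, transactions, channels, segments |
| compensating control | Rate limiting, human review, sampling, monitoring, manual fallback |
| expiry | Exception expiry date or scale gate |
| owner | Business / risk / management role authorized to accept this risk |
| monitoring | Metrics, thresholds, review cadence |
| closure criteria | Criteria for fix and recertification |
Unacceptable exceptions:
- A critical control is missing but full release is requested.
- A "known issue" with no owner.
- A long-running exception with no expiry.
- An AI summary used in place of the risk acceptance rationale.
- The scope of customer/employee impact is not disclosed.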
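Several of these unacceptable cases can be detected mechanically before an exception reaches a committee. A minimal sketch, assuming exceptions are held as dictionaries: the snake_case field names mirror the table above, while `critical_control_missing` and `requested_decision` are hypothetical extra fields.

```python
from datetime import date

REQUIRED_FIELDS = (
    "exception_id", "residual_risk", "impacted_population", "compensating_control",
    "expiry", "owner", "monitoring", "closure_criteria",
)


def exception_problems(record: dict, today: date) -> list[str]:
    """List reasons an exception record cannot be accepted as written."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    expiry = record.get("expiry")
    if isinstance(expiry, date) and expiry <= today:
        problems.append("expired")
    if record.get("critical_control_missing") and record.get("requested_decision") == "full release":
        problems.append("critical control missing with full release requested")
    return problems


# Hypothetical record: no owner, and the expiry has already passed.
record = {
    "exception_id": "EXC-7",
    "residual_risk": "segment performance below threshold",
    "impacted_population": "Spanish-language mobile onboarding",
    "compensating_control": "100% human review",
    "expiry": date(2026, 6, 1),
    "owner": "",
    "monitoring": "weekly segment eval",
    "closure_criteria": "segment eval >= threshold and recertification",
}
print(exception_problems(record, today=date(2026, 7, 1)))
```

The check cannot judge whether the stated rationale is adequate or whether the owner actually holds acceptance authority; those remain human review items.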
14. Test Data Governance, Privacy, Accessibility
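14.1 Test Data Governance
| Concern | Guardrail |
|---|---|
| Production data use | Record necessity, authorization, masking/anonymization, access control, environment protection, retention period |
| Synthetic data | Record generation logic, sensitive attributes, expected outputs, reviewer, re-identification risk |
| Prompt / output logs | Avoid retaining unnecessary plaintext PII; use hash, redaction, structured metadata |
| Evidence retention | Evidence has an owner, retention class, legal hold path, access log |
| Third-party model test | Avoid sending restricted data to unapproved models or vendor environments |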
14.2 Privacy-by-UAT
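UAT needs to prove:
- AI can complete the task without excessive context.
- The test environment and evidence store do not leak sensitive customer information.
- Human reviewers see only the information needed to complete the task.
- Logs support replay without creating a new data exposure surface.
- UAT defects do not spread PII through screenshots, chat logs, or spreadsheets.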
14.3 Accessibility and Inclusive Acceptance
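| Area | Acceptance point |
|---|---|
| Digital accessibility | screen reader, keyboard, contrast, focus order, error message |
| Language | plain language, translation consistency, non-English policy accuracy |
| Cognitive load | Whether AI suggestions cause employees to over-rely or customers to misunderstand |
| Recourse | Whether customers can appeal, reach a human, and obtain an explanation |
| Employee accessibility | Whether the workbench supports employees of differing abilities in completing review and override |
Accessibility is not a UI appendix; it is part of business acceptance. Any AI that influences decisions or advice affecting customers must be shown to be understandable, appealable, and escalatable to a human.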
15. Audit Trail and Evidence Objects
Every UAT / certification evidence object should have a structure:
| Field | Content |
|---|---|
| evidence_id | stable id |
| evidence_type | test_run、UAT_session、eval_report、shadow_report、parallel_run、approval、exception |
| release_id | release / feature flag / deployment batch |
| AI object versions | model、prompt、retriever、corpus、tool、policy、workflow |
| linked criteria | acceptance criteria ids |
| linked controls | control ids |
| produced_by | system、tester、business owner、AI assistant、release workflow |
| produced_at | timestamp |
| reviewer | accountable human reviewer where applicable |
| integrity | checksum、immutable store、access log |
| retention | retention class and legal hold path |
Evidence query examples:
| Query | Release readiness value |
|---|---|
| Show all high-risk criteria without evidence | Finds blockers |
| Show all model/prompt changes without regression | Finds change blind spots |
| Show all golden journeys failed in last 3 releases | Finds systemic quality problems |
| Show all accepted exceptions expiring in 30 days | Manages residual risk |
| Show all production incidents not mapped to regression pack | Prevents incidents from going unlearned |
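The first query in the table can be sketched against a tiny relational model to show what "queryable evidence" means in practice. The two-table schema, the ids, and the in-memory SQLite store are assumptions for illustration; a real evidence vault would carry the full field set listed above.

```python
import sqlite3

# Hypothetical minimal schema: criteria and the evidence objects linked to them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE criteria (criteria_id TEXT PRIMARY KEY, risk TEXT);
CREATE TABLE evidence (evidence_id TEXT PRIMARY KEY, criteria_id TEXT, evidence_type TEXT);
INSERT INTO criteria VALUES ('ACC-1', 'high'), ('ACC-2', 'high'), ('ACC-3', 'low');
INSERT INTO evidence VALUES ('EV-10', 'ACC-1', 'test_run');
""")


def high_risk_without_evidence(conn: sqlite3.Connection) -> list[str]:
    """Show all high-risk criteria that have no linked evidence object."""
    rows = conn.execute("""
        SELECT c.criteria_id
        FROM criteria AS c
        LEFT JOIN evidence AS e ON e.criteria_id = c.criteria_id
        WHERE c.risk = 'high' AND e.evidence_id IS NULL
        ORDER BY c.criteria_id
    """).fetchall()
    return [row[0] for row in rows]


print(high_risk_without_evidence(conn))
```

The `LEFT JOIN ... IS NULL` pattern generalizes to the other queries, for example incidents with no linked regression pack.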
16. Operational Readiness and Post-Release Monitoring
If UAT passes but operations are not ready, the release should still not proceed.
16.1 Operational Readiness
| Area | Evidence |
|---|---|
| Runbook | normal、exception、degraded mode、kill switch、rollback |
| Training | business users、reviewers、supervisors、support desk training completion |
| Support model | L1/L2/L3 owner、AI governance escalation、vendor support |
| Monitoring | quality、latency、cost、drift、tool errors、complaints、queue load |
| Capacity | manual review queue、back-office SLA、call center handle time |
| Communication | employee release note、customer communication path where applicable |
| BCP / resilience | fallback process、manual workaround、recovery exercise |
16.2 Post-Release Monitoring Metrics
| Metric | Purpose |
|---|---|
| Business outcome | completion rate、cycle time、conversion、case resolution |
| Quality | accuracy、citation correctness、unsupported claim、tool action success |
| Risk | policy violation、high-risk escalation miss、privacy event、complaint |
| Segment | performance by customer segment、language、channel、product |
| Operations | manual review volume、queue aging、override rate、training issue |
| Reliability | latency、error rate、timeout、vendor degradation |
| Cost | cost per successful task、token spend, human review cost |
| Adoption | employee usage、acceptance/override pattern、misuse signals |
16.3 Rollback Criteria
Rollback criteria must be pre-agreed:
| Trigger | Response |
|---|---|
| critical customer harm signal | disable feature flag, route to manual, incident workflow |
| prohibited output observed | freeze version, block prompt/model, QA sample expansion |
| control logging failure | pause release cohort, restore prior version, evidence gap assessment |
| manual queue exceeds capacity | reduce exposure, switch to assisted-only mode |
| model or retrieval drift breach | revert model/corpus, run targeted regression |
| vendor outage / latency | fallback provider or manual process |
| privacy incident | stop affected flow, preserve evidence, authorized response workflow |
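Pre-agreed criteria are easiest to honor when the trigger-to-response table exists as data that the on-call runbook can read. A minimal sketch covering three of the rows above; the trigger keys are hypothetical identifiers, and unknown triggers raise an error instead of being silently ignored.

```python
# Responses copied from three rows of the table; trigger keys are hypothetical.
ROLLBACK_PLAYBOOK = {
    "critical_customer_harm": ["disable feature flag", "route to manual", "incident workflow"],
    "control_logging_failure": ["pause release cohort", "restore prior version", "evidence gap assessment"],
    "manual_queue_over_capacity": ["reduce exposure", "switch to assisted-only mode"],
}


def response_plan(observed_triggers: list[str]) -> list[str]:
    """Return the ordered, de-duplicated actions for all observed triggers."""
    plan: list[str] = []
    for trigger in observed_triggers:
        if trigger not in ROLLBACK_PLAYBOOK:
            raise ValueError(f"no pre-agreed response for trigger: {trigger}")
        for action in ROLLBACK_PLAYBOOK[trigger]:
            if action not in plan:
                plan.append(action)
    return plan


print(response_plan(["manual_queue_over_capacity", "critical_customer_harm"]))
```

Raising on an unmapped trigger is a deliberate choice: a signal with no agreed response is a gap in the rollback criteria, not something to skip.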
17. AI's Role in UAT
AI can accelerate acceptance work, but cannot own final acceptance.
| AI assist use case | Value | Boundary |
|---|---|---|
| Generate candidate test cases | expands coverage across journeys and edge cases | human owner approves expected outcome and risk relevance |
| Find coverage gaps | maps requirements to tests, controls and evidence | deterministic traceability remains source of truth |
| Cluster defects | identifies recurring themes and root causes | severity/disposition remains accountable human decision |
| Map requirements to tests | builds traceability graph | cannot invent missing approvals or criteria |
| Summarize evidence | creates release memo draft | summary must cite evidence ids and be reviewed |
| Generate synthetic scenarios | improves rare-event coverage | sensitive attributes and expected outcomes need governance |
| Review UAT transcripts | finds repeated confusion, training gaps | privacy and access controls required |
| Suggest regression impact | links changed object to tests | architect/release owner confirms final scope |
Prohibited:
- AI does not sign UAT.
- AI does not accept residual risk.
- AI does not decide that a defect is immaterial.
- AI does not approve release certification.
- AI does not replace independent validation or internal audit.
- AI does not create evidence after the fact to fill missing controls.
18. PM / BA / Architect Implications
| Role | Senior responsibility | Key deliverables |
|---|---|---|
| PM | Turn go-live from feature delivery into business capability certification | acceptance strategy, release decision memo, readiness dashboard |
| Senior BA | Write business objectives, journeys, policy, exceptions, and evidence into a verifiable acceptance contract | acceptance criteria contract, coverage matrix, UAT scripts |
| Solution Architect | Design evidence capture, versioning, rollback, and monitoring into the architecture | reference architecture, traceability model, rollback design |
| QA / Test Architect | Productize golden journeys, synthetic packs, automation regression, and the defect taxonomy | test asset library, regression suite, quality gates |
| AI Governance / Model Risk | Define AI-specific acceptance, model/prompt/RAG/tool regression, and monitoring | AI use register, eval thresholds, risk acceptance matrix |
| Operations Owner | Prove that queues, training, runbooks, support, and fallback are ready | operational readiness pack |
| Risk / Compliance | Confirm control coverage, exception ownership, and policy-sensitive scenarios | control checklist, exception decision record |
Senior-level interview framing:
I would not treat AI UAT as a business pilot sign-off. I would design it as an acceptance evidence architecture: every business acceptance criterion links to golden journeys, synthetic packs, persona/segment coverage, risk/control coverage, AI regression results, defect disposition, exception acceptance, operational readiness, and post-release monitoring. AI can help us generate candidate tests, find coverage gaps, cluster defects, and summarize evidence, but final business acceptance and risk acceptance must be completed by an authorized owner.
19. Anti-Patterns
| Anti-pattern | Risk | Fix |
|---|---|---|
| UAT as email sign-off | Sign-off is disconnected from criteria, versions, and evidence | structured release certification packet |
| Happy-path-only UAT | Real production exceptions and high-risk segments go uncovered | golden journey + synthetic + risk-based coverage |
| UI click scripts for AI behavior | Proves only that pages can be clicked, not model / tool / retrieval behavior | AI regression matrix and trace evidence |
| Production data copied into UAT uncontrolled | Sensitive information spreads and test evidence becomes a privacy risk | sanitized/synthetic data and documented exception controls |
| Defects closed by meeting consensus | No root cause, retest, or customer impact | defect taxonomy and retest evidence |
| Prompt change treated as minor copy edit | Behavior boundary changes with no regression | prompt versioning and regression trigger |
| RAG update without acceptance impact analysis | Changes in factual answers go unseen | corpus versioning and retrieval golden set |
| Shadow test without comparison criteria | The shadow run happened but cannot support a decision | predefined comparison metrics and thresholds |
| Parallel run with no workload analysis | The new process overwhelms the manual queue | capacity and queue readiness metrics |
| Exception with no expiry | Residual risk becomes permanent | owner, expiry, compensating control, monitoring |
| AI writes certification memo unsourced | The evidence summary may hallucinate or overstate | evidence id citation and human approval |
20. Interview Questions
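Q1: What is the biggest difference between UAT for AI systems and traditional UAT?
30-second version: Traditional UAT usually proves that processes and screens meet requirements. AI UAT must prove that a business capability is acceptable under a specific model, prompt, RAG, tool, data, workflow, and control version, and that it covers high-risk journeys, segments, controls, regression, operational readiness, and post-release monitoring.
2-minute version: AI systems carry more uncertainty, and their behavior depends more heavily on data, models, prompts, knowledge bases, and tool calls. My approach is to turn UAT from sign-off into an evidence architecture. First define the acceptance claim and criteria, then design the golden journey library, synthetic transaction packs, persona/segment matrix, and risk/control coverage. Next run model/prompt/RAG/tool/workflow regression, and use workflow replay, shadow testing, or parallel run to show that behavioral differences are acceptable. Finally assemble a release certification packet that records defects, exceptions, risk acceptance, operational readiness, monitoring, and rollback criteria.
Q2: How do you judge whether an AI release can be certified for go-live?
30-second version: I look at six things: whether business acceptance criteria are met, whether high-risk journeys and segments are covered, whether AI regression is complete, whether critical/high defects are closed or accepted under proper authority, whether operational readiness is in place, and whether monitoring/rollback is predefined.
2-minute version: Release certification is not about the test pass rate. I need to see the release identity, including model, prompt, RAG, tool, workflow, and feature flag versions. Each high-risk acceptance criterion should link to test evidence and an owner decision. Defects need severity, root cause, impact, retest, and disposition. Exceptions must have a risk owner, expiry, compensating control, and monitoring. On the operations side there must be runbooks, training, support, capacity, and fallback. After release there must be quality, risk, segment, operations, cost, and complaint metrics, plus explicit rollback triggers.
Q3: What is the difference between a golden journey and a synthetic transaction pack?
30-second version: A golden journey is an end-to-end business journey asset used to prove that the real process is acceptable. A synthetic transaction pack is a constructed set of transaction / customer / document / event samples used to cover rare, sensitive, boundary, and high-risk situations.
2-minute version: Take AI KYC document review as an example. Golden journeys would cover new-customer account opening, document resubmission, human review, rejection recommendation, and the appeal path. A synthetic pack could construct boundary-case identity documents, low-quality images, language variation, vulnerable customers, fraudulent documents, system timeouts, and privacy-field scenarios. The two must connect: the synthetic pack supplies the input assets, and the golden journey supplies the process and control context. Qualifying evidence is not a screenshot but trace id, versions, expected output, actual output, control result, defect, and reviewer.
Q4: What should AI do in UAT, and what should it not do?
30-second version: AI can generate candidate tests, find coverage gaps, cluster defects, map requirements to tests, and summarize evidence. AI cannot own final acceptance, cannot lower defect severity, cannot accept residual risk, and cannot sign a release.
2-minute version: I would use AI as a testing and evidence copilot. For example, it can generate test candidates from requirements and policy, compare acceptance criteria against test assets to find gaps, cluster defect tickets and suggest root causes, and draft the release evidence summary. These outputs must cite evidence ids and be reviewed by the business, QA, risk, or architecture owner. Final acceptance is an act of authorized accountability, not a model output. Otherwise the team will mistake a polished AI-generated report for a real control.
Q5: How do you choose between shadow testing and parallel run?
30-second version: Shadow testing suits observing AI output first without affecting production decisions; parallel run suits comparing old and new processes against human judgment, especially for high-impact customer decisions. The choice depends on customer impact, decision importance, manual capacity, and evidence needs.
2-minute version: A generative agent-assist tool for customer service can start in shadow mode, where we look at AI suggestions, citations, refusals, latency, and cost without exposing output to customers. Credit, KYC, or fraud decisions fit parallel run better, because we need to compare old and new strategies, human conclusions, segment differences, and queue load. Either way, comparison criteria must be defined up front, for example decision delta, manual workload, error taxonomy, control adherence, and customer impact. A shadow or parallel run without thresholds is only an observation exercise and cannot support release certification.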
21. Minimum Viable Architecture
| Capability | MVP | Scale version |
|---|---|---|
| Acceptance registry | criteria + owner for top AI use cases | enterprise acceptance graph |
| Golden journeys | top 20 risk-based journeys | reusable journey library with coverage analytics |
| Synthetic packs | high-risk edge cases and privacy-safe samples | governed synthetic data factory |
| AI regression | model/prompt/RAG/tool checklist | automated regression certification plane |
| Evidence capture | release packet and evidence index | immutable evidence vault with query |
| Defect governance | severity + root cause + retest | defect analytics and production feedback loop |
| Operational readiness | runbook + monitoring + rollback | integrated release readiness control tower |
| Post-release monitoring | quality/risk/ops metrics | closed-loop eval expansion and recertification |
The final test:
Can the team reconstruct why this AI release was accepted,
which business criteria it satisfied,
which journeys and segments were tested,
which model/prompt/RAG/tool versions were certified,
which risks were accepted,
who accepted them,
and what would trigger rollback after release?
If the answer is no, the problem is not that UAT staff are not working hard enough; it is that the business acceptance architecture has not yet been productized.