AI Architecture Review Gate Checklists
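Positioning: an architecture review gate handbook for AI Solutions Architects, Enterprise Architects, AI PMs and AI BAs.
Goal: take an AI project from "it demos" to "it can ship, be monitored, be rolled back, be audited and prove its value".
Usage: run every AI use case through a gate at each of the four stages: discovery, pilot, release and scale. Do not substitute a single architecture diagram for review evidence.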
Source Anchors
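These sources serve only as anchors for learning and for the review framework. They do not constitute legal, compliance, procurement or audit advice.
| Anchor | Link | How it is used in review |
|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organize risk identification, assessment, monitoring and governance around Govern / Map / Measure / Manage |
| NIST AI 600-1 GenAI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | Cover GenAI-specific risks: hallucination, data leakage, misuse, synthetic content, evaluation |
| EU AI Act | https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng | Apply a risk-based lens to identify high-risk scenarios and the needs for transparency, human oversight and technical documentation |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Review accountability, lifecycle and continual improvement through an AI management system lens |
| OWASP LLM Top 10 | https://owasp.org/www-project-top-10-for-large-language-model-applications/ | Review prompt injection, sensitive information disclosure, excessive agency and similar issues by LLM risk category |
| TOGAF | https://www.opengroup.org/togaf | Frame the review in the language of architecture governance, capability planning, roadmaps and the architecture board |
| C4 Model | https://c4model.com/ | Make architecture boundaries discussable through context/container/component views |
| OMG BPMN | https://www.omg.org/spec/BPMN/ | Review AS-IS / TO-BE processes, exceptions, manual steps and control points |
| BIAN | https://bian.org/deliverables/service-landscape/ | Review financial retail capability boundaries and system integration by banking service domain |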
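1. Why AI architecture review must be split into gates
Traditional system reviews usually focus on:
- Service boundaries.
- Databases.
- APIs.
- Security.
- Deployment.
- Availability.
AI systems must also be reviewed for:
- Whether the business problem is actually a fit for AI.
- What a wrong output costs the business.
- Whether the model cites the correct evidence.
- Whether retrieval is filtered by permission.
- Whether tool calls can exceed their authorization.
- Whether the eval covers real failure modes.
- Whether prompt/model/knowledge updates have regression tests.
- Whether high-risk outputs go to human review.
- Whether users actually change how they work.
- Who maintains the knowledge base, eval, policy and incident process after launch.
Without gates, teams often discover only after the demo that:
- The data cannot be used.
- Legal/compliance will not accept it.
- The business has no owner.
- There is no method for evaluating model quality.
- The architecture cannot be audited.
- Cost or latency exceeds what the workflow allows.
- Users do not trust or adopt it.
The purpose of gates is to move risk discovery earlier.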
Idea -> Discovery Gate -> Architecture Gate -> Pilot Gate -> Release Gate -> Scale Gate -> Quarterly Review
2. Gate overview
| Gate | Key question | Decision outcome | Core evidence |
|---|---|---|---|
| G0 Intake Gate | Is this use case worth taking into discovery? | accept / park / reject | opportunity brief |
| G1 Business Fit Gate | Are the business problem, users, process and value clear? | continue / refine / stop | Opportunity Canvas, BPMN, stakeholder map |
| G2 Data and Knowledge Gate | Are data, knowledge, permissions and quality sufficient? | ready / conditional / blocked | Data Readiness Pack, source inventory |
| G3 AI Pattern Gate | Should this use RAG, workflow, agent, fine-tuning, rules or a vendor product? | architecture direction | ADR, decision matrix |
| G4 Architecture Gate | Are system boundaries, integration, controls and observability sound? | approve design / revise | C4, data flow, sequence, threat model |
| G5 Eval and Risk Gate | Are requirements testable and risks controllable? | approve pilot / revise | Requirements-to-Eval, control pack |
| G6 Pilot Gate | Can a small-scale pilot start? | launch pilot / no-go | pilot plan, success criteria, rollback |
| G7 Release Gate | Can it be released to production? | release / limited release / no-go | eval report, incident plan, RACI |
| G8 Scale Gate | Can it be extended to more users/processes? | scale / hold / rework | adoption dashboard, ROI, risk review |
| G9 Quarterly AI Review | Is this AI capability still effective? | continue / refresh / retire | quality trend, cost, risk, user feedback |
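One way to keep gate decisions consistent is to encode the table as data and reject outcomes that do not belong to a gate. The sketch below is only an illustration: the `ALLOWED_DECISIONS` registry and the `record_decision` helper are hypothetical names, and the decision labels are copied from the table.

```python
# Hypothetical registry: gate IDs and decision labels are copied from the overview table.
ALLOWED_DECISIONS = {
    "G0": {"accept", "park", "reject"},
    "G1": {"continue", "refine", "stop"},
    "G2": {"ready", "conditional", "blocked"},
    "G3": {"architecture direction"},
    "G4": {"approve design", "revise"},
    "G5": {"approve pilot", "revise"},
    "G6": {"launch pilot", "no-go"},
    "G7": {"release", "limited release", "no-go"},
    "G8": {"scale", "hold", "rework"},
    "G9": {"continue", "refresh", "retire"},
}


def record_decision(gate, decision, evidence):
    """Reject decisions that are invalid for the gate or that cite no evidence."""
    if gate not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown gate: {gate}")
    if decision not in ALLOWED_DECISIONS[gate]:
        raise ValueError(f"{decision!r} is not a valid outcome for {gate}")
    if not evidence:
        raise ValueError("a gate decision needs at least one evidence item")
    return {"gate": gate, "decision": decision, "evidence": list(evidence)}
```

Requiring at least one evidence item reflects the rule that a diagram or a demo alone is not a decision record.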
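3. G0 Intake Gate
Review objective
Decide whether an AI idea is worth a discovery investment, instead of jumping into solution design the moment someone says "AI".
Required questions
- Who raised the request?
- What is the current business pain point?
- Which users are affected?
- Is there a baseline metric?
- Why not rules, process change, reporting, training or a system fix?
- Does it touch customer rights, credit, compliance, privacy or high-risk decisions?
- Is there a clear owner?
Required evidence
- 1-page use case brief.
- Initial value hypothesis.
- Initial risk tier.
- Initial no-AI alternative.
- Sponsor / process owner.
Red Flags
- "Leadership wants to do AI" but there is no business problem.
- "Let's build a demo first" when a high-risk process is involved.
- No process owner.
- Only the technical solution is described, not the user's work.
- The success criterion is "the model can answer".
Gate Decision
| Decision | Condition |
|---|---|
| Accept discovery | Clear business problem, owner, initial value and risk boundaries |
| Park | Value is unclear but the direction may have potential |
| Reject | No business owner, risk too high, or clearly solvable with a non-AI approach |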
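The decision table can be read as a small rule set. The following sketch assumes a hypothetical `IntakeBrief` record whose fields mirror the G0 questions; it is one possible reading of the table, not a fixed scoring method.

```python
from dataclasses import dataclass


@dataclass
class IntakeBrief:
    # Illustrative fields mirroring the G0 questions; not a fixed schema.
    has_business_problem: bool
    has_owner: bool
    has_value_hypothesis: bool
    risk_within_bounds: bool
    solvable_without_ai: bool


def intake_decision(brief: IntakeBrief) -> str:
    # Reject first: no owner, unacceptable risk, or an obvious non-AI fix.
    if not brief.has_owner or not brief.risk_within_bounds or brief.solvable_without_ai:
        return "reject"
    # Accept only when the problem and the value hypothesis are both clear.
    if brief.has_business_problem and brief.has_value_hypothesis:
        return "accept"
    # Otherwise the idea may have potential but is not ready for discovery.
    return "park"
```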
4. G1 Business Fit Gate
Review objective
Confirm that the business problem, process, users, value and AI fit are clear enough.
Checklist
- Is there an AI Opportunity Canvas?
- Is there an AS-IS BPMN or workflow map?
- Are pain metrics marked: cycle time, touch time, error rate, rework, backlog, cost?
- Are user roles and stakeholders identified?
- Is AI fit separated from the no-AI boundary?
- Is the first workflow insertion point defined?
- Are high-risk decision boundaries identified?
- Are there a business owner and an operational owner?
Key follow-up questions
- If this process did not use AI, what would the smallest improvement be?
- Does AI step in at read, summarize, recommend, draft, decide or act?
- Which outputs directly affect customers, compliance, finance or risk?
- How does the user know the AI output can be trusted?
- If users refuse to adopt it, does the business value disappear?
Financial retail examples
AML Copilot:
- Good AI fit: evidence aggregation, red-flag checklist, case narrative draft.
- Poor AI fit: automatic SAR filing decision.
Customer Service RAG:
- Good AI fit: approved knowledge retrieval, answer draft, citation.
- Poor AI fit: unauthorized fee waiver commitment.
Lending Assistant:
- Good AI fit: memo drafting, policy citation, missing document checklist.
- Poor AI fit: LLM-owned credit decision.
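5. G2 Data and Knowledge Gate
Review objective
Confirm that the AI can obtain data and knowledge that are correct, lawful, traceable and visible only according to permissions.
Checklist
- Is the source of truth listed?
- Is data classified: public/internal/confidential/PII/financial/special category?
- Are the data owner and knowledge owner defined?
- Is there data lineage?
- Is there policy/document versioning?
- Are there access control and an entitlement filter?
- Is there a retention policy?
- Is there a governance strategy for embeddings/indexes?
- Is there stale knowledge detection?
- Is it defined which data must not enter the prompt/log?
Required assets
- Data Readiness Pack.
- Knowledge Source Inventory.
- Permission Model.
- Document Versioning Plan.
- Data Flow Diagram.
Red Flags
- Treating the vector store as the source of truth.
- Not knowing document versions and effective dates.
- Not knowing what prompts and logs will store.
- Building enterprise RAG without permission filtering.
- Historical labels contain bias that has not been assessed.
- The vendor may train on customer data, but the contract and configuration are unclear.
Gate Decision
| Decision | Condition |
|---|---|
| Ready | Key data sources, permissions, quality and owners are clear |
| Conditional | A low-risk pilot is possible, but data must be supplemented or scope restricted |
| Blocked | Data cannot be used, permissions are unclear, quality is insufficient or risk is unacceptable |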
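The entitlement filter and the "what must not enter the prompt" rule from the checklist can be sketched as a single filtering step applied to retrieved chunks before prompt assembly. Everything named here (`Chunk`, `BLOCKED_FROM_PROMPT`, `filter_for_prompt`) is hypothetical, and which classifications are blocked is an assumption that each use case has to set for itself.

```python
from dataclasses import dataclass

# Assumption: classifications that must never reach a prompt or log in this scenario.
BLOCKED_FROM_PROMPT = {"PII", "special category"}


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    classification: str        # e.g. "public", "internal", "confidential", "PII"
    allowed_groups: frozenset  # entitlement metadata stored with the index entry


def filter_for_prompt(chunks, user_groups):
    """Keep only chunks the user is entitled to see and that may enter a prompt."""
    user_groups = set(user_groups)
    return [
        c for c in chunks
        if c.allowed_groups & user_groups
        and c.classification not in BLOCKED_FROM_PROMPT
    ]
```

Filtering after retrieval but before prompt assembly keeps the decision outside the model, so a prompt change cannot widen access.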
6. G3 AI Pattern Gate
Review objective
Choose the appropriate AI pattern, and avoid solving every need with a chatbot, RAG or an agent.
Pattern Decision Matrix
| Need | Prefer | Avoid |
|---|---|---|
| Up-to-date policy, citations, permissions | RAG + citation + metadata | LoRA-only |
| Stable classification, labeling, routing | classifier, rules, PEFT if needed | free-form generation |
| Multi-step system operations | workflow orchestration + bounded agent | autonomous open agent |
| High-risk recommendations | decision support + HITL | direct automation |
| Document extraction | OCR/document AI + validation | pure chat prompt |
| Style/format adaptation | prompt baseline then LoRA/PEFT | full fine-tune first |
| Strict calculation | deterministic tool/rules | LLM arithmetic |
| Low-risk FAQ | RAG/cache/fast model | expensive reasoning model |
Required ADRs
- RAG vs long-context vs fine-tuning.
- Workflow vs agent.
- Buy vs build vs hybrid.
- Model/provider choice.
- Tool access boundary.
- Human review boundary.
- Eval strategy.
Red Flags
- "Build an agent" with no action allowlist.
- "Fine-tune the model to memorize policy" when policy changes frequently.
- "RAG solves all hallucination."
- "Let the model decide for itself whether something is high-risk" with no external gate.
- No rejected options.
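7. G4 Architecture Gate
Review objective
Confirm that the system architecture is implementable, observable, governable and can be rolled back.
Required diagrams
At least 4:
- C4 Context.
- C4 Container.
- Data Flow.
- Sequence Diagram.
For high-risk or agent scenarios, add:
- Agent Tool Loop.
- Risk / Control Architecture.
- Eval Architecture.
- Deployment Topology.
- Cost / Latency Model.
Checklist
- Does the UI call models and tools only through the backend/orchestrator?
- Are there AuthN/AuthZ and an entitlement check?
- Is there a model gateway?
- Are there a retrieval service and a metadata filter?
- Are there a tool gateway and an action policy?
- Is there a policy/guardrail service?
- Is there an audit writer?
- Is there an eval runner or online quality checks?
- Is there telemetry: latency, cost, route, retrieval, tool, output quality?
- Is fallback/rollback defined?
- Are the sync path and async path distinguished?
- Is the vendor boundary documented?
Architecture Quality Bar
An acceptable AI C4 Container diagram must be able to answer:
- Where does the data come from?
- How does evidence enter the prompt?
- How is the model invoked?
- How are tools authorized?
- How is output validated?
- How are high-risk cases escalated?
- Where are the logs and audit evidence?
- How is production monitored?
- Who can roll back?
Red Flags
- The UI calls the LLM API directly.
- No tool permissions.
- No audit log.
- No eval / monitoring.
- Relying on the prompt to prevent every error.
- No cost and latency budget.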
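A tool gateway with an action policy and an audit writer, as asked for in the checklist, can be reduced to a very small sketch. The tool names, the `ACTION_POLICY` table and the in-memory `AUDIT_LOG` are hypothetical stand-ins; a real system would back them with a policy service and an append-only store.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical action policy: allowlisted tool -> whether human approval is required.
ACTION_POLICY = {
    "lookup_case": {"needs_approval": False},
    "draft_customer_reply": {"needs_approval": False},
    "submit_refund": {"needs_approval": True},
}

AUDIT_LOG = []  # stand-in for an append-only audit writer


def authorize_tool_call(user: str, tool: str, approved_by: Optional[str] = None) -> bool:
    """Allow a call only if the tool is allowlisted and its approval rule is met."""
    policy = ACTION_POLICY.get(tool)
    allowed = policy is not None and (
        not policy["needs_approval"] or approved_by is not None
    )
    # Every attempt is audited, including denied ones.
    AUDIT_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "tool": tool,
        "approved_by": approved_by,
        "allowed": allowed,
    })
    return allowed
```

Denied calls are logged as well, because an audit trail that only contains successes cannot show that the control worked.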
8. G5 Eval and Risk Gate
Review objective
Confirm that requirements are testable, risks are controllable and the release gate is explicit.
Checklist
- Is there a Requirements-to-Eval Matrix?
- Does every key requirement have an eval method?
- Is there a golden dataset?
- Does it cover common, edge, adversarial, missing-data and high-risk cases?
- Are deterministic checks, LLM-as-Judge and expert review distinguished?
- Are there severity levels?
- Are there release thresholds?
- Is there a red-team backlog?
- Is there a mapping to the OWASP LLM Top 10?
- Is there a mapping to business/compliance risks?
- Is there a human oversight design?
- Is there an incident response path?
Risk-Tiered Gate
| Risk tier | Example | Gate |
|---|---|---|
| Low | internal product FAQ | automated eval + sample review |
| Medium | customer service draft | automated eval + QA sampling + policy guardrail |
| High | AML, lending, wealth compliance | expert review + HITL + audit + strict release gate |
| Critical | autonomous customer-impacting decision | generally no-go unless deterministic, authorized, governed |
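The risk-tier table maps naturally onto a lookup of required controls per tier. The sketch below paraphrases the table into identifiers; the names `REQUIRED_CONTROLS` and `missing_controls` are illustrative only.

```python
# Control identifiers paraphrase the risk-tier table; the names are illustrative.
REQUIRED_CONTROLS = {
    "low": {"automated_eval", "sample_review"},
    "medium": {"automated_eval", "qa_sampling", "policy_guardrail"},
    "high": {"expert_review", "hitl", "audit", "strict_release_gate"},
}


def missing_controls(tier, implemented):
    """Return the controls still missing for a tier; critical is no-go by default."""
    if tier == "critical":
        raise ValueError("critical tier: generally no-go, escalate instead of scoring")
    return REQUIRED_CONTROLS[tier] - set(implemented)
```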
Minimum Release Gate
| Gate | Suggested threshold |
|---|---|
| Critical unsafe output | 0 |
| Unauthorized action | 0 |
| Unsupported factual claim in high-risk output | 0 |
| Citation coverage for policy answers | target defined per scenario, usually very high |
| Expert acceptance | threshold set per use case |
| Regression vs previous version | no critical regression |
| Cost / latency | within workflow SLA |
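A release gate of this shape can be checked mechanically once the eval report is in a structured form. The sketch below assumes hypothetical metric and threshold keys; the zero-tolerance items come from the table, while the floors and ceilings must be supplied per use case.

```python
def release_gate(metrics, thresholds):
    """Return the list of failed checks; an empty list means the gate passes."""
    failures = []
    # Zero-tolerance counters taken from the table.
    for key in ("critical_unsafe_outputs", "unauthorized_actions",
                "unsupported_high_risk_claims", "critical_regressions"):
        if metrics[key] > 0:
            failures.append(key)
    # Scenario-specific floors and ceilings (values are set per use case).
    if metrics["citation_coverage"] < thresholds["min_citation_coverage"]:
        failures.append("citation_coverage")
    if metrics["expert_acceptance"] < thresholds["min_expert_acceptance"]:
        failures.append("expert_acceptance")
    if metrics["p95_latency_s"] > thresholds["max_p95_latency_s"]:
        failures.append("latency")
    if metrics["cost_per_task"] > thresholds["max_cost_per_task"]:
        failures.append("cost")
    return failures
```

Returning the failed checks rather than a single boolean gives the review board the list of items to fix before the next attempt.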
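Red Flags
- "90% accuracy" with no idea what the sample was.
- Only generic benchmarks are used.
- No library of failure cases.
- The LLM judge has not been calibrated against humans.
- No post-launch monitoring.
- High-risk scenarios have no human review.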
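Calibrating an LLM judge against humans usually starts with measuring agreement on a shared sample. One common statistic is Cohen's kappa, which corrects raw agreement for chance; the sketch below computes it from two label lists. What kappa level is acceptable is a per-use-case decision and is not defined here.

```python
from collections import Counter


def cohens_kappa(human, judge):
    """Chance-corrected agreement between human labels and LLM-judge labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Expected agreement if both raters labeled independently at their own rates.
    h_counts, j_counts = Counter(human), Counter(judge)
    expected = sum(h_counts[k] * j_counts[k] for k in h_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```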
9. G6 Pilot Gate
Review objective
Confirm that a small-scale pilot can start, and that the pilot can be learned from, stopped and rolled back.
Checklist
- Is the pilot scope explicit?
- Are users, processes, data and risk tier restricted?
- Are there success metrics?
- Are there stop rules?
- Is there a rollback plan?
- Is there user training?
- Is there a support channel?
- Is there a daily/weekly review cadence?
- Is there incident escalation?
- Is there feedback capture?
Pilot Success Criteria
Cover at least:
- Quality: answer/evidence/action quality.
- Safety: critical violation.
- Workflow: cycle time, handoff, rework.
- Adoption: activation, repeat use, override, trust.
- Cost: cost per task.
- Risk: incidents, escalations.
Red Flags
- The pilot goal is "see whether users like it".
- No stop rule.
- No baseline.
- The pilot scope is too large.
- Pilot output directly affects customer rights.
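Stop rules are easier to enforce when they are written as explicit predicates over the pilot metrics and checked at every review. The rules and limits below are hypothetical examples; the real ones belong in the pilot plan.

```python
# Hypothetical stop rules; the limits must come from the pilot plan, not this sketch.
STOP_RULES = {
    "critical_violations": lambda m: m["critical_violations"] > 0,
    "override_rate": lambda m: m["override_rate"] > 0.5,
    "cost_per_task": lambda m: m["cost_per_task"] > m["cost_budget_per_task"],
}


def triggered_stop_rules(period_metrics):
    """Return the names of the stop rules that fired for this review period."""
    return [name for name, rule in STOP_RULES.items() if rule(period_metrics)]
```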
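10. G7 Release Gate
Review objective
Confirm that the system can enter production or restricted production.
Required Evidence Pack
- Final architecture diagrams.
- ADR set.
- Data readiness sign-off.
- Eval report.
- Risk/control pack.
- Security/privacy review.
- Operating model / RACI.
- Incident runbook.
- Release notes.
- User training material.
- Monitoring dashboard.
- Rollback plan.
Release Decision Options
| Decision | Meaning |
|---|---|
| Full release | Meets quality, risk, operations and adoption conditions |
| Limited release | Restricted by users, processes, risk tier or output type |
| Shadow mode | Observe only, with no effect on the real process |
| Internal-only | Internal assistance only, not customer-facing |
| No-go | Critical risk or insufficient evidence |
Final Questions
- If the model gets it wrong today, who is the first to know?
- If the vendor updates the model tomorrow, who runs the regression?
- If the knowledge base goes stale, who is responsible for fixing it?
- If users bypass the process, who handles it?
- If an output leads to a customer complaint, where is the evidence chain?
- If cost doubles, who has the authority to downgrade the model or shut the feature off?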
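11. G8 Scale Gate
Review objective
Confirm whether it is worth extending from pilot to more users, product lines, regions or risk tiers.
Checklist
- Did the pilot reach its business metric?
- Did it reach its quality metric?
- Were there no critical incidents?
- Is adoption stable?
- Has user trust increased?
- Do managers / QA / risk accept the operating model?
- Is there a capacity plan?
- Is there a cost forecast?
- Have policy differences for new regions/products been assessed?
- Is there a scale rollout plan?
Scale Risks
- It works for one team, and another team does not adopt it.
- Knowledge is accurate for one product but conflicts across multiple products.
- Compliance accepts it in one region but not in another.
- Peak concurrency degrades latency.
- Vendor pricing gets out of control as scale grows.
- The eval dataset does not cover the newly added scenarios.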
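12. G9 Quarterly AI Review
Review objective
Launch is not the end. Each quarter, assess whether this AI capability is still effective, compliant, economical and adopted.
Quarterly Review Questions
- Does the use case still have business value?
- Are users still using it?
- Has quality drifted?
- Is the knowledge base out of date?
- Has the vendor model changed?
- Is cost within budget?
- Have any incidents occurred?
- Has the eval set been updated?
- Are risk controls still effective?
- Should it be expanded, reduced, reworked or retired?
Quarterly Evidence
- Quality trend.
- Adoption trend.
- Cost trend.
- Incident review.
- Knowledge freshness report.
- Eval regression report.
- Vendor change report.
- User feedback.
- Business outcome.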
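A knowledge freshness report can start as a simple age check over the knowledge source inventory. The sketch assumes each inventory entry carries a hypothetical `doc_id` and a `last_reviewed` date; the 90-day default is an illustrative value, not a recommendation.

```python
from datetime import date, timedelta


def stale_documents(docs, today, max_age_days=90):
    """Return the IDs of documents whose last review is older than the allowed age."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(d["doc_id"] for d in docs if d["last_reviewed"] < cutoff)
```

Passing `today` in explicitly keeps the check reproducible, so the same report can be regenerated for an audit.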
13. Financial Retail Gate Profiles
AML Copilot
Gate emphasis:
- Human oversight.
- Evidence citation.
- Audit trail.
- No autonomous SAR decision.
- Typology coverage.
- Prompt injection from adverse media.
Required diagrams:
- BPMN alert investigation.
- C4 container.
- RAG pipeline.
- Eval architecture.
- Risk/control architecture.
No-go triggers:
- Unsupported claims in narrative.
- Model suggests final filing decision.
- Missing audit log.
- Analyst cannot see evidence.
KYC Remediation
Gate emphasis:
- Data quality.
- Customer communication.
- Jurisdiction policy.
- Source-of-truth update.
- Customer privacy.
No-go triggers:
- AI requests unauthorized documents.
- Golden source updated without approval.
- Customer message misstates legal requirement.
Customer Service RAG
Gate emphasis:
- Knowledge version.
- Citation.
- No unauthorized promise.
- Escalation.
- Adoption and QA feedback.
No-go triggers:
- Answers without sources.
- Uses outdated fee/policy.
- Commits fee waiver or legal advice.
Payments Exception Agent
Gate emphasis:
- Tool permission.
- Idempotency.
- Approval before action.
- Ledger reconciliation.
- Customer impact.
No-go triggers:
- Agent can execute payment repair without approval.
- Tool action not audited.
- Duplicate or irreversible action risk.
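Idempotency and approval before action can be combined in the gateway that fronts the payment repair tool. The class below is a hypothetical in-memory sketch: `PaymentRepairGateway`, its fields and the `executions` counter are illustrative, and a real implementation would persist keys durably and call the ledger system where the placeholder sits.

```python
class PaymentRepairGateway:
    """Executes an approved repair at most once per idempotency key."""

    def __init__(self):
        self._results = {}   # idempotency_key -> result of the first execution
        self.executions = 0  # counts real side effects, for illustration only

    def repair(self, idempotency_key, payment_id, approved_by=None):
        # Approval before action: refuse without a named human approver.
        if approved_by is None:
            raise PermissionError("payment repair requires human approval")
        # Replay of a known key returns the stored result with no second side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.executions += 1  # placeholder for the real ledger call
        result = {"payment_id": payment_id, "status": "repaired",
                  "approved_by": approved_by}
        self._results[idempotency_key] = result
        return result
```

If an agent retries after a timeout with the same key, the customer's payment is repaired once, not twice.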
Lending Assistant
Gate emphasis:
- Deterministic calculations.
- Fair lending.
- Reason codes.
- Human decision owner.
- Adverse action control.
No-go triggers:
- LLM owns credit decision.
- Protected/proxy variables unreviewed.
- Unsupported denial explanation.
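"Deterministic calculations" means the numbers in a lending memo come from a tool the model calls, never from the model's own arithmetic. As an illustration only, the sketch below computes a debt-to-income percentage with exact decimal arithmetic; the choice of ratio and the function name are assumptions, not part of any specific lending policy.

```python
from decimal import Decimal, ROUND_HALF_UP


def debt_to_income(monthly_debt: Decimal, monthly_income: Decimal) -> Decimal:
    """Debt-to-income ratio as a percentage, rounded to two decimals."""
    if monthly_income <= 0:
        raise ValueError("monthly income must be positive")
    ratio = monthly_debt / monthly_income * Decimal(100)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Using `Decimal` with an explicit rounding mode makes the result reproducible in an audit, which floating-point or generated text cannot guarantee.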
14. Architecture Review Board Pack
Before review, prepare:
- Executive summary.
- Use case scope and no-AI option.
- AS-IS / TO-BE BPMN.
- C4 Context / Container.
- Data Flow.
- AI pattern ADR.
- Requirements-to-Eval Matrix.
- Data Readiness Pack.
- AI Control Pack.
- Threat Model.
- Cost / Latency Model.
- Operating Model / RACI.
- Pilot / Release Plan.
- Open risks and requested decisions.
Review meeting agenda:
| Time | Topic |
|---|---|
| 5 min | Decision requested |
| 10 min | Business problem and workflow |
| 10 min | Architecture and data flow |
| 10 min | Eval and risk controls |
| 10 min | Operations and adoption |
| 10 min | Open risks and decision |
15. Interview Talking Points
Question: How do you review an AI architecture?
30-second answer:
I review AI architecture through gates, not just diagrams. I start with business fit and workflow, then data and knowledge readiness, then AI pattern selection, then C4/data/tool architecture, then eval and risk controls, then pilot/release/scale readiness. For financial services, I require evidence citation, human oversight, audit trail, risk-tiered release gates, and operating ownership.
2-minute answer:
A model demo is not architecture approval. My review starts by asking whether the use case is worth AI and where it enters the workflow. Then I review data readiness: source of truth, permissions, metadata, retention and knowledge freshness. Next I review the AI pattern: RAG, workflow, agent, rules, fine-tuning or vendor product. The architecture gate requires C4, data flow, sequence, model gateway, retrieval, tool gateway, policy, audit and observability. The eval gate converts requirements into golden datasets, deterministic checks, LLM judge, expert review and risk-tiered release thresholds. Finally, release requires RACI, runbook, monitoring, rollback and adoption dashboard. In AML or lending, I would block release if the model can make final high-risk decisions without human approval and audit evidence.
Question: What makes an AI architecture review different from regular architecture review?
Answer:
Regular architecture review asks whether services, data, security and operations are sound. AI architecture review also asks whether model behavior is measurable, whether knowledge is grounded and current, whether tool use is bounded, whether high-risk outputs are controlled, whether eval covers failures, whether model/prompt/index changes are governed, and whether users adopt the workflow safely.
Question: What would make you stop an AI project?
Answer:
I would stop or park a project when the business problem is unclear, there is no owner, source data cannot be used, high-risk decisions are being delegated to the model, eval cannot measure critical failures, or the operating model cannot support incidents and change control. Stopping early is good architecture governance, not failure.
16. Practice Drills
Drill 1: Review an AML Copilot
Create a review pack with:
- AS-IS BPMN.
- RAG architecture.
- Requirements-to-Eval Matrix.
- Control pack.
- Release gate.
Then answer:
- What would you approve?
- What would you block?
- What evidence is missing?
Drill 2: Review a Customer Service RAG
Focus on:
- Knowledge ownership.
- Citation.
- Policy version.
- No-answer behavior.
- QA feedback.
- Adoption dashboard.
Drill 3: Review a Payments Agent
Focus on:
- Tool gateway.
- Action allowlist.
- Idempotency.
- Human approval.
- Audit.
- Rollback.
Drill 4: Review a Lending Assistant
Focus on:
- High-risk classification.
- Deterministic calculations.
- Policy RAG.
- Fair lending review.
- Reason-code eval.
- Human decision owner.
Drill 5: Build a Gate Decision Memo
Write a one-page decision:
Decision requested:
Gate:
Use case:
Evidence reviewed:
Approve / conditional / no-go:
Top 3 risks:
Required changes:
Next review date:
Owner:
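The memo fields above can also be captured as a structured record so that incomplete memos are caught before the review meeting. This is a hypothetical sketch: the `GateDecisionMemo` class and its validation rules are one possible reading of the template, not a required format.

```python
from dataclasses import dataclass


@dataclass
class GateDecisionMemo:
    decision_requested: str
    gate: str
    use_case: str
    evidence_reviewed: list
    outcome: str            # "approve", "conditional" or "no-go"
    top_risks: list
    required_changes: list
    next_review_date: str
    owner: str

    def validate(self):
        """Return a list of problems; an empty list means the memo is complete."""
        problems = []
        if self.outcome not in {"approve", "conditional", "no-go"}:
            problems.append("outcome must be approve, conditional or no-go")
        if len(self.top_risks) != 3:
            problems.append("list exactly three top risks")
        if self.outcome == "conditional" and not self.required_changes:
            problems.append("conditional approval needs required changes")
        if not self.evidence_reviewed:
            problems.append("no evidence reviewed")
        return problems
```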
17. Connection To Existing Assets
| Existing asset | How to use |
|---|---|
| docs/AI_ARCHITECTURE_DIAGRAM_PLAYBOOK.md | Draw the diagrams required by each gate |
| docs/AI_REQUIREMENTS_TO_EVAL_COOKBOOK.md | Build eval evidence for G5/G7 |
| docs/AI_OPERATING_MODEL_RACI_RUNBOOK.md | Build RACI and incident evidence for release gate |
| docs/AI_VENDOR_BUILD_BUY_ADOPTION_PLAYBOOK.md | Support G3/G6/G8 vendor and adoption decisions |
| docs/FINANCIAL_RETAIL_AI_CASE_PORTFOLIO.md | Use case examples for financial retail gates |
| docs/AI_GOVERNANCE_EVALOPS_RISK_90_PLAN.md | Deepen governance and EvalOps practice |
| docs/abpa/templates/08-ai-architecture-adr-set.md | Write architecture decisions |
| docs/abpa/templates/05-ai-control-pack.md | Write risk controls |
| docs/abpa/templates/09-operating-model-raci.md | Write ownership and cadence |
18. Minimum Definition of Ready and Done
Ready for Discovery
- Business owner exists.
- Pain point has initial evidence.
- Risk tier is roughly known.
- No-AI option considered.
Ready for Architecture
- AS-IS workflow documented.
- Data/knowledge sources identified.
- AI behavior boundaries defined.
- Initial eval ideas exist.
Ready for Pilot
- Architecture diagrams approved.
- Data readiness acceptable.
- Eval matrix and control pack drafted.
- Pilot scope and stop rules defined.
- RACI drafted.
Ready for Production
- Eval gate passed.
- Critical risks controlled.
- Incident runbook tested.
- Monitoring dashboard live.
- Rollback plan ready.
- Owners assigned.
- Users trained.
Ready for Scale
- Pilot value proven.
- Quality stable.
- Adoption real.
- Cost understood.
- Risk accepted.
- New scope evaluated.
19. Final Principle
AI architecture review is not a paperwork exercise. It is the discipline of proving that a probabilistic capability can be inserted into a business workflow without losing control of quality, risk, cost, accountability and user trust.
The strongest AI architect is not the person who draws the most complex diagram, but the person who can say:
This is the business decision.
This is the system boundary.
This is the evidence.
This is how we evaluate it.
This is how we control it.
This is who owns it.
This is how we stop it if it fails.