AI Product Architecture Strategy Playbook


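Audience: AI PMs, Product Architects, AI Architects, Enterprise Architects, AI platform leads, and leaders of AI transformation in financial retail.

Core question: how to organize scattered AI use cases, model capabilities, data assets, platform components, governance requirements, and investment decisions into a reusable, auditable, scalable product architecture decision system.

Learning goal: not basic BA requirements analysis, but senior product architecture judgment: when to build a platform, when to build a point solution, when to build / buy / hybrid, when to scale, when to stop, and how to use executive memos and architecture artifacts to demonstrate decision quality.

Portfolio positioning: this playbook converts directly into AI product architecture portfolio evidence, including a capability map, decision portfolio, architecture runway, funding gates, ARB pack, scale/stop rules, executive memo, and financial retail case designs.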


Source Anchors

These sources serve as the official anchors for architecture governance, AI risk management, and management systems. This playbook is not legal, compliance, procurement, or audit advice; any formal project must be reviewed by legal, compliance, risk, security, privacy, procurement, the data owner, and the business owner.

| Anchor | Official / primary source | Use in this playbook |
|---|---|---|
| TOGAF Standard - Architecture Governance | https://pubs.opengroup.org/togaf-standard/architecture-governance/ | Uses architecture board, compliance review, governance repository, and exception / dispensation language to organize AI architecture review and decision accountability |
| TOGAF ADM | https://pubs.opengroup.org/togaf-standard/adm/ | Uses the ADM's iterative approach to architecture development to organize capability, target architecture, roadmap, migration planning, and governance |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern, Map, Measure, Manage to organize AI risk, eval, monitoring, control, and operating model |
| NIST AI RMF Generative AI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | Covers GenAI-specific risks: hallucination, data leakage, content provenance, harmful content, misuse, over-reliance, incident response |
| ISO/IEC 42001 | https://www.iso.org/standard/42001 | Uses the AI management system approach to organize policy, accountability, lifecycle, risk, suppliers, continual improvement, and management review |

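Mapping from standards to portfolio evidence:

| Source lens | Evidence it can produce | How to phrase it in an interview |
|---|---|---|
| TOGAF Architecture Governance | ARB charter, architecture compliance checklist, exception log, decision register | "I bring AI architecture decisions into a formal governance cadence instead of letting PoCs bypass production gates on their own." |
| TOGAF ADM | Capability roadmap, transition architecture, architecture runway, migration plan | "I manage the AI portfolio through target capabilities and transition architectures instead of piling up features project by project." |
| NIST AI RMF | Risk tiering, control pack, eval gate, monitoring plan | "Every AI capability has a risk classification, measurable thresholds, release controls, and continuous monitoring." |
| NIST GenAI Profile | GenAI risk checklist, red-team backlog, incident taxonomy | "I treat generative AI's erroneous output, citations, misuse, content provenance, and over-trust as product architecture problems." |
| ISO/IEC 42001 | AI operating model, AI system inventory, management review pack | "I can elevate the design of a single AI product into part of an enterprise AI management system." |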

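1. Senior Positioning: Product Architecture Is a Decision System

AI product architecture is not a system diagram, nor is it "pick a model and wire up an API". It is a continuously running decision system that answers:

  • Which business capabilities are worth applying AI to, and which only need process, rule, data quality, or system changes.
  • Which capabilities should be consolidated into platform capabilities, and which should remain point solutions owned by business teams.
  • Which components to build, buy, partner on, or source as a hybrid.
  • Which model, data, tool, context, eval, and governance capabilities must be unified, and which can be localized.
  • Which use cases should scale, and which should be stopped, downgraded, replaced, or merged.
  • How to turn a short-term pilot into architecture runway rather than unmaintainable PoC debt.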

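1.1 From "project delivery" to "architecture decisions"

| Ordinary project view | Product architecture view | Senior question |
|---|---|---|
| Can this AI feature be demoed? | Is this AI capability worth a place on the enterprise capability map? | Does it change core workflows, risk accountability, and platform reuse? |
| Which model do we pick? | How does the model strategy support multiple use cases, cost, latency, risk, and replaceability? | Do we need a model gateway, routing, fallback, and eval replay? |
| Build a knowledge-base assistant | How does the enterprise knowledge architecture manage source of truth, permissions, versions, and citations? | Who owns pre-retrieval authorization, document validity periods, and citation quality? |
| Build an agent | How are tool permissions, human approval, transactional consistency, and the kill switch designed? | How far may the agent go: read, recommend, draft, or act? |
| Launch a pilot | Are the portfolio-level funding gate and scale/stop rule clear? | When to expand, when to stop, when to abstract into a platform |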

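1.2 The seven decision domains of AI product architecture

| Decision domain | Key decision | Cost of a wrong decision | Required artifact |
|---|---|---|---|
| Capability | Which business capabilities enter the AI portfolio | Scattered resources, a pile-up of low-value demos | Capability map, AI opportunity scorecard |
| Pattern | Choice among RAG, copilot, agent, workflow automation, fine-tuning, and rules | Over-automation or the wrong technical form | AI pattern decision matrix, ADR |
| Platform | Platformize or keep as a point solution | Premature platformization or duplicated build | Platform boundary map, reuse thesis |
| Sourcing | Build, buy, partner, hybrid | Vendor lock-in, runaway cost, hollowed-out internal capability | Build/buy/hybrid memo, vendor architecture review |
| Risk | Risk tier, control strength, human oversight | Compliance risk, customer harm, audit failure | Risk tier, control pack, HITL design |
| Roadmap | Architecture runway, investment cadence, migration path | PoC debt, inability to scale | Architecture runway, transition roadmap |
| Portfolio | Scale, stop, merge, retire | Resources sunk long-term into ineffective AI applications | Decision portfolio, scale/stop rules |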

2. Advanced Framework: The PADS Decision System

PADS = Product Architecture Decision System. It governs an AI product from idea through portfolio review.

PADS loop

Capability intent
  -> workflow and decision boundary
  -> AI pattern selection
  -> architecture decomposition
  -> sourcing strategy
  -> runway and roadmap
  -> funding gate
  -> pilot evidence
  -> scale / stop / merge decision
  -> platform reuse or retirement

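2.1 The seven PADS layers

| Layer | Goal | Core output | Decision question |
|---|---|---|---|
| L1 Capability Strategy | Turn business strategy into capability investment themes | AI capability map, value theme | Which capabilities are worth applying AI to, and why now |
| L2 Workflow Architecture | Define where AI enters the workflow and where responsibility boundaries lie | AS-IS / TO-BE workflow, decision boundary | Does AI read, find, summarize, recommend, draft, execute, or decide |
| L3 Solution Pattern | Choose the AI pattern and system boundary | Pattern ADR, C4 context/container | How RAG, copilot, agent, rules, and workflow automation combine |
| L4 AI Stack Decomposition | Decompose model / data / tool / context / eval / governance | AI stack map, control points | Which components are reusable, and which must be localized |
| L5 Sourcing and Platform | Decide build / buy / hybrid and the platform boundary | Sourcing memo, platform capability map | What to buy, what to build, and what the enterprise must control |
| L6 Runway and Governance | Establish architecture runway, gates, and change governance | Roadmap, ARB pack, funding gates | How to keep a pilot from becoming an island |
| L7 Portfolio Operations | Portfolio-level scale, stop, merge, retire | Decision portfolio, quarterly review pack | What to expand, what to stop, and what to consolidate into the platform |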

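2.2 Decision principles

| Principle | Operational meaning | Anti-pattern |
|---|---|---|
| Architecture follows decision rights | Settle the decision responsibilities of AI, humans, and systems first, then design models and workflows | Building the agent first and bolting on approvals later |
| Platform only after repeatability evidence | At least two high-value use cases must prove the same capability is reusable before it enters the platform backlog | Building a big platform off the first demo |
| Eval is product contract | Eval is not just testing; it is the product promise, the release gate, and the risk boundary | Shipping on subjective impressions |
| Context is controlled infrastructure | Context is not prompt concatenation; it is permissions, sources, versions, freshness, and the evidence chain | Dumping documents into a vector store and calling it RAG |
| Governance must be executable | Governance must show up in approvals, logs, thresholds, kill switches, and release gates | Writing policy the system cannot enforce |
| Reversibility is a design requirement | Key AI decisions must have rollback, fallback, exit, and stop rules | A vendor or model that cannot be replaced |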

3. Capability-to-Architecture Mapping

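The starting point of senior AI product architecture is the capability map, not the user story. The capability map answers "which kind of enterprise capability are we strengthening"; the architecture mapping answers "which AI platform, data, workflow, and controls does that capability need".

3.1 Capability layers

| Capability layer | Financial retail examples | AI architecture implication | Investment judgment |
|---|---|---|---|
| Strategic capability | AML risk identification, credit decision support, customer trust management | Needs long-term platform capability, governance, and audit | Can enter a multi-year roadmap |
| Operational capability | KYC review, post-loan monitoring, customer service knowledge answers, payment exception handling | Needs workflow integration, HITL, case management | Suited to scaling after a pilot |
| Analytical capability | Anomaly detection, customer segmentation, policy impact analysis | Needs a data platform, features, model monitoring | Confirm the data owner and metrics first |
| Experience capability | Customer-facing AI, employee copilot, intelligent search | Needs UX trust, citations, refusal, handoff | Centered on adoption and trust |
| Control capability | Eval, audit, risk tiering, policy guardrail | Needs platformization and strong governance | Prioritize consolidating as an enterprise-wide shared capability |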

3.2 Capability-to-architecture component mapping

| AI capability | Workflow insertion point | Core architecture | Shared platform capability | Local domain capability | Review owner |
|---|---|---|---|---|---|
| AML investigation copilot | Case triage, evidence gathering, narrative draft | RAG + entity graph + case summarization + HITL | Model gateway, retrieval service, audit log, eval harness | AML taxonomy, red flags, SAR policy boundary | Financial crime, risk, architecture |
| KYC policy assistant | Analyst question answering, document checklist, policy comparison | Policy RAG + citation + versioned knowledge | Knowledge ingestion, permission filter, prompt registry | Jurisdiction policy, product rules, effective date | Compliance, operations |
| Credit copilot | Loan memo draft, policy citation, missing data check | RAG + rules + structured extraction + human approval | Tool gateway, eval, observability | Credit policy, risk appetite, exception reason codes | Credit risk, lending ops |
| Customer-facing AI | Customer answer, guided servicing, escalation | Conversational AI + retrieval + intent routing + guardrails | Model gateway, policy filters, conversation logs | Product terms, customer eligibility, tone policy | CX, legal, compliance |
| Payments operations agent | Exception detection, case routing, read-only investigation | Event stream + anomaly model + workflow agent | Tool gateway, idempotency control, kill switch | Rail rules, SLA, settlement calendar | Payments, ops risk |
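
The mapping above can double as repeatability evidence for platform decisions. The following is a minimal sketch, assuming the shared platform capability column is transcribed literally into a dictionary; the function name `platform_candidates` and the `min_use_cases` threshold of 2 are illustrative choices, and "eval harness" and "eval" are treated as distinct names because the table spells them differently.

```python
from collections import defaultdict

# Shared platform capabilities per use case, transcribed from the mapping table.
SHARED_CAPABILITIES = {
    "AML investigation copilot": ["model gateway", "retrieval service", "audit log", "eval harness"],
    "KYC policy assistant": ["knowledge ingestion", "permission filter", "prompt registry"],
    "Credit copilot": ["tool gateway", "eval", "observability"],
    "Customer-facing AI": ["model gateway", "policy filters", "conversation logs"],
    "Payments operations agent": ["tool gateway", "idempotency control", "kill switch"],
}


def platform_candidates(mapping, min_use_cases=2):
    """Return capabilities needed by at least `min_use_cases` use cases."""
    users = defaultdict(list)
    for use_case, capabilities in mapping.items():
        for capability in capabilities:
            users[capability].append(use_case)
    return {cap: ucs for cap, ucs in users.items() if len(ucs) >= min_use_cases}


candidates = platform_candidates(SHARED_CAPABILITIES)
for capability, use_cases in candidates.items():
    print(capability, "<-", ", ".join(use_cases))
```

With the table as written, only the model gateway and the tool gateway appear in two use cases, which is one way to make a reuse thesis explicit before anything enters the platform backlog.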

3.3 Capability fit scorecard

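| Dimension | 1 point | 3 points | 5 points |
|---|---|---|---|
| Strategic relevance | Marginal efficiency need | Supports a department goal | Supports an enterprise-level strategic capability |
| Workflow leverage | Isolated small task | Improves one workflow step | Changes end-to-end workflow performance |
| Data and knowledge readiness | Owner unclear | Key sources available but need governance | Source of truth, permissions, and versions are clear |
| Risk controllability | Controls cannot be defined | Can be limited with HITL | Risk boundary, eval, and monitoring are clear |
| Reuse potential | Single team | Reusable across 2 related scenarios | Platform capability reusable across business lines |
| Differentiation | Ordinary automation | Improves operating cost | Creates a customer, risk-control, or operational advantage |
| Adoption likelihood | Users distrust it or there is no workflow entry point | Pilot users exist | Clear usage cadence in a real workflow |

Score interpretation (total across the seven dimensions):

  • 28-35: architecture runway candidate; requires a funding gate and ARB review.
  • 20-27: suited to a controlled pilot; the scale trigger must be explicit.
  • 12-19: fix data, workflow, or ownership first; do not go straight to an AI build.
  • Below 12: stop, or switch to a non-AI improvement.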
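
The banding can be made mechanical. This is a minimal sketch, assuming the total is the plain sum of the seven dimension scores and that intermediate scores of 2 and 4 are allowed (the scorecard only anchors 1, 3, and 5); the identifier names are illustrative.

```python
DIMENSIONS = (
    "strategic_relevance",
    "workflow_leverage",
    "data_and_knowledge_readiness",
    "risk_controllability",
    "reuse_potential",
    "differentiation",
    "adoption_likelihood",
)


def classify(scores):
    """Sum the seven 1-5 dimension scores and map the total to a band."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("each score must be between 1 and 5")
    total = sum(scores[d] for d in DIMENSIONS)
    if total >= 28:
        band = "architecture runway candidate"
    elif total >= 20:
        band = "controlled pilot"
    elif total >= 12:
        band = "fix data, workflow or owner first"
    else:
        band = "stop or use a non-AI improvement"
    return total, band


# Example: a use case scoring 3 on every dimension totals 21.
print(classify({d: 3 for d in DIMENSIONS}))
```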

4. AI-Native Workflow Design

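An AI-native workflow does not drop a chatbot into the old process; it redesigns the division of labor among people, models, rules, tools, data, and controls.

4.1 AI involvement levels

| Level | AI role | Typical architecture | Suitable for | Not suitable for |
|---|---|---|---|---|
| L0 Inform | Retrieve, explain, cite | RAG + citation | Policy lookup, knowledge assistant | Tasks that require executing an action |
| L1 Draft | Draft text, summaries, form pre-fill | RAG + template + human edit | Case summaries, credit review memos, customer service reply drafts | Output that directly affects customer rights with no review |
| L2 Recommend | Give recommendations with reasons | Model + rules + scoring + HITL | AML prioritization, missing-document suggestions | Cases prohibited by regulation, or where accountability cannot be assigned |
| L3 Act with approval | Generate the action; a human approves execution | Agent + tool gateway + approval queue | Creating cases, sending internal requests, updating low-risk fields | High-risk automated payments, automated loan rejection |
| L4 Act autonomously | Execute automatically within bounds | Agent + policy engine + monitoring + kill switch | Low-risk, reversible, frequent actions with clear rules | Actions that are irreversible, have large customer impact, or carry high audit requirements |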
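
One way to keep the involvement level enforceable is to encode it and cap it by action characteristics. The sketch below is an assumption-laden illustration, not a rule stated in the table: it caps high-customer-impact actions at L2 and irreversible actions at L3, which is one conservative reading of the "not suitable for" column. The names `Level`, `max_allowed_level`, and `effective_level` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    INFORM = 0
    DRAFT = 1
    RECOMMEND = 2
    ACT_WITH_APPROVAL = 3
    ACT_AUTONOMOUSLY = 4


def max_allowed_level(reversible, high_customer_impact):
    """Ceiling on AI involvement for an action (assumed policy, see lead-in)."""
    if high_customer_impact:
        return Level.RECOMMEND          # a human stays the accountable decision maker
    if not reversible:
        return Level.ACT_WITH_APPROVAL  # execution needs explicit approval
    return Level.ACT_AUTONOMOUSLY


def effective_level(requested, reversible, high_customer_impact):
    """Never run above the ceiling, whatever level the use case requested."""
    return min(requested, max_allowed_level(reversible, high_customer_impact))


# An agent asking for autonomy on an irreversible action is capped at L3.
print(effective_level(Level.ACT_AUTONOMOUSLY, reversible=False, high_customer_impact=False).name)
```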

4.2 Workflow architecture checklist

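| Design item | Question that must be answered | Artifact |
|---|---|---|
| Decision boundary | Which decisions AI influences, and which must be made by a human or a rules system | Decision boundary map |
| Human role | Whether the human is a reviewer, approver, coach, exception owner, or accountable decision maker | HITL design |
| Evidence path | Which sources AI output relies on, and how citations and confidence signals are shown | Evidence and citation design |
| Exception path | What happens on AI failure, refusal, low confidence, or tool failure | Exception workflow |
| Rework loop | How user corrections feed into eval, knowledge updates, or prompt improvement | Feedback-to-eval loop |
| Control point | Which steps need policy check, dual approval, logging, sampling | Control map |
| Operational ownership | Who maintains knowledge, eval, prompt, policy, tool permission, incident | RACI |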

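4.3 Financial retail AI-native workflow patterns

| Pattern | Before | AI-native design | Key controls |
|---|---|---|---|
| AML case preparation | Analyst manually searches multiple systems, copies evidence, writes the narrative | AI aggregates transactions, entities, historical cases, and red flags, and generates a cited narrative draft | SAR decision stays with a human; evidence sources are traceable; sensitive fields are minimized |
| KYC policy interpretation | Analyst searches policies across multiple PDFs | AI retrieves the effective policy by jurisdiction, product, and customer type and generates a checklist | Cite the effective date; refuse to use expired policies; compliance owner approves content |
| Credit memo drafting | RM / underwriter manually compiles customer materials and policy clauses | AI pre-fills the memo, lists missing materials, cites policy exceptions | No automated credit decision; does not replace the scorecard or approval |
| Customer-facing servicing | Agents or customers search the FAQ themselves | AI answers based on customer identity, product, and authorization scope, escalating to a human when needed | No promise of unauthorized waivers; no personalized financial advice; full conversation audit |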

5. Platform vs Point Solution

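A platform is not the goal; reuse is. A point solution is not inferior; what is dangerous is misplaced platformization.

5.1 Decision matrix

| Dimension | Leans point solution | Leans platform |
|---|---|---|
| Use case repeatability | Single workflow, many special rules | Multiple workflows share model access, retrieval, eval, audit |
| Risk and control | Low risk; local controls suffice | High risk; needs unified policy, audit, release gate |
| Data and knowledge | Dedicated small dataset | Multiple teams share knowledge sources, permissions, and metadata standards |
| Technology change | Rapid exploration; pattern not yet stable | Pattern already validated by multiple use cases |
| Organizational capability | Maintainable by a single team | Needs platform team, SRE, security, and governance support |
| Cost structure | Cost acceptable at small scale | Duplicated cost across teams is evident |
| Vendor dependency | Vendor UI already covers the need | Needs unified vendor abstraction and exit path |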

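5.2 Candidate capabilities for platformization

| Capability | When to platformize | Signals against platformizing |
|---|---|---|
| Model gateway | Multiple teams access multiple models and need routing, quota, logs, fallback | Only one low-risk demo exists |
| Knowledge ingestion | Multiple business lines use documents, policies, and product materials and need permissions and versioning | Document owner unclear; source of truth in disarray |
| Retrieval service | Multiple RAG use cases share chunking, metadata, rerank, citation | Each use case's retrieval logic is entirely different and unvalidated |
| Tool gateway | AI needs to call internal systems, with read/write actions, approval, and audit | Only read-only knowledge Q&A |
| Eval harness | Multiple AI releases need unified tests, thresholds, reports | The team has no stable failure taxonomy yet |
| Prompt/config registry | Prompts, policies, and guardrails need versioning, approval, and rollback | Still at the individual-experiment stage |
| AI observability | A cross-use-case view of quality, cost, latency, risk, and adoption is needed | No production traffic |
| Governance workflow | AI inventory, risk tier, release approval, exception management | No owner in the organization to take it on |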

5.3 Platform boundary template

| Boundary item | Platform owns | Business use case owns | Governance owner |
|---|---|---|---|
| Model access | Approved model list, routing, quota, fallback, logging | Task-level model selection rationale, business acceptance | AI platform, security |
| Knowledge | Ingestion pipeline, metadata schema, permission filter | Source owner, content quality, validity period | Data governance, compliance |
| Prompt and policy | Registry, versioning, approval workflow, rollback | Domain prompt, policy rule, tone guide | Product, risk |
| Tools | Tool catalog, RBAC, audit, approval hooks | Tool use case, business action, exception handling | Architecture, system owner |
| Eval | Runner, report format, release gate workflow | Gold set, rubric, failure taxonomy | EvalOps, business owner |
| Observability | Trace, metric, cost, incident dashboard | Threshold, business KPI, triage | Platform ops, product ops |

6. Build / Buy / Hybrid Strategy

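Sourcing decisions in AI product architecture cannot be judged on speed alone. In financial retail they must weigh differentiation, control, audit, data boundaries, vendor lock-in, cost curve, and exit capability together.

6.1 Option definitions

| Option | Definition | Best fit | Main risks |
|---|---|---|---|
| Buy | Purchase SaaS, a vendor product, or a managed platform | Generic workflows, mature products, fast rollout | Lock-in, black box, inadequate contract and audit terms |
| Build | Build the core capability in-house | Differentiating capability, strong control, economies of scale | Delivery risk, operating cost, talent requirements |
| Partner | Co-build with a vendor, SI, consultancy, or domain expert | Lacking domain implementation capability, or needing to accelerate delivery | External dependency, insufficient knowledge transfer |
| Hybrid | Buy generic capability; build the control layer and the differentiation layer | Most high-value financial retail scenarios | Complex boundaries; needs strong architecture governance |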

6.2 Component-level sourcing matrix

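| Component | Leans buy | Leans build | Hybrid recommendation |
|---|---|---|---|
| Foundation model | Use mainstream model APIs or hosted models | The very few enterprises with proprietary data and training capability | Abstract vendors behind a model gateway |
| Embedding / reranker | Generic quality is sufficient | Specific languages, document structures, or high recall requirements | Hosted model + internal eval for selection |
| Vector DB / search | Managed service satisfies data boundaries | Strict data residency and cost control | Managed infra + internal permission and retention policy |
| Knowledge ingestion | Vendor connectors provide coverage | Enterprise documents are complex, with fine-grained permissions and many versions | Buy connectors; build metadata and governance |
| Agent runtime | Low-risk internal automation | High-risk actions, complex transactions, and approvals | Buy the runtime; build the tool gateway and approval layer |
| Eval tooling | Generic test management and dashboard | Domain gold set, rubric, release policy | Buy the runner; build domain eval and gates |
| Policy guardrails | Generic content safety | Financial policy, compliance boundaries, customer rights | Generic guardrail + internal policy engine |
| Audit evidence | Vendor can export complete evidence | Audit trail must be reconstructed across systems | Vendor logs feed an internal evidence binder |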

6.3 Build / Buy / Hybrid memo template

# AI Sourcing Decision Memo

## Decision
Recommend: Hybrid
Scope: Buy model access and extraction accelerators; build governance, eval, tool permission and audit evidence layer.

## Business capability
- Target capability:
- Workflow affected:
- Business outcome:
- Risk tier:

## Options compared
| Option | Delivery speed | Control | Cost curve | Auditability | Exit ability | Recommendation |
|---|---:|---:|---:|---:|---:|---|
| Buy | 5 | 2 | 3 | 2 | 2 | Not enough control for regulated workflow |
| Build | 2 | 5 | 3 | 5 | 5 | Too slow for first pilot |
| Hybrid | 4 | 4 | 4 | 4 | 4 | Best balance |

## Architecture implication
- Vendor-controlled components:
- Enterprise-controlled components:
- Data boundary:
- Model upgrade policy:
- Exit trigger:

## Funding request
- Gate requested:
- Evidence already available:
- Evidence required before scale:
- Stop rule:
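
The "Options compared" scores in the template can be turned into a weighted total so the recommendation is reproducible. This is a sketch only: the scores are copied from the template, but the weights are hypothetical and chosen here to reflect a first pilot where delivery speed matters most. With equal weights, Build and Hybrid would tie at 20, so the weighting has to be an explicit, reviewable decision.

```python
CRITERIA = ("delivery_speed", "control", "cost_curve", "auditability", "exit_ability")

# Scores copied from the "Options compared" table in the memo template.
SCORES = {
    "Buy": (5, 2, 3, 2, 2),
    "Build": (2, 5, 3, 5, 5),
    "Hybrid": (4, 4, 4, 4, 4),
}

# Hypothetical weights (assumption, not from the template).
WEIGHTS = {
    "delivery_speed": 6,
    "control": 4,
    "cost_curve": 3,
    "auditability": 4,
    "exit_ability": 3,
}


def weighted_total(scores, weights):
    """Weighted sum of one option's scores across all criteria."""
    return sum(weights[criterion] * score for criterion, score in zip(CRITERIA, scores))


totals = {option: weighted_total(scores, WEIGHTS) for option, scores in SCORES.items()}
best = max(totals, key=totals.get)
print(totals, "->", best)
```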

7. Model / Data / Tool / Context / Eval / Governance Decomposition

A senior architecture review must decompose the AI system rather than treat it as one "intelligent black box".

7.1 Decomposition map

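| Layer | Architecture question | Design artifact | Common failure |
|---|---|---|---|
| Model | Which models are available; how to route, fall back, upgrade, and roll back | Model strategy, model gateway ADR | Model version changes silently; quality is not reproducible |
| Data | Which data enters prompts, index, logs, eval, analytics | Data flow, classification, retention | PII leakage; data owner unclear |
| Tool | Which systems and actions AI can call | Tool catalog, RBAC, approval policy | Excessive agency; unauthorized writes |
| Context | Context sources, permissions, freshness, citations, compression strategy | Context architecture, retrieval design | Hallucination, expired policy, wrong citations |
| Eval | How to prove quality, risk, and business impact | Eval matrix, gold set, release gate | Looks good in the demo but fails in production |
| Governance | Who approves, who monitors, who handles exceptions, who bears residual risk | ARB pack, RACI, exception log | Governance stays on paper |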

7.2 AI stack decision register

| Decision ID | Layer | Decision | Options considered | Evidence | Owner | Review trigger |
|---|---|---|---|---|---|---|
| ADR-AI-001 | Model | Use model gateway with approved model allowlist | Direct vendor API, single cloud model, gateway | Cost, audit, routing and fallback needs | AI architect | New model class or risk tier change |
| ADR-AI-002 | Context | Retrieval must enforce permission before generation | Post-generation filter, pre-retrieval ACL, separate index | Data leakage risk | Data architect | New corpus or entitlement model |
| ADR-AI-003 | Tool | Write actions require approval queue | Direct tool call, approval, read-only | Customer and financial impact | Product architect | Expansion to new action type |
| ADR-AI-004 | Eval | Release blocked by critical failure class | Manual review, offline eval, production sampling | Regulated output risk | EvalOps lead | Prompt/model/policy update |
| ADR-AI-005 | Governance | High-risk use cases require ARB signoff | Product approval only, ARB, committee | Enterprise risk | Enterprise architect | Gate 3 and Gate 5 |
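
ADR-AI-002 is easiest to review when the order of operations is visible in code. The sketch below illustrates the pre-retrieval ACL option under simplifying assumptions: documents carry a set of entitled groups, and ranking is a naive term overlap standing in for a real retriever. `Doc`, `DOCS`, and `retrieve` are hypothetical names, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # entitlement groups allowed to read this document


DOCS = [
    Doc("aml-typology", "structuring red flag typology", frozenset({"aml"})),
    Doc("kyc-checklist", "onboarding document checklist", frozenset({"kyc", "aml"})),
]


def retrieve(query, docs, user_groups, top_k=3):
    """Filter by entitlement first, then rank by naive term overlap."""
    # Permission check happens BEFORE ranking, so restricted text never
    # reaches the prompt and nothing has to be filtered after generation.
    visible = [d for d in docs if d.allowed_groups & set(user_groups)]
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d.doc_id for _, d in hits[:top_k]]


# A KYC-only user asking an AML question gets nothing back.
print(retrieve("structuring red flag", DOCS, {"kyc"}))
```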

7.3 Control design by layer

| Layer | Preventive control | Detective control | Corrective control |
|---|---|---|---|
| Model | Approved model list, routing rule, temperature policy | Model drift metric, quality trend, latency/cost alerts | Rollback, fallback, model freeze |
| Data | Data minimization, redaction, access control | Data leakage scan, log review, source freshness monitor | Delete/reindex, source quarantine, permission fix |
| Tool | RBAC, scope-limited tokens, approval gate | Tool call anomaly detection, audit sampling | Kill switch, credential rotation, action reversal |
| Context | Metadata filter, effective date, citation requirement | Citation accuracy eval, retrieval miss analysis | Rechunking, rerank tuning, source correction |
| Eval | Required gold set, failure taxonomy, release threshold | Production sampling, human override analysis | Regression fix, release block, prompt rollback |
| Governance | Risk tiering, ARB approval, exception expiry | Quarterly review, exception aging report | Retire, scope reduction, risk re-acceptance |
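
For the Tool layer, the approval gate, audit trail, and kill switch can live in one choke point. The following is a minimal in-memory sketch assuming every tool call declares whether it writes; `ToolGateway` and its fields are hypothetical, and a production gateway would also need RBAC, scoped credentials, and durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGateway:
    kill_switch: bool = False
    approval_queue: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def call(self, tool, action, is_write):
        """Route one tool call and record the outcome for audit."""
        if self.kill_switch:
            outcome = "blocked"            # corrective control: stop everything
        elif is_write:
            self.approval_queue.append((tool, action))
            outcome = "pending_approval"   # preventive control: human approves writes
        else:
            outcome = "executed"           # read-only calls pass through
        self.audit_log.append((tool, action, outcome))  # detective control
        return outcome


gateway = ToolGateway()
print(gateway.call("case_system", "read_case", is_write=False))
print(gateway.call("case_system", "update_field", is_write=True))
```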

8. Architecture Runway and Roadmap Governance

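Architecture runway is the technical, data, governance, and operational foundation that multiple future AI use cases can reuse. Without runway, the AI portfolio turns into a pile of demos that cannot scale.

8.1 Runway types

| Runway | Problem solved | Representative capabilities | Trigger |
|---|---|---|---|
| Technical runway | AI applications rebuild the same infrastructure | Model gateway, retrieval service, tool gateway, observability | 2 or more use cases build it redundantly |
| Data runway | Knowledge and data are unavailable, untrusted, or cannot be authorized | Source inventory, metadata, data quality, lineage | AI output depends on multiple core sources |
| Eval runway | Releases lack unified quality evidence | Eval harness, gold set library, failure taxonomy | Multiple AI releases need gating |
| Governance runway | Risk and approval are not enforceable | AI inventory, risk tier, ARB, exception workflow | High-risk use cases are increasing |
| Adoption runway | Nobody uses what was built, or usage cannot be demonstrated | Training, workflow integration, adoption analytics | Pilot value depends on user behavior change |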

8.2 12-month roadmap governance template

| Quarter | Product theme | Architecture runway | Use case delivery | Governance gate | Evidence |
|---|---|---|---|---|---|
| Q1 | Prove high-value workflow | Model gateway MVP, eval report format, AI inventory | AML copilot pilot, KYC assistant pilot | G0-G4 | Opportunity score, ADR, data readiness |
| Q2 | Controlled production | Retrieval permission filter, prompt registry, audit log | AML limited release, credit copilot pilot | G5-G7 | Eval report, HITL metrics, incident runbook |
| Q3 | Reuse and platformization | Tool gateway, central observability, gold set library | Customer-facing AI controlled rollout | G8 | Adoption dashboard, cost per task, risk review |
| Q4 | Portfolio optimization | Scale/stop engine, exception management, quarterly ARB | Expand winners, retire weak pilots | Quarterly review | Decision portfolio, ROI and risk-adjusted funding |

8.3 Roadmap governance rules

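| Rule | Why it matters | Evidence |
|---|---|---|
| Every use case must be bound to a capability | Avoids scattered demos | Capability map |
| Every pilot must declare a platform reuse hypothesis | Makes the pilot produce architectural learning | Reuse thesis |
| Every release must have an eval and a rollback | Prevents uncontrolled releases | Eval report, rollback plan |
| A portfolio review must run every quarter | Stops low-value projects and frees resources | Decision portfolio |
| Every exception must have an expiry | Prevents temporary waivers from becoming permanent | Exception log |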
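
The "every release must have an eval and a rollback" rule is straightforward to automate as a release gate. This is a sketch under assumptions: the eval report is a dictionary with `critical_failures` and `pass_rate` keys, and the 0.9 pass-rate threshold is a placeholder that a real gate would set per risk tier.

```python
def release_decision(eval_report, rollback_plan, min_pass_rate=0.9):
    """Return (approved, reasons). Any blocking reason rejects the release."""
    reasons = []
    if eval_report is None:
        reasons.append("missing eval report")
    else:
        if eval_report["critical_failures"] > 0:
            reasons.append("critical failure class present")
        if eval_report["pass_rate"] < min_pass_rate:
            reasons.append("pass rate below threshold")
    if not rollback_plan:
        reasons.append("missing rollback plan")
    return (not reasons, reasons)


# A release with a clean eval and a named rollback target passes the gate.
print(release_decision({"critical_failures": 0, "pass_rate": 0.95}, "roll back to prompt v12"))
```

Returning the reasons alongside the verdict keeps the gate auditable: the same list can be written into the decision log.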

9. Decision Portfolio

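An AI portfolio is not a project list; it is a portfolio of decisions. Every entry should state why it is funded, to which stage, how risk is controlled, when to scale, and when to stop.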

9.1 Portfolio board

| Use case | Capability | Pattern | Stage | Sourcing | Risk tier | Reuse potential | Funding status | Next decision |
|---|---|---|---|---|---|---|---|---|
| AML investigation copilot | Financial crime operations | RAG + copilot + HITL | Limited production | Hybrid | High | High | Fund scale readiness | Scale after critical failure = 0 for 2 cycles |
| KYC policy assistant | Compliance operations | Policy RAG | Pilot | Buy + internal governance | Medium | Medium | Fund controlled pilot | Expand jurisdictions if citation accuracy passes |
| Credit copilot | Lending operations | RAG + rules + draft | Discovery | Hybrid | High | Medium | Fund architecture discovery | Proceed only after data owner and policy eval |
| Customer-facing AI | Customer experience | Conversational AI + guardrails | Discovery | Buy + platform controls | High | High | Fund risk design only | Pilot after legal-approved answer boundaries |
| Payments ops agent | Payments operations | Event + workflow agent | Parked | Build control layer | High | Medium | Hold | Revisit after tool gateway runway |

9.2 Scale / stop rules

| Decision | Trigger | Evidence required | Action |
|---|---|---|---|
| Scale | Business value proven, critical risk controlled, adoption stable | Eval pass, no critical incidents, usage in target workflow, cost per task acceptable | Expand users, integrate deeper, fund platform reuse |
| Hold | Value plausible but evidence incomplete | Mixed eval, weak adoption, unresolved data issue | Limit scope, run targeted fix, re-review in next gate |
| Merge | Multiple use cases share same platform need | Duplicate model/RAG/eval/tool components | Combine into platform backlog |
| Reduce scope | Risk or cost too high for current automation level | High override rate, expensive tool calls, low trust | Move from recommend/act to inform/draft |
| Stop | No measurable workflow value or unacceptable residual risk | Low adoption, poor eval, no owner, risk rejection | Retire pilot, archive evidence, release budget |
| Retire | Production capability no longer meets quality, policy or cost threshold | Drift, stale knowledge, vendor issue, low usage | Sunset, migrate users, update portfolio |
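
Part of this rule table can be expressed as a small decision function so that portfolio reviews apply it consistently. The sketch below covers only Scale, Reduce scope, Hold, and Stop for a single pilot; Merge and Retire need cross-portfolio and production data and are left out. The evidence fields and the precedence order (stop conditions first) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PilotEvidence:
    eval_passed: bool
    critical_incidents: int
    adoption_stable: bool
    cost_per_task_ok: bool
    has_owner: bool
    risk_accepted: bool


def decide(e):
    """Map pilot evidence to one of: stop, scale, reduce scope, hold."""
    # Stop: nobody owns it, or the risk function rejected the residual risk.
    if not e.has_owner or not e.risk_accepted:
        return "stop"
    quality_ok = e.eval_passed and e.critical_incidents == 0
    # Scale: all four evidence items from the Scale row are present.
    if quality_ok and e.adoption_stable and e.cost_per_task_ok:
        return "scale"
    # Reduce scope: quality holds but cost is too high for this automation level.
    if quality_ok and not e.cost_per_task_ok:
        return "reduce scope"
    # Hold: value is plausible but the evidence is incomplete.
    return "hold"


print(decide(PilotEvidence(True, 0, True, True, True, True)))
```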

9.3 Funding gates

| Gate | Funding decision | Required evidence | Budget type |
|---|---|---|---|
| Gate 0 Intake | Fund discovery or reject | Capability fit score, owner, initial risk tier | Small discovery budget |
| Gate 1 Architecture | Fund prototype / vendor eval | Workflow design, pattern ADR, data readiness | Timeboxed architecture and eval budget |
| Gate 2 Pilot | Fund controlled pilot | Control pack, eval plan, pilot success metric | Pilot budget with stop rule |
| Gate 3 Production | Fund production hardening | Eval report, security/risk signoff, runbook, RACI | Production engineering and ops budget |
| Gate 4 Scale | Fund expansion and platformization | Adoption dashboard, ROI, risk trend, reuse evidence | Scaling and platform runway budget |
| Gate 5 Optimize | Continue, merge, retire | Quarterly review pack | Portfolio budget reallocation |
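
A funding gate becomes executable once its required evidence is checked mechanically. This sketch transcribes the required-evidence column into a lookup and reports what is still missing; the evidence labels are matched as exact strings, which is a simplification, and `missing_evidence` is a hypothetical helper name.

```python
# Required evidence per funding gate, transcribed from the table above.
REQUIRED_EVIDENCE = {
    0: {"capability fit score", "owner", "initial risk tier"},
    1: {"workflow design", "pattern ADR", "data readiness"},
    2: {"control pack", "eval plan", "pilot success metric"},
    3: {"eval report", "security/risk signoff", "runbook", "RACI"},
    4: {"adoption dashboard", "ROI", "risk trend", "reuse evidence"},
    5: {"quarterly review pack"},
}


def missing_evidence(gate, submitted):
    """Return the evidence items still missing for `gate`, sorted for stable output."""
    return sorted(REQUIRED_EVIDENCE[gate] - set(submitted))


# A Gate 3 submission with only an eval report and a runbook is incomplete.
print(missing_evidence(3, ["eval report", "runbook"]))
```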

10. Architecture Review Board for AI

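The AI ARB should not turn into a mere "approval committee". Its value is giving complex decisions clear evidence, accountability, and exception management.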

10.1 AI ARB charter

| Area | Charter |
|---|---|
| Scope | High-risk AI use cases, platform capabilities, vendor architecture, model strategy, tool invocation, customer-facing AI, major changes |
| Decision rights | Approve, approve with conditions, reject, request redesign, grant time-limited exception |
| Required attendees | Enterprise architect, AI architect, product owner, security, privacy, risk, compliance, data owner, platform owner, business sponsor |
| Artifacts reviewed | Capability map, ADR, data flow, risk tier, eval plan/report, control pack, runbook, sourcing memo |
| Cadence | Discovery gate, architecture gate, release gate, scale gate, quarterly portfolio review |
| Output | Decision record, conditions, owner, due date, exception expiry, next review trigger |
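
Time-limited exceptions only work if expiry is checked. As a minimal sketch, assuming the exception log is a list of dictionaries with `id` and `expiry` fields (a hypothetical schema), the quarterly review could flag both expired exceptions and those recorded without any expiry:

```python
from datetime import date


def exceptions_needing_review(exception_log, today):
    """Return IDs of exceptions that have expired or were granted without an expiry."""
    flagged = []
    for entry in exception_log:
        expiry = entry.get("expiry")
        if expiry is None or expiry < today:
            flagged.append(entry["id"])
    return flagged


# Illustrative log entries; the IDs and dates are made up for the example.
log = [
    {"id": "EX-1", "expiry": date(2024, 1, 31)},
    {"id": "EX-2", "expiry": date(2024, 12, 31)},
    {"id": "EX-3", "expiry": None},
]
print(exceptions_needing_review(log, date(2024, 6, 30)))
```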

10.2 ARB submission pack template

# AI Architecture Review Pack

## 1. Decision requested
- Approve architecture direction / pilot / production release / scale / exception

## 2. Capability and business context
- Capability:
- Workflow:
- Users:
- Value metric:
- Risk tier:

## 3. Architecture summary
- Pattern:
- Model strategy:
- Data and context sources:
- Tools and actions:
- Human oversight:
- Platform dependencies:

## 4. Key decisions
| Decision | Options | Recommendation | Evidence | Residual risk |
|---|---|---|---|---|

## 5. Controls and eval
- Preventive controls:
- Detective controls:
- Corrective controls:
- Eval threshold:
- Critical failure classes:

## 6. Operating model
- Product owner:
- Data owner:
- Eval owner:
- Incident owner:
- Risk owner:

## 7. Decision log
- Approved / conditional / rejected:
- Conditions:
- Exception expiry:
- Next review:

10.3 AI architecture compliance checklist

| Checklist item | Pass criteria |
|---|---|
| Capability fit | Use case maps to approved capability theme and has owner |
| Decision boundary | AI role and human accountability are explicit |
| Data boundary | Prompt, index, logs, eval, analytics data flows documented |
| Permission | Retrieval and tool access enforce user/use-case entitlement |
| Eval | Offline eval and production monitoring cover critical failures |
| Governance | Risk tier, approval, exception and rollback are documented |
| Sourcing | Vendor/internal boundaries and exit triggers are clear |
| Observability | Quality, cost, latency, adoption, incident metrics are defined |
| Runbook | Incident, kill switch, fallback and owner escalation exist |
| Portfolio | Scale/stop rule and funding gate are defined |

11. Financial Retail Cases

11.1 AML platform

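| Dimension | Architecture decision |
|---|---|
| Product intent | Improve investigators' evidence gathering, case narratives, and red-flag consistency; does not automatically decide SAR filing |
| AI pattern | RAG + entity graph + summarization + workflow copilot |
| Platform capability | Model gateway, audit evidence, entity resolution, case trace, eval harness |
| Local domain asset | AML typology, red flag library, SAR narrative standard, case disposition codes |
| Build/buy/hybrid | Hybrid: buy generic extraction / summarization capability; keep audit, HITL, policy boundary, and case workflow under internal control |
| Eval focus | Evidence completeness, citation accuracy, false narrative, missed red flag, overstatement |
| Scale rule | No critical factual error across two review cycles; case prep time falls; analyst trust is stable |
| Stop rule | AI narrative introduces high-risk errors; users bypass human judgment; audit trail cannot be reconstructed |

AML portfolio artifacts:

| Artifact | What it demonstrates |
|---|---|
| AML capability map | Ability to decompose financial crime operations into investable capabilities |
| AML AI decision boundary | Ability to show that AI does not replace compliance judgment |
| AML eval matrix | Ability to turn regulatory risk into measurable failure classes |
| AML audit evidence pack | Ability to reconstruct input, retrieval, model, output, human approval, and final action |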

11.2 KYC policy assistant

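| Dimension | Architecture decision |
|---|---|
| Product intent | Let KYC analysts interpret policy quickly, generate missing-document checklists, and compare jurisdiction differences |
| AI pattern | Policy RAG + citation + effective-date filter + answer boundary |
| Platform capability | Knowledge ingestion, metadata schema, permission filter, prompt registry |
| Local domain asset | KYC policy hierarchy, jurisdiction, customer type, product eligibility |
| Build/buy/hybrid | Buy + internal governance: buy the knowledge assistant capability; manage policy versions, citation standards, and compliance approval internally |
| Eval focus | Citation correctness, policy freshness, jurisdiction match, refusal on unknown policy |
| Scale rule | Citation accuracy meets target; policy owner update process is stable; analyst adoption is high |
| Stop rule | Cites expired policy; mixes jurisdictions in one answer; cannot prove the effective version |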
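
The effective-date filter, jurisdiction match, and refusal on unknown policy can be sketched together. Everything below is illustrative: `PolicyVersion`, the sample policy IDs, and the jurisdiction codes are hypothetical, and a real assistant would apply this filter as retrieval metadata rather than over an in-memory list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PolicyVersion:
    policy_id: str
    jurisdiction: str
    effective_from: date
    effective_to: Optional[date] = None  # None means still in force


# Hypothetical sample data for the example.
POLICIES = [
    PolicyVersion("KYC-SG-v1", "SG", date(2022, 1, 1), date(2023, 12, 31)),
    PolicyVersion("KYC-SG-v2", "SG", date(2024, 1, 1)),
    PolicyVersion("KYC-HK-v1", "HK", date(2023, 6, 1)),
]


def effective_policies(versions, jurisdiction, as_of):
    """Keep only versions for this jurisdiction that are in force on `as_of`."""
    return [
        v for v in versions
        if v.jurisdiction == jurisdiction
        and v.effective_from <= as_of
        and (v.effective_to is None or as_of <= v.effective_to)
    ]


def cite_or_refuse(versions, jurisdiction, as_of):
    """Answer only with an effective version; otherwise refuse explicitly."""
    hits = effective_policies(versions, jurisdiction, as_of)
    if not hits:
        return "REFUSE: no effective policy on record"
    return "CITE: " + ", ".join(
        f"{v.policy_id} (effective {v.effective_from.isoformat()})" for v in hits
    )


print(cite_or_refuse(POLICIES, "SG", date(2024, 6, 1)))
```

Filtering by jurisdiction before answering addresses the cross-jurisdiction stop rule, and always citing the effective date gives reviewers a direct check on policy freshness.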

11.3 Credit copilot

| Dimension | Architecture decision |
|---|---|
| Product intent | Support RMs, underwriters, and credit officers in preparing loan memos, policy citations, and missing-document suggestions |
| AI pattern | RAG + rules + structured extraction + controlled draft |
| Platform capability | Tool gateway, document processing, eval runner, approval queue |
| Local domain asset | Credit policy, risk appetite, scorecard boundary, exception approval rules |
| Build/buy/hybrid | Hybrid: buy OCR/extraction; build credit policy control, decision boundary, and audit in-house |
| Eval focus | Missing document detection, policy citation, unfair bias signal, unsupported recommendation |
| Scale rule | Memo prep cycle time falls; policy citation is accurate; approver override reasons can be learned from |
| Stop rule | AI implies the final credit decision; output bias risk is uncontrollable; users over-trust |

11.4 Customer-facing AI

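| Dimension | Architecture decision |
|---|---|
| Product intent | Provide customer service that is compliant, citable, and escalatable to a human; does not replace regulated advice or decisions on customer rights |
| AI pattern | Conversational AI + retrieval + intent routing + guardrails + human handoff |
| Platform capability | Model gateway, conversation audit, policy filters, PII redaction, handoff integration |
| Local domain asset | Product terms, customer eligibility, fee policy, complaint-handling scripts |
| Build/buy/hybrid | Buy + platform controls: buy mature conversational capability; control knowledge, compliance boundaries, audit, and escalation path internally |
| Eval focus | Unauthorized promise, wrong fee/term, privacy leak, failure to escalate, tone and clarity |
| Scale rule | Escalation quality, containment with safety, complaint trend, and customer trust meet target |
| Stop rule | Misleads customers; outputs unauthorized promises; complaints or compliance incidents rise |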

12. Executive Memo Language

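A senior product architect must be able to translate complex architecture decisions into executive language: value, risk, options, evidence, funding, and accountability.

12.1 Executive framing

| Technical phrasing | Executive memo phrasing |
|---|---|
| We want to build a model gateway | We need unified model access and audit control; otherwise every AI project creates invisible cost, untraceable output, and vendor lock-in |
| We want to do RAG | We want AI answers bound to approved knowledge sources, permissions, and effective versions, rather than letting the model answer from memory |
| We need eval | We need to turn AI quality from subjective impression into a pre-release gate and a post-release risk metric |
| We want a tool gateway | Before AI calls internal systems, we need unified permissions, approval, audit, and a kill switch |
| We recommend hybrid | Speed comes from buying mature components; control stays in the data, policy, eval, audit, and high-risk action layers |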

12.2 One-page executive memo template

# Executive Decision Memo: AI Product Architecture

## Recommendation
Approve a controlled hybrid architecture for [capability], funding Gate [x] with explicit scale/stop rules.

## Why now
- Business pressure:
- Operational pain:
- Risk of inaction:
- Strategic capability built:

## Options
| Option | Value | Risk | Time | Control | Recommendation |
|---|---|---|---|---|---|

## Architecture decision
- AI pattern:
- Platform capabilities reused:
- Components bought:
- Components built:
- Human decision boundary:
- Data and audit boundary:

## Evidence
- Baseline:
- Eval result:
- Pilot/adoption evidence:
- Risk review:
- Cost model:

## Funding gate
- Approved spend:
- Gate duration:
- Required evidence before next funding:
- Stop rule:

## Executive ask
- Decision requested:
- Named accountable owner:
- Next review date:

12.3 Typical executive decision statements

| Situation | Ready-to-use statement |
|---|---|
| Premature platformization | “We should not fund a broad AI platform until two or more priority workflows prove repeatable requirements for model access, retrieval, eval and audit.” |
| Vendor solution is too much of a black box | “The vendor can accelerate the user workflow, but production approval requires enterprise-controlled audit, eval, data retention and exit rights.” |
| High-risk automation | “This workflow can use AI to prepare and recommend, but accountable decisions remain with trained staff until eval and oversight evidence justify further automation.” |
| A pilot needs to be stopped | “The pilot produced learning value but not enough workflow value or risk control to justify scaling; funding should move to the shared retrieval and eval runway.” |
| Architecture runway is needed | “The bottleneck is no longer model access; it is reusable context, eval, tool permission and operating controls across multiple AI use cases.” |

13. Portfolio Artifacts

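These artifacts can be used directly as a job-search portfolio, internal review material, or interview talking points.

13.1 Artifact map

| Artifact | Target reader | Senior capability demonstrated | Minimum content |
|---|---|---|---|
| AI Capability Map | Executive, EA, Product | Can turn business strategy into AI investment themes | Capability, value, workflow, risk, reuse |
| AI Pattern ADR Set | Architect, PM, Risk | Can make architecture trade-offs and record the evidence | Options, decision, evidence, risk, reversal trigger |
| Platform Boundary Map | Platform, business teams | Can control the platformization boundary | Shared vs local responsibility |
| Build/Buy/Hybrid Memo | Executive, procurement | Can set sourcing strategy | Options, cost, control, lock-in, exit |
| Architecture Runway Roadmap | EA, engineering, sponsor | Can turn pilots into reusable capability | Quarter, runway, use case, gate, evidence |
| Decision Portfolio Board | Executive, PMO | Can manage a portfolio | Stage, funding, risk, scale/stop |
| AI ARB Pack | Architecture board | Can organize a formal review | Decision request, architecture, controls, owners |
| Scale/Stop Rule Sheet | Sponsor, product ops | Can prevent sunk cost | Trigger, evidence, action |
| Executive Memo | Senior leadership | Can express architecture decisions in executive language | Recommendation, why now, options, funding |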

13.2 Portfolio evidence narrative

# Portfolio Story: AI Product Architecture Strategy

## Situation
Financial retail organization had multiple AI PoCs in AML, KYC, credit and customer service, but no consistent architecture runway, eval gate, governance or sourcing strategy.

## Task
Create a portfolio-level AI product architecture strategy that identifies reusable platform capabilities, defines build/buy/hybrid boundaries and establishes funding gates.

## Actions
- Built capability-to-architecture map across four regulated workflows.
- Defined model/data/tool/context/eval/governance decomposition.
- Created ARB submission pack and scale/stop rules.
- Recommended hybrid sourcing for high-risk workflows.
- Converted pilot evidence into architecture runway roadmap.

## Result
- Reduced duplicated AI infrastructure decisions.
- Created clear production gates for high-risk AI.
- Identified shared platform investments: model gateway, retrieval, eval, audit and tool gateway.
- Established executive funding language for scale, hold and stop decisions.

## Artifacts
- Capability map
- Architecture runway
- Decision portfolio
- Build/buy/hybrid memo
- ARB pack
- Executive memo

14. 30-Day Lab

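Goal: produce a presentable AI Product Architecture Strategy portfolio in 30 days. No day is spent on basic BA exercises; every day produces a senior decision asset.

| Day | Focus | Output |
|---|---|---|
| 1 | Choose a financial retail enterprise background and 4 AI use cases | Portfolio scenario brief |
| 2 | Define enterprise AI capability themes | AI capability map v1 |
| 3 | Score AML, KYC, credit, and customer AI | Capability fit scorecard |
| 4 | Design the workflow insertion point for each use case | Decision boundary map |
| 5 | Choose AI patterns: RAG, copilot, agent, rules, workflow | Pattern decision matrix |
| 6 | Decompose model/data/tool/context/eval/governance | AI stack decomposition |
| 7 | Produce 3 key ADRs | ADR set v1 |
| 8 | Design the platform vs point solution boundary | Platform boundary map |
| 9 | Define reusable platform capabilities | Platform capability backlog |
| 10 | Build the build/buy/hybrid matrix | Sourcing strategy table |
| 11 | Write the sourcing memo for AML | AML build/buy/hybrid memo |
| 12 | Write the policy RAG architecture for KYC | KYC architecture one-pager |
| 13 | Write the risk boundary for the credit copilot | Credit decision boundary memo |
| 14 | Write the guardrail design for customer-facing AI | Customer AI control map |
| 15 | Define the eval failure taxonomy | Eval taxonomy and thresholds |
| 16 | Design the ARB charter | AI ARB charter |
| 17 | Complete the ARB submission pack | ARB pack v1 |
| 18 | Define funding gates | Funding gate model |
| 19 | Write scale/stop rules | Scale/stop rule sheet |
| 20 | Build the 12-month architecture runway | Roadmap v1 |
| 21 | Build the decision portfolio board | Portfolio board |
| 22 | Design the operating model RACI | AI product architecture RACI |
| 23 | Design executive dashboard metrics | Portfolio metrics |
| 24 | Write the executive memo | One-page executive memo |
| 25 | Build the quarterly review pack | Quarterly AI review pack |
| 26 | Self red-team: find where platformization went too far | Red-team notes |
| 27 | Self red-team: find vendor lock-in points | Lock-in and exit analysis |
| 28 | Self red-team: find where governance is not enforceable | Governance execution gaps |
| 29 | Compile the portfolio narrative | Portfolio story |
| 30 | Prepare the interview walkthrough script | 10-minute architecture strategy pitch |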

30-day acceptance criteria

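| Evidence | Done standard |
|---|---|
| Capability map | At least 4 use cases mapped to capability, risk, reuse, value |
| ADR set | At least 5 key architecture decisions, each with options, evidence, risk, trigger |
| Platform boundary | Clearly separates shared platform from local domain responsibility |
| Sourcing memo | At least one high-risk use case has a build/buy/hybrid recommendation |
| ARB pack | Can support an approve / conditional / reject decision |
| Roadmap | Includes runway, use case, gate, evidence, quarterly review |
| Executive memo | States recommendation, why now, funding gate, and stop rule within one page |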

15. Interview Answers

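Q1: What is the difference between an AI Product Architect and an AI PM?

30-second answer:

An AI PM is primarily accountable for user value, product scope, adoption, and business outcomes. An AI Product Architect goes further and is responsible for organizing multiple AI use cases into a reusable, governable, and evolvable system of product architecture decisions. The role asks not only "which features do we build" but also "which capabilities become platform, which stay local, how to decide build/buy/hybrid, and how to define the architecture runway, eval gates, risk controls, and scale/stop rules."

2-minute answer:

In a retail financial services firm, an AI PM can successfully ship a KYC assistant or an AML copilot. But if every team integrates models, builds RAG, sets up evals, and writes audit logs on its own, the enterprise accumulates PoC debt. The value of the AI Product Architect is to lift single-product judgment to the portfolio and architecture runway level: unify the model gateway, knowledge ingestion, eval harness, tool permission, and audit evidence while preserving the domain differences across AML, credit, and KYC. The role requires fluency in product strategy, architecture governance, vendor strategy, risk control, and executive communication at the same time.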

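Q2: When should you build an AI platform, and when should you build only a point solution?

30-second answer:

Platformize when several high-value use cases repeatedly need the same capabilities, such as model access, RAG, eval, audit, tool permission, and observability. When the workflow is unique, the risk is low, and the pattern is unproven, learn with a point solution first and do not build a platform prematurely.

2-minute answer:

My criterion is repeatability evidence. A first AI demo is not enough to prove platform demand. I need to see at least two prioritized workflows sharing the same architectural pain points, for example every team needing permission-filtered retrieval, prompt version control, release evals, and audit evidence. At that point, platformization lowers duplicated cost and improves governance consistency. Business domain rules, policy interpretation, failure taxonomy, and user experience usually remain with the local use case owner. The platform owns horizontal capabilities; the business owns domain semantics and value outcomes.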
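The repeatability test can be sketched as a small check over a workflow inventory. This is a minimal illustration, not a prescribed tool: the `Workflow` type, the `platform_candidates` function, the pain-point labels, and the sample workflows are all hypothetical names introduced here, and the threshold of two prioritized workflows is taken from the answer above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical inventory record for one AI workflow."""
    name: str
    prioritized: bool
    pain_points: frozenset[str]


def platform_candidates(workflows: list[Workflow], min_workflows: int = 2) -> set[str]:
    """Return pain points shared by at least `min_workflows` prioritized workflows."""
    counts = Counter(
        pain
        for wf in workflows
        if wf.prioritized          # demos and unprioritized ideas do not count
        for pain in wf.pain_points
    )
    return {pain for pain, n in counts.items() if n >= min_workflows}


# Illustrative data only; the labels are assumptions, not a standard taxonomy.
workflows = [
    Workflow("kyc_assistant", True, frozenset({"permission_filtered_retrieval", "release_eval"})),
    Workflow("aml_copilot", True, frozenset({"permission_filtered_retrieval", "audit_evidence"})),
    Workflow("marketing_demo", False, frozenset({"audit_evidence", "release_eval"})),
]
candidates = platform_candidates(workflows)
print(candidates)
```

In this sample only permission-filtered retrieval qualifies, because the unprioritized demo does not count toward repeatability.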

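Q3: How do you decide between build, buy, and hybrid?

30-second answer:

I decide per component, not for the whole system. General-purpose models, OCR, and basic conversational capability can be bought; differentiating workflows, policy boundaries, the eval gold set, audit, tool permission, and high-risk decision controls usually need to be built or kept under enterprise control. Most high-value retail finance scenarios end up hybrid.

2-minute answer:

The risks of buying the whole package are a vendor black box, insufficient data and audit access, and a difficult exit; the risks of building everything are slow delivery and high operating cost. Hybrid is more practical: buy mature components for speed, such as models, extraction, conversational UI, or basic retrieval; build the control layer in-house, such as the model gateway, policy guardrails, eval gate, audit evidence, tool approval, and exit abstraction. An AML copilot is a typical hybrid: summarization and extraction can be bought, but the SAR decision boundary, evidence chain, human review, and audit must stay under enterprise control.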
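Deciding per component can be made concrete with a small sourcing table. The sketch below encodes the AML copilot example from the answer above; the `Sourcing` enum, the component keys, and `overall_strategy` are hypothetical names chosen for illustration, and `BUILD` here stands for "build or keep under enterprise control".

```python
from enum import Enum


class Sourcing(Enum):
    BUY = "buy"
    BUILD = "build"  # build in-house or otherwise keep under enterprise control


# Component-level decisions for the AML copilot example (illustrative keys).
AML_COPILOT = {
    "summarization": Sourcing.BUY,
    "extraction": Sourcing.BUY,
    "sar_decision_boundary": Sourcing.BUILD,
    "evidence_chain": Sourcing.BUILD,
    "human_review": Sourcing.BUILD,
    "audit": Sourcing.BUILD,
}


def overall_strategy(components: dict[str, Sourcing]) -> str:
    """Derive the system-level label from the component-level decisions."""
    if not components:
        raise ValueError("at least one component decision is required")
    choices = set(components.values())
    if len(choices) > 1:
        return "hybrid"
    return next(iter(choices)).value


print(overall_strategy(AML_COPILOT))
```

The system-level label falls out of the component decisions rather than being chosen up front.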

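Q4: How do you turn an AI pilot into architecture runway?

30-second answer:

Every pilot should declare a reuse hypothesis from the start: which future reusable capabilities it will validate. When the pilot ends, look beyond business results and also extract the platform backlog, ADRs, eval templates, control patterns, and roadmapping evidence.

2-minute answer:

At the pilot gate I require three kinds of evidence: business evidence, risk evidence, and reuse evidence. Business evidence proves workflow value; risk evidence proves that eval, HITL, audit, and the incident path are under control; reuse evidence shows which capabilities can be abstracted into the platform, such as retrieval permission, prompt registry, tool gateway, or eval runner. If the pilot's value is mediocre but it exposes a critical platform bottleneck, we may choose not to scale the use case and still invest in the architecture runway. This avoids treating a failed pilot as pure waste.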
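The three evidence types can be read as two separate decisions at the pilot gate. The sketch below is one possible encoding under stated assumptions: `PilotEvidence` and `pilot_gate` are hypothetical names, and the rule "scale needs business and risk evidence, runway investment needs reuse evidence" is a simplification of the reasoning above rather than a fixed policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotEvidence:
    """Hypothetical summary of what a pilot produced at its gate."""
    business_value_proven: bool              # workflow value demonstrated
    risk_controls_proven: bool               # eval, HITL, audit, incident path under control
    reusable_capabilities: tuple[str, ...]   # e.g. "prompt_registry", "eval_runner"


def pilot_gate(evidence: PilotEvidence) -> dict[str, bool]:
    """Decide use-case scaling and runway investment independently."""
    scale_use_case = evidence.business_value_proven and evidence.risk_controls_proven
    invest_in_runway = len(evidence.reusable_capabilities) > 0
    return {"scale_use_case": scale_use_case, "invest_in_runway": invest_in_runway}


# A pilot with mediocre value that still exposed reusable platform capabilities.
mediocre_pilot = PilotEvidence(
    business_value_proven=False,
    risk_controls_proven=True,
    reusable_capabilities=("retrieval_permission", "eval_runner"),
)
print(pilot_gate(mediocre_pilot))
```

Keeping the two outputs separate is what allows a pilot to stop as a use case while still funding the runway it uncovered.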

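Q5: What should an AI architecture review board look at?

30-second answer:

An AI ARB should not review only the system diagram. It should review the decision boundary, data flow, model strategy, context design, tool permission, eval threshold, risk controls, rollback, owner, and scale/stop rule.

2-minute answer:

Traditional architecture review focuses on services, APIs, databases, and deployment. An AI ARB must also examine how model output influences business decisions, whether retrieval is filtered by permission, whether tool calls exceed their authority, whether prompt/model/knowledge changes are regression-tested, whether high-risk outputs get human review, and how production quality and cost are monitored. The review outcome should be approve, conditional approve, reject, or time-limited exception, recorded with the owner, conditions, expiry date, and next trigger point.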
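The review outcome and its recorded fields map naturally onto a decision register entry. The following is a minimal sketch, assuming a hypothetical `ArbDecision` record; the validation rules (a conditional approval needs conditions, a time-limited exception needs an expiry date) are assumptions added for illustration, and the sample owner, conditions, and date are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class ArbOutcome(Enum):
    APPROVE = "approve"
    CONDITIONAL_APPROVE = "conditional approve"
    REJECT = "reject"
    TIME_LIMITED_EXCEPTION = "time-limited exception"


@dataclass
class ArbDecision:
    """Hypothetical decision register entry."""
    use_case: str
    outcome: ArbOutcome
    owner: str
    conditions: list[str] = field(default_factory=list)
    expiry: Optional[date] = None
    next_trigger: str = ""

    def validation_errors(self) -> list[str]:
        """Return the reasons this record is incomplete (empty list if none)."""
        errors = []
        if not self.owner:
            errors.append("owner is required")
        if self.outcome is ArbOutcome.CONDITIONAL_APPROVE and not self.conditions:
            errors.append("conditional approve needs at least one condition")
        if self.outcome is ArbOutcome.TIME_LIMITED_EXCEPTION and self.expiry is None:
            errors.append("time-limited exception needs an expiry date")
        return errors

    def is_expired(self, today: date) -> bool:
        """True once the expiry date has passed; records without expiry never expire."""
        return self.expiry is not None and today > self.expiry


decision = ArbDecision(
    use_case="aml_copilot",
    outcome=ArbOutcome.TIME_LIMITED_EXCEPTION,
    owner="aml_product_owner",
    conditions=["human review on every SAR draft"],
    expiry=date(2025, 6, 30),
    next_trigger="model upgrade",
)
print(decision.validation_errors(), decision.is_expired(date(2025, 7, 1)))
```

An expired exception would then surface in the next review cycle instead of silently becoming permanent.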

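Q6: How do you explain to executives why a model gateway and an eval harness are needed?

30-second answer:

I would not start from technical components. I would say: the model gateway is the enterprise control plane for model cost, audit, permissions, fallback, and vendor substitution; the eval harness turns AI quality from subjective impression into a release gate and an ongoing risk indicator.

2-minute answer:

If every team connects directly to different models, the enterprise cannot answer which data was sent to whom, which model produced which output, where cost is attributed, or whether a model upgrade broke business quality. The model gateway addresses control and substitutability. The eval harness addresses evidence: it tells us whether changes to the model, prompt, knowledge, or policy introduce critical errors. Executives do not need token and embedding details, but they do need to know that these capabilities directly reduce production risk, audit risk, vendor lock-in, and wasteful expansion.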

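Q7: How do you define the scale rule for customer-facing AI?

30-second answer:

Customer-facing AI cannot be judged on containment or call volume alone. The scale rule must consider correctness, authorization boundaries, escalation quality, complaint trends, privacy incidents, customer trust, and cost together.

2-minute answer:

I split the scale rule for customer-facing AI into four categories. First, quality: accuracy and citation accuracy on critical questions meet the threshold. Second, safety: no unauthorized commitments, personalized financial advice, or privacy leaks. Third, operations: handoff to a human carries complete context and does not increase complaints. Fourth, economics: cost per resolved case is reasonable. Whenever customers are misled, a compliance incident occurs, complaints rise, or the system cannot be audited, hold or roll back rather than keep expanding because the deflection rate is high.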
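The four categories and the stop triggers can be sketched as a single decision function. This is an illustrative encoding only: `ScaleSignals`, `scale_decision`, and the three return labels are hypothetical, and how each boolean is measured (thresholds, time windows) is left open because the answer above does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleSignals:
    """Hypothetical rollup of signals for one customer-facing AI use case."""
    # The four scale categories
    quality_ok: bool       # critical-question accuracy and citation accuracy meet threshold
    safety_ok: bool        # no unauthorized commitments, personalized advice, privacy leaks
    operations_ok: bool    # handoff carries full context and does not raise complaints
    economics_ok: bool     # cost per resolved case is reasonable
    # Stop triggers
    customer_misled: bool = False
    compliance_incident: bool = False
    complaints_rising: bool = False
    auditable: bool = True
    # Recorded for reporting but deliberately not used in the decision
    deflection_rate: float = 0.0


def scale_decision(s: ScaleSignals) -> str:
    """Return 'hold_or_rollback', 'scale', or 'hold'."""
    stop_trigger = (
        s.customer_misled
        or s.compliance_incident
        or s.complaints_rising
        or not s.auditable
    )
    if stop_trigger:
        return "hold_or_rollback"
    if all((s.quality_ok, s.safety_ok, s.operations_ok, s.economics_ok)):
        return "scale"
    return "hold"


high_deflection_but_complaints = ScaleSignals(
    quality_ok=True, safety_ok=True, operations_ok=True, economics_ok=True,
    complaints_rising=True, deflection_rate=0.9,
)
print(scale_decision(high_deflection_but_complaints))
```

Because `deflection_rate` never enters the decision, a high deflection rate cannot override a stop trigger.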

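Q8: How do you demonstrate Enterprise Architect-level AI product judgment?

30-second answer:

I would show more than a single PRD: a set of portfolio-level artifacts, including the AI capability map, architecture runway, decision portfolio, build/buy/hybrid memo, ARB pack, scale/stop rule, and executive memo. They demonstrate that I can move from a single feature up to enterprise capability, governance, investment, and evolution.

2-minute answer:

Enterprise Architect-level judgment shows in three ways. First, translating business strategy into capabilities and a target architecture rather than starting from a feature list. Second, governing the commonalities and differences across multiple AI use cases, with clear platform capabilities, local domain capabilities, and vendor boundaries. Third, managing the lifecycle, including funding gates, architecture review, risk exceptions, scale/stop, and retirement. My portfolio is built around four retail finance cases (AML, KYC, credit, and customer AI) and shows how to establish a reusable architecture runway and decision portfolio.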


16. Final Checklist

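| Area | Senior-level check question |
| --- | --- |
| Capability | Can you explain which enterprise capability this AI use case belongs to and why it is worth investing in? |
| Workflow | Is the AI's role clearly defined as inform, draft, recommend, act with approval, or autonomous act? |
| Architecture | Have model, data, tool, context, eval, and governance been decomposed? |
| Platform | Is there evidence for platformization rather than building a platform on intuition? |
| Sourcing | Are build/buy/hybrid decisions made per component? |
| Runway | Can reusable capabilities be extracted from the pilot? |
| Governance | Are there an ARB, funding gates, exceptions, rollback, and owners? |
| Portfolio | Are there decision rules for scale, hold, merge, stop, and retire? |
| Executive language | Can the technical architecture be translated into value, risk, control, funding, and accountability? |
| Portfolio evidence | Can the artifacts above be packaged into a portfolio for interviews and job applications? |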

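In one sentence:

The core of AI Product Architecture Strategy is not "getting AI to do more things" but making sure the enterprise knows which AI capabilities are worth investing in, how they can be safely reused, when to expand, when to stop, and what evidence and accountability stand behind every decision.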