AI Vendor / Build-Buy / Adoption Playbook

AI Vendor / Build-vs-Buy / Enterprise Adoption Playbook

Purpose: Establish an AI vendor evaluation, build-vs-buy decision, enterprise adoption and change management playbook for AI BA / AI PM / AI Solutions Architect / Enterprise Architect roles.

Scope: AI scenarios in financial retail enterprises, including AML, KYC, customer service, payments operations, lending, compliance and risk operations.

Note: This document is learning, architecture and portfolio material, not legal, compliance, procurement or investment advice. Real projects must be reviewed by legal, compliance, procurement, security, privacy, risk and the business owner.

1. Core Positioning

AI procurement and rollout is not "pick a model". The real questions are:

Is the business problem worth solving with AI, what is the baseline, and what is the no-AI option?
Should we buy, build, partner, or go hybrid?
Does the vendor meet the requirements for data privacy, security, governance, eval, SLA, cost, lock-in, customization, integration, audit, incident response and procurement?
Is the architecture evaluable, reversible, auditable and replaceable?
Do users actually change how they work, and does the organization have an owner, controls, eval, runbook, adoption metrics and a vendor review cadence?

Role responsibilities:

| Role | Core task | Key evidence |
|---|---|---|
| AI BA | Define the problem, stakeholder evidence, workflow, requirements-to-eval | BPMN, stakeholder map, eval matrix |
| AI PM | Define user value, MVP scope, pilot, rollout, adoption, business case | PRD, scorecard, adoption dashboard |
| Solutions Architect | Define RAG/agent/model gateway/integration/security/eval architecture | ADR, control pack, threat model |
| Enterprise Architect | Define portfolio fit, standards, target architecture, governance model | capability map, roadmap, review pack |
| EvalOps / Product Ops | Define eval dataset, release gate, monitoring, feedback loop | eval report, runbook, incident review |
| Procurement / Vendor Owner | Define supplier risk, contract terms, SLA, exit plan | due diligence pack, commercial model |

2. ABPA Template Alignment

This document does not replace docs/abpa/templates/; it explains how to use the templates in combination.

Template	Usage
`01-ai-opportunity-canvas.md`	Prove the problem is worth solving; record the no-AI option and baseline
`02-stakeholder-evidence-map.md`	Identify business, risk, legal, security, data, IT and frontline objections
`03-bpmn-pain-metrics.md`	Map AS-IS / TO-BE workflow, HITL and exception paths
`04-requirements-to-eval-matrix.md`	Turn requirements into vendor demo scripts, eval cases and release gates
`05-ai-control-pack.md`	Turn AI risks into preventive, detective and corrective controls
`06-executive-decision-memo.md`	Record the buy/build/partner/hybrid recommendation and funding gate
`07-data-readiness-pack.md`	Assess source of truth, quality, PII, lineage, retention and access
`08-ai-architecture-adr-set.md`	Record model/provider, RAG, HITL, eval, integration and audit decisions
`09-operating-model-raci.md`	Assign product, data, eval, risk, vendor, incident and adoption owners
`10-adoption-dashboard.md`	Track activation, usage, trust, quality and business outcome
`11-business-case.md`	Build TCO, unit economics, benefit and risk-adjusted ROI
`12-portfolio-evidence-map.md`	Package the artifacts as interview and portfolio evidence

3. Source And Standard Anchors

These anchors are used to organize questions and evidence; they are not legal or procurement conclusions.

Anchor	Official / primary source	Usage
NIST AI RMF 1.0	https://www.nist.gov/itl/ai-risk-management-framework	Use Govern, Map, Measure, Manage to organize AI risk, eval, monitoring and governance
NIST AI 600-1 GenAI Profile	https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence	Use for GenAI risk, content provenance, incident and model behavior controls
EU AI Act, Regulation (EU) 2024/1689	https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng	Use the risk-based lens to identify high-risk use cases, transparency, human oversight and documentation
ISO/IEC 42001:2023	https://www.iso.org/standard/42001	Use the AI management system to establish lifecycle, accountability, policy and continual improvement
OWASP Top 10 for LLM Applications 2025	https://genai.owasp.org/llm-top-10/	Use prompt injection, sensitive data disclosure, supply chain and excessive agency for security review
SOC Suite / SOC 2	https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services	Use for vendor security, availability, confidentiality, privacy and processing integrity diligence
ISO/IEC 27001:2022	https://www.iso.org/standard/27001	Use for ISMS, security risk management and certification evidence
TOGAF Standard, 10th Edition	https://www.opengroup.org/togaf-standard-10th-edition-downloads	Use ADM, architecture governance, capability planning and roadmap to manage architecture decisions
Standards-to-artifacts:

NIST AI RMF -> AI Control Pack, risk register, eval gate, monitoring plan.
NIST GenAI Profile -> GenAI risk checklist, red-team plan, incident criteria.
EU AI Act -> risk classification memo, transparency checklist, human oversight memo.
ISO/IEC 42001 -> AI operating model, accountability matrix, lifecycle cadence.
OWASP LLM Top 10 -> LLM threat model, prompt/tool/data controls, red-team backlog.
SOC 2 / ISO 27001 -> vendor security evidence checklist, security review gates.
TOGAF -> capability map, architecture roadmap, ADR, governance board decision.

4. Decision Flow And Stage Gates

Business problem
  -> AI fit and no-AI option
  -> data readiness and risk tier
  -> build / buy / partner / hybrid options
  -> vendor due diligence or internal platform readiness
  -> architecture ADR
  -> pilot success criteria
  -> security, risk, compliance, procurement gates
  -> controlled rollout
  -> adoption dashboard
  -> ongoing governance and vendor review

Gate	Decision	Required evidence	Stop signal
Gate 0: Problem	Should we explore AI?	Opportunity canvas, baseline, owner	No measurable pain
Gate 1: Feasibility	Is AI plausible?	Data readiness, workflow map, risk tier	No data owner or unacceptable risk
Gate 2: Option	Build, buy, partner, or hybrid?	Decision matrix, TCO, lock-in analysis	Preference-only selection
Gate 3: Vendor / architecture	Safe enough for pilot?	Due diligence, ADR, control pack	No audit, privacy, security, eval path
Gate 4: Pilot	Did pilot prove value and control?	Eval report, feedback, pilot metrics	Poor quality or low trust
Gate 5: Production	Can we operate this?	RACI, runbook, incident response	No owner for data, eval, vendor
Gate 6: Scale	Should we expand?	Adoption dashboard, ROI, risk review	Usage without quality or value
Decision rules:

Do not start vendor selection before defining workflow and success metric.
Do not compare vendors only by model benchmark.
Do not let procurement lead alone without architecture, data, risk and users.
Do not let a PoC bypass production controls because it looks useful.
Do not scale without adoption metrics and rollback criteria.
Do not accept "vendor handles it" without evidence, contract terms and operational workflow.
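
The gate table above can be read as an ordered checklist: a project stops at the first gate whose required evidence is incomplete. A minimal Python sketch, assuming evidence is tracked as a set of artifact names; `GATES` and `first_blocked_gate` are hypothetical names for illustration and are not part of any template:

```python
from __future__ import annotations

from typing import Optional

# Required evidence per gate, copied from the stage-gate table.
# Representing it as an ordered list of sets is an illustrative assumption.
GATES = [
    ("Gate 0: Problem", {"opportunity canvas", "baseline", "owner"}),
    ("Gate 1: Feasibility", {"data readiness", "workflow map", "risk tier"}),
    ("Gate 2: Option", {"decision matrix", "tco", "lock-in analysis"}),
    ("Gate 3: Vendor / architecture", {"due diligence", "adr", "control pack"}),
    ("Gate 4: Pilot", {"eval report", "feedback", "pilot metrics"}),
    ("Gate 5: Production", {"raci", "runbook", "incident response"}),
    ("Gate 6: Scale", {"adoption dashboard", "roi", "risk review"}),
]


def first_blocked_gate(evidence: set[str]) -> Optional[tuple[str, set[str]]]:
    """Return the first gate with missing evidence, plus what is missing."""
    have = {item.lower() for item in evidence}
    for name, required in GATES:
        missing = required - have
        if missing:
            return name, missing
    return None  # every gate has its required evidence


blocked = first_blocked_gate(
    {"Opportunity canvas", "Baseline", "Owner", "Data readiness", "Workflow map"}
)
print(blocked)
```

Presence of an artifact is only a necessary condition here; the stop signals in the table still need human judgment.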

5. AI Vendor Due Diligence

Use this table as the first-pass diligence map. Expand high-risk rows into detailed questionnaires.

Dimension	Key questions	Evidence to request	Red flags
Model / provider	Which foundation models, embeddings and rerankers are used? Can versions be frozen, tested, upgraded and rolled back? Are deterministic calculations separated from probabilistic outputs? Are prompts, completions, embeddings or logs used for training?	Model/provider list, lifecycle policy, release notes, eval report, data usage terms, subprocessor list	Silent model updates, vague model identity, generic accuracy claims, no fallback behavior
Data privacy	What data enters prompts, embeddings, logs, telemetry and support tools? Is PII, PHI, PCI, account or transaction data processed? Where is data stored, processed and retained? Are embeddings treated as sensitive data?	Data flow diagram, classification, retention schedule, DPA, subprocessor list, deletion process	"We do not store data" without log details, no embedding governance, broad support access, residency only in sales material
Security	Is SSO/SAML/OIDC, SCIM, RBAC and tenant isolation supported? How are secrets and API keys stored? How are prompt injection, excessive agency and data exfiltration tested?	SOC 2 or bridge letter, ISO 27001 certificate, pen test summary, security whitepaper, secure SDLC, incident policy	No SSO, broad DB access, prompt-only security control, no incident notification commitment
Governance	Is there AI system inventory and risk classification? Can regulated actions be blocked, routed or dual-approved? Are prompts, tools, retrieval index and models versioned?	AI governance policy, change workflow, HITL config, audit log schema, eval and red-team process	Governance PDF only, no versioning, no case-level audit reconstruction
Evaluation	Are evals generic or customer-specific? Can customer gold datasets and rubrics be used? Are offline eval, shadow mode and production sampling supported? Can eval failures block release?	Eval methodology, sample eval report, failure taxonomy, release gate policy, reviewer calibration	Aggregate thumbs-up only, no negative cases, no stale-source tests, no eval export
SLA / reliability	What uptime and latency SLO apply to the full product? Are rate limits and regional limits known? Is there fallback model, degraded mode or read-only mode?	SLA terms, status history, DR policy, rate-limit docs, support plan, incident postmortem sample	SLA excludes key dependencies, no degraded mode, support slower than business SLA
Cost	Is pricing per user, case, document, token, tool call, workflow or environment? Are model usage, embeddings, vector storage, observability, connectors and support included?	Pricing workbook, usage telemetry, cost allocation export, rate card, renewal terms, services estimate	Cannot model cost per case, token costs excluded, no budget caps, unclear renewal uplift
Lock-in	Which parts are proprietary: prompts, workflow, index, embeddings, memory, evals, connectors, policies, agents? Can data, logs, evals and config be exported?	Export docs, termination terms, IP ownership, open API docs, migration plan	No export for audit logs or evals, proprietary workflow, no exit cost definition
Customization	Can the product support bank policy, jurisdiction, product, language and role differences? Are policy updates approved, versioned and regression tested?	Configuration model, admin demo, policy workflow, prompt registry, regression results, reference customer	Vendor-only customization, no content owner/effective date, custom logic hidden in prompts
Integration	Which systems integrate: CRM, core banking, case management, LOS, payment systems, contact center, IAM, SIEM? Are integrations read-only, write, pre-fill, or automated?	API docs, reference architecture, auth pattern, sandbox, data mapping	Broad DB access, write actions without idempotency, bypassed authorization
Audit	Can logs show user, role, input, source, model version, prompt version, retrieval results, output, approval and final action? Can evidence be exported?	Audit log schema, sample evidence package, retention config, export API, admin audit event list	Audit trail is chat transcript only, logs omit retrieval or tools, admin changes not audited
Incident response	Are data exposure, prompt injection, unsafe output, outage, cost runaway and tool misuse in scope? Can customer trigger kill switch?	Incident response plan, notification commitment, kill switch docs, postmortem template, severity matrix	Security-only incident policy, no AI quality incident, no customer-side kill switch
Procurement	Is vendor approved for data classification and geography? Are DPA, BAA, security addendum or AI addendum needed? Are liability, audit rights, subprocessors and exit terms acceptable?	MSA, DPA, SLA, order form, support terms, security addendum, subprocessor schedule	Contract contradicts sales promises, no audit rights, weak liability cap, subprocessor changes without notice
Minimum due diligence questions:

Executive: What workflows have you solved in regulated industries? Which production customers are closest? Which parts are product, services and partner work? What assumptions must be true for ROI?
Product: Which users and approvers are supported? What exception paths are supported? How are citations, uncertainty, refusal, feedback and override reason shown?
Architecture: Show end-to-end data flow. Which components are customer-controlled? How are retrieval, grounding, tool permissions and audit implemented?
Data: What is copied, indexed, embedded or cached? Can data be filtered by role, region, product, customer segment and effective date? How are stale sources handled?
Risk: Which AI risks do you explicitly manage? How do you prevent over-reliance, automation bias, prohibited advice and unsafe action?
Security: How do you protect secrets and downstream credentials? How do you test prompt injection and excessive agency? How are logs retained?
Commercial: Give conservative, expected and upside cost scenarios. What costs are excluded? What is the cost to export and exit?
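
The Audit row of the diligence table lists what a log must contain to reconstruct a case. One way to turn that into a concrete test during a vendor demo is sketched below, assuming a flat record per AI-assisted step; `AuditRecord` and `reconstruction_gaps` are hypothetical names, and the field set simply mirrors the table row:

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class AuditRecord:
    """One AI-assisted step in a case; field names are illustrative."""
    user: str
    role: str
    input_text: str
    output_text: str
    model_version: str = ""
    prompt_version: str = ""
    sources: list[str] = field(default_factory=list)
    retrieval_results: list[str] = field(default_factory=list)
    approval: Optional[str] = None      # approver id, None if not recorded
    final_action: Optional[str] = None  # action actually taken, None if not recorded


def reconstruction_gaps(record: AuditRecord) -> list[str]:
    """List empty fields, in declaration order, that block case reconstruction."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# A "chat transcript only" log: who asked what and what came back, nothing else.
transcript_only = AuditRecord(
    user="analyst-17", role="investigator",
    input_text="Summarize alerts for case 123", output_text="Draft summary",
)
print(reconstruction_gaps(transcript_only))
```

A record like `transcript_only` reports gaps for model version, prompt version, sources, retrieval results, approval and final action, which corresponds to the "audit trail is chat transcript only" red flag.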

6. Vendor Scorecard

Adjust weights by risk tier. For AML, lending and payments, increase governance, audit, security and eval weights.

Dimension	Weight	Score 1	Score 3	Score 5
Business fit	10	Generic demo	Some workflow fit	Strong domain workflow fit
Model quality	10	Claims only	Generic benchmark	Customer eval with failure analysis
Data privacy	10	Vague	Basic DPA	Configurable by data class
Security	10	Weak controls	Some evidence	SSO, RBAC, logs, SOC/ISO evidence
AI governance	10	Checklist only	Partial controls	Versioning, HITL, release gates
EvalOps	10	None	Basic tests	Offline, shadow and production eval
Integration	10	Manual export	Limited API	IAM, SIEM, workflow integration
Reliability	7	Best effort	Basic SLA	DR, fallback, degraded mode
Cost	8	Unclear	Pricing available	Cost per case and caps
Lock-in / exit	7	Black box	Partial export	Clear exit plan
Customization	5	Vendor-only	Some config	Governed domain config
Procurement risk	3	Sales promises	Standard terms	Contractable obligations
Score guidance:

0-50: Do not pilot except sandbox learning.
51-70: Low-risk pilot only, strict controls.
71-85: Candidate for controlled pilot.
86-100: Production candidate, still requiring architecture and risk gates.

Interpretation:

High product quality with weak audit may still fail AML or lending.
Strong governance with moderate model quality may be better for regulated operations.
Low price without exit plan can become expensive at scale.
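
The scorecard does not state how 1-5 ratings map onto the 0-100 bands. A minimal sketch under one plausible assumption: each dimension contributes `weight * score / 5`, so all fives give 100 and all ones give 20. The function names and the treatment of fractional totals at band edges are illustrative choices, not part of the scorecard:

```python
# Weights copied from the scorecard table; they sum to 100.
WEIGHTS = {
    "business_fit": 10, "model_quality": 10, "data_privacy": 10,
    "security": 10, "ai_governance": 10, "evalops": 10, "integration": 10,
    "reliability": 7, "cost": 8, "lock_in_exit": 7,
    "customization": 5, "procurement_risk": 3,
}


def weighted_score(scores: dict) -> float:
    """Scale 1-5 ratings to a total out of 100 (assumed normalization)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("score every dimension exactly once")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("scores must be between 1 and 5")
    return sum(WEIGHTS[dim] * s for dim, s in scores.items()) / 5


def band(total: float) -> str:
    """Map a total to the score guidance bands."""
    if total <= 50:
        return "Do not pilot except sandbox learning"
    if total <= 70:
        return "Low-risk pilot only, strict controls"
    if total <= 85:
        return "Candidate for controlled pilot"
    return "Production candidate, still requiring architecture and risk gates"


scores = {dim: 3 for dim in WEIGHTS}
scores.update(security=5, ai_governance=5, evalops=5)
total = weighted_score(scores)
print(total, "->", band(total))
```

A vendor rated 3 everywhere except 5 on security, governance and EvalOps totals 72.0 under this normalization. Raising weights for AML, lending or payments means editing `WEIGHTS` while keeping the sum at 100.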

7. Build Vs Buy Vs Partner Vs Hybrid

Option	Definition	Best when	Main risk
Buy	Use vendor SaaS or managed platform for most capability	Speed, common workflow, small platform team	Lock-in, hidden cost, limited control
Build	Internal team builds core capability	Differentiating workflow, strict constraints, scale economics	Delivery risk, ops burden, security burden
Partner	Co-build with vendor, SI or domain specialist	Need domain expertise or delivery capacity	Dependency and knowledge transfer risk
Hybrid	Build control-heavy or differentiating parts, buy commodity parts	Regulated workflow, speed plus control	Boundary complexity
Decision heuristics:

Buy when workflow is common, vendor has audited controls, time-to-value is critical, differentiation is adoption/workflow rather than infrastructure.
Build when workflow is core competitive advantage, SaaS cannot meet data/security constraints, controls cannot be implemented by vendor, scale changes economics, internal team can operate production AI.
Partner when regulated domain design or delivery capacity is missing, but internal owners must learn and take over.
Hybrid when vendor UI/model layer is strong but governance, policy, audit, eval, data or workflow boundary must remain internally controlled.

Component matrix:

| Capability | Buy | Build | Partner | Practical recommendation |
|---|---|---|---|---|
| RAG knowledge assistant | Common KB and service workflows | Sensitive corpus, complex entitlement | Taxonomy cleanup | Buy retrieval, build metadata, eval and source governance |
| Agent workflow | Bounded internal automation | High-risk actions, proprietary orchestration | Workflow redesign | Buy runtime, build tool policy and approval layer |
| Model gateway | Cloud/platform routing and telemetry | Multi-cloud or custom policy | Setup help | Hybrid gateway with internal policy abstraction |
| Eval platform | Test management and dashboards | Domain labels and release gates | Initial eval library | Buy tooling, build gold data and acceptance policy |
| Vector DB / search | Managed service | Strict data control or cost | Indexing architecture | Managed infra with internal access and retention controls |
| Document processing | Standard forms and OCR | Proprietary docs | Annotation and tuning | Buy extraction, build validation workflow |
| Workflow automation | Existing case/workflow platform fits | Deep core ops integration | BPMN/process redesign | Use existing workflow platform, integrate AI services |
| Monitoring | Platform telemetry | Enterprise SIEM and risk metrics | Runbook setup | Export vendor telemetry to internal dashboards |
| Governance tooling | Enterprise inventory/GRC fit | Focused lightweight registry | Policy library | Buy inventory, build use-case evidence |
| Prompt/policy registry | Integrated release workflow | Strategic prompt/policy control | Standards setup | Build internal registry, sync to vendor config |
| Tool gateway | Managed connectors | High-risk permissions | Integration adapters | Build policy wrapper around vendor tools |
| Human review workbench | Vendor reviewer UX fits | Domain review UX is advantage | Frontline design | Buy basic review, build domain QA queue |

Default financial retail patterns:

| Use case | Starting option | Reason |
|---|---|---|
| AML investigation copilot | Hybrid | Buy extraction/RAG, build audit, HITL and SAR boundary |
| KYC document automation | Buy + partner | OCR is commodity, policy and remediation workflow need domain setup |
| Customer service copilot | Buy | Common pattern, value depends on knowledge governance and adoption |
| Payments operations agent | Hybrid | Payment actions require idempotency, approval, audit and rail controls |
| Lending policy assistant | Hybrid | Policy RAG can be bought, credit support needs governance and eval |

8. Required Artifacts

8.1 Vendor Scorecard

Fields: vendor name, use case, risk tier, business fit score, security score, privacy score, eval score, governance score, integration score, cost score, lock-in score, recommendation, mitigations, go/no-go decision.

8.2 Due Diligence Pack

Fields: question, owner, evidence requested, evidence received, evidence quality, open gap, blocker status, due date.

8.3 Architecture ADR

Use docs/abpa/templates/08-ai-architecture-adr-set.md. Minimum ADRs:

AI pattern selection.
Model and provider strategy.
RAG and knowledge architecture.
Human-in-the-loop design.
Eval and observability architecture.
Integration, security and audit.
Vendor strategy and exit trigger.

ADR must answer:

What decision are we making?
What options were considered?
What evidence supports the decision?
What risk remains?
What controls reduce risk?
What would make the decision wrong?
What is the reversal trigger?

8.4 Risk Acceptance Memo

Fields:

Field	Required content
Use case	Business workflow and users
Decision	Proceed, pause, reduce scope, or reject
Risk owner	Person accepting residual risk
Business owner	Person accountable for value
Architecture owner	Person accountable for technical controls
Risk statement	What could go wrong
Impact	Customer, financial, operational, legal, compliance, reputational
Controls	Preventive, detective, corrective
Evidence	Eval result, security review, vendor evidence, pilot result
Residual risk	What remains after controls
Expiry	Date or trigger for re-review
Stop rule	Condition requiring rollback or suspension
Rules: no unnamed owner, no vague risk, no permanent pilot acceptance, no bypass of legal/regulatory review.
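
Some of the memo rules above can be checked mechanically before a memo reaches a review board. A minimal sketch, assuming the memo is captured as structured data and that expiry is a calendar date (the table also allows a trigger, which is not modeled here); `RiskAcceptanceMemo` and `memo_violations` are hypothetical names. "No vague risk" still needs a human reviewer:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional

ALLOWED_DECISIONS = {"proceed", "pause", "reduce scope", "reject"}
OWNER_FIELDS = ("risk_owner", "business_owner", "architecture_owner")


@dataclass
class RiskAcceptanceMemo:
    """Subset of the memo fields that can be validated automatically."""
    use_case: str
    decision: str
    risk_owner: str
    business_owner: str
    architecture_owner: str
    stop_rule: str
    expiry: Optional[date] = None  # re-review date; None means never set


def memo_violations(memo: RiskAcceptanceMemo, today: date) -> list[str]:
    """Return rule violations; an empty list means the basic checks pass."""
    problems = []
    for name in OWNER_FIELDS:
        if not getattr(memo, name).strip():
            problems.append(f"{name} is unnamed")
    if memo.decision.lower() not in ALLOWED_DECISIONS:
        problems.append("decision must be proceed, pause, reduce scope or reject")
    if memo.expiry is None:
        problems.append("no expiry: acceptance would be permanent")
    elif memo.expiry <= today:
        problems.append("acceptance has expired and needs re-review")
    if not memo.stop_rule.strip():
        problems.append("stop rule is missing")
    return problems


draft = RiskAcceptanceMemo(
    use_case="AML investigation copilot pilot",
    decision="Proceed",
    risk_owner="",
    business_owner="Head of AML Ops",
    architecture_owner="Lead Solutions Architect",
    stop_rule="Suspend on any sensitive data leak",
)
print(memo_violations(draft, today=date(2025, 1, 1)))
```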

8.5 Pilot Success Criteria

Define workflow included/excluded, user group, case type, data sources, AI actions allowed/prohibited, human approval points, quality thresholds, risk thresholds, cost thresholds, adoption thresholds, stop rules, decision date.

Category	Example metrics
Quality	Citation precision, unsupported claim rate, policy violation rate
Workflow	Cycle time, touch time, rework, backlog
Trust	Acceptance rate, edit rate, override reason, confidence score
Risk	Escalation rate, incident count, audit defect
Cost	Cost per assisted case, model spend, support cost
Adoption	Activated users, repeat users, eligible cases touched
Stop rules: unsupported claim exceeds threshold, sensitive data leaks, users copy without evidence review, SLA worsens, vendor outage blocks workflow, cost per case exceeds threshold, risk/legal/security raises blocker.
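
Stop rules are easier to enforce when they are evaluated against pilot metrics on a fixed cadence rather than debated after the fact. The sketch below covers four of the listed rules; the threshold values, metric keys and function name are hypothetical placeholders, and real thresholds come from the agreed pilot success criteria:

```python
# Hypothetical thresholds for illustration only.
THRESHOLDS = {
    "unsupported_claim_rate": 0.02,  # max share of outputs with unsupported claims
    "cost_per_case": 1.50,           # max cost per assisted case
}


def triggered_stop_rules(metrics: dict) -> list:
    """Return the stop rules that the current pilot metrics trigger."""
    triggered = []
    if metrics["unsupported_claim_rate"] > THRESHOLDS["unsupported_claim_rate"]:
        triggered.append("unsupported claim exceeds threshold")
    if metrics["sensitive_data_leaks"] > 0:
        triggered.append("sensitive data leaks")
    if metrics["cost_per_case"] > THRESHOLDS["cost_per_case"]:
        triggered.append("cost per case exceeds threshold")
    if metrics["open_blockers"]:
        triggered.append("risk/legal/security raises blocker")
    return triggered


week_3 = {
    "unsupported_claim_rate": 0.035,
    "sensitive_data_leaks": 0,
    "cost_per_case": 1.10,
    "open_blockers": [],
}
print(triggered_stop_rules(week_3))
```

Behavioral rules such as "users copy without evidence review" need sampling or QA review and are deliberately left out of this check.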

8.6 Rollout Plan

Phase	Scope	Goal
Phase 0	Sandbox with redacted data	Learn product and validate assumptions
Phase 1	Shadow mode	Compare AI output with human workflow
Phase 2	Assisted pilot	Users see AI output, no external action
Phase 3	Controlled production	Limited team and case type, full monitoring
Phase 4	Scale	More teams and cases with standardized controls
Rollout fields: phase, users, case types, features enabled, training, controls, support model, monitoring cadence, success metric, expansion criteria, rollback criteria.

8.7 Adoption Dashboard

Use `docs/abpa/templates/10-adoption-dashboard.md`. Required views: funnel by team, usage by workflow step, acceptance/edit/rejection trend, override reasons, quality defects, escalations, time saved, cost per case, incident/rollback events, training completion.

9. Enterprise Adoption And Change Management

Adoption is not login count. Adoption means users safely change how work is done, managers reinforce the workflow, quality improves, and controls remain observable.

9.1 Stakeholder Segmentation

Segment	Example	Need	Concern	Engagement
Executive sponsor	COO, CDO, business head	Value, risk, funding	Pilot theater	Monthly decision memo
Process owner	AML ops lead, contact center lead	Workflow performance	SLA disruption	BPMN and metric review
Frontline users	Analysts, agents, underwriters	Useful output	Trust, job impact	Shadowing, training, feedback
Managers	Team leads, QA leads	Coaching and monitoring	Hard to assess usage quality	Adoption dashboard
Risk / compliance	Compliance, MRM, legal	Controls and evidence	Unsafe automation	Control pack and release gate
Security / privacy	CISO, DPO, IAM	Data and access control	Leakage, vendor risk	Security review
Data owners	CRM, policy KB, transaction data	Stewardship	Bad data blamed on source team	Data readiness
IT / engineering	Platform, integration, support	Operability	Fragile integration	Runbook
Procurement	Sourcing, vendor management	Contract and supplier risk	Hidden obligations	Due diligence pack

9.2 Champion Network

Champion design: pick respected frontline users, include skeptics, give feedback influence, train on boundaries, collect failure examples, co-create SOP and office hours.

Champion responsibilities: validate workflow fit, test hard cases, explain tool boundaries, report trust/adoption blockers, review training material, tune feedback taxonomy.

9.3 Training Model

Audience	Training focus
Frontline users	When to use, verify evidence, reject, escalate
Reviewers / approvers	Evaluate output, override, record decision, sample defects
Managers	Read adoption dashboard and coach behavior
Product / BA	Turn feedback into backlog and eval cases
Risk / compliance	Controls, audit logs, HITL, incident process
Support / IT	Triage outage, access, latency, integration failures
Training artifacts: role SOP, allowed/prohibited use cases, evidence checklist, escalation tree, failure examples, feedback taxonomy, data handling reminder, office hours.

9.4 Trust-Building

Trust mechanisms: show source citations and effective dates, prefer "not enough evidence" over overconfident answers, make uncertainty visible, publish quality trend and limitations, separate suggested text from approved final action, require evidence confirmation for sensitive cases.

Bad patterns: hidden evidence, speed-only incentives, manager pressure to accept output, usage-only success, failure reports without response.

9.5 Human-In-The-Loop Process Change

Define for each AI output: human role, decision authority, evidence required, override reason taxonomy, escalation owner, SLA impact, audit field, stop condition.

Level	Meaning	Financial retail example
L0	AI disabled	Legal hold or disputed incident
L1	Evidence retrieval only	AML evidence packet
L2	Draft, human edits	Service response draft
L3	Recommend, human approves	Fraud next action
L4	Pre-fill action, human confirms	Payment retry instruction
L5	Execute bounded low-risk action	Low-risk knowledge routing
Default: regulated, financial, adverse or customer-impacting actions start at L1-L3. L4 requires audit, idempotency, rollback and approval. L5 should be rare in financial retail unless low-risk and reversible.
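
The default rule above can be expressed as a ceiling on autonomy per action type. The sketch below is one possible reading, not a definitive policy: sensitive actions are capped at L3 unless all four L4 controls exist, and L5 is reserved for actions that are both non-sensitive and low-risk/reversible. `HitlLevel`, `L4_CONTROLS` and `max_level` are hypothetical names:

```python
from enum import IntEnum


class HitlLevel(IntEnum):
    L0_DISABLED = 0
    L1_RETRIEVAL = 1
    L2_DRAFT = 2
    L3_RECOMMEND = 3
    L4_PREFILL = 4
    L5_EXECUTE = 5


# Controls the text requires before L4 is allowed.
L4_CONTROLS = {"audit", "idempotency", "rollback", "approval"}


def max_level(sensitive: bool, controls: set, low_risk_reversible: bool = False) -> HitlLevel:
    """Highest autonomy level permitted under this reading of the default rule.

    sensitive: regulated, financial, adverse or customer-impacting action.
    controls: names of controls already implemented for the action.
    """
    if low_risk_reversible and not sensitive:
        return HitlLevel.L5_EXECUTE
    if L4_CONTROLS <= controls:  # all four controls are in place
        return HitlLevel.L4_PREFILL
    return HitlLevel.L3_RECOMMEND


# Payment retry: sensitive, audit exists but idempotency and rollback do not.
print(max_level(sensitive=True, controls={"audit", "approval"}).name)
```

With only audit and approval in place, the payment retry example stays at `L3_RECOMMEND`; it moves to `L4_PREFILL` once idempotency and rollback are added.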

9.6 Adoption Metrics

Minimum metrics: eligible users, activated users, repeat users, habit users, eligible cases touched, suggestions generated, accepted, edited, rejected, override reasons, user confidence, unsupported claim rate, escalation rate, cycle time, rework rate, cost per assisted case.

Interpretation:

High usage + high override = tool is in workflow but not trusted.
Low usage + high quality = workflow trigger, training or manager incentive problem.
High acceptance + high defect = automation bias risk.
High edit rate = output useful but format or policy fit weak.
High escalation = scope too broad or data insufficient.
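
These interpretation rules can be applied to dashboard data as simple pattern checks. A minimal sketch, assuming every metric is expressed as a rate between 0 and 1 and that "high" and "low" are fixed cutoffs; the cutoffs, metric keys and function name are hypothetical and should be calibrated per workflow:

```python
def interpret_adoption(m: dict, high: float = 0.6, low: float = 0.3) -> list:
    """Map metric patterns to the diagnoses listed in the text.

    m holds rates in [0, 1]: usage, override, quality, acceptance,
    defect, edit, escalation. The high/low cutoffs are illustrative.
    """
    findings = []
    if m["usage"] >= high and m["override"] >= high:
        findings.append("tool is in workflow but not trusted")
    if m["usage"] <= low and m["quality"] >= high:
        findings.append("workflow trigger, training or manager incentive problem")
    if m["acceptance"] >= high and m["defect"] >= high:
        findings.append("automation bias risk")
    if m["edit"] >= high:
        findings.append("output useful but format or policy fit weak")
    if m["escalation"] >= high:
        findings.append("scope too broad or data insufficient")
    return findings


team_a = {
    "usage": 0.8, "override": 0.7, "quality": 0.5,
    "acceptance": 0.2, "defect": 0.1, "edit": 0.4, "escalation": 0.1,
}
print(interpret_adoption(team_a))
```

Several findings can fire at once; the list is a prompt for a conversation with the process owner, not an automatic verdict.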

9.7 Rollback And Escalation

Level	Action	Trigger
1	Disable one feature or prompt	Quality regression in one output
2	Disable one tool/action	Tool misuse or integration issue
3	Switch fallback model/provider	Model outage or regression
4	Read-only mode	Workflow action risk
5	Full suspension	Data exposure, customer harm, regulatory issue
Escalation paths: frontline issue -> support -> product owner. Quality issue -> EvalOps -> release gate. Security issue -> security incident. Privacy issue -> privacy/legal. Compliance issue -> risk/compliance. Vendor outage -> vendor management/engineering. Customer harm -> executive sponsor and incident commander.
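
The rollback table works as a ladder: when several triggers fire together, the highest level applies. A minimal sketch, assuming triggers arrive as short string codes; the codes, `TRIGGER_LEVEL` and `rollback_action` are hypothetical names, while the levels and actions are taken from the table:

```python
from __future__ import annotations

from typing import Optional

# Level -> action, copied from the rollback table.
ROLLBACK_LADDER = {
    1: "Disable one feature or prompt",
    2: "Disable one tool/action",
    3: "Switch fallback model/provider",
    4: "Read-only mode",
    5: "Full suspension",
}

# Trigger code -> level; the codes are illustrative labels for the table's triggers.
TRIGGER_LEVEL = {
    "quality_regression": 1,
    "tool_misuse": 2,
    "integration_issue": 2,
    "model_outage": 3,
    "model_regression": 3,
    "workflow_action_risk": 4,
    "data_exposure": 5,
    "customer_harm": 5,
    "regulatory_issue": 5,
}


def rollback_action(triggers: set[str]) -> Optional[tuple[int, str]]:
    """Return (level, action) for the most severe active trigger, or None."""
    unknown = triggers - TRIGGER_LEVEL.keys()
    if unknown:
        raise ValueError(f"unclassified triggers: {sorted(unknown)}")
    if not triggers:
        return None
    level = max(TRIGGER_LEVEL[t] for t in triggers)
    return level, ROLLBACK_LADDER[level]


print(rollback_action({"quality_regression", "model_outage"}))
```

Raising on unknown triggers is a deliberate choice in this sketch: an unclassified event should go to the incident owner rather than silently map to a low level.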

10. Financial Retail Examples

Scenario	Business problem	AI pattern	Diligence focus	Build/buy recommendation	Pilot success criteria	Interview point
AML copilot vendor	Investigators spend too much time gathering evidence and drafting narratives. False positives and backlog reduce time for judgment	Case investigation copilot. RAG over KYC, transactions, case notes, typologies and SOP. Human owns suspicious activity decision	Evidence retrieval, citation quality, prompt injection from adverse media, no autonomous SAR/STR decision, immutable audit log, RBAC, supervisor review	Hybrid. Buy extraction/RAG, build SAR boundary, case approval, audit and policy enforcement	Draft time reduced, QA defect not worse, citation precision above threshold, no unsupported red-flag claims, supervisor acceptance above threshold	AML AI is investigation evidence support, not a decision engine
KYC document automation	Remediation and onboarding are slowed by document intake, OCR, field extraction and policy validation	Document AI + workflow automation + policy checklist. Human approval for high-risk customers, UBO changes, source-of-funds exceptions	OCR accuracy by document type, PII retention, data residency, document fraud controls, policy version, reviewer override, lineage to customer master	Buy + partner. Buy standard extraction, partner/build domain validation and policy mapping	Extraction threshold met, false accept below threshold, cycle time improves, override reasons captured, source lineage recorded	KYC value is controlled loop from data gap to outreach to verified source-of-truth update
Customer service copilot	Agents search multiple systems, answer quality varies, after-call work is high	RAG assistant + response draft + call/chat summary. Agent confirms and sends	Knowledge governance, effective date, jurisdiction, authentication boundary, advice guardrails, contact center integration, QA feedback	Buy. Focus internal effort on knowledge governance, eval and training	AHT or after-call work improves, QA defect stable/improved, agents use citations, knowledge gaps fixed, escalation improves	Service copilot succeeds when knowledge governance and behavior change are designed together
Payments operations agent	Exceptions, returns, retries, reconciliation breaks and customer inquiries create backlog	Bounded operations agent. Evidence gathering, classification, next action recommendation, action pre-fill, human approval	Tool permission, idempotency, rail rules, ledger integrity, segregation of duties, tool-call audit, kill switch	Hybrid. Buy workflow assistant/model gateway, build payment action policy, approval, idempotency, audit	Break resolution time improves, no unauthorized action, reviewer agreement high, SLA breaches decrease, cost per resolved exception improves	Payments AI must separate understanding from execution
Lending policy assistant	Underwriters need consistent policy interpretation, document summaries and exception analysis	Policy RAG + document summarization + decision support. Credit decision remains human-owned	High-risk AI lens, policy citation, deterministic calculations separate from LLM, fair lending, reason-code support, decision audit	Hybrid. Buy policy retrieval/document summary, build governance, reason-code controls, decision record and fairness monitoring	Citation correctness above threshold, memo time improves, no prohibited language, overrides captured, segment quality acceptable	Do not let an LLM own the credit decision

11. 30 / 60 / 90-Day Execution Roadmap

First 30 Days: Frame And Shortlist

Objectives: define business problem and baseline, identify stakeholders, map workflow, define AI fit/no-AI option, shortlist build/buy/partner/hybrid options.

Weeks:

Week 1: Choose scenario, write opportunity canvas, identify sponsor, process owner, data owner, risk owner, tech owner.
Week 2: Build stakeholder map, interview frontline users, draft AS-IS BPMN with exception flow.
Week 3: Build data readiness pack, classify data sensitivity, identify eval samples and AI patterns.
Week 4: Draft option matrix, create vendor shortlist, send RFI, draft executive memo v0.1.

Outputs: opportunity canvas, stakeholder map, BPMN pain metrics, data readiness pack, option matrix, vendor shortlist.

First 60 Days: Evaluate And Decide

Objectives: run vendor demos with scripted eval cases, estimate TCO, create architecture ADR, define controls and pilot scope, make go/no-go recommendation.

Weeks:

Week 5: Convert requirements to eval matrix, run same demo cases across vendors.
Week 6: Run security, privacy, architecture and procurement review, build cost scenarios.
Week 7: Write ADRs, draft AI control pack, define pilot metrics and stop rules.
Week 8: Finalize scorecard, compare options, present executive decision memo.

Outputs: requirements-to-eval matrix, vendor scorecard, TCO model, architecture ADR set, AI control pack, pilot success criteria, executive decision memo.

First 90 Days: Pilot And Adoption Setup

Objectives: execute controlled pilot, validate quality, risk, value and adoption, prepare operating model, decide scale, narrow, switch or stop.

Weeks:

Week 9: Set up pilot, configure data access, roles, logs, train pilot users and champions.
Week 10: Move from shadow to assisted pilot if thresholds pass, collect feedback and override reasons.
Week 11: Build adoption dashboard, review metrics, run incident and rollback drill.
Week 12: Write pilot report, update business case, finalize RACI, decide next step.

Outputs: pilot report, adoption dashboard, updated business case, operating model RACI, runbook, rollout plan, production readiness recommendation.

12. Interview Talking Points

30-second answer: "我做 AI vendor 和 build-vs-buy 时不会先问哪个模型最强, 而是先定义业务流程, 数据敏感度, 风险等级, eval 标准和 adoption 路径. 然后用 scorecard 比较 buy, build, partner 和 hybrid, 特别看 privacy, security, audit, eval, SLA, cost, lock-in 和 HITL. 最后用 pilot success criteria 和 adoption dashboard 证明这个工具是否真的改变工作方式." 2-minute structure:

Business problem: 场景, baseline, users, pain metric.
Workflow and risk: AS-IS / TO-BE, human approval, regulated boundary.
Options: build, buy, partner, hybrid.
Vendor diligence: model, data, security, governance, eval, SLA, cost, lock-in, integration.
Architecture: RAG, agent, model gateway, tool permission, audit.
Pilot: eval, stop rule, adoption metric, rollback.
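Result: use the dashboard and business case to decide scale or stop.

Role angles:

Senior architect: "I would keep the parts that are easiest to get locked into and carry the highest risk under internal control, such as identity, policy, audit, eval, data governance and approval workflow. For mature capabilities such as OCR, generic RAG, a managed model gateway or observability, we can buy or partner to move faster."
AI PM: "Whether a vendor is good is not about how polished the demo looks. It is about whether users keep using the tool in real workflows, whether it reduces rework, whether it improves quality, and whether failures can be explained and rolled back."
AI BA: "I start by mapping stakeholder objections and workflow exceptions, because many AI procurement failures are not model failures. They come from not handling real approvals, exceptions, data ownership and audit evidence."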

13. Common Failure Modes

Vendor selection failures:

Model benchmarks replace business evals.
Sales demos replace workflow pilots.
Security questionnaires replace architecture review.
Contract language does not match the actual data flow.
Scorecard does not cover lock-in and exit.
Pilot uses toy data and cannot prove real-world impact.

Build-vs-buy failures:

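The team chooses build because "we can build it".
The team chooses buy because "the vendor has the feature".
Internal operating cost is not calculated.
Commodity capability is not distinguished from differentiating capability.
Model updates, prompt changes, eval maintenance and support are ignored.
Hybrid boundary is unclear, so both sides end up needing maintenance.

Adoption failures:
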
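Training happens only once.
Only logins are tracked.
Users do not know when the tool must not be used.
Manager KPIs conflict with the new workflow.
Champion network includes only enthusiastic users.
Feedback loop is never closed.
Quality problems are framed as user resistance.
No rollback exists, so frontline staff work around the tool.

Governance failures:
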
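AI control pack has no owner.
HITL has no actual approval states.
Audit trail cannot reconstruct a case.
Prompt changes do not go through a release gate.
Vendor model updates have no regression testing.
Incident response does not cover AI quality incidents.
Risk acceptance has no expiry date.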
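
Two of the governance failures above, missing owners and risk acceptances without an expiry, are easy to detect if risk acceptances are kept as structured records. The following is a minimal sketch under that assumption; the record fields, the example risks and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RiskAcceptance:
    risk: str
    owner: Optional[str]        # named accountable person or role
    expires_on: Optional[date]  # date by which the acceptance must be re-reviewed


def governance_findings(
    acceptances: List[RiskAcceptance], today: date
) -> List[Tuple[str, str]]:
    """Return (risk, problem) pairs for records that need attention."""
    findings = []
    for item in acceptances:
        if not item.owner:
            findings.append((item.risk, "no owner"))
        if item.expires_on is None:
            findings.append((item.risk, "no expiry date"))
        elif item.expires_on < today:
            findings.append((item.risk, "expired"))
    return findings


# Illustrative records only.
register = [
    RiskAcceptance("Vendor logs retain prompts for 30 days", "Privacy lead", date(2025, 6, 30)),
    RiskAcceptance("No regression suite for vendor model updates", None, None),
    RiskAcceptance("Manual audit export", "Risk ops", date(2024, 12, 31)),
]
issues = governance_findings(register, today=date(2025, 1, 15))
```

Passing `today` explicitly keeps the check deterministic, which makes it straightforward to run as part of a scheduled vendor review.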

14. Self-Review Checklist

Before recommending vendor or build path, answer:

Do we have a measurable business baseline.
Do we have a no-AI or rules-only alternative.
Do we know which data enters prompts, embeddings and logs.
Do we know whether outputs affect regulated or customer-impacting decisions.
Do we have a risk tier and control pack.
Do we have a customer-specific eval set.
Do we have vendor evidence for security, privacy, audit and incident response.
Do we have a cost per case model.
Do we have an exit plan.
Do we have an ADR for architecture decisions.
Do we have HITL workflow states, not just policy wording.
Do we have pilot success criteria and stop rules.
Do we have adoption metrics beyond usage.
Do we have rollback and escalation paths.
Do we have named owners for product, data, model, eval, vendor, incident and adoption.

If any answer is no: record the gap, assign an owner, decide whether it blocks pilot/production/scale, and add it to the executive decision memo.
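
The gap-handling rule above (record, assign, decide what it blocks, report) can be sketched as a small gap register. The stage ordering, the field names and the example gaps are assumptions made for illustration; a gap that blocks an earlier stage is treated here as also blocking every later stage.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    PILOT = 1
    PRODUCTION = 2
    SCALE = 3


@dataclass(frozen=True)
class Gap:
    question: str                  # the checklist item answered "no"
    owner: str                     # named person or role assigned to close it
    blocks_from: Optional[Stage]   # earliest stage it blocks; None = non-blocking


def blocks(gap: Gap, target: Stage) -> bool:
    """A gap blocks the target stage if it blocks that stage or an earlier one."""
    return gap.blocks_from is not None and gap.blocks_from <= target


def blocking_gaps(gaps: List[Gap], target: Stage) -> List[Gap]:
    return [g for g in gaps if blocks(g, target)]


def memo_lines(gaps: List[Gap], target: Stage) -> List[str]:
    """One line per gap, suitable for pasting into the executive decision memo."""
    stage = target.name.lower()
    return [
        f"{g.question} | owner: {g.owner} | "
        f"{'blocks' if blocks(g, target) else 'does not block'} {stage}"
        for g in gaps
    ]


# Illustrative gaps only.
gaps = [
    Gap("No exit plan", "Vendor owner", Stage.PRODUCTION),
    Gap("No customer-specific eval set", "EvalOps lead", Stage.PILOT),
    Gap("Adoption metrics limited to usage", "AI PM", Stage.SCALE),
]
pilot_blockers = blocking_gaps(gaps, Stage.PILOT)
```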

15. Portfolio Use

To turn this playbook into evidence:

Pick one scenario from section 10.
Fill 01-ai-opportunity-canvas.md.
Fill 03-bpmn-pain-metrics.md.
Fill 04-requirements-to-eval-matrix.md with at least 10 eval cases.
Fill 05-ai-control-pack.md.
Fill 08-ai-architecture-adr-set.md.
Fill 10-adoption-dashboard.md.
Fill 11-business-case.md.
Add all artifacts to 12-portfolio-evidence-map.md.

Best flagship combinations:

AML copilot: governance, audit, human oversight.
KYC document automation: data readiness and document workflow.
Customer service copilot: AI PM adoption and knowledge governance.
Payments operations agent: architecture and tool-action controls.
Lending policy assistant: high-risk decision support.

16. Quick Review Prompts

Use these prompts before presenting a recommendation:

What business metric changes if this AI solution works.
What work should remain human-owned.
What data should never enter prompts or logs.
What is the first workflow step safe enough for pilot.
What vendor promise must be contractually binding.
What evidence would make us stop the pilot.
What part should be built internally to preserve control or exit option.
What cost driver could surprise us at scale.
What adoption signal proves behavior change, not curiosity.
What audit question must we be able to answer one year later.
Who owns model changes, prompt changes, eval refresh and incident response.
Which ABPA artifact proves each decision.
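
For the cost-driver prompt above, a simple cost-per-case model shows how fixed cost, per-case fees and human review time behave as volume changes. The sketch is illustrative only: the function, its parameters and all figures are hypothetical, and a real TCO model would carry more lines, such as integration, eval maintenance and support.

```python
def cost_per_case(
    monthly_volume: int,
    fixed_monthly_cost: float,    # platform, licences, internal run team
    variable_cost_per_case: float,  # vendor usage fees per case
    review_share: float,          # share of cases sent to human review, 0-1
    review_minutes: float,        # average reviewer time per reviewed case
    reviewer_hourly_rate: float,
) -> float:
    """Blended cost of one case, including amortised fixed cost and HITL time."""
    if monthly_volume <= 0:
        raise ValueError("monthly_volume must be positive")
    if not 0.0 <= review_share <= 1.0:
        raise ValueError("review_share must be between 0 and 1")
    human_review = review_share * (review_minutes / 60.0) * reviewer_hourly_rate
    return fixed_monthly_cost / monthly_volume + variable_cost_per_case + human_review


# Hypothetical scenarios: the same setup at pilot volume and at scale.
assumptions = dict(
    fixed_monthly_cost=5000.0,
    variable_cost_per_case=0.5,
    review_share=0.2,
    review_minutes=6.0,
    reviewer_hourly_rate=50.0,
)
pilot_cost = cost_per_case(monthly_volume=1_000, **assumptions)
scale_cost = cost_per_case(monthly_volume=10_000, **assumptions)
```

In this made-up scenario the fixed cost dominates at pilot volume, while at scale the human review share becomes the largest component, which is the kind of shift the prompt is meant to surface.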

17. Final Operating Principle

Enterprise AI adoption is a chain of evidence:

Business pain
  -> workflow evidence
  -> data readiness
  -> AI fit
  -> build/buy decision
  -> vendor evidence
  -> architecture controls
  -> eval gates
  -> pilot proof
  -> adoption metrics
  -> operating model
  -> ongoing governance

If one link is missing, the AI project may still demo well, but it is not ready for enterprise scale. The strongest AI BA / PM / architect signal is the ability to explain every link in that chain, with artifacts and trade-offs, in a way that business, risk, technology and frontline teams can all act on.
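
The chain above can also be checked mechanically. The sketch below walks the links in order and reports the first one with no supporting artifact. The link names come from the chain in this section; which artifact evidences which link is an assumption made for the example.

```python
from typing import Dict, List, Optional

# Links in the order given by the chain of evidence.
CHAIN: List[str] = [
    "business pain",
    "workflow evidence",
    "data readiness",
    "AI fit",
    "build/buy decision",
    "vendor evidence",
    "architecture controls",
    "eval gates",
    "pilot proof",
    "adoption metrics",
    "operating model",
    "ongoing governance",
]


def first_missing_link(evidence: Dict[str, List[str]]) -> Optional[str]:
    """Return the earliest link with no artifact, or None if the chain is complete."""
    for link in CHAIN:
        if not evidence.get(link):
            return link
    return None


# Hypothetical mapping of artifacts to links for a project early in its first 30 days.
evidence = {
    "business pain": ["01-ai-opportunity-canvas.md"],
    "workflow evidence": ["03-bpmn-pain-metrics.md"],
    "data readiness": ["data readiness pack"],
}
gap = first_missing_link(evidence)
```

Reporting only the earliest gap is intentional: later links depend on earlier ones, so that is where the next piece of work belongs.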