AI Vendor / Build-Buy / Adoption Playbook

AI Vendor / Build-vs-Buy / Enterprise Adoption Playbook

Purpose: Give AI BAs, AI PMs, AI Solutions Architects and Enterprise Architects a playbook for AI vendor evaluation, build-vs-buy decisions, enterprise adoption and change management.

Scope: AI use cases in financial retail enterprises, including AML, KYC, customer service, payments operations, lending, compliance and risk operations.

Note: This document is learning, architecture and portfolio material, not legal, compliance, procurement or investment advice. A real project must be reviewed by legal, compliance, procurement, security, privacy, risk and the business owner.


1. Core Positioning

AI procurement and adoption is not "pick a model". The real questions are:

  • Is the business problem worth solving with AI, what is the baseline, and what is the no-AI option?
  • Should we buy, build, partner, or go hybrid?
  • Does the vendor meet the requirements for data privacy, security, governance, eval, SLA, cost, lock-in, customization, integration, audit, incident response and procurement?
  • Can the architecture be evaluated, rolled back, audited and replaced?
  • Do users actually change how they work, and does the organization have an owner, controls, eval, runbook, adoption metrics and a vendor review cadence?

Role split:

| Role | Core task | Key evidence |
|---|---|---|
| AI BA | Define the problem, stakeholder evidence, workflow, requirements-to-eval | BPMN, stakeholder map, eval matrix |
| AI PM | Define user value, MVP scope, pilot, rollout, adoption, business case | PRD, scorecard, adoption dashboard |
| Solutions Architect | Define RAG/agent/model gateway/integration/security/eval architecture | ADR, control pack, threat model |
| Enterprise Architect | Define portfolio fit, standards, target architecture, governance model | capability map, roadmap, review pack |
| EvalOps / Product Ops | Define eval dataset, release gate, monitoring, feedback loop | eval report, runbook, incident review |
| Procurement / Vendor Owner | Define supplier risk, contract terms, SLA, exit plan | due diligence pack, commercial model |

2. ABPA Template Alignment

This document does not replace docs/abpa/templates/. It explains how to use the templates together.

| Template | Usage |
|---|---|
| 01-ai-opportunity-canvas.md | Prove the problem is worth solving; record the no-AI option and baseline |
| 02-stakeholder-evidence-map.md | Identify business, risk, legal, security, data, IT and frontline objections |
| 03-bpmn-pain-metrics.md | Map the AS-IS / TO-BE workflow, HITL and exception path |
| 04-requirements-to-eval-matrix.md | Turn requirements into a vendor demo script, eval cases and release gates |
| 05-ai-control-pack.md | Turn AI risk into preventive, detective and corrective controls |
| 06-executive-decision-memo.md | Record the buy/build/partner/hybrid recommendation and funding gate |
| 07-data-readiness-pack.md | Assess source of truth, quality, PII, lineage, retention and access |
| 08-ai-architecture-adr-set.md | Record model/provider, RAG, HITL, eval, integration and audit decisions |
| 09-operating-model-raci.md | Name the product, data, eval, risk, vendor, incident and adoption owners |
| 10-adoption-dashboard.md | Track activation, usage, trust, quality and business outcome |
| 11-business-case.md | Build TCO, unit economics, benefit and risk-adjusted ROI |
| 12-portfolio-evidence-map.md | Package the artifacts as interview and portfolio evidence |

3. Source And Standard Anchors

These anchors are used to organize questions and evidence. They are not legal or procurement conclusions.

| Anchor | Official / primary source | Usage |
|---|---|---|
| NIST AI RMF 1.0 | https://www.nist.gov/itl/ai-risk-management-framework | Organize AI risk, eval, monitoring and governance around Govern, Map, Measure, Manage |
| NIST AI 600-1 GenAI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | GenAI risk, content provenance, incident, model behavior controls |
| EU AI Act, Regulation (EU) 2024/1689 | https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng | Use a risk-based lens to identify high-risk use cases, transparency, human oversight, documentation |
| ISO/IEC 42001:2023 | https://www.iso.org/standard/42001 | Use an AI management system to establish lifecycle, accountability, policy, continual improvement |
| OWASP Top 10 for LLM Applications 2025 | https://genai.owasp.org/llm-top-10/ | Run security review against prompt injection, sensitive data disclosure, supply chain, excessive agency |
| SOC Suite / SOC 2 | https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services | Vendor security, availability, confidentiality, privacy, processing integrity diligence |
| ISO/IEC 27001:2022 | https://www.iso.org/standard/27001 | ISMS, security risk management, certification evidence |
| TOGAF Standard, 10th Edition | https://www.opengroup.org/togaf-standard-10th-edition-downloads | Manage architecture decisions with ADM, architecture governance, capability planning, roadmap |

Standards-to-artifacts:
  • NIST AI RMF -> AI Control Pack, risk register, eval gate, monitoring plan.
  • NIST GenAI Profile -> GenAI risk checklist, red-team plan, incident criteria.
  • EU AI Act -> risk classification memo, transparency checklist, human oversight memo.
  • ISO/IEC 42001 -> AI operating model, accountability matrix, lifecycle cadence.
  • OWASP LLM Top 10 -> LLM threat model, prompt/tool/data controls, red-team backlog.
  • SOC 2 / ISO 27001 -> vendor security evidence checklist, security review gates.
  • TOGAF -> capability map, architecture roadmap, ADR, governance board decision.

4. Decision Flow And Stage Gates

Business problem
  -> AI fit and no-AI option
  -> data readiness and risk tier
  -> build / buy / partner / hybrid options
  -> vendor due diligence or internal platform readiness
  -> architecture ADR
  -> pilot success criteria
  -> security, risk, compliance, procurement gates
  -> controlled rollout
  -> adoption dashboard
  -> ongoing governance and vendor review

| Gate | Decision | Required evidence | Stop signal |
|---|---|---|---|
| Gate 0: Problem | Should we explore AI? | Opportunity canvas, baseline, owner | No measurable pain |
| Gate 1: Feasibility | Is AI plausible? | Data readiness, workflow map, risk tier | No data owner or unacceptable risk |
| Gate 2: Option | Build, buy, partner, or hybrid? | Decision matrix, TCO, lock-in analysis | Preference-only selection |
| Gate 3: Vendor / architecture | Safe enough for pilot? | Due diligence, ADR, control pack | No audit, privacy, security, eval path |
| Gate 4: Pilot | Did pilot prove value and control? | Eval report, feedback, pilot metrics | Poor quality or low trust |
| Gate 5: Production | Can we operate this? | RACI, runbook, incident response | No owner for data, eval, vendor |
| Gate 6: Scale | Should we expand? | Adoption dashboard, ROI, risk review | Usage without quality or value |

Decision rules:
  • Do not start vendor selection before defining workflow and success metric.
  • Do not compare vendors only by model benchmark.
  • Do not let procurement lead alone without architecture, data, risk and users.
  • Do not let a PoC bypass production controls because it looks useful.
  • Do not scale without adoption metrics and rollback criteria.
  • Do not accept "vendor handles it" without evidence, contract terms and operational workflow.
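
The gate sequence above can be read as an ordered checklist: a project may not pass a gate until that gate's required evidence exists. The sketch below is one minimal way to encode that idea. The gate names and evidence items come from the table; the snake_case keys, the `Gate` class and `first_blocked_gate` are illustrative names, not part of any existing template or tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    required_evidence: tuple[str, ...]


# Gate names and evidence follow the stage-gate table; the snake_case keys are
# illustrative labels, not an existing schema.
GATES = (
    Gate("Gate 0: Problem", ("opportunity_canvas", "baseline", "owner")),
    Gate("Gate 1: Feasibility", ("data_readiness", "workflow_map", "risk_tier")),
    Gate("Gate 2: Option", ("decision_matrix", "tco", "lock_in_analysis")),
    Gate("Gate 3: Vendor / architecture", ("due_diligence", "adr", "control_pack")),
    Gate("Gate 4: Pilot", ("eval_report", "feedback", "pilot_metrics")),
    Gate("Gate 5: Production", ("raci", "runbook", "incident_response")),
    Gate("Gate 6: Scale", ("adoption_dashboard", "roi", "risk_review")),
)


def first_blocked_gate(evidence: set[str]) -> tuple[str, list[str]] | None:
    """Return (gate name, missing evidence) for the first incomplete gate, else None."""
    for gate in GATES:
        missing = [item for item in gate.required_evidence if item not in evidence]
        if missing:
            return gate.name, missing
    return None


collected = {"opportunity_canvas", "baseline", "owner", "data_readiness", "workflow_map"}
print(first_blocked_gate(collected))  # ('Gate 1: Feasibility', ['risk_tier'])
```

Having evidence on file is necessary but not sufficient: the stop signals in the table still need a human decision at each gate.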

5. AI Vendor Due Diligence

Use this table as the first-pass diligence map. Expand high-risk rows into detailed questionnaires.

| Dimension | Key questions | Evidence to request | Red flags |
|---|---|---|---|
| Model / provider | Which foundation models, embeddings and rerankers are used? Can versions be frozen, tested, upgraded and rolled back? Are deterministic calculations separated from probabilistic outputs? Are prompts, completions, embeddings or logs used for training? | Model/provider list, lifecycle policy, release notes, eval report, data usage terms, subprocessor list | Silent model updates, vague model identity, generic accuracy claims, no fallback behavior |
| Data privacy | What data enters prompts, embeddings, logs, telemetry and support tools? Is PII, PHI, PCI, account or transaction data processed? Where is data stored, processed and retained? Are embeddings treated as sensitive data? | Data flow diagram, classification, retention schedule, DPA, subprocessor list, deletion process | "We do not store data" without log details, no embedding governance, broad support access, residency only in sales material |
| Security | Are SSO/SAML/OIDC, SCIM, RBAC and tenant isolation supported? How are secrets and API keys stored? How are prompt injection, excessive agency and data exfiltration tested? | SOC 2 or bridge letter, ISO 27001 certificate, pen test summary, security whitepaper, secure SDLC, incident policy | No SSO, broad DB access, prompt-only security control, no incident notification commitment |
| Governance | Is there an AI system inventory and risk classification? Can regulated actions be blocked, routed or dual-approved? Are prompts, tools, retrieval index and models versioned? | AI governance policy, change workflow, HITL config, audit log schema, eval and red-team process | Governance PDF only, no versioning, no case-level audit reconstruction |
| Evaluation | Are evals generic or customer-specific? Can customer gold datasets and rubrics be used? Are offline eval, shadow mode and production sampling supported? Can eval failures block release? | Eval methodology, sample eval report, failure taxonomy, release gate policy, reviewer calibration | Aggregate thumbs-up only, no negative cases, no stale-source tests, no eval export |
| SLA / reliability | What uptime and latency SLOs apply to the full product? Are rate limits and regional limits known? Is there a fallback model, degraded mode or read-only mode? | SLA terms, status history, DR policy, rate-limit docs, support plan, incident postmortem sample | SLA excludes key dependencies, no degraded mode, support slower than business SLA |
| Cost | Is pricing per user, case, document, token, tool call, workflow or environment? Are model usage, embeddings, vector storage, observability, connectors and support included? | Pricing workbook, usage telemetry, cost allocation export, rate card, renewal terms, services estimate | Cannot model cost per case, token costs excluded, no budget caps, unclear renewal uplift |
| Lock-in | Which parts are proprietary: prompts, workflow, index, embeddings, memory, evals, connectors, policies, agents? Can data, logs, evals and config be exported? | Export docs, termination terms, IP ownership, open API docs, migration plan | No export for audit logs or evals, proprietary workflow, no exit cost definition |
| Customization | Can the product support bank policy, jurisdiction, product, language and role differences? Are policy updates approved, versioned and regression tested? | Configuration model, admin demo, policy workflow, prompt registry, regression results, reference customer | Vendor-only customization, no content owner/effective date, custom logic hidden in prompts |
| Integration | Which systems integrate: CRM, core banking, case management, LOS, payment systems, contact center, IAM, SIEM? Are integrations read-only, write, pre-fill, or automated? | API docs, reference architecture, auth pattern, sandbox, data mapping | Broad DB access, write actions without idempotency, bypassed authorization |
| Audit | Can logs show user, role, input, source, model version, prompt version, retrieval results, output, approval and final action? Can evidence be exported? | Audit log schema, sample evidence package, retention config, export API, admin audit event list | Audit trail is chat transcript only, logs omit retrieval or tools, admin changes not audited |
| Incident response | Are data exposure, prompt injection, unsafe output, outage, cost runaway and tool misuse in scope? Can the customer trigger a kill switch? | Incident response plan, notification commitment, kill switch docs, postmortem template, severity matrix | Security-only incident policy, no AI quality incident, no customer-side kill switch |
| Procurement | Is the vendor approved for the data classification and geography? Are a DPA, BAA, security addendum or AI addendum needed? Are liability, audit rights, subprocessors and exit terms acceptable? | MSA, DPA, SLA, order form, support terms, security addendum, subprocessor schedule | Contract contradicts sales promises, no audit rights, weak liability cap, subprocessor changes without notice |

Minimum due diligence questions:
  • Executive: What workflows have you solved in regulated industries? Which production customers are closest? Which parts are product, services and partner work? What assumptions must be true for ROI?
  • Product: Which users and approvers are supported? What exception paths are supported? How are citations, uncertainty, refusal, feedback and override reason shown?
  • Architecture: Show end-to-end data flow. Which components are customer-controlled? How are retrieval, grounding, tool permissions and audit implemented?
  • Data: What is copied, indexed, embedded or cached? Can data be filtered by role, region, product, customer segment and effective date? How are stale sources handled?
  • Risk: Which AI risks do you explicitly manage? How do you prevent over-reliance, automation bias, prohibited advice and unsafe action?
  • Security: How do you protect secrets and downstream credentials? How do you test prompt injection and excessive agency? How are logs retained?
  • Commercial: Give conservative, expected and upside cost scenarios. What costs are excluded? What is the cost to export and exit?
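
The Audit row of the diligence table asks whether logs can show user, role, input, source, model version, prompt version, retrieval results, output, approval and final action. One hedged way to test a vendor's sample evidence package against that list is a simple field check. The field keys and the `missing_audit_fields` helper below are illustrative assumptions; a real vendor schema will use its own names and need a mapping step first.

```python
# Field list taken from the Audit row of the diligence table; the snake_case
# keys are assumed names, not a vendor schema.
REQUIRED_AUDIT_FIELDS = (
    "user",
    "role",
    "input",
    "source",
    "model_version",
    "prompt_version",
    "retrieval_results",
    "output",
    "approval",
    "final_action",
)


def missing_audit_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty in one audit record."""
    empty_values = (None, "", [], {})
    return [name for name in REQUIRED_AUDIT_FIELDS if record.get(name) in empty_values]


# A record that is effectively a chat transcript, the red flag named in the table.
chat_only = {"user": "analyst-17", "input": "Summarize case 123", "output": "Draft summary"}
print(missing_audit_fields(chat_only))
```

A record that fails this check cannot support case-level audit reconstruction, whatever the sales material says.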

6. Vendor Scorecard

Adjust weights by risk tier. For AML, lending and payments, increase governance, audit, security and eval weights.

| Dimension | Weight | Score 1 | Score 3 | Score 5 |
|---|---|---|---|---|
| Business fit | 10 | Generic demo | Some workflow fit | Strong domain workflow fit |
| Model quality | 10 | Claims only | Generic benchmark | Customer eval with failure analysis |
| Data privacy | 10 | Vague | Basic DPA | Configurable by data class |
| Security | 10 | Weak controls | Some evidence | SSO, RBAC, logs, SOC/ISO evidence |
| AI governance | 10 | Checklist only | Partial controls | Versioning, HITL, release gates |
| EvalOps | 10 | None | Basic tests | Offline, shadow and production eval |
| Integration | 10 | Manual export | Limited API | IAM, SIEM, workflow integration |
| Reliability | 7 | Best effort | Basic SLA | DR, fallback, degraded mode |
| Cost | 8 | Unclear | Pricing available | Cost per case and caps |
| Lock-in / exit | 7 | Black box | Partial export | Clear exit plan |
| Customization | 5 | Vendor-only | Some config | Governed domain config |
| Procurement risk | 3 | Sales promises | Standard terms | Contractable obligations |

Score guidance:
  • 0-50: Do not pilot except sandbox learning.
  • 51-70: Low-risk pilot only, strict controls.
  • 71-85: Candidate for controlled pilot.
  • 86-100: Production candidate, still requiring architecture and risk gates.

Interpretation:
  • High product quality with weak audit may still fail AML or lending.
  • Strong governance with moderate model quality may be better for regulated operations.
  • Low price without exit plan can become expensive at scale.
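
The scorecard weights sum to 100, so a total on the 0-100 scale of the score guidance can be computed by scaling each 1-5 score to its weight. The text does not state the normalization, so the `weight * score / 5` rule below is an assumption, as are the dictionary keys and function names. Under this rule the lowest possible total is 20, and fractional totals between bands (for example 50.4) fall into the higher band.

```python
# Weights copied from the scorecard table; keys are illustrative names.
WEIGHTS = {
    "business_fit": 10, "model_quality": 10, "data_privacy": 10, "security": 10,
    "ai_governance": 10, "evalops": 10, "integration": 10, "reliability": 7,
    "cost": 8, "lock_in_exit": 7, "customization": 5, "procurement_risk": 3,
}


def weighted_score(scores: dict[str, int]) -> float:
    """Scale 1-5 scores by weight so that all 5s give 100 (assumed normalization)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the scorecard dimensions")
    if any(not 1 <= value <= 5 for value in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS) / 5


def score_band(total: float) -> str:
    """Map a total to the score guidance bands."""
    if total <= 50:
        return "Do not pilot except sandbox learning"
    if total <= 70:
        return "Low-risk pilot only, strict controls"
    if total <= 85:
        return "Candidate for controlled pilot"
    return "Production candidate, still requiring architecture and risk gates"


# Hypothetical vendor: strong controls everywhere, moderate model quality.
scores = {dim: 4 for dim in WEIGHTS}
scores["model_quality"] = 3
total = weighted_score(scores)
print(total, "->", score_band(total))  # 78.0 -> Candidate for controlled pilot
```

For AML, lending and payments, the same function can be reused with a reweighted `WEIGHTS` table, provided the weights still sum to 100.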

7. Build Vs Buy Vs Partner Vs Hybrid

| Option | Definition | Best when | Main risk |
|---|---|---|---|
| Buy | Use vendor SaaS or managed platform for most capability | Speed, common workflow, small platform team | Lock-in, hidden cost, limited control |
| Build | Internal team builds core capability | Differentiating workflow, strict constraints, scale economics | Delivery risk, ops burden, security burden |
| Partner | Co-build with vendor, SI or domain specialist | Need domain expertise or delivery capacity | Dependency and knowledge transfer risk |
| Hybrid | Build control-heavy or differentiating parts, buy commodity parts | Regulated workflow, speed plus control | Boundary complexity |

Decision heuristics:
  • Buy when workflow is common, vendor has audited controls, time-to-value is critical, differentiation is adoption/workflow rather than infrastructure.
  • Build when workflow is core competitive advantage, SaaS cannot meet data/security constraints, controls cannot be implemented by vendor, scale changes economics, internal team can operate production AI.
  • Partner when regulated domain design or delivery capacity is missing, but internal owners must learn and take over.
  • Hybrid when vendor UI/model layer is strong but governance, policy, audit, eval, data or workflow boundary must remain internally controlled.

Component matrix:

| Capability | Buy | Build | Partner | Practical recommendation |
|---|---|---|---|---|
| RAG knowledge assistant | Common KB and service workflows | Sensitive corpus, complex entitlement | Taxonomy cleanup | Buy retrieval, build metadata, eval and source governance |
| Agent workflow | Bounded internal automation | High-risk actions, proprietary orchestration | Workflow redesign | Buy runtime, build tool policy and approval layer |
| Model gateway | Cloud/platform routing and telemetry | Multi-cloud or custom policy | Setup help | Hybrid gateway with internal policy abstraction |
| Eval platform | Test management and dashboards | Domain labels and release gates | Initial eval library | Buy tooling, build gold data and acceptance policy |
| Vector DB / search | Managed service | Strict data control or cost | Indexing architecture | Managed infra with internal access and retention controls |
| Document processing | Standard forms and OCR | Proprietary docs | Annotation and tuning | Buy extraction, build validation workflow |
| Workflow automation | Existing case/workflow platform fits | Deep core ops integration | BPMN/process redesign | Use existing workflow platform, integrate AI services |
| Monitoring | Platform telemetry | Enterprise SIEM and risk metrics | Runbook setup | Export vendor telemetry to internal dashboards |
| Governance tooling | Enterprise inventory/GRC fit | Focused lightweight registry | Policy library | Buy inventory, build use-case evidence |
| Prompt/policy registry | Integrated release workflow | Strategic prompt/policy control | Standards setup | Build internal registry, sync to vendor config |
| Tool gateway | Managed connectors | High-risk permissions | Integration adapters | Build policy wrapper around vendor tools |
| Human review workbench | Vendor reviewer UX fits | Domain review UX is advantage | Frontline design | Buy basic review, build domain QA queue |

Default financial retail patterns:

| Use case | Starting option | Reason |
|---|---|---|
| AML investigation copilot | Hybrid | Buy extraction/RAG, build audit, HITL and SAR boundary |
| KYC document automation | Buy + partner | OCR is commodity, policy and remediation workflow need domain setup |
| Customer service copilot | Buy | Common pattern, value depends on knowledge governance and adoption |
| Payments operations agent | Hybrid | Payment actions require idempotency, approval, audit and rail controls |
| Lending policy assistant | Hybrid | Policy RAG can be bought, credit support needs governance and eval |

8. Required Artifacts

8.1 Vendor Scorecard

Fields: vendor name, use case, risk tier, business fit score, security score, privacy score, eval score, governance score, integration score, cost score, lock-in score, recommendation, mitigations, go/no-go decision.

8.2 Due Diligence Pack

Fields: question, owner, evidence requested, evidence received, evidence quality, open gap, blocker status, due date.

8.3 Architecture ADR

Use docs/abpa/templates/08-ai-architecture-adr-set.md. Minimum ADRs:

  • AI pattern selection.
  • Model and provider strategy.
  • RAG and knowledge architecture.
  • Human-in-the-loop design.
  • Eval and observability architecture.
  • Integration, security and audit.
  • Vendor strategy and exit trigger.

Each ADR must answer:
  • What decision are we making?
  • What options were considered?
  • What evidence supports the decision?
  • What risk remains?
  • What controls reduce risk?
  • What would make the decision wrong?
  • What is the reversal trigger?

8.4 Risk Acceptance Memo

Fields:

| Field | Required content |
|---|---|
| Use case | Business workflow and users |
| Decision | Proceed, pause, reduce scope, or reject |
| Risk owner | Person accepting residual risk |
| Business owner | Person accountable for value |
| Architecture owner | Person accountable for technical controls |
| Risk statement | What could go wrong |
| Impact | Customer, financial, operational, legal, compliance, reputational |
| Controls | Preventive, detective, corrective |
| Evidence | Eval result, security review, vendor evidence, pilot result |
| Residual risk | What remains after controls |
| Expiry | Date or trigger for re-review |
| Stop rule | Condition requiring rollback or suspension |

Rules: no unnamed owner, no vague risk, no permanent pilot acceptance, no bypass of legal/regulatory review.
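
Most of the memo rules can be checked mechanically before the memo reaches a reviewer. The sketch below covers a subset of the fields and makes two simplifying assumptions: expiry is modeled as a date only (the memo also allows a trigger), and "no vague risk" is reduced to "risk statement is not empty", since judging vagueness stays with a human. The class, field and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RiskAcceptanceMemo:
    """Subset of the memo fields; names are illustrative."""
    use_case: str
    decision: str  # proceed, pause, reduce scope, or reject
    risk_owner: str
    business_owner: str
    architecture_owner: str
    risk_statement: str
    expiry: Optional[date]  # assumption: date only, no trigger-based expiry
    legal_review_done: bool


def memo_violations(memo: RiskAcceptanceMemo, today: date) -> list[str]:
    """Return the memo rules that this memo breaks."""
    problems = []
    for role in ("risk_owner", "business_owner", "architecture_owner"):
        if not getattr(memo, role).strip():
            problems.append(f"unnamed {role}")
    if not memo.risk_statement.strip():
        problems.append("missing risk statement")
    if memo.expiry is None:
        problems.append("no expiry (permanent pilot acceptance)")
    elif memo.expiry <= today:
        problems.append("expiry passed, re-review required")
    if not memo.legal_review_done:
        problems.append("legal/regulatory review bypassed")
    return problems


draft = RiskAcceptanceMemo(
    use_case="AML investigation copilot pilot",
    decision="proceed",
    risk_owner="",
    business_owner="AML operations lead",
    architecture_owner="Solutions architect",
    risk_statement="Unsupported red-flag claims reach a case narrative",
    expiry=None,
    legal_review_done=True,
)
print(memo_violations(draft, today=date(2025, 1, 1)))
```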

8.5 Pilot Success Criteria

Define workflow included/excluded, user group, case type, data sources, AI actions allowed/prohibited, human approval points, quality thresholds, risk thresholds, cost thresholds, adoption thresholds, stop rules, decision date.

| Category | Example metrics |
|---|---|
| Quality | Citation precision, unsupported claim rate, policy violation rate |
| Workflow | Cycle time, touch time, rework, backlog |
| Trust | Acceptance rate, edit rate, override reason, confidence score |
| Risk | Escalation rate, incident count, audit defect |
| Cost | Cost per assisted case, model spend, support cost |
| Adoption | Activated users, repeat users, eligible cases touched |

Stop rules: unsupported claim exceeds threshold, sensitive data leaks, users copy without evidence review, SLA worsens, vendor outage blocks workflow, cost per case exceeds threshold, risk/legal/security raises blocker.
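
Stop rules only work if they are evaluated against numbers agreed before the pilot starts. The sketch below checks the measurable stop rules against a weekly snapshot. The threshold values, class names and field names are hypothetical. Two rules from the list (users copying without evidence review, vendor outage blocking workflow) need observation or incident data and are left out of this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PilotThresholds:
    """Limits agreed before the pilot starts; all values are project-specific."""
    max_unsupported_claim_rate: float
    max_cost_per_case: float
    baseline_sla_hours: float


@dataclass
class PilotSnapshot:
    unsupported_claim_rate: float
    sensitive_data_leaks: int
    sla_hours: float
    cost_per_case: float
    open_blockers: list[str] = field(default_factory=list)


def triggered_stop_rules(s: PilotSnapshot, t: PilotThresholds) -> list[str]:
    """Return every measurable stop rule that the snapshot trips."""
    hits = []
    if s.unsupported_claim_rate > t.max_unsupported_claim_rate:
        hits.append("unsupported claim rate exceeds threshold")
    if s.sensitive_data_leaks > 0:
        hits.append("sensitive data leak")
    if s.sla_hours > t.baseline_sla_hours:
        hits.append("SLA worsens")
    if s.cost_per_case > t.max_cost_per_case:
        hits.append("cost per case exceeds threshold")
    if s.open_blockers:
        hits.append("risk/legal/security blocker: " + ", ".join(s.open_blockers))
    return hits


# Hypothetical numbers for illustration only.
limits = PilotThresholds(max_unsupported_claim_rate=0.02, max_cost_per_case=1.50, baseline_sla_hours=24.0)
week_3 = PilotSnapshot(unsupported_claim_rate=0.05, sensitive_data_leaks=0, sla_hours=20.0, cost_per_case=1.10)
print(triggered_stop_rules(week_3, limits))  # ['unsupported claim rate exceeds threshold']
```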

8.6 Rollout Plan

| Phase | Scope | Goal |
|---|---|---|
| Phase 0 | Sandbox with redacted data | Learn product and validate assumptions |
| Phase 1 | Shadow mode | Compare AI output with human workflow |
| Phase 2 | Assisted pilot | Users see AI output, no external action |
| Phase 3 | Controlled production | Limited team and case type, full monitoring |
| Phase 4 | Scale | More teams and cases with standardized controls |

Rollout fields: phase, users, case types, features enabled, training, controls, support model, monitoring cadence, success metric, expansion criteria, rollback criteria.

8.7 Adoption Dashboard

Use docs/abpa/templates/10-adoption-dashboard.md. Required views: funnel by team, usage by workflow step, acceptance/edit/rejection trend, override reasons, quality defects, escalations, time saved, cost per case, incident/rollback events, training completion.

9. Enterprise Adoption And Change Management

Adoption is not login count. Adoption means users safely change how work is done, managers reinforce the workflow, quality improves, and controls remain observable.

9.1 Stakeholder Segmentation

| Segment | Example | Need | Concern | Engagement |
|---|---|---|---|---|
| Executive sponsor | COO, CDO, business head | Value, risk, funding | Pilot theater | Monthly decision memo |
| Process owner | AML ops lead, contact center lead | Workflow performance | SLA disruption | BPMN and metric review |
| Frontline users | Analysts, agents, underwriters | Useful output | Trust, job impact | Shadowing, training, feedback |
| Managers | Team leads, QA leads | Coaching and monitoring | Hard to assess usage quality | Adoption dashboard |
| Risk / compliance | Compliance, MRM, legal | Controls and evidence | Unsafe automation | Control pack and release gate |
| Security / privacy | CISO, DPO, IAM | Data and access control | Leakage, vendor risk | Security review |
| Data owners | CRM, policy KB, transaction data | Stewardship | Bad data blamed on source team | Data readiness |
| IT / engineering | Platform, integration, support | Operability | Fragile integration | Runbook |
| Procurement | Sourcing, vendor management | Contract and supplier risk | Hidden obligations | Due diligence pack |

9.2 Champion Network

Champion design: pick respected frontline users, include skeptics, give their feedback real influence, train on boundaries, collect failure examples, co-create SOP and office hours.

Champion responsibilities: validate workflow fit, test hard cases, explain tool boundaries, report trust/adoption blockers, review training material, tune feedback taxonomy.

9.3 Training Model

| Audience | Training focus |
|---|---|
| Frontline users | When to use, verify evidence, reject, escalate |
| Reviewers / approvers | Evaluate output, override, record decision, sample defects |
| Managers | Read adoption dashboard and coach behavior |
| Product / BA | Turn feedback into backlog and eval cases |
| Risk / compliance | Controls, audit logs, HITL, incident process |
| Support / IT | Triage outage, access, latency, integration failures |

Training artifacts: role SOP, allowed/prohibited use cases, evidence checklist, escalation tree, failure examples, feedback taxonomy, data handling reminder, office hours.

9.4 Trust-Building

Trust mechanisms: show source citations and effective dates, prefer "not enough evidence" over overconfident answers, make uncertainty visible, publish quality trend and limitations, separate suggested text from approved final action, require evidence confirmation for sensitive cases.

Bad patterns: hidden evidence, speed-only incentives, manager pressure to accept output, usage-only success, failure reports without response.

9.5 Human-In-The-Loop Process Change

Define for each AI output: human role, decision authority, evidence required, override reason taxonomy, escalation owner, SLA impact, audit field, stop condition.

| Level | Meaning | Financial retail example |
|---|---|---|
| L0 | AI disabled | Legal hold or disputed incident |
| L1 | Evidence retrieval only | AML evidence packet |
| L2 | Draft, human edits | Service response draft |
| L3 | Recommend, human approves | Fraud next action |
| L4 | Pre-fill action, human confirms | Payment retry instruction |
| L5 | Execute bounded low-risk action | Low-risk knowledge routing |

Default: regulated, financial, adverse or customer-impacting actions start at L1-L3. L4 requires audit, idempotency, rollback and approval. L5 should be rare in financial retail unless low-risk and reversible.
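
The default rules for HITL levels can be expressed as a small policy check that a release gate could run against each AI output's configuration. This is a sketch under stated assumptions: the enum member names, the `sensitive` flag (standing for regulated, financial, adverse or customer-impacting), the `initial_rollout` flag and the control labels are all illustrative, and the L4 control list is applied to L4 only because the text does not say which controls L5 needs.

```python
from enum import IntEnum


class HitlLevel(IntEnum):
    L0_DISABLED = 0
    L1_RETRIEVAL_ONLY = 1
    L2_DRAFT = 2
    L3_RECOMMEND = 3
    L4_PREFILL = 4
    L5_EXECUTE = 5


# Controls the text requires before an output may run at L4.
L4_CONTROLS = frozenset({"audit", "idempotency", "rollback", "approval"})


def rollout_violations(level, *, sensitive, initial_rollout,
                       controls=frozenset(), low_risk=False, reversible=False):
    """Return the default rules that a proposed HITL level breaks."""
    problems = []
    if sensitive and initial_rollout and level > HitlLevel.L3_RECOMMEND:
        problems.append("sensitive actions start at L1-L3")
    if level == HitlLevel.L4_PREFILL:
        missing = sorted(L4_CONTROLS - set(controls))
        if missing:
            problems.append("L4 missing controls: " + ", ".join(missing))
    if level == HitlLevel.L5_EXECUTE and not (low_risk and reversible):
        problems.append("L5 needs a low-risk, reversible action")
    return problems


# Payment retry pre-fill proposed after a pilot, with only two controls in place.
print(rollout_violations(
    HitlLevel.L4_PREFILL,
    sensitive=True,
    initial_rollout=False,
    controls={"audit", "approval"},
))  # ['L4 missing controls: idempotency, rollback']
```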

9.6 Adoption Metrics

Minimum metrics: eligible users, activated users, repeat users, habit users, eligible cases touched, suggestions generated, accepted, edited, rejected, override reasons, user confidence, unsupported claim rate, escalation rate, cycle time, rework rate, cost per assisted case.

Interpretation:

  • High usage + high override = tool is in workflow but not trusted.
  • Low usage + high quality = workflow trigger, training or manager incentive problem.
  • High acceptance + high defect = automation bias risk.
  • High edit rate = output useful but format or policy fit weak.
  • High escalation = scope too broad or data insufficient.
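
The interpretation rules above combine pairs of metrics, which is easy to get wrong when reading a dashboard by eye. The sketch below applies them to one team's rates. The text does not define what counts as "high" or "low", so every cutoff in `Cutoffs` is a hypothetical placeholder to be replaced with values agreed by the process owner, and "high quality" is approximated as a defect rate below the high-defect cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Hypothetical cutoffs; none of these values come from the playbook."""
    high_usage: float = 0.6
    low_usage: float = 0.3
    high_override: float = 0.4
    high_acceptance: float = 0.8
    high_defect: float = 0.1
    high_edit: float = 0.5
    high_escalation: float = 0.2


def interpret(m: dict[str, float], c: Cutoffs = Cutoffs()) -> list[str]:
    """Apply the interpretation rules to rates expressed as fractions (0 to 1)."""
    findings = []
    if m["usage"] >= c.high_usage and m["override"] >= c.high_override:
        findings.append("tool is in workflow but not trusted")
    if m["usage"] <= c.low_usage and m["defect"] < c.high_defect:
        findings.append("workflow trigger, training or manager incentive problem")
    if m["acceptance"] >= c.high_acceptance and m["defect"] >= c.high_defect:
        findings.append("automation bias risk")
    if m["edit"] >= c.high_edit:
        findings.append("output useful but format or policy fit weak")
    if m["escalation"] >= c.high_escalation:
        findings.append("scope too broad or data insufficient")
    return findings


team_a = {"usage": 0.5, "override": 0.1, "acceptance": 0.9,
          "defect": 0.15, "edit": 0.1, "escalation": 0.1}
print(interpret(team_a))  # ['automation bias risk']
```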

9.7 Rollback And Escalation

| Level | Action | Trigger |
|---|---|---|
| 1 | Disable one feature or prompt | Quality regression in one output |
| 2 | Disable one tool/action | Tool misuse or integration issue |
| 3 | Switch fallback model/provider | Model outage or regression |
| 4 | Read-only mode | Workflow action risk |
| 5 | Full suspension | Data exposure, customer harm, regulatory issue |

Escalation paths:
  • Frontline issue -> support -> product owner.
  • Quality issue -> EvalOps -> release gate.
  • Security issue -> security incident.
  • Privacy issue -> privacy/legal.
  • Compliance issue -> risk/compliance.
  • Vendor outage -> vendor management/engineering.
  • Customer harm -> executive sponsor and incident commander.
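
The rollback ladder can be encoded as a lookup from trigger to level so that a runbook or kill-switch tool applies it consistently. The trigger keys and function name below are illustrative. Choosing the highest level when several triggers are active at once is an assumption; the table does not say how to combine them.

```python
from enum import IntEnum


class RollbackLevel(IntEnum):
    DISABLE_FEATURE = 1
    DISABLE_TOOL = 2
    FALLBACK_MODEL = 3
    READ_ONLY = 4
    FULL_SUSPENSION = 5


# Trigger-to-level mapping from the rollback table; keys are assumed names.
TRIGGER_LEVEL = {
    "quality_regression": RollbackLevel.DISABLE_FEATURE,
    "tool_misuse": RollbackLevel.DISABLE_TOOL,
    "integration_issue": RollbackLevel.DISABLE_TOOL,
    "model_outage": RollbackLevel.FALLBACK_MODEL,
    "model_regression": RollbackLevel.FALLBACK_MODEL,
    "workflow_action_risk": RollbackLevel.READ_ONLY,
    "data_exposure": RollbackLevel.FULL_SUSPENSION,
    "customer_harm": RollbackLevel.FULL_SUSPENSION,
    "regulatory_issue": RollbackLevel.FULL_SUSPENSION,
}


def required_rollback(triggers):
    """Return the highest rollback level among active triggers, or None if none."""
    unknown = sorted(set(triggers) - set(TRIGGER_LEVEL))
    if unknown:
        raise ValueError("unknown triggers: " + ", ".join(unknown))
    levels = [TRIGGER_LEVEL[trigger] for trigger in triggers]
    return max(levels) if levels else None


print(required_rollback(["tool_misuse", "model_outage"]).name)  # FALLBACK_MODEL
```

Rejecting unknown triggers instead of guessing a level keeps the mapping explicit: a new failure type should be added to the table by its owner before the tool acts on it.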

10. Financial Retail Examples

| Scenario | Business problem | AI pattern | Diligence focus | Build/buy recommendation | Pilot success criteria | Interview point |
|---|---|---|---|---|---|---|
| AML copilot vendor | Investigators spend too much time gathering evidence and drafting narratives. False positives and backlog reduce time for judgment | Case investigation copilot. RAG over KYC, transactions, case notes, typologies and SOP. Human owns suspicious activity decision | Evidence retrieval, citation quality, prompt injection from adverse media, no autonomous SAR/STR decision, immutable audit log, RBAC, supervisor review | Hybrid. Buy extraction/RAG, build SAR boundary, case approval, audit and policy enforcement | Draft time reduced, QA defect not worse, citation precision above threshold, no unsupported red-flag claims, supervisor acceptance above threshold | AML AI is investigation evidence support, not a decision engine |
| KYC document automation | Remediation and onboarding are slowed by document intake, OCR, field extraction and policy validation | Document AI + workflow automation + policy checklist. Human approval for high-risk customers, UBO changes, source-of-funds exceptions | OCR accuracy by document type, PII retention, data residency, document fraud controls, policy version, reviewer override, lineage to customer master | Buy + partner. Buy standard extraction, partner/build domain validation and policy mapping | Extraction threshold met, false accept below threshold, cycle time improves, override reasons captured, source lineage recorded | KYC value is a controlled loop from data gap to outreach to verified source-of-truth update |
| Customer service copilot | Agents search multiple systems, answer quality varies, after-call work is high | RAG assistant + response draft + call/chat summary. Agent confirms and sends | Knowledge governance, effective date, jurisdiction, authentication boundary, advice guardrails, contact center integration, QA feedback | Buy. Focus internal effort on knowledge governance, eval and training | AHT or after-call work improves, QA defect stable/improved, agents use citations, knowledge gaps fixed, escalation improves | Service copilot succeeds when knowledge governance and behavior change are designed together |
| Payments operations agent | Exceptions, returns, retries, reconciliation breaks and customer inquiries create backlog | Bounded operations agent. Evidence gathering, classification, next action recommendation, action pre-fill, human approval | Tool permission, idempotency, rail rules, ledger integrity, segregation of duties, tool-call audit, kill switch | Hybrid. Buy workflow assistant/model gateway, build payment action policy, approval, idempotency, audit | Break resolution time improves, no unauthorized action, reviewer agreement high, SLA breaches decrease, cost per resolved exception improves | Payments AI must separate understanding from execution |
| Lending policy assistant | Underwriters need consistent policy interpretation, document summaries and exception analysis | Policy RAG + document summarization + decision support. Credit decision remains human-owned | High-risk AI lens, policy citation, deterministic calculations separate from LLM, fair lending, reason-code support, decision audit | Hybrid. Buy policy retrieval/document summary, build governance, reason-code controls, decision record and fairness monitoring | Citation correctness above threshold, memo time improves, no prohibited language, overrides captured, segment quality acceptable | Do not let an LLM own the credit decision |

11. 30 / 60 / 90-Day Execution Roadmap

First 30 Days: Frame And Shortlist

Objectives: define business problem and baseline, identify stakeholders, map workflow, define AI fit/no-AI option, shortlist build/buy/partner/hybrid options.

Weeks:

  • Week 1: Choose scenario, write opportunity canvas, identify sponsor, process owner, data owner, risk owner, tech owner.
  • Week 2: Build stakeholder map, interview frontline users, draft AS-IS BPMN with exception flow.
  • Week 3: Build data readiness pack, classify data sensitivity, identify eval samples and AI patterns.
  • Week 4: Draft option matrix, create vendor shortlist, send RFI, draft executive memo v0.1.

Outputs: opportunity canvas, stakeholder map, BPMN pain metrics, data readiness pack, option matrix, vendor shortlist.

First 60 Days: Evaluate And Decide

Objectives: run vendor demos with scripted eval cases, estimate TCO, create architecture ADR, define controls and pilot scope, make go/no-go recommendation.

Weeks:

  • Week 5: Convert requirements to eval matrix, run same demo cases across vendors.
  • Week 6: Run security, privacy, architecture and procurement review, build cost scenarios.
  • Week 7: Write ADRs, draft AI control pack, define pilot metrics and stop rules.
  • Week 8: Finalize scorecard, compare options, present executive decision memo.

Outputs: requirements-to-eval matrix, vendor scorecard, TCO model, architecture ADR set, AI control pack, pilot success criteria, executive decision memo.

First 90 Days: Pilot And Adoption Setup

Objectives: execute controlled pilot, validate quality, risk, value and adoption, prepare operating model, decide scale, narrow, switch or stop.

Weeks:

  • Week 9: Set up pilot, configure data access, roles, logs, train pilot users and champions.
  • Week 10: Move from shadow to assisted pilot if thresholds pass, collect feedback and override reasons.
  • Week 11: Build adoption dashboard, review metrics, run incident and rollback drill.
  • Week 12: Write pilot report, update business case, finalize RACI, decide next step.

Outputs: pilot report, adoption dashboard, updated business case, operating model RACI, runbook, rollout plan, production readiness recommendation.

12. Interview Talking Points

30-second answer: "我做 AI vendor 和 build-vs-buy 时不会先问哪个模型最强, 而是先定义业务流程, 数据敏感度, 风险等级, eval 标准和 adoption 路径. 然后用 scorecard 比较 buy, build, partner 和 hybrid, 特别看 privacy, security, audit, eval, SLA, cost, lock-in 和 HITL. 最后用 pilot success criteria 和 adoption dashboard 证明这个工具是否真的改变工作方式." 2-minute structure:

  • Business problem: 场景, baseline, users, pain metric.
  • Workflow and risk: AS-IS / TO-BE, human approval, regulated boundary.
  • Options: build, buy, partner, hybrid.
  • Vendor diligence: model, data, security, governance, eval, SLA, cost, lock-in, integration.
  • Architecture: RAG, agent, model gateway, tool permission, audit.
  • Pilot: eval, stop rule, adoption metric, rollback.
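  • Result: use the dashboard and business case to decide scale or stop.

Role angles:

  • Senior architect: "I would keep the parts that are easiest to get locked into and that carry the highest risk under internal control, such as identity, policy, audit, eval, data governance and approval workflow. For mature capabilities such as OCR, generic RAG, managed model gateway or observability, we can buy or partner to move faster."
  • AI PM: "Whether a vendor is good does not depend on how good the demo looks. It depends on whether users keep using the tool in the real workflow, whether it reduces rework, whether it improves quality, and whether failures can be explained and rolled back."
  • AI BA: "I would first map stakeholder objections and workflow exceptions, because many AI procurements fail not because the model fails, but because real approvals, exceptions, data ownership and audit evidence were never handled."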

13. Common Failure Modes

Vendor selection failures:

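  • Model benchmarks stand in for business eval.
  • Sales demo stands in for workflow pilot.
  • Security questionnaire stands in for architecture review.
  • Contract language does not match the actual data flow.
  • Scorecard does not cover lock-in and exit.
  • Pilot uses toy data and cannot prove real-world effect.

Build-vs-buy failures:

  • Team chooses build because "we can build it".
  • Team chooses buy because "the vendor has the feature".
  • Internal operating cost is not calculated.
  • Commodity capability and differentiating capability are not distinguished.
  • Model updates, prompt changes, eval maintenance and support are ignored.
  • Hybrid boundary is unclear, so both sides end up needing maintenance.

Adoption failures:

  • Training happens only once.
  • Only logins are tracked.
  • Users do not know when the tool must not be used.
  • Manager KPIs conflict with the new workflow.
  • Champion network includes only enthusiastic users.
  • Feedback loop is never closed.
  • Quality problems are framed as user resistance.
  • There is no rollback, so frontline staff work around the tool.

Governance failures:

  • AI control pack has no owner.
  • HITL has no actual approval state.
  • Audit trail cannot reconstruct a case.
  • Prompt changes bypass the release gate.
  • Vendor model updates have no regression testing.
  • Incident response does not cover AI quality incidents.
  • Risk acceptance has no expiry date.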
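Two of the governance failures above, prompt changes that skip the release gate and vendor model updates without regression testing, share one remedy: rerun the same eval set after every change and compare against the last approved baseline. The sketch below assumes eval results are stored as a mapping from a hypothetical eval case id to pass/fail. It illustrates the comparison only and does not describe any particular eval tool.

```python
from typing import Dict, List


def regressions(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> List[str]:
    """Eval case ids that passed on the baseline but fail, or are missing, on the candidate."""
    return sorted(
        case_id
        for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )


def release_allowed(baseline: Dict[str, bool],
                    candidate: Dict[str, bool],
                    max_regressions: int = 0) -> bool:
    """Allow the change only if regressions stay within the agreed tolerance."""
    return len(regressions(baseline, candidate)) <= max_regressions


if __name__ == "__main__":
    # Hypothetical case ids, for illustration only.
    approved = {"aml-001": True, "aml-002": True, "kyc-001": False}
    after_vendor_update = {"aml-001": True, "aml-002": False, "kyc-001": True}
    print(regressions(approved, after_vendor_update))
    print(release_allowed(approved, after_vendor_update))
```

A case that disappears from the candidate run is counted as a regression, so dropping hard cases from the eval set cannot make a change look safe.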

14. Self-Review Checklist

Before recommending vendor or build path, answer:

  • Do we have a measurable business baseline?
  • Do we have a no-AI or rules-only alternative?
  • Do we know which data enters prompts, embeddings and logs?
  • Do we know whether outputs affect regulated or customer-impacting decisions?
  • Do we have a risk tier and control pack?
  • Do we have a customer-specific eval set?
  • Do we have vendor evidence for security, privacy, audit and incident response?
  • Do we have a cost per case model?
  • Do we have an exit plan?
  • Do we have an ADR for architecture decisions?
  • Do we have HITL workflow states, not just policy wording?
  • Do we have pilot success criteria and stop rules?
  • Do we have adoption metrics beyond usage?
  • Do we have rollback and escalation paths?
  • Do we have named owners for product, data, model, eval, vendor, incident and adoption?

If any answer is no: record the gap, assign an owner, decide whether it blocks pilot/production/scale, add it to executive decision memo.
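The rule for a "no" answer (record the gap, assign an owner, decide which stage it blocks, add it to the memo) can be kept as a small gap register. The following is a minimal sketch: the `Gap` fields, the stage ordering and the memo line format are assumptions made for illustration, not a template from this playbook.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stages in the order they are reached; a gap that blocks an earlier
# stage also blocks every later one.
STAGES = ("pilot", "production", "scale")


@dataclass
class Gap:
    question: str          # the checklist question answered "no"
    owner: str             # named person or role accountable for closing it
    blocks: Optional[str]  # earliest stage it blocks, or None if non-blocking


def blocking_gaps(gaps: List[Gap], stage: str) -> List[Gap]:
    """Gaps that must be closed before entering the given stage."""
    target = STAGES.index(stage)
    return [g for g in gaps
            if g.blocks is not None and STAGES.index(g.blocks) <= target]


def memo_lines(gaps: List[Gap]) -> List[str]:
    """One line per gap, ready to paste into the executive decision memo."""
    return [f"- {g.question} | owner: {g.owner} | blocks: {g.blocks or 'nothing'}"
            for g in gaps]


if __name__ == "__main__":
    # Hypothetical gaps, for illustration only.
    register = [
        Gap("Do we have an exit plan?", "Vendor Owner", "production"),
        Gap("Do we have a customer-specific eval set?", "EvalOps", "pilot"),
        Gap("Do we have a cost per case model?", "AI PM", None),
    ]
    print([g.question for g in blocking_gaps(register, "pilot")])
    print("\n".join(memo_lines(register)))
```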

15. Portfolio Use

To turn this playbook into evidence:

  • Pick one scenario from section 10.
  • Fill 01-ai-opportunity-canvas.md.
  • Fill 03-bpmn-pain-metrics.md.
  • Fill 04-requirements-to-eval-matrix.md with at least 10 eval cases.
  • Fill 05-ai-control-pack.md.
  • Fill 08-ai-architecture-adr-set.md.
  • Fill 10-adoption-dashboard.md.
  • Fill 11-business-case.md.
  • Add all artifacts to 12-portfolio-evidence-map.md.

Best flagship combinations:

  • AML copilot: governance, audit, human oversight.
  • KYC document automation: data readiness and document workflow.
  • Customer service copilot: AI PM adoption and knowledge governance.
  • Payments operations agent: architecture and tool-action controls.
  • Lending policy assistant: high-risk decision support.

16. Quick Review Prompts

Use these prompts before presenting a recommendation:

  • What business metric changes if this AI solution works?
  • What work should remain human-owned?
  • What data should never enter prompts or logs?
  • What is the first workflow step safe enough for pilot?
  • What vendor promise must be contractually binding?
  • What evidence would make us stop the pilot?
  • What part should be built internally to preserve control or exit option?
  • What cost driver could surprise us at scale?
  • What adoption signal proves behavior change, not curiosity?
  • What audit question must we be able to answer one year later?
  • Who owns model changes, prompt changes, eval refresh and incident response?
  • Which ABPA artifact proves each decision?

17. Final Operating Principle

Enterprise AI adoption is a chain of evidence:

Business pain
  -> workflow evidence
  -> data readiness
  -> AI fit
  -> build/buy decision
  -> vendor evidence
  -> architecture controls
  -> eval gates
  -> pilot proof
  -> adoption metrics
  -> operating model
  -> ongoing governance

If one link is missing, the AI project may still demo well, but it is not ready for enterprise scale. The strongest AI BA / PM / architect signal is the ability to explain every link in that chain, with artifacts and trade-offs, in a way that business, risk, technology and frontline teams can all act on.
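The chain can also be checked mechanically. The sketch below assumes each link is backed by a list of artifact names, which are hypothetical here, and reports the links that have no supporting artifact, in chain order. It is an illustration of the principle, not a tool from the ABPA templates.

```python
from typing import Dict, List

# The links of the evidence chain, in order.
CHAIN = [
    "business pain",
    "workflow evidence",
    "data readiness",
    "AI fit",
    "build/buy decision",
    "vendor evidence",
    "architecture controls",
    "eval gates",
    "pilot proof",
    "adoption metrics",
    "operating model",
    "ongoing governance",
]


def missing_links(evidence: Dict[str, List[str]]) -> List[str]:
    """Links with no supporting artifact, listed in chain order."""
    return [link for link in CHAIN if not evidence.get(link)]


def ready_for_scale(evidence: Dict[str, List[str]]) -> bool:
    """Ready only when every link has at least one artifact behind it."""
    return not missing_links(evidence)
```

For example, a project that holds a pilot report but no eval report would show "eval gates" in `missing_links`, even if every later link is covered.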