AI Vendor / Build-Buy / Adoption Playbook
AI Vendor / Build-vs-Buy / Enterprise Adoption Playbook
Purpose: Build an AI vendor evaluation, build-vs-buy decision, enterprise adoption and change management playbook for AI BAs, AI PMs, AI Solutions Architects and Enterprise Architects.
Scope: AI use cases in financial retail enterprises, including AML, KYC, customer service, payments operations, lending, compliance and risk operations.
Note: This document is learning, architecture and portfolio material, not legal, compliance, procurement or investment advice. Actual projects must be reviewed by legal, compliance, procurement, security, privacy, risk and the business owner.
1. Core Positioning
AI procurement and adoption is not "picking a model". The real questions are:
- Is the business problem worth solving with AI? What is the baseline, and what is the no-AI option?
- Should we buy, build, partner or go hybrid?
- Does the vendor meet data privacy, security, governance, eval, SLA, cost, lock-in, customization, integration, audit, incident response and procurement requirements?
- Is the architecture evaluable, reversible (rollback-capable), auditable and replaceable?
- Do users actually change how they work, and does the organization have owners, controls, eval, runbooks, adoption metrics and a vendor review cadence?
Role responsibilities:
| Role | Core tasks | Key evidence |
|---|---|---|
| AI BA | Define problem, stakeholder evidence, workflow, requirements-to-eval | BPMN, stakeholder map, eval matrix |
| AI PM | Define user value, MVP scope, pilot, rollout, adoption, business case | PRD, scorecard, adoption dashboard |
| Solutions Architect | Define RAG/agent/model gateway/integration/security/eval architecture | ADR, control pack, threat model |
| Enterprise Architect | Define portfolio fit, standards, target architecture, governance model | Capability map, roadmap, review pack |
| EvalOps / Product Ops | Define eval dataset, release gate, monitoring, feedback loop | Eval report, runbook, incident review |
| Procurement / Vendor Owner | Define supplier risk, contract terms, SLA, exit plan | Due diligence pack, commercial model |
2. ABPA Template Alignment
This file does not replace docs/abpa/templates/. It explains how to use the templates together.
| Template | Usage |
|---|---|
| 01-ai-opportunity-canvas.md | Prove the problem is worth solving; record the no-AI option and baseline |
| 02-stakeholder-evidence-map.md | Identify business, risk, legal, security, data, IT and frontline objections |
| 03-bpmn-pain-metrics.md | Map AS-IS / TO-BE workflow, HITL points and exception paths |
| 04-requirements-to-eval-matrix.md | Turn requirements into vendor demo scripts, eval cases and release gates |
| 05-ai-control-pack.md | Turn AI risks into preventive, detective and corrective controls |
| 06-executive-decision-memo.md | Record the buy/build/partner/hybrid recommendation and funding gate |
| 07-data-readiness-pack.md | Assess source of truth, quality, PII, lineage, retention and access |
| 08-ai-architecture-adr-set.md | Record model/provider, RAG, HITL, eval, integration and audit decisions |
| 09-operating-model-raci.md | Name the product, data, eval, risk, vendor, incident and adoption owners |
| 10-adoption-dashboard.md | Track activation, usage, trust, quality and business outcome |
| 11-business-case.md | Build TCO, unit economics, benefit and risk-adjusted ROI |
| 12-portfolio-evidence-map.md | Package artifacts as interview and portfolio evidence |
3. Source And Standard Anchors
These anchors organize questions and evidence. They are not legal or procurement conclusions.
| Anchor | Official / primary source | Usage |
|---|---|---|
| NIST AI RMF 1.0 | https://www.nist.gov/itl/ai-risk-management-framework | Organize AI risk, eval, monitoring and governance around Govern, Map, Measure, Manage |
| NIST AI 600-1 GenAI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | Cover GenAI risks, content provenance, incidents and model behavior controls |
| EU AI Act, Regulation (EU) 2024/1689 | https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng | Apply a risk-based lens to identify high-risk use cases and transparency, human oversight and documentation needs |
| ISO/IEC 42001:2023 | https://www.iso.org/standard/42001 | Use an AI management system to establish lifecycle, accountability, policy and continual improvement |
| OWASP Top 10 for LLM Applications 2025 | https://genai.owasp.org/llm-top-10/ | Run security reviews covering prompt injection, sensitive information disclosure, supply chain and excessive agency |
| SOC Suite / SOC 2 | https://www.aicpa-cima.com/resources/landing/system-and-organization-controls-soc-suite-of-services | Vendor diligence on security, availability, confidentiality, privacy and processing integrity |
| ISO/IEC 27001:2022 | https://www.iso.org/standard/27001 | ISMS, security risk management, certification evidence |
| TOGAF Standard, 10th Edition | https://www.opengroup.org/togaf-standard-10th-edition-downloads | Manage architecture decisions with the ADM, architecture governance, capability planning and roadmaps |
Standards-to-artifacts:
- NIST AI RMF -> AI Control Pack, risk register, eval gate, monitoring plan.
- NIST GenAI Profile -> GenAI risk checklist, red-team plan, incident criteria.
- EU AI Act -> risk classification memo, transparency checklist, human oversight memo.
- ISO/IEC 42001 -> AI operating model, accountability matrix, lifecycle cadence.
- OWASP LLM Top 10 -> LLM threat model, prompt/tool/data controls, red-team backlog.
- SOC 2 / ISO 27001 -> vendor security evidence checklist, security review gates.
- TOGAF -> capability map, architecture roadmap, ADR, governance board decision.
4. Decision Flow And Stage Gates
Business problem
-> AI fit and no-AI option
-> data readiness and risk tier
-> build / buy / partner / hybrid options
-> vendor due diligence or internal platform readiness
-> architecture ADR
-> pilot success criteria
-> security, risk, compliance, procurement gates
-> controlled rollout
-> adoption dashboard
-> ongoing governance and vendor review
| Gate | Decision | Required evidence | Stop signal |
|---|---|---|---|
| Gate 0: Problem | Should we explore AI? | Opportunity canvas, baseline, owner | No measurable pain |
| Gate 1: Feasibility | Is AI plausible? | Data readiness, workflow map, risk tier | No data owner or unacceptable risk |
| Gate 2: Option | Build, buy, partner, or hybrid? | Decision matrix, TCO, lock-in analysis | Preference-only selection |
| Gate 3: Vendor / architecture | Safe enough for pilot? | Due diligence, ADR, control pack | No audit, privacy, security, eval path |
| Gate 4: Pilot | Did pilot prove value and control? | Eval report, feedback, pilot metrics | Poor quality or low trust |
| Gate 5: Production | Can we operate this? | RACI, runbook, incident response | No owner for data, eval, vendor |
| Gate 6: Scale | Should we expand? | Adoption dashboard, ROI, risk review | Usage without quality or value |
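The gate table can be read as a checklist. A gate passes only when every required evidence item is present and no stop signal has been raised. The Python sketch below assumes evidence is tracked as simple labels. `GATES`, `GateResult` and `evaluate_gate` are hypothetical names, and the evidence labels are copied from the table.

```python
from dataclasses import dataclass, field

# Required evidence per gate, copied from the stage-gate table (labels are illustrative keys).
GATES: dict[str, list[str]] = {
    "Gate 0: Problem": ["opportunity canvas", "baseline", "owner"],
    "Gate 1: Feasibility": ["data readiness", "workflow map", "risk tier"],
    "Gate 2: Option": ["decision matrix", "TCO", "lock-in analysis"],
    "Gate 3: Vendor / architecture": ["due diligence", "ADR", "control pack"],
    "Gate 4: Pilot": ["eval report", "feedback", "pilot metrics"],
    "Gate 5: Production": ["RACI", "runbook", "incident response"],
    "Gate 6: Scale": ["adoption dashboard", "ROI", "risk review"],
}


@dataclass
class GateResult:
    gate: str
    passed: bool
    missing_evidence: list[str] = field(default_factory=list)
    stop_signals: list[str] = field(default_factory=list)


def evaluate_gate(gate: str, evidence: set[str], stop_signals: list[str]) -> GateResult:
    """Pass only if every required evidence item exists and no stop signal is raised."""
    missing = [item for item in GATES[gate] if item not in evidence]
    return GateResult(
        gate=gate,
        passed=not missing and not stop_signals,
        missing_evidence=missing,
        stop_signals=list(stop_signals),
    )


result = evaluate_gate(
    "Gate 2: Option",
    evidence={"decision matrix", "TCO"},
    stop_signals=["Preference-only selection"],
)
print(result.passed, result.missing_evidence)  # False ['lock-in analysis']
```

Keeping stop signals separate from missing evidence makes the no-go reason explicit in the gate record. A complete evidence set can still fail on a stop signal.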
Decision rules:
- Do not start vendor selection before defining workflow and success metric.
- Do not compare vendors only by model benchmark.
- Do not let procurement lead alone; involve architecture, data, risk and users.
- Do not let a PoC bypass production controls just because it looks useful.
- Do not scale without adoption metrics and rollback criteria.
- Do not accept "vendor handles it" without evidence, contract terms and operational workflow.
5. AI Vendor Due Diligence
Use this table as the first-pass diligence map. Expand high-risk rows into detailed questionnaires.
| Dimension | Key questions | Evidence to request | Red flags |
|---|---|---|---|
| Model / provider | Which foundation models, embeddings and rerankers are used? Can versions be frozen, tested, upgraded and rolled back? Are deterministic calculations separated from probabilistic outputs? Are prompts, completions, embeddings or logs used for training? | Model/provider list, lifecycle policy, release notes, eval report, data usage terms, subprocessor list | Silent model updates, vague model identity, generic accuracy claims, no fallback behavior |
| Data privacy | What data enters prompts, embeddings, logs, telemetry and support tools? Is PII, PHI, PCI, account or transaction data processed? Where is data stored, processed and retained? Are embeddings treated as sensitive data? | Data flow diagram, classification, retention schedule, DPA, subprocessor list, deletion process | "We do not store data" without log details, no embedding governance, broad support access, residency only in sales material |
| Security | Are SSO (SAML/OIDC), SCIM, RBAC and tenant isolation supported? How are secrets and API keys stored? How are prompt injection, excessive agency and data exfiltration tested? | SOC 2 report or bridge letter, ISO 27001 certificate, pen test summary, security whitepaper, secure SDLC, incident policy | No SSO, broad DB access, prompt-only security control, no incident notification commitment |
| Governance | Is there AI system inventory and risk classification? Can regulated actions be blocked, routed or dual-approved? Are prompts, tools, retrieval index and models versioned? | AI governance policy, change workflow, HITL config, audit log schema, eval and red-team process | Governance PDF only, no versioning, no case-level audit reconstruction |
| Evaluation | Are evals generic or customer-specific? Can customer gold datasets and rubrics be used? Are offline eval, shadow mode and production sampling supported? Can eval failures block release? | Eval methodology, sample eval report, failure taxonomy, release gate policy, reviewer calibration | Aggregate thumbs-up only, no negative cases, no stale-source tests, no eval export |
| SLA / reliability | What uptime and latency SLO apply to the full product? Are rate limits and regional limits known? Is there fallback model, degraded mode or read-only mode? | SLA terms, status history, DR policy, rate-limit docs, support plan, incident postmortem sample | SLA excludes key dependencies, no degraded mode, support slower than business SLA |
| Cost | Is pricing per user, case, document, token, tool call, workflow or environment? Are model usage, embeddings, vector storage, observability, connectors and support included? | Pricing workbook, usage telemetry, cost allocation export, rate card, renewal terms, services estimate | Cannot model cost per case, token costs excluded, no budget caps, unclear renewal uplift |
| Lock-in | Which parts are proprietary: prompts, workflow, index, embeddings, memory, evals, connectors, policies, agents? Can data, logs, evals and config be exported? | Export docs, termination terms, IP ownership, open API docs, migration plan | No export for audit logs or evals, proprietary workflow, no exit cost definition |
| Customization | Can the product support bank policy, jurisdiction, product, language and role differences? Are policy updates approved, versioned and regression tested? | Configuration model, admin demo, policy workflow, prompt registry, regression results, reference customer | Vendor-only customization, no content owner/effective date, custom logic hidden in prompts |
| Integration | Which systems integrate: CRM, core banking, case management, LOS, payment systems, contact center, IAM, SIEM? Are integrations read-only, write, pre-fill, or automated? | API docs, reference architecture, auth pattern, sandbox, data mapping | Broad DB access, write actions without idempotency, bypassed authorization |
| Audit | Can logs show user, role, input, source, model version, prompt version, retrieval results, output, approval and final action? Can evidence be exported? | Audit log schema, sample evidence package, retention config, export API, admin audit event list | Audit trail is chat transcript only, logs omit retrieval or tools, admin changes not audited |
| Incident response | Are data exposure, prompt injection, unsafe output, outage, cost runaway and tool misuse in scope? Can customer trigger kill switch? | Incident response plan, notification commitment, kill switch docs, postmortem template, severity matrix | Security-only incident policy, no AI quality incident, no customer-side kill switch |
| Procurement | Is vendor approved for data classification and geography? Are DPA, BAA, security addendum or AI addendum needed? Are liability, audit rights, subprocessors and exit terms acceptable? | MSA, DPA, SLA, order form, support terms, security addendum, subprocessor schedule | Contract contradicts sales promises, no audit rights, weak liability cap, subprocessor changes without notice |
Minimum due diligence questions:
- Executive: What workflows have you solved in regulated industries? Which production customers are closest? Which parts are product, services and partner work? What assumptions must be true for ROI?
- Product: Which users and approvers are supported? What exception paths are supported? How are citations, uncertainty, refusal, feedback and override reason shown?
- Architecture: Show end-to-end data flow. Which components are customer-controlled? How are retrieval, grounding, tool permissions and audit implemented?
- Data: What is copied, indexed, embedded or cached? Can data be filtered by role, region, product, customer segment and effective date? How are stale sources handled?
- Risk: Which AI risks do you explicitly manage? How do you prevent over-reliance, automation bias, prohibited advice and unsafe action?
- Security: How do you protect secrets and downstream credentials? How do you test prompt injection and excessive agency? How are logs retained?
- Commercial: Give conservative, expected and upside cost scenarios. What costs are excluded? What is the cost to export and exit?
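"Cannot model cost per case" is listed as a red flag. The commercial question above asks for three scenarios. The sketch below shows one way to turn vendor pricing into cost per case, under stated assumptions: costs split into a fixed monthly part and a per-case variable part, and every number is a hypothetical input rather than a real vendor price. `CostScenario` and its fields are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class CostScenario:
    """One commercial scenario for a vendor proposal (all figures are hypothetical inputs)."""
    name: str
    cases_per_month: int
    fixed_monthly: float         # licences, platform fee, support plan
    tokens_per_case: int         # prompt + completion tokens across every model call in one case
    price_per_1k_tokens: float   # blended model price
    other_per_case: float        # embeddings, vector storage, tool calls, observability share

    def monthly_cost(self) -> float:
        per_case_variable = (
            self.tokens_per_case / 1000 * self.price_per_1k_tokens + self.other_per_case
        )
        return self.fixed_monthly + self.cases_per_month * per_case_variable

    def cost_per_case(self) -> float:
        return self.monthly_cost() / self.cases_per_month


scenarios = [
    CostScenario("conservative", 2_000, 10_000, 20_000, 0.01, 0.50),
    CostScenario("expected", 5_000, 10_000, 12_000, 0.01, 0.40),
    CostScenario("upside", 10_000, 10_000, 8_000, 0.01, 0.30),
]
for s in scenarios:
    print(f"{s.name}: {s.cost_per_case():.2f} per case")
# conservative: 5.70 per case
# expected: 2.52 per case
# upside: 1.38 per case
```

In this example most of the spread comes from fixed cost being amortized over more cases. This is why a vendor's volume assumptions should be checked against real case counts before trusting the ROI.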
6. Vendor Scorecard
Adjust weights by risk tier. For AML, lending and payments, increase governance, audit, security and eval weights.
| Dimension | Weight | Score 1 | Score 3 | Score 5 |
|---|---|---|---|---|
| Business fit | 10 | Generic demo | Some workflow fit | Strong domain workflow fit |
| Model quality | 10 | Claims only | Generic benchmark | Customer eval with failure analysis |
| Data privacy | 10 | Vague | Basic DPA | Configurable by data class |
| Security | 10 | Weak controls | Some evidence | SSO, RBAC, logs, SOC/ISO evidence |
| AI governance | 10 | Checklist only | Partial controls | Versioning, HITL, release gates |
| EvalOps | 10 | None | Basic tests | Offline, shadow and production eval |
| Integration | 10 | Manual export | Limited API | IAM, SIEM, workflow integration |
| Reliability | 7 | Best effort | Basic SLA | DR, fallback, degraded mode |
| Cost | 8 | Unclear | Pricing available | Cost per case and caps |
| Lock-in / exit | 7 | Black box | Partial export | Clear exit plan |
| Customization | 5 | Vendor-only | Some config | Governed domain config |
| Procurement risk | 3 | Sales promises | Standard terms | Contractable obligations |
Score guidance:
- 0-50: Do not pilot except sandbox learning.
- 51-70: Low-risk pilot only, strict controls.
- 71-85: Candidate for controlled pilot.
- 86-100: Production candidate, still requiring architecture and risk gates.
Interpretation:
- High product quality with weak audit may still fail AML or lending.
- Strong governance with moderate model quality may be better for regulated operations.
- Low price without exit plan can become expensive at scale.
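The scorecard does not say how 1-5 scores become a 0-100 total. The sketch below assumes a linear scaling, `100 * sum(weight * score) / (5 * sum(weights))`, which reproduces the table weights when they sum to 100 and still works after weights are adjusted by risk tier. The "critical floor" for regulated use is also an assumption. It encodes the interpretation that strong product quality cannot offset weak privacy, security or governance. All names are illustrative.

```python
# Weights copied from the scorecard table; they sum to 100.
WEIGHTS: dict[str, int] = {
    "Business fit": 10, "Model quality": 10, "Data privacy": 10, "Security": 10,
    "AI governance": 10, "EvalOps": 10, "Integration": 10, "Reliability": 7,
    "Cost": 8, "Lock-in / exit": 7, "Customization": 5, "Procurement risk": 3,
}
# Assumed floor for regulated use cases: these dimensions cannot be offset by others.
CRITICAL = ("Data privacy", "Security", "AI governance")


def weighted_score(scores: dict[str, int], weights: dict[str, int] = WEIGHTS) -> float:
    """Scale 1-5 scores to a 100-point total (range 20-100) using the given weights."""
    if set(scores) != set(weights):
        raise ValueError("score every dimension exactly once")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("scores must be between 1 and 5")
    raw = sum(weights[d] * scores[d] for d in weights)
    return 100 * raw / (5 * sum(weights.values()))


def score_band(total: float) -> str:
    if total <= 50:
        return "Do not pilot except sandbox learning"
    if total <= 70:
        return "Low-risk pilot only, strict controls"
    if total <= 85:
        return "Candidate for controlled pilot"
    return "Production candidate, still requiring architecture and risk gates"


def recommend(scores: dict[str, int], regulated: bool, floor: int = 3) -> tuple[float, str]:
    total = weighted_score(scores)
    weak = [d for d in CRITICAL if scores[d] < floor]
    if regulated and weak:
        return total, "Blocked for regulated use: weak " + ", ".join(weak)
    return total, score_band(total)


vendor = {
    "Business fit": 5, "Model quality": 5, "Data privacy": 3, "Security": 4,
    "AI governance": 2, "EvalOps": 3, "Integration": 4, "Reliability": 3,
    "Cost": 4, "Lock-in / exit": 2, "Customization": 3, "Procurement risk": 3,
}
print(recommend(vendor, regulated=False))  # (70.2, 'Candidate for controlled pilot')
print(recommend(vendor, regulated=True))   # (70.2, 'Blocked for regulated use: weak AI governance')
```

Under this scaling the lowest possible total is 20, so the "0-50" band effectively starts at 20. Band edges are treated as inclusive upper bounds so fractional totals such as 70.2 still fall into a band.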
7. Build Vs Buy Vs Partner Vs Hybrid
| Option | Definition | Best when | Main risk |
|---|---|---|---|
| Buy | Use vendor SaaS or managed platform for most capability | Speed, common workflow, small platform team | Lock-in, hidden cost, limited control |
| Build | Internal team builds core capability | Differentiating workflow, strict constraints, scale economics | Delivery risk, ops burden, security burden |
| Partner | Co-build with vendor, SI or domain specialist | Need domain expertise or delivery capacity | Dependency and knowledge transfer risk |
| Hybrid | Build control-heavy or differentiating parts, buy commodity parts | Regulated workflow, speed plus control | Boundary complexity |
Decision heuristics:
- Buy when the workflow is common, the vendor has audited controls, time-to-value is critical and differentiation comes from adoption/workflow rather than infrastructure.
- Build when the workflow is a core competitive advantage, SaaS cannot meet data/security constraints, the vendor cannot implement the required controls, scale changes the economics and the internal team can operate production AI.
- Partner when regulated domain design or delivery capacity is missing, but internal owners must learn and take over.
- Hybrid when the vendor UI/model layer is strong but governance, policy, audit, eval, data or workflow boundaries must remain under internal control.
Component matrix:
| Capability | Buy | Build | Partner | Practical recommendation |
|---|---|---|---|---|
| RAG knowledge assistant | Common KB and service workflows | Sensitive corpus, complex entitlement | Taxonomy cleanup | Buy retrieval, build metadata, eval and source governance |
| Agent workflow | Bounded internal automation | High-risk actions, proprietary orchestration | Workflow redesign | Buy runtime, build tool policy and approval layer |
| Model gateway | Cloud/platform routing and telemetry | Multi-cloud or custom policy | Setup help | Hybrid gateway with internal policy abstraction |
| Eval platform | Test management and dashboards | Domain labels and release gates | Initial eval library | Buy tooling, build gold data and acceptance policy |
| Vector DB / search | Managed service | Strict data control or cost | Indexing architecture | Managed infra with internal access and retention controls |
| Document processing | Standard forms and OCR | Proprietary docs | Annotation and tuning | Buy extraction, build validation workflow |
| Workflow automation | Existing case/workflow platform fits | Deep core ops integration | BPMN/process redesign | Use existing workflow platform, integrate AI services |
| Monitoring | Platform telemetry | Enterprise SIEM and risk metrics | Runbook setup | Export vendor telemetry to internal dashboards |
| Governance tooling | Enterprise inventory/GRC fit | Focused lightweight registry | Policy library | Buy inventory, build use-case evidence |
| Prompt/policy registry | Integrated release workflow | Strategic prompt/policy control | Standards setup | Build internal registry, sync to vendor config |
| Tool gateway | Managed connectors | High-risk permissions | Integration adapters | Build policy wrapper around vendor tools |
| Human review workbench | Vendor reviewer UX fits | Domain review UX is a differentiator | Frontline design | Buy basic review, build domain QA queue |
Default financial retail patterns:
| Use case | Starting option | Reason |
|---|---|---|
| AML investigation copilot | Hybrid | Buy extraction/RAG, build audit, HITL and SAR boundary |
| KYC document automation | Buy + partner | OCR is commodity; policy and remediation workflow need domain setup |
| Customer service copilot | Buy | Common pattern; value depends on knowledge governance and adoption |
| Payments operations agent | Hybrid | Payment actions require idempotency, approval, audit and rail controls |
| Lending policy assistant | Hybrid | Policy RAG can be bought; credit support needs governance and eval |
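The decision heuristics can be turned into a rough first-pass triage that seeds the Gate 2 decision matrix. The sketch below assumes the signals are simple yes/no answers. The field names and the order of checks are hypothetical, and it does not produce combined options such as "Buy + partner", which still need judgment.

```python
from dataclasses import dataclass


@dataclass
class OptionSignals:
    """Hypothetical yes/no answers collected for Gate 2; field names are illustrative."""
    workflow_is_common: bool
    vendor_has_audited_controls: bool
    core_competitive_advantage: bool
    saas_meets_data_constraints: bool
    internal_team_can_operate: bool
    controls_must_stay_internal: bool   # governance, policy, audit, eval, data or workflow boundary
    missing_domain_or_delivery: bool


def first_pass_option(s: OptionSignals) -> str:
    """Rough triage mirroring the decision heuristics; it seeds, not replaces, the decision matrix."""
    if s.core_competitive_advantage or not s.saas_meets_data_constraints:
        # Build is indicated; without a team that can operate production AI, partner and plan a handover.
        return "Build" if s.internal_team_can_operate else "Partner"
    if s.controls_must_stay_internal:
        return "Hybrid"
    if s.missing_domain_or_delivery:
        return "Partner"
    if s.workflow_is_common and s.vendor_has_audited_controls:
        return "Buy"
    return "Undecided: gather more evidence"


aml_copilot = OptionSignals(
    workflow_is_common=True, vendor_has_audited_controls=True,
    core_competitive_advantage=False, saas_meets_data_constraints=True,
    internal_team_can_operate=True, controls_must_stay_internal=True,
    missing_domain_or_delivery=False,
)
service_copilot = OptionSignals(
    workflow_is_common=True, vendor_has_audited_controls=True,
    core_competitive_advantage=False, saas_meets_data_constraints=True,
    internal_team_can_operate=True, controls_must_stay_internal=False,
    missing_domain_or_delivery=False,
)
print(first_pass_option(aml_copilot))      # Hybrid
print(first_pass_option(service_copilot))  # Buy
```

With these example inputs the output matches the default patterns table for the AML copilot and the customer service copilot. Other inputs should still be challenged in the decision matrix rather than accepted from the function.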
8. Required Artifacts
8.1 Vendor Scorecard
Fields: vendor name, use case, risk tier, business fit score, security score, privacy score, eval score, governance score, integration score, cost score, lock-in score, recommendation, mitigations, go/no-go decision.
8.2 Due Diligence Pack
Fields: question, owner, evidence requested, evidence received, evidence quality, open gap, blocker status, due date.
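The due diligence pack fields map naturally onto a tracker that surfaces unresolved blockers before Gate 3. The sketch below makes two assumptions. First, evidence quality uses a hypothetical four-step scale. Second, "weak" evidence counts as unresolved, which reflects the red flags above (for example, residency claimed only in sales material). The dates and questions are made-up examples.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DiligenceItem:
    question: str
    owner: str
    evidence_requested: str
    evidence_received: bool
    evidence_quality: str   # assumed scale: "none", "weak", "adequate", "strong"
    open_gap: str
    is_blocker: bool
    due_date: date


def open_blockers(items: list[DiligenceItem], today: date) -> list[str]:
    """Return blocker items whose evidence is missing or weak, flagging overdue ones."""
    report = []
    for item in items:
        unresolved = not item.evidence_received or item.evidence_quality in ("none", "weak")
        if item.is_blocker and unresolved:
            status = "OVERDUE" if item.due_date < today else "open"
            report.append(f"[{status}] {item.question} (owner: {item.owner}, gap: {item.open_gap})")
    return report


pack = [
    DiligenceItem("Are prompts or logs used for training?", "Privacy", "Data usage terms",
                  False, "none", "No written terms", True, date(2025, 1, 10)),
    DiligenceItem("Can audit logs be exported?", "Architecture", "Export API docs",
                  True, "strong", "", True, date(2025, 1, 20)),
    DiligenceItem("Is there a degraded mode?", "Engineering", "DR policy",
                  True, "weak", "Sales deck only", False, date(2025, 1, 5)),
]
for line in open_blockers(pack, today=date(2025, 1, 15)):
    print(line)
# [OVERDUE] Are prompts or logs used for training? (owner: Privacy, gap: No written terms)
```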
8.3 Architecture ADR
Use docs/abpa/templates/08-ai-architecture-adr-set.md.
Minimum ADRs:
- AI pattern selection.
- Model and provider strategy.
- RAG and knowledge architecture.
- Human-in-the-loop design.
- Eval and observability architecture.
- Integration, security and audit.
- Vendor strategy and exit trigger.
ADR must answer:
- What decision are we making?
- What options were considered?
- What evidence supports the decision?
- What risk remains?
- What controls reduce the risk?
- What would make the decision wrong?
- What is the reversal trigger?
8.4 Risk Acceptance Memo
Fields:
| Field | Required content |
|---|---|
| Use case | Business workflow and users |
| Decision | Proceed, pause, reduce scope, or reject |
| Risk owner | Person accepting residual risk |
| Business owner | Person accountable for value |
| Architecture owner | Person accountable for technical controls |
| Risk statement | What could go wrong |
| Impact | Customer, financial, operational, legal, compliance, reputational |
| Controls | Preventive, detective, corrective |
| Evidence | Eval result, security review, vendor evidence, pilot result |
| Residual risk | What remains after controls |
| Expiry | Date or trigger for re-review |
| Stop rule | Condition requiring rollback or suspension |
Rules: no unnamed owner, no vague risk, no permanent pilot acceptance, no bypass of legal/regulatory review.
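The memo rules can be checked mechanically before a memo goes to sign-off. The sketch below uses hypothetical field names that follow the table. The "vague risk" check is a deliberately crude word-count heuristic (an assumption), and the expiry is a free-text date or re-review trigger, where an empty value means permanent acceptance.

```python
from dataclasses import dataclass

VALID_DECISIONS = {"proceed", "pause", "reduce scope", "reject"}


@dataclass
class RiskAcceptanceMemo:
    use_case: str
    decision: str
    risk_owner: str
    business_owner: str
    architecture_owner: str
    risk_statement: str
    controls: list[str]
    expiry: str               # re-review date or trigger; empty means permanent acceptance
    stop_rule: str
    legal_review_done: bool


def memo_violations(m: RiskAcceptanceMemo) -> list[str]:
    """Check a memo against the rules; an empty list means it can go to sign-off."""
    problems = []
    for role in ("risk_owner", "business_owner", "architecture_owner"):
        if not getattr(m, role).strip():
            problems.append(f"unnamed {role}")
    if m.decision not in VALID_DECISIONS:
        problems.append(f"unknown decision: {m.decision}")
    # Crude vagueness heuristic (assumption): a real risk statement names a cause and an impact.
    if len(m.risk_statement.split()) < 8:
        problems.append("risk statement too vague")
    if not m.controls:
        problems.append("no controls listed")
    if not m.expiry.strip():
        problems.append("no expiry: permanent acceptance is not allowed")
    if not m.stop_rule.strip():
        problems.append("missing stop rule")
    if not m.legal_review_done:
        problems.append("legal/regulatory review not completed")
    return problems


memo = RiskAcceptanceMemo(
    use_case="Service copilot pilot", decision="proceed", risk_owner="",
    business_owner="Head of Contact Center", architecture_owner="Lead Solutions Architect",
    risk_statement="AI may be wrong",
    controls=["citation required", "agent confirms before send"],
    expiry="", stop_rule="Unsupported claim rate above pilot threshold",
    legal_review_done=False,
)
print(memo_violations(memo))
```

A check like this only catches missing structure. Whether the controls are adequate remains a human judgment by the named risk owner.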
8.5 Pilot Success Criteria
Define workflow included/excluded, user group, case type, data sources, AI actions allowed/prohibited, human approval points, quality thresholds, risk thresholds, cost thresholds, adoption thresholds, stop rules, decision date.
| Category | Example metrics |
|---|---|
| Quality | Citation precision, unsupported claim rate, policy violation rate |
| Workflow | Cycle time, touch time, rework, backlog |
| Trust | Acceptance rate, edit rate, override reason, confidence score |
| Risk | Escalation rate, incident count, audit defect |
| Cost | Cost per assisted case, model spend, support cost |
| Adoption | Activated users, repeat users, eligible cases touched |
Stop rules: unsupported claim rate exceeds threshold, sensitive data leaks, users copy output without reviewing evidence, SLA worsens, a vendor outage blocks the workflow, cost per case exceeds threshold, or risk/legal/security raises a blocker.
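The stop rules above can be evaluated every reporting period so a breach is visible immediately rather than at the pilot decision date. The sketch below is a minimal version under assumed metric definitions. The threshold constants are hypothetical placeholders for the values in the pilot success criteria, and "SLA worsens" is read as "breach rate above the pre-pilot baseline".

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values come from the pilot success criteria.
MAX_UNSUPPORTED_CLAIM_RATE = 0.02
MAX_COPY_WITHOUT_REVIEW_RATE = 0.10
MAX_COST_PER_CASE = 4.00


@dataclass
class PilotWeek:
    unsupported_claim_rate: float      # share of sampled outputs with unsupported claims
    sensitive_data_leaks: int
    copy_without_review_rate: float    # share of accepted outputs pasted without opening evidence
    sla_breach_rate: float
    baseline_sla_breach_rate: float    # pre-pilot baseline for the same case types
    blocking_vendor_outages: int
    cost_per_case: float
    open_blockers: int                 # raised by risk, legal or security


def stop_rule_hits(w: PilotWeek) -> list[str]:
    """Return every triggered stop rule; any hit means pause or roll back the pilot."""
    hits = []
    if w.unsupported_claim_rate > MAX_UNSUPPORTED_CLAIM_RATE:
        hits.append("unsupported claims above threshold")
    if w.sensitive_data_leaks > 0:
        hits.append("sensitive data leak")
    if w.copy_without_review_rate > MAX_COPY_WITHOUT_REVIEW_RATE:
        hits.append("users copy output without evidence review")
    if w.sla_breach_rate > w.baseline_sla_breach_rate:
        hits.append("SLA worse than baseline")
    if w.blocking_vendor_outages > 0:
        hits.append("vendor outage blocked workflow")
    if w.cost_per_case > MAX_COST_PER_CASE:
        hits.append("cost per case above threshold")
    if w.open_blockers > 0:
        hits.append("risk/legal/security blocker open")
    return hits


week = PilotWeek(0.015, 0, 0.12, 0.05, 0.04, 0, 3.20, 0)
print(stop_rule_hits(week))
# ['users copy output without evidence review', 'SLA worse than baseline']
```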
8.6 Rollout Plan
| Phase | Scope | Goal |
|---|---|---|
| Phase 0 | Sandbox with redacted data | Learn product and validate assumptions |
| Phase 1 | Shadow mode | Compare AI output with human workflow |
| Phase 2 | Assisted pilot | Users see AI output, no external action |
| Phase 3 | Controlled production | Limited team and case type, full monitoring |
| Phase 4 | Scale | More teams and cases with standardized controls |
Rollout fields: phase, users, case types, features enabled, training, controls, support model, monitoring cadence, success metric, expansion criteria, rollback criteria.
8.7 Adoption Dashboard
Use docs/abpa/templates/10-adoption-dashboard.md.
Required views: funnel by team, usage by workflow step, acceptance/edit/rejection trend, override reasons, quality defects, escalations, time saved, cost per case, incident/rollback events, training completion.
9. Enterprise Adoption And Change Management
Adoption is not login count. Adoption means users safely change how work is done, managers reinforce the workflow, quality improves, and controls remain observable.
9.1 Stakeholder Segmentation
| Segment | Example | Need | Concern | Engagement |
|---|---|---|---|---|
| Executive sponsor | COO, CDO, business head | Value, risk, funding | Pilot theater | Monthly decision memo |
| Process owner | AML ops lead, contact center lead | Workflow performance | SLA disruption | BPMN and metric review |
| Frontline users | Analysts, agents, underwriters | Useful output | Trust, job impact | Shadowing, training, feedback |
| Managers | Team leads, QA leads | Coaching and monitoring | Hard to assess usage quality | Adoption dashboard |
| Risk / compliance | Compliance, MRM, legal | Controls and evidence | Unsafe automation | Control pack and release gate |
| Security / privacy | CISO, DPO, IAM | Data and access control | Leakage, vendor risk | Security review |
| Data owners | CRM, policy KB, transaction data | Stewardship | Bad data blamed on source team | Data readiness |
| IT / engineering | Platform, integration, support | Operability | Fragile integration | Runbook |
| Procurement | Sourcing, vendor management | Contract and supplier risk | Hidden obligations | Due diligence pack |
9.2 Champion Network
Champion design: pick respected frontline users, include skeptics, give their feedback real influence, train them on tool boundaries, collect failure examples, co-create SOPs and run office hours.
Champion responsibilities: validate workflow fit, test hard cases, explain tool boundaries, report trust/adoption blockers, review training material, tune the feedback taxonomy.
9.3 Training Model
| Audience | Training focus |
|---|---|
| Frontline users | When to use, verify evidence, reject, escalate |
| Reviewers / approvers | Evaluate output, override, record decision, sample defects |
| Managers | Read adoption dashboard and coach behavior |
| Product / BA | Turn feedback into backlog and eval cases |
| Risk / compliance | Controls, audit logs, HITL, incident process |
| Support / IT | Triage outage, access, latency, integration failures |
Training artifacts: role SOP, allowed/prohibited use cases, evidence checklist, escalation tree, failure examples, feedback taxonomy, data handling reminder, office hours.
9.4 Trust-Building
Trust mechanisms: show source citations and effective dates, prefer "not enough evidence" over overconfident answers, make uncertainty visible, publish quality trend and limitations, separate suggested text from approved final action, require evidence confirmation for sensitive cases.
Bad patterns: hidden evidence, speed-only incentives, manager pressure to accept output, usage-only success, failure reports without response.
9.5 Human-In-The-Loop Process Change
Define for each AI output: human role, decision authority, evidence required, override reason taxonomy, escalation owner, SLA impact, audit field, stop condition.
| Level | Meaning | Financial retail example |
|---|---|---|
| L0 | AI disabled | Legal hold or disputed incident |
| L1 | Evidence retrieval only | AML evidence packet |
| L2 | Draft, human edits | Service response draft |
| L3 | Recommend, human approves | Fraud next action |
| L4 | Pre-fill action, human confirms | Payment retry instruction |
| L5 | Execute bounded low-risk action | Low-risk knowledge routing |
Default: regulated, financial, adverse or customer-impacting actions start at L1-L3. L4 requires audit, idempotency, rollback and approval. L5 should be rare in financial retail unless low-risk and reversible.
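The default rules can be applied as a cap on whatever autonomy level a team requests. The sketch below makes two assumptions: the `ActionProfile` flags are hypothetical yes/no answers from the control pack, and the function returns only the starting level. Promoting a regulated action to L4 later is a separate, evidence-backed decision.

```python
from dataclasses import dataclass
from enum import IntEnum


class HitlLevel(IntEnum):
    L0_DISABLED = 0
    L1_EVIDENCE_ONLY = 1
    L2_DRAFT = 2
    L3_RECOMMEND = 3
    L4_PREFILL = 4
    L5_EXECUTE = 5


@dataclass
class ActionProfile:
    regulated_or_customer_impacting: bool   # regulated, financial, adverse or customer-impacting
    low_risk: bool
    reversible: bool
    has_audit: bool
    has_idempotency: bool
    has_rollback: bool
    has_approval: bool


def starting_level(p: ActionProfile, requested: HitlLevel) -> HitlLevel:
    """Cap a requested autonomy level using the default rules."""
    level = requested
    if p.regulated_or_customer_impacting:
        level = min(level, HitlLevel.L3_RECOMMEND)   # such actions start at L1-L3
    if level == HitlLevel.L5_EXECUTE and not (p.low_risk and p.reversible):
        level = HitlLevel.L4_PREFILL
    l4_controls = (p.has_audit, p.has_idempotency, p.has_rollback, p.has_approval)
    if level == HitlLevel.L4_PREFILL and not all(l4_controls):
        level = HitlLevel.L3_RECOMMEND
    return level


payment_retry = ActionProfile(True, False, True, True, True, True, True)
knowledge_routing = ActionProfile(False, True, True, True, False, True, False)
print(starting_level(payment_retry, HitlLevel.L4_PREFILL).name)      # L3_RECOMMEND
print(starting_level(knowledge_routing, HitlLevel.L5_EXECUTE).name)  # L5_EXECUTE
```

Even with all L4 controls in place, the payment retry starts at L3. This matches "start at L1-L3" and leaves the L4 example in the table as a later promotion.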
9.6 Adoption Metrics
Minimum metrics: eligible users, activated users, repeat users, habit users, eligible cases touched, suggestions generated, accepted, edited, rejected, override reasons, user confidence, unsupported claim rate, escalation rate, cycle time, rework rate, cost per assisted case.
Interpretation:
- High usage + high override = tool is in workflow but not trusted.
- Low usage + high quality = workflow trigger, training or manager incentive problem.
- High acceptance + high defect = automation bias risk.
- High edit rate = output useful but format or policy fit weak.
- High escalation = scope too broad or data insufficient.
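The interpretation rules can be run against each team's dashboard snapshot to flag where to investigate. The sketch below treats "high" and "low" as fixed cut-offs. Those cut-offs are hypothetical and should be calibrated against each workflow's baseline. All rates are shares between 0 and 1, and the names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for "high" and "low"; calibrate per workflow and baseline.
HIGH_USAGE, LOW_USAGE = 0.60, 0.30
HIGH_OVERRIDE = 0.30
HIGH_QUALITY = 0.90
HIGH_ACCEPTANCE = 0.70
HIGH_DEFECT = 0.05
HIGH_EDIT = 0.50
HIGH_ESCALATION = 0.20


@dataclass
class AdoptionSnapshot:
    usage_rate: float        # eligible cases where the AI output was used
    override_rate: float     # suggestions rejected or overridden
    acceptance_rate: float   # suggestions accepted as-is
    edit_rate: float         # suggestions accepted after edits
    defect_rate: float       # QA defects found in accepted outputs
    escalation_rate: float
    quality_score: float     # eval pass rate on sampled outputs


def diagnose(s: AdoptionSnapshot) -> list[str]:
    """Translate metric combinations into the listed interpretations."""
    findings = []
    if s.usage_rate >= HIGH_USAGE and s.override_rate >= HIGH_OVERRIDE:
        findings.append("in workflow but not trusted")
    if s.usage_rate < LOW_USAGE and s.quality_score >= HIGH_QUALITY:
        findings.append("workflow trigger, training or manager incentive problem")
    if s.acceptance_rate >= HIGH_ACCEPTANCE and s.defect_rate >= HIGH_DEFECT:
        findings.append("automation bias risk")
    if s.edit_rate >= HIGH_EDIT:
        findings.append("useful output, weak format or policy fit")
    if s.escalation_rate >= HIGH_ESCALATION:
        findings.append("scope too broad or data insufficient")
    return findings


team_a = AdoptionSnapshot(0.70, 0.35, 0.40, 0.25, 0.02, 0.05, 0.92)
team_b = AdoptionSnapshot(0.20, 0.05, 0.80, 0.10, 0.08, 0.05, 0.95)
print(diagnose(team_a))  # ['in workflow but not trusted']
print(diagnose(team_b))
# ['workflow trigger, training or manager incentive problem', 'automation bias risk']
```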
9.7 Rollback And Escalation
| Level | Action | Trigger |
|---|---|---|
| 1 | Disable one feature or prompt | Quality regression in one output |
| 2 | Disable one tool/action | Tool misuse or integration issue |
| 3 | Switch fallback model/provider | Model outage or regression |
| 4 | Read-only mode | Workflow action risk |
| 5 | Full suspension | Data exposure, customer harm, regulatory issue |
Escalation paths:
- Frontline issue -> support -> product owner.
- Quality issue -> EvalOps -> release gate.
- Security issue -> security incident.
- Privacy issue -> privacy/legal.
- Compliance issue -> risk/compliance.
- Vendor outage -> vendor management/engineering.
- Customer harm -> executive sponsor and incident commander.
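The rollback table and escalation paths can be combined into a single response rule. When several incidents are open at once, apply the highest rollback level and notify every relevant escalation path once. The sketch below uses hypothetical incident keys. Two mappings are assumptions not stated in the table: tool misuse is routed to the security incident path, and integration issues are routed to engineering.

```python
ROLLBACK_ACTIONS = {
    1: "Disable one feature or prompt",
    2: "Disable one tool/action",
    3: "Switch fallback model/provider",
    4: "Read-only mode",
    5: "Full suspension",
}

# Hypothetical incident keys mapped to (rollback level, escalation path).
INCIDENT_POLICY = {
    "quality_regression": (1, "EvalOps -> release gate"),
    "tool_misuse": (2, "Security incident"),
    "integration_issue": (2, "Vendor management / engineering"),
    "model_outage": (3, "Vendor management / engineering"),
    "workflow_action_risk": (4, "Risk / compliance"),
    "data_exposure": (5, "Security incident, privacy/legal"),
    "customer_harm": (5, "Executive sponsor and incident commander"),
    "regulatory_issue": (5, "Risk / compliance"),
}


def respond(incidents: list[str]) -> tuple[int, str, list[str]]:
    """Use the highest rollback level among open incidents; list each escalation path once."""
    if not incidents:
        return 0, "No action", []
    level = max(INCIDENT_POLICY[i][0] for i in incidents)
    paths = list(dict.fromkeys(INCIDENT_POLICY[i][1] for i in incidents))
    return level, ROLLBACK_ACTIONS[level], paths


print(respond(["quality_regression", "model_outage"]))
# (3, 'Switch fallback model/provider', ['EvalOps -> release gate', 'Vendor management / engineering'])
```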
10. Financial Retail Examples
| Scenario | Business problem | AI pattern | Diligence focus | Build/buy recommendation | Pilot success criteria | Interview point |
|---|---|---|---|---|---|---|
| AML copilot vendor | Investigators spend too much time gathering evidence and drafting narratives. False positives and backlog reduce time for judgment | Case investigation copilot. RAG over KYC, transactions, case notes, typologies and SOP. Human owns suspicious activity decision | Evidence retrieval, citation quality, prompt injection from adverse media, no autonomous SAR/STR decision, immutable audit log, RBAC, supervisor review | Hybrid. Buy extraction/RAG, build SAR boundary, case approval, audit and policy enforcement | Draft time reduced, QA defect not worse, citation precision above threshold, no unsupported red-flag claims, supervisor acceptance above threshold | AML AI is investigation evidence support, not a decision engine |
| KYC document automation | Remediation and onboarding are slowed by document intake, OCR, field extraction and policy validation | Document AI + workflow automation + policy checklist. Human approval for high-risk customers, UBO changes, source-of-funds exceptions | OCR accuracy by document type, PII retention, data residency, document fraud controls, policy version, reviewer override, lineage to customer master | Buy + partner. Buy standard extraction, partner/build domain validation and policy mapping | Extraction threshold met, false accept below threshold, cycle time improves, override reasons captured, source lineage recorded | KYC value is controlled loop from data gap to outreach to verified source-of-truth update |
| Customer service copilot | Agents search multiple systems, answer quality varies, after-call work is high | RAG assistant + response draft + call/chat summary. Agent confirms and sends | Knowledge governance, effective date, jurisdiction, authentication boundary, advice guardrails, contact center integration, QA feedback | Buy. Focus internal effort on knowledge governance, eval and training | AHT or after-call work improves, QA defect stable/improved, agents use citations, knowledge gaps fixed, escalation improves | Service copilot succeeds when knowledge governance and behavior change are designed together |
| Payments operations agent | Exceptions, returns, retries, reconciliation breaks and customer inquiries create backlog | Bounded operations agent. Evidence gathering, classification, next action recommendation, action pre-fill, human approval | Tool permission, idempotency, rail rules, ledger integrity, segregation of duties, tool-call audit, kill switch | Hybrid. Buy workflow assistant/model gateway, build payment action policy, approval, idempotency, audit | Break resolution time improves, no unauthorized action, reviewer agreement high, SLA breaches decrease, cost per resolved exception improves | Payments AI must separate understanding from execution |
| Lending policy assistant | Underwriters need consistent policy interpretation, document summaries and exception analysis | Policy RAG + document summarization + decision support. Credit decision remains human-owned | High-risk AI lens, policy citation, deterministic calculations separate from LLM, fair lending, reason-code support, decision audit | Hybrid. Buy policy retrieval/document summary, build governance, reason-code controls, decision record and fairness monitoring | Citation correctness above threshold, memo time improves, no prohibited language, overrides captured, segment quality acceptable | Do not let an LLM own the credit decision |
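The payments row says understanding must be separated from execution. The sketch below shows one way to draw that boundary, assuming a hypothetical gateway that sits between the model and the payment system. The model can only propose an action. Execution requires three things: an allow-listed action, an approver other than the proposer (segregation of duties), and an idempotency key so a repeated call cannot execute twice. Every attempt is written to an audit log. `PaymentActionGateway`, `ProposedAction` and the action names are illustrative, not a real payment API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ProposedAction:
    case_id: str
    action: str          # e.g. "retry_payment" (hypothetical action name)
    amount_cents: int
    proposed_by: str     # "ai" when the model pre-filled the action


@dataclass
class PaymentActionGateway:
    """Hypothetical policy wrapper: the model proposes, a human approves, execution is idempotent."""
    allowed_actions: set[str]
    executed: dict[str, str] = field(default_factory=dict)   # idempotency key -> approver
    audit_log: list[str] = field(default_factory=list)

    @staticmethod
    def idempotency_key(a: ProposedAction) -> str:
        # Assumption: one execution per case/action/amount; real rails usually take a request ID.
        raw = f"{a.case_id}|{a.action}|{a.amount_cents}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def execute(self, a: ProposedAction, approver: Optional[str]) -> str:
        key = self.idempotency_key(a)
        if a.action not in self.allowed_actions:
            outcome = "rejected: action not allowed"
        elif approver is None or approver == a.proposed_by:
            outcome = "pending: independent human approval required"  # segregation of duties
        elif key in self.executed:
            outcome = "duplicate: already executed"
        else:
            # A real gateway would call the payment system here; this sketch only records it.
            self.executed[key] = approver
            outcome = "executed"
        self.audit_log.append(
            f"{a.case_id} {a.action} by={a.proposed_by} approver={approver} -> {outcome}"
        )
        return outcome


gw = PaymentActionGateway(allowed_actions={"retry_payment"})
retry = ProposedAction("CASE-1001", "retry_payment", 12_500, proposed_by="ai")
print(gw.execute(retry, approver=None))         # pending: independent human approval required
print(gw.execute(retry, approver="analyst_7"))  # executed
print(gw.execute(retry, approver="analyst_7"))  # duplicate: already executed
print(gw.execute(ProposedAction("CASE-1001", "reverse_payment", 12_500, "ai"), "analyst_7"))
# rejected: action not allowed
```

Deriving the key from case, action and amount is a simplification. It would block a second, legitimate retry of the same amount, so a production design would more likely use a request ID issued by the workflow system.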
11. 30 / 60 / 90-Day Execution Roadmap
First 30 Days: Frame And Shortlist
Objectives: define business problem and baseline, identify stakeholders, map workflow, define AI fit/no-AI option, shortlist build/buy/partner/hybrid options.
Weeks:
- Week 1: Choose scenario, write opportunity canvas, identify sponsor, process owner, data owner, risk owner, tech owner.
- Week 2: Build stakeholder map, interview frontline users, draft AS-IS BPMN with exception flow.
- Week 3: Build data readiness pack, classify data sensitivity, identify eval samples and AI patterns.
- Week 4: Draft option matrix, create vendor shortlist, send RFI, draft executive memo v0.1.
Outputs: opportunity canvas, stakeholder map, BPMN pain metrics, data readiness pack, option matrix, vendor shortlist.
First 60 Days: Evaluate And Decide
Objectives: run vendor demos with scripted eval cases, estimate TCO, create architecture ADR, define controls and pilot scope, make go/no-go recommendation.
Weeks:
- Week 5: Convert requirements to eval matrix, run same demo cases across vendors.
- Week 6: Run security, privacy, architecture and procurement review, build cost scenarios.
- Week 7: Write ADRs, draft AI control pack, define pilot metrics and stop rules.
- Week 8: Finalize scorecard, compare options, present executive decision memo.
Outputs: requirements-to-eval matrix, vendor scorecard, TCO model, architecture ADR set, AI control pack, pilot success criteria, executive decision memo.
First 90 Days: Pilot And Adoption Setup
Objectives: execute controlled pilot, validate quality, risk, value and adoption, prepare operating model, decide scale, narrow, switch or stop.
Weeks:
- Week 9: Set up pilot, configure data access, roles, logs, train pilot users and champions.
- Week 10: Move from shadow to assisted pilot if thresholds pass, collect feedback and override reasons.
- Week 11: Build adoption dashboard, review metrics, run incident and rollback drill.
- Week 12: Write pilot report, update business case, finalize RACI, decide next step.
Outputs: pilot report, adoption dashboard, updated business case, operating model RACI, runbook, rollout plan, production readiness recommendation.
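The shadow-to-assisted move in Week 10 and the Week 12 scale, narrow, switch or stop decision can be written down as an explicit gate, so the call rests on agreed thresholds rather than demo impressions. A minimal Python sketch, assuming four pilot metrics and hypothetical threshold values; real thresholds come from the pilot success criteria and stop rules defined in Week 7, and "any critical error stops the pilot" is an assumed example of a stop rule.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values come from the pilot success criteria and stop rules.
MIN_CITATION_CORRECTNESS = 0.95
MAX_OVERRIDE_RATE = 0.30
MIN_WEEKLY_ACTIVE_SHARE = 0.60


@dataclass
class PilotMetrics:
    citation_correctness: float  # share of sampled outputs with correct citations
    critical_errors: int         # e.g. prohibited language or data leakage found in review
    override_rate: float         # share of AI outputs reviewers rejected or changed
    weekly_active_share: float   # share of pilot users using the tool inside the workflow


def next_step(m: PilotMetrics, stage: str) -> str:
    """Return the gate outcome for a pilot in stage "shadow" or "assisted"."""
    if m.critical_errors > 0:
        return "stop"  # assumed stop rule: any critical error halts the pilot
    quality_ok = m.citation_correctness >= MIN_CITATION_CORRECTNESS
    trust_ok = m.override_rate <= MAX_OVERRIDE_RATE
    if stage == "shadow":
        return "assisted" if quality_ok and trust_ok else "stay_shadow"
    if stage != "assisted":
        raise ValueError(f"unknown stage: {stage}")
    adoption_ok = m.weekly_active_share >= MIN_WEEKLY_ACTIVE_SHARE
    if quality_ok and trust_ok and adoption_ok:
        return "scale"
    if quality_ok:
        return "narrow"  # quality holds but trust or adoption lags: shrink scope
    return "switch_or_stop"  # quality below the bar: revisit the option or stop
```

The "narrow" branch is one possible reading of the roadmap: when quality holds but reviewers or users do not yet rely on the tool, scope shrinks to the steps where they do.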
12. Interview Talking Points
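30-second answer: "When I run an AI vendor or build-vs-buy decision, I don't start by asking which model is strongest. I first define the business process, data sensitivity, risk tier, eval criteria and adoption path. Then I use a scorecard to compare buy, build, partner and hybrid, looking especially at privacy, security, audit, eval, SLA, cost, lock-in and HITL. Finally, I use pilot success criteria and an adoption dashboard to prove whether the tool actually changes how people work."
2-minute structure:
- Business problem: scenario, baseline, users, pain metric.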
- Workflow and risk: AS-IS / TO-BE, human approval, regulated boundary.
- Options: build, buy, partner, hybrid.
- Vendor diligence: model, data, security, governance, eval, SLA, cost, lock-in, integration.
- Architecture: RAG, agent, model gateway, tool permission, audit.
- Pilot: eval, stop rule, adoption metric, rollback.
- Result: use the dashboard and business case to decide scale or stop.
Role angles:
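- Senior architect: "I keep the parts with the highest lock-in and risk under internal control, such as identity, policy, audit, eval, data governance and approval workflow. For mature capabilities such as OCR, generic RAG, a managed model gateway or observability, we can buy or partner to move faster."
- AI PM: "Whether a vendor is good is not about how good the demo looks. It is about whether users reuse the tool in the real workflow, whether it reduces rework and improves quality, and whether failures can be explained and rolled back."
- AI BA: "I start by mapping stakeholder objections and workflow exceptions, because many AI procurements fail not because the model fails, but because no one handled real approvals, exceptions, data ownership and audit evidence."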
13. Common Failure Modes
Vendor selection failures:
- Model benchmarks substitute for business eval.
- Sales demos substitute for a workflow pilot.
- A security questionnaire substitutes for architecture review.
- Contract language does not match the actual data flow.
- The scorecard omits lock-in and exit.
- The pilot uses toy data, so it cannot prove real-world impact.
Build-vs-buy failures:
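- The team chooses build because "we can build it".
- The team chooses buy because "the vendor has the feature".
- Internal operating cost is never calculated.
- Commodity capabilities are not distinguished from differentiating capabilities.
- Model updates, prompt changes, eval maintenance and support are ignored.
- Hybrid boundaries are unclear, so both sides end up having to be maintained.
Adoption failures: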
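- Training happens only once.
- Only logins are tracked.
- Users don't know when they must not use the tool.
- Manager KPIs conflict with the new workflow.
- The champion network includes only enthusiastic users.
- Feedback loops are never closed.
- Quality problems are reframed as user resistance.
- There is no rollback, so frontline staff work around the tool.
Governance failures: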
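- The AI control pack has no owner.
- HITL has no real approval state in the workflow.
- The audit trail cannot reconstruct a case.
- Prompt changes bypass the release gate.
- Vendor model updates are not regression-tested.
- Incident response does not cover AI quality incidents.
- Risk acceptances have no expiry date.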
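Two of the governance failures, prompt changes bypassing the release gate and vendor model updates without regression tests, share one remedy: every prompt or model version change re-runs the same eval set and is compared case by case against the current baseline. A minimal Python sketch; the EvalRun structure, the 0.9 pass-rate floor and the named-approver rule are assumptions for illustration, not a specific EvalOps tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EvalRun:
    version: str               # hypothetical label, e.g. "prompt-v2" or "vendor-model-2"
    results: Dict[str, bool]   # eval case id -> passed?


def release_gate(baseline: EvalRun, candidate: EvalRun, approver: str,
                 min_pass_rate: float = 0.9) -> Tuple[bool, List[str]]:
    """Same gate for prompt changes and vendor model updates; returns (approved, reasons)."""
    reasons: List[str] = []
    not_rerun = sorted(set(baseline.results) - set(candidate.results))
    if not_rerun:
        reasons.append(f"baseline cases not re-run: {not_rerun}")
    # A regression is a case that passed on the baseline and fails on the candidate.
    regressions = sorted(case for case, passed in baseline.results.items()
                         if passed and candidate.results.get(case) is False)
    if regressions:
        reasons.append(f"regressions: {regressions}")
    pass_rate = sum(candidate.results.values()) / max(len(candidate.results), 1)
    if pass_rate < min_pass_rate:
        reasons.append(f"pass rate {pass_rate:.2f} below {min_pass_rate}")
    if not approver:
        reasons.append("no named approver")
    return (not reasons, reasons)
```

Checking per-case regressions, not only the aggregate pass rate, catches a vendor update that fixes some cases while silently breaking others.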
14. Self-Review Checklist
Before recommending a vendor or build path, answer:
- Do we have a measurable business baseline?
- Do we have a no-AI or rules-only alternative?
- Do we know which data enters prompts, embeddings and logs?
- Do we know whether outputs affect regulated or customer-impacting decisions?
- Do we have a risk tier and control pack?
- Do we have a customer-specific eval set?
- Do we have vendor evidence for security, privacy, audit and incident response?
- Do we have a cost-per-case model?
- Do we have an exit plan?
- Do we have an ADR for architecture decisions?
- Do we have HITL workflow states, not just policy wording?
- Do we have pilot success criteria and stop rules?
- Do we have adoption metrics beyond usage?
- Do we have rollback and escalation paths?
- Do we have named owners for product, data, model, eval, vendor, incident and adoption?
If any answer is no: record the gap, assign an owner, decide whether it blocks pilot, production or scale, and add it to the executive decision memo.
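The "if any answer is no" rule can be kept as a small gap register that feeds the executive decision memo. A minimal Python sketch; the Gap fields and the assumption that a gap blocking an earlier stage also blocks every later stage (pilot, then production, then scale) are illustrative choices, not part of the ABPA templates.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed ordering: a gap that blocks an earlier stage also blocks every later one.
STAGES = ("pilot", "production", "scale")


@dataclass
class Gap:
    question: str          # the checklist question answered "no"
    owner: Optional[str]   # named owner; None until assigned
    blocks: str            # earliest stage the gap blocks: "pilot", "production" or "scale"

    def __post_init__(self) -> None:
        if self.blocks not in STAGES:
            raise ValueError(f"blocks must be one of {STAGES}")


def blocking_gaps(gaps: List[Gap], stage: str) -> List[Gap]:
    """Gaps that must be closed before entering `stage`."""
    target = STAGES.index(stage)
    return [g for g in gaps if STAGES.index(g.blocks) <= target]


def memo_lines(gaps: List[Gap], stage: str) -> List[str]:
    """Lines for the executive decision memo; unowned gaps are flagged explicitly."""
    return [f"- [{g.blocks}] {g.question} (owner: {g.owner or 'UNASSIGNED'})"
            for g in blocking_gaps(gaps, stage)]
```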
15. Portfolio Use
To turn this playbook into evidence:
- Pick one scenario from section 10.
- Fill 01-ai-opportunity-canvas.md.
- Fill 03-bpmn-pain-metrics.md.
- Fill 04-requirements-to-eval-matrix.md with at least 10 eval cases.
- Fill 05-ai-control-pack.md.
- Fill 08-ai-architecture-adr-set.md.
- Fill 10-adoption-dashboard.md.
- Fill 11-business-case.md.
- Add all artifacts to 12-portfolio-evidence-map.md.
Best flagship combinations:
- AML copilot: governance, audit, human oversight.
- KYC document automation: data readiness and document workflow.
- Customer service copilot: AI PM adoption and knowledge governance.
- Payments operations agent: architecture and tool-action controls.
- Lending policy assistant: high-risk decision support.
16. Quick Review Prompts
Use these prompts before presenting a recommendation:
- What business metric changes if this AI solution works?
- What work should remain human-owned?
- What data should never enter prompts or logs?
- What is the first workflow step safe enough for a pilot?
- What vendor promise must be contractually binding?
- What evidence would make us stop the pilot?
- What part should be built internally to preserve control or the exit option?
- What cost driver could surprise us at scale?
- What adoption signal proves behavior change rather than curiosity?
- What audit question must we be able to answer one year later?
- Who owns model changes, prompt changes, eval refresh and incident response?
- Which ABPA artifact proves each decision?
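For the "cost driver that could surprise us at scale" prompt, a cost-per-case model helps separate costs that amortize with volume from costs that do not. A minimal Python sketch; the cost components and every number in the example are hypothetical, and a real TCO model would add items such as vendor licenses, integration, eval maintenance and support.

```python
from typing import Dict


def cost_per_case(cases_per_month: int, llm_calls_per_case: float,
                  tokens_in: int, tokens_out: int,
                  price_in_per_1k: float, price_out_per_1k: float,
                  review_minutes: float, reviewer_cost_per_hour: float,
                  fixed_monthly_cost: float) -> Dict[str, float]:
    """Split cost per case into components; all inputs are hypothetical planning figures."""
    if cases_per_month <= 0:
        raise ValueError("cases_per_month must be positive")
    # Variable: grows with every case (retries and agent steps raise llm_calls_per_case).
    llm = llm_calls_per_case * (tokens_in / 1000 * price_in_per_1k
                                + tokens_out / 1000 * price_out_per_1k)
    # Variable: human review time in the HITL step.
    review = review_minutes / 60 * reviewer_cost_per_hour
    # Fixed: platform and support costs spread over monthly volume.
    fixed = fixed_monthly_cost / cases_per_month
    return {"llm": llm, "human_review": review, "fixed": fixed, "total": llm + review + fixed}
```

With hypothetical inputs of four LLM calls per case (2,000 input and 500 output tokens at 0.01 and 0.03 per 1k tokens), three review minutes at 60 per hour and 20,000 fixed monthly cost, the model gives about 0.14 LLM, 3.00 review and 2.00 fixed per case at 10,000 cases a month. At 100,000 cases the fixed share drops to 0.20 while review stays at 3.00, so in this example review time, not token price, is the cost that does not shrink with scale.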
17. Final Operating Principle
Enterprise AI adoption is a chain of evidence:
Business pain
-> workflow evidence
-> data readiness
-> AI fit
-> build/buy decision
-> vendor evidence
-> architecture controls
-> eval gates
-> pilot proof
-> adoption metrics
-> operating model
-> ongoing governance
If one link is missing, the AI project may still demo well, but it is not ready for enterprise scale. The strongest AI BA / PM / architect signal is the ability to explain every link in that chain, with artifacts and trade-offs, in a way that business, risk, technology and frontline teams can all act on.
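The chain of evidence can be checked mechanically: list the links in order and report every link that has no artifact behind it. A minimal Python sketch; the link names are copied from the chain, while mapping each link to an artifact (for example, "business pain" to a filled 01-ai-opportunity-canvas.md and "eval gates" to 04-requirements-to-eval-matrix.md) is an assumed convention rather than a rule from the ABPA templates.

```python
from typing import Dict, List

# Link names copied from the chain of evidence, in order.
EVIDENCE_CHAIN = (
    "business pain", "workflow evidence", "data readiness", "AI fit",
    "build/buy decision", "vendor evidence", "architecture controls", "eval gates",
    "pilot proof", "adoption metrics", "operating model", "ongoing governance",
)


def missing_links(artifacts: Dict[str, str]) -> List[str]:
    """Links with no artifact behind them, in chain order."""
    return [link for link in EVIDENCE_CHAIN if not artifacts.get(link)]


def ready_for_enterprise_scale(artifacts: Dict[str, str]) -> bool:
    """A project can demo well with gaps, but is only scale-ready with every link evidenced."""
    return not missing_links(artifacts)
```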