AI Procurement Intake: Vendor Evaluation Sandbox and Build-Buy Architecture
AI Procurement Intake / Vendor Evaluation Sandbox / Build-Buy Decision Architecture: An Explainer
Date: 2026-06-30
Status: evergreen
Audience: experienced CBAP / financial retail PM / solution architect / enterprise architect moving into AI product and AI architecture
Output: a portfolio-ready architecture note covering AI procurement intake, vendor sandbox, benchmark, build-buy-partner decision and production promotion gate
Why Procurement Intake Is An Architecture Control Point
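AI procurement intake is not an administrative process; it is the first control point in enterprise AI architecture. It determines whether an AI idea gets pushed prematurely into a vendor demo, PoC, procurement negotiation or production integration. A mature institution does not start by asking "which vendor is best"; it starts by asking: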
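| Control question | Architectural meaning | Financial retail consequence |
|---|---|---|
| Does this use case deserve to enter the AI funnel? | Validate outcome, workflow, data, risk tier and the no-AI option first | Avoids dressing up an ordinary process problem as a GenAI project |
| What role does AI play in the workflow? | Layer the role: read, summarize, recommend, draft, decide, act | Avoids AI exceeding its decision authority in customer service, credit, AML and payments scenarios |
| Should we build, buy, partner, go hybrid or stop? | Put capability differentiation, control, time, cost and risk into a single decision | Avoids buying an ill-fitting black box because the demo looked good |
| What does the sandbox evaluate? | Test vendors with real but controlled data, tasks, rubrics and gates | Avoids relying only on sales demos and generic benchmarks |
| How does evidence reach production release? | Evaluation results must link to the ADR, risk acceptance and release gate | Avoids a successful PoC bypassing security, privacy, model risk and operational controls |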
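The core value of upstream procurement intake:
- Turn an AI idea into measurable business and risk hypotheses.
- Turn vendor comparison into architecture comparison.
- Turn the PoC into a controlled sandbox, so that a pilot does not drift into shadow production.
- Move the build-buy choice from a debate over preferences to an evidence-based decision.
- Ground later contracting, exit planning, the investment narrative and production launch in evidence.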
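Scope note: this note focuses on the upstream part of the procurement lifecycle: intake, triage, sandbox, benchmark and build-buy decision. Contract terms, exit and migration, and the board investment narrative are downstream material; this note only defines the input evidence they need.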
Concept Diagram
flowchart LR
A[AI idea / vendor pitch / business pain] --> B[Intake funnel]
B --> C{Use-case triage}
C -->|No AI fit| C1[Process / rules / data fix]
C -->|Low value or high risk| C2[Stop or defer]
C -->|Candidate| D[Decision architecture]
D --> D1[Build]
D --> D2[Buy]
D --> D3[Partner]
D --> D4[Hybrid]
D --> E[Sandbox charter]
E --> F[Vendor and internal option benchmark]
F --> G[Evidence pack]
G --> H{Architecture review board}
H -->|No-go| H1[Reject / redesign]
H -->|Limited pilot| H2[Controlled pilot with constraints]
H -->|Production candidate| I[Production promotion gate]
I --> J[Contract, security, privacy, risk, model validation, operating model]
A practical principle:
Intake owns the question "should this enter the AI option space"; sandbox owns "which option works under controlled evidence"; architecture gate owns "can this be operated safely at production scale".
Intake-To-Sandbox-To-Decision Architecture
1. Intake Funnel
AI intake requires every idea to submit minimum evidence first, rather than booking a vendor demo straight away.
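| Intake field | Senior-level requirement | Disqualifying signal |
|---|---|---|
| Business outcome | Explicit baseline, target movement, impacted workflow, owner | "Improve efficiency", "make it intelligent", "understand customers better" |
| AI role | read / retrieve / summarize / draft / recommend / decide / act | Simply stating "AI handles it automatically" |
| Customer or regulatory impact | Whether it affects customer commitments, credit granting, KYC, AML, payments, complaints, fees, suitability | Claiming low risk only because it is an internal tool |
| Data boundary | Data sources, PII, PCI, accounts, transactions, documents, voice, logs, cross-border transfer, retention | Not knowing what will be sent to the vendor |
| Workflow insertion point | AS-IS / TO-BE nodes, human review, exception path, fallback | No process map, only a feature list |
| No-AI alternative | Process optimization, rules engine, search, RPA, knowledge governance, reporting | Assuming AI is the only solution |
| Evidence plan | Sandbox data, rubric, benchmark, control evidence | Planning only to watch a vendor demo |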
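The intake fields can be treated as a completeness gate that runs before any vendor demo is booked. The following is a minimal sketch, not a prescribed schema: `IntakeCard`, `intake_gaps` and the `VAGUE_OUTCOMES` set (English renderings of the disqualifying phrases in the table) are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, fields
from typing import List

# Illustrative only: phrases the intake table treats as non-qualifying outcomes.
VAGUE_OUTCOMES = {"improve efficiency", "make it intelligent", "understand customers better"}

# AI roles as listed in the "AI role" row, from least to most authority.
AI_ROLES = ("read", "retrieve", "summarize", "draft", "recommend", "decide", "act")


@dataclass
class IntakeCard:
    """One field per row of the intake table (hypothetical structure)."""
    business_outcome: str = ""
    ai_role: str = ""
    customer_or_regulatory_impact: str = ""
    data_boundary: str = ""
    workflow_insertion_point: str = ""
    no_ai_alternative: str = ""
    evidence_plan: str = ""


def intake_gaps(card: IntakeCard) -> List[str]:
    """Return the reasons a card is not ready for triage; an empty list means complete."""
    gaps = [f"missing: {f.name}" for f in fields(card) if not getattr(card, f.name).strip()]
    if card.business_outcome.strip().lower() in VAGUE_OUTCOMES:
        gaps.append("vague: business_outcome needs baseline, target movement and owner")
    if card.ai_role and card.ai_role not in AI_ROLES:
        gaps.append(f"unknown ai_role: {card.ai_role!r}")
    return gaps
```

A card that states only a vague outcome and an unlayered role such as "automate" would come back with a list of gaps, which gives the intake owner something concrete to return to the submitter.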
2. Triage Gate
Sort use cases into four tiers:
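| Tier | Typical scenario | Recommended action |
|---|---|---|
| T0: Reject / defer | No owner, no baseline, data unavailable, or unacceptable risk | Stop; fix the process or data first |
| T1: Learn-only sandbox | Early value hypothesis, data can be masked, no production connection | Controlled demo and architecture learning |
| T2: Controlled pilot candidate | Clear workflow, an evaluation set exists, humans retain final authority and accountability | sandbox -> limited pilot |
| T3: High-impact candidate | Credit, AML, KYC, payments, customer commitments, complaints or regulatory evidence | Strengthened sandbox, independent challenge, production gate |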
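The four tiers can be expressed as a small ordered rule set. This is a sketch under stated assumptions: the `UseCase` fields, the `HIGH_IMPACT_DOMAINS` labels and especially the precedence (T0 checks first, then T3, then T2, with T1 as the fallback) are illustrative choices, not something the tier table itself specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Domain labels taken from the T3 row; the exact identifiers are hypothetical.
HIGH_IMPACT_DOMAINS = frozenset(
    {"credit", "aml", "kyc", "payments", "customer_commitment", "complaints", "regulatory_evidence"}
)


@dataclass
class UseCase:
    has_owner: bool
    has_baseline: bool
    data_available: bool
    risk_acceptable: bool
    workflow_defined: bool
    has_eval_set: bool
    human_final_authority: bool
    domains: FrozenSet[str]  # business domains the use case touches


def triage(uc: UseCase) -> str:
    """Map a use case to T0..T3. Precedence order is an assumption of this sketch."""
    # T0: any missing precondition stops the use case before sandbox work.
    if not (uc.has_owner and uc.has_baseline and uc.data_available and uc.risk_acceptable):
        return "T0"
    # T3: touching a high-impact domain outranks pilot readiness.
    if uc.domains & HIGH_IMPACT_DOMAINS:
        return "T3"
    # T2: clear workflow, evaluation set, and humans keep final authority.
    if uc.workflow_defined and uc.has_eval_set and uc.human_final_authority:
        return "T2"
    # T1: everything else stays learn-only.
    return "T1"
```

Checking T3 before T2 means a well-prepared AML or credit use case still gets the strengthened sandbox and independent challenge rather than the lighter pilot path.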
3. Sandbox Charter
The sandbox charter must be frozen before vendor testing begins:
| Section | Content |
|---|---|
| Scope | Specific workflow, user roles, permitted tasks, prohibited tasks |
| Data | synthetic, masked, historical, gold set, red-team set, access policy |
| Architecture | vendor route, internal baseline, model gateway, logging, retrieval, tool boundary |
| Evaluation | benchmark task, rubric, thresholds, critical failures, slice analysis |
| Controls | privacy, security, HITL, DLP, prompt injection, cost cap, kill switch |
| Evidence | trace, logs, output samples, evaluator notes, cost, latency, defect taxonomy |
| Decision | Decision rules for build / buy / partner / hybrid / stop |
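One way to make "frozen before vendor testing" checkable is to store the charter as an immutable record and keep a fingerprint of its content. The sketch below assumes a hypothetical `SandboxCharter` with one field per table row; the SHA-256 fingerprint is an illustrative mechanism, not a requirement stated in this note.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)  # frozen=True blocks attribute assignment after creation
class SandboxCharter:
    scope: Tuple[str, ...]
    data: Tuple[str, ...]
    architecture: Tuple[str, ...]
    evaluation: Tuple[str, ...]
    controls: Tuple[str, ...]
    evidence: Tuple[str, ...]
    decision_rule: str


def charter_fingerprint(charter: SandboxCharter) -> str:
    """Stable SHA-256 over the charter content, recorded when the charter is frozen."""
    payload = json.dumps(asdict(charter), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_unchanged(charter: SandboxCharter, frozen_fingerprint: str) -> bool:
    """True if the charter used in testing matches the one that was frozen."""
    return charter_fingerprint(charter) == frozen_fingerprint


charter = SandboxCharter(
    scope=("contact center copilot", "agent guidance only"),
    data=("synthetic", "masked"),
    architecture=("vendor route", "internal baseline"),
    evaluation=("rubric v1", "critical failures"),
    controls=("DLP", "kill switch"),
    evidence=("trace", "cost", "latency"),
    decision_rule="stop on any critical failure",
)
frozen_fingerprint = charter_fingerprint(charter)

# A charter edited mid-test (here: a swapped rubric) no longer matches.
tampered = replace(charter, evaluation=("rubric v2", "critical failures"))
print(is_unchanged(charter, frozen_fingerprint))   # True
print(is_unchanged(tampered, frozen_fingerprint))  # False
```

Recording the fingerprint in the evidence pack lets the decision board confirm that every option was tested against the same charter version.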
4. Decision Board
The decision should not rest with procurement or product alone. Minimum board composition:
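| Role | Questions this role must challenge |
|---|---|
| Business owner | Is the business value real, and is the owner willing to own adoption and residual risk? |
| AI PM / Product owner | Are the user scenario, MVP, experience, adoption and benefit hypotheses clear? |
| CBAP / BA | Are processes, rules, requirements, exceptions, acceptance criteria and evidence complete? |
| Solution architect | Are integration, data flow, RAG, agent, logging, observability and degradation feasible? |
| Enterprise architect | Platform reuse, capability map, vendor concentration, fit with the target architecture |
| Security / privacy | Do data, identity, permissions, logging, DLP and the threat model meet the standard? |
| Risk / compliance / model risk | Are the risk tier, regulatory impact, model validation and human oversight sufficient? |
| Procurement / TPRM | Can vendor risk, commercial terms and subsequent contract due diligence proceed to the next stage? |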
Build / Buy / Partner Decision Model
Build-buy-partner is not a pick-one-of-three slogan; it is a set of architecture boundary decisions.
Decision Axes
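| Axis | Favors build | Favors buy | Favors partner | Favors hybrid |
|---|---|---|---|---|
| Differentiation | Process or data is a competitive advantage | Capability is generic and the market is mature | Industry experience needs to be transferred in | Control layer is differentiating, capability layer is generic |
| Control need | Data, model, policy, audit and tool permissions must be controlled internally | Vendor can provide sufficient control evidence | Institution lacks the capability in the short term | Keep policy, eval, gateway and audit internal |
| Time to value | Can wait for internal capability to be built | Need fast validation and launch | Need faster delivery while retaining the learning | Buy first and abstract later, or buy components and build the control plane |
| Scale economics | High volume; unit cost can be amortized across an internal platform | Volume is uncertain or small to medium | Early exploration | Internalize high-risk parts, externalize commodity capability |
| Talent readiness | AI platform, data, eval, security and SRE capability exists | Internal team is insufficient | Need co-build and knowledge transfer | Internal team can operate the control layer |
| Regulatory evidence | Stronger evidence can be generated internally | Vendor evidence is mature and exportable | Need advisors to fill gaps in control design | Unified institutional evidence layer; vendor supplies component evidence |
| Exit optionality | Internal architecture is replaceable | Vendor lock-in is acceptable | Dependency transfer needs a plan | Abstraction interfaces lower exit cost |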
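The axes can feed a weighted scorecard so that the board argues about weights and scores rather than preferences. This is a minimal sketch: the axis identifiers mirror the table, while the scoring scale, the weights and the function names are hypothetical and would need to be agreed by the decision board.

```python
from typing import Dict, List, Tuple

# Axis identifiers mirror the rows of the decision-axes table.
AXES = (
    "differentiation",
    "control_need",
    "time_to_value",
    "scale_economics",
    "talent_readiness",
    "regulatory_evidence",
    "exit_optionality",
)


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-axis fit scores for one option (higher means better fit)."""
    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[axis] * weights[axis] for axis in AXES) / total_weight


def rank_options(
    option_scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (option, score) pairs sorted from best to worst fit."""
    scored = [(name, weighted_score(s, weights)) for name, s in option_scores.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A scorecard like this is an input to the decision record, not a substitute for it: a hard constraint such as missing evidence export should still be able to veto the top-ranked option.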
Component-Level Decision
Do not make one blanket decision for the whole use case. Break the AI system down to the component level:
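| Component | Common choice | Decision logic |
|---|---|---|
| Base model | buy or use managed model | The base model is usually not a source of differentiation for a financial retail institution |
| Model gateway | build or platform buy | Needs unified routing, logging, policy, cost and versioning |
| RAG ingestion | hybrid | Document processing can be bought; source registry and entitlement should be controlled internally |
| Vector store / search | buy managed or internal platform | Depends on data classification, latency, cost and regional requirements |
| Prompt / policy registry | build lightweight | Policies, prompts and release evidence should be auditable by the institution |
| Eval harness | hybrid | Tooling can be bought; the golden set, rubric and gate thresholds must be owned internally |
| Agent tool gateway | build | High-risk actions, permissions, approvals, idempotency and audit should not be handed to a black box |
| Human review workbench | buy, build, or existing workflow | Depends on whether it is embedded in the AML/KYC/credit/customer service case system |
| Observability | hybrid | Vendor traces must flow into internal SIEM, audit, cost and quality dashboards |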
Practical Decision Rule
| Condition | Recommendation |
|---|---|
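| Vendor product is strong but cannot export trace/eval/log | Usable for sandbox learning; does not become a production candidate |
| Vendor quality is good but tool action permissions cannot be controlled | buy UI/model layer only, build tool gateway |
| Internal model quality is average but evidence and controls are strong | Can run a high-risk pilot, because in financial scenarios safety evidence matters more than the average score |
| Multiple vendors perform similarly | Choose the option with better architecture fit, evidence export, cost predictability and exit constraints |
| Use case is not differentiating but needs fast adoption | buy with strong sandbox and production gate |
| Use case is core risk control or a customer commitment | hybrid by default; keep the decision boundary and evidence under internal control |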
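Several of these rules are mechanical enough to encode as an ordered check. The sketch below covers four of the six rows; the flag names, the returned labels and the precedence (evidence export first, then tool control, then use-case criticality) are assumptions made for illustration, and anything not matched falls back to board judgment.

```python
from typing import Dict


def recommend(option: Dict[str, bool]) -> str:
    """Apply a subset of the practical decision rules in a fixed, assumed order."""
    # No exportable trace/eval/log: learn in the sandbox, but no production candidacy.
    if not option.get("exports_trace_eval_log", False):
        return "sandbox learning only"
    # Tool action permissions outside institutional control: keep the gateway internal.
    if not option.get("tool_permissions_controllable", False):
        return "buy UI/model layer only, build tool gateway"
    # Core risk control or customer commitment: hybrid by default.
    if option.get("core_risk_or_customer_commitment", False):
        return "hybrid by default"
    # Non-differentiating use case that needs fast adoption.
    if not option.get("differentiating", False) and option.get("needs_fast_adoption", False):
        return "buy with strong sandbox and production gate"
    return "needs decision board judgment"
```

The remaining rows (average internal quality with strong controls, and near-equal vendors) involve comparison across options, so they are left to the scorecard and the board rather than a single-option function.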
Vendor Sandbox And Benchmark Design
Sandbox Design Principles
- Same tasks, same data, same rubric, same cost and latency measurement.
- Compare at least a vendor option, an internal baseline and a no-AI baseline.
- Test positive cases, negative cases, edge cases, abuse cases, stale-source cases and restricted-data cases.
- Every output must be traceable to its prompt, model, source, tool call, reviewer and decision.
- The sandbox may use only approved data and must not connect to production write actions.
- Benchmark conclusions must cover architecture fit, not just accuracy.
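The first two principles amount to running every option through one harness. A minimal sketch, assuming each option (vendor, internal baseline, no-AI baseline) can be wrapped as a plain Python callable; `run_benchmark` and `exact_match` are hypothetical names, and a real rubric would be far richer than exact string equality.

```python
import time
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]  # (input, expected output)


def exact_match(output: str, expected: str) -> float:
    """Toy rubric: 1.0 for an exact match, else 0.0. A stand-in for a real rubric."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_benchmark(
    options: Dict[str, Callable[[str], str]],
    tasks: List[Task],
    rubric: Callable[[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Score every option on the same tasks with the same rubric and timing method."""
    if not tasks:
        raise ValueError("benchmark needs at least one task")
    results: Dict[str, Dict[str, float]] = {}
    for name, option in options.items():
        scores, total_latency = [], 0.0
        for task_input, expected in tasks:
            start = time.perf_counter()
            output = option(task_input)
            total_latency += time.perf_counter() - start
            scores.append(rubric(output, expected))
        results[name] = {
            "mean_score": sum(scores) / len(scores),
            "total_latency_s": total_latency,
            "n_tasks": float(len(tasks)),
        }
    return results
```

Because the no-AI baseline goes through the same loop, its score and latency sit in the same result structure as the vendor's, which keeps the comparison honest.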
Benchmark Plan
| Dimension | Measurement | Release implication |
|---|---|---|
| Task quality | groundedness, completeness, policy compliance, extraction accuracy, narrative quality | Determines whether the workflow outcome is met |
| Critical failure | hallucinated commitment, missed red flag, unauthorized advice, PII leakage, wrong adverse action | High-risk use cases usually require zero |
| Retrieval quality | source recall, citation correctness, freshness, entitlement respect | Determines whether RAG is usable in controlled production |
| Tool safety | allowed tool choice, argument correctness, approval compliance, idempotency | Determines whether the agent may be given tools |
| Human oversight | reviewer agreement, override rate, review time, escalation quality | Determines whether HITL is genuinely effective |
| Cost | cost per case, token variance, document cost, eval cost, monitoring cost | Determines TCO and scale feasibility |
| Latency | p50, p95, timeout, retry, end-to-end workflow time | Determines customer experience and operations queue viability |
| Security / privacy | prompt injection result, DLP pass, data retention proof, access control | Determines whether to proceed to pilot |
| Architecture fit | API, IAM, audit export, SIEM, model gateway, data residency, change control | Determines the build-buy-partner boundary |
| Evidence maturity | eval export, trace completeness, admin audit, versioning, incident evidence | Determines whether financial audit and model risk requirements are met |
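Two rows of the plan translate directly into gate checks: a zero-tolerance rule for critical failures in high-risk use cases and a p95 latency limit. The sketch below uses the nearest-rank percentile method; the function names and the idea of returning a list of blockers are illustrative assumptions, and the latency limit is whatever the sandbox charter sets.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def gate_blockers(
    critical_failures: int,
    latencies_ms: Sequence[float],
    p95_limit_ms: float,
    high_risk: bool,
) -> List[str]:
    """Return reasons the option fails the gate; an empty list means no blocker found."""
    blockers = []
    if high_risk and critical_failures > 0:
        blockers.append(f"{critical_failures} critical failure(s); high-risk use cases require zero")
    p95 = percentile(latencies_ms, 95)
    if p95 > p95_limit_ms:
        blockers.append(f"p95 latency {p95} ms exceeds limit {p95_limit_ms} ms")
    return blockers
```

Keeping critical failures as a separate hard check, rather than folding them into an average score, reflects the table's point that a strong mean cannot offset a single hallucinated commitment or missed red flag.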
Sandbox Evidence Pack
Every vendor or internal option must produce:
| Evidence object | Content |
|---|---|
| Option card | vendor/internal option, deployment model, model/provider, data route, components |
| Data map | Input fields, documents, logs, embeddings, retention, masking, region |
| Benchmark report | dataset, tasks, rubric, score, confidence, slice failures |
| Failure taxonomy | critical, high, medium, low failure examples and root cause |
| Trace sample | prompt version, retrieval results, model version, output, tool calls, reviewer action |
| Cost and latency sheet | unit economics, p50/p95 latency, rate limit, stress result |
| Architecture fit review | integration, IAM, observability, RAG, agent boundary, platform fit |
| Risk review | privacy, security, third-party, model risk, compliance, operational resilience |
| Recommendation | build / buy / partner / hybrid / stop, with conditions and reversal triggers |
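Before an option reaches the review board, the pack can be checked for completeness. A short sketch, assuming the pack is held as a dictionary keyed by evidence object; the key names are hypothetical identifiers derived from the table rows.

```python
from typing import Dict, List

# One key per evidence object in the table above (identifiers are illustrative).
REQUIRED_EVIDENCE = (
    "option_card",
    "data_map",
    "benchmark_report",
    "failure_taxonomy",
    "trace_sample",
    "cost_and_latency_sheet",
    "architecture_fit_review",
    "risk_review",
    "recommendation",
)


def missing_evidence(pack: Dict[str, object]) -> List[str]:
    """List evidence objects that are absent or empty in an option's pack."""
    return [name for name in REQUIRED_EVIDENCE if not pack.get(name)]
```

An empty result says only that the binder is complete, not that the evidence is good; judging the content remains the board's job.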
Financial Retail Scenarios
1. GenAI Contact Center Copilot
| Intake decision | Sandbox focus | Architecture decision |
|---|---|---|
| AI drafts agent guidance, not customer commitments | policy answer, citation, escalation, vulnerable customer handling | buy copilot UI if strong; build knowledge governance, eval, telemetry export |
Hard failures:
- AI promises fee reversal outside policy.
- AI misses complaint language or vulnerable customer marker.
- AI cites stale product terms.
- AI outputs account data to unauthorized role.
2. KYC Document Intelligence
| Intake decision | Sandbox focus | Architecture decision |
|---|---|---|
| AI extracts and reconciles document facts; final KYC disposition stays human/system controlled | field extraction, document fraud signals, missing document checklist, data lineage | buy OCR/extraction, partner for policy tuning, build case workflow and evidence layer |
Hard failures:
- Wrong identity attribute without confidence flag.
- Document retention exceeds approved period.
- Evidence cannot be exported for audit or regulator inquiry.
- Model cannot handle jurisdiction-specific document rules.
3. AML Investigation Workbench
| Intake decision | Sandbox focus | Architecture decision |
|---|---|---|
| AI summarizes evidence and drafts narrative; no final SAR/no-SAR decision | red flag coverage, source-grounded narrative, analyst override, missed-risk rate | hybrid; internal control over data, RAG, audit, case action boundary |
Hard failures:
- AI omits material suspicious activity.
- AI invents transaction rationale.
- AI suggests final SAR decision as authoritative.
- Case trace cannot reconstruct evidence used.
4. Credit Decision Support
| Intake decision | Sandbox focus | Architecture decision |
|---|---|---|
| AI supports memo drafting and policy retrieval; credit decision remains governed by approved decisioning process | policy retrieval, adverse-action boundary, fair lending slice, explanation evidence | hybrid; build decision boundary and model risk evidence, buy retrieval or document summarization components only |
Hard failures:
- AI uses protected-class proxy or unsupported inference.
- AI drafts adverse action reason not supported by system of record.
- Human reviewers over-rely without challenge.
- Vendor cannot provide versioned evidence for model/prompt changes.
5. Payments Fraud Intervention
| Intake decision | Sandbox focus | Architecture decision |
|---|---|---|
| AI recommends intervention scripts and case prioritization; payment block/release requires deterministic policy and approval | false-positive customer harm, scam typology coverage, latency, tool permissions | hybrid; build tool gateway, approval and audit; consider buy for scam narrative intelligence |
Hard failures:
- AI triggers payment action without authorization.
- AI misses urgent scam indicators.
- Latency breaks real-time intervention window.
- Tool call cannot be replayed or reversed.
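The first and last hard failures point at the tool gateway that the architecture decision asks to build. A minimal sketch of such a gateway, assuming an allow-list of tools, an approval requirement per tool, and an idempotency key per call; `ToolGateway`, its parameters and the audit record shape are hypothetical, and the in-memory stores stand in for durable storage.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


class ToolGateway:
    """Allow-list, approval check, idempotent replay and audit log for agent tool calls."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], approval_required: Iterable[str] = ()):
        self._tools = dict(tools)                      # allow-list: name -> executor
        self._approval_required = set(approval_required)
        self._completed: Dict[str, Any] = {}           # idempotency key -> stored result
        self.audit_log: List[Dict[str, Any]] = []

    def call(self, tool: str, args: Dict[str, Any], idempotency_key: str,
             approved_by: Optional[str] = None) -> Any:
        if tool not in self._tools:
            self._record(tool, idempotency_key, "denied: tool not allowed", approved_by)
            raise PermissionError(f"tool {tool!r} is not on the allow-list")
        if tool in self._approval_required and approved_by is None:
            self._record(tool, idempotency_key, "denied: approval missing", approved_by)
            raise PermissionError(f"tool {tool!r} requires human approval")
        if idempotency_key in self._completed:
            # Same key seen before: return the stored result without re-executing.
            self._record(tool, idempotency_key, "replayed", approved_by)
            return self._completed[idempotency_key]
        result = self._tools[tool](**args)
        self._completed[idempotency_key] = result
        self._record(tool, idempotency_key, "executed", approved_by)
        return result

    def _record(self, tool: str, key: str, outcome: str, approved_by: Optional[str]) -> None:
        self.audit_log.append(
            {"tool": tool, "idempotency_key": key, "outcome": outcome, "approved_by": approved_by}
        )
```

In this sketch the idempotency key is global rather than scoped per tool, which is a simplification. Reversal is not covered here and would need a compensating action per tool.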
Metrics / Control / Evidence Model
Use a three-layer evidence model: metric proves behavior, control constrains risk, evidence proves the control operated.
| Layer | Examples | Owner |
|---|---|---|
| Outcome metrics | handle time, document review cycle time, AML case completeness, fraud intervention conversion, complaint escalation accuracy | business owner and PM |
| Quality metrics | groundedness, extraction accuracy, source coverage, critical failure rate, reviewer agreement, override reason | EvalOps and domain SMEs |
| Risk metrics | PII leakage, unauthorized tool call, policy violation, under-escalation, biased slice regression, stale-source answer | risk, compliance, model risk |
| Operational metrics | p50/p95 latency, timeout, fallback, cost per case, rate-limit hit, support ticket volume | platform and operations |
| Adoption metrics | active users, task completion, accepted suggestions, edit distance, trust survey, manual fallback rate | PM and operations |
Control mapping:
| Risk | Preventive control | Detective control | Corrective control | Evidence |
|---|---|---|---|---|
| Vendor selected before problem clarity | intake completeness gate | funnel review log | reject/defer decision | intake card, decision minutes |
| Demo bias | common benchmark plan | score normalization | re-run benchmark | benchmark report |
| Sensitive data leakage | masking, DLP, approved sandbox data | payload sampling, DLP alert | delete, notify, retrain reviewers | data map, DLP test |
| Prompt injection | red-team cases, tool isolation | injection failure monitor | disable route, update guardrail | red-team report |
| Cost runaway | budget cap, token limits | cost dashboard | throttle, switch model, revise scope | cost sheet |
| Over-reliance | UI uncertainty, mandatory review for high risk | override and review audit | retraining, stricter HITL | reviewer logs |
| Architecture lock-in | gateway, export requirement, component boundaries | dependency review | redesign or reject vendor | ADR, architecture map |
| Production promotion without evidence | release gate checklist | evidence binder completeness check | limited pilot or no-go | gate memo |
Anti-Patterns And Failure Modes
| Anti-pattern | Why it fails | Better pattern |
|---|---|---|
| Vendor-first discovery | Demo defines problem and success criteria | Intake starts from outcome, workflow, risk and baseline |
| Accuracy-only scorecard | Ignores audit, latency, cost, data, tool safety and architecture fit | Multi-dimensional sandbox scorecard |
| PoC using production data without control | Creates privacy and shadow-production risk | Approved sandbox data boundary and DLP |
| One build-buy decision for the whole system | Hides component-level control needs | Component decision matrix |
| Vendor black-box RAG | Cannot prove source, freshness, entitlement or citation | Source registry and retrieval trace |
| Contract promises without technical enforcement | Rights cannot be exercised in operations | Contract-control-evidence mapping |
| Pilot becomes production by adoption pressure | Controls arrive after risk exposure | Production promotion gate with hard stop criteria |
| No no-AI baseline | AI value cannot be defended | Compare process/rules/search baseline |
| Weak cost measurement | Token and eval cost surprise at scale | Cost per case and capacity model |
| Missing exit constraints at intake | Lock-in discovered after integration | Exit constraints and concentration risk before pilot |
Architecture Mapping To RAG / Agent / Copilot / Eval / Governance
| Architecture pattern | Intake question | Sandbox test | Production gate evidence |
|---|---|---|---|
| RAG | Which sources are authoritative, current and permissioned? | citation correctness, stale-source failure, ACL filtering, retrieval recall | source registry, index version, retrieval eval, access review |
| Agent | What tools can be called, with what authority and side effects? | tool choice accuracy, argument validation, approval path, idempotency | tool policy, audit trace, kill switch, rollback test |
| Copilot | What does the human see, edit, approve, reject and own? | reviewer agreement, override rate, UX trust calibration, escalation | HITL log, training, adoption and quality dashboard |
| Eval | What behavior contract must be proven before release? | golden set, red-team set, slice metrics, critical failures | eval report, threshold decision, exception memo |
| Governance | Who owns AI risk, change, evidence, incident and lifecycle? | RACI simulation, gate dry run, evidence completeness | AI inventory, ADR, risk acceptance, operating cadence |
| Model gateway | Which providers and versions are allowed? | route comparison, fallback, cost/latency benchmark | routing policy, model registry, telemetry export |
| Observability | Can one case be reconstructed end to end? | trace completeness, log redaction, SIEM export | evidence binder, retention setting, audit export |
ADR Draft
ADR: AI Procurement Intake And Vendor Sandbox Decision Architecture
Status: Proposed
Date: 2026-06-30
Context:
Financial retail AI initiatives are entering the portfolio through business ideas, vendor pitches,
executive pressure and local productivity experiments. Without a common intake and sandbox
architecture, teams may select vendors before defining outcomes, data boundaries, eval contracts,
architecture fit, risk tier and production evidence gates.
Decision:
Create an upstream AI procurement intake and vendor evaluation sandbox architecture. Every AI
candidate must pass intake completeness, use-case triage, sandbox charter, controlled benchmark,
build-buy-partner decision record and production promotion gate before procurement contracting
or production integration.
Options Considered:
1. Let procurement run standard vendor questionnaires first.
2. Let product teams run ad hoc PoCs and bring winners to architecture review.
3. Establish a common intake-to-sandbox-to-decision architecture across product, risk, procurement and architecture.
Decision Rationale:
Option 3 keeps AI vendor selection evidence-based and architecture-aware. It prevents demo bias,
shadow production, uncontrolled data exposure and premature lock-in. It also gives downstream
contracting, third-party risk, model validation and executive funding a stronger evidence base.
Consequences:
- Teams must define outcome, workflow, AI role, data boundary, no-AI baseline and eval contract before vendor testing.
- Vendors must be compared against the same sandbox tasks, rubric, cost, latency, risk and architecture-fit criteria.
- Production promotion requires traceable evidence, not product enthusiasm.
- The organization must maintain reusable templates, datasets, scorecards and gate records.
Reversal Triggers:
- Intake gate blocks too many low-risk experiments without learning value.
- Sandbox cycle time becomes disproportionate to risk tier.
- Platform-level approved patterns make some repeated sandbox steps redundant.
- Regulatory, legal or internal policy changes require a stronger or different gate.
Interview Answer
30-second version
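I treat AI procurement intake as an architecture control point, not a procurement form. I first triage on outcome, workflow, AI role, data boundary, risk tier and no-AI baseline, then use a controlled sandbox to benchmark the build, buy, partner and hybrid options on the same data, the same rubric and the same cost and latency measures. Finally, the evidence pack supports the ADR, risk acceptance and production gate, so that a vendor demo cannot turn directly into a production system.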
2-minute version
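In financial retail, for example an AML investigation workbench or KYC document intelligence, I would not start by asking which vendor has the best demo. I would first establish the intake funnel: what the business outcome is; whether AI in the workflow retrieves, summarizes, recommends or executes; which data will reach the prompt, embeddings, logs and the vendor; which behaviors require human approval; and what the no-AI baseline is. After triage, I would design the sandbox charter and compare the vendor option, the internal option and the process option on the same masked or synthetic data, golden set, red-team cases and business rubric. The evaluation looks beyond accuracy to critical failures, grounding, permission filtering, cost, p95 latency, trace completeness, audit export, model change, tool permissions and architecture fit. The decision is not simply buy or don't buy but a component-level build-buy-partner decision: for example, buy OCR or the copilot UI while keeping the model gateway, RAG source registry, eval contract, tool gateway and audit evidence internal. Only when the evidence pack can support risk, privacy, security, model validation and the production promotion gate does the option move to a controlled pilot or the procurement contracting stage.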
CTO version
I would institutionalize AI procurement intake as a control-plane pattern. The control plane has five artifacts: intake card, sandbox charter, benchmark evidence, build-buy-partner ADR and production promotion packet. It prevents premature vendor coupling by forcing every option through the same workflow, data, risk, evaluation, cost, latency and architecture-fit tests. At component level, I would usually buy commodity capability, build policy and evidence control points, and partner only where domain transfer is required. For regulated financial use cases, the winning option is not the one with the best demo; it is the option that can be integrated through our identity, data, RAG, tool, eval, observability, incident and audit architecture while keeping concentration risk and exit constraints explicit.
7-Day Practice Plan
| Day | Practice | Output |
|---|---|---|
| 1 | Choose one use case: GenAI contact center, KYC document intelligence, AML workbench, credit support, or payments fraud | One-page intake card with outcome, workflow, AI role, data and no-AI option |
| 2 | Draw AS-IS / TO-BE workflow and decision authority boundary | BPMN-lite flow plus allowed, restricted and prohibited AI actions |
| 3 | Build sandbox charter | Data plan, task list, rubric, critical failures, cost and latency measures |
| 4 | Create vendor scorecard and internal baseline comparison | Weighted scorecard with architecture-fit criteria |
| 5 | Write component-level build-buy-partner decision | Component matrix for model, RAG, eval, gateway, workflow, observability |
| 6 | Assemble evidence model | Metrics, controls, evidence objects, risk acceptance and gate thresholds |
| 7 | Practice interview narrative | 30-second, 2-minute and CTO answers, plus one financial scenario deep dive |
Source Anchors
| Anchor | Link | How this note uses it |
|---|---|---|
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize AI risk, evidence, monitoring and management action |
| NIST AI RMF Generative AI Profile | https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence | Uses the GenAI risk lens to design sandbox red-team, content provenance, data leakage and misuse cases |
| ISO/IEC 42001 AI management systems | https://www.iso.org/standard/81230.html | Uses AI management system thinking to define accountability, lifecycle, operation, performance evaluation and improvement |
| ISO/IEC/IEEE 29148 Requirements engineering | https://www.iso.org/standard/72089.html | Uses requirements quality, stakeholder concern and validation thinking to support the intake and eval contract |
| ISO/IEC/IEEE 42010 Architecture description | https://www.iso.org/standard/74393.html | Uses stakeholder concern, viewpoint and architecture rationale to organize the ADR and architecture fit review |
| Interagency Third-Party Risk Guidance, FDIC FIL-29-2023 | https://www.fdic.gov/news/financial-institution-letters/2023/fil23029.html | Uses third-party lifecycle thinking to connect planning, due diligence, selection, monitoring and termination inputs |
| FFIEC AIO booklet summary, OCC Bulletin 2021-30 | https://www.occ.gov/news-issuances/bulletins/2021/bulletin-2021-30.html | Uses the architecture, infrastructure and operations lens to check resilience, integration, operations and evidence |
| OWASP Top 10 for Large Language Model Applications | https://owasp.org/www-project-top-10-for-large-language-model-applications/ | Uses prompt injection, sensitive information disclosure, supply chain and excessive agency to design security tests |