AI Post-Quantum / Cryptographic Agility Playbook

AI Post-Quantum / Cryptographic Agility Architecture Playbook

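Positioning: For Advanced AI PMs, Senior BAs, AI Product Architects, Enterprise Architects, Security Architects, Platform Owners, Model Risk, Operational Risk, Third-Party Risk, Records/Evidence Owners and retail financial services business owners. This playbook is not an introduction to quantum computing. It trains you to organize the AI portfolio, long-lived evidence, RAG/agent/tool dependencies, vendor roadmaps and crypto migration into an executable architecture governance system.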

Core thesis:

The institution does not need to know the exact quantum breakthrough date to act. It needs to know which AI data, evidence, signatures, keys, vendors and protocols must survive algorithm change without losing confidentiality, integrity, auditability or customer trust.

0. Disclaimer

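This document is learning, architecture-training and portfolio material. It does not constitute cryptographic implementation advice, a legal opinion, a regulatory opinion, a compliance conclusion, a security certification conclusion, procurement advice or a production change plan.

A formal project must be jointly confirmed by Security Architecture, Cryptography Engineering, CISO, Enterprise Architecture, Platform Engineering, Legal, Privacy, Records, Model Risk, Third-Party Risk, Compliance, Procurement, Internal Audit, the Business Owner, the KMS/HSM Owner, the Certificate Authority Owner, the Archive Owner and key vendors.

Applicability depends on jurisdiction, regulated product, data retention, customer segment, technology stack, cloud/on-prem boundary, HSM/KMS capability, certificate lifecycle, protocol support, mobile/edge footprint, vendor roadmap, contract terms, legal hold, audit expectation and the institution's internal policies.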

1. Executive Framing

Post-quantum readiness is often misread as “wait for the security team to update the algorithms”. In an AI organization, the more practical questions are:

Can we find every AI artifact protected by quantum-vulnerable cryptography?
Can we prioritize long-lived customer data and evidence?
Can vendors support our migration timeline?
Can we change cryptographic profiles without breaking model, RAG, tool, token, signature and archive flows?
Can we prove after migration that old AI evidence is still valid?

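The executive narrative for an AI post-quantum program should cover:

Risk is time-lagged: sensitive data and evidence created today may not be attacked, audited or disputed until years later.
Scope is portfolio-wide: model gateways, RAG, agent tools, logs, archives, signatures, KMS/HSM and vendors are all in scope.
Migration is supply-chain dependent: cloud, APIs, SDKs, HSMs, certificates, mobile clients, data platforms and archive tools can each become a blocker.
Architecture is the lever: crypto agility reduces the impact of future algorithm replacement on business code, customer experience and the evidence chain.
Governance must be continuous: new AI use cases, vendor changes and standards updates must all re-enter the crypto posture review.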

Executive one-liner:

Post-quantum readiness is a portfolio risk reduction program,
not a one-time algorithm swap.

2. Source Anchors

Anchor	Official link	How this playbook uses it
NIST Post-Quantum Cryptography project	https://www.nist.gov/pqcrypto	Primary anchor: the NIST PQC project, its principal standards and migration guidance
NIST CSRC PQC project	https://csrc.nist.gov/projects/post-quantum-cryptography	Organizes the roadmap around PQC standards, migration to PQC and ongoing standardization
NIST PQC standardization	https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization	The standardization process shows why algorithm selection needs continuous governance
NIST NCCoE Migration to PQC	https://www.nccoe.nist.gov/crypto-agility-considerations-migrating-post-quantum-cryptographic-algorithms	Cryptographic discovery, crypto inventory, interoperability testing and migration project language shape the execution design
CISA Post-Quantum Cryptography Initiative	https://www.cisa.gov/topics/risk-management/quantum	Critical infrastructure, risk management and supply chain readiness language supports executive communication
CISA / NSA / NIST Quantum-Readiness factsheet	https://www.cisa.gov/resources-tools/resources/quantum-readiness-migration-post-quantum-cryptography	Cryptographic inventory, vendor engagement and supply chain assessment shape the preparation work
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Govern / Map / Measure / Manage organizes the AI PQC risk lifecycle
ISO/IEC 42001 overview	https://www.iso.org/standard/42001	AI management system, roles, operation, performance evaluation and improvement set the governance cadence

Source nuance:

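NIST PQC standards are not a “replace every system immediately” checklist. Enterprises need discovery, prioritization, testing, interoperability and vendor readiness.
CISA quantum readiness language suits executive and supply-chain communication, but the actual migration must be grounded in assets, protocols, keys, certificates, data and evidence.
NIST AI RMF and ISO/IEC 42001 are not cryptography standards, but they can serve as the organizing framework for an AI risk operating model.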

3. Operating Thesis

AI post-quantum readiness breaks down into 6 manageable capabilities:

Discover -> Classify -> Prioritize -> Design -> Migrate -> Monitor

Capability	Goal	Typical artifacts
Discover	Find crypto usage in AI systems	AI crypto inventory, scan report, owner attestation
Classify	Assess data/evidence lifetime and business impact	lifetime matrix, evidence criticality tier
Prioritize	Rank by long-term confidentiality, long-term integrity, customer impact and vendor constraints	migration wave plan, risk heatmap
Design	Establish a crypto-agile architecture	ADR, target architecture, gateway abstraction
Migrate	Test, release, roll back, verify	pilot results, release bundle, replay report
Monitor	Continuously detect new risks and expired exceptions	posture dashboard, KRI, monthly review

4. Scope Taxonomy

4.1 In Scope AI Assets

Asset type	Include
Model access	model gateway, external model API, embedding API, fine-tuning endpoint
RAG and knowledge	source docs, chunks, embedding index, metadata, retrieval trace
Agent tools	tool registry, API contract, MCP server, approval token, side-effect event
Evidence plane	prompt, output, retrieval, policy, human review, eval, incident and complaint records
Content and communication	generated content, approved claims, customer messages, provenance manifest
Platform security	TLS/mTLS, certificates, KMS/HSM, JWT/signature, secrets, token exchange
Vendor layer	cloud AI platform, archive provider, observability provider, API gateway, identity provider

4.2 Out of Scope but Connected

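Keep the following out of the AI PQC backlog, but define interfaces to them:

General enterprise PKI modernization.
Non-AI network device migration.
Standalone cryptographic module validation.
Pure-research algorithm evaluation.
Physical security and hardware lifecycle.

Interface relationships: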

Enterprise PQC program
  -> AI PQC stream
  -> AI platform migration
  -> use-case migration waves

5. Crypto Inventory Object Model

Start with a minimum viable object model.

AIUseCase
  id
  owner
  business_process
  customer_impact
  risk_tier
  data_lifetime
  evidence_lifetime

CryptoDependency
  id
  dependency_type
  algorithm
  key_id
  certificate_id
  protocol
  vendor_product
  environment
  quantum_vulnerable_flag
  owner

EvidenceObject
  id
  artifact_type
  retention_class
  integrity_requirement
  confidentiality_requirement
  signature_profile
  verification_profile
  legal_hold_capable

VendorReadiness
  vendor
  product
  pqc_support_status
  roadmap_date
  contract_clause_status
  interoperability_evidence
  exit_option

What matters is not the number of fields but whether the model can answer queries such as:

Show all customer-facing AI use cases with long-lived evidence,
external vendor dependency and quantum-vulnerable public-key algorithm.
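
As a rough illustration, the object model and the query above could be sketched with plain dataclasses. This is a minimal sketch, not a reference schema: it keeps only a subset of the fields, treats an empty `vendor_product` as "in-house", assumes `quantum_vulnerable_flag` is set only for public-key algorithms, and uses a hypothetical 7-year threshold for "long-lived". All sample records are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CryptoDependency:
    id: str
    dependency_type: str            # e.g. "tls", "signature", "jwt"
    algorithm: str
    vendor_product: str             # empty string = in-house (assumed convention)
    quantum_vulnerable_flag: bool   # assumed to flag public-key algorithms only
    owner: str


@dataclass
class AIUseCase:
    id: str
    owner: str
    customer_impact: str            # assumed values: "customer_facing" or "internal"
    evidence_lifetime_years: int    # assumed numeric form of evidence_lifetime
    dependencies: List[CryptoDependency] = field(default_factory=list)


def exposed_use_cases(use_cases: List[AIUseCase], min_evidence_years: int = 7) -> List[str]:
    """Return ids of customer-facing use cases with long-lived evidence and
    at least one quantum-vulnerable dependency on an external vendor."""
    hits = []
    for uc in use_cases:
        long_lived = uc.evidence_lifetime_years >= min_evidence_years
        risky_vendor_dep = any(
            d.quantum_vulnerable_flag and d.vendor_product for d in uc.dependencies
        )
        if uc.customer_impact == "customer_facing" and long_lived and risky_vendor_dep:
            hits.append(uc.id)
    return hits


# Invented sample inventory for illustration only.
inventory = [
    AIUseCase("UC-FRAUD-HOLD", "fraud-ops", "customer_facing", 10, [
        CryptoDependency("D1", "signature", "RSA-2048", "vendor-archive", True, "sec-arch"),
    ]),
    AIUseCase("UC-MKT-COPY", "marketing", "customer_facing", 1, [
        CryptoDependency("D2", "tls", "ECDHE-P256", "vendor-llm", True, "platform"),
    ]),
    AIUseCase("UC-HR-SUMMARY", "hr", "internal", 10, [
        CryptoDependency("D3", "signature", "RSA-2048", "", True, "sec-arch"),
    ]),
]

print(exposed_use_cases(inventory))  # ['UC-FRAUD-HOLD']
```

In practice the same question would more likely be a query against an inventory database or CMDB; the point of the sketch is only that the fields chosen must make this filter expressible.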

6. Decision Gates

Gate 1: AI Use Case Intake

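Every new AI use case must answer:

Does it process customer, employee, transaction, KYC, complaint, fraud, credit, investment, health or hardship information?
Does the data confidentiality period exceed 3 years, 7 years or the account lifecycle?
Will the evidence be needed in future regulatory reviews, audits, legal holds or customer disputes?
Does it use an external model, embedding, RAG, observability, archive or agent tool vendor?
Does it create signed artifacts, approval tokens, customer messages or provenance manifests?

Pass criteria:

A crypto inventory owner is assigned.
Data/evidence lifetime is tagged.
Key crypto dependencies are mapped.
The use case is in a migration wave or has an approved exception.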
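
The Gate 1 pass criteria are simple enough to automate as an intake check. The sketch below is one possible shape, assuming a hypothetical `IntakeRecord` whose field names are invented for illustration; it returns the list of unmet criteria so the intake form can show why a use case is blocked.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntakeRecord:
    # All field names are hypothetical; adapt to the institution's intake form.
    use_case_id: str
    crypto_inventory_owner: Optional[str] = None
    data_lifetime: Optional[str] = None
    evidence_lifetime: Optional[str] = None
    mapped_dependencies: List[str] = field(default_factory=list)
    migration_wave: Optional[str] = None
    approved_exception_id: Optional[str] = None


def gate1_failures(rec: IntakeRecord) -> List[str]:
    """Return the unmet Gate 1 criteria; an empty list means the gate passes."""
    failures = []
    if not rec.crypto_inventory_owner:
        failures.append("no crypto inventory owner assigned")
    if not (rec.data_lifetime and rec.evidence_lifetime):
        failures.append("data/evidence lifetime not tagged")
    if not rec.mapped_dependencies:
        failures.append("no crypto dependency mapped")
    if not (rec.migration_wave or rec.approved_exception_id):
        failures.append("not in a migration wave and no approved exception")
    return failures


complete = IntakeRecord(
    use_case_id="UC-FRAUD-HOLD",
    crypto_inventory_owner="sec-arch",
    data_lifetime="7y",
    evidence_lifetime="10y",
    mapped_dependencies=["D1"],
    migration_wave="wave-1",
)
incomplete = IntakeRecord(use_case_id="UC-NEW-BOT", crypto_inventory_owner="platform")

print(gate1_failures(complete))
print(gate1_failures(incomplete))
```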

Gate 2: Vendor Onboarding

The vendor must provide:

Current cryptographic algorithms and protocols.
KMS/HSM/certificate/key management boundary.
PQC roadmap and support timeline.
Hybrid / dual-stack support where applicable.
Audit/log/export evidence format.
Contract language for crypto change notification.
Exit and data/evidence export capability.

Pass criteria:

The readiness score is acceptable.
Blocking gaps have an owner and a date.
High-risk use cases do not depend on a black-box service that cannot migrate and has no exit path.
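
A hedged sketch of how the three vendor pass criteria might be evaluated together. The 0-100 `readiness_score` scale, the threshold of 60 and every field name are assumptions for illustration; the source does not define a scoring scale.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Gap:
    description: str
    blocking: bool
    owner: Optional[str] = None
    due: Optional[date] = None


@dataclass
class VendorAssessment:
    vendor: str
    readiness_score: int      # hypothetical 0-100 scale
    can_migrate: bool         # vendor has a credible path to new crypto profiles
    has_exit_option: bool
    gaps: List[Gap] = field(default_factory=list)


def vendor_gate(v: VendorAssessment, high_risk_use_case: bool,
                min_score: int = 60) -> Tuple[bool, List[str]]:
    """Return (passed, reasons). min_score is an assumed threshold."""
    reasons = []
    if v.readiness_score < min_score:
        reasons.append("readiness score below threshold")
    for g in v.gaps:
        if g.blocking and not (g.owner and g.due):
            reasons.append(f"blocking gap without owner/date: {g.description}")
    if high_risk_use_case and not v.can_migrate and not v.has_exit_option:
        reasons.append("high-risk use case depends on a non-migratable service with no exit path")
    return (not reasons, reasons)


ok_vendor = VendorAssessment(
    "vendor-a", 75, can_migrate=True, has_exit_option=True,
    gaps=[Gap("no hybrid TLS yet", True, owner="tprm", due=date(2030, 1, 1))],
)
black_box = VendorAssessment(
    "vendor-b", 80, can_migrate=False, has_exit_option=False,
    gaps=[Gap("no roadmap", True)],
)

print(vendor_gate(ok_vendor, high_risk_use_case=True))
print(vendor_gate(black_box, high_risk_use_case=True))
```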

Gate 3: Architecture Review

The architecture review must see:

Target crypto profile.
Gateway and abstraction design.
Evidence verification strategy.
Latency and availability impact.
Rollback design.
Operational runbook.
Residual risk decision.

Gate 4: Release

Before release there must be:

Migration test evidence.
Interoperability evidence.
Evidence replay report.
Security review.
Operations readiness.
Incident and rollback plan.
Exception expiry, if any.

Gate 5: Post-Migration Assurance

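After migration, verify that:

Evidence can be replayed and verified.
Customer channels show no abnormal friction.
Logging and monitoring cover both old and new profiles.
Vendor SLA and support path work as expected.
Exceptions enter a closure track.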

7. Architecture Patterns

Pattern A: Crypto Profile Service

Turn cryptographic choices into policy-managed profiles.

profile_id: AI_HIGH_RISK_EVIDENCE_V2
purpose: long-lived evidence signing
allowed_algorithms:
key_management_boundary:
verification_rules:
rotation_policy:
retention_alignment:
exception_process:

Benefits:

Teams do not hardcode algorithms.
Profiles can be replaced in the future.
Auditors can see which rule version applied.
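
A minimal sketch of what a profile service could look like in code, assuming an in-memory registry. Callers ask for the active profile by purpose instead of naming an algorithm, and superseded profiles stay resolvable so old evidence can still be verified. The class names, the `rotation_days` field and the algorithm strings are placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CryptoProfile:
    profile_id: str
    purpose: str
    allowed_algorithms: Tuple[str, ...]   # placeholder names, not real selections
    rotation_days: int                    # assumed simplification of rotation_policy


class ProfileRegistry:
    """Hypothetical in-memory registry; a real service would be policy-backed."""

    def __init__(self) -> None:
        self._profiles: Dict[str, CryptoProfile] = {}
        self._active: Dict[str, str] = {}   # purpose -> active profile_id

    def register(self, profile: CryptoProfile, activate: bool = False) -> None:
        self._profiles[profile.profile_id] = profile
        if activate:
            self._active[profile.purpose] = profile.profile_id

    def active_for(self, purpose: str) -> CryptoProfile:
        # Business code calls this instead of hardcoding an algorithm.
        return self._profiles[self._active[purpose]]

    def get(self, profile_id: str) -> CryptoProfile:
        # Old profiles remain resolvable for verifying historical evidence.
        return self._profiles[profile_id]


registry = ProfileRegistry()
PURPOSE = "long-lived evidence signing"
registry.register(
    CryptoProfile("AI_HIGH_RISK_EVIDENCE_V1", PURPOSE, ("legacy-sig-alg",), 365),
    activate=True,
)
# Replacing the profile is a registry change, not a business-code change.
registry.register(
    CryptoProfile("AI_HIGH_RISK_EVIDENCE_V2", PURPOSE, ("legacy-sig-alg", "pqc-sig-alg"), 180),
    activate=True,
)

print(registry.active_for(PURPOSE).profile_id)
print(registry.get("AI_HIGH_RISK_EVIDENCE_V1").allowed_algorithms)
```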

Pattern B: Evidence Manifest

Every AI evidence bundle carries a manifest.

manifest_id
case_id
artifact_list
hashes
signature_profile
timestamp_profile
model_version
prompt_version
retrieval_sources
tool_events
approval_events
human_review_events
verification_instructions

Benefits:

Old evidence remains verifiable after migration.
Supports legal hold, audit queries and customer disputes.
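
To make the manifest idea concrete, here is a small sketch that records a per-artifact hash together with the algorithm name, then re-verifies the bundle later. It covers only a few of the manifest fields and uses SHA-256 from the Python standard library as an example hash; signing and timestamping are out of scope, and the function names are invented.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(manifest_id: str, case_id: str,
                   artifacts: Dict[str, bytes], signature_profile: str) -> dict:
    """Build a manifest that stores each artifact's digest with its algorithm name."""
    return {
        "manifest_id": manifest_id,
        "case_id": case_id,
        "artifact_list": sorted(artifacts),
        # Recording "alg" next to each digest is what keeps old hashes
        # interpretable after the default algorithm changes.
        "hashes": {
            name: {"alg": "sha256", "digest": sha256_hex(body)}
            for name, body in artifacts.items()
        },
        "signature_profile": signature_profile,
    }


def verify_manifest(manifest: dict, artifacts: Dict[str, bytes]) -> List[str]:
    """Return names of artifacts that are missing or whose hash no longer matches."""
    mismatched = []
    for name in manifest["artifact_list"]:
        expected = manifest["hashes"][name]["digest"]
        body = artifacts.get(name)
        if body is None or sha256_hex(body) != expected:
            mismatched.append(name)
    return mismatched


# Invented evidence bundle for illustration.
bundle = {"prompt.txt": b"Why was my card blocked?", "output.txt": b"A fraud rule triggered."}
manifest = build_manifest("M-001", "CASE-123", bundle, "AI_HIGH_RISK_EVIDENCE_V2")

tampered = dict(bundle, **{"output.txt": b"No reason recorded."})

print(verify_manifest(manifest, bundle))    # []
print(verify_manifest(manifest, tampered))  # ['output.txt']
```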

Pattern C: Gateway-Mediated Vendor Calls

All high-risk model/RAG/tool calls go through a gateway.

The gateway is responsible for:

Certificate and protocol telemetry.
Approved vendor endpoints.
Token and signing profile.
Request/response hashing.
Data residency and retention tags.
Audit and evidence export.
Pattern D: Dual Signature Period

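During the migration window, the same evidence object may carry both a legacy signature and a new-profile signature.

Governance requirements:

Define the dual-sign start and end dates.
Define verification precedence.
Define the old profile sunset.
Define escalation for failed verification.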
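
One way to picture these requirements is as a small verification policy keyed on dates. The sketch below is illustrative only: it uses HMAC from the Python standard library as a stand-in for both signature schemes (HMAC is a symmetric MAC, not a digital signature), and the specific policy (legacy only before the window, both required during it, new profile only after sunset) is an assumed example of what governance might decide. Dates and keys are invented.

```python
import hmac
from datetime import date
from typing import Dict

# Stand-in algorithms per profile. Real profiles would name signature schemes.
DIGEST = {"legacy": "sha256", "new": "sha512"}


def tag(profile: str, key: bytes, data: bytes) -> str:
    return hmac.new(key, data, DIGEST[profile]).hexdigest()


def _ok(profile: str, keys: Dict[str, bytes], data: bytes, sigs: Dict[str, str]) -> bool:
    if profile not in sigs:
        return False
    return hmac.compare_digest(tag(profile, keys[profile], data), sigs[profile])


def verify_evidence(data: bytes, sigs: Dict[str, str], keys: Dict[str, bytes],
                    today: date, dual_start: date, legacy_sunset: date) -> str:
    """Return "valid" or "escalate" under the assumed example policy."""
    legacy_ok = _ok("legacy", keys, data, sigs)
    new_ok = _ok("new", keys, data, sigs)
    if today < dual_start:            # before the window: legacy profile only
        return "valid" if legacy_ok else "escalate"
    if today <= legacy_sunset:        # dual-sign window: both must verify
        return "valid" if (legacy_ok and new_ok) else "escalate"
    return "valid" if new_ok else "escalate"   # after sunset: new profile only


KEYS = {"legacy": b"legacy-demo-key", "new": b"new-demo-key"}   # demo keys only
DOC = b"approval:case-123"
DUAL_START, LEGACY_SUNSET = date(2030, 1, 1), date(2031, 1, 1)  # hypothetical dates

both = {p: tag(p, KEYS[p], DOC) for p in ("legacy", "new")}
legacy_only = {"legacy": both["legacy"]}

print(verify_evidence(DOC, both, KEYS, date(2030, 6, 1), DUAL_START, LEGACY_SUNSET))
print(verify_evidence(DOC, legacy_only, KEYS, date(2030, 6, 1), DUAL_START, LEGACY_SUNSET))
```

Making the dates explicit parameters is the point: a missing new-profile signature inside the window becomes an escalation rather than a silent pass.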

Pattern E: PQC-Ready Procurement Clause

Procurement and contracting are not the last step; they are part of migration capability.

Contract language should cover:

Crypto change notification.
Standards support roadmap.
Interoperability testing participation.
Evidence export.
Key ownership.
Audit right.
Exit and transition assistance.

8. Operating Model

Role	Accountability
Executive Sponsor	Funding, risk appetite, cross-department priorities
CISO / Security Architecture	PQC strategy, crypto standards, risk acceptance
Cryptography Engineering	algorithm profile, KMS/HSM, cert, signing implementation
Enterprise Architecture	target architecture, dependency map, transition roadmap
AI Platform Owner	model/RAG/tool gateway, developer path, platform telemetry
Product Owner	use case priority, customer impact, release tradeoff
Records / Evidence Owner	retention, legal hold, archive verification
Model Risk	AI evidence requirements, validation impact, change trigger
Third-Party Risk	vendor readiness, contract clauses, exit plan
Legal / Compliance	regulatory applicability, record and disclosure considerations
Internal Audit	control design review, evidence sufficiency, independent testing

RACI sketch:

Activity	Product	Platform	Security	EA	Legal/Compliance	TPRM	Audit
AI crypto inventory	R	R	A	C	C	C	I
Risk tiering	R	C	A	R	C	C	I
Target architecture	C	R	A	R	I	C	I
Vendor readiness	C	C	C	I	C	A/R	I
Migration release	A/R	R	A/R	C	C	C	I
Evidence assurance	C	R	C	C	C	I	A/R

9. Implementation Roadmap

First 30 Days: Frame and Scope

Deliverables:

Executive problem statement.
AI portfolio scope list.
High-risk use case shortlist.
Source system and vendor list.
Initial data/evidence lifetime matrix.
Governance cadence and owners.

Practical focus:

Do not attempt full enterprise perfection.
Start with customer-facing AI, agentic AI and long-lived evidence.

Days 31-60: Discover and Classify

Deliverables:

AI crypto inventory v1.
Certificate/key/signature/protocol scan results.
Vendor readiness questionnaire.
Quantum-vulnerable dependency heatmap.
Unknown ownership list.

Practical focus:

Combine automated scanning with owner attestation.
Include logs, archives, RAG, vector stores and observability data.
Identify hard-to-migrate SDKs and mobile/edge components early.

Days 61-90: Design and Decide

Deliverables:

Target crypto-agile architecture.
Crypto profile model.
Evidence manifest design.
Migration wave plan.
Exception and residual risk process.
Pilot candidate selection.

Practical focus:

Choose one high-value but bounded pilot.
Include evidence replay, not just connection success.

Days 91-180: Pilot and Prove

Deliverables:

Pilot implementation.
Latency / throughput benchmark.
Interoperability report.
Evidence replay report.
Operations runbook.
Rollback exercise.
Lessons learned.

Practical focus:

Test customer channel impact.
Test archive verification.
Test vendor support path.
Test observability and incident response.

6-12 Months: Scale by Risk Tier

Deliverables:

Wave rollout dashboard.
Vendor contract updates.
Developer golden path.
Monthly crypto posture review.
Board/risk committee update.
Audit-ready evidence binder.

Practical focus:

Make crypto readiness a standard AI release gate.
Close expired exceptions.
Refresh readiness when NIST, vendors or protocols change.

10. Evidence Pack

An audit-ready AI PQC evidence pack should include:

1. Scope statement
2. AI use case inventory extract
3. Data and evidence lifetime classification
4. Crypto dependency graph
5. Quantum-vulnerable algorithm register
6. Vendor readiness scorecards
7. Target architecture and ADRs
8. Migration wave plan
9. Pilot and interoperability test results
10. Evidence replay and verification report
11. Release approvals and exceptions
12. Rollback and incident runbook
13. Post-migration monitoring dashboard
14. Open risks and management actions

Evidence quality checks:

Can every high-risk use case be traced to a business owner?
Can every long-lived evidence object be traced to a crypto profile?
Can every critical vendor dependency be traced to a readiness answer?
Can every exception be traced to an expiry date and compensating control?
Can every migrated evidence sample be verified after migration?
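
Several of these checks reduce to "does every record carry the fields that make it traceable". As one hedged example, the exception check could be scripted as below. The `CryptoException` shape and the review date are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class CryptoException:
    # Hypothetical record shape for an approved exception.
    exception_id: str
    use_case_id: str
    expiry: Optional[date]
    compensating_control: Optional[str]


def exception_findings(exceptions: List[CryptoException],
                       today: date) -> List[Tuple[str, str]]:
    """Return (exception_id, finding) pairs for untraceable or expired exceptions."""
    findings = []
    for e in exceptions:
        if e.expiry is None:
            findings.append((e.exception_id, "no expiry date"))
        elif e.expiry < today:
            findings.append((e.exception_id, "expired"))
        if not e.compensating_control:
            findings.append((e.exception_id, "no compensating control"))
    return findings


register = [
    CryptoException("EX-1", "UC-FRAUD-HOLD", date(2031, 1, 1), "gateway-enforced TLS policy"),
    CryptoException("EX-2", "UC-KYC-RAG", date(2029, 1, 1), "restricted network path"),
    CryptoException("EX-3", "UC-MKT-COPY", None, None),
]

for finding in exception_findings(register, today=date(2030, 6, 1)):
    print(finding)
```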

11. Metrics and Dashboard

Executive Metrics

Metric	Use
High-risk AI use cases inventoried	Shows scope control
Long-lived data/evidence exposure count	Shows urgency and priority
Critical vendors with PQC roadmap	Shows supply-chain readiness
Migration wave completion	Shows delivery progress
Open high-risk exceptions	Shows residual risk

Architecture Metrics

Metric	Use
Algorithm metadata completeness	Measures crypto-agile evidence readiness
Gateway coverage for model/RAG/tool calls	Measures central control adoption
Evidence replay success rate	Measures audit defensibility
Unknown crypto dependency count	Measures discovery maturity
Hardcoded crypto usage count	Measures technical debt

Product and Operations Metrics

Metric	Use
Login/API/tool latency change	Tracks customer and workflow impact
Failed authentication or signature verification	Detects migration defects
Customer complaints linked to migration	Tracks conduct and experience impact
Operational incidents during wave	Tracks readiness
Rollback time	Measures recoverability

12. Checklists

12.1 Product Intake Checklist

The use case has a named business owner.
Customer, employee, operational and regulatory impact are classified.
Data lifetime and evidence lifetime are documented.
RAG, tool, model, archive and observability vendors are listed.
Signed artifacts and approval tokens are identified.
High-risk customer workflows have fallback and rollback considerations.
Crypto owner reviewed dependency assumptions.

12.2 Architecture Review Checklist

Crypto dependencies are not embedded directly in business logic.
Model/RAG/tool calls use approved gateways where feasible.
Algorithm metadata is captured with evidence objects.
Evidence manifest and verification profile are designed.
Vendor endpoints and SDKs are approved.
Latency and compatibility risks are measured.
Rollback and dual-verification periods are explicit.

12.3 Vendor Readiness Checklist

Vendor describes current cryptographic algorithms and protocols.
Vendor has PQC or crypto agility roadmap.
Vendor supports export of evidence and logs.
Vendor supports customer-managed keys where required.
Vendor provides change notification for cryptographic profiles.
Vendor participates in interoperability testing or provides evidence.
Contract includes exit and transition support.

12.4 Evidence Migration Checklist

Sample evidence bundles are selected across risk tiers.
Old signatures and timestamps verify before migration.
New or dual signatures verify after migration.
Legal hold objects remain untouched or controlled.
Archive indexes and search remain accurate.
Verification instructions are updated.
Audit trail records who migrated what and when.

13. Anti-Patterns

Anti-pattern	Why it fails
“PQC is only infrastructure”	AI evidence, RAG, tool calls and vendor logs sit above infrastructure
“We will wait until vendors solve it”	Vendor readiness is a managed dependency, not a strategy
“Scan certificates and call it done”	Keys, signatures, archives, tokens, SDKs and evidence are missing
“Migrate everything at once”	Risk tiers, interoperability and rollback need staged waves
“No customer impact expected”	Auth, latency, mobile SDKs and payment/tool workflows can affect customers
“Old evidence does not matter”	Complaints, audits, legal hold and disputes often rely on old evidence
“Algorithm choice is the roadmap”	Discovery, ownership, testing, contracts and operating model are the roadmap

14. Workshop Design

Workshop 1: AI PQC Exposure Mapping

Participants:

Product owner.
AI platform owner.
Security architect.
Records/evidence owner.
TPRM.
Compliance/legal partner.

Agenda:

Select top 10 AI use cases.
Map data and evidence lifetime.
Identify model/RAG/tool/archive vendors.
Identify signed artifacts and approval tokens.
Mark unknown crypto dependencies.
Assign owners and next actions.

Output:

First AI PQC heatmap.
Unknowns list.
Wave candidate shortlist.

Workshop 2: Evidence Replay Drill

Agenda:

Pick one customer-facing AI case.
Export prompt/retrieval/tool/output/approval evidence.
Verify signatures and hashes.
Simulate crypto profile change.
Verify evidence after simulated migration.
Document gaps.

Output:

Evidence replay report.
Archive and signature gaps.
Release gate updates.

Workshop 3: Vendor Readiness Review

Agenda:

Rank vendors by AI criticality.
Review crypto roadmap and support.
Review export and exit capability.
Identify contract gaps.
Define remediation and escalation.

Output:

Vendor readiness scorecard.
Contract addendum backlog.
Exit risk list.

15. Portfolio Prioritization Formula

A practical scoring formula:

PQC priority =
  data confidentiality lifetime
+ evidence integrity lifetime
+ customer/regulatory impact
+ agent side-effect severity
+ vendor lock-in
+ migration complexity
- compensating controls
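
The formula above can be turned into a small scoring helper. The sketch assumes each factor is scored on a 0-5 scale and that all factors are weighted equally; neither the scale nor the sample scores come from the source, so treat the numbers as placeholders for a workshop calibration.

```python
from typing import Dict

ADDITIVE_FACTORS = (
    "data_confidentiality_lifetime",
    "evidence_integrity_lifetime",
    "customer_regulatory_impact",
    "agent_side_effect_severity",
    "vendor_lock_in",
    "migration_complexity",
)
SUBTRACTIVE_FACTOR = "compensating_controls"


def pqc_priority(scores: Dict[str, int]) -> int:
    """Sum the six additive factors and subtract compensating controls.
    Assumes every factor is an integer from 0 to 5 (an invented scale)."""
    for name in ADDITIVE_FACTORS + (SUBTRACTIVE_FACTOR,):
        if name not in scores:
            raise KeyError(f"missing factor: {name}")
        if not 0 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 0 and 5")
    return sum(scores[f] for f in ADDITIVE_FACTORS) - scores[SUBTRACTIVE_FACTOR]


# Invented scores for two use cases.
fraud_hold = {
    "data_confidentiality_lifetime": 4,
    "evidence_integrity_lifetime": 5,
    "customer_regulatory_impact": 5,
    "agent_side_effect_severity": 2,
    "vendor_lock_in": 3,
    "migration_complexity": 3,
    "compensating_controls": 1,
}
marketing_copy = {
    "data_confidentiality_lifetime": 1,
    "evidence_integrity_lifetime": 2,
    "customer_regulatory_impact": 2,
    "agent_side_effect_severity": 0,
    "vendor_lock_in": 2,
    "migration_complexity": 1,
    "compensating_controls": 1,
}

print(pqc_priority(fraud_hold))      # 21
print(pqc_priority(marketing_copy))  # 7
```

An unweighted sum is the simplest reading of the formula; if some factors should dominate (for example evidence integrity lifetime), weights can be added without changing the structure.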

Example:

Use case	Priority reason
AI fraud hold explanation	customer harm, complaints, long-lived evidence, regulated workflow
Agent-assisted payment exception	side effect, authorization token, audit trail, customer dispute
RAG for KYC investigations	sensitive customer data, long retention, law enforcement/regulatory evidence
Internal HR summary bot	employee data, but no direct customer side effect; still needs HR/legal review
Marketing copy assistant	content provenance and approved claims, but lower data confidentiality

16. Interview Drill

Question:

How would you prepare an AI platform for post-quantum cryptography?

Senior-level answer:

I would not start with algorithm selection. I would start with an AI crypto inventory that links use cases, data lifetime, evidence lifetime, RAG sources, tool invocations, signatures, keys, certificates, vendors and archives. Then I would prioritize use cases with long-lived confidential data, long-lived evidence integrity, agentic side effects and customer-facing regulated workflows. Architecturally, I would use crypto profiles, central signing/KMS services, gateway-mediated model/RAG/tool calls, evidence manifests and dual-verification periods so that cryptographic profiles can change without rewriting business workflows or breaking audit evidence.

Follow-up questions:

Follow-up	Answer direction
What makes AI different from ordinary PQC migration?	AI creates prompt/retrieval/tool/output/eval evidence that may need long-term confidentiality and integrity
What would you ask vendors?	Current algorithms, PQC roadmap, export format, key ownership, change notice, interoperability testing, exit support
How do you prevent migration from breaking evidence?	Evidence manifest, algorithm metadata, verification profile versioning, replay tests, dual-signature period
How do you prioritize?	Long-lived data, long-lived evidence, customer impact, agent side effects, vendor lock-in and migration complexity
How do you report to executives?	Portfolio exposure, critical vendors, migration waves, exceptions, customer impact and residual risk

17. Minimum Viable Artifact Set

For portfolio or interview demonstration, create:

Artifact	Purpose
AI PQC Heatmap	Shows business-prioritized exposure
AI Crypto Inventory Sample	Shows concrete dependency mapping
Evidence Manifest Template	Shows audit and records thinking
Vendor Readiness Scorecard	Shows supply-chain governance
Crypto-Agile ADR	Shows architecture decision quality
Migration Wave Roadmap	Shows execution planning
Evidence Replay Report	Shows assurance mindset
Executive One-Pager	Shows product/architecture translation

18. Closing Principle

Post-quantum readiness for AI is not a fear-based program. It is an architecture hygiene program for systems that create high-value, long-lived, AI-mediated evidence.

The strongest product architecture answer is:

We know what AI evidence we create.
We know how long it matters.
We know what cryptography protects it.
We know which vendors and platforms constrain it.
We know how to migrate without losing verification.

That is cryptographic agility in a financial retail AI environment.