AI Post-Quantum / Cryptographic Agility Playbook
AI Post-Quantum / Cryptographic Agility Architecture Playbook
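Positioning: Written for Advanced AI PMs, Senior BAs, AI Product Architects, Enterprise Architects, Security Architects, Platform Owners, Model Risk, Operational Risk, Third-Party Risk, Records/Evidence Owners and retail financial services business owners. This is not an introduction to quantum computing. It trains you to organize the AI portfolio, long-lived evidence, RAG/agent/tool dependencies, vendor roadmaps and crypto migration into an executable architecture governance system.
Core thesis: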
The institution does not need to know the exact quantum breakthrough date to act. It needs to know which AI data, evidence, signatures, keys, vendors and protocols must survive algorithm change without losing confidentiality, integrity, auditability or customer trust.
0. Disclaimer
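This document is learning, architecture training and portfolio material. It is not cryptographic implementation advice, a legal opinion, a regulatory opinion, a compliance conclusion, a security certification conclusion, procurement advice or a production change plan.
Any formal project must be jointly confirmed by Security Architecture, Cryptography Engineering, CISO, Enterprise Architecture, Platform Engineering, Legal, Privacy, Records, Model Risk, Third-Party Risk, Compliance, Procurement, Internal Audit, the Business Owner, the KMS/HSM Owner, the Certificate Authority Owner, the Archive Owner and key vendors.
Applicability depends on jurisdiction, regulated product, data retention, customer segment, technology stack, cloud/on-prem boundary, HSM/KMS capability, certificate lifecycle, protocol support, mobile/edge footprint, vendor roadmap, contract terms, legal hold, audit expectation and the institution's internal policies.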
1. Executive Framing
Post-quantum readiness is often misread as “waiting for the security team to update algorithms”. In an AI organization, the more practical questions are:
Can we find every AI artifact protected by quantum-vulnerable cryptography?
Can we prioritize long-lived customer data and evidence?
Can vendors support our migration timeline?
Can we change cryptographic profiles without breaking model, RAG, tool, token, signature and archive flows?
Can we prove after migration that old AI evidence is still valid?
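The executive narrative for an AI post-quantum program should cover:
- Risk is time-lagged: sensitive data and evidence created today may be attacked, audited or disputed years later.
- Scope is portfolio-wide: model gateways, RAG, agent tools, logs, archives, signatures, KMS/HSM and vendors are all in scope.
- Migration is supply-chain dependent: cloud, APIs, SDKs, HSMs, certificates, mobile clients, data platforms and archive tools can each become a blocker.
- Architecture is the lever: crypto agility reduces the impact of future algorithm replacement on business code, customer experience and the evidence chain.
- Governance must be continuous: every new AI use case, vendor change and standards update must re-enter the crypto posture review.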
Executive one-liner:
Post-quantum readiness is a portfolio risk reduction program,
not a one-time algorithm swap.
2. Source Anchors
| Anchor | Official link | How this playbook uses it |
|---|---|---|
| NIST Post-Quantum Cryptography project | https://www.nist.gov/pqcrypto | Primary anchor: the NIST PQC project, its principal standards and migration guidance |
| NIST CSRC PQC project | https://csrc.nist.gov/projects/post-quantum-cryptography | Organizes the roadmap around PQC standards, migration to PQC and ongoing standardization |
| NIST PQC standardization | https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization | Uses the standardization process to show why algorithm choice needs continuous governance |
| NIST NCCoE Migration to PQC | https://www.nccoe.nist.gov/crypto-agility-considerations-migrating-post-quantum-cryptographic-algorithms | Shapes execution using cryptographic discovery, crypto inventory, interoperability testing and migration project language |
| CISA Post-Quantum Cryptography Initiative | https://www.cisa.gov/topics/risk-management/quantum | Supports executive communication with critical infrastructure, risk management and supply chain readiness language |
| CISA / NSA / NIST Quantum-Readiness factsheet | https://www.cisa.gov/resources-tools/resources/quantum-readiness-migration-post-quantum-cryptography | Shapes preparation work using cryptographic inventory, vendor engagement and supply chain assessment |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organizes the AI PQC risk lifecycle with Govern / Map / Measure / Manage |
| ISO/IEC 42001 overview | https://www.iso.org/standard/42001 | Sets the governance cadence using AI management system, roles, operation, performance evaluation and improvement |
Source nuance:
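- NIST PQC standards are not a “replace every system immediately” checklist. Enterprises still need discovery, prioritization, testing, interoperability and vendor readiness.
- CISA quantum readiness language suits executive and supply-chain communication, but the actual migration must land on assets, protocols, keys, certificates, data and evidence.
- NIST AI RMF and ISO/IEC 42001 are not cryptography standards, but they can serve as the organizing framework for the AI risk operating model.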
3. Operating Thesis
AI post-quantum readiness work breaks down into 6 manageable capabilities:
Discover -> Classify -> Prioritize -> Design -> Migrate -> Monitor
| Capability | Goal | Typical artifacts |
|---|---|---|
| Discover | Find crypto usage in AI systems | AI crypto inventory, scan report, owner attestation |
| Classify | Determine data/evidence lifetime and business impact | lifetime matrix, evidence criticality tier |
| Prioritize | Rank by long-term confidentiality, long-term integrity, customer impact and vendor constraints | migration wave plan, risk heatmap |
| Design | Establish a crypto-agile architecture | ADR, target architecture, gateway abstraction |
| Migrate | Test, release, roll back and verify | pilot results, release bundle, replay report |
| Monitor | Continuously detect new risks and expired exceptions | posture dashboard, KRI, monthly review |
4. Scope Taxonomy
4.1 In Scope AI Assets
| Asset type | Include |
|---|---|
| Model access | model gateway, external model API, embedding API, fine-tuning endpoint |
| RAG and knowledge | source docs, chunks, embedding index, metadata, retrieval trace |
| Agent tools | tool registry, API contract, MCP server, approval token, side-effect event |
| Evidence plane | prompt, output, retrieval, policy, human review, eval, incident and complaint records |
| Content and communication | generated content, approved claims, customer messages, provenance manifest |
| Platform security | TLS/mTLS, certificates, KMS/HSM, JWT/signature, secrets, token exchange |
| Vendor layer | cloud AI platform, archive provider, observability provider, API gateway, identity provider |
4.2 Out of Scope but Connected
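Do not mix the following into the same backlog, but do establish interfaces to them:
- General enterprise PKI modernization.
- Migration of non-AI network devices.
- Standalone cryptographic module validation.
- Purely research-oriented algorithm evaluation.
- Physical security and hardware lifecycle.
Interface relationship: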
Enterprise PQC program
-> AI PQC stream
-> AI platform migration
-> use-case migration waves
5. Crypto Inventory Object Model
Start with a minimum viable object model.
AIUseCase
id
owner
business_process
customer_impact
risk_tier
data_lifetime
evidence_lifetime
CryptoDependency
id
dependency_type
algorithm
key_id
certificate_id
protocol
vendor_product
environment
quantum_vulnerable_flag
owner
EvidenceObject
id
artifact_type
retention_class
integrity_requirement
confidentiality_requirement
signature_profile
verification_profile
legal_hold_capable
VendorReadiness
vendor
product
pqc_support_status
roadmap_date
contract_clause_status
interoperability_evidence
exit_option
What matters is not the number of fields but that the inventory is queryable:
Show all customer-facing AI use cases with long-lived evidence,
external vendor dependency and quantum-vulnerable public-key algorithm.
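The example query above can be sketched against a trimmed version of the object model. This is a minimal illustration, not a reference schema: the numeric `evidence_lifetime_years` field, the `customer_impact` values, the `LONG_LIVED_YEARS` threshold and the sample records are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CryptoDependency:
    id: str
    dependency_type: str           # e.g. "tls", "signature", "kms"
    algorithm: str
    vendor_product: str            # empty string means in-house (assumption)
    quantum_vulnerable_flag: bool


@dataclass
class AIUseCase:
    id: str
    owner: str
    customer_impact: str           # assumed values: "customer_facing" or "internal"
    evidence_lifetime_years: int   # assumed numeric form of evidence_lifetime
    dependencies: List[CryptoDependency] = field(default_factory=list)


LONG_LIVED_YEARS = 7  # hypothetical threshold for "long-lived" evidence


def exposed_use_cases(use_cases: List[AIUseCase]) -> List[str]:
    """IDs of customer-facing use cases with long-lived evidence and at least
    one external-vendor dependency flagged as quantum-vulnerable."""
    exposed = []
    for uc in use_cases:
        if uc.customer_impact != "customer_facing":
            continue
        if uc.evidence_lifetime_years < LONG_LIVED_YEARS:
            continue
        if any(d.vendor_product and d.quantum_vulnerable_flag for d in uc.dependencies):
            exposed.append(uc.id)
    return exposed


# Illustrative records only; none of these come from a real inventory.
inventory = [
    AIUseCase("UC-001", "fraud-ops", "customer_facing", 10,
              [CryptoDependency("D-1", "signature", "RSA-2048", "vendor-archive", True)]),
    AIUseCase("UC-002", "marketing", "customer_facing", 1,
              [CryptoDependency("D-2", "tls", "ECDHE-P256", "vendor-llm", True)]),
    AIUseCase("UC-003", "hr", "internal", 10,
              [CryptoDependency("D-3", "signature", "RSA-2048", "", True)]),
]

print(exposed_use_cases(inventory))  # ['UC-001']
```

Only `UC-001` satisfies all three conditions. In practice the same question would more likely run as a query against the inventory store; the point is that the fields must exist and be populated for the question to be answerable at all.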
6. Decision Gates
Gate 1: AI Use Case Intake
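Every new AI use case must answer:
- Does it process customer, employee, transaction, KYC, complaint, fraud, credit, investment, health or hardship information?
- Does the data confidentiality period exceed 3 years, 7 years or the account lifecycle?
- Will the evidence be needed in future regulatory reviews, audits, legal holds or customer disputes?
- Does it use an external model, embedding, RAG, observability, archive or agent tool vendor?
- Does it create signed artifacts, approval tokens, customer messages or provenance manifests?
Pass criteria:
- A crypto inventory owner is assigned.
- Data/evidence lifetime is tagged.
- Key crypto dependencies are mapped.
- The use case is assigned to a migration wave or has an approved exception.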
Gate 2: Vendor Onboarding
Vendors must provide:
- current cryptographic algorithms and protocols.
- KMS/HSM/certificate/key management boundary.
- PQC roadmap and support timeline.
- hybrid / dual-stack support where applicable.
- audit/log/export evidence format.
- contract language for crypto change notification.
- exit and data/evidence export capability.
Pass criteria:
- The readiness score is acceptable.
- Blocking gaps have an owner and a date.
- No high-risk use case depends on a black-box service that cannot migrate and has no exit path.
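The three Gate 2 pass criteria can be expressed as a simple check that returns findings rather than a bare pass/fail. This is a sketch under stated assumptions: the 0-100 `readiness_score` scale, the `MIN_SCORE` threshold and every field name below are hypothetical, and a real scorecard would be defined by Third-Party Risk.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class BlockingGap:
    description: str
    owner: Optional[str] = None
    due_date: Optional[date] = None


@dataclass
class VendorAssessment:
    vendor: str
    readiness_score: int              # assumed 0-100 scale
    serves_high_risk_use_case: bool
    can_migrate: bool
    has_exit_path: bool
    blocking_gaps: List[BlockingGap] = field(default_factory=list)


MIN_SCORE = 60  # hypothetical acceptance threshold


def gate2_findings(a: VendorAssessment) -> List[str]:
    """Return the reasons the vendor fails Gate 2; an empty list means pass."""
    findings = []
    if a.readiness_score < MIN_SCORE:
        findings.append(f"readiness score {a.readiness_score} below {MIN_SCORE}")
    for gap in a.blocking_gaps:
        if not gap.owner or gap.due_date is None:
            findings.append(f"blocking gap without owner/date: {gap.description}")
    if a.serves_high_risk_use_case and not a.can_migrate and not a.has_exit_path:
        findings.append("high-risk use case depends on a non-migratable service with no exit path")
    return findings


ok = VendorAssessment("vendor-a", 80, True, True, True,
                      [BlockingGap("no hybrid TLS yet", "tprm-lead", date(2026, 6, 30))])
bad = VendorAssessment("vendor-b", 40, True, False, False,
                       [BlockingGap("no published roadmap")])

print(gate2_findings(ok))   # []
print(gate2_findings(bad))  # three findings
```

Returning findings instead of a boolean keeps the gate auditable: each failed criterion can be tracked to closure with its own owner and date.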
Gate 3: Architecture Review
The architecture review must see:
- target crypto profile.
- gateway and abstraction design.
- evidence verification strategy.
- latency and availability impact.
- rollback design.
- operational runbook.
- residual risk decision.
Gate 4: Release
Before release, the following must exist:
- migration test evidence.
- interoperability evidence.
- evidence replay report.
- security review.
- operations readiness.
- incident and rollback plan.
- exception expiry if any.
Gate 5: Post-Migration Assurance
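After migration, verify that:
- Evidence can be replayed and verified.
- Customer channels show no abnormal friction.
- Logging and monitoring cover both old and new profiles.
- Vendor SLAs and support paths work as expected.
- Exceptions enter a closure track.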
7. Architecture Patterns
Pattern A: Crypto Profile Service
Turn cryptographic choices into policy-managed profiles.
profile_id: AI_HIGH_RISK_EVIDENCE_V2
purpose: long-lived evidence signing
allowed_algorithms:
key_management_boundary:
verification_rules:
rotation_policy:
retention_alignment:
exception_process:
Benefits:
- Teams avoid hardcoding algorithms.
- Future profile replacement is supported.
- Auditors can see the rule version.
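A minimal sketch of Pattern A: callers ask for the active profile by purpose instead of naming an algorithm. The class and method names are hypothetical, the in-memory registry stands in for a policy-managed service, and the algorithm labels (`ECDSA-P256`, `ML-DSA-65`) are illustrative values, not a recommendation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CryptoProfile:
    profile_id: str
    purpose: str
    allowed_algorithms: Tuple[str, ...]
    rotation_days: int


class CryptoProfileService:
    """In-memory registry; a real service would be policy-managed and audited."""

    def __init__(self) -> None:
        self._profiles: Dict[str, CryptoProfile] = {}
        self._active_by_purpose: Dict[str, str] = {}

    def register(self, profile: CryptoProfile, activate: bool = False) -> None:
        # Old profiles stay registered so that old evidence can still be verified.
        self._profiles[profile.profile_id] = profile
        if activate:
            self._active_by_purpose[profile.purpose] = profile.profile_id

    def active_profile(self, purpose: str) -> CryptoProfile:
        return self._profiles[self._active_by_purpose[purpose]]

    def is_allowed(self, profile_id: str, algorithm: str) -> bool:
        return algorithm in self._profiles[profile_id].allowed_algorithms


PURPOSE = "long-lived evidence signing"
service = CryptoProfileService()
service.register(CryptoProfile("AI_HIGH_RISK_EVIDENCE_V1", PURPOSE, ("ECDSA-P256",), 365),
                 activate=True)
# Activating V2 changes what callers receive without touching caller code.
service.register(CryptoProfile("AI_HIGH_RISK_EVIDENCE_V2", PURPOSE, ("ECDSA-P256", "ML-DSA-65"), 365),
                 activate=True)

current = service.active_profile(PURPOSE)
print(current.profile_id)  # AI_HIGH_RISK_EVIDENCE_V2
```

The design choice worth noting is that business code only knows the purpose string. Swapping V1 for V2 is a registry change, and V1 remains resolvable by ID for verifying evidence signed under it.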
Pattern B: Evidence Manifest
Every AI evidence bundle carries a manifest.
manifest_id
case_id
artifact_list
hashes
signature_profile
timestamp_profile
model_version
prompt_version
retrieval_sources
tool_events
approval_events
human_review_events
verification_instructions
Benefits:
- Old evidence remains verifiable after migration.
- Supports legal hold, audit query and customer dispute.
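The manifest idea can be illustrated with a small builder and verifier that record the hash algorithm next to each digest, so a verifier years later knows how to recompute it. This sketch covers only a subset of the manifest fields, uses SHA-256 from the Python standard library as an assumed hash choice, and leaves out signing and timestamping entirely. Function names and sample values are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(manifest_id: str, case_id: str, artifacts: Dict[str, bytes],
                   signature_profile: str, model_version: str,
                   prompt_version: str) -> dict:
    """Build a manifest that stores algorithm metadata alongside each digest."""
    return {
        "manifest_id": manifest_id,
        "case_id": case_id,
        "artifact_list": sorted(artifacts),
        "hashes": {name: {"alg": "sha256", "digest": sha256_hex(data)}
                   for name, data in artifacts.items()},
        "signature_profile": signature_profile,
        "model_version": model_version,
        "prompt_version": prompt_version,
    }


def verify_artifacts(manifest: dict, artifacts: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Return (artifact, reason) pairs for every failed check; empty means verified."""
    failures = []
    for name in manifest["artifact_list"]:
        entry = manifest["hashes"][name]
        data = artifacts.get(name)
        if data is None:
            failures.append((name, "missing"))
        elif entry["alg"] != "sha256":
            failures.append((name, "unsupported hash algorithm"))
        elif sha256_hex(data) != entry["digest"]:
            failures.append((name, "hash mismatch"))
    return failures


artifacts = {"prompt.txt": b"Explain the fraud hold.", "output.txt": b"Your card was held because..."}
manifest = build_manifest("M-0001", "CASE-42", artifacts,
                          "AI_HIGH_RISK_EVIDENCE_V2", "model-2025-01", "prompt-v7")

print(verify_artifacts(manifest, artifacts))  # []
tampered = dict(artifacts, **{"output.txt": b"edited later"})
print(verify_artifacts(manifest, tampered))   # [('output.txt', 'hash mismatch')]
```

Storing `alg` per entry is what makes the manifest migration-friendly: a future profile can add entries with a different algorithm without invalidating the existing ones.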
Pattern C: Gateway-Mediated Vendor Calls
All high-risk model/RAG/tool calls go through a gateway.
The gateway is responsible for:
- certificate and protocol telemetry.
- approved vendor endpoint.
- token and signing profile.
- request/response hashing.
- data residency and retention tags.
- audit and evidence export.
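A few of these gateway responsibilities (approved endpoints, request/response hashing, residency and retention tags, an exportable audit record) can be sketched as a thin wrapper around whatever transport the platform uses. Everything here is hypothetical: the endpoint URL, the class name and the injected `transport` callable, which replaces a real HTTP client so the example runs without a network. Certificate and protocol telemetry is not modeled.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical allow-list; a real gateway would load this from policy.
APPROVED_ENDPOINTS = {"https://model-vendor.example/v1/generate"}


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class VendorGateway:
    def __init__(self, transport: Callable[[str, bytes], bytes], signing_profile: str) -> None:
        self._transport = transport
        self._signing_profile = signing_profile
        self.audit_log: List[Dict] = []

    def call(self, endpoint: str, payload: bytes, tags: Dict[str, str]) -> bytes:
        if endpoint not in APPROVED_ENDPOINTS:
            raise PermissionError(f"endpoint not approved: {endpoint}")
        response = self._transport(endpoint, payload)
        # Record hashes rather than raw content; the payload itself lives in the evidence store.
        self.audit_log.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "endpoint": endpoint,
            "signing_profile": self._signing_profile,
            "request_sha256": sha256_hex(payload),
            "response_sha256": sha256_hex(response),
            "tags": dict(tags),
        })
        return response


def fake_transport(endpoint: str, payload: bytes) -> bytes:
    """Stand-in for a vendor call so the sketch is self-contained."""
    return b"echo:" + payload


gateway = VendorGateway(fake_transport, "AI_HIGH_RISK_EVIDENCE_V2")
reply = gateway.call("https://model-vendor.example/v1/generate", b"hello",
                     {"residency": "eu", "retention": "7y"})
print(reply, len(gateway.audit_log))
```

Because the gateway is the single place that knows the signing profile and endpoint list, a crypto profile change becomes a gateway configuration change rather than an edit to every calling application.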
Pattern D: Dual Signature Period
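During the migration window, the same evidence may carry both a legacy signature and a new-profile signature.
Governance requirements:
- Define the dual-sign start and end dates.
- Define verification priority.
- Define the old profile sunset.
- Define escalation for failed verification.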
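The four governance requirements map onto a small amount of policy data plus two functions. This is a sketch of the control flow only. The "signatures" below are keyed HMACs from the Python standard library, used purely as stand-ins so the example runs; they are not digital signatures and not post-quantum algorithms. Profile names, keys and dates are hypothetical.

```python
import hmac
from datetime import date
from typing import Dict, Optional, Tuple

# Stand-in signers: HMAC is NOT a digital signature and NOT post-quantum.
PROFILES = {
    "LEGACY_V1": {"key": b"legacy-demo-key", "digest": "sha256"},
    "NEW_V2": {"key": b"new-demo-key", "digest": "sha512"},
}
DUAL_SIGN_START = date(2026, 1, 1)           # hypothetical window start
LEGACY_SUNSET = date(2028, 1, 1)             # hypothetical old-profile sunset
VERIFICATION_PRIORITY = ["NEW_V2", "LEGACY_V1"]


def _mac(profile: str, payload: bytes) -> str:
    p = PROFILES[profile]
    return hmac.new(p["key"], payload, p["digest"]).hexdigest()


def sign_evidence(payload: bytes, today: date) -> Dict[str, str]:
    """Legacy only before the window, both inside it, new profile only after sunset."""
    profiles = []
    if today < LEGACY_SUNSET:
        profiles.append("LEGACY_V1")
    if today >= DUAL_SIGN_START:
        profiles.append("NEW_V2")
    return {p: _mac(p, payload) for p in profiles}


def verify_evidence(payload: bytes, signatures: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Try profiles in priority order; anything that fails every check is escalated."""
    for profile in VERIFICATION_PRIORITY:
        sig = signatures.get(profile)
        if sig is not None and hmac.compare_digest(_mac(profile, payload), sig):
            return ("verified", profile)
    return ("escalate", None)


payload = b"approval-event:CASE-42"
sigs = sign_evidence(payload, date(2026, 6, 1))
print(sorted(sigs))                          # ['LEGACY_V1', 'NEW_V2']
print(verify_evidence(payload, sigs))        # ('verified', 'NEW_V2')
print(verify_evidence(b"tampered", sigs))    # ('escalate', None)
```

Evidence signed before the window carries only the legacy entry and still verifies through the second item in the priority list, which is the behavior an evidence replay test should confirm.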
Pattern E: PQC-Ready Procurement Clause
Procurement and contracting are not the last step. They are part of the migration capability.
Contract language should cover:
- crypto change notification.
- standards support roadmap.
- interoperability testing participation.
- evidence export.
- key ownership.
- audit right.
- exit and transition assistance.
8. Operating Model
| Role | Accountability |
|---|---|
| Executive Sponsor | Funding, risk appetite, cross-functional priorities |
| CISO / Security Architecture | PQC strategy, crypto standards, risk acceptance |
| Cryptography Engineering | algorithm profile, KMS/HSM, cert, signing implementation |
| Enterprise Architecture | target architecture, dependency map, transition roadmap |
| AI Platform Owner | model/RAG/tool gateway, developer path, platform telemetry |
| Product Owner | use case priority, customer impact, release tradeoff |
| Records / Evidence Owner | retention, legal hold, archive verification |
| Model Risk | AI evidence requirements, validation impact, change trigger |
| Third-Party Risk | vendor readiness, contract clauses, exit plan |
| Legal / Compliance | regulatory applicability, record and disclosure considerations |
| Internal Audit | control design review, evidence sufficiency, independent testing |
RACI sketch:
| Activity | Product | Platform | Security | EA | Legal/Compliance | TPRM | Audit |
|---|---|---|---|---|---|---|---|
| AI crypto inventory | R | R | A | C | C | C | I |
| Risk tiering | R | C | A | R | C | C | I |
| Target architecture | C | R | A | R | I | C | I |
| Vendor readiness | C | C | C | I | C | A/R | I |
| Migration release | A/R | R | A/R | C | C | C | I |
| Evidence assurance | C | R | C | C | C | I | A/R |
9. Implementation Roadmap
First 30 Days: Frame and Scope
Deliverables:
- executive problem statement.
- AI portfolio scope list.
- high-risk use case shortlist.
- source system and vendor list.
- initial data/evidence lifetime matrix.
- governance cadence and owners.
Practical focus:
Do not attempt full enterprise perfection.
Start with customer-facing AI, agentic AI and long-lived evidence.
Days 31-60: Discover and Classify
Deliverables:
- AI crypto inventory v1.
- certificate/key/signature/protocol scan results.
- vendor readiness questionnaire.
- quantum-vulnerable dependency heatmap.
- unknown ownership list.
Practical focus:
- combine automated scanning with owner attestation.
- include logs, archives, RAG, vector stores and observability data.
- identify hard-to-migrate SDKs and mobile/edge components early.
Days 61-90: Design and Decide
Deliverables:
- target crypto-agile architecture.
- crypto profile model.
- evidence manifest design.
- migration wave plan.
- exception and residual risk process.
- pilot candidate selection.
Practical focus:
- choose one high-value but bounded pilot.
- include evidence replay, not just connection success.
Days 91-180: Pilot and Prove
Deliverables:
- pilot implementation.
- latency / throughput benchmark.
- interoperability report.
- evidence replay report.
- operations runbook.
- rollback exercise.
- lessons learned.
Practical focus:
- test customer channel impact.
- test archive verification.
- test vendor support path.
- test observability and incident response.
6-12 Months: Scale by Risk Tier
Deliverables:
- wave rollout dashboard.
- vendor contract updates.
- developer golden path.
- monthly crypto posture review.
- board/risk committee update.
- audit-ready evidence binder.
Practical focus:
- make crypto readiness a standard AI release gate.
- close expired exceptions.
- refresh readiness when NIST, vendors or protocols change.
10. Evidence Pack
An audit-ready AI PQC evidence pack should include:
1. Scope statement
2. AI use case inventory extract
3. Data and evidence lifetime classification
4. Crypto dependency graph
5. Quantum-vulnerable algorithm register
6. Vendor readiness scorecards
7. Target architecture and ADRs
8. Migration wave plan
9. Pilot and interoperability test results
10. Evidence replay and verification report
11. Release approvals and exceptions
12. Rollback and incident runbook
13. Post-migration monitoring dashboard
14. Open risks and management actions
Evidence quality checks:
- Can every high-risk use case be traced to a business owner?
- Can every long-lived evidence object be traced to a crypto profile?
- Can every critical vendor dependency be traced to a readiness answer?
- Can every exception be traced to an expiry date and compensating control?
- Can every migrated evidence sample be verified after migration?
11. Metrics and Dashboard
Executive Metrics
| Metric | Use |
|---|---|
| High-risk AI use cases inventoried | Shows scope control |
| Long-lived data/evidence exposure count | Shows urgency and priority |
| Critical vendors with PQC roadmap | Shows supply-chain readiness |
| Migration wave completion | Shows delivery progress |
| Open high-risk exceptions | Shows residual risk |
Architecture Metrics
| Metric | Use |
|---|---|
| Algorithm metadata completeness | Measures crypto-agile evidence readiness |
| Gateway coverage for model/RAG/tool calls | Measures central control adoption |
| Evidence replay success rate | Measures audit defensibility |
| Unknown crypto dependency count | Measures discovery maturity |
| Hardcoded crypto usage count | Measures technical debt |
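Several of the architecture metrics are simple ratios or counts over records the platform should already hold. The sketch below assumes hypothetical record shapes (`verified`, `via_gateway`, `algorithm`, `hardcoded`); the real field names depend on how replay results, call telemetry and the crypto inventory are stored.

```python
from typing import Dict, List


def ratio(numerator: int, denominator: int) -> float:
    """Return 0.0 for an empty population instead of dividing by zero."""
    return numerator / denominator if denominator else 0.0


def architecture_metrics(replays: List[dict], calls: List[dict],
                         dependencies: List[dict]) -> Dict[str, float]:
    return {
        "evidence_replay_success_rate":
            ratio(sum(1 for r in replays if r["verified"]), len(replays)),
        "gateway_coverage":
            ratio(sum(1 for c in calls if c["via_gateway"]), len(calls)),
        # A dependency with no recorded algorithm counts as unknown.
        "unknown_crypto_dependency_count":
            sum(1 for d in dependencies if d["algorithm"] is None),
        "hardcoded_crypto_usage_count":
            sum(1 for d in dependencies if d.get("hardcoded", False)),
    }


# Illustrative sample data only.
replays = [{"verified": True}, {"verified": True}, {"verified": True}, {"verified": False}]
calls = [{"via_gateway": True}, {"via_gateway": False}]
dependencies = [
    {"algorithm": "RSA-2048", "hardcoded": True},
    {"algorithm": None},
    {"algorithm": "ECDSA-P256"},
]

metrics = architecture_metrics(replays, calls, dependencies)
print(metrics)
```

With this sample data the replay success rate is 0.75 and gateway coverage is 0.5, with one unknown and one hardcoded dependency.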
Product and Operations Metrics
| Metric | Use |
|---|---|
| Login/API/tool latency change | Tracks customer and workflow impact |
| Failed authentication or signature verification | Detects migration defects |
| Customer complaints linked to migration | Tracks conduct and experience impact |
| Operational incidents during wave | Tracks readiness |
| Rollback time | Measures recoverability |
12. Checklists
12.1 Product Intake Checklist
- The use case has a named business owner.
- Customer, employee, operational and regulatory impact are classified.
- Data lifetime and evidence lifetime are documented.
- RAG, tool, model, archive and observability vendors are listed.
- Signed artifacts and approval tokens are identified.
- High-risk customer workflows have fallback and rollback considerations.
- Crypto owner reviewed dependency assumptions.
12.2 Architecture Review Checklist
- Crypto dependencies are not embedded directly in business logic.
- Model/RAG/tool calls use approved gateways where feasible.
- Algorithm metadata is captured with evidence objects.
- Evidence manifest and verification profile are designed.
- Vendor endpoints and SDKs are approved.
- Latency and compatibility risks are measured.
- Rollback and dual-verification periods are explicit.
12.3 Vendor Readiness Checklist
- Vendor describes current cryptographic algorithms and protocols.
- Vendor has PQC or crypto agility roadmap.
- Vendor supports export of evidence and logs.
- Vendor supports customer-managed keys where required.
- Vendor provides change notification for cryptographic profiles.
- Vendor participates in interoperability testing or provides evidence.
- Contract includes exit and transition support.
12.4 Evidence Migration Checklist
- Sample evidence bundles are selected across risk tiers.
- Old signatures and timestamps verify before migration.
- New or dual signatures verify after migration.
- Legal hold objects remain untouched or controlled.
- Archive indexes and search remain accurate.
- Verification instructions are updated.
- Audit trail records who migrated what and when.
13. Anti-Patterns
| Anti-pattern | Why it fails |
|---|---|
| “PQC is only infrastructure” | AI evidence, RAG, tool calls and vendor logs sit above infrastructure |
| “We will wait until vendors solve it” | Vendor readiness is a managed dependency, not a strategy |
| “Scan certificates and call it done” | Keys, signatures, archives, tokens, SDKs and evidence are missing |
| “Migrate everything at once” | Risk tiers, interoperability and rollback need staged waves |
| “No customer impact expected” | Auth, latency, mobile SDKs and payment/tool workflows can affect customers |
| “Old evidence does not matter” | Complaints, audits, legal hold and disputes often rely on old evidence |
| “Algorithm choice is the roadmap” | Discovery, ownership, testing, contracts and operating model are the roadmap |
14. Workshop Design
Workshop 1: AI PQC Exposure Mapping
Participants:
- Product owner.
- AI platform owner.
- Security architect.
- Records/evidence owner.
- TPRM.
- Compliance/legal partner.
Agenda:
- Select top 10 AI use cases.
- Map data and evidence lifetime.
- Identify model/RAG/tool/archive vendors.
- Identify signed artifacts and approval tokens.
- Mark unknown crypto dependencies.
- Assign owners and next actions.
Output:
- first AI PQC heatmap.
- unknowns list.
- wave candidate shortlist.
Workshop 2: Evidence Replay Drill
Agenda:
- Pick one customer-facing AI case.
- Export prompt/retrieval/tool/output/approval evidence.
- Verify signatures and hashes.
- Simulate crypto profile change.
- Verify evidence after simulated migration.
- Document gaps.
Output:
- evidence replay report.
- archive and signature gaps.
- release gate updates.
Workshop 3: Vendor Readiness Review
Agenda:
- Rank vendors by AI criticality.
- Review crypto roadmap and support.
- Review export and exit capability.
- Identify contract gaps.
- Define remediation and escalation.
Output:
- vendor readiness scorecard.
- contract addendum backlog.
- exit risk list.
15. Portfolio Prioritization Formula
A practical score:
PQC priority =
data confidentiality lifetime
+ evidence integrity lifetime
+ customer/regulatory impact
+ agent side-effect severity
+ vendor lock-in
+ migration complexity
- compensating controls
Examples:
| Use case | Priority reason |
|---|---|
| AI fraud hold explanation | customer harm, complaints, long-lived evidence, regulated workflow |
| Agent-assisted payment exception | side effect, authorization token, audit trail, customer dispute |
| RAG for KYC investigations | sensitive customer data, long retention, law enforcement/regulatory evidence |
| Internal HR summary bot | employee data, but no direct customer side effect; still needs HR/legal review |
| Marketing copy assistant | content provenance and approved claims, but lower data confidentiality |
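The scoring formula in this section can be sketched as a function over factor ratings. The 0-5 scale, the equal weighting and the two sets of ratings below are assumptions made for illustration; the source formula does not fix a scale or weights, and real ratings would come from the workshop and inventory.

```python
from dataclasses import dataclass


@dataclass
class PriorityInputs:
    data_confidentiality_lifetime: int
    evidence_integrity_lifetime: int
    customer_regulatory_impact: int
    agent_side_effect_severity: int
    vendor_lock_in: int
    migration_complexity: int
    compensating_controls: int


def pqc_priority(x: PriorityInputs) -> int:
    """Sum the six exposure factors and subtract compensating controls.
    Each factor is rated 0-5 (an assumed scale)."""
    for name, value in vars(x).items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} must be between 0 and 5, got {value}")
    return (x.data_confidentiality_lifetime
            + x.evidence_integrity_lifetime
            + x.customer_regulatory_impact
            + x.agent_side_effect_severity
            + x.vendor_lock_in
            + x.migration_complexity
            - x.compensating_controls)


# Illustrative ratings only, loosely following two rows of the example table.
payment_exception = PriorityInputs(4, 5, 5, 5, 3, 3, 2)
marketing_copy = PriorityInputs(1, 2, 2, 0, 2, 1, 1)

print(pqc_priority(payment_exception))  # 23
print(pqc_priority(marketing_copy))     # 7
```

The absolute numbers carry little meaning. The score is useful for ranking use cases into migration waves and for making the reasoning behind that ranking explicit and reviewable.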
16. Interview Drill
Question:
How would you prepare an AI platform for post-quantum cryptography?
Senior-level answer:
I would not start with algorithm selection. I would start with an AI crypto inventory that links use cases, data lifetime, evidence lifetime, RAG sources, tool invocations, signatures, keys, certificates, vendors and archives. Then I would prioritize use cases with long-lived confidential data, long-lived evidence integrity, agentic side effects and customer-facing regulated workflows. Architecturally, I would use crypto profiles, central signing/KMS services, gateway-mediated model/RAG/tool calls, evidence manifests and dual-verification periods so that cryptographic profiles can change without rewriting business workflows or breaking audit evidence.
Follow-up questions:
| Follow-up | Answer direction |
|---|---|
| What makes AI different from ordinary PQC migration? | AI creates prompt/retrieval/tool/output/eval evidence that may need long-term confidentiality and integrity |
| What would you ask vendors? | Current algorithms, PQC roadmap, export format, key ownership, change notice, interoperability testing, exit support |
| How do you prevent migration from breaking evidence? | Evidence manifest, algorithm metadata, verification profile versioning, replay tests, dual-signature period |
| How do you prioritize? | Long-lived data, long-lived evidence, customer impact, agent side effects, vendor lock-in and migration complexity |
| How do you report to executives? | Portfolio exposure, critical vendors, migration waves, exceptions, customer impact and residual risk |
17. Minimum Viable Artifact Set
For portfolio or interview demonstration, create:
| Artifact | Purpose |
|---|---|
| AI PQC Heatmap | Shows business-prioritized exposure |
| AI Crypto Inventory Sample | Shows concrete dependency mapping |
| Evidence Manifest Template | Shows audit and records thinking |
| Vendor Readiness Scorecard | Shows supply-chain governance |
| Crypto-Agile ADR | Shows architecture decision quality |
| Migration Wave Roadmap | Shows execution planning |
| Evidence Replay Report | Shows assurance mindset |
| Executive One-Pager | Shows product/architecture translation |
18. Closing Principle
Post-quantum readiness for AI is not a fear-based program. It is an architecture hygiene program for systems that create high-value, long-lived, AI-mediated evidence.
The strongest product architecture answer is:
We know what AI evidence we create.
We know how long it matters.
We know what cryptography protects it.
We know which vendors and platforms constrain it.
We know how to migrate without losing verification.
That is cryptographic agility in a financial retail AI environment.