
AI Post-Quantum: Cryptographic Agility and Migration Architecture


AI Post-Quantum / Cryptographic Agility / Long-Lived Evidence Architecture: A Reading Guide

Audience: Advanced AI PM / Senior BA / Product Architect / Enterprise Architect / Security Architect / Platform Owner / Model Risk / Operational Risk / Third-Party Risk / Records and Evidence Owner.

Core question: when retail financial AI systems depend on TLS, JWTs, signatures, keys, encrypted archives, RAG corpora, agent tool calls and long-lived evidence chains, how does an organization establish post-quantum cryptography migration and a cryptographic agility architecture before quantum risk actually materializes?

Learning objectives: build architecture capability across AI crypto inventory, quantum-vulnerable dependency map, long-lived evidence risk classification, crypto-agile design patterns, vendor readiness gate, migration roadmap, runtime evidence protection and a board-ready risk narrative.

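Important note: this article is learning, architecture-training and portfolio material. It does not constitute cryptographic implementation advice, legal opinion, regulatory opinion, information security certification opinion, vendor procurement advice or a production migration plan. A real project must be jointly confirmed by Security Architecture, Cryptography Engineering, CISO, Enterprise Architecture, Platform Engineering, Legal, Privacy, Records, Model Risk, Third-Party Risk, Compliance, Procurement, Internal Audit and the business owner. Applicability depends on the business scenario, data lifetime, regulatory retention requirements, protocol stack, vendor products, hardware lifecycle, certificate infrastructure, key management, compliance requirements and institutional policy.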

Source Anchors

Source	Link	Use
NIST Post-Quantum Cryptography project	https://www.nist.gov/pqcrypto	Primary anchor: the NIST PQC project, the 2024 FIPS 203/204/205 standards and the migration direction
NIST CSRC PQC project	https://csrc.nist.gov/projects/post-quantum-cryptography	Organizes the technical path around PQC standards, migration to PQC and the ongoing standardization process
NIST PQC Standardization	https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization	Uses the standardization process, candidate algorithms and follow-on standardization to show that algorithm selection is not a one-time event
NIST NCCoE Migration to PQC	https://www.nccoe.nist.gov/crypto-agility-considerations-migrating-post-quantum-cryptographic-algorithms	Organizes enterprise migration around cryptographic discovery, crypto inventory, interoperability testing and the migration roadmap
CISA Post-Quantum Cryptography Initiative	https://www.cisa.gov/topics/risk-management/quantum	Supports the management narrative with quantum readiness, critical infrastructure and supply chain readiness language
CISA / NSA / NIST Quantum-Readiness factsheet	https://www.cisa.gov/resources-tools/resources/quantum-readiness-migration-post-quantum-cryptography	Anchors migration preparation in cryptographic inventory, vendor engagement and supply chain assessment
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Connects PQC migration to AI risk management through Govern / Map / Measure / Manage
ISO/IEC 42001 overview	https://www.iso.org/standard/42001	Organizes the operating model around AI management system, policy, operation, performance evaluation and continual improvement

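In one sentence:

The core of post-quantum AI architecture is not predicting when quantum computing will mature. It is making the cryptographic dependencies, long-lived evidence, signatures, keys, vendors and protocols inside AI systems discoverable, classifiable, replaceable, verifiable and auditable.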

1. Thesis

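Post-quantum risk in AI systems does not sit only with the infrastructure network team.

An ordinary crypto migration usually asks:

Where do we use RSA, ECC, TLS and certificates?

An AI architecture has to keep asking: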

Which AI decisions, prompts, retrieval records, tool invocations,
approval tokens, evidence archives, provenance signatures and customer records
must remain confidential, authentic or defensible for years?

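In retail financial AI, crypto agility is not a pure security engineering topic. It directly affects:

Long-term confidentiality of customer data.
Verifiability of the AI decision evidence chain.
Protection of RAG corpora and embedding assets.
Identity, authorization and non-repudiation for agent tool calls.
Signature integrity of model, prompt, policy and evaluation packages.
Credibility of evidence in legal hold, regulatory examination, audit forensics and customer complaint scenarios.
Migration pace of vendor platforms, cloud services, HSM/KMS, certificate authorities, API gateways and client SDKs.

The value of a senior PM / BA / Architect is not to replace cryptographers in choosing algorithms. Your value is to organize business data lifetime, AI evidence lifetime, protocol dependencies, vendor dependencies, customer impact and migration order into an executable product and architecture roadmap.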

2. Why It Matters

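Post-quantum migration has an easily underestimated problem: an attacker can collect encrypted data today and decrypt it later. This is commonly called harvest now, decrypt later risk.

For AI systems, this risk is amplified:

AI asset	Why it relates to PQC	Retail financial impact
Prompt and conversation logs	Contain customer identity, financial situation, complaints, hardship, fraud leads and adviser recommendations	Long-term disclosure creates privacy, scam, complaint and regulatory risk
RAG corpora	Contain policies, customer files, contracts, claims, loans, KYC and employee knowledge bases	Corpora may be retained for years and may be copied across vendors
Vector indexes	Embeddings can leak semantic structure and sensitive associations	Not plaintext, but may still create re-identification risk
Tool invocation logs	Record actions an agent executed on behalf of a customer or employee	Affects authorization, audit, dispute handling and non-repudiation
Model and prompt package signatures	Prove that the released, approved and running versions match	A failed signature algorithm migration weakens the evidence chain
Customer decision evidence	Supports adverse action, complaint replies, fraud freezes, redemptions and fee adjustments	Must stay verifiable and explainable for years
Content provenance	Proves AI-generated content, approved claims and customer communication versions	Affects misleading-statement exposure and content liability boundaries
Vendor APIs and SDKs	Cryptographic capability is set by the vendor roadmap	Cannot be migrated internally alone; needs a contract and readiness gate

NIST published its principal PQC standards in 2024: FIPS 203 (ML-KEM, key encapsulation), FIPS 204 (ML-DSA, digital signatures) and FIPS 205 (SLH-DSA, digital signatures). NIST also stresses that organizations need to identify quantum-vulnerable algorithms, plan migration and update products and services. For an enterprise architect, the point is not to memorize algorithm names but to translate these changes into portfolio impact.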

3. Architecture Model

Reference architecture:

AI portfolio and data inventory
  -> crypto discovery and dependency scan
  -> AI evidence and data lifetime classification
  -> quantum-vulnerable algorithm mapping
  -> vendor / protocol / product readiness assessment
  -> crypto-agile target architecture
  -> pilot migration and interoperability testing
  -> release gates, rollback and evidence validation
  -> production migration by risk tier
  -> continuous crypto posture monitoring

AI-specific control plane:

AI application
  -> orchestration / agent runtime
  -> model gateway
  -> RAG / vector / knowledge layer
  -> tool/API gateway
  -> policy and approval service
  -> observability and evidence plane
  -> records / archive / legal hold store
  -> KMS/HSM/certificate/signature services

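Crypto agility means each of these layers can answer:

Which algorithms, key lengths, certificate chains, signature schemes and protocol versions are in use today?
Which calls or records depend on quantum-vulnerable public-key cryptography such as RSA or ECC?
Which data must be protected for 7 years, 10 years, the account lifecycle, the litigation lifecycle or longer?
Which evidence must still be provably untampered in the future?
Which vendors, SDKs, edge devices, mobile apps, browsers, HSMs or hardware constrain migration?
Which systems can go hybrid / dual-stack first, and which must wait for ecosystem compatibility?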

4. Crypto Inventory for AI Systems

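A traditional crypto inventory covers certificates, keys, algorithms and protocols. An AI crypto inventory also has to treat AI artifacts as inventory objects.

Inventory object	Key fields	AI architecture meaning
Data object	sensitivity, retention, jurisdiction, purpose, customer segment	Determines whether long-term confidentiality risk exists
Evidence object	prompt, retrieval, tool, approval, output, human review, policy version	Determines whether long-term integrity and non-repudiation are required
Model artifact	model id, weights source, adapter, prompt package, eval package, approval signature	Determines whether signatures, checksums and supply chain attestations need migration
RAG object	corpus, chunk, embedding, index, metadata, access policy, refresh cycle	Determines how encryption, signing and deletion evidence cover knowledge assets
Tool/API object	endpoint, scope, auth method, side effect, idempotency, audit token	Determines how the identity and authorization of agent calls migrate
Certificate object	issuer, subject, algorithm, expiry, chain, renewal process	Surfaces quantum-vulnerable certificate chains and lifecycle risk
Key object	algorithm, owner, KMS/HSM, rotation, backup, escrow, usage	Supports replacement, rotation, destruction and compliance evidence
Vendor object	product, crypto roadmap, PQC support, contract clause, exit option	Brings supply chain migration into the product roadmap

A mature inventory is more than a set of CMDB fields. It connects: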

business process -> AI use case -> data/evidence lifetime -> cryptographic dependency -> vendor readiness -> migration wave
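The chain above can be sketched as a linked record that is queryable from either end. This is a minimal illustration only: the `InventoryLink` class, its field names, the 5-year threshold and the sample rows are all assumptions made for the example, not part of any standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryLink:
    """One path through the chain; every field name here is illustrative."""
    business_process: str
    ai_use_case: str
    lifetime_years: int       # data/evidence lifetime
    crypto_dependency: str    # e.g. an RSA-signed archive
    vendor_ready: bool        # result of a vendor readiness assessment
    migration_wave: int


def blocked_links(links, min_lifetime_years=5):
    """Long-lived links whose vendor is not ready, earliest wave first."""
    hits = [
        link for link in links
        if link.lifetime_years >= min_lifetime_years and not link.vendor_ready
    ]
    return sorted(hits, key=lambda link: link.migration_wave)


# Hypothetical sample rows.
links = [
    InventoryLink("lending", "credit-explanation", 7, "RSA-2048 JWT signing key", False, 2),
    InventoryLink("knowledge", "internal-search", 1, "ECDSA P-256 TLS certificate", True, 3),
    InventoryLink("complaints", "complaint-reply-assistant", 7, "RSA-2048 archive signature", False, 1),
]

for link in blocked_links(links):
    print(link.migration_wave, link.ai_use_case, "<-", link.crypto_dependency)
```

Keeping the whole chain in one record is a deliberate simplification; a real inventory would more likely join separate process, use case, dependency and vendor tables.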

5. Risk Tiering

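Do not order the migration by system name. Order it by risk tier.

Tier	Criteria	Migration priority
Tier 1: Long-lived confidential data	Customer identity, financial, health, non-public business information, KYC and fraud evidence with a confidentiality lifetime of several years or more	Prioritize discovery, vendor assessment and target architecture
Tier 2: Long-lived evidence integrity	Decision evidence, customer communications, approval records, model versions and audit packs that must stay verifiable for years	Prioritize signatures, timestamps, hash chains and archive strategy
Tier 3: High-value agent action	The agent can execute payments, freezes, account opening, credit limits, claims, trades or customer commitments	Prioritize tool authorization, approval tokens and non-repudiation mechanisms
Tier 4: Regulated customer-facing AI	Chat, recommendation, disclosure, complaints, sales, suitability and credit explanation	Prioritize content evidence, claim provenance and runtime log protection
Tier 5: Internal productivity AI	Summarization, retrieval, drafting and knowledge Q&A with no direct effect on customer decisions	Can follow the shared platform migration cadence

The key concepts here are "data lifetime" and "evidence lifetime".

A model call in 2026 looks instantaneous, but the evidence it produces may first be opened in 2032 during a customer dispute, audit, legal hold or insurance claim. The architecture must guarantee that it can still prove, at that point:

Which model and prompt generated the output.
Which retrieved materials were used.
Which policy gate allowed it through.
Which employee reviewed or overrode it.
Which certificate, signature, hash and timestamp prove the material was not tampered with.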
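As a rough sketch of how the tier table could drive migration order, the function below maps four yes/no attributes of a use case to a tier number. The attribute names, the rule that the most urgent matching tier wins, and the sample use cases are assumptions for illustration; a real classification would be agreed with risk and records owners.

```python
def risk_tier(*, long_lived_confidential=False, long_lived_evidence=False,
              agent_side_effect=False, regulated_customer_facing=False):
    """Return the tier number (1 = migrate first) for one AI use case.

    Assumption: when several criteria match, the lowest tier number wins.
    """
    if long_lived_confidential:
        return 1
    if long_lived_evidence:
        return 2
    if agent_side_effect:
        return 3
    if regulated_customer_facing:
        return 4
    return 5


# Hypothetical use cases and attributes.
use_cases = {
    "internal-summarizer": {},
    "payment-agent": {"agent_side_effect": True},
    "complaint-reply-assistant": {"long_lived_evidence": True,
                                  "regulated_customer_facing": True},
    "kyc-document-rag": {"long_lived_confidential": True},
}

migration_order = sorted(use_cases, key=lambda name: risk_tier(**use_cases[name]))
print(migration_order)
```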

6. AI-Specific Migration Surfaces

6.1 Model Gateway and TLS

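The model gateway connects internal applications, external model vendors, embedding services and evaluation services. It typically depends on TLS, mTLS, certificate chains, API tokens and client SDKs.

Key design points:

Build certificate and algorithm telemetry so that algorithms are not discovered only when a certificate expires.
Issue a PQC readiness questionnaire to external model vendors.
Maintain an approved endpoint list for high-risk AI use cases.
Test hybrid / PQC-capable TLS, or an alternative vendor path, in non-production environments.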
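Certificate and algorithm telemetry becomes useful once each observed algorithm name is bucketed. The sketch below assumes telemetry arrives as simple records with an `endpoint` and an `algorithm` string; the record shape, the prefix lists and the endpoint names are illustrative assumptions, and the name matching is deliberately naive rather than a complete registry.

```python
# Assumption: algorithm names are free-text labels emitted by gateway telemetry.
CLASSICAL_PREFIXES = ("RSA", "ECDSA", "ECDH", "DH", "DSA", "ED25519", "X25519")
PQC_TAGS = ("ML-KEM", "MLKEM", "ML-DSA", "MLDSA", "SLH-DSA", "SLHDSA")


def classify_algorithm(name):
    """Bucket one algorithm label; anything unrecognized goes to manual review."""
    upper = name.upper()
    has_pqc = any(tag in upper for tag in PQC_TAGS)
    has_classical = upper.startswith(CLASSICAL_PREFIXES)
    if has_pqc and has_classical:
        return "hybrid"
    if has_pqc:
        return "pqc"
    if has_classical:
        return "quantum-vulnerable"
    return "needs-review"


def vulnerable_endpoints(telemetry):
    """Endpoints whose observed algorithm is classical public-key only."""
    return sorted(
        row["endpoint"] for row in telemetry
        if classify_algorithm(row["algorithm"]) == "quantum-vulnerable"
    )


# Hypothetical telemetry rows.
telemetry = [
    {"endpoint": "model-gateway.internal", "algorithm": "ECDSA-P256"},
    {"endpoint": "embedding-vendor.example", "algorithm": "RSA-2048"},
    {"endpoint": "pilot-gateway.internal", "algorithm": "X25519MLKEM768"},
    {"endpoint": "eval-service.internal", "algorithm": "ML-DSA-65"},
]

print(vulnerable_endpoints(telemetry))
```

Symmetric algorithms such as AES fall into `needs-review` here on purpose: the sketch only separates public-key families and leaves everything else to a human.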

6.2 RAG and Vector Stores

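The RAG layer usually holds the most sensitive business knowledge and customer context.

Distinguish between:

source document encryption.
chunk store encryption.
vector index encryption.
metadata encryption.
retrieval trace signing.
deletion and retention evidence.

Migrating only database encryption at rest while ignoring the embedding pipeline, index backups, retrieval traces and cross-region replication still leaves much of the risk surface exposed.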

6.3 Agent Tool Invocation

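The risk in agent tool invocation is not simply that "the model says something wrong" but that "the model triggered an action with side effects".

Points relevant to PQC / crypto agility:

tool request signing.
approval token signing.
delegated authorization claims.
replay protection.
non-repudiation evidence.
side-effect event hash.
tamper-evident audit trail.

When signature algorithms migrate in the future, old evidence must not suddenly become unverifiable, and the verification rules for old and new signatures must not become confused at audit time.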
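Two of the points above, tool request signing and replay protection, can be sketched with the Python standard library. Important assumption: HMAC-SHA256 is used here only as a runnable stand-in. It is a symmetric MAC and does not provide non-repudiation; a real design would call an asymmetric signing service whose algorithm can be swapped. The envelope layout, `ToolGateway` class, key id and tool names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(body):
    """Deterministic byte encoding so signer and verifier hash the same bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_tool_request(key, key_id, request, nonce):
    # The algorithm label and key id travel with the signature.
    body = {"alg": "HMAC-SHA256", "key_id": key_id, "nonce": nonce, "request": request}
    sig = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "sig": sig}


class ToolGateway:
    """Accepts a request only if the MAC verifies and the nonce is unseen."""

    def __init__(self, keys):
        self._keys = keys
        self._seen_nonces = set()

    def accept(self, envelope):
        body = {k: v for k, v in envelope.items() if k != "sig"}
        key = self._keys.get(body.get("key_id"))
        if key is None or body.get("alg") != "HMAC-SHA256":
            return False
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, str(envelope.get("sig", ""))):
            return False
        nonce = body.get("nonce")
        if nonce is None or nonce in self._seen_nonces:
            return False  # replayed or malformed request
        self._seen_nonces.add(nonce)
        return True


demo_key = b"demo-key-not-for-production"
gateway = ToolGateway({"k1": demo_key})
envelope = sign_tool_request(demo_key, "k1", {"tool": "freeze_card", "case_id": "C-1"}, "n-001")
print(gateway.accept(envelope), gateway.accept(envelope))
```

Because `alg` and `key_id` are inside the signed body, a later migration can add a new algorithm label without making the stored envelopes ambiguous.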

6.4 AI Evidence Archive

Long-term evidence archiving is the most easily underestimated area.

A typical evidence pack:

case id
customer segment
prompt hash and redacted prompt
retrieval source ids
model id and version
policy id and version
tool invocation ids
approval and human review events
output hash
customer communication version
signature and timestamp metadata
retention and legal hold state

Crypto agility requires the evidence archive to support:

algorithm metadata preserved with every signature.
signature verification profile versioning.
re-signing or timestamp renewal policy.
hash agility where needed.
migration audit log.
legal hold safe migration.
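Hash agility and the migration audit log can be illustrated with a small sketch. It assumes each evidence object stores the hash algorithm name next to the digest, and that migration re-hashes only after the old digest verifies. The function names and record layout are hypothetical, and a plain hash proves integrity only; a real archive would also carry signatures and trusted timestamps.

```python
import hashlib


def seal(content, algorithm):
    """Record the digest together with the algorithm that produced it."""
    return {"algorithm": algorithm, "digest": hashlib.new(algorithm, content).hexdigest()}


def verify(content, seal_record):
    """Recompute with the algorithm named in the record, not a hard-coded one."""
    recomputed = hashlib.new(seal_record["algorithm"], content).hexdigest()
    return recomputed == seal_record["digest"]


def migrate_seal(content, old_seal, new_algorithm, migration_log):
    """Re-hash under a new algorithm, keeping the old seal in the audit log."""
    if not verify(content, old_seal):
        raise ValueError("refusing to migrate evidence that fails verification")
    new_seal = seal(content, new_algorithm)
    migration_log.append({"from": old_seal, "to": new_seal})
    return new_seal


evidence = b"case=C-1;output_hash=abc123;policy=v7"  # hypothetical evidence bytes
migration_log = []
old_seal = seal(evidence, "sha256")
new_seal = migrate_seal(evidence, old_seal, "sha3_256", migration_log)
print(new_seal["algorithm"], verify(evidence, new_seal), len(migration_log))
```

Verifying before migrating is the design choice that matters: evidence that already fails verification should be escalated, not silently re-sealed.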

6.5 Content Provenance

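AI-generated content, customer communications, marketing claims, knowledge base answers and agent-generated documents may rely on content provenance.

The question is not just "is there a watermark?". What matters more:

Whether the provenance manifest is signed.
Whether the signature algorithm can be migrated.
Whether the version the customer saw matches the version in the archive.
Whether later edits, human approvals and channel delivery preserve the chain.
Whether the vendor format can be exported and verified long term.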

7. Product and Architecture Decisions

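A senior PM / Architect has to turn PQC migration into explicit decisions rather than a vague "the security team will handle it".

Decision	Key question	Output
Scope boundary	Which AI use cases enter PQC readiness wave 1?	AI crypto scope register
Data lifetime	Which data carries harvest-now-decrypt-later risk?	Data/evidence lifetime matrix
Evidence strategy	Which evidence must stay verifiable long term?	Evidence signing and timestamp policy
Vendor dependency	Which AI vendor, KMS, HSM, API gateway or archive vendor is a blocker?	Vendor PQC readiness scorecard
Migration mode	Direct replacement, hybrid, dual-stack, gateway abstraction, or wait for the protocol ecosystem?	Migration ADR
Release gate	What evidence proves the migration did not break business or audit?	PQC release evidence pack
Customer impact	Does migration affect login, authorization, payments, file upload, mobile, latency or availability?	Customer and operations impact assessment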

8. Control Matrix

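Control	Objective	Evidence
AI crypto inventory	Discover the algorithms, keys, certificates, signatures, encrypted storage and vendor dependencies used by AI systems	Inventory extract, scan result, owner attestation
Long-lived data classification	Identify data and evidence that need long-term confidentiality or long-term integrity	Data lifetime matrix, retention policy mapping
Quantum-vulnerable dependency map	Map public-key dependencies such as RSA/ECC to business processes and AI use cases	Dependency graph, risk tier heatmap
Vendor PQC readiness gate	Confirm each critical vendor's roadmap, support scope, test plan and exit path	Questionnaire, contract addendum, roadmap evidence
Crypto-agile abstraction	Reduce the impact of algorithm replacement on business code	ADR, architecture diagram, interface spec
Evidence re-verification	Ensure evidence remains verifiable after migration	Verification report, sample replay, audit sign-off
Migration release gate	Prevent migration from breaking authentication, authorization, logging, audit and business processes	Test results, rollback plan, exception log
Continuous crypto posture	Continuously discover new certificates, new algorithms, new vendors and expired exceptions	Dashboard, KRI, monthly attestation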

9. Crypto-Agile Design Patterns

Pattern 1: Algorithm Metadata Everywhere

Every signature, encrypted object, token, certificate, hash and timestamp must retain its algorithm metadata.

artifact_id
artifact_type
algorithm
key_id
signature_profile
created_at
verification_profile
rotation_policy
retention_class

Without this metadata, there is no safe way to decide later how to verify, migrate or re-sign.
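A simple completeness check over the nine fields listed above might look like the following. The field names come from the list; the record values, the rule that an empty value counts as missing, and the `completeness` helper are illustrative assumptions.

```python
REQUIRED_FIELDS = (
    "artifact_id", "artifact_type", "algorithm", "key_id", "signature_profile",
    "created_at", "verification_profile", "rotation_policy", "retention_class",
)


def missing_metadata(record):
    """Fields that are absent or empty on one artifact record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def completeness(records):
    """Share of records carrying every required field (0.0 for no records)."""
    if not records:
        return 0.0
    complete = sum(1 for record in records if not missing_metadata(record))
    return complete / len(records)


# Hypothetical artifact records.
full_record = {
    "artifact_id": "ev-0001", "artifact_type": "approval_token",
    "algorithm": "ECDSA-P256", "key_id": "kms-key-17",
    "signature_profile": "sig-profile-v1", "created_at": "2026-03-01T09:30:00Z",
    "verification_profile": "verify-profile-v1", "rotation_policy": "annual",
    "retention_class": "7y",
}
partial_record = {**full_record, "artifact_id": "ev-0002", "key_id": ""}

print(missing_metadata(partial_record), completeness([full_record, partial_record]))
```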

Pattern 2: Central Crypto Services, Not Embedded Choices

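Business teams should not hard-code algorithms, certificate handling and signing rules in their own applications. High-risk AI platforms should obtain these capabilities through:

central KMS/HSM.
certificate management service.
signing service.
token service.
evidence timestamp service.
policy-managed crypto profile.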

Pattern 3: Gateway Isolation

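Wrap model, RAG, tool and vendor calls behind a gateway so that migration happens at the platform layer as far as possible.

This matters for AI because many POCs call external APIs or SDKs directly. If every team integrates on its own, PQC migration becomes a long-tail problem that cannot be inventoried.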

Pattern 4: Dual Verification Period

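When signatures and evidence migrate, old and new verification paths may need to coexist.

Design points:

Define the verification profile for old signatures.
Define the verification profile for new signatures.
Define when to re-sign or add a timestamp.
Define how audit queries interpret old and new evidence.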
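A dual verification period can be sketched as a registry of verification profiles, with each stored record naming the profile it was created under. Assumptions to note: HMAC with SHA-256 and SHA-512 stand in for an old and a new signature scheme purely so the example runs on the standard library, and the profile names, statuses and record layout are hypothetical.

```python
import hmac


def _mac_verifier(digest_name):
    """Build a verifier for one stand-in scheme (not a real signature algorithm)."""
    def verify(key, message, signature):
        expected = hmac.new(key, message, digest_name).hexdigest()
        return hmac.compare_digest(expected, signature)
    return verify


# Old and new profiles coexist; the audit answer reports which one applied.
VERIFICATION_PROFILES = {
    "legacy-v1": {"status": "legacy", "verify": _mac_verifier("sha256")},
    "current-v2": {"status": "current", "verify": _mac_verifier("sha512")},
}


def verify_evidence(record, key, message):
    profile = VERIFICATION_PROFILES.get(record["verification_profile"])
    if profile is None:
        return {"valid": False, "status": "unknown-profile"}
    valid = profile["verify"](key, message, record["signature"])
    return {"valid": valid, "status": profile["status"]}


key = b"demo-key-not-for-production"
message = b"output_hash=abc123"
old_record = {"verification_profile": "legacy-v1",
              "signature": hmac.new(key, message, "sha256").hexdigest()}
new_record = {"verification_profile": "current-v2",
              "signature": hmac.new(key, message, "sha512").hexdigest()}

print(verify_evidence(old_record, key, message))
print(verify_evidence(new_record, key, message))
```

Returning the profile status alongside the result gives an audit query an explicit label for whether a record was checked under the old or the new rules.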

Pattern 5: Vendor Exit-Friendly Evidence

Do not lock AI evidence entirely inside vendor-proprietary logs.

It must be possible to export:

prompt/retrieval/tool trace.
model/version/policy metadata.
signature and timestamp metadata.
customer communication record.
access and approval events.
verification manifest.

10. Migration Roadmap

Phase 0: Executive Framing

Outputs:

Why now narrative.
AI portfolio exposure estimate.
Long-lived data and evidence story.
Supplier dependency risk.
Funding and ownership model.

Phase 1: Discovery

Outputs:

AI crypto inventory.
Data/evidence lifetime matrix.
Quantum-vulnerable dependency map.
Vendor readiness register.
Unknowns and blind spots.

Phase 2: Architecture Target

Outputs:

Crypto-agile target architecture.
KMS/HSM/cert/signing service strategy.
AI evidence archive migration strategy.
Gateway abstraction roadmap.
Risk-tiered migration waves.

Phase 3: Pilot and Interoperability

Outputs:

Non-production pilot.
Vendor compatibility evidence.
Latency and throughput benchmark.
Verification replay test.
Rollback and exception process.

Phase 4: Production Migration

Outputs:

Wave-based rollout.
Customer/operations impact monitoring.
Evidence verification report.
Residual risk acceptance for exceptions.
Post-migration control effectiveness review.

Phase 5: Continuous Crypto Governance

Outputs:

Crypto posture dashboard.
Supplier roadmap refresh.
New AI use case onboarding gate.
Architecture fitness functions.
Board and risk committee reporting.

11. Metrics and KRIs

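Metric	Meaning
AI use cases with crypto inventory coverage	Share of AI use cases included in the inventory
Long-lived evidence objects classified	Share of objects with a completed evidence lifetime classification
Quantum-vulnerable dependencies by tier	Number of vulnerable dependencies in each risk tier
Critical vendor PQC readiness coverage	Share of critical vendors with a completed readiness assessment
Crypto exceptions past expiry	Number of overdue exceptions
Evidence replay success rate	Success rate of replaying and verifying evidence packs after migration
Algorithm metadata completeness	Metadata completeness rate across signatures, tokens, certificates and archived objects
Migration wave defect rate	Rate of authentication, authorization, logging and business defects introduced per migration wave
Unknown crypto usage count	Number of unknown or unowned uses of cryptography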
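Two of these KRIs, inventory coverage and exceptions past expiry, reduce to simple arithmetic once the underlying registers exist. The sketch below assumes hypothetical register rows with an `in_inventory` flag and an `expires` date; the row shapes, ids and dates are invented for illustration.

```python
from datetime import date


def coverage(items, flag):
    """Share of items where the given flag is truthy (0.0 for no items)."""
    if not items:
        return 0.0
    return sum(1 for item in items if item.get(flag)) / len(items)


def exceptions_past_expiry(exceptions, today):
    """Ids of crypto exceptions whose expiry date is before `today`."""
    return sorted(e["id"] for e in exceptions if e["expires"] < today)


# Hypothetical register extracts.
ai_use_cases = [
    {"name": "complaint-reply-assistant", "in_inventory": True},
    {"name": "payment-agent", "in_inventory": True},
    {"name": "kyc-document-rag", "in_inventory": True},
    {"name": "marketing-draft-poc", "in_inventory": False},
]
crypto_exceptions = [
    {"id": "EX-1", "expires": date(2026, 1, 31)},
    {"id": "EX-2", "expires": date(2026, 12, 31)},
]

print(coverage(ai_use_cases, "in_inventory"))
print(exceptions_past_expiry(crypto_exceptions, today=date(2026, 6, 30)))
```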

A mature organization does not only ask "do we have a PQC project?". It asks:

Which customer-impacting AI evidence would fail verification if we changed cryptographic profiles tomorrow?

12. Failure Modes

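Failure mode	Symptom	Consequence
Security-only migration	The security team runs certificate scans while AI product and evidence teams are not involved	Long-lived evidence and RAG/agent assets are missed
Algorithm-name theater	Documents mention PQC but there is no inventory, owner, roadmap or testing	Management wrongly believes the risk has been handled
Vendor roadmap dependency	A core AI vendor does not support migration and the contract has no roadmap commitment or exit right	High-risk use cases are locked in by the vendor
Evidence breakage	Old signatures, timestamps or archived evidence cannot be verified after migration	Audit, dispute, regulatory and legal hold capability degrades
POC sprawl	Business teams call external models and tools directly with no central gateway	Inventory is incomplete and migration cost runs out of control
Retention mismatch	Data retention period, key rotation period and evidence verification period do not align	Scheduled destruction or unavailable keys invalidate evidence
Latency surprise	A PQC/hybrid scheme adds latency to login, APIs and agent tool calls	Customer experience and operational SLAs suffer
No rollback	A failed migration cannot be rolled back to a known-safe state	Production incidents and loss of trust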

13. Interview-Ready Takeaways

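One way to answer:

I would not treat post-quantum as a pure algorithm swap. For retail financial AI, I would first build an AI crypto inventory that links use cases, data lifetime, evidence lifetime, RAG, tool invocation, signatures, keys, certificates and vendor dependencies. I would then tier risk by long-lived confidentiality, long-lived integrity, agent side effect and regulated customer impact, and choose migration waves accordingly.

To expand:

NIST has published its principal PQC standards, but the hard parts of enterprise migration are discovery, inventory, interoperability and vendor readiness.
The risk unique to AI systems sits in the evidence plane: prompt, retrieval, tool, approval, output, policy version and human review must remain verifiable long term.
Architecturally, avoid hard-coded algorithms and build crypto agility with central crypto services, gateway abstraction, algorithm metadata and dual verification profiles.
On the product side, manage customer impact, latency, channel compatibility, vendor SLAs, operational training and exception expiry.
On the governance side, feed migration evidence into an AI RMF / ISO 42001 style operating model so that it becomes continuous monitoring rather than a one-off project.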

14. Practical Templates

14.1 AI Crypto Inventory Row

Use case:
Business owner:
AI system owner:
Customer segment:
Decision impact:
Data sensitivity:
Data retention:
Evidence retention:
Crypto dependency:
Algorithm / key / cert profile:
Vendor / product:
Quantum-vulnerable exposure:
PQC readiness:
Migration wave:
Exception owner:
Evidence link:

14.2 Data and Evidence Lifetime Matrix

Object	Confidentiality lifetime	Integrity lifetime	Retention driver	PQC priority
Customer AI conversation	7 years	7 years	complaint / supervision / records	High
Agent tool approval token	short	7 years	audit / dispute	High
RAG policy source	until superseded plus archive	7 years	policy history	Medium
Model eval pack	internal	model lifecycle plus audit	model risk	Medium

14.3 Migration ADR Outline

Title: Crypto-agile migration strategy for [AI capability]
Context: Which AI use case, data/evidence lifetime and crypto dependency are in scope?
Decision: Which migration pattern is selected?
Options considered: direct replacement / hybrid / gateway abstraction / vendor wait / exit
Consequences: latency, compatibility, evidence verification, cost, residual risk
Controls: release gate, rollback, monitoring, exception expiry
Evidence: inventory, scan, vendor attestation, test results, replay report
Review trigger: new standard, vendor change, incident, audit finding, key/cert lifecycle event

14.4 Board Narrative

We are not attempting to predict the exact date of quantum capability.
We are reducing avoidable migration risk now by identifying quantum-vulnerable
cryptography in AI systems, prioritizing long-lived customer data and evidence,
requiring vendor readiness, and building crypto-agile platform services so that
future algorithm changes do not require uncontrolled business rewrites.

14.5 PM Design Prompt

For every new AI product:
1. What data or evidence created by this product must remain protected for years?
2. Which signatures, tokens, certificates, keys and archives prove what happened?
3. Which vendor or platform owns the cryptographic implementation?
4. Can we replace the crypto profile without rewriting business workflow?
5. What evidence will an auditor need after migration?

15. Senior Architect Lens

The senior move is to connect three maps:

AI value map
  -> where AI changes customer, employee or operational decisions

AI evidence map
  -> what must be preserved, replayed, explained and defended

Crypto dependency map
  -> which algorithms, keys, certificates, signatures and vendors protect that value/evidence

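Looking at only one of them distorts the picture:

Only the value map: long-term security debt is ignored.
Only the evidence map: technical migration constraints are ignored.
Only the crypto dependency map: the project becomes a scanning exercise detached from business priorities.

Post-quantum architecture maturity for retail financial AI ultimately comes down to one sentence:

The organization knows which AI decisions and evidence matter most, which cryptographic mechanisms they depend on, which vendors constrain migration, and how to replace those mechanisms step by step without breaking customer experience or audit evidence.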