
AI Records / Retention: Record Retention and Legal Hold Architecture



AI Records / Retention / Legal Hold / eDiscovery Architecture: A Reading Guide

Audience: AI Product Architect / AI PM / Senior BA / Enterprise Architect / Risk Technology Lead / Legal Operations Partner / Compliance Technology Lead / Internal Audit Partner. Core question: of the prompts, RAG evidence, tool calls, approvals, outputs, evals, incidents and audit events that an AI system produces, which are business records, regulatory records, litigation-relevant ESI or operational telemetry? Learning goal: build a complete architectural vocabulary covering AI record object taxonomy, retention schedule mapping, legal hold orchestration, immutable audit trail, eDiscovery export, regulator production and deletion-vs-hold conflict handling.

Source Anchors

Source	Link	Purpose
FINRA Rule 4511	https://www.finra.org/rules-guidance/rulebooks/finra-rules/4511	Ties financial recordkeeping obligations to books and records, preservation periods and the SEA Rule 17a-4 reference
SEC Rule 17a-4 electronic recordkeeping overview	https://www.sec.gov/investment/amendments-electronic-recordkeeping-requirements-broker-dealers	Defines electronic record preservation capability through the WORM / audit-trail alternative, third-party recordkeeping and prompt (timely) production
CFTC 17 CFR 1.31	https://www.ecfr.gov/current/title-17/chapter-I/part-1/subject-group-ECFR26e2c365a191fa7/section-1.31	Shapes the regulatory record layer around authenticity, reliability, metadata, system inventory, readily accessible records and production
FRCP Rule 37(e)	https://www.law.cornell.edu/rules/frcp/rule_37	Shapes legal hold controls around ESI preservation, reasonable steps, routine operation and sanctions risk
NARA records management guidance	https://www.archives.gov/records-mgmt/policy	Frames records governance through records lifecycle, ERM requirements and disposition thinking
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize AI record risk, evidence quality and governance metrics

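Applicability note:

This document is learning material for architecture, product and BA work. It is not legal advice, a compliance conclusion, a regulatory interpretation or an eDiscovery legal strategy.
Actual applicability depends on entity type, product, jurisdiction, customer segment, communication channel, record category, regulator, contract and litigation posture.
In retail financial services, the retention schedule, legal hold scope and production protocol must be confirmed jointly by Legal, Compliance, Privacy, Records Management, Risk, InfoSec, Model Risk, Operations and the business owner.

In one sentence: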

AI record architecture prevents fast automation from becoming unpreserved, undiscoverable and unverifiable business conduct.

1. Thesis

AI records are not ordinary logs.

An ordinary log asks:

What happened in the system?

AI records ask:

What business, regulatory, legal, customer-impacting or audit-relevant event
was created by the AI workflow, and can we preserve, hold, search, export
and explain it under the right obligation?

In retail financial AI, a prompt, retrieved document, model output, approval decision, tool call, customer message, eval result or incident timeline can each become evidence.

If the architecture stores only the final answer, the organization loses the original input, RAG source version, prompt template, model version, tool parameters, approval reason, delivery channel, eval evidence, incident timeline and proof of legal hold preservation.

The goal of AI record architecture is not to keep everything forever, but to:

Classify precisely.
Retain lawfully.
Preserve on hold.
Delete when eligible.
Produce defensibly.
Replay truthfully.

2. Why It Matters

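AI makes records management harder because a single business fact is split across multiple technical layers.

Layer	AI records it may produce	Risk
Chat / Copilot UI	prompt, uploaded file, final response	Customer communications and employee advice cannot be reconstructed afterwards
RAG	query, retrieved chunks, source version, citation	A wrong or outdated policy was cited and no evidence remains
Agent orchestration	plan, tool call request, routing decision	The accountability chain for automated actions breaks
Approval workflow	reviewer decision, reason, approval packet	Under legal hold, nobody can prove who approved
Tool gateway	execution request, response, side effect	System state changes without a business record
Evaluation / Incident	score, defect, alert, CAPA	Control effectiveness evidence and the incident timeline are missing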

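The key question for retail financial AI is not whether there are enough logs, but whether each record object is classified correctly and can be disposed of lawfully.

Flawed designs usually fall into one of two extremes:

under-retention: ESI that should have been retained or preserved is deleted too early, to cut cost or reduce privacy risk.
over-retention: every prompt and response is kept forever, which increases privacy exposure, data breach exposure, litigation exposure and operating cost.

A mature design must resolve conflicts among retention, privacy minimization, legal hold, regulatory production and deletion rights explicitly.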

3. Architecture Model

AI Application / Channel
  -> Record Capture SDK
  -> AI Record Classifier
  -> Policy Decision Point: retention / privacy / hold / access
  -> Evidence Ledger and Immutable Audit Trail
  -> Retention Store: WORM or audit-trail capable
  -> Legal Hold Service
  -> Search / Review / Redaction Workbench
  -> eDiscovery Export and Regulator Production
  -> Disposition Engine with Hold Conflict Check

Key principles:

Capture must be close to the event, not reconstructed from narrative summaries.
Classification must happen at record-object level, not only at application level.
Legal hold must override deletion until released by authorized legal process.
Retention policy must be versioned and explainable.
Production must include chain-of-custody, access history and redaction evidence.
Deletion must be policy-driven, auditable and blocked by active hold.

Minimal record object:

Field	Example
record_id / record_type	`air_20260630_000119`, `rag_retrieval`, `tool_call`, `approval`
business context	domain, business object, customer impact, channel
actor chain	human user, agent id, service account, approver
model context	model id, prompt template, policy version
source context	document ids, chunk ids, source version, content hash
lifecycle context	retention class, legal hold status, privacy class, export profile
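
A minimal Python sketch of the record object in the table above. The field groups follow the table; the `AIRecord` class, the `content_hash` helper and every sample value are illustrative assumptions, not a prescribed schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AIRecord:
    """Hypothetical record object; fields mirror the table above."""
    record_id: str
    record_type: str          # e.g. "rag_retrieval", "tool_call", "approval"
    business_context: dict    # domain, business object, customer impact, channel
    actor_chain: tuple        # human user, agent id, service account, approver
    model_context: dict       # model id, prompt template, policy version
    source_context: dict      # document ids, chunk ids, source version, content hash
    retention_class: str
    privacy_class: str
    legal_hold_status: str = "none"
    export_profile: str = "default"


def content_hash(payload: bytes) -> str:
    """SHA-256 digest of the captured payload, stored next to the record."""
    return hashlib.sha256(payload).hexdigest()


# Sample values are placeholders for illustration only.
chunk = b"example policy chunk text"
record = AIRecord(
    record_id="air_20260630_000119",
    record_type="rag_retrieval",
    business_context={"domain": "payments", "channel": "chat"},
    actor_chain=("user:u-1", "agent:dispute-assistant"),
    model_context={"model_id": "model-x", "prompt_template": "tmpl-3"},
    source_context={"source_version": "v7", "content_hash": content_hash(chunk)},
    retention_class="regulatory",
    privacy_class="confidential",
)
```

Freezing the dataclass blocks attribute reassignment in application code; it does not replace storage-level immutability such as WORM.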

Architecture controls must cover capture, classification, retention, legal hold, eDiscovery / production and disposition:

Capture: user prompt, RAG query, retrieved chunk, output, tool call, approval, customer message, eval, incident.
Classification: record type + business domain + obligation basis + privacy class + retention class + hold eligibility.
Retention: trigger event, authoritative copy, derived record policy, tokenization / minimization, disposition certificate.
Legal hold: trigger, matter id, custodian, system, date range, record type, business object, keyword, model version, channel.
Production: scoped record set, search criteria, chain-of-custody, metadata dictionary, redaction log, hash manifest, load file.
Disposition: no deletion without hold conflict check and auditable approval path.

4. Financial Retail Scenarios

Scenario	AI records that matter	Architecture judgment
Payment dispute assistant	customer intake prompt, transaction evidence, policy retrieval, AI recommendation, specialist approval, provisional credit tool call, customer notice	Saving only the final case note is not enough; credit execution must link to the approval packet and action hash
Credit underwriting copilot	applicant prompt, policy citation chunks, AI memo draft, underwriter edits, decision, override reason, eval result	AI draft may influence decision even if not final authority; adverse action support needs controlled source and version
AML / fraud copilot	alert summary, red flag analysis, SAR narrative draft, closure recommendation, human closure approval, QA sample	AI should not delete or downgrade investigation evidence on its own; SAR-sensitive material needs restricted access and export control
Customer service RAG	customer question, retrieved policy version, answer draft, agent edit, final sent message	A RAG answer is provable only through its source version; a later complaint may turn these records into ESI
Complaint remediation agent	complaint classification, policy/account evidence, remediation recommendation, legal/compliance review, closure evidence	regulator mention, vulnerability, discrimination or legal threat can trigger escalation and preservation
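
The payment dispute row asks that credit execution link to an approval packet and an action hash. One way this binding could look is sketched below, assuming the action hash is a SHA-256 over a canonical JSON form of the tool call. The function names, the packet layout and the tool name `issue_provisional_credit` are hypothetical.

```python
import hashlib
import json


def action_hash(tool_name: str, params: dict) -> str:
    """Hash a canonical JSON form of the call so any parameter change is detectable."""
    canonical = json.dumps(
        {"tool": tool_name, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_execute(approval_packet: dict, tool_name: str, params: dict) -> bool:
    """Allow execution only when the call matches exactly what the reviewer approved."""
    return (
        approval_packet["decision"] == "approved"
        and approval_packet["action_hash"] == action_hash(tool_name, params)
    )


# Placeholder values for illustration.
approved_params = {"case_id": "case-001", "amount_cents": 12500}
packet = {
    "approver": "specialist-42",
    "decision": "approved",
    "reason": "evidence supports provisional credit",
    "action_hash": action_hash("issue_provisional_credit", approved_params),
}
```

Because the keys are sorted before hashing, reordering the parameters does not change the hash, while changing the amount or the tool name does.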

5. PM / BA / Architect Implications

The AI PM must build record requirements into feature design:

Which user actions create a record.
Which AI outputs enter a system of record.
Which records are customer-visible.
Which records exist only for debugging and can be kept for a short period.
Which records must support legal hold, regulator production and deletion conflict handling.

The Senior BA must translate business semantics into a record taxonomy:

business object, record trigger, retention class, owner, source of truth.
hold trigger, export audience, redaction rule, exception route.
customer impact, privacy class, legal sensitivity and model governance relevance.

The Architect must turn records governance into runtime capabilities:

capture SDK, event schema, retention policy engine, immutable store.
legal hold service, search / review workbench, eDiscovery exporter, disposition engine, audit replay.
cross-system propagation across app DB, object store, vector store, archive, warehouse and vendor platforms.
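
Cross-system propagation can be sketched as a loop that applies a hold to every registered system and records which ones acknowledged and which ones failed. This is a simplified illustration: `propagate_hold`, `HoldPropagationResult` and the adapter callables are hypothetical, and a real service would also need retries, notification and persistence.

```python
from dataclasses import dataclass, field


@dataclass
class HoldPropagationResult:
    hold_id: str
    acknowledged: list = field(default_factory=list)
    exceptions: dict = field(default_factory=dict)  # system name -> error text


def propagate_hold(hold_id: str, systems: dict) -> HoldPropagationResult:
    """`systems` maps a system name to a callable that applies the hold and may raise."""
    result = HoldPropagationResult(hold_id)
    for name, apply_hold in systems.items():
        try:
            apply_hold(hold_id)
            result.acknowledged.append(name)
        except Exception as exc:  # record the failure instead of hiding it
            result.exceptions[name] = str(exc)
    return result


def ok(hold_id: str) -> None:
    """Stand-in adapter for a system that accepts the hold."""


def unsupported(hold_id: str) -> None:
    """Stand-in adapter for a system that cannot apply the hold."""
    raise RuntimeError("hold API not supported")


result = propagate_hold(
    "hold-2026-001",
    {"app_db": ok, "object_store": ok, "vendor_platform": unsupported},
)
```

The `exceptions` map plays the role of an exception list: a system that cannot apply the hold is surfaced for follow-up rather than silently skipped.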

6. Required Artifacts

Artifact	Purpose
AI record inventory	Lists prompt, RAG, tool, approval, output, eval and incident records
Record taxonomy	Defines record type, domain, owner, retention class, privacy class
Retention matrix	Maps category, trigger, period, authority, disposition
Legal hold playbook	Defines trigger, scope, notice, freeze, collection, release
Evidence schema	Defines required metadata and chain-of-custody
eDiscovery export spec	Defines search, redaction, load file, hash manifest, production format
Deletion conflict rules	Defines the priority among retention expiry, privacy deletion, legal hold and regulatory hold
Audit replay test plan	Verifies that an AI-influenced business decision can be reconstructed

7. Control / Evidence Design

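Good AI record evidence is not a natural-language summary.

It should prove that:

The record was captured at or near the time of the event.
The record has not been modified without a visible trace.
Modifications and deletions have a complete, time-stamped audit trail.
Disposition was suspended once the legal hold took effect.
The eDiscovery export has no selective omissions.
Each redaction has a stated basis and a log entry.
The production package can be independently verified by a third party.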
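
Third-party verification of a production package is commonly supported by a hash manifest. The sketch below assumes the package is a set of named byte payloads; `build_manifest` and `verify_manifest` are hypothetical helper names, and real load-file formats carry more metadata.

```python
import hashlib


def build_manifest(files: dict) -> dict:
    """Map each exported file name to the SHA-256 of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in sorted(files.items())}


def verify_manifest(files: dict, manifest: dict) -> list:
    """Return a list of problems; an empty list means the package matches the manifest."""
    problems = []
    for name in sorted(set(files) | set(manifest)):
        if name not in files:
            problems.append(f"missing: {name}")
        elif name not in manifest:
            problems.append(f"unexpected: {name}")
        elif hashlib.sha256(files[name]).hexdigest() != manifest[name]:
            problems.append(f"altered: {name}")
    return problems


# Placeholder package contents for illustration.
package = {"records.jsonl": b'{"record_id": "r1"}\n', "redaction_log.csv": b"r1,none\n"}
manifest = build_manifest(package)
```

A reviewer who receives the files and the manifest through separate channels can rerun `verify_manifest` and detect a missing, added or altered file.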

Control	Evidence
Record capture completeness	event count reconciliation, dropped event report
Retention classification	policy decision id, classifier version, reviewer override
Immutable preservation	WORM setting or audit-trail chain
Legal hold propagation	hold id, affected systems, acknowledgement, exception list
Deletion block	disposition engine shows active hold conflict
Export integrity	hash manifest, chain-of-custody, access log
Audit replay	sample case reconstructed from source to final action
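
The "audit-trail chain" evidence for immutable preservation can be illustrated with a hash chain, where each entry commits to the hash of the previous one. This is a minimal sketch with hypothetical names (`append_event`, `verify_trail`, `GENESIS`). It makes in-place edits detectable only if the latest hash is also anchored somewhere the editor cannot rewrite.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(trail: list, event: dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev = trail[-1]["entry_hash"] if trail else GENESIS
    trail.append({"event": event, "prev_hash": prev, "entry_hash": _entry_hash(prev, event)})


def verify_trail(trail: list) -> bool:
    """Recompute every link; any edited, removed or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in trail:
        if entry["prev_hash"] != prev or entry["entry_hash"] != _entry_hash(prev, entry["event"]):
            return False
        prev = entry["entry_hash"]
    return True


trail = []
append_event(trail, {"action": "capture", "record_id": "r1"})
append_event(trail, {"action": "hold_applied", "record_id": "r1", "hold_id": "hold-1"})
append_event(trail, {"action": "export", "record_id": "r1"})
```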

8. Interview Questions

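Q1: How does an AI record differ from an ordinary log?

An ordinary log mainly describes what happened in the system. An AI record must show which inputs, sources, model versions, outputs, approvals, tool actions and customer communications in an AI workflow constitute business, regulatory, legal or audit-relevant evidence, and it must support retention, preservation, search, export and deletion.

Q2: How would you design an AI retention architecture?

I would start with a record inventory and taxonomy that break prompt, RAG retrieval, tool call, approval, output, eval and incident into record objects. A retention policy engine then binds record category, business domain, trigger event, retention period, privacy class, legal hold eligibility and disposition rule. The storage layer must support WORM or a complete audit trail, and legal hold must be able to suspend deletion.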
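
A retention policy engine can be pictured as a lookup from record category to a rule, plus a date calculation from the trigger event. The sketch below is illustrative only: `RetentionRule`, `disposition_eligible_on`, the category name and the six-year period are placeholders, not statements of any legal retention period.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionRule:
    category: str
    trigger: str        # event that starts the clock, e.g. "case_closed"
    period_years: int   # placeholder value; real periods come from the approved schedule
    authority: str      # reference to the schedule entry that justifies the rule


def disposition_eligible_on(trigger_date: date, rule: RetentionRule) -> date:
    """Earliest date the record may be disposed of, before any hold conflict check."""
    target_year = trigger_date.year + rule.period_years
    try:
        return trigger_date.replace(year=target_year)
    except ValueError:  # Feb 29 trigger date and a non-leap target year
        return trigger_date.replace(year=target_year, day=28)


rule = RetentionRule(
    category="customer_communication",
    trigger="case_closed",
    period_years=6,
    authority="example-schedule-v1",
)
eligible = disposition_eligible_on(date(2026, 6, 30), rule)
```

The result is only an eligibility date. Whether deletion actually runs still depends on the hold conflict check.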

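Q3: What does a legal hold mean for an AI system?

A legal hold is not a tag on one database. It must freeze deletion and overwriting of the relevant ESI, and its scope may include prompts, chat transcripts, RAG sources, vector index metadata, tool calls, approvals, model config, evals, incidents and customer output. The system must record the hold trigger, scope, notice, acknowledgement, collection, release and deletion conflicts.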
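
Hold scope can be expressed as data and evaluated per record. The sketch below uses three of the scope dimensions (custodian, record type, date range). `HoldScope`, `in_scope` and the record layout are hypothetical, and real scopes typically add business object, keyword, model version and channel.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HoldScope:
    matter_id: str
    custodians: frozenset
    record_types: frozenset
    start: date
    end: date


def in_scope(record: dict, scope: HoldScope) -> bool:
    """True when the record matches every scope dimension of the hold."""
    return (
        record["custodian"] in scope.custodians
        and record["record_type"] in scope.record_types
        and scope.start <= record["created_on"] <= scope.end
    )


scope = HoldScope(
    matter_id="matter-001",
    custodians=frozenset({"agent-7", "user-12"}),
    record_types=frozenset({"prompt", "tool_call", "approval"}),
    start=date(2026, 1, 1),
    end=date(2026, 6, 30),
)
records = [
    {"id": "r1", "custodian": "agent-7", "record_type": "tool_call", "created_on": date(2026, 3, 1)},
    {"id": "r2", "custodian": "agent-7", "record_type": "eval", "created_on": date(2026, 3, 1)},
    {"id": "r3", "custodian": "user-99", "record_type": "prompt", "created_on": date(2026, 3, 1)},
    {"id": "r4", "custodian": "user-12", "record_type": "prompt", "created_on": date(2026, 7, 1)},
]
held_ids = [r["id"] for r in records if in_scope(r, scope)]
```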

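Q4: What happens when a deletion request conflicts with a legal hold?

A simple product-layer delete is not acceptable. Architecturally, a policy decision point should weigh the priority of retention expiry, privacy request, regulatory retention and active legal hold. An active legal hold usually suspends disposition, but the specific handling must be confirmed by Legal, Privacy and Compliance under applicable law. The system must keep the denial / deferral reason and the disposition workflow that runs after release.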
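
The decision logic can be sketched as a small function that always returns a reason with its outcome. The priority order used here (active hold first, then unexpired retention, then deletion) is an assumption for illustration and would need confirmation by Legal, Privacy and Compliance. `decide` and `DispositionDecision` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DispositionDecision:
    action: str  # "delete" or "defer"
    reason: str  # kept as evidence of the denial / deferral


def decide(record: dict, today: date, privacy_request: bool) -> DispositionDecision:
    """Assumed priority: active hold, then unexpired retention, then deletion."""
    if record["active_hold_ids"]:
        holds = ", ".join(sorted(record["active_hold_ids"]))
        return DispositionDecision("defer", f"active legal hold: {holds}")
    if today < record["retention_expires_on"]:
        if privacy_request:
            return DispositionDecision("defer", "privacy request routed for review: retention not expired")
        return DispositionDecision("defer", "retention period not expired")
    return DispositionDecision("delete", "retention expired and no active hold")


held = {"active_hold_ids": {"hold-2", "hold-1"}, "retention_expires_on": date(2020, 1, 1)}
decision = decide(held, date(2026, 1, 1), privacy_request=True)
```

Returning "defer" rather than a hard denial leaves room for the post-release disposition workflow to pick the record up again.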

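Q5: How do you prove that an AI output can be reconstructed for a regulator or auditor?

You need a replay package: user prompt, system prompt version, model version, retrieved source ids, source content hash, output, human edits, approval, tool call, final customer message, policy decision, audit trail and chain-of-custody. A final summary alone cannot prove what the output was based on at the time or that controls were operating.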
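
A replay package can be checked for completeness before it is relied on. The field list below is taken from the answer above; the snake_case key names and the `missing_replay_fields` helper are assumptions for illustration.

```python
REQUIRED_REPLAY_FIELDS = (
    "user_prompt",
    "system_prompt_version",
    "model_version",
    "retrieved_source_ids",
    "source_content_hash",
    "output",
    "human_edits",
    "approval",
    "tool_call",
    "final_customer_message",
    "policy_decision",
    "audit_trail",
    "chain_of_custody",
)


def missing_replay_fields(package: dict) -> list:
    """Return required fields that are absent or None.

    An empty value such as [] is accepted, because "no human edits" is itself a fact.
    """
    return [name for name in REQUIRED_REPLAY_FIELDS if package.get(name) is None]


summary_only = {"output": "Your dispute has been resolved."}
gaps = missing_replay_fields(summary_only)
```

A package that holds only the final summary fails this check on twelve of the thirteen fields.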

9. Common Pitfalls

Pitfall	Why it fails	Better design
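Only the final answer is saved	Source, version, approval and tool actions cannot be proven	preserve record chain
All AI telemetry is kept forever	Privacy, cost and litigation exposure are too high	classify and schedule
Legal hold freezes only the app DB	vector store, object store, logs and exports may still be deleted	cross-system hold propagation
RAG sources have no version	Nobody can show what the model saw at the time	source version and content hash
Approval is not bound to the output	Content or parameters can change after approval	approval packet and action hash
eDiscovery export is hand-written SQL	Not repeatable, weak chain-of-custody	controlled export workflow
Deletion jobs ignore holds	spoliation and regulatory risk	disposition conflict check
Vendor holds the records but the institution cannot obtain them promptly	production failure	contract, system inventory, export SLA
Audit trail can be rewritten by administrators	Evidence is not credible enough	immutable or tamper-evident trail
No record owner	Nobody maintains the policy	RACI and periodic review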

10. Final Operating Principle

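The maturity of an AI records architecture is not measured by how many logs it stores.

It is measured by whether the organization knows what counts as a record, why it is retained, how deletion stops under hold, how records are produced reasonably on request, how they are lawfully disposed of once the period expires, and how a business fact influenced by AI can be reconstructed in an audit or dispute.

For a senior AI PM / Senior BA / Architect, this is a core capability: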

Turn AI behavior into governed, retained, discoverable and defensible records.