
AI Records / Retention: Records Retention and Legal Preservation Architecture

226ai-foundations/papers/119-ai-records-retention-legal-hold-ediscovery-architecture.md

AI Records / Retention / Legal Hold / eDiscovery Architecture Explained

Audience: AI Product Architect / AI PM / Senior BA / Enterprise Architect / Risk Technology Lead / Legal Operations Partner / Compliance Technology Lead / Internal Audit Partner.

Core question: Of the prompts, RAG evidence, tool calls, approvals, outputs, evals, incidents and audit events that an AI system produces, which are business records, regulatory records, litigation-relevant ESI or operational telemetry?

Learning objective: Build a complete architecture vocabulary covering AI record object taxonomy, retention schedule mapping, legal hold orchestration, immutable audit trail, eDiscovery export, regulator production and deletion-vs-hold conflict handling.


Source Anchors

| Source | Link | Purpose |
| --- | --- | --- |
| FINRA Rule 4511 | https://www.finra.org/rules-guidance/rulebooks/finra-rules/4511 | Ties financial recordkeeping obligations to books and records, preservation periods and the SEA Rule 17a-4 reference |
| SEC Rule 17a-4 electronic recordkeeping overview | https://www.sec.gov/investment/amendments-electronic-recordkeeping-requirements-broker-dealers | Defines electronic record preservation capability through the WORM / audit-trail alternative, third-party recordkeeping and prompt (timely) production |
| CFTC 17 CFR 1.31 | https://www.ecfr.gov/current/title-17/chapter-I/part-1/subject-group-ECFR26e2c365a191fa7/section-1.31 | Shapes the regulatory record layer around authenticity, reliability, metadata, system inventory, readily accessible records and production |
| FRCP Rule 37(e) | https://www.law.cornell.edu/rules/frcp/rule_37 | Shapes the legal hold control around ESI preservation, reasonable steps, routine operation and sanctions risk |
| NARA records management guidance | https://www.archives.gov/records-mgmt/policy | Frames records governance through records lifecycle, ERM requirements and disposition thinking |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organizes AI record risk, evidence quality and governance metrics under Govern / Map / Measure / Manage |

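Applicability note:

  • This document is learning material for architecture, product and BA work. It is not legal advice, a compliance conclusion, a regulatory interpretation or an eDiscovery legal strategy.
  • Actual applicability depends on entity type, product, jurisdiction, customer segment, communication channel, record category, regulator, contract and litigation posture.
  • In retail financial services, the retention schedule, legal hold scope and production protocol must be jointly confirmed by Legal, Compliance, Privacy, Records Management, Risk, InfoSec, Model Risk, Operations and the business owner.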

In one sentence:

AI record architecture prevents fast automation from becoming unpreserved, undiscoverable and unverifiable business conduct.


1. Thesis

AI records are not ordinary logs.

An ordinary log asks:

What happened in the system?

AI records ask:

What business, regulatory, legal, customer-impacting or audit-relevant event
was created by the AI workflow, and can we preserve, hold, search, export
and explain it under the right obligation?

In retail financial AI, a prompt, retrieved document, model output, approval decision, tool call, customer message, eval result or incident timeline can each become evidence.

If the architecture stores only the final answer, the organization loses the original input, RAG source version, prompt template, model version, tool parameters, approval reason, delivery channel, eval evidence, incident timeline and proof of legal hold preservation.

The goal of AI record architecture is not to "keep everything forever". It is to:

Classify precisely.
Retain lawfully.
Preserve on hold.
Delete when eligible.
Produce defensibly.
Replay truthfully.

2. Why It Matters

AI makes records management harder because a single business fact is split across multiple technical layers.

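| Layer | AI records it may produce | Risk |
| --- | --- | --- |
| Chat / Copilot UI | prompt, uploaded file, final response | Customer communications and employee advice cannot be reconstructed |
| RAG | query, retrieved chunks, source version, citation | An incorrect or outdated policy was cited and no evidence remains |
| Agent orchestration | plan, tool call request, routing decision | The accountability chain for automated actions breaks |
| Approval workflow | reviewer decision, reason, approval packet | Under legal hold, there is no proof of who approved |
| Tool gateway | execution request, response, side effect | System state changed but no business record exists |
| Evaluation / Incident | score, defect, alert, CAPA | Control effectiveness and the incident timeline are missing |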

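The key question for retail financial AI is not "do we have enough logs?" but "is each record object correctly classified and disposable according to law?"

Flawed designs usually fall into one of two extremes:

  • under-retention: ESI that should have been retained or preserved is deleted too early, to cut cost or reduce privacy risk.
  • over-retention: every prompt and response is kept permanently, increasing privacy exposure, data breach risk, litigation exposure and operating cost.

A mature design handles conflicts among retention, privacy minimization, legal hold, regulatory production and deletion rights explicitly.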


3. Architecture Model

AI Application / Channel
  -> Record Capture SDK
  -> AI Record Classifier
  -> Policy Decision Point: retention / privacy / hold / access
  -> Evidence Ledger and Immutable Audit Trail
  -> Retention Store: WORM or audit-trail capable
  -> Legal Hold Service
  -> Search / Review / Redaction Workbench
  -> eDiscovery Export and Regulator Production
  -> Disposition Engine with Hold Conflict Check

Key principles:

  • Capture must be close to the event, not reconstructed from narrative summaries.
  • Classification must happen at record-object level, not only at application level.
  • Legal hold must override deletion until released by authorized legal process.
  • Retention policy must be versioned and explainable.
  • Production must include chain-of-custody, access history and redaction evidence.
  • Deletion must be policy-driven, auditable and blocked by active hold.

Minimal record object:

| Field | Example |
| --- | --- |
| record_id / record_type | air_20260630_000119, rag_retrieval, tool_call, approval |
| business context | domain, business object, customer impact, channel |
| actor chain | human user, agent id, service account, approver |
| model context | model id, prompt template, policy version |
| source context | document ids, chunk ids, source version, content hash |
| lifecycle context | retention class, legal hold status, privacy class, export profile |
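The field groups above could be carried by a small immutable object. The following is a minimal sketch, not a prescribed schema: the class name `AIRecord`, the attribute names and the `content_hash` helper are all illustrative assumptions derived from the table.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class AIRecord:
    """Hypothetical minimal AI record object; all names are illustrative."""
    record_id: str
    record_type: str          # e.g. "rag_retrieval", "tool_call", "approval"
    business_context: dict    # domain, business object, customer impact, channel
    actor_chain: list         # human user, agent id, service account, approver
    model_context: dict       # model id, prompt template, policy version
    source_context: dict      # document ids, chunk ids, source version, content hash
    retention_class: str
    privacy_class: str
    legal_hold_status: str = "none"
    export_profile: str = "default"
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def content_hash(self) -> str:
        """SHA-256 over a canonical JSON form of every field."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the object is frozen and the hash covers every field, a change such as a new legal hold status yields a new record version with a different hash instead of a silent in-place edit.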

Architecture control must cover capture, classification, retention, legal hold, eDiscovery / production and disposition:

  • Capture: user prompt, RAG query, retrieved chunk, output, tool call, approval, customer message, eval, incident.
  • Classification: record type + business domain + obligation basis + privacy class + retention class + hold eligibility.
  • Retention: trigger event, authoritative copy, derived record policy, tokenization / minimization, disposition certificate.
  • Legal hold: trigger, matter id, custodian, system, date range, record type, business object, keyword, model version, channel.
  • Production: scoped record set, search criteria, chain-of-custody, metadata dictionary, redaction log, hash manifest, load file.
  • Disposition: no deletion without hold conflict check and auditable approval path.
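To make the legal hold scoping dimensions concrete, here is a minimal sketch of a scope matcher. `HoldScope` and `in_scope` are hypothetical names, only a few of the listed dimensions are modelled, and treating an empty set as "any value" is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HoldScope:
    """Hypothetical legal hold scope; an empty set means 'no restriction'."""
    matter_id: str
    date_from: date
    date_to: date
    record_types: frozenset = frozenset()
    custodians: frozenset = frozenset()
    channels: frozenset = frozenset()


def in_scope(scope: HoldScope, *, record_type: str, custodian: str,
             channel: str, event_date: date) -> bool:
    """Return True if a record falls inside the hold's date range and filters."""
    if not (scope.date_from <= event_date <= scope.date_to):
        return False
    checks = (
        (scope.record_types, record_type),
        (scope.custodians, custodian),
        (scope.channels, channel),
    )
    # A record is out of scope as soon as one non-empty filter excludes it.
    return all(not allowed or value in allowed for allowed, value in checks)
```

A real matcher would also need the keyword, business object and model version dimensions, plus an audit entry for every scope evaluation.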

4. Financial Retail Scenarios

| Scenario | AI records that matter | Architecture judgment |
| --- | --- | --- |
| Payment dispute assistant | customer intake prompt, transaction evidence, policy retrieval, AI recommendation, specialist approval, provisional credit tool call, customer notice | Saving only the final case note is not enough; credit execution must link to approval packet and action hash |
| Credit underwriting copilot | applicant prompt, policy citation chunks, AI memo draft, underwriter edits, decision, override reason, eval result | AI draft may influence decision even if not final authority; adverse action support needs controlled source and version |
| AML / fraud copilot | alert summary, red flag analysis, SAR narrative draft, closure recommendation, human closure approval, QA sample | AI should not autonomously delete or downgrade investigation evidence; SAR-sensitive material needs restricted access and export control |
| Customer service RAG | customer question, retrieved policy version, answer draft, agent edit, final sent message | The provability of a RAG answer comes from the source version; later complaint may turn these into ESI |
| Complaint remediation agent | complaint classification, policy/account evidence, remediation recommendation, legal/compliance review, closure evidence | regulator mention, vulnerability, discrimination or legal threat can trigger escalation and preservation |
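The "approval packet and action hash" link in the payment dispute row can be sketched as follows. This is one possible binding under stated assumptions: the function names, the approval dictionary layout and the tool name `issue_provisional_credit` are hypothetical.

```python
import hashlib
import json


def action_hash(tool_name: str, params: dict) -> str:
    """Hash the exact tool call that a reviewer is asked to approve."""
    canonical = json.dumps({"tool": tool_name, "params": params},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_execute(approval: dict, tool_name: str, params: dict) -> bool:
    """Allow execution only if the approved hash matches the call being made."""
    return (approval.get("decision") == "approved"
            and approval.get("action_hash") == action_hash(tool_name, params))


# Usage: the approval packet stores the hash of what was reviewed.
params = {"case_id": "C-1", "amount": "125.00"}
approval = {
    "approval_id": "apr_001",  # hypothetical identifier
    "decision": "approved",
    "action_hash": action_hash("issue_provisional_credit", params),
}
```

If the amount or case id changes after approval, `may_execute` returns `False`, so the execution record can always be traced back to the exact parameters the reviewer saw.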

5. PM / BA / Architect Implications

The AI PM must build record requirements into feature design:

  • Which user actions create a record.
  • Which AI outputs enter the system of record.
  • Which records are customer-visible.
  • Which records are used only for debugging and can be retained short-term.
  • Which records must support legal hold, regulator production and deletion conflict handling.

The Senior BA must translate business semantics into a record taxonomy:

  • business object, record trigger, retention class, owner, source of truth.
  • hold trigger, export audience, redaction rule, exception route.
  • customer impact, privacy class, legal sensitivity and model governance relevance.

The Architect must turn records governance into runtime capabilities:

  • capture SDK, event schema, retention policy engine, immutable store.
  • legal hold service, search / review workbench, eDiscovery exporter, disposition engine, audit replay.
  • cross-system propagation across app DB, object store, vector store, archive, warehouse and vendor platforms.

6. Required Artifacts

| Artifact | Purpose |
| --- | --- |
| AI record inventory | Lists prompt, RAG, tool, approval, output, eval and incident records |
| Record taxonomy | Defines record type, domain, owner, retention class and privacy class |
| Retention matrix | Maps category, trigger, period, authority and disposition |
| Legal hold playbook | Defines trigger, scope, notice, freeze, collection and release |
| Evidence schema | Defines required metadata and chain-of-custody |
| eDiscovery export spec | Defines search, redaction, load file, hash manifest and production format |
| Deletion conflict rules | Defines the precedence among retention expiry, privacy deletion, legal hold and regulatory hold |
| Audit replay test plan | Verifies that an AI-influenced business decision can be reconstructed |
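As an illustration of the retention matrix artifact, the sketch below maps a category to its trigger, period, authority and disposition, then derives the earliest disposition date. The categories, periods and authorities are placeholders invented for the example, not real retention requirements; actual values come from the confirmed retention schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    category: str
    trigger: str       # event that starts the retention clock
    period_days: int   # placeholder value, not a legal requirement
    authority: str     # placeholder for the obligation basis
    disposition: str


# Hypothetical matrix; every entry is a placeholder for illustration only.
RETENTION_MATRIX = {
    "debug_telemetry": RetentionRule(
        "debug_telemetry", "event_captured", 30,
        "internal policy (placeholder)", "delete"),
    "customer_communication": RetentionRule(
        "customer_communication", "message_sent", 365 * 3,
        "to be confirmed by Records Management (placeholder)",
        "review_then_delete"),
}


def eligible_for_disposition_on(category: str, trigger_date: date) -> date:
    """Earliest date the record could be disposed of, ignoring any hold."""
    rule = RETENTION_MATRIX[category]
    return trigger_date + timedelta(days=rule.period_days)
```

The returned date is only an eligibility date. Whether disposition actually runs still depends on the hold conflict check.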

7. Control / Evidence Design

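Good AI record evidence is not a natural-language summary.

It should prove that:

  • The record was captured at or near the time of the event.
  • The record has not been altered without a trace.
  • Modifications and deletions carry a complete time-stamped audit trail.
  • Disposition was suspended once the legal hold took effect.
  • The eDiscovery export has no selective omissions.
  • Each redaction has a stated basis and a log entry.
  • The production package can be independently verified by a third party.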
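One common way to show that records were not altered without a trace is a hash-chained audit trail, in which each entry commits to the previous one. The sketch below illustrates the idea only; the function names and entry layout are assumptions, and a hash chain alone does not satisfy any specific regulatory storage requirement.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> dict:
    """Append an event whose hash commits to the previous entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    entry = {"event": event, "prev_hash": prev_hash,
             "entry_hash": _entry_hash(prev_hash, event)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

In practice the latest chain hash would also be anchored somewhere the application administrators cannot rewrite, otherwise the whole chain could be regenerated.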

Recommended controls:

| Control | Evidence |
| --- | --- |
| Record capture completeness | event count reconciliation, dropped event report |
| Retention classification | policy decision id, classifier version, reviewer override |
| Immutable preservation | WORM setting or audit-trail chain |
| Legal hold propagation | hold id, affected systems, acknowledgement, exception list |
| Deletion block | disposition engine shows active hold conflict |
| Export integrity | hash manifest, chain-of-custody, access log |
| Audit replay | sample case reconstructed from source to final action |
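For the export integrity control, a hash manifest lets the recipient confirm that the package is complete and unmodified. The following is a minimal sketch over in-memory byte strings; `build_manifest`, `verify_manifest` and the problem labels are hypothetical names, and a real export would read files from the production set.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> list[dict]:
    """One entry per exported file: name, SHA-256 digest and size in bytes."""
    return [
        {"name": name,
         "sha256": hashlib.sha256(files[name]).hexdigest(),
         "size": len(files[name])}
        for name in sorted(files)
    ]


def verify_manifest(files: dict[str, bytes], manifest: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the package matches."""
    expected = {entry["name"]: entry["sha256"] for entry in manifest}
    problems = []
    for name in sorted(set(expected) | set(files)):
        if name not in files:
            problems.append(f"missing:{name}")
        elif name not in expected:
            problems.append(f"unlisted:{name}")
        elif hashlib.sha256(files[name]).hexdigest() != expected[name]:
            problems.append(f"altered:{name}")
    return problems
```

Reporting missing, unlisted and altered items separately helps distinguish a selective omission from accidental corruption during transfer.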

8. Interview Questions

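Q1: How does an AI record differ from an ordinary log?

An ordinary log mainly states what happened in the system. An AI record must show which inputs, sources, model versions, outputs, approvals, tool actions and customer communications in an AI workflow constitute business, regulatory, legal or audit-relevant evidence, and it must be possible to retain, preserve, search, export and delete that evidence.

Q2: How would you design an AI retention architecture?

I would start with a record inventory and taxonomy, splitting prompt, RAG retrieval, tool call, approval, output, eval and incident into record objects. A retention policy engine then binds record category, business domain, trigger event, retention period, privacy class, legal hold eligibility and disposition rule. The storage layer must support WORM or a complete audit trail, and legal hold must be able to suspend deletion.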

Legal hold is not a tag on a single database. It must freeze deletion and overwriting of the relevant ESI, and its scope may include prompt, chat transcript, RAG source, vector index metadata, tool call, approval, model config, eval, incident and customer output. The system must record hold trigger, scope, notice, acknowledgement, collection, release and deletion conflict.
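Tracking which systems have confirmed a hold can be sketched as below. This is a simplified illustration: the class `HoldPropagation`, its methods and the system names in the usage lines are assumptions, and it covers only the acknowledgement step, not notice, collection or release.

```python
from dataclasses import dataclass, field


@dataclass
class HoldPropagation:
    """Hypothetical tracker for cross-system legal hold acknowledgements."""
    hold_id: str
    affected_systems: list
    acknowledged: set = field(default_factory=set)

    def acknowledge(self, system: str) -> None:
        """Record that a system has confirmed the hold is applied."""
        if system not in self.affected_systems:
            raise ValueError(f"{system} is not in scope for {self.hold_id}")
        self.acknowledged.add(system)

    def exception_list(self) -> list:
        """Systems that have not yet confirmed the hold."""
        return [s for s in self.affected_systems if s not in self.acknowledged]

    def fully_propagated(self) -> bool:
        return not self.exception_list()


# Usage with illustrative system names.
hold = HoldPropagation("H-001", ["app_db", "vector_store", "object_store"])
hold.acknowledge("app_db")
```

Until `exception_list()` is empty, deletion and overwrite jobs in the unacknowledged systems remain a preservation risk that needs an owner.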

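A simple product-layer delete is not acceptable. Architecturally, a policy decision point should determine the precedence among retention expiry, privacy request, regulatory retention and active legal hold. An active legal hold usually suspends disposition, but the specific handling must be confirmed by Legal, Privacy and Compliance under applicable law. The system must retain the denial / deferral reason and the disposition workflow that runs after release.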
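A policy decision point for disposition might look like the sketch below. The precedence it encodes (active hold first, then unexpired retention, then disposal) is an assumption made for illustration and must be replaced by the order that Legal, Privacy and Compliance confirm. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DispositionDecision:
    action: str   # "dispose" or "defer"
    reason: str   # kept as the denial / deferral evidence


def decide_disposition(*, today: date, retention_expires: date,
                       active_hold_ids: list,
                       privacy_request: bool) -> DispositionDecision:
    """Assumed precedence: legal hold > unexpired retention > disposal."""
    if active_hold_ids:
        holds = ",".join(sorted(active_hold_ids))
        return DispositionDecision("defer", f"active_legal_hold:{holds}")
    if today < retention_expires:
        reason = ("privacy_request_deferred_by_retention"
                  if privacy_request else "retention_not_expired")
        return DispositionDecision("defer", reason)
    reason = "privacy_request" if privacy_request else "retention_expired"
    return DispositionDecision("dispose", reason)
```

Every decision, including deferrals, would be written to the audit trail so that the reason is still available when the hold is released and disposition resumes.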

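Q5: How do you prove that an AI output can be reconstructed for a regulator or an audit?

A replay package is required: user prompt, system prompt version, model version, retrieved source ids, source content hash, output, human edits, approval, tool call, final customer message, policy decision, audit trail and chain-of-custody. A final summary alone cannot prove what the output was based on at the time or that the controls operated.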
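A replay package can be checked for completeness before it is relied on. The sketch below uses the components listed in the answer; the snake_case key names and the function `missing_components` are assumptions for illustration.

```python
# Component keys derived from the replay package list; names are illustrative.
REQUIRED_COMPONENTS = (
    "user_prompt", "system_prompt_version", "model_version",
    "retrieved_source_ids", "source_content_hash", "output",
    "human_edits", "approval", "tool_call", "final_customer_message",
    "policy_decision", "audit_trail", "chain_of_custody",
)


def missing_components(package: dict) -> list:
    """Return required components that are absent or explicitly None."""
    return [key for key in REQUIRED_COMPONENTS if package.get(key) is None]
```

An empty list or an explicit "not applicable" marker counts as present here, so a workflow with no tool call should record that fact instead of omitting the key.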


9. Common Pitfalls

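| Pitfall | Why it fails | Better design |
| --- | --- | --- |
| Saving only the final answer | Cannot prove source, version, approval or tool action | preserve record chain |
| Keeping all AI telemetry forever | Privacy, cost and litigation exposure are too high | classify and schedule |
| Legal hold freezes only the app DB | vector store, object store, logs and exports may still be deleted | cross-system hold propagation |
| RAG source has no version | Cannot show what the model saw at the time | source version and content hash |
| Approval is not bound to the output | Content or parameters can change after approval | approval packet and action hash |
| eDiscovery export is manual SQL | Not repeatable, weak chain-of-custody | controlled export workflow |
| Deletion job ignores holds | spoliation and regulatory risk | disposition conflict check |
| Vendor stores the records but the institution cannot obtain them promptly | production failure | contract, system inventory, export SLA |
| Audit trail can be rewritten by an administrator | Evidence is not credible enough | immutable or tamper-evident trail |
| No record owner | Nobody maintains the policy | RACI and periodic review |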

10. Final Operating Principle

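The maturity of an AI records architecture is not measured by how many logs it keeps.

Maturity depends on knowing what counts as a record, why it is retained, how deletion stops under a hold, how records are reasonably produced on request, how they are lawfully disposed of once the period expires, and how the business facts that AI influenced are reconstructed in an audit or dispute.

For a senior AI PM / Senior BA / Architect, this is a core capability: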

Turn AI behavior into governed, retained, discoverable and defensible records.