AI Document Intelligence: Unstructured Data and Evidence Quality Architecture
AI Document Intelligence / Unstructured Data / Evidence Quality Architecture: An Explainer
Audience: CBAP+ Senior BA / Advanced AI PM / Product Architect / Enterprise Architect / Operations Architect / Model Risk / Records Management / Fraud Risk / KYC-KYB Operations / Claims and Disputes Lead / Loan and Insurance Servicing Product Owner.
Core question: How can a retail financial AI system turn bank statements, paystubs, claim packages, dispute evidence, KYC/KYB files, insurance and loan servicing documents, and operational correspondence from unstructured documents into evidence-grade, auditable, reviewable, workflow-ready facts? It must do this while controlling OCR/layout/multimodal extraction, classification, entity extraction, summarization, confidence scoring, human review, document provenance, records retention, legal hold, fraud/tamper checks and model risk.
Learning objectives: Build a document intelligence reference architecture, an evidence quality model, document provenance and chain of custody, confidence and review design, records/legal hold integration, workflow automation controls, fraud/tamper detection, model risk governance, and a senior PM/architect decision framework.
0. Disclaimer
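This document is learning, architecture-training and portfolio material. It is not legal advice, a regulatory opinion, a records retention determination, e-discovery advice, a KYC/KYB sufficiency judgment, a loan or insurance underwriting conclusion, a consumer dispute resolution opinion, a fraud disposition instruction, a model validation report or a vendor recommendation.
Formal projects must be assessed jointly by Legal, Compliance, Privacy, Records Management, Information Governance, Model Risk, Fraud Risk, Financial Crime, Operations, Product, Architecture, Information Security, Data Governance, Vendor Management, Internal Audit and the relevant business owners. Whether and how requirements apply to records, evidence, legal hold, customer notices, KYC/KYB, credit, insurance, complaints, disputes, claims, cross-border data and e-discovery depends on several factors. These include product, record type, jurisdiction, retention schedule, legal hold status, customer segment, channel, policy, contract, and Legal / Compliance / Records interpretation.
This document does not reduce document intelligence to an OCR tutorial. OCR is only one capability: turning images into text. Retail financial services actually need an evidence-grade extraction architecture. That architecture must be able to show:
- where a document came from and whether it is complete
- which page and region each field was extracted from
- how confidence is calibrated and when human review is required
- how results enter workflows and how records are retained
- how legal hold is handled and how tampering and fraud are detected
- how decision evidence can be replayed after the fact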
Source Anchors
| Source | Link | Purpose |
|---|---|---|
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Organizes document AI risk governance, evals, monitoring, human oversight, and incident and evidence controls under Govern / Map / Measure / Manage |
| NIST Privacy Framework | https://www.nist.gov/privacy-framework | Uses privacy risk management, data minimization, purpose, processing, and access and monitoring to set boundaries for document data collection, extraction, use and retention |
| NARA Records Management | https://www.archives.gov/records-mgmt | Official anchor for records retention and evidence management: records lifecycle, disposition, records program and accountability |
| NARA Electronic Records Management | https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html | Anchor for the electronic records architecture and preservation discussion: electronic record formats, metadata, and transfer/readiness guidance |
| CFPB Consumer Complaint Database | https://www.consumerfinance.gov/data-research/consumer-complaints/ | A consumer complaint and complaint operations lens for checking document evidence trace, dispute handling, case explanation and the operational learning loop |
| FFIEC Authentication and Access to Financial Institution Services and Systems | https://www.ffiec.gov/press/pr081121.htm | Applies financial institution authentication, access control, risk assessment and layered security thinking to document intake, reviewer access, workflow actions and privileged operation controls |
| ISO/IEC 42001 overview | https://www.iso.org/standard/42001 | Uses AI management system roles, operation, performance evaluation, internal audit and continual improvement to build the document AI operating model |
In one sentence:
Document intelligence is not "OCR + LLM summary". In financial operations, it is an evidence system that converts unstructured documents into policy-bound, source-linked, confidence-calibrated, human-reviewable and records-aware decision inputs.
1. Thesis
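The core goal of retail financial document intelligence is not "reading documents faster". The goal is to turn documents into dependable operational evidence. A mature architecture performs the following transformation: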
from: uploaded PDF / scanned image / email attachment / photo / fax
to: evidence envelope
+ document provenance
+ page/layout map
+ extracted entities with source coordinates
+ normalized facts
+ confidence and validation results
+ fraud/tamper signals
+ human review decisions
+ workflow action
+ records retention / legal hold metadata
+ replayable audit trail
Core judgments:
Text recognized does not mean fact established.
A model summary does not mean evidence accepted.
Confidence score does not mean business risk resolved.
Human review does not mean control effectiveness unless review is designed, sampled and evidenced.
Document storage does not mean records compliance.
A senior PM / Architect should design document AI as a four-layer system:
| Layer | Goal | Key questions |
|---|---|---|
| Capture and provenance | Prove where the document came from, when it arrived, whether it is complete and whether it has been processed | source channel, hash, version, customer/session/case binding, chain of custody |
| Extraction and understanding | Turn layout, text, tables, images, signatures, entities and relationships into structured evidence | OCR/layout/multimodal model, field mapping, normalization, source coordinates |
| Evidence quality and controls | Decide whether a field is usable by the business, when to route it to human review, and how to resolve conflicts | confidence calibration, validation rules, human review, fraud/tamper checks, policy gates |
| Workflow and records | Move evidence into claims/disputes/KYC/servicing workflows and keep a replayable record | case management, decision logs, records retention, legal hold, complaint linkage |
The most important architectural boundary is:
extraction result
!= verified fact
!= policy-accepted evidence
!= legal sufficiency
!= final business decision
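One way to make this boundary operational is to model each rung as an explicit state. Each state can only be reached from the one directly below it, and only with a recorded reason. The Python sketch below rests on that assumption. `EvidenceState` and `promote` are hypothetical names. Legal sufficiency is left out on purpose, because this document assigns that judgment to Legal, not to the system.

```python
from enum import IntEnum


class EvidenceState(IntEnum):
    """Hypothetical ordered states. Each rung is a separate claim about one field."""
    EXTRACTED = 1        # a model or rule produced a value
    VERIFIED = 2         # validations and cross-document checks passed
    POLICY_ACCEPTED = 3  # the evidence policy engine accepted it for this workflow
    DECIDED = 4          # a recorded business decision relied on it
    # Legal sufficiency is deliberately absent: the system cannot assign it.


def promote(current: EvidenceState, target: EvidenceState, reason: str) -> EvidenceState:
    """Move one rung up the ladder; refuse skipped rungs and unexplained promotions."""
    if target != current + 1:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    if not reason.strip():
        raise ValueError("a promotion needs a recorded reason")
    return target
```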
2. Why It Matters
Retail financial operations depend heavily on unstructured documents:
| Journey | Document examples | Business decision risk |
|---|---|---|
| Loan origination and servicing | bank statements, paystubs, tax forms, hardship letters, income proof, servicing correspondence | affordability, income verification, repayment plan, adverse action, servicing treatment |
| Insurance claims | claim forms, photos, repair invoices, medical bills, police reports, adjuster notes | claim eligibility, payout amount, fraud triage, escalation, customer communication |
| Payment disputes and chargebacks | receipts, merchant correspondence, tracking proof, screenshots, cardholder statements | dispute reason code, evidence package, representment, regulatory timelines |
| KYC/KYB onboarding and refresh | ID images, business registration, ownership docs, licenses, utility bills, board resolutions | identity/entity evidence, authority, beneficial ownership, sanctions/financial crime review |
| Operations and complaints | complaint letters, email threads, call transcripts, agent notes, documents attached to cases | issue classification, remediation, response evidence, root cause analysis |
| Account maintenance | name/address change proof, death certificate, power of attorney, court order, consent forms | entitlement, authority, privacy, account access, legal/ops escalation |
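AI amplifies three kinds of risk:
- Scale risk: a single extraction defect can affect thousands of cases at once.
- Semantic risk: the model turns "looks like a paystub" into "income verified".
- Evidence risk: after a business action, nobody can prove which document, model, version or reviewer a field came from.
The Senior PM / Architect's goal is not "the highest automation rate" but to: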
Use automation where evidence quality is sufficient,
route ambiguity to the right human queue,
preserve proof of what was seen and decided,
and keep records, privacy, fraud and model risk controls in the same workflow.
3. Evidence Object Taxonomy
Document AI must distinguish at least the following eight object types. Mixing them up produces unauditable automation.
| Object | Definition | Example | Control implication |
|---|---|---|---|
| Raw document | Original file submitted by a customer, merchant, employee, third party or system | PDF statement, photo paystub, email attachment | Retain original hash, source channel, received time, case binding |
| Rendered page | Page image or normalized PDF page rendered by the system | page 3 of bank statement | Record renderer version, page count, image quality |
| Layout element | Table, paragraph, checkbox, signature block, stamp, logo, field region | paystub earnings table | Requires coordinates, reading order, table structure |
| Extracted field | Field and value extracted from the document | gross pay = 4,850.00 | Requires source coordinates, confidence, parser/model version |
| Normalized entity | Entity after standardization and mapping to the business dictionary | employer_name, account_holder, claimant, policy_number | Requires normalization rule, entity resolution evidence |
| Derived fact | Fact computed from multiple fields or rules | average monthly deposit, income variance, coverage period | Requires formula, input fields, calculation version |
| Decision evidence | Evidence accepted by business policy or confirmed by a human | accepted income evidence for case X | Requires policy decision, reason code, reviewer/action trace |
| Summary | Source-linked summary for a reviewer or customer | claim package summary | Must cite sources; cannot replace the original evidence |
Suggested field-level evidence metadata:
field_name
document_id
document_version
page_number
bounding_box_or_anchor
raw_text
normalized_value
extraction_method
model_or_rule_version
confidence_score
calibration_bucket
validation_results
cross_document_match
fraud_or_tamper_signals
human_review_status
policy_acceptance_status
evidence_retention_rule
legal_hold_flag
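The metadata list above can be carried as one immutable record per field. The sketch below is an illustration, not a schema standard. The types, the `is_source_bound` and `validations_passed` helpers, and the example values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FieldEvidence:
    """One extracted field plus the metadata listed above (types are assumptions)."""
    field_name: str
    document_id: str
    document_version: int
    page_number: int
    bounding_box_or_anchor: Optional[tuple]   # e.g. (x0, y0, x1, y1) or a table-cell id
    raw_text: str
    normalized_value: Optional[str]
    extraction_method: str                    # e.g. "layout_model", "regex_rule"
    model_or_rule_version: str
    confidence_score: float
    calibration_bucket: str
    validation_results: dict = field(default_factory=dict)   # check name -> passed
    cross_document_match: Optional[bool] = None
    fraud_or_tamper_signals: tuple = ()
    human_review_status: str = "not_reviewed"
    policy_acceptance_status: str = "pending"
    evidence_retention_rule: Optional[str] = None
    legal_hold_flag: bool = False

    def is_source_bound(self) -> bool:
        """A field without document, page and location cannot become evidence."""
        return bool(self.document_id) and self.page_number >= 1 and bool(self.bounding_box_or_anchor)

    def validations_passed(self) -> bool:
        """False when no validation ran: absence of checks is not a pass."""
        return bool(self.validation_results) and all(self.validation_results.values())
```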
4. Reference Architecture Model
Reference architecture:
intake channels
-> document capture and provenance service
-> file normalization / rendering / virus and content safety scan
-> document classification and package splitting
-> OCR + layout understanding + table extraction
-> multimodal extraction / entity extraction / relationship mapping
-> normalization and business validation
-> confidence calibration and quality scoring
-> fraud / tamper / duplicate / synthetic-document checks
-> evidence policy engine
-> human review and exception queues
-> workflow integration: KYC, claims, disputes, servicing, complaints
-> records retention / legal hold / disposition integration
-> evidence ledger, monitoring, QA and model governance
Key components:
| Component | Responsibility | Senior design question |
|---|---|---|
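| Intake gateway | Receives uploads, email, fax, branch scans, mobile capture, API and vendor feeds | Is each document bound to a customer/case/session, and are channel risk and consent context recorded? |
| Provenance service | Generates document id, hash, timestamp, source, custody events, version | Can we prove afterwards that the document was not replaced or silently rewritten? |
| Classification service | Determines document type, subtype, issuer/source, language, quality, package structure | Will a misclassification send the document into the wrong workflow or the wrong retention rule? |
| Layout and OCR service | Recognizes text, reading order, tables, checkboxes, signature/stamp blocks | Are tables and multi-column reading order validated, and are coordinates preserved? |
| Multimodal extraction | Combines text, layout, images and tables to extract fields and relationships | Is model output constrained by a schema, and can its source be shown? |
| Entity normalization | Standardizes names, addresses, amounts, dates, last four account digits, business names, policy numbers | Is normalization replayable, and is the original text kept? |
| Validation and reconciliation | Runs consistency checks across pages, documents, system records and third-party data | How do conflicts enter review, and how do we keep a summary from masking them? |
| Confidence engine | Field-, document- and case-level confidence and calibration | Are thresholds defined by field criticality and journey risk? |
| Fraud/tamper service | Checks edit traces, metadata anomalies, duplicates, template abuse, image manipulation | Is the output treated only as a signal, never as a substitute for fraud investigation? |
| Evidence policy engine | Decides whether an extraction is acceptable for the current business process | Are extraction, validation, review and policy acceptance kept separate? |
| Human review workbench | Shows reviewers source-linked fields, conflicts, model rationale and history | Can reviewers locate evidence quickly and leave a structured decision? |
| Records and hold connector | Assigns record class, retention schedule, legal hold flag, disposition controls | Are retention and hold inherited from case/workflow state, and are they auditable? |
| Evidence ledger | Stores document, model, rule, review, workflow action and final communication trace | Can the full history be replayed for a complaint or audit? |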
5. Document Classes and Evidence Risk
Not every document should share the same automation threshold. Tier documents by business impact, field criticality, fraud exposure and records sensitivity.
| Document class | High-value fields | Special risks | Architecture control |
|---|---|---|---|
| Bank statements | account holder, institution, statement period, balances, deposits, NSF, account number mask | altered PDF, missing pages, fake bank template, income misclassification | page completeness, transaction table validation, institution/logo metadata checks, human review for high-impact use |
| Paystubs | employer, employee, pay period, gross/net pay, YTD income, deductions | generated fake paystub, mismatched employer, inconsistent YTD | arithmetic checks, pay period consistency, employer/entity validation, duplicate template detection |
| Claims documents | claim number, loss date, policy number, coverage, invoices, photos, police/medical records | inflated invoices, reused photos, inconsistent event timeline | package timeline, image metadata, duplicate media search, adjuster review |
| Dispute packages | transaction details, merchant evidence, shipping proof, customer assertion, reason codes | weak evidence, missing required proof, model over-summary | reason-code-specific evidence checklist, source-linked package summary, SLA controls |
| KYC/KYB documents | identity data, business registration, ownership, license, authorized signer | stale documents, entity mismatch, authority ambiguity | freshness policy, entity resolution, legal/compliance review boundary, beneficial ownership evidence routing |
| Insurance/loan servicing docs | hardship reason, income/expense, death/POA/court order, address/name change | authority and entitlement errors, sensitive data leakage | privileged workflow controls, dual review for authority, records/hold metadata |
| Operational correspondence | complaint letters, emails, agent notes, attachments | missed complaint, wrong product classification, incomplete response evidence | complaint taxonomy, case linkage, final response capture, CFPB-style complaint learning loop |
Senior design principles:
- Let document class determine extraction schema, review threshold, retention class, fraud checks and workflow route.
- Apply field-level controls to high-impact fields instead of relying only on document-level confidence.
- Surface summaries with source-linked citations inside the case tool; a summary must never be the only evidence.
- Govern customer-provided documents separately from institution-generated records.
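One way to let document class drive schema, thresholds, retention class, fraud checks and routing is a single registry keyed by class. This is a hypothetical sketch. Every name and number in it, including the placeholder retention labels, is invented for illustration. Real values come from product policy and Records/Legal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentClassPolicy:
    schema: tuple                # fields the extractor must return
    review_thresholds: dict      # field -> minimum calibrated confidence
    retention_class: str         # placeholder label; the real value comes from Records/Legal
    fraud_checks: tuple
    workflow_route: str

    def threshold(self, field_name: str) -> float:
        # Fields without an agreed threshold default to 1.0, i.e. always reviewed.
        return self.review_thresholds.get(field_name, 1.0)


# Hypothetical entries for illustration; none of these values is a recommendation.
REGISTRY = {
    "paystub": DocumentClassPolicy(
        schema=("employer", "pay_period_end", "gross_pay", "net_pay", "ytd_gross"),
        review_thresholds={"employer": 0.95, "gross_pay": 0.98, "ytd_gross": 0.98},
        retention_class="RC-INCOME-EVIDENCE",
        fraud_checks=("arithmetic", "template_duplicate"),
        workflow_route="income_verification",
    ),
    "bank_statement": DocumentClassPolicy(
        schema=("account_holder", "institution", "statement_period", "ending_balance"),
        review_thresholds={"account_holder": 0.97, "ending_balance": 0.98},
        retention_class="RC-INCOME-EVIDENCE",
        fraud_checks=("page_completeness", "pdf_metadata", "template_registry"),
        workflow_route="income_verification",
    ),
}


def policy_for(doc_type: str) -> DocumentClassPolicy:
    """Unknown classes never fall back to a default automation policy."""
    if doc_type not in REGISTRY:
        raise LookupError(f"no policy for {doc_type!r}; route to intake review")
    return REGISTRY[doc_type]
```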
6. Evidence-Grade Extraction Pipeline
6.1 Intake and Capture
| Control question | Strong pattern |
|---|---|
| Where did the document come from? | source channel, user/session, case id, upload event, IP/device/risk context where permitted |
| Is the original file retained? | immutable raw artifact storage + hash + version pointer |
| Is it complete? | page count, file size, render success, missing/blank page check |
| Is it safe? | malware scan, file type validation, macro/script blocking, content safety routing |
| Is it accessible? | mobile capture quality feedback, supported formats, assisted channel |
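The intake questions above can start with a small, deterministic record computed before any model runs. This sketch assumes a short allow-list of file signatures and a hypothetical size limit. Page-count and blank-page checks would need a PDF renderer, so they are left out.

```python
import hashlib
from datetime import datetime, timezone

# Assumed allow-list of file signatures ("magic bytes"). A signature check only
# confirms the container format; malware scanning is still a separate step.
ALLOWED_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def intake_record(raw: bytes, source_channel: str, case_id: str,
                  max_bytes: int = 25_000_000) -> dict:
    """Build a minimal intake record for an uploaded file (field names are hypothetical)."""
    detected = next((mime for sig, mime in ALLOWED_SIGNATURES.items() if raw.startswith(sig)), None)
    problems = []
    if not raw:
        problems.append("empty_file")
    if detected is None:
        problems.append("unsupported_or_mislabelled_type")
    if len(raw) > max_bytes:
        problems.append("file_too_large")
    return {
        "sha256": hashlib.sha256(raw).hexdigest(),  # hash of the exact bytes received
        "size_bytes": len(raw),
        "detected_type": detected,
        "source_channel": source_channel,
        "case_id": case_id,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "intake_problems": problems,
    }
```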
6.2 Classification and Package Splitting
Classification is not a UI label. It is the entry point for workflow, risk and records decisions.
| Classification output | Why it matters |
|---|---|
| document_type and subtype | Determines extraction schema and workflow queue |
| issuer/source class | Determines trust level and fraud checks |
| language and locale | Determines OCR/model choice and reviewer routing |
| package boundaries | Splits a multi-document PDF into statement, paystub, invoice, letter |
| confidence and ambiguity | Low-confidence classifications go to intake review so they do not enter the wrong process |
| record category candidate | Input to downstream records retention / legal hold integration |
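Ambiguity routing can look at two things: the top score, and its margin over the runner-up. Checking the margin keeps two near-tied classes from being silently resolved. The thresholds and route names below are assumptions for illustration.

```python
def route_classification(scores: dict, min_top: float = 0.90,
                         min_margin: float = 0.15) -> tuple:
    """Return (doc_type, route) from per-class scores.

    Both thresholds are illustrative assumptions. A document goes to intake
    review when the best class is weak OR too close to the runner-up.
    """
    if not scores:
        return "unknown", "intake_review"
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_type, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score < min_top or top_score - runner_up < min_margin:
        return top_type, "intake_review"
    return top_type, f"extraction:{top_type}"
```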
6.3 Layout, OCR and Multimodal Understanding
The architectural concern is not the details of the OCR algorithm. It is evidence recoverability:
| Capability | Evidence requirement |
|---|---|
| Text recognition | raw OCR text, confidence, language, page, text anchors |
| Layout detection | paragraphs, tables, cells, checkboxes, signature areas, stamps, reading order |
| Table extraction | row/column coordinates, header mapping, merged cell handling, totals validation |
| Image understanding | photo/document boundary, logo/stamp/signature presence, damage or blur |
| Multimodal extraction | field value must link back to visual/text source, not only generated answer |
| Summarization | source-linked, scoped to reviewer need, not used as record replacement |
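A practical test of "field value must link back to visual/text source" is to re-read the OCR tokens at the cited location. Those tokens are then compared with the field's `raw_text`. This sketch assumes OCR output arrives as tokens with page-level boxes. `OcrToken`, `text_in_region` and `is_grounded` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrToken:
    text: str
    page: int
    box: tuple  # (x0, y0, x1, y1) in page coordinates; layout is an assumption


def _center_inside(box: tuple, region: tuple) -> bool:
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def text_in_region(tokens: list, page: int, region: tuple) -> str:
    """Concatenate OCR tokens whose centre lies in the region.

    Reading order is approximated as top-to-bottom, left-to-right; a real
    layout model handles columns and tables better.
    """
    inside = [t for t in tokens if t.page == page and _center_inside(t.box, region)]
    inside.sort(key=lambda t: (t.box[1], t.box[0]))
    return " ".join(t.text for t in inside)


def is_grounded(raw_text: str, tokens: list, page: int, region: tuple) -> bool:
    """True only if the claimed raw_text matches what OCR saw at the cited location."""
    def norm(s: str) -> str:
        return " ".join(s.split())
    return norm(raw_text) == norm(text_in_region(tokens, page, region))
```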
6.4 Entity Extraction and Normalization
Entity extraction must be tiered by field criticality:
| Field type | Examples | Control |
|---|---|---|
| Identity/entity | name, DOB, business name, beneficial owner, authorized signer | source coordinate + normalization + conflict check + high-risk review |
| Monetary | gross pay, net pay, deposit amount, invoice amount, claim amount | arithmetic validation, currency, period, outlier checks |
| Temporal | statement period, pay period, loss date, coverage dates, received date | date normalization, timeline consistency |
| Authority/relationship | POA, signer role, officer, policyholder, claimant | human/legal/ops review triggers based on product policy |
| Operational | reason code, claim type, complaint category, servicing request | workflow route and SLA impact, QA sampling |
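The monetary and temporal rows above can be applied to paystubs as plain arithmetic over normalized values. This sketch uses `Decimal` rather than floats so that cent amounts compare exactly. The field names are hypothetical, and the set of checks is illustrative rather than complete.

```python
from datetime import date
from decimal import Decimal


def validate_paystub(f: dict) -> dict:
    """Arithmetic and temporal checks on normalized paystub fields.

    Field names are hypothetical; amounts are assumed already normalized to plain
    decimal strings (no thousands separators) and dates to ISO 8601.
    """
    gross, net = Decimal(f["gross_pay"]), Decimal(f["net_pay"])
    deductions = sum((Decimal(d) for d in f["deductions"]), Decimal("0"))
    start = date.fromisoformat(f["period_start"])
    end = date.fromisoformat(f["period_end"])
    pay_date = date.fromisoformat(f["pay_date"])
    return {
        "net_equals_gross_minus_deductions": gross - deductions == net,
        "net_not_above_gross": net <= gross,
        "ytd_at_least_current_gross": Decimal(f["ytd_gross"]) >= gross,
        "period_start_before_end": start <= end,
        "pay_date_not_before_period_start": pay_date >= start,
    }
```

A single failed check here is a validation result to route and review, not proof of tampering.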
7. Confidence Architecture
Confidence is not a pretty score. It is a control input for routing, review, policy acceptance and monitoring.
| Confidence level | Definition | Example |
|---|---|---|
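| Character/text confidence | How reliably OCR recognized a specific character or token | 8 vs B in account mask |
| Field confidence | The model's probability or score that a field value is correct | pay period end date |
| Layout confidence | Whether table structure, reading order and checkbox states are reliable | deductions table |
| Document classification confidence | Whether document type and subtype are correct | paystub vs payroll summary |
| Cross-validation confidence | Whether a field agrees with internal systems, other documents and rules | YTD income vs pay period |
| Fraud/tamper confidence | Strength of the signals that the document was altered or forged | PDF metadata anomaly |
| Case evidence confidence | Whether the whole case package is sufficient to move to the next step | income evidence accepted for review |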
Weak pattern:
if model_confidence > 0.85 then auto-approve
Strong pattern:
if field is low impact and confidence calibrated and validations pass
then auto-populate with audit trace
if field is high impact or conflicts with another source
then route to human review
if document class has fraud/tamper signal or legal/authority implication
then require specialized queue
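The strong pattern translates directly into an ordered routing function. This sketch rests on some assumptions. `FieldResult` and the route names are hypothetical. The final fall-through sends anything not explicitly eligible for automation to human review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldResult:
    name: str
    high_impact: bool
    calibrated_confidence: float
    threshold: float                 # per-field threshold from the document class policy
    validations_passed: bool
    conflicts_with_other_source: bool


def route(field: FieldResult, fraud_or_tamper_signal: bool, legal_or_authority: bool) -> str:
    """Ordered routing: checks run from most to least specialised, and anything
    that falls through goes to human review (fail closed)."""
    if fraud_or_tamper_signal or legal_or_authority:
        return "specialist_queue"
    if field.high_impact or field.conflicts_with_other_source:
        return "human_review"
    if field.validations_passed and field.calibrated_confidence >= field.threshold:
        return "auto_populate_with_audit_trace"
    return "human_review"
```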
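Confidence design points:
- Define thresholds by field criticality, journey risk, customer harm and fraud exposure.
- Never let a document-level average mask an error in a critical field.
- Confidence must be calibrated, and drift must be monitored against reviewer outcomes.
- For high-impact decisions, use validation + review + policy, not the score alone.
- Reviewer overrides feed the feedback loop. They do not automatically train the model unless data governance and model governance have approved it.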
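Calibration can be checked against reviewer outcomes with expected calibration error (ECE). The method groups fields into confidence bins and compares each bin's mean confidence with the share of values reviewers confirmed as correct. Below is a minimal sketch, assuming equal-width bins and boolean reviewer outcomes.

```python
def expected_calibration_error(confidences: list, correct: list, n_bins: int = 10) -> float:
    """Expected calibration error over equal-width confidence bins.

    `correct` holds reviewer or QA outcomes (True = the extracted value was right).
    ECE = sum over bins of (bin size / N) * |mean confidence - observed accuracy|.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 lands in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            ece += len(members) / total * abs(mean_conf - accuracy)
    return ece
```

A rising ECE on recent reviewer outcomes is one possible drift signal. The bin count and alert level are governance choices that this sketch does not settle.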
8. Human Review Design
Human review is not "a manual fallback for when automation fails". It is part of the evidence architecture.
| Review pattern | Use case | Control requirement |
|---|---|---|
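| Intake review | Uncertain classification, poor file quality, missing pages, format anomalies | Confirm document type and package completeness; reroute |
| Field review | High-impact field with low confidence or a conflict | Reviewer sees original text, coordinates, candidate values and rule failure reasons |
| Specialist review | authority, legal document, KYB ownership, fraud signal | Assign to a queue with the right qualifications/permissions; record rationale |
| QA sampling | Sample of cases that passed automation | Estimate false accept / false extraction rates; trigger model/control tuning |
| Dual control | High-value claim, sensitive servicing action, authority change | second reviewer or approver, separation of duties |
| Complaint review | Customer challenges the document handling or an AI conclusion | Link evidence ledger, AI run, reviewer action, final response |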
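The reviewer UI should:
- Show the original document/page/image on the left and structured fields on the right; clicking a field jumps to its source.
- Display model output, confidence, validation failures and previous reviewer actions.
- Never show an "AI says approve" cue that invites rubber-stamping.
- Explain fraud/risk signals only to the minimum necessary, to avoid leaking sensitive rules.
- Require a structured reason code, plus a free-text rationale where needed.
- Record old value, new value, reason, reviewer, timestamp and policy version for every override.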
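The override requirements above map onto an append-only log entry with validated reason codes. This is a hypothetical sketch. The reason-code catalogue and the rule that `OTHER` needs free text are both assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical reason-code catalogue; real codes come from the review policy.
REASON_CODES = {"OCR_MISREAD", "WRONG_FIELD_MAPPED", "SOURCE_CONFLICT", "OTHER"}


@dataclass(frozen=True)
class OverrideRecord:
    field_name: str
    old_value: str
    new_value: str
    reason_code: str
    rationale: str
    reviewer_id: str
    policy_version: str
    timestamp: str


def record_override(log: list, *, field_name: str, old_value: str, new_value: str,
                    reason_code: str, reviewer_id: str, policy_version: str,
                    rationale: str = "") -> OverrideRecord:
    """Validate and append one override; earlier entries are never edited."""
    if reason_code not in REASON_CODES:
        raise ValueError(f"unknown reason code {reason_code!r}")
    if reason_code == "OTHER" and not rationale.strip():
        raise ValueError("reason code OTHER requires a free-text rationale")
    if old_value == new_value:
        raise ValueError("an override must change the value")
    rec = OverrideRecord(field_name, old_value, new_value, reason_code, rationale,
                         reviewer_id, policy_version,
                         datetime.now(timezone.utc).isoformat())
    log.append(rec)
    return rec
```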
9. Document Provenance and Chain of Custody
Evidence-grade document AI must be able to answer:
Who or what submitted the document?
When was it received?
Which exact file and page were processed?
Which model/rule/version extracted the field?
Was the document changed, re-rendered, split, redacted or reprocessed?
Who reviewed or overrode the result?
Which workflow decision used the evidence?
Which final customer or counterparty communication referenced it?
Provenance controls:
| Control | Evidence |
|---|---|
| Immutable raw artifact | file hash, storage location, write-once policy where applicable |
| Versioned derived artifacts | rendered pages, OCR JSON, layout graph, extracted field set |
| Processing lineage | model id/version, prompt/template id, parser version, ruleset version |
| Source coordinate binding | page number, bounding box, table cell id, paragraph anchor |
| Custody event log | uploaded, scanned, normalized, classified, reviewed, redacted, exported |
| Access trace | reviewer, system, service account, vendor access, download/export events |
| Workflow linkage | case id, task id, decision id, communication id |
| Records metadata | record class, retention rule, hold flag, disposition state |
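A common way to make a custody event log tamper-evident is hash chaining. Each entry's hash covers the previous entry's hash, so editing an earlier event invalidates everything after it. Below is a minimal sketch with hypothetical field names. It has two limits. It only detects tampering if the latest hash is also anchored somewhere the editor cannot rewrite, for example write-once storage. On its own, it also cannot detect a truncated tail.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log: list, document_id: str, event: str, actor: str,
                 artifact_sha256: str, at: str) -> dict:
    """Append a custody event whose hash covers the previous entry's hash."""
    body = {
        "seq": len(log),
        "document_id": document_id,
        "event": event,               # e.g. uploaded, classified, redacted, exported
        "actor": actor,
        "artifact_sha256": artifact_sha256,
        "at": at,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    entry = {**body, "entry_hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited, reordered or removed-from-the-middle entry breaks the chain."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["seq"] != i or body["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```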
LLM output must be grounded:
summary_claim:
text: "The claimant submitted two repair invoices totaling 3,420.00."
sources:
- document_id: D123
page: 2
field: invoice_total
value: 1,850.00
- document_id: D124
page: 1
field: invoice_total
value: 1,570.00
model_run_id: MR789
reviewer_status: reviewed
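Grounding can be checked mechanically for numeric claims like the one above. This hypothetical sketch parses the stated total from the summary text and compares it with the sum of the cited source values. The regex handles only this sentence shape and is an assumption.

```python
import re
from decimal import Decimal


def _amount(text: str) -> Decimal:
    return Decimal(text.replace(",", ""))  # assumes "1,850.00"-style amounts


def total_is_grounded(summary: dict) -> bool:
    """Hypothetical check for the claim above: the total stated in the summary text
    must equal the sum of the cited source values, and every source must name a
    document and page."""
    match = re.search(r"totaling\s+([\d,]+\.\d{2})", summary["text"])
    sources = summary.get("sources", [])
    if match is None or not sources:
        return False
    if any("document_id" not in s or "page" not in s for s in sources):
        return False
    cited_sum = sum((_amount(s["value"]) for s in sources), Decimal("0"))
    return _amount(match.group(1)) == cited_sum
```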
10. Records Retention and Legal Hold Architecture
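Document intelligence often produces many derived artifacts: OCR text, layout JSON, extracted fields, summaries, review notes, decision logs, redacted copies and exports. The AI team cannot decide on its own whether these constitute records, how long they are kept or whether they fall under legal hold.
The architecture should provide configurable mechanisms: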
| Question | Architecture response |
|---|---|
| Which artifacts are records? | record classification service + Records/Legal interpretation |
| Are originals and extraction results retained the same way? | retention rule can differ by artifact type and case type |
| How does legal hold propagate? | hold flag propagates to raw doc, derived artifacts, case decisions, exports |
| How is disposition executed? | scheduled disposition workflow with approval, audit and exception handling |
| What happens to old results after reprocessing? | version lineage retained per policy; no silent overwrite |
| Does a vendor hold copies? | vendor data inventory, deletion/return evidence, contract controls |
| How does records search locate documents and AI artifacts? | metadata index with access controls and preservation status |
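Boundary principles:
- Do not assert a statutory retention period for any document type in product copy or architecture documents.
- Legal / Compliance / Records confirm the retention schedule, legal hold status, e-discovery and regulatory response.
- AI summaries cannot replace original records unless Records/Legal explicitly approve that artifact's purpose and retention treatment.
- Under legal hold, automated deletion, model training clean-up, vendor purge and data minimization jobs must all check hold state.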
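Hold propagation is a reachability question over the artifact lineage graph. This sketch assumes lineage is stored as parent-to-children edges. It also assumes every deletion path calls one gate function. Both function names are hypothetical.

```python
from collections import deque


def held_artifacts(derived_from: dict, held_roots: set) -> set:
    """Breadth-first walk from held artifacts to everything derived from them.

    `derived_from` maps an artifact id to the ids produced from it, e.g.
    raw file -> rendered pages -> OCR JSON -> fields -> summary -> export.
    Whether a hold on a derived artifact must also preserve its sources is a
    Legal/Records decision; this sketch only propagates downstream.
    """
    held, queue = set(held_roots), deque(held_roots)
    while queue:
        node = queue.popleft()
        for child in derived_from.get(node, []):
            if child not in held:
                held.add(child)
                queue.append(child)
    return held


def may_dispose(artifact_id: str, derived_from: dict, held_roots: set) -> bool:
    """Gate every deletion path (disposition, vendor purge, training clean-up)."""
    return artifact_id not in held_artifacts(derived_from, held_roots)
```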
11. Fraud, Tamper and Authenticity Checks
Document AI must assume that inputs may be manipulated. This applies especially to bank statements, paystubs, invoices, receipts, screenshots, photos and identity/KYB documents.
| Threat | Pattern | Controls |
|---|---|---|
| Altered PDF | Edited fields, metadata anomalies, inconsistent fonts/layers | PDF object analysis, metadata checks, visual inconsistency, reviewer alert |
| Fake template | Forged bank/employer/merchant templates | institution/template registry, logo/layout similarity, issuer validation where available |
| Missing pages | Statement missing key pages or terms/context | page count completeness, period continuity, expected section checks |
| Reused document | Same paystub / invoice / photo reused across multiple customers | perceptual hash, duplicate detection, cross-case risk signal |
| Synthetic paystub | YTD/pay period/tax/deduction inconsistencies | arithmetic and chronology validation |
| Screenshot manipulation | Cropping, splicing, overlays, or low quality used to evade checks | image metadata, edge artifacts, quality gate, source channel policy |
| Deepfake / generated image | AI-generated accident photos, receipts, signatures | media provenance, duplicate search, anomaly model, human/fraud review |
| Insider manipulation | An employee replaces, exports or rewrites evidence | access control, separation of duties, immutable logs, FFIEC-aligned layered controls |
| Prompt injection in documents | Instructions embedded in the document to make the LLM ignore its rules | tool isolation, deterministic extraction schema, prompt injection filters, output validation |
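A perceptual hash addresses the reused-document threat. Near-identical images, such as one re-saved or slightly brightened, produce nearly the same hash, while unrelated images do not. This sketch implements the simple average hash (aHash). It assumes the grid has already been resized to 8x8 grayscale. Production systems may prefer other perceptual hashes, and the distance cut-off is illustrative.

```python
def average_hash(gray: list) -> int:
    """Average hash (aHash) of an already downscaled grayscale grid, e.g. 8x8 with
    values 0-255. Resizing and grayscale conversion are assumed to happen upstream."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)  # one bit per pixel
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def likely_reused(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Near-duplicate signal only; the distance cut-off is an illustrative assumption."""
    return hamming(hash_a, hash_b) <= max_distance
```

A match is a cross-case risk signal for review, not evidence of fraud by itself.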
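Important boundaries:
- Fraud/tamper model output is a risk signal, not a final fraud determination.
- Customer communication should explain the business reason for requesting more information or review, without exposing internal detection rules.
- For high-risk documents, a combination of controls usually matters more than any single model: metadata + layout + arithmetic + external validation + human review + monitoring.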
12. Workflow Integration
The value of document intelligence comes from feeding business processes, not from sitting in an extraction dashboard.
| Workflow | Integration pattern | Evidence control |
|---|---|---|
| KYC/KYB onboarding | prefill application, identify missing evidence, route ownership/authority ambiguity | source-linked fields, policy reason codes, compliance review boundary |
| Loan underwriting/servicing | income/expense extraction, hardship package completeness, servicing task creation | field confidence, calculation trace, reviewer rationale |
| Insurance claims | claim package classification, invoice/photo extraction, timeline, fraud triage | media provenance, duplicate checks, adjuster summary |
| Payment disputes | reason-code evidence checklist, merchant/customer package summary, SLA management | required evidence flags, final package trace |
| Complaints | identify complaint theme, product, customer harm, attached evidence, response deadlines | complaint-to-document linkage, final response capture |
| Back office operations | mailroom automation, form processing, correspondence routing | queue routing evidence, SLA, error sampling |
Workflow contract should include:
input artifact type
required extracted fields
confidence thresholds by field
validation rules
human review triggers
fraud/tamper triggers
policy decision states
records metadata
case update payload
customer communication constraints
fallback and exception path
monitoring metrics
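Part of the contract can be enforced as a check on the case update payload before it is applied. This sketch covers only artifact type, required fields, per-field thresholds, source anchors and records metadata. The payload shape and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowContract:
    input_artifact_type: str
    required_fields: tuple
    confidence_thresholds: dict   # field -> minimum calibrated confidence
    fallback_route: str           # exception path when the contract is not met


def contract_violations(contract: WorkflowContract, payload: dict) -> list:
    """List every reason the case update payload breaks the contract."""
    problems = []
    if payload.get("artifact_type") != contract.input_artifact_type:
        problems.append("wrong_artifact_type")
    fields = payload.get("fields", {})
    for name in contract.required_fields:
        f = fields.get(name)
        if f is None:
            problems.append(f"missing:{name}")
            continue
        # A field with no agreed threshold defaults to 1.0, so it almost always falls back.
        if f.get("confidence", 0.0) < contract.confidence_thresholds.get(name, 1.0):
            problems.append(f"below_threshold:{name}")
        if not f.get("source_anchor"):
            problems.append(f"unanchored:{name}")
    if not payload.get("records_metadata"):
        problems.append("missing_records_metadata")
    return problems


def dispatch(contract: WorkflowContract, payload: dict) -> str:
    """Apply the update only when the contract holds; otherwise take the fallback path."""
    return "apply_case_update" if not contract_violations(contract, payload) else contract.fallback_route
```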
13. Model Risk and AI Governance
Document AI may combine an OCR engine, a layout model, a classification model, a multimodal LLM, an entity extraction model, a rules engine, a fraud model and a summarizer. Each model or rule carries different risks.
| Capability | Model risk focus |
|---|---|
| Classification | wrong workflow route, wrong retention category, SLA miss |
| OCR/layout | field distortion, table errors, missed signatures or checkboxes |
| Entity extraction | wrong identity, amount, date, account, authority |
| Summarization | unsupported conclusion, omission of conflicting evidence, tone risk |
| Fraud/tamper detection | false positives, false negatives, sensitive rule leakage |
| Confidence scoring | poor calibration, threshold gaming, automation beyond evidence |
| Human review recommendation | automation bias, rubber-stamping, unequal treatment |
Governance design:
- Map use cases and harms against the NIST AI RMF trustworthiness characteristics: valid and reliable; safe; secure and resilient; accountable and transparent; explainable and interpretable; privacy-enhanced; and fair with harmful bias managed.
- Maintain a model inventory with purpose, owner, vendor, version, data classes, decision impact and allowed uses.
- Define eval sets by document class, language, channel, quality, customer segment, fraud pattern and workflow outcome.
- Monitor extraction accuracy by field criticality, not only aggregate F1.
- Track reviewer overturns, complaint defects, downstream rework and customer harm indicators.
- Use ISO/IEC 42001-style AI management system controls: role accountability, operational procedures, performance evaluation, internal audit and continual improvement.
- Treat prompt/template/ruleset changes as governed artifacts when they affect extraction or workflow decisions.
14. Product / Architecture Decisions
| Decision | Weak answer | Strong architecture answer |
|---|---|---|
| What are we automating? | “Read all PDFs with AI” | Define document classes, field schemas, decision impact, review thresholds and workflow contracts |
| How to use OCR? | “OCR everything and send to LLM” | Preserve page/layout/source coordinates; use OCR/layout only as one stage of evidence pipeline |
| How to use multimodal models? | “Ask model what the document says” | Use schema-constrained extraction, grounded outputs, validation, confidence and review |
| What counts as evidence? | “Model output in JSON” | Raw document + source-linked extracted fields + validations + policy acceptance + review trace |
| When to auto-process? | “High confidence” | Field criticality + calibrated confidence + validations + fraud signals + policy threshold |
| How to handle summaries? | “Summarize the file for ops” | Source-linked, scoped summary that cannot override field evidence or policy rules |
| How to handle records? | “Store documents in S3” | Record class, retention schedule mapping, legal hold propagation, disposition audit |
| How to handle legal hold? | “Pause deletion manually” | Hold-aware storage, derived artifact propagation, vendor and downstream system checks |
| How to measure quality? | “OCR accuracy” | Field-level accuracy, calibration, review overturn, evidence completeness, downstream defect |
| How to govern vendors? | “Use best API” | Data use, retention, access, audit logs, model versioning, outage, exit and evidence obligations |
15. Control Matrix
| Control objective | Control activity | Evidence |
|---|---|---|
| Preserve original evidence | Immutable raw artifact, hash, received timestamp, source channel | document hash, intake event, storage policy |
| Classify correctly | Document type/subtype model with ambiguity routing | classification result, confidence, review decision |
| Bind fields to source | Every extracted field includes document/page/coordinate or anchor | extraction JSON, UI source link |
| Validate critical fields | Rule and cross-document checks for amounts, dates, identity, authority | validation log, failed rule reason |
| Calibrate confidence | Compare confidence with reviewer outcomes and sampling | calibration report, threshold change record |
| Prevent unsupported automation | Field criticality thresholds and policy gates before workflow action | policy decision id, reason code |
| Control summaries | Source-linked summaries with prohibited conclusion rules | model run id, citations, eval result |
| Route human review | Queue by ambiguity, high impact, authority, fraud, legal/records sensitivity | task id, reviewer, rationale |
| Detect tamper/fraud | Metadata, visual, duplicate, arithmetic and behavioral checks | risk signals, fraud case link |
| Protect privacy | Data minimization, access controls, redaction, purpose-bound use | privacy review, access logs |
| Manage records | Record class and retention metadata assigned to raw and derived artifacts | record metadata, retention rule |
| Honor legal hold | Hold flag propagates to raw docs, derived artifacts, exports and vendor purge flows | hold event, propagation log |
| Govern models | Model inventory, evals, drift monitoring, change control | model card, eval report, approval |
| Support complaints/audit | Link documents, AI runs, review actions, workflow decisions and final messages | evidence bundle, complaint id |
16. Metrics
| Metric family | Examples |
|---|---|
| Extraction quality | field-level precision/recall, table extraction accuracy, date/amount/entity error rate |
| Confidence quality | calibration error, high-confidence wrong field rate, threshold breach rate |
| Workflow outcome | straight-through processing rate by document class, review queue SLA, downstream rework |
| Human review | reviewer overturn rate, agreement rate, average handle time, QA defect rate |
| Evidence completeness | % fields with source coordinates, % decisions with policy reason, replay success rate |
| Records and hold | retention metadata completeness, hold propagation success, disposition exception count |
| Fraud/tamper | duplicate document rate, altered document detection, false positive review rate, confirmed fraud yield |
| Privacy/security | over-collection defects, unauthorized access attempts, redaction defects, vendor retention exceptions |
| Model governance | eval pass rate, drift alerts, prompt/ruleset changes, incident count |
| Customer impact | complaint rate related to document handling, dispute re-open rate, request-for-more-info rate, accessibility defects |
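These metric families are most useful when sliced by field criticality. In this sketch, reviewer-confirmed values from QA sampling give an accuracy and an overturn rate for each criticality tier. The record schema is hypothetical. Slicing this way keeps a good aggregate from hiding a weak high-impact tier.

```python
from collections import defaultdict


def metrics_by_criticality(records: list) -> dict:
    """Per-criticality accuracy and overturn rate from QA/reviewer samples.

    Each record is {"criticality", "auto_value", "reviewed_value"} (hypothetical
    schema); reviewed_value is the reviewer-confirmed value.
    """
    counts = defaultdict(lambda: [0, 0])  # criticality -> [sampled, correct]
    for r in records:
        c = counts[r["criticality"]]
        c[0] += 1
        c[1] += int(r["auto_value"] == r["reviewed_value"])
    return {
        crit: {"sampled": n, "accuracy": ok / n, "overturn_rate": (n - ok) / n}
        for crit, (n, ok) in counts.items()
    }
```

For example, take six sampled fields: three high-criticality and three low-criticality, with the only error in the high tier. Aggregate accuracy is then 5/6, but high-tier accuracy is only 2/3.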
Balanced executive dashboard:
Speed: cycle time and review productivity improve.
Quality: critical fields are accurate and calibrated.
Risk: fraud, tamper, legal hold and records controls work.
Fairness: errors are monitored across document quality, language and channel.
Trust: every automated or reviewed decision is replayable.
17. Failure Modes
| Failure mode | Why dangerous | Better control |
|---|---|---|
| OCR text treated as truth | OCR may misread critical amounts, dates, names | source-linked fields, validation, human review |
| Document-level confidence used for all fields | High average hides one critical field error | field criticality thresholds |
| LLM summary becomes decision record | Summary may omit conflicts or invent conclusions | source-linked summary plus structured evidence |
| Wrong document classification | Routes to wrong workflow, SLA, retention class | ambiguity queue and QA sampling |
| Silent reprocessing overwrites evidence | Audit cannot explain historical decision | versioned artifacts and lineage |
| No legal hold propagation | Derived OCR/extractions may be deleted while raw doc held | hold-aware artifact graph |
| Reviewer rubber-stamping | Automation bias turns human review into weak control | source-first UI, reason codes, QA |
| Fraud model blocks customers without review | False positives can cause harm and complaints | risk signal routing and human/fraud review |
| Vendor retains documents unexpectedly | Privacy, records and legal hold exposure | contract controls, data inventory, deletion evidence |
| Model trained on records under hold or restricted use | Governance and discovery risk | data use controls and hold-aware training exclusion |
| Prompt injection from document text | Model follows malicious embedded instructions | tool isolation and output validation |
| Complaint cannot link to evidence | Root cause and remediation become speculative | complaint-to-evidence trace |
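The "versioned artifacts and lineage" control against silent reprocessing can be sketched as an append-only store. Reprocessing adds a new version instead of overwriting the old one, and `as_of` returns the value a decision could have seen at a given time. This is a minimal in-memory illustration. `AppendOnlyExtractionStore` and its methods are hypothetical, and a production store would need durable, access-controlled storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractionVersion:
    document_id: str
    field_name: str
    value: str
    model_version: str
    recorded_at: datetime


class AppendOnlyExtractionStore:
    """Keeps every extraction version; reprocessing appends, never overwrites."""

    def __init__(self):
        self._versions = []

    def record(self, document_id, field_name, value, model_version, recorded_at=None):
        version = ExtractionVersion(
            document_id, field_name, value, model_version,
            recorded_at or datetime.now(timezone.utc),
        )
        self._versions.append(version)
        return version

    def history(self, document_id, field_name):
        return [v for v in self._versions
                if v.document_id == document_id and v.field_name == field_name]

    def as_of(self, document_id, field_name, when):
        """Latest version recorded at or before `when`, i.e. what a past decision saw."""
        candidates = [v for v in self.history(document_id, field_name)
                      if v.recorded_at <= when]
        return max(candidates, key=lambda v: v.recorded_at, default=None)
```

Keeping `model_version` on every version lets an auditor explain a historical decision even after a newer extractor has produced a different value.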
18. Interview-Ready Takeaways
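Q1: Why is document intelligence not an OCR project?
OCR only solves "seeing the text". What financial retail actually needs is evidence-grade extraction: document provenance, layout/source coordinates, field-level confidence, business validation, fraud/tamper checks, human review, workflow action, records retention, legal hold and audit replay. Without these, the project merely turns humans misreading documents into machines misreading documents in bulk.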
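Q2: How do you decide whether an extracted field can flow into a business process automatically?
Model confidence alone is not enough. Consider field criticality, document class, confidence calibration, source linkage, validation results, cross-document consistency, fraud/tamper signals, the policy gate and the human review threshold. High-impact fields such as income, authority, claim amount and beneficial owner usually need stronger validation or review.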
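Q3: How can AI summaries be used safely in claims, disputes and KYC?
A summary should be a reviewer productivity tool, not a replacement for evidence. Every key statement must be source-linked, and the summary must not state unsupported eligibility, KYC/KYB, fraud or legal conclusions. The final business decision should cite structured evidence, the policy reason and the reviewer action.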
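A pre-display check on AI summaries can enforce two of the rules in this answer: every claim carries at least one valid page reference, and no claim asserts a prohibited conclusion. Below is a minimal sketch. The phrase list is a hypothetical placeholder, and plain substring matching is deliberately naive. A real control would use policy-owned rules and would still leave final decisions with humans.

```python
from dataclasses import dataclass

# Hypothetical phrases a summary must not assert on its own.
PROHIBITED_CONCLUSIONS = ("eligible", "ineligible", "kyc passed", "fraudulent", "liable")


@dataclass(frozen=True)
class SourceRef:
    document_id: str
    page: int


@dataclass(frozen=True)
class SummaryClaim:
    text: str
    sources: tuple  # tuple of SourceRef


def check_summary(claims, page_counts):
    """Return (claim_index, problem) pairs.

    page_counts maps document_id -> number of pages in the case file.
    """
    problems = []
    for i, claim in enumerate(claims):
        if not claim.sources:
            problems.append((i, "unsupported_claim"))
        for ref in claim.sources:
            pages = page_counts.get(ref.document_id)
            if pages is None or not 1 <= ref.page <= pages:
                problems.append((i, "dangling_source"))
        lowered = claim.text.lower()
        if any(term in lowered for term in PROHIBITED_CONCLUSIONS):
            problems.append((i, "prohibited_conclusion"))
    return problems
```

A summary with any problem would be blocked or sent back to the reviewer, not silently shown as if it were evidence.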
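Q4: Why must records retention and legal hold be part of the document AI architecture?
Because document AI keeps the raw documents and produces many derived artifacts: OCR text, layout JSON, extracted fields, summaries, review notes, exports and so on. Which of these are records, how long they are kept and whether a legal hold applies depend on product, record type, jurisdiction, retention schedule, hold status and Legal/Compliance/Records interpretation. The architecture must propagate record metadata and hold state across these artifacts instead of relying on manual searches after the fact.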
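Hold propagation across derived artifacts can be sketched as a breadth-first walk over the derivation graph. For illustration only, the sketch assumes that a hold on an artifact extends to everything derived from it. In practice Legal sets the actual hold scope, and `propagate_hold` is a hypothetical helper.

```python
from collections import deque


def propagate_hold(derived_from, held_roots):
    """Return every artifact id that inherits a legal hold.

    derived_from maps artifact_id -> list of artifact_ids derived from it
    (raw document -> OCR text -> extracted fields -> summary, export, ...).
    Assumption: the hold flows downstream along derivation edges only.
    """
    held = set(held_roots)
    queue = deque(held_roots)
    while queue:
        current = queue.popleft()
        for child in derived_from.get(current, []):
            if child not in held:  # also guards against cycles in bad lineage data
                held.add(child)
                queue.append(child)
    return held
```

Computing hold state from lineage at hold time is what keeps an OCR text or summary from being deleted while its raw document is on hold.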
Q5: How does a senior PM measure document AI success?
Not by automation rate or processing time alone. Track field-level accuracy, confidence calibration, review overturn rate, evidence completeness, records/hold metadata completeness, fraud/tamper yield, complaint defects, downstream rework, customer harm and audit replay success. Speed must be read together with evidence quality.
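Two of these metrics are easy to compute from QA data. One is the review overturn rate. The other is expected calibration error (ECE), a common calibration measure that compares average confidence with observed accuracy inside confidence bins. The minimal sketch below assumes a hypothetical default of 10 equal-width bins and boolean correctness labels taken from QA review.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece


def overturn_rate(model_values, reviewed_values):
    """Share of reviewed fields where the reviewer changed the model value."""
    if not model_values or len(model_values) != len(reviewed_values):
        raise ValueError("need equal-length, non-empty inputs")
    changed = sum(1 for m, r in zip(model_values, reviewed_values) if m != r)
    return changed / len(model_values)
```

A field type that reports 0.95 confidence but is right only half the time has a large ECE. That is exactly the case where a high automation rate would hide poor evidence quality.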
19. Practical Templates
19.1 Document Evidence Envelope
Document ID:
Case ID:
Customer / business reference:
Source channel:
Received timestamp:
Submitter / system reference:
Raw file hash:
File type and size:
Page count:
Document class / subtype:
Classification confidence:
Language / locale:
Quality score:
Processing lineage:
renderer version:
OCR/layout version:
extraction model version:
ruleset version:
Fraud/tamper signals:
Record class:
Retention rule:
Legal hold flag:
Access restrictions:
Derived artifacts:
Workflow decisions:
Complaint / audit links:
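The envelope can be sketched as a typed record, with the raw file hash captured at intake so that later integrity checks can prove the stored bytes are unchanged. Only a subset of the template fields is shown. The class names are hypothetical, and SHA-256 via Python's standard `hashlib` is one reasonable choice, not a mandated one.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ProcessingLineage:
    renderer_version: str
    ocr_layout_version: str
    extraction_model_version: str
    ruleset_version: str


@dataclass
class DocumentEvidenceEnvelope:
    document_id: str
    case_id: str
    source_channel: str
    received_timestamp: str           # ISO 8601, UTC assumed
    raw_file_sha256: str
    page_count: int
    document_class: str
    classification_confidence: float
    lineage: ProcessingLineage
    record_class: str
    legal_hold: bool = False
    fraud_tamper_signals: list = field(default_factory=list)
    derived_artifacts: list = field(default_factory=list)


def sha256_of(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


def verify_integrity(envelope: DocumentEvidenceEnvelope, raw: bytes) -> bool:
    """True only if the stored bytes still match the hash captured at intake."""
    return envelope.raw_file_sha256 == sha256_of(raw)
```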
19.2 Field Extraction Spec
| Spec attribute | Example value |
|---|---|
| field_name | gross_pay_amount |
| document_classes | paystub, payroll statement |
| source_requirement | page + bounding box + raw text |
| normalization | currency amount with locale and period |
| validations | gross >= net, pay period exists, YTD consistency |
| confidence_threshold | higher threshold for auto-populate, lower threshold for review suggestion |
| review_trigger | low confidence, arithmetic mismatch, employer mismatch, tamper signal |
| allowed_workflow_use | income package preparation, not final credit decision by itself |
| prohibited_use | unsupported affordability conclusion |
| retention_metadata | derived field linked to paystub record class |
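The `validations` row of the spec can be sketched as a pure function over already-normalized amounts. Amounts use `Decimal` to avoid float rounding. The YTD rule used here, that current YTD gross equals prior YTD gross plus this period's gross, is an assumption. It breaks at year-end resets, bonuses or payroll corrections, so real rules need payroll-specific exceptions.

```python
from decimal import Decimal


def validate_paystub(gross, net, pay_period, ytd_gross_prior, ytd_gross_current):
    """Return a list of validation failure codes (empty list means all checks passed)."""
    failures = []
    if gross < net:
        failures.append("net_exceeds_gross")
    if not pay_period:
        failures.append("missing_pay_period")
    # Assumed YTD rule; skipped when no prior paystub is available.
    if ytd_gross_prior is not None and ytd_gross_current != ytd_gross_prior + gross:
        failures.append("ytd_inconsistent")
    return failures
```

Each failure code maps directly to the `arithmetic mismatch` review trigger in the spec, so a non-empty list routes the field to review rather than auto-populating it.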
19.3 Confidence and Review Policy
Document class:
Workflow:
Field criticality:
Customer impact:
Fraud exposure:
Auto-populate allowed when:
field confidence:
classification confidence:
validations:
tamper signals:
cross-document consistency:
Human review required when:
low confidence:
conflict:
authority/legal implication:
high amount:
vulnerable customer / complaint sensitivity:
Sampling rule:
Reviewer queue:
QA metric:
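The `Sampling rule` line can be implemented deterministically, so that an auditor can later recompute exactly which auto-populated fields were sampled for QA. The minimal sketch below hashes the field instance id with a salt. The salt, the per-criticality rates and the function names are hypothetical.

```python
import hashlib

# Hypothetical per-criticality QA sampling rates; real rates belong to the QA/policy owner.
QA_RATES = {"low": 0.02, "medium": 0.05, "high": 0.20}


def in_qa_sample(field_instance_id, sample_rate, salt="qa-salt-v1"):
    """Deterministic selection: the same id, salt and rate always give the same answer."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be within [0, 1]")
    digest = hashlib.sha256(f"{salt}:{field_instance_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < sample_rate * 2**64


def select_for_qa(field_instance_ids, criticality):
    rate = QA_RATES[criticality]
    return [fid for fid in field_instance_ids if in_qa_sample(fid, rate)]
```

Hashing instead of calling a random number generator makes the sample reproducible without storing extra state. Rotating the salt per period keeps reviewers from learning which ids are sampled.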
19.4 Human Review Record
Review task ID:
Reviewer role:
Document ID:
Field(s) reviewed:
Model suggestion:
Source location:
Validation failures:
Fraud/tamper signals:
Reviewer decision:
Corrected value:
Reason code:
Free-text rationale:
Second approval:
Workflow action:
Customer communication reference:
Timestamp:
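A completeness check can stop weak review records from counting as a control. Examples include a "correction" with no new value, or an override with no reason code. The `HumanReviewRecord` shape covers only some of the template fields, and the rules are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HumanReviewRecord:
    review_task_id: str
    reviewer_role: str
    document_id: str
    field_name: str
    model_suggestion: str
    source_location: str            # e.g. "p2:[x0,y0,x1,y1]"
    reviewer_decision: str          # "accept" | "correct" | "reject"
    corrected_value: Optional[str] = None
    reason_code: Optional[str] = None
    second_approval: Optional[str] = None


def record_gaps(r, requires_second_approval=False):
    """List what makes a review record too weak to serve as a control."""
    gaps = []
    if not r.source_location:
        gaps.append("no_source_location")
    if r.reviewer_decision not in {"accept", "correct", "reject"}:
        gaps.append("unknown_decision")
    if r.reviewer_decision == "correct" and r.corrected_value in (None, "", r.model_suggestion):
        gaps.append("correction_without_new_value")
    if r.reviewer_decision in {"correct", "reject"} and not r.reason_code:
        gaps.append("missing_reason_code")
    if requires_second_approval and not r.second_approval:
        gaps.append("missing_second_approval")
    return gaps
```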
19.5 Records / Legal Hold Integration Card
Artifact type:
Raw document:
Rendered page:
OCR text:
Layout JSON:
Extracted fields:
AI summary:
Review notes:
Workflow decision:
Exported package:
Record class owner:
Retention schedule reference:
Legal hold propagation rule:
Disposition approval:
Vendor copy:
Search / retrieval metadata:
Access restrictions:
Audit evidence:
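Disposition can be gated by listing every reason an artifact may not yet be disposed of. The minimal sketch below uses hypothetical field names. It treats a missing retention rule as a blocker rather than a default, and treats an undeleted vendor copy as blocking completion of disposition.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArtifactRecord:
    artifact_id: str
    artifact_type: str                   # raw_document, ocr_text, layout_json, ai_summary, ...
    retention_until: Optional[date]      # None = no retention rule assigned yet
    legal_hold: bool
    disposition_approved: bool
    vendor_copy_deleted: Optional[bool]  # None = no vendor copy exists


def disposition_blockers(a, today):
    """Reasons disposition cannot be completed yet; an empty list means eligible."""
    blockers = []
    if a.legal_hold:
        blockers.append("legal_hold")
    if a.retention_until is None:
        blockers.append("no_retention_rule")
    elif today < a.retention_until:
        blockers.append("retention_not_expired")
    if not a.disposition_approved:
        blockers.append("no_disposition_approval")
    if a.vendor_copy_deleted is False:
        blockers.append("vendor_copy_outstanding")
    return blockers
```

The returned blocker list can itself be stored as audit evidence for why an artifact was kept or disposed of on a given date.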
20. Final Operating Principle
A mature AI document intelligence architecture can be tested with a single question:
Can the institution prove that every automated or human-assisted document decision
was based on the right document,
the right source-linked fields,
the right confidence and validation controls,
the right human review boundary,
the right fraud and records treatment,
and the right workflow policy at that point in time?
If the answer is unclear, what the team lacks is not stronger OCR. It lacks an evidence operating architecture built from document provenance, evidence quality, confidence calibration, human review design, records/legal hold integration, fraud controls, workflow contracts and AI governance.