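AI Document Intelligence: Unstructured Data and Evidence Quality Architecture

This article is learning, architecture-training and portfolio material. It does not constitute legal advice, regulatory advice, a records retention determination, e-discovery advice, a KYC/KYB sufficiency judgment, a loan or insurance underwriting conclusion, a consumer dispute handling opinion, a fraud handling directive, a model validation report or a vendor recommendation.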

695ai-foundations/papers/137-ai-document-intelligence-unstructured-data-evidence-quality-architecture.md

AI Document Intelligence / Unstructured Data / Evidence Quality Architecture: A Reading

Audience: CBAP+ Senior BA / Advanced AI PM / Product Architect / Enterprise Architect / Operations Architect / Model Risk / Records Management / Fraud Risk / KYC-KYB Operations / Claims and Disputes Lead / Loan and Insurance Servicing Product Owner.

Core question: How does a retail financial AI system turn bank statements, paystubs, claim packages, dispute evidence, KYC/KYB documents, insurance / loan servicing documents and operational correspondence from unstructured documents into evidence-grade, auditable, reviewable, workflow-ready facts, while controlling OCR/layout/multimodal extraction, classification, entity extraction, summarization, confidence scoring, human review, document provenance, records retention, legal hold, fraud/tamper checks and model risk?

Learning objectives: Build a document intelligence reference architecture, an evidence quality model, document provenance and chain-of-custody, confidence and review design, records/legal hold integration, workflow automation controls, fraud/tamper detection, model risk governance and a senior PM/architect decision framework.

0. Disclaimer

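Real projects must be assessed jointly by Legal, Compliance, Privacy, Records Management, Information Governance, Model Risk, Fraud Risk, Financial Crime, Operations, Product, Architecture, Information Security, Data Governance, Vendor Management, Internal Audit and the relevant business owners. How requirements apply to records, evidence, legal hold, customer notices, KYC/KYB, credit, insurance, complaints, disputes, claims, cross-border data and e-discovery depends on product, record type, jurisdiction, retention schedule, legal hold status, customer segment, channel, policy, contract and Legal / Compliance / Records interpretation.

This article does not reduce document intelligence to an OCR tutorial. OCR is only one capability: turning images into text. What retail financial scenarios actually need is an evidence-grade extraction architecture, one that can show where a document came from, whether it is complete, which page and region each field was extracted from, how confidence is calibrated, when human review is required, how the result enters a workflow, how records are retained, how legal hold is handled, how tampering and fraud are detected, and how the decision evidence can be replayed afterwards.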


Source Anchors

| Source | Link | Purpose |
| --- | --- | --- |
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Use Govern / Map / Measure / Manage to organize risk governance, evals, monitoring, human oversight, and incident and evidence controls for document AI |
| NIST Privacy Framework | https://www.nist.gov/privacy-framework | Use privacy risk management, data minimization, purpose, processing, access and monitoring to design the boundaries for collecting, extracting, using and retaining document data |
| NARA Records Management | https://www.archives.gov/records-mgmt | Use records lifecycle, disposition, records program and accountability as the official anchor for records retention / evidence management |
| NARA Electronic Records Management | https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html | Use electronic record formats, metadata and transfer/readiness guidance as the anchor for the electronic records architecture and preservation discussion |
| CFPB Consumer Complaint Database | https://www.consumerfinance.gov/data-research/consumer-complaints/ | Use the consumer complaint and complaint operations perspective to check document evidence trace, dispute handling, case explanation and the operational learning loop |
| FFIEC Authentication and Access to Financial Institution Services and Systems | https://www.ffiec.gov/press/pr081121.htm | Use financial institution authentication, access control, risk assessment and layered security thinking to design document intake, reviewer access, workflow action and privileged operation controls |
| ISO/IEC 42001 overview | https://www.iso.org/standard/42001 | Use AI management system, roles, operation, performance evaluation, internal audit and continual improvement to establish the document AI operating model |

In one sentence:

Document intelligence is not "OCR + LLM summary". In financial operations, it is an evidence system that converts unstructured documents into policy-bound, source-linked, confidence-calibrated, human-reviewable and records-aware decision inputs.


1. Thesis

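The core goal of document intelligence in retail finance is not to "read documents faster" but to turn documents into dependable operational evidence. A mature architecture should achieve the following transformation: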

from: uploaded PDF / scanned image / email attachment / photo / fax
to: evidence envelope
    + document provenance
    + page/layout map
    + extracted entities with source coordinates
    + normalized facts
    + confidence and validation results
    + fraud/tamper signals
    + human review decisions
    + workflow action
    + records retention / legal hold metadata
    + replayable audit trail

Core judgments:

Text recognized does not mean fact established.
A model summary does not mean evidence accepted.
Confidence score does not mean business risk resolved.
Human review does not mean control effectiveness unless review is designed, sampled and evidenced.
Document storage does not mean records compliance.

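A senior PM / Architect should design document AI as a four-layer system:

| Layer | Goal | Key questions |
| --- | --- | --- |
| Capture and provenance | Prove where the document came from, when it arrived, whether it is complete and whether it has been processed | source channel, hash, version, customer/session/case binding, chain of custody |
| Extraction and understanding | Turn layout, text, tables, images, signatures, entities and relationships into structured evidence | OCR/layout/multimodal model, field mapping, normalization, source coordinates |
| Evidence quality and controls | Decide whether a field is usable for the business, when human review is needed and how conflicts are handled | confidence calibration, validation rules, human review, fraud/tamper checks, policy gates |
| Workflow and records | Bring evidence into claims/disputes/KYC/servicing workflows and retain replayable records | case management, decision logs, records retention, legal hold, complaint linkage |

The most important architectural boundary is: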

extraction result
  != verified fact
  != policy-accepted evidence
  != legal sufficiency
  != final business decision
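
The boundary above can be made explicit in code. The following is a minimal sketch, assuming a hypothetical `EvidenceState` enum and `advance` helper (neither comes from the source): states cannot be skipped, and each promotion needs its own recorded basis. Legal sufficiency is deliberately left out of the state machine because it is a judgment for Legal, not a system state.

```python
from enum import Enum


class EvidenceState(Enum):
    EXTRACTED = "extraction_result"
    VERIFIED = "verified_fact"
    POLICY_ACCEPTED = "policy_accepted_evidence"
    DECIDED = "final_business_decision"


# Hypothetical rule: a state can only move to the next one, never skip ahead.
NEXT_STATE = {
    EvidenceState.EXTRACTED: EvidenceState.VERIFIED,
    EvidenceState.VERIFIED: EvidenceState.POLICY_ACCEPTED,
    EvidenceState.POLICY_ACCEPTED: EvidenceState.DECIDED,
}


def advance(current: EvidenceState, target: EvidenceState, basis: str) -> EvidenceState:
    """Promote evidence one step; the basis is a rule id, reviewer id or policy id."""
    if NEXT_STATE.get(current) is not target:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    if not basis.strip():
        raise ValueError("each promotion needs a recorded basis")
    return target


# Illustrative basis id only.
state = advance(EvidenceState.EXTRACTED, EvidenceState.VERIFIED, basis="validation-run-001")
```

Trying to jump from `EXTRACTED` straight to `DECIDED` raises `ValueError`, which is the point of keeping the four states distinct.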

2. Why It Matters

Retail financial operations depend heavily on unstructured documents:

| Journey | Document examples | Business decision risk |
| --- | --- | --- |
| Loan origination and servicing | bank statements, paystubs, tax forms, hardship letters, income proof, servicing correspondence | affordability, income verification, repayment plan, adverse action, servicing treatment |
| Insurance claims | claim forms, photos, repair invoices, medical bills, police reports, adjuster notes | claim eligibility, payout amount, fraud triage, escalation, customer communication |
| Payment disputes and chargebacks | receipts, merchant correspondence, tracking proof, screenshots, cardholder statements | dispute reason code, evidence package, representment, regulatory timelines |
| KYC/KYB onboarding and refresh | ID images, business registration, ownership docs, licenses, utility bills, board resolutions | identity/entity evidence, authority, beneficial ownership, sanctions/financial crime review |
| Operations and complaints | complaint letters, email threads, call transcripts, agent notes, documents attached to cases | issue classification, remediation, response evidence, root cause analysis |
| Account maintenance | name/address change proof, death certificate, power of attorney, court order, consent forms | entitlement, authority, privacy, account access, legal/ops escalation |

AI amplifies three kinds of risk:

  1. Scale risk: a single extraction defect can affect thousands of cases in bulk.
  2. Semantic risk: the model turns "looks like a paystub" into "income verified".
  3. Evidence risk: after a business action is taken, no one can prove which document, model, version or reviewer a field came from.

The goal of a senior PM / Architect is not the highest automation rate, but to:

Use automation where evidence quality is sufficient,
route ambiguity to the right human queue,
preserve proof of what was seen and decided,
and keep records, privacy, fraud and model risk controls in the same workflow.

3. Evidence Object Taxonomy

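Document AI needs to distinguish at least eight kinds of object. Mixing them up leads to automation that cannot be audited.

| Object | Definition | Example | Control implication |
| --- | --- | --- | --- |
| Raw document | The original file submitted by a customer, merchant, employee, third party or system | PDF statement, photo paystub, email attachment | Retain original hash, source channel, received time, case binding |
| Rendered page | Page image / normalized PDF page rendered by the system | page 3 of bank statement | Record renderer version, page count, image quality |
| Layout element | Table, paragraph, checkbox, signature block, stamp, logo, field region | paystub earnings table | Needs coordinates, reading order, table structure |
| Extracted field | Field and value extracted from the document | gross pay = 4,850.00 | Needs source coordinates, confidence, parser/model version |
| Normalized entity | Entity standardized and mapped to the business dictionary | employer_name, account_holder, claimant, policy_number | Needs normalization rule, entity resolution evidence |
| Derived fact | Fact computed from multiple fields or rules | average monthly deposit, income variance, coverage period | Needs formula, input fields, calculation version |
| Decision evidence | Evidence accepted by business policy or confirmed by a human | accepted income evidence for case X | Needs policy decision, reason code, reviewer/action trace |
| Summary | Source-linked summary for a reviewer or customer | claim package summary | Must cite sources; cannot replace the original evidence |

Suggested field-level evidence metadata: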

field_name
document_id
document_version
page_number
bounding_box_or_anchor
raw_text
normalized_value
extraction_method
model_or_rule_version
confidence_score
calibration_bucket
validation_results
cross_document_match
fraud_or_tamper_signals
human_review_status
policy_acceptance_status
evidence_retention_rule
legal_hold_flag
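
One way to carry this metadata is a typed record. The sketch below is illustrative only: the `FieldEvidence` class, its defaults, the bounding-box convention and the sample values are assumptions, and it covers a subset of the attributes listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldEvidence:
    """One extracted field plus the metadata needed to trace it back to its source."""
    field_name: str
    document_id: str
    document_version: int
    page_number: int
    bounding_box: tuple[float, float, float, float]  # x0, y0, x1, y1 (assumed convention)
    raw_text: str
    normalized_value: str
    extraction_method: str
    model_or_rule_version: str
    confidence_score: float
    human_review_status: str = "not_reviewed"
    policy_acceptance_status: str = "not_evaluated"
    legal_hold_flag: bool = False
    validation_results: dict[str, bool] = field(default_factory=dict)

    def is_source_linked(self) -> bool:
        """A field with no page or raw text cannot be replayed against the document."""
        return self.page_number >= 1 and bool(self.raw_text.strip())


# Sample values are made up for illustration.
gross_pay = FieldEvidence(
    field_name="gross_pay", document_id="D123", document_version=1, page_number=1,
    bounding_box=(72.0, 310.5, 180.0, 324.0), raw_text="4,850.00",
    normalized_value="4850.00", extraction_method="layout_model",
    model_or_rule_version="extractor-example", confidence_score=0.97,
    validation_results={"gross_ge_net": True},
)
```

Making the record immutable (`frozen=True`) fits the idea that a correction produces a new version rather than a silent overwrite.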

4. Reference Architecture Model

Reference architecture:

intake channels
  -> document capture and provenance service
  -> file normalization / rendering / virus and content safety scan
  -> document classification and package splitting
  -> OCR + layout understanding + table extraction
  -> multimodal extraction / entity extraction / relationship mapping
  -> normalization and business validation
  -> confidence calibration and quality scoring
  -> fraud / tamper / duplicate / synthetic-document checks
  -> evidence policy engine
  -> human review and exception queues
  -> workflow integration: KYC, claims, disputes, servicing, complaints
  -> records retention / legal hold / disposition integration
  -> evidence ledger, monitoring, QA and model governance

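Key components:

| Component | Responsibility | Senior design question |
| --- | --- | --- |
| Intake gateway | Receive upload, email, fax, branch scan, mobile capture, API, vendor feed | Is the document bound to a customer/case/session, and are channel risk and consent context recorded? |
| Provenance service | Generate document id, hash, timestamp, source, custody events, version | Can we later prove the document was not replaced or silently rewritten? |
| Classification service | Determine document type, subtype, issuer/source, language, quality, package structure | Could a misclassification send the document into the wrong workflow or the wrong retention rule? |
| Layout and OCR service | Recognize text, reading order, tables, checkboxes, signature/stamp blocks | Are tables and multi-column reading order validated, and are coordinates retained? |
| Multimodal extraction | Combine text, layout, image and tables to extract fields and relationships | Is model output schema-constrained, and can it explain its source? |
| Entity normalization | Standardize names, addresses, amounts, dates, last four account digits, business names, policy numbers | Is normalization replayable, and is the original text retained? |
| Validation and reconciliation | Consistency checks across pages, documents, system records and third-party data | How do conflicts enter review so that a summary cannot mask them? |
| Confidence engine | Confidence and calibration at field, document and case level | Are thresholds defined by field criticality and journey risk? |
| Fraud/tamper service | Check edit traces, metadata anomaly, duplicate, template abuse, image manipulation | Is the output used only as a signal, not as a substitute for fraud investigation? |
| Evidence policy engine | Decide whether an extraction is acceptable to the current business process | Are extraction, validation, review and policy acceptance kept separate? |
| Human review workbench | Reviewer sees source-linked fields, conflicts, model rationale, history | Can the reviewer locate evidence quickly and leave a structured decision? |
| Records and hold connector | Assign record class, retention schedule, legal hold flag, disposition controls | Are retention and hold inherited from case/workflow state and auditable? |
| Evidence ledger | Store the trace of document, model, rule, review, workflow action and final communication | Can the full trail be replayed for a complaint or audit? |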

5. Document Classes and Evidence Risk

Not every document should use the same automation threshold. Tier documents by business impact, field criticality, fraud exposure and records sensitivity.

| Document class | High-value fields | Special risks | Architecture control |
| --- | --- | --- | --- |
| Bank statements | account holder, institution, statement period, balances, deposits, NSF, account number mask | altered PDF, missing pages, fake bank template, income misclassification | page completeness, transaction table validation, institution/logo metadata checks, human review for high-impact use |
| Paystubs | employer, employee, pay period, gross/net pay, YTD income, deductions | generated fake paystub, mismatched employer, inconsistent YTD | arithmetic checks, pay period consistency, employer/entity validation, duplicate template detection |
| Claims documents | claim number, loss date, policy number, coverage, invoices, photos, police/medical records | inflated invoices, reused photos, inconsistent event timeline | package timeline, image metadata, duplicate media search, adjuster review |
| Dispute packages | transaction details, merchant evidence, shipping proof, customer assertion, reason codes | weak evidence, missing required proof, model over-summary | reason-code-specific evidence checklist, source-linked package summary, SLA controls |
| KYC/KYB documents | identity data, business registration, ownership, license, authorized signer | stale documents, entity mismatch, authority ambiguity | freshness policy, entity resolution, legal/compliance review boundary, beneficial ownership evidence routing |
| Insurance/loan servicing docs | hardship reason, income/expense, death/POA/court order, address/name change | authority and entitlement errors, sensitive data leakage | privileged workflow controls, dual review for authority, records/hold metadata |
| Operational correspondence | complaint letters, emails, agent notes, attachments | missed complaint, wrong product classification, incomplete response evidence | complaint taxonomy, case linkage, final response capture, CFPB-style complaint learning loop |

Senior design principles:

  • Let the document class determine the extraction schema, review threshold, retention class, fraud checks and workflow route.
  • Apply field-level controls to high-impact fields; do not rely on document-level confidence alone.
  • Use source-linked citations inside the case tool for summaries; a summary must never be the only evidence.
  • Govern customer-provided documents and institution-generated records separately.

6. Evidence-Grade Extraction Pipeline

6.1 Intake and Capture

| Control question | Strong pattern |
| --- | --- |
| Where did the document come from? | source channel, user/session, case id, upload event, IP/device/risk context where permitted |
| Is the original file retained? | raw artifact immutable storage + hash + version pointer |
| Is it complete? | page count, file size, render success, missing/blank page check |
| Is it safe? | malware scan, file type validation, macro/script blocking, content safety routing |
| Is it accessible? | mobile capture quality feedback, supported formats, assisted channel |
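
The "hash + version pointer" pattern can be illustrated with the standard library alone. This is a minimal sketch, not a storage design: the function names, the event fields and the sample channel and case id are assumptions, and write-once storage is out of scope here.

```python
import hashlib
from datetime import datetime, timezone


def capture_intake_event(raw_bytes: bytes, source_channel: str, case_id: str) -> dict:
    """Record what arrived, through which channel, for which case, and its SHA-256 hash."""
    return {
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "size_bytes": len(raw_bytes),
        "source_channel": source_channel,
        "case_id": case_id,
        "received_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_unchanged(raw_bytes: bytes, intake_event: dict) -> bool:
    """True only if the bytes on hand hash to the value captured at intake."""
    return hashlib.sha256(raw_bytes).hexdigest() == intake_event["sha256"]


# Illustrative payload and identifiers.
original = b"%PDF-1.7 example payload"
event = capture_intake_event(original, source_channel="mobile_upload", case_id="CASE-1")
```

Re-hashing before every downstream use is cheap and gives a concrete answer to whether the file was replaced after intake.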

6.2 Classification and Package Splitting

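Classification is not a UI label; it is the entry point for workflow, risk and records decisions.

| Classification output | Why it matters |
| --- | --- |
| document_type and subtype | Determines extraction schema and workflow queue |
| issuer/source class | Determines trust and fraud checks |
| language and locale | Determines OCR/model and reviewer routing |
| package boundaries | Splits a multi-document PDF into statement, paystub, invoice, letter |
| confidence and ambiguity | Low-confidence classifications go to intake review so they do not enter the wrong workflow |
| record category candidate | Input to later records retention / legal hold integration |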

6.3 Layout, OCR and Multimodal Understanding

The architectural concern is not OCR algorithm detail but evidence recoverability:

| Capability | Evidence requirement |
| --- | --- |
| Text recognition | raw OCR text, confidence, language, page, text anchors |
| Layout detection | paragraphs, tables, cells, checkboxes, signature areas, stamps, reading order |
| Table extraction | row/column coordinates, header mapping, merged cell handling, totals validation |
| Image understanding | photo/document boundary, logo/stamp/signature presence, damage or blur |
| Multimodal extraction | field value must link back to visual/text source, not only generated answer |
| Summarization | source-linked, scoped to reviewer need, not used as record replacement |
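
The requirement that a field value link back to its text source can be checked mechanically. The sketch below is a deliberately crude illustration with hypothetical helper names: it only tests whether the field's raw text appears in the OCR text of the cited page, whereas a real system would also compare coordinates.

```python
def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so layout spacing does not break matching."""
    return " ".join(text.split()).casefold()


def is_grounded(field_raw_text: str, page_ocr_text: str) -> bool:
    """Reject generated values that cannot be found on the page they claim to come from."""
    needle = normalize(field_raw_text)
    return bool(needle) and needle in normalize(page_ocr_text)


# Made-up OCR text for one paystub page.
page_text = "Gross Pay   4,850.00\nNet Pay     3,912.40"
```

A model answer of `4,580.00` (digits transposed) fails this check even if the model reports high confidence.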

6.4 Entity Extraction and Normalization

Entity extraction must be tiered by field criticality:

| Field type | Examples | Control |
| --- | --- | --- |
| Identity/entity | name, DOB, business name, beneficial owner, authorized signer | source coordinate + normalization + conflict check + high-risk review |
| Monetary | gross pay, net pay, deposit amount, invoice amount, claim amount | arithmetic validation, currency, period, outlier checks |
| Temporal | statement period, pay period, loss date, coverage dates, received date | date normalization, timeline consistency |
| Authority/relationship | POA, signer role, officer, policyholder, claimant | human/legal/ops review triggers based on product policy |
| Operational | reason code, claim type, complaint category, servicing request | workflow route and SLA impact, QA sampling |

7. Confidence Architecture

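Confidence is not a pretty score. It is a control input for routing, review, policy acceptance and monitoring.

| Confidence level | Definition | Example |
| --- | --- | --- |
| Character/text confidence | How reliably OCR recognized a specific character or token | 8 vs B in account mask |
| Field confidence | The model's probability or score that a field value is correct | pay period end date |
| Layout confidence | Whether table structure, reading order and checkbox state can be trusted | deductions table |
| Document classification confidence | Whether document type and subtype are correct | paystub vs payroll summary |
| Cross-validation confidence | Whether a field agrees with internal systems, other documents and rules | YTD income vs pay period |
| Fraud/tamper confidence | Strength of tampering or forgery signals | PDF metadata anomaly |
| Case evidence confidence | Whether the whole case package is sufficient to move to the next step | income evidence accepted for review |

Weak pattern: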

if model_confidence > 0.85 then auto-approve

Strong pattern:

if field is low impact and confidence calibrated and validations pass
  then auto-populate with audit trace
if field is high impact or conflicts with another source
  then route to human review
if document class has fraud/tamper signal or legal/authority implication
  then require specialized queue
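
The strong pattern can be expressed as a small routing function. This is a sketch under stated assumptions: the `FieldSignal` fields, the queue names, the precedence (specialized queue first) and the conservative default are illustrative choices that the pseudocode above does not specify.

```python
from dataclasses import dataclass


@dataclass
class FieldSignal:
    high_impact: bool
    calibrated: bool
    validations_pass: bool
    conflicts_with_other_source: bool
    tamper_signal: bool
    legal_or_authority_implication: bool


def route(sig: FieldSignal) -> str:
    """Return the queue for one field; precedence order is an assumption."""
    if sig.tamper_signal or sig.legal_or_authority_implication:
        return "specialized_queue"
    if sig.high_impact or sig.conflicts_with_other_source:
        return "human_review"
    if sig.calibrated and sig.validations_pass:
        return "auto_populate_with_audit_trace"
    # Anything not clearly safe goes to a person rather than straight through.
    return "human_review"


low_risk = FieldSignal(False, True, True, False, False, False)
tampered = FieldSignal(False, True, True, False, True, False)
```

Note that no raw score threshold appears in the function: calibration and validation results are inputs, and field impact decides whether they are enough.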

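Confidence design points:

  • Define thresholds by field criticality, journey risk, customer harm and fraud exposure.
  • Do not let a document-level average hide an error in a critical field.
  • Confidence must be calibrated, and drift monitored against reviewer outcomes.
  • For high-impact decisions use validation + review + policy, not the score alone.
  • Reviewer overrides feed the feedback loop, but do not automatically train the model unless data governance and model governance have approved it.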
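
Calibration against reviewer outcomes can be measured in several ways. The sketch below uses expected calibration error (ECE), a common measure that the source does not prescribe; the bin count and sample data are assumptions.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """Weighted average gap between stated confidence and observed accuracy per bin."""
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Four fields scored 0.95; reviewers confirmed three and overturned one.
ece = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [True, True, True, False])
```

Here the model claims 95% but reviewers agree 75% of the time, so the gap is about 0.2. Computing this per field and per document class, rather than globally, follows the field-criticality principle above.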

8. Human Review Design

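Human review is not a manual backstop for when automation fails; it is part of the evidence architecture.

| Review pattern | Use case | Control requirement |
| --- | --- | --- |
| Intake review | Uncertain classification, poor file quality, missing pages, format anomalies | confirm document type, package completeness, reroute |
| Field review | High-impact field with low confidence or a conflict | Reviewer sees original text, coordinates, candidate values and the reason a rule failed |
| Specialist review | authority, legal document, KYB ownership, fraud signal | Assign to a qualified/authorized queue; record rationale |
| QA sampling | Sample of cases that passed automation | Estimate false accept / false extraction; trigger model/control tuning |
| Dual control | high-amount claim, sensitive servicing action, authority change | second reviewer or approver, separation of duties |
| Complaint review | Customer challenges document handling or an AI conclusion | Connect evidence ledger, AI run, reviewer action, final response |

The reviewer UI should:

  • Show the original text/page/image on the left and structured fields on the right; clicking a field locates its source.
  • Show model output, confidence, validation failures and previous reviewer actions.
  • Not display rubber-stamp-inducing prompts such as "AI says approve".
  • Give the minimum necessary explanation of fraud/risk signals to avoid leaking sensitive rules.
  • Require the reviewer to select a structured reason code, plus free-text rationale where needed.
  • Record old value, new value, reason, reviewer, timestamp and policy version for every override.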

9. Document Provenance and Chain of Custody

Evidence-grade document AI must be able to answer:

Who or what submitted the document?
When was it received?
Which exact file and page were processed?
Which model/rule/version extracted the field?
Was the document changed, re-rendered, split, redacted or reprocessed?
Who reviewed or overrode the result?
Which workflow decision used the evidence?
Which final customer or counterparty communication referenced it?

Provenance controls:

| Control | Evidence |
| --- | --- |
| Immutable raw artifact | file hash, storage location, write-once policy where applicable |
| Versioned derived artifacts | rendered pages, OCR JSON, layout graph, extracted field set |
| Processing lineage | model id/version, prompt/template id, parser version, ruleset version |
| Source coordinate binding | page number, bounding box, table cell id, paragraph anchor |
| Custody event log | uploaded, scanned, normalized, classified, reviewed, redacted, exported |
| Access trace | reviewer, system, service account, vendor access, download/export events |
| Workflow linkage | case id, task id, decision id, communication id |
| Records metadata | record class, retention rule, hold flag, disposition state |

LLM output must be grounded:

summary_claim:
  text: "The claimant submitted two repair invoices totaling 3,420.00."
  sources:
    - document_id: D123
      page: 2
      field: invoice_total
      value: 1,850.00
    - document_id: D124
      page: 1
      field: invoice_total
      value: 1,570.00
  model_run_id: MR789
  reviewer_status: reviewed

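
10. Records Retention and Legal Hold

Document intelligence routinely generates many derived artifacts: OCR text, layout JSON, extracted fields, summaries, review notes, decision logs, redacted copies and exports. Whether they constitute records, how long they are retained and whether they fall under legal hold is not for the AI team to decide on its own.

The architecture should provide configurable mechanisms:

| Question | Architecture response |
| --- | --- |
| Which artifacts are records? | record classification service + Records/Legal interpretation |
| Are the original and the extraction results retained identically? | retention rule can differ by artifact type and case type |
| How does legal hold propagate? | hold flag propagates to raw doc, derived artifacts, case decisions, exports |
| How is disposition executed? | scheduled disposition workflow with approval, audit and exception handling |
| What happens to old results after reprocessing? | version lineage retained per policy; no silent overwrite |
| Does the vendor hold copies? | vendor data inventory, deletion/return evidence, contract controls |
| How does records search locate documents and AI artifacts? | metadata index with access controls and preservation status |

Boundary principles:

  • Do not assert the statutory retention period of any document type in product copy or architecture documents.
  • Retention schedule, legal hold status, e-discovery and regulatory response are confirmed by Legal / Compliance / Records.
  • AI summaries cannot replace original records unless Records/Legal explicitly approves that artifact's use and retention.
  • Under legal hold, automated deletion, model-training cleanup, vendor purge and data minimization jobs must all check hold state.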
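
Hold propagation across derived artifacts can be sketched as a walk up a lineage graph. Everything here is an assumption for illustration: the parent-pointer model, the function names and the ids. Whether a hold applies at all remains a Legal / Records decision; the code only shows how a deletion job could respect it.

```python
from typing import Optional


def held_artifacts(parent_of: dict[str, Optional[str]], held_roots: set[str]) -> set[str]:
    """Return every artifact that is held itself or descends from a held artifact."""
    held = set()
    for artifact in parent_of:
        node: Optional[str] = artifact
        visited = set()
        while node is not None and node not in visited:
            if node in held_roots:
                held.add(artifact)
                break
            visited.add(node)
            node = parent_of.get(node)
    return held


def safe_to_delete(candidates: list[str], parent_of: dict[str, Optional[str]],
                   held_roots: set[str]) -> list[str]:
    """Keep only candidates with known lineage and no hold anywhere above them."""
    blocked = held_artifacts(parent_of, held_roots)
    return [a for a in candidates if a in parent_of and a not in blocked]


# Hypothetical lineage: case -> raw document -> OCR output -> summary.
lineage = {
    "CASE-1": None, "D123": "CASE-1", "D123-ocr": "D123", "D123-summary": "D123-ocr",
    "CASE-2": None, "D900": "CASE-2",
}
deletable = safe_to_delete(["D123-summary", "D900", "D404"], lineage, {"CASE-1"})
```

The summary under the held case is blocked, and `D404`, whose lineage is unknown, is also kept, since an artifact that cannot be traced should not be purged automatically.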

11. Fraud, Tamper and Authenticity Checks

Document AI must assume its input may be manipulated, especially bank statements, paystubs, invoices, receipts, screenshots, photos and identity/KYB documents.

| Threat | Pattern | Controls |
| --- | --- | --- |
| Altered PDF | Fields edited, metadata anomalies, inconsistent fonts/layers | PDF object analysis, metadata checks, visual inconsistency, reviewer alert |
| Fake template | Forged bank/employer/merchant template | institution/template registry, logo/layout similarity, issuer validation where available |
| Missing pages | Statement lacks key pages or terms/context | page count completeness, period continuity, expected section checks |
| Reused document | Same paystub / invoice / photo reused across multiple customers | perceptual hash, duplicate detection, cross-case risk signal |
| Synthetic paystub | YTD/pay period/tax/deduction inconsistencies | arithmetic and chronology validation |
| Screenshot manipulation | Cropping, splicing, overlays, low quality used to evade checks | image metadata, edge artifacts, quality gate, source channel policy |
| Deepfake / generated image | AI-generated accident photos, receipts, signatures | media provenance, duplicate search, anomaly model, human/fraud review |
| Insider manipulation | Employee replaces, exports or rewrites evidence | access control, separation of duties, immutable logs, FFIEC-aligned layered controls |
| Prompt injection in documents | Instructions embedded in the document to induce the LLM to ignore rules | tool isolation, deterministic extraction schema, prompt injection filters, output validation |

Important boundaries:

  • Fraud/tamper model output is a risk signal, not a final fraud conclusion.
  • Customer communication should explain the business reason for requesting more material or review, without exposing internal detection rules.
  • For high-risk documents, a combination of controls usually matters more than any single model: metadata + layout + arithmetic + external validation + human review + monitoring.
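
The arithmetic and chronology validation named for synthetic paystubs can be sketched as a function that returns signals rather than a verdict. The function name, signal labels and sample amounts are assumptions, and exact equality is a simplification: real payroll data may need a rounding tolerance.

```python
from decimal import Decimal


def paystub_risk_signals(gross: Decimal, net: Decimal, deductions: Decimal,
                         ytd_gross: Decimal, prior_ytd_gross: Decimal) -> list[str]:
    """Return named risk signals for review; an empty list is not proof of authenticity."""
    signals = []
    if net > gross:
        signals.append("net_exceeds_gross")
    if gross - deductions != net:
        signals.append("deductions_do_not_reconcile")
    if prior_ytd_gross + gross != ytd_gross:
        signals.append("ytd_not_consistent")
    return signals


# Made-up amounts: a consistent stub, then the same stub with an edited net pay.
clean = paystub_risk_signals(Decimal("4850.00"), Decimal("3912.40"), Decimal("937.60"),
                             Decimal("24250.00"), Decimal("19400.00"))
edited = paystub_risk_signals(Decimal("4850.00"), Decimal("4912.40"), Decimal("937.60"),
                              Decimal("24250.00"), Decimal("19400.00"))
```

The output feeds routing to a fraud or specialist queue; it does not label the customer or the document as fraudulent.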

12. Workflow Integration

The value of document intelligence comes from entering business processes, not from sitting in an extraction dashboard.

| Workflow | Integration pattern | Evidence control |
| --- | --- | --- |
| KYC/KYB onboarding | prefill application, identify missing evidence, route ownership/authority ambiguity | source-linked fields, policy reason codes, compliance review boundary |
| Loan underwriting/servicing | income/expense extraction, hardship package completeness, servicing task creation | field confidence, calculation trace, reviewer rationale |
| Insurance claims | claim package classification, invoice/photo extraction, timeline, fraud triage | media provenance, duplicate checks, adjuster summary |
| Payment disputes | reason-code evidence checklist, merchant/customer package summary, SLA management | required evidence flags, final package trace |
| Complaints | identify complaint theme, product, customer harm, attached evidence, response deadlines | complaint-to-document linkage, final response capture |
| Back office operations | mailroom automation, form processing, correspondence routing | queue routing evidence, SLA, error sampling |

Workflow contract should include:

input artifact type
required extracted fields
confidence thresholds by field
validation rules
human review triggers
fraud/tamper triggers
policy decision states
records metadata
case update payload
customer communication constraints
fallback and exception path
monitoring metrics

13. Model Risk and AI Governance

Document AI may combine an OCR engine, layout model, classification model, multimodal LLM, entity extraction model, rules engine, fraud model and summarizer. Each model or rule carries a different risk.

| Capability | Model risk focus |
| --- | --- |
| Classification | wrong workflow route, wrong retention category, SLA miss |
| OCR/layout | field distortion, table errors, missed signatures or checkboxes |
| Entity extraction | wrong identity, amount, date, account, authority |
| Summarization | unsupported conclusion, omission of conflicting evidence, tone risk |
| Fraud/tamper detection | false positives, false negatives, sensitive rule leakage |
| Confidence scoring | poor calibration, threshold gaming, automation beyond evidence |
| Human review recommendation | automation bias, rubber-stamping, unequal treatment |

Governance design:

  • Map use cases and harms using the NIST AI RMF trustworthiness characteristics: validity and reliability, safety, security and resilience, accountability and transparency, explainability and interpretability, privacy, fairness.
  • Maintain a model inventory with purpose, owner, vendor, version, data classes, decision impact and allowed uses.
  • Define eval sets by document class, language, channel, quality, customer segment, fraud pattern and workflow outcome.
  • Monitor extraction accuracy by field criticality, not only aggregate F1.
  • Track reviewer overturns, complaint defects, downstream rework and customer harm indicators.
  • Use ISO/IEC 42001-style AI management system controls: role accountability, operational procedures, performance evaluation, internal audit and continual improvement.
  • Treat prompt/template/ruleset changes as governed artifacts when they affect extraction or workflow decisions.

14. Product / Architecture Decisions

| Decision | Weak answer | Strong architecture answer |
| --- | --- | --- |
| What are we automating? | "Read all PDFs with AI" | Define document classes, field schemas, decision impact, review thresholds and workflow contracts |
| How to use OCR? | "OCR everything and send to LLM" | Preserve page/layout/source coordinates; use OCR/layout only as one stage of evidence pipeline |
| How to use multimodal models? | "Ask model what the document says" | Use schema-constrained extraction, grounded outputs, validation, confidence and review |
| What counts as evidence? | "Model output in JSON" | Raw document + source-linked extracted fields + validations + policy acceptance + review trace |
| When to auto-process? | "High confidence" | Field criticality + calibrated confidence + validations + fraud signals + policy threshold |
| How to handle summaries? | "Summarize the file for ops" | Source-linked, scoped summary that cannot override field evidence or policy rules |
| How to handle records? | "Store documents in S3" | Record class, retention schedule mapping, legal hold propagation, disposition audit |
| How to handle legal hold? | "Pause deletion manually" | Hold-aware storage, derived artifact propagation, vendor and downstream system checks |
| How to measure quality? | "OCR accuracy" | Field-level accuracy, calibration, review overturn, evidence completeness, downstream defect |
| How to govern vendors? | "Use best API" | Data use, retention, access, audit logs, model versioning, outage, exit and evidence obligations |

15. Control Matrix

| Control objective | Control activity | Evidence |
| --- | --- | --- |
| Preserve original evidence | Immutable raw artifact, hash, received timestamp, source channel | document hash, intake event, storage policy |
| Classify correctly | Document type/subtype model with ambiguity routing | classification result, confidence, review decision |
| Bind fields to source | Every extracted field includes document/page/coordinate or anchor | extraction JSON, UI source link |
| Validate critical fields | Rule and cross-document checks for amounts, dates, identity, authority | validation log, failed rule reason |
| Calibrate confidence | Compare confidence with reviewer outcomes and sampling | calibration report, threshold change record |
| Prevent unsupported automation | Field criticality thresholds and policy gates before workflow action | policy decision id, reason code |
| Control summaries | Source-linked summaries with prohibited conclusion rules | model run id, citations, eval result |
| Route human review | Queue by ambiguity, high impact, authority, fraud, legal/records sensitivity | task id, reviewer, rationale |
| Detect tamper/fraud | Metadata, visual, duplicate, arithmetic and behavioral checks | risk signals, fraud case link |
| Protect privacy | Data minimization, access controls, redaction, purpose-bound use | privacy review, access logs |
| Manage records | Record class and retention metadata assigned to raw and derived artifacts | record metadata, retention rule |
| Honor legal hold | Hold flag propagates to raw docs, derived artifacts, exports and vendor purge flows | hold event, propagation log |
| Govern models | Model inventory, evals, drift monitoring, change control | model card, eval report, approval |
| Support complaints/audit | Link documents, AI runs, review actions, workflow decisions and final messages | evidence bundle, complaint id |

16. Metrics

| Metric family | Examples |
| --- | --- |
| Extraction quality | field-level precision/recall, table extraction accuracy, date/amount/entity error rate |
| Confidence quality | calibration error, high-confidence wrong field rate, threshold breach rate |
| Workflow outcome | straight-through processing rate by document class, review queue SLA, downstream rework |
| Human review | reviewer overturn rate, agreement rate, average handle time, QA defect rate |
| Evidence completeness | % fields with source coordinates, % decisions with policy reason, replay success rate |
| Records and hold | retention metadata completeness, hold propagation success, disposition exception count |
| Fraud/tamper | duplicate document rate, altered document detection, false positive review rate, confirmed fraud yield |
| Privacy/security | over-collection defects, unauthorized access attempts, redaction defects, vendor retention exceptions |
| Model governance | eval pass rate, drift alerts, prompt/ruleset changes, incident count |
| Customer impact | complaint rate related to document handling, dispute re-open rate, request-for-more-info rate, accessibility defects |

Balanced executive dashboard:

Speed: cycle time and review productivity improve.
Quality: critical fields are accurate and calibrated.
Risk: fraud, tamper, legal hold and records controls work.
Fairness: errors are monitored across document quality, language and channel.
Trust: every automated or reviewed decision is replayable.
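
Several of these metrics are simple ratios over decision records. The sketch below is illustrative only: the record keys and the sample data are assumptions, and `rate` returns `None` for an empty denominator so a dashboard does not show a misleading 0%.

```python
from typing import Optional


def rate(hits: int, total: int) -> Optional[float]:
    """Ratio of hits to total, or None when there is nothing to measure."""
    return hits / total if total else None


# Hypothetical per-field decision records pulled from an evidence ledger.
records = [
    {"has_source_coordinates": True, "reviewer_overturned": False},
    {"has_source_coordinates": True, "reviewer_overturned": True},
    {"has_source_coordinates": True, "reviewer_overturned": False},
    {"has_source_coordinates": False, "reviewer_overturned": False},
]

source_coverage = rate(sum(r["has_source_coordinates"] for r in records), len(records))
overturn_rate = rate(sum(r["reviewer_overturned"] for r in records), len(records))
```

In practice each ratio would be broken down by document class and field criticality, in line with the dashboard's quality and fairness lines.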

17. Failure Modes

| Failure mode | Why dangerous | Better control |
| --- | --- | --- |
| OCR text treated as truth | OCR may misread critical amounts, dates, names | source-linked fields, validation, human review |
| Document-level confidence used for all fields | High average hides one critical field error | field criticality thresholds |
| LLM summary becomes decision record | Summary may omit conflicts or invent conclusions | source-linked summary plus structured evidence |
| Wrong document classification | Routes to wrong workflow, SLA, retention class | ambiguity queue and QA sampling |
| Silent reprocessing overwrites evidence | Audit cannot explain historical decision | versioned artifacts and lineage |
| No legal hold propagation | Derived OCR/extractions may be deleted while raw doc held | hold-aware artifact graph |
| Reviewer rubber-stamping | Automation bias turns human review into weak control | source-first UI, reason codes, QA |
| Fraud model blocks customers without review | False positives can cause harm and complaints | risk signal routing and human/fraud review |
| Vendor retains documents unexpectedly | Privacy, records and legal hold exposure | contract controls, data inventory, deletion evidence |
| Model trained on records under hold or restricted use | Governance and discovery risk | data use controls and hold-aware training exclusion |
| Prompt injection from document text | Model follows malicious embedded instructions | tool isolation and output validation |
| Complaint cannot link to evidence | Root cause and remediation become speculative | complaint-to-evidence trace |

18. Interview-Ready Takeaways

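Q1: Why is document intelligence not an OCR project?

OCR only solves "seeing the text". What retail finance actually needs is evidence-grade extraction: document provenance, layout/source coordinates, field confidence, business validation, fraud/tamper checks, human review, workflow action, records retention, legal hold and audit replay. Otherwise we merely turn humans misreading documents into machines misreading documents at scale.

Q2: How do you decide whether an extracted field can enter a business process automatically?

Model confidence alone is not enough. Look at field criticality, document class, confidence calibration, source linkage, validation result, cross-document consistency, fraud/tamper signals, policy gate and human review threshold. High-impact fields such as income, authority, claim amount and beneficial owner usually need stronger validation or review.

Q3: How can AI summaries be used safely in claims/disputes/KYC?

A summary should be a reviewer productivity tool, not an evidence replacement. Every key statement must be source-linked, and the summary must not state unsupported eligibility, KYC/KYB, fraud or legal conclusions. The final business decision should cite structured evidence, policy reason and reviewer action.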

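Q4: Why must records retention and legal hold be part of the architecture?

Because document AI handles raw docs and produces OCR text, layout JSON, extracted fields, summaries, review notes, exports and other derived artifacts. Which of these are records, how long they are retained and whether legal hold affects them depends on product, record type, jurisdiction, retention schedule, hold status and Legal/Compliance/Records interpretation. The architecture must be able to propagate metadata and hold state rather than rely on manual searches after the fact.

Q5: How does a senior PM measure document AI success?

Not by automation rate or processing time alone. Look at field-level accuracy, confidence calibration, review overturn, evidence completeness, records/hold metadata completeness, fraud/tamper yield, complaint defects, downstream rework, customer harm and audit replay success. Speed must be judged together with evidence quality.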


19. Practical Templates

19.1 Document Evidence Envelope

Document ID:
Case ID:
Customer / business reference:
Source channel:
Received timestamp:
Submitter / system reference:
Raw file hash:
File type and size:
Page count:
Document class / subtype:
Classification confidence:
Language / locale:
Quality score:
Processing lineage:
  renderer version:
  OCR/layout version:
  extraction model version:
  ruleset version:
Fraud/tamper signals:
Record class:
Retention rule:
Legal hold flag:
Access restrictions:
Derived artifacts:
Workflow decisions:
Complaint / audit links:

19.2 Field Extraction Spec

| Field | Definition |
| --- | --- |
| field_name | gross_pay_amount |
| document_classes | paystub, payroll statement |
| source_requirement | page + bounding box + raw text |
| normalization | currency amount with locale and period |
| validations | gross >= net, pay period exists, YTD consistency |
| confidence_threshold | higher threshold for auto-populate, lower threshold for review suggestion |
| review_trigger | low confidence, arithmetic mismatch, employer mismatch, tamper signal |
| allowed_workflow_use | income package preparation, not final credit decision by itself |
| prohibited_use | unsupported affordability conclusion |
| retention_metadata | derived field linked to paystub record class |

19.3 Confidence and Review Policy

Document class:
Workflow:
Field criticality:
Customer impact:
Fraud exposure:
Auto-populate allowed when:
  field confidence:
  classification confidence:
  validations:
  tamper signals:
  cross-document consistency:
Human review required when:
  low confidence:
  conflict:
  authority/legal implication:
  high amount:
  vulnerable customer / complaint sensitivity:
Sampling rule:
Reviewer queue:
QA metric:

19.4 Human Review Record

Review task ID:
Reviewer role:
Document ID:
Field(s) reviewed:
Model suggestion:
Source location:
Validation failures:
Fraud/tamper signals:
Reviewer decision:
Corrected value:
Reason code:
Free-text rationale:
Second approval:
Workflow action:
Customer communication reference:
Timestamp:

19.5 Records and Legal Hold Map

Artifact type:
Raw document:
Rendered page:
OCR text:
Layout JSON:
Extracted fields:
AI summary:
Review notes:
Workflow decision:
Exported package:
Record class owner:
Retention schedule reference:
Legal hold propagation rule:
Disposition approval:
Vendor copy:
Search / retrieval metadata:
Access restrictions:
Audit evidence:

20. Final Operating Principle

A mature AI document intelligence architecture can be tested with one question:

Can the institution prove that every automated or human-assisted document decision
was based on the right document,
the right source-linked fields,
the right confidence and validation controls,
the right human review boundary,
the right fraud and records treatment,
and the right workflow policy at that point in time?

If the answer is unclear, what the team lacks is not stronger OCR. It lacks an evidence operating architecture made up of document provenance, evidence quality, confidence calibration, human review design, records/legal hold integration, fraud controls, workflow contracts and AI governance.