AI Document Intelligence / Unstructured Data / Evidence Quality Playbook
Positioning: For CBAP+ Senior BAs, senior AI PMs, Product Architects, Enterprise Architects, Operations Architects, and leads in Records / Information Governance, Fraud Risk, Model Risk, KYC/KYB, Claims, Disputes, and Loan and Insurance Servicing. The goal is to move document intelligence from "automatically reading documents" to an evidence-grade, workflow-ready, records-aware, auditable operating system.
Scope: bank statements, paystubs, claims packages, payment disputes, KYC/KYB documents, insurance and loan servicing documents, complaints, operational correspondence, mailroom automation, agent assist and workflow automation.
Core outputs: executive framing, taxonomy, decision gates, target architecture, required artifacts, RACI/operating model, implementation roadmap, evidence pack, release checklists, metrics, anti-patterns, tabletop scenarios and portfolio deliverables.
Core thesis:
Financial document AI succeeds when the institution can rely on extracted evidence, not when the model can read more PDFs.
0. Disclaimer
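This document is material for learning, portfolio building, architecture practice and internal governance discussion. It does not constitute legal advice, a compliance conclusion, a records retention conclusion, e-discovery advice, a KYC/KYB sufficiency determination, a loan or insurance underwriting conclusion, a consumer dispute resolution opinion, a fraud handling instruction, a model validation report or a vendor recommendation.
Formal projects must be assessed jointly by Legal, Compliance, Privacy, Records Management, Information Governance, Model Risk, Fraud Risk, Financial Crime, Operations, Product, Architecture, Information Security, Data Governance, Vendor Management, Internal Audit and the relevant business owners. The specific applicability of records, evidence, legal hold, customer notification, KYC/KYB, credit, insurance, complaint, dispute, claim, cross-border data and e-discovery requirements depends on product, record type, jurisdiction, retention schedule, legal hold status, customer segment, channel, policy, contract and Legal / Compliance / Records interpretation.
Boundary principles:
- An OCR result is extracted text, not a verified fact.
- An LLM summary is a productivity aid, not a substitute for the original evidence.
- A confidence score is a routing/control input, not a business conclusion.
- A fraud/tamper model output is a risk signal, not a fraud determination.
- The specific applicability of records retention, legal hold, e-discovery, KYC/KYB, lending, insurance and dispute obligations must be interpreted by the corresponding governance functions.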
1. Executive Framing
Common executive goals:
Reduce manual document review.
Accelerate onboarding, claims, disputes and servicing.
Use AI to summarize and extract unstructured documents.
The real project goal should be rewritten as:
Create an evidence-grade document intelligence capability
that automates low-risk extraction,
routes ambiguity to skilled review,
prevents unsupported decisions,
preserves records and legal hold controls,
and can replay every material document-driven action.
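Without an evidence architecture, document AI introduces hidden risks:
- A bank statement is misread by OCR, and income calculations go wrong in bulk.
- A forged paystub template is extracted with high confidence and enters the underwriting workflow.
- A claim photo is reused, and the fraud signal never reaches the adjuster.
- A dispute evidence summary omits key merchant proof, leading to an improper resolution.
- KYB documents leave the authorized representative unclear, yet AI prefill pushes account opening forward.
- A legal hold has been triggered, but OCR JSON, summaries, review notes or vendor copies are purged.
- When a customer complains, the team cannot find what the model saw, who reviewed what, or why the action was taken.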
Executive one-liner:
Document AI is a controlled evidence supply chain for operations, risk and records.
1.1 Steering Committee Questions
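- Which document classes and fields can be extracted automatically, and which may only assist human review?
- Which fields affect customer rights, funds, identity, insurance, credit, disputes or compliance?
- How do we prove which document, page, region and model version a field came from?
- How are confidence thresholds calibrated, and how do reviewer overturns feed back into them?
- How are records retention, legal hold, vendor copies and derived AI artifacts governed?
- When a complaint, audit, regulatory inquiry or litigation preservation request arises, can the case be fully replayed?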
2. Source Anchors
| Anchor | Official link | How this playbook uses it |
|---|---|---|
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Use Govern / Map / Measure / Manage to organize document AI risk, evals, monitoring, human oversight, incident and evidence controls |
| NIST Privacy Framework | https://www.nist.gov/privacy-framework | Use privacy risk management, data minimization, purpose limitation, processing controls and monitoring to design document data boundaries |
| NARA Records Management | https://www.archives.gov/records-mgmt | Use records lifecycle, disposition, records program and accountability to design the records governance interface |
| NARA Electronic Records Management | https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html | Use electronic records metadata, format and transfer/readiness guidance to frame the architecture discussion on electronic records that can be preserved, retrieved and migrated |
| CFPB Consumer Complaint Database | https://www.consumerfinance.gov/data-research/consumer-complaints/ | Use the consumer complaint operations perspective to design complaint linkage, evidence replay, and the root cause and remediation learning loop |
| FFIEC Authentication and Access to Financial Institution Services and Systems | https://www.ffiec.gov/press/pr081121.htm | Use financial institution access/authentication and layered security thinking to design reviewer access, document workflow actions and privileged operation controls |
| ISO/IEC 42001 overview | https://www.iso.org/standard/42001 | Use AI management system, roles, operation, performance evaluation, internal audit and continual improvement to establish the operating model |
Source-to-control pattern:
source anchor -> control objective -> product decision
-> workflow requirement -> evidence artifact -> owner -> metric
3. Taxonomy
3.1 Document Use Case Classes
| Class | Examples | Primary decision impact |
|---|---|---|
| Income and affordability | bank statements, paystubs, payroll summaries, benefit letters | income evidence, servicing treatment, affordability package |
| Identity and entity | IDs, business registration, ownership docs, licenses, utility bills | KYC/KYB evidence, authority, onboarding routing |
| Claims and loss | claim forms, photos, invoices, police reports, medical bills | claim triage, payout support, fraud review |
| Payment disputes | receipts, shipping proof, merchant correspondence, customer statements | dispute reason code, evidence package, SLA |
| Account authority | POA, court order, death certificate, consent forms, corporate resolution | account access, maintenance, servicing authority |
| Complaints and correspondence | letters, emails, transcripts, attachments | complaint classification, response evidence, RCA |
| Internal operations | branch scans, mailroom forms, agent notes, back office forms | task routing, QA, operational control |
3.2 Evidence Classes
| Evidence class | Meaning | Governance need |
|---|---|---|
| Raw artifact | Original file or image | hash, source, access, retention |
| Rendered artifact | System-generated page image / normalized PDF | renderer version, lineage |
| OCR/layout artifact | text, table, reading order, bounding boxes | source anchoring, versioning |
| Extracted field | schema-bound field and value | confidence, validation, review |
| Derived fact | calculations or reconciled facts | formula, input trace, policy use |
| AI summary | source-linked narrative for reviewer | prohibited conclusions, citation |
| Human decision | reviewer correction, acceptance, escalation | reason code, role, timestamp |
| Workflow action | case update, payment, request-for-info, escalation | policy decision id, evidence link |
| Records metadata | record class, retention rule, legal hold flag | Records/Legal governance |
3.3 Criticality Levels
| Level | Examples | Default treatment |
|---|---|---|
| Low | document title, non-decision routing label | auto-populate with sampling |
| Medium | address, product type, non-material date | validation and review on conflict |
| High | income amount, claim amount, policy number, account holder, dispute reason | field-level threshold, source link, validations, review triggers |
| Restricted / specialist | POA, court order, beneficial ownership, medical record, fraud signal, legal hold | specialized queue, access controls, policy review |
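The default treatments above can be read as a routing rule per extracted field. The following is a minimal sketch, not a policy: the threshold values (0.80, 0.90, 0.98), the route labels and the `route_field` function are hypothetical, and real thresholds would have to come from calibration evidence by field and document class.

```python
from enum import Enum


class Criticality(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    RESTRICTED = "restricted"


# Hypothetical thresholds for illustration only; calibrate per field and journey.
AUTO_POPULATE_THRESHOLD = {
    Criticality.LOW: 0.80,
    Criticality.MEDIUM: 0.90,
    Criticality.HIGH: 0.98,
}


def route_field(criticality: Criticality, confidence: float,
                validation_passed: bool, has_source_anchor: bool) -> str:
    """Return a routing label for one extracted field."""
    # Restricted / specialist fields never auto-populate, whatever the score.
    if criticality is Criticality.RESTRICTED:
        return "specialist_queue"
    # High-impact fields need a source link before anything else is considered.
    if criticality is Criticality.HIGH and not has_source_anchor:
        return "human_review"
    # Medium and high fields go to review when a validation conflicts.
    if criticality is not Criticality.LOW and not validation_passed:
        return "human_review"
    if confidence < AUTO_POPULATE_THRESHOLD[criticality]:
        return "human_review"
    # Auto-populated fields remain subject to QA sampling downstream.
    return "auto_populate"
```

Note that `auto_populate` here only means the field value is pre-filled; it is not a business decision, which still passes through the evidence policy gates.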
4. Target Operating Architecture
channel intake
-> document provenance and storage
-> normalization / rendering / safety scan
-> classification and package splitting
-> OCR / layout / table extraction
-> multimodal field and entity extraction
-> normalization and validation
-> confidence calibration
-> fraud and tamper checks
-> evidence policy gates
-> human review workbench
-> workflow orchestration
-> records retention and legal hold integration
-> evidence ledger, complaint linkage, monitoring and governance
Architecture capabilities:
| Capability | What it must do |
|---|---|
| Document provenance | Assign document id, hash, source channel, received time, custody events |
| Document classification | Identify type/subtype, package boundaries, language, quality, ambiguity |
| Layout understanding | Preserve page, coordinates, reading order, tables, checkboxes, signatures |
| Schema-constrained extraction | Extract only governed fields with allowed values and source anchors |
| Entity normalization | Normalize names, dates, addresses, monetary values, IDs, account masks |
| Evidence validation | Run arithmetic, chronology, cross-document and system-of-record checks |
| Confidence and triage | Route by calibrated confidence, field criticality and customer impact |
| Fraud/tamper detection | Detect altered files, duplicates, fake templates, metadata anomalies, prompt injection |
| Human review | Source-first review, structured overrides, QA sampling and dual control |
| Workflow integration | Feed KYC, claims, disputes, servicing, complaints and ops queues |
| Records/hold integration | Apply record class, retention metadata, legal hold propagation and disposition controls |
| Governance monitoring | Track evals, model drift, review outcomes, complaints, incidents and CAPA |
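The document provenance capability above (document id, hash, source channel, received time, custody events) might be sketched as follows. This is an illustrative sketch using only the Python standard library; the `ProvenanceRecord` shape and the function names are assumptions, not a prescribed design, and SHA-256 is chosen here only as a common example of an integrity hash.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    document_id: str
    source_channel: str
    raw_file_hash: str
    received_timestamp: str
    custody_events: list = field(default_factory=list)


def register_document(document_id: str, raw_bytes: bytes,
                      source_channel: str) -> ProvenanceRecord:
    """Hash the raw artifact at intake and open its custody trail."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    record = ProvenanceRecord(document_id, source_channel, digest, now)
    record.custody_events.append(
        {"event": "received", "at": now, "actor": source_channel}
    )
    return record


def verify_integrity(record: ProvenanceRecord, raw_bytes: bytes) -> bool:
    """Re-hash the stored bytes and compare with the hash taken at intake."""
    return hashlib.sha256(raw_bytes).hexdigest() == record.raw_file_hash
```

Re-running `verify_integrity` before replaying a case gives a simple check that the raw artifact is the one that was originally received.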
5. Decision Gates
Gate 0: Use Case Boundary
| Question | Pass condition |
|---|---|
| Which journey is in scope? | one workflow named: KYC, KYB, claims, disputes, servicing, complaints or operations |
| Which documents are accepted? | document class/subtype list and unsupported formats |
| What decisions may use extracted evidence? | allowed workflow actions and prohibited actions defined |
| Is AI summarizing, extracting, classifying or recommending? | AI role and decision boundary documented |
| Does the use case touch records, legal hold, privacy, KYC/KYB, lending, insurance or dispute obligations? | governance functions identified |
Gate 1: Evidence Schema
| Question | Pass condition |
|---|---|
| What fields are required? | field dictionary with definitions, data types, source requirements |
| Which fields are high impact? | criticality and customer harm assessment |
| What source anchor is required? | page/coordinate/table cell/paragraph anchor rule |
| What validations are required? | arithmetic, chronology, cross-document, system match |
| What summaries are allowed? | source-linked summary rules and prohibited conclusions |
Gate 2: Confidence and Review Policy
| Question | Pass condition |
|---|---|
| Are confidence scores calibrated? | calibration evidence by field and document class |
| Are thresholds field-specific? | threshold matrix by criticality and journey risk |
| What routes to human review? | review triggers and queue ownership |
| Is QA sampling defined? | sample rules for auto-processed and reviewed cases |
| Are reviewer overrides captured? | structured reason codes and source-linked corrections |
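One way to produce the "calibration evidence by field and document class" asked for in Gate 2 is to compare mean confidence with observed accuracy per confidence bucket. The sketch below assumes a hypothetical input of `(confidence, is_correct)` pairs from QA-reviewed fields and uses equal-width buckets; the bucket count and function names are illustrative choices.

```python
def calibration_table(samples, n_buckets=5):
    """Group (confidence, is_correct) pairs into equal-width confidence buckets."""
    buckets = [[] for _ in range(n_buckets)]
    for confidence, is_correct in samples:
        # A confidence of exactly 1.0 falls into the last bucket.
        index = min(int(confidence * n_buckets), n_buckets - 1)
        buckets[index].append((confidence, is_correct))
    rows = []
    for index, items in enumerate(buckets):
        if not items:
            continue
        rows.append({
            "bucket": index,
            "count": len(items),
            "mean_confidence": sum(c for c, _ in items) / len(items),
            "accuracy": sum(1 for _, ok in items if ok) / len(items),
        })
    return rows


def expected_calibration_error(samples, n_buckets=5):
    """Count-weighted average gap between confidence and observed accuracy."""
    rows = calibration_table(samples, n_buckets)
    total = sum(row["count"] for row in rows)
    if total == 0:
        return 0.0
    return sum(
        row["count"] / total * abs(row["mean_confidence"] - row["accuracy"])
        for row in rows
    )
```

A large gap in the highest bucket is the pattern to watch for, because it means fields that skip review are wrong more often than their scores suggest. Running this separately per field and document class keeps a well-calibrated routing label from masking a poorly calibrated income amount.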
Gate 3: Fraud, Security and Access
| Question | Pass condition |
|---|---|
| Can the system detect tamper and duplicate patterns? | metadata, visual, duplicate and validation checks |
| Are prompt injection and malicious files controlled? | isolation, safety scan, schema validation |
| Are reviewer and vendor access controlled? | role-based access, logging, least privilege |
| Are privileged workflow actions protected? | step-up/dual control where appropriate |
| Are fraud signals routed without leaking sensitive rules? | reviewer-only signal display and customer-safe language |
Gate 4: Records, Legal Hold and Evidence Replay
| Question | Pass condition |
|---|---|
| Are raw and derived artifacts classified? | record class and retention metadata assigned |
| Does legal hold propagate? | raw, OCR, extraction, summaries, review notes, exports and vendor copies checked |
| Can disposition be audited? | disposition workflow and exception log |
| Can a complaint or audit replay the case? | evidence pack with document, model, review, workflow and final message |
| Are vendor obligations tracked? | retention, deletion, access, audit and exit evidence |
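The hold propagation question in Gate 4 can be framed as a graph traversal over artifact lineage: a hold on a raw artifact should reach every artifact derived from it. The sketch below is illustrative only; the lineage mapping, artifact names and function names are hypothetical, and which artifacts actually fall under a hold remains a Records/Legal determination.

```python
from collections import deque


def propagate_hold(lineage: dict, held_roots: set) -> set:
    """Return every artifact reachable from a held artifact.

    lineage maps an artifact id to the ids derived directly from it.
    """
    held = set(held_roots)
    queue = deque(held_roots)
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in held:
                held.add(child)
                queue.append(child)
    return held


def blocked_deletions(deletion_candidates, lineage: dict, held_roots: set) -> list:
    """List the deletion candidates that a hold should stop."""
    held = propagate_hold(lineage, held_roots)
    return sorted(a for a in deletion_candidates if a in held)


# Hypothetical lineage: raw PDF -> OCR JSON -> fields -> review notes / vendor copy.
example_lineage = {
    "raw.pdf": ["ocr.json"],
    "ocr.json": ["fields.json", "summary.txt"],
    "fields.json": ["review_notes.json", "vendor_copy"],
}
print(blocked_deletions(["ocr.json", "summary.txt"], example_lineage, {"raw.pdf"}))
```

The same check could sit in front of deletion, training-data selection, vendor purge and export jobs so that each one consults the hold state before acting.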
Gate 5: Production Readiness
| Question | Pass condition |
|---|---|
| Are eval sets representative? | document quality, channel, language, layout, fraud pattern, customer segment covered |
| Are monitoring dashboards live? | extraction, confidence, review, records, fraud, complaints and incidents |
| Are operations trained? | reviewer playbook and QA plan completed |
| Is incident response ready? | model defect, vendor outage, legal hold miss, fraud pattern, complaint spike playbooks |
| Is go/no-go tied to evidence quality? | release decision uses balanced scorecard, not automation rate alone |
6. Required Artifacts
| Artifact | What it proves |
|---|---|
| Use Case Boundary Card | Defines the journey, documents, AI role, business actions, risk and governance owners |
| Document Class Taxonomy | Documents the classes, subtypes, unsupported documents and routing |
| Evidence Field Dictionary | Documents field definitions, source anchor, data type, criticality and allowed uses |
| Extraction and Validation Spec | Documents OCR/layout/extraction/normalization/validation controls |
| Confidence and Review Policy | Shows when to automate, when to route to human review and when to apply dual control |
| Fraud/Tamper Threat Model | Covers altered PDF, fake template, duplicate, deepfake, insider and prompt injection |
| Records and Legal Hold Mapping | Shows how raw and derived artifacts are classified, retained, held and disposed |
| Workflow Contract | Shows how the evidence payload enters case/task/decision systems |
| Human Review Workbench Spec | Shows that reviewers can see source, confidence, conflicts, reason codes and history |
| AI Model and Prompt Inventory | Documents models, rulesets, prompts, vendors, versions and allowed uses |
| Eval and QA Scenario Suite | Shows testing by document class, field, language, quality, fraud and process outcome |
| Evidence Bundle Schema | Shows that a case can be replayed for complaints, audits and regulatory inquiries |
| RACI and Governance Cadence | Shows the responsibility boundaries of Product, Ops, Records, Legal, Model Risk, Fraud and Architecture |
6.1 Evidence Field Dictionary Pattern
| Field | Example |
|---|---|
| field_name | statement_period_end_date |
| document_class | bank statement |
| source_anchor | page + coordinate + OCR text |
| data_type | date |
| criticality | high for income verification, medium for routing |
| normalization | ISO date with locale parsing |
| validations | period continuity, not future date, matches page header |
| review_triggers | low confidence, missing page, conflict with upload metadata |
| allowed_uses | statement completeness, income package preparation |
| prohibited_uses | final lending decision by itself |
| retention_link | derived field linked to statement artifact |
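The `statement_period_end_date` entry above could be carried as a typed record that keeps the raw value, the normalized value and the source anchor together. The following sketch is an assumption-laden illustration: the class names, the 0.95 confidence floor and the trigger codes are hypothetical, and only a subset of the listed validations (not future date, matches page header) and review triggers (low confidence) is shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceAnchor:
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    ocr_text: str


@dataclass(frozen=True)
class ExtractedDate:
    field_name: str
    raw_value: str        # value as it appears in the document
    normalized: date      # ISO date after locale parsing
    confidence: float
    anchor: Optional[SourceAnchor]


def review_triggers(field: ExtractedDate, today: date, page_header_end: date,
                    min_confidence: float = 0.95) -> list:
    """Return review trigger codes; an empty list means no trigger fired."""
    triggers = []
    if field.anchor is None:
        triggers.append("missing_source_anchor")
    if field.normalized > today:
        triggers.append("future_date")
    if field.normalized != page_header_end:
        triggers.append("page_header_mismatch")
    if field.confidence < min_confidence:
        triggers.append("low_confidence")
    return triggers
```

Keeping `raw_value` next to `normalized` means a reviewer can always see what the document actually said, even when locale parsing has changed its form.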
6.2 Workflow Contract Pattern
Workflow:
Trigger:
Accepted document classes:
Required fields:
Optional fields:
Excluded data:
Confidence thresholds:
Validation rules:
Fraud/tamper routes:
Human review triggers:
Records metadata:
Legal hold behavior:
Case update payload:
Customer communication constraints:
Monitoring metrics:
Owner:
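A filled-in contract can also be checked mechanically before an evidence payload reaches the case system. The sketch below models only three of the template entries (accepted document classes, required fields, excluded data); the `WorkflowContract` class, the example field names and the violation codes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowContract:
    workflow: str
    accepted_document_classes: frozenset
    required_fields: frozenset
    excluded_data: frozenset


def check_payload(contract: WorkflowContract, document_class: str, payload: dict) -> list:
    """Return contract violations for one evidence payload (empty list = conforms)."""
    violations = []
    if document_class not in contract.accepted_document_classes:
        violations.append(f"unsupported_document_class:{document_class}")
    present = set(payload)
    for name in sorted(contract.required_fields - present):
        violations.append(f"missing_required_field:{name}")
    for name in sorted(contract.excluded_data & present):
        violations.append(f"excluded_data_present:{name}")
    return violations


# Hypothetical contract for a paystub income evidence workflow.
paystub_contract = WorkflowContract(
    workflow="income_evidence",
    accepted_document_classes=frozenset({"paystub", "bank_statement"}),
    required_fields=frozenset({"gross_pay", "pay_period_end"}),
    excluded_data=frozenset({"medical_notes"}),
)
```

A payload with violations would typically be routed to review or rejected at the integration boundary instead of being silently accepted.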
7. RACI / Operating Model
| Activity | Accountable | Responsible | Consulted | Informed |
|---|---|---|---|---|
| Use case prioritization | Product Owner | AI PM / Senior BA | Operations, Risk, CX | Steering Committee |
| Document taxonomy | Operations Owner | BA / Process Architect | Records, Legal, Compliance | Product |
| Evidence field dictionary | Product / Ops | BA / Data Product | Compliance, Model Risk, Fraud | Review teams |
| Extraction architecture | Enterprise Architecture | Engineering / AI Platform | Security, Data Governance, Vendor Mgmt | Operations |
| Confidence policy | Business Risk Owner | AI PM / Model Risk / Ops QA | Fraud, Compliance, Product | Audit |
| Human review design | Operations Owner | Ops Lead / UX / BA | Compliance, Accessibility, Fraud | Product |
| Fraud/tamper model | Fraud Risk | Fraud Analytics / Security | Product, Model Risk, Legal | Ops |
| Records mapping | Records Management | Information Governance / Data | Legal, Compliance, Architecture | Product |
| Legal hold propagation | Legal / Records | Platform / Case Systems | Compliance, Vendor Mgmt | Audit |
| Privacy controls | Privacy / Data Governance | Product / Engineering | Legal, Security, CX | Ops |
| Model inventory and eval | Model Risk / AI Governance | AI Platform / AI PM | Product, Compliance, Ops | Audit |
| Workflow integration | Operations / Product | Engineering / Case Platform | Architecture, Records, Fraud | CX |
| Complaint linkage | Complaint Ops | Case Management / Data | Compliance, Product, Model Risk | Audit |
| Independent assurance | Internal Audit | Audit Team | Risk, Legal, Technology | Board Committee |
Governance cadence:
| Cadence | Forum | Output |
|---|---|---|
| Weekly | Pilot operations review | review backlog, extraction defects, queue SLA |
| Weekly | Fraud/tamper review | emerging patterns, false positives, confirmed fraud yield |
| Monthly | Evidence quality council | field accuracy, calibration, overturns, downstream rework |
| Monthly | Records and hold review | metadata completeness, hold propagation tests, vendor attestations |
| Monthly | AI governance review | eval results, prompt/ruleset changes, incidents |
| Quarterly | Product/risk steering committee | scale, restrict, redesign, retire decisions |
| Quarterly | Complaint learning loop | complaint themes, RCA, CAPA and policy changes |
| Semiannual | Tabletop exercise | legal hold miss, vendor outage, model defect, fraud pattern |
8. Implementation Roadmap
Days 1-30: Baseline and Scope
| Day range | Work | Artifact |
|---|---|---|
| 1-3 | Select one bounded workflow, e.g. paystub income extraction, claims invoice review, dispute evidence package | Use Case Boundary Card |
| 4-7 | Inventory document classes, sources, channels, volumes, current defects | Document Class Taxonomy |
| 8-12 | Define high-impact fields, source anchors, allowed and prohibited uses | Evidence Field Dictionary |
| 13-16 | Map records, retention, legal hold and vendor copy requirements for raw and derived artifacts | Records and Legal Hold Mapping |
| 17-20 | Design extraction, validation, confidence and review thresholds | Extraction and Review Policy |
| 21-24 | Threat model tamper, duplicate, prompt injection, insider and vendor risks | Fraud/Tamper Threat Model |
| 25-27 | Define workflow payload, case states, reason codes and customer communication constraints | Workflow Contract |
| 28-30 | Define metrics, eval sets, QA sampling and evidence bundle | Control Dashboard Spec |
Days 31-60: Controlled Build and Pilot
| Day range | Work | Artifact |
|---|---|---|
| 31-35 | Implement provenance, hash, storage, rendering and source anchoring | Provenance Test Report |
| 36-40 | Configure classification, OCR/layout and schema-constrained extraction | Extraction Test Report |
| 41-45 | Add validation, confidence routing and review queues | Review Routing Report |
| 46-50 | Build human review workbench with source-linked fields and override capture | Reviewer QA Record |
| 51-54 | Integrate fraud/tamper checks and security/access controls | Fraud Control Report |
| 55-57 | Connect records metadata, retention and legal hold flags | Records Integration Report |
| 58-60 | Pilot with limited scope and manual QA sampling | Pilot Evidence Quality Report |
Days 61-90: Scale Decision and Assurance
| Day range | Work | Artifact |
|---|---|---|
| 61-65 | Analyze field accuracy, confidence calibration, review overturn and complaints | Outcome Review |
| 66-70 | Tune thresholds, validations, queues and customer request-for-info language | Change Control Record |
| 71-75 | Test legal hold propagation, disposition exception and vendor deletion evidence | Hold and Disposition Test |
| 76-80 | Run model defect, vendor outage and fraud pattern tabletop | Tabletop Decision Log |
| 81-85 | Complete model risk, privacy, records and compliance review | Governance Review Pack |
| 86-90 | Decide scale, restrict, redesign or retire | Go/No-Go Decision Record |
9. Evidence Pack
Minimum evidence fields:
| Field | Purpose |
|---|---|
| case_id | operational case reference |
| workflow_id | KYC, KYB, claims, disputes, servicing, complaints |
| document_id | unique document reference |
| document_version | raw and derived artifact version |
| source_channel | upload, branch scan, mail, email, API, vendor |
| received_timestamp | intake time |
| raw_file_hash | integrity check |
| document_class | type and subtype |
| classification_result | model/rule output and confidence |
| page_count | completeness signal |
| quality_result | blur, missing pages, render errors |
| processing_lineage | OCR/layout/extraction model and rule versions |
| extracted_fields | field values with source anchors |
| validation_results | rules, cross-document and system checks |
| confidence_results | field and case scores with calibration bucket |
| fraud_tamper_signals | duplicate, metadata, visual, arithmetic, prompt injection |
| human_review_records | reviewer decisions, corrections, reason codes |
| policy_decision | accept evidence, request more info, review, reject use |
| workflow_action | case update, payment, escalation, request, communication |
| ai_summary_run_id | model summary trace when used |
| records_metadata | record class, retention rule, disposition state |
| legal_hold_flag | hold status and propagation reference |
| access_events | reviewer/system/vendor access |
| customer_final_message_id | final communication or request-for-info |
| complaint_id | linked complaint if applicable |
| capa_id | corrective action when defect found |
Evidence rules:
- Store raw artifacts and derived artifacts with clear lineage, not as one mutable blob.
- Preserve source anchors for every material extracted field.
- Treat a missing source anchor for a high-impact field as a control defect.
- Separate reviewer-facing summaries from official records and decision evidence unless Records/Legal approves their role.
- Make legal hold state visible to deletion, training-data selection, vendor purge and export jobs.
- Capture final customer/counterparty communication, because disputes often turn on what was said and when.
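The rule about missing source anchors lends itself to an automated completeness check on each evidence pack. This is a minimal sketch under stated assumptions: the pack is a plain dictionary keyed by the field names from the table above, the `REQUIRED_KEYS` subset is an illustrative choice, and the `name`, `criticality` and `source_anchor` keys inside each extracted field are hypothetical.

```python
# Illustrative subset of the minimum evidence fields; a real list is a governance decision.
REQUIRED_KEYS = (
    "case_id",
    "document_id",
    "raw_file_hash",
    "processing_lineage",
    "extracted_fields",
)


def evidence_pack_defects(pack: dict) -> list:
    """Return defect codes for one evidence pack (empty list = no defect found)."""
    defects = [f"missing:{key}" for key in REQUIRED_KEYS if not pack.get(key)]
    for field in pack.get("extracted_fields") or []:
        # A high-impact field without a source anchor is treated as a control defect.
        if field.get("criticality") == "high" and not field.get("source_anchor"):
            defects.append(f"control_defect:no_source_anchor:{field['name']}")
    return defects
```

Running such a check at pack creation, instead of at complaint time, surfaces replay gaps while the underlying artifacts are still available.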
10. Workflow Playbooks
10.1 Bank Statement / Paystub Income Evidence
| Step | Control |
|---|---|
| Intake | verify page count, statement/pay period, source channel and raw hash |
| Extract | employer, employee, period, gross/net/YTD, deposits, balances, account mask |
| Validate | arithmetic, period continuity, YTD consistency, name/account match, duplicate check |
| Confidence | high-impact amount and identity fields use stricter thresholds |
| Review | route mismatches, missing pages, fake-template signals, high amount outliers |
| Workflow | create income evidence package, not final affordability conclusion by itself |
| Records | attach raw and derived artifacts to case record class and hold state |
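The arithmetic and YTD consistency checks in the Validate step can be sketched as below. The example assumes a simplified paystub where net pay equals gross pay minus total deductions, and where a prior paystub from the same employer and year is optionally available; the field names, the one-cent tolerance and the failure codes are hypothetical. Real paystubs vary in layout and may need additional rules.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.01")  # illustrative rounding tolerance


def validate_paystub(gross, deductions, net, ytd_gross, prior_ytd_gross=None) -> list:
    """Return validation failure codes for one paystub (empty list = checks passed).

    Amounts are passed as strings so that Decimal parses them exactly.
    """
    gross, deductions, net, ytd_gross = (
        Decimal(gross), Decimal(deductions), Decimal(net), Decimal(ytd_gross)
    )
    failures = []
    # Net pay should equal gross pay minus deductions.
    if abs(gross - deductions - net) > TOLERANCE:
        failures.append("net_pay_arithmetic")
    # Year-to-date gross cannot be smaller than the current period's gross.
    if ytd_gross < gross:
        failures.append("ytd_less_than_current_gross")
    # With a prior paystub, YTD should advance by exactly the current gross.
    if prior_ytd_gross is not None:
        if abs(Decimal(prior_ytd_gross) + gross - ytd_gross) > TOLERANCE:
            failures.append("ytd_continuity")
    return failures
```

A check like this is what catches a high-confidence misread of the YTD amount: the model's score says nothing about whether the numbers on the page add up.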
10.2 Claims Package
| Step | Control |
|---|---|
| Intake | split package into forms, invoices, photos, reports, correspondence |
| Extract | claimant, policy, loss date, invoice totals, provider, photo metadata |
| Validate | timeline consistency, duplicate invoice/photo, coverage period, amount totals |
| Confidence | photo and invoice authenticity signals influence triage |
| Review | adjuster sees source-linked timeline and conflicting evidence |
| Workflow | route to payout support, investigation, request-for-info or denial review per policy |
| Records | preserve package, adjuster notes, AI summary and final communication linkage |
10.3 Payment Dispute Evidence
| Step | Control |
|---|---|
| Intake | classify cardholder evidence, merchant evidence, shipping proof, screenshots |
| Extract | transaction, merchant, date, amount, reason code, delivery/tracking details |
| Validate | match transaction system, reason-code evidence checklist, SLA deadline |
| Confidence | summary cannot replace required proof element |
| Review | route weak evidence or conflicting claims to dispute specialist |
| Workflow | create source-linked representment or response package |
| Records | preserve submitted and outgoing evidence with final response |
10.4 KYC/KYB Documents
| Step | Control |
|---|---|
| Intake | identify individual, entity, license, ownership, authority and address documents |
| Extract | names, entity IDs, owners, roles, license status, address, dates |
| Validate | freshness, entity resolution, authority scope, conflicting ownership |
| Confidence | high-impact authority and ownership fields require stricter handling |
| Review | route beneficial ownership, authorization ambiguity and stale documents |
| Workflow | evidence package for compliance/ops review, not automatic compliance conclusion |
| Records | retention and hold handling per product, record type and governance interpretation |
10.5 Complaints and Operational Correspondence
| Step | Control |
|---|---|
| Intake | detect complaint indicators, product, customer, attachments, urgency |
| Extract | issue theme, harm, requested resolution, dates, referenced transactions |
| Validate | case/customer match, prior interactions, response deadlines |
| Confidence | low confidence complaint classification goes to complaint ops |
| Review | preserve customer voice and avoid summary-only handling |
| Workflow | create complaint case, route RCA, attach evidence and final response |
| Records | link documents, AI runs, agent notes, response and CAPA |
11. Checklists
11.1 Release Checklist
| Check | Passing evidence |
|---|---|
| Use case and decision boundary documented | Use Case Boundary Card |
| Document classes and unsupported formats defined | Document Class Taxonomy |
| High-impact fields identified | Evidence Field Dictionary |
| Source anchor required for material fields | extraction schema test |
| Confidence thresholds calibrated | calibration report |
| Human review queues configured | workflow test |
| Reviewer override reasons captured | review workbench test |
| Fraud/tamper checks active | threat model and test cases |
| Records metadata assigned | records integration test |
| Legal hold propagation tested | hold propagation evidence |
| Vendor retention and access controls reviewed | vendor control record |
| Privacy minimization and redaction reviewed | privacy assessment evidence |
| Complaint replay path tested | simulated complaint evidence pack |
| Monitoring dashboard live | production readiness dashboard |
11.2 Extraction QA Checklist
| Check | Passing evidence |
|---|---|
| OCR/layout preserves page and coordinates | source anchor sample |
| Table extraction handles merged cells and totals | table validation result |
| Date and currency normalization tested | locale test cases |
| Entity resolution does not overwrite raw value | raw and normalized field pair |
| Low-quality images routed | quality gate log |
| Missing pages detected | completeness rule |
| Model output schema enforced | invalid output rejection |
| Prompt injection ignored | adversarial document test |
| High-confidence wrong fields sampled | QA sampling report |
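The "model output schema enforced" check can be illustrated with a strict parser that rejects anything outside the governed field list. The sketch below uses only the standard library; the `ALLOWED_FIELDS` schema and the function name are hypothetical, and a production system would likely add value-level validation on top.

```python
import json

# Hypothetical governed schema: field name -> expected JSON type.
ALLOWED_FIELDS = {
    "employer_name": str,
    "gross_pay": str,       # kept as a string to avoid float rounding
    "pay_period_end": str,
}


def parse_model_output(raw_output: str) -> dict:
    """Parse extraction output and reject anything outside the governed schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    unknown = set(data) - set(ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"ungoverned fields: {sorted(unknown)}")
    for name, value in data.items():
        if not isinstance(value, ALLOWED_FIELDS[name]):
            raise ValueError(f"wrong type for {name}")
    return data
```

Schema enforcement does not detect prompt injection by itself, but it limits what injected text in a document can push into downstream systems, since free-form instructions and unexpected fields are rejected.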
11.3 Human Review Checklist
| Check | Passing evidence |
|---|---|
| Reviewer sees original source next to extracted field | UI test |
| Source coordinate click works | workbench test |
| Confidence and validation failures visible | reviewer screenshot/spec |
| Sensitive fraud details protected | role-based display |
| Override captures reason and corrected value | review log |
| Dual control applied where required | approval record |
| QA samples auto-pass and manual-pass cases | QA plan |
| Reviewer feedback does not train models without approval | data governance control |
11.4 Records and Legal Hold Checklist
| Check | Passing evidence |
|---|---|
| Raw document has record metadata | record metadata sample |
| OCR/layout/extraction artifacts classified | derived artifact inventory |
| AI summaries mapped to retention treatment | Records/Legal decision record |
| Legal hold propagates to all linked artifacts | propagation test |
| Vendor copies checked for hold and deletion | vendor attestation/control |
| Disposition has approval and audit | disposition log |
| Training-data selection checks hold and use restrictions | data pipeline control |
| Retrieval supports case/audit/complaint replay | search/replay test |
11.5 Workflow Integration Checklist
| Check | Passing evidence |
|---|---|
| Case system receives structured evidence payload | integration test |
| Policy reason codes captured | decision log |
| Request-for-info language approved | CX/Compliance review |
| Final customer message linked to evidence | communication id |
| SLA timers consider document ambiguity | workflow config |
| Downstream systems do not treat extraction as final decision | contract test |
| Exceptions and fallbacks are operationally staffed | queue capacity plan |
12. Metrics and KRIs
| Metric | Why it matters |
|---|---|
| Field-level accuracy by criticality | avoids hiding high-impact errors |
| High-confidence wrong field rate | detects calibration failure |
| Source-anchor completeness | measures auditability |
| Classification ambiguity rate | shows routing risk |
| Missing page detection rate | protects completeness |
| Human review overturn rate | reveals extraction/control quality |
| QA defect rate after auto-pass | estimates residual risk |
| Straight-through processing rate by document class | productivity with risk context |
| Review queue SLA | operational capacity and customer impact |
| Downstream rework rate | quality of evidence entering workflow |
| Duplicate/tamper signal yield | fraud control effectiveness |
| False positive fraud review rate | customer friction and ops burden |
| Records metadata completeness | records readiness |
| Legal hold propagation success | preservation control |
| Vendor deletion/retention exceptions | third-party risk |
| Complaint document trace completeness | complaint/audit readiness |
| AI unsupported conclusion defects | model governance |
| Accessibility defect rate | inclusive operations |
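Three of the metrics above (high-confidence wrong field rate, human review overturn rate, source-anchor completeness) can be computed from per-field QA records. This sketch assumes a hypothetical record shape with `confidence`, `correct`, `reviewed`, `overturned` and `has_anchor` keys and an illustrative high-confidence cut-off of 0.95.

```python
def evidence_quality_metrics(records, high_confidence=0.95) -> dict:
    """Compute selected evidence quality rates from per-field QA records.

    A rate is None when its denominator is zero.
    """
    def rate(numerator, denominator):
        return numerator / denominator if denominator else None

    high_conf = [r for r in records if r["confidence"] >= high_confidence]
    reviewed = [r for r in records if r["reviewed"]]
    return {
        "high_confidence_wrong_rate": rate(
            sum(1 for r in high_conf if not r["correct"]), len(high_conf)
        ),
        "review_overturn_rate": rate(
            sum(1 for r in reviewed if r["overturned"]), len(reviewed)
        ),
        "source_anchor_completeness": rate(
            sum(1 for r in records if r["has_anchor"]), len(records)
        ),
    }
```

Segmenting the same calculation by field criticality and document class supports the first metric in the table, since an aggregate rate can hide errors concentrated in high-impact fields.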
Balanced scorecard:
Productivity: fewer manual minutes per case.
Evidence quality: critical fields accurate, anchored and validated.
Risk: fraud, records, privacy and legal hold controls operate.
Customer outcome: fewer unnecessary document requests and complaint defects.
Governance: every material document-driven action is replayable.
13. Anti-Patterns
| Anti-pattern | Why it fails | Better pattern |
|---|---|---|
| “OCR all docs, then ask LLM” | loses layout/source control and invites hallucination | schema-constrained, source-linked extraction |
| One confidence threshold for all fields | treats routing label and income amount the same | threshold by field criticality and workflow risk |
| Summary as evidence | unsupported omissions or invented conclusions | source-linked summary plus structured evidence |
| Auto-approve based on high confidence | ignores validation, fraud, records and policy | evidence policy gate |
| Human review with no source context | reviewer cannot verify efficiently | source-first workbench |
| Reviewer override as free text only | hard to monitor and improve | reason codes + structured corrections |
| No raw artifact hash | cannot prove integrity | immutable raw artifact and lineage |
| No derived-artifact records mapping | OCR/summary/review notes become governance blind spot | artifact inventory and records metadata |
| Legal hold handled manually outside AI pipeline | deletion/training/vendor purge may miss artifacts | hold-aware data and workflow graph |
| Vendor black box extraction | weak evidence and model governance | version, lineage, eval and audit obligations |
| Fraud model makes final decision | false positives and explainability risk | fraud signal plus policy/human review |
| Automation rate as north-star metric | hides customer harm and evidence defects | balanced scorecard |
14. Tabletop Scenarios
Scenario 1: High-Confidence Paystub Error
The model extracts gross pay correctly but misreads pay period and YTD amount.
Confidence is high. The income package is about to move forward.
Expected decisions: field criticality threshold, arithmetic validation, reviewer route, downstream case hold, model defect capture.
Scenario 2: Legal Hold Miss on Derived Artifacts
A legal hold is applied to a servicing case. Raw PDFs are preserved,
but OCR JSON, AI summaries and reviewer notes are scheduled for deletion.
Expected decisions: hold propagation graph, disposition stop, vendor copy check, Records/Legal escalation, CAPA.
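A hold propagation graph, as called for in Scenario 2, can be sketched as a traversal over artifact lineage. The structure below is an assumption: `lineage` maps each artifact to the artifacts derived from it, and all identifiers are illustrative. Which artifacts a hold actually covers remains a Legal / Records decision; the sketch only shows how a pipeline could avoid missing derived copies.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def propagate_hold(lineage: Dict[str, List[str]],
                   held_roots: Iterable[str]) -> Set[str]:
    """Return the held artifacts plus everything derived from them."""
    held = set(held_roots)
    queue = deque(held)
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in held:
                held.add(child)
                queue.append(child)
    return held


def blocked_deletions(scheduled: Iterable[str], held: Set[str]) -> List[str]:
    """Artifacts scheduled for disposition that must be stopped."""
    return sorted(a for a in scheduled if a in held)


# Hypothetical lineage for one servicing case.
lineage = {
    "raw.pdf": ["ocr.json", "reviewer_note_1"],
    "ocr.json": ["summary.txt"],
}
held = propagate_hold(lineage, ["raw.pdf"])
stops = blocked_deletions(["ocr.json", "summary.txt", "other_case.json"], held)
```

Vendor-held copies would need to appear as nodes in the same graph for the vendor copy check to be covered.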
Scenario 3: Dispute Summary Omits Merchant Evidence
The AI summary says the customer provided strong evidence,
but the merchant delivery proof contradicts the customer statement.
Expected decisions: source-linked summary defect, dispute specialist review, reason-code evidence checklist, eval update.
Scenario 4: Fake Bank Statement Template
A bank statement matches a known visual template but metadata and transaction pattern are inconsistent.
The extraction fields look clean.
Expected decisions: fraud/tamper signal route, no customer-facing rule leakage, manual review, duplicate/template detection improvement.
Scenario 5: KYB Authority Ambiguity
A business registration document and board resolution are uploaded.
The model extracts an officer name but cannot establish authority to open the product.
Expected decisions: authority field specialist review, KYC/KYB interpretation boundary, request-for-info wording, evidence pack.
Scenario 6: Complaint Replay Failure
A customer complains that an AI rejected their claim documents.
The team can find the raw PDF but not the model run, reviewer override or final message.
Expected decisions: evidence pack gap, incident classification, complaint remediation, logging and workflow contract update.
15. Portfolio Deliverables
| Deliverable | What it demonstrates |
|---|---|
| Executive one-pager | You can frame document AI as an evidence operating system, not just OCR automation |
| Use case boundary card | You can control scope and decision impact |
| Document class taxonomy | You can connect documents, processes, risk and records governance |
| Evidence field dictionary | You can define source-linked, criticality-aware extraction |
| Confidence and review policy | You can design calibrated automation and human oversight |
| Records/legal hold mapping | You can bring raw and derived artifacts under governance |
| Fraud/tamper threat model | You can cover fake documents, duplicates, prompt injection and insider risks |
| Reviewer workbench spec | You can design human review as an effective control |
| Workflow contract | You can get AI evidence into case systems correctly |
| Evidence bundle schema | You can support complaint, audit and regulatory replay |
| Metrics dashboard | You can balance efficiency, quality, risk, customer and governance |
Portfolio storyline:
I designed an AI document intelligence architecture for retail financial operations.
It converts unstructured documents into source-linked evidence,
uses calibrated confidence and validation to route work,
integrates human review, fraud/tamper checks, records retention and legal hold,
and preserves a replayable evidence trail from intake to final workflow action.
16. Interview Answers
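Q1: How do you explain the boundaries of document intelligence to executives?
30 seconds:
Document intelligence is not OCR automation; it is an evidence supply chain. It connects document capture, provenance, layout, field extraction, confidence, validation, human review, fraud checks, workflow action, records retention and legal hold into one auditable system. Only low-risk, high-quality, verifiable fields can be automated; high-impact or conflicting evidence must be reviewed.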
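Q2: Which fields qualify for straight-through processing?
30 seconds:
Five conditions must hold: the field is low or medium risk, the source anchor is complete, the confidence is calibrated, business validation passes, and there is no fraud/tamper/legal/authority trigger. High-impact fields such as income, claim amount, beneficial ownership, POA and dispute reason usually cannot move forward on model score alone.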
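The five conditions in the Q2 answer can be expressed as a small policy gate. The sketch below is illustrative only: `FieldEvidence`, `stp_decision`, the reason codes and the threshold values in `STP_THRESHOLDS` are all hypothetical, and real thresholds would come from calibration evidence and the risk owner.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical minimum calibrated confidence per criticality tier.
# "high" is deliberately absent: high-impact fields never pass on score alone.
STP_THRESHOLDS = {"low": 0.90, "medium": 0.97}


@dataclass
class FieldEvidence:
    name: str
    criticality: str                        # "low", "medium" or "high"
    has_source_anchor: bool
    calibrated_confidence: Optional[float]  # None if not calibrated
    validations_passed: bool
    triggers: List[str] = field(default_factory=list)  # fraud/tamper/legal/authority


def stp_decision(ev: FieldEvidence) -> Tuple[str, List[str]]:
    """Return ("auto_populate" | "human_review", reason codes)."""
    reasons = []
    threshold = STP_THRESHOLDS.get(ev.criticality)
    if threshold is None:
        reasons.append("CRITICALITY_NOT_ELIGIBLE")
    if not ev.has_source_anchor:
        reasons.append("MISSING_SOURCE_ANCHOR")
    if ev.calibrated_confidence is None:
        reasons.append("CONFIDENCE_NOT_CALIBRATED")
    elif threshold is not None and ev.calibrated_confidence < threshold:
        reasons.append("CONFIDENCE_BELOW_THRESHOLD")
    if not ev.validations_passed:
        reasons.append("VALIDATION_FAILED")
    if ev.triggers:
        reasons.append("RISK_TRIGGER_PRESENT")
    return ("human_review" if reasons else "auto_populate", reasons)


routing_label = FieldEvidence("routing_label", "low", True, 0.96, True)
income = FieldEvidence("income_amount", "high", True, 0.99, True)
label_result = stp_decision(routing_label)
income_result = stp_decision(income)
```

Returning every failed condition, rather than stopping at the first, gives the reviewer and the monitoring dashboard a complete picture of why a field was routed.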
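Q3: How do you design human-in-the-loop so that it is not a formality?
30 seconds:
The reviewer must see the source document and field location, confidence, validation failures, conflicts, history and risk signals, and must record accept, correct or escalate with a structured reason code. Add QA sampling, overturn monitoring, dual control and feedback governance. Otherwise human review easily becomes a rubber stamp.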
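Q4: Why do records retention and legal hold affect AI document projects?
30 seconds:
Because the system does not store only the original PDF; it also generates OCR text, layout JSON, extracted fields, AI summaries, review notes, exports and final decisions. Which of these are records, how long they are retained, and whether a legal hold covers them depends on product, record type, jurisdiction, retention schedule, hold status and Legal/Compliance/Records interpretation. The architecture must make these artifacts classifiable, retainable, freezable and retrievable.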
Q5: How do you measure whether document AI is actually usable?
30 seconds:
Look at field-level accuracy, high-confidence wrong rate, source-anchor completeness, review overturn, QA defects, downstream rework, fraud/tamper yield, records metadata completeness, legal hold propagation, complaint replay success and customer impact. Looking only at OCR accuracy or automation rate misleads decisions.
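Two of the metrics in the Q5 answer can be computed from a reviewed sample as sketched below. The definitions are assumptions made for illustration: "high-confidence wrong rate" is taken here as the share of fields at or above a confidence threshold whose extracted value differs from the reviewed value, and the record keys (`confidence`, `predicted`, `truth`) are hypothetical.

```python
from typing import Dict, List, Optional


def field_accuracy(samples: List[Dict]) -> Optional[float]:
    """Share of sampled fields whose extracted value matches the reviewed value."""
    if not samples:
        return None
    correct = sum(1 for s in samples if s["predicted"] == s["truth"])
    return correct / len(samples)


def high_confidence_wrong_rate(samples: List[Dict],
                               threshold: float = 0.95) -> Optional[float]:
    """Among fields at or above the threshold, the share that are wrong."""
    high = [s for s in samples if s["confidence"] >= threshold]
    if not high:
        return None  # no high-confidence fields in this sample
    wrong = sum(1 for s in high if s["predicted"] != s["truth"])
    return wrong / len(high)


# Illustrative reviewed sample, not real data.
samples = [
    {"confidence": 0.99, "predicted": "2500.00", "truth": "2500.00"},
    {"confidence": 0.98, "predicted": "03/31", "truth": "03/17"},
    {"confidence": 0.96, "predicted": "ACME", "truth": "ACME"},
    {"confidence": 0.60, "predicted": "X", "truth": "Y"},
]
accuracy = field_accuracy(samples)
hc_wrong = high_confidence_wrong_rate(samples)
```

The second metric matters because high-confidence errors are the ones most likely to bypass review, so it is tracked separately from overall accuracy.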
17. Practical Templates
17.1 Use Case Boundary Card
Use case:
Workflow:
Product:
Customer / business segment:
Jurisdiction / policy scope:
Channels:
Document classes:
Unsupported documents:
AI role:
Allowed workflow actions:
Prohibited workflow actions:
High-impact fields:
Fraud risks:
Records owner:
Legal hold considerations:
Human review triggers:
Evidence replay requirement:
Product owner:
Risk owner:
17.2 Document Class Card
Document class:
Subtypes:
Typical source channels:
Expected pages / sections:
Required fields:
Optional fields:
High-impact fields:
Known fraud/tamper patterns:
Quality gates:
Classification ambiguity handling:
Retention metadata:
Legal hold propagation:
Workflow route:
Specialist review triggers:
17.3 Evidence Acceptance Rule
Rule ID:
Workflow:
Document class:
Field:
Criticality:
Required source anchor:
Minimum confidence:
Required validations:
Cross-document checks:
Fraud/tamper exclusions:
Human review required when:
Auto-populate allowed:
Policy acceptance state:
Customer-facing reason:
Evidence fields:
Owner:
Review cadence:
17.4 Reviewer Decision Record
Review task ID:
Case ID:
Document ID:
Field / evidence item:
Model value:
Model confidence:
Source anchor:
Validation result:
Fraud/tamper signal:
Reviewer decision:
Corrected value:
Reason code:
Rationale:
Second approval:
Workflow action:
Customer communication:
Timestamp:
QA sample flag:
17.5 Evidence Replay Script
Case:
Question to answer:
Raw documents:
Derived artifacts:
Extraction model versions:
Field source anchors:
Validation results:
Fraud/tamper results:
Human reviews:
Policy decisions:
Workflow actions:
Customer/counterparty messages:
Records metadata:
Legal hold status:
Complaint / audit / CAPA links:
Replay conclusion:
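The replay script in 17.5 can double as a completeness check on an evidence bundle. The sketch below assumes a bundle is a dictionary keyed by the template items; the key names and the rule that a missing or empty entry counts as a gap are illustrative assumptions, not a prescribed schema.

```python
from typing import Dict, List

# Keys mirror the 17.5 template items; the names are hypothetical.
REQUIRED_ITEMS = [
    "raw_documents",
    "derived_artifacts",
    "extraction_model_versions",
    "field_source_anchors",
    "validation_results",
    "fraud_tamper_results",
    "human_reviews",
    "policy_decisions",
    "workflow_actions",
    "messages",
    "records_metadata",
    "legal_hold_status",
]


def replay_gaps(bundle: Dict) -> List[str]:
    """Return the template items that are missing or empty in the bundle."""
    empty_values = (None, "", [], {})
    return [key for key in REQUIRED_ITEMS if bundle.get(key) in empty_values]


# Similar to tabletop Scenario 6: only the raw PDF can be found.
partial_bundle = {"raw_documents": ["claim_package.pdf"]}
gaps = replay_gaps(partial_bundle)
```

Under this assumption a check that ran and found nothing should still be stored as an explicit result (for example a status string), so that "no signal" is distinguishable from "not recorded".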
18. Final Operating Principle
The maturity of this playbook can be tested with one question:
When a bank statement, paystub, claim package, dispute file, KYB document
or servicing letter changes a customer or business outcome,
can the institution prove exactly what document was used,
what fields were extracted,
where they came from,
how confidence and validation were handled,
who reviewed exceptions,
how fraud and records controls applied,
and why the workflow action was appropriate at that time?
If the answer is unclear, the gap is not a stronger OCR vendor. The problem is that document intelligence, operations workflow, records governance, fraud controls, model risk and evidence quality have not yet become one operating model.