AI Document Intelligence / Unstructured Data / Evidence Quality Playbook

AI Document Intelligence / Unstructured Data / Evidence Quality Architecture Playbook

Positioning: For CBAP+ Senior BAs, senior AI PMs, Product Architects, Enterprise Architects, Operations Architects, and leads in Records / Information Governance, Fraud Risk, Model Risk, KYC/KYB, Claims, Disputes, and Loan and Insurance Servicing. The aim is to take document intelligence from “automatically reading documents” to an evidence-grade, workflow-ready, records-aware, auditable operating system.

Scope: bank statements, paystubs, claims packages, payment disputes, KYC/KYB documents, insurance and loan servicing documents, complaints, operational correspondence, mailroom automation, agent assist and workflow automation.

Core outputs: executive framing, taxonomy, decision gates, target architecture, required artifacts, RACI/operating model, implementation roadmap, evidence pack, release checklists, metrics, anti-patterns, tabletop scenarios and portfolio deliverables.

Core thesis:

Financial document AI succeeds when the institution can rely on extracted evidence, not when the model can read more PDFs.

0. Disclaimer

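This document is material for learning, portfolio building, architecture training and internal governance discussion. It does not constitute legal advice, a compliance conclusion, a records retention conclusion, e-discovery advice, a KYC/KYB sufficiency determination, a lending or insurance underwriting conclusion, a consumer dispute handling opinion, a fraud handling instruction, a model validation report or a vendor recommendation.

Formal projects must be assessed jointly by Legal, Compliance, Privacy, Records Management, Information Governance, Model Risk, Fraud Risk, Financial Crime, Operations, Product, Architecture, Information Security, Data Governance, Vendor Management, Internal Audit and the relevant business owners. The specific applicability of records, evidence, legal hold, customer notification, KYC/KYB, credit, insurance, complaint, dispute, claims, cross-border data and e-discovery requirements depends on product, record type, jurisdiction, retention schedule, legal hold status, customer segment, channel, policy, contract and Legal / Compliance / Records interpretation.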

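Boundary principles:

An OCR result is extracted text, not a verified fact.
An LLM summary is a productivity aid, not a substitute for the original evidence.
A confidence score is a routing/control input, not a business conclusion.
Fraud/tamper model output is a risk signal, not a fraud determination.
The specific applicability of records retention, legal hold, e-discovery, KYC/KYB, lending, insurance and dispute obligations must be interpreted by the corresponding governance functions.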

1. Executive Framing

Common executive goals:

Reduce manual document review.
Accelerate onboarding, claims, disputes and servicing.
Use AI to summarize and extract unstructured documents.

The real project goal should be restated as:

Create an evidence-grade document intelligence capability
that automates low-risk extraction,
routes ambiguity to skilled review,
prevents unsupported decisions,
preserves records and legal hold controls,
and can replay every material document-driven action.

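Without an evidence architecture, document AI introduces hidden risks:

A bank statement is misread by OCR, and income calculations go wrong at batch scale.
A forged paystub template is extracted with high confidence and enters the underwriting workflow.
A claim photo is reused, but the fraud signal never reaches the adjuster.
A dispute evidence summary omits key merchant proof, leading to improper handling.
KYB documents leave the authorized representative unclear, but AI prefill pushes account opening forward anyway.
A legal hold has been triggered, but OCR JSON, summaries, review notes or vendor copies are purged.
When a customer complains, the team cannot find what the model saw, who reviewed what, or why the action was taken.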

Executive one-liner:

Document AI is a controlled evidence supply chain for operations, risk and records.

1.1 Steering Committee Questions

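Which document classes and fields can be extracted automatically, and which can only assist manual work?
Which fields affect customer rights, funds, identity, insurance, credit, disputes or compliance?
How can we prove which document, page, region and model version a field came from?
How are confidence thresholds calibrated, and how do reviewer overturns feed back into them?
How are records retention, legal hold, vendor copies and derived AI artifacts governed?
When a complaint, audit, regulatory inquiry or litigation hold arises, can the case be fully replayed?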

2. Source Anchors

Anchor	Official link	How this playbook uses it
NIST AI Risk Management Framework	https://www.nist.gov/itl/ai-risk-management-framework	Organize document AI risk, evals, monitoring, human oversight, incident and evidence controls around Govern / Map / Measure / Manage
NIST Privacy Framework	https://www.nist.gov/privacy-framework	Use privacy risk management, data minimization, purpose limitation, processing controls and monitoring to design document data boundaries
NARA Records Management	https://www.archives.gov/records-mgmt	Use records lifecycle, disposition, records program and accountability concepts to design the records governance interface
NARA Electronic Records Management	https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html	Use electronic records metadata, format and transfer/readiness guidance to frame architecture discussions on keeping electronic records preservable, retrievable and migratable
CFPB Consumer Complaint Database	https://www.consumerfinance.gov/data-research/consumer-complaints/	Use the consumer complaint operations perspective to design complaint linkage, evidence replay, and the root cause and remediation learning loop
FFIEC Authentication and Access to Financial Institution Services and Systems	https://www.ffiec.gov/press/pr081121.htm	Use financial institution access/authentication and layered security thinking to design reviewer access, document workflow actions and privileged operation controls
ISO/IEC 42001 overview	https://www.iso.org/standard/42001	Use AI management system, roles, operation, performance evaluation, internal audit and continual improvement concepts to build the operating model

Source-to-control pattern:

source anchor -> control objective -> product decision
  -> workflow requirement -> evidence artifact -> owner -> metric

3. Taxonomy

3.1 Document Use Case Classes

Class	Examples	Primary decision impact
Income and affordability	bank statements, paystubs, payroll summaries, benefit letters	income evidence, servicing treatment, affordability package
Identity and entity	IDs, business registration, ownership docs, licenses, utility bills	KYC/KYB evidence, authority, onboarding routing
Claims and loss	claim forms, photos, invoices, police reports, medical bills	claim triage, payout support, fraud review
Payment disputes	receipts, shipping proof, merchant correspondence, customer statements	dispute reason code, evidence package, SLA
Account authority	POA, court order, death certificate, consent forms, corporate resolution	account access, maintenance, servicing authority
Complaints and correspondence	letters, emails, transcripts, attachments	complaint classification, response evidence, RCA
Internal operations	branch scans, mailroom forms, agent notes, back office forms	task routing, QA, operational control

3.2 Evidence Classes

Evidence class	Meaning	Governance need
Raw artifact	Original file or image	hash, source, access, retention
Rendered artifact	System-generated page image / normalized PDF	renderer version, lineage
OCR/layout artifact	text, table, reading order, bounding boxes	source anchoring, versioning
Extracted field	schema-bound field and value	confidence, validation, review
Derived fact	calculations or reconciled facts	formula, input trace, policy use
AI summary	source-linked narrative for reviewer	prohibited conclusions, citation
Human decision	reviewer correction, acceptance, escalation	reason code, role, timestamp
Workflow action	case update, payment, request-for-info, escalation	policy decision id, evidence link
Records metadata	record class, retention rule, legal hold flag	Records/Legal governance

3.3 Criticality Levels

Level	Examples	Default treatment
Low	document title, non-decision routing label	auto-populate with sampling
Medium	address, product type, non-material date	validation and review on conflict
High	income amount, claim amount, policy number, account holder, dispute reason	field-level threshold, source link, validations, review triggers
Restricted / specialist	POA, court order, beneficial ownership, medical record, fraud signal, legal hold	specialized queue, access controls, policy review
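
The default treatments above can be sketched as a small routing function. This is a minimal illustration, not a prescribed policy: the `Criticality` enum, the `AUTO_THRESHOLDS` values and the treatment labels are all hypothetical, and real thresholds would come from calibration evidence and governance review.

```python
from enum import Enum


class Criticality(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    RESTRICTED = "restricted"


# Hypothetical auto-populate thresholds; real values come from calibration evidence.
AUTO_THRESHOLDS = {
    Criticality.LOW: 0.80,
    Criticality.MEDIUM: 0.90,
    Criticality.HIGH: 0.98,
}


def default_treatment(criticality: Criticality, confidence: float,
                      has_conflict: bool = False) -> str:
    """Map a field's criticality and calibrated confidence to a treatment label."""
    if criticality is Criticality.RESTRICTED:
        # Restricted fields never auto-populate, whatever the score.
        return "specialist_queue"
    if has_conflict or confidence < AUTO_THRESHOLDS[criticality]:
        return "human_review"
    if criticality is Criticality.LOW:
        return "auto_populate_with_sampling"
    # Medium and high fields still pass through validations downstream.
    return "auto_populate_after_validation"
```

Keeping restricted fields out of the threshold table entirely means a configuration change cannot accidentally enable automation for them.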

4. Target Operating Architecture

channel intake
  -> document provenance and storage
  -> normalization / rendering / safety scan
  -> classification and package splitting
  -> OCR / layout / table extraction
  -> multimodal field and entity extraction
  -> normalization and validation
  -> confidence calibration
  -> fraud and tamper checks
  -> evidence policy gates
  -> human review workbench
  -> workflow orchestration
  -> records retention and legal hold integration
  -> evidence ledger, complaint linkage, monitoring and governance

Architecture capabilities:

Capability	What it must do
Document provenance	Assign document id, hash, source channel, received time, custody events
Document classification	Identify type/subtype, package boundaries, language, quality, ambiguity
Layout understanding	Preserve page, coordinates, reading order, tables, checkboxes, signatures
Schema-constrained extraction	Extract only governed fields with allowed values and source anchors
Entity normalization	Normalize names, dates, addresses, monetary values, IDs, account masks
Evidence validation	Run arithmetic, chronology, cross-document and system-of-record checks
Confidence and triage	Route by calibrated confidence, field criticality and customer impact
Fraud/tamper detection	Detect altered files, duplicates, fake templates, metadata anomalies, prompt injection
Human review	Source-first review, structured overrides, QA sampling and dual control
Workflow integration	Feed KYC, claims, disputes, servicing, complaints and ops queues
Records/hold integration	Apply record class, retention metadata, legal hold propagation and disposition controls
Governance monitoring	Track evals, model drift, review outcomes, complaints, incidents and CAPA
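
As an illustration of the document provenance capability, the sketch below registers a raw file with an id, a SHA-256 hash, the source channel, the received time and a first custody event. The `ProvenanceRecord` shape and the function names are assumptions made for this example; only `hashlib`, `uuid` and `datetime` from the Python standard library are used.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    document_id: str
    raw_file_hash: str
    source_channel: str
    received_timestamp: str
    custody_events: list = field(default_factory=list)


def register_document(raw_bytes: bytes, source_channel: str) -> ProvenanceRecord:
    """Create a provenance record at intake (hypothetical record shape)."""
    now = datetime.now(timezone.utc).isoformat()
    record = ProvenanceRecord(
        document_id=str(uuid.uuid4()),
        raw_file_hash=hashlib.sha256(raw_bytes).hexdigest(),
        source_channel=source_channel,
        received_timestamp=now,
    )
    record.custody_events.append({"event": "received", "at": now, "actor": "intake"})
    return record


def verify_integrity(record: ProvenanceRecord, raw_bytes: bytes) -> bool:
    """Re-hash the stored bytes and compare with the hash captured at intake."""
    return hashlib.sha256(raw_bytes).hexdigest() == record.raw_file_hash
```

Hashing the raw bytes before any rendering or normalization is what lets later stages show that derived artifacts trace back to an unmodified original.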

5. Decision Gates

Gate 0: Use Case Boundary

Question	Pass condition
Which journey is in scope?	one workflow named: KYC, KYB, claims, disputes, servicing, complaints or operations
Which documents are accepted?	document class/subtype list and unsupported formats
What decisions may use extracted evidence?	allowed workflow actions and prohibited actions defined
Is AI summarizing, extracting, classifying or recommending?	AI role and decision boundary documented
Does the use case touch records, legal hold, privacy, KYC/KYB, lending, insurance or dispute obligations?	governance functions identified

Gate 1: Evidence Schema

Question	Pass condition
What fields are required?	field dictionary with definitions, data types, source requirements
Which fields are high impact?	criticality and customer harm assessment
What source anchor is required?	page/coordinate/table cell/paragraph anchor rule
What validations are required?	arithmetic, chronology, cross-document, system match
What summaries are allowed?	source-linked summary rules and prohibited conclusions

Gate 2: Confidence and Review Policy

Question	Pass condition
Are confidence scores calibrated?	calibration evidence by field and document class
Are thresholds field-specific?	threshold matrix by criticality and journey risk
What routes to human review?	review triggers and queue ownership
Is QA sampling defined?	sample rules for auto-processed and reviewed cases
Are reviewer overrides captured?	structured reason codes and source-linked corrections
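
One way to produce calibration evidence for Gate 2 is to bin labelled samples by confidence and compare mean confidence with observed accuracy in each bin. The sketch below assumes samples arrive as `(confidence, is_correct)` pairs for a single field and document class; the bin count and function names are illustrative choices, not part of the playbook.

```python
def calibration_table(samples, n_bins=5):
    """Group (confidence, is_correct) pairs into equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in samples:
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, correct))
    rows = []
    for i, bucket in enumerate(bins):
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        rows.append({"bin": i, "n": len(bucket), "mean_confidence": mean_conf,
                     "accuracy": accuracy, "gap": abs(mean_conf - accuracy)})
    return rows


def expected_calibration_error(samples, n_bins=5):
    """Sample-weighted average gap between confidence and accuracy."""
    if not samples:
        return 0.0
    total = len(samples)
    return sum(r["n"] / total * r["gap"] for r in calibration_table(samples, n_bins))
```

A large gap in the top bin is the pattern behind the "high-confidence wrong field" metric: the model is most certain exactly where it is most often wrong.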

Gate 3: Fraud, Security and Access

Question	Pass condition
Can the system detect tamper and duplicate patterns?	metadata, visual, duplicate and validation checks
Are prompt injection and malicious files controlled?	isolation, safety scan, schema validation
Are reviewer and vendor access controlled?	role-based access, logging, least privilege
Are privileged workflow actions protected?	step-up/dual control where appropriate
Are fraud signals routed without leaking sensitive rules?	reviewer-only signal display and customer-safe language

Gate 4: Records, Legal Hold and Evidence Replay

Question	Pass condition
Are raw and derived artifacts classified?	record class and retention metadata assigned
Does legal hold propagate?	raw, OCR, extraction, summaries, review notes, exports and vendor copies checked
Can disposition be audited?	disposition workflow and exception log
Can a complaint or audit replay the case?	evidence pack with document, model, review, workflow and final message
Are vendor obligations tracked?	retention, deletion, access, audit and exit evidence
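
Legal hold propagation can be tested as a graph traversal over artifact lineage. The sketch below assumes lineage is available as a mapping from each artifact id to the ids derived from it; the artifact names and the shape of the deletion schedule are hypothetical.

```python
from collections import deque


def propagate_hold(lineage, root_ids):
    """Return every artifact reachable from the held roots via derivation links."""
    held = set()
    queue = deque(root_ids)
    while queue:
        artifact = queue.popleft()
        if artifact in held:
            continue
        held.add(artifact)
        queue.extend(lineage.get(artifact, []))
    return held


def blocked_deletions(scheduled_for_deletion, held):
    """Artifacts that a purge job must skip because a hold covers them."""
    return sorted(set(scheduled_for_deletion) & held)


# Example lineage: one raw PDF with derived and vendor-side artifacts.
lineage = {
    "raw_pdf": ["ocr_json", "vendor_copy"],
    "ocr_json": ["extracted_fields", "ai_summary"],
    "extracted_fields": ["review_notes"],
}
held = propagate_hold(lineage, ["raw_pdf"])
print(blocked_deletions(["ocr_json", "ai_summary", "unrelated_doc"], held))
```

The same `held` set can be consulted by training-data selection, vendor purge and export jobs, so that hold state is checked in one place rather than reimplemented per pipeline.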

Gate 5: Production Readiness

Question	Pass condition
Are eval sets representative?	document quality, channel, language, layout, fraud pattern, customer segment covered
Are monitoring dashboards live?	extraction, confidence, review, records, fraud, complaints and incidents
Are operations trained?	reviewer playbook and QA plan completed
Is incident response ready?	model defect, vendor outage, legal hold miss, fraud pattern, complaint spike playbooks
Is go/no-go tied to evidence quality?	release decision uses balanced scorecard, not automation rate alone

6. Required Artifacts

Artifact	What it proves
Use Case Boundary Card	Journey, documents, AI role, business actions, risk and governance owners are explicit
Document Class Taxonomy	Document classes, subtypes, unsupported documents and routing are defined
Evidence Field Dictionary	Field definitions, source anchors, data types, criticality and allowed uses are defined
Extraction and Validation Spec	OCR/layout/extraction/normalization/validation controls are specified
Confidence and Review Policy	When to automate, when to route to human review and when to apply dual control
Fraud/Tamper Threat Model	Coverage of altered PDF, fake template, duplicate, deepfake, insider and prompt injection threats
Records and Legal Hold Mapping	How raw and derived artifacts are classified, retained, held and disposed
Workflow Contract	How the evidence payload enters case/task/decision systems
Human Review Workbench Spec	Reviewers can see source, confidence, conflicts, reason codes and history
AI Model and Prompt Inventory	Models, rulesets, prompts, vendors, versions and allowed uses are inventoried
Eval and QA Scenario Suite	Testing covers document class, field, language, quality, fraud and process outcomes
Evidence Bundle Schema	Cases can be replayed for complaints, audits and regulatory inquiries
RACI and Governance Cadence	Responsibility boundaries across Product, Ops, Records, Legal, Model Risk, Fraud and Architecture

6.1 Evidence Field Dictionary Pattern

Field	Example
field_name	`statement_period_end_date`
document_class	bank statement
source_anchor	page + coordinate + OCR text
data_type	date
criticality	high for income verification, medium for routing
normalization	ISO date with locale parsing
validations	period continuity, not future date, matches page header
review_triggers	low confidence, missing page, conflict with upload metadata
allowed_uses	statement completeness, income package preparation
prohibited_uses	final lending decision by itself
retention_link	derived field linked to statement artifact
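
A dictionary entry like the one above can be carried in code so that allowed and prohibited uses are checked mechanically. The sketch below is one possible encoding: `FieldSpec`, `use_permitted`, `normalize_date` and `not_future` are hypothetical names, and the two date formats merely stand in for "locale parsing".

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class FieldSpec:
    field_name: str
    document_class: str
    data_type: str
    allowed_uses: tuple
    prohibited_uses: tuple


STATEMENT_PERIOD_END = FieldSpec(
    field_name="statement_period_end_date",
    document_class="bank statement",
    data_type="date",
    allowed_uses=("statement completeness", "income package preparation"),
    prohibited_uses=("final lending decision by itself",),
)


def use_permitted(spec: FieldSpec, use: str) -> bool:
    """A use must be explicitly allowed and not explicitly prohibited."""
    return use in spec.allowed_uses and use not in spec.prohibited_uses


def normalize_date(raw: str, day_first: bool) -> date:
    """Parse a slash-separated date under an explicit locale assumption."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(raw, fmt).date()


def not_future(value: date, today: date) -> bool:
    """Validation rule: a statement period cannot end after today."""
    return value <= today
```

Passing `day_first` explicitly, rather than guessing, keeps an ambiguous value such as `03/04/2024` from being normalized silently; when the locale is unknown, that ambiguity is itself a review trigger.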

6.2 Workflow Contract Pattern

Workflow:
Trigger:
Accepted document classes:
Required fields:
Optional fields:
Excluded data:
Confidence thresholds:
Validation rules:
Fraud/tamper routes:
Human review triggers:
Records metadata:
Legal hold behavior:
Case update payload:
Customer communication constraints:
Monitoring metrics:
Owner:

7. RACI / Operating Model

Activity	Accountable	Responsible	Consulted	Informed
Use case prioritization	Product Owner	AI PM / Senior BA	Operations, Risk, CX	Steering Committee
Document taxonomy	Operations Owner	BA / Process Architect	Records, Legal, Compliance	Product
Evidence field dictionary	Product / Ops	BA / Data Product	Compliance, Model Risk, Fraud	Review teams
Extraction architecture	Enterprise Architecture	Engineering / AI Platform	Security, Data Governance, Vendor Mgmt	Operations
Confidence policy	Business Risk Owner	AI PM / Model Risk / Ops QA	Fraud, Compliance, Product	Audit
Human review design	Operations Owner	Ops Lead / UX / BA	Compliance, Accessibility, Fraud	Product
Fraud/tamper model	Fraud Risk	Fraud Analytics / Security	Product, Model Risk, Legal	Ops
Records mapping	Records Management	Information Governance / Data	Legal, Compliance, Architecture	Product
Legal hold propagation	Legal / Records	Platform / Case Systems	Compliance, Vendor Mgmt	Audit
Privacy controls	Privacy / Data Governance	Product / Engineering	Legal, Security, CX	Ops
Model inventory and eval	Model Risk / AI Governance	AI Platform / AI PM	Product, Compliance, Ops	Audit
Workflow integration	Operations / Product	Engineering / Case Platform	Architecture, Records, Fraud	CX
Complaint linkage	Complaint Ops	Case Management / Data	Compliance, Product, Model Risk	Audit
Independent assurance	Internal Audit	Audit Team	Risk, Legal, Technology	Board Committee

Governance cadence:

Cadence	Forum	Output
Weekly	Pilot operations review	review backlog, extraction defects, queue SLA
Weekly	Fraud/tamper review	emerging patterns, false positives, confirmed fraud yield
Monthly	Evidence quality council	field accuracy, calibration, overturns, downstream rework
Monthly	Records and hold review	metadata completeness, hold propagation tests, vendor attestations
Monthly	AI governance review	eval results, prompt/ruleset changes, incidents
Quarterly	Product/risk steering committee	scale, restrict, redesign, retire decisions
Quarterly	Complaint learning loop	complaint themes, RCA, CAPA and policy changes
Semiannual	Tabletop exercise	legal hold miss, vendor outage, model defect, fraud pattern

8. Implementation Roadmap

Days 1-30: Baseline and Scope

Day range	Work	Artifact
1-3	Select one bounded workflow, e.g. paystub income extraction, claims invoice review, dispute evidence package	Use Case Boundary Card
4-7	Inventory document classes, sources, channels, volumes, current defects	Document Class Taxonomy
8-12	Define high-impact fields, source anchors, allowed and prohibited uses	Evidence Field Dictionary
13-16	Map records, retention, legal hold and vendor copy requirements for raw and derived artifacts	Records and Legal Hold Mapping
17-20	Design extraction, validation, confidence and review thresholds	Extraction and Validation Spec, Confidence and Review Policy
21-24	Threat model tamper, duplicate, prompt injection, insider and vendor risks	Fraud/Tamper Threat Model
25-27	Define workflow payload, case states, reason codes and customer communication constraints	Workflow Contract
28-30	Define metrics, eval sets, QA sampling and evidence bundle	Control Dashboard Spec

Days 31-60: Controlled Build and Pilot

Day range	Work	Artifact
31-35	Implement provenance, hash, storage, rendering and source anchoring	Provenance Test Report
36-40	Configure classification, OCR/layout and schema-constrained extraction	Extraction Test Report
41-45	Add validation, confidence routing and review queues	Review Routing Report
46-50	Build human review workbench with source-linked fields and override capture	Reviewer QA Record
51-54	Integrate fraud/tamper checks and security/access controls	Fraud Control Report
55-57	Connect records metadata, retention and legal hold flags	Records Integration Report
58-60	Pilot with limited scope and manual QA sampling	Pilot Evidence Quality Report

Days 61-90: Scale Decision and Assurance

Day range	Work	Artifact
61-65	Analyze field accuracy, confidence calibration, review overturn and complaints	Outcome Review
66-70	Tune thresholds, validations, queues and customer request-for-info language	Change Control Record
71-75	Test legal hold propagation, disposition exception and vendor deletion evidence	Hold and Disposition Test
76-80	Run model defect, vendor outage and fraud pattern tabletop	Tabletop Decision Log
81-85	Complete model risk, privacy, records and compliance review	Governance Review Pack
86-90	Decide scale, restrict, redesign or retire	Go/No-Go Decision Record

9. Evidence Pack

Minimum evidence fields:

Field	Purpose
`case_id`	operational case reference
`workflow_id`	KYC, KYB, claims, disputes, servicing, complaints
`document_id`	unique document reference
`document_version`	raw and derived artifact version
`source_channel`	upload, branch scan, mail, email, API, vendor
`received_timestamp`	intake time
`raw_file_hash`	integrity check
`document_class`	type and subtype
`classification_result`	model/rule output and confidence
`page_count`	completeness signal
`quality_result`	blur, missing pages, render errors
`processing_lineage`	OCR/layout/extraction model and rule versions
`extracted_fields`	field values with source anchors
`validation_results`	rules, cross-document and system checks
`confidence_results`	field and case scores with calibration bucket
`fraud_tamper_signals`	duplicate, metadata, visual, arithmetic, prompt injection
`human_review_records`	reviewer decisions, corrections, reason codes
`policy_decision`	accept evidence, request more info, review, reject use
`workflow_action`	case update, payment, escalation, request, communication
`ai_summary_run_id`	model summary trace when used
`records_metadata`	record class, retention rule, disposition state
`legal_hold_flag`	hold status and propagation reference
`access_events`	reviewer/system/vendor access
`customer_final_message_id`	final communication or request-for-info
`complaint_id`	linked complaint if applicable
`capa_id`	corrective action when defect found

Evidence rules:

Store raw artifacts and derived artifacts with clear lineage, not as one mutable blob.
Preserve source anchors for every material extracted field.
Treat missing source anchor for high-impact field as a control defect.
Separate reviewer-facing summaries from official records and decision evidence unless Records/Legal approves their role.
Make legal hold state visible to deletion, training-data selection, vendor purge and export jobs.
Capture final customer/counterparty communication, because disputes often turn on what was said and when.
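
The source-anchor rule can be enforced with a simple check over `extracted_fields`. The sketch below assumes each field is a dict with `field_name`, `criticality` and an optional `source_anchor` containing at least `page` and `bbox`; those key names are illustrative, not a defined schema.

```python
HIGH_IMPACT = {"high", "restricted"}


def anchor_defects(extracted_fields):
    """List high-impact fields whose source anchor is missing or incomplete."""
    defects = []
    for item in extracted_fields:
        anchor = item.get("source_anchor") or {}
        has_anchor = "page" in anchor and "bbox" in anchor
        if item.get("criticality") in HIGH_IMPACT and not has_anchor:
            defects.append(item["field_name"])
    return defects


fields = [
    {"field_name": "gross_pay", "criticality": "high",
     "source_anchor": {"page": 1, "bbox": [72, 310, 180, 326]}},
    {"field_name": "ytd_gross", "criticality": "high", "source_anchor": None},
    {"field_name": "document_title", "criticality": "low"},
]
print(anchor_defects(fields))
```

Low-criticality fields without anchors are not flagged here; whether they should be is a policy choice for the evidence field dictionary.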

10. Workflow Playbooks

10.1 Bank Statement / Paystub Income Evidence

Step	Control
Intake	verify page count, statement/pay period, source channel and raw hash
Extract	employer, employee, period, gross/net/YTD, deposits, balances, account mask
Validate	arithmetic, period continuity, YTD consistency, name/account match, duplicate check
Confidence	high-impact amount and identity fields use stricter thresholds
Review	route mismatches, missing pages, fake-template signals, high amount outliers
Workflow	create income evidence package, not final affordability conclusion by itself
Records	attach raw and derived artifacts to case record class and hold state
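
The arithmetic and YTD consistency validations in the Validate step might look like the sketch below. It assumes a `deductions` total and a prior-period YTD value are available, which the extract list above does not guarantee, and it uses a one-cent tolerance as an arbitrary illustration.

```python
from decimal import Decimal


def check_paystub(gross, deductions, net, prior_ytd, ytd, tolerance=Decimal("0.01")):
    """Return the names of failed validation rules for one pay period."""
    failures = []
    # Net pay should equal gross pay minus total deductions.
    if abs(gross - deductions - net) > tolerance:
        failures.append("net_pay_arithmetic")
    # With a prior stub on file, YTD should advance by this period's gross.
    if prior_ytd is not None and abs(prior_ytd + gross - ytd) > tolerance:
        failures.append("ytd_consistency")
    # YTD gross can never be smaller than the current period's gross.
    if ytd < gross:
        failures.append("ytd_below_period_gross")
    return failures
```

A non-empty result is a review trigger rather than a fraud conclusion: a misread digit and a forged stub can fail the same rule.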

10.2 Claims Package

Step	Control
Intake	split package into forms, invoices, photos, reports, correspondence
Extract	claimant, policy, loss date, invoice totals, provider, photo metadata
Validate	timeline consistency, duplicate invoice/photo, coverage period, amount totals
Confidence	photo and invoice authenticity signals influence triage
Review	adjuster sees source-linked timeline and conflicting evidence
Workflow	route to payout support, investigation, request-for-info or denial review per policy
Records	preserve package, adjuster notes, AI summary and final communication linkage
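
The duplicate invoice/photo check can start with exact-match detection on raw file hashes across cases. The sketch below assumes artifacts arrive as `(case_id, artifact_id, raw_bytes)` tuples. It only catches byte-identical files: a re-encoded or cropped photo would need perceptual matching, which is outside this example.

```python
import hashlib
from collections import defaultdict


def find_cross_case_duplicates(artifacts):
    """Map each hash seen in more than one case to its (case_id, artifact_id) refs."""
    by_hash = defaultdict(list)
    for case_id, artifact_id, raw_bytes in artifacts:
        digest = hashlib.sha256(raw_bytes).hexdigest()
        by_hash[digest].append((case_id, artifact_id))
    return {
        digest: refs
        for digest, refs in by_hash.items()
        if len({case_id for case_id, _ in refs}) > 1
    }
```

The result is a signal for adjuster triage; the same file legitimately appearing in two related claims is possible, so the output should route to review rather than decide.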

10.3 Payment Dispute Evidence

Step	Control
Intake	classify cardholder evidence, merchant evidence, shipping proof, screenshots
Extract	transaction, merchant, date, amount, reason code, delivery/tracking details
Validate	match transaction system, reason-code evidence checklist, SLA deadline
Confidence	summary cannot replace required proof element
Review	route weak evidence or conflicting claims to dispute specialist
Workflow	create source-linked representment or response package
Records	preserve submitted and outgoing evidence with final response

10.4 KYC/KYB Documents

Step	Control
Intake	identify individual, entity, license, ownership, authority and address documents
Extract	names, entity IDs, owners, roles, license status, address, dates
Validate	freshness, entity resolution, authority scope, conflicting ownership
Confidence	high-impact authority and ownership fields require stricter handling
Review	route beneficial ownership, authorization ambiguity and stale documents
Workflow	evidence package for compliance/ops review, not automatic compliance conclusion
Records	retention and hold handling per product, record type and governance interpretation

10.5 Complaints and Operational Correspondence

Step	Control
Intake	detect complaint indicators, product, customer, attachments, urgency
Extract	issue theme, harm, requested resolution, dates, referenced transactions
Validate	case/customer match, prior interactions, response deadlines
Confidence	low confidence complaint classification goes to complaint ops
Review	preserve customer voice and avoid summary-only handling
Workflow	create complaint case, route RCA, attach evidence and final response
Records	link documents, AI runs, agent notes, response and CAPA

11. Checklists

11.1 Release Checklist

Check	Passing evidence
Use case and decision boundary documented	Use Case Boundary Card
Document classes and unsupported formats defined	Document Class Taxonomy
High-impact fields identified	Evidence Field Dictionary
Source anchor required for material fields	extraction schema test
Confidence thresholds calibrated	calibration report
Human review queues configured	workflow test
Reviewer override reasons captured	review workbench test
Fraud/tamper checks active	threat model and test cases
Records metadata assigned	records integration test
Legal hold propagation tested	hold propagation evidence
Vendor retention and access controls reviewed	vendor control record
Privacy minimization and redaction reviewed	privacy assessment evidence
Complaint replay path tested	simulated complaint evidence pack
Monitoring dashboard live	production readiness dashboard

11.2 Extraction QA Checklist

Check	Passing evidence
OCR/layout preserves page and coordinates	source anchor sample
Table extraction handles merged cells and totals	table validation result
Date and currency normalization tested	locale test cases
Entity resolution does not overwrite raw value	raw and normalized field pair
Low-quality images routed	quality gate log
Missing pages detected	completeness rule
Model output schema enforced	invalid output rejection
Prompt injection ignored	adversarial document test
High-confidence wrong fields sampled	QA sampling report
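
The "model output schema enforced" check can be sketched as a strict parser that rejects anything outside the governed field list. `GOVERNED_FIELDS` and the payload shape (a `value` plus a `source_anchor` per field) are assumptions for illustration; a production system would more likely use a schema validation library.

```python
import json

# Hypothetical governed schema: field name -> expected Python type of "value".
GOVERNED_FIELDS = {
    "statement_period_end_date": str,
    "account_mask": str,
    "page_count": int,
}


def parse_model_output(raw_json: str) -> dict:
    """Parse model output and raise ValueError on any schema violation."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("top-level output must be an object")
    unknown = sorted(set(payload) - set(GOVERNED_FIELDS))
    if unknown:
        # Extra keys may come from text embedded in the document itself.
        raise ValueError(f"ungoverned fields: {unknown}")
    for name, entry in payload.items():
        if not isinstance(entry, dict) or "value" not in entry:
            raise ValueError(f"{name}: missing value")
        if not isinstance(entry["value"], GOVERNED_FIELDS[name]):
            raise ValueError(f"{name}: wrong type")
        if "source_anchor" not in entry:
            raise ValueError(f"{name}: missing source_anchor")
    return payload
```

Rejecting unknown keys outright, instead of dropping them, leaves a logged failure that adversarial document tests can assert on.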

11.3 Human Review Checklist

Check	Passing evidence
Reviewer sees original source next to extracted field	UI test
Source coordinate click works	workbench test
Confidence and validation failures visible	reviewer screenshot/spec
Sensitive fraud details protected	role-based display
Override captures reason and corrected value	review log
Dual control applied where required	approval record
QA samples auto-pass and manual-pass cases	QA plan
Reviewer feedback does not train models without approval	data governance control

11.4 Records and Legal Hold Checklist

Check	Passing evidence
Raw document has record metadata	record metadata sample
OCR/layout/extraction artifacts classified	derived artifact inventory
AI summaries mapped to retention treatment	Records/Legal decision record
Legal hold propagates to all linked artifacts	propagation test
Vendor copies checked for hold and deletion	vendor attestation/control
Disposition has approval and audit	disposition log
Training-data selection checks hold and use restrictions	data pipeline control
Retrieval supports case/audit/complaint replay	search/replay test

11.5 Workflow Integration Checklist

Check	Passing evidence
Case system receives structured evidence payload	integration test
Policy reason codes captured	decision log
Request-for-info language approved	CX/Compliance review
Final customer message linked to evidence	communication id
SLA timers consider document ambiguity	workflow config
Downstream systems do not treat extraction as final decision	contract test
Exceptions and fallbacks are operationally staffed	queue capacity plan

12. Metrics and KRIs

Metric	Why it matters
Field-level accuracy by criticality	avoids hiding high-impact errors
High-confidence wrong field rate	detects calibration failure
Source-anchor completeness	measures auditability
Classification ambiguity rate	shows routing risk
Missing page detection rate	protects completeness
Human review overturn rate	reveals extraction/control quality
QA defect rate after auto-pass	estimates residual risk
Straight-through processing rate by document class	productivity with risk context
Review queue SLA	operational capacity and customer impact
Downstream rework rate	quality of evidence entering workflow
Duplicate/tamper signal yield	fraud control effectiveness
False positive fraud review rate	customer friction and ops burden
Records metadata completeness	records readiness
Legal hold propagation success	preservation control
Vendor deletion/retention exceptions	third-party risk
Complaint document trace completeness	complaint/audit readiness
AI unsupported conclusion defects	model governance
Accessibility defect rate	inclusive operations

Balanced scorecard:

Productivity: fewer manual minutes per case.
Evidence quality: critical fields accurate, anchored and validated.
Risk: fraud, records, privacy and legal hold controls operate.
Customer outcome: fewer unnecessary document requests and complaint defects.
Governance: every material document-driven action is replayable.
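
Several of the KRIs in this section reduce to simple ratios over QA-labelled field records. The sketch below assumes each record carries `confidence`, `correct`, `reviewed`, `overturned` and `has_source_anchor`; the record shape and the 0.95 cut-off for "high confidence" are illustrative assumptions.

```python
def kri_summary(records, high_conf=0.95):
    """Compute three ratio KRIs; a ratio is None when its denominator is zero."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    high = [r for r in records if r["confidence"] >= high_conf]
    reviewed = [r for r in records if r["reviewed"]]
    anchored = [r for r in records if r["has_source_anchor"]]
    return {
        "high_confidence_wrong_rate": ratio(sum(1 for r in high if not r["correct"]), len(high)),
        "review_overturn_rate": ratio(sum(1 for r in reviewed if r["overturned"]), len(reviewed)),
        "source_anchor_completeness": ratio(len(anchored), len(records)),
    }
```

Reporting `None` rather than zero for an empty denominator avoids showing a perfect score for a segment that had no samples.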

13. Anti-Patterns

Anti-pattern	Why it fails	Better pattern
“OCR all docs, then ask LLM”	loses layout/source control and invites hallucination	schema-constrained, source-linked extraction
One confidence threshold for all fields	treats routing label and income amount the same	threshold by field criticality and workflow risk
Summary as evidence	unsupported omissions or invented conclusions	source-linked summary plus structured evidence
Auto-approve based on high confidence	ignores validation, fraud, records and policy	evidence policy gate
Human review with no source context	reviewer cannot verify efficiently	source-first workbench
Reviewer override as free text only	hard to monitor and improve	reason codes + structured corrections
No raw artifact hash	cannot prove integrity	immutable raw artifact and lineage
No derived-artifact records mapping	OCR/summary/review notes become governance blind spot	artifact inventory and records metadata
Legal hold handled manually outside AI pipeline	deletion/training/vendor purge may miss artifacts	hold-aware data and workflow graph
Vendor black box extraction	weak evidence and model governance	version, lineage, eval and audit obligations
Fraud model makes final decision	false positives and explainability risk	fraud signal plus policy/human review
Automation rate as north-star metric	hides customer harm and evidence defects	balanced scorecard

14. Tabletop Scenarios

Scenario 1: High-Confidence Paystub Error

The model extracts gross pay correctly but misreads pay period and YTD amount.
Confidence is high. The income package is about to move forward.

Expected decisions: field criticality threshold, arithmetic validation, reviewer route, downstream case hold, model defect capture.

Scenario 2: Legal Hold Miss on Derived Artifacts

A legal hold is applied to a servicing case. Raw PDFs are preserved,
but OCR JSON, AI summaries and reviewer notes are scheduled for deletion.

Expected decisions: hold propagation graph, disposition stop, vendor copy check, Records/Legal escalation, CAPA.

Scenario 3: Dispute Summary Omits Merchant Evidence

The AI summary says the customer provided strong evidence,
but the merchant delivery proof contradicts the customer statement.

Expected decisions: source-linked summary defect, dispute specialist review, reason-code evidence checklist, eval update.

Scenario 4: Fake Bank Statement Template

A bank statement matches a known visual template but metadata and transaction pattern are inconsistent.
The extraction fields look clean.

Expected decisions: fraud/tamper signal route, no customer-facing rule leakage, manual review, duplicate/template detection improvement.

Scenario 5: KYB Authority Ambiguity

A business registration document and board resolution are uploaded.
The model extracts an officer name but cannot establish authority to open the product.

Expected decisions: authority field specialist review, KYC/KYB interpretation boundary, request-for-info wording, evidence pack.

Scenario 6: Complaint Replay Failure

A customer complains that an AI rejected their claim documents.
The team can find the raw PDF but not the model run, reviewer override or final message.

Expected decisions: evidence pack gap, incident classification, complaint remediation, logging and workflow contract update.

15. Portfolio Deliverables

Deliverable	What it demonstrates
Executive one-pager	You can frame document AI as an evidence operating system, not just OCR automation
Use case boundary card	You can control scope and decision impact
Document class taxonomy	You can connect documents, processes, risk and records governance
Evidence field dictionary	You can define source-linked, criticality-aware extraction
Confidence and review policy	You can design calibrated automation and human oversight
Records/legal hold mapping	You can bring raw and derived artifacts under governance
Fraud/tamper threat model	You can cover fake documents, duplicates, prompt injection and insider risks
Reviewer workbench spec	You can design human review as an effective control
Workflow contract	You can get AI evidence into case systems correctly
Evidence bundle schema	You can support complaint, audit and regulatory replay
Metrics dashboard	You can balance efficiency, quality, risk, customer outcomes and governance

Portfolio storyline:

I designed an AI document intelligence architecture for financial retail operations.
It converts unstructured documents into source-linked evidence,
uses calibrated confidence and validation to route work,
integrates human review, fraud/tamper checks, records retention and legal hold,
and preserves a replayable evidence trail from intake to final workflow action.

16. Interview Answers

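Q1: How do you explain the boundaries of document intelligence to executives?

30 seconds:

Document intelligence is not OCR automation; it is an evidence supply chain. It connects document capture, provenance, layout, field extraction, confidence, validation, human review, fraud checks, workflow action, records retention and legal hold into one auditable system. Only low-risk, high-quality, verifiable fields can be automated; high-impact or conflicting evidence must be reviewed.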

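Q2: What kinds of fields can go through straight-through processing?

30 seconds:

Five conditions must hold: the field's risk is low or medium, the source anchor is complete, the confidence is calibrated, business validation passes, and there is no fraud/tamper/legal/authority trigger. High-impact fields such as income, claim amount, beneficial ownership, POA and dispute reason usually cannot advance on a model score alone.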
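The five conditions can be expressed as an explicit gate that returns the reasons a field is blocked instead of a bare yes/no. This is a minimal sketch under stated assumptions: the `FieldEvidence` structure, the three-level criticality scale and the trigger names are hypothetical, and a real policy would be owned by the relevant risk functions.

```python
from dataclasses import dataclass, field


@dataclass
class FieldEvidence:
    name: str
    criticality: str              # hypothetical scale: "low", "medium", "high"
    has_source_anchor: bool
    confidence_calibrated: bool
    validations_passed: bool
    triggers: set = field(default_factory=set)  # e.g. {"fraud", "legal"}


# Hypothetical trigger names mirroring the fifth condition.
BLOCKING_TRIGGERS = {"fraud", "tamper", "legal", "authority"}


def stp_blockers(ev):
    """Return why a field cannot go straight through; an empty list means eligible."""
    reasons = []
    if ev.criticality not in ("low", "medium"):
        reasons.append("criticality")
    if not ev.has_source_anchor:
        reasons.append("source_anchor")
    if not ev.confidence_calibrated:
        reasons.append("calibration")
    if not ev.validations_passed:
        reasons.append("validation")
    if ev.triggers & BLOCKING_TRIGGERS:
        reasons.append("trigger")
    return reasons


statement_date = FieldEvidence("statement_date", "low", True, True, True)
income = FieldEvidence("income", "high", True, True, True)
print(stp_blockers(statement_date))  # eligible: no blockers
print(stp_blockers(income))          # blocked on criticality alone
```

Returning the list of blockers keeps the routing decision explainable: the same output can populate the review task and the evidence trail.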

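Q3: How do you design human-in-the-loop so that it is not a formality?

30 seconds:

The reviewer must see the source text and field location, confidence, validation failures, conflicts, history and risk signals, and must record each accept, correct or escalate decision with a structured reason code. You also need QA sampling, overturn monitoring, dual control and feedback governance. Otherwise human review easily becomes a rubber stamp.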

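Q4: Why do records retention and legal hold affect AI document projects?

30 seconds:

Because the system does not only store the original PDF; it also generates OCR text, layout JSON, extracted fields, AI summaries, review notes, exports and final decisions. Which of these are records, how long they are retained, and whether a legal hold covers them depend on product, record type, jurisdiction, retention schedule, hold status and Legal/Compliance/Records interpretation. The architecture must make these artifacts classifiable, retainable, freezable and retrievable.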

Q5: How do you measure whether document AI is actually usable?

30 seconds:

Look at field-level accuracy, high-confidence wrong rate, source-anchor completeness, review overturn, QA defects, downstream rework, fraud/tamper yield, records metadata completeness, legal hold propagation, complaint replay success and customer impact. Looking only at OCR accuracy or automation rate misleads decisions.
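Two of these metrics are simple ratios over labeled field samples. The sketch below assumes a hypothetical sample format (a dict with `confidence`, `correct` and `source_anchor` keys) and an illustrative threshold of 0.95; neither comes from a specific tool, and the numbers are made up for the example.

```python
def high_confidence_wrong_rate(samples, threshold=0.95):
    """Share of fields at or above `threshold` that were wrong against ground truth."""
    confident = [s for s in samples if s["confidence"] >= threshold]
    if not confident:
        return None  # no high-confidence fields to measure
    wrong = sum(1 for s in confident if not s["correct"])
    return wrong / len(confident)


def source_anchor_completeness(samples):
    """Share of extracted fields that carry a source anchor."""
    if not samples:
        return None
    anchored = sum(1 for s in samples if s["source_anchor"])
    return anchored / len(samples)


# Illustrative data only.
samples = [
    {"confidence": 0.99, "correct": True, "source_anchor": "p1:b12"},
    {"confidence": 0.98, "correct": True, "source_anchor": "p1:b14"},
    {"confidence": 0.97, "correct": False, "source_anchor": "p2:b03"},
    {"confidence": 0.96, "correct": True, "source_anchor": None},
    {"confidence": 0.60, "correct": False, "source_anchor": "p3:b07"},
]
print(high_confidence_wrong_rate(samples))   # 0.25
print(source_anchor_completeness(samples))   # 0.8
```

The high-confidence wrong rate matters because those are the fields most likely to skip review; one wrong value among four confident ones would be invisible in an aggregate accuracy figure.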

17. Practical Templates

17.1 Use Case Boundary Card

Use case:
Workflow:
Product:
Customer / business segment:
Jurisdiction / policy scope:
Channels:
Document classes:
Unsupported documents:
AI role:
Allowed workflow actions:
Prohibited workflow actions:
High-impact fields:
Fraud risks:
Records owner:
Legal hold considerations:
Human review triggers:
Evidence replay requirement:
Product owner:
Risk owner:

17.2 Document Class Card

Document class:
Subtypes:
Typical source channels:
Expected pages / sections:
Required fields:
Optional fields:
High-impact fields:
Known fraud/tamper patterns:
Quality gates:
Classification ambiguity handling:
Retention metadata:
Legal hold propagation:
Workflow route:
Specialist review triggers:

17.3 Evidence Acceptance Rule

Rule ID:
Workflow:
Document class:
Field:
Criticality:
Required source anchor:
Minimum confidence:
Required validations:
Cross-document checks:
Fraud/tamper exclusions:
Human review required when:
Auto-populate allowed:
Policy acceptance state:
Customer-facing reason:
Evidence fields:
Owner:
Review cadence:

17.4 Reviewer Decision Record

Review task ID:
Case ID:
Document ID:
Field / evidence item:
Model value:
Model confidence:
Source anchor:
Validation result:
Fraud/tamper signal:
Reviewer decision:
Corrected value:
Reason code:
Rationale:
Second approval:
Workflow action:
Customer communication:
Timestamp:
QA sample flag:
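A decision record like the one above is easier to enforce when the workbench rejects incomplete entries. The following is a minimal sketch assuming hypothetical snake_case keys derived from the template and the three reviewer decisions named in section 16 (accept, correct, escalate); which fields are mandatory is a policy choice, not something this example settles.

```python
ALLOWED_DECISIONS = {"accept", "correct", "escalate"}


def decision_record_errors(record):
    """Return a list of problems that should block saving a reviewer decision."""
    errors = []
    decision = record.get("reviewer_decision")
    if decision not in ALLOWED_DECISIONS:
        errors.append("reviewer_decision must be accept, correct or escalate")
    if not record.get("reason_code"):
        errors.append("reason_code is required")
    if not record.get("source_anchor"):
        errors.append("source_anchor is required")
    if decision == "correct" and record.get("corrected_value") in (None, ""):
        errors.append("corrected_value is required when decision is correct")
    return errors


record = {
    "reviewer_decision": "correct",
    "reason_code": "OCR_DIGIT_ERROR",   # hypothetical reason code
    "source_anchor": "p2:b03",
    "corrected_value": "",
}
print(decision_record_errors(record))
```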

17.5 Evidence Replay Script

Case:
Question to answer:
Raw documents:
Derived artifacts:
Extraction model versions:
Field source anchors:
Validation results:
Fraud/tamper results:
Human reviews:
Policy decisions:
Workflow actions:
Customer/counterparty messages:
Records metadata:
Legal hold status:
Complaint / audit / CAPA links:
Replay conclusion:
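Before anyone writes a replay conclusion, the bundle can be checked for missing items. This is a minimal sketch: the key names are a hypothetical snake_case rendering of the template above, and the example mirrors Scenario 6, where the raw PDF exists but the model run, the reviewer override and the final message cannot be found.

```python
# Hypothetical keys mirroring the replay script template.
REQUIRED_REPLAY_ITEMS = [
    "raw_documents", "derived_artifacts", "extraction_model_versions",
    "field_source_anchors", "validation_results", "fraud_tamper_results",
    "human_reviews", "policy_decisions", "workflow_actions",
    "customer_messages", "records_metadata", "legal_hold_status",
]


def replay_gaps(bundle):
    """List required items that are absent or empty in the evidence bundle."""
    return [
        key for key in REQUIRED_REPLAY_ITEMS
        if bundle.get(key) in (None, "", [], {})
    ]


# Start from a complete bundle, then remove what the team could not find.
bundle = {key: ["present"] for key in REQUIRED_REPLAY_ITEMS}
bundle["extraction_model_versions"] = []
bundle["human_reviews"] = []
del bundle["customer_messages"]
print(replay_gaps(bundle))
```

A non-empty result is itself evidence: it can be attached to the complaint or CAPA record as the documented evidence pack gap.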

18. Final Operating Principle

The maturity of this playbook can be tested with one question:

When a bank statement, paystub, claim package, dispute file, KYB document
or servicing letter changes a customer or business outcome,
can the institution prove exactly what document was used,
what fields were extracted,
where they came from,
how confidence and validation were handled,
who reviewed exceptions,
how fraud and records controls applied,
and why the workflow action was appropriate at that time?

If the answer is unclear, what is missing is not a stronger OCR vendor. The problem is that document intelligence, operations workflow, records governance, fraud controls, model risk and evidence quality have not yet become one operating model.