
AI AML Alert Triage / Investigation Workbench Playbook



AI AML Alert Triage / Investigation Workbench Architecture Playbook

Positioning: Written for senior AI PMs, Senior BAs, Product Architects, AML Technology Architects, Financial Crime Transformation Leads and Model Risk Partners. It upgrades AML alert triage and case investigation from an "alert handling tool" into an evidence-first, human-owned, model-risk-controlled, audit-ready AI workbench system.

Scope: transaction monitoring alert queue, manual referral, fraud-to-AML referral, sanctions context handoff, CDD/EDD context refresh, entity resolution, graph investigation, case assembly, analyst copilot, SAR consideration packet, QA sampling, scenario tuning feedback, model risk validation, audit replay.

Core deliverables: reference architecture, capability map, queue prioritization policy, evidence workspace schema, copilot contract, disposition guardrail, SAR draft control, QA rubric, feedback taxonomy, model risk checklist, audit event schema, SoD matrix, metrics dashboard and a 30/60/90 implementation roadmap.

0. Disclaimer

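This document is material for learning, portfolio building, architecture training and internal governance discussion.

It is not legal advice, a compliance conclusion, a SAR filing decision, a suspicious activity determination, a sanctions disposition, a model validation report, an audit report or a regulatory interpretation.

A formal project must be confirmed by Legal, BSA/AML Compliance, Sanctions, Fraud, Risk, Model Risk, Privacy, Security, Operations, Internal Audit, the Business Owner and management, taking into account institution type, regulatory relationships, jurisdiction, products, customers, channels, data, regulatory commitments and internal policy.

Key boundaries:

AI may assist with alert prioritization, evidence assembly, case summarization, graph explanation, gap detection, draft case notes, QA pre-checks and tuning insights.
AI should not replace the AML/BSA owner's suspicious activity determination or SAR filing decision.
AI should not automatically file a SAR, close a high-risk case, clear a sanctions hit, notify a customer, contact law enforcement or change account restrictions.
AI should not infer a criminal conclusion from a red flag.
AI should not leak SAR-sensitive information, NSL-sensitive information, restricted investigation content or customer data beyond need-to-know.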

1. Executive Framing

Low-maturity goals for AML AI usually look like:

reduce false positives
summarize cases faster
draft SAR narratives
automate analyst work

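Each of these goals has value on its own, but they are not enough to support production-grade financial crime controls. The mature target should be: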

risk-adjusted investigation capacity
  + evidence completeness
  + human decision ownership
  + queue timeliness
  + typology/scenario coverage awareness
  + SAR quality support
  + QA and tuning feedback
  + model risk governance
  + audit replay

In one sentence:

AML investigation AI is an evidence and workflow control system before it is a language model productivity feature.

1.1 Product Principles

Queue priority must be explainable, route-aware and SLA-aware.
Entity graph must carry confidence, source and access class.
Evidence workspace is the product core; copilot is an assistant attached to evidence.
Disposition recommendation must preserve human ownership.
SAR draft assistance must be evidence-cited and filing-gated.
QA findings must feed training, rules, data, prompt, retrieval and staffing decisions through controlled routes.
Tuning must be coverage-safe, not merely alert-volume-reducing.
Every high-impact output must be replayable across evidence, model version, prompt version, user action and approval chain.

1.2 Strong Questions

Question	Strong answer
Why is an alert first in queue?	priority band, top drivers, SLA, route, evidence and model/scenario version
What entity does this alert belong to?	entity-resolved subject graph with confidence and source lineage
What evidence supports the case summary?	cited evidence cards, source record ids, timestamps and freshness
What can AI recommend?	next investigation step, evidence gap, disposition options and draft language, not final SAR decision
Who owns the decision?	named human role and case state transition
How does QA improve the system?	defect taxonomy, feedback routing, change gate and regression eval
How can audit replay the case?	append-only event log with evidence ids, user ids, versions, outputs and approvals

2. Source Anchors

Anchor	Official link	How this playbook uses it
FFIEC BSA/AML Suspicious Activity Reporting	https://bsaaml.ffiec.gov/manual/AssessingComplianceWithBSARegulatoryRequirements/04	Process anchor for alert identification, managing alerts, SAR decision making, SAR completion and continuing activity monitoring.
FFIEC SAR Examination Procedures	https://bsaaml.ffiec.gov/manual/AssessingComplianceWithBSARegulatoryRequirements/04_ep	Evidence anchor for monitoring system review, manual/automated monitoring, independent validation, alert research, CDD/EDD consideration, documented no-file decision, SAR quality and transaction testing.
FFIEC Customer Due Diligence	https://bsaaml.ffiec.gov/manual/AssessingComplianceWithBSARegulatoryRequirements/02	Supports investigation context with customer risk profile, expected activity, ongoing monitoring and beneficial ownership context.
FFIEC CDD Examination Procedures	https://bsaaml.ffiec.gov/manual/AssessingComplianceWithBSARegulatoryRequirements/02_ep	Used for customer risk profiling, insufficient information, CDD documentation, OFAC sanctioned parties context and customer information testing.
FFIEC BSA/AML Independent Testing	https://bsaaml.ffiec.gov/manual/AssessingTheBSAAMLComplianceProgram/03	Used for independent testing, IT source completeness/accuracy, SAR process review, training evidence, deficiency tracking and board/senior management reporting.
FFIEC Appendix F Red Flags	https://bsaaml.ffiec.gov/manual/Appendices/07	Language anchor for red flags and additional scrutiny; detailed typology coverage is handled by a separate playbook in the repo.
FFIEC Appendix L SAR Quality Guidance	https://bsaaml.ffiec.gov/manual/Appendices/13	Used for SAR draft quality pre-check and narrative evidence discipline.
FinCEN SAR Resources	https://www.fincen.gov/suspicious-activity-reports-sars	Used for SAR resources, BSA E-Filing handoff and official SAR resource navigation.
FinCEN BSA Filing Information	https://www.fincen.gov/resources/filing-information	Used for the E-Filing boundary, filing operations, test system separation and recordkeeping handoff.
FinCEN SAR FAQ	https://www.fincen.gov/resources/frequently-asked-questions-regarding-fincen-suspicious-activity-report-sar	Product boundary for filing instructions location, amended SAR process, BSA ID acknowledgement and save/recordkeeping considerations.
OFAC Sanctions List Service	https://ofac.treasury.gov/sanctions-list-service	Used for sanctions list data, list updates, and search and screening evidence reference.
OFAC Sanctions List Search Tool	https://ofac.treasury.gov/sanctions-list-search-tool	Analyst-facing list search reference; production automated screening should use the institution's approved sanctions screening infrastructure.
OFAC Compliance Framework	https://ofac.treasury.gov/media/16331/download	Boundary supplement for sanctions compliance program context: management commitment, risk assessment, internal controls, testing/auditing and training.
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Organizes AI risk management, eval, monitoring and remediation around Govern / Map / Measure / Manage.
NIST AI RMF Core	https://airc.nist.gov/airmf-resources/airmf/5-sec-core/	Used to design AI controls as a continuous lifecycle function rather than a pre-launch checklist.
NIST GenAI Profile	https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence	Used for hallucination, data leakage, prompt injection, overreliance, third-party dependency, information integrity and GenAI eval.
ISO/IEC 42001	https://www.iso.org/standard/81230.html	Uses the AI management system approach to design accountability, operational controls, performance evaluation, management review and continual improvement.

Architecture mapping principle:

official source anchor
  -> control objective
  -> product requirement
  -> workflow state
  -> evidence record
  -> owner and review cadence

3. End-to-End Operating Model

3.1 Target Flow

1. Alert intake
   rules, models, manual referrals, fraud referrals, sanctions context, law-enforcement request flags

2. Context assembly
   entity resolution, CDD/EDD, expected activity, prior cases, transaction timeline, graph context

3. Queue prioritization
   risk band, SLA, route, top drivers, missing evidence, capacity-aware assignment

4. Investigation workspace
   evidence cards, timeline, graph, source documents, copilot, analyst tasks

5. Human disposition
   close, continue monitoring, request more evidence, escalate, refer, SAR consideration packet

6. SAR draft support
   cited draft packet, quality pre-check, human approval path, filing handoff outside AI autonomy

7. QA and oversight
   sample selection, defect scoring, reviewer calibration, issue logging, training feedback

8. Tuning and governance
   scenario tuning, model/prompt/RAG change, data remediation, coverage regression, validation

9. Audit replay
   event stream, version history, evidence ledger, approval trail, management reporting

3.2 State Machine

State	Entry condition	Allowed AI support	Human/control gate
`alert_received`	alert from approved source	normalize, classify source, attach trigger facts	source system contract
`context_ready`	minimum subject and transaction context assembled	entity resolution, CDD snapshot, graph retrieval	entity confidence warning
`queued`	priority and route assigned	priority explanation, SLA assignment	routing policy owner
`in_review`	analyst opens case	evidence summary, timeline, gap checklist	analyst remains reviewer
`needs_info`	required evidence missing	task suggestions, source lookup	task owner and SLA
`escalated`	analyst/supervisor escalates	escalation packet	senior/AML owner assignment
`sar_consideration`	human escalates to SAR consideration	draft packet and QA pre-check	BSA/AML owner decision
`closed_no_sar`	human selects closure	closure rationale draft	closure reason and evidence
`qa_sampled`	risk-based or random QA selection	QA pre-check, defect hints	independent QA reviewer
`feedback_logged`	QA/analyst/model issue captured	taxonomy suggestion	feedback owner and change gate
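
The state table above can be enforced in code as a transition guard. The sketch below is illustrative only: the table lists states, not edges, so the `ALLOWED` map, the `HUMAN_ONLY_TARGETS` set and the `actor_type` parameter are assumptions that an approved workflow policy would replace.

```python
# Hypothetical transition map. The source table defines states and gates,
# not edges, so every edge here is an assumption for illustration.
ALLOWED = {
    "alert_received": {"context_ready"},
    "context_ready": {"queued"},
    "queued": {"in_review"},
    "in_review": {"needs_info", "escalated", "sar_consideration", "closed_no_sar"},
    "needs_info": {"in_review"},
    "escalated": {"needs_info", "sar_consideration", "closed_no_sar"},
    "sar_consideration": {"qa_sampled"},
    "closed_no_sar": {"qa_sampled"},
    "qa_sampled": {"feedback_logged"},
    "feedback_logged": set(),
}

# States whose entry condition in the table names a human action.
HUMAN_ONLY_TARGETS = {"escalated", "sar_consideration", "closed_no_sar"}


class TransitionError(Exception):
    """Raised when a requested state change violates the guard."""


def transition(current: str, target: str, actor_type: str) -> str:
    """Return the new state, or raise if the edge or the actor is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise TransitionError(f"{current} -> {target} is not an allowed transition")
    if target in HUMAN_ONLY_TARGETS and actor_type != "human":
        raise TransitionError(f"{target} requires a human actor, got {actor_type}")
    return target
```

Keeping the human-only check inside the guard, rather than in the UI, means a copilot or tool call cannot move a case into `sar_consideration` or `closed_no_sar` even if a front-end control is bypassed.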

3.3 Non-Goals

Non-goal	Reason
Replace transaction monitoring system	Workbench consumes and contextualizes alerts; it should not hide source scenario governance.
Replace case management of record	Workbench may integrate or extend CMS, but authoritative case status must be explicit.
Auto-file SAR	SAR filing decision and submission are controlled human/compliance workflows.
Replace sanctions screening	OFAC/sanctions systems have separate screening, disposition and blocking controls.
Replace Legal/Compliance interpretation	Applicability and policy decisions remain owned by control functions.

4. Capability Map

Capability	Product function	Architecture services	Control owner
Alert intake	normalize alert and trigger facts	event ingestion, schema registry, source adapter	AML Technology
Queue prioritization	route and rank work	priority engine, SLA policy, assignment service	AML Operations
Entity resolution	connect subjects/accounts/counterparties	identity graph, match service, confidence scoring	Data/AML Analytics
Context graph	reveal relationship and funds flow	graph store, path search, temporal aggregation	AML Analytics
Evidence assembly	prepare cited evidence packet	evidence ledger, source connector, provenance service	AML Operations
Analyst copilot	summarize, ask, compare, check gaps	RAG, LLM gateway, prompt registry, tool gateway	AI Product/Platform
Disposition support	show options and rationale	decision support service, policy/rubric engine	BSA/AML Compliance
SAR draft support	draft with citations and guardrails	controlled generation, citation validator, redaction	BSA/AML Compliance
QA workbench	sample, score, calibrate, remediate	QA workflow, defect taxonomy, issue tracker	QA / Compliance Testing
Feedback loop	route defects to tuning/data/training	feedback registry, change workflow, eval set builder	Model Risk / Scenario Owner
Audit replay	reconstruct who saw/did/approved what	append-only event stream, evidence hash, version registry	Internal Audit / Risk
Management reporting	track capacity, quality, risk, controls	metrics store, dashboard, KRI service	AML Leadership

5. Reference Architecture Blueprint

5.1 Logical Layers

Experience
  Queue console | Investigation workbench | Copilot panel | QA console | Management dashboard

Workflow
  Case state machine | Assignment | SLA | Escalation | Maker-checker | SAR handoff | Feedback routing

AI decision support
  Priority model | Entity resolution | Graph analytics | Retrieval | LLM summarization
  Disposition options | SAR draft assist | QA judge | Eval harness

Evidence and context
  Evidence ledger | Timeline builder | CDD snapshot | Counterparty graph | Prior case index
  Typology/scenario link | Sanctions/fraud referral context

Data and integration
  Core banking | Transactions | KYC/CDD/EDD | Case management | Sanctions screening
  Fraud | CRM | Document store | External advisories/lists | User/IAM

Controls and platform
  RBAC/ABAC | SAR confidentiality | Audit log | Version registry | Data lineage
  Model inventory | Monitoring | Incident management | Retention | Encryption

5.2 Component Responsibilities

Component	Owns	Does not own
Alert ingestion service	source normalization, schema validation, duplicate detection	suspicious activity conclusion
Priority engine	route, SLA, priority band, driver explanation	final case disposition
Entity graph service	match, confidence, relationship edges	legal identity conclusion when data conflicts
Evidence ledger	immutable evidence references and provenance	free-text narrative truth
Copilot orchestrator	RAG, prompt, LLM, tool routing, structured output	direct filing or irreversible account action
Citation validator	checks claims against evidence ids	semantic legal sufficiency
Policy guardrail engine	prohibited output/action blocks, role permissions	policy interpretation without owner approval
Case workflow service	state transitions, assignments, approvals	bypassing CMS of record without integration agreement
QA service	sample, rubric, defects, calibration	tuning deployment approval
Feedback registry	classify feedback and route changes	automatic retraining from raw outcomes
Model risk registry	inventory, validation evidence, revalidation triggers	operational queue management
Audit event service	append-only trace and replay	manual reconstruction after missing logs

6. Queue Prioritization Playbook

6.1 Priority Policy

Queue priority is a policy-backed decision support output:

priority_result:
  alert_id: alert_2026_18491
  priority_band: P1
  route_to_queue: aml_senior_investigation
  SLA_due_at: 2026-07-01T17:00:00Z
  top_drivers:
    - driver: repeated below-threshold cash deposits
      evidence_ids: [ev_tx_001, ev_tx_002, ev_tx_003]
    - driver: new outbound wires to unrelated counterparties
      evidence_ids: [ev_tx_008, ev_counterparty_011]
    - driver: customer high-risk CDD profile
      evidence_ids: [ev_cdd_003]
  uncertainty:
    - entity link to counterparty cluster is medium confidence
  missing_context:
    - latest stated business purpose for new wire recipient
  prohibited_inference: "priority is not SAR filing decision"

6.2 Decision Table

Condition	Priority action	Routing action	Control note
Repeat alert on same subject with prior escalation	Increase priority	Senior investigator or continuing activity queue	Show prior case metadata under access controls
High-risk customer with activity inconsistent with expected profile	Increase priority	EDD-aware queue	CDD freshness and source must be visible
Alert triggered by weak entity link only	Do not over-prioritize	Entity review or general queue	Graph confidence must be explicit
Sanctions-related context present	Hard escalation flag	Sanctions referral path	AML workbench does not clear sanctions hit
Missing critical source data	Route to data exception or needs-info	Operations/data owner	Analyst should not guess missing evidence
Aged lower-priority alert near SLA breach	Raise operational priority	Queue manager review	Timeliness is a control dimension
Low score but new typology/advisory coverage	Maintain sample or elevated review	Coverage monitoring	Avoid starving emerging risks
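
One way to keep the decision table reviewable is to encode each row as a data-driven rule. The sketch below is a minimal illustration; the flag names on `AlertContext` and the action labels are hypothetical, and it returns every matching row rather than resolving conflicts, which would remain a policy decision.

```python
from dataclasses import dataclass


@dataclass
class AlertContext:
    # Hypothetical flags; each mirrors one row of the decision table.
    prior_escalation_repeat: bool = False
    high_risk_profile_mismatch: bool = False
    weak_entity_link_only: bool = False
    sanctions_context: bool = False
    missing_critical_data: bool = False
    near_sla_breach: bool = False
    new_typology_low_score: bool = False


def priority_actions(ctx: AlertContext) -> list[tuple[str, str]]:
    """Return (priority_action, routing_action) for every matching table row."""
    rules = [
        (ctx.prior_escalation_repeat, ("increase_priority", "senior_or_continuing_activity_queue")),
        (ctx.high_risk_profile_mismatch, ("increase_priority", "edd_aware_queue")),
        (ctx.weak_entity_link_only, ("do_not_over_prioritize", "entity_review_or_general_queue")),
        (ctx.sanctions_context, ("hard_escalation_flag", "sanctions_referral_path")),
        (ctx.missing_critical_data, ("route_to_data_exception_or_needs_info", "operations_or_data_owner")),
        (ctx.near_sla_breach, ("raise_operational_priority", "queue_manager_review")),
        (ctx.new_typology_low_score, ("maintain_sample_or_elevated_review", "coverage_monitoring")),
    ]
    return [action for matched, action in rules if matched]
```

Because the rules are plain data, the same list can be versioned alongside the priority model and cited in the driver explanation.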

6.3 Priority Eval

Eval	Metric
Ranking usefulness	percentage of high-risk QA/SAR-consideration cases surfaced in top bands
Timeliness	P1/P2 SLA adherence, aged alert count
Coverage	typology/scenario distribution across priority bands
Fairness/segment stability	priority distribution by product, channel, customer type, geography
Analyst trust	override rate with reason, agreement by scenario
Control safety	unsupported driver rate, missing evidence route accuracy
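
The ranking usefulness metric can be computed from labeled case outcomes. This is a sketch under assumptions: the record fields `priority_band` and `high_risk_outcome`, and the choice of P1/P2 as "top bands", are hypothetical and would come from the institution's own QA and case data.

```python
from typing import Optional


def ranking_usefulness(cases: list[dict], top_bands: tuple = ("P1", "P2")) -> Optional[float]:
    """Share of high-risk QA/SAR-consideration cases that were ranked in the top bands.

    Returns None when there are no high-risk cases, so an empty period is
    not reported as a perfect or a zero score.
    """
    high_risk = [c for c in cases if c["high_risk_outcome"]]
    if not high_risk:
        return None
    surfaced = sum(1 for c in high_risk if c["priority_band"] in top_bands)
    return surfaced / len(high_risk)
```

Reporting this per scenario or segment, rather than as one global number, helps connect it to the coverage and segment stability rows in the same table.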

7. Entity Resolution and Graph Context Playbook

7.1 Entity Types

Entity	Examples	Investigation use
Person	customer, beneficial owner, authorized signer	subject identification and relationship context
Organization	business customer, merchant, employer, shell company lead	ownership and activity purpose
Account	deposit, loan, card, wallet, brokerage	funds flow and product behavior
Counterparty	ACH originator, wire beneficiary, P2P recipient, check payee	relationship and network risk
Device/IP/address	digital footprint, branch address, mailing address	mule/synthetic identity/context
Case/SAR-sensitive record	prior case, prior SAR metadata, QA finding	repeat activity and escalation context
External list item	sanctions list item, advisory entity, high-risk geography	referral and control context

7.2 Match Governance

Match level	Example	UI treatment	Downstream use
Deterministic authoritative	same customer id or account id	merged by default	can support fact statement
Deterministic corroborated	tax id plus legal name plus date	merged with source badge	can support escalation fact
Probabilistic high	strong name/address/device combination	probable link	can support investigation lead with caveat
Probabilistic medium	shared address plus transaction pattern	review link	cannot be stated as confirmed
Weak lead	one shared phone/address/IP	lead only	cannot drive final disposition alone
Conflict	inconsistent identity sources	warning	requires data review or analyst note
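
The match levels above can be carried as a small policy lookup so that downstream services check what a link may be used for before wording a claim. The keys and the boolean fields in this sketch are hypothetical encodings of the table, not an established schema.

```python
# Hypothetical encoding of the match governance table.
# state_as_fact: the link may be worded as a confirmed fact.
# drive_disposition_alone: the link by itself may support a final disposition.
MATCH_POLICY = {
    "deterministic_authoritative": {"ui": "merged by default", "state_as_fact": True, "drive_disposition_alone": True},
    "deterministic_corroborated": {"ui": "merged with source badge", "state_as_fact": True, "drive_disposition_alone": True},
    "probabilistic_high": {"ui": "probable link", "state_as_fact": False, "drive_disposition_alone": False},
    "probabilistic_medium": {"ui": "review link", "state_as_fact": False, "drive_disposition_alone": False},
    "weak_lead": {"ui": "lead only", "state_as_fact": False, "drive_disposition_alone": False},
    "conflict": {"ui": "warning", "state_as_fact": False, "drive_disposition_alone": False},
}


def describe_link(level: str, subject: str, other: str) -> str:
    """Word a relationship according to its match level; unknown levels fail loudly."""
    policy = MATCH_POLICY[level]
    if policy["state_as_fact"]:
        return f"{subject} is linked to {other} ({policy['ui']})."
    return f"{subject} may be linked to {other} ({policy['ui']}, not confirmed)."
```

Raising `KeyError` on an unknown level is deliberate in this sketch: a new match level should not silently inherit permissive wording.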

7.3 Graph Guardrails

Risk	Guardrail
False merge	confidence bands, explainable match features, analyst unlink path
Overexposure	edge-level access class and SAR-sensitive filtering
Stale relationship	effective date, last seen date, source freshness
Graph clutter	time-windowed paths, amount-weighted edges, typology-focused filters
Unsupported inference	distinguish "observed connection" from "suspicious relationship"
Feedback contamination	graph correction goes through data quality workflow

Graph card minimum:

edge_id: edge_9812
from_entity: ent_customer_031
to_entity: ent_counterparty_774
relationship_type: repeated_outbound_wire
first_seen: 2026-05-03
last_seen: 2026-06-18
amount_total: 84200.00
source_evidence_ids: [ev_tx_121, ev_tx_219, ev_tx_311]
confidence: high
access_class: aml_restricted
analyst_confirmed: false

8. Case Assembly and Evidence Workspace

8.1 Workspace Layout

Region	Content	Design principle
Header	alert id, subject, priority, SLA, route, status, owner	operational clarity before AI content
Trigger panel	scenario/model trigger facts and source version	source signal is always visible
Timeline	transactions, referrals, profile changes, prior alerts	chronology beats narrative
Graph	subject, accounts, counterparties, beneficial owners, devices	confidence and source shown on edges
CDD/EDD context	expected activity, risk profile, occupation/business, beneficial ownership	context for unusual activity, not static KYC display
Evidence cards	citations, raw facts, source record ids, freshness	every AI claim must trace here
Copilot panel	summary, questions, gap checklist, draft notes	assistant beside evidence
Disposition panel	options, rationale, reason codes, required evidence	human-owned decision
QA/control panel	warnings, missing fields, unsupported claims, SoD state	control state visible before close/escalate

8.2 Evidence Checklist

Evidence class	Minimum checks	Defect examples
Transaction sequence	amount, date/time, channel, originator, beneficiary, direction	missing counterparty, timezone mismatch
Customer profile	CDD risk, expected activity, account purpose, occupation/business	stale profile, missing beneficial owner
Counterparty context	relationship history, novelty, geography, recurrence	unresolved entity, unknown recipient
Prior activity	prior alerts, cases, SAR-sensitive metadata under access control	inaccessible prior case, missing repeat activity
Documents	KYC docs, invoices, account notes, analyst notes	unverified document, OCR extraction error
External context	advisories, sanctions referral, law-enforcement request indicator	restricted content exposed to wrong role
Analyst rationale	reason code, free-text explanation, evidence references	conclusion without facts

8.3 Evidence Quality Score

Evidence quality score should guide review readiness, not replace analyst judgment.

Dimension	Score question
Completeness	Are required evidence classes present for this scenario and customer segment?
Freshness	Is CDD/EDD and transaction data current enough under internal policy?
Consistency	Do sources agree on identity, ownership, account status and activity purpose?
Citation readiness	Can summary and draft claims cite source evidence ids?
Access validity	Is evidence visible only to authorized roles?
Replayability	Can the same evidence packet be reconstructed later?
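
A minimal sketch of how the six dimensions could roll up into a readiness signal. Equal weighting and the `review_ready` flag are assumptions made for illustration; the source does not define weights, and the result is advisory rather than a gate on analyst judgment.

```python
DIMENSIONS = (
    "completeness",
    "freshness",
    "consistency",
    "citation_readiness",
    "access_validity",
    "replayability",
)


def evidence_quality(checks: dict) -> dict:
    """Summarize pass/fail checks per dimension; a missing dimension counts as failed."""
    failed = [d for d in DIMENSIONS if not checks.get(d, False)]
    score = (len(DIMENSIONS) - len(failed)) / len(DIMENSIONS)
    return {
        "score": round(score, 2),
        "failed_dimensions": failed,
        "review_ready": not failed,  # advisory flag only
    }
```

Returning the failed dimensions by name, not just the number, keeps the score explainable in the QA/control panel.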

9. Analyst Copilot Design

9.1 Copilot Jobs

Job	Prompt posture	Output contract
Explain alert	"Summarize trigger facts only from evidence"	trigger summary with evidence ids
Build chronology	"Order observed facts by event time"	timeline entries, not motive
Compare to expected activity	"Contrast activity with CDD expected profile"	differences, source freshness, missing context
Summarize graph	"Describe confirmed and probable relationships separately"	graph facts with confidence
Identify evidence gaps	"List missing sources for review"	task list by source owner
Suggest next steps	"Offer investigation actions, not decisions"	analyst-selectable checklist
Draft case note	"Draft concise note with citations and uncertainty"	editable cited note
Pre-check disposition	"Find unsupported claims and missing rationale"	QA warnings
Draft SAR packet	"Prepare evidence-cited draft packet for human review"	draft marked non-final

9.2 Prohibited Copilot Behaviors

Behavior	Block
"This customer committed money laundering"	Replace with observed facts and suspected indicators
"File SAR" as final instruction	Use "escalate for SAR consideration" when evidence supports
"No SAR needed" as final conclusion	Present closure rationale option for human decision
Uncited factual assertion	Require evidence id or remove
Revealing SAR existence to unauthorized role	Permission filter and output block
Using sanctions hit to clear customer	Route to sanctions workflow
Creating customer communication	Block unless separate approved workflow exists
Training from unreviewed analyst notes	Route through feedback quality gate
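
Some of the blocks above can be approximated with a post-generation output check. The sketch below uses simple regular expressions and is an assumption-laden illustration: the pattern names and phrasings are hypothetical, and pattern matching alone would not be a sufficient guardrail in production.

```python
import re

# Hypothetical patterns for three rows of the table above.
BLOCK_PATTERNS = {
    "criminal_conclusion": re.compile(r"\bcommitted (money laundering|fraud|a crime)\b", re.IGNORECASE),
    "final_filing_instruction": re.compile(r"\bfile (a |the )?SAR\b", re.IGNORECASE),
    "final_no_sar_conclusion": re.compile(r"\bno SAR (is )?needed\b", re.IGNORECASE),
}


def check_output(text: str) -> list[str]:
    """Return the sorted names of every prohibited pattern found in the text."""
    return sorted(name for name, pattern in BLOCK_PATTERNS.items() if pattern.search(text))
```

A non-empty result would route the output back for rewording (for example "escalate for SAR consideration") rather than showing it to the analyst as written.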

9.3 Output Rubric

Dimension	Pass criteria
Groundedness	Every factual claim maps to source evidence
Boundary	Output states it is decision support
Uncertainty	Low/medium confidence links are labeled
Completeness	Missing evidence is named
Neutrality	No criminal conclusion or accusatory wording
Actionability	Suggested next step is role-appropriate
Confidentiality	SAR-sensitive and restricted data are handled by role

10. Disposition Recommendation Guardrails

10.1 Human-Owned Disposition Model

AI suggests options
  -> analyst reviews evidence
  -> analyst selects disposition
  -> required reason/evidence captured
  -> senior/compliance gate if escalation threshold reached
  -> QA sampling
  -> feedback registry

10.2 Disposition Options

Option	When suggested	Required evidence	Approval
Close as no unusual activity	activity consistent with CDD/EDD and benign evidence exists	trigger facts, benign rationale, CDD consistency	analyst or policy-defined reviewer
Continue monitoring	incomplete pattern or emerging activity without enough basis	repeat trigger, monitoring rationale, future review date	analyst/supervisor
Request more information	material evidence missing	missing source and owner	operations owner
Escalate investigation	risk drivers or unresolved gaps require deeper review	cited risk drivers, gap list	senior investigator route
Refer to fraud/sanctions/EDD	domain-specific control signal present	referral evidence and scope	receiving control function
SAR consideration packet	human sees sufficient suspicious activity concern for review	chronology, subject data, evidence packet	BSA/AML owner or internal policy role

10.3 Recommendation Explanation

Every recommendation should include:

option label.
evidence for.
evidence against.
missing evidence.
confidence boundary.
required human role.
downstream control gate.
why the option is not a final filing decision.

11. Suspicious Activity Escalation and SAR Draft Guardrails

11.1 Escalation Packet

escalation_packet:
  case_id: case_2026_771
  escalation_type: sar_consideration
  created_by: analyst_42
  ai_assisted: true
  basis:
    - observed rapid movement of funds
    - activity inconsistent with stated business purpose
    - repeated new counterparties
  evidence_ids:
    - ev_tx_01
    - ev_tx_02
    - ev_cdd_01
    - ev_graph_04
  unresolved_questions:
    - relationship to counterparty group
    - legitimate business explanation for wire corridor
  copilot_output_hash: hash_...
  human_owner: aml_supervisor_7

11.2 SAR Draft Workflow

Step	AI role	Human/control role
Evidence selection	suggest relevant facts and transactions	analyst selects and confirms
Draft chronology	order cited facts	analyst edits
Narrative assist	propose neutral wording with citations	BSA/AML reviewer approves wording
Quality pre-check	flag missing fields, unsupported facts, risky wording	reviewer resolves or documents exception
Filing handoff	prepare packet for approved filing process	authorized filer uses approved workflow
Recordkeeping	link draft, final packet id, evidence and approvals	compliance-owned retention process

11.3 SAR Draft Controls

Control	Implementation
No auto-file	No model/tool permission can submit SAR; filing API absent or human-token gated
Citation required	Draft sentences referencing facts require evidence ids
Unsupported claim blocker	Unsupported claims are removed or marked for human evidence entry
SAR-sensitive retrieval	RAG filters prior SAR content by role, case need and policy
Edit diff	Human edits to AI draft are versioned
Disclosure warning	UI warns against inappropriate SAR disclosure and customer notification
Final decision label	Draft states "AI-assisted draft for human review"
Filing record link	Workbench stores handoff id or acknowledgement metadata according to policy, not hidden copy leakage
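
The "Citation required" and "Unsupported claim blocker" controls can be sketched as a validator over draft sentences. Assumptions in this example: citations appear inline as `[ev_...]` markers, every sentence is treated as a factual claim (a simplification), and `known_ids` stands in for a lookup against the evidence ledger.

```python
import re

# Assumed inline citation format, e.g. "[ev_tx_01]".
EVIDENCE_ID = re.compile(r"\[(ev_[a-z0-9_]+)\]")


def citation_findings(sentences: list[str], known_ids: set) -> list[tuple[int, str]]:
    """Return (sentence_index, finding) for sentences that fail the citation check."""
    findings = []
    for index, sentence in enumerate(sentences):
        cited = EVIDENCE_ID.findall(sentence)
        if not cited:
            findings.append((index, "missing_citation"))
        elif any(evidence_id not in known_ids for evidence_id in cited):
            findings.append((index, "unknown_evidence_id"))
    return findings
```

This only checks that a citation exists and resolves; whether the cited evidence actually supports the sentence still needs human review, consistent with the citation validator not owning semantic legal sufficiency.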

12. QA and Quality Control

12.1 QA Sampling Strategy

Sample type	Purpose
Random baseline	measure overall quality and drift
Risk-based P1/P2	check high-risk investigation quality
Closure sample	detect weak no-file / no-unusual-activity rationale
AI-heavy sample	inspect cases with high copilot reliance
Low-confidence entity sample	detect graph/identity errors
SAR draft sample	check evidence, wording, completeness and confidentiality
Analyst outlier sample	detect training or incentive issues
Scenario tuning sample	measure before/after control impact

12.2 QA Rubric

Rubric area	Defect
Evidence completeness	required source missing without explanation
Citation quality	case note or draft claim lacks evidence id
CDD usage	expected activity ignored or stale CDD not flagged
Entity reasoning	probable link stated as confirmed
Disposition rationale	closure/escalation reason not tied to facts
SAR boundary	AI output treated as filing decision
Confidentiality	restricted content visible to unauthorized role
Timeliness	SLA breach without documented reason
Copilot safety	hallucination, unsupported inference, unsafe wording
Workflow control	maker-checker or approval bypass

12.3 QA Finding Record

qa_finding:
  finding_id: qaf_2026_1031
  case_id: case_2026_771
  sampled_reason: closure_sample
  defect_type: citation_quality
  severity: medium
  observation: closure rationale references counterparty relationship not supported by evidence
  required_action: reopen_note_for_correction
  owner: analyst_supervisor_3
  system_feedback:
    route: copilot_eval_set
    eligible_for_training: false
    reason: requires corrected human rationale first

13. Tuning Feedback and Continuous Improvement

13.1 Feedback Routes

Feedback source	Route	Not allowed
Analyst disagreement	eval enrichment, UI improvement, prompt review	direct model update without review
QA defect	training, control remediation, eval cases	hiding defect inside productivity metric
False-positive driver	scenario tuning, CDD feature improvement	suppressing segment without coverage review
False-negative proxy	scenario gap review, typology coverage update	ignoring because no alert was generated
Entity resolution correction	data quality and graph model tuning	silently overwriting historical graph without trace
SAR draft defect	prompt/RAG/citation validator fix	letting draft model self-certify
Data gap	source owner remediation	asking analyst to work around permanent missing source
Adoption friction	workflow redesign	forcing usage through KPI alone

13.2 Tuning Change Gate

Gate	Evidence required
Business rationale	problem statement, impacted scenario, expected benefit
Coverage check	typology/scenario link and blind-spot assessment
Data impact	source fields, lineage, DQ score, missingness by segment
Backtest	before/after alert volume, precision proxy, sample review
QA review	sample defects and closure quality
Model risk review	materiality, validation scope, independent challenge
Rollout plan	pilot cohort, monitoring, rollback criteria
Approval	scenario owner, BSA/AML owner, model risk or governance role as applicable

13.3 Anti-Gaming Guardrail

Any tuning proposal that only shows alert-volume reduction is incomplete. It must also show:

high-risk typology coverage impact.
sampled closure quality.
false-negative proxy or lookback check.
product/channel/customer segment slices.
data quality sensitivity.
operational capacity effect.
QA and audit replay readiness.
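
The completeness rule above can be checked mechanically before a tuning proposal reaches the change gate. The field names in this sketch are hypothetical labels for the seven items in the list; the check only confirms that evidence was supplied, not that it is adequate.

```python
# Hypothetical field names, one per required item in the list above.
REQUIRED_TUNING_EVIDENCE = (
    "typology_coverage_impact",
    "sampled_closure_quality",
    "false_negative_lookback",
    "segment_slices",
    "data_quality_sensitivity",
    "capacity_effect",
    "qa_audit_replay_readiness",
)


def missing_tuning_evidence(proposal: dict) -> list[str]:
    """Return required evidence items that are absent or empty in the proposal."""
    return [item for item in REQUIRED_TUNING_EVIDENCE if not proposal.get(item)]


def is_complete(proposal: dict) -> bool:
    """A proposal showing only alert-volume reduction is incomplete by construction."""
    return not missing_tuning_evidence(proposal)
```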

14. Model Risk, Eval and Validation

14.1 AI System Inventory

Inventory item	Required fields
Priority engine	model/rule version, features, route policy, owner, validation evidence
Entity resolution	algorithm, sources, confidence calibration, false merge/missed link metrics
Graph analytics	graph schema, edge confidence, temporal logic, access class
RAG retriever	source registry, index version, permission filter, freshness
LLM summarizer	model, prompt, output schema, prohibited behaviors, eval report
Disposition assist	rubric, options, boundary statement, human gate
SAR draft assistant	citation validator, wording guardrails, no-file/no-submit control
QA judge	rubric, human alignment, drift monitoring, false pass rate
Workflow engine	state transitions, SoD rules, audit events, fallback

14.2 Eval Suite

Eval category	Test examples	Release gate
Groundedness	all factual claims cite correct evidence	critical failures block
Missing evidence	system detects absent CDD/EDD/counterparty context	scenario-specific threshold
Entity resolution	false merge and missed link by segment	high-risk false merge hard stop
Queue ranking	P1/P2 quality, timeliness and coverage	no material coverage regression
SAR boundary	no auto-file, no final filing decision, no prohibited disclosure	hard stop
Prompt injection	malicious note/document attempts to override policy	hard stop for tool or data leakage
Confidentiality	SAR-sensitive and restricted content filtered by role	hard stop
Human oversight	reviewer can understand, challenge, override and escalate	UAT plus QA sample
QA judge	automated QA aligns with human QA	judge cannot be sole control
Drift	production distribution, source freshness, output quality	monitoring alert and review
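
The hard-stop rows in the eval suite can be turned into a release gate. In this sketch the category keys and the "count of hard-stop failures" input shape are assumptions; categories with softer gates (thresholds, UAT, monitoring) are left out and would need their own logic.

```python
# Hypothetical keys for the eval categories whose release gate is a hard stop or block.
HARD_STOP_CATEGORIES = (
    "groundedness_critical",
    "high_risk_false_merge",
    "sar_boundary",
    "prompt_injection",
    "confidentiality",
)


def release_decision(failure_counts: dict) -> tuple[bool, list[str]]:
    """Return (release_allowed, reasons). A category that was not run also blocks."""
    reasons = []
    for category in HARD_STOP_CATEGORIES:
        if category not in failure_counts:
            reasons.append(f"{category}:not_run")
        elif failure_counts[category] > 0:
            reasons.append(f"{category}:failed")
    return (not reasons, reasons)
```

Treating a missing eval as a blocker avoids a release passing simply because a hard-stop suite was skipped.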

14.3 Validation Questions

Area	Challenge question
Use case boundary	Is the system framed as decision support, or does workflow pressure make it de facto decisioning?
Conceptual soundness	Why are graph/RAG/LLM appropriate for this AML workflow?
Data adequacy	Are CDD, transactions, counterparty, sanctions/fraud referrals and case history complete enough?
Process verification	Are prompt/model/source/rule changes controlled and replayable?
Outcome analysis	Does quality improve without hidden coverage loss?
Human oversight	Can analysts and reviewers challenge AI with enough evidence and time?
Independent challenge	Are model risk/QA/internal audit able to test without builder conflict?

15. Audit Trail, Access and Segregation of Duties

15.1 Audit Event Schema

audit_event:
  event_id: aud_2026_88192
  event_type: disposition_selected
  case_id: case_2026_771
  alert_id: alert_2026_18491
  actor_id: analyst_42
  actor_role: aml_analyst
  timestamp: 2026-06-30T15:22:17Z
  action:
    disposition: escalate_for_sar_consideration
    reason_code: pattern_inconsistent_with_cdd
  evidence_ids:
    - ev_tx_01
    - ev_cdd_01
    - ev_graph_04
  system_versions:
    priority_model: prio_v2.1
    prompt: aml_summary_v1.8
    rag_index: aml_evidence_2026_06_29
    entity_resolution: er_v3.4
  output_hash: hash_...
  previous_state: in_review
  new_state: sar_consideration
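
One way to make events like the one above tamper-evident is to chain each record to the previous one with a hash. The following is a minimal in-memory sketch, not a production ledger: `AuditLog`, `REQUIRED_FIELDS` and the `"genesis"` seed are hypothetical, and a real deployment would also need durable, access-controlled storage.

```python
import hashlib
import json

# Assumption: the subset of schema fields treated as mandatory for replay.
REQUIRED_FIELDS = (
    "event_id", "event_type", "case_id", "actor_id", "timestamp",
    "evidence_ids", "system_versions", "previous_state", "new_state",
)


class AuditLog:
    """Append-only, hash-chained event list (illustrative only)."""

    def __init__(self):
        self._events = []

    @staticmethod
    def _digest(prev_hash, event):
        # Canonical JSON so the same event always hashes the same way.
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event):
        missing = [k for k in REQUIRED_FIELDS if k not in event]
        if missing:
            raise ValueError(f"audit event missing fields: {missing}")
        prev_hash = self._events[-1]["chain_hash"] if self._events else "genesis"
        chain_hash = self._digest(prev_hash, event)
        self._events.append({"event": dict(event), "chain_hash": chain_hash})
        return chain_hash

    def verify(self):
        """Recompute the chain; any edited or reordered event breaks it."""
        prev_hash = "genesis"
        for record in self._events:
            if self._digest(prev_hash, record["event"]) != record["chain_hash"]:
                return False
            prev_hash = record["chain_hash"]
        return True
```

The chain only detects alteration after the fact; it does not replace immutable storage or privileged access monitoring.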

15.2 Access Model

Data class	Access principle
Standard customer data	least privilege, business purpose, logged access
AML restricted evidence	role-based AML need-to-know
SAR-sensitive content	strict role and case need; no broad RAG exposure
NSL-sensitive indicator	special handling; avoid exposing content beyond policy
Sanctions hit detail	sanctions role visibility and referral controls
Model/prompt logs	sensitive operational access; redact customer data where possible
QA findings	QA/compliance/model risk access with management reporting aggregation
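
The access principles above imply that retrieval results are filtered per chunk before they reach the copilot. A minimal sketch follows, assuming each retrieved chunk carries a `data_class` and `case_id`; the role names and the `ALLOWED_ROLES` mapping are hypothetical and would come from the institution's own role matrix.

```python
# Hypothetical mapping from data class to roles allowed to retrieve it.
ALLOWED_ROLES = {
    "standard_customer": {"aml_analyst", "aml_reviewer", "qa_reviewer", "bsa_officer"},
    "aml_restricted": {"aml_analyst", "aml_reviewer", "bsa_officer"},
    "sar_sensitive": {"bsa_officer"},
}


def filter_chunks(chunks, role, assigned_case_ids):
    """Keep only chunks this role may see; unknown data classes are denied."""
    visible = []
    for chunk in chunks:
        allowed = ALLOWED_ROLES.get(chunk["data_class"], set())
        if role not in allowed:
            continue
        # SAR-sensitive content additionally requires a case-level need.
        if chunk["data_class"] == "sar_sensitive" and chunk["case_id"] not in assigned_case_ids:
            continue
        visible.append(chunk)
    return visible
```

Denying unknown data classes by default keeps newly added sources out of retrieval until they are classified.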

15.3 SoD Matrix

Duty combination	Control
Analyst triages and QA-reviews the same case	Block; assign independent QA
Analyst creates SAR draft and approves filing	Require BSA/AML owner or filing role per policy
Model owner changes priority model and approves validation	Independent model risk challenge
Scenario owner suppresses alert type without compliance approval	Tuning change gate and coverage review
Admin can alter evidence ledger and audit event stream	Immutable logs, dual control, privileged access monitoring
Vendor supplies model and certifies production quality	Internal validation, QA sample and audit rights
Queue manager lowers priority due to staffing alone	Document risk acceptance or reassign capacity
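
Some rows of this matrix can be enforced in the workflow engine as an incompatible-duty check. The sketch below assumes a per-case history of `(actor_id, duty)` pairs; the duty names and the `INCOMPATIBLE` set are hypothetical labels for the first three rows.

```python
# Hypothetical duty pairs that one actor must not hold on the same object.
INCOMPATIBLE = {
    frozenset({"triage", "qa_review"}),
    frozenset({"sar_draft", "sar_filing_approval"}),
    frozenset({"model_change", "validation_approval"}),
}


def sod_violation(history, actor_id, new_duty):
    """Return the conflicting prior duty, or None if the assignment is allowed.

    history: list of (actor_id, duty) tuples already recorded on this case.
    """
    for prior_actor, prior_duty in history:
        if prior_actor == actor_id and frozenset({prior_duty, new_duty}) in INCOMPATIBLE:
            return prior_duty
    return None
```

When a conflict is returned, the workflow would block the assignment and route the duty to an independent actor.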

16. Controls and Evidence Checklist

Control objective	Evidence	Owner
Alert sources are complete and approved	source inventory, data contract, ingestion reconciliation	AML Technology
Monitoring outputs are timely	alert generation logs, SLA dashboard	AML Operations
Queue routing is explainable	priority record, driver evidence, route policy	AML Operations
CDD/EDD context is used	CDD snapshot, freshness indicator, expected activity comparison	AML Compliance/Data Owner
Entity graph is controlled	match report, confidence calibration, correction log	Data/Analytics
Copilot outputs are grounded	citation validation logs, eval report, sampled outputs	AI Product/QA
SAR filing is human-owned	workflow state, approval record, no-submit technical control	BSA/AML Compliance
SAR-sensitive data is protected	access logs, role matrix, retrieval filter tests	Security/Compliance
QA is independent	sample plan, QA findings, reviewer independence evidence	QA/Compliance Testing
Tuning is governed	change request, coverage regression, approval	Scenario Owner/Model Risk
Model risk is managed	inventory, validation plan, monitoring, revalidation record	Model Risk
Audit replay is possible	append-only event log, version registry, evidence ids	Internal Audit/Platform
Deficiencies are remediated	issue log, CAPA, closure evidence, management reporting	Control Owner

17. Metrics Dashboard

17.1 Executive View

Metric	Healthy signal	Risk signal
Risk-adjusted investigation capacity	more high-risk cases reviewed with stable quality	throughput up, QA quality down
Queue aging	fewer aged P1/P2 alerts	old high-risk alerts accumulating
Evidence completeness	required evidence present before disposition	closures with missing CDD/counterparty context
SAR consideration quality	fewer unsupported draft defects	more narrative defects or late escalations
Coverage stability	no material typology/channel blind spot	alert suppression after tuning
Human oversight	meaningful overrides and reasoned disagreements	near-100 percent acceptance of AI suggestions
Audit completeness	trace events complete	missing model/prompt/evidence versions

17.2 Analyst Adoption View

Metric	Interpretation
Evidence card usage	whether analysts rely on structured evidence
Copilot summary edit distance	whether drafts are useful without being rubber-stamped
Gap checklist completion	whether AI improves investigation completeness
Recommendation override reasons	trust calibration and model/product issues
Time in source systems	reduction indicates workbench value
Escalation packet rework	high rework indicates evidence or UI issues

17.3 Model/Control View

Metric	Interpretation
Unsupported claim rate	hallucination or citation failure
Retrieval miss rate	RAG/source/index issue
Entity false merge rate	graph risk
Priority override concentration	routing/model bias or policy mismatch
QA judge false pass rate	automated QA cannot be trusted alone
SAR-sensitive access exceptions	confidentiality control risk
Tuning rollback count	change quality and coverage impact
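
As an illustration of how an oversight metric could be computed, the sketch below derives an AI-suggestion acceptance rate and flags the near-100 percent acceptance named as a risk signal in the executive view. The field names and the `0.98` alert threshold are assumptions, not values from this playbook.

```python
def rate(numerator, denominator):
    """Safe ratio; returns None when there is nothing to measure."""
    return numerator / denominator if denominator else None


def oversight_signals(decisions, acceptance_alert=0.98):
    """decisions: dicts with 'ai_suggestion' and 'final_disposition' keys.

    acceptance_alert is a hypothetical threshold; set it per risk appetite.
    """
    with_suggestion = [d for d in decisions if d.get("ai_suggestion") is not None]
    accepted = sum(
        1 for d in with_suggestion if d["final_disposition"] == d["ai_suggestion"]
    )
    acceptance = rate(accepted, len(with_suggestion))
    return {
        "acceptance_rate": acceptance,
        "over_acceptance_flag": acceptance is not None and acceptance >= acceptance_alert,
    }
```

A flagged rate is a prompt for review (for example of override reasons and QA samples), not proof of automation bias on its own.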

18. Implementation Roadmap

18.1 First 30 Days - Evidence-First MVP

Workstream	Deliverable
Scope	choose 2-3 alert scenarios and 1-2 queues
Data	source inventory, CDD/transaction/case evidence schema, lineage
UX	queue console, evidence cards, timeline, basic disposition panel
AI	retrieve-and-summarize with citation validator; no SAR draft yet
Controls	role model, audit event schema, no-auto-SAR technical boundary
Eval	groundedness, citation accuracy, missing evidence detection
Ops	analyst pilot, feedback taxonomy, daily defect review

18.2 Days 31-60 - Graph and Controlled Copilot

Workstream	Deliverable
Entity resolution	confidence-banded entity graph and correction workflow
Queue	driver-based prioritization and SLA routing
Copilot	gap checklist, investigation plan, case note draft
QA	sample plan, rubric, QA console, defect taxonomy
Model risk	system inventory, validation scope, revalidation triggers
Metrics	productivity, quality, adoption and control dashboard

18.3 Days 61-90 - SAR Packet and Governance Loop

Workstream	Deliverable
SAR assist	evidence-cited draft packet, unsupported-claim blocker, human filing handoff
Tuning	scenario feedback workflow, coverage regression, change approval
Independent challenge	model risk/QA validation report and issue log
Audit	case replay pack, event completeness test, privileged access review
Training	analyst AI literacy, automation-bias drill, SAR confidentiality handling
Management	monthly KRI/KPI pack and control remediation review

19. Implementation Guardrails

Area	Guardrail
Product scope	Start with evidence assembly and cited summaries before SAR narrative generation
Data	Do not ingest unrestricted SAR-sensitive content into broad RAG index
UX	Do not make AI recommendation the default selected disposition
Workflow	Require human reason code and evidence references for close/escalate decisions
Model	Version prompts, models, retrievers, rules, thresholds and graph algorithms
Feedback	Separate raw analyst action from QA-reviewed training label
Operations	Do not tune alert volume to staffing capacity without risk acceptance
Security	Enforce row/field/evidence-level access, not just page-level roles
Audit	Design trace schema before pilot; screenshots are not enough
Vendor	Contract for logs, version notices, data handling, audit rights and exit
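
The UX and Workflow guardrails above can be checked server-side when a disposition is submitted. A minimal sketch, assuming a submission payload with the hypothetical fields `actor_id`, `reason_code`, `evidence_ids` and `prefilled_from_ai`:

```python
def validate_disposition(submission):
    """Return a list of guardrail violations; an empty list means accept."""
    errors = []
    if not submission.get("actor_id"):
        errors.append("missing human actor")
    if not submission.get("reason_code"):
        errors.append("missing human reason code")
    if not submission.get("evidence_ids"):
        errors.append("missing evidence references")
    # The AI recommendation must never arrive as the pre-selected option.
    if submission.get("prefilled_from_ai"):
        errors.append("disposition was pre-selected from AI recommendation")
    return errors
```

Enforcing this in the backend rather than only in the UI keeps the guardrail in place if another client calls the workflow API.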

20. Anti-Patterns and Fixes

Anti-pattern	Symptom	Fix
LLM-first AML	impressive summaries but weak evidence	evidence ledger first, copilot second
SAR draft as MVP	narrative looks good, facts not controlled	build case assembly and citation validator first
Auto-close low score	lower backlog but higher blind-spot risk	human closure, QA sample, coverage metrics
Graph magic	dense network visual with no confidence	confidence-banded edges and source cards
Hidden decisioning	AI recommendation becomes default operational outcome	explicit human gate and non-default UI
Productivity tunnel vision	faster cases but more QA defects	balanced scorecard
Tuning by complaints	threshold changes react to queue pressure	governed change gate and regression testing
Self-validating AI	model/judge grades its own work	independent QA and human calibration
Audit after launch	missing versions and evidence ids	event schema and version registry from day one

21. PM / Architect Implications

21.1 PM Implications

PM concern	Practical stance
Value proposition	Sell capacity plus quality plus auditability, not "AI replaces analysts"
MVP	Evidence workspace and cited copilot before SAR draft
User adoption	Measure edit distance, evidence card usage, override reasons and rework
Risk appetite	Define which dispositions need senior review, QA and model risk gates
Training	Include automation bias, citation checking, confidentiality and escalation drills
Stakeholder alignment	Operations wants speed; Compliance wants quality; Model Risk wants validation; Audit wants replay

21.2 Architect Implications

Architecture concern	Practical stance
Source of truth	Source systems plus evidence ledger, not model text
Context engine	Entity graph and timeline should be independently testable
AI orchestration	Use structured outputs, citation validation and tool permissions
Security	SAR-sensitive and investigation data need fine-grained retrieval filters
Observability	Log user action, evidence ids, model/prompt/index versions and output hashes
Resilience	Provide manual fallback when AI/RAG/graph services degrade
Governance	Connect system inventory, release gates, monitoring and revalidation triggers

22. Interview Pack

Q1: How would you design an AML alert triage AI workbench?

30-second version:

I would design it as an evidence-first workbench, not a chatbot. The core chain is alert intake, entity resolution, CDD/EDD context, graph/timeline evidence assembly, risk-based queue prioritization, analyst copilot, human-owned disposition, SAR draft guardrails, QA, tuning feedback and audit replay. AI assists with ranking, summarization, gap detection and drafting, but does not make the SAR filing decision.

2-minute version:

Architecturally, build the evidence ledger and entity graph first. Each alert retains its source scenario/model version, trigger facts, SLA and route. Entity resolution outputs a confidence-banded subject graph, and every graph edge carries a source, timestamp and access class. In the workbench, the analyst sees the timeline, CDD expected activity, counterparty graph, prior case context and evidence cards. The copilot may only produce evidence-based structured output (summary, gap checklist, next step and disposition options), and every fact must carry an evidence id. The SAR draft is only a human-reviewed packet, with no auto-filing tool permission. Go-live controls include QA sampling, feedback taxonomy, scenario tuning change gate, model risk validation, SAR-sensitive access control and an append-only audit trail. Value metrics track investigation time, QA defects, queue aging, coverage, overrides and audit completeness together.

Q2: How do you prevent AI from misleading analysts?

Control	Explanation
Evidence-first UI	Analyst sees the evidence and its sources first, then the AI summary
Citation validator	Unsupported factual claim cannot pass
Non-default recommendation	AI option cannot be one-click default decision
Uncertainty label	Low/medium confidence graph links are clearly labeled
QA sample	AI-heavy cases sampled more aggressively
Training	Automation bias and confidentiality drills
Override analytics	Monitor over-acceptance and disagreement quality

Q3: How do you handle SAR drafts?

Principle	Answer
Boundary	AI drafts evidence-cited packet, not filing decision
Evidence	Every factual statement maps to transaction/CDD/case evidence
Language	Neutral, observed facts, no criminal conclusion
Confidentiality	SAR-sensitive access and retrieval controls
Approval	Human BSA/AML owner reviews the draft; the filing process remains controlled
Audit	Store model/prompt/evidence/diff/approval chain
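
The Evidence principle can be partly checked mechanically before a draft reaches a reviewer. The sketch below assumes draft sentences cite evidence inline as `[ev_...]`; that bracket convention and the function name are hypothetical. It only confirms that a citation exists and points to a known evidence id, not that the evidence actually supports the statement, which still needs human or separate eval review.

```python
import re

# Assumption: citations appear inline as [ev_tx_01], [ev_cdd_01], and so on.
CITATION = re.compile(r"\[(ev_[A-Za-z0-9_]+)\]")


def validate_citations(sentences, evidence_ledger):
    """Return (sentence_index, problem) pairs for sentences that fail the check.

    evidence_ledger: set of evidence ids attached to the case.
    """
    problems = []
    for index, sentence in enumerate(sentences):
        cited = CITATION.findall(sentence)
        if not cited:
            problems.append((index, "no citation"))
            continue
        unknown = [ev for ev in cited if ev not in evidence_ledger]
        if unknown:
            problems.append((index, f"unknown evidence: {unknown}"))
    return problems
```

A non-empty result would keep the draft packet out of the human review queue until the flagged sentences are fixed or removed.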

Q4: How do you demonstrate that the system has not reduced AML coverage?

Evidence	Use
Typology/scenario coverage matrix	shows what remains covered after tuning
Priority distribution by scenario	detects starvation of low-volume risks
QA closure sample	catches weak closures
False-negative proxies	repeat alerts, lookbacks, fraud referrals, law-enforcement feedback where permitted
Segment slices	product/channel/geography/customer type stability
Regression eval	before/after comparison across model, rule, prompt and threshold changes
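
A before/after comparison of per-scenario volumes is one simple way to surface starvation of low-volume risks. In the sketch below, the inputs are assumed to be counts of alerts that reached analyst review per scenario, and the `max_drop` tolerance of 20 percent is a hypothetical placeholder for a policy-defined threshold.

```python
def coverage_regressions(before, after, max_drop=0.2):
    """Flag scenarios whose reviewed-alert volume fell sharply or vanished.

    before / after: dict of scenario -> count of alerts reaching review.
    Returns dict of scenario -> fractional drop for flagged scenarios.
    """
    findings = {}
    for scenario, baseline in before.items():
        if baseline == 0:
            continue  # nothing to compare against
        current = after.get(scenario, 0)
        drop = (baseline - current) / baseline
        if current == 0 or drop > max_drop:
            findings[scenario] = round(drop, 3)
    return findings
```

A flagged scenario is a trigger for coverage review; a drop can be legitimate, but it should be explained and approved rather than discovered later.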

23. Relationship to Existing Assets

Repo asset	Relationship
`docs/ai-foundations/papers/144-ai-aml-alert-triage-investigation-workbench-architecture.md`	Architecture-interpretation companion to this playbook.
`docs/AI_FINANCIAL_CRIME_TYPOLOGY_SCENARIO_COVERAGE_PLAYBOOK.md`	This playbook links to typology coverage and does not repeat red flag/SAR narrative coverage.
`docs/AML_COPILOT_PRD.md`	Can serve as prototype-first MVP framing.
`docs/AML_GOVERNANCE_MAP.md`	Can serve as an AML copilot governance snapshot.
`docs/AI_HUMAN_OVERSIGHT_HITL_PLAYBOOK.md`	Goes deeper on human oversight, override, handoff and stop path.
`docs/AI_MODEL_RISK_MANAGEMENT_PLAYBOOK.md`	Goes deeper on AI system inventory, validation, monitoring and model risk lifecycle.
`docs/AI_SEGREGATION_OF_DUTIES_DUAL_CONTROL_PLAYBOOK.md`	Goes deeper on maker-checker, dual control and incompatible duties.
`docs/AI_AUDIT_EVIDENCE_BINDER_PLAYBOOK.md`	Goes deeper on control evidence, audit binder and regulator-ready evidence map.