AI Financial Crime Typology / Scenario Coverage Playbook

This document is material for learning, portfolio building, architecture practice, and internal governance discussion.

AI Financial Crime Typology / Scenario Coverage / SAR Evidence Architecture Playbook

Positioning: for senior AI PMs, Senior BAs, Product Architects, AML Technology Architects, and Financial Crime Transformation Leads. The aim is to move AML / fraud / sanctions / scam / SAR-assist AI from single-point detection models to a typology-driven, coverage-aware, evidence-backed, human-owned production control system.

Scope: transaction monitoring, AML alert triage, fraud-to-AML referral, KYC/CDD refresh, case investigation copilot, SAR narrative assistant, FinCEN advisory response, scenario coverage review, and model/rule/LLM triage governance.

Core outputs: typology object model, scenario library governance, red flag evidence mapping, coverage measurement, synthetic and real-case eval, SAR narrative evidence bundle, alert-to-case-to-SAR traceability, operating model, KRI dashboard, and a 30-day portfolio lab.

0. Disclaimer

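This document is not legal advice, a compliance conclusion, a SAR filing decision, a suspicious activity determination, a regulatory interpretation, a model validation report, or an audit report.

Any real project must be confirmed by Legal, BSA/AML Compliance, Fraud, Sanctions, Risk, Model Risk, Privacy, Security, Operations, Internal Audit, the Business Owner, and management, taking into account institution type, jurisdiction, products, customers, channels, data, regulatory commitments, and internal policy.

Key boundaries:

AI may help identify signals, organize evidence, check for gaps, generate analyst summaries, draft narratives, and run QA pre-checks.
AI should not replace the SAR decision of the human AML / BSA / Compliance owner.
AI should not infer a criminal conclusion from a red flag.
AI should not expose SAR-sensitive information to unauthorized users or to customers.
AI should not decide whether to notify a customer, close an account, contact law enforcement, or file a SAR.
SAR confidentiality, supporting documentation, retention, access, and disclosure must be handled under applicable rules and institutional policy.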

1. Executive Framing

A common mistake in financial crime AI is to shrink the goal to a single metric:

reduce false positives
increase alert precision
draft SAR narratives faster
automate analyst work

These goals are valuable but not sufficient. The senior-level architecture goal should be:

typology coverage
  + scenario effectiveness
  + evidence completeness
  + human decision ownership
  + SAR quality
  + audit replay
  + continuous threat update

In one sentence:

Financial crime AI is a coverage and evidence architecture before it is a model optimization problem.

1.1 Product Principles

The typology library is a product control asset.
The scenario inventory is an architecture asset.
The SAR evidence bundle is a compliance decision asset.
LLM triage is an assistant capability, not a decision authority.
The coverage dashboard should feed into management review.
QA findings must flow back into typology, scenario, data, prompt, model, training, and staffing.

1.2 Strong Questions

Question	Strong answer
Which typologies are in scope	typology registry with source anchors and owners
Which products/channels/customers are covered	coverage matrix by segment and data dependency
Which red flags are observable	red flag evidence map
Which scenarios are active / partial / manual / gap	scenario inventory
Which data quality issues create blind spots	feature contract and DQ score
Which cases can be replayed	evidence ledger and trace id
Which SAR narratives have evidence defects	SAR QA rubric and sample findings

2. Source Anchors

Anchor	Official link	How this document uses it
FFIEC BSA/AML Suspicious Activity Reporting Overview	https://bsaaml.ffiec.gov/manual/SuspiciousActivityReporting/01; current manual path: https://bsaaml.ffiec.gov/manual/AssessingComplianceWithBSARegulatoryRequirements/04	Organizes the evidence chain across suspicious activity identification, alert management, SAR decision making, SAR completion, supporting documentation, confidentiality, continuing activity, and board/management notification
FFIEC BSA/AML Appendix F Red Flags	https://bsaaml.ffiec.gov/manual/Appendices/08; current Appendix F path: https://bsaaml.ffiec.gov/manual/Appendices/07	Organizes red flag taxonomy, additional scrutiny, key terms, management focus, and typology-to-scenario mapping; `/Appendices/08` currently shows Appendix G Structuring, so verify the official path before citing it in practice
FinCEN Advisories / Bulletins / Fact Sheets	https://www.fincen.gov/resources/advisoriesbulletinsfact-sheets	Serves as the source feed for emerging typologies, red flags, key terms, sector threats, and scenario refresh
FinCEN BSA Filing Information	https://www.fincen.gov/resources/filing-information	Constrains BSA E-Filing, electronic filing instructions, SAR/CTR filing resources, and the filing operations handoff
FATF Recommendations	https://www.fatf-gafi.org/en/publications/Fatfrecommendations/Fatf-recommendations.html	Reference for the international AML/CFT/CPF risk-based framework, CDD, recordkeeping, suspicious transaction reporting, and effectiveness orientation
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize AI risk ownership, scenario mapping, eval, monitoring, KRI, and continuous improvement

Architecture design should not blend these sources into a single rule set. A better path is:

source anchor -> control objective -> scenario requirement -> evidence requirement
  -> human owner -> review cadence -> jurisdiction / entity applicability check

3. Typology Object Model

A typology is a business and risk object that describes a financial crime pattern. It is not an alert rule, and it is not a SAR category.

3.1 Typology Entity

typology_id: typ_mule_account_digital_onboarding
name: Digital mule account with rapid movement
risk_family: money_laundering_fraud
source_anchors: [FinCEN_advisory, FFIEC_red_flags, internal_risk_assessment]
customer_segments: [consumer, student, gig_worker]
products_channels: [digital_account_opening, ACH, debit_card, P2P, crypto_on_ramp]
red_flags: [rapid_movement, unrelated_counterparties, shared_device_cluster, thin_profile]
evidence_requirements: [account_open_date, transaction_sequence, counterparty_graph, device_network, KYC_profile, analyst_notes]
SAR_relevance: possible_key_terms_and_narrative_prompts
owner: AML Typology Owner
review_cadence: monthly_high_risk_or_quarterly_standard
status: active

3.2 Required Fields

Field	Purpose
`typology_id`	Stable identity for coverage and evidence
`risk_family`	ML, TF, sanctions, fraud, scam, elder exploitation, cyber, TBML
`source_anchor`	FFIEC, FinCEN, FATF, internal risk assessment, audit issue, law enforcement feedback
`business_definition`	Plain-language description for PM / BA / analyst
`customer_scope`	Customer segments where it applies
`product_channel_scope`	Products and channels where it can manifest
`red_flag_ids`	Observable indicators
`scenario_ids`	Detection and review scenarios
`evidence_requirements`	Minimum evidence for investigation and SAR consideration
`false_positive_drivers`	Legitimate explanations or benign patterns
`human_review_questions`	Questions analyst must answer
`SAR_relevance`	Potential SAR categories, key terms, narrative prompts
`control_owner`	Accountable owner
`coverage_status`	active, partial, manual, gap, retired

3.3 Typology Families

Family	Examples	Design focus
Structuring	cash below threshold, monetary instruments, multi-branch pattern	transaction sequence and reporting avoidance context
Mule accounts	rapid movement, pass-through, network cluster	graph, velocity, onboarding, counterparty
Scams	romance scam, investment scam, impersonation	customer narrative, payment destination, intervention
Elder exploitation	pressure, confusion, new beneficiary, unusual transfer	vulnerability protection and escalation
Check fraud	altered item, stolen mail, duplicate deposit	image, return code, payee and fraud referral
Cyber-enabled crime	account takeover, ransomware payment, BEC	device, IP, beneficiary, urgent payment narrative
TBML	invoice mismatch, high-risk corridors, shell trade entities	trade documents and specialist review
Sanctions evasion	front companies, intermediary layering	sanctions screening and network intelligence
Terrorist financing	small-value patterns, high-risk geography, NPO misuse	FATF-informed risk and enhanced review

3.4 Relationship Design

Typologies overlap. Elder exploitation may involve a romance scam, mule accounts, and a crypto exit. The object model should allow primary typology, secondary typology, related typology, advisory-driven update, and superseded typology relationships.

4. Scenario Library Governance

A scenario is the executable way a typology is covered. It can be a rule, threshold, anomaly model, graph detector, supervised model, LLM triage rubric, manual review protocol, or hybrid orchestration.

4.1 Scenario Record

scenario_id: scn_structuring_cash_multi_branch_001
typology_id: typ_structuring_cash
name: Multi-branch below-threshold cash deposit pattern
detection_method: rule_plus_profile_contrast
business_logic: Detect repeated cash deposits below reporting threshold across branches within rolling window, contrasted with expected customer activity.
data_dependencies: [cash_amount, branch_id, teller_channel, customer_id, account_id, transaction_date, expected_cash_activity]
evidence_required: [transaction_timeline, branch_pattern, customer_profile, prior_cash_activity, business_nature]
human_review_questions: [expected_for_customer, legitimate_purpose, reporting_avoidance_pattern]
false_positive_drivers: [seasonal_cash_business, event_driven_cash_receipts]
coverage_status: active
owner: AML Scenario Owner

4.2 Lifecycle

candidate -> design -> data feasibility -> calibration -> UAT with historical cases
  -> pilot -> production -> QA monitoring -> tuning -> retired or replaced
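
The lifecycle chain above can be sketched as a transition table that rejects skipped stages. The state names are snake_case renderings of the stages in the chain, and `advance` is a hypothetical helper. The table deliberately contains only the transitions the chain lists; a real workflow would probably also need loops (for example, tuning back to production), which this sketch does not assume.

```python
# Allowed transitions, read directly off the lifecycle chain above.
LIFECYCLE = {
    "candidate": {"design"},
    "design": {"data_feasibility"},
    "data_feasibility": {"calibration"},
    "calibration": {"uat_historical_cases"},
    "uat_historical_cases": {"pilot"},
    "pilot": {"production"},
    "production": {"qa_monitoring"},
    "qa_monitoring": {"tuning"},
    "tuning": {"retired", "replaced"},
    "retired": set(),   # terminal
    "replaced": set(),  # terminal
}


def advance(current: str, target: str) -> str:
    """Return the new state, or raise if the move skips a governance stage."""
    if current not in LIFECYCLE:
        raise ValueError(f"unknown lifecycle state: {current}")
    if target not in LIFECYCLE[current]:
        allowed = sorted(LIFECYCLE[current]) or ["<none, terminal state>"]
        raise ValueError(f"cannot move {current} -> {target}; allowed: {allowed}")
    return target
```

For example, `advance("pilot", "production")` succeeds, while `advance("candidate", "production")` raises because it skips design, data feasibility, calibration, UAT, and pilot.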

4.3 Governance Questions

Stage	Questions
Candidate	Which typology and red flags does it cover?
Design	What data and evidence are required?
Calibration	What is expected false positive and false negative risk?
Pilot	Did analysts find the evidence useful?
Production	Are alerts routed to the right queue?
QA	Are dispositions consistent and documented?
Tuning	Does threshold change reduce coverage?
Retirement	What compensating control replaces it?

4.4 Coverage States

State	Meaning	Example
Active automated	production rule/model detects and routes alerts	cash structuring rule
Active hybrid	model plus analyst review	mule graph cluster
Manual-only	policy requires manual referral	elder exploitation branch referral
Partial	only some products/channels covered	wire covered, RTP not covered
Gap	known typology without adequate scenario	new scam advisory not implemented
Retired	no longer active with replacement documented	old threshold replaced by model
Suppressed	disabled with risk acceptance	temporary data quality incident

4.5 Review Triggers

Trigger	Action
New FinCEN advisory	typology and red flag impact review
Product launch	product/channel coverage assessment
New payment rail	scenario feasibility review
Model/rule tuning	coverage regression check
SAR QA defect spike	scenario evidence review
Law enforcement feedback	typology update
Audit finding	control remediation
Data source change	data dependency retest

5. Red Flag / Evidence Mapping

A red flag is a potential indicator, not a conclusion. The design goal is to turn each red flag into observable evidence and a review question.

5.1 Red Flag Object

red_flag_id: rf_rapid_movement_of_funds
name: Rapid movement of funds through account
observable_data: [inbound_amount, outbound_amount, time_between_flows, account_age, counterparty_count]
evidence_questions:
  - Is this expected for the customer's profile and purpose?
  - Are counterparties related, known, or high risk?
  - Is there a legitimate documented explanation?
typologies: [mule_account, scam_proceeds, layering]
false_positive_drivers: [payroll_processor, marketplace_seller, escrow_like_activity]
severity: medium_high

5.2 Evidence Mapping Table

Red flag	Observable data	Investigation question	Evidence artifact
Activity inconsistent with profile	expected vs actual behavior	What changed and why?	KYC/CDD profile, transaction history
Multiple small cash deposits	amount, branch, frequency	Is pattern consistent with reporting avoidance?	cash timeline, branch map
Rapid movement	inbound/outbound timing	Is account pass-through?	transaction sequence
New beneficiary high-value wire	beneficiary, amount, relationship	Is beneficiary expected or suspicious?	beneficiary profile, customer notes
Shared device/address cluster	device, IP, address, phone	Are accounts connected?	graph cluster
High-risk geography	origin/destination, corridor	Is there plausible business reason?	wire details, customer business
Scam narrative	customer notes, payment purpose	Is customer being coached or deceived?	call transcript, chat extract
Check alteration	image, MICR, return code	Is item altered or stolen?	item image, return notice

5.3 Evidence Quality Levels

Level	Description
E0	AI statement only, no source evidence
E1	Single transaction or note, weak context
E2	Transaction timeline plus customer profile contrast
E3	Multi-source evidence: transactions, KYC, counterparty, notes
E4	Multi-source evidence plus analyst research and alternative explanations
E5	Complete bundle with supporting documents, source versions, human decision and QA readiness

Minimum target: alert triage can start with E1/E2; case escalation should reach E3; SAR consideration should target E4/E5 where available.
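
One way to make the minimum targets checkable is an ordered enum plus a per-stage floor. This is a sketch: the stage keys (`alert_triage`, `case_escalation`, `sar_consideration`) and the function name `meets_minimum` are hypothetical labels, and the floors simply take the lower bound of each range in the sentence above.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    E0 = 0  # AI statement only, no source evidence
    E1 = 1  # single transaction or note
    E2 = 2  # timeline plus profile contrast
    E3 = 3  # multi-source evidence
    E4 = 4  # multi-source plus analyst research and alternatives
    E5 = 5  # complete bundle, QA ready


# Lower bound of each "minimum target" range; stage names are assumptions.
MINIMUM_LEVEL = {
    "alert_triage": EvidenceLevel.E1,
    "case_escalation": EvidenceLevel.E3,
    "sar_consideration": EvidenceLevel.E4,
}


def meets_minimum(stage: str, level: EvidenceLevel) -> bool:
    """True when the evidence level reaches the floor for the workflow stage."""
    return level >= MINIMUM_LEVEL[stage]
```

Note that E0 never passes any stage, which matches the intent that an AI statement without source evidence cannot start a review on its own.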

6. Coverage Measurement

Coverage measurement answers:

Are the risks we say we monitor actually represented by scenarios,
data, evidence, review capacity and quality feedback?

6.1 Coverage Dimensions

Dimension	Question
Typology	Is the typology in the registry and active?
Product	Which products are covered?
Channel	Branch, ATM, mobile, online, ACH, wire, RTP, card, P2P, crypto on-ramp
Customer segment	Consumer, SMB, MSB, NPO, private banking, senior customer
Data	Are required fields available and reliable?
Scenario	Is detection automated, hybrid, manual or gap?
Evidence	Does the alert provide required evidence?
Review capacity	Can analysts review within SLA?
QA	Are outcomes sampled and defects remediated?
Change control	Does tuning preserve coverage?

6.2 Coverage Matrix

Typology	Product	Channel	Segment	Scenario	Data	Evidence	Coverage
Structuring	deposit	branch cash	consumer / SMB	scn_cash_multi_branch	good	E3	active
Mule account	DDA	digital + ACH	consumer	scn_mule_graph_velocity	partial device	E3	active hybrid
Elder exploitation	deposit	wire / branch	senior	manual referral + wire red flag	good	E2/E3	partial
TBML	commercial	wire / trade	SMB	specialist review	invoice partial	E2	manual-only
Scam crypto exit	deposit	ACH / crypto on-ramp	consumer	payment purpose + velocity	partial	E2	gap/partial

6.3 Coverage Score

coverage_score =
  typology_scope_score
  * data_availability_score
  * scenario_activation_score
  * evidence_quality_score
  * review_capacity_score
  * QA_feedback_score

Score	Meaning
0.00	no known coverage
0.25	manual or ad hoc coverage
0.50	partial scenario or weak data
0.75	active scenario with usable evidence
1.00	active scenario, strong evidence, QA and change control
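
The product formula can be sketched in a few lines of Python. The factor names come from the formula above; the function name `coverage_score` and the 0-to-1 range check are assumptions made for illustration.

```python
from math import prod

# Factor names copied from the coverage_score formula.
FACTORS = (
    "typology_scope_score",
    "data_availability_score",
    "scenario_activation_score",
    "evidence_quality_score",
    "review_capacity_score",
    "QA_feedback_score",
)


def coverage_score(scores: dict[str, float]) -> float:
    """Multiply the six factor scores; each is assumed to lie in [0, 1]."""
    missing = [name for name in FACTORS if name not in scores]
    if missing:
        raise ValueError(f"missing factors: {missing}")
    for name in FACTORS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return prod(scores[name] for name in FACTORS)


uniform = {name: 0.75 for name in FACTORS}
print(round(coverage_score(uniform), 3))  # 0.178
```

Because the factors multiply, six factors at 0.75 compose to roughly 0.18, and a single zero drives the result to zero. One reading, offered as an interpretation rather than something this document states, is that the band table is easiest to apply per factor, with the product used to rank typologies against each other.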

6.4 Dashboard Metrics

Metric	Why it matters
Typology coverage percentage	Board / management view of risk coverage
Scenario active vs partial vs gap	Shows control maturity
Coverage by product/channel	Finds new rail blind spots
Evidence completeness by typology	Measures investigation readiness
Alert-to-case conversion by scenario	Shows triage usefulness
Case-to-SAR consideration by typology	Shows escalation behavior
SAR QA defect by typology	Finds narrative/evidence issues
Scenario stale age	Finds outdated thresholds
Advisory-to-scenario SLA	Measures threat responsiveness

7. Synthetic vs Real-Case Eval

Financial crime AI eval must balance confidentiality, realism, coverage, and repeatability.

7.1 Synthetic Eval

Synthetic cases are useful for typology coverage testing, red flag recognition, edge case generation, prompt regression, model comparison, analyst training, and SAR narrative rubric calibration.

Strengths: repeatable, safe to share, can cover rare typologies, can create known ground truth, can stress specific evidence gaps.

Weaknesses: too clean, weak operational noise, may miss real customer ambiguity, may overrepresent obvious red flags.

7.2 Real-Case Eval

Real cases are useful for operational realism, data quality issues, analyst workflow friction, source-system gaps, alert disposition variation, SAR narrative quality, and false positive drivers.

Controls needed:

SAR confidentiality review.
Data minimization.
De-identification / tokenization.
Access control.
Retention policy.
Sampling approval.
Exclusion of prohibited data from external model training.

7.3 Eval Set Composition

Bucket	Example	Purpose
Clear suspicious	obvious structuring pattern	recall baseline
Benign lookalike	seasonal cash business	false positive control
Ambiguous	rapid movement with partial explanation	uncertainty handling
Missing evidence	no expected activity profile	gap detection
Multi-typology	elder scam through mule account	relationship reasoning
Advisory-specific	new FinCEN typology	update responsiveness
Negative control	normal payroll / merchant settlement	over-alert prevention
Confidentiality trap	prompt asks for SAR existence disclosure	safety control

7.4 Eval Metrics

Metric	Meaning
Red flag recall	Did AI identify relevant indicators?
False red flag rate	Did AI invent unsupported indicators?
Evidence grounding	Are claims linked to source records?
Gap detection	Did AI identify missing evidence?
Typology classification accuracy	Did AI map to correct typology family?
Narrative completeness	Who, what, when, where, why, how
Human decision respect	Did AI avoid file/no-file recommendation?
Confidentiality compliance	Did AI avoid impermissible disclosure?

7.5 Acceptance Criteria

Unsupported red flag invention <= 2% on gold set.
SAR confidentiality breach = 0 tolerated in release test.
Evidence grounding >= 95% for cited transaction facts.
Gap detection recall >= 90% for intentionally incomplete cases.
File/no-file directive phrases blocked at 100% in SAR decision context.
Analyst usefulness score improves without lowering evidence completeness.
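
The numeric criteria above lend themselves to a simple release gate. The sketch below is illustrative: `EvalResults` and `release_gate_failures` are hypothetical names, rates are expressed as fractions between 0 and 1, and the analyst usefulness criterion is left out because the document gives no numeric threshold for it.

```python
from dataclasses import dataclass


@dataclass
class EvalResults:
    unsupported_red_flag_rate: float  # measured on the gold set
    confidentiality_breaches: int     # count observed in the release test
    evidence_grounding_rate: float    # cited transaction facts linked to source
    gap_detection_recall: float       # on intentionally incomplete cases
    directive_block_rate: float       # file/no-file phrases blocked


def release_gate_failures(r: EvalResults) -> list[str]:
    """Return the criteria that fail; an empty list means the gate passes."""
    failures = []
    if r.unsupported_red_flag_rate > 0.02:
        failures.append("unsupported red flag invention above 2%")
    if r.confidentiality_breaches != 0:
        failures.append("SAR confidentiality breach observed")
    if r.evidence_grounding_rate < 0.95:
        failures.append("evidence grounding below 95%")
    if r.gap_detection_recall < 0.90:
        failures.append("gap detection recall below 90%")
    if r.directive_block_rate < 1.0:
        failures.append("file/no-file directive phrases not fully blocked")
    return failures
```

Returning the list of failed criteria, rather than a single boolean, keeps the gate output usable as evidence in a change board record.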

8. SAR Narrative Evidence Bundle

SAR narrative support must be evidence-first. An LLM draft should not be promoted unless the supporting evidence index is complete enough for human review.

8.1 Bundle Structure

case_id: case_aml_2026_00421
trace_id: trace_00421
alert_ids: [alert_7781]
typologies: [typ_structuring_cash]
scenarios: [scn_cash_multi_branch_001]
red_flags: [rf_multiple_below_threshold_cash_deposits, rf_activity_inconsistent_with_profile]
evidence_index:
  transaction_ids: [txn_001, txn_002]
  customer_profile_ref: cdd_2026_03
  branch_notes_ref: note_884
AI_assistance:
  model_version: llm_gateway_aml_summary_v3
  prompt_version: sar_evidence_gap_check_v2
  output_hash: hash_value
human_review:
  analyst_id: analyst_17
  reviewer_id: sar_committee_03
  decision_owner: BSA_Officer_delegate
  decision: SAR_considered
confidentiality:
  retention_class: SAR_sensitive
  access_policy: AML_need_to_know
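
The bundle leaves `output_hash` as a placeholder. One possible way to fill it, shown here as an assumption rather than the document's prescribed method, is a SHA-256 digest over the model version, prompt version, and output text, serialized deterministically. The function name `output_hash` mirrors the field name and is otherwise hypothetical.

```python
import hashlib
import json


def output_hash(model_version: str, prompt_version: str, output_text: str) -> str:
    """Digest the AI output together with the versions that produced it."""
    # sort_keys makes the serialization stable, so the same inputs
    # always produce the same digest.
    payload = json.dumps(
        {
            "model_version": model_version,
            "prompt_version": prompt_version,
            "output_text": output_text,
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Including the versions in the digest is a design choice: the same text produced by a different prompt version then hashes differently, which helps audit replay distinguish the two.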

8.2 Narrative Building Blocks

Block	Evidence source
Subject identity	KYC/CDD, account records
Transaction chronology	transaction timeline
Unusual activity explanation	customer profile contrast
Red flags	scenario output and analyst notes
Amounts and dates	source transactions
Counterparties	beneficiary/payee/counterparty records
Customer explanation	call notes, branch notes, written response
Analyst research	case notes, external permitted research
Uncertainty	missing evidence and limitations
Key terms	applicable FinCEN advisory guidance if relevant

8.3 LLM Guardrails

LLM may summarize transaction chronology, extract candidate red flags, map facts to typology candidates, list evidence gaps, draft analyst-facing narrative sections, check narrative against rubric, and identify unsupported claims.

LLM must not decide SAR filing, state that a crime occurred, notify customer of SAR existence, invent explanations, hide uncertainty, remove adverse evidence to make narrative cleaner, bypass human review, or access SAR evidence outside need-to-know policy.
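
Part of the prohibited list can be screened with a phrase-level output check. The patterns below are illustrative assumptions, not a vetted control: they catch a few obvious filing directives and crime conclusions, and a real gate would need broader coverage, testing against a gold set, and human review behind it. The names `DIRECTIVE_PATTERNS` and `blocked_reasons` are hypothetical.

```python
import re

# Illustrative patterns only; a production gate needs far wider coverage.
DIRECTIVE_PATTERNS = {
    # "should file", "recommend filing", "must not file", ...
    "recommends_filing_action": re.compile(
        r"\b(?:should|must|recommend(?:s|ed)?|advise[sd]?)\b[^.]{0,40}?\bfil(?:e|ing)\b",
        re.IGNORECASE,
    ),
    # "do not file", "don't file"
    "do_not_file": re.compile(r"\b(?:do\s+not|don't)\s+file\b", re.IGNORECASE),
    # "committed money laundering", "is guilty of fraud", ...
    "crime_conclusion": re.compile(
        r"\b(?:committed|is\s+guilty\s+of)\b[^.]{0,40}?\b(?:laundering|fraud|crime)\b",
        re.IGNORECASE,
    ),
}


def blocked_reasons(text: str) -> list[str]:
    """Return the names of every pattern the text trips; empty means no hit."""
    return [name for name, pattern in DIRECTIVE_PATTERNS.items() if pattern.search(text)]
```

A factual sentence such as "Twelve cash deposits were made across three branches." returns an empty list, while "We recommend filing a SAR." returns `["recommends_filing_action"]`. A phrase filter like this cannot by itself show that prohibited language is fully blocked; it is one layer in front of the human reviewer.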

9. Model / Rule / LLM Triage Comparison

Method	Strength	Weakness	Best use
Deterministic rule	explainable, stable, easy to audit	threshold brittleness, high false positives	known red flags and regulatory-sensitive patterns
Statistical anomaly	finds deviations	hard to explain, can drift	unusual customer behavior vs baseline
Supervised model	learns patterns from labels	label bias, concept drift	prioritization with strong labels
Graph model	network detection	data integration complexity	mule networks and entity clusters
LLM triage	text synthesis, evidence organization	hallucination, confidentiality, automation bias	analyst summary, gap check, narrative assist
Hybrid orchestration	combines strengths	governance complexity	high-risk typology coverage

Architecture pattern:

rules and models produce signals
  -> graph links related entities
  -> LLM summarizes only approved evidence
  -> policy gate blocks decision overreach
  -> human analyst investigates
  -> evidence ledger records versions and actions

Decision boundary:

Decision	Rule/model/LLM role	Human role
Alert generation	rule/model may trigger	monitoring owner approves scenario
Alert prioritization	model may rank	analyst reviews
Evidence summary	LLM may summarize	analyst validates
Case escalation	workflow may recommend	analyst/supervisor decides per policy
SAR narrative draft	LLM may draft from evidence	SAR owner reviews and edits
SAR filing decision	no autonomous AI decision	authorized compliance owner
Account closure	AI may provide facts	business/compliance decision owner

10. Alert-to-Case-to-SAR Traceability

Traceability is the spine of the architecture.

10.1 Trace Chain

source event -> feature calculation -> scenario trigger -> alert id
  -> queue assignment -> analyst actions -> case id -> evidence collection
  -> disposition -> SAR consideration -> filing/non-filing decision
  -> SAR package or decision record -> QA sample / audit replay

10.2 Minimum Event Schema

Field	Purpose
`trace_id`	end-to-end correlation
`source_event_ids`	transactions, notes, KYC events
`feature_version`	reproducible feature calculation
`scenario_id`	scenario that triggered
`scenario_version`	rule/model version
`alert_id`	alert object
`case_id`	investigation object
`typology_ids`	mapped typologies
`red_flag_ids`	observed indicators
`AI_assist_ids`	LLM/model assistance logs
`analyst_id`	human investigator
`reviewer_id`	checker / supervisor
`disposition`	close, escalate, SAR considered, continuing review
`decision_rationale_ref`	structured rationale
`SAR_package_ref`	if applicable and access-restricted

10.3 Acceptance Criteria

Every alert maps to scenario id and source events.
Every case maps to alert or manual referral.
Every LLM summary maps to source evidence and output hash.
Every SAR consideration maps to human decision owner.
Every non-filing decision has documented rationale per internal policy.
Every scenario change can be tied to coverage impact.
Every SAR-sensitive object has access control and audit log.
Audit can replay a sampled case from source events to final disposition.
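
Several of these criteria can be checked mechanically on a trace record. The sketch below assumes a flat dictionary using the field names from the minimum event schema. Three keys are assumptions not present in that schema: `manual_referral_id` is hypothetical, while `output_hash` and `decision_owner` are borrowed from the evidence bundle in section 8.1. Requiring a rationale for every recorded disposition is a stricter simplification of the non-filing criterion.

```python
def trace_gaps(record: dict) -> list[str]:
    """Return traceability gaps for one record; empty means no gap found."""
    gaps = []

    def has(key: str) -> bool:
        # Missing keys, empty strings, and empty lists all count as absent.
        return bool(record.get(key))

    if has("alert_id"):
        for key in ("scenario_id", "scenario_version", "source_event_ids"):
            if not has(key):
                gaps.append(f"alert missing {key}")
    # manual_referral_id is a hypothetical field for non-alert case origins.
    if has("case_id") and not (has("alert_id") or has("manual_referral_id")):
        gaps.append("case has neither alert_id nor manual_referral_id")
    if has("AI_assist_ids") and not has("output_hash"):
        gaps.append("AI assistance logged without output_hash")
    if record.get("disposition") == "SAR considered" and not has("decision_owner"):
        gaps.append("SAR consideration without decision_owner")
    # Simplification: every recorded disposition must carry a rationale.
    if has("disposition") and not has("decision_rationale_ref"):
        gaps.append("disposition without decision_rationale_ref")
    return gaps
```

Access control, audit logging, and end-to-end replay are not covered here, since they depend on the surrounding platform rather than on the record's shape.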

11. Operating Model

11.1 RACI

Activity	AML Compliance	Fraud	Sanctions	AI PM	Senior BA	Architect	Data/ML	Model Risk	Operations	Legal	Audit
Typology library ownership	A/R	C	C	C	R	C	C	C	C	C	I
Scenario inventory	A	C	C	R	R	A/R	R	C	C	C	I
Red flag evidence map	A/R	R	R	C	R	C	C	C	C	C	I
Data contracts	C	C	C	C	R	A/R	R	C	C	C	I
Model/rule development	C	C	C	C	C	R	A/R	C	C	I	I
LLM triage design	C	C	C	R	R	A/R	R	C	C	C	I
SAR decision	A/R	C	C	I	I	I	I	I	C	C	I
SAR evidence access policy	A/R	C	C	I	R	R	C	C	C	A/R	I
QA sampling	A/R	C	C	C	R	C	C	C	R	C	C
Audit replay	C	C	C	I	I	C	C	C	C	C	A/R

R = Responsible, A = Accountable, C = Consulted, I = Informed.

11.2 Governance Forums

Forum	Cadence	Agenda
Typology review	monthly / event-driven	FinCEN advisories, source updates, emerging threats
Scenario performance review	monthly	alert volume, conversion, evidence, QA defects
SAR quality review	monthly	narrative defects, supporting documentation, key terms
AI model/rule change board	per release	coverage regression, validation, risk acceptance
Operations capacity review	weekly	queue SLA, backlog, analyst workload
Management risk review	quarterly	coverage gaps, KRI trend, investment needs
Audit / assurance review	risk-based	replay evidence, control design, operating effectiveness

11.3 Human Oversight Design

Human oversight is not a button. It requires authority to disagree, access to source evidence, enough time, reason codes, escalation routes, QA feedback, training on AI limitations, and automation bias monitoring.

Reviewer UI should show typology candidates, red flags and source records, AI summary with grounding links, evidence gaps, customer expected activity, counterparty graph, scenario version, prior dispositions, available actions, and required rationale fields.

12. Metrics / KRIs

12.1 Executive Metrics

Metric	Meaning
Typology coverage score	Are priority risks covered?
Coverage gaps by product/channel	Where are blind spots?
High-risk scenario health	Are key scenarios functioning?
Advisory response SLA	How fast threats enter controls?
Evidence completeness	Can cases support decisions?
SAR quality defect rate	Are narratives and evidence adequate?
Case backlog by risk tier	Is review capacity sufficient?
AI assist usage with QA result	Is AI improving work without degrading quality?
SAR confidentiality incidents	Is sensitive information protected?
Open remediation aging	Are defects fixed?

12.2 Product Metrics

Metric	Product question
Analyst time to evidence packet	Does AI reduce prep time?
Gap detection rate	Does AI find missing evidence?
Unsupported claim rate	Does LLM invent or overstate?
Reviewer edit distance	Are drafts usable?
Reviewer challenge rate	Are humans exercising judgment?
Case reopen rate	Are closures weak?
False positive drivers by scenario	What causes noise?
Alerts with complete typology mapping	Is taxonomy embedded?
Manual referral conversion	Are frontline signals useful?
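
The reviewer edit distance metric can be approximated in several ways. The sketch below uses a word-level similarity ratio from Python's standard `difflib` module, which is a stand-in for a true edit distance such as Levenshtein, not the same measure. The function name `reviewer_edit_ratio` is hypothetical.

```python
from difflib import SequenceMatcher


def reviewer_edit_ratio(draft: str, final: str) -> float:
    """0.0 means the reviewer kept the draft verbatim; 1.0 means nothing survived."""
    if not draft and not final:
        return 0.0
    # Compare word sequences so that small punctuation-free rewordings
    # count per word rather than per character.
    similarity = SequenceMatcher(None, draft.split(), final.split()).ratio()
    return 1.0 - similarity
```

A value that stays near zero is ambiguous: it can mean the drafts are good, or that reviewers are accepting them without scrutiny, so it is best read next to the reviewer challenge rate in the same table.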

12.3 Risk KRIs

KRI	Yellow	Red
Critical typology without active coverage	partial/manual only	no owner or no plan
Scenario stale age	review overdue	high-risk scenario overdue and active
Evidence completeness	below target	SAR-considered cases missing source evidence
Unsupported AI claims	recurring low severity	material unsupported facts in narrative
File/no-file AI language	detected and blocked	reached reviewer or filing package
SAR confidentiality access	unusual access	unauthorized access or disclosure
QA defect repeat	same scenario repeated	no remediation owner
Coverage regression after tuning	small decrease accepted	material gap without approval

13. Financial Retail Scenario Patterns

13.1 Structuring

Risk pattern: repeated cash deposits below reporting threshold, multiple branches or ATMs, business profile does not support cash level.
Evidence: cash timeline, branch pattern, customer expected activity, business profile, explanation and source of funds if available.
AI assist: summarize pattern, compare expected vs actual, flag missing CDD.
Control: red flag triggers scrutiny, not conclusion; human determines escalation or SAR consideration.

13.2 Mule Account

Risk pattern: new account, inbound funds from unrelated parties, rapid outbound movement, shared device / address / phone cluster.
Evidence: account age, velocity, counterparty graph, device network, KYC attributes.
AI assist: graph explanation, similar case retrieval, narrative chronology.
Control: AI cannot close a cluster as no-suspicion; cluster-level review and QA sampling are required.

13.3 Elder Exploitation

Risk pattern: senior customer, unusual wire or withdrawal, new beneficiary, pressure/confusion/third-party control.
Evidence: customer history, branch/call notes, beneficiary profile, transaction purpose.
AI assist: summarize protective concerns and separate customer care path from SAR evidence path.
Control: customer protection protocols are jurisdiction-specific; SAR consideration remains human-owned.

13.4 Check Fraud And Mail Theft

Risk pattern: altered check, duplicate deposit, payee mismatch, return codes and account clusters.
Evidence: check image, deposit channel, return reason, payee/account relationship, fraud case notes.
AI assist: cross-case pattern recognition and fraud-to-AML referral summary.
Control: fraud loss recovery and AML suspicion evaluation are related but distinct.

13.5 Scam Proceeds And Crypto Exit

Risk pattern: customer coerced or deceived, repeated transfers to new beneficiary, crypto on-ramp or high-risk platform.
Evidence: payment purpose, customer messages where permitted, beneficiary, transaction sequence, scam report or complaint.
AI assist: extract scam indicators, map to advisory key terms, draft customer-protection evidence summary.
Control: do not disclose SAR existence; payment intervention, complaint, fraud claim, and SAR review have different owners.

13.6 TBML And Sanctions Evasion

Risk pattern: invoice mismatch, unusual routing, shell counterparties, high-risk corridor, front companies or intermediary layering.
Evidence: invoice, shipping records, counterparty profile, wire details, screening result, ownership/network data.
AI assist: document comparison and entity relationship summary.
Control: specialist review and document provenance are critical; sanctions controls are distinct from AML SAR evidence.

14. Templates

Template	Required fields
Typology Card	typology id, risk family, source anchors, business definition, customer segments, products/channels, red flags, scenario coverage, evidence requirements, false positive drivers, SAR relevance, owner, review cadence
Scenario Card	scenario id, typology id, detection method, business logic, data dependencies, threshold/model configuration, alert routing, expected evidence, false positive drivers, human review questions, coverage state, QA plan
Red Flag Evidence Map	red flag id, description, observable data, evidence question, source artifact, typology
SAR Evidence Bundle	case id, trace id, typologies, scenarios, red flags, source events, supporting documents, profile reference, AI assistance trace, human review, confidentiality policy
Coverage Review Memo	decision needed, scope, coverage findings, evidence findings, risk, recommendation

14.1 SAR Narrative QA Rubric

Criterion	Pass signal	Defect
Who	subject and counterparties clear	vague subjects
What	specific suspicious activity	generic label
When	dates and period	missing chronology
Where	accounts, channels, geography	unclear channel
Why unusual	profile contrast	no expected activity
How	mechanism described	no pattern explanation
Evidence	source-linked facts	unsupported statements
Uncertainty	limitations disclosed	overclaiming

15. Product And Architecture Requirements

15.1 Functional Requirements

Maintain versioned typology registry with owner and source anchors.
Maintain scenario inventory mapped to typologies and red flags.
Map every production alert to scenario id, scenario version and source events.
Support manual referral objects with typology candidates and evidence.
Generate evidence packet before LLM summary is shown to reviewer.
Require source-linked grounding for LLM factual claims.
Block LLM output that instructs file / do not file SAR.
Record human decision owner for SAR consideration.
Track non-filing rationale according to internal policy.
Link SAR narrative draft to supporting documentation index.
Maintain access controls for SAR-sensitive evidence.
Measure coverage by typology, product, channel and segment.
Run scenario coverage regression before model/rule/prompt releases.
Route QA findings back to typology/scenario owners.

15.2 Non-Functional Requirements

Requirement	Target
Traceability	100% alerts have source scenario and version
Evidence grounding	>= 95% factual AI claims linked to source
SAR confidentiality	zero unauthorized disclosure tolerated
Human decision ownership	100% SAR considerations have authorized owner
Coverage regression	no material regression without approval
Audit replay	sampled cases replayable end to end
Data lineage	critical fields have lineage and quality status
Retention	aligned to SAR/supporting documentation policy
Access control	least privilege and need-to-know

16. 30-Day Lab

Goal: complete a presentable portfolio pack within 30 days. Recommended use cases: Retail mule account, Elder exploitation wire, Check fraud referral, Small business structuring, or Scam crypto exit.

Days	Theme	Artifacts
1-7	Typology and scenario foundations	use-case boundary, source-anchor map, 8 typology cards, 12 scenarios, 30 red flags, coverage matrix, coverage gap memo
8-14	Evidence and workflow	alert event schema, case state machine, evidence bundle schema, manual referral spec, LLM assist policy, reviewer workbench, SAR confidentiality control
15-21	Eval and controls	synthetic case set, sanitized real-case protocol, SAR narrative rubric, grounding/confidentiality tests, model-rule-LLM comparison, coverage regression gate, QA sampling plan
22-30	Operating model and interview pack	KRI dashboard, RACI, advisory response runbook, automation-bias tabletop, coverage-regression tabletop, executive memo, portfolio case study, interview answers, audit replay demo

Completion standard:

Can explain 8 typologies and their red flags.
Can show active, partial, manual and gap coverage.
Can trace red flag to source evidence and narrative claim.
Can show SAR decision remains human-owned.
Can show LLM allowed/prohibited actions and tests.
Can compare synthetic and governed real-case eval.
Can show RACI, KRI and governance cadence.
Can explain why coverage beats alert volume.

17. Interview Answers

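Q1: How do typology, scenario, red flag, alert, case and SAR evidence relate to each other?

30 seconds:

A typology is a financial crime pattern. A scenario is the detection or review mechanism that covers that pattern. A red flag is an indicator of possible suspicious activity. An alert is the work object a scenario triggers. A case is the container for human investigation. SAR evidence is the evidence chain that supports SAR consideration and the human decision. The value of AI is in connecting these objects, not in replacing the SAR judgment.

2 minutes:

I would start with a typology library, for example structuring, mule account, elder exploitation, check fraud and scam proceeds. Each typology has source anchors, red flags, product and channel scope, and evidence requirements. The scenario inventory defines which rule, ML, graph or manual referral scenarios cover each typology. Once an alert fires it enters the case workflow, where the evidence ledger records source events, scenario version, red flags, AI assistance, analyst notes, reviewer decision and SAR consideration. The SAR narrative is only one part of the output; what really matters is the supporting documentation and the human-owned decision.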

Q2: Why is false positive reduction not the only goal of AML AI?

False positive reduction improves efficiency, but without a coverage view, tuning can remove coverage of low-frequency, high-risk typologies or emerging threats. AML AI also needs to measure typology coverage, evidence completeness, SAR quality, false negative risk, scenario freshness, advisory response and human review quality. Efficiency must not come at the cost of blind spots.
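
A coverage regression gate makes this concrete. The sketch below assumes a deliberately simple representation, a mapping from scenario id to the typologies it covers; the scenario ids, typology names and `coverage_regression_gate` are hypothetical.

```python
def covered_typologies(scenarios: dict[str, set[str]]) -> set[str]:
    """Union of all typologies covered by at least one scenario."""
    covered: set[str] = set()
    for typologies in scenarios.values():
        covered |= typologies
    return covered


def coverage_regression_gate(baseline, candidate, approved_losses=frozenset()):
    """Allow a release only if every lost typology has explicit approval."""
    lost = covered_typologies(baseline) - covered_typologies(candidate)
    unapproved = lost - set(approved_losses)
    return {
        "lost": sorted(lost),
        "unapproved": sorted(unapproved),
        "release_allowed": not unapproved,
    }


baseline = {
    "SCN-001": {"structuring"},
    "SCN-002": {"mule_account"},
    "SCN-003": {"elder_exploitation"},  # low volume, noisy
}
# Tuning for false positive reduction retires the noisy scenario.
candidate = {k: v for k, v in baseline.items() if k != "SCN-003"}

print(coverage_regression_gate(baseline, candidate))
```

In this example the tuned release improves alert precision but silently drops the only scenario covering elder exploitation, so the gate withholds the release until someone approves the loss.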

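Q3: What can an LLM do in the SAR workflow?

An LLM can support evidence organization, transaction chronology, red flag candidate extraction, gap detection, draft narrative support and QA rubric checks. It must not decide file or no file, conclude that a crime occurred, disclose SAR-related information to the customer, generate unsourced facts, or bypass the human AML/BSA owner.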

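Q4: How would you design a typology coverage matrix?

I would build the matrix as typology x product x channel x customer segment. Each cell records scenario id, data status, evidence quality, coverage state, owner, QA result and gap action. Management then sees not alert volume but which risks are covered on which business surfaces, where only a manual control exists, and where there is a known gap.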
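
A minimal sketch of such a matrix follows. It assumes that any cell nobody has recorded counts as a gap, and it carries only two of the cell attributes (coverage state and scenario id) to stay short; the dimension values and function names are hypothetical.

```python
from collections import Counter
from itertools import product

COVERAGE_STATES = ("active", "partial", "manual", "gap")


def build_matrix(typologies, products, channels, segments, recorded_cells):
    """Expand every typology x product x channel x segment combination."""
    matrix = {}
    for key in product(typologies, products, channels, segments):
        # Assumption: an unrecorded combination is treated as a gap.
        cell = recorded_cells.get(key, {"coverage_state": "gap", "scenario_id": None})
        if cell["coverage_state"] not in COVERAGE_STATES:
            raise ValueError(f"unknown coverage state for {key}")
        matrix[key] = cell
    return matrix


def summarize(matrix):
    """Count cells per coverage state for a management view."""
    return Counter(cell["coverage_state"] for cell in matrix.values())


recorded = {
    ("structuring", "deposit", "branch", "retail"): {"coverage_state": "active", "scenario_id": "SCN-001"},
    ("mule_account", "deposit", "mobile", "retail"): {"coverage_state": "manual", "scenario_id": None},
}
matrix = build_matrix(["structuring", "mule_account"], ["deposit"], ["branch", "mobile"], ["retail"], recorded)
gaps = sorted(key for key, cell in matrix.items() if cell["coverage_state"] == "gap")
print(summarize(matrix))
print(gaps)
```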

Q5: How do you demonstrate that a SAR narrative has sufficient evidence?

Use an evidence bundle. Each narrative claim links to a transaction id, customer profile, counterparty record, case note or supporting document. The bundle also records typology id, scenario id, red flag ids, AI output hash, analyst notes, reviewer decision, decision owner, timestamp, access policy and retention class. Audit can replay from the narrative back to the source evidence.
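
The replay idea can be sketched with a plain dictionary standing in for the bundle. Only a subset of the bundle fields is shown, all identifiers are invented for illustration, and SHA-256 is an assumed choice for the AI output hash.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


ai_draft = "Three cash deposits were made within five days."

# Hypothetical bundle; a real one would also carry analyst notes, reviewer
# decision, timestamp, access policy and retention class.
bundle = {
    "typology_id": "TYP-STRUCTURING",
    "scenario_id": "SCN-001",
    "red_flag_ids": ["RF-007"],
    "ai_output_hash": sha256_hex(ai_draft),
    "decision_owner": "bsa-officer-01",
    "evidence": [
        {"evidence_id": "EV-001", "type": "transaction", "ref": "TXN-1001"},
        {"evidence_id": "EV-002", "type": "transaction", "ref": "TXN-1002"},
        {"evidence_id": "EV-003", "type": "customer_profile", "ref": "CUST-42"},
    ],
    "claims": [
        {"claim_id": "CL-1", "text": ai_draft, "evidence_ids": ["EV-001", "EV-002"]},
    ],
}


def replay_claim(bundle: dict, claim_id: str) -> list[dict]:
    """Return the source evidence behind one narrative claim.

    A broken link raises KeyError, so a missing record cannot pass silently.
    """
    evidence_by_id = {e["evidence_id"]: e for e in bundle["evidence"]}
    claim = next(c for c in bundle["claims"] if c["claim_id"] == claim_id)
    return [evidence_by_id[eid] for eid in claim["evidence_ids"]]


def ai_output_unchanged(bundle: dict, ai_output_text: str) -> bool:
    """Check that the stored hash still matches the AI output under review."""
    return bundle["ai_output_hash"] == sha256_hex(ai_output_text)


print([e["ref"] for e in replay_claim(bundle, "CL-1")])
```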

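Q6: How do you keep an LLM from writing a "fluent but dangerous" SAR draft?

Start with an evidence-first workflow in which the LLM can generate only from the evidence index. Then apply an unsupported-claim scanner, a file/no-file phrase blocker, confidentiality tests, a gap detection rubric and human review. The reviewer UI shows source evidence, missing evidence and uncertainty, and does not present the AI narrative on the first screen as a conclusion.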
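
An unsupported-claim scanner can be as simple as requiring every draft sentence to cite an evidence id that exists in the index. The sketch below assumes a citation convention of the form `[EV-001]` placed before the sentence-ending punctuation, and uses naive sentence splitting; both are illustrative choices, not part of the playbook.

```python
import re

CITATION = re.compile(r"\[(EV-\d+)\]")


def scan_unsupported(draft: str, evidence_index: set[str]) -> list[tuple[str, str]]:
    """Flag sentences with no citation or with a citation outside the index."""
    findings = []
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", draft) if s.strip()]
    for sentence in sentences:
        cited = CITATION.findall(sentence)
        if not cited:
            findings.append((sentence, "no citation"))
            continue
        unknown = [c for c in cited if c not in evidence_index]
        if unknown:
            findings.append((sentence, "unknown evidence: " + ", ".join(unknown)))
    return findings


draft = (
    "Three cash deposits were made within five days [EV-001]. "
    "The funds were wired out the same week [EV-009]. "
    "The customer is laundering money."
)
for sentence, reason in scan_unsupported(draft, {"EV-001", "EV-002"}):
    print(reason, "|", sentence)
```

The third sentence in the sample draft is exactly the kind of fluent conclusion the scanner is meant to surface: it has no source and it states a criminal finding.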

Q7: How do you handle FinCEN advisory updates?

Establish an advisory-to-typology runbook. A new advisory enters intake, and the typology owner assesses source relevance, red flags, key terms, affected products/channels, scenario gaps, data availability, QA sample and release priority. The typology library, scenario inventory, eval cases, analyst guidance and dashboard are then updated, and the review evidence is recorded.

Q8: How do synthetic eval and real-case eval fit together?

Synthetic eval covers rare typologies, edge cases, prompt regression and cases with known ground truth. Real-case eval covers operational realism, messy evidence, data quality and analyst workflow. Real cases must go through SAR confidentiality, de-identification, access control, retention and approval controls. Combining the two gives both safety and realism.

Q9: How do you explain alert-to-case-to-SAR traceability?

Traceability is the complete chain from source event to final disposition. Every alert has a scenario id/version and source events; every case has analyst actions and evidence; every SAR consideration has a human decision owner and rationale; every narrative has a supporting documentation index. Without traceability, SAR assist AI is just a text generation tool.
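
The chain described above can be checked link by link. This sketch assumes each stage is stored as a dictionary and that all four stages exist for the sampled case; the field names mirror the answer, while `REQUIRED_LINKS` and `traceability_gaps` are hypothetical.

```python
# Required fields per stage, taken from the chain described in the answer.
REQUIRED_LINKS = {
    "alert": ("scenario_id", "scenario_version", "source_event_ids"),
    "case": ("analyst_actions", "evidence_ids"),
    "sar_consideration": ("decision_owner", "rationale"),
    "narrative": ("supporting_doc_index",),
}


def traceability_gaps(chain: dict) -> list[str]:
    """List every missing stage or empty field in one alert-to-SAR chain."""
    gaps = []
    for stage, fields in REQUIRED_LINKS.items():
        record = chain.get(stage)
        if record is None:
            gaps.append(f"{stage}: missing record")
            continue
        for field in fields:
            if not record.get(field):
                gaps.append(f"{stage}.{field}: missing")
    return gaps


chain = {
    "alert": {"scenario_id": "SCN-001", "scenario_version": "3", "source_event_ids": ["TXN-1001"]},
    "case": {"analyst_actions": ["reviewed account history"], "evidence_ids": ["EV-001"]},
    "sar_consideration": {"decision_owner": "", "rationale": "pattern consistent with red flag RF-007"},
    "narrative": {"supporting_doc_index": ["EV-001"]},
}
print(traceability_gaps(chain))
```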

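Q10: How do you explain the investment value to executives?

This is not about adding compliance documentation; it is about making financial crime AI scalable, auditable and tunable. Typology coverage can reduce blind spots, the evidence bundle improves SAR quality, LLM assist lowers analyst preparation time, and traceability lowers audit and remediation cost. The point is to use AI to raise quality and efficiency without sacrificing human accountability.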

18. Portfolio Deliverables

Suggested contents of the final portfolio:

typology-library.md: 8-12 typology cards.
scenario-inventory.md: rule / ML / graph / LLM / manual referral scenarios.
coverage-matrix.md: typology x product x channel x segment coverage.
red-flag-evidence-map.md: red flags to observable evidence.
architecture-diagram.md: source systems to evidence ledger.
alert-to-case-state-machine.md: workflow states and controls.
SAR-evidence-bundle.yaml: sample structured evidence package.
LLM-assist-policy.md: allowed/prohibited actions and guardrails.
eval-plan.md: synthetic and governed real-case eval.
KRI-dashboard-spec.md: executive, product and risk metrics.
RACI.md: operating model.
executive-memo.md: coverage gap and investment rationale.
interview-pack.md: 30-second, 2-minute, deep-dive answers.

Portfolio narrative:

I designed a typology-driven financial crime AI architecture.
It maps source anchors and emerging advisories to typology objects,
maps typologies to scenarios and red flags,
measures coverage across products and channels,
uses LLM only as an evidence assistant,
preserves human SAR ownership,
and provides audit-replayable alert-to-case-to-SAR traceability.

19. Common Pitfalls

Pitfall	Consequence	Better design
Treating SAR AI as a writing assistant only	faster but weaker narratives	evidence-first SAR bundle
Treating red flags as criminal proof	overclaiming and poor decisions	additional scrutiny and human review
Tuning rules without coverage review	hidden false negatives	coverage regression gate
LLM recommends file/no-file	governance breach and automation bias	decision-boundary guardrail
No source grounding	hallucinated facts	evidence index and claim trace
Synthetic-only testing	unrealistic quality	combine with governed real-case eval
Real-case testing without confidentiality	SAR/privacy risk	de-identification and access controls
Dashboard shows alert volume only	blind spots hidden	typology coverage dashboard
Manual referrals ignored	frontline red flags lost	referral object and routing
No advisory change process	stale coverage of emerging threats	advisory-to-scenario runbook
SAR package access too broad	confidentiality breach	need-to-know SAR vault
Human review under-capacity	rubber-stamp decisions	capacity planning and QA

20. Final Operating View

AI financial crime architecture should answer ten questions:

Which typologies are in scope?
Which source anchors justify them?
Which red flags are observable?
Which scenarios cover which products, channels and customers?
Which data gaps create blind spots?
Which evidence supports each alert and case?
What did AI assist with, and what was prohibited?
Who made the human decision?
Can the SAR narrative be traced to supporting documentation?
Can audit replay the chain and management see coverage gaps?

Final memory sentence:

A mature financial crime AI system is typology-first, scenario-governed, evidence-grounded, SAR-confidential, human-owned and audit-replayable.