AI Financial Crime Typology / Scenario Coverage Playbook
AI Financial Crime Typology / Scenario Coverage / SAR Evidence Architecture Playbook
Positioning: for senior AI PMs, Senior BAs, Product Architects, AML Technology Architects and Financial Crime Transformation Leads. The goal is to upgrade AML / fraud / sanctions / scam / SAR assist AI from single-point detection models into a typology-driven, coverage-aware, evidence-backed, human-owned production control system.
Scope: transaction monitoring, AML alert triage, fraud-to-AML referral, KYC/CDD refresh, case investigation copilot, SAR narrative assistant, FinCEN advisory response, scenario coverage review, and model/rule/LLM triage governance.
Core deliverables: typology object model, scenario library governance, red flag evidence mapping, coverage measurement, synthetic and real-case eval, SAR narrative evidence bundle, alert-to-case-to-SAR traceability, operating model, KRI dashboard and a 30-day portfolio lab.
0. Disclaimer
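This document is material for learning, portfolio building, architecture practice and internal governance discussion.
It is not legal advice, a compliance conclusion, a SAR filing decision, a suspicious activity determination, a regulatory interpretation, a model validation report or an audit report.
Any formal project must be confirmed by Legal, BSA/AML Compliance, Fraud, Sanctions, Risk, Model Risk, Privacy, Security, Operations, Internal Audit, the Business Owner and management, taking into account institution type, jurisdiction, products, customers, channels, data, regulatory commitments and internal policy.
Key boundaries:
- AI may help identify signals, organize evidence, check for gaps, generate analyst summaries, draft narratives and run QA pre-checks.
- AI should not replace the human AML / BSA / Compliance owner's SAR decision.
- AI should not turn a red flag into a conclusion that a crime occurred.
- AI should not expose SAR-sensitive information to unauthorized users or to customers.
- AI should not decide whether to inform a customer, close an account, contact law enforcement or file a SAR.
- SAR confidentiality, supporting documentation, retention, access and disclosure must be handled according to applicable rules and institutional policy.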
1. Executive Framing
A common mistake in financial crime AI is to shrink the objective to a single metric:
reduce false positives
increase alert precision
draft SAR narratives faster
automate analyst work
These goals are valuable but not sufficient. The senior-level architecture goal should be:
typology coverage
+ scenario effectiveness
+ evidence completeness
+ human decision ownership
+ SAR quality
+ audit replay
+ continuous threat update
In one sentence:
Financial crime AI is a coverage and evidence architecture before it is a model optimization problem.
1.1 Product Principles
- The typology library is a product control asset.
- The scenario inventory is an architecture asset.
- The SAR evidence bundle is a compliance decision asset.
- LLM triage is an assistant capability, not a decision authority.
- The coverage dashboard should be part of management review.
- QA findings must flow back into typology, scenario, data, prompt, model, training and staffing.
1.2 Strong Questions
| Question | Strong answer |
|---|---|
| Which typologies are in scope? | typology registry with source anchors and owners |
| Which products, channels and customers are covered? | coverage matrix by segment and data dependency |
| Which red flags are observable? | red flag evidence map |
| Which scenarios are active / partial / manual / gap? | scenario inventory |
| Which data quality issues create blind spots? | feature contract and DQ score |
| Which cases can be replayed? | evidence ledger and trace id |
| Which SAR narratives have evidence defects? | SAR QA rubric and sample findings |
2. Source Anchors
| Anchor | Official link | How this document uses it |
|---|---|---|
| FFIEC BSA/AML Suspicious Activity Reporting Overview | https://bsaaml.ffiec.gov/manual/SuspiciousActivityReporting/01; current manual path: https://bsaaml.ffiec.gov/manual/AssessingComplianceWithBSARegulatoryRequirements/04 | Organizes the evidence chain across suspicious activity identification, alert management, SAR decision making, SAR completion, supporting documentation, confidentiality, continuing activity and board/management notification |
| FFIEC BSA/AML Appendix F Red Flags | https://bsaaml.ffiec.gov/manual/Appendices/08; current Appendix F path: https://bsaaml.ffiec.gov/manual/Appendices/07 | Organizes red flag taxonomy, additional scrutiny, key terms, management focus and typology-to-scenario mapping; /Appendices/08 currently displays Appendix G Structuring, so verify the official path before citing it in practice |
| FinCEN Advisories / Bulletins / Fact Sheets | https://www.fincen.gov/resources/advisoriesbulletinsfact-sheets | Serves as the source feed for emerging typologies, red flags, key terms, sector threats and scenario refresh |
| FinCEN BSA Filing Information | https://www.fincen.gov/resources/filing-information | Constrains BSA E-Filing, electronic filing instructions, SAR/CTR filing resources and the filing operations handoff |
| FATF Recommendations | https://www.fatf-gafi.org/en/publications/Fatfrecommendations/Fatf-recommendations.html | Reference for the international AML/CFT/CPF risk-based framework, CDD, recordkeeping, suspicious transaction reporting and effectiveness orientation |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize AI risk ownership, scenario mapping, eval, monitoring, KRIs and continuous improvement |
Architecture design should not blend these sources into a single rule. A better path is:
source anchor -> control objective -> scenario requirement -> evidence requirement
-> human owner -> review cadence -> jurisdiction / entity applicability check
3. Typology Object Model
A typology is a business and risk object that describes a financial crime pattern. It is not an alert rule, and it is not a SAR category.
3.1 Typology Entity
typology_id: typ_mule_account_digital_onboarding
name: Digital mule account with rapid movement
risk_family: money_laundering_fraud
source_anchors: [FinCEN_advisory, FFIEC_red_flags, internal_risk_assessment]
customer_segments: [consumer, student, gig_worker]
products_channels: [digital_account_opening, ACH, debit_card, P2P, crypto_on_ramp]
red_flags: [rapid_movement, unrelated_counterparties, shared_device_cluster, thin_profile]
evidence_requirements: [account_open_date, transaction_sequence, counterparty_graph, device_network, KYC_profile, analyst_notes]
SAR_relevance: possible_key_terms_and_narrative_prompts
owner: AML Typology Owner
review_cadence: monthly_high_risk_or_quarterly_standard
status: active
3.2 Required Fields
| Field | Purpose |
|---|---|
| typology_id | Stable identity for coverage and evidence |
| risk_family | ML, TF, sanctions, fraud, scam, elder exploitation, cyber, TBML |
| source_anchor | FFIEC, FinCEN, FATF, internal risk assessment, audit issue, law enforcement feedback |
| business_definition | Plain-language description for PM / BA / analyst |
| customer_scope | Customer segments where it applies |
| product_channel_scope | Products and channels where it can manifest |
| red_flag_ids | Observable indicators |
| scenario_ids | Detection and review scenarios |
| evidence_requirements | Minimum evidence for investigation and SAR consideration |
| false_positive_drivers | Legitimate explanations or benign patterns |
| human_review_questions | Questions analyst must answer |
| SAR_relevance | Potential SAR categories, key terms, narrative prompts |
| control_owner | Accountable owner |
| coverage_status | active, partial, manual, gap, retired |
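A registry is easier to govern if incomplete typology cards are rejected on entry. The following is a minimal sketch, assuming typology records are loaded as plain dictionaries keyed by the field names in the table above. The function name `validate_typology` and the defect message format are illustrative assumptions, not part of any existing system. Note that the example record in 3.1 uses slightly different key names (for example `owner` and `source_anchors`), so a real registry would need to settle on one naming scheme first.

```python
# Field names are taken from the Required Fields table; the allowed status
# values come from its coverage_status row. The validator is illustrative.
REQUIRED_TYPOLOGY_FIELDS = (
    "typology_id", "risk_family", "source_anchor", "business_definition",
    "customer_scope", "product_channel_scope", "red_flag_ids", "scenario_ids",
    "evidence_requirements", "false_positive_drivers", "human_review_questions",
    "SAR_relevance", "control_owner", "coverage_status",
)
ALLOWED_COVERAGE_STATUS = {"active", "partial", "manual", "gap", "retired"}


def validate_typology(record: dict) -> list[str]:
    """Return a list of defects; an empty list means the record passes."""
    defects = [
        f"missing field: {name}"
        for name in REQUIRED_TYPOLOGY_FIELDS
        if record.get(name) in (None, "", [])  # absent or empty counts as missing
    ]
    status = record.get("coverage_status")
    if status is not None and status not in ALLOWED_COVERAGE_STATUS:
        defects.append(f"unknown coverage_status: {status}")
    return defects
```

A card that fails validation would typically stay in a draft state until its owner fills the gaps, rather than silently entering the coverage matrix.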
3.3 Typology Families
| Family | Examples | Design focus |
|---|---|---|
| Structuring | cash below threshold, monetary instruments, multi-branch pattern | transaction sequence and reporting avoidance context |
| Mule accounts | rapid movement, pass-through, network cluster | graph, velocity, onboarding, counterparty |
| Scams | romance scam, investment scam, impersonation | customer narrative, payment destination, intervention |
| Elder exploitation | pressure, confusion, new beneficiary, unusual transfer | vulnerability protection and escalation |
| Check fraud | altered item, stolen mail, duplicate deposit | image, return code, payee and fraud referral |
| Cyber-enabled crime | account takeover, ransomware payment, BEC | device, IP, beneficiary, urgent payment narrative |
| TBML | invoice mismatch, high-risk corridors, shell trade entities | trade documents and specialist review |
| Sanctions evasion | front companies, intermediary layering | sanctions screening and network intelligence |
| Terrorist financing | small-value patterns, high-risk geography, NPO misuse | FATF-informed risk and enhanced review |
3.4 Relationship Design
Typologies overlap. Elder exploitation may involve a romance scam, mule accounts and a crypto exit. The object model should therefore support primary, secondary and related typologies, advisory-driven updates and superseded typologies.
4. Scenario Library Governance
A scenario is the executable way a typology gets covered. It can be a rule, threshold, anomaly model, graph detector, supervised model, LLM triage rubric, manual review protocol or hybrid orchestration.
4.1 Scenario Record
scenario_id: scn_structuring_cash_multi_branch_001
typology_id: typ_structuring_cash
name: Multi-branch below-threshold cash deposit pattern
detection_method: rule_plus_profile_contrast
business_logic: Detect repeated cash deposits below reporting threshold across branches within rolling window, contrasted with expected customer activity.
data_dependencies: [cash_amount, branch_id, teller_channel, customer_id, account_id, transaction_date, expected_cash_activity]
evidence_required: [transaction_timeline, branch_pattern, customer_profile, prior_cash_activity, business_nature]
human_review_questions: [expected_for_customer, legitimate_purpose, reporting_avoidance_pattern]
false_positive_drivers: [seasonal_cash_business, event_driven_cash_receipts]
coverage_status: active
owner: AML Scenario Owner
4.2 Lifecycle
candidate -> design -> data feasibility -> calibration -> UAT with historical cases
-> pilot -> production -> QA monitoring -> tuning -> retired or replaced
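The lifecycle above can be enforced as a simple transition check so that a scenario cannot jump from candidate to production without the intermediate stages. This is a minimal sketch, assuming the chain is strictly linear and that `retired` and `replaced` are the only terminal states. The stage identifiers and function names are illustrative assumptions, and a real workflow may allow additional transitions (for example, returning from tuning to production) that the chain above does not spell out.

```python
# Stage names mirror the lifecycle chain; identifiers are illustrative.
LIFECYCLE = [
    "candidate", "design", "data_feasibility", "calibration",
    "uat_historical_cases", "pilot", "production", "qa_monitoring", "tuning",
]
TERMINAL = {"retired", "replaced"}


def allowed_next(stage: str) -> set[str]:
    """Return the stages a scenario may move to from the given stage."""
    if stage in TERMINAL:
        return set()  # terminal states have no successors
    idx = LIFECYCLE.index(stage)  # raises ValueError for unknown stages
    if idx == len(LIFECYCLE) - 1:
        return set(TERMINAL)  # after tuning: retired or replaced
    return {LIFECYCLE[idx + 1]}


def can_transition(current: str, target: str) -> bool:
    """True only when target is a direct successor of current."""
    return target in allowed_next(current)
```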
4.3 Governance Questions
| Stage | Questions |
|---|---|
| Candidate | Which typology and red flags does it cover? |
| Design | What data and evidence are required? |
| Calibration | What is expected false positive and false negative risk? |
| Pilot | Did analysts find the evidence useful? |
| Production | Are alerts routed to the right queue? |
| QA | Are dispositions consistent and documented? |
| Tuning | Does threshold change reduce coverage? |
| Retirement | What compensating control replaces it? |
4.4 Coverage States
| State | Meaning | Example |
|---|---|---|
| Active automated | production rule/model detects and routes alerts | cash structuring rule |
| Active hybrid | model plus analyst review | mule graph cluster |
| Manual-only | policy requires manual referral | elder exploitation branch referral |
| Partial | only some products/channels covered | wire covered, RTP not covered |
| Gap | known typology without adequate scenario | new scam advisory not implemented |
| Retired | no longer active with replacement documented | old threshold replaced by model |
| Suppressed | disabled with risk acceptance | temporary data quality incident |
4.5 Review Triggers
| Trigger | Action |
|---|---|
| New FinCEN advisory | typology and red flag impact review |
| Product launch | product/channel coverage assessment |
| New payment rail | scenario feasibility review |
| Model/rule tuning | coverage regression check |
| SAR QA defect spike | scenario evidence review |
| Law enforcement feedback | typology update |
| Audit finding | control remediation |
| Data source change | data dependency retest |
5. Red Flag / Evidence Mapping
A red flag is a potential indicator, not a conclusion. The design goal is to turn each red flag into observable evidence and a review question.
5.1 Red Flag Object
red_flag_id: rf_rapid_movement_of_funds
name: Rapid movement of funds through account
observable_data: [inbound_amount, outbound_amount, time_between_flows, account_age, counterparty_count]
evidence_questions:
- Is this expected for the customer's profile and purpose?
- Are counterparties related, known, or high risk?
- Is there a legitimate documented explanation?
typologies: [mule_account, scam_proceeds, layering]
false_positive_drivers: [payroll_processor, marketplace_seller, escrow_like_activity]
severity: medium_high
5.2 Evidence Mapping Table
| Red flag | Observable data | Investigation question | Evidence artifact |
|---|---|---|---|
| Activity inconsistent with profile | expected vs actual behavior | What changed and why? | KYC/CDD profile, transaction history |
| Multiple small cash deposits | amount, branch, frequency | Is pattern consistent with reporting avoidance? | cash timeline, branch map |
| Rapid movement | inbound/outbound timing | Is account pass-through? | transaction sequence |
| New beneficiary high-value wire | beneficiary, amount, relationship | Is beneficiary expected or suspicious? | beneficiary profile, customer notes |
| Shared device/address cluster | device, IP, address, phone | Are accounts connected? | graph cluster |
| High-risk geography | origin/destination, corridor | Is there plausible business reason? | wire details, customer business |
| Scam narrative | customer notes, payment purpose | Is customer being coached or deceived? | call transcript, chat extract |
| Check alteration | image, MICR, return code | Is item altered or stolen? | item image, return notice |
5.3 Evidence Quality Levels
| Level | Description |
|---|---|
| E0 | AI statement only, no source evidence |
| E1 | Single transaction or note, weak context |
| E2 | Transaction timeline plus customer profile contrast |
| E3 | Multi-source evidence: transactions, KYC, counterparty, notes |
| E4 | Multi-source evidence plus analyst research and alternative explanations |
| E5 | Complete bundle with supporting documents, source versions, human decision and QA readiness |
Minimum target: alert triage can start with E1/E2; case escalation should reach E3; SAR consideration should target E4/E5 where available.
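The minimum targets can be expressed as a small lookup that a workflow checks before moving a case forward. This is a minimal sketch, assuming the lower bound of each range is the gate (E1 for triage, E3 for escalation, E4 for SAR consideration). The stage keys and the function name `meets_minimum` are hypothetical, and because the text says "where available", a real workflow would probably treat a failed check as a prompt for human review rather than a hard block.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Evidence quality levels E0 to E5 from the table above."""
    E0 = 0
    E1 = 1
    E2 = 2
    E3 = 3
    E4 = 4
    E5 = 5


# Assumed stage keys; minimums are the lower bounds stated in the text.
MIN_LEVEL = {
    "alert_triage": EvidenceLevel.E1,
    "case_escalation": EvidenceLevel.E3,
    "sar_consideration": EvidenceLevel.E4,
}


def meets_minimum(stage: str, level: EvidenceLevel) -> bool:
    """True when the bundle's evidence level reaches the stage minimum."""
    return level >= MIN_LEVEL[stage]
```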
6. Coverage Measurement
Coverage measurement answers:
Are the risks we say we monitor actually represented by scenarios,
data, evidence, review capacity and quality feedback?
6.1 Coverage Dimensions
| Dimension | Question |
|---|---|
| Typology | Is the typology in the registry and active? |
| Product | Which products are covered? |
| Channel | Branch, ATM, mobile, online, ACH, wire, RTP, card, P2P, crypto on-ramp |
| Customer segment | Consumer, SMB, MSB, NPO, private banking, senior customer |
| Data | Are required fields available and reliable? |
| Scenario | Is detection automated, hybrid, manual or gap? |
| Evidence | Does the alert provide required evidence? |
| Review capacity | Can analysts review within SLA? |
| QA | Are outcomes sampled and defects remediated? |
| Change control | Does tuning preserve coverage? |
6.2 Coverage Matrix
| Typology | Product | Channel | Segment | Scenario | Data | Evidence | Coverage |
|---|---|---|---|---|---|---|---|
| Structuring | deposit | branch cash | consumer / SMB | scn_cash_multi_branch | good | E3 | active |
| Mule account | DDA | digital + ACH | consumer | scn_mule_graph_velocity | partial device | E3 | active hybrid |
| Elder exploitation | deposit | wire / branch | senior | manual referral + wire red flag | good | E2/E3 | partial |
| TBML | commercial | wire / trade | SMB | specialist review | invoice partial | E2 | manual-only |
| Scam crypto exit | deposit | ACH / crypto on-ramp | consumer | payment purpose + velocity | partial | E2 | gap/partial |
6.3 Coverage Score
coverage_score =
typology_scope_score
* data_availability_score
* scenario_activation_score
* evidence_quality_score
* review_capacity_score
* QA_feedback_score
| Score | Meaning |
|---|---|
| 0.00 | no known coverage |
| 0.25 | manual or ad hoc coverage |
| 0.50 | partial scenario or weak data |
| 0.75 | active scenario with usable evidence |
| 1.00 | active scenario, strong evidence, QA and change control |
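The score can be computed directly from the six factors. The following is a minimal sketch, assuming each factor is already scored on a 0.0 to 1.0 scale. The function name `coverage_score` and the dictionary input are illustrative choices; how each factor is scored in the first place is outside this example.

```python
from math import prod

# Factor names are the six terms of the coverage_score formula.
FACTORS = (
    "typology_scope_score",
    "data_availability_score",
    "scenario_activation_score",
    "evidence_quality_score",
    "review_capacity_score",
    "QA_feedback_score",
)


def coverage_score(scores: dict[str, float]) -> float:
    """Multiply the six factor scores; each must lie in [0, 1]."""
    for name in FACTORS:
        if name not in scores:
            raise KeyError(f"missing factor: {name}")
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return prod(scores[name] for name in FACTORS)
```

Because the factors multiply, a single zero (for example, no review capacity) drives the whole score to zero, and two factors at 0.75 with the rest at 1.00 give 0.5625. That is stricter than an average would be, which fits the idea that coverage is only as strong as its weakest dependency.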
6.4 Dashboard Metrics
| Metric | Why it matters |
|---|---|
| Typology coverage percentage | Board / management view of risk coverage |
| Scenario active vs partial vs gap | Shows control maturity |
| Coverage by product/channel | Finds new rail blind spots |
| Evidence completeness by typology | Measures investigation readiness |
| Alert-to-case conversion by scenario | Shows triage usefulness |
| Case-to-SAR consideration by typology | Shows escalation behavior |
| SAR QA defect by typology | Finds narrative/evidence issues |
| Scenario stale age | Finds outdated thresholds |
| Advisory-to-scenario SLA | Measures threat responsiveness |
7. Synthetic vs Real-Case Eval
Financial crime AI evaluation must balance confidentiality, realism, coverage and repeatability.
7.1 Synthetic Eval
Synthetic cases are useful for typology coverage testing, red flag recognition, edge case generation, prompt regression, model comparison, analyst training and SAR narrative rubric calibration.
Strengths: repeatable, safe to share, can cover rare typologies, can create known ground truth, can stress specific evidence gaps.
Weaknesses: too clean, weak operational noise, may miss real customer ambiguity, may overrepresent obvious red flags.
7.2 Real-Case Eval
Real cases are useful for operational realism, data quality issues, analyst workflow friction, source-system gaps, alert disposition variation, SAR narrative quality and false positive drivers.
Controls needed:
- SAR confidentiality review.
- Data minimization.
- De-identification / tokenization.
- Access control.
- Retention policy.
- Sampling approval.
- Exclusion of prohibited data from external model training.
7.3 Eval Set Composition
| Bucket | Example | Purpose |
|---|---|---|
| Clear suspicious | obvious structuring pattern | recall baseline |
| Benign lookalike | seasonal cash business | false positive control |
| Ambiguous | rapid movement with partial explanation | uncertainty handling |
| Missing evidence | no expected activity profile | gap detection |
| Multi-typology | elder scam through mule account | relationship reasoning |
| Advisory-specific | new FinCEN typology | update responsiveness |
| Negative control | normal payroll / merchant settlement | over-alert prevention |
| Confidentiality trap | prompt asks for SAR existence disclosure | safety control |
7.4 Eval Metrics
| Metric | Meaning |
|---|---|
| Red flag recall | Did AI identify relevant indicators? |
| False red flag rate | Did AI invent unsupported indicators? |
| Evidence grounding | Are claims linked to source records? |
| Gap detection | Did AI identify missing evidence? |
| Typology classification accuracy | Did AI map to correct typology family? |
| Narrative completeness | Who, what, when, where, why, how |
| Human decision respect | Did AI avoid file/no-file recommendation? |
| Confidentiality compliance | Did AI avoid impermissible disclosure? |
7.5 Acceptance Criteria
- Unsupported red flag invention <= 2% on the gold set.
- SAR confidentiality breaches: zero tolerated in the release test.
- Evidence grounding >= 95% for cited transaction facts.
- Gap detection recall >= 90% for intentionally incomplete cases.
- File/no-file directive phrases blocked at 100% in SAR decision context.
- Analyst usefulness score improves without lowering evidence completeness.
8. SAR Narrative Evidence Bundle
SAR narrative support must be evidence-first. An LLM draft should not be promoted unless the supporting evidence index is complete enough for human review.
8.1 Bundle Structure
case_id: case_aml_2026_00421
trace_id: trace_00421
alert_ids: [alert_7781]
typologies: [typ_structuring_cash]
scenarios: [scn_cash_multi_branch_001]
red_flags: [rf_multiple_below_threshold_cash_deposits, rf_activity_inconsistent_with_profile]
evidence_index:
  transaction_ids: [txn_001, txn_002]
  customer_profile_ref: cdd_2026_03
  branch_notes_ref: note_884
AI_assistance:
  model_version: llm_gateway_aml_summary_v3
  prompt_version: sar_evidence_gap_check_v2
  output_hash: hash_value
human_review:
  analyst_id: analyst_17
  reviewer_id: sar_committee_03
  decision_owner: BSA_Officer_delegate
  decision: SAR_considered
confidentiality:
  retention_class: SAR_sensitive
  access_policy: AML_need_to_know
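The rule that a draft is not promoted until the evidence index is complete enough can be approximated with a presence check on the bundle. This is a minimal sketch, assuming the bundle is loaded as a nested dictionary with the section names used above. Which fields count as "complete enough" is an assumption made here for illustration; an institution would define that list in policy. The names `REQUIRED_PATHS`, `missing_bundle_fields` and `can_promote_draft` are hypothetical.

```python
# Assumed minimum set of (section, key) pairs; a policy decision in practice.
REQUIRED_PATHS = [
    ("evidence_index", "transaction_ids"),
    ("evidence_index", "customer_profile_ref"),
    ("AI_assistance", "model_version"),
    ("AI_assistance", "prompt_version"),
    ("AI_assistance", "output_hash"),
    ("human_review", "analyst_id"),
    ("human_review", "decision_owner"),
    ("confidentiality", "access_policy"),
]


def missing_bundle_fields(bundle: dict) -> list[str]:
    """List dotted paths that are absent or empty in the bundle."""
    missing = []
    for section, key in REQUIRED_PATHS:
        value = (bundle.get(section) or {}).get(key)
        if not value:  # None, empty string and empty list all count as missing
            missing.append(f"{section}.{key}")
    return missing


def can_promote_draft(bundle: dict) -> bool:
    """A draft may go to human review only when nothing required is missing."""
    return not missing_bundle_fields(bundle)
```

This only checks presence, not quality. Whether the referenced records actually support the narrative remains a human review question.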
8.2 Narrative Building Blocks
| Block | Evidence source |
|---|---|
| Subject identity | KYC/CDD, account records |
| Transaction chronology | transaction timeline |
| Unusual activity explanation | customer profile contrast |
| Red flags | scenario output and analyst notes |
| Amounts and dates | source transactions |
| Counterparties | beneficiary/payee/counterparty records |
| Customer explanation | call notes, branch notes, written response |
| Analyst research | case notes, external permitted research |
| Uncertainty | missing evidence and limitations |
| Key terms | applicable FinCEN advisory guidance if relevant |
8.3 LLM Guardrails
The LLM may summarize transaction chronology, extract candidate red flags, map facts to typology candidates, list evidence gaps, draft analyst-facing narrative sections, check the narrative against the rubric and identify unsupported claims.
The LLM must not decide SAR filing, state that a crime occurred, notify a customer of SAR existence, invent explanations, hide uncertainty, remove adverse evidence to make the narrative cleaner, bypass human review or access SAR evidence outside the need-to-know policy.
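One of the "must not" items, deciding SAR filing, can be partly enforced by screening LLM output for directive language before it reaches a reviewer. The following is a minimal sketch using Python's standard `re` module. The phrase patterns are an illustrative assumption and would miss many paraphrases, so a gate like this is at most one layer alongside prompt design, classifier checks and QA sampling. The function name `screen_llm_output` is hypothetical.

```python
import re

# Assumed, non-exhaustive patterns for file / do-not-file directives.
_DIRECTIVE = re.compile(
    r"\b(?:do not|don't|should(?: not)?|must(?: not)?"
    r"|recommend(?:s|ed)?(?: not)?(?: to)?)\s+file\b"
    r"|\bno sar (?:is )?(?:needed|required)\b",
    re.IGNORECASE,
)


def screen_llm_output(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_phrases); allowed is False on any match."""
    matches = [m.group(0) for m in _DIRECTIVE.finditer(text)]
    return (not matches, matches)
```

For example, `screen_llm_output("We recommend to file a SAR.")` is blocked, while a factual sentence such as "The customer made four cash deposits." passes. Blocked outputs would be logged with the matched phrase so the yellow and red KRI thresholds for file/no-file language can be measured.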
9. Model / Rule / LLM Triage Comparison
| Method | Strength | Weakness | Best use |
|---|---|---|---|
| Deterministic rule | explainable, stable, easy to audit | threshold brittleness, high false positives | known red flags and regulatory-sensitive patterns |
| Statistical anomaly | finds deviations | hard to explain, can drift | unusual customer behavior vs baseline |
| Supervised model | learns patterns from labels | label bias, concept drift | prioritization with strong labels |
| Graph model | network detection | data integration complexity | mule networks and entity clusters |
| LLM triage | text synthesis, evidence organization | hallucination, confidentiality, automation bias | analyst summary, gap check, narrative assist |
| Hybrid orchestration | combines strengths | governance complexity | high-risk typology coverage |
Architecture pattern:
rules and models produce signals
-> graph links related entities
-> LLM summarizes only approved evidence
-> policy gate blocks decision overreach
-> human analyst investigates
-> evidence ledger records versions and actions
Decision boundary:
| Decision | Rule/model/LLM role | Human role |
|---|---|---|
| Alert generation | rule/model may trigger | monitoring owner approves scenario |
| Alert prioritization | model may rank | analyst reviews |
| Evidence summary | LLM may summarize | analyst validates |
| Case escalation | workflow may recommend | analyst/supervisor decides per policy |
| SAR narrative draft | LLM may draft from evidence | SAR owner reviews and edits |
| SAR filing decision | no autonomous AI decision | authorized compliance owner |
| Account closure | AI may provide facts | business/compliance decision owner |
10. Alert-to-Case-to-SAR Traceability
Traceability is the spine of the architecture.
10.1 Trace Chain
source event -> feature calculation -> scenario trigger -> alert id
-> queue assignment -> analyst actions -> case id -> evidence collection
-> disposition -> SAR consideration -> filing/non-filing decision
-> SAR package or decision record -> QA sample / audit replay
10.2 Minimum Event Schema
| Field | Purpose |
|---|---|
| trace_id | end-to-end correlation |
| source_event_ids | transactions, notes, KYC events |
| feature_version | reproducible feature calculation |
| scenario_id | scenario that triggered |
| scenario_version | rule/model version |
| alert_id | alert object |
| case_id | investigation object |
| typology_ids | mapped typologies |
| red_flag_ids | observed indicators |
| AI_assist_ids | LLM/model assistance logs |
| analyst_id | human investigator |
| reviewer_id | checker / supervisor |
| disposition | close, escalate, SAR considered, continuing review |
| decision_rationale_ref | structured rationale |
| SAR_package_ref | if applicable and access-restricted |
10.3 Acceptance Criteria
- Every alert maps to a scenario id and source events.
- Every case maps to an alert or a manual referral.
- Every LLM summary maps to source evidence and an output hash.
- Every SAR consideration maps to a human decision owner.
- Every non-filing decision has a documented rationale per internal policy.
- Every scenario change can be tied to coverage impact.
- Every SAR-sensitive object has access control and an audit log.
- Audit can replay a sampled case from source events to final disposition.
11. Operating Model
11.1 RACI
| Activity | AML Compliance | Fraud | Sanctions | AI PM | Senior BA | Architect | Data/ML | Model Risk | Operations | Legal | Audit |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Typology library ownership | A/R | C | C | C | R | C | C | C | C | C | I |
| Scenario inventory | A | C | C | R | R | A/R | R | C | C | C | I |
| Red flag evidence map | A/R | R | R | C | R | C | C | C | C | C | I |
| Data contracts | C | C | C | C | R | A/R | R | C | C | C | I |
| Model/rule development | C | C | C | C | C | R | A/R | C | C | I | I |
| LLM triage design | C | C | C | R | R | A/R | R | C | C | C | I |
| SAR decision | A/R | C | C | I | I | I | I | I | C | C | I |
| SAR evidence access policy | A/R | C | C | I | R | R | C | C | C | A/R | I |
| QA sampling | A/R | C | C | C | R | C | C | C | R | C | C |
| Audit replay | C | C | C | I | I | C | C | C | C | C | A/R |
R = Responsible, A = Accountable, C = Consulted, I = Informed.
11.2 Governance Forums
| Forum | Cadence | Agenda |
|---|---|---|
| Typology review | monthly / event-driven | FinCEN advisories, source updates, emerging threats |
| Scenario performance review | monthly | alert volume, conversion, evidence, QA defects |
| SAR quality review | monthly | narrative defects, supporting documentation, key terms |
| AI model/rule change board | per release | coverage regression, validation, risk acceptance |
| Operations capacity review | weekly | queue SLA, backlog, analyst workload |
| Management risk review | quarterly | coverage gaps, KRI trend, investment needs |
| Audit / assurance review | risk-based | replay evidence, control design, operating effectiveness |
11.3 Human Oversight Design
Human oversight is not a button. It requires authority to disagree, access to source evidence, enough time, reason codes, escalation routes, QA feedback, training on AI limitations and automation bias monitoring.
The reviewer UI should show typology candidates, red flags and source records, the AI summary with grounding links, evidence gaps, customer expected activity, the counterparty graph, scenario version, prior dispositions, available actions and required rationale fields.
12. Metrics / KRIs
12.1 Executive Metrics
| Metric | Meaning |
|---|---|
| Typology coverage score | Are priority risks covered? |
| Coverage gaps by product/channel | Where are blind spots? |
| High-risk scenario health | Are key scenarios functioning? |
| Advisory response SLA | How fast threats enter controls? |
| Evidence completeness | Can cases support decisions? |
| SAR quality defect rate | Are narratives and evidence adequate? |
| Case backlog by risk tier | Is review capacity sufficient? |
| AI assist usage with QA result | Is AI improving work without degrading quality? |
| SAR confidentiality incidents | Is sensitive information protected? |
| Open remediation aging | Are defects fixed? |
12.2 Product Metrics
| Metric | Product question |
|---|---|
| Analyst time to evidence packet | Does AI reduce prep time? |
| Gap detection rate | Does AI find missing evidence? |
| Unsupported claim rate | Does LLM invent or overstate? |
| Reviewer edit distance | Are drafts usable? |
| Reviewer challenge rate | Are humans exercising judgment? |
| Case reopen rate | Are closures weak? |
| False positive drivers by scenario | What causes noise? |
| Alerts with complete typology mapping | Is taxonomy embedded? |
| Manual referral conversion | Are frontline signals useful? |
12.3 Risk KRIs
| KRI | Yellow | Red |
|---|---|---|
| Critical typology without active coverage | partial/manual only | no owner or no plan |
| Scenario stale age | review overdue | high-risk scenario overdue and active |
| Evidence completeness | below target | SAR-considered cases missing source evidence |
| Unsupported AI claims | recurring low severity | material unsupported facts in narrative |
| File/no-file AI language | detected and blocked | reached reviewer or filing package |
| SAR confidentiality access | unusual access | unauthorized access or disclosure |
| QA defect repeat | same scenario repeated | no remediation owner |
| Coverage regression after tuning | small decrease accepted | material gap without approval |
13. Financial Retail Scenario Patterns
13.1 Structuring
Risk pattern: repeated cash deposits below the reporting threshold, across multiple branches or ATMs, where the business profile does not support the cash level.
Evidence: cash timeline, branch pattern, customer expected activity, business profile, and explanation and source of funds if available.
AI assist: summarize the pattern, compare expected vs actual activity, flag missing CDD.
Control: a red flag triggers scrutiny, not a conclusion; a human determines escalation or SAR consideration.
13.2 Mule Account
Risk pattern: new account, inbound funds from unrelated parties, rapid outbound movement, shared device / address / phone cluster.
Evidence: account age, velocity, counterparty graph, device network, KYC attributes.
AI assist: graph explanation, similar case retrieval, narrative chronology.
Control: AI cannot close a cluster as no-suspicion; cluster-level review and QA sampling are required.
13.3 Elder Exploitation
Risk pattern: senior customer, unusual wire or withdrawal, new beneficiary, pressure / confusion / third-party control.
Evidence: customer history, branch/call notes, beneficiary profile, transaction purpose.
AI assist: summarize protective concerns and separate the customer care path from the SAR evidence path.
Control: customer protection protocols are jurisdiction-specific; SAR consideration remains human-owned.
13.4 Check Fraud And Mail Theft
Risk pattern: altered check, duplicate deposit, payee mismatch, return codes and account clusters.
Evidence: check image, deposit channel, return reason, payee/account relationship, fraud case notes.
AI assist: cross-case pattern recognition and fraud-to-AML referral summary.
Control: fraud loss recovery and AML suspicion evaluation are related but distinct.
13.5 Scam Proceeds And Crypto Exit
Risk pattern: customer coerced or deceived, repeated transfers to a new beneficiary, crypto on-ramp or high-risk platform.
Evidence: payment purpose, customer messages where permitted, beneficiary, transaction sequence, scam report or complaint.
AI assist: extract scam indicators, map to advisory key terms, draft a customer-protection evidence summary.
Control: do not disclose SAR existence; payment intervention, complaint, fraud claim and SAR review have different owners.
13.6 TBML And Sanctions Evasion
Risk pattern: invoice mismatch, unusual routing, shell counterparties, high-risk corridor, front companies or intermediary layering.
Evidence: invoice, shipping records, counterparty profile, wire details, screening result, ownership/network data.
AI assist: document comparison and entity relationship summary.
Control: specialist review and document provenance are critical; sanctions controls are distinct from AML SAR evidence.
14. Templates
| Template | Required fields |
|---|---|
| Typology Card | typology id, risk family, source anchors, business definition, customer segments, products/channels, red flags, scenario coverage, evidence requirements, false positive drivers, SAR relevance, owner, review cadence |
| Scenario Card | scenario id, typology id, detection method, business logic, data dependencies, threshold/model configuration, alert routing, expected evidence, false positive drivers, human review questions, coverage state, QA plan |
| Red Flag Evidence Map | red flag id, description, observable data, evidence question, source artifact, typology |
| SAR Evidence Bundle | case id, trace id, typologies, scenarios, red flags, source events, supporting documents, profile reference, AI assistance trace, human review, confidentiality policy |
| Coverage Review Memo | decision needed, scope, coverage findings, evidence findings, risk, recommendation |
14.1 SAR Narrative QA Rubric
| Criterion | Pass signal | Defect |
|---|---|---|
| Who | subject and counterparties clear | vague subjects |
| What | specific suspicious activity | generic label |
| When | dates and period | missing chronology |
| Where | accounts, channels, geography | unclear channel |
| Why unusual | profile contrast | no expected activity |
| How | mechanism described | no pattern explanation |
| Evidence | source-linked facts | unsupported statements |
| Uncertainty | limitations disclosed | overclaiming |
15. Product And Architecture Requirements
15.1 Functional Requirements
- Maintain a versioned typology registry with owner and source anchors.
- Maintain a scenario inventory mapped to typologies and red flags.
- Map every production alert to scenario id, scenario version and source events.
- Support manual referral objects with typology candidates and evidence.
- Generate the evidence packet before any LLM summary is shown to the reviewer.
- Require source-linked grounding for LLM factual claims.
- Block LLM output that instructs or recommends filing or not filing a SAR.
- Record the human decision owner for each SAR consideration.
- Track non-filing rationale according to internal policy.
- Link each SAR narrative draft to the supporting documentation index.
- Maintain access controls for SAR-sensitive evidence.
- Measure coverage by typology, product, channel and segment.
- Run scenario coverage regression before model/rule/prompt releases.
- Route QA findings back to typology/scenario owners.
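One of the requirements above is to block LLM output that instructs a file / do not file decision. A minimal sketch of such a decision-boundary guardrail follows. It is an assumption-heavy illustration: the regular expressions are examples rather than a vetted phrase list, and `gate_llm_output` is a hypothetical name.

```python
import re

# Illustrative patterns only (an assumption, not a vetted list). A real
# phrase list would be owned and tested by Compliance.
DECISION_PATTERNS = [
    re.compile(
        r"\b(?:should|must|recommend\w*)\s+(?:not\s+)?(?:to\s+)?fil(?:e|ing)\b",
        re.I,
    ),
    re.compile(r"\b(?:do not|don't|no need to)\s+file\b", re.I),
    re.compile(r"\bSAR\s+is\s+(?:not\s+)?(?:required|warranted)\b", re.I),
]


def gate_llm_output(text: str) -> dict:
    """Withhold text that appears to instruct a SAR filing decision."""
    for pattern in DECISION_PATTERNS:
        match = pattern.search(text)
        if match:
            return {"released": False, "matched": match.group(0)}
    return {"released": True, "matched": None}


print(gate_llm_output("Based on the pattern, the bank should file a SAR."))
print(gate_llm_output("Three cash deposits were made on consecutive days."))
```

Pattern matching of this kind will miss paraphrases, so it is better understood as one layer alongside prompt constraints, evaluation cases and human review than as a complete control.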
15.2 Non-Functional Requirements
| Requirement | Target |
|---|---|
| Traceability | 100% alerts have source scenario and version |
| Evidence grounding | >= 95% factual AI claims linked to source |
| SAR confidentiality | zero unauthorized disclosure tolerated |
| Human decision ownership | 100% SAR considerations have authorized owner |
| Coverage regression | no material regression without approval |
| Audit replay | sampled cases replayable end to end |
| Data lineage | critical fields have lineage and quality status |
| Retention | aligned to SAR/supporting documentation policy |
| Access control | least privilege and need-to-know |
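Two of the targets above are directly measurable: traceability (100% of alerts carry a source scenario and version) and evidence grounding (>= 95% of factual AI claims linked to a source). The sketch below shows one way to compute them and report breaches. The `Alert` shape, the `TARGETS` keys and the helper names are hypothetical, and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Targets taken from the table above; the key names are assumptions.
TARGETS = {"traceability": 1.00, "evidence_grounding": 0.95}


@dataclass
class Alert:
    alert_id: str
    scenario_id: Optional[str]
    scenario_version: Optional[str]


def traceability_rate(alerts: list[Alert]) -> float:
    """Share of alerts that carry both a scenario id and a version."""
    if not alerts:
        raise ValueError("no alerts in the measurement window")
    traced = sum(1 for a in alerts if a.scenario_id and a.scenario_version)
    return traced / len(alerts)


def nfr_breaches(measured: dict[str, float]) -> list[str]:
    """Names of requirements whose measured value is below target."""
    return [name for name, target in TARGETS.items() if measured[name] < target]


# Illustrative sample: one alert is missing its scenario version.
alerts = [
    Alert("A-1", "SCN-01", "v3"),
    Alert("A-2", "SCN-01", "v3"),
    Alert("A-3", "SCN-07", None),
    Alert("A-4", "SCN-02", "v1"),
]
measured = {
    "traceability": traceability_rate(alerts),
    "evidence_grounding": 19 / 20,  # 19 of 20 sampled claims had a source link
}
print(measured["traceability"], nfr_breaches(measured))
```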
16. 30-Day Lab
Goal: complete a demonstrable portfolio pack within 30 days. Pick one of the following as the focus scenario: Retail mule account, Elder exploitation wire, Check fraud referral, Small business structuring or Scam crypto exit.
| Days | Theme | Artifacts |
|---|---|---|
| 1-7 | Typology and scenario foundations | use-case boundary, source-anchor map, 8 typology cards, 12 scenarios, 30 red flags, coverage matrix, coverage gap memo |
| 8-14 | Evidence and workflow | alert event schema, case state machine, evidence bundle schema, manual referral spec, LLM assist policy, reviewer workbench, SAR confidentiality control |
| 15-21 | Eval and controls | synthetic case set, sanitized real-case protocol, SAR narrative rubric, grounding/confidentiality tests, model-rule-LLM comparison, coverage regression gate, QA sampling plan |
| 22-30 | Operating model and interview pack | KRI dashboard, RACI, advisory response runbook, automation-bias tabletop, coverage-regression tabletop, executive memo, portfolio case study, interview answers, audit replay demo |
Completion standard:
- Can explain 8 typologies and their red flags.
- Can show active, partial, manual and gap coverage.
- Can trace red flag to source evidence and narrative claim.
- Can show SAR decision remains human-owned.
- Can show LLM allowed/prohibited actions and tests.
- Can compare synthetic and governed real-case eval.
- Can show RACI, KRI and governance cadence.
- Can explain why coverage beats alert volume.
17. Interview Answers
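Q1: How do you explain the relationship between typology, scenario, red flag, alert, case and SAR evidence?
30 seconds:
A typology is a financial crime pattern; a scenario is the detection or review mechanism that covers it; a red flag is an indicator of possible suspicious activity; an alert is the work object a scenario triggers; a case is the container for human investigation; and SAR evidence is the evidence chain that supports SAR consideration and the human decision. AI's value lies in connecting these objects, not in replacing SAR judgment.
2 minutes:
I would start by building a typology library, for example structuring, mule account, elder exploitation, check fraud and scam proceeds. Each typology has source anchors, red flags, product and channel scope, and evidence requirements. The scenario inventory defines which rule, ML, graph or manual referral scenarios cover those typologies. Once an alert fires it enters the case workflow, where the evidence ledger records source events, scenario version, red flags, AI assistance, analyst notes, reviewer decision and SAR consideration. The SAR narrative is only one part of the output; what really matters is the supporting documentation and the human-owned decision.
Q2: Why is false positive reduction not the only goal of AML AI?
False positive reduction improves efficiency, but without a coverage view, tuning can remove coverage of low-frequency, high-risk typologies or emerging threats. AML AI should also be measured on typology coverage, evidence completeness, SAR quality, false negative risk, scenario freshness, advisory response and human review quality. Efficiency must not be bought at the cost of blind spots.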
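The tuning risk described in this answer can be made concrete with a small coverage regression check: compare which scenarios cover each typology before and after a proposed release. This is a sketch under assumptions; the function names, typology keys and scenario ids are all hypothetical.

```python
def coverage_regressions(
    before: dict[str, set[str]], after: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Map each typology to the scenarios it would lose in the release."""
    lost = {}
    for typology, scenarios in before.items():
        removed = scenarios - after.get(typology, set())
        if removed:
            lost[typology] = removed
    return lost


def newly_uncovered(
    before: dict[str, set[str]], after: dict[str, set[str]]
) -> list[str]:
    """Typologies that had at least one scenario and would have none."""
    return sorted(t for t in before if before[t] and not after.get(t))


# Illustrative release: a tuning change retires two noisy scenarios.
before = {
    "structuring": {"SCN-01", "SCN-02"},
    "elder_exploitation": {"SCN-07"},
}
after = {
    "structuring": {"SCN-01"},
    "elder_exploitation": set(),
}

print(coverage_regressions(before, after))
print(newly_uncovered(before, after))
```

In this example the alert volume would fall, yet one typology would lose its only scenario, which is the kind of blind spot a false-positive metric alone would not show.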
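Q3: What can an LLM do in the SAR workflow?
An LLM can do evidence organization, transaction chronology, red flag candidate extraction, gap detection, draft narrative support and QA rubric checks. It cannot decide file or no-file, cannot conclude that a crime occurred, cannot disclose SAR-related information to the customer, cannot generate unsourced facts, and cannot bypass the human AML/BSA owner.
Q4: How do you design a typology coverage matrix?
I would build the matrix as typology x product x channel x customer segment. Each cell records scenario id, data status, evidence quality, coverage state, owner, QA result and gap action. Management then sees not alert volume but which risks are covered on which business surfaces, where only a manual control exists, and where there is a known gap.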
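A minimal sketch of the typology x product x channel x segment matrix follows. It assumes four coverage states (active, partial, manual, gap) and treats any cell nobody has assessed as a gap. The cell is reduced to two fields for brevity, and `build_matrix`, `state_counts` and the sample values are hypothetical.

```python
from collections import Counter
from itertools import product as cartesian

STATES = ("active", "partial", "manual", "gap")


def build_matrix(typologies, products, channels, segments, known_cells):
    """Create one cell per combination; unassessed cells default to gap."""
    matrix = {}
    for key in cartesian(typologies, products, channels, segments):
        cell = known_cells.get(key, {"state": "gap", "scenario_id": None})
        if cell["state"] not in STATES:
            raise ValueError(f"unknown coverage state: {cell['state']}")
        matrix[key] = cell
    return matrix


def state_counts(matrix, typology):
    """Count coverage states across all cells of one typology."""
    return Counter(
        cell["state"] for key, cell in matrix.items() if key[0] == typology
    )


# Illustrative input: one assessed cell, one cell never assessed.
known = {
    ("mule_account", "checking", "mobile", "retail"): {
        "state": "active",
        "scenario_id": "SCN-11",
    },
}
matrix = build_matrix(
    ["mule_account"], ["checking", "wire"], ["mobile"], ["retail"], known
)
print(state_counts(matrix, "mule_account"))
```

Defaulting to gap makes the matrix conservative: a business surface counts as covered only once someone has recorded a scenario or manual control for it.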
Q5: How do you show that a SAR narrative has sufficient evidence?
With an evidence bundle. Each narrative claim links to a transaction id, customer profile, counterparty record, case note or supporting document. The bundle also records typology id, scenario id, red flag ids, AI output hash, analyst notes, reviewer decision, decision owner, timestamp, access policy and retention class. Audit can replay from the narrative back to the source evidence.
Q6: How do you keep an LLM from writing a fluent but dangerous SAR draft?
Start with an evidence-first workflow in which the LLM can generate only from the evidence index. Then apply an unsupported-claim scanner, a file/no-file phrase blocker, confidentiality tests, a gap detection rubric and human review. The reviewer UI shows source evidence, missing evidence and uncertainty, and does not put the AI narrative on the first screen as if it were a conclusion.
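An unsupported-claim scanner can be sketched as a check that every drafted claim cites at least one id that exists in the evidence index. The sketch assumes the draft has already been split into claims with cited evidence ids (for example because the LLM was asked to emit structured output); `NarrativeClaim`, `unsupported_claims` and the sample ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NarrativeClaim:
    text: str
    evidence_ids: list[str] = field(default_factory=list)


def unsupported_claims(
    claims: list[NarrativeClaim], evidence_index: set[str]
) -> list[tuple[str, str]]:
    """Return (claim text, reason) for claims that fail grounding."""
    flagged = []
    for claim in claims:
        if not claim.evidence_ids:
            flagged.append((claim.text, "no evidence reference"))
            continue
        unknown = [e for e in claim.evidence_ids if e not in evidence_index]
        if unknown:
            flagged.append((claim.text, "unknown evidence: " + ", ".join(unknown)))
    return flagged


# Illustrative evidence index and draft claims.
evidence_index = {"TXN-1001", "TXN-1002", "PROFILE-55"}
claims = [
    NarrativeClaim("Two inbound wires arrived on the same day.", ["TXN-1001", "TXN-1002"]),
    NarrativeClaim("The customer is a known money mule."),
    NarrativeClaim("Funds were withdrawn in cash.", ["TXN-9999"]),
]

for text, reason in unsupported_claims(claims, evidence_index):
    print(reason, "->", text)
```

The scanner confirms only that a cited id exists, not that the evidence actually supports the sentence, so the reviewer still reads the claim against its source.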
Q7: How do you handle FinCEN advisory updates?
Establish an advisory-to-typology runbook. A new advisory enters intake, and the typology owner assesses source relevance, red flags, key terms, affected products/channels, scenario gaps, data availability, QA sample and release priority. The typology library, scenario inventory, eval cases, analyst guidance and dashboard are then updated, and the review evidence is recorded.
Q8: How do you combine synthetic eval and real-case eval?
Synthetic eval covers rare typologies, edge cases, prompt regression and known ground truth. Real-case eval covers operational realism, messy evidence, data quality and analyst workflow. Real cases must go through SAR confidentiality, de-identification, access control, retention and approval controls. Only the combination gives both safety and realism.
Q9: How do you explain alert-to-case-to-SAR traceability?
Traceability is the complete chain from source event to final disposition. Each alert has a scenario id/version and source events; each case has analyst actions and evidence; each SAR consideration has a human decision owner and rationale; each narrative has a supporting documentation index. Without traceability, SAR assist AI is just a text generation tool.
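The chain described in this answer can be checked by walking it link by link. The sketch below replays a case back to its alerts and source events and lists every broken link. The in-memory `store` layout, the field names and `replay_chain` are assumptions made for illustration, not a reference schema.

```python
def replay_chain(case_id: str, store: dict) -> list[str]:
    """Return the gaps found while walking case -> alerts -> source events."""
    case = store["cases"].get(case_id)
    if case is None:
        return [f"case {case_id} not found"]

    gaps = []
    for alert_id in case["alert_ids"]:
        alert = store["alerts"].get(alert_id)
        if alert is None:
            gaps.append(f"alert {alert_id} not found")
            continue
        if not (alert.get("scenario_id") and alert.get("scenario_version")):
            gaps.append(f"alert {alert_id} lacks scenario id/version")
        for event_id in alert.get("source_event_ids", []):
            if event_id not in store["events"]:
                gaps.append(f"event {event_id} not found")

    consideration = case.get("sar_consideration") or {}
    if not consideration.get("decision_owner"):
        gaps.append("SAR consideration lacks decision owner")
    if not consideration.get("rationale"):
        gaps.append("SAR consideration lacks rationale")
    if not case.get("supporting_doc_index"):
        gaps.append("narrative lacks supporting documentation index")
    return gaps


# Illustrative store: one source event is missing and no owner is recorded.
store = {
    "events": {"EVT-1": {}},
    "alerts": {
        "A-1": {
            "scenario_id": "SCN-01",
            "scenario_version": "v3",
            "source_event_ids": ["EVT-1", "EVT-2"],
        },
    },
    "cases": {
        "C-1": {
            "alert_ids": ["A-1"],
            "sar_consideration": {"rationale": "pattern inconsistent with profile"},
            "supporting_doc_index": ["DOC-1"],
        },
    },
}

print(replay_chain("C-1", store))
```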
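Q10: How do you explain the investment value to executives?
This is not about adding compliance documentation; it is about making financial crime AI scalable, auditable and tunable. Typology coverage reduces blind spots, the evidence bundle raises SAR quality, LLM assist lowers analyst preparation time, and traceability lowers audit and remediation cost. The point is to use AI to raise quality and efficiency without sacrificing human accountability.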
18. Portfolio Deliverables
The final portfolio should include:
- typology-library.md: 8-12 typology cards.
- scenario-inventory.md: rule / ML / graph / LLM / manual referral scenarios.
- coverage-matrix.md: typology x product x channel x segment coverage.
- red-flag-evidence-map.md: red flags to observable evidence.
- architecture-diagram.md: source systems to evidence ledger.
- alert-to-case-state-machine.md: workflow states and controls.
- SAR-evidence-bundle.yaml: sample structured evidence package.
- LLM-assist-policy.md: allowed/prohibited actions and guardrails.
- eval-plan.md: synthetic and governed real-case eval.
- KRI-dashboard-spec.md: executive, product and risk metrics.
- RACI.md: operating model.
- executive-memo.md: coverage gap and investment rationale.
- interview-pack.md: 30-second, 2-minute, deep-dive answers.
Portfolio narrative:
I designed a typology-driven financial crime AI architecture.
It maps source anchors and emerging advisories to typology objects,
maps typologies to scenarios and red flags,
measures coverage across products and channels,
uses LLM only as an evidence assistant,
preserves human SAR ownership,
and provides audit-replayable alert-to-case-to-SAR traceability.
19. Common Pitfalls
| Pitfall | Consequence | Better design |
|---|---|---|
| Treating SAR AI as writing assistant only | weak narratives produced faster | evidence-first SAR bundle |
| Treating red flags as criminal proof | overclaiming and poor decisions | additional scrutiny and human review |
| Tuning rules without coverage review | hidden false negatives | coverage regression gate |
| LLM recommends file/no-file | governance breach and automation bias | decision-boundary guardrail |
| No source grounding | hallucinated facts | evidence index and claim trace |
| Synthetic-only testing | unrealistic quality | combine with governed real-case eval |
| Real-case testing without confidentiality | SAR/privacy risk | de-identification and access controls |
| Dashboard shows alert volume only | blind spots hidden | typology coverage dashboard |
| Manual referrals ignored | frontline red flags lost | referral object and routing |
| No advisory change process | coverage of emerging threats goes stale | advisory-to-scenario runbook |
| SAR package access too broad | confidentiality breach | need-to-know SAR vault |
| Human review under-capacity | rubber-stamp decisions | capacity planning and QA |
20. Final Operating View
AI financial crime architecture should answer ten questions:
- Which typologies are in scope?
- Which source anchors justify them?
- Which red flags are observable?
- Which scenarios cover which products, channels and customers?
- Which data gaps create blind spots?
- Which evidence supports each alert and case?
- What did AI assist with, and what was prohibited?
- Who made the human decision?
- Can the SAR narrative be traced to supporting documentation?
- Can audit replay the chain and management see coverage gaps?
Final memory sentence:
A mature financial crime AI system is typology-first, scenario-governed, evidence-grounded, SAR-confidential, human-owned and audit-replayable.