AI Human Factors Operations Architecture: Cognitive Load / Automation Bias / Calibrated Trust
Date: 2026-06-30
Status: evergreen
Audience: experienced CBAP / financial retail PM / AI product architect / enterprise architect / operations lead / model risk partner
Output: advanced architecture note, operating model, control design, ADR draft, interview-ready narrative
1. Why Human Factors Are Architecture, Not Just UX
AI human factors often get reduced to "make the interface clearer" or "add a human review step." That framing is too shallow for financial retail operations. In AML, credit, fraud, complaints, collections and contact centers, the human is not a decorative fallback. The human is a scarce decision resource inside a production control system.
Human factors become architecture because they shape:
Architecture concern | Human factor | Why it matters in production
Throughput | case volume, handling time, fatigue, task switching | A review queue that exceeds cognitive capacity becomes a rubber stamp or a backlog.
Routing | skill, authority, capacity and independence | The right work must reach the right human at the right time.
Trust | calibrated reliance, skepticism, confidence and recoverability | Trust must match evidence strength, task risk and action reversibility.
In a financial retail AI system, the architecture question is not:
Can we put a human in the loop?
The senior question is:
Can the operating architecture preserve independent human judgment under real workload, while proving that human control reduces risk instead of becoming control theater?
This note deliberately avoids repeating generic Human-AI Interaction principles or Team Topologies cognitive load language. The focus is operational architecture: operator burden, review fatigue, automation bias, calibrated trust, escalation design, second-line QA, sampling, error cost, workload routing, skill matrix, training loops, decision rights and control evidence.
Operator load is not only the number of cases. It is the mental work required to understand evidence, challenge AI, decide under uncertainty, document the decision and recover from exceptions.
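The load definition above can be made operational as a weighted score per case rather than a raw case count. A minimal sketch, assuming hypothetical factor names and weights (`WEIGHTS`, `Case`, `assign_within_budget`) that a team would calibrate against its own handling-time and QA data; none of them come from a standard. Cases that would push a reviewer past the shift budget are returned as overflow for rerouting instead of silently lengthening the queue.

```python
from dataclasses import dataclass

# Hypothetical weights; calibrate against observed handling time and QA outcomes.
WEIGHTS = {
    "evidence_items": 0.5,    # documents or records the reviewer must open
    "policy_ambiguity": 3.0,  # 0..1, how much judgment the policy leaves open
    "ai_challenge": 2.0,      # 1 if the reviewer must actively challenge AI output
    "documentation": 1.0,     # 0..1, share of the rationale written by hand
    "interruptions": 0.75,    # expected context switches while handling the case
}


@dataclass
class Case:
    case_id: str
    evidence_items: int
    policy_ambiguity: float
    ai_challenge: int
    documentation: float
    interruptions: int


def load_units(case: Case) -> float:
    """Weighted estimate of the mental work for one case, not just a count of 1."""
    return sum(weight * getattr(case, factor) for factor, weight in WEIGHTS.items())


def assign_within_budget(cases: list[Case], shift_budget: float) -> tuple[list[str], list[str]]:
    """Greedily assign the heaviest cases first until the reviewer's budget is spent.

    Cases that do not fit are returned as overflow for rerouting.
    """
    assigned, overflow, used = [], [], 0.0
    for case in sorted(cases, key=load_units, reverse=True):
        cost = load_units(case)
        if used + cost <= shift_budget:
            assigned.append(case.case_id)
            used += cost
        else:
            overflow.append(case.case_id)
    return assigned, overflow
```

With a budget of 15 units, a heavy ambiguous case (12.5 units) and a light one (1.2 units) fit, while a medium case (7.75 units) overflows to another reviewer even though the queue holds only three cases.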
Automation bias means operators give excessive weight to AI output because it is fluent, confident, convenient and fast, or because management socially endorses it. In production it shows up as one-click accept, low override rates, shallow evidence review, declining escalation and reduced error detection.
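The production symptoms listed above can be measured from the reviewer action log. A minimal sketch, assuming a hypothetical log schema (`ReviewAction` with `dwell_seconds` and `opened_evidence`) and an illustrative 10-second threshold for a one-click accept; a sensible threshold depends on the task.

```python
from dataclasses import dataclass


@dataclass
class ReviewAction:
    reviewer: str
    action: str            # "accept", "edit", "reject" or "escalate"
    dwell_seconds: float   # time between opening the case and deciding
    opened_evidence: bool  # did the reviewer open at least one source?


ONE_CLICK_SECONDS = 10.0   # illustrative threshold, not a standard


def bias_signals(actions: list[ReviewAction]) -> dict[str, float]:
    """Rates that tend to move together when automation bias sets in."""
    n = len(actions)
    if n == 0:
        return {}
    accepts = [a for a in actions if a.action == "accept"]
    return {
        "accept_rate": len(accepts) / n,
        # Accepted quickly without opening any evidence.
        "one_click_accept_rate": sum(
            1 for a in accepts if a.dwell_seconds < ONE_CLICK_SECONDS and not a.opened_evidence
        ) / n,
        "override_rate": sum(1 for a in actions if a.action in ("edit", "reject")) / n,
        "escalation_rate": sum(1 for a in actions if a.action == "escalate") / n,
        "evidence_open_rate": sum(1 for a in actions if a.opened_evidence) / n,
    }
```

Any single rate is ambiguous: a low override rate may simply mean the AI is good. The signal worth alerting on is the combination, for example accept rate rising while evidence-open rate falls.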
Bias pattern | Control | Architecture implementation | Evidence
Anchoring on AI recommendation | Evidence-first or blind first-pass review for P0/P1 | Hide AI recommendation until reviewer marks evidence sufficiency or preliminary risk tier. | UI event sequence, preliminary decision, final decision delta
Default acceptance | No preselected accept action | Require active accept, edit, reject or escalate selection with reason code. | Action log and reason-code distribution
Confidence theater | Explain confidence source and limit | Separate model confidence, retrieval support, policy certainty and data completeness. | Confidence component telemetry and QA findings
Speed pressure | Balanced metrics | Score throughput together with quality, override validity, escalation accuracy and missed-risk rate. | Operations dashboard and performance scorecard
Reviewer fatigue | Load-aware routing | Throttle queue, cap complex cases, route surge and flag fatigue risk. | Workload trace and shift-level quality trend
AI social proof | Independent challenge prompt | Ask "What evidence would make this recommendation wrong?" before final accept. | Challenge response captured in high-risk cases
Shallow review | Mandatory evidence checklist | Require source opening, key field confirmation or missing-evidence acknowledgement for high-impact tasks. | Evidence interaction log
Over-correction or under-trust | Calibration and gold cases | Train reviewers on cases where AI is right, wrong, partially right and unsupported. | Calibration score and drift trend
Blind spots in sampling | Sentinel and stratified QA | Include known tricky cases, edge segments, languages, products and customer vulnerability markers. | QA sample design and hit rate
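Several controls in the table are workflow-ordering rules rather than UI styling. A minimal sketch of a review session that enforces a blind first pass, no preselected action, a reason code on every decision and, for P0/P1 accepts, an answer to the challenge prompt. The class, tier labels and error type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

HIGH_RISK = {"P0", "P1"}
ALLOWED_ACTIONS = {"accept", "edit", "reject", "escalate"}


class ReviewOrderError(Exception):
    """Raised when a reviewer tries to skip a required review step."""


@dataclass
class ReviewSession:
    case_id: str
    risk_tier: str                        # "P0", "P1", "P2", ...
    ai_recommendation: str
    preliminary_assessment: Optional[str] = None
    final_action: Optional[str] = None    # no preselected action: starts empty
    events: list = field(default_factory=list)

    def record_preliminary(self, evidence_sufficient: bool, preliminary_tier: str) -> None:
        """Blind first pass: the reviewer judges the evidence before seeing the AI output."""
        self.preliminary_assessment = f"sufficient={evidence_sufficient};tier={preliminary_tier}"
        self.events.append("preliminary_recorded")

    def reveal_recommendation(self) -> str:
        """Show the AI recommendation; P0/P1 require the blind first pass first."""
        if self.risk_tier in HIGH_RISK and self.preliminary_assessment is None:
            raise ReviewOrderError("P0/P1: record a preliminary assessment before viewing AI output")
        self.events.append("recommendation_revealed")
        return self.ai_recommendation

    def decide(self, action: str, reason_code: str, challenge_response: str = "") -> None:
        """Record an active decision; nothing is accepted by default."""
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if not reason_code:
            raise ReviewOrderError("every decision needs a reason code")
        if self.risk_tier in HIGH_RISK:
            if self.preliminary_assessment is None:
                raise ReviewOrderError("P0/P1: record a preliminary assessment first")
            if action == "accept" and not challenge_response.strip():
                raise ReviewOrderError("P0/P1 accept needs an answer to the challenge prompt")
        self.final_action = action
        self.events.append(f"decided:{action}:{reason_code}")
```

The `events` list doubles as the "UI event sequence" evidence named in the table, so the ordering control and its proof come from the same place.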
4.3 Calibrated Trust Design
Trust calibration is the degree to which reliance on AI matches its actual capability under the current task, evidence and risk conditions.
Trust state | Symptom | Control
Over-trust | Accept rate rises while evidence-open rate drops. | Evidence-first review, reason-code friction, second-line QA and management metric reset.
Under-trust | Operators ignore useful AI and duplicate all work manually. | Training on model strengths, clear source support, workflow integration and feedback response.
Mis-trust | Operators trust AI for the wrong tasks, such as policy exceptions or adverse action language. | Scope boundaries, task-specific affordances and prohibited-use controls.
Calibrated trust | Reliance varies by evidence strength, risk tier and reversibility. | Confidence decomposition, risk-tiered workflow, QA sampling and continuous calibration.
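Confidence decomposition and risk-tiered reliance can be sketched as a small policy function. The four component names follow the text; the 0..1 scale, the 0.6 threshold and the review-mode labels are assumptions for illustration. The design point is that no single blended score is shown: the weakest component, the risk tier and the action's reversibility together decide how much the reviewer should rely on the AI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confidence:
    """Four separately reported components (0..1) instead of one blended badge."""
    model_confidence: float    # the model's own score
    retrieval_support: float   # share of claims backed by a retrieved source
    policy_certainty: float    # how unambiguously policy covers the case
    data_completeness: float   # share of required fields present

    def weakest(self) -> tuple[str, float]:
        components = vars(self)
        name = min(components, key=components.get)
        return name, components[name]


def review_mode(conf: Confidence, risk_tier: str, reversible: bool) -> str:
    """Pick a review mode; the threshold and mode names are illustrative."""
    _, weakest_value = conf.weakest()
    if risk_tier == "P0" or (risk_tier == "P1" and not reversible):
        return "evidence_first_with_second_review"
    if weakest_value < 0.6:
        return "evidence_first"
    if risk_tier == "P1":
        return "standard_review"
    return "sampled_qa"
```

A case with 0.97 model confidence but only 0.4 retrieval support still lands in evidence-first review, which is exactly the confidence-theater case a single badge would hide.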
5. Financial Retail Scenarios
5.1 AML Investigator Copilot
Dimension | Architecture design
AI assistance | Summarizes alerts, clusters transactions, retrieves prior SAR narratives, identifies typology matches and drafts investigation notes.
Operator burden | High evidence volume, fragmented systems, deadline pressure and repetitive narrative writing.
Automation bias risk | Investigator accepts AI "close as false positive" recommendation because the summary looks complete.
Controls | Evidence-first review, mandatory typology evidence, reason code for close, senior review for high-risk customer, sentinel QA for false-negative patterns.
Metrics | evidence-open rate, close override rate, SAR escalation precision, QA miss rate, backlog age, alert fatigue index.
Control evidence | alert trace, sources used, model version, investigator action, close rationale, second-line sample and calibration outcome.
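The AML controls above can be expressed as a guard on the "close as false positive" action. A minimal sketch, assuming hypothetical field names (`typology_evidence_ids`, `customer_risk`, `senior_approver`) and an illustrative rule set; it does not describe any specific case-management product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CloseRequest:
    alert_id: str
    investigator: str
    reason_code: str
    typology_evidence_ids: list[str] = field(default_factory=list)  # sources cited per typology
    evidence_opened: bool = False
    customer_risk: str = "standard"         # "standard" or "high"
    senior_approver: Optional[str] = None


def close_violations(req: CloseRequest) -> list[str]:
    """Return every control a false-positive close would violate; empty means allowed."""
    violations = []
    if not req.reason_code:
        violations.append("missing reason code for close")
    if not req.typology_evidence_ids:
        violations.append("no typology evidence cited")
    if not req.evidence_opened:
        violations.append("close without opening any evidence")
    if req.customer_risk == "high":
        if req.senior_approver is None:
            violations.append("high-risk customer requires senior review")
        elif req.senior_approver == req.investigator:
            violations.append("senior reviewer must be independent of the investigator")
    return violations
```

Returning all violations rather than stopping at the first gives the control evidence a complete record of why a close was blocked.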
ADR-165: Adopt A Human Factors Operations Control Plane For High-Impact AI Workflows
Field | Detail
Status | Proposed for portfolio architecture review
Context | Financial retail AI use cases rely on human operators to review AML narratives, credit memos, fraud interventions, complaint responses, contact center suggestions and hardship scripts. Existing human-in-the-loop (HITL) patterns do not adequately manage cognitive load, automation bias, calibrated trust, decision rights, QA sampling and audit evidence.
Decision | Implement a human factors operations control plane across high-impact AI workflows. The control plane includes risk-tiered work intake, cognitive load estimation, skill-based routing, evidence-first reviewer workspace, automation bias controls, decision-right enforcement, second-line QA, sampling, calibration, training loops and evidence ledger.
Drivers | Customer harm prevention, regulatory defensibility, operational resilience, reviewer capacity, model risk management, audit replay, production incident response and executive accountability.
Selected option | Central control-plane pattern integrated with workflow orchestration, AI gateway, reviewer workspace, QA tooling and observability.
Alternatives considered | Local UI-only warnings; generic human approval step; post-hoc QA without runtime routing; full automation with exception review only.
Why selected | The selected option treats human judgment as a managed production capacity and provides runtime controls plus evidence. It reduces the risk that human review becomes a bottleneck or rubber stamp.
Consequences | Requires instrumentation, reviewer training, authority matrix, workflow changes, QA operations and management reporting. It may reduce short-term automation ROI but improves sustainable adoption and defensibility.
Scope | AML, credit, fraud, complaints, collections hardship, contact center agent assist and any AI workflow with customer-visible or customer-impacting output.
Non-goals | This ADR does not approve a specific model, vendor, regulatory interpretation or legal position. It defines the architecture pattern for human factors operations.
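Decision-right enforcement, one element of the control plane in this ADR, can be sketched as an authority matrix checked at action time. The roles echo those named in the practice plan; the matrix contents and the independence rule are hypothetical examples, not a recommended authority design.

```python
# Hypothetical authority matrix: role -> risk tier -> actions that role may take.
ALL_ACTIONS = {"accept", "edit", "reject", "escalate"}
AUTHORITY = {
    "frontline": {"P2": ALL_ACTIONS, "P1": {"escalate"}, "P0": {"escalate"}},
    "specialist": {"P2": ALL_ACTIONS, "P1": ALL_ACTIONS, "P0": {"escalate"}},
    "supervisor": {"P2": ALL_ACTIONS, "P1": ALL_ACTIONS, "P0": ALL_ACTIONS},
}


def authorize(role: str, risk_tier: str, action: str, actor: str, preparer: str) -> tuple[bool, str]:
    """Check decision rights and independence; return (allowed, reason) for the evidence log."""
    allowed_actions = AUTHORITY.get(role, {}).get(risk_tier, set())
    if action not in allowed_actions:
        return False, f"{role} lacks authority to {action} a {risk_tier} case"
    # Escalating one's own work is allowed; making the final decision on it is not.
    if action != "escalate" and actor == preparer:
        return False, "final decision must be made by someone other than the preparer"
    return True, "ok"
```

Unknown roles or tiers fall through to an empty action set, so the check fails closed rather than open.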
Acceptance Criteria
Criterion | Evidence
Every high-impact AI workflow has a review unit definition and risk tier. | workflow catalog and risk-tier map
Review routing uses skill, authority, capacity and independence rules. | routing configuration and role matrix
Reviewer workspace exposes evidence, uncertainty, missing data and allowed actions. | UI review checklist and trace sample
Automation bias controls are implemented for P0/P1 tasks. | UI event sequence, action log and reason-code distribution
QA sampling covers risk, volume, segments and sentinel cases. | sampling plan and QA report
Training and calibration affect queue eligibility. | certification records and access control linkage
Control evidence can be replayed for a sample decision. | evidence packet and audit replay script
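The routing criterion in the table (skill, authority, capacity and independence), together with the link between training and queue eligibility, can be sketched as a filter followed by a least-loaded choice. Reviewer attributes, the `certified` flag and the tie-breaking rule are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reviewer:
    name: str
    skills: set[str]            # product and language skills, e.g. {"aml", "en"}
    authorized_tiers: set[str]  # e.g. {"P1", "P2"}
    certified: bool             # current calibration certification
    load: float                 # load units already assigned this shift
    capacity: float             # load budget for the shift


@dataclass
class WorkItem:
    case_id: str
    required_skills: set[str]
    risk_tier: str
    cost: float                 # estimated load units for this case
    prepared_by: str            # person who produced or edited the draft


def route(item: WorkItem, reviewers: list[Reviewer]) -> Optional[str]:
    """Pick the eligible reviewer with the most spare capacity, or None to escalate."""
    eligible = [
        r for r in reviewers
        if item.required_skills <= r.skills         # skill
        and item.risk_tier in r.authorized_tiers    # authority
        and r.certified                             # training gates queue eligibility
        and r.load + item.cost <= r.capacity        # capacity
        and r.name != item.prepared_by              # independence
    ]
    if not eligible:
        return None
    best = max(eligible, key=lambda r: r.capacity - r.load)
    best.load += item.cost
    return best.name
```

Returning `None` instead of forcing an assignment makes "no qualified, independent reviewer has capacity" a visible escalation event rather than a hidden quality drop.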
10. Interview Answer
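30-second version
AI human factors are not a UI problem but a production control architecture problem. In financial retail, human review absorbs risk and makes the final judgment, yet human attention and skill are finite, and fatigue and automation bias erode them. I would design this as a human factors operations control plane: tier work by risk, estimate load, route by skill, put evidence first, remove default acceptance, set up escalation and second-line QA, and record complete evidence. That is the only way to prove the human is not a rubber stamp but a control that genuinely reduces customer harm and model risk.
2-minute version
I would start by defining the review unit, for example an AML alert close, a credit memo, a complaint response draft, a fraud block request or a contact center answer. I would then tier review units by customer impact, regulatory sensitivity, reversibility and error cost. Each tier maps to a different AI role and different human decision rights: low-risk work can rely on sampled QA, while high-risk work requires evidence-first review, mandatory reason codes, no default accept and, where needed, blind review or second review.
Architecturally, I would build four capabilities. First, workload routing: route by skill, capacity, language, product, risk and deadline so that high-risk cases do not all pile into one queue. Second, automation bias control: show evidence before the AI recommendation, require the reviewer to mark whether the evidence is sufficient, and record overrides and edit depth. Third, calibrated trust: display model confidence, retrieval support, policy certainty and data completeness separately instead of a single misleading confidence badge. Fourth, an evidence ledger that links model version, prompt, retrieved sources, tool calls, human actions, reason codes, QA results and downstream impact.
I would make human factors part of the AI platform control plane rather than letting each product team add its own UI hints. The platform layer provides a risk-tier classifier, review policy engine, skill/capacity router, evidence bundle service, decision-right enforcement, QA sampling service and trace/evidence ledger. Business lines configure tasks, risk levels, permissions and SLAs.
Technically, the AI gateway, RAG provenance, agent tool policy, workflow engine, IAM, observability and the QA data model must be connected. An OpenTelemetry-style trace ID runs through the case, retrieval, model invocation, tool proposal, human action and downstream system update. For governance, I would organize the risk loop around the NIST AI RMF functions Govern / Map / Measure / Manage, and use the management-system language of ISO/IEC 42001 to define responsibility, competence, operational control, performance evaluation and continual improvement.
I would not simply promise "we have human review." I would insist on answers to three CTO-level questions: Does the review control still work at production peak? When customer harm occurs, can we replay the evidence chain? Does the efficiency gain from AI come at the cost of weakening human judgment? If these three questions cannot be answered, the AI system is not ready for an expanded automation scope.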
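The evidence ledger described in the answer can be sketched as records keyed by one shared trace ID, with a check for gaps in the chain. This sketch uses only the Python standard library and does not call the OpenTelemetry API; the stage names follow the text, while the field names, `REQUIRED_STAGES` and the replay helpers are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field

# Stages every high-impact decision should leave behind. "tool_proposal" is
# treated as optional here because not every workflow calls tools (assumption).
REQUIRED_STAGES = ("case", "retrieval", "model_invocation", "human_action", "downstream_update")


def new_trace_id() -> str:
    """One ID shared by every record for the same case, in the spirit of a trace ID."""
    return uuid.uuid4().hex


@dataclass
class LedgerEntry:
    trace_id: str
    stage: str      # e.g. "retrieval", "tool_proposal", "human_action"
    detail: dict    # model version, source IDs, reason code, QA result, ...


@dataclass
class EvidenceLedger:
    entries: list = field(default_factory=list)  # only appended to via record()

    def record(self, trace_id: str, stage: str, **detail) -> None:
        self.entries.append(LedgerEntry(trace_id, stage, detail))

    def replay(self, trace_id: str) -> list:
        """All records for one decision, in the order they were written."""
        return [asdict(e) for e in self.entries if e.trace_id == trace_id]

    def missing_stages(self, trace_id: str) -> list:
        """Required stages with no record, i.e. gaps in the evidence chain."""
        seen = {e.stage for e in self.entries if e.trace_id == trace_id}
        return [s for s in REQUIRED_STAGES if s not in seen]

    def to_json(self, trace_id: str) -> str:
        """Serialize one evidence packet for audit replay."""
        return json.dumps(self.replay(trace_id), sort_keys=True)
```

Running `missing_stages` on a sample of closed cases is one way to answer the second CTO question before an incident forces it.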
11. 7-Day Practice Plan
Day | Practice | Output
1 | Pick one workflow: AML copilot, credit assist, contact center assist, complaints, fraud or collections hardship. Define review unit, risk tiers and error-cost ladder. | one-page review unit map
2 | Build an operator load map with volume, average handling time, evidence volume, policy ambiguity, interruption rate and fatigue triggers. | workload and capacity table
3 | Design automation bias controls for P0/P1/P2 tasks, including blind pass, no default accept, reason codes and challenge prompts. | automation bias control matrix
4 | Create a skill and decision-right matrix for frontline, specialist, supervisor, compliance and second-line QA roles. | authority and routing matrix
5 | Draft a QA sampling plan with 100 percent review, risk-based sample, stratified sample, sentinel cases and incident surge sampling. | QA sampling plan
6 | Define evidence packet fields and observability trace across AI output, retrieved sources, human action and downstream result. | evidence ledger schema
7 | Prepare interview narrative and ADR summary. Practice answering as PM, architect and CTO. | 30-second, 2-minute and CTO answers
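The Day 5 sampling plan can be sketched as one function that combines 100 percent review for the top tier, risk-based rates per tier, stratification by segment, mandatory sentinel cases and a surge multiplier during incidents. The rates, the `segment` key and the surge factor are illustrative assumptions; seeding `random.Random` keeps the sample reproducible for audit replay.

```python
import random
from collections import defaultdict

# Illustrative sampling rates by risk tier; P0 is always reviewed in full.
BASE_RATES = {"P0": 1.0, "P1": 0.2, "P2": 0.05}


def qa_sample(cases: list, seed: int, incident_surge: bool = False, surge_factor: float = 3.0) -> set:
    """Select case IDs for second-line QA.

    Each case is a dict with "id", "tier", "segment" (e.g. product or language)
    and optionally "sentinel" (a known tricky gold case).
    Unknown tiers default to a rate of 1.0, i.e. full review.
    """
    rng = random.Random(seed)
    # Sentinel cases and full-review tiers are always included.
    selected = {c["id"] for c in cases if c.get("sentinel") or BASE_RATES.get(c["tier"], 1.0) >= 1.0}
    strata = defaultdict(list)
    for c in cases:
        if c["id"] not in selected:
            strata[(c["tier"], c["segment"])].append(c["id"])
    for (tier, _segment), ids in sorted(strata.items()):
        rate = BASE_RATES.get(tier, 1.0) * (surge_factor if incident_surge else 1.0)
        rate = min(1.0, rate)
        k = max(1, round(rate * len(ids)))  # at least one case from every stratum
        selected.update(rng.sample(sorted(ids), k))
    return selected
```

The at-least-one-per-stratum rule is what keeps small segments, such as a low-volume language, from becoming the sampling blind spot the bias table warns about.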
12. Source Anchors
These anchors are used as architecture and operating model references. They are not legal, compliance, audit or model validation advice. Access date: 2026-06-30.