
AI Human Factors Operations / Cognitive Load / Automation Bias Playbook


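Positioning: Written for senior AI PMs, AI BAs, Product Architects, Enterprise Architects, Operations Leads, Model Risk, Compliance and Internal Audit. It moves AI human factors risk from "training and reminders" to an operating architecture that can be designed, routed, measured, audited and continuously improved.

Scope: AML investigator copilot, credit underwriter assist, contact center agent assist, complaints copilot, fraud intervention, collections hardship, KYC review, payment dispute, and financial retail internal knowledge assistant.

Important note: This document is learning, portfolio and internal governance training material. It is not legal advice, a compliance conclusion, an audit opinion, a model validation report, a regulatory interpretation or a production approval. Formal projects must be confirmed by Legal, Compliance, Risk, Model Risk, Internal Audit, Security, Privacy, Business Owner, Operations, Workforce Planning and management, taking into account institution type, jurisdiction, customer impact and internal policy.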


1. Purpose And When To Use

1.1 Purpose

This playbook helps teams design AI operations where human attention, judgment and authority are treated as production architecture. It gives practical templates for operator load, automation bias, trust calibration, QA sampling, escalation and evidence packets.

Use it when the AI system:

  • assists regulated or customer-impacting decisions;
  • drafts customer-visible messages;
  • recommends case closure, escalation, approval, denial, freeze, refund or outreach;
  • changes workload shape for frontline or specialist operators;
  • relies on human review as a risk control;
  • creates risk of over-trust, under-trust, alert fatigue, review fatigue or decision anchoring;
  • must produce control evidence for audit, model risk, compliance or management review.

1.2 When To Use In Delivery

| Delivery point | How to use the playbook | Required output |
| --- | --- | --- |
| Discovery | Identify human work, risk tier, decision rights and cognitive load drivers. | operator load map and PM/BA/architecture question log |
| Solution design | Design bias controls, routing, escalation, QA and evidence ledger. | control matrix, escalation design and evidence packet |
| Pilot readiness | Validate workload assumptions, reviewer training and calibration. | pilot runbook, QA sample and calibration report |
| Release gate | Confirm controls operate under production volume and incident conditions. | release checklist and management sign-off packet |
| Post-release | Monitor load, reliance, quality, customer impact and control drift. | dashboard, issue log and improvement backlog |

1.3 Core Principle

Do not ask humans to be the control unless the system gives them time, skill,
evidence, independence, authority, escalation paths and proof that their review
changed risk outcomes.

2. Operating Model

2.1 End-To-End Flow

AI-assisted work item
  -> risk, impact and reversibility classification
  -> operator load estimate
  -> review policy decision
  -> skill, authority and capacity routing
  -> evidence-first workspace
  -> automation bias controls
  -> human decision
  -> escalation, override or downstream action
  -> evidence packet
  -> QA sampling and calibration
  -> training, workflow, RAG, prompt, model and policy improvement

2.2 Operating Roles

| Role | Responsibility | Decision rights |
| --- | --- | --- |
| Business owner | Owns customer outcome, workflow scope, risk appetite and value case. | Approves use case scope and residual operational risk. |
| Product manager | Defines AI assistance boundaries, user workflow, adoption goals and success metrics. | Prioritizes controls, tradeoffs and release criteria. |
| CBAP / BA | Converts tasks, exceptions, decisions and evidence needs into requirements. | Accepts workflow requirements and scenario coverage. |
| AI architect | Designs AI gateway, RAG, agent policy, workflow integration, trace and evidence model. | Approves architecture pattern and integration controls. |
| Operations lead | Owns staffing, queue design, SLA, training, surge, fatigue and handoff. | Approves operating readiness and capacity plan. |
| Risk / compliance | Reviews regulatory sensitivity, control design, customer harm and escalation. | Approves control adequacy or requires additional mitigations. |
| Model risk / eval owner | Defines model, prompt, retrieval and human factors eval. | Approves eval coverage and residual model risk view. |
| Second-line QA | Tests review quality, independence, sampling and calibration. | Opens defects, escalates control failure and validates closure. |
| Internal audit | Reconstructs evidence and assesses control operation. | Challenges evidence sufficiency and governance effectiveness. |

2.3 Operating Cadence

| Cadence | Meeting or process | Inputs | Decisions |
| --- | --- | --- | --- |
| Daily during pilot | Queue and defect standup | backlog, SLA, evidence-open rate, accept/edit/escalate, defects | throttle, surge, coaching, safe-stop |
| Weekly | Human factors quality review | QA samples, gold cases, override validity, customer impact | adjust routing, update training, add eval cases |
| Biweekly | Product and architecture control review | workflow telemetry, trace gaps, RAG failures, agent tool issues | backlog priority and release fixes |
| Monthly | Governance review | trend dashboard, incidents, policy changes, audit samples | residual risk, expansion approval, management action |
| Quarterly | Calibration and role certification | gold-case performance, drift, new policy scenarios | queue eligibility and refresher training |

3. Template: Operator Load Map

Use this template before sizing AI benefits. It prevents teams from transferring effort from one role to a more expensive or fragile human control without recognizing it.

| Workflow step | Operator role | AI assistance | Volume signal | Load drivers | Fatigue triggers | Risk if overloaded | Controls | Evidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AML alert narrative | AML investigator | transaction summary, typology retrieval, draft rationale | daily alerts by risk tier and alert type | high evidence volume, SAR history, entity links, policy ambiguity | complex-case streak, end-of-day deadlines, repeated false positives | false close, weak SAR rationale, missed escalation | evidence bundle, close reason, senior review for high-risk, sentinel QA | alert trace, sources opened, reason code, QA sample |
| Credit memo review | underwriter | income summary, exception detection, memo draft | applications by product and channel | policy interpretation, fair lending sensitivity, adverse action language | high queue age, repeated exceptions, policy changes | unsupported approval, unfair denial, weak adverse action reason | evidence-first memo, policy checklist, second review for exceptions | memo diff, policy references, reviewer authority |
| Live contact center answer | agent | real-time suggested answer and summary | contacts per hour and handle-time band | listening while reading, customer emotion, account navigation | long call streak, escalation pressure, chat concurrency | wrong customer answer, missed complaint, unapproved commitment | concise source cards, prohibited phrase guard, transfer trigger | transcript marker, suggestion, source-open log |
| Complaint final response | complaint specialist | allegation extraction, deadline tracker, draft response | complaint age and regulatory deadline | legal wording, emotional content, root-cause analysis | deadline clusters, repeated severe cases | missed allegation, deadline breach, poor remediation | severity checklist, compliance escalation, final QA sample | allegation map, approval chain, response version |
| Fraud block review | fraud analyst | risk signal explanation, action recommendation | real-time alerts by loss exposure | time pressure, false positive cost, customer friction | alert bursts, high-value cases, tool latency | wrongful block, missed fraud, customer harm | signal decomposition, reversible action preference, supervisor escalation | model signal, approval trace, downstream action |
| Collections hardship conversation | collections specialist | hardship detection, program match, script draft | hardship signals and delinquency stage | emotional labor, vulnerability, affordability evidence | difficult-call streak, aggressive target pressure | unsuitable plan, prohibited language, complaint | vulnerability routing, affordability checklist, language QA | hardship signal, program eligibility, script edits |

Load calculation for a release gate:

required_review_hours =
  AI_assisted_items
* review_rate
* average_handling_minutes
* complexity_multiplier
* double_review_multiplier
* rework_multiplier
/ 60

Capacity sanity check:

available_review_hours =
  reviewers
* productive_hours_per_shift
* skill_match_percentage
* attendance_factor

Release rule:

available_review_hours must exceed required_review_hours
with surge reserve for incidents, policy changes and volume spikes.
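
The three formulas above can be checked together in a few lines. The following is a minimal sketch rather than a sizing tool: the class and function names, the 20 percent default surge reserve and the example figures are assumptions introduced here for illustration. It treats skill_match_percentage and attendance_factor as fractions between 0 and 1, and assumes demand and capacity are measured over the same period (for example, one shift).

```python
from dataclasses import dataclass


@dataclass
class ReviewDemand:
    ai_assisted_items: int            # items in the planning period
    review_rate: float                # share of items sent to human review (0..1)
    average_handling_minutes: float
    complexity_multiplier: float = 1.0
    double_review_multiplier: float = 1.0
    rework_multiplier: float = 1.0


@dataclass
class ReviewCapacity:
    reviewers: int
    productive_hours_per_shift: float
    skill_match_percentage: float     # assumed to be a fraction (0..1)
    attendance_factor: float          # assumed to be a fraction (0..1)


def required_review_hours(d: ReviewDemand) -> float:
    """Multiply the demand drivers, then convert minutes to hours."""
    minutes = (
        d.ai_assisted_items
        * d.review_rate
        * d.average_handling_minutes
        * d.complexity_multiplier
        * d.double_review_multiplier
        * d.rework_multiplier
    )
    return minutes / 60


def available_review_hours(c: ReviewCapacity) -> float:
    """Discount headline staffing by skill match and attendance."""
    return (
        c.reviewers
        * c.productive_hours_per_shift
        * c.skill_match_percentage
        * c.attendance_factor
    )


def passes_release_rule(d: ReviewDemand, c: ReviewCapacity,
                        surge_reserve: float = 0.2) -> bool:
    """Capacity must exceed demand plus a reserve (the 0.2 default is hypothetical)."""
    return available_review_hours(c) > required_review_hours(d) * (1 + surge_reserve)


# Illustrative numbers only.
demand = ReviewDemand(ai_assisted_items=1200, review_rate=0.5,
                      average_handling_minutes=6, complexity_multiplier=1.25)
capacity = ReviewCapacity(reviewers=20, productive_hours_per_shift=6.5,
                          skill_match_percentage=0.8, attendance_factor=0.9)
print(required_review_hours(demand), available_review_hours(capacity),
      passes_release_rule(demand, capacity))
```

Keeping the reserve as an explicit parameter makes the release-gate discussion about a named number instead of an unstated buffer.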

4. Template: Automation Bias Control Matrix

| Bias risk | Control | Product implementation | Architecture implementation | Owner | Telemetry | Evidence |
| --- | --- | --- | --- | --- | --- | --- |
| AI recommendation anchors reviewer | Evidence-first review | Show evidence and required fields before recommendation for P0/P1 tasks. | UI state machine blocks recommendation until preliminary review step is complete. | PM + Architect | preliminary decision delta, evidence-open rate | UI event sequence |
| Reviewer accepts by habit | No default accept | Accept, edit, reject and escalate are neutral actions with no preselected button. | Action API requires explicit action and reason code. | Product + Engineering | accept rate, edit depth, reason-code distribution | decision record |
| Fluency hides uncertainty | Confidence decomposition | Display model score, retrieval support, policy certainty and data completeness separately. | AI gateway returns confidence components and missing-evidence flags. | AI Architect | low-support answer rate, conflict flag count | AI trace |
| Speed metric suppresses challenge | Balanced scorecard | Performance view includes quality, valid overrides, escalation accuracy and missed-risk defects. | Dashboard joins queue, QA and customer impact data. | Ops Lead | throughput plus QA defect rate | management report |
| Reviewer fatigue reduces challenge | Fatigue-aware routing | Cap consecutive complex cases and route breaks or simpler work. | Queue engine tracks complexity streak and shift load. | Ops Lead | complex-case streak, defect by hour | queue log |
| AI output appears institutionally endorsed | Challenge prompt | High-risk accept requires answer to "What evidence would make this wrong?" | Reviewer action schema includes challenge response. | Risk + PM | challenge completion and defect correlation | decision packet |
| Exceptions treated as normal cases | Escalation trigger | Complaint, vulnerability, legal threat, PEP, sanctions, adverse action and hardship signals route differently. | Risk classifier emits escalation tags and blocks normal closure. | BA + Architect | escalation rate, missed escalation defects | escalation record |
| Team stops detecting model drift | Sentinel cases | Add known hard cases and model-weakness cases to QA and calibration. | QA service manages sentinel sample labels and adjudication. | Model Risk + QA | sentinel miss rate | calibration report |
| Human corrections disappear | Structured feedback loop | Operator selects defect type and desired correction. | Feedback creates knowledge, prompt, model or workflow ticket with owner and SLA. | Product + Data Owner | correction closure time | improvement ticket |
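
Several rows of the matrix describe checks that an action API could enforce server-side: no default accept, a required reason code, and a challenge response for high-risk accepts. The sketch below shows one way such validation might look. The `ReviewerAction` schema, the field names, the error strings and the choice to treat P0 and P1 as "high-risk" are all assumptions made for illustration, and the `evidence_opened` flag is only a rough stand-in for the evidence-first UI state machine.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_ACTIONS = {"accept", "edit", "reject", "escalate"}
HIGH_RISK_TIERS = {"P0", "P1"}  # assumption: which tiers count as high-risk


@dataclass
class ReviewerAction:
    risk_tier: str
    action: Optional[str] = None          # no default accept: must be set explicitly
    reason_code: Optional[str] = None
    evidence_opened: bool = False
    challenge_response: Optional[str] = None


def validate_action(a: ReviewerAction) -> List[str]:
    """Return a list of violations; an empty list means the action may proceed."""
    errors = []
    if a.action not in ALLOWED_ACTIONS:
        errors.append("explicit action required")
    if not a.reason_code:
        errors.append("reason code required")
    if a.risk_tier in HIGH_RISK_TIERS:
        if not a.evidence_opened:
            errors.append("evidence must be opened before action")
        if a.action == "accept" and not (a.challenge_response or "").strip():
            errors.append("challenge response required for high-risk accept")
    return errors


# A P0 accept submitted with nothing else filled in is rejected on three counts.
print(validate_action(ReviewerAction(risk_tier="P0", action="accept")))
```

Enforcing these rules in the API rather than only in the UI means the decision record cannot contain an accept that skipped them.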

5. Template: Trust Calibration Script

Use this script in training, supervisor coaching and in-product microcopy. It teaches operators how to rely on AI neither too much nor too little.

| Moment | Approved script | Purpose | Avoid saying | Required evidence |
| --- | --- | --- | --- | --- |
| AI appears in workflow | "This AI output is an assistant-generated recommendation. You remain accountable for the final action within your authority." | Clarify responsibility. | "The AI has reviewed this case." | user role, authority matrix |
| Evidence is strong | "The answer is supported by current policy source, system-of-record data and no detected conflict. Confirm the evidence before sending or acting." | Encourage efficient but verified reliance. | "High confidence means safe to accept." | source version, data freshness, conflict check |
| Evidence is incomplete | "The AI found partial support but missing information affects the decision. Resolve missing fields or escalate before final action." | Prevent unsupported action. | "Use your judgment" without a path. | missing field list, escalation rule |
| Recommendation is high impact | "For this action, first decide whether the evidence supports the action, then review the AI recommendation." | Reduce anchoring. | "AI recommends approval." as first visible item. | preliminary review step, recommendation reveal event |
| Operator disagrees | "Override is expected when evidence, policy or customer context contradicts the AI. Select a reason so QA can improve the system." | Normalize valid challenge. | "Overrides reduce AI adoption." | override reason, supporting source |
| Operator is unsure | "Escalate when evidence conflicts, authority is insufficient, customer vulnerability is present or the consequence is not reversible." | Convert uncertainty into governed escalation. | "Try to resolve it yourself." | escalation trigger and destination |
| Post-error coaching | "The goal is calibrated trust: use AI where evidence supports it, challenge it where evidence is weak, and escalate where authority or risk requires it." | Build durable mental model. | "Do not trust AI" or "trust the model." | QA defect, corrected example |

6. Template: QA Sampling Plan

6.1 Sample Design

| Sample layer | Coverage | Sampling unit | Minimum production use | Independence | Escalation trigger | Evidence |
| --- | --- | --- | --- | --- | --- | --- |
| Mandatory QA | 100 percent for P0 actions | tool action, final response, adverse action reason | account freeze, formal complaint final response, high-risk credit exception | second-line or authorized senior reviewer | any critical defect | full evidence packet |
| Risk-based QA | elevated rate for high-risk P1 | case or recommendation | AML high-risk closure, fraud high-value alert, hardship plan | independent reviewer from same domain | defect rate above risk appetite | QA result and adjudication |
| Stratified QA | representative across segment | customer-visible answer or draft | language, channel, product, region, vulnerability, age of policy | QA team | segment defect spike | sample frame |
| Sentinel QA | fixed hard cases | known edge case | policy conflict, false confidence, vulnerable customer, PEP near match | model risk or QA owner | sentinel miss | gold label and coaching record |
| Blind second review | selected decisions | recommendation or memo | credit memo, AML close, complaints severity | reviewer cannot see first decision or AI recommendation initially | high disagreement or weak rationale | decision comparison |
| Incident surge QA | temporary expanded sample | affected workflow | stale knowledge, model release issue, prompt defect, vendor outage | incident QA cell | critical defect or unknown scope | incident evidence log |

6.2 Sampling Formula

daily_QA_sample =
  mandatory_P0_items
+ max(risk_based_minimum, ceil(P1_volume * P1_sample_rate))
+ stratified_segment_minimums
+ sentinel_case_count
+ incident_surge_addon

Example baseline for pilot:

| Risk tier | QA treatment |
| --- | --- |
| P0 | 100 percent QA or dual approval before downstream action |
| P1 | 10 percent risk-based QA plus all overrides and all escalations |
| P2 | 3 percent stratified sample across channel, product, language and customer vulnerability |
| P3 | 1 percent random sample plus defect-triggered targeted sample |
| Sentinel | 20 to 50 gold cases per week depending on workflow complexity |
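
The sampling formula in 6.2 can be expressed directly as a function. This is a minimal sketch: the function signature, the per-segment dictionary and every number in the example are illustrative assumptions, with only the 10 percent P1 rate taken from the pilot baseline. It covers the formula terms only, not the additional "all overrides and all escalations" rule for P1.

```python
import math
from typing import Dict


def daily_qa_sample(
    mandatory_p0_items: int,
    p1_volume: int,
    p1_sample_rate: float,
    risk_based_minimum: int,
    stratified_segment_minimums: Dict[str, int],
    sentinel_case_count: int,
    incident_surge_addon: int = 0,
) -> int:
    """Sum the sample layers; the P1 layer never falls below its floor."""
    p1_sample = max(risk_based_minimum, math.ceil(p1_volume * p1_sample_rate))
    return (
        mandatory_p0_items
        + p1_sample
        + sum(stratified_segment_minimums.values())
        + sentinel_case_count
        + incident_surge_addon
    )


# Hypothetical pilot day: 12 P0 items, 425 P1 items sampled at 10 percent.
segments = {"chat": 5, "phone": 5, "non_english": 3}
print(daily_qa_sample(12, 425, 0.10, 30, segments, sentinel_case_count=6))
```

The `max(...)` term matters on low-volume days: with 100 P1 items the 10 percent rate would yield only 10 samples, so the floor of 30 applies instead.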

6.3 QA Defect Taxonomy

| Defect class | Description | Example |
| --- | --- | --- |
| Unsupported claim | Output or human decision lacks authoritative evidence. | Contact center answer cites a retired fee policy. |
| Missed escalation | Case met escalation criteria but stayed in normal flow. | Complaint includes legal threat and is treated as servicing inquiry. |
| Automation bias | Human accepted AI despite visible contradiction or weak support. | AML alert closed while transaction cluster matched typology. |
| Under-reliance | Human ignored correct AI assistance and increased error or delay. | Underwriter manually rewrites accurate income summary and introduces discrepancy. |
| Authority breach | Reviewer approved action outside role or certification. | Agent approves fee waiver above limit. |
| Communication defect | Customer-visible language is misleading, non-compliant or harmful. | Collections script pressures a hardship customer. |
| Evidence packet defect | Audit cannot reconstruct decision. | Missing retrieved source version or reviewer reason code. |

7. Template: Escalation Design

| Trigger | Stop condition | Destination | SLA | Decision rights | Communication rule | Evidence |
| --- | --- | --- | --- | --- | --- | --- |
| AI evidence conflicts with system-of-record data | customer-visible response or downstream action blocked | domain SME or supervisor | same business day for standard cases, immediate for live fraud or complaints risk | SME can approve, reject, request more evidence or safe-stop | tell frontline "evidence conflict requires specialist review" | conflict trace and source versions |
| Customer vulnerability or hardship signal | collections recommendation cannot be finalized | vulnerability-trained lead | same day | lead approves hardship path or escalation | use approved empathetic script, no pressure language | vulnerability signal, script, decision |
| Legal threat or regulator mention | complaint response blocked | complaints lead and compliance | same day or regulatory deadline-driven | compliance-trained role approves final response | no legal admission without approved review | allegation map, draft, approval |
| High-value fraud action | account block or release requires approval | fraud supervisor | real time or defined fraud SLA | supervisor approves reversible action or enhanced verification | customer contact follows fraud script | risk signals, action preview, approval |
| Credit adverse action uncertainty | adverse action reason blocked | senior underwriter or fair lending review | before decision notice | authorized underwriter approves reason codes | customer notice uses approved reason language | memo, policy, reason code |
| AML high-risk close | alert closure blocked | senior AML investigator | before case close | senior investigator approves close or escalation | internal narrative only unless required workflow says otherwise | alert evidence, typology, close rationale |
| Control failure trend | automation route paused | AI incident owner and business owner | immediate triage | business owner can pause route, risk can require additional controls | management notification follows incident protocol | dashboard signal, defect samples, action log |

Escalation design rules:

  • Escalation is not a button unless it has a destination, SLA, receiving role, decision right and evidence requirement.
  • High-risk uncertainty should stop or narrow automation, not simply add a note.
  • Escalation volume is a signal. A sudden drop can be as concerning as a spike.
  • Escalation outcomes must feed training, eval cases, knowledge updates and release gates.
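
Two of these rules lend themselves to simple automated checks: an escalation path is incomplete unless every required attribute is filled in, and escalation volume should be watched in both directions. The sketch below illustrates both under stated assumptions: the `EscalationRule` fields mirror the first rule, while the function names, the 50 percent tolerance and the signal labels are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class EscalationRule:
    trigger: str
    destination: str
    sla: str
    receiving_role: str
    decision_right: str
    evidence_requirement: str


def missing_fields(rule: EscalationRule) -> List[str]:
    """List attributes left blank; a rule with gaps is 'just a button'."""
    return [f.name for f in fields(rule) if not getattr(rule, f.name).strip()]


def volume_signal(current: int, baseline: float, tolerance: float = 0.5) -> str:
    """Flag spikes and drops alike relative to a baseline (tolerance is hypothetical)."""
    if baseline <= 0:
        return "no-baseline"
    change = (current - baseline) / baseline
    if change >= tolerance:
        return "spike"
    if change <= -tolerance:
        return "drop"
    return "normal"


rule = EscalationRule(
    trigger="legal threat or regulator mention",
    destination="complaints lead and compliance",
    sla="",  # left blank on purpose to show the check
    receiving_role="compliance-trained role",
    decision_right="approves final response",
    evidence_requirement="allegation map, draft, approval",
)
print(missing_fields(rule), volume_signal(current=10, baseline=40))
```

A "drop" result deserves the same follow-up as a "spike", since it may mean operators have stopped escalating rather than that risk has fallen.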

8. Template: Evidence Packet

| Artifact | Required fields | Generated by | Reviewer use | Audit question |
| --- | --- | --- | --- | --- |
| Work item header | case id, workflow, risk tier, customer impact, SLA, jurisdiction, channel | workflow engine | understand priority and constraints | Why was this case handled this way? |
| AI output record | output text, recommendation, draft, model id, prompt id, timestamp, confidence components | AI gateway | compare output to evidence and scope | What did AI produce and under what version? |
| RAG evidence record | retrieved source ids, source authority, version, chunk ids, freshness, citation support | retrieval service | validate support and detect stale content | Which sources supported the output? |
| Tool observation record | system queried, parameters, result summary, latency, errors, permissions | tool gateway | verify system-of-record facts | What external facts or actions were used? |
| Human action record | view sequence, evidence opened, action, edit diff, reason code, challenge answer, time on task | reviewer workspace | demonstrate meaningful review | Did the human review or rubber-stamp? |
| Decision rights record | role, certification, authority limit, independence check, conflict-of-interest result | IAM and workflow policy | confirm reviewer was allowed to decide | Was the decision made by the right person? |
| Escalation record | trigger, destination, SLA, receiving role, outcome, final approver | workflow engine | track unresolved or high-risk work | Was escalation timely and effective? |
| QA record | sample frame, QA reviewer, defect class, severity, adjudication, remediation | QA system | assess control quality | Did second-line testing validate the control? |
| Improvement record | ticket id, owner, fix type, release id, regression eval, closure evidence | product backlog and governance tool | prove learning loop closure | Did the organization improve after defects?  |

Evidence packet acceptance standard:

A qualified reviewer or auditor can reconstruct the decision without interviewing
the original operator, reading private chat, or relying on memory.
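
One way to make the acceptance standard testable is an automated completeness check that runs before a packet is filed. The sketch below is illustrative only: the snake_case keys are an assumed encoding of a subset of the required fields listed in the evidence packet table, and it assumes four artifacts are always required while escalation, QA and improvement records are conditional.

```python
from typing import Any, Dict, List

# Assumed encoding of a subset of the required fields; adapt to the real schema.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "work_item_header": ["case_id", "workflow", "risk_tier", "customer_impact",
                         "sla", "jurisdiction", "channel"],
    "ai_output_record": ["output_text", "model_id", "prompt_id", "timestamp",
                         "confidence_components"],
    "human_action_record": ["view_sequence", "evidence_opened", "action",
                            "reason_code", "time_on_task"],
    "decision_rights_record": ["role", "certification", "authority_limit",
                               "independence_check"],
}

EMPTY_VALUES = (None, "", [], {})


def packet_gaps(packet: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each artifact to its missing or empty fields; {} means no gaps found."""
    gaps: Dict[str, List[str]] = {}
    for artifact, required in REQUIRED_FIELDS.items():
        record = packet.get(artifact)
        if record is None:
            gaps[artifact] = list(required)  # whole artifact absent
            continue
        missing = [name for name in required if record.get(name) in EMPTY_VALUES]
        if missing:
            gaps[artifact] = missing
    return gaps


# A packet with only a partial AI output record fails on every artifact.
print(packet_gaps({"ai_output_record": {"output_text": "draft", "model_id": "m-1"}}))
```

A check like this only proves that fields are present. Whether the packet actually lets an auditor reconstruct the decision still needs sampled replay by QA.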

9. PM / BA / Architecture Questions

9.1 PM Questions

| Question | Strong answer should include |
| --- | --- |
| Which human task is AI changing? | specific review unit, before/after workflow, expected time and quality impact |
| Where can AI reduce burden and where can it add burden? | distinction between summarization benefit, verification cost, escalation cost and documentation cost |
| What level of trust should users have? | trust by risk tier, evidence strength, reversibility and user skill |
| What metrics would prove adoption is healthy? | not only usage and handle time, also evidence-open rate, valid override, escalation accuracy and customer impact |
| What happens if review volume exceeds capacity? | surge staffing, throttle, safe-stop, deferral rules and management notification |
| Which customer harms are unacceptable? | concrete harms such as wrongful block, unfair denial, missed complaint, coercive collections language |

9.2 BA Questions

| Question | Strong answer should include |
| --- | --- |
| What are the review units? | claim, draft, recommendation, tool action, case, sampled outcome |
| What evidence must the operator see? | source of truth, policy version, missing fields, conflicts, system facts and AI trace |
| What decisions can each role make? | accept, edit, override, escalate, approve, safe-stop and limits |
| What are the exception triggers? | legal threat, vulnerability, PEP, sanctions, adverse action, high-value fraud, policy conflict |
| What data must be captured? | reason code, evidence references, authority, edit diff, timing, escalation and QA outcome |
| What acceptance criteria prove meaningful review? | evidence interaction, correct decision, reason quality, escalation accuracy and audit replay |

9.3 Architecture Questions

| Question | Strong answer should include |
| --- | --- |
| Where is review policy enforced? | workflow engine or policy service, not only UI text |
| How is automation bias reduced at runtime? | evidence-first flow, no default accept, challenge prompt, confidence decomposition and QA |
| How do traces connect AI and human action? | shared trace id across model, retrieval, tool, reviewer action and downstream system |
| How are skill and authority enforced? | IAM, role certification, queue eligibility and action policy |
| How does the system respond to incident conditions? | route pause, sampling surge, rollback, communication and governance review |
| How does feedback improve the system? | defect taxonomy mapped to knowledge update, prompt change, model eval, workflow fix or training |

10. Release Checklist

10.1 Product And Workflow

  • Review unit is defined for every AI-assisted workflow.
  • Risk tiers include customer impact, financial impact, regulatory sensitivity and reversibility.
  • AI role is explicit: summarize, retrieve, draft, recommend, classify or propose action.
  • Human role is explicit: accept, edit, reject, override, approve, escalate or safe-stop.
  • User-facing and employee-facing trust messages are approved for each risk tier.
  • Recovery path exists for wrong answer, wrong action, missing evidence and customer complaint.

10.2 Operations And Capacity

  • Operator load map includes volume, AHT, skill, complexity, fatigue and surge assumptions.
  • Capacity exceeds required review hours with incident reserve.
  • Skill routing covers domain, product, language, risk, authority and independence.
  • Reviewer training includes AI failure modes, not only screen usage.
  • Calibration uses gold cases and affects eligibility for high-risk queues.
  • Queue dashboard includes backlog, SLA, complexity, fatigue and quality metrics.

10.3 Bias And Trust Controls

  • P0/P1 workflows remove default accept.
  • P0/P1 workflows use evidence-first or blind first-pass where anchoring risk is high.
  • Confidence is decomposed into model, retrieval, policy and data-completeness signals.
  • Reason code and evidence reference are required for high-impact accept, edit, reject, override or escalation.
  • Management scorecards balance productivity with quality, escalation and customer impact.
  • Sentinel cases and blind second reviews are active before expansion.

10.4 Architecture And Evidence

  • AI gateway captures model, prompt, input, output, confidence components and policy decision.
  • RAG layer captures source id, version, freshness, authority and citation support.
  • Tool gateway captures parameters, permissions, result and downstream side effect.
  • Reviewer workspace captures evidence opened, action, edit diff, reason and time on task.
  • Shared trace id connects AI output, human action and downstream system.
  • Evidence packet can be replayed by QA or audit without relying on memory.

10.5 Governance

  • Risk, compliance, model risk, operations and business owner reviewed the control design.
  • QA sampling plan covers risk-based, stratified, sentinel and incident surge samples.
  • Escalation paths have destination, SLA, authority and communication rule.
  • Safe-stop criteria are documented and tested.
  • Residual risk and expansion criteria are approved by accountable owners.
  • Post-release review date and dashboard owner are assigned.

11. Executive Narrative

11.1 One-Minute Executive Version

We should not describe this release as simply adding a human in the loop. The real control is whether qualified people can challenge AI under production workload. This playbook designs the human side as an operating architecture: risk-based routing, workload capacity, evidence-first review, automation bias controls, second-line QA, escalation rights and audit evidence.

The business value is sustainable automation. We can reduce manual effort where AI is reliable, but we avoid shifting hidden work to specialists or creating a rubber-stamp review queue. The release gate should ask three questions:

  1. Can operators handle the expected volume without fatigue-driven quality loss?
  2. Can they see enough evidence and authority boundaries to challenge AI?
  3. Can we prove through trace and QA that human review actually reduced risk?

11.2 Board / Audit Committee Version

The AI system relies on human oversight for customer-impacting workflows. Management has designed oversight as a measurable production control, not a general assurance statement. Controls include risk-tiered review, skill and authority routing, evidence-first workspaces, explicit override and escalation rights, QA sampling, calibration, training and traceable evidence packets.

Management will monitor operator load, automation reliance, evidence use, QA defects, customer impact and escalation performance. Expansion decisions will depend on control performance, not only efficiency or adoption metrics.

11.3 Product Portfolio Version

For the portfolio, this pattern becomes reusable across AML, credit, fraud, complaints, collections and contact center AI. Each use case configures its own risk tiers, skill matrix, evidence packet and QA sample, while the platform provides common trace, routing, reviewer actions, feedback taxonomy and governance dashboards.


12. Interview Drills

Drill 1: "Isn't human review enough to control AI risk?"

Strong answer:

No. Human review is only effective if the human has capacity, skill, evidence,
independence, authority and escalation rights. Otherwise it becomes a bottleneck
or rubber stamp. I would design review as an operating architecture with risk-tiered
routing, evidence-first workspace, automation bias controls, QA sampling and audit trace.

Drill 2: "How would you detect automation bias in production?"

Strong answer:

I would monitor accept rate, edit depth, evidence-open rate, override validity,
blind-pass delta, escalation trend and QA defects. A very high accept rate with low
evidence interaction is not automatically good. It may indicate anchoring or pressure
to accept AI. I would add sentinel cases and blind second reviews to validate.
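
To make the answer concrete, a candidate could sketch the monitoring query. The example below is a minimal illustration: the decision-record shape (`reviewer`, `action`, `evidence_opened`) and all three thresholds are assumptions for discussion, not calibrated values. It flags reviewers whose accept rate is very high while their evidence-open rate is very low.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def flag_possible_rubber_stamping(
    decisions: Iterable[Dict],
    min_decisions: int = 20,            # hypothetical: skip tiny samples
    accept_rate_floor: float = 0.95,    # hypothetical threshold
    evidence_open_ceiling: float = 0.30,  # hypothetical threshold
) -> List[str]:
    """Return reviewers with a high accept rate and low evidence interaction."""
    totals = defaultdict(lambda: {"n": 0, "accepts": 0, "opened": 0})
    for d in decisions:
        t = totals[d["reviewer"]]
        t["n"] += 1
        t["accepts"] += d["action"] == "accept"
        t["opened"] += bool(d["evidence_opened"])

    flagged = []
    for reviewer, t in totals.items():
        if t["n"] < min_decisions:
            continue
        accept_rate = t["accepts"] / t["n"]
        open_rate = t["opened"] / t["n"]
        if accept_rate >= accept_rate_floor and open_rate <= evidence_open_ceiling:
            flagged.append(reviewer)
    return sorted(flagged)
```

A flag is a prompt for sentinel cases, blind second review or a workload check, not proof of bias: the pattern can also come from queue pressure or from evidence being shown somewhere the telemetry does not capture.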

Drill 3: "How do you reduce cognitive load for an AML investigator copilot?"

Strong answer:

I would not just shorten the summary. I would define the review unit, rank evidence
by authority, expose missing and conflicting evidence, group related transactions,
show typology support, require close reason codes and route high-risk cases to
senior investigators. The goal is to reduce navigation and narrative burden while
preserving independent investigation.

Drill 4: "What is the difference between confidence and calibrated trust?"

Strong answer:

Confidence is a system signal. Calibrated trust is a human behavior outcome.
In financial retail I would separate model confidence, retrieval support, policy
certainty and data completeness, then observe whether operators rely more when
evidence is strong and escalate when evidence is weak or risk is high.

Drill 5: "What would you tell a CTO before scaling a copilot?"

Strong answer:

I would ask for proof that controls work under load: queue capacity, skill routing,
no-default-accept for high-risk work, evidence trace, second-line QA, safe-stop,
and production telemetry connecting AI output to human action and downstream impact.
Scaling without that proof turns human review into control theater.

13. Reference Anchors

| Anchor | Link | Playbook use |
| --- | --- | --- |
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Organizes human factors governance, mapping, measurement and management. |
| NIST bias publication | https://www.nist.gov/blogs/taking-measure/powerful-ai-already-here-use-it-responsibly-we-need-mitigate-bias | Supports treating bias as a socio-technical deployment issue, not only a model metric. |
| Microsoft Guidelines for Human-AI Interaction | https://www.microsoft.com/en-us/research/project/guidelines-for-human-ai-interaction/ | Provides interaction principles translated here into operational review controls. |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Anchors AI management system thinking for responsibility, competence, operation and improvement. |
| ISO/IEC/IEEE 42010 | https://www.iso.org/standard/74393.html | Supports architecture description through stakeholder concerns, views, decisions and evidence. |
| OpenTelemetry docs | https://opentelemetry.io/docs/ | Anchors trace, metric and log design for AI-human workflow observability. |