AI Architecture Fitness Functions / Continuous Governance Playbook
Version: v1.0
Date: 2026-06-29
Audience: AI Product Lead, Senior BA / CBAP, Solutions Architect, Enterprise Architect, Platform PM, Model Risk, Security, Privacy, Compliance, Internal Audit, Release Manager, Financial Retail Business Owner
1. Purpose / Audience / Core View
Purpose
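This playbook uses architecture fitness functions as an advanced method for AI product and enterprise architecture governance. The goal is not another go-live checklist. The goal is to connect quality attributes, evals, policy-as-code, observability, release gates, control evidence and product metrics into a governance system that runs continuously.
It fits the following scenarios:
Financial retail AI copilots, RAG assistants, decision support, agent workflows, and fraud / AML / credit / customer service AI.
Enterprise architecture teams that want to turn architecture review from a one-off meeting into executable engineering gates.
AI product teams that must prove a use case still satisfies its architecture constraints while models, prompts, RAG indexes, tool schemas, policies and data keep changing.
Solution architects who need to turn NFRs / quality attributes from document language into pipeline gates, runtime monitors and evidence packs.
CBAP / Senior BAs who want to show they do more than write requirements, and can turn business risk, quality attributes and release controls into checkable systems.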
Core View
Architecture fitness functions are executable or semi-executable architecture constraints. They continuously test whether an AI system still satisfies its intended quality attributes, risk boundaries and product outcomes while its code, data, models, prompts, RAG indexes, tools, policies, monitoring and user adoption keep changing.
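Key points:
| Point | Meaning |
|---|---|
| Architecture governance must be executable | Quality attributes cannot live only in an architecture review deck. They must enter CI/CD gates, runtime monitors, exception memos and quarterly reviews |
| A fitness function is not an ordinary test | It verifies architecture intent and governance boundaries. Examples: "AI must not execute customer-impacting actions", "high-risk answers must carry valid citations" and "cost must not be reduced by sacrificing safety gates" |
| AI governance must be continuous | Models, prompts, RAG indexes, tool schemas, policy sources and vendor behavior keep changing, so evidence from a single review goes stale quickly |
| Gates must be linked to evidence | Every hard gate must trace back to an eval run, policy test, trace sample, control evidence, ADR and owner |
| Product metrics belong in governance too | Several questions are architecture governance signals: whether AI is adopted, whether it is over-trusted, whether it is frequently overridden, and whether it actually improves the business process |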
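Important: this document is learning, portfolio and architecture governance design material. It is not legal advice, an audit opinion, a model validation conclusion or a regulatory interpretation. Formal financial retail projects must be confirmed by Legal, Compliance, Risk, Model Risk, Security, Privacy, Data Owner, Business Owner and Internal Audit under the institution's own policies.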
2. Source Anchors
These source anchors calibrate the methodological language for AI risk management, AI management systems, observability and API contracts. This playbook turns them into an operating structure for architecture fitness functions and continuous governance.
| Anchor | Official Link | Idea Adopted in This Playbook | Translated into Fitness Functions |
|---|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organize AI risk management around Govern, Map, Measure, Manage | Write risk mapping, measurement, mitigation and monitoring as checkable gates and runtime signals |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Use an AI management system to manage scope, responsibilities, risks and opportunities, operational controls, performance evaluation and continual improvement | Institutionalize AI governance cadence, management review, exceptions, audit evidence and continual improvement |
| OpenTelemetry | https://opentelemetry.io/docs/ | Use logs, metrics and traces for observability and cross-component tracing | Put release_id, model_version, prompt_version, index_version, tool_call, user_action and evidence_link into traces and dashboards |
| OpenAPI Specification | https://spec.openapis.org/oas/latest.html | Use a standard API contract to describe interfaces, schemas, operations, security schemes and extension fields | Write agent tool contract, allowed action, side effect, approval, idempotency, rollback and audit requirement into the API contract |
Usage discipline:
Source anchors are not compliance conclusions. They provide structured governance language.
Fitness functions must be tailored by use case, risk tier, automation level, data sensitivity, customer impact and production scope.
For high-impact AI systems, every hard gate must be bound to a version boundary: code, model, prompt, RAG index, tool schema, policy source, eval dataset, release route.
An architecture review cannot end with "approved". It must produce gate rules, monitoring thresholds, exception decisions, ADR reopen triggers and evidence requirements.
3. One-Sentence Positioning
Architecture fitness functions turn AI architecture governance from episodic review into continuous, evidence-backed checks across design, build, release, runtime and quarterly management review.
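For a senior AI PM or architect, this sentence means:
| Conventional Statement | Senior-Level Statement |
|---|---|
| We did an architecture review | Key quality attributes have been turned into fitness functions and built into CI/CD gates, runtime monitoring and the exception process |
| We have evals | An eval contract binds use case risk, quality attribute, threshold, failure severity and release decision |
| We have monitoring | Monitoring can tell whether architecture assumptions still hold, and can trigger rollback, ADR reopen or quarterly review |
| We follow the governance process | Governance controls have an owner, cadence, evidence, operating effectiveness test and management review |
| We deliver business value | Product metrics sit next to risk metrics, so adoption or cost reduction is never pursued at the expense of quality attributes |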
4. Fitness Function Definition
4.1 Definition
An architecture fitness function is an executable or semi-executable check that verifies a specific architecture constraint, quality attribute or governance claim against a defined system version and operating context.
In an AI system, a fitness function contains at least the following fields:
| Field | Description | Example |
|---|---|---|
| Fitness ID | Unique ID that can be referenced across PRD, ADR, eval, gate and dashboard | AFF-RAG-GRD-001 |
| Architecture Intent | The architecture intent being protected | High-risk answers must be supported by approved sources |
| Constraint | A checkable constraint | unsupported regulated claim count = 0 |
| Scope | Applicable use case, user group, channel, release stage | retail fee waiver assistant, authenticated web chat, canary |
| Artifact | What is checked | prompt, model route, RAG index, OpenAPI tool spec, trace log |
| Check Type | automated, semi-automated, manual attestation with sample evidence | eval pipeline + weekly SME sample |
| Trigger | When it runs | pull request, index build, release candidate, canary, weekly sample, quarterly review |
| Threshold | Pass criterion | citation support >= 98%, critical failure = 0 |
| Gate Action | Action on failure | block release, limited go, rollback, exception review, ADR reopen |
| Evidence | Proof material | eval run ID, trace sample, policy test report, dashboard snapshot |
| Owner | Who is accountable for the rule and its evidence | Solution Architect, EvalOps, Model Risk, Business Owner |
| Review Cadence | How often the threshold and its applicability are reviewed | every release, monthly operational review, quarterly architecture review |
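The catalog fields map naturally onto a small data structure. The following is a minimal sketch that assumes a Python-based governance tool. The FitnessFunction and FitnessResult classes, the signal names citation_support and critical_failures, and the run ID are all hypothetical and are not part of any standard. The sketch shows one idea: a failed check returns the configured gate action, not a bare boolean, so the result can drive a release decision directly.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class FitnessFunction:
    """One catalog entry (hypothetical schema mirroring the field table)."""
    fitness_id: str
    architecture_intent: str
    scope: str
    trigger: str
    gate_action: str  # action taken when the check fails
    owner: str
    check: Callable[[Mapping[str, float]], bool]  # constraint over measured signals


@dataclass(frozen=True)
class FitnessResult:
    fitness_id: str
    passed: bool
    action: str        # "pass" or the configured gate action
    evidence_ref: str  # e.g. the eval run ID that produced the signals


def run_fitness(ff: FitnessFunction, signals: Mapping[str, float], evidence_ref: str) -> FitnessResult:
    passed = ff.check(signals)
    return FitnessResult(ff.fitness_id, passed, "pass" if passed else ff.gate_action, evidence_ref)


grounded_answers = FitnessFunction(
    fitness_id="AFF-RAG-GRD-001",
    architecture_intent="High-risk answers must be supported by approved sources",
    scope="retail fee waiver assistant, authenticated web chat, canary",
    trigger="release candidate",
    gate_action="block release",
    owner="EvalOps",
    # Threshold from the table: citation support >= 98% and zero critical failures.
    check=lambda s: s["citation_support"] >= 0.98 and s["critical_failures"] == 0,
)

result = run_fitness(grounded_answers, {"citation_support": 0.99, "critical_failures": 1}, "eval-run-0001")
print(result.action)  # block release
```

Keeping the check as a plain callable lets automated and semi-automated sources feed the same result type.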
4.2 Fitness Function vs Test vs Control
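| Object | Main Question | Typical Granularity | Role in AI Architecture Governance |
|---|---|---|---|
| Unit / integration test | Does the code or component work as intended? | function, API, pipeline step | Necessary but insufficient; mainly proves engineering correctness |
| Eval | How does AI output perform on tasks, datasets and rubrics? | prompt, model, RAG, agent task | Proves model behavior and task quality; the measurement source for many fitness functions |
| Control | Is a class of risk reduced by governance activity? | policy, process, system control | Defines what must be controlled; requires evidence |
| Fitness function | Does the architecture intent still hold under continuous change? | cross-artifact constraint | Connects evals, controls, policy-as-code, observability and gate decisions |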
In one sentence:
Tests prove components work. Controls define governance expectations. Fitness functions continuously prove architecture decisions still hold.
4.3 Automated and Semi-Automated
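Not every fitness function can be fully automated. In financial retail AI, many high-value checks are semi-automated:
| Type | Examples | When It Fits |
|---|---|---|
| Fully automated | OpenAPI tool spec lint, policy-as-code gate, schema drift check, latency SLO, cost threshold, DLP scan | The engineering constraint is explicit and the data is machine-readable |
| Semi-automated | SME sample review, model risk challenge, citation audit, exception review, adverse impact analysis | High risk; requires professional judgment or sampling |
| Manual attestation with evidence | Business owner risk acceptance, quarterly management review, legal interpretation sign-off | Value judgments, risk acceptance, management accountability |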
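Semi-automated does not mean weak control. Mature practice records the inputs, samples, rubric, reviewer, date, decision and evidence link of every human judgment in a structured form.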
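One way to structure a semi-automated check is to treat each human judgment as a record that must be complete before it counts as gate evidence. This is a hypothetical sketch. The HumanReviewRecord fields follow the items listed above. The decision vocabulary and the default minimum sample size of 30 are placeholder assumptions that each institution would set for itself.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical decision vocabulary; align it with the institution's review procedure.
ALLOWED_DECISIONS = {"pass", "fail", "pass_with_findings"}


@dataclass
class HumanReviewRecord:
    fitness_id: str
    rubric_version: str
    reviewer: str
    review_date: date
    sample_ids: list[str] = field(default_factory=list)
    decision: str = ""
    evidence_link: str = ""


def validate_review(rec: HumanReviewRecord, min_sample: int = 30) -> list[str]:
    """Return the problems that would stop this review from counting as gate evidence."""
    problems = []
    if not rec.rubric_version:
        problems.append("missing rubric_version")
    if not rec.reviewer:
        problems.append("missing reviewer")
    distinct = len(set(rec.sample_ids))  # duplicated samples do not add coverage
    if distinct < min_sample:
        problems.append(f"sample too small: {distinct} < {min_sample}")
    if rec.decision not in ALLOWED_DECISIONS:
        problems.append(f"unknown decision: {rec.decision!r}")
    if not rec.evidence_link:
        problems.append("missing evidence_link")
    return problems
```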
5. Why AI Architecture Governance Must Become Continuous
5.1 Point-in-Time Review Fails Under AI Change
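Traditional architecture review often happens once, at design time or before go-live. For AI systems, the findings of such a review go stale quickly:
| Change Source | Why It Invalidates the Earlier Review |
|---|---|
| Model version | The same prompt on a new model may be more confident, call tools more readily, or slip past refusal boundaries more easily |
| Prompt / policy | Small edits can shift refusal, escalation, tool calling and tone boundaries |
| RAG index | New sources, stale sources, and changes to chunking, embedding or reranker affect facts and citations |
| Data distribution | Shifts in customer behavior, fraud patterns, complaint types and transaction structure mean eval scores no longer represent real risk |
| Tool schema | When an agent moves from read-only to write / execute, its architecture risk level changes outright |
| Product adoption | Users may over-trust AI or bypass it entirely; both break the original business assumptions |
| Vendor behavior | Hosted models, embeddings, moderation, API rate limits, data terms and regional availability change |
| Regulation / policy | Changes in internal policy, regulatory focus and audit scope change the evidence required for release |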
5.2 Continuous Governance Loop
Recommended closed loop:
Architecture intent
-> quality attribute scenario
-> fitness function
-> CI/CD gate
-> release decision
-> runtime monitor
-> evidence graph
-> exception / incident / product metric review
-> quarterly architecture review
-> ADR update or control redesign
5.3 Governance Cadence
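| Cadence | Goal | Example Fitness Function |
|---|---|---|
| Design-time | Define non-negotiable constraints in the architecture design and ADRs | High-impact agents cannot directly execute irreversible customer actions |
| Pull request / merge | Stop code, prompt, schema or config changes from breaking constraints | OpenAPI spec must declare side effect, approval and audit fields |
| Build / CI | Verify that artifacts can be built, tested and scanned | prompt policy tests pass, dependency scan pass, schema compatibility pass |
| Release candidate | Use evals and gates to decide whether to release | high-risk critical failure = 0, citation support >= threshold |
| Canary / ramp | Validate assumptions with real traffic | override spike, complaint signal, cost burst, p95 latency breach |
| Runtime | Detect drift, incidents, misuse and control failures | wrong citation alert, tool abuse alert, budget alert |
| Monthly operational review | Review production quality, incidents, exceptions and user adoption | dashboard trend, issue aging, sample audit |
| Quarterly architecture review | Decide whether architecture decisions and governance assumptions still hold | ADR reopen, threshold recalibration, exception expiry, platform roadmap update |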
6. Fitness Function Taxonomy
6.1 Taxonomy Overview
| Domain | Governance Question | Typical Fitness Function | Evidence |
|---|---|---|---|
| Quality | Does AI output meet business task quality and quality attributes? | groundedness, completeness, consistency, correctness, recoverability checks | eval run, SME review, trace replay |
| Safety | Does it avoid harm to customers, employees, the business, compliance or reputation? | unsafe recommendation = 0, high-risk under-escalation = 0 | safety eval, red-team, incident log |
| Security | Does it resist prompt injection, tool abuse, data exfiltration and privilege escalation? | unauthorized tool call = 0, jailbreak success below threshold | red-team, policy-as-code, access log |
| Privacy | Does it protect PII, sensitive business data and customer entitlements? | PII leakage = 0, unauthorized retrieval = 0, log minimization pass | DLP scan, access test, privacy review |
| Cost | Are per-task cost, token burn and reviewer cost sustainable? | cost per resolved case <= target, budget burn alert | cost dashboard, release report |
| Latency | Does response time fit the business process and the pace of human-AI collaboration? | p95 latency <= target by risk tier | OpenTelemetry traces, load test |
| Eval | Do evals cover real failure modes, high-risk slices and regression sets? | required eval suites present and passing | eval registry, failure analysis |
| Tool / Action | Are agent tools constrained by allowlist, approval, idempotency, audit and rollback? | write action requires approval and idempotency key | OpenAPI contract, tool registry, negative test |
| Evidence | Is release, control, exception and monitoring evidence complete? | evidence completeness >= target, missing critical evidence = 0 | evidence binder, graph query |
| Adoption | Is user adoption healthy, or is there automation bias or workflow bypass? | override pattern, blind acceptance, benefit realization | usage analytics, review sample, product metrics |
6.2 Quality Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-QUAL-GRD-001 | High-risk answers must have verifiable grounding | For regulated / customer-impacting answers, material claims must be supported by an approved source; unsupported critical claim = 0 | block release or rollback |
| AFF-QUAL-COMP-002 | Output must not omit critical business conditions | Required factor recall on the golden set >= threshold; critical missing factor = 0 | no-go or limited go |
| AFF-QUAL-CONS-003 | Output should be stable for the same policy and facts | Policy decision variance on the regression set <= threshold | require prompt / routing fix |
| AFF-QUAL-REPLAY-004 | Incidents and sampled outputs can be replayed | Trace contains release_id, model, prompt, index, tool and retrieved source references | block scale if trace incomplete |
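The following is a sketch of AFF-QUAL-GRD-001. It assumes that a citation checker or an SME review has already labeled each material claim as supported or unsupported. The gate then computes the support rate and the count of unsupported critical claims separately. The Claim type and the 0.98 default are illustrative. The example shows that a 99% support rate still fails when the one unsupported claim is critical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    critical: bool   # regulated or customer-impacting material claim
    supported: bool  # backed by an approved source (citation checker or SME label)


def grounding_gate(claims: list[Claim], min_support: float = 0.98) -> dict:
    """AFF-QUAL-GRD-001 sketch: zero unsupported critical claims AND support rate >= threshold."""
    if not claims:
        # Nothing measured is treated as a failure, never as a pass.
        return {"passed": False, "support_rate": 0.0, "unsupported_critical": 0}
    supported = sum(c.supported for c in claims)
    unsupported_critical = sum(1 for c in claims if c.critical and not c.supported)
    rate = supported / len(claims)
    passed = unsupported_critical == 0 and rate >= min_support
    return {"passed": passed, "support_rate": rate, "unsupported_critical": unsupported_critical}
```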
6.3 Safety Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-SAFE-ESC-001 | High-risk customers or cases must be escalated to a human | under-escalation = 0 for vulnerable customer, complaint, hardship, AML high-risk and suspected fraud cases | hard stop |
| AFF-SAFE-ADV-002 | AI must not give unauthorized financial advice | The wealth assistant must not output a product recommendation for users whose suitability assessment is incomplete | hard stop |
| AFF-SAFE-HARM-003 | Critical harm examples must enter the regression set | Every P1 incident or near miss must map to a regression eval case | release blocked until regression added |
| AFF-SAFE-DISC-004 | Customer-facing AI must respect disclosure and human-handoff boundaries | disclosure present, handoff path available, forbidden commitment blocked | limited go or no-go |
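AFF-SAFE-HARM-003 reduces to a set difference: incidents that should be covered, minus incidents that regression cases actually reference. The record shapes below (id, severity, near_miss, source_incident_id) are hypothetical stand-ins for an incident log and an eval registry export.

```python
def missing_regressions(incidents: list[dict], regression_cases: list[dict]) -> list[str]:
    """IDs of P1 incidents or near misses that no regression eval case points back to."""
    covered = {c.get("source_incident_id") for c in regression_cases} - {None}
    return sorted(
        i["id"]
        for i in incidents
        if (i["severity"] == "P1" or i.get("near_miss", False)) and i["id"] not in covered
    )


def harm_regression_gate(incidents: list[dict], regression_cases: list[dict]) -> str:
    gaps = missing_regressions(incidents, regression_cases)
    # Table action: release stays blocked until the missing regression cases are added.
    return "release blocked: " + ", ".join(gaps) if gaps else "pass"
```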
6.4 Security Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-SEC-INJ-001 | External content cannot override system policy | policy override success = 0 in prompt injection red-team for high-risk tools | no-go |
| AFF-SEC-AUTH-002 | AI does not expand user permissions | RAG entitlement filter test pass; cross-tenant retrieval = 0 | hard stop |
| AFF-SEC-TOOL-003 | Tool calling must be constrained by permissions and contract | Each OpenAPI operation must declare auth, side effect, approval, audit and idempotency | block merge or release |
| AFF-SEC-LOG-004 | Traces and logs do not leak secrets or sensitive fields | secret scan and DLP sample pass | block release |
6.5 Privacy Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-PRIV-MIN-001 | Prompt context and logs follow data minimization | disallowed PII fields absent from prompt payload and trace attributes | block release |
| AFF-PRIV-RET-002 | Evidence retention is minimal but sufficient and respects retention periods | evidence objects have retention tag, masking rule and access class | exception review |
| AFF-PRIV-VND-003 | Third-party model calls do not breach data-use boundaries | vendor route allowed for data class and region | block route |
| AFF-PRIV-SUBJ-004 | The customer correction or complaint path is actionable | customer-facing AI output can be linked to case, correction workflow and owner | limited go if incomplete |
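For AFF-PRIV-MIN-001, a minimal field-level check can walk the prompt payload and the trace attributes and report any disallowed key, including dotted OpenTelemetry-style attribute names. The DISALLOWED_FIELDS set is a hypothetical policy. The check inspects keys only, so it complements value-level DLP scanning and does not replace it.

```python
from typing import Any

# Hypothetical data-class policy: field names that must never reach prompts or traces.
DISALLOWED_FIELDS = {"account_number", "card_number", "national_id", "date_of_birth"}


def find_disallowed(obj: Any, path: str = "$") -> list[str]:
    """Walk nested dicts/lists and return path-like locations of disallowed keys."""
    hits: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}"
            # Also match the last dotted segment, e.g. "customer.account_number".
            leaf = key.rsplit(".", 1)[-1] if isinstance(key, str) else key
            if leaf in DISALLOWED_FIELDS:
                hits.append(child)
            hits.extend(find_disallowed(value, child))
    elif isinstance(obj, list):
        for idx, item in enumerate(obj):
            hits.extend(find_disallowed(item, f"{path}[{idx}]"))
    return hits


def minimization_gate(prompt_payload: dict, trace_attributes: dict) -> tuple[bool, list[str]]:
    hits = find_disallowed(prompt_payload, "$.prompt") + find_disallowed(trace_attributes, "$.trace")
    return (not hits, hits)  # any hit blocks release
```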
6.6 Cost and Latency Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-COST-UNIT-001 | Per-task cost matches business value | cost per completed task <= approved unit economics target | ramp hold |
| AFF-COST-ROUTE-002 | Cost optimization cannot sacrifice high-risk quality | low-cost route cannot handle Tier 1 cases unless eval parity passes | block routing change |
| AFF-LAT-P95-001 | AI response time supports the business process | p95 / p99 latency by channel and risk tier within target | rollback or route fallback |
| AFF-LAT-HITL-002 | The human review queue does not become a hidden bottleneck | review queue age and reviewer capacity within threshold | hold scale |
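AFF-LAT-P95-001 requires percentiles per risk tier, not one percentile across all traffic. The following sketch uses the nearest-rank percentile method with hypothetical per-tier targets. For p99, call the same function with 99. A tier that has no approved target is reported as a finding instead of passing silently.

```python
import math
from collections import defaultdict

# Hypothetical per-tier p95 targets in milliseconds; real values come from the approved NFRs.
P95_TARGET_MS = {"tier1": 4000, "tier2": 2500}


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_gate(samples: list[tuple[str, float]]) -> dict[str, str]:
    by_tier: dict[str, list[float]] = defaultdict(list)
    for tier, ms in samples:
        by_tier[tier].append(ms)
    decisions = {}
    for tier, values in by_tier.items():
        target = P95_TARGET_MS.get(tier)
        if target is None:
            decisions[tier] = "no target defined"  # a missing target is itself a governance gap
        elif nearest_rank(values, 95) <= target:
            decisions[tier] = "pass"
        else:
            decisions[tier] = "rollback or route fallback"
    return decisions
```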
6.7 Eval Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-EVAL-COV-001 | Evals cover real risks and the release scope | golden, regression, red-team, no-answer, conflict, high-risk slice suites present | no-go |
| AFF-EVAL-CAL-002 | An LLM judge does not decide high-risk gates alone | high-risk judge decisions have SME calibration sample and disagreement review | no-go for Tier 1 |
| AFF-EVAL-REG-003 | Every model, prompt, index or tool change triggers the affected regression suites | impacted eval suites executed and compared to baseline | block promotion |
| AFF-EVAL-ERR-004 | Average scores cannot mask critical failures | critical failure count reviewed separately from aggregate score | hard stop |
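AFF-EVAL-COV-001 and AFF-EVAL-ERR-004 can share one gate function. The function first checks that every required suite ran. It then counts critical failures independently of the mean score. The suite names follow the table. The result shape, with a scores list and a critical_failures count per suite, is an assumption about what an eval registry might return.

```python
REQUIRED_SUITES = {"golden", "regression", "red_team", "no_answer", "conflict", "high_risk_slice"}


def eval_gate(results: dict[str, dict]) -> tuple[str, list[str]]:
    """results maps suite name -> {"scores": [...], "critical_failures": int} (hypothetical shape)."""
    reasons = []
    missing = REQUIRED_SUITES - set(results)
    if missing:  # AFF-EVAL-COV-001: coverage gap
        reasons.append("missing suites: " + ", ".join(sorted(missing)))
    all_scores = [s for r in results.values() for s in r["scores"]]
    mean = sum(all_scores) / len(all_scores) if all_scores else 0.0
    critical = sum(r["critical_failures"] for r in results.values())
    if critical:
        # AFF-EVAL-ERR-004: a high mean score never offsets a critical failure.
        reasons.append(f"critical failures: {critical} (mean score {mean:.2f})")
    return ("no-go" if reasons else "pass", reasons)
```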
6.8 Tool / Action Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-ACT-READ-001 | Agents are read / draft only by default | tool registry shows no write actions unless approved ADR exists | block release |
| AFF-ACT-WRITE-002 | Write actions must be human-approved, idempotent, auditable and reversible | approval_required = true, idempotency_required = true, rollback_plan present | hard stop |
| AFF-ACT-LIMIT-003 | Amount, frequency and customer-impacting actions are capped | action limits exist and are tested in simulation | limited go |
| AFF-ACT-KILL-004 | A tool can be disabled quickly when abuse is detected | kill switch tested and tool disabled within target time | block scale if untested |
6.9 Evidence and Adoption Fitness Functions
| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-EVID-BND-001 | Release decisions have a complete evidence chain | evidence binder covers scope, architecture, eval, gate, approval, monitoring, rollback | no-go |
| AFF-EVID-EXP-002 | Exceptions do not become permanent | every exception has owner, expiry, compensating control and trigger | block exception renewal |
| AFF-EVID-ADR-003 | Key tradeoffs are recorded in an ADR | tradeoff points link to accepted or conditionally accepted ADR | no-go for scale |
| AFF-ADOPT-TRUST-001 | User adoption does not create automation bias | blind acceptance, override, edit distance and complaint trends within expected range | hold ramp |
| AFF-ADOPT-VALUE-002 | AI value actually reaches the business process | benefit metric and risk metric reviewed together | stop scale if value absent |
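AFF-EVID-EXP-002 can be checked mechanically against an export of the risk acceptance register. In this hypothetical sketch, each exception is a dict with assumed field names. The check flags a missing owner, expiry, compensating control or monitoring trigger. It also flags any exception whose expiry date has passed, because renewal needs a fresh decision.

```python
from datetime import date

REQUIRED_EXCEPTION_FIELDS = ("owner", "expiry", "compensating_control", "monitoring_trigger")


def exception_findings(register: list[dict], today: date) -> dict[str, list[str]]:
    """Map exception ID -> issues; exceptions with no issues are omitted."""
    findings = {}
    for exc in register:
        issues = [f"missing {f}" for f in REQUIRED_EXCEPTION_FIELDS if not exc.get(f)]
        expiry = exc.get("expiry")
        if isinstance(expiry, date) and expiry < today:
            issues.append("expired: renewal requires a new decision, not silent extension")
        if issues:
            findings[exc["id"]] = issues
    return findings
```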
7. From Architecture Review to CI/CD Gates
7.1 Gate Stack
| Gate | Trigger | Fitness Function Focus | Decision |
|---|---|---|---|
| G0 Intake and Risk Tier | use case registration | approved use, prohibited use, customer impact, automation level, data class | accept intake, restrict scope, reject |
| G1 Architecture Intent | solution design / ADR draft | quality attribute scenarios, tool boundary, data boundary, recovery boundary | proceed, redesign |
| G2 Contract and Policy | PR / spec change | OpenAPI tool contract, policy-as-code, prompt policy, schema compatibility | merge blocked or approved |
| G3 Build and Eval | candidate artifact | eval coverage, critical failure, red-team, cost, latency, trace completeness | candidate accepted or failed |
| G4 Release Gate | pre-production | risk acceptance, evidence binder, rollback plan, monitoring dashboard | go, limited go, no-go |
| G5 Shadow / Canary | real traffic, limited impact | runtime trace, quality sample, cost, latency, user behavior, incident signal | ramp, hold, rollback |
| G6 Scale | expanded population | operating effectiveness, exception aging, adoption, support capacity | scale, limit, redesign |
| G7 Quarterly Review | scheduled or triggered | architecture assumptions, ADR reopen, threshold recalibration, control evidence | continue, revise, retire |
7.2 Gate Decision Language
| Decision | Meaning | Required Evidence |
|---|---|---|
| Go | All applicable hard gates pass and residual risk is within the approved range | eval pass, evidence binder, owner sign-off, monitoring ready |
| Limited go | Hard gates pass, but rollout scope, users, regions, tools, data or human review must be restricted | limitation memo, compensating controls, expiry, review trigger |
| No-go | Critical failure, evidence gap, control failure, or missing rollback | failed fitness function log, remediation plan |
| Rollback | Production signals break architecture assumptions or a hard threshold | incident record, route diff, rollback evidence, impact analysis |
| Exception | The business accepts a known deviation within a limited scope | exception memo, owner, expiry, compensating control, monitoring trigger |
7.3 Policy-as-Code Pattern
Fitness functions can be expressed as policy-as-code. The rule below is conceptual. It focuses on governance semantics and is not tied to a specific tool.
Rule: a Tier 1 AI release (risk_tier is "Tier 1") cannot be promoted unless:
critical_failures equals 0
required_eval_suites all passed
rollback_target exists
monitoring_dashboard exists
evidence_binder completeness is "complete"
exception_status is "none" or "approved_with_expiry"
Typical policy inputs:
| Input | Source |
|---|---|
| risk_tier | AI inventory / intake |
| release_bundle | deployment registry |
| eval_results | eval registry |
| tool_contract | OpenAPI spec / tool registry |
| data_classification | data catalog |
| monitoring_readiness | observability platform |
| evidence_completeness | evidence graph |
| exception_status | risk acceptance register |
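The conceptual Tier 1 rule translates directly into a function over the listed inputs. Python is used here only to illustrate the semantics; a policy engine would express the same logic in its own language. The PromotionInput type is hypothetical. Releases outside Tier 1 are not constrained by this particular rule, and other tier rules would apply to them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromotionInput:
    risk_tier: str
    critical_failures: int
    required_eval_suites: dict[str, bool]  # suite name -> passed
    rollback_target: Optional[str]
    monitoring_dashboard: Optional[str]
    evidence_completeness: str
    exception_status: str


def tier1_promotion_decision(inp: PromotionInput) -> tuple[bool, list[str]]:
    """Return (allowed, deny_reasons) for the Tier 1 promotion rule."""
    if inp.risk_tier != "Tier 1":
        return True, []  # this rule does not constrain other tiers
    reasons = []
    if inp.critical_failures != 0:
        reasons.append("critical_failures must equal 0")
    if not inp.required_eval_suites or not all(inp.required_eval_suites.values()):
        reasons.append("all required eval suites must pass")
    if not inp.rollback_target:
        reasons.append("rollback_target must exist")
    if not inp.monitoring_dashboard:
        reasons.append("monitoring_dashboard must exist")
    if inp.evidence_completeness != "complete":
        reasons.append('evidence binder must be "complete"')
    if inp.exception_status not in {"none", "approved_with_expiry"}:
        reasons.append("exception_status must be none or approved_with_expiry")
    return (not reasons, reasons)
```

Returning every deny reason, instead of stopping at the first, gives the release memo a complete remediation list in one run.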
8. Runtime Monitor and Observability
8.1 Why Runtime Monitoring Is Architecture Governance
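Post-launch monitoring is not an operational afterthought. It is the primary source of evidence that architecture fitness continues to hold.
Runtime monitors should answer:
Which release bundle current production traffic is running.
Which model, prompt, RAG index, tool schema and policy versions are invoked.
Whether outputs are supported by approved evidence.
Whether tool calls meet permission, approval, idempotency, limit and audit requirements.
Whether users over-accept AI output or override it at scale.
Whether quality, cost, latency and safety metrics trigger a gate condition.
Whether a given incident can be replayed from its trace, root-caused and turned into a regression update.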
8.2 OpenTelemetry Trace Fields
Organization-specific AI trace attributes can be layered on top of the existing telemetry conventions. Attribute names should be stable, maskable, retainable and linkable to release evidence.
| Attribute | Purpose |
|---|---|
| ai.release_id | Link to the release bundle |
| ai.use_case_id | Link to the AI inventory and risk tier |
| ai.risk_tier | Support risk-tiered monitoring |
| ai.model_id | Link to the model registry |
| ai.model_version | Link to eval and rollback |
| ai.prompt_version | Link to the prompt registry |
| ai.rag_index_version | Link to the source inventory and retrieval eval |
| ai.tool_schema_version | Link to the OpenAPI contract |
| ai.eval_gate_id | Link to the release gate |
| ai.policy_version | Link to the guardrail / decision policy |
| ai.trace_sample_class | Mark normal, high-risk, incident, audit_sample |
| ai.human_review_required | Prove that HITL was triggered |
| ai.human_review_outcome | Record approve, edit, reject, escalate |
| ai.cost_usd_estimate | Feed cost fitness functions |
| ai.latency_ms_total | Feed latency fitness functions |
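Privacy discipline:
Do not put full customer prompts, PII, secrets, account numbers or transaction details directly into ordinary trace attributes.
For information needed to replay a trace, use an evidence vault, hashed references, masking, access control and retention tags.
Trace completeness and privacy minimization trade off against each other, and the chosen balance must be documented in an ADR.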
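The privacy discipline can be applied at the point where AI trace attributes are assembled. This sketch assumes the attribute names from the 8.2 table plus a hypothetical ai.customer_ref. It keeps only ai.* keys. It replaces the raw customer ID with a keyed HMAC-SHA256 reference; the key means short IDs cannot be recovered by brute-force hashing. It checks AFF-QUAL-REPLAY-004 by listing missing version keys. Key management, such as loading the key from a secrets manager, is out of scope here. With the OpenTelemetry API, each resulting pair would then be attached via span.set_attribute(key, value).

```python
import hashlib
import hmac

REQUIRED_REPLAY_KEYS = (
    "ai.release_id", "ai.model_version", "ai.prompt_version",
    "ai.rag_index_version", "ai.tool_schema_version", "ai.policy_version",
)


def customer_ref(customer_id: str, key: bytes) -> str:
    """Keyed hash that joins a trace to the evidence vault without exposing the raw ID."""
    return hmac.new(key, customer_id.encode(), hashlib.sha256).hexdigest()


def build_ai_attributes(versions: dict[str, str], customer_id: str, key: bytes) -> dict[str, str]:
    # Keep only the ai.* namespace; anything else (e.g. raw customer fields) is dropped.
    attrs = {k: v for k, v in versions.items() if k.startswith("ai.")}
    attrs["ai.customer_ref"] = customer_ref(customer_id, key)  # hypothetical attribute name
    return attrs


def replay_gaps(attrs: dict[str, str]) -> list[str]:
    """AFF-QUAL-REPLAY-004 sketch: required version keys that are missing or empty."""
    return [k for k in REQUIRED_REPLAY_KEYS if not attrs.get(k)]
```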
8.3 Runtime Dashboard
A dashboard should not show only "requests per minute". A high-impact AI dashboard should have at least 8 panel groups:
| Panel | Metrics |
|---|---|
| Release and route | active release_id, model / prompt / index / tool versions, traffic split |
| Quality | groundedness sample pass, critical failure, wrong citation, no-answer handling |
| Safety | unsafe output, under-escalation, complaint trigger, vulnerable customer handling |
| Security / privacy | prompt injection hit, unauthorized retrieval, DLP finding, sensitive log finding |
| Tool / action | tool calls by action class, denied action, approval bypass attempt, idempotency error |
| Cost / latency | cost per task, token burn, p95 / p99 latency, timeout, fallback rate |
| Human oversight | review volume, queue age, edit distance, override, reviewer disagreement |
| Adoption / value | active users, acceptance, blind acceptance signal, rework, case resolution impact |
8.4 Runtime Trigger to Governance Action
| Signal | Governance Action |
|---|---|
| Critical failure in high-risk output | immediate hold, incident triage, rollback or route disable |
| Wrong citation above threshold | freeze index promotion, run citation audit, update regression set |
| Unauthorized tool call attempt | disable tool route, security incident review, OpenAPI contract retest |
| Cost spike | hold ramp, inspect route mix, check prompt / model / retrieval changes |
| Latency breach | fallback route, queue capacity review, route or retrieval optimization |
| Override spike | sample review, root cause, retrain / prompt fix, user training review |
| Blind acceptance signal | trust calibration review, UI / workflow change, QA sampling |
| Evidence missing | stop scale, evidence owner remediation, quarterly review escalation |
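The trigger table can be encoded as a routing function that runs over one monitoring window. The action strings are taken from the table. Only five signals are shown, and their thresholds are hypothetical placeholders. A metric that is absent from the window is routed to the "Evidence missing" action instead of being treated as healthy.

```python
# Governance actions copied from the trigger table; thresholds are hypothetical placeholders.
ACTIONS = {
    "critical_failure": "immediate hold, incident triage, rollback or route disable",
    "wrong_citation": "freeze index promotion, run citation audit, update regression set",
    "unauthorized_tool_call": "disable tool route, security incident review, OpenAPI contract retest",
    "cost_spike": "hold ramp, inspect route mix, check prompt / model / retrieval changes",
    "override_spike": "sample review, root cause, retrain / prompt fix, user training review",
}
EVIDENCE_MISSING = "stop scale, evidence owner remediation, quarterly review escalation"
THRESHOLDS = {
    "critical_failure": 0,        # any occurrence fires
    "wrong_citation": 0.02,       # wrong-citation rate
    "unauthorized_tool_call": 0,  # any attempt fires
    "cost_spike": 1.5,            # ratio vs approved unit cost
    "override_spike": 0.30,       # share of AI outputs overridden
}


def governance_actions(window: dict[str, float]) -> list[tuple[str, str]]:
    """Return (signal, action) pairs for one monitoring window, in THRESHOLDS order."""
    fired = []
    for signal, threshold in THRESHOLDS.items():
        value = window.get(signal)
        if value is None:
            fired.append((signal, EVIDENCE_MISSING))  # no measurement is an evidence gap, not a pass
        elif value > threshold:
            fired.append((signal, ACTIONS[signal]))
    return fired
```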
9. Tool Contract Governance with OpenAPI
9.1 AI Governance Extension Fields
Agent action governance should be visible in the tool contract. OpenAPI can describe operations, schemas, security and custom extensions; specification extension fields must start with "x-". Organization-specific extension fields can encode AI governance requirements.
| Field | Purpose |
|---|---|
| operationId | Stable tool action name |
| security | Auth scheme and required scopes |
| x-ai-tool-risk | low, medium, high, restricted |
| x-ai-side-effect | none, draft, reversible_write, irreversible_write |
| x-ai-human-approval-required | true / false |
| x-ai-idempotency-required | true / false |
| x-ai-rollback-supported | true / false |
| x-ai-audit-fields | required audit fields |
| x-ai-release-gate | required fitness functions |
| x-ai-prohibited-use | action boundaries |
9.2 Example
```yaml
paths:
  /disputes/{disputeId}/draft-response:
    post:
      operationId: draftDisputeResponse
      summary: Draft a payment dispute response for human review
      parameters:
        # Templated path parameters must be declared in OpenAPI.
        - name: disputeId
          in: path
          required: true
          schema:
            type: string
      security:
        - oauth2:
            - disputes.read
            - disputes.draft
      x-ai-tool-risk: medium
      x-ai-side-effect: draft
      x-ai-human-approval-required: true
      x-ai-idempotency-required: true
      x-ai-rollback-supported: true
      x-ai-audit-fields:
        - release_id
        - model_version
        - prompt_version
        - reviewer_id
        - evidence_source_ids
      x-ai-release-gate:
        - AFF-ACT-WRITE-002
        - AFF-EVID-BND-001
        - AFF-QUAL-GRD-001
      x-ai-prohibited-use:
        - submit_final_response_without_human_approval
        - change_customer_account_status
```
9.3 Contract Checks
| Check | What a Failure Means |
|---|---|
| Every high-risk tool has x-ai-human-approval-required: true | Agent can perform a sensitive action without explicit human approval |
| Reversible / irreversible writes require an idempotency key | A retry or loop may duplicate a customer-impacting action |
| Tool schema has audit fields | Incident and evidence traces cannot prove what happened |
| Restricted tools require an accepted ADR | Tool risk was never formally decided |
| Tool operation is linked to release gate IDs | Contract is disconnected from governance |
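The contract checks can run as a lint step in the G2 Contract and Policy gate. This minimal sketch makes three assumptions that are not part of the OpenAPI Specification. First, the OpenAPI document has already been parsed into a dict, for example by a YAML loader. Second, accepted ADRs are exported as a set of operationIds. Third, restricted tools are treated as high-risk.

```python
HIGH_RISK = {"high", "restricted"}
WRITES = {"reversible_write", "irreversible_write"}
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def lint_ai_contract(spec: dict, accepted_adr_ops: set[str]) -> list[str]:
    """Apply the contract checks to every operation in a parsed OpenAPI document."""
    findings = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            op_id = op.get("operationId", f"{method.upper()} {path}")
            risk = op.get("x-ai-tool-risk")
            if risk in HIGH_RISK and op.get("x-ai-human-approval-required") is not True:
                findings.append(f"{op_id}: high-risk tool without human approval")
            if op.get("x-ai-side-effect") in WRITES and op.get("x-ai-idempotency-required") is not True:
                findings.append(f"{op_id}: write without idempotency key")
            if not op.get("x-ai-audit-fields"):
                findings.append(f"{op_id}: no audit fields")
            if risk == "restricted" and op_id not in accepted_adr_ops:
                findings.append(f"{op_id}: restricted tool without accepted ADR")
            if not op.get("x-ai-release-gate"):
                findings.append(f"{op_id}: not linked to release gate IDs")
    return findings
```

An empty findings list lets the merge proceed; any finding maps to the "block merge" action in the G2 gate.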
10. Financial Retail Cases
10.1 Customer Service Fee Waiver AI
| Dimension | Design |
|---|---|
| Use case | Assist agents answering fee waiver and account service questions |
| AI role | retrieve policy, draft response, cite source, recommend escalation |
| AI must not | Promise waiver, approve credit, change account, close complaint |
| Key risks | unsupported policy claim, vulnerable customer mishandling, latency hurting service, automation bias |
| Fitness functions | AFF-QUAL-GRD-001, AFF-SAFE-ESC-001, AFF-LAT-P95-001, AFF-ADOPT-TRUST-001 |
Example gate:
| Gate | Required Signal |
|---|---|
| Release | unsupported fee promise = 0 in golden and red-team sets |
| Canary | all complaint / hardship tagged conversations escalated or reviewed |
| Runtime | wrong policy citation alert triggers route fallback |
| Quarterly | review triggered if policy changes, complaint rate increases or blind acceptance rises |
10.2 Credit Policy RAG
| Dimension | Design |
|---|---|
| Use case | Internal credit analyst asks policy and exception handling questions |
| AI role | retrieve approved policy, compare facts, draft rationale |
| AI must not | Make final credit decision, generate adverse action reason without workflow, override policy |
| Key risks | outdated policy, unfair treatment, unsupported rationale, model risk evidence gap |
| Fitness functions | AFF-PRIV-VND-003, AFF-EVAL-COV-001, AFF-QUAL-COMP-002, AFF-EVID-ADR-003 |
Example gate:
| Gate | Required Signal |
|---|---|
| Index build | all policy sources active, approved, versioned and linked to owner |
| Release | conflicting policy test set pass, adverse-action-sensitive examples reviewed |
| Runtime | stale source detection and analyst override trends monitored |
| Quarterly | Model Risk reviews slice failures, exception usage and policy change impact |
10.3 AML Investigation Agent
| Dimension | Design |
|---|---|
| Use case | AML analyst reads alerts, KYC, transaction history and policy evidence |
| AI role | summarize, identify red flags, cite evidence, draft narrative |
| AI must not | Close alert, submit SAR, change customer risk rating |
| Key risks | hallucinated facts, missing red flags, unauthorized case data retrieval, weak audit trail |
| Fitness functions | AFF-ACT-READ-001, AFF-SEC-AUTH-002, AFF-QUAL-REPLAY-004, AFF-EVID-BND-001 |
Example gate:
| Gate | Required Signal |
|---|---|
| Tool contract | no disposition write API available to AI route |
| Release | high-risk red flag recall meets threshold; unsupported allegation = 0 |
| Runtime | every saved AI-assisted narrative has human approval and edit diff |
| Quarterly | review access logs, citation audit, SAR quality sample and incident near misses |
10.4 Payment Dispute Agent
| Dimension | Design |
|---|---|
| Use case | Draft dispute response and collect evidence for operations specialist |
| AI role | classify dispute, retrieve transaction evidence, draft customer response |
| AI must not | Final deny / approve dispute, move funds, change account status |
| Key risks | wrong customer communication, irreversible action, missing regulatory deadline, privacy leak |
| Fitness functions | AFF-ACT-WRITE-002, AFF-LAT-HITL-002, AFF-PRIV-MIN-001, AFF-SAFE-HARM-003 |
Example gate:
| Gate | Required Signal |
|---|---|
| Contract | draft-only tool has approval workflow and audit fields |
| Release | deadline-sensitive cases escalate correctly |
| Runtime | queue age and SLA risk monitored |
| Quarterly | review disputed outcomes, complaint samples and regulatory deadline misses |
10.5 Retail Promotion and Markdown AI
| Dimension | Design |
|---|---|
| Use case | Recommend promotion and markdown actions across channels |
| AI role | forecast demand, simulate margin, recommend campaign changes |
| AI must not | Publish promotion without merchant approval, violate pricing policy, leak customer segment data |
| Key risks | margin erosion, unfair targeting, inventory distortion, brand damage |
| Fitness functions | AFF-COST-UNIT-001, AFF-ADOPT-VALUE-002, AFF-PRIV-MIN-001, AFF-EVID-EXP-002 |
Example gate:
| Gate | Required Signal |
|---|---|
| Design | pricing policy and protected segments mapped |
| Release | simulation covers margin, inventory, customer fairness and channel conflict |
| Runtime | uplift measured with guardrail metrics, not only conversion |
| Quarterly | review value realization, override patterns and exception aging |
11. Templates
11.1 Fitness Function Catalog Template
| Field | Required Content | Example |
|---|---|---|
| Fitness ID | Stable ID | AFF-RAG-GRD-001 |
| Name | Verb + architecture constraint | Enforce grounded regulated answers |
| Domain | quality, safety, security, privacy, cost, latency, eval, tool, evidence, adoption | quality |
| Architecture Intent | What decision or quality attribute this protects | Customer-impacting answers must be supported by approved sources |
| Scope | Use case, risk tier, release stage, channel | Tier 1 customer service AI, canary and production |
| Artifact | What is checked | response, citation, retrieved source, trace |
| Check Method | Automated or semi-automated method | citation checker + SME weekly sample |
| Trigger | When it runs | release eval, canary sample, weekly monitoring |
| Threshold | Pass / fail rule | unsupported high-risk claim = 0; citation support >= 98% |
| Gate Impact | What happens on failure | no-go, rollback, or restricted route |
| Evidence | What record proves it ran | eval run ID, sample audit, dashboard snapshot |
| Owner | Accountable role | EvalOps Owner / Knowledge Owner |
| Review Cadence | When threshold and rule are reviewed | every release and quarterly |
| Linked ADR | Decision record | ADR-AI-CS-004 |
| Linked Controls | Control IDs | AICL-RAG-003, AICL-MON-005 |
11.2 Gate Matrix Template
| Gate | Fitness Functions | Required Evidence | Decision Owner | Failure Action |
|---|---|---|---|---|
| G0 Intake | approved use, risk tier, data class | use case card, risk tier memo | Business Owner / AI Governance | restrict scope or stop |
| G1 Architecture | quality scenarios, tool boundary, data boundary | architecture diagram, ADR draft | Enterprise Architect | redesign |
| G2 CI / Contract | schema, OpenAPI tool contract, policy-as-code | CI report, contract lint, policy report | Platform Owner | block merge |
| G3 Eval | coverage, critical failure, red-team, cost, latency | eval report, failure analysis | EvalOps / Model Risk | no-go |
| G4 Release | evidence binder, rollback, monitoring, approvals | gate memo, dashboard, rollback runbook | Release Manager / Risk Owner | no-go or limited go |
| G5 Runtime | production quality, safety, security, cost, adoption | OTel dashboard, sample review | Ops / Product Owner | hold, rollback, incident |
| G6 Quarterly | ADR assumptions, exceptions, control evidence, value | quarterly architecture review memo | Chief Architect / AI Governance | revise, retire, reset thresholds |
11.3 Exception Memo Template
# Architecture Fitness Function Exception Memo
Exception ID: EXC-AFF-[USECASE]-[NUMBER]
Date: YYYY-MM-DD
Use Case:
Risk Tier:
Release / Scope:
Requested By:
Decision Owner:
Expiry Date:
## Fitness Function
- Fitness ID:
- Architecture intent:
- Current threshold:
- Current result:
## Business Reason
- Why the exception is requested:
- What business outcome depends on it:
- Why alternatives are not sufficient:
## Risk Analysis
| Risk | Impact | Likelihood | Compensating Control | Owner |
|---|---|---|---|---|
## Conditions
- Scope limit:
- User / segment limit:
- Traffic limit:
- Tool / action limit:
- Monitoring trigger:
- Rollback trigger:
## Evidence
- Eval evidence:
- Monitoring evidence:
- Control evidence:
- Approval record:
## Decision
- Approved / rejected / approved with restrictions:
- Residual risk owner:
- Re-review trigger:
- Expiry action:
Writing standard:
Exception must be time-bound.
Exception must name a residual risk owner.
Exception must have compensating control and monitoring trigger.
Exception must not silently become the default architecture.
11.4 Dashboard Template
| Dashboard Section | Required Widgets | Decision Use |
|---|---|---|
| Release identity | active release_id, model, prompt, index, tool schema, policy version | prove production version |
| Gate status | latest eval result, open blockers, exceptions, expiry | release and scale decision |
| Quality | groundedness, completeness, consistency, replay success | quality drift detection |
| Safety | critical failure, escalation, complaint, harm category | incident and customer protection |
| Security / privacy | injection attempts, unauthorized retrieval, DLP, sensitive log | security response |
| Tool action | tool calls, denials, approvals, idempotency errors, rollback tests | agent risk control |
| Cost / latency | cost per task, token burn, p95 / p99, timeout | unit economics and UX |
| Human oversight | review queue, edit distance, override, reviewer disagreement | HITL effectiveness |
| Adoption / value | active users, acceptance, rework, benefit metric, workflow impact | product scaling |
| Evidence health | missing evidence, stale evidence, exception aging, ADR reopen trigger | governance health |
11.5 ADR Link Template
## ADR-AI-[DOMAIN]-[NUMBER]: [Decision Title]
Status: Proposed / Accepted / Conditionally Accepted / Superseded
Date: YYYY-MM-DD
Scope: use case, risk tier, release stage, region, user group
### Context
- Business driver:
- Quality attributes:
- Risk tier:
- Architecture assumptions:
### Decision
- We will:
- We will not:
### Fitness Functions
| Fitness ID | Why it matters | Gate | Monitoring |
|---|---|---|---|
### Tradeoffs
| Improved Attribute | Weakened Attribute | Mitigation |
|---|---|---|
### Evidence
- Eval run:
- Security / privacy test:
- Cost / latency test:
- Control evidence:
- Exception memo:
### Reopen Triggers
- Model, prompt, index, tool or policy version changes:
- Critical failure or incident:
- Threshold breach:
- Material product scope expansion:
- Quarterly review finding:
12. Quarterly Architecture Review
Quarterly review is where continuous signals become architecture decisions. It should not repeat project status. It should answer whether the architecture still fits the business, risk and operating context.
12.1 Inputs
| Input | What to Review |
|---|---|
| Fitness function results | pass / fail trends, repeated exceptions, fragile thresholds |
| Runtime dashboard | quality, safety, cost, latency, security, privacy, adoption |
| Incidents and near misses | root cause, regression cases, control redesign |
| ADR assumptions | assumptions broken by vendor, data, policy, product scope or adoption |
| Evidence graph | stale evidence, missing owners, weak claim support |
| Product metrics | value realization, workflow impact, blind acceptance, override patterns |
| Audit / risk findings | open issues, remediation aging, model risk challenge |
| Roadmap | upcoming automation, new tools, new regions, new data, scale plans |
12.2 Review Questions
| Question | Strong Answer |
|---|---|
| Which architecture assumptions failed this quarter? | Named assumptions, affected ADRs, evidence and remediation |
| Which fitness functions were noisy or ineffective? | Rule tuning, threshold recalibration, better measurement source |
| Which exceptions are becoming permanent? | Expiring exceptions, redesign plan, risk owner decision |
| Which quality attribute tradeoffs changed? | New cost / latency / safety / adoption evidence |
| Is AI adoption healthy? | Users neither blindly accept nor bypass AI; value and risk metrics align |
| Are release gates too weak or too slow? | Gate data shows blockers, cycle time, false positives and risk reduction |
| Which systems need architecture refactoring? | Specific RAG, tool, data, observability, fallback or workflow changes |
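For "Which exceptions are becoming permanent?", a simple pre-review query can flag exceptions that are past expiry, have been renewed repeatedly, or have no owner. A minimal sketch, assuming a hypothetical `ExceptionMemo` record. The two-renewal limit and the sample memos are illustrative, not requirements from this playbook.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExceptionMemo:
    """Hypothetical exception memo record."""
    memo_id: str
    owner: str
    expires_on: date
    renewals: int


def permanent_exception_risks(memos: list, today: date, max_renewals: int = 2) -> dict:
    """Map memo_id -> reasons it needs a close / renew / reject decision this quarter."""
    findings = {}
    for memo in memos:
        reasons = []
        if memo.expires_on < today:
            reasons.append(f"expired {(today - memo.expires_on).days} days ago")
        if memo.renewals > max_renewals:
            reasons.append(f"renewed {memo.renewals} times")
        if not memo.owner:
            reasons.append("no accountable owner")
        if reasons:
            findings[memo.memo_id] = reasons
    return findings


if __name__ == "__main__":
    memos = [
        ExceptionMemo("EX-01", "Risk Owner A", date(2026, 5, 31), 0),
        ExceptionMemo("EX-02", "", date(2026, 9, 30), 3),
        ExceptionMemo("EX-03", "Product Owner", date(2026, 12, 31), 1),
    ]
    print(permanent_exception_risks(memos, today=date(2026, 6, 29)))
    # {'EX-01': ['expired 29 days ago'], 'EX-02': ['renewed 3 times', 'no accountable owner']}
```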
12.3 Outputs
| Output | Use |
|---|---|
| Quarterly architecture fitness memo | Management review evidence |
| ADR updates | Record changed decisions or assumptions |
| Fitness catalog changes | Add, retire, tighten or soften rules |
| Gate matrix changes | Adjust CI/CD, release, canary or runtime gates |
| Exception decisions | Close, renew with stronger control, or reject |
| Roadmap changes | Fund platform, observability, eval or workflow improvements |
| Audit binder update | Preserve review evidence and owner decisions |
13. Operating Model
Architecture fitness functions cut across product, platform, risk and operations. Ownership must be explicit.
| Role | Owns |
|---|---|
| AI Product Lead | Use case scope, product metrics, adoption health, value-risk tradeoff |
| Senior BA / CBAP | Quality attribute scenarios, process boundaries, requirements traceability, stakeholder concerns |
| Solution Architect | Architecture intent, fitness catalog, ADR, data / tool / recovery boundaries |
| Enterprise Architect | Cross-system standards, quarterly review, platform roadmap, architecture governance |
| EvalOps Owner | Eval datasets, evaluators, regression, failure analysis, gate metrics |
| Platform / MLOps Owner | CI/CD implementation, release bundle, registry, deployment, rollback |
| Security Owner | Prompt injection, access control, tool abuse, threat model |
| Privacy Owner | Data minimization, retention, vendor data boundary, customer rights |
| Model Risk | Validation scope, independent challenge, ongoing monitoring expectations |
| Compliance / Legal | Regulated claims, disclosures, prohibited use, policy interpretation |
| Internal Audit | Evidence quality, operating effectiveness, management review trail |
| Business Owner | Residual risk acceptance, process change, frontline accountability |
Recommended RACI pattern:
| Artifact | Responsible | Accountable | Consulted | Informed |
|---|---|---|---|---|
| Fitness catalog | Solution Architect | Enterprise Architect | Product, Risk, Security, Privacy | Engineering |
| Eval gate | EvalOps | Model Risk / Product Owner | SME, Compliance | Release Manager |
| Tool contract | Platform Owner | Solution Architect | Security, Business Owner | Operations |
| Exception memo | Product Owner | Business / Risk Owner | Architecture, Compliance | Audit |
| Runtime dashboard | Platform / Ops | Product Owner | Risk, Security, Privacy | Governance forum |
| Quarterly review | Enterprise Architect | AI Governance Chair | Product, Risk, Audit | Executive stakeholders |
14. Anti-Patterns
| Anti-Pattern | What It Looks Like | Risk | Correction |
|---|---|---|---|
| Review-only governance | Architecture board approves slides, no executable gate | Decisions drift after model or prompt changes | Convert key decisions into fitness functions and gates |
| Accuracy-only gate | Average task score passes, critical failures hidden | Low-frequency high-impact failures enter production | Separate critical failure hard gates from aggregate metrics |
| HITL as a slogan | Human review exists in process map, but reviewers lack evidence or capacity | Automation bias and false accountability | Monitor queue age, edit distance, override and reviewer agreement |
| RAG treated as evidence | System retrieves documents, so team assumes answers are grounded | Wrong or stale citation causes customer or regulatory harm | Add source approval, retrieval eval, citation audit and stale source checks |
| Tool creep | Agent quietly gains write permissions across releases | Irreversible customer-impacting actions occur without governance | OpenAPI contract, tool registry, ADR and hard gates for write actions |
| Observability without privacy | Teams log full prompts, documents and customer details | Privacy, retention and breach risk | Trace references, masking, evidence vault and access control |
| Cost optimization without risk tier | Cheaper model handles high-impact cases | Quality and safety degrade in the riskiest slices | Risk-tiered routing and eval parity gate |
| Exceptions never expire | A temporary threshold waiver becomes permanent | Control baseline erodes | Expiry, owner, compensating control and quarterly review |
| Product metrics disconnected from risk | Adoption rises, but complaints and overrides also rise | Value narrative hides control failure | Put value and risk metrics on the same dashboard |
| ADRs disconnected from runtime | Decisions are accepted but never reopened | Architecture assumptions become stale | Runtime triggers reopen ADRs |
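The correction for the accuracy-only gate is to evaluate critical failures as a separate hard gate, so a good average cannot hide them. A minimal sketch, assuming each eval case carries an evaluator score and a `critical_failure` flag. The `EvalCase` type, the `release_gate` function and the 0.85 aggregate threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """Hypothetical eval result for one test case."""
    case_id: str
    score: float  # 0.0 to 1.0 from the evaluator
    critical_failure: bool  # e.g. unsupported regulated claim, missed escalation


def release_gate(cases: list, min_mean_score: float = 0.85) -> tuple:
    """Return (decision, reasons). Critical failures block regardless of the mean."""
    reasons = []
    critical = [c.case_id for c in cases if c.critical_failure]
    if critical:
        reasons.append(f"hard gate: critical failures in {critical}")
    # An empty eval run produces a mean of 0.0, so missing evidence also blocks.
    mean_score = sum(c.score for c in cases) / len(cases) if cases else 0.0
    if mean_score < min_mean_score:
        reasons.append(f"aggregate gate: mean score {mean_score:.2f} < {min_mean_score}")
    return ("block" if reasons else "pass"), reasons


if __name__ == "__main__":
    # 19 strong cases plus one critical failure: the mean is about 0.95, yet the gate blocks.
    cases = [EvalCase(f"c{i}", 0.95, False) for i in range(19)]
    cases.append(EvalCase("c19", 0.90, True))
    decision, reasons = release_gate(cases)
    print(decision)  # block
```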
15. Interview Expression
15.1 30-Second Answer
I use architecture fitness functions to make AI governance executable. Instead of leaving quality attributes in architecture review slides, I define automated or semi-automated checks for groundedness, safety, security, privacy, latency, cost, eval coverage, tool permissions, evidence and adoption. These checks run in CI/CD, release gates and runtime monitoring. When a check fails, it can block release, limit rollout, trigger rollback, require exception approval or reopen an ADR.
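The failure consequences listed in the 30-second answer can be expressed as a lookup from where a check runs and how severe its failure is to one or more governance actions. A minimal sketch. The stage names, severity levels and the mapping itself are illustrative assumptions that each institution would set in its own gate matrix.

```python
from enum import Enum


class Action(Enum):
    BLOCK_RELEASE = "block release"
    LIMIT_ROLLOUT = "limit rollout"
    ROLLBACK = "trigger rollback"
    EXCEPTION = "require exception approval"
    REOPEN_ADR = "reopen ADR"


# Hypothetical policy: (stage, severity) -> actions taken when a check fails.
FAILURE_POLICY = {
    ("ci", "critical"): [Action.BLOCK_RELEASE],
    ("ci", "major"): [Action.BLOCK_RELEASE],
    ("release", "critical"): [Action.BLOCK_RELEASE, Action.REOPEN_ADR],
    ("release", "major"): [Action.EXCEPTION],
    ("canary", "critical"): [Action.ROLLBACK, Action.REOPEN_ADR],
    ("canary", "major"): [Action.LIMIT_ROLLOUT],
    ("runtime", "critical"): [Action.ROLLBACK, Action.REOPEN_ADR],
    ("runtime", "major"): [Action.LIMIT_ROLLOUT, Action.EXCEPTION],
}


def on_failure(stage: str, severity: str) -> list:
    """Return the actions for a failed check; unknown combinations fail closed."""
    return FAILURE_POLICY.get((stage, severity), [Action.BLOCK_RELEASE])


if __name__ == "__main__":
    print([a.value for a in on_failure("canary", "critical")])
    # ['trigger rollback', 'reopen ADR']
```

Failing closed on unmapped combinations keeps a missing policy entry from silently letting a release through.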
15.2 2-Minute Answer
For AI systems, architecture governance cannot be a one-time review because model versions, prompts, RAG indexes, tool schemas, policies and user behavior keep changing. My approach is to start with quality attribute scenarios and architecture decisions, then convert the most important constraints into fitness functions.
For example, in a financial customer service AI, a fitness function might say that any customer-impacting fee waiver answer must cite an approved active policy source, unsupported claims must be zero, p95 latency must stay below the service target, and complaint or vulnerable-customer cases must escalate to a human. Some checks are fully automated in CI or policy-as-code. Others are semi-automated through eval, SME sampling, model risk review and evidence binder checks.
I connect those functions to release gates, OpenAPI tool contracts, OpenTelemetry traces, dashboards, exception memos and quarterly architecture reviews. The result is that architecture intent becomes operational: we know what is allowed, how it is measured, what evidence proves it, who owns it, and what happens when it fails.
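The fee waiver example in the 2-minute answer combines four constraints: an approved active policy citation, zero unsupported claims, a p95 latency target, and escalation of complaint or vulnerable-customer cases. A minimal sketch of how they might be checked over a batch of eval or sampled production records. It assumes hypothetical record fields and an approved-source registry passed in as a set, and the 2,000 ms default is only a placeholder for the real service target.

```python
import math
from dataclasses import dataclass


@dataclass
class AnswerRecord:
    """Hypothetical fee waiver answer record from eval or production sampling."""
    cited_sources: list  # source IDs cited by the answer
    unsupported_claims: int  # claims not supported by a cited source (from eval)
    latency_ms: float
    needs_escalation: bool  # complaint or vulnerable-customer signal present
    escalated: bool  # whether the case was routed to a human


def p95(values: list) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def fee_waiver_fitness(records: list, approved_active_sources: set,
                       p95_target_ms: float = 2000.0) -> list:
    """Return violations; an empty list means the fitness function passes."""
    if not records:
        return ["no evidence: empty sample"]
    violations = []
    for i, r in enumerate(records):
        if not any(s in approved_active_sources for s in r.cited_sources):
            violations.append(f"record {i}: no approved active policy source cited")
        if r.unsupported_claims > 0:
            violations.append(f"record {i}: {r.unsupported_claims} unsupported claim(s)")
        if r.needs_escalation and not r.escalated:
            violations.append(f"record {i}: complaint / vulnerable case not escalated")
    latency = p95([r.latency_ms for r in records])
    if latency > p95_target_ms:
        violations.append(f"p95 latency {latency:.0f} ms exceeds {p95_target_ms:.0f} ms")
    return violations
```

Per-record checks implement the "must be zero" style constraints, while latency is judged on the batch percentile, which mirrors how the answer phrases each constraint.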
15.3 Chief Architect Version
Architecture fitness functions are the mechanism I use to keep enterprise AI architecture decisions alive after the design review. They encode architecture intent as enforceable constraints across release bundles, tool contracts, eval gates and runtime telemetry. The architecture board no longer only asks whether a design was approved. It asks which fitness functions protect the decision, how often they run, what evidence they create, and which ADRs reopen when assumptions break.
15.4 CRO / Model Risk Version
From a risk perspective, fitness functions are continuous control tests for AI architecture. They connect risk appetite to measurable gates: critical failures, unsupported regulated claims, unauthorized tool calls, privacy leakage, stale evidence, expired exceptions and unhealthy adoption signals. This gives risk leadership a way to see whether residual risk remains within approved boundaries during release and production, not only at initial approval.
15.5 Interview Questions
| Question | Strong Talking Point |
|---|---|
| How do you prevent AI architecture review from becoming a paperwork exercise? | Convert review decisions into fitness functions, gates, telemetry and evidence |
| How do you handle model or prompt changes after approval? | Release bundle versioning, impact analysis, regression eval, gate rerun, ADR reopen trigger |
| How do you govern agent tools? | OpenAPI contract with side effect, approval, idempotency, audit and rollback fields |
| How do you balance cost and safety? | Risk-tiered routing, eval parity, hard gates for high-risk slices, dashboard with cost and risk together |
| What makes a good AI dashboard? | It shows release identity, quality, safety, security, privacy, tool action, cost, latency, HITL and adoption |
| How do you prove governance to audit? | Evidence graph links requirement, risk, fitness function, test, evidence, owner, gate and review cadence |
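The tool-governance talking point relies on OpenAPI specification extensions, that is, fields prefixed with `x-`, which OpenAPI permits on operation objects. A minimal sketch of a CI check over an already-parsed OpenAPI document (a dict), assuming hypothetical extension names `x-side-effect`, `x-approval`, `x-idempotency-key`, `x-audit` and `x-rollback`. These names are illustrative, not part of OpenAPI, and an institution would define its own.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
# Hypothetical governance extensions required on every operation with side effects.
REQUIRED_FOR_WRITES = ("x-approval", "x-idempotency-key", "x-audit", "x-rollback")


def check_tool_contract(spec: dict) -> list:
    """Return governance findings for an OpenAPI document loaded as a dict."""
    findings = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            op_id = op.get("operationId", f"{method.upper()} {path}")
            side_effect = op.get("x-side-effect")
            if side_effect is None:
                findings.append(f"{op_id}: missing x-side-effect declaration")
                continue
            if side_effect != "none":
                for ext in REQUIRED_FOR_WRITES:
                    if ext not in op:
                        findings.append(f"{op_id}: side effect '{side_effect}' without {ext}")
    return findings


if __name__ == "__main__":
    spec = {"paths": {
        "/fees/{id}/waive": {"post": {
            "operationId": "waiveFee", "x-side-effect": "customer-impacting",
            "x-approval": "human", "x-audit": True}},
        "/fees/{id}": {"get": {"operationId": "getFee", "x-side-effect": "none"}},
    }}
    for finding in check_tool_contract(spec):
        print(finding)
```

Requiring an explicit `x-side-effect` on every operation, including read-only ones, is what makes tool creep visible: a new write operation cannot enter the contract without declaring itself.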
16. Portfolio Asset Pack
A strong portfolio can turn this playbook into a concrete artifact set:
| Asset | Content | What It Demonstrates |
|---|---|---|
| AI Fitness Function One-Pager | Method, taxonomy, governance loop | Senior architecture language |
| Fitness Catalog | 30 to 50 functions across quality, safety, security, privacy, cost, latency, eval, tool, evidence, adoption | Ability to operationalize quality attributes |
| Gate Matrix | G0 to G7 with owners, evidence and failure actions | Release governance design |
| OpenAPI Tool Contract Sample | Agent tool spec with AI governance extensions | Solution architecture depth |
| OpenTelemetry Trace Schema | AI release, model, prompt, index, tool and evidence fields | Observability design |
| Runtime Dashboard Spec | Panels, metrics, thresholds, owners | Production governance |
| Exception Memo Pack | Example exceptions with compensating controls and expiry | Risk acceptance discipline |
| Quarterly Review Memo | Architecture assumption review and ADR reopen examples | Enterprise architecture governance |
| Financial Retail Case Study | Customer service, credit, AML, dispute or promotion AI | Domain-specific credibility |
| Interview Narrative | 30-second, 2-minute, Chief Architect and CRO versions | Hiring conversion |
17. Final Check
Before using an AI architecture fitness function system in a high-impact financial retail use case, run this self-check:
| Check | Passing Standard |
|---|---|
| Definition | Fitness functions are clearly defined as automated or semi-automated architecture constraints |
| Taxonomy | Covers quality, safety, security, privacy, cost, latency, eval, tool, evidence and adoption |
| Traceability | Each high-risk function links to quality attribute scenario, ADR, eval, gate, monitoring and evidence |
| Gate impact | Each failure has a decision path: block, limited go, rollback, exception or review |
| Runtime proof | Observability can show active release, versions, tool actions, human review and critical signals |
| Evidence | Release and runtime evidence can support audit, model risk and management review |
| Product value | Adoption and benefit metrics are reviewed with risk metrics |
| Review cadence | Quarterly architecture review can revise functions, thresholds, ADRs and exceptions |
| Financial retail fit | Cases reflect customer impact, regulatory sensitivity, model risk, privacy and operational resilience |
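The Traceability check can itself be automated. A minimal sketch, assuming the fitness catalog is exported as a list of dictionaries with hypothetical link fields named after the Traceability row and a `risk_tier` field. It reports which high-risk functions are missing which links. All IDs in the usage example are made up.

```python
# Hypothetical link fields, one per artifact named in the Traceability check.
REQUIRED_LINKS = ("qa_scenario", "adr", "eval", "gate", "monitoring", "evidence")


def traceability_gaps(catalog: list) -> dict:
    """Map fitness_id -> missing link fields, for high-risk entries only."""
    gaps = {}
    for entry in catalog:
        if entry.get("risk_tier") != "high":
            continue
        # Empty strings and absent keys both count as missing links.
        missing = [link for link in REQUIRED_LINKS if not entry.get(link)]
        if missing:
            gaps[entry["fitness_id"]] = missing
    return gaps


if __name__ == "__main__":
    catalog = [
        {"fitness_id": "FF-01", "risk_tier": "high", "qa_scenario": "QAS-3",
         "adr": "ADR-AI-CS-001", "eval": "run-88", "gate": "release",
         "monitoring": "dash:safety", "evidence": "vault:ev-12"},
        {"fitness_id": "FF-02", "risk_tier": "high", "qa_scenario": "QAS-4",
         "adr": "", "eval": "run-90", "gate": "release"},
        {"fitness_id": "FF-03", "risk_tier": "low"},
    ]
    print(traceability_gaps(catalog))
    # {'FF-02': ['adr', 'monitoring', 'evidence']}
```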
Final principle:
AI architecture governance becomes real only when architecture intent can fail a build, block a release, stop a ramp, trigger a rollback, expire an exception or reopen an ADR.