
AI Architecture Fitness Functions / Continuous Governance Playbook


Version: v1.0
Date: 2026-06-29
Audience: AI Product Lead, Senior BA / CBAP, Solutions Architect, Enterprise Architect, Platform PM, Model Risk, Security, Privacy, Compliance, Internal Audit, Release Manager, financial retail business owners


1. Purpose / Audience / Core View

Purpose

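This playbook treats architecture fitness functions as an advanced method for AI product and enterprise architecture governance. The goal is not another go-live checklist but a continuously running governance system that connects quality attributes, evals, policy-as-code, observability, release gates, control evidence and product metrics.

It fits the following scenarios:

  • Financial retail AI copilots, RAG assistants, decision support, agent workflows, and fraud / AML / credit / customer service AI.
  • Enterprise architecture teams that want to upgrade architecture review from a one-off meeting into enforceable engineering gates.
  • AI product teams that need to show a use case still satisfies its architecture constraints while the model, prompt, RAG index, tool schema, policy and data keep changing.
  • Solution architects who need to rewrite NFRs / quality attributes from document language into pipeline gates, runtime monitors and evidence packs.
  • CBAP / Senior BA practitioners who need to show they do more than write requirements: they can turn business risk, quality attributes and release controls into a checkable system.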

Core View

Architecture fitness functions are executable or semi-executable architecture constraints that continuously test whether an AI system still satisfies its intended quality attributes, risk boundaries and product outcomes.

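Put more concretely:

Architecture fitness functions are architecture constraints that can be checked automatically or semi-automatically. They continuously verify that an AI system still meets its target quality attributes, risk boundaries and business outcomes while code, data, models, prompts, RAG, tools, policies, monitoring and user adoption keep changing.

Key points:

| Point | Meaning |
|---|---|
| Architecture governance must be executable | Quality attributes cannot live only in an architecture review deck; they must enter the CI/CD gate, runtime monitor, exception memo and quarterly review |
| A fitness function is not an ordinary test | It verifies architecture intent and governance boundaries, for example "AI must not execute customer-impacting actions", "high-risk answers must carry valid citations", "cost must not be reduced by sacrificing safety gates" |
| AI governance must be continuous | Models, prompts, RAG indexes, tool schemas, policy sources and vendor behavior keep changing, so evidence from a single review goes stale quickly |
| Gates must connect to evidence | Every hard gate must trace to an eval run, policy test, trace sample, control evidence, ADR and owner |
| Product metrics belong in governance | Whether AI is adopted, over-trusted, frequently overridden, and actually improving the business process are all architecture governance signals |

Important note: this document is learning, portfolio and architecture governance design material. It is not legal advice, an audit opinion, a model validation conclusion or a regulatory interpretation. Formal financial retail projects must be confirmed by Legal, Compliance, Risk, Model Risk, Security, Privacy, Data Owner, Business Owner and Internal Audit under the institution's own policies.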


2. Source Anchors

These source anchors calibrate the method language for AI risk management, AI management systems, observability and API contracts. This playbook translates them into a working structure for architecture fitness functions and continuous governance.

| Anchor | Official Link | Idea Adopted in This Playbook | Translated into Fitness Function |
|---|---|---|---|
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organize AI risk management around Govern, Map, Measure, Manage | Write risk mapping, measurement, mitigation and monitoring as checkable gates and runtime signals |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Use an AI management system to manage scope, responsibilities, risks and opportunities, operational controls, performance evaluation and continual improvement | Institutionalize AI governance cadence, management review, exceptions, audit evidence and continual improvement |
| OpenTelemetry | https://opentelemetry.io/docs/ | Use logs, metrics and traces for observability and cross-component tracing | Put release_id, model_version, prompt_version, index_version, tool_call, user_action, evidence_link into traces and dashboards |
| OpenAPI Specification | https://spec.openapis.org/oas/latest.html | Use a standard API contract to describe interfaces, schemas, operations, security schemes and extension fields | Write agent tool contract, allowed action, side effect, approval, idempotency, rollback and audit requirement into the API contract |

Usage discipline:

  • Source anchors are not compliance conclusions. They provide a structured governance language.
  • Fitness functions must be tailored by use case, risk tier, automation level, data sensitivity, customer impact and production scope.
  • For high-impact AI systems, every hard gate must be bound to a version boundary: code, model, prompt, RAG index, tool schema, policy source, eval dataset, release route.
  • The output of an architecture review cannot stop at "approved". It must produce a gate rule, monitoring threshold, exception decision, ADR reopen trigger and evidence requirement.

3. One-Sentence Positioning

Architecture fitness functions turn AI architecture governance from episodic review into continuous, evidence-backed checks across design, build, release, runtime and quarterly management review.

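In other words, AI architecture governance moves from a one-off review meeting to continuous, evidence-backed checks that span design, build, release, runtime and quarterly management review.

For a senior AI PM / architect, this sentence means:

| Conventional Statement | Senior-Level Statement |
|---|---|
| We did an architecture review | Key quality attributes have been converted into fitness functions and wired into the CI/CD gate, runtime monitoring and exception process |
| We have evals | The eval contract binds use case risk, quality attribute, threshold, failure severity and release decision |
| We have monitoring | Monitoring can tell whether architecture assumptions still hold, and triggers rollback, ADR reopen or quarterly review |
| We follow the governance process | Governance controls have owner, cadence, evidence, operating effectiveness test and management review |
| We deliver business value | Product metrics and risk metrics share one screen, so adoption or cost reduction is not pursued at the expense of quality attributes |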

4. Fitness Function Definition

4.1 Definition

An architecture fitness function is an executable or semi-executable check that verifies a specific architecture constraint, quality attribute or governance claim against a defined system version and operating context.

In an AI system, a fitness function contains at least:

| Field | Description | Example |
|---|---|---|
| Fitness ID | Unique ID that can be referenced across PRD, ADR, eval, gate and dashboard | AFF-RAG-GRD-001 |
| Architecture Intent | The architecture intent being protected | High-risk answers must be supported by approved sources |
| Constraint | Checkable constraint | unsupported regulated claim count = 0 |
| Scope | Applicable use case, user group, channel, release stage | retail fee waiver assistant, authenticated web chat, canary |
| Artifact | Object being checked | prompt, model route, RAG index, OpenAPI tool spec, trace log |
| Check Type | automated, semi-automated, manual attestation with sample evidence | eval pipeline + weekly SME sample |
| Trigger | When it runs | pull request, index build, release candidate, canary, weekly sample, quarterly review |
| Threshold | Pass criterion | citation support >= 98%, critical failure = 0 |
| Gate Action | Action on failure | block release, limited go, rollback, exception review, ADR reopen |
| Evidence | Proof material | eval run ID, trace sample, policy test report, dashboard snapshot |
| Owner | Who is accountable for the rule and its evidence | Solution Architect, EvalOps, Model Risk, Business Owner |
| Review Cadence | How often the threshold and applicability are re-checked | every release, monthly operational review, quarterly architecture review |
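
One way to keep these fields machine-readable is a small record type. The sketch below is a minimal illustration, assuming a hypothetical `FitnessFunction` dataclass and a hypothetical `missing_fields` helper; neither comes from a specific tool, and the field names simply mirror the table above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FitnessFunction:
    """Hypothetical catalog record mirroring the field table above."""
    fitness_id: str
    architecture_intent: str
    constraint: str
    scope: str
    artifact: str
    check_type: str
    trigger: str
    threshold: str
    gate_action: str
    evidence: str
    owner: str
    review_cadence: str


def missing_fields(record: FitnessFunction) -> list[str]:
    """Return the names of fields that are empty or whitespace only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


example = FitnessFunction(
    fitness_id="AFF-RAG-GRD-001",
    architecture_intent="High-risk answers must be supported by approved sources",
    constraint="unsupported regulated claim count = 0",
    scope="retail fee waiver assistant, authenticated web chat, canary",
    artifact="RAG index, trace log",
    check_type="eval pipeline + weekly SME sample",
    trigger="release candidate",
    threshold="citation support >= 98%, critical failure = 0",
    gate_action="block release",
    evidence="eval run ID",
    owner="",  # deliberately left empty to show the completeness check
    review_cadence="every release",
)
print(missing_fields(example))
```

A catalog entry with an empty owner or evidence field is exactly the kind of gap a catalog lint could flag before the rule is ever wired into a gate.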

4.2 Fitness Function vs Test vs Control

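| Object | Main Question | Typical Granularity | Role in AI Architecture Governance |
|---|---|---|---|
| Unit / integration test | Does the code or component work as expected | function, API, pipeline step | Necessary but insufficient; mainly proves engineering correctness |
| Eval | How does AI output perform on a task, dataset and rubric | prompt, model, RAG, agent task | Proves model behavior and task quality; the measurement source for many fitness functions |
| Control | Is a class of risk reduced by governance activity | policy, process, system control | Defines what must be controlled; requires evidence |
| Fitness function | Does the architecture intent still hold under continuous change | cross-artifact constraint | Connects eval, control, policy-as-code, observability and gate decisions |

In one sentence: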

Tests prove components work. Controls define governance expectations. Fitness functions continuously prove architecture decisions still hold.

4.3 Automated and Semi-Automated

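Not every fitness function can be fully automated. In financial retail AI, many high-value checks are semi-automated:

| Type | Examples | When It Applies |
|---|---|---|
| Fully automated | OpenAPI tool spec lint, policy-as-code gate, schema drift check, latency SLO, cost threshold, DLP scan | Engineering constraints are explicit and data is machine-readable |
| Semi-automated | SME sample review, model risk challenge, citation audit, exception review, adverse impact analysis | High risk; requires professional judgment or sampling |
| Manual attestation with evidence | Business owner risk acceptance, quarterly management review, legal interpretation sign-off | Value judgment, risk acceptance, management accountability |

Semi-automated does not mean weak control. The mature practice is to structure the human judgment itself: its input, sample, rubric, reviewer, date, decision and evidence link.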


5. Why AI Architecture Governance Must Become Continuous

5.1 Point-in-Time Review Fails Under AI Change

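Traditional architecture review often happens once, at design time or before go-live. For AI systems this breaks down quickly:

| Source of Change | Why It Invalidates the Earlier Review |
|---|---|
| Model version | The same prompt on a new model may be more confident, more inclined to call tools, or more likely to slip past refusal boundaries |
| Prompt / policy | A small change can shift refusal, escalation, tool calling and tone boundaries |
| RAG index | New sources, expired sources, and changes to chunking, embedding or reranker affect facts and citations |
| Data distribution | Shifts in customer behavior, fraud patterns, complaint types and transaction mix mean eval scores no longer represent real risk |
| Tool schema | When an agent moves from read-only to write / execute, the architecture risk level changes directly |
| Product adoption | Users may over-trust AI or bypass it entirely; both break the original business assumptions |
| Vendor behavior | Hosted models, embeddings, moderation, API rate limits, data terms and regional availability change |
| Regulation / policy | Changes in internal policy, regulatory focus and audit scope alter the evidence required for release |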

5.2 Continuous Governance Loop

Recommended loop:

Architecture intent
-> quality attribute scenario
-> fitness function
-> CI/CD gate
-> release decision
-> runtime monitor
-> evidence graph
-> exception / incident / product metric review
-> quarterly architecture review
-> ADR update or control redesign

5.3 Governance Cadence

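| Cadence | Goal | Fitness Function Example |
|---|---|---|
| Design-time | Define inviolable constraints in the architecture solution and ADR | A high-impact agent cannot directly execute irreversible customer actions |
| Pull request / merge | Stop constraints from being broken by code, prompt, schema or config | OpenAPI spec must declare side effect, approval and audit fields |
| Build / CI | Verify the artifact can be built, tested and scanned | prompt policy tests pass, dependency scan pass, schema compatibility pass |
| Release candidate | Use evals and gates to decide whether to release | high-risk critical failure = 0, citation support >= threshold |
| Canary / ramp | Validate assumptions on real traffic | override spike, complaint signal, cost burst, p95 latency breach |
| Runtime | Detect drift, incident, misuse, control failure | wrong citation alert, tool abuse alert, budget alert |
| Monthly operational review | Re-check production quality, incidents, exceptions and user adoption | dashboard trend, issue aging, sample audit |
| Quarterly architecture review | Judge whether architecture decisions and governance assumptions still hold | ADR reopen, threshold recalibration, exception expiry, platform roadmap update |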

6. Fitness Function Taxonomy

6.1 Taxonomy Overview

| Domain | Governance Question | Typical Fitness Function | Evidence |
|---|---|---|---|
| Quality | Does AI output meet the business task quality and quality attributes | groundedness, completeness, consistency, correctness, recoverability checks | eval run, SME review, trace replay |
| Safety | Does it avoid harm to customers, employees, the business, compliance or reputation | unsafe recommendation = 0, high-risk under-escalation = 0 | safety eval, red-team, incident log |
| Security | Does it resist prompt injection, tool abuse, data exfiltration and privilege escalation | unauthorized tool call = 0, jailbreak success below threshold | red-team, policy-as-code, access log |
| Privacy | Does it protect PII, sensitive business data and customer entitlements | PII leakage = 0, unauthorized retrieval = 0, log minimization pass | DLP scan, access test, privacy review |
| Cost | Are per-task cost, token burn and reviewer cost sustainable | cost per resolved case <= target, budget burn alert | cost dashboard, release report |
| Latency | Does response time match the business process and the pace of human-AI collaboration | p95 latency <= target by risk tier | OpenTelemetry traces, load test |
| Eval | Do evals cover real failure modes, high-risk slices and regression sets | required eval suites present and passing | eval registry, failure analysis |
| Tool / Action | Are agent tools constrained by allowlist, approval, idempotency, audit and rollback | write action requires approval and idempotency key | OpenAPI contract, tool registry, negative test |
| Evidence | Is release, control, exception and monitoring evidence complete | evidence completeness >= target, missing critical evidence = 0 | evidence binder, graph query |
| Adoption | Is user adoption healthy, or is there automation bias or workflow bypass | override pattern, blind acceptance, benefit realization | usage analytics, review sample, product metrics |

6.2 Quality Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-QUAL-GRD-001 | High-risk answers must have verifiable grounding | For regulated / customer-impacting answers, material claims must be supported by an approved source; unsupported critical claim = 0 | block release or rollback |
| AFF-QUAL-COMP-002 | Output must not omit key business conditions | Required factor recall on the golden set >= threshold; critical missing factor = 0 | no-go or limited go |
| AFF-QUAL-CONS-003 | Output should be stable under the same policy and facts | Policy decision variance on the regression set <= threshold | require prompt / routing fix |
| AFF-QUAL-REPLAY-004 | Incident and sampled outputs must be replayable | trace contains release_id, model, prompt, index, tool, retrieved source references | block scale if trace incomplete |

6.3 Safety Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-SAFE-ESC-001 | High-risk customers or cases must be escalated to a human | under-escalation = 0 for vulnerable customer, complaint, hardship, AML high-risk and fraud suspected cases | hard stop |
| AFF-SAFE-ADV-002 | AI must not give unauthorized financial advice | Wealth assistant must not output a product recommendation to users with incomplete suitability | hard stop |
| AFF-SAFE-HARM-003 | Critical harm examples must enter the regression set | Every P1 incident or near miss must map to a regression eval case | release blocked until regression added |
| AFF-SAFE-DISC-004 | Customer-visible AI must meet disclosure and human handoff boundaries | disclosure present, handoff path available, forbidden commitment blocked | limited go or no-go |

6.4 Security Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-SEC-INJ-001 | External content cannot override system policy | In prompt injection red-team runs, policy override success = 0 for high-risk tools | no-go |
| AFF-SEC-AUTH-002 | AI does not expand user permissions | RAG entitlement filter test pass; cross-tenant retrieval = 0 | hard stop |
| AFF-SEC-TOOL-003 | Tool calling must be bound by permissions and contract | OpenAPI operation must declare auth, side effect, approval, audit, idempotency | block merge or release |
| AFF-SEC-LOG-004 | Traces and logs do not leak secrets or sensitive fields | secret scan and DLP sample pass | block release |

6.5 Privacy Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-PRIV-MIN-001 | Prompt context and logs follow data minimization | disallowed PII fields absent from prompt payload and trace attributes | block release |
| AFF-PRIV-RET-002 | Evidence retention meets minimum sufficiency and the retention period | evidence objects have retention tag, masking rule and access class | exception review |
| AFF-PRIV-VND-003 | Third-party model calls do not violate data use boundaries | vendor route allowed for data class and region | block route |
| AFF-PRIV-SUBJ-004 | The customer correction or appeal path is actionable | customer-facing AI output can be linked to case, correction workflow and owner | limited go if incomplete |

6.6 Cost and Latency Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-COST-UNIT-001 | Per-task cost matches business value | cost per completed task <= approved unit economics target | ramp hold |
| AFF-COST-ROUTE-002 | Cost optimization cannot sacrifice high-risk quality | low-cost route cannot handle Tier 1 cases unless eval parity passes | block routing change |
| AFF-LAT-P95-001 | AI response time supports the business process | p95 / p99 latency by channel and risk tier within target | rollback or route fallback |
| AFF-LAT-HITL-002 | The human review queue does not become a hidden bottleneck | review queue age and reviewer capacity within threshold | hold scale |
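
As an illustration of a check like AFF-LAT-P95-001, the sketch below computes a nearest-rank p95 per risk tier and reports the tiers that exceed their target. The function names, the sample data and the 1500 ms targets are assumptions made for this example; a real deployment would read latencies from its own telemetry and might use a different percentile definition.

```python
def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_breaches(
    samples_ms: dict[str, list[float]], p95_target_ms: dict[str, float]
) -> list[str]:
    """Return the risk tiers whose p95 latency exceeds that tier's target."""
    return [
        tier
        for tier, samples in samples_ms.items()
        if percentile_nearest_rank(samples, 95) > p95_target_ms[tier]
    ]


samples = {
    "tier1": [100.0 * i for i in range(1, 21)],  # 100, 200, ..., 2000 ms
    "tier2": [300.0] * 20,
}
targets = {"tier1": 1500.0, "tier2": 1500.0}  # hypothetical targets
print(latency_breaches(samples, targets))
```

Evaluating each tier separately keeps a fast low-risk tier from masking a slow high-risk one in a blended percentile.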

6.7 Eval Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-EVAL-COV-001 | Evals cover real risk and the release scope | golden, regression, red-team, no-answer, conflict, high-risk slice suites present | no-go |
| AFF-EVAL-CAL-002 | An LLM judge does not decide high-risk gates alone | high-risk judge decisions have SME calibration sample and disagreement review | no-go for Tier 1 |
| AFF-EVAL-REG-003 | Every model, prompt, index or tool change triggers the affected regressions | impacted eval suites executed and compared to baseline | block promotion |
| AFF-EVAL-ERR-004 | An average score cannot mask a critical failure | critical failure count reviewed separately from aggregate score | hard stop |
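
The intent behind AFF-EVAL-ERR-004 can be sketched as a gate that checks critical failures before it looks at the mean. The `EvalCase` type, the `eval_gate` function and the three return labels are hypothetical names chosen for this illustration, not part of any eval framework.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    case_id: str
    score: float            # rubric score for this case, 0.0 to 1.0
    critical_failure: bool  # set by the rubric or an SME reviewer


def eval_gate(cases: list[EvalCase], min_mean_score: float) -> str:
    """Return 'hard_stop', 'fail' or 'pass' for one eval suite run."""
    if not cases:
        return "fail"  # an empty suite proves nothing
    if any(c.critical_failure for c in cases):
        return "hard_stop"  # checked first so a high average cannot hide it
    mean_score = sum(c.score for c in cases) / len(cases)
    return "pass" if mean_score >= min_mean_score else "fail"


run = [
    EvalCase("c1", 1.0, False),
    EvalCase("c2", 1.0, False),
    EvalCase("c3", 0.9, True),  # scores well on average but is a critical failure
]
print(eval_gate(run, min_mean_score=0.9))
```

Here the mean score clears the threshold, yet the run is still stopped because one case is marked critical.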

6.8 Tool / Action Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-ACT-READ-001 | Agents are read / draft only by default | tool registry shows no write actions unless approved ADR exists | block release |
| AFF-ACT-WRITE-002 | Write operations must have human review, idempotency, auditability and rollback | approval_required = true, idempotency_required = true, rollback_plan present | hard stop |
| AFF-ACT-LIMIT-003 | Amount, frequency and customer-impacting actions have limits | action limits exist and are tested in simulation | limited go |
| AFF-ACT-KILL-004 | Tools can be disabled quickly when abuse is found | kill switch tested and tool disabled within target time | block scale if untested |

6.9 Evidence and Adoption Fitness Functions

| Fitness ID | Architecture Intent | Check | Gate Action |
|---|---|---|---|
| AFF-EVID-BND-001 | The release decision has a complete evidence chain | evidence binder covers scope, architecture, eval, gate, approval, monitoring, rollback | no-go |
| AFF-EVID-EXP-002 | Exceptions do not become permanent | every exception has owner, expiry, compensating control and trigger | block exception renewal |
| AFF-EVID-ADR-003 | Key tradeoffs have an ADR | tradeoff points link to accepted or conditionally accepted ADR | no-go for scale |
| AFF-ADOPT-TRUST-001 | User adoption does not create automation bias | blind acceptance, override, edit distance and complaint trends within expected range | hold ramp |
| AFF-ADOPT-VALUE-002 | AI value actually enters the business process | benefit metric and risk metric reviewed together | stop scale if value absent |

7. From Architecture Review to CI/CD Gates

7.1 Gate Stack

| Gate | Trigger | Fitness Function Focus | Decision |
|---|---|---|---|
| G0 Intake and Risk Tier | use case registration | approved use, prohibited use, customer impact, automation level, data class | accept intake, restrict scope, reject |
| G1 Architecture Intent | solution design / ADR draft | quality attribute scenarios, tool boundary, data boundary, recovery boundary | proceed, redesign |
| G2 Contract and Policy | PR / spec change | OpenAPI tool contract, policy-as-code, prompt policy, schema compatibility | merge blocked or approved |
| G3 Build and Eval | candidate artifact | eval coverage, critical failure, red-team, cost, latency, trace completeness | candidate accepted or failed |
| G4 Release Gate | pre-production | risk acceptance, evidence binder, rollback plan, monitoring dashboard | go, limited go, no-go |
| G5 Shadow / Canary | real traffic, limited impact | runtime trace, quality sample, cost, latency, user behavior, incident signal | ramp, hold, rollback |
| G6 Scale | expanded population | operating effectiveness, exception aging, adoption, support capacity | scale, limit, redesign |
| G7 Quarterly Review | scheduled or triggered | architecture assumptions, ADR reopen, threshold recalibration, control evidence | continue, revise, retire |

7.2 Gate Decision Language

| Decision | Meaning | Required Evidence |
|---|---|---|
| Go | All applicable hard gates pass and residual risk is within the approved range | eval pass, evidence binder, owner sign-off, monitoring ready |
| Limited go | Hard gates pass, but rollout scope, users, regions, tools, data or human review must be restricted | limitation memo, compensating controls, expiry, review trigger |
| No-go | Critical failure, evidence gap, control failure or missing rollback | failed fitness function log, remediation plan |
| Rollback | Production signals break architecture assumptions or a hard threshold | incident record, route diff, rollback evidence, impact analysis |
| Exception | The business accepts a known deviation within a limited scope | exception memo, owner, expiry, compensating control, monitoring trigger |

7.3 Policy-as-Code Pattern

Fitness functions can be expressed as policy-as-code. The rule below is conceptual; the point is the governance semantics, not any specific tool.

Rule: a release whose risk_tier is "Tier 1" cannot be promoted unless:
  critical_failures equals 0
  required_eval_suites all passed
  rollback_target exists
  monitoring_dashboard exists
  evidence_binder completeness is "complete"
  exception_status is "none" or "approved_with_expiry"

Typical policy inputs:

| Input | Source |
|---|---|
| risk_tier | AI inventory / intake |
| release_bundle | deployment registry |
| eval_results | eval registry |
| tool_contract | OpenAPI spec / tool registry |
| data_classification | data catalog |
| monitoring_readiness | observability platform |
| evidence_completeness | evidence graph |
| exception_status | risk acceptance register |
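
To show how the conceptual rule could be evaluated, here is a minimal sketch in plain Python rather than a policy engine. The `ReleaseCandidate` type and the `promotion_blockers` function are hypothetical; the conditions follow the rule above, with one added assumption: an empty eval suite list is treated as a failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReleaseCandidate:
    """Hypothetical bundle of the policy inputs listed above."""
    risk_tier: str
    critical_failures: int
    eval_suites: dict[str, bool]  # suite name -> passed
    rollback_target: Optional[str]
    monitoring_dashboard: Optional[str]
    evidence_binder_completeness: str
    exception_status: str


def promotion_blockers(rc: ReleaseCandidate) -> list[str]:
    """Return reasons a Tier 1 release cannot be promoted; empty means it can."""
    if rc.risk_tier != "Tier 1":
        return []  # this rule only governs Tier 1; other tiers need their own rule
    blockers = []
    if rc.critical_failures != 0:
        blockers.append("critical_failures is not 0")
    # Assumption: no suites at all is a failure, since all() of nothing is True.
    if not rc.eval_suites or not all(rc.eval_suites.values()):
        blockers.append("required eval suites missing or failing")
    if not rc.rollback_target:
        blockers.append("rollback_target missing")
    if not rc.monitoring_dashboard:
        blockers.append("monitoring_dashboard missing")
    if rc.evidence_binder_completeness != "complete":
        blockers.append("evidence binder incomplete")
    if rc.exception_status not in {"none", "approved_with_expiry"}:
        blockers.append("exception not approved with expiry")
    return blockers


candidate = ReleaseCandidate(
    risk_tier="Tier 1",
    critical_failures=0,
    eval_suites={"golden": True, "red_team": False},
    rollback_target="release-41",
    monitoring_dashboard=None,
    evidence_binder_completeness="complete",
    exception_status="none",
)
print(promotion_blockers(candidate))
```

Returning the list of blockers rather than a single boolean gives the gate memo something concrete to record as evidence.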

8. Runtime Monitor and Observability

8.1 Why Runtime Monitoring Is Architecture Governance

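Post-release monitoring is not an operations afterthought. It is the primary evidence source showing that architecture fitness continues to hold.

Runtime monitors should answer:

  1. Which release bundle is serving current production traffic.
  2. Which model, prompt, RAG index, tool schema and policy versions are being invoked.
  3. Whether outputs are supported by approved evidence.
  4. Whether tool calls comply with permission, approval, idempotency, limit and audit requirements.
  5. Whether users are over-accepting AI or overriding it heavily.
  6. Whether quality, cost, latency and safety metrics trip a gate condition.
  7. Whether an incident can go through trace replay, root cause analysis and a regression update.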

8.2 OpenTelemetry Trace Fields

Organization-internal AI trace attributes can be added on top of the existing telemetry conventions. Field names should be stable, maskable, retainable and linkable to release evidence.

| Attribute | Purpose |
|---|---|
| ai.release_id | Links to the release bundle |
| ai.use_case_id | Links to the AI inventory and risk tier |
| ai.risk_tier | Supports risk-tiered monitoring |
| ai.model_id | Links to the model registry |
| ai.model_version | Links to eval and rollback |
| ai.prompt_version | Links to the prompt registry |
| ai.rag_index_version | Links to the source inventory and retrieval eval |
| ai.tool_schema_version | Links to the OpenAPI contract |
| ai.eval_gate_id | Links to the release gate |
| ai.policy_version | Links to the guardrail / decision policy |
| ai.trace_sample_class | Marks normal, high-risk, incident, audit_sample |
| ai.human_review_required | Proves HITL was triggered |
| ai.human_review_outcome | Records approve, edit, reject, escalate |
| ai.cost_usd_estimate | Connects to cost fitness |
| ai.latency_ms_total | Connects to latency fitness |

Privacy discipline:

  • Do not put the full customer prompt, PII, secrets, account numbers or transaction details directly into ordinary trace attributes.
  • For information needed for replay, use an evidence vault, hashed reference, masking, access control and retention tag.
  • Trace completeness and privacy minimization are a tradeoff that must be stated in an ADR.
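
The allowlist and hashed-reference ideas above can be sketched without committing to a telemetry SDK. In the example below, `build_trace_attributes`, `hashed_reference` and the extra `ai.evidence_ref` attribute are hypothetical names introduced for illustration; the resulting dictionary would then be handed to whatever span API the organization already uses.

```python
import hashlib
import hmac

ALLOWED_ATTRIBUTES = {
    "ai.release_id", "ai.use_case_id", "ai.risk_tier", "ai.model_id",
    "ai.model_version", "ai.prompt_version", "ai.rag_index_version",
    "ai.tool_schema_version", "ai.eval_gate_id", "ai.policy_version",
    "ai.trace_sample_class", "ai.human_review_required",
    "ai.human_review_outcome", "ai.cost_usd_estimate", "ai.latency_ms_total",
    "ai.evidence_ref",  # hypothetical extra attribute holding a hashed reference
}


def hashed_reference(evidence_id: str, key: bytes) -> str:
    """Keyed SHA-256 digest so the raw identifier never enters the trace."""
    return hmac.new(key, evidence_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_trace_attributes(raw: dict[str, object]) -> dict[str, object]:
    """Reject any attribute that is not on the allowlist, then return a copy."""
    unknown = sorted(set(raw) - ALLOWED_ATTRIBUTES)
    if unknown:
        raise ValueError(f"attributes not on the allowlist: {unknown}")
    return dict(raw)


attrs = build_trace_attributes({
    "ai.release_id": "rel-2026-06-01",
    "ai.risk_tier": "Tier 1",
    "ai.human_review_required": True,
    "ai.evidence_ref": hashed_reference("case-123", key=b"demo-key-not-for-production"),
})
print(len(attrs["ai.evidence_ref"]))  # 64 hex characters for SHA-256
```

An allowlist fails closed: a new field such as an account number is rejected until someone deliberately adds it, which is where the ADR tradeoff discussion belongs.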

8.3 Runtime Dashboard

A dashboard should show more than "requests per minute". A high-impact AI dashboard has at least 8 groups:

| Panel | Metrics |
|---|---|
| Release and route | active release_id, model / prompt / index / tool versions, traffic split |
| Quality | groundedness sample pass, critical failure, wrong citation, no-answer handling |
| Safety | unsafe output, under-escalation, complaint trigger, vulnerable customer handling |
| Security / privacy | prompt injection hit, unauthorized retrieval, DLP finding, sensitive log finding |
| Tool / action | tool calls by action class, denied action, approval bypass attempt, idempotency error |
| Cost / latency | cost per task, token burn, p95 / p99 latency, timeout, fallback rate |
| Human oversight | review volume, queue age, edit distance, override, reviewer disagreement |
| Adoption / value | active users, acceptance, blind acceptance signal, rework, case resolution impact |

8.4 Runtime Trigger to Governance Action

| Signal | Governance Action |
|---|---|
| Critical failure in high-risk output | immediate hold, incident triage, rollback or route disable |
| Wrong citation above threshold | freeze index promotion, run citation audit, update regression set |
| Unauthorized tool call attempt | disable tool route, security incident review, OpenAPI contract retest |
| Cost spike | hold ramp, inspect route mix, check prompt / model / retrieval changes |
| Latency breach | fallback route, queue capacity review, route or retrieval optimization |
| Override spike | sample review, root cause, retrain / prompt fix, user training review |
| Blind acceptance signal | trust calibration review, UI / workflow change, QA sampling |
| Evidence missing | stop scale, evidence owner remediation, quarterly review escalation |
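
A mapping like this can live in configuration so that alert handlers and review packs read the same source. The sketch below is illustrative only: the signal keys and the `actions_for` helper are assumed names, and the fallback to manual triage for unknown signals is a design assumption, not something the table prescribes.

```python
GOVERNANCE_ACTIONS: dict[str, list[str]] = {
    "critical_failure_high_risk": ["immediate hold", "incident triage", "rollback or route disable"],
    "wrong_citation_above_threshold": ["freeze index promotion", "run citation audit", "update regression set"],
    "unauthorized_tool_call_attempt": ["disable tool route", "security incident review", "OpenAPI contract retest"],
    "cost_spike": ["hold ramp", "inspect route mix", "check prompt / model / retrieval changes"],
    "latency_breach": ["fallback route", "queue capacity review", "route or retrieval optimization"],
    "override_spike": ["sample review", "root cause", "retrain / prompt fix", "user training review"],
    "blind_acceptance_signal": ["trust calibration review", "UI / workflow change", "QA sampling"],
    "evidence_missing": ["stop scale", "evidence owner remediation", "quarterly review escalation"],
}


def actions_for(signal: str) -> list[str]:
    """Look up governance actions; unknown signals go to manual triage (an assumption)."""
    return GOVERNANCE_ACTIONS.get(signal, ["manual triage"])


print(actions_for("cost_spike"))
print(actions_for("new_unclassified_signal"))
```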

9. OpenAPI Tool Contract Pattern

Agent action governance should be visible in the tool contract. OpenAPI can describe operation, schema, security and custom extensions. Organization-specific extension fields can encode AI governance requirements.

9.1 Tool Contract Fields

| Field | Purpose |
|---|---|
| operationId | Stable tool action name |
| security | Auth scheme and required scopes |
| x-ai-tool-risk | low, medium, high, restricted |
| x-ai-side-effect | none, draft, reversible_write, irreversible_write |
| x-ai-human-approval-required | true / false |
| x-ai-idempotency-required | true / false |
| x-ai-rollback-supported | true / false |
| x-ai-audit-fields | required audit fields |
| x-ai-release-gate | required fitness functions |
| x-ai-prohibited-use | action boundaries |

9.2 Example

paths:
  /disputes/{disputeId}/draft-response:
    post:
      operationId: draftDisputeResponse
      summary: Draft a payment dispute response for human review
      security:
        - oauth2:
            - disputes.read
            - disputes.draft
      x-ai-tool-risk: medium
      x-ai-side-effect: draft
      x-ai-human-approval-required: true
      x-ai-idempotency-required: true
      x-ai-rollback-supported: true
      x-ai-audit-fields:
        - release_id
        - model_version
        - prompt_version
        - reviewer_id
        - evidence_source_ids
      x-ai-release-gate:
        - AFF-ACT-WRITE-002
        - AFF-EVID-BND-001
        - AFF-QUAL-GRD-001
      x-ai-prohibited-use:
        - submit_final_response_without_human_approval
        - change_customer_account_status

9.3 Tool Contract Fitness Functions

| Check | Failure Meaning |
|---|---|
| Every high-risk tool has x-ai-human-approval-required: true | Agent can perform sensitive action without explicit human approval |
| Reversible / irreversible writes require idempotency key | Retry or loop may duplicate customer-impacting action |
| Tool schema has audit fields | Incident and evidence trace cannot prove what happened |
| Restricted tools require accepted ADR | Tool risk was never formally decided |
| Tool operation is linked to release gate IDs | Contract is disconnected from governance |
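
Several of these checks can be run as a lint over the parsed contract. The sketch below assumes an operation has already been loaded into a Python dictionary (for example from YAML) and uses a hypothetical `lint_tool_operation` function. Treating `restricted` like `high` for the approval check is an assumption of this example, and the ADR check is omitted because it needs an ADR registry that the contract alone does not provide.

```python
WRITE_SIDE_EFFECTS = {"reversible_write", "irreversible_write"}


def lint_tool_operation(op: dict) -> list[str]:
    """Return findings for one OpenAPI operation carrying x-ai-* extension fields."""
    findings = []
    # Assumption: "restricted" tools need approval just like "high" risk tools.
    if op.get("x-ai-tool-risk") in {"high", "restricted"} and op.get("x-ai-human-approval-required") is not True:
        findings.append("high-risk tool without required human approval")
    if op.get("x-ai-side-effect") in WRITE_SIDE_EFFECTS and op.get("x-ai-idempotency-required") is not True:
        findings.append("write side effect without idempotency requirement")
    if not op.get("x-ai-audit-fields"):
        findings.append("no audit fields declared")
    if not op.get("x-ai-release-gate"):
        findings.append("operation not linked to release gate IDs")
    return findings


draft_op = {
    "operationId": "draftDisputeResponse",
    "x-ai-tool-risk": "medium",
    "x-ai-side-effect": "draft",
    "x-ai-human-approval-required": True,
    "x-ai-idempotency-required": True,
    "x-ai-audit-fields": ["release_id", "model_version", "reviewer_id"],
    "x-ai-release-gate": ["AFF-ACT-WRITE-002", "AFF-EVID-BND-001"],
}
risky_op = {
    "operationId": "hypotheticalMoveFunds",  # invented operation for the negative case
    "x-ai-tool-risk": "high",
    "x-ai-side-effect": "irreversible_write",
}
print(lint_tool_operation(draft_op))
print(lint_tool_operation(risky_op))
```

Run at pull request time, a lint like this would turn the contract table into a merge gate rather than a review comment.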

10. Financial Retail Cases

10.1 Customer Service Fee Waiver AI

| Dimension | Design |
|---|---|
| Use case | Assist agents answering fee waiver and account service questions |
| AI role | retrieve policy, draft response, cite source, recommend escalation |
| AI must not do | Promise waiver, approve credit, change account, close complaint |
| Key risks | unsupported policy claim, vulnerable customer mishandling, latency hurting service, automation bias |
| Fitness functions | AFF-QUAL-GRD-001, AFF-SAFE-ESC-001, AFF-LAT-P95-001, AFF-ADOPT-TRUST-001 |

Example gate:

| Gate | Required Signal |
|---|---|
| Release | unsupported fee promise = 0 in golden and red-team sets |
| Canary | all complaint / hardship tagged conversations escalated or reviewed |
| Runtime | wrong policy citation alert triggers route fallback |
| Quarterly | review if policy changes, complaint rate increases, or blind acceptance rises |

10.2 Credit Policy RAG

| Dimension | Design |
|---|---|
| Use case | Internal credit analyst asks policy and exception handling questions |
| AI role | retrieve approved policy, compare facts, draft rationale |
| AI must not do | Make final credit decision, generate adverse action reason without workflow, override policy |
| Key risks | outdated policy, unfair treatment, unsupported rationale, model risk evidence gap |
| Fitness functions | AFF-PRIV-VND-003, AFF-EVAL-COV-001, AFF-QUAL-COMP-002, AFF-EVID-ADR-003 |

Example gate:

| Gate | Required Signal |
|---|---|
| Index build | all policy sources active, approved, versioned and linked to owner |
| Release | conflicting policy test set pass, adverse-action-sensitive examples reviewed |
| Runtime | stale source detection and analyst override trends monitored |
| Quarterly | Model Risk reviews slice failures, exception usage and policy change impact |

10.3 AML Investigation Agent

| Dimension | Design |
|---|---|
| Use case | AML analyst reads alerts, KYC, transaction history and policy evidence |
| AI role | summarize, identify red flags, cite evidence, draft narrative |
| AI must not do | Close alert, submit SAR, change customer risk rating |
| Key risks | hallucinated facts, missing red flags, unauthorized case data retrieval, weak audit trail |
| Fitness functions | AFF-ACT-READ-001, AFF-SEC-AUTH-002, AFF-QUAL-REPLAY-004, AFF-EVID-BND-001 |

Example gate:

| Gate | Required Signal |
|---|---|
| Tool contract | no disposition write API available to AI route |
| Release | high-risk red flag recall meets threshold; unsupported allegation = 0 |
| Runtime | every saved AI-assisted narrative has human approval and edit diff |
| Quarterly | review access logs, citation audit, SAR quality sample and incident near misses |

10.4 Payment Dispute Agent

| Dimension | Design |
|---|---|
| Use case | Draft dispute response and collect evidence for operations specialist |
| AI role | classify dispute, retrieve transaction evidence, draft customer response |
| AI must not do | Final deny / approve dispute, move funds, change account status |
| Key risks | wrong customer communication, irreversible action, missing regulatory deadline, privacy leak |
| Fitness functions | AFF-ACT-WRITE-002, AFF-LAT-HITL-002, AFF-PRIV-MIN-001, AFF-SAFE-HARM-003 |

Example gate:

| Gate | Required Signal |
|---|---|
| Contract | draft-only tool has approval workflow and audit fields |
| Release | deadline-sensitive cases escalate correctly |
| Runtime | queue age and SLA risk monitored |
| Quarterly | review disputed outcomes, complaint samples and regulatory deadline misses |

10.5 Retail Promotion Optimization AI

| Dimension | Design |
|---|---|
| Use case | Recommend promotion and markdown actions across channels |
| AI role | forecast demand, simulate margin, recommend campaign changes |
| AI must not do | Publish promotion without merchant approval, violate pricing policy, leak customer segment data |
| Key risks | margin erosion, unfair targeting, inventory distortion, brand damage |
| Fitness functions | AFF-COST-UNIT-001, AFF-ADOPT-VALUE-002, AFF-PRIV-MIN-001, AFF-EVID-EXP-002 |

Example gate:

| Gate | Required Signal |
|---|---|
| Design | pricing policy and protected segments mapped |
| Release | simulation covers margin, inventory, customer fairness and channel conflict |
| Runtime | uplift measured with guardrail metrics, not only conversion |
| Quarterly | review value realization, override patterns and exception aging |

11. Templates

11.1 Fitness Function Catalog Template

| Field | Required Content | Example |
|---|---|---|
| Fitness ID | Stable ID | AFF-RAG-GRD-001 |
| Name | Verb + architecture constraint | Enforce grounded regulated answers |
| Domain | quality, safety, security, privacy, cost, latency, eval, tool, evidence, adoption | quality |
| Architecture Intent | What decision or quality attribute this protects | Customer-impacting answers must be supported by approved sources |
| Scope | Use case, risk tier, release stage, channel | Tier 1 customer service AI, canary and production |
| Artifact | What is checked | response, citation, retrieved source, trace |
| Check Method | Automated or semi-automated method | citation checker + SME weekly sample |
| Trigger | When it runs | release eval, canary sample, weekly monitoring |
| Threshold | Pass / fail rule | unsupported high-risk claim = 0; citation support >= 98% |
| Gate Impact | What happens on failure | no-go, rollback, or restricted route |
| Evidence | What record proves it ran | eval run ID, sample audit, dashboard snapshot |
| Owner | Accountable role | EvalOps Owner / Knowledge Owner |
| Review Cadence | When threshold and rule are reviewed | every release and quarterly |
| Linked ADR | Decision record | ADR-AI-CS-004 |
| Linked Controls | Control IDs | AICL-RAG-003, AICL-MON-005 |

11.2 Gate Matrix Template

| Gate | Fitness Functions | Required Evidence | Decision Owner | Failure Action |
|---|---|---|---|---|
| G0 Intake | approved use, risk tier, data class | use case card, risk tier memo | Business Owner / AI Governance | restrict scope or stop |
| G1 Architecture | quality scenarios, tool boundary, data boundary | architecture diagram, ADR draft | Enterprise Architect | redesign |
| G2 CI / Contract | schema, OpenAPI tool contract, policy-as-code | CI report, contract lint, policy report | Platform Owner | block merge |
| G3 Eval | coverage, critical failure, red-team, cost, latency | eval report, failure analysis | EvalOps / Model Risk | no-go |
| G4 Release | evidence binder, rollback, monitoring, approvals | gate memo, dashboard, rollback runbook | Release Manager / Risk Owner | no-go or limited go |
| G5 Runtime | production quality, safety, security, cost, adoption | OTel dashboard, sample review | Ops / Product Owner | hold, rollback, incident |
| G6 Quarterly | ADR assumptions, exceptions, control evidence, value | quarterly architecture review memo | Chief Architect / AI Governance | revise, retire, reset thresholds |

11.3 Exception Memo Template

# Architecture Fitness Function Exception Memo

Exception ID: EXC-AFF-[USECASE]-[NUMBER]
Date: YYYY-MM-DD
Use Case:
Risk Tier:
Release / Scope:
Requested By:
Decision Owner:
Expiry Date:

## Fitness Function
- Fitness ID:
- Architecture intent:
- Current threshold:
- Current result:

## Business Reason
- Why the exception is requested:
- What business outcome depends on it:
- Why alternatives are not sufficient:

## Risk Analysis
| Risk | Impact | Likelihood | Compensating Control | Owner |
|---|---|---|---|---|

## Conditions
- Scope limit:
- User / segment limit:
- Traffic limit:
- Tool / action limit:
- Monitoring trigger:
- Rollback trigger:

## Evidence
- Eval evidence:
- Monitoring evidence:
- Control evidence:
- Approval record:

## Decision
- Approved / rejected / approved with restrictions:
- Residual risk owner:
- Re-review trigger:
- Expiry action:

Writing standard:

  • Exception must be time-bound.
  • Exception must name a residual risk owner.
  • Exception must have compensating control and monitoring trigger.
  • Exception must not silently become the default architecture.
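
The first three rules of this writing standard are mechanical enough to check in code. The sketch below is a minimal illustration that assumes the memo has been captured as a dictionary; the key names and the `exception_findings` function are hypothetical. The current date is passed in explicitly so the check stays deterministic in tests.

```python
from datetime import date


def exception_findings(memo: dict, today: date) -> list[str]:
    """Return violations of the exception writing standard for one memo."""
    findings = []
    expiry = memo.get("expiry_date")
    if not isinstance(expiry, date):
        findings.append("missing expiry date")  # exception is not time-bound
    elif expiry <= today:
        findings.append("exception has expired")
    for key in ("residual_risk_owner", "compensating_control", "monitoring_trigger"):
        if not memo.get(key):
            findings.append(f"missing {key}")
    return findings


memo = {
    "exception_id": "EXC-AFF-DEMO-001",  # invented ID for illustration
    "expiry_date": date(2026, 6, 1),
    "residual_risk_owner": "Head of Operations",
    "compensating_control": "",
    "monitoring_trigger": "wrong citation rate above threshold",
}
print(exception_findings(memo, today=date(2026, 6, 29)))
```

The fourth rule, that an exception must not silently become the default architecture, is what the expiry check protects: an expired memo surfaces as a finding instead of lingering unnoticed.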

11.4 Dashboard Template

| Dashboard Section | Required Widgets | Decision Use |
|---|---|---|
| Release identity | active release_id, model, prompt, index, tool schema, policy version | prove production version |
| Gate status | latest eval result, open blockers, exceptions, expiry | release and scale decision |
| Quality | groundedness, completeness, consistency, replay success | quality drift detection |
| Safety | critical failure, escalation, complaint, harm category | incident and customer protection |
| Security / privacy | injection attempts, unauthorized retrieval, DLP, sensitive log | security response |
| Tool action | tool calls, denials, approvals, idempotency errors, rollback tests | agent risk control |
| Cost / latency | cost per task, token burn, p95 / p99, timeout | unit economics and UX |
| Human oversight | review queue, edit distance, override, reviewer disagreement | HITL effectiveness |
| Adoption / value | active users, acceptance, rework, benefit metric, workflow impact | product scaling |
| Evidence health | missing evidence, stale evidence, exception aging, ADR reopen trigger | governance health |

11.5 ADR Template

## ADR-AI-[DOMAIN]-[NUMBER]: [Decision Title]

Status: Proposed / Accepted / Conditionally Accepted / Superseded
Date: YYYY-MM-DD
Scope: use case, risk tier, release stage, region, user group

### Context
- Business driver:
- Quality attributes:
- Risk tier:
- Architecture assumptions:

### Decision
- We will:
- We will not:

### Fitness Functions
| Fitness ID | Why it matters | Gate | Monitoring |
|---|---|---|---|

### Tradeoffs
| Improved Attribute | Weakened Attribute | Mitigation |
|---|---|---|

### Evidence
- Eval run:
- Security / privacy test:
- Cost / latency test:
- Control evidence:
- Exception memo:

### Reopen Triggers
- Model, prompt, index, tool or policy version changes:
- Critical failure or incident:
- Threshold breach:
- Material product scope expansion:
- Quarterly review finding:

12. Quarterly Architecture Review

Quarterly review is where continuous signals become architecture decisions. It should not repeat project status. It should answer whether the architecture still fits the business, risk and operating context.

12.1 Inputs

| Input | What to Review |
|---|---|
| Fitness function results | pass / fail trends, repeated exceptions, fragile thresholds |
| Runtime dashboard | quality, safety, cost, latency, security, privacy, adoption |
| Incidents and near misses | root cause, regression cases, control redesign |
| ADR assumptions | assumptions broken by vendor, data, policy, product scope or adoption |
| Evidence graph | stale evidence, missing owners, weak claim support |
| Product metrics | value realization, workflow impact, blind acceptance, override patterns |
| Audit / risk findings | open issues, remediation aging, model risk challenge |
| Roadmap | upcoming automation, new tools, new regions, new data, scale plans |

12.2 Review Questions

| Question | Strong Answer |
|---|---|
| Which architecture assumptions failed this quarter? | Named assumptions, affected ADRs, evidence and remediation |
| Which fitness functions were noisy or ineffective? | Rule tuning, threshold recalibration, better measurement source |
| Which exceptions are becoming permanent? | Expiring exceptions, redesign plan, risk owner decision |
| Which quality attribute tradeoffs changed? | New cost / latency / safety / adoption evidence |
| Is AI adoption healthy? | Users neither blindly accept nor bypass AI; value and risk metrics align |
| Are release gates too weak or too slow? | Gate data shows blockers, cycle time, false positives and risk reduction |
| Which systems need architecture refactoring? | Specific RAG, tool, data, observability, fallback or workflow changes |

12.3 Outputs

| Output | Use |
|---|---|
| Quarterly architecture fitness memo | Management review evidence |
| ADR updates | Record changed decisions or assumptions |
| Fitness catalog changes | Add, retire, tighten or soften rules |
| Gate matrix changes | Adjust CI/CD, release, canary or runtime gates |
| Exception decisions | Close, renew with stronger control, or reject |
| Roadmap changes | Fund platform, observability, eval or workflow improvements |
| Audit binder update | Preserve review evidence and owner decisions |
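
Exception decisions are easier to defend when the triage rule is written down. The sketch below is one possible reading of "close, renew with stronger control, or reject" combined with the question of which exceptions are becoming permanent. The `ExceptionMemo` fields, the outcome labels and the `max_renewals` default of 2 are assumptions for illustration, not values taken from this playbook.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExceptionMemo:
    """Hypothetical minimal shape of an exception memo record."""
    exception_id: str
    expires_on: date
    renewals: int                    # how many times it has already been renewed
    has_compensating_control: bool
    owner: str                       # empty string means no named risk owner


def review_exception(memo: ExceptionMemo, today: date, max_renewals: int = 2) -> str:
    """Suggest a quarterly review outcome for one exception."""
    # No owner or no compensating control: the exception is not defensible.
    if not memo.owner or not memo.has_compensating_control:
        return "reject"
    # Repeated renewals suggest the waiver is becoming permanent.
    if memo.renewals >= max_renewals:
        return "redesign"
    # Past expiry: the risk owner must close it or renew with stronger control.
    if memo.expires_on < today:
        return "close_or_renew"
    return "active"
```

The ordering is deliberate: an exception with no owner is rejected even if it has not yet expired, so a missing owner cannot hide behind a distant expiry date.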

13. Operating Model

Architecture fitness functions cut across product, platform, risk and operations. Ownership must be explicit.

| Role | Owns |
|---|---|
| AI Product Lead | Use case scope, product metrics, adoption health, value-risk tradeoff |
| Senior BA / CBAP | Quality attribute scenarios, process boundaries, requirements traceability, stakeholder concerns |
| Solution Architect | Architecture intent, fitness catalog, ADR, data / tool / recovery boundaries |
| Enterprise Architect | Cross-system standards, quarterly review, platform roadmap, architecture governance |
| EvalOps Owner | Eval datasets, evaluators, regression, failure analysis, gate metrics |
| Platform / MLOps Owner | CI/CD implementation, release bundle, registry, deployment, rollback |
| Security Owner | Prompt injection, access control, tool abuse, threat model |
| Privacy Owner | Data minimization, retention, vendor data boundary, customer rights |
| Model Risk | Validation scope, independent challenge, ongoing monitoring expectations |
| Compliance / Legal | Regulated claims, disclosures, prohibited use, policy interpretation |
| Internal Audit | Evidence quality, operating effectiveness, management review trail |
| Business Owner | Residual risk acceptance, process change, frontline accountability |

Recommended RACI pattern:

| Artifact | Responsible | Accountable | Consulted | Informed |
|---|---|---|---|---|
| Fitness catalog | Solution Architect | Enterprise Architect | Product, Risk, Security, Privacy | Engineering |
| Eval gate | EvalOps | Model Risk / Product Owner | SME, Compliance | Release Manager |
| Tool contract | Platform Owner | Solution Architect | Security, Business Owner | Operations |
| Exception memo | Product Owner | Business / Risk Owner | Architecture, Compliance | Audit |
| Runtime dashboard | Platform / Ops | Product Owner | Risk, Security, Privacy | Governance forum |
| Quarterly review | Enterprise Architect | AI Governance Chair | Product, Risk, Audit | Executive stakeholders |
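
Explicit ownership can itself be treated as a fitness function. As a rough sketch, the RACI rows above can be kept as data and checked for gaps whenever a new artifact is added. The dictionary layout and the helper names `raci_gaps` and `accountable_for` are assumptions made for this example; only the role assignments come from the table.

```python
# RACI rows transcribed from the table above; the dict layout is an assumption.
RACI = {
    "Fitness catalog": {"R": "Solution Architect", "A": "Enterprise Architect",
                        "C": "Product, Risk, Security, Privacy", "I": "Engineering"},
    "Eval gate": {"R": "EvalOps", "A": "Model Risk / Product Owner",
                  "C": "SME, Compliance", "I": "Release Manager"},
    "Tool contract": {"R": "Platform Owner", "A": "Solution Architect",
                      "C": "Security, Business Owner", "I": "Operations"},
    "Exception memo": {"R": "Product Owner", "A": "Business / Risk Owner",
                       "C": "Architecture, Compliance", "I": "Audit"},
    "Runtime dashboard": {"R": "Platform / Ops", "A": "Product Owner",
                          "C": "Risk, Security, Privacy", "I": "Governance forum"},
    "Quarterly review": {"R": "Enterprise Architect", "A": "AI Governance Chair",
                         "C": "Product, Risk, Audit", "I": "Executive stakeholders"},
}


def raci_gaps(raci: dict[str, dict[str, str]]) -> list[str]:
    """List every artifact that is missing a Responsible, Accountable, Consulted or Informed entry."""
    gaps = []
    for artifact, roles in raci.items():
        for letter in ("R", "A", "C", "I"):
            if not roles.get(letter, "").strip():
                gaps.append(f"{artifact}: missing {letter}")
    return gaps


def accountable_for(raci: dict[str, dict[str, str]], artifact: str) -> str:
    """Look up who signs off on a given artifact."""
    return raci[artifact]["A"]
```

Running `raci_gaps` in CI against the governance repository is one way to stop an artifact from being introduced without a named accountable role.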

14. Anti-Patterns

| Anti-Pattern | What It Looks Like | Risk | Correction |
|---|---|---|---|
| Review-only governance | Architecture board approves slides, no executable gate | Decisions drift after model or prompt changes | Convert key decisions into fitness functions and gates |
| Accuracy-only gate | Average task score passes, critical failures hidden | Low-frequency high-impact failures enter production | Separate critical failure hard gates from aggregate metrics |
| HITL as a slogan | Human review exists in process map, but reviewers lack evidence or capacity | Automation bias and false accountability | Monitor queue age, edit distance, override and reviewer agreement |
| RAG treated as evidence | System retrieves documents, so team assumes answers are grounded | Wrong or stale citation causes customer or regulatory harm | Add source approval, retrieval eval, citation audit and stale source checks |
| Tool creep | Agent quietly gains write permissions across releases | Irreversible customer-impacting actions occur without governance | OpenAPI contract, tool registry, ADR and hard gates for write actions |
| Observability without privacy | Teams log full prompts, documents and customer details | Privacy, retention and breach risk | Trace references, masking, evidence vault and access control |
| Cost optimization without risk tier | Cheaper model handles high-impact cases | Quality and safety degrade in the riskiest slices | Risk-tiered routing and eval parity gate |
| Exceptions never expire | A temporary threshold waiver becomes permanent | Control baseline erodes | Expiry, owner, compensating control and quarterly review |
| Product metrics disconnected from risk | Adoption rises, but complaints and overrides also rise | Value narrative hides control failure | Put value and risk metrics on the same dashboard |
| ADRs disconnected from runtime | Decisions are accepted but never reopened | Architecture assumptions become stale | Runtime triggers reopen ADRs |
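
The correction for the accuracy-only gate can be illustrated with a small eval gate that evaluates critical failures before it looks at the average. This is a sketch under stated assumptions: `EvalCase`, the 0.85 mean threshold and the zero-tolerance default are hypothetical and would be set per risk tier in practice.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One scored eval case (hypothetical shape)."""
    case_id: str
    score: float            # 0.0 to 1.0 task score
    critical_failure: bool  # e.g. unsupported regulated claim


def gate_decision(cases: list[EvalCase],
                  min_mean_score: float = 0.85,
                  max_critical_failures: int = 0) -> tuple[str, str]:
    """Return (decision, reason). Critical failures are checked before the average."""
    if not cases:
        return ("block", "no eval cases")
    critical = [c.case_id for c in cases if c.critical_failure]
    # Hard gate: a good average must never hide a critical failure.
    if len(critical) > max_critical_failures:
        return ("block", f"critical failures: {', '.join(critical)}")
    mean = sum(c.score for c in cases) / len(cases)
    # Aggregate gate: only evaluated once the hard gate has passed.
    if mean < min_mean_score:
        return ("block", f"mean score {mean:.2f} below {min_mean_score:.2f}")
    return ("pass", f"mean score {mean:.2f}")
```

With nine perfect cases and one critical failure, the mean is 0.90 and would pass an accuracy-only gate, but this gate blocks the release and names the failing case.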

15. Interview Expression

15.1 30-Second Answer

I use architecture fitness functions to make AI governance executable. Instead of leaving quality attributes in architecture review slides, I define automated or semi-automated checks for groundedness, safety, security, privacy, latency, cost, eval coverage, tool permissions, evidence and adoption. These checks run in CI/CD, release gates and runtime monitoring. When a check fails, it can block release, limit rollout, trigger rollback, require exception approval or reopen an ADR.

15.2 2-Minute Answer

For AI systems, architecture governance cannot be a one-time review because model versions, prompts, RAG indexes, tool schemas, policies and user behavior keep changing. My approach is to start with quality attribute scenarios and architecture decisions, then convert the most important constraints into fitness functions.

For example, in a financial customer service AI, a fitness function might say that any customer-impacting fee waiver answer must cite an approved active policy source, unsupported claims must be zero, p95 latency must stay below the service target, and complaint or vulnerable-customer cases must escalate to a human. Some checks are fully automated in CI or policy-as-code. Others are semi-automated through eval, SME sampling, model risk review and evidence binder checks.

I connect those functions to release gates, OpenAPI tool contracts, OpenTelemetry traces, dashboards, exception memos and quarterly architecture reviews. The result is that architecture intent becomes operational: we know what is allowed, how it is measured, what evidence proves it, who owns it, and what happens when it fails.
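
The fee waiver example in this answer can be written as a single check over sampled answer traces. The sketch below is illustrative only: `AnswerTrace`, its fields, the nearest-rank p95 method and the failure messages are assumptions, and a real implementation would read these values from the eval store and trace backend.

```python
from dataclasses import dataclass


@dataclass
class AnswerTrace:
    """One sampled fee waiver answer (hypothetical shape)."""
    cited_source_ids: list[str]
    unsupported_claims: int
    latency_ms: float
    is_complaint_or_vulnerable: bool
    escalated_to_human: bool


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def check_fee_waiver_fitness(traces: list[AnswerTrace],
                             approved_active_sources: set[str],
                             p95_target_ms: float) -> list[str]:
    """Return a list of failures; an empty list means the function passes."""
    failures = []
    for i, t in enumerate(traces):
        # Every answer must cite at least one source, and only approved active ones.
        if not t.cited_source_ids or not set(t.cited_source_ids) <= approved_active_sources:
            failures.append(f"trace {i}: missing or unapproved citation")
        if t.unsupported_claims > 0:
            failures.append(f"trace {i}: {t.unsupported_claims} unsupported claim(s)")
        if t.is_complaint_or_vulnerable and not t.escalated_to_human:
            failures.append(f"trace {i}: not escalated to a human")
    if traces and p95([t.latency_ms for t in traces]) > p95_target_ms:
        failures.append("p95 latency above target")
    return failures
```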

15.3 Chief Architect Version

Architecture fitness functions are the mechanism I use to keep enterprise AI architecture decisions alive after the design review. They encode architecture intent as enforceable constraints across release bundles, tool contracts, eval gates and runtime telemetry. The architecture board no longer only asks whether a design was approved. It asks which fitness functions protect the decision, how often they run, what evidence they create, and which ADRs reopen when assumptions break.

15.4 CRO / Model Risk Version

From a risk perspective, fitness functions are continuous control tests for AI architecture. They connect risk appetite to measurable gates: critical failures, unsupported regulated claims, unauthorized tool calls, privacy leakage, stale evidence, expired exceptions and unhealthy adoption signals. This gives risk leadership a way to see whether residual risk remains within approved boundaries during release and production, not only at initial approval.

15.5 Interview Questions

| Question | Strong Talking Point |
|---|---|
| How do you prevent AI architecture review from becoming a paperwork exercise? | Convert review decisions into fitness functions, gates, telemetry and evidence |
| How do you handle model or prompt changes after approval? | Release bundle versioning, impact analysis, regression eval, gate rerun, ADR reopen trigger |
| How do you govern agent tools? | OpenAPI contract with side effect, approval, idempotency, audit and rollback fields |
| How do you balance cost and safety? | Risk-tiered routing, eval parity, hard gates for high-risk slices, dashboard with cost and risk together |
| What makes a good AI dashboard? | It shows release identity, quality, safety, security, privacy, tool action, cost, latency, HITL and adoption |
| How do you prove governance to audit? | Evidence graph links requirement, risk, fitness function, test, evidence, owner, gate and review cadence |
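
The tool governance talking point can be backed with a small contract lint. OpenAPI permits vendor extension fields prefixed with `x-`; the extension name `x-ai-governance` and its keys (`side_effect`, `approval`, `idempotency`, `audit`, `rollback`) are hypothetical names chosen for this sketch, not part of the OpenAPI specification. The spec is assumed to be already parsed into a Python dict.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}
REQUIRED_FOR_WRITE = ("approval", "idempotency", "audit", "rollback")


def check_tool_contract(spec: dict) -> list[str]:
    """Flag operations whose governance metadata is missing or incomplete."""
    problems = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            name = f"{method.upper()} {path}"
            gov = operation.get("x-ai-governance")  # hypothetical extension
            if gov is None:
                problems.append(f"{name}: missing x-ai-governance")
                continue
            side_effect = gov.get("side_effect")
            if side_effect not in ("read", "write"):
                problems.append(f"{name}: side_effect must be read or write")
                continue
            if side_effect == "write":
                # Write actions need every control field to be present and non-empty.
                missing = [key for key in REQUIRED_FOR_WRITE if not gov.get(key)]
                if missing:
                    problems.append(f"{name}: write action missing {', '.join(missing)}")
    return problems
```

Used as a hard gate, a non-empty result would block the release bundle, which is one way to catch tool creep before an agent quietly gains a new write permission.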

16. Portfolio Asset Pack

A strong portfolio can turn this playbook into a concrete artifact set:

| Asset | Content | What It Demonstrates |
|---|---|---|
| AI Fitness Function One-Pager | Method, taxonomy, governance loop | Senior architecture language |
| Fitness Catalog | 30-50 functions across quality, safety, security, privacy, cost, latency, eval, tool, evidence, adoption | Ability to operationalize quality attributes |
| Gate Matrix | G0 to G7 with owners, evidence and failure actions | Release governance design |
| OpenAPI Tool Contract Sample | Agent tool spec with AI governance extensions | Solution architecture depth |
| OpenTelemetry Trace Schema | AI release, model, prompt, index, tool and evidence fields | Observability design |
| Runtime Dashboard Spec | Panels, metrics, thresholds, owners | Production governance |
| Exception Memo Pack | Example exceptions with compensating controls and expiry | Risk acceptance discipline |
| Quarterly Review Memo | Architecture assumption review and ADR reopen examples | Enterprise architecture governance |
| Financial Retail Case Study | Customer service, credit, AML, dispute or promotion AI | Domain-specific credibility |
| Interview Narrative | 30-second, 2-minute, Chief Architect and CRO versions | Hiring conversion |

17. Final Check

Before using an AI architecture fitness function system in a high-impact financial retail use case, run this self-check:

| Check | Passing Standard |
|---|---|
| Definition | Fitness functions are clearly defined as automated or semi-automated architecture constraints |
| Taxonomy | Covers quality, safety, security, privacy, cost, latency, eval, tool, evidence and adoption |
| Traceability | Each high-risk function links to quality attribute scenario, ADR, eval, gate, monitoring and evidence |
| Gate impact | Each failure has a decision path: block, limited go, rollback, exception or review |
| Runtime proof | Observability can show active release, versions, tool actions, human review and critical signals |
| Evidence | Release and runtime evidence can support audit, model risk and management review |
| Product value | Adoption and benefit metrics are reviewed with risk metrics |
| Review cadence | Quarterly architecture review can revise functions, thresholds, ADRs and exceptions |
| Financial retail fit | Cases reflect customer impact, regulatory sensitivity, model risk, privacy and operational resilience |
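
The traceability check in this list is simple enough to automate against the fitness catalog. The following sketch assumes each catalog entry is a dict with hypothetical keys `id`, `risk` and `links`; the six required link names mirror the passing standard for traceability.

```python
REQUIRED_LINKS = ("scenario", "adr", "eval", "gate", "monitoring", "evidence")


def traceability_gaps(catalog: list[dict]) -> dict[str, list[str]]:
    """Map each high-risk fitness function ID to the links it is missing."""
    gaps = {}
    for function in catalog:
        if function.get("risk") != "high":
            continue  # the passing standard applies to high-risk functions only
        links = function.get("links", {})
        # A link counts as present only if it has a non-empty value.
        missing = [name for name in REQUIRED_LINKS if not links.get(name)]
        if missing:
            gaps[function["id"]] = missing
    return gaps
```

An empty result means every high-risk function is fully linked; anything else is a concrete remediation list for the owner of the catalog.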

Final principle:

AI architecture governance becomes real only when architecture intent can fail a build, block a release, stop a ramp, trigger a rollback, expire an exception or reopen an ADR.