
AI Platform Service Catalog: Golden Paths


AI Platform Service Catalog / Golden Paths: A Walkthrough

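Audience: AI Platform PM / Platform Architect / Enterprise Architect / Product Operations Lead / Senior BA / Developer Experience Lead.

Core problem: If an AI platform offers only a model API, business teams still have to solve RAG, eval, tool gateway, permissions, logging, HITL, evidence, and release gates on their own. The core of AI platform product management is packaging safe, reusable capabilities into a service catalog and golden paths.

Learning objective: Use platform engineering, service catalog, golden paths, self-service with guardrails, and SLO/cost/adoption metrics to design the AI platform's service catalog and the business-team experience.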

Source Anchors

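Source	Link	Purpose
CNCF Platform Engineering	https://tag-app-delivery.cncf.io/wgs/platforms/	Reference for platform engineering, platform capabilities, and the internal-platform-as-product approach
Backstage Docs	https://backstage.io/docs/	Reference for service catalog, software templates, and the developer portal
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Connects platform self-service to risk, evaluation, governance, and monitoring
OpenTelemetry	https://opentelemetry.io/docs/	Reference for the observability foundation of traces, metrics, and logs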

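In one sentence:

An AI Platform Service Catalog productizes the enterprise's AI capabilities for models, knowledge, tools, evaluation, controls, and evidence; Golden Paths let teams deliver AI use cases quickly, safely, and auditably along a recommended route.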

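1. Why an AI Platform Cannot Offer Only a Model API

A model API only solves:

input -> model -> output

Business teams still need to:

Choose an approved model.
Connect knowledge sources and apply permission filtering.
Design an eval set and a release gate.
Control tool calls and their side effects.
Record trace, cost, latency, and quality.
Handle PII, retention, and audit.
Design a human review queue.
Generate an evidence binder.
Pass risk and architecture review.

If the platform provides only model invocation, every team will re-implement these capabilities. The platform PM's job is to package them into services that are discoverable, requestable, configurable, monitorable, and governable.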

2. Service Catalog Taxonomy

Service	User question	Platform capability
Model Gateway	Which model should I use?	allowlist, routing, quota, cost, logging
RAG Service	How do I connect enterprise knowledge?	ingestion, metadata, permission, citation, freshness
Eval Service	How do I prove quality?	golden set, judge, rubric, report, release gate
Tool Gateway	How can AI act safely?	connector, policy, approval, idempotency, audit
Policy Engine	How are guardrails enforced?	risk tier, DLP, advice boundary, runtime decision
Observability	How do I see production behavior?	trace, metric, log, dashboard, alert
HITL Queue	How do humans review?	queue, SLA, override, review evidence
Evidence Binder	How do we respond to audit/risk?	ADR, eval, trace sample, approval, control evidence
Templates	How do I start quickly?	app template, workflow template, prompt/eval template

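Each service catalog card should state:

What problem the service solves.
Who it is suited for.
Inputs/outputs.
SLO.
Cost model.
Supported risk tiers.
Data boundary.
Onboarding steps.
Support model.
Maturity and limitations.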
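
The card fields listed above can be captured as a small typed record so that incomplete cards are caught before publication. This is a minimal sketch: the `ServiceCatalogCard` schema, the `missing_fields` helper, and all sample values are hypothetical illustrations, not a format taken from Backstage or any other tool.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class ServiceCatalogCard:
    """One catalog card; field names mirror the checklist (hypothetical schema)."""
    name: str
    problem_solved: str = ""
    suited_for: str = ""
    inputs_outputs: str = ""
    slo: str = ""
    cost_model: str = ""
    risk_tiers_supported: Tuple[str, ...] = ()
    data_boundary: str = ""
    onboarding_steps: Tuple[str, ...] = ()
    support_model: str = ""
    maturity_and_limits: str = ""


def missing_fields(card: ServiceCatalogCard) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(card) if not getattr(card, f.name)]


# Illustrative values only; a real card would be written by the service owner.
card = ServiceCatalogCard(
    name="Model Gateway",
    problem_solved="Which model should I use?",
    suited_for="All AI product teams",
    inputs_outputs="prompt in, completion out",
    slo="illustrative: 99.9% availability",
    cost_model="per-token chargeback",
    risk_tiers_supported=("low", "medium"),
    data_boundary="no unmasked PII",
)

print(missing_fields(card))
```

A check like this could run in the catalog's publishing step, so a card without onboarding steps or a support model never reaches consumers.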

3. Golden Paths

A golden path is a recommended end-to-end delivery path.

Golden Path	Services included	Applicable scenarios
Customer-facing RAG	RAG + policy + eval + citation + HITL + evidence	customer service, policy explanation, customer Q&A
Employee Copilot	model gateway + retrieval + feedback + observability	internal knowledge assistant, operations assistant
Agent Workflow	workflow + tool gateway + policy + HITL + trace	payment disputes, case automation
Document AI	OCR/extraction + validation + human review + eval	KYC, loan documents, claims
Decision Service	feature/DMN/model + explainability + monitoring	fraud, credit, risk tiering

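A golden path is not a mandatory single solution. It provides:

Recommended architecture.
Default controls.
Templates.
Example code/configuration.
Eval cases.
Release checklist.
Support model.

This moves teams from "assembling from scratch" to "configuring within guardrails."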
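
Because each golden path is a composition of catalog services, the table in this section can be held as plain data and queried. The sketch below is illustrative only: `GOLDEN_PATHS` copies the service lists from the table, while `shared_services` is a hypothetical helper that surfaces services reused by more than one path, which is one possible input to roadmap prioritization.

```python
from collections import Counter
from typing import Dict, List

# Service composition per golden path, copied from the table in this section.
GOLDEN_PATHS: Dict[str, List[str]] = {
    "Customer-facing RAG": ["RAG", "policy", "eval", "citation", "HITL", "evidence"],
    "Employee Copilot": ["model gateway", "retrieval", "feedback", "observability"],
    "Agent Workflow": ["workflow", "tool gateway", "policy", "HITL", "trace"],
    "Document AI": ["OCR/extraction", "validation", "human review", "eval"],
    "Decision Service": ["feature/DMN/model", "explainability", "monitoring"],
}


def shared_services(paths: Dict[str, List[str]], min_paths: int = 2) -> List[str]:
    """Return services that appear in at least `min_paths` golden paths, sorted."""
    counts = Counter(s for services in paths.values() for s in services)
    return sorted(s for s, n in counts.items() if n >= min_paths)


print(shared_services(GOLDEN_PATHS))
```

The comparison is by exact label, so "HITL" and "human review" count as different services here; a real catalog would normalize names first.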

4. Self-Service with Guardrails

The platform must support both self-service and governance.

Use case intake
  -> risk tier
  -> recommended golden path
  -> service catalog selection
  -> template provisioning
  -> eval/release gate
  -> observability/evidence

Guardrail	Self-service implementation
Risk tier	intake questionnaire + policy profile
Data boundary	data classification + approved connectors
Model choice	model gateway route policy
Tool action	tool permission profile
Eval gate	required eval template by risk tier
Monitoring	default dashboard and alerts
Evidence	auto-generated release bundle
Exception	exception request and expiry

Advanced platform design does not turn governance into an approval wall; it embeds governance into the golden path.
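
The intake-to-gate flow in this section can be sketched as a few pure functions. Everything below is an assumption made for illustration: the `Intake` fields, the tiering rules in `risk_tier`, the routing in `recommend_path`, and the `REQUIRED_EVALS` profiles are hypothetical, and a real platform would derive them from its own policy profiles.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Intake:
    """Answers from a hypothetical intake questionnaire."""
    use_case: str
    customer_facing: bool
    calls_tools: bool
    data_classification: str  # assumed values: "public", "internal", "restricted"


def risk_tier(intake: Intake) -> str:
    """Hypothetical tiering rule; real rules come from the policy profile."""
    if intake.customer_facing or intake.data_classification == "restricted":
        return "high"
    if intake.calls_tools:
        return "medium"
    return "low"


def recommend_path(intake: Intake) -> str:
    """Map an intake to one of the golden paths (illustrative routing only)."""
    if intake.calls_tools:
        return "Agent Workflow"
    if intake.customer_facing:
        return "Customer-facing RAG"
    return "Employee Copilot"


# Required eval templates by risk tier (assumed profiles).
REQUIRED_EVALS: Dict[str, List[str]] = {
    "low": ["smoke"],
    "medium": ["smoke", "golden set"],
    "high": ["smoke", "golden set", "groundedness", "refusal"],
}


def provision(intake: Intake) -> dict:
    """Turn one intake into a provisioning plan with the guardrails attached."""
    tier = risk_tier(intake)
    return {
        "risk_tier": tier,
        "golden_path": recommend_path(intake),
        "required_evals": REQUIRED_EVALS[tier],
        "hitl_required": tier == "high",
    }


plan = provision(Intake("policy assistant", True, False, "internal"))
print(plan)
```

The point of the design is that the team never requests a guardrail separately: the eval profile and the HITL requirement fall out of the intake answers.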

5. Platform Product Metrics

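Metric	Description
Time-to-first-pilot	From use case intake to a runnable pilot
Reuse rate	Proportion of use cases using core services/templates
Golden path adoption	Usage and completion rate of recommended paths
Quality gate pass rate	Proportion passing the eval/release gate on the first attempt
Cost per case	Unit cost per business outcome or case
Risk exceptions	Exception count, reasons, expiry, and recurrence
Developer satisfaction	Usage experience of product/engineering teams
Support load	Platform support tickets and blocked reasons
Service SLO	availability, latency, trace completeness
Evidence completeness	Release bundle completeness rate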

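The platform PM needs to watch adoption and constraints together:

Watching usage alone encourages ungoverned expansion.
Watching gates alone turns the platform into an approval machine.
The best metric is time-to-safe-value.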
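
To make a few of these metrics concrete, the sketch below computes time-to-first-pilot, reuse rate, and first-attempt gate pass rate from a handful of use-case records. The record shape and every value in `use_cases` are invented for illustration; real numbers would come from the platform's intake and release telemetry.

```python
from datetime import date
from statistics import median

# Invented sample records, one per use case.
use_cases = [
    {"intake": date(2024, 1, 8), "pilot": date(2024, 2, 5), "used_template": True, "gate_attempts": 1},
    {"intake": date(2024, 1, 15), "pilot": date(2024, 3, 11), "used_template": False, "gate_attempts": 3},
    {"intake": date(2024, 2, 1), "pilot": date(2024, 2, 22), "used_template": True, "gate_attempts": 1},
    {"intake": date(2024, 2, 12), "pilot": date(2024, 3, 25), "used_template": True, "gate_attempts": 2},
]


def time_to_first_pilot_days(records):
    """Median number of days from intake to a runnable pilot."""
    return median((r["pilot"] - r["intake"]).days for r in records)


def reuse_rate(records):
    """Share of use cases that started from a platform template."""
    return sum(r["used_template"] for r in records) / len(records)


def first_pass_rate(records):
    """Share of use cases that cleared the eval/release gate on the first attempt."""
    return sum(r["gate_attempts"] == 1 for r in records) / len(records)


# Report adoption (reuse) and constraint (gate) metrics side by side.
report = {
    "median_days_to_pilot": time_to_first_pilot_days(use_cases),
    "reuse_rate": reuse_rate(use_cases),
    "first_pass_rate": first_pass_rate(use_cases),
}
print(report)
```

Reporting the adoption and gate numbers in one view reflects the point above: neither usage nor gate strictness is meaningful alone.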

6. Financial Retail Case: Bank AI Platform Catalog

Bank AI platform catalog:

Catalog Service	Users
Approved Model Gateway	all AI product teams
Retail Policy RAG	customer service, branches, complaints
AML Knowledge RAG	AML investigation
Credit Explanation Template	credit services
Tool Gateway for Case Systems	agent workflow
HITL Review Queue	ops/risk reviewer
EvalOps Service	product + QA + model risk
AI Evidence Binder	audit/risk/compliance

Golden path: Customer-facing policy assistant

Intake
  -> risk tier: high
  -> RAG service with approved source registry
  -> model gateway approved route
  -> advice boundary policy
  -> customer disclosure UX template
  -> eval service with groundedness/citation/refusal
  -> HITL for high impact
  -> observability dashboard
  -> evidence bundle

7. Artifact Templates

Service Catalog Card

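Field	Content
Service name	Name of the service
Problem solved	What problem it solves
Consumers	Who uses it
Inputs / outputs	Inputs and outputs
SLO	Availability, latency, quality
Risk support	Which risk tiers are supported
Data boundary	Data restrictions
Cost model	How cost is metered
How to onboard	Onboarding steps
Evidence produced	What evidence it produces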

Golden Path Checklist

Step	Required Artifact
Intake	opportunity brief, risk tier
Provision	template, service config
Build	prompt/RAG/tool workflow
Evaluate	eval report
Release	release bundle, sign-off
Operate	dashboard, runbook
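
The checklist above lends itself to an automated completeness check before sign-off. In this minimal sketch, `CHECKLIST` copies the steps and artifacts from the table, while `missing_artifacts` and the sample `submitted` set are hypothetical; artifact names are matched as plain strings.

```python
from typing import Dict, List, Set

# Required artifacts per step, copied from the Golden Path Checklist.
CHECKLIST: Dict[str, List[str]] = {
    "Intake": ["opportunity brief", "risk tier"],
    "Provision": ["template", "service config"],
    "Build": ["prompt/RAG/tool workflow"],
    "Evaluate": ["eval report"],
    "Release": ["release bundle", "sign-off"],
    "Operate": ["dashboard", "runbook"],
}


def missing_artifacts(submitted: Set[str]) -> Dict[str, List[str]]:
    """Return, per step, the required artifacts that have not been submitted."""
    gaps: Dict[str, List[str]] = {}
    for step, required in CHECKLIST.items():
        missing = [artifact for artifact in required if artifact not in submitted]
        if missing:
            gaps[step] = missing
    return gaps


# Illustrative submission: everything up to the release bundle, no sign-off yet.
submitted = {
    "opportunity brief", "risk tier", "template", "service config",
    "prompt/RAG/tool workflow", "eval report", "release bundle",
}
gaps = missing_artifacts(submitted)
print(gaps)
```

An empty result would mean the path is complete; any remaining entry names the step and artifact that still blocks release or operation.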

8. ADR Draft

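Item	Content
Decision	The AI platform productizes model, RAG, eval, tool gateway, policy, observability, HITL, and evidence services as a service catalog + golden paths
Context	Offering only a model API leads business teams to rebuild governance, evaluation, and audit capabilities repeatedly
Alternatives	A central team hand-delivers every AI project; each team builds its own; a single-vendor SaaS
Rationale	The service catalog makes capabilities discoverable, golden paths provide a safe and fast route, and guardrails embed governance
Consequences	Requires platform PM, service owner, SLO, support model, usage telemetry, and roadmap prioritization
Reversal condition	If golden paths cannot cover high-value scenarios, introduce an extension mechanism rather than abandoning platformization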

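9. Interview Talking Points

30-second version

An AI platform cannot offer only a model API. A genuinely useful platform has a service catalog and golden paths covering model gateway, RAG, eval, tool gateway, policy engine, observability, HITL, and evidence binder. Business teams can start on a self-service basis, while risk tiering, data boundaries, eval gates, and audit evidence are embedded in the path.

2-minute version

I would manage the AI platform as a product. First, define the service catalog, where every service has a consumer, SLO, data boundary, cost, supported risk tiers, and the evidence it produces. Then design golden paths for common scenarios, such as customer-facing RAG, employee copilot, agent workflow, document AI, and decision service. A team selects a risk tier through intake; the platform recommends a golden path, automatically provisions templates and service configuration, enforces the eval/release gate, and feeds the use case into observability and the evidence binder after launch. This shortens time-to-pilot and keeps each team from rebuilding governance and audit capabilities.

AI Platform PM / CTO version

The platform PM has to show that the platform is not a warehouse of technical components but an operating system that lets product teams deliver safe value faster. The CTO cares about reuse, SLO, and cost; the CPO cares about adoption and business speed; the CRO cares about guardrails and evidence. Service catalog + golden paths put these goals into a single platform product language.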