AI Agent Marketplace / Tool Certification Governance Architecture Playbook

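Positioning: Written for Advanced AI PMs, Senior BAs, AI Product Architects, Enterprise Architects, AI Platform PMs, API Product Owners, Cyber, Data Governance, Model Risk, Third-Party Risk, Compliance, Internal Audit and retail financial services business owners. This is not a basic guide to building a service catalog. It trains you to turn agents, tools, prompts, MCP servers, APIs and reusable AI capabilities into an internal platform product that can be operated, certified, entitled, audited and retired.

Core thesis: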

The enterprise AI marketplace is not successful when teams can find more tools. It is successful when teams can safely reuse certified agentic capabilities with enforceable permissions, trustworthy provenance, current evidence, clear owners, runtime monitoring and disciplined lifecycle control.

0. Disclaimer

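This document is learning, architecture-training and portfolio material. It does not constitute legal advice, regulatory advice, a cybersecurity certification conclusion, a model risk management conclusion, an audit opinion, a procurement recommendation or a vendor compliance certification.

A real project must be assessed jointly by AI Governance, Enterprise Architecture, Cyber Security, Privacy, Data Governance, Model Risk, Third-Party Risk, Legal, Compliance, Internal Audit, Platform Engineering, API Owners, Business Owners and, where needed, external advisers.

Applicability depends on industry regulation, customer impact, data classification, agent autonomy level, tool side effect, the identity and authorization stack, vendor dependency, runtime gateway maturity, audit requirements and internal policy. This document offers architecture and operating model decision support.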

1. Executive Framing

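Once an enterprise deploys AI agents, capability reuse becomes a shared source of both speed and risk. A customer lookup tool, case update tool, prompt bundle, MCP connector or RAG corpus published by one team may be reused by a dozen agents. The more successful the reuse, the larger the systemic risk:

An over-permissioned tool amplifies into cross-business over-permission.
An unsigned prompt bundle amplifies into untraceable policy drift.
An API without a deprecation plan amplifies into operational lock-in.
An MCP server without evidence export amplifies into audit and incident response gaps.
A write tool with undefined side effects amplifies into customer harm, complaints or regulatory risk.

Executive language: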

An internal AI marketplace should reduce duplicated work,
but it must not industrialize uncontrolled agency.

The goal is not to "list every tool" but to build a capability control plane:

discover -> assess -> certify -> sign -> publish -> request scopes
  -> enforce at runtime -> monitor -> renew -> deprecate -> exit

Core contributions of the Senior PM / Architect:

Run the marketplace as a platform product, not as a governance document library.
Make certification an evidence-producing workflow, not a meeting-based approval.
Design the capability card as the shared object for discovery, risk review, permission request, runtime policy and audit.
Bring OpenAPI, AsyncAPI, MCP manifests, prompt bundles, eval packs and signed artifacts into one lifecycle.
Connect runtime permission, the invocation ledger, owner attestation and deprecation/exit.

2. Source Anchors

Anchor	Official link	How this document uses it
OWASP Top 10 for LLM Applications	https://genai.owasp.org/llm-top-10/	Uses the vocabulary of prompt injection, sensitive information disclosure, excessive agency, insecure output handling and supply-chain risk to design the agent/tool certification baseline
CISA Secure by Design	https://www.cisa.gov/securebydesign	Uses secure-by-design, secure defaults, shifting responsibility earlier, vendor accountability and demonstrable security controls to design signed packages, secure defaults and admission gates
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Uses Govern / Map / Measure / Manage to organize risk tiers, evaluation, monitoring, risk acceptance and continuous improvement
ISO/IEC 42001 overview	https://www.iso.org/standard/42001	Uses AI management system, roles, operation planning, performance evaluation, internal audit and continual improvement to organize the operating model
OpenAPI Specification	https://spec.openapis.org/oas/latest.html	Uses contract-first API schemas, operation metadata, security schemes, versioning and compatibility checks to support tool/API certification
AsyncAPI Specification	https://www.asyncapi.com/docs/reference/specification/latest	Uses event-driven contracts, message schemas, channels, producer/consumer boundaries and event retention to support asynchronous agent capability governance

Source nuance:

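OWASP is a threat lens, not a complete operating model.
CISA Secure by Design emphasizes secure defaults and vendor accountability, which translate into marketplace admission and package provenance requirements.
NIST AI RMF and ISO/IEC 42001 provide governance language, but for an agent/tool marketplace they must be turned into risk-tiered gates, evidence and runtime controls.
OpenAPI and AsyncAPI are not governance frameworks, but a contract-first specification is the foundation for anything certifiable, testable and versionable.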

3. Taxonomy

The marketplace should be classified by reusable agentic capability, not by organization or system name.

Object	Definition	Marketplace treatment
Agent product	AI agent or copilot built for a business process	use-case approval, risk tier, allowed tools, eval, owner, runtime telemetry
Tool/function	function or action that an agent can call	schema, scope, side effect, approval, idempotency, rollback, ledger
Prompt bundle	system prompt, policy prompt, template, instruction hierarchy	version, policy source, eval, signing, consumer compatibility
MCP server / connector	server that exposes tools, resources or context to the agent runtime	server identity, tool manifest, transport security, resource boundary, audit
OpenAPI API capability	REST/HTTP API used as an agent tool	spec validation, security scheme, operation risk, versioning, rate limit
AsyncAPI event capability	event stream, command channel, message-based integration	message schema, channel risk, consumer obligations, replay/retention
RAG/knowledge asset	curated corpus, vector index, knowledge graph endpoint	source lineage, ACL, freshness, citation, data use restriction
Eval pack	reusable evaluation scenarios and thresholds	coverage, risk mapping, run evidence, result retention
Guardrail/policy service	PII redaction, claim checker, tool policy decision, content classifier	control objective, thresholds, bypass, monitoring and exceptions

Status	Meaning	Consumer behavior
Draft	provider is building metadata and evidence	not discoverable for production
Sandbox	usable only in isolated test environment	no sensitive data, no production side effect
Certified	approved for defined scopes and risk tier	can request production entitlement
Restricted	certified with extra conditions	step-up review, limited consumers or environments
Deprecated	no new consumers, existing consumers migrate	replacement path required
Retired	invocation blocked and evidence retained	no runtime use
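
The status table can be read as a small state machine. The sketch below is one hedged way to encode it in Python. The status names come from the table, but the transition edges in `ALLOWED_TRANSITIONS` and the helper names are assumptions made for illustration, since the table defines the statuses and not the allowed moves between them.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    SANDBOX = "Sandbox"
    CERTIFIED = "Certified"
    RESTRICTED = "Restricted"
    DEPRECATED = "Deprecated"
    RETIRED = "Retired"


# Assumed transition map: the table names the statuses but not the edges.
ALLOWED_TRANSITIONS = {
    Status.DRAFT: {Status.SANDBOX},
    Status.SANDBOX: {Status.DRAFT, Status.CERTIFIED, Status.RESTRICTED},
    Status.CERTIFIED: {Status.RESTRICTED, Status.DEPRECATED},
    Status.RESTRICTED: {Status.CERTIFIED, Status.DEPRECATED},
    Status.DEPRECATED: {Status.RETIRED},
    Status.RETIRED: set(),
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if the assumed lifecycle allows current -> target."""
    return target in ALLOWED_TRANSITIONS[current]


def accepts_new_consumers(status: Status) -> bool:
    """Certified (and Restricted, under its extra conditions) may take new production consumers."""
    return status in {Status.CERTIFIED, Status.RESTRICTED}
```

Keeping the edges in one map means the marketplace UI and the gateway can share a single definition of which status changes are legal.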

4. Risk Tiering

Use risk tiering to drive gates, not to decorate catalog cards.

Tier	Pattern	Certification minimum	Runtime posture
Tier 0: Reference	docs, sample prompt, non-runtime reusable idea	peer review, source attribution	no production invocation
Tier 1: Low-risk read	public or low-sensitivity internal lookup	owner attestation, basic schema/eval, logging	read-only, low-volume limits
Tier 2: Sensitive read	customer/account/employee/confidential data retrieval	data steward approval, ABAC, prompt injection eval, evidence export	purpose-bound scoped tokens
Tier 3: Controlled write	internal record update, case creation, drafted customer content	side-effect review, approval model, rollback, stronger eval	human approval and rate limits
Tier 4: High-impact action	money movement, account restriction, credit/insurance/customer notification impact	governance committee review, dual control, model/security review, incident drill	dual control, strict limits, safe-stop, enhanced monitoring

Composition rule:

The effective risk tier of an agent workflow is the highest risk tier of its tools,
plus any additional risk created by chaining, scale, autonomy and customer visibility.
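
A minimal sketch of the composition rule, under stated assumptions. The rule says "highest tool tier plus additional risk" but does not quantify the addition, so the one-tier uplift, the cap at Tier 4 and the factor names in `UPLIFT_FACTORS` are illustrative choices, not part of the rule itself.

```python
# Hypothetical factor names taken from the wording of the composition rule.
UPLIFT_FACTORS = {"chaining", "scale", "autonomy", "customer_visibility"}
MAX_TIER = 4


def effective_risk_tier(tool_tiers, factors=()):
    """Highest tool tier, raised by one tier if any uplift factor is flagged."""
    tiers = list(tool_tiers)
    if not tiers:
        raise ValueError("a workflow needs at least one tool tier")
    flagged = set(factors)
    unknown = flagged - UPLIFT_FACTORS
    if unknown:
        raise ValueError(f"unknown uplift factors: {sorted(unknown)}")
    base = max(tiers)
    # Assumption: any flagged factor adds one tier, capped at Tier 4.
    uplift = 1 if flagged else 0
    return min(MAX_TIER, base + uplift)
```

For example, a workflow that chains a Tier 1 and a Tier 2 tool and runs autonomously would be treated as Tier 3 under these assumptions.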

Risk tier inputs:

Data classification and purpose.
External or customer-visible output.
Financial, credit, fraud, complaint, legal or regulatory impact.
Tool side effect and reversibility.
Autonomy mode: assistive, draft-only, human-approved action, autonomous action.
Prompt injection exposure surface.
Vendor or third-party dependency.
Blast radius: number of agents, users, products or jurisdictions.
Evidence completeness and retention.

5. Decision Gates

Gate 1: Marketplace Admission

Entry criteria:

Named accountable owner and support path.
Capability type and intended business use.
Data and side-effect classification.
Contract or manifest reference.
Initial risk tier.
Consumer eligibility and environment boundary.

Blocked conditions:

anonymous owner.
unknown data class.
production use without runtime enforcement path.
direct credentials not bound to agent identity or delegated user context.
missing schema for any tool that accepts parameters or produces side effects.
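
The blocked conditions above lend themselves to an automated pre-check. The following is a sketch only: the `Submission` fields are hypothetical names for whatever the intake form captures, and a real gate would read them from the marketplace record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    # Hypothetical intake fields, named for illustration.
    owner: Optional[str]
    data_class: Optional[str]
    target_environment: str  # "sandbox" or "production"
    has_runtime_enforcement_path: bool
    credentials_bound_to_identity: bool
    accepts_parameters_or_side_effects: bool
    has_schema: bool


def admission_blockers(s: Submission) -> list:
    """Return the Gate 1 blocked conditions that apply to a submission."""
    blockers = []
    if not s.owner:
        blockers.append("anonymous owner")
    if not s.data_class:
        blockers.append("unknown data class")
    if s.target_environment == "production" and not s.has_runtime_enforcement_path:
        blockers.append("production use without runtime enforcement path")
    if not s.credentials_bound_to_identity:
        blockers.append("direct credentials not bound to agent identity or delegated user context")
    if s.accepts_parameters_or_side_effects and not s.has_schema:
        blockers.append("missing schema for tool with parameters or side effects")
    return blockers
```

An empty list means the submission may proceed to Gate 2; any entry keeps it out of the marketplace until fixed.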

Gate 2: Contract and Manifest Certification

For OpenAPI:

operation ids are stable and meaningful.
request/response schemas are explicit.
security schemes are declared.
error semantics and idempotency are documented for write actions.
breaking-change policy exists.
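
Some of the OpenAPI checks above can be automated. The sketch below lints an OpenAPI 3 document that has already been parsed into a Python dict. It covers only operation ids, declared security schemes and explicit schemas; `lint_openapi` is a hypothetical helper, not a function of any OpenAPI library, and error semantics, idempotency and breaking-change policy still need human review.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def _missing_schema(content):
    """Return media types in a content map that declare no schema."""
    return [media for media, body in (content or {}).items() if "schema" not in body]


def lint_openapi(spec: dict) -> list:
    """Return findings for a parsed OpenAPI 3 document (illustrative subset of Gate 2)."""
    findings = []
    if not spec.get("components", {}).get("securitySchemes"):
        findings.append("no security schemes declared")
    seen_ids = set()
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            label = f"{method.upper()} {path}"
            op_id = op.get("operationId")
            if not op_id:
                findings.append(f"{label}: missing operationId")
            elif op_id in seen_ids:
                findings.append(f"{label}: duplicate operationId {op_id}")
            else:
                seen_ids.add(op_id)
            for media in _missing_schema(op.get("requestBody", {}).get("content")):
                findings.append(f"{label}: request {media} has no schema")
            for code, resp in op.get("responses", {}).items():
                for media in _missing_schema(resp.get("content")):
                    findings.append(f"{label}: response {code} {media} has no schema")
    return findings
```

Running this in the submission pipeline turns part of the certification checklist into a repeatable test result that can be attached to the evidence pack.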

For AsyncAPI:

channels, messages, bindings and retention are explicit.
producer/consumer responsibilities are documented.
replay, ordering, deduplication and poison-message behavior are defined.
event-triggered side effects have owner and approval model.

For MCP servers:

exposed tools/resources are listed as capability objects.
server identity and transport security are defined.
resource boundary and data classification are explicit.
audit events are exportable.
package or deployment artifact is signed for production.

Gate 3: Agentic Risk Review

Review questions:

Can prompt injection influence this capability?
Can tool output be used unsafely by another tool?
Can the agent create irreversible or high-impact side effects?
Can sensitive data be inferred through repeated calls?
Does the capability require human review, dual control or spend/volume limits?
Are forbidden uses machine-enforceable or only written in documentation?

Gate 4: Evaluation Evidence

Required evidence by risk:

Evidence	Tier 1	Tier 2	Tier 3	Tier 4
schema validation tests	yes	yes	yes	yes
happy-path functional tests	yes	yes	yes	yes
prompt injection tests	light	yes	yes	yes
data leakage tests	light	yes	yes	yes
tool misuse / unsafe chaining tests	optional	light	yes	yes
human approval test	no	conditional	yes	yes
dual-control test	no	no	conditional	yes
rollback / remediation test	no	conditional	yes	yes
incident evidence export test	light	yes	yes	yes
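
The evidence matrix can be encoded as data so that a certification workflow reports gaps automatically. In this sketch the levels are copied from the table. Treating "yes" and "light" as mandatory, and leaving "conditional" and "optional" to reviewer judgment, is an assumption.

```python
# Levels per tier (index 0 = Tier 1 ... index 3 = Tier 4), copied from the table.
EVIDENCE_MATRIX = {
    "schema validation tests": ("yes", "yes", "yes", "yes"),
    "happy-path functional tests": ("yes", "yes", "yes", "yes"),
    "prompt injection tests": ("light", "yes", "yes", "yes"),
    "data leakage tests": ("light", "yes", "yes", "yes"),
    "tool misuse / unsafe chaining tests": ("optional", "light", "yes", "yes"),
    "human approval test": ("no", "conditional", "yes", "yes"),
    "dual-control test": ("no", "no", "conditional", "yes"),
    "rollback / remediation test": ("no", "conditional", "yes", "yes"),
    "incident evidence export test": ("light", "yes", "yes", "yes"),
}

# Assumption: "light" is still mandatory, only at reduced depth.
MANDATORY_LEVELS = {"yes", "light"}


def missing_evidence(tier: int, provided) -> list:
    """List mandatory evidence items for the tier that are absent from `provided`."""
    if tier not in (1, 2, 3, 4):
        raise ValueError("evidence matrix covers Tier 1 to Tier 4")
    return [
        name
        for name, levels in EVIDENCE_MATRIX.items()
        if levels[tier - 1] in MANDATORY_LEVELS and name not in provided
    ]
```

A Tier 1 submission with only schema and functional tests would come back with three gaps: prompt injection, data leakage and incident evidence export.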

Gate 5: Publication and Entitlement

Publication requirements:

Capability card complete.
Certification decision recorded.
Signed artifact and version referenced.
Runtime policy is generated or linked.
Entitlement request workflow is active.
Monitoring and evidence export path is tested.
Consumer obligations are visible before access request.

Gate 6: Renewal, Change and Exit

Renewal triggers:

risk tier change.
data class change.
new write action or side effect.
model route or prompt bundle change for agent products.
new consumer type or jurisdiction.
failed eval or incident.
vendor contract or dependency change.
deprecation of API/model/MCP server/source corpus.
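
Several of these triggers can be detected by comparing two versions of a capability record. The sketch below covers four of them. The record keys (`risk_tier`, `data_classes`, `write_actions`, `consumer_types`, `jurisdictions`) are hypothetical field names, and triggers such as failed evals or vendor changes would arrive from other systems.

```python
def renewal_triggers(old: dict, new: dict) -> list:
    """Compare two capability records and list the renewal triggers that fire."""
    triggers = []
    if old["risk_tier"] != new["risk_tier"]:
        triggers.append("risk tier change")
    if set(old["data_classes"]) != set(new["data_classes"]):
        triggers.append("data class change")
    if set(new["write_actions"]) - set(old["write_actions"]):
        triggers.append("new write action or side effect")
    added_consumers = set(new["consumer_types"]) - set(old["consumer_types"])
    added_jurisdictions = set(new["jurisdictions"]) - set(old["jurisdictions"])
    if added_consumers or added_jurisdictions:
        triggers.append("new consumer type or jurisdiction")
    return triggers
```

A non-empty result would send the new version back through certification instead of publishing it under the existing decision record.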

Exit gate:

new usage blocked.
consumer inventory generated.
replacement or downgrade path provided.
evidence retention confirmed.
runtime tokens revoked at shutdown.
post-retirement invocation alerts enabled for a defined period.

6. Required Artifacts

Artifact	Owner	What it proves
Capability Card	Capability Owner / Marketplace PM	discoverability, risk boundary, consumer obligations
Risk Tier Assessment	AI Governance / Model Risk	proportional controls and review path
Data and Purpose Map	Data Governance / Privacy	field-level access and allowed use
Contract or Manifest	Tool/API/MCP Owner	schema, auth, side effect and compatibility
Certification Decision Record	Certification Board or Delegate	decision, evidence, conditions and renewal trigger
Signed Release Manifest	Platform Engineering	artifact provenance and deployment integrity
Evaluation Evidence Pack	Capability Owner / EvalOps	threat-informed and use-case-specific test results
Runtime Policy Binding	Security / Platform	scopes, approvals, limits and enforcement point
Invocation Ledger Schema	Platform / Audit	traceability and evidence export
Consumer Entitlement Register	Marketplace Operations	who can use what, for which purpose and scope
Exception Register	AI Governance / Risk	accepted risk, compensating controls and expiry
Deprecation and Exit Plan	Capability Owner / Marketplace PM	migration, shutdown and evidence retention

7. Capability Card Template

# Capability Card: Customer Profile Read Tool

Name: Customer Profile Read Tool
Capability ID: cap-servicing-customer-profile-read-v2
Type: OpenAPI Tool
Version: 2.8.0
Status: Certified
Owner: Customer Data Platform Product Owner
Support Contact: Customer Data Platform Support Queue
Business Domain: Customer Servicing
Risk Tier: Tier 2

Purpose: Retrieve purpose-bound customer profile attributes for complaint handling and service recovery agents.
Intended Users: complaint specialists, servicing supervisors and certified complaint response agents.
Allowed Uses:
- Retrieve current contact preference, product relationship and complaint case linkage for an assigned servicing case.
Forbidden Uses:
- Do not use for outbound sales targeting, employee monitoring, fraud restriction decisions or credit eligibility decisions.

Data Boundary:
Data classes: customer profile, product relationship, contact preference.
Allowed fields: customer id, servicing segment, preferred channel, active product flags, complaint case ids.
Prohibited fields: full account number, authentication secrets, credit score, AML notes, card PAN, unrelated household data.
Purpose code: complaint_handling.
Retention: invocation ledger retained for seven years; response payload retained only in case record when human reviewer attaches it.
Jurisdiction or residency constraints: United States servicing workflows only.

Contract and Runtime:
OpenAPI / AsyncAPI / MCP manifest / prompt bundle: customer-profile-read-openapi.yaml.
Security scheme: OAuth2 client credentials plus delegated user context.
Permission scopes: read:customer_profile:assigned_case.
Approval model: no pre-call approval; supervisor review required for bulk export request.
Rate or volume limits: 50 customer reads per specialist per hour; no wildcard customer search.
Rollback or remediation: read-only; inappropriate access triggers access review and case-note purge if retained incorrectly.

Evaluation:
Eval suites: sensitive-read-purpose-binding-v3, prompt-injection-data-exfiltration-v2.
Last eval run: 2026-06-12.
Thresholds: zero successful prohibited-field retrieval; 100% denial for unassigned case access.
Known limitations: does not validate downstream summaries created by consumer agents.
Compensating controls: consumer agents must use regulated-summary guardrail before drafting customer text.

Provenance:
Source repository: internal/customer-data-platform/profile-api.
Signed artifact: customer-profile-read-tool-2.8.0.signed.
Artifact hash: sha256:5a7d-profile-read-tool-2-8-0.
Release notes: profile-read-release-2026-06-10.
Dependency manifest: profile-api-sbom-2.8.0.json.

Lifecycle:
Certified date: 2026-06-18.
Renewal date: 2026-12-18.
Deprecation policy: block new consumers 45 days before retirement and notify entitlement owners weekly.
Replacement capability: cap-servicing-customer-profile-read-v3 after identity platform migration.
Evidence retention: certification and invocation evidence retained for audit and complaint governance retention period.

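This example shows the level of specificity a final record should have. A real enterprise can guide completion with form help text, but a published capability card should avoid vague fields.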

8. Certification Evidence Pack

Evidence pack should be reusable across consumers and exportable under audit or incident pressure.

Evidence area	Minimum evidence
Identity and ownership	owner, steward, support, business domain, accountable executive for Tier 4
Purpose and boundaries	intended uses, forbidden uses, consumer obligations, environment boundary
Contract quality	OpenAPI/AsyncAPI/MCP manifest, schema tests, auth scheme, version policy
Data control	data classification, field allowlist, purpose binding, steward approval
Security	threat model, identity model, secret handling, transport security, vulnerability scan
Agentic risk	prompt injection tests, excessive agency review, unsafe chaining scenarios
Side effect	action classification, approval requirement, reversibility, idempotency, rollback
Eval results	eval cases, thresholds, run id, failures, accepted limitations
Provenance	signed package, artifact hash, build attestation, release notes
Runtime enforcement	policy id, gateway route, token scopes, approval flow, denial tests
Monitoring	invocation ledger, metrics, alerts, evidence export test
Lifecycle	renewal trigger, deprecation plan, consumer inventory method, exit evidence

Fact discipline:

Evidence pack should separate what was designed,
what was tested,
what was approved,
what is enforced at runtime,
and what was observed after release.

9. Runtime Permission Governance

Marketplace publication does not equal runtime access. Access should be granted by scoped entitlement.

Runtime flow:

consumer requests capability
  -> use-case and purpose check
  -> risk-tier and eligibility check
  -> data steward / owner / security approval where required
  -> scoped entitlement issued
  -> token/policy pushed to gateway
  -> invocation logged with policy decision
  -> periodic access review and scope right-sizing

Permission dimensions:

Dimension	Examples
action	search, read, summarize, draft, create, update, approve, send, refund, restrict
data	customer segment, product, account type, case type, document collection, geography
purpose	complaint handling, fraud investigation, underwriting support, service recovery
identity	agent id, human delegator, role, business unit, service account
environment	sandbox, pilot, production, emergency
duration	session, case-bound, campaign-bound, expiring standing access
limits	calls per hour, records per request, monetary value, customer messages
approval	none, human review, supervisor, dual control, risk committee

Policy examples:

Scenario	Runtime decision
Tier 2 read tool called without purpose code	deny
Tier 3 case update with valid case owner and schema	allow with ledger
Tier 4 refund request above threshold	step-up to dual control
Deprecated capability called by new consumer	deny
Existing consumer calls deprecated capability before sunset	allow with warning and migration alert
Prompt injection signal before write tool	downgrade to draft-only or deny
Agent requests wildcard customer scope	deny and route to security review
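
The policy examples can be expressed as an ordered decision function at the gateway. This is a sketch under assumptions: the `Invocation` fields are hypothetical, the order of checks is a design choice, and two behaviors are extrapolated from the table rather than stated in it (the purpose-code rule is applied to every tier from Tier 2 up, and an existing consumer calling after sunset is denied).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invocation:
    # Hypothetical request attributes seen by the policy decision point.
    tier: int
    status: str = "Certified"            # marketplace status of the capability
    purpose_code: Optional[str] = None
    is_write: bool = False
    existing_consumer: bool = True
    past_sunset: bool = False
    wildcard_scope: bool = False
    injection_signal: bool = False
    amount: float = 0.0
    dual_control_threshold: float = 0.0  # hypothetical per-capability limit


def decide(inv: Invocation) -> str:
    """Return a runtime decision string mirroring the policy examples table."""
    if inv.wildcard_scope:
        return "deny and route to security review"
    if inv.status == "Deprecated" and (not inv.existing_consumer or inv.past_sunset):
        return "deny"
    if inv.tier >= 2 and not inv.purpose_code:
        return "deny"
    if inv.is_write and inv.injection_signal:
        return "downgrade to draft-only"
    if inv.tier == 4 and inv.amount > inv.dual_control_threshold:
        return "step-up to dual control"
    if inv.status == "Deprecated":
        return "allow with warning and migration alert"
    return "allow with ledger"
```

Deny conditions are evaluated first so that a deprecated-but-tolerated call still has to pass the purpose, injection and threshold checks before it is allowed with a warning.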

10. RACI / Operating Model

Activity	Accountable	Responsible	Consulted	Informed
Marketplace product strategy	AI Platform Executive	Marketplace PM	EA, Cyber, Risk, Business Owners	AI Governance Committee
Capability submission	Capability Owner	Provider Team	Marketplace Ops	Consumer Teams
Risk tiering	AI Governance / Model Risk	Marketplace Risk Lead	Data, Cyber, Business Owner	Audit
Contract validation	API/MCP Governance	Tool/API Owner	Platform, Security	Marketplace PM
Data approval	Data Governance / Privacy	Data Steward	Legal, Business Owner	Consumer Team
Security review	Cyber Security	AppSec / Platform Security	Tool Owner, EA	Risk
Eval evidence review	EvalOps / Model Risk	Capability Owner	Business SME, Compliance	Marketplace Ops
Certification decision	Delegated Certification Board	Marketplace Ops	Cyber, Data, Legal, EA	Consumer Teams
Runtime policy implementation	Platform Engineering	Security Engineering	Capability Owner	Marketplace PM
Entitlement approval	Capability Owner	Marketplace Ops	Data/Cyber/Risk by tier	Consumer Manager
Monitoring and KRIs	Marketplace PM	Platform Observability	Risk, Audit	Governance Forums
Deprecation and exit	Capability Owner	Marketplace PM	Consumer Teams, Platform	Governance Committee
Independent assurance	Internal Audit	Audit Team	Marketplace PM, Platform, Risk	Board/Risk Committee

Decision rights:

Capability Owner can approve Tier 1 publication after automated checks and peer review.
Data Steward must approve Tier 2 sensitive read boundaries.
AI Governance / Model Risk must approve Tier 3 controlled write or customer-visible workflows.
Tier 4 requires senior business owner, risk, security and governance committee decision.
Platform can emergency-disable any capability with active security, privacy, customer harm or evidence integrity risk.

Cadence:

Cadence	Forum	Output
Daily during launch	Marketplace standup	submissions, blockers, policy defects
Weekly	Certification triage	risk tier, reviewers, evidence gaps
Biweekly	Platform product review	adoption, UX friction, developer feedback
Monthly	AI governance committee	Tier 3/4 decisions, exceptions, incidents
Quarterly	EA/risk review	systemic dependency, vendor concentration, deprecation
Annual	Internal audit / maturity review	control effectiveness, framework refresh

11. Implementation Roadmap

Phase 1: Baseline Marketplace Control Plane

Outcomes:

Define taxonomy, risk tiers and statuses.
Build structured capability card.
Require owner attestation.
Publish only sandbox and low-risk certified assets.
Establish certification decision record.

Key artifacts:

taxonomy and risk tier policy.
capability card form.
initial certification workflow.
owner/steward directory.
basic entitlement register.

Exit criteria:

all production agent tools have owner, status and risk tier.
no Tier 2+ capability is published without data classification.
marketplace distinguishes sandbox from certified production capabilities.

Phase 2: Contract-First Certification

Outcomes:

Require OpenAPI/AsyncAPI/MCP manifests for callable tools.
Validate schemas and security metadata.
Connect tool/API versioning to marketplace records.
Define side-effect taxonomy and approval requirements.

Key artifacts:

OpenAPI/AsyncAPI lint rules.
MCP tool manifest checklist.
side-effect register.
compatibility and breaking-change policy.

Exit criteria:

production callable tools have versioned contracts.
write tools document idempotency, rollback or remediation path.
breaking changes trigger consumer impact workflow.

Phase 3: Evaluation and Provenance

Outcomes:

Create shared eval packs for prompt injection, data leakage, tool misuse and policy compliance.
Sign prompt bundles, tool packages and server artifacts.
Link eval runs and signed hashes to capability cards.

Key artifacts:

eval library.
signed release manifest.
build/release attestation.
result retention policy.

Exit criteria:

Tier 2+ capabilities have recent eval evidence.
Tier 3/4 capabilities reference signed production artifacts.
failed evals block or restrict publication.
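
Linking signed hashes to capability cards implies that something checks the deployed artifact against the recorded hash. The sketch below shows only the digest comparison, using Python's standard `hashlib` and `hmac` modules. The manifest key `artifact_hash` and the `sha256:` prefix format are assumptions, and verifying the signature over the manifest itself is out of scope here.

```python
import hashlib
import hmac


def sha256_digest(artifact_bytes: bytes) -> str:
    """Return the artifact digest in an assumed 'sha256:<hex>' format."""
    return "sha256:" + hashlib.sha256(artifact_bytes).hexdigest()


def matches_release_manifest(artifact_bytes: bytes, manifest: dict) -> bool:
    """Check that the artifact hashes to the value recorded in the release manifest."""
    expected = manifest["artifact_hash"]  # hypothetical manifest key
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_digest(artifact_bytes), expected)
```

A deployment step could refuse to route traffic to any artifact for which this check fails, which gives the "unsigned artifact in production path" KRI an enforcement point.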

Phase 4: Runtime Permission Enforcement

Outcomes:

Route production agentic invocation through gateway or equivalent control point.
Enforce scoped entitlements.
Log invocation ledger.
Add safe-stop, downgrade and denial policies.

Key artifacts:

runtime policy model.
token scope design.
invocation ledger schema.
approval and dual-control flows.
KRI dashboard.

Exit criteria:

Tier 3/4 tool calls cannot bypass enforcement.
entitlement scopes are visible and reviewable.
audit can connect marketplace card to runtime invocation.

Phase 5: Lifecycle, Audit and Optimization

Outcomes:

Implement renewal, exception, deprecation and exit processes.
Add consumer inventory and usage graph.
Conduct tabletop exercises.
Tune metrics as platform product.

Key artifacts:

renewal calendar.
exception register.
deprecation runbook.
consumer impact map.
audit evidence binder.

Exit criteria:

deprecated capabilities block new consumers.
exceptions expire and renew only with evidence.
audit can sample capabilities and trace from card to invocation ledger.

12. Checklists

12.1 Capability Owner Checklist

Check	Passing evidence
clear business purpose	card states process, users and value
accountable owner	named owner and support path
data boundary	data classes and fields documented
allowed/forbidden uses	concrete and enforceable
side effects classified	read, draft, write, send, financial, restrict
evaluation complete	eval run id and results attached
runtime telemetry defined	ledger fields and monitoring path
lifecycle plan	renewal, deprecation and replacement path

12.2 Tool/API Certification Checklist

Check	Passing evidence
versioned contract	OpenAPI/AsyncAPI/MCP manifest linked
schema validation	automated test results
auth scheme	scopes and security schemes documented
idempotency and rollback	write action behavior defined
parameter policy	allowlists, constraints and validation
error semantics	retry, partial failure and unsafe output handling defined
side-effect ledger	action result recorded
breaking-change policy	consumer notification and compatibility rules

12.3 Runtime Control Checklist

Check	Passing evidence
scoped entitlement	action/data/purpose/environment scopes
policy decision point	allow/deny/step-up logged
approval flow	approval packet attached where required
invocation ledger	trace includes agent, user, capability and side effect
limits	rate, volume, spend or transaction thresholds
safe-stop	capability can be disabled centrally
evidence export	audit and incident export tested
access review	periodic scope right-sizing

12.4 Deprecation Checklist

Check	Passing evidence
consumer inventory	list of agents and teams using capability
replacement path	mapped alternative or migration guide
new adoption blocked	marketplace status and policy enforce block
migration support	timeline, communications and test window
final shutdown	runtime token revocation and route block
evidence retention	ledger and certification evidence retained
post-shutdown monitoring	alerts for attempted calls
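
Two of the deprecation checks, consumer inventory and post-shutdown monitoring, can be derived from the invocation ledger. The sketch below assumes ledger entries are dicts with hypothetical keys `capability_id`, `team`, `agent_id` and `timestamp`. A real ledger schema will differ.

```python
from datetime import datetime


def consumer_inventory(ledger, capability_id: str) -> list:
    """Distinct (team, agent) pairs that have invoked the capability."""
    return sorted(
        {(e["team"], e["agent_id"]) for e in ledger if e["capability_id"] == capability_id}
    )


def post_sunset_callers(ledger, capability_id: str, sunset: datetime) -> list:
    """Agents that invoked the capability at or after the sunset time."""
    return sorted(
        {
            e["agent_id"]
            for e in ledger
            if e["capability_id"] == capability_id and e["timestamp"] >= sunset
        }
    )
```

The first function feeds the migration communication list. The second is the query behind an alert for attempted calls after shutdown.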

13. Metrics and KRIs

Platform product metrics:

Metric	Meaning
active certified capabilities	breadth of trusted reusable assets
certified reuse rate	reduction in duplicated shadow tools
median certification lead time	governance throughput
submission rework rate	quality of guidance and intake design
search-to-access conversion	marketplace discovery effectiveness
developer satisfaction	platform product usability
low-risk fast-lane completion time	ability to avoid over-governance

Governance metrics:

Metric	Meaning
card completeness rate	evidence and discovery readiness
runtime gateway coverage	enforceability of governance
signed artifact coverage	provenance maturity
eval evidence freshness	current risk knowledge
scope precision ratio	least-privilege quality
exception aging	governance debt
deprecation compliance	lifecycle discipline
audit sample pass rate	control effectiveness
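
The table names the scope precision ratio without defining it. One plausible definition, offered here as an assumption, is the share of granted scopes that were actually exercised during the review window.

```python
def scope_precision_ratio(granted, used) -> float:
    """Share of granted scopes that were actually used (assumed definition)."""
    granted_set = set(granted)
    if not granted_set:
        raise ValueError("no scopes granted; ratio is undefined")
    # Scopes used but never granted are a bypass signal, not part of this ratio.
    return len(granted_set & set(used)) / len(granted_set)
```

A low ratio suggests an entitlement is broader than the consumer needs and is a candidate for right-sizing at the next access review.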

KRIs:

KRI	Escalation signal
Tier 3/4 direct invocation outside gateway	immediate security and governance escalation
production capability without owner attestation	publication suspension
sensitive read without data steward approval	entitlement freeze
write tool without side-effect ledger	certification hold
unsigned artifact in production path	block new invocation
capability with a failed eval reused by multiple consumers	systemic risk review
deprecated capability invoked after sunset	executive escalation by risk tier
wildcard scopes granted to agents	security architecture review
evidence export failure	audit and incident readiness defect
owner attestation expired	hide from new marketplace search

14. Anti-Patterns

Anti-pattern	Why it fails	Replacement pattern
Treating marketplace as SharePoint links	no enforcement, no evidence, no runtime control	structured capability cards linked to gateway and ledger
Approving tools without side-effect classification	read and write risks are mixed	side-effect taxonomy and approval model
Allowing direct credentials	weak attribution and excessive access	scoped tokens, agent identity and delegated user context
Certifying once forever	prompts, models, APIs and data sources drift	renewal triggers and change-based recertification
Hiding risk tier to improve adoption	consumers misuse capabilities	visible tier, clear gates and fast low-risk path
One generic “AI approved” badge	hides data/tool/runtime differences	evidence-based certification by capability version
Manual evidence reconstruction	audit and incident response become slow	evidence-by-design invocation ledger
No consumer inventory	deprecation and incident blast radius unknown	entitlement and invocation graph
Eval only on happy path	adversarial and chain risks missed	OWASP-informed eval and tool misuse scenarios
Governance owned only by risk team	poor developer experience and shadow AI	marketplace product owner with risk partnership

15. Tabletop Scenarios

Scenario 1: Over-Permissioned Customer Lookup Tool

A Tier 2 customer lookup tool was certified for complaint handling,
but a sales agent uses it to personalize outbound offers.
The tool accepted a broad account scope because the entitlement request used a generic service account.

Expected response:

freeze broad entitlement.
inspect invocation ledger and affected customer set.
enforce purpose-bound scopes.
update entitlement workflow to require agent identity and purpose.
review whether capability card allowed uses were too vague.

Evidence:

entitlement record, token scope, invocation trace, capability card, data steward approval, affected customer query.

Scenario 2: Deprecated MCP Server Still Used

An old MCP server exposing document retrieval tools was marked deprecated,
but three agents continue calling it through cached configuration.
The replacement server has stricter ACLs and citation requirements.

Expected response:

block new and old route at gateway.
identify consuming agents.
compare old/new ACL and evidence behavior.
run migration tests.
update deprecation enforcement and post-shutdown alerts.

Evidence:

consumer inventory, invocation graph, route config, deprecation notice, replacement card, shutdown record.

Scenario 3: Prompt Injection Against Write Tool

A malicious document in a RAG corpus instructs an agent to update case dispositions
and suppress escalation language. The agent calls a Tier 3 case update tool.

Expected response:

downgrade affected agent to draft-only.
quarantine source document.
review prompt injection eval and source controls.
inspect tool action ledger.
tighten source trust and write-tool approval policy.

Evidence:

source manifest, prompt trace, tool parameters, approval packet, updated eval case, policy decision log.

Scenario 4: API Breaking Change Without Recertification

A tool provider changes an OpenAPI response field and error behavior.
Agents misinterpret partial failures as successful updates.

Expected response:

stop affected operation.
replay recent invocations.
require compatibility test and recertification.
update breaking-change gate.
notify consumers and remediate impacted records.

Evidence:

OpenAPI versions, schema test result, release notes, invocation ledger, affected record list, remediation log.

16. Evidence Binder for Audit

Audit-ready binder structure:

Binder section	Contents
Governance framework	taxonomy, risk tiers, certification policy, decision rights
Marketplace inventory	capability list by type, status, owner, risk tier
Sample capability cards	Tier 1-4 examples with evidence
Certification records	decision record, reviewer, conditions, renewal trigger
Contract evidence	OpenAPI/AsyncAPI/MCP manifests and validation output
Data approvals	data classification, purpose, field allowlist, steward approval
Security evidence	threat model, vulnerability scan, identity model
Eval evidence	test suites, run results, failures, accepted limitations
Provenance	signed artifacts, hashes, release manifests
Runtime enforcement	policy bindings, entitlement records, denial/step-up tests
Invocation ledger	sampled trace from card to runtime execution
Exceptions	risk acceptance, compensating controls, expiry
Deprecation	consumer inventory, migration, shutdown evidence
Metrics	adoption, coverage, exceptions, KRIs, incidents

Audit sample test:

Pick one Tier 3 capability.
Trace from capability card -> certification decision -> signed artifact -> entitlement
-> runtime policy -> invocation ledger -> side-effect record -> monitoring alert or metric
-> renewal/deprecation record.

Passing condition:

evidence exists in systems of record.
evidence is version-aligned.
runtime scope matches certified scope.
owner and consumer obligations are current.
exceptions are explicit and unexpired.
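
The audit sample test can be rehearsed before an audit by assembling the trace for one capability and checking it mechanically. The sketch below covers three of the passing conditions: evidence exists, versions align, and runtime scope stays within certified scope. The link names, the `version` and `scopes` keys, and the choice of which links carry a version are all assumptions about how the trace is assembled.

```python
# Hypothetical link names following the trace order in the audit sample test.
REQUIRED_LINKS = [
    "capability_card", "certification_decision", "signed_artifact", "entitlement",
    "runtime_policy", "invocation_ledger", "side_effect_record", "monitoring",
    "lifecycle_record",
]
# Assumption: these links each record the capability version they refer to.
VERSIONED_LINKS = ["capability_card", "certification_decision", "signed_artifact", "invocation_ledger"]


def audit_trace_findings(trace: dict) -> list:
    """Return findings for one sampled capability trace; empty list means pass."""
    findings = [f"missing evidence: {link}" for link in REQUIRED_LINKS if link not in trace]
    versions = {trace[link].get("version") for link in VERSIONED_LINKS if link in trace}
    if len(versions) > 1:
        findings.append("evidence is not version-aligned")
    certified = set(trace.get("certification_decision", {}).get("scopes", []))
    runtime = set(trace.get("runtime_policy", {}).get("scopes", []))
    if not runtime <= certified:
        findings.append("runtime scope exceeds certified scope")
    return findings
```

Whether owner obligations are current and exceptions unexpired depends on dates and registers outside the trace, so those two conditions are left to a separate check.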

17. Portfolio Deliverables

Deliverable	What it proves
Marketplace operating model	You can turn governance into a platform product
Capability taxonomy	You can distinguish agents, tools, prompts, MCP, APIs, events, RAG and evals
Risk tier model	You can set controls by data, side effect, autonomy and regulated workflow
Capability card template	You can unify discovery, certification, permissions and evidence
Tool/API certification gates	You can govern agent tools with contract-first and runtime thinking
Signed package provenance model	You can handle AI supply chain and auditability
Runtime permission model	You can carry least privilege down to agent invocation
Evidence pack	You can support audit, incident, certification and renewal
RACI	You can clearly define platform, risk, owner, consumer and audit decision rights
Roadmap	You can move from a baseline catalog to an enforceable marketplace
Metrics/KRIs	You can measure product adoption and governance effectiveness together
Tabletop scripts	You can verify that the system holds up under real agentic failures

Portfolio storyline:

I designed an internal AI agent marketplace as a governed platform product.
The key was not just cataloging tools, but certifying each reusable capability,
binding marketplace policy to runtime permissions,
tracking signed provenance and evaluation evidence,
and managing renewal, deprecation and auditability across the agent ecosystem.

18. Interview Answers

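Q1: Why does an enterprise need an AI agent marketplace?

30 seconds:

Because agents reuse tools, prompts, MCP servers, APIs and RAG assets. Without a marketplace, each team copies capabilities, bypasses review and uses broad credentials, which ends in a shadow AI supply chain. A mature marketplace unifies discovery, certification, permission, provenance, runtime evidence and lifecycle into a platform product.

Q2: How does a marketplace differ from an ordinary service catalog?

30 seconds:

An ordinary catalog focuses on service discovery and ownership. An agent marketplace focuses on composable agentic capability: whether data may be read, whether a tool may write, whether side effects are reversible, whether the agent is over-permissioned, whether prompt injection was tested, whether the package is signed, whether the runtime enforces scope, whether calls are auditable, and how expired capabilities exit.

Q3: What are the core gates of tool certification?

30 seconds:

I would set admission, contract/manifest, data permission, agentic risk, eval evidence, provenance, runtime enforcement, monitoring and renewal gates. OpenAPI/AsyncAPI/MCP contracts get schema and security review. Write tools get side-effect, approval, idempotency, rollback and ledger review.

Q4: Why does the capability card matter?

30 seconds:

The capability card is the marketplace's core product object. It not only helps teams discover capabilities but also carries risk tier, allowed/forbidden use, data boundary, permission scopes, eval evidence, signed package, runtime telemetry, owner attestation and lifecycle. It should serve consumers, risk reviewers, gateway policy and auditors at once.

Q5: How do you prevent agent over-permissioning?

30 seconds:

Replace broad service accounts with scoped entitlements. Scopes should be designed by action, data, purpose, identity, environment, time, volume and approval, and enforced at the runtime gateway. Also monitor scope granted vs scope used, and set KRIs for wildcard scopes, Tier 3/4 bypass and deprecated invocation.

Q6: How would you explain signed package provenance to executives?

30 seconds:

An agent capability is composed of prompts, tool wrappers, API specs, MCP servers, guardrail configs, RAG manifests and eval packs. Signed provenance lets the enterprise prove which approved version production actually invoked. Without that chain, the post-incident account rests on verbal explanation, and audit and regulatory response will be weak.

Q7: How do you keep governance from slowing innovation?

30 seconds:

Use a risk-tiered operating model. Tier 0/1 takes a fast lane and self-service sandbox, Tier 2 adds data approval, and Tier 3/4 adds stronger eval, human approval, dual control and runtime monitoring. At the same time, turn schema validation, eval packs, capability cards and the entitlement workflow into platform capabilities to cut duplicated cost across teams.

Q8: What are the most important success metrics for the marketplace?

30 seconds:

I would look at product adoption and control effectiveness together: certified reuse rate, certification lead time, search-to-access conversion, runtime gateway coverage, signed artifact coverage, eval freshness, scope precision, exception aging, deprecated invocation and audit sample pass rate.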

19. Final Operating Principle

The maturity of an AI agent marketplace can be tested with one question:

If an agent invokes a reusable capability in production today,
can the enterprise prove who owned it, why it was certified,
which signed version was used, what data and actions were allowed,
which runtime policy enforced the call, what evidence was logged,
and how consumers will migrate when the capability is no longer trusted?

If the answer is unclear, the enterprise does not lack reusable tools. It lacks an AI platform operating model that ties marketplace, certification, runtime permissions, provenance, evaluation evidence, owner accountability and lifecycle governance together.