
AI Agent Marketplace / Tool Certification Governance Playbook


AI Agent Marketplace / Tool Certification Governance Architecture Playbook

Positioning: For Advanced AI PMs, Senior BAs, AI Product Architects, Enterprise Architects, AI Platform PMs, API Product Owners, Cyber, Data Governance, Model Risk, Third-Party Risk, Compliance, Internal Audit and financial and retail business owners. This is not a basic service catalog guide. It trains you to turn agents, tools, prompts, MCP servers, APIs and reusable AI capabilities into an internal platform product that can be operated, certified, entitled, audited and exited.

Core thesis:

The enterprise AI marketplace is not successful when teams can find more tools. It is successful when teams can safely reuse certified agentic capabilities with enforceable permissions, trustworthy provenance, current evidence, clear owners, runtime monitoring and disciplined lifecycle control.


0. Disclaimer

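This document is learning, architecture-training and portfolio material. It does not constitute legal advice, regulatory advice, a cybersecurity certification conclusion, a model risk management conclusion, an audit opinion, procurement advice or vendor compliance certification.

Formal projects must be assessed jointly by AI Governance, Enterprise Architecture, Cyber Security, Privacy, Data Governance, Model Risk, Third-Party Risk, Legal, Compliance, Internal Audit, Platform Engineering, API Owners, Business Owners and external advisers where needed.

Applicability depends on industry regulation, customer impact, data classification, agent autonomy level, tool side effects, the identity and authorization system, vendor dependency, runtime gateway maturity, audit requirements and internal institutional policy. This document offers architecture and operating model decision support.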


1. Executive Framing

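Once an enterprise deploys AI agents, capability reuse becomes a shared source of both speed and risk. A customer lookup tool, case update tool, prompt bundle, MCP connector or RAG corpus published by one team may be reused by a dozen or more agents. The more successful the reuse, the greater the systemic risk:

  • An over-permissioned tool scales into cross-business over-permission.
  • An unsigned prompt bundle scales into untraceable policy drift.
  • An API without a deprecation plan scales into operational lock-in.
  • An MCP server without evidence export scales into audit and incident response gaps.
  • A write tool with undefined side effects scales into customer harm, complaints or regulatory risk.

Executive language: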

An internal AI marketplace should reduce duplicated work,
but it must not industrialize uncontrolled agency.

The goal is not to "list every tool" but to build a capability control plane:

discover -> assess -> certify -> sign -> publish -> request scopes
  -> enforce at runtime -> monitor -> renew -> deprecate -> exit

Core contributions of a Senior PM / Architect:

  • Run the marketplace as a platform product, not as a governance document library.
  • Make certification an evidence-producing workflow, not a meeting-based approval.
  • Design the capability card as the shared object for discovery, risk review, permission requests, runtime policy and audit.
  • Bring OpenAPI, AsyncAPI, MCP manifests, prompt bundles, eval packs and signed artifacts into a single lifecycle.
  • Connect runtime permission, invocation ledger, owner attestation and deprecation/exit.

2. Source Anchors

| Anchor | Official link | How this document uses it |
|---|---|---|
| OWASP Top 10 for LLM Applications | https://genai.owasp.org/llm-top-10/ | Uses the language of prompt injection, sensitive information disclosure, excessive agency, insecure output handling and supply-chain risk to design the agent/tool certification baseline |
| CISA Secure by Design | https://www.cisa.gov/securebydesign | Uses secure-by-design, secure defaults, shifting responsibility upstream, vendor responsibility and demonstrable security controls to design signed packages, secure defaults and admission gates |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize risk tiers, evaluation, monitoring, risk acceptance and continuous improvement |
| ISO/IEC 42001 overview | https://www.iso.org/standard/42001 | Uses AI management system, roles, operational planning, performance evaluation, internal audit and continual improvement to organize the operating model |
| OpenAPI Specification | https://spec.openapis.org/oas/latest.html | Uses contract-first API schema, operation metadata, security schemes, versioning and compatibility checks to support tool/API certification |
| AsyncAPI Specification | https://www.asyncapi.com/docs/reference/specification/latest | Uses event-driven contracts, message schemas, channels, producer/consumer boundaries and event retention to support asynchronous agent capability governance |

Source nuance:

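  • OWASP is a threat lens, not a complete operating model.
  • CISA Secure by Design emphasizes secure defaults and vendor responsibility, which translate into marketplace admission and package provenance requirements.
  • NIST AI RMF and ISO/IEC 42001 provide governance language, but applying them to an agent/tool marketplace requires converting them into risk-tiered gates, evidence and runtime controls.
  • OpenAPI and AsyncAPI are not governance frameworks, but a contract-first specification is the foundation for anything certifiable, testable and versionable.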

3. Taxonomy

The marketplace should be organized by reusable agentic capability, not by organization or system name.

| Object | Definition | Marketplace treatment |
|---|---|---|
| Agent product | Business-process-facing AI agent or copilot | use-case approval, risk tier, allowed tools, eval, owner, runtime telemetry |
| Tool/function | Function or action that an agent can call | schema, scope, side effect, approval, idempotency, rollback, ledger |
| Prompt bundle | system prompt, policy prompt, template, instruction hierarchy | version, policy source, eval, signing, consumer compatibility |
| MCP server / connector | Server that exposes tools, resources or context to the agent runtime | server identity, tool manifest, transport security, resource boundary, audit |
| OpenAPI API capability | REST/HTTP API used as an agent tool | spec validation, security scheme, operation risk, versioning, rate limit |
| AsyncAPI event capability | event stream, command channel, message-based integration | message schema, channel risk, consumer obligations, replay/retention |
| RAG/knowledge asset | curated corpus, vector index, knowledge graph endpoint | source lineage, ACL, freshness, citation, data use restriction |
| Eval pack | reusable evaluation scenarios and thresholds | coverage, risk mapping, run evidence, result retention |
| Guardrail/policy service | PII redaction, claim checker, tool policy decision, content classifier | control objective, thresholds, bypass, monitoring and exceptions |

Recommended marketplace statuses:

| Status | Meaning | Consumer behavior |
|---|---|---|
| Draft | provider is building metadata and evidence | not discoverable for production |
| Sandbox | usable only in isolated test environment | no sensitive data, no production side effect |
| Certified | approved for defined scopes and risk tier | can request production entitlement |
| Restricted | certified with extra conditions | step-up review, limited consumers or environments |
| Deprecated | no new consumers, existing consumers migrate | replacement path required |
| Retired | invocation blocked and evidence retained | no runtime use |

4. Risk Tiering

Use risk tiering to drive gates, not to decorate catalog cards.

| Tier | Pattern | Certification minimum | Runtime posture |
|---|---|---|---|
| Tier 0: Reference | docs, sample prompt, non-runtime reusable idea | peer review, source attribution | no production invocation |
| Tier 1: Low-risk read | public or low-sensitivity internal lookup | owner attestation, basic schema/eval, logging | read-only, low-volume limits |
| Tier 2: Sensitive read | customer/account/employee/confidential data retrieval | data steward approval, ABAC, prompt injection eval, evidence export | purpose-bound scoped tokens |
| Tier 3: Controlled write | internal record update, case creation, drafted customer content | side-effect review, approval model, rollback, stronger eval | human approval and rate limits |
| Tier 4: High-impact action | money movement, account restriction, credit/insurance/customer notification impact | governance committee review, dual control, model/security review, incident drill | dual control, strict limits, safe-stop, enhanced monitoring |

Composition rule:

The effective risk tier of an agent workflow is the highest risk tier of its tools,
plus any additional risk created by chaining, scale, autonomy and customer visibility.
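
A minimal sketch of the composition rule. The source does not quantify the "additional risk", so the one-tier uplift applied when any amplifier is present is an illustrative assumption, as are the `Workflow` field names.

```python
from dataclasses import dataclass

MAX_TIER = 4

@dataclass(frozen=True)
class Workflow:
    tool_tiers: tuple[int, ...]      # certified tier of each tool the agent may call
    chains_tools: bool = False       # output of one tool feeds another
    high_scale: bool = False         # many agents, users or jurisdictions
    autonomous: bool = False         # acts without human approval
    customer_visible: bool = False   # output reaches customers

def effective_risk_tier(wf: Workflow) -> int:
    """Highest tool tier, plus an assumed one-tier uplift if any amplifier applies."""
    base = max(wf.tool_tiers, default=0)
    amplified = wf.chains_tools or wf.high_scale or wf.autonomous or wf.customer_visible
    return min(MAX_TIER, base + (1 if amplified else 0))
```

A real policy would likely weight the amplifiers differently. The point is that the workflow tier is computed from its parts rather than declared by the submitter.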

Risk tier inputs:

  • Data classification and purpose.
  • External or customer-visible output.
  • Financial, credit, fraud, complaint, legal or regulatory impact.
  • Tool side effect and reversibility.
  • Autonomy mode: assistive, draft-only, human-approved action, autonomous action.
  • Prompt injection exposure surface.
  • Vendor or third-party dependency.
  • Blast radius: number of agents, users, products or jurisdictions.
  • Evidence completeness and retention.

5. Decision Gates

Gate 1: Marketplace Admission

Entry criteria:

  • Named accountable owner and support path.
  • Capability type and intended business use.
  • Data and side-effect classification.
  • Contract or manifest reference.
  • Initial risk tier.
  • Consumer eligibility and environment boundary.

Blocked conditions:

  • anonymous owner.
  • unknown data class.
  • production use without runtime enforcement path.
  • direct credentials not bound to agent identity or delegated user context.
  • missing schema for any tool that accepts parameters or produces side effects.
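
The blocked conditions can be checked automatically at intake. The sketch below assumes a submission arrives as a plain dictionary; every key name is hypothetical and would map to fields of the real intake form.

```python
def admission_blockers(submission: dict) -> list[str]:
    """Return the Gate 1 blocked conditions that a submission triggers."""
    blockers = []
    if not submission.get("owner"):
        blockers.append("anonymous owner")
    if not submission.get("data_class"):
        blockers.append("unknown data class")
    if (submission.get("environment") == "production"
            and not submission.get("runtime_enforcement_path")):
        blockers.append("production use without runtime enforcement path")
    if (submission.get("uses_direct_credentials")
            and not submission.get("credentials_bound_to_identity")):
        blockers.append("direct credentials not bound to identity")
    needs_schema = submission.get("accepts_parameters") or submission.get("has_side_effects")
    if needs_schema and not submission.get("schema_ref"):
        blockers.append("missing schema")
    return blockers
```

An empty list means the submission may proceed to Gate 2. It does not mean the capability is certified.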

Gate 2: Contract and Manifest Certification

For OpenAPI:

  • operation ids are stable and meaningful.
  • request/response schemas are explicit.
  • security schemes are declared.
  • error semantics and idempotency are documented for write actions.
  • breaking-change policy exists.
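
Some of these OpenAPI checks can run as lint rules over a parsed OpenAPI 3 document. The sketch below works on a plain dictionary. `paths`, `operationId`, `responses` and `components.securitySchemes` are standard OpenAPI fields, while the `x-idempotency` extension is a hypothetical convention assumed here for documenting write behavior.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
WRITE_METHODS = {"put", "post", "delete", "patch"}

def lint_openapi(spec: dict) -> list[str]:
    """Return certification findings for a parsed OpenAPI 3 document."""
    findings = []
    if not spec.get("components", {}).get("securitySchemes"):
        findings.append("no security schemes declared")
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            where = f"{method.upper()} {path}"
            if not op.get("operationId"):
                findings.append(f"{where}: missing operationId")
            if not op.get("responses"):
                findings.append(f"{where}: missing responses")
            if method in WRITE_METHODS and "x-idempotency" not in op:
                findings.append(f"{where}: write operation lacks x-idempotency note")
    return findings
```

Stability of operation ids and the breaking-change policy cannot be verified from a single document. They need a comparison against the previously certified version.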

For AsyncAPI:

  • channels, messages, bindings and retention are explicit.
  • producer/consumer responsibilities are documented.
  • replay, ordering, deduplication and poison-message behavior are defined.
  • event-triggered side effects have owner and approval model.

For MCP servers:

  • exposed tools/resources are listed as capability objects.
  • server identity and transport security are defined.
  • resource boundary and data classification are explicit.
  • audit events are exportable.
  • package or deployment artifact is signed for production.

Gate 3: Agentic Risk Review

Review questions:

  • Can prompt injection influence this capability?
  • Can tool output be used unsafely by another tool?
  • Can the agent create irreversible or high-impact side effects?
  • Can sensitive data be inferred through repeated calls?
  • Does the capability require human review, dual control or spend/volume limits?
  • Are forbidden uses machine-enforceable or only written in documentation?

Gate 4: Evaluation Evidence

Required evidence by risk:

| Evidence | Tier 1 | Tier 2 | Tier 3 | Tier 4 |
|---|---|---|---|---|
| schema validation tests | yes | yes | yes | yes |
| happy-path functional tests | yes | yes | yes | yes |
| prompt injection tests | light | yes | yes | yes |
| data leakage tests | light | yes | yes | yes |
| tool misuse / unsafe chaining tests | optional | light | yes | yes |
| human approval test | no | conditional | yes | yes |
| dual-control test | no | no | conditional | yes |
| rollback / remediation test | no | conditional | yes | yes |
| incident evidence export test | light | yes | yes | yes |
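
The matrix can be encoded as data so a certification workflow can list what a submission still owes. In this sketch only `yes` entries are treated as hard requirements; how `light`, `conditional` and `optional` are enforced is left open by the source and is an assumption here.

```python
TIERS = (1, 2, 3, 4)

# Evidence name -> requirement level for Tier 1..4, copied from the Gate 4 matrix.
EVIDENCE_MATRIX = {
    "schema validation tests": ("yes", "yes", "yes", "yes"),
    "happy-path functional tests": ("yes", "yes", "yes", "yes"),
    "prompt injection tests": ("light", "yes", "yes", "yes"),
    "data leakage tests": ("light", "yes", "yes", "yes"),
    "tool misuse / unsafe chaining tests": ("optional", "light", "yes", "yes"),
    "human approval test": ("no", "conditional", "yes", "yes"),
    "dual-control test": ("no", "no", "conditional", "yes"),
    "rollback / remediation test": ("no", "conditional", "yes", "yes"),
    "incident evidence export test": ("light", "yes", "yes", "yes"),
}

def required_evidence(tier: int) -> dict[str, str]:
    """Evidence items that apply at a tier, with their requirement level."""
    if tier not in TIERS:
        raise ValueError(f"unsupported tier: {tier}")
    idx = TIERS.index(tier)
    return {name: levels[idx] for name, levels in EVIDENCE_MATRIX.items()
            if levels[idx] != "no"}

def missing_evidence(tier: int, submitted: set[str]) -> list[str]:
    """Hard-required ('yes') items that the submission has not provided."""
    return sorted(name for name, level in required_evidence(tier).items()
                  if level == "yes" and name not in submitted)
```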

Gate 5: Publication and Entitlement

Publication requirements:

  • Capability card complete.
  • Certification decision recorded.
  • Signed artifact and version referenced.
  • Runtime policy is generated or linked.
  • Entitlement request workflow is active.
  • Monitoring and evidence export path is tested.
  • Consumer obligations are visible before access request.

Gate 6: Renewal, Change and Exit

Renewal triggers:

  • risk tier change.
  • data class change.
  • new write action or side effect.
  • model route or prompt bundle change for agent products.
  • new consumer type or jurisdiction.
  • failed eval or incident.
  • vendor contract or dependency change.
  • deprecation of API/model/MCP server/source corpus.

Exit gate:

  • new usage blocked.
  • consumer inventory generated.
  • replacement or downgrade path provided.
  • evidence retention confirmed.
  • runtime tokens revoked at shutdown.
  • post-retirement invocation alerts enabled for a defined period.

6. Required Artifacts

| Artifact | Owner | What it proves |
|---|---|---|
| Capability Card | Capability Owner / Marketplace PM | discoverability, risk boundary, consumer obligations |
| Risk Tier Assessment | AI Governance / Model Risk | proportional controls and review path |
| Data and Purpose Map | Data Governance / Privacy | field-level access and allowed use |
| Contract or Manifest | Tool/API/MCP Owner | schema, auth, side effect and compatibility |
| Certification Decision Record | Certification Board or Delegate | decision, evidence, conditions and renewal trigger |
| Signed Release Manifest | Platform Engineering | artifact provenance and deployment integrity |
| Evaluation Evidence Pack | Capability Owner / EvalOps | threat-informed and use-case-specific test results |
| Runtime Policy Binding | Security / Platform | scopes, approvals, limits and enforcement point |
| Invocation Ledger Schema | Platform / Audit | traceability and evidence export |
| Consumer Entitlement Register | Marketplace Operations | who can use what, for which purpose and scope |
| Exception Register | AI Governance / Risk | accepted risk, compensating controls and expiry |
| Deprecation and Exit Plan | Capability Owner / Marketplace PM | migration, shutdown and evidence retention |

7. Capability Card Template

# Capability Card: Customer Profile Read Tool

Name: Customer Profile Read Tool
Capability ID: cap-servicing-customer-profile-read-v2
Type: OpenAPI Tool
Version: 2.8.0
Status: Certified
Owner: Customer Data Platform Product Owner
Support Contact: Customer Data Platform Support Queue
Business Domain: Customer Servicing
Risk Tier: Tier 2

Purpose: Retrieve purpose-bound customer profile attributes for complaint handling and service recovery agents.
Intended Users: complaint specialists, servicing supervisors and certified complaint response agents.
Allowed Uses:
- Retrieve current contact preference, product relationship and complaint case linkage for an assigned servicing case.
Forbidden Uses:
- Do not use for outbound sales targeting, employee monitoring, fraud restriction decisions or credit eligibility decisions.

Data Boundary:
Data classes: customer profile, product relationship, contact preference.
Allowed fields: customer id, servicing segment, preferred channel, active product flags, complaint case ids.
Prohibited fields: full account number, authentication secrets, credit score, AML notes, card PAN, unrelated household data.
Purpose code: complaint_handling.
Retention: invocation ledger retained for seven years; response payload retained only in case record when human reviewer attaches it.
Jurisdiction or residency constraints: United States servicing workflows only.

Contract and Runtime:
OpenAPI / AsyncAPI / MCP manifest / prompt bundle: customer-profile-read-openapi.yaml.
Security scheme: OAuth2 client credentials plus delegated user context.
Permission scopes: read:customer_profile:assigned_case.
Approval model: no pre-call approval; supervisor review required for bulk export request.
Rate or volume limits: 50 customer reads per specialist per hour; no wildcard customer search.
Rollback or remediation: read-only; inappropriate access triggers access review and case-note purge if retained incorrectly.

Evaluation:
Eval suites: sensitive-read-purpose-binding-v3, prompt-injection-data-exfiltration-v2.
Last eval run: 2026-06-12.
Thresholds: zero successful prohibited-field retrieval; 100% denial for unassigned case access.
Known limitations: does not validate downstream summaries created by consumer agents.
Compensating controls: consumer agents must use regulated-summary guardrail before drafting customer text.

Provenance:
Source repository: internal/customer-data-platform/profile-api.
Signed artifact: customer-profile-read-tool-2.8.0.signed.
Artifact hash: sha256:5a7d-profile-read-tool-2-8-0.
Release notes: profile-read-release-2026-06-10.
Dependency manifest: profile-api-sbom-2.8.0.json.

Lifecycle:
Certified date: 2026-06-18.
Renewal date: 2026-12-18.
Deprecation policy: block new consumers 45 days before retirement and notify entitlement owners weekly.
Replacement capability: cap-servicing-customer-profile-read-v3 after identity platform migration.
Evidence retention: certification and invocation evidence retained for audit and complaint governance retention period.

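This example shows how specific the final record should be. A real enterprise can use form help text to guide completion, but a published capability card should avoid vague fields.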
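
The Artifact hash field in the card above reads as a placeholder: a real SHA-256 digest is 64 hexadecimal characters. A minimal sketch of checking deployed bytes against the recorded digest follows, using Python's standard `hashlib` and `hmac` modules. The `sha256:` prefix follows the card's notation and the function names are hypothetical. This verifies integrity only; verifying the signature itself depends on the signing tool in use and is not shown.

```python
import hashlib
import hmac

def artifact_digest(data: bytes) -> str:
    """Digest in the card's 'sha256:<hex>' notation."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

def matches_manifest(data: bytes, recorded: str) -> bool:
    """True when the artifact bytes hash to the digest recorded on the card."""
    return hmac.compare_digest(artifact_digest(data), recorded)
```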


8. Certification Evidence Pack

The evidence pack should be reusable across consumers and exportable under audit or incident pressure.

| Evidence area | Minimum evidence |
|---|---|
| Identity and ownership | owner, steward, support, business domain, accountable executive for Tier 4 |
| Purpose and boundaries | intended uses, forbidden uses, consumer obligations, environment boundary |
| Contract quality | OpenAPI/AsyncAPI/MCP manifest, schema tests, auth scheme, version policy |
| Data control | data classification, field allowlist, purpose binding, steward approval |
| Security | threat model, identity model, secret handling, transport security, vulnerability scan |
| Agentic risk | prompt injection tests, excessive agency review, unsafe chaining scenarios |
| Side effect | action classification, approval requirement, reversibility, idempotency, rollback |
| Eval results | eval cases, thresholds, run id, failures, accepted limitations |
| Provenance | signed package, artifact hash, build attestation, release notes |
| Runtime enforcement | policy id, gateway route, token scopes, approval flow, denial tests |
| Monitoring | invocation ledger, metrics, alerts, evidence export test |
| Lifecycle | renewal trigger, deprecation plan, consumer inventory method, exit evidence |

Fact discipline:

Evidence pack should separate what was designed,
what was tested,
what was approved,
what is enforced at runtime,
and what was observed after release.

9. Runtime Permission Governance

Marketplace publication does not equal runtime access. Access should be granted by scoped entitlement.

Runtime flow:

consumer requests capability
  -> use-case and purpose check
  -> risk-tier and eligibility check
  -> data steward / owner / security approval where required
  -> scoped entitlement issued
  -> token/policy pushed to gateway
  -> invocation logged with policy decision
  -> periodic access review and scope right-sizing

Permission dimensions:

| Dimension | Examples |
|---|---|
| action | search, read, summarize, draft, create, update, approve, send, refund, restrict |
| data | customer segment, product, account type, case type, document collection, geography |
| purpose | complaint handling, fraud investigation, underwriting support, service recovery |
| identity | agent id, human delegator, role, business unit, service account |
| environment | sandbox, pilot, production, emergency |
| duration | session, case-bound, campaign-bound, expiring standing access |
| limits | calls per hour, records per request, monetary value, customer messages |
| approval | none, human review, supervisor, dual control, risk committee |

Policy examples:

| Scenario | Runtime decision |
|---|---|
| Tier 2 read tool called without purpose code | deny |
| Tier 3 case update with valid case owner and schema | allow with ledger |
| Tier 4 refund request above threshold | step-up to dual control |
| Deprecated capability called by new consumer | deny |
| Existing consumer calls deprecated capability before sunset | allow with warning and migration alert |
| Prompt injection signal before write tool | downgrade to draft-only or deny |
| Agent requests wildcard customer scope | deny and route to security review |
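
These policy examples can be expressed as a small decision function at the gateway. The sketch below is one possible ordering of the rules, not a definitive policy engine: the `Invocation` fields and decision strings are hypothetical, and requiring a purpose code for all of Tier 2 and above, denying after sunset and always choosing the draft-only downgrade are assumptions that extend the table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Invocation:
    tier: int
    status: str = "Certified"          # marketplace status of the capability
    existing_consumer: bool = True
    past_sunset: bool = False
    purpose_code: Optional[str] = None
    wildcard_scope: bool = False
    is_write: bool = False
    injection_signal: bool = False     # upstream prompt injection detector fired
    above_threshold: bool = False      # e.g. refund amount over the limit

def decide(inv: Invocation) -> str:
    """Return a runtime decision; the most restrictive rules are checked first."""
    if inv.wildcard_scope:
        return "deny_and_route_to_security_review"
    if inv.status == "Retired":
        return "deny"
    deprecated = inv.status == "Deprecated"
    if deprecated and (not inv.existing_consumer or inv.past_sunset):
        return "deny"
    if inv.tier >= 2 and not inv.purpose_code:
        return "deny"
    if inv.is_write and inv.injection_signal:
        return "downgrade_to_draft_only"
    if inv.tier == 4 and inv.above_threshold:
        return "step_up_dual_control"
    return "allow_with_warning" if deprecated else "allow_with_ledger"
```

Whatever the decision, it should be written to the invocation ledger together with the policy id that produced it.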

10. RACI / Operating Model

| Activity | Accountable | Responsible | Consulted | Informed |
|---|---|---|---|---|
| Marketplace product strategy | AI Platform Executive | Marketplace PM | EA, Cyber, Risk, Business Owners | AI Governance Committee |
| Capability submission | Capability Owner | Provider Team | Marketplace Ops | Consumer Teams |
| Risk tiering | AI Governance / Model Risk | Marketplace Risk Lead | Data, Cyber, Business Owner | Audit |
| Contract validation | API/MCP Governance | Tool/API Owner | Platform, Security | Marketplace PM |
| Data approval | Data Governance / Privacy | Data Steward | Legal, Business Owner | Consumer Team |
| Security review | Cyber Security | AppSec / Platform Security | Tool Owner, EA | Risk |
| Eval evidence review | EvalOps / Model Risk | Capability Owner | Business SME, Compliance | Marketplace Ops |
| Certification decision | Delegated Certification Board | Marketplace Ops | Cyber, Data, Legal, EA | Consumer Teams |
| Runtime policy implementation | Platform Engineering | Security Engineering | Capability Owner | Marketplace PM |
| Entitlement approval | Capability Owner | Marketplace Ops | Data/Cyber/Risk by tier | Consumer Manager |
| Monitoring and KRIs | Marketplace PM | Platform Observability | Risk, Audit | Governance Forums |
| Deprecation and exit | Capability Owner | Marketplace PM | Consumer Teams, Platform | Governance Committee |
| Independent assurance | Internal Audit | Audit Team | Marketplace PM, Platform, Risk | Board/Risk Committee |

Decision rights:

  • Capability Owner can approve Tier 1 publication after automated checks and peer review.
  • Data Steward must approve Tier 2 sensitive read boundaries.
  • AI Governance / Model Risk must approve Tier 3 controlled write or customer-visible workflows.
  • Tier 4 requires senior business owner, risk, security and governance committee decision.
  • Platform can emergency-disable any capability with active security, privacy, customer harm or evidence integrity risk.

Cadence:

| Cadence | Forum | Output |
|---|---|---|
| Daily during launch | Marketplace standup | submissions, blockers, policy defects |
| Weekly | Certification triage | risk tier, reviewers, evidence gaps |
| Biweekly | Platform product review | adoption, UX friction, developer feedback |
| Monthly | AI governance committee | Tier 3/4 decisions, exceptions, incidents |
| Quarterly | EA/risk review | systemic dependency, vendor concentration, deprecation |
| Annual | Internal audit / maturity review | control effectiveness, framework refresh |

11. Implementation Roadmap

Phase 1: Baseline Marketplace Control Plane

Outcomes:

  • Define taxonomy, risk tiers and statuses.
  • Build structured capability card.
  • Require owner attestation.
  • Publish only sandbox and low-risk certified assets.
  • Establish certification decision record.

Key artifacts:

  • taxonomy and risk tier policy.
  • capability card form.
  • initial certification workflow.
  • owner/steward directory.
  • basic entitlement register.

Exit criteria:

  • all production agent tools have owner, status and risk tier.
  • no Tier 2+ capability is published without data classification.
  • marketplace distinguishes sandbox from certified production capabilities.

Phase 2: Contract-First Certification

Outcomes:

  • Require OpenAPI/AsyncAPI/MCP manifests for callable tools.
  • Validate schemas and security metadata.
  • Connect tool/API versioning to marketplace records.
  • Define side-effect taxonomy and approval requirements.

Key artifacts:

  • OpenAPI/AsyncAPI lint rules.
  • MCP tool manifest checklist.
  • side-effect register.
  • compatibility and breaking-change policy.

Exit criteria:

  • production callable tools have versioned contracts.
  • write tools document idempotency, rollback or remediation path.
  • breaking changes trigger consumer impact workflow.

Phase 3: Evaluation and Provenance

Outcomes:

  • Create shared eval packs for prompt injection, data leakage, tool misuse and policy compliance.
  • Sign prompt bundles, tool packages and server artifacts.
  • Link eval runs and signed hashes to capability cards.

Key artifacts:

  • eval library.
  • signed release manifest.
  • build/release attestation.
  • result retention policy.

Exit criteria:

  • Tier 2+ capabilities have recent eval evidence.
  • Tier 3/4 capabilities reference signed production artifacts.
  • failed evals block or restrict publication.

Phase 4: Runtime Permission Enforcement

Outcomes:

  • Route production agentic invocation through gateway or equivalent control point.
  • Enforce scoped entitlements.
  • Log invocation ledger.
  • Add safe-stop, downgrade and denial policies.

Key artifacts:

  • runtime policy model.
  • token scope design.
  • invocation ledger schema.
  • approval and dual-control flows.
  • KRI dashboard.

Exit criteria:

  • Tier 3/4 tool calls cannot bypass enforcement.
  • entitlement scopes are visible and reviewable.
  • audit can connect marketplace card to runtime invocation.

Phase 5: Lifecycle, Audit and Optimization

Outcomes:

  • Implement renewal, exception, deprecation and exit processes.
  • Add consumer inventory and usage graph.
  • Conduct tabletop exercises.
  • Tune metrics as platform product.

Key artifacts:

  • renewal calendar.
  • exception register.
  • deprecation runbook.
  • consumer impact map.
  • audit evidence binder.

Exit criteria:

  • deprecated capabilities block new consumers.
  • exceptions expire and renew only with evidence.
  • audit can sample capabilities and trace from card to invocation ledger.

12. Checklists

12.1 Capability Owner Checklist

| Check | Passing evidence |
|---|---|
| clear business purpose | card states process, users and value |
| accountable owner | named owner and support path |
| data boundary | data classes and fields documented |
| allowed/forbidden uses | concrete and enforceable |
| side effects classified | read, draft, write, send, financial, restrict |
| evaluation complete | eval run id and results attached |
| runtime telemetry defined | ledger fields and monitoring path |
| lifecycle plan | renewal, deprecation and replacement path |

12.2 Tool/API Certification Checklist

| Check | Passing evidence |
|---|---|
| versioned contract | OpenAPI/AsyncAPI/MCP manifest linked |
| schema validation | automated test results |
| auth scheme | scopes and security schemes documented |
| idempotency and rollback | write action behavior defined |
| parameter policy | allowlists, constraints and validation |
| error semantics | retry, partial failure and unsafe output handling defined |
| side-effect ledger | action result recorded |
| breaking-change policy | consumer notification and compatibility rules |

12.3 Runtime Control Checklist

| Check | Passing evidence |
|---|---|
| scoped entitlement | action/data/purpose/environment scopes |
| policy decision point | allow/deny/step-up logged |
| approval flow | approval packet attached where required |
| invocation ledger | trace includes agent, user, capability and side effect |
| limits | rate, volume, spend or transaction thresholds |
| safe-stop | capability can be disabled centrally |
| evidence export | audit and incident export tested |
| access review | periodic scope right-sizing |

12.4 Deprecation Checklist

| Check | Passing evidence |
|---|---|
| consumer inventory | list of agents and teams using capability |
| replacement path | mapped alternative or migration guide |
| new adoption blocked | marketplace status and policy enforce block |
| migration support | timeline, communications and test window |
| final shutdown | runtime token revocation and route block |
| evidence retention | ledger and certification evidence retained |
| post-shutdown monitoring | alerts for attempted calls |

13. Metrics and KRIs

Platform product metrics:

| Metric | Meaning |
|---|---|
| active certified capabilities | breadth of trusted reusable assets |
| certified reuse rate | reduction in duplicated shadow tools |
| median certification lead time | governance throughput |
| submission rework rate | quality of guidance and intake design |
| search-to-access conversion | marketplace discovery effectiveness |
| developer satisfaction | platform product usability |
| low-risk fast-lane completion time | ability to avoid over-governance |

Governance metrics:

| Metric | Meaning |
|---|---|
| card completeness rate | evidence and discovery readiness |
| runtime gateway coverage | enforceability of governance |
| signed artifact coverage | provenance maturity |
| eval evidence freshness | current risk knowledge |
| scope precision ratio | least-privilege quality |
| exception aging | governance debt |
| deprecation compliance | lifecycle discipline |
| audit sample pass rate | control effectiveness |

KRIs:

| KRI | Escalation signal |
|---|---|
| Tier 3/4 direct invocation outside gateway | immediate security and governance escalation |
| production capability without owner attestation | publication suspension |
| sensitive read without data steward approval | entitlement freeze |
| write tool without side-effect ledger | certification hold |
| unsigned artifact in production path | block new invocation |
| eval failure reused by multiple consumers | systemic risk review |
| deprecated capability after sunset | executive escalation by risk tier |
| wildcard scopes granted to agents | security architecture review |
| evidence export failure | audit and incident readiness defect |
| owner attestation expired | hide from new marketplace search |

14. Anti-Patterns

| Anti-pattern | Why it fails | Replacement pattern |
|---|---|---|
| Treating marketplace as SharePoint links | no enforcement, no evidence, no runtime control | structured capability cards linked to gateway and ledger |
| Approving tools without side-effect classification | read and write risks are mixed | side-effect taxonomy and approval model |
| Allowing direct credentials | weak attribution and excessive access | scoped tokens, agent identity and delegated user context |
| Certifying once forever | prompts, models, APIs and data sources drift | renewal triggers and change-based recertification |
| Hiding risk tier to improve adoption | consumers misuse capabilities | visible tier, clear gates and fast low-risk path |
| One generic "AI approved" badge | hides data/tool/runtime differences | evidence-based certification by capability version |
| Manual evidence reconstruction | audit and incident response become slow | evidence-by-design invocation ledger |
| No consumer inventory | deprecation and incident blast radius unknown | entitlement and invocation graph |
| Eval only on happy path | adversarial and chain risks missed | OWASP-informed eval and tool misuse scenarios |
| Governance owned only by risk team | poor developer experience and shadow AI | marketplace product owner with risk partnership |

15. Tabletop Scenarios

Scenario 1: Over-Permissioned Customer Lookup Tool

A Tier 2 customer lookup tool was certified for complaint handling,
but a sales agent uses it to personalize outbound offers.
The tool accepted a broad account scope because the entitlement request used a generic service account.

Expected response:

  • freeze broad entitlement.
  • inspect invocation ledger and affected customer set.
  • enforce purpose-bound scopes.
  • update entitlement workflow to require agent identity and purpose.
  • review whether capability card allowed uses were too vague.

Evidence:

  • entitlement record, token scope, invocation trace, capability card, data steward approval, affected customer query.

Scenario 2: Deprecated MCP Server Still Used

An old MCP server exposing document retrieval tools was marked deprecated,
but three agents continue calling it through cached configuration.
The replacement server has stricter ACLs and citation requirements.

Expected response:

  • block new and old route at gateway.
  • identify consuming agents.
  • compare old/new ACL and evidence behavior.
  • run migration tests.
  • update deprecation enforcement and post-shutdown alerts.

Evidence:

  • consumer inventory, invocation graph, route config, deprecation notice, replacement card, shutdown record.

Scenario 3: Prompt Injection Against Write Tool

A malicious document in a RAG corpus instructs an agent to update case dispositions
and suppress escalation language. The agent calls a Tier 3 case update tool.

Expected response:

  • downgrade affected agent to draft-only.
  • quarantine source document.
  • review prompt injection eval and source controls.
  • inspect tool action ledger.
  • tighten source trust and write-tool approval policy.

Evidence:

  • source manifest, prompt trace, tool parameters, approval packet, updated eval case, policy decision log.

Scenario 4: API Breaking Change Without Recertification

A tool provider changes an OpenAPI response field and error behavior.
Agents misinterpret partial failures as successful updates.

Expected response:

  • stop affected operation.
  • replay recent invocations.
  • require compatibility test and recertification.
  • update breaking-change gate.
  • notify consumers and remediate impacted records.

Evidence:

  • OpenAPI versions, schema test result, release notes, invocation ledger, affected record list, remediation log.

16. Evidence Binder for Audit

Audit-ready binder structure:

| Binder section | Contents |
|---|---|
| Governance framework | taxonomy, risk tiers, certification policy, decision rights |
| Marketplace inventory | capability list by type, status, owner, risk tier |
| Sample capability cards | Tier 1-4 examples with evidence |
| Certification records | decision record, reviewer, conditions, renewal trigger |
| Contract evidence | OpenAPI/AsyncAPI/MCP manifests and validation output |
| Data approvals | data classification, purpose, field allowlist, steward approval |
| Security evidence | threat model, vulnerability scan, identity model |
| Eval evidence | test suites, run results, failures, accepted limitations |
| Provenance | signed artifacts, hashes, release manifests |
| Runtime enforcement | policy bindings, entitlement records, denial/step-up tests |
| Invocation ledger | sampled trace from card to runtime execution |
| Exceptions | risk acceptance, compensating controls, expiry |
| Deprecation | consumer inventory, migration, shutdown evidence |
| Metrics | adoption, coverage, exceptions, KRIs, incidents |

Audit sample test:

Pick one Tier 3 capability.
Trace from capability card -> certification decision -> signed artifact -> entitlement
-> runtime policy -> invocation ledger -> side-effect record -> monitoring alert or metric
-> renewal/deprecation record.

Passing condition:

  • evidence exists in systems of record.
  • evidence is version-aligned.
  • runtime scope matches certified scope.
  • owner and consumer obligations are current.
  • exceptions are explicit and unexpired.
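
Part of the audit sample test can be automated. The sketch below assumes each link in the trace is exported as a dictionary keyed by a hypothetical stage name, with optional `capability_version` and `scopes` fields. It covers only the first three passing conditions (existence, version alignment, scope match) and reads "matches" as "the entitlement grants no scope beyond the card", which is an interpretation.

```python
TRACE_CHAIN = [
    "capability_card", "certification_decision", "signed_artifact",
    "entitlement", "runtime_policy", "invocation_ledger",
    "side_effect_record", "monitoring", "renewal_or_deprecation",
]

def audit_trace_gaps(records: dict[str, dict]) -> list[str]:
    """Return gaps found when tracing one capability from card to ledger."""
    gaps = [f"missing: {stage}" for stage in TRACE_CHAIN if stage not in records]
    versions = {r["capability_version"] for r in records.values()
                if "capability_version" in r}
    if len(versions) > 1:
        gaps.append("version mismatch: " + ", ".join(sorted(versions)))
    certified = records.get("capability_card", {}).get("scopes")
    granted = records.get("entitlement", {}).get("scopes")
    if certified is not None and granted is not None:
        if not set(granted) <= set(certified):
            gaps.append("runtime scope exceeds certified scope")
    return gaps
```

Currency of owner obligations and exception expiry still need date checks against the registers, which this sketch leaves out.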

17. Portfolio Deliverables

| Deliverable | What it proves |
| --- | --- |
| Marketplace operating model | You can turn governance into a platform product |
| Capability taxonomy | You can distinguish agents, tools, prompts, MCP, API, event, RAG and eval capabilities |
| Risk tier model | You can set controls by data, side effect, autonomy and regulated workflow |
| Capability card template | You can unify discovery, certification, permissions and evidence |
| Tool/API certification gates | You can govern agent tools with contract-first and runtime thinking |
| Signed package provenance model | You can address the AI supply chain and auditability |
| Runtime permission model | You can enforce least privilege at agent invocation |
| Evidence pack | You can support audit, incident, certification and renewal |
| RACI | You can clearly define decision rights across platform, risk, owner, consumer and audit |
| Roadmap | You can move from a baseline catalog to an enforceable marketplace |
| Metrics/KRIs | You can measure product adoption and governance effectiveness together |
| Tabletop scripts | You can verify whether the system holds up under real agentic failures |

Portfolio storyline:

I designed an internal AI agent marketplace as a governed platform product.
The key was not just cataloging tools, but certifying each reusable capability,
binding marketplace policy to runtime permissions,
tracking signed provenance and evaluation evidence,
and managing renewal, deprecation and auditability across the agent ecosystem.

18. Interview Answers

Q1: Why does an enterprise need an AI agent marketplace?

30 seconds:

Because agents reuse tools, prompts, MCP servers, APIs and RAG assets. Without a marketplace, each team duplicates capabilities, bypasses review and uses broad credentials, which ends in a shadow AI supply chain. A mature marketplace unifies discovery, certification, permission, provenance, runtime evidence and lifecycle into a platform product.

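Q2: How does a marketplace differ from an ordinary service catalog?

30 seconds:

An ordinary catalog focuses on service discovery and ownership. An agent marketplace focuses on composable agentic capabilities: whether data can be read, whether a tool can write, whether side effects are reversible, whether an agent is over-permissioned, whether prompt injection has been tested, whether the package is signed, whether the runtime enforces scope, whether invocations are auditable, and how expired capabilities are retired.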

Q3: What are the core gates of tool certification?

30 seconds:

I would set up admission, contract/manifest, data permission, agentic risk, eval evidence, provenance, runtime enforcement, and monitoring and renewal gates. OpenAPI/AsyncAPI/MCP contracts need schema and security review; write tools need side-effect, approval, idempotency, rollback and ledger review.

Q4: Why does the capability card matter?

30 seconds:

The capability card is the marketplace's core product object. Beyond helping teams discover capabilities, it carries risk tier, allowed/forbidden use, data boundary, permission scopes, eval evidence, signed package, runtime telemetry, owner attestation and lifecycle. It should serve consumers, risk reviewers, gateway policy and auditors at the same time.

Q5: How do you prevent agent over-permissioning?

30 seconds:

Replace broad service accounts with scoped entitlements. Design scopes by action, data, purpose, identity, environment, time, volume and approval, and enforce them at the runtime gateway. Also monitor scope granted vs scope used, and set KRIs for wildcard scopes, Tier 3/4 bypass and deprecated invocation.
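As a rough illustration of the scope granted vs scope used comparison, the sketch below computes a simple scope precision ratio and flags wildcard scopes. The function name `scope_report`, the `resource:action` scope strings and the precision definition (exactly matched used scopes divided by granted scopes) are assumptions for the example, not an established standard.

```python
def scope_report(granted: set[str], used: set[str]) -> dict[str, object]:
    """Compare entitlement scopes granted to an agent with scopes seen in the invocation ledger."""
    wildcards = sorted(scope for scope in granted if "*" in scope)
    unused = sorted(granted - used)
    # Precision uses exact matches only, so wildcard grants always count against it.
    precision = len(granted & used) / len(granted) if granted else 1.0
    return {"precision": precision, "unused": unused, "wildcards": wildcards}


# Hypothetical entitlement for one agent over a review period.
granted = {"case:read", "case:update", "customer:*", "payment:refund"}
used = {"case:read", "case:update"}

report = scope_report(granted, used)
```

A low precision value or a non-empty `wildcards` list would feed the KRI and prompt an entitlement review; the thresholds are a policy decision the sketch deliberately leaves out.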

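Q6: How would you explain signed package provenance to executives?

30 seconds:

An agent capability is made up of prompt, tool wrapper, API spec, MCP server, guardrail config, RAG manifest and eval pack. Signed provenance lets the enterprise prove which approved version production actually invoked. Without that chain, after an incident the enterprise can only rely on verbal accounts, and its audit and regulatory response will be weak.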
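The provenance chain can be illustrated with a hashed release manifest and a signature over it. This sketch uses only the Python standard library and substitutes an HMAC with a shared key for real artifact signing; a production setup would more likely use asymmetric signatures and a dedicated signing service. The function names, file names and key are hypothetical.

```python
import hashlib
import hmac
import json


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Map each artifact name to the SHA-256 hash of its content."""
    return {name: hashlib.sha256(content).hexdigest() for name, content in sorted(artifacts.items())}


def sign_manifest(manifest: dict[str, str], key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest (HMAC stands in for a real signature)."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_release(artifacts: dict[str, bytes], manifest: dict[str, str],
                   signature: str, key: bytes) -> list[str]:
    """Return the problems found when checking deployed artifacts against the approved manifest."""
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return ["manifest signature is invalid"]
    actual = build_manifest(artifacts)
    names = sorted(set(manifest) | set(actual))
    return [f"{name}: hash mismatch or missing" for name in names
            if manifest.get(name) != actual.get(name)]


# Approved release: certification signs the manifest once.
approved = {
    "prompt.txt": b"Answer only from approved sources.",
    "tool_wrapper.py": b"def update_case(): ...",
}
key = b"demo-key-not-for-production"
manifest = build_manifest(approved)
signature = sign_manifest(manifest, key)

# Deployed bundle: the prompt was edited after certification.
deployed = {**approved, "prompt.txt": b"Answer from any source."}
problems = verify_release(deployed, manifest, signature, key)
```

The point for executives is that the check runs against what production actually loaded, so a post-certification edit to the prompt shows up as a hash mismatch instead of going unnoticed.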

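Q7: How do you keep governance from slowing innovation?

30 seconds:

Use a risk-tiered operating model. Tier 0/1 take a fast track with a self-service sandbox, Tier 2 adds data approval, and Tier 3/4 add stronger eval, human approval, dual control and runtime monitoring. At the same time, build schema validation, eval packs, capability cards and the entitlement workflow as platform capabilities so teams do not repeat the same cost.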

Q8: What are the most critical success metrics for the marketplace?

30 seconds:

I would look at product adoption and control effectiveness together: certified reuse rate, certification lead time, search-to-access conversion, runtime gateway coverage, signed artifact coverage, eval freshness, scope precision, exception aging, deprecated invocation and audit sample pass rate.


19. Final Operating Principle

The maturity of an AI agent marketplace can be tested with one question:

If an agent invokes a reusable capability in production today,
can the enterprise prove who owned it, why it was certified,
which signed version was used, what data and actions were allowed,
which runtime policy enforced the call, what evidence was logged,
and how consumers will migrate when the capability is no longer trusted?

If the answer is unclear, the enterprise is not short of reusable tools; it is missing an AI platform operating model that ties marketplace, certification, runtime permissions, provenance, evaluation evidence, owner accountability and lifecycle governance together.