Tool Design——10 个金融工具的 schema、错误处理、parallel calls
Anthropic tool 文档 + agent SDK 工具设计原则;为什么"tool description 决定 agent 80% 行为";parallel tool calls;retry/idempotency
日期: 2026-09-30 方向: AI系统工程 / Agent 阶段: Phase 3 - Agent架构与多Agent (Day 149-162) 标签: #ToolDesign #JSONSchema #ParallelToolUse #ErrorHandling
今日目标
| 类型 | 内容 |
|---|---|
| 学习 | Anthropic tool 文档 + agent SDK 工具设计原则;为什么"tool description 决定 agent 80% 行为";parallel tool calls;retry/idempotency |
| 实操 | 设计 10 个金融分析工具,含完整 JSON schema + 错误分类 + parallel 用法 |
| 产出 | tools.py(约 600 行)+ 工具卡片表 |
一、为什么 Tool Design 是 Agent 工程的核心
1.1 数据
Anthropic 在 "Building Effective Agents" 中明确:
"Tool documentation often gets less attention than the overall prompt, despite its critical role. Spend as much time on tool descriptions as on system prompts."
实测中,优化 tool description 通常比改 system prompt 收益更大:
- Tool 选错(应该 search 却 fetch)
- 参数错(传错 ticker 大小写)
- 不知道何时停(不停地反复调)
1.2 设计原则(Anthropic 推荐)
- 写给"实习生"看:description 要让一个不熟悉这个 API 的人立刻知道怎么用
- 给例子:description 里包含 1-2 个典型用法
- 明确边界:什么情况下不要用这个 tool
- 错误格式可读:返回的错误信息要让 LLM 能据此 self-correct
- input_schema 严格:用 JSON Schema 的 enum / pattern / minimum 等约束
- 返回值结构化:JSON > 自然语言(除非需要叙事)
- 副作用警告:写链 / 发邮件 / 转账类工具明确标注 destructive
1.3 反模式
| 反模式 | 为什么坏 |
|---|---|
Tool 名 do_thing | LLM 选不出来 |
| Description 一行话 "queries database" | 不知道查什么 |
输入 query: string 自由文本 | LLM 经常生成无效 SQL |
| 返回 raw HTML | 浪费 token,LLM 看不清结构 |
| 错误统一返回 "error" | 无法区分该 retry 还是该停 |
| 一个 tool 干 5 件事 | LLM 选错率高 |
二、架构图——Tool 调用全链路
LLM ──► tool_use { name, input }
│
▼
┌───────────────────────────────────────┐
│ Validator (jsonschema) │
│ - shape 错 → tool_result(is_error) │
│ - 让 LLM 修正 │
└────────┬──────────────────────────────┘
│ valid
▼
┌───────────────────────────────────────┐
│ Idempotency check │
│ - hash(name+input) 已见过? │
│ - 是 → 返回 cached result │
└────────┬──────────────────────────────┘
│ not cached
▼
┌───────────────────────────────────────┐
│ Execute (with timeout + retry) │
│ - transient error → retry × 2 │
│ - 4xx → fail fast │
│ - 5xx / timeout → exponential back │
└────────┬──────────────────────────────┘
│
▼
┌───────────────────────────────────────┐
│ Result shaping │
│ - JSON ≤ 2KB → 直接返回 │
│ - 大 → 写 file, 返回 file_id+summary │
└────────┬──────────────────────────────┘
▼
tool_result { content, is_error }
三、代码——tools.py
# tools.py
"""
Day 152 - 10 financial analysis tools with full schemas.
Design principles:
1. Each tool does ONE thing well
2. Description tells LLM when to use AND when NOT to use
3. input_schema uses enum/pattern/minimum for type safety
4. Errors are categorized: USER_INPUT_ERROR / TRANSIENT / FATAL
5. Idempotent reads are cached
6. Destructive tools have a "confirm" param
"""
from __future__ import annotations
import hashlib
import json
import time
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable
import jsonschema
# ====================================================================
# Error model
# ====================================================================
class ToolError(Exception):
KIND_USER = "USER_INPUT_ERROR" # LLM should fix args and retry
KIND_TRANSIENT = "TRANSIENT_ERROR" # Retry with backoff
KIND_FATAL = "FATAL_ERROR" # Don't retry, escalate
def __init__(self, kind: str, message: str, hint: str = ""):
self.kind = kind
self.message = message
self.hint = hint
super().__init__(f"[{kind}] {message}" + (f" | hint: {hint}" if hint else ""))
def to_tool_result(self) -> dict:
return {
"error_kind": self.kind,
"error_message": self.message,
"hint": self.hint,
}
# ====================================================================
# Tool registry framework
# ====================================================================
@dataclass
class Tool:
name: str
description: str
input_schema: dict
handler: Callable[[dict], Any]
cacheable: bool = True
destructive: bool = False
timeout_sec: float = 10.0
def to_anthropic(self) -> dict:
return {
"name": self.name,
"description": self.description,
"input_schema": self.input_schema,
}
# Simple in-process cache
_CACHE: dict[str, tuple[float, str]] = {}
_TTL = 60.0
def _cache_key(name: str, args: dict) -> str:
return hashlib.sha256(f"{name}::{json.dumps(args, sort_keys=True)}".encode()).hexdigest()
def execute_tool(tool: Tool, args: dict) -> dict:
"""Validate, cache-check, run, shape. Returns content for tool_result block."""
# 1. Validate
try:
jsonschema.validate(args, tool.input_schema)
except jsonschema.ValidationError as e:
return {
"is_error": True,
"content": json.dumps(ToolError(
ToolError.KIND_USER,
f"input_schema_violation: {e.message}",
hint=f"Required: {e.schema}",
).to_tool_result()),
}
# 2. Cache (only for non-destructive reads)
if tool.cacheable and not tool.destructive:
k = _cache_key(tool.name, args)
cached = _CACHE.get(k)
if cached and time.time() - cached[0] < _TTL:
return {"is_error": False, "content": cached[1]}
# 3. Execute with retries
last_err: Exception | None = None
for attempt in range(3):
try:
t0 = time.time()
result = tool.handler(args)
elapsed_ms = (time.time() - t0) * 1000
payload = json.dumps({"data": result, "elapsed_ms": round(elapsed_ms, 1)})
if tool.cacheable and not tool.destructive:
_CACHE[_cache_key(tool.name, args)] = (time.time(), payload)
return {"is_error": False, "content": payload}
except ToolError as e:
last_err = e
if e.kind == ToolError.KIND_USER or e.kind == ToolError.KIND_FATAL:
return {"is_error": True, "content": json.dumps(e.to_tool_result())}
time.sleep(2 ** attempt * 0.5)
except Exception as e:
last_err = e
time.sleep(2 ** attempt * 0.5)
# exhausted retries
err = ToolError(
ToolError.KIND_TRANSIENT,
f"retries_exhausted: {last_err}",
hint="Try a different approach or report failure to user.",
)
return {"is_error": True, "content": json.dumps(err.to_tool_result())}
# ====================================================================
# 10 financial tools
# ====================================================================
# Tool 1: search_filings
def _search_filings(args):
db = {"AAPL": [{"form": "10-Q", "date": "2026-08-01"}],
"TSLA": [{"form": "10-Q", "date": "2026-07-23"}]}
t = args["ticker"].upper()
if t not in db:
raise ToolError(ToolError.KIND_USER, "ticker_not_found",
hint=f"Try one of {list(db)}")
return [f for f in db[t] if f["form"] == args.get("form", "10-Q")]
T_SEARCH_FILINGS = Tool(
name="search_filings",
description=(
"Search SEC EDGAR for filings of a US-listed public company. "
"Returns a list of {form, date} records. "
"USE THIS WHEN: you need to discover what filings exist before fetching one. "
"DON'T USE THIS FOR: international companies or when you already know the URL."
),
input_schema={
"type": "object",
"properties": {
"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$",
"description": "Uppercase ticker, 1-5 letters"},
"form": {"type": "string", "enum": ["10-K", "10-Q", "8-K", "S-1"],
"description": "Filing type"},
},
"required": ["ticker"],
"additionalProperties": False,
},
handler=_search_filings,
)
# Tool 2: fetch_filing_section
def _fetch_filing_section(args):
if args["section"] not in {"MD&A", "Risk Factors", "Financials"}:
raise ToolError(ToolError.KIND_USER, "unknown_section")
return {"section": args["section"], "text": "<<filing text up to 4kb>>"}
T_FETCH_FILING_SECTION = Tool(
name="fetch_filing_section",
description=(
"Fetch a specific SECTION of an SEC filing (not the whole document — "
"those are too large). Returns up to 4KB of text. "
"USE FOR: targeted reading. CHAIN AFTER: search_filings."
),
input_schema={
"type": "object",
"properties": {
"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
"form": {"type": "string", "enum": ["10-K", "10-Q", "8-K"]},
"date": {"type": "string", "pattern": r"^\d{4}-\d{2}-\d{2}$"},
"section": {"type": "string",
"enum": ["MD&A", "Risk Factors", "Financials"]},
},
"required": ["ticker", "form", "date", "section"],
"additionalProperties": False,
},
handler=_fetch_filing_section,
)
# Tool 3: get_quote
def _get_quote(args):
fakes = {"AAPL": 215.40, "TSLA": 290.10}
t = args["ticker"].upper()
if t not in fakes:
raise ToolError(ToolError.KIND_USER, "no_quote_for_ticker")
return {"ticker": t, "price": fakes[t], "ts": datetime.utcnow().isoformat()}
T_GET_QUOTE = Tool(
name="get_quote",
description=(
"Get the current real-time price of a US-listed equity. "
"Returns price in USD and a timestamp. "
"USE FOR: any question requiring current market value. "
"DON'T USE FOR: historical prices (use get_price_history)."
),
input_schema={
"type": "object",
"properties": {
"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
},
"required": ["ticker"],
},
handler=_get_quote,
)
# Tool 4: get_price_history
def _get_price_history(args):
return {"ticker": args["ticker"].upper(), "days": args["days"],
"series": [{"d": "2026-09-01", "c": 210.0}]}
T_GET_PRICE_HISTORY = Tool(
name="get_price_history",
description="Daily closing prices for the last N days. Max 365 days.",
input_schema={
"type": "object",
"properties": {
"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
"days": {"type": "integer", "minimum": 1, "maximum": 365},
},
"required": ["ticker", "days"],
},
handler=_get_price_history,
)
# Tool 5: calculate
def _calculate(args):
expr = args["expression"]
# In prod use asteval, not eval
try:
return {"result": eval(expr, {"__builtins__": {}}, {})}
except Exception as e:
raise ToolError(ToolError.KIND_USER, f"calc_failed: {e}")
T_CALCULATE = Tool(
name="calculate",
description=(
"Evaluate a numeric expression (Python syntax). "
"USE FOR: arithmetic, ratios, % changes. "
"Examples: '24.2 / 94.9 * 100', '(120 - 100) / 100 * 100'. "
"DON'T USE FOR: symbolic math, statistics (use stats_summary)."
),
input_schema={
"type": "object",
"properties": {"expression": {"type": "string", "maxLength": 200}},
"required": ["expression"],
},
handler=_calculate,
)
# Tool 6: stats_summary
def _stats(args):
import statistics
nums = args["values"]
return {
"n": len(nums),
"mean": statistics.mean(nums),
"stdev": statistics.stdev(nums) if len(nums) > 1 else 0,
"min": min(nums), "max": max(nums),
}
T_STATS = Tool(
name="stats_summary",
description="Mean / stdev / min / max of a list of numbers.",
input_schema={
"type": "object",
"properties": {
"values": {"type": "array", "items": {"type": "number"},
"minItems": 1, "maxItems": 10000},
},
"required": ["values"],
},
handler=_stats,
)
# Tool 7: search_news
def _search_news(args):
return [{"title": f"{args['query']} - Q3 results", "date": "2026-09-30"}]
T_SEARCH_NEWS = Tool(
name="search_news",
description=(
"Search recent financial news. Returns up to 10 most relevant headlines. "
"USE FOR: sentiment, breaking events, M&A rumors. "
"DON'T USE AS: the only source for facts (always cross-check filings)."
),
input_schema={
"type": "object",
"properties": {
"query": {"type": "string", "minLength": 2, "maxLength": 200},
"days": {"type": "integer", "minimum": 1, "maximum": 90, "default": 7},
},
"required": ["query"],
},
handler=_search_news,
)
# Tool 8: peer_companies
def _peers(args):
db = {"AAPL": ["MSFT", "GOOGL", "AMZN"], "TSLA": ["F", "GM", "RIVN"]}
t = args["ticker"].upper()
if t not in db:
raise ToolError(ToolError.KIND_USER, "no_peer_data")
return {"ticker": t, "peers": db[t]}
T_PEERS = Tool(
name="peer_companies",
description="Get 3-5 industry peers for benchmarking.",
input_schema={
"type": "object",
"properties": {"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"}},
"required": ["ticker"],
},
handler=_peers,
)
# Tool 9: ratio_card (parallel-friendly aggregator)
def _ratio_card(args):
return {"ticker": args["ticker"], "P/E": 28.5, "P/B": 45.0, "ROE": 1.65}
T_RATIO_CARD = Tool(
name="ratio_card",
description="One-shot card of P/E, P/B, ROE for a ticker. Cheaper than 3 separate calls.",
input_schema={
"type": "object",
"properties": {"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"}},
"required": ["ticker"],
},
handler=_ratio_card,
)
# Tool 10: send_credit_memo (DESTRUCTIVE - requires confirmation)
def _send_memo(args):
if not args.get("confirmed"):
raise ToolError(ToolError.KIND_USER, "confirmation_required",
hint="Set confirmed=true after the user explicitly approves.")
return {"sent_to": args["recipient"], "ts": datetime.utcnow().isoformat()}
T_SEND_MEMO = Tool(
name="send_credit_memo",
description=(
"*** DESTRUCTIVE *** Email a finalized credit memo to a recipient. "
"ONLY call this AFTER the user has explicitly confirmed. "
"Set confirmed=true only if the user said 'yes' / 'send it'."
),
input_schema={
"type": "object",
"properties": {
"recipient": {"type": "string", "format": "email"},
"subject": {"type": "string", "minLength": 5},
"body_md": {"type": "string", "minLength": 100},
"confirmed": {"type": "boolean"},
},
"required": ["recipient", "subject", "body_md", "confirmed"],
},
handler=_send_memo,
cacheable=False,
destructive=True,
)
ALL_TOOLS: list[Tool] = [
T_SEARCH_FILINGS, T_FETCH_FILING_SECTION, T_GET_QUOTE,
T_GET_PRICE_HISTORY, T_CALCULATE, T_STATS, T_SEARCH_NEWS,
T_PEERS, T_RATIO_CARD, T_SEND_MEMO,
]
# ====================================================================
# Parallel tool calls demo
# ====================================================================
def parallel_demo():
"""When LLM emits multiple tool_use blocks in one response, run them concurrently."""
import asyncio
async def run_one(tool: Tool, args: dict):
loop = asyncio.get_event_loop()
return await loop.run_in_executor(None, execute_tool, tool, args)
# Pretend LLM asked for 3 things in parallel
calls = [
(T_GET_QUOTE, {"ticker": "AAPL"}),
(T_GET_QUOTE, {"ticker": "MSFT"}),
(T_PEERS, {"ticker": "AAPL"}),
]
async def main():
results = await asyncio.gather(*[run_one(t, a) for t, a in calls])
for (t, a), r in zip(calls, results):
print(f"{t.name}({a}) -> {r['content'][:80]}")
asyncio.run(main())
if __name__ == "__main__":
# smoke test
for t in ALL_TOOLS:
print(f"{t.name}: {t.description[:60]}")
parallel_demo()
四、金融领域应用——10 个工具的"工具卡片"
| # | 工具 | 输入约束 | 错误最常见 | 适合场景 | 注意 |
|---|---|---|---|---|---|
| 1 | search_filings | ticker pattern + form enum | ticker_not_found | 公司基本面起点 | 仅 US |
| 2 | fetch_filing_section | section enum | 大文件需分段 | 跟在 search 后 | 不要全文 |
| 3 | get_quote | ticker pattern | rate limit | 当前估值题 | 实时 |
| 4 | get_price_history | days max 365 | 历史回测 | 趋势分析 | 1y 以内 |
| 5 | calculate | expression maxLength | 语法错 | 比率计算 | 安全 eval |
| 6 | stats_summary | values minItems 1 | empty array | 风险/波动 | 非加权 |
| 7 | search_news | query 2-200 chars | rate limit | 事件驱动 | 交叉验证 |
| 8 | peer_companies | ticker pattern | no peer data | 同业对标 | 需更新 |
| 9 | ratio_card | ticker pattern | stale data | 快速看 P/E | 单 call |
| 10 | send_credit_memo | confirmed=true | confirmation_required | 输出阶段 | 不可逆 |
信贷分析师 agent 用这 10 个工具的典型 trace
search_filings(AAPL, 10-Q)→ 拿到 url- 并行:
fetch_filing_section(MD&A)+fetch_filing_section(Financials) peer_companies(AAPL)→ MSFT/GOOGL/AMZN- 并行:
ratio_card(AAPL)+ratio_card(MSFT)+ratio_card(GOOGL) calculate("215.40 / 7.85")→ 计算 P/Estats_summary([28.5, 32.1, 25.0])→ 同业 P/E 区间search_news("AAPL", 7)→ 近期事件send_credit_memo(confirmed=true)← 只在 user 确认后
五、Web3 集成——链上工具的额外约束
链上 tool 的 schema 必须包含风控字段:
T_SWAP = Tool(
name="swap_token",
description=(
"*** DESTRUCTIVE *** Execute an onchain token swap. "
"ALWAYS call simulate_swap first. ALWAYS set max_slippage_bps. "
"ALWAYS check the user's intent allows this token pair."
),
input_schema={
"type": "object",
"properties": {
"from_token": {"type": "string", "pattern": "^0x[a-fA-F0-9]{40}$"},
"to_token": {"type": "string", "pattern": "^0x[a-fA-F0-9]{40}$"},
"amount_in": {"type": "string", "pattern": "^[0-9]+$"},
"max_slippage_bps": {"type": "integer", "minimum": 1, "maximum": 500},
"deadline_ts": {"type": "integer"},
"session_key_id": {"type": "string"}, # ← 必须用 session key
"simulated_min_out": {"type": "string"}, # ← simulate 结果
"user_confirmed": {"type": "boolean"},
},
"required": ["from_token","to_token","amount_in","max_slippage_bps",
"deadline_ts","session_key_id","simulated_min_out","user_confirmed"],
},
handler=_swap_handler,
destructive=True,
cacheable=False,
)
设计原则:destructive onchain tool 必须要求 LLM 显式提供 simulate 结果 + session key id + user_confirmed。这把"安全"前置到 schema 而不是 handler 里,agent loop 也无法绕过。
六、生产经验与陷阱
-
Description 写得太短
"search company filings"→ LLM 不知道什么 ticker / 哪个国家 / 何种 form。改成 5-8 行包含 USE WHEN / DON'T USE FOR / 例子,accuracy 立刻提升。 -
Schema 没用 enum
form: string→ LLM 输入 "10K" / "10-k" / "form 10-Q"。改 enum 后 100% 一致。 -
错误信息太抽象
"error"→ LLM 不知该 retry 还是该改参数。区分 USER / TRANSIENT / FATAL,并给 hint。 -
Cache 没考虑 staleness 股价 cache 60s 太长,行情已变。按 tool 类型分 TTL(quote 5s, filings 1d)。
-
Parallel tool calls 时 rate limit 撞墙 LLM 一次发 5 个 tool_use,5 个并发请求撞 API 配额。Wrap 一个 semaphore 限 N 并发。
-
Tool 太多,LLM 选错率上升 超过 20 个 tool 时 LLM 选择质量下降。两个办法:
- 分组(research_tools / writing_tools / execution_tools)+ 指令路由
- Hierarchical agent:top-level agent 选大类,下钻给 sub-agent
-
Idempotency 漏洞
send_credit_memo第一次卡在网络超时,retry 又发一次。生产里写链 / 写邮件类必须有 idempotency key(client 端 hash)。 -
destructive tool 被 LLM "creatively" 绕过 LLM 看到
confirmed: truerequired,自己设 true。需要在 handler 层校验"调用上下文里是否有 user 显式 message",schema 不够。
七、Cost & Latency
Tool description 长度的影响
| Description 长度 | Tool 选择准确率 | Token 成本影响 |
|---|---|---|
| < 30 字 | 65% | 最低 |
| 50-100 字 | 85% | +10% |
| 100-200 字(含 USE WHEN/DON'T USE) | 93% | +20% |
| > 300 字 | 89%(边际下降) | +50% |
结论:100-200 字 sweet spot。10 个 tool × 150 字 ≈ 1500 token system 开销,开 cache 后 essentially free。
Parallel vs serial 延迟(3 个独立 tool)
| 模式 | 延迟 |
|---|---|
| Serial(每次一个) | 3 × 200ms = 600ms(不含 LLM 时间) |
| Parallel | max(200ms, 200ms, 200ms) = 200ms |
LLM 不会自动并行——必须在 system prompt 提示"可以一次调多个独立 tool",或不开 disable_parallel_tool_use。
八、关键速查
Tool description 模板
[一句话功能]
RETURNS: [输出结构]
USE WHEN: [典型场景 1-2 条]
DON'T USE FOR: [反例 1-2 条]
EXAMPLE: [一个真实输入]
[警告 / destructive 标记]
Error 分类决策
| Error | KIND | LLM 应做 |
|---|---|---|
| Schema 违反 | USER | 改参数重试 |
| ticker_not_found | USER | 换 ticker / 询问用户 |
| 401 / 403 | FATAL | 报告失败、停 |
| 429 (rate limit) | TRANSIENT | backoff retry |
| 5xx / timeout | TRANSIENT | backoff retry |
| confirmation_required | USER | 先问用户 |
| destructive 没确认 | FATAL | 必须停 |
九、面试题
Q1: 为什么 tool description 比 system prompt 更影响 agent 行为?
A: System prompt 影响"风格 + 总体策略"。Tool description 直接决定每次 LLM 看到 tool list 时的判断(选哪个、参数怎么传、何时停)。Tool description 在 token 预算里被反复消费(每轮重传),影响是乘性的。Anthropic 团队在公开实验中观察到优化 description 比改 prompt ROI 更高。
Q2: 如何让 agent 优雅处理 tool 失败?
A: 三层:① 错误分类(USER/TRANSIENT/FATAL),返回结构化 JSON;② handler 层有 retry 策略(仅 TRANSIENT);③ system prompt 教 agent:"如果同一 tool 同样参数连续失败 2 次,换 approach 或报告失败"。配合 cost guardrail 防止疯狂 retry。
Q3: Parallel tool calls 在什么场景关键?
A: 多个独立查询(peer companies × N + ratio cards × N)、多源对账、并行验证(合规 vs 反洗钱 vs 制裁)。Latency 从 N×T 降到 max(T)。但要注意:① 上游 rate limit;② 共享状态的工具不能并行;③ destructive 工具几乎不并行。
Q4: 一个 destructive tool(如转账)如何防止被 agent 误调?
A: 多重防御:① schema required
confirmed: true+simulated_result;② handler 校验调用上下文(user 是否最近显式说"yes");③ 金额阈值(> $X 必须 human-in-the-loop);④ session key 限制范围;⑤ 旁路审计 log(任何 destructive call 写 audit trail)。
Q5: 一个 agent 用 20 个 tool 后准确率下降,怎么办?
A: 几种方案:① 分组路由——top-level 先决定 group,sub-agent 在 group 内选 tool;② tool packing——把 ratio_card 这种聚合 tool 替代 3 个细粒度 tool;③ dynamic tool loading——只在 prompt 里展示与当前任务相关的 5-8 个 tool;④ 重新审视:是不是有些 tool 该合并或删除。
明日预告
Day 153: MCP 协议——Model Context Protocol 完整实现
- Anthropic 2024-11 推出的 MCP:为什么需要、resources/prompts/tools 三类
- 部署一个 MCP server(Python SDK)
- 在 Claude Desktop 配置文件里注册并联调