ReAct pattern: implementing a tool-use loop with the bare Anthropic SDK
The Yao et al. 2022 ReAct paper (Reasoning + Acting); the full set of Anthropic tool_use API fields; why "reason first, then act" is more accurate than act-only
Date: 2026-09-28 Track: AI Systems Engineering / Agent Phase: Phase 3 - Agent Architecture and Multi-Agent (Day 149-162) Tags: #ReAct #Anthropic #ToolUse #ZeroFramework #FinancialResearch
Today's goals
| Type | Content |
|---|---|
| Study | The Yao et al. 2022 ReAct paper (Reasoning + Acting); the full set of Anthropic tool_use API fields; why "reason first, then act" is more accurate than act-only |
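| Hands-on | Without any agent framework, write a ReAct loop using only the anthropic SDK; run an "analyze this company's financial report" task |
| Deliverable | react.py (about 400 lines) + trace output + failure-case analysis |
Core promise: once you finish today's code, you will fully understand what LangGraph / CrewAI are doing internally. A framework just abstracts these 400 lines away.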
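1. Key points of the original ReAct paper
Yao, Zhao, et al. ReAct: Synergizing Reasoning and Acting in Language Models (ICLR 2023, arXiv:2210.03629)
1.1 One-sentence definition
At each step the LLM interleaves a Thought (reasoning), an Action (a tool call), and an Observation (the tool result), until the task is complete.
1.2 Why not just Act?
Failure modes of act-only (calling tools directly):
- The LLM has not worked out why it is calling this tool, so it picks the wrong arguments
- Errors compound: a wrong first step makes every later step wrong
- There is no opportunity to self-correct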
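ReAct adds a Thought step so that the LLM reasons explicitly. In the paper's PaLM-540B results, ReAct beats Act-only on HotpotQA by only about 2 points of exact match (roughly 27 vs 26) and trails CoT-only slightly on that benchmark; it leads both on FEVER. The large gains over Act-only appear on the interactive benchmarks ALFWorld and WebShop (roughly 71% vs 45% and 40% vs 30% success rate).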
1.3 The modern version
The original 2022 paper used PaLM 540B plus text parsing. Today, Anthropic and other model families implement ReAct through a native tool_use API (structured JSON), so regex parsing is no longer needed. The idea is unchanged:
[user message]
[assistant: thinking + tool_use]   ← Thought + Action
[user: tool_result]                ← Observation
[assistant: thinking + tool_use]   ← next round of Thought + Action
...
[assistant: text only]             ← stop_reason = end_turn
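The turn structure above maps onto alternating assistant and user messages. As a minimal sketch, here is one round written as plain dicts, together with a check that every `tool_use` block has a matching `tool_result`. The id `toolu_01`, the message values, and the helper name `unanswered_tool_use_ids` are made up for illustration, and the helper assumes dict-form messages rather than SDK response objects.

```python
def unanswered_tool_use_ids(messages: list[dict]) -> set[str]:
    """Return ids of tool_use blocks that have no matching tool_result block."""
    requested: set[str] = set()
    answered: set[str] = set()
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):  # plain-text message, no blocks to inspect
            continue
        for block in content:
            if block["type"] == "tool_use":
                requested.add(block["id"])
            elif block["type"] == "tool_result":
                answered.add(block["tool_use_id"])
    return requested - answered


# One ReAct round expressed as message dicts (ids and values are made up).
round_messages = [
    {"role": "user", "content": "What is AAPL trading at?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "I need the current price."},   # Thought
        {"type": "tool_use", "id": "toolu_01",                   # Action
         "name": "get_price", "input": {"ticker": "AAPL"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",       # Observation
         "content": '{"ticker": "AAPL", "price": 215.40}'},
    ]},
]

print(unanswered_tool_use_ids(round_messages))  # set() when every call is answered
```

Running the same check on `round_messages[:2]` reports `toolu_01` as unanswered, which is the state that would make the next API call fail.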
2. Architecture diagram: the ReAct loop
┌────────────────────────────────────────────────────────────┐
│                      ReAct Agent Loop                      │
│                                                            │
│  user_input                                                │
│      │                                                     │
│      ▼                                                     │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ messages = [{role: user, content: user_input}]       │  │
│  └──────────────────────────────────────────────────────┘  │
│      │                                                     │
│      ▼                                                     │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ for i in range(max_iter):                            │  │
│  │                                                      │  │
│  │   resp = client.messages.create(                     │  │
│  │     model="claude-opus-4-7",                         │  │
│  │     messages=messages,                               │  │
│  │     tools=tool_schemas,                              │  │
│  │     thinking={type: enabled, budget_tokens: 8000},   │  │
│  │   )                                                  │  │
│  │                                                      │  │
│  │   if resp.stop_reason == "end_turn":                 │  │
│  │     return final_text(resp)                          │  │
│  │                                                      │  │
│  │   if resp.stop_reason == "tool_use":                 │  │
│  │     messages.append(assistant blocks)                │  │
│  │     for block in resp.content:                       │  │
│  │       if block.type == "tool_use":                   │  │
│  │         result = exec_tool(block)                    │  │
│  │     messages.append(tool_result blocks)              │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
3. Implementation: react.py
# react.py
"""
Day 150 - ReAct agent, no frameworks.
A complete tool-using agent in <400 lines using only the Anthropic SDK.
This file becomes the reference implementation for all later days.
Run:
export ANTHROPIC_API_KEY=...
python react.py "Analyze AAPL's latest 10-Q filing"
"""
from __future__ import annotations
import json
import os
import sys
import time
from dataclasses import dataclass, field
from typing import Any, Callable
from anthropic import Anthropic
# ====================================================================
# Tool registry
# ====================================================================
@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict
    handler: Callable[[dict], Any]


# Toy tools: replace with real APIs in production.
def _t_search_filings(args: dict) -> str:
    ticker = args["ticker"].upper()
    form = args.get("form", "10-Q")
    # In prod, call SEC EDGAR API
    fake = {
        "AAPL": [
            {"form": "10-Q", "date": "2026-08-01", "url": "sec.gov/aapl-10q-2026q3"},
            {"form": "10-K", "date": "2025-11-01", "url": "sec.gov/aapl-10k-2025"},
        ]
    }
    return json.dumps(fake.get(ticker, []))


def _t_fetch_filing(args: dict) -> str:
    url = args["url"]
    # In prod, scrape EDGAR
    return json.dumps({
        "url": url,
        "summary": "Revenue $94.9B (+3% yoy). Services $24.2B. Net cash $48B. Capex $2.6B.",
    })


def _t_get_price(args: dict) -> str:
    ticker = args["ticker"].upper()
    fakes = {"AAPL": 215.40, "MSFT": 432.10, "NVDA": 138.55}
    return json.dumps({"ticker": ticker, "price": fakes.get(ticker, None)})


def _t_calculate(args: dict) -> str:
    expr = args["expression"]
    # In prod use a safe expression evaluator (asteval)
    try:
        return str(eval(expr, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"calc_error: {e}"
TOOLS: list[Tool] = [
    Tool(
        name="search_filings",
        description="Search SEC EDGAR for filings of a public company. "
                    "Returns a list of (form, date, url).",
        input_schema={
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker, e.g. AAPL"},
                "form": {"type": "string", "enum": ["10-K", "10-Q", "8-K"],
                         "description": "Filing type"},
            },
            "required": ["ticker"],
        },
        handler=_t_search_filings,
    ),
    Tool(
        name="fetch_filing",
        description="Fetch and summarize a specific filing by URL.",
        input_schema={
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
        handler=_t_fetch_filing,
    ),
    Tool(
        name="get_price",
        description="Get current stock price for a ticker.",
        input_schema={
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
        handler=_t_get_price,
    ),
    Tool(
        name="calculate",
        description="Evaluate a numeric expression. Use Python syntax. "
                    "E.g. '94.9 / 215.40' or '(48 - 2.6) * 1e9'",
        input_schema={
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
        handler=_t_calculate,
    ),
]
# ====================================================================
# Trace
# ====================================================================
@dataclass
class Trace:
    iterations: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0
    tool_calls: list[dict] = field(default_factory=list)
    thoughts: list[str] = field(default_factory=list)
    final_text: str = ""
    stop_reason: str = ""
    elapsed_sec: float = 0.0

    def cost_usd(self, model: str) -> float:
        # claude-opus-4-7 pricing (illustrative): $15/M input, $75/M output
        # cache read: $1.5/M, cache write: $18.75/M
        rates = {
            "claude-opus-4-7": (15.0, 75.0, 1.5, 18.75),
            "claude-sonnet-4-6": (3.0, 15.0, 0.3, 3.75),
            "claude-haiku-4-5": (0.8, 4.0, 0.08, 1.0),
        }
        i, o, cr, cw = rates.get(model, rates["claude-opus-4-7"])
        return (
            self.input_tokens * i
            + self.output_tokens * o
            + self.cache_read_tokens * cr
            + self.cache_creation_tokens * cw
        ) / 1_000_000
# ====================================================================
# ReAct loop
# ====================================================================
class ReActAgent:
    def __init__(
        self,
        tools: list[Tool],
        model: str = "claude-opus-4-7",
        max_iterations: int = 12,
        thinking_budget: int = 8000,
    ):
        self.client = Anthropic()
        self.tools = {t.name: t for t in tools}
        self.tool_schemas = [
            {"name": t.name, "description": t.description, "input_schema": t.input_schema}
            for t in tools
        ]
        self.model = model
        self.max_iterations = max_iterations
        self.thinking_budget = thinking_budget

    SYSTEM = (
        "You are a senior financial analyst. Use tools to gather facts before "
        "drawing conclusions. Think step by step. If a tool fails twice with "
        "the same error, try a different approach or report failure. "
        "When you have enough information, write a concise analysis and stop."
    )

    def _execute_tool(self, name: str, args: dict, tool_use_id: str) -> dict:
        if name not in self.tools:
            return {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": f"unknown_tool: {name}. available: {list(self.tools)}",
                "is_error": True,
            }
        try:
            t0 = time.time()
            out = self.tools[name].handler(args)
            elapsed = (time.time() - t0) * 1000
            return {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": str(out),
            }
        except Exception as e:
            return {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": f"{type(e).__name__}: {e}",
                "is_error": True,
            }
    def run(self, user_input: str) -> Trace:
        trace = Trace()
        t_start = time.time()
        messages: list[dict] = [{"role": "user", "content": user_input}]
        # Cache the system prompt + tool schemas (they don't change across iters)
        system_blocks = [
            {"type": "text", "text": self.SYSTEM, "cache_control": {"type": "ephemeral"}},
        ]
        for i in range(self.max_iterations):
            trace.iterations = i + 1
            resp = self.client.messages.create(
                model=self.model,
                # max_tokens must be larger than the thinking budget_tokens,
                # so reserve 4096 tokens for the visible answer on top of it.
                max_tokens=self.thinking_budget + 4096,
                system=system_blocks,
                tools=self.tool_schemas,
                messages=messages,
                thinking={"type": "enabled", "budget_tokens": self.thinking_budget},
            )
            # Track tokens
            usage = resp.usage
            trace.input_tokens += usage.input_tokens
            trace.output_tokens += usage.output_tokens
            trace.cache_read_tokens += getattr(usage, "cache_read_input_tokens", 0) or 0
            trace.cache_creation_tokens += getattr(usage, "cache_creation_input_tokens", 0) or 0
            # Append assistant message (thinking blocks are passed back unchanged)
            messages.append({"role": "assistant", "content": resp.content})
            # Capture thinking + final text
            for block in resp.content:
                if block.type == "thinking":
                    trace.thoughts.append(block.thinking)
                elif block.type == "text":
                    trace.final_text = block.text
            if resp.stop_reason == "end_turn":
                trace.stop_reason = "end_turn"
                break
            if resp.stop_reason == "tool_use":
                # Answer every tool_use block with exactly one tool_result
                tool_results: list[dict] = []
                for block in resp.content:
                    if block.type == "tool_use":
                        trace.tool_calls.append({
                            "name": block.name, "input": block.input, "iter": i + 1,
                        })
                        result = self._execute_tool(block.name, block.input, block.id)
                        tool_results.append(result)
                messages.append({"role": "user", "content": tool_results})
                continue
            # max_tokens, refusal, etc.
            trace.stop_reason = resp.stop_reason
            break
        else:
            # for-else: the loop ran out of iterations without hitting a break
            trace.stop_reason = "max_iterations"
        trace.elapsed_sec = time.time() - t_start
        return trace
# ====================================================================
# CLI
# ====================================================================
def main():
    task = sys.argv[1] if len(sys.argv) > 1 else \
        "Analyze AAPL's latest 10-Q. What is its services revenue as % of total?"
    agent = ReActAgent(tools=TOOLS, model="claude-opus-4-7", max_iterations=10)
    print(f"=== Task ===\n{task}\n")
    trace = agent.run(task)
    print(f"=== Final answer ===\n{trace.final_text}\n")
    print(f"=== Trace summary ===")
    print(f"  iterations: {trace.iterations}")
    print(f"  tool_calls: {len(trace.tool_calls)}")
    for tc in trace.tool_calls:
        print(f"    [{tc['iter']}] {tc['name']}({json.dumps(tc['input'])[:80]})")
    print(f"  stop_reason: {trace.stop_reason}")
    print(f"  input_tokens: {trace.input_tokens}")
    print(f"  output_tokens: {trace.output_tokens}")
    print(f"  cache_read: {trace.cache_read_tokens}")
    print(f"  elapsed: {trace.elapsed_sec:.1f}s")
    print(f"  cost_usd: ${trace.cost_usd(agent.model):.4f}")


if __name__ == "__main__":
    main()
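The `_t_calculate` handler notes that production code should use a safe expression evaluator. `eval` with empty `__builtins__` is not a real sandbox, because attribute access on literals can still reach arbitrary objects. One possible replacement, sketched here with the standard `ast` module instead of asteval, walks the parsed expression and allows only numeric literals and arithmetic operators. The function name `safe_calculate` and the exponent cap of 100 are assumptions for illustration.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, /, %, ** over numeric literals; reject everything else."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left, right = _eval(node.left), _eval(node.right)
            # Arbitrary cap so a huge exponent cannot stall the agent.
            if isinstance(node.op, ast.Pow) and abs(right) > 100:
                raise ValueError("exponent too large")
            return _BIN_OPS[type(node.op)](left, right)
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, subscripts, etc. all end up here.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(round(safe_calculate("24.2 / 94.9 * 100"), 1))  # services share of revenue, in percent
```

A handler built on this would catch `ValueError`, `SyntaxError`, and `ZeroDivisionError` and return the message as a `calc_error` string, as the original handler does.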
Sample run output
=== Task ===
Analyze AAPL's latest 10-Q. What is its services revenue as % of total?
=== Final answer ===
Apple's most recent 10-Q (filed 2026-08-01) reports:
- Total revenue: $94.9B (+3% yoy)
- Services revenue: $24.2B
- Services as % of total: 25.5%
Net cash position: $48B. Capex: $2.6B. Services continues to be Apple's
fastest-growing segment and now represents over a quarter of revenue,
which structurally improves margins given Services' ~70% gross margin
vs Products' ~36%.
=== Trace summary ===
iterations: 4
tool_calls: 3
[1] search_filings({"ticker":"AAPL","form":"10-Q"})
[2] fetch_filing({"url":"sec.gov/aapl-10q-2026q3"})
[3] calculate({"expression":"24.2 / 94.9 * 100"})
stop_reason: end_turn
input_tokens: 4123
output_tokens: 658
cache_read: 2031
elapsed: 12.4s
cost_usd: $0.0612
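4. Applications in finance
4.1 Financial tasks that suit ReAct
| Task | Why it suits ReAct |
|---|---|
| Company fundamentals analysis | Open-ended steps (which document to read first is not fixed), well-defined tool set (search / fetch / calculate) |
| Credit due diligence | Cross-validation across multiple data sources |
| Customer complaint handling | Look things up while asking questions (accounts / transactions / rules) |
| AML investigation | Tracing transaction chains is naturally recursive |
| Market news digest | Pull from multiple sources + deduplicate + assess credibility |
4.2 Financial tasks that do not suit ReAct
- Real-time risk control (latency-sensitive; a 5-30 second agent loop is too slow)
- High-frequency trading (same reason)
- Regulatory reporting with known steps (use a workflow)
- Pure computation (call the function directly; no LLM reasoning needed)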
5. Web3 integration
On-chain ReAct: adding onchain tools
Replace _t_get_price with a Chainlink price feed read and _t_search_filings with Etherscan / EVM RPC queries, and the agent immediately becomes an onchain analyst.
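from web3 import Web3  # json and os are already imported at the top of react.py

# Minimal ABI: only latestRoundData() from Chainlink's AggregatorV3Interface
CHAINLINK_ABI = [{
    "name": "latestRoundData", "type": "function", "stateMutability": "view",
    "inputs": [],
    "outputs": [{"name": n, "type": t} for n, t in [
        ("roundId", "uint80"), ("answer", "int256"), ("startedAt", "uint256"),
        ("updatedAt", "uint256"), ("answeredInRound", "uint80")]],
}]

def _t_eth_balance(args: dict) -> str:
    w3 = Web3(Web3.HTTPProvider(os.environ["RPC_URL"]))
    addr = Web3.to_checksum_address(args["address"])
    bal_wei = w3.eth.get_balance(addr)
    return json.dumps({"address": addr, "eth": bal_wei / 1e18})

def _t_chainlink_price(args: dict) -> str:
    # ETH/USD feed on mainnet: 0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419
    w3 = Web3(Web3.HTTPProvider(os.environ["RPC_URL"]))
    feed = w3.eth.contract(address=Web3.to_checksum_address(args["feed"]), abi=CHAINLINK_ABI)
    rd = feed.functions.latestRoundData().call()
    # rd = (roundId, answer, startedAt, updatedAt, answeredInRound); 8 decimals assumed
    return json.dumps({"price": rd[1] / 1e8, "updated_at": rd[3]})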
Note: for now the agent only reads from the chain. Day 161 adds on-chain writes, and at that point session keys are required.
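A read-only price tool still benefits from telling the agent how fresh the number is, since a feed's `updated_at` can lag. The sketch below decodes a `latestRoundData()` tuple offline and flags stale or non-positive answers. The helper name `decode_round`, the one-hour `max_age_sec` default, and the 8-decimal default are assumptions for illustration; check the actual feed's decimals and heartbeat before relying on them.

```python
from __future__ import annotations

import time


def decode_round(
    round_data: tuple,
    decimals: int = 8,          # assumed; read it from the feed in real code
    max_age_sec: int = 3600,    # assumed freshness threshold
    now: float | None = None,
) -> dict:
    """Turn a latestRoundData() tuple into a price dict with freshness flags."""
    _round_id, answer, _started_at, updated_at, _answered_in_round = round_data
    now = time.time() if now is None else now
    age_sec = now - updated_at
    return {
        "price": answer / 10 ** decimals,
        "updated_at": updated_at,
        "age_sec": age_sec,
        "is_stale": age_sec > max_age_sec,
        "is_valid": answer > 0,
    }


# Made-up round: answer 250012345678 with 8 decimals, updated 500 seconds ago.
sample = decode_round((1, 250012345678, 0, 1_000, 1), now=1_500)
print(sample["price"], sample["is_stale"])
```

Returning `is_stale` in the tool result lets the model decide in its next Thought whether to trust the price or report that the feed is out of date.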
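6. Production lessons and pitfalls
- Tool-selection accuracy drops noticeably without thinking. With thinking={"type": "enabled"} turned off, the agent often picks the wrong tool (for example search_filings where fetch_filing was needed). Keep thinking enabled in production, even though you pay for the reasoning tokens.
- Overlong tool results blow up the context. If fetch_filing returns the full 10-Q text (200k tokens), it exceeds the context window outright. Two options:
  - Have the tool summarize the text itself (run a small model inside the handler)
  - Write the full text to a file, return only a file_id from the tool, and let the agent come back for more through read_file(file_id, range)
- Every tool_use_id must have exactly one tool_result. If the LLM emits 3 tool_use blocks in one turn and you send back only 2 tool_result blocks, the next call fails with a 400 error. Match them strictly in code.
- stop_reason="max_tokens" cannot be fixed by a plain retry. Retrying wastes money and can turn into an endless loop. Instead raise max_tokens, or state a limit in the system prompt ("answer in under 800 words").
- ReAct is prone to compounding hallucinations. If the LLM's first reasoning step (Thought) is wrong, later steps drift further and further off. Mitigations:
  - Add a check-in every 5 steps: <self_review>What progress have I made? Have I drifted from the original task?</self_review>
  - Set a hard cap on max iterations (10-15)
- Prompt caching is essential to agent economics. The system prompt and tool schemas are re-sent on every turn; without caching, that part of the bill is roughly 5-10x higher. Always set cache_control: {"type": "ephemeral"}.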
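The file_id approach for long tool results can be sketched as a small on-disk store: short outputs go back inline, long ones are written to a file and replaced by an id plus a preview, and a `read_file` tool reads a character range on demand. The class name `SpillStore`, the 2000-character limits, and the dict layout are assumptions for illustration, not part of react.py.

```python
from __future__ import annotations

import tempfile
import uuid
from pathlib import Path


class SpillStore:
    """Keep long tool outputs on disk and hand the agent a short file_id instead."""

    def __init__(self, root: str | None = None, inline_limit: int = 2000):
        self.root = Path(root or tempfile.mkdtemp(prefix="tool_results_"))
        self.inline_limit = inline_limit

    def wrap(self, text: str) -> dict:
        """Return the text inline if short, otherwise a file_id plus a preview."""
        if len(text) <= self.inline_limit:
            return {"inline": text}
        file_id = uuid.uuid4().hex
        (self.root / f"{file_id}.txt").write_text(text, encoding="utf-8")
        return {"file_id": file_id, "chars": len(text), "preview": text[:200]}

    def read_file(self, file_id: str, start: int = 0, length: int = 2000) -> str:
        """Read a character range of a stored result (what a read_file tool would call)."""
        if not file_id.isalnum():  # reject crafted ids such as "../secrets"
            raise ValueError("invalid file_id")
        text = (self.root / f"{file_id}.txt").read_text(encoding="utf-8")
        return text[start:start + length]


store = SpillStore(inline_limit=10)
ref = store.wrap("x" * 50 + "y" * 50)      # too long: spilled to disk
print(ref["chars"], store.read_file(ref["file_id"], start=45, length=10))
```

A handler such as `_t_fetch_filing` would return `json.dumps(store.wrap(full_text))`, so the context only ever carries the preview and whichever ranges the agent explicitly asks for.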
7. Cost & Latency
| Model | Cost per task (4 iterations) | Latency |
|---|---|---|
| claude-opus-4-7 | $0.05-0.10 | 12-18s |
| claude-sonnet-4-6 | $0.01-0.02 | 6-10s |
| claude-haiku-4-5 | $0.002-0.005 | 3-5s |
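Levers:
- On a cache hit, the cached input tokens are billed at 1/10 of the base input price
- Short tool results → total tokens roughly halved
- A thinking budget of 8000 is the sweet spot; 2000 is too short, and 16000 gives diminishing returns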
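The cache lever can be made concrete with a little arithmetic. The sketch below prices only the fixed prefix (system prompt + tool schemas) that is re-sent every iteration, using the illustrative Opus rates from `Trace.cost_usd` ($15/M input, $18.75/M cache write, $1.5/M cache read). The function name `prefix_cost_usd` is hypothetical, and the model assumes the cache stays warm for the whole run, which is an assumption and not a guarantee.

```python
def prefix_cost_usd(
    prefix_tokens: int,
    iterations: int,
    input_rate: float,
    cache_write_rate: float,
    cache_read_rate: float,
) -> dict:
    """Cost of re-sending a fixed prefix on every iteration; rates are USD per million tokens."""
    # Without caching, the whole prefix is billed at the input rate on every call.
    uncached = prefix_tokens * iterations * input_rate / 1_000_000
    # With caching, the first call writes the prefix and every later call reads it.
    cached = (
        prefix_tokens * cache_write_rate
        + prefix_tokens * (iterations - 1) * cache_read_rate
    ) / 1_000_000
    return {"uncached": uncached, "cached": cached, "ratio": uncached / cached}


costs = prefix_cost_usd(3_000, 10, input_rate=15.0, cache_write_rate=18.75, cache_read_rate=1.5)
print({name: round(value, 3) for name, value in costs.items()})
```

For a 3k-token prefix over 10 iterations this gives about $0.45 uncached against about $0.097 cached, a ratio near 4.65. The ratio does not depend on the prefix size, and it climbs toward 10x as the number of iterations grows.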
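8. Quick reference
Anthropic tool_use field cheat sheet
| Field | Description |
|---|---|
| tools | Array; each entry has name / description / input_schema |
| tool_choice | {"type":"auto"} (default) / {"type":"any"} forces some tool call / {"type":"tool","name":"x"} forces tool x / {"type":"none"}; with extended thinking enabled, only auto and none are accepted |
| tool_choice.disable_parallel_tool_use | bool, default false (multiple tool_use blocks per turn are allowed); set inside tool_choice, not at the top level |
| Response: content[i].type | text / thinking / redacted_thinking / tool_use (tool_result is a request-side block that you send back in a user message) |
| Response: stop_reason | end_turn / tool_use / max_tokens / stop_sequence / pause_turn / refusal |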
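Several of these fields constrain each other, so it can help to assemble the request in one place. The sketch below builds keyword arguments for `client.messages.create()` and enforces two rules as I understand them from the API documentation: `max_tokens` has to exceed the thinking `budget_tokens`, and a forced tool choice is not accepted together with extended thinking. The helper name `build_request_kwargs` and its parameters are hypothetical.

```python
from __future__ import annotations


def build_request_kwargs(
    model: str,
    tools: list[dict],
    messages: list[dict],
    thinking_budget: int = 0,
    answer_tokens: int = 4096,
    force_tool: str | None = None,
    parallel_tools: bool = True,
) -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    tool_choice: dict = (
        {"type": "tool", "name": force_tool} if force_tool else {"type": "auto"}
    )
    if not parallel_tools:
        # The flag lives inside tool_choice, not at the top level of the request.
        tool_choice["disable_parallel_tool_use"] = True

    kwargs: dict = {
        "model": model,
        "tools": tools,
        "messages": messages,
        "tool_choice": tool_choice,
        "max_tokens": answer_tokens,
    }
    if thinking_budget > 0:
        if force_tool:
            raise ValueError("a forced tool choice cannot be combined with extended thinking")
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
        # Leave room for the visible answer on top of the thinking budget.
        kwargs["max_tokens"] = thinking_budget + answer_tokens
    return kwargs


kw = build_request_kwargs("claude-opus-4-7", tools=[], messages=[], thinking_budget=8000)
print(kw["max_tokens"], kw["tool_choice"])
```

The result would be passed as `client.messages.create(**kw)`; keeping the checks in a plain function means they can be unit-tested without an API key.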
9. Interview questions
Q1: What makes ReAct better than pure Act? Why not let the LLM call tools directly?
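A: The Thought step in ReAct brings two key benefits: (1) the LLM reasons explicitly about why it is calling this tool and how to choose the arguments, which reduces argument errors; (2) it leaves room for self-correction (after a tool failure, the next Thought can reflect on it). In the paper, ReAct beats Act-only on HotpotQA, FEVER, ALFWorld, and WebShop; the margin is small on the QA tasks (about 2 points on HotpotQA) and much larger on the interactive ones.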
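Q2: How do you keep a ReAct agent from falling into an infinite loop?
A: Defense in depth: (1) a hard cap on max_iterations (10-15); (2) a system prompt that says "if you fail N times in a row or get no new information, stop and report"; (3) a detector for repeated calls to the same tool with the same arguments; (4) periodic self-review on long traces; (5) a cost guardrail (force a stop above $X).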
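The repeated-call detector and the cost guardrail from the answer to Q2 could look roughly like this. The class name `LoopGuard`, its thresholds, and the stop-reason strings are assumptions for illustration; in react.py the check would run just before `_execute_tool`, with the running cost taken from `trace.cost_usd(model)`.

```python
from __future__ import annotations

import json
from collections import Counter


class LoopGuard:
    """Stop an agent run on repeated identical tool calls or on a budget overrun."""

    def __init__(self, max_repeats: int = 3, max_cost_usd: float = 1.0):
        self.max_repeats = max_repeats
        self.max_cost_usd = max_cost_usd
        self.seen: Counter = Counter()

    def check(self, tool_name: str, tool_input: dict, cost_so_far: float) -> str | None:
        """Return a stop reason, or None if the run may continue."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        key = tool_name + ":" + json.dumps(tool_input, sort_keys=True)
        self.seen[key] += 1
        if self.seen[key] >= self.max_repeats:
            return f"repeated_call: {tool_name} x{self.seen[key]}"
        if cost_so_far > self.max_cost_usd:
            return "cost_limit"
        return None


guard = LoopGuard(max_repeats=2, max_cost_usd=0.50)
first = guard.check("get_price", {"ticker": "AAPL"}, cost_so_far=0.10)
second = guard.check("get_price", {"ticker": "AAPL"}, cost_so_far=0.12)
print(first, "|", second)
```

Instead of aborting outright, the stop reason can also be fed back to the model as an `is_error` tool result, which gives it one chance to change approach before the hard cap ends the run.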
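Q3: What do you do when a tool result is too long?
A: (1) Summarize inside the tool handler (run the summary on a cheap model); (2) write to the file system and return a file_id; (3) switch to a 1M-context model; (4) periodically squash the history in messages.
Q4: How much more does an agent cost in production without prompt caching?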
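A: Assume system + tool schemas = 3k tokens and the agent runs 10 iterations. Without caching, the prefix is re-sent every time: 30k tokens × $15/M = $0.45 of input for the prefix alone. With caching, the first call writes the prefix at $18.75/M (3k tokens ≈ $0.056) and the other 9 calls read it at $1.5/M (27k tokens ≈ $0.041), about $0.097 in total. That is roughly a 5x gap on the prefix; the absolute savings grow with heavier schemas, and the ratio approaches 10x as the iteration count grows.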
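Q5: You wrote a ReAct agent with LangGraph and hit an infinite loop while debugging. How do you debug it?
A: Steps: (1) print the stop_reason of every LLM call; (2) print each tool_use input; (3) check whether the same tool is being called over and over; (4) read the thoughts (thinking blocks) for self-contradictions; (5) temporarily lower max_iter to 3 and watch the progress; (6) if needed, fall back to the bare SDK to reproduce the problem and rule out framework side effects.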
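Preview of tomorrow
Day 151: Plan-and-Execute: why ReAct is not enough and you should Plan before you Execute
- ReAct's blind spots: greedy at every step, no global view
- Plan-and-Execute (BabyAGI / LangChain plan-execute): a Planner LLM lists the steps first, then an Executor runs them one by one
- Implement a plan_agent.py and compare it with ReAct on long tasks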