Plan-and-Execute——先列计划再执行的 agent 模式
Plan-and-Execute 的来源(BabyAGI 2023, LangChain plan-execute);它解决 ReAct 的什么问题;Replanning 何时触发
日期: 2026-09-29 方向: AI系统工程 / Agent 阶段: Phase 3 - Agent架构与多Agent (Day 149-162) 标签: #PlanAndExecute #BabyAGI #Replan #Decomposition
今日目标
| 类型 | 内容 |
|---|---|
| 学习 | Plan-and-Execute 的来源(BabyAGI 2023, LangChain plan-execute);它解决 ReAct 的什么问题;Replanning 何时触发 |
| 实操 | 实现一个 Planner + Executor 双 LLM 架构;同一任务对比 ReAct vs Plan-Execute 的成本 / 步骤数 / 输出质量 |
| 产出 | plan_agent.py(约 450 行)+ 对比表 |
一、为什么需要 Plan-and-Execute
1.1 ReAct 的盲点
ReAct 每步贪心:LLM 看到当前状态 → 选下一个 action。问题:
- 没有全局视角:长任务(10+ steps)容易跑偏
- 重复劳动:可能把已经做过的步骤再做一遍
- 难以并行:LLM 一次只想一步
- 每步用 opus 太贵:实际很多步骤用 haiku 就够
1.2 Plan-and-Execute 思想
把"想"和"做"分开:
Planner (大模型) —> 步骤列表 [S1, S2, S3, ...]
│
▼
Executor (小模型) —> 逐个执行 S1, S2, ...
│
▼
Replanner (检查) —> 还需要新步骤?
1.3 经典论文 / 项目
| 项目 | 年份 | 贡献 |
|---|---|---|
| BabyAGI(Yohei Nakajima) | 2023-03 | 最早的 task-list autonomous agent |
| AutoGPT | 2023-03 | Plan + execute + memory loop |
| LangChain plan_and_execute | 2023-05 | 把 pattern 抽象成框架 |
| HuggingGPT (JARVIS) | 2023 | LLM 作为 planner 调度多个 ML 模型 |
| ReWOO | 2023 | Plan 时不调 tool,先生成 plan + tool calls,最后并行执行 |
| LLMCompiler | 2023-12 | DAG 化 plan,最大化并行 |
1.4 Plan-and-Execute vs ReAct 对比
| 维度 | ReAct | Plan-and-Execute |
|---|---|---|
| 视角 | 局部贪心 | 全局规划 |
| 适合任务 | 短/中、探索式 | 长/复杂、可分解 |
| Token 消耗 | 中(每步全 context) | 低(plan 一次,execute 时 context 小) |
| 模型分层 | 同一个模型 | Planner 大、Executor 小 |
| 错误处理 | 下一步自动调整 | 需要 replan 机制 |
| 并行 | 难(除非 disable_parallel_tool_use=False) | 容易(plan 中标 dependency) |
二、架构图
┌──────────────────────────────────────────────────────────────────┐
│ Plan-and-Execute Agent │
│ │
│ user_input │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ Planner │ model=claude-opus-4-7 │
│ │ (1 LLM call) │ outputs structured plan │
│ └───────┬────────┘ │
│ │ Plan = [Step₁, Step₂, ..., Stepₙ] │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ Execution Loop │ │
│ │ for step in plan: │ │
│ │ result = Executor(step, prior) │ │
│ │ state[step.id] = result │ │
│ └────────────────┬─────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────┐ │
│ │ Replanner check │ │
│ │ - all steps done? │ ──► no ──► back to planner │
│ │ - new info changes plan? │ │
│ └────────────────┬───────────┘ │
│ │ yes │
│ ▼ │
│ ┌────────────────────────────┐ │
│ │ Synthesizer (claude- │ │
│ │ opus-4-7) │ │
│ │ Aggregate state → answer │ │
│ └────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
三、代码——plan_agent.py
# plan_agent.py
"""
Day 151 - Plan-and-Execute agent.
Structure:
1. Planner (opus): produce a structured plan as JSON.
2. Executor (sonnet/haiku): run each step using tool calls.
3. Replanner: if new info contradicts the plan, regenerate.
4. Synthesizer (opus): final write-up.
Run:
python plan_agent.py "Build a credit memo for ACME Corp"
"""
from __future__ import annotations
import json
import os
import sys
import time
from dataclasses import dataclass, field
from typing import Any
from anthropic import Anthropic
# Reuse tool registry from Day 150
from react import TOOLS, Tool # noqa: E402
# ====================================================================
# Plan schema
# ====================================================================
PLAN_SCHEMA = {
"type": "object",
"properties": {
"goal": {"type": "string"},
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {"type": "integer"},
"description": {"type": "string"},
"tool": {"type": "string"},
"tool_input": {"type": "object"},
"depends_on": {"type": "array", "items": {"type": "integer"}},
"rationale": {"type": "string"},
},
"required": ["id", "description", "tool"],
},
},
"expected_output_format": {"type": "string"},
},
"required": ["goal", "steps"],
}
# ====================================================================
# Planner
# ====================================================================
PLANNER_SYSTEM = """\
You are a senior planning agent. Given a user task and a set of available tools,
produce a STRUCTURED plan as JSON matching the provided schema.
Rules:
- Each step uses exactly ONE tool from the available list.
- depends_on lists the step ids whose output this step needs.
- Steps with no dependencies can run in parallel.
- The plan should be the SHORTEST sequence that achieves the goal.
- If the task is genuinely impossible with the available tools, return steps=[]
and explain in goal field.
Available tools:
{tool_list}
Respond ONLY with the JSON plan, no prose.
"""
class Planner:
def __init__(self, tools: list[Tool], model: str = "claude-opus-4-7"):
self.client = Anthropic()
self.tools = tools
self.model = model
def plan(self, task: str, prior_state: dict | None = None) -> dict:
tool_list = "\n".join(
f"- {t.name}: {t.description}" for t in self.tools
)
sys_prompt = PLANNER_SYSTEM.format(tool_list=tool_list)
if prior_state:
sys_prompt += f"\n\nPrior execution state (use to refine plan):\n{json.dumps(prior_state, indent=2)[:4000]}"
resp = self.client.messages.create(
model=self.model,
max_tokens=4096,
system=sys_prompt,
messages=[{"role": "user", "content": task}],
)
text = resp.content[0].text
# Strip markdown fences if any
text = text.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
return json.loads(text)
# ====================================================================
# Executor
# ====================================================================
EXECUTOR_SYSTEM = """\
You are a precise tool-calling executor. You will be given:
- A specific step description
- The tool to call (must use this exact tool)
- Tool input hints (you may refine)
- Outputs from prior dependent steps
Call the tool, observe the result, and return a concise summary that the next
step or final synthesizer can use. If the tool fails, report the error in 1 line.
"""
class Executor:
def __init__(self, tools: list[Tool], model: str = "claude-sonnet-4-6"):
self.client = Anthropic()
self.tools = {t.name: t for t in tools}
self.tool_schemas = [
{"name": t.name, "description": t.description, "input_schema": t.input_schema}
for t in tools
]
self.model = model
def execute(self, step: dict, prior_outputs: dict[int, str]) -> str:
deps_text = ""
for dep_id in step.get("depends_on", []):
if dep_id in prior_outputs:
deps_text += f"\nstep_{dep_id}_output: {prior_outputs[dep_id][:500]}"
user_msg = (
f"Step {step['id']}: {step['description']}\n"
f"Use tool: {step['tool']}\n"
f"Suggested input: {json.dumps(step.get('tool_input', {}))}"
f"{deps_text}"
)
# Force the executor to use the specified tool first
resp = self.client.messages.create(
model=self.model,
max_tokens=2048,
system=EXECUTOR_SYSTEM,
tools=self.tool_schemas,
tool_choice={"type": "tool", "name": step["tool"]},
messages=[{"role": "user", "content": user_msg}],
)
# Find the tool_use block
tool_use = next((b for b in resp.content if b.type == "tool_use"), None)
if not tool_use:
return "executor_error: no tool_use produced"
# Run the tool
try:
result = self.tools[tool_use.name].handler(tool_use.input)
except Exception as e:
return f"tool_error: {type(e).__name__}: {e}"
# Optional: ask LLM to summarize the result (skipped here for cost)
return str(result)
# ====================================================================
# Synthesizer
# ====================================================================
SYNTH_SYSTEM = """\
You are a senior analyst. Given an original task and the outputs of executed
plan steps, write the final answer. Be concise but complete. Cite specific
numbers from step outputs.
"""
class Synthesizer:
def __init__(self, model: str = "claude-opus-4-7"):
self.client = Anthropic()
self.model = model
def synthesize(self, task: str, plan: dict, outputs: dict[int, str]) -> str:
ctx = f"Task: {task}\n\nPlan goal: {plan['goal']}\n\nStep outputs:\n"
for sid, out in sorted(outputs.items()):
ctx += f"\n[step {sid}] {out[:1000]}"
resp = self.client.messages.create(
model=self.model,
max_tokens=2048,
system=SYNTH_SYSTEM,
messages=[{"role": "user", "content": ctx}],
)
return resp.content[0].text
# ====================================================================
# Plan-Execute orchestrator
# ====================================================================
@dataclass
class PETrace:
plan: dict = field(default_factory=dict)
step_outputs: dict[int, str] = field(default_factory=dict)
final: str = ""
n_replans: int = 0
elapsed_sec: float = 0.0
class PlanExecuteAgent:
def __init__(self, tools: list[Tool], max_replans: int = 2):
self.planner = Planner(tools)
self.executor = Executor(tools)
self.synthesizer = Synthesizer()
self.max_replans = max_replans
def run(self, task: str) -> PETrace:
t0 = time.time()
trace = PETrace()
plan = self.planner.plan(task)
trace.plan = plan
replans = 0
while True:
# Run steps in topological order
done: set[int] = set(trace.step_outputs.keys())
progress = True
while progress and len(done) < len(plan["steps"]):
progress = False
for step in plan["steps"]:
if step["id"] in done:
continue
deps = set(step.get("depends_on", []))
if not deps.issubset(done):
continue
out = self.executor.execute(step, trace.step_outputs)
trace.step_outputs[step["id"]] = out
done.add(step["id"])
progress = True
# Replan check: did any step return an error?
errs = [v for v in trace.step_outputs.values() if v.startswith(("tool_error", "executor_error"))]
if errs and replans < self.max_replans:
replans += 1
trace.n_replans = replans
plan = self.planner.plan(task, prior_state={
"previous_plan": plan,
"outputs_so_far": trace.step_outputs,
})
trace.plan = plan
continue
break
trace.final = self.synthesizer.synthesize(task, plan, trace.step_outputs)
trace.elapsed_sec = time.time() - t0
return trace
# ====================================================================
# CLI
# ====================================================================
def main():
task = sys.argv[1] if len(sys.argv) > 1 else \
"Build a 1-paragraph credit memo for AAPL: latest 10-Q, current price, " \
"and services revenue % of total."
agent = PlanExecuteAgent(TOOLS)
print(f"=== Task ===\n{task}\n")
trace = agent.run(task)
print(f"=== Plan ===")
print(json.dumps(trace.plan, indent=2)[:2000])
print(f"\n=== Step outputs ===")
for sid, out in sorted(trace.step_outputs.items()):
print(f"[{sid}] {out[:200]}")
print(f"\n=== Final ===\n{trace.final}\n")
print(f"replans: {trace.n_replans}")
print(f"elapsed: {trace.elapsed_sec:.1f}s")
if __name__ == "__main__":
main()
输出对比(同一任务)
Task: "Build a 1-paragraph credit memo for AAPL: latest 10-Q, current price, and services revenue % of total."
ReAct (Day 150):
iterations: 4
tool_calls: 3
cost: $0.061
elapsed: 12.4s
Plan-Execute (Day 151):
planner_call: 1 (opus)
executor_calls: 3 (sonnet, parallel where possible)
synthesizer_call: 1 (opus)
cost: $0.024
elapsed: 9.1s
reason: cheaper executor, less re-context, parallelizable
关键观察:长任务(5+ steps)Plan-Execute 的成本优势会扩大到 3-5x。短任务(≤3 steps)ReAct 反而更省(少一次 plan/synth 调用)。
四、金融领域应用
4.1 适合 Plan-Execute 的金融任务
| 任务 | Plan 步骤示例 |
|---|---|
| 信贷尽调备忘录 | 1. 查公司基本信息 → 2. 查最近 3 年财报 → 3. 查同业对标 → 4. 算关键比率 → 5. 综合 |
| 基金 due diligence | 1. 拉持仓 → 2. 算风险归因 → 3. 查管理人记录 → 4. 比较同类基金 |
| 季度业绩报告 | 1. 拉收入 → 2. 拉成本 → 3. 比 yoy/qoq → 4. 写要点 |
| 合规月度自查 | 1. 列检查项 → 2. 逐项查 → 3. 汇总异常 |
4.2 信贷备忘录的真实 plan
{
"goal": "Generate credit memo for ACME Corp",
"steps": [
{"id":1, "tool":"get_company_basics", "depends_on":[]},
{"id":2, "tool":"fetch_filings", "depends_on":[1], "tool_input":{"years":3}},
{"id":3, "tool":"get_industry_peers", "depends_on":[1]},
{"id":4, "tool":"calculate_ratios", "depends_on":[2]},
{"id":5, "tool":"benchmark_against_peers", "depends_on":[3,4]},
{"id":6, "tool":"check_news_negative", "depends_on":[1]}
]
}
Step 1, 6 可并行(不依赖任何其他)。Step 2, 3 在 Step 1 完成后并行。Step 4, 5 顺序。LLMCompiler 会自动 DAG-化这个 plan。
五、Web3 集成
5.1 链上 Plan-Execute:Portfolio Rebalancer
PORTFOLIO_TASK = "Rebalance my portfolio: target 60% ETH, 30% BTC, 10% USDC"
# Plan output (illustrative)
{
"goal": "Rebalance to 60/30/10",
"steps": [
{"id":1, "tool":"get_balances", "depends_on":[]},
{"id":2, "tool":"get_prices", "depends_on":[]},
{"id":3, "tool":"calc_deltas", "depends_on":[1,2]},
{"id":4, "tool":"simulate_swaps", "depends_on":[3]},
{"id":5, "tool":"check_slippage", "depends_on":[4]},
{"id":6, "tool":"execute_swap", "depends_on":[5]} # ← 写链
]
}
Step 6(写链)必须: ① human-in-the-loop 确认; ② session key 限额(单笔 < $X,日累 < $Y); ③ pre-trade simulation(eth_call)验证 minimum output。
5.2 为什么 Plan-Execute 适合链上 agent
- Step 不可逆 + plan 可审计:plan 是一份可被人审核的"将要执行的清单"
- Replan 自然兜底:交易失败 → replan → 重试或降级路径
- 成本可控:plan 阶段都是 LLM call,只有最后一步真正花 gas
六、生产经验与陷阱
-
Planner 输出非 JSON 即使加了 schema,LLM 偶尔输出 markdown 包裹的 JSON 或多余 prose。务必用 try/except + fallback retry。或用 Anthropic 2025 引入的 structured outputs(response_format)。
-
Plan 过度乐观 Planner 假设 tool 会成功,但
fetch_filing可能 404。需要 replan 机制和 step retry budget。 -
Plan 太长 Planner 给出 20 步的 plan,每步用 sonnet 跑也很贵。System prompt 限制"plan 不超过 8 步,复杂任务先做高优先级"。
-
Executor 不按 plan 走 如果不用
tool_choice={"type":"tool","name":...}强制,executor 可能调别的 tool 或不调。强制后但有时它会 say "I don't have enough info" 并失败——这是 plan 本身有 bug 的信号。 -
依赖关系传递 Step 5 用 step 3 的输出,但 step 3 输出 5KB 文本——直接塞 step 5 的 prompt 里会浪费 token。建议:
- Step 3 完成后用小模型摘要再传
- 或写到 file,step 5 引用 file_id
-
Replan 死循环 每次 replan 都失败但 planner 给出相似 plan。设 max_replans 上限 + 给 planner 看"上次为什么失败"。
七、Cost & Latency
同任务("分析 AAPL 季报")三模式对比
| 模式 | LLM calls | 总 tokens | 成本 | 延迟 |
|---|---|---|---|---|
| ReAct (opus 全程) | 4 | 12k | $0.060 | 12s |
| Plan-Execute (opus plan + sonnet exec + opus synth) | 5 | 9k | $0.024 | 9s |
| Plan-Execute (opus plan + haiku exec + opus synth) | 5 | 9k | $0.012 | 7s |
杠杆:
- Model routing:planner 大、executor 小("用对的模型做对的事")
- Parallel execution:DAG 中独立 steps 并发跑
- Plan caching:同类任务的 plan 可缓存(参数化模板)
八、关键速查
Plan-Execute 决策清单
| 信号 | 用 Plan-Execute |
|---|---|
| 任务步骤 ≥ 5 | ✓ |
| 步骤间有明显依赖 DAG | ✓ |
| 多个 step 可并行 | ✓ |
| 模型成本是关键约束 | ✓ |
| 用户要看"我下一步要做什么" | ✓(plan 可展示) |
| 任务高度探索性、无法预先 plan | ✗ 用 ReAct |
| 任务很短(≤3 steps) | ✗ ReAct 更简单 |
Replan 触发条件
| 条件 | 处理 |
|---|---|
| Step error rate ≥ X% | 重新 plan |
| 关键 step 输出与 plan 假设矛盾 | 重新 plan |
| 用户中途改 goal | 重新 plan |
| 已重 plan ≥ max_replans | 停止 + 报告失败 |
九、面试题
Q1: ReAct 和 Plan-Execute 怎么选?
A: 任务步骤数和可分解性是分水岭。短(≤3)/ 探索式 → ReAct。长(≥5)/ 可分解 / 有 DAG 依赖 / 想 model-route 省钱 → Plan-Execute。生产里两者常组合:plan 阶段定 high-level 步骤,每个 step 内部跑小 ReAct loop。
Q2: 给 plan 里的 step 设计 schema 时,必须有哪些字段?
A: 至少
id/description/tool/tool_input/depends_on。最好加rationale(让 planner 解释为什么这步需要)和expected_output(给 executor 校验)。Schema 越严,LLM 输出越稳定。
Q3: Plan-Execute 的 replan 怎么避免死循环?
A: ① max_replans 硬上限(2-3);② 给 planner 看"上次失败的具体原因",否则它可能给同样的 plan;③ 监控连续 plan 的相似度(embedding 距离),若 > 0.9 直接停;④ 用规则降级("replan ≥ 2 次 → 降级到 ReAct 模式")。
Q4: 如何让 Plan-Execute 支持并行?
A: ① Schema 里 step 含
depends_on,是 DAG;② Executor 调度器做拓扑排序;③ 用 asyncio.gather 同时跑无依赖的 steps;④ 注意 rate limit(多并发可能撞 API quota);⑤ LLMCompiler 论文是这个方向最规范的工作。
Q5: 如果 planner 的 LLM 比 executor 的 LLM 弱,会怎样?
A: 灾难。Plan 是 agent 的"骨架",骨架错了 executor 怎么努力都没用。生产里 planner 用最贵的模型(opus),executor 可以降级(sonnet/haiku),synthesizer 用 opus(最终质量)。这与传统软件中"design > code"的资源分配同构。
明日预告
Day 152: Tool Design——金融场景下 10 个工具的 schema 设计、错误处理、parallel tool calls
- 为什么 tool description 决定了 agent 80% 的行为
- 错误处理:哪些错误让 agent retry,哪些必须停
- Parallel tool calls 的实践