Expert Day 151

Plan-and-Execute: the agent pattern that writes a plan first, then executes it

Origins of Plan-and-Execute (BabyAGI 2023, LangChain plan-execute); which ReAct problems it solves; when replanning is triggered

2026-09-29

Phase 3 - Agent Architecture and Multi-Agent (Day 149-162)


Date: 2026-09-29 | Direction: AI Systems Engineering / Agent | Phase: Phase 3 - Agent Architecture and Multi-Agent (Day 149-162) | Tags: #PlanAndExecute #BabyAGI #Replan #Decomposition

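Today's goals

Type	Content
Learn	Origins of Plan-and-Execute (BabyAGI 2023, LangChain plan-execute); which ReAct problems it solves; when replanning is triggered
Practice	Implement a Planner + Executor dual-LLM architecture; on the same task, compare ReAct vs Plan-Execute on cost / step count / output quality
Output	`plan_agent.py` (about 450 lines) + comparison table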

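1. Why Plan-and-Execute

1.1 ReAct's blind spots

ReAct is greedy at every step: the LLM looks at the current state → picks the next action. Problems:

No global view: long tasks (10+ steps) easily drift off course
Duplicated work: it may redo steps it has already completed
Hard to parallelize: the LLM only thinks one step at a time
Running opus on every step is too expensive: in practice many steps are fine with haiku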

1.2 The Plan-and-Execute idea

Separate "thinking" from "doing":

Planner (large model) ──> step list [S1, S2, S3, ...]
                          │
                          ▼
Executor (small model) ─> run S1, S2, ... one by one
                          │
                          ▼
Replanner (check) ──────> are new steps needed?
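The three roles in the diagram can be sketched as a plain control loop. This is a minimal illustration, not the `plan_agent.py` implementation below: the planner, executor, and replan check are hypothetical toy functions standing in for LLM calls, and `max_replans` bounds how often the loop may go back to the planner.

```python
from typing import Callable

Results = dict[str, str]

def plan_execute(
    task: str,
    planner: Callable[[str, Results], list[str]],
    executor: Callable[[str, Results], str],
    needs_replan: Callable[[Results], bool],
    max_replans: int = 2,
) -> Results:
    """Plan up front, execute step by step, and replan at most max_replans times."""
    results: Results = {}
    steps = planner(task, results)            # think: the whole step list in one call
    for attempt in range(max_replans + 1):
        for step in steps:                    # do: each step sees the results so far
            if step not in results:
                results[step] = executor(step, results)
        if not needs_replan(results) or attempt == max_replans:
            break
        steps = planner(task, results)        # check: revise the plan with what was learned
    return results

# Toy stand-ins (hypothetical): the first plan forgets a step and the check notices.
def toy_planner(task: str, results: Results) -> list[str]:
    return ["fetch", "analyze"] if not results else ["fetch", "analyze", "summarize"]

def toy_executor(step: str, results: Results) -> str:
    return f"{step}:ok"

def toy_needs_replan(results: Results) -> bool:
    return "summarize" not in results

out = plan_execute("demo task", toy_planner, toy_executor, toy_needs_replan)
print(out)  # {'fetch': 'fetch:ok', 'analyze': 'analyze:ok', 'summarize': 'summarize:ok'}
```

Completed steps are skipped after a replan, so only the newly added step costs another executor call.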

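1.3 Classic papers / projects

Project	Year	Contribution
BabyAGI (Yohei Nakajima)	2023-03	One of the first task-list autonomous agents
AutoGPT	2023-03	Plan + execute + memory loop
LangChain plan_and_execute	2023-05	Abstracted the pattern into a framework component
HuggingGPT (JARVIS)	2023	LLM as a planner that dispatches multiple ML models
ReWOO	2023	Writes the whole plan, tool calls included, without observing tool results (later calls reference earlier results via placeholders); workers then execute the calls and a solver composes the answer
LLMCompiler	2023-12	Turns the plan into a DAG to maximize parallelism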
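The ReWOO row is easiest to see with a toy worker. In this sketch (an assumption-laden illustration, not ReWOO's code) every tool call is written up front and later inputs refer to earlier results through `#E1`-style placeholders, modeled on ReWOO's evidence notation; the tools are hypothetical lambdas, and the worker fills placeholders instead of re-prompting an LLM between calls.

```python
import re

# A ReWOO-style plan: (evidence variable, tool, input that may reference earlier evidence).
PLAN = [
    ("#E1", "lookup_ticker", "ACME Corp"),
    ("#E2", "get_price", "#E1"),
    ("#E3", "format_memo", "ticker=#E1 price=#E2"),
]

# Hypothetical toy tools standing in for real APIs.
TOOLS = {
    "lookup_ticker": lambda arg: "ACME",
    "get_price": lambda arg: "101.5" if arg == "ACME" else "n/a",
    "format_memo": lambda arg: f"memo[{arg}]",
}

def run_worker(plan: list[tuple[str, str, str]]) -> dict[str, str]:
    evidence: dict[str, str] = {}
    for var, tool, arg in plan:
        # Substitute results of earlier calls; no LLM call happens between tool calls.
        filled = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], arg)
        evidence[var] = TOOLS[tool](filled)
    return evidence

evidence = run_worker(PLAN)
print(evidence["#E3"])  # memo[ticker=ACME price=101.5]
```

In ReWOO a separate solver LLM would then read all evidence and write the final answer.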

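1.4 Plan-and-Execute vs ReAct

Dimension	ReAct	Plan-and-Execute
Perspective	Local, greedy	Global planning
Best for	Short/medium, exploratory tasks	Long/complex, decomposable tasks
Token usage	Medium (full context resent on every step)	Low (plan once; executor context stays small)
Model tiering	Same model throughout	Large planner, small executor
Error handling	Next step adjusts automatically	Needs a replan mechanism
Parallelism	Hard across steps (only tool calls within one turn can run in parallel; Anthropic's API allows this unless disable_parallel_tool_use=true)	Easy (the plan marks dependencies)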
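The "Token usage" row comes from how context grows. The toy model below assumes each ReAct call re-sends the base prompt plus every earlier turn, while each Plan-Execute executor sees only its own instructions and one dependency output. The numbers (`base=2000`, `per_step=500`, `step_prompt=300`) are arbitrary placeholders, and output tokens and per-model prices are ignored.

```python
def react_input_tokens(n_steps: int, base: int, per_step: int) -> int:
    # ReAct: call k re-sends the base prompt plus all k earlier thought/action/observation turns.
    return sum(base + k * per_step for k in range(n_steps))

def plan_execute_input_tokens(n_steps: int, base: int, per_step: int, step_prompt: int) -> int:
    planner = base                                    # one planning call over the task
    executors = n_steps * (step_prompt + per_step)    # each step: its instructions + one dependency output
    synthesizer = base + n_steps * per_step           # one final call that reads every step output
    return planner + executors + synthesizer

for n in (3, 10):
    print(n, react_input_tokens(n, 2000, 500), plan_execute_input_tokens(n, 2000, 500, 300))
# 3 7500 7900
# 10 42500 17000
```

ReAct's input grows quadratically with the number of steps and Plan-Execute's grows linearly, so in token terms alone the gap only opens up on longer tasks.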

2. Architecture diagram

┌──────────────────────────────────────────────────────────────────┐
│                  Plan-and-Execute Agent                          │
│                                                                  │
│   user_input                                                     │
│       │                                                          │
│       ▼                                                          │
│   ┌────────────────┐                                            │
│   │   Planner      │  model=claude-opus-4-7                     │
│   │  (1 LLM call)  │  outputs structured plan                   │
│   └───────┬────────┘                                            │
│           │ Plan = [Step₁, Step₂, ..., Stepₙ]                   │
│           ▼                                                      │
│   ┌──────────────────────────────────────────┐                  │
│   │         Execution Loop                    │                  │
│   │   for step in plan:                       │                  │
│   │       result = Executor(step, prior)      │                  │
│   │       state[step.id] = result             │                  │
│   └────────────────┬─────────────────────────┘                  │
│                    │                                            │
│                    ▼                                            │
│   ┌────────────────────────────┐                                │
│   │   Replanner check          │                                │
│   │   - any step failed?       │   ──► yes ──► back to planner │
│   │   - new info breaks plan?  │   (at most max_replans times) │
│   └────────────────┬───────────┘                                │
│                    │ no (all steps done)                        │
│                    ▼                                            │
│   ┌────────────────────────────┐                                │
│   │   Synthesizer (claude-     │                                │
│   │     opus-4-7)              │                                │
│   │   Aggregate state → answer │                                │
│   └────────────────────────────┘                                │
└──────────────────────────────────────────────────────────────────┘

3. Code: `plan_agent.py`

# plan_agent.py
"""
Day 151 - Plan-and-Execute agent.

Structure:
  1. Planner (opus): produce a structured plan as JSON.
  2. Executor (sonnet/haiku): run each step using tool calls.
  3. Replanner: if a step fails, regenerate the plan (up to max_replans times).
  4. Synthesizer (opus): final write-up.

Run:
    python plan_agent.py "Build a credit memo for ACME Corp"
"""
from __future__ import annotations
import json
import sys
import time
from dataclasses import dataclass, field
from typing import Any

from anthropic import Anthropic

# Reuse tool registry from Day 150
from react import TOOLS, Tool  # noqa: E402

# ====================================================================
# Plan schema
# ====================================================================
PLAN_SCHEMA = {
    "type": "object",
    "properties": {
        "goal": {"type": "string"},
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id":          {"type": "integer"},
                    "description": {"type": "string"},
                    "tool":        {"type": "string"},
                    "tool_input":  {"type": "object"},
                    "depends_on":  {"type": "array", "items": {"type": "integer"}},
                    "rationale":   {"type": "string"},
                },
                "required": ["id", "description", "tool"],
            },
        },
        "expected_output_format": {"type": "string"},
    },
    "required": ["goal", "steps"],
}

# ====================================================================
# Planner
# ====================================================================
PLANNER_SYSTEM = """\
You are a senior planning agent. Given a user task and a set of available tools,
produce a STRUCTURED plan as JSON matching the provided schema.

Rules:
- Each step uses exactly ONE tool from the available list.
- depends_on lists the step ids whose output this step needs.
- Steps with no dependencies can run in parallel.
- The plan should be the SHORTEST sequence that achieves the goal.
- If the task is genuinely impossible with the available tools, return steps=[]
  and explain in goal field.

Available tools:
{tool_list}

Respond ONLY with the JSON plan, no prose.
"""

class Planner:
    def __init__(self, tools: list[Tool], model: str = "claude-opus-4-7"):
        self.client = Anthropic()
        self.tools = tools
        self.model = model

    def plan(self, task: str, prior_state: dict | None = None) -> dict:
        tool_list = "\n".join(
            f"- {t.name}: {t.description}" for t in self.tools
        )
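        sys_prompt = PLANNER_SYSTEM.format(tool_list=tool_list)
        # Append the schema so the model actually sees the structure it must follow.
        sys_prompt += f"\n\nJSON schema for the plan:\n{json.dumps(PLAN_SCHEMA)}"
        if prior_state:
            sys_prompt += f"\n\nPrior execution state (use to refine plan):\n{json.dumps(prior_state, indent=2)[:4000]}"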

        resp = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            system=sys_prompt,
            messages=[{"role": "user", "content": task}],
        )
        text = resp.content[0].text
        # Strip markdown fences if any
        text = text.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
        return json.loads(text)

# ====================================================================
# Executor
# ====================================================================
EXECUTOR_SYSTEM = """\
You are a precise tool-calling executor. You will be given:
- A specific step description
- The tool to call (must use this exact tool)
- Tool input hints (you may refine)
- Outputs from prior dependent steps

Call the tool, observe the result, and return a concise summary that the next
step or final synthesizer can use. If the tool fails, report the error in 1 line.
"""

class Executor:
    def __init__(self, tools: list[Tool], model: str = "claude-sonnet-4-6"):
        self.client = Anthropic()
        self.tools = {t.name: t for t in tools}
        self.tool_schemas = [
            {"name": t.name, "description": t.description, "input_schema": t.input_schema}
            for t in tools
        ]
        self.model = model

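    def execute(self, step: dict, prior_outputs: dict[int, str]) -> str:
        # Guard: a hallucinated tool name would make the forced tool_choice call fail.
        if step.get("tool") not in self.tools:
            return f"executor_error: unknown tool {step.get('tool')!r}"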
        deps_text = ""
        for dep_id in step.get("depends_on", []):
            if dep_id in prior_outputs:
                deps_text += f"\nstep_{dep_id}_output: {prior_outputs[dep_id][:500]}"

        user_msg = (
            f"Step {step['id']}: {step['description']}\n"
            f"Use tool: {step['tool']}\n"
            f"Suggested input: {json.dumps(step.get('tool_input', {}))}"
            f"{deps_text}"
        )

        # Force the executor to use the specified tool first
        resp = self.client.messages.create(
            model=self.model,
            max_tokens=2048,
            system=EXECUTOR_SYSTEM,
            tools=self.tool_schemas,
            tool_choice={"type": "tool", "name": step["tool"]},
            messages=[{"role": "user", "content": user_msg}],
        )

        # Find the tool_use block
        tool_use = next((b for b in resp.content if b.type == "tool_use"), None)
        if not tool_use:
            return "executor_error: no tool_use produced"

        # Run the tool
        try:
            result = self.tools[tool_use.name].handler(tool_use.input)
        except Exception as e:
            return f"tool_error: {type(e).__name__}: {e}"

        # Optional: ask LLM to summarize the result (skipped here for cost)
        return str(result)

# ====================================================================
# Synthesizer
# ====================================================================
SYNTH_SYSTEM = """\
You are a senior analyst. Given an original task and the outputs of executed
plan steps, write the final answer. Be concise but complete. Cite specific
numbers from step outputs.
"""

class Synthesizer:
    def __init__(self, model: str = "claude-opus-4-7"):
        self.client = Anthropic()
        self.model = model

    def synthesize(self, task: str, plan: dict, outputs: dict[int, str]) -> str:
        ctx = f"Task: {task}\n\nPlan goal: {plan['goal']}\n\nStep outputs:\n"
        for sid, out in sorted(outputs.items()):
            ctx += f"\n[step {sid}] {out[:1000]}"
        resp = self.client.messages.create(
            model=self.model,
            max_tokens=2048,
            system=SYNTH_SYSTEM,
            messages=[{"role": "user", "content": ctx}],
        )
        return resp.content[0].text

# ====================================================================
# Plan-Execute orchestrator
# ====================================================================
@dataclass
class PETrace:
    plan: dict = field(default_factory=dict)
    step_outputs: dict[int, str] = field(default_factory=dict)
    final: str = ""
    n_replans: int = 0
    elapsed_sec: float = 0.0

class PlanExecuteAgent:
    def __init__(self, tools: list[Tool], max_replans: int = 2):
        self.planner = Planner(tools)
        self.executor = Executor(tools)
        self.synthesizer = Synthesizer()
        self.max_replans = max_replans

    def run(self, task: str) -> PETrace:
        t0 = time.time()
        trace = PETrace()
        plan = self.planner.plan(task)
        trace.plan = plan

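        replans = 0
        err_prefixes = ("tool_error", "executor_error")
        while True:
            # Run ready steps in dependency (topological) order; this version is sequential.
            done: set[int] = set(trace.step_outputs.keys())
            pending = [s for s in plan["steps"] if s["id"] not in done]
            progress = True
            while progress and pending:
                progress = False
                for step in list(pending):
                    deps = set(step.get("depends_on", []))
                    if not deps.issubset(done):
                        continue
                    out = self.executor.execute(step, trace.step_outputs)
                    trace.step_outputs[step["id"]] = out
                    done.add(step["id"])
                    pending.remove(step)
                    progress = True
            # Anything still pending has unsatisfiable deps (missing id or cycle): flag it.
            for step in pending:
                trace.step_outputs[step["id"]] = "executor_error: unmet dependencies"

            # Replan check: did any step return an error?
            errs = [v for v in trace.step_outputs.values() if v.startswith(err_prefixes)]
            if errs and replans < self.max_replans:
                replans += 1
                trace.n_replans = replans
                plan = self.planner.plan(task, prior_state={
                    "previous_plan": plan,
                    "outputs_so_far": trace.step_outputs,
                    "note": "Keep the ids of successfully completed steps unchanged.",
                })
                trace.plan = plan
                # Drop failed outputs so those ids count as not done and run again.
                trace.step_outputs = {
                    k: v for k, v in trace.step_outputs.items() if not v.startswith(err_prefixes)
                }
                continue
            break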

        trace.final = self.synthesizer.synthesize(task, plan, trace.step_outputs)
        trace.elapsed_sec = time.time() - t0
        return trace

# ====================================================================
# CLI
# ====================================================================
def main():
    task = sys.argv[1] if len(sys.argv) > 1 else \
        "Build a 1-paragraph credit memo for AAPL: latest 10-Q, current price, " \
        "and services revenue % of total."

    agent = PlanExecuteAgent(TOOLS)
    print(f"=== Task ===\n{task}\n")
    trace = agent.run(task)

    print("=== Plan ===")
    print(json.dumps(trace.plan, indent=2)[:2000])
    print("\n=== Step outputs ===")
    for sid, out in sorted(trace.step_outputs.items()):
        print(f"[{sid}] {out[:200]}")
    print(f"\n=== Final ===\n{trace.final}\n")
    print(f"replans:    {trace.n_replans}")
    print(f"elapsed:    {trace.elapsed_sec:.1f}s")

if __name__ == "__main__":
    main()
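The orchestrator trusts the planner's JSON: a missing dependency id or a cycle means no step ever becomes ready. A cheap structural check right after `planner.plan()` catches this before any tool runs. The helper below is a hypothetical addition, not part of `plan_agent.py`; `tool_names` would be something like `{t.name for t in TOOLS}` under the assumption that Day 150's `Tool` has a `name` field, and a non-empty result could be fed back to the planner as `prior_state`.

```python
from collections import deque

def validate_plan(plan: dict, tool_names: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the plan looks executable."""
    problems: list[str] = []
    steps = plan.get("steps", [])
    ids = [s.get("id") for s in steps]
    if len(ids) != len(set(ids)):
        problems.append("duplicate step ids")
    known = set(ids)
    for s in steps:
        if s.get("tool") not in tool_names:
            problems.append(f"step {s.get('id')}: unknown tool {s.get('tool')!r}")
        for dep in s.get("depends_on", []):
            if dep not in known:
                problems.append(f"step {s.get('id')}: depends on missing step {dep}")

    # Kahn's algorithm: steps that never reach in-degree 0 sit on a dependency cycle.
    indegree = {i: 0 for i in known}
    children: dict = {i: [] for i in known}
    for s in steps:
        for dep in set(s.get("depends_on", [])) & known:
            indegree[s.get("id")] += 1
            children[dep].append(s.get("id"))
    queue = deque(i for i, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if visited < len(known):
        problems.append("dependency cycle detected")
    return problems

good = {"steps": [{"id": 1, "tool": "a"}, {"id": 2, "tool": "b", "depends_on": [1]}]}
bad = {"steps": [{"id": 1, "tool": "a", "depends_on": [2]},
                 {"id": 2, "tool": "zzz", "depends_on": [1]}]}
print(validate_plan(good, {"a", "b"}))  # []
print(validate_plan(bad, {"a", "b"}))   # ["step 2: unknown tool 'zzz'", 'dependency cycle detected']
```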

Output comparison (same task)

Task: "Build a 1-paragraph credit memo for AAPL: latest 10-Q, current price, and services revenue % of total."

ReAct (Day 150):
  iterations: 4
  tool_calls: 3
  cost:    $0.061
  elapsed: 12.4s

Plan-Execute (Day 151):
  planner_call: 1 (opus)
  executor_calls: 3 (sonnet; independent steps are parallelizable, though plan_agent.py runs them sequentially)
  synthesizer_call: 1 (opus)
  cost:    $0.024
  elapsed: 9.1s
  reason: cheaper executor, less re-context, parallelizable

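Key observation: on long tasks (5+ steps) Plan-Execute's cost advantage can widen to 3-5x. On short tasks (≤3 steps) the separate plan and synthesis calls are a larger share of the total, so ReAct can end up cheaper, especially when it runs on a smaller model (the ReAct run above used opus on every step).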

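4. Applications in finance

4.1 Finance tasks that suit Plan-Execute

Task	Example plan steps
Credit due-diligence memo	1. Look up company basics → 2. Pull the last 3 years of financial statements → 3. Pull peer benchmarks → 4. Compute key ratios → 5. Synthesize
Fund due diligence	1. Pull holdings → 2. Compute risk attribution → 3. Check the manager's track record → 4. Compare with similar funds
Quarterly earnings report	1. Pull revenue → 2. Pull costs → 3. Compare YoY/QoQ → 4. Write key points
Monthly compliance self-check	1. List check items → 2. Check each item → 3. Summarize exceptions

4.2 A realistic credit-memo plan (abbreviated: description fields omitted)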

{
  "goal": "Generate credit memo for ACME Corp",
  "steps": [
    {"id":1, "tool":"get_company_basics", "depends_on":[]},
    {"id":2, "tool":"fetch_filings", "depends_on":[1], "tool_input":{"years":3}},
    {"id":3, "tool":"get_industry_peers", "depends_on":[1]},
    {"id":4, "tool":"calculate_ratios", "depends_on":[2]},
    {"id":5, "tool":"benchmark_against_peers", "depends_on":[3,4]},
    {"id":6, "tool":"check_news_negative", "depends_on":[1]}
  ]
}

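Step 1 has no dependencies, so it runs first. Once it finishes, Steps 2, 3, and 6 can run in parallel (each depends only on Step 1). Step 4 waits for Step 2, and Step 5 waits for both 3 and 4, so 1 → 2 → 4 → 5 is the critical path. LLMCompiler-style schedulers turn a plan like this into a DAG and dispatch it automatically.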
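The parallelism described above can be checked mechanically by grouping steps into "waves": every step in a wave depends only on earlier waves. This is a minimal sketch over the ACME plan's `depends_on` fields (tool names dropped for brevity); `parallel_waves` is a hypothetical helper, not an LLMCompiler API.

```python
def parallel_waves(steps: list[dict]) -> list[list[int]]:
    """Group step ids into waves; each wave depends only on steps in earlier waves."""
    done: set[int] = set()
    remaining = {s["id"]: set(s.get("depends_on", [])) for s in steps}
    waves: list[list[int]] = []
    while remaining:
        ready = sorted(i for i, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"unsatisfiable dependencies: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for i in ready:
            del remaining[i]
    return waves

acme_steps = [
    {"id": 1, "depends_on": []},
    {"id": 2, "depends_on": [1]},
    {"id": 3, "depends_on": [1]},
    {"id": 4, "depends_on": [2]},
    {"id": 5, "depends_on": [3, 4]},
    {"id": 6, "depends_on": [1]},
]
print(parallel_waves(acme_steps))  # [[1], [2, 3, 6], [4], [5]]
```

Four waves for six steps: wall-clock time is bounded by the critical path, not the step count.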

5. Web3 integration

5.1 On-chain Plan-Execute: Portfolio Rebalancer

PORTFOLIO_TASK = "Rebalance my portfolio: target 60% ETH, 30% BTC, 10% USDC"

# Plan output (illustrative)
{
  "goal": "Rebalance to 60/30/10",
  "steps": [
    {"id":1, "tool":"get_balances",    "depends_on":[]},
    {"id":2, "tool":"get_prices",      "depends_on":[]},
    {"id":3, "tool":"calc_deltas",     "depends_on":[1,2]},
    {"id":4, "tool":"simulate_swaps",  "depends_on":[3]},
    {"id":5, "tool":"check_slippage",  "depends_on":[4]},
    {"id":6, "tool":"execute_swap",    "depends_on":[5]}   # ← writes to chain
  ]
}

Step 6 (the on-chain write) must have: ① human-in-the-loop confirmation; ② session-key limits (per transaction < $X, daily total < $Y); ③ a pre-trade simulation (eth_call) that verifies the minimum output.
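The three requirements can be enforced as a gate that runs before the write step is allowed to execute. A minimal sketch, assuming hypothetical names (`SwapRequest`, `swap_blockers`) and placeholder limits; `simulated_out` is assumed to come from an earlier simulation step such as an eth_call. This is only the agent-side check: real session-key limits should also be enforced by the wallet or contract, since the agent process itself may be compromised.

```python
from dataclasses import dataclass

@dataclass
class SwapRequest:
    usd_value: float        # notional value of this swap in USD
    simulated_out: float    # output amount from the pre-trade simulation (e.g. an eth_call)
    min_out: float          # minimum acceptable output after slippage
    human_approved: bool    # set only after a person has reviewed the plan

def swap_blockers(req: SwapRequest, spent_today_usd: float,
                  per_tx_limit_usd: float, daily_limit_usd: float) -> list[str]:
    """Return every reason to block the write step; an empty list means it may proceed."""
    reasons = []
    if not req.human_approved:
        reasons.append("missing human approval")
    if req.usd_value >= per_tx_limit_usd:                     # rule: single tx < $X
        reasons.append("per-transaction limit exceeded")
    if spent_today_usd + req.usd_value >= daily_limit_usd:    # rule: daily total < $Y
        reasons.append("daily limit exceeded")
    if req.simulated_out < req.min_out:
        reasons.append("simulated output below minimum")
    return reasons

ok = SwapRequest(usd_value=500, simulated_out=0.99, min_out=0.98, human_approved=True)
risky = SwapRequest(usd_value=2_000, simulated_out=0.95, min_out=0.98, human_approved=False)
print(swap_blockers(ok, 1_000, 1_000, 5_000))     # []
print(swap_blockers(risky, 4_000, 1_000, 5_000))  # all four reasons
```

Returning every reason (rather than failing on the first) gives the human reviewer and the replanner the full picture in one pass.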

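5.2 Why Plan-Execute suits on-chain agents

Irreversible steps + auditable plan: the plan is a human-reviewable "list of what is about to be executed"
Replan as a natural fallback: transaction fails → replan → retry or take a degraded path
Controllable cost: planning is just LLM calls and steps 1-5 only read state or simulate (no gas); only the final execute_swap step actually spends gas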

6. Production lessons and pitfalls

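Planner output isn't valid JSON: even with a schema in the prompt, the LLM occasionally wraps the JSON in markdown or adds extra prose. Always use try/except plus a fallback retry, or constrain the output at the API level: force a tool call whose input_schema is the plan schema, or use Anthropic's structured-outputs feature (introduced in 2025; check the current API docs for the exact parameter).
Over-optimistic plans: the planner assumes every tool will succeed, but fetch_filing may return a 404. You need a replan mechanism and a per-step retry budget.
Plans that are too long: the planner produces a 20-step plan, and even running each step on sonnet gets expensive. Constrain it in the system prompt: "no more than 8 steps; for complex tasks, do the high-priority parts first".
Executor ignores the plan: without forcing it via tool_choice={"type":"tool","name":...}, the executor may call a different tool or none at all. Even when forced, it sometimes says "I don't have enough info" and fails; treat that as a signal that the plan itself has a bug.
Passing dependencies: step 5 uses step 3's output, but step 3 produces 5KB of text, and stuffing it straight into step 5's prompt wastes tokens. Suggestions:
- After step 3 finishes, summarize its output with a small model before passing it on
- Or write it to a file and have step 5 reference a file_id
Replan infinite loop: every replan fails, yet the planner keeps producing a near-identical plan. Set a max_replans cap and show the planner "why the last attempt failed".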
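For the first pitfall, a tolerant parser plus a bounded retry that feeds the parse error back to the model is a common fallback. A minimal sketch: `call_llm` is a hypothetical prompt-to-text wrapper around whatever client you use, and the greedy `{...}` match assumes the reply contains a single JSON object (trailing prose containing `}` would break it).

```python
import json
import re
from typing import Callable, Optional

def extract_json(text: str) -> dict:
    """Pull the outermost {...} out of a reply that may include fences or prose."""
    match = re.search(r"\{.*\}", text, re.DOTALL)   # greedy: first '{' to last '}'
    if match is None:
        raise ValueError("no JSON object found")
    return json.loads(match.group(0))

def plan_with_retry(call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """call_llm is any prompt -> text function (a hypothetical wrapper around your client)."""
    error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            plan = extract_json(call_llm(prompt))
            if "steps" in plan:
                return plan
            error = ValueError("missing 'steps' key")
        except ValueError as e:        # json.JSONDecodeError is a ValueError subclass
            error = e
        # Feed the failure back so the next attempt can self-correct.
        prompt = f"{prompt}\n\nYour previous reply was invalid ({error}). Reply with the JSON plan only."
    raise RuntimeError(f"planner failed after {max_attempts} attempts: {error}")

# Fake model: first reply is prose only, second wraps the JSON in a markdown fence.
replies = iter(["Sure! Here is the plan you asked for.",
                '```json\n{"goal": "g", "steps": []}\n```'])
plan = plan_with_retry(lambda p: next(replies), "make a plan")
print(plan)  # {'goal': 'g', 'steps': []}
```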

7. Cost & Latency

Three modes on the same task ("analyze AAPL's quarterly report")

Mode	LLM calls	Total tokens	Cost	Latency
ReAct (opus throughout)	4	12k	$0.060	12s
Plan-Execute (opus plan + sonnet exec + opus synth)	5	9k	$0.024	9s
Plan-Execute (opus plan + haiku exec + opus synth)	5	9k	$0.012	7s

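Levers:

Model routing: big planner, small executor ("use the right model for the right job")

Parallel execution: run independent steps of the DAG concurrently

Plan caching: plans for similar tasks can be cached (as parameterized templates)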
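Plan caching can be as simple as keying plans by the task with its parameter replaced by a slot, so the planner runs once per template. A minimal sketch with a single hypothetical `ticker` parameter; `PlanCache` and `make_plan` are illustrative names, and real tasks with several parameters would need a proper template extractor.

```python
import copy
import re
from typing import Callable

class PlanCache:
    """Hypothetical helper: cache one plan per task template, then fill in parameters."""

    def __init__(self) -> None:
        self._plans: dict[str, dict] = {}

    @staticmethod
    def template_key(task: str, ticker: str) -> str:
        # Replace the concrete parameter with a slot so "AAPL" and "MSFT" tasks share one key.
        return re.sub(rf"\b{re.escape(ticker)}\b", "{ticker}", task)

    def get(self, task: str, ticker: str, make_plan: Callable[[str], dict]) -> dict:
        key = self.template_key(task, ticker)
        if key not in self._plans:
            self._plans[key] = make_plan(key)      # one planner call per template
        plan = copy.deepcopy(self._plans[key])     # never mutate the cached template
        plan["goal"] = plan.get("goal", "").replace("{ticker}", ticker)
        for step in plan.get("steps", []):
            step["tool_input"] = {
                k: v.replace("{ticker}", ticker) if isinstance(v, str) else v
                for k, v in step.get("tool_input", {}).items()
            }
        return plan

planner_calls: list[str] = []

def fake_planner(template_task: str) -> dict:
    planner_calls.append(template_task)
    return {"goal": template_task,
            "steps": [{"id": 1, "tool": "get_price", "tool_input": {"symbol": "{ticker}"}}]}

cache = PlanCache()
p1 = cache.get("Credit memo for AAPL", "AAPL", fake_planner)
p2 = cache.get("Credit memo for MSFT", "MSFT", fake_planner)
print(len(planner_calls), p1["steps"][0]["tool_input"], p2["goal"])
# 1 {'symbol': 'AAPL'} Credit memo for MSFT
```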

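8. Quick reference

Plan-Execute decision checklist

Signal	Use Plan-Execute?
Task has ≥ 5 steps	✓
Steps form a clear dependency DAG	✓
Several steps can run in parallel	✓
Model cost is a key constraint	✓
The user wants to see "what I'm about to do"	✓ (the plan can be shown)
Task is highly exploratory and can't be planned upfront	✗ use ReAct
Task is very short (≤ 3 steps)	✗ ReAct is simpler

Replan triggers

Condition	Action
Step error rate ≥ X%	Replan
A key step's output contradicts a plan assumption	Replan
User changes the goal midway	Replan
Replans so far ≥ max_replans	Stop + report failure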
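The trigger table maps directly to a small decision function. In this sketch the error-rate threshold (`max_error_rate=0.3`, standing in for the table's unspecified X%) and `max_replans=2` are assumed defaults, and `RunStatus` is a hypothetical container for signals the orchestrator would collect.

```python
from dataclasses import dataclass

@dataclass
class RunStatus:
    n_steps: int
    n_errors: int
    contradicts_plan: bool   # a key step's output broke an assumption the plan relied on
    goal_changed: bool       # the user edited the goal mid-run
    n_replans: int

def replan_decision(s: RunStatus, max_error_rate: float = 0.3, max_replans: int = 2) -> str:
    """Map the trigger table to an action: 'continue', 'replan', or 'stop'."""
    error_rate = s.n_errors / s.n_steps if s.n_steps else 0.0
    triggered = error_rate >= max_error_rate or s.contradicts_plan or s.goal_changed
    if not triggered:
        return "continue"
    if s.n_replans >= max_replans:
        return "stop"          # stop and report failure instead of looping
    return "replan"

print(replan_decision(RunStatus(5, 0, False, False, 0)))  # continue
print(replan_decision(RunStatus(5, 2, False, False, 0)))  # replan
print(replan_decision(RunStatus(5, 2, False, False, 2)))  # stop
```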

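9. Interview questions

Q1: How do you choose between ReAct and Plan-Execute?

A: The number of steps and how decomposable the task is are the dividing line. Short (≤3) / exploratory → ReAct. Long (≥5) / decomposable / with DAG dependencies / where you want model routing to save money → Plan-Execute. In production the two are often combined: the plan stage fixes the high-level steps, and each step runs a small ReAct loop internally.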

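Q2: When designing the schema for a plan step, which fields are mandatory?

A: At least id / description / tool / tool_input / depends_on. Ideally also rationale (make the planner explain why the step is needed) and expected_output (for the executor to validate against). The stricter the schema, the more stable the LLM output.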

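Q3: How do you keep Plan-Execute replanning out of an infinite loop?

A: ① A hard max_replans cap (2-3); ② show the planner the concrete reason the previous attempt failed, otherwise it may produce the same plan again; ③ monitor the similarity of consecutive plans (cosine similarity of their embeddings) and stop if it exceeds 0.9; ④ rule-based fallback ("≥ 2 replans → fall back to ReAct mode").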
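Point ③ can be prototyped without an embedding model. In this sketch bag-of-words vectors stand in for real embeddings (a deliberate simplification: a production version would embed the plan text with an embedding API), and the 0.9 threshold comes from the answer above; `replan_is_stuck` is a hypothetical helper name.

```python
import math
from collections import Counter

def plan_text(plan: dict) -> str:
    return " ".join(f"{s['tool']} {s.get('description', '')}" for s in plan["steps"])

def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words vectors stand in for real embeddings in this sketch.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[token] for token, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def replan_is_stuck(prev_plan: dict, new_plan: dict, threshold: float = 0.9) -> bool:
    """True if the new plan is nearly the same as the one that just failed."""
    return cosine_similarity(plan_text(prev_plan), plan_text(new_plan)) > threshold

prev = {"steps": [{"tool": "fetch_filing", "description": "get latest 10-Q filing for AAPL"}]}
same = {"steps": [{"tool": "fetch_filing", "description": "get the latest 10-Q filing for AAPL"}]}
new = {"steps": [{"tool": "search_news", "description": "find earnings press release"}]}
print(replan_is_stuck(prev, same), replan_is_stuck(prev, new))  # True False
```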

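Q4: How do you make Plan-Execute support parallelism?

A: ① Steps in the schema carry depends_on, so the plan is a DAG; ② the executor's scheduler does a topological sort; ③ use asyncio.gather to run steps with no outstanding dependencies at the same time; ④ watch rate limits (high concurrency can hit API quotas); ⑤ the LLMCompiler paper is the most systematic work in this direction.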
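Points ② to ④ fit in one small async scheduler. A minimal sketch, assuming `run_step` is a hypothetical async function wrapping the executor call; it dispatches wave by wave (a step waits for its whole wave, unlike LLMCompiler, which dispatches each task as soon as its own inputs are ready), and an `asyncio.Semaphore` caps in-flight calls to respect rate limits.

```python
import asyncio
from typing import Awaitable, Callable

StepFn = Callable[[dict, dict[int, str]], Awaitable[str]]

async def run_dag(steps: list[dict], run_step: StepFn, max_concurrency: int = 4) -> dict[int, str]:
    """Run every ready step concurrently, wave by wave; the semaphore caps in-flight calls."""
    sem = asyncio.Semaphore(max_concurrency)
    outputs: dict[int, str] = {}
    pending = {s["id"]: s for s in steps}

    async def guarded(step: dict) -> tuple[int, str]:
        async with sem:                       # at most max_concurrency calls at once
            return step["id"], await run_step(step, outputs)

    while pending:
        ready = [s for s in pending.values() if set(s.get("depends_on", [])) <= set(outputs)]
        if not ready:
            raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
        for sid, out in await asyncio.gather(*(guarded(s) for s in ready)):
            outputs[sid] = out
            del pending[sid]
    return outputs

async def fake_step(step: dict, outputs: dict[int, str]) -> str:
    await asyncio.sleep(0.01)  # stands in for an LLM or tool call
    deps = ",".join(outputs[d] for d in step.get("depends_on", []))
    return f"s{step['id']}({deps})"

steps = [{"id": 1}, {"id": 2}, {"id": 3, "depends_on": [1, 2]}]
result = asyncio.run(run_dag(steps, fake_step))
print(result)  # {1: 's1()', 2: 's2()', 3: 's3(s1(),s2())'}
```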

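Q5: What happens if the planner's LLM is weaker than the executor's?

A: Disaster. The plan is the agent's "skeleton"; if the skeleton is wrong, no amount of executor effort helps. In production the planner uses the most capable (and most expensive) model (opus), the executor can be downgraded (sonnet/haiku), and the synthesizer uses opus (for final quality). This mirrors the "design > code" allocation of resources in traditional software.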

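Tomorrow's preview

Day 152: Tool Design: schema design, error handling, and parallel tool calls for 10 tools in a finance setting

Why the tool description determines 80% of an agent's behavior
Error handling: which errors should make the agent retry, and which must stop it
Parallel tool calls in practice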