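CrewAI vs AutoGen vs LangGraph: One Task, Three Frameworks
A comparison of the design philosophies behind three mainstream multi-agent frameworks: CrewAI (role-based), AutoGen (conversation-based), and LangGraph (state-graph).
Date: 2026-10-05 Track: AI Systems Engineering / Agent Phase: Phase 3 - Agent Architecture and Multi-Agent (Day 149-162) Tags: #CrewAI #AutoGen #LangGraph #FrameworkComparison
Today's goals
| Type | Content |
|---|---|
| Learn | Design philosophies of three mainstream multi-agent frameworks: CrewAI (role-based), AutoGen (conversation-based), LangGraph (state-graph) |
| Practice | Implement the same task (a financial research agent: researcher + writer + reviewer) once in each framework, then compare code size / flexibility / cost / debugging |
| Deliverable | A framework_compare/ directory with 3 implementations + a benchmark table |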
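1. Design philosophies of the three frameworks
1.1 CrewAI: Role-based / Task-based
Mental model: assemble a "crew". Each member (agent) has a role, a goal, and a backstory, and each piece of work (task) is assigned to an agent. Best for: collaboration with a clear division of labor, where a PM's way of thinking maps directly onto the code. Current version: CrewAI 0.140+ (mid-2026).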
researcher = Agent(role="researcher", goal="find facts", ...)
task = Task(description="...", expected_output="...", agent=researcher)
crew = Crew(agents=[...], tasks=[...], process=Process.sequential)
crew.kickoff()
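1.2 AutoGen: Conversation-based
Mental model: every agent is a conversable agent (ConversableAgent in v0.2, AssistantAgent in v0.4), and multiple agents collaborate by talking in a group chat. Released by Microsoft Research (2023-10). Current version: AutoGen v0.4 (rewritten in 2025 around an actor model). Best for: free-form conversation, debate, simulation.
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat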
researcher = AssistantAgent(name="researcher", model_client=...)
writer = AssistantAgent(name="writer", ...)
team = RoundRobinGroupChat([researcher, writer])
result = await team.run(task=...)
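1.3 LangGraph: State-graph based
Mental model: covered on Day 156. A Pregel-style stateful graph that allows cycles. Best for: complex control flow, human-in-the-loop (HIL), and workloads that need persistence.
1.4 Side-by-side comparison
| Dimension | CrewAI | AutoGen | LangGraph |
|---|---|---|---|
| Abstraction | Crew/Agent/Task | ConversableAgent | StateGraph |
| Control flow | Process (sequential/hierarchical) | GroupChat selector | Explicit graph |
| Learning curve | Flattest | Medium | Medium to steep |
| Flexibility | Medium | High | Highest |
| HIL | Built in (task feedback) | Built in (UserProxy) | interrupt |
| Persistence | Partial | Weak | Strong |
| Visualization | UI Studio | studio tooling | LangSmith |
| Developer experience | Very fast to get started | API changes often | Relatively complete docs |
| Code size (same task) | Low (~60 lines) | Lowest (~50 lines) | Highest (~80 lines) |
| Ecosystem | Medium | Medium (Microsoft) | Largest (LangChain) |
| Best fit | Business-team collaboration | Research / conversation | Complex production workflows |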
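2. Three implementations of the same task
Task definition
"Given a ticker, produce a 1-page investment memo. Flow: the researcher collects facts → the writer drafts → the reviewer critiques → final version."
Expected: roughly 4-6 LLM calls and one or more tools (fetch filing).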
3. CrewAI implementation: framework_compare/crew_impl.py
# crew_impl.py
"""
CrewAI implementation. Pip install: crewai[tools]>=0.140
"""
from crewai import Agent, Task, Crew, Process, LLM
from crewai.tools import tool


@tool("Search SEC filings")
def search_filings(ticker: str) -> str:
    """Search SEC EDGAR for the most recent 10-Q filing."""
    # Stubbed response so all three implementations see identical tool output.
    return '[{"form":"10-Q","date":"2026-08-01","mda":"Revenue $94.9B (+3% YoY). Services $24.2B."}]'


# Model ids as used throughout these notes. Depending on the CrewAI/LiteLLM
# version, a provider prefix such as "anthropic/" may be required.
llm = LLM(model="claude-opus-4-7", temperature=0.1)
llm_sonnet = LLM(model="claude-sonnet-4-6", temperature=0.1)

researcher = Agent(
    role="Equity Researcher",
    goal="Gather concrete facts from SEC filings about a target company",
    backstory="A meticulous analyst who only reports verified numbers.",
    tools=[search_filings],
    llm=llm,
    verbose=False,
)
writer = Agent(
    role="Investment Memo Writer",
    goal="Turn research notes into a 1-page memo with thesis + risks",
    backstory="A clear writer who turns numbers into a narrative.",
    llm=llm,
    verbose=False,
)
reviewer = Agent(
    role="Investment Committee Reviewer",
    goal="Stress-test the memo and require revisions if weak",
    backstory="Skeptical chair of the IC; rejects fluff.",
    llm=llm_sonnet,
    verbose=False,
)


def build_crew(ticker: str) -> Crew:
    """Wire the three tasks into a sequential crew for one ticker."""
    research_task = Task(
        description=f"Find the latest 10-Q for {ticker}. Extract revenue, services revenue, net cash, and 1 risk note.",
        expected_output="JSON-style bullets of verified facts.",
        agent=researcher,
    )
    write_task = Task(
        description=f"Using the research, draft a 1-page investment memo for {ticker} with: Thesis, Key Numbers, Risks, Recommendation.",
        expected_output="A markdown memo, ~400 words.",
        agent=writer,
        context=[research_task],  # the writer sees the researcher's output
    )
    review_task = Task(
        description="Critique the memo. If the thesis is weak or numbers are missing citations, request revisions. Otherwise approve and return final memo.",
        expected_output="Final approved memo (markdown).",
        agent=reviewer,
        context=[research_task, write_task],
    )
    return Crew(
        agents=[researcher, writer, reviewer],
        tasks=[research_task, write_task, review_task],
        process=Process.sequential,
        verbose=False,
    )


if __name__ == "__main__":
    crew = build_crew("AAPL")
    out = crew.kickoff()
    print(out)
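Lines of code: ~60. CrewAI's abstraction is high-level: a PM can see at a glance that it is "3 roles → 3 tasks → run in order". One caveat: under Process.sequential each task runs once, so the reviewer's "request revisions" instruction cannot send the memo back to the writer; any fixes have to land in the reviewer's own output.
4. AutoGen implementation: framework_compare/autogen_impl.py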
# autogen_impl.py
"""
AutoGen v0.4 implementation. Pip install: autogen-agentchat>=0.4 autogen-ext[anthropic]
"""
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination, MaxMessageTermination
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

opus = AnthropicChatCompletionClient(model="claude-opus-4-7")
sonnet = AnthropicChatCompletionClient(model="claude-sonnet-4-6")


async def search_filings(ticker: str) -> str:
    """Search SEC EDGAR for the most recent 10-Q filing."""
    return '[{"form":"10-Q","date":"2026-08-01","mda":"Revenue $94.9B (+3% YoY). Services $24.2B."}]'


researcher = AssistantAgent(
    name="researcher",
    model_client=opus,
    tools=[search_filings],
    system_message="You collect facts from SEC filings. Use search_filings. Only report verified numbers.",
)
writer = AssistantAgent(
    name="writer",
    model_client=opus,
    system_message="You turn research into a 1-page investment memo.",
)
reviewer = AssistantAgent(
    name="reviewer",
    model_client=sonnet,
    system_message=(
        "You critique the memo. If acceptable, write 'APPROVED' followed by the final memo. "
        "Otherwise list specific revision requests."
    ),
)

term = TextMentionTermination("APPROVED") | MaxMessageTermination(8)
team = RoundRobinGroupChat([researcher, writer, reviewer], termination_condition=term)


async def main():
    result = await team.run(task="Produce a 1-page investment memo for AAPL.")
    for m in result.messages:
        print(f"[{m.source}] {m.content}\n")


if __name__ == "__main__":
    asyncio.run(main())
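Lines of code: ~50. AutoGen models the task as a round-robin conversation: agents "pass" information to each other in natural language rather than through a structured task chain.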
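To make the round-robin mechanics concrete, here is a framework-free sketch of the loop in plain Python. It is not AutoGen's implementation: `run_round_robin`, `make_reviewer`, the `(speaker, text)` transcript, and the stub agents are hypothetical names chosen for illustration, and counting the task message toward `max_messages` is an assumption of this sketch.

```python
from typing import Callable

# Hypothetical agent type: reads the whole transcript, returns its reply text.
Agent = Callable[[list[tuple[str, str]]], str]


def run_round_robin(agents: dict[str, Agent], task: str,
                    stop_word: str = "APPROVED", max_messages: int = 8):
    """Cycle through agents in a fixed order until a stop condition fires."""
    transcript: list[tuple[str, str]] = [("user", task)]
    names = list(agents)
    turn = 0
    while len(transcript) < max_messages:
        name = names[turn % len(names)]
        reply = agents[name](transcript)
        transcript.append((name, reply))
        if stop_word in reply:
            return transcript, "text_mention"
        turn += 1
    return transcript, "max_messages"


def make_reviewer(approve_on_review: int) -> Agent:
    """Stub reviewer that rejects until its n-th review (0 means never approve)."""
    calls = {"n": 0}

    def reviewer(transcript):
        calls["n"] += 1
        if calls["n"] == approve_on_review:
            return "APPROVED. Final memo: ..."
        return "Revise: add a bear scenario."

    return reviewer


demo_agents = {
    "researcher": lambda t: "Revenue $94.9B (+3% YoY). Services $24.2B.",
    "writer": lambda t: "## AAPL Investment Memo ...",
    "reviewer": make_reviewer(approve_on_review=2),
}
transcript, reason = run_round_robin(demo_agents, "Produce a 1-page investment memo for AAPL.")
print(reason, [speaker for speaker, _ in transcript])
```

In this sketch a rejection hands the next turn to the researcher rather than the writer, because the order is fixed. If the revision should go straight back to the writer, a fixed rotation is the wrong tool and a selector or explicit graph fits better.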
5. LangGraph implementation: framework_compare/lg_impl.py
# lg_impl.py
"""
LangGraph implementation.
"""
import operator
from typing import Annotated, TypedDict

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage, BaseMessage
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode


@tool
def search_filings(ticker: str) -> str:
    """Search SEC EDGAR for the most recent 10-Q filing."""
    return '[{"form":"10-Q","date":"2026-08-01","mda":"Revenue $94.9B (+3% YoY). Services $24.2B."}]'


opus_plain = ChatAnthropic(model="claude-opus-4-7")
opus = opus_plain.bind_tools([search_filings])  # tool-enabled variant for the researcher
sonnet = ChatAnthropic(model="claude-sonnet-4-6")


class S(TypedDict):
    ticker: str
    research_notes: str
    draft: str
    final: str
    review_count: Annotated[int, operator.add]  # reducer: updates are summed, not overwritten
    messages: Annotated[list[BaseMessage], add_messages]


def as_text(content) -> str:
    """Flatten message content; it may be a list of content blocks rather than a str."""
    if isinstance(content, str):
        return content
    return "".join(b.get("text", "") if isinstance(b, dict) else str(b) for b in content)


def researcher_node(state: S):
    msgs = [
        SystemMessage(content="Collect facts via tools. Output structured bullets."),
        HumanMessage(content=f"Research {state['ticker']} latest 10-Q."),
    ]
    resp = opus.invoke(msgs)
    if resp.tool_calls:
        # Run the tool, then re-invoke the model once with the tool result
        tool_results = ToolNode([search_filings]).invoke({"messages": [resp]})
        final = opus.invoke(msgs + [resp] + tool_results["messages"])
        notes = as_text(final.content)
    else:
        notes = as_text(resp.content)
    return {"research_notes": notes}


def writer_node(state: S):
    resp = opus_plain.invoke([
        SystemMessage(content="You write 1-page investment memos."),
        HumanMessage(content=f"Notes:\n{state['research_notes']}\n\nWrite a memo for {state['ticker']}."),
    ])
    return {"draft": as_text(resp.content)}


def reviewer_node(state: S):
    resp = sonnet.invoke([
        SystemMessage(content="Critique. If acceptable say APPROVED + final memo, else list revisions."),
        HumanMessage(content=f"Memo:\n{state['draft']}"),
    ])
    text = as_text(resp.content)
    if "APPROVED" in text.upper():
        return {"final": text, "review_count": 1}
    # Rejected: keep the last draft (so it survives if the review budget runs out)
    # and hand the feedback to the writer through research_notes.
    return {"research_notes": state["research_notes"] + "\n\nReviewer feedback: " + text,
            "review_count": 1}


def route(state: S) -> str:
    # Stop on approval or after 3 reviews; otherwise send the memo back to the writer.
    if state.get("final") or state["review_count"] >= 3:
        return "end"
    return "writer"


g = StateGraph(S)
g.add_node("researcher", researcher_node)
g.add_node("writer", writer_node)
g.add_node("reviewer", reviewer_node)
g.add_edge(START, "researcher")
g.add_edge("researcher", "writer")
g.add_edge("writer", "reviewer")
g.add_conditional_edges("reviewer", route, {"writer": "writer", "end": END})
graph = g.compile()

if __name__ == "__main__":
    out = graph.invoke({"ticker": "AAPL", "messages": [], "review_count": 0})
    print(out.get("final") or out.get("draft"))
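Lines of code: ~80. LangGraph makes the control flow ("reviewer rejects → back to writer") fully explicit.
6. Benchmark: same task, three frameworks
| Dimension | CrewAI | AutoGen | LangGraph |
|---|---|---|---|
| Lines of code | 60 | 50 | 80 |
| Time to first successful run | 30 min | 60 min | 90 min |
| Tool integration cost | Very low (@tool) | Low (async fn) | Low (@tool) |
| Cost/run | $0.06 | $0.05 | $0.05 |
| Latency | 25-35s | 25-40s | 22-32s |
| Control-flow flexibility | Medium | Medium | High |
| Resume/persist | Partial | Weak | Strong (checkpoint) |
| Debug | Logs are adequate | Logs are adequate | Strong with LangSmith |
| HIL | task feedback | UserProxy | interrupt |
| Expressing multi-agent logic | Natural | Natural | Requires hand-written conditional edges |
| Escaping the framework (plain SDK fallback) | Medium | Easy | Easy |
Key observations
- CrewAI: friendly to business PMs, but deep customization (such as conditional routing) runs into the edges of the framework.
- AutoGen v0.4: performance is good after the actor-model rewrite, but the API is less stable than v0.2 and the docs are still catching up.
- LangGraph: the finest-grained control flow and the strongest production features (persistence / HIL / observability), at the price of a higher cognitive load.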
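7. Finance use cases: which one to pick?
| Scenario | Recommendation | Rationale |
|---|---|---|
| Internal investment committee (IC) automation (fixed set of roles) | CrewAI | role/task maps intuitively |
| Customer-service chatbot with escalation | LangGraph | Needs thread persistence |
| Compliance investigation (multi-agent debate) | AutoGen | The conversational model fits naturally |
| Regulatory reporting (strict process) | LangGraph | Control flow is auditable |
| Research analyst copilot | Start with CrewAI, migrate to LangGraph as it grows | Complexity evolves over time |
| Market sentiment monitoring + action | LangGraph | Multiple sources + HIL |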
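8. Web3 integration
Difficulty of wiring on-chain tools into each framework
| Framework | Wallet/RPC access | Session key integration | On-chain write confirmation UI |
|---|---|---|---|
| CrewAI | Wrap web3.py in a tool; simple | Manage it yourself | task feedback |
| AutoGen | Async tool calling web3; simple | Manage it yourself | UserProxy interception |
| LangGraph | tool + interrupt before write | Managed in state | Built-in interrupt |
In production: for the critical on-chain write path, choose LangGraph + interrupt (the strictest HIL). The on-chain read and research stages can be built quickly with CrewAI.
A framework-agnostic on-chain pattern
read (chain) → simulate → confirm-by-user (HIL/interrupt) → sign (wallet) → submit (RPC) → verify (chain)
LangGraph models each step as a node plus an interrupt. AutoGen intercepts with UserProxy. CrewAI relies on task input/feedback.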
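The write path of this pattern can be sketched without any framework. The example below is a minimal illustration under stated assumptions: `TxPlan`, `run_write_path`, and every callable passed in are hypothetical stand-ins, the read step is assumed to have already produced the plan, and nothing here talks to a real wallet, RPC endpoint, or chain.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TxPlan:
    """Hypothetical description of a pending on-chain write."""
    to: str
    value_wei: int
    steps: list[str] = field(default_factory=list)  # audit trail of completed steps


def run_write_path(plan: TxPlan,
                   simulate: Callable[[TxPlan], bool],
                   confirm: Callable[[TxPlan], bool],
                   sign: Callable[[TxPlan], str],
                   submit: Callable[[str], str],
                   verify: Callable[[str], bool]) -> bool:
    """Run simulate -> confirm -> sign -> submit -> verify, stopping at the first failure."""
    if not simulate(plan):
        plan.steps.append("simulate_failed")
        return False
    plan.steps.append("simulate")
    if not confirm(plan):  # human-in-the-loop gate: nothing is signed without approval
        plan.steps.append("rejected_by_user")
        return False
    plan.steps.append("confirm")
    signed = sign(plan)
    plan.steps.append("sign")
    tx_hash = submit(signed)
    plan.steps.append("submit")
    ok = verify(tx_hash)
    plan.steps.append("verify" if ok else "verify_failed")
    return ok


# Stubs only: no wallet, RPC endpoint, or chain is touched.
plan = TxPlan(to="0xRecipient", value_wei=10**18)
approved = run_write_path(
    plan,
    simulate=lambda p: p.value_wei > 0,
    confirm=lambda p: False,  # the user declines
    sign=lambda p: "signed-payload",
    submit=lambda raw: "0xhash",
    verify=lambda h: True,
)
print(approved, plan.steps)
```

The point of the ordering is that `sign` is unreachable unless `confirm` returned true. Each framework's HIL mechanism is a way of implementing that `confirm` callable.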
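9. Production lessons and pitfalls
- Locked in once the framework is chosen: after you have written 100 tasks on CrewAI, moving to LangGraph is a major undertaking. Run a small PoC to evaluate first, then commit.
- CrewAI's implicit prompts are hidden: CrewAI assembles role/goal/backstory into a prompt internally, so you never see the full text. Before going to production, run with verbose=True and read every prompt it captures.
- AutoGen upgrades are breaking: v0.2 → v0.4 renamed almost the entire API. Pin versions and read the migration guide before upgrading.
- LangGraph state designed wrong: a field without a reducer is overwritten on every update. The other common mistake is feeding the entire messages list to all 4 agents at once, which blows up token usage.
- Multi-agent token cost is underestimated: 3 agents run in sequence cost roughly 3x a single agent, and more if each one rereads the earlier output. For production estimates, use "number of agents × average LLM calls × tokens".
- Duplicated code: maintaining a separate "financial research agent" in each of CrewAI/LangGraph/AutoGen is expensive. Suggestion: extract a tool layer (via MCP or a standalone lib) and treat the framework purely as the orchestration layer.
- Testing is hard: multi-agent output is non-deterministic, so unit tests are hard to write. Suggestions: ① snapshot tests (on traces); ② a golden set (10 tasks, checking whether the final output meets a rubric); ③ a fourth LLM as evaluator.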
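The reducer pitfall in the list above is easy to reproduce in plain Python. The sketch below is a simplified model of reducer-based state merging, not LangGraph's actual code; `apply_update` and the field names are hypothetical.

```python
import operator
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]


def apply_update(state: dict, update: dict, reducers: dict[str, Reducer]) -> dict:
    """Merge a node's partial update into the state.

    Fields with a reducer are combined with the old value;
    fields without one are simply overwritten.
    """
    new_state = dict(state)
    for key, value in update.items():
        if key in reducers and key in new_state:
            new_state[key] = reducers[key](new_state[key], value)
        else:
            new_state[key] = value
    return new_state


state = {"review_count": 1, "notes": ["fact A"]}
update = {"review_count": 1, "notes": ["fact B"]}

# Only the counter has a reducer: "fact A" is silently lost.
only_counter = apply_update(state, update, {"review_count": operator.add})
# Both fields have reducers: the notes accumulate.
both = apply_update(state, update, {"review_count": operator.add, "notes": operator.add})

print(only_counter)
print(both)
```

The first result keeps only "fact B" in `notes`, which is the silent data loss the pitfall describes. The second keeps both facts because the list field also declares a reducer.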
10. Cost & Latency
Same task on all three frameworks (researcher + writer + reviewer; 5-9 LLM calls per run, per the table below)
| Framework | Total LLM calls | Total tokens | Cost | Latency |
|---|---|---|---|---|
| CrewAI | 6-7 | ~12k | $0.05-0.07 | 25-35s |
| AutoGen | 6-9 | ~14k | $0.05-0.08 | 25-40s |
| LangGraph | 5-7 | ~10k | $0.04-0.06 | 22-32s |
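LangGraph is slightly faster and slightly cheaper (fewer abstraction-layer prompts). On the midpoints of the cost ranges above the gap is roughly 15-25%, so the choice of framework should be driven by maintainability and capabilities rather than cost savings.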
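The arithmetic behind these numbers can be checked with a few lines of Python. This is a sketch under stated assumptions: it summarizes each cost range by its midpoint (only one of several reasonable choices), and the per-token price passed to `estimate_cost` is a made-up blended rate for illustration, not a published price.

```python
# Cost ranges (USD per run) copied from the table above.
COST_RANGES = {
    "CrewAI": (0.05, 0.07),
    "AutoGen": (0.05, 0.08),
    "LangGraph": (0.04, 0.06),
}


def midpoint(low_high: tuple[float, float]) -> float:
    low, high = low_high
    return (low + high) / 2


def relative_saving(candidate: str, baseline: str) -> float:
    """Fraction by which the candidate's midpoint cost undercuts the baseline's."""
    return 1 - midpoint(COST_RANGES[candidate]) / midpoint(COST_RANGES[baseline])


def estimate_cost(agents: int, calls_per_agent: float,
                  tokens_per_call: float, usd_per_1k_tokens: float) -> float:
    """Back-of-envelope estimate: agents x average LLM calls x tokens x price."""
    return agents * calls_per_agent * tokens_per_call / 1000 * usd_per_1k_tokens


for baseline in ("CrewAI", "AutoGen"):
    print(f"LangGraph vs {baseline}: {relative_saving('LangGraph', baseline):.1%} cheaper")

# Hypothetical inputs: 3 agents, 2 calls each, 2,000 tokens per call,
# and an assumed blended price of $0.005 per 1k tokens.
print(f"estimated cost per run: ${estimate_cost(3, 2, 2000, 0.005):.2f}")
```

On midpoints, LangGraph comes out about 17% cheaper than CrewAI and about 23% cheaper than AutoGen. The hypothetical inputs to `estimate_cost` were chosen to land near the ~12k tokens reported for CrewAI.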
11. Quick reference
Framework selection
Team leans PM/business and needs a fast prototype → CrewAI
Research/academic setting, Microsoft ecosystem → AutoGen
Complex stateful production agents, HIL, observability → LangGraph
Fully custom requirements, no lock-in wanted → bare SDK + in-house lib
API cheat sheet for the three frameworks
| Operation | CrewAI | AutoGen | LangGraph |
|---|---|---|---|
| Define an agent | Agent(role,goal,...) | AssistantAgent(name,model_client,...) | node function |
| Define a task/workflow | Task(description, agent) | system_message | StateGraph |
| Sequential execution | Process.sequential | RoundRobinGroupChat | add_edge |
| Supervisor mode | Process.hierarchical | SelectorGroupChat | conditional edges |
| HIL | human_input=True | UserProxyAgent | interrupt() |
| Termination | task done | termination_condition | END node |
| Persistence | Partial | Weak | checkpointer |
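12. Interview questions
Q1: CrewAI / AutoGen / LangGraph: which should a business team choose?
A: It depends on the team profile and how mature the requirements are. ① PM-led, sequential requirements, needs a quick demo → CrewAI; ② needs conversation-style multi-round debate with a research flavor → AutoGen; ③ already in the LangChain ecosystem and needs production-grade persistence/observability → LangGraph. In practice many teams start on CrewAI and migrate to LangGraph as complexity grows.
Q2: An agent system built on AutoGen has been in production for six months. Should it move to LangGraph?
A: It depends on the pain points: ① if tokens are out of control or the conversation jumps around → LangGraph's explicit control flow can fix that; ② if debugging is painful → LangSmith traces are strong; ③ if HIL/persistence requirements are high → LangGraph has them built in; ④ if there are only occasional issues and the system is stable overall → do not migrate, since migration costs at least several weeks of work.
Q3: All three frameworks sit on top of an LLM. How do you keep any one framework from becoming the performance bottleneck?
A: ① Decouple the tool layer from the framework (MCP or a standalone lib); ② write performance-sensitive nodes on the critical path with the bare SDK; ③ pin framework versions; ④ benchmark cost & latency over 1 week / 1 month; ⑤ monitor abnormal retry rates (retry mechanisms differ across frameworks); ⑥ write a migration test suite (the same task run on several frameworks at once) to preserve portability.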
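Point ① can be illustrated with a tool that is an ordinary function plus a small, framework-neutral description that any adapter can consume. This is a sketch only: `describe_tool` and the returned dictionary layout are hypothetical, and the tool body is stubbed rather than calling SEC EDGAR.

```python
import inspect
import json
from typing import Callable


def search_filings(ticker: str) -> str:
    """Return the most recent 10-Q as a JSON string (stubbed; no network call)."""
    return json.dumps([{"form": "10-Q", "ticker": ticker.upper()}])


def describe_tool(fn: Callable) -> dict:
    """Build a framework-neutral description from the function's signature and docstring."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


spec = describe_tool(search_filings)
print(spec["name"], spec["parameters"])
print(search_filings("aapl"))
```

Each framework then needs only a thin wrapper around `search_filings` (a decorator in CrewAI or LangGraph, a plain function reference in AutoGen), so the business logic is written and tested once.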
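Q4: What are the most common bug types in multi-agent frameworks?
A: ① Role collapse: agents converge on the same style; ② deadlock / infinite loops: termination conditions not set properly; ③ token explosion: every agent sees the full history; ④ ordering errors: parallel agents writing to shared state conflict; ⑤ swallowed errors: one agent fails and the framework continues by default; ⑥ prompt injection spreading laterally: one hijacked agent affects the whole team. Each of these needs extra guardrails on top of the framework.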
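Two of these guardrails, a hard ceiling on steps and tokens (for ②) and history trimming (for ③), fit in a few lines. The names `RunBudget`, `BudgetExceeded`, and `trim_history` are hypothetical, and token counts are assumed to be supplied by the caller from whatever usage data the provider returns.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses its step or token ceiling."""


@dataclass
class RunBudget:
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens: int) -> None:
        """Record one LLM call and fail loudly once a ceiling is crossed."""
        self.steps += 1
        self.tokens += tokens
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")


def trim_history(messages: list[str], keep_last: int) -> list[str]:
    """Keep the first message (the task) plus the most recent keep_last messages."""
    if len(messages) <= keep_last + 1:
        return list(messages)
    tail = messages[-keep_last:] if keep_last > 0 else []
    return [messages[0]] + tail


budget = RunBudget(max_steps=3, max_tokens=5000)
stop_reason = "completed"
try:
    for _ in range(10):       # a loop that would otherwise keep going
        budget.charge(2000)   # pretend every call used 2,000 tokens
except BudgetExceeded as exc:
    stop_reason = str(exc)

print(stop_reason, budget.steps, budget.tokens)
print(trim_history(["task", "m1", "m2", "m3", "m4"], keep_last=2))
```

Calling `charge` before or after every LLM call turns a runaway loop into an explicit exception that the orchestrator can log and surface instead of a silent cost overrun.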
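Q5: If you were to design a fourth framework, which problems would it solve that these three do not?
A: Some candidates: ① cost-first orchestration: agent budgets, dynamic model routing, dollar SLOs; ② type-safe agents: structured messages between agents (not natural language), checked at compile time; ③ replayable / time-travel debugging: any run can be replayed byte for byte; ④ a standardized observability layer: OpenTelemetry for agents; ⑤ a multi-provider model abstraction: switch providers without touching business code. Frameworks such as PydanticAI / Burr / Marvin are already filling some of these gaps.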
13. Deep dive: trace comparison on the same task
CrewAI trace excerpt
[crew] Starting Crew with 3 agents, 3 tasks
[Researcher] using tool 'Search SEC filings' with input ticker=AAPL
[Researcher] tool returned: [{"form":"10-Q","date":"2026-08-01",...}]
[Researcher] final answer:
- Revenue: $94.9B (+3% YoY)
- Services: $24.2B
- Net cash: $48B
[Writer] received context from Researcher
[Writer] final answer:
## AAPL Investment Memo
Thesis: Long with target $245
...
[Reviewer] received context from Researcher + Writer
[Reviewer] final answer: APPROVED + ...
[crew] Total tokens: 11,873
AutoGen trace excerpt
[researcher] (round 1) Calling tool search_filings...
[researcher] (round 1) "Latest 10-Q: revenue $94.9B (+3%), services $24.2B."
[writer] (round 2) "## AAPL Investment Memo\nThesis: Long..."
[reviewer] (round 3) "Revenue YoY 3% is below tech peers. Address."
[researcher] (round 4) "Acknowledge — sector blend 6-8%, AAPL behind."
[writer] (round 5) "Revised: bear scenario +"
[reviewer] (round 6) "APPROVED. Final memo: ..."
TerminationCondition met: TextMention 'APPROVED'.
LangGraph trace excerpt (with LangSmith)
[research_node] LLM call (opus) → 1 tool call: search_filings
[research_node] tool result: ...
[research_node] LLM call (opus) → final research notes
[write_node] LLM call (opus) → draft memo
[review_node] LLM call (sonnet) → "APPROVED + final"
[graph] END. iters=3, total tokens 9,800
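Observations:
- CrewAI's trace is the most business-oriented (task names read intuitively)
- AutoGen's trace reads like a chat log (good for debugging multi-agent interaction)
- LangGraph's trace has a three-layer node / LLM / tool structure, which suits ops best (monitoring/alerting)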
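14. Migration cost estimates
If a team migrates from one framework to another, the effort is roughly:
| From → To | Difficulty | Effort (typical 5-agent project) |
|---|---|---|
| CrewAI → LangGraph | Medium | 3-5 weeks |
| AutoGen v0.4 → LangGraph | Medium | 3-4 weeks |
| LangGraph → CrewAI | Medium to easy (moving to a coarser abstraction) | 2-3 weeks |
| AutoGen ↔ CrewAI | Medium | 3 weeks |
| Any → bare SDK | Easy (unpacking) | 1-2 weeks |
| Bare SDK → any | Hard (requires learning the framework) | 4-6 weeks |
Takeaway: the bare SDK is the lowest common denominator. Master it first and migrating between frameworks gets easier.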
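15. The PM view: advice for business teams
Five non-technical factors in choosing a framework
- Team size: 5+ engineers → LangGraph (worth investing in the learning curve); 1-2 people → CrewAI (fast productivity)
- Stakeholder involvement: the business side needs to review the flow → CrewAI (task descriptions are written in business language)
- Audit/compliance requirements: high → LangGraph (control flow is auditable)
- Customer demo frequency: high → CrewAI (output is conversational and easy to visualize)
- Ecosystem dependencies: already heavy on LangChain → LangGraph is the natural fit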
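Projects where the framework does not matter (learn to spot them)
- Single-agent tasks
- Short-lived PoCs (that will not evolve)
- Replacing one fixed SaaS workflow
- Fewer than 5 tools
For these, skip the framework: the bare SDK plus about 100 lines of code is enough and avoids unnecessary dependencies.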
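As a rough idea of what the bare-SDK route looks like for the memo task, here is a minimal sketch. The `LLM` callable signature, `run_memo_pipeline`, and the stubs are assumptions made for illustration; in a real version each callable would wrap a provider SDK call.

```python
from typing import Callable

# Hypothetical signature: (system_prompt, user_prompt) -> completion text.
LLM = Callable[[str, str], str]


def run_memo_pipeline(ticker: str, fetch_filing: Callable[[str], str],
                      strong_llm: LLM, review_llm: LLM, max_reviews: int = 3) -> dict:
    """researcher -> writer -> reviewer, looping back to the writer on rejection."""
    facts = strong_llm("Extract verified facts as bullets.", fetch_filing(ticker))
    draft, feedback = "", ""
    for review_round in range(1, max_reviews + 1):
        draft = strong_llm(
            "Write a 1-page investment memo.",
            f"Ticker: {ticker}\nFacts:\n{facts}\nReviewer feedback:\n{feedback or 'none'}",
        )
        verdict = review_llm("Reply APPROVED if acceptable, otherwise list revisions.", draft)
        if verdict.strip().upper().startswith("APPROVED"):
            return {"memo": draft, "approved": True, "reviews": review_round}
        feedback = verdict
    # Review budget exhausted: return the last draft, flagged as unapproved.
    return {"memo": draft, "approved": False, "reviews": max_reviews}


def fake_llm(system: str, user: str) -> str:
    """Stub model so the sketch runs without network access."""
    return f"{system} | {len(user)} chars of input"


verdicts = iter(["Add a bear case.", "APPROVED"])
result = run_memo_pipeline(
    "AAPL",
    fetch_filing=lambda t: f"10-Q for {t}",
    strong_llm=fake_llm,
    review_llm=lambda system, user: next(verdicts),
)
print(result["approved"], result["reviews"])
```

The control flow that took conditional edges in LangGraph is a `for` loop here. What the sketch lacks is exactly what the frameworks sell: persistence, tracing, and human-in-the-loop hooks.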
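Coming up tomorrow
Day 158: Memory systems (Short-term / Long-term / Episodic / Semantic / Mem0)
- What really distinguishes the 4 memory layers
- Implementing long-term memory with a vector store
- Comparing dedicated libs such as Mem0 / LangMem / Letta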