Structured Output: JSON Mode, Function Calling, Constrained Decoding
How JSON mode works under the hood, function calling, tool use, constrained decoding (Outlines / lm-format-enforcer)
Date: 2026-09-04 Track: AI Systems Engineering Phase: Phase 3 - LLM Fundamentals and Prompt Engineering (Day 121-134) Tags: #StructuredOutput #JSON #FunctionCalling #Tools #ConstrainedDecoding
Today's goals
| Type | Content |
|---|---|
| Study | How JSON mode works under the hood, function calling, tool use, constrained decoding (Outlines / lm-format-enforcer) |
| Practice | Design schema-driven prompts, do function calling with the Anthropic tools API, validate with Pydantic |
| Deliverable | schemas.json + 3 complete schema designs for financial scenarios |
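1. Theoretical foundations
1.1 Why structured output?
An LLM's natural output is free-form text. Production systems need machine-readable output for:
- API responses
- Database insertions
- Workflow triggers
- Tool/agent calls
The naive approach is to write "output JSON" in the prompt and then parse the reply. Typical failures:
- The model occasionally wraps the output in a markdown code fence tagged json
- The output contains Python-style comments or explanatory text
- Trailing commas
- Unicode escape problems
→ A more robust mechanism is needed.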
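1.2 How constrained decoding works
At every generation step, the decoder masks the logits according to the schema, so that only tokens that keep the output grammatically valid can be sampled.
Example: once a JSON object has been opened with {, the only valid continuations are ", whitespace, or } (for an empty object); every other token is masked out.
Implementations:
- Outlines (dottxt): compiles a regex or JSON Schema into a finite-state machine (FSM) over the tokenizer vocabulary; also supports context-free grammars (CFG)
- lm-format-enforcer: a character-level parser combined with a tokenizer prefix tree; accepts JSON Schema and regex
- OpenAI: plain JSON mode guarantees syntactically valid JSON only; Structured Outputs (strict mode with a JSON Schema) is the variant documented as schema-constrained decoding
- Anthropic Tools API: the JSON Schema is given to the model as the tool's input_schema; unless a strict structured-output mode is enabled, treat conformance as very likely rather than guaranteed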
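The masking step can be illustrated with a toy decoder. This is a minimal sketch, not how Outlines or lm-format-enforcer are implemented: the 8-token vocabulary, the hand-written FSM (which only accepts the JSON strings "buy" or "sell"), and the fake logits are all assumptions made for illustration.

```python
import math

# Toy vocabulary; a real tokenizer has on the order of 100K entries.
VOCAB = ['"', "buy", "sell", "hold", "{", "}", " ", "maybe"]

# Hypothetical FSM for the grammar: "buy" | "sell" as a quoted JSON string.
# state -> {allowed token: next state}
FSM = {
    "start": {'"': "open"},
    "open": {"buy": "word", "sell": "word"},
    "word": {'"': "done"},
}


def mask_logits(logits, state):
    """Set the logit of every token that is illegal in this state to -inf."""
    allowed = FSM[state]
    return [score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)]


def constrained_greedy_decode(logits_per_step):
    """Greedy decoding where each step may only pick a grammar-legal token."""
    state, out = "start", []
    for logits in logits_per_step:
        if state == "done":
            break
        masked = mask_logits(logits, state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        token = VOCAB[best]
        out.append(token)
        state = FSM[state][token]
    return "".join(out)


# Fake model scores: at every step the unconstrained favourite is illegal.
steps = [
    [0.1, 0.2, 0.3, 0.1, 2.0, 0.0, 0.5, 0.9],  # model prefers "{"
    [0.1, 0.5, 1.0, 3.0, 0.0, 0.0, 0.2, 2.5],  # model prefers "hold"
    [0.3, 0.1, 0.1, 0.1, 0.0, 1.5, 0.9, 0.2],  # model prefers "}"
]
decoded = constrained_greedy_decode(steps)
print(decoded)  # "sell" (including the quotes)
```

Without the mask the same scores would produce `{hold}`; with it, the highest-scoring legal token wins at each step.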
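1.3 Function calling vs. tool use
Timeline:
- OpenAI, 2023-06: function calling via the functions / function_call parameters (GPT-3.5 Turbo and GPT-4); one function call per response
- OpenAI, 2023-11: tools / tool_choice replace functions and allow several calls in a single response
- Claude 3 (2024): tool_use content blocks
- Claude 4 (2025): enhanced tool use + parallel tool calls
The essence is the same everywhere: the model decides whether to call a tool now and with which arguments; the infrastructure executes the tool and feeds the result back into the context.
1.4 JSON Schema at a glance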
{
"type": "object",
"properties": {
"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
"quantity": {"type": "integer", "minimum": 1},
"side": {"enum": ["buy", "sell"]},
"price": {"type": "number", "minimum": 0}
},
"required": ["ticker", "quantity", "side"]
}
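To make the keywords concrete, here is a deliberately tiny validator covering only the keywords used in the schema above (type, pattern, minimum, enum, required). It is a sketch using the standard library; `validate` and `TYPE_CHECKS` are names invented here, and real code would normally rely on a full JSON Schema library or on Pydantic.

```python
import re

ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
        "quantity": {"type": "integer", "minimum": 1},
        "side": {"enum": ["buy", "sell"]},
        "price": {"type": "number", "minimum": 0},
    },
    "required": ["ticker", "quantity", "side"],
}

# bool is a subclass of int in Python, so it is excluded explicitly.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate(obj, schema):
    """Return a list of error strings; an empty list means obj conforms."""
    if not isinstance(obj, dict):
        return ["value is not an object"]
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in obj]
    for key, value in obj.items():
        rules = schema["properties"].get(key)
        if rules is None:
            continue  # JSON Schema allows additional properties by default
        if "type" in rules and not TYPE_CHECKS[rules["type"]](value):
            errors.append(f"{key}: expected {rules['type']}")
            continue  # the remaining checks assume the right type
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: {value!r} not in {rules['enum']}")
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append(f"{key}: {value!r} does not match {rules['pattern']}")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{key}: {value} is below minimum {rules['minimum']}")
    return errors


good = {"ticker": "AAPL", "quantity": 10, "side": "buy"}
bad = {"ticker": "aapl", "quantity": 0, "side": "BUY"}
print(validate(good, ORDER_SCHEMA))
print(validate(bad, ORDER_SCHEMA))
```

The second call reports one error per field: the lowercase ticker fails the pattern, the zero quantity fails the minimum, and "BUY" is not in the enum.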
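2. Intuition
Why is constrained decoding not free?
Every step has to mask the logits: out of a ~100K-token vocabulary, perhaps only 5 tokens are valid. The trade-off:
- Benefit: output that always parses and matches the schema's structure (unless generation is cut off, for example by max_tokens)
- Cost: grammar matching (an FSM transition) at every step
- On long schemas the added latency is roughly 5-15% (a rough figure; it depends heavily on the implementation and on how much of the mask is precomputed)
Why do models get "natural" JSON wrong?
In the training data, JSON's structural characters ({, [, ,, :) appear mixed with code comments and explanatory text, so the model has a tendency to "explain a bit" around the JSON. Constrained decoding removes that option by masking every token that is not valid at the current position.
A mental model for tool use
Treat the model as the program counter and each tool as a subroutine:
while not done:
    response = model(context)
    if response.is_tool_call:
        result = execute(response.tool, response.args)
        context += result
    else:
        return response
Every tool call is a decision point and every tool result is an observation. This is the ReAct loop turned into a product feature.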
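The pseudocode above can be made runnable without any API by swapping the LLM for a scripted stand-in. Everything here is hypothetical scaffolding (`Step`, `scripted_model`, the `get_price` tool); the point is only the control flow: call the model, execute the tool, append the observation, repeat.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One model turn: either a tool call or a final answer."""
    text: str = ""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)


def scripted_model(context):
    """Stand-in for an LLM: decides the next step from what is in context."""
    observations = [m for m in context if m["role"] == "tool"]
    if not observations:
        return Step(tool="get_price", args={"ticker": "AAPL"})
    price = observations[-1]["content"]["price"]
    return Step(text=f"AAPL trades at {price}")


# Subroutines the "program counter" can jump to.
TOOLS = {"get_price": lambda ticker: {"price": 178.5}}


def run(user_message, max_steps=5):
    context = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        step = scripted_model(context)
        if step.tool is None:          # no tool call: the loop is done
            return step.text, context
        result = TOOLS[step.tool](**step.args)                  # execute
        context.append({"role": "tool", "content": result})    # observe
    raise RuntimeError("step budget exhausted")


answer, trace = run("What is AAPL trading at?")
print(answer)  # AAPL trades at 178.5
```

The step budget matters even in this toy: a model that keeps requesting tools would otherwise never return.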
3. Code
3.1 Anthropic tool use: a complete example
# tool_use_demo.py
"""
Financial scenario: quote lookup + order placement + portfolio lookup
"""
import anthropic
import json

client = anthropic.Anthropic()

# Tool schemas
TOOLS = [
{
"name": "get_stock_price",
"description": "Get current price of a stock by ticker symbol",
"input_schema": {
"type": "object",
"properties": {
"ticker": {
"type": "string",
"description": "Stock ticker (e.g., AAPL, TSLA)"
}
},
"required": ["ticker"]
}
},
{
"name": "place_order",
"description": "Place a buy/sell order",
"input_schema": {
"type": "object",
"properties": {
"ticker": {"type": "string"},
"side": {"enum": ["buy", "sell"]},
"quantity": {"type": "integer", "minimum": 1},
"order_type": {"enum": ["market", "limit"]},
"limit_price": {"type": "number"}
},
"required": ["ticker", "side", "quantity", "order_type"]
}
},
{
"name": "get_portfolio",
"description": "Get current portfolio holdings",
"input_schema": {"type": "object", "properties": {}}
}
]
# Mock implementations
def execute_tool(name, args):
    if name == "get_stock_price":
        prices = {"AAPL": 178.5, "TSLA": 245.3, "GOOGL": 142.1}
        return {"price": prices.get(args["ticker"], 100.0), "ts": "2026-09-04"}
    elif name == "place_order":
        return {"status": "filled", "order_id": "ORD123", "filled_price": 178.5}
    elif name == "get_portfolio":
        return {"AAPL": 100, "GOOGL": 50, "cash": 25000}
    # Report unknown tools back to the model instead of silently returning None
    return {"error": f"unknown tool: {name}"}


def agent_loop(user_message, max_iterations=10):
    messages = [{"role": "user", "content": user_message}]
    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        # The model finished its turn: return the text blocks
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        # Other stop reasons (max_tokens, stop_sequence, ...) are not handled here
        if response.stop_reason != "tool_use":
            return f"Stopped unexpectedly: {response.stop_reason}"
        # The model wants to call tools: echo its turn back, then answer every tool_use block
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"[Step {i}] Calling {block.name}({block.input})")
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result),
                })
        messages.append({"role": "user", "content": tool_results})
    return "Max iterations exceeded"


if __name__ == "__main__":
    answer = agent_loop("What's my AAPL position worth right now? If above $15K, sell half.")
    print("\nFinal:", answer)
Example output:
[Step 0] Calling get_portfolio({})
[Step 1] Calling get_stock_price({'ticker': 'AAPL'})
[Step 2] Calling place_order({'ticker': 'AAPL', 'side': 'sell', 'quantity': 50, 'order_type': 'market'})
Final: Your AAPL position is worth $17,850 (100 shares × $178.50). Since this exceeds $15K,
I sold half (50 shares) at market — order filled at $178.50.
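The demo assumes every tool call succeeds. A common extension is to catch failures and report them to the model through the `is_error` field of the tool_result block, so that it can retry or explain the problem. The sketch below assumes a hypothetical dispatch table `REGISTRY` and a mock `get_stock_price`; only the shape of the returned dict follows the Anthropic tool_result format.

```python
import json


def get_stock_price(ticker):
    """Mock quote service that fails for unknown tickers."""
    prices = {"AAPL": 178.5}
    if ticker not in prices:
        raise KeyError(f"no quote for {ticker}")
    return {"price": prices[ticker]}


REGISTRY = {"get_stock_price": get_stock_price}  # hypothetical dispatch table


def run_tool(tool_use_id, name, args):
    """Execute one tool call and wrap the outcome as a tool_result block."""
    try:
        result = REGISTRY[name](**args)
    except Exception as exc:  # unknown tool, bad arguments, or tool failure
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),
    }


ok = run_tool("toolu_01", "get_stock_price", {"ticker": "AAPL"})
missing = run_tool("toolu_02", "get_stock_price", {"ticker": "XXXX"})
unknown = run_tool("toolu_03", "cancel_order", {})
print(ok)
print(missing)
print(unknown)
```

Catching the exception inside the executor keeps the loop alive and still returns one tool_result per tool_use id, which the API requires.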
3.2 Forcing JSON output (without tools)
# json_output.py
"""
No tools: plain prompt + Pydantic validation
"""
import anthropic
import json
from typing import Literal

from pydantic import BaseModel, ValidationError

client = anthropic.Anthropic()


class TradeIntent(BaseModel):
    ticker: str
    side: Literal["buy", "sell"]
    quantity: int
    price_limit: float | None = None
    confidence: float


def extract_intent(user_text):
    schema = TradeIntent.model_json_schema()
    prompt = f"""Extract trade intent from user message.
JSON schema:
{json.dumps(schema, indent=2)}
User message: "{user_text}"
Output ONLY a valid JSON matching the schema. No markdown, no explanation.
"""
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        temperature=0.0,
        messages=[{"role": "user", "content": prompt}],
    )
    raw = resp.content[0].text.strip()
    # Robust parsing: strip a markdown fence if the model added one anyway
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    return TradeIntent(**json.loads(raw))


# Test
texts = [
    "Buy 100 shares of Apple at market",
    "Sell my Tesla, set limit at 250",
    "I want to dump GOOGL, 50 shares",
]
for t in texts:
    try:
        intent = extract_intent(t)
        print(f" {t}\n → {intent}\n")
    except (ValidationError, json.JSONDecodeError) as e:
        print(f" Failed: {e}")
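The fence-stripping above only handles one failure shape. A slightly more general sketch, using only the standard library, scans for the first position where `json.JSONDecoder.raw_decode` succeeds, which also copes with leading or trailing prose. The helper name `extract_first_json_object` is invented here, and this remains a fallback for prompt-only output, not a substitute for schema validation.

```python
import json


def extract_first_json_object(text):
    """Return the first JSON object embedded in text, skipping prose and fences."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores the rest
            obj, _end = decoder.raw_decode(text, pos)
            return obj
        except json.JSONDecodeError:
            pos = text.find("{", pos + 1)  # not JSON here, try the next brace
    raise ValueError("no JSON object found")


fence = "`" * 3  # built at runtime so this example contains no literal fence
reply = (
    "Here is the JSON:\n"
    f"{fence}json\n"
    '{"ticker": "AAPL", "side": "buy", "quantity": 100}\n'
    f"{fence}\n"
    "Let me know if you need anything else!"
)
parsed = extract_first_json_object(reply)
print(parsed)

tricky = 'Use {curly} braces, then: {"a": 1}'
print(extract_first_json_object(tricky))
```

Because every scan starts at a `{`, a successful parse is always a JSON object; the result should still go through the Pydantic model afterwards.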
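3.3 Schema-shaped output via the Tools API
# schema_with_tools.py
"""
A sturdier approach: wrap the "output" as a virtual tool call via the Tools API.
The tool's input_schema steers the model toward schema-shaped arguments; without
a strict structured-output mode this is not a hard guarantee, so validate the result.
"""
import anthropic

client = anthropic.Anthropic()
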
EXTRACTOR_TOOL = {
"name": "submit_extraction",
"description": "Submit extracted financial data",
"input_schema": {
"type": "object",
"properties": {
"company": {"type": "string"},
"quarter": {"type": "string", "pattern": "^Q[1-4]'\\d{2}$"},
"revenue_usd_m": {"type": "number"},
"net_income_usd_m": {"type": "number"},
"yoy_growth_pct": {"type": "number"}
},
"required": ["company", "quarter", "revenue_usd_m"]
}
}
def extract_financials(report_text):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[EXTRACTOR_TOOL],
        tool_choice={"type": "tool", "name": "submit_extraction"},  # force this tool
        messages=[{
            "role": "user",
            "content": f"Extract from this report:\n\n{report_text}",
        }],
    )
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    return None


# Test
report = """Apple Inc. Q3 2026 Earnings: Revenue $94.5B, up 8.5% YoY.
Net income reached $24.1B."""
print(extract_financials(report))
# Example output (model-dependent):
# {'company': 'Apple', 'quarter': "Q3'26", 'revenue_usd_m': 94500, 'net_income_usd_m': 24100, 'yoy_growth_pct': 8.5}
Forcing a single tool with tool_choice is the pattern Anthropic's documentation describes for getting JSON output that follows a schema. The schema shapes the arguments, but constraints such as pattern or minimum should still be checked in your own code.
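4. Anthropic API best practices
4.1 Key tool use parameters
# tool_choice takes exactly one of the following forms:
#   {"type": "auto"}               the model decides whether to call a tool
#   {"type": "any"}                the model must call at least one tool
#   {"type": "tool", "name": "X"}  the model must call the named tool
#   {"type": "none"}               tool calls are disabled
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[...],  # list of tool definitions
    # disable_parallel_tool_use is a field of tool_choice, not a top-level argument;
    # False (the default) lets Claude emit several tool_use blocks in one turn
    tool_choice={"type": "auto", "disable_parallel_tool_use": False},
    messages=[...],
)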
4.2 Parallel tool use
Claude 4.x models support parallel tool calls, for example querying 5 data sources at once:
# A single assistant message may contain several tool_use blocks.
# Every one of them must get a tool_result in the very next user message.
messages.append({"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "id1", "content": "..."},
    {"type": "tool_result", "tool_use_id": "id2", "content": "..."},
    # ...
]})
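When the tools are I/O-bound, the calls requested in one turn can also be executed concurrently on the client side. The following sketch uses plain dicts in place of the SDK's tool_use blocks and a hypothetical `slow_lookup` tool; it shows one way to run the calls in a thread pool while keeping each result tied to its `tool_use_id`.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(source):
    """Mock data source; the sleep stands in for network latency."""
    time.sleep(0.05)
    return {"source": source, "ok": True}


# Hypothetical tool_use blocks as plain dicts (the SDK returns objects with the same fields).
tool_calls = [
    {"id": "toolu_01", "name": "lookup", "input": {"source": "prices"}},
    {"id": "toolu_02", "name": "lookup", "input": {"source": "news"}},
    {"id": "toolu_03", "name": "lookup", "input": {"source": "filings"}},
]


def execute(call):
    """Run one call and label the result with the id of the call that produced it."""
    result = slow_lookup(**call["input"])
    return {
        "type": "tool_result",
        "tool_use_id": call["id"],
        "content": json.dumps(result),
    }


with ThreadPoolExecutor(max_workers=4) as pool:
    tool_results = list(pool.map(execute, tool_calls))

# Index by id rather than relying on position when consuming results.
by_id = {r["tool_use_id"]: r for r in tool_results}
print(sorted(by_id))

next_user_message = {"role": "user", "content": tool_results}
```

The three lookups overlap instead of running back to back, and all results still travel in a single user message.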
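4.3 Tool use + extended thinking
Claude 4.7 supports "thinking before tools":
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=20000,  # must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 16000},
    tools=TOOLS,
    messages=[...],
)
The model plans in a thinking block ("call A first, then B") before it emits the actual tool calls, which tends to help on multi-step tasks. Two constraints to remember: with thinking enabled, tool_choice must be auto or none (forcing a tool with any or tool is rejected), and the thinking blocks of the assistant turn have to be passed back unchanged together with the tool_result.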
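4.4 Cache control + tools
Tool definitions can be cached (TTL of 5 minutes or 1 hour):
# Cache the stable tool definitions. One breakpoint on the LAST tool caches the
# whole tools array as a prefix; a request allows at most 4 breakpoints, so do
# not put cache_control on every tool.
tools = [
    *TOOLS[:-1],
    {**TOOLS[-1], "cache_control": {"type": "ephemeral"}},
]
In practice the gain is small unless there are many tools (>10 tools, >2K tokens); prefixes below the model's minimum cacheable length (on the order of 1K tokens, model-dependent) are not cached at all.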
5. Applications in finance
Case 1: trading intent agent
# trading_agent_schemas.py
TRADING_TOOLS = [
{"name": "get_quote", "description": "Real-time quote",
"input_schema": {"type": "object",
"properties": {"symbol": {"type": "string"}},
"required": ["symbol"]}},
{"name": "check_balance", "description": "Get available cash",
"input_schema": {"type": "object", "properties": {}}},
{"name": "place_order", "description": "Submit order",
"input_schema": {
"type": "object",
"properties": {
"symbol": {"type": "string"},
"side": {"enum": ["buy", "sell"]},
"qty": {"type": "integer", "minimum": 1, "maximum": 10000},
"type": {"enum": ["market", "limit", "stop", "stop_limit"]},
"limit_price": {"type": "number", "minimum": 0},
"tif": {"enum": ["DAY", "GTC", "IOC"], "default": "DAY"}
},
"required": ["symbol", "side", "qty", "type"]
}},
{"name": "risk_check", "description": "Pre-trade risk check",
"input_schema": {
"type": "object",
"properties": {"order_id_or_intent": {"type": "string"}},
"required": ["order_id_or_intent"]
}}
]
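Key design point: risk_check must run before place_order. The system prompt can state this order, but a prompt is only guidance; the tool executor should also refuse place_order when no risk check has passed.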
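One way to enforce the ordering outside the prompt is a small stateful guard in the tool executor. The sketch below is an assumption-laden illustration: `OrderGate`, the `"SYMBOL:side:qty"` intent format, and the quantity-based risk rule are all invented for this example and are not part of the schemas above.

```python
class OrderGate:
    """Executor-side guard: place_order only runs after a passing risk_check."""

    def __init__(self, max_qty=1000):
        self.max_qty = max_qty
        self.approved = set()  # (symbol, side, qty) tuples that passed the check

    def risk_check(self, order_id_or_intent):
        # Assumed intent format: "SYMBOL:side:qty", e.g. "AAPL:sell:50"
        symbol, side, qty = order_id_or_intent.split(":")
        passed = int(qty) <= self.max_qty
        if passed:
            self.approved.add((symbol, side, int(qty)))
        return {"passed": passed}

    def place_order(self, symbol, side, qty, **order_fields):
        key = (symbol, side, qty)
        if key not in self.approved:
            # Returned to the model as a tool result so it can run risk_check first
            return {"status": "rejected",
                    "reason": "risk_check has not passed for this order"}
        self.approved.discard(key)  # one approval covers exactly one order
        return {"status": "accepted", "symbol": symbol, "side": side, "qty": qty}


gate = OrderGate()
first = gate.place_order(symbol="AAPL", side="sell", qty=50, type="market")
check = gate.risk_check("AAPL:sell:50")
second = gate.place_order(symbol="AAPL", side="sell", qty=50, type="market")
third = gate.place_order(symbol="AAPL", side="sell", qty=50, type="market")
print(first["status"], check["passed"], second["status"], third["status"])
# rejected True accepted rejected
```

Because the rejection is an ordinary tool result, the model sees why the order failed and can call risk_check before retrying.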
Case 2: earnings report extraction schema
{
"type": "object",
"properties": {
"company_name": {"type": "string"},
"ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
"fiscal_period": {"type": "string", "pattern": "^Q[1-4]'\\d{2}$"},
"income_statement": {
"type": "object",
"properties": {
"revenue": {"type": "number"},
"cogs": {"type": "number"},
"gross_profit": {"type": "number"},
"operating_income": {"type": "number"},
"net_income": {"type": "number"},
"eps_basic": {"type": "number"},
"eps_diluted": {"type": "number"}
}
},
"yoy_changes": {
"type": "object",
"properties": {
"revenue_pct": {"type": "number"},
"net_income_pct": {"type": "number"}
}
},
"guidance": {
"type": "object",
"properties": {
"next_q_revenue_low": {"type": "number"},
"next_q_revenue_high": {"type": "number"}
}
}
},
"required": ["company_name", "fiscal_period", "income_statement"]
}
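A schema like this checks shape, not arithmetic: an extraction can be perfectly valid JSON and still be internally inconsistent. A second pass with cross-field checks catches some of those cases. The function below is a sketch; the name `check_income_statement`, the 1% tolerance, and the choice of checks are assumptions for illustration.

```python
def check_income_statement(extraction, rel_tol=0.01):
    """Return a list of inconsistencies between related fields (empty if none)."""
    problems = []

    inc = extraction.get("income_statement", {})
    revenue = inc.get("revenue")
    cogs = inc.get("cogs")
    gross = inc.get("gross_profit")
    # gross_profit should equal revenue - cogs, within a rounding tolerance
    if None not in (revenue, cogs, gross):
        expected = revenue - cogs
        if abs(expected - gross) > rel_tol * abs(revenue):
            problems.append(
                f"gross_profit {gross} differs from revenue - cogs = {expected}")

    guidance = extraction.get("guidance", {})
    low = guidance.get("next_q_revenue_low")
    high = guidance.get("next_q_revenue_high")
    # a guidance range must be ordered
    if None not in (low, high) and low > high:
        problems.append(f"guidance low {low} exceeds high {high}")

    return problems


consistent = {
    "income_statement": {"revenue": 94500, "cogs": 50000, "gross_profit": 44500},
    "guidance": {"next_q_revenue_low": 90000, "next_q_revenue_high": 95000},
}
inconsistent = {
    "income_statement": {"revenue": 94500, "cogs": 50000, "gross_profit": 40000},
    "guidance": {"next_q_revenue_low": 95000, "next_q_revenue_high": 90000},
}
print(check_income_statement(consistent))    # []
print(check_income_statement(inconsistent))
```

Fields that are absent are skipped rather than flagged, since most of the schema's properties are optional.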
Case 3: compliance review (multi-step)
COMPLIANCE_TOOLS = [
{"name": "screen_ofac", "description": "Check counterparty against OFAC",
"input_schema": {"type": "object",
"properties": {"name": {"type": "string"},
"country": {"type": "string"}},
"required": ["name"]}},
{"name": "kyc_lookup", "description": "Get KYC status of account",
"input_schema": {"type": "object",
"properties": {"account_id": {"type": "string"}},
"required": ["account_id"]}},
{"name": "flag_for_review", "description": "Flag transaction for human review",
"input_schema": {"type": "object",
"properties": {
"tx_id": {"type": "string"},
"reason": {"type": "string"},
"severity": {"enum": ["low", "medium", "high", "critical"]}
},
"required": ["tx_id", "reason", "severity"]}}
]
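6. Common pitfalls
- Explanatory text around the JSON: with prompt-only JSON output the model may add "Here is the JSON:" before or after the payload, which breaks parsing. Use the Tools API or constrained decoding.
- Schema too complex for the model: schemas with 5 levels of nested objects fail often. Flatten them.
- required is not enforced: in JSON Schema, required lists the fields the model should fill in, but without constrained decoding the LLM may still skip one. A second validation pass with Pydantic is mandatory.
- Enum value casing: the schema says ["BUY","SELL"] and the model may output "buy". Standardize on lowercase, or match with a case-insensitive regex.
- Oversized tool results: a tool that returns a 100K JSON blob blows the token budget once it is appended to the context. Summarize before feeding it back.
- Ordering of parallel tool results: do not assume tool_results line up with the call order; always match by tool_use_id.
- Tool descriptions too short: the model misuses tools with terse descriptions. Spell out when to use the tool and when not to.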
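For the oversized-result pitfall, the cheapest mitigation is a size cap applied before the result enters the context. The helper below is a rough sketch (the name `compact_tool_result` and both limits are arbitrary choices); an LLM-written summary or a tool-side filter is usually better when the details matter.

```python
import json


def compact_tool_result(result, max_chars=2000, max_items=20):
    """Serialize a tool result, trimming long lists and capping the total size."""

    def trim(value):
        if isinstance(value, list):
            kept = [trim(v) for v in value[:max_items]]
            if len(value) > max_items:
                kept.append(f"... {len(value) - max_items} more items omitted")
            return kept
        if isinstance(value, dict):
            return {k: trim(v) for k, v in value.items()}
        return value

    text = json.dumps(trim(result))
    if len(text) > max_chars:
        # Hard cap as a last resort; the cut-off text is no longer valid JSON
        text = text[:max_chars] + " ...[truncated]"
    return text


big = {"rows": list(range(1000))}
compact = compact_tool_result(big)
print(compact)

long_text = compact_tool_result({"text": "x" * 5000}, max_chars=100)
print(len(long_text))
```

The omission markers tell the model that data was dropped, so it can ask for a narrower query instead of reasoning over a silently incomplete result.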
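7. Quick reference
Anthropic Tool Use API
{
    "tools": [...],  # required for tool use
    "tool_choice": {
        "type": "auto",  # "auto" | "any" | "tool" | "none"
        "disable_parallel_tool_use": False,
    },
}
# Response stop_reason: "tool_use" | "end_turn" | "max_tokens" | "stop_sequence" (newer API versions add values such as "pause_turn" and "refusal")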
Schema design checklist
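- Every field has a clear description
- Enums cover all legal values
- Numbers have min/max
- Strings have a pattern (regex)
- The required list is explicit
- No more than 3 levels of nesting
- Server-side second validation with Pydantic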
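8. Interview questions
Q1: What is the difference between tool use and function calling?
Historically, OpenAI shipped function calling first (the functions parameter, one call per response) and later replaced it with tools (several calls per response). Anthropic called it tool use from the start. Conceptually they are equivalent: the model decides when to call what, and with which arguments.
Q2: Which is more reliable: writing "output JSON" in the prompt, or the Tools API?
The Tools API is noticeably more reliable: the output arrives as a structured tool_use block whose arguments follow the declared schema, whereas prompt-only JSON is prone to markdown wrapping and leaked explanations. A hard schema guarantee requires grammar-constrained decoding (a strict structured-output mode where the provider offers one, or Outlines / lm-format-enforcer for self-hosted models). For production, prefer the Tools API and still validate.
Q3: How does constrained decoding affect latency?
Every step needs an FSM/grammar transition plus a logit mask. As a rough guide, simple schemas add almost nothing (<5%) while complex nested schemas may add 10-20%; actual numbers depend on the implementation. Precomputing per-state token masks and applying them on the GPU keeps the overhead low.
Q4: You design an agent with 5 tools. How do you keep it from looping?
(a) A hard max_iterations limit. (b) On every turn, check whether the tool and its input match the previous call; break immediately on a repeat. (c) Have the model write "I have done X, next should be Y" in its thinking as self-monitoring. (d) State explicitly in the tool descriptions: "if you already have XX, answer directly".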
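Points (a) and (b) can be combined in a small guard that the agent loop consults before executing each tool call. This is a simplified sketch: `LoopGuard` is a hypothetical helper, and it flags exact repeats of any earlier call (same tool, same canonicalized input) rather than measuring input similarity.

```python
import json


class LoopGuard:
    """Stops an agent loop on a step budget or on repeated identical tool calls."""

    def __init__(self, max_iterations=10, max_repeats=1):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = {}  # (tool name, canonical input) -> number of calls so far

    def check(self, tool_name, tool_input):
        """Return None if the call may proceed, otherwise the reason to stop."""
        self.steps += 1
        if self.steps > self.max_iterations:
            return "max_iterations exceeded"
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return f"repeated call: {tool_name}"
        return None


guard = LoopGuard(max_iterations=5)
r1 = guard.check("get_quote", {"symbol": "AAPL"})
r2 = guard.check("get_quote", {"symbol": "TSLA"})
r3 = guard.check("get_quote", {"symbol": "AAPL"})
print(r1, r2, r3)  # None None repeated call: get_quote
```

Raising max_repeats allows legitimate re-polling (for example, refreshing a quote) while still bounding the loop.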
9. Up next
Day 127: Week 19 review, compiling "50 things every LLM engineer must know".