Expert Day 126

Structured Output: JSON Mode, Function Calling, Constrained Decoding

JSON mode internals, function calling, tool use, constrained decoding (Outlines / lm-format-enforcer)

2026-09-04

Phase 3 - LLM Fundamentals and Prompt Engineering (Day 121-134)


Track: AI systems engineering  Tags: #StructuredOutput #JSON #FunctionCalling #Tools #ConstrainedDecoding

Today's Goals

Type	Content
Study	JSON mode internals, function calling, tool use, constrained decoding (Outlines / lm-format-enforcer)
Practice	Design schema-driven prompts, do function calling with the Anthropic tools API, validate with Pydantic
Deliverable	`schemas.json` + 3 complete schema designs for finance scenarios

1. Theoretical Foundations

1.1 Why Structured Output?

LLMs naturally produce free-form text. Production systems need machine-readable output for:

API responses
Database insertions
Workflow triggers
Tool/agent calls

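The naive approach is to write "output JSON" in the prompt and parse the reply. Problems:

The model occasionally wraps the output in a Markdown json code fence
The output contains Python-style comments or explanatory text
Trailing commas
Unicode escape problems

→ A more robust mechanism is needed.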
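A minimal sketch of why the naive approach is fragile: it feeds a few hand-written strings (hypothetical model outputs, not real API responses) to `json.loads` and reports which ones parse. Only the standard library is used.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this example readable

# Hypothetical model outputs illustrating the failure modes listed above.
SAMPLES = {
    "markdown wrapper": f'{FENCE}json\n{{"ticker": "AAPL"}}\n{FENCE}',
    "explanation": 'Here is the JSON: {"ticker": "AAPL"}',
    "comment": '{"ticker": "AAPL"  # Apple\n}',
    "trailing comma": '{"ticker": "AAPL",}',
    "clean": '{"ticker": "AAPL"}',
}

def parses(text: str) -> bool:
    """Return True if json.loads accepts the text unchanged."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

results = {name: parses(text) for name, text in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name:17} -> {'ok' if ok else 'JSONDecodeError'}")
```

Only the "clean" sample parses; every other variant raises `JSONDecodeError`, which is what a production pipeline would see.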

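1.2 How Constrained Decoding Works

At every generation step, the decoder dynamically masks the logits according to the schema, so that only tokens that are syntactically legal at that point can be chosen.

Example: a JSON object has to start with {", so right after { the model may only pick " or whitespace (a closing } is ruled out when the schema has required fields).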
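The masking step can be illustrated with a toy example. The vocabulary, the logit values, and the allowed set below are made up for illustration; a real decoder works on the tokenizer's full vocabulary and derives the allowed set from the grammar state.

```python
import math

# Toy vocabulary and made-up logits for the step right after "{".
VOCAB = ["{", "}", '"', " ", "Here", "ticker", ":"]
logits = [0.1, 0.3, 1.2, 0.4, 2.5, 0.9, 0.2]  # the unconstrained favourite is "Here"

# Tokens the grammar allows in this state (assumed: a quote or whitespace).
allowed = {'"', " "}

def mask_logits(logits, vocab, allowed):
    """Set disallowed tokens to -inf so they get zero probability after softmax."""
    return [x if tok in allowed else -math.inf for x, tok in zip(logits, vocab)]

def softmax(xs):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(mask_logits(logits, VOCAB, allowed))
unconstrained_pick = VOCAB[logits.index(max(logits))]
constrained_pick = VOCAB[probs.index(max(probs))]
print(repr(unconstrained_pick), "->", repr(constrained_pick))
```

Without the mask the greedy choice would be the explanatory token "Here"; with the mask, all probability mass is redistributed over the two legal tokens.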

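Implementations:

Outlines (dottxt): FSM/CFG-guided generation
lm-format-enforcer: a character-level parser combined with a tokenizer prefix tree; supports JSON Schema and regex
JSON mode (OpenAI): guarantees syntactically valid JSON; the stricter Structured Outputs feature constrains decoding to a supplied JSON Schema
Tools API (Anthropic): the JSON Schema in input_schema guides the arguments the model emits; treat conformance as very likely rather than guaranteed unless a strict mode is enabled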

1.3 Function Calling vs Tool Use

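Timeline:

OpenAI (2023-06): function calling through the functions parameter (gpt-3.5-turbo and gpt-4), one function call per response
OpenAI (2023-11): tools parameter, a unified interface that adds parallel tool calls
Claude 3 (2024): tool_use blocks
Claude 4 (2025): enhanced tool use + parallel tool calls

In essence: the model decides whether to call tool X right now and with which arguments; the infrastructure executes the tool and puts the result back into the context.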

1.4 JSON Schema at a Glance

{
  "type": "object",
  "properties": {
    "ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
    "quantity": {"type": "integer", "minimum": 1},
    "side": {"enum": ["buy", "sell"]},
    "price": {"type": "number", "minimum": 0}
  },
  "required": ["ticker", "quantity", "side"]
}
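To make the constraints in the schema above concrete, here is a hand-rolled check written with the standard library only. It is a sketch standing in for a real validator (a JSON Schema library or Pydantic would normally do this); the function name `check_order` is hypothetical.

```python
import re

def check_order(order: dict) -> list:
    """Return the schema violations found in order (an empty list means valid)."""
    errors = [f"missing required field: {k}"
              for k in ("ticker", "quantity", "side") if k not in order]

    def is_number(value, integer_only=False):
        # bool is a subclass of int in Python, so exclude it explicitly
        kinds = int if integer_only else (int, float)
        return isinstance(value, kinds) and not isinstance(value, bool)

    if "ticker" in order:
        t = order["ticker"]
        if not (isinstance(t, str) and re.fullmatch(r"[A-Z]{1,5}", t)):
            errors.append("ticker must match ^[A-Z]{1,5}$")
    if "quantity" in order:
        q = order["quantity"]
        if not (is_number(q, integer_only=True) and q >= 1):
            errors.append("quantity must be an integer >= 1")
    if "side" in order and order["side"] not in ("buy", "sell"):
        errors.append("side must be one of: buy, sell")
    if "price" in order:
        p = order["price"]
        if not (is_number(p) and p >= 0):
            errors.append("price must be a number >= 0")
    return errors

print(check_order({"ticker": "AAPL", "quantity": 100, "side": "buy"}))
print(check_order({"ticker": "aapl", "quantity": 0, "side": "hold"}))
```

The first call returns an empty list; the second reports one violation each for the ticker pattern, the quantity minimum, and the side enum.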

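2. Intuition

Why isn't constrained decoding free?

Every step has to mask the logits: out of a 100K-token vocabulary, perhaps only 5 tokens are valid. The trade-off:

Benefit: 100% schema compliance
Cost: grammar matching (an FSM transition) at every step
Roughly 5-15% extra latency on long schemas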
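One common way to keep the per-step cost low is to precompute, once per schema, which tokens are legal in each FSM state, so that decoding only needs a dictionary lookup. The sketch below does this for a toy enum field with a made-up vocabulary; it illustrates the idea and is not the actual implementation of Outlines or any other library.

```python
# Character-level DFA (built as a trie) for the two legal values of an enum field.
LEGAL = ['"buy"', '"sell"']
# Made-up vocabulary with multi-character tokens, as real tokenizers have.
VOCAB = ['"', "b", "bu", "buy", 'y"', "s", "se", "sell", "ll", "l", '"buy"', "hold", " "]

def build_trie(strings):
    """State 0 is the start state; transitions[state][char] gives the next state."""
    transitions, accepting = [{}], set()
    for s in strings:
        state = 0
        for ch in s:
            if ch not in transitions[state]:
                transitions.append({})
                transitions[state][ch] = len(transitions) - 1
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting

def walk(transitions, state, token):
    """Feed a whole token through the DFA; return the end state, or None if it dies."""
    for ch in token:
        state = transitions[state].get(ch)
        if state is None:
            return None
    return state

def build_index(transitions, vocab):
    """Precompute state -> {token: next_state}; this is the one-off cost per schema."""
    index = {}
    for state in range(len(transitions)):
        index[state] = {}
        for tok in vocab:
            nxt = walk(transitions, state, tok)
            if nxt is not None:
                index[state][tok] = nxt
    return index

transitions, accepting = build_trie(LEGAL)
index = build_index(transitions, VOCAB)
print(sorted(index[0]))  # tokens allowed at the start state
print(sorted(index[1]))  # tokens allowed after the opening quote
```

Building the index costs (states × vocabulary size) token walks up front; at decode time the mask for the current state is `index[state].keys()`, which is why simple schemas add little latency.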

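Why do models get "natural" JSON wrong?

In the training data, JSON's boundary characters ({, [, ,, :) appear mixed in with Python comments and explanatory prose, so the model has an "I should explain this" tendency. Constrained decoding removes that failure mode.

The mental model for tool use

Treat the model as the program counter and each tool as a subroutine: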

while not done:
    response = model(context)
    if response.is_tool_call:
        result = execute(response.tool, response.args)
        context += result
    else:
        return response

Every tool call is a decision point, and every tool result is an observation. This is a productized ReAct loop.
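The loop above can be run end to end with a scripted stand-in for the model. Everything here is hypothetical scaffolding (the `Response` class, `scripted_model`, and the `get_price` tool); no API is called.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Response:
    """Hypothetical stand-in for a model response."""
    text: str = ""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)

    @property
    def is_tool_call(self) -> bool:
        return self.tool is not None

def scripted_model(context: list) -> Response:
    """Fake model: asks for a price once, then answers from the observation."""
    observations = [m for m in context if m["role"] == "tool"]
    if not observations:
        return Response(tool="get_price", args={"ticker": "AAPL"})
    return Response(text=f"AAPL trades at {observations[-1]['content']['price']}")

def execute(tool: str, args: dict) -> dict:
    """Fake subroutine table with a single tool."""
    if tool == "get_price":
        return {"price": 178.5}
    raise ValueError(f"unknown tool: {tool}")

def run(user_message: str, max_steps: int = 5) -> str:
    context = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        response = scripted_model(context)
        if not response.is_tool_call:
            return response.text                              # done
        result = execute(response.tool, response.args)        # decision -> action
        context.append({"role": "tool", "content": result})   # observation
    return "step limit reached"

print(run("What is AAPL trading at?"))
```

The step cap plays the role of `while not done` with a safety limit, so a model that never stops calling tools cannot loop forever.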

3. Code

3.1 A Complete Anthropic Tool Use Example

# tool_use_demo.py
"""
Finance scenario: stock quote lookup + order placement + portfolio lookup
"""
import anthropic
import json

client = anthropic.Anthropic()

# Schema definitions
TOOLS = [
    {
        "name": "get_stock_price",
        "description": "Get current price of a stock by ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker (e.g., AAPL, TSLA)"
                }
            },
            "required": ["ticker"]
        }
    },
    {
        "name": "place_order",
        "description": "Place a buy/sell order",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "side": {"enum": ["buy", "sell"]},
                "quantity": {"type": "integer", "minimum": 1},
                "order_type": {"enum": ["market", "limit"]},
                "limit_price": {"type": "number"}
            },
            "required": ["ticker", "side", "quantity", "order_type"]
        }
    },
    {
        "name": "get_portfolio",
        "description": "Get current portfolio holdings",
        "input_schema": {"type": "object", "properties": {}}
    }
]

# Mock implementations
def execute_tool(name, args):
    if name == "get_stock_price":
        prices = {"AAPL": 178.5, "TSLA": 245.3, "GOOGL": 142.1}
        return {"price": prices.get(args["ticker"], 100.0), "ts": "2026-09-04"}
    elif name == "place_order":
        return {"status": "filled", "order_id": "ORD123", "filled_price": 178.5}
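    elif name == "get_portfolio":
        return {"AAPL": 100, "GOOGL": 50, "cash": 25000}
    # Fail loudly instead of silently returning None for an unknown tool
    raise ValueError(f"Unknown tool: {name}")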

def agent_loop(user_message, max_iterations=10):
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages
        )

        # The model finished its turn
        if response.stop_reason == "end_turn":
            text = "".join(b.text for b in response.content if b.type == "text")
            return text

        # The model wants to call one or more tools
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"[Step {i}] Calling {block.name}({block.input})")
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })
            messages.append({"role": "user", "content": tool_results})
            continue

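        # Any other stop_reason (e.g. "max_tokens"): report it rather than fall through
        return f"Stopped early: stop_reason={response.stop_reason}"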
    return "Max iterations exceeded"

if __name__ == "__main__":
    answer = agent_loop("What's my AAPL position worth right now? If above $15K, sell half.")
    print("\nFinal:", answer)

Sample output:

[Step 0] Calling get_portfolio({})
[Step 1] Calling get_stock_price({'ticker': 'AAPL'})
[Step 2] Calling place_order({'ticker': 'AAPL', 'side': 'sell', 'quantity': 50, 'order_type': 'market'})

Final: Your AAPL position is worth $17,850 (100 shares × $178.50). Since this exceeds $15K,
I sold half (50 shares) at market — order filled at $178.50.

3.2 Forcing JSON Output (No Tools)

# json_output.py
"""
No tools: plain prompt + Pydantic validation
"""
import anthropic
import json
from pydantic import BaseModel, ValidationError

client = anthropic.Anthropic()

class TradeIntent(BaseModel):
    ticker: str
    side: str  # "buy" or "sell"
    quantity: int
    price_limit: float | None = None
    confidence: float

def extract_intent(user_text):
    schema = TradeIntent.model_json_schema()
    prompt = f"""Extract trade intent from user message.

JSON schema:
{json.dumps(schema, indent=2)}

User message: "{user_text}"

Output ONLY a valid JSON matching the schema. No markdown, no explanation.
"""
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        temperature=0.0,
        messages=[{"role": "user", "content": prompt}]
    )
    raw = resp.content[0].text.strip()
    # Robust parsing: strip a possible Markdown code fence
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    return TradeIntent(**json.loads(raw))

# Test
texts = [
    "Buy 100 shares of Apple at market",
    "Sell my Tesla, set limit at 250",
    "I want to dump GOOGL, 50 shares",
]
for t in texts:
    try:
        intent = extract_intent(t)
        print(f"  {t}\n  → {intent}\n")
    except (ValidationError, json.JSONDecodeError) as e:
        print(f"  Failed: {e}")
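The fence-stripping in `extract_intent` handles only one failure mode. A slightly more general sketch, assuming the reply contains one well-formed JSON object somewhere in the text, is to scan for it with `json.JSONDecoder.raw_decode`. The helper name `extract_json_object` is hypothetical, and the approach does not repair trailing commas or comments.

```python
import json

def extract_json_object(raw: str) -> dict:
    """Return the first JSON object found in raw, ignoring any text around it."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            obj, _end = decoder.raw_decode(raw, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not start a valid object; try the next one
        start = raw.find("{", start + 1)
    raise ValueError("no JSON object found in model output")

print(extract_json_object('Here is the JSON: {"ticker": "TSLA", "side": "sell"} Hope it helps!'))
```

The returned dict would still go through `TradeIntent(**obj)` so that Pydantic checks types and required fields.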

3.3 Using the Tools API for Schema Validation

# schema_with_tools.py
"""
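A more reliable approach: wrap the "output" as a virtual tool call via the Tools API.
The model fills in the tool input according to input_schema; conformance is very likely but not guaranteed by default, so validate the result.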
"""
import anthropic
client = anthropic.Anthropic()

EXTRACTOR_TOOL = {
    "name": "submit_extraction",
    "description": "Submit extracted financial data",
    "input_schema": {
        "type": "object",
        "properties": {
            "company": {"type": "string"},
            "quarter": {"type": "string", "pattern": "^Q[1-4]'\\d{2}$"},
            "revenue_usd_m": {"type": "number"},
            "net_income_usd_m": {"type": "number"},
            "yoy_growth_pct": {"type": "number"}
        },
        "required": ["company", "quarter", "revenue_usd_m"]
    }
}

def extract_financials(report_text):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[EXTRACTOR_TOOL],
        tool_choice={"type": "tool", "name": "submit_extraction"},  # force this tool call
        messages=[{
            "role": "user",
            "content": f"Extract from this report:\n\n{report_text}"
        }]
    )
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    return None

# Test
report = """Apple Inc. Q3 2026 Earnings: Revenue $94.5B, up 8.5% YoY.
Net income reached $24.1B."""
print(extract_financials(report))
# {'company': 'Apple', 'quarter': "Q3'26", 'revenue_usd_m': 94500, 'net_income_usd_m': 24100, 'yoy_growth_pct': 8.5}

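This is Anthropic's recommended way to get structured output: force the call with tool_choice and give the tool a tight schema. Validate the returned input on the server side as well.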

4. Anthropic API Best Practices

4.1 Key Tool Use Parameters

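client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[...],
    tools=[...],                      # list of tool definitions
    # tool_choice takes exactly one of these forms:
    #   {"type": "auto"}               the model decides (default when tools are given)
    #   {"type": "any"}                must call at least one tool
    #   {"type": "tool", "name": "X"}  must call the named tool
    #   {"type": "none"}               tool calls disabled
    # disable_parallel_tool_use is a field of tool_choice, not a top-level argument;
    # False (the default) lets Claude call several tools in one turn.
    tool_choice={"type": "auto", "disable_parallel_tool_use": False},
)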

4.2 Parallel tool use

Claude 4.x supports parallel tool calls, for example querying 5 data sources at once:

# One assistant message can contain several tool_use blocks.
# You must return all of the tool_results in the next user message.
messages.append({"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "id1", "content": "..."},
    {"type": "tool_result", "tool_use_id": "id2", "content": "..."},
    # ...
]})
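A sketch of how the executor side might handle several `tool_use` blocks from one assistant message: run them concurrently and tag each result with the id it belongs to. The blocks are modelled as plain dicts here (an assumption; the SDK returns objects with `.id`, `.name`, `.input` attributes), and `demo_tool` is a hypothetical executor.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_tools_in_parallel(tool_use_blocks, execute_tool):
    """Run every requested tool and return one tool_result dict per tool_use id."""
    def run_one(block):
        try:
            content, is_error = json.dumps(execute_tool(block["name"], block["input"])), False
        except Exception as exc:  # report the failure to the model instead of crashing
            content, is_error = f"{type(exc).__name__}: {exc}", True
        return {"type": "tool_result", "tool_use_id": block["id"],
                "content": content, "is_error": is_error}

    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(run_one, tool_use_blocks))

def demo_tool(name, args):
    """Hypothetical executor: one tool works, one fails."""
    if name == "broken_feed":
        raise RuntimeError("feed down")
    return {"echo": args}

blocks = [
    {"id": "id1", "name": "get_quote", "input": {"symbol": "AAPL"}},
    {"id": "id2", "name": "broken_feed", "input": {}},
]
tool_results = run_tools_in_parallel(blocks, demo_tool)
by_id = {r["tool_use_id"]: r for r in tool_results}
print(by_id["id1"]["content"], "|", by_id["id2"]["content"])
```

Because every result carries its `tool_use_id`, the pairing with the original calls never depends on list order, and a failed tool is reported with `is_error` rather than aborting the whole turn.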

4.3 Tool use + Extended Thinking

Claude 4.7 supports "thinking before tools":

client.messages.create(
    model="claude-opus-4-7",
    max_tokens=20000,  # required; must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 16000},
    tools=TOOLS,
    messages=[...]
)

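The model plans in the thinking block ("call A first, then B") and then issues the actual tool calls, which tends to improve multi-step tool use. With thinking enabled, tool_choice has to be auto or none; forcing a specific tool is not supported.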

4.4 Cache control + Tools

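Tool definitions can be cached (5-minute or 1-hour TTL):

# Cache the stable tool definitions. A cache breakpoint covers everything before it,
# so marking only the last tool caches the whole list (the API allows at most
# 4 breakpoints per request, so marking every tool fails once there are more than 4).
tools = [
    *TOOLS[:-1],
    {**TOOLS[-1], "cache_control": {"type": "ephemeral"}},
]

In practice, though, the gain is small unless there are many tools (more than 10, over 2K tokens).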

5. Finance Applications

Case 1: Trade-Intent Agent

# trading_agent_schemas.py
TRADING_TOOLS = [
    {"name": "get_quote", "description": "Real-time quote",
     "input_schema": {"type": "object",
                      "properties": {"symbol": {"type": "string"}},
                      "required": ["symbol"]}},
    {"name": "check_balance", "description": "Get available cash",
     "input_schema": {"type": "object", "properties": {}}},
    {"name": "place_order", "description": "Submit order",
     "input_schema": {
         "type": "object",
         "properties": {
             "symbol": {"type": "string"},
             "side": {"enum": ["buy", "sell"]},
             "qty": {"type": "integer", "minimum": 1, "maximum": 10000},
             "type": {"enum": ["market", "limit", "stop", "stop_limit"]},
             "limit_price": {"type": "number", "minimum": 0},
             "tif": {"enum": ["DAY", "GTC", "IOC"], "default": "DAY"}
         },
         "required": ["symbol", "side", "qty", "type"]
     }},
    {"name": "risk_check", "description": "Pre-trade risk check",
     "input_schema": {
         "type": "object",
         "properties": {"order_id_or_intent": {"type": "string"}},
         "required": ["order_id_or_intent"]
     }}
]

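Key design point: risk_check must be called before place_order. The system prompt states this flow; because a prompt can only ask, the executor should also reject any place_order that has no prior passing risk_check.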
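A sketch of enforcing that ordering in the executor rather than relying on the prompt alone. The `RiskGate` class, the `order_key` convention for the `order_id_or_intent` string, and the `passed` field in the risk-check result are all hypothetical choices made for this example.

```python
def order_key(args: dict) -> str:
    """Hypothetical canonical intent string shared by risk_check and place_order."""
    return f"{args['side']}:{args['symbol']}:{args['qty']}"

class RiskGate:
    """Executor-side guard: place_order is rejected unless risk_check passed first."""

    def __init__(self, tools: dict):
        self.tools = tools      # tool name -> callable(args) -> dict
        self.approved = set()   # intents cleared by a passing risk_check

    def execute(self, name: str, args: dict) -> dict:
        if name == "risk_check":
            result = self.tools[name](args)
            if result.get("passed"):
                self.approved.add(args["order_id_or_intent"])
            return result
        if name == "place_order":
            key = order_key(args)
            if key not in self.approved:
                # Returned to the model as a tool result so it can correct course
                return {"error": "risk_check must pass before place_order", "intent": key}
            self.approved.discard(key)  # one approval covers exactly one order
        return self.tools[name](args)

# Mock tool implementations for the demo
gate = RiskGate({
    "risk_check": lambda args: {"passed": True},
    "place_order": lambda args: {"status": "accepted"},
})
order = {"symbol": "AAPL", "side": "sell", "qty": 50, "type": "market"}
first = gate.execute("place_order", order)   # blocked: no risk check yet
gate.execute("risk_check", {"order_id_or_intent": order_key(order)})
second = gate.execute("place_order", order)  # allowed
third = gate.execute("place_order", order)   # blocked: approval already used
print(first, second, third, sep="\n")
```

Returning the rejection as a normal tool result (instead of raising) lets the model see why the order was refused and call risk_check itself.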

Case 2: Earnings-Report Extraction Schema

{
  "type": "object",
  "properties": {
    "company_name": {"type": "string"},
    "ticker": {"type": "string", "pattern": "^[A-Z]{1,5}$"},
    "fiscal_period": {"type": "string", "pattern": "^Q[1-4]'\\d{2}$"},
    "income_statement": {
      "type": "object",
      "properties": {
        "revenue": {"type": "number"},
        "cogs": {"type": "number"},
        "gross_profit": {"type": "number"},
        "operating_income": {"type": "number"},
        "net_income": {"type": "number"},
        "eps_basic": {"type": "number"},
        "eps_diluted": {"type": "number"}
      }
    },
    "yoy_changes": {
      "type": "object",
      "properties": {
        "revenue_pct": {"type": "number"},
        "net_income_pct": {"type": "number"}
      }
    },
    "guidance": {
      "type": "object",
      "properties": {
        "next_q_revenue_low": {"type": "number"},
        "next_q_revenue_high": {"type": "number"}
      }
    }
  },
  "required": ["company_name", "fiscal_period", "income_statement"]
}

Case 3: Compliance Review (Multi-Step)

COMPLIANCE_TOOLS = [
    {"name": "screen_ofac", "description": "Check counterparty against OFAC",
     "input_schema": {"type": "object",
                      "properties": {"name": {"type": "string"},
                                     "country": {"type": "string"}},
                      "required": ["name"]}},
    {"name": "kyc_lookup", "description": "Get KYC status of account",
     "input_schema": {"type": "object",
                      "properties": {"account_id": {"type": "string"}},
                      "required": ["account_id"]}},
    {"name": "flag_for_review", "description": "Flag transaction for human review",
     "input_schema": {"type": "object",
                      "properties": {
                          "tx_id": {"type": "string"},
                          "reason": {"type": "string"},
                          "severity": {"enum": ["low", "medium", "high", "critical"]}
                      },
                      "required": ["tx_id", "reason", "severity"]}}
]

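6. Common Pitfalls

Explanatory text around prompt-based JSON: the model adds "Here is the JSON:" before or after the payload, which breaks parsing. Use the Tools API or constrained decoding.
Schema too complex for the model: schemas with 5 levels of nested objects fail often. Flatten them.
required fields are not enforced: required tells the model which fields it should fill in, but the LLM may still skip one. A second validation pass with Pydantic is a must.
Enum value casing: the schema says ["BUY","SELL"] and the model may output "buy". Standardize on lowercase, or match with a regex.
Tool result too large for the context: a tool that returns 100K of JSON blows the token budget when fed back. Summarize it first.
Parallel tool result order: do not assume tool_results come back in call order; match them by id.
Tool description too short, so the model misuses the tool: spell out in the description when to use it and when not to.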
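Two of these pitfalls have small mechanical mitigations, sketched below with hypothetical helper names. `normalize_enum` maps a value to its canonical enum spelling regardless of case, and `cap_tool_result` truncates an oversized result; truncation is a crude stand-in for real summarization, and characters are only a rough proxy for tokens.

```python
import json

def normalize_enum(value: str, allowed: list) -> str:
    """Map a value to its canonical enum spelling, ignoring case and outer whitespace."""
    lookup = {a.lower(): a for a in allowed}
    try:
        return lookup[value.strip().lower()]
    except KeyError:
        raise ValueError(f"{value!r} is not one of {allowed}") from None

def cap_tool_result(result, max_chars: int = 4000) -> str:
    """Serialize a tool result, truncating with an explicit marker if it is too large."""
    text = json.dumps(result, ensure_ascii=False)
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"... [truncated {omitted} chars]"

print(normalize_enum(" buy ", ["BUY", "SELL"]))
print(cap_tool_result({"rows": list(range(10000))}, max_chars=60))
```

The truncation marker tells the model that data is missing, so it can ask for a narrower query instead of reasoning over a silently cut payload.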

7. Quick Reference

Anthropic Tool Use API

{
    "tools": [...],                   # list of tool definitions
    "tool_choice": {
        "type": "auto",               # "auto" | "any" | "tool" (plus "name") | "none"
        "disable_parallel_tool_use": False
    }
}
# Response stop_reason: "tool_use" | "end_turn" | "max_tokens" | "stop_sequence"

Schema design checklist

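Every field has a clear description
Enums cover every legal value
Numbers have a min/max
Strings have a pattern (regex)
The required list is explicit
Nesting is at most 3 levels deep
Pydantic does a second, server-side validation pass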
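Part of this checklist can be automated. The sketch below walks a JSON Schema dict and emits warnings for the mechanical items; the function name `lint_schema` and its heuristics are assumptions for illustration, and it only follows `properties` of object schemas (no arrays, `$ref`, or combinators).

```python
def lint_schema(schema: dict, path: str = "$", depth: int = 1, max_depth: int = 3) -> list:
    """Return checklist warnings for a JSON Schema dict (a partial, heuristic check)."""
    warnings = []
    if schema.get("type") != "object":
        return warnings
    if depth > max_depth:
        warnings.append(f"{path}: nesting deeper than {max_depth} levels")
    props = schema.get("properties", {})
    if props and "required" not in schema:
        warnings.append(f"{path}: no explicit 'required' list")
    for name, sub in props.items():
        child = f"{path}.{name}"
        if "description" not in sub:
            warnings.append(f"{child}: missing description")
        if sub.get("type") in ("number", "integer") and not any(k in sub for k in ("minimum", "maximum")):
            warnings.append(f"{child}: number without minimum/maximum")
        if sub.get("type") == "string" and "pattern" not in sub and "enum" not in sub:
            warnings.append(f"{child}: string without pattern or enum")
        # Recurse into nested objects, one level deeper
        warnings.extend(lint_schema(sub, child, depth + 1, max_depth))
    return warnings

example = {
    "type": "object",
    "properties": {
        "ticker": {"type": "string"},
        "qty": {"type": "integer", "minimum": 1, "description": "Number of shares"},
    },
}
for w in lint_schema(example):
    print(w)
```

For the example schema this reports the missing `required` list plus the missing description and pattern on `ticker`, while `qty` passes.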

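8. Interview Questions

Q1: What is the difference between tool use and function calling?

Historically OpenAI shipped function calling first (the functions parameter, one call per response) and later replaced it with tools (which adds parallel calls). Anthropic called it tool use from the start. Technically they are equivalent: in both, the model decides when to call what, and with which arguments.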

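Q2: Which is more reliable, writing "output JSON" in the prompt or using the Tools API?

The Tools API is noticeably more reliable. The schema is part of the request and the model is trained to emit arguments in that shape; where strict or grammar-constrained decoding is available, the schema becomes a hard constraint. Prompt-only JSON is prone to Markdown wrapping and leaked explanations. For production, prefer the Tools API and still validate the result.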

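Q3: How does constrained decoding affect latency?

Each step performs an FSM/grammar transition to mask the logits. For simple schemas the overhead is barely noticeable (<5% latency); complex nested schemas may add 10-20%. On a GPU the masking can be implemented as a CUDA kernel.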

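Q4: You are designing an agent with 5 tools. How do you keep it from getting stuck in a loop?

(a) A hard max_iterations cap. (b) On every turn, check whether the tool called is the same as in the previous turn and whether the input is similar; break immediately on a repeat. (c) Have the model write "I have already done X, the next step should be Y" in its thinking, as self-monitoring. (d) State explicitly in the tool description: "if you already have XX, answer directly".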
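Points (a) and (b) can be combined in one small guard. This is a sketch with a hypothetical `LoopGuard` class; it treats "similar input" as an exact match after canonical JSON serialization, which is a simplification of the similarity check described above.

```python
import json
from collections import Counter

class LoopGuard:
    """Stops an agent loop on a hard step cap or on repeated identical tool calls."""

    def __init__(self, max_iterations: int = 10, max_repeats: int = 2):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()

    def allow(self, tool_name: str, tool_input: dict) -> bool:
        """Record one tool call; return False when the loop should break."""
        self.steps += 1
        # sort_keys makes the signature independent of argument order
        signature = (tool_name, json.dumps(tool_input, sort_keys=True))
        self.seen[signature] += 1
        return (self.steps <= self.max_iterations
                and self.seen[signature] <= self.max_repeats)

guard = LoopGuard(max_iterations=5, max_repeats=2)
decisions = [
    guard.allow("get_quote", {"symbol": "AAPL", "venue": "NASDAQ"}),
    guard.allow("get_quote", {"venue": "NASDAQ", "symbol": "AAPL"}),  # same call, reordered
    guard.allow("get_quote", {"symbol": "AAPL", "venue": "NASDAQ"}),  # third repeat
]
print(decisions)
```

In an agent loop, `guard.allow(block.name, block.input)` would be checked before executing each tool call, and a `False` result would end the loop with an explanatory message.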

9. Tomorrow's Preview

Day 127: Week 19 review. Compile "50 things every LLM engineer must know".