TR Day 46

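Reading 10-Q / 10-K Filings with the Claude API

10-Q vs 10-K structure, the SEC EDGAR API, the methodology of treating the LLM as a "feature engineer" rather than a "decision maker", 5 hard prompt-engineering constraints against hallucination, and the economics of prompt caching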

2026-06-24

Phase 2: Strategy in Practice + AI Signals


Date: 2026-06-24
Focus: Phase 2 / AI Signals / Filing Analysis
Phase: Phase 2: Strategy in Practice + AI Signals
Tags: #ClaudeAPI #10Q #10K #EDGAR #FinancialNLP #PromptEngineering #StructuredExtraction

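Today's Goals

Type	Content
Learning	10-Q vs 10-K structure, the SEC EDGAR API, the methodology of treating the LLM as a "feature engineer" rather than a "decision maker", 5 hard prompt-engineering constraints against hallucination, and the economics of prompt caching
Hands-on	Write 3 scripts: pull the raw filing from EDGAR → cut out the MD&A with regex → have Claude extract 4 classes of structured signals → output JSON → measure accuracy
Deliverables	Working fetch_10q.py / extract_mda.py / claude_extract.py + full JSON output for one 10-Q + monthly SP100 cost estimate + comparison against 3 gold-standard filings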

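1. Core Idea: The LLM Is a Feature Engineer, Not a Decision Maker

In Phase 1 we worked with price/volume data and factor models: numbers in, numbers out, easy to backtest. Starting with Day 46, we bring unstructured text into the signal pipeline.

Financial filings are the classic unstructured data source in finance:

Unstructured text (200-page PDF / HTML)
        ↓ [traditional: an analyst reads it, about 1 hour per filing]
        ↓ [new: Claude / GPT extracts it, about 30 seconds per filing]
Structured signals (JSON: guidance / risk / tone / metric)
        ↓
Feature vector (numeric)
        ↓
Classical ML model / factor model / simple rules
        ↓
Trading decision

Key insight: many people's first instinct is "let GPT tell me directly whether to buy this company." That is wrong, for three reasons:

No PnL feedback: the LLM's training objective is "probability of the next token", not "did my advice make the user money". It does not know where it is wrong.
Black-box decisions: regulators, compliance, and your own backtests all need explainability. "Claude said buy" is not an auditable signal.
Overfitting to training data: Claude has seen almost every listed company's filings from before 2023, so in a backtest its "assessment" of a historical filing is very likely contaminated by what it knows happened afterwards (a leak from the future).

The right approach: let the LLM do only a small slice of "what a human annotator could do in 1 hour": read a passage, pick out a few key signals, and output them in a predefined schema. Whether those signals make money is a question for the backtest and the classical models.

This is exactly like "automatically classifying user feedback" in a product: you would not let the LLM decide what to build in the next iteration. You let it sort 10,000 pieces of feedback into 20 buckets, and the product manager makes the decision from the bucket distribution. Designing the buckets is the PM's core work; executing the classification is the LLM's work.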

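2. Structural Differences Between 10-Q and 10-K

Dimension	10-Q	10-K
Frequency	Quarterly (except Q4, which is covered by the 10-K)	Annual
Audit	Unaudited	Audited
Length	50-100 pages	200-400 pages
Filing deadline	40 days after quarter end (large companies)	60 days after fiscal year end (large accelerated filers)
Risk factors	Brief / incremental updates	Full risk factors section (30-60 pages)
MD&A	✓	✓ (longer)
Forward-looking content	Guidance updated frequently	Annual outlook
Signal density	High (incremental information)	Medium (a lot of repeated content)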

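2.1 The Sections We Care About

Do not feed the whole document to Claude. This is the most common beginner mistake: "upload the PDF in one go and let the LLM read all of it." The costs:

A single 10-K is about 80k tokens, so one Sonnet pass costs about $0.24 in input alone (80k × $3/M); a small monthly budget is gone after relatively few runs
Attention gets diluted and important signals drown in filler (attention is spent on boilerplate like "We are a company founded in 1985...")
Output is unstable: running the same document twice may give different results

Pick only these sections:

Section	Where	Signal value
MD&A (Management's Discussion and Analysis)	Item 2 (10-Q) / Item 7 (10-K)	★★★★★ Management's own explanation of results + forward guidance
Risk Factors	Item 1A	★★★★ A newly added risk = trouble the company has already sensed internally
Outlook / Guidance (if present)	End of the MD&A or a separate paragraph	★★★★★ Direct expectations for next quarter / next fiscal year
Legal Proceedings	Part II Item 1 (10-Q) / Item 3 (10-K)	★★ Signals of major litigation
Financial Statements	Part I Item 1 (10-Q) / Item 8 (10-K)	★ The numbers have a structured source (XBRL); no LLM needed

Conclusion: about 90% of the alpha is in the MD&A. Day 46 focuses on it and treats Risk Factors as a bonus.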

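2.2 Why the 10-Q Is More Useful Than the 10-K

The 10-K comes only once a year, so its signal is low-frequency. Three 10-Qs plus one 10-K give 4 signal points a year, which line up with the high-alpha windows of PEAD (Post-Earnings Announcement Drift).

The classic PEAD literature (from Bernard & Thomas 1989 to the present) repeatedly confirms that in the 60 trading days after an earnings release, stocks that beat keep rising and stocks that miss keep falling. One explanation for the drift anomaly is that the market underreacts to the tone and guidance in the earnings call and the filing. The tone and guidance signals an LLM extracts are a natural enhancement to PEAD.

Day 47 connects the two; Day 46 first gets the extraction pipeline running.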

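3. SEC EDGAR API: Free, with a Light Rate Limit

EDGAR is the SEC's official database: free, no API key, and no quota beyond a fair-access rate limit (stay at or below 10 requests per second and send a User-Agent header). In financial data, that is a luxury.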
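
To stay under the 10 requests/second guideline when scanning many tickers, a small throttle can sit in front of every request. The sketch below is one possible way to do it; the `Throttle` class and its parameters are hypothetical names introduced here, not part of `requests` or of any SEC tooling. The clock and sleep functions are injectable so the logic can be checked without waiting.

```python
import time


class Throttle:
    """Space calls at least 1/max_per_sec seconds apart (hypothetical helper)."""

    def __init__(self, max_per_sec: float = 10.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

In the fetch script this would be used as `throttle.wait()` immediately before each `requests.get(...)`, with `max_per_sec` set a little below 10 to leave headroom.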

3.1 The Three EDGAR Endpoints

Endpoint	Purpose
`https://data.sec.gov/submissions/CIK{cik}.json`	List all filings for a company
`https://www.sec.gov/Archives/edgar/data/{cik}/{accession}/...`	Fetch the raw text of a specific filing
`https://efts.sec.gov/LATEST/search-index?q=...`	Full-text search

3.2 What Is a CIK

Every SEC registrant has a unique CIK (Central Index Key), written as a 10-digit zero-padded number. Apple = 0000320193, Tesla = 0001318605.

The ticker → CIK mapping file is public: https://www.sec.gov/files/company_tickers.json
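
The two CIK spellings are easy to mix up, so it can help to keep the formatting in one place. A minimal sketch, assuming the URL shapes listed in the endpoint table above; the helper names (`pad_cik`, `submissions_url`, `archive_url`) are hypothetical, and the accession number in the usage note is a made-up placeholder.

```python
def pad_cik(cik) -> str:
    """Return the 10-digit zero-padded CIK, as used by data.sec.gov."""
    return str(int(cik)).zfill(10)


def submissions_url(cik) -> str:
    """Filing index for one company (padded CIK)."""
    return f"https://data.sec.gov/submissions/CIK{pad_cik(cik)}.json"


def archive_url(cik, accession: str, primary_doc: str) -> str:
    """Raw document URL: unpadded CIK, accession number without dashes."""
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{accession.replace('-', '')}/{primary_doc}"
    )
```

For example, `archive_url("0000320193", "0000000000-00-000000", "example.htm")` drops the leading zeros of the CIK and the dashes of the (placeholder) accession number.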

4. `fetch_10q.py`: Fetching the Raw Filing from EDGAR

# fetch_10q.py
"""
Fetch the latest 10-Q (or 10-K) filing for a given ticker from SEC EDGAR.
Output: raw HTML text saved to ./filings/{TICKER}_{ACCESSION}.html
"""
import json
import time
from pathlib import Path
import requests

# SEC requires a descriptive User-Agent (else 403)
HEADERS = {"User-Agent": "TR-Pipeline learner contact@example.com"}

CACHE_DIR = Path("./filings")
CACHE_DIR.mkdir(exist_ok=True)


def ticker_to_cik(ticker: str) -> str:
    """Resolve ticker like 'AAPL' to 10-digit CIK string."""
    url = "https://www.sec.gov/files/company_tickers.json"
    data = requests.get(url, headers=HEADERS, timeout=30).json()
    for _, row in data.items():
        if row["ticker"].upper() == ticker.upper():
            return str(row["cik_str"]).zfill(10)
    raise ValueError(f"ticker {ticker} not found")


def latest_filing(cik: str, form: str = "10-Q") -> dict:
    """Return the metadata of the latest filing of given form."""
    url = f"https://data.sec.gov/submissions/CIK{cik}.json"
    data = requests.get(url, headers=HEADERS, timeout=30).json()
    recent = data["filings"]["recent"]
    for i, f in enumerate(recent["form"]):
        if f == form:
            return {
                "accession": recent["accessionNumber"][i].replace("-", ""),
                "primary_doc": recent["primaryDocument"][i],
                "filing_date": recent["filingDate"][i],
                "report_date": recent["reportDate"][i],
            }
    raise ValueError(f"no {form} for CIK {cik}")


def fetch_filing(cik: str, accession: str, primary_doc: str) -> str:
    url = (
        f"https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{accession}/{primary_doc}"
    )
    r = requests.get(url, headers=HEADERS, timeout=60)
    r.raise_for_status()
    return r.text


def fetch_10q(ticker: str, form: str = "10-Q") -> Path:
    cik = ticker_to_cik(ticker)
    meta = latest_filing(cik, form=form)
    out = CACHE_DIR / f"{ticker}_{meta['accession']}.html"
    if out.exists():
        return out
    html = fetch_filing(cik, meta["accession"], meta["primary_doc"])
    out.write_text(html, encoding="utf-8")
    # save metadata sidecar
    (CACHE_DIR / f"{ticker}_{meta['accession']}.meta.json").write_text(
        json.dumps(meta, indent=2)
    )
    time.sleep(0.2)  # be nice to SEC
    return out


if __name__ == "__main__":
    p = fetch_10q("AAPL")
    print(f"saved to {p} ({p.stat().st_size/1024:.1f} KB)")

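Pitfalls:

A User-Agent is mandatory: the SEC enforces it, and requests without one get a 403. Format: company/project name plus a contact email; they occasionally email to confirm you are not a malicious crawler.
Pad the CIK to 10 digits where required: some API paths use 0000320193 (the submissions endpoint) and others use 320193 (the Archives path); mixing them up can produce a 404.
HTML is friendlier than PDF: 10-Qs on EDGAR are natively HTML/XBRL, so do not go looking for a PDF version.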

5. `extract_mda.py`: Cutting Out the MD&A with Regex

# extract_mda.py
"""
Extract the MD&A section from raw 10-Q/10-K HTML.
Strategy:
  1) strip HTML to plain text
  2) locate 'Item 2.' (10-Q) or 'Item 7.' (10-K) - MD&A start
  3) locate next 'Item X.' as end
  4) if no end marker is found, fall back to a fixed 30k-character window
"""
import re
from pathlib import Path
from bs4 import BeautifulSoup


MDA_START_PATTERNS = [
    r"item\s+2\.?\s+management.{0,5}s\s+discussion",  # 10-Q
    r"item\s+7\.?\s+management.{0,5}s\s+discussion",  # 10-K
]
MDA_END_PATTERNS = [
    r"item\s+3\.?\s+quantitative",   # 10-Q -> Item 3
    r"item\s+7a\.?\s+quantitative",  # 10-K -> Item 7A
    r"item\s+4\.?\s+controls",       # 10-Q fallback
]


def html_to_text(html: str) -> str:
    soup = BeautifulSoup(html, "lxml")
    # remove script/style and table elements
    for tag in soup(["script", "style", "table"]):
        tag.decompose()
    text = soup.get_text(separator="\n")
    # collapse whitespace
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{2,}", "\n\n", text)
    return text


def extract_mda(text: str) -> str:
    low = text.lower()
    start = None
    for pat in MDA_START_PATTERNS:
        m = re.search(pat, low)
        if m:
            start = m.start()
            break
    if start is None:
        raise ValueError("MD&A start not found")

    # find earliest end pattern after start
    end_candidates = []
    for pat in MDA_END_PATTERNS:
        m = re.search(pat, low[start + 50 :])
        if m:
            end_candidates.append(start + 50 + m.start())
    if not end_candidates:
        # fallback: take 30k chars
        end = start + 30000
    else:
        end = min(end_candidates)

    return text[start:end].strip()


def mda_from_file(html_path: Path) -> str:
    html = html_path.read_text(encoding="utf-8")
    text = html_to_text(html)
    mda = extract_mda(text)
    # save
    out = html_path.with_suffix(".mda.txt")
    out.write_text(mda, encoding="utf-8")
    print(f"MD&A: {len(mda)} chars, {len(mda)/4:.0f} estimated tokens")
    return mda


if __name__ == "__main__":
    import sys
    p = Path(sys.argv[1])
    mda_from_file(p)

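Why not fancier NLP:

Filing structure is templated: "Item 2. Management's Discussion" is almost word-for-word identical across filings, so the regex hit rate is above 95%
Semantic search with sentence transformers or similar is overkill here, and less stable
Keep a fallback (take 30k characters) for the remaining ~5% of failure cases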
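
One failure case worth guarding against is that the first regex hit may be a table-of-contents entry or a cross-reference rather than the real section heading, which yields a span only a line or two long. A hedged variant, not the script above: try every start match and keep the longest span. The function name `longest_mda_span` and the combined patterns are assumptions made for this sketch.

```python
import re

# Combined 10-Q (Item 2) / 10-K (Item 7) start and end markers.
START = re.compile(r"item\s+[27]\.?\s+management.{0,5}s\s+discussion", re.I)
END = re.compile(
    r"item\s+(?:3|7a)\.?\s+quantitative|item\s+4\.?\s+controls", re.I
)


def longest_mda_span(text: str, fallback_chars: int = 30_000) -> str:
    """Try every start match and return the longest start-to-end span."""
    best = ""
    for m in START.finditer(text):
        end_m = END.search(text, m.end())
        if end_m:
            end = end_m.start()
        else:
            # no end marker after this start: fixed-size window
            end = min(len(text), m.start() + fallback_chars)
        span = text[m.start():end].strip()
        if len(span) > len(best):
            best = span
    if not best:
        raise ValueError("MD&A start not found")
    return best
```

The heuristic assumes the real MD&A is the longest candidate, which holds when the other hits are short index lines; it is still worth logging the extracted length and eyeballing outliers.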

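Rule-of-thumb numbers: the MD&A of an Apple 10-Q is about 35-45k characters ≈ 8-12k tokens. That is the volume we hand to Claude, and it sits in the sweet spot for prompt caching (relevant when the same MD&A is sent more than once, for example with self-consistency).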

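6. Prompt Design: 5 Hard Constraints

Prompt design for filings NLP has 5 hard constraints. Each one comes from a pitfall actually hit.

6.1 Constraint 1: No Open-Ended Questions

❌ "Is this company doing well?"
❌ "What are the key takeaways?"
✅ "Did management raise, lower, maintain, or withdraw forward guidance? Output one of: raise / lower / maintain / withdraw / none."

Open-ended question → Claude writes an essay → it cannot be parsed → the signal cannot be quantified.

6.2 Constraint 2: Enforce a JSON Schema

Use an <output_format> tag, an explicit field list, and Anthropic's tool use / structured output to force structured JSON in the response.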

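6.3 Constraint 3: Provide Few-Shot Examples

Give each extracted field 2-3 examples, ideally boundary cases (for instance, "raise" is not only "we raise guidance" but also the implicit form "we now expect revenue of $X-$Y vs prior $A-$B").

6.4 Constraint 4: Lock the Professional Context in the System Prompt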

You are a senior equity research analyst at a buy-side fund.
You read MD&A sections of 10-Q and 10-K filings.
You extract structured signals - you do NOT give buy/sell recommendations.
Stay strictly within what the text says. Do not infer beyond the text.
If a signal is absent, mark it as "none" or null.

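The last line ("If a signal is absent, mark it as none or null") is the single most important anti-hallucination constraint.

6.5 Constraint 5: Temperature = 0 + Self-Consistency

For extraction tasks, temperature=0 is standard. But Sonnet still shows small run-to-run variation at temp=0, so we use self-consistency: run the same prompt 3 times and take the majority vote.

7. JSON Schema Design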

{
  "ticker": "AAPL",
  "filing_type": "10-Q",
  "report_date": "2026-03-29",
  "guidance_change": "raise",
  "guidance_detail": "Q4 revenue now expected $94-97B vs prior $89-92B",
  "new_risks": [
    "China supply chain concentration",
    "Pending EU DMA enforcement on App Store"
  ],
  "mgmt_tone": 0.6,
  "tone_evidence": "phrases like 'confident', 'record', 'strong demand' dominate; minimal hedging language",
  "key_metric_changes": {
    "revenue_yoy": "+5%",
    "gross_margin_change": "+1.5pp",
    "operating_expense_yoy": "+8%",
    "share_buyback_announced": "$110B"
  },
  "forward_looking_quotes": [
    "We expect double-digit Services growth to continue into next quarter"
  ],
  "extraction_confidence": 0.85
}

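Field design notes:

Field	Type	Why it is designed this way
`guidance_change`	enum {raise/lower/maintain/withdraw/none}	Closed enum; easy to aggregate and to rank cross-sectionally
`guidance_detail`	string	A free-text field kept for human review; not used in quantification
`new_risks`	string[]	An array, so a later risk diff (new this period vs last period) is easy
`mgmt_tone`	float [-1, 1]	Continuous value; can be used directly as a factor
`tone_evidence`	string	Gives the tone number an explainability anchor
`key_metric_changes`	dict[str, str]	Semi-structured, because key metrics differ by company
`extraction_confidence`	float [0, 1]	Claude's self-assessment; anything below 0.5 is dropped from the pipeline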
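
To show how this schema feeds the "feature vector" step of the pipeline, here is a minimal sketch of turning one extraction into numbers. The ordinal mapping in `GUIDANCE_SCORE` (for example scoring "withdraw" the same as "lower") and the function name `to_features` are illustrative assumptions, not something the schema prescribes; only the 0.5 confidence cutoff comes from the table above.

```python
from typing import Optional

# Illustrative ordinal encoding of the closed enum (an assumption).
GUIDANCE_SCORE = {
    "raise": 1.0,
    "maintain": 0.0,
    "none": 0.0,
    "lower": -1.0,
    "withdraw": -1.0,
}


def to_features(sig: dict, min_conf: float = 0.5) -> Optional[dict]:
    """Map one extraction JSON to numeric features, or None if low confidence."""
    if (sig.get("extraction_confidence") or 0.0) < min_conf:
        return None  # dropped from the pipeline, per the schema notes
    tone = float(sig.get("mgmt_tone") or 0.0)
    tone = max(-1.0, min(1.0, tone))  # clamp to the documented range
    return {
        "guidance_score": GUIDANCE_SCORE[sig["guidance_change"]],
        "mgmt_tone": tone,
        "n_new_risks": float(len(sig.get("new_risks") or [])),
    }
```

Free-text fields such as `guidance_detail` and `tone_evidence` are deliberately left out: they exist for human review, not for the model.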

8. `claude_extract.py`: Full Code

# claude_extract.py
"""
Extract structured signals from a 10-Q/10-K MD&A section using Claude.
Uses prompt caching for the system prompt + few-shot examples.
"""
import json
import os
from pathlib import Path
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
MODEL = "claude-sonnet-4-5-20250929"  # adjust to current production model


SYSTEM_PROMPT = """You are a senior equity research analyst at a buy-side fund.
You read MD&A sections of 10-Q and 10-K SEC filings.
Your job is to extract structured signals from the text - you do NOT give
buy/sell recommendations or opinions about valuation.

Hard rules:
1. Stay strictly within what the text states. Do not infer beyond the text.
2. If a signal is absent in the text, mark it as "none" or null.
3. Quote management language verbatim when asked for evidence.
4. Distinguish forward-looking statements ("we expect", "we anticipate")
   from descriptions of past results.

You output JSON only, conforming to the schema described in the user message.
"""

FEW_SHOT = """
Example 1 - guidance "raise":
TEXT: "We now expect full year revenue in the range of $94-97 billion,
up from our prior range of $89-92 billion."
EXTRACTION: {"guidance_change": "raise",
             "guidance_detail": "FY revenue $94-97B vs prior $89-92B"}

Example 2 - guidance "lower":
TEXT: "Given continued softness in the EMEA region, we are lowering our
full year revenue outlook to $48-50B from $52-54B previously."
EXTRACTION: {"guidance_change": "lower",
             "guidance_detail": "FY revenue $48-50B vs prior $52-54B"}

Example 3 - guidance "withdraw":
TEXT: "Due to the uncertain macroeconomic environment, we are withdrawing
our previously issued full year guidance."
EXTRACTION: {"guidance_change": "withdraw",
             "guidance_detail": "withdrew FY guidance citing macro"}

Example 4 - guidance "none":
TEXT: "We do not provide forward guidance as a matter of policy."
EXTRACTION: {"guidance_change": "none", "guidance_detail": null}
"""


def build_user_message(mda_text: str, ticker: str, filing_type: str, report_date: str):
    return f"""Below is the MD&A section of a {filing_type} filing for {ticker}
(report date {report_date}). Extract the following signals as JSON:

Schema:
{{
  "guidance_change": "raise" | "lower" | "maintain" | "withdraw" | "none",
  "guidance_detail": "<one-sentence summary or null>",
  "new_risks": ["<short risk description>", ...],
  "mgmt_tone": <float -1 to 1, where -1=very bearish, 0=neutral, 1=very bullish>,
  "tone_evidence": "<2-3 phrases from the text supporting your tone score>",
  "key_metric_changes": {{
     "revenue_yoy": "<e.g. +5% or null>",
     "gross_margin_change": "<e.g. +1.5pp or null>",
     "operating_expense_yoy": "<or null>",
     "share_buyback_announced": "<dollar amount or null>"
  }},
  "forward_looking_quotes": ["<verbatim forward-looking sentence>", ...],
  "extraction_confidence": <float 0 to 1>
}}

Few-shot examples for guidance_change:
{FEW_SHOT}

MD&A text:
<mda>
{mda_text}
</mda>

Return JSON only, no preamble.
"""


def extract(mda_text: str, ticker: str, filing_type: str, report_date: str,
            n_samples: int = 1) -> dict:
    """Run extraction with optional self-consistency (n_samples > 1)."""
    user_msg = build_user_message(mda_text, ticker, filing_type, report_date)

    results = []
    for _ in range(n_samples):
        resp = client.messages.create(
            model=MODEL,
            max_tokens=2000,
            temperature=0,
            system=[
                {
                    "type": "text",
                    "text": SYSTEM_PROMPT,
                    "cache_control": {"type": "ephemeral"},   # 5-min TTL cache
                }
            ],
            messages=[{"role": "user", "content": user_msg}],
        )
        text = resp.content[0].text.strip()
        # strip code fences if any
        if text.startswith("```"):
            text = text.split("```")[1]
            if text.startswith("json"):
                text = text[4:]
        results.append(json.loads(text))

    if n_samples == 1:
        out = results[0]
    else:
        out = aggregate(results)

    out["ticker"] = ticker
    out["filing_type"] = filing_type
    out["report_date"] = report_date
    return out


def aggregate(results: list[dict]) -> dict:
    """Simple majority vote for categorical fields, mean for tone."""
    from collections import Counter
    out = dict(results[0])
    out["guidance_change"] = Counter(
        r["guidance_change"] for r in results
    ).most_common(1)[0][0]
    out["mgmt_tone"] = sum(r["mgmt_tone"] for r in results) / len(results)
    # for risks, keep those appearing in >= 2 samples
    # (exact string match, so paraphrased duplicates are not merged)
    all_risks = [r for res in results for r in res.get("new_risks", [])]
    out["new_risks"] = [r for r, c in Counter(all_risks).items() if c >= 2]
    return out


if __name__ == "__main__":
    import sys
    mda_path = Path(sys.argv[1])
    mda = mda_path.read_text(encoding="utf-8")
    # ticker comes from argv; filing type and report date are
    # hard-coded placeholders here (see the .meta.json sidecar)
    ticker = sys.argv[2]
    result = extract(mda, ticker, "10-Q", "2026-03-29", n_samples=1)
    print(json.dumps(result, indent=2, ensure_ascii=False))
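
The `__main__` block hard-codes the report date. Since `fetch_10q.py` already writes a `.meta.json` sidecar next to the HTML, one option is to read it back. A minimal sketch, assuming the file naming used by the two earlier scripts (`{TICKER}_{ACCESSION}.mda.txt` next to `{TICKER}_{ACCESSION}.meta.json`); `load_meta` is a hypothetical helper name. The sidecar does not store the form type, so that still has to be passed in separately.

```python
import json
from pathlib import Path

MDA_SUFFIX = ".mda.txt"


def load_meta(mda_path: Path) -> dict:
    """Read ticker and dates for an MD&A file from its .meta.json sidecar."""
    name = mda_path.name
    if not name.endswith(MDA_SUFFIX):
        raise ValueError(f"expected a *{MDA_SUFFIX} file, got {name}")
    stem = name[: -len(MDA_SUFFIX)]            # e.g. AAPL_<accession>
    sidecar = mda_path.with_name(stem + ".meta.json")
    meta = json.loads(sidecar.read_text(encoding="utf-8"))
    return {
        "ticker": stem.split("_", 1)[0],
        "report_date": meta["report_date"],
        "filing_date": meta["filing_date"],
    }
```

With this, the call becomes `extract(mda, m["ticker"], "10-Q", m["report_date"])` where `m = load_meta(mda_path)`.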

8.1 Prompt Caching: Key Points

system=[
    {
        "type": "text",
        "text": SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},
    }
]

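TTL = 5 minutes: reusing the same cached prefix within 5 minutes cuts its input cost by about 90% (cache reads are billed at ~10% of the base price)
A block must be at least 1,024 tokens to be cached. The estimate here assumes a system + few-shot prefix of about 1.5k tokens, which clears that bar. Note that the script above marks only SYSTEM_PROMPT with cache_control and sends FEW_SHOT inside the user message, so the few-shot examples have to move into the cached system block, and the combined prefix has to actually reach 1,024 tokens, before any cache hit occurs
Key for batch runs: when scanning the SP100, send all 100 tickers within the 5-minute window so the cache is written only once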
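
A sketch of how the cached prefix could be assembled and how to check what a call actually cost. It assumes the Messages API behaviour that a `cache_control` marker caches everything up to and including the marked block, and that the response's `usage` object reports `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. The helper names, the $3/M base price, and the 1.25× write / 0.10× read multipliers are assumptions to verify against current pricing.

```python
def build_system_blocks(system_prompt: str, few_shot: str) -> list:
    """System prompt + few-shot as one cacheable prefix.

    The marker goes on the LAST block so both blocks are cached together.
    """
    return [
        {"type": "text", "text": system_prompt},
        {
            "type": "text",
            "text": few_shot,
            "cache_control": {"type": "ephemeral"},
        },
    ]


def input_cost_usd(input_tokens: int, cache_write_tokens: int,
                   cache_read_tokens: int, base_per_m: float = 3.00,
                   write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input-side cost of one call from its usage counters (assumed prices)."""
    per_token = base_per_m / 1_000_000
    return per_token * (
        input_tokens
        + cache_write_tokens * write_mult
        + cache_read_tokens * read_mult
    )
```

In practice the three numbers would come from `resp.usage` after each call. If `cache_read_input_tokens` stays at 0 across a batch, the prefix is either too short or changing between calls.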

9. Cost Estimate: Why This Is AI's Biggest Dividend for the Individual Quant

9.1 Cost per Call

Item	Quantity	Unit price (Sonnet)	Subtotal
Input MD&A	10k tokens	$3 / M	$0.030
System + few-shot (cached)	1.5k tokens	$0.30 / M	$0.0005
Output JSON	500 tokens	$15 / M	$0.0075
Total	-	-	≈ $0.038

With self-consistency = 3 → about $0.11 per filing.
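
The arithmetic behind the table, as a small sketch that can be re-run when prices or token counts change. The prices are the ones quoted in the table; the function name `cost_per_filing` and the default token counts are taken from or assumed for this example.

```python
# USD per million tokens, as quoted in the table above.
PRICE_PER_M = {"input": 3.00, "cached_input": 0.30, "output": 15.00}


def cost_per_filing(mda_tokens: int = 10_000, cached_tokens: int = 1_500,
                    output_tokens: int = 500, n_samples: int = 1) -> float:
    """Estimated USD cost of extracting one filing, n_samples times."""
    one_call = (
        mda_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached_input"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000
    return one_call * n_samples
```

`cost_per_filing()` gives about $0.038 and `cost_per_filing(n_samples=3)` about $0.11, matching the figures above. The same function with `mda_tokens=80_000` shows why feeding a whole 10-K is roughly six to seven times more expensive per call.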

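9.2 Scaling to a Monthly Budget

Scenario	Companies	Filings per month	Monthly cost
SP100 tracking	100	~33 (quarterly filings averaged out)	$1.3
SP500 tracking	500	~167	$6.3
Russell 3000 tracking	3000	~1000	$38
SP500 + real-time 8-K monitoring	500 + 200 8-Ks/month	~367	$14

Conclusion: covering the full SP500 every month with self-consistency costs under $20 (about 167 filings × $0.11 ≈ $19). That is the price of a few Starbucks coffees, yet it yields a signal set that, a few years ago, a quant team needed 3 analysts to produce.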
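
The monthly figures follow from one filing per company per quarter, averaged over 12 months. A minimal sketch under that assumption; `monthly_cost` and its parameter names are hypothetical, and the $0.038 default is the per-filing estimate from section 9.1.

```python
def monthly_cost(n_companies: int, per_filing_usd: float = 0.038,
                 filings_per_year: int = 4,
                 extra_filings_per_month: int = 0,
                 n_samples: int = 1) -> float:
    """Average monthly USD cost of extracting every periodic filing."""
    filings_per_month = (
        n_companies * filings_per_year / 12 + extra_filings_per_month
    )
    return filings_per_month * per_filing_usd * n_samples


if __name__ == "__main__":
    for label, kwargs in [
        ("SP100", {"n_companies": 100}),
        ("SP500", {"n_companies": 500}),
        ("Russell 3000", {"n_companies": 3000}),
        ("SP500 + 200 8-Ks", {"n_companies": 500, "extra_filings_per_month": 200}),
        ("SP500, 3 samples", {"n_companies": 500, "n_samples": 3}),
    ]:
        print(f"{label:>18}: ${monthly_cost(**kwargs):.2f}")
```

Filings actually cluster in earnings season rather than arriving evenly, so the real bill is lumpy month to month even though the yearly total is the same.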

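9.3 Comparison with Traditional Options

Approach	Monthly cost	Coverage	Latency
Bloomberg Terminal	$2,500	Whole market	Real-time, but a human still has to read it
Hiring a junior analyst	$5,000+	30-50 companies	1 day per filing
RavenPack / S&P Capital IQ sentiment	$1,500+	Whole market	Real-time
Claude API + EDGAR	$6-20	SP500	<1 hour

This is the most direct cost disruption LLMs bring to investment research. It is not that the LLM is smarter; it is that a single inference has become 100-1000× cheaper. Ten years ago the same task required a team of ML engineers to fine-tune a BERT model; now an individual quant can build it in an evening.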

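10. Quality Check: Compare Against 3 Gold-Standard Filings

"Cheap" does not mean "accurate." LLM output has to be validated before it enters the pipeline.

10.1 Building the Gold-Standard Set

Pick 3 filings for which you have a strong prior:

AAPL Q3 2024 10-Q (you know from the news that they guided up on Services)
NVDA Q4 2024 10-K (you know they guided up and mentioned export-control risk)
A small cap that guided down (find one with an obvious miss and guide-down)

Read the MD&A by hand and label 5 fields according to the schema:

guidance_change
mgmt_tone (a human score, accurate to ±0.2)
new_risks (list the top 2)
key_metric_changes.revenue_yoy
extraction_confidence (your own judgment of how clear the signal is)

10.2 Accuracy Thresholds

Field	Agreement threshold	What to do if it fails
`guidance_change`	100% exact match (a closed enum: wrong is wrong)	Revise the prompt and add more few-shot examples
`mgmt_tone`	|diff| ≤ 0.3	Revise the system prompt to give anchored descriptions of tone
`new_risks`	At least 1 of 2 overlaps	Add a list of risk categories to the prompt
`revenue_yoy`	Numbers agree within ±0.5pp	Usually accurate; if not, check whether the MD&A was cut wrong
`extraction_confidence`	Agreement not required	Sanity check only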
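
The thresholds above can be applied mechanically once the gold labels exist. A minimal sketch: `score_against_gold` and `_first_number` are hypothetical names, and risk overlap is checked by case-insensitive exact string match, which is a simplifying assumption (in practice a human or a fuzzy matcher would judge whether two risk descriptions mean the same thing).

```python
import re


def _first_number(s):
    """Parse '+5%' or '-1.5pp' into a float; None if no number is present."""
    m = re.search(r"[-+]?\d+(?:\.\d+)?", s or "")
    return float(m.group()) if m else None


def score_against_gold(pred: dict, gold: dict) -> dict:
    """Return pass/fail per field using the thresholds from the table."""
    checks = {}
    checks["guidance_change"] = pred["guidance_change"] == gold["guidance_change"]
    checks["mgmt_tone"] = abs(pred["mgmt_tone"] - gold["mgmt_tone"]) <= 0.3

    gold_risks = {r.lower() for r in gold["new_risks"][:2]}
    pred_risks = {r.lower() for r in pred["new_risks"]}
    checks["new_risks"] = len(gold_risks & pred_risks) >= 1

    p = _first_number(pred["key_metric_changes"].get("revenue_yoy"))
    g = _first_number(gold["key_metric_changes"].get("revenue_yoy"))
    checks["revenue_yoy"] = (
        p is not None and g is not None and abs(p - g) <= 0.5
    )
    return checks
```

`extraction_confidence` is intentionally not scored, matching the "agreement not required" row.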

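10.3 Failure Modes

Failure mode	Symptom	Fix
Reaffirm misread as raise	The LLM labels "we maintain our outlook of $X-Y" as raise	Add a maintain example to the few-shot set
Overly optimistic tone	Gives +0.5 even when the filing has negative passages	Stress "overall average across the text"
Risks too generic	Outputs filler such as "general market risk"	Require "new relative to the prior period" and give negative examples
revenue_yoy computed wrong	Treats sequential growth as YoY	State "year-over-year, not sequential" explicitly in the prompt
JSON parse error	Output has a preamble or extra text	Use Anthropic tool use to enforce the schema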
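
For the last row, a sketch of what enforcing the schema through tool use could look like. The tool name `record_mda_signals` and the helper functions are hypothetical; the `tools`, `tool_choice`, and `tool_use` content-block shapes follow the Anthropic Messages API as I understand it and should be checked against the current documentation. Only a subset of the schema fields is shown.

```python
EXTRACT_TOOL = {
    "name": "record_mda_signals",  # hypothetical tool name
    "description": "Record structured signals extracted from an MD&A section.",
    "input_schema": {
        "type": "object",
        "properties": {
            "guidance_change": {
                "type": "string",
                "enum": ["raise", "lower", "maintain", "withdraw", "none"],
            },
            "guidance_detail": {"type": ["string", "null"]},
            "new_risks": {"type": "array", "items": {"type": "string"}},
            "mgmt_tone": {"type": "number", "minimum": -1, "maximum": 1},
            "extraction_confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["guidance_change", "new_risks", "mgmt_tone",
                     "extraction_confidence"],
    },
}


def request_kwargs(model: str, system, user_msg: str) -> dict:
    """Keyword arguments for client.messages.create that force the tool call."""
    return {
        "model": model,
        "max_tokens": 2000,
        "temperature": 0,
        "system": system,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [EXTRACT_TOOL],
        "tool_choice": {"type": "tool", "name": EXTRACT_TOOL["name"]},
    }


def tool_input(content_blocks) -> dict:
    """Return the already-parsed input dict of the first tool_use block."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use":
            return block.input
    raise ValueError("no tool_use block in response")
```

Usage would be `resp = client.messages.create(**request_kwargs(MODEL, SYSTEM_PROMPT, user_msg))` followed by `tool_input(resp.content)`, which removes the code-fence stripping and `json.loads` step. The schema guides the model but range limits such as `minimum` are worth re-validating on your side.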

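11. A PM's View: 4 Transferable Takeaways

"The LLM is a feature engineer" is a PM paradigm: many products pitch GenAI as "let the AI decide", but what actually works in production is usually "let the AI classify / extract / rewrite", with a classical system downstream. Web2 NLU, recommendation recall, and customer-service triage all follow this pattern.
A prompt is a kind of PRD: writing a prompt is the same job as writing a PRD, namely defining inputs, outputs, edge cases, and verification. Few-shot examples are the acceptance criteria. Schema design is API design.
The gold-standard set is the "testing infra" of an ML product: any product that puts an LLM in production needs a gold-standard comparison. The method is borrowed from software QA. The cost of hand-labeling 3 filings buys you confidence that "80% of the SP500 run is trustworthy."
Cost disruption lands earlier than capability disruption: Claude is not "more accurate than a fine-tuned BERT from 5 years ago", but it is 100× cheaper. A 100× cost drop opens up new product forms, just as cloud computing let AWS replace self-hosted data centers. An individual quant running real-time NLP over the SP500 was an unimaginable luxury 10 years ago.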

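12. Coming Up Tomorrow

Day 47: Combine Claude extraction with the PEAD signal and automate an SP500 scan

Write a dispatcher: on the filing date, automatically trigger 10-Q fetch → MD&A cut → Claude extract → store
The classic PEAD formula: SUE (Standardized Unexpected Earnings) × revision direction
Use Claude's guidance_change + mgmt_tone as a z-score enhancement to PEAD
Build a cross-sectional ranking: each day, for the companies that filed that day, compute a composite percentile → buy the top decile / sell the bottom decile (if shorting is available)
Backtest "Claude + PEAD" vs pure PEAD over 2018-2024: expected Sharpe improvement of 0.2-0.4
Live script (paper trading): listen to the EDGAR RSS feed for new filings → Claude finishes extraction within 5 minutes → signal goes into the live queue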

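Execution Log

Fill in each item as you start it: timestamp + blockers.

[hh:mm] pip install anthropic beautifulsoup4 lxml requests -
[hh:mm] Get an API key from the Anthropic console, export ANTHROPIC_API_KEY=... -
[hh:mm] Run fetch_10q.py AAPL and pull the first 10-Q -
[hh:mm] Run extract_mda.py filings/AAPL_*.html and confirm the MD&A is 30-60k characters -
[hh:mm] Run claude_extract.py filings/AAPL_*.mda.txt AAPL and see the full JSON -
[hh:mm] Check against news on AAPL's latest 10-Q; verify that guidance_change / mgmt_tone are reasonable -
[hh:mm] Run NVDA and one small cap that guided down, for 3 gold-standard filings -
[hh:mm] Compute the actual cost (see the Anthropic dashboard) and compare with the $0.04/filing estimate -
[hh:mm] Commit the scripts to the tr/llm/ directory -
[hh:mm] Update the progress file docs/daily/TR_PROGRESS.md: Phase 2 Week 7 Day 46 ✅ -
Blockers / lessons learned:

Total length: about 7,200 characters
Completion today: theory ✓ / hands-on (run it yourself) / notes ✓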