Expert Day 238

zkML Fundamentals: EZKL / Giza / Architecture

zkML architecture, ONNX → ZK circuit compilation, quantization, and applications of Halo2/Plonky2 in ML

2026-12-25

Phase 4 - Hands-on ZK Circuit Development (Day 223-243)


Date: 2026-12-25 | Track: ZK Engineering / Circuit Development | Phase: Phase 4 - Hands-on ZK Circuit Development (Day 223-243) | Tags: #zkML #EZKL #Giza #ML-on-chain #Halo2

Today's Goals

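Type	Content
Learn	zkML architecture, ONNX → ZK circuit compilation, quantization, and applications of Halo2/Plonky2 in ML
Hands-on	Run a complete ZK inference of a small MNIST model with EZKL (get-srs → setup → prove → verify)
Deliverable	zkml_demo/ (containing network.onnx + the full ezkl pipeline)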

Background and Positioning

What does zkML solve?

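zkML turns ML inference into a ZK proof, so a verifier can trust that "model M on input x produces output y" without re-running the model. Three main scenarios:

Trustless AI oracle: an on-chain contract consumes an AI model's output (e.g., a medical judgment or a credit score) and needs proof that this specific model + this specific input → this output.
Privacy-preserving ML: a hospital runs private patient data through a company's model and proves the result without exposing the data.
AI-native dApps: verifiable behavior for game NPCs and on-chain AI agents.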

Why is it hard?

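ML inference involves:

Floating-point arithmetic (ZK circuits only work with integers in a finite field) → requires quantization
Matrix multiplication / GEMM (millions of MUL+ADD operations) → a huge number of constraints
Nonlinear activations (ReLU, Sigmoid) → require lookups
Conv2D convolutions → very large tensors

Even an ordinary MNIST CNN (6 layers) needs ~5M constraints; GPT-2 would need ~10^11 constraints (infeasible).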
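As a rough feel for where constraints come from, the sketch below counts multiply-accumulates (MACs) and ReLU evaluations for the TinyCNN used in the MNIST walkthrough in this note. Treating MACs as a lower-bound proxy is an assumption of this sketch: rescaling, range checks, lookups, and the backend's gate layout all add to the real constraint count, which the reference table in this note puts at roughly 500,000 for this model. The helper names are illustrative.

```python
# Rough MAC counts for the TinyCNN in this note (Conv2d(1, 8, 3, padding=1) -> AvgPool2d(2) -> Linear).
# Assumption: MACs only approximate circuit size; real constraint counts also
# include rescaling, range checks, and lookups emitted by the backend.

def conv2d_macs(c_in, c_out, h_out, w_out, k):
    """MACs of a stride-1 Conv2d: one k*k*c_in dot product per output element."""
    return c_out * h_out * w_out * c_in * k * k

def linear_macs(n_in, n_out):
    """MACs of a dense layer: one n_in-length dot product per output."""
    return n_in * n_out

conv = conv2d_macs(c_in=1, c_out=8, h_out=28, w_out=28, k=3)
fc = linear_macs(8 * 14 * 14, 10)
relu_elems = 8 * 28 * 28   # one nonlinearity per conv output element

print(f"conv MACs:  {conv:,}")        # conv MACs:  56,448
print(f"fc MACs:    {fc:,}")          # fc MACs:    15,680
print(f"total MACs: {conv + fc:,}")   # total MACs: 72,128
print(f"ReLU ops:   {relu_elems:,}")  # ReLU ops:   6,272
```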

zkML Architecture Diagram

       TRAINING (off-chain, normal ML)
       ──────────────────────────────
       PyTorch / TF model
              │
              ▼ export
       ┌────────────┐
       │  ONNX file │
       └────────────┘

       ZKML COMPILATION
       ──────────────────────────────
              │
              ▼ ezkl gen-settings
       ┌──────────────────┐
       │ Quantize weights │  fp32 → int8/int16
       │ Quantize input   │
       └──────────────────┘
              │
              ▼ ezkl compile-circuit
       ┌───────────────────┐
       │  Halo2 circuit    │  ← intermediate
       │  (~5M constraints)│
       └───────────────────┘
              │
              ▼ ezkl gen-srs / setup
       ┌──────────────────┐
       │  pk + vk         │
       └──────────────────┘

       INFERENCE (with proof)
       ──────────────────────────────
       input data x
              │
              ▼ ezkl gen-witness
       ┌──────────────────┐
       │  witness         │
       └──────────────────┘
              │
              ▼ ezkl prove
       ┌──────────────────┐
       │  proof           │
       └──────────────────┘
              │
              ▼ on-chain verify
       Solidity verifier

Full Code: MNIST + EZKL Pipeline

1. Train and export to ONNX

# train.py
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, 3, padding=1)   # 1x28x28 → 8x28x28
        self.pool = nn.AvgPool2d(2)                 # 8x14x14
        self.fc   = nn.Linear(8*14*14, 10)
    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x)
        x = x.flatten(1)
        return self.fc(x)

model = TinyCNN()
# ... train on MNIST until ~95% accuracy ...
model.eval()  # switch to inference mode before exporting

# export ONNX (batch size 1, matching the single-sample input.json below)
dummy_input = torch.zeros(1, 1, 28, 28)
torch.onnx.export(
    model, dummy_input, "network.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11
)

2. Generate input.json + cal_data.json

# gen_inputs.py
import json
import numpy as np

# pick a sample digit
sample = np.load("digit7.npy").astype(np.float32)   # shape (1,1,28,28)

with open("input.json", "w") as f:
    json.dump({
        "input_data": [sample.flatten().tolist()],
    }, f)

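# calibration set (used to pick quantization ranges); random noise is only a
# placeholder, real MNIST samples give more representative activation ranges
cal = np.random.randn(20, 1, 28, 28).astype(np.float32)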
with open("cal_data.json", "w") as f:
    json.dump({"input_data": [cal.flatten().tolist()]}, f)

3. Full EZKL command pipeline

# Install ezkl
pip install ezkl
# or: cargo install --locked ezkl

# Step 1: generate default settings
ezkl gen-settings -M network.onnx --settings-path settings.json

# Step 2: tune quantization precision automatically using the calibration data
ezkl calibrate-settings \
     -D cal_data.json \
     -M network.onnx \
     --settings-path settings.json \
     --target accuracy

# Step 3: compile the circuit
ezkl compile-circuit \
     -M network.onnx \
     --compiled-circuit network.compiled \
     --settings-path settings.json

# Step 4: download the SRS (Powers of Tau / Halo2 KZG SRS)
ezkl get-srs --settings-path settings.json

# Step 5: setup → produce pk + vk
ezkl setup \
     --compiled-circuit network.compiled \
     --vk-path vk.key \
     --pk-path pk.key

# Step 6: generate the witness
ezkl gen-witness \
     -D input.json \
     --compiled-circuit network.compiled \
     -O witness.json

# Step 7: prove
ezkl prove \
     --witness witness.json \
     --compiled-circuit network.compiled \
     --pk-path pk.key \
     --proof-path proof.json

# Step 8: verify off-chain
ezkl verify \
     --proof-path proof.json \
     --vk-path vk.key \
     --settings-path settings.json

# Step 9: export the Solidity verifier
ezkl create-evm-verifier \
     --vk-path vk.key \
     --sol-code-path Verifier.sol \
     --settings-path settings.json

Real Numbers (MNIST TinyCNN)

Metric	Value
ONNX model size	~63 KB (fp32; 15,770 parameters × 4 bytes)
Halo2 circuit constraints	~500,000
SRS size (k=18)	~50 MB
Setup time	~30 sec on M1
Prove time	~10-30 sec
Proof size	~10 KB
Verify time	~50 ms
EVM verify gas	~600,000-2,000,000

Larger models:

ResNet-18: ~1B constraints, prove ~30 min, gas ~5M
GPT-2 (single inference): ~10^11 constraints (close to infeasible)

Key Technique: Quantization

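zkML circuits work over a prime field (e.g., the BN254 scalar field, p ≈ 2^254), which has no floating point, so all arithmetic must be done on integers.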

fp32 weight w = 0.1234
quantized:    w_q = round(0.1234 * 2^16) = 8087
              (in field as integer)

fp32 mult:    a * b
in ZK:        a_q * b_q ≈ (a * b) * 2^32 ⇒ divide by 2^16 to rescale back to 2^16
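A minimal fixed-point sketch of the scheme above, assuming a single global scale of 2^16 and round-to-nearest. It shows why a product has to be rescaled: the raw product a_q * b_q sits at scale 2^32. Overflow checks and the mapping into the field are left out, and the function names are illustrative rather than any library's API.

```python
# Fixed-point quantization with scale 2^16 (assumption: one global scale,
# round-to-nearest, no overflow or range checks).

SCALE_BITS = 16
SCALE = 1 << SCALE_BITS

def quantize(x: float) -> int:
    return round(x * SCALE)

def dequantize(q: int) -> float:
    return q / SCALE

def q_mul(a_q: int, b_q: int) -> int:
    """Multiply two scale-2^16 values and rescale the result back to scale 2^16."""
    prod = a_q * b_q                                # ≈ a * b * 2^32
    return (prod + (SCALE >> 1)) >> SCALE_BITS     # rounded division by 2^16

w_q = quantize(0.1234)
x_q = quantize(-0.5)
y_q = q_mul(w_q, x_q)
print(w_q, x_q, y_q)        # 8087 -32768 -4043
print(dequantize(y_q))      # close to 0.1234 * -0.5 = -0.0617
```

In a circuit the division by 2^16 is not free: it is typically enforced by a witness for the quotient plus a range check on the remainder, which is one reason rescaling adds constraints.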

Quantization challenges:

Precision loss (int8 typically costs 1-3% accuracy)
ReLU max(0, x): field elements have no ordering, so the comparison needs a lookup
Sigmoid/Tanh: piecewise-linear approximation + lookup
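The sketch below illustrates why ReLU needs a lookup. It assumes the BN254 scalar field and a toy 8-bit signed range: negative numbers wrap to p - |x|, so comparing field elements says nothing about sign. Instead the prover shows that the pair (x, relu(x)) is a row of a precomputed table. The dict only mimics that membership check; it is not how a lookup argument (such as Halo2's) is actually proven.

```python
# Toy ReLU-by-lookup over the BN254 scalar field (assumption: inputs are
# already known to lie in [-2^(B-1), 2^(B-1)), which a circuit must enforce).

P = 21888242871839275222246405745257275088548364400416034343698204186575808495617
B = 8  # toy bit width; real tables are larger or values are split into limbs

def to_field(x: int) -> int:
    return x % P            # -1 becomes P - 1, a huge "positive" element

def from_field(f: int) -> int:
    return f - P if f > P // 2 else f   # read the upper half as negative

# Table of (input, relu(input)) rows, stored as field elements.
RELU_TABLE = {to_field(x): to_field(max(0, x))
              for x in range(-(1 << (B - 1)), 1 << (B - 1))}

def relu_lookup(f: int) -> int:
    # A value outside the table has no valid witness in a real circuit.
    if f not in RELU_TABLE:
        raise ValueError("value outside the range covered by the lookup table")
    return RELU_TABLE[f]

print(to_field(-3) > to_field(5))                 # True: naive field comparison is meaningless
print(from_field(relu_lookup(to_field(-3))))      # 0
print(from_field(relu_lookup(to_field(5))))       # 5
```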

EZKL's calibrate-settings automatically chooses the lowest precision that still meets the accuracy threshold.
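The idea of searching for the lowest precision that still meets an accuracy threshold can be sketched on a single quantized linear layer. Everything here is an assumption for illustration: random weights and random calibration data, max absolute error as the accuracy metric, and a search over scale bits only. This is not EZKL's calibration algorithm, which tunes more than the scale.

```python
import numpy as np

def quantized_linear(x, w, b, scale_bits):
    """Quantize x, w, b with scale 2^scale_bits, compute W x + b in integers, dequantize."""
    s = 1 << scale_bits
    xq = np.round(x * s).astype(np.int64)
    wq = np.round(w * s).astype(np.int64)
    bq = np.round(b * s * s).astype(np.int64)   # bias must sit at the product scale s^2
    acc = xq @ wq.T + bq                          # exact integer accumulation
    return acc / float(s * s)

def pick_scale_bits(x_cal, w, b, tol, candidates=range(2, 17)):
    """Return (bits, err) for the smallest scale_bits whose max abs error is <= tol."""
    reference = x_cal @ w.T + b                   # float baseline
    for bits in candidates:
        err = float(np.max(np.abs(quantized_linear(x_cal, w, b, bits) - reference)))
        if err <= tol:
            return bits, err
    return None, None                             # no candidate met the threshold

# Hypothetical layer and calibration batch (random placeholders, not MNIST).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(10, 784))
b = rng.normal(0.0, 0.05, size=10)
x_cal = rng.random((20, 784))

bits, err = pick_scale_bits(x_cal, w, b, tol=1e-2)
print(bits, err)
```

Scanning from low to high precision returns the cheapest setting that passes, mirroring the accuracy/cost trade-off described in Pitfall 1.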

EZKL vs Giza vs Other zkML

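Project	Maintainer	Backend	Input format	Status
EZKL	Zkonduit	Halo2 + KZG	ONNX	Mainstream, production
Giza	Giza Tech	Cairo + STARK	PyTorch	StarkNet ecosystem
DDKang zkml	Daniel Kang	Halo2	ONNX	Academic prototype
Modulus Labs	Modulus	In-house	In-house	Closed source
opML (ORA)	ORA	Optimistic verification	PyTorch	Not actually ZK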

EZKL is the de facto standard (easiest to get started with, and the widest ONNX op support).

Applications

1. Worldcoin iris verification

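Iris-recognition inference runs locally on the Orb device. In principle, zkML could let a user prove "my iris matches under the World ID model" without uploading the iris image; this is not yet in production.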

2. Modulus Labs - Rocky AI

Rocky is an on-chain AI agent that uses ZK proofs to show that each decision (e.g., rebalancing a DeFi position) comes from the committed model.

3. Bittensor + zkML

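Bittensor subnets provide AI services; in the future, zkML could let clients verify that returned results come from genuine inference of the claimed model.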

4. ZK Lottery / Game

"AI NPC behavior" in on-chain games can be proven with zkML, preventing the operator from cheating.

Engineering Pitfalls

Pitfall 1: Quantization precision vs constraint count

Raising precision (int16 → int32) brings accuracy closer to fp32 but roughly doubles the constraint count (more lookups), so tune the accuracy/cost trade-off.
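One generic way to see why more bits means more lookups: a range check against a lookup table with 2^limb_bits rows handles one limb at a time, so a value is split into ceil(bits / limb_bits) limbs, each needing its own lookup. Doubling the bit width doubles the lookups. This is a sketch of the standard limb-decomposition technique with a hypothetical limb_bits parameter, not necessarily how EZKL lays out its lookups.

```python
# Limb decomposition for lookup-based range checks (generic technique; the
# 8-bit table size below is an illustrative assumption).

def n_limbs(total_bits: int, limb_bits: int) -> int:
    """Lookups needed to range-check a total_bits value with a 2^limb_bits-row table."""
    return -(-total_bits // limb_bits)          # ceiling division

def decompose(value: int, total_bits: int, limb_bits: int) -> list[int]:
    """Split value in [0, 2^total_bits) into little-endian limbs of limb_bits bits.
    Each limb is then looked up in the table {0, ..., 2^limb_bits - 1}.
    If total_bits is not a multiple of limb_bits, the top limb needs a tighter check (not shown)."""
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value out of range")
    mask = (1 << limb_bits) - 1
    return [(value >> (i * limb_bits)) & mask for i in range(n_limbs(total_bits, limb_bits))]

def recompose(limbs: list[int], limb_bits: int) -> int:
    """The circuit constrains value == sum(limb_i * 2^(i * limb_bits))."""
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs))

for bits in (8, 16, 32):
    print(f"{bits}-bit value, 8-bit table: {n_limbs(bits, 8)} lookups")
```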

Pitfall 2: ZK-unfriendly model structure

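BatchNorm: fold it into the preceding layer at inference time
Softmax: very expensive (exp + division); output logits instead whenever possible
LayerNorm: likewise expensive (division and square root)
Dropout: no impact (it is a no-op at inference)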
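Two of these rewrites can be checked numerically. The sketch folds a BatchNorm that follows a dense layer into new weights and bias (for a Conv2d the same per-channel scale multiplies each output filter), then confirms that argmax over logits equals argmax over softmax, which is why a classifier can expose logits instead of computing softmax in-circuit. The random weights and statistics are placeholders.

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (W x + b - mean) / sqrt(var + eps) + beta into W' x + b'.
    w: (out, in); b, gamma, beta, mean, var: (out,)."""
    s = gamma / np.sqrt(var + eps)              # per-output-channel scale
    return w * s[:, None], (b - mean) * s + beta

rng = np.random.default_rng(0)
out_dim, in_dim = 4, 6
w, b = rng.normal(size=(out_dim, in_dim)), rng.normal(size=out_dim)
gamma, beta = rng.normal(size=out_dim), rng.normal(size=out_dim)
mean, var = rng.normal(size=out_dim), rng.random(out_dim) + 0.5
x = rng.normal(size=in_dim)

# Unfolded: linear layer followed by BatchNorm in inference mode.
bn_out = gamma * (w @ x + b - mean) / np.sqrt(var + 1e-5) + beta
w_f, b_f = fold_batchnorm(w, b, gamma, beta, mean, var)
print(np.allclose(bn_out, w_f @ x + b_f))       # True

# Softmax is strictly increasing in each logit, so the predicted class is unchanged.
logits = w_f @ x + b_f
e = np.exp(logits - logits.max())
softmax = e / e.sum()
print(np.argmax(logits) == np.argmax(softmax))  # True
```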

Pitfall 3: Proofs are too large

A proof of 10 KB or more is also expensive to pass as EVM calldata (10,000 bytes × 16 gas/byte = 160k gas for calldata alone, assuming every byte is nonzero). RISC Zero / Plonky3 aim to shrink the proof to ~1 KB with a final wrapping step.
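The calldata figure above is the all-nonzero worst case. A sketch of EIP-2028 pricing (16 gas per nonzero byte, 4 per zero byte); it ignores the 21,000-gas transaction base cost and later repricing such as EIP-7623, so treat it as an estimate of the calldata component only.

```python
def calldata_gas(data: bytes) -> int:
    """EIP-2028 calldata pricing: 4 gas per zero byte, 16 gas per nonzero byte.
    Excludes the transaction base cost and newer rules such as EIP-7623's floor price."""
    zero_bytes = data.count(0)
    return 4 * zero_bytes + 16 * (len(data) - zero_bytes)

print(calldata_gas(b"\x01" * 10_000))   # 160000: the all-nonzero case from the text
print(calldata_gas(bytes(10_000)))      # 40000: all-zero lower bound
```

Real proofs are mostly field elements and curve points, so nearly all bytes are nonzero and the 16 gas/byte estimate is close to the actual calldata cost.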

Pitfall 4: EVM verifier gas limits

For complex models the verifier may exceed the 30M block gas limit. Mitigations: (a) run verification on an aggregator chain (Aleo, StarkNet); (b) use an application-specific rollup; (c) use a smaller vkey from the KZG ceremony.

Sample Run (reference output)

$ ezkl prove --proof-path proof.json ...
[INFO] generating proof for circuit (k=17)...
[INFO] num constraints: 487,652
[INFO] num lookup ops: 23,108
[INFO] proving... (10.4 sec)
[INFO] proof saved to proof.json (8,924 bytes)

$ ezkl verify ...
[INFO] proof valid ✓

$ ezkl create-evm-verifier ...
[INFO] gas estimate: ~720,000
[INFO] verifier saved to Verifier.sol

Quick Reference

EZKL 7-step pipeline:
  gen-settings → calibrate → compile → setup → witness → prove → verify

Output proof size:
  small CNN:      ~10 KB
  ResNet-18:      ~30 KB (with aggregation)
  Big LLM:        not feasible yet

Verify gas (EVM):
  ~500k - 2M depending on circuit size

Interview Questions

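Q: Why is zkML hard? Is it realistic to turn an entire GPT-4 inference into ZK? A: Three main difficulties: (1) ML uses floating point and large tensors, so it must be quantized (losing precision); (2) nonlinear activations such as ReLU/Softmax need lookups; (3) the constraint count grows roughly linearly with model size. A GPT-4 inference is on the order of 10^15 FLOPs, and the ZK constraint count would be of the same order, which is completely infeasible today. In the short term, zkML only supports small models.
Q: How does zkML differ from opML? A: zkML proves every inference with ZK (trustless but expensive). opML is optimistic: results are accepted by default and only disputed through a fraud proof when someone challenges them, which makes it ~1000× cheaper. The trade-offs are latency (opML needs a challenge period) and the trust assumption (opML assumes at least one honest watcher).
Q: How does EZKL handle floats? A: Through quantization: fp32 weights and inputs are rounded to a fixed precision (e.g., int16 with scale = 2^13); weights are fixed when the circuit is compiled, inputs when the witness is generated. All arithmetic happens as integers in the field; the final output is either dequantized or stays quantized in the circuit. EZKL's calibrate-settings selects the best precision automatically.
Q: Which zkML application directions are you optimistic about? A: Short term: (a) on-chain verification of small models (credit scoring, fraud detection); (b) privacy-preserving ML (healthcare + finance); (c) trustworthy AI agents (Modulus's direction). Longer term, opML/TEE may prove more practical than pure zkML; the three will likely coexist rather than be mutually exclusive.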

Tomorrow's Preview

Day 239: opML and TEE-ML. Optimistic ML verification (ORA) vs TEE-based ML (Phala/Marlin) vs pure zkML, comparing the strengths and weaknesses of each.