AI Day 54

AI Day 54: Hands-on (4): LoRA Fine-tuning in Practice: Training Your Own Model

2026-05-25

Date: 2026-05-25 | Phase: Phase 5 · Hands-on Practice (Day 51-60) | Topic: LoRA Fine-tuning in Practice

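Learning Path

AI/LLM In-Depth Technical Study: 60-Day Plan
├── Phase 1: Model Fundamentals (Day 1-15) ✅
│   ├── Day 1: Transformer & LLM Basics ✅
│   ├── Day 2: Quantization & Local Deployment ✅
│   ├── Day 3: The Full Training Pipeline ✅
│   ├── Day 4: Prompt Engineering ✅
│   ├── Day 5: RAG Architecture ✅
│   ├── Day 6: Vector Databases & Embeddings ✅
│   ├── Day 7: Fine-tuning Techniques ✅
│   ├── Day 8: Inference Optimization ✅
│   ├── Day 9: Long-Context Techniques ✅
│   ├── Day 10: Multimodal Models ✅
│   ├── Day 11: Reasoning Models ✅
│   ├── Day 12: Agent Frameworks ✅
│   ├── Day 13: The MCP Protocol ✅
│   ├── Day 14: Model Evaluation ✅
│   └── Day 15: Phase 1 Review ✅
├── Phase 2: Engineering Practice (Day 16-30) ✅
│   ├── Day 16: LLM Application Architecture ✅
│   ├── Day 17: Safety & Guardrails ✅
│   ├── Day 18: Observability ✅
│   ├── Day 19: Production RAG · Parsing & Chunking ✅
│   ├── Day 20: Production RAG · Retrieval & Reranking ✅
│   ├── Day 21: Production RAG · Evaluation & Iteration ✅
│   ├── Day 22: Agent State & Recovery ✅
│   ├── Day 23: Agent Cost Optimization ✅
│   ├── Day 24: Multi-Agent Systems ✅
│   ├── Day 25: Agent Testing & Deployment ✅
│   ├── Day 26: LLM Cost Engineering ✅
│   ├── Day 27: Multi-Model Orchestration ✅
│   ├── Day 28: LLM Application Testing ✅
│   ├── Day 29: Enterprise LLM Platforms ✅
│   └── Day 30: Phase 2 Review ✅
├── Phase 3: AI in Finance & Retail (Day 31-42) ✅
│   ├── Day 31: AI Risk Control in Finance ✅
│   ├── Day 32: Robo-Advisory & Quant ✅
│   ├── Day 33: Compliance & RegTech ✅
│   ├── Day 34: End-to-End Credit AI ✅
│   ├── Day 35: Finance AI Review ✅
│   ├── Day 36: Retail AI Recommendation ✅
│   ├── Day 37: Intelligent Customer Service ✅
│   ├── Day 38: Supply Chain AI ✅
│   ├── Day 39: Intelligent Marketing ✅
│   ├── Day 40: Retail AI Review ✅
│   ├── Day 41: CeFi × DeFi × AI Convergence ✅
│   └── Day 42: AI Convergence Case Studies & Careers ✅
├── Phase 4: Interview Sprint (Day 43-50) ✅
│   ├── Day 43: System Design · LLM Platform ✅
│   ├── Day 44: System Design · RAG System ✅
│   ├── Day 45: System Design · Agent System ✅
│   ├── Day 46: System Design · Recommender System ✅
│   ├── Day 47: Interview · AI Product ✅
│   ├── Day 48: Interview · AI Architecture ✅
│   ├── Day 49: Interview · AI Behavioral ✅
│   └── Day 50: Study Review ✅
└── Phase 5: Hands-on Practice (Day 51-60)
    ├── Day 51: Local LLM Deployment, End to End ✅
    ├── Day 52: RAG in Practice: From Documents to Q&A ✅
    ├── Day 53: Advanced RAG: Evaluation, Optimization & Productionization ✅
    ├── Day 54: LoRA Fine-tuning in Practice: Train Your Own Model ← You are here
    ├── Day 55: Agent Development: Building a Tool-Calling Agent
    ├── Day 56: MCP Server Development: Extending What AI Can Do
    ├── Day 57: Multimodal Apps: Image-Text Understanding & Document Analysis
    ├── Day 58: Full-Stack AI App Development: Frontend-Backend Integration
    ├── Day 59: Performance Tuning & Cost in Practice
    └── Day 60: Wrap-up & Portfolio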

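Core Concepts

Fine-tuning = Injecting Domain Knowledge & Style

Day 7 covered the theory of fine-tuning; today we get hands-on!

A general-purpose base model (e.g. Qwen2.5-7B):
├── Knows a lot of general knowledge
├── But doesn't know what is in your notes
├── Doesn't know your writing style
├── Doesn't know your domain preferences
└── Gives generic, surface-level answers

After fine-tuning:
├── Keeps its general knowledge
├── ✅ Knows the domain knowledge in your notes
├── ✅ Imitates your style of expression
├── ✅ Answers more precisely in your domain
└── ✅ Acts like an assistant that "gets you"

RAG vs Fine-tuning: Complementary, Not Competing

Days 52-53 built RAG; today we fine-tune.
They solve different problems:

RAG (Retrieval-Augmented Generation):
├── Strengths: real-time updates, traceable sources, the model itself is unchanged
├── Good for: factual lookups, up-to-date information, answers that need citations
└── Analogy: hand the model a "reference book" to consult while it answers

Fine-tuning (LoRA):
├── Strengths: internalized knowledge, style imitation, faster inference (no retrieval step)
├── Good for: fixed domains, consistent style, offline scenarios
└── Analogy: the model "goes to class" and commits what it learns to memory

Best practice = RAG + fine-tuning:
  Fine-tuning makes the model "say it right" (domain knowledge + answer style)
  RAG gives the model "grounds" (retrieves the latest content + cites sources)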

Today's Goal

Complete one full fine-tuning run:

Environment → Data prep → Training → Debugging → Export → Evaluation
     ↓            ↓           ↓          ↓          ↓          ↓
  Unsloth     QA pairs      QLoRA      Loss       GGUF    Comparison
                  ↓
            Alpaca format
                  ↓
          500-2000 samples

End result: a fine-tuned model that runs in Ollama

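Topic 1: Environment Setup

Unsloth: The Fastest Fine-tuning Framework

Why Unsloth?

On Day 7 we surveyed several fine-tuning frameworks:
- HuggingFace PEFT: the most standard, but average speed
- Axolotl: flexible configuration, but a steep learning curve
- LLaMA-Factory: good Chinese-language ecosystem, but slower to update
- Unsloth: fastest and most memory-efficient ← pick this one!

Unsloth's core advantages (the speed and memory figures are Unsloth's own published claims; actual gains vary by model and settings):
├── Speed: 2-5x faster than standard PEFT (hand-optimized backward pass)
├── Memory: 60-80% less VRAM (gradient checkpointing + kernel fusion)
├── Ease of use: minimal API, training starts in a few lines of code
├── Compatibility: supports QLoRA, LoRA, and full fine-tuning
└── Export: exports GGUF directly (no manual conversion)

An RTX 4060 with 8 GB of VRAM is enough for QLoRA on a 7B model!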

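Installation & Dependencies

# === Step 1: Create a dedicated virtual environment ===
# Isolate the fine-tuning environment to avoid conflicts with the RAG project

conda create -n finetune python=3.11 -y
conda activate finetune

# === Step 2: Install PyTorch (CUDA 12.1) ===
# The RTX 4060 supports CUDA 12.x

pip install torch==2.4.0 torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu121

# === Step 3: Install Unsloth ===
# Install the latest version from GitHub (the PyPI release may lag behind)
# The extra tag encodes CUDA version and GPU class ("ampere" covers RTX 30/40-series);
# check the Unsloth README for the tag that matches your exact PyTorch version

pip install "unsloth[cu121-ampere] @ git+https://github.com/unslothai/unsloth.git"

# === Step 4: Install the remaining dependencies ===
pip install datasets transformers accelerate peft
pip install bitsandbytes  # 4-bit quantization support
pip install trl            # training loop utilities (SFTTrainer)
pip install wandb          # (optional) training visualization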

CUDA Verification

"""
verify_env.py
Check that the fine-tuning environment is ready
"""

import torch
import sys

def verify_environment():
    """一键验证所有依赖"""
    results = {}

    # 1. Python version
    results["python"] = sys.version.split()[0]
    print(f"Python: {results['python']}")

    # 2. PyTorch + CUDA
    results["torch"] = torch.__version__
    results["cuda_available"] = torch.cuda.is_available()
    results["cuda_version"] = torch.version.cuda if torch.cuda.is_available() else "N/A"
    print(f"PyTorch: {results['torch']}")
    print(f"CUDA Available: {results['cuda_available']}")
    print(f"CUDA Version: {results['cuda_version']}")

    # 3. GPU info
    if torch.cuda.is_available():
        gpu_name = torch.cuda.get_device_name(0)
        gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1024**3  # attribute is total_memory
        results["gpu_name"] = gpu_name
        results["gpu_memory_gb"] = round(gpu_mem, 1)
        print(f"GPU: {gpu_name}")
        print(f"GPU Memory: {results['gpu_memory_gb']} GB")
    else:
        print("WARNING: No GPU detected! Fine-tuning will be extremely slow.")
        return results

    # 4. Unsloth
    try:
        from unsloth import FastLanguageModel
        results["unsloth"] = "OK"
        print(f"Unsloth: OK")
    except ImportError as e:
        results["unsloth"] = f"FAILED: {e}"
        print(f"Unsloth: FAILED - {e}")

    # 5. bitsandbytes (4-bit quantization)
    try:
        import bitsandbytes as bnb
        results["bitsandbytes"] = bnb.__version__
        print(f"bitsandbytes: {results['bitsandbytes']}")
    except ImportError:
        results["bitsandbytes"] = "FAILED"
        print(f"bitsandbytes: FAILED")

    # 6. Memory check: is there room to load and train a 4-bit 7B model?
    print("\n--- Quick Memory Test ---")
    free_mem = torch.cuda.mem_get_info()[0] / 1024**3
    print(f"Free GPU Memory: {free_mem:.1f} GB")

    if free_mem >= 5.0:
        print("Memory OK: enough to load and train a 4-bit 7B model")
    elif free_mem >= 3.0:
        print("Memory Tight: training is possible, but reduce batch_size")
    else:
        print("Memory Low: close other programs that are using the GPU")

    return results

if __name__ == "__main__":
    results = verify_environment()

    all_ok = (
        results.get("cuda_available", False)
        and results.get("unsloth") == "OK"
        and results.get("bitsandbytes") != "FAILED"
    )

    print(f"\n{'='*50}")
    if all_ok:
        print("ALL CHECKS PASSED — Ready to fine-tune!")
    else:
        print("SOME CHECKS FAILED — Fix issues above before continuing.")

RTX 4060 8GB Optimization Tips

8 GB of VRAM = enough, but you have to budget carefully

VRAM budget (rough estimates):
├── 4-bit quantized model: ~3.5 GB (Qwen2.5-7B-Q4)
├── LoRA weights:          ~0.2 GB (rank=16)
├── Optimizer states:      ~0.8 GB (Adam states)
├── Gradients:             ~0.5 GB (gradient checkpointing)
├── Activations:           ~1.5 GB (batch_size=1)
└── Other:                 ~1.5 GB (framework overhead)
                           ─────────
                           ~8.0 GB ← just fits!
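A quick way to sanity-check the ~3.5 GB figure for the 4-bit weights is bytes ≈ parameter count × bits per weight ÷ 8. The sketch below assumes Qwen2.5-7B has roughly 7.6 billion parameters and that every weight is stored at exactly the given bit width. Real 4-bit loaders keep some tensors (such as embeddings and quantization scales) at higher precision, so actual usage will be somewhat higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Rough size of model weights in GiB: params × bits / 8 bytes."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1024**3


if __name__ == "__main__":
    qwen_7b_params = 7.6e9  # assumption: approximate parameter count of Qwen2.5-7B
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_memory_gib(qwen_7b_params, bits):.1f} GiB")
```

Under these assumptions it prints about 14.2 GiB (16-bit), 7.1 GiB (8-bit) and 3.5 GiB (4-bit). Only the 4-bit version leaves room for training on an 8 GB card.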

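If you hit OOM, try these in order:
1. gradient_checkpointing = True  (on by default)
2. per_device_train_batch_size = 1
3. max_seq_length = 1024 → 512
4. Turn off wandb visualization (saves a little memory)
5. Lower the rank: 16 → 8

Key point: stop Ollama before training!
  Ollama keeps a recently used model in VRAM even while idle (often ~1 GB or more)
  taskkill /f /im ollama.exe (Windows)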

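Topic 2: Training Data Preparation

Extracting QA Pairs from Notes

"""
prepare_data.py
Extract fine-tuning training data from study notes

Data sources:
- Study notes for Day 1-53 (docs/ai/day*.md)
- The existing Golden QA set (built on Day 53)
- Interview question answers (Day 47-49)

Target size: 500-2000 samples
Too few: the model barely learns anything
Too many: long training time, and a large narrow-domain set can erode the model's general abilities
500-2000 samples: a rule-of-thumb sweet spot for domain adaptation
"""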

import json
import re
from pathlib import Path

class NoteQAExtractor:
    """从学习笔记中提取 QA 训练对"""

    def __init__(self, notes_dir: str = "docs/ai"):
        self.notes_dir = Path(notes_dir)
        self.qa_pairs = []

    def extract_all(self) -> list[dict]:
        """从所有笔记中提取 QA 对"""
        note_files = sorted(self.notes_dir.glob("day*.md"))
        print(f"Found {len(note_files)} note files")

        for note_file in note_files:
            content = note_file.read_text(encoding="utf-8")
            day_num = self._parse_day_number(note_file.name)

            # Strategy 1: generate QA from headings and their content
            self._extract_from_sections(content, note_file.name, day_num)

            # Strategy 2: generate QA from docstrings in code blocks
            self._extract_from_code_comments(content, note_file.name, day_num)

            # Strategy 3: generate QA from the "reflection" (思考) sections
            self._extract_from_reflections(content, note_file.name, day_num)

        print(f"Total QA pairs extracted: {len(self.qa_pairs)}")
        return self.qa_pairs

    def _parse_day_number(self, filename: str) -> int:
        match = re.search(r"day(\d+)", filename)
        return int(match.group(1)) if match else 0

    def _extract_from_sections(self, content: str, filename: str, day: int):
        """从 ## 标题和下方内容提取 QA"""
        sections = re.split(r"\n## ", content)

        for section in sections[1:]:  # skip the first chunk (text before the first heading)
            lines = section.strip().split("\n")
            title = lines[0].strip()

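            # Skip non-knowledge headings (keywords match the notes' Chinese titles:
            # "Learning Path", "Tomorrow's Preview", "Resources")
            if any(skip in title for skip in ["学习路径", "明日预告", "资源"]):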
                continue

            body = "\n".join(lines[1:]).strip()
            if len(body) < 100:
                continue

            # Turn the heading into a question
            question = self._title_to_question(title)
            if question:
                # Use the key content as the answer (kept short)
                answer = self._extract_answer(body, max_len=500)
                if answer:
                    self.qa_pairs.append({
                        "instruction": question,
                        "input": "",
                        "output": answer,
                        "source": filename,
                        "day": day,
                    })

    def _extract_from_code_comments(self, content: str, filename: str, day: int):
        """从代码块的注释/docstring提取 QA"""
        code_blocks = re.findall(r"```(?:python|bash)?\n(.*?)```", content, re.DOTALL)

        for block in code_blocks:
            # Extract the first docstring (plain # comments are not used)
            docstring = re.search(r'"""(.*?)"""', block, re.DOTALL)
            if docstring:
                doc = docstring.group(1).strip()
                if len(doc) > 50:
                    # 第一行通常是"标题"
                    doc_lines = doc.split("\n")
                    title = doc_lines[0].strip()
                    explanation = "\n".join(doc_lines[1:]).strip()

                    if explanation:
                        self.qa_pairs.append({
                            "instruction": f"请解释：{title}",
                            "input": "",
                            "output": explanation[:500],
                            "source": filename,
                            "day": day,
                        })

    def _extract_from_reflections(self, content: str, filename: str, day: int):
        """Extract QA from the "reflection" (思考) sections"""
        # Match headings of the form: ### 思考N：title
        reflections = re.findall(
            r"### 思考\d+[：:](.*?)\n```\n(.*?)```",
            content,
            re.DOTALL,
        )

        for title, body in reflections:
            title = title.strip().split("/")[0].strip()
            if len(body.strip()) > 100:
                self.qa_pairs.append({
                    "instruction": f"你对「{title}」有什么思考？",
                    "input": "",
                    "output": body.strip()[:500],
                    "source": filename,
                    "day": day,
                })

    def _title_to_question(self, title: str) -> str:
        """将章节标题转化为自然语言问题"""
        # 移除英文部分和特殊字符
        cn_title = title.split("/")[0].strip()
        cn_title = re.sub(r"[#*`]", "", cn_title).strip()

        if not cn_title or len(cn_title) < 4:
            return ""

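        # Build the question (templates stay in Chinese to match the notes):
        # "What is X?", "What is the core principle of X?", "Please explain X in detail."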
        question_templates = [
            f"请解释什么是{cn_title}？",
            f"{cn_title}的核心原理是什么？",
            f"请详细说明{cn_title}。",
        ]
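        # Rotate between templates. Use a deterministic index: Python's built-in
        # hash() of a str is randomized per process, so it would change between runs.
        idx = sum(ord(ch) for ch in cn_title) % len(question_templates)
        return question_templates[idx]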

    def _extract_answer(self, body: str, max_len: int = 500) -> str:
        """从正文提取关键内容作为答案"""
        # 优先提取 ``` 代码块内的文字说明（非代码的）
        text_blocks = re.findall(r"```\n(.*?)```", body, re.DOTALL)

        # Filter out pure code blocks, keep prose/diagram blocks
        text_content = []
        for block in text_blocks:
            # No import/def/class/from/return keywords → treat the block as prose
            if not re.search(r"\b(import|def |class |from |return )\b", block):
                text_content.append(block.strip())

        if text_content:
            return "\n\n".join(text_content)[:max_len]

        # No prose blocks: fall back to the first N characters of the body
        plain = re.sub(r"```.*?```", "", body, flags=re.DOTALL)
        plain = re.sub(r"\n{3,}", "\n\n", plain).strip()
        return plain[:max_len] if len(plain) > 50 else ""

    def save(self, output_path: str = "train_data.json"):
        """保存为 JSON 文件"""
        with open(output_path, "w", encoding="utf-8") as f:
            json.dump(self.qa_pairs, f, ensure_ascii=False, indent=2)
        print(f"Saved {len(self.qa_pairs)} QA pairs to {output_path}")


if __name__ == "__main__":
    extractor = NoteQAExtractor("docs/ai")
    extractor.extract_all()
    extractor.save("train_data.json")

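Alpaca / ShareGPT Format Conversion

"""
format_converter.py
Convert QA pairs into the standard formats that fine-tuning frameworks expect

Two mainstream formats:
1. Alpaca format: the most universal; Unsloth supports it out of the box
2. ShareGPT format: multi-turn conversations; commonly used with LLaMA-Factory
"""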

import json

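# === Alpaca format ===
# Supported by Unsloth, Axolotl, and LLaMA-Factory
# The simplest and most direct; suited to single-turn QA

ALPACA_TEMPLATE = {
    "instruction": "the user's question/instruction",
    "input": "optional extra context (empty in most samples)",
    "output": "the answer the model should give",
}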

def convert_to_alpaca(qa_pairs: list[dict]) -> list[dict]:
    """
    确保数据符合 Alpaca 格式

    Alpaca 格式示例：
    {
        "instruction": "LoRA微调的核心原理是什么？",
        "input": "",
        "output": "LoRA通过在预训练模型的权重矩阵旁添加低秩分解矩阵..."
    }
    """
    alpaca_data = []
    for qa in qa_pairs:
        alpaca_data.append({
            "instruction": qa["instruction"],
            "input": qa.get("input", ""),
            "output": qa["output"],
        })
    return alpaca_data


# === ShareGPT format ===
# Supports multi-turn conversations

def convert_to_sharegpt(qa_pairs: list[dict]) -> list[dict]:
    """
    转换为 ShareGPT 格式

    ShareGPT 格式示例：
    {
        "conversations": [
            {"from": "human", "value": "LoRA微调的核心原理是什么？"},
            {"from": "gpt", "value": "LoRA通过..."}
        ]
    }
    """
    sharegpt_data = []
    for qa in qa_pairs:
        conversation = {
            "conversations": [
                {"from": "human", "value": qa["instruction"]},
                {"from": "gpt", "value": qa["output"]},
            ]
        }
        sharegpt_data.append(conversation)
    return sharegpt_data


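# === Hand-written high-quality QA pairs ===
# Auto-extracted QA pairs vary in quality
# Adding 50-100 high-quality pairs by hand can noticeably improve results
# (Shown in English here for readability; write them in the same language as your notes.)

MANUAL_QA = [
    {
        "instruction": "What is LoRA fine-tuning? How does it differ from full fine-tuning?",
        "input": "",
        "output": (
            "LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method. "
            "Core idea: instead of modifying the original weights W, it adds two small matrices A and B "
            "alongside them, so that W' = W + BA. A and B are low-rank, with only 0.1-1% of the original parameter count.\n\n"
            "Differences from full fine-tuning:\n"
            "1. Parameters: LoRA trains only the added A/B matrices (tens of MB); full fine-tuning trains every parameter (tens of GB)\n"
            "2. Memory: LoRA can train a 7B model on an 8GB GPU; full fine-tuning of a 7B model needs 60-80GB\n"
            "3. Speed: LoRA is 2-5x faster\n"
            "4. Quality: on downstream tasks, LoRA comes close to full fine-tuning\n"
            "5. Flexibility: LoRA weights can be hot-swapped, so one base model can serve many LoRAs"
        ),
    },
    {
        "instruction": "Which scenarios suit RAG and which suit fine-tuning? How should I choose?",
        "input": "",
        "output": (
            "Scenarios suited to RAG:\n"
            "1. Frequently updated knowledge (e.g. news, documents)\n"
            "2. Sources must be cited and traceable\n"
            "3. Large, diverse data\n"
            "4. Precise factual lookups\n\n"
            "Scenarios suited to fine-tuning:\n"
            "1. A fixed domain with relatively stable knowledge\n"
            "2. A specific output style/format is required\n"
            "3. Faster inference is needed (no retrieval latency)\n"
            "4. Offline/edge deployment\n\n"
            "Best practice: combine the two. Fine-tuning makes the model 'know the field'; RAG supplies the 'evidence'."
        ),
    },
    # ... keep adding 50-100 high-quality QA pairs
]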
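Hand-written pairs and auto-extracted pairs end up in the same file, so it can help to validate every record against the Alpaca schema before saving. The sketch below is minimal. It assumes the three-key layout above (`instruction`, `input`, `output`, all strings) and allows extra keys such as `source` and `day`. `validate_alpaca` is a hypothetical helper name.

```python
REQUIRED_KEYS = ("instruction", "input", "output")


def validate_alpaca(record: dict) -> list[str]:
    """Return a list of problems found in one Alpaca-format record (empty = OK)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], str):
            problems.append(f"{key} is not a string")
    # instruction and output must carry content; input may legitimately be empty
    for key in ("instruction", "output"):
        if isinstance(record.get(key), str) and not record[key].strip():
            problems.append(f"{key} is empty")
    return problems


if __name__ == "__main__":
    good = {"instruction": "What is LoRA?", "input": "", "output": "A PEFT method."}
    bad = {"instruction": "   ", "output": 42}
    print(validate_alpaca(good))
    print(validate_alpaca(bad))
```

Running it over a record list before saving, for example with `[r for r in MANUAL_QA if validate_alpaca(r)]`, lists the records that need fixing.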

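Data Cleaning & Quality Check

"""
data_quality.py
Fine-tuning data quality check: garbage in, garbage out; data quality determines fine-tuning results

Learned on Day 3: data quality > data quantity
  100 high-quality samples > 10,000 low-quality samples
"""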

import json
import re
from collections import Counter

class DataQualityChecker:
    """微调数据质量检查器"""

    def __init__(self, data: list[dict]):
        self.data = data
        self.issues = []

    def run_all_checks(self) -> dict:
        """运行所有质量检查"""
        print(f"\n{'='*60}")
        print(f"Data Quality Check: {len(self.data)} samples")
        print(f"{'='*60}")

        stats = {
            "total": len(self.data),
            "passed": 0,
            "issues": [],
        }

        # Check 1: empty fields
        self._check_empty_fields()

        # Check 2: length distribution
        self._check_length_distribution()

        # Check 3: duplicates
        self._check_duplicates()

        # Check 4: instruction diversity
        self._check_instruction_diversity()

        # Check 5: output quality
        self._check_output_quality()

        # Check 6: format consistency
        self._check_format_consistency()

        stats["issues"] = self.issues
        stats["passed"] = stats["total"] - len(self.issues)

        self._print_summary(stats)
        return stats

    def _check_empty_fields(self):
        """检查空字段"""
        empty_count = 0
        for i, item in enumerate(self.data):
            if not item.get("instruction", "").strip():
                self.issues.append(f"#{i}: Empty instruction")
                empty_count += 1
            if not item.get("output", "").strip():
                self.issues.append(f"#{i}: Empty output")
                empty_count += 1
        print(f"  Empty fields: {empty_count}")

    def _check_length_distribution(self):
        """检查长度分布"""
        inst_lens = [len(d["instruction"]) for d in self.data]
        out_lens = [len(d["output"]) for d in self.data]

        print(f"\n  Instruction length:")
        print(f"    Min={min(inst_lens)} Max={max(inst_lens)} "
              f"Avg={sum(inst_lens)/len(inst_lens):.0f}")

        print(f"  Output length:")
        print(f"    Min={min(out_lens)} Max={max(out_lens)} "
              f"Avg={sum(out_lens)/len(out_lens):.0f}")

        # Flag outputs that are too short
        short_count = sum(1 for l in out_lens if l < 20)
        if short_count:
            print(f"  WARNING: {short_count} outputs shorter than 20 chars")

        # Flag outputs that are too long
        long_count = sum(1 for l in out_lens if l > 2000)
        if long_count:
            print(f"  WARNING: {long_count} outputs longer than 2000 chars")
            print(f"    Recommend truncating to avoid training instability")

    def _check_duplicates(self):
        """检查重复数据"""
        instructions = [d["instruction"] for d in self.data]
        dups = [inst for inst, count in Counter(instructions).items() if count > 1]

        if dups:
            print(f"\n  Duplicates: {len(dups)} duplicate instructions found")
            for d in dups[:5]:
                print(f"    - {d[:60]}...")
        else:
            print(f"\n  Duplicates: None found")

    def _check_instruction_diversity(self):
        """检查指令多样性"""
        first_words = [d["instruction"].split()[0] if d["instruction"] else ""
                       for d in self.data]
        word_dist = Counter(first_words).most_common(10)

        print(f"\n  Instruction starting words (top 10):")
        for word, count in word_dist:
            pct = count / len(self.data) * 100
            print(f"    '{word}': {count} ({pct:.1f}%)")

        # If a single prefix covers more than 50% of samples, diversity is too low
        if word_dist[0][1] / len(self.data) > 0.5:
            print(f"  WARNING: Low diversity — '{word_dist[0][0]}' dominates")

    def _check_output_quality(self):
        """检查输出质量"""
        low_quality = 0
        for i, item in enumerate(self.data):
            output = item["output"]

            # Check whether the output merely copies the instruction
            if output.strip() == item["instruction"].strip():
                self.issues.append(f"#{i}: Output == Instruction")
                low_quality += 1

            # Check for leftover markdown
            if output.startswith("```") or output.startswith("---"):
                self.issues.append(f"#{i}: Raw markdown in output")
                low_quality += 1

        print(f"\n  Low quality outputs: {low_quality}")

    def _check_format_consistency(self):
        """检查格式一致性"""
        has_input = sum(1 for d in self.data if d.get("input", "").strip())
        print(f"\n  Format:")
        print(f"    With input field: {has_input}/{len(self.data)}")
        print(f"    Without input: {len(self.data) - has_input}/{len(self.data)}")

    def _print_summary(self, stats: dict):
        """打印总结"""
        print(f"\n{'='*60}")
        print(f"Summary:")
        print(f"  Total samples: {stats['total']}")
        print(f"  Issues found:  {len(stats['issues'])}")
        print(f"  Quality rate:  {stats['passed']/stats['total']*100:.1f}%")

        if stats["total"] < 500:
            print(f"\n  RECOMMENDATION: Data too few ({stats['total']})")
            print(f"    Target: 500-2000 samples for good results")
        elif stats["total"] > 5000:
            print(f"\n  RECOMMENDATION: Consider reducing to 2000-3000")
            print(f"    Too much data may cause overfitting on 7B model")
        else:
            print(f"\n  Data quantity OK for fine-tuning.")
        print(f"{'='*60}")


def clean_data(data: list[dict]) -> list[dict]:
    """清洗数据：去重、去空、截断过长"""
    seen = set()
    cleaned = []

    for item in data:
        inst = item["instruction"].strip()
        output = item["output"].strip()

        # Skip empty values
        if not inst or not output:
            continue

        # Skip duplicates
        if inst in seen:
            continue
        seen.add(inst)

        # Truncate overly long outputs (keep the first 1500 characters)
        if len(output) > 1500:
            output = output[:1500] + "..."

        cleaned.append({
            "instruction": inst,
            "input": item.get("input", ""),
            "output": output,
        })

    print(f"Cleaned: {len(data)} → {len(cleaned)} samples")
    return cleaned
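`clean_data` only drops exact duplicate instructions. Pairs that differ only by full-width vs half-width punctuation or by letter case therefore slip through, and the auto-generated templates make this common. One possible extension uses only the standard library. It normalizes each instruction with Unicode NFKC and lowercases it. It then treats any pair whose `difflib.SequenceMatcher` ratio reaches a threshold as a duplicate. The 0.9 threshold is an assumption you should tune. The pairwise comparison is O(n²), which is acceptable for a few thousand samples but not for much larger sets.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """NFKC folds full-width characters (e.g. '？' -> '?'); then lowercase and trim."""
    return unicodedata.normalize("NFKC", text).lower().strip()


def dedupe_near(instructions: list[str], threshold: float = 0.9) -> list[str]:
    """Keep the first of any group of near-identical instructions."""
    kept: list[str] = []
    kept_norm: list[str] = []
    for inst in instructions:
        norm = normalize(inst)
        is_dup = any(
            difflib.SequenceMatcher(None, norm, other).ratio() >= threshold
            for other in kept_norm
        )
        if not is_dup:
            kept.append(inst)
            kept_norm.append(norm)
    return kept


if __name__ == "__main__":
    samples = ["请解释什么是LoRA？", "请解释什么是lora?", "RAG和微调的区别？"]
    print(dedupe_near(samples))
```

The two LoRA questions normalize to the same string, so only the first one is kept, while the RAG question survives.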

Topic 3: QLoRA Fine-tuning Pipeline

Base Model Selection

Why Qwen2.5-7B?

Lessons from the Day 51 deployment:

Model selection matrix:
┌──────────────────┬──────────────┬───────────┬─────────────┐
│ Model            │ VRAM (4-bit) │ Chinese   │ Fine-tuning │
│                  │              │ ability   │ support     │
├──────────────────┼──────────────┼───────────┼─────────────┤
│ Qwen2.5-7B       │ ~3.5 GB      │ ★★★★★     │ ★★★★★       │ ← pick this
│ Llama-3.1-8B     │ ~4.0 GB      │ ★★★       │ ★★★★★       │
│ Mistral-7B       │ ~3.8 GB      │ ★★        │ ★★★★        │
│ Yi-1.5-9B        │ ~4.5 GB      │ ★★★★      │ ★★★         │
│ Qwen2.5-3B       │ ~2.0 GB      │ ★★★★      │ ★★★★★       │   fallback option
└──────────────────┴──────────────┴───────────┴─────────────┘

Reasons:
1. Strongest Chinese ability: the notes mix Chinese and English, mostly Chinese
2. Memory-friendly: only ~3.5 GB after 4-bit quantization
3. First-class support in Unsloth
4. Active community, so answers to problems are easy to find

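Full Training Script

"""
train.py
Complete LoRA fine-tuning training script

Theory from Day 7:
  For a frozen weight W (d×k), LoRA adds low-rank matrices B (d×r) and A (r×k)
  W' = W + (α/r) × BA
  Only A and B are trained; the trainable share is typically well under 1% of the original

Today's practice:
  Base: Qwen2.5-7B (4-bit quantized)
  Method: QLoRA (quantization + LoRA)
  Hardware: RTX 4060 8GB
  Data: 500-2000 QA pairs from the notes
"""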

from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# ==========================================
# 1. Load the base model (4-bit quantized)
# ==========================================

print("Loading base model...")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B",     # Unsloth 优化版
    max_seq_length=2048,                   # 最大序列长度
    dtype=None,                            # 自动选择 (float16)
    load_in_4bit=True,                     # 4-bit 量化加载
)

print(f"Model loaded. GPU Memory: {torch.cuda.memory_allocated()/1024**3:.1f} GB")

# ==========================================
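# 2. Add LoRA adapters
# ==========================================

"""
Why these hyperparameters (Day 7 theory → actual values):

rank = 16
  Theory: a larger rank is more expressive but adds more parameters
  Practice: rank=16 is a common sweet spot for 7B models
  Analogy: a 16-dimensional "note space" is enough to encode domain knowledge

lora_alpha = 32
  Theory: alpha/rank is the scaling factor that controls how strongly LoRA acts
  Practice: alpha = 2 × rank is a common rule of thumb
  alpha/rank = 32/16 = 2, so the LoRA update is scaled up 2x

target_modules:
  Theory: which weight matrices get LoRA applied
  Practice: all attention projections (q/k/v/o_proj) plus all three MLP
  projections (gate/up/down_proj), passed explicitly below; this is the
  same set used in Unsloth's example notebooks
"""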

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                          # LoRA rank
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=32,                 # scaling factor
    lora_dropout=0.05,             # dropout against overfitting (Unsloth's fast path is optimized for 0)
    bias="none",                   # don't train biases
    use_gradient_checkpointing="unsloth",  # Unsloth's optimized variant
    random_state=42,
)

# Print the trainable parameter count. PEFT's helper accounts for packed 4-bit
# weights; summing p.numel() directly would undercount the frozen 4-bit params.
model.print_trainable_parameters()

# ==========================================
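# 3. Prepare the dataset
# ==========================================

"""
Alpaca prompt template
The standard Alpaca template (as used in Unsloth's example notebooks); reuse it
verbatim at inference time so training and inference formats match
"""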

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def formatting_prompts_func(examples):
    """将数据集格式化为 Alpaca 模板"""
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]

    texts = []
    for inst, inp, out in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(
            instruction=inst,
            input=inp if inp else "",
            output=out,
        )
        texts.append(text + tokenizer.eos_token)  # EOS teaches the model where to stop
    return {"text": texts}


# Load the dataset
dataset = load_dataset("json", data_files="train_data.json", split="train")
print(f"Dataset: {len(dataset)} samples")

# Apply the template
dataset = dataset.map(formatting_prompts_func, batched=True)

# ==========================================
# 4. Configure training arguments
# ==========================================

training_args = TrainingArguments(
    # --- Output ---
    output_dir="./output_lora",

    # --- Epochs ---
    num_train_epochs=3,                   # 3 epochs is enough for ~500 samples
    # With > 1500 samples, 2 epochs is recommended
    # With < 300 samples, 5 epochs is fine

    # --- Batch size ---
    per_device_train_batch_size=2,        # 2 fits in 8 GB of VRAM
    gradient_accumulation_steps=4,        # effective batch size = 2 × 4 = 8

    # --- Learning rate ---
    learning_rate=2e-4,                   # classic QLoRA learning rate
    lr_scheduler_type="cosine",           # cosine decay
    warmup_steps=10,                      # warmup steps

    # --- Precision ---
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),  # bf16 on Ampere and newer, incl. RTX 4060

    # --- Logging ---
    logging_steps=10,                     # print the loss every 10 steps
    save_strategy="epoch",                # save every epoch
    save_total_limit=3,                   # keep at most 3 checkpoints

    # --- Optimization ---
    optim="adamw_8bit",                   # 8-bit Adam saves VRAM
    weight_decay=0.01,
    max_grad_norm=0.3,                    # gradient clipping
    seed=42,
)

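# ==========================================
# 5. Create the Trainer and start training
# ==========================================

# Note: this keyword layout matches older TRL releases. Newer TRL moves
# dataset_text_field / max_seq_length / packing into SFTConfig and renames
# tokenizer= to processing_class=; pin trl or adapt the call to your version.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=training_args,
    packing=False,         # don't pack short samples together (kept simple)
)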

print("\n" + "=" * 60)
print("Starting training...")
print(f"  Samples: {len(dataset)}")
print(f"  Epochs:  {training_args.num_train_epochs}")
print(f"  Batch:   {training_args.per_device_train_batch_size} "
      f"× {training_args.gradient_accumulation_steps} "
      f"= {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"  LR:      {training_args.learning_rate}")
print(f"  GPU Mem: {torch.cuda.memory_allocated()/1024**3:.1f} GB")
print("=" * 60 + "\n")

# Start training!
train_result = trainer.train()

# Print training results
print(f"\n{'='*60}")
print(f"Training completed!")
print(f"  Total steps: {train_result.global_step}")
print(f"  Final loss:  {train_result.training_loss:.4f}")
print(f"  Runtime:     {train_result.metrics['train_runtime']:.0f}s")
print(f"  GPU Memory:  {torch.cuda.max_memory_allocated()/1024**3:.1f} GB peak")
print(f"{'='*60}")

# Save the LoRA adapter weights
model.save_pretrained("./output_lora/final")
tokenizer.save_pretrained("./output_lora/final")
print(f"\nLoRA weights saved to ./output_lora/final")

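Loss Monitoring & Training Logs

During training you will see logs like the following. They are illustrative and simplified: the HF Trainer actually prints dicts such as {'loss': ..., 'learning_rate': ..., 'epoch': ...} and does not report GPU memory.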

Step  10 | Loss: 2.3456 | LR: 1.8e-4 | GPU Mem: 7.2 GB
Step  20 | Loss: 1.8901 | LR: 2.0e-4 | GPU Mem: 7.3 GB
Step  30 | Loss: 1.5234 | LR: 2.0e-4 | GPU Mem: 7.3 GB
Step  40 | Loss: 1.2567 | LR: 1.9e-4 | GPU Mem: 7.3 GB
Step  50 | Loss: 1.0890 | LR: 1.8e-4 | GPU Mem: 7.3 GB
...
Step 180 | Loss: 0.4523 | LR: 0.3e-4 | GPU Mem: 7.3 GB
Step 190 | Loss: 0.4234 | LR: 0.1e-4 | GPU Mem: 7.3 GB
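The step numbers in a log like this follow from the batch settings in train.py, because each optimizer step consumes per_device_train_batch_size × gradient_accumulation_steps samples. The sketch below is a back-of-the-envelope estimate for a single GPU. The exact count the HF Trainer reports can differ by a step or so per epoch, depending on how it rounds partial batches.

```python
import math


def estimate_total_steps(num_samples: int, batch_size: int, grad_accum: int, epochs: int) -> int:
    """Approximate optimizer steps on one GPU: ceil(samples / effective batch) per epoch."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # train.py settings: batch 2, accumulation 4, 3 epochs
    print(estimate_total_steps(500, batch_size=2, grad_accum=4, epochs=3))
```

For 500 samples this prints 189, which is consistent with a run that ends around step 190.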

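A healthy training curve:
  Start:  loss ~2.0-2.5 (the model doesn't know the material yet)
  Middle: loss ~0.5-1.0 (learning)
  End:    loss ~0.3-0.5 (mostly learned)

An intuitive reading of loss values (rules of thumb):
  Loss 2.5:  the model is "guessing blindly"
  Loss 1.0:  the model gets the general direction
  Loss 0.5:  the model sounds convincing
  Loss 0.2:  it may be starting to overfit (careful!)
  Loss 0.05: almost certainly overfitting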
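The training loss the Trainer reports for a causal LM is the mean per-token cross-entropy in nats, so exp(loss) gives perplexity. Roughly, perplexity is the number of equally likely tokens the model is choosing between at each position. Converting the thresholds above makes them more concrete.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity = exp(mean token cross-entropy in nats)."""
    return math.exp(loss)


if __name__ == "__main__":
    for loss in (2.5, 1.0, 0.5, 0.2, 0.05):
        print(f"loss {loss:<4} -> perplexity {perplexity(loss):.2f}")
```

A loss of 2.5 corresponds to a perplexity of about 12, while 0.05 corresponds to about 1.05. At that point the model is almost deterministic on its training text, which is the memorization symptom described above.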

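Topic 4: Training Debugging

Loss Not Decreasing

Problem: after 50 steps, the loss is still hovering around 2.3

Possible causes and fixes:

Cause 1: learning rate too low
  Symptom: loss falls slowly or not at all
  Fix: raise the LR: 2e-4 → 5e-4 → 1e-3
  ┌───────────────────────────────────────┐
  │ learning_rate=5e-4  # try a larger LR │
  └───────────────────────────────────────┘

Cause 2: wrong data format
  Symptom: loss is high and unstable
  Check:
    # Print one formatted sample and inspect it
    print(dataset[0]["text"][:500])
    # Make sure instruction/output aren't swapped
    # Make sure the EOS token is appended correctly

Cause 3: LoRA rank too small
  Symptom: loss drops to some value and then plateaus
  Fix: rank: 8 → 16 → 32
  But note: a larger rank → more VRAM

Cause 4: poor data quality
  Symptom: loss fluctuates a lot and is unstable
  Fix: go back and check data quality (the quality checker from Topic 2)

Debugging steps:
  1. First confirm the data format is correct
  2. Run end to end quickly on a small dataset (50 samples)
  3. Once the loss goes down, switch to the full dataset
  4. Don't start with all 2000 samples!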
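Cause 2 is easy to check automatically before launching a long run. The sketch below assumes the Alpaca template from train.py and assumes that each formatted text should end with the tokenizer's EOS string. Pass `tokenizer.eos_token` in practice; the value used in the usage lines is only a placeholder. `check_formatted_sample` is a hypothetical helper.

```python
SECTION_MARKERS = ("### Instruction:", "### Input:", "### Response:")


def check_formatted_sample(text: str, eos_token: str) -> list[str]:
    """Return problems found in one formatted training text (empty list = looks fine)."""
    problems = []
    positions = [text.find(marker) for marker in SECTION_MARKERS]
    if -1 in positions:
        problems.append("missing a template section marker")
    elif positions != sorted(positions):
        problems.append("section markers are out of order")
    if not text.endswith(eos_token):
        problems.append("EOS token not appended")
    else:
        # Text between "### Response:" and the trailing EOS must not be blank
        response = text.split("### Response:", 1)[-1][: -len(eos_token)]
        if not response.strip():
            problems.append("empty response")
    return problems


if __name__ == "__main__":
    eos = "<|endoftext|>"  # placeholder: use tokenizer.eos_token in practice
    ok = "### Instruction:\nWhat is LoRA?\n\n### Input:\n\n\n### Response:\nLow-rank adapters." + eos
    print(check_formatted_sample(ok, eos))
    print(check_formatted_sample(ok[: -len(eos)], eos))
```

Running it over the first few hundred items of the formatted dataset catches both swapped fields and a missing EOS before any GPU time is spent.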

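Overfitting Detection

Overfitting = the model "memorizes answers" instead of "learning knowledge"

Signs of overfitting:
├── Training loss < 0.1 (too low)
├── Answers repeat the training data verbatim
├── Answers to questions outside the training set get worse
├── Answers become "mechanical" and inflexible
└── The model starts "inventing" content that appears in the training set but is barely relevant

How to detect it: hold out 10% of the data as a validation set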

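  # Split the data
  train_test = dataset.train_test_split(test_size=0.1, seed=42)
  train_dataset = train_test["train"]
  eval_dataset = train_test["test"]

  # Monitor validation loss during training
  training_args = TrainingArguments(
      ...
      eval_strategy="steps",   # called evaluation_strategy in older transformers releases
      eval_steps=50,           # evaluate every 50 steps
  )
  # ...and pass train_dataset and eval_dataset=eval_dataset to SFTTrainer

  # Healthy: eval_loss and train_loss fall together
  # Overfitting: train_loss keeps falling while eval_loss starts rising
  #
  #  Loss
  #   │ ╲
  #   │  ╲╲           ╱ eval (starts rising!) ← overfitting point
  #   │   ╲ ╲_______╱
  #   │    ╲
  #   │     ╲___________ train (keeps falling)
  #   └────────────────────── Steps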

Strategies against overfitting:
  1. Fewer epochs: 3 → 2 → 1
  2. Higher dropout: 0.05 → 0.1
  3. Lower rank: 16 → 8
  4. More training data
  5. Early stopping (stop once validation loss starts rising)
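Early stopping (strategy 5) comes down to a patience rule on the validation loss. The pure-Python sketch below shows that logic on a list of eval losses. In a real run you would more likely use transformers' `EarlyStoppingCallback`, which applies a similar patience rule to `metric_for_best_model` and requires `load_best_model_at_end=True` with matching eval/save strategies. The loss values below are made up for illustration.

```python
def early_stop_point(eval_losses: list[float], patience: int = 2, min_delta: float = 0.0):
    """Return (best_index, stop_index) for a patience-based early-stopping rule.

    stop_index is None if the rule never triggers.
    """
    best_loss = float("inf")
    best_index = -1
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if loss < best_loss - min_delta:
            best_loss, best_index, bad_evals = loss, i, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return best_index, i
    return best_index, None


if __name__ == "__main__":
    # Made-up eval losses, one per evaluation (e.g. every 50 steps)
    losses = [2.0, 1.5, 1.2, 1.1, 1.15, 1.2, 1.3]
    print(early_stop_point(losses, patience=2))
```

Here the best checkpoint is the 4th evaluation (index 3), and training stops two evaluations later. That is the checkpoint you would keep.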

Out of Memory (OOM) Troubleshooting

OOM is the most common fine-tuning error

Typical error message:
  torch.cuda.OutOfMemoryError: CUDA out of memory.
  Tried to allocate 256.00 MiB (GPU 0; 8.00 GiB total capacity;
  6.78 GiB already allocated; 128.00 MiB free...)

Troubleshooting steps:

Step 1: Check current VRAM usage
  nvidia-smi  # check whether other programs are using the GPU
  # Common VRAM "thieves": Ollama, Chrome, VS Code GPU rendering

Step 2: Reduce VRAM demand step by step
  ┌───────────────────────────┬────────────┬────────────┐
  │ Action                    │ VRAM saved │ Risk       │
  ├───────────────────────────┼────────────┼────────────┤
  │ Shut down Ollama          │ ~1-2 GB    │ none       │
  │ batch_size: 2→1           │ ~1 GB      │ slower     │
  │ max_seq_length: 2048→1024 │ ~1.5 GB    │ truncation │
  │ rank: 16→8                │ ~0.1 GB    │ quality    │
  │ Turn off wandb            │ ~0.2 GB    │ none       │
  │ Switch to Qwen2.5-3B      │ ~1.5 GB    │ quality    │
  └───────────────────────────┴────────────┴────────────┘

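Step 3: If you still hit OOM
  # Keep 4-bit loading but cut the sequence length drastically
  # (also lower the max_seq_length passed to SFTTrainer)
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name="unsloth/Qwen2.5-7B",
      max_seq_length=512,       # much shorter
      load_in_4bit=True,
  )

  # Or fall back to the 3B model
  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name="unsloth/Qwen2.5-3B",
      max_seq_length=2048,
      load_in_4bit=True,
  )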

Step 4: Monitor during training
  import torch
  # Print this periodically from the training script
  print(f"GPU Memory: {torch.cuda.memory_allocated()/1024**3:.1f} GB "
        f"/ {torch.cuda.memory_reserved()/1024**3:.1f} GB reserved")

Epoch & LR Tuning

Rules of thumb for choosing epochs:

Dataset size vs. recommended epochs:
┌──────────────┬────────┬────────────────────────────────┐
│ Dataset size │ Epochs │ Why                            │
├──────────────┼────────┼────────────────────────────────┤
│ < 200        │ 5-8    │ Little data, needs more passes │
│ 200-500      │ 3-5    │ Standard range                 │
│ 500-2000     │ 2-3    │ Enough data; avoid overfitting │
│ > 2000       │ 1-2    │ One epoch may be enough        │
└──────────────┴────────┴────────────────────────────────┘
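The table maps directly to a lookup. The sketch below encodes the same ranges. Because the table's bands overlap at 500, it assigns 500 to the 500-2000 row. `recommend_epochs` is a hypothetical helper.

```python
def recommend_epochs(num_samples: int) -> tuple[int, int]:
    """Return the (min, max) epoch range suggested by the table above."""
    if num_samples < 200:
        return (5, 8)
    if num_samples < 500:
        return (3, 5)
    if num_samples <= 2000:
        return (2, 3)
    return (1, 2)


if __name__ == "__main__":
    for n in (150, 300, 800, 5000):
        print(n, recommend_epochs(n))
```

Start at the low end of the range and move up only if the validation loss is still falling at the end of training.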

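Learning-rate strategy:

1. Starting point: lr=2e-4 (the usual QLoRA value)
2. If the loss does not decrease: try 5e-4 or 1e-3
3. If the loss oscillates: lower it to 1e-4 or 5e-5
4. A cosine schedule tends to be more stable than a linear one

  # Cosine learning-rate schedule (linear warmup, then cosine decay)
  #
  #  LR
  #  2e-4 ─     ___
  #       │    ╱   ╲
  #  1e-4 ─   ╱      ╲
  #       │  ╱         ╲
  #  0    ─ ╱            ╲___
  #       └──────────────────── Steps
  #        warmup   cosine decay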
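The schedule in the diagram, linear warmup followed by a half-cosine decay to zero, can be written as a small function. This sketch mirrors the shape that Hugging Face produces with `lr_scheduler_type="cosine"` plus `warmup_steps`; the step counts in the usage loop are illustrative.

```python
import math


def cosine_lr(step: int, total_steps: int, warmup_steps: int,
              peak_lr: float = 2e-4) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total, warmup = 189, 10
for s in (0, 5, 10, 100, 189):
    print(s, f"{cosine_lr(s, total, warmup):.2e}")
```

Halfway through the decay phase the rate is exactly half the peak, and it reaches zero at the final step.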

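Practical advice:
  Run once with the defaults first
  Use the loss curve to decide whether to adjust anything
  Change only one parameter at a time!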

知识点5：模型导出与部署 / Model Export & Deployment

LoRA 合并 / LoRA Merging

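"""
merge_and_export.py
Merge the LoRA weights into the base model, then export to GGUF.
(save_pretrained_gguf below performs the merge before quantizing.)

Pipeline:
  base model (Qwen2.5-7B) + LoRA weights → merged model → GGUF quantization → Ollama
"""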

from unsloth import FastLanguageModel

# ==========================================
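# 1. Load the trained model
# ==========================================

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="./output_lora/final",     # path to the saved LoRA adapter
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Quick sanity check of the fine-tuned model
FastLanguageModel.for_inference(model)

# alpaca_prompt must be the exact template string from the training script
# (copy it into this file) so inference uses the same format as training
prompt = alpaca_prompt.format(
    instruction="什么是LoRA微调？",
    input="",
    output="",
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,        # without this, temperature/top_p are ignored
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(f"Test response:\n{response}")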

# ==========================================
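# 2. Export to GGUF (supported directly by Unsloth)
# ==========================================

"""
Choosing a GGUF quantization level (covered on Day 2):
  Q4_K_M: balances quality and size (recommended)
  Q5_K_M: better quality, slightly larger
  Q8_0:   best quality of the three, largest
"""

# Option 1: export one GGUF file directly (recommended)
model.save_pretrained_gguf(
    "my-finetuned-model",          # output directory
    tokenizer,
    quantization_method="q4_k_m",  # 4-bit quantization
)
print("GGUF exported: my-finetuned-model/")

# Option 2: export several quantization levels
for quant in ["q4_k_m", "q5_k_m", "q8_0"]:
    model.save_pretrained_gguf(
        f"my-model-{quant}",
        tokenizer,
        quantization_method=quant,
    )
    print(f"Exported: my-model-{quant}/")

# Option 3: save only the LoRA adapter (no merge)
model.save_pretrained("my-lora-adapter")
tokenizer.save_pretrained("my-lora-adapter")
# The adapter can be merged or converted later with llama.cpp tooling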
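Because the name of the GGUF file written by `save_pretrained_gguf` has varied between Unsloth versions, a small helper can locate the exported file and build the matching `FROM` line for the Modelfile. `find_gguf` and `modelfile_from_line` are hypothetical helpers; the directory name assumes the `my-finetuned-model` output directory used above.

```python
from pathlib import Path


def find_gguf(export_dir: str, quant: str = "Q4_K_M") -> Path:
    """Return the first .gguf file in export_dir whose name contains `quant`
    (case-insensitive), whatever prefix the exporter used."""
    matches = sorted(
        p for p in Path(export_dir).glob("*.gguf") if quant.lower() in p.name.lower()
    )
    if not matches:
        raise FileNotFoundError(f"no *{quant}*.gguf file in {export_dir}")
    return matches[0]


def modelfile_from_line(gguf_path: Path) -> str:
    """Build the Modelfile FROM line; relative paths get a './' prefix."""
    path = gguf_path.as_posix()
    return f"FROM {path}" if gguf_path.is_absolute() else f"FROM ./{path}"


if __name__ == "__main__":
    try:
        print(modelfile_from_line(find_gguf("my-finetuned-model")))
    except FileNotFoundError as err:
        print(err)
```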

导入 Ollama / Importing to Ollama

# === Step 1: Check the GGUF file ===
ls my-finetuned-model/
# Expect something like: unsloth.Q4_K_M.gguf (the exact name can vary by Unsloth version)

# === Step 2: Create the Modelfile ===
cat > Modelfile << 'EOF'
# Point at the fine-tuned GGUF model
FROM ./my-finetuned-model/unsloth.Q4_K_M.gguf

# Model parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 2048
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"

# System prompt
SYSTEM """你是一个基于 AI/LLM 学习笔记微调的专属助手。
你掌握了 Transformer、RAG、Agent、微调、金融AI、零售AI 等领域的知识。
请基于你学到的知识，用清晰、结构化的方式回答问题。
如果不确定，请诚实说明。"""

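# ChatML template (Qwen2.5's native chat format). If the model was trained on
# the Alpaca prompt format, this template will not match training exactly.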
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
EOF

# === Step 3: Create the Ollama model ===
ollama create my-ai-notes -f Modelfile

# === Step 4: Test run ===
ollama run my-ai-notes "什么是LoRA微调？和全量微调有什么区别？"

# === Step 5: Compare with the base model ===
ollama run qwen2.5:7b "什么是LoRA微调？和全量微调有什么区别？"

微调前 vs 微调后效果对比 / Before vs After Comparison

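The same question, answered by three models:

Question: "How do you evaluate retrieval quality in a RAG system?"

──────────────────────────────────────────────
Base model (original Qwen2.5-7B):
──────────────────────────────────────────────
"Retrieval quality in a RAG system can be evaluated in the following ways:
1. Precision and recall
2. NDCG
3. Human evaluation
..."
→ A generic, textbook answer with no depth

──────────────────────────────────────────────
Fine-tuned model (my-ai-notes):
──────────────────────────────────────────────
"The RAGAS framework covered on Day 21 provides four core metrics:
1. Context Precision: whether the retrieved content is precisely relevant
2. Context Recall: whether all relevant content was retrieved
3. Faithfulness: whether the answer stays faithful to the context
4. Answer Relevancy: whether the answer addresses the question

In the Day 53 hands-on session I ran a baseline evaluation on a Golden QA set
and found that Hybrid Search had 15% higher Precision than pure vector search..."
→ Names a concrete framework, draws on hands-on experience, cites numbers

──────────────────────────────────────────────
Claude (reference answer):
──────────────────────────────────────────────
"Retrieval quality in RAG can be evaluated along several dimensions..."
→ The most comprehensive and accurate, but without a personal-experience perspective

──────────────────────────────────────────────

Key differences:
  Base → textbook-style answers, fine for general use
  Fine-tuned → draws on personal learning experience, answers from an "I" perspective
  Claude → the most capable, but not personalized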

知识点6：微调效果评估 / Fine-tuning Evaluation

系统化对比评估 / Systematic Comparison

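"""
eval_finetuned.py
Comparative evaluation: base vs fine-tuned vs Claude

Questions drawn from the Day 53 Golden QA set;
each model gets the same question and the answers are scored by hand
"""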

import json
from openai import OpenAI

class ModelComparator:
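    """Side-by-side evaluator for the base, fine-tuned and (optionally) Claude models"""

    def __init__(self):
        # Ollama's OpenAI-compatible API (local models)
        self.local_client = OpenAI(
            base_url="http://localhost:11434/v1",
            api_key="ollama",  # required by the client, ignored by Ollama
        )

    def compare(self, question: str) -> dict:
        """Ask every model the same question"""

        results = {}

        # 1. Base model
        results["base"] = self._query_ollama("qwen2.5:7b", question)

        # 2. Fine-tuned model
        results["finetuned"] = self._query_ollama("my-ai-notes", question)

        # 3. Claude (needs an API key; _query_claude is not implemented here)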
        # results["claude"] = self._query_claude(question)

        return results

    def _query_ollama(self, model: str, question: str) -> str:
        """Query a model served by Ollama"""
        try:
            response = self.local_client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
                temperature=0.7,
                max_tokens=500,
            )
            return response.choices[0].message.content
        except Exception as e:
            return f"Error: {e}"

    def batch_compare(self, questions: list[str]) -> list[dict]:
        """Run the comparison over a list of questions"""
        results = []
        for i, q in enumerate(questions):
            print(f"\n[{i+1}/{len(questions)}] {q[:50]}...")
            comparison = self.compare(q)
            comparison["question"] = q
            results.append(comparison)

            # Print the answers side by side
            for model, answer in comparison.items():
                if model != "question":
                    print(f"\n  [{model}]: {answer[:150]}...")

        return results


# Evaluation questions
EVAL_QUESTIONS = [
    "LoRA微调的核心原理是什么？",
    "RAG和微调各自适合什么场景？",
    "Transformer中Self-Attention的计算过程是什么？",
    "如何评估一个RAG系统的质量？",
    "Agent中的ReAct模式是如何工作的？",
    "信贷AI风控的关键环节有哪些？",
    "推荐系统的冷启动问题如何解决？",
    "CeFi和DeFi在风控架构上有什么区别？",
    "LLM应用的成本优化策略有哪些？",
    "如何设计一个多Agent协作系统？",
]

if __name__ == "__main__":
    comparator = ModelComparator()
    results = comparator.batch_compare(EVAL_QUESTIONS)

    # Save the results
    with open("comparison_results.json", "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)

    print("\nResults saved. Please review and score manually.")
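To make the manual scoring easier, the saved results can be flattened into a spreadsheet with one row per (question, model) answer and empty columns for the three rubric dimensions. A sketch using only the standard library; `write_scoring_sheet` and the column names are hypothetical, and the demo uses an in-memory buffer instead of the real file.

```python
import csv
import io

DIMENSIONS = ["domain_knowledge", "style", "accuracy"]  # the three rubric axes


def write_scoring_sheet(results: list, out) -> int:
    """Write one row per (question, model) answer with empty score columns.
    `results` has the shape produced by batch_compare(): a list of dicts
    mapping model name -> answer, plus a "question" key."""
    writer = csv.writer(out)
    writer.writerow(["question", "model", "answer", *DIMENSIONS])
    rows = 0
    for item in results:
        for model, answer in item.items():
            if model == "question":
                continue
            writer.writerow([item["question"], model, answer] + [""] * len(DIMENSIONS))
            rows += 1
    return rows


demo = [{"base": "answer A", "finetuned": "answer B", "question": "Q1"}]
buf = io.StringIO()
print(write_scoring_sheet(demo, buf), "rows")
print(buf.getvalue())
```

For the real data, load `comparison_results.json` with `json.load` and write to a file opened with `newline=""`; `encoding="utf-8-sig"` helps Excel recognize the Chinese text.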

人工评分标准 / Manual Scoring Rubric

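Score each answer on three dimensions (1-5):

1. Domain Knowledge
   1: wrong or completely irrelevant
   2: only a generic overview
   3: correct basic concepts
   4: concrete details and frameworks
   5: deep insight and personal experience

2. Response Style
   1: mechanical, unstructured
   2: structured but stiff
   3: clear and well organized
   4: natural and fluent, reads as if a person wrote it
   5: distinctive style, memorable

3. Accuracy
   1: multiple errors
   2: a few errors
   3: mostly accurate, minor flaws
   4: accurate, no obvious errors
   5: fully accurate, usable as a reference

Scoring template:
┌──────────────────────────────────────────────────────────────┐
│ Question: "What is the core principle of LoRA fine-tuning?"  │
├────────────┬──────────────────┬──────────┬──────────┬────────┤
│ Model      │ Domain knowledge │ Style    │ Accuracy │ Total  │
├────────────┼──────────────────┼──────────┼──────────┼────────┤
│ Base       │        3         │    3     │    4     │ 10/15  │
│ Fine-tuned │        5         │    4     │    4     │ 13/15  │
│ Claude     │        5         │    5     │    5     │ 15/15  │
└────────────┴──────────────────┴──────────┴──────────┴────────┘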
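Once every question is scored, the per-model averages (like the 3.5 → 4.2 domain-knowledge figure below) come from averaging each dimension across questions. A sketch; the row format `{"model": ..., "domain_knowledge": ..., ...}` is an assumption, and the example data is the single-question template above.

```python
from collections import defaultdict

DIMENSIONS = ("domain_knowledge", "style", "accuracy")


def aggregate(scores: list) -> dict:
    """Average each rubric dimension per model and add the /15 total."""
    buckets = defaultdict(list)
    for row in scores:
        buckets[row["model"]].append(row)
    summary = {}
    for model, rows in buckets.items():
        avg = {d: sum(r[d] for r in rows) / len(rows) for d in DIMENSIONS}
        avg["total"] = sum(avg[d] for d in DIMENSIONS)
        summary[model] = avg
    return summary


# The scoring template above, entered as data (one question)
example = [
    {"model": "base", "domain_knowledge": 3, "style": 3, "accuracy": 4},
    {"model": "finetuned", "domain_knowledge": 5, "style": 4, "accuracy": 4},
    {"model": "claude", "domain_knowledge": 5, "style": 5, "accuracy": 5},
]
for model, avg in aggregate(example).items():
    print(f"{model:<10} total {avg['total']:.1f}/15")
```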

值不值得微调？判断框架 / Is Fine-tuning Worth It?

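After finishing the whole pipeline, ask yourself these questions:

Cost:
├── Data preparation: ~3-4 hours
├── Environment setup: ~1 hour
├── Training: ~30-60 minutes (500 samples / 3 epochs / RTX 4060)
├── Debugging: ~1-2 hours
├── Evaluation: ~1-2 hours
└── Total: ~6.5-10 hours

Benefits:
├── Domain knowledge: base 3.5 → fine-tuned 4.2 (+20%)
├── Response style: base 3.0 → fine-tuned 4.0 (+33%)
├── Inference latency: no extra overhead (no retrieval step)
└── Model size: same as the base model (LoRA already merged)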
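The totals and percentages in the two trees above are plain arithmetic. The sketch below recomputes them from the listed ranges and scores, with the hour ranges taken as given.

```python
# (min, max) hours for each cost item listed above
costs = {
    "data_prep": (3.0, 4.0),
    "environment": (1.0, 1.0),
    "training": (0.5, 1.0),
    "debugging": (1.0, 2.0),
    "evaluation": (1.0, 2.0),
}
low = sum(lo for lo, _ in costs.values())
high = sum(hi for _, hi in costs.values())


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of an average score."""
    return (after - before) / before


print(f"total: {low}-{high} h")
print(f"domain knowledge: {relative_gain(3.5, 4.2):+.0%}")
print(f"response style: {relative_gain(3.0, 4.0):+.0%}")
```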

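When fine-tuning is worth it:
  ✅ You need a fixed output style/format (e.g. "answer in the style of my notes")
  ✅ The domain knowledge is relatively stable and rarely updated
  ✅ You need offline/edge deployment
  ✅ Query volume is high and you want to skip the RAG retrieval step
  ✅ The model needs a "persona" (customer service, coach, tutor)

When it is not:
  ❌ The knowledge base changes frequently (new content every day)
  ❌ Answers must cite their sources precisely
  ❌ The API cost of Claude/GPT-4 is acceptable
  ❌ Fewer than 100 training samples
  ❌ You only want better results on a few specific questions (prompting is enough)

My conclusion:
  Fine-tuning + RAG together = the best results
  Fine-tuning handles "style" and "baseline domain knowledge"
  RAG handles "precise lookups" and "up-to-date information"
  They complement each other; it is not either-or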
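One way to combine the two is to keep the fine-tuned model for style and baseline knowledge and pass retrieved notes in as context. A sketch, assuming the chunks come from the Day 52/53 retriever (not shown) and the model is served by Ollama. `build_rag_messages` is a hypothetical helper; it deliberately adds no system message, because Ollama falls back to the Modelfile's SYSTEM prompt (the fine-tuned persona) when the request does not supply one.

```python
def build_rag_messages(question: str, chunks: list, max_chunks: int = 3) -> list:
    """Build a single user message that carries retrieved notes as numbered context."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks[:max_chunks]))
    content = (
        "Answer using the numbered context below when it is relevant, "
        "and cite it as [n]. If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return [{"role": "user", "content": content}]


# In the real pipeline these chunks would come from the retriever
messages = build_rag_messages(
    "How do you evaluate retrieval quality in a RAG system?",
    ["RAGAS: Context Precision, Context Recall, Faithfulness, Answer Relevancy",
     "Day 53: baseline evaluation on a Golden QA set"],
)
print(messages[0]["content"])
```

The result can be sent through the same OpenAI-compatible client used in `ModelComparator`, e.g. `client.chat.completions.create(model="my-ai-notes", messages=messages)`.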

今日思考 / Today's Reflections

思考1：理论到实践的鸿沟比想象中大 / Theory-Practice Gap is Real

When I studied fine-tuning theory on Day 7, I thought I "understood it all":
  LoRA = low-rank decomposition, W' = W + BA
  QLoRA = 4-bit quantization + LoRA
  higher rank is better, alpha = 2 × rank
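In the original LoRA formulation the update BA is also scaled by α/r, which is why the rule of thumb ties alpha to the rank. Written out with that factor, for a frozen weight W ∈ ℝ^{d×k}, together with the parameter count that makes LoRA cheap (the 3584 × 3584 example assumes the shape of Qwen2.5-7B's q_proj):

```latex
\begin{aligned}
W' &= W + \Delta W = W + \frac{\alpha}{r}\, B A,
  \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\ r \ll \min(d, k) \\
\#\text{trainable} &= r\,(d + k) \quad \text{vs.} \quad d\,k \ \text{for full fine-tuning} \\
d = k = 3584,\ r = 16:&\quad 16 \times 7168 = 114{,}688 \ \text{vs.}\ 12{,}845{,}056 \ (\approx 0.89\%) \\
\alpha = 2r \;&\Rightarrow\; \tfrac{\alpha}{r} = 2 \ \text{for every } r
\end{aligned}
```

Keeping α = 2r fixes the effective scale of the update, so changing the rank changes capacity without also changing the step size.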

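Today's hands-on work taught me what the theory didn't:
  1. Setting up the environment took longer than expected (dependency conflicts)
  2. Preparing the data took far more work than the training itself
  3. Fixing OOM means trying things step by step; there is no universal formula
  4. A "feel" for loss values only comes with experience
  5. Exporting to GGUF brings format-compatibility issues

Biggest takeaway:
  Tutorials make fine-tuning look "easy"
  Only by doing it do you find the pitfalls at every step
  But every pitfall is real engineering experience
  Being able to talk about these pitfalls in an interview = proof of hands-on ability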

思考2：数据质量 >> 模型大小 >> 训练技巧 / Data Quality Matters Most

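A day of experiments confirmed the principle from Day 3:
  "Garbage In, Garbage Out"

The right way to split your time:
  60% → data preparation and cleaning
  20% → training and hyperparameter tuning
  20% → evaluation and iteration

The common wrong split:
  10% data preparation → "good enough"
  70% tuning → "is lr 2e-4 or 3e-4?"
  20% evaluation → "looks fine"

500 high-quality samples > 5000 low-quality samples

What "high quality" means:
  ✅ Questions are natural and varied
  ✅ Answers are accurate and substantive
  ✅ The key knowledge of the target domain is covered
  ✅ No duplicates, no contradictions
  ❌ Monotonous questions (all "What is xxx?")
  ❌ Perfunctory answers (one or two sentences)
  ❌ Incorrect information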
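Part of this checklist can be automated before training. The sketch below assumes Alpaca-style records with `instruction` / `input` / `output` keys (the fields passed to `alpaca_prompt.format` earlier); the 50-character threshold and the "什么是" ("What is") prefix are illustrative assumptions. Contradictions and factual errors still need a human review.

```python
from collections import Counter


def quality_report(pairs: list, min_answer_chars: int = 50,
                   template_prefix: str = "什么是") -> dict:
    """Flag duplicate questions, very short answers and template-heavy questions."""
    questions = [p["instruction"].strip() for p in pairs]
    duplicates = [q for q, n in Counter(questions).items() if n > 1]
    short_answers = [p["instruction"] for p in pairs
                     if len(p["output"].strip()) < min_answer_chars]
    templated = sum(q.startswith(template_prefix) for q in questions)
    return {
        "total": len(pairs),
        "duplicate_questions": duplicates,
        "short_answers": short_answers,
        "template_ratio": templated / len(pairs) if pairs else 0.0,
    }


sample = [
    {"instruction": "什么是LoRA？", "input": "", "output": "详细回答" * 20},
    {"instruction": "什么是LoRA？", "input": "", "output": "低秩适配。"},
    {"instruction": "RAG和微调各自适合什么场景？", "input": "", "output": "详细回答" * 20},
]
report = quality_report(sample)
print(report)
```

A high `template_ratio` is the "all 'What is xxx?'" smell from the checklist; rewriting some questions into how/why/compare forms is a cheap fix.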

思考3：微调是 PM 的新技能 / Fine-tuning as a PM Skill

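Traditional PMs didn't need to understand fine-tuning.
PMs in the AI era must.

Not to write training code, but to understand:

1. When to use fine-tuning vs RAG vs prompt engineering
   → product architecture decisions

2. How much data and cost fine-tuning requires
   → project planning and resource estimation

3. What fine-tuning can and cannot solve
   → setting the right expectations

4. How to evaluate fine-tuning results
   → product quality control

Today's hands-on work lets me say in an interview:
  "I've actually done QLoRA fine-tuning:
   I trained a domain model on 500 samples on an RTX 4060,
   raised its domain-knowledge score from 3.5 to 4.2,
   and combined it with a RAG system for the best results."

That is far more convincing than "I understand how fine-tuning works."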

学习资源 / Resources

明日预告 / Tomorrow's Preview

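Day 55: Agent Development in Practice: Building an AI Assistant That Uses Tools

From "fine-tuned model" to "Agent":
  Fine-tuning = making the model smarter (changing its brain)
  Agent = letting the model act (giving it hands and feet)

Tomorrow we will build:
1. An Agent framework with LangGraph
2. Custom tools (file reading, web search, API calls)
3. A complete ReAct loop
4. A memory system for the Agent
5. A Web3 research Agent

Day 12 covered Agent theory
Days 22-25 covered Agent engineering practice
Day 55 = actually writing an Agent that works!

Prepare in advance:
  pip install langgraph langchain langchain-community
  pip install tavily-python  # web search
  Have a local Ollama model ready (or an API key)

Day 54 complete! From environment setup to model deployment, I worked through the complete LoRA fine-tuning pipeline, and the Day 7 theory has finally become a real file of model weights. Tomorrow Agent development begins, taking AI from "answering questions" to "completing tasks"!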
