AI Day 58
AI Day 58: Hands-On (8): Full-Stack AI App Development, Integrating Frontend and Backend End to End
2026-05-29
Date: 2026-05-29 | Phase: Phase 5 · Hands-On Practice (Day 51-60) | Topic: Full-Stack AI App: Frontend & Backend Integration
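Learning Path
AI/LLM Deep-Dive: 60-Day Study Plan
├── Phase 1: Model Fundamentals (Day 1-15) ✅
│   ├── Day 1: Transformer and LLM Fundamentals ✅
│   ├── Day 2: Quantization and Local Deployment ✅
│   ├── Day 3: End-to-End Training Pipeline ✅
│   ├── Day 4: Prompt Engineering ✅
│   ├── Day 5: RAG Architecture ✅
│   ├── Day 6: Vector Databases and Embeddings ✅
│   ├── Day 7: Fine-Tuning Techniques ✅
│   ├── Day 8: Inference Optimization ✅
│   ├── Day 9: Long-Context Techniques ✅
│   ├── Day 10: Multimodal Models ✅
│   ├── Day 11: Reasoning Models ✅
│   ├── Day 12: Agent Frameworks ✅
│   ├── Day 13: MCP Protocol ✅
│   ├── Day 14: Model Evaluation ✅
│   └── Day 15: Phase 1 Summary ✅
├── Phase 2: Engineering Practice (Day 16-30) ✅
│   ├── Day 16: LLM Application Architecture ✅
│   ├── Day 17: Safety and Guardrails ✅
│   ├── Day 18: Observability ✅
│   ├── Day 19: Production RAG · Parsing and Chunking ✅
│   ├── Day 20: Production RAG · Retrieval and Reranking ✅
│   ├── Day 21: Production RAG · Evaluation and Iteration ✅
│   ├── Day 22: Agent State and Recovery ✅
│   ├── Day 23: Agent Cost Optimization ✅
│   ├── Day 24: Multi-Agent Systems ✅
│   ├── Day 25: Agent Testing and Deployment ✅
│   ├── Day 26: LLM Cost Engineering ✅
│   ├── Day 27: Multi-Model Orchestration ✅
│   ├── Day 28: LLM Application Testing ✅
│   ├── Day 29: Enterprise LLM Platforms ✅
│   └── Day 30: Phase 2 Summary ✅
├── Phase 3: AI Applications in Finance and Retail (Day 31-42) ✅
│   ├── Day 31: AI Risk Control in Finance ✅
│   ├── Day 32: Robo-Advisory and Quant ✅
│   ├── Day 33: Compliance and RegTech ✅
│   ├── Day 34: End-to-End Credit AI ✅
│   ├── Day 35: Finance AI Summary ✅
│   ├── Day 36: Retail AI Recommendations ✅
│   ├── Day 37: Intelligent Customer Service ✅
│   ├── Day 38: Supply Chain AI ✅
│   ├── Day 39: Intelligent Marketing ✅
│   ├── Day 40: Retail AI Summary ✅
│   ├── Day 41: CeFi × DeFi × AI Convergence ✅
│   └── Day 42: AI Convergence Case Studies and Careers ✅
├── Phase 4: Interview Sprint (Day 43-50) ✅
│   ├── Day 43: System Design · LLM Platform ✅
│   ├── Day 44: System Design · RAG System ✅
│   ├── Day 45: System Design · Agent System ✅
│   ├── Day 46: System Design · Recommender System ✅
│   ├── Day 47: Interview · Product AI ✅
│   ├── Day 48: Interview · Architecture AI ✅
│   ├── Day 49: Interview · Behavioral AI ✅
│   └── Day 50: Learning Summary ✅
└── Phase 5: Hands-On Practice (Day 51-60)
    ├── Day 51: End-to-End Local LLM Deployment ✅
    ├── Day 52: RAG in Practice: From Documents to Q&A ✅
    ├── Day 53: Advanced RAG: Evaluation, Optimization, and Productionization ✅
    ├── Day 54: LoRA Fine-Tuning in Practice: Train Your Own Model ✅
    ├── Day 55: Agent Development in Practice: Building a Tool-Calling Agent ✅
    ├── Day 56: MCP Server Development: Extending AI Capabilities ✅
    ├── Day 57: Multimodal Applications: Image-Text Understanding and Document Analysis ✅
    ├── Day 58: Full-Stack AI App Development: Frontend-Backend Integration ← you are here
    ├── Day 59: Performance Tuning and Cost in Practice
    └── Day 60: Wrap-Up and Portfolio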
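Core Concepts
From Scripts to Product: Integrating the Earlier Hands-On Work into One Usable Product
What did Day 51-57 produce?
  Day 51: Local Ollama deployment  → ollama serve / ollama run
  Day 52: RAG document Q&A         → rag_pipeline.py
  Day 53: RAG evaluation and tuning → rag_evaluator.py
  Day 54: LoRA fine-tuning         → finetune.py
  Day 55: Agent tool calling       → agent.py
  Day 56: MCP Server               → mcp-notes-server/
  Day 57: Multimodal analysis      → multimodal_test.py
The problem: these are all standalone Python scripts
  - Users have to open a terminal and type commands
  - There is no UI, so nothing looks polished
  - The features do not talk to each other
  - Nobody else can use them
Day 58 goal:
  a pile of scripts ──→ one complete product
┌─────────────────────────────────┐
│        Next.js frontend         │ ← polished UI
│ Chat / RAG / Agent / Multimodal │
└────────────────┬────────────────┘
                 │ HTTP/SSE
┌────────────────┴────────────────┐
│         FastAPI backend         │ ← unified API
│    /chat  /rag  /agent  /mm     │
└────────────────┬────────────────┘
                 │
┌────────────────┴────────────────┐
│        Ollama + ChromaDB        │ ← AI engine
│ Local models + vector database  │
└─────────────────────────────────┘
From "works for one person" to "works for everyone"!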
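Why FastAPI + Next.js: Tech Stack Choice
Why FastAPI for the backend:
  1. Python ecosystem: the AI libraries used here (ollama/chromadb/langchain) are Python-first
  2. Async support: native async suits the long waits of AI inference
  3. SSE support: streaming responses are essential to the chat experience
  4. Automatic docs: OpenAPI/Swagger generated for free
  5. Type safety: request and response validation with Pydantic models
Why Next.js for the frontend:
  1. Already in use: momoweb3 is a Next.js project
  2. SSE client: streaming responses are easy to consume with fetch
  3. React ecosystem: a rich set of UI component libraries
  4. SSR/SSG: SEO-friendly
Alternatives considered:
  Streamlit → fast to build but plain-looking; not suited to production
  Gradio    → great for AI demos, but hard to customize
  Flask     → WSGI-based; async views exist since 2.0, but each streaming request still ties up a worker, so it scales poorly for AI workloads
  Express   → writing an AI backend in JS means giving up the Python AI libraries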
Topic 1: Backend Architecture
FastAPI Service Design
"""
backend/main.py: entry point for the AI application backend
"""
import time  # used by the /health endpoint
from contextlib import asynccontextmanager

import ollama
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

# Routers defined in backend/api/*.py (assumes the server is started from backend/)
from api import agent, chat, rag


# === Application lifecycle ===
@asynccontextmanager
async def lifespan(app: FastAPI):
    """Initialize on startup, clean up on shutdown."""
    # Startup: check that Ollama is reachable
    try:
        ollama.list()
        print("[OK] Connected to Ollama")
    except Exception as e:
        print(f"[WARN] Ollama not ready: {e}")
    yield  # application is running
    # Shutdown: release resources
    print("[INFO] Service shutting down")


app = FastAPI(
    title="AI Full-Stack Backend",
    description="Unified backend for Chat / RAG / Agent / Multimodal",
    version="1.0.0",
    lifespan=lifespan,
)

# === CORS configuration ===
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",  # Next.js dev server
        "http://localhost:3001",  # fallback port
    ],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# === Register the feature routers; without this the endpoints below are never served ===
app.include_router(chat.router)
app.include_router(rag.router)
app.include_router(agent.router)
API Endpoint Design
"""
backend/api/chat.py: Chat API
"""
import json
import time

import ollama
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter(prefix="/chat", tags=["Chat"])


class ChatRequest(BaseModel):
    message: str
    model: str = "qwen2.5:7b"
    system_prompt: str = "You are an AI assistant."
    history: list[dict] = []
    stream: bool = True


class ChatResponse(BaseModel):
    response: str
    model: str
    time_seconds: float
    tokens: int


def build_messages(request: ChatRequest) -> list[dict]:
    """System prompt, then prior turns, then the new user message."""
    messages = [{"role": "system", "content": request.system_prompt}]
    messages.extend(request.history)
    messages.append({"role": "user", "content": request.message})
    return messages


@router.post("/")
def chat(request: ChatRequest):
    """Plain (non-streaming) chat."""
    # Declared with `def`, not `async def`: ollama.chat() blocks, so FastAPI
    # runs this handler in its threadpool instead of stalling the event loop.
    start = time.time()
    response = ollama.chat(model=request.model, messages=build_messages(request))
    elapsed = time.time() - start
    return ChatResponse(
        response=response["message"]["content"],
        model=request.model,
        time_seconds=round(elapsed, 2),
        tokens=response.get("eval_count") or 0,
    )


@router.post("/stream")
async def chat_stream(request: ChatRequest):
    """Streaming chat (SSE)."""
    messages = build_messages(request)

    def generate():
        # A plain (sync) generator: StreamingResponse iterates it in a threadpool,
        # so the blocking Ollama stream does not stall the event loop.
        stream = ollama.chat(
            model=request.model,
            messages=messages,
            stream=True,
        )
        for chunk in stream:
            token = chunk["message"]["content"]
            # SSE framing: one "data:" line followed by a blank line
            data = json.dumps({"token": token, "done": False})
            yield f"data: {data}\n\n"
        yield f"data: {json.dumps({'token': '', 'done': True})}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        },
    )
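A consumer of this stream cannot assume that each network chunk holds whole `data:` lines: a chunk may end in the middle of an event, or even in the middle of a multi-byte character. The sketch below shows the framing and an incremental parser that tolerates arbitrary chunk boundaries. `format_sse` and `SSEParser` are illustrative names introduced here, not part of FastAPI or Ollama.

```python
import codecs
import json


def format_sse(payload: dict) -> bytes:
    """Encode one event as an SSE frame: a `data:` line plus a blank line."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode("utf-8")


class SSEParser:
    """Incrementally parse `data:` lines from arbitrarily split byte chunks."""

    def __init__(self) -> None:
        # The incremental decoder holds back partial multi-byte characters
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._buffer = ""

    def feed(self, chunk: bytes) -> list[dict]:
        self._buffer += self._decoder.decode(chunk)
        # Everything before the last newline is complete; keep the remainder
        *lines, self._buffer = self._buffer.split("\n")
        prefix = "data: "
        return [json.loads(line[len(prefix):]) for line in lines if line.startswith(prefix)]


wire = format_sse({"token": "你好", "done": False}) + format_sse({"token": "", "done": True})
parser = SSEParser()
events = []
for i in range(0, len(wire), 7):  # deliberately awkward 7-byte chunks
    events.extend(parser.feed(wire[i:i + 7]))
```

The same two ideas, a carry-over buffer and a streaming decoder, apply to any client that reads the response body by hand.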
RAG Endpoint
"""
backend/api/rag.py: RAG API
Integrates the RAG system from Day 52-53
"""
import os
import time

import chromadb
import ollama
from fastapi import APIRouter, File, UploadFile
from pydantic import BaseModel

router = APIRouter(prefix="/rag", tags=["RAG"])

# Reuse the RAG components from Day 52-53.
# CHROMADB_PATH is the variable set in docker-compose.yml and .env.example.
rag_client = chromadb.PersistentClient(path=os.environ.get("CHROMADB_PATH", "./rag_db"))
collection = rag_client.get_or_create_collection("documents")


class RAGQueryRequest(BaseModel):
    question: str
    n_results: int = 3
    model: str = "qwen2.5:7b"


class RAGQueryResponse(BaseModel):
    answer: str
    sources: list[dict]
    time_seconds: float


@router.post("/upload")
async def upload_document(file: UploadFile = File(...)):
    """Upload a document into the RAG knowledge base (plain-text formats only)."""
    content = await file.read()
    text = content.decode("utf-8", errors="ignore")
    # Naive chunking: 500-character windows advancing by 400 (100-character overlap)
    chunks = [text[i:i+500] for i in range(0, len(text), 400)]
    # Store in the vector DB; upsert so re-uploading a file replaces its chunks
    # instead of colliding with the existing IDs
    for i, chunk in enumerate(chunks):
        collection.upsert(
            ids=[f"{file.filename}_chunk_{i}"],
            documents=[chunk],
            metadatas=[{"source": file.filename, "chunk_index": i}],
        )
    return {
        "status": "success",
        "filename": file.filename,
        "chunks": len(chunks),
    }


@router.post("/query")
def rag_query(request: RAGQueryRequest):
    """RAG question answering."""
    # `def` rather than `async def`: retrieval and generation both block,
    # so FastAPI runs this handler in its threadpool.
    start = time.time()
    # Retrieve
    results = collection.query(
        query_texts=[request.question],
        n_results=request.n_results,
    )
    # Build the context
    context = "\n\n".join(results["documents"][0])
    sources = [
        {"text": doc[:100], "metadata": meta}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]
    # Generate
    response = ollama.chat(
        model=request.model,
        messages=[{
            "role": "user",
            "content": f"Answer the question based on the following context:\n\n{context}\n\nQuestion: {request.question}",
        }],
    )
    elapsed = time.time() - start
    return RAGQueryResponse(
        answer=response["message"]["content"],
        sources=sources,
        time_seconds=round(elapsed, 2),
    )
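The upload handler's one-line chunking hides two parameters: a 500-character window and a 400-character step, which together give a 100-character overlap. A small sketch that names them explicitly follows; `chunk_text` is a hypothetical helper, not something from the Day 52-53 code.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into `size`-character windows that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    return [text[i:i + size] for i in range(0, len(text), step)]


sample = "".join(str(i % 10) for i in range(1000))
chunks = chunk_text(sample)
lengths = [len(c) for c in chunks]
```

Character windows ignore sentence boundaries, so this is only a baseline; the overlap reduces the chance that an answer is cut in half at a boundary.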
Agent Endpoint
"""
backend/api/agent.py: Agent API
Integrates the Agent system from Day 55
"""
import json

import ollama
from fastapi import APIRouter
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

router = APIRouter(prefix="/agent", tags=["Agent"])


class AgentRunRequest(BaseModel):
    task: str
    max_steps: int = 5  # not yet used by the simulated flow below
    model: str = "qwen2.5:7b"


@router.post("/run")
async def run_agent(request: AgentRunRequest):
    """
    Run the agent, streaming its reasoning process.
    Response format (SSE):
    data: {"type": "thinking", "content": "Analyzing task..."}
    data: {"type": "tool_call", "tool": "get_price", "args": {"token": "ETH"}}
    data: {"type": "tool_result", "result": "$3,200"}
    data: {"type": "answer", "content": "Final answer..."}
    """
    def generate():
        # Sync generator: StreamingResponse iterates it in a threadpool,
        # so the blocking ollama.chat() calls do not stall the event loop.
        # Step 1: analyze the task
        yield f"data: {json.dumps({'type': 'thinking', 'content': f'Analyzing task: {request.task}'})}\n\n"
        # Step 2: plan the steps (LLM call)
        plan_response = ollama.chat(
            model=request.model,
            messages=[{
                "role": "user",
                "content": f"""You are an AI Agent. Analyze the task below and list the tools that need to be called:
Task: {request.task}
Available tools:
- get_token_price(token): get a cryptocurrency price
- get_protocol_tvl(protocol): get a DeFi protocol's TVL
- web_search(query): search the web
List the steps in JSON format.""",
            }],
        )
        plan = plan_response["message"]["content"]
        yield f"data: {json.dumps({'type': 'thinking', 'content': f'Plan: {plan[:200]}'})}\n\n"
        # Step 3: simulated tool call (a real project would invoke actual tools)
        yield f"data: {json.dumps({'type': 'tool_call', 'tool': 'get_token_price', 'args': {'token': 'ETH'}})}\n\n"
        yield f"data: {json.dumps({'type': 'tool_result', 'result': 'ETH: $3,245.67'})}\n\n"
        # Step 4: produce the final answer
        final_response = ollama.chat(
            model=request.model,
            messages=[{
                "role": "user",
                "content": f"Using the tool results, answer the user's question: {request.task}\n\nTool result: ETH: $3,245.67",
            }],
        )
        answer = final_response["message"]["content"]
        yield f"data: {json.dumps({'type': 'answer', 'content': answer})}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
    )
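Step 3 above emits a hard-coded tool call. One way to replace it, sketched here under assumptions, is a registry that maps tool names to functions and turns each call into the same two SSE events. The tool names come from the planning prompt; `TOOLS`, `run_tool`, and the stub return values are hypothetical.

```python
import json
from typing import Callable


def get_token_price(token: str) -> str:
    # Stub: a real tool would query a price API here
    return f"{token}: price unavailable (stub)"


def get_protocol_tvl(protocol: str) -> str:
    # Stub: a real tool would query a TVL data source here
    return f"{protocol}: TVL unavailable (stub)"


TOOLS: dict[str, Callable[..., str]] = {
    "get_token_price": get_token_price,
    "get_protocol_tvl": get_protocol_tvl,
}


def run_tool(name: str, args: dict) -> list[str]:
    """Run one tool call and return the SSE frames that describe it."""
    frames = [f"data: {json.dumps({'type': 'tool_call', 'tool': name, 'args': args})}\n\n"]
    tool = TOOLS.get(name)
    if tool is None:
        result = f"unknown tool: {name}"
    else:
        try:
            result = tool(**args)
        except TypeError as exc:  # the model supplied wrong or missing arguments
            result = f"bad arguments for {name}: {exc}"
    frames.append(f"data: {json.dumps({'type': 'tool_result', 'result': result})}\n\n")
    return frames


frames = run_tool("get_token_price", {"token": "ETH"})
```

Reporting unknown tools and bad arguments as a `tool_result` rather than raising keeps the stream alive, so the UI can show what went wrong.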
Auth, Rate Limiting, CORS
"""
backend/middleware.py: authentication and rate-limiting middleware
"""
import os
import time
from collections import defaultdict

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse


class RateLimitMiddleware(BaseHTTPMiddleware):
    """
    Simple sliding-window rate limiter (a log of timestamps per IP).
    Production should use Redis and a more robust strategy.
    """
    def __init__(self, app, max_requests: int = 30, window_seconds: int = 60):
        super().__init__(app)
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)  # IP -> [timestamps]

    async def dispatch(self, request: Request, call_next):
        client_ip = request.client.host if request.client else "unknown"
        now = time.time()
        # Drop expired entries
        self.requests[client_ip] = [
            t for t in self.requests[client_ip]
            if now - t < self.window_seconds
        ]
        # Enforce the limit. Return a response directly: an HTTPException raised
        # inside a middleware bypasses FastAPI's exception handlers and becomes a 500.
        if len(self.requests[client_ip]) >= self.max_requests:
            return JSONResponse(
                status_code=429,
                content={"detail": f"Too many requests; retry in {self.window_seconds} seconds"},
            )
        self.requests[client_ip].append(now)
        response = await call_next(request)
        return response


class APIKeyMiddleware(BaseHTTPMiddleware):
    """
    API key authentication (simple version).
    Production should use JWT/OAuth.
    """
    # Read from the API_KEY environment variable; the demo key is only a fallback
    API_KEYS = {os.environ.get("API_KEY", "demo-key-12345")}
    PUBLIC_PATHS = {"/docs", "/openapi.json", "/health"}

    async def dispatch(self, request: Request, call_next):
        # Public paths and CORS preflight requests (which carry no custom headers) skip auth
        if request.method == "OPTIONS" or request.url.path in self.PUBLIC_PATHS:
            return await call_next(request)
        # Check the API key
        api_key = request.headers.get("X-API-Key")
        if api_key not in self.API_KEYS:
            return JSONResponse(status_code=401, content={"detail": "Invalid API Key"})
        response = await call_next(request)
        return response


# === Health check endpoint ===
# Lives in backend/main.py, where `app` is defined; `collection` is the
# ChromaDB collection created in backend/api/rag.py.
@app.get("/health")
async def health_check():
    """Health check: confirm that every dependency is available."""
    checks = {}
    # Check Ollama
    try:
        models = ollama.list()
        checks["ollama"] = {
            "status": "ok",
            "models": len(models.get("models", [])),
        }
    except Exception as e:
        checks["ollama"] = {"status": "error", "message": str(e)}
    # Check ChromaDB
    try:
        count = collection.count()
        checks["chromadb"] = {
            "status": "ok",
            "documents": count,
        }
    except Exception as e:
        checks["chromadb"] = {"status": "error", "message": str(e)}
    all_ok = all(c["status"] == "ok" for c in checks.values())
    return {
        "status": "healthy" if all_ok else "degraded",
        "checks": checks,
        "timestamp": time.time(),
    }
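The rate limiter above stores one timestamp per request and counts those inside the window, which is a sliding-window log. For comparison, a token bucket keeps only two numbers per client and allows short bursts. The sketch below is a standalone illustration with an injectable clock so it can be checked deterministically; `TokenBucket` is a name introduced here, not part of the original middleware.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# 30 requests per 60 s corresponds to a refill rate of 0.5 tokens/s
fake_now = [0.0]
bucket = TokenBucket(capacity=30, rate=0.5, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(31)]  # 31 requests at t = 0
fake_now[0] = 2.0                            # two seconds later
after_wait = bucket.allow()
```

Memory per client stays constant with a token bucket, whereas the timestamp log grows with `max_requests`.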
Topic 2: Frontend Integration
SSE Stream Handling
/**
 * lib/ai-api.ts: client for the AI backend API
 * Handles streaming responses (SSE)
 */
const API_BASE = process.env.NEXT_PUBLIC_AI_API_URL || "http://localhost:8000";
// Demo only: anything shipped in client code is visible to every user
const API_KEY = "demo-key-12345";

// Read an SSE body and call onData with each parsed `data:` payload.
// A network chunk can end mid-line or mid-character, so partial input is
// buffered and the decoder runs in streaming mode.
async function readSSE(response: Response, onData: (payload: any) => void) {
  const reader = response.body?.getReader();
  if (!reader) throw new Error("No response body");
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line.startsWith("data: ")) onData(JSON.parse(line.slice(6)));
    }
  }
}

// === Streaming chat ===
export async function chatStream(
  message: string,
  history: Array<{ role: string; content: string }>,
  onToken: (token: string) => void,
  onDone: () => void,
  onError: (error: Error) => void,
) {
  try {
    const response = await fetch(`${API_BASE}/chat/stream`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": API_KEY,
      },
      body: JSON.stringify({ message, history, stream: true }),
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    await readSSE(response, (json) => {
      if (json.done) {
        onDone();
      } else {
        onToken(json.token);
      }
    });
  } catch (error) {
    onError(error as Error);
  }
}

// === RAG query ===
export async function ragQuery(question: string) {
  const response = await fetch(`${API_BASE}/rag/query`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": API_KEY,
    },
    body: JSON.stringify({ question }),
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

// === File upload ===
export async function uploadDocument(file: File) {
  const formData = new FormData();
  formData.append("file", file);
  const response = await fetch(`${API_BASE}/rag/upload`, {
    method: "POST",
    headers: { "X-API-Key": API_KEY },
    body: formData,
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}

// === Streaming agent run ===
export type AgentEvent =
  | { type: "thinking"; content: string }
  | { type: "tool_call"; tool: string; args: Record<string, unknown> }
  | { type: "tool_result"; result: string }
  | { type: "answer"; content: string };

export async function agentRunStream(
  task: string,
  onEvent: (event: AgentEvent) => void,
  onDone: () => void,
) {
  try {
    const response = await fetch(`${API_BASE}/agent/run`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": API_KEY,
      },
      body: JSON.stringify({ task }),
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    await readSSE(response, onEvent);
  } catch (error) {
    console.error("Agent error:", error);
  } finally {
    onDone(); // always release the UI, even when the request fails
  }
}
Chat UI Component
/**
 * components/ChatUI.tsx: chat interface component
 * Supports streaming output, Markdown rendering, and code highlighting
 */
"use client";

import { useState, useRef, useEffect } from "react";
import { chatStream } from "@/lib/ai-api";
import ReactMarkdown from "react-markdown";
import { Prism as SyntaxHighlighter } from "react-syntax-highlighter";
import { oneDark } from "react-syntax-highlighter/dist/esm/styles/prism";

interface Message {
  role: "user" | "assistant";
  content: string;
  timestamp: Date;
}

export default function ChatUI() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);
  const messagesEndRef = useRef<HTMLDivElement>(null);

  // Auto-scroll to the bottom
  useEffect(() => {
    messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [messages]);

  // Replace the last message with a new object. Mutating it in place would
  // append each token twice when React StrictMode re-runs the updater.
  const updateLast = (change: (content: string) => string) =>
    setMessages((prev) => {
      const last = prev[prev.length - 1];
      return [...prev.slice(0, -1), { ...last, content: change(last.content) }];
    });

  const handleSend = async () => {
    if (!input.trim() || isStreaming) return;
    const userMessage: Message = {
      role: "user",
      content: input,
      timestamp: new Date(),
    };
    // Empty assistant message, filled in as tokens arrive
    const aiMessage: Message = {
      role: "assistant",
      content: "",
      timestamp: new Date(),
    };
    setMessages((prev) => [...prev, userMessage, aiMessage]);
    setInput("");
    setIsStreaming(true);
    // History excludes the new message, which is sent separately
    const history = messages.map((m) => ({
      role: m.role,
      content: m.content,
    }));
    await chatStream(
      input,
      history,
      // onToken: append token by token
      (token) => updateLast((content) => content + token),
      // onDone
      () => setIsStreaming(false),
      // onError
      (error) => {
        console.error("Chat error:", error);
        updateLast(() => `Error: ${error.message}`);
      },
    );
    // Unlock the input even if the stream ended without a `done` event
    setIsStreaming(false);
  };

  return (
    <div className="flex flex-col h-[600px] border rounded-lg bg-white dark:bg-gray-900">
      {/* Message list */}
      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={`flex ${msg.role === "user" ? "justify-end" : "justify-start"}`}
          >
            <div
              className={`max-w-[80%] rounded-lg px-4 py-2 ${
                msg.role === "user"
                  ? "bg-blue-500 text-white"
                  : "bg-gray-100 dark:bg-gray-800"
              }`}
            >
              {msg.role === "assistant" ? (
                <ReactMarkdown
                  components={{
                    code({ className, children, ...props }) {
                      const match = /language-(\w+)/.exec(className || "");
                      return match ? (
                        <SyntaxHighlighter
                          style={oneDark}
                          language={match[1]}
                          PreTag="div"
                        >
                          {String(children).replace(/\n$/, "")}
                        </SyntaxHighlighter>
                      ) : (
                        <code className="bg-gray-200 dark:bg-gray-700 rounded px-1" {...props}>
                          {children}
                        </code>
                      );
                    },
                  }}
                >
                  {msg.content}
                </ReactMarkdown>
              ) : (
                <p>{msg.content}</p>
              )}
            </div>
          </div>
        ))}
        <div ref={messagesEndRef} />
      </div>
      {/* Input box */}
      <div className="border-t p-4 flex gap-2">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          // Ignore the Enter that confirms an IME composition (e.g. Chinese input)
          onKeyDown={(e) => e.key === "Enter" && !e.nativeEvent.isComposing && handleSend()}
          placeholder="Type a message..."
          className="flex-1 border rounded-lg px-4 py-2 dark:bg-gray-800"
          disabled={isStreaming}
        />
        <button
          onClick={handleSend}
          disabled={isStreaming}
          className="bg-blue-500 text-white px-6 py-2 rounded-lg disabled:opacity-50"
        >
          {isStreaming ? "Generating..." : "Send"}
        </button>
      </div>
    </div>
  );
}
Topic 3: RAG Integration
RAG Frontend
/**
 * components/RAGInterface.tsx: RAG Q&A interface
 * Supports file upload and question answering
 */
"use client";

import { useState, useCallback } from "react";
import { uploadDocument, ragQuery } from "@/lib/ai-api";

interface Source {
  text: string;
  metadata: { source: string; chunk_index: number };
}

export default function RAGInterface() {
  const [files, setFiles] = useState<string[]>([]);
  const [question, setQuestion] = useState("");
  const [answer, setAnswer] = useState("");
  const [sources, setSources] = useState<Source[]>([]);
  const [isLoading, setIsLoading] = useState(false);
  const [uploadStatus, setUploadStatus] = useState("");

  // File upload
  const handleUpload = useCallback(async (e: React.ChangeEvent<HTMLInputElement>) => {
    const file = e.target.files?.[0];
    if (!file) return;
    setUploadStatus(`Uploading: ${file.name}...`);
    try {
      const result = await uploadDocument(file);
      setFiles((prev) => [...prev, file.name]);
      setUploadStatus(`${file.name} uploaded (${result.chunks} chunks)`);
    } catch (error) {
      setUploadStatus(`Upload failed: ${(error as Error).message}`);
    }
  }, []);

  // RAG query
  const handleQuery = async () => {
    if (!question.trim()) return;
    setIsLoading(true);
    setAnswer("");
    setSources([]);
    try {
      const result = await ragQuery(question);
      setAnswer(result.answer);
      setSources(result.sources);
    } catch (error) {
      setAnswer(`Error: ${(error as Error).message}`);
    } finally {
      setIsLoading(false);
    }
  };

  return (
    <div className="space-y-6">
      {/* File upload area. The backend decodes uploads as UTF-8 text, so only plain-text formats are accepted. */}
      <div className="border-2 border-dashed rounded-lg p-6 text-center">
        <input type="file" onChange={handleUpload} accept=".txt,.md" />
        <p className="text-sm text-gray-500 mt-2">{uploadStatus}</p>
        <div className="flex gap-2 mt-2 justify-center">
          {files.map((f, i) => (
            <span key={`${f}-${i}`} className="bg-green-100 text-green-800 px-2 py-1 rounded text-sm">
              {f}
            </span>
          ))}
        </div>
      </div>
      {/* Question area */}
      <div className="flex gap-2">
        <input
          value={question}
          onChange={(e) => setQuestion(e.target.value)}
          onKeyDown={(e) => e.key === "Enter" && !e.nativeEvent.isComposing && handleQuery()}
          placeholder="Ask a question about your documents..."
          className="flex-1 border rounded-lg px-4 py-2"
          disabled={isLoading}
        />
        <button
          onClick={handleQuery}
          disabled={isLoading}
          className="bg-green-500 text-white px-6 py-2 rounded-lg disabled:opacity-50"
        >
          {isLoading ? "Searching..." : "Ask"}
        </button>
      </div>
      {/* Answer */}
      {answer && (
        <div className="bg-gray-50 dark:bg-gray-800 rounded-lg p-4">
          <h3 className="font-bold mb-2">Answer</h3>
          <p>{answer}</p>
          {/* Source citations */}
          {sources.length > 0 && (
            <div className="mt-4 border-t pt-2">
              <h4 className="text-sm font-bold text-gray-500">Sources</h4>
              {sources.map((s, i) => (
                <div key={i} className="text-xs text-gray-400 mt-1">
                  [{i + 1}] {s.metadata.source} (chunk {s.metadata.chunk_index}): {s.text}...
                </div>
              ))}
            </div>
          )}
        </div>
      )}
    </div>
  );
}
Topic 4: Agent Integration
Agent Thinking Process Display
/**
 * components/AgentRunner.tsx: agent execution interface
 * Shows the full reasoning process, tool calls, and final result
 */
"use client";

import { useState } from "react";
import { agentRunStream } from "@/lib/ai-api";

type AgentStep =
  | { type: "thinking"; content: string }
  | { type: "tool_call"; tool: string; args: Record<string, unknown> }
  | { type: "tool_result"; result: string }
  | { type: "answer"; content: string };

export default function AgentRunner() {
  const [task, setTask] = useState("");
  const [steps, setSteps] = useState<AgentStep[]>([]);
  const [isRunning, setIsRunning] = useState(false);

  const handleRun = async () => {
    if (!task.trim() || isRunning) return;
    setIsRunning(true);
    setSteps([]);
    try {
      await agentRunStream(
        task,
        (event) => setSteps((prev) => [...prev, event]),
        () => setIsRunning(false),
      );
    } finally {
      // Release the UI even if the request throws before onDone fires
      setIsRunning(false);
    }
  };

  const renderStep = (step: AgentStep, index: number) => {
    switch (step.type) {
      case "thinking":
        return (
          <div key={index} className="flex items-start gap-2 text-gray-500">
            <span className="text-lg">🧠</span>
            <div>
              <span className="text-xs font-bold">THINKING</span>
              <p className="text-sm">{step.content}</p>
            </div>
          </div>
        );
      case "tool_call":
        return (
          <div key={index} className="flex items-start gap-2 text-blue-600">
            <span className="text-lg">🔧</span>
            <div>
              <span className="text-xs font-bold">TOOL CALL</span>
              <p className="text-sm font-mono">
                {step.tool}({JSON.stringify(step.args)})
              </p>
            </div>
          </div>
        );
      case "tool_result":
        return (
          <div key={index} className="flex items-start gap-2 text-green-600">
            <span className="text-lg">📋</span>
            <div>
              <span className="text-xs font-bold">RESULT</span>
              <p className="text-sm">{step.result}</p>
            </div>
          </div>
        );
      case "answer":
        return (
          <div key={index} className="flex items-start gap-2 text-gray-900 dark:text-white">
            <span className="text-lg">✅</span>
            <div>
              <span className="text-xs font-bold">ANSWER</span>
              <p>{step.content}</p>
            </div>
          </div>
        );
    }
  };

  return (
    <div className="space-y-4">
      <div className="flex gap-2">
        <input
          value={task}
          onChange={(e) => setTask(e.target.value)}
          onKeyDown={(e) => e.key === "Enter" && !e.nativeEvent.isComposing && handleRun()}
          placeholder="Enter an agent task, e.g. look up the ETH price and analyze the trend"
          className="flex-1 border rounded-lg px-4 py-2"
          disabled={isRunning}
        />
        <button
          onClick={handleRun}
          disabled={isRunning}
          className="bg-purple-500 text-white px-6 py-2 rounded-lg disabled:opacity-50"
        >
          {isRunning ? "Running..." : "Run Agent"}
        </button>
      </div>
      {/* Agent execution trace */}
      {steps.length > 0 && (
        <div className="border rounded-lg p-4 space-y-3 bg-gray-50 dark:bg-gray-900">
          {steps.map((step, i) => renderStep(step, i))}
          {isRunning && (
            <div className="flex items-center gap-2 text-gray-400">
              <div className="animate-spin h-4 w-4 border-2 border-gray-300 border-t-blue-500 rounded-full" />
              <span className="text-sm">Agent running...</span>
            </div>
          )}
        </div>
      )}
    </div>
  );
}
Topic 5: Deployment
Docker Compose Setup
# docker-compose.yml: one-command deployment of the full-stack AI app
# (the top-level `version` key is obsolete in Compose v2 and has been dropped)
services:
  # === Ollama local model service ===
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      # Use the bundled CLI rather than relying on curl being present in the image
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3

  # === FastAPI backend ===
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - CHROMADB_PATH=/data/chromadb
      - API_KEY=${API_KEY:-demo-key-12345}
    volumes:
      - chromadb_data:/data/chromadb
    depends_on:
      ollama:
        condition: service_healthy
    healthcheck:
      # Requires curl to be installed in the backend image
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # === Next.js frontend ===
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
      args:
        # NEXT_PUBLIC_* values are inlined at build time and used by the browser,
        # so this must be a URL reachable from the user's machine, not the
        # Compose-internal hostname `backend`. The Dockerfile needs a matching ARG.
        - NEXT_PUBLIC_AI_API_URL=http://localhost:8000
    ports:
      - "3000:3000"
    depends_on:
      backend:
        condition: service_healthy

volumes:
  ollama_data:
  chromadb_data:
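Local vs Cloud Deployment
Deployment options compared:
Option               Cost                       GPU                    Best for
──────────────────────────────────────────────────────────────────────────────────────────
Local Docker         $0/month                   Your own GPU (6 GB+)   Development / demos
Cloud GPU instance   $200-800/month             A10G / T4              Small teams / POC
 (AWS/GCP/Azure)
Serverless API       Pay-per-use, ~$0.001/req   Nothing to manage      Production
 (Replicate/Modal)
Hybrid               $100-300/month             Cloud GPU              Recommended
  Ollama   → cloud GPU + local CPU
  Frontend → Vercel
  Backend  → Railway
Recommended incremental rollout:
  Phase 1: Local development → get Docker Compose working end to end
  Phase 2: Demo              → deploy backend + Ollama to a cloud GPU instance
  Phase 3: Production        → frontend on Vercel + backend on Railway + on-demand GPU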
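The table's figures imply a break-even point between pay-per-use and a fixed-price GPU instance. The arithmetic below uses only the table's rough numbers (about $0.001 per request, $200 to $800 per month) and ignores engineering and operations cost; `break_even_requests` is a helper written for this illustration.

```python
def break_even_requests(fixed_monthly_usd: float, per_request_usd: float) -> int:
    """Requests per month at which a fixed-price instance costs the same as pay-per-use."""
    return round(fixed_monthly_usd / per_request_usd)


low = break_even_requests(200, 0.001)   # cheapest cloud GPU instance in the table
high = break_even_requests(800, 0.001)  # most expensive one
print(f"Break-even: {low:,} to {high:,} requests/month")
```

Below that volume the serverless option is cheaper on these assumptions, which fits the advice to defer a dedicated GPU until traffic justifies it.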
Environment Variables
# .env.example: environment variable template
# === Ollama ===
OLLAMA_HOST=http://localhost:11434
DEFAULT_MODEL=qwen2.5:7b
VLM_MODEL=qwen2-vl:7b
# === ChromaDB ===
CHROMADB_PATH=./rag_db
# === API security ===
API_KEY=your-secret-key-here
CORS_ORIGINS=http://localhost:3000
# === Optional: cloud APIs ===
# OPENAI_API_KEY=sk-...
# GOOGLE_API_KEY=AI...
# ANTHROPIC_API_KEY=sk-ant-...
# === Rate limiting ===
RATE_LIMIT_MAX=30
RATE_LIMIT_WINDOW=60
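One way the backend could consume this template is to parse it once at startup into a typed object, so that a missing secret fails fast. This is a sketch using only the standard library; the `Settings` class and `load_settings` function are assumptions for illustration, and the variable names are the ones in the template.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    ollama_host: str
    default_model: str
    chromadb_path: str
    api_key: str
    cors_origins: list[str]
    rate_limit_max: int
    rate_limit_window: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, with the template's values as defaults."""
    origins = env.get("CORS_ORIGINS", "http://localhost:3000")
    return Settings(
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        default_model=env.get("DEFAULT_MODEL", "qwen2.5:7b"),
        chromadb_path=env.get("CHROMADB_PATH", "./rag_db"),
        api_key=env["API_KEY"],  # no default: refuse to start without a key
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        rate_limit_max=int(env.get("RATE_LIMIT_MAX", "30")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
    )


settings = load_settings({
    "API_KEY": "test-key",
    "CORS_ORIGINS": "http://localhost:3000, https://example.com",
})
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.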
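Topic 6: UX Optimization
Key UX Patterns
UX for AI apps differs from traditional apps:
1. Loading state: a lone spinner is not enough
   Traditional app: loading spinner → full result
   AI app: show progress instead
     "Understanding the question..." → "Retrieving relevant documents..." → "Generating the answer..."
   Implementation: stream a status event over SSE for each stage
2. Streaming output: text appears incrementally, like someone typing
   Do not wait for generation to finish before showing anything
   Render each token as soon as it arrives
   This reduces the user's "waiting anxiety"
   In practice, for the same 3-second generation time:
     Non-streaming: the user feels "it froze for 3 seconds"
     Streaming: the user feels "the AI is thinking and answering"
3. Error handling: be friendly and offer a retry
   "Model timeout" → "The AI is working on a complex question; please try again shortly"
   "Context too long" → "Your input was long and has been trimmed automatically; please confirm"
   Provide a "Regenerate" button
4. History: persist to localStorage
   Refreshing the page should not lose the conversation
   Implementation: save to localStorage whenever the messages change
5. Dark mode: essential for developers and night-time use
   Next.js: next-themes takes care of it with minimal setup
   Tailwind: the dark: prefix
6. Mobile support: responsive layout
   Chat bubble width: max-w-[80%]
   Input box: fixed to the bottom
   Agent steps: collapsible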
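Pattern 1 (staged progress) can be served by the backend as extra SSE events ahead of the answer. The sketch below assumes the stage names `thinking`, `retrieving`, and `generating`, and takes the retrieval and generation steps as injected callables so it runs without Ollama or ChromaDB; `rag_events` and the stubs are hypothetical.

```python
import json
from typing import Callable, Iterator


def rag_events(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
) -> Iterator[str]:
    """Yield one SSE frame per pipeline stage, then the answer."""

    def frame(payload: dict) -> str:
        return f"data: {json.dumps(payload)}\n\n"

    yield frame({"type": "status", "stage": "thinking"})
    yield frame({"type": "status", "stage": "retrieving"})
    docs = retrieve(question)  # runs only once the client has seen "retrieving"
    yield frame({"type": "status", "stage": "generating"})
    answer = generate(question, docs)
    yield frame({"type": "answer", "content": answer, "sources": len(docs)})


# Stubs standing in for the vector search and the LLM call
frames = list(rag_events(
    "What is SSE?",
    retrieve=lambda q: ["doc A", "doc B"],
    generate=lambda q, docs: f"answer built from {len(docs)} documents",
))
stages = [json.loads(f[6:]).get("stage") for f in frames]
```

Because the generator yields before each slow step, the client can switch its status label while the work is still in progress.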
Implementation Snippets
/**
 * hooks/useLocalStorage.ts: persist chat history
 */
import { useState, useEffect } from "react";

export function useLocalStorage<T>(key: string, initialValue: T) {
  const [value, setValue] = useState<T>(initialValue);
  const [loaded, setLoaded] = useState(false);

  // Read after mount so the server render and the first client render match
  // (reading localStorage in the useState initializer causes hydration mismatches)
  useEffect(() => {
    try {
      const stored = localStorage.getItem(key);
      if (stored !== null) setValue(JSON.parse(stored));
    } catch {
      // Unreadable or malformed data: keep initialValue
    }
    setLoaded(true);
  }, [key]);

  // Persist on change, but only after the initial load so stored data is not overwritten
  useEffect(() => {
    if (loaded) localStorage.setItem(key, JSON.stringify(value));
  }, [key, value, loaded]);

  return [value, setValue] as const;
}
// Usage:
// const [messages, setMessages] = useLocalStorage<Message[]>("chat-history", []);
// Note: Date fields come back from JSON as strings.

/**
 * components/StreamingStatus.tsx: streaming status indicator
 */
interface Props {
  status: "idle" | "thinking" | "retrieving" | "generating" | "done" | "error";
}

const statusConfig = {
  idle: { text: "", color: "" },
  thinking: { text: "Understanding the question...", color: "text-yellow-500" },
  retrieving: { text: "Retrieving relevant documents...", color: "text-blue-500" },
  generating: { text: "Generating the answer...", color: "text-green-500" },
  done: { text: "Done", color: "text-gray-400" },
  error: { text: "Something went wrong, please retry", color: "text-red-500" },
};

export default function StreamingStatus({ status }: Props) {
  if (status === "idle") return null;
  const config = statusConfig[status];
  return (
    <div className={`flex items-center gap-2 ${config.color} text-sm`}>
      {status !== "done" && status !== "error" && (
        <div className="animate-pulse h-2 w-2 rounded-full bg-current" />
      )}
      <span>{config.text}</span>
    </div>
  );
}
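Today's Reflections
Reflection 1: The Script-to-Product Gap
Day 51-57 produced a lot of impressive scripts:
  RAG can search documents, the Agent can call tools, the VLM can analyze images
  But all of them merely "run"
Day 58 made the gap between "runs" and "usable" tangible:
  Runs:   python rag_pipeline.py → prints an answer
  Usable: a polished UI, streaming output, error handling, persistence
Where is the gap?
  1. Streaming responses: without streaming, the experience feels far worse
  2. Error handling: when a script crashes you rerun it; a product cannot do that
  3. State management: scripts have no history or session management
  4. Concurrency: a script serves one person at a time
  5. Deployment: a script only runs on your own machine
A PM's takeaway (rules of thumb, not measurements):
  "The AI model accounts for only about 30% of the product's value"
  "The other 70% is engineering, UX, and operations"
  In my view, this is also why "AI engineers" are harder to find than "AI researchers"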
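Reflection 2: SSE Is the Standard for AI Apps
Why do nearly all AI products stream their output?
Psychology:
  Non-streaming: wait 3 seconds → a wall of text
  Streaming:     text starts appearing almost immediately
  For the same total time, streaming feels several times faster,
  because the time to first token (the user's effective TTFB) drops sharply
Technology:
  An LLM generates one token at a time anyway
  Non-streaming = wait for every token → return everything at once
  Streaming     = send each token as soon as it is generated
  For chat-style output there is rarely a technical reason not to stream!
Implementation choices:
  WebSocket → full-duplex, but more complex
  SSE       → server push, simple and lightweight, sufficient here
  Most AI apps follow a request-response pattern, which SSE fits well
  The full-duplex capability of WebSocket is not needed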
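The perceived-speed argument can be made concrete with a little arithmetic. The numbers below (150 output tokens at 50 tokens per second, matching the 3-second example) are assumptions chosen for illustration, and the model ignores network latency and prompt processing.

```python
def first_output_delay(total_tokens: int, tokens_per_second: float, streaming: bool) -> float:
    """Seconds until the user sees the first output."""
    # Streaming shows the first token; non-streaming waits for the last one
    visible_after = 1 if streaming else total_tokens
    return visible_after / tokens_per_second


blocking = first_output_delay(150, 50.0, streaming=False)
streamed = first_output_delay(150, 50.0, streaming=True)
print(f"non-streaming: {blocking:.2f}s, streaming: {streamed:.2f}s")
```

Total generation time is identical in both cases; only the moment the first output becomes visible changes.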
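Reflection 3: Full-Stack Skills Are an AI PM's Superpower
Traditional PM:
  Write a PRD → hand it to engineering → wait 2 weeks → see the result → change requirements → wait another 2 weeks
An AI PM with full-stack skills:
  Have an idea → build a prototype in 3 hours → put it in front of users
Day 58 shows this is possible. For a minimal prototype:
  FastAPI backend ≈ 100 lines of code
  Next.js frontend ≈ 200 lines of code
  Total ≈ 300 lines of code = a usable AI product prototype
What this means (rough personal estimates):
  Idea validation speed × 10
  Communication efficiency with engineers × 5
  Accuracy of technical feasibility judgments × 3
In an interview you can say:
  "I did not just design this product"
  "I also built a working prototype myself"
  "Here is the link; you can try it"
  → That is more convincing than a 100-slide deck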
Resources
Backend
  - FastAPI documentation: https://fastapi.tiangolo.com/
  - FastAPI SSE (sse-starlette): https://github.com/sysid/sse-starlette
  - Pydantic V2: https://docs.pydantic.dev/latest/
Frontend
  - Next.js documentation: https://nextjs.org/docs
  - react-markdown: https://github.com/remarkjs/react-markdown
  - react-syntax-highlighter: https://github.com/react-syntax-highlighter/react-syntax-highlighter
  - Tailwind CSS: https://tailwindcss.com/docs
Deployment
  - Docker Compose: https://docs.docker.com/compose/
  - Vercel (frontend): https://vercel.com/docs
  - Railway (backend): https://docs.railway.app/
  - Modal (GPU): https://modal.com/docs
Full-stack AI references
  - Open WebUI: https://github.com/open-webui/open-webui
  - Chatbot UI: https://github.com/mckaywrigley/chatbot-ui
  - Lobe Chat: https://github.com/lobehub/lobe-chat
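Tomorrow's Preview
Day 59: Performance Tuning and Cost in Practice: From "Runs" to "Runs Well"
Day 58 produced a "usable" product
But "usable" is still one step short of "pleasant to use":
  Usable:   5-10 seconds per request
  Pleasant: common questions in under 1 second (cache hit)
Day 59 will cover:
  1. Latency analysis: where exactly is the time going?
  2. Caching in practice: three cache layers so popular questions return instantly
  3. Cost accounting: how much does running this app cost per month?
  4. Load testing: how many concurrent users can it support?
  5. Monitoring: once live, how do we know the system is healthy?
Preparation:
  pip install locust             # load-testing tool
  pip install prometheus_client  # monitoring metrics
From "runs" to "runs well": this is the last step before launch!
Day 58 complete! Seven days of hands-on work are now integrated into one full-stack AI application: a FastAPI backend with a unified API, a Next.js frontend with a streaming UI, and one-command deployment with Docker. Tomorrow moves on to performance tuning, so the product is not just "usable" but "pleasant to use"!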