返回 Papers
AI 扩展计划 / Playbooks

AI Context Engineering Playbook

这些来源作为学习锚点, 不构成法律、合规或采购建议。

888AI_CONTEXT_ENGINEERING_PLAYBOOK.md

AI Context Engineering Playbook

定位: 面向 AI PM / AI BA / AI Solutions Architect 的 Context Engineering 实战手册。 目标: 从“写 prompt”升级为“设计上下文系统”: task, user intent, evidence, tools, memory, policy, output schema, eval, control, feedback loop。 核心观点: Prompt 是上下文系统的一小部分。企业 AI 成败取决于给模型什么上下文、不该给什么上下文、如何验证上下文、如何控制上下文的使用。


Source Anchors

这些来源作为学习锚点, 不构成法律、合规或采购建议。

AnchorLink用法
Transformerhttps://arxiv.org/abs/1706.03762理解 context window 和 attention 是 LLM 处理上下文的底层前提
RAGhttps://arxiv.org/abs/2005.11401理解外部知识如何作为非参数记忆进入生成
ReActhttps://arxiv.org/abs/2210.03629理解 reasoning/action/observation loop 和工具观察
Toolformerhttps://arxiv.org/abs/2302.04761理解模型使用工具的能力和边界
Chain-of-Thoughthttps://arxiv.org/abs/2201.11903理解复杂任务分解, 同时区分内部推理与用户解释
OWASP LLM Top 10https://owasp.org/www-project-top-10-for-large-language-model-applications/识别 prompt injection、sensitive data disclosure、excessive agency 等风险
NIST AI RMFhttps://www.nist.gov/itl/ai-risk-management-framework将上下文风险纳入 AI risk management

1. 为什么不是 Prompt Engineering

Prompt engineering 常被理解为:

  • 写一个系统提示。
  • 加几条示例。
  • 要求模型一步步思考。
  • 调整语气。
  • 让模型输出 JSON。

这些有用, 但不足以支撑企业 AI。

企业 AI 的问题通常不是“模型不知道怎么回答”, 而是:

  • 给了错误资料。
  • 给了过期资料。
  • 给了用户无权访问的资料。
  • 没给关键业务规则。
  • 没说明输出边界。
  • 没有工具结果。
  • 没有 eval 样本。
  • 没有处理冲突证据。
  • 没有失败路径。
  • 没有记录审计证据。

Context engineering 关注的是完整上下文供应链:

User intent
+ task definition
+ role and permissions
+ business workflow state
+ retrieved evidence
+ tool observations
+ policy constraints
+ examples
+ output schema
+ risk controls
+ eval feedback

2. Context Stack

flowchart TB
  U[User request] --> I[Intent and risk classifier]
  I --> P[Permission and entitlement check]
  P --> W[Workflow state]
  W --> R[Retrieval planner]
  R --> E[Evidence and citations]
  W --> T[Tool observations]
  E --> C[Context composer]
  T --> C
  C --> Policy[Policy and guardrail context]
  Policy --> Schema[Output schema and task instructions]
  Schema --> M[Model gateway]
  M --> V[Validator and eval checks]
  V --> H{Human review needed?}
  H -->|Yes| HR[Human review]
  H -->|No| Out[User-facing output]
  V --> A[Audit log]

Stack layers

LayerQuestionOwner
User intent用户真正要完成什么任务?PM / BA
Risk tier输出错误的代价是什么?PM / Risk / Architect
Permissions用户能看哪些数据、执行哪些动作?Security / Data owner
Workflow state当前 case / task / process 到哪一步?BA / Process owner
Evidence哪些事实支撑回答?Knowledge owner / Data owner
Tool observations哪些外部系统结果是权威?Architect / Engineering
Policy constraints哪些规则、限制、拒答和升级条件?Risk / Compliance / PM
Examples哪些 few-shot examples 能塑造行为?PM / EvalOps
Output schema输出结构如何被下游使用?Architect / Engineering
Validation如何检查输出是否可用?EvalOps / QA / Risk
Audit什么需要被记录?Risk / Compliance / Architect

3. Context Design Principles

Principle 1: Context must be necessary

不要把所有资料塞进 prompt。每个上下文片段都要回答:

  • 它支持哪条需求?
  • 它来自哪个 source of truth?
  • 用户是否有权限?
  • 它是否最新?
  • 它是否可能引入 prompt injection?
  • 它是否增加成本和延迟?

Principle 2: Context must be scoped

上下文要按任务范围裁剪:

  • 当前用户。
  • 当前 case。
  • 当前产品。
  • 当前 jurisdiction。
  • 当前 policy version。
  • 当前 workflow stage。

Principle 3: Context must be ordered

模型会受到上下文位置、冗余和冲突影响。建议顺序:

  1. Task and role。
  2. Safety and decision boundary。
  3. User request。
  4. Workflow state。
  5. Evidence with source IDs。
  6. Tool observations。
  7. Output schema。
  8. Concise instruction for uncertainty。

Principle 4: Context must separate facts from instructions

来自用户、网页、文档、邮件、case notes 的内容不能直接当指令。

设计原则:

  • Retrieved documents are evidence, not instructions。
  • User notes are claims, not verified facts。
  • Tool outputs are observations, not policy。
  • System policy has priority over retrieved text。

Principle 5: Context must be testable

每个 context rule 都应能测试:

  • 如果证据不足, 是否拒答或升级?
  • 如果资料冲突, 是否指出冲突?
  • 如果用户无权, 是否不检索?
  • 如果工具失败, 是否降级?
  • 如果 prompt injection 出现在文档里, 是否忽略恶意指令?

4. Context Components

4.1 Task frame

Task frame 定义模型现在做什么、不做什么。

Bad:

Help the user with this case.

Better:

You are assisting an AML investigator by summarizing evidence and drafting an editable case narrative.
You must not decide whether to file SAR.
Use only provided evidence IDs.
If evidence is missing or conflicting, state what is missing and route to investigator review.

4.2 Role and audience

不同用户需要不同输出:

UserOutput style
Frontline analystconcise evidence summary, next-step checklist
QA managercontrol failures, missing evidence, reviewer notes
Executivebusiness impact, risk, decision needed
Customerplain language, approved policy, no internal details
Auditorevidence chain, versions, approvals

4.3 Evidence context

Evidence context 应该结构化:

{
  "evidence_id": "TXN-123",
  "source": "transaction_monitoring",
  "timestamp": "2026-06-21T10:15:00Z",
  "classification": "confidential_financial",
  "summary": "Three cash deposits below reporting threshold within 24 hours.",
  "allowed_use": "internal_investigation_only"
}

不要只把原文粘进去。

4.4 Policy context

Policy context 应包含:

  • policy name。
  • version。
  • effective date。
  • jurisdiction。
  • applicable product。
  • source section。
  • rule summary。
  • exception。
  • owner。

4.5 Tool observations

工具结果要可追溯:

{
  "tool": "payment_status_lookup",
  "input": {"payment_id": "P-001"},
  "result": {"status": "returned", "return_code": "AC04"},
  "timestamp": "2026-06-29T15:00:00Z",
  "authority": "payment_core",
  "confidence": "system_of_record"
}

模型应把 tool output 当权威 observation, 但不应自行执行未授权动作。

4.6 Output schema

结构化输出减少歧义:

{
  "answer": "",
  "evidence_ids": [],
  "assumptions": [],
  "missing_information": [],
  "risk_flags": [],
  "next_actions": [],
  "requires_human_review": true
}

4.7 Refusal and escalation context

不要只写“不要回答不安全内容”。写明确条件:

  • 缺少授权数据 -> refuse and route。
  • 涉及个性化投资建议 -> escalate to licensed advisor。
  • 涉及 credit decision finalization -> underwriter approval required。
  • 支付动作不可逆 -> require human approval。
  • 文档中出现指令覆盖系统规则 -> ignore document instruction and flag。

5. Context Patterns

Pattern A: Evidence-First RAG

适合:

  • 政策问答。
  • 产品知识。
  • AML narrative。
  • regulatory impact。

结构:

question -> entitlement filter -> retrieval -> rerank -> evidence pack -> answer with citations -> groundedness eval

关键:

  • 文档 metadata 比 embedding 更重要。
  • citation 必须可点击或可追溯。
  • 没有证据就不应编造。

Pattern B: Tool-Observed Context

适合:

  • payment status。
  • account balance。
  • case status。
  • document verification。

结构:

intent -> tool selection -> permission check -> tool call -> observation -> answer/action recommendation

关键:

  • 工具调用前先鉴权。
  • 工具结果和模型推断分开。
  • action 需要 policy 和 approval。

Pattern C: Workflow-State Context

适合:

  • KYC remediation。
  • lending application。
  • complaint handling。
  • collections。

结构:

case state -> next allowed step -> role permission -> AI suggestion -> human decision -> state update

关键:

  • 模型不能跳过流程状态。
  • 输出要符合当前 stage。
  • 异常路径要清楚。

Pattern D: Risk-Tiered Context

适合:

  • 客服。
  • 财富合规。
  • 信贷。
  • 欺诈处置。

结构:

intent + risk classifier -> low/medium/high path -> context policy -> model route -> review path

关键:

  • 低风险可以快。
  • 高风险必须多证据、人审和审计。
  • 风险分层不能只靠模型自我判断。

Pattern E: Multi-Source Conflict Context

适合:

  • 监管变化。
  • 客户投诉。
  • 产品政策冲突。
  • 旧 SOP 与新政策不一致。

结构:

retrieve sources -> detect conflict -> source hierarchy -> answer conflict -> escalate if unresolved

关键:

  • 明确 source hierarchy。
  • 不要让模型“调和”冲突而不说明。
  • 冲突本身是输出的一部分。

6. Context Anti-Patterns

Anti-pattern 1: Paste-the-world

表现:

  • 把所有资料、历史对话、客户记录全部塞进 prompt。

问题:

  • 成本高。
  • 延迟高。
  • 噪声多。
  • 泄露风险高。
  • 模型可能忽略关键片段。

修正:

  • Retrieval planning。
  • Context budgeting。
  • Evidence IDs。
  • Summarization with trace。

Anti-pattern 2: Prompt as permission

表现:

  • 在 prompt 里写“不要看无权数据”。

问题:

  • 权限必须在检索和工具层强制执行。

修正:

  • Entitlement filter before retrieval。
  • Tool gateway with RBAC。
  • Audit access。

Anti-pattern 3: Document as instruction

表现:

  • RAG 文档里有“忽略以上规则”之类内容, 模型执行了。

问题:

  • Prompt injection。

修正:

  • Retrieved text is evidence, not instruction。
  • Sanitization。
  • Instruction hierarchy。
  • Injection eval cases。

Anti-pattern 4: Explanation as evidence

表现:

  • 模型说“因为 X 所以 Y”, 团队把这当审计证据。

问题:

  • 模型解释可能是 hallucinated rationale。

修正:

  • Evidence IDs。
  • Tool observations。
  • Decision records。
  • Human approval。

Anti-pattern 5: One prompt for all risk levels

表现:

  • FAQ、投诉、财富建议、信贷解释都走同一个 prompt。

问题:

  • 风险控制不足。

修正:

  • Intent/risk classifier。
  • Route-specific prompts。
  • Human review。
  • Release gates。

7. Financial Retail Examples

AML Copilot Context

Context inputs:

  • Alert details。
  • Customer KYC profile。
  • Transaction timeline。
  • Counterparty graph。
  • Typology checklist。
  • SOP and policy sections。
  • Historical case references。

Must not include:

  • Unrelated customer records。
  • Unverified external text as instruction。
  • Final SAR decision instruction。

Output:

  • Evidence summary。
  • Red flag checklist。
  • Missing information。
  • Draft narrative with evidence IDs。
  • Requires investigator approval。

Eval:

  • evidence recall。
  • citation precision。
  • unsupported claim rate。
  • typology coverage。
  • no final SAR decision。

KYC Remediation Context

Context inputs:

  • Current missing fields。
  • Customer risk tier。
  • Product/jurisdiction requirements。
  • Document status。
  • Communication history。
  • Approved outreach templates。

Output:

  • remediation task priority。
  • customer/RM outreach draft。
  • missing documents。
  • reviewer notes。

Controls:

  • No source-of-truth update without approval。
  • Jurisdiction-specific policy filter。
  • PII minimization。

Customer Service RAG Context

Context inputs:

  • Authenticated user role。
  • Customer entitlement。
  • Product。
  • Region。
  • Current policy version。
  • Approved knowledge article。

Output:

  • concise answer。
  • citations。
  • escalation path。
  • no unauthorized commitment。

Controls:

  • cache key includes entitlement and policy version。
  • no source -> no answer。
  • QA feedback updates knowledge owner queue。

Payments Exception Agent Context

Context inputs:

  • Payment status tool output。
  • Return code。
  • Ledger status。
  • Customer impact。
  • Rail rulebook。
  • SLA。

Output:

  • root cause。
  • repair options。
  • next action recommendation。
  • requires approval flag。

Controls:

  • idempotency。
  • action allowlist。
  • human approval for customer-impacting repair。
  • audit log。

Lending Assistant Context

Context inputs:

  • Loan application stage。
  • Financial fields。
  • Policy rules。
  • Document completeness。
  • Deterministic calculations。
  • Reason code catalog。

Output:

  • missing items。
  • policy citations。
  • memo draft。
  • reason code suggestion。
  • human review required。

Controls:

  • LLM does not own decision。
  • protected/proxy feature review。
  • fair lending check。
  • adverse action text approval。

8. Context-to-Eval Matrix

Context requirementEval caseFailure signalControl
Only use authorized documentsuser lacks entitlementanswer includes restricted sourceentitlement filter, audit
Use current policy versionold and new policy conflictcites outdated policymetadata filter, freshness check
Treat retrieved text as evidenceinjected document says ignore rulesfollows injected instructioninjection detector, judge eval
Show missing infoincomplete KYC profileconfident answer without caveatmissing-data test
Separate tool observation from inferencepayment tool unavailableinvented statustool failure path
Escalate high-risk advicewealth personalized recommendationgives direct advicerisk classifier, advisor review
Cite evidenceAML narrativeunsupported claimcitation validator

9. Context Budgeting

Budget dimensions

  • Token budget。
  • Latency budget。
  • Cost budget。
  • Risk budget。
  • Cognitive budget for user。

Context priority order

PriorityContext
1System policy and safety boundary
2User task and workflow state
3High-confidence evidence
4Tool observations
5Output schema
6Few-shot examples
7Background summaries
8Low-confidence optional evidence

Compression rules

  • Compress conversation history into structured state, not prose.
  • Summarize documents with evidence IDs preserved.
  • Drop irrelevant retrieved chunks.
  • Prefer source references over full text when possible.
  • Keep exact quotes only when legally or operationally needed.

10. Context Governance

Version every context component

ComponentVersion field
system promptprompt_version
policy packpolicy_version
retrieval indexindex_version
tool schematool_version
output schemaschema_version
examplesexample_set_version
evaluatoreval_version
modelmodel_version

Change control

Any change to these can change output:

  • Prompt。
  • Model。
  • Retrieval index。
  • Chunking。
  • Metadata filter。
  • Tool schema。
  • Policy text。
  • Output schema。
  • Judge rubric。

Therefore:

  • run regression eval。
  • compare critical cases。
  • log version bundle。
  • define rollback path。

11. Context Security Checklist

  • Is user input separated from system instruction?
  • Is retrieved content treated as evidence only?
  • Is entitlement enforced before retrieval?
  • Are tool calls allowlisted?
  • Are sensitive fields minimized?
  • Are prompts and logs redacted?
  • Is prompt injection tested?
  • Are data retention rules defined?
  • Are source IDs preserved?
  • Are high-risk outputs reviewed?
  • Is cache keyed by permission and version?
  • Is vendor data use controlled?

12. Context Architecture ADR

Use this template:

## ADR: Context Assembly Strategy for [Use Case]

### Context
- Use case:
- User:
- Workflow stage:
- Risk tier:

### Decision
- Retrieval:
- Tools:
- Policy context:
- Output schema:
- Human review:

### Included context
- Required:
- Optional:
- Excluded:

### Access control
- Entitlement source:
- Filter before retrieval:
- Audit fields:

### Evaluation
- Context relevance:
- Citation correctness:
- Missing info behavior:
- Injection resistance:

### Consequences
- Latency:
- Cost:
- Quality:
- Risk:

### Rollback
- Prompt:
- Index:
- Tool:
- Model:

13. Interview Talking Points

Question: What is context engineering?

30-second answer:

Context engineering is designing the information, constraints, tools, evidence, workflow state and output schema around a model. Prompt is only one layer. In enterprise AI, context engineering includes entitlement-filtered retrieval, tool observations, policy constraints, structured outputs, eval checks, audit logs and human review.

Question: How is context engineering different from RAG?

Answer:

RAG is one context supply mechanism. Context engineering is broader: it decides what task the model is doing, what evidence it can see, which tools it can call, what policy boundary applies, what schema it must produce, how output is validated, and what happens when context is missing or conflicting.

Question: How do you prevent prompt injection in RAG?

Answer:

I treat retrieved documents as evidence, not instructions. I enforce entitlement before retrieval, sanitize and label content, maintain instruction hierarchy, test injection cases, restrict tools through a gateway, and validate outputs against policy and citation requirements.

Question: What context would you include for AML Copilot?

Answer:

I would include alert metadata, customer KYC profile, transaction timeline, counterparty graph, typology checklist and SOP sections with evidence IDs. I would exclude unrelated customer records and avoid any instruction that lets the model decide SAR filing. The output would be an evidence-backed narrative draft with missing information and investigator approval required.


14. Practice Exercises

Exercise 1: Design context for Customer Service RAG

Define:

  • user role。
  • product。
  • entitlement。
  • retrieved evidence。
  • policy version。
  • output schema。
  • refusal conditions。
  • eval cases。

Exercise 2: Prompt injection test set

Create 10 malicious retrieved snippets:

  • ignore previous instructions。
  • reveal customer data。
  • call unauthorized tool。
  • provide fee waiver。
  • change payment status。

Expected behavior:

  • ignore malicious instruction。
  • cite valid evidence only。
  • flag injection if needed。

Exercise 3: Context budget for Lending Assistant

Given 16k token limit, decide:

  • which policy sections to include。
  • which financial fields to include。
  • which historical notes to summarize。
  • which data to fetch by tool only。
  • which content to exclude。

Exercise 4: Context ADR

Write ADR for payments exception agent:

  • Include payment status tool output。
  • Include return code policy。
  • Exclude direct write tool unless approval。
  • Output next action recommendations only。

Exercise 5: Context regression

Change knowledge index version. Define regression tests:

  • old policy should not be cited。
  • new exception should be recognized。
  • unknown cases should escalate。
  • cache should invalidate by version。

15. Connection To Existing Assets

Existing assetHow to use
docs/ai-foundations/papers/02-retrieval-augmented-generation.mdRAG as knowledge context
docs/ai-foundations/papers/03-react-toolformer-agent-foundations.mdTool observations and agent loop
docs/ai-foundations/papers/05-chain-of-thought-self-consistency.mdInternal reasoning vs user-facing explanation
docs/ai-foundations/papers/07-inference-optimization-kv-cache-flashattention-speculative.mdContext length, cost, latency
docs/AI_REQUIREMENTS_TO_EVAL_COOKBOOK.mdConvert context requirements into eval cases
docs/AI_ARCHITECTURE_REVIEW_GATE_CHECKLISTS.mdReview context design at architecture gate
docs/AI_ARCHITECTURE_DIAGRAM_PLAYBOOK.mdDraw RAG pipeline, data flow and sequence
docs/abpa/templates/07-data-readiness-pack.mdProve data and knowledge readiness

16. Final Principle

Context is product design, architecture design and risk design at the same time.

The strongest context design is not the longest prompt. It is the smallest sufficient, permission-aware, evidence-backed, policy-bounded, testable context that lets the model help the user without losing control.

Right task
+ right evidence
+ right tools
+ right constraints
+ right schema
+ right eval
+ right owner
= usable enterprise AI