Lever 02: every turn, you assemble the exact tokens the model sees. Curating that payload is the single biggest factor in how the agent performs.
Step 1 of the six-move turn was “assemble context.” It sounds like plumbing. It isn’t — Anthropic calls context engineering “the natural progression of prompt engineering” and the dominant lever on agent reliability.1 Prompt engineering tunes one message; context engineering governs the entire state — system instructions, tools, external data, and the whole message history — and re-decides it on every single turn.1
That principle has a non-obvious consequence: more context is not safer. The instinct to “just include everything in case it’s needed” is the main thing this lever fights. Below are the four places you spend the budget — and here they are in one function:
def build_context(state):
return [
system_prompt, # 1 · right altitude — not brittle, not vague
*tool_defs(minimal=True), # 2 · few, non-overlapping tools
*retrieve_just_in_time(state), # 3 · pull data on demand, not up front
*compact(state.history), # 4 · summarise; drop redundant tool output
] # goal: the smallest high-signal set
The failure mode at one extreme is hardcoding brittle if-this-then-that logic; at the other, vague guidance that gives the model no concrete signal.1 Aim for the “right altitude”: specific enough to guide behaviour, flexible enough to leave the model strong heuristics. Organise it with clear sections (XML tags or Markdown headers) so each region of guidance is legible.1
Tools are context too: their definitions sit in the window every turn. Keep the set small, self-contained, and unambiguous, returning token-efficient results.1 The litmus test is sharp:
If a human engineer can’t definitively say which tool should be used in a given situation, an AI agent can’t be expected to do better. — Anthropic, Effective Context Engineering
This is the ACI (Lever 03) from Lesson 1, seen through the context budget: bloated, overlapping tool sets create ambiguous decision points and burn tokens. Hold the idea that tool definitions are spent budget — here it is concretely:
# ✗ overlapping — neither you nor the agent can pick confidently
search(q) · find(q) · lookup(q) · query_db(q)
# ✓ one obvious tool per job; returns ids, not whole files
search_docs(q) -> [{path, snippet}]
Don’t stuff all the data up front. Let the agent hold lightweight identifiers — file paths, queries, links — and pull the actual content with tools when it needs it.1 This mirrors human cognition: we don’t memorise the corpus, we keep an index and look things up. The agent assembles understanding layer by layer, keeping only what’s necessary in working memory. The trade-off is honest — runtime exploration is slower than pre-computed data — so a hybrid (load a little up front, explore the rest) is often best.1
When a task outruns a single context window, you don’t get a bigger bucket — you manage the one you have. Three techniques, each suited to a different shape of work:1
| Technique | What it does | Best when |
|---|---|---|
| Compaction | Summarise the conversation near the limit; restart the window from the summary. Keep decisions & open bugs; drop redundant tool output. | Long back-and-forth that must keep flowing |
| Structured notes | Agent writes notes to memory outside the window, pulls them back when relevant. Persistent memory, minimal overhead. | Iterative work with clear milestones |
| Sub-agents | A clean-context sub-agent does the deep work, returns only a ~1–2k-token distilled summary. | Parallel research; isolating detail |
Notice the through-line: all three protect the same scarce resource by keeping high-signal tokens in, and pushing raw detail out to where it can be recalled on demand.
You can now audit any turn’s context against one principle — smallest high-signal set — and locate waste in four places: an off-altitude system prompt, overlapping tools, eager pre-loading, and history you should have compacted. That audit is the context lever.
Retrieve from memory. (One question reaches back to Lesson 2 — that’s deliberate.)
Anthropic — Effective Context Engineering for AI Agents. The definitive treatment of this lever: attention budget, right-altitude prompts, just-in-time retrieval, and the three long-horizon techniques. ~20 minutes, and it pairs directly with this lesson.
TradingAgents loop? Ask away.