Context Engineering: The Skill That Turns Claude Into a Production Co-Developer

There is a finite amount of paper your model can write on.

Everything you’ve ever read about getting the most out of an AI agent, every tip about prompts, every technique for “making Claude smarter”, collapses into managing what fits on that paper.

This is context engineering. It’s not a mindset. It’s not a philosophy. It’s a set of mechanical decisions about what tokens go into the context window, what tokens get compressed, and what tokens never make it in at all.

Get it right and your agent finishes its work, ships, and moves on. Get it wrong and your agent enters a doom loop of re-reading the same files, summarizing its own findings, hallucinating results, and asking you for context it should have inferred.

In our autonomous development cluster at Chipp, context engineering is the difference between an agent that ships 25 production changes a day and one that fills a context window with 90 tool calls of confusion before timing out. We’ve made the mistakes. This post is the rules I wish I’d known a year ago.

What’s actually in your context window

Every Claude Code session has six things competing for the same paper:

The system prompt. Your CLAUDE.md, plus any subdirectory CLAUDE.mds loaded by the hub-and-spoke pattern.
The tool definitions. Every MCP server adds its tool schemas. The descriptions are part of every prompt.
The conversation history. Every prior user message, assistant message, and tool call result since the session started.
The tool results. File contents you read. Grep results. Database query rows. Browser screenshots. All of these get serialized into context.
The reasoning scratch space. Where the model thinks. Internal chain-of-thought tokens that don’t show up in the final output.
The current user message and the pending response.

You don’t get to choose how the model allocates between these. You get to choose what’s available to fill them.

A typical Claude Sonnet/Opus session has 1M tokens of context. Your CLAUDE.md should fit in under 25k of that, about 5% of the budget. Tool definitions for a typical autonomous setup will run another 20–50k. That leaves roughly 900k for actual work.

The math sounds generous until you read three files of 1,500 lines each, run a few grep commands, and take two browser screenshots. You’re 200k tokens into your remaining budget before the agent has done anything productive.

The compaction trap

When the context window fills, the model doesn’t gracefully degrade. It hits compaction: a smaller, cheaper model summarizes the entire conversation into a paragraph and replaces the original tokens. The session continues, but the model wakes up with vague recollections instead of specifics.

What you lose in a compaction:

The exact file contents you read.
The specific stack traces and error messages from earlier in the session.
The reasoning chain that led to the current state.
Specific line numbers, variable names, and database row values.

What survives:

The system prompt (your CLAUDE.md).
A paragraph summary of everything that was compacted.
The most recent few messages.

A summary written by a cheap model is a worse representation of the past than the actual past. After a compaction, the agent is reasoning over a faded photocopy of its memory. This is where hallucinations come from.

“Compaction is incredibly destructive. You really want to avoid compactions at all cost.” — Hunter Hodnett, Chipp CTPO

The corollary: your CLAUDE.md is your highest-leverage file. It survives compactions. Everything else is in danger.

The four core moves

Context engineering, in practice, is four moves you make over and over. None of them are hard. The discipline is doing them consistently.

Move 1: Stabilize the system prompt for KV-cache hits

Anthropic’s API caches input tokens. If your system prompt is byte-identical between two requests, the cached version costs you a fraction of the original.

This sounds obvious until you realize how easy it is to bust the cache by accident. We were burning through input tokens at full price for months before we figured out our bug.

Our CLAUDE.md had a line that injected the current date so the agent would know what day it was. We were injecting the date down to the second:

The current date and time is: 2026-05-06T14:23:47.318Z

That value changed on every request. The cache busted on every request. We were paying full price for a 25k-token system prompt thousands of times a day.

The fix was trivial:

The current date is: 2026-05-06

KV cache hits jumped from near-zero to over 90%. Token spend dropped accordingly.

The general rule: anything in your system prompt that varies between calls, timestamps, request IDs, randomly-ordered lists, busts the cache. Make the system prompt stable. Inject volatile context as user messages, not as system prompt content.

Move 2: One context window per goal

The single biggest mistake teams make is trying to do too much in one session.

You start a Claude Code session. You ask it to investigate a bug. It reads ten files, runs a few grep commands, queries the database. It forms a hypothesis. You ask it to implement the fix. It writes the code, runs the tests. You ask it to review the code. It edits a few things. You ask it to update the docs. By now you’re six tool calls past compaction and the agent’s reasoning has gone fuzzy.

The fix: break the work into stages, and start a fresh context window for each stage.

This is what our Bug Bot pipeline does. Five stages, research, implement, review, docs, push, and each stage is a separate session. The output of one stage is a markdown file, which becomes the input for the next.

Stage 1 fills its context window with research and outputs a plan.md. Stage 2 starts fresh, reads only plan.md, writes the code. Stage 3 starts fresh, reads only the diff, reviews it.

No stage ever runs out of room because no stage tries to do everything.

Move 3: Use sub-agents to dilute

Some work is inherently context-heavy. Investigating a Kubernetes pod restart can require reading thousands of lines of logs, querying multiple endpoints, cross-referencing deploy histories. If you do this in your main session, you’ve burned your budget.

The solution: spawn a sub-agent. The main session calls the sub-agent like any tool, the sub-agent gets its own fresh 1M-token context window, it does whatever investigation it needs, and it returns a one-paragraph insight.

The first time we used this in production was for an infrastructure issue. Pods were restarting; we didn’t know why. I prompted the main session: “Figure out why our pods are restarting.” It spawned an infra-ops sub-agent we’d configured with all our Kubernetes runbooks.

The sub-agent ran 47 kubectl commands. Queried Loki for recent error patterns. Cross-referenced the deploy history. Filled almost a full context window with raw evidence.

Then it returned one sentence: “OOM after the last deploy, memory limit too low; recommend bumping the limit from 512Mi to 1Gi.”

That sentence, 23 tokens, was what landed in my main session. The 950k tokens of evidence stayed in the sub-agent’s context, where it belonged.

Use sub-agents for any work where the answer is short but the investigation is long.

Move 4: Pre-load with an auto-load table

Hub-and-spoke CLAUDE.md works for static, location-based context. But sometimes you want context to load based on what the agent is doing, not where in the codebase it is.

We built an auto-load table for this. At the top of our root CLAUDE.md, we have a small markdown table:

## Auto-load table
| Mention | Read |
|---|---|
| billing, stripe, payment, subscription | docs/billing.md |
| auth, login, session, oauth | docs/auth.md |
| websocket, realtime, streaming | docs/realtime.md |
| voice, livekit, transfer | docs/voice-agents.md |

The pattern: when a prompt mentions any of these keywords, the agent reads the corresponding doc into context before starting work.

We don’t load the docs in CLAUDE.md itself, that would burn the budget on every session, even sessions that don’t need them. We load them dynamically, only when relevant.

This is how I keep my root CLAUDE.md lean while still giving the agent rich context for specific subsystems.

“I have my autonomous AI cluster updating its own CLAUDE.md. I honestly barely know what’s in there these days.” — Hunter Hodnett, Chipp CTPO

The mental model

Picture the context window as a single sheet of paper, fixed font size.

When you read a file, you’ve copied that file onto the paper. When you run a grep, you’ve copied the result. When the agent reasons, it’s writing on the paper.

Run out of room and the paper gets folded. A cheap intern reads everything you wrote and replaces it with a paragraph summary on a fresh sheet. You keep working but you’ve lost the details.

The discipline of context engineering is engineering what gets written on the paper before it runs out, and never letting the cheap intern get involved.

Patterns we use every day

Beyond the four core moves, here are the patterns that show up most in our daily work.

Fresh-context handoff via markdown

Pipeline stages communicate by writing markdown files to disk. Stage 1’s last action is Write plan.md. Stage 2’s first action is Read plan.md. Stage 1’s context window is gone forever, but the distilled insight survives.

This is the same pattern as the sub-agent dilution move, applied to sequential work.

Three-strikes-then-rule for `CLAUDE.md`

Don’t add a rule to CLAUDE.md after a single mistake. Wait for the same class of mistake to happen three times. Otherwise your CLAUDE.md bloats with one-off lessons that never recur, and the truly important rules get diluted.

Three strikes is a heuristic, not a hard rule. The point is to be conservative about what gets the elevated status of “every-session context.”

Hub-and-spoke directory loading

Place a CLAUDE.md in any subdirectory where the rules differ from the root. Claude Code automatically reads the nearest CLAUDE.md when it reads a file in that directory.

We have CLAUDE.md files in:

src/db/. ORM-specific rules (we use Kysely, not Drizzle; never let the agent forget)
src/api/. API conventions (Hono routing, error-handling patterns)
src/components/, design system rules (CSS variables only, never hex codes)
tests/, test framework conventions

The agent loads the right one without me having to tell it.

Kill the summary mid-stream

When you see Claude write something like “I’ve now read several files. Let me summarize what I learned…” in the middle of an interactive session, stop it. That summary is about to land in your context as the canonical record of what the agent did. You want the evidence, not a pre-compaction.

Tell it: “Don’t summarize. I want to see the actual results.”

This matters less in autonomous pipeline runs, you’re not watching those, but the underlying principle is general: prefer raw artifacts over the agent’s interpretation of artifacts.

Use the most powerful model: every time

When teams ask me how to save money on token spend, the first thing I say is: don’t.

Use the most expensive model. Always. Even when it feels wasteful.

The reason is that frontier models hallucinate less, plan better, and finish work in fewer total tokens. A cheaper model in an autonomous setting will burn more total tokens chasing its own mistakes than a frontier model would have spent doing the work right the first time.

This is even more true in the early weeks after a model release. Frontier labs subsidize new models, they serve the highest-quality version at launch and gradually quantize them down to cheaper-to-serve versions over the following weeks. If you’re going to do hard work with an autonomous agent, do it in the first weeks after a release, when the model is at its sharpest.

Write your context-engineering scars into auto-load docs

When you encounter a context-engineering failure, say, the agent kept reading the wrong file because it didn’t have enough context about a subsystem, don’t put the lesson in your root CLAUDE.md. Write a doc into /docs/, add a row to the auto-load table, and move on.

Auto-loaded docs are scoped to relevance. Root CLAUDE.md is global. Match the scope of the lesson to the scope of the file.

Capture every run’s data

Every Claude Code session you run produces a record of how a frontier model reasoned about a real problem in your codebase. That’s training data, the kind that, six months from now, you might want to fine-tune a cheaper specialized model on. The builders who treat their pipeline outputs as a strategic data asset, instead of throwing them away after each run, will end up with the only kind of moat that compounds in this industry.

Even if you never do anything with the data, archive it. Storage is cheap. Past inference is irreplaceable.

How you know it’s working

You know context engineering is working when:

The agent finishes work without compacting.
Re-running similar tasks gives consistent quality.
You can launch sessions, walk away, and come back to shipped code.
The agent stops asking you for context it should have inferred.
Your CLAUDE.md grows by a line or two per week, not per day.

You know it’s not working when:

The agent hits compaction in the middle of routine tasks.
You see the same hallucinations repeatedly (this is your scar tissue not yet hardened into rules).
The agent reads the same files in every session because there’s no doc layer to load them once and remember.
Token spend per task is going up, not down.

If you’re in the second category: start with one core move at a time. Get the system prompt stable for KV cache hits. Then split your sessions one-context-per-goal. Then add sub-agents for any investigation that fills the budget. Auto-load tables are the polish; the moves above are the foundation.

What’s next

Context engineering is the foundational discipline of autonomous development, but it’s only the first layer. Layered on top of it:

CLAUDE.md Architecture: the hub-and-spoke pattern, scar tissue practice, and auto-load tables in detail.
Skills vs Sub-Agents: when to use which, and why the distinction matters.
MCP Is the USB-C of AI: building the senses that let your agent verify its own work.
Building a Self-Healing Bug Bot: context engineering applied end-to-end in a production pipeline.

If you want to see what context engineering enables in production, start with The Autonomous Development Manifesto.

If you want to see what it looks like when it goes wrong, run an interactive Claude Code session for an hour without thinking about any of this. Watch your token spend. Watch the agent compact. Watch it hallucinate. That’s the baseline. Everything above is what we do to escape it.

Join the Alchemist waitlist →