{
  "slug": "context-engineering",
  "url": "https://adaas.dev/blog/context-engineering",
  "formats": {
    "html": "https://adaas.dev/blog/context-engineering",
    "markdown": "https://adaas.dev/blog/context-engineering.md",
    "plaintext": "https://adaas.dev/blog/context-engineering.txt",
    "json": "https://adaas.dev/blog/context-engineering.json"
  },
  "title": "Context Engineering: The Skill That Turns Claude Into a Production Co-Developer",
  "description": "Context engineering is the foundational discipline of autonomous development, and the source of most of your hallucinations, token bills, and pipeline failures. Four core moves, the patterns we run every day at Chipp, and how to know it's working.",
  "publishedAt": "2026-04-30",
  "updatedAt": null,
  "author": "Hunter Hodnett",
  "authorRole": "Co-founder & CTPO, Chipp",
  "authorUrl": null,
  "category": "Engineering",
  "tags": [
    "context-engineering",
    "claude-code",
    "claude-code-best-practices",
    "autonomous-development"
  ],
  "keywords": [
    "context engineering",
    "claude code best practices",
    "claude.md",
    "context window",
    "autonomous coding agents",
    "kv cache claude"
  ],
  "coverImage": null,
  "readingMinutes": 13,
  "canonicalUrl": "https://adaas.dev/blog/context-engineering",
  "bodyMarkdown": "There is a finite amount of paper your model can write on.\n\nEverything you've ever read about getting the most out of an AI agent, every tip about prompts, every technique for \"making Claude smarter\", collapses into managing what fits on that paper.\n\nThis is context engineering. It's not a mindset. It's not a philosophy. It's a set of mechanical decisions about what tokens go into the context window, what tokens get compressed, and what tokens never make it in at all.\n\nGet it right and your agent finishes its work, ships, and moves on. Get it wrong and your agent enters a doom loop of re-reading the same files, summarizing its own findings, hallucinating results, and asking you for context it should have inferred.\n\nIn our [autonomous development cluster](/blog/autonomous-development) at Chipp, context engineering is the difference between an agent that ships 25 production changes a day and one that fills a context window with 90 tool calls of confusion before timing out. We've made the mistakes. This post is the rules I wish I'd known a year ago.\n\n## What's actually in your context window\n\nEvery Claude Code session has six things competing for the same paper:\n\n1. **The system prompt.** Your `CLAUDE.md`, plus any subdirectory `CLAUDE.md`s loaded by the hub-and-spoke pattern.\n2. **The tool definitions.** Every MCP server adds its tool schemas. The descriptions are part of every prompt.\n3. **The conversation history.** Every prior user message, assistant message, and tool call result since the session started.\n4. **The tool results.** File contents you read. Grep results. Database query rows. Browser screenshots. All of these get serialized into context.\n5. **The reasoning scratch space.** Where the model thinks. Internal chain-of-thought tokens that don't show up in the final output.\n6. **The current user message and the pending response.**\n\nYou don't get to choose how the model allocates between these. 
You get to choose what's available to fill them.\n\nA typical Claude Sonnet/Opus session has 1M tokens of context. Your `CLAUDE.md` should fit in under 25k of that, about 2.5% of the budget. Tool definitions for a typical autonomous setup will run another 20–50k. That leaves roughly 900k for actual work.\n\nThe math sounds generous until you read three files of 1,500 lines each, run a few grep commands, and take two browser screenshots. You're 200k tokens into your remaining budget before the agent has done anything productive.\n\n## The compaction trap\n\nWhen the context window fills, the model doesn't gracefully degrade. It hits **compaction**: a smaller, cheaper model summarizes the entire conversation into a paragraph and replaces the original tokens. The session continues, but the model wakes up with vague recollections instead of specifics.\n\nWhat you lose in a compaction:\n\n- The exact file contents you read.\n- The specific stack traces and error messages from earlier in the session.\n- The reasoning chain that led to the current state.\n- Specific line numbers, variable names, and database row values.\n\nWhat survives:\n\n- The system prompt (your `CLAUDE.md`).\n- A paragraph summary of everything that was compacted.\n- The most recent few messages.\n\nA summary written by a cheap model is a worse representation of the past than the actual past. After a compaction, the agent is reasoning over a faded photocopy of its memory. This is where hallucinations come from.\n\n> \"Compaction is incredibly destructive. You really want to avoid compactions at all cost.\"\n> — Hunter Hodnett, Chipp CTPO\n\nThe corollary: your `CLAUDE.md` is your highest-leverage file. It survives compactions. Everything else is in danger.\n\n## The four core moves\n\nContext engineering, in practice, is four moves you make over and over. None of them are hard. 
The discipline is doing them consistently.\n\n### Move 1: Stabilize the system prompt for KV-cache hits\n\nAnthropic's API caches input tokens. If your system prompt is byte-identical between two requests, the cached version costs you a fraction of the original.\n\nThis sounds obvious until you realize how easy it is to bust the cache by accident. We were burning through input tokens at full price for months before we figured out our bug.\n\nOur `CLAUDE.md` had a line that injected the current date so the agent would know what day it was. We were injecting the date down to the millisecond:\n\n```\nThe current date and time is: 2026-05-06T14:23:47.318Z\n```\n\nThat value changed on every request. The cache busted on every request. We were paying full price for a 25k-token system prompt thousands of times a day.\n\nThe fix was trivial:\n\n```\nThe current date is: 2026-05-06\n```\n\nKV cache hits jumped from near-zero to over 90%. Token spend dropped accordingly.\n\nThe general rule: anything in your system prompt that varies between calls (timestamps, request IDs, randomly ordered lists) busts the cache. Make the system prompt stable. Inject volatile context as user messages, not as system prompt content.\n\n### Move 2: One context window per goal\n\nThe single biggest mistake teams make is trying to do too much in one session.\n\nYou start a Claude Code session. You ask it to investigate a bug. It reads ten files, runs a few grep commands, queries the database. It forms a hypothesis. You ask it to implement the fix. It writes the code, runs the tests. You ask it to review the code. It edits a few things. You ask it to update the docs. By now you're six tool calls past compaction and the agent's reasoning has gone fuzzy.\n\nThe fix: break the work into stages, and start a fresh context window for each stage.\n\nThis is what our [Bug Bot pipeline](/blog/self-healing-bug-bot) does. It has five stages (research, implement, review, docs, push), and each stage is a separate session. 
The output of one stage is a markdown file, which becomes the input for the next.\n\nStage 1 fills its context window with research and outputs a `plan.md`.\nStage 2 starts fresh, reads only `plan.md`, writes the code.\nStage 3 starts fresh, reads only the diff, reviews it.\n\nNo stage ever runs out of room because no stage tries to do everything.\n\n### Move 3: Use sub-agents to dilute\n\nSome work is inherently context-heavy. Investigating a Kubernetes pod restart can require reading thousands of lines of logs, querying multiple endpoints, cross-referencing deploy histories. If you do this in your main session, you've burned your budget.\n\nThe solution: spawn a sub-agent. The main session calls the sub-agent like any tool, the sub-agent gets its own fresh 1M-token context window, it does whatever investigation it needs, and it returns a one-paragraph insight.\n\nThe first time we used this in production was for an infrastructure issue. Pods were restarting; we didn't know why. I prompted the main session: *\"Figure out why our pods are restarting.\"* It spawned an `infra-ops` sub-agent we'd configured with all our Kubernetes runbooks.\n\nThe sub-agent ran 47 `kubectl` commands. Queried Loki for recent error patterns. Cross-referenced the deploy history. Filled almost a full context window with raw evidence.\n\nThen it returned one sentence: *\"OOM after the last deploy, memory limit too low; recommend bumping the limit from 512Mi to 1Gi.\"*\n\nThat sentence, 23 tokens, was what landed in my main session. The 950k tokens of evidence stayed in the sub-agent's context, where it belonged.\n\nUse sub-agents for any work where the answer is short but the investigation is long.\n\n### Move 4: Pre-load with an auto-load table\n\nHub-and-spoke `CLAUDE.md` works for static, location-based context. But sometimes you want context to load based on *what the agent is doing*, not *where in the codebase it is*.\n\nWe built an auto-load table for this. 
At the top of our root `CLAUDE.md`, we have a small markdown table:\n\n```markdown\n## Auto-load table\n| Mention | Read |\n|---|---|\n| billing, stripe, payment, subscription | docs/billing.md |\n| auth, login, session, oauth | docs/auth.md |\n| websocket, realtime, streaming | docs/realtime.md |\n| voice, livekit, transfer | docs/voice-agents.md |\n```\n\nThe pattern: when a prompt mentions any of these keywords, the agent reads the corresponding doc into context before starting work.\n\nWe don't load the docs in `CLAUDE.md` itself; that would burn the budget on every session, even sessions that don't need them. We load them dynamically, only when relevant.\n\nThis is how I keep my root `CLAUDE.md` lean while still giving the agent rich context for specific subsystems.\n\n> \"I have my autonomous AI cluster updating its own `CLAUDE.md`. I honestly barely know what's in there these days.\"\n> — Hunter Hodnett, Chipp CTPO\n\n## The mental model\n\nPicture the context window as a single sheet of paper, fixed font size.\n\nWhen you read a file, you've copied that file onto the paper.\nWhen you run a grep, you've copied the result.\nWhen the agent reasons, it's writing on the paper.\n\nRun out of room and the paper gets folded. A cheap intern reads everything you wrote and replaces it with a paragraph summary on a fresh sheet. You keep working but you've lost the details.\n\nThe discipline of context engineering is engineering what gets written on the paper before it runs out, and never letting the cheap intern get involved.\n\n## Patterns we use every day\n\nBeyond the four core moves, here are the patterns that show up most in our daily work.\n\n### Fresh-context handoff via markdown\n\nPipeline stages communicate by writing markdown files to disk. Stage 1's last action is `Write plan.md`. Stage 2's first action is `Read plan.md`. 
Stage 1's context window is gone forever, but the distilled insight survives.\n\nThis is the same pattern as the sub-agent dilution move, applied to sequential work.\n\n### Three-strikes-then-rule for `CLAUDE.md`\n\nDon't add a rule to `CLAUDE.md` after a single mistake. Wait for the same class of mistake to happen three times. Otherwise your `CLAUDE.md` bloats with one-off lessons that never recur, and the truly important rules get diluted.\n\nThree strikes is a heuristic, not a hard rule. The point is to be conservative about what gets the elevated status of \"every-session context.\"\n\n### Hub-and-spoke directory loading\n\nPlace a `CLAUDE.md` in any subdirectory where the rules differ from the root. Claude Code automatically reads the nearest `CLAUDE.md` when it reads a file in that directory.\n\nWe have `CLAUDE.md` files in:\n\n- `src/db/`: ORM-specific rules (we use Kysely, not Drizzle; never let the agent forget)\n- `src/api/`: API conventions (Hono routing, error-handling patterns)\n- `src/components/`: design system rules (CSS variables only, never hex codes)\n- `tests/`: test framework conventions\n\nThe agent loads the right one without me having to tell it.\n\n### Kill the summary mid-stream\n\nWhen you see Claude write something like *\"I've now read several files. Let me summarize what I learned…\"* in the middle of an interactive session, stop it. That summary is about to land in your context as the canonical record of what the agent did. You want the *evidence*, not a pre-compaction.\n\nTell it: *\"Don't summarize. I want to see the actual results.\"*\n\nThis matters less in autonomous pipeline runs (you're not watching those), but the underlying principle is general: prefer raw artifacts over the agent's interpretation of artifacts.\n\n### Use the most powerful model, every time\n\nWhen teams ask me how to save money on token spend, the first thing I say is: don't.\n\nUse the most expensive model. Always. 
Even when it feels wasteful.\n\nThe reason is that frontier models hallucinate less, plan better, and finish work in fewer total tokens. A cheaper model in an autonomous setting will burn more total tokens chasing its own mistakes than a frontier model would have spent doing the work right the first time.\n\nThis is even more true in the early weeks after a model release. Frontier labs subsidize new models: they serve the highest-quality version at launch and gradually quantize it down to cheaper-to-serve versions over the following weeks. If you're going to do hard work with an autonomous agent, do it in the first weeks after a release, when the model is at its sharpest.\n\n### Write your context-engineering scars into auto-load docs\n\nWhen you encounter a context-engineering failure (say, the agent kept reading the wrong file because it didn't have enough context about a subsystem), don't put the lesson in your root `CLAUDE.md`. Write a doc into `/docs/`, add a row to the auto-load table, and move on.\n\nAuto-loaded docs are scoped to relevance. Root `CLAUDE.md` is global. Match the scope of the lesson to the scope of the file.\n\n### Capture every run's data\n\nEvery Claude Code session you run produces a record of how a frontier model reasoned about a real problem in your codebase. That's training data, the kind that, six months from now, you might want to fine-tune a cheaper specialized model on. The builders who treat their pipeline outputs as a strategic data asset, instead of throwing them away after each run, will end up with the only kind of moat that compounds in this industry.\n\nEven if you never do anything with the data, archive it. Storage is cheap. 
Past inference is irreplaceable.\n\n## How you know it's working\n\nYou know context engineering is working when:\n\n- The agent finishes work without compacting.\n- Re-running similar tasks gives consistent quality.\n- You can launch sessions, walk away, and come back to shipped code.\n- The agent stops asking you for context it should have inferred.\n- Your `CLAUDE.md` grows by a line or two per week, not per day.\n\nYou know it's not working when:\n\n- The agent hits compaction in the middle of routine tasks.\n- You see the same hallucinations repeatedly (this is your scar tissue not yet hardened into rules).\n- The agent reads the same files in every session because there's no doc layer to load them once and remember.\n- Token spend per task is going up, not down.\n\nIf you're in the second category: start with one core move at a time. Get the system prompt stable for KV cache hits. Then split your sessions one-context-per-goal. Then add sub-agents for any investigation that fills the budget. Auto-load tables are the polish; the moves above are the foundation.\n\n## What's next\n\nContext engineering is the foundational discipline of autonomous development, but it's only the first layer. 
Layered on top of it:\n\n- **[CLAUDE.md Architecture](/blog/claude-md-architecture)**: the hub-and-spoke pattern, scar tissue practice, and auto-load tables in detail.\n- **[Skills vs Sub-Agents](/blog/skills-vs-sub-agents)**: when to use which, and why the distinction matters.\n- **[MCP Is the USB-C of AI](/blog/building-your-first-mcp-server)**: building the senses that let your agent verify its own work.\n- **[Building a Self-Healing Bug Bot](/blog/self-healing-bug-bot)**: context engineering applied end-to-end in a production pipeline.\n\nIf you want to see what context engineering enables in production, start with [The Autonomous Development Manifesto](/blog/autonomous-development).\n\nIf you want to see what it looks like when it goes wrong, run an interactive Claude Code session for an hour without thinking about any of this. Watch your token spend. Watch the agent compact. Watch it hallucinate. That's the baseline. Everything above is what we do to escape it.\n\n**[Join the Alchemist waitlist →](/#waitlist)**"
}