Skills vs Sub-Agents: When to Use Each in Claude Code
Skills and sub-agents are the two tools you reach for when CLAUDE.md and hub-and-spoke aren't enough. They look similar (both give Claude specialized capability), but they're architecturally different in a way that determines whether they save your context budget or burn it.
Once your CLAUDE.md is dialed in and you’ve sprinkled directory-scoped CLAUDE.md files through your codebase, you’ll start hitting a different class of problem. There are kinds of knowledge that don’t fit either pattern.
Some knowledge is too big to put in CLAUDE.md: it would bloat every context window with information you only sometimes need. Some knowledge requires work to retrieve, not just to read. Some tasks are fundamentally side quests: you don’t want them polluting your main agent’s context, but you do want them done.
Claude Code has two features for this: skills and sub-agents. They look superficially similar, since both let you give Claude specialized capability, and most people I talk to use them interchangeably for a few weeks before they figure out the actual difference.
The actual difference is one sentence:
Skills are knowledge the agent reads while it’s working. Sub-agents are work the agent delegates to a separate instance.
Once you internalize that, the rest writes itself. But it took me a quarter to figure out, so let me save you the time.
What a skill is
A skill is a markdown file with a name, a description, and a body. The body is whatever you want, usually instructions, examples, or domain-specific rules. The description is what Claude reads to decide whether to invoke the skill at all.
When the description matches what the user asked for, the skill loads. The body of the skill becomes part of the context window for the rest of that session, and the agent has it as reference while it works.
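To make that shape concrete, here is a minimal sketch of a skill file and a tiny parser for it. It assumes the common layout of a SKILL.md file with a "---"-delimited frontmatter block (name, description) followed by the body. The body text and the `parse_skill` helper are hypothetical and exist only for illustration; they are not part of Claude Code.

```python
# Illustrative only: a skill file held in a string, plus a small parser.
# Assumes "---"-delimited frontmatter made of simple "key: value" lines.

SKILL_MD = """\
---
name: chipp-design
description: Build UI components and pages following the Chipp brand design system for Svelte 5. Use this skill when creating Svelte components, pages, or UI elements.
---
# Chipp design system

(Hypothetical body: color tokens, spacing scale, component rules.)
"""


def parse_skill(text: str) -> dict:
    """Split a skill file into its metadata and its body."""
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return {
        "name": meta["name"],
        "description": meta["description"],
        "body": body.strip(),
    }


skill = parse_skill(SKILL_MD)
print(skill["name"])         # the identifier
print(skill["description"])  # what Claude reads to decide whether to load the skill
print(len(skill["body"]))    # the part that occupies context once the skill loads
```

The split matters because the description is short and the body is not: the description is what gets the skill loaded, and the body is what you pay for afterwards.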
The mental model is a cheat sheet on your desk. You’re working at a desk. There’s a sticky note on the desk. While you work, you glance at the sticky note for the formulas you can’t remember.
I have a skill called chipp-design that contains every visual convention of the Chipp brand: our color tokens, our spacing scale, our component library, when to use which animation, how to do glass-morphism in a way that’s still legible. The description tells Claude: “Build UI components and pages following the Chipp brand design system for Svelte 5. Use this skill when creating Svelte components, pages, or UI elements.”
When I ask the agent to build a settings page, the description matches, the skill loads into context, and the agent now has the entire design system as reference. Without the skill, the agent would invent tokens, hardcode hex colors, and ship something that doesn’t match the rest of the platform. With it, the output looks like our team built it.
That’s a skill working at its best. The cost: tokens. The skill body, which can be substantial, is now occupying space in my context window. Every other tool call has less room.
What a sub-agent is
A sub-agent is a separate Claude Code session that the main agent spawns, gives a starting prompt, and waits for a result. The sub-agent has its own context window. It runs its own tool calls. When it’s done, it sends a summary back to the calling agent, and only the summary lands in the main context window.
The mental model is sending an intern to the library. You’re sitting at your desk. You realize you need to know something obscure, say, what’s the current state of all our Kubernetes pods. Rather than walking to the library yourself, you send an intern. They go off, do the work, come back, and hand you a one-page summary. You absorb the summary in seconds. The intern absorbed every page of the encyclopedia.
I have a sub-agent called infra-ops. Its system prompt knows everything about our Kubernetes cluster: which kubectl commands are safe, where the production logs live, how to read deployment YAML, what’s normal versus alarming. When the main agent runs into something like “pods are restarting in production,” it doesn’t try to investigate itself; it spawns the infra-ops sub-agent.
The sub-agent fills its own 1M-token context window with raw kubectl output, log excerpts, and deployment manifests. It correlates them, finds the actual issue, and reports back to the main agent: “Pods are OOM-killing because the last deploy lowered the memory limit too aggressively. Recommend bumping limits.memory from 512Mi to 1Gi.”
That two-sentence summary lands in my main context window. The 950k tokens of garbage that were needed to derive it stay in the sub-agent’s window, where they can’t pollute anything.
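The isolation can be sketched as a toy simulation. Nothing below calls Claude Code: `Context`, `run_subagent`, and the fake tool outputs are hypothetical stand-ins, and size is approximated by counting whitespace-separated words rather than real tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A toy context window: a list of messages with a rough size count."""
    messages: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.messages.append(text)

    def size(self) -> int:
        # Crude proxy for tokens: count whitespace-separated words.
        return sum(len(m.split()) for m in self.messages)


def run_subagent(prompt: str, raw_outputs: list[str]) -> tuple[str, int]:
    """Do the noisy work in a private context; return a summary and the private size."""
    private = Context()
    private.add(prompt)
    for output in raw_outputs:  # stand-ins for kubectl output, logs, manifests
        private.add(output)
    # Hard-coded here; a real sub-agent would derive this from what it read.
    summary = "Pods are OOM-killing; the last deploy lowered the memory limit."
    return summary, private.size()


main = Context()
main.add("Ticket: pods are restarting in production")

fake_tool_outputs = ["log line " * 5000, "pod status row " * 3000]
summary, subagent_size = run_subagent("Investigate pod restarts", fake_tool_outputs)
main.add(summary)  # only the summary crosses back into the main context

print(main.size(), subagent_size)
```

The main context grows by one short message, while the private context absorbs everything the investigation touched.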
The token math
Both tools cost tokens. They don’t cost them in the same way.
A skill spends from your current context budget. The skill body is loaded into the active session’s context window. If your skill is 8,000 tokens long, you have 8,000 fewer tokens for everything else this session. If you load three skills of that size, that’s 24,000 tokens gone.
A sub-agent spends from a separate context budget. The sub-agent has its own window, its own tool calls, its own model invocations. From the calling agent’s perspective, it spent the cost of one tool call: “spawn sub-agent, here’s the prompt, get back a summary.” The sub-agent might have spent 150,000 tokens internally to produce that summary, but the calling agent doesn’t see them.
This matters more than you’d think. On a complex task, the main agent might spawn five sub-agents over the course of its run. Each sub-agent fills 100,000–200,000 tokens of its own context window doing real work. The main agent, meanwhile, accumulates the five summaries, maybe 5,000 tokens total. The main agent stays nimble. It doesn’t compact. It doesn’t lose track of why it started the task.
If you tried to do the same work with five skills loaded into the main agent, you’d have to load all the relevant context for all five domains into the same window. The main agent would either compact halfway through or run out of budget completely.
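Here is the same arithmetic as a back-of-the-envelope script. The numbers are the ones quoted above; the only assumption is taking 150,000 tokens as a midpoint for what each sub-agent consumes.

```python
# Figures from the text; 150_000 is an assumed midpoint of the 100,000-200,000 range.
SKILL_TOKENS = 8_000
SKILLS_LOADED = 3
SUBAGENTS = 5
TOKENS_PER_SUBAGENT = 150_000
SUMMARY_TOKENS_TOTAL = 5_000

# Skills: every loaded body sits in the main context.
skill_cost = SKILL_TOKENS * SKILLS_LOADED

# Doing all five investigations inline: the raw material lands in the main context.
inline_cost = SUBAGENTS * TOKENS_PER_SUBAGENT

# Delegating: only the summaries land in the main context.
delegated_cost = SUMMARY_TOKENS_TOTAL

print(skill_cost)                     # 24000
print(inline_cost)                    # 750000
print(delegated_cost)                 # 5000
print(inline_cost // delegated_cost)  # 150
```

The total tokens spent are not lower when you delegate. What changes is where they are spent: the main context carries roughly 150 times less.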
This is the central trick. Skills concentrate context in the main agent. Sub-agents distribute context across separate agents.
“If you’d describe the task as ‘go figure out X and tell me,’ it’s a sub-agent. If you’d describe it as ‘while you work, remember X,’ it’s a skill.” — Hunter Hodnett, Chipp CTPO
When to reach for which
The decision rubric is simpler than I expected once I had it.
Use a skill when:
- The agent needs the knowledge while it’s actively coding and will reference it dozens of times during the work.
- The knowledge is short enough that loading it doesn’t blow the budget.
- The work is in the agent’s main domain (writing the feature, not debugging infra).
Examples: design systems, code style guides, API conventions, common pitfalls for the current subsystem.
Use a sub-agent when:
- The task is fact-finding or side investigation: read a bunch of stuff, return one insight.
- The task involves a lot of tool calls that the main agent doesn’t need to see.
- The task is in a separate domain (debugging Kubernetes from a feature-development session).
- The task can fail without affecting the main work.
Examples: debugging production issues, researching how a third-party library works, auditing the codebase for instances of a deprecated pattern, summarizing a long document.
A test that almost always works: if you would describe the task as “go figure out X and tell me,” it’s a sub-agent. If you’d describe it as “while you work, remember X,” it’s a skill.
Real examples from Bug Bot
Three live examples from our autonomous cluster.
chipp-design (skill)
Loads when the agent is doing UI work. About 6,000 tokens of design rules, component library reference, and scar-tissue notes about CSS gotchas. Description: “Build UI components and pages following the Chipp brand design system for Svelte 5.”
The agent reads it dozens of times during a UI ticket, checking spacing tokens, color palette, component conventions. Loading it as a skill means the reference is there the whole time, not behind a tool call.
If we tried to handle this with a sub-agent (“go figure out our design system and tell me what to use”), the agent would have to dispatch the sub-agent, wait for the summary, and then realize it needed more detail and dispatch again. Skill is the right tool.
infra-ops (sub-agent)
Dispatched when the main agent hits a production issue that needs investigation. The sub-agent has its own runbook (kubectl commands, log query patterns, common failure modes), its own tools (kubectl MCP, Loki MCP), and its own context window.
The main agent dispatches it with one tool call: “Investigate why pods restarted in the last hour.” The sub-agent runs 47 tool calls, fills 200k tokens of its context window, correlates everything, and returns a one-paragraph diagnosis.
If we tried to handle this with a skill (“here’s everything about our K8s cluster, now investigate”), the main agent’s context would fill up with raw kubectl output and lose its grip on the actual ticket. Sub-agent is the right tool.
feature-deep-dive (sub-agent)
Dispatched when the main agent needs to understand how an existing feature works before modifying it. The sub-agent reads the feature’s code, related tests, recent git history, and any relevant docs, then returns an architecture summary.
The main agent gets the summary in its context, applies it to the modification work, and ships the change. The 100k tokens of code-reading the sub-agent did don’t pollute the main session.
This is one of our most-dispatched sub-agents. Almost every non-trivial feature ticket dispatches it as the first step.
Anti-patterns I’ve shipped and regretted
Three patterns I burned tokens on before learning they were wrong.
The omnibus skill
I had one called everything-about-our-platform that loaded a 30,000-token document with our entire architecture. It was easier than thinking about which skill to write. The agent would load it for every task, including ones that didn’t need it, and we’d lose 30,000 tokens of budget every session.
Splitting it into focused skills (chipp-billing, chipp-design, chipp-routing, chipp-auth) meant only the relevant 5,000 tokens loaded at a time. Big improvement.
If your skill description starts with “general knowledge about…”, you’re building an omnibus. Split it.
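A check like this is easy to automate. The sketch below flags likely omnibus skills. The `looks_like_omnibus` helper and its 10,000-token threshold are hypothetical, chosen only because the threshold sits between the 5,000-token focused skills and the 30,000-token omnibus described above.

```python
def looks_like_omnibus(description: str, body_tokens: int, max_tokens: int = 10_000) -> bool:
    """Flag skills that are probably too broad to load selectively."""
    vague = description.strip().lower().startswith("general knowledge about")
    return vague or body_tokens > max_tokens


print(looks_like_omnibus("General knowledge about our platform", 30_000))  # True
print(looks_like_omnibus(
    "Build UI components and pages following the Chipp brand design system for Svelte 5.",
    6_000,
))  # False

# Per-session saving from the split described above: 30,000 tokens down to about 5,000.
print(30_000 - 5_000)  # 25000
```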
The sub-agent for trivial work
Spawning a sub-agent has overhead: a separate model invocation, the round-trip of starting a new session, the cost of sending the initial prompt. If the task is small, just do it inline.
Sub-agents pay off when the task would fill 50,000+ tokens of context. They cost more than they save when the task would fill 500.
Heuristic: if you’d be embarrassed to interrupt a colleague to ask the question, don’t dispatch a sub-agent for it.
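The break-even is easy to see with rough numbers. In this sketch the 300-token dispatch prompt and the 1,000-token summary are assumed values, not measurements. The function counts only tokens kept out of the main context and ignores latency and the sub-agent's own spend.

```python
def main_context_saved(
    task_tokens: int, prompt_tokens: int = 300, summary_tokens: int = 1_000
) -> int:
    """Tokens kept out of the main context by delegating instead of working inline."""
    return task_tokens - (prompt_tokens + summary_tokens)


print(main_context_saved(50_000))  # 48700
print(main_context_saved(500))     # -800
```

A negative result means the delegation round-trip puts more into the main context than doing the work inline would have.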
Skills used as sub-agents
This is the most common mistake I see. Someone wants the agent to “go check the database for orphaned records.” They write a skill called database-investigation with instructions and example queries. The skill loads, then the main agent runs the queries, and now its context window is filling up with raw database rows.
They wanted the work done elsewhere. They gave themselves a cheat sheet instead.
The fix is a sub-agent: spawn one, let it run the queries, have it return “found 47 orphans, here are the IDs.” Main context window stays clean.
The pattern to internalize: skills are reference material the main agent uses to do work itself. Sub-agents are work delegated to a separate agent, with only the result coming back.
The simplest decision
If you want the agent to know something, write a skill.
If you want someone else to know something, spawn a sub-agent.
That’s it. That’s the whole post.
The deeper the autonomous cluster gets (and the Bug Bot pipeline leans on both heavily), the more these two patterns become the basic structural elements of every workflow. We use both, constantly. The skills carry the patterns we want consistent across all our work. The sub-agents carry the heavy lifting that would otherwise crush our main agent’s context.
Get this distinction right and your context budget stops being the bottleneck on everything you ship.
If you want the foundational discipline these patterns build on, read Context Engineering: The Skill That Turns Claude Into a Production Co-Developer.
If you want the architecture for managing skills and sub-agents at scale, read CLAUDE.md Architecture.
If you want to see skills and sub-agents at work in a production cluster, read Building a Self-Healing Bug Bot.