Key takeaways
- A skill is reusable instructions - a SKILL.md file whose body loads only when used, so it costs almost nothing at rest
- A subagent is delegated work - it runs in a fresh, isolated context window and returns only a summary
- A “parallel agent” is not a real primitive - the phrase just means running several subagents at the same time
- Choose by what you are protecting - context, repeatability, or wall-clock time
What is the difference between a subagent and a skill? People ask it constantly, usually right after a Claude Code session has done something slow and expensive that a smaller move would have handled.
Here is the answer before the detail. A skill is reusable instructions. A subagent is delegated work. A parallel agent is not a separate thing at all; it is just several subagents running at once. Three different answers to three different problems. Claude Code makes it easy to grab the wrong one, because all three feel like the same act of getting the AI to do more.
The deeper question, and the one worth the rest of this post, is what each one costs. The cost is where a wrong choice actually hurts. A skill you never needed is a rounding error. A subagent you did not need is a whole context window you paid to spin up and throw away. So the goal is not to memorize definitions. It is to know which lever you are pulling and what it bills you.
The three things people conflate
Three names, three different mechanisms. A skill is a SKILL.md file: a written instruction set that Claude loads when it is relevant, or when you type its slash command. The body sits dormant and costs almost nothing until something invokes it. A subagent is a unit of delegated work. Claude hands a task to a worker that runs in its own fresh context window, with its own tools and permissions, and sends back only a summary. The main conversation never sees the worker’s mess. A parallel agent is not a configured object at all. It is a description of timing: several subagents running at the same time instead of one after another. You do not create a parallel agent. You run subagents in parallel. Conflate these three and you will reach for an isolated worker when a skill would do, or run things in sequence that should have run at once.
The reason the confusion persists is that all three are reached the same way, by talking to Claude in plain language. You do not import a library or call a constructor. You say “review this code” and Claude might apply a skill, might delegate to a subagent, might do neither. The mechanism is invisible at the moment you trigger it. That is good for ease of use and bad for cost intuition, because the thing you cannot see is the thing you cannot budget for.
What a subagent is
A subagent is the heavyweight option, and it is heavy for a reason. The official Claude Code subagents documentation describes it plainly: each subagent runs in its own context window with a custom system prompt, specific tool access, and independent permissions. It starts fresh. It does not see your conversation history, the skills you have already invoked, or the files Claude has already read. Claude writes a short delegation message describing the task, and the subagent works from there.
That isolation is the entire point. When a side task would flood your main conversation with search results, logs, or file contents you will never look at again, a subagent does that work somewhere else and hands back only the summary. Your main context stays clean. In long Claude Code sessions, that cleanliness is worth real money, because a context window that fills up forces compaction, and compaction is where detail quietly goes missing.
It is worth knowing the naming history here, because it trips people up. The tool that spawns this kind of worker used to be called the Task tool. As of Claude Code version 2.1.63 it was renamed to the Agent tool, and old Task(...) references still work as aliases. A custom subagent, the kind you define once and reuse, is a Markdown file in .claude/agents/ with frontmatter for its name, description, tools, and model. You can even point a subagent at a cheaper model like Haiku to control cost. If you want the deeper comparison of one-off delegation versus a defined, reusable specialist, I wrote that up separately in the Task tool versus subagents piece.
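To make the file shape concrete, here is a minimal sketch that renders a custom subagent definition as text. The four frontmatter fields are the ones named above. Everything else is an illustrative assumption rather than something taken from the documentation: the `log-summarizer` name, the prompt, the tool names, the comma-separated tools format, and the `haiku` model alias. Check the current subagents documentation before relying on any of them.

```python
from pathlib import Path


def render_subagent(name: str, description: str, tools: list[str], model: str, prompt: str) -> str:
    """Return the Markdown text of a subagent definition: frontmatter, then the system prompt."""
    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {description}",
        # Assumption: tools are written as a comma-separated list.
        f"tools: {', '.join(tools)}",
        f"model: {model}",
        "---",
    ])
    return f"{frontmatter}\n\n{prompt.strip()}\n"


# Hypothetical example values, chosen only for illustration.
text = render_subagent(
    name="log-summarizer",
    description="Summarizes long log files and reports only the failures.",
    tools=["Read", "Grep"],
    model="haiku",
    prompt="Read the logs you are pointed at and reply with a short list of failures.",
)

if __name__ == "__main__":
    # Custom subagents live under .claude/agents/ as described above.
    target = Path(".claude/agents/log-summarizer.md")
    print(f"Would write {len(text)} characters to {target}:")
    print(text)
```

The script only prints the file instead of writing it, so you can inspect the result before committing anything to your project.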
One nuance that matters for the cost conversation: there is a variant called a fork. A fork is a subagent that inherits the entire conversation so far instead of starting fresh. It trades away the input isolation, since it sees everything the main session sees, but its own tool calls still stay out of your conversation. Use a fork when a clean subagent would need so much background that re-explaining the situation costs more than the isolation saves.
Parallel is not a primitive
Here is the part that the phrase “parallel agent” gets wrong. There is no parallel agent. There is no setting, no file, no object you configure with that name. What exists is subagents, and subagents can run two ways: in the foreground, where the main session blocks and waits, or in the background, where they run concurrently while you keep working. Background subagents are the concurrency. “Run parallel research across the authentication, database, and API modules” is just three subagents started at once.
So when someone says parallel agent, translate it in your head to “several subagents at the same time.” That translation matters because it tells you the cost. Running three subagents in parallel does not cost less than running them in sequence. It costs the same in tokens, because each one still fills its own context window, and saves you only wall-clock time. Parallelism buys speed, not efficiency. If you were hoping that “going parallel” would make a job cheaper, it will not. It will make it faster and bill you the same.
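The arithmetic behind that claim can be sketched in a few lines. This is a deliberately simplified model, not a measurement: the token and timing figures are made up, and it ignores any coordination overhead in the main session.

```python
from dataclasses import dataclass


@dataclass
class SubagentRun:
    """One delegated task. Both numbers are made-up illustrative estimates."""
    name: str
    tokens: int      # tokens consumed inside the subagent's own context window
    seconds: float   # how long the subagent takes to finish


def total_tokens(runs: list[SubagentRun]) -> int:
    # Each subagent pays for its own context window regardless of scheduling.
    return sum(r.tokens for r in runs)


def wall_clock(runs: list[SubagentRun], parallel: bool) -> float:
    # In parallel you wait for the slowest run; in sequence you wait for all of them.
    times = [r.seconds for r in runs]
    return max(times) if parallel else sum(times)


runs = [
    SubagentRun("authentication", tokens=40_000, seconds=90.0),
    SubagentRun("database", tokens=55_000, seconds=120.0),
    SubagentRun("api", tokens=35_000, seconds=75.0),
]

for parallel in (False, True):
    mode = "parallel" if parallel else "sequential"
    print(f"{mode:>10}: {total_tokens(runs):,} tokens, {wall_clock(runs, parallel):.0f} s")
```

With these invented numbers, both modes total 130,000 tokens, while the wait drops from 285 seconds to 120. The token line never moves; only the clock does.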
Two larger structures sit beyond a single session, and they are worth knowing. Background agents let you run many independent Claude Code sessions at once and watch them from one place. Agent teams go further and let separate sessions communicate. Those are separate primitives and a real step up in commitment, and most people reaching for “a parallel agent” do not need them. They need two or three subagents started together inside the session they already have. If you are trying to get this right across a team and the token bill is the symptom that sent you looking, my door is open.
What a skill is
A skill is the lightweight option, and the contrast with a subagent is the whole story. A skill is a SKILL.md file: YAML frontmatter that tells Claude when the skill applies, plus Markdown instructions Claude follows when it runs. Custom slash commands were merged into skills, so a skill is also how you build your own /command. Where a subagent is a worker, a skill is a recipe.
The cost difference is structural. The official Claude Code skills documentation puts it directly:
“Unlike CLAUDE.md content, a skill’s body loads only when it’s used, so long reference material costs almost nothing until you need it.” — Claude Code documentation
Read that carefully, because it explains the resting cost. A skill’s short description sits in context so Claude knows the skill exists and when to reach for it. The full body, which can be hundreds of lines of procedure, does not load until the skill is actually invoked. A subagent spends a whole context window the moment it runs. A skill spends almost nothing until the moment you need it, and even then the content enters the conversation once and stays, rather than being paid for again on every turn. For anything you do repeatedly, a checklist, a house style, a deployment procedure, a skill is the correct home. Pasting the same instructions into chat every time is the pattern a skill exists to kill. This sits alongside plugins and connectors as one of the ways Claude is extended, and I have mapped how those plugins, connectors and skills relate if you want the wider picture.
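A back-of-the-envelope model shows why the resting cost matters. It assumes, as a simplification, that instructions kept in CLAUDE.md are loaded in every session, while a skill contributes its short description to every session and its body only to the sessions that invoke it. All token counts are hypothetical.

```python
def claude_md_tokens(body_tokens: int, sessions: int) -> int:
    """Instructions kept in CLAUDE.md are loaded in every session, used or not."""
    return body_tokens * sessions


def skill_tokens(description_tokens: int, body_tokens: int, sessions: int, sessions_used: int) -> int:
    """A skill's description is always present; its body loads only in sessions that invoke it."""
    if sessions_used > sessions:
        raise ValueError("sessions_used cannot exceed sessions")
    return description_tokens * sessions + body_tokens * sessions_used


# Hypothetical figures: a 3,000-token procedure, a 50-token description,
# 20 sessions in total, and the procedure actually needed in 4 of them.
in_claude_md = claude_md_tokens(body_tokens=3_000, sessions=20)
as_skill = skill_tokens(description_tokens=50, body_tokens=3_000, sessions=20, sessions_used=4)

print(f"CLAUDE.md: {in_claude_md:,} tokens")
print(f"Skill:     {as_skill:,} tokens")
```

Under those assumptions the same procedure costs 60,000 tokens as always-loaded text and 13,000 as a skill. The gap widens the less often the procedure is needed.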
Skills and subagents are not rivals. They compose. A skill can be set to run in a forked subagent with one frontmatter line, which gives you a reusable recipe that also executes in isolation. The point of telling them apart is not to pick a side. It is to know which cost you are signing up for.
Which one to reach for
The decision comes down to one question asked three ways. What are you protecting?
If you are protecting repeatability, write a skill. The signal is that you keep typing the same instructions, or a section of your CLAUDE.md has quietly turned from a fact into a procedure. A skill captures that once and costs you almost nothing until it fires. This is the most underused of the three, because it does not feel like “using an agent,” and people came looking for an agent.
If you are protecting context, spawn a subagent. The signal is a side task that will dump material into your main conversation that you will never reference again: a wide search, a pile of logs, the full text of twenty files. Send that work into an isolated window and take back the summary. The cost is real, a fresh context window, so do it when the isolation is worth more than the spin-up.
If you are protecting wall-clock time, run subagents in parallel. The signal is several independent investigations that do not depend on each other’s results. Start them together. Just remember the bill is the same as running them one by one; you are buying speed, not savings.
And if none of those signals is present, the real answer is the fourth box on the diagram: do not reach for any of them. Just ask Claude. The cheapest agent is the one you did not spawn.
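The signals above can be written down as a small checklist function. This is only a sketch of the reasoning, not a tool Claude Code provides. The function name, its parameters, and the choice to treat two or more independent tasks as the threshold for going parallel are assumptions made for illustration.

```python
def recommend(repeated_instructions: bool, floods_main_context: bool, independent_tasks: int = 1) -> list[str]:
    """Map the three signals to the mechanisms they point at; they can combine."""
    picks: list[str] = []
    if repeated_instructions:
        # Protecting repeatability: capture the procedure once.
        picks.append("skill")
    if independent_tasks >= 2:
        # Protecting wall-clock time: same token bill, shorter wait.
        picks.append("parallel subagents")
    elif floods_main_context:
        # Protecting context: isolate the noisy side task.
        picks.append("subagent")
    # No signal present: the cheapest agent is the one you did not spawn.
    return picks or ["just ask Claude"]


print(recommend(repeated_instructions=False, floods_main_context=False))
print(recommend(repeated_instructions=True, floods_main_context=True, independent_tasks=3))
```

Returning a list instead of a single answer reflects the point that skills and subagents compose: a repeated procedure that also fans out into independent investigations can reasonably call for both.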
The mistake worth avoiding is treating “more machinery” as “more capable.” A skill, a subagent, and three subagents in parallel are not a ladder you climb toward better results. They are three tools on a wall, and the craft is the same as any workshop. You do not pick up the heaviest tool because the job feels important. You pick up the one shaped like the problem in front of you.



