
CEO of Tallyfy · AI advisor at Blue Sheen for mid-size companies

How the general-purpose agent works in Claude Code

The general-purpose agent in Claude Code is not the main agent and not something you pick. It is a built-in subagent Claude routes to on its own for complex, multi-step work. It inherits your model and, by default, runs in its own fresh context that Claude briefs with a short summary. This post explains how it actually works and what that costs you.

You ask Claude Code to fix a bug. It reads six files, traces the logic, makes the change, runs the tests, and reports back. The whole thing feels like one continuous conversation with one assistant.

It usually was not. Somewhere in there, Claude handed the real work to its general-purpose agent, and you never saw the handoff.

This is the thing to understand, stated plainly. The general-purpose agent is not the main Claude you are talking to. It is a built-in subagent that Claude delegates to, on its own, when a task is complex enough to warrant it. You do not summon it. You do not configure it. Most people using Claude Code every day have never typed its name, and it has still done a large share of their work. So it is worth knowing what it is, when it fires, and what it quietly costs, because the agent doing your hardest tasks should not be the one you understand least.

It is just a subagent

The general-purpose agent is a built-in subagent. That is the whole definition, and it dissolves most of the confusion. It is not a separate product, not a smarter mode, not the “real” Claude behind a curtain. It is the same subagent machinery I described in what a subagent is and set against parallel agents and skills, with one specific built-in configuration that ships with Claude Code.

The official documentation describes it as “a capable agent for complex, multi-step tasks that require both exploration and action.” Three properties define it. Its model inherits from your main conversation, so it is exactly as capable, and exactly as expensive per token, as the Claude you started with. Its tools are all tools, the full set, because a general-purpose worker cannot know in advance which ones it will need. Its purpose is the broad middle of real work: complex research, multi-step operations, code modifications. Where the Explore subagent only reads and the Plan subagent only researches in plan mode, the general-purpose agent is the one allowed to explore and then change things. That breadth is the entire reason it exists, and also the reason Claude reaches for it so often. Most hard requests are some mix of looking and doing.

It helps to be precise about what “subagent” means here, because the word gets stretched. A subagent is not a separate chat window. It is not a second Claude you can talk to. It is a worker that the main Claude starts, hands a task, and waits on. The worker runs in its own context, does its job, and returns a single result. The main Claude reads that result and carries on. From your seat, you see a tool call go out and an answer come back. The subagent never speaks to you and you never speak to it. That one-way structure is the whole point. It keeps the main conversation, the one you read, free of the dozens of intermediate steps the worker took to get its answer.
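The one-way structure is easier to see as a toy model. This is only a sketch, not how Claude Code is implemented: `Context` and `run_worker` are hypothetical names, and a context window is reduced to a plain list. What it shows is that the intermediate steps stay in the worker's list and only one result reaches the main one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical stand-in for a context window: an ordered list of entries."""
    entries: list[str] = field(default_factory=list)


def run_worker(task: str, steps: list[str]) -> str:
    """Run a pretend worker in its own context and hand back a single result."""
    worker = Context()                      # fresh context, separate from the caller's
    worker.entries.append(f"task: {task}")
    worker.entries.extend(steps)            # intermediate steps stay inside the worker
    return f"done: {task} ({len(steps)} steps)"


main = Context(["user: fix the login bug"])
result = run_worker(
    "fix the login bug",
    ["read auth.py", "grep session", "edit auth.py", "run tests"],
)
main.entries.append(result)                 # the main context gains one entry only
print(main.entries)
```

The main list ends with two entries, the request and the result, however many steps the worker took.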

Why does the tool set being “all tools” matter so much? Because tool access is the difference between a worker that can finish a job and one that gets stuck halfway. Picture a worker handed a task that turns out to need a file edit, but it only has read tools. It can find the problem and describe the fix, then stop, because it cannot apply it. The general-purpose agent never hits that wall. It can read, search, edit, run commands, fetch pages, whatever the task turns out to need. The cost of that generality is that it is heavier to spawn than a narrow worker, and we will get to that. The benefit is that Claude can route almost any shape of task to it and trust it will not come back blocked.

So when someone asks how the general-purpose agent works, the plain first answer is: like any other subagent. The interesting parts are the ones that follow.

The five built-in subagents

The general-purpose agent makes more sense once you see the company it keeps. Claude Code ships with a small set of built-in subagents, each one a worker with a fixed job, and the official docs are clear that “Claude Code includes built-in subagents that Claude automatically uses when appropriate.”

Explore is the read-only researcher. It searches and understands a codebase without changing anything, and it deliberately skips your CLAUDE.md and git status to stay fast and cheap. Plan does research while you are in plan mode, gathering context for a plan without touching files. General-purpose is the one that both explores and acts. Then there are two narrow helpers you will almost never think about: statusline-setup, which runs on Sonnet when you configure your status line, and claude-code-guide, which runs on the cheaper Haiku model to answer questions about Claude Code itself.

Look at how the models differ across that set, because it tells you something. The two trivial helpers run on cheaper models. claude-code-guide is on Haiku, statusline-setup on Sonnet. That is a deliberate match of model to job: answering a question about a keyboard shortcut does not need the strongest reasoning, so it does not get it. The general-purpose agent gets no such treatment. It runs whatever you are running. If the cheap helpers prove anything, it is that Claude Code can and does pick smaller models for small jobs. The general-purpose agent stays at full strength because the jobs it gets are not small. Keep that contrast in mind. It is the clearest hint that this worker is built for weight, not thrift.
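A small lookup makes the contrast concrete. It is a sketch that encodes only what this post states: `BUILT_IN_MODELS` and `resolve_model` are hypothetical names, the lowercase model labels are placeholders rather than real model identifiers, and `None` marks the agents whose model this post does not state.

```python
from __future__ import annotations

from typing import Optional

# Model settings as described in this post. None means "not stated here",
# so the sketch does not guess a value for Explore or Plan.
BUILT_IN_MODELS: dict[str, Optional[str]] = {
    "Explore": None,
    "Plan": None,
    "general-purpose": "inherit",
    "statusline-setup": "sonnet",
    "claude-code-guide": "haiku",
}


def resolve_model(agent: str, session_model: str) -> Optional[str]:
    """Return the model an agent would run on, given the session's model."""
    setting = BUILT_IN_MODELS[agent]
    if setting == "inherit":
        return session_model        # general-purpose runs whatever you are running
    return setting


print(resolve_model("general-purpose", "opus"))
print(resolve_model("claude-code-guide", "opus"))
```

With an Opus session, the first call resolves to the session's own model and the second to the cheaper helper model.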

There is also a why behind Explore skipping your CLAUDE.md and git status. Those files are useful context for doing work, but they are pure overhead for a worker whose only job is to look. Every token Explore spends loading project rules it will not act on is a token wasted. So the design strips them out. The general-purpose agent does the opposite. It needs that context, because it is going to act, and acting against a project means knowing the project’s rules. The split is not arbitrary. Each worker carries exactly the context its job requires and nothing more, and that discipline is part of why the cheap workers stay cheap.

Notice the pattern. Each built-in subagent is a specialist with a deliberately narrow remit, except one. General-purpose is the generalist by design, the worker with no specialty and therefore no gaps. It is the catch-all, and a catch-all is exactly what you want as the default for the unpredictable shape of real tasks. Explore and Plan are scalpels. General-purpose is the hand that holds whichever tool the moment needs.

This is also why you should not reach for a named subagent when you are not sure which one fits. If a task is clearly read-only research, Explore is the right and cheaper call. If it clearly needs to look and then change things, general-purpose is correct. But the grey zone is wide, and in the grey zone the generalist wins, because a scalpel used for the wrong job leaves you stuck while a general tool used for a precise job merely costs a little more. The asymmetry favors the catch-all. That is a design choice, and it is the right one for software work, where you often do not know a task’s true shape until you are partway into it.

[Figure: How Claude Code routes a request to the Explore, Plan, or general-purpose subagent, or handles it directly]

When Claude routes to it

You rarely invoke the general-purpose agent by name. Claude routes to it, and the routing rule is specific. The documentation says Claude delegates to general-purpose “when the task requires both exploration and modification, complex reasoning to interpret results, or multiple dependent steps.”

Read those three triggers as a single test: is this task too big and too branching to keep tidy in the main conversation? A one-line edit fails that test, so Claude just does it directly. “Find every call site of this function, work out which ones are unsafe, and fix them” passes it on all three counts, so Claude delegates. The decision is about shape, not difficulty. A task can be hard and still stay in the main session if it is linear and self-contained. A task can be moderate and still get delegated if it sprawls.

It is worth sitting with the difference between shape and difficulty, because most people guess wrong about it. Say you ask Claude to rewrite a tricky algorithm in one file. That is hard. It needs care. But it is one file, one change, one line of reasoning, and Claude can hold all of it in the main conversation without the context getting messy. No delegation. Now say you ask Claude to rename a config key used in eleven places. Each edit is easy. A child could see what to change. But the work branches: find the uses, check each one, edit each one, confirm nothing broke. That sprawl is what triggers a handoff. Difficulty lives in a single step. Shape lives in how the steps multiply and depend on each other. Claude routes on shape because shape, not difficulty, is what fills a context window with noise.
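Here is the shape test as a rough sketch. Claude's actual routing is a model judgment, not a function like this. `Task`, `would_delegate`, and the `step_threshold` value are all assumptions made for illustration, loosely based on the three documented triggers. Notice that the record has no field for difficulty at all.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a task's shape. Difficulty is deliberately absent."""
    explores_and_modifies: bool   # needs both looking around and changing things
    interprets_results: bool      # needs complex reasoning over what it finds
    dependent_steps: int          # steps that build on earlier steps


def would_delegate(task: Task, step_threshold: int = 2) -> bool:
    """Return True if any of the three triggers applies (threshold is an assumption)."""
    return (
        task.explores_and_modifies
        or task.interprets_results
        or task.dependent_steps >= step_threshold
    )


# Hard but linear: one file, one change, one line of reasoning.
rewrite_algorithm = Task(False, False, 1)
# Easy but sprawling: find the uses, check each, edit each, confirm nothing broke.
rename_config_key = Task(True, False, 4)

print(would_delegate(rewrite_algorithm))
print(would_delegate(rename_config_key))
```

The hard rewrite stays in the main session and the easy rename gets handed off, which is the opposite of what a difficulty-based rule would do.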

Think about what the main conversation would look like without this routing. Every file the worker opened, every search it ran, every command and its output would land in the session you are reading. A task that touched fifteen files would bury your conversation under fifteen files of raw text. You would lose the thread. The next thing you asked would be answered by a Claude wading through that debris. Delegation is the fix. The worker absorbs all that intermediate mess in its own context and hands back a clean summary. Your conversation stays readable, and the Claude you are talking to stays sharp because its context is not clogged.
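Some back-of-envelope arithmetic shows the scale. Every figure here is invented for illustration: 2,000 tokens per file and a 300-token summary are assumptions, not measurements, and `main_context_growth` is a hypothetical helper.

```python
def main_context_growth(
    files: int, tokens_per_file: int, summary_tokens: int, delegated: bool
) -> int:
    """Tokens added to the main conversation by one file-heavy task."""
    if delegated:
        return summary_tokens           # only the worker's summary comes back
    return files * tokens_per_file      # every file read lands in the main session


# Hypothetical figures: fifteen files of about 2,000 tokens each, a 300-token summary.
print(main_context_growth(15, 2000, 300, delegated=False))
print(main_context_growth(15, 2000, 300, delegated=True))
```

The file reads do not vanish when the task is delegated. They are still paid for, but in the worker's window rather than the one you are reading.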

This is why the handoff is invisible and why that is mostly fine. Claude is making a context-management decision on your behalf: keep the noisy, file-heavy, multi-step work out of the conversation you are actually reading, and bring back the result. When it routes well, you get a clean main session and a finished task. When it routes badly, usually by delegating something small enough that the overhead was not worth it, you pay for a subagent you did not need.

What goes wrong if you never notice this is happening? You lose the ability to tell a good session from a wasteful one. Two sessions can produce the same result and cost very differently, depending on whether Claude delegated work that deserved it or work that did not. A team that has never looked at this will treat all sessions as the same, and quietly absorb the cost of bad routing forever, because it is invisible by default. The fix is not to fight the routing. It is to learn to read it. If you are trying to get a team consistent about when this delegation helps and when it just adds cost, reach out and I am happy to talk it through.

[Figure: A general-purpose subagent running as a Task in the Claude Code terminal]

An experimental fork mode

Here is a part that surprises even experienced Claude Code users. By default, when Claude routes work to the general-purpose agent, that agent starts the way every subagent does: with a fresh, empty context. It does not see your conversation. Claude compresses the situation into a short delegation brief, and the worker works from that. The isolation is the entire point.

There is also an opt-in way to invert that, and it is worth knowing. Claude Code has an experimental fork mode. The documentation is explicit that it is experimental, needs Claude Code v2.1.117 or later, and stays off until you turn it on by setting CLAUDE_CODE_FORK_SUBAGENT=1. Turn it on and one of the things that changes is this: Claude “spawns a fork whenever it would otherwise use the general-purpose subagent. Named subagents such as Explore still spawn as before.”
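If you script your environment, the two conditions above can be checked in one place. This is a sketch under assumptions: `fork_env` and `parse_version` are hypothetical helpers, and the version is assumed to be a plain dotted string such as 2.1.117. Only the variable name and the minimum version come from the documentation quoted above. Nothing here launches Claude Code; it only builds an environment mapping.

```python
from __future__ import annotations

import os

MIN_FORK_VERSION = (2, 1, 117)  # minimum version quoted above


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.1.117' or 'v2.1.117' into (2, 1, 117). Assumes plain dotted numbers."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def fork_env(version: str, base_env: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the environment with fork mode enabled if the version allows it."""
    env = dict(os.environ if base_env is None else base_env)
    if parse_version(version) >= MIN_FORK_VERSION:
        env["CLAUDE_CODE_FORK_SUBAGENT"] = "1"
    return env


print(fork_env("2.1.117", {}))
print(fork_env("2.1.116", {}))
```

On an older version the mapping comes back unchanged, so the flag is never set where the feature does not exist.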

A fork is a subagent that inherits the entire conversation so far, rather than starting fresh. This is a real departure from the textbook picture of a subagent. A named subagent like Explore begins empty and has to be told the task from scratch in a delegation message. A fork already knows everything your main session knows, because it is a copy of it. The isolation it keeps is one-directional: the fork’s own tool calls and file reads stay out of your conversation, so your main context still does not fill with debris, but the fork itself is not working blind.

Consider what a delegation message can and cannot carry. When a named subagent starts empty, the main Claude has to compress the situation into a written brief: here is the task, here is the relevant background, go. That brief is a summary, and every summary drops detail. If your conversation has built up forty exchanges of nuance about a particular module, no delegation message recaptures all forty. The worker gets the gist and loses the texture. For a narrow read-only job that is fine, because the job does not need the texture. For the sprawling, deeply-contextual work the general-purpose agent handles, the lost texture is exactly what would have made the work correct. A fork sidesteps the whole problem. There is no brief to write and nothing to compress, because the worker just is a copy of the session.

So the fork buys two things at once. It removes the cost of writing a long delegation message, and it removes the errors that come from an incomplete one. Picture a worker handed a half-accurate summary of a thorny situation. It will do competent work against the wrong picture and hand back a result that looks right and is subtly off. That failure is quiet and expensive to catch. A fork cannot fail that way, because it never had a summary to be wrong about. It inherits the real thing.
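The difference in starting context fits in a few lines. This is a toy model, not Claude Code's internals: `start_fresh` and `start_fork` are hypothetical names, and a conversation is reduced to a list of strings.

```python
from __future__ import annotations


def start_fresh(conversation: list[str], brief: str) -> list[str]:
    """A named subagent starts empty: it sees the brief, not the conversation."""
    return [brief]


def start_fork(conversation: list[str]) -> list[str]:
    """A fork starts from a copy of everything the main session knows."""
    return list(conversation)


conversation = [f"exchange {i}" for i in range(1, 41)]   # forty exchanges of nuance
brief = "Task: fix the parser module. Background: summarized from the session."

fresh = start_fresh(conversation, brief)
fork = start_fork(conversation)
fork.append("tool: read parser.py")      # the fork's own work stays in the fork

print(len(fresh), len(fork), len(conversation))   # prints: 1 41 40
```

The fresh worker holds one entry, the fork holds all forty plus its own tool call, and the parent conversation is still forty entries long. That last number is the one-directional isolation.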

That design makes sense for the general-purpose case. The tasks Claude routes here are exactly the ones where re-explaining the situation in a delegation message would be expensive and lossy. Inheriting the conversation means the fork starts with full context and wastes nothing re-establishing it. It is the right trade for complex, deeply-contextual work. It also means that, in fork mode, the general-purpose agent is less “isolated worker” and more “a second instance of this exact session, sent to do the messy part.”

There is a cost surprise here, and it runs the opposite way to the obvious guess. Because a fork’s system prompt and tools are identical to the parent, its first request reuses the parent’s prompt cache, which the docs note makes forking cheaper than spawning a fresh subagent for work that needs the same context. So the inherited conversation is not the expensive thing many assume it to be. If you do turn fork mode on, the quality of your earlier conversation also feeds straight into the fork’s work: a muddled session produces a fork that inherits the muddle, a clear one a fork that starts clear. The care you put into the main session is carried along, not wasted, when a fork picks the work up.

What it costs you

The most common wrong idea about the general-purpose agent is that it is a cheap escape hatch, a way to offload work to something lighter. It is not. Its model inherits from your main conversation. If you are running Opus, the general-purpose agent is running Opus. There is no quiet downgrade to a cheaper model, and no token discount.

It is worth being clear on why people expect a discount that is not there. The mental model many bring to subagents is that of a junior worker: you hand the small stuff down to someone cheaper and keep your own time for the hard parts. That picture fits the two trivial helpers, which do run on smaller models. It does not fit the general-purpose agent at all. The general-purpose agent is not a junior. It is a clone of you, sent to a different room. A clone does not cost less per hour than the original. It costs exactly the same, because it is the same. Once you hold that picture instead of the junior-worker one, the pricing stops being a surprise.

So the cost is real, and it comes from what the agent is, not from any copy of your conversation. It reads files and runs tools, and every one of those tokens is billed, just in its own window instead of yours. And it runs your full model the entire time. The general-purpose agent is the most expensive of the built-in subagents to invoke, precisely because it is the most capable and the least stripped-down: full model, all tools, nothing traded away for thrift.

What drives that cost is the task and the model, not some hidden copy of your session. A big, branching task spends more because it does more: more files read, more tools run, more thinking on your full model. A small one spends little. That is the real variable to watch, and it is the same one you would watch for any work the main session did itself. The figure to keep in mind is the model, not the length of the conversation that came before, because by default the agent was handed a brief, not the whole transcript.
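The arithmetic behind that is short. The prices below are made-up credits per thousand tokens, not real rates, and the model names and `cost` function are placeholders. The sketch also ignores the split between input and output tokens.

```python
# Hypothetical prices in credits per 1,000 tokens. Not real pricing.
PRICE_PER_KTOK = {"session-model": 15, "small-helper-model": 1}


def cost(tokens: int, model: str) -> int:
    """Credits spent on a number of tokens at a model's flat (assumed) rate."""
    return tokens // 1000 * PRICE_PER_KTOK[model]


task_tokens = 200_000   # files read, tools run, reasoning: the work itself

done_in_main_session = cost(task_tokens, "session-model")
done_by_general_purpose = cost(task_tokens, "session-model")   # inherits your model
done_by_small_helper = cost(task_tokens, "small-helper-model")

print(done_in_main_session, done_by_general_purpose, done_by_small_helper)
```

The first two numbers are identical because delegation moves the tokens to another window without changing their rate. Only a smaller model changes the rate, and the general-purpose agent does not get one.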

What you buy for that cost is real, though. You buy a main context window that stays clean while a hard, sprawling task gets done somewhere else. That is worth paying for in a long session. It is worth less in a short one. Think about the trade in plain terms. In a long session, your main context is your scarce resource; protecting it is worth a real price, because a clogged context degrades everything you do next. In a short session, your context was never under threat, so the protection buys you little while the cost stays the same. Same mechanism, different value, and the variable is session length. That is why the same delegation can be a smart move and a wasteful one depending only on when it happens.

The practical takeaway is not to avoid the general-purpose agent; you mostly cannot, because Claude routes to it for you. It is to recognize when it has fired, by watching for the Agent tool call in your terminal, and to notice whether the task that triggered it was actually big enough to deserve it. Once you can see the handoff, you can read your own sessions properly. You start to notice the pattern of when delegation paid off and when it did not, and that pattern is the thing worth learning, far more than any single rule. The agent doing your hardest work should not be the one you never think about. Now you will.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.
