Claude Code - When to use task tool vs subagents

Quick answers

Why does this matter? Tasks are ephemeral workers, subagents are persistent specialists - Tasks spin up lightweight Claude instances for one-off parallel work, while subagents maintain configurations across sessions

What should you do? Each approach carries a 20k token overhead cost - Both Tasks and subagents start with roughly 20,000 tokens of context loading before your actual work begins

What is the biggest risk? Parallelism caps at 10 concurrent operations - You can queue more, but only 10 Tasks or subagents run simultaneously, executing in batches

Where do most people go wrong? Context isolation is both the strength and the weakness - Separate 200k token windows prevent pollution but require careful orchestration to share results

The confusion that costs you speed

Use Tasks for parallel file searches. Use subagents for code review.

Done. Blog post over.

Except that’s what everyone says, and then you watch your token count explode while Claude spawns 50 Tasks to read three files. Or you carefully configure a subagent that can’t spawn its own workers, leaving you wondering why your “parallel” processing feels so… sequential. I’ve felt real frustration sitting there watching usage balloon past 160k tokens for work I expected to cost 3k.

I’m not alone: users report subagents consuming 160k tokens for work that takes 3k in the main context. The documentation covers the basics but not these edge cases, and the official best practices help without addressing token overhead in detail.

The Task tool and subagents aren’t just different interfaces to the same thing. They’re fundamentally different execution models with opposing strengths. And most people are using them backwards. It’s another example of how enterprises fragment their AI implementations instead of thinking about the whole picture.

What the Task tool actually does

The Task tool doesn’t create “subagents.” It spawns ephemeral Claude workers. Think temporary contractors who show up, do one specific job, then vanish. Each Task gets its own 200k context window, isolated from everything else.

One naming note before we go further: Claude Code renamed this tool from Task to Agent in version 2.1.63. Older Task(...) calls still work as aliases, so the name you use does not matter much. The behavior does, and that is what this post is about, so I use both names interchangeably here.

Watch what actually happens when you run multiple Tasks:

# What you think happens:
# Task 1 starts -> Task 2 starts -> Task 3 starts -> all run together

# What happens:
# Batch 1: Tasks 1-10 start -> all must complete
# Batch 2: Tasks 11-20 start -> all must complete
# Batch 3: Tasks 21-30 start...

Community testing shows Claude doesn’t dynamically pull from the queue as Tasks complete. It waits for the entire batch to finish before starting the next one. The parallelism level caps at 10, according to user reports.
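To see why batch-wise scheduling matters, here is a minimal Python sketch. It compares wall-clock time under the batch behaviour users report against a hypothetical rolling queue, where a freed slot picks up the next job straight away. The durations and both function names are illustrative assumptions, not measurements or Claude Code internals.

```python
import heapq

def batched_wall_time(durations, cap=10):
    """Wall-clock time when each batch of `cap` jobs must fully finish
    before the next batch starts (the behaviour users report)."""
    return sum(max(durations[i:i + cap]) for i in range(0, len(durations), cap))

def rolling_wall_time(durations, cap=10):
    """Wall-clock time if a freed slot immediately pulled the next job.
    Hypothetical alternative, shown only for comparison."""
    slots = [0.0] * min(cap, len(durations))  # time at which each slot frees up
    heapq.heapify(slots)
    for d in durations:
        free_at = heapq.heappop(slots)        # earliest available slot
        heapq.heappush(slots, free_at + d)
    return max(slots, default=0.0)

# Made-up durations in seconds: one slow job in each of the first two batches.
durations = [60] + [5] * 9 + [40] + [5] * 9 + [5] * 5
print(batched_wall_time(durations), rolling_wall_time(durations))
```

With these made-up numbers the batched schedule takes 105 seconds against 60 for the rolling one. Every batch waits on its slowest member, so one slow Task per batch sets the pace for the whole run.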

Tasks are fast for the right job. Need to search for a pattern across 500 files? Spawn 10 parallel Tasks, each handling 50 files. They can’t talk to each other (that’s the point), but they all report back to you. The main thread stays clean while the workers dig through the mess.

The problem? Each Task starts with that 20k token overhead. Spawn 10 of them and your “quick file search” has cost a painful 200k tokens before any actual work begins. Active multi-agent sessions can consume 3-4x more tokens than single-threaded operations. This is where cost optimization strategies matter most. For subscription users, understanding this overhead is one of many practical techniques for reducing Claude costs without changing your plan tier.
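The overhead arithmetic is simple enough to keep in a two-function sketch. The 20,000 figure is the rough per-worker number quoted in this post, not a measured constant, and the function names are made up for illustration.

```python
PER_WORKER_OVERHEAD = 20_000  # tokens; the rough figure quoted in this post

def fan_out_overhead(workers, per_worker=PER_WORKER_OVERHEAD):
    """Tokens spent on context loading before any worker does real work."""
    return workers * per_worker

def fan_out_total(workers, work_tokens_each, per_worker=PER_WORKER_OVERHEAD):
    """Overhead plus the tokens each worker spends on the job itself."""
    return workers * (per_worker + work_tokens_each)

print(fan_out_overhead(10))      # 10 workers, overhead only
print(fan_out_total(10, 3_000))  # same fan-out with 3k tokens of real work each
```

Run it with your own worker count before you fan out. The overhead term does not shrink when the individual jobs get smaller.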


Subagents aren’t what you think

Subagents aren’t faster Tasks. They’re not even really “sub” anything.

Subagents are specialized Claude instances with their own system prompts, tool permissions, and persistent configurations. Think department heads in your organization. The Security Reviewer, the Test Writer, the API Designer. They exist as Markdown files in your .claude/agents/ folder, ready to be called into service. This post assumes you already know what a subagent is in Claude Code; if not, start with the ground-up definition of the primitive first.
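For a feel of what one of those Markdown files contains, here is a small Python sketch that renders one. It assumes the file is YAML frontmatter with name, description and tools fields followed by the system prompt, which is my understanding of the format; check the current docs before relying on the exact field names. The render_subagent helper and the security-reviewer content are invented for illustration.

```python
def render_subagent(name, description, tools, system_prompt):
    """Return the text of a subagent definition: YAML frontmatter, then the prompt.
    Field names are an assumption about the file format, not a verified schema."""
    frontmatter = "\n".join([
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",  # restrict the agent to these tools
        "---",
    ])
    return f"{frontmatter}\n\n{system_prompt.strip()}\n"

text = render_subagent(
    name="security-reviewer",
    description="Reviews diffs for common vulnerabilities. Read-only.",
    tools=["Read", "Grep", "Glob"],  # read and search tools only, nothing that writes
    system_prompt="You are a security reviewer. Report findings; do not edit files.",
)
print(text)
# To install it you would save this text as .claude/agents/security-reviewer.md
```

Keeping the tool list short is the point. A reviewer that only has read and search tools cannot quietly start refactoring.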

What most people miss: the documented rule has been that subagents can’t spawn other subagents. Mind you, this limitation is by design, not a bug; the docs say so plainly in the context of plan mode, and the point is to stop runaway nesting. For most of this tool’s life, when a subagent reached for the Agent tool, it got nothing. No nested hierarchies, no recursive task decomposition. One level of delegation. (Update, June 2026: the v2.1.172 changelog notes that subagents can now nest, up to five levels deep. So treat the flat-rule wording above as the old default rather than a law of physics. The cost reason behind it has not changed though: every level you add multiplies the per-worker overhead, so deep nesting is rarely the cheap option even when it is allowed.)

What they can do now: background subagents run concurrently while you keep working. They run with the permissions already granted in the session and auto-deny anything that would otherwise prompt, so they execute without blocking your main thread. Parallel execution without the communication overhead of Tasks.

That constraint creates proper clarity. Your main Claude instance becomes an orchestrator, and subagents become specialists. The code reviewer doesn’t suddenly decide to refactor your entire codebase. It reviews code. That’s it. Does it need to do more? No.

The real power is consistency. Configure a subagent once, use it across every project. Community-shared security-auditor subagents demonstrate how standardized configurations can catch common OWASP Top 10 vulnerabilities consistently. Same configuration, same results, every time.

When to use which

Forget the theory. A practical framework based on documented patterns and community experience looks like this.

Use Tasks when:

You need to search without a target (“find all database connections” across 1,000 files)
Parallel reads dominate (reading 50 config files to build a dependency map)
Context isolation matters (analyzing competitor codebases without contamination)
It’s one-off work you’ll never need again
Speed matters more than token cost

Use subagents when:

Expertise requires consistency (code review with specific style guides)
Tool access needs restriction (reviewer can read, can’t write)
Workflows repeat predictably (every PR gets the same security check)
Teams need standardization (everyone uses the same test-writer agent)
Context persistence matters across tasks

Use neither when:

You’re working with 2-3 specific files. Stay in the main thread.
Simple sequential operations. Keep it in primary context.
Tasks need to communicate. Rethink your architecture.
You need nested parallelism. Write a bash script.

Is it really worth spending more than 30 seconds on this decision for most operations? Probably not. The performance difference is often negligible. The token cost difference isn’t.

[Figure: decision tree for choosing Tasks, subagents, or the main thread in Claude Code]

Real patterns worth stealing

These patterns come from documented use cases where speed and cost both matter.

The repository explorer pattern

When exploring a new codebase, everyone’s instinct is to spawn one Task per directory. Wrong move. Turns out, feature-based splitting works better:

# DON'T: One task per directory (fails on cross-references)
"Explore src/, tests/, docs/ using 3 parallel tasks"

# DO: Feature-based exploration
"Use 4 parallel tasks:
- Auth system: find all auth/login/session code
- Data models: locate all database schemas
- API endpoints: map all routes and handlers
- Test coverage: analyze test patterns"

Each Task hunts for a concept, not a location. This approach handles cross-directory dependencies that directory-based splitting misses.

The code review pipeline

This is where subagents dominate. A typical effective setup uses three specialized agents:

style-checker: Runs first, catches formatting and naming issues
security-reviewer: OWASP Top 10, credential scanning, injection vectors
test-validator: Ensures tests cover the changes

They run sequentially, not in parallel. Each writes results to a markdown file that the next one reads. No context pollution, no token explosion. The sequential workflow with file-based communication beats parallel execution for complex reviews. Old school, but it works.

The hybrid orchestration

For large refactoring, combine both:

Main thread identifies all affected files
Tasks (parallel) read current implementations
Subagent (architect) designs the refactoring approach
Tasks (parallel) implement changes in isolated files
Subagent (test-writer) creates integration tests
Main thread coordinates git operations

This pattern can cut refactoring time compared to sequential processing, though tokens typically increase 3-4x. Actually, “can” is doing heavy lifting there. Sometimes that trade-off is worth it.

Limitations that will catch you off guard

Both approaches have failure modes worth knowing before they bite you.

Task tool gotchas: No visibility into running Tasks. You fire off 10 parallel operations and then wait. No progress bars, no intermediate output, nothing until they all complete or time out. Users have been requesting better progress tracking in GitHub discussions for months.

Task results can be truncated. When a Task returns results from 100 files, you might only see summaries. Critical details like stack traces can get lost in the handoff.

No error recovery within Tasks. If Task 7 of 10 fails, the others continue, but Task 7 won’t retry or provide useful failure info. Generic “task failed” and nothing more.

Subagent surprises: Subagents can’t see each other’s work. You can’t have a designer agent pass specs directly to a coder agent. Everything routes through the main thread, adding latency and token overhead.

Configuration drift is real. That carefully tuned subagent from six months ago? Its behavior shifts subtly as Claude’s base model updates. Version control your agent configs and test them periodically.

The 20k token overhead isn’t negotiable. Even a subagent that reads one file and returns “LGTM” costs 20k tokens. Sort of absurd, when you think about it. For small tasks, staying in the main thread can be roughly 10x cheaper.
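How much cheaper depends on the size of the job. A rough Python sketch of the ratio, assuming the main thread's existing context is already paid for (a simplification) and using this post's 20k overhead figure:

```python
def delegation_cost_ratio(work_tokens, overhead=20_000):
    """How many times more a delegated run costs than doing the same
    work in the main thread, under the assumptions stated above."""
    if work_tokens <= 0:
        raise ValueError("work_tokens must be positive")
    return (overhead + work_tokens) / work_tokens

# Illustrative job sizes in tokens, from tiny to substantial.
for work in (2_000, 20_000, 100_000):
    print(work, round(delegation_cost_ratio(work), 1))
```

A 2k-token job comes out around 11x more expensive when delegated, while a 100k-token job barely notices the overhead. The penalty is a small-task problem.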

Three questions that replace every decision matrix

Stop optimizing for elegance. Optimize for getting work done.

Question 1: Will I run this exact operation again?

Yes. Create a subagent.
No. Continue to Question 2.

Question 2: Do I need to search or read more than 10 files?

Yes. Lean toward Tasks, then confirm with Question 3.
No. Stay in the main thread.

Question 3: Must operations share context?

Yes. Stay in the main thread.
No. Use Tasks if parallel, subagent if specialized.

Three questions. Five seconds.
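If you would rather have the three questions as code, a minimal Python sketch follows. It assumes Question 3 acts as a final check after a yes on Question 2, and the function and parameter names are invented for illustration.

```python
def choose_execution(repeatable, file_count, needs_shared_context, specialized=False):
    """Apply the three questions in order and return where the work should run."""
    # Question 1: will this exact operation run again?
    if repeatable:
        return "subagent"
    # Question 2: more than 10 files to search or read?
    if file_count <= 10:
        return "main thread"
    # Question 3: must the operations share context?
    if needs_shared_context:
        return "main thread"
    # Independent work: a specialist if expertise matters, otherwise parallel Tasks.
    return "subagent" if specialized else "tasks"

print(choose_execution(repeatable=False, file_count=500, needs_shared_context=False))
print(choose_execution(repeatable=False, file_count=3, needs_shared_context=False))
print(choose_execution(repeatable=True, file_count=3, needs_shared_context=True))
```

The order matters. Repeatability is checked first, so a recurring job becomes a subagent even when it only touches a handful of files.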

The teams that fail with Claude Code design elaborate multi-agent choreographies before writing a single line of code. It’s the same trap as AI readiness assessments that lie to you: over-engineering before understanding the actual constraints. The teams that succeed start simple, measure performance, then fix only the bottlenecks that actually matter.

Your token budget will thank you. Your deadlines will too. Most importantly, you’ll ship features instead of debugging agent communication protocols.

The real point isn’t choosing between Tasks and subagents. It’s recognizing that the main thread is still the best orchestrator Claude Code has. Everything else is a tool for moving faster when you know exactly what you need. When you layer this into a full project management system with persistent CLAUDE.md files and structured folder hierarchies, even non-code work benefits from the same parallel execution patterns.

June 2026 changes one line of this. The “write a bash script” advice grew an official answer: dynamic workflows let Claude write a JavaScript orchestration script that runs subagents in the background, 16 at once and as many as 1,000 across a run, and the ultracode setting makes that the default for big tasks. So the main thread is no longer the only orchestrator worth using; for work that splits into dozens of independent pieces, the script now holds the plan. The per-worker overhead arithmetic in this post still applies to every one of those agents, which is exactly why the bill scales the way it does.


About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding.

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.