· AI

CEO of Tallyfy · AI advisor at Blue Sheen for mid-size companies

How to budget tokens in Claude Code

A surprising Claude Code bill is almost never one big expense. It is four different cost shapes stacked up: a context window that bills every turn, subagents that each cost a fixed chunk, skills that cost almost nothing until used, and caching that can cut the recurring cost or not. Budgeting tokens means knowing the four shapes.

Key takeaways

  • Token spend has four shapes, not one - context, subagents, skills, and caching each cost differently
  • The context window is the recurring cost - every file you load is re-billed on every turn after
  • Budget per task, not per month - decide what a task is worth before you start it
  • Measure the real number - the context readout tells you where you stand mid-session

A Claude Code bill that surprises you is almost never one large, obvious expense. If it were, you would have seen it coming. It is small amounts, charged in four different ways, stacking up under a session that felt ordinary while you ran it.

That is why “use fewer tokens” is useless advice. Tokens are not one thing with one price. They are spent in four distinct shapes, and each shape behaves differently: one is recurring, one is a fixed lump, one is nearly free until it is not, and one is a discount you either earn or miss. You cannot budget a number you think of as flat. You can budget four shapes once you can tell them apart.

So this is not a post about spending less. It is a post about spending on purpose. Knowing the four shapes turns a vague unease about cost into a set of specific, answerable questions, and answerable questions are the whole of budgeting.

Where your tokens actually go

Open a long Claude Code session and the single biggest line item, almost always, is the context window. This is the cost shape people miss, because it does not feel like spending. It feels like the session just working.

Here is the mechanism. Claude Code’s context window holds the entire conversation: every message, every file Claude has read, every command output. And the model is re-sent that whole window on every turn. So a file you asked Claude to read on turn three is not paid for once on turn three. It is paid for again on turn four, and turn five, and every turn after, until the session ends or the context is compacted. A large file read early in a session is not a one-time purchase. It is a subscription you signed without noticing. This is why a session that did not feel expensive can be: nothing in it was a big move, but a dozen file reads each quietly re-billed thirty times adds up to the surprise. The first rule of token budgeting follows directly: what you load, you keep paying for. Load deliberately.

The four cost shapes

The context window is one shape. Three more sit alongside it, and a budget is just knowing which shape each action belongs to.

The context window is the recurring shape. It grows as you work and is re-billed every turn, so its cost is roughly the size of the window multiplied by how many turns remain. A subagent is the fixed-lump shape. Every subagent you spawn pays for a whole fresh context window and a delegation message to brief it; that overhead is real and it does not shrink, which is why spawning a subagent for a tiny task is poor budgeting. A skill is the nearly-free shape. The official skills documentation is exact about this: “a skill’s body loads only when it’s used, so long reference material costs almost nothing until you need it.” A skill sits in context as a short description and costs next to nothing until something invokes it. And caching is the discount shape. A cache read costs a fraction of sending the same tokens fresh, while a cache write costs a little more than a fresh send, and the cache lives only a few minutes before it expires. Caching does not lower your token count. It changes the price of the tokens you were going to send anyway, if you arrange your session to keep hitting the cache. The mechanics of that are their own subject, covered in LLM caching strategies.

The four shapes of Claude Code token spend: context window, subagents, skills, and caching

Budget by the task

Most people who try to control AI cost reach for a monthly number. A monthly cap tells you nothing useful in the moment, because by the time you have breached it the spending already happened. The unit of token budgeting is not the month. It is the task.

Before a task starts, decide what it is worth. Think in proportion rather than money, since a money figure would be meaningless here and would age badly anyway. Is this a task that deserves the whole context window filled with relevant files, several subagents, and an hour of back and forth? Or is it a task that deserves one file and a direct answer? Most tasks are the second kind and get treated like the first, because nobody set the proportion at the start. A task with a ceiling behaves differently from a task without one. You load less speculatively, you delegate only when the isolation earns its lump, you stop when the answer is good rather than when the model runs out of room. If you are setting these proportions for a whole team rather than just yourself, that is the kind of operating discipline Blue Sheen helps put in place. The point is small and it is the entire post: you cannot overspend on a task whose worth you decided in advance.

Watch the real number

You cannot budget what you cannot see, and Claude Code does show you. The context window has a visible usage readout, and a custom status line can keep token usage in front of you continuously while you work. The number people guess at is sitting on the screen.

Build the habit of glancing at it. When the context window is filling fast, that is information: something heavy went in, and it is now being re-billed on every turn. The readout is also how you catch the moment to act. Claude Code compacts the context automatically when it gets full, summarizing to free room, but you do not have to wait for that. Watching the number lets you compact on purpose between tasks, or clear the context when you switch to something unrelated, before the bloat has cost you twenty turns of re-billing. A budget you check is a budget. A budget you set and never look at is a wish. The difference is one glance, every few minutes, at a number that is already there.

Five habits that cut spend

Knowing the shapes is the theory. Five habits put it into practice, and none of them costs you any quality.

Load narrow. Ask for the specific file or the specific function, not the directory. Every file you pull in joins the recurring cost, so the cheapest token is the one you never loaded.

Clear context between unrelated tasks. When you switch from a bug fix to a documentation pass, the bug-fix context is now pure overhead, re-billed every turn for nothing. Resetting it is the single largest saving available to most sessions.

Delegate by the lump, not by reflex. A subagent costs a fixed chunk every time. Spend that chunk when the isolation is worth it, on a noisy, file-heavy side task, and not on something small you could have answered in the main session.

Move recurring instructions into skills or CLAUDE.md. Anything you paste into chat repeatedly is being paid for as fresh context every time. The same text as a skill costs almost nothing until invoked.

Keep sessions tight enough to cache. Caching only helps if you keep hitting the cache, and long idle gaps let it expire. Working in focused stretches is not just good for attention. It is what keeps the discount alive.

None of these is a sacrifice. Each one is the same move: stop paying, turn after turn, for tokens that are no longer doing any work. That is all token budgeting is: refusing to keep paying, turn after turn, for what you have stopped using.

About the Author

Amit Kothari is an experienced consultant, advisor, coach, and educator specializing in AI and operations for executives and their companies. With 25+ years of experience, he is the Co-Founder & CEO of Tallyfy® (raised $3.6m, the Workflow Made Easy® platform) and Partner at Blue Sheen, an AI advisory firm for mid-size companies. He helps companies identify, plan, and implement practical AI solutions that actually work. Originally British and now based in St. Louis, MO, Amit combines deep technical expertise with real-world business understanding. Read Amit's full bio →

Disclaimer: The content in this article represents personal opinions based on extensive research and practical experience. While every effort has been made to ensure accuracy through data analysis and source verification, this should not be considered professional advice. Always consult with qualified professionals for decisions specific to your situation.

Related Posts

View All Posts »
Claude Code effort mode and where it falls short

Claude Code effort mode and where it falls short

Claude Code effort mode looks like a cost dial: turn it down to spend fewer tokens. The official docs say otherwise. Effort is a behavioral signal, not a strict budget, so low effort does not reliably cut spend and can quietly raise it. Here are the five levels, where they stop, and how to set effort with intent.

The real cost of a large context window in Claude Code

The real cost of a large context window in Claude Code

A large context window in Claude Code feels free, and it is the opposite. Every token you load is re-billed on every turn after. The prompt cache that should make that cheap expires in five minutes, turning a tenth-price read into a higher-price write. And accuracy fades as the window fills. Here is the real cost.

How Claude Code scheduled jobs actually work

How Claude Code scheduled jobs actually work

Claude Code scheduled jobs come in three forms with very different guarantees: the in-session /loop, Desktop tasks, and Cloud routines. A missed run does not queue up a backlog. And despite a common belief, none of them creates a Windows Task Scheduler entry or a .bat file. Here is how each one actually behaves.

How Claude Code stop hooks work

How Claude Code stop hooks work

A Claude Code stop hook runs the moment Claude finishes a turn and can refuse to let it stop. It is the one hook that inverts control: Claude must pass your check before the turn ends. This post covers what a stop hook is, why exit code 2 is the whole game, the infinite-loop trap to avoid, and the patterns worth wiring up.

AI advisory services via Blue Sheen.
Contact me Follow 10k+