Key takeaways
- Token spend has four shapes, not one - context, subagents, skills, and caching each cost differently
- The context window is the recurring cost - every file you load is re-billed on every turn after
- Budget per task, not per month - decide what a task is worth before you start it
- Measure the real number - the context readout tells you where you stand mid-session
A Claude Code bill that surprises you is almost never one large, obvious expense. If it were, you would have seen it coming. It is small amounts, charged in four different ways, stacking up under a session that felt ordinary while you ran it.
That is why “use fewer tokens” is useless advice. Tokens are not one thing with one price. They are spent in four distinct shapes, and each shape behaves differently: one is recurring, one is a fixed lump, one is nearly free until it is not, and one is a discount you either earn or miss. You cannot budget a number you think of as flat. You can budget four shapes once you can tell them apart.
So this is not a post about spending less. It is a post about spending on purpose. Knowing the four shapes turns a vague unease about cost into a set of specific, answerable questions, and answerable questions are the whole of budgeting.
Where your tokens actually go
Open a long Claude Code session and the single biggest line item, almost always, is the context window. This is the cost shape people miss, because it does not feel like spending. It feels like the session just working.
Here is the mechanism. Claude Code’s context window holds the entire conversation: every message, every file Claude has read, every command output. And the model is re-sent that whole window on every turn. So a file you asked Claude to read on turn three is not paid for once on turn three. It is paid for again on turn four, and turn five, and every turn after, until the session ends or the context is compacted. A large file read early in a session is not a one-time purchase. It is a subscription you signed without noticing. This is why a session that did not feel expensive can be: nothing in it was a big move, but a dozen file reads each quietly re-billed thirty times adds up to the surprise. The first rule of token budgeting follows directly: what you load, you keep paying for. Load deliberately.
The four cost shapes
The context window is one shape. Three more sit alongside it, and a budget is just knowing which shape each action belongs to.
The context window is the recurring shape. It grows as you work and is re-billed every turn, so its cost is roughly the size of the window multiplied by how many turns remain. A subagent is the fixed-lump shape. Every subagent you spawn pays for a whole fresh context window and a delegation message to brief it; that overhead is real and it does not shrink, which is why spawning a subagent for a tiny task is poor budgeting. A skill is the nearly-free shape. The official skills documentation is exact about this: “a skill’s body loads only when it’s used, so long reference material costs almost nothing until you need it.” A skill sits in context as a short description and costs next to nothing until something invokes it. And caching is the discount shape. A cache read costs a fraction of sending the same tokens fresh, while a cache write costs a little more than a fresh send, and the cache lives only a few minutes before it expires. Caching does not lower your token count. It changes the price of the tokens you were going to send anyway, if you arrange your session to keep hitting the cache. The mechanics of that are their own subject, covered in LLM caching strategies.
Budget by the task
Most people who try to control AI cost reach for a monthly number. A monthly cap tells you nothing useful in the moment, because by the time you have breached it the spending already happened. The unit of token budgeting is not the month. It is the task.
Before a task starts, decide what it is worth. Think in proportion rather than money, since a money figure would be meaningless here and would age badly anyway. Is this a task that deserves the whole context window filled with relevant files, several subagents, and an hour of back and forth? Or is it a task that deserves one file and a direct answer? Most tasks are the second kind and get treated like the first, because nobody set the proportion at the start. A task with a ceiling behaves differently from a task without one. You load less speculatively, you delegate only when the isolation earns its lump, you stop when the answer is good rather than when the model runs out of room. If you are setting these proportions for a whole team rather than just yourself, that is the kind of operating discipline Blue Sheen helps put in place. The point is small and it is the entire post: you cannot overspend on a task whose worth you decided in advance.
Watch the real number
You cannot budget what you cannot see, and Claude Code does show you. The context window has a visible usage readout, and a custom status line can keep token usage in front of you continuously while you work. The number people guess at is sitting on the screen.
Build the habit of glancing at it. When the context window is filling fast, that is information: something heavy went in, and it is now being re-billed on every turn. The readout is also how you catch the moment to act. Claude Code compacts the context automatically when it gets full, summarizing to free room, but you do not have to wait for that. Watching the number lets you compact on purpose between tasks, or clear the context when you switch to something unrelated, before the bloat has cost you twenty turns of re-billing. A budget you check is a budget. A budget you set and never look at is a wish. The difference is one glance, every few minutes, at a number that is already there.
Five habits that cut spend
Knowing the shapes is the theory. Five habits put it into practice, and none of them costs you any quality.
Load narrow. Ask for the specific file or the specific function, not the directory. Every file you pull in joins the recurring cost, so the cheapest token is the one you never loaded.
Clear context between unrelated tasks. When you switch from a bug fix to a documentation pass, the bug-fix context is now pure overhead, re-billed every turn for nothing. Resetting it is the single largest saving available to most sessions.
Delegate by the lump, not by reflex. A subagent costs a fixed chunk every time. Spend that chunk when the isolation is worth it, on a noisy, file-heavy side task, and not on something small you could have answered in the main session.
Move recurring instructions into skills or CLAUDE.md. Anything you paste into chat repeatedly is being paid for as fresh context every time. The same text as a skill costs almost nothing until invoked.
Keep sessions tight enough to cache. Caching only helps if you keep hitting the cache, and long idle gaps let it expire. Working in focused stretches is not just good for attention. It is what keeps the discount alive.
None of these is a sacrifice. Each one is the same move: stop paying, turn after turn, for tokens that are no longer doing any work. That is all token budgeting is: refusing to keep paying, turn after turn, for what you have stopped using.



