Quick answers
Does low effort always cut my bill? No. Effort is a signal, not a cap. Claude still thinks hard on hard problems whatever you set.
Which level should I default to? High is the default. For Opus 4.8 coding work the docs suggest starting at xhigh; for cost-sensitive work, step down to medium.
Is max always best? No. Max can overthink simple tasks and adds real cost for small gains.
Effort mode looks like a cost dial. It has levels, the levels are named in a sensible order, and the low end is described as cheaper. So people use it like a thermostat: turn it down to spend less, turn it up when quality matters. That mental model is wrong, and using it produces bills that do not match expectations.
Here is the contrarian claim, and it is not mine, it is in Anthropic’s own documentation: effort is not a budget. It is a behavioral signal. The difference sounds small and it is the whole post. A budget is a cap; spend hits it and stops. A signal is a suggestion the model weighs against the actual difficulty of the task in front of it. Set effort low on a hard problem and Claude does not run out of room and quit. It thinks anyway, because the problem demands it.
So effort mode is real, it is useful, and it does not do the one thing most people reach for it to do. This is what it is, where the levels stop, and where treating it as a dial quietly costs you.
What effort mode is
The effort parameter controls how eager Claude is to spend tokens when it responds. It has five levels. From most conservative to most capable they are low, medium, high, xhigh, and max. The default is high, and the documentation notes that setting high is exactly the same as not setting effort at all.
What makes effort more than a thinking toggle is its reach. It does not only govern extended thinking. It affects every token in the response: the text, the explanations, and, above all, the tool calls. At lower effort Claude makes fewer tool calls and combines operations; at higher effort it makes more of them and explains its plan as it goes. That is why effort matters for agentic work specifically, where tool calls, not prose, are most of the spend.
Think about what a tool call actually is for a moment. When Claude reads a file, runs a search, or edits a line, that is a round trip. The call goes out, a result comes back, and the result becomes new context the model has to read on the next turn. So the cost of agentic work is not a flat number per task. It compounds. Each tool call adds to the pile of text the model carries forward, and a long agentic session can end with the model re-reading a great deal of accumulated output on every step. Effort sits on top of that mechanism. Turn it up and Claude is willing to spend more calls now, in exchange for a plan it trusts more. Turn it down and it tries to do the same job in fewer, broader strokes. Neither is automatically cheaper, because a session that takes more small careful steps can still finish lighter than one that takes a few large ones and gets them wrong.
The five names also matter less than people assume. low through max is an ordering, not a set of fixed dosages. There is no published table that says medium spends a certain count and high spends double. The level is a relative lean. It tells Claude where, on the spectrum between thrift and thoroughness, you want its instinct to start. Where it actually lands depends on the task, the model, and what the work turns up partway through. That is the first hint that the word “dial” is doing damage. A dial implies a measured output for a measured input. Effort gives you a direction, and then the problem itself fills in the amount.

You set it in three places depending on where you are working. Through the API it is a field, output_config: {effort: "medium"}. In Claude Code it lives in your configuration: the effortLevel setting, or the CLAUDE_CODE_EFFORT_LEVEL environment variable. I run Claude Code at max, which I will come back to, because the reason is not the obvious one. The level you choose is a real lever, one of the Claude Code settings worth tuning. It is just not the lever it is shaped like.
A signal, not a cost dial
This is the sentence to internalize, quoted directly from the effort documentation:
“Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher effort levels for the same problem.” — Claude effort documentation
Sit with the second sentence. At low effort, on a hard problem, Claude still thinks. It thinks less than it would at high, but it does not refuse to engage. There is no hard ceiling that, once hit, makes Claude hand back a half-answer. The effort level tilts the model’s judgment about how much a problem is worth; it does not overrule the problem.
That is the gap between effort and a budget, and it is the gap that breaks the thermostat mental model. A real token cap is predictable: you know the maximum because you set it. Effort is not predictable that way, on purpose. Set low and the spend on an easy task drops a lot, because the task did not need much and the signal told Claude not to reach. Set low and the spend on a hard task drops far less, because the task needed the thinking and the signal only nudged. The same setting produces two different savings depending on work you cannot fully see in advance. You are not setting a number. You are expressing a preference, and the model decides how far to honour it.
Why was it built this way? Because the alternative is worse. Picture a strict cap that did stop. You set low, the model hits the ceiling on a difficult problem, and it has to hand something back. What it hands back is a guess, or a partial fix, or a confident answer with a hole in it. A cap that fires mid-thought does not save you money. It moves the cost from tokens, which are cheap, to a broken result, which is expensive. So Anthropic made effort a lean rather than a wall. The model is allowed to overspend your preference when the work clearly needs it. That is a feature. It means a low setting can never quietly produce a wrong answer just because you wanted to be thrifty that day. The trade you accept in return is the loss of a clean number. You cannot point at the setting and predict the bill.
This is also why the word “save” is slippery here. Effort does not save tokens directly. It changes how readily the model reaches for them, and the result of that change depends on the task. Imagine two tickets in the same queue. One is a one-line config tweak. The other is a subtle bug spread across three files. You run both at low. The config tweak finishes cheap, because it was always going to be cheap and the signal stopped Claude from gold-plating it. The bug finishes only a little cheaper than it would have at high, because the model still had to do most of the thinking the bug demanded. Same setting, two outcomes. If you judged the setting by the first ticket you would conclude low is a reliable saving. The second ticket tells the truth. The saving was never the setting. It was the difficulty of the work, and the setting only chose how hard to lean against it.
Where the levels stop
The second thing the dial model gets wrong is assuming all five levels are always available. They are not, and the restrictions are specific.
low, medium, and high work across every model that supports effort. The two top levels are gated. xhigh, described as extended capability for long-horizon work, is available on Claude Opus 4.8 and Opus 4.7. max is broader but still not universal: it runs on Opus 4.8, Opus 4.7, Opus 4.6, Sonnet 4.6, and the Mythos Preview, but not on Opus 4.5. So a script that pins xhigh is a script that only runs on the top Opus models, and that coupling is easy to forget until you switch to a model without it and the setting silently means something else.
The failure mode here is quiet, which is what makes it worth a paragraph. Say you tune a workflow on Opus 4.7, settle on xhigh, and check it into a shared config. Months later someone changes the model for an unrelated reason, or a default shifts under you. The xhigh line does not error. It does not warn. It just stops being honoured the way you tuned it, because the new model does not offer that level, and now your carefully chosen setting means something other than what you chose. Nothing in the run output shouts about it. You only find out when the quality drifts and you go looking. The lesson is small and worth keeping: the effort level and the model are a pair. If you pin one of the gated levels, pin the model alongside it in the same place, so the coupling is visible to the next person instead of hidden.
There is a subtler limit, and it matters more in daily use. Models do not all honour effort with the same strictness. The documentation is direct that Claude Opus 4.7 “respects effort levels more strictly than Claude Opus 4.6, especially at low and medium.” Read that as a warning about portability. The exact same medium setting produces noticeably different behavior on two models from the same family. Effort is not a fixed contract you sign once. It is a signal each model interprets slightly differently, and a setting tuned on one model is a setting you re-test on the next. If you are standardizing Claude Code settings across a team and want them to behave the same for everyone, reach out and I am glad to help work it through.
Strictness cuts both ways, and it is worth being clear about the direction. A model that respects low more strictly will lean harder away from spending on easy work, which is the behavior you wanted when you set low. But it also means the floor is lower. If that same model gets a task you misjudged as easy, the stricter reading of low gives it less room to recover than a looser model would have. The reverse holds for the older, looser model: it forgives a wrong setting more, because it interprets low as more of a hint than an instruction. So upgrading the model can change the result of an effort level you have not touched. The number in your config did not move. The model’s reading of it did, and that is enough to shift behavior on the tasks that sit near the edge of the level you picked.
When low effort costs more
Now the failure that the dial model walks people straight into. They want to save money, so they set effort low and leave it there. Sometimes that saves money. Sometimes it costs more than high would have, and the mechanism is worth understanding.
Effort lowered on a hard task does not make the task easy. It makes Claude reach less far for an answer that the task still requires. The result is a thinner first attempt: a shallower fix, a missed edge case, a tool call skipped that should have happened. You then notice the problem and correct it, which is another turn, more context, more tokens. Two or three of those correction loops and you have spent more than a single high-effort pass would have cost, and you have spent your own attention on top. The official guidance says it plainly: if you see shallow reasoning on a complex problem, raise effort rather than prompting around it. Cheap-looking work that has to be redone is not cheap.
Walk through the arithmetic of a correction loop, because it is worse than it first looks. The cost is not just the second pass. When you point out what the first attempt missed, the model re-reads the original task, your correction, the code it already changed, and whatever it said the first time. All of that is context, all of it is paid for again, and the pile only grows with each round. So the second pass is more expensive than the first, the third more than the second, and a low setting that triggered the loop has not capped your spend. It has set a slow leak running. The single high-effort pass you skipped would have read the task once and done the work once. The thrifty-looking choice replaced one clean reading with a sequence of compounding ones.
There is a cost the bill never shows, and it can be the bigger one. Every correction loop pulls you back in. You read the thin first attempt, you work out what is wrong with it, you write the correction, you wait, you check again. That is your time and your focus, spent supervising a task you set to low precisely so you could stop supervising it. Picture a developer who batches ten small tasks overnight at low to save a few tokens. If three of them come back shallow, the morning is not free. It is three rounds of diagnosis before the real work starts. The setting that was supposed to buy back attention quietly spent it. Token cost is visible and recoverable. Attention cost is neither, and a low setting on work that needed more is one of the easier ways to lose it.
The mirror-image mistake is treating max as a free upgrade. It is not. The documentation warns that on most workloads max adds real cost for relatively small quality gains, and on some structured or less demanding tasks it can lead to overthinking, where the model labours a simple thing into something worse. So the dial is wrong at both ends. Low is not a guaranteed saving and high is not a guaranteed improvement. Effort rewards matching, not maximizing.
Overthinking deserves a closer look, because it is the failure people expect least. The intuition is that more reasoning can only help. It does not always. Give a model a plain, well-shaped task and a strong push to think hard, and it can start inventing difficulty that is not there. It second-guesses a clean approach, weighs options the task never needed weighed, or adds handling for cases that cannot arise. The output gets longer and, on a simple job, sometimes worse, because the simplest correct answer was the right one and the model talked itself past it. So max on easy work fails twice over. You pay more tokens, and you can get a result you then have to simplify back down. That is the symmetry worth holding onto. A wrong effort setting costs you in the same way at both ends, just by a different route: too low and the model under-reaches, too high and it over-reaches, and only a setting matched to the actual task avoids both.
Setting effort with intent
If effort is not a dial you set once, what do you do with it? You match it to the shape of the task, and you accept that the match is a judgment, not a formula.
The documentation gives sound starting points. For Claude Opus 4.8 on coding and agentic work, begin at xhigh, and use high as the floor for anything intelligence-sensitive. Step down to medium for cost-sensitive workflows. Drop to low for simple, high-volume work such as classification or quick lookups, and for subagents doing narrow jobs. Reserve max for frontier problems where your own testing shows headroom above xhigh. The pattern underneath all of that is the same: effort should track difficulty, and difficulty is something you assess per task, not once per quarter.
Notice the shape of that advice. It does not give you one setting. It gives you a way to read the task and pick. The classification-and-lookup case is a good one to sit with, because it is where low earns its place cleanly. Work like that is high volume and low stakes per item. The thinking each item needs is small, the cost of a thin answer on any single item is small, and you are running thousands of them. There, a low setting is doing exactly its job: it stops the model gold-plating work that does not reward gold-plating, and the saving is real because it repeats across the whole batch. The subagent case follows the same logic. A subagent sent to do one narrow, well-scoped job does not need the reach you would give the main task. Match the effort to the size of the job the subagent was handed, not to the size of the project it sits inside.
The harder skill is judging the task in the first place, and it is worth being clear that you will get it wrong sometimes. A ticket can look like a one-liner and turn out to touch three systems. A request can read as deep and reduce to a rename. You cannot always tell from the outside, and effort asks you to guess before the work begins. So the practical move is not to find the perfect setting. It is to watch what comes back and adjust. If a low-effort run hands you a shallow answer on something that was not actually shallow, that is information: raise the level and rerun, rather than firing off corrections at the thin result. If a max-effort run is grinding a plain task into a long-winded one, that is information too, in the other direction. The skill is not perfect prediction. It is reading the first result carefully and re-setting the level when it is wrong, which is the same loop the documentation points at when it says to raise effort on shallow reasoning instead of prompting around it.
So why do I run Claude Code at max? The reason is not that more is always better. For the work I do in it, deep reasoning over a whole codebase, the cost of a shallow answer that has to be unwound is reliably higher than the cost of the extra tokens, so the match, for my tasks, lands at the top. That is a decision about my work, not a universal setting, and that is the real point of effort mode. It is not a slider you push toward “save money.” It is a small, plain admission, made per task, about how hard the thing in front of you actually is. Treat it as a token-budgeting tool rather than a token-budget, set it deliberately, and re-set it when the task or the model changes. The dial was never the thing. The judgment behind it was.





