Quick answers
Does low effort always cut my bill? No. Effort is a signal, not a cap. Claude still thinks hard on hard problems whatever you set.
Which level should I default to? High is the default. For Opus 4.7 coding work the docs suggest starting at xhigh; for cost-sensitive work, step down to medium.
Is max always best? No. Max can overthink simple tasks and adds real cost for small gains.
Effort mode looks like a cost dial. It has levels, the levels are named in a sensible order, and the low end is described as cheaper. So people use it like a thermostat: turn it down to spend less, turn it up when quality matters. That mental model is wrong, and using it produces bills that do not match expectations.
Here is the contrarian claim, and it is not mine, it is in Anthropic’s own documentation: effort is not a budget. It is a behavioral signal. The difference sounds small and it is the whole post. A budget is a cap; spend hits it and stops. A signal is a suggestion the model weighs against the actual difficulty of the task in front of it. Set effort low on a hard problem and Claude does not run out of room and quit. It thinks anyway, because the problem demands it.
So effort mode is real, it is useful, and it does not do the one thing most people reach for it to do. This is what it is, where the levels stop, and where treating it as a dial quietly costs you.
What effort mode is
The effort parameter controls how eager Claude is to spend tokens when it responds. It has five levels. From most conservative to most capable they are low, medium, high, xhigh, and max. The default is high, and the documentation notes that setting high is exactly the same as not setting effort at all.
What makes effort more than a thinking toggle is its reach. It does not only govern extended thinking. It affects every token in the response: the text, the explanations, and, above all, the tool calls. At lower effort Claude makes fewer tool calls and combines operations; at higher effort it makes more of them and explains its plan as it goes. That is why effort matters for agentic work specifically, where tool calls, not prose, are most of the spend.
You set it in three places depending on where you are working. Through the API it is a field, output_config: {effort: "medium"}. In Claude Code it lives in your configuration: the effortLevel setting, or the CLAUDE_CODE_EFFORT_LEVEL environment variable. I run Claude Code at max, which I will come back to, because the reason is not the obvious one. The level you choose is a real lever, one of the Claude Code settings worth tuning. It is just not the lever it is shaped like.
A signal, not a cost dial
This is the sentence to internalize, quoted directly from the effort documentation:
“Effort is a behavioral signal, not a strict token budget. At lower effort levels, Claude will still think on sufficiently difficult problems, but it will think less than it would at higher effort levels for the same problem.” — Claude effort documentation
Sit with the second sentence. At low effort, on a hard problem, Claude still thinks. It thinks less than it would at high, but it does not refuse to engage. There is no hard ceiling that, once hit, makes Claude hand back a half-answer. The effort level tilts the model’s judgment about how much a problem is worth; it does not overrule the problem.
That is the gap between effort and a budget, and it is the gap that breaks the thermostat mental model. A real token cap is predictable: you know the maximum because you set it. Effort is not predictable that way, on purpose. Set low and the spend on an easy task drops a lot, because the task did not need much and the signal told Claude not to reach. Set low and the spend on a hard task drops far less, because the task needed the thinking and the signal only nudged. The same setting produces two different savings depending on work you cannot fully see in advance. You are not setting a number. You are expressing a preference, and the model decides how far to honour it.
Where the levels stop
The second thing the dial model gets wrong is assuming all five levels are always available. They are not, and the restrictions are specific.
low, medium, and high work across every model that supports effort. The two top levels are gated. xhigh, described as extended capability for long-horizon work, is available on Claude Opus 4.7 only. max is broader but still not universal: it runs on Opus 4.7, Opus 4.6, Sonnet 4.6, and the Mythos Preview, but not on Opus 4.5. So a script that pins xhigh is a script that only runs on one model, and that coupling is easy to forget until you switch models and the setting silently means something else.
There is a subtler limit, and it matters more in daily use. Models do not all honour effort with the same strictness. The documentation is direct that Claude Opus 4.7 “respects effort levels more strictly than Claude Opus 4.6, especially at low and medium.” Read that as a warning about portability. The exact same medium setting produces noticeably different behavior on two models from the same family. Effort is not a fixed contract you sign once. It is a signal each model interprets slightly differently, and a setting tuned on one model is a setting you re-test on the next. If you are standardizing Claude Code settings across a team and want them to behave the same for everyone, reach out and I am glad to help work it through.
When low effort costs more
Now the failure that the dial model walks people straight into. They want to save money, so they set effort low and leave it there. Sometimes that saves money. Sometimes it costs more than high would have, and the mechanism is worth understanding.
Effort lowered on a hard task does not make the task easy. It makes Claude reach less far for an answer that the task still requires. The result is a thinner first attempt: a shallower fix, a missed edge case, a tool call skipped that should have happened. You then notice the problem and correct it, which is another turn, more context, more tokens. Two or three of those correction loops and you have spent more than a single high-effort pass would have cost, and you have spent your own attention on top. The official guidance for Opus 4.7 says it plainly: if you see shallow reasoning on a complex problem, raise effort rather than prompting around it. Cheap-looking work that has to be redone is not cheap.
The mirror-image mistake is treating max as a free upgrade. It is not. The documentation warns that on most workloads max adds real cost for relatively small quality gains, and on some structured or less demanding tasks it can lead to overthinking, where the model labours a simple thing into something worse. So the dial is wrong at both ends. Low is not a guaranteed saving and high is not a guaranteed improvement. Effort rewards matching, not maximizing.
Setting effort with intent
If effort is not a dial you set once, what do you do with it? You match it to the shape of the task, and you accept that the match is a judgment, not a formula.
The documentation gives sound starting points. For Claude Opus 4.7 on coding and agentic work, begin at xhigh, and use high as the floor for anything intelligence-sensitive. Step down to medium for cost-sensitive workflows. Drop to low for simple, high-volume work such as classification or quick lookups, and for subagents doing narrow jobs. Reserve max for frontier problems where your own testing shows headroom above xhigh. The pattern underneath all of that is the same: effort should track difficulty, and difficulty is something you assess per task, not once per quarter.
So why do I run Claude Code at max? The reason is not that more is always better. For the work I do in it, deep reasoning over a whole codebase, the cost of a shallow answer that has to be unwound is reliably higher than the cost of the extra tokens, so the match, for my tasks, lands at the top. That is a decision about my work, not a universal setting, and that is the real point of effort mode. It is not a slider you push toward “save money.” It is a small, plain admission, made per task, about how hard the thing in front of you actually is. Treat it as a token-budgeting tool rather than a token-budget, set it deliberately, and re-set it when the task or the model changes. The dial was never the thing. The judgment behind it was.



