Your subagent failed. Now what?
It is a fair question, and it does not have a comfortable answer, because a subagent is the one part of Claude Code you cannot step inside. A failed function call leaves a stack trace. A failed subagent leaves a sentence. It did its work behind a closed door, and all you got back was a note slid underneath it.
That is not a flaw to fix. It is the design, and the design is the reason subagents are useful. A subagent runs in its own context window precisely so its forty file reads and its noisy tool output never reach yours. The isolation that keeps your main session clean is the same isolation that keeps you from watching the subagent work. You cannot have one without the other.
It helps to sit with that trade before you try to debug anything. People reach for subagents because the alternative is worse. Run a big search-and-summarize task inline and your main context fills with raw file contents, half of which you will never need again, and the useful thread of your work gets buried under noise. The subagent exists to absorb that noise on your behalf. So when it fails and you find yourself wishing you could see inside it, remember that the wish and the benefit are the same thing pointing in two directions. The closed door cost you visibility on this one task. It saved you a polluted context on every task. You agreed to that trade the moment you delegated.
So debugging a subagent is a different skill from debugging code. It is not about reading a trace. It is about reading a summary, recognizing the small set of ways isolation goes wrong, and, more than anything, designing subagents that report enough about themselves to be diagnosed at all. This post is about that skill.
The good news is that the skill is small. There is no large surface area to learn here, because a subagent only fails in a handful of ways, and they all sit close to the same boundary. Once you can name those ways, most failures stop looking mysterious and start looking like one of three or four familiar shapes. The work is learning to recognize the shape from a paragraph of text, then knowing what to change in response. Everything below is in service of that.
Why subagents fail in the dark
To debug a subagent you have to know what it can and cannot see, because almost every subagent failure traces back to that boundary. The subagents documentation is precise about it: each subagent “starts with a fresh, isolated context window. It does not see your conversation history, the skills you’ve already invoked, or the files Claude has already read.” Claude writes it a delegation message describing the task, and the subagent works from that message and whatever it can discover with its own tools. Nothing else.
Hold that picture, because it explains the opacity in full. The subagent cannot see your session, and your session cannot see the subagent. What crosses the boundary is exactly two things: a delegation message going in, and a summary coming out. Everything in between (the files it chose to read, the searches it ran, the dead ends it backed out of, the reasoning it used) happens in a context window that is discarded when the subagent finishes. There is no log to open afterward because the working context no longer exists. This is why a subagent failure feels so much worse than a normal bug. It is not that the information is hard to find. It is that most of it was never kept. Debugging in the dark means accepting that you are working from two artifacts, the brief you sent and the summary you got, and getting as much out of those two as they can give.
Compare it to a normal bug for a second, because the contrast is the whole point. When code throws an exception, the failure is a frozen moment you can return to. The stack is still there. The variables are still there. You can read the line that broke, walk back up the calls that led to it, print whatever state you want, and the program will sit patiently while you do. Debugging code is mostly an act of looking. The evidence waits for you. A subagent gives you none of that patience. By the time you read the summary, the run is over and the room it worked in has been emptied. You are not looking at a paused failure. You are reading a report of one. That single difference, evidence that waits versus evidence that is gone, is why the instincts you built debugging code do not carry over cleanly, and why a new instinct has to take their place.
It is worth being precise about why the working context cannot just be saved for you. The point of the isolation is that the subagent’s context never touches yours. If Claude Code kept that context around and handed it back so you could inspect it, it would have to live somewhere, and the natural somewhere is your session. At that point the noise you delegated the task to avoid is back in the room with you. The discard is not an oversight. It is the mechanism. The summary is the deliberate replacement for the working context: a small thing that fits in your session, in exchange for the large thing that would not. So when you wish the log still existed, you are wishing away the feature. The better move is to make the summary itself carry more, which is most of what the rest of this post is about.
Start with the summary
The summary is the only direct evidence you have, so read it like evidence, not like a status update. Most people skim it, see the word “done” or “failed,” and move on. The summary almost always says more than that if you slow down.
Ask three things of it. Did the subagent understand the task the way you meant it? A summary that describes solving a slightly different problem tells you the delegation message was ambiguous, and that is a fixable thing. Did it report what it could not do, as opposed to what it did? Phrases like “I was unable to locate” or “no matching files were found” are the subagent telling you it hit the edge of what it could see. And did it hand back a result or a description of a result? A subagent that says it “would” do something, or describes the change it “recommends,” often did not actually do it, and that gap is the bug. The summary is a small artifact, but it is a dense one. The failure is usually named in it, in plain language, and the only reason people miss it is that they expected a stack trace and got a paragraph instead. Read the paragraph.

Take that third question, because it is the one people miss most. There is a real difference between a subagent that did the work and a subagent that wrote a convincing description of the work. The description reads well. It is fluent, it is organized, it lists the steps in order, and if you are skimming for reassurance it gives you exactly that. But fluency is not evidence. A summary that says the change “should now” be in place, or that the file “can be updated” a certain way, or that something “is recommended,” is using the grammar of a plan, not a result. The bug hides in that grammar. Picture a subagent, asked to rename a function across a codebase, that hands back a tidy paragraph explaining which files contain the function and how each call site would change. It sounds like completed work. It is a description of work not yet done. The tell is verb tense and mood. Past and definite means it happened. Conditional or future means it did not, or the subagent is not sure, and either way you have found the failure.
There is a habit that makes this faster, and it costs nothing. Read the summary once for what it claims, then read it a second time only for the gaps. The first pass picks up the headline. The second pass, where you are reading against the grain and looking for the unfinished edge, is where the diagnosis usually lives. What did it carefully not say? Where did a confident sentence trail off into a softer one? A summary that opens by listing three things it accomplished and then spends its last line on a thing it “noted for follow-up” has told you, quietly, where it ran out of room or ran out of certainty. The end of a summary is often more informative than the start, because the start is what the subagent was proud of and the end is what it was still worried about.
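If you review many summaries, part of that second pass can be mechanized. What follows is a minimal sketch, assuming you have the summary as plain text: the function name flag_plan_grammar and its phrase list are hypothetical, drawn from the examples in this section rather than from any Claude Code feature, and a regex will miss phrasings a careful reader would catch. Treat a flagged sentence as a place to look, not a verdict.

```python
import re

# Hypothetical markers for the grammar of a plan rather than a result.
# Drawn from the phrases discussed above; extend them to match the
# wording your own subagents tend to use.
PLAN_MARKERS = (
    r"\bshould now\b",
    r"\bwould\b",
    r"\bcan be updated\b",
    r"\bis recommended\b",
    r"\brecommends?\b",
    r"\bnoted for follow-up\b",
)


def flag_plan_grammar(summary: str) -> list[str]:
    """Return the sentences of a summary that describe work rather than report it."""
    # Naive sentence split: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [
        s for s in sentences
        if any(re.search(p, s, re.IGNORECASE) for p in PLAN_MARKERS)
    ]
```

On a rename summary such as “I found the function in three files. Each call site would change to use the new name.” it returns only the second sentence, which is exactly the one describing work not yet done.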
The context-isolation failure
When the summary does not explain the failure on its own, the cause is almost always the same one, and naming it makes it easy to spot. The subagent could not see something it needed, because it started fresh.
This is the failure mode that catches everyone, because it is invisible from your seat. You can see the whole conversation, so the task feels fully specified. But the subagent only received the delegation message. If the task depended on something established earlier in your session, a decision you made, a file Claude already read, a convention you stated, the subagent never received it. It did not fail because it is weak. It failed because it was briefed badly, and it was briefed badly because the brief was written by a Claude that forgot the subagent cannot see what the main session sees.
The reason this one is so slippery deserves a moment. The mistake does not feel like a mistake while you are making it. When you write a request, you are sitting on top of the whole conversation, and that history is part of how the request reads to you. You say “fix the same issue in the other file” and to you that sentence is complete, because three messages ago you discussed what the issue was. The sentence carries all of that for you. It carries none of it for the subagent. The delegation message is not the conversation. It is a fresh page, and the subagent reads “the same issue” with no idea what “same” refers to. So it guesses, or it asks, or it does something adjacent and reports back. Then you read the summary, see work that does not match what you wanted, and your first thought is that the model is having a bad day. The model is fine. It answered the brief you actually sent, which is not the brief you thought you sent. The whole class of failure comes from the gap between those two, and the gap is invisible from your seat because your seat is the only one with the history.
Here is a way to test for it before you ever read a summary. Before you delegate, read your own delegation message as if you had just walked into the room and knew nothing else. Strip away everything you remember. Could a stranger do the task from those words alone? If the answer needs even one fact that is not on the page, that fact is missing, and the subagent is about to miss it too. This costs ten seconds and removes most of the surprises.
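Part of the stranger test can be automated too, in a crude way. The sketch below assumes your brief is available as a string before you delegate, and looks for phrases that only make sense to someone who saw the earlier conversation. The function missing_context_hints and its pattern list are illustrative inventions, not part of Claude Code; an empty result does not prove the brief is complete, only that it avoids the most obvious back-references.

```python
import re

# Hypothetical patterns for phrases that only make sense to a reader who saw
# the earlier conversation. A subagent starts fresh, so each hit is a fact
# that probably needs to be spelled out in the brief itself.
BACK_REFERENCES = (
    r"\bthe same (?:issue|bug|fix|change|approach)\b",
    r"\bas (?:we )?discussed\b",
    r"\bas before\b",
    r"\bthe other file\b",
    r"\bthat (?:file|function|decision)\b",
    r"\bour convention\b",
)


def missing_context_hints(brief: str) -> list[str]:
    """Return phrases in a delegation message that lean on unseen history."""
    hits = []
    for pattern in BACK_REFERENCES:
        hits.extend(m.group(0) for m in re.finditer(pattern, brief, re.IGNORECASE))
    return hits
```

Run on “Fix the same issue in the other file.”, it returns both “the same issue” and “the other file,” two facts that live only in your session.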
There is a related version worth knowing. A subagent inherits the parent’s permissions with tool restrictions on top, so a subagent scoped to read-only tools that is then asked to make a change cannot do it. The summary will usually say so, but if you skimmed it you will read the refusal as a model failure rather than a permissions one. This one is easy to misread because the subagent’s report sounds like it chose not to act. It says something like it did not make the change, and a quick reading hears reluctance. There was no reluctance. There was a wall. The subagent was handed a task that its tools physically could not perform, and it did the only correct thing, which was to stop and say so. Rewording the brief will not help, because the brief was not the problem. The scoping was. The fix is to widen the tools the subagent is allowed to use, or to accept that this task belongs to a differently scoped worker. Picture a subagent set up purely to audit code and report risks, then later asked to also apply the fixes it found. It will read the files happily and write nothing, and that is the configuration working, not breaking.
If you keep hitting these and want a second pair of eyes on how your team scopes and briefs its agents, my door is open. The fix for the whole category is the same: the subagent is only as good as its delegation message, so when one fails for no visible reason, suspect the brief before you suspect the subagent. Get into the habit of asking two questions in order. Did it have the facts? Did it have the tools? Almost every dark failure is one of those two, and they have different fixes, so it pays to know which you are looking at. A facts failure is solved by writing a fuller brief. A tools failure is solved by changing the subagent’s permissions. Reword a brief to fix a permissions problem and you will just watch the same wall get hit again. The deeper background on this boundary is in what a subagent is, and it is worth being fluent in it.
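The two questions can be written down as a small triage function, mostly to make their order explicit. This is a sketch under stated assumptions: the phrase list in FACT_SIGNALS is hypothetical, and you supply the needed set yourself from what the task requires. Read, Grep, Glob, and Edit are real Claude Code tool names, but comparing sets of them this way is this post’s illustration, not a built-in check.

```python
# Hypothetical phrases suggesting the subagent hit the edge of what it could
# see. Adjust to the wording your subagents actually use.
FACT_SIGNALS = ("unable to locate", "no matching files", "not clear which")


def diagnose(summary: str, granted: set[str], needed: set[str]) -> str:
    """Ask the two questions in order: did it have the facts, then the tools?"""
    text = summary.lower()
    # Question 1: facts. A missing fact is fixed by writing a fuller brief.
    if any(signal in text for signal in FACT_SIGNALS):
        return "facts: write a fuller brief"
    # Question 2: tools. A missing tool is fixed by changing permissions;
    # rewording the brief would just hit the same wall again.
    missing = needed - granted
    if missing:
        return "tools: grant " + ", ".join(sorted(missing))
    return "unclear: reread the summary, or consider going inline"


# Example: a read-only audit subagent asked to apply its own fixes.
read_only = {"Read", "Grep", "Glob"}
print(diagnose("I reviewed the files but did not make the change.",
               granted=read_only, needed={"Read", "Edit"}))
```

For the audit subagent described above, the summary comes back as a tools problem, which points you at the configuration rather than at the wording of the brief.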
Design subagents to be read
Reactive debugging only goes so far when the evidence is this thin. The real move is to stop relying on whatever summary you happen to get, and design subagents that report on themselves by default. A subagent you wrote for observability is one you can debug. A subagent you wrote and forgot is one you can only guess at.
This is the shift that changes everything else. Up to here, debugging has been something you do after the fact, squeezing a diagnosis out of a paragraph you did not get to specify. That is reactive work, and it is hard precisely because you are reading whatever the subagent felt like telling you. But you do not have to accept the default summary. A custom subagent has a system prompt, and that system prompt decides what kind of reporter the subagent is. Write a subagent that says little when it struggles and you have signed up for guesswork on every failure. Write one that is required to describe its own struggles and you have moved the diagnosis from after the failure to before it. The summary stops being a thing you interrogate and becomes a thing you designed. That is the difference between debugging a subagent and debugging a subagent you built to be debugged.
Two habits do most of the work. First, give a custom subagent an explicit return contract. In its system prompt, state exactly what its summary must contain: what it did, what it could not do and why, which files it changed, and what it would need to finish if it ran out of room. A subagent told to report its own boundaries will report them, and a vague failure becomes a specific one. Second, brief it as though it knows nothing, because it does. Put every fact the task depends on into the delegation message itself rather than trusting that context will carry over, since it will not. Most of what looks like a flaky subagent is really an under-specified one, and both fixes are about writing, not debugging. You are not fixing the subagent after it fails. You are building one that, when it fails, tells you why.
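Here is one way a return contract might look, with a checker you could run over the summary that comes back. It is a sketch, not a prescribed format: the section labels are my own wording of the four items above, RETURN_CONTRACT is text you would paste into the custom subagent’s system prompt, and missing_sections is a hypothetical helper that simply reports which labels never appeared.

```python
# Hypothetical return contract for a custom subagent's system prompt.
# The four labels paraphrase the items listed in the text.
REQUIRED_SECTIONS = (
    "What I did:",
    "What I could not do and why:",
    "Files changed:",
    "Needed to finish:",
)

RETURN_CONTRACT = (
    "When you finish, end with a summary containing these four sections, "
    "in this order, even if a section is empty:\n"
    + "\n".join(REQUIRED_SECTIONS)
)


def missing_sections(summary: str) -> list[str]:
    """Return the contract labels that never appear in a returned summary."""
    return [label for label in REQUIRED_SECTIONS if label not in summary]
```

A summary that lists what it did, what it could not do, and the files it changed, but says nothing about what remains, comes back flagged as missing “Needed to finish:”, which is the section that matters most when a run stops partway.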
The return contract earns its keep most when a run only half succeeds. A subagent that finishes cleanly does not need much of a report. A subagent that gets two-thirds of the way and runs short on room needs to tell you exactly where it stopped, what it had done so far, and what the next person, you or another subagent, would have to pick up. Without that instruction it tends to hand back something rounded-off and optimistic, because a tidy summary reads better than a frank one. With the instruction, the partial failure arrives already diagnosed. The contract is not bureaucracy. It is you deciding, in advance and once, what every future summary has to confess.
It is worth being plain about why the briefing habit matters as much as the contract. The return contract improves the report you get after a run. The full brief reduces the number of bad runs in the first place. They work on opposite ends of the problem, and you want both. Imagine two teams using the same subagent. One writes thin briefs and rich return contracts; it gets clear reports of frequent failures. The other writes full briefs and thin contracts; it fails less often but cannot tell why when it does. Neither is enough alone. The first team debugs well and often. The second team rarely needs to debug but is helpless when it must. The setup you want is full briefs and rich contracts together: fewer failures, and every one of them legible. One habit is about prevention. The other is about diagnosis. Skipping either leaves you doing more guesswork than the tool requires.
When to stop and go inline
The last skill is knowing when to stop. Subagent debugging has a point of diminishing returns that arrives faster than with ordinary code, because every diagnostic cycle is expensive: you adjust the brief, you spend a whole fresh context window running the subagent again, you read another summary. Two or three rounds of that and you have spent more than the isolation was ever going to save.
Sit with the arithmetic, because it is what makes the stopping rule firm rather than vague. Each retry is not a small step. It is a full run. You rewrite the delegation message, a whole context window is spent doing the task over, and you get back another paragraph that may or may not be clearer than the last one. That is not a cheap loop, and it does not always converge. Sometimes the second summary is no more revealing than the first, and you are no closer; you have just paid again. The reason to delegate in the first place was to save your own attention and keep your context clean. Once you are on the third rewrite, that saving is long gone. You are now spending more effort steering a worker you cannot see than the task would have cost you in plain sight. The trade has quietly inverted. The skill is noticing the moment it inverts and not pushing past it out of stubbornness.
A simple rule holds up well. Give a stubborn subagent two real attempts, maybe a third if each summary is visibly teaching you something new. If the summaries are improving, keep going; you are converging. If two rounds leave you with the same fog, stop. That is not failure on your part. It is the correct reading of a tool with a known limit. Pushing on to a fourth or fifth attempt rarely breaks the pattern, because if the first three summaries could not tell you what went wrong, a fourth written the same way will not either.
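If it helps to make the cutoff explicit, the rule fits in a few lines. In this sketch, which is an illustration rather than anything Claude Code enforces, the hypothetical list lessons records, for each run so far, whether its summary taught you something new. Deciding that is still a judgment you make by reading; the function only keeps you honest about the count.

```python
def should_retry(lessons: list[bool]) -> bool:
    """Decide whether to run a stubborn subagent again.

    lessons[i] is True if the summary from run i taught you something new
    about the failure. This encodes the rule of thumb, not a hard law.
    """
    attempts = len(lessons)
    if attempts < 2:
        # Always give it two real attempts.
        return True
    if attempts == 2:
        # A third only if both summaries were visibly teaching you something.
        return all(lessons)
    # After three runs, stop knocking and pull the work inline.
    return False
```

With [True, False] it returns False: two rounds, and the second added nothing, so the work comes inline.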
When you reach that point, the answer is to pull the work back into your main session and do it inline, where you can see every step as it happens. You lose the clean context, which is a real cost. You gain full visibility, which is exactly what you were missing. For a task that is being stubborn, that trade is correct. Think about what inline actually buys you. Every file read, every search, every small decision happens in front of you, and you can correct a wrong turn the instant it is taken instead of discovering it in a summary an entire run later. The feedback loop shrinks from one-run-long to one-step-long. For a task that has already resisted three briefs, that tight loop is worth more than the clean context you give up to get it.
Pulling back is also the right move if the work turns out to need broad context to begin with: that is the case for forking the conversation, which hands a worker your full history instead of a fresh window, or for not delegating at all. There is no shame in that retreat, and it is worth saying so plainly, because people treat going inline as an admission that they used the tool wrong. They did not. Delegation was a reasonable bet. The bet did not pay off on this particular task, and a worker you can watch is the right response to that, not a thing to feel sheepish about.
A subagent is a worker sent into another room. Most of the time that is the right call, and the closed door is a feature. But when the door has stayed closed through three failed attempts, stop knocking. Open it, bring the work into the room you are standing in, and watch it run.