Everybody wants AI to do their job. I’ve spent ten years building software that quietly bets the opposite way, and I think the people chasing the whole job are going to keep getting burned.
Here’s the distinction I keep coming back to. A task is one defined unit of work: draft this email, check this number against the contract, file this form. A job is a long chain of those tasks that adds up to an outcome, like onboarding a client or closing the books. AI is brilliant at the first kind and shaky at the second. And the reason isn’t that the models are thick. It’s multiplication.
Why AI needs one defined task
0.90^10 ≈ 0.35. A 10-step job run blind is a coin flip you lose.
Why does the whole job fall apart?
Because reliability compounds, and it compounds downward.
Say an agent does each task at 90% reliability. Respectable for one task. But the job only finishes if every task in the chain lands, so, treating each task as an independent shot, you multiply 0.9 by itself once per task. Three tasks and you’re near 73%. Ten tasks and you’re at about 35%. Twenty and you slide under 13%. The model never got worse at any single step. The chain ate the reliability.
That’s the whole problem in one line.
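You can also run the arithmetic backwards and ask how long a chain can get before it dips under a target. Here’s a minimal sketch, assuming every step succeeds independently at the same rate. The function names `chain_reliability` and `max_steps` are mine, made up for illustration.

```python
def chain_reliability(r: float, n: int) -> float:
    """Chance that all n steps land, assuming independent steps at rate r."""
    return r ** n


def max_steps(r: float, target: float) -> int:
    """Longest chain whose end-to-end reliability stays at or above target."""
    if not 0 < r < 1 or not 0 < target <= 1:
        raise ValueError("expected 0 < r < 1 and 0 < target <= 1")
    steps, p = 0, 1.0
    # Keep adding steps while one more step would still meet the target
    while p * r >= target:
        p *= r
        steps += 1
    return steps


print(f"10 steps at 90%: {chain_reliability(0.90, 10):.1%}")
print(f"steps before a 90% agent drops below even odds: {max_steps(0.90, 0.5)}")
print(f"steps before a 99% agent drops below even odds: {max_steps(0.99, 0.5)}")
```

At 90% per step you get six steps before the job is worse than a coin flip. Even at 99% per step the same threshold arrives at 68 steps, so a better model stretches the chain but doesn’t remove the ceiling.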
There’s a measurement from METR that stuck with me. Frontier models hit almost 100% on tasks a human could do in under four minutes, and under 10% on tasks that run past four hours. Short and defined, reliable. Long and sprawling, not yet, and maybe not for a while. The thing is, most real jobs are the long kind, which is exactly why aiming an agent at a whole job and walking away tends to end in a mess.
I felt this directly last year. I built an MCP server so AI agents could run Tallyfy tasks on their own. Hand an agent one well-scoped task and it’s a joy to watch. Ask it to carry an eight-step process end to end with nobody checking between steps, and it drifts. Same model. Different unit of work.
Watch it fall apart yourself
I don’t expect you to take my word for the arithmetic, so here’s a tiny simulation. It flips a weighted coin per task across a hundred thousand runs. The simulated numbers land right on top of the predicted ones, because the maths really is that boring.

import random

random.seed(42)
TRIALS = 100_000


def chain_success(n, r, trials=TRIALS):
    # Autonomous chain: the job succeeds only if all n tasks succeed
    wins = 0
    for _ in range(trials):
        if all(random.random() < r for _ in range(n)):
            wins += 1
    return wins / trials


def gated_success(n, r, attempts=3, trials=TRIALS):
    # Gated chain: each task gets up to `attempts` tries before the job fails
    wins = 0
    for _ in range(trials):
        ok = all(any(random.random() < r for _ in range(attempts)) for _ in range(n))
        wins += 1 if ok else 0
    return wins / trials


R = 0.90
for n in (1, 3, 5, 10, 20):
    print(f"{n:>2} tasks predicted {R**n:>6.1%} simulated {chain_success(n, R):>6.1%}")
print(f"gated 10 tasks: {gated_success(10, R):.1%}")

Download it and change the inputs. The chained number falls off a cliff. The gated number, where each task gets a couple of retries before the job is allowed to fail, holds near 99%. The model is identical in both. The only thing that changed is structure.
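The gated number doesn’t need a simulation either. Here’s a rough sketch of the closed form, under two simplifying assumptions: each retry is an independent draw at the same rate, and the check between tasks reliably catches a failure. `gated_reliability` is a name I’ve made up here.

```python
def gated_reliability(r: float, n: int, attempts: int) -> float:
    # A single task fails only if every one of its attempts fails
    per_task = 1 - (1 - r) ** attempts
    # The job still needs all n gated tasks to pass
    return per_task ** n


for attempts in (1, 2, 3):
    print(f"{attempts} attempt(s) per task: {gated_reliability(0.90, 10, attempts):.1%}")
```

With one attempt per task this is just the plain chain, about 35% for ten tasks. A single retry lifts it to about 90%, and two retries to about 99%. Real retries are rarely fully independent, so treat those figures as an upper bound on what gating buys you.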
What a task actually is
So the move is almost a no-brainer once you’ve seen the numbers: stop handing AI jobs, start handing it tasks.
Economists worked out that the task was the unit a long time ago. Acemoglu and Restrepo model automation as something that acts on specific tasks inside a role, never the whole role in one go. A job is a basket of tasks. Machines take some, people keep others, new ones appear. “Job” is an HR word. “Task” is the real economic unit, and it always was.
The engineers landed in the same spot. Anthropic’s guide to building agents calls the reliable pattern a workflow of predefined steps, and tells you to make each call an easier task on purpose. Smaller task, higher hit rate. Mind you, that isn’t a workaround. It’s the design.
This is the bet I made with Tallyfy ten years ago, before any of this was fashionable. The atom is a single defined task, with an owner, a deadline, and a check before the next one starts. Turns out that’s the exact shape AI needs to be useful. We didn’t build it for the agents. The agents just happen to need what a good process already had.
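To make that shape concrete, here’s a hypothetical sketch of a task with an owner, a deadline, and a check that gates the next step. This is not Tallyfy’s data model or API. `Task`, `Outcome`, `run_process`, and the onboarding example are all invented for illustration, and the deadline is only recorded, not enforced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, List, Optional


@dataclass
class Task:
    name: str
    owner: str                    # who is on the hook for this step
    deadline: date                # recorded here, not enforced by this sketch
    run: Callable[[Any], Any]     # the doer: a person, an agent, or a rule
    check: Callable[[Any], bool]  # gate that must pass before the next task


@dataclass
class Outcome:
    done: bool
    value: Any                    # last output that passed its check
    stopped_at: Optional[Task]    # the task whose check failed, if any


def run_process(tasks: List[Task], value: Any) -> Outcome:
    """Run tasks in order, halting at the first failed check."""
    for task in tasks:
        result = task.run(value)
        if not task.check(result):
            return Outcome(False, value, task)  # stop; the owner takes over
        value = result
    return Outcome(True, value, None)


# Hypothetical two-step onboarding; all names and figures are placeholders.
onboarding = [
    Task("draft welcome email", "agent", date(2030, 1, 10),
         run=lambda c: {**c, "email": f"Welcome, {c['name']}!"},
         check=lambda out: out["name"] in out["email"]),
    Task("check fee against contract", "finance lead", date(2030, 1, 12),
         run=lambda c: {**c, "fee_ok": c["quoted_fee"] == c["contract_fee"]},
         check=lambda out: out["fee_ok"]),
]

outcome = run_process(onboarding, {"name": "Acme", "quoted_fee": 1200, "contract_fee": 1500})
if outcome.stopped_at is not None:
    print(f"stopped at '{outcome.stopped_at.name}', owner: {outcome.stopped_at.owner}")
```

The point of the sketch is the halt. The fee mismatch stops the process at the second task and names its owner, instead of letting a bad number flow into every step after it.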
Where this leaves the job
I don’t think AI is coming for your job in one clean sweep. I think it’s going to swallow more and more of the tasks inside your job, and the roles that survive will be the ones that get good at defining and supervising those tasks. That’s a less dramatic story than the headlines sell, and a more demanding one.
The fix was never a cleverer model. It’s structure. Define each task. Track it. Gate it. Keep a human on the hook where it matters. Do that and a fragile chain turns into something that actually finishes, whether the doer is a person, an agent, or a rule.
If you want the longer, product-flavoured version of this argument, I wrote one over on Tallyfy. And if you just want to drag the sliders until the job collapses, the full calculator is here. Either way, the lesson is the same one I’ve been living with for a decade. Give AI a task. Never give it a job.



