The Grunt Work Was Never Just Grunt Work, and You're About to Lose It
Automate the work associates learned judgment from, and you trade a cost problem for a training problem that stays hidden for ten years.
When AI Does the Grunt Work: What Anthropic’s Self-Improvement Report Means for Law Firms
TL;DR: Earlier this week, Anthropic published its own internal numbers on how much of its engineering AI now does. More than 80% of the code it ships is written by Claude. The average engineer pushes 8x the code they did in 2024. The part worth your time isn’t the AI. It’s what broke once execution got cheap. Human review turned into the bottleneck, and judgment turned into the whole job. Your firm runs on the same shape, and Anthropic hit the wall first.
Anthropic is your firm, three years early
You’ve run the math on your own shop. A few partners at the top, a wider band of associates beneath them, and the whole thing holds together because the people at the bottom do the hours and the people at the top sell the judgment. That ratio is the business. Move it and everything moves with it: profit per partner, what you can bill, who you hire, how you train them.
Now open Anthropic’s new report on what they call recursive self-improvement, and walk right past the science-fiction headline. What’s underneath is plainer and more useful to you. Anthropic is a pyramid too. Smart people at the top, a mountain of execution underneath. And they just showed us, with their own numbers, what happens to that structure when the execution underneath it mostly disappears.
The grunt work, gone
Start with one engineer, this past April. He pointed Claude at a backlog of errors and let it run. It shipped more than 800 fixes and cut that whole class of error by a factor of a thousand. How long would a person have taken? His estimate was four years. Not because the work was hard. Because it was slow, scattered, and no human can hold that much unfamiliar detail in their head at once.
That’s not a stray win. Across the company, Claude went from writing a few percent of the code in early 2025 to more than 80% by May 2026. The average engineer now pushes roughly 8x the code per day they did in 2024. Anthropic is straight about the weakness in that number: lines of code rewards quantity over quality, so the real gain is smaller. Fine. The direction is the point.
And the direction is the lesson. The work you bill by the hour, first-pass research, the standard memo, document review, the cleanup nobody wants, is the work that gets cheap first. Not eventually. It already happened somewhere, and somebody wrote down the numbers.
Then everyone’s waiting on the same few people
Here’s the part most coverage skips, and the part that should hold a managing partner’s attention.
When the doing gets cheap, the doing stops being the constraint. Something else takes its place. At Anthropic it was review. They say it flatly: once Claude writes most of the code, getting a human to read and approve that code becomes the thing that holds up everything behind it. Your reviewers can only go so fast. Speed up everything in front of them and you haven’t fixed the line, you’ve just moved where it backs up.
You’ve watched a smaller version of this already. Technology-assisted review pulled first-pass document review away from junior associates years ago. The work didn’t vanish. It climbed. Someone senior now validates what the model flagged, defends how it was done, signs the certification. The doing got cheap. Standing behind it never did.
Anthropic is living the bigger version now. They run an automated reviewer over every code change before it merges, and they found it would have caught about a third of the bugs behind past production incidents, misses by engineers who are among the best in the world at this. Their newest model, out at the end of May, is sold partly on being far less likely to let its own errors through, so teams can lean on human review a little less. Sit with that. The product roadmap is now openly about relieving the review bottleneck, because the bottleneck is real and it’s expensive.
For you, that bottleneck already has a name: professional responsibility. You can’t put unreviewed work into a client’s matter. The duty to supervise doesn’t relax because the first draft came from a machine. So the wall Anthropic hit on speed, you hit on speed and ethics at the same time, and the second one doesn’t move.
A hundred people doing the work of a thousand
The report’s other claim is the one I’d put on the whiteboard. A 100-person company, they argue, increasingly does the work of a 1,000-person one, because each person sits on top of a stack of agents handling the execution.
Run that through a firm and the questions get uncomfortable fast. If a small team produces what used to take a big one, how big should your associate class actually be? What are you selling, once the hours collapse? And the one I keep coming back to: where will your associates learn judgment, if not by grinding through the execution that’s now automated?
That’s the quiet crisis in all of this. The grunt work was never only grunt work. It was the apprenticeship. Pull it out and you’ve solved a cost problem while handing yourself a much harder training problem, one that won’t show up on the P&L for a decade.
Where I’d push back on all this
A few honest caveats, because this report has a tell.
Anthropic is grading Anthropic. A company that sells AI, telling you how good AI is at its own work, on a metric it admits flatters the result. Take the trend seriously. Take the exact multiples with salt.
And software isn’t law. Code comes with tests that tell you in seconds whether it runs. A settlement posture or a regulatory read has no green light that flips on when you’re right. The execution that automates first in your shop is the most mechanical part, and the judgment that resists it is a far bigger slice of legal work than it is of writing code. That’s a real cushion. Don’t mistake it for a wall.
The trend could also bend. Anthropic says so themselves: the curve might flatten, the hard part of judgment might not yield to more compute. They don’t believe it. But they said it, which beats most vendors.
One last thing, since it’s in the headlines. Anthropic is not calling for a pause on AI. They’re asking for the ability to coordinate one later, the machinery that would let several labs in several countries stop together and check that the others actually stopped. They’re blunt that one company pausing alone does almost nothing but hand the lead to someone less careful. If someone tells you “even Anthropic wants to stop,” they’ve got it backward.
What to do Monday morning
Three moves. None of them needs a technology budget.
Find your review bottleneck before you buy a single tool. Name who reviews AI-assisted work, count the hours they really have, and be honest about whether they can keep up. Speeding up the front of the line while the back stays fixed just builds a faster pile-up.
Rebuild the apprenticeship on purpose. If associates used to earn their judgment by doing the work a machine now does, design a path to that judgment deliberately. It won’t happen by osmosis anymore.
Trace where your fees attach. Split your work by how much is execution and how much is judgment, then ask what’s left worth charging for when execution runs close to free, and where your name, and your liability, actually sit.
The bottom of the pyramid is hollowing out, and that part isn't really up for debate anymore. What's still open is whether you redesign the top on your own terms, before the cheap execution does it for you. If you want a second set of eyes on where your firm sits, reach me at steve@intelligencebyintent.com. The firms that move first won't be the ones with the biggest tech budget. They'll be the ones who saw the shape early.


