2 Comments
Ed Brandman

Steve, this is powerful stuff, but I don’t see any mention of the token costs associated with this intensity of work. Claude Enterprise is 100% token-based pricing now, and OpenAI is headed there. Burning tokens with agents is a black box, and an expensive one. How do you approach this issue?

Stephen Smith

Ed - really good question, and I should have addressed it in the piece.

I went and ran the math after reading your comment. For the 800-document review I described, here’s what it actually looks like.

A 30-page PDF runs about 20,000 tokens. So 800 documents is 16 million tokens of source material, plus the output the agent writes back. Add the overhead of an agentic workflow, where the agent re-reads, plans, and validates, and you’re looking at roughly 24 million tokens of total usage on a well-designed run.

At Sonnet 4.6 rates that lands around $75 to $80. With prompt caching and batch processing, both standard features, the same job comes in closer to $40.
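For anyone who wants to check the arithmetic, here is a minimal sketch of the token math. It uses only the figures quoted above and does not look up any price list: the "implied rate" is simply the quoted dollar figure divided by the quoted token total, so it is a blended number rather than an official input or output price. The variable and function names are illustrative.

```python
# Figures quoted in the comment above; nothing here comes from a price list.
DOCS = 800
TOKENS_PER_DOC = 20_000        # rough size of a 30-page PDF
TOTAL_TOKENS = 24_000_000      # quoted total, including agentic overhead

source_tokens = DOCS * TOKENS_PER_DOC
overhead_factor = TOTAL_TOKENS / source_tokens


def implied_rate_per_million(cost_usd: float, tokens: int = TOTAL_TOKENS) -> float:
    """Blended dollars per million tokens implied by a quoted total cost."""
    return cost_usd / (tokens / 1_000_000)


print(f"source tokens:   {source_tokens:,}")
print(f"overhead factor: {overhead_factor:.2f}x")
for quoted_cost in (75, 80, 40):
    rate = implied_rate_per_million(quoted_cost)
    print(f"${quoted_cost} total implies about ${rate:.2f} per million tokens")
```

The overhead factor is the knob to watch: a poorly designed run that re-reads documents more often pushes it well above the 1.5x assumed here, and the bill scales with it.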

Now compare that to the human first pass. 800 documents at five minutes each is about 67 hours of associate or paralegal time. Even at a fully loaded $100 an hour, that’s $6,700.

So the real comparison is something like $40 in tokens versus $6,700 in human time. The black box is real and the visibility needs to get better, no argument there. But on the actual matter economics, the token bill almost never moves the needle.
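The same comparison as a rough sketch, again using only the numbers from this comment. The five minutes per document and the $100 hourly rate are the assumptions stated above, and hours are rounded to the nearest whole hour the way the prose does; the names are illustrative.

```python
# Assumptions taken from the comment: 5 minutes per document, $100/hour fully loaded.
DOCS = 800
MINUTES_PER_DOC = 5
HOURLY_RATE_USD = 100
TOKEN_COST_USD = 40            # quoted cost with caching and batch processing

review_hours = round(DOCS * MINUTES_PER_DOC / 60)   # 4,000 minutes, rounded to whole hours
human_cost_usd = review_hours * HOURLY_RATE_USD
cost_ratio = human_cost_usd / TOKEN_COST_USD

print(f"human first pass: {review_hours} hours, ${human_cost_usd:,}")
print(f"token bill:       ${TOKEN_COST_USD}")
print(f"ratio:            about {cost_ratio:.0f}x")
```

Even if the token estimate is off by a factor of five or ten, the ratio stays comfortably in double digits, which is why the token bill rarely decides the matter economics.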

That being said, I think we collectively need a lot more conversation about straight labor cost comparisons with LLMs before we jump in and overhaul existing workflows.

Appreciate you pushing on it. It’s the right question.