Your 800-Document Review Can Now Run While You Eat Lunch
Two AI agents can now do the grind work your associates used to lose nights to. Whether your firm should let them is the harder question.
Codex Catches Up to Cowork. Two Real Agentic Tools, Not One.
You know the feeling. Partner drops a folder on your desk. Five hundred PDFs, give or take. Due Friday. Or it’s 4:47 on a Thursday and a new matter intake just hit the inbox and you’re looking at a conflict check that’s going to eat what’s left of the evening.
This is the kind of work AI is supposed to help with. But the chat window keeps choking on it.
That’s been changing for me over the last few months, first with Claude Cowork and now with ChatGPT Codex. I’ve been running both on the heavy work. Document-heavy review, multi-step analysis, anything that needs to touch hundreds of files or open a browser and sit there clicking for an hour. Cowork has been my workhorse for this kind of thing since the start of the year. The April Codex rebuild puts the two roughly on the same footing for agentic work, with a couple of architectural differences that matter in narrower lanes than the headlines suggest.
TL;DR
OpenAI rebuilt Codex on April 16. A week later they dropped GPT-5.5 into it. Codex now controls your Mac, runs its own browser, works across whole folders of files, and can chain together long tasks that would crash a normal chat window. If your firm is already running Claude Cowork, Codex isn’t a replacement. It’s a second serious agentic tool worth knowing about. The piece Codex has that Cowork doesn’t is a hosted browser inside the app itself, though that matters less for typical law-firm work than the marketing implies. Caveats: consumer ChatGPT is the wrong door for client work, computer use is Mac-only right now, pointing a browser agent at the big legal research vendors runs straight into their terms of service, and some of the new features are gated to specific plan tiers in ways you should check before promising your partners anything. Worth a serious pilot if you’re past the kicking-tires phase on either platform.
What actually changed
Two things landed almost on top of each other.
The April 16 update. OpenAI called it “Codex for (almost) everything,” which is marketing copy but turns out to be pretty accurate. Codex picked up computer use on Mac, where it can see, click, and type into your apps with its own cursor while you keep working in other windows. It got an in-app browser built on the Atlas tech, image generation, more than 90 plugins for things like Gmail and Calendar, and automations that can wake up across days or weeks to keep working on something.
GPT-5.5 dropped a week later. The pitch is that it’s better at multi-step tasks. Planning, using tools, checking its own work, carrying things through to completion without needing you to nudge it back on track every two prompts. Early users describe it as a research partner more than an answer engine. That matches what I’ve seen, mostly.
If you’ve been running Cowork, most of the desktop-agent picture should sound familiar. File access, plugins, multi-step automation, the ability to run for hours on something. Cowork has been doing all of that for a while. Both tools can also drive an external browser through computer use, and Cowork has the Claude in Chrome extension on top of that. So when I say Codex caught up, that’s what I mean. They’re in the same lane now.
The piece Codex has that Cowork doesn’t is its own hosted browser inside the app, where the agent can render pages and you can comment directly on a rendered element to tell it what to do. Useful for specific work, mostly frontend dev and review of pages that don’t need a login. Worth flagging: this in-app browser doesn’t handle authentication, cookies, browser extensions, or your existing tabs. Calling it a “browser” is technically right but the capability is narrower than the word implies. Took me longer than it should have to internalize that.
The bigger shift, taking both tools together, is harder to summarize neatly. AI agents are starting to do work, not just answer questions. A chat window is fine for “what does this clause mean.” Codex and Cowork are built for the next thing over: “go through this folder, tell me which contracts have non-standard indemnification, summarize the variations, and put it in a spreadsheet.”
Where this actually helps a law firm
Four real examples. None are demos. They’re things that take real billable hours and real associate sanity. Most work in either Codex or Cowork. I’ll flag the spots where the differences matter.
Reviewing a folder of 800 documents at once
You drop 800 PDFs into a project folder. You ask the agent to read every one, classify them by document type, pull out key dates and parties, flag anything that mentions a specific issue, and put the results in a spreadsheet with file paths so you can jump back to the source.
In a chat window this hits a wall. Context limits, file upload caps, you end up doing it in twenty rounds. In an agentic tool the agent reads the folder directly, walks through the documents in order, and writes results to a file as it goes. An 800-document review that used to take an associate a week of nights can run while you’re at lunch. You still review what comes back. Always. But the first pass is done by the time you finish your sandwich.
Probably where most firms will land first. The time savings are easy to see and the risk is easy to contain. Either tool handles this well.
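To make the mechanics concrete, here’s a rough sketch of the kind of first-pass script an agent writes and runs for this. The folder name and issue terms are placeholders I made up, and a real run swaps the crude keyword check for the model’s actual classification; what the sketch shows is the shape of the loop: walk the folder, pull text, flag, write results to a spreadsheet with file paths.

```python
# First-pass review sketch (illustrative, not either tool's internals).
# Walks a folder of PDFs, extracts text with pypdf, flags a target
# issue with a crude keyword check, and writes a CSV with file paths.
import csv
from pathlib import Path

from pypdf import PdfReader

ISSUE_TERMS = ("indemnif", "hold harmless")  # hypothetical issue flags

rows = [["file", "pages", "mentions_issue", "first_hit_page"]]
for pdf_path in sorted(Path("review_set").glob("**/*.pdf")):  # placeholder folder
    reader = PdfReader(pdf_path)
    hit_page = None
    for i, page in enumerate(reader.pages, start=1):
        text = (page.extract_text() or "").lower()
        if any(term in text for term in ISSUE_TERMS):
            hit_page = i
            break
    rows.append([str(pdf_path), len(reader.pages),
                 hit_page is not None, hit_page or ""])

with open("review_first_pass.csv", "w", newline="") as fh:
    csv.writer(fh).writerows(rows)
```

The column worth noticing is the file path: every flag links back to a source document, which is what makes the human second pass fast.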
Bates numbering without the night shift
The grind work. Stamping every page in a production set, tracking the ranges, building the index, organizing privileged versus produced versus withheld. Either tool can run a script across the folder, apply the numbers in sequence, generate the production log, and put a clean output folder together for transmission.
A 2,000-document set runs in about an hour. Clean log, no skipped numbers. A paralegal still verifies, but the verification is faster than the original work by a wide margin. Firms have been quietly paying overtime to get this kind of thing done for years.
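For a sense of what “run a script across the folder” looks like, here’s a minimal stamping sketch, assuming the pypdf and reportlab libraries and letter-sized pages. The Bates prefix, folder names, and stamp position are all placeholders; either agent would generate something along these lines and then run it.

```python
# Bates-stamping sketch: sequential numbers on every page of every PDF
# in a folder, plus a production log. Assumes letter-sized pages;
# the prefix and paths are hypothetical.
import csv
import io
from pathlib import Path

from pypdf import PdfReader, PdfWriter
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

PREFIX = "ABC"  # hypothetical Bates prefix
counter = 1

def make_stamp(label: str) -> PdfReader:
    """Render a one-page overlay PDF carrying the Bates label."""
    buf = io.BytesIO()
    c = canvas.Canvas(buf, pagesize=letter)
    c.setFont("Helvetica", 9)
    c.drawRightString(580, 20, label)  # bottom-right corner
    c.save()
    buf.seek(0)
    return PdfReader(buf)

log_rows = []
out_dir = Path("production")
out_dir.mkdir(exist_ok=True)

for pdf_path in sorted(Path("to_produce").glob("*.pdf")):
    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    first = counter
    for page in reader.pages:
        page.merge_page(make_stamp(f"{PREFIX}{counter:06d}").pages[0])
        writer.add_page(page)
        counter += 1
    with open(out_dir / pdf_path.name, "wb") as fh:
        writer.write(fh)
    log_rows.append([pdf_path.name, f"{PREFIX}{first:06d}",
                     f"{PREFIX}{counter - 1:06d}"])

with open(out_dir / "production_log.csv", "w", newline="") as fh:
    csv.writer(fh).writerows(
        [["file", "bates_start", "bates_end"], *log_rows])
```

The no-skipped-numbers property falls out of the single counter; that’s also the thing the paralegal’s verification pass spot-checks.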
Cite-checking a brief
The one that gets partners’ attention. Either tool can run a cite-check workflow by driving your real Chrome browser through computer use. The agent searches for the cases cited in a brief, checks whether each one exists, was quoted accurately, and is still good law, then builds you a memo with flags for the problems.
Codex’s in-app browser doesn’t help here, by the way, because it can’t handle logged-in pages. So on this specific workflow, Codex and Cowork are essentially even. I had originally assumed Codex would have the edge here. It doesn’t. The bigger question isn’t which agent does the work. It’s whether your subscription terms allow either of them to do it at all.
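The offline half of the workflow, pulling a checklist of citations out of the brief before any browser gets involved, is simple enough to sketch. The regex here is deliberately crude and the file name is hypothetical; it will miss plenty of citation formats, which is part of why the human verification step never goes away.

```python
# Crude citation-extraction sketch: pulls reporter-style cites out of
# a brief's text so there's a checklist to verify. Intentionally
# simple; real workflows use purpose-built citation parsers.
import re
from pathlib import Path

# Matches patterns like "550 U.S. 544" or "123 F.3d 456".
CITE_RE = re.compile(r"\b\d{1,4}\s+[A-Z][A-Za-z0-9.]*\.?(?:\s?[23]d)?\s+\d{1,4}\b")

text = Path("brief.txt").read_text()  # hypothetical extracted brief text
for cite in sorted(set(CITE_RE.findall(text))):
    print(f"[ ] verify: {cite}")
```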
A real caveat before anyone runs this on Westlaw or Lexis: do not point a browser agent at a paid legal research subscription just because you have login credentials. The published terms I reviewed prohibit automated, robotic, scripted, or software-mediated access without written vendor approval, and they separately restrict scraping, bulk downloading, storage, third-party access, and AI-related use of the materials. The technical capability is there. The contract permission usually is not. Use vendor-approved AI features, licensed APIs, or get written approval from the vendor and your firm before automating anything. For first-pass research, use public or lower-cost legal databases only where their terms actually permit that use.
That paragraph is doing a lot of work. Read it twice.
The intake automation nobody’s demoing
This is the sleeper. Either tool can be set up as an automation that runs every morning. It checks a shared inbox for new matter intake forms, pulls names, parties, and adverse parties out, runs a first-pass search across your firm’s existing client list and document store, drafts a conflict memo, and puts it in your queue with a “review needed” flag.
Twenty to forty minutes per matter. Multiplied by every new matter your firm opens. A 200-matter-per-month firm is looking at a hundred or so hours a month of associate time that doesn’t have to be spent on intake friction. The associate still signs off. That part doesn’t change.
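The most mechanical piece, matching incoming party names against the firm’s client list, looks roughly like this. The file names and the 0.85 cutoff are assumptions; the fuzzy matching is Python’s standard-library difflib, and a real setup would wire this into the morning automation rather than run it by hand.

```python
# First-pass conflict-screen sketch: flags near-matches between new
# parties and existing clients for human review. File names and the
# similarity cutoff are placeholders.
import csv
import difflib

def load_names(path: str) -> list[str]:
    with open(path, newline="") as fh:
        return [row[0].strip().lower() for row in csv.reader(fh) if row]

clients = load_names("client_list.csv")         # hypothetical firm export
new_parties = load_names("intake_parties.csv")  # pulled from the intake form

for party in new_parties:
    hits = difflib.get_close_matches(party, clients, n=3, cutoff=0.85)
    if hits:
        print(f"REVIEW NEEDED: {party!r} resembles existing client(s): {hits}")
    else:
        print(f"clear on first pass: {party!r}")
```

Fuzzy string matching is only a first screen. Name variants, subsidiaries, and d/b/a entities are exactly the kind of thing the associate sign-off exists to catch.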
How this changes the work for associates
Some of this is uncomfortable to talk about. The first-year work that has historically been how associates learn to think like lawyers is exactly the work agents are getting good at: document review, cite-checking, intake screening. This isn’t a Codex problem or a Cowork problem. It’s an agentic-AI-in-general problem.
I don’t think the answer is to stop using the tools. The answer is to get intentional about how junior lawyers learn judgment now. If the agent does the first pass, the associate has to do the harder, more interesting second pass. Deciding what’s actually important. Spotting what the agent missed. Explaining the result to the partner. That’s a real skill. It just isn’t the same skill we used to teach by volume.
If you’re a managing partner, this is a training problem you have to solve on purpose. It will not solve itself.
Risks worth raising with your tech committee
The big one is the contract you’re on. Consumer ChatGPT and the Business or Enterprise terms are not the same agreement. Consumer ChatGPT can train on your inputs unless you turn that off. Business and Enterprise have data protections by default. Same logic on the Anthropic side: consumer Claude and the Team or Enterprise Claude contracts are different documents. Whichever platform your firm uses, you need to be on the right contract for client work. Forget the price tier. The contract language is what matters.
There’s a related wrinkle worth surfacing. Some of the new features ship first on the consumer plans and reach the enterprise tiers later. Cowork’s computer use is currently a research preview on Pro and Max only, with Team and Enterprise not yet in. Codex memory and parts of the agentic stack are landing on Enterprise on a later schedule. So the plan that gives you the right contract may not currently be the plan with full feature access. That gap should resolve over time. Right now it means you sometimes can’t pilot on the same plan you’d want to deploy on, which is a less satisfying piece of advice than I’d like to give you.
Capability is uneven across platforms in other ways too. Computer use in Codex is Mac-only right now and rolling out to the EU and UK on a separate timeline. The Codex desktop app runs on Windows, but full computer use isn’t there yet on that side. Cowork has its own platform-specific quirks. Plan around them.
And these agents still make mistakes. They miss things, they occasionally hallucinate confidently, they sometimes generate output that looks right and isn’t. Whether you’re running Codex, Cowork, or both, the agent’s result on a document review is the start of the work, not the end. Build human review into your process and bill it that way.
What to do Monday morning
Check which plan tier of ChatGPT and Claude your firm is on. If anyone’s paying out of pocket on consumer accounts and using them for client work, that’s the first thing to fix.
If you’re already on Cowork, you don’t need to switch. Add Codex to the pilot list as a second tool, mostly because having two real options is healthier than depending on one.
Pick one document-heavy task that doesn’t involve client confidential data and run it through one of these tools. Something you’d hand off to a temp without thinking about it. Build the muscle on the low-stakes version first.
The work didn’t get easier. The math on who does it just changed.
Two real agentic tools make for a fundamentally different conversation than one. You can pilot, compare, and match the tool to the work without betting the firm on a single vendor’s roadmap. If you’re sorting out which one belongs where in your practice, or thinking through the contract and associate-training questions that come with either, send me a note at steve@intelligencebyintent.com. The math has changed. The decisions about how your firm adapts to it haven’t been made yet.


