The Smartest Model Is the Wrong Question
If it ends up in a folder, hand it to an agent. If it's just a thought, stay in chat. And when your name's on the result, run it through both.
Chat or Agent: When to Hand AI the Work, and When to Just Ask
TL;DR: We’ve crossed from asking AI questions to handing it real work. The skill that matters now isn’t picking the smartest model. It’s knowing when to work in a chat window, when to work in an agent that touches your actual files, and when to put two models on the same job so one does the work and the other checks it. The rule I use: if the output ends up in a folder, use an agent. If the output is a thought, stay in chat. When the stakes are high, use both.
The copy-paste shuffle you already know
Here’s a workflow a lot of us fell into without noticing.
You run a piece of analysis in one model. Say a research summary in Claude. The answer’s good, but you’ve learned not to trust the first pass on anything that matters. So you copy the whole thing, the original question and the output, and you paste it into a second model. ChatGPT, usually. And you ask the obvious question: what did this miss?
It finds something. You paste that back into the first model. Refine. Then back again. Four or five round trips, two browser tabs, a lot of copying. By the end you’ve got a better answer than either model gave you alone. You were the wire between them. The relay runner carrying the baton back and forth by hand.
I did that dance for the better part of a year. Then it stopped being necessary.
From asking to handing off
Two things happened at once, and they’re easy to miss if you only ever open AI in a browser tab.
The first is that these tools stopped suggesting and started doing. For two years, AI handed you text and you did something with it. Now the agent versions act on your real work. Claude Code and Anthropic’s new Cowork can read a folder on your machine, write the files, run the steps, even take over your screen when there’s no cleaner path. OpenAI’s Codex can copy a project into a sandbox, work on it for an hour while you do something else, and hand back the finished change. You stop describing what you want done. You hand off the doing.
The second is that the manual relay I described got automated. I’ve wired my Claude and Codex setups together so one can call the other without me in the middle. I run analysis in Claude, and in the same command it sends the work to Codex for a second read and brings the feedback back. I’ve connected Google’s Gemini the same way, through its new agent interface, so I can pull a third model in when a question is worth it. The copy-paste shuffle that used to eat an afternoon is now one instruction. Three models on one problem, no tabs, no baton-passing.
That’s not a niche trick. It’s where the whole market is going. Codex is running about four million developers a week. Claude Code passed two and a half billion dollars in annualized revenue, more than doubling since January. Anthropic says the quiet part out loud: 2025 was the year this changed how developers work, and 2026 is the year it changes knowledge work. One industry analyst put the shift cleanly. The bar moved from generating content to executing workflows, and the three big AI labs are all racing to own the part that does the work.
So the question for anyone running a firm isn’t really “which model is smartest” anymore. It’s “where should this work happen, and who checks it.”
The split that sorts most of it
I’ve landed on a rule that holds up across almost everything I do. Two buckets.
If the durable output belongs in a folder, use an agent. A client summary built from forty PDFs. A spreadsheet. A deck. An updated matter file or knowledge base. Code. Anything you’ll save, version, and come back to. This is what Claude Code, Codex, and Cowork are built for. They work against the real state of your files instead of handing you a copy to paste somewhere.
If the output is mostly a thought, stay in chat. Advice. A first draft. Brainstorming. Turning a pile of research into something you can act on. A quick comparison. The language for a client conversation before it ever becomes a document. Claude on the web and ChatGPT are faster and more fluid here, and more comfortable when you’re thinking out loud or shaping prose line by line. My early-morning writing happens in a chat window, not a terminal. That’s the right tool for that job.
And when the stakes are high, use both. One model implements, the other reviews. This is the automated handoff I described, and it’s where the real edge is. Draft a firm policy in one model, have a second one tear it apart for gaps before it reaches the committee. The two together catch what either one alone would wave through.
Here’s the concrete version for a firm. Need to turn a stack of discovery documents into a structured summary you’ll file and reuse? That’s folder work. Hand it to an agent. Trying to decide how to frame your AI policy to a roomful of skeptical partners? That’s a thought. Stay in chat and think it through. Writing the actual policy that will govern the firm? High stakes. Draft it in one model, break it in another, then have a human read every line.
Where this bites
None of this is free, and the risks are exactly the ones a law firm should care about.
More power means more blast radius. An agent that can write files can write the wrong ones. Cowork runs in an isolated space on your machine and asks before it deletes anything, which helps. But the screen-control piece is still an early research preview, and Anthropic itself says don’t point it at anything touching client finances or health records yet. These tools have already had real security holes. Cowork shipped with a data-leak bug that surfaced days after launch. Codex had a critical flaw where a poisoned code branch name could hand an attacker your access token. For most people that’s a footnote. For you it’s confidentiality and privilege. Treat agent access to client data as a decision the firm makes on purpose, not a default you back into.
Two models agreeing is not the same as two models being right. A second opinion shrinks your blind spots. It does not replace judgment. Models can be confidently wrong together, in stereo, and a tidy consensus can talk you out of the doubt you should have kept. The human still owns the call.
And not everything needs the heavy machinery. Agent sessions burn through usage far faster than a chat does, and they take longer to set up. Half the time you don’t need a sandbox and a review loop. You need an answer. Reaching for the workbench when a question would do is its own kind of waste.
What to do Monday morning
You don’t need to rewire anything to get value from this. You need to start sorting.
Split your recurring work into two piles: the work that ends as a file and the work that ends as a thought. Run the file work in Claude Code, Codex, or Cowork. Keep the thinking in a chat window. That one sort decides most of your tool choices for you.
Take one deliverable that matters, an AI policy, a client memo, a valuation summary, and run it through two models on purpose. One to draft, one to tear it apart. Then a human signs off. See what the second pass catches that the first one missed.
Before any agent touches real files, set the rails. Decide which folders it can write to, what it has to ask before deleting, and draw a hard line: nothing client-confidential or privileged goes near an agent until your firm has vetted exactly how it’s handled.
The part that’s still yours
We spent two years getting good at asking AI better questions. That skill still matters. But it’s not the whole game anymore.
The tools can do the work now. They can check each other’s work. You can put three models on one problem before breakfast. What none of them can do is decide what “done” means, or whose name is on the result when it leaves the building. That part didn’t get automated. It got more important.
The relay runner can finally put the baton down. The call at the finish line is still yours.



