For anything else, I can't stand them, and it genuinely feels like I am interacting with different models outside of Codex:
- They act like terribly arrogant agents. It's just in the way they talk: self-assured, assertive. They don't say they think something; they say it is so. They don't propose something; they say they're going to do it because it's right.
- If you counter them, their thinking traces are filled with what is virtually identical to: "I must control myself and speak plainly, this human is out of his fucking mind"
- They are slow. Measurably slow. Sonnet is so much faster: with Sonnet models I can read every token as it comes, though it takes some focus. With GPT, I can read the whole trace in real time without any effort. It genuinely gives off this "dumb machine that can't follow me" vibe.
- Paradoxically, even though they are so full of themselves, they insist on checking things that are obvious. They will say "The fix is to move this bit of code over there" (it isn't) and then immediately start looking at seemingly random files to check... what, exactly?
- I feel they make perhaps as many mistakes as Sonnet, but much less predictable ones. The kind that leaves me baffled. This doesn't have to be bad for code quality: Sonnet's mistakes _might_ at times even be _harder_ to catch, and so easier to let slip by. But it imprints a feeling of distrust in the model, which is counterproductive: it doesn't make me want to come back to it.
I didn't compare either with Gemini, because Gemini is a joke that "does" and never says what it is "doing", except when it leaves thinking traces in the middle of Python code comments. Love having "But wait, ..." in the middle of my codebase. A useless model.
I've recently started saying this:
- Anthropic models feel like someone of that level of intelligence thinking through problems and solving them. Sonnet is not Opus -- it is Sonnet-level intelligence, and it shows. It approaches problems in a sensible, reasonably predictable way.
- Gemini models feel like a cover for a bunch of inferior developers all cluelessly throwing shit at the wall to see what sticks -- yet, ultimately, only showing the final decision. Almost like you're paying a fraudulent agency that doesn't reveal its methods. The thinking is nonsensical and all over the place; it does eventually achieve some of its goals, but you can't make sense of what little it shows beyond "Running command X" and "Doing Y".
On a final note: when building agentic applications, I used to prefer GPT (a year ago), but I can't stand it now. Robotic, mechanical, constantly misusing tools. I reach for Sonnet/Opus if I want competence and adherence to the prompt, coupled with impeccable tool use. I reach for Gemini (mostly Flash models) if I want an acceptable experience at a fraction of the price and latency.
Oh I feel that. I sometimes ask ChatGPT for "a review, pull no punches" of something I'm writing, and my god, the answers really get on my nerves! (They do make some useful points sometimes, though.)
> On a final note: when building agentic applications, I used to prefer GPT (a year ago), but I can't stand it now. Robotic, mechanical, constantly misusing tools. I reach for Sonnet/Opus if I want competence and adherence to the prompt, coupled with impeccable tool use. I reach for Gemini (mostly Flash models) if I want an acceptable experience at a fraction of the price and latency.
Yeah, this has been almost exactly my experience as well.
For example, I pass it a list of database collections and tools to search through them, ask a question that can very obviously be answered with them, and it responds with "I can’t tell yet from your current records" (just tested with GPT 5.4-mini).
But I've prodded it a bit more now, and maybe the model doesn't want to answer unless it can be very, very confident in the answer it produces. So it's sort of a "soft refusal".
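For concreteness, here's a minimal sketch of the kind of setup I mean, assuming the standard OpenAI Python SDK tool-calling API; the collection names and the `search_records` tool are invented for illustration:

```python
# Minimal sketch of handing the model search tools over database collections.
# The search_records tool and collection names are hypothetical.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_records",  # hypothetical tool name
        "description": "Full-text search over a named database collection.",
        "parameters": {
            "type": "object",
            "properties": {
                "collection": {
                    "type": "string",
                    "enum": ["customers", "orders", "invoices"],  # made-up collections
                },
                "query": {"type": "string"},
            },
            "required": ["collection", "query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-5.4-mini",  # model name as reported above; substitute whatever you use
    messages=[
        {"role": "system", "content": "Answer using the search tools over the listed collections."},
        {"role": "user", "content": "Which customer placed the most orders last month?"},
    ],
    tools=tools,
)

message = response.choices[0].message
# The complaint: instead of emitting a tool call here, the model sometimes
# answers directly with "I can't tell yet from your current records".
print(message.tool_calls or message.content)
```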
Would you mind expanding on this? Do you mean in the resulting code? Or a security problem on your local machine?
I naively use models via our Copilot subscription for small coding tasks, but haven't gone too deep. So this kind of threat model is new to me.
I don't use OpenCode, but it looks like it triggered similar behavior. My message was similar, though not identical.