For a while there I had both Opus 4.6 and Codex access and I frequently pitted them against each other, I never once saw Opus come out ahead. Opus was good as a reviewer though, but as an implementer it just felt lazy compared to 5.4 xhigh.
One feature that I haven’t seen discussed that much is how codex has auto-review on tool runs. No longer are you a slave to all or nothing confirmations or endless bugging, it’s such a bad pattern.
Even in a week of heavy duty work and personal use I still haven’t been able to exhaust the usage on the $200 plan.
I’ll probably change my mind when (not IF) OpenAI rug pull, but for spring ‘26, codex is definitely the better deal.
The models and tools levelling out is great for users because the cost of switching is basically nil. I'm reading people ITT saying they signed up for a year - big mistake. A year is a decade right now.
Now with Opus 4.7 of course the “burden” of adjusting reasoning effort has been taken away from you even at the API level.
In my experience people don’t change the thinking level at all.
Codex is abysmal for UI design imo.
But if you go information architecture first and have that codified in some way (espescially if you already have the templates), then you can nudge any agent to go straight into CSS and it will produce something reasonable.
I still have their subscription, but am using pi now, mainly because something happened that made my opencode sessions unusable (cannot continue them, just blanks out, I assume something in the sqlite is fucked), and I cannot be bothered to debug it.
For what I use the agents, the Chinese models are enough
Plus I like being able to switch a model.
Anthropic models write much better code, they are easy to follow, reasonable and very close to what I would done if I had the time... OpenAI's on the other hand generate extremely complex solutions to the simplest problems.
I was so disappointed by non-Anthropic models, that for a couple of weeks I only used Anthropic models, but based on this thread, I'll go back and give it another try. It's good to go back and try things again every couple of weeks.
Of course, I was annoyed that they lobotomized 4.6, the difference was day and night, and Anthropic is certainly not a company I trust. In my opinion, it shows their willingness to rugpull, so I'm looking at other approaches. Since 4.7, things went back to normal, things you'd expect to work just work.
Had to stop because they don't like us proxying requests anymore.
Some negative signal for better overall view on things: I'm still with Anthropic and will probably stay with them for the foreseeable future.
I think after DoD/DoW shenanigans (which in of itself felt like a reasonable take on the part of Anthrpic) they got a bunch of visibility and new users, so them hitting some scaling limits is pretty much inevitable - so some service disruption is inevitable. Couple this with the tokenizer changes and seeming decrease in model performance (adaptive thinking etc.), and lots of people will be rightfully pissed off, alongside increased downtime (doesn't matter that much for me, definitely does matter for anything time-sensitive).
At the same time, in practice I've only seen it do stupid things across 8 million tokens about 5 times (confusing user/assistant roles, not reading files that should be obvious for a given use case, and picking trivially wrong/stupid solutions when planning things), alongside another 4 times that tests/my ProjectLint tool caught that I would have missed. The error rate is still arguably lower than mine, though I work in a very well known and represented domain (webdev with a bunch of DevOps and also some ML stuff, and integration with various APIs etc.).
At the same time, the 85 EUR they gave to me for free has been enough to weather the instability in regards to pricing changes and peak usage. They've fixed most of the issues I had with Claude Code (notably performance), and the sub-agent support is great and it's way better than OpenCode in my experience. They also keep shipping new features that are pretty nice, like Dispatch and Routines and Design, those features also seem nice and not like something completely misdirected, so that's nice. The Opus 4.7 model quality with high reasoning is actually pretty nice as well and works better than most of the other models I've tried (OpenAI ones are good, I just prefer Claude phrasing/language/approaches/the overall vibe, not even sure what I'd call it exactly, all the stuff in addition to the technical capabilities).
At the same time, if they mess too much with the 100 USD tier, I bet I could go to OpenAI or try out the GLM 5.1 subscription without too many issues. For now they're replacing all the other providers for me. Oh also I find the subscription vs API token-based payment approach annoying, but I guess that's how they make their money.
All these models and agents are shortcuts for all of us to be lazy and play games and watch YouTube or Netflix because we use them to work-less, well the party will be over soon.