I'm not accusing anyone specifically, but I've noticed Chinese bots swamping certain YouTube channels that, for example, cover US defense industry news. They'll downplay any and all technical advances, play up China's dominance, US cowardice, etc. All very transparent. I suspect some of the online conversation about open Chinese models is driven by that. How often do you see people talking about Mistral or Trinity? Never. Because they don't play that game.
openai, google and anthropic subscriptions don't come with a privacy option.
looking at the link, it's interesting that going from cursor cli to codex cli takes gpt 5.5 from 7th to 3rd. but they didn't run any open models in codex.
so it's hard to say for sure that it's a model benchmark. maybe open models are just shit in swe agent harnesses... it's not the most parsimonious explanation though.
Unless you're running it locally, aren't you just trusting some other entity?
it's not a recommendation, it's an option. if you don't have capital then it doesn't apply to you, so move on. it wasn't an option even for people with capital.
come back in a few years when it's more accessible
additionally, I like that there are providers with special-purpose processors for faster tokens/sec, all with different pricing strategies
so just pick something that matches your personal risk tolerance
however, the legal terms are different: openai reads your data. they store it for 30 days, but of course once it hits the disk it can be kept as long as you like, as in a civil case like nyt v openai.
the same goes for google and anthropic. so it's not always nice that someone is paid to read your data for safety. people upload sensitive material, personal videos and so on.
i wouldn't prioritise it myself, but you should also know that the data will all come out in discovery if you end up in a legal dispute. maybe that's not important, but people thought it mattered enough to give some protections to patient records, legal advice and therapy. you upload that to gpt and it goes into discovery.
Fable 5 is cool and all, but we have not yet seen GPT-5.6.
It's easily 4x the cost of DeepSeek V4 but I didn't actually feel the results were that much better. I had GPT 5.5 in Codex review it after it was done and there was plenty of slop to go around.
Having better luck with MiniMax M3, from a cost/benefit standpoint.
With a good harness, that's my favorite model for any personal project. I use Opus 4.8 at work because I don't have to pay for it and of course I love it, but DeepSeek is like 80% there for one tenth of the price.
GPT can find fault in everything and anything including its own work.
Code is somewhat artistic. If you don't have well-defined standards and priorities, the AI review cycle can spiral infinitely, figuratively debating what makes art good, and your code will be no better for it.
This makes it slower to work with for prototyping, and it will, if not properly disciplined, litter your code with "legacy adapters" and "bridge code" and temporary incremental refactoring steps [arguably not terrible for work in real commercial software projects]. And it will create too many unit & integration tests, if you're not careful.
But it does, in my opinion, tend to produce more reliable software and I trust it far more than I did when I was working in Claude.
When I could afford it, I had both plans running: Claude to produce new features, and then Codex to brutally critique it, battle-test it, sharpen the edges, and produce better tests. This flow went extremely well.
Now I just work with Codex and various open models.
Somehow it's just way more careful than the others, and also much better at empirical verification of its hypotheses, writing tests, etc. I am assuming a lot of RL was done on that kind of flow, and on seeking out negative cases, failure points, race conditions.