Another difference is the stochastic nature of LLMs. With table saws, CNC machines, and modern 3D printers, you more or less know what you're getting out. With LLMs there's a whole element of chance: sometimes the output is plainly incorrect, sometimes it's exactly what you were thinking, but when you hit the jackpot and get the nugget of info that elegantly solves the problem, you get the rush. Then you start the whole bikeshedding of your prompt/models/parameters to try and hit the jackpot again.
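(The chance aspect isn't mysterious, by the way: at any temperature above zero, the model samples the next token from a probability distribution instead of always taking the top-scoring one. A toy sketch in Python, with made-up logits and no real model behind it:)

    # Toy illustration of temperature sampling; the logits are invented.
    import numpy as np

    rng = np.random.default_rng()
    logits = np.array([2.0, 1.5, 0.3])   # model's scores for tokens A, B, C
    temperature = 0.8                    # > 0 means sampling, not argmax

    probs = np.exp(logits / temperature)
    probs /= probs.sum()                 # softmax over temperature-scaled logits

    for _ in range(3):
        # Same "prompt", same "model", potentially a different token each run.
        print(rng.choice(["A", "B", "C"], p=probs))

Lowering the temperature pushes this toward determinism, which is exactly the knob all that parameter bikeshedding is fiddling with.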
But it's also a tool that can save you time.
Absolutely, I don't even understand why you're asking. Humans are creatures of habit, and habit often tips, a bit or entirely, into outright addiction in one of its many forms.
This scenario obviously doesn't apply to folks who run their own benchmarks with the same inputs across models. I'm just describing a possible, unintentional human behavioral bias.
Even if this isn't the root cause, humans are really bad at perceiving reality. Like, really, really bad. LLMs are also really difficult to measure objectively. I'm sure the coupling of these two facts plays a part, possibly a significant one, in our perception of LLM quality over time.
I've cancelled my subscriptions to both Codex and Claude and am going to go back to writing my own code.
When the merry-go-round of cheap high quality inference truly ends, I don't want to be caught out.
"I think we can postpone this to phase 2 and start with the basics".
Meanwhile it burns more tokens making a silly plan to divide the tasks among those phases, with complicated analysis of dependency chains, deliverables, all that jazz. All unprompted.
And it does seem likely to me that there were intermittent bugs in adaptive reasoning, based on posts here by Boris.
So all told, in this case it seems correct to say that Opus has been very flaky in its reasoning performance.
I think both of these changes were made in good faith and were reasonable in isolation, i.e. most users don't need high-effort reasoning. But the users who do need it really notice the difference.
We aren't superstitious; you're just ignorant.
Don't use these technologies if you can't recognize this, just as a person shouldn't gamble unless they concretely understand that the house has a statistical edge and that they will lose if they play long enough. You will lose if you play with LLMs long enough too; they are statistical machines, just like casino games.
For a lot of people, if not all, this stuff is bad for your brain.
Someday maybe they will converge into approximately the same thing, but then training will stop making economic sense (why spend millions to get ~the same thing?).