You are free to do you. But you asked why others want the best model.
The answer, clearly, is agentic coding (i.e., multiple agents each cranking through tasks independently), which lets you ship A LOT more business value if used correctly.
And hey, don't get me wrong, you can get pretty far with just prompting. But the subtle misses and (I'm looking at you, GPT) the overengineered 20k-line PRs to do a simple thing are going to cost you a lot if you're not vigilant.
I don't think anyone is stopping you. This is an entirely valid way of working.
I for one am glad to leave that behind me. The sooner I never have to write another line of code the better (professional software engineer for nearly 30 years here, for context).
I am still struggling with how to deal with subagents and different roles for each model. I still think Claude or Codex are overall better models, but everything around them gives off such weird vibes, including (and this one kills me) that at certain times they feel dumbed down.
I keep changing these things often, but I have a basic Codex subscription (the $20 plan), which I use with GLM 5.2 to do some high-level planning of what I intend to do, and then I let DeepSeek do the coding. Or something along those lines.
Point is, GLM 5.2 is now at a point where I cannot tell you if it's better or worse. I can tell you one thing, however: no matter when I use it, it's consistent in what it does and how it works.
Then there is the Fable thing, but as with many things, I think the past has distorted reality. It lasted two days, but Anthropic said clearly that for plan users it would only be there for two weeks. It was great for doing what you can already do with other tools: all the planning, the reviews, and launching a million subagents talking to each other. I sometimes wonder if it was really a new model, or just Opus 4.9 wrapped in some fancy model-driven harness.
As for Fable: I used it as much as I could while we had it.
It was a step change over Opus for my work.
I've had no trouble getting the current generation of smaller models to do the same thing. Maybe it's more of a harness issue than a model issue?
Recently I've used both MiniMax M3 and DeepSeek V4 Flash to one-shot moderately complex applications from a written spec, and neither one got lost along the way.
Price and speed, for me. GLM 5.2 is "good enough" for some tasks, but rather slow (on their coding plan). In the time it takes GLM to "read files to figure out...", Gemini Flash is usually finished. It's not SotA for coding, but it's fast and often "good enough" for normal tasks.
For Flash 3.5?
I'm a big fan of Gemini 3.1 Flash Lite Preview (yes, that is the name...).
I keep an agentic SQL benchmark up to date to test new models. It's more or less saturated above 23/25, but below that it is still useful, and even at that level it's good for comparing speed, cost and token efficiency.
3.1 Flash Lite Preview scores 22/25 in 142 seconds for $0.02. That's a great result if you care about cost for performance.
3.5 Flash scores 20/25 in 367 seconds for $0.76. The slow time is because it uses a lot of tokens to generate its results: even if tokens are produced quickly, it takes too many of them to reach a correct result.
Nothing I've seen or heard suggests 3.5 Flash is better than these results indicate.
https://sql-benchmark.nicklothian.com/?highlight=google_gemi.... vs https://sql-benchmark.nicklothian.com/?highlight=google_gemi... (click the cells to see the traces)
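To make the cost-for-performance comparison concrete, here's a quick sketch that divides the cost and runtime quoted above by the number of tasks solved. The "per solved task" metric is my own framing, not something the benchmark itself reports, and the `results` dict and `per_solved` helper are hypothetical names for illustration.

```python
# Figures quoted above for the agentic SQL benchmark (25 tasks total).
results = {
    "3.1 Flash Lite Preview": {"score": 22, "seconds": 142, "cost_usd": 0.02},
    "3.5 Flash": {"score": 20, "seconds": 367, "cost_usd": 0.76},
}


def per_solved(r):
    """Hypothetical metric: return (USD per solved task, seconds per solved task)."""
    return r["cost_usd"] / r["score"], r["seconds"] / r["score"]


for name, r in results.items():
    cost, secs = per_solved(r)
    print(f"{name}: ${cost:.4f} and {secs:.1f} s per solved task")
```

By that rough measure, 3.5 Flash costs roughly 40 times as much per solved task ($0.038 vs about $0.0009) and takes nearly three times as long (about 18 s vs about 6.5 s).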