by jjice 21 hours ago

by SwellJoe 20 hours ago

The thing is, Opus 4.5 is where the model reached Good Enough, at least for a wide variety of problems I use LLMs for. Before that, I almost never thought it was a more productive use of my time to use AI for development tasks, because it would always hallucinate something that would waste a bunch of my time. It just wasn't a good trade.

But, if for some reason everything stopped at Opus 4.5 level and we never got a better model (and 4.6/4.7 are better, if only marginally so and mostly expanding the kind of work it can do rather than making it better at making web apps), we could still do a lot of real work real fast with Opus 4.5, and software development would never go back to everyone handwriting most of the code.

A model as good as Opus 4.5 (or slightly better, according to the mostly easily gamed benchmarks) at a tenth of the price is probably a worthwhile proposition for a lot of people. $100 a month, or more, to get Opus 4.7 is well worth it for a Western developer: the time the lower-end models waste is far more expensive than the cost of using the most expensive models. For the foreseeable future, I'll keep paying a premium for the models that waste less of my time and produce better results with less prodding.

But, also, it's wild how fast things move. Open models you can run on relatively modest hardware are competitive with frontier models of two years ago. I mean, you can run Qwen 3.6 MoE 35B A3B or the larger Gemma 4 models on normal hardware, like a beefy MacBook or a Strix Halo or any recentish 24GB/32GB GPU, none of which is much more expensive than the average developer laptop of pre-AI times. And they can write code. They can write decent prose (Qwen is maybe better at code, Gemma definitely has better prose), they can use tools, and they have a big enough context window for real work. They aren't as good as Opus 4.5, yet.

Anyway, I use several models at this point, for security and code reviews, even if Claude Code with Opus is still obviously the best option for most software development tasks. I'll give Qwen a try, too. I like their small models, which punch well above their weight, so I'll probably like the big one, too.

by Someone1234 21 hours ago

If money is no object, then nothing else is worth considering if it isn't Codex 5.4/Opus 4.7/SOTA. But for many to most people, value vs. relative quality are huge levers.

Even many people on a Claude subscription aren't choosing, or aren't able to choose, Opus 4.7 because of those cost/usage pressures. They often use Sonnet or an older Opus because of the value vs. quality curve.

by dd8601fn 21 hours ago

Also us weirdos with local model uses. But your point stands.

by seplite 21 hours ago

Unfortunately, like with the release of Qwen3.6-Plus, this model also isn’t released for local use. From the linked article: “Qwen3.6-Max-Preview is the hosted proprietary model available via Alibaba Cloud Model Studio”

by zozbot234 21 hours ago

The Max series was never available for local use, though. So this is expected.

by dd8601fn 17 hours ago

Sure, not Plus or Max. I just use their lesser MoE ones locally (that would never come close to massive SOTA models) all the time.

by CamperBob2 21 hours ago

Cost may or may not be a factor in my choice of model, but knowing the capabilities and knowing they will remain consistent, reliable, and available over time is always a dominant consideration. Lately, Anthropic in particular has not been great at that.

by jpfromlondon 20 hours ago

anecdotally the quality of output isn't significantly different, the speed seems to be what you're really paying for, and since the alternative is free I'll stick to local.

by paprikanotfound 19 hours ago

What are the best models to run locally?

by jpfromlondon 4 hours ago

right now Gemma 4 and Qwen 3.6, I've found the latter to have the slight edge but your results may vary.

by elAhmo 19 hours ago

Codex 5.4 is not out?

by wahnfrieden 21 hours ago

Codex subscription is very generous at pro tiers

by oidar 21 hours ago

Opus 4.6 performance has been so wildly inconsistent over the past couple of months, why waste the tokens?

by vidarh 20 hours ago

When Sonnet 4.6 was released, I switched my default from Opus to Sonnet because it was about on par with Opus 4.5. While 4.6 and 4.7 are "better", the leap is too small for most tasks for me to need it, and so reducing cost is now a valid reason to stay at that level.

If even cheaper models start reaching that level (GLM 5.1 is also close enough that I'm using it a lot), that's a big deal, and a totally valid reason to compare against Opus 4.5.

by jasonjmcghee 20 hours ago

Wow I couldn't disagree more.

For me, Opus 4.5 and 4.6 feel so different compared to Sonnet.

Maybe I'm lazy or something, but Sonnet is much worse in my experience at inferring intent correctly if I've left any ambiguity.

That effect is super compounding.

by hirako2000 21 hours ago

You compare with what's most comparable.

In any case, a benchmark provided by the provider is always biased: they will pick the frameworks where their model fares well and omit the others.

Independent benchmarks are the go-to.

by culi 19 hours ago

Opus 4.6 was released in February. It can take quite some time to run all these benchmarks properly.

by alex_young 21 hours ago

Quite some time is a little over 2 months. I understand this is actually true right now, but it’s still a bit hard to accept.

by cute_boi 19 hours ago

Comparing it with Opus 4.6 is difficult, since Anthropic may ban accounts and accuse users of state-sponsored hacking.

by bluegatty 20 hours ago

I think it's only been like 10 weeks. I mean, that's forever in AI time, but not a long time in normie people time.
