We had the great small Qwen 3.6 in early April, which many could actually run on their laptops. Then something similar from Google a few weeks later (Gemma4, better at prose, worse at code). Then the super cheap, large DeepSeek V4 a few weeks after that. Then antirez's DS4 build, which made it actually runnable on MacBooks and Mac Studios. And now the "near-frontier / near-Opus" GLM 5.2.

For people who follow open LLMs, none of these were quiet releases, and each was the most interesting open model release for a few days or weeks. In one or two months, it will be some other model again. Now, I do appreciate the real, rapid improvements in open models. But there's also a ton of hype and fast fashion around all of this.

by CuriouslyC 8 hours ago

The difference here is that those small models are impressive, but not super useful. DeepSeek 4 is impressively cheap for its intelligence, but not reliable enough to daily-drive unless your time has low value.

GLM passes a meaningful threshold of reliability/utility that puts it in a different category for real work, just as Opus really took off after passing a threshold with 4.5. It's the first open model to do that.

by hnfong 6 hours ago

Qwen models are super useful for those running locally.

And there are valid reasons to run locally, even if performance (quality and speed) isn't the best.

by epolanski 11 hours ago

To me, DS 4 is still the most interesting because of its much lower cost. Also, DS 4 training isn't done yet.

In my personal Opus vs DS 4 Pro benchmarks (16 different real-life work tasks), DS 4 performed as well as Opus 4.8 high overall, but with a few drawbacks:

- of the 16 tasks, one needed several prompts to steer it back on topic

- its review capabilities seem much worse

- DS4 had the cleanly better solution in 3 cases out of 16, with Opus "only" doing cleanly better 2 times out of 16. Still, I want to emphasize that it's the worst-case scenarios that imho matter the most, not the best ones, and on that front Opus outperformed.

That being said, I spent less than $2 on API usage over 4 days of work, which is more or less what I would've spent with Anthropic APIs for less than one task.