This place and the AI tech subreddits (the ones that aren't specifically about local or FOSS models) seem to have this dynamic, to the degree that I've suspected astroturfing.
So it's refreshing to see that maybe it's just a coincidence or confirmation bias on my end.
Thanks!
It makes using my Claude Pro sub actually feasible: write a plan with it, pick the plan up with my local model and implement it - now I'm not running out of tokens haha.
Is it worth it from a unit-economics POV? Probably not, but I bought this thing to learn how to deploy and serve models with vLLM and SGLang, and to learn how to fine-tune and train models with the 128GB of memory it gets to work with. Adding two 40GB vectors in CUDA was quite fun :)
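If anyone's curious, that exercise looks roughly like this - a minimal sketch, not my exact code, assuming the 128GB is unified memory shared by CPU and GPU (so cudaMallocManaged can back buffers far bigger than any discrete card's VRAM) and adding in-place so only two 40GB buffers are live:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: a fixed-size launch walks all n elements.
__global__ void vec_add(float* a, const float* b, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        a[i] += b[i];  // in-place add, so only the two input buffers exist
}

int main() {
    const size_t n = 10ULL * 1000 * 1000 * 1000;  // 10B floats = 40GB per vector
    float *a, *b;
    // Managed (unified) memory: with CPU and GPU sharing the 128GB,
    // two 40GB buffers fit comfortably.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }  // slow CPU init, fine for a toy

    vec_add<<<8192, 256>>>(a, b, n);
    cudaDeviceSynchronize();

    printf("a[0]=%.1f a[n-1]=%.1f\n", a[0], a[n - 1]);  // expect 3.0 and 3.0
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

The grid-stride loop plus managed memory is basically the whole trick: the runtime pages data to the GPU on demand instead of you staging explicit copies.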
I also use Z.ai's Lite plan at the moment for GLM-5.1, which is very capable in my experience.
I was using Alibaba's Lite Coding Plan... but they killed it entirely after two months haha. Too cheap, obviously. Or all the *claw users killed it.
So I agree with you: it's better than Sonnet but way cheaper. I do wonder how long that will last, though.
Most recently I used it to develop a script to help me manage email. The implementation included interacting with my provider over JMAP, taking various actions, and implementing an automated unsubscribe flow. It was greenfield, and quite trivial compared to the codebases I normally interact with, but it was definitely useful.
The TL;DR is that unless you're doing it as a hobby, or you work in an environment where none of the data-privacy options Anthropic/OpenAI support (including running on Azure/Bedrock with ZDR) work for you, it's not worth it.
The best open models are around the Sonnet 4.6 level. That's excellent, but the level of tasks you can give to GPT 5.4 or Opus 4.6 is just so much higher it doesn't compare (and Opus 4.7 seems noticeably better in my few hours of testing too).
I have my own benchmarks, but I like this much under-publicized OpenHands page: https://index.openhands.dev/home
It shows that closed models do best on every task they test. The closest an open model gets is Minmax 2.7 on issue resolution, where it's ~1% worse than the leaders.
That matches my experience - fine for small problems, but well behind as the task gets bigger.
When I argue this, my point is that FOSS shouldn't target the desktop with open weights - it should target H200s. Models with really big parameter counts and big VRAM requirements.
Those can always be distilled down, but you can't really go the other way.
Subsidizing is the opposite of competing. It's literally the practice of underpricing your product to box out competition. If everyone were competing on a level playing field, they would all price their products above cost.
All these tech oligarch asshat companies need to be regulated to hell and back.
For many things you already need to go local, and in the future, if you want any privacy, you'll need to go local.
Big players operating at a loss to distort the market is not a good thing overall.
It's not the smaller players spending billions on training data.