What tasks do you use qwen3 for? Coding? Are you running it on CPU or GPU? What GPU does that Framework have?

Thanks!

reply
I have an Asus GX10 that I run Qwen3.5 122B A10B on, and I use it for coding through the Pi coding agent (and my own); I have to put more work in to make sure the model verifies what it does, but if you do, it's quite capable.

It makes using my Claude Pro sub actually feasible: write a plan with it, pick it up with my local model and implement it, and now I'm not running out of tokens haha.

Is it worth it from a unit economics POV? Probably not, but I bought this thing to learn how to deploy and serve models with vLLM and SGLang, and to learn how to fine tune and train models with the 128GB of memory it gets to work with. Adding up two 40GB vectors in CUDA was quite fun :)
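For anyone curious, the vector-add exercise mentioned above is the classic CUDA starter kernel. A minimal sketch of the idea (sizes, names, and the use of unified memory are my assumptions, not the exact code I wrote):

```
#include <cstdio>
#include <cuda_runtime.h>

// One thread per element, with a grid-stride loop so any launch
// configuration covers all n elements.
__global__ void vec_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < n;
         i += (size_t)gridDim.x * blockDim.x)
        c[i] = a[i] + b[i];
}

int main() {
    // 40 GB per vector is ~10.7 billion floats; with two inputs and one
    // output that's ~120 GB, which only fits on a 128 GB unified-memory
    // box. Shrink n to taste on anything smaller.
    size_t n = 40ull * 1024 * 1024 * 1024 / sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));  // unified memory: visible to CPU and GPU
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (size_t i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = 1024;  // the grid-stride loop makes the exact count unimportant
    vec_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
}
```

On a unified-memory machine like the GX10 there's no explicit cudaMemcpy step, which is most of what makes huge vectors like this practical to play with.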

I also use Z.ai's Lite plan for the moment for GLM-5.1 which is very capable in my experience.

I was using Alibaba's Lite Coding Plan... but they killed it entirely after two months haha, too cheap obviously. Or all the *claw users killed it.

reply
GLM 5.1 is extremely good, and ridiculously cheap on their coding plan. It's far better than Sonnet, and a fifth of the cost at API rates. I don't know if the American providers can compete long-term; what good is it to be more innovative if it only buys them a six-month lead and they can't build data center capacity fast enough for demand? Chinese providers have a huge advantage in electrical grid capacity.
reply
True, but Z.ai also just silently raised the price, and the entire Chinese frontier set is having to turn a profit now... hence Alibaba killing the Lite plan and not letting people sign up to their Pro one either; and why MiniMax has their non-commercial license, etc. etc.

So I agree with you, it's better than Sonnet but way cheaper. I do wonder how long that will last, though.

reply
Z.ai does really well at the carwash question!
reply
Thank you. I've been using ollama for a much more modest local inference system. I'll research some of the things you've mentioned.
reply
The Framework Desktop has a Ryzen 395 chip that is able to allocate memory to either the CPU or GPU. I've been able to allocate 100+ GB to the GPU, so even big models can run there.

Most recently I used it to develop a script to help me manage email. The implementation included interacting with my provider over JMAP, taking various actions, and implementing an automated unsubscribe flow. It was greenfield, and quite trivial compared to the codebases I normally interact with, but it was definitely useful.

reply
That's great. Ostensibly my system could also allocate some of its 32 GB of system memory to augment the 12 GB of VRAM, but I've not been able to get it to load models over 20B. I should spend some more time on it.
reply