undefined

points

[-]

For a fraction of the price of 96GB vram, I built a desktop based on a supermicro server mobo and EPYC 9 series CPU, with just under 400GB rdimm ram (approx $4500 all in but this was before the ram price hike). Works really well for serving larger local modals at a decent enough speed (I consider anything more than 10 tokens/second usable and value accuracy over speed).

by dofm1 hours ago|

prev|

[-]

FWIW I think it might be both.

Ultimately if you skip over the opportunity to play with these models on your own machine you are losing out on a lot of really interesting educational opportunities — it helps make a lot of stuff feel more concrete in a way that only tinkering can.

But then I think once I had an idea of something that I was building against Gemma 4 or Qwen 3.6 I would be looking at openrouter etc., to stabilise it for the next tier of experimentation (and to get back a kind of multi-device access without tailscale/lm link etc.).

Are they good enough to replace what people seem to want to do with Claude? Maybe not. But it's an unparalleled learning opportunity.

by EagnaIonat1 hours ago|

prev|

[-]

Depends what you need the model to do. The recent granite4.1:3b just takes 2GB of memory and is fast. Results are pretty good and support tool calling. Barely a squeak out of the Mac laptop.

Even faster with the MLX builds.

Then when I need more heavy lifting I fire up a larger model.

IMHO the issue isn't the models. I've had OpenClaw give the same results as Claude using open models locally. Slower but does the job. Something that can do optimal model switching is what's needed.

by jtbaker1 hours ago|

prev|

[-]

> Trying to run them on a unified memory Mac

> but still not quite in the realm of Sonnet or DeepSeek 4 Flash

these are not mutually exclusive anymore. DS4 has set the bar for me these days. https://github.com/antirez/ds4

by wincy1 hours ago|

prev|

[-]

If I could just save up $6000 I could sell off my RTX 5090 for $4,000 and buy an RTX 6000 Blackwell Pro Workstation. I can fit models into the 32GB of vram but my context window ends up being tiny for any halfway capable model.

by layer847 minutes ago|

parent|

[-]

Isn’t the RTX 6000 Blackwell Pro Workstation over $13000 now?

by eek21212 hours ago|

prev|

[-]

Not really, Qwen 27b offloads to a decent gaming GPU (RTX 4090 in my case) without needing tons of RAM.

by mathisfun1232 hours ago|

parent|

[-]

can you give more info? llama.cpp vs vllm? config? i wanna try specifically this model