Hacker News
by DrBenCarson | 12 hours ago
by canpan | 12 hours ago:
Llama.cpp with automatic offload to main memory. You can also use Ollama; it's easier, but slower.
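A minimal sketch of the llama.cpp approach described above: `-ngl` / `--n-gpu-layers` controls how many transformer layers are placed on the GPU, with the remaining layers kept in main memory and run on the CPU. The model path and layer count here are illustrative assumptions, not from the thread; this assumes a llama.cpp build with GPU support.

```shell
# Hypothetical model path and layer count -- adjust to your hardware.
# -ngl 20: offload 20 layers to the GPU; the rest stay in system RAM.
llama-cli -m ./models/model.gguf -ngl 20 -p "Hello, world"
```

Increasing `-ngl` until GPU memory is nearly full is the usual way to find the fastest split for a given card.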
by reverius42 | 7 hours ago:
For those who want a GUI, LM Studio does this too (with llama.cpp as the backend, I think). I'm getting great (albeit slow) results with Qwen3.6-35B MoE on 8 GB of GPU RAM and 40 GB of system RAM.