Hacker News
by karel-3d | 4 hours ago
zozbot234 | 4 hours ago
Waiting for official support in llama.cpp. There is a fork that can run a lightly quantized (Q2 expert layers) DeepSeek V4 Flash in 128 GB of RAM without having to stream weights from disk.
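A rough sketch of the memory math behind that claim, assuming typical llama.cpp-style bits-per-weight figures; the parameter count in the example is hypothetical, not DeepSeek V4 Flash's actual size:

```python
# Back-of-envelope RAM estimate for quantized model weights.
# Bits-per-weight values are approximate effective sizes for
# common llama.cpp quant formats (including per-block scales).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K": 4.5,
    "Q2_K": 2.625,  # roughly what "Q2 expert layers" implies
}

def quantized_size_gib(n_params_billion: float, quant: str) -> float:
    """Approximate weight footprint in GiB for a given quant format."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 2**30

# Hypothetical 400B-parameter MoE with weights at Q2_K:
print(round(quantized_size_gib(400, "Q2_K"), 1))  # ~122 GiB
```

Under those assumed numbers, a ~400B-parameter model squeezes under 128 GB only at Q2-level quantization, which is why it stays out of reach for 48 GB machines.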
karel-3d | 3 hours ago
Ouch. Can't run that on my M4 Mac with 48 GB of RAM.