Hacker News
by Yukonv, 2 hours ago
by 3abiton, 35 minutes ago
To be fair, it's "possible" to run such a setup with llama.cpp using SSD offload. The token-generation (TG) speeds are just abysmal. But it's possible.
by anemll, 2 hours ago
Check my repo; I've added some support for GGUF/Unsloth quants (Q3, Q5, and Q8):
https://github.com/Anemll/flash-moe/blob/iOS-App/docs/gguf-h...