Hacker News
points | by Wowfunhappy 18 hours ago | comments
iamsaitam | 6 hours ago | next [-]
Don't forget that you're also using much more electricity, because inference takes so much longer to run.
ranger_danger | 1 hour ago | prev [-]
I have been using Qwen3.5-9B-UD-Q4_K_XL.gguf on an 8GB 3070Ti with llama.cpp server and I get 50-60 tok/s.
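For reference, a setup like the one described might be launched roughly as follows. This is a sketch, not the commenter's exact command: the model filename is the one they name, but the layer-offload count, context size, and port are assumptions.

```shell
# Sketch: serve a quantized GGUF model with llama.cpp's llama-server.
# -ngl 99 offloads all layers to the GPU (a Q4 9B model fits in 8 GB VRAM);
# -c sets the context window; --port exposes the HTTP API.
# Flag values other than the model name are assumptions, not from the comment.
llama-server -m Qwen3.5-9B-UD-Q4_K_XL.gguf -ngl 99 -c 4096 --port 8080
```

Once running, the server reports generation speed (tokens per second) in its logs, which is where figures like 50-60 tok/s come from.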