Don't forget that you're also spending much more electricity because it takes so long to run inference.
I have been using Qwen3.5-9B-UD-Q4_K_XL.gguf on an 8GB 3070Ti with llama.cpp server and I get 50-60 tok/s.
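For anyone wanting to reproduce a setup like this, a launch command might look roughly like the sketch below. The GGUF filename is the one named in the comment; the flag values (`-ngl 99`, the context size, the port) are illustrative assumptions, not the commenter's exact invocation:

```shell
# Hypothetical sketch, assuming llama.cpp's llama-server binary and the
# GGUF file from the comment; flag values are illustrative, not the
# commenter's exact command.
#
# -ngl 99  offloads all layers to the GPU (a Q4_K quant of a ~9B model
#          fits in 8 GB of VRAM with some room left for the KV cache)
# -c 8192  context size; reduce it if the KV cache pushes past VRAM
llama-server -m Qwen3.5-9B-UD-Q4_K_XL.gguf -ngl 99 -c 8192 --port 8080
```

Once running, the server exposes an OpenAI-compatible API on the given port.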