Hacker News
by turova, 3 hours ago
nabakin, 16 minutes ago
Are you running qwen3.6-27b on one 3090 with your KV cache at q4? IME, long-context recall accuracy degrades significantly at that precision. I prefer keeping the KV cache at q8 and working with the 120k context.
hypfer, 2 hours ago
That math (250k context, Q4 model, 24GB VRAM) only checks out if the K/V cache is also quantized to q4, which is probably not the best idea.
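
For anyone who wants to check the arithmetic themselves, here is a rough sketch of the usual KV cache size estimate for standard (grouped-query) attention. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real qwen3.6-27b config, and the bit widths are nominal (real quantized cache formats carry some extra bytes for scales). The helper names `kv_cache_bytes` and `max_context` are made up for this example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bits_per_elem):
    """Approximate KV cache size in bytes for standard attention."""
    # Factor of 2 covers keys and values; each layer stores one
    # vector of n_kv_heads * head_dim elements per token for each.
    elems = 2 * n_layers * n_kv_heads * head_dim * ctx_len
    return elems * bits_per_elem / 8


def max_context(budget_bytes, n_layers, n_kv_heads, head_dim, bits_per_elem):
    """Largest context length whose KV cache fits in budget_bytes."""
    bytes_per_token = kv_cache_bytes(n_layers, n_kv_heads, head_dim, 1, bits_per_elem)
    return int(budget_bytes // bytes_per_token)


# Hypothetical placeholder architecture (NOT the actual model config).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128

# Whatever VRAM is left for the cache after loading weights (assumed value).
cache_budget = 8 * 1024**3

for name, bits in [("q4", 4), ("q8", 8), ("f16", 16)]:
    ctx = max_context(cache_budget, N_LAYERS, N_KV_HEADS, HEAD_DIM, bits)
    print(f"{name}: about {ctx} tokens of context")
```

The point is the scaling rather than the absolute numbers: cache size is linear in both context length and bits per element, so for a fixed budget going from q4 to q8 roughly halves the usable context. That lines up with the 250k vs 120k figures in this thread.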