Hacker News
Submitted by rsolva, 1 hour ago
zozbot234, 1 hour ago
That's pretty nice, actually. How much KV cache does that model require at full context? That tends to be the main limit on running concurrent requests locally. There's KV-cache quantization, but it has an outsized negative impact on model quality.
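For a rough sense of scale, the KV cache for a standard decoder-only transformer holds one key and one value vector per layer, per KV head, per token: 2 × layers × KV heads × head dim × bytes per element × context length × concurrent sequences. The sketch below assumes that layout. The model dimensions are hypothetical placeholders, since the thread doesn't say which model it is, and the estimate ignores extra metadata such as the per-block scales that quantized caches store.

```python
# Rough KV-cache size estimate for a decoder-only transformer.
# The config values used below are HYPOTHETICAL placeholders; substitute
# the real model's layer count, KV-head count and head dimension.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: float = 2.0,
                   n_sequences: int = 1) -> int:
    """Bytes needed to hold keys and values for every layer and token."""
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return int(per_token * context_len * n_sequences)


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32_768)
    fp16 = kv_cache_bytes(**cfg)                    # 16-bit cache, one request
    q8 = kv_cache_bytes(**cfg, bytes_per_elem=1)    # 8-bit quantized cache
    four = kv_cache_bytes(**cfg, n_sequences=4)     # four concurrent requests
    print(f"fp16, 1 seq:  {gib(fp16):.1f} GiB")     # 4.0 GiB
    print(f"8-bit, 1 seq: {gib(q8):.1f} GiB")       # 2.0 GiB
    print(f"fp16, 4 seqs: {gib(four):.1f} GiB")     # 16.0 GiB
```

The cache grows linearly with both context length and the number of concurrent sequences, which is why it becomes the binding constraint for local multi-request serving. Halving the bytes per element through quantization halves the footprint, which is the trade-off against quality mentioned above.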