edg5000 13 hours ago
valine 13 hours ago
Yup, exactly. In principle it helps on both fronts: it speeds up inference by reducing memory bandwidth usage, and it shrinks the memory footprint of your KV cache.
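To put rough numbers on that, here is a back-of-the-envelope sketch, assuming a standard transformer KV cache that stores one key and one value vector per layer, per KV head, per token. The function name `kv_cache_bytes` and the model shape are hypothetical, picked only for illustration. During decoding each new token has to read the whole cache, so the cache size is also roughly the attention memory traffic per generated token.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes (hypothetical helper).

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape, not taken from any specific model.
full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

# Same shape, but with 4x fewer KV heads (one way to shrink the cache).
small = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)

GIB = 1024 ** 3
print(f"full:  {full / GIB:.2f} GiB")
print(f"small: {small / GIB:.2f} GiB")
print(f"reduction: {full / small:.0f}x")
```

Anything that cuts one of those factors (fewer KV heads, a smaller per-token representation, fewer bytes per element) reduces the footprint and the per-token read traffic by the same ratio, at least on paper.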