Hacker News
by saagarjha, 8 hours ago
zozbot234, 7 hours ago
But text generation is quadratic *after* the KV cache optimization. If every decode step now has to recompute the KV cache, including its latest and most expensive tokens (even with a quick "draft" model), that's even worse.
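
To put rough numbers on the scaling argument, here is a minimal sketch that counts only query-key dot products for causal attention. It assumes a single head and a single layer, and it ignores MLP and projection costs. The function names are hypothetical and made up for illustration; this is not any particular library's accounting.

```python
def attention_cost_with_cache(n: int) -> int:
    """Query-key dot products to decode n tokens with a KV cache.

    At step t only the newest token issues a query, and it attends
    over the t positions held in the cache (itself included).
    """
    return sum(t for t in range(1, n + 1))  # n(n+1)/2, quadratic in n


def attention_cost_without_cache(n: int) -> int:
    """Query-key dot products if every step recomputes from scratch.

    At step t all t positions are re-run, and under causal masking
    position i attends over i positions.
    """
    return sum(
        sum(i for i in range(1, t + 1))  # one full causal pass of length t
        for t in range(1, n + 1)
    )  # n(n+1)(n+2)/6, cubic in n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        cached = attention_cost_with_cache(n)
        uncached = attention_cost_without_cache(n)
        print(f"n={n}: cached={cached}, recomputed={uncached}")
```

Under these assumptions the cached total grows as n(n+1)/2, which is the quadratic cost that remains after the optimization, while recomputing everything at every step grows as n(n+1)(n+2)/6. A scheme that recomputes only part of the cache per step would land somewhere between the two.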