Hacker News
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
(jchandra.com)
14 points by jchandra 2 days ago | 1 comment
by vivahir215 2 days ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
by jchandra 2 days ago
[dead]
by jchandra 2 days ago
[dead]