High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

(jchandra.com)

14 points

by jchandra 2 days ago

by vivahir215 2 days ago

Interesting approach. I'm curious about the latency tradeoff: OLS and SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?

by jchandra 2 days ago

[dead]

by jchandra 2 days ago

[dead]
