Grug says prompt caching just store KV-cache which is sequenced by token. Easy cut it back to just before edit. Then regenerate after is just like prefill but tiny.
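To make the "cut it back, then prefill the tail" idea concrete, here is a toy sketch in Python. It is not any real inference engine's cache: `ToyKVCache`, `update`, and the placeholder `("kv", pos, tok)` entries are hypothetical names made up for illustration. It assumes causal attention, where the K/V entry at a position depends only on tokens at or before it, so everything before the first changed token can be reused.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Toy stand-in for a KV cache: one entry per token position."""
    tokens: list = field(default_factory=list)
    entries: list = field(default_factory=list)  # fake per-position K/V

    def update(self, new_tokens):
        """Reuse the longest shared prefix and recompute only the rest.

        Returns the number of positions that had to be recomputed.
        """
        # Find the first position where the old and new sequences differ.
        keep = 0
        for old, new in zip(self.tokens, new_tokens):
            if old != new:
                break
            keep += 1

        # Cut the cache back to just before the edit.
        del self.tokens[keep:]
        del self.entries[keep:]

        # "Prefill" the tail: everything from the edit point onward.
        for pos in range(keep, len(new_tokens)):
            tok = new_tokens[pos]
            self.tokens.append(tok)
            self.entries.append(("kv", pos, tok))  # placeholder for real K/V
        return len(new_tokens) - keep


cache = ToyKVCache()
print(cache.update(list("abcdefgh")))  # cold start: all 8 positions computed
print(cache.update(list("abcdeXgh")))  # edit at index 5: only 3 recomputed
print(cache.update(list("Xbcdefgh")))  # edit at index 0: all 8 recomputed
```

The last call shows the catch: the saving depends on where the edit lands. Change something near the end and the regenerate step is tiny; change something near the start and nearly the whole cache is thrown away.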
Maybe so, but pruning is still a useful feature.

If it hurts performance that much, maybe pruning could just hide the text, leaving the cache intact?
