That's also a game changer for local inference. It unlocks long contexts, batched inference and storing the KV cache to disk on ordinary consumer platforms.
Yes. The discount was most likely a "post-market trial" of how efficiently the caching works for the new-generation models.