Kinda like when restaurants make me pay for ketchup or a takeaway box: I get annoyed, just roll it into the overall price.
If they ignored this then all users who don’t do this much would have to subsidize the people who do.
I completely agree that it’s infeasible for them to cache for long periods of time, but they need to surface that information in the tools so that we can make informed decisions.
They have a limited number of resources and can’t keep everyone’s VM running forever.
note: I picked the values from a blog and they may be inaccurate, but in pretty much all models the KV cache is very large; it's probably even larger in Claude.
Total VRAM: 16GB
Model: ~12GB
128k context size: ~3.9GB
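For what it's worth, the per-token KV-cache cost is easy to ballpark: every token stores a key vector and a value vector for each layer and each KV head. A quick sketch of the arithmetic (the model parameters here are illustrative guesses for an 8B-class model with GQA, not the blog's actual numbers):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # 2x for the K and V tensors; each stores
    # n_layers * n_kv_heads * head_dim elements per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

# Illustrative Llama-8B-class config, fp16 cache:
fp16 = kv_cache_bytes(128 * 1024, n_layers=32, n_kv_heads=8,
                      head_dim=128, bytes_per_elem=2)
print(f"fp16 cache at 128k: {fp16 / 2**30:.1f} GiB")  # 16.0 GiB

# Quantize the cache to fp8 and it halves:
fp8 = kv_cache_bytes(128 * 1024, n_layers=32, n_kv_heads=8,
                     head_dim=128, bytes_per_elem=1)
print(f"fp8 cache at 128k:  {fp8 / 2**30:.1f} GiB")  # 8.0 GiB
```

Landing near ~3.9GB at 128k would imply an even more aggressively quantized cache or fewer KV heads than this guess; either way, at long context the cache starts to rival the weights themselves, which is exactly why providers can't keep it around for free.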
At least I'm pretty sure I landed on 128k... might have been 64k. Regardless, you can see the massive weight (ha) of the meager context size (at least compared to frontier models).

A sibling comment explains: