AI currently works by sending your entire codebase, workflow, and internal communications over the internet to a third-party provider, and your only protection is a legal document in which they pinky-promise not to train on your data.
And that promise is made by people whose entire business model relies on slurping up all the licensed content on the internet while ignoring said licensing, with "too big to fail" as their defense.
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Any reason that this idea was considered bad? Is it due to the speed difference between GPU VRAM and system RAM?
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than it used to be. But the tiny KV cache of DeepSeek V4 is genuinely new.
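
To put "too big" in numbers, here's a back-of-the-envelope sketch. The dimensions are purely illustrative (roughly a 70B-class dense model with grouped-query attention, not any specific model):

    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
        # Two tensors (K and V) per layer, each of shape
        # (batch, n_kv_heads, seq_len, head_dim), stored in fp16/bf16.
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

    # Hypothetical config: 80 layers, 8 KV heads, head_dim 128,
    # serving a batch of 32 requests at 32k context each.
    size = kv_cache_bytes(80, 8, 128, seq_len=32_768, batch=32)
    print(f"{size / 2**30:.0f} GiB")  # 320 GiB -- far beyond one GPU's VRAM

And that cache has to be re-read on every decoding step, which is why offloading it down the memory hierarchy trades capacity for bandwidth: roughly TB/s for HBM, tens of GB/s for DDR5, single-digit GB/s for NVMe.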