AI currently works by sending your entire codebase, workflow, and internal communications over the internet to a third-party provider, and your only protection is a legal document in which they pinky-promise not to train on your data.
And that promise is made by people whose entire business model relies on slurping up all the licensed content on the internet while ignoring said licensing, with "too big to fail" as their defense.
> AIUI, people are even experimenting with offloading the KV cache itself to storage, which may unlock this batching capability even beyond physical RAM limits as contexts grow.
Especially this point. Any reason that this idea was considered bad? Is it due to the speed difference between GPU VRAM and system RAM?
> Any reason that this idea was considered bad?
Because the KV cache was too big, even for a small context. This is still an issue with open models other than DeepSeek V4, though to a somewhat smaller extent than it used to be. But the tiny KV cache of DeepSeek V4 is genuinely new.
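
To put "too big" in numbers, here's a back-of-the-envelope sketch. The dimensions are purely illustrative (roughly a 70B-class dense model with grouped-query attention, not any specific model):

    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
        # Two tensors (K and V) per layer, each of shape
        # (batch, n_kv_heads, seq_len, head_dim), stored in fp16/bf16.
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

    # Hypothetical config: 80 layers, 8 KV heads, head_dim 128,
    # serving a batch of 32 requests at 32k context each.
    size = kv_cache_bytes(80, 8, 128, seq_len=32_768, batch=32)
    print(f"{size / 2**30:.0f} GiB")  # 320 GiB -- far beyond one GPU's VRAM

And that cache has to be re-read on every decoding step, which is why offloading it down the memory hierarchy trades capacity for bandwidth: roughly TB/s for HBM, tens of GB/s for DDR5, single-digit GB/s for NVMe.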