It's just wishful thinking (and hatred towards American megacorps). Old as the hills. Understandable, but not based on reality.
The WebGPU model in my browser on my M4 Pro MacBook was as good as ChatGPT 3.5 and was doing 80+ tokens/s.
Local is here.
If it has something like 80GB of VRAM, it'll cost $10k.
The real local-LLM hardware is Apple Silicon, starting with the M5 generation and its matmul acceleration in the GPU. You can run a good model on an M5 Max 128GB system: decent prompt processing and token generation speeds, good enough for many things. Apple accidentally stumbled into a huge advantage for local LLMs with its unified memory architecture.
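As a rough memory sanity check: a ~70B-parameter model quantized to 4-bit weights is on the order of 35-40GB, which fits in 128GB of unified memory with room left for the KV cache. Below is a minimal sketch of what running one locally looks like, assuming Apple's MLX stack via the `mlx-lm` package; the model repo name is just an illustrative placeholder for one of the mlx-community quantizations, not something specified in the thread.

```python
# Minimal local-inference sketch with mlx-lm on Apple Silicon
# (assumes `pip install mlx-lm` and a quantized model from Hugging Face).
from mlx_lm import load, generate

# 4-bit community quantization; the exact repo name is a placeholder.
model, tokenizer = load("mlx-community/Meta-Llama-3.1-70B-Instruct-4bit")

messages = [{"role": "user", "content": "Summarize unified memory in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Everything runs on-device: weights and KV cache sit in unified memory,
# so the GPU works out of the same pool as the CPU.
text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)
print(text)
```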
Still not for the masses, not cheap, and not great, though. It's going to take years to gradually bring local LLMs to ordinary consumer machines.
CC: Claude Code
TC: total comp(ensation)