Current multi-GPU training setups assume much higher bandwidth (and lower latency) between the GPUs than you can get with an internet connection. Even cross-datacenter training isn't really practical.
LLM training isn't embarrassingly parallel the way crypto mining is, for example. You can't just add more nodes to the mix and magically get speedups. Parallelism buys you a lot, certainly, but it's not straightforward and takes real work to fully utilize.
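To put rough numbers on the bandwidth gap (mine, back-of-the-envelope, not from any benchmark): a data-parallel setup has to sync gradients every step, and a ring all-reduce moves roughly twice the gradient volume over the wire.

```python
# Back-of-the-envelope sketch: per-step gradient sync cost of
# data-parallel training over NVLink vs. a home internet link.
# All numbers are illustrative assumptions.

PARAMS = 7e9              # assume a 7B-parameter model
BYTES_PER_GRAD = 2        # fp16/bf16 gradients
NVLINK_BW = 900e9         # ~900 GB/s aggregate NVLink on an H100
INTERNET_BW = 1e9 / 8     # 1 Gbit/s connection -> 125 MB/s

grad_bytes = PARAMS * BYTES_PER_GRAD  # ~14 GB of gradients per step

# Ring all-reduce transfers roughly 2x the gradient volume per step.
for name, bw in [("NVLink", NVLINK_BW), ("1 Gbit/s internet", INTERNET_BW)]:
    seconds = 2 * grad_bytes / bw
    print(f"{name}: ~{seconds:.2f} s of communication per step")

# NVLink: ~0.03 s per step; internet: ~220 s per step.
# A step that computes in under a second would spend minutes just syncing.
```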
The best they can do is somewhat reliably react to objective signals that they've failed at something (like test failures).
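That loop is simple to sketch; here `ask_llm` is a hypothetical stand-in for whatever model API you actually use:

```python
import subprocess

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: wire this up to your model of choice.
    return "<model-suggested fix would go here>"

for attempt in range(3):
    result = subprocess.run(["pytest", "-x", "--tb=short"],
                            capture_output=True, text=True)
    if result.returncode == 0:
        break  # the objective signal: tests pass, stop iterating
    # Feed the concrete failure output back to the model; this is a
    # signal it can react to, unlike its own sense of "looks correct".
    suggestion = ask_llm(f"Tests failed:\n{result.stdout}\nPropose a fix.")
    print(f"attempt {attempt + 1}: {suggestion}")
```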
As someone who experiments with local models a lot, I don’t see this as a threat. Running LLMs on big server hardware will always be faster and higher quality than what we can fit on our laptops.
Even in the future when there are open weight models that I can run on my laptop that match today’s Opus, I would still be using a hosted variant for most work because it will be faster, higher quality, and not make my laptop or GPU turn into a furnace every time I run a query.
MacBook Pro laptops are preferred over "gaming" laptops for LLM use because they have large unified memory with high bandwidth. No gaming laptop gives you as much high-bandwidth memory for LLM inference as a MacBook Pro or an AMD Strix Halo integrated system; discrete gaming GPUs are optimized for gaming and ship with comparatively little VRAM.
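A quick sizing sketch shows why (this is my own rule of thumb, not from any vendor spec): weights at q-bit quantization take roughly params * q / 8 bytes, plus some overhead for the KV cache and runtime.

```python
# Rough model-fit check: weight size at a given quantization, plus an
# assumed fixed overhead for KV cache and runtime. Numbers are illustrative.

def fits(params_b: float, bits: int, mem_gb: float,
         overhead_gb: float = 8.0) -> bool:
    weight_gb = params_b * bits / 8  # params in billions -> weight GB
    return weight_gb + overhead_gb <= mem_gb

# 128 GB unified memory (high-end MacBook Pro / Strix Halo territory)
# vs. 16 GB VRAM (a typical discrete gaming GPU):
for params_b in (8, 70, 120):
    print(f"{params_b}B @ 4-bit:",
          "fits in 128 GB" if fits(params_b, 4, 128) else "too big for 128 GB",
          "|",
          "fits in 16 GB" if fits(params_b, 4, 16) else "too big for 16 GB")
```

At 4-bit, a 70B model needs ~35 GB for weights alone, which is trivial in 128 GB of unified memory and hopeless on a 16 GB gaming GPU.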
The market for local models is always gonna be a small niche, primarily for the paranoid.
Have you ever heard of industrial espionage? Or privacy regulations? Or military applications?
(Also, the US military runs Claude as a local model.)
I do not, I self-host. My current client also got rid of AWS, pocketing nice savings as a result.