Anthropic is definitely gaining ground over OpenAI in the business world. Cowork is the absolute hotness right now, and it even prompted MSFT to drop their own variant yesterday.
reply
Ask anybody you know who works in Big Tech. They're all pushing hard for Claude Code adoption.
reply
Codex and Gemini CLI seem 1-2 months behind Claude Code. They will catch up. This race will eventually be won by whoever can come up with the cheapest compute.
reply
And that's a dangerous game because the cheaper compute gets, the more likely consumers are to self-host rather than pay a subscription.
reply
Apple could figure out a way to neatly package it into their ecosystem.
reply
Not really. Most people won't self-host.
reply
The general public will self-host when it's built into your next phone or laptop straight out of the box, or maybe installed from the App Store.
reply
I agree that that's what it would take, but compute would need to get very cheap for it to be feasible to keep models running locally. That's an awful lot of memory to keep tied up just holding a resident model.
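Back-of-envelope on the weight footprint alone (the parameter counts and quantization levels are assumed for illustration, and this ignores KV cache and activations):

    # GB of memory needed just to hold the weights resident
    def weights_gb(params_billion: float, bits_per_weight: int) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for params, bits in [(8, 4), (70, 4), (400, 4)]:
        print(f"{params}B @ {bits}-bit: ~{weights_gb(params, bits):.0f} GB")
    # 8B @ 4-bit:   ~4 GB    fits comfortably
    # 70B @ 4-bit:  ~35 GB   more RAM than most laptops ship with
    # 400B @ 4-bit: ~200 GB  workstation territory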
reply
True. I was thinking more of power users. Do you think Opus-level capabilities will run on your average laptop in a year? I think that's pretty far away, if ever.
reply
You can demonstrate "running" the latest open Kimi or GLM model on a top-of-the-line laptop today at very low throughput (Kimi at 2 tok/s, which is slow once you account for thinking time), courtesy of Flash-MoE with SSD weight offload. That's not Opus-like, it's not an "average" laptop, and the low throughput makes it unusable outside niche purposes. But it's impressive in a way, and it gives a nice idea of what might be feasible down the line.
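To put that throughput in perspective (the token counts below are assumptions, not measurements):

    # Why 2 tok/s hurts once hidden reasoning tokens are counted
    thinking_tokens = 2000  # assumed chain-of-thought budget
    answer_tokens = 500     # assumed visible answer length
    tok_per_s = 2.0

    total_min = (thinking_tokens + answer_tokens) / tok_per_s / 60
    print(f"~{total_min:.0f} minutes per response")  # ~21 minutes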
reply
> how impossible is a world where open source base models are collectively trained similar to a proof of work style pool

Current multi-GPU training setups assume much higher bandwidth (and lower latency) between the GPUs than you can get with an internet connection. Even cross-datacenter training isn't really practical.

LLM training isn't embarrassingly parallel the way crypto mining is, for example. You can't just add more nodes to the mix and magically get speedups. Parallelism buys a lot, certainly, but it isn't straightforward and takes real engineering to utilize fully.
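Rough numbers make the bandwidth gap concrete. In naive data parallelism, every node must move at least one full gradient copy per step; the model size and link speeds below are assumptions:

    params = 70e9
    grad_bytes = params * 2  # bf16 gradients

    def sync_seconds(link_bytes_per_s: float) -> float:
        return grad_bytes / link_bytes_per_s

    print(f"1 Gbit/s internet link: {sync_seconds(1e9 / 8) / 60:.0f} min/step")
    print(f"NVLink-class fabric:    {sync_seconds(900e9):.2f} s/step")
    # ~19 min/step vs ~0.16 s/step, and training runs millions of steps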

reply
It's hard to train models in the open. All the big players use lots of "dodgy" training data: books, video, code, documentation. If you did that in the open, the lawyers would shut you down.
reply
Though I think these companies are wildly overvalued, I don't see LLMs as a service going away. The value in OpenAI is that it provides extra compute, data access, etc. My money is on local AI becoming more of a thing, while services like OpenAI still exist for local AIs to consult. If a local model can somehow know that it's out of its depth on a question/prompt, it can ask an OpenAI model if one is available, but otherwise still work locally if OpenAI fails to respond or goes out of business. To me that makes a lot more sense than the future being either-or.
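A minimal sketch of that routing idea (the model calls and the confidence heuristic are hypothetical stand-ins, not real APIs):

    from dataclasses import dataclass

    @dataclass
    class Reply:
        text: str
        confidence: float  # self-reported, so only a rough signal

    def local_generate(prompt: str) -> Reply:
        return Reply("local answer", 0.6)  # hypothetical stub

    def hosted_generate(prompt: str) -> str:
        raise ConnectionError("provider unreachable")  # stub: simulate outage

    def answer(prompt: str) -> str:
        local = local_generate(prompt)
        if local.confidence >= 0.8:  # local model believes it's in its depth
            return local.text
        try:
            return hosted_generate(prompt)  # consult the hosted model
        except ConnectionError:
            return local.text  # degrade gracefully if it's gone

    print(answer("explain this stack trace"))  # falls back to the local reply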
reply
Models not being able to reliably know when they are out of their depth is a foundational limitation of the current generation of models, though.

The best they can do is react, somewhat reliably, to objective signals that they've failed at something (like test failures).
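What leaning on an objective signal looks like, sketched under assumptions (a pytest-based project; the escalate step is hypothetical): rerun the suite after each model edit instead of asking the model whether it succeeded.

    import subprocess

    def tests_pass() -> bool:
        result = subprocess.run(["pytest", "-q"], capture_output=True)
        return result.returncode == 0

    def accept_or_escalate(apply_edit) -> str:
        apply_edit()      # let the model modify the code
        if tests_pass():  # objective signal: the suite is green
            return "accept"
        return "escalate"  # retry, or hand off to a stronger model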

reply
> What is their next step to ensure local models never overtake them?

As someone who experiments with local models a lot, I don’t see this as a threat. Running LLMs on big server hardware will always be faster and higher quality than what we can fit on our laptops.

Even in the future when there are open weight models that I can run on my laptop that match today’s Opus, I would still be using a hosted variant for most work because it will be faster, higher quality, and not make my laptop or GPU turn into a furnace every time I run a query.

reply
If your laptop overheats when you push your GPU, you can buy purpose-built "gaming" laptops that are at least nominally intended to sustain those workloads with much better cooling. Of course, running your inference on a homelab platform deployed for that purpose, without the thermal constraints of a laptop, is also possible.
reply
I didn't say it overheats. It gets hot and the fans blow, neither of which are enjoyable.

MacBook Pro laptops are preferred over "gaming" laptops for LLM use because they have large unified memory with high bandwidth. No gaming laptop can give you as much high-bandwidth LLM memory as a MacBook Pro or an AMD Strix Halo integrated system. Discrete gaming GPUs are optimized for gaming and ship with comparatively little VRAM.
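Decode is mostly memory-bandwidth bound: each generated token streams the active weights through memory once, so tok/s ≈ bandwidth / model bytes. The bandwidth figures below are rough assumptions:

    def est_tok_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
        return bandwidth_gb_s / model_gb

    model_gb = 35  # e.g. a 70B model at 4-bit
    print(f"512 GB/s unified memory: ~{est_tok_per_s(512, model_gb):.0f} tok/s")
    print(f"256 GB/s iGPU memory:    ~{est_tok_per_s(256, model_gb):.0f} tok/s")
    # A 16 GB discrete gaming GPU may be faster per byte,
    # but this model doesn't fit in its VRAM at all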

reply
You can host a website on any rackmount server for pennies compared to AWS. But people still use AWS.

The market for local models is always gonna be a small niche, primarily for the paranoid.

reply
"The market for local models is always gonna be a small niche, primarily for the paranoid."

Have you ever heard of industrial espionage? Or privacy regulations? Or military applications?

(Also, the US military runs Claude as a local model.)

reply
>"But people still use AWS"

I do not; I self-host. My current client also got rid of AWS, banking nice savings as a result.

reply