undefined

points

[-]

Open weight models does not means you can run them on your laptop (except for the small ones). It means that someone independent (a cloud provider, another company ...) can build big computers that are capable ton run those models and provide you a metered usage.

At the end of the day, as a consumer, you still pay per token (or per something) to your provider, except you can chose from multiple providers with your own criteria. If you want to use DeepSeek v4 hosted in Europe, it's possible.

by datsci_est_201520 minutes ago|

parent|

[-]

In other words commodification, and no moat.

Which would also be an ideal outcome for people interested in avoiding a concentration of power and wealth due to access to generative AI.

by wahnfrieden7 hours ago|

prev|

[-]

No one is going to run models that are comparable to frontier locally without spending enormous sums for use at scale or in large orgs. Even with cheap RAM, you will still need a very large budget for frontier-level capability.

Open models that are competitive with frontier will be used on shared hosts.

by jorvi6 hours ago|

parent|

[-]

Models have been capped out on training and (active) parameters a while ago, its tooling / harness that is making the big jumps in performance happen. And then you have things like DeepSeek with a pretty small KV cache.

And with the extreme chip shortages for the next two years, there's little appetite for even bigger models anyway.

Barring a breakthrough in scaling, the only direction the models can really go is smaller. Which will inevitably mean better performing local models for same chip budget.

by zozbot2345 hours ago|

parent|

prev|

[-]

> No one is going to run models that are comparable to frontier locally without spending enormous sums for use at scale

You can always run these models cheaper locally if you're willing to compromise on total throughput and speed of inference. For most end-user or small-scale business needs, you don't really need a lot of either.

by 9dev5 hours ago|

parent|

[-]

It would be awful if running models locally became the primary way of using LLMs. On dedicated servers sharing GPUs across requests, energy usage and environmental impact is way lower overall than if everyone and their mother suddenly needs beefy GPUs. It’s the equivalent of everyone commuting alone in their own car instead of a train picking up hundreds at once.

by zozbot2344 hours ago|

parent|

[-]

You can batch requests when running locally too, if you're using a model with low-enough requirements for KV-cache; essentially targeting the same resource efficiencies that the big providers rely on. This is useful since it gives you more compute throughput "for free" during decode, even when running on very limited hardware.

by duskdozer3 hours ago|

parent|

prev|

[-]

Maybe people would target their use more appropriately, then.

by amelius3 hours ago|

parent|

prev|

[-]

It's even more awful if the compute capital is owned by only a handful of players.