At the end of the day, as a consumer, you still pay per token (or per something) to your provider, except you can choose from multiple providers using your own criteria. If you want to use DeepSeek v4 hosted in Europe, that's possible.
That would also be an ideal outcome for people interested in avoiding a concentration of power and wealth driven by access to generative AI.
Open models that are competitive with the frontier will be served from shared hosts.
And with chip shortages expected to be severe for the next two years, there's little appetite for even bigger models anyway.
Barring a breakthrough in scaling, the only direction models can really go is smaller, which inevitably means better-performing local models for the same chip budget.
You can always run these models more cheaply locally if you're willing to compromise on total throughput and inference speed. For most end-user or small-business needs, you don't really need much of either.
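To make the local-vs-hosted tradeoff concrete, here's a toy break-even calculation. Every number in it is a made-up assumption for illustration (the $2,000 hardware spend and $10 per million output tokens are not real prices), and it ignores electricity and the value of your time:

```python
# Back-of-envelope: how many tokens before buying local hardware beats
# paying a hosted provider per token.
# All figures below are illustrative assumptions, not real pricing.

def breakeven_tokens(hardware_cost_usd: float, api_price_per_mtok: float) -> float:
    """Tokens you must generate before the one-off hardware cost equals
    what the same tokens would have cost via the API. Assumes the
    marginal local cost per token is ~0 (electricity ignored)."""
    return hardware_cost_usd / api_price_per_mtok * 1_000_000

# Assumed: $2,000 one-off hardware spend, $10 per million output tokens.
tokens = breakeven_tokens(2000, 10)
print(f"{tokens:,.0f} tokens to break even")  # 200,000,000 tokens
```

Under those assumed numbers, you'd need roughly 200M generated tokens to come out ahead, which is why the math favors local inference mainly for heavy, sustained use where slower throughput is acceptable.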