An inference-only platform selling good open-weight model inference without the research overhead could capture a lot of the market for smaller-model use cases (Haiku, Gemini Flash). Diffusion transformers and clever caching, both of which are improving at a high rate, can push inference costs even lower.

The biggest reason large models are unattainable for local applications is the lack of hardware with large amounts of unified/graphics memory (and the cost of the platforms that do have it). Once the memory crunch eases and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door to slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).
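To make the memory point concrete, here is a back-of-the-envelope sketch of how much memory the weights alone need. The parameter counts and bit widths are illustrative assumptions, not figures for any specific model, and the estimate ignores KV cache and activations, so real requirements are higher.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Hypothetical model sizes (billions of parameters) and quantization levels.
for params in (8, 70, 400):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{params}B params @ {bits}-bit: ~{gb:.0f} GB")
```

Even with aggressive quantization, the larger hypothetical sizes land well beyond the memory of typical consumer GPUs, which is the gap described above.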

At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OSes, exclusive deals with companies or governments).

For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be hit with AI folded into the cost of our other tooling, such as GitHub, the Microsoft suite, and G Suite: vendors will use their market position to force in AI functions as a value-add in the total price, without giving the option to exclude them.

reply
AI may get so commoditized for certain use cases that you will not even be able to sell inference at a profit. AI might be bundled in with other services, just like Cursor bundles its own AI model for autocomplete with its editor. E.g. cameras might have AI for image recognition bundled in.
reply
Agreed, this is where Google is really, really set up to win the market. They can combine a Gemini subscription with a moderately more expensive Google Workspace and steal MSFT's entire $50 billion enterprise productivity software market. MSFT is quickly trying to get Copilot into a good enough state, but without TPUs I think it'll be tough for them to serve a good enough model at a price people will accept.
reply
I agree with all of this.

So my question remains the same: how do the players investing hundreds of billions in buildout hope to make this back? Market capture looks bleak, and inference looks like a race to the bottom. End users look like they could be the beneficiaries. Where do the big boys go?

reply
The American big boys are hoping to create "labor as a service" rather than sell tools. You don't hire an accountant that uses Claude, you hire Claude and it just does everything, without the visibility of current agents. They'll need to make it remote and obfuscated to protect their secret sauce from distillation and reverse engineering. It'll be really expensive, and be focused on enabling rich business types and upper managers.
reply
Prices can go down while tokens sold increase, so that profit still increases. The labs' number one goal right now is moving past software engineers so that every white-collar worker in the country finds AI assistants indispensable. Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable; it just needs more volume to amortize the training costs.
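A rough sketch of that arithmetic, with entirely made-up numbers (the prices, per-token serving cost, volumes, and training cost below are hypothetical and not taken from any lab's financials). It only shows that a lower price can still yield a higher profit if volume grows enough to cover the fixed training cost.

```python
def profit(tokens_m: float, price_per_m: float, cost_per_m: float,
           training_cost: float) -> float:
    """Profit = volume * per-unit margin - fixed training cost.

    tokens_m is volume in millions of tokens; prices and costs are per
    million tokens. All inputs are hypothetical.
    """
    return tokens_m * (price_per_m - cost_per_m) - training_cost


def break_even_tokens_m(price_per_m: float, cost_per_m: float,
                        training_cost: float) -> float:
    """Volume (millions of tokens) needed to amortize the training cost."""
    margin = price_per_m - cost_per_m
    if margin <= 0:
        raise ValueError("price must exceed serving cost to ever break even")
    return training_cost / margin


# Hypothetical scenario: price is halved while volume grows fivefold.
before = profit(tokens_m=100, price_per_m=10.0, cost_per_m=2.0, training_cost=1000.0)
after = profit(tokens_m=500, price_per_m=5.0, cost_per_m=2.0, training_cost=1000.0)
print(f"before: {before:.0f}, after: {after:.0f}")
print(f"break-even volume at the lower price: "
      f"{break_even_tokens_m(5.0, 2.0, 1000.0):.1f}M tokens")
```

The whole argument hinges on the per-token margin staying positive; if serving cost meets or exceeds price, no amount of volume helps.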
reply
> Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable; it just needs more volume to amortize the training costs.

Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon, and we should get at least a partial breakdown of their cost structures.

reply