Fixed costs, exact model pinning, outage resistance, enshittification resistance, better security, better privacy, etc.
There are just so many compelling reasons to be on-prem instead of depending on a third party that hoovers up all your data and prompts and sells you overpriced tokens (which eventually they MUST be, because these companies have to make a profit at some point).
If the only counterbalance is "well, the API is cheaper than buying my own hardware"...
That's a short-term problem. Hardware costs are going to drop over time, and capabilities are going to keep improving. It's already pretty insane how good a model I can run locally on two old RTX 3090s.
Is it as good as modern Claude? No. Is it as good as Claude was 18 months ago? Yes.
Give it a decade to see companies really push into the "diminishing returns" of scaling and new models... combined with new hardware built with these workloads in mind... and I think on-prem is the pretty clear winner.
1/ https://github.com/google-gemini/gemini-cli/issues?q=is%3Ais...
It might be acceptable for some general tasks, but I haven’t EVER seen it perform well on non-trivial programming tasks.
Has that BS stopped?