On-premises LLMs are also getting better, and that likely won’t stop; as costs go up with the technical improvements, I would imagine cost-saving methods will improve too.

by horsawlarway 7 hours ago

I still think it's basically unavoidable that most people who might pay for API access will end up on-prem.

Fixed costs, exact model pinning, outage resistance, enshittification resistance, better security, better privacy, etc.

There are just so many compelling reasons to be on-prem instead of dependent on a 3rd party hoovering up all your data and prompts and selling you overpriced tokens (which eventually they MUST be, because these companies have to make a profit at some point).

If the only counterbalance is "well the API is cheaper than buying my own hardware"...

That's a short-term problem. Hardware costs are going to drop over time, and capabilities are going to continue improving. It's already pretty insane how good a model I can run on two old RTX 3090s locally.

Is it as good as modern Claude? No. Is it as good as Claude was 18 months ago? Yes.

Give it a decade to see companies really push into the "diminishing returns" of scaling and new models... combined with new hardware built with these workloads in mind... and I think on-prem is the pretty clear winner.

by bigbinary 6 hours ago

These big players don’t have as big a moat as they like to advertise, but as long as VC wants to subsidize my agents, I’ll keep paying for the $20 plan until they inevitably cut it off.

by kakugawa 7 hours ago

gemini-cli has not been usable for weeks. The API endpoint it uses for subscription users is so heavily rate-limited that the CLI is non-functional. There are many reports of this issue on GitHub. [1]

[1] https://github.com/google-gemini/gemini-cli/issues?q=is%3Ais...

by tasuki 6 hours ago

I use Gemini-CLI at work, and haven't noticed anything. I use Google Jules (free tier) on a toy project much more heavily and can't complain. I think sometimes the prompts take longer than they used to, but I couldn't care less. I'm not in a hurry.

by solarkraft 5 hours ago

Gemini better? What are y’all doing that it doesn’t crash and burn within the first minute of using it?

It might be acceptable for some general tasks, but I haven’t EVER seen it perform well on non-trivial programming tasks.

by earlyriser 8 hours ago

Gemini is not better on the quotas: https://discuss.ai.google.dev/t/quota-limit-for-pro-plan/130...

by ikidd 7 hours ago

Last time I used Gemini I watched it burn tokens at three times the rate of any other model, arguing with itself, and it rarely produced a result. This was around Christmas or shortly after.

Has that BS stopped?

by DefineOutside 7 hours ago

It's still not uncommon for it to escape its thinking block accidentally and be unable to end its response, or for it to call the same tool repeatedly. I've watched it burn 50 million tokens in a loop before killing the chat.

by kaycey2022 6 hours ago

No. It's still shit. It can do some well-contained tasks, but it is much less usable on production codebases than GPT or Claude models, mainly because of the usage limits and the lack of good environments for us to use it in. Anthropic gets away with this because Claude Code, as bad as it is, is still quite functional. Gemini CLI and Antigravity are utter trash in comparison.