by __jl__4 hours ago |

by strongpigeon4 hours ago|

> Google essentially only has Preview models! The last GA is 2.5. As a developer, I can either use an outdated model or have zero insurances that the model doesn't get discontinued within weeks.

What's funny is that there is this common meme at Google: you can either use the old, unmaintained tool that's used everywhere, or the new beta tool that doesn't quite do what you want.

Not quite the same, but it did remind me of it.

by fhrow44844 hours ago|

https://static0.anpoimages.com/wordpress/wp-content/uploads/...

by CactusBlue3 hours ago|

Reminds me of Unity features

by tymscar8 minutes ago|

I still remember the massive shift to SDRP and HDRP. Honestly, now in retrospect, almost a decade later, I think it was clearly done wrong. It was a mess, and switching over was a multi-week procedure for anything more than a hello world program, and what you got in return wasn’t something that looked better, just something that had the potential to.

Similar story with the whole networking stack. I haven’t used Unity in years now after it being my main work environment for years, but the sour taste it left in my mouth by moving everything that worked in the engine into plugins that barely worked will forever remain there.

I'm sure it's partly a skill issue.

by yieldcrv3 hours ago|

Preview Road (only choice, and last preview was deprecated without warning)

by goodmythical1 hours ago|

where's my nightly road?

Who knows, I might arrive before I depart.

by peab1 hours ago|

such a great meme

by madeofpalk2 hours ago|

oh is this about my workplace?

by L-four3 hours ago|

Gmail was in beta for 5 years, until 2009.

by metalliqaz3 hours ago|

"Gemini, translate 'beta' from Googlespeak to English."

"Ok, here is the translation:"

    'we don't want to offer support'

by solarkraft3 hours ago|

Just like any Google product then.

by cyanydeez3 hours ago|

Nah, it's "We don't want to provide a consistent model that we'll be stuck with supporting for a decade because it just takes up space; until we run everyone out of business, we can't afford to have customers tying their systems to any given model"

Really, the economics makes no sense, but that's what they're doing. You can't have a consistent model because it'll pin their hardware & software, and that costs money.

by msikora1 hours ago|

I have a service that relies on NanoBanana Pro, but the availability has been so atrocious that we just might go back to OpenAI.

by m_fayer3 hours ago|

My 5ish years in the mines of Android native back in the day are not years I recall fondly. Never change, Google.

by jakub_g4 hours ago|

"Everything is beta or deprecated."

by cyanydeez3 hours ago|

The business models of LLMs don't include any guarantee, and somehow that's fine for a burgeoning decade of trillions of dollars of consumption.

Sure, makes total sense guys.

by Aurornis3 hours ago|

> What a model mess! OpenAI now has three price points: GPT 5.1, GPT 5.2 and now GPT 5.4.

I don't know, this feels unnecessarily nitpicky to me

It isn't hard to understand that 5.4 > 5.2 > 5.1. It's not hard to understand that the dash-variants have unique properties that you want to look up before selecting.

Especially for a target audience of software engineers, skipping a version number is a common occurrence and never questioned.

by Melatonic2 hours ago|

Agreed - and it's a huge step up from their previous naming schemes. That stuff was confusing as hell.

by __jl__2 hours ago|

I see your point. I do find Anthropic's approach cleaner, though, particularly when you add in mini and nano. That makes 5 models priced differently. Some share the same core name, others don't: gpt 5 nano, gpt 5 mini, gpt 5.1, gpt 5.2, gpt 5.4. And we are not even talking about thinking budget.

But generally: These are not consumer facing products and I agree that someone who uses the API should be able to figure out the price point of different models.

by jbonatakis2 hours ago|

Google is already sending notices that the 2.5 models will be deprecated soon while all the 3.x models are in preview. It really is wild and peak Google.

by boringg2 hours ago|

Like building on quicksand for dependencies. I guess though the argument is that the foundation gets stronger over time

by bethekidyouwant1 hours ago|

What dependency could possibly be tied to a non-deterministic AI model? Just include the latest one at your price point.

by jbonatakis1 hours ago|

Well it’s not even performance (define that however you will), but behavior is definitely different model to model. So while whatever new model is released might get billed as an improvement, changing models can actually meaningfully impact the behavior of any app built on top of it.

by 0xbadcafebee3 hours ago|

> or have zero insurances that the model doesn't get discontinued within weeks

Why are you using the same model after a month? Every month a better model comes out. They are all accessible via the same API. You can pay per-token. This is the first time in, like, all of technology history, that a useful paid service is so interoperable between providers that switching is as easy as changing a URL.

by phainopepla23 hours ago|

If you're trying to use LLMs in an enterprise context, you would understand. Switching models sometimes requires tweaking prompts. That can be a complete mess, when there are dozens or hundreds of prompts you have to test.

by bethekidyouwant1 hours ago|

This sounds made up, much like “prompt engineering”. Let’s hear an actual example.

by Koffiepoeder17 minutes ago|

We have an OCR job running with a lot of domain specific knowledge. After testing different models we have clear results that some prompts are more effective with some models, and also some general observations (eg, some prompts performed badly across all models).

Sample size was 1000 jobs per prompt/model. We run them once per month to detect regression as well.
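
A minimal sketch of what a per-prompt, per-model regression check like this could look like. Everything here is hypothetical: `fake_run` stands in for a real model call, and the model names, prompts, cases, and the 2% tolerance are made up for illustration, not taken from the job described above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature for a model call: (model, prompt, document) -> output text.
RunFn = Callable[[str, str, str], str]
Key = Tuple[str, str]  # (model name, prompt name)


def pass_rates(run: RunFn, models: List[str], prompts: Dict[str, str],
               cases: List[Tuple[str, str]]) -> Dict[Key, float]:
    """Score every (model, prompt) pair as the fraction of cases it gets exactly right."""
    rates: Dict[Key, float] = {}
    for model in models:
        for name, template in prompts.items():
            correct = sum(
                run(model, template, doc).strip() == expected
                for doc, expected in cases
            )
            rates[(model, name)] = correct / len(cases)
    return rates


def regressions(baseline: Dict[Key, float], current: Dict[Key, float],
                tolerance: float = 0.02) -> List[Key]:
    """List pairs whose pass rate dropped by more than `tolerance` since the baseline run."""
    return sorted(
        key for key, rate in current.items()
        if key in baseline and baseline[key] - rate > tolerance
    )


def fake_run(model: str, prompt: str, doc: str) -> str:
    """Toy stand-in for a model: 'model-b' fails whenever the prompt says 'step by step'."""
    if model == "model-b" and "step by step" in prompt:
        return ""
    return doc.upper()


# Made-up prompts and test cases, only to exercise the functions above.
PROMPTS = {
    "terse": "Extract the text.",
    "verbose": "Think step by step, then extract the text.",
}
CASES = [("abc", "ABC"), ("de", "DE")]
```

Running `pass_rates` on a schedule and comparing against the previous run with `regressions` gives a list of the prompt/model pairs that need attention before a model switch.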

by gwd47 minutes ago|

OK, so a while back I set up a workflow to do language tagging. There were 6-8 stages in the pipeline where it would go out to an LLM and come back. Each one has its own prompt that has to be tweaked to get it to give decent results. I was only doing it for a smallish batch (150 short conversations) and only for private use; but I definitely wouldn't switch models without doing another informal round of quality assessment and prompt tweaking. If this were something I was using in production there would be a whole different level of testing and quality required before switching to a different model.

by 0xbadcafebee6 minutes ago|

[delayed]

by mcint1 hours ago|

Enterprises moving slow, or preferring to remain on old technology that they already know how to work...is received wisdom in hn-adjacent computing, a truism known and reported for more than 3 decades (5 decades since the Mythical Man-Month).

Sounds like someone who's responsible, on the hook, for a bunch of processes, repeatable processes (as much as LLM driven processes will be), operating at scale.

Just in the open, tools like open-webui bolt on evals so you can compare how different models, including new ones, perform on the tasks that you in particular care about.

Indeed LLM model providers mainly don't release models that do worse on benchmarks—running evals is the same kind of testing, but outside the corporate boundary, pre-release feedback loop, and public evaluation.

https://chatgpt.com/share/69aa1972-ae84-800a-9cb1-de5d5fd7a4...

by mr-pink54 minutes ago|

sounds like job security. be careful what you wish for before you get automated

by hobofan2 hours ago|

That's true only in theory, but not in practice. In practice every inference provider handles errors (guardrails, rate limits) somewhat differently and with different quirks, some of which only surface in production usage, and Google is one of the worst offenders in that regard.
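
As a rough illustration of why this matters, here is a sketch of the normalization layer an application ends up needing. The provider names (`provider_a`, `provider_b`) and their response shapes are invented for this example and do not describe any real provider's API; the only real-world fact assumed is that HTTP 429 means "Too Many Requests".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    RATE_LIMIT = "rate_limit"
    GUARDRAIL = "guardrail"
    OTHER = "other"


@dataclass(frozen=True)
class NormalizedError:
    kind: Kind
    retryable: bool
    detail: str


def normalize(provider: str, status: int, body: dict) -> Optional[NormalizedError]:
    """Map a provider-specific response to one error shape, or None if it succeeded.

    The per-provider branches below are hypothetical quirks, not real API behavior.
    """
    if status == 429:
        return NormalizedError(Kind.RATE_LIMIT, True, "HTTP 429")
    # Hypothetical quirk: provider_a reports guardrail hits as a typed error object.
    if provider == "provider_a" and body.get("error", {}).get("type") == "content_filter":
        return NormalizedError(Kind.GUARDRAIL, False, "provider_a content filter")
    # Hypothetical quirk: provider_b returns HTTP 200 but marks the output as blocked.
    if provider == "provider_b" and status == 200 and body.get("finish") == "blocked":
        return NormalizedError(Kind.GUARDRAIL, False, "provider_b blocked output")
    if 200 <= status < 300:
        return None
    # Treat server-side failures as retryable and other client errors as not.
    return NormalizedError(Kind.OTHER, status >= 500, f"HTTP {status}")
```

The second guardrail branch is the kind of case that only shows up in production: a response that looks successful at the HTTP level but carries no usable output.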

by CobrastanJorji3 hours ago|

> Google essentially only has Preview models.

It's really nice to see Google get back to its roots by launching things only to "beta" and then leaving them there for years. Gmail was "beta" for at least five years, I think.

by FINDarkside2 hours ago|

Also, GCP Cloud Run domain mapping, pretty fundamental feature for cloud product, has been in "preview" for over 5 years now.

by embedding-shape3 hours ago|

> OpenAI now has three price points: GPT 5.1, GPT 5.2 and now GPT 5.4.

I guess that's true, but geared towards API users.

Personally, since "Pro Mode" became available, I've been on the plan that enables that. It's one price point and I get access to everything, including enough Codex usage that, as someone who spends a lot of time programming, I never manage to hit any usage limits, although I've gotten close once to the new (temporary) Spark limits.

by beklein2 hours ago|

Not sure why you think Anthropic doesn't have the same problems? Their version numbers across different model lines jump around too... for Opus we have 4.6, 4.5, 4.1 then we have Sonnet at 4.6, 4.5, and 4.1? No version 4.1 here, and there is Haiku, no 4.6, but 4.5 and no 4.1, no 4 but then we only have old 3.5...

Also, their pricing, based on 5m/1h cache hits, cache read hits, additional charges for US inference (but only for Opus 4.6, I guess), and optional features such as more context and faster speed for some random multiplier, is also complex and actually quite similar to OpenAI's pricing scheme.

To me it looks like everybody has similar problems and solutions for the same kinds of problems and they just try their best to offer different products and services to their customers.

by selcuka1 hours ago|

With Anthropic you always have 3 models to choose from: Opus-latest, Sonnet-latest, and Haiku-latest, from the best/slowest to the worst/fastest.

The version numbers are mostly irrelevant as afaik price per token doesn't change between versions.

by maxo9948 minutes ago|

Three random names isn't ideal. I often need to double-check which is which. This is why we use numbers.

by echoangle33 minutes ago|

How are the names random?

https://en.wikipedia.org/wiki/Masterpiece

https://en.wikipedia.org/wiki/Sonnet

https://en.wikipedia.org/wiki/Haiku

They dropped the magnum from opus but you could still easily deduce the order of the models just from their names if you know the words.

by dseravalli31 minutes ago|

They aren't random. Opuses are very long poems, haikus are very short ones (3 lines), and sonnets are in between (~14 lines).

by svachalek1 hours ago|

It's much more consistent. Only 3 lines, numbered 4.6, 4.6, and 4.5, and it's clear they're tiers and not alternate product lines. It wasn't until recently that GPT seemed to have any kind of naming convention at all, and it's not intuitive if every version number is a whole different class of tool.

The pricing is more complex but also easy, Opus > Sonnet > Haiku no matter how you tweak those variables.

by biophysboy3 hours ago|

Wow, is that what preview means? I see those model options in github copilot (all my org allows right now) - I was under the impression that preview means a free trial or a limited # of queries. Kind of a misleading name..

by snug1 hours ago|

Pretty common to call something that isn't ready a preview

by awad2 hours ago|

Incredibly curious how Google's approach to support, naming, versioning etc will mesh with the iOS integration.

by abustamam1 hours ago|

I mean, Google notoriously discontinues even non-beta software, so if your concern is that there's insurance that the model doesn't get discontinued, then you may as well just use whatever you want since GA could also get discontinued.

by raincole3 hours ago|

They aggressively retire models, so GPT 5.1 and 5.2 are probably going to go soon.

by hobofan2 hours ago|

In the Azure Foundry, they list GPT 5.2 retirement as "No earlier than 2027-05-12" (it might leave OpenAI's normal API earlier than that). I'm pretty certain that Gemini 3, which isn't even in GA yet, will be retired earlier than that.

by delaminator4 hours ago|

two great problems in computing

naming things

cache invalidation

off by one errors

by rurban2 hours ago|

Biggest problem right now in computing:

Out of tokens until end of month

by arthurcolle4 hours ago|

There is a lot of opportunity here for the AI infrastructure layer on top of tier-1 model providers

by motoxpro3 hours ago|

This is what clouds like AWS, Azure, and GCP solve (Vertex AI, etc). They are already an abstraction on top of the model makers with distribution built in.

I also don't believe there is any value in trying to aggregate consumers or businesses just to clean up model makers' names and release schedules. Consumers just use the default, and businesses need clarity on the underlying change (e.g. why is it acting differently? Oh, Google released 3.6)

by arthurcolle2 hours ago|

Do the end users really care about the models at all, or about the effects that the models can cause?

by m3kw93 hours ago|

That's how they've had it for years; it's a mess, but a controlled one.
