In real production use, I find the switching cost is not as trivial as you portray. Even going to a new model version in the same model family, say GPT-4o to GPT-5.2, a transition I just finished on a not-too-complicated application, requires extensive retesting and tweaking of prompts, guardrails and parameters.
reply
I second this; even when switching between minor versions of a model, you need to adjust prompts: the new model is better at inferring a bunch of things on its own, so if those things are still spelled out in the prompt, it will overdo them.

Assessing quality of output is often not trivial, either. Typically, problems that are solved by offloading something to an LLM are super subjective, and you are vulnerable to customers “feeling” that something is different.

We try to quantify output differences with many different similarity metrics. But a lot of energy still goes into subjectively evaluating whether something still works.
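
To make the "similarity metrics" part concrete, here is a minimal sketch of the kind of check I mean. The metric choices, the function names and the 0.6 threshold are all assumptions for illustration, not what we actually run; it only decides which outputs a human should look at.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-level ratio in [0, 1] from difflib; 1.0 means identical."""
    return SequenceMatcher(None, a, b).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Overlap of lowercase word sets; 1.0 means the same vocabulary."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty outputs count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def flag_drift(old_outputs, new_outputs, threshold=0.6):
    """Return indices of prompt pairs whose outputs drifted below the threshold."""
    flagged = []
    for i, (old, new) in enumerate(zip(old_outputs, new_outputs)):
        # Take the more pessimistic of the two metrics (hypothetical policy).
        score = min(char_similarity(old, new), token_jaccard(old, new))
        if score < threshold:
            flagged.append(i)
    return flagged
```

Anything that lands in `flag_drift` still needs the subjective review; the metrics only shrink the pile.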

reply
We’re talking about SOTA models like Fable, though.

If you’ve got a product where the budget allows for Fable-level token costs, I doubt you’d lack the budget to rerun your evals on a cheaper model if Fable were unavailable. It wouldn’t even take that much token volume for the engineering work of switching to a cheaper model to pay for itself.
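
Back-of-the-envelope version of that break-even, as a sketch. The function name is mine and every number in the usage line below is made up for illustration, not a real price.

```python
def break_even_mtok(switch_cost_usd: float,
                    expensive_usd_per_mtok: float,
                    cheap_usd_per_mtok: float) -> float:
    """Millions of tokens after which switching to the cheaper model pays off."""
    saving_per_mtok = expensive_usd_per_mtok - cheap_usd_per_mtok
    if saving_per_mtok <= 0:
        raise ValueError("the alternative model must be cheaper per token")
    # One-off engineering cost divided by the saving on each million tokens.
    return switch_cost_usd / saving_per_mtok
```

With hypothetical inputs, `break_even_mtok(6000, 75, 15)` gives 100.0, i.e. the switch pays for itself after 100 million tokens.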

Fable is primarily used for human-in-the-loop tasks like coding or office work, not in some backend app, unless the company has money to burn and doesn’t care about anything other than using the best model available at the time.

reply
Maybe OP meant switching models inside a coding harness, not inside an application that uses AI? I had issues similar to yours in the latter case, but in the former it's trivial.
reply
If you’re building on LLMs you gotta have an eval and prompt iteration pipeline, and you ought to be evaling every model release: your competitors will do this, and your users will want the latest and greatest (for frontier tasks) and the cheapest/fastest. So you should already be paying this cost anyway. I guess it depends on your team size and scale, but not building this muscle seems like not having continuous delivery for regular code, or even like not having tests and CI before merging to main.
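
A minimal sketch of what that gate could look like, assuming each model is wrapped as a plain `prompt -> text` callable. `EvalCase`, `pass_rate` and `should_adopt` are hypothetical names, and real checks are usually fuzzier than a boolean function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True if the model output is acceptable


def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of eval cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


def should_adopt(candidate, baseline, cases, max_regression=0.0) -> bool:
    """Adopt the candidate only if it does not regress past the allowed margin."""
    return pass_rate(candidate, cases) >= pass_rate(baseline, cases) - max_regression
```

Run it on every model release the same way CI runs on every commit, and the switching decision becomes a number instead of a vibe.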
reply
SOTA models are typically used for interactive coding and other human-in-the-loop work.

> say GPT-4o to GPT-5.2, a transition I just finished on a not too complicated application

Neither of which is close to SOTA, because applications like these are typically built in a cost-conscious manner that tries to keep token costs in check.

I’m primarily responding to all of the commenters who are acting like nobody is going to use American SOTA models for anything because the government interfered with them for a couple weeks. It’s obviously not true, and I expect these models to be oversubscribed instead of avoided like some are claiming.

reply
Vendor diversity is a longstanding risk management principle. For it to work you need to invest in it as you build, not when the rug is pulled.
reply
Exactly!

Even if you won't be able to use some model tomorrow, you can still make money by using it today!

And in the age of limited compute, spiky workloads and constant outages, building a mechanism to fall back to a weaker model when your primary choice isn't available is smart anyway.
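
Roughly what I have in mind, as a sketch: the models are assumed to be wrapped as plain callables, and `complete_with_fallback` and `AllModelsFailed` are names I made up, not any vendor's API.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain errored out."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try models in priority order; return (model_name, output) from the first that works."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # outage, rate limit, timeout, ...
            errors.append(f"{name}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the model name alongside the output matters: you want to log which answers came from the weaker model so you can evaluate them separately.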

reply
For many, that fallback mechanism is simply called Cursor - soon to be owned by Elon Musk. Which opens up a similar but slightly different can of worms...
reply
Well, there are many alternatives to Cursor as well.
reply
> The switching costs of changing LLM providers is as low as it gets

Not trivial, you would need to do lots of evals and prompt tuning when you switch models.

Imagine what happens when you optimize your agent skills for the current model and a new model starts breaking them. You would need versioning for your skills, serving different skills depending on the model while you do A/B testing.
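
A rough sketch of that versioning, under the assumption that skills are just prompt text keyed by name and version. The model names, skill text and the 50% split are placeholders, and `bucket` and `pick_skill` are hypothetical helpers.

```python
import hashlib

# Skill text per (skill name, version); contents are placeholders.
SKILLS = {
    ("summarize", "v1"): "Summarize the text. Be concise and skip any preamble.",
    ("summarize", "v2"): "Summarize the text.",
}
# Version each model gets by default.
DEFAULT_VERSION = {"model-old": "v1", "model-new": "v1"}
# Running A/B tests: model -> (candidate version, share of users who get it).
EXPERIMENTS = {"model-new": ("v2", 0.5)}


def bucket(user_id: str) -> float:
    """Map a user id to a stable number in [0, 1) so assignment never flips."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def pick_skill(skill: str, model: str, user_id: str) -> tuple[str, str]:
    """Return (version, skill text) for this model, honoring any running A/B test."""
    version = DEFAULT_VERSION[model]
    experiment = EXPERIMENTS.get(model)
    if experiment is not None and bucket(user_id) < experiment[1]:
        version = experiment[0]
    return version, SKILLS[(skill, version)]
```

Hashing the user id instead of rolling a random number keeps each user on the same variant across requests, which is what makes the A/B comparison meaningful.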

reply
> Not trivial, you would need to do lots of evals and prompt tuning when you switch models.

Couldn’t we just train smaller models to “translate” what the harness user wants into what the worker model expects? I mean, if models understand caveman, it seems like just a small stretch.

reply
It's not switching costs, but trust.

There's no Congress. There's no policy (they've been making noises about not allowing AI regulation, and now they're not-regulating it like a child playing with an on/off switch). The law is whatever Dear Leader's mood is today. It overrides any contract you sign with private companies, and they roll over and take it, because that's how oligarchies work.

reply