I would say open-source models are already good enough to serve a significant part of the current market.
At least in my experience and that of friends of mine, we use OpenRouter when we want smaller LLMs like Qwen, but when I use ChatGPT or Claude, I call those APIs directly.
Products entirely disappearing or significantly changing will become more and more common in the LLM arena as companies shut down, bubbles deflate, brand priorities shift drastically, etc.
I think we're at, or at least close to, the point where it's worth really putting some thought into which pieces of our flow could be done entirely with an open/local model, and being honest with ourselves about which pieces truly need SOTA or closed models that may entirely disappear or change. In the long run, putting a little bit of thought into this now will save a lot of headache later.
But with LLMs, how do you know switching from one to another won’t change some behavior your system was implicitly relying on?
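One partial answer is to write down the behavior you rely on as explicit checks and replay them against any candidate model before switching. Below is a minimal sketch of that idea, not a complete solution. The two "models" are hypothetical local stubs standing in for real API calls, and the prompts and checks are made up for illustration.

```python
import json

# Hypothetical stand-ins for real model calls; in practice these would wrap an API client.
def current_model(prompt: str) -> str:
    label = "negative" if "refund" in prompt else "positive"
    return json.dumps({"sentiment": label})

def candidate_model(prompt: str) -> str:
    # Same label, but wrapped in prose: a typical silent behavior change.
    label = "negative" if "refund" in prompt else "positive"
    return "Sure! Here is the result: " + json.dumps({"sentiment": label})

# Checks encode what the downstream code implicitly depends on.
def parses_as_json_object(output: str) -> bool:
    try:
        return isinstance(json.loads(output), dict)
    except json.JSONDecodeError:
        return False

def has_sentiment_label(output: str) -> bool:
    return '"sentiment"' in output

# Illustrative prompts, each paired with the checks it must pass.
CASES = [
    ("I want a refund, this broke in a week.", [parses_as_json_object, has_sentiment_label]),
    ("Love it, works great.", [parses_as_json_object, has_sentiment_label]),
]

def regression_report(model, cases):
    """Return (prompt, check_name) for every check the model's output fails."""
    failures = []
    for prompt, checks in cases:
        output = model(prompt)
        for check in checks:
            if not check(output):
                failures.append((prompt, check.__name__))
    return failures

print("current:", regression_report(current_model, CASES))
print("candidate:", regression_report(candidate_model, CASES))
```

This only catches the dependencies you thought to write down, which is the hard part of the question, but it does turn "implicitly relying on" into something you can rerun on every switch.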
I have no affiliation with DeepInfra. I use them because they host good open-source models.
For direct user interaction or coding problems, perhaps. But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against datasets, or as sub-agents called from expensive SOTA models.
For example, in Claude, using Opus as an orchestrator to call Sonnet sub-agents is a popular usage "hack." That only gets more powerful as the Sonnet-equivalent model gets cheaper. Now you can spawn entire teams of small, specialized sub-agents, each with a small context window and a limited scope.
I created my own MCP with custom agents that combine several tools into a single one. For example, WebSearch, WebFetch, and Context7 are all exposed as a single "web research" tool, backed by the cheapest model that passes evaluation. The same goes for codebase research.
Using it with both Claude and OpenCode saves a lot of time and tokens.
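The "cheapest model that passes evaluation" rule is easy to sketch. The snippet below is an illustration under assumptions, not the setup described above: the model names, prices, and scores are invented, and the evaluation itself is assumed to have already produced one score per model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Candidate:
    name: str                      # hypothetical model identifier
    usd_per_million_tokens: float  # made-up price, for illustration only
    eval_score: float              # fraction of eval cases passed, assumed precomputed

# Invented numbers; replace with your own eval results and real pricing.
CANDIDATES = [
    Candidate("tiny-model", 0.05, 0.71),
    Candidate("small-model", 0.20, 0.93),
    Candidate("large-model", 3.00, 0.97),
]

def cheapest_passing(candidates, min_score: float) -> Optional[Candidate]:
    """Pick the lowest-priced candidate whose eval score meets the threshold."""
    passing = [c for c in candidates if c.eval_score >= min_score]
    if not passing:
        return None  # nothing is good enough; do not silently fall back
    return min(passing, key=lambda c: c.usd_per_million_tokens)

choice = cheapest_passing(CANDIDATES, min_score=0.90)
print(choice.name if choice else "no model passes")
```

Rerunning the selection whenever prices or models change keeps the tool backed by whatever is cheapest at that moment, without touching the callers.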
Seems like a huge waste of money and electricity for processes that can be implemented as a traditional deterministic program. One would hope that tools would identify recurrent jobs that can be turned into simple scripts.
For example: "Here is our dataset that contains customer feedback comment fields; look through them, draw out themes and associations, and look for trends." Solving that with a deterministic program isn't a trivial problem, and it is likely cheaper to solve with an LLM.
There are many simpler tasks that would work fine with a simpler, local model.
There are a lot of data science problems that benefit from running the dataset through an LLM, which becomes bottlenecked on per-token costs. For these you take a sample subset and run it against multiple providers and then do a cost versus accuracy tradeoff.
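A rough sketch of that sample-then-compare workflow, under loud assumptions: the dataset is synthetic, the two "providers" are local stub functions rather than real APIs, and the prices and tokens-per-row figure are invented.

```python
import random

# Synthetic labeled dataset: every third comment is a complaint.
DATASET = [(f"comment {i}", "complaint" if i % 3 == 0 else "praise") for i in range(300)]

# Hypothetical providers, stubbed locally so the sketch runs offline.
def provider_small(text: str) -> str:
    return "praise"  # always guesses the majority class

def provider_large(text: str) -> str:
    index = int(text.split()[1])
    return "complaint" if index % 3 == 0 else "praise"

# name -> (classifier, made-up price in USD per million tokens)
PROVIDERS = {
    "small-model": (provider_small, 0.10),
    "large-model": (provider_large, 2.00),
}

def evaluate(sample, full_size: int, tokens_per_item: int = 200):
    """Score each provider on the sample and project its cost for the full dataset."""
    results = {}
    for name, (classify, price) in PROVIDERS.items():
        correct = sum(classify(text) == label for text, label in sample)
        results[name] = {
            "accuracy": correct / len(sample),
            "full_run_usd": full_size * tokens_per_item * price / 1_000_000,
        }
    return results

# Fixed seed so the sample, and therefore the comparison, is repeatable.
sample = random.Random(0).sample(DATASET, 50)
for name, row in evaluate(sample, full_size=len(DATASET)).items():
    print(name, row)
```

The output is one accuracy number and one projected cost per provider, which is all you need to decide whether the extra accuracy is worth the extra spend on the full run.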
The market for API tokens is not just people using OpenCode and similar tools.
Coding is a rung on the ladder of model capability. Frontier models will grow to take on more capabilities, while smaller, more focused models become the economical choice for coding.
The price is a concern too of course. But privacy is a bigger one for me. I absolutely don't trust any of their promises not to use data for training purposes.