They're not really cheaper than the SOTA open models on third-party inference platforms, and they're generally dumber. I suppose they're still worth it if you need to minimize latency at a given level of capability, but not really otherwise.
Similarly, Gemini 3.1 Flash Lite got more expensive than Gemini 2.5 Flash Lite.
What's the point of a crazy cheap model if it's shit?
I code most of the time with Haiku 4.5 because it's so good. It's cheaper for me than buying a 23€ subscription from Anthropic.
So, every single time, the new model "works most of the time"?