I actually think that's still true and will continue to be true as long as someone else subsidizes the tokens. Once the "free money" runs out, things will get interesting.
We'll see how it winds up, but we could see models get licensed across half a dozen or more compute vendors, and then you pick your favorite by price, offering, and features.
The most I’ve ever spent on extra API tokens in a single month for my own work is $200, and I also pay for the $200/mo Claude plan. I use these models quite a lot, though not idly (I usually just walk around and do other stuff until I know how I’m going to approach the next set of problems). So it costs me about $3000/year to get as much as I want of the best model available. Already that seems low enough to not be worth stressing out too much about optimizing it, because it feels like an indisputable good value, and trying to save money with a less powerful model would be optimizing for a $1000-$2000 saving at the expense of a large portion of my work taking longer or being more frustrating and iterative.
That’s not a flex or anything, I get that in other countries $3000/yr is a lot of money for a software developer and also a lot of people would perhaps rationally be better off doing X% worse at work or spending Y% more time on tasks to save $Z, if their productivity improvements didn’t translate to more salary. Otherwise if your performance has more upside I really do think that the smartest models are better with the current pricing scheme. Deepseek and the other Chinese models spend a LOT of time thinking, and tend to be much more jagged (benchmaxxed) in performance. How can dealing with that over an entire year be worth $2k?
The only situations I can think of where sacrificing my own time/performance to save on inference makes sense are batch compute (of course, $1k vs $100k is different from $30 vs $3k) or work where the tier 2 models have crossed the “good enough” threshold. But I think Opus is not even close to that threshold generally yet. As it gets smarter I, and I think most others probably, just try to do harder things faster and hit the next wall.
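For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic. The $200/mo subscription and the $2000 saving come from the figures above; the average extra API spend and the hourly value of time are assumptions picked purely for illustration, not real data.

```python
# Back-of-the-envelope cost sketch. Only the subscription price and the
# saving come from the comment; the other inputs are ASSUMED.

SUBSCRIPTION_PER_MONTH = 200   # USD, from the comment
AVG_EXTRA_API_PER_MONTH = 50   # USD, ASSUMPTION (only the $200 monthly max is stated)
HOURLY_VALUE = 100             # USD/hour, ASSUMPTION about what an hour of work is worth


def annual_cost(subscription: float, extra_api: float) -> float:
    """Yearly spend given a monthly subscription and average monthly API extras."""
    return 12 * (subscription + extra_api)


def breakeven_hours(annual_saving: float, hourly_value: float) -> float:
    """Hours per year you can lose to a weaker model before the saving is gone."""
    return annual_saving / hourly_value


if __name__ == "__main__":
    print(annual_cost(SUBSCRIPTION_PER_MONTH, AVG_EXTRA_API_PER_MONTH))
    print(breakeven_hours(2000, HOURLY_VALUE))
```

With these assumed inputs, a $2000 saving is wiped out once the cheaper model costs you more than about 20 hours over the year, which is well under half an hour a week.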
Now, if they come back and tell me I can't spend as much on tokens, I'll have to change my strategy. But everything I'm hearing so far is we're going to be increasing our token spend this year and probably next year too. Not crazy increases but maybe enough to still keep using the latest models without being anxious about every prompt.
I've just recently started trying out DeepSeek 4 Flash and I was very skeptical at first because I've had some really good experiences with GPT-5.{4,5}, and couldn't possibly believe that this model they charge nothing for could give me similar results, but it absolutely shreds through things and ends up giving me very good answers in almost no time. I also like that it doesn't really seem to have much personality, it's given me mostly just facts and data so far without any additions to the prompt by me.
In my own agent I also specifically prompt to remove flowery language, snark, etc., but I haven't tried it with models like GPT-5.x, which I've found have too much personality and try too hard to make it seem like I'm talking to a human.
I ask AI a lot of questions, not only about code but about my personal life, and I would be willing to pay very large sums to have the best quality output.
My Framework Desktop does much of the same work as my Claude subscription at work (Cowork, chats) for 100W of power draw and a little patience waiting for a slow GPU with limited memory bandwidth to crunch the numbers. Agentic coding is obviously weaker but CRUD development and visualization dashboards are within reach, and I'm usually pleasantly surprised at its ability to self-manage devops.
At my prior job there was still what felt like a strong enough correlation between my actual performance and my pay that I don't think I would have had a hard time justifying the expense there either; now I absolutely don't. With the current state of the models, it's baffling to me to hear about professional software developers planning their work around their $20/mo subscription's quotas.
Obviously it's more complicated than more tokens = more productive, but I see them less like SaaS and more like gasoline, where if I run out or need more to do what I'm doing, as long as I'm not being wasteful, I just buy more. Why would I waste a day walking 30 miles when I can just pay $5 for gasoline and drive?
I've wasted over a hundred Euros re-doing work that was done badly due to the model not being up to task (Vue with TS + wrapper components around PrimeVue, needing to handle event and property passthrough and deal with the stupid Vue SFC issues, TS made this much worse than JS would be). I think it was the GLM model through Cerebras Code at the time, in addition to some GPT and Gemini models with the API pricing.
That said, DeepSeek V4 Pro is pretty good and I can totally see myself offloading some of the work, as long as a better model reviews the work and provides suggestions/tests for it.
doesn't invalidate the rest of us working on tough problems that demand more expensive models and are valuable enough to justify them
A $20 Claude sub goes a long way when you plan with Opus and execute with Sonnet.
1. The sheer number of tokens that a coding agent can use flipped the math upside down on this equation. If you use the most expensive model for everything those costs quickly become untenable, even for software companies.
2. We realized many of the coding problems we're solving aren't incredibly difficult.
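To make the "expensive model for planning, cheaper model for the bulk of the tokens" idea concrete, here is a minimal cost sketch. The model names, per-token prices, and token counts are all hypothetical placeholders, not real vendor pricing; swap in your own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # HYPOTHETICAL blended price, not real pricing


def cost(model: Model, tokens: int) -> float:
    """Cost in USD of pushing `tokens` tokens through `model`."""
    return model.usd_per_million_tokens * tokens / 1_000_000


def session_cost(plan_tokens: int, exec_tokens: int,
                 planner: Model, executor: Model) -> float:
    """Total cost when planning and execution may use different models."""
    return cost(planner, plan_tokens) + cost(executor, exec_tokens)


# Hypothetical models and prices, for illustration only.
BIG = Model("big-model", 30.0)
SMALL = Model("small-model", 6.0)

if __name__ == "__main__":
    plan, execute = 200_000, 5_000_000  # ASSUMED token counts for one agent session
    print("everything on big:", session_cost(plan, execute, BIG, BIG))
    print("plan big, run small:", session_cost(plan, execute, BIG, SMALL))
```

Because execution dominates the token count in an agent loop, almost all of the saving comes from which model does the execution, not the planning. Whether the cheaper executor is actually good enough for the task is the part this sketch cannot tell you.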