The math doesn’t math.
I know because I saw how people reacted to the 4o model. I can see Opus behaving differently enough that I pick it for certain tasks.
For a while, you'd start a hardware refresh every 2-3 years. As companies moved into more and more training, that timeframe started to shrink: from 36 months to 24 months, then from 24 months to around 16-18 months. When I last checked, last year, it was at 12 months. I think things may have slowed because of component availability, but otherwise whole data centers would be only 6-12 months into full operation before starting a refresh cycle.
Not to mention the massive increase in per-rack power density and cooling demand that this entails.
So no, "AI costs" have not gone down. In fact, they are more expensive for training AND inference than ever.
This is why many are concerned about the heroin drip of API costs into orgs. For the companies that are public, look into their financials. It's gonna hit companies and high-volume users like a ton of bricks.
- if AI costs go down, you can ask how the companies will make a profit and then predict the bubble popping
- if AI costs go up, you can ask how people will afford it and then predict the bubble popping
- if companies actually do make a profit, then you can say the companies are getting too big and powerful, so it's a bad thing for consumers
Essentially, you have left yourself no path, or at best a very narrow one, where you'd be happy with the outcome.
Like, what if they don't necessarily have to be super duper money-making machines to legitimize how useful and nice they are for you? Is that even conceivable? What if tomorrow we all decided they are more like utilities? Would that change anything intrinsic about them for you?
Likewise, the quality of what I can get from a local model like Qwen 3.6 on an RTX 5090 is light-years ahead of what I could get a year ago on the same hardware.