I don't think many realize how cheap the alternative models are becoming. I prefer SOTA models for key work, but I can also spend 10X as many tokens on an open model hosted by a non-VC-subsidized provider (one selling at a profit) for tasks that can tolerate slightly less quality.
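The arithmetic behind the 10X claim is easy to sketch. The prices below are purely illustrative placeholders, not real quotes from any provider:

```python
# Hypothetical per-million-token prices (illustrative assumptions, not real quotes):
SOTA_PRICE = 15.00   # $/M tokens for a frontier model
OPEN_PRICE = 0.50    # $/M tokens for an open model at a for-profit host

def job_cost(tokens_millions, price_per_million):
    """Dollar cost for a job of the given size at the given price."""
    return tokens_millions * price_per_million

# One pass on the SOTA model vs. 10X as many tokens on the open model:
sota_cost = job_cost(1, SOTA_PRICE)       # 1M tokens on the frontier model
open_cost = job_cost(10, OPEN_PRICE)      # 10M tokens on the open model
print(sota_cost, open_cost)  # 15.0 vs 5.0: 10X the tokens, a third of the cost
```

Under these (made-up) prices, even a 30X price gap leaves headroom to burn an order of magnitude more tokens and still come out ahead.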
The situation is only getting better as models improve and data centers get built out.
Bedrock isn't the cheapest either, although I'm fairly sure they aren't being VC-subsidized.
There are definitely cheap tokens out there. The big gotcha is "for tasks that can tolerate slightly less quality"
I think everyone claiming that inference is getting more expensive is unaware that there are more LLM providers than Google, Anthropic, and OpenAI.
Face-scanning? Iris patterns?
https://www.google.com/search?q=identify+anonymous+visa+mast...
Try the exorbitant expense and ballooning waste of generated electricity and usable water.