An alternative perspective is, devs highly value coding agents, and are willing to pay more because they're so useful. In other words, the market value of this limited resource is being adjusted to be closer to reality.
Inference is not free, so all providers have a financial limit, and all providers have limited GPU/memory, so there's a physical material limit.
I suggest looking at the profits of these companies (while they scramble to stay competitive).
It will get faster, but there are no singularities in the real world. Except possibly black holes, but we can't even be sure of that.