[citation needed]
There is certainly economic pressure to create exponential demand for tokens, but we've already seen a pullback from the costly "token maxing" that companies were pushing last year.
It's also pretty unclear to what degree the RAM shortage is driven by inference (versus by training). We're rapidly approaching the point where frontier models are "good enough" for everyday use, and at some point we're going to hit diminishing returns on training new trillion-parameter models...