Batch inference is much more efficient, and so is running the hardware around the clock. Cloud providers can absolutely pay more for hardware and still make money off you.
Cloud providers can keep paying more for RAM until all the RAM producers withdraw from the consumer market, and then prices will go back down.
End users will still get access to RAM. The cloud terminal they purchase from Apple, Google, Samsung, or HP will have all the RAM it will ever need directly soldered onto it.
Doesn’t Apple place RAM directly into the SoC package? We aren’t even talking about soldering it to motherboards anymore; it comes in the same package as the CPU, much as it does with a GPU.
More like RAM producers are selling to the highest bidder, no? If this doesn't peter out, supply will eventually normalize at a higher but less insane price.