I don't actually know what the rate of growth was before October; I'm sure someone around here will, though.
What if it's in everyone's interest to buy computers at, say, 1/3 the rate and switch everything over to HBM?
The discrepancy between compute and memory has been growing for ages. Perhaps a painful switch to HBM is exactly what we need?
Would you rather have 3 intermediate computers with low memory bandwidth, or wait a little longer, statistically speaking, so that we can all enjoy a new computer at 1/3 the rate but with much higher bandwidth than the area ratio?
As for 20-25% growth not being enough, I think it's not that far off if we assume data center build-out plans hit a wall and slow down significantly, and the AI heat starts to cool off.
I don't think 20-25% is enough in the short term, but if the AI build-out stops within the year, we'll have a massive oversupply instead of an undersupply.
Let me explain. Imagine CXMT grows massive and builds a lot of fabs, so much so that it becomes the leader in multiple segments, and then market demand cools off.
Then CXMT, the company that invested massively, has an oversupply, so it undercuts every other memory company.
In other words, Samsung and SK Hynix are dead, and to protect Micron the US now has a 10000% tariff on memory.
Imagine that, because it has happened before. If you don't play the boom-and-bust game, someone else will, because the market is very large during a boom, and the player scaling up the most generally isn't the one with margins to protect, so it can undercut the others.
The Asian memory chip giants were made by undercutting European and American companies. The American companies adapted by moving manufacturing to Asia, and the European ones got bought for pennies or dissolved.
But can massive gains still be made? Definitely.
The entire AI hype is based on the paper "Attention Is All You Need", and attention basically means keeping a huge matrix covering all the tokens in memory. How well you can optimize this attention layer is basically what most architectures are working on to improve performance and memory usage.
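To put rough numbers on the memory side of attention, here is a back-of-envelope sketch of how the KV cache grows with context length for plain multi-head attention. The model dimensions (32 layers, 32 KV heads, head dimension 128, fp16 values) are hypothetical and chosen only for illustration, not taken from any particular model.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch=1):
    """Approximate KV-cache size for standard multi-head attention.

    The leading 2 accounts for storing one key tensor and one value
    tensor per layer. bytes_per_value=2 assumes fp16/bf16 storage.
    """
    return 2 * batch * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical model dimensions, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 32, 128

for context in (4096, 32768, 131072):
    gib = kv_cache_bytes(context, N_LAYERS, N_KV_HEADS, HEAD_DIM) / 2**30
    print(f"{context:>7} tokens -> {gib:.0f} GiB of KV cache")
```

The cache grows linearly with context length, so under these assumptions going from 4k to 128k tokens multiplies the KV-cache footprint by 32 for a single sequence.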
The only one with significant gains here is DeepSeek (or so I would like to believe, because the others don't publish their work for folks like me outside the big AI labs to read). Their MLA architecture reduced KV-cache memory requirements by up to 90%, and of course that's a purely architectural change.
With some quantization on top, like TurboQuant from Google, you could push it down to ~1/3 of that. So roughly 96-97% memory savings when talking about the KV cache.
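The arithmetic behind that combined figure, as a quick sketch. It assumes the two savings simply multiply (up to 90% off from the architecture, then the remainder shrunk to about 1/3 by quantization), which is a simplification; the function name is made up for illustration.

```python
def remaining_fraction(arch_reduction, quant_factor):
    """Fraction of the original KV cache left after both steps.

    arch_reduction: share removed by the architectural change (0.90 = 90%).
    quant_factor:   share of the remainder kept after quantization (1/3).
    Assumes the two effects compose multiplicatively.
    """
    return (1.0 - arch_reduction) * quant_factor


remaining = remaining_fraction(arch_reduction=0.90, quant_factor=1 / 3)
savings = 1.0 - remaining
print(f"remaining: {remaining:.1%}, savings: {savings:.1%}")
```

That works out to a bit over 3% of the original cache remaining, which is where the ~96% savings number comes from.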
But the models are close to saturated when it comes to quantization-based memory optimizations. We will need to see architectural changes for a significant shift now.
We just haven’t reached the point of diminishing returns for gen AI capabilities yet.
Models will get more useful with a larger context size or a larger parameter count. Then people will just use the models even more, leading to even more memory demand.