Looking at the history of the memory industry, the biggest risk is that a firm would overproduce and go bankrupt. Maybe this time is different, but so far no memory chip maker has gone under because its competition increased capacity.

by minraws 31 minutes ago

I might be wrong, but your second point can't be true if the first one is true.

Let me explain. Imagine CXMT grows massively and builds a lot of fabs, so much so that it becomes the leader in multiple segments, and then market demand cools off.

CXMT, the company that invested so heavily, now has oversupply, so it undercuts every other memory company.

In other words, Samsung and SK Hynix are dead, and to protect Micron the US now has a 10000% tariff on memory imports.

This is not hypothetical; it has happened before. If you don't play the boom-and-bust game, someone else will, because the market is very large during a boom, and the player scaling up fastest usually has no margins to protect and can afford to undercut the others.

Asian memory chip giants were built by undercutting European and American companies. American companies adapted by moving manufacturing to Asia, and European ones were bought for pennies or dissolved.

by galangalalgol 1 hour ago

Is there any indication that research is being focused on reducing the memory footprint of inference for frontier-class models? Is the low-hanging fruit already gone there?

by minraws 44 minutes ago

Low-hanging? How low-hanging are we talking? The basic stuff is gone. The big challenges around quantization were largely solved two years ago, and we have just been improving from there.

But can massive gains still be made? Definitely.

The entire AI hype is built on the paper "Attention Is All You Need", and attention basically means keeping a huge matrix covering all the tokens in memory. How well you can optimize this attention layer is what most architectures focus on to improve performance and memory usage.
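For a rough sense of scale, here is a back-of-the-envelope sketch of how the KV cache grows with context length in a plain multi-head attention model. The layer count, head count, and head size below are made-up numbers for illustration, not the dimensions of any real model, and `kv_cache_bytes` is just a name I picked.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV-cache size for standard multi-head attention.

    Each layer stores one key tensor and one value tensor (hence the
    factor of 2), each holding num_kv_heads * head_dim values per token.
    """
    values_per_token = 2 * num_layers * num_kv_heads * head_dim
    return values_per_token * seq_len * bytes_per_value * batch_size


# Hypothetical model: 60 layers, 64 KV heads of size 128, 16-bit values.
for seq_len in (8_192, 131_072):
    gib = kv_cache_bytes(60, 64, 128, seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.0f} GiB of KV cache")
```

The cache grows linearly with context length, which is why long contexts put so much pressure on memory even when the weights themselves fit.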

The only one with significant gains here is DeepSeek (or so I would like to believe, because others don't publish their work for folks like me outside the big AI labs to read). Their MLA architecture reduced KV-cache memory requirements by up to 90%; of course, that is a purely architectural change.

With quantization such as TurboQuant from Google, you could push it down to ~1/3 of that, so roughly 97% memory savings on the KV cache.
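The two reductions compose multiplicatively, which is easy to sanity-check. This is only a sketch of the arithmetic using the figures quoted in this comment (an "up to 90%" reduction, then ~1/3 of what remains); `combined_saving` is a hypothetical helper, and the numbers are not measurements.

```python
def combined_saving(*remaining_fractions):
    """Total saving when each step keeps only a fraction of the previous size."""
    remaining = 1.0
    for fraction in remaining_fractions:
        remaining *= fraction
    return 1.0 - remaining


mla_remaining = 0.10      # "up to 90%" reduction leaves 10% of the cache
quant_remaining = 1 / 3   # quantization leaves ~1/3 of that

saving = combined_saving(mla_remaining, quant_remaining)
print(f"combined KV-cache saving: {saving:.1%}")
```

So about 3.3% of the original KV cache would remain if both best-case figures held at once.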

But the models are close to saturation for quantization-based memory optimizations. We will need architectural changes to see a significant shift now.

by aurareturn 47 minutes ago

If they manage to make memory more efficient, they’ll just increase the context size and/or model size.

We just haven’t reached the point of diminishing returns for gen AI capabilities yet.

Models get more useful with a larger context size or a higher parameter count. Then people will just use the models even more, leading to even more memory demand.

by zx8080 1 hour ago

What is the risk? Competition is good for consumers.

by LPisGood 1 hour ago

The risk is to the business, not the consumers.