> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

But if you had such a breakthrough, couldn't you also apply it and run 200T models on today's datacenters?
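For a rough sense of scale between the two sizes, the sketch below estimates the memory needed just to hold the weights. It assumes dense storage at a fixed bit width and ignores the KV cache, activations, and runtime overhead. The 16/8/4-bit widths are illustrative picks, not something stated in the thread.

```python
# Back-of-the-envelope weight memory: parameters x bits per weight / 8.
# Assumption: dense weights only; KV cache, activations and overhead are ignored.

GB = 1e9  # decimal gigabytes
TB = 1e12

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed just to store the model weights."""
    return params * bits_per_weight / 8

for name, params in [("200B", 200e9), ("200T", 200e12)]:
    for bits in (16, 8, 4):  # illustrative bit widths
        size = weight_bytes(params, bits)
        print(f"{name} @ {bits}-bit: {size / GB:,.0f} GB ({size / TB:,.1f} TB)")
```

Whatever trick shrinks the 200B case shrinks the 200T case by the same factor. The 1000x gap in parameter count stays a 1000x gap in memory.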

by pennomi 1 day ago

That assumes scaling laws still hold up. A bigger model might end up only incrementally more intelligent.

by ACCount37 23 hours ago

They do. Mythos kicked ass while it lasted. And what we know of the scaling law curves promises us even more gains in the future.

"The future" being "whenever training and inference at increased scale become economical", which is probably bounded by new generations of hardware but might also be pushed forward by algorithmic advances.
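The two positions here ("only incrementally more intelligent" vs. "the curves promise more gains") fit together under a power-law scaling law. This is a minimal sketch, assuming the Chinchilla-style parametric form L(N, D) = E + A/N^α + B/D^β with constants roughly matching the fit reported by Hoffmann et al. (2022). The training-set size is a hypothetical fixed value, and treating loss as a proxy for "intelligence" is itself an assumption.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
# Constants approximate the Hoffmann et al. (2022) fit; illustrative only.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def gain_from_scaling(n_params: float, n_tokens: float, factor: float = 5.0) -> float:
    """Loss reduction from multiplying parameters by `factor`, holding data fixed."""
    return loss(n_params, n_tokens) - loss(n_params * factor, n_tokens)

TOKENS = 15e12  # hypothetical training-set size, held fixed for the comparison

for n in (20e9, 200e9, 2e12):
    print(f"{n:.0e} params: loss {loss(n, TOKENS):.3f}, "
          f"5x params lowers it by {gain_from_scaling(n, TOKENS):.4f}")
```

Each 5x step in parameters still lowers predicted loss, but by a smaller absolute amount than the step before. Under this form, both commenters can be right at once.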

by phkahler 23 hours ago

I think they're out of training data though...

by ACCount37 23 hours ago

Synthetics are often used for "data amplification" nowadays. Extra compute covers a multitude of sins.

by ACCount37 23 hours ago

Not only could you, you would also want to.

The likes of Mythos show that the scaling laws are real: you can scale total params 5x and active params 2x and get meaningful gains. If "inference per param" gets cheaper, up the params and get more intelligence for the same price.
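"x5/x2 the total/active params" presumably refers to a mixture-of-experts model with five times the total and twice the active parameters. This is a minimal sketch of what that does to serving cost. It assumes the common approximation of about 2 FLOPs per active parameter per generated token, and that weight memory tracks total parameters. The baseline sizes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MoEModel:
    total_params: float   # all experts; drives weight memory
    active_params: float  # params used per token; drives compute

    def flops_per_token(self) -> float:
        # Common approximation: ~2 FLOPs (multiply + add) per active weight per token.
        return 2 * self.active_params

    def scaled(self, total_factor: float, active_factor: float) -> "MoEModel":
        """Return a copy with total and active parameter counts scaled independently."""
        return MoEModel(self.total_params * total_factor,
                        self.active_params * active_factor)

base = MoEModel(total_params=1e12, active_params=50e9)  # hypothetical baseline
bigger = base.scaled(total_factor=5, active_factor=2)   # the "x5/x2" from the comment

compute_ratio = bigger.flops_per_token() / base.flops_per_token()
memory_ratio = bigger.total_params / base.total_params
print(f"compute per token: {compute_ratio:.0f}x, weight memory: {memory_ratio:.0f}x")
```

Compute per token only doubles while weight memory grows 5x. If the per-parameter cost of inference halves, the larger model's compute cost per token is back where the baseline's was.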

by deweywsu 1 day ago

Quite true

by simonebrunozzi 1 day ago

Interesting comment, but the comparison with hard disk drives is probably unfair.

The IBM 350 was commercialized 70 years ago; it took 70 years for someone like you to be able to compare that to a multi-TB SSD.

Furthermore, nothing guarantees that Moore's Law will apply to LLMs for decades to come.
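To put a number on the storage comparison: the IBM 350 held about 5 million 6-bit characters, roughly 3.75 MB. The sketch below treats the "multi-TB SSD" as a 4 TB drive (an assumption) and uses the 70-year span from the comment to back out the implied compound annual growth rate.

```python
import math

IBM_350_BYTES = 5_000_000 * 6 / 8  # 5 million 6-bit characters, about 3.75 MB
SSD_BYTES = 4e12                   # assumed "multi-TB" SSD: 4 TB
YEARS = 70                         # span quoted in the comment

# Compound growth: ratio = (1 + r) ** YEARS, so r = ratio ** (1 / YEARS) - 1.
ratio = SSD_BYTES / IBM_350_BYTES
annual_growth = ratio ** (1 / YEARS) - 1
doubling_years = math.log(2) / math.log(1 + annual_growth)

print(f"capacity ratio: {ratio:,.0f}x")
print(f"implied growth: {annual_growth:.1%} per year, "
      f"doubling every {doubling_years:.1f} years")
```

This works out to roughly 22% a year, or a doubling about every three and a half years. That is steady compounding over decades rather than a single breakthrough, and as the comment says, nothing guarantees LLM hardware will follow a similar curve.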

by deweywsu 1 day ago

Very true. All I am basing my comment on is the speedup AI has demonstrated when applied to software development, and my inference that it might enable a similar 10X or 100X improvement in hardware architecture as well as in LLM structure and/or interface methods. If that speedup carries over to AI performance itself, the 70 years it took to improve storage technology might be compressed, producing a step change in AI performance in a drastically shorter timeframe.

by LZ_Khan 1 day ago

I think Jevons Paradox and scaling laws will make this not the case. If bigger models are always better (which they seem to be), then we will always need high-end hardware.
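Jevons Paradox can be made concrete with a constant-elasticity demand curve. In this sketch, which is a textbook simplification and not a model of the actual AI market, an efficiency gain lowers the resource cost per unit of service. Whether total resource use rises or falls then depends on whether demand elasticity is above or below 1. The 10x efficiency factor and the elasticity values are hypothetical.

```python
def resource_use(efficiency: float, elasticity: float,
                 base_demand: float = 1.0, resource_price: float = 1.0) -> float:
    """Total resource (e.g. hardware-hours) consumed under constant-elasticity demand.

    service price    = resource_price / efficiency  (fewer resources per unit of service)
    service demanded = base_demand * service_price ** -elasticity
    resource used    = service demanded / efficiency
    """
    service_price = resource_price / efficiency
    service_demand = base_demand * service_price ** -elasticity
    return service_demand / efficiency

for elasticity in (0.5, 1.0, 1.5):
    before = resource_use(efficiency=1.0, elasticity=elasticity)
    after = resource_use(efficiency=10.0, elasticity=elasticity)  # hypothetical 10x breakthrough
    print(f"elasticity {elasticity}: resource use changes {after / before:.2f}x")
```

With elasticity above 1, a 10x efficiency gain roughly triples total resource use in this toy model. Below 1, it cuts resource use. That is the sense in which cheaper inference could mean more demand for high-end hardware, not less.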

by gdiamos 1 day ago

Usually breakthroughs in computing lead to more usage of computing, not less.

by 3abiton 1 day ago

> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

I think there will be specialized hardware (besides GPUs) custom-made for LLMs. Yes, TPUs exist, but mainly for datacenters. GPUs exist, but they were adapted from mainly graphics applications. Once the datacenter demand dries up, innovation will kick in.

by andriy_koval 1 day ago

> I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI

It will build the expertise, infrastructure, and know-how foundation for the next generation of hardware.

by hyhatqtv 1 day ago

Looking at the development of memory bandwidth, capacity, and prices over the last 10 years, there is little indication that's likely.
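One reason memory bandwidth matters here: at batch size 1, generating each token typically requires streaming every active weight from memory once. Bandwidth divided by model size therefore gives an upper bound on decode speed. This is a minimal sketch under that assumption, ignoring the KV cache, compute limits, and batching. The bandwidth figures are rough illustrative values, not measurements.

```python
def max_decode_tokens_per_s(active_params: float, bits_per_weight: float,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed, assuming each generated token
    must read all active weights from memory once (KV cache and compute ignored)."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Rough, illustrative bandwidth figures.
for label, bw in [("desktop dual-channel DDR4, ~50 GB/s", 50),
                  ("datacenter GPU HBM, ~3,000 GB/s", 3000)]:
    rate = max_decode_tokens_per_s(active_params=200e9, bits_per_weight=4, bandwidth_gb_s=bw)
    print(f"{label}: at most {rate:.1f} tokens/s for a dense 200B model at 4-bit")
```

Under these assumptions, a dense 200B model at 4-bit tops out around 0.5 tokens/s on ~50 GB/s of desktop memory bandwidth, versus about 30 tokens/s at ~3,000 GB/s.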

by dwa3592 1 day ago

True, but as someone else pointed out, by then we'd be interested in running a 200T-parameter model rather than a 200B one. Why, you might ask? The law of human laziness: a human will become as lazy as the technology allows. With a 200T or 20,000T model, I'd be heavily incentivized to ask it to make the bread I currently enjoy making, or to create a movie for me (starring myself) that maximizes the dopamine production in my brain.

by zabriel_goss 1 day ago

I agree with you. Stepping stones are still a part of getting there, if only to be briefly useful.

by Rekindle8090 20 hours ago

[dead]