This is clearly where the industry is going, imho. Everyone who is playing with LLMs wants a laptop with enough grunt to run a decent model locally.

We've been sat with basically the same PC specs for ~20 years - our current specs are within an order of magnitude of the ones we could buy back in 2010. This is not really constrained by tech, as we could have much, much larger machines. It's more because there's no mass demand for much, much larger machines - if it's big enough to run Office apps or VSCode then you're good to go. The exponential growth we saw in the '90s was driven as much by software demand as it was by hardware development.

I can see the next 10 years producing the same kind of push for larger machines that the '90s did. And we should probably expect the same kind of standards churn, as our existing technologies for storage, memory, etc. don't scale up enough and new technologies become worth developing because there's demand for them.

It seems relevant for playing with LLMs, but for actual work this seems far off for me.

My productivity benefits from the best intelligence available, a decent context size, and a batch size of four.

While my MacBook has 48 GB of RAM, not only do I want all of the above at a decent speed, but I also need the machine to run development tools and test suites, ideally without the fans blasting at full load.

For the foreseeable future I will stay with providers rather than local inference, apart from niche use cases.
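For a sense of why context size and batch size eat memory on a 48 GB machine, here is a rough back-of-the-envelope sketch of the key/value cache alone. The model dimensions are hypothetical (roughly what a mid-sized model with grouped-query attention might use, not any specific model), and the estimate ignores the weights themselves, activations, and any KV-cache quantization.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, batch_size: int,
                   bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size for a decoder-only transformer.

    The leading factor 2 is one key vector plus one value vector,
    stored per layer, per KV head, per token, per sequence in the batch.
    bytes_per_elem=2 assumes a 16-bit (fp16/bf16) cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * batch_size * bytes_per_elem


# Hypothetical model config: 48 layers, 8 KV heads, head_dim 128.
cache = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                       context_len=128 * 1024, batch_size=4)
print(f"KV cache: {cache / GIB:.0f} GiB")  # cache only, weights not included
```

With those assumed numbers, a 128K-token context at batch size four needs about 96 GiB of cache (24 GiB per sequence) before a single weight is loaded; the size scales linearly with context length and batch size.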

Yeah, agree, but that's the point, really. If I could buy a 16 TB machine with 4 TPUs for ~$5K and run a frontier model locally, I would.

I'm in Australia, so we're probably not getting access to Fable again. We're learning that a faster model + better harness/framework > smarter model. So being able to run GLM5.2 locally and super-fast would be great.

My only concern is that the same specs today would cost 10x more, given the trajectory of memory prices lately.

I think this is where the new technology comes in. There is demand for 10x (or 1000x) the memory that we're using at the moment, so someone/something will satisfy that demand. We haven't had that demand up until now, because 16 GB was a perfectly reasonable amount of memory that could run pretty much anything, and if that wasn't enough then 32 GB would be. There was zero demand for 16 TB memory machines because no-one had any application for that much memory. Now that's changing, and there is demand for that much, so we'd expect to see it being made available.

But the existing tech we're using for 16 GB probably isn't going to scale to 16 TB at a reasonable price point. And the price point is relatively inelastic - people are used to paying <$5K for their computers, and they're not going to go much above that. You'll get early adopters paying $10K or more for a machine that large, but not the early majority. And even then, obviously, $10K is not going to buy you a 16 TB memory machine.

So there's room for a new technology to come in, where there wasn't previously. This is what happened all through the '90s, and we churned through a bunch of standards and technologies trying to keep up with demand.

> memory prices coming down

Are they?

I suspect AI labs are buying stuff not just for their own use, but to make local use too expensive to be an option :-( And they can always make the "best" frontier model even bigger (though only fractionally better) so it's always out of reach of local use, while consumer laptops have nearly the same amount of memory they had a decade ago.

    m                  o
    o
    d
    e
    l             o
    s
    i        o
    z    o
    e  2020 2022 2024 2026
    
    
    c                  
    h
    e
    a
    p             o      
    R        o     
    A    o                
    M                   o
       2020 2022 2024 2026
For most tasks, I don't value LLMs based on their absolute capabilities. I wouldn't want to use GPT-4 today even if it were free.

I'm being very sarcastic. Local model evangelists seem to just be operating on vibes when they say these things, and are completely disconnected from how models work and what the hardware requirements are.

Prices aren't going down, and consumer platforms are being shipped with less RAM so we can be sold cloud products. This isn't going to happen.

Can you please explain to me how you're going to fit 700B-1T params in 64GB of RAM? You realize the memory requirements are proportional to model size?
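The "proportional to model size" point is simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch, assuming every parameter is stored densely and counting the weights only (no KV cache, no runtime overhead); the function name is just for illustration:

```python
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal GB (1 GB = 1e9 bytes).

    params_billion * 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_param / 8


RAM_GB = 64  # the machine size from the comment above

for params in (700, 1000):
    for bits in (16, 8, 4):
        need = weight_gb(params, bits)
        print(f"{params}B @ {bits}-bit: {need:,.0f} GB of weights "
              f"({need / RAM_GB:.1f}x the RAM)")
```

Even at an aggressive 4-bit quantization, 700B parameters need about 350 GB for the weights alone, several times more than 64 GB.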

> Can you please explain to me how you're going to fit 700B-1T params in 64GB of RAM?

You don't. What they're saying is that today's small models (the ones that fit on consumer hardware) are better than yesteryear's top models. GPT-4 was reportedly an 8x 220B (~1.8T) MoE, and today you can run a 30-120B model that beats it handily in real-world tasks.

Similarly, 4-20B models beat GPT-3 (175B), and so on.

There is a sweet spot of "good enough" that the small models can reach, where you get equivalent tasks solved fully locally. They'll never touch SotA, but they'll reach the SotA of two to four years ago, which, depending on the task, can be "good enough".
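Turning the same arithmetic around gives the "sweet spot" argument: for a given RAM budget, roughly how many parameters can you load? This sketch assumes quantized weights and that about 75% of RAM is available for them (the rest going to the OS, KV cache, and tooling); both the 75% figure and the function name are illustrative assumptions, not measurements.

```python
def max_params_billion(ram_gb: float, bits_per_param: int,
                       usable_fraction: float = 0.75) -> float:
    """Largest parameter count (in billions) whose weights fit in RAM.

    With 1 GB = 1e9 bytes, usable GB * 8 bits-per-byte / bits-per-param
    gives billions of parameters directly. usable_fraction is an assumed
    share of RAM left over for weights after everything else.
    """
    return ram_gb * usable_fraction * 8 / bits_per_param


for ram in (32, 64, 128):
    for bits in (8, 4):
        print(f"{ram} GB RAM @ {bits}-bit: up to ~{max_params_billion(ram, bits):.0f}B params")
```

Under those assumptions a 64 GB machine tops out around 96B parameters at 4-bit and a 128 GB machine around 192B, so the 30-120B range mentioned above is within reach of high-end consumer hardware, while 700B-1T is not.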
