
The cynical take is increasingly looking like the only rational one:

The promised mega-data center deals are meant to boost valuations today, not serve tons of customers three years from now.

by _heimdall 15 hours ago

It seems pretty clearly in line with the dotcom bubble to me. Every company claims to be a leading AI company, those building infrastructure are promising the moon and getting 1/3 of the way there, and no one knows how to monetize it enough to justify the hype or expense.

by jjordan 15 hours ago

oof, this bubble popping is gonna be brutal.

by krupan 14 hours ago

It took us only, what, 70-ish years of computer and AI research to get to this point, so yeah, probably just one little thing and then we'll have it </sarcasm>

Seriously. I have never ever seen so many people so willingly drink the marketing Kool-Aid from companies selling their product. It's scarier to me than any threat of AI actually disrupting society (because it is so far from being capable of doing that).

by i_love_retros 16 hours ago

What would that breakthrough be?

by Waterluvian 16 hours ago

Magic math and computer science that allows us to get the same quality response for a fraction of the GPU compute.

by intothemild 15 hours ago

That's already happening. Qwen3.6 and Gemma4.

Basically small and medium models that are crazy well trained for their sizes.

Then we have a lot of speculative decoding stuff like MTP (multi-token prediction) and others coming to speed up responses, and finally better quantisation to use less memory.

Local LLMs are the future, and the larger labs know that the open models will eat their lunch once people realise the gap is only a few months. If we were fine with LLMs a couple of months ago, we're fine with the open models now.
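To put rough numbers on the "use less memory" point: a minimal sketch, assuming weight storage dominates memory use, that computes the weight footprint at different bit widths and shows the simplest form of quantisation (symmetric, one scale per tensor, int8). The 8-billion-parameter size and all function names are hypothetical; real schemes typically use per-channel or per-block scales and lower bit widths.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to hold the weights, in GB (1 GB = 1e9 bytes).

    Ignores activations, the KV cache, and the small overhead of storing scales.
    """
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantisation: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range we use.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes and the shared scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    # Hypothetical 8B-parameter model stored at different precisions.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(8e9, bits):.1f} GB")

    w = [0.5, -1.27, 0.01, 1.0]
    q, scale = quantize_int8(w)
    print("int8 codes:", q, "scale:", scale)
    print("dequantized:", dequantize_int8(q, scale))
```

Halving the bits halves the footprint; the cost is a rounding error of at most half a quantisation step per weight, which is why "better quantisation" is mostly about keeping that error from hurting output quality.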

by krupan 14 hours ago

And how were those models developed and trained?

by lelanthran 14 hours ago

> And how were those models developed and trained?

That's irrelevant to my decision to use local or not.

by krupan 14 hours ago

That's not what this thread is about, though. We're saying some new breakthrough is needed, someone said it already has happened, and I'm asking if it really has. Has it? I don't think so; those models aren't fundamentally different from other LLMs.

by lelanthran 13 hours ago

> We're saying some new breakthrough is needed, someone said it already has happened, and I'm asking if it really has.

I didn't read "and how were those models trained" as "Are we there yet?"

by intothemild 6 hours ago

There's a percentage of people who love to question how the open models were trained... they almost always try to make some argument that using the closed frontier models for distillation is a form of theft.

All while totally forgetting that the frontier models themselves stole an insane amount to get to where they are.

It's theft all the way across the board, and when someone argues that the open models' theft is bad but Altman's or Amodei's theft is good... they are revealing a lot about themselves.

by YZF 15 hours ago

The current LLMs are also "magic" so anything is possible. AFAIK there is no proof that the current architecture is optimal. And we have our brains as a pretty powerful local thinking machine as a counter-example to the idea that thinking has to happen in data centers.

by _heimdall 15 hours ago

I want to ask what makes them magic, but even those building LLMs don't really know what happens when they run inference...

I have to assume current architectures aren't optimal though, the idea that we stumbled into the one and only optimal solution seems almost impossible.

by toufka 15 hours ago

I mean, the most cutting-edge iPhones, iPads and MacBook Pros _today_ are quite capable of running today’s high-end local LLMs in real time.

If you project out that hardware just a couple of years, and the trained models out a couple of years, you end up in a place where it makes so much more sense to run them locally, for all sorts of latency, privacy, efficacy, and domain-specific reasons.

Not all that different from the old terminal/mainframe-to-PC shift.

Finally, hardware seems to have gotten out ahead of the software most folks use: watching YouTube, listening to music, playing a game or two. There was a time when playing an MP3 or watching a 4K video really taxed all but the nicest systems. Hardware fixed that problem, and it could very well fix this one.

by sofixa 15 hours ago

> I mean, the most cutting-edge iPhones, iPads and MacBook Pros _today_ are quite capable of running today’s high-end local LLMs in real time

Definitely not the high end local LLMs. The small ones, yes, absolutely.

> If you project out that hardware just a couple of years

One of the biggest bottlenecks for LLMs is memory capacity and bandwidth. With the current memory shortage, it's unlikely we'll see much improvement in average memory capacity or bandwidth on regular (not super-high-end) devices in the coming years.

Alternatively, we might get dedicated SLMs (small language models) for, e.g., phone-specific use cases that are optimised and run well.
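The bandwidth point can be made concrete with back-of-the-envelope arithmetic. A minimal sketch, assuming a dense model decoded one token at a time (batch size 1), where producing each token requires streaming every weight from memory once: the ceiling on decode speed is then roughly memory bandwidth divided by the size of the weights. The function name and the bandwidth/size figures are hypothetical illustrations, not specs of any particular phone or laptop; mixture-of-experts models, batching, and KV-cache traffic all change the picture.

```python
def decode_tokens_per_sec_upper_bound(bandwidth_gb_per_s: float, weight_gb: float) -> float:
    """Rough ceiling on single-stream decode speed for a dense model.

    Assumes each generated token reads all weights from memory once, and
    ignores KV-cache reads, compute limits, and on-chip caching.
    """
    if weight_gb <= 0:
        raise ValueError("weight_gb must be positive")
    return bandwidth_gb_per_s / weight_gb


if __name__ == "__main__":
    # Hypothetical bandwidths and weight sizes, for illustration only.
    for bandwidth in (50.0, 100.0, 400.0):
        for weights in (4.0, 16.0):
            rate = decode_tokens_per_sec_upper_bound(bandwidth, weights)
            print(f"{bandwidth:>5.0f} GB/s, {weights:>4.0f} GB weights -> <= {rate:.1f} tokens/s")
```

Under this model, doubling bandwidth or halving the weight size (e.g. via quantisation) roughly doubles the ceiling, which is why stagnant memory on mainstream devices caps how large a local model can run comfortably.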

by _heimdall 15 hours ago

I'd assume it's a totally different architecture that isn't based on storing a compressed dataset of all digital human text.