More than that, I think people overestimate how much AI will progress as you throw more compute at it. It’s the “9 women can’t deliver a baby in a month” equivalent of AI. Additional compute won’t magically give you AGI.
reply
Maybe not AGI, but if you look at the differences between, say, GPT-2 and GPT 5.5, it's remarkable how well it works to mostly just throw scale at the problem.
reply
The difference is a lot more than just throwing scale at it; pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.

reply
Correct. That is what I was trying to hint at. Yes, massive compute is needed to train AI, but it isn’t the only thing. A lot of research and experimentation goes into moving the marker just a little bit. Innovation can’t be forced into weekly sprints; it takes its own time.
reply
Research and experimentation on neural nets has been going on since the 70s (arguably much earlier even), but the lion's share of capability changes has come in the last couple of years.

Scale was really the unlock; the new pre- and post-training techniques and architectures are very cool and useful, but they definitely aren't the differentiators when comparing to the previous era of NLP.

reply
They already tried that with GPT-4 and GPT-4.5

They were allegedly massive but the cost and returns were not worth it.

reply
I was involved in three efforts to commercialize foundation models before they were ready in the 2010s, so I have a good picture of how progress works at this sort of thing, and the pace a lot of the industry has been talking about is unrealistic. People were disappointed with the rate of development of Apple Intelligence, but it's actually progressed at about the rate I expected.
reply
That seems to be because Apple's AI division sucks. OpenAI came in 2018 and chatGPT 2.0 was already way better than anything Apple ever did.
reply
I mean, Apple Intelligence has been a boondoggle. Siri has been consistently 3+ years behind in capabilities compared to even open source equivalents.

Feels less like the pace of foundation model development and more so a specific failure of one organization to do something important.

reply
Bad capabilities but maybe less wrong output? All the funny memes of Google explaining some fake aphorism aren't really something an Apple product would go for. Successful navigation of technology over the decades requires some timing finesse. I don’t know.
reply
Is that a problem for Meta though? They recently announced they're going to sell their excess compute, so I imagine the actual problem is they're resorting to doing that because AI isn't having nearly the effect/usage it was supposed to and now Zuck is being a sore winner about it
reply
I agree, I don't think it is the core problem.

Meta doesn't seem to be able to produce anything close to a frontier model. The selling of compute capacity seems to be acceptance of "compute is wasted on this crappy avocado model, we'd be better off allowing something better to run".

The problem is clearly in the model architecture, the training and the data fed into the model which is causing them to give up on using their compute exclusively for their own models. They can't get it right so may as well sell the compute to someone that can.

reply
If their training base is dominated by Facebook and Instagram posts then it makes sense that their model is full of shit.
reply
A modern instance of that old saw "you are what you eat".
reply
Meta has made some very strange decisions in terms of who it's hired to lead various aspects of AI, including the model-building efforts. Also lots to marvel at re: its ability to coordinate (or not coordinate) various efforts by all these big brains.

Can't help but think that Meta's digital networking expertise is built atop a human-networking clusterf*ck

reply
I was never really sold on their acquihire of Alexandr Wang as their head of AI being a coherent strategic decision. I just don’t see how his experience and background actually apply to frontier LLM model building.

I think there would easily be a few hundred other engineers and execs at frontier labs who are more in the loop on cutting-edge architecture/secret sauce, with a track record of actually doing it, who could be had for a fraction of the price.

reply
From the outside Meta's attempts to pivot from open source releases to fast follow closed models fell flat when they tried to prematurely monetize it. They could have owned the open weight model world but tried to pivot to closed weight chatbots before an actually viable revenue model appeared.
reply
Does Meta have the research talent to create a SOTA frontier model? Yann LeCun has left Meta and I don’t think either Alexandr Wang or Zuck has enough credibility to attract talent to create one.
reply
It's possible Yann LeCun wasn't the right guy either. He seemed to be more focused on finding the next model architecture rather than iterating on the current LLM architecture to build a competitive frontier model.
reply
If Meta is selling their compute and Twitter is selling their compute and the stuff doesn't do anything, you don't need an economics degree to figure out what's going to happen to the price of compute. That is especially true because 'compute' is a euphemism here: this is far from general-purpose capacity, those are specialized chips that largely do one thing.

All these companies are going to sit on their gazillion data centers once the mania dies down and will have a big problem about what to do with their mountain of hardware

reply
Well, Google refused to increase Meta's quota of tokens; even Google can't supply as many (paid) tokens as Meta is burning.
reply
It will scale inefficiently until efficiency breakthroughs occur, but it's really hard to predict when those breakthroughs will happen. Plan on the worst, but be ready and capable of capitalizing when it happens!
reply
That seems like such an easy thing to estimate with a bit of basic napkin math.
reply
For us, maybe, but for someone who never really used the workflow, or looked at the “thinking” output where models spin their tokens on the stupidest shit, I can see how it wasn’t obvious.
reply
I thought that's exactly what everyone anticipates? "Scaling laws" are all about exponential increases in compute and all that.
reply
And yet this doesn't turn out to be Meta's problem at all.

https://uk.pcmag.com/ai/165970/meta-exploring-option-to-sell...

Meta bought too many GPUs, has spare GPU capacity and they are exploring renting that capacity out.

The problem is not that the models need too much to do the job. If that were the case, Meta would not have spare capacity.

The problem is that the models currently can't be made to do the job.

reply
I think Meta’s massive compute investment was never about its 100,000 engineers running coding models, but its 3,500,000,000 users wanting to use AI in every single product (and some new ones: Meta AI, glasses, etc.) So I would think that’s the part that’s not being utilized anywhere near the amount they hoped...
reply
Do the 3.5 billion users want to use AI, or does Meta want to not get left behind and has shoehorned AI into all their products?
reply
Literally the only value the Facebook AI provides is amusement when the suggestions are so comically wrong/off-colour/surreal etc.
reply
Right. But that's the same thing, isn't it? AI can't be made to do the job in those products. The only products it can do are shallow toys.
reply
The idea that users wanted AI was always a fantasy. Especially for Meta's products.

The whole hype cycle has been pure delusion. Just like the Metaverse hype cycle before it.

reply
I think this is the problem for companies with a single person atop: when the company needs things that person isn't good at, the company cannot respond effectively. Zuckerberg was good at running a company to sell ads on an addictive platform; whether that will make him good at the next ten years of profitable tech innovation is difficult to see. People hate ads and dislike the addictions, so Anthropic or whoever has to walk a different path. They have multiple smart people working together to find that path; Meta does not seem to have that collective vision of competing experts to draw on.
reply
Yeah this type of conflation gets used a lot

A common one is "users don't care about privacy. that's why they use facebook. [zuckerberg was right?]"

No, you silly, silly people. People want to use products that allow them to communicate or reconnect with people or ...

They don't 'want' constantly changing privacy settings or changing TOS. If this is the best HN can come up with, ostensibly filled with S Valley people... well, it says a lot

reply
I suspect there are many things AI can do to help people and make their lives better. But that's not how business works: products get made and marketed because they make their owners more money. Totally different goal.
reply
Meta's AI is the stupidest in the business.

Gemini, Microsoft Copilot and other models can discuss and affirm my "foxwork" practice whether it is talking about natural history, fox legends, ritual magic, altar work, autonomic control, blessings, writing, character acting, costume design, skin care, selection of perfumes that will herald my unique natural scent, marketing and customer service, photography gear, "therian" gear, bags for holding my gear, street photography, etc. They always write like somebody who's read much more widely than anyone I've ever met and rival the legendary Tamamo-no-Mae for "speaking intelligently about any subject" [1]

Meta AI can crack jokes and that's about it. I guess there's a market for "stupid talk" but it's not that big.

[1] Like help me fix my washing machine that won't drain, come up with master narratives for the "polycrisis", talk about why Casey Handmer is wrong about space manufacturing, find papers about the social network of who sleeps with who at a high school, etc.

reply
Altman was trying to get $1T of infra investment years ago
reply
They also believed they would be able to build that compute without restrictions. Between hardware costs and massive public opposition, scaling as they had anticipated is in jeopardy.
reply
Did we? Many of us have been saying for over a year that the amount of compute going into the models is unsustainable and that the models aren’t improving enough to justify it. "The emperor has no clothes" is true yet again.
reply
Bonkers compute only in the beginning. Over time it'll reduce as models are made more efficient.
reply
Or it will stay the same as the efficiency gains will be eaten up by bigger models
reply
Nah, they'll hit a ceiling. Can only get so big before things collapse. And besides, they've already churned through the Internet's data. Not much new content left in the wild, and patterns in other data forms (audio, image, etc.) should be pretty low by comparison.
reply
No I don't think there was any systemic underestimation of compute. I see the opposite - every company understands compute is important and tries to get hold of it.
reply