Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.
Scale was really the unlock; the new pre- and post-training techniques and architectures are very cool and useful, but they definitely aren't the differentiators when compared to the previous era of NLP.
They were allegedly massive but the cost and returns were not worth it.
Feels less about the pace of foundation model development and more like a specific failure of one organization to do something important.
Meta doesn't seem to be able to produce anything close to a frontier model. The selling of compute capacity seems to be acceptance of "compute is wasted on this crappy avocado model, we'd be better off allowing something better to run".
The problem is clearly in the model architecture, the training, and the data fed into the model, which is causing them to give up on using their compute exclusively for their own models. They can't get it right, so they may as well sell the compute to someone who can.
Can't help but think that Meta's digital networking expertise is built atop a human-networking clusterf*ck
I think there would easily be a few hundred other engineers and execs at frontier labs who are more in the loop on cutting-edge architecture/secret sauce, with a track record of actually doing it, who could be had for a fraction of the price.
All these companies are going to sit on their gazillion data centers once the mania dies down and will have a big problem figuring out what to do with their mountain of hardware.
https://uk.pcmag.com/ai/165970/meta-exploring-option-to-sell...
Meta bought too many GPUs, has spare GPU capacity and they are exploring renting that capacity out.
The problem is not that the models need too much to do the job. If that were the case, Meta would not have spare capacity.
The problem is that the models currently can't be made to do the job.
The whole hype cycle has been pure delusion. Just like the Metaverse hype cycle before it.
A common one is "users don't care about privacy. that's why they use facebook. [zuckerberg was right?]"
No, you silly, silly people. People want to use products that allow them to communicate or reconnect with people or ...
They don't 'want' constantly changing privacy settings or a changing TOS. If this is the best HN can come up with, ostensibly filled with Silicon Valley people... well, it says a lot.
Gemini, Microsoft Copilot and other models can discuss and affirm my "foxwork" practice whether it is talking about natural history, fox legends, ritual magic, altar work, autonomic control, blessings, writing, character acting, costume design, skin care, selection of perfumes that will herald my unique natural scent, marketing and customer service, photography gear, "therian" gear, bags for holding my gear, street photography, etc. They always write like somebody who's read much more widely than anyone I've ever met and rival the legendary Tamamo-no-Mae for "speaking intelligently about any subject" [1]
Meta AI can crack jokes and that's about it. I guess there's a market for "stupid talk" but it's not that big.
[1] Like help me fix my washing machine that won't drain, come up with master narratives for the "polycrisis", talk about why Casey Handmer is wrong about space manufacturing, find papers about the social network of who sleeps with who at a high school, etc.