upvote
Because this is Microsoft, experimenting and failing is not encouraged, taking less risky bets and getting promoted is. Also no customer asked them to have 1-bit model, hence PM didn't prioritize it.

But it doesn't mean, idea is worthless.

You could have said same about Transformers, Google released it, but didn't move forward, turns out it was a great idea.

reply
> You could have said same about Transformers, Google released it, but didn't move forward,

I don't think you can, Google looked at the research results, and continued researching Transformers and related technologies, because they saw the value for it particularly in translations. It's part of the original paper, what direction to take, give it a read, it's relatively approachable for being a machine learning paper :)

Sure, it took OpenAI to make it into an "assistant" that answered questions, but it's not like Google was completely sleeping on the Transformer, they just had other research directions to go into first.

> But it doesn't mean, idea is worthless.

I agree, they aren't, hope that wasn't what my message read as :) But, ideas that don't actually pan out in reality are slightly less useful than ideas that do pan out once put to practice. Root commentator seems to try to say "This is a great idea, it's all ready, only missing piece is for someone to do the training and it'll pan out!" which I'm a bit skeptical about, since it's been two years since they introduced the idea.

reply
Google had been working on a big LLM but they wanted to resolve all the safety concerns before releasing it. It was only when OpenAI went "YOLO! Check this out!" that Google then internally said, "Damn the safety concerns, full speed ahead!" and now we find ourselves in this breakneck race in which all safety concerns have been sidelined.
reply
Scaling seemed like the important idea that everyone was chasing. OpenAI used to be a lot more safety minded because it was in their non profit charter, now they’ve gone for-profit and weaponized their tech for the USA military. Pretty wild turnaround. Saying OpenAI was cavalier with safety in the early days is inaccurate. It was a skill issue. Remember Bard? Google was slow.
reply
What OpenAI did was train increasingly large transformer model instances. which was sensible because transformers allowed for a scaling up of training compared to earlier models. The resulting instances (GPT) showed good understanding of natural language syntax and generation of mostly sensible text (which was unprecedented at the time) so they made ChatGPT by adding new stages of supervised fine tuning and RLHF to their pretrained text-prediction models.
reply
There were plenty of models the size of gpt3 in industry.

The core insight necessary for chatgpt was not scaling (that was already widely accepted): the insight was that instead of finetuning for each individual task, you can finetune once for the meta-task of instruction following, which brings a problem specification directly into the data stream.

reply
On the one hand, not publishing any new models for an architecture in almost a year seems like forever given how things are moving right now. On the other hand I don't think that's very conclusive on whether they've given up on it or have other higher priority research directions to go into first either
reply
The most benign answer would be that they don’t want to further support an emerging competitor to OpenAI, which they have significant business ties to. I think the more likely answer which you hinted at is that the utility of the model falls apart as scale increases. They see the approach as a dead end so they are throwing the scraps out to the stray dogs.
reply
Not to mention Microsoft's investments in Nvidia and other GPU-adjacent/dependent companies!

A successful ternary model would basically erase all that value overnight. In fact, the entire stock market could crash!

Think about it: This is Microsoft we're talking about! They're a convicted monopolist that has a history of manipulating the market for IT goods and services. I wouldn't put it past them to refuse to invest in training a ternary model or going so far as to buy up ternary startups just to shut them down.

Want to make some easy money: Start a business training a ternary model and make an offer to Microsoft. I bet they'll buy you out for at least a few million even if you don't have a product yet!

reply
If that were true then they simply wouldn’t have published this research to begin with.

Occam’s Razor suggests this simply doesn’t yield as good results as the status quo

reply
Rest assured, all the big players (openai, google, deepseek etc) have run countless experiments with 4,3,2,1.58,1 bits, and various sparse factors and shapes. This barrel has been scraped to the bottom
reply
So is it finally time for a Beowulf cluster to do something amazing?
reply
Cannot agree more!
reply