What about Gemma and Llama and gpt-oss, not to mention lots of smaller/specialized models from Nvidia and others?
I would never argue that China isn't ahead in the open weights game, of course, but it's not like it's "all" American models by any stretch.
I'm annoyed at myself, because I admired and praised Chinese AI labs when they were opening up just as Llama was closing down, but Qwen looks to be running the same playbook here as Llama/Meta, Gemma/Google, and OpenAI/gpt-oss.
Most*.
OpenAI, contrary to popular belief, actually used to believe in open research and (more or less) open models. GPT-1 and GPT-2 were both model+code releases (although GPT-2 was a "staged" release); GPT-3 ended up API-only.
Also, the Chinese models aren't following the typical American SaaS playbook, which relies on free/cheap proprietary software for early growth. They are not just publishing their weights but also their code, and often even publishing papers in open-access venues to explicitly highlight the methods and advancements behind their results.
Well, Musk v. OpenAI kicks off one week from now, with the objective of forcing them back to their roots. A jury will decide whether a nonprofit accepting $50m-$100m in donations and then discarding its mission for an IPO is OK or not. Should be interesting.
I think the Chinese models have to be more open to increase trust as everyone is worried they are feeding their very essence/soul into a Chinese copying machine.
Also, China wants there to be viable competitors so that the US can't just dominate a potentially very important field. It's a challenge to a unipolar, US-dominated world.
Also, it helps spur Chinese companies in the all-important microchip industry, which is controlled by a very small number of companies at various steps in the supply chain.
I wonder, too, if it allows them to hold an ace in their hand in terms of threat/power for negotiations. As in, they could cause the whole house of cards to crumble: an economic nuclear weapon, so to speak.
Finally, there is a certain amount of prestige involved too. China can compete or even win at a very complicated game. They use it to increase national pride and to project their advancing power status to other nations.
Anyway, just my thoughts. Interested in hearing others'.
It's a good model, though; a refresh would be nice.
Today, lots of integer compute happens on local devices for some purposes, and in the cloud for others.
Same is already true for matmul, lots of FLOPS being spent locally on photo and video processing, speech to text, …
No obvious reason you wouldn’t want to specialize LLM tasks similarly, especially as long-running agents increasingly take over from chatbots as the dominant interaction architecture.
Right now, certainly. Things change. What was a datacenter rack yesterday could be a laptop tomorrow.
Now, given they can't satisfy current volume, they are forced to settle for just having crazy margins.
No, Nvidia and AMD are not the only ones benefiting.
No, it isn't. That's the kind of thing said by people who've never worked in the Chinese software ecosystem. It's how the Chinese internet has worked for 20+ years: the Chinese market is so large and competition is so rabid that every company basically throws as much free stuff at consumers as it can to gain users. Entrepreneurs don't think about "grand strategic moves at the national level" while flipping through their copies of the Art of War and Confucius, lol.
DeepSeek, Kimi, GLM, etc. are not built by behemoths, and they are free. You do not understand China's culture and market.
> And surprisingly of course none of these models answer uncomfortable questions about China’s past.
Download the GLM 5.1 weights and ask about Tiananmen Square, it will tell you what happened.
You are viewing China through a Western lens. I used to do the same many years ago, but after traveling to China many times, I realized that was a mistake.
I saw your comment after I wrote mine.
The idea that every new foundation model needs to be pretrained from scratch, using warehouses of GPUs to crunch the same 50 terabytes of data from the same original dumps of Common Crawl and various Russian pirate sites, is hard to justify on an intuitive basis. I think the hard work has already been done. We just don't know how to leverage it properly yet.
To me, that suggests that transformer pretraining creates some underlying structure or geometry that hasn't yet been fully appreciated, and that may be more reusable than people think.
Ultimately, I also doubt that the model weights are going to turn out to be all that important. Not compared to the toolchains as a whole.
Tokenization breaks up collocations and creates new ones that were not present in the original text. Most probably, the first byte pair merged by a simple byte-pair-encoding algorithm run on enwik9 would be two adjacent spaces. Is that a true collocation? BPE thinks so; humans may disagree.
What does concern me here is that it is very hard to ablate tokenization artifacts.
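A toy illustration of the double-space claim, assuming a hypothetical `most_frequent_pair` helper standing in for the first merge step of BPE (the sample string is made up, not actual enwik9 data):

```python
from collections import Counter

def most_frequent_pair(text: str):
    """Return the most frequent adjacent character pair in `text`,
    i.e. the first merge a naive BPE pass would pick."""
    pairs = Counter(zip(text, text[1:]))
    return pairs.most_common(1)[0]

# Tiny stand-in for enwik9; Wikipedia dumps contain many runs of spaces.
sample = "the  cat  sat  on  the  mat"
pair, count = most_frequent_pair(sample)
print(pair, count)  # the double space wins here: (' ', ' ') 5
```

Whether such a "token" reflects anything linguistically real is exactly the question; it is an artifact of the encoding, which is why ablating it cleanly is hard.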
[0] https://news.ycombinator.com/item?id=47431671 https://news.ycombinator.com/item?id=47322887
What if you still have to obtain the best result possible for given coefficient/tokenization budget?
I think my comment expresses the general case, while yours provides some exceptions.
>What if you need to reduce number of layers
Delete some.
> and/or width of hidden layers?
Randomly drop x% of parameters. No doubt there are better methods involving distillation, but this works.
> would the process of "layers to add" selection be considered training?
Er, no?
> What if you still have to obtain the best result possible for given coefficient/tokenization budget?
We don't know how to get "the best result possible", or even how to define such a thing. We only know how to throw compute at an existing network to get a "better" network, with diminishing returns. Re-using existing weights lowers the amount of compute you need to get to level X.
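The depth and width reductions mentioned above can be sketched in a few lines of NumPy, using a made-up stack of weight matrices (a real pipeline would follow this with distillation or fine-tuning, as noted):

```python
import numpy as np

# Hypothetical model: a stack of 12 square weight matrices.
layers = [np.ones((64, 64)) for _ in range(12)]

# "Delete some" layers: keep every other one to halve the depth.
shallower = layers[::2]

def random_prune(w: np.ndarray, frac: float, seed: int = 0) -> np.ndarray:
    """Zero out roughly `frac` of the entries at random (a crude width cut)."""
    rng = np.random.default_rng(seed)
    return w * (rng.random(w.shape) >= frac)

# Drop ~25% of the remaining parameters.
narrower = [random_prune(w, 0.25, seed=i) for i, w in enumerate(shallower)]
print(len(layers), len(shallower), round(float(narrower[0].mean()), 2))
```

The point stands either way: these surgeries spend far less compute than pretraining from scratch, at the cost of some quality that further training can partially recover.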
I believe the US is building this off the cost difference with other countries, using companies like Scale, Outlier, etc., while China has the internal population to do this.
People think that Chinese AI labs are just super cool bros that love sharing for free.
They don't understand it's just a state-sponsored venture meant to further entrench China in global supply chains and logistics. China's VCs are Chinese banks and a sprinkle of "private" money. Private in quotes because technically it still belongs to the state anyway.
China doesn't have companies and government like the US. It just has government, and a thin veil of "company" that readily fools Westerners.
That's very different from the American SaaS model, which relies on free but proprietary software for early growth.
If you forever stand at the entrance eating the free samples, that's fine, they don't care. Other people are going through the door and you are still consuming what they feed you. Doesn't mean it's going to be bad or evil, but they are staking their territory of control.
As for what comes next, it's probably going to be a bit of a race for who can do the most useful and valuable things the cheapest. If OpenAI and Anthropic don't make it, the technology will survive them. If they do, they'll be competing on quality and cost.
As for state sponsorship, a lot of things are state-sponsored, including in the US. Silicon Valley has a rich history rooted in massive government funding programs; there's a great documentary, "The Secret History of Silicon Valley," on this. Not to mention that all the "cheap" gas currently powering data centers comes on the back of a long history of public funding channeled into the oil and gas industry.
You can make any comparison you want if you use adjectives rather than values. I could say that cars use a massive amount of water (all those radiators!) to try to downplay agricultural water usage. But it's blatantly disingenuous.
SV is overwhelmingly private (actually private, in the constitutional sense) money. To the point that you should disregard people saying otherwise, just as you would people saying cars use massive amounts of water.
On the contrary: how will the closed, proprietary models from Anthropic, "Open"AI, and co. lead us all to freedom? Freedom of what, exactly? Freedom of my money?
At some point this "anti-communism" bullshit propaganda has to stop. And that moment was decades ago!
I still prefer that over US total dominance.
Let them fight it out.
But the events of the past decade or so have clearly demonstrated that there are no "good" actors.
I personally couldn't care less who wins in the China vs US AI competition, both sides have a long list of pros and cons.
Then decide ...
Or maybe families of African descent.
Or maybe families of Japanese Americans who lived in the US during WWII.
Or maybe people of Latin descent living in the US today.
You really don't see the difference?
I'm perfectly happy to let the Chinese get a piece of the pie and fight the US, no matter how bad they are right now.
It would be a great day for humanity if people stopped glazing text autocomplete as revolutionary.