I think that's an overgeneralization. We've seen all the American models be closed and proprietary from the start, while the non-American ones (especially the Chinese) have been open since the start. In fact, they often move in the opposite direction: many Chinese models started off proprietary and were only later opened up (like many of the larger Qwen models).
reply
> We've seen all the American models be closed and proprietary from the start

What about Gemma and Llama and gpt-oss, not to mention lots of smaller/specialized models from Nvidia and others?

I would never argue that China isn't ahead in the open weights game, of course, but it's not like it's "all" American models by any stretch.

reply
gpt-oss is good, but I haven't heard anything about an update. It seems like a one-and-done, released to quiet the people complaining that "Open" AI isn't open.
reply
The more accurate version is that only Chinese companies (plus Facebook, briefly) really open-source their frontier models. The rest are non-frontier: either older or specialized for something.
reply
It's all openwashing. All of the ones you listed have at some point expressed how important and valuable open weights and locally usable models are, and every single one of them has since increasingly pushed closed, proprietary, or cloud-only options.

I'm annoyed at myself, because I thought/hoped/praised Chinese AI labs when they were opening up as Llama was closing, but Qwen looks to be running the same playbook here as Llama/Meta, Gemma/Google, and OpenAI/gpt-oss.

reply
> We've seen all the American models be closed and proprietary from the start.

Most*.

OpenAI, contrary to popular belief, actually used to believe in open research and (more or less) open models. GPT-1 and GPT-2 were both model+code releases (although GPT-2 was a "staged" release); GPT-3 ended up API-only.

reply
That's fair but those days seem so long gone now.

Also, the Chinese models aren't following the typical American SaaS playbook, which relies on free/cheap proprietary software for early growth. They are publishing not just their weights but also their code, and often papers in open-access venues that explicitly highlight the methods and advances behind their results.

reply
> those days seem so long gone now.

Well, Musk v. OpenAI kicks off one week from now, with the objective of forcing them back to their roots. A jury will be deciding whether a nonprofit accepting $50m–$100m of donations and then discarding its mission for an IPO is OK or not. Should be interesting.

reply
The Nvidia Nemotron models are recent, and of course the Gemma 4 series from Google.
reply
Any idea why they do that?
reply
If the question is why Chinese models are contributing to open source and sharing of information, I don’t pretend to know the rationale but I think it’s because it’s an economic war.

I think the Chinese models have to be more open to increase trust as everyone is worried they are feeding their very essence/soul into a Chinese copying machine.

Also China wants there to be viable competitors so that US can’t just dominate a potentially very important field. It’s a challenge to a unipolar USA dominated world.

Also it helps to spur Chinese companies in the all important microchip industry which is controlled by a very small number of companies at various steps in the supply chain.

I wonder, too, if it gives them an ace to hold in terms of threat/power for negotiations. As in, they can cause the whole house of cards to crumble: an economic nuclear weapon, so to speak.

Finally, there is a certain amount of prestige involved too. China can compete or even win at a very complicated game. They use it to increase national pride and to project their advancing power status to other nations.

Anyways, just my thoughts. Interested in others' thoughts.

reply
*gasp* Science!
reply
OpenAI has released their GPT-OSS series more recently.
reply
"Recently"? More like 20 years ago in LLM-years.

It's a good model though; a refresh would be nice.

reply
deleted
reply
GPT started off open? They just closed before anyone else had even joined the space.
reply
I think it is in the interest of chip makers to make sure we all get local models
reply
I think they're in a win-win situation. Big AI companies would love to see local computing die in favour of the cloud, because they are well aware that the moment an open model that can run on non-ludicrous consumer hardware appears, they're screwed. In that situation Nvidia, AMD, and the like would be the only ones profiting from it - even though I'm not convinced they'd prefer going back to fighting for B2C while B2B is so much simpler for them.
reply
If you want to run AI models at scale and with reasonably quick response, there's not many alternatives to datacenter hardware. Consumer hardware is great for repurposing existing "free" compute (including gaming PCs, pro workstations etc. at the higher end) and for basic insurance against rug pulls from the big AI vendors, but increased scale will probably still bring very real benefits.
reply
Currently, yes. But I don't find it hard to imagine that in a while we could get reasonably light open models with a level of reasoning similar to current Opus, for instance. In such a scenario, how many people would opt for a far more expensive cloud subscription? Especially since lots of people already aren't that interested in paying for frontier models even where it makes sense. Unless we keep getting a constant, never-ending stream of improvements, we're basically bound to reach a point where, unless you really need more, you're fine with the basic, cheaper local alternative you don't have to pay for monthly.
reply
I think average users are already okay with the reasoning level they'd get with current open models. But the big AI firms have pivoted their frontier models towards the enterprise: coding and research, as opposed to general chat. And scale is quite important for these uses, ordinary pro hardware is not enough.
reply
This is really just a question of product design meeting the technology.

Today, lots of integer compute happens on local devices for some purposes, and in the cloud for others.

Same is already true for matmul, lots of FLOPS being spent locally on photo and video processing, speech to text, …

No obvious reason you wouldn’t want to specialize LLM tasks similarly, especially as long-running agents increasingly take over from chatbots as the dominant interaction architecture.

reply
> If you want to run AI models at scale and with reasonably quick response, there's not many alternatives to datacenter hardware.

Right now, certainly. Things change. What was a datacenter rack yesterday could be a laptop tomorrow.

reply
At a consistent amount of usage, datacenters are at least an order of magnitude more hardware efficient. I'm sure Nvidia and AMD would be fine fighting for B2C if it meant volume would be 10+x.

Now, given they can't satisfy current volume, they are forced to settle for just having crazy margins.

reply
The problem with B2C is that you need leverage of some kind (more demanding applications, planned obsolescence, ...) to get people to keep buying your product. The average consumer may simply consider themselves satisfied with the old product they already own and only replace it when it breaks down. With the cloud, by contrast, you can keep people hooked on getting the latest product whether they need it or not, and get artificial demand from datacentres and such.
reply
Future upgrade cycles on phones, laptops, and PCs will be driven by SoCs that embed some type of ASIC that runs a specific model. Every 6 months there will be a new, better version to upgrade to, which will require a new device. This is how Apple will be able to reduce cycles from 3 years to 6-12 months.
reply
I think businesses running datacenters are much less likely to frivolously buy the latest GPUs with no functional incentive than general consumers are...
reply
There are also many Chinese makers of AI-targeted GPUs/NPUs. You can get hold of some of their boards on taobao.com, and they are usable, to a degree.

No, Nvidia and AMD are not the only ones benefiting.

reply
Definitely. Many big hardware firms are directly supporting HuggingFace for this very reason.
reply
True, chip companies have the opposite mindset; Nvidia is releasing its own open-weight models, I believe.
reply
This is obviously a strategic move at a national level. Keep publishing competing free models to erode the moat western companies could have with their proprietary models. As long as the narrative serves China there will be no turn to proprietary models.
reply
>This is obviously a strategic move at a national level.

no it isn't. That's the kind of thing said by people who've never worked in the Chinese software ecosystem. It's how the Chinese internet has worked for 20+ years. The Chinese market is so large and competition is so rabid that every company basically throws as much free stuff at consumers as it can to gain users. Entrepreneurs don't think about "grand strategic moves at the national level" while they flip through their copies of the Art of War and Confucius lol

reply
If this were true, they'd build services around those models and provide those for free or vastly cheaper than the Western competition. But that's not what they're doing. Instead they're giving away the entire model for free. And by the way, Qwen isn't built by some random entrepreneur trying to solve the cold-start problem, but by Alibaba, which is a fucking behemoth. And surprisingly, of course, none of these models answer uncomfortable questions about China's past. Because sure enough, the first thing any entrepreneur would think of is protecting their government and its history. Sure, happens all the time, no state interference here, move on.
reply
> And by the way, Qwen isn't built by some random entrepreneur trying to solve the cold-start problem, but by Alibaba, which is a fucking behemoth.

DeepSeek, Kimi, GLM, etc. are not built by behemoths, and they are free. You do not understand China's culture and market.

> And surprisingly of course none of these models answer uncomfortable questions about China’s past.

Download the GLM 5.1 weights and ask about Tiananmen Square, it will tell you what happened.

You are viewing China through a Western lens. I used to do the same many years ago, but after traveling to China many times, I realized that was a mistake.

reply
Excuse me if it’s considered uncouth on here to do this but, I would be interested in your thoughts on what I wrote here: https://news.ycombinator.com/item?id=47847600

I saw your comment after I wrote mine.

reply
That has been a viable commercial strategy for most modern, funded businesses: capture market share at a loss, then, once your name is established, turn on the profit.
reply
Exactly. Open source is a commercial strategy for Chinese labs. They have no other effective way of marketing their models and inference services: https://try.works/writing-1#why-chinese-ai-labs-went-open-an...
reply
Always has been; it's literally SaaS. The slight difference is that the lowest-tier subscriptions at the frontier labs are basically free trials nowadays, too.
reply
It's the new freeware model!
reply
I'm a little more optimistic than that. I suspect that the open-weight models we already have are going to be enough to support incremental development of new ones, using reasonably-accessible levels of compute.

The idea that every new foundation model needs to be pretrained from scratch, using warehouses of GPUs to crunch the same 50 terabytes of data from the same original dumps of Common Crawl and various Russian pirate sites, is hard to justify on an intuitive basis. I think the hard work has already been done. We just don't know how to leverage it properly yet.

reply
Change layer size and you have to retrain. Change number of layers and you have to retrain. Change tokenization and you have to retrain.
reply
Hopefully we will find a way to make it so that minor changes don't require a full retrain. Training how to train, as a concept, comes to mind.
reply
And yet the KL divergence after changing all that stuff remains remarkably similar between different models, regardless of the specific hyperparameters and block diagrams employed at pretraining time. Some choices are better, some worse, but they all succeed at the game of next-token prediction to a similar extent.

To me, that suggests that transformer pretraining creates some underlying structure or geometry that hasn't yet been fully appreciated, and that may be more reusable than people think.

Ultimately, I also doubt that the model weights are going to turn out to be all that important. Not compared to the toolchains as a whole.

reply
That "underappreciated underlying structure or geometry" could just be an artifact of the same tokenization being used across different models.

Tokenization breaks up collocations and creates new ones that aren't always present in the original text. Most probably, the first byte pair found by a simple byte-pair-encoding algorithm on enwik9 would be two spaces next to each other. Is this a true collocation? BPE thinks so. Humans may disagree.
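That first merge step is easy to check. Here is a toy version (just a pair-frequency count, not a full BPE trainer; `most_frequent_pair` and the sample string are made up for illustration):

```python
from collections import Counter

def most_frequent_pair(text):
    """Count adjacent character pairs the way the first BPE merge step does,
    and return the most common pair with its count."""
    pairs = Counter(zip(text, text[1:]))
    return pairs.most_common(1)[0]

# On text with web-dump-style indentation, the winning pair is often
# two spaces - a layout artifact, not a linguistic collocation.
sample = "def f(x):\n    return x\n\ndef g(y):\n    return y\n"
print(most_frequent_pair(sample))  # ((' ', ' '), 6)
```

Whether the same holds for enwik9 specifically depends on the preprocessing, but any corpus with run-of-spaces indentation will tend to merge whitespace first.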

What does concern me here is that it is very hard to ablate tokenization artifacts.

reply
None of that is true, at least in theory. You can trivially grow the layer size simply by adding extra columns initialized to 0, effectively embedding your smaller network in a larger network. You can add layers in a similar way, and in fact LLMs are surprisingly robust to having layers added and removed - you can sometimes actually improve performance simply by duplicating some middle layers[0]. Tokenization is probably the hardest, but only the first and last layers deal directly with token embeddings; it's probably not impossible to retrain those while preserving the middle parts.
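The zero-initialization trick can be sketched in plain Python with list-of-rows matrices (`matvec` and `widen` are hypothetical helpers; a real width change would also have to pad biases and the layers feeding in and out):

```python
def matvec(W, x):
    """y = W @ x for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def widen(W, new_rows, new_cols):
    """Embed an m x n weight matrix in the top-left corner of a
    new_rows x new_cols matrix, padding with zeros. On inputs that are
    zero in the new coordinates, the widened layer computes the same
    outputs as before, plus zeros for the new rows."""
    m, n = len(W), len(W[0])
    return [
        [W[i][j] if i < m and j < n else 0.0 for j in range(new_cols)]
        for i in range(new_rows)
    ]

W = [[1.0, 2.0],
     [3.0, 4.0]]
x = [5.0, 6.0]

W_big = widen(W, 3, 4)
x_big = x + [0.0, 0.0]  # zero-pad the input to the new width

print(matvec(W, x))          # [17.0, 39.0]
print(matvec(W_big, x_big))  # [17.0, 39.0, 0.0]
```

The smaller network's function is preserved exactly; training can then move the new zero weights away from zero without having to relearn the old ones.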

[0] https://news.ycombinator.com/item?id=47431671 https://news.ycombinator.com/item?id=47322887

reply
You took a simple path: embedding smaller into larger. What if you need to reduce the number of layers and/or the width of the hidden layers? How will you embed larger into smaller? And as for duplicating layers: would the process of selecting which layers to add itself count as training?

What if you still have to obtain the best result possible for given coefficient/tokenization budget?

I think my comment expresses the general case, while yours provides some exceptions.

reply
The general case is that our current ignorance about the best ways to use and adapt pretrained weights is a short-lived anomaly, caused by an abundance of funding to train models from scratch, rapidly evolving training strategies and architectures, and a mad rush to ship hot new LLMs as fast as possible. Even as things stand, the things you mentioned aren't impossible; they're easy, and we are only going to get better at them.

>What if you need to reduce number of layers

Delete some.

> and/or width of hidden layers?

Randomly drop x% of parameters. No doubt there are better methods that entail distillation but this works.

> would the process of "layers to add" selection be considered training?

Er, no?

> What if you still have to obtain the best result possible for given coefficient/tokenization budget?

We don't know how to get "the best result possible", or even how to define such a thing. We only know how to throw compute at an existing network to get a "better" network, with diminishing returns. Re-using existing weights lowers the amount of compute you need to get to level X.

reply
There is evidence it's useful in some cases, but obviously no evidence it's enough if you're chasing SOTA.
reply
I don't think it's Common Crawl anymore; it's Common Crawl++, using paid human experts to generate and verify new content, whether it's code or research.

I believe the US is building this off the cost difference with other countries, using companies like Scale, Outlier, etc., while China has the internal population to do this.

reply
Any reason for them to do this other than altruism? I don’t think this can be regulated.
reply
Bake ads into them.
reply
The Chinese state wants the world using their models.

People think that Chinese AI labs are just super cool bros that love sharing for free.

They don't understand it's just a state-sponsored venture meant to further entrench China in global supply and logistics. China's VCs are Chinese banks and a sprinkle of "private" money. Private in quotes because technically it still belongs to the state anyway.

China doesn't have companies and government like the US. It just has government, and a thin veil of "company" that readily fool westerners.

reply
As opposed to the US, which just has companies and a thin veil of “government”.
reply
Also, many of these Chinese companies aren't just opening their weights. They are open-sourcing their code AND publishing detailed research papers alongside it to show how they achieved their results.

That's very different from the American SaaS model, which relies on free but proprietary software for early growth.

reply
I'm not sure how local AI models are meant to "entrench China in global supply and logistics". The two areas have nothing to do with one another. You can easily run a Chinese open model on all-American hardware.
reply
They are building a pipeline, and the goal is to get people in the door.

If you forever stand at the entrance eating the free samples, that's fine, they don't care. Other people are going through the door and you are still consuming what they feed you. Doesn't mean it's going to be bad or evil, but they are staking their territory of control.

reply
Oh for sure, they're getting a whole lot of Chinese people and other non-Westerners through the door already - mostly, the people who are being ignored or even blocked outright by the big Western labs. That's territory we purposely abandoned, and they're going to control it by default.
reply
I'm Aussie. Please explain to me; why should I care whether Chinese SOEs or the US tech companies are winning? Neither have my best interests at heart.
reply
Like with nuclear technology, it's not healthy for only one country to dominate AI. The cat is already out of the bag and many countries now have the ability to train and run models. Silicon Valley has bootstrapped this space. But it should be noted that they are using AI talent from all over the world and it was sort of inevitable that this technology would get around. Lots of Chinese, Indian, Russian, and Europeans are involved.

As for what comes next, it's probably going to be a bit of a race for who can do the most useful and valuable things the cheapest. If OpenAI and Anthropic don't make it, the technology will survive them. If they do, they'll be competing on quality and cost.

As for state sponsorship, a lot of things are state-sponsored, including in the US. Silicon Valley has a rich history rooted in massive government funding programs; there's a great documentary on this, "The Secret History of Silicon Valley". Not to mention that all the "cheap" gas currently powering data centers comes on the back of a long history of public funding being channeled into the oil and gas industry.

reply
>As for state sponsorship, a lot of things are state sponsored.

You can make any comparison you want if you use adjectives rather than values. I can say that cars use a massive amount of water (all those radiators!) to try and downplay agricultural water usage. But it's blatantly disingenuous.

SV is overwhelmingly private money (private in the actual, constitutional sense). To the point that you should disregard people saying otherwise, just like you would the people saying cars use massive amounts of water.

reply
So an OPEN model that I can run on my own fucking hardware will entrench China in global supply and logistics how?

Contrary: How will the closed, proprietary models from Anthropic, "Open"AI and Co. lead us all to freedom? Freedom of what exactly? Freedom of my money?

At some point this "anti-communism" bullshit propaganda has to stop. And that moment was decades ago!

reply
Anything that isn't explicitly to the benefit of US interests must be against them /s
reply
So what?

I still prefer that over US total dominance.

Let them fight it out.

reply
Yeah, a lot of people are still living within the paradigm of tribalism: my team good, other team bad.

But the events of the past decade or so have clearly demonstrated that there are no "good" actors.

I personally couldn't care less who wins in the China vs US AI competition, both sides have a long list of pros and cons.

reply
I'd get a bit informed about what exactly Chinese dominance entails. Ask a few Uyghurs, Cantonese Hong Kongers, or even Tibetans.

Then decide ...

reply
Ask a few Native Americans about dominance.

Or maybe families of African descent.

Or maybe families of Japanese Americans who lived in the US during WWII.

Or maybe people of Latin descent living in the US today.

reply
The US examples you just gave happened decades, and in some cases hundreds of years, ago. The difference is that it's happening in China right now, and nobody cares.

You really don't see the difference?

reply
The US is the biggest threat to the world right now, and is actively supporting a genocide in Palestine as well as war crimes in Lebanon.

I'm perfectly happy to let the chinese get a piece of the pie and fight the US, no matter how bad they are right now.

reply
What a delusional dumb ass you are
reply
Well, isn't this what the US and really any other power in the world has always done, since forever?
reply
Why is it sad? These things are useless all around, along with the people who overuse them.

It would be a great day for humanity if people stopped glazing text autocomplete as revolutionary.

reply