Apertus – Open Foundation Model for Sovereign AI

(apertvs.ai)

503 points

by T-A 20 hours ago

by maxloh 20 hours ago

Other fully open LLMs include Allen AI's OLMo 3.1 and MBZUAI's K2 Think V2, both of which have released their full training pipelines and datasets.

Nvidia Nemotron is also an open training source model, though a portion of its dataset remains proprietary.

Quoting lambda's comment:

> Note that the Nemotron models are generally stronger than Olmo and K2 Think V2 (according to Artificial Analysis benchmarks), and there is a lot of overlap in their datasets (lots of datasets are based on the same sources with different filtering, Olmo and K2 Think V2 both have used some Nemotron datasets).

> But yeah, Nemotron is a modern and fairly capable LLM, even the 122b is more capable than Deepseek R1 (a 671b model) on most benchmarks, and there's also the recently released 550b Ultra now.

https://news.ycombinator.com/item?id=48492439

by soundworlds 16 hours ago

Allen AI do not get enough love. They are doing GenAI how it should have always been done.

In fact, if the frontier companies had taken their approach, it would have started much slower, but I think we would be far more advanced by 2035. Instead we have a majority of society that wants to see AI fail.

by dvt 13 hours ago

> Instead we have a majority of society that wants to see AI fail.

Do you talk to regular people? I work out of coffee shops routinely and literally like 90% of laptops have ChatGPT or Claude open. I was shocked at how many of my friends love the silliest of AI features (like a Slack bot summarizing your day or your upcoming meetings), and a lot of decks, proposals, SOWs, etc. are (at least in part) generated with AI these days.

by dofm 4 hours ago

Depends very much on the society and the context you catch it in.

Young people who want to have secure jobs and who have any kind of experience with creativity see AI coming for their livelihoods and their joy simultaneously.

Middle-aged IT industry people like me, many of us are grudgingly learning it but believe it to be an obvious net negative the way it is currently deployed; it feels like we're automating all the wrong stuff.

I wouldn't go around talking as if people think AI is great. A solid proportion of the population would be tempted to push AI influencers under buses and trains.

by dd8601fn 2 hours ago

Kids are using it a ton, too.

Everyone wants at least some of the utility.

Few want to reach the end of the road we’re said to be walking: only AI companies and the CEOs of megacorps. Everyone else is being sold a doomsday scenario (true or not).

by jstummbillig 12 hours ago

A quarter of US citizens use a chatbot daily.

https://www.pewresearch.org/chart/about-a-quarter-of-u-s-adu...

It all, of course, depends on what people mean by "AI" (I think the question basically defeats itself, it's akin to asking someone about "databases", given that it covers image generation, self driving cars, TikTok feeds, drug discovery and chatbots) but AI sentiment at large is more negative than positive.

https://www.pewresearch.org/chart/americans-predict-ais-impa...

So, depending on where you sit: Sure, most people will use "AI", meaning a chatbot (probably ChatGPT: https://www.pewresearch.org/chart/americans-report-using-cha...). 90% in coffee shop land, why not.

But that does not mean that they are not weary of the consequences, and they are growing more weary. I think this is, predictably, more so the better situated you are and the more your direct livelihood is at stake. That's just the animal we are.

Does that mean that we should have slowed down? Matter of opinion. My take: Absolutely not. The people who need it the most around the world will have dramatically improved lives, because of access to better medical advice or information about institutions and systems, to start things and help them in their daily lives.

by dofm 4 hours ago

I think you meant "wary" of the consequences…

But this is one of those unique situations where wary (cautiousness, concernedness, preparedness, tinged with fear) and weary (exhaustion with a mental component) are overlapping into one horrible thing.

So I'm not correcting you because I think basically both are right: we are going through both of these at once because anxiety is what Scam and Wario in particular are selling.

by johngossman 5 hours ago

Ironic that you should question if the commenter talks to regular people and then cite people who work on laptops from the coffee shop, use Slack etc.

by alfiedotwtf 10 hours ago

I'm calling yesterday as Peak Height of AI...

I was at my daughter's football game, and another father from the club came up to me and asked if I were in IT and knew how AI worked. He then asked if I could help him set up an AI agent to generate passive income.

We're at the equivalent of December 2017 for crypto. Hang on to your hats!

by rz2k 7 hours ago

It’s rare to find an example in English that demonstrates the difference in meaning between subjunctive and conditional.

Was it a two-part question converted into one with a gate at the beginning, or was it a general question about occupations and abilities?

by scjody 5 hours ago

They use it, but do they love it or do they feel like they need it to do their best work and stay ahead?

I hate cars but I still drive to the office 1x / week because I have to.

by apercu 3 hours ago

Informed society is getting tired of unethical finance bro technocrats buying political influence and power with ill-gotten gains.

by a_136_chiffa 5 hours ago

LLMs were invented by AI2, before Transformers were a thing, with the RNN-based ELMo.

by polytely 8 hours ago

I don't want AI to fail, but I would like to see Altman and Musk fail, for example. I'm very uneasy with the power-hungry Silicon Valley freaks that are running the show at these labs/companies. Hassabis seems like the only one that is not actually Evil.

by waffletower 2 hours ago

I don't think Dario is completely evil, but he can't see his obvious naiveté that the rest of us see clearly (vis-à-vis the Trump administration), and his paternal hubris (only Anthropic should win and control AI) should be perceived as far worse than Bill Gates's desire to control the internet in the 90s. The fact that Microsoft invested so heavily in OpenAI blinded me to Anthropic's potential villainy for years.

by sawjet 14 hours ago

Is there any evidence that "a majority of society wants AI to fail"?

Or is it just vibes?

by markerz 13 hours ago

There are a few polls that have shown most people use AI, but they also dislike it. I’m in that boat: my company pays for my subscription, and I use it to be productive. But I don’t really feel good about it.

https://gizmodo.com/people-hate-ai-even-more-than-they-hate-...

by watwut 10 hours ago

Does it count when I hate Dario, Altman and their weird cult more and more every time they open their mouths? I think that I would not hate the tech in isolation, but considering who the tech elite became, their rhetoric and how they behave, I want them to fail just because of that.

by intended 9 hours ago

https://news.ycombinator.com/item?id=48573332

Was discussed just recently, and there are multiple articles and surveys on AI sentiment.

by hit8run 15 hours ago

Care to elaborate on this?

by AndrewKemendo 16 hours ago

Fully agree with this and they were leading robotic learning as well even back to 2019.

IsaacSim was (and might still be) the best robotic learning sim and I ran MLAgents.

by typ 10 hours ago

> an open training source model

It's always funny to see people tempted to call open-blobs/open-weights, which are literally shareware like WinRAR or Adobe PDF Viewer, open source, and then need to invent a new term for what is actually open source.

by maxloh 4 hours ago

Nemotron is vastly different from standard open-weight models. Its entire training pipeline is open-sourced, while other vendors typically only release the model weights.

by wrs 1 hour ago

Right, thus it is actually “open source”, and they shouldn’t need to invent a new term “open training source”. But the others have already effectively trashed the meaning of “open source”.

by vcryan 17 hours ago

Maybe I'll give Nemotron another try. Yesterday I used the latest one on OpenRouter and it was bad - worse than StepFun

by SwellJoe 20 hours ago

I like the idea, and it has become more pressing that everyone outside the US think about tech sovereignty because the US has become an unsafe place to keep your data, but the impression I get from Apertus is that it moves at the speed of a committee. I have no expectation they'll deliver a competitive model. At least, not competitive with current models. Maybe competitive with models a year ago (though they haven't even done that yet, right?).

by nezuzen 19 hours ago

"the US has become an unsafe place to keep your data"

I empathize with this, but I'm curious what would make any other country a better safe haven for your data. I personally like the EU's approach to data safeguards, but are there other locales/data protections you have in mind that would keep your data "safe"?

by mark_l_watson 5 hours ago

I live in the USA and I use a European LLM as a daily driver: Proton’s lumo+ that does a good job packaging a Mistral model for general chat, with good searchable chat history — all with adequate privacy guarantees. Well worth the money.

I purchase open model tokens for agent programming assistance, and I like lumo+ for everything else.

Another option is DuckDuckGo’s Duck.ai subscription, but I slightly prefer ProtonMail’s lumo+ packaging as a product.

by eric_cc 4 hours ago

Do you go through all this trouble out of principle or necessity?

by mark_l_watson 4 hours ago

Out of principle. Also, I am an old man and retired - makes it easier to give up a little bit of productivity.

by kitd 11 hours ago

The law varies from country to country, but at least I vote for the legislators creating the laws governing my local sovereign AI.

by tensor 13 hours ago

Putting aside reliable rule of law, as others have pointed out, it seems unwise to keep your data in a country that has repeatedly threatened to annex or invade yours.

by digitaltrees 19 hours ago

The rule of law exists in other countries in a way it does not in the US right now.

by SubiculumCode 18 hours ago

Can you give examples?

by digitaltrees 13 hours ago

Is this a good faith question? It would take several hundred pages to document even a fraction of the violations.

How about deporting people without a hearing or opportunity to present evidence about their charges. And then violating the judge's order to turn the planes around.

How about systematically ignoring judicial rulings.

How about detaining people based on the color of their skin and spoken language/accent.

How about violating the emoluments clause of the constitution by accepting a personal airplane.

How about sending your son-in-law, who hasn’t been appointed to any office with the advice and consent of congress as required by the constitution.

How about refusing to seat elected congress members for reasons for months.

How about singling out companies like Intel for targeted trade restrictions and then demanding equity in order to lift them.

What about threatening to delay or deny a merger of a media company unless your ally is allowed to buy them.

What about refusing to enforce the TikTok ban until you can arrange a buy out to an ally.

What about a formal market with a known price for pardons and commutations.

What about starting multiple wars without congressional approval.

What about creating a fake department named DOGE that withholds funds apportioned by congress and breaks contracts that have explicit obligations for payment, resulting in more termination fees and losses than savings. All without congressional approval.

How about threatening to withhold federal funds from states with governors of the opposing political party but not your own? Remember the president is supposed to execute the law congress passes not make law or arbitrarily enforce it based on their own political needs or values.

by einpoklum 10 hours ago

> How about deporting people without a hearing or opportunity to present evidence about their charges.

Not to detract from your general point about the US, your first point is something that's happened recently in Switzerland:

https://truthout.org/articles/swiss-police-arrest-deport-pal...

by antoinealb 4 hours ago

And a Swiss court decided that this was illegal and disproportionate [1]. Rule of law does not mean that nothing illegal happens in the country (that's obviously impossible to guarantee). It means illegal acts have consequences.

[1] https://www.bvger.ch/en/newsroom/media-releases/fedpol-must-...

by intended 9 hours ago

That distracts from the point in favor of what, in this context, is a detail.

There are always incidents in all democracies with millions of people, that contravene the expectations of rule of law and integrity of its systems.

The US has degenerated significantly in the past few years, to the point that when someone asks “can you give examples”, I expect a disingenuous ploy more than genuine ignorance. The list of breaches is so long, that listing it results in numbness and exhaustion of the mental muscles responsible for being aghast.

by sinuhe69 15 hours ago

Search and seizure of your laptops and personal phones without probable cause or a warrant.

Compelling you to reveal your secrets, including your passwords, by threatening to arrest and detain you without legal proceedings for an unspecified period.

Denying your basic human rights, particularly at the borders, especially if you aren’t a citizen.

And more.

by brandensilva 17 hours ago

Illegal tariffs, the executive usurping Congress's power of the purse, Noem funding herself and friends with a commercial from an unknown entity with taxpayer money, people in ICE/FBI handing over undisclosed unaccounted money in brown bags, rampant insider trading, using funds inappropriately to fly a girlfriend places on trips that aren't official business, illegally using private money to fund public projects, taking bribes from foreign nations like jets and such in violation of the emoluments clauses, passing no-bid contracts to people you know, using the pardon power inappropriately to pardon crypto scammers and other white-collar criminals, moving notorious Epstein-related criminals to a low-security prison without going through the courts, avoiding justice for sex crimes of the rich, using the DOJ as a political cudgel, and the list goes on.

by sscaryterry 17 hours ago

Wow, this is a bit obtuse.

It is a commonly accepted "fact" right now, outside the US, that the US is not to be trusted (right now), due to some orange guy, and his mates, manipulating markets, running their mouths, doing all kinds of criminal and/or infantile shit.

I'd say there is quite a bit of evidence for this all around.

by eric_cc 4 hours ago

> infantile shit

I think it’s valid to not trust the US with your data. But if the reason is some TDS “Orange Man Bad”, it’s you that’s acting infantile.

by sscaryterry 3 hours ago

I don't know what else to call it? Seriously, 2 words, and I'm sure it is spot-on.

by SubiculumCode 16 hours ago

Hardly obtuse. It's good to be specific when making broad claims. The graft of Trump is a big problem, IMO, but the claim was larger than that, as being something about America's system of Law and Justice, and I don't see these as being completely busted (yet) by the Orange Man

by digitaltrees 13 hours ago

Sorry but questioning that claim at this point verges on bad faith or credulity.

Ask Intel, Paramount, TikTok or Anthropic if they feel law will be applied equally to all companies.

Ask the blue states that had FEMA funding withheld when it went to red states.

Ask black families that haven’t gotten reparations when Jan 6 rioters that beat and killed cops to overturn an election will get almost $2b in reparations and then had the Supreme Court throw out their votes in Louisiana in the middle of an election to overturn the Voting Rights Act, redraw districts, overturn their own case law and the principle that judicial review shouldn’t happen too close to an election so they could redraw the districts.

Business leaders are sucking up to curry favor. That by definition isn’t the rule of law it’s the rule of dispensation. It’s the spoils system.

If you have a counter argument you’d better make it now or you will tip your hand.

by rob74 7 hours ago

Well, the system allowed Trump to be elected, twice, and the system hasn't (so far) prevented him from abusing his office in the ways mentioned. So it's fair to conclude that the US system is the problem, not the symptom called Trump. And if that's the case, it's also fair to conclude that the US is no longer trustworthy, because Trump could happen again.

by SwellJoe 16 hours ago

It isn't completely busted, unless the Trump administration has a personal interest in overriding the law. As sometimes happens when some foreign power, or just a random politician in another nation, does something he doesn't like. Or, when Trump has a personal stake in some other outcome. Who wants to gamble that Trump won't decide to wreck your businesses, sabotage your defenses, or spy on European citizens? We now know most of the major tech companies won't object to information requests, and probably won't even reveal that they've given access to the US government. US citizens maybe still have some protections, but everyone else seems to be fair game.

Frankly, I'm surprised there's not more urgency on the part of Europeans to reduce dependence on US tech. I don't like it. I'm an American in tech. But, the US can't be trusted, at this time. And, given how irresponsible tech leadership has been, in kowtowing to Trump, I don't see how they can reasonably be trusted, either.

by digitaltrees 13 hours ago

They are moving swiftly actually. France announced the government is moving to Linux, and several other countries are moving off of AWS and Microsoft.

I invest in startups, and companies at every stage are losing contracts in Europe specifically for this risk. I can’t say who, but it’s a multi-front trend.

by jimbokun 15 hours ago

It’s not clear whether Europe has the capability to compete with US tech right now.

by SwellJoe 15 hours ago

It obviously does not. But, there is nothing preventing it. The US has given away all of our foreign scientists, if Europe wants them. All Europe has to do to take the lead in tech is ramp up research spending by an order of magnitude or two to match what the US used to spend (the US still outspends Europe on research, even after massive haphazard cuts and disruptions). Europe also has to welcome immigrants. Another thing many European countries have not always been great at, and some recently have become quite bad about. The regressive nationalist right is ascendant in many places, including some European countries.

by digitaltrees 13 hours ago

I am going to ramp up building open source alternatives to every part of the stack. I am encouraging every YC founder to do the same. I am buying as much hardware as I can afford to have my own inference and training stack and funding researchers at Duke and CM to strengthen local and open source AI.

I am also assembling the largest in home robotics training data set available which will be open source.

Want to help?

by advael 9 hours ago

Kinda, yea. I've never been able to afford to fully prioritize values-alignment in my work, but it is something I care about, and building anything proprietary and US-controlled feels increasingly bad, because even if a company's mission isn't evil, the state has demonstrated a strong willingness to force their hand if they can be useful to them at all, and punish them arbitrarily if they do anything that the ruling party dislikes. I do have bills to pay, but if you can meet my relatively (as tech workers go) modest needs and have a real plan to make something that enables rather than impedes digital sovereignty, I'd be interested in hearing what I could do to help

by SwellJoe 11 hours ago

The kind of funding it takes to take on US tech corporations, especially in AI, will be astronomical. For an open source solution, it will take state action, and given how unpopular AI is with average folks (an entirely reasonable position for average folks to take when they see the new robber barons who're leading the AI charge), I'm not confident there's political will for it. If a few of the larger rich European nations really committed to funding research at a level competitive with the US, though, even if not specifically AI-related, the result would be an eventual end to US tech hegemony.

I was hoping the European AI companies and projects like Mistral and Apertus would, you know, do something good. But their models trail not only US models but also Chinese models, including smaller ones, by a significant amount. I guess there's also the ethical component. Mistral is reportedly not plagiarizing like US companies, and isn't distilling US models like the Chinese companies. Cheating gives one a leg up if there are no referees.

Anyway, I work for a robotics company, and I'm always interested in what's happening with open robotics stuff, including AI.

by markhahn 13 hours ago

any US tech? not even for specific purposes? yeah, if there was some kind of forcefield around the US, most of the world would have tech troubles at one level or another. but so would the US.

and really, the topic here is preventing a transgressive President from infringing on tech activities elsewhere (used to be mainly about surveillance, but then Trump happened).

by iwontberude 16 hours ago

It’s been cooked for longer than Trump. Al Gore won in 2000 and they stole the election. Everything that followed has been a complete fuckfest.

by mannanj 3 hours ago

How about spying on, experimenting on, and conducting in-person psyops on a US citizen for reasons of calling the Spy Agencies terrorist organizations on social media and whistleblowing their online astroturfing accounts? My whistleblowing consisted of calling particular online accounts as deep state accounts, and I was reaching thousands of voices.

They decided that spying on me in a commune in Hawaii, and then following me after to other public spaces was fine. I'm certain something was put in my food based on behavior I saw in communal meals, and I can't say I took video or photo evidence though I wish I did.

I'm of Pakistani descent, held a former secret clearance, and I did not break any oaths or violate any laws though the way I was treated was certainly how the above person described rule of law: our spy agencies for example operate completely without accountability and regularly commit atrocious behavior against US citizens beyond just me.

by MrDrMcCoy 17 hours ago

Iceland and Switzerland are probably the best places to keep your data safe. I'd put Norway, Sweden, Germany, and the Netherlands after that, though I don't have many specifics on how good they are at privacy these days.

by SilverSlash 16 hours ago

lol the new "swiss banks". store all your dirty data in digital swiss lockers

by OkWing99 15 hours ago

I think the US is the only country that has asked to limit frontier model access based on the citizenship of the user.

Let's say Gemini gets to AGI by tomorrow, will my Google account access, or Gemini apps access and data be blocked if I'm not a US citizen? (Anthropic did it with a 5% better model).

If the US is classifying model access based on citizenship, that's similar to treating it as a defense capability.

by sawjet 14 hours ago

I, as a US citizen, also cannot access Claude Fable.

by yathaid 13 hours ago

You cannot access Fable because Anthropic can't reliably tell whether you are a US citizen. The govt order is based on export controls to non-US citizens.

You can already imagine Anthropic working with a bunch of shady brokers to "remedy" this situation.

by bloppe 3 hours ago

https://x.com/AmrithRamkumar/status/2067059417678336455

This particular order wasn't actually about citizenship at all. It seems the administration simply believed restricting the order to non-citizens would make it easier to defend in court, but they made it knowing full well that the only way to implement it would be to completely shut off access for everyone.

by PeterStuer 12 hours ago

Most people have had to reluctantly accept their own totalitarian state will control them. They do not want another state to have the same or even more power over them.

by AndrewKemendo 16 hours ago

No country is safe. You need to host your own end to end on your own infrastructure if you want to be free.

Stallman was correct in the 80s and is correct now about libre software

by jhancock 16 hours ago

From a legal perspective the US may be safer than other places if the US is the one seeking your data. The US doesn't need legal process to authorize digging into your foreign server.

From a practical perspective, I'm not sure any servers are safe anywhere...depending on who may want your data.

by markhahn 13 hours ago

I guess you mean assuming your data is stored somewhere in the clear (and whole).

I'm surprised there isn't a lot more attention to encrypted, distributed, erasure-encoded stores.
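
For readers unfamiliar with the term, here is a minimal sketch of the erasure-coding idea, assuming the simplest possible scheme: `k` data shards plus a single XOR parity shard, so any one lost shard can be rebuilt. The function names are made up for illustration; real stores typically use stronger codes (e.g. Reed-Solomon) and would encrypt each shard client-side before distributing it, which is left out here.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-size shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: list, original_len: int) -> bytes:
    """Rebuild the original data when at most one shard (marked None) is missing."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity coding can only repair one lost shard")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity shard and the zero padding.
    return b"".join(shards[:-1])[:original_len]
```

In this toy version each shard could live with a different provider or jurisdiction, and losing (or withdrawing from) any single one would not lose the data.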

by mrshu 19 hours ago

By far the most impactful product of the Apertus project is the people. To quote a memorable line from Dominique Paul (https://www.thisiscrispin.com/):

> What most people miss IMO is that this is not a team who is doing this for the fourth time like virtually any other LLM provider and who could learn from its own past experiences. I bet if the team would do another model training they could get way better results at one fourth of the costs.

by pferde 20 hours ago

For a model that claims to focus on many languages, it's quite unreliable when it comes to simple questions like "how to say X in language Y" or "how to conjugate verb X in language Y". It keeps hallucinating words that do not exist, and when corrected, it only hallucinates a new lie.

by 8note 19 hours ago

it probably doesn't know what language each set of words is referencing.

i doubt they are including a lot of training data labeled with the language.

"how to say X in language Y" is a different task from saying X in language Y

by einpoklum 10 hours ago

Actually, it isn't all that different. There are only two words separating "how to say X in language Y" from "say X in language Y". And this "vulgar" metric is actually quite relevant for an LLM, which answers based on conversational context.

by throwaw12 20 hours ago

Looks like their instruct models are Llama 3.1 fine-tunes from last year. Is there any progress on new models?

My last hope for sovereign AI is Chinese open models

by kordlessagain 20 hours ago

Sovereign AI is not about using just one model. It's about using the right model for the right job, and getting them to talk through the solution TOGETHER before presenting the answer.

If you want to mix models like this, check out https://github.com/deepbluedynamics/nemesis8

by wg0 13 hours ago

You might dismiss it as nothing but the Linux analogy does not work here either. It is more than that and a direct threat to commercial AI labs and their business model. These labs have been milking a bunch of foundational papers for years now and the end is near.

The way forward would be such open-source, open-data and open-recipe models, possibly someday even with the training, if not the inference, being crowdsourced like the BitTorrent model.

Lastly, even Chinese models (GLM, Deepseek, MiMax) work really, really well, and any user would testify that they do not miss OpenAI/Anthropic/Gemini at all if they're using those Chinese models, which is argument enough that with such models, no one is going to miss Chinese models either.

by zitterbewegung 16 hours ago

Sort of an interesting license; not sure if anyone will keep complying with it long term.

> The training data and the Apertus LLM may contain or generate information that directly or indirectly refers to an identifiable individual (Personal Data). You process Personal Data as independent controller in accordance with applicable data protection law. SNAI will regularly provide a file with hash values for download which you can apply as an output filter to your use of our Apertus LLM. The file reflects data protection deletion requests which have been addressed to SNAI as the developer of the Apertus LLM. It allows you to remove Personal Data contained in the model output. We strongly advise downloading and applying this output filter from SNAI every six months following the release of the model.
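
To make the clause above concrete, here is a rough sketch of what applying a hash-based output filter could look like. Everything about the format is an assumption, not something the license text specifies: the sketch pretends the file holds one SHA-256 hex digest per line, computed over lowercased, punctuation-trimmed phrases, and that matching is done on word n-grams of the model output. `load_blocklist` and `redact` are hypothetical helper names.

```python
import hashlib
import re

PUNCT = ".,;:!?\"'()[]"


def normalize(phrase: str) -> str:
    """Lowercase and trim surrounding whitespace/punctuation (assumed normalisation)."""
    return phrase.strip().strip(PUNCT).lower()


def sha256_hex(phrase: str) -> str:
    """SHA-256 hex digest of the normalised phrase."""
    return hashlib.sha256(normalize(phrase).encode("utf-8")).hexdigest()


def load_blocklist(lines) -> set:
    """Parse a hypothetical one-digest-per-line file into a set of hex digests."""
    return {line.strip().lower() for line in lines if line.strip()}


def redact(output: str, blocked: set, max_ngram: int = 3,
           mask: str = "[REDACTED]") -> str:
    """Mask every word that is part of an n-gram whose hash is in the blocklist."""
    # Keep whitespace runs as separate tokens so the original spacing survives.
    tokens = re.findall(r"\S+|\s+", output)
    words = [i for i, tok in enumerate(tokens) if not tok.isspace()]
    hit = set()
    for n in range(max_ngram, 0, -1):
        for start in range(len(words) - n + 1):
            span = words[start:start + n]
            phrase = " ".join(tokens[i] for i in span)
            if sha256_hex(phrase) in blocked:
                hit.update(span)
    return "".join(mask if i in hit else tok for i, tok in enumerate(tokens))
```

Under these assumptions, `redact("Contact Jane Doe today", blocked)` would mask both name tokens if the digest of "jane doe" is in the downloaded file. Hashing means the file itself never has to list the personal data in the clear, which is presumably why the license describes hash values rather than a plain blocklist.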

by reconnecting 19 hours ago

A chat interface where you can try Apertus:

https://chat.publicai.co

by einpoklum 10 hours ago

You will need to register with an email and password though, i.e. your sessions will be recorded and identified.

Also even after you do that, and start a chat, you currently get:

  "JSON.parse: unexpected character at line 1 column 1 of the JSON data"

so it's not quite there yet.

by Bobaso 6 hours ago

Apertus v1 performance was sub-par. The team is working on v2 at the moment. Looking forward to testing it.

by khalic 5 hours ago

I don't know, I'm implementing a translation system right now, and Apertus is very good for the model size. I wish they had added some chain-of-thought training to increase precision and context understanding.

by yreg 20 hours ago

previous thread: https://news.ycombinator.com/item?id=45108401

by jawns 18 hours ago

I am curious about how opt-outs and PII removal work.

Who confirms those requests are legit?

by naklitechie 13 hours ago

What's the community's take on Sovereign AI being funded by states around the world?

Why the emphasis on sovereign? Open is good enough. No?

by khalic 5 hours ago

It was in reaction to the possible threat of main actors restricting use. The latest US gov stunt with Fable just made it concrete and pressing.

reply

upvote

by luplex7 hours ago|

[-]

Sovereignty is a political buzzword. From the political point of view, you want your country to be as independent as possible. This means you need the capabilities to build and deploy good AI models. Initiatives like this are more about capability-building and less about LLM-building.

Why do we need capabilities in Europe? Because Trump and Xi can't be trusted to keep providing us with new frontier models in the coming years.

reply

upvote

by trvz20 hours ago|

[-]

The previous version of this model has been pretty bad, but claimed to adhere to copyright laws. However, based on my testing, that's not true either. So in my view this is completely useless.

reply

upvote

by dofm42 minutes ago|

[-]

So far the smallest model I have actually seen behave in a way that feels consistent with the contemporary LLM chat experience is Gemma 4 12B. (The QAT build particularly). The E4B model is not bad — it has a good conversational flow, it responds well if nudged — but the 12B model feels capable.

Nothing below that really seems to be good for anything other than training for specific tasks. I have not been impressed by the earlier Apertus 8B model, which doesn't feel like it really responds to nudges.

I am a strong believer in smaller models, so I might try one of these out of curiosity to see if it might do useful things in limited contexts.

reply

upvote

by embedding-shape20 hours ago|

[-]

As long as the following remains true, this release ends up a bigger contribution to science at large than most other models trained "behind closed doors":

> Fully open model: open weights + open data + full training details including all data and training recipes

reply

upvote

by coder54319 hours ago|

[-]

Is a recipe useful if no one likes it?

There are equally open, much more useful models out there: https://artificialanalysis.ai/?models=nvidia-nemotron-3-ultr...

reply

upvote

by khalic5 hours ago|

[-]

Nemotron still has partially closed data. Having multiple models to choose from is a good thing.

reply

upvote

by simonw20 hours ago|

[-]

It uses fineweb, which is derived from Common Crawl, which is an unlicensed scrape of web pages.

reply

upvote

by reedciccio5 hours ago|

[-]

You don't need a license to scrape the public web and analyze it, turn it into tokens and other transformations. Let's not expand copyright beyond the horrible monster it already is.

reply

upvote

by simonw4 hours ago|

[-]

I think it's likely that US law will continue to find training on scraped, unlicensed data to be legal.

That doesn't mean much to the many people I know of who refuse to use a technology that they see as being unethically created using the work of others without compensating them.

I continue to hope that someone will train a "vegan" model on licensed or out-of-copyright data so those people can experience the benefits of this class of technology.

(I compare them to vegans because, like vegans, I think their ethical position is credible and has merit even though I do not choose the same ethical framework for myself.)

reply

upvote

by markhahn13 hours ago|

[-]

I'm curious how you test; could you explain? Do you have a set of factoids that should be subject to copyright, but are somehow literally (whole work) generated by the model in question?

reply

upvote

by neom18 hours ago|

[-]

I'm curious to know what stuff like this means for Cohere. Their whole value prop is Sovereign AI. It seems they spent a lot of money developing models but own none of their own infra, so what is the point of a country spending a lot of money on Cohere's solutions when stuff like this is becoming increasingly available and usable? Feels like I must be missing something here.

reply

upvote

by uberex12 hours ago|

[-]

Being childish I https://oss.zuericitygpt.ch/?q=hello+talk+like+a+pirate

reply

upvote

by atemerev20 hours ago|

[-]

I use it extensively. It is not ready for agentic use, but as a generic driving model for RAG use cases, it is pretty competent. You can build useful software with it.

reply

upvote

by MASNeo19 hours ago|

[-]

I use Apertus, including as the driver for an agent (not a coding agent), and find it useful enough. What was your challenge?

reply

upvote

by atemerev8 hours ago|

[-]

Legal consulting.

reply

upvote

by dTal19 hours ago|

[-]

It's good that there is a movement for open LLMs, but it's not where the battleground is right now. The battleground is local vs service LLMs, and we are losing that battle badly despite all the software being here now and viable, entirely because UX sucks.

How many normal people do you know who use "ChatGPT"? A lot, probably.

How many even know what "Gemma" is, let alone have downloaded llama.cpp, a GGUF file from Hugging Face, and run "llama-server" from a text console with all the correct command arguments? How many are thinking about this use case when speccing out their next computer? Where is the breathless marketing copy boasting x tok/s?

We are sleepwalking into slavery.

reply

upvote

by 62746718 hours ago|

[-]

"Normal people" have never bothered to host their own photos, music, videos, documents, communications, etc., to the point that for many their computer is essentially a thin client into someone else's server. Why would we think these same people would care about "personal" inference?

reply

upvote

by trollbridge17 hours ago|

[-]

Normal people can go open an account at DeepSeek or Xiaomi and chat away for free. Or, for that matter, a couple other models like z.ai's (GLM-5.2 isn't in the free tier, though, but neither is GPT-5.5-Pro), or Qwen, which does have 3.7-Max for free with no account on their chatbot interface.

Yes, I realise this isn't "running a local model", but it's using models that can be grabbed and run locally. For my pipelines, I feel far more confident when I use an open model (even one like GLM-5.2 that would be expensive for me to run) since I have a backup plan if the hosted/cloud option becomes unworkable for me. If that happens to me with Opus, I have zero options.

reply

upvote

by cdata17 hours ago|

[-]

If our strategy to avoid "slavery" involves "normal people" taking the local-vs-managed choice seriously, we have already lost.

This choice is made for us. The deciding factors will be convenience and economics.

My sense is that just like Web 2.0 SaaS we are destined for servitude.

A better strategy is to play an asymmetrical game IMO. Don't let your would-be master write the rules by which you play.

reply

upvote

by yeeeloit16 hours ago|

[-]

> A better strategy is to play an asymmetrical game IMO. Don't let your would-be master write the rules by which you play.

What do you mean by this? Do you have an example in the given context?

reply

upvote

by 8note19 hours ago|

[-]

normal people don't really have the hardware to run local models

reply

upvote

by dTal7 hours ago|

[-]

Anyone with an M-series Apple computer can run something very competently. Mac Pro users can run 30B class models which is good enough for the vast majority of practical everyday purposes, far better than the original ChatGPT was. Anyone with a gaming computer is in a similar situation. The rest of us can still run stuff, just not as big or as fast.

reply

upvote

by sosodev18 hours ago|

[-]

They have it, we just haven’t enabled them. The smart model with a chat box is the wrong abstraction for local. Ideally we would have it built into applications as a clear and easy to use opt-in feature. Like allowing a user to index a folder on their hard drive and then search it semantically via embeddings. You could do that on fairly low end hardware these days. Like 2GB of RAM with any processor made within the last 10 years.
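To make the folder-indexing idea concrete, here is a minimal sketch of semantic search over local text files. It is not tied to any particular product: `toy_embed` is a hypothetical stand-in (a hashed bag-of-words vector) for a real embedding model, which is what you would actually plug in, and `index_folder` and `search` are illustrative names.

```python
import hashlib
import math
from pathlib import Path


def toy_embed(text, dim=64):
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def index_folder(folder, embed=toy_embed):
    """Embed every .txt file under folder; returns a list of (path, vector)."""
    return [
        (path, embed(path.read_text(encoding="utf-8", errors="ignore")))
        for path in sorted(Path(folder).rglob("*.txt"))
    ]


def search(index, query, embed=toy_embed, top_k=3):
    """Rank indexed files by cosine similarity to the query (vectors are unit length)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), path) for path, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

Usage would be along the lines of `search(index_folder("/some/folder"), "tax invoice")`. With the toy embedding this only matches shared words; the "semantic" part comes from swapping in a small local embedding model behind the same `embed` callable.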

reply

upvote

by manithree18 hours ago|

[-]

They may not right now, but the whole point of Microsoft's Copilot+ PC standard (even though it's somewhat anemic) is to run models locally. Apple Silicon with enough unified memory is capable. Not to mention modern iPhones and Pixels have fairly capable NPUs and routinely run local models. So, we may not be to the point where most normal people have the hardware to run local models, but it is rapidly approaching.

reply

upvote

by Danox15 hours ago|

[-]

As time goes on, there almost certainly will be very capable local models. In the long run we (general computer users) aren't going back to the era of mainframe computing, no matter how much OpenAI, Meta or Google would like us to.

reply

upvote

by dTal7 hours ago|

[-]

We aren't? Are you sure? Where is your email inbox? Where are your backups? Where are your music files? For most people the answer to all those is "someone else's computer".

reply

upvote

by trollbridge17 hours ago|

[-]

Gamers can run Qwen 3.6 quantised models now.

You would also be shocked what's possible on a 64GB Mac Studio, which isn't that unattainable.

reply

upvote

by conception17 hours ago|

[-]

Google Edge Gallery is turnkey and runs on the device most people use ChatGPT on. As with most Google stuff, "Edge Gallery" is maybe the worst name possible for "run AI on your phone"!

reply

upvote

by theptip18 hours ago|

[-]

Why do you feel the important part _now_ is where the weights get run?

I can see this as a future battleground but access to frontier models (which you cannot run locally) seems a lot more relevant today.

reply

upvote

by dTal7 hours ago|

[-]

Because the local LLMs available today are already fantastic, and the difference between no LLM and an open-weights LLM is much larger than the gap between an open LLM and a so-called "frontier" model.

It's important that people get used to the idea that your interactions with a language model are a highly personal thing. LLMs can perceive and categorize us in ways we can't even imagine, far more violently than the simple algorithmic feeds which have already corroded public discourse so much. LLMs can control us. LLMs warp the information landscape more radically than even the internet did. Even now you are likely underestimating their role in future society.

The principles of software freedom are becoming existentially important.

reply

upvote

by itkovian_17 hours ago|

[-]

You can't run a closed LLM locally. Strange to frame the dichotomy as between local and open. One begets the other.

reply

upvote

by idiotsecant19 hours ago|

[-]

Better UX does not buy you a datacenter farm to train state of the art cutting edge models. Right now the only people who can do that are the technobility class.

reply

upvote

by dTal19 hours ago|

[-]

It does not, but it might encourage more people to care. Worrying about training is a luxury when you are starting from a baseline of "OpenAI spies upon me and controls my access". Let's focus on getting every Tom, Dick and Harry 1) on board with LLMs, because they're happening, 2) habitually using local software.

reply

upvote

by trollbridge17 hours ago|

[-]

The same used to be true of being able to program computers and compile software.

Of course the frontier will always be unattainable, but that's like pointing out that I couldn't buy my own Cray supercomputer.

reply

upvote

by azinman218 hours ago|

[-]

> We are sleepwalking into slavery.

That’s a bit hyperbolic…

reply

upvote

by MrDrMcCoy17 hours ago|

[-]

Some hyperbole is useful. The problem is real and serious, though short of the specific verbiage.

reply

upvote

by 0gs18 hours ago|

[-]

it's funny because i made this thing (called enough) that aims to make it easy for non-technical people to get up and running with local models quickly, but it is impossible to figure out how to break through the noise. every thread and comment like this breaks my heart a lil bit

reply

upvote

by dTal7 hours ago|

[-]

Link? You have to tell us if you want to break through the noise!

reply

upvote

by 0gs4 hours ago|

[-]

sure! github.com/0gsd/enough ; enough.support has some FAQs. i did post a Show HN on it and i intend to do so again sooner than later haha

reply

upvote

by double0jimb019 hours ago|

[-]

Yea, anyone who understands what makes products actually usable is opting to get paid for said skill.

reply

upvote

by bsder17 hours ago|

[-]

> we are losing that battle badly despite all the software being here now and viable, entirely because UX sucks.

Yep. I'm an old time Linux sysadmin, but I am COMPLETELY baffled as to what I can or cannot run on my 32GB R9700 with 128GB main CPU memory.

If I want something Claude or Codex like what do I use that would be useful? If I want a chat system, what do I use? Images--apparently ComfyUI for setup but after that what do I do?

I don't even mind spinning up something in the cloud for a bit, but I need to know how I'm going to get data up and down without racking up massive bandwidth charges.

I'd love to do some tinkering, but the field is moving so fast and so full of charlatans that cleaning the dross out is almost impossible.

reply

upvote

by entrope6 hours ago|

[-]

For coding, Qwen3.6-27B with MTP should fit in 32GB with almost full context length for Unsloth's 5-bit quantization. That's my preferred choice for a local coding agent on similar hardware: the quality delta compared to a MoE model is IMO worth the extra wait. (And I haven't found a model with 70B-120B parameters that works better for coding.) For general chat, maybe gpt-oss-120b? It should have more general knowledge than a 30B-class model; I've used it to suggest itineraries for trips and to review the completeness of small requests for proposals.
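The "fits in 32GB" claim can be sanity-checked with back-of-the-envelope arithmetic. This is a rough sketch only: `weight_memory_gb` is a hypothetical helper that counts the quantised weights and nothing else, so it ignores KV cache, activations, and runtime overhead, and real quantisation formats store scaling metadata that pushes the effective bits per weight somewhat higher than the nominal figure.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate size of the quantised weights alone, in GB (10^9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


qwen_27b_5bit = weight_memory_gb(27, 5)
print(f"27B parameters at 5 bits: ~{qwen_27b_5bit:.1f} GB of weights")
print(f"Left over on a 32 GB card: ~{32 - qwen_27b_5bit:.1f} GB for context and overhead")
```

By this estimate the weights take roughly 17 GB, which is consistent with the comment's point that a long context can still fit alongside them on a 32 GB card.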

I don't have recommendations for images because I haven't played with those.

reply

upvote

by markhahn13 hours ago|

[-]

these days, even completely mainstream distros (Fedora here) include ollama, which leverages a wide range of hardware and range of models. (it's generally useful to install a more recent ollama, though.) there are free coding harnesses too.

reply

upvote

by dTal7 hours ago|

[-]

ollama is just a wrapper around llama.cpp, and a pretty janky one at that. You're much better off using it directly.

reply

upvote

by wmf19 hours ago|

[-]

LM Studio

reply

upvote

by JSR_FDED15 hours ago|

[-]

From a sovereign AI perspective, how does this compare to Mistral?

reply

upvote

by luplex7 hours ago|

[-]

It's a different country, and Switzerland is not even in the EU.

reply

upvote

by JSR_FDED7 hours ago|

[-]

True. What France (as an EU member) and Switzerland (not an EU member) share is a desire for sovereign AI. I am interested in how their efforts compare, and how their LLMs thus far compare.

reply

upvote

by holistio17 hours ago|

[-]

Knowledge cutoff is March 2024. Incredible.

reply

upvote

by uberex12 hours ago|

[-]

Does anyone care about this anymore, with context windows and tool harnesses?

reply

upvote

by pizlonator15 hours ago|

[-]

> compliant at scale

The jokes write themselves.

reply

upvote

by _pdp_20 hours ago|

[-]

I want to believe.

reply

upvote

by david_shi17 hours ago|

[-]

These models don't seem very competitive, who's their target audience?

reply

upvote

by poplarsol17 hours ago|

[-]

Europeans who fetishize "compliance".

reply

upvote

by markhahn13 hours ago|

[-]

residents of the universe who recognize the US as a supply-chain risk.

no, actually, from the docs it sounds mainly motivated by the country's unique linguistic requirements.

reply

upvote

by dangoodmanUT18 hours ago|

[-]

How are they going to be competitive with top models at 70B size?

reply

upvote

by kennywinker13 hours ago|

[-]

Qwen et al. show that size isn't actually the only useful metric for an LLM.

reply

upvote

by nisten17 hours ago|

[-]

As an open-source AI researcher with a lot of models and datasets on Hugging Face, I am very appreciative of these types of projects, but we are ignoring the elephant in the room here (or lack of one)

the swiss have no gpus

reply

upvote

by T-A10 hours ago|

[-]

the Apertus model was trained on the Alps supercomputer, operational at CSCS since September 2024, a data center of over 10'000 top-of-the-line NVIDIA Grace-Hopper chips

https://log.alets.ch/110/

reply

upvote

by kennywinker17 hours ago|

[-]

How is this a real problem? Genuine question, because i don’t really understand the urgency of everyone buying up ram and gpus as prices for those skyrocket.

I can run the 8B version of this swiss-ai model on a ten year old GPU. For the larger one, $2000 consumer hardware can run it fine. Beyond that, there are plenty of places where time on a GPU can be rented, and if the model is good, there will be hardware to run it.

reply

upvote

by pu_pe10 hours ago|

[-]

You can run it, but you can't train it. While this type of toy model could actually be trained on Swiss equipment, a state-of-the-art LLM probably could not.

My charitable reading of GP's point is that the bottleneck for true compute sovereignty is the chips, not the models.

reply

upvote

by khalic5 hours ago|

[-]

Do some research before posting that kind of stuff

reply

upvote

by markhahn13 hours ago|

[-]

why do you say the Swiss have no gpus?

reply

upvote

by markab2118 hours ago|

[-]

I'm mildly surprised that more people aren't using Nemo models for this reason. We've moved most of our processing to a combination of Nemo Ultra and Super, with some support for multi-model-specific tasks on Omni. The setup is working REALLY well for us, and I'm comfortable with the more measured pace of improvements. We work with many long-context problems, and the ecosystem is great.

There were a number of use cases where we needed to use Gemini (audio modality), and Ultra has been a VERY cost-effective alternative once we got through the nuances.

reply

upvote

by firstrowraver10 hours ago|

[-]

apertvs.ai? seriously?

reply

upvote

by andrewshadura12 hours ago|

[-]

Not to be confused with Apertium and Apertis.

reply

upvote

by iamyemeth8 hours ago|

[-]

> Conclusion There are 2 r's in the word "strawberry".

Not looking good so far

reply

upvote

by sigmoid108 hours ago|

[-]

I guess they still use a tokenizer? Why would this kind of issue be solved? The model fundamentally can't see the word character by character like you do. For o200k tokenizers for example, what the model sees are 3 tokens: [302, 1618, 19772]. These are shown to you as ["st", "raw", "berry"] in the UI. The only way any model can infer individual characters is by using external tools or implicit knowledge picked up during training or (what many of the big labs apparently do) special training for these edge cases that fail once the next special case comes along.
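A toy illustration of the point above. The vocabulary below simply reuses the splits and IDs quoted in this comment; they are not verified against a real tokenizer, and `toy_tokenize` is a hypothetical greedy longest-prefix matcher, far simpler than real BPE.

```python
# Toy vocabulary: splits and IDs are the ones quoted in the comment above,
# not checked against an actual o200k tokenizer.
TOY_VOCAB = {"st": 302, "raw": 1618, "berry": 19772}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-prefix match against the toy vocabulary."""
    ids = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids


token_ids = toy_tokenize("strawberry")
print(token_ids)                # the integer IDs are all the model receives
print("strawberry".count("r"))  # the character-level view the model never gets
```

Nothing in the ID sequence encodes how many times "r" occurs, so the count has to come from memorised associations or from an external tool that works on the raw string.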

reply

upvote

by maxloh19 hours ago|

[-]

Great to see more fully open LLMs.

I think a problem with open-weight models is that while you can improve them, you are not going to create the next generation of LLMs by fine-tuning. We are at the mercy of frontier labs for access to SOTA LLMs. For example, Anthropic recently started requiring identity verification for Claude [0], same for OpenAI [1].

If one day China's distillation labs stop releasing their LLMs as open-weight, I doubt American labs will continue to release free LLM weights without that competition.

That's where fully open pipelines shine: they enable the community to create the next generation of SOTA LLMs. That is the only way LLMs truly become sovereign.

[0]: https://news.ycombinator.com/item?id=48618455

[1]: https://news.ycombinator.com/item?id=48618606

reply

upvote

by anon37383919 hours ago|

[-]

> China's distillation labs

This notion that Chinese labs are merely distilling frontier models is quite an unwarranted slur. Those labs have published WAY more useful research than US labs on RL techniques, novel model architectures, training pipelines, etc. They have also hit intelligence-per-parameter densities that US labs have yet to attain.

Apart from that, merely training a model on outputs from another model, off policy and without the logits, doesn’t really work that well.

The Chinese labs know how to build frontier level models. GLM-5.2 shows that they no longer even need Nvidia chips to do it.

reply

upvote

by trollbridge17 hours ago|

[-]

It's one of those lies people tell themselves to make themselves feel better. "Oh, they're just copying my stuff."

Chinese labs are basically just telling everyone, out in the open, what they're doing and how to do it, and the answer from American frontier labs is "Well, they couldn't possibly be getting the results they're getting without just distilling our models," and the American labs aren't even trying to do some of the stuff like DS's aggressive caching to get costs down.

reply

upvote

by Vaslo18 hours ago|

[-]

I recently watched a video of one of these "Chinese models"; it kept insisting it was Claude when the user asked. Sorry, there's no "slur" here, just legitimate suspicion.

reply

upvote

by c0rruptbytes18 hours ago|

[-]

https://blog.kilo.ai/p/did-claude-opus-48-distill-alibabas

it happens to all models…when the internet is increasingly generated, things happen

reply

upvote

by anon37383917 hours ago|

[-]

These anecdotes where someone gets the model to claim it is X model are meaningless. (Claude also has been known to claim it is Deepseek when asked in Chinese.)

reply

upvote

by trollbridge17 hours ago|

[-]

As anyone who's tried to write an AGENTS.md that says "Place an Assisted-by: git trailer that contains the harness you're using:whatever model this is" knows, such a naive approach often results in a seemingly random model name.

reply

upvote

by halJordan18 hours ago|

[-]

But have they? I understand that the Chinese side is illuminated and the American side is dark. I disagree that the Chinese labs have created anything that isn't in an American research lab or production dc. Sure the Chinese have published their findings and not for nothing. But are they novel? Unlikely imo

reply

upvote

by chriskanan18 hours ago|

[-]

They are doing a tremendous amount of novel research, to the point where American AI companies have "war rooms" to study their papers and models, while American labs publish next to nothing. They often have to do more with less. As an AI researcher, I see Chinese labs doing tremendous good for science, whereas some American companies (and I'm American) seem to think only they are able to do AI research responsibly (I've been working on neural networks for 25+ years). I'm pretty sure Fable sabotaged my research codebase (see the news stories about this).

reply

upvote

by david_shi14 hours ago|

[-]

Whoa, say more about Fable sabotaging your codebase?

reply

upvote

by dofm19 hours ago|

[-]

> We are at the mercy of frontier labs for access to SOTA LLMs

I disagree with this use of SOTA, and this topic is why.

Anthropic and OpenAI have “cutting-edge” models. These are beyond the state of the art but they are closed, secretive, hard to quantify.

The "state of the art" is open source, open weights models that can be inspected, studied, shared and critiqued, because that is what is meant by "the art": it is the knowledge and principles and evidence and materials available to all. The "state of the art" is the highest point of that.

I wish we could make this distinction and stop blessing two secretive, unverifiable loss-making companies with so much power.

(Putting that aside, I suspect, without evidence, mind you, that the endless march to solving models by making them bigger is not the solution anyway.)

reply

upvote

by MangoCoffee17 hours ago|

[-]

SOTA LLMs are less important than cheap tokens, and Chinese AI labs are releasing models that are only about 6-8 months behind American AI labs.

Chinese models like GLM are getting better at coding tasks, and they're cheaper. Microsoft's GitHub Copilot has had to switch to token-based billing. The cost of AI has increased since agents came into play. Whoever can offer cheaper tokens to do the task will win.

Even Microsoft is looking into DeepSeek for cheap tokens.

https://www.axios.com/2026/06/16/microsoft-copilot-cowork-to...

reply

upvote

by sockaddr19 hours ago|

[-]

Sorry, but I think your requirement that something only be "the art" if any arbitrary person can critique it is off. The frontier labs are working on the state of the art, but it's just art that you aren't allowed to see. Unfortunately.

reply

upvote

by dofm19 hours ago|

[-]

It is work using the principles of the art, obviously.

But "state of the art" implies the highest state of general availability, not just in terms of access to some product, but of use of the ideas, concepts, methodologies etc.

Anthropic and OpenAI have "cutting edge" models; the state of the art is behind the cutting edge.

The state of the art is the best open source, open weights model available. More or less by definition.

I am probably tilting at windmills here.

reply

upvote

by bnj18 hours ago|

[-]

I appreciate this distinction. There are multiple senses of SOTA, and one that has been taking on greater mindshare is as a synonym of "the best available". By rebasing on SOTA as generally available and understood versus cutting edge, which has limited distribution and leads the way, we expand the vocabulary we have available to describe what's going on. Thanks.

reply

upvote

by toss117 hours ago|

[-]

That's an interesting and possibly useful distinction, but it seems unique to you. Spreading it as "We should categorize the AIs this way" would be a good argument.

But the way SOTA is generally understood by other users of the language, it refers to exactly the team, technology, & techniques defining the cutting edge in any field, regardless of whether the technology & techniques are available outside of that team...

reply

upvote

by dofm7 hours ago|

[-]

Not so much, it turns out.

https://english.stackexchange.com/questions/239963/do-state-...

reply

upvote

by 8note19 hours ago|

[-]

the art is the standard engineering practices that go into building the thing

it's things you would be trained in as part of a bachelor's degree and some graduate coursework

reply