I am wondering what is keeping them back, though: Money? Compute? Skills? Training data? My fear is that you are really only getting really good models by training on very dubious data (outputs from the frontier models etc) and that Mistral is too European and too enterprisey to take those risks.
Or at least there’s been a lot of noise about that.
I don't think that was meant to be implied: the EU actually has access to more GPUs than those hosted by European companies in Europe, just as US labs have access to GPUs hosted outside the US.
Meanwhile, Anthropic and OpenAI have investors practically begging them to let them buy this much equity at mind-boggling valuations.
It's not impossible, but China is just much better set up for the necessary level of government support.
I've never heard or read anything about the EU planning on investing money in Mistral. They're a private company. They're French. It honestly sounds kind of absurd.
If you're going to make that claim at least put some effort in.
I already checked for one variation of a google search like I said.
Can you show some proof you did anything at all?
Not ruthless enough and no backing by a corrupt govt administration that has no morals but focuses on self-enrichment instead.
Might sound drastic but I think that's actually closer to the truth than everybody likes to admit.
> My fear is that you are really only getting really good models by training on very dubious data (outputs from the frontier models etc) and that Mistral is too European and too enterprisey to take those risks.
Exactly.
I think a European company, taking Chinese models, perhaps doing its own post-training on them and training the Chinese-ness out, with a great chat service, enterprise API and coding agent, could be pretty valuable in itself.
Considering all their talk about new DCs and compute, and a few offhand comments, it sounded to me that compute is a big limitation.
All of the above and more. Everything holding Mistral back is the same thing that has held Europe back from competing in the entire digital revolution. See this 1991 article lamenting the loss of any viable European PC manufacturer: https://www.nytimes.com/1991/04/22/business/europe-stumbles-...
Mistral being in Europe is disadvantaged with:
1. Money: less diverse private pension fund environment = fewer LPs to invest in VC funds = fewer VC dollars to invest in new ventures. European money is vacuumed out of the private sector into state pension funds and dumped into low yielding government bonds. This starves the private sector of capital while inflating the % of GDP driven by government spending every year (government pension funds buying government bonds in circular fashion enable runaway deficit spending...just like circular AI infrastructure spending).
2. Talent & compute: due to #1, Silicon Valley can outbid Europe for the best talent and hardware. Watch an OpenAI launch video and listen to all the European accents.
3. Local market fragmentation: Europe is a collection of countries that pretend to work together while not even having a unified capital market. The average EU citizen can barely communicate with their neighbor in a common language beyond the level of a toddler (English fluency is massively overstated by Americans who only experience tourist capitals).
4. Regulatory disadvantages: In everything from company regs, employee regs, unions, privacy regs, data portability regs, etc.
It's not "culture" or Europeans being "lazy" as most people would claim. There are currently thousands of young French people working 80 hour weeks creating dumb consulting powerpoints or legacy investment banking deal memos as we speak. Ambitious people exist everywhere in equal proportion, they're just working on the wrong things.
Europe can't compete in the digital revolution the same way they could compete in the industrial revolution due to various system design choices. Culture is simply the aesthetically observed byproducts of system design.
Not true in my experience: even German waiters in small towns tend to have pretty fluent English.
Edit: more broadly, there’s just more friction when people aren’t in their first language. I know I hesitate to bring up some things, say hi to strangers, try making a joke, etc because the cost of talking is just… higher.
The German speaking members of our group had to order food for us in most restaurants.
And most locals aren’t waiters in restaurants.
There is definitely a lot of truth to that. Maybe a bit of an arbitrary measure, but these are the nationalities of the people that wrote the "Attention is all you need" paper. Pretty revealing I find:
Ashish Vaswani: India
Niki Parmar: India
Jakob Uszkoreit: Germany
Llion Jones: Wales (UK)
Aidan Gomez: Canada
Łukasz Kaiser: Poland
Illia Polosukhin: Ukraine
Noam Shazeer: USA
Personally, I would much rather have good public pensions and healthcare than AI agents.
The US also has public pensions (social security payouts rival or beat many EU countries) with dramatically better tax free private options on top.
Also, the US has free healthcare (Medicare and Medicaid) for roughly 50% of its population.
Expanding that to 100% doesn’t suddenly make them a bad country to do business in.
You think OpenAI is going to close up shop and move to Mexico if the US expands single payer healthcare? That would actually make it even easier for businesses to operate in the US!
Explain to me how expanding US single payer healthcare suddenly makes the US a worse place to do business in than Europe?
Companies would love not having to deal with the complexities of 401ks and employer health plans.
There are supposedly streamlined paths for local residents, but I had to go through the standard corporate pipeline. I spent three months fighting a bizarre catch-22 between my notary (who cost €3k+) and the bank. To open the account, I had to prove I deposited €10k in capital. But I couldn't make the deposit without an active bank account. On top of that, the bank's compliance team kept arbitrarily canceling my application due to "incorrect answers"... refusing to tell me what the errors actually were and forcing me to restart the entire process ab initio.
I finally just gave up. I wrote off the €3,000 notary fee and €1,000 in registered office costs as a sunk cost, and incorporated a US LLC instead. It took under 10 minutes, no notary, fees of $25 since I did it myself, plus another 20 minutes to open the business bank account.
There was no commercial reason to choose Austria; it was purely sentimental. My ancestors were entrepreneurs in Linz and Vienna, and I loved the idea of renewing that legacy. But the sheer weight of the bureaucracy managed to kill about 99% of the early-stage startup enthusiasm you normally rely on to get a new project off the ground.
It's a bizarre system that Switzerland uses too. I've done it twice. Unfortunately the German speaking world has a lot of rules that are trying to eliminate all risk for investors and employees. The GmbH/AG capital requirements are just the start.
The next fun thing you might have encountered, at least in Switzerland, are rules that literally say your company's assets can't fall below 50% of your initial capitalization. If it does you're supposed to raise funds or make more investment of your own private capital and this rule pierces the usual liability requirements. Even more fun: it turns out that this law isn't actually enforced and locals regularly ignore it. But bad accountants won't tell you that. They'll just inform you of the law when you do your yearly accounts.
Then you have wealth taxes that cover the valuation of a startup as if it were a cash position. So if you raise $100M in investor funding then whatever shares you have left over are considered to be liquid assets you can offload at will, and are wealth taxed as such. The fact that the shares don't trade in a liquid market is irrelevant to the tax authorities. In Zürich at least that got patched by the local tax office deciding that startup shares aren't counted for the wealth tax, but this just means you have to be able to convince the tax authority that your company is a startup. The way they determine this is more or less just the opinion of whoever at the tax office assesses your case. Does it sound "startuppy" enough?
Fixing this stuff isn't hard, but it never gets fixed because European politics is both quite stagnant and dominated by people who view hostility to business as a virtue signal. They don't want to fix it because they think businesses are sort of like oil fields. They just exist, lying around naturally, and the only question is how to maximally exploit them.
This is tangential: and forgive my ignorance here, but is there an inherent reason why there aren't smaller, focused models from the frontier model providers?
I'm thinking something like a software-specific subset of Opus that is the default for use in Claude Code. Smaller, cheaper to deploy and consume, maybe faster.
It turns out coding has to do with a lot of the same reasoning needed in math or in legal analysis, even if the grammatical expression is different.
This is less true of lower intelligence tasks. Classification requires a lot less reasoning capacity and so can be much smaller and more specialized.
Or I guess more to the point: is this something frontier labs have said is (or tried to paint at any rate) problematic? This feels like an "out of the loop" situation because I've only ever heard "distillation" with a positive connotation before.
> You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not:
> [...]
> * Use Output to develop models that compete with OpenAI.
Source: https://openai.com/policies/row-terms-of-use/
(I'm also curious whether they consider developing a competing model to be illegal, or harmful, or abusive...?)
Given that OpenAI doesn't care about training on copyrighted data, why is suddenly their ToU something anyone should care about?
On a more risk-strategy level there is the size of their legal team, general endowment, and supplier and political connections to consider.
Everyone is free to ignore their ToU, but I can understand why a company would avoid it...
Yes that's what should be said to OpenAI. Now they should not cry about their T&Cs not being respected when they never cared about others' copyrights.
It's like saying you can't use windows to develop an OS, or drive a Ford on the way to your job at Hyundai.
Mistral looks like it's fading away to irrelevance unless they can play alongside the similar sized models, or have some unique advantage other than being in Europe, for Europe. I was really excited for them back when they were a startup that had the biggest European venture round ever. This space will have a few winners, and many losers. Google, plus either Anthropic or OpenAI most likely. Big models will see breakthroughs in inference performance/cost fall precipitously and small models will only exist on devices (Pixels and iPhones, cars, watches, bluetooth speakers, etc).
> This is a race and nobody will care or remember how the winners got there.
It seems like the EU should have paid China for the distillation datasets, esp. since Mistral isn’t even a governmental org.
For consumer AI, yes. For coding assistants, probably.
For specific application "business" AI like the things Airbus announced the other day? Not at all. What matters for an Airbus using Mistral to build compliance documentation based on AI generated physics simulations is the enterprise relationship, reliability, compliance, forward deployed engineers helping with the fine tuning, quality, predictability, support. A Chinese lab having a cheaper model that scores better on benchmarks is just irrelevant for that.
And IMO, the real money in AI is this type of "business AI" deployment. Developer tooling tends to converge on becoming commoditised. Once you're a core supplier for a big bank and embedded in their processes, you're there until you screw up with the pricing (like Broadcom), and even then.
I wanted to try out Mistral, but I fail to find anything like that even after creating an account
Then you can install their coding harness, I personally used the Python + uv option: https://mistral.ai/products/vibe/code/ if you don't have uv yet, you might have to install it too: https://docs.astral.sh/uv/ though I already use it for other projects. Oh and if on Windows, you probably want to do all of the installation inside of WSL, just so that file paths are the *nix variety, I've had issues otherwise with pretty much every coding harness, like OpenCode as well (across multiple models).
After that, you need an API key for your subscription, you can generate and copy it here: https://console.mistral.ai/codestral/cli that's also where you see the quota, though it seems to NOT refresh instantly, but more or less a few times a day.
Either way, happy coding!
It's a very charitable take, as Mistral has never really left the realm of irrelevancy.
It's only a matter of time before the EU falls back to hosting Chinese models in EU datacenters.
Even though Mistral 4 has 6B active parameters per token (allowing 3-3.5 per token parameters to be loaded on a 4090), the ~240GB download + storage is pushing the limits of being able to try this out locally, especially if you are downloading and evaluating multiple models.
It also makes it harder for other people to make downstream finetunes like with what happened with the older Mistral/Magistral models.
They'll end like Dailymotion, just a zombie company.
Foundation model labs should be building very large reasoning models, then leaving it to the community to distill them down.
You can't scale a small model up, but you can scale a large model down.
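For anyone unsure what "distill them down" means in practice, here is a minimal sketch of the classic logit-matching form: the small (student) model is trained to match the large (teacher) model's softened output distribution. This assumes you have the teacher's logits, which you get with open weights but generally not from a hosted API, where "training on outputs" usually means fine-tuning on generated text instead. The function names and the temperature value are illustrative choices, not anything a particular lab has published.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual scaling that keeps gradient
    magnitudes comparable across temperatures (assumed convention).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over a 3-token vocabulary for one position.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 0.5]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as they diverge; a real training loop would average this over many token positions and backpropagate through the student only.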
I'm convinced the only way we'll have a seat at the table in the future and avoid total runaway takeoff is if there are very large models within 80% of the capabilities of the frontier models. Tiny RTX models do diddly squat to remain competitive.
Build open weights models for running on H200s. I'll spin them up on RunPod or Lambda.
I have used Mistral models out of pure ideology for web agents and the like which aren't doing a lot of heavy lifting.
Our evals are pretty complex so we only recently started testing ~30B class models, which are now becoming quite smart (on par with the frontier from 1 year ago). Mistral is far behind, but I'm rooting for them.
Data at https://gertlabs.com/rankings
Fully agree with your point though, Mistral in general is far behind where I'd expect and Qwen in particular is crushing it at the smaller sizes.
Personally, I'd consider anything 20B params and above a "medium" model. Small being <20B and large >100B. I think obviously we can get to the huge 1-2T param models, but frankly the margin of accuracy improvement for the speed hit is kinda insane (1-2% for many metrics).
1. tiny <2-3B -- easily runnable on lower-spec hardware
2. small 4-8B -- runnable on 8GB GPUs
3. medium 9-12B -- runnable on 12GB GPUs
4. large 13-24B -- runnable on 16GB (for the lower end models) and 24GB GPUs
5. very large 25-32B -- runnable on 32GB GPUs
6. huge >32B -- not easily runnable on consumer GPUs without compromising performance (offloading layers to the CPU/RAM), quality (heavy quantization, esp. at <= Q4), or price (investing in multi-GPU setups and/or server-grade hardware).
You could possibly split huge down further, as 70B models (e.g. llama 3) are easier to get working than >120B models and 1T models are completely intractable.
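A rough way to sanity-check tiers like these is the back-of-envelope arithmetic: weight memory is about parameter count times bits per weight divided by 8. The sketch below does just that. The 20% overhead for KV cache and activations is my own assumed placeholder, not a measured figure, and real usage varies with context length and runtime.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8

def fits(params_billions: float, bits_per_weight: float, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead fit in the given VRAM.

    overhead_fraction is a hypothetical allowance for KV cache and
    activations; tune it for your own runtime and context length.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb

# Weights only: an 8B model at 16 bits vs. a 70B model at 4 bits.
print(weight_memory_gb(8, 16))
print(weight_memory_gb(70, 4))

# Does a 24B model at 4-bit quantization fit a 16GB card under these assumptions?
print(fits(24, 4, 16))
```

Under these assumptions a 70B model at 4 bits still needs more than a single 24GB card, which is the point where offloading or multi-GPU setups come in.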
1. tiny <2-3B -- could run in a browser even, mac neo
2. small 4-8B -- last of browser options, MacBook Air base
3. medium 9-24B -- 32GB machine, air or pro notebook or mini
4. large 25-48B -- 64GB, pro notebook or mini
5. x-large 49-100B -- 128GB MacBook Pro or Studio
6. Huge > 100B -- 256/512GB Mac Studio
Or a phone. I’m running Gemma 4 E2B in one of my apps on my 14 pro (which may or may not be killing my display through overheating. It might just be a coincidence).
I don’t really disagree with your post, but this is not exactly right. That subreddit seems to go from hype train to hype train every week, I haven’t found anything really insightful in it for quite a while now.