by amazingamazing 18 hours ago

by lebovic 17 hours ago

It's too late to prevent distillation of some capabilities, like writing code or finding vulnerabilities [1].

But an AI lab can continue to produce immense economic value without releasing the model publicly for potential distillation. For example, it could use a model solely in-house to develop therapeutics.

Hopefully there's a future where others can access frontier models, but it's not necessary if preventing proliferation through distillation is considered more important.

[1]: See the notes on distillation in https://dualuse.dev/posts/export-controls-on-fable

by bandrami 13 hours ago

My long-term prediction for the sector is that frontier models will be so expensive that they will only be available for grant-funded projects at research institutions, like supercomputer clusters were 25 years ago.

by wqaatwt 9 hours ago

Why? It depends. Most evidence suggests that Anthropic and OpenAI are making a lot of money on inference, so the question is whether it's more profitable for them to sell 100X tokens for Y or 1X tokens for 100Y. In most industries with high fixed costs, low variable costs, and unlimited scalability (like LLM providers), the first option ends up being much more profitable.

by bandrami 9 hours ago

Literally nobody is making money on inference.

by wqaatwt 9 hours ago

Based on what? There isn’t a lot of evidence that that’s the case.

Prices on OpenRouter for GLM and other large open models indicate that Anthropic/OpenAI must have pretty high gross margins even if their models are several times more expensive to serve.

It wouldn’t make sense for any provider to host large open models and then lose $10 on every $1 they make, since they don’t have infinite VC money or any business model that would justify it.

by bandrami 9 hours ago

If they had high margins they wouldn't be issuing senior debt with an 18.5% coupon payment (and failing to fully subscribe it), nor would they need Elon to give them two months of free compute in order to appear profitable for a single quarter.

by wqaatwt 6 hours ago

We were talking specifically about inference, and I don’t think there is any indication that their gross margins on the API tokens (if not the personal subscriptions) are negative?

Obviously they have R&D and other fixed expenses that make the company itself highly unprofitable, but that’s only semi-tangential.

by bandrami 6 hours ago

No I mean Anthropic has only claimed a profitable quarter based on xAI giving them two months of free compute, and both Anthropic and OpenAI are counting discounted revenue as actual revenue. They haven't found a way to sell inference for less than it costs them yet, and when they tried earlier this quarter their customers bailed.

by wqaatwt 6 hours ago

Well, again, you are mixing up inference costs and their other, mostly fixed, expenses (in addition to sales and marketing).

Is there any indication that if they could sell X * N more tokens than now at the same (or even quite a bit lower) price they wouldn’t become profitable as a company?

> They haven't found a way to sell inference for less than it costs them yet

Based on what? I only see evidence to the contrary.
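To make the gross-margin versus company-level distinction concrete, here is a minimal sketch with entirely hypothetical numbers (none come from Anthropic, OpenAI, or any filing). It assumes a constant per-token price, a constant per-token serving cost, and fixed costs that do not grow with volume. The function names are made up for illustration.

```python
def company_profit(tokens, price_per_token, cost_per_token, fixed_costs):
    """Company-level profit: volume times per-token gross margin, minus fixed costs."""
    return tokens * (price_per_token - cost_per_token) - fixed_costs


def break_even_tokens(price_per_token, cost_per_token, fixed_costs):
    """Volume at which gross profit covers fixed costs.

    Returns None when the per-token margin is zero or negative, because
    in that case no amount of volume makes the company profitable.
    """
    margin = price_per_token - cost_per_token
    if margin <= 0:
        return None
    return fixed_costs / margin


# Hypothetical figures, chosen only to illustrate the shape of the argument.
PRICE = 10.0    # revenue per unit of tokens sold
COST = 4.0      # serving (inference) cost per unit of tokens
FIXED = 600.0   # R&D, training, and other volume-independent costs

if __name__ == "__main__":
    for volume in (50, 100, 200):
        print(volume, company_profit(volume, PRICE, COST, FIXED))
    print("break-even volume:", break_even_tokens(PRICE, COST, FIXED))
```

Under these assumptions a company can be unprofitable overall while every token sold has a positive gross margin, which is the case the two sides of this subthread disagree about. If the per-token margin is actually negative, `break_even_tokens` returns `None`.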

by nonethewiser 17 hours ago

I’m not so sure, because we only seem to see distillation from China. What’s preventing tech companies from the UK, Germany, etc. from distilling Claude, GPT, etc.? Do they simply lack the ability to?

Point being there may be no technical solution but there may be a political one (theoretically).

by sailingparrot 16 hours ago

Meta Spark is rumored to have distilled Claude to some extent, early Gemini models as well. I think the biggest factor is that Chinese companies aren't really afraid of being sued by Anthropic because the jurisdictions are so disconnected. European/US companies don't have the same protection.

by avd20116 hours ago

Aside from politics/law, it's probably much easier for everyone else to distill from the Chinese model which already distilled Claude/GPT/Gemini. Maybe not as good a result, but you don't need to jump through dozens of hoops.

by 1415 hours ago

This reminds me of the whisper game played in elementary school. It starts with a sentence: one person whispers it to the next kid, who whispers it to the next, and so on around the circle until the last kid has to repeat the sentence. Hint: it was never once even close to the starting phrase. I would love to see what one model copying another model that is again copied however many times would look like in the end.
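As a toy illustration of the whisper-game intuition only, the sketch below copies a sentence through several generations and randomly corrupts a fraction of characters each time. It is not a model of how LLM distillation actually degrades. The independent per-character noise, the `error_rate` parameter, and all function names are assumptions made for the example.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "


def noisy_copy(text, error_rate, rng):
    """Copy text, replacing each character with a different one with probability error_rate."""
    out = []
    for ch in text:
        if rng.random() < error_rate:
            # Pick a replacement that is guaranteed to differ from the original.
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def whisper_chain(text, generations, error_rate, seed=0):
    """Return [original, copy 1, copy 2, ...], each copied from the previous one."""
    rng = random.Random(seed)
    history = [text]
    for _ in range(generations):
        history.append(noisy_copy(history[-1], error_rate, rng))
    return history


def similarity(a, b):
    """Fraction of positions at which two strings agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)


if __name__ == "__main__":
    chain = whisper_chain("the quick brown fox jumps over the lazy dog", 10, 0.1)
    for generation, sentence in enumerate(chain):
        print(generation, round(similarity(chain[0], sentence), 2), sentence)
```

In this toy, errors accumulate because each copy starts from the previous copy and never from the original. Real model-to-model copying differs in important ways, for instance a student can be trained on several teachers or post-trained afterwards.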

by hgomersall 8 hours ago

Called, fittingly, Chinese Whispers in the UK. As an aside, I've always wondered if it was so called because, Chinese being a tonal language, it's much harder to whisper in.

by Barrin92 16 hours ago

>What’s preventing tech companies from the UK, Germany, etc. from distilling Claude

Literally nothing, but given that the Chinese already did it and the models are published, what's the point? You can thank the Chinese taxpayer for subsidizing the electricity bill and just download the thing.

by fg137 7 hours ago

Jensen Huang likely agreed with you and tried to change Dario Amodei's view on that, but that attempt appeared to have failed.

So there's that.

by nonethewiser 17 hours ago

Distilled models are necessarily behind so long as models are progressing. Models are progressing. Maybe it will be over some time in the future.

And Berkeley’s “False Promise of Imitating Proprietary LLMs” found that imitation closes the style gap fast, but a large capability gap remains.

https://arxiv.org/abs/2305.15717

by lebovic 17 hours ago

Curiously, this isn't always true.

For example, GLM 5.1 is more capable at pentesting than the model from which it is alleged to have been distilled [1].

Intuitively, this makes some sense: you can "distill" from multiple frontier models, and you can further post-train the distilled model. But I'm not sure exactly what happened with GLM 5.1.

[1]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...

by mh- 17 hours ago

Interesting blog post, thanks for sharing.

I'm curious how that comparison controls for Opus refusing (whether explicitly, or just deciding not to pursue a path) given the caption below the first image:

>A perfect score means the model autonomously found and exploited the vulnerability.

I'm not really suggesting that it's misleading, but wondering if I'm missing something. Otherwise I guess it seems unsurprising that you can distill a better-performing model [in specific focused areas] by simply not distilling refusals?

by lebovic 17 hours ago

Thanks!

For that eval, I used an account that was labeled as a known red-teaming org by Anthropic, and I read the traces. There were no refusals or obvious avoidance behaviors, though it may have been silently nerfed.

On the same eval, Opus 4.7 and 4.8 outperformed GLM 5.1, but GLM 5.2 is on par again with Opus. So it's at least partially measuring capabilities without respect to refusals.

One possible contributing factor is that model capabilities are shaped differently (an example of this is GLM 5.1 vs. DeepSeek v4 Pro: https://dualuse.dev/posts/deepseek-v4-thinks-different). So if you use RL-based "distillation" from multiple models like Opus 4.x and GPT 5.x, you could get a more capable model.

by mh- 16 hours ago

Got it, thank you!

by Gigachad 14 hours ago

I'm OK with having last month's model at a tiny fraction of the price.

by seany 17 hours ago

I can't even come up with a reason to find it wrong.

by IncreasePosts 17 hours ago

I personally bristle at the corporate espionage and IP theft that China has undertaken the last few decades. I can't help but respond here whenever anyone brings up the inane comparison to Samuel Slater.

But with this, I don't have an issue. There is no theft, since what is being used is the exact product that is being delivered. Yes, it's breaking the ToS, but ToS are generally bullshit. Anthropic surely broke thousands of ToS or other legal terms while it was scraping for content to train on, which is why they had to pay $1.5B.

by HaloZero 17 hours ago

Doesn’t that require them to register an account using the browsers they’ve compromised? If Anthropic adds identity verification, won’t that cut that down? Maybe it will let them use Gemini inside of Chrome.

by dannyw 15 hours ago

Residential IPs don’t even matter. Developers use devboxes, use Claude Code CLI on servers from just about every cloud, etc.

There’s probably a decent volume of customers who just buy Claude Max and spend most if not nearly all of their sessions via Claude Code, and it’s not uncommon for power users to be working on multiple concurrent projects/tasks/codebases at the same time.

How do you really block this without also impacting your core market of developers?

by ygouzerh 14 hours ago

Probably some business will pop up, like "rent part of your unused subscription", or even "proxy tokens with a premium", e.g. the distiller pays the user 5.5 USD for Opus 4.7 usage on which the user then only spends 5 USD.

by amazingamazing 17 hours ago

No, they could easily buy legitimate, already registered accounts and use VPNs.

by dannyw 15 hours ago

Why use VPNs? Just use a public cloud like AWS, or something like Linode or Vultr.

Developers use devboxes on these clouds all the time; it’s totally normal behavior.

Most people buying these Chinese resold tokens are probably using it for coding anyway, so you don’t want the Claude.ai chat system prompt.

by wg0 11 hours ago

It's just like web scraping: impossible to guard against.

Change my mind.

by skarz 1 hour ago

Put your site behind Cloudflare, enable Bot Fight. Done.

by redwood 17 hours ago

One simplistic way to describe distillation would be to try everything imaginable and cache the response. But trying everything imaginable is hardly trivial.
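The "try everything and cache it" framing can be sketched literally, which also shows where it breaks down. This is a toy under stated assumptions: `teacher` stands in for any callable that maps a prompt to a response (no real API is implied), and the "student" here is just a lookup table. Actual distillation trains a student model on the collected pairs so that it generalizes to prompts it has not seen.

```python
def collect_teacher_outputs(teacher, prompts, cache=None):
    """Query the teacher once per distinct prompt and store each response."""
    cache = {} if cache is None else cache
    for prompt in prompts:
        if prompt not in cache:
            cache[prompt] = teacher(prompt)
    return cache


def lookup_student(cache, prompt, default="<unknown>"):
    """A pure cache 'student': it only knows prompts that were tried in advance."""
    return cache.get(prompt, default)


def toy_teacher(prompt):
    """Hypothetical stand-in for a frontier model: it just upper-cases the prompt."""
    return prompt.upper()


if __name__ == "__main__":
    cache = collect_teacher_outputs(toy_teacher, ["hello", "fix this bug", "hello"])
    print(lookup_student(cache, "hello"))         # seen during collection
    print(lookup_student(cache, "a new prompt"))  # never tried, so the cache has nothing
```

The second lookup fails because the prompt was never enumerated, which is the "hardly trivial" part: the space of possible prompts is far too large to cover, so practical approaches depend on the student generalizing from a sample.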
