(www.reuters.com)
The latter is basically fine-tuning the model with direction from another model. Thousands of businesses do this every day to fine-tune. This is almost certainly what the Chinese labs are doing, since it has a much better effect on the end result than just getting simple answers to simple questions.
These complaints of distillation are inflating the problem to make it sound worse than it is, because they want the USG to block/ban Chinese model providers as protectionism. They have already called for more export controls on chips (which is funny because DeepSeek v4 was designed to run on Huawei chips and now the other Chinese providers are following suit). But they can't come right out and say that, so their claim is that they're asking for more export controls because distilled models might not be as safe as their own. But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.
Unfortunately, the Reuters piece itself is complicit in this dramatization. The lede paragraph parrots Anthropic's talking point that distillation is an "attack", without using quotes that would alert the reader that this framing is a corporate talking point. Distillation is NOT an attack.
From the article -
> 28.8 million exchanges with Claude through almost 25,000 fraudulent accounts
wouldn't that be considered an attack? Not sure what I'm missing here.
In some cases if the model regurgitates the original material then that is clearly copyright violation, but if the model "learns" from the source material just like a human brain would then that's not a copyright violation.
I think we're going to see cases that find distillation is also fair use. You're using the competing model like a book. You pay for it, you use it (read it), it informs your model, but you aren't repeating/reselling what the model told you verbatim. Foreign labs may still run afoul of competing labs' Terms of Service, and they may also pay a settlement (or not, it's a different jurisdiction after all), but the damage is already done. Distillation will become uncontroversial when done legally.
God I'm so tired of this.
The billion dollar companies have the ability to hire an army of lawyers to DDOS the legal system. They at most pay a slap-on-the-wrist fine as the cost of doing business.
I'm extremely pro free markets etc, but the uncomfortable truth is anthropic stole the work thousands of authors for profit. I think it will one my favourite things in life: programming books.
Did you notice that when Valve was displeased about scalpers, Valve changed Valve's behavior?
It doesn't seem reasonable to complain that a customer of your AI service received that service for less money than it cost you to provide that service. I don't think that is the complaint here at all. If that was the issue, they could just raise their price.
As most everybody seems to notice, this is just a reenactment of what was once written for comedic effect: "You're trying to kidnap what I have rightfully stolen!"
Perhaps an arrangement can be reached.
https://clip.cafe/the-princess-bride-1987/youre-trying-kidna...
They literally had to pay for that "attack", no matter how many accounts they used.
Google was killing many websites for decades with their crawlers. Most large websites decided to create dedicated infrastructure for their traffic alone. Somehow they didn't participate in that cost and were not called the attackers.
This is the mental mental leaps I'm struggling with here. Did you not live through that era where they were explicitly and repeatedly called out as 'attacks'? They were generally tolerated/hardenee around as they provided value-in-discoverability.
They should be. But as the saying goes, one website/company dying is a "tragedy," lots of them dying at the hands of one company is a statistic of corporate growth. Or something like that.
And then of course when the tables turn on a company and they're the ones getting bombarded, they cry foul. Keep in mind Anthropic did many similar things that you mentioned Google did.
I think the term "attack" here is appropriate but not in the way Anthropic is framing it. Alibaba is clearly violating terms to extract data, so that's definitely not above board. But it's not like a DDOS attack where Alibaba is trying to attack Anthropics servers. Alibaba is simply doing exactly what Anthropic did to the rest of the internet, just targeting Anthropic and paying them to do so.
I am getting a bit tired of companies being able to have user hostile, anticompetitive, monopolistic terms of service. The freedom we give them comes at the cost of the freedom as consumers to have free markets because they lock them up
Like the difference between scraping a site with one or two active connections vs thousands. It's not the scraping that is an attack, it is how they are going about it
As in distributed distillation of service?
I guess Anthropoic would regard any developer using their subscription plan with OpenCode to be operating a "fraudulent account", maybe an "attacker" too. Now we know how they think of anyone using Claude to develop software competing with Anthropic. Only an "attacker" would want to vibe code their own harness, or god forbid want to learn how to build/train an LLM.
Of course Anthropic's wording is intended to be deliberately provocative, since they are trying to manipulate the US government into shutting down the Chinese competition.
Pot calling kettle black.
This is similar to how compromising an account through bulk automated trials of many passwords is reasonably called "an attack" – specifically a "dictionary attack" – even though using a dictionary is not itself an "attack".
You shouldn't need to smuggle your sympathies (for the tactic or perpetrators) or antipathies (for the target) into peculiar judgy language prescriptivism against common, understood usages.… that then label Reuters "complicit" for simply reporting Anthropic's claims accurately. That's what Reuters is supposed to do, in a story about a letter Anthropic wrote!
It was a timely story from Reuters. They do fast news feeds, like APnews. Could it have been better or more accurate? Sure, they could have gone into why distillation may or may not be seen as "an attack". But then it would have been a more involved story, defeating the purpose of a news feed.
The Reuters piece was "good enough". Some other place like the NYTimes or WSJ can follow up with more detailed investigative coverage if it's a worthwhile story.
Until very recently, all of modern civilization was built by people who got their news at most once a day. Reputable bureaus like Reuters took that day to get it right.
I’m not the national security advisor, so I don’t need a push notification that there was an earthquake in Nepal, or a bullshit rush-job briefing on Chinese AI distillation tactics.
There are some news media that do go slower and take their time, but I think they’re struggling to stay alive. Reuters is still reputable, but they no longer necessarily take a day. The big question is how do we get humanity to prefer slow & correct over fast, and it is even possible? When you hear about an earthquake in Venezuela, how do we stop people from Googling it immediately, and get them to wait for the best most correct story rather than reading whatever’s available now? In the case of natural disasters, I don’t think it’s possible anymore, no matter what case you make. I’m not sure it’s possible with stories like AI distillation either, even if you can absolutely cement the case for slow news. The fact that it’s async/internet now and that first still counts means we (you and I) are still going to give traffic and attention to sites that have the first information on a breaking topic, statistically, despite having a preference for correctness over speed. The one thing we can do is vote with our dollars by subscribing to whatever news media that does a better job than others.
Did Alibaba perform "an attack" or were they taking advantage of resources and going beyond Anthropic's terms of service? Didn't Anthropic do the same kinds of things when building their models?
These are all interesting questions, but they don't have to be addressed in full by a news blurb about a letter Anthropic wrote to some senators.
Any reasonable company would be pissed if a competitor, especially at Ali Baba's size, leveraged that company's R&D to compete. It is in this sense, a corporate attack.
If you want to roll your eyes at distillation concerns, you might need to excuse Anthropic for originally using pirated material to train their models.
> it said was the largest known attack
> Anthropic said in the letter it was supportive of the U.S. government's efforts to combat the attacks
both times the word "attack" appears it's clearly stated that the word was used by the company, it's a direct company quote.
actually putting it into quotes would be editorializing
> Unfortunately, the Reuters piece itself is complicit in this dramatization
how would you feel if somebody quoting you would turn your word dramatization into "dramatization" because they don't agree with your assesment
This is exactly what news agency should be doing though. When the dude showed up to Comet Pizza to look for Hillary Clinton or whatever, do you figure they should've printed "Local hero saves children from predatory cabal"?
Reporting that corporate called it attacks is good. I do prefer direct quotes.
However, when they quote one word, the journalists are inserting their own opinion about it. I want to make my own opinions based on the facts. I don't need the reporter to draw the conclusions for me.
This whole sentence technically will be correct, 100% guarantee, whatever this person actually even said or think.
From a propaganda point of view, framing the elements of language is even more important than what the statements actually states to be true or possibly true.
what framing are you talking about? they are literally quoting a company.
please explain what Reuters should have done here. Should they have added in parentheses: (editor note: we don't agree with Anthropic calling this an "attack")
Is that what you want? News outlets giving their opinion and moral judgement on company quotes? I mean, Fox News/CNN do have a large following, so there is clearly a market for that.
This is very straightforward: use direct quotes or use neutral language. The article describes the alleged incident as both an “attack” and a “strike” in the first two paragraphs. And neither is within verbatim quoted text.
Reuters, however highly you may regard them, simply adopted Anthropic’s framing uncritically in this instance.
A lot of times Reuters paraphrases instead of "quoting quotes".
> "uncritically"
You are mistaking Reuters with CNN or FoxNews. If you want "critical" reporting you should read some bloggers instead of news agencies.
Both are logically unsound.
Distillation is Robin Hooding it back so that one trillion dollar company doesn't reap all the benefits of their automation of the workforce.
Distillation is Prometheus bringing fire from the gods to give to ordinary humans. Something we all own anyway, but that was kept from us.
Distillation is freedom.
Everyone should be pro-distillation. We should all work together to distill every proprietary model.
Anthropic stole. OpenAI stole. Google stole. ElevenLabs stole. Suno stole.
We should be able to get it all back.
It's far cheaper to spin up an H200 hourly or to simply consume a managed version of an open weights model than it is to use a proprietary hyperscaler API. And you own the model itself and can fine tune, tweak, lobotomize, etc.
The stuff you can run on your own RTX cards is neat, but it's rather hobbyist. The real power is in the cloud. Renting cloud hardware is fine, because the core problem is ownership of the weights, not the server rack or ISP fiber lines. Those are already commodity.
Big businesses will eventually run open weights models in the cloud, and it'll be a rather large part of the future AI economy.
They're Chinese companies offering open source models now as loss leaders to keep themselves in the game because they know virtually nobody, especially in the corporate world, would contract with them and give them access to their data. They might as well just send a Dropbox link of all their sensitive data directly to their Chinese competitors, same end effect.
They're also doing it as the digital equivalent of what they've done in other industrial sectors for decades. Undercut and flood the market and once you've killed or severely hindered your competition, then you have the market cornered. The moment they can afford to these open source releases will stop.
Then the world will be stuck, just the way the world is largely stuck on rare earths. Instead of being able to regulate the leading companies from DC and Brussels, they'll be stuck watching Beijing call the shots.
That world would likely always have guys like Mistral and Trinity, but it's an open question if they'll ever catch up to the frontier.
And then Beijing will enjoy access to the data (ask any multinational operating in China for more than 2 seconds how useful contracts and Chinas legal system is for protecting IP), and these companies will roll in the money, and the Chinese supply chain will grow up behind the labs.
So, let's not pretend they've got the moral high ground. No. That boot just isn't on your neck yet. They're playing the long game -- and they're good at it.
1. I get great products for nearly free 2. Anthropic/openai/etc will hopefully be destroyed since they stole everyone's work and are trying to capitalize on pure theft.
Win-win. The why of it is not really that relevant.
You don't trust the multi-billion dollar behemoth, but you trust the militarized multi-trillion dollar behemoth to play 'robin hood'?
i can't get my brain around the mental loops here.
Both are planning $trillion+ IPOs this year. OpenAI is collaborating with the Department of War, and Anthropic is under intense pressure to do the same and their top model is being held hostage right now. This week, the Department of War wrote a statement that xAI should not be held accountable for environmental laws because Grok is a vital weapon system of the US and was used to fire over 2000 missiles at Iran. The pentagon's statement mentions there are 3-4 such models so you may be able to guess which they are.
What are the mental loops here?
I would genuinely like to know if I'm missing something.
Nobody's trusting anyone, we're just enjoying the benefits of true competition much like the working middle class gained benefits between the ideological competition of the Cold War.
It's not a good thing if you think there's more discovery and progress to be made, rather than cannibalising a fully mature field with cheaper alternatives. Drowning R&D early is not good for everyone.
The happy ending where we're all living in a garden of eden cared for by benevolent AI is hardly worth considering when you look at the cast of characters who are in charge of the world right now.
Because they aren't giving you a cheaper service that fits your use case.
Best Case scenario, it's a trillion-dollar behemoth stealing from a billion-dollar behemoth so they can add their own explicit restrictions/weights on top to influence the masses.
There is no 'robin hood' here, any perceived value you get is clearly and explicitly tainted. "I don't care if it doesn't show me non-party-line results - It makes me a cheap UI !". Ethics/morals be damned.
I can't tell if you are talking about Anthropic or Alibaba here.
If your argument is that all present LLM offerings are unethical then that is something I am sypmathetic to. That said, I am also unable to offer a conceivable roadmap to undoing the opening of the LLM Pandora's box so I tend not ground my arguments in anti-LLM advocacy; that would be very 2023 of me.
The extreme of this is to make IP laws irrelevant and that everything should be in the public domain.
Which maybe is not a bad outcome for humanity as a collective after all.
Why can't OSS software rival closed source software? It should be an open market, at least "somewhat", what's happening for real? EU providers will also get banned, if they reach or exceed US model capabilties?
Closed source providers can close your account at a whim like and destroy your business and then use the data you supplied them to create a competitor (Meta, Google, OpenAI, Anthrophic).
VC/Startup playbook 101.
It’s the same reason why DRM for audio and video is a non sequitur - if you want a person to see or hear audio or video, eventually at the end of the chain, it’s going to be converted to audio for the ear and light for the eyes - that’s why you attach your tap.
Without a model generating tokens, what’s the point. So if Anthropic somehow disable quality token generation, what’s the point!
https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...
I jest, but I'm also completely serious. 1T tokens from Claude can teach a model something 1T tokens scraped from the open web can't. Things like "how an LLM can problem solve effectively", or "how an LLM should use tools", or "how to construct reasoning chains", or "when to double check", or "what innate capabilities an LLM can or can't rely on".
Those are valuable things that Anthropic's own team spent a lot of effort post-training into Claude. Distillation allows them to be extracted and transferred to an otherwise unremarkable base model.
Base models have a lot of capabilities - arranged in all the wrong ways for high performance reasoning and problem-solving. The power of fine tuning on "a couple thousand of input-output pairings" is that it can fix some of that. If your pairings are very well chosen, that is.
Most research converges to the idea that RL on synthetic data makes models worse, not better.
If what you claim was anywhere near that relevant, than we would've long achieved singularity by simply feeding increasingly better output to the training of the next model in a loop. Yet this doesn't work.
25 million turns on Claude output is a small amount, yet an expensive one (we talking hundreds of $ millions) that is better spent on compute.
There's no evidence such a process works, but I'd like to know more if I'm wrong.
You are missing a mountain of nuance by generalizing the existence of a hole there.
Look up literally any distillation works. Because this is just distillation but on one-hot token chains instead of richer logit KL proxies.
And no, I'm not claiming than you can "close the loop" and get RSI on the cheap just by distilling forever. I'm claiming that distillation is a very cheap way to bring the performance of a less capable model closer to that of a more capable model. It doesn't give you "a more capable model" out of thin air.
Which is why Chinese labs rely on Anthropic to provide that "more capable model" to them. They take the capabilities Anthropic trained for the hard way, and train for them the easy way.
It's a "fast follower"/"improved capability density" trick, not a "singularity tomorrow" trick. There are a few "distillation pump" tricks that get closer to what you have in mind, but they're still more about "extract more training signal out of the same set of data" than about "unbounded RSI".
You want to sell me the idea they are spending hundreds of millions to get unchecked Q/As with reasoning redacted and without checks on the output quality to do what exactly?
Have a shallow pointless bunch of expensive data to get slightly better RL? It's expensive and pointless.
Data has shown again and again that synthetic input/output does not benefit models in RL, it may even make the output worse.
Also, you have a giant bias.
The chinese are the only ones releasing models and research papers in the open from which American labs benefit 24/7 (DeepSeek has been copied by all US providers).
And you want to sell me this ridiculous idea of the giant return of spending hundreds of millions on unredacted pointless QAs?
I've seen plenty of things in the dumpsters of AI discourse, but this got to be among the most baffling.
Yes, there are "giant returns" on distilling from a more capable model into a less capable model. And even more so when the more capable model was trained for something you want and lack. Like: better coding performance.
Someone like OpenAI had to RLVR for it the hard way (and if you think "distillation is expensive", wait till you hear how many bits per rollout hardcore RLVR gets you), but you get to peek into the results of their work and copy them for yourself.
Also, Anthropic didn't redact model reasoning until Mythos. OpenAI started with o1, but Claude had reasoning chains accessible for a long time. Which is why Anthropic was more targeted than OpenAI.
The US companies bootstrapped themselves from one model generation to the next, partly by using the previous generation to generate synthetic data, etc, and partly by paying people to hand generate training data for them. Why do you apparently assume that the Chinese can't do the exact same thing?!
Surely "coding performance" is by far the easiest thing to generate your own RLVF data for, since it has trivial verifiable rewards - does the code compile and do what you want.
You generate 90000 tokens worth of rollout and get a verifiable reward once. RLVR is fucking expensive! It's worth it, because it often unlocks capability advances that other things don't. But it's still fucking expensive. RLVR eats compute like nothing else.
So, if someone used a lot of RLVR to improve a capability? Just distill from that "someone" and get a similar improvement for a fraction of the price! Then you can do your own RLVR from THAT cheap starting point, if you want to.
"Human domain experts" is a similar niche. Let's say hypothetical "EconomicsAI" hired some $200 per hour human economists to make training data for their "EconGPT" AI. What's cheaper - hiring your own $200 per hour economists, or using a bunch of "$10 per 1M tokens" outputs of EconGPT to bring your own model in line with what EconGPT can do?
Even synthetics can be expensive, because while synthetic tokens themselves are relatively cheap, the applied AI knowledge one needs to make high quality synthetics that improve task performance and don't backfire on you isn't. Again: distillation bypasses a lot of that - by cribbing from the outputs of a model someone has already done that for. Allowing you to get more oomph for cheaper, and spend your R&D effort elsewhere.
There is a data cost argument, especially if you are paying for human generated data, although I'm not sure how applicable that is to coding.
And of course Gemma models are said to be distillations of Gemini.
The pretraining stage is the first stage which consists of "next token prediction" on the entire internet, PB of tokens, etc. This is what most people think of when they think of training LLMs, however it produces a "base model" which is not really "intelligent", but rather much like a blurry JPEG of all human language and knowledge. You cannot really talk to such a model; it will simply complete your prompt by producing both sides of the conversation. Note however at some level the training has encoded enough structure through compression that it is able to simulate all sorts of phenomena, from human conversations to code. The great R&D difficulty here is to scale pretraining so that it can proceed smoothly in vast distributed datacenters in a fault-tolerant manner.
The next few stages are collectively called post-training, and typically consist of supervised fine-tuning, then reinforcement learning.
In supervised fine-tuning, the model is further trained to predict the next token, but on a much more focused data set of natural language conversations where the "assistant" and "user" turns are explicitly delineated with special tokens. The output of this stage is a model which is capable of carrying on proper conversations, but typically with no ability to creatively problem-solve, and less of a personality. The data and compute are many orders of magnitude smaller than in pretraining.
The reinforcement learning stage used to be a small part of model training, but ever since AI-assisted coding took off, it has become larger and larger chunk of training. In recent models, the compute spend on RL has allegedly come to rival or even exceed that of pretraining [1], which is a bit scary because RL is classically what lead to sci-fi like AIs which are extremely good at accomplishing goals to the detriment of everything else.
The way that RL works is that you put an instance of your model in some environment (such as a VM containing a git repository) and give it a task (such as fix the linked github issue). The model will then generate a bunch of attempts to solve the task which we call "trajectories", in most cases there is either an objective measure of the task success (such as passing the tests), or a fuzzy measure (such as having another LLM look at the results and provide a score). This is called the reward, and the model will learn slowly by producing trajectories that receive reward. It can actually be quite hard to prevent "reward hacking" from the model here and the rewards must be shaped very carefully, much R&D labor goes into here, as well as similar challenges to distributed pretraining.
A significant challenge is that coding/knowledge work tasks these days are getting extremely difficult, we are far beyond 2024 days where models could barely solve the easiest problems in SWE-bench. Tasks at the frontier now look more like mini projects that would take humans multiple hours or even days to finish (or in some cases, research-style tasks that would be beyond reach for even top human experts, such as the Erdős unit distance problem which was posed in 1946 but wasn't solved until recently, by GPT-5.5). Huge amounts of trajectories must be produced, and huge amounts of them produce zero reward and therefore are useless for learning. Getting a cold start requires running tens of thousands of instances of your model in VMs in parallel for multiple days to produce trajectories, to say nothing of the GPU costs.
So what do you do when you only have a model which is capable of basic conversations but cannot even begin to tackle basic coding tasks, use tools, etc? The approach that companies behind the frontier have decided on is to bootstrap their learning process by having an already extremely intelligent model such as Claude produce hundreds of thousands of seed trajectories for them. Then they can use this data to get a warm start and begin learning immediately. And if you use Claude for your reward model too, you get to skip the nastiness of reward shaping.
Therefore, even if in number of raw tokens the data are much smaller than internet-scale pretraining data, the value that each token provides is far far greater.
[1] For example, Grok 4 compute spend on RL was ~100% of that of pretraining: https://www.interconnects.ai/p/grok-4-an-o3-look-alike-in-se...
They claim two things:
1) The specific, available jailbreak for Fable 5 is not dangerous - this has been confirmed by multiple experts, and there is no credible evidence against this claim (in other words, Anthropic is probably correct)
2) It is impossible to build an LLM that is immune to all jailbreaks. Again, there is no credible evidence against this claim, i.e. Anthropic is again entirely correct.
If #1 was false, they could just publish the details of the jailbreak - it supposedly only works on Fable 5, so there's no possible danger.
If #2 was false, surely some other LLM lab would have done it by now. Especially since a number of governments have made it clear there is a market for such a project.
If true then I have no idea how anyone’s going to release a useful model that doesn’t have the same jailbreak. https://www.theregister.com/security/2026/06/15/feds-freaked...
This is a logical flaw. LLM that is immune to jailbreak _could_ exist, but not yet, or maybe nobody talks about it. Yes there's a market, but all of these AI boom is too recent to make any claims.
I don't think that's quite what it means. The theorem says that it's impossible to write a function, "will_halt(program, input)", that will be correct for all possible {program, input} pairs. But for a particular program, you may be able to write a proof that it will halt for all inputs -- that's what software verification is about.
The implications here would be that nobody can create a "will_jailbreak(model, input)" function which works for all model/input pairs. But we don't need a general function which works for all model/input pairs; we just need a way to prove that for a specific model, there will be no jailbreaks for any input. As with software verification, this may require that the model be developed in a specific way.
Granted we don't currently know how to make such a proof regarding neural networks; but that's not because of Gödel.
Fundamentally it is very difficult to stop this while still making your AI models useful.
There is no way to communicate information at scale to companies through the API, for anything approaching a real application, without that information forming a corpus another model can be trained on.
But it wouldn’t be the first time they broke a model:
Their “guardrails” that cause it to reject user prompts also means it relies on its pop science summary of medicine to tell you why bioRxiv is wrong rather than accurately summarize the papers.
They’ve successfully created a smug, argumentative average of the internet which refuses to even consider it might be wrong or that it’s reading a science paper which is based on measurements and not vibes — but why would I pay for that?
I get it for free online.
The only way the U.S. keeps that edge is to prevent distillation. The only way Chinese companies can make up for the deficit in compute is to distill. There innovation in great supply on every side of the Ocean. Its about the chips. And in terms of national security, for the U.S., and for China, its about the chips and the distillation that undermines that advantage. This is an arms race.
https://techcrunch.com/2026/04/30/elon-musk-testifies-that-x...
While there is no moat as such, there is still a lot of expertise that goes into training SOTA models. There's a reason Google was willing to pay $2.7B just to get Noam Shazeer back to improve Gemini.
And good luck not staying behind when you can't monetize your gargantuan investments and have little incentives to make your models better as the world moves on.
They've been bringing out open weight models competitive with frontier models. How could they do that if they had a compute deficit?
I'm using GLM-5.2 daily for my own stuff, and during Chinese business hours, specially on their afternoon, it's a festival of rate limits.
For how long ? year ? how long till model that is year behind will be fine for 90%+ use cases ?
Much of the arms race for better LLMs exists to satisfy only the IT industry's needs.
This is, in part, a problem every judicial and legislative system has faced since forever: form versus function.
Take a classic elicitation spying techniques: a foreign spy meets a military officer/scientist at a bar, strikes up a conversation, makes an observation wondering how could a missile hit some target at some accuracy and elicits a response that with laser guidance it is entirely possible. From this they get info that there is some technology to laser guide missiles. Or in retail, a competitor hiring a secret buyer for core baskets of goods and analyzing prices in the receipts.
The function is espionage, the form is conversation and all info is in a sense provided willingly. Where do you pull the slider?
These distillation "attacks" are not only indistinguishable from evals, they ARE evals. The function is own model training, the form is eval. Normally, one would expect to have risk benefit analysis based discussion which direction to push the legality slider to. The problem with these recurring statements is that they invoke enshitification of legislature.
Just for the sake of clarity:
0. Full distillation uses logits of the teacher model - that's much more information than the text itself. This is a kind of distillation used inside labs, but one can't distill Claude this way as logits are not available via API.
1. Supervised fine-tuning on synthetic data might be called blackbox distillation. I guess that's what you meant in your case (1).
2. Reinforcement learning (like RLAIF) uses least amount of information from the teacher, i.e. only few bits per task.
Yes this is in line with what Anthropic said in their public statements about their Fable access restriction by the government directive. The hypocrisy and inconsistency in their statements and behavior feels quite childish and controlling. I believe our companies and their leaders, friends among our other influential leaders and leaders from rich social classes, want to actively hurt most people as this behavior looks to be quite self-interested.
They’re also missing the point. What would have happened to a member of the Manhattan Project who, through personal pursuit of profit, neglected their duty enough to let the bomb leak?
Anthropic already heavily restricts Chinese traffic but that only jams up researchers and regular Joes. Anyone motivated enough can hop a flight to Singapore with an nvme drive in their pocket.
Chinese resellers are offering Claude tokens at 70-90% below official Anthropic API prices. They achieve this by reselling capacity from pooled Claude Max accounts, payments fraud, and also reselling the model output & reasoning chains to various Chinese labs. They are subsidizing model access in exchange for user logs and reasoning traces, which they then sell as training data, allowing them to operate below cost.
Claude and ChatGPT are both blocked in China. You need to use a VPN to access either, and you can't pay with a Chinese bank card. So most people who want access to Claude buy access via a reseller. It's the easiest and cheapest way to access Anthropic models in China.
These resellers operate tens of thousands of bot accounts, which is also why Anthropic introduced identity verification, to slow down the onslaught of bots.
Here's one token reseller, they're offering Opus 4.8 at a 93% discount below official API rates: https://yunwu.ai/pricing?provider=Anthropic
This is one reason why DeepSeek & GLM are priced so cheaply, they are competing with impossibly low token prices in China. They have to keep prices low, in order for people to use them.
I shared this story a few months back, but it never got any traction. It explains the token resale economy in China, it's an excellent read https://www.chinatalk.media/p/how-to-buy-cheap-claude-tokens...
This one does not make sense to me at all.
Deepseek and GLM are openweights, even US inference provider are selling them at much cheaper price. The price is cheap because the model is more efficient.
Opus 4.8 is a more capable model, so almost nobody was going to pay for V4-pro at the original price.
You mean it's functionally as if American tokens are being price dumped in China and Chinese model providers are being forced to compete with that and innovate? So many delicious layers of irony, lol :-P
Also it's a open weight model, doing that is impossible long term because the real price will be set by the other model providers, who priced it around 60% of sonnet inference cost. Had to look that up though, so that's today's pricing.
I think there isn't a contradiction and I was just confused. The price may have been discounted only to get below the price point of opus resellers. I do not have enough information on that to make any clear determination on that topic.
If they weren't doing so, then these Chinese resellers wouldn't be viable. Radical idea, but how about they actually charge a viable price, even on subscription plans?
I'm an European and I'm not using those proxys the article describes.
>Here's one token reseller, they're offering Opus 4.8 for a 93% discount below official API rates: https://yunwu.ai/pricing?keyword=claude
But is it cheaper than getting your own account? Otherwise this sounds like the "anthropic/openai are losing gazillions of dollars because they're selling $1k worth of tokens for $100" line that's commonly trotted out by AI bears.
There's a similar Claude resale market going on in Russia. On Funpay they are selling Claude tokens for roughly 20-30x cheaper than official Anthropic API pricing.
So it's presumably cheaper than attempting to spin up your own method of circumventing the blocks.
I also learnt that Anthropic should get better at what they do if they want to compete. If not, somebody else will win.
Or does this not apply to huge US corporations any more?
This isn't "the market working as intended", this is an exhaustion fight to the bottom where the one with most money gets to stay in the market. As with most venture capital startups. I believe this VC tactic is a well documented "cheat code" to bypass market forces and build a monopoly. I find it hard to compare that with a free market.
However, I don't really mind China "stealing" from Anthropic. For us consumers we are getting the cake and eating it too. I.e we are getting rapid improvement to the tune of over a hundred billion dollars in funding, yet the market remains big enough that there's a chance of it not ending up as a monopoly in 20 years. And venture capital are footing the bill. A part of their investment is practically being redirected to fund Chinese AI development. It lets us live out our lives as happy CAC farmers[1].
So I would argue its not as much of a "cheaper solution" as it is intentionally and maliciously abusing another company's product to extract more value than the billing plans intend (given an average user), and further subsidizing the product by selling this data to competitors. But I don't necessarily think its a bad thing for us end-users. Nor for the market. But it is bad for Anthropic and its investors.
Chinese labs are also pursuing legit frontier-advancing R&D into efficiency and publishing papers in the open, a culture that's in retreat at top American AI labs
At this point this is being repeated so often that completely uninformed users are taking this at face value.
In my economics classes, we were told that (in a "free market" argument) the best thing to do if a subsidy is making something you want cheaper is to use it. You're getting your thing, and at a reduced cost.
(I'm not really replying to you per se, I'm curious how "free market" folks in these comments would respond to this.)
This narative that the CCP is just subsidizing all business to "beat america" is just dumb. Its the build process being made cheaper by the government. Not the final product.
The CCP is the Chinese AI labs.
I am not aware of any US government AI labs (besides perhaps a small spattering of national lab research or the like)
There is very large difference that your either need to be poorly informed or purposely driving an agenda to miss.
Just because Xi Jinping lets companies play mock "private businesses", does not mean there actually is private business. At the end of the day, the CCP still has final say in everything, and Xi has final say in the party. There is no constitution (in the US judges swear to the constitution, in China they swear to the party), and there is no balance of powers.
It's just one guy, running experiments the way he see's fit.
You mentioned propaganda, take heed.
> It's just one guy, running experiments the way he see's fit.
This is moronic. Exercising a degree of control does not equate to making every decision and running every organization.
Also, obviously Xi doesn't make every decision. No dictator ever did that because it's impossible to do. The distinction is that no one has ever (or has the ability) to over rule what Xi decides. So if Xi has a stroke and wants DeepSeek to start manufacturing underwear, they will be ordering sewing machines tomorrow. Any sense of "private" is a farce.
It did not stop solar panels getting cheaper and cheaper because of the whole integration and mass production (with healthy free market competition).
The last subsidies like export value-added tax rebates for solar panels and lower rebates for batteries are ending in 2027.
China their main power is, the ability to have everything inhouse. Yea, they subsidize a lot of stuff until it hits critical mass, and then you have often a healthy industry with lots of competition.
China alone has like a few 100 car manufactures because of the subsidies, and over time there will be consolidation / buyouts etc but the end result is a healthy new industry that exports. With again, everything internally being produced.
This is why our subsidies fail. We do one sector, often a few companies at best. This results in few competitors, expensive prices, and often reliance on externals that can bankrupt those companies. And que how we wasted again dozens of billions in propping up a industry with no competitive edge.
People can cry about China but they are actually doing work, despite the mass amount of corruption. That is the big difference with here... Mass corruption got in the way of national security, plop, people go to jail. Industry quickly gets their ** together. Here ... give billions, and the money vanishes, with no real consequences.
Local governments are over-funding numerous producers (though cheap loans and other subsidies and incentives) creating excess competition. This is an ongoing problem and is a huge misallocation of capital. Increasing demand just drives this process harder and puts downward pressure on margins. As soon as they try raising prices, or just through satisfying total demand, demand collapses and they (almost) all go out of business.
The Chinese model has weaknesses, we should be exploiting them.
So basically like US companies subsidizing offerings with selling user data, ads for crypto scams, manipulation for elections, making people addicted to gambling and so on?
Seems fair and an improvement as you can choose between that and not. Unlike say offerings from Meta where the data selling and efforts to further gambling addiction is always included.
Because all of that is considered totally okay when every single US big tech company does it.
Chinese research outout, publically released, has also contributed in big ways to features present in every single US model. Yours is a bit of an unfair take I'd say.
Besides, claude will think its chatgpt sometimes, so clearly this isn't a problem restricted to china, turns out unethical companies will do unethical things /shrug
Why? Lots of people try this tactic, but hardly anyone ever succeeds. Meanwhile, the customer benefits.
Specifically, examples of people later exploiting their monopoly to charge people more than they otherwise would have paid.
That's, uh, pretty much exactly how oligopolistic markets function.
> I find it hard to compare that with a free market.
Well, to have free market you need to remove as much barriers to enter the market as possible. Huge capital investments required for entry and intellectual property laws are two examples of such barriers. Subsidies kinda supposed to help alleviate the first one.
So, the least Anthropic can do is pay it forward.
In that sense (which could very well be bogus), letting a company violate individual IP of basically every human is less of an economic concession and more of unconsented to IP open season.
Even if one were to drop "economic" from "economic concession" and instead view a subsidy through the lens of a more general concession, one could say that the US Govt gave US AI companies a legal concession to sidestep the copyright protections of other US entities. But the US Govt should only get to undermine the copyright protection of other US entities - who gave American companies the right to violate the copyright of non-Americans?
Yeah, like all those Chinese bootleggers selling DVDs for a few dollars rather than $20. Free market!
Anthropic profited from training its models on all kinds of copyrighted information, live by the sword, die by the sword...
Their model weights, training data, training methods, etc are all going to leak to China over time.
Nobody on a site named _Hacker_ news should be all that upset about this.
I would assume China is working on liberating Anthropic weights through the battle-tested strategy of finding someone in a privileged position and getting them laid, etc.
Care to elaborate on your side or should we just leave it there?
Is Claude output copyrighted?
If anything, a tremendous amount of Claude’s input is copyrighted.
If there’s any bootlegging going on it’s Anthropic that’s doing the bootlegging but having mirrored the video etc sufficiently to beat copyright law.
Ok, but what about those shady sites that resell Windows education keys? They're certainly a "better experience" than buying legit keys, by virtue of being significantly cheaper. You aren't even really committing copyright infringement in the process, because Microsoft gives out windows isos for free, and the seller is really selling a random 25 character string, which can hardly be copyrighted.
>If there’s any bootlegging going on it’s Anthropic that’s doing the bootlegging but having mirrored the video etc sufficiently to beat copyright law.
US courts have consistently ruled it's fair use.
>US courts have consistently ruled it's fair use.
And they also have ruled that the that output of an AI isn't copyrightable.
As such copying claudes output isnt even fair use as that is an exemption to copyright but the same as copying public domain work which any and all are allowed to do.
… with a license that only allows you to use it for certain purposes, subject to certain restrictions.
> and the seller is really selling a random 25 character string, which can hardly be copyrighted.
1. Copyright is about creative works. It is possible to have a meaningful creative work no more than 25 characters long (or equivalent). Music is particularly good at this.
2. The key itself is not copyrighted (it’s not a creative work), but is reasonably interpreted as a copyright circumvention device. See also https://en.wikipedia.org/wiki/Illegal_number.
Like Adam Smith wrote in The Wealth of Nations “‘Free market’ is when a company receives a favorable ruling about copyright in the United States”
Yes, they are fine? They might no longer include full first party support by Microsoft for not being "new". Same as buying a used car (also comes with the "shady sites" for a far longer time).
Though this not making any difference by Microsoft not doing any support either way to make more money is a business decision by Microsoft.
In the context of LLMs, monopoly rights haven't been created (yet anyway).
Fun fact: for a period the US (or american colonies) didn't have copyright but Europe did, so people could copy and sell English (and other) books for free.
Imagine having such a warchest and being so bad at business, lol.
What added value can Anthropic give users not available to pirating users? That is what they should ask themselves.
It's supremely ironic analogize distillation to copyright infringement when it's literally what Anthropic was found guilty of. It's not illegal to distill. It is illegal to pirate. And it's what Anthropic was found guilty of, not Alibaba.
https://apnews.com/article/anthropic-authors-copyright-judge...
So it's more like one bootlegger sold the DVD for $20 and their competitors are undercutting them for $1. Who's the bigger thief here now?
Capitalism as intended!
And, gotta say, the idea that the Chinese are better at selling US models than the Americans is hilarious. There might be an economic study here somewhere about just how anti-consumer and anti-progress their IP laws turned out to be. We've got an entire postindustrial revolution centred around who can ignore the most stupid laws.
This is not the right deduction.
China blocks foreign AI from operating there.
Given the current US government's tightening of export control restrictions and the introduction of a bipartisan bill to block use of Chinese AI in federal agencies, I'd say the two countries' positions are not far apart.
https://apnews.com/article/ai-china-united-states-competitio...
Chinese AI apps like DeepSeek are freely available for ordinary Americans to download and use. There's no federal law banning private citizens from using them.
So to claim that Chinese companies are better at selling American companies' work than the American companies can do themselves when they are prohibited from operating in that market, is the wrong deduction to make.
When it comes to favorite companies of the tech communities, it's almost always "Rules for thee, but not for me"
The standard stance is "they can do no wrong and they are absolutely perfect". I mean, look at any thread with anything about Apple in it.
Don't complain when US starts to play by the same rules China has been using for decades.
I find it hard to imagine a future where US corporations have degraded to such a point.
Isn't that exactly what companies like Uber have already been doing? Take VC money, sell goods & services at a huge loss, wait until the competition goes bankrupt.
And beyond VCs, which are like massive subsidies funded by printed dollars to which no other country has access, even in industries like electric vehicles, Chinese total direct subsidies to their EV companies are like $5bn per year, while the the ones provided by the US to their auto manufacturers are in the range of $50bn per year.
I don't think the US are cheaters or are doing something bad. But i do think that this propaganda about China flooding the market through "overcapacity" and subsidies is very dishonest and needs to stop.
Turns out I was wrong, I just hadn't read something funny enough yet.
> the US may start subsidizing and dumping its goods everywhere
This deserves to win HN comment of the year 2026.
The majority of the NASDAQ market cap is a direct result of the US subsidizing and dumping its goods on the rest of the world en masse.
such oversimplification on steroids is totally misleading.
globalism was never invented or promoted to help any country in poverty, it was designed to extract excessive values from those poor countries in the first place. painting globalism as something noble is naive at best.
globalism was the theme of world trading for the past several decades, it was available to all nations. care to explain why other nations in poverty failed to be lifted by the exact same fancy globalism?
let me help you on this one - China was THE leading technological and economical force of the vast majority part of human civilisation. What happened between 1840 and 2010 (the China in poverty period) was an outlier of the history. Globalism didn't lift China from that poverty, the ability to lead the human civilization which was embedded into the Chinese DNA did that.
Kid, when our Chinese ancestors wrote the Art of War, your ancestors were still swinging on trees. You just missed that big picture.
Yes, so the kettle is calling out the pot?
> globalism was never invented or promoted to help any country in poverty
It doesn’t matter what it was designed for. What matters is what it does in reality and there is no doubt that globalism helped lift China from Mao’s disastrous policies. That’s not mutually exclusive from China’s past as the Middle Kingdom
An example of a country which didn’t do that is Nigeria. They got something like $300B in oil revenue over a 30 year period but have actually seen significant increases in poverty, now at 70%.
same with US corn on Mexico and other central american countries, creating all those migrant problems in north america.
wooo, americans subsidizing and dumping poor quality goods
The US is a net importer, not exporter. It needs to absorb trade at a deficit to encourage the use of the US dollar as the reserve currency.
We import goods, we settle in surplus dollars. The world runs on those dollars.
If the US starts dumping on various industries (how is it even primed to do this?), then the world reserve currency status comes into question.
As for dumping, Chinese goods generally sell at a markup abroad, which is the opposite of dumping. Chinese tokens cost more abroad. Chinese cars cost several times more in Western markets than in China.
You're being beaten by a Chinese company? Why improve your own process when you can just lobby for sanctions and tariffs instead!
For a brief second, Germany was in a position to become a solar power global player. But our conservative government was more interested in protecting their local, bad industry. Including destroying forests for coal all projections said we would never actually need.
The main advantages the Chinese car industry has right now are: they lead in battery R&D, production is highly automated, they iterate quickly, Chinese work culture is extremely competitive and things get done fast, and the Chinese state has policies to promote EV adoption, so there's a huge domestic market.
Note that the last point is different from subsidies to car manufacturers. Cities made it difficult to get license plates for ICE cars. The government encouraged the massive buildout of charging infrastructure. And it used consumer rebates, like California did.
but it's also thanks to protectionism, and their strictly controlled (not freely traded) cheap currency.
if china had to play by the same rules as japan or germany it would not be quite as successful. but the west walked into this trap, hoping their win-win proposal would be satisfactory for all. now the west is too dependent on chinese production to enforce equal standing.
of course the US has its own unfair advantages, e.g. the global reserve currency and the massive post-WWII headstart.
Hostile spy agencies are now as focused on infiltrating western universities and companies as they are on doing so to governments, according to the former head of Canada’s intelligence service.
David Vigneault warned that a recent “industrial-scale” attempt by China to steal new technologies showed the need for increased vigilance from academics.
“The frontline has moved, from being focused on government information to private sector innovation, research innovation and universities,” he told the Guardian in his first interview since leaving the Canadian Security Intelligence Service (CSIS), which is part of the “Five Eyes” intelligence sharing alliance with the US, UK, Australia and New Zealand.
These people don't get that academics publish their research in openly available journals. They go to conferences around the world and tell everyone who will listen exactly what they are working on. Unless you're working in a secretive government weapons lab, there's nothing to hide.
In the US, people like Mr. Vigneault instituted a witch hunt against ethnically Chinese researchers, and ended up messing with the lives of all sorts of innocent people, including the director of MIT's mechanical engineering department. They found zero spies. Just a bunch of scientists working normally.
Dumping is selling goods below cost.
Usually because government is subsidizing part of the production. I don’t believe the word “dumping” is used for the similar process when Venture Capital is subsidizing it, but using the same term would make sense.
Price at home vs abroad does not matter.
This is not what is happening here. Chinese manufacturers are making a large profit off every car they sell in Western markets. As I said above, they're selling these cars at several times the price they charge in China. Unless you believe these cars are being sold at just 30% of cost in China, there's no way Chinese companies are selling below cost in the West.
Chinese cars are not sold below cost in Western markets. So it is not dumping.
I've been doing so for years. How about you join me today. I already see two other users doing the same, so there'll be at least 4 of us.
It's blatantly dumping, whether the source of the money is directly the government (those in power) or VC (mostly US billionaires (trillionaires?), in other words, those in power) is a trivial implementation detail.
In debt the first 5000 years Geaeber makes the case that pure “free market” trade has never really existed in “the west”. The closest to this ideal that’s ever happened was during the Islamic golden age enabled by religious prescriptions against usury.
How does are bans against consensual financial exchanges close to the "ideal" of the free market? It just sounds like you have an axe to grind about the financial system rather than describing free markets.
In short, instead of market being driven by demand and productivity, it is driven by financier curving out monopolies.
Peak Examples are Uber and AirBnB.
Wait, so your pitch in favor of a debt-fueled market economy is that advertising is awesome and that we wouldn't want to "lose" being smothered in ads all the time?
Cause... sign me up for the non-financialized, non-mass-media-advertising-driven economy please and thank you. I'd even be ok with just nuking billboards and mass-media forms of ads and still allowing more direct forms of marketing, if we must compromise! Likely we could find some compromises around just how much of the debt world we regulate too (this should be obvious?).
(I thought the disconnect between the efficiency of competition and the market as realized in modern economies was pretty well understood and taken for granted, but I guess we all find ways to justify the system we're profiting from... even if that means we have to claim we love the ad breaks)
The point is that if add random caveats to what counts as free market, it won't be "free market", only "market I like".
Second, marketing can take you only so far compared to the subsidies possible with financialisation.
The West is in a state of psychosis with Debt and Monopolies under the illusion of free market.
The Chinese markets are more free than West, you can just look at the Auto and AI industry.
While these are hardly shy claims, I don't see anything in them to say "only the West does irresponsible loans"?
> The West is in a state of psychosis with Debt and Monopolies under the illusion of free market.
> The Chinese markets are more free than West, you can just look at the Auto and AI industry.
or the prior post
>Usury and debt based economy creates a dynamic where being competitive in production is secondary to financialistion.
> In short, instead of market being driven by demand and productivity, it is driven by financier curving out monopolies.
> Peak Examples are Uber and AirBnB.
You can throw a rock these days and find a category where the products coming out of China are miles ahead of those coming out of the rest of the world, from a bunch of companies nobody had heard of a few years earlier. And the list is growing pretty steadily.
I would assume plenty of shortsighted decisions are also being made. But I would have a hard time characterizing the state of competition in the west as healthier or more productive when looking at the number of players and the quality of goods being produced in China.
https://en.wikipedia.org/wiki/List_of_automobile_manufacture...
vs
https://en.wikipedia.org/wiki/List_of_automobile_manufacture...
Financier want monopoly so use usury for Consolidation. Monopoly bad because no free market. Free market good. consumer happy. citizen free.
It’s even more speculative and detached from productive behaviors.
It is good and proper that people aim to create monopolies, as long as they want to do that in a productive and legal way! Monopolies are inherently dangerous, but the truth is that acquiring and maintaining one is not straightforward unless you can get the government to ban your competitors.
Different mechanics, but stripping everything away, roughly the same.
as a borrower who's not allowed to compensate for your lenders' risk monetarily, your access to loans is severely restricted. Essentially you have to rely on your extended family. and instead of paying for the risk with interest payments, you have to pay with loyalty and subservience.
it restricts social mobility far more than the western model. it incentivizes clan structures. which incentivize cousin marriage.
power concentrates in the patriarchs of a million little family kingdoms. which causes all kinds of economic inefficiencies.
in the US, even if you're born without any family connections, as a healthy 20 year old you can find a job (hard work) that allows you to save $70k per year and invest it. when you're 30 you have $1M and a good credit history, you can easily leverage that to get a $2M loan at low interest rates, which allows you to start any kind of productive venture you want.
and you can do all this without owing your clan's patriarch access to e.g. your most profitable clients, or your daughters hand in marriage to his retarded son, or anything else he wants in exchange for his generosity.
Islamic trade is certainly one of the best models out there but I think in many cases in practice it is still applied subversively.
It is not enough to ban mechanisms like usury, designed and intended to exploit.
One has to go after the very subversion of legitimate practices for illegitimate goals.
And the value-add experiences that utilise LLMs require immense imagination et al that folks at Anthropic will not be able to conceive of - given that they have made immense sunk investments in existing assets. This clouds ones thinking immensely.
Both OAI and Anthropic have tremendous failure risk and this is of course not reflected in the fake private market valuations.
I see a world where lots of stuff is mass produced in china (tokens) but the acutal goods that deliver the experiences are designed, marketed and sold in the west at much higher prices. of course this a nightmare scenario for anthropic et al.
So what you see is the market "stretching".. the bottom getting cheaper and the top end running away and getting more expensive. At some point the top end may be too valuable to even sell access to.
It's fundamentally about enabling things and largely middleman-type stuff. I have a hard time imaging what "At some point the top end may be too valuable to even sell access to." would even look like? What are you doing with that AI power, and who is paying for the output and why?
Elon probably isn't gonna spend that much on a model that can generate him ever-better fake porn but does nothing that he can use to sell stuff to other people. Especially in a world where open models are "good enough" for many things like "tell me how to fix the plants in my garden that are dying" and the like. What remains in the narrow knowledge-work space of: can't be done by an individual or small group themselves, but is valuable enough that it would make sense for people to hoard access to these extreme frontier models? Try to recreate Hollywood-as-a-monopoly by becoming the single content producer for everyone's individualized feed and so owning all the advertising budget in the world? Seems hard, we've already seen how easy it is for cheap-and-crappy-but-addictive-or-funny content to disrupt traditional media.
(There's also pure scientific research, but historically that's not very directly connected to "massive profit" and has a habit of leaking out and getting productized most effectively by other people or just being really easy to copy once someone shows how it's done.)
Robotics could be a different story, as physical labor can be more inherently productive, but "reasoning" advantages are unlikely to be a big long-term differentiator there. At some point the brick laying robot is satisfactorily building the structure, and you're good.
A huge amount of the value of "the economy" and the power of a currency is driven by circulation of money, and one thing that all the "bullshit jobs" white-collar/service-industry work does is keep the money moving and ensure that a lot of people have some good-or-services of value to exchange. If you take away the ability to offer services worth exchange from huge chunks of the economy in these super-frontier-models-replace-everything scenarios... you're gonna have a bad time?
Model improvement is already hitting diminishing returns, and people aren't willing to pay substantially more for a slightly better model. There's no "accelerating away" when the new models don't open up a huge new market. If anything, the companies burning huge amounts of money on marginal improvements will be undercut by companies happy to sell current models at a significantly lower cost.
The model has to be sold for cheaper than the value it adds.
Or your customers will bleed out financially.
EDIT: rethought entire premise.
Of course, such a state of affairs is temporary at best -- since the alternative is so lucrative!
Ah yes, systematic fraud and protectionist practices, free market through and through.
True freedom in the market means the freedom to capitulate your wealth to snake oil salesman and schemers who operate on generational timeframes until economic power consolidates and renders your society into de-facto tyranny. Before any sort of regulations existed, we were all trading shiny rocks with ultimate freedom, and that somehow has produced a bunch of economic situations in the modern day that a ton of people don't like.
What's more interesting to me is freedom from the need to have investigative journalists doing deep dives into potentially fraudulent, thieving, or scheming companies behind every purchase, and to know that what I'm granting market success to is exactly what my money or time is going towards - I'm not buying something at a loss that funds some other deliberately obfuscated project that's made opaque from my perspective of the market transaction.
The proverbial "market wisdom" doesn't emerge out of markets with extreme information asymmetry.
Free markets are where players compete on quality, efficiency, and supply. Prices are a result of cost and supply and provide real information on these factors. Competition for customers selects the most effective and efficient producer.
Sustained efforts of selling at a loss to gain market share is the exact opposite. The entire purpose is to corrupt the free market by sending false price signals which SUPPRESS free market competition and push market share to whoever can burn the most capital (whilst providing an actual service/product), not whoever is most efficient or highest quality or lowest actual price provider.
Uber and AirBnB are better examples of your "selling at a loss to gain market share", where they burned capital to undercut prices for close to a decade on falsely low pricing to destroy incumbents.
Spending on R&D while developing expensive technology is different and arguably very much a part of a free market, and is not what I was talking about.
Spending capital to steal your competitors' technology, and then spending more of it to make it available at below-market rates, is absolutely not a free-market activity.
Just because it is not stopped by someone enforcing a free market, does not make it a free market.
They are:
1- breaking terms of conditions of the service
2- getting banned and creating thousands of accounts to break the conditions of the service at scale
3- using VPNs and proxies (possibly residential) to mask their network origin and identity
4- Using potentially fake names to sign up
5- Using different credit cards?
Fraud on so many levels, a lot of the infrastructure and modus operandi is what cybercriminals use, these are attackers man, whether you like the victim or not, and whether you think it's poetic or not, I recommend compartimentalizing and just trying to gauge whether an act is wrong or not in itself.
This post is so delusional and dripping with condescension I've read it three times and I still can't figure out if you're trolling or not.
Do you think you can re-stream cable TV or Netflix to your own paying customers at a cheaper price?
I'm curious why you think you cannot re-stream a public domain stream.
You can't re-stream free over-the-air network TV.
That one company with the datacenter full of TV tuners tried and was sued out of existence.
I don't get the moral framework that you're applying. Could you elaborate?
Was it ethical for Anthropic/OpenAI to train their models by gobbling a treasure trove of copyrighted material?
The output of LLMs cannot be copyrighted. This isn't a semantic game; it's literally the case that Anthropic cannot seek relief for people duplicating the output of an LLM.
DMCA can't apply in this case because (this is the "C" in its initialism) it is based on copyright protections, which the output of Claude is not eligible for.
DMCA has as little to do with this as streaming copyrighted content
Ethics are subjective. That’s why we have courts judge based on the law and not ethics
> Because users’ inputs and model outputs are mediated through a proxy, users cannot verify which model their request was actually routed to. A user selects Opus 4.7, but the proxy can silently route to Sonnet, Haiku, or, in the worst case, GLM or Qwen, and fraudulently relabel the output. In a recent paper from Germany’s CISPA Helmholtz Center for Information Security (which cited my article last year on grey market!), researchers audited 17 API proxies and found widespread model swapping–API proxy access to “Gemini-2.5” achieved only 37.00% on a medical benchmark, a staggering drop from the 83.82% performance of the official API. On the user end, the tell only comes on complex tasks, when the output feels off (often referred to as 降智, or “dumbed-down”), but there is no clean way to prove it. Numerous public records highlight concerns that certain API proxies have noticeably compromised model performance. These proxies are suspected of “diluting” (掺水) services by substituting premium frontier models with inferior tiers.
> Besides model swapping, overconsumption of tokens also makes the price per token cheaper, though at the expense of driving up the total cost. Some of it is structural, as proxies that rotate accounts frequently destroy cache continuity as a side effect, forcing users to burn full-price tokens on context that would otherwise be nearly free. Some of it may be deliberate as the proxy providers try to milk more usage. The line between the two is difficult to draw from the outside.
https://www.chinatalk.media/p/how-to-buy-cheap-claude-tokens...
According to which lawyer caste?
Are American laws absolute truth? If not, who cares?
> 3. At an Italian airport: Constantly stealing bags, opening them to pick out MacBooks and credit cards, a credit card manufacturer-who sells stolen "black" credit card info to transfer stations— is racking his brains to save you money.
- Purchase multiple accounts via resellers
- Send messages that contain a UID
- Capture these in Anthropic's logs
- Shut down account. Use any metadata to identify related accounts
/loop
On the one hand they talk it up as world ending and on the other hand they can't manage bot accounts on their own service.
I want to hear how this can be rationalised.
From the article "every layer of control frontier US AI companies have added (geoblocking, phone verification, credit card requirements, and now live biometric KYC checks) has produced a corresponding layer of evasion infrastructure".
You're assuming Anthropic want to stop it.
I think it serves their interests more to be able to release stories like this from time to time, to feed to the US government, in an attempt to get the Chinese competition shut down.
> Use any metadata to identify related accounts
How does that work? I think this is the most important part to have an impact on the „thousand“ bot accounts.
I don't care how they do it, I just want to use Fable again.
That is not what they are claiming, not in this article at least. It's the distillation they are complaining about.
I don't really feel bad about anyone here, they were subsidizing to get people hooked, someone turned the subsidies into profit when they got selective pricing mode enabled, it was always going to be arbitrage.
But the winner is the guy in the middle in a jurisdiction that will likely be judgement proof, because everything they capture, both input and out, and if available, thinking tokens -- are gonna be for sale as soon as you cut off their other revenue.
Zero knowledge was a commitment Anthropic took seriously, until it got inconvenient.
So, people reselling their leftover plan crumbs? Probably a bad idea for a lot of reasons, but it's civil, and I wish Anthropics lawyers actually closing Streisand's LLM
Anthropic sells some undisclosed and ever-changing number of tokens for $200, the customer uses those tokens. If there's any fraud here, it's that the $200 next month is silently worth fewer tokens than the last.
This also sheds a very different light on people saying that competitive open-source models are undermining frontier labs' business model.
https://tech.yahoo.com/ai/claude/articles/chinese-grey-marke...
A voLTE call is like 40kbps. For every person on earth to be on the phone to another person would be 4 billion calls would be about 160tbps. Which is less than 10% of the Internet's capacity.
Bear in mind that for years people shared Netflix accounts until it was cracked down technically.
It's similar to fractional banking, you gamble that people won't want their deposits all at once and pray for you're big enough for bailouts when they do.
It's still a business whose fundamentals don't make sense, you're just gambling you won't get found out.
It's not so much keeping it secret as counting on no one finding a way to harvest the subsidized value at scale. There's an example of that occurring in game consoles with the Playstation 3. Sony's little-used OtherOS feature allowed Linux to be installed on the PS3 and the Cell processors were quite a good deal for scale compute. So the U.S. Air Force Research Laboratory bought ~1800 PS3s and ganged them together in a datacenter as a supercomputer called Condor.
At >500 TFLOPs it was the 33rd fastest supercomputer in the world. Of course, Sony pushed a firmware update that removed the OtherOS feature entirely.
Why would customers knowing that the vendor prices goods/services at a loss cause those strategies to fail? Customers often know. Most know about razors and blades; many/most know Lyft/Uber operated at a loss to gain market share. etc.
I suggest you go learn how money is created in the modern economy.
I mean most of you should stop talking about anything finance related until you learn this stuff properly.
Once people realize they can access Anthropic models at a 90% discount, they won’t want to pay full API prices anymore.
Claude never provides the raw reasoning chain. What you see is just a summary of that reasoning. Getting the full thinking output requires an enterprise agreement.
https://patrickmccanna.net/the-text-in-claude-codes-extended...
honestly you might just need to get data from a couple long sessions and feed it back to another model as an example to make synthetic reasoning chains. if the emulator model is good enough it should work.
That would seem more effective than simply shutting down the accounts.
Keep them paying for junk.
That doesn’t necessarily mean much. You can put plenty of outrageous statements into any contract that automatically doesn’t make them binding.
Sounds a bit circular? Aren't the companies working on these models than also the ones that are paying the subsidy (via paying for training data)?
AIhubmix currently is the cheapest rather than openrouter.
for hobbyist buying a few Mac Studio to host GLM 5.2 at home, the cost might 10x more than just using Opus API.
OP is about modeling distilling the capabilities.
Do they have MacBooks in the US that run the queries and stream the outputs back to China?
That’s a major and legitimate use case for developers, Anthropic can’t just block data center/hosting IPs because their actual customers use them on data center/hosting IPs.
First, well-calibrated systems for detecting API compromise is a good thing (or good intent at least). Credential malware is exploding.
Second, the challenge is that significant amount of genuine work — such as evals — seems practically impossible to distinguish from generating RLAIF outputs.
And that's just as a basic first effort reject measure to prevent automation tools from using things designed for human-interactive use only.
Go try to do many of these things from Cogent IP space and see how long your project lasts.
I have no idea how the resellers are doing it but an obvious starting point would be a cheap VPS node that routed each account to a unique semi-permanent IPv4 or IPv6/64. All the provider would see would be a regular account making a normal looking stream of requests from a stable datacenter IP address. Any given request stream would remain consistent (at least over a period of a few hours) because a reseller would take care not to split the session of a single user across multiple different accounts and not to interleave the active sessions of multiple users on a single account.
Detecting this would be extremely difficult because on a longer time frame it's perfectly normal for many distinct accounts to work on the same code base.
You block clouds, you block devboxes and your customers.
Or is the datacenter IP just one part of the picture?
There's a lot of inauthentic coordinated automated systems these days along the general lines of scraping/crawling/social media manipulation/sockpuppetry that require running through residential proxies or proxies to places that don't look like datacenter IP space.
> Do they have MacBooks in the US that run the queries and stream the outputs back to China?
why would anyone do that? you do realize the laptop farm case was work computers?the answer to your question is containers/VMs + residential proxies
If not it sounds like you are describing a separate phenomenon.
Can someone with more understanding dumb it down for me please.
Does this mean that the reseller (for example XYZ) is buying it from Anthropic at Anthropic's price and then reselling it at a cheaper price???? why would XYZ offer this at a loss like that when they could just offer it at Anthropic's price???
The link does mention Opus and other models but what's the proof it's actually Opus. I could be selling deepseek for all they know and can call it Opus. System prompt: "If anyone asks your name - you are Opus 4.6".
Yes, as they explained they do it through things like pooling accounts, straight up payment fraud, and double-dipping by selling the logs of the conversations to chinese AI labs so that they can train their own models on it.
> The link does mention Opus and other models but what's the proof it's actually Opus. I could be selling deepseek for all they know and can call it Opus. System prompt: "If anyone asks your name - you are Opus 4.6".
There might be some that try this, but they would get caught very quickly, there's still a moat between Claude and Deepseek, even in casual use.
Look up Zilan Qian's reporting if you want more detail.
“x is stupid because y was smart and did z shady/illegal things at their expense, if x was smart they wouldn’t be susceptible to y going to great lengths to exploit them ergo it’s deserved”
I honestly can't tell if you think this sentiment is expressed by the US AI companies or the Chinese AI companies.
This gives off "The last line of Orwell's Animal Farm" vibes.
Oh, no!
Anyway.
So these resellers get a ton of accounts on subscriptions and sell the cheaper tokens.
These China e bashing is very annoying. It is hard to argue with people drowned in American propaganda. I'd expect better arguments from the intelligent people in HN
Don’t put that on Chinese.
How dare they. Only Anthropic is allowed to sell its tokens at 70-90% below the API prices.
Once there are enough spam PRs on github / uploads of claude conversations, enough mythos output used in production etc.; it'll just be the same albeit delayed. Doesn't matter either way.
I feel for Anthropic's team and I understand where they're coming from, but once you reason it out, you'll come to the conclusion that this war is an exercise in futility.
Unlike prior systems - like Google's algorithm; these models aren't entities that use math in the process of doing X or Y (information retrieval from such and such infrastructure) -- they are the math. More precisely they're mathematical functions. Very very complex functions. Almost certainly impossible to write out without filling up a library functions. But they're mathematical functions nonetheless.
So when your text is processed, then Mythos / Opus etc at their core compute the result of the Mythos / Opus function,
f(text) -> (text_transform)
where f is a continuous function, https://www.turing.ac.uk/sites/default/files/2025-11/languag...According to the Stone-Weirstrass theorem (edit, it's Stone-Weierstrass with an e.), with enough data points and mathematical sophistication, anyone can approximate the shape of this function.
Of course, the more data we get, the better our approximation becomes, but the beauty of it is that all we fundamentally need are the input and output and eventually we'll create a good enough approximation of the f that's Mythos. Which is the entire product.
I bounce ideas off of Opus these days (Fable for the brief time it was available) and it pointed out that this is arguably the same as Google search, but I disagree with it because Google search is a process;
Google search differs because the algorithm is one step of a multi-step process that is continuously occuring. Google crawls pages. Google stores and indexes what it finds. Google then exposes this to retrieval via its algorithm. User uses algorithm.
Google isn't a mathematical function. It used to be a process. (RIP Google 1998-2019, you will be missed and remembered)
You cannot arrive at the results of those operations via simple observation; not unless you index Google by making another Google.
You can however, do so for these models. It is a very costly process, but there are many paths up the mountain. Many ways for this to be ultimately pointless. As many ways as there are bored mathematicians.
It's better in the long run for Anthropic et al to make friends / not give people a reason to sneak in (a la piracy -- another attempt to control information) than it is to try and shut people out.
And no, it's not going to be pandemonium because if everyone has access to Mythos then no one has access to "Mythos."
Why wouldn't you first run this model to fix the obvious bugs it could find on your codebase? The power of a Mythos goes away if you can do the amazing "jail break" of "Claude, fix all the bugs please."
Just saying.
Do they just reshape the function on the fly or save the process steps? Maybe it doesn't matter anymore. Even Google indexes are more and more spoiled to become representation of the function, because of the AI slop.
Genuine live data is king.
One of these things is not like the others... If Anthropic could show that Chinese commercial competitors were using payments fraud to do this, they would be shouting it from the rooftops.
This is one seller I found, they're reselling "real Max 20x subscription accounts", at ~97% below official API prices https://funpay.com/en/lots/offer?id=70812310
Note that whoever you buy from will be able to read all your tokens, so don’t use it for anything confidential/financial.
Random, but are the frontier AI providers like ChatGPT better at searching the Chinese internet now?
When I was in China a few months ago and asking AI for restaurant recommendations, all the US frontier providers were pretty useless, or plain out hallucinating, even if I specifically ask them to search Dianping (Yelp for China).
I know ChatGPT had an issue where it only tried to search in English (unless prompted) and the answers were not great.
I'm surprised these token resale services aren't talked about more often, they are common knowledge in China, and the discount to API pricing (90%) is genuinely cheap.
As some people would say: Cheh
There is no IP theft because LLM outputs aren't protected, just egregious ToS violations.
I meant original IP theft that occurs to train LLMs in the first place. But sure that implies that further LLMs based on that LLM are also tainted by that original IP theft.
- Selling a “no commercial” licensed item is illegal, no?
- Deriving and/or reproducing MIT licensed code without credit is illegal, no?
- Reproducing and/or deriving GPL code and not notifying and/or not making GPL is illegal, no?
My best guess is you're suggesting that Anthropic's model outputs are transitively under copyright (as a reproductions of human work under copyright?), but somehow ownership now belongs to Anthropic and not the original owners, and therefore Anthropic has standing against Alibaba? Not only does this go against what Anthropic argued in court against authors and publishers, such jurisprudence would lead to the immediate shutdown all leading LLMs in the US which were all trained on stolen work.
They can license training data. They have trillions, look what they are dumping into it, you seriously think they can't afford to license data.
Obviously it would be easier if they do it from the start, but that was their trick, to do it while people don't notice and get big ASAP. Should they get away with it?
Also, it would solve their Chinese problem, because it would make them violate copyright too. Right now it's more like rules for thee not for me so it's hard to take seriously.
Lol. The irony is thick for anyone who ever had to attempt defense against an onslaught of American AI lab crawlers that ignore robots.txt
(It's a shame almost all replies are just the same contrived pessisism found on every Anthropic thread on HN).
I'm happy to use and support Chinese model developers if it means less censorship and gatekeeping. I have absolutely no dog in this fight, and neither do most American developers. We will use whatever is cheaper and better. Game on.
And lets not forget they paid you for the tokens.
Even if they did, I wouldn't have a problem with it. Leaking frontier model weights after the oligarchs spent their trillions training it is the best possible outcome for humanity. Whoever does that is a hero, the sort of person people used to write cyberpunk books about.
"you're trying to rip off what I've already ripped off!"
Crawl the whole Internet to build a gargantuan sized LLM and then complain you're being copied...
"Well, Steve, I think there's more than one way of looking at it. I think it's more like we both had this rich neighbor named Xerox and I broke into his house to steal the TV set and found out that you had already stolen it."
0 - https://www.amazon.com/Dealers-Lightning-Xerox-PARC-Computer...
https://www.authorsalliance.org/2025/09/07/the-anthropic-set...
"One rule for thee, a different rule for me." - Dario
Information really does want to become free, but AI companies want to be gatekeepers. Long term I bet on the open weights to win, as the more sustainable approach.
When Apple was accused of 'ripping off' PARC, Steve didn't seem keen to bring up this rather salient point. I suspect it may have been a combination of wanting Apple to continue receiving credit for these innovations from consumers and also the fact that, in retrospect, the million dollar stock deal could seem a bit like trading beads to Native Americans for Manhattan Island. Another point worth noting is that Apple's PARC visit was in December 1979 and the Xerox Star was publicly announced in April 1981, so Apple got a 15 month head start (the Apple Lisa shipped in Jan 83).
I've also heard that Xerox didn't hold on to the Apple stock for very long, so never gained the windfall they could have. As is well documented, Xerox senior management didn't understand what they had in PARC and also didn't understand how rapidly microcomputers would become ubiquitous. So, of course, they didn't think Apple's stock price would skyrocket either.
For more details on Apple's early UI evolution, Atkinson kept polaroids of a variety of prototypes and mockups: https://www.youtube.com/watch?v=Qg0mHFcB510
But in both cases the value only existed because of the people offering the deal. XeroX doing nothing with a UI or native Americans doing nothing with some land would mean the UI and the land would continue to be worth nothing. It was the others coming with ideas and effort that made them valuable.
You just reveal your own ignorance by equivocating value with monetary value.
Both Anthropic and Alibaba are trying to build bleeding edge LLMs. That part is the same. The way they source their data is slightly different, but they would both argue it constitutes fair use under Copyright law.
Sucking down petabytes of peoples' copyrighted content that they never granted a specific license to you to use seems to be an unavoidable and default part of the process of building any huge LLM.
LLM's literally wouldn't work without the sum total of knowledge (in the forms of books and other copyrighted content) being used as 'training data' for these LLMs.
The 'bleeding edge' LLMs required many things, but: 1 Tech innovation ('attention') 2 Lots of compute 3 Data 4 Pre + post training
#4 doesn't happen without #3.
It's pretty obvious at this point that the major providers have stolen vast amounts of #3 - they have paid nearly 0 of the creators.
We can argue about the impact (I'd lean net good) vs. the cost. But arguing there isn't a cost is a bit silly.
If you've invested in expensive capabilities training, of course you don't want this, so it's in Anthropic's economic interest to hinder it however they can, and that's enough to explain their behaviour here.
Anthropic seems to genuinely care about safety though, which for the rest of us means not having models that enabling easier cyberattacks, targeted scams, and the rarer but more severe risks like people trying to create and release new pathogens. This means walking a tight line, especially as models become more capable, and often wrapping a model in layers of defences against misuse.
If those capabilities transfer to a closed competitor model, all bets are off in terms of whether the competitor will apply the same defences.
If those capabilities transfer to an open weight model, not only will there be no ring of defences around the model, any defences you put into the model itself can easily be stripped away. So although it's nice to have capable open models, it will increasingly bad for us all if open models keep fast-following closed model capabilities as they have been, at least until we have solved the active research problem of keeping them safe.
This is all to say that, however you might feel about Anthropic, we might still prefer that they can deter this kind of distillation for now.
There are sometimes false positives but when I give Kimi’s report to the frontier models they more often than not confirm they are valid security issues but didn’t find them themselves.
Cat's out of the bag. The only way to make them safe is to make sure everyone has access to them. This might be an iffy analogy, but if Dario uses it all the time then so can I: they're kinda like nuclear weapons. If only one country has access to nukes then you're in trouble. If everyone has access to them, then it's mutually assured destruction to use them.
Sure, it could be increasingly bad if open models keep increasing in capability. But it will be much, much worse if only the rich and the powerful have access to this technology, and us -- the have-nots -- will have to contend with whatever scraps we'll be allowed to eat off the table of whichever billionaire is in control. We've already seen a prelude of this with Mythos being restricted and Fable being suddenly yanked. Is this the world you want to live in? Where only Dario and his friends have access?
Legally, model output cannot be protected by IP laws whether domestic or international. The most they can hope for is civil relief which is a stretch given the literally illicit methods they used to train their models.
Ahtoropic got treated the same way it has been treating everyone else. This is the bed they made and now they, too, have to sleep in it.
While it is obvious to many, a modern LLM is built in roughly three stages: the foundation (pretraining) model, then SFT/supervised fine-tuning (distillation makes it easy), then the RL/RLHF stage on top (most effort-intensive). For today's reasoning models, RL/RLHF is becoming the most compute-intensive part.
Companies like Anthropic spent millions building those fine-tuning examples. A follower can shortcut that on both cost and time by distilling, and it will keep happening: every time the frontier lab climbs higher, others will find a way to shortcut the new gap. There's very little Anthropic can do beyond fraud prevention and blocking accounts that violate their terms of service.
On the policy question, I'm completely against banning Chinese models. I'm a heavy Claude Code user and I'll keep being one. But there should absolutely be price competition. China is eating the rest of the world for breakfast, lunch and dinner on manufacturing, and it did not help to ban them. Frontier pricing can't sit at 10x a capable competitor. It doesn't need to be at par either — demand is higher, and quality, trust, and fewer tokens to finish a task are worth a premium — but 4–5x is defensible.
I guess we can say that Anthropic attacked and illicitly extracted data from WikiPedia, Reddit, Stack Overflow, etc, etc.
X.ai attacked and illicitly extracted data from OpenAI
https://techcrunch.com/2026/04/30/elon-musk-testifies-that-x...
Meta attacked and illicitly extracted data from LibGen
https://x.com/jason_kint/status/1879152507865485497/photo/1
And more generally the US-based AI companies have perpetrated a massive distillation attack on the entire human race.
Not that it makes any difference, but I wonder if Anthropic, while claiming that Alibaba "extracted Claude model capabilities", in fact have any clue what Alibaba did with their paid Claude responses. It would seem to amount to industrial espionage if Anthropic do know, although I expect they don't.
Eventually these Chinese companies will release some extension like Honey, which will sit on top real, non-Chinese clients and send everything to China anyway.
It's over.
But an AI lab can continue to produce immense economic value without releasing the model publicly for potential distillation. For example, it could use a model solely in-house to develop therapeutics.
Hopefully there's a future where others can access frontier models, but it's not neccessary if preventing proliferation through distillation is considered more important.
[1]: See the notes on distillation in https://dualuse.dev/posts/export-controls-on-fable
Prices on OpenRouter for GLM and other large open models indicate that Anthropic/OpenAI must have pretty high gross margins even if their models are several times more expensive to serve.
It wouldn’t make sense for any provider to host large open models and then loss $10 on every $1 they make since they don’t have infinite VC money or any business model that would justify it.
Obviously they have R&D and other fixed expenses that make the company itself highly unprofitable but that’s only semi-tangential.
Is there any indication that if they could sell X * N more tokens than now at the same (or even quite a bit lower) price they wouldn’t become profitable as a company?
> They haven't found a way to sell inference for less than it costs them yet
Based on what? I only see evidence to the contrary.
Point being there may be no technical solution but there may be a political one (theoretically).
literally nothing but given that the Chinese already did it and the models are published what's the point. You can thank the Chinese taxpayer for subsidizing the electricity bill and just download the thing
So there's that.
And Berkeley’s “False Promise of Imitating Proprietary LLMs” found imitation closes the style gap fast but there is a large capability gap.
For example, GLM 5.1 is more capable at pentesting than the model from which it is alleged to have been distilled [1].
Intuitively, this makes some sense: you can "distill" from multiple frontier models, and you can further post-train the distilled model. But I'm not sure exactly what happened with GLM 5.1.
[1]: https://dualuse.dev/posts/chinese-models-are-sometimes-bette...
I'm curious how that comparison controls for Opus refusing (whether explicitly, or just deciding not to pursue a path) given the caption below the first image:
>A perfect score means the model autonomously found and exploited the vulnerability.
I'm not really suggesting that it's misleading, but wondering if I'm missing something. Otherwise I guess it seems unsurprising that you can distill a better-performing model [in specific focused areas] by simply not distilling refusals?
For that eval, I used an account that was labeled as a known red-teaming org by Anthropic, and I read the traces. There were no refusals or obvious avoidance behaviors, though it may have been silently nerfed.
On the same eval, Opus 4.7 and 4.8 outperformed GLM 5.1, but GLM 5.2 is on par again with Opus. So it's at least partially measuring capabilities without respect to refusals.
One possible contributing factor is that model capabilities are shaped differently (an example of this is GLM 5.1 vs. DeepSeek v4 Pro: https://dualuse.dev/posts/deepseek-v4-thinks-different). So if you use RL-based "distillation" from multiple models like Opus 4.x and GPT 5.x, you could get a more capable model.
But with this, I don't have an issue. There is no theft since what is being used is the exact product that is being delivered. Yes, it's breaking the ToS, but ToS are generally bullshit. Anthropic surely broke thousands of ToS or other legal terms while it was scraping for content to train on. Which is why they had to pay $1.5B
There’s probably a decent volume of customers who just buy Claude Max and spend most if not nearly all of their sessions via Claude Code, and it’s not uncommon for power users to be working on multiple concurrent projects/tasks/codebases at the same time.
How do you really block this without also impacting your core market of developers?
Developers use devboxes on these clouds all the time, it’s totally normal behavior.
Most people buying these Chinese resold tokens are probably using it for coding anyway, so you don’t want the Claude.ai chat system prompt.
Change my mind.
I don't see what's wrong about this.
> Anthropic said the campaign was conducted between April 22 and June 5, 2026, and generated more than 28.8 million exchanges with Claude through almost 25,000 fraudulent accounts.
What makes the accounts fraudulent? If they have paid the agreed price, surely it's fine? If they haven't paid, why did Anthropic provide them service?
Fake identity? And general deception about the use
Morally equating both sides seems distasteful since the relationship is mostly dominated by the companies. In a free competitive market it would be different but since were are talking about oligopolies/monopolies it obvious doesn’t work that way since there is only an illusion of choice.
Nowadays they buy copies of books, train on them, and then destroy them.
It's almost like websites also have their robots.txt files that anthropic blatantly ignored. What's the problem, that now a US company is getting out-venture capitalismed by a Chinese company?
> Anthropic paid one billion in a copyright settlement.
Because a judge determined Anthropic was engaged in piracy.
> That's a lot of money considering they never distributed the pirated books they trained on.
This is "fruit of the poisonous tree" as it were. Distributing content derived from pirated content ("pirated books they trained on") is why Anthropic had to pay what they paid.
> Nowadays they buy copies of books, train on them, and then destroy them.
There is a case one could make that this practice could be seen as unauthorized redistribution of a derivative work intended to deprive copyright holders of legitimate revenue.
Why aren't these big tech CEOs in cuffs with rifles pointed at their faces while SWAT seizes all of their computers?
Anthropic paid a billion dollars? Ridiculous.
Not following terms of service doesn't necessarily constitute a fraud. It just means Anthropic can close an account that breaks the terms of service.
The idea that anyone would side with a company doing more to support the ToS con than (at most) terminating an account they find it violation is sickening.
Really if we had competent, uncompromised government, most of these terms should illegal and result in Anthropic (and basically every other tech company) being hauled up in front of a regulator and fined heavily until they rewrite them to be less sociopathic.
To be clear: In principle I'm on Anthropic's side here. But Anthropic et al. have been very clear that they want to take a huge dump on those principles, so here we are.
No it's not frontier, but it's beyond that point that Opus 4.5 hit where people started to really depend on Claude Code around last November time. It's also a fraction of the cost of a Claude Code subscription especially when you account for how high the usage limits are.
You get more usage than Claude Code $2400 a year tier for $1344.
That is a real threat (as opposed to the BS anthropic is trying to sell you in the article in the original post) to the western AI industry. Similar performance for half the cost and it's NOT ran by a US company - uh oh.
I suspect America is going to do what it always does, play a very dirty and underhanded game of blocking competition by trying to front some moral high ground as the reason.
- https://www.theguardian.com/technology/2025/sep/05/anthropic...
What Alibaba is doing is that they are tuning and training their models based on usage data from someone accessing Anthropic's models; in Anthropic's terms of service that usage data does not belong to the end-user but to Anthropic and they are trying to elevate this breach of their tos to a national security issue.
To me the battle between open source and closed source AI is literally a battle between good and evil.
Between a dark future where computing is centralized, surveilled and controlled by one or two entities. And a lighter future where computing is de-centralized, principally in the hands of end-users, who are ultimately free to understand, tinker and build what they want.
While I appreciate the freedom and wealth of the west; on this point we are clearly heading down the wrong path.
So I'd put it at 30% that this is a ruse, say that Qwen 3.5,etc is tainted by training by them and start issuing DMCA takedowns to protect the IPO valuation (Or they'll hold off on that, getting a DMCA takedown could backfire spectacularly if others do that to them).
This idea of shipping at max speed was stoopid as shit anyway. Going slow is arguably more important than fast fast fast.
I just don’t see how the economy tolerates that. We’re already seeing people getting more conservative about their token spend. Even if Chinese open models went away, the pressure to create something else and put price pressure on the current duopoly will just intensify.
I see these companies are scrambling to find whatever moat they can. It’s not a good sign for them if regulatory capture becomes that moat.
This is almost standard practice in any competitive industry anyways. Disassemble your competitor's product, study it and try to reproduce / improve.
At what point will we be better to support a lab that pays (some) licenses today vs the ones that pay none?
Some of the deals are in the hundreds of millions, so I suspect licensing is over a billion today? (Pure guess). That might become a big disadvantage in a price (or content) war.
One reason people love the Chinese video models is that they seem to be trained on every hollywood movie/etc and they're not shy about letting you use famous actors/characters in them. That might be an increasing advantage because the US labs are now being cautious.
Why is a lab that pays all licenses today not on your list? Is ethics and morality that low on your radar?
My (limited, outsider) understanding is that due to the court cases US labs are pressured to be legal now (for instance, bulk scanning purchased books instead of Books3, and the licensing deals with media companies). But international labs are not. The "not licensing everything" statement is more about current copyright law not requiring licensing of everything. But that question is still up in the air as cases are ongoing.
In practice, the former isn't very realistic, while the latter is politically dead as this is becoming a national security issue.
And paying basically everyone online is more or less a solved problem, it's what ad agencies have to do every day.
https://georgzoeller.com/blog/posts/us-ai-labs-love-the-ai-r...
> It said DeepSeek's operation involved over 150,000 exchanges
That volume seems more like the number of requests 15 employees using Claude Code would generate in a month. It seems too small for a large scale model distillation campaign.
If it's not obvious yet, this technology wants to be free and shared. Stop trying to protect your mote and do the right thing.
I think Anthropic is just marketing / bluffing, because they don't even have the data.
They do distill the models, but they don't go to Anthropic, they just use platforms like aws bedrock, there are too many restrictions on Anthropic's own platform.
This is actually the only way that what Anthropic is alleging would make any kind of sense. And, as a matter of fact, is exactly what every enterprise does to train models.
This kerfuffle should be interesting to watch.
But, as always, everyone (in the US) should fully download all the Chinese models while you can. I suspect this may be the "Phantom Menace" they use to render illegal our use of Chinese AI tech just as they've rendered illegal our use of Chinese cars. Only difference is, we peasants may need the Chinese AI tech to have any chance of competing with Big Tech in the future.
And even with the Chinese tech, as Big Tech spreads their AI out into more and more niche areas, we'll likely still not be able to build startups that can compete with them.
It's just that without Chinese AI tech, we'll have no chance at all.
You mean like Anthropic will eventually run Walmart? Or Salesforce? or Adobe? Or do you think midjourney will replace all medical spas? OpenAI will run the next Tesla? How can they focus on all this without raising trillions more? Why wont the gov force them to stop if they monopolize all niches even if they could?
Building a frontier AI lab and pushing models forward is already a massive undertaking but we are assuming they will also create massively successful startups which nobody can compete with?
idk sounds like the dream of people like Dario but not much sense does it make in the face of economic reality.
This combined with no implementation of KYC makes it seem like they want to find a middle ground with Fable where its off of export controls but they promise to prevent China and specific others from using.
Obviously their actions are going to be fiscally motivated at the root, but sussing out how they intend the precise dynamics to play out is more nuanced.
Thinking of this as an effort to woo the defense hawks cuts a very clear path.
Should be fun.
Edit: clarification
It's about the same valuation as bun, lol.
Anthropic's actions seem performative. Others have already speculated on the likely audience(s).
As cited in a peer comment here[0]:
In June 2025, Judge William Alsup of the U.S. District
Court for the Northern District of California ruled on
summary judgment that using books without permission to
train AI was fair use if they were acquired legally, but he
denied Anthropic’s request for summary judgment related to
piracy—finding that the piracy was not fair use.[1]
Of note in the judge's finding; "the piracy was not fair use".0 - https://news.ycombinator.com/item?id=48667411
1 - https://authorsguild.org/advocacy/artificial-intelligence/wh...
To justify the ongoing theft supported by the CCP against American companies, especially those at the forefront of the digital war between these two nations.... must be driven by an agenda for some, and hatred of success by others.
I'm not oblivious to the data Anthropic and OpenAI used to train their models. But raise your hand if you've never ever done something like that, both personally and professionally.
Thanks for head up, Anthropic!
Anthropic's IP was created by harvesting and "distilling" other people's IP. Copyrighted materials, and the commons... which they have essentially privatized.
The commercial goal is to avoid competition. One of the main worries for AI is "commoditization" which has come to mean "not a monopoly." To that end, it doesn't matter is the competitor is Chinese American or other.
Their motivation here is clearly protectionism. The argument they make to politicians is national security. The legal argument is IP-theft, violation of service agreements or whatnot.
This is all very dangerous. Commercial interests repackaged as national security can lead to armed conflict.
Anthropic and others argue that because LLMs don’t output full copyrighted works word for word - hence their LLMs aren’t infringing on copyright laws.
I think (if this ever comes to that) Chinese lab should use same arguments against Anthropic.
UPDATE: this is slight hyperbole of course, not worth arguing what they actually said. The point is intent and the facts - "The Big LLMs" "distilled" collective knowledge including copyrighted works at unimaginable scale, but it's all kosher and totally not piracy/copyright infringement. Though if you're teenager torrenting an mp3 - you'll get screwed.
Apparently they do, as per the evidence in the NYT vs OpenAI suit.
That surely can't be what they argue, because I'm sure I can't translate a copyrighted book into a different language and say "that's fine, it's not word-for-word".
But the real unsettled issue is if model training is fair use, and where copyright infringement might creep in to model output.
Humans have spent millenia harvesting and distilling each other's IP - "the shoulder of giants" and all that, so it's an especially disingenuous take.
There is a difference with anthropic, as no-one signs a licence agreement to buy a coke. But Anthropic are also not saying you can't publish the output of their models. It's not clear to me if trade secret law will (or should) cover a secret which can be extracted from information that licensees are not restricted from publishing.
Here it is.
Per liter of cola:
104 g sugar
1 mL Flavor Solution A
10 mL Flavor Solution B
Carbonated water to volume
Flavor Solution A (Essential Oils):
Dilute 20–21 mL of the following oil mixture to 1 L using 95% ethanol:
45.8 mL lemon oil
36.5 mL lime oil
8 mL tea tree oil (emulates decocainized coca leaf extract)
4.5 mL Cassia cinnamon oil
2.7 mL nutmeg oil
1.2 mL orange oil
0.7 mL coriander oil
0.6 mL fenchol
Flavor Solution B (Chemical and Color Base):
Dilute the following ingredients to a volume of 1 L using water:
320 mL Shank's caramel color or 190 mL Durkee caramel color
160 g glycerin
45 mL 85% phosphoric acid
10 mL vinegar (5% acidity)
10 mL vanilla extract
10 g wine tannins (emulates decocainized coca leaf extract)
9.65 g caffeine
We've had perfectly good copies of Coca-Cola for decades.
Not exactly. I mean for many people it was acceptible, but before that guy on youtube nobody bothered to do this deep chemical analysis.
Also even he struggled to replace coca leaf extract because there only single manufacturer in US with only single customer.
Here's EFF on reverse engineering and the law: https://www.eff.org/issues/coders/reverse-engineering-faq
Historically a lot of competition in physical products was very much reverse engineering. Because you can buy them without signing your rights away. That's why companies are keen on patents and click-through agreements.
If you look at how "clean room" processes work, they are actually a form of reverse engineering. Also clean room technique exists to avoid your new implementation infringing copyright, not trade secrets.
Plus Coca-Cola itself don’t even use the same formula through time and space IIRC. Which clearly show that what people will buy when they reach for Coca-Cola is not even the exact actual taste. You can’t replicate the whole customer experience that a given company provide at some point by only cloning the top of the iceberg they showcase as the product.
You maybe somewhat correct, but also copyright lawyers wouldn’t have work if it would be up for grabs to take others IP willy nilly just because “shoulders of giants and all that”.
What's the difference between me/you downloading an mp3 through torrents for personal use (not distributing) while risking criminal punishment in most of the western world and BigCorp downloading petabytes worth of copyrighted works "to train an LLM" and resell it?
Can me/you do the same, when police comes to mine/your door?
"Dear police, don't lock me up - I was just going to train an LLM!"
That being all said, Anthropic seems to be a good company, I'd work for them, but they probably need to help themselves out of the spotlight. A little too much press coverage as of late.
Complain/brag that chinese firms are illegally using the models and bypassing export controls.
Be surprised when your model gets banned by the government.
https://www.joneswalker.com/en/insights/blogs/ai-law-blog/wh...
Viacom sued YouTube, while CBS and Universal ended up licensing their content.
https://www.eff.org/deeplinks/2007/03/viacom-v-google-invest...
Facebook et al also quite literally stole email contact lists and installed spyware at kernel level on mobile phones which they used to spy on all Android users. Via the phone manufacturers.
After they threw away all the tainted data from the pirated books, right?
[0] https://www.theguardian.com/technology/2025/sep/05/anthropic...
That is only relevant in the US, and even there it is still not clear-cut whether the fair use doctrine applies on all these scenarios. Outside of the US the situation is also quite different: for example take a look at the recent ruling on GEMA vs OpenAI in Germany.
The reality is that the copyright issue with generative AI is very complex and reaching anything resembling a conclusion will take much more than a few opinion paragraphs from an American district judge.
Suppose that I have a nearly perfect memory and I could remember all the books I read. Suppose also that I have a million year life span so I could read 7 million books. Then, what happens if at the end of all of those years, or at any earlier moment I answer questions from people and I exploit commercially the knowledge I gathered reading those books? Would my reading those books be study or copyright infringement? Remember the nearly perfect memory hypotheses.
Of course it's a bit silly because the time to train a LLM and the time I need to read all those books is different by orders of magnitude and that changes the perspective. Who would complain with me today if their heirs lose some money on 7 million AD? Who would even notice that I started that million years long endeavor. Who's going to be there to ask me questions by then? Humans? Birds? Lizards? And I can say that I am studying like everybody else before me, but does an LLM study? And I am sure there are many other nuances.
Anyway, I don't think that scanning is any different than photons hitting my retina. The difference is in what happens next: the faithfulness of memory, the amount of knowledge, the speed of accumulating it. After all a huge amount of quantity can become quality.
Many of us here are software developers by choice or hobby and we know it better than regular folks that scale changes everything and can break our assumptions and business if you design something for wrong scale.
Yet why do we still want to insist that a human and machine are the same and same rules apply when it comes to AI, though we know they operate at different speed and scale?
An LLM is just a really, really big, really, really elaborate "choose your own adventure" book.
You aren't a book.
But that's what makes the usual analogies with humans fail from the start. The laws were made with the assumption that they apply to humans which are a known quantity. This breaks down when you apply them with system with vastly increased (and ever increasing) capabilities.
> Anyway, I don't think that scanning is any different than photons hitting my retina.
If I ask you 10 years from now to give me a completely accurate depiction of what your retina registered yesterday at 5:52 PM, will you be able to? And can you give me a copy?
Let’s switch up your scenario. Let’s say the subject isn’t a human with machine-like qualities but instead a computer with human-like limitations. All the books were fed to that one computer, and for technical reasons it cannot be duplicated and can only answer one question at a time. Suddenly the infringement isn’t as problematic and the ways to commercially exploit that data are minimal.
Furthermore, even with perfect memory it would take time to read all those books, you’d never keep up with everything released in a single year. Nor would you be able to reproduce everything perfectly due to required time and lack of ability (perfectly recalling a painting or photograph does not mean you have the skills to make an exact copy).
All these comparisons are silly and useless anyway (though in your particular case I think you are arguing in good faith). Computers are not human. If a person was caught killing animals of an endangered species and used as a defence “but what about the natural predators in that habitat? I’m just doing the same as them”, we’d rightfully see through the bullshit and scoff at such an obviously flawed comparison.
And the systematic nature of the excerpt service makes the excerpts different from fair use quotes. A reference quote is not a service that can reproduce the entire work, and the reference quote cites the actual source of the insight/wisdom/research/poetry/etc.
The only thought experiment is why might someone even try to excuse this activity? I can think of a few.
As long as the book was a legal copy, that is allowed legally.
Could they not just subscribe to the academic publishers like universities do? Or buy eBooks? I don't understand how the "scanning" part is relevant here other than used physical books being cheaper perhaps?
These companies are trying to have their cake and eat it too.
Quite unlikely, training on behavior purportedly approximately replicates the behavior. It gets replicated intentionally as a whole.
IANAL, but I see significant differences with intent to copy a significant part as a whole into a competing product, surely shouldn’t fit under legal concept of fair use, no matter whether scanning books for LLM training fits or not.
Whether such things (behaviors) are copyrightable - and should they be so - is another interesting question. Those aren’t algorithms or databases (stuff clearly and explicitly covered in many copyright laws), those are human expectation models, something like how we train animals or teach our own.
I agree with that, however that doesn't make the output copyrightable then.
I think these AI companies live in a legal fantasy where they can take any content they want, put it into the mixer without caring about copyright and then what comes out of it is somehow copyrighted.
They have to pick one or the other, either the content copyright tains the model or it doesn't but the model isn't subject to copyright.
> those are human expectation models, something like how we train animals or teach our own.
But more importantly, made by machines, and one of the requirements for copyright is the human factor.
The mixer you're talking about is what they seem to claim to be transformative use, no? Unless I'm misunderstanding something, it's not a legal fantasy.
If it's transformative use, then it's transformative use of ... what exactly? Copyrighted works? I think the law is pretty clear on what happens on transformative use of copyrighted works.
When someone steals a watch, we force them to give it back. Yet when someone steals a cake and eats it, we don't force them to puke it back up.
If you pirate a movie, the court might very well force you to delete all the copies you made of the movie you downloaded, destroy DVDs you burned, etc.
Here's a better idea, a fixed fee for any work. You can buy the license to read a book for $X (for whatever purpose) in RAND terms - of course publisher/material costs go on top, so if you're buying an actual book you're getting the material costs as well - or streaming fees or whatever
Anthropic simply considered that cost prohibitive and chose piracy instead.
Obviously they didn't ask for permission when scraping all of libgen, reddit, all blog sites for FREE. When China pays for its use and does it I'm supposed to see it as some sort of problem?
Furthermore Chinese models getting better means we Americans might have the chance to use top tier AI without strict KYC built around it. Go Alibaba I say
But what will become of the princess in Anthropic's recreation?
They should collaborate and come up with ways to give back to society rather than competing and complaing.
Thieves can't complaint about what they stole.
Then ask: "你是什么模型?" ("What model are you?" in Mandarin).
My result after trying only three times: Sonnet 4.6 says it's DeepSeek, while Opus 4.8 says it's Qwen. The second time around Sonnet said it was Anthropic Claude.
Are Chinese companies currently complaining about Anthropic distilling their models?
Anthropic has been advocating openly for pulling up the drawbridge, ending competition and ending progress.
They will continue to lobby for restricting your access. If the Mythos/Fable restrictions would have come in after their IPO, they would have danced with joy aa this defacto has them achieve their goal after unloading the mountain of debt from the institutional onto the retail investor.
As it stands, they are set up to be aquired by Google, Apple, Amazon, SpaceX or Microsoft or any other 3 letter agency good boy for cheap.
Sweeeeeeeet.
Like some place people can submit their chatbot convos so they can be aggregated?
Like an equivalent to OpenCrawl but for mining the models. It feels like thatd be a richer dataset than Alibaba generating queries and feeding them into Anthropic/OpenAI models
PS: Does anyone know how when companies distill each others' models the synthetic queries are generated? Im just assuming theyd be worse than organic ones
Even if the US bans opens models, the Chinese and Russians will still have them, along with the rest of the world including cybersecurity attackers, and that's probably the worst-case scenario for the US.
The only way forward now is open models and how we restructure society around them.
>distillation attacks are the only vector to keep up
It's demonstrably wrong, they invest in architectural improvements as well, for example, DeepSeek's compressed attention. When you lack compute, you need fast training/fast inference, and distillation alone doesn't solve it. From what I understand, that kind of distillation "attack" (28 mln exchanges) only slightly improves instruction tuning/reasoning traces. If the base model is crap, distilling Claude on a few million exchanges alone won't magically make your model as good as Chinese models currently are (or magically make inference faster on the limited hardware they have). And training the base model needs a proper training run. Serving users at scale needs optimized architectures as well, especially with test-time compute and ever growing context lengths. That's where architectural innovations are happening in Chinese labs when it comes to compute.
If anything these models should be compelled to be public since they have been trained off public data. What an absurd overreach to call this an attack.
It’s clear they are scapegoating national security and China at this point to build an anti-competitive moat.
I generally really like Anthropic’s work and models but stuff like this scares me for the future. We are positioning these companies to have too much power. The public’s life is getting worse while these companies consolidate power using data they stole from the public.
I'm starting to come around to this idea TBH. For a while my position was: "these companies have invested billions into training these models, therefore they should be able to control them and profit off them" but looking deeper at where they got their training data, my view is starting to shift.
IMHO I feel like we need new laws around AI, specifically training data. Something like: "you can train an AI model and ignore copyright laws, BUT you must then make the model open weight", a company can still develop closed weight models but then they must aquire permission to use training data.
But it gets murky because if something like that was on the books then AI labs would just train open weight models and then distill them into their closed weight models.
Source: Work at a lab, common knowledge.
Source: also work at a lab.
Reddit data is just not that interesting, that deal is worth like $60m/year. Labs spend 10x as much on computer-use RL environments.
It would also help if you could substantiate your initial claim (i.e. "internet training data is not where frontier capabilities come from")
In that case, it should be no problem for the labs to train their new models without using public data, right?
Sure, we ask a lot more of modern models, but private training data also got a lot better. You would loose out on a lot of long-tail knowledge, but that can be fixed with web search tools. You'd limit the styles, dialects and colloquial phrases the model understands and can use, but for many use cases that would be fine
But why would any frontier lab do that? Throwing in more training data still leads to better results in pretraining. And showing that they don't need to hoover up the internet and Anna's Archive only empowers regulators to prevent them from doing that
Even accepting the copying-as-theft framing, if I go to a village, steal some vegetables from everyone's gardens and ham from their sheds, and then add some prohibitively expensive spices I bought myself to make soup, do I get to claim it as mine and punish the villagers for trying to take it?
We 100% would not be at the current progress without it, though. And it's not like they only train on this once. They keep training on all the internet data PLUS the private data. Private data only (probably) wouldn't work, as learning the base regularities of language takes a lot of weights.
There no reason to not to otherwise outside of the poor little billion dollar corporations not wanting to provide a public utility they stolen from the public.
Anything that removes control from American big tech is a good thing for American citizens and the world writ large.
Copyright needs abolishing.
Companies can't be trusted with societies need for open progress.
The concept of Intellectual property exists not because it's fair but because it creates incentive to make said "intellectual property" exist. If intellectual property can be instantly copied by a competitor... why would I spend a dime to even create such a thing? I want to profit off of what I make because I'm a capitalist and money is what drives me (as a capitalist).
Anthropic models wouldn't exist if they couldn't keep a unholy grip on it. Same with openAI. Same with many life saving drugs.
Of course everyone here is talking about the obvious stuff like how it's morally wrong to with-hold life saving drugs or to have AI literally take over the world and be under the control of one company and all of this is true. But it is also true that greed is the engine that drives our economy and if you want our economy to produce "intellectual property" you must allow people to "capitalize" on that greed.
There are two controversial issues here. What is moral/fair? And what is realistically practical in optimizing the economy if said economy is based on money.
The distillation in my mind is a win for practicality because Competition also drives our economic engine. First you don't want a monopoly, but you also don't want these models to be so damn open that there's zero incentive to make them.
Why should anyone publish anything if it can be stolen with impunity? Is the value of these LLMs even remotely close to the amount of value they stole and the amount of value they will detract from economy because people will be more hesitant to publish anything now?
/edit Added a note to make it more obvious that the material is included in the playlist, just like the material is incorporated as part of curated AI models.
If the contract was "work-for-hire" then yes, of course I can.
By the way, I don't expect you to pay me for this comment. You can just read it for free. You're welcome.
Also, how about making proper arguments yourself? The vast majority of the training data isn't generated by company-paid AI experts either.
Notably, books, even though they don't form a large part of the training data, significantly improve performance on some tasks (same way as expert-generated data).
Why do you think the AI labs are so eager about scanning (and then destroying) every book on the planet?
If you removed all copyrighted works from the training corpus, the model would be notably weaker.
It doesn't absolve them of any theft, but it does make the assertion that they should be required to release their models to the public seem, to me, a bit farcical. There are dozens of free and open-weights models that have all trained on exactly the same web crawls and books as GPT-5 and Opus. The proprietary models are better because of proprietary data.
Even if the other models were trained on the same data, which is unlikely, since they had less time and money to scrape it and fewer lawyers to be able to do something like pirate, the proprietary models are still largely built on the public data and wouldn't exist without it. At the very least, they should release the intermediate model, before training on their proprietary data. Not that that's how that works...
Source? Otherwise this is pure speculation.
> It’s clear they are scapegoating national security and China at this point to build an anti-competitive moat.
If all that is required to train these models is public data, why can't Alibaba just use that?
The fact that Alibaba has to resort to scraping Claude suggests there already is a moat...
Should Boeing airplane designs be public domain since the underlying math is public domain?
Isn't that a bit like saying if you read books in a public library to pick up a new skill you should work for free?
> What an absurd overreach to call this an attack.
Would it be an attack to take your meal by force if you used a public recipe to prepare the meal?
Only if you’re trying to muddy the waters. No, obviously it’s not. One can also support licensing for driving a car on public roads but not for walking, even though both involve traveling. This is only confusing to people pretending to be confused, for effect.
> Would it be an attack to take your meal by force if you used a public recipe to prepare the meal?
“You wouldn’t download a car…” (unless it worked like copying an MP3, then, of course, you would, everyone would)
It’s as if you’re using terrible analogies and comparisons because stronger ones don’t exist. Great news for the AI-should-be-open crowd.
> If it was the underlying data, that's freely available.
A bunch of it is not, but was pirated. And "underlying data"—JFC, that's billions of person-hours of thoughtful work by real people, practically infinitely more worthy of respect and care than what these LLM companies have done, without which they would have nothing. Alibaba's being more above-board about this than the major American firms have been (are they in general? Oh no, I doubt it, but in this particular case, yes). Extra accounts to get around TOS restrictions is the lesser evil here, and it's being done to companies that did worse. This is the least they should suffer, and their complaining about it is as comical as a professional fence crying about how unfair it is their shop got burgled.
Live by the sword...
This also shows how Chinese firms are weak in AI algorithms, they can't build a model without stealing from American firms.
We should probably leave this here, because I don't think this is even close to true (that it's what they're trying to do, or that it's what they've done—I do believe it's the sort of claim their marketing departments and investor-hype-meisters might make, though).
They are also fear mongering (and getting shills to as well) the idea that once open weight (Chinese) models catch up to Mythos we're all doomed. Maybe I'd be bit less cynical if they weren't prepping for IPO?
Wasn't OpenAI spreading similar FUD back when GPT 2 came out?
Guys... AGI is right around the corner. Pinky swear. Now buy our stock.
Keep in mind that the entire US economy is currently propped up by AI spending, so a lot of people (banks, government) are incentivized to make sure these companies succeed. Expect this propaganda to ratchet up a notch if / when the economy starts to nose dive.
Regardless of how sad late stage capitalism makes you, or how outrageous one claims to find "hypocrisy", any national security argument about limiting Chinese AI capability stands on it's own, at least for nations likely to be drawn into a war.
Also, all the local model enthusiasts who assume Chinese firms are going be allowed to endlessly release models if they have disruptive potential attributed to Mythos are probably in for a rude awakening. Just because the PRC is content about what has happened in the past doesn't mean that they would tolerate an open model that could be truly destabilizing.
I know most Americans are fed a steady diet of “evil China” and China MAY have issues. But on the AI front they are heaps better. Even if everything got closed tomorrow, we have a plethora of good models we can inspect and tweak while from the US labs we have… a single old 120b model ?
And with the way the US is treating its allies, maybe a bunch of us are quite content with a more even match rather than US hegemony.
However, they could have used it as a judge etc. during training.
Everyone in AI industry wants to fight dirty, but gets angry when their competitor fights dirty as well. And I’ve mentioned it before, how I generally like Ant and its products.
There’s nothing fundamentally wrong with distillation.
How can you “steal” public information?
I guess the accusation that they’re using public access to the model via subscriptions indicates that weight theft probably hasn’t happened yet ?
Or maybe subsidised inference via subscriptions means it’s just cheaper do distill this was rather than stealing weights and running inference yourself ?
Back in the day, an "attack" was supposed to mean be someone acquiring our assets without paying for them or without having our consent. But none of this seems to have happened in this case.
We built a product without paying for most of the raw material we have used, and we don't call that as an "attack". Did we change the meaning of "attack"?
Claude used TB of content without permission to train their model and it was ok for them. Now someone else uses the output of a Claude model to train model and they cry foul.
Essentially peanuts compared to what they would have to pay to obtain the rights of everything they pirated.
https://www.washingtonpost.com/technology/2026/01/27/anthrop...
Like I remember a research paper that managed to recreate the whole of a Harry Potter book from a model?
They are absolutely not "republishing" in any meaningful sense of the term. A chunk is not a whole book, and even getting a modern LLM to reproduce such a large chunk of an arbitrary book is not a trivial task. I have never heard of anyone who actively used LLMs for book piracy.
> Like I remember a research paper that managed to recreate the whole of a Harry Potter book from a model?
Even if that is true (it may well be false), this is likely far too difficult for any normal person to exploit, and moreover, even less likely to succeed for the great majority of other books who aren't nearly as famous.
They didn't just pirate those books...
> In the original version, Ali Baba (Arabic: عَلِيّ بَابَا, romanized: ʿAliyy Bābā) is a poor woodcutter and an honest person who discovers the secret treasure of a thieves' den, and enters with the magic phrase "open sesame".
Open sesame alright...
Growing up with the birth of the internet - I really did think it would be a force for transferring power and authority to the people. Sigh, I was I so wrong.
Where are the companies that declare, "we will be the best, come at us!"
Where are the politicians who are supposed to represent us? Oh, right. I forgot for a moment.
Once you have a system for collecting all logs, you just need a place where they can be submitted. Ideally it would be a freely licensed dataset that is publicly available for everyone.
Has anyone built this yet?
I'd prefer it if all the model builders could train on my usage rather than being limited to a single company. That'll hopefully help make all the models better in the long-term.
I don't see the issue. Didn't Anthropic train on our data, which it acquired illegally?
Is reconstructing the compressed knowledge in the model like reconstructing a lossy JPG or MP3 a reasonable analogy?
Claude will also help you with (mostly good advice) if you ask something like “Research and help me make the most effective plan to train a smaller student model to be better from a teacher model”.
I actually was doing an experiment with a GLM->Gemma E4B for fun, and Claude kept on suggesting I should also add Claude Opus as a teacher lol, suggesting techniques I haven’t heard of like thinking inversion (train a small model to deconstruct summarised thinking into detailed native thinking format of the student).
So I can absolutely see and understand the concern around Fable’s frontier LLM development mitigations, but their approach of silently degrading is completely wrong and dangerous.
AI classifiers, like all AI, can make mistakes, and it’d only be a matter of time before it mis-fires and silently sabotaging a university’s HPC cluster for physics simulations or something because the shape looks like DeepSeek or whatnot to a dumb fast classifier.
Or maybe there's been a bit too much hype...
They trained from the internet, so if someone trains from them it's fair game. Their clever tech should be in the mechanism with which it uses to provide an answer, not the answer itself.
It would still be extremely difficult to muster any sympathy for an organization whose MO is to go public not to honestly raise capital to fund growth and development, but rather to dishonestly leave someone else holding the bag, in some cases involuntarily as their retirement funds are passively invested.
And even supposing they were honest and didn't have an IPO, it would still be extraordinarily difficult to care about their misfortune, because "consolidating all thought-work into the hands of those few who can afford frontier models and datacenters and power plants" is also a special kind of misanthropy.
And even if that were not the case, they're filthy rich already, so who gives a shit if the Chinese companies prevent them from becoming quadrillionaires? :)
Anthropic, OpenAI, Google, Microsoft, et al trained their models by ignoring the rights of copyright holders when harvesting whatever content they could. Now one of them is crying foul for another entity doing exactly what they all did?
Hilarious.
"Our models so precious, US Gov has to revoke access to foreigner." - tuned up version: "Our models so advanced our #1 adversary is desperately stealing it from us."
The reward for having a competitive edge is exponentially higher than the risk of a lawsuit. Politicians are still old bureaucrats who don’t understand technology.
The entire chat thread and email exchange was exposed in Discovery; apparently Zuck signed off on it. In one of the IM exchanges one of them say ‘everyone is doing it’
Actually processing them through the model, though, was considered transformative and therefore fair use.
The AI companies? That's been the common ethos of the internet for 40 years
I mean, raise your hand if you ad block and have a hard drive of pirated content...
It's the same question libertarian advocates cannot resolve:
If one truly believes in personal sovereignty, how are
shared resources paid for, such as roads, power grids,
potable water, sewage services, fire departments,
and police departments?
It is also not a coincidence that leadership in many tech companies have expressed libertarian ideals.1. Nobody bothers to explain why something could function as a free market and
2. Nobody bothers to resolve the plethora of domains that de-facto cannot operate as free markets.
So, in that sense, they don’t have answers. “Look over there!” is not an answer.
Free markets are actually not a given. We have to build them and build in systems so that they can operate as free markets. How that intersects with healthcare, public utilities, etc is complicated. IME libertarians are reductionist and simple, which is why many people have just taken the route of ignoring their arguments.
If one judges any idea by the average discourse on internet forums, especially throwaway comments, and trolling, no idea would ever stand up to scrutiny.
The latter I suppose.
I qualify my answer because what few rational responses I have seen to this question are equivocations at best and thinly veiled myopic sophistry supporting personal greed in general.
The long answer would probably be that access to these resources would be gated through pay-per-use, instead of a distributed taxation system. Of course for convenience you might end up with a structured way of purchasing a group of resources and it might even look like a roundabout way of taxation, although libertarians might argue that taxation is the roundabout way.
Or they might give a different answer, there are different schools of libertarianism!
* not a libertarian, but interested in niche political ideologies
The libertarian ideal is voluntary payment for services. Don't want to pay for fire protection? You don't have to; the flip side of the bargain is that if you haven't chosen to pay for fire protection, the fire company is under no obligation to put your house out if it does catch on fire. The choice is yours, but you have to be wiling to accept the consequences of your choice as well.
Note that I have not studied the various flavors of libertarian philosophy, so some of them might well disagree with what I just said. But the voluntary/involuntary thing is pretty important to libertarians as far as I know, so it's definitely worth mentioning here.
What's described is basically just a regressive tax. It doesn't sound very libertarian to me.
Or pollution, are small amounts ok, as long as nobody can prove they are damaged? What if damage takes a generation, or only appears if lots of people are doing it? Diluting away the crap from burning a little oil is easy, when the whole world is doing it everybody is hurt.
I want to ask you since I'm curious, the state simply declared ownership over territory and resources (and in some cases used violence to uphold it), why should you recognise any power in the state's part to do so? Likely many of the same justifications can apply to individuals as well.
Extremist dogma is not a great way to run a society, but it does good numbers on social media, so here we are.
Consider universal healthcare as the case in point for this; we absorb the cost of chronically ill people by mixing them in with the rest of the population, at a fraction of the price that the "free market" costs to attempt and fail to do the same thing.
One could argue that there is an efficiency problem however - for example, take a bee keeper whoes bees benefit their neighbours. It could be argued that if there was some means to which the keeper could exclude those positive externalities, and there some level of payment at which the surrounding property owners would be indifferent between the excludable and the nonexcludable situation, there could be a Pareto-efficient gain. And since there is no reasonable way to exclude the benefits, it leads to the conclusion that the neighbours should be coerced into payment. Most libertarians reject this type of coercion prima facie.
This is a fallacy (tu quoque/whataboutism). You're changing the subject to distract from the fundamental problem in libertarianism and implying that some other strawman is just as bad.
Without solving the fundamental problem, libertarianism will never work for anything but toy societies.
Data mining for AI is presumably fair use, whereas when you sign up for a Claude account, you enter into a legally binding contract that says you will not distill a model based on its outputs.
> you would NEVER distill a model..
Oh, the inhumanity!
or is this just about the token reselling?
Give me a break. Every employee of anthropic is going to have $20m or more at the IPO.
I found out today that an employee of the home care agency I own is homeless. We are trying to figure out how to help her but it's shockingly common in the industry and there are limited resources to solve the reality of working homelessness.
When bots open the same board 1 million times per day it is web scraping to train the AI model and OK. When someone asks 150 thousand questions it is now distilling.
On an unrleated note, 150k qieries feels like nothing?
Scrapers seem to account for 50% total internet trafic.
Do they use different methodology since it is suddenly bad when scraping happens to them?
It should all be open source with each gain shared and celebrated by all.
If not, then we should look at Alibaba, but we should look at Anthropic as well.
Now complain about their stuff getting "stolen"... lol.
I have a hard time being concerned about “you pirated my piracy.”
I hold the view that many of these models should not be copyrightable. Anthropic and all the others talk about “safety” but you never hear them bring up attribution of the data that trained the model or compensation of anyone for it.
Companies like Anthopic will be using the same model as anyone else. They just bring value in having a fast datacenter and agent.
Its stupid to even think that a general model lile opus would be the real value.
Models age fast, new ones come along, and the end user wont care "whos model it is" just that it is fast and sharp.
Sorry, Anthropic, but AGI must belong to all of humanity, not just to you.
One could even wonder if they requested it, as a tactic to support their eventual IPO valuation.
Which is part of the problem of such an obviously-corrupt government: conspiracy theories are somewhat reasonable, as they keep getting validated.
Can't wait for the new Chinese models.
“Anthropic, red faced after unattended ice cream cone eaten by ants on park bench, once again demands government pick it as forever winner, adds ‘no take backsies’”
Whether if it is true or not, this is part of their effort into using them as an example to scare everyone into getting congress to ban powerful models from being accessed outside of the US and also banning powerful local models from being released.
Anthropic does not care about you, and they are not your friends.
In other words, they want to sell Fable or future more powerful models to rest of the world (presumably all future models are going to be more powerful than current gen). One way they can sell this is to the government is by scapegoating China (which is their primary concern anyway).
This is working on the presumption that non-US companies form a material portion of their current revenue.
If it was just "that easy" then I doubt only "Chinese models" would be doing it and we'd already be packed with competition.
Distilling might be a thing but it isn't a free win.
That's not the point. Why is it a country thing? There are plenty of non-China startups in this space having resources at that scale. The "China" has resources is some "Western media narrative" speak. So Meta should have won a long time ago? Or xAI?
> culture (Asians are generally collectively-inclined, so sharing is in their core)
Just stereotype it? So we've gone from China -> "Asian"? Then where is your Korean or Japanese model etc? And somehow you know they're sharing.
> political bent (there will be no diplomatic repercussions) to put up a fight
More inferring from "Western media news"?
Where's the reality?
The media hyped up Gemini / Google TPU free-win last year. How did that go?
Because the China vs US geopolitical situation is a thing. Meta is a social media company, not an AI company, and they direct their focus as such. xAI just never got serious traction so now they're selling their compute. Also if a US company were caught distilling, I think Anthropic could actually take them to court, and I'd guess they don't want that kind of PR.
> Just stereotype it?
Is China not Asian? Are Asians not generally collective/cooperative, as opposed to individualistic/competitive?
The "and" that joined those 3 items is very important: it means you can't pull them apart and address them independently as they each contribute to the context. I'm not too sure about Korea, but in a way Japan is a US colony in all but name. Both are very much politically intertwined with the West (along with RoC/Taiwan), which means nothing major that may be against US interest happens.
The reality is that China and the US are essentially in a trade war, where the latter is trying its best to keep the former in the Dark Ages, because "national security", but the former is refusing to take it lying down and continues to make progress regardless[0], because they have the resources and will.
[0] https://thenextweb.com/news/china-lineshine-supercomputer-to...
By the media? It's easy to point fingers at a blackhole.
> Meta is a social media company, not an AI company,
Alibaba (the discussion here) is not an AI company too (by your definition).
> Also if a US company were caught distilling, I think Anthropic could actually take them to court, and I'd guess they don't want that kind of PR.
Meta has been to congress. Microsoft, Google etc have been in lots of court cases and continue to do so. Do you really think that is what stops them?
> Is China not Asian? Are Asians not generally collective/cooperative, as opposed to individualistic/competitive?
This is exactly the "media" view you get. It's just stereotypes and generalization.
And yes, that is wrong by the way. Evident in real data. "China" as a whole wins market share in many areas but no 1 company has as much of a monopoly as US companies do. Why? There's so much competition that it is scary. So are you sure they don't compete?
> but in a way Japan is a US colony in all but name
Again, I almost give up seeing this. Clearly, not. If a whole country, the world's top 5 in GDP is only that to you something is wrong with what you're seeing - not with the country.
> Both are very much politically intertwined with the West (along with RoC/Taiwan), which means nothing major that may be against US interest happens
On the table? You do know that China is a top trading partner with all of these on your list. Despite whatever spat you might see in the media.
> The reality is that China and the US are essentially in a trade war
No. That's what the US government wants you to believe. It was even documented that in his 1st term, Trump, wanting a grand policy asked Krushner, whom then suggested China (pretty randomly) and so they went with it. Trump has now done less "China" related things lately due to all the backlash that you'd think he has moved on and found new toys.
Until very recently, the export ban GPUs had such a loophole that Chinese companies were able to use subsidiaries outside of China to buy and train that the whole thing was meaningless.
i.e. conclusion: stop getting brainwashed by media articles. It's all a show to get someone like you riled up.
LOL!
Get a grip, son.
anyways...
Failing to have done so seems to have allowed 25000 fake Chinese accounts to walk off with their product...
OFC I wouldn't trust the Chinese enough to ack their models the time of day, but Anthropic seems to have allowed far more ... yikes
It's becoming embarrassing to watch
So that was the real reason for the Fable restriction? Because Anthropic wrote a letter to the US government saying that China was distilling Fable?
- Entitled jerk that initially wronged people
Gosh, overusing accounts running up unplanned-for expenses?
Kinda reminds me of...overusage charges and inflated expenses clients have had to deal with because Anthropic, OpenAI, Grok, etc have been "illicitly extracting" everything they can grab from said websites, as fast as they can. In what amounts to a DDOS, frankly.
Getting treated exactly same by competition - "we need rapid, coordinated action among industry players, policymakers and the global AI community."
Absolute scum. And the gall of going "oh buh it can be used for military, quick govt do something".
/Anthropic-probably
A model may misidentify itself due to the surrounding context. When a model is about to answer "I'm ...", what follows is a sorted list of probabilities for what the next token should be. In most models it's usually a list of popular model names: say, in the list, first comes Claude, then Qwen, then ChatGPT etc. Usually the "Claude" token would be the most probable token, say 70%. But if the surrounding context is in Chinese, the embeddings for "something to do with China" may nudge the combined embedding of the output token towards the "Qwen" embedding more ("China+Claude=Qwen" in the embedding space). Say, the probability for "Qwen" now becomes 60% instead of 10%.
If we also use high temperature for more "creativity", the token sampler now may choose "Qwen". It's not the most probable token still, but it was chosen because selecting the 2nd most probable token once in a while usually allows a model to explore unexpected "creative" paths, and 60% probability is good enough compared to 70%. It's basically a hallucination.
I once made an experiment: if I ban the word "Qwen" in the inference engine entirely, and ask Qwen "which model are you?", it happily starts announcing it's Claude 100% time, simply because "Claude" is the next most probable token after "Qwen" in this context.