And they're hilariously pissy about it for a megacorp that did the same with the entire Internet and every library book they could get their hands on.
reply
Anthropic's claim was that DeepSeek collected ~150k conversations.

https://www.anthropic.com/news/detecting-and-preventing-dist...

I think the extent of distillation by DeepSeek specifically is overstated. For comparison, MiniMax collected over 13 million 'exchanges', which starts to sound a lot more like large-scale distillation.

reply
If that's all it took to make DeepSeek so good, I'll gladly ship High-Flyer all my personal 150k Claude/ChatGPT conversations in exchange for DeepSeek 5 (and a rack of B200s or Ascend chips).
reply
Ah, dang it. My college professors warned me about this: the Wikipedia page I read the other day is wrong!
reply
Did you read a Wikipedia page, or did you read an LLM-generated summary? When I looked this number up yesterday, the LLM summary claimed it was millions, but I opened the Anthropic post I was looking for and verified it was indeed just 150,000. Are you sure you weren't just being lazy and trusting the summary?
reply
I said what I meant:

https://en.wikipedia.org/wiki/DeepSeek

> In February 2026, Anthropic accused DeepSeek of using thousands of fraudulent accounts to generate millions of conversations with Claude to train its own large language models.[57]

reply