But then don't even try to combine it with any notion of "leadership", since distillation is literally "copying the actual leader".
(and if you argue the US models do produce copyrighted works, then oops: whose copyright is it, huh?)
You're no "leader" if, absent someone whose results you can copy, you're an emperor without clothes.
And they certainly have no idea whether these outputs (assuming they ever existed and weren't made up) were used for training. The article mentions that DeepSeek made 150k requests. That isn't much, and it might have been just an eval or a benchmark to compare their own model against. It's really hard to believe DeepSeek had any Claude outputs anywhere in their training schedule, since their models are just too different. Other than training on random vibecode, of course, which is mostly written by Claude.
Imagine if your Casio calculator came with a ToS saying you can't use it to develop a competing calculator or any other tools. Or that your hammer can't be used to make other tools. Or, closer to the HN crowd, imagine MS in the 90s saying that you can't use their OS to build services that compete with MS. They'd have been laughed at and broken up immediately if they'd tried that.
The only thing they can do is refuse to serve tokens (and even that's debatable, if we get to the point where tokens are commoditised). But that's gonna be a game of whack-a-mole, and they know it.