Distilling might only be effective in the chat bot dominant era. We are about to move to an agents era.
Furthermore, I’m guessing distilling will get harder and harder. Claude Code leak shows some primitive anti distilling methods already. There’s research showing that models know when it’s being benchmarked. Who’s to say Anthropic and OpenAI aren’t able to detect when their models are being distilled?