https://old.reddit.com/r/Anthropic/comments/1snorbg/the_bigg...
I don't know enough about distillation to understand how much this hinders/slows the process, but it sounds at least superficially plausible.
Honestly, I think it's quite possible that models will be retrained with gaps in their knowledge. For example, a coding model for commercial use probably doesn't need deep knowledge of biology, and training on biological sciences probably doesn't help its coding evals much.
Honestly, I'd welcome such an approach.