It's not a theory. These smaller models that are coming out are huge advances for the field.
I can't comment on companies training practices. That would be proprietary stuff I guess. I think the claims that the advances being made are due to distillation alone are completely unfair. The advances alone are not just data.