upvote
Distillation is great for researchers and hobbyists.

But nearly all frontier models have anti-distillation ToS, so distillation is out of question for western commercial companies like Apple.

reply
Even if Apple needs to train an LLM from scratch, they can distill it and deploy on edge devices. From that point, inference is free to them.
reply