Also, offline access is still a necessity for many use cases. If you have something like an autocomplete feature that stops working when you're on the subway, the change in UX between offline and online makes the feature more disruptive than helpful.
To your point, it's neat tech, but the limitations are obvious, since 'printing' only one LLM ensures further concentration of power. In other words, history repeats itself.