I think the current sweet spot is munging your data blindly on the server, so that your client device's battery still lasts all day.
Meanwhile Apple and others push on with making client side models more efficient so that eventually the server costs and complexities go away.
You could run a pretty good home server on $50 of gear, and yet we never saw any real adoption of ownCloud/Nextcloud-style products as an alternative to Google Drive/Photos or iCloud.
Why should LLMs/transformers be any different? Especially when you need a proper, expensive GPU to run them instead of a Raspberry Pi?
On-device AI is going to be important, I think. It doesn't have to take the form of a chatbot UI to be useful.
Maybe if you ask them that question, but if you show them two products, they'll definitely prefer the faster one. 30 seconds is a long time to watch a progress bar.
People definitely aren't going to accept more expensive + slower ...