The TL;DR though is that a 10-15b param model baked into an ASIC with the latest fab tech would take around 62W of power draw when active. At ~10k+ t/s though it likely would only be active for short bursts of time. It'd fit perfectly fine within the thermal envelope of a laptop.
The approach makes a lot of sense. Once you get to those speeds, latency of the network becomes one of the bigger bottlenecks, so local has a real advantage over a subscription.