For open models, usually not well. You get 5+ providers competing on cost, all with cheaper electricity and better hardware utilization than your local setup.
I did an estimate of that if you're interested: https://x.com/pwnies/status/2028831699736637912

The TL;DR though is that a 10-15B param model baked into an ASIC on the latest fab tech would draw around 62 W when active. At ~10k+ tokens/s it would likely only be active for short bursts, so it'd fit perfectly fine within the thermal envelope of a laptop.
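To put rough numbers on the "short bursts" point, here is a back-of-the-envelope sketch in Python. The 62 W and 10k tokens/s figures are the ones from the estimate above; the 500-token response length and the one-request-every-10-seconds usage pattern are made-up assumptions for illustration only, not measurements.

```python
def burst_stats(power_w, tokens_per_s, response_tokens):
    """Return (active seconds, joules per response, joules per token)."""
    active_s = response_tokens / tokens_per_s
    energy_j = power_w * active_s
    return active_s, energy_j, energy_j / response_tokens


def average_power_w(energy_per_response_j, seconds_between_requests):
    """Average draw if the chip is idle (assumed ~0 W) between requests."""
    return energy_per_response_j / seconds_between_requests


if __name__ == "__main__":
    # 62 W and 10_000 t/s come from the estimate; 500 tokens is hypothetical.
    active_s, energy_j, j_per_token = burst_stats(62.0, 10_000.0, 500)
    print(f"active for {active_s * 1000:.0f} ms per response")
    print(f"{energy_j:.2f} J per response, {j_per_token * 1000:.1f} mJ per token")

    # Hypothetical usage pattern: one such response every 10 seconds.
    print(f"average draw: {average_power_w(energy_j, 10.0):.2f} W")
```

Under those assumptions the chip is busy for tens of milliseconds per response, so the sustained draw ends up far below the 62 W peak. Idle power is treated as zero here, which a real chip would not achieve.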

The approach makes a lot of sense. Once you get to those speeds, network latency becomes one of the bigger bottlenecks, so running locally has a real advantage over a hosted subscription.
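A quick sketch of why the network starts to matter at those speeds. The 80 ms round-trip time and the 500-token response are hypothetical numbers picked for illustration, and this ignores queueing, time to first token, and streaming, so treat it as a rough model only.

```python
def network_share(rtt_ms, response_tokens, tokens_per_s):
    """Return (generation ms, total ms, fraction of total spent on the network)."""
    gen_ms = 1000.0 * response_tokens / tokens_per_s
    total_ms = rtt_ms + gen_ms
    return gen_ms, total_ms, rtt_ms / total_ms


if __name__ == "__main__":
    rtt_ms = 80.0   # hypothetical round trip to a hosted provider
    tokens = 500    # hypothetical response length

    # Compare an assumed 50 t/s hosted-style rate with the ~10k t/s estimate.
    for rate in (50.0, 10_000.0):
        gen_ms, total_ms, share = network_share(rtt_ms, tokens, rate)
        print(f"{rate:>8.0f} t/s: generation {gen_ms:.0f} ms, "
              f"total {total_ms:.0f} ms, network share {share:.1%}")
```

At slow generation rates the round trip is noise; at ~10k tokens/s it can be more than half of the total wait, which is the part a local chip skips entirely.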
