There are other arguments for running an ssh-able box in a closet somewhere too as with KVMs you can give an agent remote control over the machine itself such that it has vastly more capabilities than if it were controlling its own machine it's running on, as well as not needing to keep the MacBook open all the time just to have the agent finish running.
This translates to qwen 27b actually working fast enough for useful work on dual 3090s and being painfully slow on Macbook Pros. Also if you're running a big model on a macbook pro the UI gets laggy and the keyboard gets hot. Much better to run dual 3090s in your basement and connect to them from your Macbook.
Even a 128GB is $6.8k today. Still only 2/3 your quote.
Bandwidth is relevant (I have both a 5090 and an M4 Max 128GB Studio, so have direct comparison right here), but quote the cost appropriately!
So, I always thought local LLMs were toys not worth pursuing.
Only once have I tried something decent like Gemma 4 31B and Qwen 3.6 27B did I realize how incredibly useful they are.
You stop fearing you are sharing sensitive information.
You stop fearing you will run out of tokens.
You stop fearing about the availability of the remote AI.
Local LLMs are extremely valuable.
The M5 hardware is amazing for what it is, but GPUs are still so much faster.
Running the models on the GPU box also means I can use the laptop on my lap instead of turning it into a hot plate.
Get a regular laptop and use the network to access the LLM