It's also why Swift needs good Linux support these days, if app developers want to share code with the server.
There's also: "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."
The market segments that can afford to ignore laptops and only target permanently-installed desktops are mostly those niches where the desktop is installed alongside some other piece of equipment that is much more expensive.
If you want usable speeds on local machines from very large models that haven't been quantized to death, RDMA over Thunderbolt enables that use case.
Consumer PC GPUs don't have enough RAM, enterprise GPUs that can handle the load very well are obscenely expensive, and Strix Halo tops out at 128 GB of RAM and is limited on Thunderbolt ports.
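To put rough numbers on the memory problem, here is a back-of-envelope sketch. It only counts the weights (no KV cache, no activations), and the 70B parameter count is just an illustrative assumption, not a specific model anyone mentioned.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Counts weights only; KV cache and activations add to this.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * bytes_per_weight


if __name__ == "__main__":
    # Hypothetical 70B-parameter model at a few common precisions.
    for bits in (16, 8, 4):
        print(f"70B at {bits}-bit: ~{weights_gb(70, bits):.0f} GB of weights")
```

Even the 4-bit figure is more than a single consumer card typically offers, and the 16-bit figure is already past the 128 GB ceiling mentioned above.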
It'll increase a lot relative to the zero-RAM baseline. But it's still complete garbage compared to fitting the whole model in RAM. Even if you fit most of it in RAM, you're still probably an order of magnitude slower than fitting all of it, with most of your time spent waiting on your SSD.
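A quick sketch of why that happens, under simplifying assumptions: a dense model where every token has to read all the weights once, generation is purely bandwidth-bound, and RAM and SSD reads don't overlap. The model size and bandwidth figures are made-up round numbers for illustration, not measurements of any particular machine.

```python
def seconds_per_token(model_gb: float, ram_fraction: float,
                      ram_gb_per_s: float, ssd_gb_per_s: float) -> float:
    """Time to stream all weights once, part from RAM and the rest from SSD."""
    in_ram_gb = model_gb * ram_fraction
    on_ssd_gb = model_gb - in_ram_gb
    return in_ram_gb / ram_gb_per_s + on_ssd_gb / ssd_gb_per_s


if __name__ == "__main__":
    # Assumed numbers: 200 GB of weights, 400 GB/s RAM, 5 GB/s SSD.
    MODEL_GB, RAM_BW, SSD_BW = 200, 400, 5
    for frac in (0.0, 0.9, 1.0):
        t = seconds_per_token(MODEL_GB, frac, RAM_BW, SSD_BW)
        print(f"{frac:.0%} of weights in RAM: {t:.2f} s/token")
```

With these assumed numbers, going from 0% to 90% in RAM is a big win, but the last 10% on SSD still dominates: the 90% case comes out roughly 9x slower than having everything in RAM.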
I have a feeling that Mac fans obsess more about being able to run large models at unusably slow speeds than about actually using said models for anything.