by HerbManic 19 hours ago

by pjmlp 16 hours ago

That is the proof that what is left is a workaround, just like piling minis on racks because Apple left the server space.

Also why Swift nowadays has to have good Linux support, if app developers want to share code with the server.

by coldtea 13 hours ago

A workaround that works is better than an official solution that's barely adequate. Which is often the case.

by pjmlp 11 hours ago

Or just maybe, to use a Steve Jobs quote, one is holding it wrong and should look elsewhere.

by coldtea 3 hours ago

People sneer at this Steve Jobs quote, but almost anybody working in tech has at some point repeated another, stronger one, like "We tried to make the program idiot proof, but they keep making better idiots".

There's also: "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."

https://en.wikiquote.org/wiki/Rick_Cook

by zozbot234 19 hours ago

But those Thunderbolt links are slower than modern PCIe. If there's actually an M5-based Mac Studio with the same Thunderbolt support, then for LLM inference, for example, you'll be better off streaming read-only model weights from storage (as we've seen with recent experiments) than pushing the same amount of data via Thunderbolt. It's only if you want to go beyond local memory constraints (e.g. larger contexts) that the Thunderbolt link becomes useful.

by wpm 19 hours ago

Why everyone wants to live in dongle/external cabling/dock hell is beyond me. PCIe cards are powered internally with no extra cables. They are secure. They do not move or fall off of shit. They do not require cable management or external power supplies. They do not have to talk to the CPU through a stupid USB hub or a Thunderbolt dock. Crappy USB HDMI capture on my Mac led me to running a fucking PC with slots to capture video off of a 50 foot HDMI cable, that then streamed the feed to my Mac from NDI, because it was more reliable than the elgarbo capture dongle I was using. This shit is bad. It sucks. It's twice the price and half the quality of a Blackmagic Design capture card. But, no slots, so I guess I can go get fucked.

by wtallis 18 hours ago

For anything that's even somewhat in the consumer space rather than pure workstation/professional, the main reason is that dongles can be used with a laptop but add-in cards can't. When ordinary consumer PCs (or even office PCs) are in the picture, laptops are a huge chunk of the target audience.

The market segments that can afford to ignore laptops and only target permanently-installed desktops are mostly those niches where the desktop is installed alongside some other piece of equipment that is much more expensive.

by GeekyBear 18 hours ago

Wasn't streaming models from storage into limited memory a case where it was impressive that you could make the elephant dance at all?

If you want to get usable speeds from very large models that haven't been quantized to death on local machines, RDMA over Thunderbolt enables that use case.

Consumer PC GPUs don't have enough RAM, enterprise GPUs that can handle the load very well are obscenely expensive, and Strix Halo tops out at 128 GB of RAM and is limited on Thunderbolt ports.

by zozbot234 16 hours ago

The bad performance you saw was with very limited memory and very large models, so streaming weights from storage was a huge bottleneck. If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit, at least until you're running huge contexts and most of the RAM ends up being devoted to that. Is the overall speed "usable"? That's highly subjective, but with local inference it's convenient to run 24x7 and rely on non-interactive use. Of course scaling out via RDMA on Thunderbolt is still there as an option, it's just not the first approach you'd try.

by Dylan16807 5 hours ago

> If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit

It'll increase a lot relative to the zero-RAM baseline. But it's still complete garbage compared to fitting the model in RAM. Even if you fit most of it in RAM, you're still probably an order of magnitude slower than fitting all of it in RAM, with most of your time spent waiting on your SSD.
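
To put rough numbers on that, here is a back-of-envelope sketch. Everything in it is an assumption for illustration, not a measurement: a dense 100 GB model where every weight is read once per token, 400 GB/s of memory bandwidth, 5 GB/s from the SSD, and no overlap between the two kinds of reads. The function names are made up for this example.

```python
# Back-of-envelope model of per-token time when only part of the weights
# fit in RAM. All figures are illustrative assumptions, not benchmarks.

def seconds_per_token(model_gb, frac_in_ram, ram_gb_per_s, ssd_gb_per_s):
    """Time to read every weight once, split between RAM and SSD.

    Assumes a dense model (all weights touched per token) and no overlap
    between RAM reads and SSD reads.
    """
    ram_time = model_gb * frac_in_ram / ram_gb_per_s
    ssd_time = model_gb * (1.0 - frac_in_ram) / ssd_gb_per_s
    return ram_time + ssd_time


def slowdown_vs_all_in_ram(model_gb, frac_in_ram, ram_gb_per_s, ssd_gb_per_s):
    """How many times slower than the case where everything fits in RAM."""
    partial = seconds_per_token(model_gb, frac_in_ram, ram_gb_per_s, ssd_gb_per_s)
    full = seconds_per_token(model_gb, 1.0, ram_gb_per_s, ssd_gb_per_s)
    return partial / full


if __name__ == "__main__":
    # Hypothetical hardware: 100 GB model, 400 GB/s RAM, 5 GB/s SSD.
    for frac in (0.5, 0.9, 0.99, 1.0):
        ratio = slowdown_vs_all_in_ram(100, frac, 400, 5)
        print(f"{frac:.0%} of weights in RAM: {ratio:.1f}x slower than all-in-RAM")
```

With those assumed figures, having 90% of the weights in RAM still comes out about 8.9x slower than having all of them there, because the last 10% takes far longer to read than the other 90% combined. A mixture-of-experts model or smarter prefetching would change the picture, so treat this as a sanity check on the shape of the curve only.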

by GeekyBear 8 hours ago

If you don't care about performance, you have a lot of options.

by mixdup 12 hours ago

The proposition of a Mac Pro in the Apple Silicon world wasn't necessarily about performance, it was about the existence of the PCIe slots. I don't think AI becoming a workload for pro Macs means the Mac Pro doesn't have a place. People who were using Mac Pros for audio or video capture didn't stop doing that media work and switch to AI as a profession. That market just wasn't big enough to sustain the Mac Pro in the first place, and Apple has finally acknowledged that fact.

by alsetmusic 11 hours ago

I had a U-Audio PCI card in a Mac Pro during the Intel era of Macs. It was a chip to run their software plugins and the plugins are top of the line. I have a U-Audio box that runs over Thunderbolt now. I know there are people who need device slots, but it's vanishingly few. I'm disappointed that this category of machine is going away, but it stopped being for me in the Apple Silicon era.

by grahamlee 12 hours ago

So many peripherals now come in external boxes that communicate _incredibly quickly_ over Thunderbolt 4/5 that the need for PCIe is marginal, while the cost to support it is significant.

by ActorNightly 6 hours ago

Wow, spend 40k to get the same tokens/second in Qwen as you would on a 3090.

I have a feeling that Mac fans obsess more about being able to run large models at unusably slow speeds than about actually using said models for anything.