> But how will a GPU with small-ish but fast VRAM and great compute, augment a Mac with large but slow VRAM and weak compute?

It would work just like a discrete GPU in ordinary CPU+GPU inference: you'd run some of the layers on the discrete GPU and place the rest in unified memory. You'd want to minimize CPU/GPU transfers even more than usual, since a Thunderbolt connection only gives you throughput equivalent to PCIe 4.0 x4.
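As a rough sketch of why a clean layer split keeps the link traffic small: with one split point, only the boundary activations cross the Thunderbolt link per token, not the weights. The hidden size and link rate below are assumptions for illustration, not measurements.

```python
# Back-of-envelope: per-token data crossing the Thunderbolt link when the
# model is split between a discrete GPU and unified memory.
GBPS = 1e9 / 8  # bytes per second per Gbit/s

tb_pcie_tunnel = 64 * GBPS   # assumed: ~PCIe 4.0 x4 of tunneled PCIe bandwidth
hidden_size = 8192           # assumed hidden dimension of a large model
bytes_per_act = 2            # fp16 activations

# With a single split point, only the activation vector at the boundary
# crosses the link, once in each direction, per generated token.
per_token_bytes = hidden_size * bytes_per_act * 2
transfer_us = per_token_bytes / tb_pcie_tunnel * 1e6
print(f"{per_token_bytes} B/token of boundary traffic -> {transfer_us:.1f} us on the link")
```

Microseconds per token of link time is negligible next to the compute; the problem is only when weights themselves have to move.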

reply
But isn’t the Mac Mini the weak link in that scenario?
reply
It has way more unified memory than your typical dGPU.
reply
Yes, obviously. That VRAM is also slower, and the compute attached to it is weak. Transfers to the external GPU will slow things down too much.
reply
My Mini is actually the smallest model, so it has "small but slow VRAM" (haha!); the reason I want the GPU is for the smaller Gemmas or Qwens. Realistically, I'll probably run on an RTX 6000 Pro, but this might be fun for home.
reply
We've seen many recent projects that stream models directly from SSD into a discrete GPU's limited VRAM on PCs.

How big a bottleneck is Thunderbolt 5 compared to an SSD? Is the 120 Gbps mode only available when linked to a monitor?

reply
That's what, 14 GB/s? The GPU's VRAM can do 100x that.
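Putting rough numbers on that comparison (vendor headline figures, assumed rather than benchmarked):

```python
# Rough bandwidth comparison: Thunderbolt 5 link vs. a dGPU's local VRAM.
GBPS = 1e9 / 8               # bytes per second per Gbit/s

tb5_boost = 120 * GBPS       # Thunderbolt 5 asymmetric "Bandwidth Boost" mode
tb5_symmetric = 80 * GBPS    # Thunderbolt 5 standard symmetric mode
vram_bw = 1000e9             # assumed ~1 TB/s for a high-end dGPU's VRAM

print(f"TB5 boost:     {tb5_boost / 1e9:.0f} GB/s")
print(f"TB5 symmetric: {tb5_symmetric / 1e9:.0f} GB/s")
print(f"VRAM is ~{vram_bw / tb5_boost:.0f}x the boosted link")
```

So even the 120 Gbps mode is around 15 GB/s, one to two orders of magnitude below on-card VRAM bandwidth.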
reply
A discrete consumer GPU card doesn't have enough fast RAM to run a very large model that hasn't been quantized to hell.

That's why all the projects streaming models into the GPU from an SSD popped up recently.

reply
Yes. There’s just no way to get above 1t/s that way with a large model.
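A quick ceiling on that: dense decoding reads every weight once per token, so if the weights have to stream over the link each token, the link bandwidth bounds throughput. Model size and link rate below are illustrative assumptions.

```python
# Upper bound on tokens/s when weights stream over the link every token.
GBPS = 1e9 / 8               # bytes per second per Gbit/s

link = 120 * GBPS            # best case: Thunderbolt 5 boost mode, ~15 GB/s
model_bytes = 70e9           # assumed: e.g. a 70B-parameter model at 8-bit
tokens_per_s = link / model_bytes
print(f"throughput ceiling: {tokens_per_s:.2f} t/s")
```

Even in the best case this lands well under 1 t/s; only sparse architectures (MoE) or aggressive caching change the picture.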
reply