undefined

points

[-]

I very recently ran the numbers on these GPUs for an upcoming blog post. The token generation performance is bad, but the prefill performance is _really_ bad.

For a Qwen 3.6 35B / 3B MoE, 4-bit quant:

- parsing a 4k prompt on a M4 Macbook Air takes 17 seconds before generating a single token.

- on an M4 Max Mac Studio it's faster at 2.3 seconds

- on an RTX 5090, it's 142ms.

RTX 5090 uses more power than an M4 Max Mac Studio but it's not 16x more power.

by wolvoleo14 hours ago|

prev|

[-]

Somehow Apple has always been able to sell their stuff as somehow Magic. Remember the megahertz myth? Apple hertzes and apple bytes are much better than PC hertzes and bytes because they are made by virgin elves during a full moon.

by mschuster9113 hours ago|

parent|

[-]

> Apple hertzes and apple bytes are much better than PC hertzes and bytes because they are made by virgin elves during a full moon.

The thing that Apple has always been excellent at is efficiency - even during the Intel era, MacBooks outclassed their Windows peers. Same CPU, same RAM, same disks, so it definitely wasn't the hardware, it was the software, that allowed Apple to pull much more real-world performance out of the same clock cycles and power usage.

Windows itself, but especially third party drivers, are disastrous when it comes to code quality, and they are much much more generic (and thus inefficient) compared to Apple with its very small amount of different SKUs. Apple insisted on writing all drivers and IIRC even most of the firmware for embedded modules themselves to achieve that tight control... which was (in addition to the 2010-ish lead-free Soldergate) why they fired NVIDIA from making GPUs for Apple - NV didn't want to give Apple the specs any more to write drivers.

by bigyabai12 hours ago|

parent|

[-]

> NV didn't want to give Apple the specs any more to write drivers.

I think that's a valid demand, considering Nvidia's budding commitment to CUDA and other GPGPU paradigms. Apple, backing OpenCL, would have every reason to break Nvidia's code and ship half-baked drivers. They did it with AMD's GPUs later down the line, pretending like Vulkan couldn't be implemented so they could promote Metal.

Apple wouldn't have made GeForce more efficient with their own firmware, they would have installed a Sword of Damocles over Nvidia's head.

by jorvi14 hours ago|

prev|

[-]

On Geekbench 5, the M1 hits 483 FPS and the RTX 3090 hits 504 FPS.

There are other workloads where the M1 actually beats the 3090.

Apple does plenty of hyping but it's always cute when irrational haters like you put them down. The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.

by kllrnohj14 hours ago|

parent|

[-]

What geekbench 5 fps are you talking about? Geekbench only has OpenCL and Vulkan scores for the 3090 as far as I can tell, and the M1 Ultra is less than half the OpenCL score of the 3090. And the M1 Ultra was significantly more expensive.

Find or link these workloads you think exist, please

> The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.

The GTX 1660 also smokes the 3090 in perf per watt. Being more efficient while being dramatically slower is not exactly an achievement, it's pretty typical power consumption scaling in fact. Perf per watt is only meaningful if you're also able to match the perf itself. That's what actually made the M1 CPU notable. M-series GPUs (not just the M1, but even the latest) haven't managed to match or even come close to the perf, so being more efficient is not really any different than, say, Nvidia, AMD, or Intel mobile GPU offerings. Nice for laptops, insignificant otherwise