Apple has the best silicon team in the world. They choose perf per watt over pure perf, which means they don't win on multi-core, but they're simply the best in the world at the most complicated and difficult metric, and the one that's impossible to game: single-core perf.
Even when they were new, they competed with AMD's high-end desktop chips. Many years later, they're still excellent in the laptop power range, but not in the desktop power range, where chips with a lot of cache match them in single-core performance and obliterate them in multi-core.
https://www.cpu-monkey.com/en/compare_cpu-apple_m4-vs-amd_ry...
Why does it matter how they achieved their thunderous performance? Why must it be reduced to just a boatload of cache? Does it matter which implementation detail got you the best single-core performance in the world? If it's just way more cache, why isn't Intel just cranking up the cache?
And in laptop form, compared with an M4 Max: https://www.cpu-monkey.com/en/compare_cpu-apple_m4_max_14_cp...
That was Fujitsu. They each have their own specialties.
I don't know how to set up a proper cross-compile setup on Apple Silicon, so I tried compiling the same code on two macOS systems and one Linux system, running the corresponding test suite, and getting some numbers. It's not exactly conclusive, and if I were doing this really properly I'd try a bit harder to make everything match up, but it does indeed look like using clang to build x64 code is more expensive, for whatever reason, than using it to build ARM code.
Systems, including clang version and single-core PassMark:
M4 Max Mac Studio, clang-1700.6.3.2 (PassMark: 5000)
x64 i7-5557U MacBook Pro, clang-1500.1.0.2.5 (PassMark: 2290)
x64 AMD 2990WX Linux desktop, clang-20 (PassMark: 2431)
Single-thread build times (in seconds). The code is a bunch of C++, plus some FOSS dependencies written in C, everything built with optimisation enabled:
Mac Studio: 365
x64 MacBook Pro: 1705
x64 Linux: 1422
(The Linux time excludes build times for some of the FOSS dependencies, which on Linux come prebuilt via the package manager.)
Single-thread test suite times (in seconds), an approximate indication of relative single-thread performance:
Mac Studio: 120
x64 MacBook Pro: 350
x64 Linux: 309
Build time divided by test time makes it look like ARM clang is an outlier:
Mac Studio: 3.04
x64 MacBook Pro: 4.87
x64 Linux: 4.60
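The ratios can be re-derived in a few lines of Python, which also puts a rough number on the open question of whether the dependency builds missing from the Linux figure could close the gap. This is a sketch under stated assumptions: the timings are copied from this comment, the test-suite time is treated as a proxy for single-thread speed (which the comment itself only calls approximate), and the "break-even" calculation is an added illustration, not something that was measured.

```python
# Timings copied from the comment above (seconds, single-threaded).
build_s = {"Mac Studio (ARM)": 365, "x64 MacBook Pro": 1705, "x64 Linux": 1422}
test_s = {"Mac Studio (ARM)": 120, "x64 MacBook Pro": 350, "x64 Linux": 309}


def build_to_test_ratio(machine: str) -> float:
    """Build time divided by test-suite time.

    Assumption: the test suite is a rough stand-in for raw single-thread speed,
    so a higher ratio means the compiler is doing more work per unit of CPU speed.
    """
    return build_s[machine] / test_s[machine]


for machine in build_s:
    print(f"{machine}: {build_to_test_ratio(machine):.2f}")

# The Linux build skips some prebuilt C dependencies. Illustrative only: how many
# extra serial seconds would Linux need for its ratio to equal the x64 MacBook's?
target = build_to_test_ratio("x64 MacBook Pro")
missing_s = target * test_s["x64 Linux"] - build_s["x64 Linux"]
print(f"Linux would need about {missing_s:.0f} s more build time to match")
```

Run as-is, this should reproduce the 3.04 / 4.87 / 4.60 ratios and suggest the Linux build would need roughly 83 extra seconds of serial dependency compilation to match the MacBook's ratio. That says nothing about how long those dependencies actually take to build, and the two x64 machines also use different compilers (Apple clang 1500 vs clang 20), so treat it as a yardstick rather than a conclusion.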
(The Linux value is flattered here, as it excludes dependency build times, as above. The C dependencies don't add much when building in parallel, but, looking at the above numbers, I wonder whether they'd add up to enough when built in series to make the two x64 figures match.)
Not even a bad little gaming machine on the rare occasion
Those Panther Lake comparisons pit the top-end PTL against the base M-series chip. If each were compared against its equivalent SKU, PTL would be even further behind.
This was all mentioned in the article.
See the chart here for what the Intel SKUs are: https://www.pcworld.com/article/3023938/intels-core-ultra-se...
They consume more power at the chip level. You can see this in Intel's spec sheets: the base recommended power envelope of PTL is the maximum power envelope of the M5. They're completely different tiers. You're comparing a 25-85W chip to a 5-25W chip.
They also only win on multi-core, whether that's CPU or GPU. If they were fairly compared to the correct SoC (an M4 Pro), they'd come out behind on both multi-core CPU and GPU.
This was all mentioned in my comment addressing the article. This is the trick Apple's competitors are using: comparing across SKU ranges to grab the headlines. PTL is a strong chip, no doubt, but it's still behind Apple across all the metrics in a like-for-like comparison.