upvote
Depending on what you do. If you are doing token generations, compute-dense kernel optimization is less interesting (as, it is memory-bounded) than latency optimizations else where (data transfers, kernel invocations etc). And for these, Mac devices actually have a leg than CUDA kernels (as pretty much Metal shaders pipelines are optimized for latencies (a.k.a. games) while CUDA shaders are not (until cudagraph introduction, and of course there are other issues).
reply
Mac kernels are almost always compute shaders written in Metal. That's the bare-minimum of acceleration, being done in a non-portable proprietary graphics API. It's optimized in the loosest sense of the word, but extremely far from "optimal" relative to CUDA (or hell, even Vulkan Compute).

Most people will not choose Metal if they're picking between the two moats. CUDA is far-and-away the better hardware architecture, not to mention better-supported by the community.

reply