I keep this link in my favorites and refer to it every now and again. Still one of the best write-ups I've seen on just how vast the difference is between a naive and a well-tuned kernel:

https://siboehm.com/articles/22/CUDA-MMM

This is an amazing article, thanks for sharing! Will also bookmark it.

> Getting peak performance out of a GPU is much more complex than it is with a CPU.

I don’t really think it’s significantly more challenging.
