On modern CPUs? Most likely. The core does those kinds of optimizations itself, with no compiler magic needed.

CPU implementations have become too complex to grasp in full. The only sure way to know how a CPU will behave for a given workload is to run the workload. It's good to have some basic expectations for performance (instructions per cycle, memory bandwidth) so you can detect when something is off. I guess I'm trying to say it's hard to keep in your head all the details of what ~1B transistors are doing together to run your code. It's just too big.

Hardware definitely supports this, but it might need compiler support, as in emitting prefetch instructions. The compiler might do that automatically, or it might require a pragma or a call to a builtin. Either way, it can be implemented.
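
For the builtin route, here is a minimal sketch assuming GCC or Clang, both of which provide `__builtin_prefetch`. The function name `sum_with_prefetch` and the `PREFETCH_DISTANCE` value are made up for illustration. The right distance depends on the CPU and the workload, and on a simple linear scan like this the hardware prefetcher may already do the job, so measure before keeping it.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.
   16 is an arbitrary illustrative value, not a recommendation. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that elements further ahead will be read soon.
   The prefetch is only a hint; it never changes the computed result. */
long sum_with_prefetch(const long *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, rw (0 = read), locality (0..3, 3 = keep in cache). */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

For the automatic route, GCC has the `-fprefetch-loop-arrays` option, which asks the compiler to insert prefetches in loops over large arrays on targets that support them.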
The compiler probably does [0].

[0] https://gcc.gnu.org/projects/prefetch.html

That list doesn't include any current mainline processors. It's all Itanium, 3DNow!, and MIPS.
Intel added PREFETCHW to their Broadwell processors launched in 2014, years after AMD dropped all 3DNow! instructions except the prefetch instructions. That timeline strongly suggests that the instructions aren't no-ops and likely are used by some popular software.