To be fair, the culprit in the article is _less complex_ than branch prediction: "with random data, bits are flipped often, and bit flips in transistors inherently draw power" is less mental gymnastics than "with random data, the cpu fails to predict the future, causing redundant speculative execution"
I expected a “torch is smart enough to keep track of cases where it just initialized the C in C <= A*B+C to zero, avoiding the add” type situation but I was wrong.