upvote
And helps with SMT

Edit: this is apparently not the case, see @tliltocatl's comment down the thread

reply
What's SMT in this context?
reply
Simultaneous Multi-Threading (hyper-threading as Intel calls it). I'm not a cpu guy, but I think the ALU used for subtraction would be a more valuable resource to leave available to the other thread than whatever implements a xor. Hence you prefer to use the xor for zeroing and conserve the ALU for other threads to use.
reply
I don't think that's how it works.

- Normally ALU implements all "light" operations (i. e. add/sub/and/or/xor) in a single block, separating them would result in far more interconnect overhead. Often, CPUs have specialized adder-only units for address generation, but never a xor-specialized block.

- All CPUs that implement hyper-threading also optimize a XOR EAX,EAX into MOV EAX,ZERO/SET FLAGS (where ZERO is an invisible zero register just like on Itanium and RISCs). This helps register renaming and eliminates a spurious dependency.

- The XOR trick is about as old as 8086 if not older.

reply
By the time you get to a CPU complex enough to be to have SMT it is likely to detect these “clear register” patterns and special case them.

XOR would also be handled by the ALU, the L is for logic.

reply
Most CPU use the same ALU for xor and sub.
reply
XOR appears a lot in any code touching encryption.

PS. What is static vs dynamic count?

reply
Static count - how many times an instruction appears in a binary (or assembly source).

Dynamic count - how many times an opcode gets executed.

I. e. an instruction that doesn't appear often in code, but comes up in some hot loops (like encryption) would have low static and high dynamic.

reply