Quite a few architectures have a dedicated 0 register.
reply
Yep. The XOR trick, which relies on a special use of an opcode rather than on a special register, is probably related to the limited number of (general-purpose) registers in typical 1970s-era CPU designs (8080, 6502, Z80, 8086).
reply
Unfortunately, the 6502 can't XOR the accumulator with itself. I don't recall if the Z80 can, and loading an immediate 0 would be most efficient on those anyway.
reply
XOR A absolutely works on the Z80, and it is of course faster and shorter than loading a zero with LD A,0: LD A,0 encodes to 2 bytes, while XOR A is a single-byte opcode. XOR A has the additional benefit of also resetting the carry, N and H flags (Z and P/V end up set, since the result is zero). SUB A will also clear the accumulator, but it always sets the N flag on the Z80.
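
For anyone who wants to poke at the flag behaviour, here is a rough Python sketch of how the three idioms leave the F register. It is a simplified model, not an emulator: the function names are made up for illustration, and the undocumented F bits 3 and 5 are ignored.

```python
# Simplified model of three Z80 ways to zero A and their effect on F.
# Bit positions follow the documented Z80 F register layout.
FLAG_C, FLAG_N, FLAG_PV, FLAG_H, FLAG_Z, FLAG_S = 0x01, 0x02, 0x04, 0x10, 0x40, 0x80


def parity_even(value):
    """True when the low 8 bits contain an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def xor_a(a, f):
    """XOR A: C, N and H are reset; S, Z and P/V (parity) follow the result."""
    result = (a ^ a) & 0xFF
    flags = 0
    if result == 0:
        flags |= FLAG_Z
    if result & 0x80:
        flags |= FLAG_S
    if parity_even(result):
        flags |= FLAG_PV
    return result, flags


def sub_a(a, f):
    """SUB A: N is always set; a - a never borrows or overflows, so only Z joins it."""
    result = (a - a) & 0xFF
    flags = FLAG_N
    if result == 0:
        flags |= FLAG_Z
    return result, flags


def ld_a_0(a, f):
    """LD A,0: loads zero and leaves every flag untouched."""
    return 0, f
```

Calling `xor_a(0x5A, 0xFF)` in this model gives A = 0 with only Z and P/V set, `sub_a` gives A = 0 with Z and N set, and `ld_a_0` keeps whatever flags were there before.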
reply
The Z80 can do either LD A,0 or SUB A or XOR A, but the LD is slower due to the extra memory cycle to load the second byte of the instruction.
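
To put numbers on that, a small Python table of the three encodings, using the opcode bytes and T-state counts from the standard Z80 instruction tables. The dictionary and helper names are just for illustration.

```python
# Opcode bytes and T-state counts for three ways to zero A on the Z80.
ZEROING_IDIOMS = {
    "LD A,0": {"encoding": bytes([0x3E, 0x00]), "t_states": 7},  # opcode + immediate byte
    "XOR A": {"encoding": bytes([0xAF]), "t_states": 4},
    "SUB A": {"encoding": bytes([0x97]), "t_states": 4},
}


def cost(name):
    """Return (T-states, size in bytes) for the named idiom."""
    entry = ZEROING_IDIOMS[name]
    return entry["t_states"], len(entry["encoding"])


def cheapest():
    """Names of the idioms with the lowest (T-states, size) cost."""
    best = min(cost(name) for name in ZEROING_IDIOMS)
    return sorted(name for name in ZEROING_IDIOMS if cost(name) == best)
```

The extra 3 T-states of LD A,0 are the memory read for the immediate byte; XOR A and SUB A tie on size and speed and differ only in the flags they leave behind.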
reply
And [as mentioned in the article] even modern x86 implementations have a zero register. So you have this weird special opcode that (when used with identical source and destination) only triggers register renaming.
reply
A move on SPARC is technically an OR of the source with the zero register: "mov %l0, %l1" is assembled as "or %g0, %l0, %l1". So if you want to zero a register, you OR %g0 with itself and write the result to that register.
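
A toy Python model of the idea, in case it helps. The class and function names are hypothetical, and it assumes 32-bit registers and a flat file of 32 registers (no register windows), with r0 as %g0 and r16/r17 as %l0/%l1.

```python
# Toy register file with a hard-wired zero register, SPARC-style.
G0, L0, L1 = 0, 16, 17  # %g0 is r0; the locals %l0 and %l1 are r16 and r17


class RegisterFile:
    def __init__(self, count=32):
        self._regs = [0] * count

    def read(self, index):
        # Register 0 always reads as zero.
        return 0 if index == G0 else self._regs[index]

    def write(self, index, value):
        # Writes to register 0 are silently discarded.
        if index != G0:
            self._regs[index] = value & 0xFFFFFFFF


def op_or(regs, rs1, rs2, rd):
    """or rs1, rs2, rd"""
    regs.write(rd, regs.read(rs1) | regs.read(rs2))


def mov(regs, rs, rd):
    """mov rs, rd expressed as or %g0, rs, rd"""
    op_or(regs, G0, rs, rd)


def clr(regs, rd):
    """Zero a register: or %g0, %g0, rd"""
    op_or(regs, G0, G0, rd)
```

With a zero register wired in like this, neither a move nor a clear needs its own opcode; both fall out of the ordinary OR.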
reply
Indeed!!

MIPS - $zero

RISC-V - x0

SPARC - %g0

ARM64 - XZR

reply
PowerPC: "r0 occasionally" (with certain instructions like addi, though this might be better considered an edge case of encoding)
reply
Indeed. RISC-V, for instance. Also, AFAIK, XORing is faster. I would assume that someone like Mr. Raymond would know…
reply
Which part of "mathematical operations don’t reset the NaT bit" did you not understand?
reply
> afaik, xor’ing is faster

Even tiny tiny CPUs can do sub in one cycle, so I doubt that. On super-scalar CPUs xor and sub are normally issued to the same execution units so it wouldn't make a difference there either.

reply
On superscalar CPUs, running the XOR trick as is would be significantly slower, because it implies a data dependency on the register's old value where there isn't really one. But all out-of-order x86 CPUs optimize it away internally.
reply