by Sweepi, 6 hours ago

by shawn_w, 5 hours ago

Quite a few architectures have a dedicated 0 register.

by repelsteeltje, 5 hours ago

Yep. The XOR trick, which relies on a special use of an opcode rather than a special register, is probably related to the limited number of (general-purpose) registers in typical 1970s-era CPU designs (8080, 6502, Z80, 8086).

by classichasclass, 53 minutes ago

Unfortunately, the 6502 can't XOR the accumulator with itself. I don't recall if the Z80 can, and loading an immediate 0 would be most efficient on those anyway.

by blywi, 34 minutes ago

XOR A absolutely works on the Z80, and it's both faster and shorter than loading a zero with LD A,0: LD A,0 encodes to 2 bytes (7 T-states), while XOR A is a single-byte opcode (4 T-states). XOR A has the additional benefit of also clearing the carry flag, along with S, H and N; Z and P/V end up set, because the result is zero and zero has even parity. SUB A will also clear the accumulator, but it always sets the N flag on the Z80.
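For anyone who wants to check the flag behaviour without a datasheet to hand, here is a rough Python model of the three ways to zero A. It only tracks the documented flag bits and assumes the flag rules and byte/T-state counts from the Zilog manual; the function names and the `COST` table are made up for illustration.

```python
# Documented flag bits in the Z80 F register.
S, Z, H, PV, N, C = 0x80, 0x40, 0x10, 0x04, 0x02, 0x01


def parity_even(value):
    """True when the low 8 bits contain an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def xor_a(a):
    """XOR A: logical op, so H, N and C are reset and P/V reports parity."""
    result = a ^ a
    flags = 0
    if result & 0x80:
        flags |= S
    if result == 0:
        flags |= Z
    if parity_even(result):
        flags |= PV
    return result, flags


def sub_a(a):
    """SUB A: arithmetic op, so N is set; a - a never borrows or overflows."""
    result = (a - a) & 0xFF
    flags = N
    if result == 0:
        flags |= Z
    return result, flags


def ld_a_0(f):
    """LD A,0: loads the immediate and leaves F untouched."""
    return 0, f


# (bytes, T-states) per instruction, as assumed above.
COST = {"XOR A": (1, 4), "SUB A": (1, 4), "LD A,0": (2, 7)}
```

So in this model XOR A leaves F with only Z and P/V set, SUB A leaves Z and N set, and LD A,0 keeps whatever flags were there before.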

by bonzini, 37 minutes ago

The Z80 can do either LD A,0 or SUB A or XOR A, but the LD is slower due to the extra memory cycle to load the second byte of the instruction.

by wongarsu, 36 minutes ago

And [as mentioned in the article] even modern x86 implementations have a zero register. So you have this weird special opcode that (when used with identical source and destination) only triggers register renaming.

by bonzini, 54 minutes ago

A move on SPARC is technically an OR of the source with the zero register: "mov %l0, %l1" is assembled as "or %g0, %l0, %l1". So if you want to zero a register, you OR %g0 with itself into that register, e.g. "or %g0, %g0, %l1", which is what the synthetic "clr %l1" expands to.
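A small Python sketch of that idea, in case it helps: a register file where %g0 always reads as zero and ignores writes, with mov and clr defined purely in terms of OR. The class and function names are invented for this example, register windows are ignored, and 32-bit registers are assumed.

```python
class SparcRegs:
    """Toy flat register file: %g0-%g7, %o0-%o7, %l0-%l7, %i0-%i7."""

    NAMES = [f"%{bank}{i}" for bank in "goli" for i in range(8)]

    def __init__(self):
        self._values = {name: 0 for name in self.NAMES}

    def read(self, name):
        # %g0 is hardwired to zero.
        return 0 if name == "%g0" else self._values[name]

    def write(self, name, value):
        # Writes to %g0 are silently discarded.
        if name != "%g0":
            self._values[name] = value & 0xFFFFFFFF


def or_(regs, rs1, rs2, rd):
    """or rs1, rs2, rd"""
    regs.write(rd, regs.read(rs1) | regs.read(rs2))


def mov(regs, src, dst):
    """Synthetic mov src, dst  ->  or %g0, src, dst"""
    or_(regs, "%g0", src, dst)


def clr(regs, dst):
    """Synthetic clr dst  ->  or %g0, %g0, dst"""
    or_(regs, "%g0", "%g0", dst)
```

Neither mov nor clr needs an opcode of its own here; the zero register does all the work.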

by lynguist, 5 hours ago

Indeed!!

MIPS - $zero

RISC-V - x0

SPARC - %g0

ARM64 - XZR

by classichasclass, 56 minutes ago

PowerPC: "r0 occasionally" (with certain instructions like addi, though this might be better considered an edge case of encoding)
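A minimal Python sketch of what "occasionally" means, assuming the usual addi definition where an rA field of 0 means the literal value 0 rather than the contents of r0. The function names and the 32-bit register width are choices made for this example.

```python
MASK32 = 0xFFFFFFFF


def addi(gpr, rd, ra, simm):
    """addi rD, rA, SIMM: rA == 0 selects the constant 0, not GPR[0]."""
    base = 0 if ra == 0 else gpr[ra]
    gpr[rd] = (base + simm) & MASK32


def add(gpr, rd, ra, rb):
    """add rD, rA, rB: here r0 is an ordinary register."""
    gpr[rd] = (gpr[ra] + gpr[rb]) & MASK32


gpr = [0] * 32
gpr[0] = 123          # r0 holds a real value
addi(gpr, 3, 0, 5)    # behaves like "li r3, 5"; r0's contents are ignored
add(gpr, 4, 0, 3)     # r0's contents are used: 123 + 5
```

That is why r0 only acts as a zero register for the instructions whose encoding treats the field that way.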

by signa11, 5 hours ago

indeed. riscv for instance. also, afaik, xor’ing is faster. i would assume that someone like mr. raymond would know…

by pif, 5 hours ago

Which part of "mathematical operations don’t reset the NaT bit" did you not understand?

by IshKebab, 5 hours ago

> afaik, xor’ing is faster

Even tiny tiny CPUs can do sub in one cycle, so I doubt that. On superscalar CPUs, xor and sub are normally issued to the same execution units, so it wouldn't make a difference there either.

by tliltocatl, 5 hours ago

On superscalar CPUs, running the xor trick as-is would be significantly slower because it implies a data dependency where there isn't one. But all out-of-order x86 CPUs optimize it away internally.
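To make the dependency point concrete, here is a toy Python model, not a description of how any real renamer is built. It just answers "which registers does this two-operand instruction have to wait on?", with and without zero-idiom recognition. The function name and the flag are hypothetical.

```python
def source_deps(op, dst, src, recognize_zero_idiom=True):
    """Registers a two-operand x86-style instruction (dst = dst OP src) waits on."""
    if recognize_zero_idiom and op in ("xor", "sub") and dst == src:
        # The result is always 0, so there are no real inputs to wait for.
        return set()
    # Taken literally, both operands are inputs, even when they are the same register.
    return {dst, src}


# Literal reading: xor eax, eax must wait for the previous writer of eax.
naive = source_deps("xor", "eax", "eax", recognize_zero_idiom=False)

# With idiom recognition: no inputs, so the instruction is independent.
smart = source_deps("xor", "eax", "eax")
```

In the literal reading the instruction is chained to whatever last wrote the register; with the idiom recognized, that chain is broken.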