Then you have to classify them as 0's or 1's. Each is visually distinct, a 1 being encoded by the presence of a transistor and a gap in the polysilicon. We didn't have to guess which is which is by the nature of Intel microcode we could assume 0's were much more frequent, so a transistor meant a 1.
There are some automatic tools designed to perform this work via color thresholding, but they didn't work very well here because some of the mosaic was blurry, and a lot of dust had crept in which created false 1 bits.
Instead, we trained a convolutional neural network to classify the extracted bit regions into 0's and 1's. This was overlaid back onto the original mosaic as white or black squares at 50% opacity.
Then we spent several long, tedious days just checking the results for errors. Finally we had the raw 2d array of bits - the next step is to extract the microcode words from the bit array.
> The photo above shows part of the microcode ROM. Under a microscope, the contents of the microcode ROM are visible, and the bits can be read out, based on the presence or absence of transistors in each position.
[1]: https://www.righto.com/2020/06/a-look-at-die-of-8086-process...
http://brianluft.com/images/2026/05/386_microcode_bits.jpg -- my fully annotated result. I was working from a higher-quality PNG; this is highly compressed because it's a big image.
More modern devices are of course more difficult due to layers, feature size, and less visually obvious ROM bit designs.
Anyway, the impressive part of this project was really understanding the undocumented microcode assembly language through inference and trace following; the 1s and 0s look like they were the easy part!
1. Extract the ROM bits. 2. Determine physical-to-logical bit ordering. 3. Identify microinstruction boundaries. 4. Infer field boundaries. 5. Associate fields with hardware destinations (check with die tracing). 6. Decode instruction-dispatch programmable logic arrays. 7. Associate x86 instructions with microcode entry points. 8. Infer repeated idioms: moves, ALU ops, termination, calls, tests, redirects. 9. Decode accelerator protocols. 10. Validate against known architectural behavior.