It also looks like there's a slight difference in the unwanted effect both companies have reported, despite the bug being seemingly triggered the same way (mov touching the high byte):
- Oodle reports that a low byte is occasionally stored in the intended location.
- Mozilla's fix suggests that a full 16-bit value is stored instead, corrupting an adjacent variable! This could have much more serious consequences.
Technically, this could still be the same exact bug. I found no mention of the order the output buffer was accessed in by the Huffman decoder debugged in the Oodle report, and, since it was a contiguous buffer, it's easy to mistake an occasional out-of-bounds copy there for a copy from a wrong location. But if both analyses are correct, the behavior of high byte accesses on Raptor Lake is way less predictable than those fixes suggest. Haven't managed to find an official erratum from Intel.
"Microcode and BIOS code requesting elevated core voltages which can cause Vmin shift especially during periods of idle and/or light activity" (emphasis mine)
https://community.intel.com/t5/Blogs/Tech-Innovation/Client/...
Recall also that "Vmin shift" means "the minimum voltage the processor needs to run correctly goes up" so if the issue isn't addressed, that level of undervolt may stop working
When you degrade the CPU naturally needs higher voltages to be stable, until the point where it just breaks completely and no amount of voltage it help it. But if your CPU doesn't degrade because it hasn't been overdoing it on voltages then there'll be no issues for Vmin to shift.
As an anecdotal experience from someone I know that runs these in prod for game servers, limiting the CPU to 80°C and 1.4V-1.45V, 400A has been keeping them alive for years doing 24/7 loads. Maybe a bit lower on the voltage if one wants to be sure longer term, as they are fine with just mass RMAing these. There's also large amount of differences in the silicon quality between samples that can make one run cool and completely fine even at the old stock settings, and an another sample that'll have to pull say 1.5x the power for the same load and clocks having it degrade.
Vmin will creep up, and the headroom for undervolting will degrade. It will affect the high clocks first (they demand the highest voltage), which is why dropping the max boost multiplier a step or two can also work around it (at the cost of basically downgrading it to a cheaper processor)
https://www.pugetsystems.com/blog/2024/08/02/puget-systems-p...