AVX-512, originally called the "Larrabee New Instructions", has been the only decent vector extension of the Intel-AMD ISA: the only one that was coherently planned, instead of being a heap of more or less randomly chosen instructions, each thought to be useful for accelerating some particular benchmark or a certain workload of one of the big customers.

MMX (Pentium MMX, 1997) sucked badly (because its design prioritized ease of implementation over usefulness), SSE (Pentium III, 1999) was much worse than the simultaneously launched Motorola AltiVec, and AVX (Sandy Bridge, 2011) was much worse than the simultaneously developed Larrabee New Instructions. This was despite the fact that Sandy Bridge was developed by the A-team, while Larrabee was developed by the C- or D-team, which, however, had hired competent consultants from outside Intel who were experienced in programming games and graphics applications.

For now, AVX-512 is better than any competing vector ISA, in both achievable energy efficiency and achievable performance. Of course, some future AArch64 (Arm) or even RISC-V CPUs may change this by implementing wider registers and execution units and by adding any missing operations.

The SME ISA extension (Scalable Matrix Extension), which is available in the latest Apple CPUs and in the current 2026 generation of Arm C1 CPUs, has the potential to be more efficient than AVX-512, by exploiting the fact that Intel's current AMX ISA is intended only for ML/AI and not also for general-purpose computing. Nonetheless, this may happen only in the rather distant future, because neither Apple nor Qualcomm nor Arm seems interested in making products suited to the needs of technical and scientific computing, as Intel and AMD do. Because of that, in the existing CPUs with SME the ratio of SME execution units to general-purpose CPU cores is low, resulting in low total throughput.

by vardump 1 hour ago

MMX was what they could do at that time without adding a lot of new registers (the MMX registers are aliased onto the x87 floating-point registers). It still had its uses. 3DNow! made MMX semi-decent on AMD CPUs. Of course SSE was superior, but early SIMD was all about compromises.

by fweimer 6 hours ago

SME (like AMX) is easier in this regard because there is a clear expectation that it is used only in dedicated code blocks, so run-time dispatch becomes feasible. In contrast, with auto-vectorization, general-purpose vector ISAs such as AVX-512 and SVE tend to get used all over the place.
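
To make "run-time dispatch" concrete, here is a minimal C sketch, assuming GCC or Clang on x86-64. Only one small function is compiled for AVX-512F (via the `target` attribute, a GCC/Clang extension), and `__builtin_cpu_supports` (also a GCC/Clang built-in) selects it only when the running CPU reports AVX-512F. The function names `sum_i32`, `sum_i32_scalar` and `sum_i32_avx512` are hypothetical, chosen just for this illustration.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline path: a plain loop compiled for the default x86-64 target. */
static int64_t sum_i32_scalar(const int32_t *v, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* AVX-512F path: only this function is compiled for AVX-512F, via the
   target attribute, so the rest of the binary stays baseline x86-64. */
__attribute__((target("avx512f")))
static int64_t sum_i32_avx512(const int32_t *v, size_t n) {
    __m512i acc = _mm512_setzero_si512();  /* eight 64-bit partial sums */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Load eight int32 values and sign-extend them to int64. */
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(v + i));
        acc = _mm512_add_epi64(acc, _mm512_cvtepi32_epi64(chunk));
    }
    int64_t total = _mm512_reduce_add_epi64(acc);  /* horizontal sum */
    for (; i < n; i++)                              /* leftover elements */
        total += v[i];
    return total;
}

/* Public entry point: picks the implementation at run time based on what
   the executing CPU reports, so one binary runs with or without AVX-512. */
int64_t sum_i32(const int32_t *v, size_t n) {
    if (__builtin_cpu_supports("avx512f"))
        return sum_i32_avx512(v, n);
    return sum_i32_scalar(v, n);
}
```

A caller just uses `sum_i32(data, count)`, and both paths return the same result; only the speed differs between machines. Keeping the ISA-specific code inside one small, clearly delimited function is what makes this kind of guard practical, whereas AVX-512 instructions emitted by the auto-vectorizer across a whole program cannot be fenced off this way.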

by fweimer 6 hours ago

Wide (especially unconditional) use of AVX-512 faces two main issues today: There's no public commitment from Intel to phase out CPUs that don't support it. And some emulation-adjacent tools (the prime example is valgrind) do not support it.

The latter could at least be solved with some community effort, although the relevant set of instructions is quite large. It's also not specific to AVX-512. Any comparable vector ISA faces the same challenge.

by paulf38 5 hours ago

"some community effort" is a huge understatement. Let me rephrase that for you: "Possibly the largest ever single contribution to Valgrind".

Initial work on this was started by an engineer at Intel. She was based in St Petersburg, so that work stalled in 2022. Here is the Bugzilla item: https://bugs.kde.org/show_bug.cgi?id=383010. The other big issue is that we don't have enough people working on Valgrind who are experts on the virtual CPU. There are a couple of guys working on s390, and a little bit of work is being done reusing the amd64 SSE4 support on x86. I dabble a little bit on arm64.

If there are any AVX512 experts that would like to help with this it would be most welcome.

by fweimer 47 minutes ago

I didn't intend to make a statement about the programming effort required. I wanted to contrast it with corporate politics at CPU vendors, from which it is largely decoupled. Given the size of the task, it needs corporate funding, just not from x86 vendors. For example, we're fairly strongly incentivized to provide valgrind support for any potential future x86-64-v4 transition, because our development community really expects valgrind support as part of the core toolchain.

by Dylan16807 5 hours ago

> There's no public commitment from Intel to phase out CPUs that don't support it.

They dropped the idea of having AVX10 variants that don't support the full thing, and as of Nova Lake even the E cores will have it. Is there a significant risk it doesn't get into all products starting soon?

by fweimer 44 minutes ago

Historically, the edge business unit did a bit of its own thing with its CPUs. I'll believe the transition is finally happening once we see AVX10 CPUs over there as well. Until then I'm somewhat skeptical. (To be clear, I have no insight into their roadmaps, precisely because it's so separate.)

by simonask 8 hours ago

It will be literal decades before RISC-V becomes mainstream. Not because it isn’t a perfectly fine ISA, but because business incentive structures are nowhere near supporting it.

Literal man-millennia have been poured into writing software for both x86 and ARM, and nobody seems close to designing a competitive RISC-V chip.

by jcelerier 2 hours ago

The next Intel CPUs will have AVX10.2 and APX.

by camel-cdr 6 hours ago

Porting this optimization to RISC-V Vector is pretty trivial.