MMX (Pentium MMX, 1997) sucked badly, because its design prioritized ease of implementation over usefulness. SSE (Pentium III, 1999) was much worse than Motorola's AltiVec, which launched at the same time, and AVX (Sandy Bridge, 2011) was much worse than the Larrabee New Instructions, which were developed at the same time. That is despite the fact that Sandy Bridge was developed by the A-team, while Larrabee was developed by the C- or D-team, which had however hired competent consultants from outside Intel, experienced in programming games and graphics applications.
AVX-512 is for now better than any competing vector ISA, both in achievable energy efficiency and in achievable performance. Obviously, some future AArch64 (Arm) or even RISC-V CPUs could change this by implementing wider registers and execution units and by adding any missing operations.
The SME ISA extension (Scalable Matrix Extension), which is available in the latest Apple CPUs and in the current 2026 generation of Arm C1 CPUs, has the potential to be more efficient than AVX-512, exploiting the fact that the current Intel AMX ISA is intended only for ML/AI and not also for general-purpose computing. Nonetheless, this may happen only in the rather distant future, because neither Apple nor Qualcomm nor Arm seems interested in making products suited to the needs of technical and scientific computing, the way Intel and AMD are. Because of that, in existing CPUs with SME the ratio of SME execution units to general-purpose CPU cores is low, resulting in low total throughput.
The latter could at least be solved with some community effort, although the relevant set of instructions is quite large. It's also not specific to AVX-512. Any comparable vector ISA faces the same challenge.
Initial work on this was started by an engineer at Intel. She was based in St Petersburg, so that work stalled in 2022. Here is the bugzilla item: https://bugs.kde.org/show_bug.cgi?id=383010. The other big issue is that we don't have enough people working on Valgrind who are experts with the virtual CPU. There are a couple of guys working on s390, and a little bit of work is being done reusing amd64 sse4 support on x86. I dabble a little bit on arm64.
If there are any AVX512 experts who would like to help with this, it would be most welcome.
They dropped the idea of having AVX10 variants that don't support the full thing, and as of Nova Lake even the E cores will have it. Is there a significant risk it doesn't get into all products starting soon?
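Until it really is in every product, code still has to check at runtime and fall back. A minimal sketch of that check, assuming Linux on x86, where `/proc/cpuinfo` has a `flags` line listing names such as `avx2` and `avx512f`. The function names and the three kernel labels are made up for illustration, and it deliberately does not look for AVX10, since I am not sure which flag names those CPUs report:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux reports features on a line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line (non-x86 system or empty input).
    return set()


def pick_kernel(cpuinfo_text):
    """Hypothetical dispatcher: name the widest code path the CPU supports."""
    flags = cpu_flags(cpuinfo_text)
    if "avx512f" in flags:   # AVX-512 Foundation
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(pick_kernel(f.read()))
    except OSError:
        # /proc/cpuinfo is not available on this system.
        print(pick_kernel(""))
```

In a real program the check would normally live in the compiled code itself (for example via the compiler's CPU-feature builtins); the text-parsing version above is just the easiest one to try out.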
Literal man-millennia have been poured into writing software for both x86 and ARM, and nobody seems close to designing a competitive RISC-V chip.