MMX was what they could do at the time without adding a lot of new registers. It still had its uses. 3DNow! made MMX semi-decent on AMD CPUs. Of course SSE was superior, but early SIMD was all about compromises.
SME (like AMX) is easier in this regard because there is a clear expectation that it is used in dedicated code blocks only, so run-time dispatch becomes feasible. In contrast, with auto-vectorization, general-purpose vector ISAs such as AVX-512 and SVE tend to get used all over the place.
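To make the run-time dispatch point concrete, here is a rough sketch of the pattern in C. Everything named here (`dot`, `dot_scalar`, `dot_accel`, `have_accel`) is made up for illustration. `dot_accel` is only a placeholder that forwards to the scalar loop where a real AMX or SME kernel would go, and the `avx512f` check via the GCC/Clang builtin is just a stand-in for whatever feature test the real extension needs.

```c
#include <stddef.h>

/* Portable fallback kernel: always available. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Hypothetical accelerated kernel. A real one would live in its own
 * dedicated code block built for the extension; this placeholder just
 * forwards to the scalar loop so the sketch runs anywhere. */
static float dot_accel(const float *a, const float *b, size_t n) {
    return dot_scalar(a, b, n);
}

/* Stand-in feature test (assumption: GCC or Clang on x86).
 * On other targets it reports "not available". */
static int have_accel(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return 0;
#endif
}

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Single entry point: pick the kernel on first use, then reuse it. */
float dot(const float *a, const float *b, size_t n) {
    static dot_fn impl = NULL;
    if (impl == NULL)
        impl = have_accel() ? dot_accel : dot_scalar;
    return impl(a, b, n);
}
```

This only works because the special instructions are confined to one function behind one call site. Once a compiler auto-vectorizes arbitrary loops with AVX-512 or SVE, there is no single place left to put the check.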