SME (like AMX) is easier in this regard because there is a clear expectation that it is used only in dedicated code blocks, so run-time dispatch becomes feasible. In contrast, with auto-vectorization, general-purpose vector ISAs such as AVX-512 and SVE tend to get used all over the place.
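To make the dispatch point concrete, here is a rough sketch in C of what I mean by a dedicated block behind a run-time check. Everything named here is made up for illustration: `cpu_has_matrix_ext` is a hypothetical stub (a real probe would ask the OS or CPU), and `matmul_matrix_ext` is only a placeholder that falls back to the scalar loop so the sketch stays runnable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature probe (assumption, not a real API): a real one
   would query the OS or CPU. This stub always reports "absent". */
static int cpu_has_matrix_ext(void) { return 0; }

typedef void (*matmul_fn)(const float *a, const float *b, float *c, size_t n);

/* Portable fallback: c = a * b for n x n row-major matrices. */
static void matmul_scalar(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Placeholder for the dedicated block that would contain the matrix
   instructions. It delegates to the scalar loop in this sketch. */
static void matmul_matrix_ext(const float *a, const float *b, float *c, size_t n) {
    matmul_scalar(a, b, c, n);
}

/* The only place that needs to know about the extension. */
static matmul_fn select_matmul(void) {
    return cpu_has_matrix_ext() ? matmul_matrix_ext : matmul_scalar;
}

/* Public entry point: picks an implementation on first use, then reuses it. */
void matmul(const float *a, const float *b, float *c, size_t n) {
    static matmul_fn impl = NULL;
    assert(a && b && c);
    if (!impl) {
        impl = select_matmul();
    }
    impl(a, b, c, n);
}
```

The extension-specific code sits behind one function pointer, so the rest of the program can be compiled for the baseline ISA. With auto-vectorization there is no such single choke point, since the compiler may emit the wider instructions anywhere.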