This can have a huge effect on a wide range of applications, not just those that use particular CPU features directly. For example, a libc typically ships a separate implementation of `memcpy()` for each set of CPU features.
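To make the `memcpy()` point concrete, here is a rough sketch of what per-feature dispatch can look like. It is not how any particular libc does it (glibc, for instance, resolves the choice at load time via IFUNC rather than on first call). The names `my_copy`, `select_copy`, `copy_generic`, and `copy_avx2_placeholder` are made up for illustration, and the "AVX2" variant just forwards to `memcpy()` instead of containing real vector code. Feature detection assumes GCC or Clang on x86; everywhere else it falls back to the generic version.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant: byte-at-a-time copy that runs on any CPU. */
void *copy_generic(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Hypothetical stand-in for a vectorized variant. A real libc would use
 * AVX2 loads and stores here; this placeholder just calls memcpy(). */
void *copy_avx2_placeholder(void *dst, const void *src, size_t n) {
    return memcpy(dst, src, n);
}

/* Choose a variant based on what the running CPU supports. */
static copy_fn select_copy(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return copy_avx2_placeholder;
    }
#endif
    return copy_generic;
}

static copy_fn resolved_copy = NULL;

/* Public entry point: resolve once on first use, then call through the
 * cached pointer. Note: this lazy initialization is not thread-safe. */
void *my_copy(void *dst, const void *src, size_t n) {
    if (resolved_copy == NULL) {
        resolved_copy = select_copy();
    }
    return resolved_copy(dst, src, n);
}
```

Callers only ever see `my_copy()`, so every program that copies memory picks up the faster variant on hardware that has it, without being compiled for that hardware.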
https://devblogs.microsoft.com/dotnet/performance-improvemen...