I can believe that, but even so it's caused by the type system monomorphising everything. When it use qsort from libc, you are using per-compiled code from a library. When you use slice::sort(), you get custom assembler compiled to suit your application. Thus, there is a lot more code generation going on, and that is caused by the tradeoffs they've made with the type system.
Rusts approach give you all sorts of advantages, like fast code and strong compile time type checking. But it comes with warts too, like fat binaries, and a bug in slice::sort() can't be fixed by just shipping of the std dynamic library, because there is no such library. It's been recompiled, just for you.
FWIW, modern C++ (like boost) that places everything in templates in .h files suffers from the same problem. If Swift suffers from it too, I'd wager it's the same cause.
But not having to is a win, as the monomorphised sorts are just much faster at runtime than having to do an indirect call for each comparison.
I was all excited to conduct the "cargo check; mrustc; cc" is 100x faster experiment, but I think at best, the multiple is going to be pretty small.
There are multiple caveats on providing this to users (we can't assume that macro invocations are idempotent, so the new behavior would have to be opt in, and this only benefits incremental compilation), but it's in our radar.