I don't know how it's implemented by the standard/compiler (not my domain). The performance differences are well documented though.
I've used both in my pathing code and tested each in debug/release.
Even if the std:: implementation were as fast as possible, you're still adding bit manipulation on top of accessing the element, so each access will be slower no matter what you do.
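
Roughly what I mean, as a minimal sketch. This is not how any particular standard library does it. `PackedFlags` and `ByteFlags` are made-up names for illustration, and the 64-bit word size is my assumption. The point is only that the packed version has to shift and mask after loading the word, while the byte version just loads the element.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-packed flag array: 64 flags stored per 64-bit word.
struct PackedFlags {
    std::vector<std::uint64_t> words;

    explicit PackedFlags(std::size_t n) : words((n + 63) / 64, 0) {}

    bool get(std::size_t i) const {
        // Load the containing word, then shift and mask to isolate one bit.
        return ((words[i / 64] >> (i % 64)) & 1u) != 0;
    }

    void set(std::size_t i, bool v) {
        // Read-modify-write: build a mask, then OR it in or AND it out.
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (v) {
            words[i / 64] |= mask;
        } else {
            words[i / 64] &= ~mask;
        }
    }
};

// Hypothetical one-byte-per-flag array: no bit manipulation on access.
struct ByteFlags {
    std::vector<unsigned char> bytes;

    explicit ByteFlags(std::size_t n) : bytes(n, 0) {}

    bool get(std::size_t i) const { return bytes[i] != 0; }

    void set(std::size_t i, bool v) { bytes[i] = v ? 1 : 0; }
};
```

Both types give the same answers for the same sequence of `set`/`get` calls. The packed one uses about an eighth of the memory and pays for it with the extra index math, shift, and mask on every access.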