They also haven't tried to write a high-performance kernel for Triton yet. If it goes the way my last experiment with Taylor did, they're in for some bad news.
I'm just a hobbyist though, so it's certainly possible that people with more time or resources could outperform me without much effort. I just want to see it tested on something familiar and benchmarkable.