This is just trash, like almost any AI benchmark. E.g. it says speech recognition has been above human level since around 2015, yet any speech input today still makes more errors than any human would.
If I spoke this comment instead of typing it, maybe 2 to 5 words would come out wrong. For a human listener it would be maybe 10% of that.