
by XCSme 6 hours ago

by cootsnuck 6 hours ago

The error rate for human transcription can be as high as 5%.

by qingcharles 56 minutes ago

I did transcription for a while in 2021. It is absurdly hard, especially as these days humans only get the difficult jobs that AI has already taken a stab at.

The hardest one I did was for a sports network: a motocross event where most of what you could hear was the roar of the bikes. There were two commentators I had to transcribe over the top of that mess, and they were using insider slang nicknames for all the riders, not their published names, so I had to sit and Google forums to find the riders' names while I was listening. I'm not sure how these local models would be able to handle that insanity at all, because they almost certainly lack enough domain knowledge.

by XCSme 6 hours ago

Oh wow, I thought humans had something like a 0.1% error rate, if they are native speakers and familiar with the subject being discussed.

by zipy124 5 hours ago

I was skeptical upon hearing the figure, but various sources do indeed back it up, and [0] is a pretty interesting paper (old but still relevant, since human transcribers haven't changed in accuracy).

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...

by XCSme 5 hours ago

I think it's actually hard to verify how correct a transcription is, at scale. Curious where those error rate numbers come from, because they should test it on people actually doing their job.
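For anyone wondering how such a number is usually computed: the thread doesn't say which metric is behind the 5% figure, but the common one for transcription is word error rate (WER), i.e. the word-level edit distance between a trusted reference transcript and the transcript being scored, divided by the number of reference words. Below is a minimal sketch under that assumption. The function name `word_error_rate` and the lowercase/whitespace normalization are illustrative choices of mine, not taken from any source cited here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here is deliberately crude (lowercase + whitespace split);
    real evaluations usually also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the two commentators used nicknames for the riders"
    hypothesis = "the two commentators used nicknames for riders"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Note that this still needs a trusted reference transcript to score against, which is exactly the part that is hard to get at scale.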

by rhdunn 3 hours ago

It can depend a lot on different factors like:

- familiarity with the accent and/or speaker;

- speed and style/cadence of the speech;

- any other audio playing at the same time that can muffle or distort the speech;

- etc.

It can also take multiple passes to get a decent transcription.

by qingcharles 55 minutes ago

You missed a giant factor: domain knowledge. Transcribing something outside of your knowledge realm is very hard. I posted above about transcribing the commentary of a motorbike race where the commentators only used the slang names of the riders.

by Nimitz14 3 hours ago

Most of these errors will not be meaningful. Real speech is full of ambiguities. 3% is low.