upvote
GOOG-411 was "competing" with a strong company (1-800-FREE411) by serving no ads in a category worth ~$3.5B at the time. It was inexplicable at the time, but they did this to get voice samples, way back when. For reasons like that, I expect that this category of training is baked — but I don't have current domain knowledge fwiw.
reply
It's already there. And keeps moving.

Even have a nice UI on top.

https://voicebox.sh/

reply
Not really, Mozilla Common Voice (the ImageNet of speech) is larger than this. Their English database has 3814 hours, 1.6 million sentences, from 100k speakers.

https://commonvoice.mozilla.org/en/languages

reply
Yep, the silence around provenance is probably the most suspicious part
reply