Seems like this could also be used by call centers to adjust their accents in real time. Text is obviously easier to analyze (no real-time processing required), but I imagine audio is not that hard to process in real time.
Calling for home internet support and having the person on the other end ask you, in a US Southern or Boston accent, to "do the needful" could be pretty entertaining :-D