Anything more competent than Microsoft Sam counts as realistic in my book. If their definition excludes narration, that would be weird.

Their detection might not look at audio right now, though.
