None of this is true. There are standard weighting curves for the frequency response of human hearing (A-weighting, for example), and you can use them to compare sound A’s perceived volume to sound B’s. And since most lossy audio codecs store their data in the frequency domain (MDCT coefficients, a DCT variant), you can calculate those numbers very quickly with something similar to sum(vol(f) * curve(f) for f in encoded_frequencies).
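To make that one-liner concrete, here's a rough sketch in Python. It assumes you've already pulled (frequency, power) pairs out of the decoded frames, and it uses the A-weighting curve as the hearing curve; the function names are mine, not from any codec library. Real broadcast loudness meters use a different weighting plus gating, so treat this as an illustration of the idea, not a compliant meter.

```python
import math

def a_weighting_gain(freq_hz: float) -> float:
    """Linear amplitude gain of the A-weighting curve (about 1.0 at 1 kHz)."""
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # 10 ** (2.0 / 20) is the usual +2.00 dB offset that puts 1 kHz at 0 dB.
    return (num / den) * 10 ** (2.0 / 20)

def weighted_power(bins):
    """Sum of power * curve over frequency bins.

    bins: iterable of (frequency_hz, power) pairs (hypothetical input format).
    The amplitude gain is squared because it is applied to power values.
    """
    return sum(power * a_weighting_gain(f) ** 2 for f, power in bins)

def loudness_difference_db(bins_a, bins_b):
    """How much louder A is than B, in dB, after perceptual weighting."""
    return 10 * math.log10(weighted_power(bins_a) / weighted_power(bins_b))
```

The same energy at 100 Hz counts for far less than at 1 kHz, which is the whole point of weighting by the curve before comparing.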

I read the article. It specifically talks about server-side ad embedding, i.e. where the service is inserting ad content into the streams, and therefore, by definition, has access to the ad content. They can do the calculations on their end during the embedding process and normalize volumes there before transmitting the result. To make things even easier, they don’t have to calculate the ad volume each time one’s streamed, just once per ad they’re going to serve.
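The "once per ad" part is just a cache. A minimal sketch, assuming some loudness-measuring function is supplied by the caller and a target level picked by the service (the class name, the measuring callback, and the numbers in the usage note are all hypothetical):

```python
class AdGainCache:
    """Measures each ad once and remembers the gain needed to hit the target."""

    def __init__(self, measure_loudness_db, target_db):
        self._measure = measure_loudness_db  # caller-supplied: audio -> dB
        self._target_db = target_db
        self._gains = {}                     # ad_id -> linear amplitude gain
        self.measurements = 0                # how many times we actually measured

    def gain_for(self, ad_id, ad_audio):
        if ad_id not in self._gains:
            measured_db = self._measure(ad_audio)
            self.measurements += 1
            # Convert the dB difference to a linear amplitude multiplier.
            self._gains[ad_id] = 10 ** ((self._target_db - measured_db) / 20)
        return self._gains[ad_id]
```

If an ad measures 10 dB above the target, gain_for returns roughly 0.316 the first time and the stored value on every later stream, so the measurement cost is paid once at ingest rather than per viewer.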

And finally, all of this is a solved problem for TV broadcasters. They face the same problems: advertisers send them content to air, then the broadcasters are legally required to normalize the ad vs content volume, and they do. If this is an insurmountable problem that the streaming services face, they can drive over to their nearest TV station and ask them how they manage to pull off this technological feat.
