(For what little it’s worth, and in the spirit of the aforementioned curiosity: nausea gives you ad nauseam; with some caveats, a Latin noun in the singular governed by the preposition ad gets the ending -m while retaining the final vowel of its stem.)
Actual learning requires you to think about what you just read, maybe re-read it multiple times, stop to try to solve an example problem, etc. - all of which require you to stop or rewind, which video inherently disfavors.
Besides, I think that, just as handwritten notes might have a slightly different neurological effect than typed ones, reading might be a very different mental muscle, one more connected to comprehension; humans had oral language for much longer than any script, so maybe it came with some different connections to higher brain structures as well.
That's not exactly neutral, though; it's part of a larger theme of regression from literacy to a visual and oral culture (and a dopamine-seeking junkie one).
I don't find that audio multitasks so easily, unless we're using different definitions. My example: I find it very difficult to do something described in an audio or video format - rewiring a light switch, say. I find it far easier to have text with a diagram. I can stop and check the text at any time, and I find it easier to go back to previous sentences than to rewind audio or video.