undefined

points

[-]

I'm pretty sure it should be possible to distill HS-TasNet into a version approximate and fast enough for the purpose of animating LEDs.

At the end it's "just" chunking streamed audio into windows and predicting which LEDs a window should activate. One can build a complex non-realtime pipeline, generate high-quality training data with it, and then train a much smaller model (maybe even an MLP) with it to predict just this task.