Audio Reactive LED Strips Are Diabolically Hard

(scottlawsonbc.com)

161 points

by surprisetalk1 days ago

by doctorhandshake5 hours ago

I like this writeup but I feel like the title doesn't really tell you what it's about ... to me it's about creativity within constraints.

The author finds, as many do, that naive or first-approximation approaches fail within certain constraints and that more complex methods are necessary to achieve simplicity. He finds, as I have, that perceptual and spectral domains are a better space to work in for things that are perceptual and spectral than in the raw data.

What I don't see him get to (might be the next blog post, IDK), is getting into constraints in the use of color - everything is in 'rainbow town' as we say, and it's there that things get chewy.

I'm personally not a fan of emissive green LED light in social spaces. I think it looks terrible and makes people look terrible. Just a personal thing, but putting it into practice with these sorts of systems is challenging as it results in spectral discontinuities and immediately requires the use of more sophisticated color systems.

I'm also about maximum restraint in these systems - if they have flashy tricks, I feel they should do them very very rarely and instead have durational and/or stochastic behavior that keeps a lot in reserve and rewards closer inspection.

I put all this stuff into practice in a permanent audio-reactive LED installation at a food hall/ nightclub in Boulder: https://hardwork.party/rosetta-hall-2019/

by scottlawson5 hours ago

I didn't go into much detail about it but there's a whole rabbit hole of color theory and color models. For example, the spectrum effect assigns different colors to different frequency bins, but also adjusts the assignment over time to avoid a static looking effect. It does this by rotating a "color angle" kind of like the HSL model.
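For anyone curious what a rotating "color angle" could look like in code, here is a minimal sketch. It is not the project's actual implementation: the function name `bin_colors`, the even hue spacing, and the 0.01 turns-per-frame drift are all assumptions made for illustration, using Python's standard `colorsys` module.

```python
import colorsys

def bin_colors(num_bins, angle, lightness=0.5, saturation=1.0):
    """Return one (r, g, b) tuple in 0..255 per frequency bin.

    Hypothetical mapping: bins are spread evenly around the hue wheel,
    and `angle` (in turns, 0.0..1.0) rotates the whole assignment.
    """
    colors = []
    for i in range(num_bins):
        hue = (i / num_bins + angle) % 1.0
        # Note the argument order: colorsys uses HLS, not HSL.
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors

# Advance the angle a little each frame so the bin-to-color mapping drifts
# slowly instead of looking static (drift rate is an arbitrary choice).
angle = 0.0
palettes = []
for frame in range(3):
    palettes.append(bin_colors(num_bins=6, angle=angle))
    angle = (angle + 0.01) % 1.0
```

A slow drift keeps neighbouring bins visually distinct while avoiding the "bass is always red" look.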

I really like your LED installation in Rosetta Hall, it looks beautiful!

by doctorhandshake4 hours ago

Thanks! Great article - would like to read one about the color rabbit hole pls ;)

by PaulHoule5 hours ago

Yeah, "diabolical" overstates it. It isn't a wicked problem

https://en.wikipedia.org/wiki/Wicked_problem

Kinda funny but I am a fan of green LED light to supplement natural light on hot summer days. I can feel the radiant heat from LED lights on my bare skin and since the human eye is most sensitive to green light I feel the most comfortable with my LED strip set to (0,255,0)

by scottlawson5 hours ago

I'd actually argue it has some wicked problem characteristics. The input space is enormous (all possible audio), perception is subjective and nonlinear, and there's no objective function to optimize against, only "does this feel right?". Every solution you try reframes what "good" means. It's not as hard as social planning but is way harder than it sounds, no pun intended.

by jcelerier15 minutes ago

Isn't it the exact same problem as "making a good movie" or "making a good book"? This is just thoroughly subjective.

When the author says:

> Every commercial audio reactive LED strip I've seen does this badly. They use simple volume detection or naive FFTs and call it a day. They don't model human perception on either side, which is why they all look the same.

well no, if they sell, then they are doing just fine until someone comes up with the $next $thing

by PaulHoule4 hours ago

Ever seen https://www.youtube.com/watch?v=oNyXYPhnUIs ? There are a lot of things people might think feel right.

(Note both the scanner in front of KITT and the visual FX on his dashboard when he speaks, which changes from season to season.)

by fragmede4 hours ago

fta: The biggest unsolved problem is making it work well on all kinds of music.

The wickedness comes from wanting something that works just as well for John Summit as the Grateful Dead as Mozart and Bad Bunny.

But it seems like you could cheat for installations where the type of music is known and go from there. The other cheat is to have a "tap" button, and to pull that data and go from there.

mental note: the thought "it can't be that hard" when obviously it is sent me down a rabbit hole for a couple of hours

by WarmWash5 hours ago

The real killer is that humans don't hear frequencies, they hear instruments, which are a stack of frequencies that roughly sometimes correlate with a frequency range.

I wonder if transformer tech is close to achieving real-time audio decoding, where you can split a track into its component instruments and drive the light show off of that. Think those fancy Christmas-time front yard light shows, as opposed to random colors kind of blinking with what may be a beat.

by adzm4 hours ago

real time audio stem separation is already possible, some specific models can even get around 20ms latency (HS-TasNet) https://github.com/lucidrains/HS-TasNet

There was a nice paper with an overview last year too https://arxiv.org/html/2511.13146v1 that introduced RT-STT which is still being tweaked and built upon in the MSS scene

The high-quality ones like MDXNet and Demucs usually have at least several seconds of latency, but for something like displaying visuals high quality is not really needed and the real-time approaches should be fine.

by omneity2 hours ago

I'm pretty sure it should be possible to distill HS-TasNet into a version approximate and fast enough for the purpose of animating LEDs.

At the end it's "just" chunking streamed audio into windows and predicting which LEDs a window should activate. One can build a complex non-realtime pipeline, generate high-quality training data with it, and then train a much smaller model (maybe even an MLP) with it to predict just this task.

by iamjackg6 hours ago

Scott's work is amazing.

Another related project that builds on a similar foundation: https://github.com/ledfx/ledfx

by aleksiy1233 hours ago

Fun, I actually did a similar project during my time at UVic 10 years ago, but it was a hoodie.

https://youtu.be/-LMZxSWGLSQ

I remember thinking really hard about what to do with color. Except, like you say, mine is pretty much a naive FFT.

https://github.com/aleksiy325/PiSpectrumHoodie?tab=readme-ov...

Thanks for reminding me.

by mdrzn8 hours ago

Always been very interested in audio-reactive led strips or led bulbs, I've been using a Windows app to control my LIFX lights for years but lately it hasn't been maintained and it won't connect to my lights anymore.

I tried recreating the app (and I can connect via BT to the lights) but writing the audio-reactive code was the hardest part (and I still haven't managed to figure out a good rule of thumb or something). I mainly use it when listening to EDM or club music, so it's always a classic 4/4 110-130bpm signature, yet it's hard to have the lights react on beat.

by mechsy33 minutes ago

Yeah, in a similar project, getting line passthrough or similar to work (matching sampling frequencies etc.) to get a clean signal for the FFT proved much harder than setting up e.g. the ESP32 side of things. But it’s a lot of fun to play around accumulating values in the frequency buckets while trying to get the reactivity tradeoff right. Just don’t look directly into the LEDs in a dark room, maybe that’s a bit dangerous.

by menno-dot-ai6 hours ago

Woow, this was my first hardware project right around the time it released! I remember stapling a bunch of LED strips around our common room and creating a case for the pi + power supply by drilling a bunch of ventilation + cable holes in a wooden box.

And of course, by the time I got it to work perfectly I never looked at it again. As is tradition.

by scottlawson5 hours ago

That's awesome to hear! Sometimes the journey is the destination. It's a great project to get started with electronics.

by rustyhancock8 hours ago

More than 20 years ago or so I made a small LED display that used a series of LM567 (frequency detection ICs) and LM3914 (bar chart drivers) to make a simple histogram for music.

It was fiddly, and probably too inaccurate for a modern audience but I can't claim it was diabolically hard. Tuning was a faff but we were more willing to sit and tweak resistor and capacitor values then.

by cwillu1 hours ago

That would be “The Naive FFT”:

“Most people who attempt audio reactive LED strips end up somewhere around here, with a naive FFT method. It works well enough on a screen, where you have millions of pixels and can display a full spectrogram with plenty of room for detail. But on 144 LEDs, the limitations are brutal. On an LED strip, you can't afford to "waste" any pixels and the features you display need to be more perceptually meaningful.”

by JKCalhoun7 hours ago

I made a decent audio visualizer using the MSGEQ7 [1]. It buckets a count for seven audio frequency ranges—an Arduino would poll on every loop. It looks like the MSGEQ7 is not a standard part any longer unfortunately.

(And it looks like the 7 frequencies are not distributed linearly—perhaps closer to the mel scale.)
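A quick way to check the spacing, as a rough sketch: the center frequencies below are the ones listed in the MSGEQ7 datasheet, while the mel formula is just one common variant and the helper names are made up for this example. The neighbour ratios come out near 2.5, which points to roughly geometric (log-frequency) spacing; the mel steps are there for comparison and are not constant.

```python
import math

# Center frequencies in Hz as listed in the MSGEQ7 datasheet.
MSGEQ7_CENTERS_HZ = [63, 160, 400, 1000, 2500, 6250, 16000]

def hz_to_mel(freq_hz):
    # One common mel formula; other variants exist.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def neighbor_ratios(freqs):
    # Ratio of each band's center to the previous band's center.
    return [hi / lo for lo, hi in zip(freqs, freqs[1:])]

ratios = neighbor_ratios(MSGEQ7_CENTERS_HZ)
mels = [hz_to_mel(f) for f in MSGEQ7_CENTERS_HZ]
# Difference in mels between neighbouring bands; constant steps would
# mean the bands are evenly spaced on the mel scale.
mel_steps = [b - a for a, b in zip(mels, mels[1:])]
```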

I tried using one of the FFT libraries on the Arduino directly but had no luck. The MSGEQ7 chip is nice.

[1] https://cdn.sparkfun.com/assets/d/4/6/0/c/MSGEQ7.pdf

by empyrrhicist6 hours ago

Have you ever seen anything like a MSGEQ14 or equivalent? It would be cool to go beyond 7 in such a simple-to-use chip, but I haven't seen one.

by JKCalhoun1 minutes ago

No, I have not.

by milleramp5 hours ago

This guy has been making music controlled LED items, boxes and wrist bands. https://www.kickstarter.com/projects/markusloeffler/lumiband...

by londons_explore7 hours ago

The mel spectrum is the first part of a speech recognition pipeline...

But perhaps you'd get better results if more of a ML speech/audio recognition pipeline were included?

Eg. the pipeline could separate out drum beats from piano notes, and present them differently in the visualization?

An autoencoder network trained to minimize perceptual reconstruction loss would probably have the most 'interesting' information at the bottleneck, so that's the layer I'd feed into my LED strip.

by akhudek2 hours ago

I've done this in my own solution in this space (https://thundergroove.com). I use a realtime beat detection neural network combined with similar frequency spectrum analyses to provide a set of signals that effects can use.

Effects themselves are written in embedded Javascript and can be layered a bit like photoshop. Currently it only supports driving nanoleaf and wled fixtures, though wled gives you a huge range of options. The effect language is fully exposed so you can easily write your own effects against the real-time audio signals.

It isn't open source though, and still needs better onboarding and tutorials. Currently it's completely free, haven't really decided on if I want to bother trying to monetize any of it. If I were to it would probably just be for DMX and maybe midi support. Or maybe just for an ecosystem of portable hardware.

by calibas6 hours ago

I was playing around with this recently, but the problem I encountered is that most AI analysis techniques like stem separation aren't built to work in real-time.

by panki277 hours ago

Had a similar setup based on an Arduino, 3 hardware filters (highs/mids/lows) for audio and a serial connection. Serial was used to read the MIDI clock from a DJ software.

This allowed the device to count the beats, and since most modern EDM music is 4/4 that means you can trigger effects every time something "changes" in the music after synching once.
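The beat counting can be sketched roughly like this. MIDI real-time clock arrives at 24 pulses per quarter note (status byte 0xF8, with 0xFA for start and 0xFC for stop); the `BeatCounter` class and its `feed` method are hypothetical names, and reading bytes off the serial port is left out.

```python
MIDI_CLOCK = 0xF8   # timing clock, sent 24 times per quarter note
MIDI_START = 0xFA
MIDI_STOP = 0xFC
PULSES_PER_BEAT = 24

class BeatCounter:
    """Turns a stream of MIDI real-time bytes into (bar, beat) events."""

    def __init__(self, beats_per_bar=4):
        self.beats_per_bar = beats_per_bar
        self.pulses = 0
        self.running = False

    def feed(self, status_byte):
        """Consume one byte; return (bar, beat) on each beat, else None."""
        if status_byte == MIDI_START:
            self.pulses = 0
            self.running = True
            return None
        if status_byte == MIDI_STOP:
            self.running = False
            return None
        if status_byte != MIDI_CLOCK or not self.running:
            return None
        on_beat = self.pulses % PULSES_PER_BEAT == 0
        beat_index = self.pulses // PULSES_PER_BEAT
        self.pulses += 1
        if on_beat:
            return (beat_index // self.beats_per_bar,
                    beat_index % self.beats_per_bar)
        return None
```

An effect could then switch whenever `beat == 0`, or every 4 or 8 bars, which is usually where things "change" in 4/4 dance music.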

by JKCalhoun7 hours ago

"3 hardware filters…"

The classic "Color Organ" from the 70's.

by copypaper4 hours ago

This is awesome! I did a similar project in college for one of my classes and ran into the same exact walls as you.

- The more filters I added the worse it got. A simple EMA with smoothing gave the best results. Although, your pipeline looks way better than what I came up with!

- I ended up using the Teensy 4.0 which let me do real time FFT and post processing in less than 10ms (I want to say it was ~1ms but I can't recall; it's been a while). If anyone goes down this path I'd heavily recommend checking out the teensy. It removes the need for a raspi or computer. Plus, Paul is an absolute genius and his work is beyond amazing [1].

- I started out with non-addressable LEDs also. I attempted to switch to WS2812's as well, but couldn't find a decent algorithm to make it look good. Yours came out really well! Kudos.

- Putting the leds inside of an LED strip diffuser channel made the biggest difference. I spent so long trying to smooth it out getting it to look good when a simple diffuser was all I needed (I love the paper diffuser you made).
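For reference, the EMA smoothing mentioned in the first bullet can be sketched in a few lines. This is a generic exponential moving average applied per frequency band, not the code from either project; `smooth_frames` and the default `alpha` are assumptions for illustration.

```python
def ema_update(previous, sample, alpha):
    """One exponential-moving-average step; alpha in (0, 1]."""
    return alpha * sample + (1.0 - alpha) * previous

def smooth_frames(frames, alpha=0.3):
    """Smooth a sequence of per-band level frames over time.

    `frames` is a list of equal-length lists (one level per band).
    Higher alpha reacts faster; lower alpha flickers less.
    """
    state = list(frames[0])
    out = [tuple(state)]
    for frame in frames[1:]:
        state = [ema_update(p, s, alpha) for p, s in zip(state, frame)]
        out.append(tuple(state))
    return out
```

The single `alpha` knob is the whole reactivity-versus-flicker tradeoff, which may be part of why it holds up against longer filter chains.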

RE: What's Still Missing: I came to a similar conclusion as well. Manually programmed animation sequences are unparalleled. I worked as a stagehand in college and saw what went into their shows. It was insane. I think the only way to have that same WOW factor is via pre-processing. I worked on this before AI was feasible, but if I were to take another stab at it I would attempt to do it with something like TinyML. I don't think real time is possible with this approach. Although, maybe you could buffer the audio with a slight delay? I know what I'll be doing this weekend... lol.

Again, great work. To those who also go down this rabbit hole: good luck.

[1]: https://www.pjrc.com/

by MomsAVoxell1 hours ago

> I think the future of audio visualization on LED strips will involve a mixture of experts tuned for different genres, likely using neural networks.

I think it's more likely going to come from a direct integration with existing synthesis methods, but .. I’m kind of biased when it comes to audio and light synthesizers, having made a few of each…

We have addressed this expert tuning issue with the MagicShifter, which is a product not quite competing with the OP’s work, but very much aligned with it[1]:

https://magicshifter.net/

.. which is a very fun little light synthesizer capable of POV rendering, in-air text effects, light sequencer programming, MIDI, and so on .. plus, has a 6dof sensor enabling some degree of magnetometers, accelerometers, touch-sensing and so on .. so you can use it for a lot of great things. We have a mode “BEAT” that you can place on a speaker and get reactive LED strips of a form (quite functional) pretty much micro-mechanically, as in: through the case and thus the sensor, not an ADAC, not processing audio - but the levers in between the sensor and the audio source. So - not quite the same, but functionally equivalent in the long run (plus the magicshifter is battery powered and pocketable, and you can paint your own POV images and so on, but .. whatever..)

The thing is, the limits: yes, there are limits - but like all instruments you need to tune to/from/with those limits. It’s not so much that achieving perfect audio reactive LEDs is diabolically hard, but rather making aesthetically/functionally relevant decisions about when to accept those limits requires a bit of gumption.

Humans can be very forgiving with LED/light-based interfaces, if you stack things right. The aesthetics of the thing can go a long way towards providing a great user experience .. and in fact, is important to giving it.

[1] - (okay, you can power a few meters of LED strips with a single MagicShifter, so maybe it is ‘competition’, but whatever..)

by itintheory1 hours ago

> https://magicshifter.net/

I get a cert mismatch on that site, and when clicking the shop link I end up at https://hackerspaceshop.com/ which is advertising an online fax service.

by nsedlet3 hours ago

I also attempted to do real-time audio visualizations with LED strips. What was unsatisfying is that the net effect always seemed to be: the thing would light up with heavy beats and general volume. But otherwise the visual didn't FEEL like the music. This is the same issue I always had with the Winamp visualizations back in the day.

To solve this I tried pre-processing the audio, which only works with recordings obviously. I extract the beats and the chords (using Chordify). I made a basic animation and pulsed the lights to the beat, and mapped the chords to different color palettes.

Some friends and I rushed it to put it together as a Burning Man art project and it wasn't perfect, but by the time we launched it felt a lot closer to what I'd imagined. Here's a grainy video of it working at Burning Man: https://www.youtube.com/watch?v=sXVZhv_Xi0I

It works pretty well with most songs that you pick. Just saying there's another way to go somewhere between (1) fully reactive to live audio, and (2) hand designed animations.

I don't think there's an easy bridge to make it work with live audio though unfortunately.

by blobbers1 hours ago

Am I the only one who was surprised the obvious answer is to map frequencies to notes and basically turn your LED strip into a piano visualization? Then just norm to strip size?

There’s plenty of visual experiments of pianists doing this “rock band” “guitar hero” style visualization of notes.
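The mapping itself is simple enough to sketch, assuming equal temperament with A4 = 440 Hz and an 88-key piano range (MIDI notes 21 to 108). The function names and the linear note-to-LED scaling are assumptions for this example; picking out which frequencies are actually "notes" in a mix is the hard part and is not covered here.

```python
import math

def freq_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def note_to_led(note, num_leds, low_note=21, high_note=108):
    """Scale a note in the piano range onto LED indices 0..num_leds-1."""
    note = min(max(note, low_note), high_note)  # clamp out-of-range notes
    return round((note - low_note) * (num_leds - 1) / (high_note - low_note))

# Example: where middle C would land on a 144-LED strip.
middle_c_led = note_to_led(freq_to_midi(261.63), num_leds=144)
```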

by serf2 hours ago

the hard part is dousing a room in pulsing bright colorful LEDs tastefully.

I haven't seen that done yet. I think it's one of those Dryland myths.

by wolvoleo5 hours ago

Thanks for this! Exactly the thing I'm struggling with now. Making decent visualisation for music based on ESP32-S3.

by 8cvor6j844qw_d67 hours ago

Are these available commercially for consumers?

by leptons2 hours ago

There are plenty of LED strips with audio controllers that work pretty well. I've used them in a few projects. Just go look at Amazon, you can get them for pretty cheap.

by p0w3n3d8 hours ago

IANAE but I would go for an electric circuit, not electronic software that steers the LED. I think that nowadays, with LLM support, it can be easier and better to optimise it for the sake of latency.

by mrob7 hours ago

If you want minimum latency, you want the input side of a traditional vocoder, not an FFT. This is the part that splits the modulator signal into frequency bands and puts each one through an envelope follower. Instead of using the outputs of the envelope followers to modulate the equivalent frequency bands of a carrier signal, you can use them to drive the visualizer circuit.

That can be done with analog electronics, but even half an analog vocoder needs a lot of parts. It's going to be cheaper and more reliable to simulate it in software. This uses entirely IIR filters, which are computationally cheap and calculated one sample at a time, so they have the minimum possible latency. I'd be curious if any LLM actually recognizes that an audio visualizer is half a vocoder instead of jumping straight to the obvious (and higher latency) FFT approach.
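A software version of "half a vocoder" might look something like the sketch below: one band-pass biquad per band followed by a rectifier and a one-pole smoother as the envelope follower, all updated one sample at a time. The coefficient formulas are the widely used audio-EQ-cookbook band-pass; the class name, Q, and release time are arbitrary assumptions, and a real build would do this in C on the microcontroller rather than in Python.

```python
import math

class BandEnvelope:
    """One vocoder analysis band: band-pass biquad + envelope follower."""

    def __init__(self, center_hz, sample_rate, q=4.0, release_s=0.05):
        # Band-pass biquad (0 dB peak gain), normalized by a0. b1 is zero.
        w0 = 2.0 * math.pi * center_hz / sample_rate
        alpha = math.sin(w0) / (2.0 * q)
        a0 = 1.0 + alpha
        self.b0 = alpha / a0
        self.b2 = -alpha / a0
        self.a1 = -2.0 * math.cos(w0) / a0
        self.a2 = (1.0 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0
        # One-pole smoothing coefficient for the envelope follower.
        self.smooth = 1.0 - math.exp(-1.0 / (release_s * sample_rate))
        self.env = 0.0

    def process(self, x):
        """Feed one input sample, return the current band level."""
        y = (self.b0 * x + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        # Rectify, then smooth: this is the envelope follower.
        self.env += self.smooth * (abs(y) - self.env)
        return self.env
```

Each band's `env` can drive LED brightness directly, with no block of samples to wait for as there is with an FFT frame.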

by avisser6 hours ago

For recorded music, you could always buffer however many milliseconds of audio to account for the processing.

by IshKebab4 hours ago

It's not that hard. I did a real-time version of the Beatroot algorithm decades ago that worked pretty well for being such a simple algorithm.

by askl8 hours ago

Interesting. I'm currently in the process of building something with an audio-reactive LED strip but hadn't come across this project yet. The WLED [1] ESP32 firmware seems to be able to do something similar or potentially more though.

[1] https://kno.wled.ge/

Edit: Oh wait, that project needs a PC or Raspberry Pi for audio processing. WLED does everything on the ESP32.

by turbine4018 hours ago

Check out the MoonModules fork/variant of WLED too, it has much better audio reactive user mods and visualisation options https://mm.kno.wled.ge/ than the main project.

And yea, I agree with the article. In my past I've also dabbled in audioreactive for LEDs and it's fiendishly difficult to make anything interesting.

Make it react too much and it's chaos; conversely, when the algorithm reacts less to the audio, it's boring.

And in all cases it's really not easy to see what the LEDs are doing in correspondence to all the complexity of music.

by stavros8 hours ago

Yeah WLED does it fine, I've built a few and it works well.

by MrBuddyCasino6 hours ago

WLED is decent but tbh the lag is very noticeable. Did you compare to this python thing?

by askl5 hours ago

No, haven't tried it.

For my use case I want something fully portable and battery powered anyways. So the audio stuff should happen on the ESP32. (Or on my phone, that might work too)

by ssl-31 hours ago

Eh, it's probably OK either way. People have been saying since day 1 that Raspberry Pis are not low-power devices and they're probably right.

Everything is relative, though. In terms of maximums, a Pi 4 (for example) can use up to about 7 Watts under load by itself, which adds up fast when operating on batteries.

But a single 1 meter string of 144 WS2812B LEDs can suck down up to around 43 Watts, and 43 is a lot more than 7. :)
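The 43 W figure lines up with the usual back-of-envelope estimate, sketched here. It assumes roughly 20 mA per color channel at full brightness (so about 60 mA per LED on full white) at 5 V, which is a commonly quoted ballpark and not a measured value; real strips often draw less.

```python
def strip_power_watts(num_leds, ma_per_channel=20.0, channels=3, volts=5.0):
    """Worst-case power draw with every channel of every LED fully on."""
    amps = num_leds * channels * ma_per_channel / 1000.0
    return amps * volts

full_white = strip_power_watts(144)       # about 43.2 W
single_color = strip_power_watts(144, channels=1)  # about 14.4 W
```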

Lighting rigs are thirsty. The processing (even if it's the whole Pi) is generally a small drop in the bucket.

by tensor2 hours ago

It's pretty easy to run a pi on a battery.

by mockbolt6 hours ago

[flagged]

by isoprophlex6 hours ago

Are you using multiple accounts to post the same comment?!

by kbouck6 hours ago

[flagged]

by m3kw96 hours ago

How is it hard? Do an A to D, add a filter, do compute, then do a D to A.

by kennywinker5 hours ago

Not hard to do, hard to do well. Hiding all complexity with a hand wavey “do compute” doesn’t make that bit easy

by m3kw95 hours ago

Yeah i get it, the details are hard.

by cogman106 hours ago

The article covers that.

In short, audio and visual perception do not map perfectly. Humans don't have a linear perception of either so a perfect A to D then D to A conversion yields unsatisfying results.
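On the visual side, one small and standard piece of this is gamma correction: LED PWM output is linear in light, but perceived brightness is not, so a linear ramp looks like it jumps to "bright" almost immediately. A minimal sketch, assuming a gamma of 2.2 (a common default, not a value taken from the article) and 8-bit channels:

```python
def gamma_table(gamma=2.2, max_in=255, max_out=255):
    """Lookup table mapping a perceptual 0..255 level to a PWM value."""
    return [round(((i / max_in) ** gamma) * max_out)
            for i in range(max_in + 1)]

GAMMA = gamma_table()

def correct(rgb):
    """Apply the table to one (r, g, b) pixel."""
    return tuple(GAMMA[c] for c in rgb)
```

The audio side needs its own nonlinear treatment (log or mel frequency, log loudness), which is where most of the article's pipeline goes.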
