Honestly, I think why the GBA is more popular than the DS for that kind of thing is because it only has one screen (much less awkward to emulate), has high-quality emulators that are mostly free of bugs (mGBA most notably), and its aspect ratio is better than the DS anyway (3:2 upscales really well on 16:10 devices). That is to say, it's much easier to emulate GBA software on a phone or a Steam Deck than it is to emulate DS software.
None of this fixes the audio, but it sure gets damn close.
The issue with the sound isn't just the speakers - you could always use headphones, after all. The GBA only has the original GB's primitive PSG (two square waves, a noise channel, and a short programmable 4-bit waveform) plus two 8-bit PCM channels. 8-bit PCM samples are unavoidably noisy with lots of aliasing, and all sound mixing, sequencing, envelopes, etc. for those channels needs to be done in software, which tends to introduce performance and battery life constraints on quality, channel count, effects, and sample rate.
The SNES, by comparison, uses high-quality 16-bit 32kHz samples, and all the places on the GBA where devs may have had to cut corners are done in hardware: eight separate channels, no need for software mixing, built-in envelopes and delay.
Compare the SNES FFVI soundtrack to the GBA version; the difference is dramatic. Frankly, using high quality speakers or headphones just makes the quality difference more obvious.