The DS has you dealing with two cores you need to write a firmware for that have to communicate to do anything useful, a cartridge protocol to fetch any extra code or assets that wouldn't all fit into RAM at runtime, instruction and data caches, an MMU, ... And that's without mentioning some of the more complex peripherals like the touch screen and wifi.
All official games used the same firmware for one of the cores, a copy of which is embedded into every single cartridge ROM. There's some homebrew firmwares included in the respective SDKs, but they aren't well documented for standalone use.
Granted, all of the above isn't completely impossible, but if you think of how much code you'd need to get a simple demo (button input, sprite moving across the screen), especially for a beginner, the DS requires a nontrivial amount of code and knowledge to get started without an SDK. Meanwhile, you can do something similar in less than 100 lines of ASM/C for GBA.
Honestly, I think why the GBA is more popular than the DS for that kind of thing is because it only has one screen (much less awkward to emulate), has high-quality emulators that are mostly free of bugs (mGBA most notably), and its aspect ratio is better than the DS anyway (3:2 upscales really well on 16:10 devices). That is to say, it's much easier to emulate GBA software on a phone or a Steam Deck than it is to emulate DS software.
None of this fixes the audio, but it sure gets damn close.
The issue with the sound isn't just the speakers - you could always use headphones, after all. The GBA only has the original GB's primitive PSG (two square waves, a noise channel, and a short programmable 4-bit waveform) plus two 8-bit PCM channels. 8-bit PCM samples are unavoidably noisy with lots of aliasing, and all sound mixing, sequencing, envelopes, etc. for those channels needs to be done in software, which tends to introduce performance and battery life constraints on quality, channel count, effects, and sample rate.
The SNES, by comparison, uses high-quality 16-bit 32kHz samples, and all the places on the GBA where devs may have had to cut corners are done in hardware: eight separate channels, no need for software mixing, built-in envelopes and delay.
Compare the SNES FFVI soundtrack to the GBA version; the difference is dramatic. Frankly, using high quality speakers or headphones just makes the quality difference more obvious.
Though there were some fits and starts there. The N64 for example is, from what I've heard, heavily library dependent and absolutely brutal to program bare metal (GPU "microcode" that was almost like programmable shaders v0.1); even the GameCube is a significant improvement for that kind of thing.