The issue with the sound isn't just the speakers - you could always use headphones, after all. The GBA only has the original GB's primitive PSG (two square waves, a noise channel, and a short programmable 4-bit waveform) plus two 8-bit PCM channels. 8-bit PCM samples are unavoidably noisy with lots of aliasing, and all sound mixing, sequencing, envelopes, etc. for those channels needs to be done in software, which tends to introduce performance and battery life constraints on quality, channel count, effects, and sample rate.
The SNES, by comparison, uses high-quality 16-bit 32kHz samples, and all the places on the GBA where devs may have had to cut corners are done in hardware: eight separate channels, no need for software mixing, built-in envelopes and delay.
Compare the SNES FFVI soundtrack to the GBA version; the difference is dramatic. Frankly, using high quality speakers or headphones just makes the quality difference more obvious.