Real music (not tones)

Here's the problem: to play back real music you need to sample it, the way CDs are made. To capture a given maximum frequency you need to sample at at least twice that frequency, so for the roughly 20 kHz limit of human hearing that means about 40,000 samples per second (CDs actually use 44,100, at 16 bits per sample and in stereo). Even at just 8 bits per sample in mono, that is about 40 KB every second. But the Arduino (an Uno, at least) only has 2 KB of RAM and 32 KB of program memory.

Either way, that's just a fraction of a second: about 0.05 seconds' worth of audio in RAM, or about 0.8 seconds in program memory.
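
If you want to check that arithmetic yourself, here is a rough sketch in Python (run on a PC, not on the Arduino). It assumes the figures above: 40,000 samples per second, 8 bits per sample, one channel, and 1 KB taken as 1024 bytes. The function name `seconds_of_audio` is just something I made up for the illustration.

```python
# Back-of-the-envelope: how long does raw audio held in on-chip memory last?
SAMPLE_RATE_HZ = 40_000   # assumed: twice a 20 kHz maximum frequency
BYTES_PER_SAMPLE = 1      # assumed: 8 bits per sample, mono

def seconds_of_audio(memory_bytes,
                     sample_rate_hz=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE):
    """Return how many seconds of uncompressed audio fit in memory_bytes."""
    return memory_bytes / (sample_rate_hz * bytes_per_sample)

RAM_BYTES = 2 * 1024      # 2 KB of RAM
FLASH_BYTES = 32 * 1024   # 32 KB of program memory

ram_seconds = seconds_of_audio(RAM_BYTES)
flash_seconds = seconds_of_audio(FLASH_BYTES)

print(f"RAM:   {ram_seconds:.3f} s")    # RAM:   0.051 s
print(f"Flash: {flash_seconds:.3f} s")  # Flash: 0.819 s
```

In practice you get even less than this, because the sketch itself and its variables also have to live in that memory.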

You could do it with extra storage (e.g. an SD card) and a digital-to-analog converter, which is probably what these shields have.

Another, more practical, route is to play MIDI, because then you only have to send something like "play middle C for 1 second", which takes just a few bytes, and let the synth produce the actual sounds.
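
To give an idea of how few bytes that is, here is a small Python sketch that builds the raw MIDI messages for middle C. The status bytes (0x90 for Note On, 0x80 for Note Off) and note number 60 for middle C come from the MIDI standard; the helper names `note_on` and `note_off`, the velocity of 64 and the use of channel 0 are my own choices for illustration. Note that MIDI has no "for 1 second" field: you send a Note On, wait a second yourself, then send a Note Off.

```python
# Raw MIDI channel messages: one status byte followed by two data bytes.
NOTE_ON = 0x90    # status byte for Note On (low 4 bits hold the channel)
NOTE_OFF = 0x80   # status byte for Note Off
MIDDLE_C = 60     # MIDI note number for middle C

def note_on(note, velocity=64, channel=0):
    """Build the 3-byte Note On message (velocity 64 is an arbitrary choice)."""
    return bytes([NOTE_ON | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(note, velocity=0, channel=0):
    """Build the 3-byte Note Off message."""
    return bytes([NOTE_OFF | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

start = note_on(MIDDLE_C)   # send this, then wait 1 second ...
stop = note_off(MIDDLE_C)   # ... then send this

print(start.hex(), stop.hex())   # 903c40 803c00
print(len(start) + len(stop))    # 6
```

So one second of middle C costs 6 bytes, compared with about 40,000 bytes for one second of sampled audio. On the Arduino side you would write those same bytes to a serial port running at the MIDI rate of 31,250 baud.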