Arduino for this? Or something else?

I guess it depends on the length and/or fidelity of the sound you want to play.
Maybe look into an audio shield.