You will never get the latency and bandwidth needed for good audio if you read small 32-byte chunks from two files. An Arduino with 2K of SRAM can read and play one audio file, but two files would require more than 2K of buffer.
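To put rough numbers on the buffer problem, here is a minimal sketch of the RAM arithmetic. It assumes a double-buffered player (one buffer is played while the other is refilled) with each buffer sized to the SD card's 512-byte block. The buffering scheme and the name `bufferBytes` are illustrative assumptions, not what any particular library does.

```cpp
#include <cstdint>

// Assumption: buffers are one SD block (512 bytes) each.
constexpr std::uint32_t kSdBlockSize = 512;

// Assumption: 2K of SRAM, as on the Arduinos discussed above.
constexpr std::uint32_t kSramBytes = 2048;

// Hypothetical helper: total audio buffer RAM for a given number of
// files, each with `buffersPerFile` buffers of `blockSize` bytes.
constexpr std::uint32_t bufferBytes(std::uint32_t files,
                                    std::uint32_t buffersPerFile,
                                    std::uint32_t blockSize) {
  return files * buffersPerFile * blockSize;
}

// True if the buffers alone leave at least `reserveBytes` of SRAM
// for the stack, globals, and everything else in the sketch.
constexpr bool fitsInSram(std::uint32_t bufBytes, std::uint32_t reserveBytes) {
  return bufBytes + reserveBytes <= kSramBytes;
}
```

Under these assumptions one file needs 1024 bytes of buffer and two files need 2048 bytes, which is all of the 2K before counting the stack, other variables, and any block cache the SD library keeps.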
The big problem is that SD cards were designed to transfer a single stream to/from a contiguous region of the card. When you switch between two areas there can be big latency problems. This latency is not acceptable for audio.
I have written two audio libraries for the Arduino, and it is a challenge just to play or record one file without overruns due to latency problems. My libraries record and play uncompressed WAV files, so they have higher bandwidth requirements than MP3.
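As a rough illustration of why uncompressed audio is demanding, the sketch below computes the PCM data rate and how many bytes playback consumes while the card is stalled. The sample formats and the 50 ms stall are assumed example values, not measurements, and the function names are hypothetical.

```cpp
#include <cstdint>

// Bytes per second of uncompressed PCM audio.
// bitsPerSample is assumed to be a multiple of 8.
constexpr std::uint32_t pcmBytesPerSecond(std::uint32_t sampleRateHz,
                                          std::uint32_t channels,
                                          std::uint32_t bitsPerSample) {
  return sampleRateHz * channels * (bitsPerSample / 8);
}

// Bytes that playback consumes during a stall of `stallMs` milliseconds,
// rounded up. This is the minimum amount that must already be buffered
// to avoid an underrun while the card is busy.
constexpr std::uint32_t bytesDuringStall(std::uint32_t bytesPerSecond,
                                         std::uint32_t stallMs) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(bytesPerSecond) * stallMs + 999) / 1000);
}
```

For example, 22.05 kHz 8-bit mono works out to 22050 bytes per second, so an assumed 50 ms stall consumes 1103 bytes, already more than half of a 2K SRAM. 44.1 kHz 16-bit stereo is 176400 bytes per second. For comparison, a 128 kbit/s MP3 stream (a typical bitrate, used here only as an example) is 16000 bytes per second.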
You can use any SD shield with SdFat. For quick development, shields from Adafruit or SparkFun would work.
SdFat is here: http://code.google.com/p/sdfatlib/