I'm not sure I have any answers to this, but I do have technical explanations as to what's going on -- which may help you decide what course of action to take...
Gapless playback of MP3 is basically impossible. (Wait -- there's a caveat coming!) Now, I don't mean "it's impossible on a WaveShield"; I mean the format isn't designed to care about gaps on the order of milliseconds on either side of the audio. MP3 is broken into frames, each of which covers a fixed number of samples and therefore a fixed number of ms. But the sound doesn't start at the beginning of the first frame. Like many other perceptual audio coding formats, MP3 converts audio from the time domain to the frequency domain, so it's described in terms of frequency coefficients (the output of a filterbank and an MDCT), not samples. The transform needs some priming before sound can come out, and some cleanup to transition from sound to silence, so the encoded stream ends up slightly longer than the source at both ends. Additionally, frames are dependent on each other (the transform windows overlap, and the bit reservoir lets a frame's data start in an earlier frame), which complicates things further.
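To put rough numbers on that, here is a small Python sketch you could run on your PC. The samples-per-frame figures are the standard Layer III values; the function names are mine, and the 576-sample encoder delay in the usage note is only an illustrative value (your encoder may use a different one). It also ignores any extra delay added by the decoder itself.

```python
# Samples per frame for MPEG audio Layer III (MP3).
SAMPLES_PER_FRAME = {"MPEG-1": 1152, "MPEG-2": 576}


def frame_duration_ms(sample_rate_hz, version="MPEG-1"):
    """Duration of one MP3 frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[version] / sample_rate_hz


def padded_length(source_samples, encoder_delay, version="MPEG-1"):
    """Samples a naive decoder would emit for the whole file.

    The encoder delay is prepended to the audio, and the result is
    rounded up to a whole number of frames.
    """
    per_frame = SAMPLES_PER_FRAME[version]
    total = encoder_delay + source_samples
    frames = -(-total // per_frame)  # ceiling division
    return frames * per_frame
```

For example, `padded_length(44100, 576) - 44100` tells you how many extra samples surround one second of 44.1 kHz audio under those assumptions. That surplus is the gap you hear when a file loops.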
You can get around this with a mechanism like the LAME tag. The encoder knows the number of audio samples in the source material, so it can record how many samples to throw away at the start of playback (to account for the delay incurred by the encoding) and how many to throw away at the end (to match the length of the source material exactly). To benefit from this, the decoder must be smart enough to parse the metadata frame, then treat the MP3 stream as a whole (by recognizing the first and last frames and discarding the useless decoded samples) rather than as an arbitrary sequence of frames (which is how most decoder ICs handle it). Even if you used a dedicated decoder IC that hands back PCM samples for you to feed to a separate DAC, doing the stream processing on a microcontroller would be difficult: you have very little RAM to split between parsing the frames and tags, buffering frames to send to the decoder, and buffering the samples that come back from it ... not to mention whatever else you intended to do.
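The trimming step itself is cheap once the two counts are known. Here is a minimal Python sketch of the idea, written as a streaming filter so that only a couple of counters are needed rather than a buffer for the whole track. `trim_stream`, `skip_start`, and `skip_end` are hypothetical names; getting the counts out of the tag, and knowing the total decoded length up front, are assumed to happen elsewhere.

```python
def trim_stream(chunks, total_samples, skip_start, skip_end):
    """Yield decoded PCM with the leading and trailing padding removed.

    chunks        -- iterable of decoded sample blocks (lists, bytes, ...)
    total_samples -- number of samples the decoder will emit in total
    skip_start    -- samples to discard at the start of the stream
    skip_end      -- samples to discard at the end of the stream
    """
    keep_from = skip_start
    keep_to = total_samples - skip_end
    pos = 0  # index of the first sample of the current chunk
    for chunk in chunks:
        lo = max(keep_from - pos, 0)
        hi = min(keep_to - pos, len(chunk))
        if lo < hi:
            yield chunk[lo:hi]
        pos += len(chunk)
```

The hard part on a small microcontroller isn't this arithmetic; it's getting at decoded samples and the tag in the first place.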
Since most decoder ICs just want single frames or a certain number of bytes at a time, you could skip the cookie-cutter file-based playback libraries and handle file reading and buffering on your own. If you carefully craft your MP3s (e.g., same sample rate and channel layout, no tag data mixed in with the frames), you can, in theory, send an endless stream of frames to the decoder. The loop gap is then limited to what is inherent in the encoded frames, and "seamless" transitions from file to file have the same constraint. The gaps would be fairly short, and you might be able to tolerate them depending on the nature of the sound. That's certainly less intrusive than the MP3 delay plus the time to empty the playback buffers, read a new file, fill the buffers, and begin playback.
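Here is the buffering idea as a desktop Python model (on the actual board you'd write the same loop in your firmware language). It assumes the decoder wants fixed-size chunks; `stream_chunks` and the 32-byte default are placeholders of mine, not taken from any particular chip or library. The point is that leftover bytes at the end of one file are carried into the next, so the decoder never sees a boundary.

```python
def stream_chunks(sources, chunk_size=32):
    """Yield fixed-size chunks that run straight across file boundaries.

    sources    -- iterable of open binary file objects, in play order
    chunk_size -- bytes the (hypothetical) decoder accepts per transfer
    """
    pending = bytearray()
    for src in sources:
        while True:
            data = src.read(chunk_size)
            if not data:
                break  # end of this file; move on without flushing
            pending += data
            while len(pending) >= chunk_size:
                yield bytes(pending[:chunk_size])
                del pending[:chunk_size]
    if pending:
        yield bytes(pending)  # final partial chunk, if any
```

To loop a single file, you would keep handing the same data back in as the next source instead of letting the sequence end.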
If sample-based formats (WAV, etc.) are an option, you could do the same trick and get completely gapless playback. Simple PCM WAV files have very basic headers that are pretty easy to parse, but if your files all share the same format (e.g., 8-bit, 22 kHz, mono), you could skip the WAV format completely, dump raw samples to your SD card, and hard-code the sample properties when you initialize playback.
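If you go the raw-sample route, the conversion can be done on your PC before copying files to the card. A minimal sketch using Python's standard `wave` module follows; `wav_to_raw` is a name I made up, and the default of mono, 8-bit, 22050 Hz is just the example format from above, so adjust it to whatever you hard-code on the board.

```python
import wave


def wav_to_raw(wav_file, expected=(1, 1, 22050)):
    """Return the bare PCM samples from a WAV file, header stripped.

    wav_file -- path or open binary file object
    expected -- (channels, bytes per sample, sample rate) that the
                playback code has hard-coded; anything else is rejected
    """
    with wave.open(wav_file, "rb") as w:
        fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        if fmt != expected:
            raise ValueError(f"unexpected WAV format {fmt}, wanted {expected}")
        return w.readframes(w.getnframes())
```

Write the returned bytes to a file on the SD card and the firmware can stream it straight to the DAC. At 8-bit mono that is one byte per sample, so 22050 bytes per second of audio.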

