I think the choice of microcontroller is very important here if you want good-quality audio (16-bit, 44.1 kHz). For that you will need an external DAC (digital-to-analog converter), and you will need to talk to it, typically over a digital audio interface called I2S, which not all microcontrollers support. If your microcontroller is fast enough, such as an ARM Cortex-M4F or an ESP32, you can also decode the MP3 files in software instead of using a separate decoder chip. This cuts cost and simplifies the hardware, but it makes the software harder.

Just for the audio part, I would use the following:

1. A fast microcontroller for software MP3 decoding and digital audio communication.
2. An SD card breakout to hold the files.
3. An I2S DAC to produce a line-out signal that you can connect to an existing speaker.
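
To put rough numbers on "fast enough", here is a back-of-the-envelope sketch in Python (a calculation, not firmware). It assumes stereo output, 16-bit I2S slots, and MPEG-1 Layer III files, which carry 1152 samples per frame. The function name `audio_budget` is made up for illustration.

```python
def audio_budget(sample_rate_hz=44_100, bits_per_sample=16, channels=2,
                 samples_per_mp3_frame=1152):
    """Estimate the data rates the microcontroller has to sustain.

    Assumptions (not from any datasheet): stereo, one I2S slot per sample
    with no padding bits, and MPEG-1 Layer III framing.
    """
    # I2S bit clock: one clock per bit, for every channel of every sample.
    bit_clock_hz = sample_rate_hz * channels * bits_per_sample
    # Decoded PCM that must be fed to the DAC each second.
    pcm_bytes_per_s = bit_clock_hz // 8
    # Playback time covered by one MP3 frame; the decoder must finish
    # each frame faster than this or the audio will stutter.
    frame_ms = samples_per_mp3_frame / sample_rate_hz * 1000
    return bit_clock_hz, pcm_bytes_per_s, frame_ms


bit_clock_hz, pcm_bytes_per_s, frame_ms = audio_budget()
print(f"I2S bit clock: {bit_clock_hz / 1e6:.4f} MHz")
print(f"PCM to DAC:    {pcm_bytes_per_s} bytes/s")
print(f"Decode budget: {frame_ms:.2f} ms per MP3 frame")
```

Under these assumptions it works out to a bit clock of about 1.41 MHz, about 176 kB/s of PCM data, and roughly 26 ms to read and decode each MP3 frame. Some DACs expect 32-bit slots, which would double the bit clock, so check the datasheet of the part you pick.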