You can look at our BeatVox product for inspiration. It basically does what you are trying to do. You can look at the software and tools associated with it for ideas on how to load sounds into Arduino FLASH memory.
If you don't want to invest in audio-playing hardware, you will be storing uncompressed samples.

The first question is: how much total audio do you need to store? The second question is: how much audio bandwidth do you need? Together those determine how much storage space you need.

If you want phone-quality audio you'll need about 6,000 bytes per second. If you want music quality you will need more like 40,000 bytes per second. For a minute of audio that comes to somewhere between 360 KB and 2.4 MB.

The ATmega328P has 32 KB of flash, and your sketch has to fit in there too, so there is no sense in even attempting to store audio there unless: A) your total audio is very short and B) you don't mind crappy audio quality. I think you should go straight to an SD card or, perhaps, an add-on EEPROM chip.
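To check the numbers for your own clips, here is a rough sketch of the arithmetic. The function and constant names are made up for illustration, the two data rates are just the ballpark figures from this post, and it assumes the full 32 KB of flash is free, which it never is once your sketch and the bootloader are loaded.

```cpp
#include <stdint.h>

// Bytes needed to store uncompressed audio at a given data rate.
// (Hypothetical helper, not part of any Arduino library.)
constexpr uint32_t audioBytes(uint32_t bytesPerSecond, uint32_t seconds) {
    return bytesPerSecond * seconds;
}

// Whole seconds of audio that fit in a given amount of storage (rounded down).
constexpr uint32_t secondsThatFit(uint32_t storageBytes, uint32_t bytesPerSecond) {
    return storageBytes / bytesPerSecond;
}

// Ballpark data rates quoted above (assumptions, not measured values).
constexpr uint32_t kPhoneRate = 6000;   // bytes per second, phone quality
constexpr uint32_t kMusicRate = 40000;  // bytes per second, music quality

// Total flash on an ATmega328P; the sketch and bootloader use part of it.
constexpr uint32_t kFlashBytes = 32UL * 1024UL;

// One minute of audio at each quality level.
constexpr uint32_t kPhoneMinute = audioBytes(kPhoneRate, 60);  // 360,000 bytes
constexpr uint32_t kMusicMinute = audioBytes(kMusicRate, 60);  // 2,400,000 bytes

// How much audio the whole flash could hold if it were completely empty.
constexpr uint32_t kPhoneSecondsInFlash = secondsThatFit(kFlashBytes, kPhoneRate);  // 5
constexpr uint32_t kMusicSecondsInFlash = secondsThatFit(kFlashBytes, kMusicRate);  // 0
```

So even a completely empty chip would hold only about five seconds at phone quality and less than one second at music quality, which is why external storage is the practical answer.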