It does depend on the audio bandwidth you are looking for and the delay times.
Some years ago I worked on a system for digitising voice (approx. 100 Hz to 3 kHz). I used a Dual Slope Delta Modulation technique to digitise the incoming audio. The useful thing about it is that it digitises straight to a bit stream.
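To show what "digitises to a bit stream" means, here is a rough sketch of plain (single-slope) delta modulation in Python. It is not the dual-slope variant I used, just the simplest form of the idea: one bit per sample saying whether the running estimate stepped up or down. The 64 kHz bit rate, the 1 kHz test tone and the step size are all assumptions picked for the demo.

```python
import math

def delta_modulate(samples, step):
    """Encode each sample as one bit: 1 = estimate steps up, 0 = steps down."""
    estimate = 0.0
    bits = []
    for s in samples:
        bit = 1 if s >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step):
    """Rebuild the staircase estimate from the bit stream."""
    estimate = 0.0
    out = []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

BIT_RATE = 64000   # assumed: one bit per sample at 64 kHz
TONE_HZ = 1000     # assumed test tone inside the voice band
STEP = 0.06        # assumed step, larger than the tone's max change per sample

samples = [0.5 * math.sin(2 * math.pi * TONE_HZ * n / BIT_RATE)
           for n in range(640)]
bits = delta_modulate(samples, STEP)
decoded = delta_demodulate(bits, STEP)
worst_error = max(abs(s - d) for s, d in zip(samples, decoded))
```

Because the step is bigger than the largest sample-to-sample change of the tone, the decoded staircase stays within about one step of the input. A real decoder would low-pass filter the staircase afterwards.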
There are some largish serial RAM chips. The bit stream could be packed into 8-bit bytes, written in over SPI, and read back out from a different address to get the delay.
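The packing and the "read from a different address" trick can be sketched like this. It is a desktop simulation in Python, not Arduino code: a bytearray stands in for the serial RAM, and on real hardware each byte write and read would be an SPI transaction to the chip. The class name, the sizes and the MSB-first bit order are my own assumptions for illustration.

```python
import random

class BitDelayLine:
    """Delay a bit stream by 8 * delay_bytes bits using a circular byte buffer."""

    def __init__(self, ram_bytes, delay_bytes):
        if not 1 <= delay_bytes <= ram_bytes:
            raise ValueError("delay_bytes must be between 1 and ram_bytes")
        self.ram = bytearray(ram_bytes)  # stand-in for the serial RAM chip
        self.delay_bytes = delay_bytes
        self.write_addr = 0
        self.in_byte = 0    # input bits being packed, MSB first
        self.in_count = 0
        self.out_byte = 0   # delayed byte being shifted out, MSB first

    def process(self, bit_in):
        """Push one input bit and return one delayed output bit."""
        bit_out = (self.out_byte >> 7) & 1
        self.out_byte = (self.out_byte << 1) & 0xFF

        self.in_byte = ((self.in_byte << 1) | (bit_in & 1)) & 0xFF
        self.in_count += 1
        if self.in_count == 8:
            size = len(self.ram)
            self.ram[self.write_addr] = self.in_byte           # "SPI write"
            read_addr = (self.write_addr - (self.delay_bytes - 1)) % size
            self.out_byte = self.ram[read_addr]                # "SPI read"
            self.write_addr = (self.write_addr + 1) % size
            self.in_byte = 0
            self.in_count = 0
        return bit_out

rng = random.Random(1)
bits_in = [rng.randint(0, 1) for _ in range(400)]
line = BitDelayLine(ram_bytes=16, delay_bytes=4)
bits_out = [line.process(b) for b in bits_in]
delay_bits = 8 * 4
```

The delay is set only by the gap between the write and read addresses, so the RAM just needs to be at least as big as the delay. As a rough guide, at an assumed 64 kbit/s stream one second of delay would need 8000 bytes.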
I think an Arduino could handle this.
Mike