Is this even possible? (~7 second audio delay)

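The first thing you would need to do is decide what sampling rate you want to use to record the incoming signal. Then, determine whether the ADC on the Arduino is fast enough to collect that many samples per second. With the default settings, analogRead() takes about 100 microseconds, which caps you at roughly 10,000 samples per second. That is barely enough for telephone-quality voice, and for any reasonable level of audio fidelity it is not enough.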
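To put a number on the ADC limit, here is a rough back-of-the-envelope calculation. It assumes a 16 MHz AVR board (Uno or Mega 2560), the ADC prescaler of 128 that the Arduino core sets by default, and 13 ADC clocks per conversion as given in the ATmega datasheets. The constant and function names are made up for illustration, and real code using analogRead() will be a bit slower because of call overhead.

```cpp
#include <stdint.h>

// Assumed values for a 16 MHz AVR-based Arduino with the default ADC setup.
// Check the datasheet for your own board before relying on them.
constexpr uint32_t kCpuHz = 16000000UL;        // system clock
constexpr uint32_t kAdcPrescaler = 128;        // default chosen by the Arduino core
constexpr uint32_t kClocksPerConversion = 13;  // normal single conversion

// Upper bound on conversions per second (integer division, rounds down).
constexpr uint32_t maxSampleRateHz(uint32_t cpuHz, uint32_t prescaler,
                                   uint32_t clocksPerConversion) {
    return cpuHz / prescaler / clocksPerConversion;
}

// About 9.6 kHz at best.
constexpr uint32_t kMaxRateHz =
    maxSampleRateHz(kCpuHz, kAdcPrescaler, kClocksPerConversion);

// Nyquist: the highest audio frequency you can capture is half the sample rate.
constexpr uint32_t kMaxAudioHz = kMaxRateHz / 2;
```

So under these assumptions the ceiling is a bit under 10,000 samples per second, and nothing above roughly 4.8 kHz in the audio can be represented.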

Then, you need to work out how much space it will take to store 7 seconds of samples. Even at a modest 8,000 samples per second with one byte per sample, that is 56,000 bytes, far more than the poor little Arduino can hold in memory at one time. Even the Mega 2560 has only 8K of SRAM for all global variables, constants, and stack space.
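The storage arithmetic is simple enough to check yourself. This sketch assumes two example formats (8 kHz, 8-bit mono and 44.1 kHz, 16-bit mono); the names are hypothetical and the only figure taken from above is the Mega 2560's 8K of SRAM.

```cpp
#include <stdint.h>

// Bytes needed to hold `seconds` of audio at the given rate and sample size.
constexpr uint32_t bufferBytes(uint32_t sampleRateHz, uint32_t bytesPerSample,
                               uint32_t seconds) {
    return sampleRateHz * bytesPerSample * seconds;
}

constexpr uint32_t kMegaSramBytes = 8192;  // Mega 2560 SRAM, total

// 7 seconds of telephone-quality audio: 8 kHz, 8-bit, mono.
constexpr uint32_t kPhoneQualityBytes = bufferBytes(8000, 1, 7);   // 56,000

// 7 seconds of CD-rate audio, mono: 44.1 kHz, 16-bit.
constexpr uint32_t kCdRateMonoBytes = bufferBytes(44100, 2, 7);    // 617,400

// Longest delay, in milliseconds, if every byte of SRAM held 8 kHz, 8-bit
// samples. In practice less, since the sketch needs SRAM for itself too.
constexpr uint32_t kMaxDelayMs = kMegaSramBytes * 1000UL / 8000UL; // 1,024
```

In other words, even the lowest-quality format needs about seven times the Mega's entire SRAM, and the most you could buffer on-chip is about one second.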

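Then, you need to convert the sampled/stored input data back into some useful format, presumably an analog signal you can listen to. The Uno and Mega have no DAC, so that means PWM with a low-pass filter or some external hardware.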

If you want to chop 3.5 seconds out of the data being streamed out, that leaves you with 3.5 seconds of data. If you chop another 3.5 seconds, you are no longer storing anything. How do commercial devices handle this?
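To illustrate the chopping problem, here is a scaled-down sketch of a delay buffer. It is not how any particular commercial device works; the class name DelayLine and its methods are invented for this example, and 14 slots stand in for 7 seconds at a pretend rate of 2 samples per second.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical fixed-size delay line: samples go in at the tail and come
// out at the head, so the number of buffered samples is the current delay.
template <size_t Capacity>
class DelayLine {
public:
    // Store one incoming sample. Returns false (and drops it) when full.
    bool push(uint8_t sample) {
        if (count_ == Capacity) return false;
        data_[(head_ + count_) % Capacity] = sample;
        ++count_;
        return true;
    }

    // Remove the oldest sample for playback. Returns false when empty.
    bool pop(uint8_t &out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

    // "Chop": discard up to n of the oldest samples without playing them.
    // Returns how many were actually discarded.
    size_t dump(size_t n) {
        if (n > count_) n = count_;
        head_ = (head_ + n) % Capacity;
        count_ -= n;
        return n;
    }

    // Samples currently held, i.e. the delay that is left.
    size_t buffered() const { return count_; }

private:
    uint8_t data_[Capacity] = {};
    size_t head_ = 0;   // index of the oldest sample
    size_t count_ = 0;  // number of valid samples
};
```

Fill a DelayLine<14>, call dump(7) twice while pushing and popping at the same rate, and buffered() reaches zero: the delay is gone, and the only way to get it back is to take samples in faster than you play them out (or play them out slower than they arrive) for a while.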

Break the legs on the last one you have, so it doesn't walk off, too.