Do you want the peaks? Or RMS or average? For digital recording you generally want the peaks. For "loudness" you generally want RMS or average or possibly A-weighting or EBU R128 analysis, etc.
It's not only crosstalk. The ATmega datasheet says you lose resolution above a 15kHz sample rate, which means you can only sample the signal/audio (accurately) to 7.5kHz (Nyquist sampling theory). With stereo you're down to 3750Hz, and with 12 channels you're at about 625Hz per channel, so you're essentially just occasionally/randomly sampling the signal.
Most of the energy in real-world audio is in the mid & lower frequencies so you don't always need the highest audio frequencies, but you are going to lose some accuracy with slower sample rates.
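If you want to play with the numbers yourself, here is a rough sketch of that arithmetic. It assumes a single ADC shared round-robin between the channels and treats 15kHz as the total sample rate; the function name is made up for illustration.

```cpp
#include <cassert>

// Highest signal frequency (Hz) one channel can capture when a single ADC
// running at totalSampleRateHz is shared round-robin between the channels.
double perChannelNyquistHz(double totalSampleRateHz, int channels) {
    assert(channels > 0);
    double perChannelRateHz = totalSampleRateHz / channels;
    return perChannelRateHz / 2.0;  // Nyquist limit is half the sample rate
}
```

With a 15000Hz total rate this gives 7500Hz for one channel, 3750Hz for two, and 625Hz for twelve.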
Random "slow" sampling will still give you a reasonable indication of level/loudness if that's all you need but you might miss some peaks, and you will need to do some smoothing/averaging (as you would normally do).
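One common way to do the smoothing is an exponential moving average. This is only a minimal sketch, not the only way to do it; the struct name and the choice of alpha are my own assumptions.

```cpp
#include <cassert>

// Exponential smoothing (one-pole low-pass) for level readings.
// alpha near 0 = heavy smoothing, alpha near 1 = almost no smoothing.
struct LevelSmoother {
    float alpha;
    float value;
    bool primed;

    explicit LevelSmoother(float a) : alpha(a), value(0.0f), primed(false) {
        assert(a > 0.0f && a <= 1.0f);
    }

    // Feed one (already rectified) reading and get the smoothed level back.
    float update(float reading) {
        if (!primed) {
            value = reading;  // start from the first reading, not from zero
            primed = true;
        } else {
            value += alpha * (reading - value);
        }
        return value;
    }
};
```

A smaller alpha hides the gaps between slow samples better, but the meter also reacts more slowly.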
I'm not sure about the processing... Calculating 12 averages or 12 RMS values is probably going to chew up enough time to slow you down even further. Even "finding" the peak is going to chew up some processing time (and memory).
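To give an idea of the work involved per channel, here is a sketch that finds the peak, average, and RMS of one block of readings. It assumes integer ADC readings centered on a known bias value (for example 512 on a 10-bit ADC); the names are hypothetical, and you'd have to time it on real hardware to know what it costs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>

struct LevelStats {
    int peak;       // largest absolute deviation from the bias
    float average;  // mean of the absolute deviations
    float rms;      // square root of the mean squared deviation
};

// Measure one block of raw readings that sit around an assumed bias level.
LevelStats measureBlock(const int* samples, std::size_t count, int bias) {
    assert(count > 0);
    int peak = 0;
    long sumAbs = 0;
    long sumSquares = 0;
    for (std::size_t i = 0; i < count; ++i) {
        int deviation = std::abs(samples[i] - bias);
        if (deviation > peak) peak = deviation;
        sumAbs += deviation;
        sumSquares += static_cast<long>(deviation) * deviation;
    }
    LevelStats stats;
    stats.peak = peak;
    stats.average = static_cast<float>(sumAbs) / count;
    stats.rms = std::sqrt(static_cast<float>(sumSquares) / count);
    return stats;
}
```

The peak only needs a compare per sample, while RMS needs a multiply per sample plus a square root per block, which is where most of the extra time would go.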
I've used peak detectors on my sound activated lighting and it works great! One of my effects is a "giant VU meter", but it's just an effect. It's "calibrated" based on the peak & average values stored in a circular buffer and so it's not calibrated in dB or anything meaningful...
I'm sampling the peak-detector output at about 10Hz which leaves me plenty of time for processing.
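For anyone curious, the circular-buffer idea can be sketched roughly like this. It is not my actual code, just an illustration: the class name, the 8-entry history, and the "average = bottom, peak = top" mapping are all assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical self-scaling meter: remembers the last few readings and maps
// a new reading onto 0..ledCount relative to the recent average and peak.
class RelativeMeter {
public:
    static const std::size_t kSize = 8;  // assumed history length

    RelativeMeter() : next_(0), filled_(0) {}

    void add(int reading) {
        history_[next_] = reading;
        next_ = (next_ + 1) % kSize;  // wrap around, overwriting the oldest
        if (filled_ < kSize) ++filled_;
    }

    int recentPeak() const {
        assert(filled_ > 0);
        int peak = history_[0];
        for (std::size_t i = 1; i < filled_; ++i) {
            if (history_[i] > peak) peak = history_[i];
        }
        return peak;
    }

    int recentAverage() const {
        assert(filled_ > 0);
        long sum = 0;
        for (std::size_t i = 0; i < filled_; ++i) sum += history_[i];
        return static_cast<int>(sum / static_cast<long>(filled_));
    }

    // LEDs to light: 0 at or below the recent average, ledCount at the peak.
    int ledsFor(int reading, int ledCount) const {
        int lo = recentAverage();
        int hi = recentPeak();
        if (hi <= lo || reading <= lo) return 0;
        if (reading >= hi) return ledCount;
        return static_cast<int>(static_cast<long>(reading - lo) * ledCount / (hi - lo));
    }

private:
    int history_[kSize];
    std::size_t next_;
    std::size_t filled_;
};
```

At around 10 readings per second, 8 entries is less than a second of history, so a real effect would probably want a longer buffer. Because the scale follows the music, the display stays lively at any volume, but the readings mean nothing in dB.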
You'd need 3 quad op-amps* and the resistors & diodes. I'm not sure if my peak detector is fast-enough to capture a 20kHz peak (it's not important for my application). That depends on the capacitor value and the ability of the op-amp to charge the capacitor.
Oh.. The other complication is if you want accuracy down to zero-volts your op-amp needs bipolar power supplies.
* That's for a positive peak-detector that ignores the negative-half of the waveform.