Topic: DSP speed - thought experiment

Since I've built a little DSP shield for the Arduino Due and am now doing my first digital FIR filters and other stuff with it, I'm already running into the limits of its processing speed. Now I'm in the middle of a little thought experiment...

For processing in real-time, it's quite clear to me that the speed of the ARM Cortex M3 is the limit of what I can do. But if I say that things do not have to happen in real-time and latency is fine, what are the limitations then?

In other words, would it be possible to read a couple of samples (a buffer), process this buffer in more complex ways and then write the buffer out? But still in a continuous flow of buffer blocks (resulting in continuous audio)?

The idea is that I can easily do offline DSP processing. For instance, read a whole file of music, do the processing and write the processed file out again, which obviously takes no longer than the song itself lasts. So maybe I could split, for instance, a song into buffer blocks, process them block by block and get digital processing of a higher quality?

Or is it in any case critical to be able to process one sample in the time it takes to read the next sample (=sample rate)?

For a 48 kHz sample rate, this would mean ~20 µs, which isn't really much compared to the M3's speed of 80 MHz! Maybe a few hundred MAC operations at max!?
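The budget above can be checked with a few lines of arithmetic. This is only a rough sketch: it uses the Due's nominal 84 MHz core clock (the 80 MHz above is a round figure), and `CYCLES_PER_MAC` is a made-up illustrative value, not a measured one for the Cortex-M3.

```python
def sample_period_us(sample_rate_hz: float) -> float:
    """Time between two consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz


def cycles_per_sample(cpu_hz: float, sample_rate_hz: float) -> float:
    """CPU clock cycles available between two consecutive samples."""
    return cpu_hz / sample_rate_hz


CPU_HZ = 84_000_000      # nominal Arduino Due core clock
SAMPLE_RATE_HZ = 48_000
CYCLES_PER_MAC = 5       # ASSUMPTION: illustrative cost of one MAC incl. loads and loop overhead

period = sample_period_us(SAMPLE_RATE_HZ)
budget = cycles_per_sample(CPU_HZ, SAMPLE_RATE_HZ)

print(f"sample period : {period:.1f} us")
print(f"cycle budget  : {budget:.0f} cycles per sample")
print(f"MACs possible : about {budget / CYCLES_PER_MAC:.0f} per sample (under the assumption above)")
```

Under these assumptions the "few hundred MACs" guess is in the right ballpark, and that is before any interrupt, ADC or DAC overhead is subtracted.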

I'm eager to hear your thoughts on the subject! I couldn't really find satisfying information or methods on this on the internet. Maybe someone knows about books on this topic?

One more thing...

When you're using a USB audio interface, and your PC can't manage to process everything in time, you can increase the buffer size of the interface, which results in a larger latency, but everything works fine again! Can this be applied to embedded DSP too?

Magician

I'd estimate the Due's processing throughput at 250-400 ksps. Working with block-structured data is easier, so an FFT / FHT may approach 400 ksps, which means a 200 ksps sampling rate for a stereo signal, or close to 100 ksps for stereo with 50% overlap (required for windowing).
Doing it sample by sample (FIR / IIR), stereo again gives 200 ksps, and a 4-tap filter brings that down to close to 50 ksps.

Thanks for the quick reply! Seems like a reasonable approximation! So the equation "more latency = more processing power" doesn't quite hold, right?

Btw, I really like your quote "per aspera ad astra"! I admit I had to google it, though. ;)

Magician

Quote
So the equation "more latency = more processing power" doesn't quite hold, right?
Correct. The gain in performance comes from fewer interrupt calls: in real-time it's once per sample, in the other case once per block. So the equation looks hyperbolic:
T = m / (k + m), where k is the interrupt overhead and m is the useful calculation per sample.
T = (N x m) / (k + N x m). T would approach 1 in the boundary-less case of block size N -> infinity.
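To get a feel for how quickly the block formula T = (N x m) / (k + N x m) saturates, here is a small sketch. The values of `K` and `M` are arbitrary placeholders in the same time unit, chosen only for illustration, not measurements from a Due.

```python
def efficiency(n: int, k: float, m: float) -> float:
    """Fraction of CPU time spent on useful work when one interrupt serves n samples.

    k: interrupt overhead per call, m: useful calculation per sample (same time unit).
    """
    return (n * m) / (k + n * m)


K = 1.0  # ASSUMPTION: interrupt overhead, arbitrary units
M = 1.0  # ASSUMPTION: useful work per sample, same units

for n in (1, 8, 128, 1024):
    print(f"N = {n:5d}  ->  T = {efficiency(n, K, M):.4f}")
```

With these placeholder numbers most of the gain is already there at small block sizes. Beyond that, larger blocks add latency but almost no extra throughput, which is why "more latency = more processing power" does not hold.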

I'll try expanding into frequency domain! Well, at least my software... ;)

Paul Stoffregen

Quote
In other words, would it be possible to read a couple of samples (a buffer), process this buffer in more complex ways and then write the buffer out? But still in a continuous flow of buffer blocks (resulting in continuous audio)?
That's exactly how I designed the Teensy Audio Library.  It processes audio in 128 sample blocks, which is approx 2.9 ms at 44.1 kHz sample rate.
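The 2.9 ms figure is simply block size divided by sample rate. A quick sketch of that relation follows; the other block sizes are illustrative only and are not options of any particular library.

```python
def block_duration_ms(block_size: int, sample_rate_hz: float) -> float:
    """Time span covered by one block of samples, in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


SAMPLE_RATE_HZ = 44_100

# 128 is the block size mentioned above; the others are for comparison only.
for block_size in (1, 32, 128, 512):
    print(f"{block_size:4d} samples -> {block_duration_ms(block_size, SAMPLE_RATE_HZ):.3f} ms")
```

This is also the minimum latency a block adds, since a full block has to be collected before it can be processed.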

Unfortunately, it won't run directly on Arduino Due, because it makes heavy use of the Cortex-M4 DSP extensions, which aren't present in the Cortex-M3 processor on Due.  It's also built around Freescale's peripherals and DMA engine, which are different from Atmel's.

But as to your original question, most certainly yes, collecting small blocks of samples works very well.  If you give this library a try, I believe you'll see it's extremely effective.
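For anyone wondering whether block-wise processing leaves audible seams: as long as the filter state is carried from one block to the next, the output is the same as filtering the whole stream in one go. Below is a minimal sketch in plain Python. `BlockFir` is a hypothetical helper written for this post, not part of the Teensy Audio Library or any Arduino library.

```python
class BlockFir:
    """Direct-form FIR filter that processes audio block by block.

    The last len(taps) - 1 input samples are kept between calls, so
    consecutive blocks join up without discontinuities.
    """

    def __init__(self, taps):
        self.taps = list(taps)
        self.history = [0.0] * (len(self.taps) - 1)  # previous inputs, newest last

    def process(self, block):
        extended = self.history + list(block)
        n_hist = len(self.history)
        out = []
        for i in range(len(block)):
            newest = n_hist + i  # index of x[n] inside `extended`
            # y[n] = sum over k of taps[k] * x[n - k]
            out.append(sum(t * extended[newest - k] for k, t in enumerate(self.taps)))
        # Remember the tail of the input for the next block.
        self.history = extended[len(extended) - n_hist:]
        return out


taps = [0.25, 0.25, 0.25, 0.25]                 # simple 4-tap moving average
signal = [float(i % 7) for i in range(32)]      # arbitrary test signal

whole = BlockFir(taps).process(signal)          # everything in one go

fir = BlockFir(taps)
chunked = []
for start in range(0, len(signal), 8):          # same signal in blocks of 8
    chunked.extend(fir.process(signal[start:start + 8]))

print("block-wise output identical:", chunked == whole)
```

On a real microcontroller the blocks would typically be filled by DMA or an interrupt while the previous block is being processed, but the state-carrying idea is the same.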

