Using ADC with DMA

This is a continuation of the earlier post, "Speeding up analogRead() at the Arduino Zero".

In that post, the author provides code that continuously executes ADC conversions and uses DMA to place each sample in a memory buffer, at a rate of one sample every 2 usec. For this to be useful, three capabilities are needed:

  1. The main sketch can extract a current sample on demand, from among the many in the DMA buffer, while the DMA continues to fill the buffer with additional samples.
  2. The ADC routine does not compete with the main sketch for microprocessor cycles (i.e. the main sketch runs independently of the ADC). This effectively means that the microprocessor executes two threads simultaneously and independently -- a multi-thread processor.
  3. Multiple analog inputs.

Is this true? How is it set up to do so?

If not, then the scheme is merely to fill a buffer with ADC samples, wait the time needed to fill the DMA buffer, and THEN process them in the main sketch.

Hi sthudium,

Yes, this is true.

The SAMD21 on the Arduino Zero has a 12 channel DMAC (Direct Memory Access Controller). The DMAC can be used to move data from memory to memory, peripheral to memory, memory to peripheral or peripheral to peripheral, independently of the CPU.

This means the DMAC can autonomously read and write data to/from the ADC, DAC, TCC/TC timers, serial ports, SPI and I2C peripherals to/from the SAMD21's memory. The DMAC is especially useful in situations that require a large amount of data to be moved, such as a display's frame buffer, or for peripherals that require constant attention, since it serves them without constantly interrupting the CPU with interrupt service routines (ISRs).

I found manitou48's excellent DMA code on GitHub really helpful: GitHub - manitou48/ZERO.

At the heart of the DMAC is the descriptor. This is a data structure held in SRAM that the microcontroller uses to control the DMA transfer:

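typedef struct {
    uint16_t btctrl;    // block transfer control
    uint16_t btcnt;     // block transfer count
    uint32_t srcaddr;   // source address
    uint32_t dstaddr;   // destination address
    uint32_t descaddr;  // address of the next descriptor (0 = none)
} dmacdescriptor ;
volatile dmacdescriptor wrb[12] __attribute__ ((aligned (16)));        // write-back section
dmacdescriptor descriptor_section[12] __attribute__ ((aligned (16)));  // one per channel
dmacdescriptor descriptor __attribute__ ((aligned (16)));              // working copy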

It describes:

btctrl - (block transfer control) the type of data to be sent/received, be it BYTE, HWORD (half word) or WORD, known as the beat size, and whether the source or destination address is incremented during each read (useful for reading or writing from/to sequential memory locations)

btcnt - (block transfer count) the number of BYTEs, HWORDs or WORDs to transfer

srcaddr - the source address of the data to be transferred (if the source address is incremented, you actually enter the address at the end of your data block: source address + data size in bytes)

dstaddr - the destination address of the data to be transferred (if the destination address is incremented, you actually enter the address at the end of your data block: destination address + data size in bytes)

descaddr - the address of the next descriptor, which allows descriptors to be chained as a linked list and executed sequentially. The last descriptor in the list is loaded with address 0 (so if you're using only one descriptor for the transfer, descaddr is 0).
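
To make the end-address rule a bit more concrete, here is a minimal sketch that mirrors the descriptor layout on a desktop compiler and computes the value that would go into srcaddr or dstaddr when incrementing is enabled. The struct name DmacDescriptor, the helper end_address and the example SRAM address are assumptions for illustration only (they are not part of manitou48's code), and the sketch assumes the address steps by one beat at a time.

```cpp
#include <cstdint>

// Host-side mirror of the 16-byte DMAC transfer descriptor described above.
struct DmacDescriptor {
    uint16_t btctrl;    // block transfer control
    uint16_t btcnt;     // number of beats in the block
    uint32_t srcaddr;   // source address
    uint32_t dstaddr;   // destination address
    uint32_t descaddr;  // next descriptor, 0 terminates the chain
};
static_assert(sizeof(DmacDescriptor) == 16, "descriptor must be 16 bytes");

// Hypothetical helper: the address to enter when the address is incremented.
// beat_bytes is 1 (BYTE), 2 (HWORD) or 4 (WORD).
constexpr uint32_t end_address(uint32_t start, uint16_t beats, uint32_t beat_bytes) {
    return start + static_cast<uint32_t>(beats) * beat_bytes;
}
```

For a buffer of 1024 HWORD samples starting at 0x20000000, end_address(0x20000000, 1024, 2) gives 0x20000800, i.e. start + 2048 bytes.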

In the code above the "descriptor" declaration holds the current transfer descriptor.

There are 12 "descriptor_section[]" array elements, one for each of the DMAC's 12 channels, but you can have many more than this by linking the descriptors together, which is why they're stored in SRAM. This allows you to chain a number of different sequential transfers, perhaps reading then writing, all independent of the CPU.
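
As a rough host-side model of descriptor chaining (not real DMAC code), the sketch below links a few descriptors together and walks the list the way the DMAC follows descaddr until it reaches 0. The ModelDescriptor type and total_beats function are made up for this illustration, and a nullptr stands in for the terminating address 0.

```cpp
#include <cstddef>

// Simplified stand-in for a transfer descriptor.
struct ModelDescriptor {
    std::size_t beats;            // stands in for btcnt
    const ModelDescriptor* next;  // stands in for descaddr, nullptr plays the role of 0
};

// Walks the chain from the first descriptor and sums the beats of every block.
std::size_t total_beats(const ModelDescriptor* first) {
    std::size_t total = 0;
    for (const ModelDescriptor* d = first; d != nullptr; d = d->next) {
        total += d->beats;
    }
    return total;
}
```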

The wrb[] (write-back) descriptor array is used to hold a descriptor in the event that the current transfer is interrupted by a DMA transfer with a higher priority (see below). It holds the descriptor pending the completion of the higher priority transfer, whereupon the wrb entry is copied back to the descriptor to continue the transfer.

The other DMAC register worthy of note is the CHCTRLB (Channel Control B) register. This controls the channel's priority level, the trigger source and the trigger action.

DMAC->CHCTRLB.reg = DMAC_CHCTRLB_LVL(0) |
DMAC_CHCTRLB_TRIGSRC(ADC_DMAC_ID_RESRDY) | DMAC_CHCTRLB_TRIGACT_BEAT;

In terms of priority, the DMAC channels go in ascending order from 0, the highest, to 11, the lowest. In addition, however, each channel can be assigned a priority level that goes (confusingly) in descending order from 3, the highest, to 0, the lowest, and that overrides the channel number ordering. The channel selection is determined by the DMAC's arbiter.
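
The ordering can be modelled in a few lines. This is only a sketch of the rule as described here, assuming static arbitration (the DMAC can also be configured for round-robin within a level); the PendingChannel type and arbitrate function are hypothetical names, not DMAC registers.

```cpp
#include <cstdint>
#include <vector>

struct PendingChannel {
    uint8_t channel;  // DMAC channel number, 0..11
    uint8_t level;    // priority level, 0..3 (3 is the highest)
};

// Returns the channel that wins: highest level first, then lowest channel number.
// Returns -1 when nothing is pending.
int arbitrate(const std::vector<PendingChannel>& pending) {
    int winner = -1;
    int best_level = -1;
    for (const PendingChannel& p : pending) {
        if (p.level > best_level || (p.level == best_level && p.channel < winner)) {
            best_level = p.level;
            winner = p.channel;
        }
    }
    return winner;
}
```

So channel 5 at level 3 beats channel 0 at level 0, while two channels on the same level fall back to channel number order.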

The trigger source is the interrupt or event that causes a transfer to occur. This could be, for example, a timer overflow (OVF) or a serial port's receive complete (RXC), or on the other hand it could be an event or a software trigger from your loop().

The trigger action describes the block size of the data that's transferred for each trigger, either BEAT, BLOCK or TRANSACTION. If you're transferring to/from peripherals this is usually set to BEAT.
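
One way to picture the difference is to count how many triggers a transfer needs under each action. The sketch below is a plain arithmetic model with made-up names (TriggerAction, triggers_needed), assuming every block in the chain has the same number of beats.

```cpp
#include <cstdint>

enum class TriggerAction { Beat, Block, Transaction };

// Number of triggers needed to move `blocks` chained blocks of
// `beats_per_block` beats each.
uint32_t triggers_needed(TriggerAction action, uint32_t beats_per_block, uint32_t blocks) {
    switch (action) {
        case TriggerAction::Beat:        return beats_per_block * blocks;  // one trigger per beat
        case TriggerAction::Block:       return blocks;                    // one trigger per block
        case TriggerAction::Transaction: return 1;                         // one trigger for the whole chain
    }
    return 0;
}
```

With BEAT, a 1024-sample ADC buffer needs 1024 result-ready triggers, one per conversion, which is why BEAT suits peripherals that produce one value at a time.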

Optionally, the DMAC's interrupt service routine can be called, should CPU intervention be required at any stage:

void DMAC_Handler() {}

This can be used to service for example the DMAC's interrupt flags.

Once the DMAC and its descriptors have been set-up, a transfer can be initiated by simply selecting the DMAC channel and enabling it in the CHCTRLA (Channel Control A) register:

DMAC->CHID.reg = DMAC_CHID_ID(0);
DMAC->CHCTRLA.reg |= DMAC_CHCTRLA_ENABLE;

These two lines can be used repeatedly to start the transfer, (without having to initialise the descriptor each time).

The SAMD21's datasheet provides a full description of the DMAC's operation.

All in all, the DMAC can be a powerful tool that can significantly improve your CPU's performance. I guess the only downside is that its implementation is tied to the microcontroller, making the code less portable between processors.

Thanks, MartinL... that's an elegant solution.

Also, could the original scheme work if the DMA address range is limited to only one address? I think that would work, as long as the ADC-DMA operation runs in parallel with the sketch code.

Yes, by default the DMA address range is limited to one address. If for example you're logging data from the ADC RESULT register to the SAMD21's memory, you only need to increment the SAMD21's memory (destination) address each time the ADC is read.

If you take manitou48's "adcdma.ino" code:

DMAC->CHCTRLB.reg = DMAC_CHCTRLB_LVL(0) |
DMAC_CHCTRLB_TRIGSRC(ADC_DMAC_ID_RESRDY) | DMAC_CHCTRLB_TRIGACT_BEAT;
descriptor.descaddr = 0;
descriptor.srcaddr = (uint32_t) &ADC->RESULT.reg;
descriptor.btcnt =  hwords;
descriptor.dstaddr = (uint32_t)rxdata + hwords*2;   // end address
descriptor.btctrl =  DMAC_BTCTRL_BEATSIZE_HWORD | DMAC_BTCTRL_DSTINC | DMAC_BTCTRL_VALID;
memcpy(&descriptor_section[chnl],&descriptor, sizeof(dmacdescriptor));

Note that the data is read as a 16-bit HWORD (half word) from the ADC's result register (ADC->RESULT.reg). The DMAC copies the data from the ADC's result register to memory each time it receives an ADC_DMAC_ID_RESRDY (result ready) trigger, and the destination (memory) address is incremented each time (DMAC_BTCTRL_DSTINC).

So it's possible to specify addresses from one to one, one to many, many to one or many to many depending upon the situation.
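
A small software model may help picture the four cases. It is not DMAC code: model_transfer is a hypothetical function that copies 16-bit beats in a loop, with two flags playing the role of DMAC_BTCTRL_SRCINC and DMAC_BTCTRL_DSTINC.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Software model of one block transfer of 16-bit beats.
// src_inc / dst_inc decide whether each side steps through memory
// or stays on a single address (such as a peripheral register).
void model_transfer(const uint16_t* src, uint16_t* dst, std::size_t beats,
                    bool src_inc, bool dst_inc) {
    for (std::size_t i = 0; i < beats; ++i) {
        const uint16_t beat = src[src_inc ? i : 0];
        dst[dst_inc ? i : 0] = beat;
    }
}
```

The ADC logging case is one to many: src_inc is false (always the RESULT register) and dst_inc is true (consecutive buffer entries). On the real hardware the register holds a new conversion on every trigger, whereas in this model the fixed source simply repeats the same value.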

I have tried manitou48's ADC DMA code and am trying to print the time taken to fill the DMA buffer, to verify that it is 2 usec per sample. It appears to work, but then appears to stop working at random. Not sure why. Sometimes it will run for 2 or 3 seconds, sometimes for 15, sometimes for 60.

The code I added is below; it prints the time to fill the buffer and the value at adcbuf[0]. Any thoughts on why this keeps halting? I want it to run continuously.

void setup() {
    Serial.begin(9600);
    analogWriteResolution(10);
    analogWrite(A0, 64);    // test with DAC
    adc_init();
    dma_init();
}

void loop() {
    uint32_t t;

    t = micros();
    adc_dma(adcbuf, HWORDS);
    while (!dmadone);       // await DMA done isr
    t = micros() - t;
    Serial.print(t); Serial.print(" us ");
    Serial.println(adcbuf[0]);
    delay(250);
}

dacarriere, I ran into this same issue. I did not have time to fully explore the problem, but I suspect it has to do with a sync issue between the DMAC and ADC. The "dmadone" flag is never set by DMAC_Handler(), meaning that DMAC->CHINTFLAG.reg is never returning true. I also observed that this tends to occur more often when there is not an active voltage source on the A1 pin. A simple fix is to change

while(!dmadone);

to

while(!dmadone && micros() - t < 2*HWORDS + 13 );

I did a regression to get the timing equation, and it seems to scale well for most values of HWORDS.
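
If it helps, the timeout can be wrapped in a small helper. This is a host-side sketch rather than tested Zero code: the clock is passed in as a function so it can be tried off-target (on the Zero it would be micros()), the names timeout_us and wait_done are made up, and the constants are just the regression values quoted above.

```cpp
#include <cstdint>
#include <functional>

// Timeout budget in microseconds for a buffer of `hwords` samples,
// using the empirical fit quoted in the post (2 us per sample + 13 us).
constexpr uint32_t timeout_us(uint32_t hwords) {
    return 2u * hwords + 13u;
}

// Polls `done` until it becomes true or the budget expires.
// Unsigned subtraction keeps the comparison valid across a clock rollover.
bool wait_done(const volatile bool& done, const std::function<uint32_t()>& now_us,
               uint32_t budget_us) {
    const uint32_t start = now_us();
    while (!done) {
        if (now_us() - start >= budget_us) {
            return false;  // timed out, the transfer never completed
        }
    }
    return true;
}
```

Returning a bool lets the caller tell a completed transfer from a timeout, instead of silently carrying on with a buffer that was never filled.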

Additionally, if the DMAC fails to set its channel interrupt register, subsequent adc_dma() calls will grab garbage values, which makes me suspect it is unable to sync with the ADC correctly. If this happens, calling adc_init() again will reset the ADC and the DMAC works correctly again. Anyone else have insight into what causes this issue? I'm a bit flummoxed about the root cause.

I had the same problems regarding dmadone. I therefore had a look at the contents of the result array adcbuf (cleared before enabling the DMA) whenever dmadone was not set in DMAC_Handler(). Only the first entry of the buffer was ever non-zero, so it seems that the DMA transfer was never triggered by the ADC's RESRDY signal after enabling the DMA. This does indeed point to a sync problem, as stevewells20 suspected.

It turned out that a single read of the ADC result just before starting the DMA (at the end of adc_dma() in manitou48's code) can fix this:

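    while (ADC->INTFLAG.bit.RESRDY == 0);      // wait for a conversion result
    uint16_t value = ADC->RESULT.reg;          // dummy read of the result register
    ADCsync();                                 // wait for the ADC to synchronise

    DMAC->CHCTRLA.reg |= DMAC_CHCTRLA_ENABLE;  // then enable the DMA channel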

I've currently been running the code for more than two hours at a sampling time of 2.5µs (loop time 2.56 ms). So far I got a timeout (dmadone was not set) in only three instances. Thus it is still possible that the DMA transfer is not successfully started. In these very rare cases, however, the call to adc_dma() can simply be repeated.
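
Repeating the call can be done with a small retry wrapper. The sketch below is generic host-side C++ with a hypothetical run_with_retry function; on the Zero, the attempt callback would start adc_dma() and wait for dmadone with a timeout, returning false if the timeout expired.

```cpp
#include <functional>

// Calls `attempt` until it reports success or `max_attempts` is reached.
// Returns the number of attempts used, or 0 if every attempt failed.
int run_with_retry(const std::function<bool()>& attempt, int max_attempts) {
    for (int n = 1; n <= max_attempts; ++n) {
        if (attempt()) {
            return n;
        }
    }
    return 0;
}
```

Capping the number of attempts keeps the sketch from hanging forever if the transfer never starts.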

Awesome, glad to see progress on this example! nappo's code allows manitou48's example to work for me without the delay in the main loop (instead of crashing when removed).

I'm seeing performance issues with this code with respect to real(ish) time audio. I'm using it to pull in microphone samples and graph the peak-to-peak values of sample blocks. When I use analogRead, I get pretty responsive output at a decent resolution (anywhere from 256 to 2k samples at a time). (I'm snapping my fingers next to the mic and seeing large crisp spikes on my graph.) When I graph the output with this code, I notice it misses a lot of samples. (Many snaps don't register spikes.) I need to move HWORDS down to something like 64 to get a responsive output that graphs all my snaps. That's too low for my use case; I'd like to FFT larger blocks.

Does anybody have any ideas on why this code would miss so many samples? Is nappo's fix waiting too long to return?
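
For reference, the peak-to-peak measurement described above amounts to something like the following. This is a generic sketch with a hypothetical peak_to_peak function, not the poster's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Difference between the largest and smallest sample in a block.
// Returns 0 for an empty block.
uint16_t peak_to_peak(const uint16_t* samples, std::size_t count) {
    if (count == 0) {
        return 0;
    }
    const auto mm = std::minmax_element(samples, samples + count);
    return static_cast<uint16_t>(*mm.second - *mm.first);
}
```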

I was able to resolve my issues by raising the prescaler and sample control by a small amount and double buffering the ADC data I was processing. I also switched to using Adafruit's ZeroDMA since it's bundled with my board's libraries (Feather M0) and defines its own DMAC_Handler. It's available here: https://github.com/adafruit/Adafruit_ZeroDMA/blob/master/examples/zerodma_adc/zerodma_adc_example.ino
It'd be great to have another pair of eyes look it over.
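
For anyone unfamiliar with double buffering, the idea can be sketched roughly as below. This is a generic host-side illustration with a made-up PingPong class. It is not taken from the linked example and does not use the Adafruit_ZeroDMA API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Two equal buffers: one is being filled while the other is processed.
template <std::size_t N>
class PingPong {
public:
    // Buffer the DMA (or a stand-in for it) should write into next.
    std::array<uint16_t, N>& fill_buffer() { return bufs_[fill_]; }
    // Buffer holding the most recently completed block.
    const std::array<uint16_t, N>& ready_buffer() const { return bufs_[1 - fill_]; }
    // Call from the transfer-complete handler to exchange the roles.
    void swap() { fill_ = 1 - fill_; }

private:
    std::array<std::array<uint16_t, N>, 2> bufs_{};
    std::size_t fill_ = 0;
};
```

The sketch processes ready_buffer() while the next block lands in fill_buffer(), so no samples are lost as long as processing a block takes less time than filling one.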
