Ethernet2 (UDP) SPI transfers have a lot of dead time

Been doing a LOT of work and experimentation with the Due SPI/DMA and finally decided to implement all SPI transfers via DMA.

The throughput is... great!

I have an OLED1351 @ 128*128 rgb565 and can fill it at > 20 fps off SD card.
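For a sense of scale, here is the raw arithmetic behind that frame rate. It's only a back-of-envelope sketch: the function names are made up for illustration, and it counts pixel payload only, not display commands or SD overhead.

```cpp
#include <cstdint>

// Bytes in one full frame: width * height * 2 bytes per rgb565 pixel.
constexpr uint32_t frame_bytes(uint32_t w, uint32_t h) { return w * h * 2u; }

// Payload bits per second needed to push `fps` full frames (pixel data only).
constexpr uint32_t payload_bits_per_sec(uint32_t w, uint32_t h, uint32_t fps) {
    return frame_bytes(w, h) * 8u * fps;
}

static_assert(frame_bytes(128, 128) == 32768, "32 KiB per frame");
static_assert(payload_bits_per_sec(128, 128, 20) == 5242880, "about 5.2 Mbit/s at 20 fps");
```

If the SD card and the OLED share one SPI bus, every frame crosses it twice (once read, once written), so the bus carries roughly double that figure.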

The SD card runs at DIV=4 and the OLED at DIV=5, and they both play nicely.
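Assuming DIV here is the value written to the SCBR field (as in the check-div routine further down) and the Due's 84 MHz master clock, the SPI clock is simply MCK / DIV. A quick sketch, with names made up for illustration:

```cpp
#include <cstdint>

constexpr uint32_t kDueMckHz = 84000000u;  // Arduino Due master clock, 84 MHz

// SPI clock for a given SCBR divider (divider must be non-zero).
constexpr uint32_t spi_clock_hz(uint32_t div) { return kDueMckHz / div; }

static_assert(spi_clock_hz(4) == 21000000, "SD card: 21 MHz");
static_assert(spi_clock_hz(5) == 16800000, "OLED: 16.8 MHz");
```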

In the low-level routines that call write8(), I found it made a large difference whether the function was inlined or not.

I got probably a 20-30% speedup by forcing inline.
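The definition of `__INLINE__` isn't shown in this post. With GCC (which the Due toolchain uses), one way to force inlining is the `always_inline` attribute. A minimal sketch, where `FORCE_INLINE` and `swap_nibbles` are hypothetical names used only for illustration:

```cpp
#include <cstdint>

// Hypothetical macro name. always_inline is a GCC/Clang attribute that makes
// the compiler inline the function even where its heuristics would not.
#define FORCE_INLINE inline __attribute__((always_inline))

// Stand-in for a tiny per-byte helper where call overhead would dominate.
FORCE_INLINE uint8_t swap_nibbles(uint8_t b) {
    return static_cast<uint8_t>((b << 4) | (b >> 4));
}
```

A plain `inline` is only a hint, so for functions this small in a hot loop the attribute removes the guesswork.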

I also split the write8() functions in two to cut the overhead of setting the same registers multiple times: cDMA_spi_send() does the full channel setup, and cDMA_spi_send_again() only reloads the source and size.

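__INLINE__ uint8_t cDMA_spi_send_do_wait_buffer()
{
  // wait for the DMA channel to finish feeding the SPI
  while (!due_dma_dmac_channel_transfer_done(DUE_DMA_SPI_TX_CH)) {}

  // wait until the last byte has been shifted out
  while ((SPI0->SPI_SR & SPI_SR_TXEMPTY) == 0) {}

  // read RDR so it is left empty
  return SPI0->SPI_RDR;
}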

// new routines 8 bit send -- DMA --

__INLINE__ void cDMA_spi_send_again(uint8_t b, bool wait)
{
  // __src8 (declared elsewhere) must stay valid until the transfer completes
  __src8 = b;

  // only the source address and transfer size need reloading
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_SADDR = (uint32_t)&__src8;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_CTRLA = DMAC_CTRLA_BTSIZE(1) | DMAC_CTRLA_SRC_WIDTH_BYTE | DMAC_CTRLA_DST_WIDTH_BYTE;
  due_dma_dmac_channel_enable(DUE_DMA_SPI_TX_CH);

  if (wait)
  {
    cDMA_spi_send_do_wait_buffer();
  }
}

__INLINE__ void cDMA_spi_send(uint8_t b, bool wait)
{
  // due_dma_dmac_channel_disable(DUE_DMA_SPI_TX_CH);
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_DSCR = 0;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_DADDR = (uint32_t)&SPI0->SPI_TDR;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_CTRLB = DMAC_CTRLB_SRC_INCR_INCREMENTING | DMAC_CTRLB_SRC_DSCR | DMAC_CTRLB_DST_DSCR | DMAC_CTRLB_FC_MEM2PER_DMA_FC | DMAC_CTRLB_DST_INCR_FIXED;
  DMAC->DMAC_CH_NUM[DUE_DMA_SPI_TX_CH].DMAC_CFG = DMAC_CFG_DST_PER(DUE_DMA_SPI_TX_IDX) | DMAC_CFG_DST_H2SEL | DMAC_CFG_FIFOCFG_ALAP_CFG | DMAC_CFG_SOD;

  cDMA_spi_send_again(b, wait);
}

Example use:

cDMA_spi_send(b1, false);

// do some stuff here, as there are some cycles to spare before the request ends

cDMA_spi_send_do_wait_buffer(); // now wait

cDMA_spi_send_again(b2, true);
cDMA_spi_send_again(b3, true);

Being able to choose whether to wait means some work gets done "for free". For the video streamer, the pixel processing and looping cost basically nothing, because they run while I would otherwise be waiting for the DMA request to finish.
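The start / work / wait ordering can be tried on a PC with a mock in place of the DMA channel. This is only a sketch of the pattern, not the streamer code: `MockDma`, `process` and `stream` are hypothetical names, and the mock completes the transfer inside `wait()` rather than in the background.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mock stand-in for the DMA channel. On real hardware start() would return
// immediately and wait() would poll the channel and the SPI status register.
struct MockDma {
    std::vector<uint8_t> sent;  // bytes whose transfer has completed
    bool busy = false;
    uint8_t pending = 0;

    void start(uint8_t b) { pending = b; busy = true; }
    void wait() { if (busy) { sent.push_back(pending); busy = false; } }
};

// Hypothetical per-pixel work that would otherwise sit between transfers.
inline uint8_t process(uint8_t raw) { return static_cast<uint8_t>(raw ^ 0xFF); }

// Send every processed byte, computing byte i+1 while byte i is in flight.
inline void stream(MockDma& dma, const std::vector<uint8_t>& raw) {
    if (raw.empty()) return;
    uint8_t next = process(raw[0]);
    for (std::size_t i = 0; i < raw.size(); ++i) {
        dma.start(next);                  // kick off the transfer, don't wait
        if (i + 1 < raw.size())
            next = process(raw[i + 1]);   // work done during the transfer
        dma.wait();                       // only now block until it is done
    }
}
```

The point is the ordering inside the loop: the next value is prepared between `start()` and `wait()`, so on hardware that time overlaps with the transfer instead of adding to it.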

I also have 16-bit send functions that don't require changing modes etc., which also made a huge difference.

Just waiting on an AD5330 DAC so I can test sound output with video. ATM the video is running at 2x - 4x normal speed, so I'm confident I should be able to support video with sound.

I made a SPIDevice class that I use for all my projects. It does things like check the DIV every time the chip is selected. However, it's important to only reset the DIV when necessary, as it's a costly operation.

// check the divisor; really needs doing before each send to make sure each device is at the correct speed etc
// returns true if the SPI had to be reconfigured
bool cDMA_spi_check_div(uintX_t sckDivisor, bool dma)
{
	// may be SPI lib or DMA

	if (dma && last_div_dma != sckDivisor)
	{
		last_div_dma = sckDivisor;

		SPI0->SPI_CR = SPI_CR_SPIDIS; // disable SPI
		SPI0->SPI_CR = SPI_CR_SWRST;  // reset SPI
		SPI0->SPI_MR = SPI_PCS(DUE_DMA_SPI_CHIP_SEL) | SPI_MR_MODFDIS | SPI_MR_MSTR; // no mode fault detection, set master mode
		SPI0->SPI_CSR[DUE_DMA_SPI_CHIP_SEL] = SPI_CSR_SCBR((uint8_t)sckDivisor) | SPI_CSR_NCPHA; // mode 0, 8-bit

		SPI0->SPI_CR = SPI_CR_SPIEN; // enable SPI (SPI_CR is write-only, so plain assignment rather than |=)

		return true;
	}

	return false;
}
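The "only reset if it changed" idea is easy to check on a PC with a small model that counts how often the expensive path runs. A sketch only: `SpiBusModel` is a hypothetical name, and 0 is assumed to be usable as a "nothing configured yet" marker because it is not a valid SCBR divider.

```cpp
#include <cstdint>

// Host-side model of the cached-divisor check: no registers are touched.
struct SpiBusModel {
    uint8_t  last_div  = 0;  // 0 = nothing configured yet (assumed never a real divider)
    unsigned reconfigs = 0;  // how many times the costly path ran

    // Returns true if the bus had to be reconfigured, like cDMA_spi_check_div().
    bool check_div(uint8_t div) {
        if (div == last_div) return false;  // fast path: nothing to do
        last_div = div;
        ++reconfigs;  // real code would disable, reset, reprogram and re-enable SPI here
        return true;
    }
};
```

With two devices at different DIVs, the reset still happens on every switch between them, so grouping transfers per device (for example a whole SD block, then a whole run of pixels) should keep the count down.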

I suppose the other thing worth mentioning is that I totally gutted SdFat to use an external library for all SPI and made it FAT32 only. It's about as lean and mean as I can get.