You can do a memory-to-memory transfer under control of the DUE DMA, though I'm not sure why you'd want to, because memcpy() is faster! Here is a simple sketch that compares a DMA memcpy (called memcpy32()) and memset (memset32()) to the library memcpy/memset and a for-loop (dst=src) for 32-bit words.
Earlier, I did the same thing on a maple (72MHz). Here are the times in microseconds for 1000 32-bit words:
loop 269 132 for loop
memcpy32 93 68 DMA
memcpy 62 56
memset32 94 69 DMA
memset 38 41
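For anyone who wants to reproduce the non-DMA rows, here is a minimal sketch of the kind of comparison described above: a word-by-word for-loop against the library memcpy(), each timed once. This is not the original sketch. The names copy_loop, copy_memcpy, time_us and same_words are made up for illustration, and std::chrono stands in for the micros() calls you would use on the board. An optimizing host compiler may turn the for-loop into a memcpy() call, so host numbers won't look like the table.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical buffer size: 1000 32-bit words, as in the timings above.
constexpr std::size_t kWords = 1000;

// The "for loop (dst=src)" case: copy one 32-bit word per iteration.
void copy_loop(std::uint32_t *dst, const std::uint32_t *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// The library memcpy() case; n is a count of 32-bit words, not bytes.
void copy_memcpy(std::uint32_t *dst, const std::uint32_t *src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(std::uint32_t));
}

// Time a single call of fn and return the elapsed time in microseconds.
// On an Arduino, reading micros() before and after fn() plays the same role.
template <typename Fn>
double time_us(Fn fn) {
    auto t0 = std::chrono::steady_clock::now();
    fn();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

// Check that a copy really happened: true if both buffers hold the same n words.
bool same_words(const std::uint32_t *a, const std::uint32_t *b, std::size_t n) {
    return std::memcmp(a, b, n * sizeof(std::uint32_t)) == 0;
}
```

A call looks like time_us([&]{ copy_loop(dst, src, kWords); }), followed by same_words(dst, src, kWords) to confirm the copy before trusting the time.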
Looking at the disassembled memcpy() for the DUE sketch, you get an unrolled loop (64) of
and version 1.20 of newlib actually has an ARM memcpy.S that has an unrolled loop with
Try using channel 5 :) Also, try using ASAP mode.
Ah, thanks. Those changes do speed things up. I made another sketch, mem2mem2.ino, to test DUEling memcpy's. I added a DMA interrupt to the sketch and altered the DMA memcpy32() to run asynchronously. I then let the DMA memcpy battle the library memcpy() by starting the DMA memcpy32() and immediately starting the library memcpy(), with each one operating on its own src/dst pair of 1024 32-bit word vectors.
The DMA copy won the DUEl, finishing first in 34us, while the memcpy() took 80us, for a total elapsed time of 82us. Using SRAM1 for one of the src/dst pairs resulted in the DMA running in 33us and the memcpy() taking 60us, with a total elapsed time of 70us.
For the maple, the memcpy() finished first (64us) and the DMA took 122us.
I don't know what magic the DUE MATRIX might provide...
mem2mem2.ino can be found in my DUEZoo