using DMA for memory-to-memory

You can do a memory-to-memory transfer under control of the DUE DMA, though I'm not sure why you'd want to, because memcpy() is faster! Here is a simple sketch that compares a DMA memcpy (called memcpy32()) and memset (memset32()) to the library memcpy/memset and a for-loop (dst[i]=src[i]) for 32-bit words.

http://pastebin.com/UR9DvzKB

Earlier, I did the same thing on a maple (72MHz). Here are the times in microseconds for 1000 32-bit words:

                maple    DUE
    loop          269    132    for loop
    memcpy32       93     68    DMA
    memcpy         62     56
    memset32       94     69    DMA
    memset         38     41

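For anyone who wants to try the non-DMA rows of that comparison off the board, here is a rough host-side sketch of the same idea. It is not the pastebin sketch: the names copy_loop, copy_memcpy, time_us and run_comparison are made up for illustration, std::chrono stands in for micros(), and the fill pattern is arbitrary. An optimizing host compiler may well turn the for-loop into a memcpy call, so the numbers will not resemble the table.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

constexpr std::size_t WORDS = 1000;  // same vector length as the timings above

// Word-by-word copy, corresponding to the "loop" row.
void copy_loop(uint32_t *dst, const uint32_t *src, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) dst[i] = src[i];
}

// Library copy, corresponding to the "memcpy" row.
void copy_memcpy(uint32_t *dst, const uint32_t *src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(uint32_t));
}

// Time a single call of a copy routine in microseconds, using the host's
// steady clock as a stand-in for micros() on the board.
template <typename Copy>
double time_us(Copy copy, uint32_t *dst, const uint32_t *src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    auto t0 = std::chrono::steady_clock::now();
    copy(dst, src, n);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

// Run both copies, print the timings, and report whether both
// destination buffers match the source afterwards.
bool run_comparison() {
    static uint32_t src[WORDS], dst1[WORDS], dst2[WORDS];
    for (std::size_t i = 0; i < WORDS; i++) {
        src[i] = static_cast<uint32_t>(0x42424242u + i);  // arbitrary pattern
    }

    double t_loop = time_us(copy_loop, dst1, src, WORDS);
    double t_memcpy = time_us(copy_memcpy, dst2, src, WORDS);
    std::printf("loop   %.2f us\nmemcpy %.2f us\n", t_loop, t_memcpy);

    return std::memcmp(dst1, src, sizeof src) == 0 &&
           std::memcmp(dst2, src, sizeof src) == 0;
}
```

Calling run_comparison() from main() prints the two timings and returns true when both copies are intact. A single timed call is noisy, so repeating the measurement a few times is probably wise.
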
Looking at the disassembled memcpy() for the DUE sketch, you get an unrolled loop (64) of

    ldr.w
    str.w

and version 1.20 of newlib actually has an ARM memcpy.S that has an unrolled loop with

    ldrd
    strd
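
To make the unrolling idea concrete, here is a minimal sketch of an unrolled word copy written in C++. It is only an illustration, not the newlib code: copy_unrolled4 is a hypothetical name and the unroll factor of 4 is an arbitrary choice. Each assignment in the main loop corresponds roughly to one load/store pair like the ldr.w/str.w above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy n 32-bit words, four per loop iteration, so the loop counter and
// branch overhead are paid once per four words instead of once per word.
void copy_unrolled4(uint32_t *dst, const uint32_t *src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    std::size_t i = 0;

    // Main unrolled body: handles groups of four words.
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    // Tail: the remaining 0 to 3 words.
    for (; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Whether this beats a plain loop depends on the compiler and optimization level, since the compiler may already unroll the simple version on its own.
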

Try using channel 5 :) Also, try using ASAP mode.

3
45
memcpy32 31
loop 133
memset32 33
loop 108
memcpy 59
3
memset 39
42424242

Ah, thanks. Those changes do speed things up. I made another sketch, mem2mem2.ino, to test DUEling memcpy's. I added a DMA interrupt to the sketch and altered the DMA memcpy32() to run asynchronously. I then let the DMA memcpy battle the library memcpy() by starting the DMA memcpy32() and immediately starting the library memcpy(), each operating on its own src/dst pair of 1024 32-bit word vectors.

The DMA copy won the DUEl, finishing first in 34us, while the memcpy() took 80us, for a total elapsed time of 82us. Using SRAM1 for one of the src/dst pairs resulted in the DMA running in 33us and the memcpy() taking 60us, with a total elapsed time of 70us.

For the maple, the memcpy() finished first (64us) and the DMA took 122us.

I don't know what magic the DUE MATRIX might provide...

mem2mem2.ino can be found in my DUEZoo