Topic: using DMA for memory-to-memory

You can do a memory-to-memory transfer under control of the DUE DMA, though I'm not sure why you'd want to, because memcpy() is faster!  Here is a simple sketch that compares a DMA memcpy (called memcpy32()) and a DMA memset (memset32()) to the library memcpy/memset and to a for-loop (dst=src) over 32-bit words.
  http://pastebin.com/UR9DvzKB
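
For readers who cannot open the pastebin, here is a rough sketch of the non-DMA contenders in that comparison. It is not the author's code: the function names (loop_copy32, lib_copy32, loop_set32, lib_set32_bytes) are made up for illustration, and the timing calls are left out.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Plain word-by-word copy: the "for loop (dst=src)" contender.
static void loop_copy32(uint32_t *dst, const uint32_t *src, size_t words) {
  assert(dst != NULL && src != NULL);
  for (size_t i = 0; i < words; i++) {
    dst[i] = src[i];
  }
}

// Library copy: memcpy() counts bytes, so scale the word count.
static void lib_copy32(uint32_t *dst, const uint32_t *src, size_t words) {
  assert(dst != NULL && src != NULL);
  memcpy(dst, src, words * sizeof(uint32_t));
}

// Word-by-word fill with an arbitrary 32-bit value.
static void loop_set32(uint32_t *dst, uint32_t value, size_t words) {
  assert(dst != NULL);
  for (size_t i = 0; i < words; i++) {
    dst[i] = value;
  }
}

// Library fill: memset() writes a single byte value, so it can only
// produce words whose four bytes are identical (0x42 gives 0x42424242).
static void lib_set32_bytes(uint32_t *dst, uint8_t byte, size_t words) {
  assert(dst != NULL);
  memset(dst, byte, words * sizeof(uint32_t));
}
```

On the DUE each call would typically be bracketed with micros() to get elapsed microseconds. Note that the library memset() is only a fair stand-in for a word fill when the fill word has four identical bytes.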

Earlier I did the same thing on a Maple (72 MHz). Here are the times in microseconds for 1000 32-bit words:

Code:
                 maple    DUE
       loop        269    132    (for loop)
       memcpy32     93     68    (DMA)
       memcpy       62     56
       memset32     94     69    (DMA)
       memset       38     41
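
To put those times in perspective, they can be converted to throughput and to CPU cycles per word. The helpers below are a small illustrative sketch with made-up names. The 72 MHz Maple clock is from the post; the 84 MHz DUE clock is not stated in the thread and is assumed here from the board's standard configuration.

```cpp
#include <assert.h>

// Bytes moved per microsecond equals decimal megabytes per second.
static double mbytes_per_second(double elapsed_us, int words) {
  assert(elapsed_us > 0.0);
  return (words * 4.0) / elapsed_us;
}

// CPU clock cycles that pass per 32-bit word transferred.
// clock_mhz is cycles per microsecond.
static double cycles_per_word(double elapsed_us, double clock_mhz, int words) {
  assert(words > 0);
  return (elapsed_us * clock_mhz) / words;
}
```

For example, the DUE library memcpy (56 us for 1000 words) works out to about 71 MB/s and roughly 4.7 cycles per word at the assumed 84 MHz, while the DMA memcpy32 (68 us) comes to about 59 MB/s and roughly 5.7 cycles per word. The Maple memcpy (62 us at 72 MHz) is about 4.5 cycles per word.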


Looking at the disassembled memcpy() for the DUE sketch, you get an unrolled loop (64) of
ldr.w
str.w
and version 1.20 of newlib actually has an ARM memcpy.S that has an unrolled loop with
ldrd
strd
 
« Last Edit: December 21, 2012, 05:02:02 pm by mantoui »


Try using channel 5 :) Also, try using ASAP mode.

Code:
3
45
memcpy32 31
loop 133
memset32 33
loop 108
memcpy 59
3
memset 39
42424242
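
For reference, a blocking memory-to-memory copy on channel 5 with the ASAP FIFO setting might look roughly like the sketch below. This is not the code from either poster's sketch. The register and macro names follow the SAM3X CMSIS headers that ship with the Arduino Due core as best recalled, the DMA path has not been run here, and the function name dma_memcpy32 and constant kDmaChannel are hypothetical. On any target other than the SAM3X8E it simply falls back to memcpy().

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SAM3X8E__)
#include <Arduino.h>  // brings in the DMAC register definitions and pmc_* helpers
#endif

// Hypothetical constant: the channel suggested in the post above.
static const uint32_t kDmaChannel = 5;

// Copy `words` 32-bit words and wait for completion.
// The count is limited to 16 bits because it goes into the BTSIZE field.
static void dma_memcpy32(uint32_t *dst, const uint32_t *src, uint16_t words) {
#if defined(__SAM3X8E__)
  pmc_enable_periph_clk(ID_DMAC);                    // clock the DMA controller
  DMAC->DMAC_EN = DMAC_EN_ENABLE;
  DMAC->DMAC_CHDR = DMAC_CHDR_DIS0 << kDmaChannel;   // make sure the channel is idle
  (void)DMAC->DMAC_EBCISR;                           // reading clears stale status flags

  DMAC->DMAC_CH_NUM[kDmaChannel].DMAC_SADDR = (uint32_t)src;
  DMAC->DMAC_CH_NUM[kDmaChannel].DMAC_DADDR = (uint32_t)dst;
  DMAC->DMAC_CH_NUM[kDmaChannel].DMAC_DSCR = 0;      // single buffer, no descriptor list
  DMAC->DMAC_CH_NUM[kDmaChannel].DMAC_CTRLA =
      words | DMAC_CTRLA_SRC_WIDTH_WORD | DMAC_CTRLA_DST_WIDTH_WORD;
  DMAC->DMAC_CH_NUM[kDmaChannel].DMAC_CTRLB =
      DMAC_CTRLB_SRC_DSCR | DMAC_CTRLB_DST_DSCR |    // do not fetch descriptors
      DMAC_CTRLB_FC_MEM2MEM_DMA_FC |                 // memory-to-memory flow control
      DMAC_CTRLB_SRC_INCR_INCREMENTING |
      DMAC_CTRLB_DST_INCR_INCREMENTING;
  DMAC->DMAC_CH_NUM[kDmaChannel].DMAC_CFG =
      DMAC_CFG_SOD | DMAC_CFG_FIFOCFG_ASAP_CFG;      // the "ASAP" FIFO mode

  DMAC->DMAC_CHER = DMAC_CHER_ENA0 << kDmaChannel;   // start the transfer
  while (DMAC->DMAC_CHSR & (DMAC_CHSR_ENA0 << kDmaChannel)) {
    // busy-wait until the channel disables itself at end of transfer
  }
#else
  // Host or other boards: no DMAC available, so just copy.
  memcpy(dst, src, (size_t)words * sizeof(uint32_t));
#endif
}
```

Check the register setup against the pastebin sketch and the SAM3X datasheet before relying on it.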
« Last Edit: December 21, 2012, 06:17:40 pm by stimmer »



Ah, thanks.  Those changes do speed things up.  I made another sketch, mem2mem2.ino, to test DUEling memcpy's.  I added a DMA interrupt to the sketch and altered the DMA memcpy32() to run asynchronously.  I then let the DMA memcpy battle the library memcpy() by starting the DMA memcpy32() and immediately starting the library memcpy(), each operating on its own src/dst pair of 1024 32-bit word vectors.

The DMA copy won the DUEl, finishing first in 34us, while the memcpy() took 80us, for a total elapsed time of 82us.  Using SRAM1 for one of the src/dst pairs resulted in the DMA running in 33us and the memcpy() taking 60us, with a total elapsed time of 70us.

For the maple, the memcpy() finished first (64us) and the DMA took 122us.

I don't know what magic the DUE MATRIX might provide...

mem2mem2.ino can be found in my DUEZoo

  https://github.com/manitou48/DUEZoo
 
