The fastest way to move a block of RAM?

I need to swap two 10.8k blocks of RAM back and forth in a hurry. What's the fastest way to do that? Is there some sort of block move? (This is on a Teensy 3.2.)

Right now I'm using a for loop one byte at a time.

Thanks!

-jim lee

https://www.cplusplus.com/reference/cstring/memmove/ ?

Thanks! That is exactly the kind of thing that I was looking for!

-jim lee

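memmove() is not especially smart. A simple implementation just walks through the bytes one at a time:

#include <stddef.h>

void * memmove (void *dest, const void *src, size_t len)
{
  char *d = (char *) dest;
  const char *s = (const char *) src;
  if (d < s)
    while (len--)              /* dest below src: copy forward */
      *d++ = *s++;
  else if (len > 0)
    {                          /* dest above src: copy backward so overlapping bytes are not clobbered */
      const char *lasts = s + (len-1);
      char *lastd = d + (len-1);
      while (len--)
        *lastd-- = *lasts--;
    }
  return dest;
}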

Have you considered just swapping the pointers to the memory regions (if the design allows for it)?
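A minimal sketch of pointer swapping, assuming two statically allocated buffers sized like the grid later in this thread (`bufferA`, `bufferB`, `primary`, `scratch` and `swapBuffers()` are hypothetical names, not from any library). Exchanging the roles costs two pointer writes, however large the blocks are:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical buffer size: 240/8 * 320 bytes, the size of the bit-packed grid later in the thread.
constexpr std::size_t kBytes = 240 / 8 * 320;
static uint8_t bufferA[kBytes];
static uint8_t bufferB[kBytes];

// The rest of the program only touches the data through these two pointers.
uint8_t* primary = bufferA;
uint8_t* scratch = bufferB;

// Exchange roles in O(1): no bytes move, only the two pointers.
void swapBuffers() {
  std::swap(primary, scratch);
}
```

This only replaces a copy when the next pass rewrites every byte of the scratch buffer; any byte it skips would still hold data from two passes earlier.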

The Teensy 3.2 has 16 DMA channels. Maybe use one of them for memory-to-memory copies.

+1
I've used DMA on T3.2 just for that purpose.

Why not simply swap the pointers? Moving that much data from point A to point B seems to me infinitely avoidable...

Because I need a copy. One is the primary and the other is the scratch pad.

First I copy the primary to the scratch pad. Then, reading from the primary, I write the changes to the scratch pad so as not to disturb the primary. Once all the changes are complete, I can swap the pointers, but at the beginning of the next cycle I still need to make the copy.

It's the Game of Life algorithm running on a 240x360 TFT display.

-jim lee

That sounds promising. How do you access that?

-jim lee

Start by studying the DMA-related chapters in the processor's datasheet. The Teensy core includes a set of class declarations in 'DMAChannel.h' that takes away some of the burden of dealing directly with the hardware. But, it's still kind of a slog. I spent a lot of time going between the datasheet and the source code of some of the classes in the Teensy Audio Library. That library uses DMA extensively … for example see how it handles the I2S interfaces. What you want to do is quite similar. Except, instead of copying between RAM and memory-mapped I/O peripherals, you'll be copying from one section of memory to another.

memcpy() is likely to be slightly faster than memmove() or bcopy(), because its behavior is undefined when source and destination overlap, so it doesn't have to handle that case. But it looks like the compiler may be smart enough to make the "they don't overlap" decision at compile time anyway...
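A small sketch of the difference, with hypothetical helper names: shifting bytes within one buffer overlaps, so it needs `std::memmove()`, while copying between two separate arrays (like `grid` and `tempGrid` later in this thread) meets `std::memcpy()`'s no-overlap requirement:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Shift the first n-1 bytes of buf one position to the right, in place.
// Source and destination overlap, so memmove() is required; memcpy() would be undefined here.
void shiftRightByOne(unsigned char* buf, std::size_t n) {
  if (n < 2) return;
  std::memmove(buf + 1, buf, n - 1);
}

// Copy between two distinct buffers: no overlap, so memcpy() is allowed.
void copyDisjoint(unsigned char* dst, const unsigned char* src, std::size_t n) {
  std::memcpy(dst, src, n);
}
```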

memcpy() on an ARM Cortex-M3 is pretty well optimized: when both source and destination are 32-bit aligned, it uses an unrolled loop of 32-bit moves.

00081c98 <memcpy>:
   81c98:	4684      	mov	ip, r0
   81c9a:	ea41 0300 	orr.w	r3, r1, r0
   81c9e:	f013 0303 	ands.w	r3, r3, #3
   81ca2:	d149      	bne.n	81d38 <memcpy+0xa0>
   81ca4:	3a40      	subs	r2, #64	; 0x40
   81ca6:	d323      	bcc.n	81cf0 <memcpy+0x58>
   81ca8:	680b      	ldr	r3, [r1, #0]
   81caa:	6003      	str	r3, [r0, #0]
   81cac:	684b      	ldr	r3, [r1, #4]
   81cae:	6043      	str	r3, [r0, #4]
   81cb0:	688b      	ldr	r3, [r1, #8]
   81cb2:	6083      	str	r3, [r0, #8]
   81cb4:	68cb      	ldr	r3, [r1, #12]
   81cb6:	60c3      	str	r3, [r0, #12]
      :

OK, sorry guys, I was barking up the wrong tree entirely. I ASSUMED the issue was these big block moves. I just timed a memcpy() and it takes around 178 microseconds. Heck, I could do 50 of these and not notice. The bottleneck must be somewhere else.

-jim lee

Remove the delay(500);

:slight_smile:

(kidding)

Ha! Totally!

Turns out each check takes 4-6 microseconds, and 320 x 240 of them end up taking nearly 500 ms. This is the limit of my "clever", so I don't know how to speed it up.
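The arithmetic behind "nearly 500 ms":

$$
\begin{aligned}
320 \times 240 &= 76\,800 \ \text{cells per generation} \\
76\,800 \times 4\ \mu\text{s} &\approx 307\ \text{ms} \\
76\,800 \times 6\ \mu\text{s} &\approx 461\ \text{ms}
\end{aligned}
$$

For comparison, a target of 30 generations per second (an arbitrary figure, not from the thread) would leave about 33.3 ms / 76,800 ≈ 0.43 µs per cell, roughly a tenth of the measured cost, before any drawing time.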

Here's the code for anyone who's interested, just so you know what you've been helping me with.

#include <adafruit_1947_Obj.h> 
#include <screen.h> 

#define GRID_X  240
#define GRID_Y  320

byte  grid[GRID_X/8][GRID_Y];
byte  tempGrid[GRID_X/8][GRID_Y];
long  bytes = GRID_X/8 * GRID_Y;

void setup() {
   
   Serial.begin(57600);
   Serial.println("Hello?");
   if (!initScreen(ADAFRUIT_1947,ADA_1947_SHIELD_CS,PORTRAIT)) {
     Serial.println(F("Screen failed, halting program."));
     while(true);                  //  kill the program.
   }
   screen->fillScreen(&black);
   clearGrid();
   randomSeed(analogRead(A0));
   //randomFill(10000);               // Put something out there to watch.
   
   setGrid(grid,13,12,true);  // Flyer
   setGrid(grid,14,13,true);
   setGrid(grid,15,13,true);
   setGrid(grid,13,14,true);
   setGrid(grid,14,14,true);

   setGrid(grid,34,44,true);  // Med spaceship
   setGrid(grid,35,44,true);
   setGrid(grid,36,44,true);
   setGrid(grid,37,44,true);
   setGrid(grid,38,44,true);
   setGrid(grid,33,45,true);
   setGrid(grid,38,45,true);
   setGrid(grid,38,46,true);
   setGrid(grid,33,47,true);
   setGrid(grid,37,47,true);
   setGrid(grid,35,48,true);

   setGrid(grid,63,70,true);  // Blinker. Toad?
   setGrid(grid,64,70,true);
   setGrid(grid,65,70,true);
   setGrid(grid,64,71,true);
   setGrid(grid,65,71,true);
   setGrid(grid,66,71,true);
   
   paintGrid(true);
}


void randomFill(int numDots) {

   int x;
   int y;
   
   for (int i=0;i<numDots;i++) {
      x = random(0,GRID_X);
      y = random(0,GRID_Y);
      setGrid(grid,x,y,true);
   }
}


void clearGrid(void) {

   byte* gPtr;

   gPtr = (byte*) &grid[0][0];
   for (int i=0;i<GRID_X/8 * GRID_Y;i++) {
      gPtr[i] = 0;
   }
}


void gridToTemp(void) {

   void* gPtr;
   void* tPtr;

   gPtr = (void*) &grid[0][0];
   tPtr = (void*) &tempGrid[0][0];
   memcpy(tPtr,gPtr,bytes);
}


void tempToGrid(void) {
  
   byte* gPtr;
   byte* tPtr;

   gPtr = (byte*) &grid[0][0];
   tPtr = (byte*) &tempGrid[0][0];
   memcpy(gPtr,tPtr,bytes);
}


void  setGrid(byte grid[][GRID_Y],int x,int y,bool value) {

   int   xIndex;
   byte  xBit;
   
   if (x<0 || x>=GRID_X) return;
   if (y<0 || y>=GRID_Y) return;
   xIndex = x>>3;
   xBit = x - (xIndex<<3);
   if (value) {
      bitSet(grid[xIndex][y],xBit);
   } else {
      bitClear(grid[xIndex][y],xBit);
   }
}


bool getGrid(byte grid[][GRID_Y],int x,int y) {
   
   int   xIndex;
   byte  xBit;
   bool  result;

   result = false;
   if (x<0 || x>=GRID_X) return result;
   if (y<0 || y>=GRID_Y) return result;
   xIndex = x>>3;
   xBit = x - (xIndex<<3);
   result = (bool) bitRead(grid[xIndex][y],xBit);
   return result;
}



// Any live cell with two or three live neighbours survives.
// Any dead cell with three live neighbours becomes a live cell.
// All other live cells die in the next generation.
// Similarly, all other dead cells stay dead.
bool updatePoint(int x,int y) {

   byte  numLiveCells;
   int   xMinus;
   int   xPlus;
   
   numLiveCells = 0;
   xMinus = x-1;
   xPlus = x+1;
   y--;
   numLiveCells = numLiveCells + (int)getGrid(grid,xMinus,y);
   numLiveCells = numLiveCells + (int)getGrid(grid,x,y);
   numLiveCells = numLiveCells + (int)getGrid(grid,xPlus,y);
   y++;
   numLiveCells = numLiveCells + (int)getGrid(grid,xMinus,y);
   numLiveCells = numLiveCells + (int)getGrid(grid,xPlus,y);
   y++;
   numLiveCells = numLiveCells + (int)getGrid(grid,xMinus,y);
   numLiveCells = numLiveCells + (int)getGrid(grid,x,y);
   numLiveCells = numLiveCells + (int)getGrid(grid,xPlus,y);
   y--;
   if (getGrid(grid,x,y)) {
      if (numLiveCells == 2 || numLiveCells == 3) {
         return true;
      } else {
         return false;
      }
   } else {
      if (numLiveCells == 3) {
         return true;
      }
   }
   return false;
}


void stepTime(void) {

   bool  result;
   
   gridToTemp();
   for (int y=0;y<GRID_Y;y++) {
      for (int x=0;x<GRID_X;x++) {
          result = updatePoint(x,y);
          setGrid(tempGrid,x,y,result);
      }
   }
}


void paintGrid(bool fullGrid) {

   bool  tempGridVal;
   
   if (fullGrid) {
      for (int y=0;y<GRID_Y;y++) {
         for (int x=0;x<GRID_X;x++) { 
             if (getGrid(grid,x,y)) {
               screen->drawPixel(x,y,&white);
             } else {
               screen->drawPixel(x,y,&black);
             }
         }
      }
   } else {
     for (int y=0;y<GRID_Y;y++) {
         for (int x=0;x<GRID_X;x++) { 
             tempGridVal = getGrid(tempGrid,x,y);
             if (getGrid(grid,x,y)!=tempGridVal) {
                if (tempGridVal) {
                  screen->drawPixel(x,y,&white);
                } else {
                  screen->drawPixel(x,y,&black);
                }
             }
         }
      }
      tempToGrid(); 
   }
}


void loop() {
   
   stepTime();
   paintGrid(false);
}

-jim lee
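Since `stepTime()` computes every cell of `tempGrid` from `grid`, the bytes copied by `gridToTemp()` are all overwritten before they are read, and `tempToGrid()` can be replaced by an index swap. A minimal sketch of that idea, assuming the same bit packing as `setGrid()`/`getGrid()`; `grids`, `current`, `getCell()`, `setCell()`, `nextState()` and `stepGeneration()` are hypothetical names, and the Arduino/TFT parts are left out so it runs on a host:

```cpp
#include <cassert>
#include <cstdint>

constexpr int GRID_X = 240;
constexpr int GRID_Y = 320;

// Both generations live in one array; 'current' selects the live one (hypothetical layout).
static uint8_t grids[2][GRID_X / 8][GRID_Y];
static int current = 0;

// Same packing as getGrid()/setGrid(): column x lives in byte x/8, bit x%8.
bool getCell(const uint8_t g[][GRID_Y], int x, int y) {
  if (x < 0 || x >= GRID_X || y < 0 || y >= GRID_Y) return false;  // off-grid cells are dead
  return (g[x >> 3][y] >> (x & 7)) & 1;
}

void setCell(uint8_t g[][GRID_Y], int x, int y, bool alive) {
  uint8_t mask = uint8_t(1u << (x & 7));
  if (alive) g[x >> 3][y] |= mask;
  else g[x >> 3][y] &= uint8_t(~mask);
}

// Same rules as updatePoint(), but reading from an explicit source grid.
bool nextState(const uint8_t src[][GRID_Y], int x, int y) {
  int n = 0;
  for (int dy = -1; dy <= 1; ++dy)
    for (int dx = -1; dx <= 1; ++dx)
      if (dx != 0 || dy != 0) n += getCell(src, x + dx, y + dy);
  return n == 3 || (n == 2 && getCell(src, x, y));
}

// Every cell of the destination is written, so it never needs to be seeded with a copy.
void stepGeneration() {
  const uint8_t (*src)[GRID_Y] = grids[current];
  uint8_t (*dst)[GRID_Y] = grids[current ^ 1];
  for (int y = 0; y < GRID_Y; ++y)
    for (int x = 0; x < GRID_X; ++x)
      setCell(dst, x, y, nextState(src, x, y));
  current ^= 1;  // new generation becomes live; the previous one stays available
}
```

After `stepGeneration()`, `grids[current]` holds the new generation and `grids[current ^ 1]` the previous one, so a diff-based repaint can compare those two instead of calling `tempToGrid()`. This saves two 9,600-byte copies per generation, but as the timing above showed, the per-cell update is what dominates.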

Hi Jim. I've been meaning to play with Conway's Life for a while, so your code was a welcome jumping off point. I fiddled with it a bit and left comments at the top to explain what I changed. I'd be interested to hear if any of that helps with performance on your Teensy.

All I did was restate your code in a slightly different way. The algorithm is effectively unchanged. There are many articles about cleverer approaches to iterating Life. It's just a matter of how much time you want to invest in this. :slight_smile:
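One such approach, sketched here as an illustration rather than as stavs' changes (which are not shown in this thread): pack 32 cells into each `uint32_t` and count the neighbors of all 32 at once with bitwise adder logic, instead of calling a bounds-checked getter eight times per cell. The layout is an assumption: the 320-cell dimension runs along each row so it splits evenly into ten 32-bit words (240 does not), which keeps each generation at the same 9,600 bytes. All names are hypothetical, and no speedup figure is claimed:

```cpp
#include <cassert>
#include <cstdint>

constexpr int W = 320;           // cells per row; must be a multiple of 32
constexpr int H = 240;           // rows
constexpr int WORDS = W / 32;
using Grid = uint32_t[H][WORDS];

void setCell(Grid g, int x, int y, bool alive) {
  uint32_t bit = uint32_t(1) << (x % 32);
  if (alive) g[y][x / 32] |= bit;
  else g[y][x / 32] &= ~bit;
}

bool getCell(const Grid g, int x, int y) {
  return (g[y][x / 32] >> (x % 32)) & 1u;
}

// For each bit i of word w: the state of the cell one column to the left (x - 1).
static uint32_t fromLeft(const uint32_t* row, int w) {
  uint32_t v = row[w] << 1;
  if (w > 0) v |= row[w - 1] >> 31;
  return v;
}

// For each bit i of word w: the state of the cell one column to the right (x + 1).
static uint32_t fromRight(const uint32_t* row, int w) {
  uint32_t v = row[w] >> 1;
  if (w + 1 < WORDS) v |= row[w + 1] << 31;
  return v;
}

// Add one neighbor mask to 32 counters at once: s0 = 1s bit, s1 = 2s bit, s2 = sticky "4 or more".
static void addMask(uint32_t m, uint32_t& s0, uint32_t& s1, uint32_t& s2) {
  uint32_t c0 = s0 & m;
  s0 ^= m;
  uint32_t c1 = s1 & c0;
  s1 ^= c0;
  s2 |= c1;
}

// One generation, 32 cells per inner iteration. Cells off the grid are treated as dead.
void stepLife(const Grid cur, Grid next) {
  static const uint32_t empty[WORDS] = {};
  for (int y = 0; y < H; ++y) {
    const uint32_t* up = (y > 0) ? cur[y - 1] : empty;
    const uint32_t* mid = cur[y];
    const uint32_t* down = (y + 1 < H) ? cur[y + 1] : empty;
    for (int w = 0; w < WORDS; ++w) {
      uint32_t s0 = 0, s1 = 0, s2 = 0;
      addMask(fromLeft(up, w), s0, s1, s2);
      addMask(up[w], s0, s1, s2);
      addMask(fromRight(up, w), s0, s1, s2);
      addMask(fromLeft(mid, w), s0, s1, s2);
      addMask(fromRight(mid, w), s0, s1, s2);
      addMask(fromLeft(down, w), s0, s1, s2);
      addMask(down[w], s0, s1, s2);
      addMask(fromRight(down, w), s0, s1, s2);
      // Exactly 3 neighbors -> alive; exactly 2 -> keep current state; otherwise dead.
      next[y][w] = ~s2 & s1 & (s0 | mid[w]);
    }
  }
}
```

`stepLife(a, b)` reads `a` and writes every word of `b`, so it pairs naturally with swapping the two grids rather than copying.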

Sorry for not getting back to you sooner. Around the time you posted this, I ended up in the ER with a detached retina (eye falling to bits). That led to eye surgery and, well... time passed. Now I'm back, trying to type with one eye blurry and the other in rehab.

I'm glad someone got some fun out of that code I wrote. I really don't know why I wrote it. I guess I was bored, and it was a challenge to see A) whether I could do it and B) how fast I could get it running.

-jim lee

Hi Jim, hope you’ll recover quickly

Good lord! I still mess up this forum interface. I thought I was sending a direct message to stavs in reply to his. I didn't mean to broadcast it.

Anyway, thanks! I think I'm on the mend. Although they can save the retina, it seems the operation kills off the lens. So, within a year, it looks like they'll have to replace that as well. I guess I'm lucky that we live in a time when they can do this stuff.

-jim lee

Sorry for pointing it out, then. Hang in there; technology does work miracles indeed.

All the best with your vision, Jim.

Eventually this code made it onto my Teensy 4 and a 128x64 RGB panel. I don't think I've seen Life at 240 frames per second before, so thanks for that.
