I need to swap two 10.8k blocks of RAM back and forth in a hurry. What's the fastest way to do that? Is there some sort of block move? (This is on a Teensy 3.2.)
Right now I'm using a for loop one byte at a time.
Thanks!
-jim lee
Thanks! That is exactly the kind of thing that I was looking for!
-jim lee
memmove() is not super smart; the usual implementation just walks through the bytes:
void * memmove (void *dest, const void *src, size_t len)
{
  char *d = dest;
  const char *s = src;
  if (d < s)
    while (len--)
      *d++ = *s++;
  else
    {
      /* Copy backwards so an overlapping source is not overwritten before it is read. */
      const char *lasts = s + (len-1);
      char *lastd = d + (len-1);
      while (len--)
        *lastd-- = *lasts--;
    }
  return dest;
}
have you considered just swapping the pointers to the memory regions? (if the design allows for this)
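For what it's worth, here is a minimal sketch of the pointer-swap idea. The names (bufferA, bufferB, primary, scratch, swapBlocks, BLOCK_BYTES) are made up for illustration, and the 10800-byte size is just the 10.8k figure from the question. It only helps if nothing else holds on to the old addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical block size, taken from the "10.8k" in the question.
constexpr size_t BLOCK_BYTES = 10800;

// Two fixed buffers; the data itself never moves.
static uint8_t bufferA[BLOCK_BYTES];
static uint8_t bufferB[BLOCK_BYTES];

// The rest of the program only ever touches these two pointers.
uint8_t* primary = bufferA;
uint8_t* scratch = bufferB;

// Exchange the roles of the two blocks in constant time.
void swapBlocks() {
  std::swap(primary, scratch);
}
```

After swapBlocks(), whatever was written through scratch is visible through primary, and no bytes were copied.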
The Teensy 3.2 has 16 DMA channels. Maybe use those for memory-to-memory copies.
+1
I've used DMA on T3.2 just for that purpose.
Why not simply swap the pointers? Moving that much data from point A to point B seems to me infinitely avoidable...
Because I need a copy. One is the primary and the other is the scratch pad.
First I copy the primary to the scratch pad. Then, reading from the primary, I make changes that go on the scratch pad so as to not disturb the primary. Once all the changes are complete, I can swap the pointers, but at the beginning of the next cycle I still need to make the copy.
It's the Game of Life algorithm running on a 240x360 TFT display.
-jim lee
That sounds promising. How do you access that?
-jim lee
Start by studying the DMA-related chapters in the processor's datasheet. The Teensy core includes a set of class declarations in 'DMAChannel.h' that takes away some of the burden of dealing directly with the hardware. But, it's still kind of a slog. I spent a lot of time going between the datasheet and the source code of some of the classes in the Teensy Audio Library. That library uses DMA extensively … for example see how it handles the I2S interfaces. What you want to do is quite similar. Except, instead of copying between RAM and memory-mapped I/O peripherals, you'll be copying from one section of memory to another.
memcpy() is likely to be slightly faster than memmove() or bcopy(), because it's not required to be "defined" for overlapping source and destination. But it looks like the compiler might be smart enough to make the "they don't overlap" decision at compile time anyway...
memcpy() on ARM CM3 is pretty optimized - it has an unrolled loop of 32bit moves, for the portion of the memory that is 32bit aligned.
00081c98 <memcpy>:
81c98: 4684 mov ip, r0
81c9a: ea41 0300 orr.w r3, r1, r0
81c9e: f013 0303 ands.w r3, r3, #3
81ca2: d149 bne.n 81d38 <memcpy+0xa0>
81ca4: 3a40 subs r2, #64 ; 0x40
81ca6: d323 bcc.n 81cf0 <memcpy+0x58>
81ca8: 680b ldr r3, [r1, #0]
81caa: 6003 str r3, [r0, #0]
81cac: 684b ldr r3, [r1, #4]
81cae: 6043 str r3, [r0, #4]
81cb0: 688b ldr r3, [r1, #8]
81cb2: 6083 str r3, [r0, #8]
81cb4: 68cb ldr r3, [r1, #12]
81cb6: 60c3 str r3, [r0, #12]
:
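To tie that listing back to the original question: the first few instructions OR the two pointers together and test the low two bits, so the unrolled word loop is only taken when both source and destination are 4-byte aligned. Below is a small sketch of that check plus a plain memcpy() call replacing a byte-at-a-time loop. The names (takesWordPath, primaryBlock, scratchBlock, copyPrimaryToScratch, BLOCK_BYTES) are hypothetical, and other memcpy() builds may pick their fast path differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the orr/ands #3 test at the top of the listing:
// true when both pointers are 4-byte aligned.
bool takesWordPath(const void* dst, const void* src) {
  uintptr_t bits = reinterpret_cast<uintptr_t>(dst) |
                   reinterpret_cast<uintptr_t>(src);
  return (bits & 3u) == 0;
}

// Hypothetical block size, taken from the "10.8k" in the question.
constexpr size_t BLOCK_BYTES = 10800;

// alignas(4) makes sure both blocks start on a word boundary.
alignas(4) static uint8_t primaryBlock[BLOCK_BYTES];
alignas(4) static uint8_t scratchBlock[BLOCK_BYTES];

// One library call instead of a for loop over single bytes.
void copyPrimaryToScratch() {
  std::memcpy(scratchBlock, primaryBlock, BLOCK_BYTES);
}
```

Global arrays are often word aligned already, so the alignas(4) is mostly a belt-and-braces measure.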
Ok, sorry guys, I was barking up the wrong tree entirely. I ASSUMED the issue was these big data block moves. I just timed a memcpy() and it's around 178 microseconds. Heck, I could do 50 of these and not notice. The bottleneck must be somewhere else.
-jim lee
Remove the delay(500);
(kidding)
Ha! Totally!
Turns out each check takes 4-6 microseconds. 320 x 240 of them ends up taking nearly 500 ms. This is the limit of my "clever", so I don't know how to speed it up.
Here's the code for anyone who would be interested. Just so you know what you've been helping me with.
#include <adafruit_1947_Obj.h>
#include <screen.h>
#define GRID_X 240
#define GRID_Y 320
byte grid[GRID_X/8][GRID_Y];
byte tempGrid[GRID_X/8][GRID_Y];
long bytes = GRID_X/8 * GRID_Y;
void setup() {
  Serial.begin(57600);
  Serial.println("Hello?");
  if (!initScreen(ADAFRUIT_1947,ADA_1947_SHIELD_CS,PORTRAIT)) {
    Serial.println(F("Screen failed, halting program."));
    while(true); // kill the program.
  }
  screen->fillScreen(&black);
  clearGrid();
  randomSeed(analogRead(A0));
  //randomFill(10000); // Put something out there to watch.
  setGrid(grid,13,12,true); // Flyer
  setGrid(grid,14,13,true);
  setGrid(grid,15,13,true);
  setGrid(grid,13,14,true);
  setGrid(grid,14,14,true);
  setGrid(grid,34,44,true); // Med spaceship
  setGrid(grid,35,44,true);
  setGrid(grid,36,44,true);
  setGrid(grid,37,44,true);
  setGrid(grid,38,44,true);
  setGrid(grid,33,45,true);
  setGrid(grid,38,45,true);
  setGrid(grid,38,46,true);
  setGrid(grid,33,47,true);
  setGrid(grid,37,47,true);
  setGrid(grid,35,48,true);
  setGrid(grid,63,70,true); // Blinker. Toad?
  setGrid(grid,64,70,true);
  setGrid(grid,65,70,true);
  setGrid(grid,64,71,true);
  setGrid(grid,65,71,true);
  setGrid(grid,66,71,true);
  paintGrid(true);
}
void randomFill(int numDots) {
  int x;
  int y;
  for (int i=0;i<numDots;i++) {
    x = random(0,GRID_X);
    y = random(0,GRID_Y);
    setGrid(grid,x,y,true);
  }
}
void clearGrid(void) {
  byte* gPtr;
  gPtr = (byte*) &grid[0][0];
  for (int i=0;i<GRID_X/8 * GRID_Y;i++) {
    gPtr[i] = 0;
  }
}
void gridToTemp(void) {
  void* gPtr;
  void* tPtr;
  gPtr = (void*) &grid[0][0];
  tPtr = (void*) &tempGrid[0][0];
  memcpy(tPtr,gPtr,bytes);
}
void tempToGrid(void) {
  byte* gPtr;
  byte* tPtr;
  gPtr = (byte*) &grid[0][0];
  tPtr = (byte*) &tempGrid[0][0];
  memcpy(gPtr,tPtr,bytes);
}
void setGrid(byte grid[][GRID_Y],int x,int y,bool value) {
  int xIndex;
  byte xBit;
  if (x<0 || x>=GRID_X) return;
  if (y<0 || y>=GRID_Y) return;
  xIndex = x>>3;
  xBit = x - (xIndex<<3);
  if (value) {
    bitSet(grid[xIndex][y],xBit);
  } else {
    bitClear(grid[xIndex][y],xBit);
  }
}
bool getGrid(byte grid[][GRID_Y],int x,int y) {
  int xIndex;
  byte xBit;
  bool result;
  result = false;
  if (x<0 || x>=GRID_X) return result;
  if (y<0 || y>=GRID_Y) return result;
  xIndex = x>>3;
  xBit = x - (xIndex<<3);
  result = (bool) bitRead(grid[xIndex][y],xBit);
  return result;
}
// Any live cell with two or three live neighbours survives.
// Any dead cell with three live neighbours becomes a live cell.
// All other live cells die in the next generation.
// Similarly, all other dead cells stay dead.
bool updatePoint(int x,int y) {
  byte numLiveCells;
  int xMinus;
  int xPlus;
  numLiveCells = 0;
  xMinus = x-1;
  xPlus = x+1;
  y--;
  numLiveCells = numLiveCells + (int)getGrid(grid,xMinus,y);
  numLiveCells = numLiveCells + (int)getGrid(grid,x,y);
  numLiveCells = numLiveCells + (int)getGrid(grid,xPlus,y);
  y++;
  numLiveCells = numLiveCells + (int)getGrid(grid,xMinus,y);
  numLiveCells = numLiveCells + (int)getGrid(grid,xPlus,y);
  y++;
  numLiveCells = numLiveCells + (int)getGrid(grid,xMinus,y);
  numLiveCells = numLiveCells + (int)getGrid(grid,x,y);
  numLiveCells = numLiveCells + (int)getGrid(grid,xPlus,y);
  y--;
  if (getGrid(grid,x,y)) {
    if (numLiveCells == 2 || numLiveCells == 3) {
      return true;
    } else {
      return false;
    }
  } else {
    if (numLiveCells == 3) {
      return true;
    }
  }
  return false;
}
void stepTime(void) {
  bool result;
  gridToTemp();
  for (int y=0;y<GRID_Y;y++) {
    for (int x=0;x<GRID_X;x++) {
      result = updatePoint(x,y);
      setGrid(tempGrid,x,y,result);
    }
  }
}
void paintGrid(bool fullGrid) {
  bool tempGridVal;
  if (fullGrid) {
    for (int y=0;y<GRID_Y;y++) {
      for (int x=0;x<GRID_X;x++) {
        if (getGrid(grid,x,y)) {
          screen->drawPixel(x,y,&white);
        } else {
          screen->drawPixel(x,y,&black);
        }
      }
    }
  } else {
    for (int y=0;y<GRID_Y;y++) {
      for (int x=0;x<GRID_X;x++) {
        tempGridVal = getGrid(tempGrid,x,y);
        if (getGrid(grid,x,y)!=tempGridVal) {
          if (tempGridVal) {
            screen->drawPixel(x,y,&white);
          } else {
            screen->drawPixel(x,y,&black);
          }
        }
      }
    }
    tempToGrid();
  }
}
void loop() {
  stepTime();
  paintGrid(false);
}
-jim lee
Hi Jim. I've been meaning to play with Conway's Life for a while, so your code was a welcome jumping off point. I fiddled with it a bit and left comments at the top to explain what I changed. I'd be interested to hear if any of that helps with performance on your Teensy.
All I did was restate your code in a slightly different way. The algorithm is effectively unchanged. There are many articles about more clever approaches to iterating Life. It's just a matter of how much time you want to invest in this.
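As one example of those more clever approaches (not what either version of the code in this thread does), here is a rough sketch of a bit-parallel step. It assumes a toy grid only 32 cells wide, one uint32_t per row, with everything off the edge treated as dead. The name lifeStep32 is made up. A 240-wide grid would also need to carry the edge bits between neighbouring words, which this sketch leaves out.

```cpp
#include <cstddef>
#include <cstdint>

// Advance a 32-cell-wide grid by one generation.
// cur and next must be different arrays of 'rows' words each.
void lifeStep32(const uint32_t* cur, uint32_t* next, size_t rows) {
  for (size_t y = 0; y < rows; y++) {
    uint32_t above = (y > 0) ? cur[y - 1] : 0;
    uint32_t here  = cur[y];
    uint32_t below = (y + 1 < rows) ? cur[y + 1] : 0;

    // The eight neighbour directions, each as a full row of bits.
    const uint32_t n[8] = {
      above << 1, above, above >> 1,
      here  << 1,        here  >> 1,
      below << 1, below, below >> 1
    };

    // Per-column neighbour count kept in bit planes:
    // s0 = ones bit, s1 = twos bit, s2 = sticky "four or more" flag.
    uint32_t s0 = 0, s1 = 0, s2 = 0;
    for (int i = 0; i < 8; i++) {
      uint32_t carry0 = s0 & n[i];
      s0 ^= n[i];
      uint32_t carry1 = s1 & carry0;
      s1 ^= carry0;
      s2 |= carry1;
    }

    // Alive next if count is 3, or count is 2 and the cell is alive now.
    next[y] = s1 & ~s2 & (s0 | here);
  }
}
```

The point is that one pass through the inner loop updates 32 cells at once, instead of calling a getter eight times per cell.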
Sorry to not get back to you earlier. Around the time you posted this I ended up in the ER with a detached retina. (Eye falling to bits) This led to eye surgery and well.. Time passed. Now I'm back trying to type with one eye blurry and the other in rehab.
I'm glad someone got some fun out of that code I wrote. I really don't know why I wrote it. I guess I was bored and it was a challenge for me to see if I A) could do it and B) How fast I could get it running.
-jim lee
Hi Jim, hope you’ll recover quickly
Good lord! I still mess up this forum interface. I thought that was a direct message to stavs as a reply to his. Didn't mean to broadcast it.
Anyway, thanks! I think I'm on the mend. Although they can save the retina, it seems that the operation kills off your lens. So, within a year, it seems they'll have to replace that as well. I guess I'm lucky that we live in a time when they can do this stuff.
-jim lee
Sorry I pointed it out then. Hang in there, technology does work miracles indeed.
All the best with your vision, Jim.
Eventually this code made it onto my Teensy 4 and 128x64 RGB panel. I don't think I've seen Life at 240 frames per second before, so thanks for that.