Can this array rotation be better?

I'm playing with the Charlieplexed matrix on the R4 WiFi and it's pretty cool. The LEDs are numbered from 0 to 95 in 8 rows of 12, and they are stored as bits in three 32-bit unsigned variables. Think of the three 32-bit variables as one run of 96 bits laid out as an 8 × 12 array of bits (8 rows of 12 columns).
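The layout described above can be captured in two small helpers. This is a minimal sketch assuming, as the original loop below does, that LED n = row × 12 + column and that LED 0 is bit 31 of frame[0]. `setPixel` and `getPixel` are hypothetical names, not functions from the R4 WiFi LED matrix library.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical helpers for the packed frame format: LED n = row * 12 + col
// (8 rows of 12), stored MSB-first across three words, so LED 0 is bit 31
// of frame[0] and LED 95 is bit 0 of frame[2].
inline void setPixel(uint32_t frame[3], int row, int col, bool on) {
  int n = row * 12 + col;                          // linear LED index 0..95
  uint32_t mask = (uint32_t)1 << (31 - (n % 32));  // bit inside word n / 32
  if (on) frame[n / 32] |= mask;
  else    frame[n / 32] &= ~mask;
}

inline bool getPixel(const uint32_t frame[3], int row, int col) {
  int n = row * 12 + col;
  return (frame[n / 32] >> (31 - (n % 32))) & 1u;
}
```

Row 2, column 8 is LED 32, so it is the first LED that lands in frame[1].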

For what I wanted to display, I preferred to work with 12 rows of 8 instead of 8 rows of 12, so I need to rotate my array of bytes into that 32-bit format.

Here's the code I used. It works. But I'm betting there's a way to do it other than bit by bit. Or maybe a better way to iterate through it?

Just wondering what you wonderful folks might come up with.

uint32_t frame[3] = { 0, 0, 0 };

// 12 rows of 8.  MSB is top, LSB is bottom, so MSB of rot_frame[0] is pixel 1 at top left and LSB of rot_frame[11] is pixel 96 at bottom right.
byte rot_frame[12] = {0,0,0,0,0,0,0,0,0,0,0,0};

void rotateFrame(){
  for (int i=0; i<3; i++){
    frame[i] = 0;
  }
  for(int i=0; i<96; i++){
    if(rot_frame[i%12] & (1<<(7-(i/12)))){
      frame[i/32] |= (1ul << (31 - (i%32)));
    }
  }
}

TIA

You might optimise it slightly by working on the frame array with whole-word bitwise operations instead of visiting each pixel one by one, and by using bitwise operations rather than divide and modulo (although the optimiser might give you that for free).

uint32_t frame[3] = { 0, 0, 0 };
uint8_t rot_frame[12] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };

void rotateFrame() {
  for (uint8_t i = 0; i < 3; i++)  frame[i] = 0;
  
  for (uint8_t i = 0; i < 12; i++) {
    uint32_t row_data = 0;
    for (uint8_t j = 0; j < 8; j++) {
      if (rot_frame[i] & (1 << (7 - j))) {
        row_data |= (1ul << (j * 4));
      }
    }
    
    // Calculate index and shift directly
    frame[i >> 2] |= (row_data << ((i & 3) << 3));
  }
}

It performs bitwise operations on entire rows at once (32 bits at a time) rather than on individual pixels; your version iterates through each pixel, which requires several bit shifts and a write operation per pixel.

Not sure it will be faster, as optimisers do a pretty good job these days.


I ran this on an ESP32:

uint32_t frame[3] = { 0, 0, 0 };

// 12 rows of 8.  MSB is top, LSB is bottom, so MSB of rot_frame[0] is pixel 1 at top left and LSB of rot_frame[11] is pixel 96 at bottom right.
byte rot_frame[12] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

void rotateFrame1() {
  for (int i = 0; i < 3; i++) {
    frame[i] = 0;
  }
  for (int i = 0; i < 96; i++) {
    if (rot_frame[i % 12] & (1 << (7 - (i / 12)))) {
      frame[i / 32] |= (1ul << (31 - (i % 32)));
    }
  }
}


void rotateFrame2() {
  for (uint8_t i = 0; i < 3; i++)  frame[i] = 0;

  for (uint8_t i = 0; i < 12; i++) {
    uint32_t row_data = 0;
    for (uint8_t j = 0; j < 8; j++) {
      if (rot_frame[i] & (1 << (7 - j))) { // on 8 bits platform need 1ul probably 
        row_data |= (1ul << (j * 4)); // could do j << 2 but the optimiser probably does it
      }
    }

    // Calculate index and shift directly
    frame[i >> 2] |= (row_data << ((i & 3) << 3));
  }
}

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  Serial.println("TESTING!"); Serial.flush();

  delay(1000); // in case something is going on

  uint32_t chrono = micros();
  rotateFrame1();
  uint32_t deltaT1 = micros() - chrono;

  chrono = micros();
  rotateFrame2();
  uint32_t deltaT2 = micros() - chrono;

  Serial.print("∆t1 = "); Serial.println(deltaT1);
  Serial.print("∆t2 = "); Serial.println(deltaT2);

}

void loop() {}

It tells me:

TESTING!
∆t1 = 186
∆t2 = 56

So it seems about 3x faster, but I'm not sure a single micros() measurement like this can be fully trusted.

EDIT: this code is buggy.


I can't think of one. Only (potentially) faster/more efficient ways to do it bit by bit. But, as @J-M-L says, the optimiser is pretty good, so you might not save much.

Is faster better? Or some other criteria like shortness or readability of code?

What is the "thing" that you want to display?

Rather than converting the array on the fly, could it be converted once and saved?

I just thought of one, but...

It would require a huge lookup table: 12 × 256 × 3 × 4 bytes = 36 KB.

EDIT: could probably reduce that to a 3 KB lookup table (256 × 3 × 4 bytes), using some extra bit-shifting...

The code in post #3, if it is technically equivalent (I haven't proved it), seems to bring a threefold improvement.
Maybe it's good enough?

@Delta_G didn't explain what "good" is. One request was that, if possible, it should not work bit by bit.

@Delta_G 's code is better, if better == faster.

Indeed, let's wait for @Delta_G to come back.

Hmm, my example was running 3x faster.

This is my attempt, not as fast as the code from post #3, but that code does not appear to produce the same values in the frame array as the original.

void rotateFrame1() {
  //no need to clear frame, all current contents will be shifted out
  for (size_t i = 8; i-- > 0;) {
    for (size_t j = 0; j < 12; j++) {
      frame[0] = frame[0] << 1;
      frame[0] |= frame[1] >> 31;
      frame[1] = frame[1] << 1;
      frame[1] |= frame[2] >> 31;
      frame[2] = frame[2] << 1;
      frame[2] |= (rot_frame[j] >> i) & 0x01;
    }
  }
}

EDIT: Same code, just written a bit more compactly.

void rotateFrame() {
  //no need to clear frame, all current contents will be shifted out
  for (size_t i = 8; i-- > 0;) {
    for (size_t j = 0; j < 12; j++) {
      frame[0] = (frame[0] << 1) | (frame[1] >> 31);
      frame[1] = (frame[1] << 1) | (frame[2] >> 31);
      frame[2] = (frame[2] << 1) | ((rot_frame[j] >> i) & 0x01);
    }
  }
}
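Since several versions in this thread turned out not to match the original, a quick host-side check can help. This is a sketch in plain C++ (no Arduino calls) that feeds random column arrays to the original loop and to the shift-in version above, then compares the three words. `rotateRef`, `rotateShiftIn` and `sameOutput` are hypothetical names, and the arrays are passed as parameters instead of using globals.

```cpp
#include <cstdint>
#include <cstring>
#include <random>
#include <cassert>

// Reference: the loop from the first post, with the arrays passed in.
void rotateRef(uint32_t frame[3], const uint8_t rot_frame[12]) {
  frame[0] = frame[1] = frame[2] = 0;
  for (int i = 0; i < 96; i++) {
    if (rot_frame[i % 12] & (1 << (7 - (i / 12)))) {
      frame[i / 32] |= (1ul << (31 - (i % 32)));
    }
  }
}

// Candidate: the shift-in version above (no clearing needed).
void rotateShiftIn(uint32_t frame[3], const uint8_t rot_frame[12]) {
  for (int i = 8; i-- > 0;) {
    for (int j = 0; j < 12; j++) {
      frame[0] = (frame[0] << 1) | (frame[1] >> 31);
      frame[1] = (frame[1] << 1) | (frame[2] >> 31);
      frame[2] = (frame[2] << 1) | ((rot_frame[j] >> i) & 0x01);
    }
  }
}

// Runs the reference and the candidate on the same random input;
// returns true only if every trial produces identical words.
bool sameOutput(void (*candidate)(uint32_t*, const uint8_t*), int trials) {
  std::mt19937 rng(12345);  // fixed seed so a failure is reproducible
  for (int t = 0; t < trials; t++) {
    uint8_t rot[12];
    for (auto &b : rot) b = static_cast<uint8_t>(rng());
    uint32_t expected[3];
    // start the candidate from junk, to catch missing clears
    uint32_t actual[3] = {static_cast<uint32_t>(rng()),
                          static_cast<uint32_t>(rng()),
                          static_cast<uint32_t>(rng())};
    rotateRef(expected, rot);
    candidate(actual, rot);
    if (std::memcmp(expected, actual, sizeof expected) != 0) return false;
  }
  return true;
}
```

Any other candidate with the same signature can be passed to `sameOutput` in the same way.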

Probably as fast as you will get (and not what the OP wanted) but a lot of code (and a bit of fun with a text editor to produce).

void rotateFrame3() {
  frame[0] = 0;
  frame[0] |= rot_frame[0] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[1] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[2] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[3] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[4] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[5] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[6] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[7] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[8] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[9] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[10] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[11] >> 7;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[0] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[1] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[2] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[3] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[4] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[5] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[6] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[7] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[8] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[9] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[10] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[11] >> 6 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[0] >> 5 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[1] >> 5 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[2] >> 5 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[3] >> 5 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[4] >> 5 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[5] >> 5 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[6] >> 5 & 0x01;
  frame[0] = frame[0] << 1;
  frame[0] |= rot_frame[7] >> 5 & 0x01;

  frame[1] = 0;
  frame[1] |= rot_frame[8] >> 5 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[9] >> 5 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[10] >> 5 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[11] >> 5 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[0] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[1] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[2] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[3] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[4] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[5] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[6] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[7] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[8] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[9] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[10] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[11] >> 4 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[0] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[1] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[2] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[3] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[4] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[5] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[6] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[7] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[8] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[9] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[10] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[11] >> 3 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[0] >> 2 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[1] >> 2 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[2] >> 2 & 0x01;
  frame[1] = frame[1] << 1;
  frame[1] |= rot_frame[3] >> 2 & 0x01;

  frame[2] = 0;
  frame[2] |= rot_frame[4] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[5] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[6] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[7] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[8] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[9] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[10] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[11] >> 2 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[0] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[1] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[2] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[3] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[4] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[5] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[6] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[7] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[8] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[9] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[10] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[11] >> 1 & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[0] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[1] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[2] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[3] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[4] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[5] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[6] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[7] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[8] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[9] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[10] & 0x01;
  frame[2] = frame[2] << 1;
  frame[2] |= rot_frame[11] & 0x01;
}

Indeed, I just checked and got it wrong... I'll revisit later.

So, still with two nested loops and the idea of directly addressing the right bit in the frame, I would propose:

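uint32_t frame2[3]; // second output buffer, so the result can be compared with frame from the original

void rotateframe2() {
  for (int i = 0; i < 3; i++) frame2[i] = 0;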
  uint8_t frameBitPos = 0;
  for (uint8_t j = 0; j < 8; j++) {
    for (uint8_t i = 0; i < 12; i++) {
      uint8_t bitPosInElement = 31 ^ (frameBitPos & 31);
      frame2[frameBitPos >> 5] |= ((uint32_t)(rot_frame[i] >> (7 - j)) & 0x01) << bitPosInElement;
      frameBitPos++;
    }
  }
}
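The `31 ^ (frameBitPos & 31)` expression above is `31 - (frameBitPos % 32)` in disguise: 31 has all five low bits set, so subtracting any x from 0 to 31 never borrows and simply flips x's bits, which is exactly what XOR with 31 does. A small sketch that checks the identity at compile time and splits an LED index into word and bit (needs C++14 or later for the `constexpr` loop; `BitAddr` and `ledToBit` are hypothetical names):

```cpp
#include <cstdint>
#include <cassert>

// Word and bit position of one LED in the three-word frame.
struct BitAddr { uint8_t word; uint8_t bit; };

// Hypothetical helper: split a linear LED index (0..95) into word and bit.
constexpr BitAddr ledToBit(uint8_t n) {
  return { static_cast<uint8_t>(n >> 5),            // n / 32
           static_cast<uint8_t>(31 ^ (n & 31)) };   // 31 - n % 32
}

// Exhaustive check that the XOR form matches the subtraction form.
constexpr bool xorEqualsSubtract() {
  for (int x = 0; x < 32; x++)
    if ((31 ^ x) != 31 - x) return false;
  return true;
}
static_assert(xorEqualsSubtract(), "31 ^ x must equal 31 - x for 0 <= x <= 31");
```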

EDIT

It's probably as fast as @david_2018's code, since I don't do the repeated bit shifting of the whole frame; that should roughly cover the cost of the for loops and a bit of bitwise maths.

I should not make such claims without testing :slight_smile: Probably not!


Untested, and unfinished. Need to figure out how to populate rotate array!

But it might meet the challenge of not doing the rotation bit-by-bit.

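uint32_t frame[3] = { 0, 0, 0 };

// 12 rows of 8.  MSB is top, LSB is bottom, so MSB of rot_frame[0] is pixel 1 at top left and LSB of rot_frame[11] is pixel 96 at bottom right.
byte rot_frame[12] = {0,0,0,0,0,0,0,0,0,0,0,0};

// rotate[v] = frame pattern of column byte v placed in column 0 (256 x 3 x 4 bytes = 3KB).
// Global, so it starts zeroed.
uint32_t rotate[256][3];

//Needs to be called only once, eg. in setup()
void populateRotate(){
  for(int i=0; i<=255; i++){
    for(int j=0; j<8; j++){
      int k = j*12; // LED number of row j, column 0
      // bit (7-j) of the byte is row j; LED k is bit (31 - k%32) of word k/32
      rotate[i][k>>5] |= (uint32_t)((i>>(7-j)) & 1) << (31 - (k & 31));
    }
  }
}

void rotateFrame(){
  for(int j=11; j>=0; j--){
    for(int i=2; i>=0; i--){
      if(j==11) frame[i] = 0;
      frame[i] |= rotate[rot_frame[j]][i];
    }
    if(j>0){
      // shift the whole 96-bit frame one LED to the right;
      // frame[0] holds the most significant bits, so carry flows from frame[i-1] into frame[i]
      for(int i=2; i>=0; i--){
        frame[i] >>= 1;
        if (i>0) frame[i] |= frame[i-1] << 31;
      }
    }
  }
}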

Let me be clear, none of this is necessarily needed or broken. I don't have any performance issues. It just looked ugly and I wondered if there might be a clever way to do it that I was missing.

When I wrote the code, I was randomly filling the columns of 8 one light at a time: it would pick a random column, light the top LED, then move it down one step at a time so it rained down into position.
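Because each column is one byte in this layout, the falling effect reduces to shifting a single bit within that byte. A minimal sketch of one animation step, assuming MSB = top row as in the original loop; `rainStep`, `settled` and `falling` are hypothetical names, not the OP's code.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical sketch of one "rain" step for a single column byte,
// assuming MSB = top row and bit 0 = bottom row (as in the original loop).
// 'settled' holds the LEDs already in place; 'falling' is a single-bit mask
// for the drop currently moving down (0 when nothing is falling).
// Returns true when the drop has landed and been merged into 'settled'.
bool rainStep(uint8_t &settled, uint8_t &falling) {
  if (falling == 0) return false;          // nothing to move
  uint8_t below = falling >> 1;            // one row further down
  if (below != 0 && (settled & below) == 0) {
    falling = below;                       // free space: keep falling
    return false;
  }
  settled |= falling;                      // hit the bottom or a lit LED
  falling = 0;
  return true;
}
```

Each frame, the column to display would be `settled | falling`, written into rot_frame before calling the rotation.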

But really the idea was just to be able to use the array sideways in general.

Fair enough. I'll take any sort of improvement. To be honest, I wrote it, looked at it, thought it was ugly, and wondered whether I was missing something clever that makes it easier. I was really just curious what options folks here might come up with.

Would be nice if the R4 WiFi allowed the use of an array of eight 16-bit unsigned int for the image, just ignoring the upper four bits. The packed format using 32-bit unsigned integers is cumbersome, and the other format I can find is an 8x12 array of bytes.
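Packing such rows into the three words is cheap, because each 12-bit row only has to be shifted into place (two of the eight rows straddle a word boundary). A sketch, assuming bit 11 of each uint16_t is the leftmost LED; `packRows` is a hypothetical helper, not part of the R4 WiFi API. The test values are the standard heart image.

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical helper: pack eight 12-bit rows (bit 11 = leftmost LED, upper
// four bits ignored) into the three MSB-first words of the LED matrix frame.
// Viewing frame[0]:frame[1]:frame[2] as one 96-bit value, row r occupies
// bits 95-12r down to 84-12r.
void packRows(const uint16_t rows[8], uint32_t frame[3]) {
  frame[0] = frame[1] = frame[2] = 0;
  for (int r = 0; r < 8; r++) {
    uint32_t bits = rows[r] & 0x0FFF;   // drop the unused top nibble
    int shift = 84 - 12 * r;            // position of the row's LSB in the 96-bit value
    int word = 2 - shift / 32;          // word holding that LSB (frame[2] is least significant)
    int off = shift % 32;               // bit offset inside that word
    frame[word] |= bits << off;         // low part; bits above 31 are shifted out
    if (off > 20 && word > 0)           // row straddles into the next more significant word
      frame[word - 1] |= bits >> (32 - off);
  }
}
```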


This will be my last attempt; I had a thought on how to eliminate a few of the shifts:

void rotateFrame() {
  for (size_t i = 8; i-- > 0;) {
    frame[0] = (frame[0] << 12) | (frame[1] >> 20);
    frame[1] = (frame[1] << 12) | (frame[2] >> 20);
    for (size_t j = 0; j < 12; j++) {
      frame[2] = (frame[2] << 1) | ((rot_frame[j] >> i) & 0x01);
    }
  }
}

Really challenging and tempting ... :wink:

I came up with this structure of the data in the matrix:

[Image: the 8 × 12 LED matrix with its bits colour-coded by frame word]

where the yellow bits belong to frame[0], the green to frame[1] and the blue to frame[2].

If you look at the data there is a certain pattern:

  • Use the first column to apply
    • bit 7 of the first (vertical) byte to bit 31 of frame[0]
    • bit 6 to bit 19 of frame[0]
    • bit 5 to bit 7 of frame[0]
    • bit 4 to bit 27 of frame[1]
    • etc.
  • Decrement the "frame bit" of every row; if it drops below 0
    • wrap to bit 31 and increment the frame index for that row
  • Go on with vertical byte no. 2, and so on, until all 12 bytes are done.

That's the principle.
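The per-row start values in this pattern follow directly from the LED numbering: row r starts at LED 12 * r, which lives in word (12 * r) / 32 at bit 31 - (12 * r) % 32. A sketch that derives them instead of hard-coding them (`RowStart` and `rowStart` are hypothetical names):

```cpp
#include <cstdint>
#include <cassert>

// Frame word and starting bit for the first LED of one display row.
struct RowStart { uint8_t frameIndex; uint8_t bitNo; };

// Row r (0 = top) begins at LED n = 12 * r, i.e. word n / 32, bit 31 - n % 32.
constexpr RowStart rowStart(int r) {
  return { static_cast<uint8_t>((12 * r) / 32),
           static_cast<uint8_t>(31 - (12 * r) % 32) };
}
```

For r = 0..7 this gives the same numbers as the startBitNo and startFrameIndex tables in the sketch below.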

The function requires more lines but seems to be about 5 times as quick ...

See on Wokwi: https://wokwi.com/projects/375508353239239681

I used the famous heart image as an example and added a function to print the matrix result via Serial ...
[Image: the matrix printed via Serial]

(This is the standard heart image, but coded in 12 vertical bytes.)

Sketch:

/*
  Forum: https://forum.arduino.cc/t/can-this-array-rotation-be-better/1167029
  Wokwi: https://wokwi.com/projects/375508353239239681
*/

// Initialized with "heart" image
uint32_t frame[3] = {
    0x3184a444,
    0x42081100,
    0xa0040000
};

// Initialized with "heart" image
byte rot_frame[12] = {0, 96, 144, 136, 68, 34, 68, 136, 144, 96, 0, 0};

unsigned long startTime = 0;
constexpr byte NoOfTest = 10;

void setup(){
  Serial.begin(115200);
  Serial.println("Start");
  //printMatrix();
  Serial.println(F("---convert---"));
  startTime = micros();
  for (int k= 0;k<NoOfTest;k++) convertFrame();
  Serial.print((micros()-startTime)/NoOfTest);
  Serial.println(F(" microsec per conversion"));
  //printMatrix();
  Serial.println(F("---rotate----"));
  startTime = micros();
  for (int k= 0;k<NoOfTest;k++) rotateFrame();
  Serial.print((micros()-startTime)/NoOfTest);
  Serial.println(F(" microsec per rotation"));
  //printMatrix();
}

void loop(){
}

// Function to convert the 12-byte column array (8 bits per column) to the 32-bit matrix data
constexpr byte arrLen = 8;
constexpr uint8_t startBitNo[arrLen] = {31, 19,  7, 27, 15, 3, 23, 11};
constexpr uint8_t startFrameIndex[arrLen] = {0,0,0,1,1,1,2,2};
uint8_t frameIndex[arrLen] = {0,0,0,1,1,1,2,2};
uint8_t bitNo[arrLen];

void convertFrame(){
  // Prepare arrays
  memcpy(bitNo,startBitNo,sizeof bitNo);
  memcpy(frameIndex,startFrameIndex,sizeof frameIndex);
  for (int i=0; i<3; i++){
    frame[i] = 0;
  }
  // Convert from 8 bit rot_frame to 32 bit frame array
  for (int i=0;i<12;i++){
     for (int j = 0;j<arrLen;j++){
        if (rot_frame[i] & (1 << (7-j))) {
          frame[frameIndex[j]] |= (1ul << bitNo[j]);
        }
        bitNo[j]--;
        if (bitNo[j] > 31){
          bitNo[j] = 31;
          frameIndex[j]++;
        }
     }
  }
}

// Original: Shorter but takes about five times longer

void rotateFrame(){
  for (int i=0; i<3; i++){
    frame[i] = 0;
  }
  for(int i=0; i<96; i++){
    if(rot_frame[i%12] & (1<<(7-(i/12)))){
      frame[i/32] |= (1ul << (31 - (i%32)));
    }
  }
}

// Functions to print the matrix via Serial

void printBinary(uint32_t val, byte from, byte to, bool CR){
  byte bitN = to;
  for (int i = from;i<=to;i++,bitN--){
    if (val & (1uL << (bitN) ))  Serial.print("1");
                          else   Serial.print("0"); 
  }
  if (CR) Serial.println();
}

void printMatrix(){
   printBinary(frame[0],20,31,true); 
   printBinary(frame[0],8,19,true);
   printBinary(frame[0], 0, 7,false);
   printBinary(frame[1], 28,31,true);
   printBinary(frame[1],16,27,true);
   printBinary(frame[1],4,15,true);
   printBinary(frame[1],0,3,false);
   printBinary(frame[2],24,31,true);
   printBinary(frame[2],12,23,true);
   printBinary(frame[2],0,11,true);
}

I hope I did not make too many (or too severe) mistakes while working out how the matrix data is encoded ... :wink:


Hi @Delta_G ,

the structure of the data made me think of another simple way of rotation/conversion:

void rotAndConv() {
  byte mask;
  uint32_t bitToAdd;
  uint32_t row[arrLen];
  for (int i = 0; i < arrLen; i++) {
    row[i] = 0;
  };
  mask = 1;
  for (int j = 0; j < 8; j++) {
    bitToAdd = 1uL << 11;
    for (int i = 0; i < 12; i++) {
      if (rot_frame[i] & mask) {
          row[j] |= bitToAdd;
      } 
      bitToAdd = bitToAdd >> 1;
    }
    mask = mask << 1;
  }
  frame[0] = (row[7] << 20) + (row[6] << 8) + (row[5] >> 4);
  frame[1] = (row[5] << 28) + (row[4] << 16) + (row[3 << 4])+(row[2] >> 8);
  frame[2] = (row[2] << 24) + (row[1] << 12) + row[0];
}

It is more than twice as fast as my solution posted above, and dead simple:

  • Calculate the 12 bit value of each row first
  • Then shift the value of the 8 rows to the correct positions and add them to the appropriate 32 bit frame variable

See Wokwi: https://wokwi.com/projects/375568237709771777

Please check for any mistakes ... :slight_smile:


The row[3 << 4] should be row[3] << 4
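With that correction applied, and with the arrays passed as parameters, rotAndConv might look like the following sketch. `rotAndConvFixed` is a hypothetical name; `+` is replaced by `|`, which gives the same result here because the shifted rows never overlap. It is only checked against the heart image from the earlier sketch.

```cpp
#include <cstdint>
#include <cassert>

// rotAndConv with row[3] << 4, taking the arrays as parameters.
// row[k] collects display row 7 - k as a 12-bit value, leftmost column in bit 11.
void rotAndConvFixed(uint32_t frame[3], const uint8_t rot_frame[12]) {
  uint32_t row[8] = {0};
  uint8_t mask = 1;                      // bit 0 of each column byte = bottom row
  for (int j = 0; j < 8; j++) {
    uint32_t bitToAdd = 1ul << 11;       // column 0 -> bit 11
    for (int i = 0; i < 12; i++) {
      if (rot_frame[i] & mask) row[j] |= bitToAdd;
      bitToAdd >>= 1;
    }
    mask <<= 1;
  }
  frame[0] = (row[7] << 20) | (row[6] << 8) | (row[5] >> 4);
  frame[1] = (row[5] << 28) | (row[4] << 16) | (row[3] << 4) | (row[2] >> 8);
  frame[2] = (row[2] << 24) | (row[1] << 12) | row[0];
}
```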

It would be interesting to see the execution times on an actual R4 WiFi; there is a substantial difference between an UNO and a Nano 33 IoT. It also makes a bit of difference whether you use global variables or pass the arrays to the function, because of compiler optimizations.

I get a very slight improvement in my method by moving some of the shifts outside the inner for loop, but your method is faster. (Your code beats the long-winded individual-bit method on an UNO if the arrays are passed to the function.)

void rotateFrame7(uint32_t frame[3], uint8_t rot_frame[12]) {
  for (size_t i = 8; i-- > 0;) {
    if ((i & 0x01) == 1) {
      frame[0] = (frame[0] << 24) | (frame[1] >> 8);
      frame[1] = (frame[1] << 24) | (frame[2] >> 8);
    }
    for (size_t j = 0; j < 12; j++) {
      frame[2] = (frame[2] << 1) | ((rot_frame[j] >> i) & 0x01);
    }
  }
}