Data conversion too slow for multiplexing LED matrix

I want to build a big RGB LED matrix. I also want it to be easy to write new functions for animations, games, etc. Therefore I chose to use an array of bytes, with each byte representing a pixel in the LED matrix. This way a function can easily write to this array using coordinates without worrying about the hardware design.

The hardware consists of shift registers. The first byte is for the rows, the next three bytes are for blue, red and green of the first 8x8x3 matrix, the next three bytes for the next display etc.

To display the image, the array needs to be translated into something the hardware can work with. This is basically copying the bits in a different order. When I tested this on a 24x8x3 display it worked well, but I doubted it would be fast enough for bigger displays, so I tested the code for a 24x24x3 display. The code is too slow for a multiplexed display of this size.
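To put rough numbers on this, the sketch below counts the work the posted loop() does per full refresh. It is only a back-of-the-envelope aid: the function names are made up for illustration, and the counts follow from the loop structure of the sketch (8 scan rows, 3 colour bytes per 8x8 module, 8 pixels per byte, plus one row-select byte per scan row).

```cpp
#include <stdint.h>

// Hypothetical helpers, not part of the sketch: they only count loop iterations.

// bitRead/bitWrite pairs executed per full refresh (all 8 scan rows).
uint32_t bitOpsPerFrame(uint8_t hDisp, uint8_t vDisp) {
  const uint32_t rows = 8, colours = 3, pixelsPerByte = 8;
  return rows * vDisp * hDisp * colours * pixelsPerByte;
}

// Bytes pushed through shiftOut() per full refresh.
uint32_t bytesShiftedPerFrame(uint8_t hDisp, uint8_t vDisp) {
  const uint32_t rows = 8, colours = 3;
  // +1 for the row-select byte that follows the colour bytes
  return rows * ((uint32_t)hDisp * vDisp * colours + 1);
}
```

For the 24x8 display (hDisp = 3, vDisp = 1) this gives 576 bit operations and 80 shifted bytes per refresh; for 24x24 (3 x 3) it gives 1728 and 224, so the work per refresh roughly triples.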

Is there an easy way to make my code faster, or do I need a different processor?

//this first line, was last edited on 2012-04-13
//author: peertje

//Hardware
//Number of horizontal 8x8 matrices (directly connected to each other)
const int hDisp = 3;
//Number of vertical 8x8 matrices (connected with some kind of special connector)
const int vDisp = 1;
//the total number of displays
const int numberOfDisplays = hDisp * vDisp;

// Interfacing with the hardware
//Pin connected to ST_CP of 74HC595
int latchPin = 8;
//Pin connected to SH_CP of 74HC595
int clockPin = 10;
//Pin connected to DS of 74HC595
int dataPin = 9;

//declaring the necessary video memory
const int Heigth = 8 * vDisp;
const int Width = 8 * hDisp;
byte VideoMem [Heigth][Width]; // the value of a byte contains the color
// for now, only 3bit color is supported
// 0 = black
// 1 = Red
// 2 = Green
// 3 = Yellow (Red + Green)
// 4 = Blue
// 5 = Violet (Red + Blue)
// 6 = Cyan (Green + Blue)
// 7 = White (Red + Green + Blue)

//the low bit in this byte is the active hLine 
byte hLine = B00000000; //During reset we want a low current, so all rows are off


void setup() {
  //start serialcommunication for debugging
  //Serial.begin(57600);
  //Serial.print("This display is ");
  //Serial.print(Heigth,DEC);
  //Serial.print(" x ");
  //Serial.println(Width,DEC);
  
  //Setting the used pins to Output
  pinMode(latchPin, OUTPUT);
  pinMode(clockPin, OUTPUT);
  pinMode(dataPin, OUTPUT);

  //empty the shiftregisters  
  digitalWrite(latchPin, LOW);
  for(int i=0; i<(numberOfDisplays *3); i++){
    shiftOut(dataPin, clockPin,LSBFIRST, 0);
  }
  shiftOut(dataPin, clockPin, MSBFIRST, hLine);
  digitalWrite(latchPin, HIGH);
  
  //draw something for a nice test screen
  cross();
  square();
}

void loop() {
  
  //the array containing the outputdata
  byte SerialOutput [numberOfDisplays*3];
  boolean buffer;
  
  //Show one row at a time
  for (int iRow = 0; iRow<8; iRow++)
  {
    hLine = 0;
    bitWrite(hLine,iRow,true);
    
    //filling the output data array 
    int bytenr = 0;
    //Read some of the video memory and define the output data
    for (int y = vDisp; y>0; y--)
    {
      for (int x = hDisp; x>0; x--)
      {
        for (int col = 3; col>0; col--)
        {
          //first the outputbyte is empty
          SerialOutput[bytenr] = B00000000;
          for(int pixel = 0; pixel<8; pixel++){
            //get output bit from videomemory. due to electronic design, the bit needs to be inverted
            buffer = not bitRead(VideoMem[iRow+(y-1)*8][pixel+(x-1)*8],col-1); 
            if (col==3){
              //green works the other way around, due to electronic design
              bitWrite(SerialOutput[bytenr],(7 - pixel),buffer);
            }
            else
            {
              //this one is for blue and red
              bitWrite(SerialOutput[bytenr],pixel,buffer);
            }
          }
          //all pixels in this byte are handled, let's go to the next byte
          bytenr++;
        }
      }
    }
      
      //Set the latchPin low to start sending the output data
       digitalWrite(latchPin, LOW);
       
      for (bytenr=0; bytenr< (numberOfDisplays *3); bytenr++){
        shiftOut(dataPin, clockPin, LSBFIRST, SerialOutput[bytenr]);
        //Serial.println(SerialOutput[bytenr],BIN);
      }
      
      //output row
      shiftOut(dataPin, clockPin,MSBFIRST, hLine);
      //This is the end of the outputdata, set the latchpin high again
      digitalWrite(latchPin, HIGH);
      
      //If the code runs too fast :S, we can have a little coffee break now
      //delay(1);
      
      //Do some code to write in the video memory
      //animation(iRow);
  }
}

void cross()
{
  //write some data to the videomemory
  VideoMem [3][2]= B00000111;
  VideoMem [4][2]= B00000111;
  VideoMem [2][3]= B00000111;
  VideoMem [3][3]= B00000111;
  VideoMem [4][3]= B00000111;
  VideoMem [5][3]= B00000111;
  VideoMem [2][4]= B00000111;
  VideoMem [3][4]= B00000111;
  VideoMem [4][4]= B00000111;
  VideoMem [5][4]= B00000111;
  VideoMem [3][5]= B00000111;
  VideoMem [4][5]= B00000111;
}

void square()
{
  //write some data to the videomemory
  VideoMem [2][18]= B00000011;
  VideoMem [3][18]= B00000011;
  VideoMem [4][18]= B00000001;
  VideoMem [5][18]= B00000001;
  VideoMem [2][19]= B00000110;
  VideoMem [5][19]= B00000100;
  VideoMem [2][20]= B00000110;
  VideoMem [5][20]= B00000100;
  VideoMem [2][21]= B00000010;
  VideoMem [3][21]= B00000010;
  VideoMem [4][21]= B00000101;
  VideoMem [5][21]= B00000101;
}

Sure:

Change this part to use SPI.transfer() instead of shiftOut(); it will be tons faster:

// Interfacing with the hardware
//Pin connected to ST_CP of 74HC595
int latchPin = 8;
//Pin connected to SH_CP of 74HC595
int clockPin = 10;
//Pin connected to DS of 74HC595
int dataPin = 9;
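A rough sketch of what the send routine might look like with hardware SPI. Assumptions, none of which come from the posts in this thread: the shift-register data and clock lines are rewired to the board's hardware SPI pins (11 and 13 on an Uno-class board), SPI.begin() and SPI.setBitOrder(LSBFIRST) have been called in setup(), and the helper names sendByte, reverseBits and sendRow are made up. The non-Arduino branch is only a stand-in so the logic can be compiled and checked on a PC.

```cpp
#include <stdint.h>

#ifdef ARDUINO
#include <SPI.h>
// On the board: hand the byte to the SPI hardware.
static void sendByte(uint8_t b) { SPI.transfer(b); }
#else
// Off the board (hypothetical stand-in): just record what would be sent.
static uint8_t sentLog[32];
static uint8_t sentCount = 0;
static void sendByte(uint8_t b) {
  if (sentCount < sizeof(sentLog)) sentLog[sentCount++] = b;
}
#endif

// SPI uses one bit order for every byte, but the original sketch sends the
// colour bytes LSBFIRST and the row byte MSBFIRST. Reversing the row byte
// and sending it LSB-first puts the same bits on the wire.
static uint8_t reverseBits(uint8_t b) {
  uint8_t r = 0;
  for (uint8_t i = 0; i < 8; i++) {
    r = (uint8_t)((r << 1) | (b & 1));
    b >>= 1;
  }
  return r;
}

// Send one scan row: all colour bytes first, then the row-select byte.
// The caller still pulls the latch pin LOW before and HIGH after this call.
static void sendRow(const uint8_t *colourBytes, uint8_t count, uint8_t rowMask) {
  for (uint8_t i = 0; i < count; i++) {
    sendByte(colourBytes[i]);
  }
  sendByte(reverseBits(rowMask));
}
```

In loop(), sendRow(SerialOutput, numberOfDisplays * 3, hLine) would take the place of the shiftOut() calls between the two latch writes.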

You could also change the digitalWrites to direct port manipulations, save a lot of time there too.

To clear a bit: PORTC = PINC & B11111101; // clears PortC, pin1

To set a bit: PORTC = PINC & B00000010; // sets PortC, pin1

Is there an easy way to make my code faster, or do I need a different processor?

seldom, but…

2 tips:

  • use the smallest possible variable types: uint8_t instead of int where possible
  • make use of precalculated values.

I see 5 nested loops that use an int as index.

Replace int with uint8_t where possible. A uint8_t (aka byte) can go from 0…255 and will often be enough. (Watch out: a byte cannot be negative, so don’t test for < 0.)
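The warning about unsigned counters can be shown with a small sketch. Both functions are hypothetical helpers, written only to show the safe loop shape and the wrap-around; they assume nothing about the display hardware.

```cpp
#include <stdint.h>

// Counts how many times a count-down loop body runs for indices n-1 .. 0.
// Safe form: test i > 0 and use i - 1 as the index.
uint8_t countDownVisits(uint8_t n) {
  uint8_t visits = 0;
  for (uint8_t i = n; i > 0; i--) {
    uint8_t index = i - 1;  // runs from n-1 down to 0
    (void)index;            // a real loop would use index here
    visits++;
  }
  return visits;
}

// The broken form would be:
//   for (uint8_t i = n - 1; i >= 0; i--) { ... }
// i >= 0 is always true for an unsigned type, so after 0 the counter
// wraps around and the loop never ends.
uint8_t wrapAfterZero() {
  uint8_t i = 0;
  i--;       // unsigned arithmetic wraps around
  return i;  // 255
}
```

This is why the refactored loops count from vDisp, hDisp and 3 down to 1 and subtract one when computing the index.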

just a quick refactor

void loop() 
{
	byte SerialOutput [numberOfDisplays*3];  //the array containing the outputdata
	boolean buffer;

	for (uint8_t iRow = 0; iRow<8; iRow++)
	{
		hLine = 0;
		bitWrite(hLine, iRow, true);
		
		int bytenr = 0;
		
		for (uint8_t y = vDisp; y>0; y--)
		{
			uint8_t tempY = iRow+(y-1)*8;     // precalculated
			
			for (uint8_t x = hDisp; x>0; x--)
			{
				uint8_t tempX = (x-1)*8;      // precalculated

				for (uint8_t col = 3; col>0; col--)
				{
					uint8_t tempZ = col-1;      // precalculated  
					
					// SerialOutput[bytenr] = B00000000;  // not needed: the loop below writes all 8 bits
					for(uint8_t pixel = 0; pixel<8; pixel++)
					{
						//get output bit from videomemory. due to electronic design, the bit needs to be inverted
						buffer = not bitRead(VideoMem[tempY][pixel+tempX], tempZ);
						
						if (col == 3)
						{
							//green works the other way around, due to electronic design
							bitWrite(SerialOutput[bytenr],(7 - pixel), buffer);
						}
						else
						{
							//this one is for blue and red
							bitWrite(SerialOutput[bytenr], pixel, buffer);
						}
					}
					bytenr++;
				}
			}
		}
		
		//Set the latchPin low to start sending the output data
		digitalWrite(latchPin, LOW);
		
		for (bytenr=0; bytenr< (numberOfDisplays *3); bytenr++)
		{
			shiftOut(dataPin, clockPin, LSBFIRST, SerialOutput[bytenr]);
			//Serial.println(SerialOutput[bytenr],BIN);
		}
		
		//output row
		shiftOut(dataPin, clockPin,MSBFIRST, hLine);
		//This is the end of the outputdata, set the latchpin high again
		digitalWrite(latchPin, HIGH);
		
		//If the code runs too fast :S, we can have a little coffee break now
		//delay(1);
		
		//Do some code to write in the video memory
		//animation(iRow);
	}
}

Give it a try

Another thing to consider is to move the shiftOut into the nested loops, so that a byte is shifted out as soon as it is filled.
That would remove the need for the SerialOutput array and thus its indexing (an address calculation on every access).

OK, let’s add that in. I don’t know if it all works, but at least you get the idea of how to optimize.

void loop() 
{
	byte output = 0;
	boolean buffer;

	for (uint8_t iRow = 0; iRow<8; iRow++)
	{
		hLine = 0;
		bitWrite(hLine, iRow, true);
		
		// latch low once per row, before the first byte is shifted
		digitalWrite(latchPin, LOW);
		
		for (uint8_t y = vDisp; y>0; y--)
		{
			uint8_t tempY = iRow+(y-1)*8;     // precalculated
			
			for (uint8_t x = hDisp; x>0; x--)
			{
				uint8_t tempX = (x-1)*8;      // precalculated

				for (uint8_t col = 3; col>0; col--)
				{
					uint8_t tempZ = col-1;      // precalculated  
					
					for(uint8_t pixel = 0; pixel<8; pixel++)
					{
						buffer = not bitRead(VideoMem[tempY][pixel+tempX], tempZ);
						
						if (col == 3)
						{
							//green works the other way around, due to electronic design
							bitWrite(output,(7 - pixel), buffer);
						}
						else
						{
							//this one is for blue and red
							bitWrite(output, pixel, buffer);
						}
					}
					// output is filled, let's shift it out
					shiftOut(dataPin, clockPin, LSBFIRST, output);
				}
			}
		}
		
		// the row byte goes last, then everything is latched at once
		shiftOut(dataPin, clockPin,MSBFIRST, hLine);
		digitalWrite(latchPin, HIGH);

	}
}

CrossRoads: You could also change the digitalWrites to direct port manipulations, save a lot of time there too.

To clear a bit: PORTC = PINC & B11111101; // clears PortC, pin1

To set a bit: PORTC = PINC & B00000010; // sets PortC, pin1

I think that second one should be an OR instead of an AND.

Yeah, I keep doing that - cut & paste and forget to change it. Thanks.
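For reference, a minimal sketch of the two operations with the correction applied: AND with an inverted mask to clear, OR with the mask to set. The helper names clearBit and setBit are made up for this example. It also modifies the output register itself instead of reading PINC, which is an assumption about what is wanted here, since PINC returns the pin levels rather than the last value written.

```cpp
#include <stdint.h>

// Clear one bit: AND with a mask that has only that bit at 0.
static inline void clearBit(volatile uint8_t &reg, uint8_t bit) {
  reg = reg & (uint8_t)~(1u << bit);
}

// Set one bit: OR with a mask that has only that bit at 1.
static inline void setBit(volatile uint8_t &reg, uint8_t bit) {
  reg = reg | (uint8_t)(1u << bit);
}
```

On an AVR board the calls would look like clearBit(PORTC, 1) and setBit(PORTC, 1). Assuming an ATmega328-based board such as the Uno, the latch pin 8 from the sketch is bit 0 of PORTB, so the latch writes would become clearBit(PORTB, 0) and setBit(PORTB, 0).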

CrossRoads & robtillaart have good ideas and that may be enough of a speed boost for you.

If not, the other thing you could do to substantially boost speed is to use bit planes instead of pixel bytes.

Right now you're storing each pixel as one byte, and when you draw the matrix you have to pull the colour bits out of every byte each time. This is slow. Alternatively, you could set up one group of bytes = one colour bit of all pixels (packed eight pixels to a byte), the next group the next colour bit, etc. Then, when you update your shift registers you'll just be shifting out the bit planes; no other conversion is necessary, so it will be very fast.

The downside to this is you're inverting the task, so writing a new value to a pixel will require you to pull THOSE bits apart and store them across the planes. This will be more complicated and that part of the code will take longer to run, but my guess is you're strobing the matrix many, many more times than you update pixels, so it will still be a net gain.
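A minimal sketch of the bit-plane layout, assuming the 24x8 display and the 3-bit colour coding from the first post (bit 0 = red, bit 1 = green, bit 2 = blue). The names planes, setPixel and rowBytes are invented for this example, and the bit inversion and mirrored green order of the original hardware are left out; they could be folded into setPixel.

```cpp
#include <stdint.h>

const uint8_t WIDTH = 24;   // assumed 24x8 display
const uint8_t HEIGHT = 8;
const uint8_t BYTES_PER_ROW = WIDTH / 8;

// One plane per colour bit: plane 0 = red, 1 = green, 2 = blue.
uint8_t planes[3][HEIGHT][BYTES_PER_ROW];

// Slow path: split a 3-bit colour across the three planes.
void setPixel(uint8_t x, uint8_t y, uint8_t colour) {
  uint8_t byteIndex = x >> 3;               // which byte in the row
  uint8_t mask = (uint8_t)(1u << (x & 7));  // which bit inside that byte
  for (uint8_t c = 0; c < 3; c++) {
    if (colour & (1u << c)) {
      planes[c][y][byteIndex] |= mask;
    } else {
      planes[c][y][byteIndex] &= (uint8_t)~mask;
    }
  }
}

// Fast path: the bytes for one scan row of one colour are already packed,
// so the refresh loop only has to hand them to shiftOut() or SPI.transfer().
const uint8_t *rowBytes(uint8_t colourBit, uint8_t y) {
  return planes[colourBit][y];
}
```

With this layout the refresh loop does no bitRead or bitWrite at all, and the frame buffer for 24x8 shrinks from 192 bytes to 72. The cost moves into setPixel, which touches three bytes per pixel.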

As an aside, the old IBM VGA display used this technique back in the days when 640x480 was high resolution and 286s ruled the land.