
Topic: LED display Issues


Oct 21, 2009, 05:53 am Last Edit: Oct 21, 2009, 06:01 am by inertia Reason: 1
Hello all,

I have been having a wonderful time moving from the BASIC Stamp to Arduino, but I am working on an interesting project that I am having a bit of trouble wrapping my head around.

I have a massive 64x16 LED display that I custom built; it is common cathode along the 16 rows.

I currently have a stockpile of 74HC595N shift registers, so that's where my initial design was drawn from.  I created a circuit using 10 shift registers to drive the LED anodes and then had two separate shift registers to drive 16 transistors to control the cathodes.

I am able to easily control each row individually using shiftOut() on the Arduino, but when I try to get more than one row to display I get terrible flickering.  Since I obviously can't drive all these LEDs at once, I am trying to display one row at a time.  The flickering is horrible and the brightness is drastically decreased.

I have been doing some reading and finding some information about clockspeed causing the issue, but I am completely confused by this. If this is it, where can I find some decent information on choosing an appropriate clock speed for a smooth display?

Also I have been looking into using the TLC5941, which is a dedicated LED driver.  I'm assuming this is a much better option, but even then I'd rather use what I have if it is possible.

Thanks in advance for any assistance.


The flickering is horrible and the brightness is drastically decreased.

The brightness will be reduced by multiplexing because each LED is only on part of the time. You can reduce the resistor value to increase the current. It will also flicker unless each row is lit at least 32 times a second.
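A rough sketch of the arithmetic behind that refresh figure, for anyone who wants numbers. The function names are made up for illustration and the 16 MHz clock is an assumption about the board, not something stated in this thread. With 16 rows scanned one at a time, each LED can be lit for at most 1/16 of the time, and the time you have to load and show one row shrinks as the refresh rate goes up:

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Microseconds available to load and display one row when `rows` rows
 * are scanned one at a time, `refresh_hz` full frames per second. */
uint32_t row_budget_us(uint32_t rows, uint32_t refresh_hz)
{
    return (uint32_t)(1000000UL / (rows * refresh_hz));
}

/* CPU cycles available per row for a given CPU clock in Hz
 * (assumption: 16000000 for a typical 16 MHz Arduino). */
uint32_t row_budget_cycles(uint32_t rows, uint32_t refresh_hz,
                           uint32_t f_cpu_hz)
{
    return f_cpu_hz / (rows * refresh_hz);
}

/* Largest possible on-time of one LED in parts per thousand (1/rows). */
uint32_t max_duty_permille(uint32_t rows)
{
    return 1000UL / rows;
}
```

For 16 rows at 32 frames per second that works out to about 1953 microseconds per row, or 31250 cycles at the assumed 16 MHz, with each LED on for roughly 6% of the time.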

I have been looking into using the TLC5941

That uses current sinks and you have wired your matrix up as common cathode, so it won't work.

information about clockspeed causing the issue

It's only an issue of clock speed if your software can't go fast enough.
Have a look at the code I used in this project:-


What's your outer vs inner loop?  Are you sweeping so each LED will be on 1/64 of the time, or are you going the other way so they will be on 1/16th of the time?

Are you using digitalWrite() to bit bang the 595s?  That function is pretty slow.  Instead use direct register access.

Finally, are the 595s cascaded?  Doing it that way forces a long inner loop.  Instead connect one arduino output to all the 595 clock inputs, and a different arduino output to each 595 data in.  If you choose the right arduino outputs then you can write the data to 8 595s at the same time by just setting one register (PORTB or PORTD are the digital IO register names)...
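A minimal sketch of what "direct register access" means, written so it compiles on a PC. Here `fake_port` is a stand-in I made up for a real AVR port register such as PORTD, and the helper names are hypothetical. On the Arduino you would write to the register itself, after configuring the pins as outputs:

```c
#include <stdint.h>

/* Stand-in for an AVR port register such as PORTD (assumption: this
 * variable only exists so the example runs on a PC). */
volatile uint8_t fake_port = 0;

/* Set one pin high: the same read-modify-write that digitalWrite()
 * ends with, minus all the lookups that come before it. */
void pin_high(volatile uint8_t *port, uint8_t bit_index)
{
    *port |= (uint8_t)(1u << bit_index);
}

/* Set one pin low. */
void pin_low(volatile uint8_t *port, uint8_t bit_index)
{
    *port &= (uint8_t)~(1u << bit_index);
}

/* Drive all eight pins of the port with a single store. */
void port_write(volatile uint8_t *port, uint8_t value)
{
    *port = value;
}
```

The last function is the interesting one for this display: one store sets eight data lines at once, which is what makes the one-data-line-per-595 wiring pay off.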


Thank you for the responses.

As of now I am trying to get them to be on 1/16th of the time, by going through the rows.

I am using digitalWrite() to set the latch pin and then shiftOut() to push data to each shift register. So I have several shiftOut() calls after the digitalWrite(), one for each shift register, and then the process repeats after shifting the first two registers to ground a new row.

I don't quite understand the direct register access you mentioned. Is there a tutorial or some information I can find somewhere else?

Right now they are cascaded with the data pins chained to each other. So if I understand this right, you are suggesting that I take the data pin from each shift register and connect it to separate I/O pins on the arduino? I think I get what you are saying and I'm definitely going to try this out.

Sadly I don't have much experience in this.  I have taken some classes but I'm a mechanical engineer so electronics are not my specialty.  But I really appreciate this help.

Thank you again.  


RE direct register access:

WRT the cascading, imagine a loop to be a mechanical typewriter.  Every time your loop ends and goes back to the beginning it's like hitting the carriage return.  It takes the typewriter a lot more time to go back to the beginning than it does to type a single letter.

Therefore if you want to type 42 letters (i.e. the state of each LED) it is faster to type all 42 letters in one line than to type one letter per line.  This is a time optimization called "loop unrolling" (google it).

Add to that the direct register access idea.  Using it, ONE key press (one instruction) can type 8 letters (by directly writing the port) instead of it taking more like 10-20 key presses to type just ONE letter (digitalWrite function call).

Finally, when you cascade the 595s you have to manipulate the clock line for each letter.  To continue the analogy, this would be like typing a ^ (up) before and a v (down) after each letter.  So instead of typing just the letters, you need to type ^1v^0v^1v, etc.

But if you give each of 8 chips its own data line but use the same clock line, it's like typing:

^10101010v^10101010v, etc

A LOT fewer ^v are needed.  Note that there is a limit to how many chips the Arduino can drive from a single line... it has to do with how much current the Arduino can source per line and how much the 595 draws.  I'm not sure what those numbers are, just be aware that weird behavior may be a hardware not software issue.
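To make the shared-clock idea concrete, here is a small simulation that runs on a PC. Everything in it is an assumption for illustration: `sim_data_port` stands in for a data port such as PORTD, `sim_chip` models eight 8-bit shift registers, and the function names are made up. Bit k of the port feeds the data input of chip k, and one clock pulse shifts a bit into all eight chips at once:

```c
#include <stdint.h>

#define NUM_CHIPS 8

/* Simulated hardware (stand-ins, not real registers). */
uint8_t  sim_data_port;          /* bit k feeds the data input of chip k */
uint8_t  sim_chip[NUM_CHIPS];    /* current contents of each shift register */
unsigned sim_clock_pulses;       /* number of clock pulses issued so far */

/* One rising clock edge: every chip shifts in the bit on its own line. */
void clock_pulse(void)
{
    for (int k = 0; k < NUM_CHIPS; k++) {
        uint8_t bit = (uint8_t)((sim_data_port >> k) & 1u);
        sim_chip[k] = (uint8_t)((sim_chip[k] << 1) | bit);
    }
    sim_clock_pulses++;
}

/* Load one byte into each of the eight chips, MSB first, in 8 pulses. */
void shift_out_parallel(const uint8_t bytes[NUM_CHIPS])
{
    for (int b = 7; b >= 0; b--) {
        uint8_t column = 0;
        /* Gather bit b of every chip's byte into one port value. */
        for (int k = 0; k < NUM_CHIPS; k++) {
            if (bytes[k] & (1u << b))
                column |= (uint8_t)(1u << k);
        }
        sim_data_port = column;  /* one port write sets all 8 data lines */
        clock_pulse();
    }
}
```

The same 64 bits take 64 clock pulses when the chips are cascaded and 8 pulses here. The gathering loop is itself slow, so in a real sketch you would probably store the frame buffer already arranged as port values instead of rearranging it on every refresh.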

With all of these optimizations you'll probably get it to go hundreds if not 1000 times faster.  I have driven 6 M5451 chips (they are like a 35 output shift register LED driver) using code like this and am able to blink the LEDs at 16kHz (I didn't try more).  200 hz or so looks "on" solid to a human eye...

My code (for the M5451 chip) is available at http://code.google.com/p/arduino-m5451-current-driver/.  But you know you'll probably have more fun and learn a lot more if you DIY! :-)


200 hz or so looks "on" solid to a human eye...

Agreed, but so does 50 Hz; that's why TVs work. The actual limit does vary from individual to individual, but about 32 Hz is the lower limit.

you'll probably get it to go hundreds if not 1000 times faster.

That's a bold claim; have you the maths to back this up? Using direct port access will speed things up the most.
The parallel shift register loading is only going to save you a shift register clock cycle per shift register.



You're right that the hertz can be lower.  But don't forget that old-style CRT TVs had a phosphor that glowed for longer than the beam illuminated it.  So maybe the duty cycle was quite large.  In fact, given that when the TV is off you can STILL see the phosphor glowing a little bit, the duty cycle on a CRT TV is very analog -- it's never completely off.   I think that this would make a big difference.  From my experiments, as the duty cycle goes down, the Hz must go up because it's a lot easier to perceive blinking at 10% duty cycle than at 90%.

BTW, I heard that 200 number as a minimum from an architect specializing in LED lighting... so it's just hearsay.

WRT cycle counting... nope, I didn't count them.  It's pretty hard nowadays with pipelining and all... and it's boring.  But because you spend so many hours helping us all out, I'm going to reciprocate and give a little rough counting a try! :-)  Here's digitalWrite:

void digitalWrite(uint8_t pin, uint8_t val)    // Fn call + 2 vars = 3
{
       uint8_t timer = digitalPinToTimer(pin);  // macro so only push,add, and mem ref = 3
       uint8_t bit = digitalPinToBitMask(pin);   // = 3
       uint8_t port = digitalPinToPort(pin);     // = 3
       volatile uint8_t *out;  // push = 1

       if (port == NOT_A_PIN) return;  // test and jump = 2

       // If the pin that support PWM output, we need to turn it off
       // before doing a digital write.
       if (timer != NOT_ON_TIMER) turnOffPWM(timer);  // test not taken = 2

       out = portOutputRegister(port);  // add and assign = 2

       if (val == LOW) *out &= ~bit;  // if, deref, read, not, and, assign = 6
       else *out |= bit;
}  // return: pop, jump = 2

So adding all of this up we get a count of roughly 26.

Let me guess that a single clock and data cycle is CLK_HIGH, DATA HI OR LOW, CLK_LOW, so that is a total of 26*3 = 78 counts.  I feel that this is very conservative because I am assuming that conditions, jumps, memory access, etc are all one cycle AND because I didn't dig into all those macros that carefully and gave the compiler the benefit of the doubt.  For example, there is some data type casting in there which could result in an unnecessary copy.

So if the OP uses register access the cycle count is 3 instead of 78 to clock one bit in  (Actually if the clock is in the same register as the data, you could reduce the count to 2).

Now, if the OP switches from chained 595s to parallel then he's clocking in 8 of these at a time.  So instead of 78*8 = 624 counts he is doing 3.

Let's say the OP uses all 16 digital outputs.  15 for the data and 1 for the clock.  So instead of 78*15 or 1170 counts, he is doing 4.

Now let's unroll the final outer loop.  So I'm guessing it looks something like:

for (i=0;i<NUMSHIFTS;i++)

That loop itself does a test, add, and jump (say a count of 3), and NUMSHIFTS for a 64 by 16 matrix is 64 + 16 = 80, so that's another 240 counts.  Add that to the 80 bits * 78 counts = 6240 and the total is 6480 clocks.

Now that 64 x 16 matrix is 8 + 2 = 10 chips, so we can clock all 10 chips simultaneously, and we need to do that 8 times (once per bit).  That would look something like:

CLK_HIGH and write 2 registers = 2 counts
CLK_LOW = 1 count
(and cut and paste that 8 times)

So you get a total of 24 counts.

So the back of the envelope calculation shows a speed up of 6480 / 24 = 270 times.
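The same back of the envelope arithmetic as a few lines of C, in case anyone wants to plug in their own guesses. The constants are the rough counts from this post, not measured cycle counts, and the function names are made up:

```c
#include <stdint.h>

/* Rough counts taken from the estimate above (not measurements). */
enum {
    DIGITALWRITE_COUNT = 26,  /* guessed cost of one digitalWrite() call */
    WRITES_PER_BIT     = 3    /* clock high, data high or low, clock low */
};

/* Cost of bit-banging `bits` bits with digitalWrite() inside a loop
 * that adds `loop_overhead` counts per iteration. */
uint32_t slow_counts(uint32_t bits, uint32_t loop_overhead)
{
    return bits * (DIGITALWRITE_COUNT * WRITES_PER_BIT + loop_overhead);
}

/* Cost of the unrolled, parallel version: `pulses` clock pulses at
 * `counts_per_pulse` counts each. */
uint32_t fast_counts(uint32_t pulses, uint32_t counts_per_pulse)
{
    return pulses * counts_per_pulse;
}

/* Speed-up factor, rounded down. */
uint32_t speedup(uint32_t slow, uint32_t fast)
{
    return slow / fast;
}
```

With 80 bits and a loop overhead of 3, slow_counts gives 6480; with 8 pulses of 3 counts each, fast_counts gives 24; the ratio is 270.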

Sure, that's on the low side of my estimate.  But we haven't even really dug into the ugliness... for example, a quick look at shiftOut() shows this gem within the 0-7 for loop:

               if (bitOrder == LSBFIRST)
                       digitalWrite(dataPin, !!(val & (1 << i)));
               else
                       digitalWrite(dataPin, !!(val & (1 << (7 - i))));

Now I don't know about the AVR but back when I was counting cycles a decade ago, lots of uCs handled the bit shift operation in 1 bit shift per clock (1<<7 takes 7 clocks).  Ergo, this if statement and all that val munging adds a LOT more cycles than my estimation.
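One common way to sidestep a shift by a variable amount is a "walking mask" that moves one place per iteration. This is a swapped-in idiom, not how shiftOut() is written, and the names below are made up. The array `sent_bits` stands in for the data pin so the two versions can be compared on a PC:

```c
#include <stdint.h>

/* Collects the bits that would go to the data pin (stand-in only). */
uint8_t sent_bits[8];

/* Shift-per-bit version, mirroring the MSB-first expression quoted
 * from shiftOut() above. */
void bits_by_shift_msb_first(uint8_t val)
{
    for (uint8_t i = 0; i < 8; i++)
        sent_bits[i] = !!(val & (1 << (7 - i)));
}

/* Walking-mask version: the mask is shifted by exactly one place per
 * iteration, so no shift by a variable amount is needed. */
void bits_by_mask_msb_first(uint8_t val)
{
    uint8_t i = 0;
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1)
        sent_bits[i++] = (val & mask) ? 1 : 0;
}
```

Both functions produce the same bit sequence for every input byte; whether the second is actually faster on the AVR depends on what the compiler does with the first, so treat it as something to measure.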

But, you know, I didn't think all of this thru before posting.  It was simply apparent by comparing what the OP said his matrix was doing with what I'm getting out of my M5451 library.


Thanks for that, I was thinking more of measuring it with a scope but that's an interesting analysis. As I said it's the direct port accessing, that is removal of the digitalWrite() that gives you the most increase in speed.

Your other point:-
But don't forget that old-style CRT TVs had a phosphor that glowed for longer than the beam illuminated it.

True but consider the case of movie film. You need at least 18 frames per second to give the illusion of continuous movement, but at that rate you see it flickering. So to save film each frame is flashed up three times on the screen giving a flash rate of 18 * 3 = 54 Hz. There is no persistence there except in the eye. I think this ratio of three flashes per frame even applies to modern films where the frame rate is 24 or 30 per second.


