How can I improve this PWM code?

I wrote the following code to do PWM modulation, but I've found that I get a lot of flicker with the dimmer settings:

int array[8] = {3, 7, 15, 31, 63, 127, 191, 255};

void setup()   {                

  for (int x = 2; x < 10; x++) 
  {
    pinMode(x, OUTPUT);
  }
  
}

// the loop() method runs over and over again,
// as long as the Arduino has power

void loop()                     
{

  updatearray();
    
}

void updatearray() {

  int firstpin = 2;
  //int dutycycle = 1785; // Dutycycle = 1000000 microseconds / 70hz update / groups.  1785 = 8 groups of 8 at 70hz.  490hz update of 8 groups of leds = 256 microsecond duty cycle.
  unsigned long delaystart;
  int timeelapsed;
  
  for (int x = 0; x < 64; x+=8) { // Simulate updating 64 leds.
    
    if (x < 8) { 

      delaystart = micros(); 
      timeelapsed = 0;
      
      while (timeelapsed < 255) { // Regardless of whether all LED's are off, wait for duty cycle to complete.
        
        for (int y = 0; y < 8; y++) {          
          
          // Using a 490hz update greatly simplifies duty cycle calculations.  This page mentions that the Arduino uses that rate on its PWM pins -> http://www.arduino.cc/en/Reference/AnalogWrite
          // Note that the 16mhz Arduino has a resolution of four microseconds, so only 64 brightness levels are possible. -> http://arduino.cc/en/Reference/Micros
         
          if (array[x+y] > 0) {
            if (timeelapsed >= (255-array[x+y])) { 
              digitalWrite(firstpin + x + y, HIGH);
              array[x+y] = -array[x+y]; // Toggle array value negative to keep track of current pin state.
            }
          }
           
        } 

        timeelapsed = micros()-delaystart; 
        
      } 
      
      for (int y = 0; y < 8; y++) {
        if (array[x+y] < 0) { 
          digitalWrite(firstpin + x + y, LOW);
          array[x+y] = -array[x+y]; 
        }
      }
      
    }
    else {
      delayMicroseconds(255); // Removing this delay would make the leds brighter when fewer are lit, and could burn them out if we are pulsing them at higher than their rated current.
    }  
    
  }
  
}

The code above requires 8 leds connected to pins 2 thru 10 with resistors on the cathodes going to ground. And it looks a bit funny because it's simulating how the leds will look when I expand the array out to an 8x8. So the 8 leds will be run on a 1/8th duty cycle.

Anyway, the code should be updating the entire array at 490hz, so I wouldn't think I should be seeing any flicker, but nothing I do makes it go away. It WAS much worse when I was setting the pin state every update, but now that I toggle the array to keep track of the current state of the pin, it's much better, but still not perfect.

I've thought about setting all the pins at once, but in this case that doesn't make a lot of sense because multiple pins generally won't need to be set at the same time.

So, any ideas how I can improve upon this? Or am I just stuck with it because the Arduino isn't fast enough?

Oh, and one more thing about how the code works. Unlike standard PWM where the pins would be in an on state initially and then shut off for a period of time, my code starts the pins in an off state and turns them on for a period of time. I did this because I found that when I was turning them on at the beginning of the loop, I could not get them to be very dim. I'm not sure why that is, but perhaps it was simply due to how long the loop took to step through all the pins to turn them on.

Move this...

  for (int x = 0; x < 8; x++) {
    array[x] = array[x] + random(9) - 4;
    array[x] = constrain(array[x], 0, 255);
  }

...into setup.

Does that change the flicker?

No, that was just some test code I forgot I left in there. :slight_smile:

I updated the code to remove those lines.

I've updated the code to include linear, logarithmic, and gamma corrected output. I'm not really sure if it's worthwhile to do though. At least, not with the setup I'm using. The micros() timer only has a 4 microsecond resolution, which means I've only got 64 potential brightness levels. Maybe it'll make a linear fade in and out brighter, I dunno. It does prevent the leds from getting quite as dark though. I guess that's not really an issue if the very dimmest settings flicker like crazy though.

int array[8] = {1, 2, 3, 4, 63, 127, 0, 255};
byte dutycycle[256] = {0}; // Logarithmic duty cycle lookup table for improved LED appearance when using PWM.


void setup()   {                

  float c;
  
  for (int x = 2; x < 10; x++) 
  {
    pinMode(x, OUTPUT);
  }
  
  c = 255.0 / log(255 + 1);
  
  for (int x = 0; x < 256; x++) { 
    dutycycle[x] = x; // Linear
    //dutycycle[x] = pow(x/255.0, 1.0/2.2)*255.0 + 0.5; // 2.2 gamma
    //dutycycle[x] = c * log(x + 1); // Logarithmic      
  }  
  
}

// the loop() method runs over and over again,
// as long as the Arduino has power

void loop()                     
{

  /*for (int x = 0; x < 8; x++) {
    array[x] = array[x] + random(9) - 4;
    array[x] = constrain(array[x], 0, 255);
  }*/
  
  updatearray();
    
}

void updatearray() {

  int firstpin = 2;
  //int dutycycle = 1785; // Dutycycle = 1000000 microseconds / 70hz update / groups.  1785 = 8 groups of 8 at 70hz.  490hz update of 8 groups of leds = 256 microsecond duty cycle.
  unsigned long delaystart;
  int timeelapsed;
  
  for (int x = 0; x < 64; x+=8) { // Simulate updating 64 leds.
    
    if (x < 8) { 

      delaystart = micros(); 
      timeelapsed = 0;
      
      while (timeelapsed < 255) { // Regardless of whether all LED's are off, wait for duty cycle to complete.
        
        for (int y = 0; y < 8; y++) {          
          
          // Using a 490hz update greatly simplifies duty cycle calculations.  This page mentions that the Arduino uses that rate on its PWM pins -> http://www.arduino.cc/en/Reference/AnalogWrite
          // Note that the 16mhz Arduino has a resolution of four microseconds, so only 64 brightness levels are possible. -> http://arduino.cc/en/Reference/Micros
         
          if (array[x+y] > 0) {
            if (timeelapsed >= (255-dutycycle[array[x+y]])) { 
              digitalWrite(firstpin + x + y, HIGH);
              array[x+y] = -array[x+y]; // Toggle array value negative to keep track of current pin state.
            }
          }
           
        } 

        timeelapsed = micros()-delaystart; 
        
      } 
      
      for (int y = 0; y < 8; y++) {
        if (array[x+y] < 0) { 
          digitalWrite(firstpin + x + y, LOW);
          array[x+y] = -array[x+y]; 
        }
      }
      
    }
    else {
      delayMicroseconds(255); // Removing this delay would make the leds brighter when fewer are lit, and could burn them out if we are pulsing them at higher than their rated current.
    }  
    
  }
  
}

I've been trying to follow your code (and I won't claim to know the best or only way of doing things), but I think you are kind of on the wrong track. Not to be discouraging at all, after all part of the fun (and frustration :P) is finding out how to do things?

A sidenote; your variable "firstpin" doesn't need to be a variable (taking up RAM), since it doesn'[ch7831] change. Use

const int firstpin = 2;

instead (as a global before setup())

For one thing, I wonder how you will eventually expand this to 64 LEDs. This works for the first 8 LED's, but thats about it (unless you have a arduino Mega, but even that have "only" 54 outputs).

digitalWrite(firstpin + x + y, HIGH);

I would suggest simply using

digitalWrite(firstpin + y, HIGH);

for the LEDs columns, then using (at the very simplest) another 8 outputs for row multiplexing. Something along the lines of

digitalWrite(firstRowPin + x/8, LOW);

or "HIGH", depending on how you connect things, then have the rest of the row pins the opposite (only one active at a time!). If you plan on using high-powered LEDs, a transistor is really recommended!
(assuming here the continued use of varable "x" as in your code, and that the row pins are in sequence, e.g. 10 through 17 (8 pins, some of which would be the analog pins on an atmega 328-based arduino).

Oh btw something I just noticed:

if (timeelapsed >= (255-dutycycle[array[x+y]])) {
digitalWrite(firstpin + x + y, HIGH);
array[x+y] = -array[x+y]; // Toggle array value negative to keep track of current pin state.

Not good at all referencing an array with a negative index.. (not the first time through, but I'd guess the loop will loop several times until timeelapsed < 255 is not true anymore. Also, the value at the negative index (if at all possible), is undetermined.

The LED PWM cycle is iterated through ASAP, not waiting 1/256th of a PWM period, but rather iterating the loop several times until one PWM period is done. Going back to your earlier, similar code without the corrected output (which seem like a good idea btw, correcting the output, but maybe wait with that until you got PWM working):

while (timeelapsed < 255) { // Regardless of whether all LED's are off, wait for duty cycle to complete.

for (int y = 0; y < 8; y++) {

// Using a 490hz update greatly simplifies duty cycle calculations. This page mentions that the Arduino uses that rate on its PWM pins -> http://www.arduino.cc/en/Reference/AnalogWrite
// Note that the 16mhz Arduino has a resolution of four microseconds, so only 64 brightness levels are possible. -> http://arduino.cc/en/Reference/Micros

if (array[x+y] > 0) {
if (timeelapsed >= (255-array[x+y])) {
digitalWrite(firstpin + x + y, HIGH);
array[x+y] = -array[x+y]; // Toggle array value negative to keep track of current pin state.
}
}

}
// no waiting, or other method to ensure 1/256th of a duty cycle is accounted for.
timeelapsed = micros()-delaystart;

} // This will most likely loop several times pr. 255 us, but only the first time through will LEDs actually be in the OFF state. IE (very) wrong PWM brightness

There is no defined time for 1/256th PWM cycle (or better, ensuring somehow that the period is 1/PWMresolution, before continuing the PWM period). It's just going though the loop as quickly as possible, as many times it can within one PWM period (255 us). Only after that is the outputs all reset to LOW.
Meaning; any pin that is not fully off, will be very quickly set to on, regardless of the value in the array. (Example, if the loop is gone through, say 50 times (wild guess), only the first time will be a true PWM cycle, so almost all LEDs that are not fully off, will be almost fully on).

But as far as I can see, the update frequency is about 490 Hz, like you say. So it's puzzling why you see flickering.

Apart from that I don't really think you should use delay() or delayMicroseconds() at all (meaning I think you should rethink it somewhat), you could for starters insert a delayMicroseconds(1) in the PWM loop, just to see if that improves things or not:

unsigned char array[8] = {3, 7, 15, 31, 63, 127, 191, 255};

void setup()   {                

  for (int x = 2; x < 10; x++)
  {
    pinMode(x, OUTPUT);
  }
  
}

// the loop() method runs over and over again,
// as long as the Arduino has power

void loop()                    
{

  updatearray();
    
}

void updatearray()
{

  int firstpin = 2;
  //int dutycycle = 1785; // Dutycycle = 1000000 microseconds / 70hz update / groups.  1785 = 8 groups of 8 at 70hz.  490hz update of 8 groups of leds = 256 microsecond duty cycle.
  unsigned long delaystart;
  int timeelapsed;
  
  for (int x = 0; x < 64; x+=8) { // Simulate updating 64 leds.

    if (x < 8) {
      
      for (unsigned char PWMcycle=0; PWMcycle<=255; PWMcycle++) { // could be +=4 here, since as you say resolution is 4 us

        for (int y = 0; y < 8; y++) {
          
          if (array[x+y] > PWMcycle) digitalWrite(firstpin + x + y, HIGH);
        }
        delayMicroseconds(1); // or 1/256th of a PWM period, or (better) 1/PWMresolution, whatever that might be
      }

      for (int y = 0; y < 8; y++) digitalWrite(firstpin + x + y, LOW);
      

    } else {

      delayMicroseconds(255); // Removing this delay would make the leds brighter when fewer are lit, and could burn them out if we are pulsing them at higher than their rated current.
    }
  }
   
}

FYI, this suggested code is untested. But I don't think this is a good solution. As I said I would suggest rethink the thing into not using delay() or delayMicroseconds(), but rather update one row (or 8 LEDs in this case) at a time, then have the main loop() function continue between updates.

Also some variables that are compared at some time are of different types. timeelapsed is an int, while delaystart is an unsigned, long int (like micros(), which it should be). And array[] is a (signed) int. I think "unsigned char" is better for that (or "uint8_t"), as the range of that variable is 0-255.

For one thing, I wonder how you will eventually expand this to 64 LEDs. This works for the first 8 LED's, but thats about it (unless you have a arduino Mega, but even that have "only" 54 outputs).

You're right that as written the code doesn't expand out, but my test platform is a 10 segment array with the leds going to ground (via resistors), so there's no pin to set low. I haven't yet purchased the IC I need to sink the columns. (ULN2803 darlington array)

Oh btw something I just noticed:

if (timeelapsed >= (255-dutycycle[array[x+y]])) {
digitalWrite(firstpin + x + y, HIGH);
array[x+y] = -array[x+y]; // Toggle array value negative to keep track of current pin state.

Not good at all referencing an array with a negative index.. (not the first time through, but I'd guess the loop will loop several times until timeelapsed < 255 is not true anymore. Also, the value at the negative index (if at all possible), is undetermined.

That can't happen. That code is contained within:
if (array[x+y] > 0) {

Going back to your earlier, similar code without the corrected output (which seem like a good idea btw, correcting the output, but maybe wait with that until you got PWM working):

Actually, the 'corrected' output doesn't really seem correct at all. I set up the code to fade in and out, and with both gamma and logarithmic 'correction' a linear fade from 0 to 255 and back appears bright most of the time with either of those, logartihmic being the worse for wear. Linear on the other hand, looks like you would expect.

The LED PWM cycle is iterated through ASAP, not waiting 1/256th of a PWM period, but rather iterating the loop several times until one PWM period is done. Going back to your earlier, similar code without the corrected output:

There is no defined time for 1/256th PWM cycle (or better, ensuring somehow that the period is 1/PWMresolution, before continuing the PWM period). It's just going though the loop as quickly as possible, as many times it can within one PWM period (255 us). Only after that is the outputs all reset to LOW.
Meaning; any pin that is not fully off, will be very quickly set to on, regardless of the value in the array. (Example, if the loop is gone through, say 50 times (wild guess), only the first time will be a true PWM cycle, so almost all LEDs that are not fully off, will be almost fully on).

I don't think you understand the code.

Each time you call the function, the entire display is updated once.

The function loops through each of 8 columns.

(In this test code, only the fiist column is actually updated, so I simply delay for 255 microseconds for each of the other seven columns. Also no column pins are pulled low because in the test setup no pins are actually connected to the columns, only rows are connected.)

The code then waits at each column for 256 microseconds. This is accomplished with this code:

delaystart = micros();
timeelapsed = 0;

while (timeelapsed < 255) {

  timeelapsed = micros()-delaystart;
       
}

While we are in that loop, timeelapsed tells us how many microseconds we have been on that column.

Now, initally no leds in the column are on. But this code, contained in the above loop...

for (int y = 0; y < 8; y++) {
  if (array[x+y] > 0) {
    if (timeelapsed >= (255-array[x+y])) {
      digitalWrite(firstpin + x + y, HIGH);
    }
  }
}

...loops through each led in the column, and checks to see if it has been set high. (if (array[x+y] > 0) then pin = LOW)

If it has not, then it checks to see if the led should be turned on.

The time at which an led should be turned on is given by "255-array[x+y]", ie: 255-brightness.

So, an led with a brightness of 255 has a turn on time of 0. And an led with brightness 0 has a turn on time of 255.

We then compare the turn on time to the amount of time we have spent at this column:

"if (timeelapsed >= (255-array[x+y]))"

And if the time spent at the column (which is intially 0 because we don't make another check of that until after the first update) is equal to or greater than the led's turn on time, then we turn the led on, and set the brightness to be negative so we know that pin has been set high:

digitalWrite(firstpin + x + y, HIGH);
array[x+y] = -array[x+y];

We keep track of whether the pin has been set high, because setting a pin high takes a lot more time than checking if a value in an array is negative, and results in more flicker.

The code then keeps checking the led's to see if it's time to trigger them, looping through that while loop, until 255 microseconds have passed. Then it moves on to the next column.

So, by the time it has finished updating a column, an led with brightness 255 will have been on for 255 milliseconds. And an led with brightness 63 will have been off for around 192 millseconds and then on for around 63. (It's not perfect because the timer has a resolution of 4ms.)

Then it goes to move on to the next column. But before it does, it turns off those leds which have had their pins set high:

for (int y = 0; y < 8; y++) {
        if (array[x+y] < 0) {
          digitalWrite(firstpin + x + y, LOW);
          array[x+y] = -array[x+y];
        }
      }

Does that clear things up?

Also some variables that are compared at some time are of different types. timeelapsed is an int, while delaystart is an unsigned, long int (like micros(), which it should be). And array[] is a (signed) int. I think "unsigned char" is better for that (or "uint8_t"), as the range of that variable is 0-255.

It needs to be a signed int because it may be 255 or -255 depending on whether that leds pin has been set high. I could save ram, and make the code easier to understand by using eight bytes, one for each column, and setting the bits in them to indicate the current pin states for each led in the column, but this is easier and I'm not low on ram.

(if (array[x+y] > 0) then pin = LOW)

That's true.

The code then keeps checking the led's to see if it's time to trigger them, looping through that while loop, until 255 microseconds have passed. Then it moves on to the next column.

Ah, you are right about that as well. For some reason I thought of the outer while-loop as a for-loop. I really managed to miss point here :-[
Still, imho, it does loop several more times through each led than it needs to do. But this shouldn't affect the display/flickering for this test code, so I don't know why you get that then.

Then it goes to move on to the next column. But before it does, it turns off those leds which have had their pins set high:

That at least I got. Except I would think instead of "next column", you mean "next row"? Of course it doesn't matter whether you multiplex rows or columns. I just thought the 8 LEDs you have in the test code were 8 columns.

It needs to be a signed int because it may be 255 or -255 depending on whether that leds pin has been set high. I could save ram, and make the code easier to understand by using eight bytes, one for each column, and setting the bits in them to indicate the current pin states for each led in the column, but this is easier and I'm not low on ram.

Fair enough. Out of curiosity,did you try my suggested code? (I haven't tested that exact version myself, but I think I will - tomorrow) It "brute forces" all LEDs to off at the end, but a simple test for not zero could be implemented at the end similar to yours, if you want to do that.

if (array[x+y] != 0) digitalWrite(firstpin + x + y, LOW);

(as any non-zero value would be lit at some point, well not accounting the 4 us time resolution) Also the "+ x" is not really needed..I just forgot that. Just a minor detail though.

The 8 leds are in a column in my setup, with the first row at the bottom of the display. I'm sinking the columns because I'm making a vu-meter like display. It's actually set up in a 16x4 configuration, but to save pins I'm indexing it like an 8x8.

But yeah, I guess that is the opposite of what people would expect. :slight_smile:

Fair enough. Out of curiosity,did you try my suggested code? (I haven't tested that exact version myself, but I think I will - tomorrow) It "brute forces" all LEDs to off at the end, but a simple test for not zero could be implemented at the end similar to yours, if you want to do that.

No, I haven't, but in an earlier version of the code I did brute force all the leds off. I don't think that's the issue.

In fact, now that I went through the code and described how it works to you, I think I understand what the issue might be. And it's a problem which may not be solvable in software.

The Arduino has a microsecond timer resolution of 4ms. You know how I said "an led with brightness 63 will have been off for around 192 millseconds and then on for around 63"? Well, the reason I said "around" is because of that 4 microsecond resolution.

The timing code will wait until more than 63 microseconds have passed before turning on the led. But even that loop runs 100 times a microsecond, that timer won't tick over till four microseconds have elapsed. And depending on when I stored delaystart, the led may turn on after 63 microseconds, or it may turn on after 67 microseconds.

Now, for an led which has a brightness of 127, shaving 4 microseconds off its ON time isn't gonna make a whole lot of difference in its brightness.

But what about an LED which is dimly lit? Ie, the ones I'm having problems with? Well, those leds are only supposed to be on for between 4-32 microseconds. And shaving 4 microseconds off that time is quite a difference. Shaving four microseconds off an led which is only supposed to be on for four microseconds leaves nothing at all.

But where does the flicker come in? Well, the error isn't going to be constant here. So sometimes it will be brighter, and sometimes it will be darker. On average, it will be, well, the average of the desired brightness minus 2 miicroseconds, and if I was updating the display much faster then perhaps it would appear that way to the eye. But I can't update the display faster. Or at least, my efforts so far have been unsucessful.

Oh... and if I was successful, that success would come at a cost. Right now, I've got 64 brightness levels. 256 microseconds / microsecond timer resolution of 4 = 64. If I update the display 4x as fast, and spend only 64 microseconds on each column, then I'll have only 16 brightness levels. Which I suppose might not be terrible. I'm not exactly displaying graphics here.

But there's another side effect of having only 16 brightness levels that I just thought of. And that might be the key to why I seemed to be having difficulty getting faster updates to work properly.

If I don't have my timing code set up just right, then leds which have a brightness of < 4 may always be off. And my problem when increasing the speed of the updates was that I was losing like half my display. The leds that should have been dim were turning off.

Well, it turns out that if increase the speed, and I have only 16 brightness levels, then that means any led with a brightness < 16 may not be lit, because those are now all the ones whose ON time will be 4 microseconds or less.

So what's the solution?

Well, perhaps the solution isn't to update the display faster, but slower? I am after all not having a problem with flickering wih the brighter leds. Only the dimmer ones. So maybe I can spend more than 255 miliseconds on each column?

I'm not even sure exactly why I selected 490hz for my updates. I think I may have been using the fact that the Arduino uses 490hz for it's PWM as my starting point to avoid flicker.

Hm... Well I guess I should do the math. Let's see.

There are 1000000 microseconds in a second. I need the display to update at a minimum of 70hz to avoid flicker even without PWM. That means I can spend a total of 14285 microseconds on each update of the display.

And my display is broken up into 8 columns. So that leaves me with 1785 milliseconds to spend on each column. Which is what I was doing before all this PWM nonsense came along.

Now if the display is updating at 70hz, then does it matter if I turn off some of the leds for part of that time? The flickering on and off should be 70hz as well. So perhaps I was mistaken to up my update speed to avoid flciker? I think I may have done that in the first place because I was having that problem with not being able to get the leds to be very dim, but that was when I was turning them all on at the strt of their PWM cycle and then off, and accessing the ports over and over without storing the port states.

So maybe I made a boo boo.

It's going to make the math a bit more complicated but I guess I should see what happens when I try the 1785 microsecond duty cycle. That'll leave the leds on over 3.5x as long. And that means the microsecond timing will be over 3.5x as accurate. And since it is that timing which seems like the likely culprit in regards to my low brightness leds flickering, improving that might get rid of my problem.

Anyway, I'll post back here once I've got the new code up and running to let you know if it works!

I need the display to update at a minimum of 70hz to avoid flicker even without PWM. That means I can spend a total of 14285 microseconds on each update of the display.

OK.

And my display is broken up into 8 columns. So that leaves me with 1785 milliseconds[sic] to spend on each column.

(microseconds?) But NO, I don't think so. For full brightness, each LED needs to be on for ALL of the 14285 microseconds, not just all of the 1785 microseconds.

If you want 256 steps of brightness for each LED, that makes each step of the 14285 microseconds 55 microseconds long. And during that 55 us, you have to compute whether each of the 64 LEDs should change state. So you have somewhat less than 1 us for the calculations on each LED. About 12 AVR instructions.

This is relatively difficult, especially since that 12 instructions also has to include the code to turn the LED on or off depending on the results. Perhaps it is impossible. This is why hardware support for PWM is so popular!

I have not carefully read all the preceding messages, but it LOOKED to me like your fundamental problem is that you're assume that the calculations you are performing are fast enough relative to the on/off times involved that you can use "delay-like" calls to measure on and off times. When in fact you'll be hard-pressed to run your loop fast enough to do SW PWM on 64 leds even with NO artificial delays...

(microseconds?) But NO, I don't think so. For full brightness, each LED needs to be on for ALL of the 14285 microseconds, not just all of the 1785 microseconds.

Yes, microseconds. I keep mistyping that because before I used the Arduino I've only ever had access to millisecond timers in my code. :slight_smile:

As for 'full brightness', I'm not sure if you're quoting me or not, but the display has to be multiplexed, so 'full brightness' for any led is being on for the whole time each column is active, which in the case of a 70hz display update is 1785 microseconds.

If you want 256 steps of brightness for each LED, that makes each step of the 14285 microseconds 55 microseconds long. And during that 55 us, you have to compute whether each of the 64 LEDs should change state. So you have somewhat less than 1 us for the calculations on each LED. About 12 AVR instructions.

This is relatively difficult, especially since that 12 instructions also has to include the code to turn the LED on or off depending on the results. Perhaps it is impossible. This is why hardware support for PWM is so popular!

Hm... interesting. I haven't yet delved into instruction speed on this thing, but I figured I didn't have a lot of time to do the work. When I first started coding in C and Asm years and years ago, I was working with a 16mhz 386.

I have not carefully read all the preceding messages, but it LOOKED to me like your fundamental problem is that you're assume that the calculations you are performing are fast enough relative to the on/off times involved that you can use "delay-like" calls to measure on and off times. When in fact you'll be hard-pressed to run your loop fast enough to do SW PWM on 64 leds even with NO artificial delays...

Yep. That's exactly what I'm doing. I hoped that the Arduino was fast enough. And it seems like it's almost fast enough. The biggest speed hit was writing the ports. Once I did checks to avoid that, things got much better.

But they were still far from perfect, so I kept optimzing.

This is what I've ended up with:

byte array[8] = {31, 31, 63, 63, 127, 127, 255, 255};           // Data for array.  Brightness of each led in array 0..255.  

const int rows = 8;                                             // Number of rows in array. 
const int columns = 8;                                          // Number of columns in array. (My array is arranged such that one whole column is illuminated at a time, rather than one whole row, as you might have in a standard array.)

const int rowPin0 = 2;                                          // First row pin.  
const int colPin0 = -1;                                         // First column pin.  Not used at this time.
  
const int hz = 61;//122;                                        // Number of times per second display is updated. 
                                                          
const int columnduty = 2048;  //1000000.0 / float(hz) / float(columns);  // The duty cycle for each column.  The number of microseconds for which leds in it will be lit before we move onto the next column.  1024 = 122hz
const float dutyscale = float(columnduty) / 256.0;              // Compute factor by which brightness should be scaled by to bring the duty cycles for the leds in line with the duty cycle of each column.

unsigned int dutycycle[256] = {0};                              // Duty cycle lookup table.  Index is LED brightness 0..255.  Data is duty cycle for LED in microseconds.

byte pinState[20] = {0};                                        // The current state of each pin, if set with the digitalHIGH() and digitalLOW() functions.


void setup()   {                
  
  for (int x = 2; x < 10; x++) { // Set pins to output mode.
    pinMode(x, OUTPUT);
  }
  
  for (int x = 0; x < 256; x++) { // Create lookup table to convert brightnesses to duty cycles. (Duty cycles are LOW period!)
    //dutycycle[x] = columnduty - x*dutyscale; // Linear.
    dutycycle[x] = columnduty - pow(x/255.0, 2)*255.0*dutyscale; // Logarithmic.
  }  
  
}


void loop()                     
{

  static int v = 0;
  static int i = 2;

  
  // Flicker array:
    /*
    for (int x = 0; x < 8; x++) {
      array[x] = array[x] + random(9) - 4;
      array[x] = constrain(array[x], 0, 255);
    }
    */
    
  // Pulse array:
    
    v = v + i;
    if (v > 255) { v = 255; i = -i; }
    if (v < 0)   { v = 0;   i = -i; }  
      
    for (int x = 0; x < 8; x++) {
      array[x] = v;
    }   
   
  
  updatearray();
    
}


void updatearray() {
  
  byte index;  
  unsigned long delaystart;
  int timeelapsed;
  
  for (int x = 0; x < columns; x++) { // Step through each column.
      
    if (x < 1) { // Do not update any columns after the first. *TEST CODE*

      // Illuminate column:

        index = x*rows;
   
        timeelapsed = 0;
        delaystart = micros(); 
        
        while (timeelapsed < columnduty) { // Wait for this column's duty cycle to complete.
          for (int y = 0; y < rows; y++) {          
            if (timeelapsed >= dutycycle[array[index+y]]) { 
                if (pinState[rowPin0 + y] == LOW) {
                  digitalHIGH(rowPin0 + y);
                }                  
            }              
          } 
          timeelapsed = micros()-delaystart; 
        } 
      
      // Blank column:
      
        for (int y = 0; y < rows; y++) {
          if (pinState[rowPin0 + y] == HIGH) { 
            digitalLOW(rowPin0 + y);
          }
        }
      
    }
    else {
      delayMicroseconds(columnduty); // Removing this delay when testing would make the leds brighter when fewer are lit, and could burn them out if we are pulsing them at higher than their rated current.
    }  
    
  }
  
}
  
 
// These functions set pins faster than the digitalWrite() function, and record their current state in the pinState[] array.
// They do not check to make sure you have attempted to set a valid pin, nor do they make sure the pin is not set to PWM mode before attempting to set it.
 
#include "pins_arduino.h"
 
void digitalHIGH(uint8_t pin)
{
        uint8_t bit = digitalPinToBitMask(pin);
      uint8_t port = digitalPinToPort(pin);
      volatile uint8_t *out;

      out = portOutputRegister(port);

      *out |= bit;

        pinState[pin] = HIGH;
}  

void digitalLOW(uint8_t pin)
{
      uint8_t bit = digitalPinToBitMask(pin);
      uint8_t port = digitalPinToPort(pin);
      volatile uint8_t *out;

      out = portOutputRegister(port);

      *out &= ~bit;

        pinState[pin] = LOW;
}

As you can see, I've created modified versions of the port writing functions, where I got rid of some unnecessary code, and added code to keep track of the state of the ports in an array.

I also optimized my loops, by moving the calculations for the duty cycles out of them and into the setup function. I realised I could just store the off time instead of storing the on time and calculating the off time while in the loop.

Lastly, I determined that it was indeed better to update the display slower than faster, and the 61hz update I am now using (2048 microseconds per column) provides me with a theoretical 512 levels of brightness rather than the 64 I was getting with the 490hz update which spent only 256 microseconds on each column.

Do I still get flicker with these changes? Unfortunately, yes.

Using the logarithmic setup (which I've now debugged, and which provides a nice linear ramp when fading in and out) anything below 63 for brightness will flicker noticably, and anything below 7 is completely dark. And with the linear setup, where most of the values are scrunched up into the brighter end of the spectrum, anything below 7 will flicker slightly, and 1 is dark. But that is because 7 in logartihmic mode is significantly darker than 7 in linear mode.

But it is much better than it was. And when it is animated, you can't even see the flicker.

There is one other issue worth mentioning too, but it's not something that could be fixed in code...

Becuase of slight differences in led manufacture, even with a bargraph which is supposed to have matched leds, if you dim it to half brightness this way you can see differences in the leds. Also, some leds seem more susceptible to flicker than others.

Anyway, if anyone has any suggestions on how I might futher optimize this without getting too crazy with the code I'd be interested in hearing it. I thought about unrolling the loop eight times, but it didn't really seem like it would be worthwhile. It'd make the code hard to modify.

But what about an LED which is dimly lit? Ie, the ones I'm having problems with? Well, those leds are only supposed to be on for between 4-32 microseconds. And shaving 4 microseconds off that time is quite a difference.

I think definately you are on to it there!

Also, to really perfect the PWM, you should use direct port adressing for speed, and setting up an interrupt for each row (or in your case column). That way your normal code will execute between updates to the display. In fact I made one such thing using shiftregisters, at the severe cost of PWM resolution (but it was okay for me, I had to save on pins too).

The code below I regard as just as a proof of concept thing (also based on your code, so just one string of LEDs so far). It uses the delayMicroseconds() function between columns, and the normal digitalWrite for the pins (very wasteful, but OK for testing/small apps). Since micros() have a resoution of 4 us, I use that between each PWM step, thus one column of LED's take 1024 us. For an 8 by 8 matrix it should be a max FPS of about 120 (but not leaving much room for other calculations) At least unless my math is far off. If you want 70 Hz you still got some extra time. Then you'd either need to implement some FPS control in the main loop() for that, or use software interrupt for the columns as stated above. Oh, and no flicker at lower levels as far as I can see. Imho the arduino can do this just nicely. Also this is not corrected logartithmically or anything.

// PWM test for a string of LEDs


//int array[8] = {3, 7, 15, 31, 63, 127, 191, 255};
unsigned char array[8] = { 0, 1, 2, 15, 31, 63, 127, 255 };

const int firstPin = 2;


// time for one "slot" of PWM resolution (e.g. 1/256th period for one column (or row, if applicable). Using 4 as that is the resolution of the micros() timer.
// At the cost of some FPS. Still 256 * 4us = 1024 us for one column (theoretically, sans array and digitalWrite time)
//   8192 us for 8 by 8 ~ about maximum of 120 Hz framerate, not counting other code. (which would worsen FPS).
//   not optimal solution, the delay() or wasting time between PWM "slots" should be avoided.
const long PWMslot = 4; 


void setup()
{                
  //frameStart = millis();
  for (int x = 2; x < 10; x++)
  {
    pinMode(x, OUTPUT);
  }
}



void loop()                    
{
  /*
  // pulsating test pattern
  float period = 1000; // ms
  float time = millis()/period;
  for (int i=0; i<8; i++) // test pattern
  {
    float value = 128 + 127.0 * sin(2*PI*time - (2*PI*i/32)); // show 8/32th of a sin curve simultaneously on 8 LEDs
    array[i] = (unsigned char) value;
  }
  */
    
  
  updatearray();
    
}


void updatearray()
{
  
  for (int x = 0; x < 64; x+=8) // Simulate updating 64 leds.
  {
    if (x < 8)
    {
      for (int PWMcycle=255; PWMcycle >= 0; PWMcycle--)
      {
        //int PWMstart = micros();
        for (int y = 0; y < 8; y++)
        {
          if ( (int)array[x+y] > PWMcycle) digitalWrite(firstPin + x + y, HIGH);
        }
        delayMicroseconds(PWMslot);
        //while (micros()-PWMstart < PWMslot); // alternate wait, not really much better.
      }

      for (int y = 0; y < 8; y++) digitalWrite(firstPin + x + y, LOW);

    } else {

      delayMicroseconds(256 * PWMslot); // Simulating waiting for the rest of the display
    }
  }
   
}

I tried your code. It does display without flicker, but I see several bands indicating it can't display many different brightness levels, and the lowest brightness it can display is something like 4-5x as bright as the minimum levels I am achieving now.

So I've made a breakthrough with the speed at which I can update the display. Here's my latest code:

byte array[8] = {1, 3, 7, 15, 31, 63, 127, 255};                 // Data for array.  Brightness of each led in array 0..255.  
//byte array[8] = {1, 2, 3, 4, 5, 6, 7, 8};                 // Data for array.  Brightness of each led in array 0..255.  
//byte array[8] = {1, 1, 1, 1, 1, 1, 1, 1};                 // Data for array.  Brightness of each led in array 0..255.  

const int rows = 8;                                             // Number of rows in array. 
const int columns = 8;                                          // Number of columns in array. (My array is arranged such that one whole column is illuminated at a time, rather than one whole row, as you might have in a standard array.)

const int rowPin0 = 2;                                          // First row pin.  
const int colPin0 = -1;                                         // First column pin.  Not used at this time.
  
const int hz = 61;//122;                                        // Number of times per second display is updated. 
                                                          
const int columnduty = 2048;  //1000000.0 / float(hz) / float(columns);  // The duty cycle for each column.  The number of microseconds for which leds in it will be lit before we move onto the next column.  1024 = 122hz
const float dutyscale = float(columnduty) / 256.0;              // Compute factor by which brightness should be scaled by to bring the duty cycles for the leds in line with the duty cycle of each column.

unsigned int dutycycle[256] = {0};                              // Duty cycle lookup table.  Index is LED brightness 0..255.  Data is duty cycle for LED in microseconds.

byte pinState[20] = {0};                                        // The current state of each pin, if set with the digitalHIGH() and digitalLOW() functions.


void setup()   {                
  
  for (int x = 2; x < 10; x++) { // Set pins to output mode.
    pinMode(x, OUTPUT);
  }
  
  for (int x = 0; x < 256; x++) { // Create lookup table to convert brightnesses to duty cycles. (Duty cycles are LOW period!)
    dutycycle[x] = columnduty - x*dutyscale; // Linear brightness. (127 does not look half as bright as 255.)
    //dutycycle[x] = columnduty - pow(x/255.0, 2)*255.0*dutyscale; // Logarithmic brightness. (127 looks as you would expect it to.)
  }  
  
}


void loop()                     
{

  static int v = 0;
  static int i = 2;

  
  // Flicker array:
    /*
    for (int x = 0; x < 8; x++) {
      array[x] = array[x] + random(9) - 4;
      array[x] = constrain(array[x], 0, 255);
    }
    */
    
  // Pulse array:
    
    v = v + i;
    if (v > 255) { v = 255; i = -i; }
    if (v < 0)   { v = 0;   i = -i; }  
      
    for (int x = 0; x < 8; x++) {
      //array[x] = v; // Uncomment this to see pulsing.  Leave commented out to see data in array[].
    }   
   
  
  updatearray();
    
}


void updatearray() {
  
  byte index;  
  unsigned long columnstart;
  int timeelapsed;
  int minduty;
  int newminduty;
  int duty;
  
  for (int x = 0; x < columns; x++) { // Step through each column.
      
    if (x < 1) { // Do not update any columns after the first. *TEST CODE*

      // Illuminate column:
        
        // Find the row with the shortest LOW time in this column.  If more than one row has the same duty cycle, pick the first one.
        
          minduty = columnduty; // Assume all leds are off.  
          for (int y = 0; y < rows; y++) {          
             if (dutycycle[array[index+y]] < minduty) { minduty = dutycycle[array[index+y]]; }
          } 
                
        index = x*rows;
   
        timeelapsed = 0;
        columnstart = micros(); 
        
        while (timeelapsed < columnduty) { // Wait for this column's duty cycle to complete.
        
          if (timeelapsed >= minduty) { // If timelapsed is greater than or equal to the time at which the next brightest led in this column should turn on...
                      
            newminduty = columnduty; // Set newminduty equal to the max time for this column, so we can search for times less than it, but greater than the current time (mintime).
            
            for (int y = 0; y < rows; y++) { // Loop through each row.
             
              duty = dutycycle[array[index+y]];             
              
              if (duty < minduty) { continue; } // If this row has already been illuminated, skip to next row. (Skips additional processing on half of column on average!)
              
              if (duty > minduty) { // If this row is not to be illuminated at this time, check to see if it is the next which should be.
                if (duty < newminduty) { newminduty = duty; }  
                continue; // Skip next check to see if this row should be illuminated at this time.
              } 
              
              // if (duty = minduty) { --- If we're here, then duty definitely = minduty.  We don't need to check!
              
                //if (pinState[rowPin0 + y] == LOW) { // This check is not be necessary, since we only attempt to set each pin once with this setup!
                  digitalHIGH(rowPin0 + y);
                //}                  
                
              // }
            }              
            
            minduty = newminduty; // If all leds are on, newminduty will equal columnduty, and this loop won't be executed again.
            
          } 
          
          timeelapsed = micros()-columnstart; 
          
        } 
      
      // Blank column:
      
        for (int y = 0; y < rows; y++) {
          if (pinState[rowPin0 + y] == HIGH) { 
            digitalLOW(rowPin0 + y);
          }
        }
      
    }
    else {
      delayMicroseconds(columnduty); // Removing this delay when testing would make the leds brighter when fewer are lit, and could burn them out if we are pulsing them at higher than their rated current.
    }  
    
  }
  
}
  
 
// These functions set pins faster than the digitalWrite() function, and record their current state in the pinState[] array.
// They do not check to make sure you have attempted to set a valid pin, nor do they make sure the pin is not set to PWM mode before attempting to set it.
 
#include "pins_arduino.h"
 
void digitalHIGH(uint8_t pin)
{
        uint8_t bit = digitalPinToBitMask(pin);
      uint8_t port = digitalPinToPort(pin);
      volatile uint8_t *out;

      out = portOutputRegister(port);

      *out |= bit;

        pinState[pin] = HIGH;
}  

void digitalLOW(uint8_t pin)
{
      uint8_t bit = digitalPinToBitMask(pin);
      uint8_t port = digitalPinToPort(pin);
      volatile uint8_t *out;

      out = portOutputRegister(port);

      *out &= ~bit;

        pinState[pin] = LOW;
}

The big change here is that rather than loop through each led in a column, and checking to see if the current time is greater than the time at which it should trigger, I now loop through the column once, look for the next trigger time, and then simply wait for that time to arrive. When it does, I light all leds which should light at that point in time, and at the same time, do a search for the next time at which I need to light an led.

This change means that most of the time, the loop is checking only a single variable, whereas before it needed to access an array eight times each loop. The change also allowed me to get rid of the check on the led's current state.

...and one more update.

I think I've taken this as far as it can go without getting messy.

byte array[8] = {1, 3, 7, 15, 31, 63, 127, 255};                 // Data for array.  Brightness of each led in array 0..255.  
//byte array[8] = {1, 2, 3, 4, 5, 6, 7, 8};                 // Data for array.  Brightness of each led in array 0..255.  
//byte array[8] = {1, 1, 1, 1, 1, 1, 1, 1};                 // Data for array.  Brightness of each led in array 0..255.  

const int rows = 8;                                             // Number of rows in array. 
const int columns = 8;                                          // Number of columns in array. (My array is arranged such that one whole column is illuminated at a time, rather than one whole row, as you might have in a standard array.)

const int rowPin0 = 2;                                          // First row pin.  
const int colPin0 = -1;                                         // First column pin.  Not used at this time.
  
const int hz = 61;//122;                                        // Number of times per second display is updated. 
                                                          
const int columnduty = 2048;  //1000000.0 / float(hz) / float(columns);  // The duty cycle for each column.  The number of microseconds for which leds in it will be lit before we move onto the next column.  1024 = 122hz
const float dutyscale = float(columnduty) / 256.0;              // Compute factor by which brightness should be scaled by to bring the duty cycles for the leds in line with the duty cycle of each column.

unsigned int dutycycle[256] = {0};                              // Duty cycle lookup table.  Index is LED brightness 0..255.  Data is duty cycle for LED in microseconds.

byte pinState[20] = {0};                                        // The current state of each pin, if set with the digitalHIGH() and digitalLOW() functions.


void setup()   {                
  
  for (int x = 2; x < 10; x++) { // Set pins to output mode.
    pinMode(x, OUTPUT);
  }
  
  for (int x = 0; x < 256; x++) { // Create lookup table to convert brightnesses to duty cycles. (Duty cycles are LOW period!)
    dutycycle[x] = columnduty - x*dutyscale; // Linear brightness. (127 does not look half as bright as 255.)
    //dutycycle[x] = columnduty - pow(x/255.0, 2)*255.0*dutyscale; // Logarithmic brightness. (127 looks as you would expect it to.)
  }  
  
}


void loop()                     
{

  static int v = 0;
  static int i = 2;

  
  // Flicker array:
  /*
    for (int x = 0; x < 8; x++) {
      array[x] = array[x] + random(9) - 4;
      array[x] = constrain(array[x], 0, 255);
    }
  */

    /*for (int x = 0; x < 8; x++) {
      array[x] = random(256);
    }*/
  
  // Pulse array:
    
    v = v + i;
    if (v > 255) { v = 255; i = -i; }
    if (v < 0)   { v = 0;   i = -i; }  
      
    for (int x = 0; x < 8; x++) {
      array[x] = v; // Uncomment this to see pulsing.  Leave commented out to see data in array[].
    }   
   
  
  updatearray();
    
}


void updatearray() {
  
  byte index;  
  unsigned long columnstart;
  int timeelapsed;
  int rowpin[rows];
  int rowduty[rows];
  int nextduty;
  
  for (int x = 0; x < columns; x++) { // Step through each column.
      
    if (x < 1) { // Do not update any columns after the first. *TEST CODE*

      // Illuminate column:
        
        // Store pin numbers and their coinciding duty cycles in two arrays:
          for (int y = 0; y < rows; y++) {          
            rowpin[y] = rowPin0 + y;
            rowduty[y] = dutycycle[array[x*rows+y]];
          }

        //  Sort duty cycle list, keeping pin numbers synchronized with it. (Bubble sort)
        
          int i, j, temp;
 
          for (i = (rows - 1); i >= 0; i--)
          {
            for (j = 1; j <= i; j++)
            {
              if (rowduty[j-1] > rowduty[j])
              {

                temp = rowduty[j-1];
                rowduty[j-1] = rowduty[j];
                rowduty[j] = temp;

                temp = rowpin[j-1];
                rowpin[j-1] = rowpin[j];
                rowpin[j] = temp;
                
              }
            }
          }
     
        index = 0;   
        timeelapsed = 0; 
        nextduty = rowduty[index];
        columnstart = micros(); 
                
        while (timeelapsed < columnduty) { // Wait for this column's duty cycle to complete.
          while (timeelapsed >= nextduty) { // If timelapsed is greater than or equal to the time at which the next brightest led in this column should turn on...
            digitalHIGH(rowpin[index]);
            index++;
            if (index < rows) { nextduty = rowduty[index]; } else { nextduty = columnduty; }
          }   
          timeelapsed = micros()-columnstart; 
        } 

        
      // Blank column:
      
        for (int y = 0; y < rows; y++) {
          //if (pinState[rowPin0 + y] == HIGH) { 
            digitalLOW(rowPin0 + y);
          //}
        }
      
    }
    else {
      delayMicroseconds(columnduty); // Removing this delay when testing would make the leds brighter when fewer are lit, and could burn them out if we are pulsing them at higher than their rated current.
    }  
    
  }
  
}
  
 
// These functions set pins faster than the digitalWrite() function, and record their current state in the pinState[] array.
// They do not check to make sure you have attempted to set a valid pin, nor do they make sure the pin is not set to PWM mode before attempting to set it.
 
#include "pins_arduino.h"
 
void digitalHIGH(uint8_t pin)
{
        uint8_t bit = digitalPinToBitMask(pin);
      uint8_t port = digitalPinToPort(pin);
      volatile uint8_t *out;

      out = portOutputRegister(port);

      *out |= bit;

        //pinState[pin] = HIGH;
}  

void digitalLOW(uint8_t pin)
{
      uint8_t bit = digitalPinToBitMask(pin);
      uint8_t port = digitalPinToPort(pin);
      volatile uint8_t *out;

      out = portOutputRegister(port);

      *out &= ~bit;

        //pinState[pin] = LOW;
}

This new code gets rid of almost everything in the loop which updates a column. Now I calculate the off time for each led, store that in an array, take the pin number for the led, store that in another array, then sort the first array so the shortest off times are at the beginning, and keep the second array synced with it so that pin numbers in onr array correlate with off times in the other.

Then in the main loop I simply wait until the first off time, check to make sure there aren't any others which are the same, and then wait until the next off time. If I get through all the off times before my time on the column is up, the code waits. Then I move on to the next column.

As with this setup I no longer need to know a pins state, simply turning it on once it's time to do so, and then turning them all off at the end of each column's duty cycle, I commented out the code that keeps track of that in those two functions I put in there whcih write to the pins faster.

Anyway, with this code, there is now almost no flicker, and darker shades no longer get lost or flicker madly when other shades appear in the same array causing a delay which made the code miss its chance to render them. There is only a teeny bit of flicker in the darkest shades, and a little intermittent flicker when displaying one solid dark array. And I think that may be due to the sorting algorithm taking up a bit of time at the start of a column. I used a bubble sort which I heard was fast on small arrays even though it sucks on big ones; and you don't get much smaller than this!