Optimization of code for maximum speed, very specific project.

A while back I built a Christmas lights controller that used Vixen to create the light sequences. That project used mechanical relays and, while it worked fine, it didn't provide any real dimming capabilities. I've since moved on to version 2 of the controller using random-cross SSRs instead of the mechanical relays, the hardware is done, now and I'm working on the software. I'm trying to do full dimming of 32 channels of lights all controlled via the Vixen sequencing software.

I've already done most of what I can think of to speed it up, even before bothering to do real testing on it. (I've already tested the hardware side) My normal rule would be make it work then make it work fast. Unfortunately in this case if it isn't fast it just won't work. :slight_smile: I've put comments in the code explaining why I thought doing certain things would speed it up. I've also put in TODO: comments every place I'm curious about the method I used or think there is a better, faster way to do something.

What I would like is a little help or pointers in the right direction for squeezing every extra cycle possible out of the sketch. I've put the old (version 1) and new (version 2) code up here:

https://code.google.com/p/arduino-christmas-lights-control-system-for-vixen/

The new code is the XmasLightControllerForVixenSSRs.ino download. I'm using 1.0.1 version of the Arduino IDE right now.

The basics of the sketch are:

1> reads 32 channels of data from Vixen via an FT232R breakout board on Serial1, Vixen sends me 32 channels (bytes) every 50ms but this is adjustable. I want to be able to handle the data at the fastest possible speed
2> uses an AC zero-cross detection circuit to trigger an interrupt so I can calculate when in the AC cycle to turn the digital pins on/off (this is the dimming, turn on later in the cycle makes the lights dimmer)
3> uses portions of the Faster Digital Write library to do port manipulations
4> has a random lights mode (you can ignore all the code related to this as it doesn't need to be fast)
5> uses Timer/Counter 1 in CTC mode to keep track of number of ticks (cycles) via an interrupt
6> The methods that have to execute as fast as possible are: loop(), readFromVixen(), ZeroCrossDetected(), ISR(TIMER1_COMPA_vect) - everything else can run slower.
7> uses an ArdunioMega 2560

Any insights you can give would be greatly appreciated. Thanks.

I would have posted the code but it is too long for the forum, thus the link to code.google.com above.

unsigned long time = millis();

...is pointless. millis always returns zero during the RTL initialization.

zparticle:
I would have posted the code but it is too long for the forum, thus the link to code.google.com above.

Files can be attached to posts.

// speed for the comm port for talking with vixen, this is the comm
// port on the FT232R breakout board.
#define VIXEN_COM_SPEED 57600

// speed for talking with the serial monitor in the IDE, this is the
// comm port built into the ardunio with the USB port
#define PC_COM_SPEED 115200

If possible, use higher baud rates. 250000, 500000, 1000000 are good choices.

Thank you. The file is attached here. :slight_smile:

XmasLightControllerForVixenSSRs.ino (46.8 KB)

  randomSeed(analogRead(0));

...in most cases is the same as this...

  randomSeed( 614 );

In any case, randomSeed should be called once in setup. Calling it more often indicates you do not trust the generator. In that case the generator should be replaced with one you do trust.

  delay(RANDOM_MODE_SPEED);
  if(waitForVixenHeader()) // we got the data header now read the channel data
void readFromVixen()

...is where you start optimizing. The most effective way to optimize Arduino code is to get rid of blocking code like delay.

If the ISR / digital I/O code is a bottleneck (I highly doubt it is) using "port manipulation" should allow you to speed it up. You would be able to manipulate eight bits / eight I/O pins with a few machine instructions.

Thanks again. I've been reading this link about the time ISRs take, found a reference to it in another post here. Seems I completely under estimated the cycles it takes just to get into and out of an ISR. Maybe it won't be a problem though.

I'd like to do 255 levels of dimming but I suppose if I find I can't make that happen I can change that to something smaller like 32 or 64. Vixen lets you define the lower and upper bounds of the channel values.

The delay() call shouldn't be a problem, where it is, as it is used in the POST and the random mode which isn't an issue because there is no dimming happening there and there is no reading data from Vixen. So thumbs up on that one. :slight_smile:

So I know you don't have time to go totally in depth but: With the little bit you have been able to look, do you think the code I have in the ISRs is a bad (slow) way to structure it? Sounds like you think that should be fast enough the way it is. I guess my next move is to truly hook this up and see what happens, :stuck_out_tongue:

zparticle:
I guess my next move is to truly hook this up and see what happens,

That's what I would do. :wink:

Just noticed I have the powerDelayPreCalc array backwards, lol. Higher channel values need lower powerDelay values to make them turn on earlier in the AC cycle.

Does v2 run as you expect?

This code fragment, picked out of the array at random, has me thinking not.

TICK_HALF_CYCLE * ( 12 / 255)

'TICK_HALF_CYCLE' is defined as -

#define TICK_HALF_CYCLE 133333

The signed integer literal '133333' is outside the range of an unsigned integer which on the ArdunioMega 2560 is 2 bytes wuth a range of -32766 to 32767.

Next we have the signed integer literal 12 to be divided by the signed integer 255 which should result in 0.

You then want to multiply the out of range 'TICK_HALF_CYCLE' by 0 which should then result in an array entry of 0

It looks to me as if the whole table 'powerDelayPreCalc' should be filled with 0's

Thank you, I hadn't run through most of this code yet. That probably would have had me scratching my head for a bit. :blush:

Attached is the new version.

XmasLightControllerForVixenSSRs.ino (59.8 KB)

How do you know it's too slow? And how much too slow IS it? Can you do the sort of dimming you're looking for, with your existing code, on 16 pins? 2 pins?

Hmm. It looks like the biggest problem you have at the moment is trying to run the timer interrupt too often. Trying to interrupt in fewer cycles than your service routine takes to execute is a recipe for disaster; you'll essentially end up executing one main program instruction, doing the entire ISR, one more main program instruction, etc...

Here are some details and other comments.

There's an intermediate-speed digitalWrite that might be useful. Have arrays indexed by your channel_pin (whatever is sent over the serial port) for Port and bitmask, just like the real digitalWrite does. Except populate them at initialization time instead of every time you want to write to the pin:

void populate_fdw_tab (char myPinNum, arduinoPinNum) {
   fdw_port[myPinNum] = digitalPinToPort(arduinoPinNum);
   fdw_ONbitmask = digitalPinToBitMask(arduinoPinNum);
   fdw_OFFbitmask = ~digitalPinToBitMask(arduinoPinNum);
   pinMode(arduinoPinNum, OUTPUT);
   digitalWrite(arduinoPinNum, LOW);
}
   :
void setup() {
  populate_fdw_tab( 0, 22);  // lights are on pins 22 -52
  populate_fdw_tab( 1, 31);  // and can be mapped "however"
    :
  populate_fdw_tab(31, 53);
}

Then, turning a pin off is just:
*fdw_port[pin] &= fdw_OFFbitmask[pin];
and turning it on is just:
*fdw_port[pin] |= fdw_ONbitmask[pin];

(Hmm. You can probably scratch all that, since you're always dealing with all the pins, and the current fastdigitalWrite is doing single instructions. It would give you faster random/variable access to your pins, but I guess that's unnecessary.)

Similarly, since serial messages are only occasional and not more than every 50ms, while AC cycles happen every 1/120 s (8.3ms),
get rid your calculations of powerDelay[] in ZeroCrossDetected and move them into the serial receive code.

How many brightness levels are you trying to get? Make sure that you're not running your timer interrupt too often, or it will suck up extra cycles in overhead that might be needed elsewhere. (It looks like this is really bad at the moment! This is probably the biggest change that needs to happen!) tickCounter should probably be MUCH smaller than 32bits; ideally you should divide the 8.3mS half-cycle time into exactly N ticks by adjusting the timer prescaling and count, which will give you up to N brightness levels and simplify a LOT of the math that is happening, assuming that N < 256.

This code is interesting:

  if(powerDelay[CHANNEL_PIN_1] <= tickCounter) customDigitalWrite(CHANNEL_PIN_1, HIGH); // turn the channel on here
  if(powerDelay[CHANNEL_PIN_2] <= tickCounter) customDigitalWrite(CHANNEL_PIN_2, HIGH); // turn the channel on here
  if(powerDelay[CHANNEL_PIN_3] <= tickCounter) customDigitalWrite(CHANNEL_PIN_3, HIGH); // turn the channel on here
  if(powerDelay[CHANNEL_PIN_4] <= tickCounter) customDigitalWrite(CHANNEL_PIN_4, HIGH); // turn the channel on here
  if(powerDelay[CHANNEL_PIN_5] <= tickCounter) customDigitalWrite(CHANNEL_PIN_5, HIGH); // turn the channel on here

Because all the array calculations are done at compile time, so it's load memory, load memory, compare, portwrite.
However, the load memory operations are currently 32bits, and since tickCounter is volatile, it is loaded each time (at significant expense.) Replace with something like:

  register uint8_t tick = tickCounter;  // get a local copy!
  if(powerDelay[CHANNEL_PIN_1] <= tick) customDigitalWrite(CHANNEL_PIN_1, HIGH);
  if(powerDelay[CHANNEL_PIN_2] <= tick) customDigitalWrite(CHANNEL_PIN_2, HIGH);
  if(powerDelay[CHANNEL_PIN_3] <= tick) customDigitalWrite(CHANNEL_PIN_3,

HIGH);

You guys have really made my night. I can't tell you how much this helps. I've got to get some sleep but I had to say thanks again. :slight_smile:

westfw:
How do you know it's too slow? And how much too slow IS it? Can you do the sort of dimming you're looking for, with your existing code, on 16 pins? 2 pins?

Well, I don't yet, to tell the truth. :slight_smile: I just know that the faster I can get the better looking the dimming is going to be.

westfw:
Hmm. It looks like the biggest problem you have at the moment is trying to run the timer interrupt too often. Trying to interrupt in fewer cycles than your service routine takes to execute is a recipe for disaster; you'll essentially end up executing one main program instruction, doing the entire ISR, one more main program instruction, etc...

Yep, I'm going to have to play with this but I have changed it to count to 500 for right now.

westfw:
Similarly, since serial messages are only occasional and not more than every 50ms, while AC cycles happen every 1/120 s (8.3ms),
get rid your calculations of powerDelay[] in ZeroCrossDetected and move them into the serial receive code.

Those calcs must have be in an earlier version of the code, right now the ZeroCrossDetected function just turns all of the channels off and then assigns the correct powerDelay values from the pre-computed list. So should be good to go here.

westfw:
How many brightness levels are you trying to get? Make sure that you're not running your timer interrupt too often, or it will suck up extra cycles in overhead that might be needed elsewhere. (It looks like this is really bad at the moment! This is probably the biggest change that needs to happen!) tickCounter should probably be MUCH smaller than 32bits; ideally you should divide the 8.3mS half-cycle time into exactly N ticks by adjusting the timer prescaling and count, which will give you up to N brightness levels and simplify a LOT of the math that is happening, assuming that N < 256.

I'm trying to get 256 levels of brightness, but frankly I'm not sure that level is needed. 64 would probably be enough but I want to shoot for 256. I've changed tickCounter to an unsigned 16bit int. I don't think I'm going to be able to run the IRS fast enough to get down to a byte, but you never know. :slight_smile:

westfw:
This code is interesting:

  if(powerDelay[CHANNEL_PIN_1] <= tickCounter) customDigitalWrite(CHANNEL_PIN_1, HIGH); // turn the channel on here

if(powerDelay[CHANNEL_PIN_2] <= tickCounter) customDigitalWrite(CHANNEL_PIN_2, HIGH); // turn the channel on here
 if(powerDelay[CHANNEL_PIN_3] <= tickCounter) customDigitalWrite(CHANNEL_PIN_3, HIGH); // turn the channel on here
 if(powerDelay[CHANNEL_PIN_4] <= tickCounter) customDigitalWrite(CHANNEL_PIN_4, HIGH); // turn the channel on here
 if(powerDelay[CHANNEL_PIN_5] <= tickCounter) customDigitalWrite(CHANNEL_PIN_5, HIGH); // turn the channel on here



Because all the array calculations are done at compile time, so it's load memory, load memory, compare, portwrite.
However, the load memory operations are currently 32bits, and since tickCounter is volatile, it is loaded each time (at significant expense.) Replace with something like:


register uint8_t tick = tickCounter;  // get a local copy!
 if(powerDelay[CHANNEL_PIN_1] <= tick) customDigitalWrite(CHANNEL_PIN_1, HIGH);
 if(powerDelay[CHANNEL_PIN_2] <= tick) customDigitalWrite(CHANNEL_PIN_2, HIGH);
 if(powerDelay[CHANNEL_PIN_3] <= tick) customDigitalWrite(CHANNEL_PIN_3,


HIGH);

This was VERY helpful, it has been a LONG time since I've had to think about this kind of stuff. I'm now grabbing a local copy.

Part of the reason I'm so focused on speed here is that I have a number of folks that may be building similar projects by following mine. While I'm only going to be using 32 channels with 256 dimming levels and a 50ms frame rate from Vixen, these other folks may be using other settings. The faster I can make this now the more likely the other people will be to not run into major issues, I hope.

Attached is the latest code.

XmasLightControllerForVixenSSRs.ino (59.3 KB)

If speed is very critical, I imagine moving the part that does the interrupt handling and turning on specific lights at specific times off to an outboard processor that does not do interrupts, instead it is dedicated, and just reads the status constantly in the loop. It should do nothing but just a busy poll, and turn on/off lights. It would need some way of getting control updates from the brain processor (another pin perhaps). Otherwise you are going to be spending a lot of time having the interrupt set up the call stack and then undo it. And if that is not fast enough, consider getting a faster processor like an ARM embedded board.

Perhaps you could use a standard 555 type circuit to do the timing, perhaps not.