LED Fire

I got my first Uno a few weeks ago. I've been a web developer for many years, but have never done C/C++ or electronics before.

My first project is to emulate fire with an RGB LED strip on my car. I found some JavaScript on GitHub; my fork of it can be seen at Fire by CaraesNaur. I have since ported it to both Processing and Arduino sketches. I plan to use the Android app I made in Processing to control the heat, intensity, and hopefully speed parameters of the fire on the Arduino via Bluetooth.

Note: the demo shows a large matrix, but the LEDs will only show one row of that, 2-3 rows from the bottom.

The Arduino sketch in its current form can be seen at LED Fire Arduino sketch - Pastebin.com. This version is set up as a speed test; all it does is calculate the canvas values and report how long each iteration of loop() takes to execute. As you can see from the comments in the code, I've already eliminated the float operations (and reduced max_y from 5 to 3), which reduced the iteration time from ~70ms to ~40ms. However, the execution time currently varies from 33ms to 49ms. I suspect most of the variance comes from random_heatspots().

max_x is 40 because that's how many pixels I have room for using a 60/meter strip.

Having read that memcpy() is slow, I spent an afternoon learning about pointers and made a version of this that manipulated pointers into the canvas array rather than use memcpy(), but memcpy() was faster overall.

What's left to do is:

  • Parse serial input coming from Bluetooth (I don't have a shield yet), which will likely be very short commands such as "H25\n" (heat = 25) and "I70\n" (intensity = 70).
  • Send the RGB values out to the LED strip.
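
A minimal sketch of how short commands like "H25" and "I70" could be parsed once a complete line has been read from the serial port. The FireParams struct and the parse_command() name are made up for illustration, and the code assumes the trailing newline has already been stripped and that values are limited to 0..255.

```cpp
#include <stdint.h>
#include <stdlib.h>

// Hypothetical container for the two tunable parameters named above.
struct FireParams {
  uint8_t heat;
  uint8_t intensity;
};

// Parse one command line such as "H25" or "I70" (newline already removed).
// Returns true and updates params only for a known letter followed by a
// decimal value in 0..255; params is left untouched otherwise.
bool parse_command(const char *line, FireParams &params) {
  if (line == nullptr || line[0] == '\0' || line[1] == '\0') {
    return false;  // need a command letter plus at least one digit
  }

  char *end = nullptr;
  long value = strtol(line + 1, &end, 10);

  // Reject non-numeric input, trailing junk, and out-of-range values.
  if (end == line + 1 || *end != '\0' || value < 0 || value > 255) {
    return false;
  }

  switch (line[0]) {
    case 'H':
      params.heat = (uint8_t) value;
      return true;
    case 'I':
      params.intensity = (uint8_t) value;
      return true;
    default:
      return false;  // unknown command letter
  }
}
```

In a sketch, the line would probably be collected one character at a time from Serial into a small char buffer and handed to parse_command() when '\n' arrives, so that loop() never blocks waiting for input.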

I'm looking for ways to squeeze more efficiency out of the existing code, and to stabilize the execution time. If the final, fully-functional code can't be made to iterate in 50ms or less, I feel there's no point in controlling the speed parameter.

If that speed is simply beyond the capabilities of an Uno, then I think my alternative is to switch up to a Mega and have it drive 3 strips of 40 pixels, although I suspect that I wouldn't be fully utilizing a Mega. It kinda makes me wish there was a board somewhere in between, with a clock speed in the 32-64 MHz range.

I have hardware questions also, which don't strictly belong here.

What RGB strip are you using? Link?

How long does your code take to run now? Can't be very long...it's pretty short & simple.

One quick & sure way to cut off some time would be to lose the Serial.print lines, if they're not needed.

Good job using "const"!

Why are other variables global if only used locally inside one function?

Also, counting down & pre-decrementing in for statements is generally faster/easier for Arduino than counting up & post-incrementing

for(i=5;i>=0;--i)
{
  
}

Is typically "better" than

for(i=0;i<=5;i++)
{
}

Let's put the code here, pastebin stuff can disappear. I auto-formatted it for you:

const uint8_t max_y = 3;
const uint8_t max_x = 40;
const uint8_t numcolors = 100;
const uint8_t colorbytes = 3;

uint8_t heat = 32;
uint8_t intensity = 100;
uint8_t focus = 10;
//float cf1 = .75;
//float cf2 = 5.0;
uint8_t cf1 = 16;

int speed = 100; // TODO: use this

// RGB triplets omitted for brevity
uint8_t colors[numcolors][colorbytes] = {
};


uint8_t canvas[max_y][max_x] = {
};
uint8_t maxval = 255;
uint8_t i, j;
uint8_t freshrow[max_x] = {
};
float mn_e = 10000.0, mx_e = 10, cu_e = 10000.0;
bool f = false;


unsigned long pm, cm;

int d = 1000;

void setup() {
  // 1000UL forces unsigned long math; d * 1000 would overflow a 16-bit int on AVR
  pm = micros() - (d * 1000UL);
  randomSeed(analogRead(0));
  Serial.begin(115200);

  init_canvas();
}

void loop() {
  if (f) {
    cm = micros();
    cu_e = (cm - pm) / 1000.0;
    mn_e = min(mn_e, cu_e);
    mx_e = max(cu_e, mx_e);

    Serial.print(cu_e);
    Serial.print(" (");
    Serial.print(mn_e);
    Serial.print(", ");
    Serial.print(mx_e);
    Serial.print(") ");
    Serial.println(mx_e - mn_e);
  }

  Serial.flush();
  f = true;

  pm = cm;

  move_up();
  random_heatspots();
  interpolate_all();
  draw_fire();
}

void random_heatspots () {
  int heatspots = 0;
  int x = 0;

  while (heatspots <= heat) {
    x = random(0, max_x);

    if (canvas[0][x] == 0) {
      canvas[0][x] = int((intensity * (numcolors - 1)) / 100);
      heatspots++;
    }
  }
}

void move_up() {
  // now move each row up
  for (i = max_y - 1; i > 0; i--) {
    // destination, origin, size
    memcpy(canvas[i], canvas[i - 1], sizeof(canvas[i]));
  }

  // copy freshrow into canvas[0]
  memcpy(canvas[0], freshrow, sizeof(freshrow));
}

uint8_t interpolate_point(uint8_t x, uint8_t y) {
  // left, right, below, and above neighbours as {x, y} pairs
  int coords[4][2] = {
    { x - 1, y },
    { x + 1, y },
    { x, y - 1 },
    { x, y + 1 }
  };

  uint8_t oc = min(canvas[y][x], numcolors - 1);
  unsigned long color = oc * focus;
  int neighbours = focus;

  for (i = 0; i < 4; i++) {
    if ((coords[i][0] >= 0) && (coords[i][0] < max_x) && (coords[i][1] >= 0) && (coords[i][1] < max_y)) {
      color += canvas[ coords[i][1] ][ coords[i][0] ];  // add the neighbour's heat value
      neighbours++;
    }
  }

  if (color == 0) {
    return color;
  }

  color /= (int) neighbours;
  int cool = random(0, cf1);
  //	int cool = random(0, (cf1 * cf2));

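  // color is unsigned, so compare directly instead of testing a subtraction that cannot go negative
  if (color <= (unsigned long) cool) {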
    return 0;
  }

  return uint8_t(min(color, numcolors - 1) - cool);
}

void interpolate_all () {
  for (uint8_t x = 0; x < max_x; x++) {
    for (uint8_t y = 0; y < max_y; y++) {
      canvas[y][x] = interpolate_point(x, y);
    }
  }
}

void draw_fire() {
  // TODO: make this send values out to the strip
}
void init_canvas() {
  for(i = 0; i < max_y; i++) {
    for (j = 0; j < max_x; j++) {
      // populate the canvas with zeroes
      canvas[i][j] = 0;
      // populate freshrow with zeroes
      freshrow[j] = 0;
    }
  }
}

1ChicagoDave is right, the Serial.print lines are probably the limiting factor. Try setting up a test which times a block of code and reports the results at the end.

I wrote a profiler that helps with that:

Be cautious about following this advice:

1ChicagoDave:
Also, counting down & pre-decrementing in for statements is generally faster/easier for Arduino than counting up & post-incrementing

for(i=5;i>=0;--i)
{

}

Is typically "better" than

for(i=0;i<=5;i++)
{
}

As the variable "i" is unsigned it will never be less than zero and thus the suggested loop will not exit. I don't think counting up is slow personally, and there is that trap I mentioned if you count down.
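
To illustrate the trap described above, here is a small sketch. The function names are made up for the example. The first function shows one common way to count down safely with an unsigned index; the second shows what happens to a uint8_t when it is decremented past zero.

```cpp
#include <stdint.h>

// Sum the first n elements, visiting indices n-1 down to 0.
// "i-- > 0" compares the old value and then decrements, so the loop stops
// once i has reached 0 instead of relying on i ever being negative.
uint16_t sum_descending(const uint8_t *values, uint8_t n) {
  uint16_t total = 0;
  for (uint8_t i = n; i-- > 0; ) {
    total += values[i];
  }
  return total;
}

// Shows why "i >= 0" is always true for an unsigned 8-bit variable:
// decrementing 0 wraps around to 255.
uint8_t decrement_from_zero() {
  uint8_t i = 0;
  --i;
  return i;
}
```

With a signed loop variable (int8_t or int) the original count-down loop would terminate normally, so the trap only applies when the iterator is declared unsigned, as the global i is in this sketch.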

I would also get rid of all floats if you are interested in speed. Use "scaled" integers (or long integers) instead.
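
As a rough sketch of what "scaled" integers could look like here: the helper names are hypothetical, and the 0.75 factor is only borrowed from the commented-out cf1 in the sketch as an example value.

```cpp
#include <stdint.h>

// Multiply by 0.75 without floats by writing the factor as 3/4.
// The intermediate product is widened to 32 bits so it cannot overflow.
uint16_t scale_three_quarters(uint16_t value) {
  return (uint16_t)(((uint32_t) value * 3) / 4);
}

// Keep timings as integer hundredths of a millisecond instead of a float
// number of milliseconds: 35490 us becomes 3549, i.e. 35.49 ms.
uint32_t micros_to_centi_ms(uint32_t elapsed_us) {
  return elapsed_us / 10;
}

// Split a value in hundredths of a millisecond into the two parts needed
// to print it as "whole.fraction" using integer prints only.
void split_centi_ms(uint32_t centi_ms, uint32_t &whole, uint8_t &fraction) {
  whole = centi_ms / 100;
  fraction = (uint8_t)(centi_ms % 100);
}
```

Integer division truncates, so results can be off by less than one unit of the chosen scale; picking a finer scale (hundredths instead of whole milliseconds) is the usual way to keep that error small.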

@Dave:

I don't know which RGB strip to get, or the best way to drive it (I2C, shift registers, etc). I've been researching them, but I admit I don't fully understand all the details. I've read about 2801, 2803, and 2811 strips, and found conflicting info about whether an Uno can provide proper timing for 2811's, especially when the rest of my code is processor-intensive.

The Serial commands that are still executed are just the ones I use to report the time of each iteration of loop() so I could see it in the serial console. I'll reply later with timing output for iteration sets of 1000 or more without the Serial commands. The current code speed is in the original post, ~33ms to ~49ms.

Pretty much everything except heat, intensity, speed, canvas, and the iterators (i, j, x, y) could be const, and probably will be when the code is finished. I'll also clean up the variable scope by then. I ran out of RAM in one of my other sketches, so I quickly learned to use uint8_t instead of int. Although I do wonder if I could use byte instead and still not have to worry about having to cast them (more overhead).

@Gammon:

I will see what your profiler gives me, thanks for the link.

I put the code on Pastebin to be sure I wouldn't violate the forum rules... this is my first post. That's also why I omitted the values in canvas.

I'll probably leave my for loops alone.

The only float left and not commented out is the one for storing the timing results, so that will go away.

Thanks for the replies, guys... keep 'em coming.

Casting isn't overhead necessarily. It just tells the compiler what you want.

I modified the sketch a bit so it tracks time without floats and spits out serial traffic only on dedicated iterations of loop(), every 1000 "real" iterations: (milliseconds)

Min    Max    Var    Avg
31.85  41.87  10.02  35.49
31.70  41.89  10.19  35.48
32.12  40.88   8.76  35.45
32.11  42.18  10.07  35.44
31.98  41.50   9.51  35.43
31.97  41.73   9.76  35.41
31.83  40.30   8.47  35.46
31.55  42.87  11.32  35.41
31.98  41.31   9.33  35.53
31.97  41.32   9.35  35.39
31.98  40.61   8.63  35.43
31.71  42.61  10.90  35.35
32.12  41.44   9.32  35.38
31.68  41.74  10.06  35.44
31.97  40.74   8.76  35.46
31.56  41.04   9.49  35.40
31.70  41.86  10.16  35.42
31.25  40.77   9.52  35.43
32.11  42.03   9.92  35.30
31.68  42.88  11.20  35.47
31.84  40.88   9.04  35.46
31.67  41.87  10.20  35.43
31.57  40.46   8.90  35.36
31.98  44.42  12.44  35.47
31.70  41.32   9.62  35.44
31.83  40.74   8.91  35.41
32.10  40.88   8.78  35.40
31.95  40.90   8.94  35.36
31.82  40.76   8.94  35.44
31.83  41.30   9.47  35.42
31.69  42.42  10.73  35.43
32.12  44.31  12.19  35.46
31.96  40.17   8.21  35.40
31.83  41.88  10.05  35.54
31.95  41.17   9.22  35.47
32.12  41.73   9.61  35.44
32.40  41.46   9.06  35.45
32.25  42.02   9.76  35.41
32.42  41.58   9.16  35.46
31.96  41.29   9.32  35.43

So it can run reliably in under 50ms (Serial output was adding 6-8ms). The averages are tight, but I'd like to minimize the variance. If I can get this to run reliably in under 35ms, I hope the extra tasks of capturing Bluetooth input and sending output to the LED strip can be done in under 15ms.
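
The float-free tracking described above could look roughly like the following. This is a sketch, not the code that produced the table: LoopStats and its helpers are invented names, and the elapsed time is assumed to come from subtracting two micros() readings.

```cpp
#include <stdint.h>

// Running statistics for loop() iteration times, all in microseconds.
struct LoopStats {
  uint32_t min_us;
  uint32_t max_us;
  uint32_t total_us;
  uint16_t count;
};

// Start a fresh measurement window.
void stats_reset(LoopStats &s) {
  s.min_us = 0xFFFFFFFFUL;  // larger than any real sample
  s.max_us = 0;
  s.total_us = 0;
  s.count = 0;
}

// Record one iteration. Returns true once report_every samples have been
// collected, which is the caller's cue to print and reset.
bool stats_add(LoopStats &s, uint32_t elapsed_us, uint16_t report_every) {
  if (elapsed_us < s.min_us) s.min_us = elapsed_us;
  if (elapsed_us > s.max_us) s.max_us = elapsed_us;
  s.total_us += elapsed_us;
  s.count++;
  return s.count >= report_every;
}

// Integer average of the samples collected so far (0 if there are none).
uint32_t stats_average(const LoopStats &s) {
  return s.count ? s.total_us / s.count : 0;
}
```

With 1000 samples of around 35000 us each, total_us stays near 35 million, well inside the range of a 32-bit unsigned value. Printing only when stats_add() returns true keeps the Serial overhead out of the timed iterations.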

Next is to profile each of my functions, beginning with random_heatspots(), which I'm sure is where the variance is coming from.

What would I gain by moving the functions' code directly into loop()?

I ran Gammon's ProfileTimer on my functions:

Function            Min uS  Max uS  Var
random_heatspots      6336   17216  10880
move_up                 68      76      8
interpolate_all      25308   25520    212

As I suspected, my variance was coming from random_heatspots(), so I wrote another version of it:

void random_heatspots2 () {
	int x = 0;

	for (i = 0; i < max_x; i++) {
		x = random(0, max_x);
		canvas[0][i] = (x < heat) ? int((intensity * (numcolors - 1)) / 100) : 0;
	}
}

which also made the last call to memcpy() in move_up() unnecessary. The result:

Function            Min uS  Max uS  Var
random_heatspots2     5860    6492    632

On average, that is faster than the fastest execution of random_heatspots(). This leaves me with a combined total execution time of 31236 to 32088 uS, a variance of only 852 uS. Interestingly, assignment via the ternary operator takes about 500 uS... need to optimize that line somehow.

Compare these results of 1000 iteration sets with those in post #7: (milliseconds)

Min    Max    Var   Avg
31.37  33.98  2.62  31.53
31.38  33.79  2.41  31.52
31.33  33.83  2.50  31.52
31.29  33.85  2.56  31.52
31.31  33.74  2.43  31.52
31.27  33.74  2.47  31.52
31.36  33.80  2.44  31.52
31.30  33.80  2.50  31.52
31.38  33.82  2.44  31.53
31.39  33.88  2.49  31.52
31.41  33.79  2.38  31.53
31.31  33.78  2.46  31.53
31.26  33.83  2.57  31.52
31.40  33.86  2.46  31.52
31.38  33.85  2.47  31.52
31.34  33.80  2.46  31.53
31.36  33.83  2.46  31.52
31.38  33.78  2.39  31.53
31.40  33.77  2.38  31.53
31.35  33.81  2.46  31.52
31.24  33.79  2.55  31.52
31.31  33.74  2.43  31.52
31.36  33.89  2.53  31.52
31.27  33.83  2.56  31.52
31.36  33.81  2.46  31.52
31.27  33.85  2.58  31.52
31.38  33.87  2.49  31.52
31.31  33.82  2.51  31.52
31.30  33.73  2.43  31.52
31.30  33.79  2.49  31.52
31.36  33.77  2.41  31.52
31.31  33.79  2.48  31.52
31.32  33.85  2.53  31.52
31.35  33.80  2.45  31.52
31.36  33.84  2.49  31.53
31.37  33.76  2.38  31.52
31.35  33.78  2.43  31.52
31.38  33.84  2.46  31.52
31.36  33.84  2.48  31.52
31.33  33.70  2.37  31.52

Plenty consistent for my purposes, and ~18 ms should be more than enough headroom (to my goal of 50 ms) to complete the other not-yet-implemented tasks.

So I'll be implementing random_heatspots2() in the JS and Processing versions of the code.

Actually, I just realized another way to stabilize random_heatspots2() (calculate the true side of the ternary before the loop), but that improvement won't be nearly as dramatic.
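
A sketch of that idea, with the "true" value computed once before the loop. The function name random_heatspots3 is hypothetical, and so is the RandomFn parameter, which stands in for Arduino's random(0, max) so the logic can be exercised off the board; in the real sketch it would simply call random(0, max_x) and write to canvas[0].

```cpp
#include <stdint.h>

const uint8_t max_x = 40;
const uint8_t numcolors = 100;

// Stand-in for Arduino's random(0, max): returns a value in 0..max-1.
typedef long (*RandomFn)(long max_exclusive);

// Fill one row with heat spots. Each pixel becomes "hot" when the random
// draw is below heat, and 0 otherwise.
void random_heatspots3(uint8_t row[max_x], uint8_t heat, uint8_t intensity,
                       RandomFn rnd) {
  // Computed once per call instead of once per pixel.
  const uint8_t hot =
      (uint8_t)(((uint16_t) intensity * (numcolors - 1)) / 100);

  for (uint8_t i = 0; i < max_x; i++) {
    row[i] = (rnd(max_x) < heat) ? hot : 0;
  }
}
```

The compiler may already hoist the constant expression on its own, so the gain is likely to be small, as expected; profiling before and after is the only way to know.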

Thanks for clarifying that one. The unsigned issue makes sense. I think actually I ran into trouble with that once, but I just attributed it to something else.