best practice: variable declaration

Hi - minor question here about declaring variables.

Say I use a 'for' loop frequently in my sketch and need an index var 'i' (or whatever). Is there any benefit (speed/memory) to declaring it as a global rather than as a local (i.e. declaring it each time the function runs)?

There is no performance benefit to declaring it globally, and a possibly severe maintainability/readability problem with declaring it globally.

The scope of a variable should be as limited as is practical. This increases readability (and therefore maintainability) of the code, and reduces the likelihood of introducing an error when changes are being made to the code (especially at some later date, or by someone other than the original author).

Example: in your loop() function, you have a for loop with your global index i. The body of this for loop calls function foo(). While adding some functionality to foo(), you add a for loop using i as an index. Your code goes nuts, and subsequently you go nuts trying to figure out why loop() isn’t working any more.
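A minimal sketch of that failure mode (foo() is just a hypothetical helper here, not code from anyone's sketch):

int i;   // shared global index

void foo() {
  for (i = 0; i < 5; i++) {
    // ... do something ...
  }
  // foo() returns with i == 5, silently clobbering the caller's index
}

void loop() {
  for (i = 0; i < 100; i++) {
    foo();   // every call resets i, so this loop never gets past 6
  }
}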

There could be a slight memory utilization improvement in making variables (variables in general, not just one) local instead of global.

-j

One benefit of making it local is that the compiler won't save the value to RAM unnecessarily: it can be smart enough to know you don't need the final value once the for loop is done, so it will probably just keep the variable in a register, which is faster than reading it from and writing it to RAM.
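A minimal sketch of the idea: declaring the index inside the for statement itself limits its scope to the loop and leaves the compiler free to keep it in a register.

void loop() {
  // i exists only inside this loop, so the compiler can keep it in a
  // register and never has to write it back to RAM
  for (int i = 0; i < 100; i++) {
    // ... loop body ...
  }
  // i is out of scope here; nothing about it needs to be preserved
}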

One thing that will improve performance is to use appropriate data types for your loop variables. If you are only looping 10 times, don't declare i as an int (a two-byte value); declare it as a char or unsigned char (a one-byte value).

Unless the compiler is smarter than I give it credit for, the following:

int i;
for (i = 0; i < 100; i++)

should be slower than:

char i;
for (i = 0; i < 100; i++)

  • Ben

Compilers are scary-smart sometimes. A couple of years ago I was looking at compiler optimization in some test code that did a bunch of vector operations. When I turned the compiler optimization all the way down to "no optimization" it ran very slowly, but generated assembly that was basically identical to the C source in structure. When I cranked optimization all the way up to the "danger, don't use this, strange things may happen" setting, my code executed in zero time. Turns out the compiler decided since I wasn't saving the results of all that math, there really wasn't any point in doing the work, so it completely gutted my program. :)

The way I was taught to code (and it's worked out well for me) is to write code that is first and foremost correct and easy to understand, and don't worry about performance. If you find out later that performance is a problem, only then should you go back and look at optimizing your source. Microcontrollers are resource constrained enough that we hit that point sooner than on a general purpose microcomputer, but the compilers still help us out a lot.

I suspect that, with typical compiler optimization, Ben's example would result in at most a few more instructions per loop. If I were a betting man, I'd bet that the compiler would optimize it down to the point where the code is identical. I'm too lazy to run a test and see, though. :)

-j

Compilers are scary-smart sometimes. A couple of years ago I was looking at compiler optimization in some test code that did a bunch of vector operations. When I turned the compiler optimization all the way down to "no optimization" it ran very slowly, but generated assembly that was basically identical to the C source in structure. When I cranked optimization all the way up to the "danger, don't use this, strange things may happen" setting, my code executed in zero time. Turns out the compiler decided since I wasn't saving the results of all that math, there really wasn't any point in doing the work, so it completely gutted my program. :)

I've had similar problems with compilers optimizing away delay loops and debug code because it didn't think they did anything useful. I would toss in a variable to track the current state of something, but since I didn't do anything with it (my goal was to track it with the debugger) the compiler decided it wasn't useful.
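A generic illustration of that behavior (made-up function names, not code from either project); marking the variable volatile is the usual way to stop the compiler from discarding work it can't see a use for:

// With optimization on, this "delay" can be removed entirely, because the
// loop computes nothing that is ever used:
void naiveDelay() {
  for (unsigned int i = 0; i < 10000; i++)
    ;   // do nothing
}

// Declaring the counter volatile forces the compiler to perform every
// iteration (on Arduino, delay()/delayMicroseconds() are usually the
// better tool, but the same trick keeps debug variables alive too):
void stubbornDelay() {
  for (volatile unsigned int i = 0; i < 10000; i++)
    ;
}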

I suspect that, with typical compiler optimization, Ben's example would result in at most a few more instructions per loop. If I were a betting man, I'd bet that the compiler would optimize it down to the point where the code is identical. I'm too lazy to run a test and see, though. :)

I'm also usually too lazy to compare various bits of code at the assembly level, which is why I try to help out the compiler as much as possible by using the smallest data types necessary and by using temporary variables to hint at which values should be put into registers and when (sometimes the compiler isn't so smart about this).
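For instance, a hypothetical sketch of the temporary-variable trick (all names made up): pulling a value that's used on every iteration into a local before the loop makes it obvious to the compiler that it belongs in a register.

volatile unsigned char threshold = 128;   // hypothetical global, e.g. set by an ISR

unsigned char countAbove(const unsigned char *samples, unsigned char n) {
  unsigned char limit = threshold;        // read the global once into a temporary
  unsigned char count = 0;
  for (unsigned char i = 0; i < n; i++) {
    if (samples[i] > limit)               // compare against the register copy
      count++;
  }
  return count;
}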

In general, the impact on code size and speed of using a char instead of an int for a loop variable will probably be negligible for 99.9% of applications, but it can be useful for people who need maximum performance, and it's a good thing to at least be aware of as you're programming. Working with an int is usually around twice as slow as working with a char, so you can realize substantial savings depending on what your program is doing with the data.

  • Ben

I’d bet that the compiler would optimize it down to the point where the code is identical
You would lose that bet. The compiler produced 25% more instructions in the loop using ints than the loop with a byte.

I do agree completely with j’s point that code should be written to be clear and easy to understand, and that you should only worry about optimization if it proves to matter.

That said, using a byte for a local variable where appropriate in a loop seems good advice to me.

  int i; 
  for (i = 0; i < 100; i++) 
    Serial.print('i'); 
 504:      c0 e0             ldi      r28, 0x00      ; 0
 506:      d0 e0             ldi      r29, 0x00      ; 0
 508:      69 e6             ldi      r22, 0x69      ; 105
 50a:      81 e8             ldi      r24, 0x81      ; 129
 50c:      91 e0             ldi      r25, 0x01      ; 1
 50e:      0e 94 de 03       call      0x7bc      ; 0x7bc <_ZN14HardwareSerial5printEc>
 512:      21 96             adiw      r28, 0x01      ; 1
 514:      c4 36             cpi      r28, 0x64      ; 100
 516:      d1 05             cpc      r29, r1
 518:      b9 f7             brne      .-18           ; 0x508 <__stack+0x9>

char ii; 
for (ii = 0; ii < 100; ii++) 
    Serial.print('i');  
 51a:      10 e0             ldi      r17, 0x00      ; 0
 51c:      69 e6             ldi      r22, 0x69      ; 105
 51e:      81 e8             ldi      r24, 0x81      ; 129
 520:      91 e0             ldi      r25, 0x01      ; 1
 522:      0e 94 de 03       call      0x7bc      ; 0x7bc <_ZN14HardwareSerial5printEc>
 526:      1f 5f             subi      r17, 0xFF      ; 255
 528:      14 36             cpi      r17, 0x64      ; 100
 52a:      c1 f7             brne      .-16           ; 0x51c <__stack+0x1d>

Not 25% inside the loop, but about 14% (8 instructions in the int loop vs. 7 in the char loop). The int loop also has 2 initialization instructions (high and low bytes for the int, I assume), compared to 1 initialization for the char. Check the branch target (the comment gives the actual address) to see where the actual looping part of the code starts.

So yeah, I would have lost the bet (which is why I don't bet), but it isn't too terribly different.

-j

fun compiler observation - why subtract 255 instead of adding 1 to the char?

thanks for all the interesting replies - nice to get a glimpse at how the compiler sees the code etc.

About the initial question - declaring a reused var global instead of redeclaring it every time it is used. I understand and agree that it makes more sense, for readability and for avoiding conflicts, to declare it in the function where it is used. I didn't know, though, whether a variable declaration "costs" much in terms of performance (like, say, using floats instead of integers "costs" more), so I thought I would ask.

tx!

--Roy

My statement was that the compiler produced 25% more instructions in the int version, which is true. As is your statement that the int version executes about 14% more instructions inside the loop.

;) :)

Roy, using ints where bytes would do doesn't cost anything like as much as using floats where ints would do.

I didn't know though if a variable declaration "costs" much in terms of performance (like say using floats instead of integers "costs" more), so I thought I would ask.

A local variable declaration itself doesn't cost you anything. When you first use the variable, the compiler will keep its value in registers, spilling it to the stack only if it needs to free up registers for other things. If you only need the value locally, it's definitely better to use a local variable.

  • Ben

Nope, no cost in terms of execution time to allocate. There is the cost of the space, but when a function starts all the space it needs for declared variables is allocated at once, along with some overhead for the function itself (which means you pay the cost even if there are no local variables). There's no performance cost in allocating X bytes vs. Y bytes.

As Ben points out, it may even be cheaper performance-wise because a local variable may live its entire life in registers and never be stored in RAM with that corresponding delay.

-j

Great info! Thanks. I'm not doing anything terribly timing/speed dependent - just thinking about good habits/practices etc. (though I wonder about some ISRs I am using in a motor control routine - not sure how to test those since millis() doesn't update - guess as a test I could call the same ISR from loop() and see how long it takes).

At any rate, I've been wondering about these little issues for a while so it's good to have this perspective.

--Roy

I didn’t know though if a variable declaration “costs” much in terms of performance (like say using floats instead of integers “costs” more), so I thought I would ask.

But bear in mind that when an expression uses floating-point math, the cost can go up considerably compared to an equivalent expression using integer math.

float f;

for (f = 0; f < 50; f += 0.5)
  Serial.print((int)(f * 2));

    50a:      0f 2e            mov      r0, r31
    50c:      f0 e0            ldi      r31, 0x00      ; 0
    50e:      ef 2e            mov      r14, r31
    510:      f0 e0            ldi      r31, 0x00      ; 0
    512:      ff 2e            mov      r15, r31
    514:      f0 e0            ldi      r31, 0x00      ; 0
    516:      0f 2f            mov      r16, r31
    518:      f0 e0            ldi      r31, 0x00      ; 0
    51a:      1f 2f            mov      r17, r31
    51c:      f0 2d            mov      r31, r0
    51e:      c0 e0            ldi      r28, 0x00      ; 0
    520:      d0 e0            ldi      r29, 0x00      ; 0
    522:      a8 01            movw      r20, r16
    524:      97 01            movw      r18, r14
    526:      c8 01            movw      r24, r16
    528:      b7 01            movw      r22, r14
    52a:      0e 94 03 07      call      0xe06      ; 0xe06 <__addsf3>
    52e:      0e 94 45 07      call      0xe8a      ; 0xe8a <__fixsfsi>
    532:      81 e8            ldi      r24, 0x81      ; 129
    534:      91 e0            ldi      r25, 0x01      ; 1
    536:      0e 94 87 04      call      0x90e      ; 0x90e <_ZN14HardwareSerial5printEi>
    53a:      20 e0            ldi      r18, 0x00      ; 0
    53c:      30 e0            ldi      r19, 0x00      ; 0
    53e:      40 e0            ldi      r20, 0x00      ; 0
    540:      5f e3            ldi      r21, 0x3F      ; 63
    542:      c8 01            movw      r24, r16
    544:      b7 01            movw      r22, r14
    546:      0e 94 03 07      call      0xe06      ; 0xe06 <__addsf3>
    54a:      7b 01            movw      r14, r22
    54c:      8c 01            movw      r16, r24
    54e:      21 96            adiw      r28, 0x01      ; 1
    550:      c4 36            cpi      r28, 0x64      ; 100
    552:      d1 05            cpc      r29, r1
    554:      31 f7            brne      .-52          ; 0x522 <__stack+0x23>

char ii;
for (ii = 0; ii < 100; ii++)
  Serial.print(ii);

    556:      c0 e0            ldi      r28, 0x00      ; 0
    558:      d0 e0            ldi      r29, 0x00      ; 0
    55a:      6c 2f            mov      r22, r28
    55c:      81 e8            ldi      r24, 0x81      ; 129
    55e:      91 e0            ldi      r25, 0x01      ; 1
    560:      0e 94 01 04      call      0x802      ; 0x802 <_ZN14HardwareSerial5printEc>
    564:      21 96            adiw      r28, 0x01      ; 1
    566:      c4 36            cpi      r28, 0x64      ; 100
    568:      d1 05            cpc      r29, r1
    56a:      b9 f7            brne      .-18          ; 0x55a <__stack+0x5b>

The mega168 hardware timers continue to run even while an ISR is executing, so if you want to know how long your ISR takes you can store the value of TCNT0 at the start of your ISR and subtract that stored value from TCNT0 at the end. This won’t tell you exactly how long the ISR takes, because it misses the register pushes and pops as well as the time spent getting into and returning from the ISR, but it will at least give you a measure of how long the body of your ISR takes:

volatile unsigned char ISRTime;   // global: timer0 ticks spent in the ISR body

ISR(something_vect)
{
  unsigned char startTime = TCNT0;   // timer0 count on entry

  // ... ISR body goes here ...

  ISRTime = TCNT0 - startTime;       // ticks elapsed during the body
}

You can convert ISRTime into a measure of time by noting which prescaler timer0 is using. I think it’s 64, which means it ticks at 16 MHz / 64 = 250 kHz, or 4 µs per tick (you can easily check this by looking at init() in wiring.c). The only way this wouldn’t work is if your ISR takes longer than 255 ticks of timer0.
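A minimal usage sketch, assuming those numbers (16 MHz clock, prescaler of 64, so 4 µs per tick):

void loop() {
  // ISRTime is a single byte, so reading it is atomic on the AVR
  Serial.print("ISR body: ");
  Serial.print(ISRTime * 4);   // ticks -> microseconds at 4 us per tick
  Serial.println(" us");
  delay(1000);
}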

  • Ben