[Bare C] Passing an argument (function) quintuples the size of my binary sketch

I've been playing with bare C, PORTs, bits, multiple-file sources, etc. for the last few days.

I came across a behavior that surprised me a bit.

allume(1000);

with

void allume(int time){
  PORTB |= 1<<5;
  _delay_ms(time);
}

makes my code 5 times bigger than:

allume();
_delay_ms(1000);

with

void allume(){
  PORTB |= 1<<5;
}

Does passing an argument pull in a big chunk of a library?

For your reference, here is a small code:

//#include <avr/io.h>
#include <util/delay.h>

void allume(); // Not necessary in practice
void eteint(); // Not necessary in practice


int main (void) {
 /* set pin 5 of PORTB for output*/
 DDRB |= 1<<5;
 
 while(1) {
  allume();
  _delay_ms(1000);
 
  eteint();
  _delay_ms(1000);
  }
}


void allume(){
  PORTB |= 1<<5;
}

void eteint(){
  PORTB &= ~(1<<5);
}

(Binary is 212 B)

And a fat one:

//#include <avr/io.h>
#include <util/delay.h>

void allume(int time); // Not necessary in practice
void eteint(); // Not necessary in practice


int main (void) {
 /* set pin 5 of PORTB for output*/
 DDRB |= 1<<5;
 
 while(1) {
  allume(1000);
  
 
  eteint();
  _delay_ms(1000);
  }
}


void allume(int time){
  PORTB |= 1<<5;
  _delay_ms(time);
}

void eteint(){
  PORTB &= ~(1<<5);
}

(1038 B)
If I do the same to eteint(), size goes up to 1194 B.

I don't think I have - yet - the knowledge to understand it myself.

I suspect in some cases the compiler is not bothering to include your function at all because it sees that it is not doing anything useful.

...R

void allume(int time){
  PORTB |= 1<<5;
  _delay_ms(time);
}

_delay_ms() MUST be used with a constant argument...
http://www.nongnu.org/avr-libc/user-manual/group__util__delay.html

The argument to _delay_ms is actually a floating point number, but the compiler normally takes care of everything so the floating point library is not loaded (unless required elsewhere in the program). I'll bet that passing the time argument is causing that library to be included, so your code size increases dramatically.

Robin2:
I suspect in some cases the compiler is not bothering to include your function at all because it sees that it is not doing anything useful.

Does that make a difference once converted to assembly code?

Robin2:
_delay_ms() MUST be used with a constant argument...
avr-libc: <util/delay.h>: Convenience functions for busy-wait delay loops

Does that mean this one should be smaller, then?

void allume(const int TIME){
  PORTB |= 1<<5;
  _delay_ms(TIME);
}

(Precisely the same 1038 B)

jremington:
The argument to _delay_ms is actually a floating point number, but the compiler normally takes care of everything so the floating point library is not loaded (unless required elsewhere in the program). I'll bet that passing the time argument is causing that library to be included, so your code size increases dramatically.

Would it be possible to pass an argument to delay without including this library?

It looks like westfw is right. From the page he quoted:

In order for these functions to work as intended, compiler optimizations must be enabled, and the delay time must be an expression that is a known constant at compile-time. If these requirements are not met, the resulting delay will be much longer (and basically unpredictable), and applications that otherwise do not use floating-point calculations will experience severe code bloat by the floating-point library routines linked into the application.

I tried making the argument const but that didn't help. The resulting object file had a lot of floating-point stuff included.

In the small version the compiler has figured out that it can inline all of your code. There are no calls; it's all there in main(). In the bigger version it pulled in a whole ream of floating-point code. Apparently it couldn't figure out an easier way.

I wonder why _delay_ms() takes a double type? That doesn't seem very useful to me. I replaced those lines with calls to the Arduino function delay(), which uses an unsigned long. The sketch size was still 540 bytes.

You could write your own delay code.

In your case it doesn't matter, right? You are flashing an LED once a second. So what if you have spare program memory left over? It is the nature of the way linking works that as you include extra stuff (e.g. serial comms) the sketch jumps in size because of the libraries.

Read: Why does it take 1000 bytes to blink one LED?

I reduced your sketch size to 662 bytes by using the "normal" delay:

//#include <avr/io.h>
#include <util/delay.h>

void allume(const int time); // Not necessary in practice
void eteint(); // Not necessary in practice


int main (void) {
  
 init ();
 
 /* set pin 5 of PORTB for output*/
 DDRB |= 1<<5;
 
 while(1) {
  allume(1000);
  
 
  eteint();
  delay(1000);
  }
}


void allume(const int time){
  PORTB |= 1<<5;
  delay(time);
}

void eteint(){
  PORTB &= !(1<<5);
}

jboyton:
I replaced those lines with calls to the Arduino function delay(), which uses an unsigned long. The sketch size was still 540 bytes.

You must have done a better job than me. :-)

Or did you not call init()? Ah yes, I suspect that was it. You need init() or the timers won't be running.

If all you want to do is flash an LED using very little program memory this sketch does it in 222 bytes:

const int STROBE_FREQ = 2000;     // flash period in milliseconds
const unsigned long countTo = ((float) F_CPU / 1024.0) / (1000.0 / STROBE_FREQ); 

int main (void) 
  {
  // D10 to output
  bitSet (DDRB, 2);

  // Fast PWM top at OCR1A
  TCCR1A = bit (WGM10) | bit (WGM11); // fast PWM
  TCCR1B = bit (WGM12) | bit (WGM13) | bit (CS12) | bit (CS10);   // fast PWM, prescaler of 1024
  OCR1A = countTo - 1;                 // zero relative 
  OCR1B = (countTo / 2) - 1;           // 50% duty cycle
  bitSet (TCCR1A, COM1B1);   // clear OC1B on compare
  }

Plus it has the advantage of using a hardware timer, so you can be doing something else while it flashes.

Note that it flashes D10 not D13, because that is where the hardware timer output is.

void allume(const int TIME){

Not a C "const type"; an actual compile-time constant.

I wonder why _delay_ms() takes a double type? That doesn't seem very useful to me.

It allows you to do arbitrarily complex calculations without running into integer math truncation errors. Like:

#define BAUD_RATE 38400
#define BIT_TIME (1000.0/BAUD_RATE)
// We've seen a start bit transition.  Wait 1.5 bit times for middle of the first bit
    _delay_ms(1.5*BIT_TIME);

But the value is resolved AT COMPILE TIME to an integer number of clock cycles.

Would it be possible to pass an argument to delay without including this library?

Usually you do something like:

void allume(int time){
  PORTB |= 1<<5;
  while (time--)
    _delay_ms(1.0);
}

Note that one millisecond at 16MHz is some 16000 cycles, so the added overhead of sticking a loop around it is pretty insignificant.

I assume it doesn't matter in this particular case, that you're just curious about the magic that the compiler performs. Anyway, here's one solution to this somewhat imaginary problem. This sketch is 266 bytes.

More generally I find that it is sometimes advantageous to eliminate calls to system or library functions, when space is at a premium that is. As you discovered, it can be quite an eye-opener how much stuff gets included by what appears to be an innocuous addition to a sketch.

//#include <avr/io.h>
#include <util/delay.h>

void allume(int time); // Not necessary in practice
void eteint(); // Not necessary in practice
void delay_milliseconds(int time); // needed: called in main() before its definition


int main (void) {
 /* set pin 5 of PORTB for output*/
 DDRB |= 1<<5;
 
 /* start Timer0 with a prescaler of 64 (4 us per tick at 16 MHz),
    otherwise TCNT0 never changes and delay_milliseconds() hangs */
 TCCR0B = (1<<CS01) | (1<<CS00);
 
 while(1) {
  allume(1000);
  
 
  eteint();
  delay_milliseconds(1000);
  }
}


void allume(int time){
  PORTB |= 1<<5;
  delay_milliseconds(time);
}

void eteint(){
  PORTB &= ~(1<<5);
}

void delay_milliseconds(int time) {
  uint8_t t, start, diff;
  uint16_t count;
  
  /* Assumes Timer0 runs with a prescaler of 64, i.e. one tick every
     4 us at 16 MHz, so one millisecond is 1000/4 = 250 ticks. The
     uint8_t subtraction below stays correct even when TCNT0 wraps
     from 255 back to 0. */
  start = TCNT0;
  count = 0;
  while (time > 0) {
    t = TCNT0;
    diff = t - start;
    count += diff;
    start = t;
    if (count >= 1000/4) {
      count -= 1000/4;
      time--;
    }    
  }
}

jboyton:
In the small version the compiler has figured out to put all of your code inline. There are no calls, it's all there in main(). In the bigger version it pulled in a whole ream of floating point code. It couldn't figure out an easier way apparently.

Then, would another compiler or another set of parameters be able to optimize the fat sketch?

jboyton:
You could write your own delay code.

Me? No, I couldn't :-D

Nick Gammon:
If all you want to do is flash an LED using very little program memory this sketch does it in 222 bytes

All I want to do is understand how this stuff works.

jboyton:
I assume it doesn't matter in this particular case, that you're just curious about the magic that the compiler performs.

Absolutely.

jboyton:
Anyway, here's one solution to this somewhat imaginary problem. This sketch is 266 bytes.

I'm just starting to read about internal interrupts

jboyton:
More generally I find that it is sometimes advantageous to eliminate calls to system or library functions, when space is at a premium that is. As you discovered, it can be quite an eye-opener how much stuff gets included by what appears to be an innocuous addition to a sketch.

That's my point, yes.

//#include <avr/io.h>
#include <util/delay.h>

void allume(); // Not necessary in practice
void eteint(); // Not necessary in practice


int main (void) {
 /* set pin 5 of PORTB for output*/
 DDRB |= 1<<5;
 
 while(1) {
  allume();
  
 
  eteint();
  }
}


void allume(){
  PORTB |= 1<<5;
  _delay_ms(1000);
}

void eteint(){
  PORTB &= ~(1<<5);
  _delay_ms(1000);
}

Is 212 bytes as well.

Somehow, that made me want to try pointers:

//#include <avr/io.h>
#include <util/delay.h>

const double TIME = 1000;

void allume(const double *temps);
void eteint(const double *temps);


int main (void) {
 /* set pin 5 of PORTB for output*/
 DDRB |= 1<<5;
 
 while(1) {
  allume(&TIME);
  
 
  eteint(&TIME);
  }
}


void allume(const double *temps){
  PORTB |= 1<<5;
  _delay_ms(*temps);
}

void eteint(const double *temps){
  PORTB &= ~(1<<5);
  _delay_ms(*temps);
}

1064 bytes...

All I want to do is understand how this stuff works.

Well then, stop trying to GUESS what the compiler is doing, and start looking at the assembly language that the compiler produces:

avr-objdump -SC file.elf

(there's no need to be completely competent at the assembly language just to LOOK at it. For example, those cases where code is completely deleted because "it doesn't do anything" should be pretty obvious. Those cases where it calls floating point math will be obvious. Etc.)

westfw:
Well then, stop trying to GUESS what the compiler is doing, and start looking at the assembly language that the compiler produces:

avr-objdump -SC file.elf

That's just looking at what it did.

If you've figured out how to reliably predict compiler optimizations please post the details.

Then, would another compiler or another set of parameters be able to optimize the fat sketch?

This compiler actually does an excellent job of optimizing away code it thinks is not needed. This is all hypothetical with your "blink an LED" sketch. However, as a general rule, pulling in floating-point maths for something as simple as calculating how many microseconds to delay is a bad idea.

Read my link above about "Why does it take 1000 bytes to blink one LED?". That explains things pretty well, if I do say so myself. For one thing, the sketch does not quintuple in size again if you blink two LEDs.

Also, now that floating-point maths has been included, if you next step in your code is to do something with floating-point (like GPS calculations) then it won't increase in size a second time, because the floating-point code has now been included.

Bear in mind a lot of the (fairly simple) example sketches supplied will probably take 1000 to 6000 bytes of PROGMEM (out of 32000 on the Uno). So, you still have plenty over. And once libraries have been linked in, sketch size will then increment linearly after that. Add a few more lines of code, a few more bytes will be added to the sketch size. It's not a major issue.

See Toorum's Quest II - ATmega328P based retro video game with 256 color graphics for example. The author of that fitted into the 32 kB of a Uno an arcade-style game with graphics, music, reading a game controller, and game logic. So if he can do that, you don't need to worry too much about what the compiler is doing to your "blink" sketch.

I'm not worrying, I'm trying to understand what is happening here.

Why is floating-point maths necessary as soon as I move delay to a function?

Bianco:
Why is floating-point maths necessary as soon as I move delay to a function?

That delay function uses floating point. So the real question is: Why isn't floating point necessary when the call to the delay function is in main()?

The answer is that the compiler is trying to make your code as small and efficient as it can. In one case it figured out that the call to delay could be replaced with a simple loop. In the other case it wasn't smart enough to decide that, or maybe it assumed it wasn't worth it. Without greater insight into the details of the compiler optimization algorithm it's hard to do more than speculate.

Sometimes the optimizations are surprising. I've had it eliminate code I didn't want eliminated, or rearrange code in such a way as to defeat what I was trying to do. But these are rare exceptions.

I recently worked out that you can use floating-point maths on a constant like this:

const unsigned long countTo = ((float) F_CPU / 1024.0) / (1000.0 / STROBE_FREQ);

This still generates a very small sketch. The floats were there to eliminate truncation. For example, with fixed-point arithmetic, if you wanted STROBE_FREQ to be 2000 (2 seconds) then 1000/2000 will be zero, and therefore we are dividing by zero. By forcing floats in this calculation we get the right result, which is then converted to an unsigned long. However this is done at compile time, so the compiler does not need to include floating-point code in the generated object file.

Why is floating-point maths necessary as soon as I move delay to a function?

I think that in this case, once something becomes a function argument, the compiler can't optimize stuff away as much. For one thing, it may think "something else might call this function with a non-literal argument."

That seems like a reasonable explanation.

Now how would you explain the following?

I have a basic blank sketch that I use as a starting point for testing things:

void setup()
{
  Serial.begin(115200);
  Serial.println("\nHello\n");
}

void loop() {}

Without anything added it compiles to a size of 1,930 bytes.

I added 151 lines of newly written code and global variable declarations above setup(), but didn't call any of the newly added functions. The code size decreased to 1,694 bytes.

Post this code?

Rule #1. Don't ask us questions about code without posting the code.