For loop problem, IDE V2.0.4

Good day,
I wrote this simple code for an ATmega8A microcontroller:

void setup() {

  pinMode(1, OUTPUT);

  for (byte i = 0; i < 2; i++) {
    digitalWrite(1, HIGH);
    delay(1000);
    digitalWrite(1, LOW);
    delay(1000);
  }
}

void loop() {
  // put your main code here, to run repeatedly:
}

It executes the loop 2 times.
When compiled, it used 888 bytes of storage space.

But when I changed the loop line to run 3 times:

  for (byte i = 0; i < 3; i++) {

When compiled, it used 868 bytes of storage space.
It used less space. Why does the smaller loop use more space?
I would expect both to use the same space, or the larger loop to use more.

I'm asking because I have a large project that occupies almost all of the microcontroller's space, and when I decrease the loop count the program gets bigger and no longer fits in the microcontroller.

thank you

There are lots of other things you can do to save space.
This program, compiled for the ATmega328P using IDE 1.8.19, takes 944 bytes.

void setup() {

  pinMode(1, OUTPUT);

  for (byte i = 0; i < 2; i++) {
    digitalWrite(1, HIGH);
    delay(1000);
    digitalWrite(1, LOW);
    delay(1000);
  }
}

void loop() {
  // put your main code here, to run repeatedly:
}

This one, using direct port access to do the exact same things, takes 654 bytes.

void setup() {

  // pinMode(1, OUTPUT);
  DDRD = 2;        // PORTD, bit 1 = OUTPUT
  for (byte i = 0; i < 2; i++) {
    // digitalWrite(1, HIGH);
    PORTD |= 2;    // set PORTD, bit 1
    delay(1000);
    // digitalWrite(1, LOW);
    PORTD &= ~2;   // clear PORTD, bit 1
    delay(1000);
  }
}

void loop() {
  // put your main code here, to run repeatedly:
}
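
For anyone unfamiliar with the bit operations used above, here is a minimal host-side sketch of what they do. The names `fakePortD`, `setBit1`, and `clearBit1` are made up for illustration, and an ordinary variable stands in for the real PORTD register, so this only shows the masking, not the hardware behaviour.

```cpp
#include <cstdint>

// Hypothetical stand-in for the PORTD register. On the AVR this would be the
// real memory-mapped I/O register, not an ordinary variable.
static uint8_t fakePortD = 0;

// Set bit 1 and leave the other seven bits unchanged (like PORTD |= 2).
void setBit1() {
  fakePortD |= 2;
}

// Clear bit 1 and leave the other seven bits unchanged (like PORTD &= ~2).
void clearBit1() {
  fakePortD &= static_cast<uint8_t>(~2);
}
```

Note that `DDRD = 2` in the sketch above overwrites the whole direction register rather than only bit 1. That is harmless right after reset, but `DDRD |= 2` would be the closer match to `pinMode(1, OUTPUT)` if other pins on the port are already configured.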

Thank you, I know that. I already use every trick I can to save space.
I posted simple code to avoid getting into complex issues.
I'm asking a specific question: why does the smaller loop take more space?
What is going on behind the scenes?

Study the disassembled machine code to see why. E.g. use the "compiler explorer" https://godbolt.org/

It is extremely unlikely that anyone knows that "off the top of their head".

When the loop runs 3 or more times, the program size isn't affected; only when it runs 2 times does the program get bigger.
That's what I'm asking about. It seems illogical to me.

See reply #4.

I'll try to find it, thank you

Off the top of my head, the compiler is "unrolling" your small loop into two duplicate sections of code, but decides that the -Os (optimize for size) option that Arduino uses should prevent it from duplicating the code three times.

Check the output code to be sure.
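
To make the term concrete, here is a rough source-level picture of what unrolling means. This is only an illustration that runs on a PC: `fakeDigitalWrite`, `fakeDelay`, and the two counters are hypothetical stand-ins for the Arduino calls, and the real transformation happens inside the compiler, not in the source.

```cpp
#include <cstdint>

// Hypothetical counters standing in for the real Arduino calls, so the
// example can run on a PC.
static int writeCalls = 0;
static int delayCalls = 0;

void fakeDigitalWrite(uint8_t /*pin*/, uint8_t /*value*/) { ++writeCalls; }
void fakeDelay(unsigned long /*ms*/) { ++delayCalls; }

// The loop as written in the sketch.
void blinkLooped() {
  for (uint8_t i = 0; i < 2; i++) {
    fakeDigitalWrite(1, 1);
    fakeDelay(1000);
    fakeDigitalWrite(1, 0);
    fakeDelay(1000);
  }
}

// Roughly what a fully unrolled version looks like: the body is emitted
// twice, and the counter, compare, and branch disappear.
void blinkUnrolled() {
  fakeDigitalWrite(1, 1);
  fakeDelay(1000);
  fakeDigitalWrite(1, 0);
  fakeDelay(1000);

  fakeDigitalWrite(1, 1);
  fakeDelay(1000);
  fakeDigitalWrite(1, 0);
  fakeDelay(1000);
}
```

Both functions make the same calls in the same order. The unrolled form trades the loop bookkeeping for a second copy of the body, which only saves space when the body is smaller than that bookkeeping.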

Compiler Explorer offers only Arduino 1.8.9, and with the -Os option, gives the same length code for both, no unrolling. Only the loop endpoint changes.

You need to look at the actual code generated by the IDE version you are currently using.

__SP_H__ = 0x3e
__SP_L__ = 0x3d
__SREG__ = 0x3f
__tmp_reg__ = 0
__zero_reg__ = 1
setup:
        push r28
.L__stack_usage = 1
        ldi r22,lo8(1)
        ldi r24,lo8(1)
        call pinMode
        ldi r28,lo8(3)
.L2:
        ldi r22,lo8(1)
        ldi r24,lo8(1)
        call digitalWrite
        ldi r22,lo8(-24)
        ldi r23,lo8(3)
        ldi r24,0
        ldi r25,0
        call delay
        ldi r22,0
        ldi r24,lo8(1)
        call digitalWrite
        ldi r22,lo8(-24)
        ldi r23,lo8(3)
        ldi r24,0
        ldi r25,0
        call delay
        subi r28,lo8(-(-1))
        brne .L2
        pop r28
        ret
loop:
.L__stack_usage = 0
        ret

I don't think that the compiler explorer does -flto by default.

Here are actual results (compiled with actual 1.8.19):

void setup() {

  pinMode(1, OUTPUT);

  for (byte i = 0; i < 2; i++) {
    digitalWrite(1, HIGH);
 376:   81 e0           ldi     r24, 0x01       ; 1
 378:   0e 94 70 00     call    0xe0    ; 0xe0 <digitalWrite.constprop.1>
    delay(1000);
 37c:   0e 94 dd 00     call    0x1ba   ; 0x1ba <delay.constprop.2>
    digitalWrite(1, LOW);
 380:   80 e0           ldi     r24, 0x00       ; 0
 382:   0e 94 70 00     call    0xe0    ; 0xe0 <digitalWrite.constprop.1>
    delay(1000);
 386:   0e 94 dd 00     call    0x1ba   ; 0x1ba <delay.constprop.2>
void setup() {

  pinMode(1, OUTPUT);

  for (byte i = 0; i < 2; i++) {
    digitalWrite(1, HIGH);
 38a:   81 e0           ldi     r24, 0x01       ; 1
 38c:   0e 94 70 00     call    0xe0    ; 0xe0 <digitalWrite.constprop.1>
    delay(1000);
 390:   0e 94 dd 00     call    0x1ba   ; 0x1ba <delay.constprop.2>
    digitalWrite(1, LOW);
 394:   80 e0           ldi     r24, 0x00       ; 0
 396:   0e 94 70 00     call    0xe0    ; 0xe0 <digitalWrite.constprop.1>
    delay(1000);
 39a:   0e 94 dd 00     call    0x1ba   ; 0x1ba <delay.constprop.2>
        
        setup();
    
        for (;;) {
                loop();
                if (serialEventRun) serialEventRun();
 39e:   c0 e0           ldi     r28, 0x00       ; 0
 3a0:   d0 e0           ldi     r29, 0x00       ; 0
 3a2:   20 97           sbiw    r28, 0x00       ; 0
 3a4:   f1 f3           breq    .-4             ; 0x3a2 <main+0xe8>
 3a6:   0e 94 00 00     call    0       ; 0x0 <__vectors>
 3aa:   fb cf           rjmp    .-10            ; 0x3a2 <main+0xe8>

000003ac <_exit>:
 3ac:   f8 94           cli

000003ae <__stop_program>:
 3ae:   ff cf           rjmp    .-2             ; 0x3ae <__stop_program>

So yeah, it unrolled the loop into two inline copies of the body. Increasing the rep count to 3, we get an actual loop:

void setup() {

  pinMode(1, OUTPUT);

  for (byte i = 0; i < 3; i++) {
    digitalWrite(1, HIGH);
 378:   81 e0           ldi     r24, 0x01       ; 1
 37a:   0e 94 70 00     call    0xe0    ; 0xe0 <digitalWrite.constprop.1>
    delay(1000);
 37e:   0e 94 dd 00     call    0x1ba   ; 0x1ba <delay.constprop.2>
    digitalWrite(1, LOW);
 382:   80 e0           ldi     r24, 0x00       ; 0
 384:   0e 94 70 00     call    0xe0    ; 0xe0 <digitalWrite.constprop.1>
    delay(1000);
 388:   0e 94 dd 00     call    0x1ba   ; 0x1ba <delay.constprop.2>
 38c:   c1 50           subi    r28, 0x01       ; 1
void setup() {

  pinMode(1, OUTPUT);

  for (byte i = 0; i < 3; i++) {
 38e:   a1 f7           brne    .-24            ; 0x378 <main+0xbe>

Same with int, long, unsigned long... Increasing the limit all the way up to the maximum value of the type uses the same number of bytes as 3. This only restates what has already been said: the compiler handles two iterations in a way that costs more space, while any other count is handled more compactly.

oooh. An extra 10 instructions, or 2.5% bigger code, and it's somewhat quicker (not that it matters with code full of delay() calls.)

I've often noticed that gcc's idea of "size optimization" doesn't seem to be strongly based on actual object code - there are some things that seem to "blow up" at the instruction level that gcc seems to think are "small." (multi-bit shifts, for one, IIRC.)

Note that in the looping example, the compiler has determined that the code is looping 3 times, instead of literally incrementing a variable and testing against 3...
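
As a rough illustration, the `subi`/`brne` pair in the listing corresponds to a count-down loop rather than the count-up loop in the source. The helper names `countUp` and `countDown` below are hypothetical, and this is a host-side sketch of the equivalence, not the compiler's actual intermediate form.

```cpp
#include <cstdint>

// The loop as written: increment i and compare it against the limit.
int countUp(uint8_t limit) {
  int bodyRuns = 0;
  for (uint8_t i = 0; i < limit; i++) {
    ++bodyRuns;
  }
  return bodyRuns;
}

// The shape the generated code resembles: start at the limit, decrement,
// and stop when the counter reaches zero. A decrement sets the zero flag
// by itself, so no separate compare instruction is needed.
int countDown(uint8_t limit) {
  int bodyRuns = 0;
  for (uint8_t n = limit; n != 0; n--) {
    ++bodyRuns;
  }
  return bodyRuns;
}
```

The compiler can only make this swap because the loop body never uses the value of `i`.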

Thank you all. So we can say the compiler duplicates the code when the loop runs 2 times?
Can we force the compiler to build it as a loop?

For this particular for loop, yes. If the body of the loop were longer, perhaps not.

I believe that there are individual optimization switches that can turn off just loop unrolling, and using gcc-specific pragmas you can turn them on and off for small segments of code.

But… why do you want to?

thank you

I believe that there are switches to control individual optimization settings, that can be included on a per-function basis in the source code via a gcc "pragma"... (Hmm. With various experiments, I couldn't get it to NOT unroll the 2x loop.)

// Doesn't work :-(
#pragma GCC push_options
#pragma GCC optimize("no-unroll-loops", "no-peel-loops")
void setup() {

  pinMode(1, OUTPUT);

  for (byte i = 0; i < 2; i++) {
    digitalWrite(1, HIGH);
    delay(1000);
    digitalWrite(1, LOW);
    delay(1000);
  }
}
#pragma GCC pop_options
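
Since the pragmas above had no effect, another approach that might be worth trying is to hide the trip count from the optimizer, so that it cannot know the loop runs exactly twice. The sketch below is an assumption, not something verified on the Arduino AVR toolchain: `hideFromOptimizer` and `bodyRuns` are made-up names, the empty inline asm is a GCC/Clang extension, and it is written so that it also runs on a PC.

```cpp
#include <cstdint>

static int bodyRuns = 0;  // hypothetical counter standing in for the loop body

// Returns its argument unchanged, but the empty asm statement tells the
// compiler the value may have been modified, so the trip count is no longer
// a compile-time constant. GCC/Clang extension, not standard C++.
static inline uint8_t hideFromOptimizer(uint8_t value) {
  __asm__ __volatile__("" : "+r"(value));
  return value;
}

void blinkTwice() {
  const uint8_t count = hideFromOptimizer(2);
  for (uint8_t i = 0; i < count; i++) {
    ++bodyRuns;  // the digitalWrite()/delay() calls would go here
  }
}
```

Whether this actually saves bytes has to be checked by compiling both versions and comparing the reported sizes, because a loop with an unknown count may need an extra zero check that a known count does not.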

I found this topic, which talks about the unroll pragma:

#pragma GCC unroll 0

But the Arduino IDE didn't recognize it.

Any ideas?

Why not move to a processor with more resources? Hardware is cheap, programmer's time isn't.

We live in a poor country, and we have to build the cheapest possible product to be able to sell it, so it's important to use this microcontroller. I believe it can work, but I'm looking for solutions.

If you post your code (or some of your code), we might suggest other ways to cut down on memory consumption.

I find it difficult to believe that for-loops of 2 iterations are causing enough extra memory usage to be worth trying to fix.
