Why does even a very simple sketch use 400+ bytes?

This comes up relatively frequently. A trivial sketch, one that doesn't even reference any of the Arduino functions, ends up occupying over 400 bytes of flash memory. Is the compiler really THAT inefficient, that a few lines of code that ought to compile to a handful of AVR instructions generate 400 bytes instead?

Well, NO! Of course not. Most of the space is occupied by overhead code that a sketch might need, and it is included whether or not you actually use it. The Arduino environment does a pretty good job of excluding unused functions (if you don't use digitalWrite(), the code that implements it is not included), but it's not perfect.

I decided to compile an empty sketch and analyze the code produced, to see which pieces remained, and whether they looked like they were reasonably optimized. Here are the results.

First, the empty sketch:

void setup(){
}
void loop(){
}

Next, the breakdown. I'm not going to boggle everyone with all of the generated assembly language here, just the analysis (I'll put the assembly in the next message in this thread).

Let's get right to those empty setup() and loop() functions. They end up compiling to bare "return" instructions; two bytes each!

;;; Empty setup() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment

;;; Empty loop() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment

The setup() and loop() functions in a user sketch are called by main(), the traditional C/C++ main function. main() in turn is called by startup code generated by the C compiler. This chain of function calls occupies a few more bytes:

;;; Linkage to main() program
;;; length 8 bytes
;;; provided by: gcc compiler.

;;; main()
;;; length: 14 bytes
;;; provided by: Arduino environment
;;; required by: C language convention

;;; exit and __stop_program
;;; length: 4 bytes
;;; provided by: gcc C compiler
;;; required by: nothing.
;;; Comments: pretty much unused in the Arduino environment, since main() never returns.

main() also calls init(), an Arduino environment function that puts the AVR peripherals (timers, A/D converter, etc.) into the state that the rest of the Arduino functions expect. init() is 116 bytes of code, and is one of the few places where the code and/or the compiler did some obviously inefficient things. But it only executes once anyway, and 116 bytes is not a lot compared to the roughly 7 KB of space you have even on an ATmega8, so it's not really worth optimizing. Really.

;;; init()
;;; Length: 116 bytes
;;; provided by: Arduino Environment
;;; Required by: Arduino Environment, user sketches
;;; Comments: initializes peripherals (especially timers) as expected by
;;;     the ISR and PWM output, and so on.
;;;     The compiler seems to do a particularly poor job of optimizing
;;;       what ought to be straightforward code.

Now, before the startup code calls main(), it has to do some initialization "required" by the C standard. It sets up the stack pointer and makes sure the CPU status register is in a known state. Initialized variables are copied from flash to RAM, and uninitialized variables are cleared to zero. This code will be present whether or not you have any variables at all.

;;; Basic core startup code;
;;; Length 12 bytes;
;;; Provided by: gcc Compiler
;;; required by: gcc Compiler

;;; Copy initialized data from Flash to RAM.
;;; Length 22 bytes
;;; Provided by: gcc compiler
;;; required by: any sketch using initialized data.

;;; Clear uninitialized data to 0s.
;;; length 16 bytes
;;; Provided by: gcc compiler
;;; required by: C language specification.
;;; Comments: not necessary if there are no uninitialized variables.

Now, the Arduino environment uses a timer interrupt to maintain the millis() clock, uses interrupts for the serial port, and allows users to attach interrupts to a couple of the pins. The AVR uses "vectored interrupts", which means that each potential interrupt source has its own fixed slot in a table at the start of flash, holding a jump (the "vector") to the code that handles it. The ATmega168 has 26 vectors (counting RESET), each a 4-byte jmp instruction, so this table occupies 104 bytes.

;;; Table of interrupt vectors
;;; Size: 104 bytes.
;;; Provided by: gcc C compiler
;;; Required by: RESET, Timer, UART, etc.
;;; Comment: in theory, unused interrupt vectors could hold other data.

Finally, there is the timer interrupt service routine itself, which is present and running whether you use it or not. It's 144 bytes long, which is pretty long (especially for an interrupt service routine). Unfortunately, this is already the "optimized" version; it has to maintain TWO 32-bit counters in memory for the sake of backward compatibility, and you get to see firsthand just how inefficient 32-bit math can be on an 8-bit CPU. Each 32-bit load/increment/store takes 38 bytes (four 4-byte loads, three 2-byte adds, four 4-byte stores), plus overhead for saving the registers used, plus the math to keep track of milliseconds when your interrupt happens every 1.024 ms...

;;; Timer0 interrupt service routine
;;; Length: 144 bytes
;;; Provided by: Arduino Environment
;;; Required by: Arduino Environment (millis(), delay(), etc)
;;; Comments: long due to several 32-bit variable modifications.

Here's the actual assembly language code. Note that this is arranged somewhat differently than it was discussed in the previous posting. This message has it just as the compiler produced it, rather than having been re-ordered for clarity of explanation (hah!)

;;; This is the result of compiling an "empty" Arduino sketch (v 0018)
;;;
;;; void setup() {}
;;; void loop() {}
;;;
;;; The idea is to explain why doing nothing takes 400+ bytes.


Disassembly of section .text:

;;; Table of interrupt vectors
;;; Size: 104 bytes.
;;; Provided by: gcc C compiler
;;; Required by: RESET, Timer.
;;; Comment: in theory, unused interrupt vectors could hold other data.
      
VectorTable
{
   0:      0c 94 34 00       jmp      0x68      ; 0x68 <__ctors_end>
   4:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
   8:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
   c:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  10:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  14:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  18:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  1c:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  20:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  24:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  28:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  2c:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  30:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  34:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  38:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  3c:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  40:      0c 94 5c 00       jmp      0xb8      ; 0xb8 <__vector_16>
  44:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  48:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  4c:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  50:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  54:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  58:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  5c:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  60:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
  64:      0c 94 51 00       jmp      0xa2      ; 0xa2 <__bad_interrupt>
__startup:
;;; Basic core startup code;
;;; Length 12 bytes;
;;; Provided by: gcc Compiler
;;; required by: gcc Compiler
  68:      11 24             eor      r1, r1
  6a:      1f be             out      0x3f, r1      ; Initialize status reg = 0
  6c:      cf ef             ldi      r28, 0xFF
  6e:      d4 e0             ldi      r29, 0x04
  70:      de bf             out      0x3e, r29      ; Initialize Stack Pointer
  72:      cd bf             out      0x3d, r28      ;  ...

00000074 <__do_copy_data>:
;;; Copy initialized data from Flash to RAM.
;;; Length 22 bytes
;;; Provided by: gcc compiler
;;; required by: any sketch using initialized data.
  74:      11 e0             ldi      r17, 0x01      ; 1
  76:      a0 e0             ldi      r26, 0x00      ; 0
  78:      b1 e0             ldi      r27, 0x01      ; 1
  7a:      e0 ec             ldi      r30, 0xC0      ; 192
  7c:      f1 e0             ldi      r31, 0x01      ; 1
  7e:      02 c0             rjmp      .+4            ; 0x84 <.do_copy_data_start>

00000080 <.do_copy_data_loop>:
  80:      05 90             lpm      r0, Z+
  82:      0d 92             st      X+, r0

00000084 <.do_copy_data_start>:
  84:      a0 30             cpi      r26, 0x00      ; 0
  86:      b1 07             cpc      r27, r17
  88:      d9 f7             brne      .-10           ; 0x80 <.do_copy_data_loop>

      
0000008a <__do_clear_bss>:
;;; Clear uninitialized data to 0s.
;;; length 16 bytes
;;; Provided by: gcc compiler
;;; required by: C language specification.
;;; Comments: not necessary if there are no uninitialized variables.
  8a:      11 e0             ldi      r17, 0x01      ; 1
  8c:      a0 e0             ldi      r26, 0x00      ; 0
  8e:      b1 e0             ldi      r27, 0x01      ; 1
  90:      01 c0             rjmp      .+2            ; 0x94 <.do_clear_bss_start>

00000092 <.do_clear_bss_loop>:
  92:      1d 92             st      X+, r1

00000094 <.do_clear_bss_start>:
  94:      a9 30             cpi      r26, 0x09      ; 9
  96:      b1 07             cpc      r27, r17
  98:      e1 f7             brne      .-8            ; 0x92 <.do_clear_bss_loop>

;;; Linkage to main() program
;;; length 8 bytes
;;; provided by: gcc compiler.
  9a:      0e 94 55 00       call      0xaa      ; 0xaa <main>
  9e:      0c 94 de 00       jmp      0x1bc      ; 0x1bc <_exit>

000000a2 <__bad_interrupt>:
  a2:      0c 94 00 00       jmp      0      ; 0x0 <__vectors>
000000a6 <setup>:
void setup(){
;;; Empty setup() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment
}
  a6:      08 95             ret

000000a8 <loop>:
void loop(){
;;; Empty loop() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment
}
  a8:      08 95             ret

000000aa <main>:

;;; main()
;;; length: 14 bytes
;;; provided by: Arduino environment
;;; required by: C language convention
int main(void)
{
      init();
  aa:      0e 94 a4 00       call      0x148      ; 0x148 <init>

      setup();
  ae:      0e 94 53 00       call      0xa6      ; 0xa6 <setup>
    
      for (;;)
            loop();
  b2:      0e 94 54 00       call      0xa8      ; 0xa8 <loop>
  b6:      fd cf             rjmp      .-6            ; 0xb2 <main+0x8>
;;; Timer0 interrupt service routine
;;; Length: 144 bytes
;;; Provided by: Arduino Environment
;;; Required by: Arduino Environment (millis(), delay(), etc)
;;; Comments: long due to several 32-bit variable modifications.
SIGNAL(TIMER0_OVF_vect)
{
000000b8 <__vector_16>:
  b8:      1f 92             push      r1
  ba:      0f 92             push      r0
  bc:      0f b6             in      r0, 0x3f      ; 63
  be:      0f 92             push      r0
  c0:      11 24             eor      r1, r1
  c2:      2f 93             push      r18
  c4:      3f 93             push      r19
  c6:      8f 93             push      r24
  c8:      9f 93             push      r25
  ca:      af 93             push      r26
  cc:      bf 93             push      r27
      // copy these to local variables so they can be stored in registers
      // (volatile variables must be read from memory on every access)
      unsigned long m = timer0_millis;
  ce:      80 91 04 01       lds      r24, 0x0104
  d2:      90 91 05 01       lds      r25, 0x0105
  d6:      a0 91 06 01       lds      r26, 0x0106
  da:      b0 91 07 01       lds      r27, 0x0107
      unsigned char f = timer0_fract;
  de:      30 91 08 01       lds      r19, 0x0108

      m += MILLIS_INC;
  e2:      01 96             adiw      r24, 0x01      ; 1
  e4:      a1 1d             adc      r26, r1
  e6:      b1 1d             adc      r27, r1
      f += FRACT_INC;
  e8:      23 2f             mov      r18, r19
  ea:      2d 5f             subi      r18, 0xFD      ; 253
      if (f >= FRACT_MAX) {
  ec:      2d 37             cpi      r18, 0x7D      ; 125
  ee:      20 f0             brcs      .+8            ; 0xf8 <__vector_16+0x40>
            f -= FRACT_MAX;
  f0:      2d 57             subi      r18, 0x7D      ; 125
            m += 1;
  f2:      01 96             adiw      r24, 0x01      ; 1
  f4:      a1 1d             adc      r26, r1
  f6:      b1 1d             adc      r27, r1
      }

      timer0_fract = f;
  f8:      20 93 08 01       sts      0x0108, r18
      timer0_millis = m;
  fc:      80 93 04 01       sts      0x0104, r24
 100:      90 93 05 01       sts      0x0105, r25
 104:      a0 93 06 01       sts      0x0106, r26
 108:      b0 93 07 01       sts      0x0107, r27
      timer0_overflow_count++;
 10c:      80 91 00 01       lds      r24, 0x0100
 110:      90 91 01 01       lds      r25, 0x0101
 114:      a0 91 02 01       lds      r26, 0x0102
 118:      b0 91 03 01       lds      r27, 0x0103
 11c:      01 96             adiw      r24, 0x01      ; 1
 11e:      a1 1d             adc      r26, r1
 120:      b1 1d             adc      r27, r1
 122:      80 93 00 01       sts      0x0100, r24
 126:      90 93 01 01       sts      0x0101, r25
 12a:      a0 93 02 01       sts      0x0102, r26
 12e:      b0 93 03 01       sts      0x0103, r27
}
 132:      bf 91             pop      r27
 134:      af 91             pop      r26
 136:      9f 91             pop      r25
 138:      8f 91             pop      r24
 13a:      3f 91             pop      r19
 13c:      2f 91             pop      r18
 13e:      0f 90             pop      r0
 140:      0f be             out      0x3f, r0      ; 63
 142:      0f 90             pop      r0
 144:      1f 90             pop      r1
 146:      18 95             reti

(continued in next posting)

;;; init()
;;; Length: 116 bytes
;;; provided by: Arduino Environment
;;; Required by: Arduino Environment, user sketches
;;; Comments: initializes peripherals (especially timers) as expected by
;;;     the ISR and PWM output, and so on.
;;;     The compiler seems to do a particularly poor job of optimizing
;;;       what ought to be straightforward code.
00000148 <init>:
void init()
{
      // this needs to be called before setup() or some functions won't
      // work there
      sei();
 148:      78 94             sei
      
      // on the ATmega168, timer 0 is also used for fast hardware pwm
      // (using phase-correct PWM would mean that timer 0 overflowed half as often
      // resulting in different millis() behavior on the ATmega8 and ATmega168)
#if !defined(__AVR_ATmega8__)
      sbi(TCCR0A, WGM01);
 14a:      84 b5             in      r24, 0x24      ; 36
 14c:      82 60             ori      r24, 0x02      ; 2
 14e:      84 bd             out      0x24, r24      ; 36
      sbi(TCCR0A, WGM00);
 150:      84 b5             in      r24, 0x24      ; 36
 152:      81 60             ori      r24, 0x01      ; 1
 154:      84 bd             out      0x24, r24      ; 36
#endif
      // set timer 0 prescale factor to 64
#if defined(__AVR_ATmega8__)
      sbi(TCCR0, CS01);
      sbi(TCCR0, CS00);
#else
      sbi(TCCR0B, CS01);
 156:      85 b5             in      r24, 0x25      ; 37
 158:      82 60             ori      r24, 0x02      ; 2
 15a:      85 bd             out      0x25, r24      ; 37
      sbi(TCCR0B, CS00);
 15c:      85 b5             in      r24, 0x25      ; 37
 15e:      81 60             ori      r24, 0x01      ; 1
 160:      85 bd             out      0x25, r24      ; 37
#endif
      // enable timer 0 overflow interrupt
#if defined(__AVR_ATmega8__)
      sbi(TIMSK, TOIE0);
#else
      sbi(TIMSK0, TOIE0);
 162:      ee e6             ldi      r30, 0x6E      ; 110
 164:      f0 e0             ldi      r31, 0x00      ; 0
 166:      80 81             ld      r24, Z
 168:      81 60             ori      r24, 0x01      ; 1
 16a:      80 83             st      Z, r24
#endif
      // timers 1 and 2 are used for phase-correct hardware pwm
      // this is better for motors as it ensures an even waveform
      // note, however, that fast pwm mode can achieve a frequency of up
      // 8 MHz (with a 16 MHz clock) at 50% duty cycle

      // set timer 1 prescale factor to 64
      sbi(TCCR1B, CS11);
 16c:      e1 e8             ldi      r30, 0x81      ; 129
 16e:      f0 e0             ldi      r31, 0x00      ; 0
 170:      80 81             ld      r24, Z
 172:      82 60             ori      r24, 0x02      ; 2
 174:      80 83             st      Z, r24
      sbi(TCCR1B, CS10);
 176:      80 81             ld      r24, Z
 178:      81 60             ori      r24, 0x01      ; 1
 17a:      80 83             st      Z, r24
      // put timer 1 in 8-bit phase correct pwm mode
      sbi(TCCR1A, WGM10);
 17c:      e0 e8             ldi      r30, 0x80      ; 128
 17e:      f0 e0             ldi      r31, 0x00      ; 0
 180:      80 81             ld      r24, Z
 182:      81 60             ori      r24, 0x01      ; 1
 184:      80 83             st      Z, r24

      // set timer 2 prescale factor to 64
#if defined(__AVR_ATmega8__)
      sbi(TCCR2, CS22);
#else
      sbi(TCCR2B, CS22);
 186:      e1 eb             ldi      r30, 0xB1      ; 177
 188:      f0 e0             ldi      r31, 0x00      ; 0
 18a:      80 81             ld      r24, Z
 18c:      84 60             ori      r24, 0x04      ; 4
 18e:      80 83             st      Z, r24
#endif
      // configure timer 2 for phase correct pwm (8-bit)
#if defined(__AVR_ATmega8__)
      sbi(TCCR2, WGM20);
#else
      sbi(TCCR2A, WGM20);
 190:      e0 eb             ldi      r30, 0xB0      ; 176
 192:      f0 e0             ldi      r31, 0x00      ; 0
 194:      80 81             ld      r24, Z
 196:      81 60             ori      r24, 0x01      ; 1
 198:      80 83             st      Z, r24

      // set a2d prescale factor to 128
      // 16 MHz / 128 = 125 KHz, inside the desired 50-200 KHz range.
      // XXX: this will not work properly for other clock speeds, and
      // this code should use F_CPU to determine the prescale factor.
      sbi(ADCSRA, ADPS2);
 19a:      ea e7             ldi      r30, 0x7A      ; 122
 19c:      f0 e0             ldi      r31, 0x00      ; 0
 19e:      80 81             ld      r24, Z
 1a0:      84 60             ori      r24, 0x04      ; 4
 1a2:      80 83             st      Z, r24
      sbi(ADCSRA, ADPS1);
 1a4:      80 81             ld      r24, Z
 1a6:      82 60             ori      r24, 0x02      ; 2
 1a8:      80 83             st      Z, r24
      sbi(ADCSRA, ADPS0);
 1aa:      80 81             ld      r24, Z
 1ac:      81 60             ori      r24, 0x01      ; 1
 1ae:      80 83             st      Z, r24

      // enable a2d conversions
      sbi(ADCSRA, ADEN);
 1b0:      80 81             ld      r24, Z
 1b2:      80 68             ori      r24, 0x80      ; 128
 1b4:      80 83             st      Z, r24
      // the bootloader connects pins 0 and 1 to the USART; disconnect them
      // here so they can be used as normal digital i/o; they will be
      // reconnected in Serial.begin()
#if defined(__AVR_ATmega8__)
      UCSRB = 0;
#else
      UCSR0B = 0;
 1b6:      10 92 c1 00       sts      0x00C1, r1
#endif
 1ba:      08 95             ret

;;; exit and __stop_program
;;; length: 4 bytes
;;; provided by: gcc C compiler
;;; required by: nothing.
;;; Comments: pretty much unused in the Arduino environment, since main() never returns.

000001bc <_exit>:
 1bc:      f8 94             cli

000001be <__stop_program>:
 1be:      ff cf             rjmp      .-2            ; 0x1be <__stop_program>

This was pretty interesting reading, learning about everything that happens behind the scenes. I was aware of all of it (well, most of it, anyway), but breaking it down into the number of bytes required for each part was enlightening.

Thanks for taking the time to do this, and to post the results.

Stumbled across this at Newark (which was linked to from elsewhere in the forum). Has the underlying code/compiler output improved 2+ years later?