I have been writing some time-intensive code for Arduino, and after analyzing the generated asm of the time-intensive parts, it seems that GCC does a pretty poor job of optimizing the C++ code. Here's an example (smp is of type int8_t and res is int16_t):
uint8_t vol=channel->volume;
res+=(smp*vol)>>8;
37c6: 2e 2f    mov  r18, r30
37c8: 33 27    eor  r19, r19
37ca: 27 fd    sbrc r18, 7
37cc: 30 95    com  r19
37ce: 8f 85    ldd  r24, Y+15 ; 0x0f
37d0: 90 e0    ldi  r25, 0x00 ; 0
37d2: fc 01    movw r30, r24
37d4: 2e 9f    mul  r18, r30
37d6: c0 01    movw r24, r0
37d8: 2f 9f    mul  r18, r31
37da: 90 0d    add  r25, r0
37dc: 3e 9f    mul  r19, r30
37de: 90 0d    add  r25, r0
37e0: 11 24    eor  r1, r1
37e2: 89 2f    mov  r24, r25
37e4: 99 0f    add  r25, r25
37e6: 99 0b    sbc  r25, r25
37e8: a8 0e    add  r10, r24
37ea: b9 1e    adc  r11, r25
So in this case GCC emits 3 (!) mul instructions for a simple 8-bit x 8-bit multiplication, which should be a single mul on the ATmega328, and the code as a whole seems pretty excessive for these two lines of C++. It looks to me as if GCC has no optimizations enabled at all. Are there compiler flags I can use to enable optimization, or is the AVR port of GCC just this poor at optimizing code?
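For reference, here is a minimal sketch of the expression in isolation. One possible reading of the listing, stated as an assumption rather than a diagnosis: C++ promotes both operands to int before multiplying, so the compiler is asked for a 16-bit x 16-bit product (int is 16 bits on AVR), which would account for three mul instructions. The helper name scale_sample is hypothetical and is not taken from the attached files.

```cpp
#include <stdint.h>

// Hypothetical helper (not from mod_player.cpp): scales a signed 8-bit
// sample by an unsigned 8-bit volume, as in the snippet above.
static inline int16_t scale_sample(int8_t smp, uint8_t vol) {
    // smp and vol are both promoted to int before the multiplication,
    // so this is an int x int product, not an 8-bit x 8-bit one.
    // Right-shifting a negative value is an arithmetic shift with GCC.
    return static_cast<int16_t>((smp * vol) >> 8);
}

// The product of an int8_t and a uint8_t has type int, not an 8-bit type.
static_assert(sizeof(int8_t(1) * uint8_t(1)) == sizeof(int),
              "operands are promoted to int before multiplying");
```

Compiling a small function like this on its own and disassembling it makes it easier to compare what different optimization settings produce for the multiply.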
player.ino (280 KB)
mod_player.h (4.4 KB)
mod_player.cpp (11.1 KB)