Because most instructions take 1 cycle and only some take 2, optimizing for size ends up very similar to optimizing for speed. If you do get the option switched somehow, I'd be very interested in hearing about any actual differences you find!
True, with regard to instruction cycles. I'd also like to try things such as -funroll-loops; for some math-heavy applications (encryption comes to mind) it could bring a nice boost.
I'll see if I can find anything. I was hoping I could make it 'stick' so I wouldn't have to leave the Arduino environment every time.
Have a look at template metaprograms. TMP lets you write code that the compiler expands into an unrolled loop at compile time.
This is one I created for a library. I provide the small loop based version as default, and if I have space left over I can toggle this and unroll my loops.
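The poster's own code isn't shown here, so this is only a minimal sketch of the general idea, not the library version mentioned above. The names Unroll, XorStep, xorBlock and the UNROLL_LOOPS macro are all made up for illustration. It uses a plain functor rather than a lambda, on the assumption that the toolchain may not support C++11.

```cpp
#include <stdint.h>

// Hypothetical helper: calls body(0), body(1), ... body(N-1) with no
// runtime loop. The recursion is resolved entirely at compile time.
template <uint8_t N>
struct Unroll {
    template <typename F>
    static inline void run(F &body) {
        Unroll<N - 1>::run(body);  // emit iterations 0 .. N-2 first
        body(N - 1);               // then iteration N-1
    }
};

// Base case: zero iterations, ends the recursion.
template <>
struct Unroll<0> {
    template <typename F>
    static inline void run(F &) {}
};

// Example loop body (illustrative): XOR one byte of a buffer with a key.
struct XorStep {
    uint8_t *data;
    const uint8_t *key;
    inline void operator()(uint8_t i) { data[i] ^= key[i]; }
};

// Small loop by default; define UNROLL_LOOPS to trade flash for speed.
inline void xorBlock(uint8_t *data, const uint8_t *key) {
    XorStep step = { data, key };
#ifdef UNROLL_LOOPS
    Unroll<8>::run(step);
#else
    for (uint8_t i = 0; i < 8; ++i) step(i);
#endif
}
```

Whether the unrolled version is actually faster depends on the compiler inlining the calls, so it is worth comparing the generated assembly and the sketch size both ways.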
Doesn't gcc have a #pragma to set command-line options from source these days?
Not quite as slick as #pragma optimize from MSVC, but pretty workable last I used it.
With luck, that works on gcc-avr too. Google for it!
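For reference, a rough sketch of what that looks like, assuming GCC 4.4 or later (where #pragma GCC optimize and the optimize function attribute were introduced). Whether the avr-gcc bundled with your Arduino install is new enough is something you'd have to check. The function names here are just placeholders.

```cpp
#include <stdint.h>

// Everything between push_options and pop_options is compiled as if
// -funroll-loops had been given on the command line (GCC 4.4+ assumed).
#pragma GCC push_options
#pragma GCC optimize ("unroll-loops")

uint16_t checksum(const uint8_t *buf) {
    uint16_t sum = 0;
    for (uint8_t i = 0; i < 16; ++i) sum += buf[i];
    return sum;
}

#pragma GCC pop_options  // back to whatever the IDE passed (normally -Os)

// Alternative: request a different optimization level for one function.
__attribute__((optimize("O3")))
uint16_t checksumFast(const uint8_t *buf) {
    uint16_t sum = 0;
    for (uint8_t i = 0; i < 16; ++i) sum += buf[i];
    return sum;
}
```

Since this lives in the source file, it would 'stick' without touching the IDE's build settings. Older compilers will most likely just warn about an unknown pragma or attribute and carry on with the default options.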