Anyone know what the compiler option "-Ofast" does?

Hi all,

In my graphics VFD library, I used this:

[b]#if defined(__GNUC__)
#pragma GCC optimize ("Ofast")
#endif[/b]

This improved the speed of the graphics by almost 20%. But I noticed that the loading time for the sketch seemed to be awfully long. So I checked the code size and found that a simple program (reading a keypad and displaying the results on the VFD as text) was 72622 bytes!!! (compiled using the -Os option).

I tried using the -O3 option. That resulted in a sketch of 71848 bytes. HUH???? -O3 smaller than -Os? Something is wrong here!

So I took out the "-Ofast" option from the VFD library code and tried again. With -Os the sketch is now 26212 bytes (completely in line with what I expect) and with the -O3 option it's 71848 bytes (same as -O3 with the -Ofast option).

I am puzzled by two questions:

(1) Why does -Ofast result in such huge code?
(2) Why does -O3 (without -Ofast) also result in huge code (the same size, in fact) but runs 20% slower?

...and I guess this is question #3... what exactly does -Ofast DO?

Any info will be appreciated!

Obviously, it makes your code go fast.

Less obviously, it does this by unrolling loops.

MorganS:
Obviously, it makes your code go fast.

Less obviously, it does this by unrolling loops.

What does "unrolling loops" mean? Do you mean it generates code for each loop iteration and runs it "inline" as opposed to re-using the same code within the loop body?

Code optimization is almost always a trade-off. If you want the fastest code, it will generally be larger. If you want the smallest code, it will generally be slower. If you want the most RAM-efficient code, it will generally use more FLASH. And it's often far from linear. Getting 20% more speed can double the code size. That is why there are so many optimization options in the compiler - so it can be tailored to the characteristics of the specific code being compiled. Every program will be different. There is no "perfect" solution that gives you both the highest speed and the smallest size. It's all trade-offs.

Regards,
Ray L.

RayLivingston:
Code optimization is almost always a trade-off. If you want the fastest code, it will generally be larger. If you want the smallest code, it will generally be slower. If you want the most RAM-efficient code, it will generally use more FLASH. And it's often far from linear. Getting 20% more speed can double the code size. That is why there are so many optimization options in the compiler - so it can be tailored to the characteristics of the specific code being compiled. Every program will be different. There is no "perfect" solution that gives you both the highest speed and the smallest size. It's all trade-offs.

Regards,
Ray L.

Thanks! I realize that "you get nothing for nothing" and that fast code is generally "paid for" as a larger code size, etc...

I was just surprised at how MUCH of a difference there was. If I saw a sketch go from, say, 20K to 30K with a "fast" optimization, I wouldn't question it. But to go from 20K to almost 80K seems to be a bit much.

BTW, I am compiling with AVR-GCC v6.3.0:

[b]root@michael:/# avr-gcc -v
Using built-in specs.
Reading specs from /usr/local/lib/gcc/avr/6.3.0/device-specs/specs-avr2
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/avr/6.3.0/lto-wrapper
Target: avr
Configured with: ../configure --target=avr --enable-languages=c,c++ --disable-nls --disable-libssp --with-dwarf2
Thread model: single
gcc version 6.3.0 (GCC)[/b]

Testing with an older avr-gcc (v4.9.2) yields almost identical results.

If you use multiple -O options, with or without level numbers, the last such option is the one that is effective.

GCC Optimization.
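For what it's worth, the GCC manual describes -Ofast as everything in -O3 plus options that disregard strict standards compliance, such as -ffast-math. It also says that #pragma GCC optimize applies to the functions defined after it in the source file, so a pragma at the top of a library file takes the whole file out of the sketch's -Os setting. If only a few routines are hot, one way to limit the size cost is to wrap just those in push_options/pop_options. This is only a sketch: read_keypad_stub and vfd_blit_stub are made-up names standing in for real library functions, and "O3" is used here instead of "Ofast" purely as an example.

```c
#include <stdint.h>

/* Compiled with whatever -O level is on the command line
   (for example -Os). Hypothetical stand-in for non-critical code. */
uint8_t read_keypad_stub(uint8_t raw)
{
    return raw & 0x0F;
}

/* Save the current options, then raise the level only for the
   functions defined between push_options and pop_options. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O3")
#endif

/* Hypothetical stand-in for a speed-critical drawing routine. */
void vfd_blit_stub(uint8_t *dst, const uint8_t *src, uint16_t len)
{
    for (uint16_t i = 0; i < len; i++) {
        dst[i] = src[i];
    }
}

/* Restore the command-line options for the rest of the file. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Whether this actually recovers most of the 20% depends on where the time is really spent, so it is worth measuring size and speed after each change.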

Krupski:
What does "unrolling loops" mean? Do you mean it generates code for each loop iteration and runs it "inline" as opposed to re-using the same code within the loop body?

Yes.

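for(int i=0; i<100; i++) {...} can be turned into 100 back-to-back copies of "...", with no counter or branch left (in practice the compiler limits how far it will unroll). It is surprising how good the compiler can be at doing calculations during compilation to turn expressions into constants, so that the number of loop iterations is known at compile time.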
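To make that concrete, here is a small hand-written illustration of what unrolling amounts to. It is not actual compiler output, and sum_rolled and sum_unrolled are names invented for this example. Both functions return the same result; the second trades flash for fewer branches.

```c
#include <stdint.h>

/* Rolled form: one copy of the body, plus a counter update,
   a compare and a branch on every pass. */
uint16_t sum_rolled(const uint8_t *buf)
{
    uint16_t total = 0;
    for (uint8_t i = 0; i < 8; i++) {
        total += buf[i];
    }
    return total;
}

/* Unrolled by hand: eight copies of the body and no loop overhead.
   Faster per call, but every copy costs code space. */
uint16_t sum_unrolled(const uint8_t *buf)
{
    uint16_t total = 0;
    total += buf[0];
    total += buf[1];
    total += buf[2];
    total += buf[3];
    total += buf[4];
    total += buf[5];
    total += buf[6];
    total += buf[7];
    return total;
}
```

Multiply that by every loop in a graphics library (plus the function inlining that -O3 also enables) and a jump from 26K to 70K becomes easier to believe.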

This improved the speed of the graphics by almost 20%.

An improvement of that magnitude is indicative that you are doing work inside your loops that is better done outside.
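As a rough illustration of "work better done outside the loop", here is a made-up rectangle fill. FB_WIDTH, fill_rect_slow and fill_rect_hoisted are assumptions for the example, not taken from the VFD library. The first version recomputes the row offset for every pixel; the second computes it once per row.

```c
#include <stdint.h>

#define FB_WIDTH 8  /* hypothetical frame buffer width, in bytes */

/* The row offset (a multiply and two adds) is recomputed for
   every single pixel inside the inner loop. */
void fill_rect_slow(uint8_t *fb, uint8_t x0, uint8_t y0,
                    uint8_t w, uint8_t h, uint8_t value)
{
    for (uint8_t y = 0; y < h; y++) {
        for (uint8_t x = 0; x < w; x++) {
            fb[(uint16_t)(y0 + y) * FB_WIDTH + x0 + x] = value;
        }
    }
}

/* The row start is computed once per row, so the inner loop is
   reduced to a plain indexed store. */
void fill_rect_hoisted(uint8_t *fb, uint8_t x0, uint8_t y0,
                       uint8_t w, uint8_t h, uint8_t value)
{
    for (uint8_t y = 0; y < h; y++) {
        uint8_t *row = fb + (uint16_t)(y0 + y) * FB_WIDTH + x0;
        for (uint8_t x = 0; x < w; x++) {
            row[x] = value;
        }
    }
}
```

An optimizer will often do this particular hoist by itself at higher levels, so the real wins tend to come from things it cannot move for you, such as function calls or port accesses repeated inside the loop.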