Compiler flags for smallest code size

I've been playing around with GCC compiler flags to try and get the smallest code size possible. A list of flags can be found here: Optimize Options (Using the GNU Compiler Collection (GCC))

It looks like the -Os flag is on by default, which should optimize for size. However, when I manually force the compiler to not inline some functions using "__attribute__((noinline))", my code shrinks from 32492 to 31914 bytes, a saving of 578 bytes, which is a big difference when you are close to the limit. So what's up with the -Os flag? Is it poorly implemented for the AVR instruction set? Or have I misunderstood its use?

I'm using Arduino 1.8.8. Arduino Uno board.

Example sketch to show my point:

class InlineClass
{
public:
    void __attribute__((noinline)) testFunc(int arg1)       // 570 bytes
    //void testFunc(int arg1)               // 602 bytes
    //inline void testFunc(int arg1)        // 602 bytes
    {
        _var1 = arg1 + 1;
        _var2 = arg1 + 2;
        _var3 = arg1 + 3;
        _var4 = arg1 + 4;
        _var5 = arg1 + 5;
    }

private:
    volatile int _var1 = 0;
    volatile int _var2 = 0;
    volatile int _var3 = 0;
    volatile int _var4 = 0;
    volatile int _var5 = 0;
};

void setup()
{
    InlineClass ic;
    ic.testFunc(1);
    ic.testFunc(2);
    ic.testFunc(3);
    ic.testFunc(4);
    ic.testFunc(5);
}

void loop() {
}
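A rough way to see why noinline can win here: an out-of-line function costs its body once plus a small call sequence per call site, while an inlined function costs a copy of the body at every call site. The following is only a sketch of that arithmetic. The struct, the function names and every byte count in the usage note are hypothetical and are not measurements from avr-gcc.

```cpp
// Simplified size model for one function. All byte counts are hypothetical inputs.
struct FunctionCost {
    int bodyBytes;     // size of the body when it is emitted once, out of line
    int callBytes;     // bytes spent per call site on argument setup and the call itself
    int inlinedBytes;  // bytes per call site when the body is inlined there
    int callSites;     // number of places the function is called from
};

// Total size when the function is kept out of line (e.g. marked noinline).
int sizeOutOfLine(const FunctionCost& f) {
    return f.bodyBytes + f.callBytes * f.callSites;
}

// Total size when every call site gets its own inlined copy.
int sizeInlined(const FunctionCost& f) {
    return f.inlinedBytes * f.callSites;
}

// True when inlining makes the total larger than keeping one shared copy.
bool inliningGrowsCode(const FunctionCost& f) {
    return sizeInlined(f) > sizeOutOfLine(f);
}
```

With made-up values of a 40-byte body, 8 bytes per call, 20 bytes per inlined copy and 5 call sites, the out-of-line total is 80 bytes and the inlined total is 100 bytes, so inlining grows the code. With a single call site the same values give 48 versus 20 bytes, and inlining wins.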

You assume the compiler does not significantly favour small size when the code image is too large to fit.

Until the code image reaches the space available, does it matter if it's larger than expected?

Coding Badly: 32492 bytes is too large to fit in flash memory, while 31914 bytes isn't. I could only get to 31914 bytes by selectively marking functions noinline.

Program too big. The size is 32492 bytes (of a 32256 byte maximum). Ensure debugging is OFF and/or see http://www.arduino.cc/en/Guide/Troubleshooting#size for tips on reducing it.
Build failed for project 'ChargeMonitor'

The only thing I changed in the next step was to mark some functions noinline:

Compiling 'ChargeMonitor' for 'Arduino/Genuino Uno'
Program size: 31 914 bytes (used 99% of a 32 256 byte maximum) (2,98 secs)
Minimum Memory Usage: 1428 bytes (70% of a 2048 byte maximum)

Here is a new sketch where automatic inlining breaks the available code size, while forced noinline makes it fit:

#define LS "dsfynxlekyfnkdfsjhnkxdsjfhnxhdfkdshfsjdfagsdgfjdgsbbsjdhwsdbegssdasdsdfeedewrwereeeejdfad"
#define XS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS LS
#define YS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS XS LS LS LS LS "sdsddfsdsfddsdfsdff"

class InlineClass
{
public:
	void __attribute__((noinline)) testFunc(int arg1)		// Compiles fine. 32254 bytes of 32256 maximum.
	// void testFunc(int arg1)				// Program too big. 32286 bytes
	//inline void testFunc(int arg1)		// Program too big. 32286 bytes
	{
		_var1 = arg1 + 1;
		_var2 = arg1 + 2;
		_var3 = arg1 + 3;
		_var4 = arg1 + 4;
		_var5 = arg1 + 5;
	}

	volatile int _var1 = 0;
	volatile int _var2 = 0;
	volatile int _var3 = 0;
	volatile int _var4 = 0;
	volatile int _var5 = 0;
};

void setup()
{

	InlineClass ic;
	const __FlashStringHelper *_str1 = F(YS);
	const __FlashStringHelper *_str2 = F(YS);
	const char *p1 = reinterpret_cast<const char *>(_str1);
	ic._var1 = pgm_read_byte(p1);
	const char *p2 = reinterpret_cast<const char *>(_str2);
	ic._var2 = pgm_read_byte(p2);
	ic.testFunc(1);
	ic.testFunc(2);
	ic.testFunc(3);
	ic.testFunc(4);
	ic.testFunc(5);
}

void loop()
{

}

BjornMoren:
Coding Badly: 32492 bytes is too large to fit in flash memory...

According to the manufacturer's datasheet the ATmega328 processor has 32768 bytes of Flash.

BjornMoren:
I could only get to 31914 bytes by selectively marking functions noinline.

Modifying the linker script may have gotten you to your goal.

Eliminating the bootloader would have worked.
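The numbers quoted in this thread can be checked with a little arithmetic. The sketch below uses only figures from the posts above (32768 bytes of flash, a 32256 byte sketch maximum, and the two build sizes). It assumes that the gap between the first two figures is the region reserved for the bootloader, which is what the suggestion above implies. The constant and function names are made up for illustration.

```cpp
// Figures quoted in this thread.
constexpr long kTotalFlash    = 32768;  // ATmega328 flash size from the datasheet
constexpr long kSketchMaximum = 32256;  // maximum reported by the IDE for the Uno
constexpr long kRejectedBuild = 32492;  // build that was reported as too big
constexpr long kNoinlineBuild = 31914;  // build with selected functions marked noinline

// Bytes not available to the sketch (assumed here to be the bootloader region).
constexpr long reservedBytes() {
    return kTotalFlash - kSketchMaximum;
}

// True when an image of the given size fits within the given limit.
constexpr bool fitsWithin(long imageSize, long limit) {
    return imageSize <= limit;
}

// Compile-time checks of the claims made in the thread.
static_assert(reservedBytes() == 512, "gap between flash size and sketch maximum");
static_assert(!fitsWithin(kRejectedBuild, kSketchMaximum), "rejected with the reserved region");
static_assert(fitsWithin(kRejectedBuild, kTotalFlash), "would fit if all flash were usable");
static_assert(fitsWithin(kNoinlineBuild, kSketchMaximum), "noinline build fits either way");
```

So the 32492 byte image is 236 bytes over the sketch maximum, but 276 bytes under the full flash size.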

Coding Badly: The current topic is not about my specific project, it is about the -Os compiler flag in general, and what it is supposed to do. It clearly doesn’t work the way most people assume.

If the compiler uses a flag that sets max binary size, then your first comment would make sense. Perhaps the Arduino creators didn’t set such a flag, and that’s why -Os doesn’t work. I read some of the GCC documentation, but couldn’t find anything about setting a max binary size. Perhaps some GCC guru could fill us in here.

BjornMoren:
It clearly doesn't work the way most people assume.

"Most people"? That's a bold claim. I assume you have evidence to make such a claim. Survey results. Notes from peer discussions. Those sorts of things. I imagine, given the size of the AVR market, you would have megabytes of raw data. Please, provide a summary. I'm curious.

-Os enables -O2 without a FEW flags that COULD increase code size.

From documentation:
"Turning on optimization flags makes the compiler attempt to improve the performance and/or code size".

So it prioritizes performance over size.

Whatever is eating at you, Coding Badly, I hope you'll find a solution. Life is too short for half-empty glasses, my friend. :slight_smile:

arduino_new:
-Os enables -O2 without a FEW flags that COULD increase code size.

From documentation:
"Turning on optimization flags makes the compiler attempt to improve the performance and/or code size".

So it prioritizes performance over size.

It says in the documentation:

"-Os = Optimize for size. -Os enables all -O2 optimizations except those that often increase code size."

I think you are correct. My bad, I read it the wrong way. It doesn't really focus on getting the size as small as possible, it just leaves out the speed optimizations that would give really bad bloat. So the name "optimize for size" is confusing. Wouldn't a flag for "make the binary as small as possible" be very useful though, especially for small embedded systems? Or a way to set a max binary size, and then have the compiler produce the fastest binary to fit that size? Many times speed is not the issue, but packing all the necessary bits into a small space is important.

BjornMoren:
Whatever is eating at you...

You're assuming I was being sarcastic, then dismissing my question with tripe.

BjornMoren:
Life is too short for half-empty glasses, my friend. :slight_smile:

Yup. Good luck. I have better things to do...

...or not.

I suspect this is relevant...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65740

(And, that bug explains why -fno-inline has no effect.)

There is a discussion here...

With great thanks to the venerable @westfw!

Thanks Coding Badly, great find.

I'd be happy to add what we found out to a general page about reducing compiled code size, if there is such a page here on arduino.cc. Looked around but could not find it.

Code optimization is a very complicated process, and isn't always predictable. The compiler may reduce the size of one function, but that causes other functions to be larger. Or, maybe the compiler could save 2 more bytes but the code would take 100x longer to run. All of these interactions can't really be predicted perfectly, so the compiler writer makes some general assumptions and defines some bounds for the optimizations.

Except in the most trivial cases, optimizing to produce the "smallest" code or the "fastest" code isn't really possible. Also, it can only apply the optimizations that the compiler writer has implemented - the optimizer doesn't have the ability (or intelligence) to know every way to implement some code. FPGA "compilers" usually have the option to try many combinations (and take a long time) to reduce size, but I don't know of any C/C++ compilers that do this.

Optimization is done primarily through pattern-matching: if the code looks like "this", then rearrange it like "that". The command-line switches just change the set of patterns that are looked for, and whether "speed" or "size" rearrangements are preferred.
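To make the "if the code looks like this, rearrange it like that" idea concrete, here is a toy pattern-matching pass. It is not how GCC is implemented. It is a minimal sketch that works on instructions stored as plain strings, and the function name removePushPop is invented for this example. The single rule it knows is that a "push X" immediately followed by a "pop X" of the same register can be dropped.

```cpp
#include <string>
#include <vector>

// Toy peephole pass: removes every "push X" that is immediately followed by
// "pop X" for the same operand X. Instructions are plain strings.
std::vector<std::string> removePushPop(const std::vector<std::string>& input) {
    std::vector<std::string> output;
    for (const std::string& instruction : input) {
        const bool isPop = instruction.rfind("pop ", 0) == 0;  // starts with "pop "
        if (isPop && !output.empty() &&
            output.back() == "push " + instruction.substr(4)) {
            output.pop_back();  // drop the matching push and skip this pop
            continue;
        }
        output.push_back(instruction);
    }
    return output;
}
```

Feeding it the four instructions "ldi r24, 1", "push r24", "pop r24", "ret" leaves only "ldi r24, 1" and "ret". A real optimizer has a very large set of such rules, and switches like -Os change which of them are applied.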

dnwheeler: Yeah, I agree, there really isn't such a thing as the smallest binary size, because how would you prove it for a given source code? Perhaps when AI gets incorporated into compilers, things will change. But I assumed that the GCC compiler at least could predict when inlining would increase code size, and the -Os switch would prevent such inlining. In theory I could write a tool that rewrites my code and tries every permutation, invokes the GCC compiler and picks the version with the smallest binary size. I assume this kind of fine tuning isn't useful enough, or it would already be in the GCC compiler.
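The brute-force tool described above could look roughly like the sketch below. It tries every combination of "mark function i noinline or not" and keeps the combination with the smallest reported size. The measureSize callback is a hypothetical stand-in for "rewrite the source, rebuild, and read the reported program size". Nothing here actually invokes a compiler, and the names are invented for this example.

```cpp
#include <cstdint>
#include <functional>

struct SearchResult {
    std::uint32_t bestMask;  // bit i set means function i is marked noinline
    long bestSize;           // size reported for that combination
};

// Exhaustively tries all 2^functionCount combinations (functionCount must be < 32).
// measureSize is a hypothetical callback that returns the binary size for a mask.
SearchResult findSmallest(unsigned functionCount,
                          const std::function<long(std::uint32_t)>& measureSize) {
    SearchResult best{0, measureSize(0)};  // start with nothing marked noinline
    const std::uint32_t combinations = 1u << functionCount;
    for (std::uint32_t mask = 1; mask < combinations; ++mask) {
        const long size = measureSize(mask);
        if (size < best.bestSize) {
            best = SearchResult{mask, size};
        }
    }
    return best;
}
```

The number of builds doubles with every function considered, so at roughly 3 seconds per build (the compile time shown earlier in this thread) even 20 candidate functions would mean about a million builds. That may be part of why this kind of search is not built into the compiler.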

One interesting thing is that the GCC compiler does not always produce the exact same binary for the same source code. I've seen it happen several times, for example toggling something on/off and checking the resulting binary size several times. Not sure why that is. At first I thought the compiler kept some internal state between compilations, but that seems far fetched. Perhaps it doesn't have the same amount of RAM available for optimizations at the different instances, and that affects binary size.

BjornMoren:
I'd be happy to add what we found out to a general page about reducing compiled code size, if there is such a page here on arduino.cc. Looked around but could not find it.

Thanks for your interest in documenting your findings. The traditional place to publish this sort of information is the Arduino Playground, but unfortunately that was recently made read-only, so it's no longer an option either to add to an existing Playground page on this topic (I didn't check if there is one) or to create a new page.

Arduino's suggested alternative is the Arduino Project Hub. That is intended more for documenting projects than this sort of information, and in fact you need to have an official Arduino product in your "Components and Supplies" section to even get on the Arduino Project Hub. It's also curated, so you have no guarantee that a project will make it on the Arduino Project Hub (though it will always be published to hackster.io, which is the service behind Arduino Project Hub). However, I think you could format your information in a way that is compatible with the Arduino Project Hub, and, if done to a reasonable quality standard, I think it would make it past the curation process. The big downside for me with the Arduino Project Hub vs Arduino Playground is that the Project Hub doesn't allow community collaboration on the content (other than the comments section). Some might actually consider that an advantage.

Of course, there are an endless number of other options for publishing information on the Internet, but it's kind of nice to centralize Arduino information as much as we can.

pert:
Thanks for your interest in documenting your findings. ...

Thanks for the tips and suggestions pert!