I'm using arduino-cmake for my programming. On one project, the hex files created were 21176 bytes. This seemed large. So I took all of the code, dumped it into a sketch directory, and got it to compile. The Arduino IDE reported "Binary sketch size: 7426 bytes (of a 131072 byte maximum)". So I thought that, for some odd reason, the arduino-cmake build was producing code three times the size of the IDE build. But this turns out not to be the case. When I actually went out to the file system, I saw that the hex created by the IDE is 20910 bytes. Although this is 266 bytes smaller (compiler argument differences, probably), it's not the 3X I thought it was.
So when the IDE reports 7426 bytes for the sketch size, how is it deriving that number?
A hex file also contains fields like the address and checksum, which don't translate to flash taken up on the chip. These are included in each "line" of the HEX file, so the file size may not scale exactly linearly with the binary size.
I'm not sure what format avr-gcc uses, but here is a breakdown of how an Intel HEX file would look:
Most of the lines in a .HEX file hold 16 data bytes (32 hex characters) plus 12 extra characters (colon, byte count, address, record type, checksum, newline), so the file size is roughly 44/16ths (275%) of the binary size. Add a little extra for the 12-character last line (the end-of-file record, which carries no data) and for any lines that don't contain a full 16 bytes (but still have the 12-character overhead).
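To make that breakdown concrete, here is a rough Python sketch of the same arithmetic. The function name `hex_file_size` and its parameters are made up for illustration, and it assumes one contiguous block of data written as 16-byte records followed by a single end-of-file record, with no extended-address or start-address records.

```python
def hex_file_size(binary_size, bytes_per_record=16, newline_len=1):
    """Estimate the size of an Intel HEX file holding binary_size data bytes.

    Assumes contiguous data, fixed-length records, and one end-of-file record.
    """
    # Per-record overhead in characters:
    # ':' (1) + byte count (2) + address (4) + record type (2)
    # + checksum (2) + line terminator (1 for LF, 2 for CR/LF).
    overhead = 1 + 2 + 4 + 2 + 2 + newline_len

    full_records, remainder = divmod(binary_size, bytes_per_record)

    # Each data byte is written as two hex characters.
    size = full_records * (2 * bytes_per_record + overhead)
    if remainder:
        # A short final data record still pays the full overhead.
        size += 2 * remainder + overhead

    # End-of-file record ":00000001FF" has no data, only overhead.
    size += overhead
    return size


if __name__ == "__main__":
    print(hex_file_size(7426, newline_len=1))  # LF line endings
    print(hex_file_size(7426, newline_len=2))  # CR/LF line endings
```

Under those assumptions, 7426 bytes with CR/LF line endings works out to 20910 characters, which is the size of the IDE's hex file mentioned in the question.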
So does the Arduino IDE just basically use that formula when it spits out the sketch size?
No, the compiler knows how many bytes it is converting to hex and reports that.
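If you want the exact byte count from a .hex file rather than an estimate, you can add up the byte-count field of every data record. This is only a sketch to show the idea, not what the IDE actually runs; the helper name `count_data_bytes` is hypothetical, and it does not verify checksums.

```python
def count_data_bytes(hex_text):
    """Sum the payload lengths of all data records (type 00) in Intel HEX text."""
    total = 0
    for line in hex_text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip blank or malformed lines
        byte_count = int(line[1:3], 16)   # 2 hex chars after the colon
        record_type = int(line[7:9], 16)  # follows the 4-char address
        if record_type == 0x00:           # data record
            total += byte_count
    return total


if __name__ == "__main__":
    sample = (
        ":10010000214601360121470136007EFE09D2190140\n"  # 16 data bytes
        ":020110001234A7\n"                              # 2 data bytes
        ":00000001FF\n"                                  # end-of-file record
    )
    print(count_data_bytes(sample))  # 18
```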
The formula gives you a way to estimate the memory footprint from the size of a .hex file (divide by 44/16ths, or equivalently multiply by 16/44ths). That should get you within 20 bytes or so.
Oh, I forgot to mention that on a machine that uses CR/LF for line terminators (Windows) it's 45/16ths.
20,910 * 16/45ths = 7434.666..., which is very close to the reported binary size of 7426.
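The same back-of-the-envelope estimate as a small Python sketch. The function name `estimate_binary_size` is just illustrative, and it assumes every line is a full 16-byte record, so it overshoots slightly because of the end-of-file line and any short final record.

```python
def estimate_binary_size(hex_size, bytes_per_record=16, newline_len=1):
    """Estimate the binary size from the size of an Intel HEX file."""
    # Characters per full record: 2 per data byte, plus 11 fixed characters
    # (colon, byte count, address, record type, checksum), plus the newline.
    chars_per_record = 2 * bytes_per_record + 11 + newline_len
    return hex_size * bytes_per_record / chars_per_record


if __name__ == "__main__":
    # CR/LF line endings, so 45 characters per full record.
    estimate = estimate_binary_size(20910, newline_len=2)
    print(round(estimate, 2))         # 7434.67
    print(round(estimate - 7426, 2))  # 8.67 bytes over the reported size
```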