I think much of how compiler tools work, particularly in embedded systems, is not
well understood by many developers, and with Arduino, the IDE hides much of
how it works, which has both good and bad sides.
The bad side is that the typical Arduino user is very ignorant as to how the
modules are compiled and the final linked images are created.
I see much misunderstanding and misinformation being flung around in this thread.
(solar_eta's description is correct, however, there have been some recent
additions to the compiler toolsets to aid the less skilled developer in creating smaller linked images -
so some of this is a repeat of solar's description)
A first point of clarification is that what Arduino calls "libraries" are not libraries at all
in terms of the compiler tools.
A true library is a collection of compiled objects that are archived into a .a file.
The Arduino IDE does build a real library for the core code. All the other "libraries",
regardless of whether they are 3rd party "libraries" or the "libraries" included with the IDE,
are NOT libraries as far as the compiler tools are concerned.
They are just a bunch of source modules that are compiled into .o objects and then linked in.
In fact, given the goofy way the IDE does its build process, the source modules from an
Arduino "library" will be compiled to .o files and linked in if the sketch simply includes
a header from that "library".
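To make the distinction concrete, here is roughly what the two cases look like at the
command-line level (the file names and options here are just illustrative - the real IDE
commands carry many more options):

    # a real library: compile objects, then archive them into a .a
    avr-gcc -mmcu=atmega328p -Os -c wiring.c
    avr-gcc -mmcu=atmega328p -Os -c wiring_digital.c
    avr-ar rcs core.a wiring.o wiring_digital.o

    # an Arduino "library": the sources are just compiled to .o files
    # and those .o files are handed straight to the link - no .a is made
    avr-g++ -mmcu=atmega328p -Os -c Servo.cpp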
In terms of whether unused code is linked in by the linker or removed by the linker
both can be correct.
The compiler creates compiled .o object modules.
The linker links compiled object .o modules to create a final image.
Those compiled .o files can be directly specified or can be pulled from a library .a file.
The Arduino IDE creates and uses only a single library .a file - it is for the core code.
When linking, if the .o object is directly specified the full contents of that .o module
is pulled in. It doesn't matter how many functions or data objects are in that module
or which ones are being used. They are ALL pulled in.
When linking, after all the .o modules are pulled in, if there are any unresolved symbols
and libraries were specified, then the linker will start to look in those .a files for a .o file
that contains the unresolved symbol.
If a .o file is found that contains that symbol, again, ALL of the .o module is pulled in,
regardless of how much of it is really being used.
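Here is a minimal sketch of both behaviors (names invented; any gcc-based toolchain
behaves the same way):

    /* utils.c - two unrelated functions in one source module */
    int used(void)   { return 1; }
    int unused(void) { return 2; }

    /* main.c */
    int used(void);
    int main(void) { return used(); }

    /*
     * gcc -c utils.c main.c
     * gcc main.o utils.o -o app      # utils.o named directly: used() AND
     *                                # unused() both end up in app
     * ar rcs libutils.a utils.o
     * gcc main.o -L. -lutils -o app  # utils.o pulled from the .a only to
     *                                # resolve used(), but the linker still
     *                                # takes the WHOLE .o, unused() included
     */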
One thing that made real libraries a bit difficult to deal with in the old days was
that the linker did only a single pass through a .a file. So if a .o file referenced a symbol
in another .o file in the .a file, the order of the .o files in the .a file made a big difference.
There was a tool for ordering the .o files, and it was possible that a usable .a could not
be created if multiple .o files referenced each other in a circular way.
Multi-pass processing of .a files was eventually added to resolve this.
The kicker in all this is that the ENTIRE .o contents are pulled in regardless of what is
actually needed/used within that .o file.
This required developers to deal with this reality by either breaking up their source code across
many files to create many .o modules, properly ordering their .o files in the .a files and properly creating their makefiles to pull in what is actually used
OR to use conditionals in the code to only compile what is needed.
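The conditional approach looks something like this (the macro name here is made up
for illustration):

    /* bigmodule.c - the build decides which pieces get compiled,
     * so the single .o only contains what was asked for */
    void basic_print(const char *s) { /* always compiled */ }

    #ifdef NEED_FANCY_PRINT
    void fancy_print(const char *s)
    {
        /* large, rarely-needed code lives behind the conditional */
        basic_print(s);
    }
    #endif

    /* built with either:  gcc -c bigmodule.c
     * or:                 gcc -DNEED_FANCY_PRINT -c bigmodule.c */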
In a REAL library environment you really can't use conditionals since the code must be compiled
prior to knowing what the final target is or what modules may be needed.
As toolsets have moved away from developers developing in unix environments and to using
GUI based IDEs, there is less and less control and less developer knowledge on how to set up proper
build environments that only pull in the actual code being used.
These new generation developers were not getting good build images, since the images
contained "dead wood", and they wanted smarter tools that could look
into a gigantic shit pile of object files and do garbage collection to
selectively keep only what is actually needed
vs. everything they told the linker they were using.
Essentially what was wanted was a way to optimize the linking process similar to
what has been done in the compiler over the past few decades:
i.e. the compiler can generate good object code from mediocre high level C code.
So why shouldn't the linker be able to do something similar?
The solution was to dream up a way to make an additional linker pass through the linked image and
yank out those pieces that are not needed and then go back and fix up all the addresses/references
to account for all the removed code/data.
Note that this is a fairly recent capability in compiler tools, and the Arduino IDE
started using it a few releases before the 1.0 release.
To make this happen with the gcc tools, the compiler has to be involved.
So with the gcc tools, the way this happens is that the compiler is told
to place every single function and data object into its very own linker section,
so there is now a block of code/data associated with each linker symbol.
The gcc options -fdata-sections and -ffunction-sections tell the compiler to do this.
Then the linker is told it can remove a section of code/text or data if there is no reference
to the symbol associated with that section.
The linker option --gc-sections tells the linker to enable garbage collection of unused sections.
The net result is that while all the dead wood code and data is still pulled in, the linker makes a final
garbage collection pass to remove all the dead wood before creating the final linked image.
The IDE uses these gcc options so the final linked image won't contain unused code or data.
i.e. if you have functions in an arduino "library" that are not called, the code for those functions
will not be linked in. (actually it is linked in but then the garbage collection will rip it back out)
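Using the utils.c/main.c example from earlier, the garbage-collecting build looks like
this (a sketch - the IDE passes these same options for you):

    gcc -ffunction-sections -fdata-sections -c utils.c main.c
    gcc -Wl,--gc-sections main.o utils.o -o app

    # each function now sits in its own section (.text.used, .text.unused);
    # nothing references .text.unused, so the linker's GC pass throws it
    # away and unused() is gone from the final image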
There are some exceptions to this particularly in the AVR environment.
There are some special cases that "break" the linkers ability to detect that a section
is not being used.
ISR routines are one such case. If you link in code that sets up ISRs, that code will
be pulled in even if never used, since there is no way to actually determine if the ISR is ever
called. This is because the ISR "magic" for a C level ISR function
installs the h/w vector, and the h/w vector points to the C ISR function. This creates
a reference that the linker "sees", so it will pull in the C ISR function. Then in some cases
the C ISR function calls other functions, and you can end up with a cascading effect
where lots of functions and data storage get pulled in.
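On AVR that looks like this (a hedged sketch using avr-libc's ISR macro):

    #include <avr/interrupt.h>

    volatile unsigned long ticks;

    static void bump(void) { ticks++; }

    /* The ISR() macro defines the __vector symbol that the startup
     * code's hardware vector table jumps to, so the linker always sees
     * a reference to this function. It can never be garbage collected,
     * and neither can bump() or ticks, even if this interrupt is never
     * enabled anywhere in the program. */
    ISR(TIMER0_OVF_vect)
    {
        bump();
    }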
This is something to keep in mind given the IDE's goofy build process.
The IDE will link against a "library's" .o files if its header is used.
For example, if you include the Wire.h header, you get lots of the Wire code even if you
don't use it.
Another area is with PROGMEM. If you use PROGMEM pointers inside data structures,
the linker won't be able to properly resolve that the data is never used and it will pull in the data.
There are also some cases of using data pointers to const data inside data structures that have the same issues.
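A typical case that trips it up looks like this (an AVR-specific sketch):

    #include <avr/pgmspace.h>

    static const char s0[] PROGMEM = "hello";
    static const char s1[] PROGMEM = "world";

    /* a table of pointers to PROGMEM strings: even if nothing ever
     * reads this table, the embedded pointers look like live references
     * to s0/s1 at link time, so the strings tend to survive the
     * garbage collection pass */
    static const char * const table[] PROGMEM = { s0, s1 };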
I'm assuming that the data reference issues are because the linker doesn't look beyond a simple
reference the way the compiler does when doing optimizations.
The linker garbage collection stuff is fairly good at removing stuff, but it doesn't always remove everything.
The best way to see what is eating up the space is simply to create a link map and symbol table.
With the 1.5x build rules you can alter the linker rule to automatically build them - which is what I do.
That way I can look at them if I ever need to.
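If you want to do the same thing by hand, the usual incantation is something like this
(illustrative - adjust the names for your own setup):

    avr-gcc ... -Wl,-Map=sketch.map,--cref -o sketch.elf ...
    avr-nm --size-sort -C sketch.elf > sketch.sym

The .map file shows what got pulled in (and, when --gc-sections is on, what got
discarded), and the size-sorted symbol table makes the biggest space hogs easy to spot.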