Understanding the Arduino build process for IDE-haters

I'm writing a Makefile which will allow me to write sketches and program my Arduino no matter what developer tools I use.

From reading http://www.arduino.cc/en/Hacking/BuildProcess it's unclear why the Arduino environment does this:

Next, the environment searches for function definitions within your main sketch file and creates declarations (prototypes) for them. These are inserted after any comments or pre-processor statements (#includes or #defines), but before any other statements (including type declarations).

Is this to allow calling functions which are defined further in the source file?

Is this to allow calling functions which are defined further in the source file?

Yes, exactly.
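
To illustrate, here is a minimal host-side sketch in plain C++ (not Arduino code; the function names are made up for the example). Without the declaration at the top, a C++ compiler would reject the call in firstCaller, because blinkCount is only defined later in the file. The IDE generates declarations of this kind for you; a build that only prepends the #include would presumably need them written by hand, or the definitions kept ahead of their uses.

```cpp
// Prototype (declaration): the kind of line the Arduino environment
// generates and inserts near the top of the sketch.
int blinkCount(int seconds, int periodMs);

// The caller appears before the definition; this only compiles
// because the prototype above has already been seen.
int firstCaller() {
    return blinkCount(10, 2000);
}

// The actual definition, further down in the file.
int blinkCount(int seconds, int periodMs) {
    return seconds * 1000 / periodMs;
}
```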

This is the sketch I'm testing my setup on:

#include <SdFat.h>
#include <Wire.h>
#define DS1307 0x68

byte decToBcd(byte val)
{
    return ((val / 10 * 16) + (val % 10));
}

void setDateDs1307(byte hour,          // 0-23
                   byte minute,        // 0-59
                   byte second,        // 0-59
                   byte dayOfWeek,     // 1-7
                   byte year,
                   byte month,         // 1-12
                   byte dayOfMonth)    // 1-28/29/30/31
{
    Wire.beginTransmission(DS1307);
    Wire.send(0);
    Wire.send(decToBcd(second));
    Wire.send(decToBcd(minute));
    Wire.send(decToBcd(hour));
    Wire.send(decToBcd(dayOfWeek));
    Wire.send(decToBcd(dayOfMonth));
    Wire.send(decToBcd(month));
    Wire.send(decToBcd(year));
    Wire.endTransmission();
}

void setup() {                
  pinMode(A0, OUTPUT);
  setDateDs1307(14, 14, 2, 7, 9, 2, 11);
}

void loop() {
  digitalWrite(A0, HIGH);
  delay(1000);
  digitalWrite(A0, LOW);
  delay(1000); 
}

It simply makes a single call via TWI and starts blinking. It includes sdfatlib but does not use it.
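
As a quick sanity check of the BCD packing used in the sketch, the same decToBcd expression can be run on a PC. This is a host-side sketch that assumes Arduino's byte is an unsigned 8-bit type (written here as std::uint8_t); bcdToDec is a hypothetical inverse added only for the round-trip check and is not part of the sketch.

```cpp
#include <cstdint>

// Same expression as in the sketch: the tens digit goes to the high
// nibble, the ones digit to the low nibble (59 -> 0x59).
std::uint8_t decToBcd(std::uint8_t val) {
    return static_cast<std::uint8_t>((val / 10 * 16) + (val % 10));
}

// Hypothetical inverse, only used to check the round trip.
std::uint8_t bcdToDec(std::uint8_t val) {
    return static_cast<std::uint8_t>((val / 16 * 10) + (val % 16));
}
```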

Makefile I have so far:

BOARD = atmega328
PORT = /dev/ttyUSB0
SKETCH = blink

ARDUINO-DIR = $(lastword $(wildcard /usr/share/arduino-*))
SKETCHBOOK-DIR = $(HOME)/sketchbook
TMP-DIR = tmp

EXTRA-INCLUDES = $(ARDUINO-DIR)/libraries/Wire/utility/
EXTRA-DEPS = $(ARDUINO-DIR)/libraries/Wire/utility/twi.c

include $(ARDUINO-DIR)/hardware/arduino/boards.txt

CXX = avr-g++
CC = avr-gcc
CXXFLAGS = -mmcu=$($(BOARD).build.mcu) -DF_CPU=$($(BOARD).build.f_cpu) \
	   -I$(ARDUINO-DIR)/hardware/arduino/cores/$($(BOARD).build.core)\
	   $(foreach dir,$(wildcard $(ARDUINO-DIR)/libraries/*),-I$(dir))\
	   $(foreach dir,$(wildcard $(SKETCHBOOK-DIR)/libraries/*),-I$(dir))\
	  -I$(EXTRA-INCLUDES)\
	  -Os\
	  -fno-threadsafe-statics
CFLAGS = $(CXXFLAGS)

.SECONDEXPANSION:

.PRECIOUS: %.deps

.PHONY: clean program

# List object files from .deps file
define list-obj-deps
  $(foreach base,$(shell xargs -L1 basename < $(SKETCH).deps),$(TMP-DIR)/$(base).obj)
endef

# Main sketch file is converted to .cpp
$(SKETCH).cpp: $(SKETCH).pde
	@echo Using $($(BOARD).name)
	@cat $< | sed -e '1i #include "WProgram.h"' > $@

# Save a list of project dependencies and copy them to TMP-DIR
$(SKETCH).deps: $(SKETCH).cpp $(wildcard *.c) $(wildcard *.cpp) $(wildcard *.h) $(wildcard *.hpp)
	@$(CXX) $(CXXFLAGS) -MM $< | sed -e 's/ \\//' -e '$$ a.' -e "$$ a$(EXTRA-DEPS)" | tail -n +2 | sort | uniq | xargs -L 1 dirname | uniq \
	| xargs -L1 -I{} find {} -maxdepth 1 -type f -regex .*\\\(c\\\|cpp\\\)$ > $@
	@xargs -L1 -I{} cp {} $(TMP-DIR) < $@

# Intermediate object files for dependencies
# @TODO Update when system libraries update
$(TMP-DIR)/%.obj: $(TMP-DIR)/%
	$(CC) $(CFLAGS) $(TMP-DIR)/$* -c -o $@

# @TODO Must somehow expand list-obj-deps call after updating .deps file
$(SKETCH).elf: $(SKETCH).deps $(call list-obj-deps)
	$(CC) $(CFLAGS) $(call list-obj-deps) -o $@

%.hex: %.elf $(call list-obj-deps)
	avr-objcopy -O ihex -R .eeprom $< $@

program: $(SKETCH).hex
	avrdude -p$($(BOARD).build.mcu)\
		-c$($(BOARD).upload.protocol)\
		-P$(PORT)\
		-b$($(BOARD).upload.speed)\
		-U flash:w:$<

clean:
	@rm -frv `hg status --unknown --no-status`

First we make a blink.cpp from blink.pde by adding

#include "WProgram.h"

Then through the GCC -MM option we make a list of dependencies for our sketch:

$ $CXX $CXXFLAGS -MM blink.cpp

blink.o: blink.cpp \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/WProgram.h \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/wiring.h \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/binary.h \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/WString.h \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/HardwareSerial.h \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/Stream.h \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/Print.h \
 /home/bojes/sketchbook/libraries/SdFat/SdFat.h \
 /home/bojes/sketchbook/libraries/SdFat/Sd2Card.h \
 /home/bojes/sketchbook/libraries/SdFat/Sd2PinMap.h \
 /home/bojes/sketchbook/libraries/SdFat/SdInfo.h \
 /home/bojes/sketchbook/libraries/SdFat/FatStructs.h \
 /usr/share/arduino-0021/hardware/arduino/cores/arduino/Print.h \
 /usr/share/arduino-0021/libraries/Wire/Wire.h

Then according to the BuildProcess

The .c and .cpp files of the target are compiled and output with .o extensions to this directory, as is the main sketch file and any other .c or .cpp files in the sketch and any .c or .cpp files in any libraries which are #included in the sketch.

I assume that we need to gather all .c or .cpp files in any of the directories listed by gcc -MM, copy them to our project's temporary directory and compile them into object files using

avr-gcc -c

plus the required include path options and F_CPU definition. With my Makefile I ended up with these files in my tmp/ subdirectory:

blink.cpp
blink.cpp.obj
HardwareSerial.cpp
HardwareSerial.cpp.obj
main.cpp
main.cpp.obj
pins_arduino.c
pins_arduino.c.obj
Print.cpp
Print.cpp.obj
Sd2Card.cpp
Sd2Card.cpp.obj
SdFile.cpp
SdFile.cpp.obj
SdVolume.cpp
SdVolume.cpp.obj
Tone.cpp
Tone.cpp.obj
WInterrupts.c
WInterrupts.c.obj
Wire.cpp
Wire.cpp.obj
wiring_analog.c
wiring_analog.c.obj
wiring.c
wiring.c.obj
wiring_digital.c
wiring_digital.c.obj
wiring_pulse.c
wiring_pulse.c.obj
wiring_shift.c
wiring_shift.c.obj
WMath.cpp
WMath.cpp.obj
WString.cpp
WString.cpp.obj

Now from

These .o files are then linked together into a static library and the main sketch file is linked against this library.

It looks like I need to call avr-gcc (without -c) with all these files listed as arguments:

avr-gcc -mmcu=atmega328p -Os -fno-threadsafe-statics   tmp/blink.cpp.obj tmp/Sd2Card.cpp.obj tmp/SdFile.cpp.obj tmp/SdVolume.cpp.obj tmp/wiring.c.obj tmp/main.cpp.obj tmp/WMath.cpp.obj tmp/WInterrupts.c.obj tmp/HardwareSerial.cpp.obj tmp/pins_arduino.c.obj tmp/Tone.cpp.obj tmp/wiring_digital.c.obj tmp/wiring_analog.c.obj tmp/Print.cpp.obj tmp/wiring_pulse.c.obj tmp/WString.cpp.obj tmp/wiring_shift.c.obj tmp/twi.c.obj tmp/Wire.cpp.obj -o blink.elf

After fixing the __cxa_pure_virtual error by adding a dummy handler, extern "C" void __cxa_pure_virtual() { while (1); }, to my code (-fno-threadsafe-statics doesn't help somehow), I end up with one big .elf waiting to be converted to ihex and flashed:

blink.elf: ELF 32-bit LSB executable, Atmel AVR 8-bit, version 1 (SYSV), statically linked, not stripped

The size of the resulting binary (30K) makes me think that I somehow failed to follow this section:

Only the parts of the library needed for your sketch are included in the final .hex file, reducing the size of most sketches.

And my binary contains all the code from included libraries no matter if this code is used or not. How can I make it include only the code needed to run my code in blink.cpp?

Adding

-ffunction-sections -fdata-sections

options to CFLAGS/CXXFLAGS, so that each function and data item resides in its own section, coupled with the

-Wl,--gc-sections

linker option brought the hex size down to 9.5k, compared with 8.3k from the Arduino IDE.
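
For anyone wondering what those flags act on, here is a small illustration (hypothetical function names, host-compilable C++). With -ffunction-sections the compiler emits each function into its own .text.* section instead of one shared .text, and the linker's section garbage collection can then drop sections that nothing references. This is a sketch of the idea, not a measurement.

```cpp
// Built with -ffunction-sections, each function below lands in its own
// .text.* section in the object file rather than in one shared .text.

// Referenced from entryPoint(), so the linker must keep its section.
int usedHelper(int x) {
    return x * 2;
}

// Referenced by nothing. When linking with -Wl,--gc-sections its
// section is a candidate for removal; without per-function sections
// it would stay, because it shares .text with the code that is used.
int unusedHelper(int x) {
    return x * 3;
}

// Stands in for setup()/loop(): the code that is actually reachable.
int entryPoint() {
    return usedHelper(21);
}
```

Comparing avr-size output for the .elf built with and without the flags should show how much was dropped.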

I just made an Eclipse plugin that is able to build Arduino projects in the same way that the Arduino IDE can. Although I only had to modify the existing source code from the IDE (I didn't have to translate it into a makefile), I did have to find where in the Arduino IDE source the code that actually does the building lives.

The process goes something like this:
Sketch.java's build function is called.
build calls preprocess (this is the function that copies all of the .pde files into a single .cpp file and adds the prototypes and whatnot).
preprocess creates the main .cpp file.
If there are .c or .h files in the project directory, it copies them into the build directory.
It also makes a list of all the imported files and their locations (not sure how it knows where they are located).

build then creates a new compiler and calls the compiler's compile function.
The compile function creates the .o files for all the used libraries.
Then it creates all the core .o files.
Then it links all the core .o files into a static library (core.a).
Then it creates a .elf file from the .o files.
Then it extracts the EEPROM data and makes a .eep file.
Then it builds the .hex file.

You can find the code needed to see what it's doing in the Arduino source... the important files are:
Sketch.java, Compiler.java, PdePreprocessor.java

Or on github, I have the same files (slightly modified and commented) at:

hope this helps (sorry, I don't know much about makefiles)

@Trump211

Thank you for sharing the code. Looks like I've missed the .eep generation, and your code uses avr-ar to combine all object files into a core.a file. I wonder if this could be the reason for my hex being bigger than the IDE's.

There is a tutorial for building applications without the Arduino IDE on the Project Struix site that may be helpful.

The main site is here:

Here are the commands that Arduino calls when it builds the AnalogReadSerial example... It will be basically the same for yours, just with different names for the files. The -DARDUINO=22 doesn't seem to do anything. Also, I replaced the /hardware/arduino directory with a ~ for readability.

#before anything else happens, the .pde files get put into a .cpp file, with the added includes and prototypes
#makes the .o for the user code
avr-g++ -c -g -Os -w -fno-exceptions -ffunction-sections -fdata-sections -mmcu=atmega328p -DF_CPU=16000000L -DARDUINO=22 -I~\cores\arduino BuildDir\AnalogReadSerial.cpp -oBuildDir\AnalogReadSerial.cpp.o

#builds the core .o files
avr-gcc -c -g -Os -w -ffunction-sections -fdata-sections -mmcu=atmega328p -DF_CPU=16000000L -DARDUINO=22 -I~\cores\arduino ~\cores\arduino\pins_arduino.c -oBuildDir\pins_arduino.c.o 
(repeats for all the .c and .cpp files in the core directory; note that for the .cpp files the only difference is that avr-gcc becomes avr-g++)

#builds the core.a file from the core .o files
avr-ar rcs BuildDir\core.a BuildDir\pins_arduino.c.o 
(repeats for all the .o files made in the last step)

#builds the .elf file from the user code.o and the core.a
avr-gcc -Os -Wl,--gc-sections -mmcu=atmega328p -o BuildDir\AnalogReadSerial.cpp.elf BuildDir\AnalogReadSerial.cpp.o BuildDir\core.a -LBuildDir -lm

#makes the .eep data
avr-objcopy -O ihex -j .eeprom --set-section-flags=.eeprom=alloc,load --no-change-warnings --change-section-lma .eeprom=0 BuildDir\AnalogReadSerial.cpp.elf BuildDir\AnalogReadSerial.cpp.eep 

#makes the hex file
avr-objcopy -O ihex -R .eeprom BuildDir\AnalogReadSerial.cpp.elf BuildDir\AnalogReadSerial.cpp.hex

If you have the output that your Makefile produces, it might be a good thing to post it, so that the two can be compared.

Be sure to try compiling your sketch in the Arduino IDE in "verbose" mode to see all the steps it is taking.

I'm glad to see you doing this -- I have been contemplating tackling this myself, but since you
are hard at it I'll just wait and make use of what you are doing. I am eager to escape the IDE
largely because I want to use my own editor, but also because I have been using Makefiles
for longer than I can remember and feel it is healthy to be in touch with the build process.

Keep it up!

I think I'll inspect disassembled .hex files from my build and from Arduino IDE to find out the reason behind the size difference.

I'll inspect disassembled .hex files

Don't do that; inspect the .elf files instead. The .elf files contain MUCH more information (symbols, line number info, sections, etc.). The .hex file is completely stripped except for the actual object code bytes!

Yes, right, I confused the two.