How about this: When compiling small programs, the size of the resulting binary code can be larger than you might expect, because in addition to the code that you have written, the compiler brings in additional code needed to implement any functions you invoke, plus other "support functions." This "hidden" code is usually loaded in "clumps" (library files) and each clump may include more code than is strictly needed to implement your program. As programs get larger, these libraries are reused, and the portions that went unused in a small program are more likely to be needed anyway, so the "overhead" you see in a small program becomes a smaller percentage of the total size of a large program.
This is in contrast to an "interpreter" like the Basic Stamp, where the equivalent of the overhead code is built permanently into the chip, and the "binary" of your program consists only of references to that already-existing code (in one form or another).
This is also in contrast to writing software in assembler or "machine code", where you would have to write EVERYTHING that the microcontroller would have to do, but wouldn't have to go one instruction beyond that.
Also, the Arduino environment and libraries are somewhat optimized for "clarity of expression" rather than making small programs very small. You COULD write a "blink" program using the compiler behind Arduino that was much smaller than the ~1500 bytes of the blink sketch, but it wouldn't be understandable by most of the Arduino community...
So let's look at that output of "avr-nm -S" in reply #6 in more detail. (This is essentially the "map" output that your co-worker was talking about. It might be slightly clearer for other implementations of "map file", but in general telling a beginner that a map file will show him where the memory went is about like telling your grandmother that you use the internet to send email...) Each line gives the address, the size (both in hex), the symbol type, and the name.
000002fc 00000054 T delay
00000400 00000096 T digitalWrite
0000008b 00000014 T digital_pin_to_bit_mask_PGM
00000077 00000014 T digital_pin_to_port_PGM
0000009f 00000014 T digital_pin_to_timer_PGM
This first set is the code that implements the "delay" and "digitalWrite" functions, plus the pin lookup tables that go with them. Almost 300 bytes, since Arduino does some pretty complicated things to make digitalWrite work on all the possible pins, and delay deals with 32-bit numbers (on an 8-bit CPU) for the time values.
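If you want to check the "almost 300 bytes" yourself, here is a rough sketch in Python (just a convenient calculator here, nothing to do with the Arduino toolchain) that adds up the size column for the five lines above. The names NM_LINES and parse_nm_line are made up for this example.

```python
# The five lines quoted above: address, size (both hex), symbol type, name.
NM_LINES = """\
000002fc 00000054 T delay
00000400 00000096 T digitalWrite
0000008b 00000014 T digital_pin_to_bit_mask_PGM
00000077 00000014 T digital_pin_to_port_PGM
0000009f 00000014 T digital_pin_to_timer_PGM
"""


def parse_nm_line(line):
    """Split one 'address size type name' line into (name, type, size in bytes)."""
    address, size, sym_type, name = line.split()
    return name, sym_type, int(size, 16)  # the size column is hexadecimal


sizes = {}
for line in NM_LINES.splitlines():
    name, sym_type, size = parse_nm_line(line)
    sizes[name] = size

total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name:30s} {size:4d} bytes")
print(f"{'total':30s} {total:4d} bytes")  # 294 bytes for these five symbols
```

So delay is 84 bytes, digitalWrite is 150, and the three tables are 20 each.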
0000057a W exit
00000350 00000074 T init
0080010a 00000004 b intFunc
This is the code that initializes the Arduino to the state that it is expected to be in when a sketch starts... About 100 bytes worth (init itself is 0x74 = 116 bytes).
00800100 00000002 D ledPin
0000010a 0000002e T loop
This is the loop function from "blink" (and the ledPin variable). This is a much more reasonable (compared to the size of the source code) 46 bytes.
00000144 0000000e T main
000003c4 0000003c T pinMode
00000072 00000005 T port_to_input_PGM
00000068 00000005 T port_to_mode_PGM
0000006d 00000005 T port_to_output_PGM
Library functions that implement the pinMode() function (and some more of the lookup tables that go with digitalWrite), plus the 14-byte main() that calls init(), setup() and loop().
0080011e 00000080 B rx_buffer
0080011a 00000002 B rx_buffer_head
0080011c 00000002 B rx_buffer_tail
00000496 00000010 T serialWrite
Library code for dealing with the serial port ("Serial.print()") that shouldn't really be in your sketch, and won't be in v15 (probably).
00000138 0000000c T setup
The setup() function from your sketch. 12 bytes.
00800112 00000004 B timer0_clock_cycles
00800116 00000004 B timer0_millis
0080010e 00000004 B timer0_overflow_count
More of what implements millis() and the timer for delay(); these three are RAM variables (4 bytes each) rather than code.
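To see where the memory went overall, you can split the listing by the type letter. This is a rough Python sketch, assuming the letters follow the usual nm convention (T/t is code and tables in flash, B/b and D/d are variables in RAM). It only counts the symbols quoted in this post, so the totals are not the whole sketch, and the names NM_LISTING, FLASH_TYPES and RAM_TYPES are made up for the example.

```python
# Every line quoted in this post: address, size (both hex), symbol type, name.
NM_LISTING = """\
000002fc 00000054 T delay
00000400 00000096 T digitalWrite
0000008b 00000014 T digital_pin_to_bit_mask_PGM
00000077 00000014 T digital_pin_to_port_PGM
0000009f 00000014 T digital_pin_to_timer_PGM
0000057a W exit
00000350 00000074 T init
0080010a 00000004 b intFunc
00800100 00000002 D ledPin
0000010a 0000002e T loop
00000144 0000000e T main
000003c4 0000003c T pinMode
00000072 00000005 T port_to_input_PGM
00000068 00000005 T port_to_mode_PGM
0000006d 00000005 T port_to_output_PGM
0080011e 00000080 B rx_buffer
0080011a 00000002 B rx_buffer_head
0080011c 00000002 B rx_buffer_tail
00000496 00000010 T serialWrite
00000138 0000000c T setup
00800112 00000004 B timer0_clock_cycles
00800116 00000004 B timer0_millis
0080010e 00000004 B timer0_overflow_count
"""

# Assumption: standard nm type letters.
FLASH_TYPES = {"T", "t"}          # text section: code and PROGMEM tables
RAM_TYPES = {"B", "b", "D", "d"}  # bss and initialized data: variables

flash_bytes = 0
ram_bytes = 0
for line in NM_LISTING.splitlines():
    fields = line.split()
    if len(fields) != 4:
        continue  # "0000057a W exit" has no size column, so skip it
    address, size, sym_type, name = fields
    if sym_type in FLASH_TYPES:
        flash_bytes += int(size, 16)
    elif sym_type in RAM_TYPES:
        ram_bytes += int(size, 16)

print(f"flash (code and tables): {flash_bytes} bytes")  # 573
print(f"RAM (variables):         {ram_bytes} bytes")    # 150
```

Of the 150 bytes of RAM listed, 128 are the serial rx_buffer alone, and the 46 + 12 bytes of your own loop and setup are a small slice of the 573 bytes of flash shown here.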