My take on all the ArduinoISP problems is that the real problem is that
the HardwareSerial buffer down in the core code is
not large enough to hold a full command buffer from avrdude.
The buffer in the sketch is an attempt to work around this limitation in the core code.
In fact, if the core code buffer were larger, ArduinoISP would not have to re-buffer the data
and could simply pull it directly from HardwareSerial as needed.
avrdude will wait 5 seconds for a command to complete and during page write commands
it waits for a command to complete before it sends the next command.
You would think that 5 seconds is long enough to allow the sketch to do
anything goofy it wants to do, including goofy delays of tens of milliseconds to flicker an LED.
HOWEVER......
avrdude will send a full flash page of data (128 bytes for a mega328) plus
4 bytes of command header and 1 byte of Sync_CRC_EOP terminator for a total of 133 bytes.
The HardwareSerial RX buffer is not that large.
It used to be 128 bytes, now in the 1.x core it is only 64 bytes.
Because the RX buffer is smaller than the total number of bytes that avrdude sends for a command,
the sketch must always drain the HardwareSerial buffer while avrdude is still squirting the data out.
Because of the way the sketch code is written, it has trouble doing that.
It was easier to keep up in pre 1.x since the core buffer was nearly large enough at 128 bytes.
Now that the core code buffer is much smaller (64 bytes),
the sketch struggles to keep up given what it is doing and the way it is doing things.
The goofy 20ms delay in heartbeat() doesn't help - in fact it really hurts, particularly
with only a 64 byte RX buffer.
loop() calls heartbeat() which delays for 20ms.
Given that it is called on every pass through loop() and there really isn't much else in the loop other than a check
on Serial.available(), it is very likely that this will delay processing any RX characters.
If you are running a baud rate of 38400 or beyond, that 20ms delay is guaranteed to
overflow the 64 byte RX buffer down in HardwareSerial (at 38400 roughly 76 bytes can arrive in 20ms)
before the sketch even gets a chance to start looking at the bytes, much less extracting them.
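To put rough numbers on that, here is a small back-of-the-envelope calculation. It is plain C++ for a PC, not sketch code, and it assumes 8N1 framing (10 bits on the wire per byte). The function and constant names are just ones I made up for illustration.

```cpp
#include <cstdint>

// Size of the 1.x HardwareSerial RX buffer, as discussed above.
constexpr uint32_t kRxBufferSize = 64;

// Whole bytes that can arrive on the wire while the sketch sits in a delay.
// Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
// (Hypothetical helper for illustration, not part of any Arduino core.)
constexpr uint32_t bytesDuringDelay(uint32_t baud, uint32_t delayMs) {
    return (baud / 10u) * delayMs / 1000u;
}

// True if a sender streaming continuously for the whole delay
// delivers more bytes than the RX buffer has room for.
constexpr bool delayOverflowsRx(uint32_t baud, uint32_t delayMs) {
    return bytesDuringDelay(baud, delayMs) > kRxBufferSize;
}
```

For a 20ms delay this works out to 38 bytes at 19200 baud, which on its own still fits, and 76 bytes at 38400 baud, which does not.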
If the RX buffer down in the core code were large enough for the entire avrdude command,
the baud rate would no longer be a factor and the sketch could take up to 5 seconds to process it,
since avrdude will not send more than 1 command at a time.
It would also mean that the sketch wouldn't have to re-buffer the data and things
could get a bit simpler in the sketch and timing within the sketch
would no longer be an issue other than slowing down the burn speed.
However the HardwareSerial buffer isn't large enough.
After ArduinoISP finally "notices" a STK_PROG_PAGE command,
it eventually calls fill() which will copy characters from HardwareSerial
into buff[] by calling getch() which calls Serial.read().
The problem is fill() wants all the expected data bytes
and getch() will wait FOREVER if there is no character to read.
So if a byte is dropped/lost down in HardwareSerial due to an RX buffer overrun
because the sketch isn't keeping up, ArduinoISP hangs in a loop in fill()
potentially forever waiting for all the data bytes it expects from the PROG_PAGE command.
Now both ends are waiting on each other.
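Here is a rough PC-side model of the overrun, just to show the numbers. It assumes the RX ring buffer behaves the way I described: one slot is kept empty, so a 64 byte buffer holds at most 63 bytes, and anything that arrives while it is full is thrown away. The struct, the dropped counter, and the function name are all made up for the demo; the real core does not count lost bytes.

```cpp
#include <cstdint>

// Host-side model of a HardwareSerial-style RX ring buffer (an assumption
// modeled on the 1.x core behavior, not the actual core source).
struct RxRing {
    static const unsigned int SIZE = 64;
    uint8_t data[SIZE];
    unsigned int head = 0;     // next slot to write (interrupt side)
    unsigned int tail = 0;     // next slot to read (sketch side)
    unsigned int dropped = 0;  // demo-only counter of lost bytes

    // What the RX interrupt would do for each incoming byte.
    void store(uint8_t c) {
        unsigned int next = (head + 1) % SIZE;
        if (next != tail) {
            data[head] = c;
            head = next;
        } else {
            ++dropped;         // buffer full: the byte is silently lost
        }
    }

    unsigned int available() const { return (SIZE + head - tail) % SIZE; }
};

// Simulate avrdude sending one command while the sketch is busy elsewhere
// and reads nothing. Returns how many bytes were lost.
unsigned int bytesLostIfNotDrained(unsigned int commandBytes) {
    RxRing rx;
    for (unsigned int i = 0; i < commandBytes; ++i) {
        rx.store(static_cast<uint8_t>(i));
    }
    return rx.dropped;
}
```

With the 133 byte page command and nothing draining the buffer, bytesLostIfNotDrained(133) comes out to 70, so fill() would sit in getch() waiting for data that is never going to show up.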
avrdude times out after 5 seconds and will try to get back in sync by sending
a GET_SYNC command - again waiting 5 seconds for a response.
Things continue like this for up to 33 retries to get back in sync.
Meanwhile the ArduinoISP sketch can still be out of sync with avrdude
as it could be hung in its fill() routine
since it is waiting for the full data bytes from the PROG_PAGE command.
ArduinoISP simply isn't very robust with respect to recovering from lost serial data.
A simple fix would be to bump the size of the buffer down in the HardwareSerial core code to be 256 bytes
(even if only temporarily to build the sketch and then change it back).
This would allow the sketch to run "as is" including with the goofy current heartbeat() delays.
There are smarter ways to fix HardwareSerial permanently so it can continue to work on the multi-port
AVRs, but they require fixing more of the HardwareSerial
core code: not using the same size buffer for RX and TX, varying the buffer sizes depending on the number
of serial ports, and using some smarter weak functions to eliminate RAM buffers for serial ports
that are not used.
An even better solution would be to allow the sketch to declare the HardwareSerial objects itself,
with constructors that define the buffer sizes for the TX and RX buffers, and let the user define what he wants for the ports he
is actually using.
But those fixes are not as quick and easy as just a temporary kludge to bump
the buffer size while you build the ArduinoISP sketch.
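To give an idea of what sketch-defined buffer sizes could look like, here is a minimal sketch of the idea in plain C++. Nothing like this exists in the stock core; Ring, SizedSerialPort and IspSerial are names I made up, and I have used template parameters rather than constructor arguments so the buffers can stay statically sized. The interrupt plumbing is left out entirely.

```cpp
#include <cstddef>
#include <cstdint>

// Simple byte ring buffer with N slots (one slot is always kept empty).
template <size_t N>
struct Ring {
    static_assert(N >= 2, "a ring needs at least 2 slots");
    uint8_t data[N];
    size_t head = 0;  // next slot to write
    size_t tail = 0;  // next slot to read

    // Returns false if the buffer is full and the byte was dropped.
    bool push(uint8_t c) {
        size_t next = (head + 1) % N;
        if (next == tail) return false;
        data[head] = c;
        head = next;
        return true;
    }

    // Returns the oldest byte, or -1 if the buffer is empty.
    int pop() {
        if (head == tail) return -1;
        uint8_t c = data[tail];
        tail = (tail + 1) % N;
        return c;
    }

    size_t count() const { return (N + head - tail) % N; }
};

// Hypothetical port type: RX and TX sizes are chosen per port by the sketch.
template <size_t RxSize, size_t TxSize>
struct SizedSerialPort {
    Ring<RxSize> rx;
    Ring<TxSize> tx;
};

// An ISP-style sketch could ask for a big RX buffer and a tiny TX buffer.
typedef SizedSerialPort<256, 16> IspSerial;
```

With something like IspSerial, a whole 133 byte page command fits in the RX buffer, while a port that is never declared costs no RAM at all.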
Another easy optimization that can help a little bit (but not eliminate the issue)
is to change the head/tail pointers to unsigned chars instead of unsigned ints.
This is an easy modification that can be done and left in.
It will make the HardwareSerial code smaller and a bit faster.
I make this change to every single Arduino release.
There is no real need to use unsigned ints since they would only come into play
with buffers larger than 256 bytes which they aren't.
Further, if buffers were ever made larger than 256 bytes there are other
portions of the code that would break because the code is not properly handling atomicity issues
with respect to the head/tail pointers. i.e. it assumes that looking at and modifying a head/tail value
is atomic, and for larger than 8 bit values, on the AVR, it isn't.
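A minimal sketch of the index type change, written as plain C++ so it can be checked on a PC. The names are mine, not the core's, and the 64 byte size is the 1.x value mentioned above. The static_assert is there to catch anybody who later bumps the buffer past what an 8 bit index can address.

```cpp
#include <cstdint>

// Assumed buffer size (the 1.x value discussed above); hypothetical name.
const unsigned int kBufferSize = 64;

// 8 bit index: the AVR reads or writes it in a single access, so the
// interrupt can never catch the sketch halfway through a head/tail access.
typedef uint8_t ring_index_t;

// An 8 bit index can only address slots 0..255.
static_assert(kBufferSize <= 256, "8 bit head/tail need a buffer of 256 bytes or less");

// Advance a head or tail index by one slot, wrapping at the end of the buffer.
inline ring_index_t nextIndex(ring_index_t i) {
    return static_cast<ring_index_t>((i + 1u) % kBufferSize);
}
```

The wrap still works at the 256 byte limit, since (255 + 1) % 256 is 0 and fits in the 8 bit type.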
--- bill