ATmega328P, Optiboot, stk500_recv() and going deep

I have a custom board with an ATmega328P, with no external clock source. As an exercise in masochism, I'd like to get Optiboot running on it.

Long story short, I grabbed optiboot_atmega328.hex from my Arduino IDE 2.3.6 installation here: ~/Library/Arduino15/packages/arduino/hardware/avr/1.8.6/bootloaders/optiboot/optiboot_atmega328.hex.

and flashed it. Note this hex file is compiled with F_CPU set to 16MHz.

Flashing using an AVR Programmer via the ICSP interface is going well. I'm using fuse settings L: 0xE2, H: 0xDE, E: 0xFC to select the internal 8MHz RC Oscillator, turn off CKDIV8, enable the bootloader on reset, and set its size to 256 words.

But when uploading using the bootloader, stk500 fails over UART0, but not straight away!

I'm using Arduino's avrdude (Version 6.3-20190619), Arduino's avrdude.conf file, -patmega328p and -carduino, and crucially, -b57600.

From what I can see, avrdude does exactly what it should and so does the bootloader, at first. I can see DTR-triggered reset, the three LED pin flashes, and I can even see it respond to the "0 [30] [20]" sync pattern with the expected ". [14] . [10]". But then the MCU resets (or at least, the TX line goes high impedance, so I'm guessing it resets).

Rabbit Hole #1

So obviously a baud rate mismatch right? Well, no, I don't think so. Optiboot uses F_CPU and BAUD_RATE to set the uart clock, and in attempting to set it assuming 16MHz and 115200, actually sets the prescaler to 16, which produces 57600 at 8MHz. How close is the 8MHz? Bloody close.

Here's two bytes sent by my USB to UART dongle. From start bit to stop bit (9 bits) is 156µs. That's 57692 baud.

And here's a byte sent back from the ATmega328P. 155.9µs. Eerily, eerily similar rate.

Even if I measure a single bit period, the worse case baud I could find is 58140, a mere 1% faster than ideal.

Honestly I'm a little surprised these are so close to ideal, but in any case, it seems we have a good baud rate match.

Rabbit Hole #2

So obviously, the watchdog is triggering early right? Well, no.

I was surprised find such a big range for the oscillator that drives the watchdog. Though the typicals seem might tighter.

Optiboot uses a 1 second timeout (128k cycles). But for my 8MHz situation, the LED pulses take nearly 800ms. Here the LED pin signal is in magenta, Tx from the point of view of the host (ie. rx for ATmega328P) in yellow, Rx in cyan. Major events are annotated in orange.

I now realise the watchdog is reset after each flash, so it doesn't matter. Before I realised that I painstakingly modified the hex file to change the watchdog timeout from 1 second (bit pattern 0x0E) to 2 seconds (bit pattern 0x0F). After re-flashing, it didn't make any difference.

Smoking Gun

So the crux of the problem seems to be at the tail end of the capture above. The ATmega328P boots, Optiboot runs and pulses the LEDs three times. Meanwhile, avrdude is sending the sync pair STK_GET_SYNC, CRC_EOP (0x30, 0x20). The ATmega328P receives them, but the receive FIFO is two bytes long, so only the last pair is intact once the LED pulsing finishes. At that point it reads those characters, verifies the space (0x20) character is where it should be, and responds with STK_INSYNC, STK_OK (0x14, 0x10).

But, fatefully, avrdude continues with its first STK_GET_PARAMETER command ("Send: A [41] . [80] [20] "), and the ATmega328P hangs up the phone. About 16ms after STK_GET_PARAMETER is sent, the rx line gets pulled up to the tx voltage, suggesting the ATmega328P is no longer asserting it. 16ms happens to be Optiboot's "pull the pin" watchdog timeout setting.

You can see that here. This is the last part of the previous capture. The RX bytes are 0x30, 0x20 and the tx bytes shortly afterwards are 0x41, 0x80, 0x20.

To be continued

I'm in too deep now. I'm determined to get to the bottom of this headscratcher. Any words of wisdom, or otherwise, gratefully accepted.

Curiouser and curiouser.

I used a terminal program instead to play the part of avrdude. Here's what happens if I send STK_READ_SIGN while the bootloader is running:

Optiboot responds with STK_INSYNC, 0x1E950F, STK_OK, exactly as it should.

So what happens if I send the 3 byte STK_GET_PARAMETER command?

Exactly what should happen. Optiboot responds with STK_INSYNC, 0x03, STK_OK. So why doesn't it do it when avrdude sends it?

Riiiiight. So having confirmed that it works with the terminal program, even if the same sequence of commands are sent, about the only point of difference left was the timing of the LED pulses.

I modified the Optiboot hex file to reduce the number of pulses from 3 to 1, mindful that at 16MHz they would be twice as quick, but also that the sync commands are sent while the pulses are still happening.

And now it works.

avrdude still sends three sync commands. The first one still arrives while the LED is pulsing, but in theory this time the UART FIFO doesn't overflow. Like before, the characters are pulled off as soon as the pulsing finishes, and the response is sent. The response to the next two sync commands is sent immediately, as all all the responses to the commands that follow.

So.... wha? How, in the first case, did Optiboot manage to pull a sync command out of the FIFO, but then get stuck on the next command? Why would the lack of FIFO overflow make a difference?

Hold the phone.

A data overrun occurs when the receive buffer is full (two characters), it is a new character waiting in the receive shift register, and a new start bit is detected.

This butchered sentence seems to reveal an important idiosyncrasy. My interpretation is the same as this one, though it was never confirmed.

That is, there is actually 3 bytes stored in the UART when an overrun occurs: the 2 in the FIFO and another one in the shift register! If so, then the three sync commands will fill the UART like so:

F0|F1|SR  : FIFO 0 (the next to be read), FIFO 1, shift register
--|--|--
30|20|XX  : Both bytes of 1st sync command arrives. No overrun.
30|20|30  : 1st byte (0x30) of 2nd sync command arrives. FIFO is full.
30|20|20  : 2nd byte (0x20) of 2nd sync command arrives. Previous byte is lost.
30|20|30  : 1st byte (0x30) of 3rd sync command arrives. Previous byte is lost.
30|20|20  : 2nd byte (0x20) of 3rd sync command arrives. Previous byte is lost.
20|20|XX  : 1st byte (0x30) is read. Shift register moved into FIFO.
20|XX|XX  : 2nd byte (0x20) is read. First sync command correctly parsed.
XX|XX|XX  : 3rd byte (0x20) is read. Uh oh. Due to overrun, this
          : is actually the 2nd byte of the 3rd sync command.

This seems to be the behaviour observed here - if six bytes are sent, and then all are read, you get the 1st, the 2nd, and then the last!

So that, if true, would explain it. And maybe it would explain why for decades there have been reports on this forum of mysterious "stk500_recv(): programmer is not responding" issues relating to baud rate / cpu frequency config. If the LED pulses overlap with avrdude's sync commands (and they very nearly do even in the usual case) you'll get incomplete comms.

@fanofard , it's been a decade, but I wonder if my observations shine a light on your experience here.

avrdude has changed a bit over the years, but the whole reset and sync timing has always seemed a little "try stuff until it works". Maybe that's why there's been mixed reports on Optiboot working at 8MHz / 57600 baud.