bug in optiboot bootloader

i posted this in the programming forum earlier today, as a continuation of another problem, but thought it might be helpful to start another thread specifically about this issue here. basically, AVRdude optimizes page writes by eliminating trailing FF's (as any page should be cleared to FF before writing). But, the optiboot bootloader writes a full page, regardless of the actual page length. here is the code from optiboot.c:

    else if(ch == STK_PROG_PAGE) {
      // PROGRAM PAGE - we support flash programming only, not EEPROM
      uint8_t *bufPtr;
      uint16_t addrPtr;

      getch();			/* getlen() */
      length = getch();
      getch();

      // If we are in RWW section, immediately start page erase
      if (address < NRWWSTART) __boot_page_erase_short((uint16_t)(void*)address);

      // While that is going on, read in page contents
      bufPtr = buff;
      do *bufPtr++ = getch();
      while (--length);

      // If we are in NRWW section, page erase has to be delayed until now.
      // Todo: Take RAMPZ into account
      if (address >= NRWWSTART) __boot_page_erase_short((uint16_t)(void*)address);

      // Read command terminator, start reply
      verifySpace();

      // If only a partial page is to be programmed, the erase might not be complete.
      // So check that here
      boot_spm_busy_wait();

ths is the section where it writes a temporary buffer with the incoming data. it checks the length of data in the block ("length"), and only writes "length" bytes to the temporary buffer. in the next section, where it writes the temporary buffer to FLASH, it writes the whole buffer (SPM_PAGESIZE/2):

      // Copy buffer into programming buffer
      bufPtr = buff;
      addrPtr = (uint16_t)(void*)address;
      ch = SPM_PAGESIZE / 2;
      do {
        uint16_t a;
        a = *bufPtr++;
        a |= (*bufPtr++) << 8;
        __boot_page_fill_short((uint16_t)(void*)addrPtr,a);
        addrPtr += 2;
      } while (--ch);

so if the last few lines of a partial page write have FF in them, they will actually get written with the ending contents of the previous page. this is an unlikely scenario (the last few bytes being FF), but occurs when you declare a variable or variables to FF, and they are the last variable(s) you declare. a work-around is to make sure the last variable you declare is not FF, or re-initialize your FF variables in setup(). as for the bootloader, probably the best fix is to only write "length" bytes from the temporary buffer to the FLASH.

i realized there was a more appropriate forum for this post, and reposted there. feel free to delete this one.

Ugh. Next time, please contact a moderator. (Other topic deleted.)

I think this solves the problem...

getch(); /* getlen() /
length = getch();
ch = length / 2;
getch();
...
addrPtr = (uint16_t)(void
)address;
ch = SPM_PAGESIZE / 2;
do {

...nope. Now I remember. There is no definitive solution. Optiboot has no way of knowing how many 0xFF values avrdude trimmed.

how does one contact a moderator, or know which moderator to contact? there sure are a lot of subforums around here.

also, even though its unknown how many FF's have been trimmed, filling up the remaining space with FF (or leaving it FF as the case is), is way safer then just rewriting the trailing info from the last page.

AVRdude optimizes page writes by eliminating trailing FF's (as any page should be cleared to FF before writing)

Hmm. I knew that it skipped pages that were completely filled with 0xFF; this is the first I've heard that it eliminates trailing bytes within an individual page. Are you sure it does that? (I supposed I can check, too...)
For the end-of-sketch version of this problem, there is already Google Code Archive - Long-term storage for Google Code Project Hosting.
and avrdude doesn't properly write long series of 0xFF [imported] · Issue #236 · arduino/ArduinoCore-avr · GitHub

Frankly, I regard this (both of them) as a bug in avrdude. In the arduino case, it's explicitly been given the -D switch that supresses the initial chip-erase that a real programmer would do, so it shouldn't be assuming that Flash contains 0xFF to start with. Assuming that a per-page buffer need only be partially filled is even more seriously wrong, IMO...

It's hard to fix on the optiboot side without exceeding the space available :frowning:

Is this what we ran into here? (Although it may have been known before that.)

so it looks like this has been known about for a while, and that there isnt a solution that will work for all cases. the "entire page of FF" case is particularly tricky.

i would agree that its a problem with AVRdude. does AVRdude have a switch for sending all the bytes? im not too familiar with AVRdude. but, it seems like there isnt a downside to at least writing the remaining page of memory with FF (or leaving it unwritten as FF). it has already been erased, so there is no old code to save anymore, and putting what amounts to random bytes in is less predictable than just leaving FF (and takes marginally longer).

See this thread, particularly my post here:

http://forum.arduino.cc/index.php?topic=84243.msg631727#msg631727

I suggest a fix to Optiboot to correct that.

Except for the corner case when the FFs appear on a new flash page boundary at the end, then not even Optiboot will know to write it.

Nick, I tried to build your fix with avr-gcc v4.3.3, but this came out 2 bytes too big. (514 bytes)

     // While that is going on, read in page contents
      uint8_t i;
      for (i = 0; i < length; i++)
        *bufPtr++ = getch();
      // fill out rest of page
      for ( ; i < SPM_PAGESIZE / 2; i++)
        *bufPtr++ = 0xFF;

So I tried this instead. (with about 14 bytes to spare)

      // While that is going on, read in page contents
      uint8_t i = length;
      bufPtr = buff;
      do *bufPtr++ = getch();
      while (--length);
      for ( ; i < SPM_PAGESIZE / 2; i++)
	  *bufPtr++ = 0xFF;

Seems to work with the -1 test.

would it be possible for you to post the hex or elf?

The Optiboot hex for the m328p is attached.

optiboot_atmega328.hex (1.45 KB)

Nick, I noticed that your fix is only filling the first half a flash page instead of the whole page, why is that?

for ( ; i < SPM_PAGESIZE / 2; i++)
   *bufPtr++ = 0xFF;

Here is the hex dump of the last 2 pages of the flash memory of Jack's "-1" test code, uploaded via modified Optiboot.

:100300003F912F910F900FBE0F901F9018957894EA
:1003100084B5826084BD84B5816084BD85B582600A
:1003200085BD85B5816085BDEEE6F0E080818160A8
:100330008083E1E8F0E010828081826080838081A8
:1003400081608083E0E8F0E0808181608083E1EB80
:10035000F0E0808184608083E0EBF0E08081816068
:100360008083EAE7F0E0808184608083808182601E
:100370008083808181608083808180688083109207

:10038000C1000895F894FFCFFFFFFFFFFFFFFFFFBD
:10039000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF6D
:1003A000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5D
:1003B000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF4D
:1003C00081608083E0E8F0E0808181608083E1EB00
:1003D000F0E0808184608083E0EBF0E080818160E8
:1003E0008083EAE7F0E0808184608083808182609E
:1003F0008083808181608083808180688083109287

Notice how the updated Optiboot code filled in unused memory with 0xFFs, but only half the page.

hiduino:
The Optiboot hex for the m328p is attached.

thanks. now if it works and no other bugs have cropped up as a result this should become standard distribution. thanks again.

ps maybe byte vs word causes half page issue? i saw this often working on custom bootloaders.

haha..

what does this mean to us 'noobs' here?

I have used Optiboot to flash to chips... but I wouldnt have a clue about switches, compiling, looking under the hood..

Does any of this concern the 'normal' (beginner) members around here?

westfw:
Frankly, I regard this (both of them) as a bug in avrdude. In the arduino case, it's explicitly been given the -D switch that supresses the initial chip-erase that a real programmer would do, so it shouldn't be assuming that Flash contains 0xFF to start with. Assuming that a per-page buffer need only be partially filled is even more seriously wrong, IMO...

I would disagree with this because even when a full chip erase is not done, you still
have to erase a page before you write to the page, and when you erase the page
it erases that page to all 0xff. So not sending the trailing 0xff bytes in page seems perfectly reasonable,
particularly considering that those bytes have to transfered from the host over interfaces
that are sometimes not very fast.

The issue seemed to be that optiboot was not handling it correctly
by burning more bytes into the page than it was given and thereby corrupting
the back end of the page.

An interesting question would be if avrdude sends a full page of 0xff to burn the page
when the full page is 0xff when the -D option is used. If it isn't, then I'd say that is definitely a bug.

--- bill

xl97:
haha..

what does this mean to us 'noobs' here?

I have used Optiboot to flash to chips... but I wouldnt have a clue about switches, compiling, looking under the hood..

Does any of this concern the 'normal' (beginner) members around here?

its a pretty rare occurance that it causes a problem, so probably not a big worry. it happens when the last variables you define are FF.

here is an alternate fix, that i will compile and test out when i get a second, probably tomorrow. but if anyone sees any glaring errors with it, feel free to point them out. i define "ch" earliear as "length/2" so it can be use for the page writes.

    /* Write memory, length is big endian and is in bytes */
    else if(ch == STK_PROG_PAGE) {
      // PROGRAM PAGE - we support flash programming only, not EEPROM
      uint8_t *bufPtr;
      uint16_t addrPtr;

      getch();			/* getlen() */
      length = getch();
      ch = length / 2;  // save a copy for page writing down below
      getch();

      // If we are in RWW section, immediately start page erase
      if (address < NRWWSTART) __boot_page_erase_short((uint16_t)(void*)address);

      // While that is going on, read in page contents
      bufPtr = buff;
      do *bufPtr++ = getch();
      while (--length);

      // If we are in NRWW section, page erase has to be delayed until now.
      // Todo: Take RAMPZ into account
      if (address >= NRWWSTART) __boot_page_erase_short((uint16_t)(void*)address);

      // Read command terminator, start reply
      verifySpace();

      // If only a partial page is to be programmed, the erase might not be complete.
      // So check that here
      boot_spm_busy_wait();

#ifdef VIRTUAL_BOOT_PARTITION
      if ((uint16_t)(void*)address == 0) {
        // This is the reset vector page. We need to live-patch the code so the
        // bootloader runs.
        //
        // Move RESET vector to WDT vector
        uint16_t vect = buff[0] | (buff[1]<<8);
        rstVect = vect;
        wdtVect = buff[8] | (buff[9]<<8);
        vect -= 4; // Instruction is a relative jump (rjmp), so recalculate.
        buff[8] = vect & 0xff;
        buff[9] = vect >> 8;

        // Add jump to bootloader at RESET vector
        buff[0] = 0x7f;
        buff[1] = 0xce; // rjmp 0x1d00 instruction
      }
#endif

      // Copy buffer into programming buffer
      bufPtr = buff;
      addrPtr = (uint16_t)(void*)address;
 //     ch = SPM_PAGESIZE / 2; // ch redefined up above for new boundary size
      do {
        uint16_t a;
        a = *bufPtr++;
        a |= (*bufPtr++) << 8;
        __boot_page_fill_short((uint16_t)(void*)addrPtr,a);
        addrPtr += 2;
      } while (--ch);

      // Write from programming buffer
      __boot_page_write_short((uint16_t)(void*)address);
      boot_spm_busy_wait();

#if defined(RWWSRE)
      // Reenable read access to flash
      boot_rww_enable();
#endif

also, i just verified that AVRdude does NOT send full pages of FF to the arduino, when they are present in the .hex file.

hiduino:
Nick, I noticed that your fix is only filling the first half a flash page instead of the whole page, why is that?

Probably because of this:

      // Copy buffer into programming buffer
      bufPtr = buff;
      addrPtr = (uint16_t)(void*)address;
      ch = SPM_PAGESIZE / 2;

They keep switching between words and bytes in the documentation. But I see under that, that for each "ch" they copy two bytes so your fix is probably correct.