When does -1 != -1 ?

Something funny with Optiboot?

LOL, that's been true since its initial release. Seeing the results of trying to fit 2 pounds of bootloader code into a one pound bootloader space has been quite a spectacle to watch over the last couple of years. Great entertainment it is. ;)

The generated code in setup is correct.

There are two possible root causes... 1. The code generated for __do_copy_data is not correct. 2. The part of Flash storing the initial value (-1) does not contain the correct value.
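For context, __do_copy_data is the startup routine that copies the initial values of initialized globals from flash into SRAM before main() runs. The following is only a host-side sketch of that idea, not the real startup code; the function names copy_data_image and read_int16_le are hypothetical. It shows why both suspects matter: foo ends up as -1 only if the copy is correct and the flash image really holds 0xFF 0xFF.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: copy the .data image from "flash" into "RAM",
   as the startup code does before main() runs. */
static void copy_data_image(uint8_t *ram, const uint8_t *flash_image, size_t len)
{
    assert(ram != NULL && flash_image != NULL);
    memcpy(ram, flash_image, len);
}

/* Read a little-endian 16-bit value, the way the AVR stores `int foo`. */
static int16_t read_int16_le(const uint8_t *p)
{
    uint16_t raw = (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
    return (int16_t)raw;
}
```

With a flash image of 0xFF 0xFF the value read back is -1; with any other pair of bytes in that spot the comparison in setup() fails even though the generated code is fine.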

If I change the -1 to -2, then the LED stays off

The difference between the two HEX files is what would be expected (four bytes different). The difference between the two ELF files is exactly what would be expected.

And, that test eliminates #1.

Thanks for the input. What would be the next step? How do we validate #2?

Upon reading the original post of this thread I thought this was another beginner programming issue. However, somewhat surprisingly, I can reproduce this problem on the latest Uno (rev 3 board) with Arduino 0022.

This is with the addition of setting pin 13 low, just in case it was in a random state when the sketch starts. Specifically:

#define TEST_VALUE -1
int foo = TEST_VALUE;

void setup(void)
{
    pinMode(13, OUTPUT);
    digitalWrite (13, LOW);
    if (foo != TEST_VALUE) 
       digitalWrite(13, HIGH);
}

void loop(void) { }

(as others have done).

Also, the generated code looks reasonable:

   if (foo != TEST_VALUE) 
 112:	80 91 00 01 	lds	r24, 0x0100
 116:	90 91 01 01 	lds	r25, 0x0101
 11a:	8f 5f       	subi	r24, 0xFF	; 255
 11c:	9f 4f       	sbci	r25, 0xFF	; 255
 11e:	21 f0       	breq	.+8      	; 0x128 <setup+0x26>
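As a rough illustration of why that listing is a valid test for -1: the subi/sbci pair subtracts 0xFFFF from the 16-bit value in r25:r24, and breq branches only when the full 16-bit result is zero. The C sketch below models that on a host machine; the function name equals_minus_one is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `subi r24,0xFF ; sbci r25,0xFF ; breq`: a 16-bit subtraction of
   0xFFFF, with the branch taken only when the result is zero. */
static bool equals_minus_one(uint16_t foo)
{
    uint16_t result = (uint16_t)(foo - 0xFFFFu); /* same as foo + 1 mod 2^16 */
    return result == 0;
}
```

Only the bit pattern 0xFFFF (that is, -1 as a 16-bit int) gives a zero result, so the branch skips digitalWrite(13, HIGH) exactly when foo holds -1.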

Well this looks like a nice challenge!

My suspicion is that you've uncovered a boundary condition bug in Optiboot.

What would be the next step? How do we validate #2?

AVRDUDE works both ways. It's time to upload then download then compare...

Yeah, just did that a bit earlier, i.e. verify with avrdude after uploading via the IDE. To my surprise, it passed verification.

[quote author=Nick Gammon link=topic=84243.msg631540#msg631540 date=1324594530] Upon reading the original post of this thread I thought this was another beginner programming issue. [/quote]

Understandable, I'm sure you're not alone!

Well this looks like a nice challenge!

Indeed, I'd sure like to understand it at least. OTOH it would not seem to be common or it would have been found before now.

Well this is very interesting. I changed the code to compare to -2 (which does not light the LED). However comparing the generated object for both cases gives this:

*** /Users/nick/nick.txt	2011-12-23 10:04:02.000000000 +1100
--- /Users/nick/nick2.txt	2011-12-23 09:58:00.000000000 +1100
***************
*** 135,141 ****
      if (foo != TEST_VALUE) 
   112:	80 91 00 01 	lds	r24, 0x0100
   116:	90 91 01 01 	lds	r25, 0x0101
!  11a:	8f 5f       	subi	r24, 0xFF	; 255
   11c:	9f 4f       	sbci	r25, 0xFF	; 255
   11e:	21 f0       	breq	.+8      	; 0x128 <setup+0x26>
         digitalWrite(13, HIGH);
--- 135,141 ----
      if (foo != TEST_VALUE) 
   112:	80 91 00 01 	lds	r24, 0x0100
   116:	90 91 01 01 	lds	r25, 0x0101
!  11a:	8e 5f       	subi	r24, 0xFE	; 254
   11c:	9f 4f       	sbci	r25, 0xFF	; 255
   11e:	21 f0       	breq	.+8      	; 0x128 <setup+0x26>
         digitalWrite(13, HIGH);

The only difference is the test! But shouldn’t the data be different, somewhere? It’s like the compiler decided not to generate code for initializing foo.

Download (snippet) when the value is -2…
:20030000E1EBF0E0808184608083E0EBF0E0808181608083EAE7F0E080818460808380814F
:20032000826080838081816080838081806880831092C1000895F894FFCFFEFF1F90189554
:20034000789484B5826084BD84B5816084BD85B5826085BD85B5816085BDEEE6F0E080817A

Download (snippet) when the value is -1…
:20030000E1EBF0E0808184608083E0EBF0E0808181608083EAE7F0E080818460808380814F
:20032000826080838081816080838081806880831092C1000895F894FFCF0F901F901895B2
:20034000789484B5826084BD84B5816084BD85B5826085BD85B5816085BDEEE6F0E080817A

Oops. Somebody forgot to put -1 in there.

There is no evidence that the compiler or the linker is at fault. The generated HEX files appear correct. Which leaves AVRDUDE and Optiboot.

From @WizenedEE… “both turn the LED on for my uno but not on my duemilanove” …which leaves Optiboot.

Oh @retrolefty! I'm afraid you hit the nail on the head!

It's dinner time. I'll look at Optiboot later (if I have time).

This reminds me of an issue a while back (which I was about to check with the data on the chip) that the bootloader assumes that the EEPROM has been cleared to FFs, and doesn't upload 0xFF. However I thought that was only for whole pages.

My guess, and it is only a guess, is that the bootloader doesn't upload whole pages of 0xFF, and in addition (to save time) stops uploading the final page when all that is left is 0xFF.

Adding additional code to the sketch would cause more data to be created (and thus the 0xFFFF has something past it) and of course any number other than 0xFFFF would cause that to be uploaded too.

See this:

http://lists.gnu.org/archive/html/avrdude-dev/2003-05/msg00068.html

I've just committed a change that affects the handling of flash memory. Basically, any time a file on disk is read into memory, its size is reported such that 0xff padded data is ignored. This has the effect of causing avrdude to only write up to the last non 0xff data values into the flash, ignoring the rest.
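To make the effect of that change concrete, here is a minimal sketch of the trimming idea. It is not avrdude's actual code; trimmed_length is a hypothetical helper that simply ignores trailing 0xFF bytes when reporting the image size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not avrdude's real code): length of the image once
   trailing 0xFF bytes are ignored. */
static size_t trimmed_length(const uint8_t *image, size_t len)
{
    assert(image != NULL || len == 0);
    while (len > 0 && image[len - 1] == 0xFF)
        len--;
    return len;
}
```

An image whose last two bytes are 0xFF 0xFF (a trailing int initialized to -1) comes out two bytes shorter, so those two bytes would never be sent to the bootloader.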

Excellent Nick! That is exactly what is happening. The tail of the HEX file with the -1 marked…

:10 0320 00 82608083808181608083808180688083 17
:0A 0330 00 1092C1000895F894FFCF 69
:02 033A 00 FFFF C3
:00 0000 01 FF
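For anyone checking those records by hand: each Intel HEX record is byte count, 16-bit address, record type, data, then a checksum equal to the two's complement of the low byte of the sum of everything before it. A small sketch (the function name ihex_checksum is just for illustration) confirms the records above are well formed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Checksum of an Intel HEX record: two's complement of the low byte of the
   sum of all preceding record bytes (count, address, type, data). */
static uint8_t ihex_checksum(const uint8_t *bytes, size_t n)
{
    assert(bytes != NULL || n == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + bytes[i]);
    return (uint8_t)(0x100u - sum);
}
```

Feeding it the bytes 02 03 3A 00 FF FF gives C3, which matches the record carrying the -1, and 00 00 00 01 gives FF for the end-of-file record. So the HEX file itself is intact.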

The tail of the upload…

########avrdude: Send: U [55] . [80] . [01] [20]
avrdude: Recv: . [14]
avrdude: Recv: . [10]
avrdude: Send: d [64] . [00] : [3a] F [46] . [e1] . [eb] . [f0] . [e0] . [80] . [81] . [84] [60] . [80] . [83] . [e0] . [eb] . [f0] . [e0] . [80] . [81] . [81] [60] . [80] . [83] . [ea] . [e7] . [f0] . [e0] . [80] . [81] . [84] [60] . [80] . [83] . [80] . [81] . [82] [60] . [80] . [83] . [80] . [81] . [81] ` [60] . [80] . [83] . [80] . [81] . [80] h [68] . [80] . [83] . [10] . [92] . [c1] . [00] . [08] . [95] . [f8] . [94] . [ff] . [cf] [20]
avrdude: Recv: . [14]
avrdude: Recv: . [10]

The last ten bytes sent…

. [10] . [92] . [c1] . [00] . [08] . [95] . [f8] . [94] . [ff] . [cf]

Trimmed and put next to the last bytes of the HEX file…

1092c1000895f894ffcf
1092C1000895F894FFCFFFFF

The -1 is in the HEX file but AVRDUDE does not send it. The problem is a combination of AVRDUDE trimming the FFs from the end of the upload and Optiboot not performing a chip erase.

Wow, nice job tracking that down, you guys.

Great work guys. Indeed, adding another variable makes the problem go away. I did notice, for the original sketch, the IDE reports Binary sketch size: 824 bytes, but avrdude only writes and verifies 822 bytes. With an added variable, the sizes then agree.

So how would the problem be summarized? The last static variable cannot be initialized to -1? Not sure whether "last" is always controllable from the source code.

The problem is a combination of AVRDUDE trimming the FFs from the end of the upload and Optiboot not performing a chip erase.

How can a 328 be programmed without a chip erase, especially if it contains a prior sketch? I thought flash could only be changed (written to) from a ONE bit to a ZERO bit, and that the only way to change a zero bit back to a one bit is via chip erase. Or is this just a last-block-used kind of thing?

Lefty

What I don’t get (but I think I might now) is why this is a problem. Consider that you erase pages, not bytes. A page is 64 words (128 bytes), so multiples of 128 bytes will be erased. The offending bytes seem to be at 0x33A/0x33B in your example, and 0x33D/0x33E in mine. That is half-way through a page.
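The page arithmetic can be checked with a couple of one-liners. This is only a sketch; PAGE_SIZE, page_base and page_offset are names chosen here, with the 128-byte page size taken from the figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 128u /* bytes per flash page (64 words), as stated above */

static_assert((PAGE_SIZE & (PAGE_SIZE - 1u)) == 0, "page size must be a power of two");

/* First byte address of the page containing addr. */
static uint16_t page_base(uint16_t addr)
{
    return (uint16_t)(addr & ~(PAGE_SIZE - 1u));
}

/* Byte offset of addr within its page. */
static uint16_t page_offset(uint16_t addr)
{
    return (uint16_t)(addr & (PAGE_SIZE - 1u));
}
```

For 0x33A this gives a page base of 0x300 and an offset of 58, so the missing bytes sit in the middle of the last, partially filled page rather than on a page boundary.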

What I suspect is happening is this:

  • The boot loader starts reading a page:
 // Immediately start page erase - this will take 4.5ms
      boot_page_erase((uint16_t)(void*)address);

      // While that is going on, read in page contents
      bufPtr = buff;
      do *bufPtr++ = getch();
      while (--length);

Note that this is for the supplied length bytes.

  • The page is copied into the “programming buffer”:
 // Copy buffer into programming buffer
      bufPtr = buff;
      addrPtr = (uint16_t)(void*)address;
      ch = SPM_PAGESIZE / 2;
      do {
        uint16_t a;
        a = *bufPtr++;
        a |= (*bufPtr++) << 8;
        boot_page_fill((uint16_t)(void*)addrPtr,a);
        addrPtr += 2;
      } while (--ch);

This time the full page is copied, not the length amount.

  • The page is committed to flash:
  // Write from programming buffer
      boot_page_write((uint16_t)(void*)address);
      boot_spm_busy_wait();

The net effect of this would be that (even though the page was erased) the wrong data was copied from the temporary buffer to flash.

Aha! Proof!

Reading back from the chip:

:2002A000020190930301A0930401B0930501BF91AF919F918F913F912F910F900FBE0F9018
:2002C0001F901895789484B5826084BD84B5816084BD85B5826085BD85B5816085BDEEE670
:2002E000F0E0808181608083E1E8F0E01082808182608083808181608083E0E8F0E08081BA
:2003000081608083E1EBF0E0808184608083E0EBF0E0808181608083EAE7F0E0808184606F
:2003200080838081826080838081816080838081806880831092C1000895F894FFCF0F900A

Directly after the 0xFFCF, where 0xFFFF should be, the chip actually holds 0x0F90. And if you look back exactly 128 bytes, there is 0x0F90 again!

So I would suggest that the bootloader should clear the temporary buffer to 0xFF before reading into it from the incoming serial port.
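Here is a host-side model of that behaviour and of the suggested fix. It is a simplified sketch, not Optiboot's code: receive_page is a hypothetical function, the serial input is just an array, and the clear_first flag stands in for adding a buffer fill before the read loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 128

/* Simplified model of the page-receive step: `len` bytes arrive into a
   buffer that is reused between pages, then the whole buffer is written
   to the flash page. */
static void receive_page(uint8_t *buff, uint8_t *flash_page,
                         const uint8_t *incoming, size_t len, int clear_first)
{
    assert(len <= PAGE_SIZE);
    if (clear_first)
        memset(buff, 0xFF, PAGE_SIZE);   /* the suggested fix */
    memcpy(buff, incoming, len);         /* only `len` bytes are received */
    memcpy(flash_page, buff, PAGE_SIZE); /* but a full page is programmed */
}
```

Without the fill, a short final page keeps whatever the previous page left in the buffer past len, which matches the stale 0x0F90 seen 128 bytes later. With the fill, everything past len is programmed as 0xFF, so a trimmed trailing -1 still reads back as -1.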

Now that sounds like a software bug just shot down dead, even for this old hardware type.

But I bet optiboot doesn't have room left in its code space to add the needed action. 8)

Optiboot, the loader that keeps on giving. ;)

[quote author=Nick Gammon link=topic=84243.msg631711#msg631711 date=1324608869] What I don't get (but I think I might now) is why this is a problem. Consider that you erase pages, not bytes. ... [/quote]

So should only occur for partial pages (presumably the last page)?