When does -1 != -1 ?

The problem is a combination of AVRDUDE trimming the FFs from the end of the upload and Optiboot not performing a chip erase.

How can a 328 be programmed without a chip erase, esp if it contains a prior sketch. I thought flash could only be changed (written to) from a ONE bit to a ZERO bit and that the only way to change a zero bit back to a one bit is via chip erase? Or is this just a last block used kind of thingee?

Lefty

What I don’t get (but I think I might now) is why this is a problem. Consider that you erase pages, not bytes. A page is 64 words (128 bytes), so multiples of 128 bytes will be erased. The offending bytes seem to be at 0x33A/0x33B in your example, and 0x33D/0x33E in mine. That is half-way through a page.

What I suspect is happening is this:

  • The boot loader starts reading a page:
 // Immediately start page erase - this will 4.5ms
      boot_page_erase((uint16_t)(void*)address);

      // While that is going on, read in page contents
      bufPtr = buff;
      do *bufPtr++ = getch();
      while (--length);

Note that this is for the supplied length bytes.

  • The page is copied into the “programming buffer”:
 // Copy buffer into programming buffer
      bufPtr = buff;
      addrPtr = (uint16_t)(void*)address;
      ch = SPM_PAGESIZE / 2;
      do {
        uint16_t a;
        a = *bufPtr++;
        a |= (*bufPtr++) << 8;
        boot_page_fill((uint16_t)(void*)addrPtr,a);
        addrPtr += 2;
      } while (--ch);

This time the full page is copied, not the length amount.

  • The page is commited to flash:
  // Write from programming buffer
      boot_page_write((uint16_t)(void*)address);
      boot_spm_busy_wait();

The net effect of this would be that (even though the page was erased) the wrong data was copied from the temporary buffer to flash.

Aha! Proof!

Reading back from the chip:

:2002A000020190930301A0930401B0930501BF91AF919F918F913F912F910F900FBE0F9018
:2002C0001F901895789484B5826084BD84B5816084BD85B5826085BD85B5816085BDEEE670
:2002E000F0E0808181608083E1E8F0E01082808182608083808181608083E0E8F0E08081BA
:2003000081608083E1EBF0E0808184608083E0EBF0E0808181608083EAE7F0E0808184606F
:2003200080838081826080838081816080838081806880831092C1000895F894FFCF0F900A

Directly following the 0xFFCF which should be 0xFFFF, is actually 0x0F90. And if you look back exactly 128 bytes, there is 0x0F90 again!

So I would suggest that the bootloader should clear the temporary buffer to 0xFF before reading into it from incoming serial port.

Now that sounds like a software bug just shot down dead, even for this old hardware type.

But I bet optiboot doesn't have room left in it's code space to add the needed action. 8)

Optiboot, the loader the keeps on giving. ;)

[quote author=Nick Gammon link=topic=84243.msg631711#msg631711 date=1324608869] What I don't get (but I think I might now) is why this is a problem. Consider that you erase pages, not bytes. ... [/quote]

So should only occur for partial pages (presumably the last page)?

Jack, Does westwf's optiload do the same thing?

Yes, presumably, since one presumes that Avrdude sends whole pages before that.

retrolefty:
But I bet optiboot doesn’t have room left in it’s code space to add the needed action. 8)

My suggested fix would be to change, in Optiboot (my copy anyway):

     // While that is going on, read in page contents
      bufPtr = buff;
      do *bufPtr++ = getch();
      while (--length);

to:

      // While that is going on, read in page contents
      uint8_t i;
      for (i = 0; i < length; i++)
        *bufPtr++ = getch();
      // fill out rest of page
      for ( ; i < SPM_PAGESIZE / 2; i++)
        *bufPtr++ = 0xFF;

On my compile, this changes Optiboot from having free space at 0x7FF0 to 0x7FFC. So, still 4 bytes free. :wink:
This change also doesn’t consume any extra time if the page is already full.

CrossRoads: Jack, Does westwf's optiload do the same thing?

Unknown, have not tried optiload.

A quick browse on Github appears to indicate that this part is unchanged. I couldn't say for certain whether or not it is fixed.

So how would the problem be summarized?

A Perfect(ly Annoying) Storm. :)

The last static variable cannot be initialized to -1?

The last N bytes of an upload cannot be 0xFF where N > 0. Depending on the application, that could be initial values, code, or PROGMEM data.

Not sure whether "last" is always controllable from the source code.

Actually, it is. The toolset gives very fine control over the layout of an image. It's possible to tack a non-0xFF value on the end.

But that may not be a general solution. If AVRDUDE drops pages filled with 0xFF then the problem could still occur.

How can a 328 be programmed without a chip erase, esp if it contains a prior sketch.

The erase is done it's just done page-by-page.

I thought flash could only be changed (written to) from a ONE bit to a ZERO bit and that the only way to change a zero bit back to a one bit is via chip [u]or page[/u] erase?

Exactly.

So I would suggest that the bootloader should clear the temporary buffer to 0xFF before reading into it from incoming serial port.

That will help but it does not entirely solve the problem. If the trailing 0xFFs continue to the next page the problem will occur even with the Optiboot modification.

There was a known problem that a page consisting entirely of 0xFF bytes would be dropped. However it seems to me that, whilst annoying, this is less likely to happen than just having a program that happens to have 0xFF at the end of the sketch.

All it would take to avoid the problem would be for a non 0xFF byte to occur in the next page (with my suggested fix).

The proposed fix has no great overhead, for example, no more pages are erased, and thus the life of the chip in the pages "further out" are preserved (this may not mean much, as presumably you always have to erase the lower pages anyway).

Other fixes could be:

  • Get Avrdude to send the correct length (which in itself could have a time penalty)
  • Erase all pages in Optiboot (this also might take somewhat more time)

If the comments in Optiboot.c are right, it takes 4.5 mS to erase a page, and for 32 Kb there would be 256 pages. Thus erasing them all would take 1.152 seconds. Compare this to the current case of around 1024 bytes in the sketch (8 pages) which only take 36 mS to erase. So a blanket erase would probably cost you a hit of a noticeable extra second per upload.

Your other problem is that a full erase would need to be done on top of fixing the read loop (otherwise junk is still copied in after the erase). With 4 bytes to spare we have probably run out of room, unless the erasing can be optimized a bit (eg. you don't need to decide how much to do).

As a work-around I am tempted to suggest putting something like this at the end of each sketch:

volatile byte bugfix = 1;

The "volatile" is to stop it being optimized away. However, rather strangely, that doesn't fix the problem.

As a work-around I am tempted to suggest putting something like this at the end of each sketch:

Even marked volatile, the linker may discard it because of a lack of reference. Try assigning a value in setup.

For what it's worth, the offending function is here... https://github.com/arduino/avrdude/blob/master/avr.c#L115

In addition to being called in avr.c, it is also called from here... https://github.com/arduino/avrdude/blob/master/fileio.c

Well, I've spent too much time on this. Time to fry some Pascal fish...

The TOOLSET allows it, through use of a linker script, but a user compiling via the Arduino IDE wouldn’t be able to reliably add extra space, would they?

[quote author=Coding Badly link=topic=84243.msg631775#msg631775 date=1324620347] Even marked volatile, the linker may discard it because of a lack of reference. Try assigning a value in setup. [/quote]

Yes, that works, good idea. :)

#define TEST_VALUE -1
int foo = TEST_VALUE;
byte fubar = 42;    // don't ask

void setup(void)
{
  fubar = 43;  // absolutely required
  pinMode(13, OUTPUT);
  digitalWrite (13, LOW);
  if (foo != TEST_VALUE) 
    digitalWrite(13, HIGH);
}

void loop(void) { }

Well, Jack, you certainly stirred up the programmers here with that one!

maniacbug: The TOOLSET allows it, through use of a linker script, but a user compiling via the Arduino IDE wouldn't be able to reliably add extra space, would they?

That's my understanding. But, I believe, a minor change to the linker command-line would make it possible.

Yes, that works, good idea.

Thanks.

Unfortunately, I don't think it's completely reliable. In my experience, linkers always place things in the order presented. The sketch's object file is first followed by the core library. If there just happens to be a global variable initialized to -1 that gets pulled in from the core and placed at the end of the image... Well, you know what happens.

But, that is very very unlikely to happen.

[quote author=Coding Badly link=topic=84243.msg631767#msg631767 date=1324617699]

So how would the problem be summarized?

A Perfect(ly Annoying) Storm. :)

The last static variable cannot be initialized to -1?

The last N bytes of an upload cannot be 0xFF where N > 0. Depending on the application, that could be initial values, code, or PROGMEM data.

Etc. etc. ... [/quote]

Thanks very much for the explanations, and thanks again to all who worked on this for some really super sleuthing! You guys seriously rock!

When I was starting to think that we may have actually discovered a bug, I figured it must be a really obscure one. Now that I understand it better, I'm a bit surprised that it wasn't identified sooner. Probably one of those things where people might rearrange the code somehow (like adding Serial did in this case) and when the problem goes away, they're happy to just continue.

If you ask me, this is a great example of the advantages of open-source, and of the very impressive depth of the Arduino community.

Happy Holidays!

Time to weigh in, I guess...

this changes Optiboot from having free space at 0x7FF0 to 0x7FFC. So, still 4 bytes free.

Which compiler are you using? With 4.3.2 (ships with Arduino), your suggested patch pushes my size up to 536 bytes, which is substantially too big. (with a 4.7 compiler, it fits, but I'd rather not require THAT.)

On reflection, I think that avrdude is broken. There is no way that it should be assuming that bytes it doesn't specifically upload are 0xFF when it hasn't done a chip erase (which Arduino doesn't do, by using the -D switch.) That a page erase occurs before actual bytes are programmed is a side-effect of the particular implementation in an AVR, and might not be true if you were uploading some sort of emulator, or some next-generation chip, or FPGA. Frankly, I don't think it should be assuming anything about the state of "erased" memory; if data is in the .hex file, it ought to be uploaded. Period. (Now, I haven't checked whether avrdude actually knows the bounds of the files it reads. It may not know which data came from its input file and which data was just initialized...) Grr.

On reflection, I think that avrdude is broken.

+1

[quote author=Coding Badly link=topic=84243.msg693443#msg693443 date=1329418061]

On reflection, I think that avrdude is broken.

+1

[/quote]

+2

Software shouldn't assume something about a device which it has no DIRECT knowledge about.

avrdude should ship the bits, as they are presented to it, then allow the boot loader, which does have DIRECT knowledge of the state of the hardware, to decide what to do with them.