Checking if code corrupted

Hello,

My machine behaves very much as if its flash code is being corrupted by some demon.

Is there a way to compare the flash content to that generated by the compiler?
or simply to write it to a file? (along with the boot loader and whatever other stuff)

In the latter case I'll create a file before and after the apparent corruption occurs and compare the two.

Thanks for sharing your expertise

Edit - recap

Yes, a broken program can corrupt the flash!

To compare the flash with your compiled code, use avrdude -Uflash:v:...etc; it reports the first discrepancy. Thanks to tf68 from avrfreaks.

To view all differences, use avrdude -Uflash:r:...etc to dump the flash in Intel hex, then use avr-objcopy -I ihex -O binary to convert both the IDE’s generated .cpp.hex and the flash hex dump to binary, and finally use vbindiff from cjmweb.net to compare and display all differences. Thanks to clawson from avrfreaks.

Does it work correctly after every reset, and then eventually go bad? Perhaps you are overwriting SRAM by accessing array elements that do not exist, screwing up some other part of the code's execution.

You can issue the same commands that Verify the programming that the IDE uses.
Under File:preferences, enable Verbose outputs, and review the avrdude commands used.

CrossRoads:
Does it work correctly after every reset, and then eventually go bad?

No. Once it goes mad, it keeps behaving abnormally even after power cycling or reset.

CrossRoads:
You can issue the same commands that Verify the programming that the IDE uses.
Under File:preferences, enable Verbose outputs, and review the avrdude commands used.

I'll try that and report back here

Tried to read back "blink"

The IDE loader command was

C:\Program Files (x86)\Arduino\hardware\tools\avr/bin/avrdude -CC:\Program Files (x86)\Arduino\hardware\tools\avr/etc/avrdude.conf -v -patmega2560 -cwiring -PCOM9 -b115200 -D -Uflash:w:C:\Users\Guy\AppData\Local\Temp\build7099495530470713399.tmp/Blink.cpp.hex:i

Mine was

C:\Users\Guy>"C:\Program Files (x86)\Arduino\hardware\tools\avr/bin/avrdude" -C"C:\Program Files (x86)\Arduino\hardware\tools\avr/etc/avrdude.conf" -v -patmega2560 -cwiring -PCOM9 -b115200 -D -Uflash:r:C:\Users\Guy\blink.hex:i

The read file was much bigger than the written one, presumably because it includes the unused flash space. Ignoring that, Frhed still reports that many of the initial bytes are different; see below the first 64 bytes of each file, shown as ASCII.

written
:1000000006C1000016C1000014C1000012C10000AA
:1000100010C100000E

read back
:2000000006C1000016C1000014C1000012C1000010C100000EC100000CC1000

What is it that I am doing wrong?

The file format above is Intel .hex and is not convenient for comparing as is.
It can be converted to contiguous program data bytes only (no load address) by avr-objcopy found (on my machine) in C:\Program Files (x86)\Arduino\hardware\tools\avr\bin, refer to this avrfreaks discussion.
I could then compare the blink files thus converted with vbindiff for Windows. The read-back file starts with the same data as the written blink and continues with data from previously loaded sketches.
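For anyone who prefers a script to avr-objcopy plus vbindiff, here is a minimal sketch of the same idea in Python: parse both Intel hex files into address-to-byte maps and list only the addresses where the compiled image disagrees with the read-back. The function names `parse_ihex` and `diff_images` are my own, not part of any tool mentioned in this thread, and I am assuming that only record types 00, 01, 02 and 04 matter here.

```python
def parse_ihex(text):
    """Return {absolute_address: byte} for the data records in Intel HEX text."""
    image = {}
    base = 0  # offset set by extended address records (needed above 64 KB)
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: record does not start with ':'")
        raw = bytes.fromhex(line[1:])
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError(f"line {lineno}: wrong record length")
        if sum(raw) & 0xFF:
            raise ValueError(f"line {lineno}: bad checksum")
        count, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:    # data record
            for i, value in enumerate(data):
                image[base + addr + i] = value
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:  # extended linear address
            base = int.from_bytes(data, "big") << 16
        # other record types carry no flash data and are skipped
    return image


def diff_images(expected, actual):
    """List (address, expected_byte, actual_byte) where `actual` disagrees.

    Only addresses present in `expected` are checked, so leftovers from
    previously loaded sketches in the read-back are ignored.
    """
    return [(addr, value, actual.get(addr))
            for addr, value in sorted(expected.items())
            if actual.get(addr) != value]
```

Usage would be along the lines of `diff_images(parse_ihex(open("Blink.cpp.hex").read()), parse_ihex(open("blink.hex").read()))`. Because the comparison is keyed by address, the different record lengths of the two files (16 bytes per line written, 32 bytes per line read back) do not matter.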

What Arduino are you using?

AFAIK the Arduino IDE verifies the code after uploading it.
Have you a second Arduino? Does it behave the same?

...R

Hi,
Is it a particular code that is showing this problem, or all codes?
If only one can you post it for us to check?

Thanks.. Tom.. :)

Robin2:
What Arduino are you using?

AFAIK the Arduino IDE verifies the code after uploading it.
Have you a second Arduino? Does it behave the same?

...R

Thanks Robin,

The card is a mega with enet shield

Yes, verify is on, and the card systematically loads and runs OK until it goes mad a few (2-9) hours later.

No, I did not try with another Arduino because it is not so easy to swap boards. However, version 29 of the program ran flawlessly for weeks, and version 30, which only differs by the removal of some debug code, never ran a full day.

TomGeorge:
Hi,
Is it a particular code that is showing this problem, or all codes?
If only one can you post it for us to check?

Thanks.. Tom.. :)

Thank you Tom!
It's just one code. I posted it in my original thread dealing with the stability issue, whereas this one deals with flash integrity assessment.

guy_c:
version 29 of the program ran flawlessly for weeks, and version 30, which only differs by the removal of some debug code, never ran a full day

Does this mean that if you reinstate Version 29 it will work properly? If not I think it is time to get a new Mega.

There was another Thread recently in which someone had a problem when they commented out a Serial.print() line and it was suggested that this meant that the compiler allocated the SRAM differently so that a "memory leak" elsewhere in the code was able to corrupt something important.

...R
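To make the layout argument above concrete, here is a toy model in Python (not AVR code). It treats SRAM as one flat byte array, lays variables out back to back, and writes a couple of bytes past the end of a buffer. Everything in it is hypothetical: the variable names `buf`, `debug_scratch` and `return_addr`, their sizes, and the two layouts standing in for version 29 and version 30. The real compiler layout is unknown to me.

```python
def clobbered_by_overrun(layout, overrun=2):
    """Simulate writing `overrun` bytes past the end of 'buf'.

    `layout` is an ordered list of (name, size) pairs placed back to back
    in a flat, zero-filled memory. Returns the names of the other
    variables whose bytes were changed by the overrun.
    """
    offsets = {}
    position = 0
    for name, size in layout:
        offsets[name] = (position, size)
        position += size
    sram = bytearray(position)

    start, size = offsets["buf"]
    for i in range(size + overrun):      # the bug: loop runs too far
        if start + i < len(sram):
            sram[start + i] = 0xAA

    return [name for name, (off, sz) in offsets.items()
            if name != "buf" and any(sram[off:off + sz])]


# Hypothetical layouts: the second one has the debug variable removed.
with_debug = [("buf", 8), ("debug_scratch", 4), ("return_addr", 2)]
without_debug = [("buf", 8), ("return_addr", 2)]

print(clobbered_by_overrun(with_debug))     # overrun lands in debug_scratch
print(clobbered_by_overrun(without_debug))  # same bug now hits return_addr
```

The bug is identical in both runs. Only the victim changes, which is one way removing debug code could turn a harmless overrun into a crash.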

@Robin -Thanks
Version 29 has run perfectly since Oct 16.
I am waiting to have some free time to reinstall version 30 and use the procedure described above to compare the flash with the compiled code once the program has crashed and no longer responds to reset (but only to a program reload).

This is the command to have avrdude verify the flash contents.

-Uflash:v:C:\Users\Guy\blink.hex:i
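If you want to script the read-back and the verify, a small Python helper can assemble the full argument list. This is only a sketch: the function name `avrdude_args` is made up, and every flag is copied from the commands already shown in this thread (ATmega2560, wiring programmer, 115200 baud), so adjust them for other boards.

```python
import subprocess


def avrdude_args(avrdude, conf, port, hex_path, op):
    """Build the avrdude argument list for a flash read ('r') or verify ('v').

    Flags mirror the commands quoted in this thread; nothing is run here.
    """
    if op not in ("r", "v"):
        raise ValueError("op must be 'r' (read) or 'v' (verify)")
    return [
        avrdude,
        "-C" + conf,
        "-v",
        "-patmega2560",
        "-cwiring",
        "-P" + port,
        "-b115200",
        "-D",
        f"-Uflash:{op}:{hex_path}:i",
    ]


def run_avrdude(args):
    """Run avrdude and return its exit code (requires the real tool and board)."""
    return subprocess.run(args).returncode
```

Passing the arguments as a list means paths containing spaces, such as C:\Program Files (x86), need no manual quoting.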

tf68:
This is the command to have avrdude verify the flash contents.

-Uflash:v:C:\Users\Guy\blink.hex:i

How stupid of me! Thank you

Installed version 30 on site, 2 hours later machine broke and did not respond to reset.
Ran avrdude to verify and it found 1st discrepancy at 0x81d4 09!=08.
Then converted to binary and compared. It appeared to be the sole error: the least significant bit of program location 0x81d4 has been set, either when the sketch broke or as the cause of the sketch breaking.
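To double-check that 0x09 and 0x08 really differ in a single bit, XOR the two bytes and list the bit positions that come out as 1. A tiny Python sketch (the function name `differing_bits` is just for illustration):

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where bytes a and b differ."""
    x = a ^ b
    return [bit for bit in range(8) if (x >> bit) & 1]


print(differing_bits(0x09, 0x08))  # [0]: only the least significant bit differs
```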

Repeated with another mega/enet card, waited for the code to break, and found exactly the same error.

Any ideas?

Thanks!

guy_c:
Any ideas?

You say you have a version 29 that works perfectly and a version 30 that causes progmem corruption.

You need to make a complete list of everything that is different between the two programs.

In theory a program cannot modify progmem - but that is only theory because the bootloader does it as a matter of routine. My wild guess is that something in your program is causing memory corruption which just happens to be interpreted as an instruction to write to progmem (similarly to how the bootloader does that).

If that is true then the solution is to stop your version 30 program causing memory corruption.

It might be interesting to add a few variables (or extend arrays) so as to cause a re-location of stuff within SRAM. It may be enough to prevent an inconvenient coincidence of corruption so that you get a clean crash - but, equally, it may have no effect.

...R

Thanks Robin,

I don't care about #30 as much as I do about its immediate ancestor, #29, which may very well have the same pathology in latent form. The two versions are roughly the same "distance" apart as the change you suggest and, as you wrote, "it may have no effect"...

I agree with your guess: memory corruption is probably causing a jump, return, or stack pop on a garbage argument, which makes the CPU execute, say, a badly aligned instruction that in turn writes to program memory.

I'll use kdiff3 to regress by bisection from 30 back to 29 and report here.

Thanks again
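The regression from 30 back to 29 mentioned above can be organised as a binary search over the list of differences between the two versions. Here is a rough Python sketch of the bookkeeping. It assumes the differences can be applied one after another in a fixed order, that version 29 (none applied) works, that version 30 (all applied) breaks, and that once a build breaks, adding more changes never fixes it. The names `first_bad_change` and `is_broken` are hypothetical; in practice `is_broken` means flashing the build and waiting several hours.

```python
def first_bad_change(changes, is_broken):
    """Find the first change that turns a working build into a broken one.

    changes   -- ordered list of differences from the good to the bad version
    is_broken -- callable taking the list of applied changes, returning True
                 if that build fails (assumed False for none, True for all)
    """
    if not changes:
        raise ValueError("need at least one change to bisect")
    good, bad = 0, len(changes)   # changes[:good] works, changes[:bad] fails
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(changes[:mid]):
            bad = mid
        else:
            good = mid
    return changes[bad - 1]
```

With n differences this needs about log2(n) test builds instead of n, which matters when each test takes 2 to 9 hours to fail.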