Broken Arduino Mega magically repaired after uploading bootloader.

All
I can hardly believe what happened today, so I hope someone can help me explain it.
I have an Arduino sketch which I have been running on an UNO for days without problems. As the final solution uses a MEGA 2560, I switched to the MEGA today.
The system consists of the MEGA, powered via the jack and connected through serial pins 0 and 1 to the serial port of a router (a hacked Linksys WRT54GL). No other connections are made.
All worked fine for several hours, but then the Mega stopped responding.
I connected the Mega to the USB port of my PC. The Mega was recognized but the serial communication was dead. I tried several times to upload a sketch without success.
I had previously burned a new bootloader to the Mega with the Arduino ISP and Arduino IDE 1.0.2. I tried to burn the bootloader again and it failed. I retried a couple of times but gave up.
At that point I thought the serial pins of the Mega were damaged and I considered the board broken =(.
I made a final attempt to upload the bootloader with avrdude and an AVR ISP, which worked on the first try with this command:

C:\Users\IBM_ADMIN>D:\arduino-1.0.2\hardware/tools/avr/bin/avrdude -CD:\arduino-1.0.2\hardware/tools/avr/etc/avrdude.conf -v -v -v -v -patmega2560 -cstk500v2 -P\\.\COM11 -e -Ulock:w:0x3F:m -Uefuse:w:0xFD:m -Uhfuse:w:0xD8:m -Ulfuse:w:0xFF:m -F

When I plugged the Mega into the USB port of my PC and opened a serial monitor, I saw serial messages flooding the monitor. So the Mega was magically repaired and my sketch was still running :astonished:.
So it seems as if the bootloader got corrupted. Given that, I can understand the behavior and why the sketch runs after the bootloader has been fixed.
If the bootloader was broken and the Mega went into a watchdog reset loop, I can even think of a reason why the Arduino as ISP didn't work. 8)

As I would not like this to happen again, I would like to know:
->How can a Mega that is only connected to the serial port get corrupted so badly that a bootloader burn is needed?
->What can I do to avoid this in the future?

Any ideas?

Best regards
Jantje

PS: I burned the bootloader available at Arduino-stk500v2-bootloader/goodHexFiles (master branch of arduino/Arduino-stk500v2-bootloader on GitHub) to the MEGA.
This bootloader fixes the "!!!" and watch dog timer issues.

Jantje:
As I would not like this to happen again, I would like to know:
->How can a Mega that is only connected to the serial port get corrupted so badly that a bootloader burn is needed?
->What can I do to avoid this in the future?

Any ideas?

Well, the only way I've ever seen my mega1280 get 'bricked' was when testing whether its bootloader was written to handle WDT resets correctly. A person posted a short sketch that enabled the WDT with a ridiculously short 25 msec timeout; when loaded on a board with the newer Uno bootloader, the bootloader handled it correctly by disabling the WDT. I loaded the same sketch onto my mega1280 board, and after it loaded, pin 13 went nuts flashing at a 25 msec rate and the IDE couldn't upload to the board. It was the classic stuck-in-the-bootloader/WDT loop that most older Arduino boards of all flavors were vulnerable to. Only reburning the bootloader, which also erases the rest of the flash, would recover the board. I'm not saying that is in any way the source of your problem, and if pin 13 wasn't flashing it probably wasn't, but beware: your mega 2560 board does have an old, vulnerable bootloader.

The serial bootloader is convenient for development from the IDE, but once a design is finished, debugged, tested and ready to be put into some kind of permanent service, there is a lot to be said for uploading the sketch via the "Upload Using Programmer" option and taking the whole bootloader out of the application.

Lefty

Lefty
Thanks for the input. LED 13 was not flashing. But even though I burned a new bootloader to fix the WDT and "!!!" problems, that does not rule out this possibility. The WDT was the reason why I started off with the UNO (actually I upgraded my Duemilanoves with the UNO bootloader because of the WDT problem).
I tested the Mega and it all worked fine. So something must have triggered whatever broke the bootloader.
I hadn't considered the bootloader getting into a loop, but that is indeed a very viable possibility.
I only use the WDT as a safety net in case the "put reset pin HIGH" approach doesn't work. In that case I use 2 seconds, as you can see in the code below.

#include <avr/wdt.h>  // wdt_enable() and WDTO_2S

// Perform a hard reset via the reset pin and, if that fails, use the watchdog.
void MessageHandler::forceHardReset()
{
  Serial.println(F("triggering restart"));
  Serial.flush();  // wait for the message to go out; Serial must stay open for the prints below
  // Overflow-safe wait: compare elapsed time rather than an absolute deadline.
  unsigned long waitStart = millis();
  while (millis() - waitStart < (unsigned long) myForceRestartDelay);
  Serial.println(F("Setting Pin High"));
  pinMode(A5, OUTPUT);
  digitalWrite(A5, HIGH);
  Serial.println(F("Pin is High"));
  delay(5000);  // this should be long enough to reset the Arduino
  Serial.println(F("this should never happen; have you connected A5 to reset?"));
  delay(1000);  // make sure the serial message gets sent
  // use the watchdog to try to reset
  noInterrupts();  // disable interrupts while the watchdog is being enabled
  wdt_enable(WDTO_2S);
  interrupts();
  delay(5000);  // longer than the 2 s watchdog timeout
  Serial.println(F("this should never happen; even the watchdog doesn't work?"));
}

The serial bootloader is convenient for development from the IDE, but once a design is finished, debugged, tested and ready to be put into some kind of permanent service, there is a lot to be said for uploading the sketch via the "Upload Using Programmer" option and taking the whole bootloader out of the application.

I fully agree, but I'm actually designing this solution to be able to continue the development of my mobile robot. Because the robot is in the garden (and it is very likely to be raining), changing settings or the sketch means I have to go and get the robot, bring it to a "safe spot", open it up and connect my laptop, then close it again and bring the robot back to the testing area.
The setup I'm working on allows me to do all these things remotely with a wifi access point that covers the whole area and a wifi router on the robot. The fact the bootloader allows me to load a new sketch with only a serial port is a great feature for me.
Best regards
Jantje

Jantje:
I made a final attempt to upload the bootloader with avrdude and an AVR ISP, which worked on the first try with this command:

C:\Users\IBM_ADMIN>D:\arduino-1.0.2\hardware/tools/avr/bin/avrdude -CD:\arduino-1.0.2\hardware/tools/avr/etc/avrdude.conf -v -v -v -v -patmega2560 -cstk500v2 -P\\.\COM11 -e -Ulock:w:0x3F:m -Uefuse:w:0xFD:m -Uhfuse:w:0xD8:m -Ulfuse:w:0xFF:m -F

That avrdude command does NOT upload any bootloader, it merely resets fuses. So, if that repaired it, your problem was not a corrupted bootloader, it was corrupted fuse values.

Pico
Thanks for pointing this out :D.
I made this command based on the first command the Arduino IDE (1.0.2) launches when trying to burn the bootloader. The IDE uses -cstk500v1, which fails. After some reading I decided to try -cstk500v2, which worked but gave an error and proposed adding -F. With the -F it worked.

So I thought this uploaded the bootloader, but I had wondered how avrdude found the hex file to upload. Thanks for solving this riddle :smiley:

I don't know much about fuse bits and surely not about getting and setting them. As I wouldn't know how to set these fuse bits, I'm tempted to think it must have been the bootloader that set them (wrongly). But when I go through the code I only find boot_lock_fuse_bits_get; there seems to be no set function :astonished:

More input is needed to get to the bottom of this.

Best regards
Jantje

Jantje:
I don't know much about fuse bits and surely not about getting and setting them. As I wouldn't know how to set these fuse bits, I'm tempted to think it must have been the bootloader that set them (wrongly). But when I go through the code I only find boot_lock_fuse_bits_get; there seems to be no set function :astonished:

When you select "Burn Bootloader" from the Tools menu, it actually performs two separate things: 1) it uploads a bootloader program (unless it is an attiny target), and 2) it sets the fuses to work with the bootloader.

These are two different avrdude commands. You have copied the "set fuses" avrdude command above.

It is possible that your programmer did not set the fuses correctly for whatever reason, but the new bootloader was OK. Or it is also possible that your programmer failed to load a new bootloader (so you still have the original one running), and it also screwed up the fuses.

Not all programmers can handle some of the trickiness of programming the 2560 chips, apparently. The problem is the extended memory space (16 bits defines a 64K memory space, so anything with more than 64K flash memory has to be "paged" in some fashion, and this scheme is where some programmers fall down, from various reports I've read on this forum).

So there are two plausible scenarios. I'd start by researching the compatibility of your programmer with the 2560 chips.

pico:
That avrdude command does NOT upload any bootloader, it merely resets fuses. So, if that repaired it, your problem was not a corrupted bootloader, it was corrupted fuse values.

It's more mysterious than that. The avrdude command includes "-e", which means "perform chip erase."
Since your sketch was "still running" after this, it's obvious that the chip was not in fact erased!

You said the programmer was an "AVR ISP"? A real one with a serial port? Was COM11 the serial port of the AVR ISP, or the serial port of the Arduino? It looks (given that -e didn't do anything, and that you had to use "stk500v2" as the programmer type; avrisp is supposed to be an stk500v1 programmer) like you ended up talking to the MEGA bootloader instead of the AVRISP. It (like most/all bootloaders) ignores fuse settings and chip-erase, and reports success. However, this explains even less WHY what you did fixed your problem!

(Also, I'm not sure why it's uploadable at all, given that you have the serial port connected to both the usb boot port AND your external serial device on D0/D1. If your device was "spewing data", that would surely interfere with bootloading. It's somewhat designed for paralleling two devices, but the external device has to be not driving the MEGA's RXD line during boot (ie "turned off."))

all
I have this AVR ISP: AVR-ISP500
COM11 is from the AVR ISP. The mega was not plugged in when I ran the command.

westfw:
(Also, I'm not sure why it's uploadable at all, given that you have the serial port connected to both the usb boot port AND your external serial device on D0/D1. If your device was "spewing data", that would surely interfere with bootloading. It's somewhat designed for paralleling two devices, but the external device has to be not driving the MEGA's RXD line during boot (ie "turned off."))

At the time of running the command the Mega was no longer connected to the router. I was informed that it is dangerous to have the USB connected and the serial port connected at the same time. One reason is that the router is 3.3 volt and the USB is 5 volt. As Megas are expensive, I never did so.
To repeat the story, when I had the problem:
Swapped the Mega for an UNO => all worked well.
Swapped the UNO for the Mega => no more messages.
Conclusion: the Mega is the problem => disconnected the Mega from the router to find out what is wrong.
Plugged the Mega into USB => port found but no messages on the serial monitor.
Tried to upload a sketch => failed.
Tried to burn the bootloader with Arduino ISP => failed.
Tried to burn the bootloader using the AVR ISP => failed.
Tried the command with stk500v2 instead of stk500v1 => got the advice to use -F.
Tried with -F => all works fine again.
(Somewhere in this process I remembered the -e I had been advised to use and added that too.)

westfw:
It looks (given that -e didn't do anything, and that you had to use "stk500v2" as the programmer type; avrisp is supposed to be an stk500v1 programmer) like you ended up talking to the MEGA bootloader instead of the AVRISP. It (like most/all bootloaders) ignores fuse settings and chip-erase, and reports success. However, this explains even less WHY what you did fixed your problem!

My AVR ISP document states

Fully STK500v2 compatible;

I understand this as meaning STK500v2 should be the protocol for my AVR ISP. But hey, I really don't know what I'm talking about here 8).

pico:
So there are two plausible scenarios. I'd start by researching the compatibility of your programmer with the 2560 chips.

The documentation of my AVR ISP states the following on compatibility

SUPPORTED MICROCONTROLLERS:
The following AVR microcontrollers are supported for programming:

  • Classic 8-bit AVRs
  • megaAVR
  • tinyAVR
  • USB AVR

Someone uploaded a sketch to my Mega with a broken USB port using this AVR ISP, so I guess it is compatible, but I can't confirm that with the above data.
I'm sure I have the new bootloader (uploaded with Arduino ISP) because I tested for the WDT to work. With the original bootloader it didn't work. The red LED started blinking rapidly, like Lefty described.
Note that I upgraded the AVR ISP recently, so it is running the latest and greatest software.

In the meantime I have connected the Mega back to the router and it has been working OK for a longer period. I still don't feel comfortable though.

Best regards
Jantje

Oh yes :smiley:
I'm so happy I'm not the only one puzzled here :smiley:
Best regards
Jantje

I never found the root cause, but to this day it hasn't happened again.
Best regards
Jantje

In addition to fuses there are "lock bytes". As I recall, in some settings the bootloader is not protected from being overwritten.

Possibly your original sketch modified the bootloader. It's too late now but it would have been interesting to see the bootloader contents, the fuse bytes, and the lock bytes, while it was in the failed state.

My sketch here does all that:

Another interesting test in this situation is to try to upload without letting the existing sketch run (for the WDT problem). To do this you have to power the device off, hold reset down, and re-apply power (e.g. by plugging the USB cable back in). Holding reset down until the sketch starts to upload prevents the originally loaded sketch from running and works around the WDT issue (as it never gets a chance to start the WDT up).

Nick
Your sketch does look interesting. I did test with holding the reset button, and I have some experience with it, so if that could have worked I should have succeeded.
Best regards
Jantje

Jantje:
The system consists of the MEGA, powered via the jack and connected through serial pins 0 and 1 to the serial port of a router (a hacked Linksys WRT54GL).

Not saying this has anything to do with the issue you had, Jantje - but when you say "connected to the serial port of a hacked Linksys WRT54GL" - you do mean you're doing level shifting, right (such as detailed here: WRT54GL Dual Serial Port and SD Card Mods - JBProjects.net)?

cr0sh:

Jantje:
The system consists of the MEGA, powered via the jack and connected through serial pins 0 and 1 to the serial port of a router (a hacked Linksys WRT54GL).

Not saying this has anything to do with the issue you had, Jantje - but when you say "connected to the serial port of a hacked Linksys WRT54GL" - you do mean you're doing level shifting, right (such as detailed here: WRT54GL Dual Serial Port and SD Card Mods - JBProjects.net)?

After long thinking and reading about level shifting, I decided not to do level shifting.
My reasoning: the router works at 3.3 V and the Arduino at 5 V. The Arduino reads 3.3 V as HIGH, so signals from the router to the Arduino are OK.
I found someone stating that he connected the pins directly without problems, and someone else stating that these are only signal wires so it would be OK (it was put in technical language I'm unable to reproduce). So I tried, and indeed it works fine.
As it has been running with the UNO for a long time and now with the Mega, I'm convinced this setup is OK.

I have been thinking, though, that at the time I had the problem I tried to upload a sketch while the Arduino was sending lots of serial data. I reset the Arduino based on a serial command, but the router has delays when sending loads of data, so the serial input buffer on the router may have contained data when avrdude started. Maybe avrdude got confused?
I changed the upload code to turn off the serial send function and wait 2 seconds before starting the original upload code.
I haven't seen the problems since then but I have had very little time to test as I have been busy with plenty of other issues lately.
Best regards
Jantje