Troubleshooting Arduino memory / stability issues (crashes/resets/freezes)

Hi,

We have an Arduino running a sketch continuously for long periods (weeks to months).

Occasionally the sketch "crashes" and becomes unresponsive indefinitely (all processing simply stops).

We suspect the Arduino is running low on memory, eventually needs more than it has available, and crashes.

We include the amount of free memory in our logs, and noticed that just before this "crash", "freeze", or "hang", free memory usually drops into the 300 to 500 byte range. That is why we suspect low memory might be causing these crashes.

The big problem is that instead of resetting and restarting, the sketch simply hangs, and the only way to recover is to unplug the Arduino and plug it back in (we cannot easily reach the reset button).

A couple of related questions:

  • When something goes wrong, we have noticed that the Arduino either "crashes / resets / restarts" by itself, or simply "crashes and hangs". We would like some insight into what causes each of these. Under what exact conditions does the Arduino "reset", and when does it just "hang" indefinitely?
  • Is there anything we can configure at the Arduino/Atmel level to ensure that we can always recover from a situation like this (for example, resetting the system and relaunching the sketch if it freezes completely)?
  • Are there easy ways to simulate such a "crash" or "hang"? The issue doesn't always happen, and when it does, it is usually after more than 5 days of continuous use, which makes it difficult to reproduce.

When something goes wrong, we have noticed that the Arduino either "crashes / resets / restarts" by itself, or simply "crashes and hangs". We would like some insight into what causes each of these. Under what exact conditions does the Arduino "reset", and when does it just "hang" indefinitely?

If low memory is your issue, it's hard to predict what will happen - reset would be a lucky break.

Is there anything we can configure at the Arduino/Atmel level to ensure that we can always recover from a situation like this (for example, resetting the system and relaunching the sketch if it freezes completely)?

Take a look at the watchdog - it'll reset the system as you desire.
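
To illustrate what the watchdog does, here is a minimal host-side model (a sketch, not AVR code). The hardware counter runs independently of the sketch, and unless the sketch "kicks" it before the timeout expires, it forces a reset. On an AVR the real calls come from <avr/wdt.h>: wdt_enable(WDTO_2S) once in setup() and wdt_reset() from loop(). The WatchdogModel class and its method names are hypothetical and only mimic that behaviour so it can be tried off-target.

```cpp
#include <stdint.h>

// Hypothetical host-side model of a hardware watchdog timer.
// On a real AVR: wdt_enable(WDTO_2S) in setup(), wdt_reset() in loop(),
// and the hardware resets the MCU by itself when the timeout expires.
class WatchdogModel {
public:
  explicit WatchdogModel(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

  // Equivalent of wdt_reset(): the sketch proves it is still alive.
  void kick() { sinceKickMs_ = 0; }

  // Advance simulated time. Returns true if the watchdog would have
  // reset the chip because nobody kicked it within the timeout.
  bool advance(uint32_t elapsedMs) {
    sinceKickMs_ += elapsedMs;
    if (sinceKickMs_ >= timeoutMs_) {
      ++resets_;
      sinceKickMs_ = 0;  // after a reset the sketch starts over
      return true;
    }
    return false;
  }

  uint32_t resets() const { return resets_; }

private:
  uint32_t timeoutMs_;
  uint32_t sinceKickMs_ = 0;
  uint32_t resets_ = 0;
};
```

The main design choice is where to kick: do it only from the main loop, after the work that might hang, not from an interrupt or timer callback, otherwise the watchdog keeps being fed while the main code is stuck. The timeout also has to be longer than the longest legitimate blocking operation in the sketch.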

Are there easy ways to simulate such a "crash" or "hang"? The issue doesn't always happen, and when it does, it is usually after more than 5 days of continuous use, which makes it difficult to reproduce.

If your system does things periodically, you may be able to make it happen faster by reducing the period, but actually simulating the crash is tricky. Bear in mind that any change you make to track down the root cause may change the symptoms.
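
One way to act on the "reduce the period" idea is to keep every interval in one place and divide it by a compile-time factor for a stress-test build. A minimal sketch, assuming the sketch schedules its work from millis(); STRESS_FACTOR, PeriodicTask and the 10-minute upload interval are hypothetical. The elapsed-time check uses unsigned subtraction, which stays correct when the 32-bit millisecond counter wraps around (after roughly 49.7 days, which matters for a sketch that runs for months).

```cpp
#include <stdint.h>

// Hypothetical stress-test switch: 1 for the normal build, e.g. 60 to run
// every periodic job 60x more often and reach a rare failure sooner.
#ifndef STRESS_FACTOR
#define STRESS_FACTOR 1
#endif

// Hypothetical normal interval for a periodic SIM900 upload (10 minutes).
const uint32_t UPLOAD_INTERVAL_MS = 600000UL / STRESS_FACTOR;

struct PeriodicTask {
  uint32_t intervalMs;
  uint32_t lastRunMs;

  // Returns true (and re-arms) once the interval has elapsed.
  // On the Arduino, nowMs would be millis(). Unsigned subtraction gives
  // the right elapsed time even after the counter wraps past 2^32 - 1.
  bool due(uint32_t nowMs) {
    if (nowMs - lastRunMs >= intervalMs) {
      lastRunMs = nowMs;
      return true;
    }
    return false;
  }
};
```

In loop() this would be used as `if (upload.due(millis())) { /* send data */ }` with `PeriodicTask upload{UPLOAD_INTERVAL_MS, 0};` declared globally. If the hang depends on the number of cycles rather than on elapsed time, building with a larger STRESS_FACTOR should make it show up proportionally sooner.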

If low memory is the issue, what's causing it? Are you doing dynamic memory allocation or using String objects? Either of those would be where I'd look first.
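
If String objects are involved, a common alternative is to collect serial input (for example SIM900 replies) into a fixed-size char buffer, so nothing is allocated on the heap at runtime and an over-long line is truncated instead of growing memory use. A minimal sketch; the LineBuffer name and the buffer size are hypothetical.

```cpp
#include <stddef.h>

// Hypothetical fixed-size line collector for serial replies such as "OK\r\n".
// Storage is a plain array member: no heap allocation, no growth.
template <size_t N>
class LineBuffer {
  static_assert(N >= 2, "N must be at least 2");

public:
  // Feed one received byte. Returns true when '\n' completes a line;
  // text() is then valid until the next call to push().
  bool push(char c) {
    if (done_) {                 // previous line was delivered: start fresh
      len_ = 0;
      truncated_ = false;
      done_ = false;
    }
    if (c == '\r') return false; // ignore the '\r' of "\r\n"
    if (c == '\n') {
      buf_[len_] = '\0';
      done_ = true;
      return true;
    }
    if (len_ < N - 1) {
      buf_[len_++] = c;          // always keep one byte free for '\0'
    } else {
      truncated_ = true;         // drop the excess byte
    }
    return false;
  }

  const char* text() const { return buf_; }
  bool truncated() const { return truncated_; }

private:
  char buf_[N] = {};
  size_t len_ = 0;
  bool truncated_ = false;
  bool done_ = false;
};
```

On a Mega with the SIM900 on Serial1 (an assumption), you would call `reply.push((char)Serial1.read())` while `Serial1.available()` is non-zero, and handle `reply.text()` whenever push() returns true.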

wildbill:
If low memory is your issue, it's hard to predict what will happen - reset would be a lucky break.

I'm not 100% sure it's memory; it was just a guess because I saw free memory dropping. But I just examined a log file from a hang where more than 50% of memory was still free, so I think we can rule out memory.

Take a look at the watchdog - it'll reset the system as you desire.

We're looking into it now, to reset the chip when it has hung for a couple of seconds. We still need to look at all the potential side effects of this.

I also saw a forum post here suggesting there might be an issue with the ATmega2560 and the watchdog:
http://forum.arduino.cc/index.php/topic,94676.0.html

Are there any known issues with the watchdog on the ATmega2560 (the chip we are using)?

If low memory is the issue, what's causing it? Are you doing dynamic memory allocation or using String objects? Either of those would be where I'd look first.

We've optimized memory usage by moving strings into flash (program) memory instead of SRAM.
We keep dynamic memory allocation to a minimum. We are using a SIM900 module for communication, and so far the crash has always happened while we interact with the SIM900 (either while sending data or while initializing the SIM900).
But again, I'm not 100% sure it's related to the SIM900.

We can now easily reproduce a "hang" by having the SIM900 download a 70 KB file from our servers.
After processing a random number of bytes (we run a CRC32 check on the incoming data), it simply hangs.
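
For reference, a CRC-32 that is updated one byte at a time as data arrives could look like the following. It is a minimal sketch assuming the common reflected CRC-32 (polynomial 0xEDB88320, as used by zlib and Ethernet); the CRC code actually used in this project isn't shown. The bitwise form avoids a 256-entry lookup table, which would take 1 KB of the ATmega2560's 8 KB of SRAM.

```cpp
#include <stdint.h>

// Reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR
// 0xFFFFFFFF): the variant used by zlib, PNG and Ethernet.
const uint32_t CRC32_INIT = 0xFFFFFFFFUL;

// Fold one received byte into the running CRC. Bit-by-bit, so it needs
// no lookup table in SRAM; slower than a table, but only 8 shifts per byte.
uint32_t crc32_update(uint32_t crc, uint8_t byte) {
  crc ^= byte;
  for (uint8_t bit = 0; bit < 8; ++bit) {
    if (crc & 1UL) {
      crc = (crc >> 1) ^ 0xEDB88320UL;
    } else {
      crc >>= 1;
    }
  }
  return crc;
}

// Apply the final XOR once the last byte has been processed.
uint32_t crc32_finish(uint32_t crc) { return crc ^ 0xFFFFFFFFUL; }
```

Running it over the ASCII bytes "123456789" gives the standard check value 0xCBF43926, a quick way to confirm that the server and the Arduino agree on the CRC variant.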

Hi, a question: what are you using as the power supply for this project, and how is it all connected? You say:

We are using a SIM900 module for communication and so far it always crashes when we interact with the SIM900 (either while sending data or during the initializing of the SIM900).
But again, not 100% sure it's related to the SIM900.

These devices probably draw quite a bit of power when transmitting (TX) and when you turn them on.
Have you checked the current drawn when the SIM900 is in full swing? Any drop in the supply could quite easily cause your problem.
I hope it and any other peripheral devices are powered independently of the Arduino.
Tom

TomGeorge:
Have you checked the current drawn when the SIM900 is in full swing? Any drop in supply could quite easily cause your problem.

It's running off a 12 V external power supply that can deliver plenty of current.
I've hooked up a multimeter, and as far as I can tell from it, the SIM900 draws bursts of about 600 mA.
The whole setup idles at around 200 mA; when the SIM900 is active it goes up to 300 to 400 mA, with the occasional spike to 600 mA.

Do you have any unused digital outputs you could put an LED on? I have found that an effective debugging technique is to toggle outputs at strategic places in the code to isolate where the hang happens. If the hang cannot be isolated to a particular bit of code, that can instead hint at a hardware problem. If execution is periodic, you can even watch the outputs on an oscilloscope (a poor man's logic analyzer) as the program runs. Toggling a digital output involves less overhead than most other output techniques, so it should disturb normal program timing less.
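
A variation on the same idea: instead of a single toggle, write a small checkpoint number in binary onto a few spare pins, so a row of LEDs or a scope shows which section of the sketch was last reached before the hang. A minimal sketch; the pin numbers and markCheckpoint() are hypothetical, and the `#ifndef ARDUINO` block only stubs out digitalWrite() so the logic can be tried on a PC. In the real sketch, call pinMode(pin, OUTPUT) for each checkpoint pin in setup().

```cpp
#include <stdint.h>

#ifndef ARDUINO
// Host-only stand-ins so this compiles off-target; the Arduino core
// provides the real HIGH, LOW and digitalWrite().
#define HIGH 1
#define LOW 0
static uint8_t pinState[70];  // last level written to each pin (Mega: 0-69)
void digitalWrite(uint8_t pin, uint8_t level) { pinState[pin] = level; }
#endif

// Hypothetical spare pins, least significant bit first.
const uint8_t CHECKPOINT_PINS[] = {22, 23, 24};
const uint8_t NUM_CHECKPOINT_PINS = sizeof(CHECKPOINT_PINS);

// Show 'id' (0-7 with three pins) in binary on the checkpoint pins.
// Call it at strategic places; after a hang the pins hold the last id reached.
void markCheckpoint(uint8_t id) {
  for (uint8_t i = 0; i < NUM_CHECKPOINT_PINS; ++i) {
    digitalWrite(CHECKPOINT_PINS[i], ((id >> i) & 1) ? HIGH : LOW);
  }
}
```

For example, markCheckpoint(1) before sending an AT command and markCheckpoint(2) after the reply arrives. Like a single toggle, this is cheap compared with printing to Serial; note that the pins change one after another, so on a scope the code briefly passes through intermediate values.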

HTH,
hank