I'm working on a project with Arduino Mega 2560 and Ethernet Shield W5100.
I'm posting sensor data to an online server every 10 seconds (when the project is finished, the interval will be longer, about 1 post per minute). I noticed that after a while (10+ minutes) the Arduino freezes when trying to connect to my server. So I created a timeout of 7.5 seconds to close my connection and enabled the watchdog with 8 seconds (I think there is no problem if I miss some data, because I intend to implement a queue of posts).
With the watchdog my system ran perfectly for ~12 consecutive hours and executed more than 3500 posts successfully.
But suddenly it froze, was restarted by the watchdog and froze again on the Ethernet.begin command (which, of course, runs before the watchdog is enabled). At least my Serial Monitor shows the "connecting..." message (printed before Ethernet.begin(mac)) and shows neither the "DHCP success" nor the "DHCP fails" message. And my server stopped receiving posts (yes, I was logging).
Now I'm not even able to reproduce this error (and I already tried). But I have to be sure that the Arduino will always be working (at least for some months, preferably years), because it's a solution to be installed in many different places with very difficult access.
What I can imagine:
Temperature problem. Maybe the W5100 stops working when it gets hot (actually even hotter, because in my opinion it already gets hot after 3 minutes of running). But the system was working for 12 consecutive hours... And I wasn't nearby when it froze and only noticed 1.5 hours later... so I can't be sure of it (but I can buy a heatsink). What do you think?
Somewhere on this forum, while searching about this problem, I read that after an Arduino reset the Ethernet Shield may not be initialized correctly and Ethernet.begin may not work. Is there a way to prevent this behavior? The only way that occurs to me is to use an LM555 to implement an external watchdog with a longer interval, but that's not ideal because the project is already delayed.
Can anyone think of another possibility, or does anyone have a good idea to avoid this Ethernet.begin freeze?
So I created a timeout of 7.5 seconds to close my connection and enabled the watchdog with 8 seconds
Do you think it is wise to ignore an unresolved problem that requires resetting of the processor every 8 seconds?
Now I'm not even able to reproduce this error (and I already tried).
How about you remove the watchdog reset and figure that out first? You have absolutely no idea why the Arduino fails to connect a second time within ten seconds and you're okay with that behavior?
Watchdog timers are usually employed in mission critical applications to put the processor into a known, safe state after a program failure. When a watchdog times out, it means your program crashed. Period. That's all you need to know. It becomes your task to figure out what you did wrong. They should not be used as a hardware solution to keep faulty software running 24/7.
But (for me at least) this happens when I run out of memory.
But if it is happening AFTER the brownout/reset, then I'm not sure (as there shouldn't be anything taking up the memory/space).
I really don't think it's running out of memory... The Arduino froze right after starting. There is nothing in memory yet... And I used to use Strings, but after reading a lot I changed every String to a char array and now I'm controlling memory much better.
About heat: I'll buy a heatsink and try. But I still think it's strange for it to freeze from heat after running for 12 h.
It most likely is an overheating 5volt regulator,
(assuming you're powering the Mega on the DC socket with 9volt or more).
Leo..
Good point!! I had not thought about power. I'm using just USB power (5 V and that's it). What is the best approach regarding power? I have a 9 V power supply, but I don't know if it is good... I'll do my homework on it.
avr_fred:
Do you think it is wise to ignore an unresolved problem that requires resetting of the processor every 8 seconds?
How about you remove the watchdog reset and figure that out first? You have absolutely no idea why the Arduino fails to connect a second time within ten seconds and you're okay with that behavior?
Watchdog timers are usually employed in mission critical applications to put the processor into a known, safe state after a program failure. When a watchdog times out, it means your program crashed. Period. That's all you need to know. It becomes your task to figure out what you did wrong. They should not be used as a hardware solution to keep faulty software running 24/7.
I'm not resetting the processor every 8 seconds. 8 seconds is the maximum configurable timeout of the Arduino's watchdog. This means that no blocking command in my software can last more than 8 seconds. And an internet post depends on the internet connection, the site's response time and many other things that could make it last 10, 20 or even 30 seconds. So I created a timeout of 7.5 s so that the watchdog's maximum of 8 s is never reached. And yes, I'll use the watchdog, because it will be very difficult for me to get to the Arduino once it is installed.
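A minimal sketch of the "software timeout shorter than the watchdog" idea described above. The names `hasTimedOut`, `kPostTimeoutMs` and `kWatchdogMs` are made up for illustration; on the Arduino the two timestamps would come from millis(), but here they are passed in as parameters so the logic can be checked off-target. Since the goal is months of uptime, the comparison uses unsigned subtraction, which stays correct when the 32-bit millisecond counter wraps around (after roughly 49.7 days).

```cpp
#include <stdint.h>

// Limits taken from the post: give up at 7.5 s, before the 8 s watchdog fires.
const uint32_t kPostTimeoutMs = 7500;
const uint32_t kWatchdogMs = 8000;
static_assert(kPostTimeoutMs < kWatchdogMs,
              "software timeout must be shorter than the watchdog period");

// True once at least timeoutMs have elapsed since startMs.
// The unsigned difference is computed modulo 2^32, so the result is still
// correct if the counter rolled over between startMs and nowMs.
bool hasTimedOut(uint32_t startMs, uint32_t nowMs, uint32_t timeoutMs) {
    return (uint32_t)(nowMs - startMs) >= timeoutMs;
}
```

A check like this only helps in code that polls in a loop (for example while waiting for a response). A library call that blocks internally cannot be interrupted this way, so the watchdog remains the last line of defense for that case.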
On the other hand, you're right when you say I have no idea why the Arduino fails to connect. I really don't know. But it's not on the second attempt. It fails to connect after many attempts (about a hundred or even more). And it doesn't just fail... it freezes completely on client.connect. I'll run a test: I'll let my Arduino run the Ethernet Shield example "WebClientRepeating" for a long time. If it doesn't crash, ok... my software is faulty and I should do more research on it. I'll let you know.
Update (for anyone who has a problem similar to mine).
I left the Arduino on overnight and logged almost everything.
The watchdog restarted it 7 times (actually I believe more than that) and there is a pattern to the restarts, which makes me think it really is a software bug (thanks, @avr_fred).
02:15 -> 02:43 (28 min)
02:43 -> 03:11 (28 min)
03:12 -> 03:40 (28 min)
03:41 -> 04:09 (28 min)
04:09 -> 04:38 (28 min)
04:38 -> 05:06 (28 min)
05:07 -> 11:20 ***
11:20 -> it's 11:45 now and it's still running. I'll verify
*** Between 05:07 and 11:20 Serial.print didn't log many things but my server did. And I don't really know what happened (maybe my computer slept?). There are many blank spaces until 11:20, when the Arduino restarted again and I was back at my chair. And the pattern is still there: 28 min later it restarted again.
Ok. So I'll ignore this blank space in the log and assume that my system always freezes after 28 min. Investigating, I realized that the freeze always occurs at exactly the same place, and it seems to be in a Serial.print command:
<<< REQ id_post=...&datahora=2017-0727T11:20:23Z&id_sensor=paissandu334_hidr_entr&valor=0&token=..
Connecting... connected to server
Connection closed
<<< R� Restarting*
...
Connection closed
<<< RE Restarting*
It looks to me like Serial.println("<<< REQ") is the problem. So I realized that I got rid of all the String variables but not ALL the strings: there are still string literals inside the Serial.print calls... oops... I'll try to correct this and see if the system runs safely for a longer period.
Question: is it enough to use F("my string") inside Serial.print to avoid memory problems with Strings?
Thanks again, everybody!
Wawa:
If you are powering the Mega with 5volt (PC?) on the USB socket, then the onboard 5volt regulator is not used.
Forget about power and heatsinks.
Leo..
Ok! But in the production environment I'll probably use a power supply (9 V?). Or should I use a USB power supply, like a cellphone charger? Maybe I should start worrying about it now. Thanks anyway.
I got rid of it and now my system has been working for 7 consecutive hours. \o/
In the instructions they suggest using free() to give memory back to the system. Maybe I'll give it a try, but I believe this is not a good idea.
It seems to me that it's a very good practice to use this freeRam function during development to verify everything is okay!
Thanks a lot, everybody. Especially @avr_fred, who pushed me to seek a real solution for my problem.
leomuniz:
Ok! But in the production environment I'll probably use a power supply (9 V?). Or should I use a USB power supply, like a cellphone charger? Maybe I should start worrying about it now. Thanks anyway.
A Mega with W5100 shield (no sensors etc.) uses about 250mA.
9volt on the DC socket results in 9volt - 0.7volt (reverse protection diode) - 5volt (Arduino supply) = 3.3volt across the regulator. 3.3volt * 0.25Amp = 0.825watt of heat in the regulator. Borderline for 24/7.
A 7.5volt (minimum) regulated supply would be better.
A cellphone charger on the USB socket would be best, because it bypasses the regulator.
Only the Ethernet chip (W5100) will get warm/hot (normal for this chip).
If you're not using a lot of I/O, consider changing to an ESP8266 based Arduino.
More power under the hood than a Mega, and built-in WiFi.
Leo..
leomuniz:
In the instructions they suggest using free() to give memory back to the system. Maybe I'll give it a try, but I believe this is not a good idea.
Not a good idea, really? What do you think it does? free() is not a magic garbage collector. You can't just call it with no parameters and expect it to defragment your heap. It is complementary to malloc().
aarg:
Not a good idea, really? What do you think it does? free() is not a magic garbage collector. You can't just call it with no parameters and expect it to defragment your heap. It is complementary to malloc().
I know what it is... But correct me if I'm wrong: over a very long session on the Arduino (my intention for this project) using malloc and free, memory could become fragmented, and this could cause an allocation to fail if there isn't enough contiguous free space, even if the total free memory is sufficient...
Another point: why use malloc if I know exactly the char array lengths that I need?
You must free both memory chunks when you are finished with them:
free( hash );
free( md5str );
correct me if I'm wrong... memory could become fragmented
Yes, but if you free all memory each time you are finished with it, the heap will not get fragmented.
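Both sides of this exchange can be illustrated with a toy first-fit allocator over a 64-byte array. This is only a model for the discussion, not how avr-libc's malloc is implemented; `ToyHeap`, `heapAlloc`, `heapFree` and the two scenario functions are invented names.

```cpp
#include <stddef.h>

const size_t kHeapSize = 64;

// Each cell holds 0 when free, or the id of the block that owns it.
struct ToyHeap {
    unsigned char owner[kHeapSize];
};

void heapInit(ToyHeap &h) {
    for (size_t i = 0; i < kHeapSize; ++i) h.owner[i] = 0;
}

// First-fit: returns the start index of the new block,
// or -1 if no contiguous run of `size` free cells exists.
int heapAlloc(ToyHeap &h, size_t size, unsigned char id) {
    if (size == 0) return -1;
    size_t run = 0;
    for (size_t i = 0; i < kHeapSize; ++i) {
        run = (h.owner[i] == 0) ? run + 1 : 0;
        if (run == size) {
            size_t start = i + 1 - size;
            for (size_t j = start; j <= i; ++j) h.owner[j] = id;
            return (int)start;
        }
    }
    return -1;
}

void heapFree(ToyHeap &h, unsigned char id) {
    for (size_t i = 0; i < kHeapSize; ++i)
        if (h.owner[i] == id) h.owner[i] = 0;
}

size_t heapFreeTotal(const ToyHeap &h) {
    size_t n = 0;
    for (size_t i = 0; i < kHeapSize; ++i)
        if (h.owner[i] == 0) ++n;
    return n;
}

// Interleaved lifetimes: four 16-byte blocks fill the heap, then blocks 1 and 3
// are freed. 32 bytes are free in total, but the largest hole is 16 bytes,
// so a 24-byte request fails.
bool fragmentedRequestFails() {
    ToyHeap h;
    heapInit(h);
    for (unsigned char id = 1; id <= 4; ++id) heapAlloc(h, 16, id);
    heapFree(h, 1);
    heapFree(h, 3);
    return heapFreeTotal(h) == 32 && heapAlloc(h, 24, 5) == -1;
}

// Everything is freed before the next request: the heap is one hole again
// and the same 24-byte request succeeds at index 0.
bool freeEverythingThenRequestSucceeds() {
    ToyHeap h;
    heapInit(h);
    for (unsigned char id = 1; id <= 4; ++id) heapAlloc(h, 16, id);
    for (unsigned char id = 1; id <= 4; ++id) heapFree(h, id);
    return heapAlloc(h, 24, 5) == 0;
}
```

The first scenario is the worry raised in the question (enough total memory, not enough contiguous memory); the second is the case described in the answer, where every block is released before the next round of allocations.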
Another point: why use malloc if I know exactly the char array lengths that I need?
You are exactly correct. The ArduinoMD5 library should have let you pass in the required memory. The make_hash functions just need 16 bytes, and the make_digest function needs 33 bytes. That would have also eliminated the malloc/free library, saving about 600 bytes of program space. Oh well...