Ethernet lockup when running standalone

I have a sketch (attached to the post because it was too long to place inline with the message, sorry) that runs on an Arduino UNO R3 connected to my home network via a WIZ811MJ interface, which is based on the WIZnet W5100 chip.

In brief: the communications seem to stop unless the USB cable is connected to the computer and the serial monitor is running.

The sketch takes a reading from 3 sensors every 4 minutes and stores the results in a local buffer called "P". Each time a sample is taken, the sketch uses UDP to fetch a timestamp for the data from an NTP server. After 15 samples have been taken (which happens once an hour), the sketch connects to a server and uploads the data from "P" using a PHP script, and the process repeats for the next hour.
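For readers following the timestamp handling, here is a minimal sketch of the NTP-to-Unix conversion that the serial output further down reflects ("Seconds since Jan 1 1900" versus "Unix time"). The names `ntpToUnix`, `utcTimeOfDay` and `UtcTime` are made up for illustration and are not taken from the attached sketch.

```cpp
#include <stdint.h>

// Seconds between the NTP epoch (1 Jan 1900) and the Unix epoch (1 Jan 1970).
const uint32_t SEVENTY_YEARS = 2208988800UL;

// Convert the 32-bit seconds field of an NTP timestamp to Unix time.
// (Hypothetical helper name, for illustration only.)
uint32_t ntpToUnix(uint32_t secsSince1900) {
  return secsSince1900 - SEVENTY_YEARS;
}

// UTC hour, minute and second of the day.
struct UtcTime {
  uint8_t hour;
  uint8_t minute;
  uint8_t second;
};

// Split a Unix time into its UTC time of day.
UtcTime utcTimeOfDay(uint32_t unixTime) {
  uint32_t secsToday = unixTime % 86400UL;  // 86400 seconds per day
  UtcTime t;
  t.hour = secsToday / 3600UL;
  t.minute = (secsToday % 3600UL) / 60UL;
  t.second = secsToday % 60UL;
  return t;
}
```

With the first value from the log, `ntpToUnix(3624706865UL)` gives 1415718065, and `utcTimeOfDay(1415718065UL)` gives 15:01:05, matching the printed lines.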

The sketch also puts the board to sleep in SLEEP_MODE_PWR_DOWN while it waits to take samples. The watchdog timer is configured in the sketch to wake up the board so it can run loop() and check whether it is time to take a sample. If a sample is due, the watchdog timer is disabled so that it cannot interrupt in the middle of taking samples and/or sending data to the server. The timer is enabled again after the I/O ends so the system can save power between samples.

I have a lot of debug print statements since I am still developing this sketch. I started testing my sketch "in the field" by unplugging the USB cable and letting the board run from a wall power supply, similar to how I expect it to eventually work. I noticed then that it would work for a few hours and then stop updating the server. You can see that in one of the attached pictures: it shows a clear gap before new data was sent to the server again, after I did a hardware reset of the board by pressing its reset button.

Last night I looked again through the code and added some more debug functionality but could not find a clear issue. I decided to leave the board connected to the laptop running the serial monitor all night to see if I could capture the point at which it will lock up. Except it never did. It has been running for more than 12 hours now without locking up whereas before it would be about 2.5 to 3 hours or so before it stopped being able to send data to the server and it seemed only a hardware reset would allow it to start sending data again.

I have status LEDs that indicate which stage the sketch is in. When data stops showing up at the server I use these to confirm that the board is waking up, taking samples and running the sketch on schedule, which it continues to do. Also, when the connection is working the 811MJ replies to ping requests properly even when the UNO is sleeping (as expected). But pings to the 811MJ fail once communications are lost. When this happens I can still ping a device that is connected to the same switch as the 811MJ, so I think this points to a hardware or software issue on the 811MJ.

I will add that the WIZ811MJ is connected with an ethernet cable to a switch which is on a wireless access point/extender which itself connects to the home wireless network. I thought maybe this was part of the problem but after last night's run and the results of the ping test I don't think it is.

Ultimately since I am not taking high resolution data I could work around this by resetting the software ethernet interface in the sketch and forcing it to acquire a new IP from the DHCP server. But I'd feel better if the problem was actually solved.

Thanks!

A snippet from the serial monitor showing a successful sequence of data gathering will look like this:

Taking sample
Seconds since Jan 1 1900 = 3624706865
Unix time = 1415718065
The UTC time is 15:01:05
Sample: 14, Time: 1415718065, Capacitance: 0.000000000105852, Temperature: 28.231, Light: 69.05%
Taking sample
Seconds since Jan 1 1900 = 3624707145
Unix time = 1415718345
The UTC time is 15:05:45
Sample: 15, Time: 1415718345, Capacitance: 0.000000000106777, Temperature: 28.032, Light: 68.23%
------------SENDING DATA-------------
Connecting to upload server
connected
time=1415714152&temp=28.082&&light=70.577&&capacitance=0.000000000109826&&unit=0

Disconnecting from remote server.
Connecting to upload server
connected
time=1415714431&temp=28.231&&light=71.427&&capacitance=0.000000000106551&&unit=0

Disconnecting from remote server.
------------DONE-------------

While an unsuccessful sequence will print out like this:

Taking sample
Failed to parse UDP packet
Sample: 14, Time: 0, Capacitance: 0.000000000105852, Temperature: 28.231, Light: 69.05%
Taking sample
Failed to parse UDP packet
Sample: 15, Time: 0, Capacitance: 0.000000000106777, Temperature: 28.032, Light: 68.23%
------------SENDING DATA-------------
Connecting to upload server
The connection failed
Disconnecting from remote server.
Connecting to upload server
The connection failed
Disconnecting from remote server.
------------DONE-------------

CapacitanceMeterEthernet.ino (23.4 KB)

How are you powering the board when disconnected from the USB cable? 5100 chips are power hungry.

I am using a 12 volt, 1 ampere wall wart.

Do you have a 9 volt wall wart? Maybe your regulator is getting too hot?

I have cheated before and used a 1A phone charger with the USB cable connected to it for a 5 volt supply.

Just a thought...

edit: Like this charger

@SurferTim: I will check the regulator and try your suggestion. I do have several 1 Ampere cell phone chargers lying about.

Yesterday I had the system lock up on me even while the serial monitor was connected, so there goes that theory. However, I identified a variable in the code that I am almost certain was causing heap fragmentation, which, if so, would likely explain the delayed failure. This is the lead I am currently following. The details are below, but apart from a lock-up this morning that was not caused by the Arduino or the W5100, the system seems to be running well stand-alone so far.

I added even more debugging output and ran the system all night after making other modifications to the code.

I was unaware of the existence of the Ethernet.maintain() method. If the sketch fails on attempting to decode a UDP packet or on client.connect() then Ethernet.maintain() executes and the connection is retried. After a few retries the existing data is saved to EEPROM and the system reboots via a watchdog reset to avoid a system lockup. This should mostly help with events such as the network cable being unplugged and then plugged back in. In fact this morning when I looked at the serial monitor the system had indeed entered this condition and was in a loop attempting to connect to the internet. However, I tracked this failure to my FiOS router having had a total loss of internet connectivity, so not an issue with the W5100. But happy to see my reset logic worked and recovered the stored samples in EEPROM. As soon as I rebooted the router and the internet became accessible, the Arduino/W5100 recovered and functionality was restored.
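A rough sketch of the retry-then-reset decision described above, written as plain C++ so the logic can be checked off the board. The names (`RecoveryState`, `onNetworkFailure`, `onNetworkSuccess`) and the retry limit of 3 are assumptions for illustration; the attached sketch may structure this differently.

```cpp
#include <stdint.h>

// What the caller should do after a failed NTP parse or client.connect().
enum RecoveryAction {
  RETRY_AFTER_MAINTAIN,  // call Ethernet.maintain(), then try again
  SAVE_AND_RESET         // store samples in EEPROM, then force a watchdog reset
};

// Hypothetical limit; the post only says "a few retries".
const uint8_t MAX_RETRIES = 3;

// Counts consecutive network failures.
struct RecoveryState {
  uint8_t failures;
};

// Record one failure and decide whether to retry or give up and reset.
RecoveryAction onNetworkFailure(RecoveryState &state) {
  state.failures++;
  if (state.failures > MAX_RETRIES) {
    return SAVE_AND_RESET;
  }
  return RETRY_AFTER_MAINTAIN;
}

// Any successful exchange clears the failure count.
void onNetworkSuccess(RecoveryState &state) {
  state.failures = 0;
}
```

On the board, the `RETRY_AFTER_MAINTAIN` branch would call `Ethernet.maintain()` before retrying, and the `SAVE_AND_RESET` branch would write the buffered samples to EEPROM, enable the watchdog with a short timeout and wait for it to fire. Keeping the decision separate from the I/O makes it easy to change the retry limit without touching the network code.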

The lockups I am concerned with, however, seem to completely block the W5100. Using some code I borrowed from SurferTim, I added a printout of the socket states whenever the sketch runs into an issue. More importantly, though, I found what I believe could turn into a heap fragmentation issue in my sketch, so I cleaned that up. I will keep testing to see if I can get it to freeze again the way it did before, but I was encouraged to find that it would have survived all last night without a lockup if it hadn't been for Verizon...

Sample of output from the serial monitor when Verizon took my system down. First, the system was up and running taking samples:

Taking sample
Seconds since Jan 1 1900 = 3624779990
Unix time = 1415791190
The UTC time is 11:19:50
Sample: 0, Time: 1415791190, Capacitance: 0.000000000094807, Temperature: 30.471, Light: 5.01%
Taking sample
Seconds since Jan 1 1900 = 3624780270
Unix time = 1415791470
The UTC time is 11:24:30

...

Taking sample
Seconds since Jan 1 1900 = 3624782790
Unix time = 1415793990
The UTC time is 12:06:30
Sample: 10, Time: 1415793990, Capacitance: 0.000000000095936, Temperature: 30.620, Light: 39.98%

But when it tried to take the 11th sample, the NTP request failed, causing the board to reset in the hope of clearing whatever caused the connection to fail. As you can see, the W5100 came back up, connected to the local network and was able to upload the 10 samples it had managed to capture before resetting.

Taking sample
Failed to parse UDP packet
Stored some bytes: S, C : 18, 11, 30.47
Socket#0:0x22 8888 D:129.6.15.30(123)
Socket#1:0x0 0 D:0.0.0.0(0)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)
Reset : Failed to parse UDP packet
Attempting to bind connection with DHCP.
Success!
My IP address: 192.168.1.107.
Found: 11 samples, 30.47

Connecting to upload server
connected
time=1415791190&temp=30.471&&light=5.015&&capacitance=0.000000000094807&&unit=-1

Disconnecting from remote server.

...

Connecting to upload server
connected
time=1415793990&temp=30.620&&light=39.980&&capacitance=0.000000000095936&&unit=-1

Disconnecting from remote server.

But when it tried a new NTP request for the next sample, it failed and began a reboot loop (which is the behavior I programmed, so not a fault):

Taking sample
Failed to parse UDP packet
Reset : Failed to parse UDP packet
Stored some bytes: S, C : 18, 0, 30.47
Socket#0:0x22 8888 D:129.6.15.30(123)
Socket#1:0x0 0 D:0.0.0.0(0)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)

...

Attempting to bind connection with DHCP.
Success!
My IP address: 192.168.1.107.
Taking sample
Failed to parse UDP packet
Reset : Failed to parse UDP packet
Stored some bytes: S, C : 18, 0, 0.00
Socket#0:0x22 8888 D:129.6.15.30(123)
Socket#1:0x0 0 D:0.0.0.0(0)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)

The instability was in fact related to a power issue. I discovered it while troubleshooting a consistent offset on a temperature sensor that was sharing the same ground wire as the Ethernet module.

The solution was to supply the WIZ811 with 3.3 V from the Vin pin via a separate regulator (an LM317 in my case) and run a dedicated ground wire for the module from a GND pin on the Arduino. The voltage drop from the relatively high current draw of the 811 was lifting the ground of the sensors. This affected the sensor readings and was causing the 811 to become unstable.

Thanks SurferTim and zoomcat for your help.