For some time my Arduino webserver application has been having internet connectivity issues.
Basically after a while the Arduino application stops operating as a webserver and ignores (or cannot receive/recognise) incoming URL web page requests. The application continues to run - it is just that its calls to EthernetServer.available() never return new EthernetClient connections (sockets).
I have previously discussed this problem in the thread "EthernetUDP and EthernetClient/Server class compatability" (Networking, Protocols, and Devices - Arduino Forum) because sometimes the lost ethernet connectivity was preventing my daily UDP NTP automatic time reset.
I have researched this quite a bit on the world wide web and found a few discussion forums - but nothing that is definitive that solves the problem comprehensively for everyone. I don't promise that here either.
Eventually my research led me to question the four internet sockets on the W5100 chip of my Freetronics Ethermega card. Was it possible that the sockets were getting permanently locked and lost to my application in some way?
I found some code to dump the status of each of the sockets and embedded my own version of the procedure into my application to dump the socket status information every hour into my application's SD card daily activity log. Here is my implementation of the status dump function:
// Direct W5100 register access needs the low-level header from the
// legacy Arduino Ethernet library: #include <utility/w5100.h>
// ActivityWriteSPICSC(), TimeToHHMM() and G_SocketConnectionTimes[] are
// my own application helpers (SD card log writer, time formatter, and
// the per-socket "first seen connected" times).
void ShowSocketStatus() {
  ActivityWriteSPICSC("ETHERNET SOCKET LIST");
  ActivityWriteSPICSC("#:Status Port Destination DPort");
  ActivityWriteSPICSC("0=avail,14=waiting,17=connected,22=UDP");
  ActivityWriteSPICSC("1C=close wait");
  String l_line = "";
  l_line.reserve(64);
  char l_buffer[10] = "";
  for (uint8_t i = 0; i < MAX_SOCK_NUM; i++) {
    l_line = "#" + String(i);
    uint8_t s = W5100.readSnSR(i); //status
    l_line += ":0x";
    sprintf(l_buffer,"%x",s);
    l_line += l_buffer;
    l_line += " ";
    l_line += String(W5100.readSnPORT(i)); //port
    l_line += " D:";
    uint8_t dip[4];
    W5100.readSnDIPR(i, dip); //IP Address
    for (int j=0; j<4; j++) {
      l_line += int(dip[j]);
      if (j<3) l_line += ".";
    }
    l_line += " (";
    l_line += String(W5100.readSnDPORT(i)); //port on destination
    l_line += ") ";
    if (G_SocketConnectionTimes[i] != 0)
      l_line += TimeToHHMM(G_SocketConnectionTimes[i]); //time first seen "connected"
    //Serial.println(l_line);
    ActivityWriteSPICSC(l_line);
  }
}
By reviewing my application activity logs I was able to observe that the sockets were getting into a permanent "connected" (hex 17) status and never being released. When all four sockets appeared in the log file as "connected" I could no longer access the application via a web browser. (I could not even display the SD card log files with the evidence since that required a web connection - until I restarted my application.)
Once I found this evidence of the problem I extensively reviewed my code to make sure that every incoming ethernet connection was being correctly terminated via a call to EthernetClient.stop(). I found the odd problem there and the reliability of my application improved - but still I was losing occasional sockets.
I suspected (and still suspect) that there is a bug in the W5100 microcode associated with multiple requests from the same IP address coming in too quickly and not being assigned correctly to unique sockets that are then correctly managed. So I decided to see if I could force the stuck ("connected") sockets to close and I was pleased to find that there is a socket disconnect function in the W5100 library.
So I set about implementing application functionality to check the status of every web socket every five minutes, to record the time when each socket was observed as "connected" for the first time and to disconnect the sockets after ten minutes. However for testing purposes I am running a seventy minute timeout for "connected" sockets so I can observe socket statuses in my application activity logs between when the stuck connection is first detected and when it is disconnected.
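The bookkeeping described above can be sketched roughly as follows. This is not my actual application code: SocketWatchdog, check() and the constant names are hypothetical, and it assumes the caller passes in a non-zero time in seconds (zero is used to mean "not being tracked", the same convention as the zero test in my status dump).

```cpp
#include <assert.h>
#include <stdint.h>

const uint8_t kMaxSockets = 4;          // the W5100 has four sockets
const uint8_t kStatusConnected = 0x17;  // "connected" in my log legend

// Hypothetical sketch: remembers when each socket was first seen as
// "connected" and reports when it has stayed that way for too long.
struct SocketWatchdog {
  uint32_t firstSeen[kMaxSockets];  // 0 = socket not currently tracked
  uint32_t timeoutSecs;

  explicit SocketWatchdog(uint32_t timeout) : timeoutSecs(timeout) {
    for (uint8_t i = 0; i < kMaxSockets; i++) firstSeen[i] = 0;
  }

  // Call once per socket at every periodic check (every five minutes in
  // my case). Returns true when the socket should be disconnected.
  bool check(uint8_t sock, uint8_t status, uint32_t nowSecs) {
    assert(sock < kMaxSockets);
    if (status != kStatusConnected) {
      firstSeen[sock] = 0;          // any other status resets the timer
      return false;
    }
    if (firstSeen[sock] == 0) {
      firstSeen[sock] = nowSecs;    // first sighting: start the timer
      return false;
    }
    if (nowSecs - firstSeen[sock] >= timeoutSecs) {
      firstSeen[sock] = 0;          // stop tracking once it is reported
      return true;
    }
    return false;
  }
};
```

In a sketch the status argument would come from W5100.readSnSR(i) for each socket, and a true result would be the point to issue the socket disconnect command and write a log entry. Because the timer is reset whenever a socket is seen in any other status, only a socket that looks "connected" at every single check can reach the timeout.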
My application ran for more than four days until today, when the first stuck "connected" socket was observed at 1:52am. Here is a portion of my application's activity log for today:
01:00:00 ETHERNET SOCKET LIST
01:00:00 #:Status Port Destination DPort
01:00:00 0=avail,14=waiting,17=connected,22=UDP
01:00:00 #0:0x0 80 D:130.89.212.77 (56170)
01:00:00 #1:0x0 80 D:130.89.212.77 (56171)
01:00:00 #2:0x14 80 D:130.89.212.77 (56156)
01:00:00 #3:0x0 80 D:89.238.250.188 (59162)
01:00:00 Climate Update
- FREE RAM: 2847
02:00:00 ETHERNET SOCKET LIST
02:00:00 #:Status Port Destination DPort
02:00:00 0=avail,14=waiting,17=connected,22=UDP
02:00:00 #0:0x14 80 D:130.89.212.77 (60803)
02:00:00 #1:0x17 80 D:130.89.212.77 (60802) 01:52
02:00:00 #2:0x0 80 D:207.46.13.108 (11705)
02:00:00 #3:0x0 80 D:89.238.250.188 (59162)
02:00:00 Climate Update
- FREE RAM: 2847
03:00:00 ETHERNET SOCKET LIST
03:00:00 #:Status Port Destination DPort
03:00:00 0=avail,14=waiting,17=connected,22=UDP
03:00:00 #0:0x0 80 D:202.46.48.22 (24018)
03:00:00 #1:0x17 80 D:130.89.212.77 (60802) 01:52
03:00:00 #2:0x14 80 D:180.76.5.169 (20875)
03:00:00 #3:0x0 80 D:89.238.250.188 (59162)
03:00:00 Climate Update
- FREE RAM: 2847
03:02:45 Socket #1 - Disconnected
04:00:00 ETHERNET SOCKET LIST
04:00:00 #:Status Port Destination DPort
04:00:00 0=avail,14=waiting,17=connected,22=UDP
04:00:00 #0:0x0 80 D:85.212.109.147 (50341)
04:00:00 #1:0x14 80 D:85.212.109.147 (50339)
04:00:00 #2:0x0 80 D:85.212.109.147 (50342)
04:00:00 #3:0x0 80 D:85.212.109.147 (50330)
At 1:00am I had three available sockets and one in a wait status (which seems normal - there is always one socket in that status.)
At 2:00am the socket status list shows socket #1 in a "connected" status and the time this was first observed (1:52) is listed. It is apparently connected to IP address 130.89.212.77 using its destination port 60802.
At 3:00am the socket status list still shows socket #1 in a "connected" status from 1:52am. It is the same IP address and same destination port. Because seventy minutes had not yet elapsed there had been no attempt by my application to disconnect socket #1. And the socket must have remained in the "connected" status at every five minute check since 1:52am or my application would have reset the connection timer.
And at 03:02:45am my application issued the W5100 socket disconnection command after the seventy minute timeout. Here is the command from my sketch:
W5100.execCmdSn(l_sock, Sock_DISCON);
And it seems to have worked. The 4:00am socket status list shows socket #1 as no longer connected and having been used by another IP address using a different destination port. By 6:27am socket #1 was in the available status as shown here:
06:27:00 #0:0x14 80 D:157.55.39.27 (12208)
06:27:00 #1:0x0 80 D:218.77.79.43 (39460)
06:27:00 #2:0x0 80 D:85.212.109.147 (50342)
06:27:00 #3:0x0 80 D:85.212.109.147 (50330)
So if anyone else is still having problems with lost Arduino ethernet connectivity I suggest you start checking the socket status periodically within your application. If you see evidence of stuck "connected" sockets see if you can use the above socket disconnection command to solve the problem.
It is early days for me - my application has only been running for four days and has only had to deal with one stuck socket. I will let it run continuously for about a month to see if my solution correctly deals with other stuck sockets and allows the application to run without error for a full month.
If anyone wants to chase or test this solution I am happy to publish other code fragments from my current solution.
Cheers
Catweazle NZ