Ethernet Shield Communication Failures (Modifying Core Libraries)

I have a program that polls two other controllers on my network for status over HTTP every 5 seconds. Everything works well for 1-3 hours, and then communication is lost to both controllers at the same time. The client.connect() function returns zero, which I could not find documented. The only way to restore communication is to reset the controller. I know both polled controllers are still functioning, because I can reach them through other web services. The web server on the polling controller also keeps working, since my web page is still being updated.

I found a thread describing a problem similar to mine, in which the EthernetClient.cpp file was modified. Here's the link: Ethernet fails connecting after a while - Freetronics Forum

Does anyone have any suggestions on how to resolve this problem? How do I incorporate the patch into a core library and rebuild the Arduino app on a Mac?

Thanks for the assistance.

What do you mean by “controllers”? Are these two other “controllers” client or server devices?

I know both controllers being polled are still functioning through other web services.

What other web services?

edit: Here is the client.connect() function from EthernetClient.cpp. It returns 0 immediately if this client object already holds a socket, if no socket is available to attempt the connection, or if the connection could not be established.

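int EthernetClient::connect(IPAddress ip, uint16_t port) {
  // This client object already holds a socket: refuse.
  if (_sock != MAX_SOCK_NUM)
    return 0;

  // Claim the first socket that is CLOSED, FIN_WAIT or CLOSE_WAIT.
  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    uint8_t s = W5100.readSnSR(i);
    if (s == SnSR::CLOSED || s == SnSR::FIN_WAIT || s == SnSR::CLOSE_WAIT) {
      _sock = i;
      break;
    }
  }

  // No usable socket: fail without touching the network.
  if (_sock == MAX_SOCK_NUM)
    return 0;

  // Next local (source) port; on 16-bit overflow, restart at 1024.
  _srcport++;
  if (_srcport == 0) _srcport = 1024;
  socket(_sock, SnMR::TCP, _srcport, 0);

  // Low-level connect refused (e.g. invalid IP address or port 0).
  if (!::connect(_sock, rawIPAddress(ip), port)) {
    _sock = MAX_SOCK_NUM;
    return 0;
  }

  // Wait for the TCP handshake; give up if the socket drops back to CLOSED.
  while (status() != SnSR::ESTABLISHED) {
    delay(1);
    if (status() == SnSR::CLOSED) {
      _sock = MAX_SOCK_NUM;
      return 0;
    }
  }

  return 1;
}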
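To make the "no socket available" case concrete, here is a host-side C++ model of the socket scan in connect() above. It is a hypothetical helper for illustration, not library code: the name findFreeSocket is made up, the status constants are the W5100 Sn_SR codes (the same values as the SnSR:: constants), and MAX_SOCK_NUM = 4 assumes the W5100's four hardware sockets.

```cpp
#include <cstdint>

// Hypothetical host-side model of the socket scan in EthernetClient::connect().
// Status values are the W5100 Sn_SR codes (same as the SnSR:: constants).
constexpr int MAX_SOCK_NUM = 4;          // assumption: W5100 with 4 hardware sockets
constexpr uint8_t SOCK_CLOSED     = 0x00;
constexpr uint8_t SOCK_FIN_WAIT   = 0x18;
constexpr uint8_t SOCK_CLOSE_WAIT = 0x1C;

// Returns the index of the socket connect() would claim, or -1 when every
// socket is busy (the case where connect() returns 0 before sending anything).
int findFreeSocket(const uint8_t status[MAX_SOCK_NUM]) {
  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    uint8_t s = status[i];
    if (s == SOCK_CLOSED || s == SOCK_FIN_WAIT || s == SOCK_CLOSE_WAIT) {
      return i;
    }
  }
  return -1;
}
```

For example, a status table of {0x14, 0x22, 0x17, 0x17} (LISTEN, UDP, ESTABLISHED, ESTABLISHED) yields -1: every socket is taken, so connect() fails no matter how reachable the remote server is.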

The two controllers being polled are irrigation controllers that provide web pages for control. One of the features I am using is the JSON (JavaScript Object Notation) web page, which lets me determine which water valve is currently on. I can reach these pages from my web browser even though my polling controller reports that it cannot. Everything is fine for 1-3 hours, and then it faults out.

So you have three devices, two are servers and one is a client? The client is the polling controller?

If the two servers are accessible with a web browser, but the client Arduino shows it can't connect, then you have probably used up all the sockets on the client.

Here is the code I use to check the socket status. Add it to your client (polling controller) code, then send an 'r' from the Serial Monitor to display the status of each socket. A socket is usable by connect() only if its status is 0x0 (CLOSED), 0x18 (FIN_WAIT) or 0x1C (CLOSE_WAIT); if no socket is in one of those states, any connection attempt will fail.

#include <SPI.h>
#include <Ethernet.h>
#include <utility/w5100.h>   // low-level W5100 register access

// in loop(): send 'r' from the Serial Monitor to print the socket table
if (Serial.available()) {
  if (Serial.read() == 'r') ShowSockStatus();
}

// then add this variable and function
byte socketStat[MAX_SOCK_NUM];   // last status read for each socket

void ShowSockStatus()
{
  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    Serial.print(F("Socket#"));
    Serial.print(i);
    uint8_t s = W5100.readSnSR(i);        // socket status register (Sn_SR)
    socketStat[i] = s;
    Serial.print(F(":0x"));
    Serial.print(s, 16);                  // status in hex
    Serial.print(F(" "));
    Serial.print(W5100.readSnPORT(i));    // local port
    Serial.print(F(" D:"));
    uint8_t dip[4];
    W5100.readSnDIPR(i, dip);             // destination IP address
    for (int j = 0; j < 4; j++) {
      Serial.print(dip[j], 10);
      if (j < 3) Serial.print(".");
    }
    Serial.print(F("("));
    Serial.print(W5100.readSnDPORT(i));   // destination port
    Serial.println(F(")"));
  }
}
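The value printed after "Socket#n:0x" is the W5100 Sn_SR status register. A small lookup makes the dumps easier to read; it is written as plain C++ so it can be checked off-board, and the name sockStatusName is hypothetical (not part of the Ethernet library). The codes follow the W5100 datasheet and match the SnSR:: constants in utility/w5100.h.

```cpp
#include <cstdint>

// Hypothetical helper: maps a W5100 Sn_SR status byte to its datasheet name.
// Values mirror the SnSR:: constants in utility/w5100.h.
const char* sockStatusName(uint8_t s) {
  switch (s) {
    case 0x00: return "CLOSED";       // free: connect() may claim it
    case 0x13: return "INIT";
    case 0x14: return "LISTEN";       // typically held by EthernetServer
    case 0x15: return "SYNSENT";
    case 0x16: return "SYNRECV";
    case 0x17: return "ESTABLISHED";  // open TCP connection
    case 0x18: return "FIN_WAIT";     // connect() also accepts this state
    case 0x1A: return "CLOSING";
    case 0x1B: return "TIME_WAIT";
    case 0x1C: return "CLOSE_WAIT";   // connect() also accepts this state
    case 0x1D: return "LAST_ACK";
    case 0x22: return "UDP";          // e.g. an EthernetUDP socket
    case 0x32: return "IPRAW";
    case 0x42: return "MACRAW";
    case 0x5F: return "PPPOE";
    default:   return "UNKNOWN";
  }
}
```

On the board, the same switch could be called inside ShowSockStatus() right after readSnSR(i), printing the name next to the hex value.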

First, thank you for your assistance. I incorporated your code and here are some of the results.

These first status blocks were captured just before the failure. The .156 is my Mac viewing the web page from the controller. The .104 is one of the irrigation controllers that is polled. The .101 is a time server.

Socket#0:0x0 80 D:192.168.0.156(63194)
Socket#1:0x22 8888 D:132.163.4.101(123)
Socket#2:0x17 1198 D:192.168.0.104(80)
Socket#3:0x14 80 D:192.168.0.104(80)

Socket#0:0x0 2050 D:192.168.0.104(80)
Socket#1:0x22 8888 D:132.163.4.101(123)
Socket#2:0x17 1198 D:192.168.0.104(80)
Socket#3:0x14 80 D:192.168.0.104(80)

Here are a couple of status blocks after the failure. The .102 is the other irrigation controller.

Socket#0:0x0 80 D:192.168.0.156(63249)
Socket#1:0x22 8888 D:132.163.4.101(123)
Socket#2:0x17 1198 D:192.168.0.104(80)
Socket#3:0x17 2089 D:192.168.0.102(80)

Socket#0:0x14 80 D:192.168.0.104(80)
Socket#1:0x22 8888 D:132.163.4.101(123)
Socket#2:0x17 1198 D:192.168.0.104(80)
Socket#3:0x17 2089 D:192.168.0.102(80)

Connection To Irrigation Controller Failed
Connection To Irrigation Controller Failed
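Read against the W5100 status codes, the after-failure blocks leave connect() nothing to claim: 0x22 is UDP (the time-server socket on port 8888), 0x14 is LISTEN (the web server on port 80) and 0x17 is ESTABLISHED, so sockets 2 and 3 are still-open client connections to the irrigation controllers (local ports 1198 and 2089, remote port 80). The host-side C++ sketch below parses lines in the ShowSockStatus() format and flags such sockets. The struct and function names, and the "ESTABLISHED, remote port 80, local port not 80" heuristic, are assumptions for illustration.

```cpp
#include <cstdio>

// One line of ShowSockStatus() output, e.g.
// "Socket#3:0x17 2089 D:192.168.0.102(80)"   (hypothetical struct)
struct SockLine {
  int sock;
  unsigned status;      // Sn_SR value
  unsigned localPort;   // Sn_PORT
  unsigned ip[4];       // Sn_DIPR
  unsigned remotePort;  // Sn_DPORT
};

// Parses one dump line; returns false if the line is not in that format.
bool parseSockLine(const char* line, SockLine& out) {
  int n = std::sscanf(line, "Socket#%d:0x%x %u D:%u.%u.%u.%u(%u)",
                      &out.sock, &out.status, &out.localPort,
                      &out.ip[0], &out.ip[1], &out.ip[2], &out.ip[3],
                      &out.remotePort);
  return n == 8;
}

// Heuristic (assumption): an ESTABLISHED socket (0x17) whose remote port is 80
// and whose local port is not 80 is an outgoing HTTP client connection that
// is still open, i.e. a socket the polling code has not released.
bool isOpenClientSocket(const SockLine& l) {
  return l.status == 0x17 && l.remotePort == 80 && l.localPort != 80;
}
```

Applied to the after-failure blocks, sockets 2 and 3 are flagged, while the LISTEN and UDP sockets are not.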

What I don't understand is why the irrigation controllers show up at all. I call client.flush() and client.stop() after polling each controller. Is there another way to close connections? Is there a way to increase the number of sockets to see what effect it might have?

Thank you for your assistance.

The server must close the connection after sending its response. If you do not keep reading the response from the server (controller) until it closes the connection, the server may not close it, and the socket will not be released on the client.

Here is my web client code on the Arduino Playground. It also includes a timeout that prevents socket loss if the connection breaks or the server stalls.
http://playground.arduino.cc/Code/WebClient
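The pattern described above (read until the server closes, give up after a timeout, and always call stop()) can be sketched off-board as follows. This is a hedged illustration, not the Playground code: FakeClient is a hypothetical stand-in, and the clock is passed in so it can be tested. On an Arduino the same calls would be client.connected(), client.available(), client.read(), client.stop() and millis().

```cpp
#include <string>

// Hypothetical stand-in for the parts of Arduino's Client API used here.
// connected() stays true while unread data remains, even after the server
// has closed, which mirrors EthernetClient's behaviour in CLOSE_WAIT.
struct FakeClient {
  std::string pending;        // bytes sent by the server, not yet read
  bool serverClosed = false;  // server has sent FIN
  bool stopped = false;       // we have called stop()
  bool connected() const { return !stopped && (!serverClosed || !pending.empty()); }
  int available() const { return static_cast<int>(pending.size()); }
  int read() {
    unsigned char c = static_cast<unsigned char>(pending[0]);
    pending.erase(0, 1);
    return c;
  }
  void stop() { stopped = true; }
};

// Reads until the server closes the connection, or until timeoutMs passes
// with no new data; then always calls stop() so the socket is released.
template <typename Client, typename Clock>
std::string readResponse(Client& client, Clock nowMs, unsigned long timeoutMs) {
  std::string body;
  unsigned long lastData = nowMs();
  while (client.connected()) {
    while (client.available() > 0) {
      body += static_cast<char>(client.read());
      lastData = nowMs();
    }
    if (nowMs() - lastData > timeoutMs) break;  // server stalled: give up
  }
  client.stop();  // release the socket whether or not the server closed
  return body;
}
```

The timeout value is an arbitrary choice here (for example 10000 ms for a small JSON page); the important part is that stop() runs on every exit path, so a stalled controller cannot pin a socket in ESTABLISHED.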