client.connect() fails after a while with Arduino Mega 2560 + Ethernet Shield

Hello,

System: Ethernet library 1.1.2, Arduino Mega 2560 (ATmega2560), Ethernet Shield, Arduino IDE 1.6.12
Description: client.connect() returns 0 after several successful POST requests

I am sending a small amount of data to an Elasticsearch server with a POST request every 2 seconds. After a while (it could be 15 minutes, it could be hours), the Ethernet client can no longer connect to my server, which is definitely up and running. At that point the Mega no longer answers pings either. The only way I have found to get it working again is to unplug and replug the power source.
I've also tried calling Ethernet.begin(mac); when it fails. Sometimes it manages to recover and gets an IP address from DHCP, sometimes it does not.
I've also tried a soft reset, but it does not recover from that.

I've made sure I have the following lines in EthernetClient.cpp, as mentioned in the Freetronics Forum thread "Ethernet fails connecting after a while":

int EthernetClient::connect(IPAddress ip, uint16_t port) {
  if (_sock != MAX_SOCK_NUM)
    return -1;

  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    uint8_t s = W5100.readSnSR(i);
    if (s == SnSR::CLOSED || s == SnSR::FIN_WAIT || s == SnSR::CLOSE_WAIT) {
      _sock = i;
      break;
    }
  }

And here is my code:

#include <SPI.h>
#include <Ethernet.h>
byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
EthernetClient client;
#include <TimeLib.h>


void setup() {
  Serial.begin(9600);
  while (!Serial) {}
  Ethernet.begin(mac);
  Serial.println(Ethernet.localIP());
}

void loop() {
  String dataES = "{\"location\": \"office\"}";
  if (!postPage("192.168.1.68", 9200, "/sensors/temperature", dataES))
    Serial.print(F("Fail "));
  else {
    Serial.print(F("Pass "));
    Serial.println(dataES);
  }
  delay(1000);
}


byte postPage(const char* domainBuffer, int thisPort, const char* page, String thisData)
{
  int inChar;
  char outBuf[64];
  int TIMEOUT_REPLY = 5000;
  Serial.print(F("connecting..."));

  if(client.connect(domainBuffer,thisPort) == 1){
    Serial.println(F("connected"));
    sprintf(outBuf,"POST %s HTTP/1.1",page);
    client.println(outBuf);
    sprintf(outBuf,"Host: %s",domainBuffer);
    client.println(outBuf);
    client.println(F("Connection: close\r\nContent-Type: application/json"));
    sprintf(outBuf,"Content-Length: %u\r\n",thisData.length());
    client.println(outBuf);
    client.print(thisData);
  } 
  else {
    Serial.println(F("failed"));
    client.stop();
    return 0;  // report the failure so the caller prints "Fail" instead of "Pass"
  }

  unsigned long startTime = millis();
  while(client.connected()) {
    // Subtracting the start time stays correct when millis() rolls over
    if (millis() - startTime > (unsigned long)TIMEOUT_REPLY) {
      Serial.println(F("NO RESPONSE WITHIN PRESCRIBED TIME"));
      break;
    }
    while(client.available()) {
      inChar = client.read();
      Serial.write(inChar);
      delay(1);
    }
    delay(1);
  }

  Serial.println();
  Serial.println(F("disconnecting."));
  client.stop();
  return 1;
}

Any ideas? I have been battling this problem for months and I am a bit clueless.

Thank you very much

You are probably running out of sockets, since the Ethernet Shield has only 4 sockets.

I don’t see the need to edit the Ethernet library (except to add the client IP). I don’t remember exactly why, but I have already crossed paths with that post and decided not to apply it, perhaps because it’s a 4-year-old thread.

The first thing to do is make sure that you are always closing the connection. It seems to me that you are, but as you didn’t provide the Serial Monitor output, I can’t really be sure.

Even when you do that, from time to time a socket can get stuck in an unavailable status. You should have a look at WebServerST to see how to handle frozen sockets.
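To give an idea of what handling a frozen socket can look like, here is a minimal sketch of the bookkeeping only. It is not the WebServerST code: the names SocketWatch, initWatch and isStuck and the 30 second threshold are my own assumptions, and it is plain C++ so it can be tried off the board. The idea is to remember when each socket last changed status and to flag one that sits in a non-idle state for too long. Treating a long-lived ESTABLISHED socket as stuck only makes sense for short requests like your POST.

```cpp
#include <cassert>
#include <cstdint>

// Idle status codes (W5100 socket status values as listed in this thread).
const uint8_t SOCK_CLOSED = 0x00;
const uint8_t SOCK_LISTEN = 0x14;
const uint8_t SOCK_UDP    = 0x22;

const int NUM_SOCKETS = 4;              // the shield has 4 hardware sockets
const uint32_t STUCK_AFTER_MS = 30000;  // assumed threshold, tune as needed

// Hypothetical bookkeeping: last status seen per socket and when it changed.
struct SocketWatch {
  uint8_t  lastStatus[NUM_SOCKETS];
  uint32_t lastChangeMs[NUM_SOCKETS];
};

void initWatch(SocketWatch &w, uint32_t nowMs) {
  for (int i = 0; i < NUM_SOCKETS; i++) {
    w.lastStatus[i] = SOCK_CLOSED;
    w.lastChangeMs[i] = nowMs;
  }
}

// Returns true when socket i has stayed in the same non-idle status for
// longer than STUCK_AFTER_MS. The caller decides how to free it.
bool isStuck(SocketWatch &w, int i, uint8_t status, uint32_t nowMs) {
  if (status != w.lastStatus[i]) {   // status changed: restart the clock
    w.lastStatus[i] = status;
    w.lastChangeMs[i] = nowMs;
    return false;
  }
  if (status == SOCK_CLOSED || status == SOCK_LISTEN || status == SOCK_UDP) {
    return false;                    // idle states are never "stuck"
  }
  // Unsigned subtraction stays correct across timer rollover.
  return (uint32_t)(nowMs - w.lastChangeMs[i]) > STUCK_AFTER_MS;
}
```

On the Arduino you would call isStuck() once per loop for each socket with the value from W5100.readSnSR(i) and millis(), and close the socket yourself when it returns true.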

You can also use this function to track the socket status while debugging.

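#include <utility/w5100.h>  // W5100 register access and MAX_SOCK_NUM (Ethernet library 1.1.x layout)

uint8_t socketStat[MAX_SOCK_NUM];  // last status read for each socket

void ShowSockStatus() {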
  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    Serial.print(F("Socket#"));
    Serial.print(i);
    uint8_t s = W5100.readSnSR(i);
    socketStat[i] = s;
    Serial.print(F(":0x"));
    Serial.print(s,16);
    Serial.print(F(" "));
    Serial.print(W5100.readSnPORT(i));
    Serial.print(F(" D:"));
    uint8_t dip[4];
    W5100.readSnDIPR(i, dip);
    for (int j=0; j<4; j++) {
      Serial.print(dip[j],10);
      if (j<3) Serial.print(".");
    }
    Serial.print(F("("));
    Serial.print(W5100.readSnDPORT(i));
    Serial.println(F(")"));
  }
}

// CLOSED 0x00 // LISTEN 0x14 // ESTABLISHED 0x17 // FIN_WAIT 0x18 // CLOSE_WAIT 0x1C // UDP 0x22
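If you prefer names over hex values in the output, a small lookup like this can help. It only covers the codes listed above; sockStatusName is a made-up helper name, and any other value is reported as OTHER rather than guessed.

```cpp
#include <cassert>
#include <cstdint>

// Maps a socket status byte to a readable name.
// Only the codes listed in this thread are covered.
const char *sockStatusName(uint8_t status) {
  switch (status) {
    case 0x00: return "CLOSED";
    case 0x14: return "LISTEN";
    case 0x17: return "ESTABLISHED";
    case 0x18: return "FIN_WAIT";
    case 0x1C: return "CLOSE_WAIT";
    case 0x22: return "UDP";
    default:   return "OTHER";   // transitional or unknown state
  }
}
```

For example, 0x22 comes out as UDP and 0x0 as CLOSED. In ShowSockStatus() you could print sockStatusName(s) next to the hex value.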

Thanks for your feedback.

Unfortunately, it does not seem to be a socket overflow. You will find the Serial output below.
PS:

  • The first socket is the NTP server.
  • The second is the Elasticsearch server.

{"_index":"sensors","_type":"temperature","_id":"AVgHLHI3lOyM2a5lq6CE","_version":1,"_shards":{"total":1,"successful":1,"failed":0},"created":true}
disconnecting.
Pass {"location": "bureau", "value": 27.20, "timestamp": 1477589079824}
Socket#0:0x22 8888 D:129.250.35.250(123)
Socket#1:0x0 50599 D:192.168.1.68(9200)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)
connecting...connected
HTTP/1.1 201 Created
Content-Type: application/json; charset=UTF-8
Content-Length: 147

{"_index":"sensors","_type":"temperature","_id":"AVgHLHt-lOyM2a5lq6CF","_version":1,"_shards":{"total":1,"successful":1,"failed":0},"created":true}
disconnecting.
Pass {"location": "bureau", "value": 27.20, "timestamp": 1477589081188}
Socket#0:0x22 8888 D:129.250.35.250(123)
Socket#1:0x0 50600 D:192.168.1.68(9200)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)
connecting...failed
ATTEMPTING TO REINIT0.0.0.0

disconnecting.
Pass {"location": "bureau", "value": 27.20, "timestamp": 1477589083590}
Socket#0:0x0 68 D:255.255.255.255(67)
Socket#1:0x0 0 D:0.0.0.0(0)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)
connecting...failed
ATTEMPTING TO REINIT0.0.0.0
MAX_FAILED REACHED, RESETTING
setup()
0.0.0.0
Transmit NTP Request
No NTP Response :-(
end setup()
Socket#0:0x22 8888 D:255.255.255.255(67)
Socket#1:0x0 0 D:0.0.0.0(0)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)
connecting...failed
ATTEMPTING TO REINIT0.0.0.0

disconnecting.
Pass {"location": "bureau", "value": 26.80, "timestamp": 66337}
Socket#0:0x0 68 D:255.255.255.255(67)
Socket#1:0x0 0 D:0.0.0.0(0)
Socket#2:0x0 0 D:0.0.0.0(0)
Socket#3:0x0 0 D:0.0.0.0(0)
connecting...failed
ATTEMPTING TO REINIT

PS: the whole code is here: ZeroBin.net

Any other ideas?

Hmm… Strange…

It actually looks like you are having network problems (or maybe hardware problems on the Ethernet Shield).
Do you see how the second and third connections fail, and how after the restart the Arduino doesn’t get back on the network?

All subsequent connections fail, even the NTP request, and when you print the socket status it still shows the UDP socket as open.

I see your comment on the SetSyncInterval saying that it crashes on subsequent NTP requests? You are probably doing something wrong there. One thing you could do (even if it’s not related, it’s good practice) is to move the NTP variables into the scope of the request (a single function) and to start and stop the UDP socket only while making the request. There is no need to keep 1 of the 4 sockets open for a request that is only made once in a while.

Here is my NTP function; you should try it and adapt it to your needs.

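// Assumes a global UDP object named udp (e.g. an EthernetUDP) and a local port constant UDP_PORT declared elsewhere.
time_t getNTP() {
  int8_t timeZone = -3;  // signed type: a byte would turn -3 into 253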
  byte NTP_PORT = 123;
  byte packetLength = 0;
  unsigned long startMillis = 0;
  unsigned long secsSince1970 = 0;
  char timeServer[] = {"time.nist.gov"};
  byte packetBuffer[48] = {0b11100011, 0, 6, 0xEC, 0, 0, 0, 0, 0, 0, 0, 0, 49, 0x4E, 49, 52};
  udp.begin(UDP_PORT);
  udp.beginPacket(timeServer, NTP_PORT);
  udp.write(packetBuffer, 48);
  udp.endPacket();
  startMillis = millis();
  while (millis() - startMillis <= 2500) {
    packetLength = udp.parsePacket();
    if (packetLength >= 48) {
      udp.read(packetBuffer, 48);
      secsSince1970 = (unsigned long)packetBuffer[40] << 24;
      secsSince1970 |= (unsigned long)packetBuffer[41] << 16;
      secsSince1970 |= (unsigned long)packetBuffer[42] << 8;
      secsSince1970 |= (unsigned long)packetBuffer[43];
      secsSince1970 = secsSince1970 - 2208988800UL + timeZone * SECS_PER_HOUR;
      break;
    }
  }
  udp.stop();
  return secsSince1970;
}
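To make the byte shuffling in the middle of that function easier to check on its own, here is the same conversion pulled out into a standalone helper. ntpToUnix is a name I made up for illustration, and the time zone is assumed to be a whole number of hours. It reads the seconds field at bytes 40 to 43 of the reply (big-endian) and subtracts the 2208988800 seconds between 1900 and 1970.

```cpp
#include <cassert>
#include <cstdint>

// Seconds between the NTP epoch (1900) and the Unix epoch (1970).
const uint32_t SEVENTY_YEARS = 2208988800UL;

// Converts the transmit timestamp of a 48-byte NTP reply to Unix time,
// shifted by a whole-hour time zone offset (may be negative).
uint32_t ntpToUnix(const uint8_t *packet, int8_t timeZoneHours) {
  uint32_t secsSince1900 = ((uint32_t)packet[40] << 24) |
                           ((uint32_t)packet[41] << 16) |
                           ((uint32_t)packet[42] << 8)  |
                            (uint32_t)packet[43];
  int32_t offset = (int32_t)timeZoneHours * 3600;
  // Unsigned arithmetic wraps, so adding a negative offset still works.
  return secsSince1900 - SEVENTY_YEARS + (uint32_t)offset;
}
```

Keeping the time zone in a signed type is what makes a value like -3 behave as expected here.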

Another piece of advice would be to stop using String (capital “S”) in your code. I see that you seem to be familiar with char arrays, so there is really no need for it.
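As a rough idea of what that could look like for your JSON body, assuming the only variable part is the location text (buildBody is just an illustrative name): format straight into a fixed char buffer with snprintf and check that the result fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Writes the JSON body into 'out' without using String.
// Returns the body length, or 0 if the buffer was too small.
size_t buildBody(char *out, size_t outSize, const char *location) {
  int n = snprintf(out, outSize, "{\"location\": \"%s\"}", location);
  if (n < 0 || (size_t)n >= outSize) {
    return 0;  // encoding error or truncated output
  }
  return (size_t)n;
}
```

The returned length can go directly into the Content-Length header, and the buffer can be sent with client.print().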

Another thing: waiting for Serial to begin is only needed on boards with a native USB port such as the Leonardo. Again, this probably has nothing to do with your problem, but…

  while (!Serial) {
    ; // wait for serial port to connect. Needed for native USB port only
  }

Besides that, nothing jumps out at me that could be causing this kind of issue, other than maybe your modification to the Ethernet library. I strongly recommend that you drop that modification and get the latest library version.

Have you tried the WebServerST I mentioned, unmodified, to see if the problem persists?