Ethernet Shield possibly seizing

Hi Folks,

I'm looking for an idea why my Mega2560 with an Ethernet shield, feeding data to Xively (aka Cosm aka Pachube), regularly hangs on client.connect() after approx. 17 hrs 35 mins. With a watchdog in place, the sketch is restarted and everything then runs as expected, but ...

The components:

  • cable TV modem for the internet connection
  • cable modem
  • router Zyxel NBG 4615
  • Mega2560 with Ethernet Shield
  • the sketch, which I improve from time to time (it has been running 24/7 for more than 2 years now), compiled with IDE 1.0.5 and the Ethernet library

The shield has a fixed IP address, so there should be no lease time on the router (at least I can't find one there). The client connects to a fixed Xively IP address on the internet. The sketch measures the atmospheric pressure and sends the data to Xively every 30 secs. Here is the part of the sketch where the program stops:

// this method makes an HTTP connection to the server
// (client, server and lastConnectionTime are declared elsewhere in the sketch)
void sendData(String thisData1) {
  // if there's a successful connection:
  if (client.connect(server, 80)) {              // <--- problem here: the sketch hangs inside this call
    Serial.println("connecting...");
    // send the HTTP PUT request; println() ends each line with CR+LF as HTTP expects
    // fill in your feed address here:
    client.println("PUT /api/25574.csv HTTP/1.1");
    client.println("Host: www.pachube.com");
    // fill in your Pachube API key here:
    client.println("X-PachubeApiKey: ....ApiKeyHere...");
    client.print("Content-Length: ");

    // calculate the length of the sensor reading in bytes:
    int thisLength = thisData1.length();
    client.println(thisLength, DEC);

    // last pieces of the HTTP PUT request:
    client.println("Content-Type: text/csv");
    client.println("Connection: close");
    client.println();                            // blank line ends the headers

    // here's the actual content of the PUT request
    // (print, not println, so exactly Content-Length bytes are sent):
    client.print(thisData1);

    // note the time that the connection was made:
    lastConnectionTime = millis();
  }
  else {
    // if you couldn't make a connection:
    Serial.println("connection failed");
  }
}

Generally everything runs smoothly for approx. 17 hrs 35 mins (about 63,300 secs), and then the sketch does not get past the "client.connect()" statement and the watchdog resets the board. Within that period there are only a few cases where "client.connect()" returns false, and for those I get the printout from the "else" branch.

But when "something" hangs the sketch, there is neither the success nor the failure Serial.print output.

Does anybody have an idea where I should look for the reason??

My reasoning so far:
  • I use millis() and some time-related variables declared as "unsigned long int". millis() rollover shouldn't apply here, as the reset happens after a relatively short time.
  • IP connection: I'm not completely sure, but if there is no connection, the sketch should go through the "else" branch until it is connected again.
  • Router renewing the shield's IP: again I'm not sure, but probably not in this case, since the IP is fixed.

Thank you in advance for any hints on what to do ...

Cheers
Vladimir

Post all your code. I do not see where you read the response from the server. If you leave any characters in the W5100 rx buffer, the connection may not close correctly, and that "eats" the socket. There are only 4 sockets in a W5100.

edit: If the problem is that client.connect() continuously returns false after working for a long while, that is normally an indication that all the sockets are unavailable.

Hi SurferTim,

so here below is all my code. Sorry for the non-English comments and for some variables (test1-test3) that I use to check and count where the program is. For the sake of good order: the whole code is quite complex, as it also contains the communication via xBee, but regardless of that it regularly stops at the indicated place. The code is large, so it is in the attachment.

I'm also attaching the capture file from the Serial port, covering the part before and after the hang. FYI, "Pocet cyklu =" (Czech) means the number of client.connect() calls, and "Pocet pripojeni =" means the number of successful passes through the true branch ...

I use a (maybe) old-fashioned way of feeding data to Xively, based on the example sketch for feeding Pachube. I know there is a new Xively library using a somewhat different method, but so far I'm too lazy to adapt to it ... :wink:

I hope it is not too confusing ...

Cheers and thanks
Vladimir

ActualCode.txt (17.8 KB)

SerialPort Capture.txt (7.14 KB)

I suspect this code. If the connection breaks (fails) or the server stalls, you will never get a disconnect message from the server, so client.connected() will always return true and the client never disconnects.

  // if there's no net connection, but there was one last time
  // through the loop, then stop the client:
  if( !client.connected() && lastConnected ) {
    Serial1.println("disconnecting.");
    client.stop();
  }

You will need some type of timeout to abort that "if(!client.connected())" check. This code in the playground has a timeout feature that prevents the seizing (lockup).
http://playground.arduino.cc/Code/WebClient
Look for the connectLoop variable in the getPage() function. That controls the timeout.

OK, SurferTim, I applied the suggested code, compiled it and ... we'll see :wink: I'll let you know.

For the time being, thanks for your kind assistance
Cheers
Vladimir

Hi my dear supporter, :wink:

unfortunately I have to report that even with the suggested modification the sketch didn't get past the magic limit ...

Nevertheless, applying your suggestion (I hope I applied it correctly) improved the code: the output to Serial1.print() is now much clearer. Before, I occasionally got a mixture of Ethernet buffer readings and incoming xBee data. Also, as you can see in the attached capture file, ALL connections to the server were successful; the false branch of the 'if' statement was never used. During the last "running season" there were 7 forced client.stop() calls via the "Timeout" path. I reduced the timeout from 10 sec to 3 sec so that the wait cannot trigger the 8 sec watchdog. But ...

SurferTim, I'm attaching both the modified code and the capture file with the Serial1 readings around the mysterious limit.

To determine whether this is a time-dependent issue (no matter whether in the Ethernet shield, the router or the server) or one that depends on the number of client.connect() calls, I'm thinking of increasing the Xively feeding interval from the current approx. 30 sec to e.g. 40 sec ... But it will take another approx. 17.5 hrs to get the result :~

SurferTim, thanks a lot for your kind help. If you or somebody else has another idea what to change, check or modify, let me know.

Cheers
Vladimir

Code_Modification1.ino (18 KB)

CAPTURE_new.TXT (7.65 KB)

The capture file doesn't show the ethernet seizing. Here is the last transaction in your capture:

connecting...
HTTP/1.1 200 OK
Date: Fri, 27 Sep 2013 20:45:20 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 0
Connection: close
X-Request-Id: 00d3f1376eb7e6db6582e319d1f5e30e285e7db9
Cache-Control: max-age=0
Vary: Accept-Encoding

disconnecting.

The "HTTP/1.1 200 OK" and "disconnecting" means everything went ok. What happens after this?

edit: Check your SRAM before each ethernet transaction. Maybe you have a memory leak somewhere. Add this function to your code and call it every iteration of sendData().

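// Returns the number of free bytes between the top of the heap and the stack.
int freeRam() {
  extern int __heap_start, *__brkval;  // symbols provided by the AVR linker / avr-libc malloc
  int v;                               // a local variable: its address marks the current stack top
  // If nothing has been allocated on the heap yet, __brkval is 0 and the heap begins at __heap_start.
  return (int)&v - (__brkval == 0 ? (int)&__heap_start : (int)__brkval);
}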

// then in your code 

void sendData( String thisData1 ) {
  Serial1.print(F("SRAM available: "));
  Serial1.println(freeRam());

// rest of your code
}

edit2: Is this the part that fails? If so, it is the Arduino code failing, not the ethernet shield. If it were the shield, you would get "connection failed" over and over. Check your SRAM. If you run out, it will cause failures like this reboot.

Jsme ve funkci ...
Pocet cyklu = 2104
Setup is completed .....

I was already thinking about running out of SRAM before I came to the forum. I used some code found on the internet, but the SRAM figure it showed was nothing alarming. I added your SRAM check and we will see.

In my view, as you can see in the previous capture file, something happens when the statement 'if( client.connect( server, 80 ) )' is called: the Serial1.print just before it reaches the Serial port, but nothing else follows, neither from the true branch (the number of connections) nor from the false branch ("connection failed") ...

The 'if( client.connect( server, 80 ) )' becomes critical after a running time of approx. 17:35 hrs. The question is what causes it: time, SRAM, something else??

Now I have added the freeRam() figure to the Xively datastream, so we can also see in the graph how it changes. I expect the next round tomorrow morning, local time :wink:

Thanks for now
Vladimir

Now I have added the freeRam() figure to the Xively datastream, so we can also see in the graph how it changes. I expect the next round tomorrow morning, local time

You should know sooner than that. It won't just suddenly run out of SRAM. If that is the problem, the SRAM will probably slowly decrease over that time until it runs out at 17:35.

The String data type had problems with memory leaks in the past. That would be my first suspect.

If you take a look at www.xively.com/feeds/25574, the last graph shows the freeRam value. The sketch has now been running for more than 2.5 hrs and there is no negative or rapid change in the freeRam value. It is stable, maybe a few bytes down, but nothing that would regularly fall to ZERO after 17 hrs. Unless there is a sudden drop, I think running out of SRAM is probably not the issue ...

Actually, that looks pretty good. It shouldn't be varying either way after several update cycles.

I have run my client sketch code for several days without fail. I do not count a failure to connect, a connection break, or a server stall as a failure. The sketch should go right on as if nothing happened. I induce those errors to test my code.

I don't use Xively. I have my own server.

And I never use the String data type, and I mean NEVER! Every time I have tried it, it has crashed my code. They say it is fixed in V1.0.5, but I have no reason to use a dynamic memory allocation scheme on a processor with this limited amount of SRAM. I can do the allocation manually just fine using character arrays.

Attached is a new capture file, but there is nothing new on my side. The SRAM value is stable until the crash, and the program gets lost in 'if( client.connect( server, 80 ) )' in loop no. 2104 ...

I'm considering switching to the new Xively library if that changes anything. Before that, I may try to learn more and apply your idea with character arrays.

That's all from me.

All the best
Vladimir

CAPTURE_0929.TXT (22.5 KB)

Let me give (at least for now) the final report. I ran some tests during the week and found out that call no. 2104 of client.connect() is the critical one in my sketch. I went through the Ethernet library, especially EthernetClient.cpp, and the problematic part for me is:

  while (status() != SnSR::ESTABLISHED) {
    delay(1);
    if (status() == SnSR::CLOSED) {
      _sock = MAX_SOCK_NUM;
      return 0;
    }
  }

The program couldn't get the ESTABLISHED status here when called for the 2104th time (I don't know why it is always this number ??), so the code fell into an "endless" loop and finally my 8 sec watchdog fired and restarted the sketch from the beginning ... "Endless" is in quotes because I didn't check whether ESTABLISHED eventually arrives after a longer time.

I found the solution for me here: Ethernet, connect(), timeout - Troubleshooting - Arduino Forum. It consists of adding one #include line at the beginning of my code and two more lines in setup(), as described there. Now the wait terminates: I get "No connection" in the critical pass no. 2104, but the sketch carries on without problems ...

I don't know why pass 2104 of the code is so problematic for me, and I have no idea whether it is a HW, SW or server issue. Nevertheless, it works for me :wink: