
Topic: Ethernet shield locking up for lack of sockets

solar_eta

Hi, I have a data logger on a Leonardo board that outputs readings from the analog inputs to both an SD card and a web page. It also gets a daily time stamp update via UDP. Every few months the whole thing locks up.

I have checked the free SRAM at various places in the code, and the smallest it gets is 700 bytes.

I am beginning to think it's a frozen socket problem. As I have a standard variable 'systime' in my code that counts up the seconds from an interrupt, I made a variant of SurferTim's solution to close a frozen socket, but it won't compile.

I've put a call to the modified function at the beginning of the loop.

The modified function is as follows:-

Code: [Select]
void CheckSockStatus()
{
for (int i = 0;  i < MAX_SOCK_NUM; i++){
  uint8_t s = W5100.readSnSR(i);
 
  if((s == 0X17) || (s = 0x1C)) {
      if(systime - connectTime[i] > 30UL){
       close(i);
      }
    }
    else connectTime[i] = systime;
    socketStat[i] = W5100.readSnSR(i);

}
}


The compiler tells me "'close' was not declared in this scope" when it gets to this piece of code.


The setup code has the conventional
Code: [Select]

////////////////////////////
// We use fixed IP address (DHCP seem to add 3.6k to sketch)

  Ethernet.begin(mac,ip,localdns,gateway,subnet);
  digitalWrite(ethernetChipSelectPin,HIGH); // make sure it's off line
  // start the Ethernet server
  server.begin();
  Udp.begin(localPort);




SurferTim

#1
Jan 05, 2018, 12:28 pm Last Edit: Jan 05, 2018, 12:53 pm by SurferTim
Mine compiles. I just checked it.

What code were you using and what compile error did you get?

Edit: Which ethernet library are you using? There are a few now. Ensure you add this:
#include <utility/socket.h>
If you do not include this in your code, you will get "'close' was not declared in this scope".

This is the function I use:
Code: [Select]
void checkSockStatus()
{
  unsigned long thisTime = millis();

  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    uint8_t s = W5100.readSnSR(i);

    if((s == 0x17) || (s == 0x1C)) {
        if(thisTime - connectTime[i] > 30000UL) {
          Serial.print(F("\r\nSocket frozen: "));
          Serial.println(i);
          close(i);
        }
    }
    else connectTime[i] = thisTime;

    socketStat[i] = W5100.readSnSR(i);
  }
}

solar_eta

Found the compile problem - typo in the '#include' statements!!!

As the problem may be coming from the UDP request to the time server, should I also be looking for 0x22 returned by W5100.readSnSR(i), as well as 0x17 and 0x1C?

Or any of the other returned codes defined in W5100.h for that matter?

The annoying thing is that I can't make this fault happen.


SurferTim

Are you using server or client code? If server, you may not be able to duplicate it, but I can. My challenge was caused by a port scanner. The port scanner was searching for open ports, but when it found one, it didn't close the connection, causing the loss of one socket. Four of those scans and you are DOOMED!

solar_eta

I presume that I'm using server code. The 'setup' includes

Code: [Select]
  Ethernet.begin(mac,ip,localdns,gateway,subnet);
  digitalWrite(ethernetChipSelectPin,HIGH); // make sure it's off line
  // start the Ethernet server
  server.begin();
  Udp.begin(localPort);
//Check presence of SD card
  if (!sd.begin(sdChipSelectPin));


and in the 'loop' I have

Code: [Select]
// Ethernet outputs here
 // listen for incoming clients
  byte requestType = 0;  // 0 = unknown, 1 = GET, 2 = POST
  char c;
   
  client = server.available();
  if (client){   
    while (client.connected()){
      if (client.available()) {
        memset(fileRequest, 0 ,sizeof(fileRequest)); //clear the inputBuffer
        if (client.readBytesUntil('/',fileRequest,MAX_PAGE_NAME_LEN)){
          if (strcmp(fileRequest,"GET ") == 0){
            requestType = 1; //search for 'GET'
          }
          else if (strcmp(fileRequest,"POST ") == 0){
            requestType = 2; //search for 'POST'
          }
        //gather what comes after the '/'
        memset(fileRequest, 0 ,sizeof(fileRequest)); //clear the inputBuffer
        //EofN = 0;
          if( client.find("") ){
            fileBufferLen = 0;
            *fileRequest = 0;
            while(fileBufferLen < MAX_PAGE_NAME_LEN ){
              c = client.read();
              if( c == 0 ){
                fileBufferLen = 0;   // timeout returns 0 !
              }
              else if((c == 32)||(c == 63)) {  // space character or ?
                fileRequest[fileBufferLen] = 0; // terminate the string
                break;
              }
              else{
                fileRequest[fileBufferLen++] = c;
              }
            }
            fileRequest[fileBufferLen] = 0;
            // Note: inputBuffer full before the closing post_string encountered
          }
          else fileBufferLen = 0;    //failed to find the prestring
        }//end if(client.readBytes
        client.find("\r\n\r\n");       //have to find the blank line & make sure we read to the end
        //client.find(dblReturn);       
        if (requestType == 2){ //do a POST
           
          actionUpDate();
          }//end if (requestType == 2)
        // GET or POST Give the string asked for
        // or if no file name give index.htm
        if (fileBufferLen == 0) {
          strcpy(fileRequest,"index.htm"); 
        }//end if
        sendFile(fileRequest);
        delay(1);
        client.stop();
      }.......


to listen for requests for info over the intranet and send the web page/file to the client.

The NTP routine is as follows

Code: [Select]
unsigned long GetNTP(byte NTP_Address[4])
{
  union                           //Define union of unsigned long ntpEpoch
  {                               //to get time stamp from bigendian
    unsigned long value;          //bytes in the returned buffer
    byte element[4];
  }ntpEpoch;
  ntpEpoch.value =0;              // set it to zero to clear random values.
 
  const byte ntp_BUFFER_LEN = 48;
  byte ntp_Buffer[ntp_BUFFER_LEN];
  //digitalWrite(sdChipSelectPin, HIGH);      // make sure SD Card is OFF
  // set all bytes in the inputBuffer to 0
    memset(ntp_Buffer, 0, ntp_BUFFER_LEN);
   
  // Initialize values needed to form NTP request
  // (see URL above for details on the packets)
   ntp_Buffer[0] = 0b11100011;   // LI, Version, Mode
   ntp_Buffer[1] = 0;            // Stratum, or type of clock
   ntp_Buffer[2] = 6;            // Polling Interval
   ntp_Buffer[3] = 0xEC;         // Peer Clock Precision
    // 8 bytes of zero for Root Delay & Root Dispersion
   ntp_Buffer[12]  = 49;
   ntp_Buffer[13]  = 0x4E;
   ntp_Buffer[14]  = 49;
   ntp_Buffer[15]  = 52;
   
    // all NTP fields have been given values, now
    // you can send a packet requesting a timestamp
    Udp.beginPacket(NTP_Address, 123); //NTP requests are to port 123
    Udp.write(ntp_Buffer,ntp_BUFFER_LEN);
    Udp.endPacket();
    delay(980); //could do with being bigger but may cause problems with the 'one second interrupt'

    if (Udp.parsePacket()) {
      // We've received a packet, read the data from it
      Udp.read(ntp_Buffer,ntp_BUFFER_LEN);  // read the packet into the buffer
      //the timestamp starts at byte 40 of the received packet and is four bytes long,
      //highest byte first; combine the four bytes into a long integer
     
      ntpEpoch.element[0] = ntp_Buffer[43];
      ntpEpoch.element[1] = ntp_Buffer[42];
      ntpEpoch.element[2] = ntp_Buffer[41];
      ntpEpoch.element[3] = ntp_Buffer[40];
     
      /*  //original shift left and or the 4 bytes
      for (byte i = 40; i < 44; i++){
        ntpEpoch = (ntpEpoch << 8) | ntp_Buffer[i];
       
      } */
      // this is NTP time (seconds since Jan 1 1900):
      // now convert NTP time into everyday time:
      // Unix time starts on Jan 1 1970. In seconds, that's 2208988800:
      // but we are working on a 2000 base so subtract 100 years:
      ntpEpoch.value = ntpEpoch.value - 3155673600UL;// 1900 to 1970 is 2208988800UL sec,
                                                     // 1970 (Unix time) to 2000 is 946684800UL sec
    }  //end 'If ( Udp.parsePacket() )' if we didnt have a time ntpEpoch.value is still zero
  delay(1);
  return ntpEpoch.value;
  }
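The union in GetNTP() works because the AVR stores the low byte of a long first. As a minimal sketch of a byte-order-independent alternative, the helper below builds the seconds field with shifts and applies the same 3155673600 offset (2208988800 s from 1900 to 1970 plus 946684800 s from 1970 to 2000). The name ntpToY2kSeconds and the two constant names are illustrative assumptions and do not appear in the original sketch.

```cpp
#include <stdint.h>

// Seconds from the NTP epoch (1 Jan 1900) to the Unix epoch (1 Jan 1970).
const uint32_t kNtpToUnix = 2208988800UL;
// Seconds from the Unix epoch (1 Jan 1970) to 1 Jan 2000.
const uint32_t kUnixToY2k = 946684800UL;

// Hypothetical helper: reads bytes 40..43 of a 48-byte NTP reply
// (most significant byte first) and returns seconds since 1 Jan 2000.
uint32_t ntpToY2kSeconds(const uint8_t *packet)
{
  uint32_t ntpSeconds = 0;
  for (int i = 40; i < 44; i++) {
    ntpSeconds = (ntpSeconds << 8) | packet[i];  // shift in one byte at a time
  }
  return ntpSeconds - (kNtpToUnix + kUnixToY2k);
}
```

A reply whose bytes 40..43 are 0xBC 0x17 0xC2 0x00 encodes 3155673600, which this helper maps to 0, i.e. midnight on 1 Jan 2000.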



All of which is a pretty standard way of doing things. Once I got the original problems of the Malloc.c out of the code, including moving up to IDE 1.0.6, the unit ran 24/7 for two years without a hitch, but now it is suffering hangs at irregular intervals.

Like your problem, I wonder if my broadband is being sniffed for open ports. PCs etc. have their own AV software, but I'm not sure how to stop the Arduino from being on the receiving end of such an inquisition.

To cut down the size of the code I've already taken all the DHCP and other unused code out of the Ethernet library and also taken some stuff out of SdFat to make the thing smaller. The whole code now compiles to 27700 bytes under IDE 1.0.6 and to 33000 under 1.6.8, so moving up on the IDE is not an option.





SurferTim

If your ethernet shield has a public IP, or a port forward from a public IP, you are subject to port scans. You are using the correct solution.

solar_eta

No public IP or port forwarding. It should all be on my side of the router, but I doubt that BT Home Hubs are clever enough to stop scanning from outside.

SurferTim

My server code has a ShowSockStatus function to determine what is causing the socket loss. You should call that function when you lose all your sockets. If you have questions, post the output of the ShowSockStatus function.

solar_eta

I'll try that once I've slimmed down the whole sketch a bit. Thanks for the support and suggestions.

solar_eta

The sketch refuses to be slimmed down; it's at about 28000 bytes. So I've looked for other ways of outputting what is going on by modifying your 'ShowSockStatus()' to output to a .CSV file on the SD card every loop! OK, so it slows things down, but it shows that the request to the Arduino for the web page only opens two sockets. I'm presuming that one is for the HTML and the other for the style sheet. So far so good. I've tried stress testing by firing up another PC to download the same page, and the number of sockets in use only rises to 3. That still leaves some headroom for the occasional UDP request for an NTP time stamp from the web.

By the way, the web page being served by the Arduino has a meta refresh call every 15 seconds so you can watch the changes in the system the Arduino is controlling.

Now if I include a 'checkSockStatus()' to kill off rogue external sniffing, the web page appears on the PC when requested for 30-40 seconds and then disappears! IE11 does not say why, but Firefox gives the message 'Connection to server was reset whilst the page was loading'. The only way to get anything back is to reset the Arduino. So it would appear that the close(i) in 'checkSockStatus()' is closing a socket that is actually in use, which is not good.

I've also tried an extra line in 'checkSockStatus()' instead of the close(i); the function now returns the number of sockets open.

Code: [Select]
int CheckSockStatus()
{
 // Check for locked sockets every loop
int numOpen = 0;
for (int i = 0;  i < MAX_SOCK_NUM; i++){
  uint8_t s = W5100.readSnSR(i);
  if((s == 0x17) || (s = 0x1C)) {
    //numOpen ++;
   if(systime - connectTime[i] > 60UL){
   //close(i);*/
     numOpen ++;
    }
  }
  else connectTime[i] = systime;
    //socketStat[i] = W5100.readSnSR(i);
  }
return numOpen;
}


Now when I add this to the record file that grows on the SD card it always shows '4', whereas the 'ShowSockStatus()' output shows that the status of some of the sockets is 0x00.
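A possible explanation for the constant '4', offered as a reading of the posted function rather than a confirmed diagnosis: the test `(s = 0x1C)` assigns 0x1C to s instead of comparing, and a non-zero value counts as true, so the condition holds for every socket whatever its status. The else branch that refreshes connectTime[i] is then never reached, so once the timeout elapses all four sockets are counted (and, with close(i) enabled, would all be closed, including the listening one). The sketch below isolates just that condition; the status values are passed in as an array standing in for W5100.readSnSR(i), the age check is left out, and the function names are hypothetical.

```cpp
#include <stdint.h>

const int kMaxSock = 4;  // the W5100 has four hardware sockets

// Condition as posted: "s = 0x1C" is an assignment, so the test is
// true for every socket regardless of its real status.
int countAsPosted(const uint8_t status[kMaxSock])
{
  int numOpen = 0;
  for (int i = 0; i < kMaxSock; i++) {
    uint8_t s = status[i];
    if ((s == 0x17) || (s = 0x1C)) {  // assignment, always non-zero
      numOpen++;
    }
  }
  return numOpen;
}

// Condition with "==": only sockets that are connected (0x17) or
// waiting for close (0x1C) are counted.
int countCorrected(const uint8_t status[kMaxSock])
{
  int numOpen = 0;
  for (int i = 0; i < kMaxSock; i++) {
    uint8_t s = status[i];
    if ((s == 0x17) || (s == 0x1C)) {
      numOpen++;
    }
  }
  return numOpen;
}
```

With the statuses from the first log entry (0x14, 0x0, 0x0, 0x0), countAsPosted returns 4 and countCorrected returns 0.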


I include an extract of the record file below.

Day/time Sockets in use Status
15 17:01:30 4 Socket#0 :0x14 80 D: 0.0.0.0 0
              Socket#1 :0x0 8888 D: 192.168.1.254 -123
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:31 4 Socket#0 :0x14 80 D: 0.0.0.0 0
              Socket#1 :0x0 8888 D: 192.168.1.254 -123
              Socket#2 :0x0 0 D: 0.0.0.0 0 Getting initial time stamp
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:31 4 Socket#0 :0x14 80 D: 0.0.0.0 0
              Socket#1 :0x0 8888 D: 192.168.1.254 -123
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:31 4 Socket#0 :0x14 80 D: 0.0.0.0 0

More of the same

15 17:01:40 4 Socket#0 :0x0 80 D: 192.168.1.66 -54757 Now serving web page?
              Socket#1 :0x14 80 D: 192.168.1.254 -123 Still waiting for UDP?
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:40 4 Socket#0 :0x0 80 D: 192.168.1.66 -54757
              Socket#1 :0x14 80 D: 192.168.1.254 -123
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:40 4 Socket#0 :0x0 80 D: 192.168.1.66 -54757
              Socket#1 :0x14 80 D: 192.168.1.254 -123
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:40 4 Socket#0 :0x0 80 D: 192.168.1.66 -54757
              Socket#1 :0x14 80 D: 192.168.1.254 -123
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:40 4 Socket#0 :0x0 80 D: 192.168.1.66 -54757
              Socket#1 :0x17 80 D: 192.168.1.66 -54758
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0
15 17:01:40 4 Socket#0 :0x14 80 D: 192.168.1.66 -54757
              Socket#1 :0x17 80 D: 192.168.1.66 -54758
              Socket#2 :0x0 0 D: 0.0.0.0 0
              Socket#3 :0x0 0 D: 0.0.0.0 0

ETC ETC

I'm at a loss to know what is going on.

SurferTim

#10
Jan 16, 2018, 07:02 pm Last Edit: Jan 16, 2018, 08:49 pm by SurferTim
If the sockets are showing a status of 0x17, then the client is not closing the connection. The code I gave you above should free up those sockets in a minute or so.

Edit: a socket status of 0x0 means the socket is available. Here is a status code list.
0x0 = available
0x14 = waiting for a connection
0x17 = connected
0x1C = connected waiting for close
0x22 = UDP
One of the sockets should always show 0x14.

solar_eta

It looks as if there is a clash between CheckSockStatus() closing sockets that have been open too long and the meta refresh in the web page being served! I've tried other web browsers (Edge, Safari, etc.) and all have the same problem of not refreshing the screen, presumably because the socket got closed by the serving Arduino and the meta refresh could not then see the Arduino.

I've tried both shortening and lengthening the time for the meta refresh and the 'systime - connectTime' limit, all to no great effect. In your experience, how long is it after forcing the sockets to close before they are willing to talk again across the Ethernet?

If I have to have CheckSockStatus() to kill off rogue sniffing then it looks as though I need another way of pushing a web page refresh out to the PC rather than the simple HTML meta refresh.

SurferTim

#12
Jan 19, 2018, 04:39 pm Last Edit: Jan 19, 2018, 05:12 pm by SurferTim
I made a modification to the ethernet library to get the socket number for the web client. If I desire, I can change the connectTime to the current value. That restarts the timeout counter for that socket.
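The idea of restarting the timeout counter can be sketched as below. This is only an illustration: the struct and method names are hypothetical, and how the socket number of the web client is obtained (the library modification mentioned above) is not shown in this thread, so here it is simply a parameter. The unsigned subtraction keeps the comparison valid when the millisecond counter wraps around.

```cpp
#include <stdint.h>

const int kMaxSock = 4;  // the W5100 has four hardware sockets

// Hypothetical per-socket timeout bookkeeping.
struct SocketTimers {
  uint32_t connectTime[kMaxSock];

  // Restart the timeout counter for one socket, e.g. while it is
  // legitimately busy serving a page.
  void touch(int sock, uint32_t now)
  {
    connectTime[sock] = now;
  }

  // True when the socket has gone longer than timeoutMs without a touch.
  bool expired(int sock, uint32_t now, uint32_t timeoutMs) const
  {
    return (uint32_t)(now - connectTime[sock]) > timeoutMs;
  }
};
```

In a sketch, touch() would be called with millis() for the socket currently being served, and expired() would replace the inline `thisTime - connectTime[i] > 30000UL` test.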

However, Meta refresh should open a new socket every time the page refreshes. I use JavaScript and frames to refresh a page.

Meta refresh should do this:
Open a connection,
send a request,
get a response,
close the connection,
wait for the refresh time on the recently downloaded page,
repeat

The bad part of Meta refresh is that after one failed download it stops refreshing. The JavaScript method doesn't.
