Author Topic: Ethernet stability  (Read 4447 times)
Newbie
Karma: 0
Posts: 33

I thought that the read-from-client part that you have inside your timeout section should not be able to happen (in theory...).
The Ethernet library has a timeout and retries built into it, so if the connection hangs while waiting for data, shouldn't the library force-close the socket and so allow an exit from the wait/read loop?
Or perhaps the force-close fails if a flush is not done first?
I'm just speculating here, as I don't understand the internals of the library, which is what prompted me to do it in assembly.
I'd still rather understand what's happening in the high-level code. :-/

Do you know if it's possible to read the W5100 socket status register?

That would certainly help me with trying to pinpoint where things are going wrong, or at least perhaps rule some things out.
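
For anyone wanting to try this, here is a rough sketch of decoding a socket status byte into a readable name. The status values are the Sn_SR codes from the W5100 datasheet. The helper name `socketStatusName` is made up for this example, and how the byte is fetched is an assumption: with the Ethernet library shipped around IDE 1.0.x it should be readable with `W5100.readSnSR(s)` after including `utility/w5100.h`, but check that against your own library version.

```cpp
#include <stdint.h>

// Map a W5100 Sn_SR (socket status register) value to a readable name.
// socketStatusName is a hypothetical helper, not part of the Ethernet library.
const char* socketStatusName(uint8_t status) {
    switch (status) {
        case 0x00: return "CLOSED";
        case 0x13: return "INIT";
        case 0x14: return "LISTEN";
        case 0x15: return "SYNSENT";
        case 0x16: return "SYNRECV";
        case 0x17: return "ESTABLISHED";
        case 0x18: return "FIN_WAIT";
        case 0x1A: return "CLOSING";
        case 0x1B: return "TIME_WAIT";
        case 0x1C: return "CLOSE_WAIT";
        case 0x1D: return "LAST_ACK";
        case 0x22: return "UDP";
        case 0x32: return "IPRAW";
        case 0x42: return "MACRAW";
        case 0x5F: return "PPPOE";
        default:   return "UNKNOWN";   // transient or undocumented value
    }
}
```

Printing this for each of the four sockets while the sketch is stuck should help rule things out: a socket sitting in ESTABLISHED with nothing arriving means no close was ever seen, while CLOSE_WAIT means the remote close did arrive.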

Miramar Beach, Florida
Faraday Member
Karma: 147
Posts: 6040

The W5100 has a problem with hardware failures. If the server closes the connection normally (not a failure), all goes well. If the connection breaks, the server's close message never reaches the client, and that while loop never exits.

If the Ethernet firmware/library is supposed to time out on its own during a receive, it doesn't. At least it didn't when I last checked, and I think you are rechecking it right now. :)
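
As a rough illustration of the idea (not the exact code from earlier in the thread), a receive loop with an idle timeout could look like the sketch below. It is plain C++ so the logic can be tried off the board. `StalledClient`, `readResponse` and `nowMs` are names invented for this example. On an Arduino the client would be an `EthernetClient` and the clock would be `millis()`.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical stand-in for EthernetClient. It hands over `data`, then stays
// "connected" without ever sending more, like a link that broke before the
// server's close could arrive.
struct StalledClient {
    std::string data;
    size_t pos = 0;
    bool open = true;
    bool connected() const { return open; }
    int available() const { return open ? static_cast<int>(data.size() - pos) : 0; }
    int read() { return pos < data.size() ? static_cast<unsigned char>(data[pos++]) : -1; }
    void stop() { open = false; }
};

// Read until the peer closes, or until no byte has arrived for timeoutMs.
// Returns true on a normal close, false if the idle timeout fired.
template <typename Client, typename Clock>
bool readResponse(Client& client, Clock nowMs, uint32_t timeoutMs, std::string& out) {
    uint32_t lastByte = nowMs();
    while (client.connected()) {
        while (client.available() > 0) {
            out += static_cast<char>(client.read());
            lastByte = nowMs();                    // any received byte restarts the idle timer
        }
        if (nowMs() - lastByte >= timeoutMs) {     // unsigned subtraction survives counter rollover
            client.stop();                         // give up and free the socket
            return false;
        }
    }
    client.stop();
    return true;
}
```

The timer measures idle time rather than total time, so a slow but live server is not cut off. Only a connection that goes completely silent trips it.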

Newbie

Yes, you are right that there is no receive timeout (in the W5100, anyway). I don't understand the Ethernet library well enough to know whether there is one in the software.
I was getting mixed up with the connect/transmit timeout. I implemented a receive timer in my assembly language version for just that reason; otherwise, as you say, you never get out of the loop.
I kept a stack of testing logs when I was working on the client side, so I just had a look through them and, right enough, the receive timeout error crops up several (2~5) times daily for ThingSpeak. It tends to happen around the same period as 502 Bad Gateway responses, also a daily occurrence. I put it down to server loading during busy periods, where the server could not keep up (?).
The other errors that repeatedly come up are:
- While waiting for a response to the connect request, a close arrives from the remote end.
- The connection request simply times out.
At least both of these are simple to deal with but, as you pointed out, the receive side can just hang forever. I suspect that will be the source of at least one of my problems, so thanks for highlighting that one.

The timeouts/bad-gateway errors seem not to happen for ages, then a block of them appears off and on for perhaps 20 minutes, then it is all back to plain sailing again. It's quite possible that the Arduino would sometimes get 'stuck' at this point, but also possible for it to get lucky, sail through, and not get caught until another time.

I'll try with the timeouts in place and see how it goes.

Global Moderator
Brattain Member
Karma: 485
Posts: 18815
Lua rocks!

I have an Ethernet shield that checks a couple of other servers, roughly every 10 seconds. It ran OK for weeks but eventually seemed to hang occasionally. I added a watchdog reset and so far, no problems. In setup I added:

Code:
  // watchdog setup in case shield hangs
  wdt_enable(WDTO_8S);  // reset after eight seconds, if no "pat the dog" received

Before connecting to a server I have:

Code:
    wdt_reset();  // give me eight seconds to do stuff (pat the dog)

And that's it! (Plus an include at the start of the file):

Code:
#include <avr/wdt.h>


Miramar Beach, Florida
Faraday Member

I do not use a watchdog timer in any of my code. But that is just me...

OTOH, Nick, I am using IDE v1.0.4, and I know how you are about upgrades. ;)
Do you use that timeout code? Like I told the OP, the failures that happen once every couple of weeks or months are the tough ones to find.

Global Moderator
Brattain Member

Quote
Upgrade: Exchange your old bugs for new ones.

But as it turns out I am on 1.0.4 right now. :)

I don't fiddle around with Ethernet timeouts; they just seem to happen in a timely way. The watchdog lines are all I use, and to be honest I don't know if they kick in often because the board just seems to keep working.

I have one in my garage monitoring whether the roller door is open or not, with no watchdog, and I've never had to reboot that except once, I think, after a brown-out of the house power. I'm talking a couple of years here.


Miramar Beach, Florida
Faraday Member

AH HA!! I KNEW IT!! Deep down inside, you always wanted a reliable current version. :)

I bumped your karma 'cause I like your stuff.

Global Moderator
Brattain Member

Thanks. :)

Actually 1.0.4 is the first one to fix the annoying issue with free() causing crashes, so it is good to have installed.


Miramar Beach, Florida
Faraday Member

@Nick: Do you use that timeout code? None of my sketches need a watchdog timer.

Global Moderator
Brattain Member

No. This is it: No watchdog, no timeout.

Code:
/*
 
 Garage door open/closed detector.
 
  Based on Web Server by:
 
 created 18 Dec 2009
 by David A. Mellis
 modified 4 Sep 2010
 by Tom Igoe
 
 Modified by Nick Gammon
 9th Feb 2011
 
 */

#include <SPI.h>
#include <Ethernet.h>

// Enter a MAC address and IP address for your controller below.
// The IP address will be dependent on your local network:
byte mac[] = {  0x90, 0xA2, 0xDA, 0x00, 0x2D, 0xA1 };

// our address
byte ip[] = { 10, 0, 0, 240 };

// the router's gateway address:
byte gateway[] = { 10, 0, 0, 1 };

// the subnet:
byte subnet[] = { 255, 255, 255, 0 };

// which pin to connect the relay to (held high by internal pullup)
#define relayPin 7

// Initialize the Ethernet server library
// with the IP address and port you want to use
// (port 80 is default for HTTP):
Server server(80);

void setup()
{
  // start the Ethernet connection and the server:
  Ethernet.begin(mac, ip, gateway, subnet);
  server.begin();
 
  // initialize the relay pin as an input:
  pinMode(relayPin, INPUT);
  digitalWrite(relayPin, HIGH);   // set pullup resistor

}

void loop()
{
  // listen for incoming clients
  Client client = server.available();
  if (client) {
    // an http request ends with a blank line
    boolean currentLineIsBlank = true;
    while (client.connected()) {
      if (client.available()) {
        char c = client.read();
        // if you've gotten to the end of the line (received a newline
        // character) and the line is blank, the http request has ended,
        // so you can send a reply
        if (c == '\n' && currentLineIsBlank) {
          // send a standard http response header
          client.println("HTTP/1.1 200 OK");
          client.println("Content-Type: text/html");
          client.println();

          client.print ("Garage door is ");
          byte door = digitalRead(relayPin);
          if (door == HIGH)
            client.println ("closed.");
          else
            client.println ("open.");
       
          break;
        }
        if (c == '\n') {
          // you're starting a new line
          currentLineIsBlank = true;
        }
        else if (c != '\r') {
          // you've gotten a character on the current line
          currentLineIsBlank = false;
        }
      }
    }
    // give the web browser time to receive the data
    delay(1);
    // close the connection:
    client.stop();
  }
}


Miramar Beach, Florida
Faraday Member

@Nick: That is a server sketch. My server stuff doesn't need the timeout code either, just the client sketches. This isn't one of those errors you find right away. I had to create the error to find it. It is an "it runs for a few days, then crashes" kinda thing.

Global Moderator
Brattain Member

OK, well my client that monitors my garage door server (the server sketch above) is the one I put the watchdog in.

I have it turn on an LED when it starts to query the server and turn it off once completed. Every now and again (like, every few weeks) the LED would stay on. It hasn't since I put the watchdog in.

I also have it monitoring my son's Minecraft server, by connecting to it and finding how many players are online. So it now connects to two servers. So far so good. It's kind of cool seeing the number of players on an 8x8 LED matrix sitting there.

The client sketch does not have a timeout, other than the watchdog.


Miramar Beach, Florida
Faraday Member

Quote
The client sketch does not have a timeout, other than the watchdog.

Then the watchdog is substituting for the timeout, as it probably was for the OP. We are trying to eliminate the need for a watchdog restart.

Global Moderator
Brattain Member

Fair enough, but for something that goes wrong occasionally I would be happy to have it there.


Newbie

At last I have found some answers to my reliability issues. The problem seems to be at my service provider's end, not in my software after all.
I did a test with my new Mega1284 board http://byremote.blogspot.com.au/2013/07/the-new-mini-biggie.html using a Wiz820 module, running side by side with my original system. My original system (an Ethermega) logs the assorted sensor data; the new board logs a test sine wave.
Looking at the two lots of logged data on ThingSpeak, the breaks in the data match up.
Then I put my 1284 board at a friend's place logging the test sine wave, and the breaks went away.
At home I use wireless broadband with Telstra (phone line internet is not available); my friend's place uses ADSL2 down the phone line.
Using the ADSL internet, the large gaps and dropouts of data are no longer present. There are still occasions when one or two posts have been missed, but these are rare. They happen from both locations but do not coincide; they seem to be random. The 30-minute and 1-hour gaps have gone with the wired internet. On the home front, the large drops are still present.
This gives me some hope that I'm not doing something fundamentally wrong after all. :)
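
In case it helps anyone doing the same comparison, the "do the breaks match up?" check could be done roughly as below rather than by eye. This is only a sketch under assumptions: the post timestamps are taken to be seconds, already sorted, and exported from each feed by some means not shown here. `findGaps` and `countSharedGaps` are names invented for the example.

```cpp
#include <algorithm>
#include <stddef.h>
#include <utility>
#include <vector>

// A gap is a (last post before, first post after) pair of timestamps in seconds.
using Gap = std::pair<long, long>;

// Report every interval between consecutive posts that is longer than maxSpacing.
// Timestamps are assumed to be sorted in ascending order.
std::vector<Gap> findGaps(const std::vector<long>& stamps, long maxSpacing) {
    std::vector<Gap> gaps;
    for (size_t i = 1; i < stamps.size(); ++i) {
        if (stamps[i] - stamps[i - 1] > maxSpacing) {
            gaps.emplace_back(stamps[i - 1], stamps[i]);
        }
    }
    return gaps;
}

// Count the gaps in `a` that overlap in time with at least one gap in `b`.
size_t countSharedGaps(const std::vector<Gap>& a, const std::vector<Gap>& b) {
    size_t shared = 0;
    for (const Gap& ga : a) {
        bool overlaps = std::any_of(b.begin(), b.end(), [&ga](const Gap& gb) {
            return ga.first < gb.second && gb.first < ga.second;
        });
        if (overlaps) ++shared;
    }
    return shared;
}
```

If most of the long gaps from two boards on the same connection are shared, the link or the server is the likely culprit. Gaps that show up on one board only point back at that board or its sketch.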