Ethernet data transfer stalling or failing

yes available() tells you how many bytes are in buffer for reading, but it can be used in if statement to evaluate to true (any non zero value) or false (0 value).

anyway, you should never rely on available to determine it is the end of stream.

Do you control the server sending the data (I think so since it is a local ip address), are you sure the server is not the one stalling?

int length = 0;
while (client.connected()) {
while (client.available()) {
char c = client.read();
length++;
if (c== -1) {
client.stop();
}
}
}

stream read() is a blocking operation, if there is nothing to read, it blocks and stays there until it comes back with data or returns -1 if there is no more data/end of stream. I have not encountered any stream implementation where read() will return -1 if it is not the end of stream.

BTW, you are right that if the server is the one stalling, by not sending more data and by not disconnecting, then available will always be false and connected will always be true, and you will be stuck in the loop.

try the above code. I highly doubt this is a hardware issue. if your server is truly not stalling, then I see absolutely no reason why the above code won't work.

PaulS:

What confuses me is why available() seems to never return true after the data has started to stall, but presumably at some point we'd expect it to become true when more data was available in the buffer from the source?

First, available() returns the number of bytes available to read, not true or false.

Second, this really looks like a failure on the part of the server to send all the data.

Apologies, you're right (although technically I think we can both agree that available() returns something that can be considered 'truthy').

However, I'm pretty sure that it's not the server stalling. I'll see if I can produce some Wireshark logs to demonstrate this, but anecdotally for the moment, what I see is the server and arduino sending packets and ACKing, but then the arduino fails to ACK, and the server retransmits the latest packet again and again until the connection is closed.

I'll try to get a concrete TCP log to hopefully demonstrate this.

Here's a wireshark log of what's going on between the arduino (192.168.1.17) and the server (192.168.1.22) when it seems to get stuck in the 'connected but never available' loop:

No.     Time        Source                Destination           Protocol Length Info
  75049 617.344130  192.168.1.17          192.168.1.22          TCP      60     [TCP Port numbers reused] fpitp > tram [SYN] Seq=0 Win=2048 Len=0 MSS=1460
  75050 617.344324  192.168.1.22          192.168.1.17          TCP      58     tram > fpitp [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460
  75051 617.344508  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [ACK] Seq=1 Ack=1 Win=2048 Len=0
  75053 617.345776  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=1 Ack=1 Win=2048 Len=4
  75054 617.345829  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=5 Win=65535 Len=0
  75055 617.345958  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=5 Ack=1 Win=2048 Len=1
  75056 617.345979  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=6 Win=65535 Len=0
  75057 617.346337  192.168.1.17          192.168.1.22          TCP      64     fpitp > tram [PSH, ACK] Seq=6 Ack=1 Win=2048 Len=10
  75058 617.346371  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=16 Win=65535 Len=0
  75059 617.346527  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=16 Ack=1 Win=2048 Len=1
  75060 617.346552  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=17 Win=65535 Len=0
  75061 617.346720  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=17 Ack=1 Win=2048 Len=1
  75062 617.346746  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=18 Win=65535 Len=0
  75063 617.347400  192.168.1.22          192.168.1.17          TCP      300    tram > fpitp [PSH, ACK] Seq=1 Ack=18 Win=65535 Len=246
  75064 617.347423  192.168.1.22          192.168.1.17          TCP      1514   tram > fpitp [ACK] Seq=247 Ack=18 Win=65535 Len=1460
  75091 617.549843  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [ACK] Seq=18 Ack=1707 Win=1286 Len=0
  75092 617.549900  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  75223 618.449705  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  75416 619.954817  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  75763 622.580803  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  76443 627.679946  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  77878 637.667428  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  79925 652.912510  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  85163 682.866999  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286

I think this demonstrates that the server is waiting for the Arduino to send an ACK packet.

I have not encountered any stream implementation where read() will return -1 if it is not the end of stream.

Welcome to the Wonderful World of Arduino!

If client.read() returns -1 (int, not char), that means "no characters available", not end of stream. That just means you have read everything in the packet you have received so far. It should be the same as
client.available() == 0

That does not mean the server is finished sending packets. It will close the connection when it is finished sending packets.

@lazyatom: Do you use Linux?

SurferTim:
@lazyatom: Do you use Linux?

Sorry, I didn't reply earlier. I'm using Mac OS X 10.7.3 (aka 'Lion') on a 64bit iMac.

doughboy:
Do you control the server sending the data (I think so since it is a local ip address), are you sure the server is not the one stalling?

I do; it's a very simple server but you're right to consider it. You can see it in the gist (Arduino Ethernet flaky transfer example · GitHub) - it's a very simple ruby script. However, my reading of the wireshark log above is that the server is waiting for a response from the arduino.

doughboy:
int length = 0;
while (client.connected()) {
while (client.available()) {
char c = client.read();
length++;
if (c== -1) {
client.stop();
}
}
}

I replaced my code with this (tweaked slightly - only length++ if c != -1), but then it seems to fail to connect after the first request:

4501: IP address: 192.168.1.17
4504: Connected.
6285: Total read: 8346
Total failures: 0/1
7286: Couldn't connect.
8288: Couldn't connect.
9288: Couldn't connect.
10290: Couldn't connect.
11290: Couldn't connect.
12292: Couldn't connect.
13292: Couldn't connect.
14294: Couldn't connect.
15294: Couldn't connect.
16295: Couldn't connect.
17296: Couldn't connect.

... obviously the next time loop() is run, the client.connect() is failing, but I'm not sure why moving client.stop() inside the while loops would cause this. I had assumed that after the call to client.stop(), the client would be in a good state to connect again.

Close enough to Linux. You might want to apply the fix for the "605 Bug" I mentioned above, just to be sure.
http://code.google.com/p/arduino/issues/detail?id=605

I have applied that fix to every release since v0022. It will be fixed in the new release, according to the Arduino code crew.

If it doesn't correct the problem, at least you will be ahead of the latest release! :slight_smile:

ok, Tim is right, I checked the Ethernet Library source and found this

  uint16_t ret = W5100.getRXReceivedSize(s);
  if ( ret == 0 )
  {
    // No data available.
    uint8_t status = W5100.readSnSR(s);
    if ( status == SnSR::LISTEN || status == SnSR::CLOSED || status == SnSR::CLOSE_WAIT )
    {
      // The remote end has closed its side of the connection, so this is the eof state
      ret = 0;
    }
    else
    {
      // The connection is still up, but there's no data waiting to be read
      ret = -1;
    }
  }

How about changing the test code to

 int length = 0;
    while (client.connected()) {
         while (client.available()) {
               char c = client.read();
               length++;
               if (c<= 0) {  //try c==0 also
                   client.stop();
               }
         }
   }

FWIW, I still think it is still true that a return of -1 means there is no more data to be read, forever. 0 just means the server disconnected or is not there anymore.

Because it could be sending a zero byte. Or a -1 byte (not int). When the server is finished, it will close the connection. This has not failed for me:

    while(client.connected()) {
       // the server will close the connection when it is finished sending packets
       while(client.available()) {
         // ...but not until this buffer is empty
         Serial.write(client.read());
       }
     }
    // close your end after the server closes its end
    client.stop();
    int length = 0;
    while (client.connected()) {
         while (client.available()) {
               char c = client.read();
               if (c<= 0) {  
                   client.stop();
               } else {
                     length++;
               }
         }
   }
   client.stop();

There is no problem receiving or sending any 8 bit values over the ethernet connection, including 0 and -1 (255). The client.read() function returns an int value. If the int value is less than 256, then it is a legitimate character. The "no characters available" is a -1 INT return (65535, not 255).

doughboy:

    int length = 0;

while (client.connected()) {
         while (client.available()) {
               char c = client.read();
              // here is your problem
              // what if this is a binary file, rather than text
               if (c<= 0) { 
                   client.stop();
               } else {
                     length++;
               }
         }
   }
   client.stop();

int length = 0;
    while (client.connected()) {
         while (client.available()) {
               int c = client.read();
               // happy now? :)
               if (c<= 0) {  
                   client.stop();
               } else {
                     length++;
               }
         }
   }
   client.stop();

I've applied the patch from Google Code Archive - Long-term storage for Google Code Project Hosting. to w5100.h (I'm pretty confident I edited the right file, because I missed out a semicolon the first time and nothing would compile ;))

SurferTim:
Because it could be sending a zero byte. Or a -1 byte (not int). When the server is finished, it will close the connection. This has not failed for me:

Alas, no joy. Here's the full 'SurferTim' sketch:

#include <SPI.h>
#include <Ethernet.h>

byte mac[] = { 0x90, 0xA2, 0xDA, 0x00, 0x86, 0x67 }; //physical mac address
EthernetClient client;

#define debug(a) Serial.print(millis()); Serial.print(": "); Serial.println(a);
#define debug2(a, b) Serial.print(millis()); Serial.print(": "); Serial.print(a); Serial.println(b);

void setup(){
  Serial.begin(9600);

  // start the Ethernet connection:
  if (Ethernet.begin(mac) == 0) {
    debug("DHCP Failed");
    // no point in carrying on, so do nothing forevermore:
    while(true);
  }
  // print your local IP address:
  debug2("IP address: ", Ethernet.localIP());
}

char* host = "192.168.1.22";
int port = 4567;
char *path = "/";

int totalRequestSize = 8346;
int failures = 0;
int requests = 0;

void loop() {
  if (client.connect(host, port)) {
    debug("Connected.");
    client.print("GET "); client.print(path); client.println(" HTTP/1.0\n"); 
    int length = 0;

    while(client.connected()) {
      debug("Still connected");
      // the server will close the connection when it is finished sending packets
      while(client.available()) {
        // ...but not until this buffer is empty
        Serial.write(client.read());
        length++;
      }
      debug("Done reading for now...");
    }
    // close your end after the server closes its end
    client.stop();
    
    requests++;
    if (length != totalRequestSize) {
      failures++;
    }
    debug2("Total read: ", length);
    Serial.print("Total failures: "); Serial.print(failures); Serial.print("/"); Serial.println(requests);
  } else {
    debug("Couldn't connect.");
  }
  delay(1000);
}

... and here's the output:

4501: IP address: 192.168.1.17
4705: Connected.
4707: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:26:43 GMT
Connection: close

################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
13361: Done reading for now...
13394: Total read: 8346
Total failures: 0/1
14443: Connected.
14445: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:26:52 GMT
Connection: close

################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
23100: Done reading for now...
23135: Total read: 8346
Total failures: 0/2
24182: Connected.
24184: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:27:02 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
32840: Done reading for now...
32874: Total read: 8346
Total failures: 0/3
33923: Connected.
33924: Still connected
33925: Done reading for now...
33933: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:27:12 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
42638: Done reading for now...
42672: Total read: 8346
Total failures: 0/4
43720: Connected.
43722: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:27:22 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
############################46257: Done reading for now...
46290: Still connected
46315: Done reading for now...
46349: Still connected
46373: Done reading for now...
46407: Still connected
46432: Done reading for now...
46466: Still connected
46490: Done reading for now...
46523: Still connected
46548: Done reading for now...
46581: Still connected
46607: Done reading for now...
46640: Still connected
46665: Done reading for now...
46698: Still connected
46723: Done reading for now...
46756: Still connected
46781: Done reading for now...
46815: Still connected
46839: Done reading for now...
46873: Still connected
<snip>
72265: Still connected
72291: Done reading for now...
72324: Still connected
72348: Done reading for now...
72382: Still connected
72407: Done reading for now...
72440: Still connected
72465: Done reading for now...
72499: Still connected
72523: Done reading for now...
72557: Still connected
72582: Done reading for now...
72614: Still connected
72640: Done reading for now...

As you can see, at some point it just stalls; I left it for around 25 seconds before pulling the plug. I'll paste the results of the 'doughboy' test implementation next (but it's the same story, basically)

Here's the corresponding 'doughboy' implementation, using int not char, and with Serial output to make the logging comparable:

#include <SPI.h>
#include <Ethernet.h>

byte mac[] = { 0x90, 0xA2, 0xDA, 0x00, 0x86, 0x67 }; //physical mac address
EthernetClient client;

#define debug(a) Serial.print(millis()); Serial.print(": "); Serial.println(a);
#define debug2(a, b) Serial.print(millis()); Serial.print(": "); Serial.print(a); Serial.println(b);

void setup(){
  Serial.begin(9600);

  // start the Ethernet connection:
  if (Ethernet.begin(mac) == 0) {
    debug("DHCP Failed");
    // no point in carrying on, so do nothing forevermore:
    while(true);
  }
  // print your local IP address:
  debug2("IP address: ", Ethernet.localIP());
}

char* host = "192.168.1.22";
int port = 4567;
char *path = "/";

int totalRequestSize = 8346;
int failures = 0;
int requests = 0;

void loop() {
  if (client.connect(host, port)) {
    debug("Connected.");
    client.print("GET "); client.print(path); client.println(" HTTP/1.0\n"); 

    int length = 0;
    while (client.connected()) {
      debug("Still connected");
         while (client.available()) {
               int c = client.read(); // Yes, int! :)
               if (c<= 0) {  
                   client.stop();
               } else {
                 Serial.write(c);
                 length++;
               }
         }
      debug("Done reading for now...");
   }
   client.stop();
       
    requests++;
    if (length != totalRequestSize) {
      failures++;
    }
    debug2("Total read: ", length);
    Serial.print("Total failures: "); Serial.print(failures); Serial.print("/"); Serial.println(requests);
  } else {
    debug("Couldn't connect.");
  }
  delay(1000);
}

... and the serial monitor output:

4501: IP address: 192.168.1.17
4504: Connected.
4505: Still connected
HTTP/1.1 200 OK 
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 18:41:16 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<snip>
################################################################################
################################################################################
################################################################################
##############################################################################13102: Done reading for now...
13135: Still connected
13160: Done reading for now...
13194: Still connected
13218: Done reading for now...
13252: Still connected
13277: Done reading for now...
13310: Still connected
13335: Done reading for now...
13368: Still connected
13393: Done reading for now...
13426: Still connected
13452: Done reading for now...
<snip>
30840: Still connected
30865: Done reading for now...
30899: Still connected
30923: Done reading for now...
30957: Still connected
30982: Done reading for now...
31015: Still connected

FWIW, via Wireshark I'm still seeing the same TCP Retransmission behaviour as always from the server; in this most recent case, the Ardunio ACKs at 8069 bytes (so achingly close the full payload!), then the server sends another packet back with 194 further bytes, and ends up retransmitting the same packet ten times before I pull the plug (...in other words, I'm still relatively confident that it's not the server).

I should also say again, I really do appreciate you all taking the time to help me with this headscratcher.

@doughboy: Did you go to U.C. Berkeley? -1 int means you have not received the next packet yet. NOT the last packet. The server will close the connection when it is finished.

You can think anything you want. You can try to close the connection any way you want. My stuff works, and I have plenty of spare time to wait while you figure that out. :smiley:

@lazyatom: Normally, that corrects the problem. I'll look at your code.

SurferTim:
@lazyatom: Normally, that corrects the problem. I'll look at your code.

Thanks; if you want to replicate the server, you can find it in the gist I posted earlier: Arduino Ethernet flaky transfer example · GitHub

You'll need ruby and the 'sinatra' rubygem; running 'ruby server.rb' (if that's what you called it) will bring up a server process on port 4567.

I don't think the patch is related to your issue, because you are getting available==0 and not > 1024.

I suggest just try a different server.

I'll go with doughboy on that one. Try a standard http server that returns something you can read. It may be your server.

But if you are seeing this on your serial display, that is not printing all this because "client.available() == 0". It is returning a non-zero number when it shouldn't be. If it were returning zero, it would not have printed any of this.

################################################################################
################################################################################
################################################################################

################################################################################

ok, you can skip the test for 0 and -1 then. This is really a non standard implementation of stream read.

this is from the library source

int EthernetClient::read() {
  uint8_t b;
  if ( recv(_sock, &b, 1) > 0 )
  {
    // recv worked
    return b;
  }
  else
  {
    // No data available
    return -1;
  }
}

doughboy:
ok, you can skip the test for 0 and -1 then. This is really a non standard implementation of stream read.

this is from the library source

int EthernetClient::read() {

uint8_t b;
 if ( recv(_sock, &b, 1) > 0 )
 {
   // recv worked
   return b;
 }
 else
 {
   // No data available
   return -1;
 }
}

Stick around a while, doughboy. I could use the help! :slight_smile: