Pages: [1] 2 3   Go Down
Author Topic: Ethernet data transfer stalling or failing  (Read 9555 times)
0 Members and 1 Guest are viewing this topic.
Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

I've been working on a sketch wherein some data is downloaded from an HTTP server and is then processed on the Arduino (printed, as it happens, but I don't think that's important). In my original sketches, I was occasionally seeing transfers fail midway through.

The original sketch was more complex (the data was being cached to a file on an SD card, and there was code which attempted to handle pauses in the data), but upon whittling the sketch down to a minimal skeleton, I am still seeing around 10% of the transfers fail to download all of the data.

I've attached the sketch and serial monitor log to a gist - https://gist.github.com/1994216 - and I've included the sketch and a snippet of the log here:

Code:
#include <SPI.h>
#include <Ethernet.h>

byte mac[] = { 0x90, 0xA2, 0xDA, 0x00, 0x86, 0x67 }; //physical mac address
EthernetClient client;

#define debug(a) Serial.print(millis()); Serial.print(": "); Serial.println(a);
#define debug2(a, b) Serial.print(millis()); Serial.print(": "); Serial.print(a); Serial.println(b);

void setup(){
  Serial.begin(9600);

  // start the Ethernet connection:
  if (Ethernet.begin(mac) == 0) {
    debug("DHCP Failed");
    // no point in carrying on, so do nothing forevermore:
    while(true);
  }
  // print your local IP address:
  debug2("IP address: ", Ethernet.localIP());
}

char* host = "192.168.1.22";
int port = 4567;
char *path = "/";

int totalRequestSize = 8346;
int failures = 0;
int requests = 0;

void loop() {
  if (client.connect(host, port)) {
    debug("Connected.");
    client.print("GET "); client.print(path); client.println(" HTTP/1.0\n");
    int length = 0;
    while(client.connected() && !client.available()) delay(1); //waits for data
    while(client.available()) {
      client.read();
      length++;
    }
    client.stop();
    requests++;
    if (length != totalRequestSize) {
      failures++;
    }
    debug2("Total read: ", length);
    Serial.print("Total failures: "); Serial.print(failures); Serial.print("/"); Serial.println(requests);
  } else {
    debug("Couldn't connect.");
  }
}

Code:
170569: Connected.
172350: Total read: 8346
Total failures: 6/93
172555: Connected.
174336: Total read: 8346
Total failures: 6/94
174339: Connected.
175706: Total read: 1706
Total failures: 7/95
175709: Connected.
177753: Total read: 4889
Total failures: 8/96
177756: Connected.
179536: Total read: 8346
Total failures: 8/97
179539: Connected.
181321: Total read: 8346
Total failures: 8/98
181324: Connected.
182976: Total read: 7740
Total failures: 9/99
182979: Connected.
184761: Total read: 8346
Total failures: 9/100
184764: Connected.
186546: Total read: 8346
Total failures: 9/101
186549: Connected.
188190: Total read: 2989
Total failures: 10/102
188193: Connected.
189976: Total read: 8346
Total failures: 10/103
189979: Connected.
191762: Total read: 8346
Total failures: 10/104

The data which is being downloaded is simply an ASCII file for 100 rows of 80 '#' characters. As you can see in the logs above, most requests download 8346 bytes (the response headers + the ~8000 byte file), but intermittently there are requests where the data terminates earlier at any point in the stream.

It's running on an Arduino Uno and a recent Ethernet shield (including an SD card slot), using the Arduino 1.0 IDE.

As you can imagine, this is quite frustrating, as I'd like this system to run without much intervention. As I mentioned above, earlier versions of the sketch included code to wait for a certain period of time if there was no data available, checking periodically to see if the data can be resumed before finally closing the connection, but I still see the behaviour with that logic present.

Does anyone have any ideas about what might be going on, or even how I can make this sketch simpler to try and isolate where the issue might be?
Logged

Miramar Beach, Florida
Offline Offline
Faraday Member
**
Karma: 147
Posts: 6038
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

This code may not work like you think. It is not waiting for any new packets.
Code:
   while(client.connected() && !client.available()) delay(1); //waits for data
    while(client.available()) {
      client.read();
      length++;
    }
    client.stop();

This will do better. Replace that with this:
Code:
   while(client.connected()) {
       // the server will close the connection when it is finished sending packets
       while(client.available()) {
         // ...but not until this buffer is empty
         client.read();
         length++;
       }
     }
    // close your end after the server closes its end
    client.stop();

edit: My bad. I forgot a bracket.
« Last Edit: March 07, 2012, 11:59:15 am by SurferTim » Logged

Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Thanks for that pointer. However, I believe the stalling behaviour still exists (and unfortunately with your change it's harder to see). Here's the output in serial monitor with your change in place:

Code:

4501: IP address: 192.168.1.17
4504: Connected.
6284: Total read: 8346
Total failures: 0/1
6489: Connected.
8268: Total read: 8346
Total failures: 0/2
8271: Connected.
10052: Total read: 8346
Total failures: 0/3
10055: Connected.
11836: Total read: 8346
Total failures: 0/4
11839: Connected.
13620: Total read: 8346
Total failures: 0/5
13623: Connected.

... at which point it just waits. I believe it's now stuck within the loop. In fact, with a few tweaks I think I can demonstrate that it's true. If I tweak your replacement slightly:

Code:
    while(client.connected()) {
       debug("Still connected");
       // the server will close the connection when it is finished sending packets
       while(client.available()) {
         // ...but not until this buffer is empty
         client.read();
         length++;
       }
       debug("Done reading for now...");
     }
    // close your end after the server closes its end
    client.stop();

then I see this output in serial monitor:

Code:
13834: Connected.
13836: Still connected
13837: Done reading for now...
13845: Still connected
15647: Done reading for now...
15649: Total read: 8346
Total failures: 0/6
15662: Connected.
15682: Still connected
17482: Done reading for now...
17484: Total read: 8346
Total failures: 0/7
17498: Connected.
17517: Still connected
19318: Done reading for now...
19320: Total read: 8346
Total failures: 0/8
19334: Connected.
19353: Still connected
19995: Done reading for now...
19996: Still connected
19996: Done reading for now...
20019: Still connected
20044: Done reading for now...
20077: Still connected
20102: Done reading for now...
20135: Still connected
20160: Done reading for now...
20194: Still connected
20218: Done reading for now...
20252: Still connected
20277: Done reading for now...
20311: Still connected
20335: Done reading for now...
20368: Still connected
20393: Done reading for now...
20426: Still connected
20452: Done reading for now...
20485: Still connected
20510: Done reading for now...
20543: Still connected
20568: Done reading for now...
20601: Still connected
20626: Done reading for now...
20660: Still connected
20684: Done reading for now...
20718: Still connected
20743: Done reading for now...
20776: Still connected
20801: Done reading for now...

(and so on).

Any ideas?
Logged

Miramar Beach, Florida
Offline Offline
Faraday Member
**
Karma: 147
Posts: 6038
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

It must be receiving characters there. Why don't you take a look?

Code:
      while(client.available()) {
         // ...but not until this buffer is empty
         Serial.write(client.read());
         length++;
       }

edit: You might also want to disable the SD SPI interface on your shield. That caused me some grief like that.
Code:
void setup()
{
   Serial.begin(9600);
   pinMode(4,OUTPUT);
   digitalWrite(4,HIGH);
   // rest of setup
}
« Last Edit: March 07, 2012, 12:19:35 pm by SurferTim » Logged

SF Bay area
Offline Offline
Sr. Member
****
Karma: 3
Posts: 268
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

isn't it that with streams, only reading a value of -1  tells you it is at the end of stream?

your code is only relying on client.available()

source can always send partially filled buffers, in which case, available will return false, but that does not mean it is the end of stream.
Logged

Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

I think at the point it gets stuck it in the loop, it's not really receiving anything - it's just that 'connected()' is true, but 'available()' is false, and seems to stay that way. Here's the serial monitor log with the extra logging you suggest (and slightly trimmed for your scrolling benefit):

Code:
4501: IP address: 192.168.1.17
4504: Connected.
4505: Still connected
HTTP/1.1 200 OK
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 17:19:08 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<...94 identical rows snipped for brevity...>
################################################################################
################################################################################
################################################################################
13190: Done reading for now...
13223: Total read: 8346
Total failures: 0/1
13473: Connected.
13475: Still connected
HTTP/1.1 200 OK
X-Frame-Options: sameorigin
X-Xss-Protection: 1; mode=block
Content-Type: text/html;charset=utf-8
Content-Length: 8100
Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30)
Date: Wed, 07 Mar 2012 17:19:17 GMT
Connection: close

################################################################################
################################################################################
################################################################################
<... 42 rows snipped for brevity ...>
################################################################################
################################################################################
###########################################################################17828: Done reading for now...
17862: Still connected
17887: Done reading for now...
17921: Still connected
17945: Done reading for now...
17978: Still connected
18003: Done reading for now...
18036: Still connected
18062: Done reading for now...
<... another snip, but you can see that it's still waiting 30 seconds later... >
46516: Still connected
46541: Done reading for now...
46574: Still connected
46599: Done reading for now...
46632: Still connected
46657: Done reading for now...
46691: Still connected
46715: Done reading for now...
46749: Still connected
46774: Done reading for now...
46808: Still connected
46832: Done reading for now...
46865: Still connected
46891: Done reading for now...
46923: Still connected

Thanks for taking the time to help, I definitely appreciate it. Do you have any other things I might try to isolate what's going wrong?
Logged

Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

isn't it that with streams, only reading a value of -1  tells you it is at the end of stream?

your code is only relying on client.available()

source can always send partially filled buffers, in which case, available will return false, but that does not mean it is the end of stream.

That's interesting - so if I understand correctly, you're saying that I should really be checking whether or not read() returns -1? What confuses me is why available() seems to never return true after the data has started to stall, but presumably at some point we'd expect it to become true when more data was available in the buffer from the source?
Logged

Seattle, WA USA
Offline Offline
Brattain Member
*****
Karma: 615
Posts: 49388
Seattle, WA USA
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
What confuses me is why available() seems to never return true after the data has started to stall, but presumably at some point we'd expect it to become true when more data was available in the buffer from the source?
First, available() returns the number of bytes available to read, not true or false.

Second, this really looks like a failure on the part of the server to send all the data.
Logged

Miramar Beach, Florida
Offline Offline
Faraday Member
**
Karma: 147
Posts: 6038
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

isn't it that with streams, only reading a value of -1  tells you it is at the end of stream?
Not in this case. if client.read() returns -1, that means "no characters available".

So it locks up in the "while(client.available)" loop?

I don't know if this is your problem, but if you use Linux, you may suffer from this malady. It causes "client.available()" to return non-zero values when it shouldn't.
http://code.google.com/p/arduino/issues/detail?id=605
Logged

SF Bay area
Offline Offline
Sr. Member
****
Karma: 3
Posts: 268
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

yes available() tells you how many bytes are in buffer for reading, but it can be used in if statement to evaluate to true (any non zero value) or false (0 value).

anyway, you should never rely on available to determine it is the end of stream.

Do you control the server sending the data (I think so since it is a local ip address), are you sure the server is not the one stalling?

    int length = 0;
    while (client.connected()) {
         while (client.available()) {
               char c = client.read();
               length++;
               if (c== -1) {
                   client.stop();
               }
         }
   }


stream read() is a blocking operation, if there is nothing to read, it blocks and stays there until it comes back with data or returns -1 if there is no more data/end of stream.  I have not encountered any stream implementation where read() will return -1 if it is not the end of stream.

BTW, you are right that if the server is the one stalling, by not sending more data and by not disconnecting, then available will always be false and connected will always be true, and you will be stuck in the loop.

try the above code. I highly doubt this is a hardware issue. if your server is truly not stalling, then I see absolutely no reason why the above code won't work.
« Last Edit: March 07, 2012, 12:54:44 pm by doughboy » Logged

Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
What confuses me is why available() seems to never return true after the data has started to stall, but presumably at some point we'd expect it to become true when more data was available in the buffer from the source?
First, available() returns the number of bytes available to read, not true or false.

Second, this really looks like a failure on the part of the server to send all the data.

Apologies, you're right (although technically I think we can both agree that available() returns something that can be considered 'truthy').

However, I'm pretty sure that it's not the server stalling. I'll see if I can produce some Wireshark logs to demonstrate this, but anecdotally for the moment, what I see is the server and arduino sending packets and ACKing, but then the arduino fails to ACK, and the server retransmits the latest packet again and again until the connection is closed.

I'll try to get a concrete TCP log to hopefully demonstrate this.
Logged

Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Here's a wireshark log of what's going on between the arduino (192.168.1.17) and the server (192.168.1.22) when it seems to get stuck in the 'connected but never available' loop:

Code:

No.     Time        Source                Destination           Protocol Length Info
  75049 617.344130  192.168.1.17          192.168.1.22          TCP      60     [TCP Port numbers reused] fpitp > tram [SYN] Seq=0 Win=2048 Len=0 MSS=1460
  75050 617.344324  192.168.1.22          192.168.1.17          TCP      58     tram > fpitp [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460
  75051 617.344508  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [ACK] Seq=1 Ack=1 Win=2048 Len=0
  75053 617.345776  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=1 Ack=1 Win=2048 Len=4
  75054 617.345829  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=5 Win=65535 Len=0
  75055 617.345958  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=5 Ack=1 Win=2048 Len=1
  75056 617.345979  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=6 Win=65535 Len=0
  75057 617.346337  192.168.1.17          192.168.1.22          TCP      64     fpitp > tram [PSH, ACK] Seq=6 Ack=1 Win=2048 Len=10
  75058 617.346371  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=16 Win=65535 Len=0
  75059 617.346527  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=16 Ack=1 Win=2048 Len=1
  75060 617.346552  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=17 Win=65535 Len=0
  75061 617.346720  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [PSH, ACK] Seq=17 Ack=1 Win=2048 Len=1
  75062 617.346746  192.168.1.22          192.168.1.17          TCP      54     tram > fpitp [ACK] Seq=1 Ack=18 Win=65535 Len=0
  75063 617.347400  192.168.1.22          192.168.1.17          TCP      300    tram > fpitp [PSH, ACK] Seq=1 Ack=18 Win=65535 Len=246
  75064 617.347423  192.168.1.22          192.168.1.17          TCP      1514   tram > fpitp [ACK] Seq=247 Ack=18 Win=65535 Len=1460
  75091 617.549843  192.168.1.17          192.168.1.22          TCP      60     fpitp > tram [ACK] Seq=18 Ack=1707 Win=1286 Len=0
  75092 617.549900  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  75223 618.449705  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  75416 619.954817  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  75763 622.580803  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  76443 627.679946  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  77878 637.667428  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  79925 652.912510  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
  85163 682.866999  192.168.1.22          192.168.1.17          TCP      1340   [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286

I think this demonstrates that the server is waiting for the Arduino to send an ACK packet.
Logged

Miramar Beach, Florida
Offline Offline
Faraday Member
**
Karma: 147
Posts: 6038
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
I have not encountered any stream implementation where read() will return -1 if it is not the end of stream.
Welcome to the Wonderful World of Arduino!

If client.read() returns -1 (int, not char), that means "no characters available", not end of stream. That just means you have read everything in the packet you have received so far. It should be the same as
client.available() == 0

That does not mean the server is finished sending packets. It will close the connection when it is finished sending packets.

@lazyatom: Do you use Linux?
Logged

Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

@lazyatom: Do you use Linux?

Sorry, I didn't reply earlier. I'm using Mac OS X 10.7.3 (aka 'Lion') on a 64bit iMac.
Logged

Offline Offline
Newbie
*
Karma: 0
Posts: 14
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Do you control the server sending the data (I think so since it is a local ip address), are you sure the server is not the one stalling?

I do; it's a very simple server but you're right to consider it. You can see it in the gist (https://gist.github.com/1994216) - it's a very simple ruby script. However, my reading of the wireshark log above is that the server is waiting for a response from the arduino.


   int length = 0;
    while (client.connected()) {
         while (client.available()) {
               char c = client.read();
               length++;
               if (c== -1) {
                   client.stop();
               }
         }
   }

I replaced my code with this (tweaked slightly - only length++ if c != -1), but then it seems to fail to connect after the first request:

Code:
4501: IP address: 192.168.1.17
4504: Connected.
6285: Total read: 8346
Total failures: 0/1
7286: Couldn't connect.
8288: Couldn't connect.
9288: Couldn't connect.
10290: Couldn't connect.
11290: Couldn't connect.
12292: Couldn't connect.
13292: Couldn't connect.
14294: Couldn't connect.
15294: Couldn't connect.
16295: Couldn't connect.
17296: Couldn't connect.

... obviously the next time loop() is run, the client.connect() is failing, but I'm not sure why moving client.stop() inside the while loops would cause this. I had assumed that after the call to client.stop(), the client would be in a good state to connect again.
Logged

Pages: [1] 2 3   Go Up
Jump to: