Offline
Newbie
Karma: 0
Posts: 14
|
 |
« on: March 07, 2012, 11:48:10 am » |
I've been working on a sketch wherein some data is downloaded from an HTTP server and is then processed on the Arduino (printed, as it happens, but I don't think that's important). In my original sketches, I was occasionally seeing transfers fail midway through. The original sketch was more complex (the data was being cached to a file on an SD card, and there was code which attempted to handle pauses in the data), but upon whittling the sketch down to a minimal skeleton, I am still seeing around 10% of the transfers fail to download all of the data. I've attached the sketch and serial monitor log to a gist - https://gist.github.com/1994216 - and I've included the sketch and a snippet of the log here: #include <SPI.h> #include <Ethernet.h>
byte mac[] = { 0x90, 0xA2, 0xDA, 0x00, 0x86, 0x67 }; //physical mac address EthernetClient client;
#define debug(a) Serial.print(millis()); Serial.print(": "); Serial.println(a); #define debug2(a, b) Serial.print(millis()); Serial.print(": "); Serial.print(a); Serial.println(b);
void setup(){ Serial.begin(9600);
// start the Ethernet connection: if (Ethernet.begin(mac) == 0) { debug("DHCP Failed"); // no point in carrying on, so do nothing forevermore: while(true); } // print your local IP address: debug2("IP address: ", Ethernet.localIP()); }
char* host = "192.168.1.22"; int port = 4567; char *path = "/";
int totalRequestSize = 8346; int failures = 0; int requests = 0;
void loop() { if (client.connect(host, port)) { debug("Connected."); client.print("GET "); client.print(path); client.println(" HTTP/1.0\n"); int length = 0; while(client.connected() && !client.available()) delay(1); //waits for data while(client.available()) { client.read(); length++; } client.stop(); requests++; if (length != totalRequestSize) { failures++; } debug2("Total read: ", length); Serial.print("Total failures: "); Serial.print(failures); Serial.print("/"); Serial.println(requests); } else { debug("Couldn't connect."); } }
170569: Connected. 172350: Total read: 8346 Total failures: 6/93 172555: Connected. 174336: Total read: 8346 Total failures: 6/94 174339: Connected. 175706: Total read: 1706 Total failures: 7/95 175709: Connected. 177753: Total read: 4889 Total failures: 8/96 177756: Connected. 179536: Total read: 8346 Total failures: 8/97 179539: Connected. 181321: Total read: 8346 Total failures: 8/98 181324: Connected. 182976: Total read: 7740 Total failures: 9/99 182979: Connected. 184761: Total read: 8346 Total failures: 9/100 184764: Connected. 186546: Total read: 8346 Total failures: 9/101 186549: Connected. 188190: Total read: 2989 Total failures: 10/102 188193: Connected. 189976: Total read: 8346 Total failures: 10/103 189979: Connected. 191762: Total read: 8346 Total failures: 10/104
The data which is being downloaded is simply an ASCII file for 100 rows of 80 '#' characters. As you can see in the logs above, most requests download 8346 bytes (the response headers + the ~8000 byte file), but intermittently there are requests where the data terminates earlier at any point in the stream. It's running on an Arduino Uno and a recent Ethernet shield (including an SD card slot), using the Arduino 1.0 IDE. As you can imagine, this is quite frustrating, as I'd like this system to run without much intervention. As I mentioned above, earlier versions of the sketch included code to wait for a certain period of time if there was no data available, checking periodically to see if the data can be resumed before finally closing the connection, but I still see the behaviour with that logic present. Does anyone have any ideas about what might be going on, or even how I can make this sketch simpler to try and isolate where the issue might be?
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3448
|
 |
« Reply #1 on: March 07, 2012, 11:56:45 am » |
This code may not work like you think. It is not waiting for any new packets. while(client.connected() && !client.available()) delay(1); //waits for data while(client.available()) { client.read(); length++; } client.stop();
This will do better. Replace that with this: while(client.connected()) { // the server will close the connection when it is finished sending packets while(client.available()) { // ...but not until this buffer is empty client.read(); length++; } } // close your end after the server closes its end client.stop();
edit: My bad. I forgot a bracket.
|
|
|
|
« Last Edit: March 07, 2012, 11:59:15 am by SurferTim »
|
Logged
|
|
|
|
|
Offline
Newbie
Karma: 0
Posts: 14
|
 |
« Reply #2 on: March 07, 2012, 12:10:27 pm » |
Thanks for that pointer. However, I believe the stalling behaviour still exists (and unfortunately with your change it's harder to see). Here's the output in serial monitor with your change in place: 4501: IP address: 192.168.1.17 4504: Connected. 6284: Total read: 8346 Total failures: 0/1 6489: Connected. 8268: Total read: 8346 Total failures: 0/2 8271: Connected. 10052: Total read: 8346 Total failures: 0/3 10055: Connected. 11836: Total read: 8346 Total failures: 0/4 11839: Connected. 13620: Total read: 8346 Total failures: 0/5 13623: Connected.
... at which point it just waits. I believe it's now stuck within the loop. In fact, with a few tweaks I think I can demonstrate that it's true. If I tweak your replacement slightly: while(client.connected()) { debug("Still connected"); // the server will close the connection when it is finished sending packets while(client.available()) { // ...but not until this buffer is empty client.read(); length++; } debug("Done reading for now..."); } // close your end after the server closes its end client.stop();
then I see this output in serial monitor: 13834: Connected. 13836: Still connected 13837: Done reading for now... 13845: Still connected 15647: Done reading for now... 15649: Total read: 8346 Total failures: 0/6 15662: Connected. 15682: Still connected 17482: Done reading for now... 17484: Total read: 8346 Total failures: 0/7 17498: Connected. 17517: Still connected 19318: Done reading for now... 19320: Total read: 8346 Total failures: 0/8 19334: Connected. 19353: Still connected 19995: Done reading for now... 19996: Still connected 19996: Done reading for now... 20019: Still connected 20044: Done reading for now... 20077: Still connected 20102: Done reading for now... 20135: Still connected 20160: Done reading for now... 20194: Still connected 20218: Done reading for now... 20252: Still connected 20277: Done reading for now... 20311: Still connected 20335: Done reading for now... 20368: Still connected 20393: Done reading for now... 20426: Still connected 20452: Done reading for now... 20485: Still connected 20510: Done reading for now... 20543: Still connected 20568: Done reading for now... 20601: Still connected 20626: Done reading for now... 20660: Still connected 20684: Done reading for now... 20718: Still connected 20743: Done reading for now... 20776: Still connected 20801: Done reading for now...
(and so on). Any ideas?
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3448
|
 |
« Reply #3 on: March 07, 2012, 12:15:59 pm » |
It must be receiving characters there. Why don't you take a look? while(client.available()) { // ...but not until this buffer is empty Serial.write(client.read()); length++; } edit: You might also want to disable the SD SPI interface on your shield. That caused me some grief like that. void setup() { Serial.begin(9600); pinMode(4,OUTPUT); digitalWrite(4,HIGH); // rest of setup }
|
|
|
|
« Last Edit: March 07, 2012, 12:19:35 pm by SurferTim »
|
Logged
|
|
|
|
|
SF Bay area
Offline
Full Member
Karma: 0
Posts: 156
|
 |
« Reply #4 on: March 07, 2012, 12:24:08 pm » |
isn't it that with streams, only reading a value of -1 tells you it is at the end of stream?
your code is only relying on client.available()
source can always send partially filled buffers, in which case, available will return false, but that does not mean it is the end of stream.
|
|
|
|
|
Logged
|
|
|
|
|
Offline
Newbie
Karma: 0
Posts: 14
|
 |
« Reply #5 on: March 07, 2012, 12:25:35 pm » |
I think at the point it gets stuck it in the loop, it's not really receiving anything - it's just that 'connected()' is true, but 'available()' is false, and seems to stay that way. Here's the serial monitor log with the extra logging you suggest (and slightly trimmed for your scrolling benefit): 4501: IP address: 192.168.1.17 4504: Connected. 4505: Still connected HTTP/1.1 200 OK X-Frame-Options: sameorigin X-Xss-Protection: 1; mode=block Content-Type: text/html;charset=utf-8 Content-Length: 8100 Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30) Date: Wed, 07 Mar 2012 17:19:08 GMT Connection: close
################################################################################ ################################################################################ ################################################################################ <...94 identical rows snipped for brevity...> ################################################################################ ################################################################################ ################################################################################ 13190: Done reading for now... 13223: Total read: 8346 Total failures: 0/1 13473: Connected. 13475: Still connected HTTP/1.1 200 OK X-Frame-Options: sameorigin X-Xss-Protection: 1; mode=block Content-Type: text/html;charset=utf-8 Content-Length: 8100 Server: WEBrick/1.3.1 (Ruby/1.9.3/2011-10-30) Date: Wed, 07 Mar 2012 17:19:17 GMT Connection: close
################################################################################ ################################################################################ ################################################################################ <... 42 rows snipped for brevity ...> ################################################################################ ################################################################################ ###########################################################################17828: Done reading for now... 17862: Still connected 17887: Done reading for now... 17921: Still connected 17945: Done reading for now... 17978: Still connected 18003: Done reading for now... 18036: Still connected 18062: Done reading for now... <... another snip, but you can see that it's still waiting 30 seconds later... > 46516: Still connected 46541: Done reading for now... 46574: Still connected 46599: Done reading for now... 46632: Still connected 46657: Done reading for now... 46691: Still connected 46715: Done reading for now... 46749: Still connected 46774: Done reading for now... 46808: Still connected 46832: Done reading for now... 46865: Still connected 46891: Done reading for now... 46923: Still connected
Thanks for taking the time to help, I definitely appreciate it. Do you have any other things I might try to isolate what's going wrong?
|
|
|
|
|
Logged
|
|
|
|
|
Offline
Newbie
Karma: 0
Posts: 14
|
 |
« Reply #6 on: March 07, 2012, 12:29:21 pm » |
isn't it that with streams, only reading a value of -1 tells you it is at the end of stream?
your code is only relying on client.available()
source can always send partially filled buffers, in which case, available will return false, but that does not mean it is the end of stream.
That's interesting - so if I understand correctly, you're saying that I should really be checking whether or not read() returns -1? What confuses me is why available() seems to never return true after the data has started to stall, but presumably at some point we'd expect it to become true when more data was available in the buffer from the source?
|
|
|
|
|
Logged
|
|
|
|
|
Seattle, WA USA
Offline
Brattain Member
Karma: 312
Posts: 35483
Seattle, WA USA
|
 |
« Reply #7 on: March 07, 2012, 12:31:53 pm » |
What confuses me is why available() seems to never return true after the data has started to stall, but presumably at some point we'd expect it to become true when more data was available in the buffer from the source? First, available() returns the number of bytes available to read, not true or false. Second, this really looks like a failure on the part of the server to send all the data.
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3448
|
 |
« Reply #8 on: March 07, 2012, 12:33:35 pm » |
isn't it that with streams, only reading a value of -1 tells you it is at the end of stream? Not in this case. if client.read() returns -1, that means "no characters available". So it locks up in the "while(client.available)" loop? I don't know if this is your problem, but if you use Linux, you may suffer from this malady. It causes "client.available()" to return non-zero values when it shouldn't. http://code.google.com/p/arduino/issues/detail?id=605
|
|
|
|
|
Logged
|
|
|
|
|
SF Bay area
Offline
Full Member
Karma: 0
Posts: 156
|
 |
« Reply #9 on: March 07, 2012, 12:47:29 pm » |
yes available() tells you how many bytes are in buffer for reading, but it can be used in if statement to evaluate to true (any non zero value) or false (0 value).
anyway, you should never rely on available to determine it is the end of stream.
Do you control the server sending the data (I think so since it is a local ip address), are you sure the server is not the one stalling?
int length = 0; while (client.connected()) { while (client.available()) { char c = client.read(); length++; if (c== -1) { client.stop(); } } }
stream read() is a blocking operation, if there is nothing to read, it blocks and stays there until it comes back with data or returns -1 if there is no more data/end of stream. I have not encountered any stream implementation where read() will return -1 if it is not the end of stream.
BTW, you are right that if the server is the one stalling, by not sending more data and by not disconnecting, then available will always be false and connected will always be true, and you will be stuck in the loop.
try the above code. I highly doubt this is a hardware issue. if your server is truly not stalling, then I see absolutely no reason why the above code won't work.
|
|
|
|
« Last Edit: March 07, 2012, 12:54:44 pm by doughboy »
|
Logged
|
|
|
|
|
Offline
Newbie
Karma: 0
Posts: 14
|
 |
« Reply #10 on: March 07, 2012, 12:48:02 pm » |
What confuses me is why available() seems to never return true after the data has started to stall, but presumably at some point we'd expect it to become true when more data was available in the buffer from the source? First, available() returns the number of bytes available to read, not true or false. Second, this really looks like a failure on the part of the server to send all the data. Apologies, you're right (although technically I think we can both agree that available() returns something that can be considered 'truthy'). However, I'm pretty sure that it's not the server stalling. I'll see if I can produce some Wireshark logs to demonstrate this, but anecdotally for the moment, what I see is the server and arduino sending packets and ACKing, but then the arduino fails to ACK, and the server retransmits the latest packet again and again until the connection is closed. I'll try to get a concrete TCP log to hopefully demonstrate this.
|
|
|
|
|
Logged
|
|
|
|
|
Offline
Newbie
Karma: 0
Posts: 14
|
 |
« Reply #11 on: March 07, 2012, 12:57:59 pm » |
Here's a wireshark log of what's going on between the arduino (192.168.1.17) and the server (192.168.1.22) when it seems to get stuck in the 'connected but never available' loop: No. Time Source Destination Protocol Length Info 75049 617.344130 192.168.1.17 192.168.1.22 TCP 60 [TCP Port numbers reused] fpitp > tram [SYN] Seq=0 Win=2048 Len=0 MSS=1460 75050 617.344324 192.168.1.22 192.168.1.17 TCP 58 tram > fpitp [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 75051 617.344508 192.168.1.17 192.168.1.22 TCP 60 fpitp > tram [ACK] Seq=1 Ack=1 Win=2048 Len=0 75053 617.345776 192.168.1.17 192.168.1.22 TCP 60 fpitp > tram [PSH, ACK] Seq=1 Ack=1 Win=2048 Len=4 75054 617.345829 192.168.1.22 192.168.1.17 TCP 54 tram > fpitp [ACK] Seq=1 Ack=5 Win=65535 Len=0 75055 617.345958 192.168.1.17 192.168.1.22 TCP 60 fpitp > tram [PSH, ACK] Seq=5 Ack=1 Win=2048 Len=1 75056 617.345979 192.168.1.22 192.168.1.17 TCP 54 tram > fpitp [ACK] Seq=1 Ack=6 Win=65535 Len=0 75057 617.346337 192.168.1.17 192.168.1.22 TCP 64 fpitp > tram [PSH, ACK] Seq=6 Ack=1 Win=2048 Len=10 75058 617.346371 192.168.1.22 192.168.1.17 TCP 54 tram > fpitp [ACK] Seq=1 Ack=16 Win=65535 Len=0 75059 617.346527 192.168.1.17 192.168.1.22 TCP 60 fpitp > tram [PSH, ACK] Seq=16 Ack=1 Win=2048 Len=1 75060 617.346552 192.168.1.22 192.168.1.17 TCP 54 tram > fpitp [ACK] Seq=1 Ack=17 Win=65535 Len=0 75061 617.346720 192.168.1.17 192.168.1.22 TCP 60 fpitp > tram [PSH, ACK] Seq=17 Ack=1 Win=2048 Len=1 75062 617.346746 192.168.1.22 192.168.1.17 TCP 54 tram > fpitp [ACK] Seq=1 Ack=18 Win=65535 Len=0 75063 617.347400 192.168.1.22 192.168.1.17 TCP 300 tram > fpitp [PSH, ACK] Seq=1 Ack=18 Win=65535 Len=246 75064 617.347423 192.168.1.22 192.168.1.17 TCP 1514 tram > fpitp [ACK] Seq=247 Ack=18 Win=65535 Len=1460 75091 617.549843 192.168.1.17 192.168.1.22 TCP 60 fpitp > tram [ACK] Seq=18 Ack=1707 Win=1286 Len=0 75092 617.549900 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286 75223 618.449705 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286 75416 619.954817 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286 75763 622.580803 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286 76443 627.679946 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286 77878 637.667428 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286 79925 652.912510 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286 85163 682.866999 192.168.1.22 192.168.1.17 TCP 1340 [TCP Window Full] [TCP Retransmission] tram > fpitp [ACK] Seq=1707 Ack=18 Win=65535 Len=1286
I think this demonstrates that the server is waiting for the Arduino to send an ACK packet.
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3448
|
 |
« Reply #12 on: March 07, 2012, 01:01:23 pm » |
I have not encountered any stream implementation where read() will return -1 if it is not the end of stream. Welcome to the Wonderful World of Arduino! If client.read() returns -1 (int, not char), that means "no characters available", not end of stream. That just means you have read everything in the packet you have received so far. It should be the same as client.available() == 0 That does not mean the server is finished sending packets. It will close the connection when it is finished sending packets. @lazyatom: Do you use Linux?
|
|
|
|
|
Logged
|
|
|
|
|
Offline
Newbie
Karma: 0
Posts: 14
|
 |
« Reply #13 on: March 07, 2012, 01:05:58 pm » |
@lazyatom: Do you use Linux?
Sorry, I didn't reply earlier. I'm using Mac OS X 10.7.3 (aka 'Lion') on a 64bit iMac.
|
|
|
|
|
Logged
|
|
|
|
|
Offline
Newbie
Karma: 0
Posts: 14
|
 |
« Reply #14 on: March 07, 2012, 01:09:27 pm » |
Do you control the server sending the data (I think so since it is a local ip address), are you sure the server is not the one stalling?
I do; it's a very simple server but you're right to consider it. You can see it in the gist ( https://gist.github.com/1994216) - it's a very simple ruby script. However, my reading of the wireshark log above is that the server is waiting for a response from the arduino. int length = 0; while (client.connected()) { while (client.available()) { char c = client.read(); length++; if (c== -1) { client.stop(); } } }
I replaced my code with this (tweaked slightly - only length++ if c != -1), but then it seems to fail to connect after the first request: 4501: IP address: 192.168.1.17 4504: Connected. 6285: Total read: 8346 Total failures: 0/1 7286: Couldn't connect. 8288: Couldn't connect. 9288: Couldn't connect. 10290: Couldn't connect. 11290: Couldn't connect. 12292: Couldn't connect. 13292: Couldn't connect. 14294: Couldn't connect. 15294: Couldn't connect. 16295: Couldn't connect. 17296: Couldn't connect.
... obviously the next time loop() is run, the client.connect() is failing, but I'm not sure why moving client.stop() inside the while loops would cause this. I had assumed that after the call to client.stop(), the client would be in a good state to connect again.
|
|
|
|
|
Logged
|
|
|
|
|
|