Topic: Ethernet (w5100) sketch hangs
Valencia, Spain
Faraday Member
**
Karma: 146
Posts: 5490

I'm with zoomkat. I want a practical server, but in a non-crash sort of way. I want the ethernet shield to survive multiple requests, but I don't expect it will work like my Linux server.

Does it do HTTP keep-alive? That can really reduce the number of simultaneous connections...

My webserver needs a little bit more work but it does a lot more than most of the others I've seen (which is why I wrote it!), eg. HTTP keep-alive.

I'll be releasing the code when I get around to it (and when I'm happy this problem is solved). Maybe I can find some volunteers to test it here... :-)


No, I don't answer questions sent in private messages (but I do accept thank-you notes...)

Miramar Beach, Florida
Faraday Member
**
Karma: 146
Posts: 6009

Quote
Maybe I can find some volunteers to test it here... :-)

Maybe. ;)

Do you have plans to use all 4 sockets simultaneously? That seems to be a bigger problem for me than keep-alive. I found that out yesterday after my Arduino/shield took a two hour pounding by draythomp's scripts. It was not as successful as enlightening. My end worked ok (add: it did not crash or stop), but it seems when a new connection request is received before the previous request is complete, everything gets buggered up on the client end.

Edit: It appears the ethernet library code is attempting to service all sockets in rotation. Maybe draythomp had used all my sockets and there was none remaining. I was sending a lot of data to try to jam up the connection and cause the shield to generate a packet that was too big, but I did not see a failure due to that. It looked like timeouts.

That is the challenge with one loop trying to service 4 sockets. If one connection bogs down, the rest of the connections may time out before the loop gets back to their sockets. Maybe that is where your keep-alive would help, but you would need to break out of the current socket, or send the keep-alive to the other sockets with each send to the current socket.
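One way to picture "breaking out of the current socket" is to give each socket a small, bounded slice of work on every pass of the loop. This is only a sketch under assumptions: `PendingSend`, `serviceAll`, and the 64-byte chunk size are made-up names and values for illustration, not part of the Ethernet library, and the actual write to the chip is replaced by simple bookkeeping.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-socket state; not a type from the Ethernet library.
struct PendingSend {
    std::size_t remaining = 0;  // bytes still waiting to be sent on this socket
};

constexpr std::size_t kSockets = 4;  // the W5100 has four hardware sockets
constexpr std::size_t kChunk = 64;   // assumed cap on bytes written per socket per pass

// One pass of the main loop: every socket gets at most kChunk bytes of
// service, so a slow transfer cannot keep the loop away from the others.
// Returns how many sockets still have data pending after this pass.
std::size_t serviceAll(std::array<PendingSend, kSockets>& socks) {
    std::size_t busy = 0;
    for (auto& s : socks) {
        const std::size_t n = std::min(s.remaining, kChunk);
        assert(n <= s.remaining);
        s.remaining -= n;  // stand-in for actually writing n bytes to the socket
        if (s.remaining > 0) {
            ++busy;
        }
    }
    return busy;
}
```

Calling `serviceAll` once per `loop()` keeps the time between visits to any one socket roughly constant, whatever the other three are doing.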

And even more: I just tried to test this with my cell phone, but my provider (Verizon) will not let me establish a port 80 connection to an IP address. Domain names only. :(
« Last Edit: October 26, 2011, 10:28:52 am by SurferTim » Logged

New River, Arizona
God Member
*****
Karma: 19
Posts: 935
Arduino rocks

I understand the capabilities argument.  And if people bought it knowing that it only supported one connection at a time, that would be OK too.  But there are four sockets, and the reason it doesn't support four connections is that the code loops to find the very first socket that matches, not the one it listened on.  I won't jump off the cliff and call this a bug, but it is for sure an oversight.  Even the comment in the code, which shows they were thinking about it when they first put the code together, indicates that.

Additionally, I haven't found long discussions that describe how this is only a device that will support one thing at a time.  zoomkat once indicated that he didn't think that anyone combined the client and server processing on the same machine.  My very first working device was both client and server.  The server told me what was going on and the client code went and got the time from the NIST HTTP servers.  I regularly have two clients active at a time.  It's easy to send the request, go back to the loop code and send another request to a different server, then go back to the loop and do other stuff until the responses from the servers come back or time out.

I haven't had more than two of them active at a time, but I suspect I could get three without too much trouble.  I have had as many as six clients active on the same device, just not at the same time.

To use an example, suppose you were sending a picture of something from an arduino that was automatically photographing wildlife.  This could take a minute or so over a flaky wireless link.  What if it failed each time because some request came in and clobbered the interaction, since the ethernet code always chooses the first socket?

People say that you should choose a different device to handle stuff like that.  There's some truth there, but the mega2560 is a very capable device, the 5100 is an incredible little ethernet chip, and both of them are far more capable of doing things than any of the projects I've seen done on them.  Just because the price and size allow us to put them in a coffee pot doesn't mean that's the only place they should be used.

I mean, how much fun would it be if we could only flash one LED at a time?

Trying to keep my house under control http://www.desert-home.com/

Miramar Beach, Florida

@draythomp: I do think this is a great device, but it has some limitations due to memory and speed restrictions, and we may have exceeded that yesterday. The pages my arduino was sending were really big, and you were pulling them fast! It appeared you were making new requests before the previous request had finished downloading. That would not happen if you were using a meta refresh to reload the page. Maybe I'll run a test with the meta refresh.

Thanks for a great test. I learned a lot!

Edit: This is good news. The w5100 apparently does use the MSS value. I used my router to choke the connection speed down to 56k, removed my change and the TX buffer does not fill up, even if the connection slows to a crawl. It stops at a point well short of my router MSS value. So it appears my code change is not the cure.
« Last Edit: October 26, 2011, 12:04:04 pm by SurferTim » Logged

Valencia, Spain

Quote
Edit: This is good news. The w5100 apparently does use the MSS value. I used my router to choke the connection speed down to 56k, removed my change and the TX buffer does not fill up, even if the connection slows to a crawl. It stops at a point well short of my router MSS value. So it appears my code change is not the cure.

I'll have to do some more testing when I get the chance ... but where did my huge packets and "fragmentation needed" messages come from? I definitely saw them.


Valencia, Spain

Quote
Do you have plans to use all 4 sockets simultaneously? That seems to be a bigger problem for me than keep-alive.

There's no reason why it shouldn't... :-)

Technically you're supposed to send special replies when your server is busy so the browsers know how to retry in a sensible way.

Miramar Beach, Florida

I just did some testing. It does rotate through the sockets. It does time out one of the connections if another connection gets slow.

I have a connection with two computers on it that are both bandwidth restricted to 56k. The html doc I am sending is a 17 second download at that bandwidth, and refreshes 10 seconds after the download is complete. One will time out eventually. But that is all. :)

Valencia, Spain

Quote
I just did some testing. It does rotate through the sockets. It does timeout one of the connections if another connection gets slow.

Ideally you would want to use three for sending data and reserve the fourth for sending proper "server busy" responses to everybody else.
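For what a "server busy" reply could look like: HTTP defines status 503 (Service Unavailable) with an optional Retry-After header telling the client how long to wait. The sketch below just builds that reply as a string; `busyResponse` and its parameter are hypothetical names for illustration, and on an Arduino you would probably print the lines straight to the client rather than build a `std::string`.

```cpp
#include <cassert>
#include <string>

// Build a minimal HTTP 503 reply. retryAfterSeconds is an illustrative
// parameter: the number of seconds the client is asked to wait.
std::string busyResponse(unsigned retryAfterSeconds) {
    const std::string body = "Server busy, please retry.\r\n";
    return "HTTP/1.1 503 Service Unavailable\r\n"
           "Retry-After: " + std::to_string(retryAfterSeconds) + "\r\n"
           "Content-Type: text/plain\r\n"
           "Content-Length: " + std::to_string(body.size()) + "\r\n"
           "Connection: close\r\n"
           "\r\n" + body;
}

// Small sanity check: the reply has to begin with the 503 status line.
inline void checkBusyResponse() {
    assert(busyResponse(1).compare(0, 12, "HTTP/1.1 503") == 0);
}
```

The reply is short enough to fit in one small write, so the reserved socket is tied up only briefly before it can be closed and put back to listening.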

New River, Arizona

The problem with the server cutting short pages is annoying, but I can live with it.  It seems to only happen when I slam my little server, and like everyone says, I can get a unix box of some kind and set it up as a server if I ever need to.  So, I guess I'll just let that situation rest.

However, there are two annoying things left that I wonder about.  The first is how long the board takes to get working.  It seems to take a long time sometimes to get established on my network.  I can tell when this happens because the little leds act strange for up to a couple of minutes before settling to the way they are when running properly.  I can overcome this situation with hardware, but it would be nice to have a software procedure to follow.

The other one that is really annoying is that when I do a client request, it will sometimes hang in the connect() so long that my 10 second watchdog will time out.  I haven't timed this recently, but I seem to remember it sitting there for around 30 seconds before giving up.  I didn't want to change the timeout and retry registers since they are at the chip level and would change the timing of all the sockets.  But this effort got me to looking at the code, and there is a loop in connect() that doesn't have a timer around it.  I think I can mess around in there and solve this problem.  Reconnecting isn't a problem if the other machine doesn't respond, it's that darn long wait in a loop that is the problem.  With my watchdog timer, my device actually resets to end the wait, and a blocking connect doesn't sound like a nice thing to have in the code.

Wish me luck.

New River, Arizona

Well, that was easy.  I added another connect with a timeout value, keeping the original as well.  Now I can call it with a small value and it will let go and come back with a failure.  Then I have a choice of letting it run again at the same value or increasing the time or whatever.  But it doesn't hang there long enough to get hit by the watchdog.  This way, the servers I have on my local lan use the original connect and my outside servers use the timed one.  Should have done this months ago.
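The modified library code isn't posted here, so the following is only a guess at the general shape of a timed connect wait, not the actual change. `waitForConnect`, `SockStatus`, and `ConnectResult` are invented names; the status reader and the millisecond clock are passed in as callbacks so the sketch runs without hardware (on an Arduino they would wrap the socket status register read and `millis()`).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical simplified socket states, standing in for the chip's status values.
enum class SockStatus { Connecting, Established, Closed };

enum class ConnectResult { Connected, Closed, TimedOut };

// Poll the socket status until it is established, closed, or timeoutMs has
// elapsed. The caller decides whether to retry, wait longer, or give up.
ConnectResult waitForConnect(const std::function<SockStatus()>& readStatus,
                             const std::function<std::uint32_t()>& nowMs,
                             std::uint32_t timeoutMs) {
    assert(readStatus && nowMs);  // both callbacks must be set
    const std::uint32_t start = nowMs();
    for (;;) {
        const SockStatus st = readStatus();
        if (st == SockStatus::Established) return ConnectResult::Connected;
        if (st == SockStatus::Closed) return ConnectResult::Closed;
        // Unsigned subtraction stays correct even if the millisecond counter wraps.
        if (nowMs() - start >= timeoutMs) return ConnectResult::TimedOut;
    }
}
```

With a timeout comfortably below the 10 second watchdog, a dead remote host produces a `TimedOut` result instead of a reset.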

The problems will come when I move up a version in the IDE, I'll have to carry my changes forward and make them work with a new library.  But, the world ain't perfect.

Valencia, Spain

Here's a screenshot of the output from my packet sniffer...



The W5100 is trying to send packets with 1514 bytes in them and they're getting rejected ("Fragmentation needed") because the MSS of the connection is 512 bytes.

I only write 64 bytes at a time when sending the file, it's the W5100 that's making big packets.

Conclusion: The W5100 joins writes together into big packets.
« Last Edit: October 28, 2011, 02:33:45 am by fungus » Logged


Valencia, Spain

This shows where it all goes wrong...

When I send 64 bytes, a 118 byte packet goes out over the cable. My TCP header is therefore 54 bytes.

Here we see a bunch of packets being sent with no "ACK" messages coming back from my phone. Eventually the W5100 decides something is wrong and re-sends the data. It sends 1514 bytes.



Where does the number '1514' come from?

If we subtract 54 from 1514 we get 1460. That looks like a familiar number...where have I seen it before?  :-/



Miramar Beach, Florida

A normal header is 40 bytes. The normal maximum packet size is 1500 bytes. The MSS would then be 1460.
If the header size increases to 54, the MSS should be 1446.

Can you check your MSS value after the connection is established?

Edit: I wonder if IPv6 has anything to do with this? From Wikipedia:

Quote
Fragmentation

Unlike in IPv4, IPv6 routers never fragment IPv6 packets. Packets exceeding the size of the maximum transmission unit of the destination link are dropped and this condition is signaled by a Packet too Big ICMPv6 type 2 message to the originating node, similarly to the IPv4 method when the Don't Fragment bit is set.
« Last Edit: October 28, 2011, 05:16:26 am by SurferTim » Logged

Valencia, Spain

Quote
A normal header is 40 bytes.

Right...I was just trying to figure out the header.

The sniffer is reporting the complete packet:
14 bytes - Ethernet
20 bytes - IP datagram header
20 bytes - TCP datagram header
...data

Total: 54 bytes of header

I assume the 'Ethernet' part of the header is discarded before the packet is passed to the IP controller.
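The numbers in this exchange can be checked mechanically. The sketch below assumes IP and TCP headers without options (20 bytes each) and a standard 1500-byte Ethernet MTU; the constant and function names are made up for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kEthernetHeader = 14;  // counted by the sniffer, not part of the IP packet
constexpr std::size_t kIpHeader = 20;        // IPv4 header without options
constexpr std::size_t kTcpHeader = 20;       // TCP header without options
constexpr std::size_t kEthernetMtu = 1500;   // largest IP packet in a standard Ethernet frame

// Size on the wire, as a sniffer reports it, for a given TCP payload.
constexpr std::size_t frameSize(std::size_t payload) {
    return kEthernetHeader + kIpHeader + kTcpHeader + payload;
}

// Largest TCP payload that fits in one IP packet of the given MTU.
constexpr std::size_t mssForMtu(std::size_t mtu) {
    return mtu - kIpHeader - kTcpHeader;
}

static_assert(frameSize(64) == 118, "a 64-byte write shows up as a 118-byte frame");
static_assert(mssForMtu(kEthernetMtu) == 1460, "default Ethernet MSS");
static_assert(frameSize(1460) == 1514, "a full-MSS segment is the 1514-byte frame seen in the capture");
```

So the 1514-byte frames match a sender using the default 1460-byte MSS, and the 14-byte Ethernet header accounts for the difference between the 40-byte and 54-byte header figures.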
« Last Edit: October 28, 2011, 05:39:21 am by fungus » Logged


Miramar Beach, Florida

The ethernet part may not be discarded. I have not checked what it does to the packet size, but I can insert packet and routing marks into the header with my router. That would really gum up the works if the w5100 did not take that into account.

Edit:
@fungus: You seem to be in a unique position with your setup. If you get around to it, try a test for me.  Just after the connection is established, maybe try something like this?
Code:
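// Assumes the Ethernet library's W5100 object; s is the connection's socket number (0-3).
uint16_t mssSize = W5100.readSnMSSR(s);  // current maximum segment size for socket s
mssSize -= 100;                          // shrink it by 100 bytes
W5100.writeSnMSSR(s, mssSize);           // write the smaller value back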

I have wondered if writing to the SnMSSR register makes a difference. Since you are already checking packet size, yours would be the easiest setup to check.

If it works, your packet size should be 1414 rather than 1514. That should go ok.

« Last Edit: October 28, 2011, 07:47:17 am by SurferTim » Logged
