New River, Arizona
Offline
God Member
Karma: 15
Posts: 876
Arduino rocks
|
 |
« Reply #150 on: October 20, 2011, 05:40:42 pm » |
Has anyone taken a look at the code I mentioned above?
Next, I think measuring this thing causes it to work better. Aside from the pin 4 fix, I have monitoring code in the ethernet library that doesn't change anything, just reports stuff. It wouldn't fail. So, I made a batch file to just nail it as fast as it could with http get requests. I even asked for the favicon.ico that is so annoying when it shows up constantly in the logs. It ran through a couple of thousand iterations. This is on an arduino that is making client requests to several other internet devices and web sites at the same time. My packet count shot up well over 300K and the darn thing just kept on trucking. It did the other requests, monitored an XBee network and never a hiccup. We're talking well over 2000 get requests to the device as fast as it could respond.
A few days ago, it would never have survived that. So, I'm going back to the library I started with and comment out the pin 4 code and try it again. If it fails, I'm going to step through the changes and monitoring code one at a time until I get more information. I'm also leaving in the changes I made to my sketch to cut down on packets. Things that tried to client.print(variable) and such. I suspect, that Surfer is right, lots of little packets will kill it. But, I'll see.
Sometimes you just have to start over.
|
|
|
|
|
Logged
|
|
|
|
|
New River, Arizona
Offline
God Member
Karma: 15
Posts: 876
Arduino rocks
|
 |
« Reply #151 on: October 20, 2011, 07:48:46 pm » |
You don't really ever need to look at TD_RD. Use the free space register.
The W5100 only updates its internal copy of TX_WR after you send a "SEND" command.
What I was wondering about is that, since we have two processors using the same block of memory without locks on the memory, if the 5100 could be handling a retransmission or etx or something at the same time that we were copying data into the send buffer. Since the buffer is circular, it could be possible that the 5100 could be reading from it at the same time that we are writing and the consequent updates to the pointer would clash. It didn't The 5100 datasheet is clear that the two pointers TX_RX and TX_WR have to be equal and the status register should be correct before copying data into the transmit buffer. The code in the 5100 routines doesn't do this or check it like that. However, extensive testing hasn't shown a problem. That's what I was comparing to see if there was a problem.
|
|
|
|
|
Logged
|
|
|
|
|
New River, Arizona
Offline
God Member
Karma: 15
Posts: 876
Arduino rocks
|
 |
« Reply #152 on: October 21, 2011, 12:32:08 am » |
OK, found at least one problem. I took my device all the way back (except for single byte packets) and started over. I have my curl script that pounds it as fast as the arduino can respond so I can test a lot in a little time.
Ran it and it just worked fine. I mean thousands of interaction (get /) and no problems at all. Remember, pin 4 was not initialized.
Got mad and changed my code to send one of my long strings out in a loop that would cause it to send a LOT of one byte packets, around 80, right in the middle of the page...twice because it is in a loop. It lasted about 100 iterations before it locked up. Second test 60 iterations. It got as low as 30 and as high as 162 before locking up. But it did lock up.
Put the code back in for pin 4 and reran the test. It locked up in a similar fashion, time after time.
I put a check in the write data processing routine to check the values of the registers against what they should be and no problem was reported even when it locked up.
So, I took the code out that separated it into multiple single byte packets and ran the test again. It worked a full 1000 times, so I ran it again...and again. It's setting there blissfully working just fine and I have taken my changes out of the ethernet library.
Net, multiple 1 byte packets will kill it. I don't know exactly why, but it definitely does. Pin 4 didn't seem to make any difference at all, but I am setting it high just in case (and, because it's an unknown if I don't). I also notice there was a very visible slow down in the amount of traffic that I was able to generate with the 1 byte packets (pretty freakin' obvious, right?).
So, all you folks out there that are using the streams library to talk to the ethernet or client.print(variable); are in for a surprise.
I'm just going to set back and watch it now...........sigh.
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3449
|
 |
« Reply #153 on: October 21, 2011, 04:25:36 am » |
Got mad and changed my code to send one of my long strings out in a loop that would cause it to send a LOT of one byte packets, around 80, right in the middle of the page...twice because it is in a loop. It lasted about 100 iterations before it locked up. Second test 60 iterations. It got as low as 30 and as high as 162 before locking up. But it did lock up. This is the bug I am talking about. It is only when you send a bunch of packets in a row that it happens, and that was what you tried, wasn't it? Add: Now for the not-so-good news. My "correction code" will not prevent that. I just ran another test, and the w5100 has less manners than I thought. I inserted another register read immediately after the send. The w5100 indicates it has read the data from the buffer. // The registers are read just prior to the send: TX_RD = 32 TX_WR = 32 // 30 byte send here and immediate register read // the w5100 changed the TX_RD to equal TX_WR, indicating it is finished reading TX_RD = 62 TX_WR = 62 // BUT IT ISN'T!! 20 seconds later with NO send, the values are now this: TX_RD = 1216 TX_WR = 1216 Bad manners! If it had any manners or knew how to share, the registers would still be TX_RD = 62 TX_WR = 62
|
|
|
|
« Last Edit: October 21, 2011, 05:33:01 am by SurferTim »
|
Logged
|
|
|
|
|
Valencia, Spain
Offline
Edison Member
Karma: 65
Posts: 2251
|
 |
« Reply #154 on: October 21, 2011, 04:37:01 am » |
What I was wondering about is that, since we have two processors using the same block of memory without locks on the memory, if the 5100 could be handling a retransmission or etx or something at the same time that we were copying data into the send buffer. Since the buffer is circular, it could be possible that the 5100 could be reading from it at the same time that we are writing and the consequent updates to the pointer would clash. It didn't
That makes no sense. The W5100 has to keep the data in the buffer until it receives an ACK from the destination. It will only update the TX_RD pointer after the ACK packet has arrived. The 5100 datasheet is clear that the two pointers TX_RX and TX_WR have to be equal before copying data into the transmit buffer.
Where does it say that? It's not in my copy of the docs. If that's true then why bother with the whole circular buffer thing? What's the free space register for...? (nb. I assume you meant "TX_RD", not "TX_RX")
|
|
|
|
|
Logged
|
|
|
|
|
Valencia, Spain
Offline
Edison Member
Karma: 65
Posts: 2251
|
 |
« Reply #155 on: October 21, 2011, 04:49:18 am » |
Got mad and changed my code to send one of my long strings out in a loop that would cause it to send a LOT of one byte packets, around 80, right in the middle of the page...twice because it is in a loop. It lasted about 100 iterations before it locked up. Second test 60 iterations. It got as low as 30 and as high as 162 before locking up. But it did lock up. This is the bug I am talking about. It is only when you send a bunch of packets in a row that it happens, and that was what you tried, wasn't it? Add: Now for the not-so-good news. My "correction code" will not prevent that. I just ran another test, and the w5100 has less manners than I thought. What happens if you wait for the "SEND_OK" bit to appear in the interrupt register after you send each packet? This bit lets you know that the W5100 has fully processed your "SEND" command. It might be possible that if you do two "SEND" commands very quickly you could cause bad things to happen ... or that the W5100 has limited memory to remember where the start/end of all the individual packets are in the transmit buffer. One of the things I don't like about the standard driver is that it doesn't use the interrupt bits to check for completion of actions. (Remember, to clear the "SEND_OK" bit you write a 4 to the interrupt register - this isn't very clear in the docs)
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3449
|
 |
« Reply #156 on: October 21, 2011, 04:59:07 am » |
Thanks again, fungus. I'll look into that also. Something has to be the flag that lets the Arduino know it is finished with that register. Obviously the TX_RD and TX_WR registers being equal isn't it.
|
|
|
|
|
Logged
|
|
|
|
|
Valencia, Spain
Offline
Edison Member
Karma: 65
Posts: 2251
|
 |
« Reply #157 on: October 21, 2011, 05:40:20 am » |
// 30 byte send here and immediate register read // the w5100 changed the TX_RD to equal TX_WR, indicating it is finished reading
TX_RD = 62 TX_WR = 62
// BUT IT ISN'T!! 20 seconds later with NO send, the values are now this:
TX_RD = 1216 TX_WR = 1216
I find that impossible to believe...there must be something else going on.
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3449
|
 |
« Reply #158 on: October 21, 2011, 05:49:24 am » |
I find that impossible to believe...there must be something else going on. Me too! That is the part that is so confusing. At first I did think it was the SPI collision with the SD reader. But that problem is gone. This is too repeatable to be a coincidence. And this doesn't happen just now and then. Those registers are different EVERY 20 SECONDS!. They are NEVER the same as when I left them after the last send. Always different by at least a thousand. Add: If it is using that TX_WR pointer to build a packet, that is ok. I could deal with that if it would change the registers in the correct order. Like this: TX_WR = 62 TX_RD = 62 // 30 byte send TX_WR = 92 TX_RD = 62 // Arduino wants to send again, but it knows the w5100 isn't finished because TX_WR != TX_RD // so it waits // w5100 needs to build a packet, so it grabs some memory (TX_WR += 1000 as an example), but leaves the TX_RD register alone // the registers have not been equal yet. TX_WR = 1092 TX_RD = 62 // Arduino tries the send again, but it knows the w5100 still isn't finished because TX_WR != TX_RD // so it waits again // w5100 finishes the send and sets the TX_RD register TX_WR = 1092 TX_RD = 1092 // Now the Arduino knows the w5100 is finished. TX_WR = TX_RD
|
|
|
|
« Last Edit: October 21, 2011, 06:26:47 am by SurferTim »
|
Logged
|
|
|
|
|
Valencia, Spain
Offline
Edison Member
Karma: 65
Posts: 2251
|
 |
« Reply #159 on: October 21, 2011, 07:21:52 am » |
Add: If it is using that TX_WR pointer to build a packet, that is ok. I could deal with that if it would change the registers in the correct order. Like this:
In my experience the W5100 doesn't change TX_WR. Try printing out the socket number as well - make sure you're looking at the same socket (it's all I can think of...)
|
|
|
|
« Last Edit: October 21, 2011, 07:23:40 am by fungus »
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3449
|
 |
« Reply #160 on: October 21, 2011, 07:50:50 am » |
I know why mine is showing different TX_RD and TX_WR values now. It is because I am an idiot! I am opening and closing a new socket for each send. That will not create this error, and why the values are different for each send. My bad.
The error should occur sending multiple packets on the same socket connection. In that case, my "correction code" should work. If I break up my send into multiple packets and check the registers between packets, they increment as I expect.
Here is a thought. Is there anything that prevents that buffer from wrapping around and overwriting itself if the w5100 gets behind sending packets?
And another thought. If it isn't building the packets in this buffer, where is it building them? How many packets can it store there before that memory overflows.
|
|
|
|
« Last Edit: October 21, 2011, 08:18:33 am by SurferTim »
|
Logged
|
|
|
|
|
Valencia, Spain
Offline
Edison Member
Karma: 65
Posts: 2251
|
 |
« Reply #161 on: October 21, 2011, 10:42:48 am » |
Here is a thought. Is there anything that prevents that buffer from wrapping around and overwriting itself if the w5100 gets behind sending packets?
Nope. That's entirely up to you. You can write random values to TX_WR if you want to but don't expect useful results. And another thought. If it isn't building the packets in this buffer, where is it building them? How many packets can it store there before that memory overflows.
In the "Reserved" RAM from 0x0800-0x4000...? It has to store the start/size of each packet as you write them to the buffer so all that info probably goes in the reserved RAM too. This is why I suggested above that it might run out of space if you send lots of single byte packets without checking the interrupt flag for success after each one.
|
|
|
|
|
Logged
|
|
|
|
|
New River, Arizona
Offline
God Member
Karma: 15
Posts: 876
Arduino rocks
|
 |
« Reply #162 on: October 21, 2011, 12:46:13 pm » |
Here is a thought. Is there anything that prevents that buffer from wrapping around and overwriting itself if the w5100 gets behind sending packets?
Yes there is. In socket send it hangs in a loop until there's enough room to put the stuff in. That's one of the loops I talked about earlier that could (possibly) hang up forever. There's some weird use of the various flags that is difficult to understand. On page 33 of the datasheet is where they talk about the tx RR and RW registers being equal to be part of a comprehensive check of the end of a transmission sequence. The other parts of that are the IR register being right and the SR having the proper value. In their support forum, they call out that ALL that stuff has to be in place before copying data into the buffer, and the ethernet library appears to be doing a good job of checking that stuff. While it's true that you can write random stuff into the memory of the 5100, it's not true that you can if you are using the ethernet library. There are a lot of checks to prevent that. Also, like Surfer, I checked the tx RW register pretty heavily for changes and it was always right. The chip indicates that it isn't done messing with the tx buffer using the tx RSR register. That's why they check it twice in getTXFreeSize, although their pseudocode in the the datasheet shows hanging in a loop until it doesn't change as another method. That's essentially what Surfer did (if I read it right). The ethernet code DOES check the IR register for success after each packet sent in socket send. However, there is an escape in there for when the other end closes the connection. That's another loop that I wonder about. There are two loops in there that have the possibility of waiting forever. Especially if one of those tiny packets gets lost somewhere and there's some obscure bug in the ACK, RETRY logic. So, I'm convinced that the problem we're seeing is due to tiny packets going into the buffer faster than the chip can service them since it has to actually send them before it can move on. I'm not sure what to do about it though. The board isn't run in interrupt mode, it's in poll mode and polling takes time that allows for things like this to happen.
|
|
|
|
|
Logged
|
|
|
|
|
Miramar Beach, Florida
Offline
Faraday Member
Karma: 50
Posts: 3449
|
 |
« Reply #163 on: October 21, 2011, 01:11:15 pm » |
I submitted an enhancement request on the Arduino IDE code website a while back. I just went back there, and someone had left a comment that it had fixed his problem. It concerned accessing the transmit buffer free space register and sending a large image file. http://code.google.com/p/arduino/issues/detail?id=642&start=200And he was sending each byte in its own packet, then checking TXFreeSize(). That is what client.free() in this code returns. That is what this would do, right? while (client.free() > 0) { client.write(currentFile.read()); } Edit: If you didn't notice, this guy has figured out the SS lines for the two SPI interfaces. He is reading from the SD card, and writing to the w5100 without changing the SS lines. 
|
|
|
|
« Last Edit: October 21, 2011, 01:21:56 pm by SurferTim »
|
Logged
|
|
|
|
|
Valencia, Spain
Offline
Edison Member
Karma: 65
Posts: 2251
|
 |
« Reply #164 on: October 21, 2011, 01:33:37 pm » |
On page 33 of the datasheet is where they talk about the tx RR and RW registers being equal to be part of a comprehensive check of the end of a transmission sequence ...... ALL that stuff has to be in place before copying data into the buffer
Doesn't that make the whole circular buffer thing pointless? The ethernet code DOES check the IR register for success after each packet sent in socket send.
Oh, yes, in "socket.cpp"... However, there is an escape in there for when the other end closes the connection. That's another loop that I wonder about. There are two loops in there that have the possibility of waiting forever. Especially if one of those tiny packets gets lost somewhere and there's some obscure bug in the ACK, RETRY logic.
All arrows in the state diagram on page 29 lead back to "CLOSED" so it can't loop forever without a bug in the W5100. It's easy to find out if that's where the crash is happening...add a timer to the loop which tests IR and print messages: uint16_t sendto(...) { ... unsigned long start = millis(); while ( (W5100.readSnIR(s) & SnIR::SEND_OK) != SnIR::SEND_OK ) { /* m2008.01 [bj] : reduce code */ if ( W5100.readSnSR(s) == SnSR::CLOSED ) { close(s); return 0; } if ((millis()-start)>10000) { start = millis(); Serial.println("wait for IR is taking too long!"); } } ...
So, I'm convinced that the problem we're seeing is due to tiny packets going into the buffer faster than the chip can service them since it has to actually send them before it can move on.
If that's true it should have been solved by waiting for (TX_RD==TX_WR) before sending each packet.
|
|
|
|
|
Logged
|
|
|
|
|
|