Author Topic: Ethernet Shield Unreliable?  (Read 9919 times)
Miramar Beach, Florida
Faraday Member
Karma: 144
Posts: 5985

The routines are not blocking. You can do stuff in the while loop.
Code:
while(client.connected())
{
   // reset watchdog timer here or anything else
   // but stay in the loop until client.connected() is false 

   while(client.available())
   {
      client.read();
   }
}
Logged

Newbie
Karma: 0
Posts: 17

Well, I made the change: I put the code you posted in the sendDataToServer() function (and put a watchdog reset in the outer loop), and I still have the problem.

I am struggling now; I will probably have time to post my full code on the weekend.
Logged

Miramar Beach, Florida
Faraday Member

I was hoping that solved it for you.

Post your code this weekend and I'll give it a try. I have a couple of PHP servers and access to the server logs.

Wouldn't want to ruin your beer! :D
« Last Edit: February 02, 2012, 06:45:24 pm by SurferTim » Logged

Newbie

It may be difficult to test it fully because there is other hardware attached that you won't have, including some Dallas 1-Wire DS18B20 temperature sensors and this nifty display/input device based on the TM1638, which has eight 7-segment digits, eight buttons, and eight bi-color LEDs, all driven by three lines.

http://www.dealextreme.com/p/81873?r=68099021

The sensors and display operate fine without the Ethernet client. I tracked the problem down to the Ethernet client by lighting a separate LED for each function within the loop, then clearing that LED once the function completes. It always failed during the Ethernet client part of the code, so I could see the LED for this function lit each time it locked up (before I had the watchdog to auto-reset).

I can create a duplicate database and destination for you to test with if you wish; it's a piece of cake. This way you can test with the same platform I am using.

Well, it's my birthday today, so I will not be on for a while.

Cheers for the help, everyone who drinks one of my beers appreciates it.
Logged

Newbie

Full source code (watch out, it's long, in fact too long to post here...)

http://kadenetwork.com/Beer/beer.ino
Logged

Miramar Beach, Florida
Faraday Member

Happy birthday! Have a beer for me. :)

I have not tested it yet, but I think this is incorrect. I cut out all the testing stuff so you can see it.
Code:
while(client.connected())
{
   while(client.available())
   {  
      //appears to lock up here...
      client.read();
   }

   client.stop(); // This may try to close the connection before you receive anything

}

The client.stop() should be after the server closes the connection.

Code:
while(client.connected())
{
   while(client.available())
   {  
        client.read();
   }
}

client.stop();

Give that a try.

Add: If that doesn't do it, there is one more thing you can try. Here is the bug report and patch. The bug corrupts the return value from client.available(), so the code will stay in that loop forever.
http://code.google.com/p/arduino/issues/detail?id=605

The Arduino crew promises this will be fixed in the next version. But who knows when that will be. The report was filed 6 months ago. :(

This is the same bug/patch that hardcore was complaining about back in reply #32. That patch fixed his problem.

I am beginning to think that "the next version" is like the Spanish word "mañana". It does not really mean "tomorrow". It means "not today".

« Last Edit: February 03, 2012, 03:14:22 pm by SurferTim » Logged

Jr. Member
Karma: 0
Posts: 98

I've given up on these patches. Instead, I 'pulled' the central repository off GitHub and am applying the patches to my own copy, but only after testing each one, and then merging in my own patches.

There is another 'stupid' bug in w5100.cpp:
Code:
  initSS();
  writeMR(1<<RST);
  writeTMSR(0x55);
  writeRMSR(0x55);

  for (int i=0; i<MAX_SOCK_NUM; i++) {
    SBASE[i] = TXBUF_BASE + SSIZE * i;
    RBASE[i] = RXBUF_BASE + RSIZE * i;
  }

If people READ the chip's datasheet, they would see that it takes 10 ms for the chip to complete a 'software' reset (W5100 V1.2.4, p. 64). As such, the code should be:

Code:
  initSS();
  // Issue a software reset to the W5100 chip.
  writeMR(1<<RST);
  // Wait 100 ms for the chip to physically reset (it takes 10 ms, but the manufacturer requests 100 ms).
  delay(100);
  // Continue with the setup of the internal registers.
  writeTMSR(0x55);
  writeRMSR(0x55);

Fortunately, the W5100 chip has some default values, so when the reset stamps on the values just written, they are reinitialized to the defaults.
There may be a chance this bug is stamping on the setting up of the sockets...

I'm still trying to get an answer back from the manufacturer as to *if* the RST bit is 'testable', because if it is, I would drop in a subroutine that checks it, both BEFORE and after the reset (which would negate the need for the stupid delay(300)) ;)
That would give a TESTABLE state that could be returned up the stack. Currently, when initializing a library on the Arduino, the routines mostly return 'void'.
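
For illustration only, here is a rough sketch of what such a "testable" reset could look like. It assumes the RST bit reads back as set until the chip has finished resetting, which is exactly the point that is still unconfirmed above. FakeChip, readMR() and resetAndWait() are made-up names for this example, not part of the library, and the code is plain C++ so it can be tried off the board.

```cpp
#include <cstdint>

// Bit position of RST in the mode register, used the same way as 1<<RST above.
const std::uint8_t RST = 7;

// Hypothetical stand-in for the chip: the RST bit stays set for a number
// of reads after a reset is requested, then clears by itself.
struct FakeChip {
    int readsWhileBusy;   // reads that still see RST set
    std::uint8_t mode;    // contents of the mode register

    explicit FakeChip(int busyReads) : readsWhileBusy(busyReads), mode(0) {}

    void writeMR(std::uint8_t value) { mode = value; }

    std::uint8_t readMR() {
        if (readsWhileBusy > 0) {
            --readsWhileBusy;
        } else {
            mode = static_cast<std::uint8_t>(mode & ~(1u << RST));
        }
        return mode;
    }
};

// Request a software reset, then poll RST up to maxPolls times.
// Returns true once RST reads back clear, false if it never does.
template <typename Chip>
bool resetAndWait(Chip& chip, int maxPolls) {
    chip.writeMR(static_cast<std::uint8_t>(1u << RST));
    for (int i = 0; i < maxPolls; ++i) {
        if ((chip.readMR() & (1u << RST)) == 0) {
            return true;
        }
        // On real hardware a short delay(1) between polls would go here.
    }
    return false;
}
```

The point of the bool return is that the caller can pass the result up the stack instead of assuming the reset worked.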



Logged

Miramar Beach, Florida
Faraday Member

@hardcore: What is the delay(300) for? I assumed that was a power-up delay.

At one point during testing, I was checking the operation of the W5100 by reading back one of the registers the begin() routine had just written. I rearranged the code to pass the return value back up the stack with a uint8_t return type rather than void. With a void return, the begin() routine runs as if all is OK, even when there is no Ethernet shield connected. There is something wrong about that. If the shield has failed mechanically, how would you know?

Code:
writeRMSR(0x55);
// if it isn't 0x55, return error
if(readRMSR() != 0x55) return(0);
// rest of setup
// return ok
return(1);
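
A self-contained version of the same idea, as a sketch only. FakeW5100 and initChecked() are invented names for this example, and a missing shield is modelled (as an assumption) by reads that always come back as 0x00. It is plain C++ so it can be run without the hardware.

```cpp
#include <cstdint>

// Hypothetical register model. present == false imitates a shield that
// is not there: writes are lost and reads return 0x00.
struct FakeW5100 {
    bool present;
    std::uint8_t rmsr;

    explicit FakeW5100(bool isPresent) : present(isPresent), rmsr(0) {}

    void writeRMSR(std::uint8_t value) {
        if (present) {
            rmsr = value;
        }
    }

    std::uint8_t readRMSR() const { return present ? rmsr : 0x00; }
};

// Write the register, read it back, and report the result to the caller
// instead of returning void. 1 = OK, 0 = readback mismatch.
template <typename Chip>
std::uint8_t initChecked(Chip& chip) {
    chip.writeRMSR(0x55);
    if (chip.readRMSR() != 0x55) {
        return 0;
    }
    // The rest of the setup would go here.
    return 1;
}
```

A caller can then refuse to go on, or light an error LED, when the return value is 0.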

Logged

Newbie

Yes, the stop() was in the wrong place. It was the last thing that I changed, and I accidentally put it there.

I have already applied the 605 patch.

I will try putting the stop() on the outside to see if that resolves the problem. I still have my doubts though.

** off to the garage to reprogram my beer controller **
Logged

Miramar Beach, Florida
Faraday Member

If that doesn't do it, you may need to set up the serial output and use the serial monitor to watch what is happening in that client.available() loop. The first thing I would check is the value returned by client.available(). If it is always larger than 1000, even though you have made no additional requests to the server, then the 605 fix did not work in your case.

Before the 605 fix, the value returned from client.available (in my case) would start at about 1400 and go down to 1024, then start again at about 1400, and do that over and over forever. If you display the characters from the client.read(), it will be the same few hundred characters over and over. That will lock it in that loop. The watchdog timer would have to recover it at that point.
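
One way to keep a stuck available() from trapping the sketch is to put a cap on how many bytes the read loop will accept. The following is only a sketch of that idea: FakeClient is a made-up stand-in for the Ethernet client (its stuck flag imitates the symptom described above), and drainWithBudget() and its byte limit are assumptions for this example, not anything in the library. It is plain C++ so it can be tried without the shield.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the client. It serves body and then looks
// disconnected. With stuck set, available() never reaches zero.
struct FakeClient {
    std::string body;
    std::size_t pos = 0;
    bool stuck = false;

    bool connected() const { return stuck || pos < body.size(); }

    int available() const {
        if (stuck) {
            return 1024;
        }
        return static_cast<int>(body.size() - pos);
    }

    int read() {
        if (stuck) {
            return 'x';
        }
        return pos < body.size() ? static_cast<unsigned char>(body[pos++]) : -1;
    }
};

// Drain the response, but give up once more than maxBytes have been read.
// Returns true if the connection ended normally, false if the budget ran out.
template <typename Client>
bool drainWithBudget(Client& client, std::size_t maxBytes) {
    std::size_t count = 0;
    while (client.connected()) {
        while (client.available() > 0) {
            client.read();
            if (++count > maxBytes) {
                return false;
            }
        }
    }
    return true;
}
```

On the board you would probably also want a millis() deadline in the outer loop, since a byte cap does nothing if the server stays connected and sends nothing at all.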
Logged

Newbie

Well, it has now reset twice since I made that change. There must be something else doing this that I haven't seen or taken into consideration.

It's a pain to debug it with serial because I have to bring out my laptop to the garage to do it and it could be hours before I see the problem.
Logged

Tesla Member
Karma: 141
Posts: 9551

Quote
It's a pain to debug it with serial because I have to bring out my laptop to the garage to do it and it could be hours before I see the problem.

From a high-level view, have you actually tried getting simple client code to connect reliably to your server for several days? If not, that may be a place to start.
Logged

Consider the daffodil. And while you're doing that, I'll be over here, looking through your stuff.   smiley-cool

Miramar Beach, Florida
Faraday Member

It seems you only have a few choices remaining. Do as zoomkat suggests and run a network test with a simple client program (he has some pretty good code for that), or use the serial monitor.

You could use one of your LEDs to signal that condition, but you would need to stare at the LED for maybe several hours, or record it with a video recorder. It will light up for only 8 seconds, then the watchdog timer will reset the board. There will be no history like there is with the serial monitor.

Add: I know it is a pain to troubleshoot this. It was a real pain in the backside for me to find that 605 bug and patch. But isn't your beer worth it? :)

I would (and did) start with this:
Code:
Serial.println("Starting read");
while(client.connected())
{
   while(client.available())
   {  
        Serial.println(client.available(),DEC);
        client.read();
   }
}
client.stop();
Serial.println("Finished read");

Then put
Serial.println("Setup finished");
at the end of the setup() routine.

By the order of the messages in the serial monitor, you should be able to determine where it is hanging up. I removed the watchdog reset and added the Serial output in the available loop.

« Last Edit: February 05, 2012, 10:19:58 am by SurferTim » Logged

Jr. Member

The delay(300) appears to have been put in the library by some other programmer because there was a problem with the Ethernet shield taking longer to come up than the Arduino.
However, when the programmer forces a chip reset, they fail to allow for the physical constraints of the chip. It all depends on just 'how far' into the code you can get in 10 ms.

The problem with checking other registers is that one does not know 'how' the W5100 resets internally. You would have to know exactly which registers are configured and which one was configured last by the W5100.
Then there is the issue of doing this over SPI: if the bus is flapping about, it is possible to get values back for a nonexistent device.
(I picked up a power reset bug on the Mega with a W5100 + SD card, where after a reset you MUST physically turn OFF both devices BEFORE configuring them.)
Code:
  pinMode(SS_PIN, OUTPUT);      // Set the SS pin as an output (necessary to keep
                                // the board as master and not an SPI slave)
  digitalWrite(SS_PIN, HIGH);   // and ensure SS is high

  // Ensure we are in a consistent state after power-up or a reset button press.
  // These pins are standard for the Arduino W5100 Rev 3 Ethernet board.
  // They may need to be re-jigged for different boards.
  pinMode(ETHER_CS, OUTPUT);    // Set the CS pin as an output
  digitalWrite(ETHER_CS, HIGH); // Turn off the W5100 chip! (wait for configuration)
  pinMode(SD_CS, OUTPUT);       // Set the SD card CS pin as an output
  digitalWrite(SD_CS, HIGH);    // Turn off the SD card! (wait for configuration)
The above cleared up loads of problems, specifically because during a reset you do not know HOW the pins will come up, and there were situations where the SD card was selected at the same time as the W5100 was being configured, or vice versa. I have some really nice captures of the SPI bus after a reset, with and without the above patch.

The other issue is that many of the *bugs* don't actually appear until you hit the right values; casting from a uint to an int can cause a sign extension.
One other issue is the sheer sluggishness of the library. In my initial testing while cleaning up the raw driver, I found I could get no speed increase even after some fairly shrewd optimizations, which leads to the conclusion that the delays are being introduced higher up the communication stack.
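
To show the kind of sign-extension problem meant here, a small example in plain C++. The function names are made up for this illustration and it is not taken from the library source. It also assumes the usual two's-complement wrap when an out-of-range unsigned value is cast to a signed type, which is what the common compilers do.

```cpp
#include <cstdint>

// Widening a 16-bit value by way of a signed type: anything with the
// top bit set turns negative and is sign-extended on the way up.
std::uint32_t widenViaSigned(std::uint16_t raw) {
    std::int16_t asSigned = static_cast<std::int16_t>(raw);
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(asSigned));
}

// Widening while staying unsigned keeps the value unchanged.
std::uint32_t widenUnsigned(std::uint16_t raw) {
    return static_cast<std::uint32_t>(raw);
}
```

With a small value such as 0x0400 both functions agree, so the code looks fine in testing. With 0x8400 the signed route gives 0xFFFF8400 while the unsigned route gives 0x8400, which is why this sort of bug only shows up once the values get large enough.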

HC
« Last Edit: February 05, 2012, 07:52:40 pm by hardcore » Logged
