I've asked a lot of questions here, so now that I'm on the trail of something I'd like to contribute back!
I've been struggling with this for a while now. I have multiple (25+) Arduino Ethernet devices that send and receive UDP at a rate of 10 to 40 Hz, with packets of over 40 bytes. During testing we found that the units randomly froze. Some make it to 24 hours, but most freeze after 2 to 4 hours. Resetting them with a watchdog did the trick, but that caused another issue because the PWM output was also reset.
I tried a whole bunch of things... from disabling the SD socket's SPI select pin (driving it HIGH) to resetting the W5100 with an output pin...
After a typical debugging session I found out that it froze during the routine that sends the UDP packets, and after some more debugging I found that the call
Udp.endPacket();
caused all the problems. The sketch calls the function, but the function never returns...
Digging into the library source, I found a lot of "while" loops that wait until a certain condition is met. One of them, in socket.cpp, is the following (found after, yes, more debugging!):
int sendUDP(SOCKET s)
{
  W5100.execCmdSn(s, Sock_SEND);
  /* +2008.01 bj */
  while ((W5100.readSnIR(s) & SnIR::SEND_OK) != SnIR::SEND_OK)
  {
    if (W5100.readSnIR(s) & SnIR::TIMEOUT)
    {
      /* +2008.01 [bj]: clear interrupt */
      W5100.writeSnIR(s, (SnIR::SEND_OK|SnIR::TIMEOUT));
      return 0;
    }
  }
  /* +2008.01 bj */
  W5100.writeSnIR(s, SnIR::SEND_OK);
  /* Sent ok */
  return 1;
}
And sometimes it never leaves this while loop! If the W5100 never sets either the SEND_OK or the TIMEOUT bit, there is no other way out.
So what I did was the most DIRTY way of finding out what actually happens:
int sendUDP(SOCKET s)
{
  W5100.execCmdSn(s, Sock_SEND);
  /* Poll counter. It must start at 0: an uninitialized local holds garbage. */
  int t = 0;
  /* +2008.01 bj */
  while (((W5100.readSnIR(s) & SnIR::SEND_OK) != SnIR::SEND_OK) && t < 25500)
  {
    if (W5100.readSnIR(s) & SnIR::TIMEOUT)
    {
      /* +2008.01 [bj]: clear interrupt */
      W5100.writeSnIR(s, (SnIR::SEND_OK|SnIR::TIMEOUT));
      return 0;
    }
    t++;
  }
  /* +2008.01 bj */
  W5100.writeSnIR(s, SnIR::SEND_OK);
  if (t >= 25500)
  {
    /* Gave up waiting: neither SEND_OK nor TIMEOUT showed up in time */
    return 2;
  }
  /* Sent ok */
  return 1;
}
It just increments an integer on every pass through the loop; once that counter reaches a set limit, the function exits the while loop and lets me know through its return value.
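To experiment with this bounded wait away from the hardware, the loop can be sketched as plain C++ with the register read swapped for a callback. This is only a sketch under assumptions: waitForSend, SendResult, kSendOk and kTimeout are made-up names that are not part of the Ethernet library, read_status stands in for W5100.readSnIR(s), and the bit values are illustrative. Unlike the patch above, it reads the status once per pass instead of twice.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Illustrative bit masks standing in for SnIR::SEND_OK and SnIR::TIMEOUT.
constexpr std::uint8_t kSendOk  = 0x10;
constexpr std::uint8_t kTimeout = 0x08;

// Mirrors the three return values of the patched sendUDP (0, 1, 2).
enum class SendResult { Failed = 0, Ok = 1, GaveUp = 2 };

// Polls read_status at most max_polls times.
// read_status is a hypothetical stand-in for W5100.readSnIR(s).
SendResult waitForSend(const std::function<std::uint8_t()>& read_status,
                       unsigned max_polls)
{
    assert(max_polls > 0);
    for (unsigned polls = 0; polls < max_polls; ++polls) {
        const std::uint8_t status = read_status();  // one read per pass
        if (status & kSendOk)  return SendResult::Ok;      // chip reports success
        if (status & kTimeout) return SendResult::Failed;  // chip reports timeout
    }
    return SendResult::GaveUp;  // neither bit appeared: stop waiting
}
```

A callback that always returns 0 makes waitForSend come back with GaveUp, which corresponds to the "2" case of the patch.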
And guess what: the devices have now been running for over 5 days without issues... The function does return "2" on occasion, but the units keep running without problems!
So, I know it's not the cleanest way of finding out, but it works for me!
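One thing to keep in mind is that a fixed poll count translates to a different wait time depending on how long each register read takes. A possible alternative, sketched here as an assumption and not tested on the units, is to bound the loop by elapsed time instead. waitUntil and its parameters are hypothetical names; on an Arduino the now_ms callback would be something like millis(), and done would wrap the SEND_OK check.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Waits until done() returns true or timeout_ms milliseconds have passed.
// Returns true if done() succeeded, false if the wait timed out.
// now_ms is a hypothetical millisecond clock (millis() on an Arduino).
bool waitUntil(const std::function<bool()>& done,
               const std::function<std::uint32_t()>& now_ms,
               std::uint32_t timeout_ms)
{
    assert(done && now_ms);  // both callbacks must be set
    const std::uint32_t start = now_ms();
    while (!done()) {
        // Unsigned subtraction stays correct when the clock wraps around.
        const std::uint32_t elapsed =
            static_cast<std::uint32_t>(now_ms() - start);
        if (elapsed >= timeout_ms) {
            return false;  // gave up, the caller decides what to do next
        }
    }
    return true;
}
```

The unsigned subtraction is the usual way to compare millisecond timestamps, so the wait still ends correctly if the counter rolls over in the middle of it.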
I'd really like your collective thoughts or constructive criticism, and I'd like to know if anyone else has run into this issue or has another way of solving it.