I'm running OSC communication between a computer and several Arduino Megas (#A000067 MEGA2560 REV 3) with Ethernet-PoE shields (#A000075 Rev3). Everything works at first, but after a few days some of them (not always the same ones) end up in a state where communication works in one direction (the Arduinos send OSC data to the computer) but not the other (the Arduinos don't receive OSC messages from the computer).
Because it takes a while (sometimes several days) for this to happen, I haven't yet been able to isolate whether the issue is code-related.
Has anyone experienced similar problems before? Or do you have an idea where this issue could originate from?
The computer and the PoE Arduinos are connected through a Netgear FS116P PoE switch. I know some people report problems with specific switches, but this one works fine, and the problem only occurs after a few days (though I don't want to rule out the switch prematurely).
You may have socket problems. Add this to your sketch and call the function whenever you think appropriate. It prints the status of all sockets to the serial monitor. OSC appears to use UDP, so look for a socket with a status of 0x22 (UDP) on the port you expect OSC traffic to arrive on.
#include <utility/w5100.h>

byte socketStat[MAX_SOCK_NUM];  // last status read for each socket

// Print status, local port, and destination IP/port of every W5100 socket.
void ShowSockStatus()
{
  for (int i = 0; i < MAX_SOCK_NUM; i++) {
    Serial.print(F("Socket#"));
    Serial.print(i);
    uint8_t s = W5100.readSnSR(i);  // socket status register (0x22 = UDP)
    socketStat[i] = s;
    Serial.print(F(":0x"));
    Serial.print(s, 16);
    Serial.print(F(" "));
    Serial.print(W5100.readSnPORT(i));  // local port
    Serial.print(F(" D:"));
    uint8_t dip[4];
    W5100.readSnDIPR(i, dip);  // destination IP address
    for (int j = 0; j < 4; j++) {
      Serial.print(dip[j], 10);
      if (j < 3) Serial.print(F("."));
    }
    Serial.print(F("("));
    Serial.print(W5100.readSnDPORT(i));  // destination port
    Serial.println(F(")"));
  }
}
Thanks for the tip. I implemented the function and am logging the socket status at regular intervals. I'll report back if the system does corrupt either the IP address or the port at some point.
I imagine all I'd need to do then is monitor the socket information, compare it to what it's supposed to display, and then force a hardware reset in case of discrepancies.
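A minimal sketch of that comparison, written as plain C++ so the logic can be checked off-device. The names SocketSnapshot, countUdpListeners, and listenerHealthy are hypothetical, and it assumes the OSC listener should show up as exactly one socket with status 0x22 on the expected local port. On the Arduino, each snapshot entry would be filled from W5100.readSnSR(i) and W5100.readSnPORT(i), the same calls ShowSockStatus() already makes.

```cpp
#include <cstddef>
#include <cstdint>

// One socket's state, as read from the status and local-port registers.
// (Hypothetical helper type, not part of the Ethernet library.)
struct SocketSnapshot {
    uint8_t  status;  // value returned by readSnSR()
    uint16_t port;    // value returned by readSnPORT()
};

const uint8_t kStatusUdp = 0x22;  // status of an open UDP socket

// Count the sockets that are open in UDP mode on expectedPort.
int countUdpListeners(const SocketSnapshot* socks, size_t n, uint16_t expectedPort) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (socks[i].status == kStatusUdp && socks[i].port == expectedPort) {
            count++;
        }
    }
    return count;
}

// Healthy means exactly one listener: zero means it is gone,
// two or more means a stale duplicate is still open.
bool listenerHealthy(const SocketSnapshot* socks, size_t n, uint16_t expectedPort) {
    return countUdpListeners(socks, n, expectedPort) == 1;
}
```

If listenerHealthy() returns false, the sketch could log the full socket table before resetting, so the failing state is captured. How the reset itself is triggered is left open here, since it depends on how the shield's reset line is wired.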
I haven't been able to replicate the problem yet while capturing the output. Frustrating. If I could trigger the problem faster, it would be a lot easier to debug.
What would be a way of expediting the arrival of the socket issue? Simply pushing a lot more data? At a faster interval?
Are there any variables in the Ethernet library that I could set lower in order to force the issue? (I've seen people mention MAX_SOCK_NUM. Or possibly some waiting durations?)
My watchdog setting forces a restart if the code stalls for more than 1 s. After a restart, EthernetUDP sometimes seems to open a second port even though the first one is still open. This does NOT happen after every watchdog restart. I'm not saying it's unrelated, but it seems that only once in a while does it lead to the one-way communication block.
So while it still is able to send out communication on one port, it is now listening for incoming communication on the wrong second port?
Would I be able to close and reopen these ports in code, or is this where I really need a hardware reset? Possibly a longer delay before attempting to reopen a port after a restart?
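One way to approach this in code, sketched under assumptions: at startup, before the UDP listener is opened, look for sockets that are still open in UDP mode on the OSC port from before the restart and close them. The helper below (the hypothetical name staleUdpSockets) only does the selection, in plain C++ so it can be checked off-device. It takes the per-socket values that ShowSockStatus() reads with W5100.readSnSR(i) and W5100.readSnPORT(i).

```cpp
#include <cstddef>
#include <cstdint>

const uint8_t kSockStatusUdp = 0x22;  // status of an open UDP socket

// Return a bitmask of leftover UDP sockets bound to localPort:
// bit i is set when socket i should be closed before reopening the listener.
// Assumes n <= 8, which covers the socket count of the W5100.
uint8_t staleUdpSockets(const uint8_t* status, const uint16_t* port,
                        size_t n, uint16_t localPort) {
    uint8_t mask = 0;
    for (size_t i = 0; i < n && i < 8; i++) {
        if (status[i] == kSockStatusUdp && port[i] == localPort) {
            mask |= static_cast<uint8_t>(1u << i);
        }
    }
    return mask;
}
```

On the Arduino, each flagged socket could then be closed with W5100.execCmdSn(i, Sock_CLOSE), assuming the installed Ethernet library version exposes that call in utility/w5100.h. Whether this is enough to avoid a hardware reset would need to be confirmed with the socket log.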