W5100 stops responding after several UDP packets

Hi everybody,

I'm using an Arduino Mega 2560 with the Ethernet shield (W5100) to talk to a device that communicates through TCP and UDP (UDP port 2222 only). I can change the Request Poll Interval (RPI) from 1 to 3200 ms. We usually use 50 ms, but it fails about an hour later. I tried 1 ms just to verify whether a faster rate makes it fail sooner, and it did: at 1 ms I get the error approximately 5 minutes after communication starts.

At first it just hung the Arduino. Then I noticed that it was getting stuck in the UDP library function "socketSendUDP", so I made a small modification to the library to let it continue, but eventually the Ethernet shield stops working anyway.

To replicate the problem I simplified the Arduino code by removing the TCP portion, and I made an app that sends UDP packets to a specified IP address and UDP port. It still happens.

The error that I'm getting from the UDP library is SnIR::RECV (0x04)

This is the Arduino code:

#include <Ethernet.h>
#include <EthernetUdp.h>
#define UDP_TX_PACKET_MAX_SIZE 200

byte mac[] = {0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED};
IPAddress ip(192, 168, 1, 177);
IPAddress myDns(192, 168, 1, 1);
IPAddress gateway(192, 168, 1, 1);
IPAddress subnet(255, 255, 255, 0);

unsigned int UDPPort = 2222;      // local port to listen on
unsigned char packetBuffer[UDP_TX_PACKET_MAX_SIZE];  // buffer to hold incoming packet,
EthernetUDP Udp;

void setup() {
  pinMode(4, OUTPUT);
  digitalWrite(4, HIGH);
  Serial.begin(115200);
  Ethernet.init(10);
  Ethernet.begin(mac, ip, myDns, gateway, subnet);
  Udp.begin(UDPPort);
  Serial.print("Adapter address:");
  Serial.println(Ethernet.localIP());
}

void loop() {
  int packetSize = Udp.parsePacket();
  if (packetSize) {
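    // Clamp the read length so an oversized packet cannot overflow packetBuffer
    Udp.read(packetBuffer, min(packetSize, (int)sizeof(packetBuffer)));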
    byte udp_response[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    IPAddress remote = Udp.remoteIP();
    Udp.beginPacket(remote, 2222);
    Udp.write(udp_response, sizeof(udp_response));
    int err = Udp.endPacket();
    if (err != 1) {
      Serial.print("MEM: ");
      Serial.print(freeRam());
      Serial.print(", err: ");
      Serial.println(err);
    }
  }
}
int freeRam () {
  extern int __heap_start, *__brkval;
  int v;
  return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval);
}

And this is the app to send the UDP packets:
UDPTransmitter.zip (7.4 KB)

This is the modification in the UDP library:

int EthernetClass::socketSendUDP(uint8_t s)
{
	SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
	W5100.execCmdSn(s, Sock_SEND);

	/* +2008.01 bj */
	int countertimeout = 0;
	while ( (W5100.readSnIR(s) & SnIR::SEND_OK) != SnIR::SEND_OK ) {
		if (W5100.readSnIR(s) & SnIR::TIMEOUT) {
			/* +2008.01 [bj]: clear interrupt */
			W5100.writeSnIR(s, (SnIR::SEND_OK|SnIR::TIMEOUT));
			SPI.endTransaction();
			//Serial.printf("sendUDP timeout\n");
			return SnIR::TIMEOUT;
		}
		if (W5100.readSnIR(s) & SnIR::RECV) {
			/* +2008.01 [bj]: clear interrupt */
			W5100.writeSnIR(s, (SnIR::DISCON));
			SPI.endTransaction();
			//Serial.printf("sendUDP timeout\n");
			return SnIR::RECV;
		}
		SPI.endTransaction();
		yield();
		SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
		//countertimeout++;
		//if(countertimeout > 1000) return W5100.readSnIR(s);
	}

	/* +2008.01 bj */
	W5100.writeSnIR(s, SnIR::SEND_OK);
	SPI.endTransaction();

	//Serial.printf("sendUDP ok\n");
	/* Sent ok */
	return 1;
}
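
The commented-out countertimeout lines hint at a bounded wait. As a minimal sketch in plain C++ (not the Ethernet library's code), the loop could be limited as follows. The names `waitForSendResult`, `WAIT_GAVE_UP` and the `readIR` callback are hypothetical, and the bit values are assumed to match the library's SnIR constants.

```cpp
#include <cstdint>
#include <functional>

// Assumed to mirror the library's SnIR bit values.
constexpr uint8_t IR_SEND_OK = 0x10;
constexpr uint8_t IR_TIMEOUT = 0x08;

// Hypothetical result code for "neither flag appeared in time".
constexpr int WAIT_GAVE_UP = -1;

// Polls readIR() until SEND_OK or TIMEOUT is seen, or maxPolls is reached.
// Returns 1 on SEND_OK, IR_TIMEOUT on TIMEOUT, WAIT_GAVE_UP otherwise.
int waitForSendResult(const std::function<uint8_t()>& readIR, uint32_t maxPolls) {
    for (uint32_t i = 0; i < maxPolls; ++i) {
        uint8_t ir = readIR();  // read the register once per pass
        if (ir & IR_SEND_OK) return 1;
        if (ir & IR_TIMEOUT) return IR_TIMEOUT;
    }
    return WAIT_GAVE_UP;
}
```

In the real function, `readIR` would correspond to W5100.readSnIR(s). A bounded wait only keeps the sketch from hanging; it does not explain why SEND_OK never arrives.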

Be aware that this happens only after many packets, maybe thousands. It usually takes less than 10 minutes at 1 ms RPI.

Thanks in advance

Which version of the Arduino IDE are you using? Your code looks significantly different from what I have, so I cannot be clear on what changes you have made. Could you clarify what you have changed, maybe mark it up in some way, or post a diff output?

The data you receive is never looked at (not surprising as you have gone to the trouble of reducing the code to clearly show the problem), but can you print out the size of the data given by parsePacket? Just wondering if some "rogue" packet on the network could be causing a problem (unlikely but worth just checking).

What are you using to send the packets to the Mega? Can you post the code (so anyone can try using exactly the code you have). If, for example, it works for them and not for you then that might point to a hardware issue with the Ethernet card.

Do you have an alternative Ethernet card and Mega on which you could try, just to eliminate the hardware? You could probably use an Uno if you have not got another Mega, as the code looks as though it should run on either (and probably many other Arduinos too!).

Hi Paul, thanks for answering, and sorry for the late response. I guess I should have mentioned that I work third shift :)

Arduino IDE version: 1.8.15

Do you mean the library code or the sketch code?

  • Library code: it originally returns a boolean, true when completed or false when timed out. The problem is that it wasn't timing out, so I modified it to return an integer as an "error code" with the actual response from the transaction.
  • Sketch: I based my code on the "UDPSendReceiveString" example. You can use that example (I guess); just replace the following:
char ReplyBuffer[] = "acknowledged";

with the following:

char ReplyBuffer[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

The one from 192.168.1.2 is what I receive from the machine; the one from 192.168.1.177 is what I send back.

I guess it got hidden in the mess of my message, but I actually uploaded the app that talks to the Mega (UDPTransmitter.zip, attached to my first post).

I'm not at work right now. I can try tomorrow with a different Arduino, but I don't have any other Ethernet card.

I have been testing different things, and this is what I found:

  1. If I only receive at 1 ms, there is no problem at all. I tested for more than 30 minutes and it never failed; remember that it usually fails in <= 5 minutes.

  2. The same happens if I only send at the same RPI.

  3. I tried sending and receiving again, and the problem showed up again.

  4. To see whether the transition from receive to send is too fast, I added a 5 ms delay between receive and send, and guess what, it ran for hours without a single problem:

    Udp.read(packetBuffer, UDP_TX_PACKET_MAX_SIZE);
    delay(5);
    Udp.beginPacket(Udp.remoteIP(), Udp.remotePort());
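    // Pass the length explicitly: ReplyBuffer starts with 0, so the string overload would send nothing
    Udp.write(ReplyBuffer, sizeof(ReplyBuffer));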
    Udp.endPacket();

This could be a solution, but only if my RPI is >= 10 ms, which I'm OK with. But more than just making it work for this application, I would like to actually understand and, if possible, fix the problem. Maybe it is a physical problem in the W5100, or maybe it is a bug in the library.

Reading the W5100 datasheet and digging into the library, it looks like I will always receive "SnIR::RECV", which means data has been received (even when I tried to clear it, it didn't clear). What is actually happening is that I'm receiving neither "SnIR::SEND_OK" nor "SnIR::TIMEOUT".
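
One detail that may be relevant here: as far as I understand the W5100 datasheet, an Sn_IR bit is cleared by writing a 1 to that same bit, so writing SnIR::DISCON would leave the RECV bit set. The following is a toy model in plain C++, not library code. `FakeSnIR` is a hypothetical stand-in for the register, and the bit values are assumed to match the library's SnIR constants.

```cpp
#include <cstdint>

// Assumed to mirror the library's SnIR bit values.
constexpr uint8_t IR_RECV   = 0x04;
constexpr uint8_t IR_DISCON = 0x02;

// Hypothetical model of a write-1-to-clear interrupt register.
struct FakeSnIR {
    uint8_t value = 0;

    // The chip sets bits when events happen.
    void raise(uint8_t bits) { value |= bits; }

    // The host clears bits by writing 1s to exactly those bits.
    void write(uint8_t bits) { value &= static_cast<uint8_t>(~bits); }

    uint8_t read() const { return value; }
};
```

In this model, after `raise(IR_RECV)`, a `write(IR_DISCON)` leaves RECV set, while `write(IR_RECV)` clears it. Whether clearing RECV inside socketSendUDP is desirable at all is a separate question that this sketch does not answer.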

I was blaming the machine I'm talking to, but it also happens with the app on the computer, which tells me it is not the machine (maybe I'm wrong).

Thanks for your help, I really appreciate it. Hopefully we can find the root cause and maybe a solution to this problem.

I forgot, there is another change needed in the "UDPSendReceiveString" example: the original buffer size is 24 bytes and, as you can see, I'm receiving at least 25. I did the following to increase the buffer size.

I redefined UDP_TX_PACKET_MAX_SIZE to 200 just in case I receive a huge packet:

#include <Ethernet.h>
#include <EthernetUdp.h>
#define UDP_TX_PACKET_MAX_SIZE 200

Hi,

The replyBuffer you use appears to be 22 characters long - how come the software receiving that sometimes seems to get larger amounts and replies with larger amounts? Have I missed something? The amount appears to exceed your new buffer size of 200 in one case.

The delay of 5 ms apparently allowing everything to work raises the obvious question: what happens as you reduce that to 4, 3, 2 and 1? Even 0? If things work at 1 ms but not 0 ms, perhaps use micros() to get a clearer value for the required pause.

I am wondering if the problem might be with SPI not coping with lots of very rapid changes. Have you changed any of the SPI settings or are they all still default? I would have thought that if the problem was with SPI then it would have shown up somewhere else, such as for those using SdFat, but a slight difference could be all it needs to work.
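
To sweep the pause in microsecond steps, a wraparound-safe elapsed-time check is handy. This is a minimal sketch in plain C++; `pauseElapsed` is a hypothetical helper, and on the Arduino the two timestamps would come from micros().

```cpp
#include <cstdint>

// True once at least pauseUs microseconds have passed since startUs.
// Unsigned subtraction keeps the result correct when the 32-bit
// microsecond counter wraps around.
bool pauseElapsed(uint32_t nowUs, uint32_t startUs, uint32_t pauseUs) {
    return static_cast<uint32_t>(nowUs - startUs) >= pauseUs;
}
```

Record micros() right after Udp.read(), then hold off Udp.beginPacket() until the check passes. Note that micros() has a resolution of 4 microseconds on 16 MHz boards, so finer steps than that will not be visible.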

When you changed the code and added your extra section it appears to include original comments that made me think the original code you had was different to mine - my mistake.

I suspect that we are in different time zones (I'm in BST) which might also make one message each per day quite likely on this thread!

Just noticed that in w5100.h there is a definition:

// Safe for all chips
#define SPI_ETHERNET_SETTINGS SPISettings(14000000, MSBFIRST, SPI_MODE0)

but also for slower models:

#define SPI_ETHERNET_SETTINGS SPISettings(8000000, MSBFIRST, SPI_MODE0)

Might be worth trying the setting for the slower model and see if that has any effect. It could solve one problem but create another of course!
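
One thing worth checking before trying this: on a 16 MHz AVR such as the Mega, the hardware SPI clock cannot exceed half the CPU clock, so both settings may end up at the same bus speed. The sketch below is plain C++, not the SPI library. `effectiveAvrSpiClock` is a hypothetical helper that assumes the library picks the fastest power-of-two divider (2 to 128) whose clock does not exceed the request.

```cpp
#include <cstdint>

// Returns the SPI clock an AVR would actually use for a requested clock,
// assuming the fastest divider in {2, 4, ..., 128} that does not exceed it.
uint32_t effectiveAvrSpiClock(uint32_t cpuHz, uint32_t requestedHz) {
    for (uint32_t div = 2; div <= 128; div *= 2) {
        if (cpuHz / div <= requestedHz) return cpuHz / div;
    }
    return cpuHz / 128;  // slowest available setting
}
```

With cpuHz = 16000000, a request of 14000000 and a request of 8000000 both give 8000000 under that assumption, which would mean the change makes no difference on a Mega.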

Hi, I tried from 0 to 3 ms and all of them fail. When I tried 4 ms it gave me a single failure from time to time, but not as bad as at 3 ms. 5 ms seems to be the lowest reliable delay so far.

I was thinking the same, but with a Mega I think it is safe to run at 14 MHz. Actually, lowering that speed may create more issues, because I believe I don't have enough response time from the library.

I thought that the change would only affect the speed of the SPI bus, not that of the Arduino so should not alter the response time of the library - are we on the same page?

Yes, we are. Actually, I tried something else: I enabled the web server along with the UDP connection, and it crashes the same way as soon as the web page reloads. To me it looks like, even though the W5100 can have multiple sockets open at the same time, it can't service them simultaneously, and that makes it crash.

I was using the Web server example that reads the 6 analog inputs

You might be right, but I would have expected such an issue to have shown up before. Do you know exactly which version of the Ethernet shield you have? IIRC earlier ones had an SD card slot that did not work, for example. It could also be a fault on that one Ethernet card that only shows up under heavy load.

I have not seen that on an Arduino Ethernet card, but I have not used that many of them. I have seen the Ethernet on a workstation cause problems resulting in reboots. The diagnostics said all was OK, but eventually a way was found to show the fault reproducibly, and once the motherboard (which included the Ethernet) was changed the problem disappeared. The engineer was very reluctant to change the motherboard, thinking it was a software issue, but when shown that the problem only appeared on one workstation he accepted that it might be hardware. He changed the motherboard and could see that the problem had gone, but was still unsure why.

I may need to test it with another Ethernet card, but I don't have a spare right now. I may get another later; for the moment I can live without the web server and with the 5 ms delay.

Thanks a lot for all your help