UDP fails to send after a web page is served

OK, I have a fairly large (HUGE) program that up to now has worked perfectly.

It hosts an internal web page (Mega with a W5100 chip, a Freetronics EtherMega).

It has to send a UDP packet once a minute as a heartbeat to another Arduino, and that's where things get weird.

It fails to send if I load the web page on any PC. I have Wireshark running on my PC and, for testing, I have the UDP send on a once-per-second call.
So far that works perfectly: Wireshark shows a UDP packet every second.

However, as soon as I browse to the internal web page, the UDP packets stop, even though the UDP send returns no errors. Nothing else changes; the packets just stop being sent.

I have printed the socket status before and after the HTTP client send, and showSockStatus shows that the packet was sent over UDP to the PC. Before and after browsing the hosted web page, you can definitely see the listening port 80 open, and on socket 2 you can see the data being sent to the web browser.
Socket 0 stays open in my sketch as a listening port (listening on 540).
Socket 1 is server.begin.
Sockets 2 and 3 are free. udp.begin usually uses socket 2, but if I happen to browse the site, it switches to socket 3.

What's even weirder is that my internal hosted website has a button that calls the UDP send manually - and THAT WORKS.
However, the 1-second countdown timer that calls exactly the same code doesn't work.

So literally:
Every second - UDP send (works for as long as I don't touch the website).
Browse the internal website - UDP send stops working.
Browse the internal website and click the "Send" button, which calls the same 1-second code - that works. But the 1-second timer UDP code, which was working, never works again, apart from very sporadically while I continue to browse the hosted HTTP pages.

But MANUALLY calling the code works every time.
I also have an ICMP ping instance, but that was disabled for this test.

Things I have tried:
Different Ethernet libs (1.0 up to 2.0).
Serial.printing all return statuses - all return codes indicate that the UDP packet was sent.
No error messages at all.
showSockStatus shows only the send port moving from socket 2 to socket 3 after accessing the internal web page, so I was thinking "code issue with socket 3?", but a manual send USING the web page still works.

The NTP set-time client (which also uses UDP) works fine.

Sometimes it starts working again on the one-second timer after a minute, and sometimes closing the serial console lets a few more packets through. But once I browse that internal website, the only guaranteed way to get the packets that the countdown timer should send once per second is to trigger that same send manually.

OK - another oddity I just discovered.
If I trigger the failure by browsing the website while the serial console is open, and then close the serial console, IT STARTS WORKING AGAIN for a minute or two... then breaks again.

Come to think of it, I have had issues with lockups when sending lots of data over the serial port while communicating with the WIZnet chip. Is there any kind of crossover between sending serial data and communicating over the SPI bus to the W5100?

Without having seen your code, I guess you have an out-of-memory problem.
There may be other problems; I would start by checking whether your timer code (whatever that means) runs correctly. Keep in mind that you must not call any code that depends on interrupts (e.g. anything using the Serial object) inside an interrupt handler.
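
To illustrate the interrupt point: a minimal sketch of the usual flag pattern, assuming the once-per-second countdown is driven by a timer interrupt (the thread never confirms that). The names onTimerTick, pollHeartbeat, sendHeartbeat and sendDue are hypothetical, and sendHeartbeat is only a stand-in for whatever routine does the real UDP send. It is plain C++ so it can be compiled off the board.

```cpp
#include <cstdint>

// Set by the timer interrupt, consumed by the main loop.
// volatile keeps the compiler from caching the value between reads.
volatile bool sendDue = false;

// Counts sends so the behaviour can be checked without hardware.
std::uint32_t heartbeatsSent = 0;

// Hypothetical stand-in for the real UDP heartbeat routine.
void sendHeartbeat() {
    ++heartbeatsSent;
}

// Hypothetical timer ISR body: only record that work is due.
// No Serial and no SPI/Ethernet calls belong in here.
void onTimerTick() {
    sendDue = true;
}

// Called from loop(): the real work happens outside interrupt context.
void pollHeartbeat() {
    if (sendDue) {
        sendDue = false;
        sendHeartbeat();
    }
}
```

In a real sketch, pollHeartbeat() would be called on every pass through loop(), so a tick that arrives while a web page is being served is handled as soon as loop() comes round again.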

Yeah, the code is a big project, about 25 individual tabs. I am thinking it is a memory issue. I have a fairly large buffer allocated that I put some canary code into, but that came out fine as well.

I was using session-based UDP "sender" code: open the port, send, close the port. I changed to opening it right at the beginning of the code and leaving it open for use. That actually fixed the issue, but it then ties up a single port on the W5100, and there are only 4.

2 are being used for full-time UDP comms and 1 for the web socket, leaving only one for the occasional NTP timekeeping packet.
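
The socket budget described above can be pictured with a toy model. This is only an illustration of the counting, not how the Ethernet library allocates sockets internally; SocketPool and its methods are hypothetical names, and the only figure taken from the thread is the limit of 4 sockets.

```cpp
#include <array>
#include <cstddef>

// Toy model of a chip with a fixed number of hardware sockets
// (4 on the W5100, as stated in the thread).
class SocketPool {
public:
    static constexpr std::size_t kSockets = 4;

    // Returns the index of the first free socket, or -1 if none is free.
    int acquire() {
        for (std::size_t i = 0; i < kSockets; ++i) {
            if (!inUse_[i]) {
                inUse_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Frees a socket; out-of-range indices are ignored.
    void release(int index) {
        if (index >= 0 && static_cast<std::size_t>(index) < kSockets) {
            inUse_[static_cast<std::size_t>(index)] = false;
        }
    }

    // Number of sockets still available.
    std::size_t freeCount() const {
        std::size_t n = 0;
        for (bool used : inUse_) {
            if (!used) ++n;
        }
        return n;
    }

private:
    std::array<bool, kSockets> inUse_{};  // all sockets start free
};
```

With two UDP sockets and one web socket held permanently, freeCount() is 1, and a second short-lived user (say ICMP while NTP is running) gets -1 from acquire().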

The reason I'm using 2 ports, one for send and one for receive, is that there is that W5100 UDP bug where it locks up if, while waiting for a recv, it recvs something else. Unless that's been fixed, or is fixable?

If that's been fixed, I will stick with using one single UDP port for all communications, which would be ideal, as then I can put my ICMP code back in. The 4 ports on the W5100 are a bit of a pain, and my code is using them to the fullest extent with a web server, ICMP, NTP and two-way UDP comms.

I don't remember having heard of this bug. Do you have a reference?
What does "it recvs something else" mean? That it receives a TCP packet during that time?

You should definitely think about whether you have chosen the right board for the task. For such a communication-centric application, a full-fledged OS may be a better choice (e.g. a Raspberry Pi).

Regarding the lockup, see GitHub - fredilarsen/Ethernet: Ethernet Library for Arduino:
"The problem occurs while sending a UDP packet and an incoming packet arrives at about the same time. A loop in EthernetClass::socketSendUDP in socket.cpp will loop until SEND_OK or TIMEOUT is received, but this never happens so the loop will go on forever, locking up the Arduino. A code may be received, but this is typically RECV."

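One way to picture the problem in that quote is a status-polling loop with an upper bound, so that seeing only RECV can never spin forever. This is a sketch of the general idea, not the code in fredilarsen/Ethernet or in the stock library. waitForSend, SendResult and the readStatus callback are hypothetical, and the flag values are assumptions made for the example.

```cpp
#include <cstdint>
#include <functional>

// Illustrative interrupt-flag bits; the values are assumptions for this sketch.
constexpr std::uint8_t kRecv    = 0x04;
constexpr std::uint8_t kTimeout = 0x08;
constexpr std::uint8_t kSendOk  = 0x10;

enum class SendResult { Ok, Timeout, GaveUp };

// Polls readStatus until SEND_OK or TIMEOUT is seen, but gives up after
// maxPolls attempts instead of looping forever.
SendResult waitForSend(const std::function<std::uint8_t()>& readStatus,
                       std::uint32_t maxPolls) {
    for (std::uint32_t i = 0; i < maxPolls; ++i) {
        const std::uint8_t status = readStatus();
        if (status & kSendOk)  return SendResult::Ok;
        if (status & kTimeout) return SendResult::Timeout;
        // Any other flag (for example RECV) is not a send result: keep polling.
    }
    return SendResult::GaveUp;
}
```

On real hardware readStatus would wrap whatever reads the socket's interrupt register. Returning GaveUp hands the decision (retry, reset the chip, reboot) back to the application.
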
Yeah, I'm looking at a more powerful system, but the reason the client and I are loving these is that they don't have a possibly flaky SD card, and there is no new dev time spent programming on a full Linux system. Plus, the Arduino Mega just has soooo many IO options.

I have trimmed down some of the code though, and I think I can drop the UDP down to only one open session shared by NTP and the UDP comms, leaving 3 sockets spare. ICMP is still critical though, as these units need to check they have continuous network connectivity, and reboot the W5100, or the ENTIRE unit, if connectivity is lost.
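
The "reboot the W5100, or the ENTIRE unit" policy could be sketched as a small failure counter that escalates. LinkMonitor, Action and both thresholds are hypothetical; how a check is made (ICMP ping, a UDP acknowledgement) and how the resets are carried out are left to the application.

```cpp
#include <cstdint>

enum class Action { None, ResetEthernetChip, RebootUnit };

// Tracks consecutive failed connectivity checks and escalates:
// first reset the Ethernet chip, then reboot the whole unit.
class LinkMonitor {
public:
    LinkMonitor(std::uint8_t chipResetAfter, std::uint8_t rebootAfter)
        : chipResetAfter_(chipResetAfter), rebootAfter_(rebootAfter) {}

    // Feed in the result of one connectivity check.
    Action report(bool reachable) {
        if (reachable) {
            failures_ = 0;  // any success clears the history
            return Action::None;
        }
        if (failures_ < 255) ++failures_;  // saturate instead of wrapping
        if (failures_ >= rebootAfter_)     return Action::RebootUnit;
        if (failures_ == chipResetAfter_)  return Action::ResetEthernetChip;
        return Action::None;
    }

private:
    std::uint8_t chipResetAfter_;
    std::uint8_t rebootAfter_;
    std::uint8_t failures_ = 0;
};
```

For example, LinkMonitor monitor(3, 6) asks for a chip reset on the third consecutive failure and a full reboot on the sixth. Because it relies on replies from the far end, it can also catch the case where every send call reports success but nothing leaves the chip.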

I understand that this was never included in the main library because this code simply resets the hardware if the described situation occurs. This would break many applications out there, including yours. It seems that this is a hardware bug and Fred Larsen did his best to work around it in the library, but in my opinion the error handling shouldn't be done in the library but in the application. Resetting the hardware might be reasonable, but the application must know about it, as it might be necessary to restart the whole application.

That's probably a better solution for your case than using Fred Larsen's version of the Ethernet library.

If you use an industrial-grade SD card and optimize the OS to make only as many writes as necessary, a Raspberry Pi would not give you more trouble than an ordinary Arduino.
Programming on Linux might involve learning a few new things, but many things are much easier on Linux than they are on Arduino (networking is definitely one of these areas), so if the application is rather network-oriented (as yours seems to be) I still think a Linux-based solution is the better choice. There are other boards with many more IOs if that's what you actually need.

A hybrid solution is the Yun, although that board hasn't as many IOs as the Mega2560 of course.

I still haven't found a direct way to reset the W5100 in these situations. In some cases I have managed to do something to the chip that even hardware resets didn't fix, and I had to actually cut power to it. I would like to avoid those kinds of situations.

I do have a watchdog implemented to at least reboot the device if it hangs while trying to talk to the Ethernet chip, but there are situations, such as the port lockup, that allow the sketch to keep running while it has zero idea that there is an issue.

For example, in my issue here, the socket send returns success every time, before and after the "bug" occurs. udp.send etc. all return the proper codes, but according to Wireshark the packet never leaves the W5100, and as far as I can tell not even the library is aware of it. It could even be a hardware bug in the W5100, which would suck pretty badly.

I thought I would post an update to this.

It turned out to be bad power; the WIZnet chip seems to be very picky about its power delivery. I was running my Freetronics Mega off the micro USB, and as soon as I started powering it externally from a 12 volt power brick all my problems went away.

I also had to go through all my code to confirm I didn't have any buffer overruns, and I definitely found a few that could cause issues.

Thanks for all your help. The entire code base is around 6500 lines long, compiles to around 85k in size and at compile time uses 5091 bytes. It's a fairly complicated bit of code, including a web browser, log browser, NTP client, temperature, pH and flow rate sensing, relay control, web-based authentication, sending email, UDP communications, pinging gateways, and on-the-fly rebooting of the network interface without rebooting the entire unit. It has watchdog support, RTC support and a browsable file interface to download and delete files (I have got it transferring files at around 32kb a second too), as well as an on-the-fly serial logging system. Coming soon is verbose and terse logging to serial that also gets logged to the SD card.

It's a beast!
