NTP client stuck in Udp.endPacket()

After heavily modifying my code, my NTP client gets stuck in Udp.endPacket() and stays there forever. Any ideas?

Yes. Post your heavily modified code, or edit it down to isolate the problem. Did it work without mods?

Thanks. Yes, the original code was stable. There is certainly a bug I cannot see. I'll post the code when I'm back home.

OK. If you have time, try editing it down to just the basics and see if you get the same fail. If so, post that code.

Will do. Indeed, it's pretty lengthy.

Here are the sketch and the faulty output. I left all the functions in for reference, but most of loop() is commented out.
The IDE is 1.6.5, with a Mega board and an Ethernet shield.
Thanks

_2_minimal_with_bug.ino (50.2 KB)

_2 minimized.txt (3.22 KB)

It will be a few hours before I can check your code. Which Arduino are you using? Is it a Mega 2560?

Add some error checking to the packet sends.

beginPacket returns 1 if a socket was available, and 0 if not. endPacket returns 1 if the next device en route to the destination accepted the packet, and 0 if not.

Here is how I evaluate that:

  if(Udp.beginPacket(ntpServer, ntpPort) == 1) {
    Udp.write(packetBuffer, NTP_PACKET_SIZE);
    if(Udp.endPacket() != 1) Serial.println(F("Send error"));
  }
  else {
    Serial.println(F("Socket error"));
  }

You don't want to try to send a packet if there was no socket available. I have not tested that to see what happens, but I can almost guarantee the result will not be pretty.

Thanks SurferTim, my board is a Mega 2560, IDE 1.6.5.

You wrote "endPacket returns 1 if the next device en route to the destination accepted the packet, and 0 if not." The next device is the router, and that exchange happens at layer 2, so the packet should always be accepted by the router or else be resent by the MAC layer. More probably, it (endPacket) looks for an L4 acknowledgement by the NTP server of a UDP packet.

I am embarrassed: this code behaves like a quantum system. Whenever I try to zoom in on it by making the failure occur sooner, it changes behavior and starts working! For example, with the original line 220

 if(ntpSuccess){epochLocal=epoch; initClock();epoch2hms(epochLocal, hmsLocal);}

it will "always" fail, but this "always" is not very frequent, since it happens only once an hour. If the line is changed to

if(ntpSuccess){epochLocal=promptEpoch(); initClock();epoch2hms(epochLocal, hmsLocal);}

so as not to wait so long, it will always (a much better "always" this time) succeed!

Now this is really strange: why should this change in the code influence the other end (the NTP server) in any way? It suggests there is some bug hidden in the code, but again, the failure rate is so low that we cannot say it is deterministic.

In your code you suggest to test if(Udp.endPacket() != 1)

but, referring to output file pasted below

... gottosync sync index is 9
sending ntp packet
sending begin packet
begin packet sent, sending write packet
write packet sent, sending end packet

the function Udp.endPacket() never returns, so how would we (in case of failure) test its value? And how come a function from the Arduino library never returns? Doesn't it use some kind of timeout so that it returns whatever the result?

Are you aware there are two sendNTPpacket functions in your posted code? One before getEpoch and one after.

So what you are saying is the beginPacket function is returning 1? In your case, the beginPacket function call could fail for two reasons. 1) DNS resolution of your NTP server failed. 2) No socket available.

I don't know what you mean by this:

More probably, it (endPacket) looks for a L4 acknowledgement by the ntp server of a udp packet.

UDP does not get an acknowledgement from the destination device, only from the next device en route to that destination. The only way to determine if the packet was received is a response UDP packet from the server. That would be in your getEpoch function.
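To make the reply side concrete, here is a minimal sketch of what such a wait could look like, assuming the usual pattern of polling parsePacket() until a deadline and reading the transmit timestamp from bytes 40 to 43 of the 48-byte reply. The names waitForNtpReply, ntpToUnixEpoch and the nowMs clock parameter are hypothetical and not taken from the posted sketch. It is written as a template so it can take an EthernetUDP object with millis, or a test double.

```cpp
#include <stdint.h>
#include <stddef.h>

const size_t NTP_PACKET_SIZE = 48;            // size of an NTP header in bytes
const uint32_t SEVENTY_YEARS = 2208988800UL;  // seconds from 1900-01-01 to 1970-01-01

// Convert the transmit timestamp of an NTP reply (bytes 40..43, big-endian
// seconds since 1900) into Unix epoch seconds.
uint32_t ntpToUnixEpoch(const uint8_t *reply) {
  uint32_t secsSince1900 = ((uint32_t)reply[40] << 24) |
                           ((uint32_t)reply[41] << 16) |
                           ((uint32_t)reply[42] << 8) |
                           (uint32_t)reply[43];
  return secsSince1900 - SEVENTY_YEARS;
}

// Poll for a reply until timeoutMs has elapsed (hypothetical helper).
// Returns true and sets epochOut only if a full NTP packet arrived in time.
template <typename UdpT, typename ClockFn>
bool waitForNtpReply(UdpT &udp, ClockFn nowMs, uint32_t timeoutMs,
                     uint32_t &epochOut) {
  uint8_t reply[NTP_PACKET_SIZE];
  const uint32_t start = nowMs();
  // Unsigned subtraction keeps the comparison valid across clock rollover.
  while ((uint32_t)(nowMs() - start) < timeoutMs) {
    if (udp.parsePacket() >= (int)NTP_PACKET_SIZE) {
      udp.read(reply, NTP_PACKET_SIZE);
      epochOut = ntpToUnixEpoch(reply);
      return true;
    }
  }
  return false;  // no reply: the request or the answer was lost
}
```

On the board this would be called as something like waitForNtpReply(Udp, millis, 1000, epoch). Note that it only covers the receive side; it cannot help if endPacket itself never returns.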

edit: Like I said before, if the beginPacket function call fails for the reasons I listed above, the endPacket function will certainly fail (or crash).

SurferTim: Are you aware there are two sendNTPpacket functions in your posted code? One before getEpoch and one after.

Two sendNTPpacket? I only see one (there is another getEpoch, commented out). At what lines are you seeing two definitions of sendNTPpacket?

I ran the exact posted code, with your code added, at 15:00 my time. It failed. Here is the output:

... gottosync sync index is 4 sending ntp packet

stuck there

Analysing the code

  if(Udp.beginPacket(ntpServer, ntpPort) == 1) {
    Udp.write(packetBuffer, NTP_PACKET_SIZE);
    if(Udp.endPacket() != 1) Serial.println(F("Send error"));
  }
  else {
    Serial.println(F("Socket error"));
  }

Since "Socket error" was not printed, it is clear that Udp.beginPacket returned 1. And since "Send error" was not printed either, this probably means that, again, Udp.endPacket() never returned.

Never mind the packet acknowledgement protocol for now.

I repeat my question: why does Udp.endPacket not return? Why is it stuck and not aborted by a timeout?

I'll run it again and display the results of the 3 calls. Hopefully it will fail again, stuck in Udp.endPacket.

This is the code for 16:00

  Serial.print("Udp.beginPacket is ");Serial.println(Udp.beginPacket(ntpServer, ntpPort));
  Serial.print("Udp.write is ");Serial.println(Udp.write(packetBuffer, NTP_PACKET_SIZE));
  Serial.print("Udp.endPacket is ");Serial.println(Udp.endPacket());

OK, post the serial monitor display for that code for 16:00. It should be:

Udp.beginPacket is 1
Udp.write is 48
Udp.endPacket is 1

If not, you have a problem.

edit: My bad. I will reiterate my answer. If the beginPacket function fails, the endPacket function will fail or crash. I have not tested which it will do.

If the beginPacket and write functions succeed, then the endPacket function should return after a timeout if the send to the next device en route to the destination fails. This part I have tested. As I recall, the timeout value will be about 2 seconds.
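As a rough sketch, the three expected return values can be checked in one helper that reports which call failed and never calls endPacket after a failed beginPacket. The SendResult enum and the sendChecked name are hypothetical, not part of the Ethernet library; the helper is a template only so that it accepts EthernetUDP or a stand-in object. In older IDE versions it may be safer to keep a template like this in a separate .h file.

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical result codes, one per call that can fail.
enum SendResult {
  SEND_OK,            // all three calls returned the expected value
  SEND_BEGIN_FAILED,  // beginPacket != 1: DNS failure or no free socket
  SEND_SHORT_WRITE,   // write accepted fewer bytes than requested
  SEND_REJECTED       // endPacket != 1: the send did not complete
};

template <typename UdpT, typename ServerT>
SendResult sendChecked(UdpT &udp, ServerT server, uint16_t port,
                       const uint8_t *buf, size_t len) {
  // Stop here if there is no socket, so write/endPacket are never reached.
  if (udp.beginPacket(server, port) != 1) return SEND_BEGIN_FAILED;
  // For an NTP request len is 48, so write should return 48.
  // On a short write the packet is abandoned, not sent.
  if (udp.write(buf, len) != len) return SEND_SHORT_WRITE;
  if (udp.endPacket() != 1) return SEND_REJECTED;
  return SEND_OK;
}
```

A call would look like sendChecked(Udp, ntpServer, ntpPort, packetBuffer, NTP_PACKET_SIZE). This only classifies failures that return; it does nothing for a call that hangs.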

Output of the run:

gottosync sync index is 5 sending ntp packet
Udp.beginPacket is 1
Udp.write is 48
Udp.endPacket is

Conclusion: Udp.endPacket did not return, which is summed up by the title of this thread.

If, as you said, it SHOULD return after a ~2 s timeout, it would mean that either i) I have broken code in the library, or ii) my program corrupts variables used by endPacket.

What do you think? We are lucky to have a consecutive series of failures. What should we try for 17:00? :confused:

That is strange. Are you certain you are not running out of SRAM? That would cause corrupted variables.
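One common way to check is a small free-memory probe printed just before the send. This is a sketch that assumes the AVR toolchain symbols __heap_start and __brkval are available, as they normally are on a Mega 2560; the function name freeRam is only a convention. On any non-AVR build it returns -1 so the code still compiles.

```cpp
#if defined(__AVR__)
// Symbols provided by the AVR C runtime: start of the heap and the
// current top of the heap (0 until the first allocation).
extern int __heap_start, *__brkval;
#endif

// Approximate number of free bytes between the heap and the stack.
// Returns -1 when not built for AVR (assumption: no equivalent probe there).
int freeRam() {
#if defined(__AVR__)
  int v;  // a local variable, so its address is near the top of the stack
  return (int)&v - (__brkval == 0 ? (int)&__heap_start : (int)__brkval);
#else
  return -1;
#endif
}
```

Printing Serial.println(freeRam()) right before Udp.beginPacket on each hourly sync would show whether the number shrinks from one sync to the next.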

I made all these "heavy" modifications to stable code because, after adding a web server, I had some instabilities. So I decided to remove all Strings and go for char arrays, because I read that there is no way to know whether a dynamic allocation triggered by a String will fail. I don't know if I am running out of memory. I'll free 2 KB of global variables for the 17:00 run.

It is coincidental that you should mention the String data type. I was just going through your code to check for that. I have never had good luck with the String type. It has ALWAYS managed to crash my code, and the larger the code size, the more likely and quicker it crashes.

Sketch uses 31,614 bytes (12%) of program storage space. Maximum is 253,952 bytes.
Global variables use 3,458 bytes (42%) of dynamic memory, leaving 4,734 bytes for local variables. Maximum is 8,192 bytes.

17:00
gottosync sync index is 6 sending ntp packet
Udp.beginPacket is 1
Udp.write is 48
Udp.endPacket is

18:00, moved some strings to program memory
Sketch uses 31,608 bytes (12%) of program storage space. Maximum is 253,952 bytes.
Global variables use 2,826 bytes (34%) of dynamic memory, leaving 5,366 bytes for local variables. Maximum is 8,192 bytes.

gottosync sync index is 7 sending ntp packet
Udp.beginPacket is 1
Udp.write is 48
Udp.endPacket is

Have you tried the nap client example again with the same setup stuff as your new code?

SurferTim: Have you tried the nap client example again with the same setup stuff as your new code?

nap client??