I'm hoping someone with more knowledge of these things might be able to shed some light on an odd problem I'm having.
I have two "Internet of Things" microcontroller projects, one older that sends data to ThingSpeak, one newer that sends data to GroveStreams. Both synchronize their time with an NTP server once an hour at 57 minutes after the hour. The code is significantly different between the two. Likewise the hardware is not identical (different MCU, different Internet offload module).
The older project uses a single hard-coded server (time-c.nist.gov) while the newer project does a DNS lookup on pool.ntp.org and so uses a different server every time it is reset. UDP being what it is, I built retry mechanisms into both in case a response is not received from the NTP server. If no response is received after some number of retries, the code resets the MCU via the watchdog timer. The retry parameters are different. The old project will reset after five failures to receive an NTP response, with 60 seconds between retries. The newer project resets after three failures with 30 seconds between retries.
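The retry scheme described above might look roughly like the sketch below. It is not the code from either project: RetryPolicy, syncWithRetries, requestTime and waitMs are names made up for illustration, and the actual NTP exchange and the delay are passed in as callables so that the retry logic stands on its own.

```cpp
#include <stdint.h>

// Hypothetical settings. The post gives 5 failures / 60 s for the older
// project and 3 failures / 30 s for the newer one.
struct RetryPolicy {
    uint8_t  maxAttempts;    // failed attempts tolerated before giving up
    uint32_t retryDelayMs;   // pause between attempts, in milliseconds
};

enum class SyncResult { Synced, GiveUp };

// requestTime(): sends one NTP request, returns true if a reply arrived.
// waitMs(ms):    blocks for ms milliseconds (delay() on an Arduino).
template <typename RequestFn, typename WaitFn>
SyncResult syncWithRetries(const RetryPolicy& policy,
                           RequestFn requestTime,
                           WaitFn waitMs)
{
    for (uint8_t failures = 0; failures < policy.maxAttempts; ) {
        if (requestTime()) {
            return SyncResult::Synced;
        }
        ++failures;
        if (failures < policy.maxAttempts) {
            waitMs(policy.retryDelayMs);   // no wait after the final failure
        }
    }
    // Assumption: the caller reacts to GiveUp by letting the watchdog
    // timer expire, which resets the MCU.
    return SyncResult::GiveUp;
}
```

With RetryPolicy{3, 30000} the function gives up after three unanswered requests and two 30-second waits; with RetryPolicy{5, 60000} it tolerates five.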
I have now seen several instances where both start failing to receive NTP responses simultaneously. This may go on for several minutes. Last night, the newer project reset five times; after each reset it tried a different NTP server, and all of them failed to respond (except after the last reset, of course). Starting at the same time, the older project saw three failures but did not reset. Then both recovered and went on their merry way.
I can't imagine what might cause this. It's definitely not a general network outage, as data continues to be sent to ThingSpeak and GroveStreams successfully. Also, I know that I was online during at least one of these occurrences and didn't notice any issues from the computer.
It's as though all the NTP servers tried are offline at the same time, but that doesn't make any sense to me. This really isn't a huge issue but it is certainly unexpected.
Is your router accepting the NTP packet? The Arduino Ethernet library's endPacket() function returns a value that indicates whether the next device (not the destination device) took the packet. Does your code allow the same error checking? This is from EthernetUdp.h.
// Finish off this packet and send it
// Returns 1 if the packet was sent successfully, 0 if there was an error
virtual int endPacket();
beginPacket() also allows error checking. Its return value depends on the parameters passed.
// Start building up a packet to send to the remote host specified in ip and port
// Returns 1 if successful, 0 if there was a problem with the supplied IP address or port
virtual int beginPacket(IPAddress ip, uint16_t port);
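A minimal sketch of what checking both return values could look like when sending an NTP request. The function name sendNtpRequest and the SendStatus enum are hypothetical. The function is written as a template so it accepts any object with beginPacket(), write() and endPacket() following the conventions quoted above (an EthernetUDP instance should fit), and it assumes the usual 48-byte client request sent to port 123.

```cpp
#include <stdint.h>
#include <string.h>

const int NTP_PACKET_SIZE = 48;   // size of a basic NTP request

enum class SendStatus { Sent, BeginFailed, EndFailed };

// Udp:     any type with beginPacket(), write() and endPacket() that
//          return 1 on success, as in the header comments above.
// Address: whatever beginPacket() accepts, e.g. IPAddress.
template <typename Udp, typename Address>
SendStatus sendNtpRequest(Udp& udp, const Address& server)
{
    uint8_t packet[NTP_PACKET_SIZE];
    memset(packet, 0, sizeof(packet));
    packet[0] = 0xE3;   // LI = 3 (unsynchronized), version 4, mode 3 (client)

    if (udp.beginPacket(server, 123) != 1) {   // 123 is the NTP port
        return SendStatus::BeginFailed;        // bad address or port
    }
    udp.write(packet, sizeof(packet));
    if (udp.endPacket() != 1) {
        return SendStatus::EndFailed;          // packet was not sent
    }
    return SendStatus::Sent;
}
```

Logging anything other than SendStatus::Sent alongside the timeouts would show whether a failed sync ever got as far as leaving the board.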
@SurferTim, good question, I was not checking the returns from those calls. I will do so. If it were the router, that would certainly explain it. What might be a reason for the router not accepting NTP packets (temporarily)?
Thanks very much!
"Never test for an error condition you don't know how to handle." :-[
That depends on how your Arduino gets its IP. If it comes from the router by DHCP, maybe the lease expired. Some routers are fussy that way. Or maybe the Ethernet port on the router fails temporarily.
The router is where I would start looking, especially since the failures happen simultaneously. That is almost certainly a router problem.
I agree, the router is a prime suspect. The Arduino is using DHCP. My code does an Ethernet.maintain() hourly. I do check the status from that and they all look good. I can see in the router log where DHCP assigned the IP address every time the Arduino restarted, and there are no other error messages. It's a pretty sketchy log though; even though I've got all the options turned on, there's very little information in it. It consists almost entirely of DHCP IP assignments and UPnP events.
Then I would modify your code to include the error checking I mentioned earlier in this thread. It may not be the localnet side of the router. It may be your ISP or some device on the WAN side of the router that fails.
If the router continues to take the packets, but no packets return, then that will tell you something also.
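One way to act on that distinction is to tally what happened on each attempt and look at the counts afterwards. This is only a sketch under assumptions: AttemptOutcome, SyncTally, Suspect and likelySuspect are invented names, and the verdict is a rough hint, since a single dropped UDP packet anywhere along the path also shows up as "no reply".

```cpp
#include <stdint.h>

// What happened on a single NTP attempt (hypothetical classification).
enum class AttemptOutcome {
    SendRejected,    // beginPacket() or endPacket() did not return 1
    NoReply,         // packet was accepted locally but nothing came back
    ReplyReceived
};

struct SyncTally {
    uint16_t sendRejected = 0;
    uint16_t noReply = 0;
    uint16_t replies = 0;

    void record(AttemptOutcome outcome) {
        switch (outcome) {
            case AttemptOutcome::SendRejected:  ++sendRejected; break;
            case AttemptOutcome::NoReply:       ++noReply;      break;
            case AttemptOutcome::ReplyReceived: ++replies;      break;
        }
    }
};

enum class Suspect { None, LocalSide, Upstream };

// LocalSide: the packet never left cleanly (board, cable, router port).
// Upstream:  sends were accepted but at least one request timed out
//            (WAN side of the router, ISP, or the NTP server itself).
Suspect likelySuspect(const SyncTally& tally)
{
    if (tally.sendRejected > 0) return Suspect::LocalSide;
    if (tally.noReply > 0)      return Suspect::Upstream;
    return Suspect::None;
}
```

A run of timeouts with no rejected sends would come out as Suspect::Upstream, which points away from the localnet side of the router.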
SurferTim:
I'll see if we can get that part fixed (hopefully).
Thanks for that! I hope to get the changes to test the return value into my code today. Then it will probably need to run for a while. I'm not sure exactly how often this occurs; my best guess is once or twice a week.
Got one. The NTP server that had been working for about 30 hours straight started failing to respond (or at least it looked like it from here). Three failures, then a reset, got a different NTP server, three timeouts in a row, another reset, yet a different server, one timeout, then a response on the second try.
No bad status codes from either beginPacket() or endPacket(). I guess that means the router is OK and my ISP or something on the WAN is the trouble.
Not sure how to troubleshoot further. Seems like I'd have to know right when it happens, then do a tracert to find where the failure is. In this case the problem only lasted about two minutes. Maybe I just live with it, and tune my retry code accordingly.
PS: Interestingly, the other IoT logger synced successfully while this one failed, so I guess it doesn't always occur simultaneously as I thought previously.
How often are you polling the NTP server? Some are a bit sensitive to how often the same IP contacts them and will quit responding to requests from that IP for a while.
SurferTim:
Or use a localnet NTP server like I do.
Haha, sweet. I've wondered about that.
How often are you polling the NTP server? Some are a bit sensitive to how often the same IP contacts them and will quit responding to requests from that IP for a while.
Once per hour. NIST says not faster than once every four seconds (!) so nowhere close to that. Also I don't know whether that would explain the situation where my MCU restarts and gets a new server IP from the pool each time. I assume any throttle would be at the server level, but I suppose it could be done at the pool.
Guess you better! I'm sitting here watching mine run. I have my router set to issue the same IP to my RPi, but you could just as easily set a static IP in the RPi. I'm using the WiFi shield, but you could use the Ethernet shield just as well. I have the timeServer IP set to my RPi IP and it is working great! Let me know if yours does as well.