I have some of those ESP-01 modules, and I built a project with one that I thought was working fine until I realized that after a while the module stopped accepting commands over WiFi ... so I did some troubleshooting and started a continuous ping to the module to see what happens ... and as it turns out, it stops responding to pings exactly 5 minutes after the module boots up. I have whittled the sketch down to this, and the behavior is still the same:
I was thinking about that, but a watchdog timer simply reboots the device when the firmware fails to service it in time. That isn't what's happening, and I have a problem with the notion that I would have to reboot these things every 5 minutes just to keep them working ... I've never read anywhere that this is required for these devices, unless I'm missing something?
That's a worthy effort for sure, I'll do that ... it's just that the 5-minute thing is too weird, and it wouldn't suggest to me that the module is losing its connection to the WiFi router, because in every test I've done (about 8 or 9 so far), it stops responding at EXACTLY 5 minutes after boot. I know it's 5 minutes because I start a stopwatch when the first ping is answered after I apply power.
But I'll do some serial checking as well just to be thorough.
OK, so I modified the sketch so that it sends a string to the serial port every 5 seconds, and that continues even after it stops responding to pings. So the device is definitely not locking up or anything like that.
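A heartbeat like this becomes more useful if each line also carries the uptime and the WiFi link state, so the drop is timestamped by the module's own clock instead of a stopwatch. Below is a minimal sketch in plain C++ so it can be tested off-device; `heartbeatLine` is a hypothetical helper, and on the ESP-01 you would feed it `millis()` and `WiFi.status() == WL_CONNECTED`, then print the result with `Serial.println()`.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds a heartbeat line such as "uptime 04:59 wifi=up".
// On the ESP-01, pass millis() and (WiFi.status() == WL_CONNECTED) and
// Serial.println() the result; it is plain C++ here so it can be tested.
std::string heartbeatLine(uint32_t uptimeMs, bool wifiConnected) {
  uint32_t totalSeconds = uptimeMs / 1000;
  uint32_t minutes = totalSeconds / 60;
  uint32_t seconds = totalSeconds % 60;
  char buf[40];  // longest possible line is well under 40 characters
  std::snprintf(buf, sizeof(buf), "uptime %02lu:%02lu wifi=%s",
                static_cast<unsigned long>(minutes),
                static_cast<unsigned long>(seconds),
                wifiConnected ? "up" : "down");
  return std::string(buf);
}
```

If the log keeps showing `wifi=up` past 05:00 while pings are failing, that would suggest the module still believes it is associated, which is worth knowing before chasing the router.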
It is odd that the WiFi drops at such a consistent time. Obviously you could add a check for it in loop() and reconnect on failure, but I'd be inclined to chase it down to see why it's dropping.
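The "check in loop and reconnect on failure" idea can be sketched as a small rate-limited guard, so a dead link triggers at most one reconnect attempt per back-off window. This is a hypothetical helper in plain C++, not an ESP8266 core API; on the device you would call `shouldReconnect(millis(), WiFi.status() == WL_CONNECTED)` from `loop()` and call `WiFi.reconnect()` when it returns true. It only helps if the drop is actually visible to `WiFi.status()`.

```cpp
#include <cstdint>

// Hypothetical guard for "reconnect on failure" in loop().
// Returns true when a reconnect should be attempted right now.
class ReconnectGuard {
 public:
  explicit ReconnectGuard(uint32_t minGapMs) : minGapMs_(minGapMs) {}

  bool shouldReconnect(uint32_t nowMs, bool connected) {
    if (connected) {
      return false;  // link is up, nothing to do
    }
    // Unsigned subtraction stays correct across the ~49.7-day millis() rollover.
    if (!attempted_ || nowMs - lastAttemptMs_ >= minGapMs_) {
      attempted_ = true;
      lastAttemptMs_ = nowMs;
      return true;
    }
    return false;  // still inside the back-off window
  }

 private:
  uint32_t minGapMs_;
  uint32_t lastAttemptMs_ = 0;
  bool attempted_ = false;
};
```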
I can try that ... but what's even weirder is that after it stops responding to pings, it still responds intermittently ... as you can see here, it pings continuously and stops exactly at the 5-minute mark, but if I let it keep pinging, you can see that it does respond ONCE IN A WHILE:
64 bytes from 10.10.10.22: icmp_seq=294 ttl=255 time=15.646 ms
64 bytes from 10.10.10.22: icmp_seq=295 ttl=255 time=9.134 ms
64 bytes from 10.10.10.22: icmp_seq=296 ttl=255 time=9.693 ms
64 bytes from 10.10.10.22: icmp_seq=297 ttl=255 time=2.613 ms
64 bytes from 10.10.10.22: icmp_seq=298 ttl=255 time=15.844 ms
64 bytes from 10.10.10.22: icmp_seq=299 ttl=255 time=2.795 ms
64 bytes from 10.10.10.22: icmp_seq=300 ttl=255 time=2.589 ms
64 bytes from 10.10.10.22: icmp_seq=301 ttl=255 time=8.813 ms
64 bytes from 10.10.10.22: icmp_seq=302 ttl=255 time=2.754 ms
64 bytes from 10.10.10.22: icmp_seq=303 ttl=255 time=2.471 ms
64 bytes from 10.10.10.22: icmp_seq=304 ttl=255 time=3.340 ms
64 bytes from 10.10.10.22: icmp_seq=305 ttl=255 time=10.046 ms
64 bytes from 10.10.10.22: icmp_seq=306 ttl=255 time=6.349 ms
64 bytes from 10.10.10.22: icmp_seq=307 ttl=255 time=9.557 ms
64 bytes from 10.10.10.22: icmp_seq=308 ttl=255 time=12.686 ms
Request timeout for icmp_seq 309
Request timeout for icmp_seq 310
Request timeout for icmp_seq 311
Request timeout for icmp_seq 312
Request timeout for icmp_seq 313
Request timeout for icmp_seq 314
64 bytes from 10.10.10.22: icmp_seq=315 ttl=255 time=39.435 ms
Request timeout for icmp_seq 316
Request timeout for icmp_seq 317
Request timeout for icmp_seq 318
Request timeout for icmp_seq 319
Request timeout for icmp_seq 320
Request timeout for icmp_seq 321
Request timeout for icmp_seq 322
Request timeout for icmp_seq 323
Request timeout for icmp_seq 324
Request timeout for icmp_seq 325
Request timeout for icmp_seq 326
Request timeout for icmp_seq 327
Request timeout for icmp_seq 328
Request timeout for icmp_seq 329
Request timeout for icmp_seq 330
Request timeout for icmp_seq 331
Request timeout for icmp_seq 332
Request timeout for icmp_seq 333
Request timeout for icmp_seq 334
Request timeout for icmp_seq 335
Request timeout for icmp_seq 336
Request timeout for icmp_seq 337
Request timeout for icmp_seq 338
Request timeout for icmp_seq 339
Request timeout for icmp_seq 340
Request timeout for icmp_seq 341
Request timeout for icmp_seq 342
Request timeout for icmp_seq 343
Request timeout for icmp_seq 344
64 bytes from 10.10.10.22: icmp_seq=345 ttl=255 time=38.485 ms
Request timeout for icmp_seq 346
Request timeout for icmp_seq 347
64 bytes from 10.10.10.22: icmp_seq=348 ttl=255 time=90.263 ms
Request timeout for icmp_seq 349
Request timeout for icmp_seq 350
Request timeout for icmp_seq 351
Request timeout for icmp_seq 352
Request timeout for icmp_seq 353
Request timeout for icmp_seq 354
Request timeout for icmp_seq 355
Request timeout for icmp_seq 356
Request timeout for icmp_seq 357
Request timeout for icmp_seq 358
Request timeout for icmp_seq 359
Request timeout for icmp_seq 360
Request timeout for icmp_seq 361
Request timeout for icmp_seq 362
Request timeout for icmp_seq 363
Request timeout for icmp_seq 364
64 bytes from 10.10.10.22: icmp_seq=365 ttl=255 time=82.612 ms
Request timeout for icmp_seq 366
Request timeout for icmp_seq 367
Request timeout for icmp_seq 368
Request timeout for icmp_seq 369
Request timeout for icmp_seq 370
Request timeout for icmp_seq 371
Request timeout for icmp_seq 372
Request timeout for icmp_seq 373
Request timeout for icmp_seq 374
Request timeout for icmp_seq 375
Request timeout for icmp_seq 376
Request timeout for icmp_seq 377
Request timeout for icmp_seq 378
Request timeout for icmp_seq 379
Request timeout for icmp_seq 380
Request timeout for icmp_seq 381
Request timeout for icmp_seq 382
Request timeout for icmp_seq 383
Request timeout for icmp_seq 384
Request timeout for icmp_seq 385
Request timeout for icmp_seq 386
Request timeout for icmp_seq 387
Request timeout for icmp_seq 388
Request timeout for icmp_seq 389
Request timeout for icmp_seq 390
Request timeout for icmp_seq 391
Request timeout for icmp_seq 392
Request timeout for icmp_seq 393
Request timeout for icmp_seq 394
Request timeout for icmp_seq 395
Request timeout for icmp_seq 396
Request timeout for icmp_seq 397
Request timeout for icmp_seq 398
Request timeout for icmp_seq 399
Request timeout for icmp_seq 400
Request timeout for icmp_seq 401
Request timeout for icmp_seq 402
Request timeout for icmp_seq 403
Request timeout for icmp_seq 404
Request timeout for icmp_seq 405
Request timeout for icmp_seq 406
Request timeout for icmp_seq 407
Request timeout for icmp_seq 408
Request timeout for icmp_seq 409
Request timeout for icmp_seq 410
Request timeout for icmp_seq 411
Request timeout for icmp_seq 412
Request timeout for icmp_seq 413
Request timeout for icmp_seq 414
Request timeout for icmp_seq 415
Request timeout for icmp_seq 416
Request timeout for icmp_seq 417
Request timeout for icmp_seq 418
Request timeout for icmp_seq 419
Request timeout for icmp_seq 420
Request timeout for icmp_seq 421
Request timeout for icmp_seq 422
Request timeout for icmp_seq 423
64 bytes from 10.10.10.22: icmp_seq=424 ttl=255 time=81.614 ms
Request timeout for icmp_seq 425
Request timeout for icmp_seq 426
Request timeout for icmp_seq 427
Request timeout for icmp_seq 428
Request timeout for icmp_seq 429
Request timeout for icmp_seq 430
Request timeout for icmp_seq 431
Request timeout for icmp_seq 432
Request timeout for icmp_seq 433
64 bytes from 10.10.10.22: icmp_seq=434 ttl=255 time=82.821 ms
Request timeout for icmp_seq 435
Request timeout for icmp_seq 436
Request timeout for icmp_seq 437
Request timeout for icmp_seq 438
Request timeout for icmp_seq 439
Request timeout for icmp_seq 440
Request timeout for icmp_seq 441
Request timeout for icmp_seq 442
Request timeout for icmp_seq 443
Request timeout for icmp_seq 444
Request timeout for icmp_seq 445
Request timeout for icmp_seq 446
Request timeout for icmp_seq 447
Request timeout for icmp_seq 448
Request timeout for icmp_seq 449
Request timeout for icmp_seq 450
Request timeout for icmp_seq 451
Request timeout for icmp_seq 452
Request timeout for icmp_seq 453
Request timeout for icmp_seq 454
64 bytes from 10.10.10.22: icmp_seq=455 ttl=255 time=25.565 ms
Request timeout for icmp_seq 456
Request timeout for icmp_seq 457
64 bytes from 10.10.10.22: icmp_seq=458 ttl=255 time=9.826 ms
Request timeout for icmp_seq 459
Request timeout for icmp_seq 460
Request timeout for icmp_seq 461
Request timeout for icmp_seq 462
Request timeout for icmp_seq 463
64 bytes from 10.10.10.22: icmp_seq=464 ttl=255 time=113.111 ms
Request timeout for icmp_seq 465
Request timeout for icmp_seq 466
Request timeout for icmp_seq 467
Request timeout for icmp_seq 468
64 bytes from 10.10.10.22: icmp_seq=469 ttl=255 time=115.671 ms
64 bytes from 10.10.10.22: icmp_seq=470 ttl=255 time=5.442 ms
Request timeout for icmp_seq 471
64 bytes from 10.10.10.22: icmp_seq=472 ttl=255 time=83.093 ms
Request timeout for icmp_seq 473
Request timeout for icmp_seq 474
Request timeout for icmp_seq 475
Request timeout for icmp_seq 476
Request timeout for icmp_seq 477
Request timeout for icmp_seq 478
Request timeout for icmp_seq 479
64 bytes from 10.10.10.22: icmp_seq=480 ttl=255 time=50.763 ms
Request timeout for icmp_seq 481
Request timeout for icmp_seq 482
Request timeout for icmp_seq 483
Request timeout for icmp_seq 484
Request timeout for icmp_seq 485
Request timeout for icmp_seq 486
The router is a Netgear XR1000 Netduma gaming router ... I assigned a static IP address to it in the router, which it does pick up when it boots. I also changed the IP from 10.10.10.22, statically assigned to the device, to 10.10.10.25, which is now assigned to it via DHCP in the router ... no change in the results. And there is nothing I can think of that would cause the router to block it after 5 minutes ... hell, it would be magic for the router to even realize that the device had rebooted, apart from it disconnecting and reconnecting to WiFi. No router that I know of, at least not in the SOHO space, has the ability to drop a WiFi connection a precise number of minutes after it connects ... do you?
Well, it looks like I sort of solved the problem - using a work-around.
I put a non-blocking timer in loop(), and every 4 minutes and 50 seconds I issue this command:
WiFi.reconnect();
And that causes it to drop out for about 5 seconds, but it then continues to respond normally until the next reconnect. It's not the greatest solution, but for this application, it will certainly work just fine.
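A non-blocking timer like the one described here usually follows the `millis()` pattern. A minimal, testable C++ sketch is below; `IntervalTimer` and `kReconnectIntervalMs` are hypothetical names, and the interval uses the 4 min 50 s figure given later in the thread so the reconnect lands just before the 5-minute cutoff.

```cpp
#include <cstdint>

// Hypothetical non-blocking interval timer in the millis() style.
// On the ESP: call due(millis()) on every pass through loop() and run
// WiFi.reconnect() when it returns true.
class IntervalTimer {
 public:
  IntervalTimer(uint32_t intervalMs, uint32_t startMs)
      : intervalMs_(intervalMs), lastMs_(startMs) {}

  // Returns true once each time the interval has elapsed; never blocks.
  bool due(uint32_t nowMs) {
    // Unsigned subtraction stays correct across the millis() rollover
    // (about every 49.7 days) as long as the interval is shorter than that.
    if (nowMs - lastMs_ >= intervalMs_) {
      lastMs_ = nowMs;  // restart the interval from the moment it fired
      return true;
    }
    return false;
  }

 private:
  uint32_t intervalMs_;
  uint32_t lastMs_;
};

// 4 min 50 s: 10 seconds ahead of the observed 5-minute cutoff.
constexpr uint32_t kReconnectIntervalMs = (4UL * 60UL + 50UL) * 1000UL;
```

In `loop()` this would read `if (reconnectTimer.due(millis())) WiFi.reconnect();`, with the timer started from the `millis()` value in `setup()`. Restarting from the firing time (rather than adding the interval) avoids a burst of back-to-back reconnects if `loop()` ever stalls.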
I'd get a copy of Wireshark and start looking at the packet traffic. Your issue still feels a little like an IP conflict. When you assigned a static address, was it outside the range that your router uses for DHCP?
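To answer the IP-conflict question, compare the static address against the router's DHCP pool. A small C++ sketch follows; `parseIPv4` and `insideDhcpPool` are hypothetical helpers, and the pool bounds must come from whatever range your router's DHCP settings actually show.

```cpp
#include <cstdint>
#include <cstdio>

// Parse a dotted-quad IPv4 string into a host-order 32-bit value.
// Returns false on malformed input. Hypothetical helper for illustration.
bool parseIPv4(const char* text, uint32_t& out) {
  unsigned a, b, c, d;
  char extra;
  // The trailing %c catches junk after the fourth octet.
  if (std::sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4) return false;
  if (a > 255 || b > 255 || c > 255 || d > 255) return false;
  out = (a << 24) | (b << 16) | (c << 8) | d;
  return true;
}

// True when addr lies inside the DHCP pool [poolStart, poolEnd], i.e. a
// static assignment there could collide with a lease. Malformed input -> false.
bool insideDhcpPool(const char* addr, const char* poolStart, const char* poolEnd) {
  uint32_t ip, lo, hi;
  if (!parseIPv4(addr, ip) || !parseIPv4(poolStart, lo) || !parseIPv4(poolEnd, hi)) {
    return false;
  }
  return ip >= lo && ip <= hi;
}
```

For example, with a hypothetical pool of 10.10.10.100 to 10.10.10.200, `insideDhcpPool("10.10.10.22", "10.10.10.100", "10.10.10.200")` returns false, so that address would not overlap the pool.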
@wildbill - Here are some interesting things I discovered while troubleshooting this problem ... when I log into the router's web interface (that's all this router makes available) after the pings have stopped, all I have to do is refresh the web page and the ESP will answer pings for another 5 minutes before it stops again. This is what led me to have the ESP re-authenticate every 4 min 50 s ... but the fact that refreshing the router page 'kicks' the ESP still doesn't tell me whether the problem is in the router or in the ESP ... all it tells me is that "something different happened," which then wakes the ESP back up somehow. Is that because refreshing the router resets whatever it did to "time out" the ESP? OR could it be that refreshing the router causes it to send out a new beacon packet, or a re-serialized one, so that the ESP sees something different and changes its behavior?
Also, I have a WiFi thermostat on this router, as well as 12 light switches and even some WiFi light bulbs in lamps, all of which use some kind of microcontroller, and I've never lost communication with any of them ... nor with my printer, laptop, oscilloscope, or VOIP handsets ... the only device I've ever seen fall off the network like this is the ESP ... which leads me even further away from the router as the source of the problem, since no other device on it ever gets the cold shoulder...
Clearly, I need to test it with a different access point or router to see if I get the same results, because without some deep diving into the packets as you suggest, there's no way I'm going to get to the bottom of this issue. And even with a nice Wireshark capture, I still might not find anything useful (or I might), but that's a lot of work for a problem where my workaround is clicking along perfectly so far ...
But it does raise the question ... what IS really going on?
This router, as I mentioned, is a Netgear whiz-bang gaming router, but it runs Netduma's DumaOS, which is the only router OS I know of that does active geo-fencing and is aware of which game you're playing so that it fences properly for those servers, etc. But I've not been very impressed with that operating system at all ... it has some quirks I'm not happy about, and the fact that I can't even access it from a terminal is another negative for me. I spoke to the people at Netduma, and their support techs say they know of no way at all to get terminal-level access to the router ... but there has to be a way: they program the damn things somehow, and they would certainly need that kind of access for deep troubleshooting ... probably through a serial port on the motherboard is what I'm thinking.
@wildbill - OK, NOW I think it has something to do with my router ... Amazon delivered 5 brand-new ESP-01s yesterday, and I uploaded the bare-bones sketch that only authenticates to WiFi and does nothing else ... loop() has no code at all, and it does the exact same thing ... at exactly 5 minutes it stops responding to pings.
But I can't for the life of me even think of what kind of setting in the router would be responsible for this.
@6v6gt's suggestion reminds me of a problem I had several jobs ago. My boss and I, along with an installation engineer were setting up some new software. We worked with our infrastructure guys to set up a front end web server and proprietary back end database and started testing.
It worked fine. I spent a morning running successful data pulls and all was good. When I came back from lunch, it didn't work any more. With a new session, I was back in business. I went to a meeting and when I came back it was broken a second time.
After many many tests and meetings, it became apparent that the infrastructure folks had a policy that databases must live in the DMZ and that there was a firewall between the web server and the DB server. The network guy swore up and down that the firewall could not possibly be a problem - "It just routes packets" he said.
His analysis turned out to be incorrect. The firewall silently killed any TCP session that had seen no traffic for thirty minutes, which broke the connection between the servers. The software had a setting to enable keep-alive; we turned it on and all was well.
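On a POSIX host, a keep-alive setting like the one in that story usually comes down to a few `setsockopt` calls. A minimal sketch assuming Linux (`TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux option names); `enableKeepAlive` is a hypothetical helper, and the timing values are illustrative, chosen only so the first probe goes out well before a 30-minute idle timeout.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Enable TCP keep-alive so an otherwise idle connection still sends periodic
// probes that a stateful firewall will see. Linux-specific option names.
bool enableKeepAlive(int fd, int idleSec, int intervalSec, int probes) {
  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) return false;
  // Seconds of silence before the first probe is sent.
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idleSec, sizeof(idleSec)) != 0) return false;
  // Seconds between probes once probing has started.
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intervalSec, sizeof(intervalSec)) != 0) return false;
  // Unanswered probes before the kernel declares the connection dead.
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0) return false;
  return true;
}
```

For instance, `enableKeepAlive(fd, 600, 60, 5)` would send the first probe after 10 minutes of silence, comfortably inside a 30-minute firewall timeout.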
Five minutes seems a bit short, but the symptoms sound similar, and perhaps ICMP traffic doesn't count as activity to your router. Perhaps try sending a UDP packet once a minute (or establishing a TCP connection) to demonstrate that the ESP is still active.
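Sending that once-a-minute UDP packet from a desktop host can be as small as the following POSIX sketch. `sendKeepAlive` and the payload "alive" are hypothetical; on the ESP itself the equivalent would go through the core's `WiFiUDP` class, and calling this on a 60-second timer is left to the caller.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

// Send one small UDP datagram ("alive") from socket fd to host:port.
// Returns true if the whole payload was handed to the network stack.
bool sendKeepAlive(int fd, const char* host, uint16_t port) {
  sockaddr_in dst{};
  dst.sin_family = AF_INET;
  dst.sin_port = htons(port);  // network byte order
  if (inet_pton(AF_INET, host, &dst.sin_addr) != 1) return false;  // not a dotted quad
  const char msg[] = "alive";
  ssize_t n = sendto(fd, msg, sizeof(msg) - 1, 0,  // omit the trailing '\0'
                     reinterpret_cast<const sockaddr*>(&dst), sizeof(dst));
  return n == static_cast<ssize_t>(sizeof(msg) - 1);
}
```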
Here's another war story, from my first Arduino project. I was using a WiFi shield on an Uno to pass data to a server. It worked well for a while, but would usually crash within 24 hours.
It transpired that my ISP periodically sends me huge (1400-byte) Ethernet packets, and the shield library was using a 400-byte buffer with no bounds checking.
It doesn't fit the symptom of your observation of 300 seconds from boot to failure, but perhaps there is something going on within your network that is upsetting the ESP.
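The fix for that class of bug is to check the packet length against the buffer size before copying. A minimal C++ sketch, assuming the caller already knows the received length; `storePacket` is a hypothetical helper, and dropping oversize packets (rather than truncating them) is just one policy choice.

```cpp
#include <cstddef>
#include <cstring>

// Copy an incoming packet into a fixed-size buffer without overrunning it.
// Returns the number of bytes stored; oversize packets are rejected (0).
std::size_t storePacket(const unsigned char* src, std::size_t srcLen,
                        unsigned char* dst, std::size_t dstCap) {
  if (srcLen > dstCap) {
    return 0;  // e.g. a 1400-byte packet offered to a 400-byte buffer
  }
  std::memcpy(dst, src, srcLen);
  return srcLen;
}
```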
@wildbill - I do in fact send a UDP packet to the ESP every 20 seconds from a Java program I wrote, because I wanted to make sure the device would be able to receive commands when I chose to send them. The Java program silently sends the word "ping" to the ESP, which responds with "pong"; if the program gets that response, it does the same thing again 20 seconds later. If it fails to get a "pong" 5 attempts in a row, it throws an alert on the screen telling me that the ESP isn't responding. This is how I discovered the problem in the first place.
And I use UDP exclusively with ESPs in my projects ... can't remember why I decided to go that way, maybe it was easier or something.
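The bookkeeping of that Java monitor (a "pong" resets the count, 5 misses in a row raise an alert) can be restated in a few lines. This is a C++ sketch of the logic only, not the actual Java program; `PongMonitor` is a hypothetical name, the socket send/receive and the 20-second schedule are left out, and firing the alert once per outage rather than on every later miss is an assumption added here.

```cpp
// Hypothetical restatement of the ping/pong monitor's counting logic.
class PongMonitor {
 public:
  explicit PongMonitor(int maxMisses = 5) : maxMisses_(maxMisses) {}

  // Record the outcome of one ping attempt; returns true when the alert should fire.
  bool record(bool gotPong) {
    if (gotPong) {
      misses_ = 0;       // any answer clears the streak
      alerted_ = false;  // and re-arms the alert
      return false;
    }
    ++misses_;
    if (misses_ >= maxMisses_ && !alerted_) {
      alerted_ = true;  // fire once per outage, not on every later miss
      return true;
    }
    return false;
  }

  int misses() const { return misses_; }

 private:
  int maxMisses_;
  int misses_ = 0;
  bool alerted_ = false;
};
```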
Also, the network guy in the story about the database should not have responded with "the firewall just routes packets" ... because commercial-quality firewalls are a maze of configurations and rule sets ... timing out an allowance rule into a DMZ when a session goes dark for 30 minutes doesn't seem unreasonable at all to me. He should have gone over his configs before responding, IMHO.