I want to reboot the ESP8266 after 24 days, how?

My ESP-12 (Adafruit Huzzah breakout) device is supposed to run unattended 24/7 at a remote location to collect and report data, and I want to ensure that it does not run into unexpected trouble due to millis() going into negative territory...

So I thought to use this inside loop() to restart after 24 days:

    #ifdef MAX_OPERATE_TIME //defined as 2073600000 ms
        if (millis() > MAX_OPERATE_TIME) 
        {
            Serial.println("Max run time exceeded, restarting device");
            mqtt_client.publish("aspo/p1meter/admin", "Max run time exceeded, restarting device");
            delay(1000); //To allow serial and mqtt messages to be delivered
            ESP.restart();
        }
    #endif

But will this work correctly?

Possible issues:

  • ESP.restart() not working properly (I have read old posts regarding this)
  • The serial and mqtt message deliveries not completed before restart
  • delay() is maybe blocking so nothing at all happens while it runs?

How does millis() go negative?

This is a XY problem.

millis() will never go negative, it returns an 'unsigned' long, which does roll over after about 50 days, but if you calculate elapsed time correctly this is not an issue.

always works fine for me.

that should not be an issue, (1000 ms is way more than enough to complete Serial) but you should close 'any' wifi connections before restart, and if you are using EEPROM, make sure you 'commit()'

which delay() ?

You are introducing problems while trying to fix a problem that is not there.

Did you see the Blink Without Delay page? https://www.arduino.cc/en/Tutorial/BuiltInExamples/BlinkWithoutDelay
If you follow that, there is no problem with millis(). Across a rollover from 0xFFFFFFFF to 0x00000000 it still works perfectly fine.

Every library that uses millis() should use it in the right way.

If you wrote bad code with millis(), then it could cause a problem. In that case I suggest to fix the bad code.
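To make the rollover point concrete, here is a minimal sketch in portable C++ rather than an Arduino sketch. It assumes millis() behaves as an unsigned 32-bit millisecond counter. The helper names elapsed_ms, interval_elapsed, interval_elapsed_fragile and rollover_self_check are made up for illustration and are not part of any Arduino API.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the millis() counter is an unsigned 32-bit value.
// Unsigned subtraction wraps modulo 2^32, so (now - start) stays correct
// across the rollover from 0xFFFFFFFF to 0x00000000.
constexpr uint32_t elapsed_ms(uint32_t now, uint32_t start) {
    return now - start;
}

// Rollover-safe: compare an elapsed duration against the interval.
constexpr bool interval_elapsed(uint32_t now, uint32_t start, uint32_t interval) {
    return elapsed_ms(now, start) >= interval;
}

// Fragile pattern, shown only for contrast: comparing absolute timestamps.
// The sum start + interval can wrap, so the check may fire far too early.
constexpr bool interval_elapsed_fragile(uint32_t now, uint32_t start, uint32_t interval) {
    return now >= static_cast<uint32_t>(start + interval);
}

// Hypothetical self-check with timestamps on either side of the rollover.
void rollover_self_check() {
    const uint32_t start = 0xFFFFFFF0u;  // 16 ms before the counter wraps
    const uint32_t now = 0x00000005u;    // 5 ms after the wrap
    assert(elapsed_ms(now, start) == 21u);
    assert(!interval_elapsed(now, start, 1000u));
    assert(interval_elapsed(now, start, 21u));
}
```

The subtraction form is the one used in Blink Without Delay. The fragile form is the kind of "bad code" that can misbehave near the rollover.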

This delay in Arduino.h:

    void delay(unsigned long);

As it is being used below:

Since I cannot find out how it is implemented, it is hard to determine whether it allows other stuff to run during the delay period, or even whether it is a millisecond delay, which I believe it is.

Anyway, I have started a test now with the trigger set 2 hours ahead.
Will see what happens on both serial and mqtt.

It seems that you don't understand how the millis() works.
So it would be better that you increase your experience rather than making unnecessary things like restarting the arduino.

That was just an example, I know how millis() works but not how delay() and yield() operate. But I just saw that mqtt_client.publish() is blocking so the mqtt message will be sent at least...
And I am having some other issues with the device when it runs for long periods so I figured it to be best to actually restart it in a controlled manner.

You must be having issues with the code.

yield() executes scheduled tasks (like wifi responses and keeping the connection alive).
delay() does nothing other than call yield() during the period. The program is held up: no further lines in its normal sequence are executed, but interrupts are still serviced (which may include data being transferred to the UART, btw).

Thanks!

so hopefully this knowledge is not related to this

Fix the issues instead of applying bandaids.

What are the issues? Perhaps we can help to fix them.

I don't have accurate enough logging. Looking at the serial terminal, it turns out that it resets randomly (to me at least). I will have to add some form of logging to disk to this Windows application in order to be able to go back and look at what happened before the startup report was sent (I am sending a boot message by calling a website php script from the device once it has gotten a WiFi connection on start).
But it might be that it had WiFi problems at the time, I really do not know.
In one such instance it came back up without sending anything at all for a couple of hours before I saw that it was mute...

But right now I am waiting for the next programmed reboot (I shortened the timeout for testing). It should happen in about 35 minutes.

A place to start: a common problem that can cause that sort of behavior is writing or reading beyond the bounds of an array. Examine all array operations for possible bounds violations.

What Arduino board are you using?

This:

And what are you doing with it? If it resets after a significant time, heap fragmentation is the most common culprit.

I am building an electricity meter data extractor to be connected to the meter at our summer home in order to monitor the consumption over the winter when we are not there. It will also measure the temperature in the meter box (on the pole where the electric connection is made). This will be close enough to real outside since the box is aluminum.
The collected data are sent hourly to my website database.

It will be accompanied by 2 other devices (RaspberryPi units) which measure the temperatures inside the two houses and report this to the website database too.

Since the summer home is in a remote area I am looking for a fail-safe system such that it can be left alone for the winter until late April or so.

In my experience, there is no need to manually reset the ESP, but several times a year I reset my internet router.
if something

If you share the code we can point to possible causes. By now my ESPs are reliable, but there are pitfalls, mainly heap fragmentation, and you should not disable interrupts. The WiFi connection is very dependent on them being turned on. Anyway, share the code and we can have a look.

It is not practical, 68 k of cpp files and 24 k of include files...
And a whole bunch of defines that enable certain parts of functionality...

But I have now had it running without crashing for days on end, and I am also now logging everything that is being sent out of the serial port, so if there is a crash I should at least see the aftermath and what happened last before the crash.
OTOH the last log before might well be many minutes before.

I did test the idea of a forced reboot at a certain time after start and it worked as expected. The code for that part is located at the end of loop():

    #ifdef MAX_OPERATE_TIME
        if (millis() > MAX_OPERATE_TIME) 
        {
            char message[] = "Max run time exceeded, restarting device in 1 second";
            Serial.println("Max run time exceeded, restarting device");
            send_metric_text("admin", message); //MQTT publishing of a message
            delay(1000); //To make sure serial and mqtt messages are delivered
            mqtt_client.disconnect();
            ESP.restart();
        }
    #endif

Note that I have planned to set MAX_OPERATE_TIME to 2075400000, which is 24 full days plus a half hour to not be synced at the same UTC time as last time it started.
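As a quick check of that number, here is a small sketch in portable C++ that derives the constant instead of hard-coding it. The names MS_PER_MINUTE, MS_PER_DAY, run_time_ms and MAX_OPERATE_TIME_MS are illustrative and not taken from my actual code. It uses 64-bit arithmetic only so that the range checks themselves cannot overflow.

```cpp
#include <cstdint>

// Milliseconds per minute and per day.
constexpr uint64_t MS_PER_MINUTE = 60ULL * 1000ULL;
constexpr uint64_t MS_PER_DAY = 24ULL * 60ULL * MS_PER_MINUTE;

// Total run time in milliseconds for a number of days plus extra minutes.
constexpr uint64_t run_time_ms(uint64_t days, uint64_t extra_minutes) {
    return days * MS_PER_DAY + extra_minutes * MS_PER_MINUTE;
}

// 24 full days plus half an hour, as planned above.
constexpr uint64_t MAX_OPERATE_TIME_MS = run_time_ms(24, 30);

// Compile-time checks: the value matches the hand-computed constant,
// fits in an unsigned 32-bit counter, and also fits in a signed 32-bit
// literal, so a plain #define without a UL suffix is not truncated.
static_assert(MAX_OPERATE_TIME_MS == 2075400000ULL, "24 days + 30 minutes");
static_assert(MAX_OPERATE_TIME_MS <= 0xFFFFFFFFULL, "fits unsigned 32-bit");
static_assert(MAX_OPERATE_TIME_MS <= 0x7FFFFFFFULL, "fits signed 32-bit");
```

Because the limit is measured from boot, where the counter starts at 0, comparing millis() directly against it does not hit the rollover as long as the limit stays below about 49.7 days.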

Not yet able to deploy the system because a blizzard here over the weekend and the start of this week has rendered the summer home difficult to reach...

So right now after sending the above I have noticed that it is getting into trouble regarding the MQTT connection...
It stays inside the following loop in the loop() procedure:

    if (!mqtt_client.connected())
    {
        if (now - LAST_RECONNECT_ATTEMPT > 5000)
        {
            LAST_RECONNECT_ATTEMPT = now;

            if (mqtt_reconnect())
            {
                LAST_RECONNECT_ATTEMPT = 0;
            }
        }
    }
    else
    {
        mqtt_client.loop();
    }

And the corresponding logging is like this:

====== SendWebRequest exit ======
MQTT connection attempt 1 / 10 ...
MQTT Connection failed: rc=-2
 Retrying in 5 seconds

MQTT connection attempt 2 / 10 ...
MQTT Connection failed: rc=-2
 Retrying in 5 seconds

MQTT connection attempt 3 / 10 ...
MQTT Connection failed: rc=-2
 Retrying in 5 seconds

Not much to see here unfortunately.
Meanwhile the MQTT broker is fully accessible to me...
I have a PuTTY session running towards it where I subscribe to everything so I can see it on that screen arriving properly.
Something must be afoot in the mqtt library....
It started at 17:56:09 after an idle stretch of 8 minutes.
And then at 2022-11-24 18:13:31 it again processed some data and sent out MQTT messages....
Now I have to see if it has lost sync by waiting for the next hour starting when it should report the data.

LATER:
Turns out that it recovered and was able to report the hourly data OK:

====== SendWebRequest called ======
2022-11-24 19:00:00  millis: 30555952

And before that it was sending other data normally too.
So it recovers, but I still wonder why the hiccup?

Maybe a missing yield() somewhere.