2 boards - mutual hardware watchdog

Hello,
I'm having trouble with a "simple" part of my project.

I am using two boards in my project: an ESP8266 and an ESP 01.
The ESP8266 does a lot of work: it collects sensor readings (and sends them to a Google spreadsheet), evaluates the values, and controls the relays.

My idea was to add a HW watchdog where each controller monitors the other one.
So if the ESP8266 gets into an error state (this happens when WiFi is lost in the middle of connecting, during an HTTPS GET call), the ESP 01 will trigger a restart.
It works the same the other way around: if the ESP 01 is not responding, the ESP8266 will trigger a restart.


Both devices reboot when there is a LOW pulse on the RST pin.

The logic level on the WATCHDOG pin changes every 1000 ms.
The other board monitors that level on its PIN_INPUT pin.
Each change detected on PIN_INPUT restarts a countdown; if the countdown reaches 60000 ms, a restart is triggered by driving PIN_OUTPUT LOW.

Similar code runs on both boards.

Unfortunately, and I don't know why, the resets happen even when there is no problem on the boards. It is quite rare (and random), but it happens.
Does anyone have any idea what could be causing the random resets?

In theory it should work like this without problems.

```
#define PIN_INPUT 1   // reads the other board's heartbeat
#define WATCHDOG 2    // heartbeat output of this board
#define PIN_OUTPUT 3  // pulled LOW to restart the other board
// Note: on the ESP8266, GPIO1 and GPIO3 are also the UART0 TX/RX pins used by Serial.

int control = 0;
int last_control = 0;
bool watchdog = false;

// variables for millis() timing
unsigned long previousMillis = 0;
const unsigned long interval = 500;  // frequency of logic level check - 500 ms

unsigned long previousMillisCheck = 0;

unsigned long previousMillisWD = 0;
const unsigned long intervalWD = 1000;            // timer for watchdog - changes the logic level every 1 s
const unsigned long timeWithoutResponse = 60000;  // maximum allowed time between state changes

void setup() {
  Serial.begin(9600);
  delay(500);

  pinMode(PIN_INPUT, INPUT);
  pinMode(PIN_OUTPUT, OUTPUT);
  pinMode(WATCHDOG, OUTPUT);

  digitalWrite(PIN_OUTPUT, HIGH);
}

void loop() {
  unsigned long currentMillis = millis();

  // begin ********* HW WATCHDOG - CONTROL STATE OF ESP8266  *********
  if (currentMillis - previousMillis >= interval) {
    previousMillis = currentMillis;

    control = digitalRead(PIN_INPUT);

    // any level change counts as a sign of life and restarts the countdown
    if (control != last_control) {
      previousMillisCheck = currentMillis;
    }

    if (currentMillis - previousMillisCheck >= timeWithoutResponse) {
      previousMillisCheck = currentMillis;
      digitalWrite(PIN_OUTPUT, LOW);  // start the LOW reset pulse
      Serial.print("restarting......");
      delay(500);
    }

    last_control = control;
    digitalWrite(PIN_OUTPUT, HIGH);  // release the reset line
  }
  // end    ********* HW WATCHDOG - CONTROL STATE OF ESP8266  *********

  // begin ********* HW WATCHDOG INTERNAL CYCLE  *********
  if (currentMillis - previousMillisWD >= intervalWD) {
    previousMillisWD = currentMillis;
    Serial.print("WATCHDOG: ");
    Serial.println(watchdog);
    digitalWrite(WATCHDOG, watchdog ? HIGH : LOW);
    watchdog = !watchdog;
  }
  // end   ********* HW WATCHDOG INTERNAL CYCLE  *********
}
```

Do the built-in watchdogs not work?

I don't want to use the built-in watchdog because I want to log (send a notification) when a board is not responding. That's the reason I want to use 2 boards that monitor each other.

You have a 2 s heartbeat (1 s HIGH and 1 s LOW) that is only maintained if the rest of the code takes less than 1 s per pass through the loop.

Also, on the other side you only poll from time to time, so how can you guarantee that you don't poll at the wrong moments and miss the alternation?

For example, in a scenario where every poll lands on a LOW phase of the heartbeat, ESP#2 only ever sees the pin LOW ➜ no change, so it triggers a reset.
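
To make that scenario concrete, here is a small desktop simulation (plain C++, not Arduino code). It is only a sketch under assumptions: the heartbeat is an ideal square wave that toggles every 1000 ms, and the helper names `heartbeatLevel` and `observedChanges` are made up for this illustration. It counts how many level changes a poller notices for a given polling period.

```cpp
#include <cassert>
#include <cstdint>

// Level of an ideal heartbeat pin that toggles every halfPeriodMs,
// starting LOW (false) at t = 0. Hypothetical model, not real hardware.
bool heartbeatLevel(uint32_t tMs, uint32_t halfPeriodMs) {
    return ((tMs / halfPeriodMs) % 2) == 1;
}

// Count the level changes a poller observes when it samples the pin
// every pollEveryMs, from startMs until startMs + durationMs.
int observedChanges(uint32_t startMs, uint32_t pollEveryMs,
                    uint32_t durationMs, uint32_t halfPeriodMs) {
    bool last = heartbeatLevel(startMs, halfPeriodMs);
    int changes = 0;
    for (uint32_t t = startMs + pollEveryMs; t <= startMs + durationMs;
         t += pollEveryMs) {
        bool now = heartbeatLevel(t, halfPeriodMs);
        if (now != last) {
            ++changes;  // the poller would restart its countdown here
        }
        last = now;
    }
    return changes;
}
```

With this model, `observedChanges(100, 500, 60000, 1000)` reports changes, while `observedChanges(100, 2000, 60000, 1000)` reports none: if the polls end up spaced by a whole heartbeat period (for instance because the loop is held up elsewhere), every sample lands on the same phase and the countdown is never restarted.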

The external heartbeat should be on an interrupt pin so that it does not depend on what the rest of the code is doing; the loop then only needs to check from time to time that everything is working fine.
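
A minimal sketch of that idea, assuming a small helper type that I am calling `HeartbeatMonitor` (the name and layout are hypothetical, not from any library). The interrupt handler only records the time of the last edge, and the loop only asks whether that time is too old. The logic is plain C++ so it can be checked on a desktop; the comment at the end shows one possible way to hook it up on the ESP8266, which I have not tested on hardware.

```cpp
#include <cassert>
#include <cstdint>

// Remembers when the last heartbeat edge was seen.
// onEdge() is meant to be called from a pin-change interrupt,
// expired() from the main loop.
struct HeartbeatMonitor {
    volatile uint32_t lastEdgeMs;  // written in the ISR, read in the loop
    uint32_t timeoutMs;

    HeartbeatMonitor(uint32_t nowMs, uint32_t timeout)
        : lastEdgeMs(nowMs), timeoutMs(timeout) {}

    // Keep this short: it runs in interrupt context.
    void onEdge(uint32_t nowMs) { lastEdgeMs = nowMs; }

    // Unsigned subtraction stays correct across the millis() rollover.
    bool expired(uint32_t nowMs) const {
        return nowMs - lastEdgeMs >= timeoutMs;
    }
};

// Possible wiring on the ESP8266 Arduino core (assumption, untested):
//   HeartbeatMonitor monitor(0, 60000);
//   void IRAM_ATTR onHeartbeat() { monitor.onEdge(millis()); }
//   // in setup():
//   attachInterrupt(digitalPinToInterrupt(PIN_INPUT), onHeartbeat, CHANGE);
//   // in loop():
//   if (monitor.expired(millis())) { /* pulse PIN_OUTPUT LOW */ }
```

Because every edge is captured when it happens, it no longer matters when the loop gets around to checking: a single long blocking call cannot make the monitor miss the alternation. Older versions of the ESP8266 core use `ICACHE_RAM_ATTR` instead of `IRAM_ATTR`, and the pin you choose has to support interrupts.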


Yes, you are absolutely right!
Thank you very much for taking the time to explain.

I just added some randomization of the polling period and now it works just fine.
(The problem you describe is still possible, but it's very unlikely now.)

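```
#define PIN_INPUT 1   // reads the other board's heartbeat
#define WATCHDOG 2    // heartbeat output of this board
#define PIN_OUTPUT 3  // pulled LOW to restart the other board

int control = 0;
int last_control = 0;
bool watchdog = false;

// variables for millis() timing
unsigned long previousMillis = 0;
const long interval_min = 500;          // shortest time between logic level checks - 500 ms
const long interval_max = 2000;         // upper bound (exclusive) for the random check interval
unsigned long interval = interval_min;  // current check interval, re-drawn after every check

unsigned long previousMillisCheck = 0;

unsigned long previousMillisWD = 0;
const unsigned long intervalWD = 1000;            // timer for watchdog - changes the logic level every 1 s
const unsigned long timeWithoutResponse = 60000;  // maximum allowed time between state changes

void setup() {
  Serial.begin(9600);
  delay(500);

  pinMode(PIN_INPUT, INPUT);
  pinMode(PIN_OUTPUT, OUTPUT);
  pinMode(WATCHDOG, OUTPUT);

  digitalWrite(PIN_OUTPUT, HIGH);
}

void loop() {
  unsigned long currentMillis = millis();

  // begin ********* HW WATCHDOG - CONTROL STATE OF ESP8266  *********
  if (currentMillis - previousMillis >= interval) {
    previousMillis = currentMillis;
    // draw the next interval once per check, not on every pass through loop()
    interval = random(interval_min, interval_max);

    control = digitalRead(PIN_INPUT);

    // any level change counts as a sign of life and restarts the countdown
    if (control != last_control) {
      previousMillisCheck = currentMillis;
    }

    if (currentMillis - previousMillisCheck >= timeWithoutResponse) {
      previousMillisCheck = currentMillis;
      digitalWrite(PIN_OUTPUT, LOW);  // start the LOW reset pulse
      Serial.print("restarting......");
      delay(500);
    }

    last_control = control;
    digitalWrite(PIN_OUTPUT, HIGH);  // release the reset line
  }
  // end    ********* HW WATCHDOG - CONTROL STATE OF ESP8266  *********

  // the HW WATCHDOG INTERNAL CYCLE block is unchanged and omitted here
}
```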

That should indeed cover your needs, unless Murphy strikes :slight_smile:


I believe you can assign a watchdog interrupt to a user ISR. It could log. Might be a little tricky if you have to re-enable interrupts to log.

The issue would be, what is causing the lack of response. If it's a software flaw, perhaps you can just find and fix the flaw.

Are you currently experiencing crashes? Apart from the one that was caused by your solution?
