Interesting use of Watch Dog Timer to guard blocking code.

I had a problem last week with a library function locking up under certain conditions and was looking at ways around it. The function should have had a timeout feature or something to prevent lockups, but it didn't. I didn't want to try to modify the library or anything, so instead I started thinking about ways to use the WDT to get around this type of thing in general.

I nearly didn't post this because I thought it was kind of weird and kludgy. However, someone posted a question about this very thing today, so I thought I'd make my findings available. Kludgy as it is, it does actually work quite nicely. :)

Firstly, understand the rationale. You shouldn't need to do this except as a last resort. Blocking functions that can potentially lock up indefinitely should be written with a timeout option so that this is not necessary. If, however, you are trying to use a library function that you either don't have the time or don't have the ability to modify, then this "guard code" method is worth considering.

Although I used the watch dog timer I didn't use it in the traditional way.

  • It isn't in continuous operation. It is only enabled to guard specific code sections (usually a call to a library function) and then disabled when that section completes.

  • It doesn't reset the processor on timeout. It instead recovers from the fault and returns the program to where it left off, but with a flag set to indicate that it didn't complete.

  • Any timer could have been used for this function, but I chose the WDT as most of my sketches don't use it, so it's available (and Timer0 ... Timer2 are useful for so many other things).
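
To make the second bullet concrete: on the ATmega328P (the UNO's processor), whether a watchdog timeout raises an interrupt or resets the chip is decided by two bits in WDTCSR, WDIE (0x40) and WDE (0x08). The snippet below is only a host-side sketch that decodes those two bits. The enum and function names are made up for illustration, and it assumes the WDTON fuse is left unprogrammed.

```cpp
#include <cstdint>

// Bit positions in WDTCSR on the ATmega328P (per the datasheet).
constexpr uint8_t kWdie = 0x40;  // watchdog interrupt enable
constexpr uint8_t kWde  = 0x08;  // watchdog system reset enable

// Hypothetical names, for illustration only.
enum class WdtMode { Stopped, InterruptOnly, ResetOnly, InterruptThenReset };

// Decode the watchdog mode from a WDTCSR value (WDTON fuse unprogrammed).
WdtMode wdtMode(uint8_t wdtcsr) {
    const bool irq   = (wdtcsr & kWdie) != 0;
    const bool reset = (wdtcsr & kWde) != 0;
    if (irq && reset) return WdtMode::InterruptThenReset;
    if (irq)          return WdtMode::InterruptOnly;
    if (reset)        return WdtMode::ResetOnly;
    return WdtMode::Stopped;
}
```

The guard method only ever moves between Stopped (prescaler set, nothing armed) and InterruptOnly (WDIE set while the suspect call runs). WDE is never set, so the processor is never reset.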

The code works by adding a short preamble before the call to the code that might lock up. This preamble saves the stack pointer, clears the "timeout" flag, resets the WDT count, enables the WDT interrupt, and finally saves a return address that points to just before the suspect (potentially locking) function call.

If the suspect code executed normally (determined by the cleared "timeout" flag) then you disable the WDT interrupt as soon as this section completes.

If, however, the "timeout" flag is set when execution reaches the point where the function would be called, then you know that the WDT interrupt brought you back there because the call failed, and you can execute alternate code. For example an error message, a blinking LED, a retry, or an alternate algorithm - whatever you like.

One nice thing about this method is that it can be reused in as many places as you like throughout your code. The same preamble and the same WDT interrupt routine can be reused over and over, but with individual actions on each piece of guarded code. This works because the interrupt returns to the specific piece of code that failed each time, so each piece can have its own unique action if needed.
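
The control flow described above is close in spirit to the standard setjmp/longjmp pair, so here is a host-side analogy in plain C++. It is not the WDT code itself: there is no timer, the "hang" is simulated by jumping back by hand, and every name in it (guardPoint, suspectCall, guardedCall) is invented for the illustration.

```cpp
#include <csetjmp>

static std::jmp_buf guardPoint;  // stands in for the saved SP and return address

// Hypothetical stand-in for a library call that may never return.
// On the AVR the WDT interrupt would break out of it; here we simulate
// the timeout by jumping back to the guard point ourselves.
static void suspectCall(bool hangs) {
    if (hangs) std::longjmp(guardPoint, 1);
}

// Returns 0 on normal completion, otherwise the error code chosen by the caller.
int guardedCall(bool hangs, int errorCodeForThisSite) {
    if (setjmp(guardPoint) == 0) {   // "preamble": remember where to come back to
        suspectCall(hangs);          // normal path
        return 0;
    }
    return errorCodeForThisSite;     // recovery path, unique per call site
}
```

Calling guardedCall(true, 7) in one place and guardedCall(true, 9) in another reuses the same guard machinery while each call site gets its own recovery action, which is the reuse property described above.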

Here is my code and an example of its usage. The example is deliberately trivial, waiting on a button press that may or may not ever happen. Obviously the code could easily have just timed this out without needing anything elaborate like this. It's just to make the example simple.

// The following code shows how to use the watch dog timer to guard a piece of
// blocking code that can potentially "hang" indefinitely.

extern "C" void prepareWdtInterrupt();

// Global variables used by the WDT guard code
//--------------------------------------------
volatile bool wdTimeout = false;
byte spSaveL, spSaveH, retAddrL, retAddrH;
//--------------------------------------------

#define btnPin 12

#define wdtInterruptMask 0x40       // Mask for interrupt enable bit. Note that we don't use WDT for reset at all.
#define wdtEnableChanges 0x18       // WDCE|WDE. Write this first; the new WDT settings must follow within 4 clock cycles.
#define wdt250ms  0x04              // Define a few possible timeout values. Others are available - see datasheet.
#define wdt500ms  0x05
#define wdt1000ms 0x06
#define wdt2000ms 0x07
#define wdt4000ms 0x20

void setup() {
  Serial.begin(9600);  // start serial for output
  pinMode(btnPin,INPUT_PULLUP);
  Serial.println("Ready to go");

  // The following code initializes the WDT for a 4 second time out. Interrupt and reset function initially disabled
  //---------------------------------------------------------------------------------------------------------------
  cli();
  WDTCSR |= wdtEnableChanges;                 // Enable changing the WDT prescaler
  WDTCSR =  wdt4000ms;                        // Set for 4.0 second interval with intr (and reset) disabled  
  sei();
  //---------------------------------------------------------------------------------------------------------------
} // end setup

void loop() {
  Serial.println();
  Serial.println("You have 4 seconds to press the button");

  // The following code prepares the watchdog break from potentially blocking code
  //------------------------------------------------------------------------------------------------------------------
  wdTimeout = false;                          // Clear the timeout flag
  spSaveL = SPL;                              // Save SP for later restoration
  spSaveH = SPH;
  prepareWdtInterrupt();                      // Enable WDT intrpt, reset WDT count and save return address to "here"
                                              // WDT_vect will now return directly to the following statement
  //------------------------------------------------------------------------------------------------------------------

  if (!wdTimeout) {                           // If we arrived here from normal program flow do this code.
    waitForButton();                          // Do our potentially "hanging" code
    WDTCSR &= ~wdtInterruptMask;              // Now past the blocking code so we disable the WDT interrupt
    Serial.println("Button Pressed");
  }
  else {                                      // If we arrived here due to a WDT interrupt do this code.
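    uint8_t oldSREG = SREG;                   // SP is two bytes wide, so restore it with interrupts off
    cli();
    SPL = spSaveL;                            // Readjust the stack to be correct for this point in program
    SPH = spSaveH;
    SREG = oldSREG;                           // Put the interrupt state back as it was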
    Serial.println("WDT timeout: The user is asleep ");
  } 
} // end loop
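
The wdt250ms ... wdt4000ms constants in the sketch are WDTCSR prescaler patterns. On the ATmega328P the four prescaler bits are split: WDP2..WDP0 occupy bits 2..0 and WDP3 sits in bit 5, which is why 4 seconds is 0x20 rather than 0x08. As a rough host-side sketch (function names are hypothetical, and the times are the datasheet's nominal values, which drift with voltage and temperature):

```cpp
#include <cstdint>

// Map a prescaler index 0..9 to the WDTCSR bit pattern:
// WDP2..WDP0 sit in bits 2..0, WDP3 sits in bit 5.
uint8_t wdtPrescalerBits(uint8_t index) {
    return static_cast<uint8_t>(((index & 0x08) << 2) | (index & 0x07));
}

// Nominal timeout in milliseconds for each prescaler index.
// Returns 0 for an index outside 0..9.
uint16_t wdtTimeoutMs(uint8_t index) {
    static const uint16_t kTimeouts[10] = {16, 32, 64, 125, 250,
                                           500, 1000, 2000, 4000, 8000};
    return index < 10 ? kTimeouts[index] : 0;
}
```

Index 8 gives the 0x20 pattern used for the 4 second timeout in setup(), and index 9 would give 0x21 for the longest (8 second) interval.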

The external prepareWdtInterrupt() and WDT_vect routines are written in assembly in an accompanying ".S" file, with the code below.

#include "avr/io.h"
#define wdtInterruptMask 0x40

.global prepareWdtInterrupt
.global WDT_vect

prepareWdtInterrupt:                      // Enables WDT and resets WDT count. Saves ret address for WDT recovery
  pop   r18
  pop   r19
  push  r19
  push  r18                               // Get return address
  sts   retAddrH,r18                      // Save it for later
  sts   retAddrL,r19                      // Save return address
  wdr                                     // Reset WDT count value
  lds   r18,    WDTCSR
  ori   r18,    wdtInterruptMask          // Enable watchdog interrupt
  sts   WDTCSR, r18
ret

WDT_vect:
  wdr
  ldi   r18,    01
  sts   wdTimeout, r18                    // Set timeout flag
  
  lds   r18,    WDTCSR
  andi  r18,    ~wdtInterruptMask         // Disable watchdog interrupt
  sts   WDTCSR, r18
  
  lds   r18,    retAddrH
  lds   r19,    retAddrL
  push  r19
  push  r18                               // Set return point to just after where prepareWdtInterrupt() was called
reti
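
For anyone who finds the push/pop juggling hard to follow, here is a toy host-side model of the return stack that replays the same sequence. It is only a sketch: MiniAvr, Outcome and simulateTimeout are invented names, the addresses are arbitrary, and the byte order (low byte pushed first, high byte popped first) is simply the order the assembly above assumes.

```cpp
#include <cstdint>

// Tiny model of the AVR stack: push stores then decrements SP,
// pop increments SP then loads.
struct MiniAvr {
    uint8_t  ram[64] = {};
    uint16_t sp = 63;
    uint16_t pc = 0;

    void push(uint8_t v) { ram[sp--] = v; }
    uint8_t pop()        { return ram[++sp]; }

    // A call (or an interrupt) pushes the current PC, low byte first.
    void call(uint16_t target) {
        push(static_cast<uint8_t>(pc & 0xFF));
        push(static_cast<uint8_t>(pc >> 8));
        pc = target;
    }
    // ret and reti pop the high byte, then the low byte.
    void ret() {
        const uint8_t hi = pop();
        const uint8_t lo = pop();
        pc = static_cast<uint16_t>((hi << 8) | lo);
    }
};

struct Outcome { uint16_t pc; uint16_t sp; uint16_t spBeforeRestore; };

Outcome simulateTimeout(uint16_t afterPrepare, uint16_t hungAt) {
    MiniAvr cpu;
    const uint16_t spSaved = cpu.sp;      // spSaveL / spSaveH

    // prepareWdtInterrupt(): peek at our own return address, then return.
    cpu.pc = afterPrepare;
    cpu.call(0x0100);
    const uint8_t retH = cpu.pop();       // pop  r18
    const uint8_t retL = cpu.pop();       // pop  r19
    cpu.push(retL);                       // push r19
    cpu.push(retH);                       // push r18
    cpu.ret();

    // The guarded call hangs, and the WDT interrupt fires inside it.
    cpu.call(0x0200);                     // waitForButton()
    cpu.pc = hungAt;
    cpu.call(0x0300);                     // interrupt pushes the hung PC

    // WDT_vect: push the saved address and let reti "return" to it.
    cpu.push(retL);
    cpu.push(retH);
    cpu.ret();

    const uint16_t spLeftover = cpu.sp;   // stale frames still on the stack
    cpu.sp = spSaved;                     // SPL/SPH restore in the else branch
    return {cpu.pc, cpu.sp, spLeftover};
}
```

After the simulated reti the program counter is back at the address saved by prepareWdtInterrupt(), but the stack is still four bytes deeper than it was at the guard point. That leftover is why the else branch in loop() has to restore SPL and SPH before carrying on.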

The full sketch is attached if anyone wants to try it. The sketch uses a push button wired from pin 12 (UNO) to GND.

testWDT.zip (2.06 KB)

I am not in favour of using the watchdog-timer as a substitute for writing a program properly.

The purpose of the WDT is to get you out of a problem even if something goes wrong with properly written code - perhaps because some external element behaves badly.

...R

Hi Robin. Of course it is preferable to write the code so that it will time out instead of locking up. The above is specifically meant as a workaround for when you have a misbehaving library function that you may be unable to rewrite. This is meant as an option (other than rewriting the library) for dealing with library code, not for functions that you have written yourself.

I totally agree that if you have a function that you have written yourself, and it can lock up under certain conditions, then get to the bottom of what causes it to lock up and fix it properly. This is meant to be for functions that you may not be able to rewrite yourself.