Three watchdog questions

Hello everyone,

I decided to implement the watchdog because of rare but persistent instabilities in my system. I managed to implement it and it works just fine. I also found that using the interrupt combined with reset allows timeouts longer than 8 s.
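The post does not say how the longer timeout was obtained, so the following is only a host-side toy model of one possible arrangement, not AVR code. It assumes a hypothetical ISR policy that sets WDIE again a bounded number of times (`rearm_budget`, a name invented here) and that the main loop never executes WDR. Bounding the count is what keeps the final reset guaranteed in this model; the datasheet discourages setting WDIE inside the ISR at all.

```c
/* Toy model: seconds until a hung main loop is reset in
 * Interrupt and System Reset mode. Runs on a PC, not on the AVR. */
unsigned seconds_until_reset(unsigned period_s, unsigned rearm_budget)
{
    unsigned elapsed = 0;
    int wdie = 1;                 /* interrupt-and-reset mode armed */

    for (;;) {
        elapsed += period_s;      /* one watchdog timeout, no WDR in between */
        if (!wdie)
            return elapsed;       /* WDIE clear: this timeout is a system reset */
        wdie = 0;                 /* hardware clears WDIE when the interrupt runs */
        if (rearm_budget > 0) {   /* hypothetical ISR policy: bounded re-arming */
            rearm_budget--;
            wdie = 1;
        }
    }
}
```

With the maximum 8 s period and no re-arming, the model gives 16 s (one interrupt, then the reset). Each permitted re-arm adds one more period.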

First question
The mega2560 datasheet (ATmega640/1280/1281/2560/2561, paragraph 12.5.2, Watchdog Timer Control Register, WDTCSR) recommends

"To stay in Interrupt and System Reset Mode, WDIE must be set after each interrupt. This should however not be done within the interrupt service routine itself, as this might compromise the safety-function of the Watchdog System Reset mode."

Can someone please suggest why? In what way would it compromise safety of operation more than setting WDIE outside the ISR? It seems to me that both are roughly equally unsafe.

Second question
Why have they chosen to have WDE implied by the WDRF flag of MCUSR? Can someone help me justify their assertion "This feature ensures multiple resets during conditions causing failure, and a safe start-up after the failure"? And, given that the boot loader clears MCUSR, would this mean that we cannot take advantage of this feature, assuming there is such an advantage?

The third question is about the reset source. I understood that there are boot loaders allowing app code access to MCUSR; can someone please point me to a resource explaining how to do it? I mean, what object/hex code? How do I burn the boot code?

Thank you very much for your help

Guy

guy_c:
I understood that there are boot loaders allowing app code access to MCUSR; can someone please point me to a resource explaining how to do it? I mean, what object/hex code? How do I burn the boot code?

Optiboot is a great example. Start here...

I believe @westfw has discussed the strategy in the Microcontrollers section.
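Once the application can see the MCUSR value (however the boot loader passes it along), decoding the reset source is simple. A minimal sketch in plain C, assuming the MCUSR bit positions from the ATmega640/1280/1281/2560/2561 datasheet. The enum names and the priority order used when several flags are set are choices made for this example, not anything from Optiboot.

```c
#include <stdint.h>

/* MCUSR bit positions (ATmega2560 datasheet): - - - JTRF WDRF BORF EXTRF PORF */
enum { PORF_BIT = 0, EXTRF_BIT = 1, BORF_BIT = 2, WDRF_BIT = 3, JTRF_BIT = 4 };

typedef enum {
    RESET_UNKNOWN,
    RESET_POWER_ON,
    RESET_EXTERNAL,
    RESET_BROWN_OUT,
    RESET_WATCHDOG,
    RESET_JTAG
} reset_cause_t;

/* Flags accumulate until cleared, so more than one may be set.
 * The order below is an arbitrary choice for this example. */
reset_cause_t reset_cause(uint8_t mcusr)
{
    if (mcusr & (1u << WDRF_BIT))  return RESET_WATCHDOG;
    if (mcusr & (1u << BORF_BIT))  return RESET_BROWN_OUT;
    if (mcusr & (1u << EXTRF_BIT)) return RESET_EXTERNAL;
    if (mcusr & (1u << PORF_BIT))  return RESET_POWER_ON;
    if (mcusr & (1u << JTRF_BIT))  return RESET_JTAG;
    return RESET_UNKNOWN;          /* e.g. MCUSR already cleared by a boot loader */
}
```

If the boot loader has already cleared MCUSR, the function returns RESET_UNKNOWN, which is the situation described in the original question.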

Why have they chosen to imply WDE by the WDRF flag of the MCUSR?

I believe the idea is to force the processor into a continual reset if recovering from a watchdog reset fails.

Sorry if this seems trivial; does one need to compile it? How does one burn it?

The point of forcing WDE if WDRF is set: Imagine there is some outside incident (maybe a lot of noise on the power rails, electromagnetic fields, that kind of thing) that causes the board to freeze. The watchdog kicks it over. But the event has not yet finished, and the board freezes again before it can run its initialization code to turn on the WDT. Without this behavior, the board would now be hung until manually reset. With the watchdog forced on after a WDT reset, the board keeps getting reset until the transient has ended, and then it restarts normally. Recovering from unexpected events like that is exactly what the WDT is for, so it's key to the primary function of the WDT reset that it handle this case without getting hung up.
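The flip side of this behavior is that, after a watchdog reset, the application has to clear WDRF before it can turn the watchdog off or reconfigure it. A minimal sketch of that startup step, assuming avr-libc on the target. The names `early_watchdog_init` and `saved_mcusr` are invented for this example, and the `#else` branch contains host stand-ins that exist only so the sketch compiles off-target. On a board whose boot loader has already cleared MCUSR, the saved value will simply be zero.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/wdt.h>
#else
/* Host stand-ins (assumption for illustration, not real hardware). */
static uint8_t MCUSR;
#define WDRF 3
static int wdt_disabled;
static void wdt_disable(void) { wdt_disabled = 1; }
#endif

/* Copy of the reset flags for later inspection by the application. */
uint8_t saved_mcusr;

/* Call as early as possible after reset. The avr-libc watchdog
 * documentation suggests running this kind of code from an .init3 function. */
void early_watchdog_init(void)
{
    saved_mcusr = MCUSR;                 /* remember why we were reset */
    MCUSR &= (uint8_t)~(1u << WDRF);     /* WDRF overrides WDE, so clear it first */
    wdt_disable();                       /* now the watchdog can really be turned off */
}
```

After this runs, the application can inspect `saved_mcusr` and then enable the watchdog again with whatever timeout it needs.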

Re 1), I suspect the idea is that if the WDT ISR always sets WDIE again, and you have WDT reset and interrupt both enabled, the program could get into a hung state where only the interrupt keeps firing and the WDT reset never fires, because the ISR keeps re-enabling WDIE. In which case, why do you have both interrupt and reset enabled?

guy_c:
Sorry if this seems trivial; does one need to compile it?

Looks like it. @westfw does not include a HEX file for the ATmega2560 processor.

How does one burn it?

For processors with less Flash you can just use an Arduino running the ArduinoISP sketch. I have no idea if that works for the ATmega2560 processor.

@DrAzzy, Thanks

re 2, in fact it's a way to make WDE persist across a reset cycle or, in other words, once set by the code it must be explicitly cleared by the code and not by anything else. However, if the WD did not time out when a power shortage occurred (and that is the situation with sound code and a sound environment), then WDE would have to be set again by the initialization code, since WDRF in MCUSR would be off(?)

re 1: I did not quite understand your point. Sorry

@Coding Badly, Thank you. I think I found a resource

In general you don't want to reset watchdog things in ISRs (of any sort), because the ISRs can continue to be triggered by their associated events even if the non-ISR code is looping around in la-la land (no longer operating properly.)

If you want a pre-compiled optiboot for ATmega2560, you can get it as part of a full hardware package from here:

The .hex files alone are also available here:
https://github.com/MCUdude/optiboot_flash/tree/master/atmega2560
However, there has been recent work on optiboot that is not yet merged into the MCUdude projects, so if you need the most up-to-date optiboot you'll still want to compile it from source from the official repository.

You can use an Arduino as ISP programmer to burn the bootloader to the ATmega2560.


My understanding is that if a power shortage occurred and you’re trying to make your code bulletproof (which is what the WDT reset is for), you would have enabled the BOD reset, so the BOD would ensure that the power had returned to normal operating conditions before letting the code restart cleanly. The WDT timeout would be for other cases (e.g., electromagnetic interference, or rapid shifts in supply voltage that didn’t trigger the BOD but still caused the processor to glitch out; the latter is a technique used by embedded technology hackers to break code protections, so I presume that it can cause unwanted behavior).

re: 1) Consider the case that the rest of the code glitched out, and the WDT was not being reset. But interrupts were still being serviced, so the WDT ISR kept running and kept turning WDIE back on. Thus the WDT reset would never occur, even though the program was no longer operating correctly. That is, it would completely defeat the purpose of using the watchdog reset. Hence, Atmel recommends against that because it is an irrational configuration: if you want your code to be bulletproof and reset in the event of a malfunction that causes the WDR instruction to not be executed (which is what the WDT reset is for), and you are also using the WDT interrupt (perhaps to attempt to unglitch the system or output debugging information), you must not set WDIE again in that interrupt, since it could prevent the WDT reset functionality from resetting the malfunctioning code.
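The argument above can be checked with a small host-side toy model (plain C, not AVR code). It assumes the datasheet behavior that the hardware clears WDIE when the watchdog interrupt runs and that a timeout with WDIE clear causes a system reset. The struct and function names are invented for this example.

```c
#include <stdbool.h>

/* Toy model of the watchdog in Interrupt and System Reset mode. */
typedef struct {
    bool wdie;   /* interrupt enable; hardware clears it when the interrupt runs */
    bool reset;  /* set once the model would have reset the MCU */
} wdt_model;

/* One watchdog timeout with no WDR executed by the (hung) main code. */
static void wdt_timeout(wdt_model *w, bool isr_rearms_wdie)
{
    if (w->wdie) {
        w->wdie = false;          /* timeout with WDIE set: interrupt only */
        if (isr_rearms_wdie)
            w->wdie = true;       /* ISR sets WDIE again (the discouraged case) */
    } else {
        w->reset = true;          /* timeout with WDIE clear: system reset */
    }
}

/* Returns true if a hung main loop gets reset within max_timeouts timeouts. */
bool hung_system_recovers(bool isr_rearms_wdie, int max_timeouts)
{
    wdt_model w = { true, false };
    for (int i = 0; i < max_timeouts && !w.reset; i++)
        wdt_timeout(&w, isr_rearms_wdie);
    return w.reset;
}
```

When the ISR leaves WDIE alone, the hung system is reset on the second timeout. When the ISR sets WDIE every time, the model never reaches the reset, no matter how many timeouts elapse.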

@DrAzzy: Thank you for your detailed answer; you did succeed in convincing me. Here is how I pictured it: assume that the app code periodically, in sequence, resets the watchdog and sets WDIE. It is then highly probable that if it succeeds in one it will succeed in both and, vice versa, if it fails in the first it'll fail in both.

On the other hand, if the code is broken and the WD continues to work properly, it will call the ISR, and if WDIE is set there, things will get screwed up.

@pert: Thanks for this information. Honestly, I did not imagine at first that in order to take advantage of the watchdog functionality I'd need to mess with burning another boot loader, and in a word it quite frightens me :frowning: . I decided to start by just distinguishing the power-up reset from 'all other' resets with a hardware 'delayed' Vcc signal fed into an analogue input, implemented with an RC network with a ~1 s charge time constant and ~1 ms discharge. Once the watchdog is integrated into my operational code, I'll have more time to work on re-burning boot loaders.