Topic: Reliable and programmable Watchdog
Jr. Member
**
Karma: 0
Posts: 70

Hi, I'm trying to build a reliable, programmable watchdog (think of it as an industrial watchdog).

I came to the conclusion that the best way to do it is to attach a second ATmega (which can also serve as a co-processor for other tasks). It waits for a signal on a given pin every N seconds; if the signal doesn't arrive, it executes the desired operation: reset, print a message, raise an alarm, etc.
The main processor does the same in the other direction, so I can also detect when it is the co-processor that is down.
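
A minimal sketch of that timeout check, written as plain C so it can be tried off-target. The names (`hb_monitor_t`, `hb_init`, `hb_kick`, `hb_expired`) are made up for illustration, and the time source is an assumption: on an Arduino-style board `now_ms` would probably come from `millis()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heartbeat monitor; every name here is illustrative. */
typedef struct {
    uint32_t last_kick_ms;  /* time of the most recent heartbeat */
    uint32_t timeout_ms;    /* longest allowed gap between heartbeats */
} hb_monitor_t;

void hb_init(hb_monitor_t *m, uint32_t now_ms, uint32_t timeout_ms)
{
    assert(timeout_ms > 0);
    m->last_kick_ms = now_ms;
    m->timeout_ms = timeout_ms;
}

/* Call whenever the heartbeat pulse from the other processor is seen. */
void hb_kick(hb_monitor_t *m, uint32_t now_ms)
{
    m->last_kick_ms = now_ms;
}

/* True once the peer has been silent for longer than the timeout.
 * Unsigned subtraction keeps the result correct across counter rollover. */
bool hb_expired(const hb_monitor_t *m, uint32_t now_ms)
{
    return (uint32_t)(now_ms - m->last_kick_ms) > m->timeout_ms;
}
```

Each processor would keep one such monitor for the other, call `hb_kick` from the pin-change handler or polling loop, and check `hb_expired` periodically to decide when to run the fault action.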

I was also thinking that, depending on which part of the loop the program is in, I could write a value to a variable (e.g. a byte), so that if the processor or program freezes in that part, I can also print a matching error message.
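
The checkpoint idea could look roughly like this. It is only a sketch: the stage names and the helpers `enter_stage` and `stage_name` are invented for the example, and a real loop would define its own stages.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stage codes for the main loop. */
enum stage {
    STAGE_IDLE = 0,
    STAGE_READ_INPUTS,
    STAGE_COMPUTE,
    STAGE_DRIVE_OUTPUTS,
    STAGE_COUNT
};

/* Written by the main loop, read by whatever reports the fault. */
volatile uint8_t g_stage = STAGE_IDLE;

/* Record that the loop is about to enter the given stage. */
void enter_stage(uint8_t s)
{
    assert(s < STAGE_COUNT);
    g_stage = s;
}

/* Map a stage code to text for the error message. */
const char *stage_name(uint8_t s)
{
    switch (s) {
    case STAGE_IDLE:          return "idle";
    case STAGE_READ_INPUTS:   return "reading inputs";
    case STAGE_COMPUTE:       return "computing";
    case STAGE_DRIVE_OUTPUTS: return "driving outputs";
    default:                  return "unknown";
    }
}
```

Note that a frozen processor cannot print its own message, so the byte is probably only useful if something outside the stuck loop can read it, for example by sending it to the second processor along with each heartbeat.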


Is there any better way to implement this? What I'm looking for is reliability.

Seattle, WA USA
Brattain Member
*****
Karma: 549
Posts: 46090

Quote
Is there any better way to implement this? What I'm looking for is reliability.
A watchdog reset is a band-aid for poorly written code (not all of which will be your fault). Fixing the code is a much better solution.

Poole, Dorset, UK
Edison Member
*
Karma: 25
Posts: 1873

The second processor more than doubles the chance of a hardware failure and something like triples the odds of a software problem.

Think about using a handshake from the PC. (Handshake - a regular message from one processor to another that must be responded to).
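
One way such a handshake might be tracked on the sending side, as a rough sketch. Everything here is an assumption for illustration: the `handshake_t` type, the `hs_*` helpers, and the choice of replying with the bitwise complement of the challenge (so that a stuck line or a stale echo does not pass as a valid answer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state kept by the side that sends the regular message. */
typedef struct {
    uint8_t seq;         /* challenge byte of the outstanding message */
    uint8_t missed;      /* consecutive rounds without a valid reply */
    uint8_t max_missed;  /* misses tolerated before the peer counts as dead */
} handshake_t;

void hs_init(handshake_t *h, uint8_t max_missed)
{
    assert(max_missed > 0);
    h->seq = 0;
    h->missed = 0;
    h->max_missed = max_missed;
}

/* Reply a healthy peer is expected to send: the complement of the challenge. */
uint8_t hs_expected_reply(uint8_t challenge)
{
    return (uint8_t)~challenge;
}

/* Start a new round; returns the challenge byte to transmit. */
uint8_t hs_next_challenge(handshake_t *h)
{
    h->seq++;
    return h->seq;
}

/* Record the outcome of a round; got_reply is false on timeout. */
void hs_record(handshake_t *h, bool got_reply, uint8_t reply)
{
    if (got_reply && reply == hs_expected_reply(h->seq)) {
        h->missed = 0;
    } else if (h->missed < UINT8_MAX) {
        h->missed++;
    }
}

bool hs_peer_dead(const handshake_t *h)
{
    return h->missed >= h->max_missed;
}
```

Tolerating a few missed rounds before declaring the peer dead is a design choice: it trades detection time against false alarms from a single corrupted byte.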

Mark

United Kingdom
Tesla Member
***
Karma: 220
Posts: 6587
Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law.

If what you are trying to detect is a temporary malfunction, use the built-in watchdog. If you are trying to detect a permanent malfunction, then you need two MCUs that monitor each other, just as you describe.

What are you going to do when a malfunction is detected? You need to be careful not to re-introduce a single point of failure.


Jr. Member
**
Karma: 0
Posts: 70

Thanks all for the answers.

We use ATmegas in PLCs and CNC machines, so everything is for industrial use. Programs typically run to more than 3000 lines and are normally fine, but problems can also occur on the hardware side.

Either way, I need to double-check whether an error occurred and, if it did, evaluate it and act accordingly.
To give you an example: remember how, on old Windows 98 machines (and I think on newer ones too), if you suddenly pulled the power plug and then restarted the computer, the OS knew it hadn't been shut down correctly? That works because the OS stores a flag in non-volatile storage when it boots and sets it back to "0" when it shuts down properly.
The same applies to machines: if something strange happens, since it is a partially blind system, you don't know where the motors or other parts are positioned, or whether an operator's hand is in the path of an axis.
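
The boot-flag trick described above can be sketched in a few lines. This is an illustration under assumptions: the non-volatile byte is simulated with a plain variable so the code runs anywhere, and `nvm_read`, `nvm_write`, `boot_check` and `clean_shutdown` are invented names. On an AVR the two `nvm_*` helpers would presumably wrap the avr-libc EEPROM routines such as `eeprom_read_byte` and `eeprom_update_byte`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RUN_FLAG_CLEAN   0u  /* last run ended with a proper shutdown */
#define RUN_FLAG_RUNNING 1u  /* system booted and has not shut down yet */

/* Stand-in for one byte of non-volatile memory (simulated here). */
static uint8_t nvm_run_flag = RUN_FLAG_CLEAN;

uint8_t nvm_read(void)
{
    return nvm_run_flag;
}

void nvm_write(uint8_t v)
{
    assert(v == RUN_FLAG_CLEAN || v == RUN_FLAG_RUNNING);
    nvm_run_flag = v;
}

/* Call once at startup. Returns true if the previous run ended
 * without a clean shutdown, then marks the system as running. */
bool boot_check(void)
{
    bool unclean = (nvm_read() != RUN_FLAG_CLEAN);
    nvm_write(RUN_FLAG_RUNNING);
    return unclean;
}

/* Call as the last step of an orderly shutdown. */
void clean_shutdown(void)
{
    nvm_write(RUN_FLAG_CLEAN);
}
```

If `boot_check` returns true, the machine could refuse to move any axis until it has been re-homed or an operator has acknowledged the fault. One caveat: an erased AVR EEPROM reads 0xFF, so with this encoding the very first boot of a fresh chip would also be reported as unclean.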

I don't want a watchdog reset; I can't do that. I want a watchdog that can evaluate the reported error and execute a program based on it, or at least a watchdog that halts the system completely and prints an error message.
We also don't use any PC; it is a standalone system.

So I wanted to share my thoughts and collect any better ideas, because ideas spread across several people work much, much better.

Thanks to all!