Hi, I'm new to Arduino. I've been working on an intercom project recently and I want to make it fail-safe. I even updated the bootloaders on my Nanos to make the watchdog work properly on them and did some tests. At first, I wrote a simple program that sets the watchdog timer to 2s and delays the loop by 3s to trigger the watchdog. It worked as expected: the Arduino restarted every 2s. However, I noticed there is a state in which the program goes wild, e.g. Serial stops working or pins swap places, yet the loop keeps running and resets the watchdog timer as normal. It's a difficult situation because the program becomes completely useless, and it seems even the watchdog is powerless here.
The simplest way to bring this state on is to touch the Nano so that your finger bridges several ATmega pins and the oscillator at the same time while the program is running.
If the program were reliable, the TX LED should flash every 1s no matter what, but it stops for good once the above-mentioned state occurs.
Here is my question. Is it possible that this state could ever occur due to some interference? If so, is there a way to detect it and write a program that can recover from it automatically?
I find it better to extract the condition to a variable with a meaningful name for better readability and then place it in an "if" statement. This may seem a bit excessive here, but in large projects with more complex conditions it enhances readability a lot and I'm used to doing it this way. For me, the code above looks just better than something like that:
Yes, I know.
It's not about making the board invulnerable to being touched, as the board is normally never touched during operation. However, some other kind of external interference may occur, and I'm concerned it might cause the same issue we see when that specific area is touched.
This implies that in your experiment, you're destabilizing the oscillator, which means the internal clock will become unstable. I can well imagine this puts the microcontroller into an undefined state, especially if several internal clocks happen to become desynced (e.g. peripheral clocks are often derived from the core clock through a divisor).
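To make the divisor point concrete, here's a rough sketch of how a UART baud rate tracks the core clock. It assumes the ATmega328P normal-speed asynchronous formula, baud = f_osc / (16 * (UBRR + 1)); the Arduino core may configure the UART differently (e.g. double-speed mode), and the function names are just made up for illustration. It's plain C++ you can build on a PC, not something to flash.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Divisor the firmware would program for a nominal clock and target baud
// (assumed: normal-speed async mode, rounded to the nearest integer).
uint16_t ubrrFor(double nominalHz, double baud) {
    assert(nominalHz > 0 && baud > 0);
    return static_cast<uint16_t>(std::lround(nominalHz / (16.0 * baud) - 1.0));
}

// Baud rate the UART really produces if the oscillator runs at actualHz.
// The divisor is fixed at startup, so the baud rate follows the clock.
double actualBaud(double actualHz, uint16_t ubrr) {
    return actualHz / (16.0 * (ubrr + 1));
}

// Relative error versus the intended baud rate, in percent.
double baudErrorPercent(double actualHz, uint16_t ubrr, double baud) {
    return (actualBaud(actualHz, ubrr) / baud - 1.0) * 100.0;
}
```

For a 16 MHz clock and 9600 baud, `ubrrFor` gives 103 and the error is well under 1%. If the oscillator is dragged 10% low (14.4 MHz) while the divisor stays at 103, the baud rate ends up roughly 10% low as well. Async serial generally tolerates only a few percent of mismatch, so that alone could be enough to make Serial appear dead while the loop keeps running.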
I'm not sure if there's a good way to recover from this, since any code used to detect this would still rely on proper functioning of the microcontroller core itself. I suppose you could do something with an auxiliary microcontroller that checks proper functioning of the main unit, or even a pair of controllers running in a redundant setup and a separate mechanism checking them against each other. These are the sort of solutions you'll encounter in high-reliability computing.
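As a very rough sketch of the auxiliary-controller idea: the main unit toggles a heartbeat line about once a second, and the supervisor times the gaps with its own clock. The class name, the window limits and the millisecond timestamps are all assumptions for illustration. This is only the decision logic in plain C++, without any pin handling, and it only catches timing faults, not a main unit that beats on time but misbehaves in some other way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decision logic for a supervisor on a second controller.
// All times come from the supervisor's own clock, never the main unit's.
class HeartbeatSupervisor {
public:
    HeartbeatSupervisor(uint32_t minIntervalMs, uint32_t maxIntervalMs,
                        uint32_t startMs)
        : minMs_(minIntervalMs), maxMs_(maxIntervalMs),
          lastMs_(startMs), seenFirst_(false) {
        assert(minIntervalMs < maxIntervalMs);
    }

    // Call on every heartbeat edge. Returns true if the main unit should be
    // reset because the beat came too early or too late.
    bool onHeartbeat(uint32_t nowMs) {
        uint32_t delta = nowMs - lastMs_;  // unsigned math survives wraparound
        bool tooLate = delta > maxMs_;
        // The "too early" check needs a previous beat to measure against.
        bool tooEarly = seenFirst_ && (delta < minMs_);
        seenFirst_ = true;
        lastMs_ = nowMs;
        return tooEarly || tooLate;
    }

    // Call periodically. Returns true if the heartbeat has gone silent.
    bool onTick(uint32_t nowMs) const {
        return (nowMs - lastMs_) > maxMs_;
    }

private:
    uint32_t minMs_;
    uint32_t maxMs_;
    uint32_t lastMs_;
    bool seenFirst_;
};
```

With a window of 900 to 1100 ms, beats at 1000 and 2000 pass, a beat at 2400 is flagged as too early, and a tick at 3600 with no further beat is flagged as silence. Checking the lower bound as well as the upper one is the point here: a main unit whose clock has run away can still produce beats, just at the wrong rate.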
However, for a regular hobby or even industrial project, all this is way overkill and you simply ensure that a clock circuit is reasonably stable (component choice, board layout) and isn't easily messed with by outside factors. There's not all that much you can or need to do.
Pretty much.
You design to the expected conditions. If your concern is that someone touching the circuit can cause it to become unstable, well, the simplest solution is to make sure that they can't touch it. If that's not enough, then go from there.
As for why the watchdog isn't helping, ISTR that the AVR devices use the main clock to run the watchdog. You'd need to go to another chip family like STM32 that has an independent watchdog clock to work around this problem.
I've removed some inappropriate comments and associated replies. Some replies were also inappropriate, some were ok but it doesn't make sense to delete some and leave others in, so they are all gone. No penalties but any more inappropriate comments will result in a time out from the forum if I see them.
Now please either help the OP or say nothing.
OP please appreciate that the people helping you are volunteers working in their spare time, as are the moderators.