Wanting to build something "more stable" is fine!
Professional safety analysis first tries to analyse the hazards involved, i.e.
- What is the exact kind of issue you expect?
- How high is the likelihood of its occurrence?
- What are the consequences if it occurs?
Without a good answer to all three questions, all well-meant countermeasures remain vague.
A watchdog timer can detect the failure of a specific function, which can be caused by hardware OR by software issues.
Now, what to do in each of those cases? A simple reset is fine if you assume the problem was caused by a coincidence of rare events that is unlikely to recur in the near future.
But what if the cause is a fatal hardware issue? Then a reset will help little, and there is not much difference with or without it.
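To make the difference between the two cases concrete, here is a rough host-side model of a watchdog. It is only a sketch: `SimulatedWatchdog` and `run` are made-up names for illustration, not an Arduino or AVR API, and time is counted in abstract ticks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a watchdog timer (not a real Arduino API).
class SimulatedWatchdog {
public:
    explicit SimulatedWatchdog(uint32_t timeout_ticks) : timeout_(timeout_ticks) {
        assert(timeout_ticks > 0);
    }

    // Called by the supervised function to prove it is still alive.
    void kick() { elapsed_ = 0; }

    // Called once per time tick. Returns true if the timeout expired
    // on this tick, i.e. a reset was triggered.
    bool tick() {
        if (++elapsed_ < timeout_) return false;
        elapsed_ = 0;   // the reset restarts the timeout window
        ++resets_;
        return true;
    }

    uint32_t resets() const { return resets_; }

private:
    uint32_t timeout_;
    uint32_t elapsed_ = 0;
    uint32_t resets_ = 0;
};

// Runs `ticks` time steps. The supervised function kicks the watchdog
// only while `healthy` is true. Returns the total number of resets.
uint32_t run(SimulatedWatchdog& wd, uint32_t ticks, bool healthy) {
    for (uint32_t i = 0; i < ticks; ++i) {
        if (healthy) wd.kick();
        wd.tick();
    }
    return wd.resets();
}
```

With a permanently broken function (`healthy == false`) the model simply resets again every timeout period, forever. The watchdog alone cannot tell a rare glitch from a fatal defect.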
The situation is quite different when you need to enter a fail-safe state because the system is no longer controllable. In that case, and only in that case, external hardware is mandatory. This hardware CAN try to reset the original function, but in addition it has to provide alternative means if this function cannot be revived.
A second microcontroller is adequate for this. Of course it has to contain a little more intelligence than just resetting the master :-)
The advantage of an external microcontroller which does nothing (!) but try to reset the master is somewhat dubious.
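What "a little more intelligence" could mean is sketched below, under my own assumptions: the supervisor sees a heartbeat from the master once per period, tries a limited number of resets, and then latches into a fail-safe state. The class `Supervisor`, its `step` method and the retry limit are hypothetical names and choices, not taken from any library.

```cpp
#include <cassert>

enum class State { Monitoring, FailSafe };

// Hypothetical supervisor logic for a second microcontroller.
class Supervisor {
public:
    explicit Supervisor(int max_reset_attempts) : max_attempts_(max_reset_attempts) {
        assert(max_reset_attempts >= 0);
    }

    // Call once per supervision period.
    // heartbeat_seen: did the master signal liveness during this period?
    // Returns true if the master should be reset now.
    bool step(bool heartbeat_seen) {
        if (state_ == State::FailSafe) return false;  // latched, no more resets
        if (heartbeat_seen) {
            attempts_ = 0;                            // master recovered
            return false;
        }
        if (attempts_ < max_attempts_) {
            ++attempts_;                              // try to revive the master
            return true;
        }
        state_ = State::FailSafe;                     // give up, take over safely
        return false;
    }

    State state() const { return state_; }

private:
    int max_attempts_;
    int attempts_ = 0;
    State state_ = State::Monitoring;
};
```

In a real design, entering `FailSafe` is where the supervisor would drive the outputs to their safe values itself. That part depends entirely on the application and is left out here.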
I understand well that you are discussing the failure of the internal watchdog function. This is called a "double error" and, if rare, is often not considered, unless the cost of the potential damage multiplied by its likelihood justifies the additional effort.
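That trade-off is plain arithmetic. As a minimal sketch, assuming for simplicity that the extra safeguard removes the risk completely (function names and any numbers you feed in are purely illustrative):

```cpp
#include <cassert>

// Expected loss = probability of the failure times the cost of the damage.
double expected_loss(double probability, double damage_cost) {
    assert(probability >= 0.0 && probability <= 1.0);
    return probability * damage_cost;
}

// Simplifying assumption: the safeguard eliminates the loss entirely,
// so it pays off only if the expected loss exceeds what the safeguard costs.
bool safeguard_justified(double probability, double damage_cost, double safeguard_cost) {
    return expected_loss(probability, damage_cost) > safeguard_cost;
}
```

For example, a one-in-a-thousand failure with a damage of 100000 outweighs a safeguard costing 5, while the same failure with a damage of 50 does not.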
In this particular Arduino case, however, the failure of the internal watchdog is assumed to be deterministic rather than probabilistic. You just have to look to find out!
To say "I doubt that the internal watchdog works reliably, so I design an external watchdog" is fine and understandable, but not very rational.