What register/mechanism facilitates restart after a crash?

So I was just pondering a hypothetical question. When an Arduino crashes it reboots and starts from the beginning, much as if a watchdog reset had occurred. I’m wondering what mechanism or control register controls that reset. I’m thinking to myself something must tell the chip to boot/reset.

Let’s say for whatever reason it crashes. Let’s use a simple overflow or perhaps multiple interrupts causing a race condition. The chip/program becomes unstable and boom it resets. What handled that reset?

This could be useful because it may allow for better tracking of where in the program the problem occurred. For instance, with a watchdog reset you can have it save some information to the EEPROM before it resets, to help you track the problem if you can’t access the chip in real time via the serial monitor.

Does anyone have any thoughts?

Charlie1985: When an Arduino crashes it reboots and starts from the beginning, much as if a watchdog reset had occurred.

That is not correct. A crash does not automatically cause a restart; most of the time it will just stop everything. For example, a common cause of a crash is for a program to go into an endless loop. The watchdog timer is the only way to deal with that because an Arduino has no operating system that can monitor the behaviour of running programs.

...R

Hangs are easier to deal with for sure, but I’m referring to a full blown crash. Here is an example of code that definitely crashes. I wrote this intentionally to cause a crash, to see if I could find something to flag when this kind of crash happens. I assure you that this code will crash and restart.

#include <EEPROM.h>                   //  EEPROM MEMORY CONTROL
#include <avr/wdt.h>                  //  WATCHDOG TIMER CONTROL

uint32_t  BadIdea[10];                //  ONLY 10 ELEMENTS

void setup() {
  Serial.begin(1000000);
  Serial.println("program start");
  Serial.println("press enter to proceed");
  while(!Serial.available()){
    //  WAIT FOR ANY SERIAL INPUT
  }
}

uint32_t a=0;
uint64_t b=0;
void loop() {
  Serial.println(a++);
  BadIdea[b++]=a;                     //  INTENTIONAL: b IS NEVER BOUNDED, SO THIS WRITES PAST THE END OF BadIdea
}

Load it onto the board, then just hit enter. What I am curious about is what handles that restart. Surely something points back to 0. Or am I completely wrong about how that restart occurs?
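For comparison, here is a minimal sketch of the same write with the index checked first, so an out-of-range write is counted instead of landing in whatever memory follows the array. The names `kSlots`, `slots`, `rejectedWrites` and `storeChecked` are made up for illustration and do not come from the sketch above.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical guarded version of the BadIdea[] write.
const size_t kSlots = 10;
uint32_t slots[kSlots];
size_t rejectedWrites = 0;  // how many out-of-range writes were refused

// Stores value at index only if index is inside the array.
// Returns true if the value was stored, false if it was refused.
bool storeChecked(size_t index, uint32_t value) {
  if (index >= kSlots) {
    ++rejectedWrites;
    return false;
  }
  slots[index] = value;
  return true;
}
```

In a sketch, loop() could call storeChecked(b++, a) and print rejectedWrites over Serial. This only catches writes that go through the helper, so a stray pointer elsewhere would still go unnoticed.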

Charlie1985: What I am curious about is what handles that restart?

If it restarts I suspect you are just lucky and the error caused the program counter to jump back to the start (wherever that is).

...R

Hangs are easier to deal with for sure, but I'm referring to a full blown crash.

What do you mean by 'hang', 'crash' and 'full blown crash'?

To me there are 2 possible conditions; it's doing what it's supposed to do or it's not.

From the processor's point of view it just sees some memory from which it gets instructions and does stuff. If it asks for an instruction from its memory and gets one then it's happy. It has no idea about what those instructions are doing for you or whether they are correct for your program or not. It just chews through the instructions it gets regardless.

Nothing other than the WDT will reliably detect when the instructions the processor is getting are failing to do whatever it is you've programmed it to do. If your test code above causes a restart then it's by luck not because some special part of the processor has detected a problem. How would it know anyway? It has no idea what you want. All it knows is the instructions it sees.

Fair enough. For the sake of this conversation I will define "crash".

A crash is when the chip stops functioning completely and abandons all tasks. I would not define a crash as a function returning the wrong value.

In my experience I suppose a crash could be defined as 2 possible things

1) It freezes. It is stuck in an infinite loop, or something like one SPI device interrupting while another SPI device is communicating. These failures are easy to handle and find using the watchdog timer.

2) It stops mid-process and starts back at the beginning. An example would be a race condition, or an overflow error like the one in the code sample I posted above.

I had never considered that I was lucky that it started over. Those are the only 2 types of crashes I have ever seen. And I have seen them many times in the past.

"Full blown crash" is not a defined concept. There is only unpredictable behavior, as the processor starts executing random bits of machine code. In some cases, the Arduino might restart; in others, it gets stuck in an endless loop.

There is no HALT instruction, so the cpu is always doing something.

If there was a ‘this code is not doing as intended register’ you would expect to find it listed in the datasheet …

Charlie, I think you are looking for something that's not there.

See "Lock in amplifier experiment". This guy is also looking for something that's not there.

I met a man upon the stair, He wasn't really there, He wasn't there again today, Oh how I wish he'd go away.

PerryBebbington: To me there are 2 possible conditions; it's doing what it's supposed to do or it's not.

Love it +1

...R

Ok Ok. I get it. I know there is no "this code isn't working" register. I also understand that code either does what it is supposed to or doesn't do what it is supposed to. Those are all pretty obvious.

What I'm not convinced of is that it is luck whether it restarts or not. If you have an overflow like the one in the sample code I posted it restarts 100% of the time. There is no variability or chance in the outcome. I can write several other examples of different coding mistakes that cause it to restart.

It doesn't hang like a loop. It simply stops what it is doing and starts over. The fact that the outcome is predictable and repeatable removes luck or random chance as the reason for the outcome. There is some reason, cause, or mechanism that makes the chip restart in those cases. I won't continue asking you guys/gals what could be directing the restart because clearly you don't think that anything makes it restart in those types of scenarios.

I know for a fact that if certain coding mistakes are made the chips don't hang, they crash and start over. I was just trying to figure out what manages that restart. You say nothing does. Ok nothing does. So why does it restart in those specific cases 100% of the time?

I'll repost when I find the answer. The behavior isn't random. Something directs it to jmp to the beginning.

Until I find what directs it....cheers

Try finding a race condition in extremely long and complicated programs and you will see that it randomly resets the chip at different times and different conditions. The only thing that remains constant is that when the error occurs the chip restarts. I'm just trying to see if there is a better way than reading thousands of lines of code to identify the source of the error. This is about finding the cause of the fault in a way that makes it easier to detect and remedy.

With a watchdog reset you can add an EEPROM.write to the WDT ISR to flag where in the code the error occurred, but that only works if the program hangs. It doesn't do any good if it restarts straight away.
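One approach that does not depend on the watchdog firing is to keep a "breadcrumb" in RAM that the C startup code leaves alone. The following is only a sketch under assumptions: it relies on avr-gcc's `.noinit` section, which is not zeroed when the program starts over, and the names `Breadcrumb`, `crumb`, `markCheckpoint` and `readCheckpoint` are made up for illustration. The stored complement is a sanity check, because RAM contents after a real power-up are undefined and because the fault itself may have scribbled over the breadcrumb.

```cpp
#include <stdint.h>

// On AVR, place the variable in .noinit so startup code does not clear it.
// On other targets the macro is empty so the logic can be tried on a PC.
#if defined(__AVR__)
#define KEEP_ACROSS_RESTART __attribute__((section(".noinit")))
#else
#define KEEP_ACROSS_RESTART
#endif

struct Breadcrumb {
  uint16_t checkpoint;  // id of the last checkpoint the code passed
  uint16_t inverted;    // bitwise complement of checkpoint, used as a validity check
};

KEEP_ACROSS_RESTART Breadcrumb crumb;

// Record that execution reached checkpoint number id.
void markCheckpoint(uint16_t id) {
  crumb.checkpoint = id;
  crumb.inverted = (uint16_t)~id;
}

// Returns true and writes the stored id to *id if the breadcrumb looks valid.
// Returns false if the two fields do not match (power-up garbage or corruption).
bool readCheckpoint(uint16_t* id) {
  if ((uint16_t)~crumb.checkpoint != crumb.inverted) {
    return false;
  }
  *id = crumb.checkpoint;
  return true;
}
```

The idea would be to call readCheckpoint() first thing in setup() and print the result, then sprinkle markCheckpoint(1), markCheckpoint(2), and so on through the code. After an unexpected restart the last id reported narrows down where things went wrong, provided the fault did not overwrite the breadcrumb.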

Fortunately this is actually hypothetical. I don't have any code that I am currently trying to find problems in. Just trying to see if I can save myself and maybe others online serious headache and tedious searching to identify problems in Arduino code. Arduino has its advantages for sure but debugging complicated problems in complicated code can be quite infuriating :)

Charlie1985: but debugging complicated problems in complicated code can be quite infuriating :)

That's why complicated programs should be constructed from a series of short single purpose functions that can each be tested on its own.

Relying on a probable, but not certain, restart is not a solution.

...R

To review: What one normally calls a "crash" happens when some sort of unexpected error is detected in the running program, causing the program to stop in a way other than expected (without saving your files, or resetting the display, for example.) These unexpected errors can be detected by the program itself ("couldn't allocate data"), by the operating system of the computer it's running on ("you tried to read data into the memory that you failed to allocate"), or by the hardware itself (in which case, hopefully the operating system notices, and prints a meaningful error message) ("you tried to copy data to a memory address that doesn't exist. Segmentation Violation!")

Typically when this happens, your program is aborted, the operating system cleans up the loose ends that it knows about, and you get data written to the console or some log file that helps you figure out what is going on so that you can fix it. If you have a "real" debugger (or maybe good debugging code inside your program itself), perhaps it sits in between your program and the OS, and lets you poke around AFTER the event causing the crash, but before all that cleanup is done.

Now, on an Arduino-class board:

  • There is no operating system.
  • The hardware does NOT detect errors like bad opcodes or invalid memory accesses, so even if there was an OS, it wouldn't notice most errors.
  • There isn't really any place to report errors.
  • The Arduino functions like digitalWrite() do check for some errors, but there's not much that can be done other than ignore them. If you do "digitalWrite(123, 123);" the code is (barely) smart enough not to try to turn on some non-existent port on some non-existent pin, but it doesn't have many options on reporting the error. The function will just return without doing anything.

In effect, the Arduino doesn't actually "crash"; it just goes and executes code somewhere that isn't the code that it's supposed to be running. :-(

Sometimes, perhaps even "often", "the wrong area of code" will be areas beyond the end of the user program, or areas at the beginning of program memory (which happens when you overwrite the return address of a function with a small integer). In these cases, the AVR processor is likely to encounter code that will eventually cause the application to "noticeably crash": "blank" program memory contains useless instructions that will eventually roll over into the bootloader and probably restart the sketch, and low program memory contains a bunch of "jump" instructions to a default "unused interrupt handler" that might restart the sketch (AVR) or infinitely loop doing nothing (ARM). But such behavior is far from guaranteed!

I understand what you are saying. There is a simple solution that unfortunately would require an external RAM chip to store data on, independently of the SRAM of the MCU. I guess the bottom line is that when debugging Arduino code I have to stick with the tried, true, and mind-numbingly tedious methods.

I should probably see if there is something similar to Atmel's debugger for the Arduino, or perhaps write/build one. I have tried using the Atmel debugger with Arduino before but it was a nightmare of Windows issues, so I gave up on trying that approach.

I wanted to correct something I wrote earlier, or perhaps I would like to be corrected on something I wrote earlier. I mentioned the possibility of saving an error code in EEPROM from inside the watchdog ISR. I think that is probably a bad idea, as the EEPROM uses interrupts to work, so calling it from inside an ISR is likely a very bad idea.