Arduino boot loop (segfault?)

Hello everyone!

I can’t figure out how to recover from a potential error in my code.

A little background: I’m working on a feature request for a working project. This project uses a standalone ATmega328P configured to run at 3.3 V @ 8 MHz. Last year I ran into the “watchdog boot loop on Pro Mini” (the Pro Mini is my target board), and I solved it with guidance I found on this forum and by using Optiboot. So at first I guessed the problem might be related to that. Sadly it was not, and I was able to extract the exact pieces of the project that cause the issue. I loaded that code onto an Arduino UNO and the same thing happened.

I know “why” the problem occurs: just to illustrate, I’m intentionally incrementing and writing through a pointer that points to a local variable (on the stack). Originally it was a mistake, because I was not properly bounding the pointer increment. The fact that the board won’t recover after a reset makes me worry that this could happen in real life, and it would be a total mess.

This is the code:

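#include <avr/wdt.h>
#include <SoftwareSerial.h>  // was missing; needed for debugSerial
#include <TimerOne.h>

SoftwareSerial debugSerial(3, 4);  // RX = pin 3, TX = pin 4

void timer_ISR(void);

void setup() {
  wdt_enable(WDTO_8S);
  debugSerial.begin(9600);
  debugSerial.println(MCUSR);  // reset-cause flags as seen by the sketch

  Timer1.initialize(1000000);  // period in microseconds (1 s)
  Timer1.attachInterrupt(timer_ISR);

  char BUFFER[6];
  char* asciiPtr = BUFFER;
  while (1) {
    // Deliberate bug: no bounds check, so this writes past the end of BUFFER.
    *asciiPtr++ = 0;
    word address = (word)asciiPtr;  // explicit cast from pointer to integer
    debugSerial.print("ADDRESS:");
    debugSerial.println(address);
  }
}

void loop() {
}

// Intentionally empty timer callback.
void timer_ISR(void) {
}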

If you run this code, you’ll see the SoftwareSerial port print the ADDRESS I’m writing to and, after that, something like the output below. It may not happen right after uploading the code, but it certainly happens after the watchdog resets the board.

0
ADDRESS:2295
ADDRESS:2296
ADDRESS:2297
ADDRESS:2298
ADDRESS:2299
ADDRESS:2300
ADDRESS:2301
ADDRESS:2302
ADDRESS:2303
ADDRESS:2304
ADDRESS:2305
ADDRESS:2306
ADDRESS:2307
ADDRESS:2308
0
0
0
00
0
0
0
00
0
0

All the zeros are printouts of MCUSR, and as you can see, that’s the boot loop.
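For anyone reading those values: on the ATmega328P, MCUSR holds one flag per reset source (PORF in bit 0, EXTRF in bit 1, BORF in bit 2, WDRF in bit 3), so a printed 0 means none of the flags was set at the moment the sketch read the register. Below is a minimal sketch of a decoder, written as plain C++ so it can be tried on a PC. The name describeResetCause is hypothetical and not part of any Arduino library.

```cpp
#include <cstdint>
#include <string>

// Bit masks for the reset flags in MCUSR on the ATmega328P.
constexpr std::uint8_t kPORF  = 1u << 0;  // power-on reset
constexpr std::uint8_t kEXTRF = 1u << 1;  // external reset (RESET pin)
constexpr std::uint8_t kBORF  = 1u << 2;  // brown-out reset
constexpr std::uint8_t kWDRF  = 1u << 3;  // watchdog reset

// Hypothetical helper: turn a raw MCUSR value into a readable list of causes.
std::string describeResetCause(std::uint8_t mcusr) {
  std::string out;
  auto add = [&](std::uint8_t mask, const char* name) {
    if (mcusr & mask) {
      if (!out.empty()) out += '+';
      out += name;
    }
  };
  add(kPORF, "power-on");
  add(kEXTRF, "external");
  add(kBORF, "brown-out");
  add(kWDRF, "watchdog");
  return out.empty() ? "none" : out;
}
```

On the board, the result of describeResetCause(MCUSR) could be printed in place of the raw number. Several flags can be set at once, which is why the helper joins the names instead of returning only one.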

I’m using v1.1 of TimerOne by PaulS.

Some strange things make the problem go away. Doing either of the items below “hides” the issue I’m facing:

  • Don’t set the timer interrupt -> comment out Timer1.attachInterrupt(timer_ISR);

  • Don’t print the ADDRESS -> comment out debugSerial.print("ADDRESS:");debugSerial.println(address);

I’ve read in several places that after the reset occurs, the watchdog timer may be left at its minimum timeout (16 ms), and that could be the problem. I’ve tried several ways to disable the watchdog right at the start of setup(), with no luck.

Hope you can help me! Thanks in advance!

Why are you increasing asciiPtr without bounds?

Why are you using SoftwareSerial when Serial (and its pins) are not used?

When you do stupid sh*t, why are you surprised when bad things happen?

Hi Paul!

Why are you increasing asciiPtr without bounds?

At first it was a mistake; in the example in my first post it is there on purpose, to illustrate the problem.

Why are you using SoftwareSerial when Serial (and its pins) are not used?

As I said, I'm working on a bigger project and this is just a small fragment of it. That project already uses Serial for other communication, which is why I'm using SoftwareSerial.

When you do stupid sh*t, why are you surprised when bad things happen?

I'm intentionally doing this sh*t, but it could have happened unintentionally. As I'm in a testing stage, I'd like to self-recover from anything that could happen.

I just want to know if there's a way to recover from that, apart from unplugging the power and plugging it back in.

I'm intentionally doing this sh*t, but it could have happened unintentionally.

No, it could not have. You should NEVER increment a pointer to point beyond the end of the space allocated for the thing that the pointer points to.

I'd like to self-recover from anything that could happen.

You'd like to be able to shoot yourself in the foot, and recover from that. Well, then don't use C++. It not only WILL let you shoot yourself in the foot, it will help you aim and volunteer to pull the trigger for you.

This code will attempt to overwrite the entire memory with zeros. The Arduino has memory-mapped registers, some of them rather important. For instance, memory address 5 might not be RAM at all, but might control (I dunno) the low-power sleep mode or something. If you go slapping a zero into it … lord knows what might happen.

This isn't the case on bigger computers with an OS, because the OS protects memory from stuff like this. You will get a segfault and the program will be halted. On a microprocessor, not so much.

You may hit this condition sooner than you expect, because the asciiPtr variable is itself on the stack and may be overwritten with zero. For that matter, the string "ADDRESS:" gets loaded into RAM before execution. When that letter 'A' gets overwritten with a zero, it will appear to be an empty string. That's what appears to be happening here.

In short: what everyone else said. Stomping over memory you don't own is a very, very common cause of C bugs, and the effects of doing it are what you see here - extremely weird, impossible behaviour. In fact, when a sketch starts doing something extremely weird, one of the first things to look for is buffer overruns like this.
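As a rough illustration of the alternative, here is a minimal sketch of the same zero-filling loop with a bounds check, in plain C++ so it also compiles off-target. The helper name zeroFill and its parameters are hypothetical, not something from the original sketch or from any library.

```cpp
#include <cstddef>

// Hypothetical helper: write up to `requested` zero bytes into `buffer`,
// but never past `capacity`. Returns how many bytes were actually written.
std::size_t zeroFill(char* buffer, std::size_t capacity, std::size_t requested) {
  char* ptr = buffer;
  char* const end = buffer + capacity;  // one past the last valid element
  std::size_t written = 0;
  while (written < requested && ptr != end) {
    *ptr++ = 0;  // safe: ptr is still inside [buffer, end)
    ++written;
  }
  return written;
}
```

With the BUFFER from the first post, a call such as zeroFill(BUFFER, sizeof(BUFFER), 100) would stop after 6 bytes instead of walking through the stack. The point is only that the pointer is compared against the end of the buffer before every write.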

Thank you both of you Paul!

I know it must never happen and that I need to protect my code from doing that. There are other programmers on the team without much experience with microcontrollers, and they don't have these concepts solidly in place.

As we are also using the watchdog as a safety measure, I thought maybe this was a problem I could recover from. As you both explained well, I cannot.

Thank you very much for your help.

Your car probably has ABS brakes. They can automatically fix a number of different situations. But if you crash your car, the ABS can't repair the car.