Watchdog fails to reset Nano

I have been chasing this problem for several weeks now. I have a solar tracker that uses a NANO, an RTC, an MPU-6050, and an IFX9201 motor controller. The system will run for a day or so and then hang, even though I use the standard avr/wdt.h. I am running optiboot on the NANO with an 8 second watchdog.

If I remove the call to wdt_reset(), the system reboots every 8 seconds. The MPU is run by the I2Cdev library. I have removed all interrupts and just poll the MPU-6050. I have stripped almost all of the functionality from the code and it still hangs.

Every second I print the RTC time and some pointing information so I know the system is alive. I have also used the watchdog.h library and placed a Serial.print("Watchdog bite") in the watchdog ISR. I see occasional "Watchdog bite" messages, but when the system hangs the last message is "Watchdog bite".

There is a lot of code, but my setup looks like this:

void setup() {
   wdt_disable();                                                                     // Disable watchdog
 
  Serial.begin(115200);                                                               // initialize serial communication 
  Wire.begin();                                                                       // join I2C bus (I2Cdev library doesn't do this automatically)
  SPI.begin();
  pinMode(LED_PIN, OUTPUT);                                                           // configure LED for output

  nBlinkLED(10);                                                                      // startup blink
  setupRTC();
  setupPointing();

  setupMPU6050();

  wdt_enable(WDTO_8S);                                                               // 8 second watchdog
  Serial.print("Init Done Rev 4.0 ");Serial.println(__DATE__);
 
}

My MPU code looks like this:

/**************************************************************************************************************************************************************/
/************************************************/  boolean GetMPUPacket(void) {
/**************************************************************************************************************************************************************/
// Checks if one or more packets are in the FIFO return true if a packet is read
 
  mpuIntStatus = mpu.getIntStatus();                                                        // get INT_STATUS byte: bit0=DATA_RDY_INT bit3=I2C_MST_INT bit4=FIFO_OFLOW_INT
                                                                                            // if(mpuIntStatus !=3)Serial.println(mpuIntStatus,BIN);
  fifoCount = mpu.getFIFOCount();                                                           // get current FIFO count

  if ((mpuIntStatus & 0x10) || fifoCount == 1024) {                                         // check for overflow
    mpu.resetFIFO();                                                                        // reset so we can continue cleanly
    FIFO_OverflowCTR++;                                                                     // bump ctr
    //Serial.print(F(" FIFO_OverflowCTR= "));Serial.print(FIFO_OverflowCTR);                  // Print error
    return(false);
  } 
  if(fifoCount < packetSize) return(false);  
  mpu.getFIFOBytes(fifoBuffer, packetSize);                                                 // read a packet from FIFO
  fifoCount = mpu.getFIFOCount();
  FIFO_OverflowCTR = 0;                                                                     // reset count so only successive failures are counted
  //mpu.resetFIFO();                                                                          // reset so we can continue cleanly  
  return(true);
}
/**************************************************************************************************************************************************************/
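To check the branches of GetMPUPacket() without hardware, the decision can be pulled out into a pure function and exercised on a PC. This is a minimal sketch, assuming the status-bit meaning given in the comment above (bit 4 = FIFO overflow) and the MPU-6050's 1024-byte FIFO. `FifoAction` and `classifyFifo` are hypothetical names, not part of I2Cdev, and the sketch treats a count of 1024 or more as an overflow rather than exactly 1024.

```cpp
#include <cstdint>

// Hypothetical helper: mirrors the branches in GetMPUPacket() without touching the I2C bus.
enum class FifoAction { ResetFifo, WaitForMore, ReadPacket };

constexpr uint8_t  kFifoOverflowBit = 0x10;  // bit 4 of INT_STATUS, as in the comment above
constexpr uint16_t kFifoSize        = 1024;  // MPU-6050 FIFO size in bytes

FifoAction classifyFifo(uint8_t intStatus, uint16_t fifoCount, uint16_t packetSize) {
  if ((intStatus & kFifoOverflowBit) || fifoCount >= kFifoSize) {
    return FifoAction::ResetFifo;    // overflowed: contents may be misaligned, start clean
  }
  if (fifoCount < packetSize) {
    return FifoAction::WaitForMore;  // not a full packet yet
  }
  return FifoAction::ReadPacket;     // at least one complete packet is waiting
}
```

In the sketch, the result of `classifyFifo(mpu.getIntStatus(), mpu.getFIFOCount(), packetSize)` would select between `mpu.resetFIFO()`, returning false, and `mpu.getFIFOBytes(...)`.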

I wasn’t quite sure where to post this. Any ideas welcome.

Thanks

Kurt

I thought the WDT could EITHER force a reset OR cause an interrupt. Sounds like the library uses the interrupt. Perhaps there is something going wrong with the interrupt handling?

Note: Doing Serial output in an ISR is generally a bad idea. In an ISR the interrupts are disabled and if the Serial buffer is too full for the message the system will hang waiting for the (disabled) serial interrupt to clear out the buffer.
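A common way around this is to do nothing in the ISR except set a flag, and do the printing from loop(). This is a minimal sketch written as plain C++ so the logic can be checked off-target; on the Nano the body of `onWatchdogInterrupt()` would go inside `ISR(WDT_vect)`. The names `watchdogBit`, `onWatchdogInterrupt` and `takeWatchdogBit` are hypothetical.

```cpp
// Set only by the interrupt handler; volatile so the main loop re-reads it every time.
volatile bool watchdogBit = false;

// Body of the watchdog ISR: no Serial, no delays, just record that it happened.
void onWatchdogInterrupt() {
  watchdogBit = true;
}

// Called from loop(): returns true once per bite and clears the flag.
bool takeWatchdogBit() {
  if (!watchdogBit) return false;
  watchdogBit = false;  // a single-byte write is atomic on AVR
  return true;
}
```

In loop(), `if (takeWatchdogBit()) Serial.println("Watchdog bite");` then prints with interrupts enabled, so a full TX buffer can drain normally.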

Agree with all of the above. If you are using the watchdog in interrupt mode, it will never reboot the computer.

If you really want to recover from a hang, you MUST use the reboot option.

Thanks for the reply and the suggestions. The point about Serial output in the ISR is a good one that I should have thought of. I have actually tried it both with the standard reset watchdog and with the interrupt. I used the ISR just to see whether I was getting watchdog bites, and then moved on.

My circuit board has output A6 connected to the NANO reset line, and in the watchdog ISR I set that pin to an output and drive it to zero to force a reset. If the serial buffer filled up, the ISR could have gotten stuck in the "Watchdog bite" print. My understanding of how the watchdog interrupt works is that you first get a watchdog interrupt and then get a reset on the next timeout.
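That two-step behaviour is what the ATmega328P datasheet calls "interrupt and system reset" mode (WDE and WDIE both set). With WDIE alone the watchdog only ever interrupts; with WDE alone it only resets. Below is a small host-side model of that mode table, assuming the WDTON fuse is unprogrammed and that the ISR is actually entered (it is running the interrupt vector that makes the hardware clear WDIE). `WdtModel` and `WdtEvent` are hypothetical names.

```cpp
// Hypothetical model of the ATmega328P watchdog modes (WDTON fuse unprogrammed).
enum class WdtEvent { None, Interrupt, SystemReset };

struct WdtModel {
  bool wde;   // Watchdog System Reset Enable
  bool wdie;  // Watchdog Interrupt Enable

  // What happens when the timer expires. In interrupt-and-reset mode, running the
  // interrupt vector clears WDIE, so the next timeout resets the chip.
  WdtEvent timeout() {
    if (wdie) {
      if (wde) wdie = false;  // falls back to plain system reset mode
      return WdtEvent::Interrupt;
    }
    return wde ? WdtEvent::SystemReset : WdtEvent::None;
  }
};
```

For example, `WdtModel{true, true}` yields an interrupt on the first timeout and a reset on the second, while `WdtModel{false, true}` keeps interrupting and never resets.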

So where I am now is that I am just using the standard watchdog reset, and I still get these hangs. I can't imagine how the watchdog doesn't reset the NANO. I have also printed the stack pointer and the free memory using the <MemoryFree.h> utility. These values are always the same on each pass through the loop. I have a couple of these systems running, so it isn't something unique to a particular NANO.

I am also wondering if the NANO needs a power-on reset to get unstuck. When the unit hangs, it is easiest to interrupt the power and difficult to press the reset button. I might test that idea on the next hang. I could also test whether connecting the serial port resets it when it is stuck.

I am just using the standard watchdog reset and I still get these hangs

Define "standard watchdog reset" -- most likely you are doing something wrong. You can set up the watchdog in 3-4 lines of code without using any library.
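The register-level setup boils down to two writes to WDTCSR: a "change enable" write, then the real mode and prescaler bits. This sketch computes those two values on a PC, assuming the ATmega328P WDTCSR bit layout from the datasheet; `wdtcsrValue` and `kWdtChangeEnable` are hypothetical names, and the bit constants are defined locally because `<avr/io.h>` is not available off-target.

```cpp
#include <cstdint>

// ATmega328P WDTCSR bit positions (datasheet register description).
constexpr uint8_t WDP0 = 0, WDP1 = 1, WDP2 = 2, WDE = 3, WDCE = 4, WDP3 = 5, WDIE = 6;

// First write of the timed sequence: unlocks WDE and the prescaler bits.
constexpr uint8_t kWdtChangeEnable = (1 << WDCE) | (1 << WDE);

// Second write. prescaler 0..9 selects roughly 16 ms .. 8 s;
// resetMode sets WDE, interruptMode sets WDIE.
uint8_t wdtcsrValue(uint8_t prescaler, bool resetMode, bool interruptMode) {
  uint8_t v = 0;
  if (prescaler & 0x01) v |= (1 << WDP0);
  if (prescaler & 0x02) v |= (1 << WDP1);
  if (prescaler & 0x04) v |= (1 << WDP2);
  if (prescaler & 0x08) v |= (1 << WDP3);  // WDP3 is not adjacent to the other prescaler bits
  if (resetMode)     v |= (1 << WDE);
  if (interruptMode) v |= (1 << WDIE);
  return v;
}
```

On the chip the second write must follow the first within four clock cycles, so the value should be computed beforehand, e.g. `uint8_t v = wdtcsrValue(9, true, false); cli(); wdt_reset(); WDTCSR = kWdtChangeEnable; WDTCSR = v; sei();` for an 8 s reset-only watchdog.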

Please post the minimal code that demonstrates this problem.

Thanks for taking the time to try and help. By standard watchdog I mean the <avr/wdt.h> library. I use three functions: wdt_disable() as the first line in setup(), wdt_enable(WDTO_8S) as the last line in setup(), and wdt_reset().

I have not yet been able to reduce the code to a minimum that fails, as it can take one to three days of operation for it to hang. I have removed the code that used an nRF24 to transmit status, to eliminate some potential failures. I have eliminated all interrupts and all while loops that might hang. There are no pointer operations or heap operations. The basic flow is: get the RTC time (now()), compute the sun location and set the desired panel angle, read the IMU accelerometer and compute the current angle, and turn the motor on east or west.

The I2C bus is used by the MPU and RTC libraries and the SPI bus is used to access the motor controller status byte that tells if an overcurrent or other motor fault has occurred.

I have been thinking about trying just the MPU example to see if it will run flawlessly for days; however, it uses interrupts and has a while loop that could potentially hang.
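One way to keep such a wait from hanging is to bound it with a timeout, so control returns to loop() (and to wdt_reset()) even if the data-ready condition never comes true. This is a minimal sketch, assuming the clock is millis()-like (unsigned 32-bit milliseconds); `waitUntil`, `ReadyFn` and `ClockFn` are hypothetical names, and the callbacks stand in for the example's data-ready test and for millis().

```cpp
#include <cstdint>

using ReadyFn = bool (*)();      // e.g. a check of the data-ready flag or FIFO count
using ClockFn = uint32_t (*)();  // e.g. millis()

// Hypothetical replacement for an unbounded "while (!ready) {}" loop.
// Returns true if ready() became true, false if timeoutMs elapsed first.
bool waitUntil(ReadyFn ready, ClockFn nowMs, uint32_t timeoutMs) {
  const uint32_t start = nowMs();
  while (!ready()) {
    if (nowMs() - start >= timeoutMs) return false;  // unsigned math survives millis() rollover
  }
  return true;
}
```

On a false return the caller can reset the FIFO and try again on the next pass instead of spinning forever.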

The main loop and service code is here (edited to fit). The MPU used to be read in an ISR but is now just polled.

void loop() {

  SunLocation();                                                                   // Compute Sun Position
  if (Sun.Elevation > MinTrackingElevation) dmpDataReady();                        // poll instead of interrupt

  CheckForRTCTimeSet();
....
}

void dmpDataReady()
{
  ISRLast = micros();

  interrupts();                                                    // Enable interrupts, otherwise I2C doesn't work
  if (GetMPUPacket()) {                                            // Read the IMU
    mpu.dmpGetQuaternion(&q, fifoBuffer);                          // Need quaternion for gravity
    mpu.dmpGetGravity(&gravity, &q);                               // Get gravity vector
    CheckDriveMotor(0);                                            // Update current angle position and check when good data was obtained
    ISRcounter++;
  }

  if ((long)(millis() - NextSend) > 0) {                           // Compute and send every second
    NextSend = millis() + MessageUpdateInterval;                   // Restart timer
    float GravityMag = gravity.x * gravity.x + gravity.y * gravity.y + gravity.z * gravity.z;
    if ((GravityMag < 1.1) && (GravityMag > 0.0) && (FIFO_OverflowCTR < 5)) wdt_reset();   // Reset watchdog
    PrintClockDisplay(now());
  }
  else {
    Serial.print("gravity.x="); Serial.print(gravity.x);
    Serial.print(" gravity.y="); Serial.print(gravity.y);
    Serial.print(" gravity.z="); Serial.println(gravity.z);
  }
  FIFO_OverflowCTR = 0;

  ISR_micros = micros() - ISRLast;
  maxISR_micros = max(maxISR_micros, ISR_micros);
}
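The once-a-second test `(long)(millis() - NextSend) > 0` is the rollover-safe form: subtract first, then look at the sign of the difference. The same check, written with fixed-width types so it behaves identically on a PC (where `long` is usually 64 bits, unlike the Nano's 32-bit `long`). `deadlinePassed` is a hypothetical name; the check stays correct across the roughly 49.7-day millis() wrap as long as the deadline is less than about 24.8 days away.

```cpp
#include <cstdint>

// Equivalent of "(long)(millis() - NextSend) > 0" on a 32-bit-long AVR.
bool deadlinePassed(uint32_t nowMs, uint32_t deadlineMs) {
  return static_cast<int32_t>(nowMs - deadlineMs) > 0;
}
```

A plain `millis() > NextSend` would misbehave for one interval when `NextSend` wraps past zero; the subtraction form does not.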

Code snippets are useless. Did you pay attention to the cautions on the avr-libc watchdog page?
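The relevant caution is that after a watchdog reset, the WDRF flag in MCUSR keeps WDE forced on, so the watchdog keeps running (with the prescaler back at its shortest setting) until software clears MCUSR first and then disables it. The avr-libc documentation suggests doing this in a function placed in the `.init3` section so it runs before setup(). Below is a host-side model of that hardware rule, assuming the ATmega328P register layout; `Regs`, `afterWatchdogReset`, `disableSequence` and `clearFlagsThenDisable` are hypothetical names, not avr-libc functions.

```cpp
#include <cstdint>

constexpr uint8_t WDRF = 3;           // MCUSR: watchdog reset flag
constexpr uint8_t WDE = 3, WDCE = 4;  // WDTCSR bits

struct Regs {
  uint8_t mcusr = 0;
  uint8_t wdtcsr = 0;

  // Models the datasheet rule: while WDRF is set, WDE cannot be cleared.
  void writeWdtcsr(uint8_t v) {
    if (mcusr & (1 << WDRF)) v |= (1 << WDE);
    wdtcsr = v;
  }
  bool watchdogRunning() const { return wdtcsr & (1 << WDE); }
};

// State right after a watchdog reset: WDRF set, WDE on, prescaler bits 0 (~16 ms).
Regs afterWatchdogReset() {
  Regs r;
  r.mcusr = 1 << WDRF;
  r.wdtcsr = 1 << WDE;
  return r;
}

// The timed disable sequence on its own: change-enable write, then 0.
void disableSequence(Regs& r) {
  r.writeWdtcsr((1 << WDCE) | (1 << WDE));
  r.writeWdtcsr(0);
}

// Save and clear MCUSR first, then disable; returns the saved reset flags.
uint8_t clearFlagsThenDisable(Regs& r) {
  const uint8_t saved = r.mcusr;  // keep a copy to report why the chip reset
  r.mcusr = 0;
  disableSequence(r);
  return saved;
}
```

In the model, running only the disable sequence leaves the watchdog running, while clearing MCUSR first actually stops it.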

In your latest test, does the watchdog reset the MPU every 8 seconds if you leave out the wdt_reset() call?

I'm new to programming Arduino.
Could someone give me some code so that when I press a button, the screen turns on?

-232511:
I'm new to programming Arduino.
Could someone give me some code so that when I press a button, the screen turns on?
Way off topic.

I have run the code with the watchdog reset removed and it resets every time so it seems the watchdog is functioning. I did read a lot about the watchdog and I think I understood the need to reset it early in setup.

That is a real puzzler. You might try turning on the watchdog with the fuses, in case the watchdog is somehow being turned off. Try posting on the AVRFreaks forum -- they know all the tricks.

ispybadguys:
I have run the code with the watchdog reset removed and it resets every time so it seems the watchdog is functioning. I did read a lot about the watchdog and I think I understood the need to reset it early in setup.

Could it be that your code is still running the main loop that has the watchdog reset but something else has failed causing the sketch to stop functioning? No way to tell without looking at the full code.
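One way to rule that out is to feed the dog only when every part of the sketch has reported progress since the last feed, instead of whenever the loop containing wdt_reset() happens to run. This is a minimal sketch of such a "check-in" scheme; the three subsystem bits (RTC, IMU, motor status) and all names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical check-in bits: each part of the sketch sets its bit after doing useful work.
enum : uint8_t {
  kRtcOk   = 1 << 0,  // now() returned a plausible time
  kImuOk   = 1 << 1,  // a fresh MPU packet was read
  kMotorOk = 1 << 2,  // motor controller status was read over SPI
  kAllOk   = kRtcOk | kImuOk | kMotorOk
};

uint8_t checkIns = 0;

void checkIn(uint8_t bit) { checkIns |= bit; }

// Returns true when wdt_reset() should be called, and starts a new round.
bool shouldFeedWatchdog() {
  if ((checkIns & kAllOk) != kAllOk) return false;
  checkIns = 0;
  return true;
}
```

With this in place, a stalled subsystem stops the feeding even though loop() itself keeps running, and the watchdog then resets the board.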

My comments may or may not have any bearing on your problem.

I have an anemometer that uses a Nano. It uses an external interrupt to count the switch closures on the external anemometer device. An LCD display shows the 15 second average wind speed and the maximum wind speed. Each morning my wife records the values.

A while back I added a push button switch to ground the Nano reset pin, rather than unplugging the wall wart to reset the device.

Pressing the push button resets the Nano, but the reset takes perhaps 3 seconds to complete. At first I didn't think the reset button was working, but it takes the same time if using a test lead to ground the pin.

I wonder if the timing is causing you a problem?

Paul

Thanks guys. I did a couple of things two days ago. One was to start the MPU6050 DMP demo running to see if it hangs. It is still running but perhaps needs several more days. The other was a small change to the pointing code that may shed some light on whether the code is still running but not functioning. First, let me say that I can supply the code; I attached a zip file. It is somewhat littered with debug prints.

So the code that feeds the dog, which runs once a second, looks like this:

    if ((long)(millis() - NextSend) > 0) {                                                        // Compute and send every second
      NextSend = millis() + MessageUpdateInterval;                                               // re-start timer
      float GravityMag = gravity.x * gravity.x + gravity.y * gravity.y + gravity.z * gravity.z;
      if ((GravityMag < 1.1)  && (GravityMag > 0.0) && (FIFO_OverflowCTR < 5)) wdt_reset();       // reset watchdog
      PrintClockDisplay(now());
      Serial.print("\t x= "); Serial.print(gravity.x);
      Serial.print(" y= ");Serial.print(gravity.y);
      Serial.print(" z= ");Serial.print(gravity.z);
      Serial.print(" Desired= ");Serial.print(degrees(ActuatorDesiredPosition[0]));
      Serial.print(" PA= ");Serial.print(degrees(GetPanelAngle(0)));
      Serial.print(" AT= ");Serial.print(degrees(AxisTilt));
      Serial.print(" Im= ");Serial.print((MotorCurrent));
      Serial.print(" PWM= ");Serial.print((PWMdutycycle));Serial.print(" MotorCurrentOffset= ");Serial.print((MotorCurrentOffset));
      Serial.print(" St= "); Serial.print(ActuatorStatus, BIN); Serial.print(" "); Serial.print(ActuatorStatus, HEX); Serial.print(" ");
      Serial.print(" IFX= "); Serial.print(IFX9201data, BIN); Serial.print(" "); Serial.print(IFX9201data, HEX); Serial.print(" ");
      if (bitRead(ActuatorStatus, Actuator0DriveEnabled)) Serial.print(" Actuator0DriveEnabled ");
      if (bitRead(ActuatorStatus, Actuator0DriveDirectionWest)) Serial.print(" Actuator0DriveDirectionWest ");
      if (bitRead(ActuatorStatus, Actuator0Timeout)) Serial.print(" Actuator0Timeout ");
      if (bitRead(ActuatorStatus, Actuator0OverCurrent)) Serial.print(" Actuator0OverCurrent ");
      if (bitRead(ActuatorStatus, Actuator1DriveEnabled)) Serial.print(" Actuator1DriveEnabled ");
      if (bitRead(ActuatorStatus, Actuator1DriveDirectionWest)) Serial.print(" Actuator1DriveDirectionWest ");
      if (bitRead(ActuatorStatus, Actuator1Timeout)) Serial.print(" Actuator1Timeout ");
      if (bitRead(ActuatorStatus, Actuator1OverCurrent)) Serial.print(" Actuator1OverCurrent ");
      Serial.println(" ");
    }
    else {
      //Serial.print("gravity.x=");Serial.print(gravity.x);
      //Serial.print(" gravity.y=");Serial.print(gravity.y);
      //Serial.print(" gravity.z=");Serial.println(gravity.z);
    }

The statement if ((GravityMag < 1.1)… checks whether the gravity vector looks sane and whether the MPU FIFO is not overflowing very often, and then feeds the dog. So the TX LED blinks once a second when the program is running. When the system hangs I observe that this has stopped. I also notice that the arrays are not sun pointing. Can you imagine a situation where the dog could be fed while the Serial.prints are not producing any serial output?
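The sanity test works because the DMP gravity vector is roughly a unit vector (assuming a normalized quaternion), so its squared magnitude should sit near 1. Here is the same test as a standalone function that can be tried on a PC; `gravityLooksSane` is a hypothetical name, and the explicit finiteness check is an addition for readability, since NaN would already fail both comparisons.

```cpp
#include <cmath>

// Same bounds as the sketch: 0 < x^2 + y^2 + z^2 < 1.1.
bool gravityLooksSane(float x, float y, float z) {
  if (!std::isfinite(x) || !std::isfinite(y) || !std::isfinite(z)) return false;
  const float magSq = x * x + y * y + z * z;
  return magSq > 0.0f && magSq < 1.1f;
}
```

Because a garbage or all-zero vector fails the test, a stuck or corrupted IMU read stops the feeding, which is the intended behaviour.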

I have a hard time believing that the processor could ignore the watchdog pulling the reset or that the watchdog would fail to timeout.

In the last two days, while I was out of town, none of the 4 testbeds crashed, although one of them did something funny that I can't wrap my mind around just yet.

_2015_05_29_06_48_Tracker_Node4.zip (16.8 KB)

I think I have reproduced the hang with the DMP example, since the example stopped printing after 3 or so days.

I have not installed the watchdog yet. I will let it fail one more time to be sure and then install the watchdog.

The code is attached. I only changed the MPU address and commented out the serial input at the start of the example.

MPU6050_DMP6.ino (16.1 KB)

There is a bug in some versions of the Nano bootloader which makes it incompatible with the watchdog timer reset.
See: http://forum.arduino.cc/index.php?topic=150419.0
I have solved this problem on one of my Nanos by installing the Uno Optiboot loader as suggested, with no side effects so far.
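One way to see why such a bootloader bug looks like a hang rather than a reset: after a watchdog reset the ATmega328P comes back with the watchdog still enabled at its shortest (nominally 16 ms) timeout. If the bootloader neither clears MCUSR nor disables the watchdog, and waits longer than that, it is reset again before the sketch ever runs. A rough model, assuming the nominal 128 kHz watchdog oscillator; `wdtTimeoutMs` and `bootloaderLoops` are hypothetical names, and the wait times are illustrative, not measured.

```cpp
#include <cstdint>

// Nominal watchdog timeout: prescaler p counts 2048 << p cycles of a ~128 kHz oscillator.
// p = 0 is the ~16 ms timeout in effect right after a watchdog reset.
uint32_t wdtTimeoutMs(uint8_t prescaler) {
  return (2048UL << prescaler) / 128UL;  // 128 cycles per millisecond
}

// Hypothetical check: a bootloader that leaves the watchdog running and waits
// at least one post-reset timeout never reaches the sketch.
bool bootloaderLoops(bool bootloaderDisablesWdt, uint32_t bootloaderWaitMs) {
  return !bootloaderDisablesWdt && bootloaderWaitMs >= wdtTimeoutMs(0);
}
```

This is consistent with the fix above: a bootloader that handles the watchdog itself breaks the loop.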