I have been chasing this problem for several weeks now. I have a solar tracker that uses a NANO, an RTC, an MPU-6050, and an IFX9201 motor controller. The system will run a day or so and then hang even though I use the standard avr/wdt.h. I am running optiboot on the NANO with an 8 second watchdog.
If I remove the watchdog call to wdt_reset() the system reboots every 8 seconds. The MPU is run by the I2CDEV library. I have removed all interrupts and just poll the MPU6050. I have stripped almost all of the functionality from the code and it still hangs.
Every second I print the RTC time and some pointing information so I know the system is alive. I have also used the watchdog.h library and placed a Serial.print(“Watchdog bite”) in the Watchdog ISR. I see occasional “Watchdog bite” messages but when the system hangs the last message is “Watchdog bite”.
There is a lot of code, but my setup looks like this:
void setup() {
wdt_disable(); // Disable watchdog
Serial.begin(115200); // initialize serial communication
Wire.begin(); // join I2C bus (I2Cdev library doesn't do this automatically)
SPI.begin();
pinMode(LED_PIN, OUTPUT); // configure LED for output
nBlinkLED(10); // startup blink
setupRTC();
setupPointing();
setupMPU6050();
wdt_enable(WDTO_8S); // 8 second watchdog
Serial.print("Init Done Rev 4.0 ");Serial.println(__DATE__);
}
My MPU code looks like this:
/**************************************************************************************************************************************************************/
/************************************************/ boolean GetMPUPacket(void) {
/**************************************************************************************************************************************************************/
// Checks if one or more packets are in the FIFO return true if a packet is read
mpuIntStatus = mpu.getIntStatus(); // get INT_STATUS byte bit0=DATA _RDY_INT bit3=I2C_MST _INT bit4=FIFO _OFLOW _INT
// if(mpuIntStatus !=3)Serial.println(mpuIntStatus,BIN);
fifoCount = mpu.getFIFOCount(); // get current FIFO count
if ((mpuIntStatus & 0x10) || fifoCount == 1024) { // check for overflow
mpu.resetFIFO(); // reset so we can continue cleanly
FIFO_OverflowCTR++; // bump ctr
//Serial.print(F(" FIFO_OverflowCTR= "));Serial.print(FIFO_OverflowCTR); // Print error
return(false);
}
if(fifoCount < packetSize) return(false);
mpu.getFIFOBytes(fifoBuffer, packetSize); // read a packet from FIFO // if(mpuIntStatus !=3)Serial.println(mpuIntStatus,BIN);
fifoCount = mpu.getFIFOCount();
FIFO_OverflowCTR = 0; // reset count; only successive failures are counted
//mpu.resetFIFO(); // reset so we can continue cleanly
return(true);
}
/**************************************************************************************************************************************************************/
I wasn’t quite sure where to post this. Any ideas welcome.
I thought the WDT could EITHER force a reset OR cause an interrupt. Sounds like the library uses the interrupt. Perhaps there is something going wrong with the interrupt handling?
Note: Doing Serial output in an ISR is generally a bad idea. In an ISR the interrupts are disabled and if the Serial buffer is too full for the message the system will hang waiting for the (disabled) serial interrupt to clear out the buffer.
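As a rough illustration of the note above, a common pattern is to have the ISR do nothing but set a flag and to do the printing later from loop(). This is only a sketch: the names watchdogBiteSeen, onWatchdogInterrupt, consumeWatchdogBite and reportWatchdogBite are made up for the example, and it assumes no other library in the sketch already defines the WDT_vect handler and that the watchdog interrupt (WDIE) has been enabled elsewhere.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <Arduino.h>
#include <avr/interrupt.h>
#endif

// Shared between interrupt context and loop(), hence volatile.
volatile bool watchdogBiteSeen = false;

// Work done inside the ISR: record the event only. No Serial, no I2C.
inline void onWatchdogInterrupt() {
  watchdogBiteSeen = true;
}

// Called from loop(): returns true once per recorded bite and clears the flag.
// A bite that arrives between the check and the clear is merged with the
// previous one, which is acceptable for a diagnostic message.
inline bool consumeWatchdogBite() {
  if (!watchdogBiteSeen) return false;
  watchdogBiteSeen = false;
  return true;
}

#if defined(__AVR__)
// Assumption: nothing else in the sketch defines this vector.
ISR(WDT_vect) {
  onWatchdogInterrupt();
}

// Hypothetical helper to call from loop(), where interrupts are enabled
// and the serial transmit buffer can drain normally.
void reportWatchdogBite() {
  if (consumeWatchdogBite()) {
    Serial.println(F("Watchdog bite"));
  }
}
#endif
```

With this arrangement the ISR returns in a few cycles, so a full serial buffer cannot stall it.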
Thanks for the reply and the suggestions. On the ISR with Serial output, that is a good point that I should have thought of. I have actually tried it both with the standard reset watchdog and then with the interrupt. When I tried the ISR I did that just to see if I was getting watchdog bites and then moved on.
My circuit board has output A6 connected to the NANO reset line and I set that to an output and then to zero in the watchdog ISR to force a reset. If the serial buffer got filled then the ISR could have gotten stuck in the watchdog bite print. My understanding of how the watchdog interrupt works is that you first get a watchdog interrupt and then get a reset on the next timeout.
So where I am now is I am just using the standard watchdog reset and I still get these hangs. I can’t imagine how the watchdog doesn’t reset the NANO. I have also printed the stack pointer and printed the free memory using <MemoryFree.h> utility. These values are always the same each pass thru the loop. I have a couple of these systems running so it isn’t something unique to a particular NANO.
I am also wondering if the NANO needs a power-on reset to get unstuck. It is easiest to interrupt the power and difficult to press the reset button when the unit hangs. I might test that idea on the next hang. I could also test whether connecting the serial port achieves a reset when it is stuck.
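One way to tell a power-on reset from a watchdog reset after the fact is to read the MCUSR reset flags at the top of setup(). The following is a minimal sketch, not something from the posted code: resetCauseName and printResetCause are hypothetical names, the bit positions are the ATmega328P ones, and some bootloaders clear MCUSR before the sketch starts, in which case this will just report "unknown".

```cpp
#include <stdint.h>

// Reset-flag masks in MCUSR on the ATmega328P (per the datasheet).
const uint8_t kPowerOnReset  = 1 << 0;  // PORF
const uint8_t kExternalReset = 1 << 1;  // EXTRF
const uint8_t kBrownOutReset = 1 << 2;  // BORF
const uint8_t kWatchdogReset = 1 << 3;  // WDRF

// Returns a label for the saved MCUSR value. Power-on is checked first
// because a power-up can set other flags (such as brown-out) as well.
inline const char* resetCauseName(uint8_t mcusr) {
  if (mcusr & kPowerOnReset)  return "power-on";
  if (mcusr & kWatchdogReset) return "watchdog";
  if (mcusr & kBrownOutReset) return "brown-out";
  if (mcusr & kExternalReset) return "external";
  return "unknown";
}

#if defined(__AVR__)
#include <Arduino.h>
#include <avr/io.h>

// Hypothetical helper: call in setup() after Serial.begin().
void printResetCause() {
  uint8_t flags = MCUSR;  // read the flags once
  MCUSR = 0;              // clear them so the next reset is reported cleanly
  Serial.print(F("Reset cause: "));
  Serial.println(resetCauseName(flags));
}
#endif
```

If the log after a hang-and-power-cycle never shows "watchdog" on the boots before it, that would suggest the watchdog never fired rather than fired and failed to restart the sketch.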
I am just using the standard watchdog reset and I still get these hangs
Define "standard watchdog reset" -- most likely you are doing something wrong. You can set up the watchdog in 3-4 lines of code without using any library.
Please post the minimal code that demonstrates this problem.
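For reference, a library-free watchdog setup on the ATmega328P could look roughly like the sketch below. It is only an illustration under stated assumptions: wdtcsrValue and startWatchdog8s are made-up names, the bit positions are taken from the ATmega328P datasheet, and the timed write sequence relies on the compiler emitting the second WDTCSR write within four clock cycles of the first, which normal optimization settings should do. The same helper also covers the interrupt-then-reset mode mentioned earlier in the thread (WDIE and WDE both set: the first timeout runs the ISR, the next one resets).

```cpp
#include <stdint.h>

// WDTCSR bit masks on the ATmega328P (per the datasheet).
const uint8_t kWdie = 1 << 6;  // watchdog interrupt enable
const uint8_t kWdp3 = 1 << 5;  // prescaler bit 3
const uint8_t kWde  = 1 << 3;  // system reset enable

// Builds the WDTCSR value for a 4-bit prescaler code
// (0 = about 16 ms ... 6 = about 1 s ... 9 = about 8 s).
inline uint8_t wdtcsrValue(uint8_t prescaler, bool resetOnTimeout,
                           bool interruptFirst) {
  uint8_t value = prescaler & 0x07;       // WDP2..WDP0 sit in bits 2..0
  if (prescaler & 0x08) value |= kWdp3;   // WDP3 sits apart, in bit 5
  if (resetOnTimeout)   value |= kWde;
  if (interruptFirst)   value |= kWdie;
  return value;
}

#if defined(__AVR__)
#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/wdt.h>

// Hypothetical helper: 8 second watchdog, reset only, no interrupt.
void startWatchdog8s() {
  const uint8_t value = wdtcsrValue(9, true, false);
  const uint8_t sreg = SREG;           // remember the interrupt state
  cli();
  wdt_reset();
  MCUSR &= ~(1 << WDRF);               // WDE cannot be changed while WDRF is set
  WDTCSR |= (1 << WDCE) | (1 << WDE);  // start the timed change sequence
  WDTCSR = value;                      // must follow within four clock cycles
  SREG = sreg;                         // restore the interrupt state
}
#endif
```

For the 8 second, reset-only case the helper yields 0x29; with the interrupt stage added it yields 0x69.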
Thanks for taking the time to try and help. By standard watchdog I mean I used the <avr/wdt.h> library. I use three functions: wdt_disable() as the first line in setup(), wdt_enable(WDTO_8S) as the last line in setup(), and wdt_reset().
I have not yet been able to isolate the code to the minimum that fails, as it can take one to three days of operation for it to hang. I have eliminated the code that used an nRF24 to transmit status to eliminate some potential failures. I have eliminated all interrupts and all while loops that might hang. There are no pointer operations or heap operations. The basic flow is: get the RTC time (now()), compute the sun location and set the desired panel angle, read the IMU accelerometer and compute the current angle, and turn the motor on east or west.
The I2C bus is used by the MPU and RTC libraries and the SPI bus is used to access the motor controller status byte that tells if an overcurrent or other motor fault has occurred.
I have been thinking about trying just the MPU example to see if this will run flawlessly for days however this uses interrupts and has a while loop that could potentially hang.
The main loop and service code is here (edited to fit). The MPU used to be in an ISR but is now just polled.
void loop() {
SunLocation(); // Compute Sun Position
if (Sun.Elevation > MinTrackingElevation) dmpDataReady(); // poll instead of interrupt
CheckForRTCTimeSet();
....
}
void dmpDataReady()
{
ISRLast = micros();
interrupts(); // Enable interrupts, otherwise I2C doesn't work
if(GetMPUPacket()) { // Read the IMU
mpu.dmpGetQuaternion(&q, fifoBuffer); // Need Quaternion for gravity
mpu.dmpGetGravity(&gravity, &q); // Get gravity vector
CheckDriveMotor(0); // Update Current Angle position and check when good data was obtained
ISRcounter++;
}
if ((long)(millis() - NextSend) > 0) { // Compute and send every second
NextSend = millis() + MessageUpdateInterval; // re-Start Timer
float GravityMag = gravity.x * gravity.x + gravity.y * gravity.y + gravity.z * gravity.z;
if ((GravityMag < 1.1) && (GravityMag > 0.0) && (FIFO_OverflowCTR < 5)) wdt_reset(); // reset watchdog
PrintClockDisplay(now());
}
else {
Serial.print("gravity.x=");Serial.print(gravity.x);
Serial.print(" gravity.y=");Serial.print(gravity.y);
Serial.print(" gravity.z=");Serial.println(gravity.z);
}
FIFO_OverflowCTR = 0;
ISR_micros = micros() - ISRLast;
maxISR_micros = max(maxISR_micros,ISR_micros);
}
I have run the code with the watchdog reset removed and it resets every time so it seems the watchdog is functioning. I did read a lot about the watchdog and I think I understood the need to reset it early in setup.
That is a real puzzler. You might try turning on the watchdog with the fuses, in case the watchdog is somehow being turned off. Try posting on the AVRFreaks forum -- they know all the tricks.
ispybadguys:
I have run the code with the watchdog reset removed and it resets every time so it seems the watchdog is functioning. I did read a lot about the watchdog and I think I understood the need to reset it early in setup.
Could it be that your code is still running the main loop that has the watchdog reset but something else has failed causing the sketch to stop functioning? No way to tell without looking at the full code.
My comments may or may not have any bearing on your problem.
I have an anemometer that uses a Nano. It uses an external interrupt to count the switch closures on the external anemometer device. An LCD display shows the 15 second average wind speed and the maximum wind speed. Each morning my wife records the values.
A while back I added a push button switch to ground the Nano reset pin, rather than unplugging the wall wart to reset the device.
Pressing the push button resets the Nano, but the reset takes perhaps 3 seconds to complete. At first I didn't think the reset button was working, but it takes the same time if using a test lead to ground the pin.
Thanks guys. I did a couple of things two days ago. One was to start the MPU6050 DMP demo running to see if it hangs. It is still running but perhaps needs several more days. The other was a little change to the pointing code that may shed some light on whether the code is still running but not functioning. First let me say that I can supply the code. I attached a zip file. It is somewhat littered with debug prints.
So the code that feeds the dog runs once a second and looks like this:
The statement if ((GravityMag < 1.1)… checks whether the gravity vector looks sane and whether the MPU FIFO is not spilling very often, and then feeds the dog. So the TX LED blinks once a second when the program is running. When the system hangs I observe that this has stopped. I also notice that the arrays are not pointing at the sun. Can you imagine a situation where the dog could be being fed and the Serial.prints are not resulting in serial data output?
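The feeding condition described above can be pulled out into a small function so it is easier to test on its own. This is just a sketch with assumptions: okToFeedWatchdog and the constant names are invented for the example, the thresholds are copied from the posted loop code, and the overflow counter is assumed to fit in a uint8_t. Note that the value being compared is the squared magnitude of the gravity vector.

```cpp
#include <stdint.h>

// Thresholds copied from the posted code.
const float   kMaxGravityMagSq  = 1.1f;  // upper bound on |g|^2
const uint8_t kMaxFifoOverflows = 5;     // overflow count that blocks feeding

// Returns true when the gravity vector looks sane and the FIFO has not
// overflowed too often, i.e. when the watchdog may be fed.
inline bool okToFeedWatchdog(float gx, float gy, float gz,
                             uint8_t fifoOverflows) {
  const float magSq = gx * gx + gy * gy + gz * gz;
  return (magSq > 0.0f) && (magSq < kMaxGravityMagSq) &&
         (fifoOverflows < kMaxFifoOverflows);
}

// Possible use in the once-a-second branch (names from the posted code):
//   if (okToFeedWatchdog(gravity.x, gravity.y, gravity.z, FIFO_OverflowCTR))
//     wdt_reset();
```

One thing worth checking in the posted loop: gravity only changes when GetMPUPacket() returns true, so if the MPU stops delivering packets the last good vector would presumably still pass this test and the dog would keep being fed.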
I have a hard time believing that the processor could ignore the watchdog pulling the reset or that the watchdog would fail to timeout.
In the last two days, while I was out of town, none of the 4 testbeds crashed, although one of them did something funny that I can’t wrap my mind around just yet.
There is a bug in some versions of the Nano bootloader that makes it incompatible with the Watchdog Timer Reset.
See: http://forum.arduino.cc/index.php?topic=150419.0
I have solved this problem on one of my Nanos by installing the Uno Optiboot loader as suggested, with no side effects so far.