Bizarre ISR behavior, can't explain and need help

I have started having some pretty weird issues with my interrupt service routines. I have simplified the code so I can test for solutions without my entire program, but the problem persists. In a nutshell, I am putting the MCU into power-down mode, setting a watchdog timer for 2 seconds, and waking the device. In the real world I'm checking a sensor, but in this simplified code I am just printing to the serial monitor. Long story short: sometimes when it wakes up it throws some random characters, sometimes it crashes catastrophically, sometimes it just hangs forever, and sometimes it resets. Incidentally, it works perfectly when the sleep mode is standby; power-down mode is where the issue is, and until 3 days ago that was working fine too. I don't know what I changed, and my backup isn't working any better. Somewhere I am making a simple and obvious mistake, so obvious that I simply cannot see it.

I have been staring at it for 3 days and trying every trick I know. Perhaps some fresh eyes can see the probably simple error I have made. I'm going to go ahead and answer the usual questions to save us all some time. Thank you all for any help that you can give.

Why am I putting it into power down mode? Because that's what I need to do, it's a power issue.

Why am I using a watchdog timer? I have an RTC as well as a button. The button interrupt works fine, but the RTC interrupt is having the exact same issue as the WDT. My thought is that if I can figure out what is wrong with the WDT, then I can solve the RTC issue as well.

Can you post the full project code? Believe me you don't want to sift through 1200 lines of code... Also the problem remains persistent in the simplified code so it is extremely probable that the error is inside the simplified code I have attached.

Why are you disabling the WDT after interrupt? It may take more than 2 seconds after waking up depending on what the sensor detects and I don't want it interrupting continuously during that process. The interrupt is only for waking the device from power down mode.

volatile uint8_t InterruptFlag=0;
char cmd=0;
int duh=0;

void setup() {
  Serial.begin(9600);
  delay(500);
  Serial.println("Starting");    
}


ISR(WDT_vect){
  InterruptFlag=1;
}


void loop() {
  cmd=0;
  Serial.print("test number: ");Serial.println(duh++);
  if(InterruptFlag==1){
    delay(50);
    Serial.println("watch dog");
    Serial.print("MCUSR: ");Serial.println(MCUSR,BIN);
    InterruptFlag=0;
    WDTCSR=24;                //  set WDCE and WDE to unlock the register
    WDTCSR=0;                 //  clear everything: watchdog off
  }
  delay(1000);
  if(Serial.available()){
    cmd=Serial.read();
    if(cmd=='1'){
      GoToSleep();
    }
  }
}

void GoToSleep(){
  ADCSRA &= ~(1 << 7);                        //  TURN OFF ADC CONVERTER

  
  WDTCSR = (24);            //  Set WDCE and WDE: unlocks WDE and the prescaler bits for 4 clock cycles
  WDTCSR = (7);             //  Prescaler bits only (about 2 s); this write clears WDE and WDCE
  WDTCSR |= (1<<6);         //  Set WDIE: enable watchdog interrupt mode

  //SMCR |=(3<<2);                        //  Standby Mode
  SMCR |= (1 << 2);                       //  power down mode
  SMCR |= 1;                              //  enable sleep
  MCUCR |= (3 << 5);                      //  set both BODS and BODSE at the same time
  MCUCR = (MCUCR & ~(1 << 5)) | (1 << 6); //  then set the BODS bit and clear the BODSE bit at the same time
  __asm__  __volatile__("sleep");         //  In line assembler to go to sleep
  //SMCR &= ~1;                             //  would clear the sleep enable bit after waking
  ADCSRA |= (1 << 7);                     //  TURN ON ADC CONVERTER
}

Thank you all for any guidance or assistance you can give.

Please edit your post to correct the tags from "quote" to "code".

For outstanding tutorials on sleep modes, timers, etc. see https://www.gammon.com.au/power

Note that printing is interrupt driven. If the processor goes to sleep before printing is done, results are unpredictable. Use Serial.flush() to finish printing before setting sleep mode.
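
To put a number on the flush advice, here is a minimal sketch, assuming the default 8N1 framing that `Serial.begin(9600)` selects (10 bits on the wire per byte). `txDrainMicros` and `flushThenPowerDown` are hypothetical helper names, not part of any library. The first estimates how long buffered characters need to leave the UART. The second shows one way to wait for that before sleeping, using the avr-libc sleep helpers rather than the raw register writes in the posted code.

```cpp
#include <stdint.h>

// Time (in microseconds) for `bytes` buffered characters to leave the UART,
// assuming 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
constexpr uint32_t txDrainMicros(uint32_t bytes, uint32_t baud) {
  return static_cast<uint32_t>(static_cast<uint64_t>(bytes) * 10u * 1000000u / baud);
}

// Roughly 1 ms per character at 9600 baud.
static_assert(txDrainMicros(1, 9600) == 1041, "one byte at 9600 baud");

#if defined(ARDUINO) && defined(__AVR__)
#include <Arduino.h>
#include <avr/sleep.h>

// Hypothetical helper: let the transmit buffer drain, then enter power-down.
void flushThenPowerDown() {
  Serial.flush();                       // blocks until outgoing data has been sent
  set_sleep_mode(SLEEP_MODE_PWR_DOWN);
  sleep_enable();
  sleep_cpu();                          // execution resumes here after a wake-up interrupt
  sleep_disable();
}
#endif
```

A line such as "test number: 12" plus the line ending is 17 characters, so it needs close to 18 ms to go out at 9600 baud, while `Serial.println()` returns as soon as the characters are buffered. Sleeping inside that window can cut a character off mid-frame, which would be one possible source of garbage on the monitor.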

I corrected my post to use code tags instead of quote tags. I am going through Nick Gammon's tutorial and not finding a solution. I even tried simply copying and pasting his power-down sleep example. This time, when I used the button interrupt, it restarted as well. My first thought is that it is somehow using the default ISR, but I truly can't see how that is happening. This is killing me.

I use Nick Gammon's sleep code all the time and have no problems. So I wonder if you have a hardware issue; possibly intermittent connections.

Which Arduino?

What is connected to it? Post a wiring diagram.

Are all connections soldered or otherwise reliable?

I am using a custom build and a custom board with the ATmega1284P running a modified DualOptiboot bootloader from LowPowerLab. The only modifications made to the bootloader were which pin the LED is on and which pin the flash memory is on. Of course, for this experiment I simplified everything, so the flash isn't hooked up or accessed in the software, and I am not doing any over-the-air programming, so no attempt is made to access the flash. I agree it seemed like one of my connections might have come loose, so I dismantled everything and reassembled it, and the problem persists. I couldn't help but wonder how a hardware connection could be causing issues with the internal watchdog timer, but stranger things have happened, so I agree with the thought.

Now combining an even simpler circuit with the simple sleep and wake commands, the problem persists. I did notice something this time, though, that had previously gone unnoticed. There are no issues if I configure the watchdog timer in setup() and never alter it. Any attempt to change the WDT bits afterwards now triggers a system reset. This persists when I use Nick Gammon's walkthrough as well. This makes zero sense. The datasheet describes the process very clearly:

//  To enable:
WDTCSR = 0b00011000;  // sets the change enable (WDCE) bit and the WDE bit (00011000 = 24)
WDTCSR = 0b00000111;  // sets the prescaler to 2 seconds and clears the WDE bit (00000111 = 7)
WDTCSR = 0b01000111;  // enables the interrupt (same as WDTCSR |= (1<<6);)

//  Similarly, to disable:
WDTCSR = 0b00011000;  // sets the change enable (WDCE) bit and the WDE bit (00011000 = 24)
WDTCSR = 0b00000111;  // keeps the 2 second prescaler and clears the WDE bit (00000111 = 7)
WDTCSR = 0b00000111;  // should leave it disabled
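
To make the bit patterns in those sequences easier to check, here is a small sketch that builds the same values from named bits. The bit positions follow the WDTCSR register description in the ATmega1284P datasheet. The `WDT_` prefixed names, `wdtcsrValue` and `kWdtChangeEnable` are hypothetical and exist only for this illustration. It checks what the values mean, not why the chip resets.

```cpp
#include <stdint.h>

// WDTCSR bit positions from the ATmega1284P datasheet.
// The WDT_ prefix only avoids clashing with the macros in avr/io.h.
enum : uint8_t {
  WDT_WDP0 = 0, WDT_WDP1 = 1, WDT_WDP2 = 2, WDT_WDE = 3,
  WDT_WDCE = 4, WDT_WDP3 = 5, WDT_WDIE = 6, WDT_WDIF = 7
};

// Build a WDTCSR value. `prescaler` is the 4-bit WDP3..WDP0 code
// (7 is about 2 s, 9 is about 8 s). Note that WDP3 sits at bit 5.
constexpr uint8_t wdtcsrValue(bool interruptMode, bool resetMode, uint8_t prescaler) {
  return static_cast<uint8_t>(
      (interruptMode ? (1u << WDT_WDIE) : 0u) |
      (resetMode ? (1u << WDT_WDE) : 0u) |
      ((prescaler & 0x08u) ? (1u << WDT_WDP3) : 0u) |
      (prescaler & 0x07u));
}

// The unlock value written first in both sequences.
constexpr uint8_t kWdtChangeEnable = (1u << WDT_WDCE) | (1u << WDT_WDE);

static_assert(kWdtChangeEnable == 24, "WDCE | WDE");
static_assert(wdtcsrValue(true, false, 7) == 0b01000111, "interrupt only, about 2 s");
static_assert(wdtcsrValue(false, false, 7) == 0b00000111, "stopped, prescaler bits left at 7");
static_assert(wdtcsrValue(false, false, 0) == 0, "stopped, all bits clear");
```

With WDIE and WDE both clear, the watchdog is stopped regardless of the prescaler bits, so the two disable variants in this thread (ending in 7 or in 0) describe the same state.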

For the latest experiment I literally used the exact connections of the Moteino Mega, and it still resets when trying to wake up. I apologize that I can't attach the files, as there is a size limit, but I am including links to the datasheet and the schematic from LowPowerLab.

Schematics

Datasheet

I apologize if you think I should have posted to the Moteino forums, but the Arduino forums seemed to have a lot more information on problems with watchdog timers. The strangest part of the whole thing is that it was working perfectly three days ago. I thought perhaps I had a RAM error, or some other section of code was causing the problem, or perhaps something went wrong in the hardware. Unfortunately it doesn't seem to matter how much I simplify it; the problem persists. I am tempted to remove the MCU and mount a new one, but if I don't figure out what caused this problem in the first place, I could find myself with a lot of sensors becoming non-operational after they are deployed. I am really scratching my head here.

Charlie

New information as it comes up. The timer now appears to be working correctly, both enabling and disabling the WDT, but only in standby mode. Somehow the combination with power-down is causing the WDT to initiate a reset. I just have to figure out why. For some reason I am leaning towards it being a timing issue. Perhaps the WDT isn't getting enough time before the chip powers down. I truly do not know.

Tomorrow morning I am going to remove the MCU and place a new one on the board. I can't be certain, but all of my problems started when I was testing the reset function for the over-the-air programming. I was using the WDT to implement the reset as well. I had missed where the datasheet says to use a watchdog interrupt followed by a watchdog reset, so that critical system parameters can be saved before shutdown. I have no other theories at this point except that when I did the reset without the interrupt, something in the chip was damaged. I have completely wiped all memory on the chip, reburned the bootloader, and tried everything else. I will post if this solves the issue, but I would still like feedback on the theory. Is it possible that resetting without first interrupting could have damaged the chip? I don't want devices in the field that stop functioning correctly, and if I don't resolve this problem properly it may very well pop back up when more is at stake than my sunny disposition :).

Thank you all for everything.

Charlie

No dice with replacing the mcu. Still the exact same error. Does anyone have any other thoughts?

Don't really know the answer to your issue either, but since you requested thoughts:

  • In section 11.3.2 of the (updated) datasheet, the examples show the watchdog timer being reset first. When WDCE and WDE are set, the examples explicitly preserve the prescaler bit values; your code clears them.
  • Likewise in that section, you can see how interrupts are being turned off for the duration of the timed sequence. This is important so that the changes can be guaranteed to be executed within the 4 cycle time. Not doing so could introduce indeterminate behaviour which could make testing and narrowing down the problem more difficult. You need to be scrupulous to get a fix on the issue.
  • You said: "sometimes when it wakes up it throws some random characters, sometimes it crashes catastrophically, sometimes it just hangs forever, and sometimes it resets". The first and last are obvious, but how are you determining what a "catastrophic crash" or a "hang" is? And what's the difference? Have you tried flashing the on-board LED periodically just to ensure that there is not just some problem with the Serial output? As a separate test, toggle the LED inside the interrupt handler.
  • You said: this time when I used the button interrupt it restarted as well. Are you absolutely sure that you defined your ISR correctly - no spelling mistakes etc.? Correct interrupt enabled? If you get that wrong there will be no compiler warning but the default action will likely be a reset. If it worked in the standby-sleep then fair enough, but it's always worth posting the code that you test with since it allows others to see potential mistakes, test with it themselves and even sometimes just helps trigger a thought.
  • I don't see any errata for the device, but it could be that you have stumbled upon a 1284P bug. That seems a bit unlikely considering the functional similarity among other AVR devices, but who knows.
  • You asked: "is it possible that reseting without first interrupting could have damaged the chip". Seems extremely unlikely. Since you've swapped out the MCU now, it looks much less likely to be a fault with your particular physical device.
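
The first two points can be sketched roughly as follows: interrupts off for the timed sequence, a watchdog reset first, and a read-modify-write on the unlock step so the other bits are preserved. This is a minimal sketch rather than a confirmed fix for the resets in this thread. `wdtStartInterruptOnly2s` and `wdtStop` are hypothetical names. The non-AVR branch contains stand-in variables and stubs (an assumption, purely so the logic can be compiled and stepped through on a PC). Clearing WDRF is included because the datasheet states that WDE stays forced on while WDRF is set in MCUSR.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/wdt.h>
#else
// Host stand-ins (assumption): plain variables and no-op stubs.
// They model the register contents only, not the 4-cycle timing.
static uint8_t SREG = 0, MCUSR = 0, WDTCSR = 0;
enum { WDP0 = 0, WDP1 = 1, WDP2 = 2, WDE = 3, WDCE = 4, WDIE = 6, WDRF = 3 };
static inline void cli() {}
static inline void wdt_reset() {}
#endif

// Start the watchdog in interrupt-only mode with a timeout of about 2 s.
void wdtStartInterruptOnly2s() {
  const uint8_t savedSreg = SREG;        // remember the interrupt enable state
  cli();                                 // nothing may interrupt the timed sequence
  wdt_reset();                           // restart the counter before changing the timeout
  MCUSR &= ~(1 << WDRF);                 // WDRF forces WDE on, so clear it first
  WDTCSR |= (1 << WDCE) | (1 << WDE);    // unlock while preserving the other bits
  WDTCSR = (1 << WDIE) | (1 << WDP2) | (1 << WDP1) | (1 << WDP0);  // within 4 cycles
  SREG = savedSreg;                      // restore the previous interrupt state
}

// Stop the watchdog completely.
void wdtStop() {
  const uint8_t savedSreg = SREG;
  cli();
  wdt_reset();
  MCUSR &= ~(1 << WDRF);
  WDTCSR |= (1 << WDCE) | (1 << WDE);
  WDTCSR = 0;                            // WDE, WDIE and prescaler bits all cleared
  SREG = savedSreg;
}
```

On the target, a sketch would call `wdtStartInterruptOnly2s()` just before sleeping and `wdtStop()` after waking, with an `ISR(WDT_vect)` defined as in the simplified code at the top of the thread.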

Hello arduarn,

I wanted to take a moment to thank you for your input and assistance. When you sent over that updated datasheet, I really thought I understood the problem almost right away, given some slight inconsistencies between the two datasheets, in particular the setting of WDTON in the fuses. Unfortunately that didn't resolve the issue. Your input about maintaining the prescaler bits seemed spot on too, but applying it gave the same results. Your insight was very helpful in that it led me down a slightly different line of thought. Alas, I did not find a way to solve the resetting issue, but I did find a way to use it to my advantage.

Also, the button resetting had something to do with the AVRISP mkII being plugged into the board??? That won't matter, as it is unlikely that I will forget to unplug my programmer before shipping out the device. Sure, stranger things have happened, but I think I'll be fine there :).

In any event, given that the watchdog runs perfectly if it is initialized in setup(), never gets turned off, and seems to prefer to reset, I figured why not use that as a critical-system failsafe (one of its intended purposes, anyway). I solved the other interrupt issues, so I use the RTC for wakeups at 2 second intervals, and if the watchdog isn't reset within 8 seconds then the system resets. I am including this code as it may help someone else in the future who happens to be using the same MCU and RTC and needs the same general fix. Thank you for your assistance, and a general thanks to everyone for their feedback.

Sincerely,
Charlie

//  RTC TIMER CODE SETUP  /////////////////////////////////////////////
#include <Wire.h>
#include <Rtc_Pcf8563.h>
Rtc_Pcf8563 rtc;
#include <avr/wdt.h>
////////////////////////////////////////////////////////////////////////

/* a flag for the interrupt */
volatile int alarm_flag=0;
int sleeptime=2;  // sleeptime in seconds

//  INTERRUPT VARIABLES
//  Enable sbi to change bits easily
#ifndef cbi     //  CLEAR BIT FUNCTION
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#endif
#ifndef sbi     //  SET BIT FUNCTION
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))
#endif


void setup() {
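  WDTCSR|=24;           //  set WDCE and WDE to unlock the register
  WDTCSR=0b01101001;    //  WDIE | WDP3 | WDE | WDP0: interrupt and reset enabled, about 8 s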
  Serial.begin(9600);
  Serial.println("---------------------------------------------------");
  Serial.println("Starting");
  Serial.println("---------------------------------------------------");
  while(!Serial.available()){}

  pinMode(12, INPUT);           // set pin to input
  digitalWrite(12, HIGH);       // turn on pullup resistors
  pinMode(A7,INPUT);
  
  //  RTC CODE  //////////////////////////////////////////////////////
  rtc.initClock();
  //day, weekday, month, century(1=1900, 0=2000), year(0-99)
  rtc.setDate(12, 99, 3, 0, 19);
  //hr, min, sec
  String TIME=__TIME__;
  Serial.println(TIME);
  byte timec[]={8,27,0};
  rtc.setTime(timec[0],timec[1],timec[2]);
  ////////////////////////////////////////////////////////////////////////
}

ISR(PCINT0_vect){
  alarm_flag=1;
}

ISR(PCINT3_vect){
  alarm_flag=3;
}

void loop() {
  wdt_reset();
  if(alarm_flag==1){
    Serial.println("Button Pushed");
    while(1==1){}             //  test whether a reset occurs when the watchdog is not reset
    delay(1000);
  }else if(alarm_flag==3){
    Serial.println("Clock Interrupt");
  }
  if(alarm_flag!=0){
    rtc.clearTimer();
    alarm_flag=0;
    PCICR=0;
    cbi (PCMSK0,PCINT7);
    cbi (PCMSK3,PCINT28);
  }
  Serial.print(rtc.formatTime());
  Serial.print(" ");
  Serial.println(rtc.formatDate());
  delay(250);
  

  gotosleep();
}


void gotosleep(){
  PCICR|=0b1001; // enables pinchange interrupt vectors 3 and 0
  sbi (PCMSK0,PCINT7); //  SPECIFY WHICH PIN
  sbi (PCMSK3,PCINT28); //  SPECIFY WHICH PIN

//  sbi (PCICR,PCIE2);  //  ENABLE INTERRUPT REGISTER
//  sbi (PCMSK2,PCINT22); //  SPECIFY WHICH PIN
  rtc.setTimer(sleeptime,10,false);

  ADCSRA &= ~(1 << 7);  //  TURN OFF ADC CONVERTER

  //SMCR |=(3<<2);      //Standby Mode
  SMCR |= (1 << 2); //power down mode
  SMCR |= 1;//enable sleep
  MCUCR |= (3 << 5); //set both BODS and BODSE at the same time
  MCUCR = (MCUCR & ~(1 << 5)) | (1 << 6); //then set the BODS bit and clear the BODSE bit at the same time
  __asm__  __volatile__("sleep");//in line assembler to go to sleep
  SMCR &= ~1;           //  clear the sleep enable bit after waking
    
  ADCSRA |= (1 << 7); //  TURN ON ADC CONVERTER
}

PS: For the sake of better communication: when I said "catastrophic crash", it was vomiting characters to the serial line nonstop at an incredible rate. It seemed like it may have been printing the entire contents of its flash memory directly to the serial port. Very bizarre. In any event, thank you sincerely for your assistance.
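
For anyone reusing the setup() above, it may help to decode the value written to WDTCSR there. The sketch below is a minimal decoder, assuming the WDTCSR bit layout from the ATmega1284P datasheet and the nominal 128 kHz watchdog oscillator. `WdtConfig` and `decodeWdtcsr` are hypothetical names introduced for illustration.

```cpp
#include <stdint.h>

// Decoded view of a WDTCSR value (hypothetical helper type).
struct WdtConfig {
  bool interruptEnabled;  // WDIE, bit 6
  bool resetEnabled;      // WDE, bit 3
  uint8_t prescaler;      // WDP3..WDP0 as a 4-bit code (0 to 9 are defined)
  uint32_t nominalMs;     // nominal timeout in milliseconds
};

WdtConfig decodeWdtcsr(uint8_t value) {
  WdtConfig cfg;
  cfg.interruptEnabled = (value & (1u << 6)) != 0;
  cfg.resetEnabled = (value & (1u << 3)) != 0;
  // WDP3 lives at bit 5, WDP2..WDP0 at bits 2..0.
  cfg.prescaler = static_cast<uint8_t>((((value >> 5) & 0x01u) << 3) | (value & 0x07u));
  // 2048 cycles of the 128 kHz oscillator at code 0 (16 ms), doubling per step.
  cfg.nominalMs = 16ul << cfg.prescaler;
  return cfg;
}
```

Decoding `0b01101001` gives WDIE set, WDE set and prescaler code 9, a nominal timeout of about 8 seconds. According to the datasheet, with both WDIE and WDE set the first timeout raises the watchdog interrupt and only the following timeout resets the chip. The posted sketch defines no `ISR(WDT_vect)`, so with avr-libc that first timeout would land in the default handler, which jumps back to the reset vector. If a plain hardware reset is the goal, a reset-only setting (WDE without WDIE, for example avr-libc's `wdt_enable(WDTO_8S)`) may be worth trying.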