Program resets via watchog - Help finding the bug!

Hi,

My home automation controller has been evolving in the past 5 years and I am now migrating from a MEGA to a zero based system.

After migration i have also added a watchdog function based on the sdtZero library.
For some reason the program resets about once every 24h and after ages of debugging I was able to norrow down the problem in the following function used to log events on an SD card.
So, in other words making EEPROM.read(memLog)=0, the program runs for days with no problem (just like it did on a Mega), otherwise it resets about once every 24h (random times)

bool logSDCard(char *logData, byte addTime){
   
  // addTine ==2 -> include time  
  // addTine ==0 -> no time add new line
  // addTine ==1 -> no time no new line
  
  // show line local & telnet
 
 // Log to SD card
// exit if logging is disabled or Card not valid
 
if (SDCARDValid==0 || EEPROM.read(memLog)<1 ) { return false; }        // logging disabled
 selectSDcard();
 File dataFile = SD.open(filename, FILE_WRITE);
 
 if (dataFile) {                                                           // if the file is available, write to it:
     
    if (addTime==2 || addTime==0) {dataFile.println();}
    if (addTime==2) {dataFile.print(timenow());}
           
     dataFile.print (logData); 
     dataFile.close();
    selectEthernet();
    return true;
   }  
 
  else {                               // if the file isn't open, pop up an error:
    digitalWrite (ErrorLed,HIGH);
    printMessage(PORT_S,5);
    dataFile.close();
    selectEthernet();
    return false;
   }
}

Of course this is not thew complete program which is huge anyway to post here or for anyone to follow, but I thought I d ask for any tips on finding the bug anyway.

Probably one of the used functions is the problem: selectSDcard(), timenow(), selectEthernet(), printMessage(), or logData is missing a zero-terminator.

If opening a new file fails, then you don't have to close it.

Could you try to make the code look better ? Put every space, every comma, every new line at the right place. Use the same style throughout the whole sketch.

Can't comment on the logic but do you meant

if ( (SDCARDValid==0 || EEPROM.read(memLog))<1 )

or

if (SDCARDValid==0 || (EEPROM.read(memLog)<1))

Notice the placement of parentheses

I believe the way written is fine and also compiles ok.
It means if either no sd card found OR logging disabled then return.

@hzrnbgy, it is two conditions with a 'or' in between, so it is okay.

if( wind == 0 || sunshine < 1)
{
  // no sailing in the sunshine today
}

You could try using my SafeString library which works on char[ ]s checks for and guards against missing '\0' and buffer overflows.

Here is the code of the above functions:


void selectEthernet(){
   
   ledflash=1;
   digitalWrite(SD_CARD_CD_DIO,HIGH);
   delayMicroseconds(20);
   digitalWrite(ethernetSelectPin,LOW);
   delayMicroseconds(20);
}

void selectSDcard(){

   digitalWrite(ethernetSelectPin,HIGH);
   delayMicroseconds(20);
   digitalWrite(SD_CARD_CD_DIO,LOW);
   delayMicroseconds(20); 
 }



void printMessage(Print& p, byte No)
{

 switch (No) {
    case 1 : p.println(F("Unrecognized command. ? for help."))    ; break;      
    case 2 : p.println(F("Aborted."));                             ; break;
    case 3 : p.println(F("Invalid entry. Aborted."))               ; break;     
    case 4 : p.print  (F("Invalid entry. Error Code:"))            ; break;     
    case 5 : p.println(F("*Error Openning file or file not found*")); break; 
    case 6 : p.println(F("Card failed, or not present"))           ; break;
    case 7 : p.println(F("No NTP Response :-("))                   ; break;
    case 8 : p.println(F("Missing or invalid parameter"))          ; break;
    case 9 : p.print(F("Error- Subscript out of range "))         ; break;
    case 10: p.println(F("SD Card OK"))                            ; break;
    case 11: p.println(F("Not changed"))                           ; break;
    case 12: p.println(F("Saved"))                                 ; break;
    case 13: p.println(F(" enabled"))                              ; break;
    case 14: p.println(F(" disabled"))                             ; break;
    case 15: p.println(F(" End of file"))                          ; break;
    case 16: p.println(F("Not set"))                               ; break;
    case 17: p.print (F("New ID: "))                               ;break;
    case 18: p.println(F("OK"))                                    ;break; 
    case 19: p.println(F(" on/off"))                               ;break;
    case 20: p.println(F(" deleted"))                              ;break;
    
   default : p.println(F("Unrecognized command. ? for help."))   ; break;  }


char *timenow() {
  
 char *timenow=(char *) malloc (25);
 timenow[0]=0;
 
 if (RTCValid==true) {
  byte second, minute, hour, dayOfMonth, month, year; 
  readDS3231time(&second, &minute, &hour, &currentDay, &dayOfMonth, &month, &year);  // retrieve data from DS3231
  sprintf(timenow,"%02d:%02d:%02d %02d-%02d-%d ",hour,minute,second,dayOfMonth,month,year);
  sprintf(filename,"%02d-%02d-%d.txt",dayOfMonth,month,year);
  }
 
 if (RTCValid==false) {
  uint16_t yr = year();
  if (yr<2010) yr=2010;
  sprintf(timenow,"%02d:%02d:%02d %02d-%02d-%d ",hour(),minute(),second(),day(),month(),year());
  sprintf(filename,"%02d-%02d-%d.txt",int16_t(day()),int16_t(month()),byte(yr-2000));
  currentDay=byte(day());
  }
 
return timenow;
}

}

what is the timeout of the watchdog?

1 Like

Every time the function runs, you are allocating 25 bytes of memory which you never free afterwards, eventually you will run out of memory.
< edit > The variable year in the first if statement should probably not be byte, unless year is always 255 or less.

char *timenow() {

  char *timenow = (char *) malloc (25);
  timenow[0] = 0;

  if (RTCValid == true) {
    byte second, minute, hour, dayOfMonth, month, year;
    readDS3231time(&second, &minute, &hour, &currentDay, &dayOfMonth, &month, &year);  // retrieve data from DS3231
    sprintf(timenow, "%02d:%02d:%02d %02d-%02d-%d ", hour, minute, second, dayOfMonth, month, year);
    sprintf(filename, "%02d-%02d-%d.txt", dayOfMonth, month, year);
  }

  if (RTCValid == false) {
    uint16_t yr = year();
    if (yr < 2010) yr = 2010;
    sprintf(timenow, "%02d:%02d:%02d %02d-%02d-%d ", hour(), minute(), second(), day(), month(), year());
    sprintf(filename, "%02d-%02d-%d.txt", int16_t(day()), int16_t(month()), byte(yr - 2000));
    currentDay = byte(day());
  }

  return timenow;
}

Try this instead.

  static char timenow[25];

And use snprintf which protects against buffer overflows

static char timenow[25];
. . 
snprintf(timenow, sizeof(timenow),"%02d:%02d:%02d %02d-%02d-%d ", hour, minute, second, dayOfMonth, month, year);
 snprintf(filename, sizeof(filename),"%02d-%02d-%d.txt", dayOfMonth, month, year);

Thanks for the advice. Will make the changes and try again.

Are the ethernetSelectPin and SD_CARD_CD_DIO the "SS" or "ChipSelect" pins of the SPI bus ? Then you don't have to do that.

A sketch of more than 6000 lines is long. Can you show us the sketch ? We could ask 1200 times and get 5 lines each time, but that will take longer than a year :roll_eyes:

It is possible to do something wrong with arrays or timing, without getting into problems. Running the same sketch on a different board might bring those things to the surface.

Yes, Those pins chose the SPI device.
What you mean i dont have to do it?
How else can it be done?

It is part of the SPI interface and it is inside the libraries.
Do you know a sketch that uses both Ethernet and the SD card that has to select the SPI ChipSelect in the sketch ?

Thanks for pointing this out!
I am using two separate shields for SD card and ethernet .
How do you define then two CS pins to be compatible with the SPI lib?

I'm sorry, but I have to bail out. I don't want to be more negative than my Reply #12. I could talk about the parameter in SD.begin() and the definition of SS in the Ethernet library and maybe we get another 5 lines of code, which again raises a new question.

Thanks again for the info.

Well, the sketch is in excess of 5000 lines and spans in 21 tabs (files).

I just thought that no one would ever spend the time to go through it and that’s why I firstly narrowed down the bug within a particular function before asking for help in the forum.

If however you are still willing to make the effort to read it, I would be more than grateful and will post the full sketch, zipped.