Watchdog reset while downloading/saving SPIFFS (ESP32)

Hi all,
I'm struggling with a watchdog reset.

What I'm trying to do, is to download a FW image (self OTA) using ftp client.
I chosed FTP, since I already are using it within the same project, for downloading a smaller file (~150kb) and it has been working great.

Now, when I download the ESP binary, which is larger, it always gives me a watchdog reset.

17:32:53.899 -> Downloading tester_fw.bin, size 1020688
17:35:12.820 -> ets Jul 29 2019 12:21:46
17:35:12.820 -> 
17:35:12.820 -> rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
17:35:12.820 -> configsip: 0, SPIWP:0xee
17:35:12.820 -> clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
17:35:12.820 -> mode:DIO, clock div:1
17:35:12.820 -> load:0x3fff0030,len:1344
17:35:12.820 -> load:0x40078000,len:13964
17:35:12.898 -> load:0x40080400,len:3600
17:35:12.898 -> entry 0x400805f0

I'm calling the download from setup(), have tried also from loop, but it does not seem to make much difference.

The loop in the ftp client, looks like this;

  while(dclient.available()) 
  {
    esp_task_wdt_reset();  
    delay(50);
    if( length >= 512)
    {
      dclient.readBytes(_buf, 512);

      if(file.write(_buf, 512) == 512){
        //FTPdbgn(".");
      }
      length -= 512;
	  }
    else
    {
      dclient.readBytes(_buf, length);

      if(file.write(_buf, length) == length){
        //FTPdbgn(".");
      }
      length = 0;
    }
  }

The esp_task_wdt_reset() I threw in, but it does not seem to make any difference - perhaps this is not the same watchdog(?), if I understand correctly, there are multiple watchdogs in esp32.

I then inserted the delay(50), which makes a difference - without it, the watchdog comes already after around 40 seconds.

I am now thinking, that it is the filesystem that is not happy.
Edit: if I remove the file write line, it only takes about 3 seconds to download, and no watchdog time (which makes sense, of course).

I hope someone can say "but you forgot to ..." - I'm at loss.

This is the complete ftp function for reference;

void ESP32_FTPClient::DownloadFileFS(fs::FS &fs, const char * filename, size_t length) {
  FTPdbgn("Send RETR");
  if(!isConnected()) return;
  client.print(F("RETR "));
  client.println(F(filename));
  
  char _resp[ sizeof(outBuf) ];
  GetFTPAnswer(_resp);

  uint8_t _buf[512];
  unsigned long _m = millis();
  String _filename = String("/") + String(F(filename));
  
  File file = fs.open(_filename.c_str(), FILE_WRITE);
  if (!file)
  {
    Serial.println("Failed to open file for writing");
    return;
  }

  while( !dclient.available() && millis() < _m + timeout) delay(1);

  while(dclient.available()) 
  {
    esp_task_wdt_reset();  
    delay(50);
    if( length >= 512)
    {
      dclient.readBytes(_buf, 512);

      if(file.write(_buf, 512) == 512){
        //FTPdbgn(".");
      }
      length -= 512;
	  }
    else
    {
      dclient.readBytes(_buf, length);

      if(file.write(_buf, length) == length){
        //FTPdbgn(".");
      }
      length = 0;
    }
  }
  file.close();
}

Any help/suggestions appreciated!

How do you know that spiffs works with larger images?

For me is a fail if you need files larger than 4kb.

I tested with this sketch on a xiao seeed studio s3.

#include "SPIFFS.h"
#ifndef DEBUG        // Do not modify.
#define DEBUG false  // <- Set your preference here
#define DebugPrint(...)       if(DEBUG){Serial.print(__VA_ARGS__);}
#define DebugPrintln(...)     if(DEBUG){Serial.println(__VA_ARGS__);}
#define DebugPrintf(...)      if(DEBUG){Serial.printf(__VA_ARGS__);}
#endif

// Global variables
const char* FILENAME = "/data.bin";
uint8_t* flashData = nullptr;
const int maxRetries = 5;
size_t successfulSizes[1024];  // Array to store successful file sizes
int successCount = 0;          // Counter for successful attempts

void initAndSaveToSPIFFS(const char* filename, size_t dataLength) {
  DebugPrint("SPIFFS init.");
  int retries = maxRetries;  // Number of retries for SPIFFS initialization
  while (retries--) {
    if (SPIFFS.begin(true)) {
      break;  // Exit the loop if initialization is successful
    }
    DebugPrint(" .");
    delay(1);  // Short pause before retrying
  }

  // Check if the file exists or needs to be deleted
  if (!SPIFFS.exists(filename) || SPIFFS.remove(filename)) {
    DebugPrintln("File deleted: " + String(filename));
  } else {
    DebugPrintln("Error deleting file: " + String(filename));
  }

  // Start the process of saving the vector to the file
  bool success = false;
  for (int attempt = 0; attempt < maxRetries; ++attempt) {
    DebugPrint("Opening file ");
    File file;
    int i = 0;
    // Retry opening the file for writing
    while (i < maxRetries) {
      file = SPIFFS.open(filename, FILE_WRITE);
      if (file) {
        DebugPrintln("File opened.");
        break;
      } else {
        DebugPrint(" .");
        delay(1);
        i++;
      }
    }

    // Write data to the file
    size_t bytesWritten = file.write(flashData, dataLength);
    file.close();
    DebugPrint("File written.");

    // Check the file size after writing
    File checkFile = SPIFFS.open(filename, FILE_READ);
    if (!checkFile) {
      DebugPrintln("\nError opening file for reading after writing.");
      return;  // Exit if there's an error opening the file for reading
    }
    size_t fileSize = checkFile.size();
    checkFile.close();

    // Verify if the file size is correct
    if (fileSize == 0 || fileSize != dataLength) {
      DebugPrint(".");
      delay(1);
    } else {
      success = true;  // Mark success if the file size matches and is not zero
      DebugPrintf("\nSUCCESS!    %zu K\n", fileSize / 1024);
      successfulSizes[successCount++] = dataLength / 1024;  // Store successful size
      break;
    }
  }
  if (!success) {
    DebugPrintf("\nFailed!    %zu K\n", dataLength / 1024);
  }
  DebugPrintln("-");
}

void setup() {
  Serial.begin(115200);
  while (!Serial) { delay(25); }
  DebugPrintln("Init serial monitor OK");

  // Loop to test different data sizes
  for (size_t k = 64; k <= 1024; k += 16) {
    size_t MAX_ARRAY_SIZE = (k * 1024);
    DebugPrintf("Trying:    %zu K\n", MAX_ARRAY_SIZE / 1024);
    DebugPrint("Allocating memory ");
    int attempt = 0;

    // Retry memory allocation
    while (attempt < maxRetries) {
      flashData = (uint8_t*)malloc(MAX_ARRAY_SIZE);
      if (flashData) {
        DebugPrintln(" Success!");
        for (size_t i = 0; i < MAX_ARRAY_SIZE; i++) {
          flashData[i] = (i % 255) + 1;
        }
        initAndSaveToSPIFFS(FILENAME, MAX_ARRAY_SIZE);  // Call the combined function
        memset(flashData, 0, MAX_ARRAY_SIZE);
        free(flashData);
        break;
      } else {
        DebugPrint(".");
        delay(1);
        attempt++;
      }
    }
    if (attempt == maxRetries) {
      DebugPrintln("\nMemory allocation failed.");
    }
    delay(1);
  }
  // Print successful file sizes in a tabular format
  Serial.println("\nSuccessful file sizes (in KB):");
  const int valuesPerLine = 10; // Change this to adjust the number of values per row
  for (int i = 0; i < successCount; i++) {
    Serial.printf("%4zu K ", successfulSizes[i]); // Adjust width here (4 spaces)
    if ((i + 1) % valuesPerLine == 0) { // Print new line after every X values
      Serial.println(); // New line after certain number of values
    }
  }
  Serial.println("\nFinished!");
}

void loop() {
  delay(10000);
}

That monitor output you posted is the startup after the reset. What does it say when the WDT triggers? It should look like

E (10139) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (10139) task_wdt:  - loopTask (CPU 1)
E (10139) task_wdt: Tasks currently running:
E (10139) task_wdt: CPU 0: IDLE
E (10139) task_wdt: CPU 1: loopTask
E (10139) task_wdt: Aborting.

including the offending task

That monitor output you posted is the startup after the reset. What does it say when the WDT triggers? It should look like

Hm, I guess I have to setup some callback to get that information? Did not know that I could hook into the watchdog.

I tested with this sketch on a xiao seeed studio s3.

I tried your sketch, but afaict, it fails because it is trying to allocate RAM, not because of the filesystem?!

Oh, forgot: under the Tools menu, about half-way down for an ESP32 board, is Core Debug Level. The WDT message is Error-level, so you just need to pick something other than None. Then re-upload.

Useful; I get a lot of other useful information, but it does not add anything to the watchdog reset.

I tried the settings "debug" and "verbose". Is there some define that I might have to enable as well? (Sorry, I am rather new to arduino & esp32..)

This is with Core Debug Level Verbose;

20:56:01.167 -> Downloading tester_fw.bin, size 1020688
20:56:41.195 -> ets Jul 29 2019 12:21:46
20:56:41.195 -> 
20:56:41.195 -> rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
20:56:41.195 -> configsip: 0, SPIWP:0xee
20:56:41.195 -> clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
20:56:41.195 -> mode:DIO, clock div:1
20:56:41.195 -> load:0x3fff0030,len:1344
20:56:41.264 -> load:0x40078000,len:13964
20:56:41.264 -> load:0x40080400,len:3600
20:56:41.264 -> entry 0x400805f0
20:56:41.581 -> [    24][D][esp32-hal-cpu.c:244] setCpuFrequencyMhz(): PLL: 480 / 2 = 240 Mhz, APB: 80000000 Hz
20:56:41.613 -> [    58][V][esp32-hal-uart.c:330] uartBegin(): UART0 baud(115200) Mode(800001c) rxPin(3) txPin(1)
20:56:41.613 -> [    66][V][esp32-hal-uart.c:416] uartBegin(): UART0 not installed. Starting installation
20:56:41.645 -> [    77][V][esp32-hal-uart.c:463] uartBegin(): UART0 initialization done.
20:56:41.645 -> [    92][V][esp32-hal-uart.c:330] uartBegin(): UART2 baud(115200) Mode(800001c) rxPin(16) txPin(17)
20:56:41.645 -> [   101][V][esp32-hal-uart.c:416] uartBegin(): UART2 not installed. Starting installation
20:56:41.645 -> [   109][V][esp32-hal-uart.c:463] uartBegin(): UART2 initialization done.
20:56:41.773 -> [   233][I][esp32-hal-i2c.c:75] i2cInit(): Initialising I2C Master: sda=21 scl=22 freq=100000

OK several printf's later, I have found the bug - a Serial.println() statement. Pheww.
Now my code connects to the server, checks version info, and download new firmware as needed, and restarts. Cool.

I have 20 nodes, and was rather tired of updating them through their web interface, so now I can simply relax with a cup of coffe while they do their work themselves... :wink:

Thank's for helping, the good thing is that one always learns from every things.

Do you see the WDT error with this sketch?

void setup() {
  enableLoopWDT();
}

void loop() {
  for (;;) {}  // don't return
}

I see it even with Core Debug Level: None

How was that the bug?

Yes, with that code, I get the output you mentioned;

07:37:10.039 -> Rebooting...
07:37:10.071 -> ets Jul 29 2019 12:21:46
07:37:10.071 -> 
07:37:10.071 -> rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
07:37:10.071 -> configsip: 0, SPIWP:0xee
07:37:10.071 -> clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
07:37:10.071 -> mode:DIO, clock div:1
07:37:10.071 -> load:0x3fff0030,len:1344
07:37:10.072 -> load:0x40078000,len:13964
07:37:10.075 -> load:0x40080400,len:3600
07:37:10.075 -> entry 0x400805f0
07:37:15.213 -> E (10131) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
07:37:15.214 -> E (10131) task_wdt:  - loopTask (CPU 1)
07:37:15.214 -> E (10131) task_wdt: Tasks currently running:
07:37:15.214 -> E (10131) task_wdt: CPU 0: IDLE0
07:37:15.214 -> E (10131) task_wdt: CPU 1: loopTask
07:37:15.214 -> E (10131) task_wdt: Aborting.
07:37:15.288 -> 
07:37:15.288 -> abort() was called at PC 0x400d4191 on core 0

I have a feeling the watchdog reset I got, was from another watchdog? I was not starting it explicitly.

The problem with the Serial.println was something like;

size_t fileSize = file.size();
Serial.printn("File size = " + fileSize);

This code was copy pasted from some example I found, I dont know if this is an allowed syntax or not - personally, I use Serial.printf() for all printouts instead, since it is more in line with my upbringing... :slight_smile:

It's allowed, but almost certainly a bad idea: one of those C foot-guns. "some string" is a const char*. Adding an integer to a pointer is allowed, and sometimes useful.

  Serial.println("one two" + 4);

But you've added a completely unrelated -- and large? -- number, so now it points who knows where, and telling Serial to print what is there, until it finds a NUL-terminator. It might actually be pointing to some restricted memory which causes a fault and reset, directly or indirectly.

I'm trying to remember from where I got that piece of code, but I can't really remember. Anyway, I am guilty of thinking that example code should work, while I know this is not always the case.. So in the end I'm not blaming anyone but myself.. :slight_smile:
And:- I learned a little, so it is not all bad, in the end.