Help debugging intermittent crashes or WDT resets in Library on ESP8266

Hi, I am developing an Arduino library for communicating with a blockchain: GitHub - as-iotex/iotex-arduino

I am experiencing intermittent crashes or WDT resets and I am having trouble pinpointing the issue. This only happens with ESP8266. ESP32 and Nano33 IoT are also supported and working fine with no issues.
I suspect it could be related to memory corruption. I've tried logging the available RAM at various points, and the device always has RAM available.

To reproduce, please clone the repository into the Arduino sketch directory (I still haven't published the library) and compile the Xrc20TokenTransferMultiple example using the Arduino IDE. You will need to edit secrets.h to enter your WiFi credentials.
The expected output is that the program just loops and tries to send an action every couple of seconds. If everything works as expected, it should print the following every time the action is sent (note that the result is always an error because the account for this private key doesn't have enough balance; this is fine):

Calling contract with data: 0xa9059cbb0000000000000000000000005840bf8e5d3f5b66ee52b9b933bdac9682e386d00000000000000000000000000000000000000000000000000de0b6b3a7640000
Result : ERROR_GRPC
Progrm finished

However, most of the time the ESP8266 crashes with one of these two errors (I can't figure out why it's sometimes a crash and other times a WDT reboot):

  1. WDT reset:
ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 3460, room 16
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4
tail 4
chksum 0xc9
csum 0xc9
v0007b8a0
~ld
  2. A crash with the following stack dump:
Decoding stack results
0x4020be3e: iotex::Encoder::protobuf_encodeExecution(iotex::responsetypes::ActionCore_Execution&, unsigned char*, unsigned int) at /Users/Santos/Documents/Arduino/libraries/iotex-client/src/encoder/encoder.cpp line 204
0x4020beab: std::vector   >::emplace_back (unsigned char&&) at /Users/Santos/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/3.0.4-gcc10.3-1757bed/xtensa-lx106-elf/include/c++/10.3.0/bits/stl_uninitialized.h line 1022
0x4020b915: Cat    > >(std::vector   >, std::vector   > const&) at /Users/Santos/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/3.0.4-gcc10.3-1757bed/xtensa-lx106-elf/include/c++/10.3.0/bits/stl_uninitialized.h line 1022
0x40209b68: iotex::api::Wallets::sendExecution(unsigned char const*, unsigned char const*, iotex::responsetypes::ActionCore_Execution const&, unsigned char*) at /Users/Santos/Library/Arduino15/packages/esp8266/hardware/esp8266/3.0.2/cores/esp8266/WString.h line 79
0x40209b2e: iotex::api::Wallets::sendExecution(unsigned char const*, unsigned char const*, iotex::responsetypes::ActionCore_Execution const&, unsigned char*) at /Users/Santos/Documents/Arduino/libraries/iotex-client/src/api/wallet/wallets.cpp line 128
0x40203dfc: generate_k_rfc6979 at /Users/Santos/Documents/Arduino/libraries/iotex-client/src/extern/crypto/rfc6979.c line 44
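
For anyone reading the boot banner in the WDT case above: the number after `rst cause:` is a reset reason code. Below is a small sketch that pulls that number out of a captured serial line and names it. The mapping is the table from the ESP8266 Arduino core documentation on boot messages as I remember it, so please double-check it against the docs for your core version. The function names `parseRstCause` and `rstCauseName` are made up for this example; this is plain C++ that can run on a PC against a saved log.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Meaning of N in "rst cause:N" from the ESP8266 boot banner.
// Assumption: table taken from the ESP8266 Arduino core boot-message
// documentation; verify against the docs for the core you use.
inline const char* rstCauseName(int cause) {
  switch (cause) {
    case 1: return "normal boot";
    case 2: return "reset pin";
    case 3: return "software reset";
    case 4: return "watchdog reset";
    default: return "unknown";
  }
}

// Extracts N from a line such as
//   "ets Jan  8 2013,rst cause:4, boot mode:(3,6)"
// Returns -1 if the line has no "rst cause:" field.
inline int parseRstCause(const char* line) {
  const char* field = std::strstr(line, "rst cause:");
  int cause = -1;
  if (field != nullptr && std::sscanf(field, "rst cause:%d", &cause) == 1) {
    return cause;
  }
  return -1;
}
```

Feeding it the first line of the WDT log above gives 4, which this table lists as a watchdog reset, consistent with the `wdt reset` line that follows it.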

Please note that this example works and doesn't crash on the ESP32 or the Nano33 IoT, so I doubt the code in my library is the cause (except for the high RAM usage, which could corrupt the heap).

Could you let me know what steps I could take to troubleshoot this?

Many thanks

General ideas some/all of which you may have already done:

  1. loops should periodically call yield()
  2. callbacks, ISRs etc. should be forced to stay in RAM (ICACHE_RAM_ATTR, or IRAM_ATTR on ESP8266 core 3.x, where the old name is deprecated)
  3. ESP.getFreeHeap(); to check for memory usage
  4. progress to the minimum sketch which can still reproduce the problem.
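
A possible way to take idea 3 a bit further: total free heap can look healthy while the largest free block is small (fragmentation) or while the sketch stack is nearly used up, and the ESP8266 sketch stack is only about 4 KB by default. The sketch below records free heap, largest free block and unused stack at named checkpoints and keeps the lowest value seen for each. `ESP.getFreeHeap()`, `ESP.getMaxFreeBlockSize()` and `ESP.getFreeContStack()` are ESP8266 core calls; `MemSnapshot`, `MemWatch`, `readMem` and `logMem` are names invented for this example. It is only a starting point under those assumptions, not a diagnosis of this particular crash.

```cpp
#include <cassert>
#include <cstdint>

#ifdef ARDUINO_ARCH_ESP8266
#include <Arduino.h>
#endif

// One reading of three numbers worth watching on an ESP8266.
struct MemSnapshot {
  uint32_t freeHeap;   // total free heap, in bytes
  uint32_t maxBlock;   // largest single free block, in bytes
  uint32_t freeStack;  // bytes of the sketch stack never used so far
};

// Reads the current values on ESP8266. On any other target it returns
// zeros, so the bookkeeping below can also be compiled and tried on a PC.
inline MemSnapshot readMem() {
  MemSnapshot s = {0, 0, 0};
#ifdef ARDUINO_ARCH_ESP8266
  s.freeHeap = ESP.getFreeHeap();
  s.maxBlock = ESP.getMaxFreeBlockSize();
  s.freeStack = ESP.getFreeContStack();
#endif
  return s;
}

// Remembers the lowest value seen for each field across all checkpoints.
class MemWatch {
 public:
  void checkpoint(const MemSnapshot& s) {
    if (!seen_) {
      low_ = s;
      seen_ = true;
      return;
    }
    if (s.freeHeap < low_.freeHeap) low_.freeHeap = s.freeHeap;
    if (s.maxBlock < low_.maxBlock) low_.maxBlock = s.maxBlock;
    if (s.freeStack < low_.freeStack) low_.freeStack = s.freeStack;
  }
  const MemSnapshot& low() const { return low_; }

 private:
  MemSnapshot low_{0, 0, 0};
  bool seen_ = false;
};

#ifdef ARDUINO_ARCH_ESP8266
MemWatch memWatch;

// Call at interesting points, e.g. logMem("before sendExecution").
void logMem(const char* tag) {
  MemSnapshot s = readMem();
  memWatch.checkpoint(s);
  Serial.printf("[%s] heap=%u maxblk=%u stack=%u\n", tag,
                (unsigned)s.freeHeap, (unsigned)s.maxBlock,
                (unsigned)s.freeStack);
}
#endif
```

Calling `logMem()` before and after the signing and encoding steps, and printing `memWatch.low()` once per loop, would show whether the stack or the largest block gets close to zero even though free heap looks fine.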

This is actually very good. I'm debugging an ESP8266 problem that appears after a few days

Thanks very much. I had already tried all your suggestions:

  1. Even if all loops call yield, it still crashes
  2. There is no interrupt handling in my library code
  3. getFreeHeap always returns free heap
  4. I know it's still quite complex, but unfortunately this is the minimum sketch in which I can reproduce the issue. I will try the same sketch without the WiFi connection and see what happens

I also tried debugging using the GDB stub and VisualGDB, but the debugger just stops in the exception handler and gives me no information about the exception and no call stack. If I try stepping into the functions, the debugger hangs or crashes...

I am currently trying to compile OpenOCD for ESP8266 in order to debug using PlatformIO.
I will keep you updated on my progress in case you are interested.

Let me know if there is anything I can help with :slight_smile: