ESP32: ISR timing out (while it didn't with ESP8266)

I’m rewriting an ESP8266 program to run on an ESP32. It reads (and logs) sensor data, connects to a wifi for ntp synchronisation and there also is an LCD to show current sensor readings and statuses. I’m planning on running the ESP32 on batteries, so the LCD is activated by pushing a button.

Connecting to the wifi (or even a backup wifi), syncing with ntp and some sensor readings all have timeouts in the seconds range, so I use an interrupt to detect when the button is pushed. The interrupt service routine then switches on the LCD.

Although this concept worked on an ESP8266 (beginner's luck?), pushing the button on an ESP32 results in a “Guru Meditation Error: Core 1 panic'ed (Interrupt wdt timeout on CPU1).” So, is the ESP32 that much stricter than the ESP8266 when it comes to ISR execution times?

A simplified code example (including button debouncing) reproducing the error:

#include <LiquidCrystal_I2C.h>

#define MILLIS_MAX 4294967295  // = (2^32 - 1)
#define PIN_BUTTON 32

#define DISPLAY_I2C_ADDR 0x3F
#define DISPLAY_WIDTH 20
#define DISPLAY_HEIGHT 4

const unsigned short debounceDelay = 50;  // [ms]
const unsigned short displayTime = 2000;  // [ms]

// volatile since manipulated by an ISR
volatile unsigned long currentMillis;
volatile unsigned long lastDebounceMillis;   // the last time the interrupt was triggered
volatile unsigned long millisAtButtonPushed;
volatile boolean LCDOn;

LiquidCrystal_I2C lcd( DISPLAY_I2C_ADDR, DISPLAY_WIDTH, DISPLAY_HEIGHT );

unsigned long getMillisDelta( unsigned long millisStart, unsigned long millisEnd ) {
  if( millisEnd < millisStart ) return MILLIS_MAX - millisStart + millisEnd + 1;
  else return millisEnd - millisStart;
}

void IRAM_ATTR isr_falling_edge();
void IRAM_ATTR isr_rising_edge();
void IRAM_ATTR isr_falling_edge() {
  currentMillis = millis();
  if( getMillisDelta(lastDebounceMillis,currentMillis) > debounceDelay ) {
    millisAtButtonPushed = currentMillis;
    LCDOn = true;
    lcd.backlight(); 
    lcd.display();

    attachInterrupt( PIN_BUTTON, isr_rising_edge, RISING );
  }
}
void IRAM_ATTR isr_rising_edge() {
  currentMillis = millis();
  if( getMillisDelta(lastDebounceMillis,currentMillis) > debounceDelay ) {
    lastDebounceMillis = currentMillis;
    attachInterrupt( PIN_BUTTON, isr_falling_edge, FALLING );
  }
}

void setup() {
  Serial.begin( 115200 );

  lcd.begin( DISPLAY_WIDTH, DISPLAY_HEIGHT );
  lcd.init(); 
  lcd.backlight();

  pinMode( PIN_BUTTON, INPUT_PULLUP );
  attachInterrupt( PIN_BUTTON, isr_falling_edge, FALLING );

  delay(500);
  lcd.noDisplay();
  lcd.noBacklight();
  LCDOn = false;
}

void loop() {
  currentMillis = millis();
  if( LCDOn ) {
    Serial.println( "LCD should be on..." );
    if( getMillisDelta(millisAtButtonPushed,currentMillis) > displayTime ) {
      lcd.noDisplay();
      lcd.noBacklight();
      LCDOn = false;
    }
  }
  else Serial.println( "LCD should be off..." );
}
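
A side note on the getMillisDelta helper in the sketch above: unsigned arithmetic wraps modulo 2^32 for a 32-bit type, so a plain subtraction should already give the correct elapsed time across a millis() rollover. A minimal sketch, assuming a 32-bit millisecond counter (elapsedMs is a made-up name, not part of any library):

```cpp
#include <cstdint>

// Elapsed time between two readings of a 32-bit millisecond counter.
// Unsigned subtraction wraps modulo 2^32, so the result stays correct
// even if the counter rolled over between `start` and `end`.
constexpr uint32_t elapsedMs(uint32_t start, uint32_t end) {
  return end - start;
}

// Same results as the explicit rollover branch in getMillisDelta:
static_assert(elapsedMs(100u, 350u) == 250u, "no rollover");
static_assert(elapsedMs(4294967290u, 4u) == 10u, "across rollover");
```

This only holds when the elapsed interval itself is shorter than 2^32 ms (about 49.7 days), which is also the limit of the original helper.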

I’m well aware of the requirement to keep ISRs as short as possible (I’m not writing to Serial nor printing text to the LCD). But it is also clear that the problem is caused by both lcd.backlight() and lcd.display(), since each causes the same timeout on its own and commenting out both statements eliminates the problem.

Plan B is to rewrite the code: set a flag in the ISR, and pick up its changed state in the next execution of the loop. Most of the time this will work and the LCD will activate almost immediately after the button is pushed, but since a loop can sometimes take several seconds (while timing out on wifi, ntp-sync or sensor readings), this will then result in an LCD which doesn’t seem to react to the button being pushed.

Before implementing plan B (or even plan C: checking the flag set by the ISR multiple times during the execution of the loop), I’m hoping for another, more elegant (correct?) way to get the desired result: activating an LCD the moment a button is pushed, without having to hope for ‘very short execution times of the loop’. Or is the ESP32 just much more strict when it comes to timeouts for interrupts than the ESP8266, and there’s no way around this...

Any ideas on how to get out of this, or is there just plan B?

Thanks in advance!

As you found out, you have to be much more careful about which functions you call in ESP32 interrupt routines than on the ESP8266; otherwise the system crashes.
I think the solution is plan B.
I usually set volatile flags in the ESP32 interrupt routines and execute the corresponding action in loop(), e.g. calling lcd.backlight() and lcd.display().
You have to make sure the code in loop() is non-blocking; otherwise there can be delays in execution, as you suggested.
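
The flag approach described above could be sketched roughly as follows. This is a hardware-independent illustration, not tested on an ESP32: LcdRequest and its members are hypothetical names, the timestamps stand in for millis(), and the caller is expected to do the actual lcd.backlight()/lcd.display() calls in loop() based on the returned action.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the ISR side only records the event, the loop()
// side decides when the (slow, I2C-based) LCD calls should happen.
struct LcdRequest {
  enum Action { NONE, TURN_ON, TURN_OFF };

  volatile bool pending = false;      // set in the ISR, cleared in loop()
  volatile uint32_t pressedAtMs = 0;  // timestamp taken in the ISR
  bool lcdOn = false;                 // only touched from loop()

  // ISR side: no I2C, no Serial, just two variable writes.
  void onButtonIsr(uint32_t nowMs) {
    pressedAtMs = nowMs;
    pending = true;
  }

  // loop() side: returns what the caller should do with the LCD.
  Action poll(uint32_t nowMs, uint32_t displayTimeMs) {
    if (pending) {
      pending = false;
      if (!lcdOn) { lcdOn = true; return TURN_ON; }
      return NONE;  // already on: the new timestamp extends the on-time
    }
    if (lcdOn && (nowMs - pressedAtMs) > displayTimeMs) {
      lcdOn = false;
      return TURN_OFF;
    }
    return NONE;
  }
};

// Walk-through with simulated timestamps (displayTime = 2000 ms).
void demoLcdRequest() {
  LcdRequest req;
  req.onButtonIsr(1000);                                 // button pushed
  assert(req.poll(1005, 2000) == LcdRequest::TURN_ON);   // next loop pass
  assert(req.poll(2500, 2000) == LcdRequest::NONE);      // still showing
  assert(req.poll(3001, 2000) == LcdRequest::TURN_OFF);  // time is up
}
```

In an actual sketch, the two assignments in onButtonIsr would probably go straight into the IRAM_ATTR ISR body (with millis() as the timestamp) rather than through a member function, so that nothing called from the ISR lives outside IRAM.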

You can disable the WDT globally, at the risk of the system freezing indefinitely, or you can increase the WDT timeout period.

@GolamMostafa: Disabling the WDT globally seems a bit too ‘quick and dirty’ to me, so maybe increasing the ISR timeout is an option. Then again, the fact that the standard value results in problems may be an indication that I need to rethink my approach...

@horace: Rewriting the timeout loops within the code of the ‘eternal loop’ seems possible for the wifi connection and sensor readings (thus spreading the timeout test over consecutive executions of the ‘eternal loop’), but not for the NTP sync, which is done by calling getLocalTime with its own internal timeout parameter/mechanism…

Maybe lighting an LED when the button is pushed can be an alternative solution, preventing the impression that the system does not respond to the push button (while in the timeout-checking loop of getLocalTime). Is a call to digitalWrite acceptable from within an ISR?

If you rewrite your code to use the multi-tasking capabilities of ESP32's built-in FreeRTOS, you won't need interrupts to detect buttons.

Also, you didn't show the code you're using for NTP synch, but done properly the ESP32 will handle that in the background.

The time it takes to push a button is very long in computer land. You can get rid of the isr to detect it by polling in loop().

gfvalvo, how would you recommend using FreeRTOS to respond to a button press? I am thinking it would have to be either an interrupt or polling.

Could you use the SNTP time sync callback?
Note that the example code uses Serial.println() statements in the callback, which is not recommended as it can corrupt any output it interrupts and possibly crash the program. Instead, copy the time to a volatile variable, e.g.

volatile time_t sntptime = 0;   // 0 means "no new sync to report" (time_t is an integer type, not a pointer)
void timeSyncCallback(struct timeval* tv) {
  sntptime = tv->tv_sec;   // copy the time
  //Serial.println("\n----Time Sync-----");
  //Serial.println(tv->tv_sec);
  //Serial.println(ctime(&tv->tv_sec));
}

and access it

void loop() {
  if (sntptime != 0) {
    time_t t = sntptime;   // plain local copy, so ctime() needs no cast
    sntptime = 0;          // mark the sync as handled
    Serial.println("\n----Time SyncLoop-----");
    Serial.println(t);
    Serial.println(ctime(&t));
  }
}

Right. I'd have a FreeRTOS task do the polling. By properly setting task priorities and time slicing between the tasks, the polling task would poll and debounce the button and either handle the LCD or signal another task to do it.
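
The poll-and-debounce part of such a task might look something like the sketch below. It is plain C++ with the pin level and time passed in, so it makes no claims about the FreeRTOS side; DebouncedButton is a hypothetical name and the 50 ms value is taken from the debounceDelay in the original sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical polling debouncer: feed it the raw button level at a
// steady rate (for example from a task that sleeps a few ms per pass).
struct DebouncedButton {
  uint32_t debounceMs;
  bool stable = false;        // debounced state, true = pressed
  bool lastRaw = false;       // raw level seen on the previous poll
  uint32_t lastChangeMs = 0;  // when the raw level last changed

  explicit DebouncedButton(uint32_t ms) : debounceMs(ms) {}

  // Returns true exactly once per debounced press.
  bool update(bool rawPressed, uint32_t nowMs) {
    if (rawPressed != lastRaw) {
      lastRaw = rawPressed;
      lastChangeMs = nowMs;  // level changed: restart the settle timer
      return false;
    }
    if (rawPressed != stable && (nowMs - lastChangeMs) >= debounceMs) {
      stable = rawPressed;   // level held long enough: accept it
      return stable;         // report presses, not releases
    }
    return false;
  }
};

// Walk-through: a bouncing press is reported once, after it settles.
void demoDebounce() {
  DebouncedButton button(50);
  assert(!button.update(true, 10));   // first contact
  assert(!button.update(false, 15));  // bounce
  assert(!button.update(true, 20));   // contact again, timer restarts
  assert(!button.update(true, 40));   // only 20 ms stable so far
  assert(button.update(true, 70));    // 50 ms stable: press reported
  assert(!button.update(true, 80));   // held down: no repeat
}
```

Inside a task, each pass would presumably call something like button.update(digitalRead(PIN_BUTTON) == LOW, millis()) followed by vTaskDelay(pdMS_TO_TICKS(5)); with INPUT_PULLUP, a pushed button reads LOW.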


Again, I don't understand your problem with NTP synching. You only need to configure the synching once, in your setup() function. After that, the ESP32 handles it in the background by periodically synching the local RTC to the NTP server. There's no blocking of your code. Getting the time from the RTC is very fast and does not require going out to the NTP server.
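
To illustrate the "reading the RTC is fast" point: a timestamp can be taken from the system clock without any waiting, and the clock value itself tells whether a sync has happened yet. The sketch below assumes an unsynced clock starts counting near the 1970 epoch; the helper names and the 2016 cutoff are choices made for this example, not part of any API.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical helper: any date before the cutoff is taken to mean
// "clock not set yet" (assumption: an unsynced clock starts near 1970).
bool clockLooksSet(time_t now) {
  const time_t cutoff = 1451606400;  // 2016-01-01 00:00:00 UTC
  return now >= cutoff;
}

// Non-blocking timestamp read: fills `out` (UTC) and returns true only
// if the clock looks set. It never waits for the network.
bool readTimestampUtc(time_t now, struct tm* out) {
  if (!clockLooksSet(now)) return false;
  return gmtime_r(&now, out) != nullptr;
}

// Walk-through with fixed clock values instead of time(nullptr).
void demoTimestamp() {
  struct tm t;
  assert(!readTimestampUtc(5, &t));          // 5 s after boot, no sync
  assert(readTimestampUtc(1451606400, &t));  // a plausible wall-clock time
  assert(t.tm_year == 116 && t.tm_mon == 0 && t.tm_mday == 1);
}
```

In the sketch, the call would be readTimestampUtc(time(nullptr), &t); a false return can then be logged as "no valid time yet" instead of blocking the loop.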

Please repost your code and add the NTP and time function so that we can see what you're doing.

a bit confusing..
got your test sketch to work properly..
did a few things..
just using 1 ISR falling..
not touching lcd object just setting LCDOn var in ISR..
seems better anyways..

simmed here..
changed screen address for sim, change it back for your hardware..

running on batts, might be time to look into putting the MCU to sleep..

esp32/api-reference/system/sleep_modes

have fun.. ~q

@gfvalvo: I'm currently not on the computer containing the code, so I'll post it later.

I took inspiration from https://randomnerdtutorials.com/esp32-date-time-ntp-client-server-arduino/: after initializing in the setup block for UTC time with configTime(0, 0, url_ntpServer), and after a first synchronization was successful, getLocalTime will (if I understood correctly) return a timestamp originating either from the ESP32 RTC (if no wifi connection is available) or from an NTP server (if a wifi connection is available and syncing was successful).

So in the ‘eternal loop’, every time I need a timestamp for the sensor readings, I call getLocalTime (which has a timeout argument). If the period for the NTP sync has passed, I connect to the wifi before I call getLocalTime, so the RTC is synced while I get my timestamp.

That's untrue. Post your code.

My life would have been a lot ‘easier’ if my initial internet search for NTP syncing had referred me to these pages, which clearly explain a few details of (background) SNTP syncing…

So yes, I’ll be implementing it. One thing I need to investigate further is how the syncing mechanism reacts when there is no wifi connection (bad signal at the ESP32's location): will it keep trying to sync, or will it move on and only try again after the next sync interval has passed? I’ll probably find the answer in the links under ‘suggested readings’.

Thank you for the suggestion (and link).

Yes, that’s essentially ‘plan B’, which I was hesitant to implement because of the possibility of long execution times of a single main loop passage, due to naive coding of timeouts (while loops within the main loop) and incorrect understanding of the (s)ntp-syncing in the background…

That was the idea after I got it all working in the first place.

Yes, after reading about the SNTP time sync callback (as suggested by @horace), I realize it now.

Since I have to rethink and rewrite not only the part related to getLocalTime but also the timeout loops within the main loop, I’ll rewrite the whole thing, aiming for a main loop that executes fast enough and handles timeouts across successive loop executions (instead of ‘waiting’ for them to happen). I'll start with that; I think it makes more sense to fix the known problems first.

Thank you all for your feedback!
