Puzzled by ESP32 timers and mutex

I'm using ESP32 timers for the first time. I'm puzzled by what's happening when loop() enters a critical section and locks out the timer call back.

This is the code I'm using to demonstrate the behaviour that I don't understand. It's based on an exmple whose link is in the code. I got the mutex use from another example, which I don't have a link for right now, but could probably find it again if asked to.

#include <Arduino.h>
// Define data to be shared between loop() and onTimer() 
volatile uint32_t onTimerCount = 0;
// Define mutex to ensure loop() and onTimer() can't access shared data at the same time
portMUX_TYPE timerMux = portMUX_INITIALIZER_UNLOCKED;

hw_timer_t *myTimer = NULL;
// This is the call back for the timer. It's called every ms.
void IRAM_ATTR onTimer() {
  portENTER_CRITICAL_ISR(&timerMux); // Ensure loop() and onTimer() don't access shared data at the same time
    onTimerCount++;
  portEXIT_CRITICAL_ISR(&timerMux); // loop() can now access shared data
}

void setup() {
  Serial.begin(115200);
  // Set up 1kHz timer interrupt
  // See https://circuitdigest.com/microcontroller-projects/esp32-timers-and-timer-interrupts for explanation
  myTimer = timerBegin(0, 80, true);
  timerAttachInterrupt(myTimer, &onTimer, true);
  timerAlarmWrite(myTimer, 1000, true); // I removed three zeros to make it 1kHz instead of 1Hz
  timerAlarmEnable(myTimer); //Just Enable
}

void loop() {
  static uint32_t totalCount = 0;
  portENTER_CRITICAL(&timerMux); // Ensure loop() and onTimer() don't access shared data at the same time
    uint32_t onTimerCountSample1 = onTimerCount;
    delay(100); // To make sure onTimer() is called whilst mutex is locked.
    Serial.print("onTimerCount ");
    if(onTimerCount == onTimerCountSample1){
      Serial.print("didn't change ");
    }
    else{
      Serial.print("changed by ");
      Serial.print(onTimerCount-onTimerCountSample1);
    }
    Serial.println(" in loop's critical section");
    totalCount+= onTimerCount;
    onTimerCount = 0;
  portEXIT_CRITICAL(&timerMux); // onTimer() can now access shared data
  Serial.print("totalCount = ");
  Serial.println(totalCount);
  delay(900); // To make total delay ~1 second
}

Serial monitor output

onTimerCount didn't change  in loop's critical section
totalCount = 0
onTimerCount didn't change  in loop's critical section
totalCount = 1000
onTimerCount didn't change  in loop's critical section
totalCount = 2000
<snip>
onTimerCount didn't change  in loop's critical section
totalCount = 150000

That just keeps going following the same pattern.

The code is running on an ESP32 devkit C V4 and built with PlatformIO using

platform = espressif32 @ ~3.5.0
board = esp32dev
framework = arduino

What I don't understand is:

  • What context is onTimer() called from?
  • How, even though onTimer() is locked out by the mutex for ~100ms per second, the count goes up by 1000 (not 900) per second.

I'm guessing that:

  • Something is queueing the calls to onTimer() whilst the mutex is locking it out for 100ms. Then when loop() releases the lock, onTimer() will be called in a burst of 100 calls to 'catch up'
  • Maybe it's got something to do with FreeRTOS being used behind the scences.

Perhaps someone reading this will know what's going on and be kind enough to point me in the right direction for me to get a better understanding of how this is working.

That's a misuse (abuse) of Critical Sections. They are meant to last a handful of microseconds, not hundreds of milliseconds. Also, interrupts are disabled when non-ISR code enters a Critical Section. I can't speak to the behavior of Timer Interrupts or the delay() function while interrupts are disabled. I assume the former are queued to some level (at least one).

Critical sections are designed to provide low level atomic operations across a few instructions to implement the higher level synchronization functions. With the total misuse as you have done all sorts of undefined behavior (usually bad) can occur.

@gfvalvo @oldcurmudgeon Of course it's misuse. I know how critical sections and interrupt handlers should be used. I'd never use that in real code. This is an experiment to demonstrate an extreme case of the behaviour I'm trying to understand. If you know what context calls onTimer() or how it manages to apparently queue the 'missing calls to onTimer, well that would be useful information.

What made me curious was the example used (waits for?) a mutex in what the example claimed to be an interrupt handler. That didn't make sense. How would an interrupt handler wait on a mutex. Some other Googling had someone on the espressif forum saying their portENTER_CRITICAL_ISR and portENTER_CRITICAL were both implemented with spinlocks. Which struck me as a way to lead to critical deadlock. loop() isn't running because it's been interrupted. The interrupt handler will never complete because it's just spinning its wheels waiting for the mutex to be freed. So that can't be what's happening. Hence my questions about what's actually happening.

ESP Timer (High Resolution Timer) - ESP32 - β€” ESP-IDF Programming Guide v5.4 documentation.

Do you know what your delay(100) call actually does with interrupts disabled?

Can't happen. When non-ISR code enters a Critical Section, interrupts are disabled. So, an ISR can't run an attempt to enter a Critical Section at the same time.

Did you get the mutex example from this link (scroll down or search for " FreeRTOS Critical Section" on that page)?

https://www.circuitstate.com/tutorials/how-to-write-parallel-multitasking-applications-for-esp32-using-freertos-arduino/

Interrupts can also interrupt FreeRTOS tasks depending on how they are configured. If so what happens when you have a piece of code that is so critical that you don’t want the interrupts to block it even for a short time? Disabling the interrupts is one method, but that defeats the purpose of using interrupts. To solve the problem, FreeRTOS allows us to define critical section code using taskENTER_CRITICAL() and taskEXIT_CRITICAL() macros. You can call taskENTER_CRITICAL() just before your critical code and then call taskEXIT_CRITICAL() once done. Always make sure that your critical section code is as short as possible. Otherwise, it will adversely affect the interrupt response time.

So besides taking care of "as short as possible" the main difference between your approach and the use in this and other examples is that you have created a critical section inside of loop().

The portENTER_CRITICAL() macro and the counterpart portEXIT_CRITICAL() macro are designed for use in FreeRTOS tasks. They depend on the use of he FreeRTOS scheduler.

Vanilla FreeRTOS allows interrupts to be disabled and enabled by calling taskDISABLE_INTERRUPTS and taskENABLE_INTERRUPTS respectively. IDF FreeRTOS provides the same API. However, interrupts are only disabled or enabled on the current core.

The following quotes are taken from

https://docs.espressif.com/projects/esp-idf/en/v5.4/esp32/api-reference/system/freertos_idf.html

Symmetric multiprocessing [SMP] is a computing architecture where two or more identical CPU cores are connected to a single shared main memory and controlled by a single operating system.

Disabling interrupts is a valid method of achieving mutual exclusion in Vanilla FreeRTOS (and single-core systems in general). However, in an SMP system, disabling interrupts is not a valid method of ensuring mutual exclusion. Critical sections that utilize a spinlock should be used instead.

For further information you may also have at look at

https://github.com/ShawnHymel/introduction-to-rtos/tree/main

Have fun!
ec2021

P.S.: Just to add this information

The method for creating a critical section mutex for ESP32 is called portENTER_CRITICAL which is equivalent to taskENTER_CRITICAL() . Similarly, there is portEXIT_CRITICAL that is equivalent to taskEXIT_CRITICAL() .

1 Like

Thank you. This answers my question about what context its called from


I'll have a proper read of that document tomorrow.

I do not. My first guess is it would never return because the millis counter wouldn't be incrementing. My second guess would be that it returns immediately if it checks whether interrupts are enabled before waiting.

I assumed interrupts were not disabled because the critical section apears to be implemented with a mutex, rather than by disabling interrupts.

When I added that delay(100) in the critical section it was to "see what happened" in my attempt to find out what was actually going on.

I started off believing that onTimer() was an interrupt handler. The example text talked about it being an interrupt handler. The other, almost identical, example where I got the mutex code from uses the ISR form of portENTER_CRITICAL_ISR

Due to the use of the mutex and delay() appearing to work correctly, came my doubts about whether onTimer() was an interrupt handler, or more likely, just being called from some other, non-interrupt context. I now know the latter is the case.

That's true if the critical section mechanism is implemented by disabling interrupts. It looked to me like the critical section mechanism wasn't disabling interrupts, but using the mutex. If the critical section in loop() wasn't disabling interrupts, and onTimer() really was an interrupt handler, then I couldn't see how the interrupt handler wouldn't hang forever waiting for the mutext to be freed. Again making me think the first example I looked at has mislead me when it talked about onTimer() being an interrupt handler.

[Edit added:] Oh and the name of this function call helped me to think onTimer() was an interrupt handler

timerAttachInterrupt(myTimer, &onTimer, true);

It is.

No. I think it was the ESP32 example in this link
Thank you for the rest of your post. I'll have a read of the information you pointed me at tomorrow.

If that's the case, then why does it need a mutex? Every other 'enter critical section' implementation that disables interrupts, that I've come across in the last 40 or so years, doesn't use a mutex.

I think the answer to your question can be found here

https://en.wikipedia.org/wiki/Spinlock

The requirement is based on the fact that the ESP32 has two cores:

However, in an SMP system, merely disabling interrupts does not constitute a critical section as the presence of other cores means that a shared resource can still be concurrently accessed. Therefore, critical sections in IDF FreeRTOS are implemented using spinlocks.

Source again:

https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/freertos_idf.html#critical-sections

1 Like

I'd say it's not used so much as a mutex but an ID flag. Probably to identify the Critical Section to coordinate between the two cores. See link provided by @ec2021.

Thank you. I'll look into that tomorrow.
Apologies for posting a screen shot of code. I used a screen shot because this view shows the defintions that are active. The sections of code slightly greyed out, with vertical lines down them, are incactive.


I'll dig deeper tomorrow

I've verified that what you said is correct by adding some code to loop() to check if millis() changes in the critical section:

void loop() {
  static uint32_t totalCount = 0;
  portENTER_CRITICAL(&timerMux); // Ensure loop() and onTimer() don't access shared data at the same time
    uint32_t onTimerCountSample1 = onTimerCount;
    uint32_t millisSample1 = millis();
    delay(100); // To make sure onTimer() is called whilst mutex is locked.
    uint32_t millisSample2 = millis();
    Serial.print("onTimerCount ");
    if(onTimerCount == onTimerCountSample1){
      Serial.print("didn't change ");
    }
    else{
      Serial.print("changed by ");
      Serial.print(onTimerCount-onTimerCountSample1);
    }
    Serial.print(" : millis() ");

    if(millisSample1 == millisSample2){
      Serial.print("didn't change ");
    }
    else{
      Serial.print("changed by ");
      Serial.print(millisSample2-millisSample1);
    }
    Serial.println(" in loop's critical section");
    totalCount+= onTimerCount;
    onTimerCount = 0;
  portEXIT_CRITICAL(&timerMux); // onTimer() can now access shared data
  Serial.print("totalCount = ");
  Serial.println(totalCount);
  delay(900); // To make total delay ~1 second
}

Serial monitor output:

onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 0
onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 1000
onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 2000
onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 3000

This topic, post#2 says delay() just returns if interrupts are off.

I'm very confused about how the delay(100) in the critical section affects the total count though.
With delay(100) in the critical section, plus delay(900) outside the critical section, the 1kHz onTimer() still counts 1000 each time round loop().
If I change the delay(100) to delay (200) then it counts 1100 each time round the loop():

onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 0
onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 1100
onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 2200
onTimerCount didn't change  : millis() didn't change  in loop's critical section
totalCount = 3300

I put a digitalWrite(HIGH) / LOW pair before / after the delay() in the critical section and it's only a few uS. Confirming delay() is just returning without doing anything.


I'm at a loss to come up with a hypothesis about how the total count matches the sum of the delay(200) which is 'ignored' plus the delay(900).

Moved the digitalWrite() into onTimer() critical section, and changed HIGH/LOW pair into digitalWrite(23,!digitalRead(23)). Also changed the delays inside / outside the critical section in loop() to 10/90. This lets me see a whole 100ms cycle in one scope trace. There's no irregularities in the trace edge timings.

This is much simplified, but demonstrates the behaviour that I don't understand:

#include <Arduino.h>
portMUX_TYPE timerMux = portMUX_INITIALIZER_UNLOCKED;

void setup() {
  Serial.begin(115200);
  pinMode(22,OUTPUT);
  pinMode(23,OUTPUT);
}

void loop() {
  digitalWrite(23,!digitalRead(23));
  portENTER_CRITICAL(&timerMux);
    digitalWrite(22,HIGH);
    delay(50);
    digitalWrite(22,LOW);
  portEXIT_CRITICAL(&timerMux); // onTimer() can now access shared data
  delay(50);
}

In the scope traces: blue is port 23, which toggles every 100ms and red is showing how long the delay(50) in the critical section takes to execute.

At 100ms per division it shows that loop executes every 100ms

At 5us per division it shows that the delay(50) in the critical section takes a few uS to execute:

The question I have no idea how to answer is: How is the loop() execution time 100ms when one of the two delay(50)s only takes a few us?

I think you have found a bad example.
It looks, however, somewhat similar to the example here : Timer - - β€” Arduino ESP32 latest documentation . . .

/*
 Repeat timer example

 This example shows how to use hardware timer in ESP32. The timer calls onTimer
 function every second. The timer can be stopped with button attached to PIN 0
 (IO0).

 This example code is in the public domain.
 */

// Stop button is attached to PIN 0 (IO0)
#define BTN_STOP_ALARM 0

hw_timer_t *timer = NULL;
volatile SemaphoreHandle_t timerSemaphore;
portMUX_TYPE timerMux = portMUX_INITIALIZER_UNLOCKED;

volatile uint32_t isrCounter = 0;
volatile uint32_t lastIsrAt = 0;

void ARDUINO_ISR_ATTR onTimer() {
  // Increment the counter and set the time of ISR
  portENTER_CRITICAL_ISR(&timerMux);
  isrCounter = isrCounter + 1;
  lastIsrAt = millis();
  portEXIT_CRITICAL_ISR(&timerMux);
  // Give a semaphore that we can check in the loop
  xSemaphoreGiveFromISR(timerSemaphore, NULL);
  // It is safe to use digitalRead/Write here if you want to toggle an output
}

void setup() {
  Serial.begin(115200);

  // Set BTN_STOP_ALARM to input mode
  pinMode(BTN_STOP_ALARM, INPUT_PULLUP);

  // Create semaphore to inform us when the timer has fired
  timerSemaphore = xSemaphoreCreateBinary();

  // Set timer frequency to 1Mhz
  timer = timerBegin(1000000);

  // Attach onTimer function to our timer.
  timerAttachInterrupt(timer, &onTimer);

  // Set alarm to call onTimer function every second (value in microseconds).
  // Repeat the alarm (third parameter) with unlimited count = 0 (fourth parameter).
  timerAlarm(timer, 1000000, true, 0);
}

void loop() {
  // If Timer has fired
  if (xSemaphoreTake(timerSemaphore, 0) == pdTRUE) {
    uint32_t isrCount = 0, isrTime = 0;
    // Read the interrupt count and time
    portENTER_CRITICAL(&timerMux);
    isrCount = isrCounter;
    isrTime = lastIsrAt;
    portEXIT_CRITICAL(&timerMux);
    // Print it
    Serial.print("onTimer no. ");
    Serial.print(isrCount);
    Serial.print(" at ");
    Serial.print(isrTime);
    Serial.println(" ms");
  }
  // If button is pressed
  if (digitalRead(BTN_STOP_ALARM) == LOW) {
    // If timer is still running
    if (timer) {
      // Stop and free timer
      timerEnd(timer);
      timer = NULL;
    }
  }
}

However, this indicates you are using an old verion of the ESP32 Arduino core, for example the timerBegin() signature is now simply timerBegin( frequency )

and the example above uses a semaphore to control the critical section in the loop().

I moved the port 22 HIGH/LOW from around the delay(50) in the critical section to aroud the portEXIT_CRITICAL(). It appears that the missing 50ms delay is done when portEXIT_CRITICAL() is called.

#include <Arduino.h>
portMUX_TYPE timerMux = portMUX_INITIALIZER_UNLOCKED;

void setup() {
  Serial.begin(115200);
  pinMode(22,OUTPUT);
  pinMode(23,OUTPUT);
}

void loop() {
  digitalWrite(23,!digitalRead(23));
  portENTER_CRITICAL(&timerMux);
    delay(50);
    digitalWrite(22,HIGH);
  portEXIT_CRITICAL(&timerMux);
  digitalWrite(22,LOW);
  delay(50);
}


It looks like somehow the ignored delay(50) in the critical section is being deferred until the critical section exits.

Thank you for taking the time to reply.

I agree I'm using an older version. IDK exactly how that 3.5.0 relates to the version number of the breaking upgrade of the espressif stuff. I stuck to using that version because a lot of old / large ESP32 projects won't build without me doing pointless maintenance in projects that are a few years old (they work fine, and IDK how long it would take me to fix the builds for the new version)

I'm satisfied that my use of ESP32 timers is working in my real code (which isn't mentioned in this topic). This topic was started due to my curiousity about what's going on behind the scene.