Arduino is running 10 times slower in a tight loop than ESP-idf

I posted this yesterday as well, but messed up the post by giving too much info. Here is everything, as bare as it gets:

Arduino: ~1,000,000 while(1) iterations per second
espidf: ~10,000,000 while(1) iterations per second

I even reinstalled VS Code and PlatformIO, added only the C++ pack, and tested different board.json files. The only difference so far was memory.
Everything is default in PlatformIO.
I tested many different build flags, but nothing changes it dramatically, maybe 10-20%.

I read many threads about this topic, but nothing helped in my case.
What's going on?

platformio.ini

[env:esp32-s3-devkitc1-n16r8]
platform = espressif32
board = esp32-s3-devkitc1-n16r8
framework = arduino
monitor_speed = 115200

arduino:

#include <esp_timer.h>  
#include "driver/uart.h"


unsigned long long lastResetTime = 0;
unsigned long performanceCounter = 0;

void setup() {
  uart_set_baudrate(1, 115200);  
  lastResetTime = esp_timer_get_time();
}

void loop() {
  while (1)
  {  
    performanceCounter++;
    unsigned long long currentTime = esp_timer_get_time();

    if (currentTime - lastResetTime >= 1000000) {
      printf("Performance score: %lu\n", performanceCounter);
      performanceCounter = 0;
      lastResetTime = currentTime;
    }
  }
}

I'll give you the espidf code as well, maybe it cheated somehow, lol.
But I confirmed it's counting +1 and that one second is one second.
Not sure what else to try.... :see_no_evil:

espidf:

    lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;

    unsigned long performanceCounter = 0;
    while (true) {
        // Increment the performance counter
        performanceCounter++;
        // Check if one second has passed
        if (xTaskGetTickCount() * portTICK_PERIOD_MS - lastResetTime >= 1000) {
            // Log the performance score and reset the counter
            ESP_LOGI(TAG, "Performance score: %lu|", performanceCounter);
            performanceCounter = 0;  // Reset the counter
            lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;  // Update the reset time
        }
    }

Have you considered that the two processors might be running at different clock speeds? Or that they have different architectures? The tools you're using might also be generating non-comparable code. It's a bit like comparing apples to oranges.

You can't compare those two sketches.

What has Espressif said?

Don't they produce both ESP-IDF and the Arduino core for ESP32?

He is comparing apples to oranges and thinks the IDE has some influence which it does not.

Hello Schulz.
OK, this is my first week with C++, ESP32 and PlatformIO.
Yes, I used 240 MHz in both examples.
I also tested different builds and flags in PlatformIO.
Why is the code not comparable??
I was just trying to write the simplest code that prints after one second in both.

Two loops that do count++ are like apples and oranges?
And yes, my IDE, in this case PlatformIO (VS Code), has an influence on the build.
That's why I included the .ini.

What sketch??? What are you talking about?
The first one (Arduino) is running 100% as it is....
The 2nd espidf is not important, since the question is, why arduino is so slow?
Or are my settings wrong? I used the defaults, and the other settings have little effect. That's why I am asking here!

The ESP-IDF code you posted is incomplete.

Several experienced folks have gently tried to tell you the test is false. I don't care anymore, so MUTING.

As I understand it, you're referring to Arduino (AVR assumed) and ESP-IDF, which is Espressif's official IoT Development Framework for the ESP-32. You might want to review the forum guidelines and provide more detail about your hardware setup. The way you’ve described it is a bit like saying the red car is faster than the blue one without any context!

And??? The IDF is NOT the problem!! Is that so difficult?

Simpler: I can count really fast in espidf, but in Arduino I can't!
And the crucial logic was already posted by me.

but here you go.

#include <stdio.h>
#include "esp_log.h"
#include "esp_pm.h"
#include "esp_system.h"
#include "driver/uart.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"

static const char *TAG = "Performance Test";

void app_main() {

    vTaskDelay(pdMS_TO_TICKS(1000));  // Delay for 1 second
        
    
    unsigned long lastResetTime = 0;

    // Get the initial time
    lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;

    unsigned long performanceCounter = 0;
    while (true) {
        // Increment the performance counter
        performanceCounter++;
        // Check if one second has passed
        if (xTaskGetTickCount() * portTICK_PERIOD_MS - lastResetTime >= 1000) {
            // Log the performance score and reset the counter
            ESP_LOGI(TAG, "Performance score: %lu|", performanceCounter);
            performanceCounter = 0;  // Reset the counter
            lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;  // Update the reset time
        }
    }
}

Well, I am new to this stuff, and I have posted all the info.
My hardware is in the .ini: esp32-s3-devkit

Platformio does all in the background, I just select in the .ini
framework = arduino/espidf

Everything is default. I don't even know what you want to know. What file do you want to see?
And there seem to be no magic flags that give me back the 10x.
And why should that test be invalid?? Nobody has explained anything.
People seem really toxic here.

I just managed to make my classes work in espidf with C++.
I thought maybe the C++ framework was slowing things down that much. But no change.

What do you mean by this? You are not measuring the increments.

What you are measuring is mainly the relative duration of the functions esp_timer_get_time() and xTaskGetTickCount() inside the loop.
Why should it be the same? Check the implementations or better the asm generated.
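To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the 240 MHz clock mentioned above and the two counts from the first post; the function name cyclesPerIteration is only for illustration. Dividing the clock by the reported count gives the approximate cycle budget of one loop pass.

```cpp
#include <cassert>
#include <cstdint>

// Approximate CPU cycles spent per loop pass, given the core clock
// and the count the loop reports once per second.
std::uint64_t cyclesPerIteration(std::uint64_t cpuHz, std::uint64_t iterationsPerSecond) {
    assert(iterationsPerSecond != 0);
    return cpuHz / iterationsPerSecond;
}

// Figures from this thread (assumed, not re-measured):
//   cyclesPerIteration(240000000, 1000000)  -> 240 cycles per pass (Arduino version)
//   cyclesPerIteration(240000000, 10000000) -> 24 cycles per pass (ESP-IDF version)
```

A single increment costs only a few cycles, so in both versions nearly the whole budget goes to whatever else the loop body does, which is the timer call.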

The main "meat" of your loops is a call to the timer function, every iteration. (I'll assume that any function call is more expensive than some simple math.)

But you're using DIFFERENT timer functions. esp_timer_get_time() in the Arduino code, and xTaskGetTickCount() in the IDF code.

And guess what? esp_timer_get_time() is about 10x more code than xTaskGetTickCount() !

Try using xTaskGetTickCount() in the Arduino code (yes, it is available. That's how I got the listings here, not having an IDF setup installed.)

4037b41c <xTaskGetTickCount>:
4037b41c:	004136        	entry	a1, 32
4037b41f:	e63521        	l32r	a2, 40374cf4 <_iram_text_start+0x8f0> (3fc95c5c <xTickCount>)
4037b422:	0020c0        	memw
4037b425:	0228      	l32i.n	a2, a2, 0
4037b427:	f01d      	retw.n

(Good for FreeRTOS, to have such a nice and simple function!)

40377ca8 <esp_timer_get_time>:
40377ca8:	004136        	entry	a1, 32
40377cab:	f310a1        	l32r	a10, 403748ec <_iram_text_start+0x4e8> (3fc95be8 <systimer_hal>)
40377cae:	0b0c      	movi.n	a11, 0
40377cb0:	0652e5        	call8	4037e1e0 <systimer_hal_get_counter_value>
40377cb3:	012b40        	slli	a2, a11, 28
40377cb6:	41a4a0        	srli	a10, a10, 4
40377cb9:	2022a0        	or	a2, a2, a10
40377cbc:	4134b0        	srli	a3, a11, 4
40377cbf:	f01d      	retw.n

4037e1e0 <systimer_hal_get_counter_value>:
4037e1e0:	006136        	entry	a1, 48
4037e1e3:	0298      	l32i.n	a9, a2, 0
4037e1e5:	1183e0        	slli	a8, a3, 2
4037e1e8:	898a      	add.n	a8, a9, a8
4037e1ea:	0020c0        	memw
4037e1ed:	012822        	l32i	a2, a8, 4
4037e1f0:	d887a1        	l32r	a10, 4037440c <_iram_text_start+0x8> (40000000 <_heap_end>)
4037e1f3:	2022a0        	or	a2, a2, a10
4037e1f6:	0020c0        	memw
4037e1f9:	016822        	s32i	a2, a8, 4
4037e1fc:	0020c0        	memw
4037e1ff:	1828      	l32i.n	a2, a8, 4
4037e201:	f772d7        	bbci	a2, 29, 4037e1fc <systimer_hal_get_counter_value+0x1c>
4037e204:	1183d0        	slli	a8, a3, 3
4037e207:	898a      	add.n	a8, a9, a8
4037e209:	338b      	addi.n	a3, a3, 8
4037e20b:	1133d0        	slli	a3, a3, 3
4037e20e:	0020c0        	memw
4037e211:	112822        	l32i	a2, a8, 68
4037e214:	db55b1        	l32r	a11, 40374f68 <_iram_text_start+0xb64> (fffff <UserFrameTotalSize+0xffeff>)
4037e217:	993a      	add.n	a9, a9, a3
4037e219:	0020c0        	memw
4037e21c:	0938      	l32i.n	a3, a9, 0
4037e21e:	0020c0        	memw
4037e221:	1128a2        	l32i	a10, a8, 68
4037e224:	1033b0        	and	a3, a3, a11
4037e227:	0192a7        	bne	a2, a10, 4037e22c <systimer_hal_get_counter_value+0x4c>
4037e22a:	f01d      	retw.n
4037e22c:	0a2d      	mov.n	a2, a10
4037e22e:	fff9c6        	j	4037e219 <systimer_hal_get_counter_value+0x39>

Edit: so it looks like the main functional difference (aside from the extra level of functions) is that xTaskGetTickCount() returns a 32bit number whose load is assumed atomic on the 32bit CPU, while esp_timer_get_time() returns a 64bit value and goes to some pain to ensure atomicity (and you're doing 64bit math with it, so that's good, I guess. But it adds to the "apples vs oranges" nature of the comparison that people have complained about.)
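For anyone wondering what "some pain to ensure atomicity" can look like in C++, here is a host-side sketch of the usual trick. It is my reading of the listing, not the actual HAL source: read the high word, read the low word, read the high word again, and retry if the high word changed. The names readConsistent and simulateRollover are made up for this example, and the "registers" are plain lambdas instead of hardware.

```cpp
#include <cassert>
#include <cstdint>

// Read a 64-bit counter that can only be loaded 32 bits at a time.
// If the high word changes between the two high reads, the low word
// rolled over in the middle of the read, so try again.
template <typename ReadHi, typename ReadLo>
std::uint64_t readConsistent(ReadHi readHi, ReadLo readLo) {
    std::uint32_t hi = readHi();
    for (;;) {
        const std::uint32_t lo = readLo();
        const std::uint32_t hiAgain = readHi();
        if (hiAgain == hi) {
            return (static_cast<std::uint64_t>(hi) << 32) | lo;
        }
        hi = hiAgain;  // torn read detected: repeat with the new high word
    }
}

// Simulated counter whose low word rolls over during the first low read.
std::uint64_t simulateRollover() {
    std::uint64_t counter = 0x00000000FFFFFFFFULL;
    bool ticked = false;
    auto readHi = [&] { return static_cast<std::uint32_t>(counter >> 32); };
    auto readLo = [&] {
        if (!ticked) { counter += 1; ticked = true; }  // carry into the high word mid-read
        return static_cast<std::uint32_t>(counter);
    };
    const std::uint64_t value = readConsistent(readHi, readLo);
    assert(value >= 0x100000000ULL);  // a naive hi-then-lo read would have returned 0 here
    return value;
}
```

The retry loop is the price of the 64-bit result: at least three loads instead of one, plus a compare and branch, on every call.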

Edit2: A couple notes:

  1. With a prefix of 0x4037, both (sets of) functions are in on-chip (fast) RAM, so one of my suspicions (that one was in slower QSPI flash) was unfounded.
  2. I am not familiar with xtensa assembly language, and have no idea exactly what memw or entry actually do, and I've only "educated guesses" about some of the other opcodes. But it's not necessary to be fluent in the machine architecture to get an idea of performance by counting instructions...

Per @westfw's points, the below code reports > 11,000,000 iterations per second. I'm thinking this is a case where OP knows just enough to be dangerous.

#include "Arduino.h"

void perfTask(void *pvParameters);

void setup() {
	Serial.begin(115200);
	delay(1000);
	BaseType_t returnCode {xTaskCreatePinnedToCore(perfTask, "Perf Task", 1900, NULL, 3, NULL, CONFIG_ARDUINO_RUNNING_CORE)};
	assert(returnCode == pdTRUE && "Failed to create Perf Task");

}

void loop() {
	vTaskDelete(NULL);
}

void perfTask(void *pvParameters) {
	vTaskDelay(1000);
	TickType_t lastTimeStamp {xTaskGetTickCount()};
	uint32_t counter {0};
	while (1) {
		TickType_t currentTimeStamp {xTaskGetTickCount()};
		counter++;
		if (currentTimeStamp - lastTimeStamp >= 1000) {
			log_i("%lu", counter);
			counter = 0;
			lastTimeStamp = currentTimeStamp;
		}
	}
}

Wow, that's a great answer!! Thank you so much. I thought I was going crazy, and that no one was willing to explain or help.

I had the same thing in mind myself, and wanted to restructure the test so that I count to x and measure the time only at the start and the end.
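That restructuring could look roughly like this. It is only a host-side sketch using std::chrono so it runs on a PC; on the ESP32 the two clock reads could be swapped for esp_timer_get_time(). The names BenchResult and countFixed are made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

struct BenchResult {
    std::uint64_t iterations;    // how many increments were actually performed
    std::int64_t elapsedMicros;  // wall time for the whole loop
};

// Run a fixed number of increments and read the clock only twice,
// so the cost of the timer call stays out of the loop body.
BenchResult countFixed(std::uint64_t iterations) {
    volatile std::uint64_t counter = 0;  // volatile so the compiler cannot drop the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        counter = counter + 1;
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    const auto micros =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    return {counter, static_cast<std::int64_t>(micros)};
}
```

Calling countFixed(10000000) and dividing iterations by elapsedMicros gives increments per microsecond, without a timer call in every pass.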

The reason for my stupid mistake was that I knew millis() is from Arduino and is slow, which is why I used esp_timer_get_time().
I thought this was the native espidf method.
I never dared to try xTaskGetTickCount() directly.

And the result was that I actually got 30% more with Arduino than with espidf.
Both with PlatformIO's default settings, but I did set both to 240 MHz as well.
I studied for one week nonstop (did nothing else), guess I need a break...lol. I should not have missed that fact.

Originally my plan was to learn the Arduino style with an RTOS superloop first (that went pretty well so far) and then ESP-IDF with FreeRTOS, just to see the difference. I structured it so that a switch would not be too difficult.
I use classes in my project, since I need to learn them a bit (that's the reason for C++).
Now everything is neatly organized, but I just found out that C++ is NOT supported natively in espidf.
I made classes work in espidf, but it comes with a lot of headaches, mostly the casting and type conversion stuff. I can make it work, but it takes a lot of extra typing, plus the extra stuff from espidf itself.
So my plan to try out both once I have added more stuff (e.g. a display) will probably NOT work, since tearing up my classes and converting them to structs would be too complicated, on top of everything else.

Guess I have to decide what I will use.
Arduino is of course easier, but with 5 sensors, a 3.5 inch touch display, Bluetooth, and maybe ESPHome, I am not sure the performance of a superloop is sufficient.
I would also like my display to be as smooth as possible, but going with RTOS and espidf would mean no classes, or no help. AI gets super confused with espidf and C++ as well.
Changing the watchdog timer at runtime was a pain; AI could NOT do it with C++.
I finally managed it with the documentation.

Anyway, enough said... and thank you for your ser... uh, patience :wink:

why would you say that?

here is the code:

They basically save the context, disable interrupts, make a copy of the millis count, re-enable interrupts and return the value...

Yes I get a little bit more than 11 million with my code as well. When you know how it's pretty easy :wink:

This test gave me two main insights: "redeclaring" a variable inside a tight loop is better for the compiler, and not only is millis() slow, but so is the alternative I found.

Thank you for the info. I suspected that "safety" features were responsible for it too.
That's why, in my last answer, I had already typed "millis() is bad" and then changed it to "slow".
But I guess this safety feature is NOT inside xTaskGetTickCount(), which would mean that it could return a wrong value??
Maybe extremely seldom, but that could be very bad if it happens in a running system.
It would be interesting to know how the value would be affected: only a little, or always a lot? Then you could implement a simple validity check, or just read it twice when it's that fast.
millis() gets even worse when called at short intervals.