Hello there.
I have been doing all this for only one month, but I am testing heavily to understand everything better. I heard that an RTOS has overhead and that superloops can actually be faster, depending on the implementation. But I don't understand how that could be, because my testing revealed the opposite.
This example of course has very short tasks/methods after the time checks, and the longer the tasks run, the less this effect shows, but it's always there, especially if you add more time checks.
Let me explain:
- Every 5000 ms, X happens (the watchdog gets reset).
- Every 200 ms, X happens (a small function executes).
- The rest of the time is spent in a tight loop to emulate work being done.
(This performance counter is actually one of the most useful functions I have written. It lets me spot wrong configs, bottlenecks, and some very interesting performance effects that would be difficult to spot with manual time measurement. It stays on the whole time I program and displays the score every second, so I can check performance as I write code without adding manual checks, even though I have good functions for that as well.)
(But that's just a side note.)
// Small periodic task
void task_smallTest(void *pvParameters) {
printf("%s created on core: %d\n", __FUNCTION__, xPortGetCoreID());
int32_t smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Initialize smallTask_lastResetTime
MyESP32::wdt_addTask();
while (true) {
MyESP32::wdt_reset();
// Check if 200 ms have passed since the last wake time
if (xTaskGetTickCount() * portTICK_PERIOD_MS - smallTask_lastResetTime >= 200) {
smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Update smallTask_lastResetTime
printf(".");
}
vTaskDelay(pdMS_TO_TICKS(1)); // Add a small delay to prevent busy-waiting
}
}
Now when I try to do the same in a superloop, I get only 25% of the performance: 3 million!
I use xTaskGetTickCount() * portTICK_PERIOD_MS because it's actually faster than esp_timer_get_time() (which is what Arduino's millis() is based on).
But the problem here is that the time checks take up a huge percentage of the loop, because the individual tasks are very small. Of course you can do the time check once and save it in a variable, but then it gets imprecise, especially when the tasks get bigger. Also, it only gives 7 million.
At the same time, this showed that the RTOS can do these time checks from one tick to the next without seeming to take up any CPU power. I have of course pinned everything to one core to make it fair.
This is pretty insightful, if there are no other methods to do that with a superloop.
It would mean a superloop can never be faster; the difference may only get smaller as task execution gets longer.
But the RTOS with three timed tasks is even 8% faster than just one check in main, with:
if time > 1000ms, then..
And this is with everything inside the same scope in the superloop.
If you have a scope change from calling functions, everything gets even worse for superloops, while the RTOS has the scope change built in. That's actually crazy, if there isn't a better way for a superloop.
No need to write my code for me; I just want to know how to do this:
after 5000ms, do that.
after 1000ms, do that.
after 200ms, do that.
Didn't we cover this in a previous thread of yours? Rather than multiplying xTaskGetTickCount() by portTICK_PERIOD_MS at runtime, divide your period by portTICK_PERIOD_MS (which is done at compile time). Also, use TickType_t rather than int32_t for any variable that holds a value returned by xTaskGetTickCount().
And your code doesn't match your description. Your code has 1000 ms and 50 ms while your description has 5000 ms and 200 ms. Surely that alone can affect your measurements.
Also, I bet the RTOS has a much more efficient way to check for task readiness (for tasks that call vTaskDelay) than running xTaskGetTickCount() * portTICK_PERIOD_MS lots of times. In other words, most tasks will be sitting in some sort of "delay" queue until their delay time expires, and then the RTOS will start it up again.
Also, esp_timer_get_time() returns an int64_t value in microseconds, while Arduino's millis() returns an unsigned long value in milliseconds, so the two aren't directly interchangeable.
If you're already using vTaskDelay(), use it for timing your task. Then there's no need to use xTaskGetTickCount() at all.
Edit: of course performanceCounter will always be printed out as 1, but at least you'll actually be more efficient. This is a good lesson that you have to be careful how you measure performance (make sure you're measuring the right thing).
Yes, a lot faster, but that is not really the point! I am not asking how to make the RTOS version faster, but how to achieve the same thing in a superloop without losing 80% of the performance.
It is actually slower: 11.97 million vs. 12.06 million (probably because comparing against a saved value is slightly slower than doing a simple bit shift for the multiplication on the fly, or something like that).
BUT that is also NOT the point. It's not about making it 1-2% faster, since I already know how to make it 300% faster, but about how to do this in a superloop without losing 80% of the performance:
after 5000ms, do that.
after 1000ms, do that.
after 200ms, do that.
Also, the 50/200 ms thing was left over from testing; I mixed that up, but it also makes "no" difference.
A function that takes 2 microseconds does not affect the main task with its 12 million, whether it runs 20 or 5 times a second. But good spotting.
Yes, I should use TickType_t and not uint32_t, even if they are the same on my system. TickType_t is helpful if you switch to a 16-bit embedded target or something, but it also makes NO performance difference.
Not really, I tested it all. I have only been testing for two weeks.
xTaskGetTickCount IS RTOS!!
In the background, the RTOS is actually doing similar stuff, if not the same, down the line.
No difference.
I actually used this method in main, by the way. (It was quite the eureka moment when I figured out by myself that I could use it in my tasks. It got me from 6 million to 12 million for two tasks.) And as I said, xTaskGetTickCount() has the same speed.
DELAY(5000) = vTaskDelay
But christop, these are just small details. I am just wondering how the superloop guys do this.
Any time check costs performance, and the smaller the "functions", the bigger the effect of the time checks. But I have heard people claim that superloops can be faster, and I wonder how that can be.
I have only 3 time checks. What happens if there are 20-30? That eats a huge amount of performance. And after my first display test at 480x320, I can use every performance gain.
I am not even using Arduino, but ESP-IDF.
But I remember that millis() goes straight to esp_timer_get_time(). My other time measurement functions actually use uint32_t, even though esp_timer_get_time() returns a 64-bit value. As long as I never reach such high numbers, uint32_t is enough, since I am not doing long runs on my ESP32 yet. Maybe I will need 64 bits then. Not important right now, but thanks for the hint.
As I said, I did that in main, and also several times for testing. But it is slightly slower.
Maybe function call overhead?
this one is actually a tiny bit faster:
vTaskDelay(DELAYtime / portTICK_PERIOD_MS)
By the way, your Arduino numbers would actually be a bit faster, since Arduino has a 1000 Hz tick rate and I still use the ESP-IDF default of 100 Hz. 1000 Hz actually gives me 10% more performance on top.
And as you pointed out, vTaskDelay does not work in the performance task. And no, it is NOT more efficient. And I am not sure what you mean by "testing the right thing".
It is just a number that shows me how much processing power is left after everything that needs to be done has been done. And it's independent of libraries or even of the framework (Arduino/ESP-IDF), since both give EXACTLY the same values when both run at 1000 Hz.
But the question still remains how to do this in a superloop without losing 80%:
after 5000ms, do that.
after 1000ms, do that.
after 200ms, do that.
How do you know it's not more efficient? Doing it that way allows RTOS to handle everything, including task delays, which as your tests seem to show are more efficient than calling xTaskGetTickCount() over and over again. I imagine RTOS has a queue of tasks to run (a run queue), and when all are doing a vTaskDelay the run queue is empty so there's nothing for it to do. And there's probably a timer list which is checked once per tick. This list can be designed to store delta times so that only the first timer needs to be decremented and checked for zero. Once the first timer is removed (and the task it's for is moved to the run queue), the next one can be checked for zero, and so on. That's the way UNIX v6 works (from the mid-'70s), and it'd be difficult to get more efficient than that.
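To make the delta-list idea above concrete, here is a minimal sketch in plain C++. It only illustrates the technique as described in this post; it is not how FreeRTOS is actually implemented, and the names DeltaTimerList, add, and tick are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical delta list: each entry stores the ticks remaining *after* the
// entry in front of it expires, so a tick only ever decrements the first entry.
class DeltaTimerList {
public:
    // Schedule task `id` to become ready `delay` ticks from now (delay >= 1).
    void add(int id, uint32_t delay) {
        assert(delay >= 1);
        auto it = entries_.begin();
        // Walk forward, subtracting the deltas of entries that expire earlier.
        while (it != entries_.end() && it->delta <= delay) {
            delay -= it->delta;
            ++it;
        }
        // The entry behind the new one must now count from the new one.
        if (it != entries_.end()) {
            it->delta -= delay;
        }
        entries_.insert(it, Entry{id, delay});
    }

    // Advance time by one tick and return the ids that became ready.
    std::vector<int> tick() {
        std::vector<int> ready;
        if (entries_.empty()) return ready;
        --entries_.front().delta;  // only the head is decremented
        while (!entries_.empty() && entries_.front().delta == 0) {
            ready.push_back(entries_.front().id);
            entries_.pop_front();
        }
        return ready;
    }

    bool empty() const { return entries_.empty(); }

private:
    struct Entry {
        int id;
        uint32_t delta;
    };
    std::list<Entry> entries_;
};
```

With delays of 5, 2, and 5 ticks added in that order, the list holds the deltas 2, 3, 0. The per-tick cost stays constant no matter how many tasks are waiting; only insertion has to walk the list.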
And what I mean by "testing the right thing" is that your tests for "performance" are measuring something that isn't exactly performance. Your tasks do almost nothing, so the program is almost entirely overhead. If you can make your tasks actually do something, performance will actually be measurable. And if your tasks actually do something, you're going to get a lot fewer than 12 million or 3 million iterations of work per second, and I imagine the overhead of either method is going to be mostly swallowed up by the actual work.
But one thing I would try in the "superloop" version is call xTaskGetTickCount() only once per loop and save its value in a TickType_t variable. That might be enough to give you a significant speedup (any little reduction in overhead, which is already quite small, should give you a significant boost in "performance").
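A rough sketch of that "read the tick count once per pass" idea, in plain C++ so it can be tried off-target. Tick is only a stand-in for TickType_t, and PeriodicJob and superloopPass are hypothetical names for this example; the `runs` counter stands in for the real work.

```cpp
#include <cassert>
#include <cstdint>

using Tick = uint32_t;  // stand-in for a tick type such as TickType_t

struct PeriodicJob {
    Tick period;    // interval in ticks
    Tick lastRun;   // tick at which the job last ran
    uint32_t runs;  // number of times the job fired (stand-in for real work)
};

// One pass of the superloop. `now` is read once by the caller and shared by
// all checks. Unsigned subtraction stays correct across counter wraparound.
inline void superloopPass(Tick now, PeriodicJob* jobs, int count) {
    for (int i = 0; i < count; ++i) {
        if (static_cast<Tick>(now - jobs[i].lastRun) >= jobs[i].period) {
            jobs[i].lastRun = now;
            ++jobs[i].runs;  // replace with the real work
        }
    }
}
```

On the target, the loop body would be something like `Tick now = xTaskGetTickCount(); superloopPass(now, jobs, 3);` with the periods given in ticks rather than milliseconds, so no multiplication is needed per check. The trade-off is that `now` is slightly stale for the later checks if an earlier job runs long.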
Also, if the RTOS is faster, just use it? Personally I prefer to avoid superloop-type code, just because all my software engineering training has been in paradigms where blocking is reasonably efficient. FreeRTOS claims something around 84 cycles per context switch. That's 1/1000th of the total capacity of, say, an RP2040 Cortex M0+ core, doing 2000 context switches a second. Seems like nothing much to worry about.
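As a quick sanity check of that estimate, the arithmetic can be written out as below. The 84 cycles and 2000 switches per second are the figures quoted above; the 125 MHz core clock is an assumption for an RP2040, and contextSwitchLoad is just a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of a core's cycles spent on context switches.
constexpr double contextSwitchLoad(uint64_t cyclesPerSwitch,
                                   uint64_t switchesPerSecond,
                                   uint64_t coreHz) {
    return static_cast<double>(cyclesPerSwitch * switchesPerSecond) /
           static_cast<double>(coreHz);
}

// 84 * 2000 = 168,000 cycles per second spent switching.
// The 125 MHz clock is an assumed value, not a figure from this thread.
constexpr double kLoad = contextSwitchLoad(84, 2000, 125000000);
static_assert(kLoad > 0.001 && kLoad < 0.002, "roughly 0.13% of the core");
```

Under that assumption the load comes out at about 0.13% of the core, which is the same order of magnitude as the "1/1000th" above.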
Hello Christop.
I did not forget you. This was the first time since I started that real life forced me to take a break from programming for a whole day. It was much needed, lol.
Well, I told you that it is a bit slower. 10%, in fact.
This is true for vTaskDelay and xTaskDelayUntil (same performance), because when you look under the hood, it also uses xTaskGetTickCount(), but with more overhead:
"xLastWakeTime = xTaskGetTickCount ();"
So this is really the fastest (tested with the shortest interval, 1 ms):
Multiplication is the fastest; even precalculation is not faster:
const int ticksThreshold = 5000 / portTICK_PERIOD_MS; // Calculate ticks threshold
if (xTaskGetTickCount() - smallTask_lastResetTime >= ticksThreshold) {
Sounds reasonable. I had a similar idea for a superloop, but much more basic:
You order your superloop from fastest to slowest (A-Z). Then, once you know the runtime of each, you could nest the if checks: B behind A, C behind B. But this quickly gets very messy, and it does not work when conditions other than elapsed time are used.
The above is also true for your suggestion. You would need to know the runtime of each "task", and precision gets horrible, with varying timings for all your checks, if the runtimes of the "tasks" vary.
Well, it does exactly that: it measures the performance of the timing checks. And exactly that "overhead" is what I wanted to test. And yes, I have already mentioned twice that the effect gets less pronounced when more work is involved.
Well, christop, that is exactly what I am curious about now too, and it would have been my next step anyway.
So I will let you take part in my testing:
For a more "real-life scenario", let's try this:
Watchdog reset every 5000ms
performance counter every 1000ms
big_task every 50 ms (printAlignedArrays: several hundred lines of code, several functions involved; it gets printed twice, with changed made-up values).
small task every 1 ms (converting a long string to a DotString in a for loop of 1000 iterations every 1 ms = 1 million times per second).
string_toDotString example: 857364568920104826735 = 857.364.568.920.104.826.735
(A small function in C that I wrote as a personal challenge: no if statements in the core of the function, and only one string creation allowed (= the result). It took me pretty long to find a solution, but it is lightning fast, without even using pointers or references. AI could only beat it by 10%, with heavy use of pointers.)
printAlignedArrays: (Explanation at the end, for the ones who want to know)
(Takes two vector/arrays as input only) You can mix all types into one list, rest works automatically.
Would you agree that this is a more realistic workload?
I am no RTOS fanboy. I am unbiased; I did all this to find out how much performance "overhead" an RTOS brings. Memory on my esp32-s3-N16R8 is not a problem.
But this is already an extreme example, with only 4 time checks. We both know that this list normally grows. I need two time checks just for my non-blocking sensor readings, and then the readings themselves are done instantly (1 tick). So these two time checks would also be mostly "overhead" in a superloop, as you said.
But ok...
Ok, enough explanation, here are the RESULTS:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
RTOS code:
#include ".core/_extern/CoreCutils.h"
#include ".core/_extern/CoreCPPutils.h"
#include "driver/uart.h"
//------------------------------------------- My Includes -------------------------------------------
#include ".core/formating/variant_types.h"
#include ".core/formating/formating.h"
#include ".core/config.h"
#include ".core/colors.h"
#include ".core/logging/logging.h"
#include ".core/logging/output.h"
#include ".core/logging/TIME_MACROS.h"
#include ".core/myESP32/myESP32.h"
#include ".core/myESP32/memory.h"
#include "printTest.h"
// TaskHandle_t task_performanceHandle = NULL;
// Performance counter task
void task_perf(void *pvParameters) {
size_t frequency = 1000;
LOG_INFO("%s created on core: %d | every: %zu ms \n", __FUNCTION__, xPortGetCoreID(), frequency);
uint32_t performance_lastResetTime = 0;
performance_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS;
uint32_t performanceCounter = 0;
MyESP32::wdt_addTask();
while (true) {
performanceCounter++;
if (xTaskGetTickCount()* portTICK_PERIOD_MS - performance_lastResetTime >= frequency) {
LOG_INFO("core: %d | RTOS Performance score: %s|\n", xPortGetCoreID(), Formating::integer_toDotString(performanceCounter).c_str());
performanceCounter = 0;
performance_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS;
MyESP32::wdt_reset();
vTaskDelay(pdMS_TO_TICKS(1));
}
}
}
// bigger periodic task
void task_biggerTest(void *pvParameters) {
size_t frequency = 50;
LOG_INFO("%s created on core: %d | every: %zu ms\n", __FUNCTION__, xPortGetCoreID(), frequency);
uint32_t smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Initialize smallTask_lastResetTime
MyESP32::wdt_addTask();
while (true) {
// MyESP32::wdt_reset();
// Check if `frequency` ms (50 here) have passed since the last run
if (xTaskGetTickCount() * portTICK_PERIOD_MS - smallTask_lastResetTime >= frequency) {
smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Update smallTask_lastResetTime
printArrayTest();
}
}
}
// Small periodic task
void task_smallTest(void *pvParameters) {
size_t frequency = 1;
LOG_INFO("%s created on core: %d | every: %zu ms \n", __FUNCTION__, xPortGetCoreID(), frequency);
uint32_t smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Initialize smallTask_lastResetTime
MyESP32::wdt_addTask();
while (true) {
// MyESP32::wdt_reset();
if (xTaskGetTickCount()* portTICK_PERIOD_MS - smallTask_lastResetTime >= frequency) {
smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Update smallTask_lastResetTime
for (size_t i = 0; i < 1000 ; i++) {
const char* test = "7864507604586ß3850297q3456403534ß85634q860345878455q34wertr092345";
std::string result= Formating::string_toDotString(test,3);
}
// LOG_RAW(".");
}
// vTaskDelay(pdMS_TO_TICKS(1)); // Add a small delay to prevent busy-waiting
}
}
extern "C" void app_main() {
uart_set_baudrate(UART_NUM_0, 115200);
LOG_INFO_MENU("STARTING MAIN ......\n");
LOG_INFO("...\n");
MyESP32::init();
// Create Task_smallTest ----------
xTaskCreatePinnedToCore(task_smallTest, "task_smallTest", 2048, NULL, 1, NULL, 0);
// Create task_biggerTest ---------
xTaskCreatePinnedToCore(task_biggerTest, "task_biggerTest", 4096, NULL, 1, NULL, 0);
// Create task_performanceCounter -------------------
xTaskCreatePinnedToCore(task_perf, "task_perf", 2048, NULL, 1, NULL, 0);
vTaskPrioritySet(NULL, 9);
while (true) {
DELAY(5000);
MyESP32::wdt_reset();
LOG_INFO_MENU("core: %d | RESET main\n", xPortGetCoreID());
}
}
RTOS:
So originally, 100% is 12 million.
That means I have about 30% free system/CPU power left.
Ok, not bad.
I was really curious about the performance of the superloop. Maybe have a guess yourself.
Here is the code:
Wow.
That means I have nearly no performance left with the superloop.
Or in other words: with RTOS I have 42 times the performance left compared to the superloop.
And this example heavily favours the superloop, because I am maxed out and have only 4 time checks in use. And we both know that is not much.
At the very beginning, I expected RTOS to be slower, so this was very enlightening.
There is no way a superloop is faster than, or even close to, RTOS.
Not with the tools I have discovered so far. Maybe there is a better way, but it will for sure not be simpler than RTOS, because you need to use checks, which RTOS can do nearly for "free".
And for a 480x320 display with touch, this could mean the difference between feeling laggy and feeling smooth in terms of fps.
And I can't help it, but making code as fast as possible is somehow really fun for me. I have written all my functions in C and C++, tested every little detail, and learned a lot. In the beginning, I also made my program work in both Arduino AND ESP-IDF (the same program, not two). At first I was pretty lost with all of it: what C and C++ even are, classes, and all that. Not to mention the hardware. I needed 30+ hours to make my display show anything. Not even 5 different AIs could help me. In the end it came down to the way Chinese manuals label their graphics. (I don't want to talk about it.)
But big thanks to all the AIs anyway; I could not have learned so fast without them.
But one rule: I only use code if I understand it 100%. If not, I dissect and test it until I do.
Now I was just tackling the question of RTOS overhead...
I know I am a bit crazy, but I know I am not the only one.
Here is my cool little function, for the one person who is left after that much text.
printAlignedArrays:
It's the biggest of all the helper functions I have written so far, and it's pretty cool:
It takes only 2 vectors/arrays as input: one of strings for the tags, and another of a variant type for the values.
Yes, you can mix float, string, bool, and ALL of the C and C++ integer types in one vector/array.
You can have presets if you want, like memory or other things you are interested in, use them as a starting point for debugging, and simply add the values you also want to see with push_back(). The performanceCounter will be one of the values. (This is what I was originally going to do, until I got "sidetracked" again by whether to use a superloop or RTOS.)
This will be used for all my ESP32 programming.
So basically just two lists. Color coding is automatic.
The function saves the last values by their "TagName" and prints the deltas, color coded, if they exist.
If the values are the same, the entry gets "grayed out". That's why the first time it's printed, it's only one line, and all gray.
Higher = green, lower = red.
true and false have their own colors. Strings are colored blue if they have changed since the last update. You can show the list by simply calling the function, either by placing the call in the code or on a time interval. The spacing is done automatically and centered, but with everything still aligned underneath each other. With all the different outputs and formats, that was not so easy, but the result is pretty good.
So it's really easy to spot changes at a glance, and you can pack in a lot of values.
I even use this for logging to a file.
I originally wrote it for my SHT31 driver, to print registers, and it ended up getting bigger and bigger, lol.
It was quite a challenge to work with a variant type for the first time. If you have ever tried it, you know why it's a few hundred lines long and spans several functions. You cannot even do simple things like: if (value != 0)
I am thinking about adding a min/max line, but I will leave it for now, because for printing a simple register it's already overkill...
No, seriously, I think it's helpful even if you just want to print a few variables. Why not print them ordered like that? Maybe add default values.
It was a good exercise; I learned a lot.
And very likely I won't really need it, since I bought an ESP-Prog anyway...
Maybe I do.
Hello jaguailar. Your post is very interesting. You know, I started all of this only a few weeks ago, and I could have installed (portable), learned, and written everything without the display in a few days. In fact, I had already done all the parts I needed separately, but then I wanted to structure everything from the ground up, learn classes, and build simple tools like my own logger with several filters and precompile filtering. That led me into a journey of several weeks of learning and building only the groundwork for all my projects, my basic toolkit. (You probably know what I mean.) And it's already pretty complex, not to mention the several classes I already have for the project itself. But I intentionally want to structure it more professionally to learn the most, and not just hack everything together with globals.
So 100% modular, and way overengineered.
I intentionally set everything up so it can also be used with RTOS in case I want that. I tend to fall into every rabbit hole anyway, but I was trying to avoid RTOS, for now at least, so as not to get overwhelmed. But I guess I have no other choice after these numbers.
But maybe you can understand why I don't just use it.
Also, this test was meant to find that out, since I did not expect RTOS to be that fast. 84 cycles seems pretty fast, the RP2040 is also slower than an ESP32, and your numbers confirm my (for me shocking) results.
And I was not even talking about context switching with christop (the post was already long enough). But I realized it myself. I tested the impact of context switching right at the start; it's crazy how much it eats, and RTOS seems to do it on the fly. I am pretty sure I can improve the RTOS version even more. I made a nice time measurement class and have not used it here, but I already suspect/know where my performance is going: the 1 million context switches of my task_smallTest. I am pretty sure I can avoid them by somehow making the function its own task and letting RTOS handle the context switching. I will surely pimp the shit out of it, and maybe learn automatic task creation/deletion as well, and its speed/overhead.
And not to mention interrupts... I am pretty sure it handles them just as fast. Damn, RTOS is really crazy good. But then I still could not really start with my project, because that would involve more learning first. Because if I use RTOS, then I have to go full RTOS, with message queues and stuff. Or maybe not? Thank god my isolated approach has only a few race scenarios, but I am still unsure how to do it all. With a superloop, I already know how to do everything; piece of cake, just logic. But RTOS will have a few surprises, I am sure, lol. And my overall structure and data flow are clear to me with a superloop, not so much with RTOS.
An example, if you are interested?
hmm ok:
Growcontroller, 5 sensors, pwm, Display, touch, .... for now.
All modules are isolated, get data, no questions asked, process and send it on.
All data goes through the datamanager: a single source of truth. The datamanager has no reason to talk back to the sensormanager class, or even to the sensor classes themselves. So normally the data flow is pretty straightforward. But every module can log its internal state via macros > logging class > output class. All other data gets collected by the datamanager (or pushed to it, not sure yet), then sent to the climatecontroller and to the display when needed, and saved (in a database) via formating > output.
So in a superloop I was going for a pull approach via getters.
superloop:
Every 2 seconds the datamanager calls a function in the sensormanager; the sensormanager then calls all sensors, prefilters their readings, handles the sensors in general, and sends the data to the datamanager.
This ensures the datamanager has fresh data and can immediately process it and send it to the display when, for example, an interrupt requests it.
The original intention was to run the superloop in the datamanager, being modular and isolated, but the datamanager can be in a tight scope with the display, which eats all the performance anyway.
But given my results with RTOS, it is not really necessary to do it like that. I am not even sure I CAN do it like that. Maybe it's better to do the pull approach. Before, it was pretty straightforward, since the datamanager does not need to talk back to the sensormanager. The datamanager needs instances for sure, but sensormanager > datamanager could even have gone through a static instance. But with RTOS I cannot use a getter in the sensormanager (which gets the data from the sensors), right? I have to use message queues now, right? I am not sure how all the instances play together with RTOS. Also, the timing is not perfect now. Should the datamanager collect the data, or should the sensormanager now push it periodically to the datamanager? If the datamanager does not request it right before use, the data can be 1999 ms old in the worst case. My overall game plan was pretty solid, but now with RTOS I feel a bit lost again about how to structure it and how the data flow should work.
An RTOS is not quite as friendly to blocking code as, say, golang, because unlike a general purpose computer, we don't have essentially infinite RAM for stacks. So there is still going to be a little more in the way of orchestration of data than there might be in a desktop program.
That being said, I do think using an RTOS is the right call for any project that doesn't need constant baremetal levels of timing, and whose intended duration/level of complexity is more than a few days worth of work. But I'm saying that as a complete nonexpert. I just want to be clear about how much authority I have to speak (zero).
With regards to collecting data from a group of sensors, which needs to get rendered to a display: I know next to nothing about your use case and its tolerances, but here's the code I wrote for a weatherstation I have in my backyard. Maybe it would be helpful? Or maybe not. It's the blind leading the blind here.
One other thing I might add -- don't let yourself get too hung up on "doing it right." The best way to learn to do it right is to do it wrong, then rewrite it when you see how you screwed it up.
Thanks for your encouraging words. They do help. And it was the first time I have seen someone else's code (except basic blink examples and such).
The weatherstation did not really help. It was only A-to-B, if I understand it correctly, but my problem is more complex. I am not sure whether to pull or push, and how to structure the data flow with the datamanager module. With a superloop I had a perfect game plan, but now things are cloudy again, since I have to use RTOS. But ok, I guess I just need to start. I have debated for pages and pages with ChatGPT about how to use RTOS in the different ways and how to structure the overall data. But I planned a custom hybrid approach, because pure RTOS had too much overhead, or so I thought, from using queues for everything to transfer data from A via B to C.
But anyway, I took a 3-day break from programming, and it was needed.
Oh, hello jim-p. I remember you helped me figure out how to read my Chinese graphics the right way, lol. (I did mention it here.)
No, I went full ESP-IDF (normal RTOS). At the start I went through the hassle of making the program work in both Arduino and ESP-IDF at the same time (one program).
But that was too tedious.
I like to program everything myself and not use other tools: I make my own logger and measurement tools, which are much better for me than the esp_log one, because I can control the formatting better and still have functionality like precompile filtering and global levels plus a whitelist for modules. So I went with ESP-IDF. It was just the natural choice.
I am not sure how the data flow should be managed. I explained it a bit in post #9.
But if you really want to give a suggestion, I could explain in more detail.
Well, the 3-task tests are fully posted, RTOS and superloop.
Just replace LOG_INFO with printf, and the watchdog reset with the actual one: esp_task_wdt_reset()
DELAY(5000) = vTaskDelay(5000 / portTICK_PERIOD_MS)
And add all the includes.
And put your own tasks in it.
But my whole basic ESP32 default project, without the growcontroller stuff, is already 20 files. Or do you really want all of it?
I may make it public when it is finished, not sure where. Maybe I need to start looking into GitHub too.
I'm only asking for the performance tests (the task-based version and the superloop version). As they are, they use code that's not posted or is posted separately (MyESP32, Formating, and the like), so it's not that easy to collect it all to play around with the same code that you're using to test performance. I have a few ideas to try, but it's not worth it if I have to figure out what all I need just to make your original code build.
Here's what I'd suggest, based on your code in post #1 (though I noticed your code uses last_wdt_reset in two different "tasks" which results in the second one not being used, and the watchdog won't ever be reset). This technique won't accumulate errors in timing (unless tasks take longer than their periods). It'd be interesting to see if this improves the performance measurement:
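The code from this post is not reproduced here. Purely as an illustration, and not as the code that was originally posted, one common way to get non-accumulating timing is to advance each deadline by exactly one period instead of resetting it to the current time. Tick is a stand-in for TickType_t, and Deadline and due are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>

using Tick = uint32_t;  // stand-in for a tick type such as TickType_t

// Hypothetical helper: a deadline that advances by exactly one period each
// time it fires, so lateness in one cycle is not carried into the next.
struct Deadline {
    Tick next;    // tick at which the job is next due
    Tick period;  // interval in ticks

    // Returns true when `now` has reached the deadline. Looking at the
    // unsigned difference as a signed value keeps the test valid across
    // counter wraparound.
    bool due(Tick now) {
        if (static_cast<int32_t>(now - next) < 0) return false;
        next += period;  // not `next = now + period`, which would drift
        return true;
    }
};
```

If the loop only gets around to checking every 7 ticks, a 100-tick job fires at 105, 203, 301 and so on: each firing can be a little late, but the lateness does not add up, as long as the work takes less than one period.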
I have not tried or even read your suggestion yet; I was busy making you a working version.
Not pretty, but it works.
This is for ESP-IDF! If you use Arduino, you mostly have to change the watchdog reset stuff.
You can comment out all the tasks you don't need here:
When no printing is involved, the superloop gets 4-5 times faster.
It seems serial printing is blocking RTOS.
But RTOS is still 8x faster, in a design that heavily favors the superloop (only 4 time checks).
In reality the gap will be much bigger, the more time checks and/or interrupts there are.
If you are not getting 12 million on an ESP32-S3, then add this to platformio.ini:
build_flags =
-DCORE_DEBUG_LEVEL=0
Or deactivate all the debug stuff in sdkconfig.
God, I don't even know if you can use ESP-IDF... but here you go, I tested it.
Both versions are included:
For RTOS, use it as is; for the superloop, comment out app_main() and uncomment the next app_main() underneath.
#include "driver/uart.h"
#ifdef __cplusplus
#include <variant>
#include <string>
#include <vector>
#include <cstdlib>
#include <ctime>
#endif
#include <stdio.h> // For fprintf and printf
#include <string.h> // For strerror and strrchr
#include <errno.h> // For errno
#include <stdint.h> // For int64_t
#include "esp_task_wdt.h"
#include "esp_pm.h" // Include ESP-IDF power management API
//------------------------------------------- My Includes -------------------------------------------
#ifdef __cplusplus
#include <string> // For std::string
#include <cstring> // For strcmp
#include "freertos/FreeRTOS.h"
#include "freertos/task.h" // For vTaskDelay and xTaskGetTickCount
#endif
std::string string_toDotString(const char* string, size_t spacing) {
// Temporary buffer to hold the formatted string
size_t len = strlen(string);
size_t segments = len / spacing;
size_t mod = (len % spacing) ;
size_t s = 1;
char buffer[len + segments+1];
if (mod != 0) {
for (size_t pre = 0; pre < mod;pre++) {
buffer[pre] = string[pre];
}
buffer[ mod ]= '.';
} else{
s=0;
mod--;
len--;
}
for (size_t j = mod+1; j < len + segments ; j+= spacing) {
for (size_t i = 0; i < spacing; i++) {
buffer[j+i] = string[j+i-s];
}
j++;
s++;
buffer[ j + spacing -1]= '.';
}
buffer[len + segments] = '\0';
if (buffer[0] == '-' && buffer[1] == '.') {
buffer[len + segments] = '\0';
// Shift characters to the left by one position starting from index 2
for (int i = 1; buffer[i + 1] != '\0'; ++i) {
buffer[i] = buffer[i + 1];
}
// null-terminated
buffer[strlen(buffer) - 1] = '\0';
}
return buffer;
}
// Performance counter task
void task_perf(void *pvParameters) {
size_t frequency = 1000; // Reporting period in ms
printf("%s created on core: %d | every: %d ms \n", __FUNCTION__, xPortGetCoreID(), (int)frequency);
uint32_t performance_lastResetTime = 0;
performance_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS;
uint32_t performanceCounter = 0;
while (true) {
performanceCounter++;
if (xTaskGetTickCount()* portTICK_PERIOD_MS - performance_lastResetTime >= frequency) {
printf("core: %d | RTOS Performance score: %s|\n", xPortGetCoreID(), string_toDotString(std::to_string(performanceCounter).c_str(),3).c_str());
performanceCounter = 0;
performance_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS;
vTaskDelay(pdMS_TO_TICKS(1));
}
}
}
// bigger periodic task
void task_biggerTest(void *pvParameters) {
size_t frequency = 50; // Period in ms
printf("%s created on core: %d | every: %d ms\n", __FUNCTION__, xPortGetCoreID(), (int)frequency);
uint32_t smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Initialize smallTask_lastResetTime
while (true) {
// MyESP32::wdt_reset();
// Check if `frequency` ms (50 here) have passed since the last run
if (xTaskGetTickCount() * portTICK_PERIOD_MS - smallTask_lastResetTime >= frequency) {
smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Update smallTask_lastResetTime
// printf(".");
// your big function here !!!
}
}
}
// Small periodic task
void task_smallTest(void *pvParameters) {
size_t frequency = 1; // Period in ms
printf("%s created on core: %d | every: %d ms \n", __FUNCTION__, xPortGetCoreID(), (int)frequency);
uint32_t smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Initialize smallTask_lastResetTime
while (true) {
// MyESP32::wdt_reset();
if (xTaskGetTickCount()* portTICK_PERIOD_MS - smallTask_lastResetTime >= frequency) {
smallTask_lastResetTime = xTaskGetTickCount()* portTICK_PERIOD_MS; // Update smallTask_lastResetTime
for (size_t i = 0; i < 1000 ; i++) {
const char* test = "7864507604586ß3850297q3456403534ß85634q860345878455q34wertr092345";
std::string result= string_toDotString(test,3);
}
// LOG_RAW(".");
}
}
}
void cpu_setClock()
{
esp_pm_config_t pm_config = {
.max_freq_mhz = 240, // Maximum frequency
.min_freq_mhz = 240, // Minimum frequency
.light_sleep_enable = false // Disable light sleep to keep fixed frequency
};
esp_err_t err = esp_pm_configure(&pm_config);
if (err != ESP_OK)
{
printf( "Failed to configure power management: %s\n", esp_err_to_name(err));
}
err = esp_pm_get_configuration(&pm_config);
if (err == ESP_OK)
{
printf("CPU frequency set to: %d/%d MHz\n",
pm_config.min_freq_mhz,
pm_config.max_freq_mhz
);
if (pm_config.light_sleep_enable == true) {
printf("CPU light sleep: ON\n");
}
else {
printf("CPU light sleep: OFF\n");
}
} else if (err == ESP_ERR_NOT_SUPPORTED) {
printf("Cannot set Clock Frequency: Please enable:'CONFIG_PM_ENABLE=y' in sdkconfig\n");
}
else {
printf("Failed to get power management configuration: %s\n", esp_err_to_name(err));
}
}
void wdt_setTimeout(int timeoutMs)
{
printf("Setting WDT timeout to: %d ms\n", timeoutMs);
esp_err_t err;
// Deinitialize the default WDT
err = esp_task_wdt_deinit();
if (err != ESP_OK && err != ESP_ERR_INVALID_STATE)
{
printf("Error deinitializing WDT: %s\n", esp_err_to_name(err));
return; // Exit if deinitialization fails
}
else {
printf("WDT de-initialized successfully\n");
}
// ESP-IDF-specific WDT initialization
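esp_task_wdt_config_t config = {}; // Zero-initialize so no field is left indeterminate
config.timeout_ms = static_cast<uint32_t>(timeoutMs); // Use the requested timeout instead of a hard-coded value
config.idle_core_mask = 0; // Do not subscribe any core's idle task
config.trigger_panic = true; // Trigger panic on timeout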
// Initialize watchdog with new configuration
err = esp_task_wdt_init(&config);
if (err != ESP_OK)
{
printf("Error initializing WDT in ESP-IDF: %s\n", esp_err_to_name(err));
return ; // Exit if initialization fails
}
else {
printf("WDT initialized successfully\n");
}
// Add current task to WDT monitoring
err = esp_task_wdt_add(NULL); // NULL is equivalent to xTaskGetCurrentTaskHandle()
if (err != ESP_OK)
{
printf("Error adding task to WDT in ESP-IDF: %s\n", esp_err_to_name(err));
return; // Exit if task addition fails
}
printf("Task added to WDT monitoring:\n");
printf("Watchdog: ON\n");
// m_wdtEnabled = true; // WDT was enabled successfully
}
extern "C" void app_main() {
uart_set_baudrate(UART_NUM_0, 115200);
printf("STARTING MAIN ......\n");
printf("...\n");
cpu_setClock();
wdt_setTimeout(9000);
// Create Task_smallTest ----------
xTaskCreatePinnedToCore(task_smallTest, "task_smallTest", 2048, NULL, 1, NULL, 0);
// Create task_biggerTest ---------
xTaskCreatePinnedToCore(task_biggerTest, "task_biggerTest", 4096, NULL, 1, NULL, 0);
// Create task_performanceCounter -------------------
xTaskCreatePinnedToCore(task_perf, "task_performance", 2048, NULL, 1, NULL, 0);
vTaskPrioritySet(NULL, 9);
while (true) {
vTaskDelay(5000 / portTICK_PERIOD_MS);
esp_task_wdt_reset();
printf("core: %d | RESET main\n", xPortGetCoreID());
}
}
// +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// For comparison, here's the superloop version (comment out the RTOS version when using this)
// extern "C" void app_main() {
// uart_set_baudrate(UART_NUM_0, 115200);
// printf("STARTING MAIN ......\n");
// printf("...\n");
// cpu_setClock();
// wdt_setTimeout(9000);
// uint32_t performance_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// uint32_t smallTest_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// uint32_t biggerTest_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// uint32_t watchdog_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// uint32_t performanceCounter = 0;
// while (true) {
// performanceCounter++;
// // Watchdog reset
// if (xTaskGetTickCount() * portTICK_PERIOD_MS - watchdog_lastResetTime >= 5000) {
// watchdog_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// esp_task_wdt_reset();
// }
// // Performance counter output
// if (xTaskGetTickCount() * portTICK_PERIOD_MS - performance_lastResetTime >= 1000) {
// printf("core: %d | SUPERloop Performance score: %s|\n", xPortGetCoreID(), string_toDotString(std::to_string(performanceCounter).c_str(),3).c_str());
// performanceCounter = 0;
// performance_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// }
// // bigger test
// if (xTaskGetTickCount() * portTICK_PERIOD_MS - biggerTest_lastResetTime >= 50) {
// // Your bigger test here
// biggerTest_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// }
// // small test
// if (xTaskGetTickCount() * portTICK_PERIOD_MS - smallTest_lastResetTime >= 1) {
// for (size_t i = 0; i < 1000 ; i++) {
// const char* test = "7864507604586ß3850297q3456403534ß85634q860345878455q34wertr092345";
// std::string result= string_toDotString(test,3);
// }
// smallTest_lastResetTime = xTaskGetTickCount() * portTICK_PERIOD_MS;
// }
// }
// }
Oh, now I see your "solution". I have talked about it 3 times already!! Have you read everything?
If you want to be precise, B has to take the runtime of A into account.
And E has to take the runtime of A+B+C+D into account. That gets messy really fast, or very imprecise.
If you have conditions AND time checks, it gets even messier, and pretty quickly a lot slower.
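To put numbers on that effect, here is a rough host-side sketch in plain C++ (it is not ESP-IDF code and does not run on the target). The Job struct, simulate_superloop() and all runtimes are made up for illustration. It only models a virtual millisecond clock and the same "reset the timestamp to the current time" pattern used in the loops above, under the assumption that each job blocks the loop for a fixed time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical job description, invented for this simulation.
struct Job {
    uint32_t period_ms;      // how often the job should run
    uint32_t runtime_ms;     // how long one run blocks the loop
    uint32_t last_run_ms;    // timestamp taken when the job last started
    uint32_t worst_late_ms;  // largest delay past the due time seen so far
};

// Runs a superloop on a virtual millisecond clock and returns, for each job,
// the worst delay between its due time and its actual start.
std::vector<uint32_t> simulate_superloop(std::vector<Job> jobs, uint32_t duration_ms) {
    assert(!jobs.empty());
    for (const Job& job : jobs) {
        assert(job.period_ms > 0);  // a zero period would never let the clock advance
    }
    uint32_t now = 0;
    while (now < duration_ms) {
        bool ran_any = false;
        for (Job& job : jobs) {
            const uint32_t due = job.last_run_ms + job.period_ms;
            if (now >= due) {
                const uint32_t late = now - due;
                if (late > job.worst_late_ms) {
                    job.worst_late_ms = late;
                }
                job.last_run_ms = now;  // reset to the current time, as in the loops above
                now += job.runtime_ms;  // every job behind this one now sees a later clock
                ran_any = true;
            }
        }
        if (!ran_any) {
            now += 1;  // nothing was due: let one virtual millisecond pass
        }
    }
    std::vector<uint32_t> worst;
    for (const Job& job : jobs) {
        worst.push_back(job.worst_late_ms);
    }
    return worst;
}
```

With three jobs A, B and C that each have a 10 ms period and a 3 ms runtime, calling simulate_superloop({Job{10, 3, 0, 0}, Job{10, 3, 0, 0}, Job{10, 3, 0, 0}}, 100) reports a worst delay of 0 ms for A, 3 ms for B and 6 ms for C. Each job inherits the runtime of everything in front of it, which is exactly the A+B+C+D bookkeeping described above.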