I'm profiling a critical function and I need to keep the execution time to the very minimum.
After lots of headaches, I have realized that if I declare and define the function in a different .h and .cpp file than the one that is calling the function, the computation time is much larger!!
Here's an example (obviously useless, just for demonstration):
main.ino
#include "fooAux.h"
#define N 100

unsigned long t1, t2;

int16_t fooIn(float x)
{
  if (x > 0.5f)
    return 2;
  else
    return 1;
}

void setup() {
  Serial.begin(115200);
}

void loop()
{
  float x = -1.0f;

  // Function defined in current .cpp file
  t1 = micros();
  for(uint8_t i = 0; i < N; ++i)
  {
    fooIn(x);
    asm("");
  }
  t2 = micros();
  Serial.println("Time fooIn: " + String((float)(t2-t1)/N));

  // Function defined in aux.cpp
  t1 = micros();
  for(uint8_t i = 0; i < N; ++i)
  {
    fooAux(x);
    asm("");
  }
  t2 = micros();
  Serial.println("Time fooAux: " + String((float)(t2-t1)/N));

  delay(2000);
}
Yes of course, any function in a separate file cannot be easily inlined.
The compiler will often optimise calls to small functions it knows all about by inlining
its code at the call site, rather than generate code to call it.
Separate files are compiled separately and the resulting code combined by the linker
program which fixes all the calls between functions to have the right run-time addresses,
but a traditional linker cannot inline the code (that needs link-time optimisation, e.g. GCC's -flto), and calls across files use the most general calling strategy.
But you also have to consider whether the compiler has completely optimised away the call altogether:
if a function causes no side effects and its result is not used, the compiler can simply ignore it, since
nothing else in the program can ever tell if it ran (except for benchmark timings).
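To compare like with like, the in-file function can be kept from being inlined, so that both versions pay for a real call. A minimal sketch, assuming a GCC-based toolchain (which the Arduino AVR core uses); `fooInNoInline` is a hypothetical name, not part of the original sketch:

```cpp
#include <stdint.h>

// Hypothetical variant of fooIn() from the sketch above.
// __attribute__((noinline)) is a GCC extension that asks the compiler
// not to inline this function at its call sites.
__attribute__((noinline)) int16_t fooInNoInline(float x)
{
  if (x > 0.5f)
    return 2;
  else
    return 1;
}
```

Note that noinline only stops inlining. If the result is never used and the function has no side effects, the compiler may still drop the call, so the result still has to go somewhere observable.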
Tinrik:
It takes like 20 times more time to execute!! What on earth is going on here?
Are you really sure that anything "executes" when you are getting the short time as a result?
My guess is, that as you are never using the results of fooIn() or fooAux() in any way, this happens:
The compiler "optimizes out" all of the function calls of fooIn() from the program, so that fooIn() is never actually executed.
Your "demonstration" code just shows how clever optimizing compilers can be: the compiler can remove complete function calls if the result doesn't matter and is never used.
BTW: If your code has "a critical function and I need to keep the execution time to the very minimum", then I'm wondering why you use "float" as a data type in your code.
Indeed I realized the compiler was optimizing out the call to fooIn(); if I comment it out I get the same result.
Then I guess I have to do some operation with the result of fooIn(), which will then impact the execution time. Is there a way to do this kind of benchmarking without such an impact? Is it possible to disable all compiler optimizations within the Arduino IDE?
then I'm wondering why you use "float" as a data type in your code.
Or Strings. Converting the float to a String so that print() can unwrap the string wrapped in the String is NOT a way to improve performance of anything.
obviously useless
What is obvious is that that is not your real code. It is NOT at all obvious that you are not doing the same stupid stuff in the real code.
Assign the return value to a volatile variable, and I'm sure you will see different results.
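The volatile suggestion can be sketched roughly like this. The names `sink` and `runFooIn` are hypothetical, and on the Arduino the loop would sit between the two micros() calls as in the original sketch:

```cpp
#include <stdint.h>

int16_t fooIn(float x)
{
  if (x > 0.5f)
    return 2;
  else
    return 1;
}

// Hypothetical sink: every write to a volatile object counts as an
// observable side effect, so the compiler has to perform each store.
volatile int16_t sink;

// Calls fooIn() n times and stores each result in the volatile sink.
void runFooIn(float x, uint8_t n)
{
  for (uint8_t i = 0; i < n; ++i)
    sink = fooIn(x);
}
```

This keeps the stores alive, but the compiler may still inline fooIn() and, if x is a compile-time constant, work out the result in advance. The input has to be unknown at compile time as well if the comparison itself is to be measured.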
Tinrik:
Then I guess I have to do some operation with the result of fooIn(), which will then impact the execution time. Is there a way to do this kind of benchmarking without such an impact?
No. But you can create different for-loops "doing something relevant" and compare:
1.) a for-loop doing "hardly anything" but at the same time "something relevant", so the code gets executed
2.) a for-loop to be benchmarked, doing the same thing plus an extra function call
The time you want to measure is the time difference between those two then.
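The subtraction in the last step could be wrapped in a small helper. This is only a sketch; `extraMicrosPerCall` and its parameters are made-up names, and it assumes both loops ran the same number of iterations:

```cpp
#include <stdint.h>

// Hypothetical helper: average extra time per iteration in microseconds,
// given the total time of the baseline loop and of the loop under test.
float extraMicrosPerCall(uint32_t baselineTotal, uint32_t measuredTotal, uint16_t n)
{
  // Guard against division by zero and against unsigned wrap-around
  // when the measured loop happens to be faster than the baseline.
  if (n == 0 || measuredTotal < baselineTotal)
    return 0.0f;
  return (float)(measuredTotal - baselineTotal) / n;
}
```

For example, a baseline of 100 us and a measured loop of 300 us over 100 iterations gives 2 us per call. On 16 MHz AVR boards micros() has a resolution of 4 microseconds, so more iterations give a more trustworthy average.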
Tinrik:
Is it possible to disable all compiler optimizations within the Arduino IDE?
It depends on the Arduino-IDE version.
In some IDE versions it is impossible, perhaps.
In some IDE versions it requires complicated tweaking of the IDE and Arduino core.
In some other IDE versions it requires some simpler tweaking of the IDE and Arduino core.
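Independent of the IDE version, GCC itself allows the optimization level to be changed for a single function from within the source. A minimal sketch, assuming the toolchain is GCC (other compilers may ignore the attribute); `fooInUnoptimized` is a hypothetical name:

```cpp
#include <stdint.h>

// GCC-specific: ask for this one function to be compiled as if with -O0,
// leaving the rest of the sketch at the IDE's default optimization level.
__attribute__((optimize("O0"))) int16_t fooInUnoptimized(float x)
{
  if (x > 0.5f)
    return 2;
  else
    return 1;
}
```

The GCC documentation describes this attribute as a debugging aid, and timings of unoptimized code say little about how the optimized build will behave, so this is probably more useful for experiments than for the final benchmark.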
Perhaps give this code a try for benchmarking:
#include "fooAux.h"
#define N 100

unsigned long t0, t1, t2;

int16_t fooIn(float x)
{
  if (x > 0.5f)
    return 2;
  else
    return 1;
}

void setup() {
  Serial.begin(115200);
}

void loop()
{
  float x = -1.0f;
  volatile byte dummy=0;
  x= x+dummy; // this line prevents 'x' from being treated as a constant whose value is known at compile time

  // first create a loop that does 'nearly nothing' for comparison
  t0= micros();
  for(uint8_t i = 0; i < N; ++i)
  {
    dummy+=i;
    asm("");
  }
  t0=micros()-t0;
  Serial.print("dummy= "); Serial.println(dummy);
  Serial.print("t0= "); Serial.println(t0); delay(100);
  dummy=0;

  // Function defined in current .cpp file
  t1 = micros();
  for(uint8_t i = 0; i < N; ++i)
  {
    dummy+=fooIn(x);
    asm("");
  }
  t1= micros()-t1;
  Serial.print("dummy= "); Serial.println(dummy);
  Serial.print("t1= "); Serial.println(t1); delay(100);
  dummy=0;

  // Function defined in aux.cpp
  t2 = micros();
  for(uint8_t i = 0; i < N; ++i)
  {
    dummy+=fooAux(x);
    asm("");
  }
  t2 = micros()-t2;
  Serial.print("dummy= "); Serial.println(dummy);
  Serial.print("t2= "); Serial.println(t2); delay(100);

  Serial.println();
  Serial.print("Extra time for using fooIn(x)= "); Serial.println(t1-t0);
  Serial.print("Extra time for using fooAux(x)= "); Serial.println(t2-t0);
  Serial.println("--------------------\r\n");
  delay(2000);
}
jurs:
It depends on the Arduino-IDE version.
In some IDE versions it is impossible, perhaps.
In some IDE versions it requires complicated tweaking of the IDE and Arduino core.
In some other IDE versions it requires some simpler tweaking of the IDE and Arduino core.