Function call overhead?? Speed problem

I have an unusual problem. I have a piece of code for robotic control which uses floating point maths. i have been trying to optimize it to speed it up. I have inserted a few lines of code using the micros() function to measure the time it takes to execute a piece of code to find out the bottleneck. The code looks like this in form (not the actual code):

function() {
:
time1=micros()
:
:
code
:
:
time1=micros()-time1;
serial.println(time1);
}

loop()
{
:
time2=micros();
function()
time2=micros()-time2;
Serial.println(time2);

:
}

And as it turns out, time 1 reports 1500 microseconds and time2 reports 56000 microseconds!!! Can someone please explain?

No.

I don't have my magical glass sphere at hand, otherwise I might have been able to have a look at "function()".

I don't understand your crystal ball comment. it does not matter what function() is. the point is time1 should be about the same as time2, rather than 54000 microsec apart. if you must know function() constains series of floating point calculations which is too long to list here. it takes 3 integer as x,y,z coordinate and calculate the angles the robotic arms must make to reach those coordinates.

i have been inserting the time measurement codes thru-out the my program to see where the hold up it and at the end it appears the bottleneck is with the arduino function all overheads. time1 measures the time it takes function() to complete. time2 measures the same but from the calling routine and they should not be hugely different.

I think madworm was trying to say this is hard to diagnose with pseudo code that won't compile. Can you make example sketches that display the problem? If it can't be reproduced in a simple example, it's hard to say it's not a problem elsewhere in your code.

I do believe the floating point calculations definitely will slow things down. If you search around a bit, there are several clever methods to avoid having to use it.

I tried to reproduce your problem using this test code:

void setup(void){
  Serial.begin(57600);
}

void function(void) {
  //long time1=0;
  long counter1=0;
  long counter2=0;
  long counter3=0;
  //time1=micros();
  delay(10);
  //time1=micros()-time1;
  //Serial.println(time1);
}

void loop(void){
  long time2;
  long counter1=0;
  long counter2=0;
  time2=micros();
  function();
  time2=micros()-time2;
  Serial.println(time2);
}

I get values of about 2048 microseconds, no matter if I do the timing inside "function()" or outside of it.

As maddoc says, the point here is not that the

:
:
code
:
:

part takes time, but that the only difference between time1 and time2 is the function call (more about this below). If the floating point math in the function code takes lots of time, it should be reflected in both time1 and time2.

I am one who thinks there is no need to post maddoc's actual code.

@ maddoc, note that the function call is not the only difference between time1 and time2. There is also the small matter of the serial.println(time1); line, which is not trivial, even at a high baud rate.

[edit]It looks like madworm and I were composing replies at the same time. To support my case, note that madworm's sample above has the serial.println commented out inside the function.[/edit]

thank you for all your replies, i have included snippets of the code below:

boolean Leg::PrepareNewOrigin(int _ox, int _oy, int _oz)
{
return(PrepareNewPosition(_ox,_oy,_oz,dx,dy,dz));
}

boolean Leg::PrepareNewPosition(int _ox, int _oy, int _oz, int _dx, int _dy, int _dz)
{
double wcoxa, deltax, deltay, deltaz, s, r_sq, ws_r, wtibfemur, wfemur_r;

unsigned long current_time=micros();
char buffer[16];

wcoxa=atan2(_dy-_oy,_dx-_ox);
deltax=(_dx-_ox)-cos(wcoxa)LENGTH_COXA;
deltay=(_dy-_oy)-sin(wcoxa)LENGTH_COXA;
deltaz=_dz-_oz;
r_sq=deltax
deltax+deltay
deltay;
s=sqrt(r_sq);
r_sq+=deltazdeltaz;
ws_r=atan(deltaz/s);
wtibfemur=acos((LENGTH_FEMUR
LENGTH_FEMUR+LENGTH_TIBIALENGTH_TIBIA-r_sq)/(2LENGTH_FEMURLENGTH_TIBIA));
wfemur_r=acos((LENGTH_FEMUR
LENGTH_FEMUR-LENGTH_TIBIALENGTH_TIBIA+r_sq)/(2LENGTH_FEMUR*sqrt(r_sq)));

if(r_sq<((LENGTH_FEMUR+LENGTH_TIBIA)(LENGTH_FEMUR+LENGTH_TIBIA)))
{
if(wcoxa > 0)
angleCoxa=(int)(wcoxa
180/M_PI);
else
angleCoxa=(int)(180+wcoxa*180/M_PI);
angleHip=(int)(90+(ws_r+wfemur_r)180/M_PI);
angleKnee=(int)(wtibfemur
180/M_PI);

IKdx=_dx;
IKdy=_dy;
IKdz=_dz;
IKox=_ox;
IKoy=_oy;
IKoz=_oz;
KinematicOK=true;
}
else KinematicOK=false;
current_time=micros()-current_time;
Serial.print("IKCalc/us");
Serial.println(current_time);

return KinematicOK;
}

boolean Robot::SetBodyPosition(byte _roll, byte _pitch, byte _crawl)
{
int crawl_amount,pitch_amount,roll_amount,i;
double cos_b, cos_c, sin_b, sin_c, r21, r23, r31,x,y,z;

unsigned long current_time;

// _roll, _pitch = 0..0xff; only high nibble used
// so 0,1,2,3,4,5,6 (0-111) = left or forward, 7,8 (112-143) = still; 9,a,b,c,d,e,f (144-255) = right or backward
// crawl = 0..0x07 - 0 = upright; 0x07 = crawl on the floor (MAX_CRAWL); only last 3 bits used

_roll>>=4;
_pitch>>=4;
_crawl&=0x07;

if(_roll != roll || _pitch != pitch || _crawl != crawl)
{
if(_roll < 7)
roll_amount=-((int)(7-_roll))(MAX_ROLL/7);
else if(_roll > 8)
roll_amount=((int)(_roll-8))
(MAX_ROLL/7);
else roll_amount=0;

if(_pitch < 7)
pitch_amount=-((int)(7-_pitch))(MAX_PITCH/7);
else if(_pitch > 8)
pitch_amount=((int)(_pitch-8))
(MAX_PITCH/7);
else pitch_amount=0;

current_time=micros();

r21=(double)roll_amount*M_PI/180;
sin_b=sin(r21);
cos_b=cos(r21);

r23=(double)pitch_amount*M_PI/180;
sin_c=sin(r23);
cos_c=cos(r23);

r21=sin_bsin_c;
r23=-cos_b
sin_c;
r31=-sin_b*cos_c;

current_time=micros()-current_time;
Serial.print("MatrixOne ");
Serial.println(current_time);

crawl_amount=(MAX_CRAWL/7)*(int)_crawl;

current_time=micros();

double x,y,z;
x=cos_bdouble(RF_ORIGINX)+sin_bdouble(RF_ORIGINZ);
y=r21double(RF_ORIGINX)+cos_cdouble(RF_ORIGINY)+r23double(RF_ORIGINZ);
z=r31
double(RF_ORIGINX)+sin_cdouble(RF_ORIGINY)+cos_bdouble(RF_ORIGINZ);

current_time=micros()-current_time;
Serial.print("MatrixRF ");
Serial.println(current_time);

if(!RobotLegs[RIGHT_FRONT].PrepareNewOrigin(
lrint(x),
lrint(y),
lrint(z)-crawl_amount))
return false;

current_time=micros();

// x=cos_bdouble(RR_ORIGINX)+sin_bdouble(RR_ORIGINZ);
// y=r21double(RR_ORIGINX)+cos_cdouble(RR_ORIGINY)+r23double(RR_ORIGINZ);
// z=r31
double(RR_ORIGINX)+sin_cdouble(RR_ORIGINY)+cos_bdouble(RR_ORIGINZ);
x=cos_b50.0+sin_b0.0;
y=r2150.0+cos_c(-50.0)+r230.0;
z=r31
50.0+sin_c*(-50.0)+cos_b*0.0;

current_time=micros()-current_time;
Serial.print("MatrixRR pre");
Serial.println(current_time);
:
:
:
}

[continued from above]

current_time=micros();

if(!RobotLegs[RIGHT_REAR].PrepareNewOrigin(
lrint(x),
lrint(y),
lrint(z)-crawl_amount))
return false;

current_time=micros()-current_time;
Serial.print("MatrixRR post");
Serial.println(current_time);
current_time=micros();

if(!RobotLegs[LEFT_FRONT].PrepareNewOrigin(
int(cos_bdouble(LF_ORIGINX)+sin_bdouble(LF_ORIGINZ)),
int(r21double(LF_ORIGINX)+cos_cdouble(LF_ORIGINY)+r23double(LF_ORIGINZ)),
int(r31
double(LF_ORIGINX)+sin_cdouble(LF_ORIGINY)+cos_bdouble(LF_ORIGINZ))-crawl_amount))
return false;

current_time=micros()-current_time;
Serial.print("MatrixLF ");
Serial.println(current_time);

:
:
}

With this code I get IKcalc reporting 1500 microsec, but MatrixRRPost reporting 56000 microsec. Hope this shows you what I mean...

(IkCalc should be about the same as matrixRRpost

Please use the code button (#) when posting code.

This:

function() {
:
time1=micros()
:
:
code
:
:
time1=micros()-time1;
serial.println(time1);
}

loop()
{
:
time2=micros();
function()
time2=micros()-time2;
Serial.println(time2);

:
}

is a very bad idea.

This:

function() {
:
time1=micros()
:
:
code
:
:
time1=micros()-time1;
}

loop()
{
:
time2=micros();
function()
time2=micros()-time2;
Serial.println(time1);
Serial.println(time2);

:
}

would be an improvement.

Serial.print("MatrixRR pre");
   Serial.println(current_time);

Let's assume "current_time" has a value of 1500. That's 4 characters, plus 12 for the label = 16 characters.
Assume serial speed is set to 9600bps, or 960 cps, so 16/960 is
16 666 us. Probably a little longer if the println sends CR/LF.
Much longer if you're using a slower line speed.

There is NO HardwareSerial output buffer, so, EACH and every character must be sent before println() returns.

So your time2 is heavily skewed because of the time taken by:

Serial.println(time1);

inside of function().

AWOL's suggestion of printing the times after acquiring time2 is a much better idea.

b

Thank you for the suggestion! I will give it a try. How do you experts profile your code without a debugger / profiler? My way just seems so cumbersome.

You could also try storing the timings in varaibles - like iShortest, iAverage, iLongest, iLoops for example, and then only print them when you issue a serial command.

(When a serial character is present, don't do a timing, but rather report the previous timings...)

if(Serial.available())
{
   // do announce timings to serially connected client
}
else
{
   // do work, collect stats, etc.
}

I have a couple of projects that work this way, making the timing collecting and reporting as small as possible (inline!) ensures that the profiling, doesn't totally munge the real readings. Especially in an interrupt service routine...