Go Down

### Topic: Function call overhead?? Speed problem (Read 3659 times)previous topic - next topic

##### Aug 27, 2009, 12:08 am
I have an unusual problem. I have a piece of code for robotic control which uses floating point maths. i have been trying to optimize it to speed it up. I have inserted a few lines of code using the micros() function to measure the time it takes to execute a piece of code to find out the bottleneck. The code looks like this in form (not the actual code):

function() {
:
time1=micros()
:
:
code
:
:
time1=micros()-time1;
serial.println(time1);
}

loop()
{
:
time2=micros();
function()
time2=micros()-time2;
Serial.println(time2);

:
}

And as it turns out, time 1 reports 1500 microseconds and time2 reports 56000 microseconds!!! Can someone please explain?

#1
##### Aug 27, 2009, 12:51 amLast Edit: Aug 27, 2009, 12:52 am by madworm Reason: 1
No.

I don't have my magical glass sphere at hand, otherwise I might have been able to have a look at "function()".

#2
##### Aug 27, 2009, 03:16 am
I don't understand your crystal ball comment. it does not matter what function() is. the point is time1 should be about the same as time2, rather than 54000 microsec apart. if you must know function() constains series of floating point calculations which is too long to list here. it takes 3 integer as x,y,z coordinate and calculate the angles the robotic arms must make to reach those coordinates.

i have been inserting the time measurement codes thru-out the my program to see where the hold up it and at the end it appears the bottleneck is with the arduino function all overheads. time1 measures the time it takes function() to complete. time2 measures the same but from the calling routine and they should not be hugely different.

#### Digger450

#3
##### Aug 27, 2009, 03:34 am
I think madworm was trying to say this is hard to diagnose with pseudo code that won't compile.  Can you make example sketches that display the problem?  If it can't be reproduced in a simple example, it's hard to say it's not a problem elsewhere in your code.

I do believe the floating point calculations definitely will slow things down.  If you search around a bit, there are several clever methods to avoid having to use it.

#4
##### Aug 27, 2009, 03:56 am
I tried to reproduce your problem using this test code:

Code: [Select]
`void setup(void){  Serial.begin(57600);}void function(void) {  //long time1=0;  long counter1=0;  long counter2=0;  long counter3=0;  //time1=micros();  delay(10);  //time1=micros()-time1;  //Serial.println(time1);}void loop(void){  long time2;  long counter1=0;  long counter2=0;  time2=micros();  function();  time2=micros()-time2;  Serial.println(time2);}`

I get values of about 2048 microseconds, no matter if I do the timing inside "function()" or outside of it.

#### TBAr

#5
##### Aug 27, 2009, 04:00 amLast Edit: Aug 27, 2009, 04:13 am by TBAr Reason: 1
As  maddoc  says, the point here is not that the
Code: [Select]
`::code::`
part takes time, but that the only difference between time1 and time2 is the function call (more about this below). If the floating point math in the function code takes lots of time, it should be reflected in both time1 and time2.

I am one who thinks there is no need to post  maddoc's  actual code.

@ maddoc, note that the  function call is not the only difference between time1 and time2. There is also the small matter of the serial.println(time1); line, which is not trivial, even at a high baud rate.

It looks like  madworm  and I were composing replies at the same time. To support my case, note that  madworm's  sample above has the serial.println commented out inside the function.[/edit]

#6
##### Aug 27, 2009, 08:42 am
thank you for all your replies, i have included snippets of the code below:

boolean Leg::PrepareNewOrigin(int _ox, int _oy, int _oz)
{
return(PrepareNewPosition(_ox,_oy,_oz,dx,dy,dz));
}

boolean Leg::PrepareNewPosition(int _ox, int _oy, int _oz, int _dx, int _dy, int _dz)
{
double wcoxa, deltax, deltay, deltaz, s, r_sq, ws_r, wtibfemur, wfemur_r;

unsigned long current_time=micros();
char buffer[16];

wcoxa=atan2(_dy-_oy,_dx-_ox);
deltax=(_dx-_ox)-cos(wcoxa)*LENGTH_COXA;
deltay=(_dy-_oy)-sin(wcoxa)*LENGTH_COXA;
deltaz=_dz-_oz;
r_sq=deltax*deltax+deltay*deltay;
s=sqrt(r_sq);
r_sq+=deltaz*deltaz;
ws_r=atan(deltaz/s);
wtibfemur=acos((LENGTH_FEMUR*LENGTH_FEMUR+LENGTH_TIBIA*LENGTH_TIBIA-r_sq)/(2*LENGTH_FEMUR*LENGTH_TIBIA));
wfemur_r=acos((LENGTH_FEMUR*LENGTH_FEMUR-LENGTH_TIBIA*LENGTH_TIBIA+r_sq)/(2*LENGTH_FEMUR*sqrt(r_sq)));

if(r_sq<((LENGTH_FEMUR+LENGTH_TIBIA)*(LENGTH_FEMUR+LENGTH_TIBIA)))
{
if(wcoxa > 0)
angleCoxa=(int)(wcoxa*180/M_PI);
else
angleCoxa=(int)(180+wcoxa*180/M_PI);
angleHip=(int)(90+(ws_r+wfemur_r)*180/M_PI);
angleKnee=(int)(wtibfemur*180/M_PI);

IKdx=_dx;
IKdy=_dy;
IKdz=_dz;
IKox=_ox;
IKoy=_oy;
IKoz=_oz;
KinematicOK=true;
}
else KinematicOK=false;
current_time=micros()-current_time;
Serial.print("IKCalc/us");
Serial.println(current_time);

return KinematicOK;
}

boolean Robot::SetBodyPosition(byte _roll, byte _pitch, byte _crawl)
{
int crawl_amount,pitch_amount,roll_amount,i;
double cos_b, cos_c, sin_b, sin_c, r21, r23, r31,x,y,z;

unsigned long current_time;

// _roll, _pitch = 0..0xff; only high nibble used
// so 0,1,2,3,4,5,6 (0-111) = left or forward, 7,8 (112-143) = still; 9,a,b,c,d,e,f (144-255) = right or backward
// crawl = 0..0x07 - 0 = upright; 0x07 = crawl on the floor (MAX_CRAWL); only last 3 bits used

_roll>>=4;
_pitch>>=4;
_crawl&=0x07;

if(_roll != roll || _pitch != pitch || _crawl != crawl)
{
if(_roll < 7)
roll_amount=-((int)(7-_roll))*(MAX_ROLL/7);
else if(_roll >
roll_amount=((int)(_roll-8))*(MAX_ROLL/7);
else roll_amount=0;

if(_pitch < 7)
pitch_amount=-((int)(7-_pitch))*(MAX_PITCH/7);
else if(_pitch >
pitch_amount=((int)(_pitch-8))*(MAX_PITCH/7);
else pitch_amount=0;

current_time=micros();

r21=(double)roll_amount*M_PI/180;
sin_b=sin(r21);
cos_b=cos(r21);

r23=(double)pitch_amount*M_PI/180;
sin_c=sin(r23);
cos_c=cos(r23);

r21=sin_b*sin_c;
r23=-cos_b*sin_c;
r31=-sin_b*cos_c;

current_time=micros()-current_time;
Serial.print("MatrixOne ");
Serial.println(current_time);

crawl_amount=(MAX_CRAWL/7)*(int)_crawl;

current_time=micros();

double x,y,z;
x=cos_b*double(RF_ORIGINX)+sin_b*double(RF_ORIGINZ);
y=r21*double(RF_ORIGINX)+cos_c*double(RF_ORIGINY)+r23*double(RF_ORIGINZ);
z=r31*double(RF_ORIGINX)+sin_c*double(RF_ORIGINY)+cos_b*double(RF_ORIGINZ);

current_time=micros()-current_time;
Serial.print("MatrixRF ");
Serial.println(current_time);

if(!RobotLegs[RIGHT_FRONT].PrepareNewOrigin(
lrint(x),
lrint(y),
lrint(z)-crawl_amount))
return false;

current_time=micros();

//    x=cos_b*double(RR_ORIGINX)+sin_b*double(RR_ORIGINZ);
//    y=r21*double(RR_ORIGINX)+cos_c*double(RR_ORIGINY)+r23*double(RR_ORIGINZ);
//    z=r31*double(RR_ORIGINX)+sin_c*double(RR_ORIGINY)+cos_b*double(RR_ORIGINZ);
x=cos_b*50.0+sin_b*0.0;
y=r21*50.0+cos_c*(-50.0)+r23*0.0;
z=r31*50.0+sin_c*(-50.0)+cos_b*0.0;

current_time=micros()-current_time;
Serial.print("MatrixRR pre");
Serial.println(current_time);
:
:
:
}

#7
##### Aug 27, 2009, 08:43 am
[continued from above]

current_time=micros();

if(!RobotLegs[RIGHT_REAR].PrepareNewOrigin(
lrint(x),
lrint(y),
lrint(z)-crawl_amount))
return false;

current_time=micros()-current_time;
Serial.print("MatrixRR post");
Serial.println(current_time);
current_time=micros();

if(!RobotLegs[LEFT_FRONT].PrepareNewOrigin(
int(cos_b*double(LF_ORIGINX)+sin_b*double(LF_ORIGINZ)),
int(r21*double(LF_ORIGINX)+cos_c*double(LF_ORIGINY)+r23*double(LF_ORIGINZ)),
int(r31*double(LF_ORIGINX)+sin_c*double(LF_ORIGINY)+cos_b*double(LF_ORIGINZ))-crawl_amount))
return false;

current_time=micros()-current_time;
Serial.print("MatrixLF ");
Serial.println(current_time);

:
:
}

#8
##### Aug 27, 2009, 08:45 am
With this code I get IKcalc reporting 1500 microsec, but MatrixRRPost reporting 56000 microsec. Hope this shows you what I mean...

(IkCalc should be about the same as matrixRRpost

#### AWOL

#9
##### Aug 27, 2009, 09:02 amLast Edit: Aug 27, 2009, 09:23 am by AWOL Reason: 1
Please use the code button (#) when posting code.

This:
Code: [Select]
`function() {:time1=micros()::code::time1=micros()-time1;serial.println(time1);}loop(){:time2=micros();function()time2=micros()-time2;Serial.println(time2);:}`

This:
Code: [Select]
`function() {:time1=micros()::code::time1=micros()-time1;}loop(){:time2=micros();function()time2=micros()-time2;Serial.println(time1);Serial.println(time2);:}`
would be an improvement.

Code: [Select]
`Serial.print("MatrixRR pre");   Serial.println(current_time);`
Let's assume "current_time" has a value of 1500. That's 4 characters, plus 12 for the label = 16 characters.
Assume serial speed is set to 9600bps, or 960 cps, so 16/960 is
16 666 us. Probably a little longer if the println sends CR/LF.
Much longer if you're using a slower line speed.
"Pete, it's a fool looks for logic in the chambers of the human heart." Ulysses Everett McGill.
Do not send technical questions via personal messaging - they will be ignored.
I speak for myself, not Arduino.

#### bhagman

#10
##### Aug 27, 2009, 06:15 pm
There is NO HardwareSerial output buffer, so, EACH and every character must be sent before println() returns.

So your time2 is heavily skewed because of the time taken by:

Code: [Select]
`Serial.println(time1);`
inside of function().

AWOL's suggestion of printing the times after acquiring time2 is a much better idea.

b

http://wiring.org.co/ - Wiring - Where Arduino came from. (Wiring begat Arduino)

http://loftypremises.com/ - Everything you wanted to know about me.... and less

http://roguerobotics.com/ - #1 Canadian Arduino distributor

#11
##### Aug 27, 2009, 08:43 pm
Thank you for the suggestion! I will give it a try. How do you experts profile your code without a debugger / profiler? My way just seems so cumbersome.

#### Spinlock

#12
##### Aug 28, 2009, 02:55 pm
You could also try storing the timings in varaibles - like iShortest, iAverage, iLongest, iLoops for example, and then only print them when you issue a serial command.

(When a serial character is present, don't do a timing, but rather report the previous timings...)

Code: [Select]
`if(Serial.available()){   // do announce timings to serially connected client}else{   // do work, collect stats, etc.}`

I have a couple of projects that work this way, making the timing collecting and reporting as small as possible (inline!) ensures that the profiling, doesn't totally munge the real readings.  Especially in an interrupt service routine...

Go Up