Slow execution times

I am seeing some strange slow-downs when using subroutines written
in separate tabs. For instance, when I put the following code in my
void loop ()

  start_time = millis();
  for (x = 1; x <=1000000; x++)
  {
    y= atan (1/x);
  }
  stop_time = millis();

The time required is 11,685 millis.

If I call the function like so:

  start_time = millis();
  for (x = 1; x <=1000000; x++)
  {
    test1(x, &y);
  }
  stop_time = millis();

with the following code in a separate tab:

double test1(double sent, double *returned)
{
*returned = atan(1/sent);
}

My total time is 170,242 millis.

Any ideas?

How are 'x' and 'y' declared?

-Mike

x and y are doubles. They are declared the same in both cases.

Here's the code from the second case...

#include <math.h>


double test1(double sent, double *returned);

void setup()
{
  Serial.begin (9600);
}

void loop()
{
  double x;
  double y;
  long start_time;
  long stop_time;
  long calc_time;
  start_time = millis();
  for (x = 1; x <=1000; x++){
    test1(x, &y);
  }
  stop_time = millis();
  calc_time = stop_time - start_time;
  Serial.println(calc_time);
}

Possibly not related to the problem, but, why is test1 defined to return a double, when it does not return a value?

The second example has more work for the processor to do.

Instead of just setting up a call to atan, the second example has to set up the call to test1, call atan from inside it, and then clean up the stack twice.

I was a bit shocked at first to see that the first version takes about 93% less time than the second (the second is roughly 14.6 times slower), but...

Try changing to this:

double test1(double sent)
{
    return atan(1/sent);
}

which should give you at least a 31% improvement on the slow version if it is solely function overhead.

The time required is 11,685 millis.

Is that comma a European decimal point, or a thousands separator?
(I'm guessing the latter).

Here's the same thing with the call to a subroutine in the same tab.
The entire code is posted below. Same results - time to calculate is 170,242

FYI - Arduino Mega, Arduino 0017, Mac Snow Leopard. Also, I guess I should warn
you I am a programming newbie.

Groove - You are correct, I am using the comma as a thousands separator. Thanks for
reminding me that I am posting in an international environment!

PaulS - The problem originated in my robot code where the function actually returns
a useful value but I have chopped it down just to run this test. The returned value
doesn't get used in this test routine.

Spinlock - I will run your test next if I have time (and am not too dumb).

#include <math.h>

double test2(double sent, double *returned);

void setup()
{
  Serial.begin (9600);
}

void loop()
{
  double x;
  double y;
  long start_time;
  long stop_time;
  long calc_time;
  start_time = millis();
  for (x = 1; x <=1000000; x++)
   {
     test2(x, &y);
   }
  stop_time = millis();
  calc_time = stop_time - start_time;
  Serial.println(calc_time);
}

double test2(double sent, double *returned)
{
  *returned = atan(1/sent);
}

I would inline that function.

AlphaBeta said:

I would inline that function.

I definitely will. But some of the functions I'm calling are big and get
called over and over in the code.

Is this "call a subroutine" slow-down normal? Should I just expect a
roughly fifteen-to-one penalty (170,242 vs. 11,685 millis) for not inlining?

I am using 10% of my memory so far - and my project is only getting
started. As a guess, if I inline all of these it will triple my memory
requirements. Not horrible, but if there is a better solution I would like
to try it...

Thanks for the help so far, guys!

  start_time = millis();
  for (x = 1; x <=1000000; x++)
  {
    y= atan (1/x);
  }
  stop_time = millis();

What seems to be happening is that the compiler is optimising away the call to atan, since it works out that the result is never used. If you declare y as volatile double (which forces the compiler to write to y each time, even if it thinks it won't be used) then it is just as slow as the other test.
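To illustrate the point above, here is a minimal sketch of the volatile-sink idea. It is written as plain C++ rather than an Arduino sketch so it can be compiled anywhere, and the names sink and run_atan_loop are hypothetical, not from the original code. Timing calls are left out; on the board they would wrap the call just as millis() does in the posted tests.

```cpp
#include <cmath>

// Hypothetical global sink. Because it is volatile, the compiler must
// perform every store, so it cannot discard the atan calls as unused.
volatile double sink;

// Evaluates atan(1/i) for i = 1..n, storing each result in the sink,
// and returns the last value stored.
double run_atan_loop(long n) {
    for (long i = 1; i <= n; ++i) {
        sink = std::atan(1.0 / static_cast<double>(i));
    }
    return sink;
}
```

With a plain (non-volatile) local in place of sink, an optimising compiler is free to remove the loop body when the result is never read, which is the effect described above.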

So this is just down to automatic inlining when the function is in the same tab?
Wow!

I need to go and check some asm...

double test2(double sent, double *returned)
{
  *returned = atan(1/sent);
}

this should be

void test2(double sent, double *returned)
{
  *returned = atan(1/sent);
}

Per Stimmer's request I used a volatile double, and things slowed
down to 168293 millis. Thus the fast compute times were bogus.

It appears that a single atan calc takes about 0.17 msec.

This is TOO SLOW! I guess I'm gonna have to change how I calculate
or find a faster processor...

#include <math.h>

volatile double result;
void setup()
{
  Serial.begin (9600);
}

void loop()
{
  double x;
  double y;
  long start_time;
  long stop_time;
  long calc_time;
  start_time = millis();
  for (x = 1; x <=1000000; x++)
   {
     result = atan(1/x);
   }
  stop_time = millis();
  calc_time = stop_time - start_time;
  Serial.println(calc_time);
}

double test2(double sent)
{
  return atan(1/sent);
}

The Atmega is not very good at floating point operations.

I think I can save a few clock cycles using an approximation I found
while surfing:

atan(x) = x/(1+ 0.28*x^2) for |x|<=1

atan(x) = pi/2 - x/(x^2 + 0.28) for |x| >=1

I threw this into my test code and got a 40% savings, although I haven't
looked at it thoroughly. x=1 is a special case I think, but it's late and I
should be sleeping!!!!

98214 millis for this run..

FYI, I tested values under 1, and it was accurate to .01 radians.

#include <math.h>

volatile double result;
void setup()
{
  Serial.begin (9600);
}

void loop()
{
  double x;
  double y;
  long start_time;
  long stop_time;
  long calc_time;
  start_time = millis();
  for (x = 1; x <=1000000; x++)
   {
     if ((1/x) >= 1)
       result = PI/2 - (1/x)/(square(1/x) + 0.28);  // |t| >= 1 branch: pi/2 - t/(t^2 + 0.28)
     else
       result = (1/x)/(1 + 0.28*square(1/x));       // |t| <= 1 branch: t/(1 + 0.28*t^2)
   }
  stop_time = millis();
  calc_time = stop_time - start_time;
  Serial.println(calc_time);
}
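For reference, the two-branch approximation quoted earlier can be sketched as a standalone function with a simple accuracy check. This is plain C++ for non-negative inputs only, and the names approx_atan and max_abs_error are hypothetical helpers, not part of the posted sketch.

```cpp
#include <cmath>

// Hypothetical helper: two-branch rational approximation of atan(x), x >= 0.
double approx_atan(double x) {
    const double kHalfPi = 1.5707963267948966;
    if (x <= 1.0) {
        return x / (1.0 + 0.28 * x * x);   // |x| <= 1 branch
    }
    return kHalfPi - x / (x * x + 0.28);   // |x| >= 1 branch
}

// Largest absolute difference from std::atan over [lo, hi],
// sampled at steps + 1 evenly spaced points.
double max_abs_error(double lo, double hi, int steps) {
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        double x = lo + (hi - lo) * i / steps;
        double err = std::fabs(approx_atan(x) - std::atan(x));
        if (err > worst) worst = err;
    }
    return worst;
}
```

The two branches agree at x = 1, so that value needs no special handling. Negative inputs would need the sign handled separately, since the second branch as written assumes x is positive.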

Have you considered using a lookup table with linear approximation? Add to that fixed-point arithmetic and you should see a big difference in performance. With some Googling, you may even be able to find ready-to-use C code.

I'll second that suggestion of fixed point approximations. It will be more effective if you have your own atan() to go with it.
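A rough sketch of what the lookup-table suggestion could look like, in plain C++. Everything here is an assumption for illustration: the names, the 33-entry table size, the Q15 input format (0..32768 meaning 0.0..1.0) and the Q13 output format (units of 1/8192 rad). The table is filled once with the library atan for brevity; on an ATmega one would more likely precompute the values and keep them in flash.

```cpp
#include <cmath>
#include <cstdint>

const int kIndexBits = 5;                      // 32 intervals across [0, 1]
const int kTableSize = (1 << kIndexBits) + 1;  // 33 entries, both endpoints
const int kFracBits  = 15 - kIndexBits;        // low Q15 bits drive interpolation
const double kAngleScale = 8192.0;             // output unit: 1/8192 rad (Q13)

static int16_t atan_table[kTableSize];

// Fill the table once; entry i holds atan(i / 32) in Q13.
void init_atan_table() {
    for (int i = 0; i < kTableSize; ++i) {
        double ratio = static_cast<double>(i) / (kTableSize - 1);
        atan_table[i] =
            static_cast<int16_t>(std::lround(std::atan(ratio) * kAngleScale));
    }
}

// Integer-only lookup with linear interpolation.
// ratio_q15: 0..32768 representing 0.0..1.0; returns the angle in Q13 radians.
int16_t atan_q15(uint16_t ratio_q15) {
    if (ratio_q15 >= 32768) return atan_table[kTableSize - 1];
    uint16_t index = ratio_q15 >> kFracBits;               // table interval
    uint16_t frac  = ratio_q15 & ((1u << kFracBits) - 1);  // position inside it
    int32_t lo = atan_table[index];
    int32_t hi = atan_table[index + 1];
    return static_cast<int16_t>(lo + (((hi - lo) * frac) >> kFracBits));
}
```

Ratios above 1 can be folded into this range with atan(x) = pi/2 - atan(1/x), the same identity the rational approximation in this thread relies on. A larger table trades memory for accuracy.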

ATmega floating point is slow because the ATmega doesn't have any FP hardware. All FP is done in software, and that's fairly expensive, computationally.

What are your timing requirements? .17ms doesn't seem all that long for controlling a robot.

-j

Timing requirements are interesting. I'm building a hexapod (six legs,
with three servos per leg) and call atan three times per leg for every
incremental step.

That means 18 calls every 20 msec, plus all the rest of my code.
So 15 percent of my computation time is JUST atan.

By the way kg4wsv, I'm glad you brought this up. Prior to this
post, I had calculated that I needed three calcs PER SERVO
(instead of three calcs per leg) which was 3*18=54 atan calcs for
every 20msec time slice. At my original .17msec that was TOO LONG!

I'll read more on the look-up table today...

Again, thanks to all who responded. Nothing better than a helpful
forum.