The second example gives the processor more work to do.
Instead of just a call to atan, the second example has to set up the function call to test1, then call the atan function, and then clean up the stack twice.
I was a bit shocked at first to see that it is 93% slower than the first, but...
Here's the same thing with the call to a subroutine in the same tab.
The entire code is posted below. Same results - time to calculate is 170,242.
FYI - Arduino Mega, Arduino 0017, Mac Snow Leopard. Also, I guess I should warn
you I am a programming newbie.
Groove - You are correct, I am using the comma as a thousands separator. Thanks for
reminding me that I am posting in an international environment!
PaulS - The problem originated in my robot code where the function actually returns
a useful value but I have chopped it down just to run this test. The returned value
doesn't get used in this test routine.
Spinlock - I will run your test next if I have time (and am not too dumb).
I definitely will. But some of the functions I'm calling are big and get
called over and over in the code.
Is this "call a subroutine" slow-down normal? Should I just expect an
almost twenty-to-one penalty for not inlining?
I am using 10% of my memory so far - and my project is only getting
started. As a guess, if I inline all of these it will triple my memory
requirements. Not horrible, but if there is a better solution I would like
to try it...
start_time = millis();
for (x = 1; x <= 1000000; x++)
{
  y = atan(1 / x);
}
stop_time = millis();
What seems to be happening is that the compiler is optimising away the call to atan, since it works out that the result is never used. If you declare y as volatile double (which forces the compiler to write to y each time, even if it thinks it won't be used) then it is just as slow as the other test.
I think I can save a few clock cycles using an approximation I found
while surfing:
atan(x) ≈ x/(1 + 0.28*x^2) for |x| <= 1
atan(x) ≈ pi/2 - x/(x^2 + 0.28) for |x| >= 1
I threw this into my test code and got a 40% savings, although I haven't
looked at it thoroughly. x = 1 is a special case, I think, but it's late and I
should be sleeping!
98,214 millis for this run.
FYI, I tested values under 1, and it was accurate to 0.01 radians.
#include <math.h>

volatile double result;

void setup()
{
  Serial.begin(9600);
}

void loop()
{
  double x;
  long start_time;
  long stop_time;
  long calc_time;

  start_time = millis();
  for (x = 1; x <= 1000000; x++)
  {
    if ((1 / x) >= 1)
      result = PI / 2 - (1 / x) / (sq(1 / x) + 0.28);  // |x| >= 1 branch
    else
      result = (1 / x) / (1 + 0.28 * sq(1 / x));       // |x| <= 1 branch
  }
  stop_time = millis();
  calc_time = stop_time - start_time;
  Serial.println(calc_time);
}
Have you considered using a lookup table with linear interpolation? Add to that fixed-point arithmetic and you should see a big difference in performance. With some Googling, you may even be able to find ready-to-use C code.
I'll second that suggestion of fixed point approximations. It will be more effective if you have your own atan() to go with it.
ATmega floating point is slow because the ATmega doesn't have any FP hardware. All FP is done in software, and that's fairly expensive, computationally.
What are your timing requirements? 0.17 ms doesn't seem all that long for controlling a robot.
Timing requirements are interesting. I'm building a 6-legged hexapod
(with three servos per leg) and call atan three times per leg for every
incremental step.
That means 18 calls every 20 msec, plus all the rest of my code.
At 0.17 msec per call that's about 3.1 msec of every 20 msec slice, so
roughly 15 percent of my computation time is JUST atan.
By the way kg4wsv, I'm glad you brought this up. Prior to this
post, I had calculated that I needed three calcs PER SERVO
(instead of three calcs per leg), which was 3*18 = 54 atan calcs for
every 20 msec time slice. At my original 0.17 msec each, that was TOO LONG!
I'll read more on the look-up table today...
Again, thanks to all who responded. Nothing better than a helpful
forum.