Fastest way to do sin(), cos() atan2()

Also:

(PI / 2 - atan2(rightY - leftY, rightX - leftX))

can surely be simplified to: [EDIT: check that this simplification is correct in all quadrants]

atan2(rightX - leftX, rightY - leftY)

and since atan2 already returns a result in the range -pi/2 to pi/2, the fmod is unnecessary. So:

bearingTrue = atan2(rightX - leftX, rightY - leftY) * (180 / PI);

I've put brackets around 180/PI so that the compiler can precompute it, otherwise it really ought to multiply by 180 and then divide by PI since that may give a slightly different answer.

[PS - don't forget to check for the case leftX == rightX && leftY == rightY before making the atan2 call!]

Hasn't anyone mentioned CORDIC algorithms yet? http://en.wikipedia.org/wiki/Cordic

I think it better to do optimization yourself when you recognize that something can be optimized.

Unless you know the compiler very well, your own attempts at optimization will rarely if ever be better than the compilers.

The proof is allways in the pudding test, if manual optimizations work, they work, if not, don't use them.

pekkaa: Call the accessor methods only once and store the values into local variables, for example

float boatPosLat = boatPos.getLat();

instead calling boatPos.getLat() multiple times as you do in the Line::cvtToPolar method.

If the the accessor functions definitions are visible at the point of use (e.g. defined directly in the class declaration) and just return member variables of the class, then calling accessor functions is a simple indexing operation (not a function call) and storing the result in local variables is additional overhead. But your advice is sound if the accessor functions do some sort of calculation.

Hasn't anyone mentioned CORDIC algorithms yet? http://en.wikipedia.org/wiki/Cordic

CORDIC no not mentioned before, might be interesting to see the timing of those on Arduino....

robtillaart:

Hasn’t anyone mentioned CORDIC algorithms yet? http://en.wikipedia.org/wiki/Cordic

CORDIC no not mentioned before, might be interesting to see the timing of those on Arduino…

430us for a 27bit resolution version…

long cordic_lookup [] = 
{
  0x20000000L,
  0x12E4051EL,
  0x09FB385BL,
  0x051111D4L,
  0x028B0D43L,
  0x0145D7E1L,
  0x00A2F61EL,
  0x00517C55L,
  0x0028BE53L,
  0x00145F2FL,
  0x000A2F98L,
  0x000517CCL,
  0x00028BE6L,
  0x000145F3L,
  0x0000A2FAL,
  0x0000517DL,
  0x000028BEL,
  0x0000145FL,
  0x00000A30L,
  0x00000518L,
  0x0000028CL,
  0x00000146L,
  0x000000A3L,
  0x00000051L,
  0x00000029L,
  0x00000014L,
  0x0000000AL,
  0x00000005L
};

#define ITERS 10000

void setup ()
{
  Serial.begin (57600) ;
  long  elapsed = micros () ;
  for (long i = 0 ; i < ITERS ; i++)
    test_cordic (i << 16, false) ;
  elapsed = micros () - elapsed ;
  Serial.print ("time taken for ") ; Serial.print (ITERS) ;
  Serial.print (" iterations = ") ; Serial.print (elapsed) ; Serial.println ("us") ;
  Serial.print (elapsed / ITERS) ; Serial.println (" us/iter") ;
  test_cordic (0x15555555L, true) ;
  test_cordic (0x95555555L, true) ;
}

void test_cordic (long aa, boolean printres)
{
  long  xx = 607252935L ;
  long  yy = 0L ;

  if ((aa ^ (aa<<1)) < 0L)
  {
    aa += 0x80000000L ;
    xx = -xx ;
    yy = -yy ;
  }
  
  for (int i = 0 ; i <= 27 ; i++)
  {
    long  da = cordic_lookup [i] ;
    long  dx = yy >> i ;
    long  dy = -xx >> i ;
    if (aa < 0L)
    {
      aa += da ;
      xx += dx ;
      yy += dy ;
    }
    else
    {
      aa -= da ;
      xx -= dx ;
      yy -= dy ;
    }
  }
  if (!printres)
    return ;
  Serial.print ("end angle=") ; Serial.print (aa) ;
  Serial.print ("  end x = 0.") ; Serial.print (xx) ;
  Serial.print ("  end y = 0.") ; Serial.println (yy) ;
}

void loop ()
{
}

Thanks!

With a 16 bit (int) version, 93us per cordic operation (calculate sin and cosine together).

robtillaart:

I think it better to do optimization yourself when you recognize that something can be optimized.

Unless you know the compiler very well, your own attempts at optimization will rarely if ever be better than the compilers.

The proof is allways in the pudding test, if manual optimizations work, they work, if not, don't use them.

This is all completely compiler dependent. In Arduino, the default environment (and there is no easy way to change this) optimizes for executable size, which means that the code could actually execute more slowly is certain cases. The most common optimizations typically make code bigger. So hand optimizations, when the compiler is targeting small code size, could be undone.

robtillaart: - max error: 0.00015676 == compared to sin() - avg error: 0.00004814

That's only 10 or 11 bits of precision. If you're doing this on GPS coordinates for vehicles, then your error may put you ten kilometers off... In fact 24 bits mantissa (which is all you get from "float" or "double" on Arduino) puts the error on the order of several feet at the surface of the Earth. I guess it's important to understand what the application is and what kinds of errors are acceptable or not if you want to optimize more...

jwatte:

robtillaart: - max error: 0.00015676 == compared to sin() - avg error: 0.00004814

That's only 10 or 11 bits of precision. If you're doing this on GPS coordinates for vehicles, then your error may put you ten kilometers off... In fact 24 bits mantissa (which is all you get from "float" or "double" on Arduino) puts the error on the order of several feet at the surface of the Earth. I guess it's important to understand what the application is and what kinds of errors are acceptable or not if you want to optimize more...

max error of 0.000156 is about 13 bits of precision or 600m of GPS error.

I guess it's important to understand what the application is and what kinds of errors are acceptable or not if you want to optimize more...

Agree 100%, if something is fast enough you don't need to optimize it.

that said, optimizing is also fun! ;)

How about this? It can be 3x as fast! http://hackaday.com/2011/05/27/chipkit-uno32-first-impressions-and-benchmarks/

sbright33: How about this? It can be 3x as fast! http://hackaday.com/2011/05/27/chipkit-uno32-first-impressions-and-benchmarks/

That's pretty cool!

What is this agricultural peasant woman's organisation that everybody keeps refering to ?

If your compiler doesn't optimize (PI/2), you should throw it away and get a new compiler. This is not a subtle distinction between space and time optimization, this is simple "constant folding." (however, from the C specification, the compiler is not required to evaluate this at compile time.)

[CORDIC has] 430us for a 27bit resolution version...

The existing avr-libc trig functions are pretty heavily optimized (although: some for size, rather than runtime.) Most of the trig functions should be taking less than 200us on a 16MHz ATmega328p: http://www.nongnu.org/avr-libc/user-manual/benchmarks.html

Don't forget that a floating point divide on AVR is pretty close in speed to a 32bit integer divide (slight faster, I think. Only 24 bits get divided.) (Multiply is somewhat more assisted by the HW multiplier.) (which means, BTW, that one thing you can look for is replacing division by multiplications. I don't know if the compiler will do that for constant expressions (you should check!) (Though most of that will already be done inside the trig functions.))

nithesh26: After uploading the code to the board, when we see the result through serial monitor it doesn't show anything. We should connect the output to any external device to see the output of Cordic ?

Perhaps you didn't see this line:

  Serial.begin (57600) ;

cos(x) = sin(x+90); so angle = 90 degree

robtillaart: cos(x) = sin(x+90); so angle = 90 degree

I did not get you. What actually are you trying to say.?

MarkT:

robtillaart:

Hasn’t anyone mentioned CORDIC algorithms yet? http://en.wikipedia.org/wiki/Cordic

CORDIC no not mentioned before, might be interesting to see the timing of those on Arduino…

430us for a 27bit resolution version…

long cordic_lookup [] = 

{
  0x20000000L,
  0x12E4051EL,
  0x09FB385BL,
  0x051111D4L,
  0x028B0D43L,
  0x0145D7E1L,
  0x00A2F61EL,
  0x00517C55L,
  0x0028BE53L,
  0x00145F2FL,
  0x000A2F98L,
  0x000517CCL,
  0x00028BE6L,
  0x000145F3L,
  0x0000A2FAL,
  0x0000517DL,
  0x000028BEL,
  0x0000145FL,
  0x00000A30L,
  0x00000518L,
  0x0000028CL,
  0x00000146L,
  0x000000A3L,
  0x00000051L,
  0x00000029L,
  0x00000014L,
  0x0000000AL,
  0x00000005L
};

#define ITERS 10000

void setup ()
{
  Serial.begin (57600) ;
  long  elapsed = micros () ;
  for (long i = 0 ; i < ITERS ; i++)
    test_cordic (i << 16, false) ;
  elapsed = micros () - elapsed ;
  Serial.print (“time taken for “) ; Serial.print (ITERS) ;
  Serial.print (” iterations = “) ; Serial.print (elapsed) ; Serial.println (“us”) ;
  Serial.print (elapsed / ITERS) ; Serial.println (” us/iter”) ;
  test_cordic (0x15555555L, true) ;
  test_cordic (0x95555555L, true) ;
}

void test_cordic (long aa, boolean printres)
{
  long  xx = 607252935L ;
  long  yy = 0L ;

if ((aa ^ (aa<<1)) < 0L)
  {
    aa += 0x80000000L ;
    xx = -xx ;
    yy = -yy ;
  }
 
  for (int i = 0 ; i <= 27 ; i++)
  {
    long  da = cordic_lookup [i] ;
    long  dx = yy >> i ;
    long  dy = -xx >> i ;
    if (aa < 0L)
    {
      aa += da ;
      xx += dx ;
      yy += dy ;
    }
    else
    {
      aa -= da ;
      xx -= dx ;
      yy -= dy ;
    }
  }
  if (!printres)
    return ;
  Serial.print (“end angle=”) ; Serial.print (aa) ;
  Serial.print ("  end x = 0.") ; Serial.print (xx) ;
  Serial.print ("  end y = 0.") ; Serial.println (yy) ;
}

void loop ()
{
}

How can I get the value of sin(anyangle) & cos (any angle) through the code provided.?
If I execute the code I’m getting values of x and y, What do those values refer to?