Topic: Fastest way to do sin(), cos() atan2()

robtillaart

Quote
I think it better to do optimization yourself when you recognize that something can be optimized.

Quote
Unless you know the compiler very well, your own attempts at optimization will rarely if ever be better than the compilers.


The proof is always in the pudding test: if manual optimizations work, they work; if not, don't use them.
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

dc42


Call the accessor methods only once and store the values into local variables, for example

float boatPosLat = boatPos.getLat();

instead of calling boatPos.getLat() multiple times as you do in the Line::cvtToPolar method.


If the accessor function definitions are visible at the point of use (e.g. defined directly in the class declaration) and just return member variables of the class, then calling an accessor function is a simple indexing operation (not a function call), and storing the result in local variables is additional overhead. But your advice is sound if the accessor functions do some sort of calculation.
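A minimal sketch of the two styles being compared. The Position class and both helper functions are hypothetical; only getLat() appears in the thread. With the accessor defined in the class body, the compiler can see it at every call site and will usually inline it, so both versions should compile to much the same code:

```cpp
// Hypothetical class for illustration; not the original poster's code.
class Position {
public:
  Position(float lat, float lon) : lat_(lat), lon_(lon) {}
  // Defined inside the class body, so implicitly inline: the call can be
  // reduced to a plain load of the member variable.
  float getLat() const { return lat_; }
  float getLon() const { return lon_; }
private:
  float lat_;
  float lon_;
};

// Calls the accessor every time the value is needed.
float latDiffSquaredDirect(const Position &a, const Position &b) {
  return (a.getLat() - b.getLat()) * (a.getLat() - b.getLat());
}

// Copies the accessor results into locals first.
float latDiffSquaredCached(const Position &a, const Position &b) {
  const float aLat = a.getLat();
  const float bLat = b.getLat();
  const float d = aLat - bLat;
  return d * d;
}
```

If getLat() did real work (a conversion, say) or lived in a separate .cpp file, the cached version would be the safer bet. Timing both on the actual board is the only way to be sure.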
Formal verification of safety-critical software, software development, and electronic design and prototyping. See http://www.eschertech.com. Please do not ask for unpaid help via PM, use the forum.

robtillaart

Quote
Hasn't anyone mentioned CORDIC algorithms yet? http://en.wikipedia.org/wiki/Cordic


No, CORDIC has not been mentioned before; it might be interesting to see the timing of those on Arduino....

MarkT


430us for a 27bit resolution version...

Code: [Select]

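// atan(2^-i) for i = 0..27, scaled so that 2^32 corresponds to one full turn
// (so 0x20000000 is 45 degrees)
long cordic_lookup [] =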
{
  0x20000000L,
  0x12E4051EL,
  0x09FB385BL,
  0x051111D4L,
  0x028B0D43L,
  0x0145D7E1L,
  0x00A2F61EL,
  0x00517C55L,
  0x0028BE53L,
  0x00145F2FL,
  0x000A2F98L,
  0x000517CCL,
  0x00028BE6L,
  0x000145F3L,
  0x0000A2FAL,
  0x0000517DL,
  0x000028BEL,
  0x0000145FL,
  0x00000A30L,
  0x00000518L,
  0x0000028CL,
  0x00000146L,
  0x000000A3L,
  0x00000051L,
  0x00000029L,
  0x00000014L,
  0x0000000AL,
  0x00000005L
};

#define ITERS 10000

void setup ()
{
  Serial.begin (57600) ;
  long  elapsed = micros () ;
  for (long i = 0 ; i < ITERS ; i++)
    test_cordic (i << 16, false) ;
  elapsed = micros () - elapsed ;
  Serial.print ("time taken for ") ; Serial.print (ITERS) ;
  Serial.print (" iterations = ") ; Serial.print (elapsed) ; Serial.println ("us") ;
  Serial.print (elapsed / ITERS) ; Serial.println (" us/iter") ;
  test_cordic (0x15555555L, true) ;
  test_cordic (0x95555555L, true) ;
}

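// Rotates the vector (K, 0) through the angle aa (2^32 = one full turn),
// leaving roughly cos(aa) in xx and sin(aa) in yy, both scaled by 10^9.
void test_cordic (long aa, boolean printres)
{
  long  xx = 607252935L ;  // CORDIC scale factor K = 0.607252935, times 10^9
  long  yy = 0L ;

  // Top two bits differ: angle is outside +/-90 degrees, so pre-rotate by 180 degrees.
  if ((aa ^ (aa<<1)) < 0L)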
  {
    aa += 0x80000000L ;
    xx = -xx ;
    yy = -yy ;
  }
 
  for (int i = 0 ; i <= 27 ; i++)
  {
    long  da = cordic_lookup [i] ;
    long  dx = yy >> i ;
    long  dy = -xx >> i ;
    if (aa < 0L)
    {
      aa += da ;
      xx += dx ;
      yy += dy ;
    }
    else
    {
      aa -= da ;
      xx -= dx ;
      yy -= dy ;
    }
  }
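  if (!printres)
    return ;
  // xx and yy are scaled by 10^9, so the "0." prefix only reads correctly
  // for positive nine-digit results.
  Serial.print ("end angle=") ; Serial.print (aa) ;
  Serial.print ("  end x = 0.") ; Serial.print (xx) ;
  Serial.print ("  end y = 0.") ; Serial.println (yy) ;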
}

void loop ()
{
}
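
For anyone wondering where the constants in the sketch come from, here is a small host-side sketch (plain C++, not Arduino code) that appears to reproduce them. The function names are made up for illustration, and the scaling (2^32 counts per full turn, results rounded to nearest) is inferred from the table values rather than stated by the author:

```cpp
#include <cmath>
#include <cstdint>

// Table entry i: atan(2^-i) as a fraction of a full turn, scaled by 2^32.
// Assumed scaling, inferred from the first entry 0x20000000 = 45 degrees.
uint32_t cordicAngle(int i) {
  const double twoPi = 6.283185307179586;
  const double turns = std::atan(std::ldexp(1.0, -i)) / twoPi;
  return static_cast<uint32_t>(std::llround(turns * 4294967296.0));
}

// Scale factor K: product of 1/sqrt(1 + 2^-2i) over all iterations.
// Starting the vector at (K, 0) cancels the growth caused by the
// shift-and-add rotations.
double cordicScale(int iterations) {
  double k = 1.0;
  for (int i = 0; i < iterations; i++) {
    k /= std::sqrt(1.0 + std::ldexp(1.0, -2 * i));
  }
  return k;
}
```

cordicScale(28) multiplied by 10^9 and rounded gives the 607252935 used as the starting xx value.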

[ I won't respond to messages, use the forum please ]

MarkT

With a 16 bit (int) version, 93us per cordic operation (calculate sin and cosine together).

skyjumper


Quote
I think it better to do optimization yourself when you recognize that something can be optimized.

Quote
Unless you know the compiler very well, your own attempts at optimization will rarely if ever be better than the compilers.


The proof is always in the pudding test: if manual optimizations work, they work; if not, don't use them.



This is all completely compiler dependent. In Arduino, the default environment (and there is no easy way to change this) optimizes for executable size, which means that the code could actually execute more slowly in certain cases. The most common optimizations typically make code bigger. So hand optimizations, when the compiler is targeting small code size, could be undone.

jwatte


- max error: 0.00015676  == compared to sin()
- avg error: 0.00004814


That's only 10 or 11 bits of precision. If you're doing this on GPS coordinates for vehicles, then your error may put you ten kilometers off...
In fact 24 bits mantissa (which is all you get from "float" or "double" on Arduino) puts the error on the order of several feet at the surface of the Earth.
I guess it's important to understand what the application is and what kinds of errors are acceptable or not if you want to optimize more...

MarkT




max error of 0.000156 is about 13 bits of precision or 600m of GPS error.
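
The arithmetic behind figures like these can be sketched as follows. Both function names are hypothetical, and the distance estimate rests on a loud assumption that is not from the thread: the error is treated as an angle in radians on a sphere with the mean Earth radius of 6371 km. It is an order-of-magnitude estimate only.

```cpp
#include <cmath>

// Bits of precision implied by an absolute error on a value of magnitude ~1.
double bitsOfPrecision(double absError) {
  return -std::log2(absError);
}

// Assumption: the error is an angle in radians on a sphere of radius 6371 km.
double arcErrorMeters(double angleErrorRad) {
  const double earthRadiusM = 6371000.0;
  return angleErrorRad * earthRadiusM;
}
```

bitsOfPrecision(0.00015676) comes out a little under 13. Under the stated assumption, arcErrorMeters(0.00015676) is on the order of a kilometre, while a 24-bit mantissa (an error of 2^-24) gives well under a metre per radian. How the error actually propagates depends on the formula it feeds into, so other assumptions give somewhat different distances.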

robtillaart

Quote
I guess it's important to understand what the application is and what kinds of errors are acceptable or not if you want to optimize more...

Agree 100%, if something is fast enough you don't need to optimize it.

That said, optimizing is also fun! ;)

sbright33

How about this?  It can be 3x as fast!
http://hackaday.com/2011/05/27/chipkit-uno32-first-impressions-and-benchmarks/
If you fall... I'll be there for you!
-Floor

Skype Brighteyes3333
(262) 696-9619

skyjumper



That's pretty cool!

michinyon

What is this agricultural peasant woman's organisation that everybody keeps referring to?

westfw

If your compiler doesn't optimize (PI/2), you should throw it away and get a new compiler.  This is not a subtle distinction between space and time optimization; this is simple "constant folding."  (However, the C specification does not require the compiler to evaluate this at compile time.)
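
One way to make the folding explicit rather than relying on the optimizer, assuming a C++11-capable toolchain. The names kPi, kHalfPi and quarterTurn are made up here (Arduino.h already defines PI and HALF_PI as macros, so those names are avoided):

```cpp
// constexpr forces evaluation at compile time; static_assert proves the
// value exists before any code runs.
constexpr double kPi = 3.14159265358979323846;
constexpr double kHalfPi = kPi / 2;
static_assert(kHalfPi > 1.5707 && kHalfPi < 1.5708, "kPi / 2 folded at compile time");

// Written with the division spelled out; an optimizing compiler will
// normally fold it and just return the constant.
double quarterTurn() {
  return kPi / 2;
}
```

Looking at the generated assembly (for example with avr-objdump) is the way to confirm what a particular compiler and optimization level actually does with quarterTurn().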

Quote
[CORDIC has] 430us for a 27bit resolution version...

The existing avr-libc trig functions are pretty heavily optimized (although: some for size, rather than runtime.)
Most of the trig functions should be taking less than 200us on a 16MHz ATmega328p:
http://www.nongnu.org/avr-libc/user-manual/benchmarks.html

Don't forget that a floating point divide on AVR is pretty close in speed to a 32-bit integer divide (slightly faster, I think, since only 24 bits get divided). Multiply is somewhat more assisted by the HW multiplier.
This means, BTW, that one thing you can look for is replacing divisions by multiplications. I don't know if the compiler will do that for constant expressions (you should check!), though most of that will already be done inside the trig functions.
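
A hedged illustration of the division-to-multiplication idea, using a degrees-to-radians conversion as the example. The function and constant names are hypothetical. As far as I know, GCC will not convert a floating point division by 180.0f into a multiplication by itself unless something like -ffast-math is enabled, because 1/180 is not exactly representable and the result can differ in the last bit:

```cpp
// Folded once at compile time: both operands are constants.
const float kDegToRad = 3.14159265f / 180.0f;

// Evaluates as (deg * pi) / 180: one multiply and one divide at run time.
float degToRadDivide(float deg) {
  return deg * 3.14159265f / 180.0f;
}

// One multiply at run time; the division already happened in kDegToRad.
float degToRadMultiply(float deg) {
  return deg * kDegToRad;
}
```

The two functions can disagree in the least significant bit, which is usually irrelevant next to the error of the trig function that consumes the result.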

MarkT


Quote
After uploading the code to the board, the serial monitor doesn't show anything. Do we need to connect the output to an external device to see the CORDIC output?


Perhaps you didn't see this line:
Code: [Select]
  Serial.begin (57600) ;