Fastest way to do sin(), cos() atan2()

skyjumper · October 11, 2011, 9:09pm

Hi All...

I know Arduino has a nice trig library, but I am thinking a lookup table would give me faster, although less precise, results. I could do some testing, but I thought I would ask to see if anyone has done that already. I'm wondering just how much faster a lookup table would be, and if progmem, EEPROM or actual RAM is the fastest way to go?

Yes it consumes a lot of space, but in the 1284P I have a lot of space, at least in flash.

skyjumper · October 11, 2011, 9:14pm

Oh good grief, I just read up on progmem on the playground:

http://www.arduino.cc/playground/Main/PROGMEM

Apparently floating point types are not supported, at least by that library. Guess I'll have to look into rolling my own

jraskell · October 11, 2011, 9:28pm

The first question you really should answer is how fast do you need it to be?

If you're just looking at it from a purely academic perspective, that's one thing.

If, however, you have a particular project in mind that's going to involve a lot of trig operations, I'd recommend just prototyping using the available AVR Libc functions. I've found they're actually fairly efficient (not to be confused with the general gcc math.h implementations, which are terribly inefficient on AVRs). You can perform 3-4dof inverse kinematics at a fairly high frequency. I believe you could even do 6x 3dof ik (as in a hexapod robot) at an acceptable frequency.

There's little point in going down the path of optimization until you've solidly determined that optimization is actually necessary, unless as I said it's purely for academic reasons.

And along those lines, AVR Libc provides support for floats in progmem. See here for reference: avr-libc: pgmspace.h File Reference

robtillaart · October 11, 2011, 9:33pm

build this some time ago - Arduino Forum - make a lookup in the integer domain and convert it 'last minute' to float again.

note that cos(a) = sin(a+90);

do not have an atan2() but I think similar techniques apply
does that help?

skyjumper · October 11, 2011, 9:46pm

jraskell:
The first question you really should answer is how fast do you need it to be?

Much faster. My application currently accepts data from a GPS receiver at a rate of 0.5Hz (every other second) and interpolates the vehicles position 2 times each second. I was doing it at 5 times per second but with everything else my app does, I could not keep up. I probably could maintain 4 times per second, but didn't need to so I reduced it to 2 updates per second.

Well now the requirement has changed. Ideally, I need to accept actual empirical updates at a rate of 10 Hz and update the vehicle position at the same rate. So I need to do this at the very least 2.5 times as fast or better, assuming it worked at 4 Hz, which i think it did.

So I am concerned that I can't do it. My app uses sin(), cos() and atan2() functions to correct for antenna position, locate the vehicle on a plane, project a line along the current heading and then calculate a distance to an intercept point. So along with calling the trig functions I have a lot of floating point multiplication and division going on, much of which is converting from degrees to radians and vice versa.

So I guess my question comes down to, can building a lookup table in progmem give me a 3 fold increase in calculation speed?

skyjumper · October 11, 2011, 9:47pm

Thanks guys, i also found this one:

http://arduiniana.org/libraries/flash/

And yes, good point, I need to relearn my high school trig and see which numbers are redundant so i can keep the table size small.

cmiyc · October 11, 2011, 11:30pm

What about using a 2nd chip?

http://www.hvwtech.com/products_view.asp?ProductID=570

system · October 12, 2011, 12:31am

skyjumper:
And yes, good point, I need to relearn my high school trig and see which numbers are redundant so i can keep the table size small.

Actually, a small table doesn't help much on the AVR -- smaller tables would help if you were cache constrained, but the AVR doesn't have any cache at all AFAICT.
Also, reading from progmem is a lot slower than reading from SRAM, so if you can stuff your table data into SRAM, that's a speed win.
Finally, you should compare table look-up with techniques like Taylor approximation, or even a combination.

Let's assume you only need to solve the range 0 .. pi/2 for the sin (or cos) function, and can do the rest using inversion/symmetry.
Now, you can figure out some number of samples to store in the table -- let's say 64 samples to make the math simple.
For each sample, you could actually store the first few Taylor coefficients for approximating at that particular position.
You could then interpolate between the one sample and the next, using the distance delta between the two table samples as your interpolation factor.

So:

struct Coeffs { float a; float ... };
Coeffs table[64] = { ... };

float sin(float v) {
  float terp = v * 64 * HALF_PI_INVERSE;
  int v = (int)terp;
  terp = terp - (float)v;
  float valA = eval_coeffs(v, terp);
  float valB = eval_coeffs(v + 1, terp - 1);
  return valA * (1 - terp) + valB * terp;
}

float eval_coeffs(int ix, float delta) {
  if (ix < 0) return 0;
  if (ix >= 64) return 1;
  return table[v].a + delta * table[v].b + delta * delta * table[v].c + delta * delta * delta * table[v].d + ...;
}

Now, if floating point is too slow, you can start working with fixed point. None of the values involved is bigger than +/- 1, so you can use a repersentation that normalizes between [-2,2) for example. Adding and subtraction "just works" (assuming literals are properly shifted); multiplication needs pre-shifting the other way to avoid overflow, and/or more complex code using remainder math.

Hope this helps!

jmknapp · October 12, 2011, 12:49am

FYI, I measured the speed of a sin() call on a 16 MHz Diecimila and got 255 usec. A floating point multiply clocked in at 9 usec.

Joe

CrossRoads · October 12, 2011, 2:49am

Nice chip James. Too bad the A/D isn't a little faster:

A/D Conversion
Two 12-bit A/D channels are provided. The A/D
conversion can be triggered manually, through an
external input, or from a built-in timer. The A/D
values can be read as raw values or automatically
scaled to a floating point value. Data rates of up to
10,000 samples per second are supported.

skyjumper · October 12, 2011, 6:28am

Hm, interesting. Pricy, but easy to add to an existing design. On the other hand, going to a 32 bit CPU is much less expensive yet much more powerful.

skyjumper · October 12, 2011, 6:36am

Jwatte, the reason I wanted to keep the table small was just to conserve the limited flash or sram space. My chip has alot, but I need it (at least the sram) for other things. But your solution looks quite nice, thanks! I'll study it a bit more. I don't fully understand it yet. But thank you!

skyjumper · October 12, 2011, 6:38am

jmknapp:
FYI, I measured the speed of a sin() call on a 16 MHz Diecimila and got 255 usec. A floating point multiply clocked in at 9 usec.

Joe

Yup, plus an additional multiplication and division needed to convert from degrees to radians. So the operation looks more like 290 uSec, plus function call overhead. Thanks!

jmknapp · October 12, 2011, 12:37pm

Floating point divisions are more expensive, around 34 usec, but I don't think you have to use them. I.e., to convert from degrees to radians instead of 2pid/360., just use 0.0174533*d.

So I gather since you want a speedup of a factor of three, you're looking for a total time of less than about 100 usec for the sin() call? It's an interesting exercise that might come in handy for my purposes too, so I gave it a whack. Here's some code that does a sin() function using table lookup in progmem:

#include <avr/pgmspace.h>
#define N 5000   // number of calls to make during test
#define NS 1024   // number of entries in sin table
#define MAXI 32768  // max integer value in table
#define I2F (1./MAXI) // conversion from integer to float

int i ;
long startTime, endTime ;
float dt ;
float d = 0.0 ;  // argument to sin function, degrees
float s ;  // result of sin function

// sin table
// values for first quadrant, other quadrants calculated by symmetry
PROGMEM prog_uint16_t sintab[NS] = {
	0,
	50,100,150,201,251,
	301,351,402,452,502,
	552,603,653,703,753,
	804,854,904,954,1005,
	1055,1105,1155,1206,1256,
	1306,1356,1407,1457,1507,
	1557,1607,1658,1708,1758,
	1808,1858,1909,1959,2009,
	2059,2109,2159,2210,2260,
	2310,2360,2410,2460,2510,
	2560,2611,2661,2711,2761,
	2811,2861,2911,2961,3011,
	3061,3111,3161,3211,3261,
	3311,3361,3411,3461,3511,
	3561,3611,3661,3711,3761,
	3811,3861,3911,3961,4011,
	4061,4110,4160,4210,4260,
	4310,4360,4409,4459,4509,
	4559,4609,4658,4708,4758,
	4808,4857,4907,4957,5006,
	5056,5106,5155,5205,5255,
	5304,5354,5403,5453,5503,
	5552,5602,5651,5701,5750,
	5800,5849,5898,5948,5997,
	6047,6096,6146,6195,6244,
	6294,6343,6392,6442,6491,
	6540,6589,6639,6688,6737,
	6786,6835,6884,6934,6983,
	7032,7081,7130,7179,7228,
	7277,7326,7375,7424,7473,
	7522,7571,7620,7669,7717,
	7766,7815,7864,7913,7961,
	8010,8059,8108,8156,8205,
	8254,8302,8351,8400,8448,
	8497,8545,8594,8642,8691,
	8739,8788,8836,8884,8933,
	8981,9029,9078,9126,9174,
	9223,9271,9319,9367,9415,
	9463,9512,9560,9608,9656,
	9704,9752,9800,9848,9896,
	9944,9991,10039,10087,10135,
	10183,10230,10278,10326,10374,
	10421,10469,10517,10564,10612,
	10659,10707,10754,10802,10849,
	10897,10944,10991,11039,11086,
	11133,11181,11228,11275,11322,
	11369,11416,11464,11511,11558,
	11605,11652,11699,11746,11793,
	11839,11886,11933,11980,12027,
	12073,12120,12167,12213,12260,
	12307,12353,12400,12446,12493,
	12539,12586,12632,12678,12725,
	12771,12817,12864,12910,12956,
	13002,13048,13094,13140,13186,
	13232,13278,13324,13370,13416,
	13462,13508,13554,13599,13645,
	13691,13736,13782,13828,13873,
	13919,13964,14010,14055,14100,
	14146,14191,14236,14282,14327,
	14372,14417,14462,14507,14552,
	14598,14642,14687,14732,14777,
	14822,14867,14912,14956,15001,
	15046,15090,15135,15180,15224,
	15269,15313,15357,15402,15446,
	15491,15535,15579,15623,15667,
	15712,15756,15800,15844,15888,
	15932,15976,16019,16063,16107,
	16151,16195,16238,16282,16325,
	16369,16413,16456,16499,16543,
	16586,16630,16673,16716,16759,
	16802,16846,16889,16932,16975,
	17018,17061,17104,17146,17189,
	17232,17275,17317,17360,17403,
	17445,17488,17530,17573,17615,
	17658,17700,17742,17784,17827,
	17869,17911,17953,17995,18037,
	18079,18121,18163,18204,18246,
	18288,18330,18371,18413,18454,
	18496,18537,18579,18620,18662,
	18703,18744,18785,18826,18868,
	18909,18950,18991,19032,19073,
	19113,19154,19195,19236,19276,
	19317,19358,19398,19439,19479,
	19519,19560,19600,19640,19681,
	19721,19761,19801,19841,19881,
	19921,19961,20001,20040,20080,
	20120,20159,20199,20239,20278,
	20318,20357,20396,20436,20475,
	20514,20553,20592,20631,20671,
	20709,20748,20787,20826,20865,
	20904,20942,20981,21020,21058,
	21097,21135,21173,21212,21250,
	21288,21326,21365,21403,21441,
	21479,21517,21555,21592,21630,
	21668,21706,21743,21781,21818,
	21856,21893,21931,21968,22005,
	22042,22080,22117,22154,22191,
	22228,22265,22301,22338,22375,
	22412,22448,22485,22521,22558,
	22594,22631,22667,22703,22740,
	22776,22812,22848,22884,22920,
	22956,22992,23027,23063,23099,
	23134,23170,23205,23241,23276,
	23312,23347,23382,23417,23453,
	23488,23523,23558,23593,23627,
	23662,23697,23732,23766,23801,
	23835,23870,23904,23939,23973,
	24007,24041,24075,24109,24144,
	24177,24211,24245,24279,24313,
	24346,24380,24414,24447,24480,
	24514,24547,24580,24614,24647,
	24680,24713,24746,24779,24812,
	24845,24877,24910,24943,24975,
	25008,25040,25073,25105,25137,
	25169,25201,25234,25266,25298,
	25330,25361,25393,25425,25457,
	25488,25520,25551,25583,25614,
	25645,25677,25708,25739,25770,
	25801,25832,25863,25894,25925,
	25955,25986,26016,26047,26077,
	26108,26138,26169,26199,26229,
	26259,26289,26319,26349,26379,
	26409,26438,26468,26498,26527,
	26557,26586,26615,26645,26674,
	26703,26732,26761,26790,26819,
	26848,26877,26905,26934,26963,
	26991,27020,27048,27076,27105,
	27133,27161,27189,27217,27245,
	27273,27301,27329,27356,27384,
	27411,27439,27466,27494,27521,
	27548,27576,27603,27630,27657,
	27684,27711,27737,27764,27791,
	27817,27844,27870,27897,27923,
	27949,27976,28002,28028,28054,
	28080,28106,28131,28157,28183,
	28208,28234,28259,28285,28310,
	28335,28361,28386,28411,28436,
	28461,28486,28511,28535,28560,
	28585,28609,28634,28658,28682,
	28707,28731,28755,28779,28803,
	28827,28851,28875,28898,28922,
	28946,28969,28993,29016,29039,
	29062,29086,29109,29132,29155,
	29178,29201,29223,29246,29269,
	29291,29314,29336,29359,29381,
	29403,29425,29447,29469,29491,
	29513,29535,29557,29578,29600,
	29621,29643,29664,29686,29707,
	29728,29749,29770,29791,29812,
	29833,29854,29874,29895,29915,
	29936,29956,29977,29997,30017,
	30037,30057,30077,30097,30117,
	30137,30156,30176,30196,30215,
	30235,30254,30273,30292,30312,
	30331,30350,30368,30387,30406,
	30425,30443,30462,30480,30499,
	30517,30535,30554,30572,30590,
	30608,30626,30644,30661,30679,
	30697,30714,30732,30749,30766,
	30784,30801,30818,30835,30852,
	30869,30886,30902,30919,30936,
	30952,30969,30985,31001,31018,
	31034,31050,31066,31082,31098,
	31114,31129,31145,31161,31176,
	31192,31207,31222,31237,31253,
	31268,31283,31298,31312,31327,
	31342,31357,31371,31386,31400,
	31414,31429,31443,31457,31471,
	31485,31499,31513,31526,31540,
	31554,31567,31581,31594,31607,
	31620,31634,31647,31660,31673,
	31685,31698,31711,31723,31736,
	31749,31761,31773,31785,31798,
	31810,31822,31834,31846,31857,
	31869,31881,31892,31904,31915,
	31927,31938,31949,31960,31971,
	31982,31993,32004,32015,32025,
	32036,32047,32057,32067,32078,
	32088,32098,32108,32118,32128,
	32138,32148,32157,32167,32176,
	32186,32195,32205,32214,32223,
	32232,32241,32250,32259,32268,
	32276,32285,32294,32302,32311,
	32319,32327,32335,32343,32351,
	32359,32367,32375,32383,32390,
	32398,32405,32413,32420,32427,
	32435,32442,32449,32456,32463,
	32469,32476,32483,32489,32496,
	32502,32509,32515,32521,32527,
	32533,32539,32545,32551,32557,
	32562,32568,32573,32579,32584,
	32589,32595,32600,32605,32610,
	32615,32619,32624,32629,32633,
	32638,32642,32647,32651,32655,
	32659,32663,32667,32671,32675,
	32679,32682,32686,32689,32693,
	32696,32700,32703,32706,32709,
	32712,32715,32718,32720,32723,
	32726,32728,32730,32733,32735,
	32737,32739,32741,32743,32745,
	32747,32749,32750,32752,32754,
	32755,32756,32758,32759,32760,
	32761,32762,32763,32764,32764,
	32765,32766,32766,32767,32767,
	32767,32767,32767
} ;

float romsin(float x) {
	int ix ;   // index into sin table
	int is ;   // value read from sin table
	int q ;    // quadrant number 0,1,2,3
        int j ;

	x *= 1./360. ;

	x -= (int)x ;

	if (x < 0)
		x += 1.0 ;   // x is now between 0 and 1, representing 0 to 360 degrees

        q = x * 4 ;   // get quadrant number
	
        ix = (int)(x*4*NS) % NS ;  // get index into table
	
	switch(q) {
		case 0:   // 0-90
			is = pgm_read_word_near(sintab + ix);
			break ;
		case 1:   // 90-180
			ix = NS - ix - 1 ;  // reflect
			is = pgm_read_word_near(sintab + ix); ;
			break ;
		case 2:   // 180-270
			is = -pgm_read_word_near(sintab + ix); ;  // negate
			break ;
		case 3:   // 270-360
			ix = NS - ix - 1;   // reflect
			is = -pgm_read_word_near(sintab + ix); ;   // negate
			break ;
	}

        return((float)is*I2F) ;
}

void setup() {
  Serial.begin(9600) ;
  Serial.println("Sin in ROM test....") ;
}

void loop() {
  startTime=millis() ;
  for (i = 0 ; i < N ; i++)
    s = romsin(d) ;
  endTime=millis() ;
  dt = 1000.*(float)(endTime - startTime)/N ;  // time per call in usec
  
  Serial.print("sin(") ;
  Serial.print(d) ;
  Serial.print(")=") ;
  Serial.print(s) ;
  Serial.print(", ") ;
  Serial.print((int)dt) ;
  Serial.println(" usec/call") ;
  
  d += 1.0 ;
}

The results are around 80 usec per call. The error is at most 0.0016, near 0 and 180 degrees. That could be improved with a bigger table if necessary, and/or using interpolation although in the latter case there would be a time hit.

I'm not sure why people warn against the slowness of progmem access. From what I can see, a read takes less than a microsecond, so that's no worry in this context.

cos(), atan2() etc. are left as an exercise for the reader!

Joe

system · October 12, 2011, 2:41pm

Does the Makefile used by the Arduino link the lib math that is optimized for the avr's or does it use the generic math lib that is well written in C and is very slow and heavy?

robtillaart · October 12, 2011, 6:03pm

@jmknapp

your table is of type uint16 and you use only halve its range (0..65535) so there is a few bits precision to be won?

check - http://arduino.cc/forum/index.php/topic,69723.0.html -

< 50 usec per call

handles values outside 0..360
tablesize: 182 bytes (uses interpolation)

(measured with steps of 0.1 degree)

max error: 0.00015676 == compared to sin()
avg error: 0.00004814

skyjumper · October 12, 2011, 6:32pm

Very nice, thanks! I see what you're doing. That's a nice improvement in speed. I'm still setting up my simulators (I have some code that simulates a GPS receiver) and I'll see if I can pin down exactly how much of an improvement I need. This may be enough.

system · October 12, 2011, 6:36pm

Why don't you post here the sketch you have done so far? Perhaps we can point out how you can optimize it, for example, to eliminate unnecessary rad to deg conversions etc.

jmknapp · October 12, 2011, 6:59pm

robtillaart:
@jmknapp

your table is of type uint16 and you use only halve its range (0..65535) so there is a few bits precision to be won?

check - http://arduino.cc/forum/index.php/topic,69723.0.html -

< 50 usec per call

handles values outside 0..360

tablesize: 182 bytes (uses interpolation)

Hi Rob,

I don't think going to 65535 would give even one more bit of precision--the error comes mostly from the table size of 1024, so that for example the step from sintab[0] to sintab[1] is 50, and 50/32768 = 0.00152. If the table went to 65535, the step between sintab[0] and sintab[1] would just go to 101 and give almost the same ratio.

You mention sin() taking 120 usec--what board is that with? I get 255 usec with a 16 MHz Diecimila, or at least I did some time back when I looked at it.

joe

skyjumper · October 12, 2011, 7:35pm

pekkaa:
Why don't you post here the sketch you have done so far? Perhaps we can point out how you can optimize it, for example, to eliminate unnecessary rad to deg conversions etc.

Okay. The entire sketch is actually huge and complex, but here is most of a class where all this math is done. I took out a lot of the housekeeping stuff just to get it out of the way.

I do see where I can replace some divisions and multiplications with constants.

#include <math.h>
#include <hardwareserial.h>
#include "constants.h";

Line::Line(float m, unsigned char on, unsigned char off) 
{
  // Serial.begin(9600);
  
  magVariation = m;
  onLineAngle = on;
  offEndLineAngle = off;
 
  isLeftSet = isRightSet = false;       
}

Line::~Line(){}

void Line::setBoatPosition(double lat, double lon, float headingTrue, bool PERF)
{
  // The heading passed in should be true, which is what we want.

  boatPos.setLat(lat);
  boatPos.setLon(lon);
        
  cvtToPolar();

  // Recalc distance / bearing to start line ends
  distToLeft = (sqrt( pow(0 - leftX, 2) + pow(0 - leftY, 2)));
  distToRight = (sqrt( pow(0 - rightX, 2) + pow(0 - rightY, 2))); 

  if(PERF)
  {
    headingTrue = getBearingTrue() + 90.0;
  } 

  calcVU(headingTrue);
  
  // If we're calculating perf distance only, then we're done  
  if(PERF)
  {
    return;
  } 
  
  // Check to see if our centerline does not intersect the start line
  if(u < 0 || u > 1)
  {
      
    // Closer by distance    
    v = distToLeft;
    if(distToLeft > distToRight)
    {
      v = distToRight;
    }
    v *= -1; // Make distance negative if centerline does not intersect start line
  }
}

void Line::calcVU(float heading)
{
  double a,b,c,d;

  // As a approaches 0, numerical precision is lost. So we fix
  // it by checking this way and swapping varibles as needed:
  a = (rightX - leftX) * -1;
  b = sin(heading * PI / 180);
  c = (rightY - leftY) * -1;
  d = cos(-1 * heading * PI / 180);

  if(fabs(rightX - leftX) > fabs(rightY - leftY))
  {
    v = (leftY - leftX * (c / a))  /  (d-b * c / a);
    u = leftX / a - b * v / a;
  }
  else
  {
    v = (leftX - leftY * (a / c))  /  (b-d * a / c);
    u = leftY / c - d * v / c;     
  }
}

void Line::calcStats(void)
{
  cvtToPolar();

  bearingTrue = fmod((PI / 2 - atan2(rightY - leftY, rightX - leftX)) * 180 / PI + 360, 360);
  if(bearingTrue > 180)
    bearingTrue -= 180.0;
  
  lengthKm = (sqrt( pow(rightX - leftX, 2) + pow(rightY - leftY, 2) ));   
}

void Line::cvtToPolar(void)
{
  // This function converts the lat/lon of the line ends to polar x,y coordinates. Boat is
  // always at 0,0 by definition.
    
  leftX = (endLeft.getLon() - boatPos.getLon()) / 360.0 * cos(boatPos.getLat() * PI / 180) * PI * ED;
  leftY = (endLeft.getLat() - boatPos.getLat()) / 360 * PI * ED;
        
  rightX = (endRight.getLon() - boatPos.getLon()) / 360 * cos(boatPos.getLat() * PI / 180) * PI * ED;
  rightY = (endRight.getLat() - boatPos.getLat()) / 360 * PI * ED;
}

float Line::bearingDifference(float a1, float a2)
{
  return fabs( fmod((a1+180-a2),  360) - 180 );
}

Topic		Replies	Views
Faster cosine look up table Arduino Due	4	4169	May 6, 2021
clock cycles for math functions Libraries	30	16122	May 6, 2021
fastest way to get a theta from cartesian co-ordinates Programming Questions	8	1514	May 5, 2021
FastTrig Library Exhibition / Gallery	2	958	May 6, 2021
Real time execution cost of sin/cos function Programming Questions	14	3118	May 5, 2021

Fastest way to do sin(), cos() atan2()

Related Topics