Fastest way to do sin(), cos() atan2()

Hi All...

I know Arduino has a nice trig library, but I am thinking a lookup table would give me faster, although less precise, results. I could do some testing, but I thought I would ask to see if anyone has done that already. I'm wondering just how much faster a lookup table would be, and if progmem, EEPROM or actual RAM is the fastest way to go?

Yes it consumes a lot of space, but in the 1284P I have a lot of space, at least in flash.

Oh good grief, I just read up on progmem on the playground:

http://www.arduino.cc/playground/Main/PROGMEM

Apparently floating point types are not supported, at least by that library. Guess I'll have to look into rolling my own :-(

The first question you really should answer is how fast do you need it to be?

If you're just looking at it from a purely academic perspective, that's one thing.

If, however, you have a particular project in mind that's going to involve a lot of trig operations, I'd recommend just prototyping using the available AVR Libc functions. I've found they're actually fairly efficient (not to be confused with the general gcc math.h implementations, which are terribly inefficient on AVRs). You can perform 3-4dof inverse kinematics at a fairly high frequency. I believe you could even do 6x 3dof ik (as in a hexapod robot) at an acceptable frequency.

There's little point in going down the path of optimization until you've solidly determined that optimization is actually necessary, unless as I said it's purely for academic reasons.

And along those lines, AVR Libc provides support for floats in progmem. See here for reference: http://www.nongnu.org/avr-libc/user-manual/pgmspace_8h.html

build this some time ago - http://arduino.cc/forum/index.php?topic=69723.0 - make a lookup in the integer domain and convert it 'last minute' to float again.

note that cos(a) = sin(a+90);

do not have an atan2() but I think similar techniques apply does that help?

jraskell: The first question you really should answer is how fast do you need it to be?

Much faster. My application currently accepts data from a GPS receiver at a rate of 0.5Hz (every other second) and interpolates the vehicles position 2 times each second. I was doing it at 5 times per second but with everything else my app does, I could not keep up. I probably could maintain 4 times per second, but didn't need to so I reduced it to 2 updates per second.

Well now the requirement has changed. Ideally, I need to accept actual empirical updates at a rate of 10 Hz and update the vehicle position at the same rate. So I need to do this at the very least 2.5 times as fast or better, assuming it worked at 4 Hz, which i think it did.

So I am concerned that I can't do it. My app uses sin(), cos() and atan2() functions to correct for antenna position, locate the vehicle on a plane, project a line along the current heading and then calculate a distance to an intercept point. So along with calling the trig functions I have a lot of floating point multiplication and division going on, much of which is converting from degrees to radians and vice versa.

So I guess my question comes down to, can building a lookup table in progmem give me a 3 fold increase in calculation speed?

Thanks guys, i also found this one:

http://arduiniana.org/libraries/flash/

And yes, good point, I need to relearn my high school trig and see which numbers are redundant so i can keep the table size small.

What about using a 2nd chip?

http://www.hvwtech.com/products_view.asp?ProductID=570

skyjumper:
And yes, good point, I need to relearn my high school trig and see which numbers are redundant so i can keep the table size small.

Actually, a small table doesn’t help much on the AVR – smaller tables would help if you were cache constrained, but the AVR doesn’t have any cache at all AFAICT.
Also, reading from progmem is a lot slower than reading from SRAM, so if you can stuff your table data into SRAM, that’s a speed win.
Finally, you should compare table look-up with techniques like Taylor approximation, or even a combination.

Let’s assume you only need to solve the range 0 … pi/2 for the sin (or cos) function, and can do the rest using inversion/symmetry.
Now, you can figure out some number of samples to store in the table – let’s say 64 samples to make the math simple.
For each sample, you could actually store the first few Taylor coefficients for approximating at that particular position.
You could then interpolate between the one sample and the next, using the distance delta between the two table samples as your interpolation factor.

So:

struct Coeffs { float a; float ... };
Coeffs table[64] = { ... };

float sin(float v) {
  float terp = v * 64 * HALF_PI_INVERSE;
  int v = (int)terp;
  terp = terp - (float)v;
  float valA = eval_coeffs(v, terp);
  float valB = eval_coeffs(v + 1, terp - 1);
  return valA * (1 - terp) + valB * terp;
}

float eval_coeffs(int ix, float delta) {
  if (ix < 0) return 0;
  if (ix >= 64) return 1;
  return table[v].a + delta * table[v].b + delta * delta * table[v].c + delta * delta * delta * table[v].d + ...;
}

Now, if floating point is too slow, you can start working with fixed point. None of the values involved is bigger than +/- 1, so you can use a repersentation that normalizes between [-2,2) for example. Adding and subtraction “just works” (assuming literals are properly shifted); multiplication needs pre-shifting the other way to avoid overflow, and/or more complex code using remainder math.

Hope this helps!

FYI, I measured the speed of a sin() call on a 16 MHz Diecimila and got 255 usec. A floating point multiply clocked in at 9 usec.

Joe

Nice chip James. Too bad the A/D isn't a little faster:

A/D Conversion Two 12-bit A/D channels are provided. The A/D conversion can be triggered manually, through an external input, or from a built-in timer. The A/D values can be read as raw values or automatically scaled to a floating point value. Data rates of up to 10,000 samples per second are supported.

[quote author=James C4S link=topic=75126.msg566520#msg566520 date=1318375826] What about using a 2nd chip?

http://www.hvwtech.com/products_view.asp?ProductID=570 [/quote]

Hm, interesting. Pricy, but easy to add to an existing design. On the other hand, going to a 32 bit CPU is much less expensive yet much more powerful.

Jwatte, the reason I wanted to keep the table small was just to conserve the limited flash or sram space. My chip has alot, but I need it (at least the sram) for other things. But your solution looks quite nice, thanks! I’ll study it a bit more. I don’t fully understand it yet. But thank you!

jmknapp: FYI, I measured the speed of a sin() call on a 16 MHz Diecimila and got 255 usec. A floating point multiply clocked in at 9 usec.

Joe

Yup, plus an additional multiplication and division needed to convert from degrees to radians. So the operation looks more like 290 uSec, plus function call overhead. Thanks!

Floating point divisions are more expensive, around 34 usec, but I don’t think you have to use them. I.e., to convert from degrees to radians instead of 2pid/360., just use 0.0174533*d.

So I gather since you want a speedup of a factor of three, you’re looking for a total time of less than about 100 usec for the sin() call? It’s an interesting exercise that might come in handy for my purposes too, so I gave it a whack. Here’s some code that does a sin() function using table lookup in progmem:

#include <avr/pgmspace.h>
#define N 5000   // number of calls to make during test
#define NS 1024   // number of entries in sin table
#define MAXI 32768  // max integer value in table
#define I2F (1./MAXI) // conversion from integer to float

int i ;
long startTime, endTime ;
float dt ;
float d = 0.0 ;  // argument to sin function, degrees
float s ;  // result of sin function

// sin table
// values for first quadrant, other quadrants calculated by symmetry
PROGMEM prog_uint16_t sintab[NS] = {
	0,
	50,100,150,201,251,
	301,351,402,452,502,
	552,603,653,703,753,
	804,854,904,954,1005,
	1055,1105,1155,1206,1256,
	1306,1356,1407,1457,1507,
	1557,1607,1658,1708,1758,
	1808,1858,1909,1959,2009,
	2059,2109,2159,2210,2260,
	2310,2360,2410,2460,2510,
	2560,2611,2661,2711,2761,
	2811,2861,2911,2961,3011,
	3061,3111,3161,3211,3261,
	3311,3361,3411,3461,3511,
	3561,3611,3661,3711,3761,
	3811,3861,3911,3961,4011,
	4061,4110,4160,4210,4260,
	4310,4360,4409,4459,4509,
	4559,4609,4658,4708,4758,
	4808,4857,4907,4957,5006,
	5056,5106,5155,5205,5255,
	5304,5354,5403,5453,5503,
	5552,5602,5651,5701,5750,
	5800,5849,5898,5948,5997,
	6047,6096,6146,6195,6244,
	6294,6343,6392,6442,6491,
	6540,6589,6639,6688,6737,
	6786,6835,6884,6934,6983,
	7032,7081,7130,7179,7228,
	7277,7326,7375,7424,7473,
	7522,7571,7620,7669,7717,
	7766,7815,7864,7913,7961,
	8010,8059,8108,8156,8205,
	8254,8302,8351,8400,8448,
	8497,8545,8594,8642,8691,
	8739,8788,8836,8884,8933,
	8981,9029,9078,9126,9174,
	9223,9271,9319,9367,9415,
	9463,9512,9560,9608,9656,
	9704,9752,9800,9848,9896,
	9944,9991,10039,10087,10135,
	10183,10230,10278,10326,10374,
	10421,10469,10517,10564,10612,
	10659,10707,10754,10802,10849,
	10897,10944,10991,11039,11086,
	11133,11181,11228,11275,11322,
	11369,11416,11464,11511,11558,
	11605,11652,11699,11746,11793,
	11839,11886,11933,11980,12027,
	12073,12120,12167,12213,12260,
	12307,12353,12400,12446,12493,
	12539,12586,12632,12678,12725,
	12771,12817,12864,12910,12956,
	13002,13048,13094,13140,13186,
	13232,13278,13324,13370,13416,
	13462,13508,13554,13599,13645,
	13691,13736,13782,13828,13873,
	13919,13964,14010,14055,14100,
	14146,14191,14236,14282,14327,
	14372,14417,14462,14507,14552,
	14598,14642,14687,14732,14777,
	14822,14867,14912,14956,15001,
	15046,15090,15135,15180,15224,
	15269,15313,15357,15402,15446,
	15491,15535,15579,15623,15667,
	15712,15756,15800,15844,15888,
	15932,15976,16019,16063,16107,
	16151,16195,16238,16282,16325,
	16369,16413,16456,16499,16543,
	16586,16630,16673,16716,16759,
	16802,16846,16889,16932,16975,
	17018,17061,17104,17146,17189,
	17232,17275,17317,17360,17403,
	17445,17488,17530,17573,17615,
	17658,17700,17742,17784,17827,
	17869,17911,17953,17995,18037,
	18079,18121,18163,18204,18246,
	18288,18330,18371,18413,18454,
	18496,18537,18579,18620,18662,
	18703,18744,18785,18826,18868,
	18909,18950,18991,19032,19073,
	19113,19154,19195,19236,19276,
	19317,19358,19398,19439,19479,
	19519,19560,19600,19640,19681,
	19721,19761,19801,19841,19881,
	19921,19961,20001,20040,20080,
	20120,20159,20199,20239,20278,
	20318,20357,20396,20436,20475,
	20514,20553,20592,20631,20671,
	20709,20748,20787,20826,20865,
	20904,20942,20981,21020,21058,
	21097,21135,21173,21212,21250,
	21288,21326,21365,21403,21441,
	21479,21517,21555,21592,21630,
	21668,21706,21743,21781,21818,
	21856,21893,21931,21968,22005,
	22042,22080,22117,22154,22191,
	22228,22265,22301,22338,22375,
	22412,22448,22485,22521,22558,
	22594,22631,22667,22703,22740,
	22776,22812,22848,22884,22920,
	22956,22992,23027,23063,23099,
	23134,23170,23205,23241,23276,
	23312,23347,23382,23417,23453,
	23488,23523,23558,23593,23627,
	23662,23697,23732,23766,23801,
	23835,23870,23904,23939,23973,
	24007,24041,24075,24109,24144,
	24177,24211,24245,24279,24313,
	24346,24380,24414,24447,24480,
	24514,24547,24580,24614,24647,
	24680,24713,24746,24779,24812,
	24845,24877,24910,24943,24975,
	25008,25040,25073,25105,25137,
	25169,25201,25234,25266,25298,
	25330,25361,25393,25425,25457,
	25488,25520,25551,25583,25614,
	25645,25677,25708,25739,25770,
	25801,25832,25863,25894,25925,
	25955,25986,26016,26047,26077,
	26108,26138,26169,26199,26229,
	26259,26289,26319,26349,26379,
	26409,26438,26468,26498,26527,
	26557,26586,26615,26645,26674,
	26703,26732,26761,26790,26819,
	26848,26877,26905,26934,26963,
	26991,27020,27048,27076,27105,
	27133,27161,27189,27217,27245,
	27273,27301,27329,27356,27384,
	27411,27439,27466,27494,27521,
	27548,27576,27603,27630,27657,
	27684,27711,27737,27764,27791,
	27817,27844,27870,27897,27923,
	27949,27976,28002,28028,28054,
	28080,28106,28131,28157,28183,
	28208,28234,28259,28285,28310,
	28335,28361,28386,28411,28436,
	28461,28486,28511,28535,28560,
	28585,28609,28634,28658,28682,
	28707,28731,28755,28779,28803,
	28827,28851,28875,28898,28922,
	28946,28969,28993,29016,29039,
	29062,29086,29109,29132,29155,
	29178,29201,29223,29246,29269,
	29291,29314,29336,29359,29381,
	29403,29425,29447,29469,29491,
	29513,29535,29557,29578,29600,
	29621,29643,29664,29686,29707,
	29728,29749,29770,29791,29812,
	29833,29854,29874,29895,29915,
	29936,29956,29977,29997,30017,
	30037,30057,30077,30097,30117,
	30137,30156,30176,30196,30215,
	30235,30254,30273,30292,30312,
	30331,30350,30368,30387,30406,
	30425,30443,30462,30480,30499,
	30517,30535,30554,30572,30590,
	30608,30626,30644,30661,30679,
	30697,30714,30732,30749,30766,
	30784,30801,30818,30835,30852,
	30869,30886,30902,30919,30936,
	30952,30969,30985,31001,31018,
	31034,31050,31066,31082,31098,
	31114,31129,31145,31161,31176,
	31192,31207,31222,31237,31253,
	31268,31283,31298,31312,31327,
	31342,31357,31371,31386,31400,
	31414,31429,31443,31457,31471,
	31485,31499,31513,31526,31540,
	31554,31567,31581,31594,31607,
	31620,31634,31647,31660,31673,
	31685,31698,31711,31723,31736,
	31749,31761,31773,31785,31798,
	31810,31822,31834,31846,31857,
	31869,31881,31892,31904,31915,
	31927,31938,31949,31960,31971,
	31982,31993,32004,32015,32025,
	32036,32047,32057,32067,32078,
	32088,32098,32108,32118,32128,
	32138,32148,32157,32167,32176,
	32186,32195,32205,32214,32223,
	32232,32241,32250,32259,32268,
	32276,32285,32294,32302,32311,
	32319,32327,32335,32343,32351,
	32359,32367,32375,32383,32390,
	32398,32405,32413,32420,32427,
	32435,32442,32449,32456,32463,
	32469,32476,32483,32489,32496,
	32502,32509,32515,32521,32527,
	32533,32539,32545,32551,32557,
	32562,32568,32573,32579,32584,
	32589,32595,32600,32605,32610,
	32615,32619,32624,32629,32633,
	32638,32642,32647,32651,32655,
	32659,32663,32667,32671,32675,
	32679,32682,32686,32689,32693,
	32696,32700,32703,32706,32709,
	32712,32715,32718,32720,32723,
	32726,32728,32730,32733,32735,
	32737,32739,32741,32743,32745,
	32747,32749,32750,32752,32754,
	32755,32756,32758,32759,32760,
	32761,32762,32763,32764,32764,
	32765,32766,32766,32767,32767,
	32767,32767,32767
} ;

float romsin(float x) {
	int ix ;   // index into sin table
	int is ;   // value read from sin table
	int q ;    // quadrant number 0,1,2,3
        int j ;

	x *= 1./360. ;

	x -= (int)x ;

	if (x < 0)
		x += 1.0 ;   // x is now between 0 and 1, representing 0 to 360 degrees

        q = x * 4 ;   // get quadrant number
	
        ix = (int)(x*4*NS) % NS ;  // get index into table
	
	switch(q) {
		case 0:   // 0-90
			is = pgm_read_word_near(sintab + ix);
			break ;
		case 1:   // 90-180
			ix = NS - ix - 1 ;  // reflect
			is = pgm_read_word_near(sintab + ix); ;
			break ;
		case 2:   // 180-270
			is = -pgm_read_word_near(sintab + ix); ;  // negate
			break ;
		case 3:   // 270-360
			ix = NS - ix - 1;   // reflect
			is = -pgm_read_word_near(sintab + ix); ;   // negate
			break ;
	}

        return((float)is*I2F) ;
}

void setup() {
  Serial.begin(9600) ;
  Serial.println("Sin in ROM test....") ;
}

void loop() {
  startTime=millis() ;
  for (i = 0 ; i < N ; i++)
    s = romsin(d) ;
  endTime=millis() ;
  dt = 1000.*(float)(endTime - startTime)/N ;  // time per call in usec
  
  Serial.print("sin(") ;
  Serial.print(d) ;
  Serial.print(")=") ;
  Serial.print(s) ;
  Serial.print(", ") ;
  Serial.print((int)dt) ;
  Serial.println(" usec/call") ;
  
  d += 1.0 ;
}

The results are around 80 usec per call. The error is at most 0.0016, near 0 and 180 degrees. That could be improved with a bigger table if necessary, and/or using interpolation although in the latter case there would be a time hit.

I’m not sure why people warn against the slowness of progmem access. From what I can see, a read takes less than a microsecond, so that’s no worry in this context.

cos(), atan2() etc. are left as an exercise for the reader!

Joe

Does the Makefile used by the Arduino link the lib math that is optimized for the avr's or does it use the generic math lib that is well written in C and is very slow and heavy?

@jmknapp

your table is of type uint16 and you use only halve its range (0…65535) so there is a few bits precision to be won?

check - http://arduino.cc/forum/index.php/topic,69723.0.html -

< 50 usec per call

  • handles values outside 0…360
  • tablesize: 182 bytes (uses interpolation)

(measured with steps of 0.1 degree)

  • max error: 0.00015676 == compared to sin()
  • avg error: 0.00004814

Very nice, thanks! I see what you're doing. That's a nice improvement in speed. I'm still setting up my simulators (I have some code that simulates a GPS receiver) and I'll see if I can pin down exactly how much of an improvement I need. This may be enough.

Why don't you post here the sketch you have done so far? Perhaps we can point out how you can optimize it, for example, to eliminate unnecessary rad to deg conversions etc.

robtillaart:
@jmknapp

your table is of type uint16 and you use only halve its range (0…65535) so there is a few bits precision to be won?

check - http://arduino.cc/forum/index.php/topic,69723.0.html -

< 50 usec per call

  • handles values outside 0…360
  • tablesize: 182 bytes (uses interpolation)

Hi Rob,

I don’t think going to 65535 would give even one more bit of precision–the error comes mostly from the table size of 1024, so that for example the step from sintab[0] to sintab[1] is 50, and 50/32768 = 0.00152. If the table went to 65535, the step between sintab[0] and sintab[1] would just go to 101 and give almost the same ratio.

You mention sin() taking 120 usec–what board is that with? I get 255 usec with a 16 MHz Diecimila, or at least I did some time back when I looked at it.

joe

pekkaa:
Why don’t you post here the sketch you have done so far? Perhaps we can point out how you can optimize it, for example, to eliminate unnecessary rad to deg conversions etc.

Okay. The entire sketch is actually huge and complex, but here is most of a class where all this math is done. I took out a lot of the housekeeping stuff just to get it out of the way.

I do see where I can replace some divisions and multiplications with constants.

#include <math.h>
#include <hardwareserial.h>
#include "constants.h";

Line::Line(float m, unsigned char on, unsigned char off) 
{
  // Serial.begin(9600);
  
  magVariation = m;
  onLineAngle = on;
  offEndLineAngle = off;
 
  isLeftSet = isRightSet = false;       
}

Line::~Line(){}

void Line::setBoatPosition(double lat, double lon, float headingTrue, bool PERF)
{
  // The heading passed in should be true, which is what we want.

  boatPos.setLat(lat);
  boatPos.setLon(lon);
        
  cvtToPolar();

  // Recalc distance / bearing to start line ends
  distToLeft = (sqrt( pow(0 - leftX, 2) + pow(0 - leftY, 2)));
  distToRight = (sqrt( pow(0 - rightX, 2) + pow(0 - rightY, 2))); 

  if(PERF)
  {
    headingTrue = getBearingTrue() + 90.0;
  } 

  calcVU(headingTrue);
  
  // If we're calculating perf distance only, then we're done  
  if(PERF)
  {
    return;
  } 
  
  // Check to see if our centerline does not intersect the start line
  if(u < 0 || u > 1)
  {
      
    // Closer by distance    
    v = distToLeft;
    if(distToLeft > distToRight)
    {
      v = distToRight;
    }
    v *= -1; // Make distance negative if centerline does not intersect start line
  }
}

void Line::calcVU(float heading)
{
  double a,b,c,d;

  // As a approaches 0, numerical precision is lost. So we fix
  // it by checking this way and swapping varibles as needed:
  a = (rightX - leftX) * -1;
  b = sin(heading * PI / 180);
  c = (rightY - leftY) * -1;
  d = cos(-1 * heading * PI / 180);

  if(fabs(rightX - leftX) > fabs(rightY - leftY))
  {
    v = (leftY - leftX * (c / a))  /  (d-b * c / a);
    u = leftX / a - b * v / a;
  }
  else
  {
    v = (leftX - leftY * (a / c))  /  (b-d * a / c);
    u = leftY / c - d * v / c;     
  }
}

void Line::calcStats(void)
{
  cvtToPolar();

  bearingTrue = fmod((PI / 2 - atan2(rightY - leftY, rightX - leftX)) * 180 / PI + 360, 360);
  if(bearingTrue > 180)
    bearingTrue -= 180.0;
  
  lengthKm = (sqrt( pow(rightX - leftX, 2) + pow(rightY - leftY, 2) ));   
}

void Line::cvtToPolar(void)
{
  // This function converts the lat/lon of the line ends to polar x,y coordinates. Boat is
  // always at 0,0 by definition.
    
  leftX = (endLeft.getLon() - boatPos.getLon()) / 360.0 * cos(boatPos.getLat() * PI / 180) * PI * ED;
  leftY = (endLeft.getLat() - boatPos.getLat()) / 360 * PI * ED;
        
  rightX = (endRight.getLon() - boatPos.getLon()) / 360 * cos(boatPos.getLat() * PI / 180) * PI * ED;
  rightY = (endRight.getLat() - boatPos.getLat()) / 360 * PI * ED;
}

float Line::bearingDifference(float a1, float a2)
{
  return fabs( fmod((a1+180-a2),  360) - 180 );
}