Go Down

Topic: A faster sin() lookup function. (Read 5 times) previous topic - next topic

robtillaart

Aug 18, 2011, 01:49 am Last Edit: Aug 20, 2011, 05:58 pm by robtillaart Reason: 1
Today I did some investigations in a lookup function for sin(x).  Sin(x) takes approx 120 micros and I wanted more speed, and I knew the price was precision.

As I worked only in whole degrees a lookup table was the way to go. A straightforward float[361]  (1444 bytes!) was not acceptable so folding was applied, and a mapping to bring the table back to uint8_t[91].

The code shows two tables: a 16bit and an 8bit table,
The function float isin(long x) is the core, it folds the angle, does a lookup and convert that to a float.   // isin() stands for integer_sinus()
There are straightforward implementations of icos(x) and itan(x).

The function float fsin(float x) is an interpolation function, that runs on top of isin(),  

Code: [Select]

// 91 x 2 bytes ==> 182 bytes
unsigned int isinTable16[] = {
 0, 1144, 2287, 3430, 4571, 5712, 6850, 7987, 9121, 10252, 11380,
 12505, 13625, 14742, 15854, 16962, 18064, 19161, 20251, 21336, 22414,
 23486, 24550, 25607, 26655, 27696, 28729, 29752, 30767, 31772, 32768,

 33753, 34728, 35693, 36647, 37589, 38521, 39440, 40347, 41243, 42125,
 42995, 43851, 44695, 45524, 46340, 47142, 47929, 48702, 49460, 50203,
 50930, 51642, 52339, 53019, 53683, 54331, 54962, 55577, 56174, 56755,

 57318, 57864, 58392, 58902, 59395, 59869, 60325, 60763, 61182, 61583,
 61965, 62327, 62671, 62996, 63302, 63588, 63855, 64103, 64331, 64539,
 64728, 64897, 65047, 65176, 65286, 65375, 65445, 65495, 65525, 65535,
 };

// 91 bytes
uint8_t isinTable8[] = {
 0, 4, 9, 13, 18, 22, 27, 31, 35, 40, 44,
 49, 53, 57, 62, 66, 70, 75, 79, 83, 87,
 91, 96, 100, 104, 108, 112, 116, 120, 124, 128,

 131, 135, 139, 143, 146, 150, 153, 157, 160, 164,
 167, 171, 174, 177, 180, 183, 186, 190, 192, 195,
 198, 201, 204, 206, 209, 211, 214, 216, 219, 221,

 223, 225, 227, 229, 231, 233, 235, 236, 238, 240,
 241, 243, 244, 245, 246, 247, 248, 249, 250, 251,
 252, 253, 253, 254, 254, 254, 255, 255, 255, 255,
 };


float isin(long x)
{
 boolean pos = true;  // positive - keeps an eye on the sign.
 if (x < 0)
 {
   x = -x;
   pos = !pos;  
 }  
 if (x >= 360) x %= 360;  
 if (x > 180)
 {
   x -= 180;
   pos = !pos;
 }
 if (x > 90) x = 180 - x;
//  if (pos) return isinTable8[x] * 0.003921568627; // = /255.0
//  return isinTable8[x] * -0.003921568627 ;
 if (pos) return isinTable16[x] * 0.0000152590219; // = /65535.0
 return isinTable16[x] * -0.0000152590219 ;
}

float icos(long x)
{
 return isin(x+90);
}

float itan(long x)
{
 return isin(x) / icos(x);
}

float fsin(float d)
{
 float a = isin(d);
 float b = isin(d+1);
 return a + (d-int(d)) * (b-a);
}

Did some first tests and isin(x) is approx 6x faster than the original sin(x) for both tables. The 16 bit table is more precise with a max error of 0.000008 compared to sin(x); The 8bit Table scores a max error of 0.0002. [whole angles only]

I think it can be done better still, so all remarks, improvements or ideas are - as allways - welcome.

Rob
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Anachrocomputer

Did you consider putting the big arrays into program memory and reading them back with 'pgm_read_word_near()'?  I've done this sort of thing with audio samples for a MIDI synthesiser on the Arduino.

robtillaart

Yes, considered that, but I keep it as a final optimization, as the RAM is just used elsewhere.

What I'm looking for is smart tricks that use only a few statements that will fold the table again, or increases the precision with the same amount of data.

I've considered storing only the mantisse of floats and add the exponent later. That would be 3 bytes per value, but that did not work as the exponents differ.

urrently I'm thinking what can be done with

sin(2x) = 2sin(x)cos(x)
If I have a table for 0..45, I could generate 46..90 (at least the even ones directly, and the odd by interpolation) but there is a recursion in it.

Next trick to be investigated is that the the steepness around x   ==>   sin(90-x)   (use radians for this)
So the table 0-45 holds enough values for the whole range.




Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

CrossRoads

heck, just approximate the whole thing with a triangle wave, so y is just a linear version of x ;)
Designing & building electrical circuits for over 25 years. Check out the ATMega1284P based Bobuino and other '328P & '1284P creations & offerings at  www.crossroadsfencing.com/BobuinoRev17.
Arduino for Teens available at Amazon.com.

robtillaart

afraid that the error margin is a bit too high, but performance will be flabbergasting ;)
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

#5
Aug 18, 2011, 10:08 pm Last Edit: Aug 20, 2011, 06:01 pm by robtillaart Reason: 1
Changed the signature of isin to accept float as param. this is basis for interpolation. With the old interface fractional angles were truncated, now they are rounded. This reduces the :

runtime increased from 18 -> 32 micros  - still 3.75 faster than sin(x) -

steps of 0.1 degree
max error: 0.00872979    (below 1/100  !!  with the truncation, this was  0.016 almost twice as much)
avg error: 0.00277813      

lesson: Double the precision cost twice the performance?
Code: [Select]

float isin(float f)
{
 long x = f + 0.5;
 boolean pos = true;  // positive - keeps an eye on the sign.
 if (x < 0)
 {
   x = -x;
   pos = !pos;  
 }  
 if (x >= 360) x %= 360;  
 if (x > 180)
 {
   x -= 180;
   pos = !pos;
 }
 if (x > 90) x = 180 - x;
//  if (pos) return isinTable8[x] * 0.003921568627; // = /255.0
//  return isinTable8[x] * -0.003921568627 ;
 if (pos) return isinTable16[x] * 0.0000152590219; // = /65535.0
 return isinTable16[x] * -0.0000152590219 ;
}


next step is some interpolation (again) .

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

#6
Aug 20, 2011, 07:36 pm Last Edit: Jan 05, 2013, 12:18 pm by robtillaart Reason: 1
Added the interpolation

Code: [Select]
float isin(float f)
{
 boolean pos = true;  // positive
 if (f < 0)
 {
   f = -f;
   pos = !pos;  
 }  
 
 long x = f;
 unsigned int r = (f-x) * 256;
 
 if (x >= 360) x %= 360;
 if (x >= 180)
 {
   x -= 180;
   pos = !pos;
 }
 if (x >= 90)
 {
   x = 180 - x;
   if (r != 0)
   {
     r = 256-r;
     x--;
   }
 }

 unsigned int v = isinTable16[x];
 // interpolate if needed
 if (r > 0) v = v + ((isinTable16[x+1] - v)/8 * r) /32;

 if (pos) return v * 0.0000152590219; // = /65535.0
 return v * -0.0000152590219 ;
}


runtime 45 micros (49 if the interpolation is not conditional)   factor 2.6 faster than sine


steps of 0.1 degree
max error: 0.00015676   (factor 55.7 lower than without interpolation
avg error: 0.00004814    (factor 57.7 lower than without interpolation
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

#7
Nov 03, 2013, 03:31 pm Last Edit: Nov 03, 2013, 04:02 pm by robtillaart Reason: 1
An integer only version of the isin() is discussed in the thread below. The output of this version varies between -127 and 127.
- http://forum.arduino.cc//index.php?topic=196625.msg1453433#msg1453433 -

updated: fastest version (with limitations!) 2.86 micros ==> 42 times faster
almost 350.000 calls per second possible!

(some tricks applied there might strip off a few micros in the above versions)
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Go Up