A faster sin() lookup function.

Today I did some investigations in a lookup function for sin(x). Sin(x) takes approx 120 micros and I wanted more speed, and I knew the price was precision.

As I worked only in whole degrees a lookup table was the way to go. A straightforward float[361] (1444 bytes!) was not acceptable so folding was applied, and a mapping to bring the table back to uint8_t[91].

The code shows two tables: a 16bit and an 8bit table,
The function float isin(long x) is the core, it folds the angle, does a lookup and convert that to a float. // isin() stands for integer_sinus()
There are straightforward implementations of icos(x) and itan(x).

The function float fsin(float x) is an interpolation function, that runs on top of isin(),

// 91 x 2 bytes ==> 182 bytes
unsigned int isinTable16[] = { 
  0, 1144, 2287, 3430, 4571, 5712, 6850, 7987, 9121, 10252, 11380, 
  12505, 13625, 14742, 15854, 16962, 18064, 19161, 20251, 21336, 22414, 
  23486, 24550, 25607, 26655, 27696, 28729, 29752, 30767, 31772, 32768, 

  33753, 34728, 35693, 36647, 37589, 38521, 39440, 40347, 41243, 42125, 
  42995, 43851, 44695, 45524, 46340, 47142, 47929, 48702, 49460, 50203, 
  50930, 51642, 52339, 53019, 53683, 54331, 54962, 55577, 56174, 56755, 

  57318, 57864, 58392, 58902, 59395, 59869, 60325, 60763, 61182, 61583, 
  61965, 62327, 62671, 62996, 63302, 63588, 63855, 64103, 64331, 64539, 
  64728, 64897, 65047, 65176, 65286, 65375, 65445, 65495, 65525, 65535, 
  }; 

// 91 bytes 
uint8_t isinTable8[] = { 
  0, 4, 9, 13, 18, 22, 27, 31, 35, 40, 44, 
  49, 53, 57, 62, 66, 70, 75, 79, 83, 87, 
  91, 96, 100, 104, 108, 112, 116, 120, 124, 128, 

  131, 135, 139, 143, 146, 150, 153, 157, 160, 164, 
  167, 171, 174, 177, 180, 183, 186, 190, 192, 195, 
  198, 201, 204, 206, 209, 211, 214, 216, 219, 221, 

  223, 225, 227, 229, 231, 233, 235, 236, 238, 240, 
  241, 243, 244, 245, 246, 247, 248, 249, 250, 251, 
  252, 253, 253, 254, 254, 254, 255, 255, 255, 255, 
  }; 


float isin(long x)
{
  boolean pos = true;  // positive - keeps an eye on the sign.
  if (x < 0) 
  {
    x = -x;
    pos = !pos;  
  }  
  if (x >= 360) x %= 360;   
  if (x > 180) 
  {
    x -= 180;
    pos = !pos;
  }
  if (x > 90) x = 180 - x;
//  if (pos) return isinTable8[x] * 0.003921568627; // = /255.0
//  return isinTable8[x] * -0.003921568627 ;
  if (pos) return isinTable16[x] * 0.0000152590219; // = /65535.0
  return isinTable16[x] * -0.0000152590219 ;
}

float icos(long x)
{
  return isin(x+90);
}

float itan(long x) 
{
  return isin(x) / icos(x);
}

float fsin(float d)
{
  float a = isin(d);
  float b = isin(d+1);
  return a + (d-int(d)) * (b-a);
}

Did some first tests and isin(x) is approx 6x faster than the original sin(x) for both tables. The 16 bit table is more precise with a max error of 0.000008 compared to sin(x); The 8bit Table scores a max error of 0.0002. [whole angles only]

I think it can be done better still, so all remarks, improvements or ideas are - as allways - welcome.

Rob

Did you consider putting the big arrays into program memory and reading them back with 'pgm_read_word_near()'? I've done this sort of thing with audio samples for a MIDI synthesiser on the Arduino.

Yes, considered that, but I keep it as a final optimization, as the RAM is just used elsewhere.

What I'm looking for is smart tricks that use only a few statements that will fold the table again, or increases the precision with the same amount of data.

I've considered storing only the mantisse of floats and add the exponent later. That would be 3 bytes per value, but that did not work as the exponents differ.

urrently I'm thinking what can be done with

sin(2x) = 2sin(x)cos(x)
If I have a table for 0..45, I could generate 46..90 (at least the even ones directly, and the odd by interpolation) but there is a recursion in it.

Next trick to be investigated is that the the steepness around x ==> sin(90-x) (use radians for this)
So the table 0-45 holds enough values for the whole range.

heck, just approximate the whole thing with a triangle wave, so y is just a linear version of x :wink:

afraid that the error margin is a bit too high, but performance will be flabbergasting :wink:

Changed the signature of isin to accept float as param. this is basis for interpolation. With the old interface fractional angles were truncated, now they are rounded. This reduces the :

runtime increased from 18 -> 32 micros - still 3.75 faster than sin(x) -

steps of 0.1 degree
max error: 0.00872979 (below 1/100 !! with the truncation, this was 0.016 almost twice as much)
avg error: 0.00277813

lesson: Double the precision cost twice the performance?

float isin(float f)
{
  long x = f + 0.5;
  boolean pos = true;  // positive - keeps an eye on the sign.
  if (x < 0) 
  {
    x = -x;
    pos = !pos;  
  }  
  if (x >= 360) x %= 360;   
  if (x > 180) 
  {
    x -= 180;
    pos = !pos;
  }
  if (x > 90) x = 180 - x;
//  if (pos) return isinTable8[x] * 0.003921568627; // = /255.0
//  return isinTable8[x] * -0.003921568627 ;
  if (pos) return isinTable16[x] * 0.0000152590219; // = /65535.0
  return isinTable16[x] * -0.0000152590219 ;
}

next step is some interpolation (again) .

Added the interpolation

float isin(float f)
{
  boolean pos = true;  // positive
  if (f < 0) 
  {
    f = -f;
    pos = !pos;  
  }  
  
  long x = f;
  unsigned int r = (f-x) * 256;
  
  if (x >= 360) x %= 360;
  if (x >= 180) 
  {
    x -= 180;
    pos = !pos;
  }
  if (x >= 90)
  {
    x = 180 - x;
    if (r != 0)
    {
      r = 256-r;
      x--;
    }
  }

  unsigned int v = isinTable16[x];
  // interpolate if needed
  if (r > 0) v = v + ((isinTable16[x+1] - v)/8 * r) /32;

  if (pos) return v * 0.0000152590219; // = /65535.0
  return v * -0.0000152590219 ;
}

runtime 45 micros (49 if the interpolation is not conditional) factor 2.6 faster than sine

steps of 0.1 degree
max error: 0.00015676 (factor 55.7 lower than without interpolation
avg error: 0.00004814 (factor 57.7 lower than without interpolation

An integer only version of the isin() is discussed in the thread below. The output of this version varies between -127 and 127.

updated: fastest version (with limitations!) 2.86 micros ==> 42 times faster
almost 350.000 calls per second possible!

(some tricks applied there might strip off a few micros in the above versions)

I know it is frowned upon here for reopening old topics but the information in this thread has really helped me.

I can see it is 6 years old, which shows just how far behind the curve I am in the Arduino arena.

Were there any libraries written to implement these sine lookup tables?

Thanks for any help.