PROGMEM unsigned 16-bit sine table and lookup code.

And a reason why to avoid floats.

// Getting 16 bit unsigned vars in and out of PROGMEM (flash)
// Written for public domain by GoForSmoke 2014


// this program also compares the time to do table lookup vs using floats and sin()
// sorry about having to add and print the totals but otherwise the compiler optimizes the test to nothing

#include <avr/io.h>
#include <avr/pgmspace.h>
#include <math.h>

const word PROGMEM PROGTBL[ 91 ] = { // sine * 10000 by degree 0 to 90
  0U,  175U,  349U,  523U,  698U,  872U, 1045U, 1219U, 1392U, 1564U,
  1736U, 1908U, 2079U, 2250U, 2419U, 2588U, 2756U, 2924U, 3090U, 3256U,
  3420U, 3584U, 3746U, 3907U, 4067U, 4226U, 4384U, 4540U, 4695U, 4848U,
  5000U, 5150U, 5299U, 5446U, 5592U, 5736U, 5878U, 6018U, 6157U, 6293U,
  6428U, 6561U, 6691U, 6820U, 6947U, 7071U, 7193U, 7314U, 7431U, 7547U,
  7660U, 7771U, 7880U, 7986U, 8090U, 8192U, 8290U, 8387U, 8480U, 8572U,
  8660U, 8746U, 8829U, 8910U, 8988U, 9063U, 9135U, 9205U, 9272U, 9336U,
  9397U, 9455U, 9511U, 9563U, 9613U, 9659U, 9703U, 9744U, 9781U, 9816U,
  9848U, 9877U, 9903U, 9925U, 9945U, 9962U, 9976U, 9986U, 9994U, 9998U,
  10000U, };

void setup( void )
{
  Serial.begin( 57600 );

  char  formatted[ 8 ];

  word  W;

  PGM_P  Addr; // define a const prog_char pointer named Addr
  Addr = (const prog_char *) PROGTBL; // cast the table address to fit Addr

  for ( byte i = 0; i < 91; i++ )
  {
    W = pgm_read_word( Addr + 2 * i ); // read unsigned 16 bits at Addr
    sprintf( formatted, "%6u", W );
    Serial.print( formatted );
    if ( i < 90 ) Serial.print( ", " );
    if (( i % 5 ) == 4 ) Serial.println( );
  }
  Serial.println( "\n\n Timed test" );
  
  Serial.flush(); // getting serial interrupts out before time test
  delay( 1000 );  // making sure of it

  unsigned long accum = 0UL; // to make the compiler use all the lines below

  unsigned long mics = micros();
  for ( byte i = 0; i < 91; i++ )
  {
    W = pgm_read_word( Addr + i * 2 );
    accum += (unsigned long) W;
  }
  unsigned long tookMics = micros() - mics;
  
  Serial.print( "\n accum = " ); 
  Serial.print( accum );
  Serial.print( " took " ); 
  Serial.print( tookMics );
  Serial.print( " micros to add up" ); 
  Serial.println( );
  
  Serial.flush(); // getting serial interrupts out before time test
  delay( 1000 );  // making sure of it
  
  float fAccum = 0.0;
  
  mics = micros();
  float T = sin( M_PI * 30.0 / 180.0 );
  for ( byte i = 0; i < 91; i++ )
  {
    T = sin( M_PI * (float) i / 180.0 );
    fAccum += T;
  }
  tookMics = micros() - mics;
  
  Serial.print( "\n float accum = " ); 
  Serial.print( fAccum );
  Serial.print( " took " ); 
  Serial.print( tookMics );
  Serial.print( " micros to add up" ); 
  Serial.println( );
  
} 

void loop( void )
{
}

Using unsigned int (word) trig values example for these 4-place values:

unsigned int Rise = (unsigned long) Radius * (unsigned long) table_value / 10000UL

The intermediate 32-bit precision is to avoid overflow. The final divide should keep the result 16-bit safe.
Keep Radius to 4 places or less or use 64-bit intermediate and 32-bit Rise.
You can get higher precision and accuracy than possible with floats in far less time.

If you need places to the right of the decimal point then work in smaller units and print the decimal as needed.
So for Radius = 5 to 3 decimal places with 30 degree sine, start with Radius * 1000 ( 1000 for 3 decimals ) .
(unsigned int) Rise = (unsigned long) 5000 * (unsigned long) 5000 / 10000UL = 2500U, print 2.500.

Nice work.

A word of caution: Serial is now interrupt driven. Be careful that you are benchmarking only your code instead of your code plus the interrupt service routine. Carefully placed calls to Serial.flush() solve the problem. Or, my preference, save the results in variables and only print after all tests are finished.

From what I have read, Serial output does not return until the last 2 bytes are in the USB chip.

That was true with older versions. The situation is now exactly the opposite. write simply queues the outbound byte and enables interrupts...
https://github.com/arduino/Arduino/blob/master/hardware/arduino/cores/arduino/HardwareSerial.cpp#L460

Changes made. Flush followed by delay before both timed loops.

But I must be doing something wrong because the table-lookup loop now takes 16 usecs longer though the float calculation method is shorter by 100's of usecs.

BTW,

Now 132 usecs for 91 lookup and add, 132 / 91 = 1.45 usec each on average.

Now 13848 usecs for 91 float sin() and add, 13848 / 91 = 152.18 usec each on average, over 100x as long!

Each are very minimal get and use value tests. Any more and floating point will rate even worse. :stuck_out_tongue:

GoForSmoke:
But I must be doing something wrong because the table-lookup loop now takes 16 usecs longer...

Which brings up the next caveat with performance benchmarking. To be able to compare one thing (table lookup) to another (sine) the optimizer has to start at the same place. Very likely some code above the table lookup section allowed the optimizer to take advantage of values already loaded in registers. (Or, the optimizer did something dumb in preparation of the code after the table lookup section.) In any case, the solution is to move each section of code to be bench-marked into a separate function with inlining disabled.

...though the float calculation method is shorter by 100's of usecs.

That's what happened when I made a similar change.

Now 132 usecs for 91 lookup and add, 132 / 91 = 1.45 usec each on average.
Now 13848 usecs for 91 float sin() and add, 13848 / 91 = 152.18 usec each on average, over 100x as long!

Those look like the same numbers I was getting.

I figure that running a function means pushing a return address on the stack and a jump on exit, minimum.

If nothing else, I think those results speak highly for integer table use. That's how old flight sims were even possible.

What I think needs tabling most is anything that gets around using type float. Trig functions being a place to save some major cycles, I should add a table of sine squared's and one of roots. Cosine is just Sine read backward, you get 2 tables for the price of 1, 3 times over with squares and roots added.

Any kind of of data can be PROGMEM'd, including const-data-only Class Objects. Use a PGM_P pointer then cast it as a pointer to the object. If your objects are always the same size, you can INDEX by sizeof(object) bytes.

Should I update it to handle 32-bit PROGMEM values and show a 64-bit intermediate math example?