clock cycles for math functions

Dear Sir

I am moving from using DSPs to using micros, and I am trying to squeeze the most out of their capabilities.
Looking at the forum and documentation, I cannot find a clock cycle list for basic functions,
e.g. sqrt, sin, cos etc.
For most DSPs you can get such a table.
E.g. a single atan on a TI67xx takes 167 clock cycles, but an array of atan with the correct lookup tables will enable a processing speed of 3.5 clock cycles per result (two solutions every 7 cycles) + 26 cycles overhead (at a cost of 2k RAM).
Where can I get such a table for the Arduino?

The Arduino UNO only has 2K of RAM, so lookup tables need to be "smart".

AFAIK there is no such table

  • with a simple sketch you can get an estimate
  • the sources of the math functions can give an indication ==> Arduino Forum thread (reply number five)
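The "simple sketch" route could look roughly like the following. It is a minimal sketch, assuming a 16 MHz UNO; `cyclesPerCall`, `CALLS` and `sink` are names made up for this illustration. The loop overhead and the float multiply are included in the measurement, so the printed number is an upper estimate rather than an exact cycle count.

```cpp
#include <stdint.h>
#include <math.h>

// Convert a measured time into clock cycles per call.
// cpuMHz is 16 on a stock UNO (assumption; adjust for other boards).
uint32_t cyclesPerCall(uint32_t elapsedMicros, uint32_t calls, uint32_t cpuMHz) {
  return (elapsedMicros * cpuMHz) / calls;
}

#if defined(ARDUINO)
#include <Arduino.h>

const uint32_t CALLS = 1000;
volatile float sink;   // volatile so the compiler cannot drop the calls

void setup() {
  Serial.begin(9600);
  uint32_t start = micros();
  for (uint32_t i = 0; i < CALLS; i++) {
    sink = sin(i * 0.001f);            // function under test
  }
  uint32_t elapsed = micros() - start;
  Serial.print("approx. cycles per sin(): ");
  Serial.println(cyclesPerCall(elapsed, CALLS, 16));
}

void loop() {}
#endif
```

Swapping `sin` for `sqrt`, `atan` and so on gives a rough home-made version of the table asked for. Note that `micros()` has a 4 microsecond resolution on a 16 MHz board, which is why the loop runs many calls.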

3000+ clocks. No particular optimizations for multiple values, as far as I know.
The AVR has no hardware support for trig functions. Nor even floating point. Nor even fixed point Divide.
If you need a table of FP math function efficiencies, you probably need a different CPU.

PhilippeRubbers:
I am moving from using DSP’s to using micros

Now that’s a challenge!!!

LUTs stored in EEPROM are, IMHO, an elegant solution. 32 kB on UNO.
"Progmem" will do the job.

Magician:
LUTs stored in EEPROM are, IMHO, an elegant solution. 32 kB on UNO.
"Progmem" will do the job.

I think you meant flash... although the 32kB are not all available, since the bootloader takes some of that and the user software takes another bit. But for a lookup table it should be good enough.

EEPROM would be too slow for something like this.

Reading data from a Flash lookup table is faster than reading it from an EEPROM lookup table? Are you certain?

It would be pretty close, but progmem is faster than reading the AVR's internal EEPROM. The EEPROM is treated as a peripheral, so you output the address to a couple of IO ports, twiddle some bits to start a read, and then find the result in another IO port. PROGMEM is actual memory (the PC has to access it anyway), so you just load the address into the Z register and execute an LPM instruction (there's even an auto-increment mode.)

Both would be pretty fast compared to the existing SW trig functions.

(I suspect the person who thought EEPROM was too slow was thinking of external, serial, EEPROM.)
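To make the two access paths concrete, here is a minimal sketch using the avr-libc calls `pgm_read_word` and `eeprom_read_word`. The table names and contents are made up for illustration, and the `#else` branch is only a host-side stand-in (an assumption, not part of avr-libc) so the same file compiles off-target.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#include <avr/eeprom.h>
#else
// Host stand-ins for illustration only: plain memory reads.
#define PROGMEM
#define EEMEM
#define pgm_read_word(addr) (*(const uint16_t *)(addr))
static inline uint16_t eeprom_read_word(const uint16_t *addr) { return *addr; }
#endif

// The same four values (sin of 0..3 degrees, times 1000) in both memories.
const uint16_t flashTable[4] PROGMEM = {0, 17, 35, 52};
uint16_t eepromTable[4] EEMEM = {0, 17, 35, 52};

// Flash: the address goes into Z and an LPM instruction fetches the data.
uint16_t readFromFlash(uint8_t i) {
  return pgm_read_word(&flashTable[i]);
}

// EEPROM: the library drives the EEPROM address/control/data registers.
uint16_t readFromEeprom(uint8_t i) {
  return eeprom_read_word(&eepromTable[i]);
}
```

One caveat, as far as I know: the `EEMEM` initialiser only reaches the chip if the generated .eep file is programmed as well, which the Arduino IDE does not do by default. Otherwise the values have to be written once with `eeprom_write_word`.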

Ah. There's the killer / leveler I missed...

When the EEPROM is read, the CPU is halted for four clock cycles before the next instruction is executed

Following westfw's link brought me to this (approximate) table: avr-libc: Benchmarks

Ah! Thanks for tracking that down!

I didn't know there was such a difference, but I meant
http://www.nongnu.org/avr-libc/user-manual/group__avr__pgmspace.html#ga75acaba9e781937468d0911423bc0c35
and it says: Flash ROM. :~
I'm still learning!

I was actually talking about the internal EEPROM. Also, you'd have only 1k of EEPROM on the UNO (4k on a Mega) instead of the 32k written in the post.

The way to write fast "signal processing" code on a device like Arduino is to try VERY HARD to stay AWAY from doing any floating point math at all. You're reading integers from input devices, and presumably writing integers to some output device, and converting to floating point in between is just a convenient crutch. (alas, VERY convenient. Or DSPs would never have added floating point.)
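As a small illustration of staying in integers, a fractional gain can be applied with a single 16x16 multiply and a shift. This is only a sketch: the Q15 format (16384 stands for 0.5, 32767 for just under 1.0) and the name `scaleQ15` are choices made for this example, not something from the thread.

```cpp
#include <stdint.h>

// Scale a sample by a fractional gain without floating point.
// gainQ15 is the gain multiplied by 32768, so 16384 means 0.5.
int16_t scaleQ15(int16_t sample, int16_t gainQ15) {
  int32_t product = (int32_t)sample * gainQ15;   // 16x16 -> 32-bit multiply
  // Drop the 15 fractional bits. Right-shifting a negative value is
  // arithmetic on avr-gcc (and gcc in general), which is assumed here.
  return (int16_t)(product >> 15);
}
```

For example, `scaleQ15(1000, 16384)` halves the sample and gives 500. The AVR has a hardware 8x8 multiplier, so this costs a few dozen cycles instead of the hundreds a float multiply needs.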

How would a look up table be used to replace trig functions?

Say you need sin(X) with one-degree precision (for the sake of the example)...

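int sinTable[360];   // named sinTable so it does not clash with the library sin()

sinTable[0] = 0;
sinTable[1] = 17;   // sin(1 degree) multiplied by 1000
sinTable[2] = 35;
...

sinTable[90] = 1000;

// and so on, and so on...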
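The table entries do not have to be typed in by hand. A rough sketch of a generator, assuming the same scale factor of 1000 (`buildSineTable` and `sinTable1000` are hypothetical names for this example):

```cpp
#include <stdint.h>
#include <math.h>

// Fill a 360-entry table with sin(angle) * 1000, one entry per degree.
void buildSineTable(int16_t sinTable1000[360]) {
  const double degToRad = 3.14159265358979 / 180.0;
  for (int deg = 0; deg < 360; deg++) {
    // lround() rounds to nearest, so sin(1 degree) * 1000 = 17.45 becomes 17.
    sinTable1000[deg] = (int16_t)lround(sin(deg * degToRad) * 1000.0);
  }
}
```

Run once at startup this costs 360 slow `sin()` calls and 720 bytes of the UNO's RAM, so in practice it is probably better to run it on a PC and paste the numbers into a constant table.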

There is no need for a 360-degree table ==> you only need a table sinus[91] for 0…90; all other angles can be mirrored.

(code not tested)

float sinus[91];   // 0..90 -- can be an int*1000 as bubulindo stated too, to make it faster, or even 0..100 so that it fits in one byte; depends on the precision needed.

float _sin(int x)  // float x allows interpolation; left as an exercise
{
  // handle negative values for x
  if (x < 0) return -_sin(-x);   // recurse into the lookup, not the library sin()

  // handle values of 360 and above
  if (x >= 360) x %= 360;  // the if avoids the expensive modulo when it is not needed

  switch (x / 90)  // which quadrant?
  {
    case 0: return sinus[x];
    case 1: return sinus[180 - x];
    case 2: return -sinus[x - 180];
    default: return -sinus[360 - x];  // quadrant 3; default makes every path return
  }
}

float _cos(int x)
{
  return _sin(x + 90);
}

float _tan(int x)
{
  return _sin(x) / _cos(x);  // divides by zero at 90 and 270 degrees, giving +/-infinity
}
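One possible take on the interpolation exercise, as a sketch only: the table is filled once with the library `sin()`, and `initSinus`, `quarterSin` and `sinInterp` are names invented for this example. It still uses floats, so it trades the advice about avoiding floating point for accuracy between whole degrees.

```cpp
#include <math.h>

float sinus[91];                       // quarter-wave table, 0..90 degrees

void initSinus() {                     // fill once at startup
  for (int d = 0; d <= 90; d++) {
    sinus[d] = sin(d * 3.14159265f / 180.0f);
  }
}

// Quarter-wave lookup with linear interpolation; expects 0 <= x <= 90.
static float quarterSin(float x) {
  int i = (int)x;                      // table entry just below x
  if (i >= 90) return sinus[90];       // no entry above 90 to interpolate towards
  float frac = x - i;                  // position between entry i and i + 1
  return sinus[i] + frac * (sinus[i + 1] - sinus[i]);
}

float sinInterp(float x) {             // x in degrees
  if (x < 0) return -sinInterp(-x);
  if (x >= 360) x = fmod(x, 360.0f);   // only pay for fmod when needed
  if (x <= 90)  return quarterSin(x);
  if (x <= 180) return quarterSin(180 - x);
  if (x <= 270) return -quarterSin(x - 180);
  return -quarterSin(360 - x);
}
```

With one-degree spacing the linear interpolation error stays well below 0.0001, which is usually more than the 8 or 10 bit inputs on these boards can make use of.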

I know... I was just showing how to set up one such table. You can also translate sin() into cos() with some arithmetic. But that wasn't the purpose.

sometimes I just lose myself in coding :wink:

I have a question for Experts in the area :slight_smile:
Tweaking integer FFT code (UNO board, ATmega328, 16 MHz), I couldn't get results any better than 23 ms (128-point calculation). I'm using a sine LUT, with pgm_read_word to get a value, and I know that the read happens 2 times in the inner loop, which executes 127 times.
That makes 254 reads overall.

When I drop the LUT reads altogether and just multiply by a dummy constant instead of the sine value:

wr =  5; //pgm_read_word(&Sinewave[j+N_WAVE/4]);
wi = 5; //-pgm_read_word(&Sinewave[j]);

the result is 9 ms. The question is:

  • isn't 14 ms / 254 ≈ 55 usec per pgm_read_word too much?
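Working from the two measurements in the post (23 ms with the table reads, 9 ms without), the cost blamed on each read can be put into clock cycles. A minimal sketch, with `cyclesPerRead` as a made-up helper name and 16 MHz taken from the post:

```cpp
#include <stdint.h>

// Cycles attributed to each table read, from two timings of the same loop:
// one with the reads, one with them replaced by a constant.
uint32_t cyclesPerRead(uint32_t withReadsUs, uint32_t withoutReadsUs,
                       uint32_t reads, uint32_t cpuMHz) {
  uint32_t extraUs = withReadsUs - withoutReadsUs;  // time blamed on the reads
  return (extraUs * cpuMHz) / reads;
}
```

`cyclesPerRead(23000, 9000, 254, 16)` comes out at roughly 880 cycles per read, while the LPM instruction itself takes 3 clock cycles on this core. So the reads alone are unlikely to explain the gap. One possible explanation (an assumption, not something measured here) is that with a constant 5 the compiler can simplify the multiplications in the inner loop, so the 9 ms version is doing less arithmetic as well as fewer reads.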