Pages: 1 [2]   Go Down
Author Topic: clock cycles for math functions  (Read 3330 times)
0 Members and 1 Guest are viewing this topic.
'round the world...
Offline Offline
Faraday Member
**
Karma: 42
Posts: 3291
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Say you need to know the sin(X), and you'll have a one degree (for the sake of the example) precision...
Code:
int sin[360];

sin[0] = 0;
sin[1] = 17; //multiplied by 1000
sin[2] = 35;
...

sin[90] = 1000;

//and so on, and so on...




Logged

Eu não sou o teu criado. Se respondo no fórum é para ajudar todos mediante a minha disponibilidade e disposição. Responder por mensagem pessoal iria contra o propósito do fórum e por isso evito-o.
Se realmente pretendes que eu te ajude por mensagem pessoal, então podemos chegar a um acordo e contrato onde me pagas pela ajuda que eu fornecer e poderás então definir os termos de confidencialidade do meu serviço. De forma contrária toda e qualquer ajuda que eu der tem de ser visível a todos os participantes do fórum (será boa ideia, veres o significado da palavra fórum).
Nota também que eu não me responsabilizo por parvoíces escritas neste espaço pelo que se vais seguir algo dito por mim, entende que o farás por tua conta e risco.

Dito isto, mensagens pessoais só se forem pessoais, ou seja, se já interagimos de alguma forma no passado ou se me pretendes convidar para uma churrascada com cerveja (paga por ti, obviamente).

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13915
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

no need for a 360 degree table ==> you need a table sinus[91]   0..90 all others can be mirrored.

(code not tested)
Code:
float sinus[91];   //0..90 -- can be an int*1000 as bubulindo stated too to make it faster or even  0..100 then it will fit in one byte, depends on the precission needed.

float _sin(int x)  // float x allows interpolation; left as an exercise
{
  // handle negative values for x
  if (x < 0) return -sin(-x);   

  // handle values above 360
  if (x >=360) x %= 360;  // if prevents the expensive modulo if not needed

  switch(x/90)  //which quadrant?
  {
    case 0: return sinus[x]; break;
    case 1: return sinus[180-x]; break;
    case 2: return -1 * sinus[x-180]; break;
    case 3: return -1 * sinus[360-x]; break;
  }
}
   
float _cos(int x)
{
  return sin(x + 90);
}

float _tan(int x)
{
  return sin(x)/cos(x);  // may return NaN  not a number
}
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

'round the world...
Offline Offline
Faraday Member
**
Karma: 42
Posts: 3291
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

I know... I was just showing how to set up one of such tables. You can also translate sin() into cos() with some arithmetic. But that wasn't the purpose.
Logged

Eu não sou o teu criado. Se respondo no fórum é para ajudar todos mediante a minha disponibilidade e disposição. Responder por mensagem pessoal iria contra o propósito do fórum e por isso evito-o.
Se realmente pretendes que eu te ajude por mensagem pessoal, então podemos chegar a um acordo e contrato onde me pagas pela ajuda que eu fornecer e poderás então definir os termos de confidencialidade do meu serviço. De forma contrária toda e qualquer ajuda que eu der tem de ser visível a todos os participantes do fórum (será boa ideia, veres o significado da palavra fórum).
Nota também que eu não me responsabilizo por parvoíces escritas neste espaço pelo que se vais seguir algo dito por mim, entende que o farás por tua conta e risco.

Dito isto, mensagens pessoais só se forem pessoais, ou seja, se já interagimos de alguma forma no passado ou se me pretendes convidar para uma churrascada com cerveja (paga por ti, obviamente).

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13915
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

sometimes I just lose myself in coding smiley-wink
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Montreal
Offline Offline
Faraday Member
**
Karma: 30
Posts: 2602
Per aspera ad astra.
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

I have a question for Experts in the area  smiley
Tweaking with integer FFT code (Board UNO, ATMega328, 16 MHz), I couldn't get results any better than 23 ms ( 128 points calculus ).  I'm using sine LUT, with pgm_read_word to get a value, and I know that reading happened 2 times in iner loop, which executed 127 times.
There are 254 readings overall.

When I shoot down reading LUT at all, and just multiply with dummy constant instead of sine:
Code:
     
wr =  5; //pgm_read_word(&Sinewave[j+N_WAVE/4]);
wi = 5; //-pgm_read_word(&Sinewave[j]);
result show 9 ms. The question is :
 - isn't it too much  16 ms / 254 = 63 usec per one pgm_read_word?


Logged

Global Moderator
Dallas
Offline Offline
Shannon Member
*****
Karma: 210
Posts: 13039
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset


Your comments remove more code than just the pgm_read_word.  Each piece can potentially generate code depending on the outer context.  In other words, each of these is a potential code generator...

&Sinewave
N_WAVE/4
j+
[j] /* can also include a multiplication */

wr =  5; //pgm_read_word(&Sinewave[j+N_WAVE/4]);
wi = 5; //-pgm_read_word(&Sinewave[j]);

In addition, if the outer code is "complex", various register values may have to be shuffled around (push / pop / move).

Quote
result show 9 ms. The question is : - isn't it too much  16 ms / 254 = 63 usec per one pgm_read_word?

= 1008 clock cycles.  Does seem a bit high.

In any case, that may be a good spot for some hand optimization.
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13915
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Is it possible to put the lookup table in RAM iso progmem and measure the timing.

Is your Sinewave table of the type float or int ?   // keep the math completely int or long if possible iso floats

How much entries are in the table?

j+N_WAVE/4
you could initialize a index k = j + NWAVE/4; and let it go with the flow:  e.g.  for (j=0; k = N_WAVE/k; j < 128; j++, k++).
Will not change very much but many drops make an ocean smiley-wink

Please post some code / at least the loop
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

SF Bay Area (USA)
Offline Offline
Tesla Member
***
Karma: 137
Posts: 6792
Strongly opinionated, but not official!
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

pgmspace reads are actually quite fast (the instructions are read from pgmspace, so it has to be fast!)

Try this example (carefully checked to make sure the compiler doesn't optimize away anything that shouldn't be optimized away, and organized to make the disassembly somewhat reasonable...)
Code:
#include <avr/pgmspace.h>

void setup()
{
  Serial.begin(19200);
}
volatile long a, b;

unsigned long *p = (unsigned long *)1000;

void loop()
{
  unsigned long i, ramstart, ramend, pgmstart, pgmend; 
  b = 12354;

  ramstart = millis();
  for (i=0; i < 100000; i++) {
    a = b;
  }
  ramend = millis();

    pgmstart = millis();
  for (i=0; i < 100000; i++) {
    a = *p++;
  }
  pgmend = millis();

  Serial.print("RAM Variable time: ");
  Serial.println(ramend - ramstart);
  Serial.print("pgmmem Variable time: ");
  Serial.println(pgmend - pgmstart);

  while (Serial.read() == -1)
    ;
  Serial.println();
  Serial.println();
}
Logged

Montreal
Offline Offline
Faraday Member
**
Karma: 30
Posts: 2602
Per aspera ad astra.
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Results for ram/pgm test
Quote
RAM Variable time: 182
pgmmem Variable time: 182
RAM Variable time: 182
pgmmem Variable time: 182
Thanks for your help!


« Last Edit: August 19, 2011, 06:21:13 pm by Magician » Logged

SF Bay Area (USA)
Offline Offline
Tesla Member
***
Karma: 137
Posts: 6792
Strongly opinionated, but not official!
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

I'm not familiar with FFT coding at all, but I rewrote your code slightly and gave it a try here, on an otherwise not-connected-to-anything Ardunio.

New code in the FFT loop:
Code:
  while (l < fft_n) {
    istep = l << 1;
    for (m=0; m<l; ++m) {
      j = m << k;
      /* 0 <= j < N_WAVE/2 */
      smart2++;

      //test2 - NO_PGM
#define TEST2
#ifdef TEST2
      wr =  smart2>>1; //pgm_read_word(&Sinewave[j+N_WAVE/4]);
      wi =  smart2>>1; //-pgm_read_word(&Sinewave[j]);
#else
      wr =  pgm_read_word(&Sinewave[j+N_WAVE/4]);
      wi = -pgm_read_word(&Sinewave[j]);
#endif
      //  wr >>= 1;
      //  wi >>= 1;

      for (i=m; i<fft_n; i+=istep) { //istep = 2, 4, 8, 16 ......
        j = i + l;

        //test 3 - No multiplication.
#ifdef TEST3
        tr = smart2; //FIX_MPY(wr,fr[j]) - FIX_MPY(wi,fi[j]);
        ti = smart2; //FIX_MPY(wr,fi[j]) + FIX_MPY(wi,fr[j]);'
#else
        tr = FIX_MPY(wr,fr[j]) - FIX_MPY(wi,fi[j]);
        ti = FIX_MPY(wr,fi[j]) + FIX_MPY(wi,fr[j]);
#endif
        qr = fr[i];
        qi = fi[i];

Is that what you meant to have happen?  I'm not getting quite the same results that you reported (perhaps because I don't have a signal source connected?)

no tests:    Time FFT calc.: 23696    Time SQRT calc.: 8040
test2:    Time FFT calc.: 13044    Time SQRT calc.: 7988

I suspect that rather than pgm memory being slow, what's happening is that the compiler is being especially clever in the test case, and managing to completely remove one of your inner loops (or similar.)  I'm willing to disassemble and check, if you confirm that I'm on the right track so far.
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13915
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
There are just 192 entries in table for now, would be more, so I couldn't move table in RAM - memory space is in high demand
Folding as I did above - makes it shorter, and you can leave out part of the code as your input will only be 0..360 if i'm right

The code that's left is shorter than writing out the complete table. As you use bytes the table costs 91 bytes of RAM.

Code:
int _sin(int x) 
{
  switch(x/90)  //which quadrant?
  {
    case 0: return sinus[x]; break;
    case 1: return sinus[180-x]; break;
    case 2: return -1 * sinus[x-180]; break;
    case 3: return -1 * sinus[360-x]; break;
  }
}
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Montreal
Offline Offline
Faraday Member
**
Karma: 30
Posts: 2602
Per aspera ad astra.
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

To westfw:  yes, you are on a right track. I forget to mention test #4: when I cut off pgm_, and
multiplication I did one more move: change wi and wr to char (it was int when I get 9 ms),
but I didn't change pgm_read_word to pgm_read_byte, as it still mark out in my test.
Probably, this is why it 13 ms in your results.
To robtillaart:  table include 0 .. 270 in 192 points, algorithm works only if quantity of elements in the sine table > than fx[]. As I'm gonna to extend fx[] to 1024, (removing x[] - debug intermediate) I'd met really tough memory conditions.
I 've been looking on i-net, and find a book "math toolkit for real time programming", they folded
sine wave even more that 4x, up to 16x (0.. 22.5 degree) smiley
Logged

SF Bay Area (USA)
Offline Offline
Tesla Member
***
Karma: 137
Posts: 6792
Strongly opinionated, but not official!
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

first thing noticed in analysis:
Without the pgm_read_word code, the resulting fix_fft() function ends up only containing two calls to the __mulsi3 function (internal multiply helper.)
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13915
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
I 've been looking on i-net, and find a book "math toolkit for real time programming", they folded
sine wave even more that 4x, up to 16x (0.. 22.5 degree)

yes I knew that can be done but the profit of the folding should make up the extra code for folding, that's why I never exceeded 4 times folding. 8x folding or more would need the same amount of bytes if you want to have an accuracy of 1 degree: Furthermore sin ( 2A ) = 2 sin A cos A => Two lookups. For 16x folding you need even more AFAIK.

But if the accuracy needed is less, very true
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Montreal
Offline Offline
Faraday Member
**
Karma: 30
Posts: 2602
Per aspera ad astra.
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

The more weird results on testing PROGMEM:
1) If I declare array with : const prog_int8_t Sinewave[N_WAVE-N_WAVE/4] PROGMEM = {
    and read a value by : wr =  pgm_read_word(&Sinewave[j+N_WAVE/4]);
    I'm good, even there is obvious mismatch int8_t ( byte ) and pgm_read_word ( integer ).
    If I try to read with pgm_read_byte - everything goes wrong.

2) Declaring array with : const prog_int16_t Sinewave[N_WAVE-N_WAVE/4] PROGMEM = {
    and read a value by : wr =  pgm_read_word -  results goes crazy again.
 
Why?  I could not find an implementation of pgm_read_word, is there any conversion between
 integer and string / Cstring / LPSTR  goes on? It would explain a delay in the access time.

 
 
Logged

Pages: 1 [2]   Go Up
Jump to: