16-bit FFT on Arduino

Hello.
I'm trying to modify an existing 8-bit FFT library to work on my Arduino Duemilanove. The 8-bit FFT library I'm currently using is this:
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1286718155

In 8-bits it works perfectly for what I need it. 256Hz sampling rate, 256 samples (resulting in 1Hz resolution), once an FFT is performed it's sent to the serial output.

The algorithm there is very common in it's 16-bit version. The only differences I can spot are the data types and a shift in the fixed-point multiplication.
When I try the project with the 16-bit version the output of the FFT is just some noise, all values in the data array are 1 or 0, regardless of the input. It acts just like it has no input.

The only debugging I did was to check if the look-up table is read correctly, and it is. Any hints on what is the problem when trying to implement the 16-bit FFT ?

8-bit obviously doesn't allow values higher than 256 and I'm trying to make full use of the ADC range, so 16-bit value is what I need.

I had to break up my post in two parts since it exceeds the character limit. Sorry !
This is the 16-bit version of the code:

/* fix_fft.c - Fixed-point in-place Fast Fourier Transform  */
 

#define N_WAVE      1024    /* full length of Sinewave[] */
#define LOG2_N_WAVE 10      /* log2(N_WAVE) */

short Sinewave[N_WAVE-N_WAVE/4] = {
      0,    201,    402,    603,    804,   1005,   1206,   1406,
   1607,   1808,   2009,   2209,   2410,   2610,   2811,   3011,
   3211,   3411,   3611,   3811,   4011,   4210,   4409,   4608,
   4807,   5006,   5205,   5403,   5601,   5799,   5997,   6195,
   6392,   6589,   6786,   6982,   7179,   7375,   7571,   7766,
   7961,   8156,   8351,   8545,   8739,   8932,   9126,   9319,
   9511,   9703,   9895,  10087,  10278,  10469,  10659,  10849,
  11038,  11227,  11416,  11604,  11792,  11980,  12166,  12353,
  12539,  12724,  12909,  13094,  13278,  13462,  13645,  13827,
  14009,  14191,  14372,  14552,  14732,  14911,  15090,  15268,
  15446,  15623,  15799,  15975,  16150,  16325,  16499,  16672,
  16845,  17017,  17189,  17360,  17530,  17699,  17868,  18036,
  18204,  18371,  18537,  18702,  18867,  19031,  19194,  19357,
  19519,  19680,  19840,  20000,  20159,  20317,  20474,  20631,
  20787,  20942,  21096,  21249,  21402,  21554,  21705,  21855,
  22004,  22153,  22301,  22448,  22594,  22739,  22883,  23027,
  23169,  23311,  23452,  23592,  23731,  23869,  24006,  24143,
  24278,  24413,  24546,  24679,  24811,  24942,  25072,  25201,
  25329,  25456,  25582,  25707,  25831,  25954,  26077,  26198,
  26318,  26437,  26556,  26673,  26789,  26905,  27019,  27132,
  27244,  27355,  27466,  27575,  27683,  27790,  27896,  28001,
  28105,  28208,  28309,  28410,  28510,  28608,  28706,  28802,
  28897,  28992,  29085,  29177,  29268,  29358,  29446,  29534,
  29621,  29706,  29790,  29873,  29955,  30036,  30116,  30195,
  30272,  30349,  30424,  30498,  30571,  30643,  30713,  30783,
  30851,  30918,  30984,  31049,  31113,  31175,  31236,  31297,
  31356,  31413,  31470,  31525,  31580,  31633,  31684,  31735,
  31785,  31833,  31880,  31926,  31970,  32014,  32056,  32097,
  32137,  32176,  32213,  32249,  32284,  32318,  32350,  32382,
  32412,  32441,  32468,  32495,  32520,  32544,  32567,  32588,
  32609,  32628,  32646,  32662,  32678,  32692,  32705,  32717,
  32727,  32736,  32744,  32751,  32757,  32761,  32764,  32766,
  32767,  32766,  32764,  32761,  32757,  32751,  32744,  32736,
  32727,  32717,  32705,  32692,  32678,  32662,  32646,  32628,
  32609,  32588,  32567,  32544,  32520,  32495,  32468,  32441,
  32412,  32382,  32350,  32318,  32284,  32249,  32213,  32176,
  32137,  32097,  32056,  32014,  31970,  31926,  31880,  31833,
  31785,  31735,  31684,  31633,  31580,  31525,  31470,  31413,
  31356,  31297,  31236,  31175,  31113,  31049,  30984,  30918,
  30851,  30783,  30713,  30643,  30571,  30498,  30424,  30349,
  30272,  30195,  30116,  30036,  29955,  29873,  29790,  29706,
  29621,  29534,  29446,  29358,  29268,  29177,  29085,  28992,
  28897,  28802,  28706,  28608,  28510,  28410,  28309,  28208,
  28105,  28001,  27896,  27790,  27683,  27575,  27466,  27355,
  27244,  27132,  27019,  26905,  26789,  26673,  26556,  26437,
  26318,  26198,  26077,  25954,  25831,  25707,  25582,  25456,
  25329,  25201,  25072,  24942,  24811,  24679,  24546,  24413,
  24278,  24143,  24006,  23869,  23731,  23592,  23452,  23311,
  23169,  23027,  22883,  22739,  22594,  22448,  22301,  22153,
  22004,  21855,  21705,  21554,  21402,  21249,  21096,  20942,
  20787,  20631,  20474,  20317,  20159,  20000,  19840,  19680,
  19519,  19357,  19194,  19031,  18867,  18702,  18537,  18371,
  18204,  18036,  17868,  17699,  17530,  17360,  17189,  17017,
  16845,  16672,  16499,  16325,  16150,  15975,  15799,  15623,
  15446,  15268,  15090,  14911,  14732,  14552,  14372,  14191,
  14009,  13827,  13645,  13462,  13278,  13094,  12909,  12724,
  12539,  12353,  12166,  11980,  11792,  11604,  11416,  11227,
  11038,  10849,  10659,  10469,  10278,  10087,   9895,   9703,
   9511,   9319,   9126,   8932,   8739,   8545,   8351,   8156,
   7961,   7766,   7571,   7375,   7179,   6982,   6786,   6589,
   6392,   6195,   5997,   5799,   5601,   5403,   5205,   5006,
   4807,   4608,   4409,   4210,   4011,   3811,   3611,   3411,
   3211,   3011,   2811,   2610,   2410,   2209,   2009,   1808,
   1607,   1406,   1206,   1005,    804,    603,    402,    201,
      0,   -201,   -402,   -603,   -804,  -1005,  -1206,  -1406,
  -1607,  -1808,  -2009,  -2209,  -2410,  -2610,  -2811,  -3011,
  -3211,  -3411,  -3611,  -3811,  -4011,  -4210,  -4409,  -4608,
  -4807,  -5006,  -5205,  -5403,  -5601,  -5799,  -5997,  -6195,
  -6392,  -6589,  -6786,  -6982,  -7179,  -7375,  -7571,  -7766,
  -7961,  -8156,  -8351,  -8545,  -8739,  -8932,  -9126,  -9319,
  -9511,  -9703,  -9895, -10087, -10278, -10469, -10659, -10849,
 -11038, -11227, -11416, -11604, -11792, -11980, -12166, -12353,
 -12539, -12724, -12909, -13094, -13278, -13462, -13645, -13827,
 -14009, -14191, -14372, -14552, -14732, -14911, -15090, -15268,
 -15446, -15623, -15799, -15975, -16150, -16325, -16499, -16672,
 -16845, -17017, -17189, -17360, -17530, -17699, -17868, -18036,
 -18204, -18371, -18537, -18702, -18867, -19031, -19194, -19357,
 -19519, -19680, -19840, -20000, -20159, -20317, -20474, -20631,
 -20787, -20942, -21096, -21249, -21402, -21554, -21705, -21855,
 -22004, -22153, -22301, -22448, -22594, -22739, -22883, -23027,
 -23169, -23311, -23452, -23592, -23731, -23869, -24006, -24143,
 -24278, -24413, -24546, -24679, -24811, -24942, -25072, -25201,
 -25329, -25456, -25582, -25707, -25831, -25954, -26077, -26198,
 -26318, -26437, -26556, -26673, -26789, -26905, -27019, -27132,
 -27244, -27355, -27466, -27575, -27683, -27790, -27896, -28001,
 -28105, -28208, -28309, -28410, -28510, -28608, -28706, -28802,
 -28897, -28992, -29085, -29177, -29268, -29358, -29446, -29534,
 -29621, -29706, -29790, -29873, -29955, -30036, -30116, -30195,
 -30272, -30349, -30424, -30498, -30571, -30643, -30713, -30783,
 -30851, -30918, -30984, -31049, -31113, -31175, -31236, -31297,
 -31356, -31413, -31470, -31525, -31580, -31633, -31684, -31735,
 -31785, -31833, -31880, -31926, -31970, -32014, -32056, -32097,
 -32137, -32176, -32213, -32249, -32284, -32318, -32350, -32382,
 -32412, -32441, -32468, -32495, -32520, -32544, -32567, -32588,
 -32609, -32628, -32646, -32662, -32678, -32692, -32705, -32717,
 -32727, -32736, -32744, -32751, -32757, -32761, -32764, -32766,
};


inline short FIX_MPY(short a, short b)
{
	/* shift right one less bit (i.e. 15-1) */
	int c = ((int)a * (int)b) >> 14;
	/* last bit shifted out = rounding-bit */
	b = c & 0x01;
	/* last shift + rounding bit */
	a = (c >> 1) + b;
	return a;
}


int fix_fft(short fr[], short fi[], short m, short inverse)
{
	int mr, nn, i, j, l, k, istep, n, scale, shift;
	short qr, qi, tr, ti, wr, wi;

	n = 1 << m;

	/* max FFT size = N_WAVE */
	if (n > N_WAVE)
		return -1;

	mr = 0;
	nn = n - 1;
	scale = 0;

	/* decimation in time - re-order data */
	for (m=1; m<=nn; ++m) {
		l = n;
		do {
			l >>= 1;
		} while (mr+l > nn);
		mr = (mr & (l-1)) + l;

		if (mr <= m)
			continue;
		tr = fr[m];
		fr[m] = fr[mr];
		fr[mr] = tr;
		ti = fi[m];
		fi[m] = fi[mr];
		fi[mr] = ti;
	}

	l = 1;
	k = LOG2_N_WAVE-1;
	while (l < n) {
		if (inverse) {
			/* variable scaling, depending upon data */
			shift = 0;
			for (i=0; i<n; ++i) {
				j = fr[i];
				if (j < 0)
					j = -j;
				m = fi[i];
				if (m < 0)
					m = -m;
				if (j > 16383 || m > 16383) {
					shift = 1;
					break;
				}
			}
			if (shift)
				++scale;
		} else {
			/*
			  fixed scaling, for proper normalization --
			  there will be log2(n) passes, so this results
			  in an overall factor of 1/n, distributed to
			  maximize arithmetic accuracy.
			*/
			shift = 1;
		}
		/*
		  it may not be obvious, but the shift will be
		  performed on each data point exactly once,
		  during this pass.
		*/
		istep = l << 1;
		for (m=0; m<l; ++m) {
			j = m << k;
			/* 0 <= j < N_WAVE/2 */
			wr =  Sinewave[j+N_WAVE/4];
			wi = -Sinewave[j];
			if (inverse)
				wi = -wi;
			if (shift) {
				wr >>= 1;
				wi >>= 1;
			}
			for (i=m; i<n; i+=istep) {
				j = i + l;
				tr = FIX_MPY(wr,fr[j]) - FIX_MPY(wi,fi[j]);
				ti = FIX_MPY(wr,fi[j]) + FIX_MPY(wi,fr[j]);
				qr = fr[i];
				qi = fi[i];
				if (shift) {
					qr >>= 1;
					qi >>= 1;
				}
				fr[j] = qr - tr;
				fi[j] = qi - ti;
				fr[i] = qr + tr;
				fi[i] = qi + ti;
			}
		}
		--k;
		l = istep;
	}
	return scale;
}

if I count it correctly you declare an array of 96x8=768 16 bits entities.

You only have 512 bytes = 256 shorts at your disposal in a duemilanuove, I thought.

maybe you can put the array into program memory, which is far bigger. you need a special trick for that.
but I cannot recall what it was exactly. do a search.

Regards,
Ronald

The trick is, ATMega doesn't execute MULS_16x16_32, it does MULS_16x16_16, and of course, it always made an error "overflow". It took me a while to figure this out, I think there is a compiler flow.
The way around, is to cast both multiplier and multiplicand to long, to force MULS_32x32_32 operation.
The draw - back is 10 MULS_8x8_16, instead of 4 required for this in normal style.
Check on a link, I did a project with the same 16-bit code:
http://fftarduino.blogspot.com/2011/02/color-organ-spectrum-analyzer-on.html

And as raalst pointed out, you have to use a PROGMEM for LUT sin table.

But the ATmega328 has 2K of RAM = 2048 bytes... at least my 2009 board does. :grin:

Wk

Progmem store a data (sin table) to code memory space (flash), 32k
LUT is very convenient way to improve math performance on 8-bit controllers, I used progmem also for Hamming and Bit-Reverse tables in other my projects ( N=512 FFT ).
For OP, actually, he doesn't need high speed math, for 1 sec using regular sin function code is just fine:

int sin_value  = 16383 *sin(( angle * frequecy *   2 * 3.1415926535  / FFT_SIZE)
int cos_value  = 16383 *cos(( angle * frequecy *   2 * 3.1415926535  / FFT_SIZE)

to be optimized to

float alpha = angle * frequecy *   2 * 3.1415926535  / FFT_SIZE;
int sin_value  = 16383 *sin(alpha);
int cos_value  = 16383 *cos(alpha);

I've written the LUT in the program memory, the 16-bit code posted is the original. Also loaded the values from the LUT correctly.

Seems like the elm-chan has some LUT's for 256 points so I will use them to save space (no need for larger number of FFT points for now).

Magician, THANK YOU !
Your project was in my bookmarks but I never thought about looking at the code.

Will try it in the next few days and report back. It's weird that it doesn't do MULS_16x16_32 since it does 8x8_16...
Anyway, I don't need dramatic speed so if it works, it's good. Precision is what I need :slight_smile:

Thank you again !

elm-chan's sine table is combination of sine / cos, so you will to have to edit it.
Other option would be generate table with excell / libreOffice or load your arduino with simple sketch and it'd print out on serial monitor for you.

    for(int i=0; i<192; i++) // <<<--- size
    {
      printf("%+5d",(int)((16383 * sin(2*M_PI * i / 256))  ));
       if (i<192-1)     printf(",");
       if ((i+1)%8 == 0) printf("\n");
    }

It's easier for me to write a C program to output the values in a file, thanks for the idea :slight_smile:

I can't remember now what the LUT values are in elm-chan's file, it was late and I just glimpsed it. If they are mixed, cos just trails sin by 90* so it's not a big problem to extract the values.

For what it is worth, I wrote an integer isin() version some time ago which uses not so much memory (362 bytes of RAM) as the sine() Lookup table can be folded - Arduino Forum -

A few versions with quite an high precision in the end...

usable for an FFT ?

Magician, if I could, I'd give you a few rounds of beer. THANKS !

After changing everything to force 32x32=32 multiplication and using the LUT built from your code snippet it works :slight_smile:

Horray !

Thanks, glad to hear it helps.
What application you are working on?

Static sound detection, as in broken fans, broken bearings etc
Frequency precision is key.

I'm trying to understand more about FFT in general. Do you guys recommend any material to help us better grasp what it does?

IMHO, the most popular:

Why do you need such analysis is another question.

Oilburner could you post your working code?
Thanks
Patgadget
Montreal

Patgadget:
Oilburner could you post your working code?
Thanks
Patgadget
Montreal

This is the fix_fft.cpp file you can use in the library

#include <avr/pgmspace.h>
#include "fix_fft.h"
#include <WProgram.h>


 #define N_WAVE          256    /* full length of Sinewave[] */
 #define LOG2_N_WAVE     8      /* log2(N_WAVE) */
/*
  Since we only use 3/4 of N_WAVE, we define only
  this many samples, in order to conserve data space.
*/

const prog_int16_t Sinewave[N_WAVE-N_WAVE/4] PROGMEM = {
    0,  402,  803, 1205, 1605, 2005, 2403, 2800, 3196, 3589, 3980, 
	4369, 4755, 5139, 5519, 5896, 6269, 6639, 7004, 7365, 7722, 8075, 
	8422, 8764, 9101, 9433, 9759,10079,10393,10700,11002,11296,11584,
	11865,12139,12405,12664,12915,13158,13394,13621,13841,14052,14254,
	14448,14633,14810,14977,15135,15285,15425,15556,15677,15789,15892,
	15984,16068,16141,16205,16259,16304,16338,16363,16378,16383,16378,
	16363,16338,16304,16259,16205,16141,16068,15984,15892,15789,15677,
	15556,15425,15285,15135,14977,14810,14633,14448,14254,14052,13841,
	13621,13394,13158,12915,12664,12405,12139,11865,11584,11296,11002,
	10700,10393,10079, 9759, 9433, 9101, 8764, 8422, 8075, 7722, 7365, 
	7004, 6639, 6269, 5896, 5519, 5139, 4755, 4369, 3980, 3589, 3196, 
	2800, 2403, 2005, 1605, 1205,  803,  402,    0, -402, -803,-1205,
	-1605,-2005,-2403,-2800,-3196,-3589,-3980,-4369,-4755,-5139,-5519,
	-5896,-6269,-6639,-7004,-7365,-7722,-8075,-8422,-8764,-9101,-9433,
	-9759,-10079,-10393,-10700,-11002,-11296,-11584,-11865,-12139,-12405,
	-12664,-12915,-13158,-13394,-13621,-13841,-14052,-14254,-14448,-14633,
	-14810,-14977,-15135,-15285,-15425,-15556,-15677,-15789,-15892,-15984,
	-16068,-16141,-16205,-16259,-16304,-16338,-16363,-16378
};

/*
  fix_fft() - perform forward/inverse fast Fourier transform.
  fr[n],fi[n] are real and imaginary arrays, both INPUT AND
  RESULT (in-place FFT), with 0 <= n < 2**m; set inverse to
  0 for forward transform (FFT), or 1 for iFFT.
*/

inline int FIX_MPY(int a, int b)
{

    /* shift right one less bit (i.e. 15-1) */
    long c = ((long)a * (long)b) >> 14;
    /* last bit shifted out = rounding-bit */
    b = c & 0x01;
    /* last shift + rounding bit */
    a = (c >> 1) + b;

    return a;
}


int fix_fft(int fr[], int fi[], int m)
{
    int mr, nn, i, j, l, k, istep, n, scale, shift;
    int qr, qi, tr, ti, wr, wi;

    n = 1 << m;

    /* max FFT size = N_WAVE */
    if (n > N_WAVE)
	  return -1;

    mr = 0;
    nn = n - 1;
    scale = 0;

    /* decimation in time - re-order data */
    for (m=1; m<=nn; ++m) {
	  l = n;
	  do 
	  {
		l >>= 1;
	  } while (mr+l > nn);
	  mr = (mr & (l-1)) + l;
	  if (mr <= m)
		continue;
		
	  tr = fr[m];
	  fr[m] = fr[mr];
	  fr[mr] = tr;
	  ti = fi[m];
	  fi[m] = fi[mr];
	  fi[mr] = ti;
    }
	
    l = 1;
    k = LOG2_N_WAVE-1;
    while (l < n) 
	{
	  /*
	    it may not be obvious, but the shift will be
	    performed on each data point exactly once,
	    during this pass.
	  */
	  istep = l << 1;
	  for (m=0; m<l; ++m) 
	  {
		j = m << k;
		/* 0 <= j < N_WAVE/2 */
		wr =  pgm_read_word_near(Sinewave + j+N_WAVE/4);
		wi = -pgm_read_word_near(Sinewave + j);
		wr >>= 1;
		wi >>= 1;
		for (i=m; i<n; i+=istep) 
		{
		    j = i + l;
			//tr = ((long)wr*(long)fr[j] - (long)wi*(long)fi[j])>>15;
			//ti = ((long)wr*(long)fi[j] + (long)wi*(long)fr[j])>>15;
			tr = FIX_MPY(wr,fr[j]) - FIX_MPY(wi,fi[j]);
		    ti = FIX_MPY(wr,fi[j]) + FIX_MPY(wi,fr[j]);
		    qr = fr[i];
		    qi = fi[i];
			qr >>= 1;
			qi >>= 1;
		    fr[j] = qr - tr;
		    fi[j] = qi - ti;
		    fr[i] = qr + tr;
		    fi[i] = qi + ti;
		}
	  }
	  --k;
	  l = istep;
    }
    return scale;
}

AWOL:
Cooley–Tukey FFT algorithm - Wikipedia

That's the algorithm behind the code I'm using :slight_smile: