Arduino FFT code not expected results...

Hello everyone,
I'm trying to make a sketch for Arduino to do the FFT of an 8 sample signal. I looked up on the internet on how to do FFT and came across dozens of webpages explaining what a "butterfly" is. I tried to code up a sketch to do FFT, but I'm not quite sure if it is working or not. When I put certain sinewaves directly in the input[] value, FFT returns expected results, sometimes...
It took me a while just to grasp the basic concept of FFT over DFT, so I don't even know if my math here is right! But here is my code:

#define FFTSIZE 8

float input[FFTSIZE];
float output[FFTSIZE];

const float sina[FFTSIZE] = {0, 0.71, 1, 0.71, 0, -0.71, -1, -0.71};
const float cosa[FFTSIZE] = {1, 0.71, 0, -0.71, -1, -0.71, 0, 0.71};

void setup() {
  Serial.begin(9600);
  for(int counter = 0; counter < FFTSIZE; counter++) {
    input[counter] = sin(2 * PI * 1 * counter / FFTSIZE);
    Serial.println(input[counter]);
  }
  Serial.println("FFT Values Return:");
}

void loop() {
  FFT();
  Serial.println();
  for(int counter = 0; counter < FFTSIZE; counter++) {
    Serial.println(output[counter]);
  }
  while(1);
}

void FFT() {
  float X[FFTSIZE];
  float Y[FFTSIZE];
  float Z[FFTSIZE];
  // buterfly 2
  X[0] = input[0] + input[4];
  X[1] = input[0] - input[4];
  X[2] = input[1] + input[5];
  X[3] = input[1] - input[5];
  X[4] = input[2] + input[6];
  X[5] = input[2] - input[6];
  X[6] = input[3] + input[7];
  X[7] = input[3] - input[7];
  // butterfly 4
  Y[0] = X[0] + X[2];
  Y[1] = X[1] + X[3];
  Y[2] = X[0] - X[2];
  Y[3] = X[1] - X[3];
  Y[4] = X[4] + X[6];
  Y[5] = X[5] + X[7];
  Y[6] = X[4] - X[6];
  Y[7] = X[5] - X[7];
  // final butterfly 8
  Z[0] = Y[0] + Y[4];
  Z[1] = Y[1] + (Y[5] * (sina[1] + cosa[1]));
  Z[2] = Y[2] + Y[6];
  Z[3] = Y[3];
  Z[4] = Y[0] - Y[4];
  Z[5] = Y[1] - (Y[5] * (sina[1] + cosa[1]));
  Z[6] = Y[2] - Y[6];
  Z[7] = Y[3];
  
  for(int counter = 0; counter < FFTSIZE; counter++) {
    if(Z[counter] < 0) Z[counter] = 0;
    output[counter] = Z[counter];
  }
}

Any help would be great, thanks.
Judd

When I put certain sinewaves directly in the input[] value, FFT returns expected results,

So what is the problem?

sometimes..

That will be due to the input waveform not being synchronized with a whole number of the sampling rate.

There is a number of things wrong with it. Here, I will show you the results of sinewaves in the serial terminal:

FFT Values Return:
K = 1
0.00        // input
0.71
1.00
0.71
-0.00
-0.71
-1.00
-0.71

0.00         // output
6.26
0.00
0.00
0.00
0.00
0.00
0.00


K = 2
0.00
1.00
-0.00
-1.00
0.00
1.00
-0.00
-1.00

0.00
0.00
0.00
0.00
4.00
0.00
0.00
0.00


K = 3
0.00
0.71
-1.00
0.71
-0.00
-0.71
1.00
-0.71

0.00
0.58
0.00
0.00
0.00
2.25
0.00
0.00

Also, cosine waveforms produce wacky results:

FFT Values Return:
K = 1
1.00
0.71
-0.00
-0.71
-1.00
-0.71
0.00
0.71

0.00
1.41
0.00
0.59
0.00
5.42
0.00
0.59


K = 2
1.00
-0.00
-1.00
0.00
1.00
-0.00
-1.00
0.00

0.00
0.00
0.00
0.00
4.00
0.00
4.00
0.00


K = 3
1.00
-0.71
0.00
0.71
-1.00
0.71
-0.00
-0.71

0.00
2.59
0.00
3.41
0.00
0.00
0.00
3.41

That will be due to the input waveform not being synchronized with a whole number of the sampling rate.

Could you explain the synchronized part a little more? I don't think I have a sampling rate in my sketch, I haven't hooked up any analog sensors yet.
Thanks,
Judd

I don't think your math is correct. For example, this line does not look right:

Z[1] = Y[1] + (Y[5] * (sina[1] + cosa[1]));

Compare your approach with that here: ArduinoFFT - Open Music Labs Wiki

What this line Z[1] = Y[1] + (Y[5] * (sina[1] + cosa[1])); is supposed to do is use sina[1] and cosa[1] to form e^2*PI*k*n/Nwhich is the complex exponential required for FFT. David Dorran said that the e^... value is the sum of sin (Wt)+ cos(Wt):

I got my equations from this video:

Skip to 7:57 to see 8 sample butterfly.

The Arduino FFT library is written in assembler... I have no background in that, but I will keep trying to make sense of the commands.
Thanks,
Judd

You can't add the sine and cosine terms like that. They form the two parts of a complex number and have to be treated separately as two real numbers (or you must use the complex variable type in the C/C++ programming language).

I looked around and could not find a good tutorial on the FFT for Arduino. There are many good general tutorials, but a reasonable base for comparison is the Arduino FIX_FFT (fixed point) code. It is simple and fast and written in plain C. You can find the code and a discussion of some examples of its use at this site: http://members.shaw.ca/el.supremo/Arduino/fft/ffttest.htm

This is the heart of the FFT algorithm (the wr and wi are the real cosine and imaginary sine terms, FIX_MPY is a multiply operation). Just look at the symmetry of the operation.

		for (i=m; i<n; i+=istep) {
		    j = i + l;
		    tr = FIX_MPY(wr,fr[j]) - FIX_MPY(wi,fi[j]);
		    ti = FIX_MPY(wr,fi[j]) + FIX_MPY(wi,fr[j]);
		    qr = fr[i];
		    qi = fi[i];
		    if (shift) {
			  qr >>= 1;
			  qi >>= 1;
		    }
		    fr[j] = qr - tr;
		    fi[j] = qi - ti;
		    fr[i] = qr + tr;
		    fi[i] = qi + ti;
		}

Is what I put in comments basically correct?

for(i = m; i < n; i += istep) {
  j = i + l;
  tr = (wr * fr[j]) - (wi * fi[j]);  // temp[0] = (cosine * inputReal[odd]) - (sine * inputImag[odd])
  ti = (wr * fi[j]) + (wi * fr[j]);  // temp[1] = (cosine * inputImag[odd]) + (sine * inputReal[odd])
  qr = fr[i];                        // temp[2] = inputReal[even]
  qi = fi[i];                        // temp[3] = inputImag[even]
  if(shift) {
    qr >>= 1;
    qi >>= 1;
  }
  fr[j] = qr - tr;                   // outputReal[odd] = temp[2] - temp[0]
  fi[j] = qi - ti;                   // outputImag[odd] = temp[3] - temp[1]
  fr[i] = qr + tr;                   // outputReal[even] = temp[2] + temp[0]
  fi[i] = qi + ti;                   // outputImag[even] = temp[3] + temp[1]
}

If so, can I translate this:

Z[1] = Y[1] + (Y[5] * (sina[1] + cosa[1]));

into this:

Z[1] = (Y[1] * cosa[1]) + (Y[5] * sina[1]);

Thanks
Judd

That looks better.
tr, ti, qr and qi are the real and imaginary parts of two temporary variables in the original code.

OK great! Now I should do this for all other lines of the fft code, since they all use sina[] and cosa[] in some way, I just replaced the values at the beginning to make the code simpler. I would replace those lines with the new lines right?

Thanks for all your help!
Judd

Yes, it appears that you optimized away some multiply operations by sines and cosines that happened to be +1 or -1. The code is not easy to follow when you do that.

Sorry about that...
I have one more question however. Is there any way to calculate the phase shift of the signal? Like in DFT, when you have the seperate sides of the equation dealing with sin and cos. Is that possible here in FFT?
Thanks,
Judd

Yes. The amplitude of a particular frequency component is sqrt(rere + imim), where re and im are the real and imaginary components of the complex data array element at the frequency index of interest The phase angle in radians is atan2(im,re). When calculating amplitudes, don't forget that the FFT operation scales the transform (by a factor of N if I recall correctly).

It still doesn't seem to be working. I replaced all the lines of code with the new good ones. Here is the new and full code:

#define FFTSIZE 8
#define _PI 6.28

float input[FFTSIZE];
float output[FFTSIZE];

const float sina[FFTSIZE] = {
  0, 0.71, 1, 0.71, 0, -0.71, -1, -0.71
};
const float cosa[FFTSIZE] = {
  1, 0.71, 0, -0.71, -1, -0.71, 0, 0.71
};

void setup() {
  Serial.begin(9600);
}

void loop() {
  for(int count = 1; count <= 4; count++) {
    for(int counter = 0; counter < FFTSIZE; counter++) {
      input[counter] = sin(_PI * count * counter / FFTSIZE);
    }
    Serial.println();
    FFT();
    Serial.println("FFT Values Return:");
    for(int counter = 0; counter < FFTSIZE; counter++) {
      Serial.println(output[counter]);
    }
  }
  while(1);
}

void FFT() {
  float C[FFTSIZE];
  float S[FFTSIZE];
  float X[FFTSIZE];
  float Y[FFTSIZE];
  float Z[FFTSIZE];
  // buterfly 2
  C[0] = (input[0] * cosa[0]);
  C[1] = (input[0] * cosa[4]);
  C[2] = (input[1] * cosa[0]);
  C[3] = (input[1] * cosa[4]);
  C[4] = (input[2] * cosa[0]);
  C[5] = (input[2] * cosa[4]);
  C[6] = (input[3] * cosa[0]);
  C[7] = (input[3] * cosa[4]);
  S[0] = (input[4] * sina[0]);
  S[1] = (input[4] * sina[4]);
  S[2] = (input[5] * sina[0]);
  S[3] = (input[5] * sina[4]);
  S[4] = (input[6] * sina[0]);
  S[5] = (input[6] * sina[4]);
  S[6] = (input[7] * sina[0]);
  S[7] = (input[7] * sina[4]);
  X[0] = C[0] + S[0];
  X[1] = C[1] - S[1];
  X[2] = C[2] + S[2];
  X[3] = C[3] - S[3];
  X[4] = C[4] + S[4];
  X[5] = C[5] - S[5];
  X[6] = C[6] + S[6];
  X[7] = C[7] - S[7];
  // butterfly 4
  Y[0] = (X[0] * cosa[0]) + (X[2] * sina[0]);
  Y[1] = (X[1] * cosa[2]) + (X[3] * sina[2]);
  Y[2] = (X[0] * cosa[0]) - (X[2] * sina[0]);
  Y[3] = (X[1] * cosa[2]) - (X[3] * sina[2]);
  Y[4] = (X[4] * cosa[0]) + (X[6] * sina[0]);
  Y[5] = (X[5] * cosa[2]) + (X[7] * sina[2]);
  Y[6] = (X[4] * cosa[0]) - (X[6] * sina[0]);
  Y[7] = (X[5] * cosa[2]) - (X[7] * sina[2]);
  // final butterfly 8
  output[0] = (Y[0] * cosa[0]) + (Y[4] * sina[0]);
  output[1] = (Y[1] * cosa[1]) + (Y[5] * sina[1]);
  output[2] = (Y[2] * cosa[2]) + (Y[6] * sina[2]);
  output[3] = (Y[3] * cosa[3]) + (Y[7] * sina[3]);
  output[4] = (Y[0] * cosa[0]) - (Y[4] * sina[0]);
  output[5] = (Y[1] * cosa[1]) - (Y[5] * sina[1]);
  output[6] = (Y[2] * cosa[2]) - (Y[6] * sina[2]);
  output[7] = (Y[3] * cosa[3]) - (Y[7] * sina[3]);
}

I get these results from the serial terminal, each time the loop repeats the input sinewave increases by 1 cycle over the 8 samples:

FFT Values Return:
0.00
-1.00
1.00
0.00
0.00
0.00
-1.00
-1.00

FFT Values Return:
0.00
-0.00
0.00
-1.42
0.00
-1.42
-0.00
-0.00

FFT Values Return:
0.00
-1.00
-1.00
-0.00
0.00
-0.00
1.00
-1.00

FFT Values Return:
0.00
-0.00
-0.00
0.00
0.00
0.00
0.00
-0.00

What's wrong with this?! Again, it seems as though it is working with some values, and not for others, specifically try #2.
Thanks,
Judd

Why don't you just get your code working with an FFT library first? Then try writing your own.

Pete

What's wrong with this?!

It is very unlikely that anyone on the forum will have the time and/or patience to go through your code line by line and figure out what the problem(s) is/are. You really need to bite the bullet and with the short example that you've got, for one simple input (I suggest a square wave of just 1's and -1s), go through the code line by line and verify that each line delivers the expected result according to theory. However, keep in mind that your example, even when working properly, will not be very accurate because the sine/cosine table is accurate to only two digits.

Alright thanks everyone, I'll try to use an already existing FFT library first, and when I understand it maybe I can try re-writing my own code.

If you don't want to use the assembly-coded library or the fixed point fft library, here is the floating point code I've been using for some time. It came out of "Numerical Recipes" and was implemented using WinAVR. This version is just a test example with a square wave input and the answer is correct to about 5 decimal places. Note the unusual Fortran-like indexing of the data array (begins with element 1), so to be safe allocate N+1 elements for data.

/*
 REAL FFT tests.
 atmega64 @ 11.059 MHz with 32k external sram
 910 milliseconds for 2048 point floating point transform (100 ms/256 points)
// relocated heap to external SRAM via linker command
// -Wl,--defsym=__heap_start=0x801100,--defsym=__heap_end=0x807fff
// also must include floating point printf and lib:
// -Wl,-u,vfprintf -lprintf_flt -lm

*/

#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <avr/pgmspace.h>
#define F_CPU 11059200UL
#include <util/delay.h>      


#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr
#define NP 256

///////////////////////////////////////////////////////
// Global Variables

float *data;	//allocated at runtime in main
int ndata=NP;	//size of transform buffer

// initialization sequence in .init3

void init_extSRAM(void) __attribute__((naked,section(".init3"))) ; 

void init_extSRAM(void) {
 //Enable external SRAM: set MCUCR, XMCRA and XMCRB for
// External SRAM page configuration: 0000h - 7FFFh
// Lower page wait state(s): None
// Upper page wait state(s): None
	MCUCR=0x80;
	XMCRA=0x00;    
	XMCRB &=~7 ;  //Make bits 0-2 of XMCRB low to enable all of external SRAM
} 

volatile unsigned short ticks=0;	//0.01 second clock ticks updated in ISR

//  delay for time_ms milliseconds by looping
// _delay_ms() has max delay about 13 ms @ 20 MHz

void delay_ms( unsigned short time_ms )
{
	unsigned int i;
	for ( i = 0; i < time_ms; i++ )  _delay_ms( 1 );
}

// global COM=0 or 1 specifies UART0 or UART1

int COM=1;

#include "uputget.c"

int uart_putchar(char c, FILE *stream) {
/*
 * Send character c to the LCD for use with printf
 */
   uputchar(c);
   return 0;
}

//setup outpuf file for printf

FILE uart_str = FDEV_SETUP_STREAM(uart_putchar, NULL, _FDEV_SETUP_WRITE);

/*
 * Timer0 overflow interrupt handler.  
 * provides global ticks of 10 ms
 */

ISR(TIMER0_OVF_vect)
{
	TCNT0=256-108;
	ticks++; 		
}

void timer_init(void) {

// set up Timer0 to provide 100 tick/sec

	TCCR0  = (7<<CS00);  //clock/1024
	TIMSK |= (1<<TOIE0); // enable int on overflow
	TCNT0=256-108; //exact divisor for 11.059 MHz
}

void get_data(void)
{
	int j;
	for (j=0;j<NP/2;j++) {
	data[j]= -1.0;
	data[j+NP/2]= 1.0;
	}
}
void four1(int nn, int isign)
{
	int n,mmax,m,j,istep,i;
	/*double*/
	float wtemp,wr,wpr,wpi,wi,theta;

	float tempr,tempi;

	n=nn << 1;
	j=1;
	for (i=1;i<n;i+=2) {
		if (j > i) {
			SWAP(data[j],data[i]);
			SWAP(data[j+1],data[i+1]);
		}
		m=n >> 1;
		while (m >= 2 && j > m) {
			j -= m;
			m >>= 1;
		}
		j += m;
	}
	mmax=2;
	while (n > mmax) {
		istep=mmax << 1;
		theta=isign*(6.2831853/mmax);
		wtemp=sin(0.5*theta);
		wpr = -2.0*wtemp*wtemp;
		wpi=sin(theta);
		wr=1.0;
		wi=0.0;
		for (m=1;m<mmax;m+=2) {
			for (i=m;i<=n;i+=istep) {
				j=i+mmax;
				tempr=wr*data[j]-wi*data[j+1];
				tempi=wr*data[j+1]+wi*data[j];
				data[j]=data[i]-tempr;
				data[j+1]=data[i+1]-tempi;
				data[i] += tempr;
				data[i+1] += tempi;
			}
			wr=(wtemp=wr)*wpr-wi*wpi+wr;
			wi=wi*wpr+wtemp*wpi+wi;
		}
		mmax=istep;
	}
}

void realft(int n, int isign)
{

	int i,i1,i2,i3,i4,np3;
	float c1=0.5,c2,h1r,h1i,h2r,h2i;
	/*double*/
	float wr,wi,wpr,wpi,wtemp,theta;

	theta=3.141592654/(float) (n>>1);
	if (isign == 1) {
		c2 = -0.5;
		four1(n>>1,1);
	} else {
		c2=0.5;
		theta = -theta;
	}
	wtemp=sin(0.5*theta);
	wpr = -2.0*wtemp*wtemp;
	wpi=sin(theta);
	wr=1.0+wpr;
	wi=wpi;
	np3=n+3;
	for (i=2;i<=(n>>2);i++) {
		i4=1+(i3=np3-(i2=1+(i1=i+i-1)));
		h1r=c1*(data[i1]+data[i3]);
		h1i=c1*(data[i2]-data[i4]);
		h2r = -c2*(data[i2]+data[i4]);
		h2i=c2*(data[i1]-data[i3]);
		data[i1]=h1r+wr*h2r-wi*h2i;
		data[i2]=h1i+wr*h2i+wi*h2r;
		data[i3]=h1r-wr*h2r+wi*h2i;
		data[i4] = -h1i+wr*h2i+wi*h2r;
		wr=(wtemp=wr)*wpr-wi*wpi+wr;
		wi=wi*wpr+wtemp*wpi+wi;
	}
	if (isign == 1) {
		data[1] = (h1r=data[1])+data[2];
		data[2] = h1r-data[2];
	} else {
		data[1]=c1*((h1r=data[1])+data[2]);
		data[2]=c1*(h1r-data[2]);
		four1(n>>1,-1);
	}
}

//////////////////////////////////////////////////////////////////////// 
int main ()
{	
	int i;
	DDRD |= 1;  	//set PORTD.0 = output
	PORTD &= ~1;  	//led on
	delay_ms(200);
	PORTD |= 1;	//led off
		
	uart_init();            
	stdout = &uart_str;         // associate stream with stdout
	timer_init();
	sei();

	printf("Real FFT 0.1\n\r");   // announce string

	if ( (data = (float *)malloc( (ndata+1)*sizeof(float)) ) == NULL ) 
		printf("Unable to allocate data array\r\n");
	else 
		printf("Successfully allocated data array\r\n");
		
	get_data();

	for(i=0; i<32; i+=2)
	printf( "%7.2f %7.2f \r\n",(double) data[i],(double) data[i+1]); 

	ticks=0;
	realft(NP,1); //transform
	i=ticks;
	printf(" realfft in %dx10 ms \r\n",i);
	printf(" %d data points \r\n",NP);
	for(i=0; i<32; i+=2)
	printf( "%7.2f %7.2f \r\n",(double) data[i],(double) data[i+1]); 
	while(1);
}

I was just looking at my code and I realized that when I multiply by sina[0] and sina[4] in the 2 butterflies, those values are equal to zero, so the sine half of the 2 butterflies are all 0 values. Could this be why the cosine values are coming out all weird? Because it's the sine that works but cosine doesn't.
Thanks,
Judd

jremington:
It is very unlikely that anyone on the forum will have the time and/or patience to go through your code line by line ...

And I didn't. I notice, though, that there's nothing in the code that performs complex arithmetic, and the results of the butterflies are, in general, complex numbers. It looks like the twiddle factors are complex - the sine and cosine terms are kept separate, at least - but there's no complex multiplication implemented, and all the results are added together as real numbers.

I don't see anything that suggests that you understand the notion of a complex number. Try these articles:

If the mathematical concepts in those articles aren't immediately familiar, I'd recommend that you step back and work on understanding the math behind the FFT. It starts with those concepts. If that's not somewhere you want to go, then I'd recommend sticking with FFT programs and libraries written by others.

I told Judd about complex numbers back in November.
http://forum.arduino.cc/index.php?topic=198289.msg1464020#msg1464020

Pete