ArduinoDUE - FloatingPointVS Integer math

Hi everybody,
I would like to ask you a simple question. Does anybody can explain me why floating point double precision math are almost 2x faster than integer math on ArduinoDUE?

I've implemented some digital signal processing on my arduino, like 5KS 16 bit sampling, quadrature demodulation, antialiasing filtering, downsampling, LP filtering, mean and standard deviation calculations. As I read on SAM3X8E manual, the uC has not a floating point unit, but it can emulate it. That's why many forums suggest to use integer math instead of float or double math, and that's the reason why the first implementation of the algorythm I did, used integer math.
But yesterday I was asking my self if I could get more precision using double math, even with some more computation time, and after some code upgrade, I got the final result:

  • more precision
  • half computation time

I'm wondered of the results I got and I just can not explain my self the reason why on an average of 11 algorythm executions, I get an average execution time of 714.18 [us] for integer math, against 355.28 [us] for double precision floating point math, i.e. 32bit native math VS 64bit emulated math.

I'm asking you this question, because this is part of my thesis work and I'm not able to justify this behavior, this is not the one expected.. Thanx

I also am baffled. That result makes no sense at all. The only way to really answer the question is to have an example though. We'd need to see some code that exhibits this behavior. Off the top of my head I can think of only a couple of possibilities:

  1. Have you tested for proper output from both versions? That is, can you verify that both produce the right result?
  2. The algorithm used for integer math was terrible compared to the one used for floating point. It is possible to write bad code in any language and at any time. So, make sure your algorithms are efficiently constructed.

Equivalent versions of an algorithm with one coded as integer math and the other FP should show the integer version to be many times faster on a Due.

maybe floating point is done in hardware and integer math (partly) in software?

Please post sample code so it can be verified.

Hi guys, thank for your replies. The code is quite long, so I'll try to post an extract for the two version in two different posts.
Before I post the code, I want to grant you that I get the same result in both ways. The algorythm has been first written in Matlab and then translated in C. To compare the results, I get the raw samples printed on Serial port, I copy and paste them in Matlab passing them through matlab algorythm. Except some normal fluctuations on the 4th decimal, the result it's the same.

Now,think about I've dinamically allocated some vectors using polyAlloc() function: rawSignal, polySignal, polyIQ and polyResult. I've also succesfully acquired 5000 16 bit samples and stored them inside rawSignal vector. Then I call computeOutput() function, that uses the filter() function.

This is the integer math version:

#define nelems(x) sizeof(x)/sizeof(*x)

const int Fs = 5000;												// Sample Rate
const int D = 5;													// Down sampling factor
float T = 1.0;														// Acquisition Time sec
double meanValue = 0.0;
double stdValue	 = 0.0;

int volatile sampleInd = 0;											// Global Buffer Location
int * volatile rawSignal;											// Acquisition RAM Buffer
int * polySignal[D];												// PolySignal branches
int * polyIQ[2];													// Quadrature filtered, downsampled, demodulated signals
unsigned int * polyResult;											// Reconstructed, LP filtered signal

// I-Q quadrature oscillations
const float I[D] = {2.0, 0.61803398,-1.61803398,-1.61803398, 0.61803398};// Cosine
const float Q[D] = {0.0, 1.90211303, 1.17557050,-1.17557050,-1.90211303};// Sine

	
// AntiAliasing filter coefficients :: FIR window Kaiser, Fs=5000, Fc=300
const float AAF[15] = {0.01786240,0.03388546,0.05089129,0.06751284,0.08232631,0.09400803,0.10148476,0.10405780,0.10148476,0.09400803,0.08232631,0.06751284,0.05089129,0.03388546,0.01786240};
float * polyAAF[D];													// Polyphase downsampled impulse responses

// LP filter coefficients :: FIR window Kaiser, Fs=1000, Fc=15
const float LP[21] = {0.04144742,0.04320807,0.04483017,0.04629891,0.04760078,0.04872372,0.04965724,0.05039261,0.05092289,0.05124307,0.05135014,0.05124307,0.05092289,0.05039261,0.04965724,0.04872372,0.04760078,0.04629891,0.04483017,0.04320807,0.04144742};

void computeOutput(){
	timemicros = micros();
	rawSignal[0] = rawSignal[5];									// First value adjust
	
	int offset = 0;													// Offset
	for(int h=0;h<Fs*T;h++)
		offset+=rawSignal[h];
	offset/=Fs*T;
	
	for (int i=0;i<Fs*T/D;i++)										// Initialize arrays
	{
		polyIQ[0][i] = 0;
		polyIQ[1][i] = 0;
	}	
		
	for (int j=0;j<D;j++)
	{
		// DownSampling
		for (int i=j;i<Fs*T;i+=5)
			polySignal[j][i/5] = rawSignal[i]-offset;				// Downsampling and offset removal
			
		for (int i=j;i<nelems(AAF);i+=5)
			polyAAF[j][i/5] = AAF[i];

		// AntiAlias Filtering	
		filter(polyAAF[j],nelems(AAF)/D,polySignal[j],(int) Fs*T/D);
				
		// Quadrature Demodulation
		for (int i=0;i<Fs*T/D;i++)
		{
			polyIQ[0][i] += I[j]*(polySignal[j][i]);
			polyIQ[1][i] += Q[j]*(polySignal[j][i]);
		}
	}

	// Norm calculation :: sqrt(I^2+Q^2)
	for (int i=0;i<Fs*T/D;i++)
		polyResult[i] = (unsigned int) sqrt(polyIQ[0][i]*polyIQ[0][i] + polyIQ[1][i]*polyIQ[1][i]);

	// LP filtering
	filter((float *) LP,nelems(LP),(int*) polyResult,(int) Fs*T/D);

	// Elaborate stats
	statistics();
	
	double convFactor = 2.0*4.0/65535.0;							// 2 * FullScale / (2^16-1)
	meanValue *= convFactor;
	stdValue  *= convFactor;
	
	printResults();
	
	polyFreeAlloc();
	timemicros=micros()-timemicros;
	Serial.println(timemicros);
}

bool polyAlloc()
{
	if(NULL != rawSignal)
	{
		free(rawSignal);											// Deallocate buffer memory
		free(polyResult);
		rawSignal = NULL;
		polyResult= NULL;
	}
		
	rawSignal  = (int *)malloc(Fs*T*sizeof(int));					// Dynamic Memory Allocation
	polyResult = (unsigned int *)malloc(Fs*T/D*sizeof(int));
	
	polyIQ[0]  = (int *)malloc(Fs*T/D*sizeof(int));
	polyIQ[1]  = (int *)malloc(Fs*T/D*sizeof(int));
	
	bool polys = true;
	for (int j=0;j<D;j++)
	{
		polySignal[j]=    (int *)malloc(Fs*T/D*sizeof(int));
		polyAAF[j]   =  (float *)malloc(nelems(AAF)/D*sizeof(float));
		
		polys &= polySignal[j] != NULL && polyAAF[j] != NULL;
	}
	
	return (rawSignal[0] != NULL && polyIQ[0] != NULL && polyIQ[1] != NULL && polyResult != NULL && polys);
}

void filter(float *Num, int n, int *vect, int vLen)
{
	int average = 0;												// Initial conditions calculation
	for(int h=0;h<vLen;h++)
		average+=vect[h];
	average/=vLen;
		
	int numbuf[n];													// Initialize FIFO FIR buffer
	for(int h=0;h<n;h++)
		numbuf[h]=average;

	for (int h=0;h<vLen;h++){										// Filtering cycle
		int FIR = 0;

		for (int k=n-1;k>0;k--)										// Cycle FIR buffer elements
			numbuf[k]=numbuf[k-1];
		numbuf[0] = vect[h];										// Push new element inside FIR buffer
			
		for (int j=0;j<n;j++)										// FIR convolution
			FIR += (int) numbuf[j]*Num[j];
			
		vect[h] = FIR;												// Output calculation
	}
}

And this is the double math version:

#define nelems(x) sizeof(x)/sizeof(*x)
const int Fs = 5000;												// Sample Rate
const int D = 5;													// Down sampling factor
float T = 1.0;														// Acquisition Time sec
double meanValue = 0.0;
double stdValue	 = 0.0;

int volatile sampleInd = 0;											// Global Buffer Location
short int * volatile rawSignal;										// Acquisition RAM Buffer
double * polySignal[D];												// PolySignal branches
double * polyIQ[2];													// Quadrature filtered, downsampled, demodulated signals
double * polyResult;												// Reconstructed, LP filtered signal

// I-Q quadrature oscillations
const double I[D] = {	2.000000000000000,							// Cosine
				0.618033988749895,
				-1.618033988749895,
				-1.618033988749895,
				0.618033988749894};
const double Q[D] = {0.000000000000000,							// Sine
				1.902113032590307,
				1.175570504584947,
				-1.175570504584946,
				-1.902113032590307};

	
// AntiAliasing filter coefficients :: FIR window Kaiser, Fs=5000, Fc=300
const double AAF[15] = {0.017862400522538,
						0.033885458153087,
						0.050891294195008,
						0.067512846793389,
						0.082326310000912,
						0.094008030100344,
						0.101484759026756,
						0.104057802415933,
						0.101484759026756,
						0.094008030100344,
						0.082326310000912,
						0.067512846793389,
						0.050891294195008,
						0.033885458153087,
						0.017862400522538};
double * polyAAF[D];												// Polyphase downsampled impulse responses

// LP filter coefficients :: FIR window Kaiser, Fs=1000, Fc=15
const double LP[21] = {	0.041447420224253,
						0.043208077329578,
						0.044830173046221,
						0.046298913638535,
						0.047600786034868,
						0.048723722606653,
						0.049657248993218,
						0.050392612896925,
						0.050922892035903,
						0.051243079731859,
						0.051350146923975,
						0.051243079731859,
						0.050922892035903,
						0.050392612896925,
						0.049657248993218,
						0.048723722606653,
						0.047600786034868,
						0.046298913638535,
						0.044830173046221,
						0.043208077329578,
						0.041447420224253};

bool volatile BUSY_STATE = false;									// Internal state

unsigned long int timemicros;


void computeOutput(){
	timemicros = micros();
	rawSignal[0] = rawSignal[5];									// First value adjust
	
	short int offset = 0;													// Offset
	for(int h=0;h<Fs*T;h++)
		offset+=rawSignal[h];
	offset/=Fs*T;
	
	for (int i=0;i<Fs*T/D;i++)										// Initialize arrays
	{
		polyIQ[0][i] = 0;
		polyIQ[1][i] = 0;
	}	
		
	for (int j=0;j<D;j++)
	{
		// DownSampling
		for (int i=j;i<Fs*T;i+=5)
			polySignal[j][i/5] = (double)(rawSignal[i]-offset);				// Downsampling and offset removal
			
		for (int i=j;i<nelems(AAF);i+=5)
			polyAAF[j][i/5] = AAF[i];

		// AntiAlias Filtering	
		filter(polyAAF[j],nelems(AAF)/D,polySignal[j],(int) Fs*T/D);
				
		// Quadrature Demodulation
		for (int i=0;i<Fs*T/D;i++)
		{
			polyIQ[0][i] += I[j]*(polySignal[j][i]);
			polyIQ[1][i] += Q[j]*(polySignal[j][i]);
		}
	}

	// Norm calculation :: sqrt(I^2+Q^2)
	for (int i=0;i<Fs*T/D;i++)
		polyResult[i] = (double) sqrt(polyIQ[0][i]*polyIQ[0][i] + polyIQ[1][i]*polyIQ[1][i]);

	// LP filtering
	filter((double*) LP,nelems(LP),(double*) polyResult,(int) Fs*T/D);

	// Elaborate stats
	statistics();
	
	double convFactor = 2.0*4.0/65535.0;							// 2 * FullScale / (2^16-1)
	meanValue *= convFactor;
	stdValue  *= convFactor;
	
	printResults();
	
	polyFreeAlloc();
	timemicros=micros()-timemicros;
	Serial.println(timemicros);
}

bool polyAlloc()
{
	if(NULL != rawSignal)
	{
		free(rawSignal);											// Deallocate buffer memory
		free(polyResult);
		rawSignal = NULL;
		polyResult= NULL;
	}
		
	rawSignal  = (short int *)malloc(Fs*T*sizeof(short int));					// Dynamic Memory Allocation
	polyResult = (double *)malloc(Fs*T/D*sizeof(double));
	
	polyIQ[0]  = (double *)malloc(Fs*T/D*sizeof(double));
	polyIQ[1]  = (double *)malloc(Fs*T/D*sizeof(double));
	
	bool polys = true;
	for (int j=0;j<D;j++)
	{
		polySignal[j]=  (double *)malloc(Fs*T/D*sizeof(double));
		polyAAF[j]   =  (double *)malloc(nelems(AAF)/D*sizeof(double));
		
		polys &= polySignal[j] != NULL && polyAAF[j] != NULL;
	}
	
	return (rawSignal[0] != NULL && polyIQ[0] != NULL && polyIQ[1] != NULL && polyResult != NULL && polys);
}

void polyFreeAlloc()
{
	for (int i=0;i<D;i++)
	{
		free(polySignal[i]);
		polySignal[i] = NULL;
	}
	
	free(polyIQ[0]);
	free(polyIQ[1]);
	polyIQ[0] = NULL;
	polyIQ[1] = NULL;
}

void filter(double *Num, int n, double *vect, int vLen)
{
	double average = 0.0;											// Initial conditions calculation
	for(int h=0;h<vLen;h++)
		average+=vect[h];
	average/=(double)vLen;
		
	double numbuf[n];												// Initialize FIFO FIR buffer
	for(int h=0;h<n;h++)
		numbuf[h]=average;

	for (int h=0;h<vLen;h++){										// Filtering cycle
		double FIR = 0.0;

		for (int k=n-1;k>0;k--)										// Cycle FIR buffer elements
			numbuf[k]=numbuf[k-1];
		numbuf[0] = vect[h];										// Push new element inside FIR buffer
			
		for (int j=0;j<n;j++)										// FIR convolution
			FIR += numbuf[j]*Num[j];
			
		vect[h] = FIR;												// Output calculation
	}
}

I experienced once that there was a big time difference when dividing two int32 values compared to dividing two int64 values (don't remember the excact numbers, but was something like 1us to 40us). For all other operations it seemed to be like expected. Unless the division problem, in all my code examples int64 math was much quicker than float32 math and roughly two or three times slower than int32 math.

So the problem could be performing divisions using data types larger than the atomic size which is 32bit for the DUE. The division of float32's however should be somehow an atomic operation.

Any comments or tests on that?

*update: you can also find where exactly the time difference is caused in your code by using stuff like:

t1 = micros();
**code**
t2 = micros();
Serial.println(t2-t1);

for every block of code.. I guess you will find an int64 division to be the crucial point. could you try?

I forgot the statistics() function:

void statistics()
{
	// Mean value
	unsigned int meanSum = 0;
	for (int i=0;i<Fs*T/D;i++)
		meanSum += polyResult[i];

	meanValue = (double)meanSum/(Fs*T/D);							// Averaging reduces quantization noise by 1/sqrt(K) |K=1000 => 30 dB => 5 LSBs
	
	// Standard deviation value
	double stdSum = 0;
	for (int i=0;i<Fs*T/D;i++)
		stdSum += ((double)polyResult[i]-meanValue)*((double)polyResult[i]-meanValue);

	stdValue = sqrt(stdSum/(Fs*T/D-1));
}

Hi schwingkopf,
I've used almost no division in my code but only multiplications and sums, except some average calculation. In the integer math version there are only multiplication between int32 and float32, while in double version there are multiplication between double64..

I have noticed on Teensy3 that 32 bit float support from newlib is particularly slow. Since Arduino Due is using the same C library, it's probably pretty similar. I have not yet investigated why.

One thing I do know: 32 bit division is extremely fast. It's handled in hardware using only 2 cycles.

Thanks for suggestions. Now it's quite late for me, so I'm going to sleep. Tomorrow I'll get the execution time for each step inside the code and I'll post the output.
I don't know if it could be useful, but I'm using Atmel Studio Framework with VisualMicro plugin to compile and flash the code. Goodnight