Topic: Arduino DUE - Floating Point vs Integer math  (Read 1371 times)
Newbie — Karma: 0, Posts: 24

Hi everybody,
I would like to ask you a simple question. Can anybody explain to me why double-precision floating-point math is almost 2x faster than integer math on the Arduino DUE?

I've implemented some digital signal processing on my Arduino: 5 kS/s 16-bit sampling, quadrature demodulation, antialiasing filtering, downsampling, LP filtering, and mean and standard deviation calculations. As I read in the SAM3X8E manual, the uC has no floating-point unit, but floating point can be emulated in software. That's why many forums suggest using integer math instead of float or double math, and that's why the first implementation of my algorithm used integer math.
But yesterday I was asking myself whether I could get more precision using double math, even at the cost of some extra computation time, and after upgrading the code I got the final result:
  • more precision
  • half the computation time

I'm puzzled by the results I got, and I just cannot explain why, averaged over 11 executions of the algorithm, I get an average execution time of 714.18 µs for integer math against 355.28 µs for double-precision floating-point math, i.e. 32-bit native math vs 64-bit emulated math.

I'm asking you this question because this is part of my thesis work and I'm not able to justify this behavior, which is not the one I expected. Thanks

Sr. Member (Earth) — Karma: 14, Posts: 331

I also am baffled. That result makes no sense at all. The only way to really answer the question, though, is with an example: we'd need to see some code that exhibits this behavior. Off the top of my head I can think of only a couple of possibilities:

1. Have you tested for proper output from both versions? That is, can you verify that both produce the right result?
2. The algorithm used for integer math was terrible compared to the one used for floating point. It is possible to write bad code in any language and at any time. So, make sure your algorithms are efficiently constructed.

Equivalent versions of an algorithm with one coded as integer math and the other FP should show the integer version to be many times faster on a Due.
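As a quick sanity check, a bare multiply-accumulate benchmark, independent of the rest of the DSP chain, would settle that part of it. A minimal sketch (untested, just an illustration; the array size and iteration counts are arbitrary):
Code:
// Untested sketch: time the same multiply-accumulate loop in int32 and in double.
// The final prints use the results so the compiler cannot optimize the loops away.
const int N = 1000;
int    ibuf[N];
double dbuf[N];

void setup() {
  Serial.begin(115200);
  for (int i = 0; i < N; i++) { ibuf[i] = i - N/2; dbuf[i] = (double)(i - N/2); }

  unsigned long t0 = micros();
  long iacc = 0;
  for (int k = 0; k < 100; k++)
    for (int i = 0; i < N; i++)
      iacc += ibuf[i] * 3;                  // pure integer multiply-accumulate
  unsigned long tInt = micros() - t0;

  t0 = micros();
  double dacc = 0.0;
  for (int k = 0; k < 100; k++)
    for (int i = 0; i < N; i++)
      dacc += dbuf[i] * 3.0;                // double multiply-accumulate (software FP on the Due)
  unsigned long tDouble = micros() - t0;

  Serial.print("int32  : "); Serial.print(tInt);    Serial.println(" us");
  Serial.print("double : "); Serial.print(tDouble); Serial.println(" us");
  Serial.print(iacc); Serial.print(" ");    Serial.println(dacc);  // use the results
}

void loop() {}
If the integer loop doesn't win that comparison by a wide margin, the difference in your measurements is coming from the algorithm rather than from the arithmetic itself.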



Rob Tillaart — Global Moderator (Netherlands) — Karma: 222, Posts: 13866

maybe floating point is done in hardware and integer math (partly) in software?

Please post sample code so it can be verified.

Newbie — Karma: 0, Posts: 24

Hi guys, thanks for your replies. The code is quite long, so I'll post an extract of the two versions in two separate posts.
Before I post the code, I can assure you that I get the same result both ways. The algorithm was first written in Matlab and then translated into C. To compare the results, I print the raw samples on the serial port, then copy and paste them into Matlab and run them through the Matlab algorithm. Apart from some normal fluctuations in the 4th decimal, the result is the same.

Now, assume that I've dynamically allocated some vectors using the polyAlloc() function: rawSignal, polySignal, polyIQ and polyResult. I've also successfully acquired 5000 16-bit samples and stored them in the rawSignal vector. Then I call the computeOutput() function, which uses the filter() function.

Newbie — Karma: 0, Posts: 24

This is the integer math version:
Code:
#define nelems(x) (sizeof(x)/sizeof(*(x)))

const int Fs = 5000;           // Sample Rate
const int D = 5;               // Down sampling factor
float T = 1.0;                 // Acquisition Time sec
double meanValue = 0.0;
double stdValue = 0.0;
unsigned long int timemicros;  // Execution-time stamp (declared in the double version below, needed here too)

int volatile sampleInd = 0; // Global Buffer Location
int * volatile rawSignal; // Acquisition RAM Buffer
int * polySignal[D]; // PolySignal branches
int * polyIQ[2]; // Quadrature filtered, downsampled, demodulated signals
unsigned int * polyResult; // Reconstructed, LP filtered signal

// I-Q quadrature oscillations
const float I[D] = {2.0, 0.61803398,-1.61803398,-1.61803398, 0.61803398};// Cosine
const float Q[D] = {0.0, 1.90211303, 1.17557050,-1.17557050,-1.90211303};// Sine


// AntiAliasing filter coefficients :: FIR window Kaiser, Fs=5000, Fc=300
const float AAF[15] = {0.01786240,0.03388546,0.05089129,0.06751284,0.08232631,0.09400803,0.10148476,0.10405780,0.10148476,0.09400803,0.08232631,0.06751284,0.05089129,0.03388546,0.01786240};
float * polyAAF[D]; // Polyphase downsampled impulse responses

// LP filter coefficients :: FIR window Kaiser, Fs=1000, Fc=15
const float LP[21] = {0.04144742,0.04320807,0.04483017,0.04629891,0.04760078,0.04872372,0.04965724,0.05039261,0.05092289,0.05124307,0.05135014,0.05124307,0.05092289,0.05039261,0.04965724,0.04872372,0.04760078,0.04629891,0.04483017,0.04320807,0.04144742};

void computeOutput(){
  timemicros = micros();
  rawSignal[0] = rawSignal[5];                    // First value adjust

  int offset = 0;                                 // Offset
  for(int h=0;h<Fs*T;h++)
    offset+=rawSignal[h];
  offset/=Fs*T;

  for (int i=0;i<Fs*T/D;i++)                      // Initialize arrays
  {
    polyIQ[0][i] = 0;
    polyIQ[1][i] = 0;
  }

  for (int j=0;j<D;j++)
  {
    // DownSampling
    for (int i=j;i<Fs*T;i+=5)
      polySignal[j][i/5] = rawSignal[i]-offset;   // Downsampling and offset removal

    for (int i=j;i<nelems(AAF);i+=5)
      polyAAF[j][i/5] = AAF[i];

    // AntiAlias Filtering
    filter(polyAAF[j],nelems(AAF)/D,polySignal[j],(int) Fs*T/D);

    // Quadrature Demodulation
    for (int i=0;i<Fs*T/D;i++)
    {
      polyIQ[0][i] += I[j]*(polySignal[j][i]);
      polyIQ[1][i] += Q[j]*(polySignal[j][i]);
    }
  }

  // Norm calculation :: sqrt(I^2+Q^2)
  for (int i=0;i<Fs*T/D;i++)
    polyResult[i] = (unsigned int) sqrt(polyIQ[0][i]*polyIQ[0][i] + polyIQ[1][i]*polyIQ[1][i]);

  // LP filtering
  filter((float *) LP,nelems(LP),(int*) polyResult,(int) Fs*T/D);

  // Elaborate stats
  statistics();

  double convFactor = 2.0*4.0/65535.0;            // 2 * FullScale / (2^16-1)
  meanValue *= convFactor;
  stdValue  *= convFactor;

  printResults();

  polyFreeAlloc();
  timemicros=micros()-timemicros;
  Serial.println(timemicros);
}

bool polyAlloc()
{
  if(NULL != rawSignal)
  {
    free(rawSignal);                              // Deallocate buffer memory
    free(polyResult);
    rawSignal = NULL;
    polyResult= NULL;
  }

  rawSignal  = (int *)malloc(Fs*T*sizeof(int));   // Dynamic Memory Allocation
  polyResult = (unsigned int *)malloc(Fs*T/D*sizeof(int));

  polyIQ[0]  = (int *)malloc(Fs*T/D*sizeof(int));
  polyIQ[1]  = (int *)malloc(Fs*T/D*sizeof(int));

  bool polys = true;
  for (int j=0;j<D;j++)
  {
    polySignal[j] =   (int *)malloc(Fs*T/D*sizeof(int));
    polyAAF[j]    = (float *)malloc(nelems(AAF)/D*sizeof(float));

    polys &= polySignal[j] != NULL && polyAAF[j] != NULL;
  }

  // Check the pointer itself, not rawSignal[0] (the original compared the first element to NULL)
  return (rawSignal != NULL && polyIQ[0] != NULL && polyIQ[1] != NULL && polyResult != NULL && polys);
}

void filter(float *Num, int n, int *vect, int vLen)
{
  int average = 0;                                // Initial conditions calculation
  for(int h=0;h<vLen;h++)
    average+=vect[h];
  average/=vLen;

  int numbuf[n];                                  // Initialize FIFO FIR buffer
  for(int h=0;h<n;h++)
    numbuf[h]=average;

  for (int h=0;h<vLen;h++){                       // Filtering cycle
    int FIR = 0;

    for (int k=n-1;k>0;k--)                       // Cycle FIR buffer elements
      numbuf[k]=numbuf[k-1];
    numbuf[0] = vect[h];                          // Push new element inside FIR buffer

    for (int j=0;j<n;j++)                         // FIR convolution
      FIR += (int) numbuf[j]*Num[j];

    vect[h] = FIR;                                // Output calculation
  }
}
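One thing worth noting about the listing above: the coefficient arrays (I, Q, AAF, LP) are float, so the inner convolution loop `FIR += (int) numbuf[j]*Num[j];` still performs a software single-precision multiply plus int/float conversions on every tap; the "integer" version is not purely integer. A fully integer FIR would pre-scale the coefficients to fixed point. A rough sketch of what that could look like (untested; filterQ15 and coefQ15 are illustrative names, not part of the original code):
Code:
// Hypothetical fixed-point (Q15) variant of the filter() routine, for comparison only.
// coefQ15[] holds the taps pre-scaled once at startup, e.g. coefQ15[i] = (int32_t)(AAF[i]*32768.0 + 0.5).
void filterQ15(const int32_t *coefQ15, int n, int *vect, int vLen)
{
  int32_t numbuf[n];                        // FIFO FIR buffer
  for (int h = 0; h < n; h++)
    numbuf[h] = 0;                          // (or seed with the signal average, as in the original)

  for (int h = 0; h < vLen; h++) {
    for (int k = n-1; k > 0; k--)           // shift the buffer
      numbuf[k] = numbuf[k-1];
    numbuf[0] = vect[h];                    // push the new sample

    int64_t acc = 0;                        // wide accumulator, no overflow for short filters
    for (int j = 0; j < n; j++)
      acc += (int64_t)numbuf[j] * coefQ15[j];   // integer multiply only, no float conversion

    vect[h] = (int)(acc >> 15);             // rescale from Q15 back to sample units
  }
}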

Newbie — Karma: 0, Posts: 24

And this is the double math version:

Code:
#define nelems(x) (sizeof(x)/sizeof(*(x)))
const int Fs = 5000; // Sample Rate
const int D = 5; // Down sampling factor
float T = 1.0; // Acquisition Time sec
double meanValue = 0.0;
double stdValue = 0.0;

int volatile sampleInd = 0; // Global Buffer Location
short int * volatile rawSignal; // Acquisition RAM Buffer
double * polySignal[D]; // PolySignal branches
double * polyIQ[2]; // Quadrature filtered, downsampled, demodulated signals
double * polyResult; // Reconstructed, LP filtered signal

// I-Q quadrature oscillations
const double I[D] = { 2.000000000000000, // Cosine
0.618033988749895,
-1.618033988749895,
-1.618033988749895,
0.618033988749894};
const double Q[D] = {0.000000000000000, // Sine
1.902113032590307,
1.175570504584947,
-1.175570504584946,
-1.902113032590307};


// AntiAliasing filter coefficients :: FIR window Kaiser, Fs=5000, Fc=300
const double AAF[15] = {0.017862400522538,
0.033885458153087,
0.050891294195008,
0.067512846793389,
0.082326310000912,
0.094008030100344,
0.101484759026756,
0.104057802415933,
0.101484759026756,
0.094008030100344,
0.082326310000912,
0.067512846793389,
0.050891294195008,
0.033885458153087,
0.017862400522538};
double * polyAAF[D]; // Polyphase downsampled impulse responses

// LP filter coefficients :: FIR window Kaiser, Fs=1000, Fc=15
const double LP[21] = { 0.041447420224253,
0.043208077329578,
0.044830173046221,
0.046298913638535,
0.047600786034868,
0.048723722606653,
0.049657248993218,
0.050392612896925,
0.050922892035903,
0.051243079731859,
0.051350146923975,
0.051243079731859,
0.050922892035903,
0.050392612896925,
0.049657248993218,
0.048723722606653,
0.047600786034868,
0.046298913638535,
0.044830173046221,
0.043208077329578,
0.041447420224253};

bool volatile BUSY_STATE = false; // Internal state

unsigned long int timemicros;


void computeOutput(){
  timemicros = micros();
  rawSignal[0] = rawSignal[5];                    // First value adjust

  int offset = 0;                                 // Offset (was "short int": the sum of 5000 16-bit samples overflows a short)
  for(int h=0;h<Fs*T;h++)
    offset+=rawSignal[h];
  offset/=Fs*T;

  for (int i=0;i<Fs*T/D;i++)                      // Initialize arrays
  {
    polyIQ[0][i] = 0;
    polyIQ[1][i] = 0;
  }

  for (int j=0;j<D;j++)
  {
    // DownSampling
    for (int i=j;i<Fs*T;i+=5)
      polySignal[j][i/5] = (double)(rawSignal[i]-offset); // Downsampling and offset removal

    for (int i=j;i<nelems(AAF);i+=5)
      polyAAF[j][i/5] = AAF[i];

    // AntiAlias Filtering
    filter(polyAAF[j],nelems(AAF)/D,polySignal[j],(int) Fs*T/D);

    // Quadrature Demodulation
    for (int i=0;i<Fs*T/D;i++)
    {
      polyIQ[0][i] += I[j]*(polySignal[j][i]);
      polyIQ[1][i] += Q[j]*(polySignal[j][i]);
    }
  }

  // Norm calculation :: sqrt(I^2+Q^2)
  for (int i=0;i<Fs*T/D;i++)
    polyResult[i] = (double) sqrt(polyIQ[0][i]*polyIQ[0][i] + polyIQ[1][i]*polyIQ[1][i]);

  // LP filtering
  filter((double*) LP,nelems(LP),(double*) polyResult,(int) Fs*T/D);

  // Elaborate stats
  statistics();

  double convFactor = 2.0*4.0/65535.0;            // 2 * FullScale / (2^16-1)
  meanValue *= convFactor;
  stdValue  *= convFactor;

  printResults();

  polyFreeAlloc();
  timemicros=micros()-timemicros;
  Serial.println(timemicros);
}

bool polyAlloc()
{
  if(NULL != rawSignal)
  {
    free(rawSignal);                              // Deallocate buffer memory
    free(polyResult);
    rawSignal = NULL;
    polyResult= NULL;
  }

  rawSignal  = (short int *)malloc(Fs*T*sizeof(short int)); // Dynamic Memory Allocation
  polyResult = (double *)malloc(Fs*T/D*sizeof(double));

  polyIQ[0]  = (double *)malloc(Fs*T/D*sizeof(double));
  polyIQ[1]  = (double *)malloc(Fs*T/D*sizeof(double));

  bool polys = true;
  for (int j=0;j<D;j++)
  {
    polySignal[j] = (double *)malloc(Fs*T/D*sizeof(double));
    polyAAF[j]    = (double *)malloc(nelems(AAF)/D*sizeof(double));

    polys &= polySignal[j] != NULL && polyAAF[j] != NULL;
  }

  // Check the pointer itself, not rawSignal[0] (the original compared the first element to NULL)
  return (rawSignal != NULL && polyIQ[0] != NULL && polyIQ[1] != NULL && polyResult != NULL && polys);
}

void polyFreeAlloc()
{
  for (int i=0;i<D;i++)
  {
    free(polySignal[i]);
    polySignal[i] = NULL;
  }

  free(polyIQ[0]);
  free(polyIQ[1]);
  polyIQ[0] = NULL;
  polyIQ[1] = NULL;
}

void filter(double *Num, int n, double *vect, int vLen)
{
  double average = 0.0;                           // Initial conditions calculation
  for(int h=0;h<vLen;h++)
    average+=vect[h];
  average/=(double)vLen;

  double numbuf[n];                               // Initialize FIFO FIR buffer
  for(int h=0;h<n;h++)
    numbuf[h]=average;

  for (int h=0;h<vLen;h++){                       // Filtering cycle
    double FIR = 0.0;

    for (int k=n-1;k>0;k--)                       // Cycle FIR buffer elements
      numbuf[k]=numbuf[k-1];
    numbuf[0] = vect[h];                          // Push new element inside FIR buffer

    for (int j=0;j<n;j++)                         // FIR convolution
      FIR += numbuf[j]*Num[j];

    vect[h] = FIR;                                // Output calculation
  }
}


Newbie — Karma: 1, Posts: 42

I once experienced a big time difference when dividing two int32 values compared to dividing two int64 values (I don't remember the exact numbers, but it was something like 1 µs vs 40 µs). For all other operations it behaved as expected. Apart from the division problem, in all my code examples int64 math was much quicker than float32 math and roughly two or three times slower than int32 math.

So the problem could be divisions performed on data types larger than the native word size, which is 32 bit for the DUE. A float32 division, however, should be more or less a single atomic operation.

Any comments or tests on that?

*update: you can also find out where exactly the time difference arises in your code by using something like:
Code:
t1 = micros();
**code**
t2 = micros();
Serial.println(t2-t1);
for every block of code. I guess you will find an int64 division to be the crucial point. Could you try?
« Last Edit: August 26, 2013, 04:55:11 pm by schwingkopf »
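A minimal sketch of that division test (untested; the operand values and loop count are arbitrary, and volatile keeps the compiler from folding the divisions away) could look like:
Code:
// Untested sketch: time 10000 divisions in uint32_t vs uint64_t on the Due.
volatile uint32_t a32 = 123456789UL,        b32 = 317, r32;
volatile uint64_t a64 = 123456789012345ULL, b64 = 317, r64;

void setup() {
  Serial.begin(115200);

  unsigned long t0 = micros();
  for (int i = 0; i < 10000; i++)
    r32 = a32 / b32;                        // 32-bit divide: hardware UDIV on the Cortex-M3
  unsigned long t32 = micros() - t0;

  t0 = micros();
  for (int i = 0; i < 10000; i++)
    r64 = a64 / b64;                        // 64-bit divide: compiler library call, done in software
  unsigned long t64 = micros() - t0;

  Serial.print("32-bit div: "); Serial.print(t32); Serial.println(" us");
  Serial.print("64-bit div: "); Serial.print(t64); Serial.println(" us");
}

void loop() {}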

Newbie — Karma: 0, Posts: 24

I forgot the statistics() function:
Code:
void statistics()
{
  // Mean value
  unsigned int meanSum = 0;
  for (int i=0;i<Fs*T/D;i++)
    meanSum += polyResult[i];

  meanValue = (double)meanSum/(Fs*T/D); // Averaging reduces quantization noise by 1/sqrt(K) | K=1000 => 30 dB => 5 LSBs

  // Standard deviation value
  double stdSum = 0;
  for (int i=0;i<Fs*T/D;i++)
    stdSum += ((double)polyResult[i]-meanValue)*((double)polyResult[i]-meanValue);

  stdValue = sqrt(stdSum/(Fs*T/D-1));
}


Newbie — Karma: 0, Posts: 24

Hi schwingkopf,
I've used almost no divisions in my code, only multiplications and sums, apart from some average calculations. In the integer math version there are only multiplications between int32 and float32 values, while in the double version there are only multiplications between 64-bit doubles.

God Member — Karma: 26, Posts: 621

I have noticed on Teensy3 that 32 bit float support from newlib is particularly slow.  Since Arduino Due is using the same C library, it's probably pretty similar.  I have not yet investigated why.

One thing I do know: 32 bit division is extremely fast.  It's handled in hardware using only 2 cycles.
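A quick way to check the float-vs-double suspicion on a Due would be something along these lines (untested sketch; the loop count is arbitrary and volatile keeps the compiler from hoisting the operands):
Code:
// Untested sketch: compare single- vs double-precision multiply-add, both emulated in software on the SAM3X.
volatile float  fa = 1.2345f, fb = 0.9987f;
volatile double da = 1.2345,  db = 0.9987;

void setup() {
  Serial.begin(115200);

  float  facc = 0.0f;
  double dacc = 0.0;

  unsigned long t0 = micros();
  for (int i = 0; i < 10000; i++)
    facc += fa * fb;                        // single-precision multiply-add in software
  unsigned long tFloat = micros() - t0;

  t0 = micros();
  for (int i = 0; i < 10000; i++)
    dacc += da * db;                        // double-precision multiply-add in software
  unsigned long tDouble = micros() - t0;

  Serial.print("float  : "); Serial.print(tFloat);  Serial.println(" us");
  Serial.print("double : "); Serial.print(tDouble); Serial.println(" us");
  Serial.print(facc); Serial.print(" "); Serial.println(dacc);   // use the results
}

void loop() {}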

Newbie — Karma: 0, Posts: 24

Thanks for the suggestions. It's quite late for me now, so I'm going to sleep. Tomorrow I'll measure the execution time of each step inside the code and post the output.
I don't know if it's useful, but I'm using the Atmel Studio framework with the VisualMicro plugin to compile and flash the code. Goodnight
