Weird fixed point behavior

Hey everyone,

I'm currently trying to measure how long a floating point calculation takes and how long a fixed point (Q16) calculation takes. However, there is something curious going on:

A fixed point multiplication takes about 104 microseconds while a floating point multiplication takes about 12-16 microseconds. But whenever I reset the calculated value back to 0 AFTER the timing and calculation, the calculation itself seems to become faster...

My code:

#include <Serial.h>
#include "Fixed_Math.h"

typedef int32_t fixed_16;

#define FIX16 (65536)

unsigned long last_time;
unsigned long last_time2;
float fl1 = 5;
float fl2 = 2.5;
float flresult;
fixed_16 x1;
fixed_16 x2;
fixed_16 result1;

void setup()
{
	Serial.begin(115200);
	
	delay(1000);
	
	x1 = float_to_fixed(fl1);
	x2 = float_to_fixed(fl2);
	
	//floating point calculation
	start_timer();
	flresult = fl1 * fl2;
	stop_timer();
	Serial.print(F("floating point calc took:")); tab(); display_time();
	
	//fixed point calculation
	result1 = fixed_mul(x1, x2);
}

void loop()
{
	
}

void print_fixed(fixed_16 x){
	Serial.print(F("Fixed point result:")); tab();
	Serial.println(fixed_to_float(x));
}

fixed_16 fixed_mul(fixed_16 x1, fixed_16 x2){
	fixed_16 result = 0;
	
	start_timer();
	result = (int32_t) (((int64_t) x1 * (int64_t) x2) / FIX16);
	stop_timer();
	Serial.print("fixed calc took:"); tab(); display_time(); 
	print_fixed(result);
	
	//result = 0; //when this line is commented out, the calculation of the result takes about 100 usec
	              //when it is active, the calculation takes about 4 usec
	
	return result;
}

fixed_16 float_to_fixed(float x){
	fixed_16 rslt = (fixed_16) ((x) * FIX16);
	return rslt;
}

float fixed_to_float(fixed_16 x){
	float rslt = (((float) (x)) / FIX16);
	return rslt;
}

void start_timer(){
	last_time = micros();
}

void stop_timer(){
	last_time2 = micros();
}

void display_time(){
	Serial.println(((last_time2 - last_time)));
}

void tab(){
	Serial.print("\t");
}

When I run this code WITHOUT the result = 0; line (inside fixed_mul(){...}), the serial monitor gives me this:

floating point calc took: 12
fixed calc took: 104
Fixed point result: 12.50

But when I run it WITH the result = 0; line, the serial monitor gives this result:

floating point calc took: 12
fixed calc took: 4
Fixed point result: 12.50

I think the problem lies with the compiler thinking the fixed_mul() function should always return 0. But if this is true, how would I make this function compute a fixed point calculation faster than a floating point calculation? (This function is of no use to me in this state anyway, since it would indeed always return 0.)

Any guesses? :slight_smile:

Perhaps a smart optimizer inlined your function and decided that it could do the calculation at compile time since the return value was a constant.

When you get timing results that change drastically it's usually the optimizer.
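To make the compile-time point concrete, here is a minimal sketch, assuming C++11 or later. It uses the same arithmetic as fixed_mul() in the question, minus the timing and printing. The names fixed_mul_ct, FIVE and TWO_AND_A_HALF are made up for this illustration. It only shows that a compiler is able to evaluate the multiply when it can see both inputs; whether the optimizer actually did so in the original sketch is a guess, not something shown in this thread.

```cpp
#include <stdint.h>

typedef int32_t fixed_16;

// Same arithmetic as fixed_mul() in the question, without timing or printing.
// fixed_mul_ct is a hypothetical name used only for this illustration.
constexpr fixed_16 fixed_mul_ct(fixed_16 a, fixed_16 b) {
  return (fixed_16)(((int64_t)a * (int64_t)b) / 65536);
}

// 5.0 and 2.5 in Q16, i.e. 5 * 65536 and 2.5 * 65536.
constexpr fixed_16 FIVE = 327680;
constexpr fixed_16 TWO_AND_A_HALF = 163840;

// static_assert is checked by the compiler, so this line only builds if the
// product can be computed at compile time: no multiply is needed on the target.
static_assert(fixed_mul_ct(FIVE, TWO_AND_A_HALF) == 819200, "12.5 in Q16");
```

If the inputs come from somewhere the compiler cannot see, this shortcut is no longer available and the multiply has to happen at run time.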

Thanks for your reply,

That would explain why it is a lot faster with the constant return value. But I don't understand why the fixed point calculation is so much slower than the floating point calculation. Fixed point calculations are usually faster than floating point calculations. It might not be in this case since it involves only one calculation, but it shouldn't be almost 90 usec slower, right?
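One thing that may be worth trying, offered as a guess rather than a measurement: fixed_mul() does a 64-bit multiply followed by a 64-bit signed division, and if the board is a small 8-bit micro (an assumption, the thread does not say which board is used) wide division can be costly. Since FIX16 is a power of two, the division can be written as a shift. A minimal sketch follows; fixed_mul_shift is a hypothetical name, and it assumes the compiler performs an arithmetic right shift on negative values (GCC documents that it does). Note that it rounds toward negative infinity, while the "/ FIX16" form rounds toward zero, so some negative results differ by one least significant bit.

```cpp
#include <stdint.h>

typedef int32_t fixed_16;

// Hypothetical variant of fixed_mul(): the division by 65536 becomes a shift by 16.
// Assumes >> on a negative int64_t is an arithmetic shift (true for GCC).
// Rounds toward negative infinity; the "/ FIX16" form rounds toward zero.
fixed_16 fixed_mul_shift(fixed_16 a, fixed_16 b) {
  int64_t wide = (int64_t)a * (int64_t)b;  // product has 32 fractional bits, needs 64 bits
  return (fixed_16)(wide >> 16);           // drop 16 fractional bits to get back to Q16
}
```

Whether this is actually faster on a given board has to be measured with a test that the compiler cannot precompute.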

You can't test it like that.
Between start_timer and stop_timer, you do just one float multiply. It is better to do maybe 100 multiplies, and you have to force the compiler to do them at run time.

The compiler knows that fl1=5.0 and fl2=2.5, and it also knows that you want to multiply them. So the compiler might do that multiply for you at compile time, and at run time the precomputed value 12.5 would simply be used.

I just went through this exercise recently and found these pitfalls:

  1. Don't send serial output in advance of timing something since the serial output is asynchronous and interrupt driven. If a UART interrupt occurs in the middle of what you're timing it will affect the result. Either flush() the serial output or simply save the timing results in variables and output them at the end of your sketch.

  2. If the compiler can calculate it at compile time it will, so you have to provide values it can't see. You can do this by reading the values from somewhere (Serial input, EEPROM, an input port, a timer, a pseudo-random number generator). You can also succeed by making the values too numerous for the compiler to do it ahead of time, such as in a loop where you are calculating from an input value=1 to 100.

  3. The compiler may optimize your code and rearrange it so that what you are trying to time is executed after the second timer call. By burying your timer references two functions deep you may be able to fool the compiler, but you should check this if it's returning an unexpectedly small elapsed time. When I was trying to measure a single operation (not in a loop) I resorted to inline assembly to prevent the compiler from doing this.

  4. If the actual result of the calculation is not output somewhere in your sketch the compiler may choose to optimize away the entire calculation.
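The four points above can be sketched roughly as follows. This is a sketch under stated assumptions, not a definitive harness: BenchResult and time_fixed_mul are hypothetical names, and now_us stands for any callable that returns microseconds (on Arduino, micros could be passed). The inputs are volatile so the compiler has to read them at run time, the multiply is repeated in a loop, every result feeds a checksum so the work cannot be dropped, and nothing is printed until timing is over.

```cpp
#include <stdint.h>

typedef int32_t fixed_16;

// Hypothetical result type: timing plus a value that depends on every product.
struct BenchResult {
  unsigned long elapsed_us;  // total time for all iterations
  uint32_t checksum;         // sum of all products, to be printed by the caller
};

// now_us is any callable returning microseconds (assumption: micros on Arduino).
template <typename Clock>
BenchResult time_fixed_mul(Clock now_us, fixed_16 seed, unsigned int iterations) {
  volatile fixed_16 a = seed;    // volatile: must be re-read at run time
  volatile fixed_16 b = 163840;  // 2.5 in Q16, also hidden from constant folding
  volatile fixed_16 sink = 0;    // volatile store keeps each result alive
  uint32_t checksum = 0;

  unsigned long start = now_us();
  for (unsigned int i = 0; i < iterations; i++) {
    sink = (fixed_16)(((int64_t)a * (int64_t)b) / 65536);
    checksum += (uint32_t)sink;
  }
  unsigned long finish = now_us();

  // No printing in here; the caller prints after the timing has finished.
  return BenchResult{finish - start, checksum};
}
```

In a sketch this might be called as time_fixed_mul(micros, x1, 100) after a Serial.flush(), with both fields printed afterwards. The measured time includes the loop overhead and the volatile accesses, so it is better suited to comparing two variants than to reading off an absolute per-multiply cost.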

One approach I use in timing loops is to make the variable volatile. That forces the compiler to consider that it might be changing and not optimize away all accesses. Also certainly don't use constants (unless it is constants you are testing).

You also need to be careful you aren't timing your serial prints. Here is a simple example that does something like that:

volatile float fVar;
volatile int iVar;
unsigned long start, finish;
const unsigned int ITERATIONS = 20000;

void setup ()
  {
  Serial.begin (115200);
  Serial.println ();
  Serial.flush ();
  
  start = micros ();
  for (int i = 0; i < ITERATIONS; i++)
    fVar = (float) i / 123.456;
  finish = micros ();
  Serial.print ("Float took ");
  Serial.println (finish - start);
  Serial.flush ();
  
  start = micros ();
  for (int i = 1000; i < (ITERATIONS + 1000); i++)
    iVar =  i / 123;
  finish = micros ();
  Serial.print ("Integer took ");
  Serial.println (finish - start);
  Serial.flush ();
  
  }  // end of setup

void loop () { }

Results:

Float took 673288
Integer took 311368

I am a bit surprised there isn't more discrepancy between the methods.

There is not that much difference between float and long. And both are slow with divide, so the time with division is even closer.