Arduino Forum

Using Arduino => Programming Questions => Topic started by: zatalian on Nov 08, 2012, 04:23 pm

Title: Speed of floating point operations
Post by: zatalian on Nov 08, 2012, 04:23 pm
Hi,

I'm new to this forum, but I have been working with Arduinos for quite a while.
I have a question about the execution time of floating point operations. I found this old thread: http://arduino.cc/forum/index.php/topic,40901.0.html mentioning the speed of execution and was wondering where those numbers were coming from...
I get different (very strange and confusing) results...

First I used the function micros() to time my operations, but I read that micros() has a resolution of 4µs?
For my second attempt, I'm using Timer1, with prescaler 1, on a Duemilanove, so I should have 16 ticks / µs.

This is the code I'm trying to time:

Code: [Select]

void setup()
{
  Serial.begin(9600);
  TCCR1B &= 0xF8;         // clear the Timer1 clock-select bits (CS12..CS10)
  TCCR1B |= (1 << CS10);  // prescaler 1: Timer1 ticks at the CPU clock
}

void loop()
{
  float fnumber;
  float fresult = 0.0;
 
 
  uint16_t time;
  TCNT1 = 0;

  fnumber = 50.0;
  fresult = sqrt(fnumber);
  //fresult = sin(fnumber);
  //delayMicroseconds(10);

  time = TCNT1;
   
  Serial.print("delay: ");
  Serial.println(time, DEC);
 
  Serial.print("sqrt(");
  Serial.print(fnumber, DEC);
  Serial.print(") = ");
  Serial.println(fresult, DEC); 
 
  delay(1000);
}


Executing a sqrt() or sin() gives me a delta of 1, meaning 1/16th of a µs. This can't be right, but I can't figure out what I'm doing wrong...
Inserting a delayMicroseconds(10) call gives me (roughly) the correct delta of 156 (160 expected), so my timer seems to work correctly.

Using the function micros() instead of Timer1 gives me a delta of 4µs...

Is an Arduino really that fast in executing floating point calculations?
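
The tick arithmetic in this post can be sanity-checked with a few lines of plain C++. This is only a minimal host-side sketch, not Arduino code: the helper names ticksToMicros and maxMeasurableMicros are made up for illustration, and the only inputs assumed are the 16 MHz clock and prescaler of 1 mentioned above.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Arduino API): convert a timer tick
// count into microseconds for a given CPU clock and timer prescaler.
double ticksToMicros(uint32_t ticks, uint32_t cpuHz, uint32_t prescaler)
{
    // One tick lasts prescaler / cpuHz seconds, i.e. prescaler * 1e6 / cpuHz microseconds.
    return ticks * (static_cast<double>(prescaler) * 1e6 / cpuHz);
}

// Longest interval a 16-bit counter can cover before it wraps, in microseconds.
double maxMeasurableMicros(uint32_t cpuHz, uint32_t prescaler)
{
    return ticksToMicros(65536u, cpuHz, prescaler);
}
```

With these assumptions, 156 ticks comes to 9.75 µs, a single tick to 0.0625 µs (1/16 µs), and a 16-bit counter in normal mode wraps after 4096 µs, so longer measurements need a larger prescaler or an overflow count. One caveat, as far as I know: the Arduino core leaves Timer1 in 8-bit phase-correct PWM mode by default, so unless TCCR1A is cleared the count turns around at 255 and readings above that deserve some suspicion.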


Title: Re: Speed of floating point operations
Post by: AWOL on Nov 08, 2012, 04:39 pm
Your parameter is effectively a constant - maybe the compiler optimised the call to sqrt away.
Title: Re: Speed of floating point operations
Post by: JimEli on Nov 08, 2012, 04:39 pm
What he said.

http://ucexperiment.wordpress.com/2012/10/30/arduino-timing-failure/
Title: Re: Speed of floating point operations
Post by: ckiick on Nov 08, 2012, 04:43 pm
A much better method of benchmarking operations like this is to run the calculation in a loop for, say, 10,000 iterations.  You take a timestamp before, and a timestamp after the loop (using micros or even millis if the number of iterations is high enough) and some simple math gets you the average time per operation.  It's much more reliable, evens out hiccups due to things like interrupts, can time operations that take less time than the timer resolution, and would be portable to boards with different clock rates.  You also avoid most problems with the compiler optimizing code in ways you don't expect.  And you don't have to do all that messing around with timers, either.

I don't know why you are getting those results, but try it this way and see if you get more reasonable results.
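
The loop-averaging approach described above might look roughly like this. It is a host-side C++ sketch under stated assumptions: std::chrono::steady_clock stands in for micros(), and the function name averageNanosPerSqrt is made up for illustration. The volatile input and output are there so that the compiler can neither fold the call into a constant nor drop it.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical benchmark helper: average time of one sqrt call, in nanoseconds.
double averageNanosPerSqrt(unsigned long iterations)
{
    if (iterations == 0) {
        return 0.0;                 // nothing to average
    }

    volatile float input = 50.0f;   // volatile: must be re-read at run time
    volatile float sink = 0.0f;     // volatile: every store must be performed

    const auto start = std::chrono::steady_clock::now();
    for (unsigned long i = 0; i < iterations; ++i) {
        sink = std::sqrt(input);    // one load, one sqrt, one store per pass
    }
    const auto stop = std::chrono::steady_clock::now();

    // Total elapsed time divided by the number of passes.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

On an Arduino the same shape should work with micros() before and after the loop. The figure includes the loop overhead and the volatile load and store, so it is best read as an upper bound on the cost of the operation itself.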
Title: Re: Speed of floating point operations
Post by: PeterH on Nov 08, 2012, 04:45 pm
Since you're calling a known function whose result only depends on its argument, and the argument is a compile-time constant, it's conceivable that the floating point calculation has been optimised out by the compiler. If this was happening then you might get a different execution time if you included a value which was not a compile-time constant.

Edit add: too slow!
Title: Re: Speed of floating point operations
Post by: zatalian on Nov 08, 2012, 05:11 pm
OK... replacing 50.0 with analogRead(0) gives me saner results for sin() and sqrt() (200-300 clock pulses).

But what about the next example :

Code: [Select]

void setup()
{
  Serial.begin(9600);
  TCCR1B &= 0xF8;
  TCCR1B |= (1 << CS10);
}

void loop()
{
  float fnumber1, fnumber2;
  float fresult = 0.0;
 
  fnumber1 = (float) analogRead(0);
  fnumber2 = (float) analogRead(1);
 
  uint16_t time;
  TCNT1 = 0;

  fresult = fnumber1 / fnumber2;
 
  time = TCNT1;
   
  Serial.print("delay: ");
  Serial.println(time, DEC);
 
 
  Serial.print(fnumber1, DEC);
  Serial.print(" / ");
  Serial.print(fnumber2, DEC);
  Serial.print(" = ");
  Serial.println(fresult, DEC); 
 
  delay(1000);
}


I'm dividing 2 floating point numbers (no compile-time constants). Again, I get a delta of 1 clock pulse... for a floating point division.

Title: Re: Speed of floating point operations
Post by: DuaneB on Nov 08, 2012, 05:39 pm
Optimised out again -

Code: [Select]

  uint16_t time;
  TCNT1 = 0;
     10a: e4 e8        ldi r30, 0x84 ; 132
     10c: f0 e0        ldi r31, 0x00 ; 0
     10e: 11 82        std Z+1, r1 ; 0x01
     110: 10 82        st Z, r1

  fresult = fnumber1 / fnumber2;
 
  time = TCNT1;
     112: e0 80        ld r14, Z
     114: f1 80        ldd r15, Z+1 ; 0x01
   
  Serial.print("delay: ");



Can't understand why: you're printing it out later in the code, so it shouldn't be getting optimized out, but this suggests it is.

disassembled like so -

http://rcarduino.blogspot.com/2012/09/how-to-view-arduino-assembly.html

Duane B

rcarduino.blogspot.com
Title: Re: Speed of floating point operations
Post by: DuaneB on Nov 08, 2012, 05:43 pm
I moved fresult to global scope and made it volatile to force the compiler to leave it alone. Now we get the following and your test will work -

Code: [Select]
  uint16_t time;
  TCNT1 = 0;
     10a: 04 e8        ldi r16, 0x84 ; 132
     10c: 10 e0        ldi r17, 0x00 ; 0
     10e: f8 01        movw r30, r16
     110: 11 82        std Z+1, r1 ; 0x01
     112: 10 82        st Z, r1

  fresult = fnumber1 / fnumber2;
     114: c6 01        movw r24, r12
     116: b5 01        movw r22, r10
     118: a4 01        movw r20, r8
     11a: 93 01        movw r18, r6
     11c: 0e 94 58 06 call 0xcb0 ; 0xcb0 <__divsf3>
     120: 60 93 24 01 sts 0x0124, r22
     124: 70 93 25 01 sts 0x0125, r23
     128: 80 93 26 01 sts 0x0126, r24
     12c: 90 93 27 01 sts 0x0127, r25
 
  time = TCNT1;
     130: f8 01        movw r30, r16
     132: e0 80        ld r14, Z
     134: f1 80        ldd r15, Z+1 ; 0x01


Again, here's how I get the disassembly -

http://rcarduino.blogspot.com/2012/09/how-to-view-arduino-assembly.html

Duane B

Title: Re: Speed of floating point operations
Post by: robtillaart on Nov 08, 2012, 07:39 pm
just use

volatile float fresult = sqrt(fnumber);

and check your timing....

volatile says to the compiler you may not optimize this statement.
Title: Re: Speed of floating point operations
Post by: PeterH on Nov 08, 2012, 07:40 pm

Quote
Can't understand why: you're printing it out later in the code, so it shouldn't be getting optimized out, but this suggests it is.


I don't think the calculation could have been eliminated completely - all I can think is that the compiler has reordered the code so that the calculation no longer occurs between the timing statements.
Title: Re: Speed of floating point operations
Post by: zatalian on Nov 09, 2012, 10:39 am

Quote
disassembled like so -

http://rcarduino.blogspot.com/2012/09/how-to-view-arduino-assembly.html


Thanks a lot for that link!! That's exactly what I was missing...



Quote
Can't understand why: you're printing it out later in the code, so it shouldn't be getting optimized out, but this suggests it is.

I don't think the calculation could have been eliminated completely - all I can think is that the compiler has reordered the code so that the calculation no longer occurs between the timing statements.


Using the volatile keyword when declaring the variable solved this problem too. I'm getting 20-40 clock pulses per floating point division now (calculating the average of 10000 divisions as suggested).


Maybe one last question... Is there a way to remove the -Os compile option and get rid of all the optimizations? Just for the sake of this kind of exercise?


Title: Re: Speed of floating point operations
Post by: robtillaart on Nov 09, 2012, 11:37 am
Quote
a friend of mine asked if it was possible to make a very small device that could:

(never tried)
move the compiler to another folder and place a proxy.exe in the current one that just passes the params you like
Title: Re: Speed of floating point operations
Post by: DuaneB on Nov 09, 2012, 11:39 am
Quote
20-40 clockpulses


I am very surprised that it's that fast. Just to be sure, do you mean milliseconds or clock pulses?

Duane B
Title: Re: Speed of floating point operations
Post by: michael_x on Nov 09, 2012, 12:28 pm
Quote
volatile float fresult = sqrt(fnumber);

and check your timing....


That only stops the compiler from optimizing away the copy from one float variable to the next (which I thought was slightly faster than 20 clock pulses).

What about
volatile float fnumber = 50.0;
float fresult = sqrt(fnumber);


That might be considerably slower.
Either try it or look at assembly code again ...
Title: Re: Speed of floating point operations
Post by: MichaelMeissner on Nov 09, 2012, 01:56 pm

Quote
just use

volatile float fresult = sqrt(fnumber);

and check your timing....

volatile says to the compiler you may not optimize this statement.

All volatile says is it may not alter the stores or loads to the variable.  In particular, the compiler is allowed to optimize the sqrt value and save it away in a temporary value or compute it in the compiler, and then store the saved value in the loop.

The code sequences that replace the original code have implicit conversions from integer to floating point as well as the sqrt operation.  In addition, if you ever move the code to a different processor, like say a Due that uses the Arm chip, storing the result of sqrt into a float variable will cause an implicit double to single conversion.
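
The implicit conversions mentioned above can be made visible with a few compile-time type checks. This is a desktop C++ sketch (C++11 or later), not something taken from the thread: the AVR toolchain does not ship <cmath>, and there double is typically the same 32-bit type as float anyway, so the checks mainly matter on targets with a 64-bit double. The helper name sqrtOfReading is made up for illustration.

```cpp
#include <cmath>
#include <type_traits>

// std::sqrt picks an overload from the argument type.
static_assert(std::is_same<decltype(std::sqrt(2.0f)), float>::value,
              "float in, float out");
static_assert(std::is_same<decltype(std::sqrt(2.0)), double>::value,
              "double in, double out");
static_assert(std::is_same<decltype(std::sqrt(2)), double>::value,
              "an integer argument is converted to double");

// Hypothetical helper: keep the whole calculation in single precision by
// converting the integer reading to float explicitly before calling sqrt.
float sqrtOfReading(int reading)
{
    const float asFloat = static_cast<float>(reading);  // explicit int -> float
    return std::sqrt(asFloat);                          // float overload, no double round trip
}
```

Passing the integer straight to std::sqrt would instead compute in double and narrow the result on assignment to a float, which is the extra conversion described above.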
Title: Re: Speed of floating point operations
Post by: DuaneB on Nov 09, 2012, 02:02 pm
Which all goes to show what a nonsense measuring the performance and optimization of contrived sequences of code is.

If you have a real application - then it gets interesting.

Duane B.

Title: Re: Speed of floating point operations
Post by: dhenry on Nov 09, 2012, 02:36 pm
Quote
All volatile says is it may not alter the stores or loads to the variable.


It tells the compiler not to assume that the value in a variable is unchanged, even if it appears that the variable has not been written to.

The issue in this code is that the resulting variable is not used, so the division was optimized away. You can force a use of that division by assigning it to a (volatile) variable.
Title: Re: Speed of floating point operations
Post by: PeterH on Nov 09, 2012, 03:36 pm

Quote
Is there a way to remove the -Os compile option and get rid of all the optimizations? Just for the sake of this kind of exercises?


You are proposing to disable all optimisations in order to make it possible to measure the performance? Doesn't that render the performance measurements meaningless?
Title: Re: Speed of floating point operations
Post by: robtillaart on Nov 09, 2012, 03:53 pm
Quote
Which all goes to show what a nonsense measuring the performance and optimization of contrived sequences of code is.

If you have a real application - then it gets interesting.


Agree with you unless your goal is to learn about optimizations and how they are done. (there is always that other option ;)
Title: Re: Speed of floating point operations
Post by: zatalian on Nov 11, 2012, 01:53 pm

Quote
20-40 clockpulses


I am very surprised that its that fast, just to be sure, do you mean milliseconds or clock pulses ?

Duane B


Clock pulses... This is the code I used:

Code: [Select]
void setup()
{
 Serial.begin(9600);
 TCCR1B &= 0xF8;          // clear the Timer1 clock-select bits
 TCCR1B |= (1 << CS10);   // prescaler 1: one tick per CPU clock
}

void loop()
{
 float fnumber1, fnumber2;
 volatile float fresult = 0.0;  // volatile: the store cannot be optimized away
 
 uint16_t time;
 unsigned long total = 0;       // must start at 0 before accumulating
 
 for (int i = 0; i < 10000; i++)
 {
   fnumber1 = random() / 1000.0;
   fnumber2 = random() / 1000.0;
 
   TCNT1 = 0;
   fresult = fnumber1 / fnumber2;
   time = TCNT1;
 
   total += time;
 }
 total /= 10000;                // average ticks per division
 
 Serial.print("delta: ");
 Serial.println(total, DEC);    // print the average, not the last sample
 
 
 Serial.print(fnumber1, DEC);
 Serial.print(" / ");
 Serial.print(fnumber2, DEC);
 Serial.print(" = ");
 Serial.println(fresult, DEC);  
 

}




Quote
Which all goes to show what a nonsense measuring the performance and optimization of contrived sequences of code is.

If you have a real application - then it gets interesting.


Agree with you unless your goal is to learn about optimizations and how they are done. (there is always that other option ;)


Well, disabling optimizations can be useful to compare floating point operations with integer operations. The whole point of this exercise was to know - before I have a complete project - whether the Arduino will be fast enough and whether I will be able to use floating point or will have to do all the calculations with integers.

But in the end... these measurements will indeed be estimates, and real measurements can only be taken in real programs. I totally agree with that statement.
Thanks to everybody for this very informative discussion.


Title: Re: Speed of floating point operations
Post by: chung on Nov 11, 2012, 03:02 pm
The avr-libc benchmarks page (http://www.nongnu.org/avr-libc/user-manual/benchmarks.html) is a good reference on the number of clock cycles an AVR chip needs to perform various mathematical operations, including floating point ones. However, make sure not to invoke the function through some Arduino wrapper. As a rule of thumb, the lower the level you code at, the faster your implementation will run and the more difficult your code will become, so there's a balance you need to strike between efficiency and simplicity.