### Topic: Speed of floating point operations

#### zatalian

##### Nov 08, 2012, 04:23 pm
Hi,

I'm new to this forum, but I have been working with Arduinos for quite a while.
I have a question about the execution time of floating point commands. I found this old thread: http://arduino.cc/forum/index.php/topic,40901.0.html mentioning the speed of execution and was wondering where those numbers were coming from...
I get different (very strange and confusing) results...

First I used the function micros() to time my operations, but I read that micros()'s resolution is 4 µs?
For my second attempt, I'm using Timer1, with prescaler 1, on a Duemilanove, so I should have 16 ticks / µs.

This is the code I'm trying to time:

```cpp
void setup()
{
  Serial.begin(9600);
  TCCR1B &= 0xF8;          // clear the clock-select bits (stops Timer1)
  TCCR1B |= (1 << CS10);   // CS10 only: prescaler 1, one tick per CPU clock
}

void loop()
{
  float fnumber;
  float fresult = 0.0;

  uint16_t time;
  TCNT1 = 0;               // reset the counter just before the timed code
  fnumber = 50.0;
  fresult = sqrt(fnumber);
  //fresult = sin(fnumber);
  //delayMicroseconds(10);
  time = TCNT1;            // ticks elapsed since the reset

  Serial.print("delay: ");
  Serial.println(time, DEC);

  Serial.print("sqrt(");
  Serial.print(fnumber, DEC);
  Serial.print(") = ");
  Serial.println(fresult, DEC);

  delay(1000);
}
```

Executing a sqrt() or sin() gives me a delta of 1, meaning 1/16th of a µs. This can't be right, but I can't figure out what I'm doing wrong...
Inserting a delayMicroseconds(10) call gives me (roughly) the correct delta of 156 (160 expected), so my timer seems to work correctly.
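
For reference, the tick arithmetic in this post can be written out as a small helper. This is only a sketch, assuming the 16 MHz clock and prescaler of 1 described above; `ticksToMicros` and `microsToTicks` are made-up names, not Arduino functions.

```cpp
#include <cstdint>

// Assumed values taken from the post: 16 MHz CPU clock, Timer1 prescaler 1.
constexpr std::uint32_t kClockHz = 16000000UL;
constexpr std::uint32_t kPrescaler = 1;

// Convert a Timer1 tick count to microseconds.
constexpr double ticksToMicros(std::uint16_t ticks) {
    return static_cast<double>(ticks) * kPrescaler * 1e6 / kClockHz;
}

// Expected tick count for a duration given in microseconds.
constexpr std::uint32_t microsToTicks(std::uint32_t micros) {
    return micros * (kClockHz / 1000000UL) / kPrescaler;
}

// 10 us should be 160 ticks; the measured 156 ticks is 9.75 us.
static_assert(microsToTicks(10) == 160, "10 us at 16 ticks per us");
static_assert(ticksToMicros(156) == 9.75, "156 ticks at 16 ticks per us");
```

With these numbers a delta of 1 tick is 0.0625 µs, which is the "1/16th of a µs" mentioned above.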

Using the function micros() instead of Timer1 gives me a delta of 4 µs...

Is an Arduino really that fast in executing floating point calculations?

#### AWOL

#1
##### Nov 08, 2012, 04:39 pm
Your parameter is effectively a constant - maybe the compiler optimised the call to sqrt away.

#### ckiick

#3
##### Nov 08, 2012, 04:43 pm
A much better method of benchmarking operations like this is to run the calculation in a loop for, say, 10,000 iterations.  You take a timestamp before, and a timestamp after the loop (using micros or even millis if the number of iterations is high enough) and some simple math gets you the average time per operation.  It's much more reliable, evens out hiccups due to things like interrupts, can time operations that take less time than the timer resolution, and would be portable to boards with different clock rates.  You also avoid most problems with the compiler optimizing code in ways you don't expect.  And you don't have to do all that messing around with timers, either.

I don't know why you are getting those results, but try it this way and see if you get more reasonable results.
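
The loop approach described above might look roughly like the following. This is a desktop C++ sketch, not Arduino code: it uses `std::chrono` where a sketch on the board would call micros(), and `averageSqrtNanos` is a hypothetical name. The `volatile` variables are an addition to keep the compiler from folding the constant input or dropping the unused result.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>

// Average time per sqrt() call in nanoseconds, measured over many iterations.
// Loop overhead and the volatile load/store are included in the figure.
double averageSqrtNanos(std::uint32_t iterations) {
    assert(iterations > 0);
    volatile float input = 50.0f;  // re-read on every pass, so it cannot be treated as a constant
    volatile float sink = 0.0f;    // every result has to be stored

    auto start = std::chrono::steady_clock::now();   // timestamp before the loop
    for (std::uint32_t i = 0; i < iterations; ++i) {
        sink = std::sqrt(input);
    }
    auto stop = std::chrono::steady_clock::now();    // timestamp after the loop

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;             // simple math: total / count
}
```

Calling `averageSqrtNanos(10000)` gives an average for 10,000 iterations. To subtract the loop overhead, one option is to time the same loop with a plain copy (`sink = input;`) and take the difference.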

#### PeterH

#4
##### Nov 08, 2012, 04:45 pm
Since you're calling a known function whose result only depends on its argument, and the argument is a compile-time constant, it's conceivable that the floating point calculation has been optimised out by the compiler. If this was happening then you might get a different execution time if you included a value which was not a compile-time constant.

#### zatalian

#5
##### Nov 08, 2012, 05:11 pm
OK... replacing 50.0 with analogRead(0) gives me saner results for sin() and sqrt() (200-300 clock pulses).

But what about the next example:

```cpp
void setup()
{
  Serial.begin(9600);
  TCCR1B &= 0xF8;          // clear the clock-select bits (stops Timer1)
  TCCR1B |= (1 << CS10);   // CS10 only: prescaler 1, one tick per CPU clock
}

void loop()
{
  float fnumber1, fnumber2;
  float fresult = 0.0;

  // run-time inputs, not compile-time constants
  fnumber1 = (float) analogRead(0);
  fnumber2 = (float) analogRead(1);

  uint16_t time;
  TCNT1 = 0;
  fresult = fnumber1 / fnumber2;   // the operation being timed
  time = TCNT1;

  Serial.print("delay: ");
  Serial.println(time, DEC);

  Serial.print(fnumber1, DEC);
  Serial.print(" / ");
  Serial.print(fnumber2, DEC);
  Serial.print(" = ");
  Serial.println(fresult, DEC);

  delay(1000);
}
```

I'm dividing 2 floating point numbers (no compile-time constants). Again, I get a delta of 1 clock pulse... for a floating point division.

#### DuaneB

#6
##### Nov 08, 2012, 05:39 pm
Optimised out again -

```
  uint16_t time;
  TCNT1 = 0;
     10a: e4 e8        ldi r30, 0x84 ; 132
     10c: f0 e0        ldi r31, 0x00 ; 0
     10e: 11 82        std Z+1, r1 ; 0x01
     110: 10 82        st Z, r1
  fresult = fnumber1 / fnumber2;
  time = TCNT1;
     112: e0 80        ld r14, Z
     114: f1 80        ldd r15, Z+1 ; 0x01
  Serial.print("delay: ");
```

Can't understand why: you're printing it out later in the code, so it shouldn't be getting optimized out, but this suggests it is.

Disassembled like so - http://rcarduino.blogspot.com/2012/09/how-to-view-arduino-assembly.html

Duane B

#### DuaneB

#7
##### Nov 08, 2012, 05:43 pm
I moved fresult to global scope and made it volatile to force the compiler to leave it alone. Now we get the following, and your test will work -

```
  uint16_t time;
  TCNT1 = 0;
     10a: 04 e8        ldi r16, 0x84 ; 132
     10c: 10 e0        ldi r17, 0x00 ; 0
     10e: f8 01        movw r30, r16
     110: 11 82        std Z+1, r1 ; 0x01
     112: 10 82        st Z, r1
  fresult = fnumber1 / fnumber2;
     114: c6 01        movw r24, r12
     116: b5 01        movw r22, r10
     118: a4 01        movw r20, r8
     11a: 93 01        movw r18, r6
     11c: 0e 94 58 06  call 0xcb0 ; 0xcb0 <__divsf3>
     120: 60 93 24 01  sts 0x0124, r22
     124: 70 93 25 01  sts 0x0125, r23
     128: 80 93 26 01  sts 0x0126, r24
     12c: 90 93 27 01  sts 0x0127, r25
  time = TCNT1;
     130: f8 01        movw r30, r16
     132: e0 80        ld r14, Z
     134: f1 80        ldd r15, Z+1 ; 0x01
```

Again, here's how I get the disassembly -

Duane B

#### robtillaart

#8
##### Nov 08, 2012, 07:39 pm
just use

`volatile float fresult = sqrt(fnumber);`

volatile tells the compiler that it may not optimize away accesses to this variable.

#### PeterH

#9
##### Nov 08, 2012, 07:40 pm

Quote
Can't understand why: you're printing it out later in the code, so it shouldn't be getting optimized out, but this suggests it is.

I don't think the calculation could have been eliminated completely - all I can think is that the compiler has reordered the code so that the calculation no longer occurs between the timing statements.

#### zatalian

#10
##### Nov 09, 2012, 10:39 am

Quote
disassembled like so - http://rcarduino.blogspot.com/2012/09/how-to-view-arduino-assembly.html

Thanks a lot for that link! That's exactly what I was missing...

Quote
Can't understand why: you're printing it out later in the code, so it shouldn't be getting optimized out, but this suggests it is.

Quote
I don't think the calculation could have been eliminated completely - all I can think is that the compiler has reordered the code so that the calculation no longer occurs between the timing statements.

Using the volatile keyword to create the variable solved this problem too. I'm getting 20-40 clock pulses per floating point division now (calculating the average of 10000 divisions as suggested).

Maybe one last question... Is there a way to remove the -Os compile option and get rid of all the optimizations? Just for the sake of this kind of exercise?
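
One possibility that nobody in this thread has confirmed: GCC has a per-function `optimize` attribute (and a matching `#pragma GCC optimize`) that overrides the command-line level for a single function. The sketch below assumes the compiler in use is a GCC version that supports it; whether the avr-gcc bundled with a given Arduino IDE does is an assumption. `divideUnoptimized` is a made-up name.

```cpp
// GCC-specific: request that only this function be compiled without
// optimization, whatever -O level the rest of the sketch is built with.
__attribute__((optimize("O0")))
float divideUnoptimized(float a, float b) {
    return a / b;
}
```

This only affects code compiled from the sketch itself. Library routines that are already compiled, such as the `__divsf3` call seen in the disassembly above, would presumably stay as they were built.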

#### robtillaart

#11
##### Nov 09, 2012, 11:37 am
(never tried)
Move the compiler to another folder and put a proxy.exe in its place that just passes on the parameters you like.

#### DuaneB

#12
##### Nov 09, 2012, 11:39 am
Quote
20-40 clockpulses

I am very surprised that it's that fast. Just to be sure, do you mean milliseconds or clock pulses?

#### michael_x

#13
##### Nov 09, 2012, 12:28 pm
Quote
volatile float fresult = sqrt(fnumber);

only stops the compiler from optimizing away the copy of one float variable to the next (which I thought was slightly faster than 20 clock pulses).

volatile float fnumber= 50.0;
float fresult = sqrt(fnumber);

might be considerably slower.
Either try it or look at assembly code again ...
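
A sketch of the idea above, combined with the volatile result from earlier in the thread. It is written as desktop C++ so it can be compiled anywhere: `fakeTimer` is a hypothetical stand-in for TCNT1 (which the AVR headers already declare volatile), and `timeSqrt` is a made-up name. The assumption is that volatile accesses may not be reordered relative to each other, so a volatile input read after the timer reset and a volatile store before the timer read should keep the calculation between the two.

```cpp
#include <cmath>
#include <cstdint>

// Stand-in for the TCNT1 register. On the desktop it never advances,
// so this only illustrates the ordering, not real timing.
volatile std::uint16_t fakeTimer = 0;

volatile float fnumber = 50.0f;  // volatile input: must be read at run time
volatile float fresult = 0.0f;   // volatile output: the store must happen

std::uint16_t timeSqrt() {
    fakeTimer = 0;            // volatile write: "start" of the measurement
    float x = fnumber;        // volatile read, ordered after the write above
    fresult = std::sqrt(x);   // needs x, and its store is ordered before the read below
    return fakeTimer;         // volatile read: "end" of the measurement
}
```

With only the result volatile, the store is pinned but the compiler is not strictly required to keep the calculation itself next to it, which may be what the comment about "the copy of a float variable" is getting at.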

#### MichaelMeissner

#14
##### Nov 09, 2012, 01:56 pm

Quote
just use

volatile float fresult = sqrt(fnumber);