Calculation speed (floats vs ints)

Lars81 · January 13, 2016, 10:21pm

So how much impact does this have on speed? Is there a simple way to test it or a simple rule of thumb?

I tried a simple sketch in which two variables declared as int are multiplied into a third int, and a pin is then switched high/low so I could see the loop period on a scope. Then I changed the variables from int to float and got the same period. I guess the "loop time" in Arduino is the limiting factor in this case?

DrAzzy · January 13, 2016, 10:24pm

If the numbers are all constants, the compiler may be optimizing out the math.

Floats should be significantly slower.

Also, tighten the loop, and don't use digitalWrite() to twiddle the pin: digitalWrite() is slow, something like 50 clock cycles.

Pick a pin to output the signal on, set it as output, and look at pinout charts. You'll see it marked like PA1 or PC2 or whatever. To toggle the current state of that pin:

PINA=(1<<1);
or
PINC=(1<<2);

and so on. Much, much faster.

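i.e.
volatile float float1 = 0, float3 = 1.5; // example values; volatile so the math can't be dropped
void setup() {
  pinMode(A2,OUTPUT); // A2 is PC2 on an ATmega328P board such as the Uno
  while (1){
    float1+=float3; // gotta change its value so it doesn't get optimized out
    PINC=(1<<2); // writing 1 to a PINx bit toggles that pin
  }
}
void loop() {
  // code will never get here since there's an infinite loop in setup()
}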

Something like that will probably work better - that was just off the top of my head.
Note that I think someone actually did more rigorous performance calculations here and posted them somewhere.
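
To make the PINx trick above a bit more concrete, here is a small desktop model of it in plain C++. It is only a sketch: PortModel and writePin are made-up names, not real AVR registers or Arduino functions. It mimics the ATmega behaviour where writing a 1 to a bit of PINx flips the matching output bit in PORTx.

```cpp
#include <cstdint>

// Desktop model of one AVR I/O port. PortModel is a hypothetical helper,
// not a real register map.
struct PortModel {
    uint8_t port = 0;  // stands in for PORTx, the output latch

    // Models "PINx = mask;": every bit set in mask flips the matching PORTx bit.
    void writePin(uint8_t mask) { port ^= mask; }
};
```

Calling writePin(1 << 2) twice on a fresh PortModel sets bit 2 and then clears it again, which is what makes a single register write enough for a scope-visible square wave.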

robtillaart · January 13, 2016, 10:39pm

If you test performance you should declare the ints and/or the floats volatile, so the compiler won't optimize them away. In the sketch below the multiplication is tested for 3 datatypes in a loop of 1000 iterations. Note that the loop overhead is the same for all 3 loops.

//
//    FILE: .ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.00
// PURPOSE: demo
//    DATE: 2016-01-13
//     URL: http://forum.arduino.cc/index.php?topic=371813.0
//
// Released to the public domain
//

uint32_t start;
uint32_t stop;

volatile int x = 3, y = 4, z;
volatile long k = 3, l = 4, m;
volatile float p = 3, q = 4, r;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    z = y * x;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    m = k * l;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    r = p * q;
  }
  stop = micros();
  Serial.println(stop - start);
}

void loop()
{
}

==>

Start sketch_jan13b.ino
1804
6384
10156

westfw · January 14, 2016, 2:13am

DrAzzy:
Floats should be significantly slower.

Note that floating point divides may be very close or even faster than 32bit integer ("long") divides, because code has to essentially perform the same divide algorithm on only 24 bits instead of 32.
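
The "24 bits instead of 32" point can be checked with a few lines of plain C++. This is only an illustration of the IEEE-754 single-precision layout, assuming float is a 32-bit IEEE-754 type; it is not how any particular soft-float library is written, and significandOf / exponentOf are made-up helper names.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Bits that take part in the core divide step for each type.
constexpr int kFloatSignificandBits = std::numeric_limits<float>::digits;   // 24
constexpr int kLongBits = std::numeric_limits<int32_t>::digits + 1;         // 31 + sign = 32

// Returns the 24-bit significand of a normal float, with the implicit
// leading 1 restored (zero, subnormals, inf and NaN are not handled).
uint32_t significandOf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return (bits & 0x7FFFFFu) | 0x800000u;
}

// Returns the unbiased exponent of a normal float.
int exponentOf(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<int>((bits >> 23) & 0xFFu) - 127;
}
```

For the values used in the test sketch, 3.0f has significand 0xC00000 with exponent 1, and 4.0f has significand 0x800000 with exponent 2, so a float divide only has to divide two 24-bit numbers and subtract two small exponents.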

robtillaart · January 14, 2016, 5:58pm

westfw:
Note that floating point divides may be very close or even faster than 32bit integer ("long") divides, because code has to essentially perform the same divide algorithm on only 24 bits instead of 32.

Updated sketch:

added byte to the multiply test
added divide for 4 datatypes
added addition for 4 datatypes (subtraction is the same)

//
//    FILE: mulCompare.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.01
// PURPOSE: demo
//    DATE: 2016-01-13
//     URL: http://forum.arduino.cc/index.php?topic=371813
//
// Released to the public domain
//

uint32_t start;
uint32_t stop;

volatile byte a = 3, b = 4, c;
volatile int x = 3, y = 4, z;
volatile long k = 3, l = 4, m;
volatile float p = 3, q = 4, r;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);
  Serial.println("multiply compare, time per 1000 micros");

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    c = a * b;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    z = y * x;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    m = k * l;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    r = p * q;
  }
  stop = micros();
  Serial.println(stop - start);


  Serial.println("divide compare, time per 1000 micros");

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    c = a / b;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    z = y / x;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    m = k / l;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    r = p / q;
  }
  stop = micros();
  Serial.println(stop - start);

  Serial.println("add compare, time per 1000 micros");

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    c = a + b;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    z = y + x;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    m = k + l;
  }
  stop = micros();
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    r = p + q;
  }
  stop = micros();
  Serial.println(stop - start);
}

void loop()
{
}

output (byte int long float)

multiply compare, time per 1000 micros
976
1804
6568
10160

divide compare, time per 1000 micros
5968
14628
40888
29408

add compare, time per 1000 micros
768
1256
2156
7896

Dande80 · January 14, 2016, 6:41pm

Just to check I ran this on a Teensy 3.2 at 72MHz. Here are the results:

multiply compare, time per 1000 micros
230
181
167
601
divide compare, time per 1000 micros
170
252
167
515
add compare, time per 1000 micros
143
169
154
1045

And the same on an Arduino Uno:

multiply compare, time per 1000 micros
968
1812
6632
10160
divide compare, time per 1000 micros
5960
14628
40888
29408
add compare, time per 1000 micros
768
1252
2156
7896

Damn, is that ARM M4 chip fast.

CrossRoads · January 14, 2016, 7:32pm

Yep, can blink an LED like it's nobody's business 8)

Lars81 · January 15, 2016, 12:15pm

Wow, I had no idea that data types had such a huge impact on speed. The same goes for the type of operation performed: dividing has a huge cost compared to addition/subtraction.

Lars81 · January 15, 2016, 12:27pm

How come float is even faster sometimes than the other types on the ARM?

robtillaart · January 15, 2016, 2:29pm

Lars81:
How come float is even faster sometimes than the other types on the ARM?

float = 24-bit mantissa (23 stored bits plus an implicit leading 1) == 3 bytes, plus an 8-bit exponent
long = 32-bit value == 4 bytes

For division the exponents are subtracted, which is very fast for 8 bits and takes < 5% of the time,
so in effect you are comparing a 3-byte division with a 4-byte division.
From this reasoning the time for a long should be approx. 4/3 x the time for a float.
Looking at the numbers for the UNO we see
float = ~30 µs and long = ~40 µs per division

Note that other numbers might give different results.
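
As a quick check of the 4/3 estimate against the Uno divide timings posted earlier in the thread (29408 for float, 40888 for long), the ratio can be worked out as below. The constant and function names are made up for illustration, and both timings include the same loop overhead, so this is only a rough comparison.

```cpp
// Ratio of two loop timings taken under the same conditions.
constexpr double timingRatio(double slowerMicros, double fasterMicros) {
    return slowerMicros / fasterMicros;
}

constexpr double kPredictedRatio = 4.0 / 3.0;                       // 4-byte vs 3-byte divide
constexpr double kMeasuredUnoRatio = timingRatio(40888.0, 29408.0); // long / float, from the thread
```

The measured ratio comes out at about 1.39, a little above the predicted 1.33.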

drgrujic · May 9, 2019, 8:11am

Here is the result from Arduino DUE including double:
(please note that on DUE word: 16bit, int: 32bit, long: 32bit, float: 32bit, double: 64bit)

multiply compare, time per 1000 micros
byte: 345
word: 173
int: 298
long: 296
float: 888
double: 1195

divide compare, time per 1000 micros
byte: 137
word: 161
int: 359
long: 219
float: 874
double: 1072

add compare, time per 1000 micros
byte: 132
word: 270
int: 220
long: 296
float: 1356
double: 1702

In the end I am confused. Why does byte take more time than word?
Why are int and long not the same (on the DUE both are 32 bit)?

Here is the modified code from robtillaart:

//
//    FILE: mulCompare.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.01
// PURPOSE: demo
//    DATE: 2016-01-13
//     URL: http://forum.arduino.cc/index.php?topic=371813
//
// Released to the public domain
//

uint32_t start;
uint32_t stop;

volatile byte a = 3, b = 4, c;
volatile word aw = 3, bw = 4, cw;
volatile int x = 3, y = 4, z;
volatile long k = 3, l = 4, m;
volatile float p = 3, q = 4, r;
volatile double s = 3, t = 4, w;

void setup()
{
  Serial.begin(250000);
  Serial.print("Start ");
  Serial.println(__FILE__);
  Serial.println("multiply compare, time per 1000 micros");

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    c = a * b;
  }
  stop = micros();
  Serial.print("byte: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    cw = aw * bw;
  }
  stop = micros();
  Serial.print("word: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    z = y * x;
  }
  stop = micros();
  Serial.print("int: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    m = k * l;
  }
  stop = micros();
  Serial.print("long: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    r = p * q;
  }
  stop = micros();
  Serial.print("float: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    w = s * t;
  }
  stop = micros();
  Serial.print("double: ");
  Serial.println(stop - start);


  Serial.println("divide compare, time per 1000 micros");

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    c = a / b;
  }
  stop = micros();
  Serial.print("byte: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    cw = aw / bw;
  }
  stop = micros();
  Serial.print("word: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    z = y / x;
  }
  stop = micros();
  Serial.print("int: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    m = k / l;
  }
  stop = micros();
  Serial.print("long: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    r = p / q;
  }
  stop = micros();
  Serial.print("float: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    w = s / t;
  }
  stop = micros();
  Serial.print("double: ");
  Serial.println(stop - start);

  Serial.println("add compare, time per 1000 micros");

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    c = a + b;
  }
  stop = micros();
  Serial.print("byte: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    cw = aw + bw;
  }
  stop = micros();
  Serial.print("word: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    z = y + x;
  }
  stop = micros();
  Serial.print("int: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    m = k + l;
  }
  stop = micros();
  Serial.print("long: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    r = p + q;
  }
  stop = micros();
  Serial.print("float: ");
  Serial.println(stop - start);

  start = micros();
  for (int i = 0; i < 1000; i++)
  {
    w = s + t;
  }
  stop = micros();
  Serial.print("double: ");
  Serial.println(stop - start);
}

void loop()
{
}

ard_newbie · May 9, 2019, 8:27am

The DUE main uC is a 32-bit uC (SAM3X8E), therefore all accesses are optimized for 32-bit variables.

westfw · May 9, 2019, 8:34am

Many ARMs have some sort of code memory caching that can cause the same code to take different times just because of how it happens to be positioned in memory. It can be very difficult to figure out a cycle count, even looking at the assembly code.

I'd like to know why the double and float timings are so close - I'd expect doubles to be about half the speed of float...

ard_newbie:
The DUE main uC is a 32-bit uC (SAM3X8E), therefore all accesses are optimized for 32-bit variables.

Operations shorter than 32bit can require extra "extend" instructions, depending on operation and how good the compiler is - I've seen articles stressing that 32bit math can be faster than 8bit math on an ARM. But I don't think this explains why 16bit math would be faster than 8bit math...
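
A small plain-C++ illustration of where that extra work comes from: 8-bit operands are promoted to int, the multiply runs at full register width, and the result has to be narrowed back to 8 bits. The function names are made up, and whether a given compiler emits a separate extend or mask instruction (UXTB on ARM, for example) depends on the surrounding code.

```cpp
#include <cstdint>

// 8-bit multiply: operands are promoted to int, multiplied, then narrowed.
uint8_t mulByte(uint8_t a, uint8_t b) {
    int wide = a * b;                    // integer promotion: computed at int width
    return static_cast<uint8_t>(wide);   // narrowing keeps only the low 8 bits
}

// 32-bit multiply: the result already fits the register, nothing to narrow.
uint32_t mulWord32(uint32_t a, uint32_t b) {
    return a * b;
}
```

For example, mulByte(200, 2) gives 144 (400 modulo 256), while mulWord32(200, 2) gives 400.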

TS · March 14, 2020, 7:03pm

Here is the result from an ESP32:

multiply compare, time per 1000 micros
byte: 45
word: 57
int: 59
long: 57
float: 62
double: 543

divide compare, time per 1000 micros
byte: 61
word: 72
int: 64
long: 69
float: 881
double: 2203

add compare, time per 1000 micros
byte: 57
word: 65
int: 57
long: 57
float: 58
double: 317

===

Here is the result from an nRF52 (Adafruit Feather Bluefruit):

multiply compare, time per 1000 micros
byte: 0
word: 0
int: 976
long: 977
float: 0
double: 976

divide compare, time per 1000 micros
byte: 0
word: 977
int: 976
long: 0
float: 0
double: 977

add compare, time per 1000 micros
byte: 0
word: 977
int: 977
long: 0
float: 0
double: 977

Any suggestions as to why we get no measurable time for float operations on the nRF52?
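
One possible reading of these numbers, offered as a guess from the values alone and not confirmed anywhere in this thread: every result is either 0 or 976/977, which is what you would see if micros() on this board only advanced in steps of 1/1024 s (about 976.6 µs). The plain-C++ model below sketches that effect; the 1024 Hz tick rate and the function names are assumptions for illustration.

```cpp
#include <cmath>

// Length of one timer tick in microseconds for a given tick rate in Hz.
constexpr double tickPeriodMicros(double tickHz) {
    return 1e6 / tickHz;
}

// Elapsed time as reported by a clock that only advances once per tick.
// startMicros and durationMicros are the true (unquantized) times.
double coarseElapsedMicros(double startMicros, double durationMicros, double tickHz) {
    const double tick = tickPeriodMicros(tickHz);
    const double before = std::floor(startMicros / tick);
    const double after = std::floor((startMicros + durationMicros) / tick);
    return (after - before) * tick;
}
```

In this model a loop that really takes 60 µs reads as 0 if no tick boundary falls inside it and as about 976.6 if one does, regardless of data type. If that is what is happening, running each loop for many more iterations (so the total is many ticks long) should give usable numbers.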
