A basic question about "float" and "double"

Hi friends!
Maybe this question is very basic, but I don't fully understand what the "float" variable type means.
If you look up the definition of float on the Arduino website (float - Arduino Reference), you can see the following:

"(..) Floating point numbers are not exact, and may yield strange results when compared. For example 6.0 / 3.0 may not equal 2.0. You should instead check that the absolute value of the difference between the numbers is less than some small number."

I don't understand what it means. I have run different tests on my Arduino and 6 divided by 3 is always equal to 2. Also, what does "(...) is less than some small number" mean? What number?

Finally, when should I use a "float", and when should I use a "double"?

Thank you very much!!

On AVR-based Arduinos, float and double are exactly the same. On other platforms, like ARM, they will often be different, with float being 32-bit and double being 64-bit.

float and double numbers are rarely "exact": a 32-bit float carries only about 6-7 significant decimal digits, so it cannot represent every number without some loss of precision. To demonstrate this, try running this code:

void setup()
{
    Serial.begin(9600);
    float x = 1.000;
    char buf[20];   // shared output buffer for dtostrf()
    Serial.print("x=");
    Serial.println(dtostrf(x, 10, 8, buf));

    // The increment is far smaller than a float can resolve near 1.0
    x = x + 0.00000001;
    Serial.print("x+0.00000001=");
    Serial.println(dtostrf(x, 15, 10, buf));

    while(1)
        ;
}

void loop()
{
}
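
The effect that sketch demonstrates can also be checked directly with a comparison. This is only a minimal sketch in plain C++, and the helper name increment_is_lost is made up for illustration; it assumes float is the 32-bit binary32 format.

```cpp
// Hypothetical helper: returns true when adding delta to x leaves x
// unchanged, i.e. delta is below what a 32-bit float can resolve near x.
bool increment_is_lost(float x, float delta) {
    volatile float sum = x + delta;  // volatile forces the result to be stored as a float
    return sum == x;
}
```

With x = 1.0, an increment of 0.00000001 is lost, while an increment of 0.001 is not.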

Regards,
Ray L.


@OP

1. In human vocabulary, the integer number refers to a number with no decimal point and fractional part; it has only integer part. For example: 1234.

2. In human vocabulary, the decimal number refers to a number with integer part, decimal point, and fractional part. For example: 12.37. In programming language, it is known as floating point number or simply float.

3. How to store an integer number like 1234 into computer memory?
We simply execute the following instruction; as a result, the given number is automatically saved into two consecutive memory locations (on an 8-bit AVR, where an int is 16 bits wide). The given integer number is always saved as binary bits in the computer memory. This is to say that the given integer number of base 10 is converted into a binary number of base 2 (1234 (base 10) ----> 0000010011010010 (base 2) ----> 04D2 (base 16)) and then it is saved into computer memory. As one memory location can hold 8-bit data, we need two memory locations to hold the given data.

unsigned int x = 1234;
===> unsigned int x = 0x04D2;
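
To see how 0x04D2 splits across the two memory locations, the two bytes can be extracted like this. A minimal sketch; the function names low_byte_of and high_byte_of are invented for this example.

```cpp
#include <stdint.h>

// Low-order byte: the least significant 8 bits of the 16-bit value.
uint8_t low_byte_of(uint16_t value) {
    return (uint8_t)(value & 0xFF);
}

// High-order byte: the most significant 8 bits of the 16-bit value.
uint8_t high_byte_of(uint16_t value) {
    return (uint8_t)(value >> 8);
}
```

For 1234 (0x04D2) the low byte is 0xD2 and the high byte is 0x04.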

4. How to store a floating point number like 12.37 into computer memory?
(1) According to the binary32 format of the IEEE-754 Standard, a float number (+ve or -ve) is coded into 32-bit data as per the following template in order to store it into computer memory. This is what we call '32-bit representation of a float number'.
binary32.png
Figure-1: 32-bit representation of a float number (for example: 0.15625)

(2) Manual procedure to obtain binary32 formatted value for the float number 0.15625.
(a) Calculating binary bits of 0.15625

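0.15625 x 2 = 0.3125 (bit 0)
0.3125 x 2 = 0.625 (bit 0)
0.625 x 2 = 1.25 (bit 1; carry on with the fractional part 0.25)
0.25 x 2 = 0.50 (bit 0)
0.50 x 2 = 1.00 (bit 1; the residual is now 0)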

... continue until the residual is exhausted to 0 or 23 fractional bits are accumulated

(b) 0.15625 (base 10) = 0.00101 (base 2)
==> 0x2^0 + 0x2^-1 + 0x2^-2 + 1x2^-3 + 0x2^-4 + 1x2^-5
==> 0 + 0 + 0 + 0.125 + 0 + 0.03125
==> 0.15625

(c) 0.00101 (base 2)
==> 1.01 (base 2) x 2^-3

(d) binary32 format value of 0.15625 as per Fig-1.
Sign (1-bit: b31): 0 (the given number is +ve)
Biased exponent (8-bit: b30 - b23): -3 (from Step-4c) + 127 (fixed bias) = 124 = 7Ch
Fraction (23-bit: b22 - b0): 01000000000000000000000 (from Step-4c; the leading 1 of 1.01 is implicit and is not stored)

(e) binary32 value: 0(sign) 01111100(biased exponent) 01000000000000000000000(fraction)

(f) Arranging as nibbles: 0011 1110 0010 0000 0000 0000 0000 0000

(g) Presenting in hex: 3E200000 (4-byte = 4x8 = 32-bit)
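
The manual procedure of Step-4(2) can be sketched in plain C++ as follows. This is written for a desktop compiler rather than as an Arduino sketch (it uses std::string), and both function names are invented for illustration. It handles only the simple case walked through above: a fraction below 1 and an exponent already worked out by hand.

```cpp
#include <stdint.h>
#include <string>

// Step (a): repeated doubling. Each doubling yields one binary digit
// (the integer part), and the fractional part is carried forward.
std::string fraction_to_binary(double fraction, int max_bits) {
    std::string bits;
    while (fraction != 0.0 && (int)bits.size() < max_bits) {
        fraction *= 2.0;
        if (fraction >= 1.0) {
            bits += '1';
            fraction -= 1.0;  // keep only the fractional part
        } else {
            bits += '0';
        }
    }
    return bits;
}

// Steps (d) to (g): combine sign, biased exponent and 23-bit fraction
// into one 32-bit word. 'exponent' is the unbiased power of two.
uint32_t pack_binary32(uint32_t sign, int exponent, uint32_t fraction) {
    uint32_t biased = (uint32_t)(exponent + 127);  // add the fixed bias
    return (sign << 31) | (biased << 23) | (fraction & 0x7FFFFFu);
}
```

For 0.15625 the first function gives "00101", and packing sign 0, exponent -3 and fraction 0x200000 (the 23-bit pattern 0100...0) gives 0x3E200000.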

5. Program code to obtain the binary32 formatted value of a float number (say: 0.15625)
(1) When we make the following declaration using the keyword float, the binary32 formatted 32-bit (4-byte) value is automatically saved into 4 consecutive memory locations. The low order memory location holds the lower byte of the data.

float x = 0.15625;

(2) We may execute the following code to collect the 32-bit data from the unseen memory locations and show it in the Serial Monitor.

float x = 0.15625;
unsigned long *ptr;
ptr = (unsigned long*) &x;
unsigned long m = *ptr;
Serial.println(m, HEX); //shows: 3E200000
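
The pointer cast above works on the usual Arduino toolchains, but copying the bytes with memcpy is a commonly recommended alternative that does not rely on reading a float through an unsigned long pointer. A minimal sketch, assuming float is 32 bits; the name float_bits is invented for this example.

```cpp
#include <stdint.h>
#include <string.h>

// Returns the raw 32-bit pattern stored for a float.
uint32_t float_bits(float value) {
    static_assert(sizeof(uint32_t) == sizeof(float), "float must be 32 bits");
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));  // byte-for-byte copy, no conversion
    return bits;
}
```

In a sketch the result could be printed with Serial.println(float_bits(x), HEX) in the same way as above.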

6. Precision of a binary32 formatted float number
Precision refers to the number of digits after the decimal point that we can present. For the binary32 formatted float number, the precision is 23 digits.

7. Accuracy of a binary32 formatted float result
Accuracy refers to 'how many' digits are coming out exactly in the result during the processing of two/more float numbers. For example:

float x1 = 12.12345678;
float x2 = 23.12345678;
----------------------------------------------
Sum on manual calculation: 35.24691356

Sum using program codes:
float x = x1 + x2;
Serial.print(x, 23); //shows: 35.246913 90991210937500000

Only the first 6 digits after the decimal point agree with the manual result; the manual calculation is exact to all 8 places.
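
The size of that discrepancy can be measured by doing the same sum in float and comparing it with a double reference. This is only a sketch with an invented helper name, and it is meaningful only on a platform where double is really 64 bits (so not on an 8-bit AVR).

```cpp
#include <math.h>

// Adds a and b in 32-bit float arithmetic and returns how far the
// result is from the same sum computed in double.
double float_sum_error(double a, double b) {
    volatile float fa = (float)a;   // inputs rounded to float
    volatile float fb = (float)b;
    volatile float fsum = fa + fb;  // sum rounded to float
    return fabs((double)fsum - (a + b));
}
```

For 12.12345678 + 23.12345678 the error lands in the seventh decimal place, which is why the later digits of the printed result differ from the manual sum.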

8. double type floating point number.
This is a 64-bit representation of a floating point number. In this representation, there are 1 sign bit, 11 exponent bits, and 52 stored fraction bits (53 significant bits when the implicit leading 1 is counted). The result of the processing of double type float numbers gives about 15-digit accuracy. The encoding format (known as binary64) is:
binar64.png
Figure-2: 64-bit (binary64 formatted) representation of a float number

9. Arduino UNO, NANO, and MEGA (all 8-bit AVR) support only the 32-bit representation of float numbers. In these Arduinos, the keywords float and double mean the same thing -- the binary32 format.

10. Arduino DUE supports both 32-bit and 64-bit representation of decimal numbers via the keywords float and double respectively.

I apologize and seek correction if any misconception is being carried by this post.


Hi Friends!
Thank you very much!
You are the best!!

josepramon:
Hi friends!
Maybe this question is very basic, but I don't fully understand what the "float" variable type means.
If you look up the definition of float on the Arduino website (float - Arduino Reference), you can see the following:

"(..) Floating point numbers are not exact, and may yield strange results when compared. For example 6.0 / 3.0 may not equal 2.0. You should instead check that the absolute value of the difference between the numbers is less than some small number."

I don't understand what it means. I have run different tests on my Arduino and 6 divided by 3 is always equal to 2. Also, what does "(...) is less than some small number" mean? What number?

Finally, when should I use a "float", and when should I use a "double"?

Thank you very much!!

The example "6 / 3 might not equal 2" is only meant to illustrate the type of problem that floating point numbers have, not to be an example of an actual computation that is wrong. 6/3==2 is fine because they're both small integers. However, floating point numbers do have precision limits that will become significant if you are working with very large numbers or numbers that cannot be expressed as a binary fraction (a rational number whose denominator is a power of 2).

For example, this sketch should print false. When you're using floats, adding 0.1 ten times does not give exactly 1, even though math says it should, because 0.1 cannot be expressed as a binary fraction.

void setup() {
  Serial.begin(9600);
  volatile float sum = 0.0;
  for (int i = 0; i < 10; i++) {
    sum = sum + 0.1f;   // each addition rounds to the nearest float
  }
  Serial.println((sum == 1.0f) ? "true" : "false");
}

void loop() {
}

Never use equality to test floats. Instead, subtract them and see if the absolute value of their difference is smaller than some arbitrary epsilon value that you choose (i.e. abs(a - b) < 0.001 or something like that). A sensible epsilon depends on the magnitude of the values you are comparing.
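
A minimal sketch of that comparison in plain C++. The names nearly_equal and sum_of_tenths are made up for illustration, and the epsilon of 0.001 is just the arbitrary value mentioned above, not a recommendation.

```cpp
#include <math.h>

// True when a and b differ by less than epsilon.
bool nearly_equal(float a, float b, float epsilon) {
    return fabs(a - b) < epsilon;
}

// Adds 0.1 ten times in 32-bit float arithmetic; rounding errors
// accumulate, so the result is close to 1 but not exactly 1.
float sum_of_tenths() {
    volatile float sum = 0.0f;  // volatile keeps every step rounded to float
    for (int i = 0; i < 10; i++) {
        sum = sum + 0.1f;
    }
    return sum;
}
```

Here sum_of_tenths() == 1.0f is false, while nearly_equal(sum_of_tenths(), 1.0f, 0.001f) is true.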

josepramon:
Finally, when should I use a "float", and when should I use a "double"?

As already stated, on an 8-bit AVR a float is exactly the same as a double (4 bytes). On a 32-bit ARM, a double occupies 8 bytes with about 15 significant decimal digits, whereas a float occupies 4 bytes with about 7.
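
If you are not sure which case applies to your board, sizeof will tell you. A small sketch with invented helper names; on an Arduino you could print these values over Serial.

```cpp
#include <stddef.h>

// Storage size of each type on the platform this is compiled for.
size_t float_size_bytes()  { return sizeof(float); }
size_t double_size_bytes() { return sizeof(double); }

// True only where double really offers more precision than float.
bool double_is_wider_than_float() {
    return sizeof(double) > sizeof(float);
}
```

On an 8-bit AVR both sizes should come out as 4; on a 32-bit ARM board double should report 8.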

GolamMostafa:
2. In human vocabulary, the decimal number refers to a number with integer part, decimal point, and fractional part. For example: 12.37. In programming language, it is known as floating point number or simply float.

...

I apologize and seek correction if any misconception is being carried by this post.

In English vocabulary, a decimal number is a number expressed in base 10.

Nothing more, nothing less.

You're welcome

AWOL:
In English vocabulary, a decimal number is a number expressed in base 10.

Nothing more, nothing less.

And of course a decimal number could be integer, or non-integer.

For example 3.141592654 (pi (approx)), is a decimal number, using the ten digits, 0 thru 9

  1. Precision of a binary32 formatted float number
    Precision refers to the number of digits after the decimal point that we can present. For the binary32 formatted float number, the precision is 23 binary digits.

There - clarified that for you.

Again, you're welcome.

Is it just me, or do GolamMostafa's answers tend to be a little um lengthy, not to mention sometimes being um wrong?

neiklot:
Is it just me, or do GolamMostafa's answers tend to be a little um lengthy, not to mention sometimes being um wrong?

"differently correct"