I'm working on a project that needs signed 7 digit precision, with 5 on the right of the decimal. I've read Arduino's double/float limitation of 6-7 digits. I don't understand how it can be 6 or 7 digits.
I've searched and found nothing about when 6 or 7 would be used. Can someone help me understand this?
Floating point doesn't have a precise precision. Hence the name "floating" point.
If you require a fixed precision, then you want to be working with fixed point maths, which is like integer maths but with an implied decimal point.
In fixed point maths you effectively multiply your values by a precision level, and work with bigger numbers. For instance, if you require 5 decimal places of precision, then you multiply everything by 10^5.
The number 4 would be 4 × 10^5, or 400000.
The number 12.44928 would be 12.44928 × 10^5, or 1244928.
The decimal point doesn't actually exist, but it is implied that it is between the 5th and 6th digits. It's up to you to handle it.
Fixed point maths is also considerably faster than floating point.
It's IEEE floating point. Count on 6 places, and sometimes the 7th will be off: a 1 may come back as .9999999.
If you're good with numbers and remember the by-hand ways then you can achieve more accuracy using 32 and 64 bit integer fixed-point --- in general faster than using floats on AVR-based Arduinos.
Really, 9 places with type long and 19 places with type long long.
My suggestion: if you want, say, meters to 6 places, then use micrometers as your unit and only insert the decimal point when printing for human use.
There are ways to use integers to get greater range but since I don't need them I forget the details. Try looking up the Big Number library just for fun, number of places is arbitrary and may be as large as you have RAM and time.
In the IEEE binary interchange formats the leading 1 bit of a normalized significand is not actually stored in the computer datum. It is called the "hidden" or "implicit" bit. Because of this, single precision format actually has a significand with 24 bits of precision, double precision format has 53, and quad has 113.
They quote a single-precision float as having ~7.2 digits, which you can verify thus:
log (2^24) / log (10) = 7.22
So it is probably true to say you have slightly more than 7 digits of precision on the Arduino (using float) rather than 6 to 7.
2^24 = 16777216
Thus I expect to be able to store that number in a float (or maybe one less).
volatile float f = 16777216;

void setup ()
{
  Serial.begin (115200);
  Serial.println (f);
}  // end of setup

void loop () { }
The floating point number has 24 bits of precision. That is, for a particular exponent, it can represent 2 to the power of 24 different numbers. That is about 16.7 million different numbers, or about 7 decimal digits.
But that is only 8.3 million numbers each on the positive and negative sides.
Once you start adding, multiplying and dividing them, you get small roundoff errors at every stage, which reduces the precision.
If you have two actual numbers ( of unlimited precision ), which are one part in a million different, they will have a different floating point representation. But if they are only one part in ten million different, then they probably won't have a different floating point representation, and the distinction between them will be lost.
Any integer with absolute value less than 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than 2^53 can be exactly represented in the double precision format.
I'll give you the "minus one" but it is saying absolute value less than 2^24.
So you should be able to store -16777215 to +16777215. Not half of that.
Once you start adding, multiplying and dividing them, you get small roundoff errors at every stage, which reduces the precision.
What was the old saying? Something like: using floating point numbers is like moving around piles of sand; every time you move one you lose a little sand and pick up a little dirt.
When I talk about places I only count the digits that can be 0 to 9. The above has 7 places. Yet the Reference doc says 6-7, and that gives me the feeling that there are values where it's true.
log10 2^32 = 9.63 a good argument for 9 places of accuracy or smaller bits of dirt to replace the lost sand.
log10 2^24 = 7.22 should be enough for government work...
According to the wiki page, if the exponent is non-zero then there is an implicit 1 before the binary point, followed by 23 bits of significand after it, so except for that 16xxxxxx range you can use all 24 bits. At other exponents the same pattern repeats at twice or half the value of the straight 16xxxxxx range.
I tried a little experimenting to see if I could figure out the 6 or 7 digit question. I incremented several boundary values that I would be working within and found strange results.
Incrementing 31.000001 by .000001 will increase its value. However, incrementing 32.000001 by the same amount does not. Incrementing by .000002 does work, though.
float i = 31.000001;
float j = 99.000001;
float k = 32.000001;
float m = 32.000001;
float n = 63.000001;

void setup ()
{
  Serial.begin (115200);
}  // end of setup

void loop () {
  i += .000001;
  j += .000001;
  k += .000001;
  m += .000002;
  n += .000001;
  Serial.print (i, 15);
  Serial.print (" ");
  Serial.print (j, 15);
  Serial.print (" ");
  Serial.print (k, 15);
  Serial.print (" ");
  Serial.print (n, 15);
  Serial.print (" ");
  Serial.println (m, 15);
}
This gives from 6 to 9 significant decimal digits precision (if a decimal string with at most 6 significant decimal is converted to IEEE 754 single precision and then converted back to the same number of significant decimal, then the final string should match the original; and if an IEEE 754 single precision is converted to a decimal string with at least 9 significant decimal and then converted back to single, then the final number must match the original).
That seems to be saying 6 digits only are guaranteed. However that appears to be contradicted by a page they link to:
When using a decimal floating point format the decimal representation will be preserved using:
The worst case precision of IEEE single floats is arguably one part in 2^23, not 2^24:
The mantissa is 23 bits and has an implicit '1' before the MSB. Thus at the lower end of the range the ratio between successive values is 1/2 : 1/2 + 2^-24, and at the higher end 1 : 1 + 2^-24 (the values represent 0.5 up to nearly 1.0). The lower end corresponds to 23 bits of accuracy (6.9 decimal digits) and the higher end to 24 bits of accuracy (7.2 decimal digits).
So for example the next value after 1.0 is 1.00000011921,
the value prior to 2.0 is 1.99999988079,
the next one after 2.0 is 2.00000023842
Picking the nearest representation gives a worst case error of 1/2 the spacing of representable values - so you can get +/- 1 part in 2^24 by choosing the closest value (7.2 significant figures) - but that +/- shouldn't be glossed over: it's still only 23 bits of precision; there is a factor of 2 from the range -1..+1.
All of this is the accuracy of the representation - once you do some floating point operations you can get errors of 1/2 LSB on every operation - so in practice the error of any useful calculation is several LSBs, i.e. +/- one part in 2^22 or 2^23 is more likely, which is around 6 to 6.5 significant figures.
Remember we can talk about a fraction of a significant figure because we are really talking about the base-10 logarithm of the error.
GoForSmoke:
So maybe when I want 3 place decimals I would choose to work with 6.
Which agrees with what they are saying on those pages, that to take an existing float, convert it to decimal, and then convert back, without loss of precision, you have to use 9 digits, not 6. So you throw in 3 extra ones to cope with rounding, etc.