# Why can't Arduino deal with high-precision numbers?

In my project I need to make some calculations with high-precision numbers. But I noticed that the answers are not correct. The code below shows what I mean.

```cpp
void setup() {
  Serial.begin(9600);
  double l = 32.48750458;
  double n = 32.48751068;
  double im = l - n;             // expected -0.00000610
  double im1 = im * 0.605;       // expected about -0.00000369
  double cs = 32.48751068 + im;  // expected 32.48750458
  Serial.println(l, 8);
  Serial.println(n, 8);
  Serial.println(im1, 8);
  Serial.println(cs, 8);
}

void loop() {
}
```

The output of this code on the serial monitor is:

```text
32.48750305
32.48751068
-0.00000462
32.48750305
```

Also posted at: Why cant't Arduino deal with high precision numbers? - Stack Overflow

If you're going to do that, then please be considerate enough to add links to the other places you cross-posted. This will let us avoid wasting time on duplicate effort, and it will also help others who have the same question to find your post and discover all the relevant information. When you post links, please always use the chain links icon on the toolbar to make them clickable.

Because normal (AVR-based) Arduinos, such as the Uno, have no support for 8-byte double variables: on them, double is the same 4-byte type as float.

You are restricted to the six-to-seven-digit precision of the standard 4-byte float.
I'm not aware of any libraries that add double support.
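To see why the eighth decimal goes wrong, here is a minimal host-side sketch (plain C++ on a PC, not an Arduino sketch) that repeats the first sketch's subtraction in 32-bit `float`, assuming AVR's `double` behaves like a standard IEEE-754 binary32 `float`. Near 32, neighbouring floats are 2^-18 (about 0.0000038) apart, so the inputs cannot be stored to 8 decimals at all. The function names are illustrative only; call `print_comparison()` from `main()` to see the values.

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

// Gap between x and the next larger representable float (one "ulp").
float float_spacing(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// The first sketch's subtraction done in 32-bit float, which is what
// `double` means on AVR boards such as the Uno.
float difference_as_float() {
    float l = 32.48750458f;  // actually stored as 32.4875030517578125
    float n = 32.48751068f;  // actually stored as 32.48751068115234375
    return l - n;            // exactly -2 * 2^-18 = -0.00000762939453125
}

// The same subtraction in 64-bit double, as on a Due or a PC.
double difference_as_double() {
    double l = 32.48750458;
    double n = 32.48751068;
    return l - n;            // about -0.0000061, as expected
}

void print_comparison() {
    std::printf("float spacing near 32: %.10f\n", float_spacing(32.0f));
    std::printf("float  l - n: %.10f\n", difference_as_float());
    std::printf("double l - n: %.10f\n", difference_as_double());
}
```

The float difference comes out as about -0.0000076 instead of -0.0000061, and im1 inherits that error. cs prints as 32.48750305 simply because that is the float closest to 32.48750458.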

yazeedAlshorman:
But I noticed that the answers are not correct.

No matter how few thinning gray hairs remain on my head, an ignorance of Claude Shannon always brings a smile to my face. But not in a schadenfreude way.

pert:
Also posted at:
Why cant't Arduino deal with high precision numbers? - Stack Overflow

But that post (on Stack Overflow) was posted by me. I really need to find a solution as soon as possible, so I posted the question here as well.


If you want precision, don't use a floating-point number. Use an integer or a fixed-point number.
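A minimal sketch of the fixed-point idea applied to the numbers in the question. The scale of 10^8 (one unit = 0.00000001) is an assumption chosen to match the 8 decimals used there, and the helper names are hypothetical. It is written as host C++ for easy testing; `int64_t` arithmetic also works on AVR. Call `print_question_values()` from `main()` to see the exact results.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative fixed-point format: an int64_t counting units of 1e-8.
// The scale is an assumption chosen to match the 8 decimals in the question.
constexpr int64_t SCALE = 100000000;  // 10^8

// Multiply a fixed-point value by num/den (e.g. 605/1000 for 0.605), rounding
// half away from zero. Assumes den > 0 and that value * num fits in 64 bits.
int64_t mul_ratio(int64_t value, int64_t num, int64_t den) {
    int64_t product = value * num;
    int64_t half = den / 2;
    return (product >= 0 ? product + half : product - half) / den;
}

// Render a fixed-point value with exactly 8 decimals, e.g. "32.48750458".
std::string format_fixed(int64_t value) {
    // Work on the magnitude; unsigned negation is well defined.
    uint64_t mag = static_cast<uint64_t>(value);
    if (value < 0) {
        mag = 0 - mag;
    }
    char buf[32];
    std::snprintf(buf, sizeof buf, "%s%llu.%08llu", value < 0 ? "-" : "",
                  static_cast<unsigned long long>(mag / SCALE),
                  static_cast<unsigned long long>(mag % SCALE));
    return buf;
}

// The question's calculation, done exactly in fixed point.
void print_question_values() {
    int64_t l = 3248750458;                  // 32.48750458
    int64_t n = 3248751068;                  // 32.48751068
    int64_t im = l - n;                      // -610 units = -0.00000610
    int64_t im1 = mul_ratio(im, 605, 1000);  // -369 units = -0.00000369
    int64_t cs = n + im;                     // 3248750458 units = 32.48750458
    std::printf("%s\n%s\n%s\n%s\n", format_fixed(l).c_str(), format_fixed(n).c_str(),
                format_fixed(im1).c_str(), format_fixed(cs).c_str());
}
```

On an Uno the formatting part would need changing: avr-libc's printf family does not support the `ll` length modifier and std::string is not available in the AVR core, so you would print `mag / SCALE` and `mag % SCALE` as two `unsigned long` values (padding the fraction to 8 digits yourself).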

yazeedAlshorman:
but that post (at stackoverflow) was posted by me,

We know. We wish you had told us that you had also asked the question on stackoverflow so that we would not waste our time duplicating other answers.

...R


Robin2:
We know. We wish you had told us
...R

Aha, I'm sorry about that. I will keep it in mind next time.

AFAIK on an AVR Arduino, a double is the same as a float (about 7 significant digits of precision). On a DUE (ARM Cortex-M3), a double takes 8 bytes and gives you about 15 significant digits of precision.
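Which case you are in can be checked at compile time instead of discovered through wrong results. A minimal sketch, assuming C++11; `significant_digits` is an illustrative name. `std::numeric_limits` needs the C++ standard library headers, which the AVR core may not provide, but the `static_assert` on `sizeof` works on its own (on a PC or a Due it compiles; where double is 4 bytes it stops the build).

```cpp
#include <cstddef>
#include <limits>

// Refuses to compile where double is only 4 bytes (e.g. AVR boards), instead
// of silently computing with about 7 significant digits.
static_assert(sizeof(double) == 8, "double is not 64-bit on this target");

// Number of significant decimal digits that can be stored in T and read back
// unchanged: 6 for IEEE-754 float, 15 for IEEE-754 double.
template <typename T>
constexpr int significant_digits() {
    return std::numeric_limits<T>::digits10;
}
```

`digits10` is the guaranteed count; many float values keep 7 digits, which is why the usual rule of thumb is "six to seven digits".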

ard_newbie:
On a DUE (ARM Cortex M3), a double is on 8 bytes and gives you a precision of up to 15 decimals.

But I tried an Arduino Due instead of the Uno, and I have the same problem as well.

Of course, you have to set the number of decimals to print (8 here) in Serial.print().

An example sketch that prints “bignums”:

```cpp
// Intended for an Arduino Due, where double is 8 bytes.
// printf() output is assumed to reach the serial monitor, as in the original post.
uint64_t bignum0 = 0xFFFFFFFFFFFFFFllu;
uint64_t bignum1 = 0xFFFFFFFFFFFFFFFFllu;  // 2^64 - 1; (1llu << 64) is undefined behavior
int64_t  bignum2 = -(1ll << 62);           // -2^62
double   bignum3 = -1.12345678987654321;
double   bignum4 = -166666666666666666666e-20;

void setup() {
  Serial.begin(250000);
}

void loop() {
  printf(" bignum0 = 0x%llx\n", bignum0);
  printf(" bignum0 = 0x%llX\n", bignum0);
  printf(" bignum1 = %llu\n", bignum1);
  printf(" bignum2 = %lld\n", bignum2);
  Serial.print(" bignum3 = ");
  Serial.println(bignum3, 15);
  Serial.print(" bignum4 = ");
  Serial.println(bignum4, 16);
  delay(1000);
}
```

yazeedAlshorman:
In my project I need to make some calculations with high-precision numbers. But I noticed that the answers are not correct. The code below shows what I mean.

If you build AVR-GCC with the newlib C library, you can get support for float, double and long double, but double and long double take up quite a lot of program memory and are relatively slow. Here's what I get from my newlib-based AVR-GCC with test code from the book "Practical C Programming":

```cpp
// Floating point accuracy test - from the cow book page 271.
// The 8- and 16-byte results below come from the poster's newlib-based AVR-GCC.
// STDIO is the poster's own library (not part of the Arduino core); it appears
// to connect stdout to Serial.
int main (void)
{
    init ();
    Serial.begin (115200);
    STDIO.open (Serial);

    char buffer[64];  // must hold a whole formatted line, not a single char

    const char *mask0 = "Data type = %s (%d bytes)\n";
    const char *mask1 = "%3d digits accuracy in calculations\n";
    const char *mask2 = "%3d digits accuracy in storage\n\n";

    int counter;

    float fnumber1, fnumber2, fresult;
    double dnumber1, dnumber2, dresult;
    long double lnumber1, lnumber2, lresult;

    fprintf (stdout, "Floating point accuracy test From \"Practical C Programming\"\n");
    fprintf (stdout, "3rd. Ed. by Steve Oualline, Page 271, example 16-1 \"float.c\"\n\n");

    // float: count divisions by 10 until 1 + x == 1, first inside an expression...
    sprintf (buffer, mask0, "float", (int) sizeof (float));
    fprintf (stdout, "%s", buffer);

    fnumber1 = 1.0;
    fnumber2 = 1.0;
    counter = 0;

    while (fnumber1 + fnumber2 != fnumber1) {
        counter++;
        fnumber2 /= 10.0;
    }

    sprintf (buffer, mask1, counter);
    fprintf (stdout, "%s", buffer);

    // ...then with the sum stored in a float variable first.
    fnumber1 = 1.0;
    fnumber2 = 1.0;
    counter = 0;

    while (1) {
        fresult = fnumber1 + fnumber2;

        if (fresult == fnumber1) {
            break;
        }

        counter++;
        fnumber2 /= 10.0;
    }

    sprintf (buffer, mask2, counter);
    fprintf (stdout, "%s", buffer);

    // double: the same two tests.
    sprintf (buffer, mask0, "double", (int) sizeof (double));
    fprintf (stdout, "%s", buffer);

    dnumber1 = 1.0;
    dnumber2 = 1.0;
    counter = 0;

    while (dnumber1 + dnumber2 != dnumber1) {
        counter++;
        dnumber2 /= 10.0;
    }

    sprintf (buffer, mask1, counter);
    fprintf (stdout, "%s", buffer);

    dnumber1 = 1.0;
    dnumber2 = 1.0;
    counter = 0;

    while (1) {
        dresult = dnumber1 + dnumber2;

        if (dresult == dnumber1) {
            break;
        }

        counter++;
        dnumber2 /= 10.0;
    }

    sprintf (buffer, mask2, counter);
    fprintf (stdout, "%s", buffer);

    // long double: the same two tests.
    sprintf (buffer, mask0, "long double", (int) sizeof (long double));
    fprintf (stdout, "%s", buffer);

    lnumber1 = 1.0;
    lnumber2 = 1.0;
    counter = 0;

    while (lnumber1 + lnumber2 != lnumber1) {
        counter++;
        lnumber2 /= 10.0;
    }

    sprintf (buffer, mask1, counter);
    fprintf (stdout, "%s", buffer);

    lnumber1 = 1.0;
    lnumber2 = 1.0;
    counter = 0;

    while (1) {
        lresult = lnumber1 + lnumber2;

        if (lresult == lnumber1) {
            break;
        }

        counter++;
        lnumber2 /= 10.0;
    }

    sprintf (buffer, mask2, counter);
    fprintf (stdout, "%s", buffer);

    while (1);  // done; halt here
}
```

Result:

```text
Floating point accuracy test From "Practical C Programming"
3rd. Ed. by Steve Oualline, Page 271, example 16-1 "float.c"

Data type = float (4 bytes)
  8 digits accuracy in calculations
  8 digits accuracy in storage

Data type = double (8 bytes)
 16 digits accuracy in calculations
 16 digits accuracy in storage

Data type = long double (16 bytes)
 20 digits accuracy in calculations
 20 digits accuracy in storage
```

Unfortunately, getting these results requires you to modify and compile your own AVR-GCC toolchain.
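On a desktop compiler the same probe can be run without rebuilding any toolchain. Below is a compact template version of the book's "storage" loop, assuming IEEE-754 `float` and `double`; `volatile` forces each sum to be stored in a variable of type T before it is compared. The function names are illustrative. Results for `long double` depend on the platform (for example 80-bit extended precision on x86 Linux, plain 64-bit double with MSVC), so only float and double are worth comparing with the table above.

```cpp
#include <cstdio>

// Counts how many times 1 can be divided by 10 before adding the quotient
// to 1 no longer changes a value of type T (the book's "storage" variant).
template <typename T>
int digits_of_accuracy() {
    volatile T one = 1;
    T step = 1;
    int counter = 0;
    while (true) {
        volatile T sum = one + step;  // rounded to T when stored
        if (sum == one) {
            break;
        }
        counter++;
        step /= 10;
    }
    return counter;
}

void print_report() {
    std::printf("float:       %2d digits\n", digits_of_accuracy<float>());
    std::printf("double:      %2d digits\n", digits_of_accuracy<double>());
    std::printf("long double: %2d digits\n", digits_of_accuracy<long double>());
}
```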

I don't see any float/double support. Does your library do these too?

Not so sure I believe the bolded line!

@OP

1. Suppose we have the following two decimal (floating-point) numbers:
n1 = 3.12345678901234567890899;
n2 = 3.12343478901234567890996;

Which one of the above two numbers is more precise? Which one is more accurate?

Both numbers have 23 digits of precision after the decimal point.

Each number is as accurate as it can be in its own right.

2. The meaning of accuracy will, as I understand it, become clear from the following discussion:
(a) Let us add the two numbers of Step-1 by hand. We get:
sum n = 6.24689157802469135781895

3. Let us add the same two numbers of Step-1 on an UNO. We get:
6.24689149856567382812500

```cpp
void setup()
{
  Serial.begin(9600);
  float n1 = 3.12345678901234567890899;
  float n2 = 3.12343478901234567890996;
  // By manual calculation, n1 + n2 = 6.24689157802469135781895
  Serial.println(n1 + n2, 23);  // prints: 6.24689149856567382812500
}

void loop()
{
}
```

4. Which result is more accurate: the man-made one (6.24689157802469135781895 of Step-2) or the machine-made one (6.24689149856567382812500 of Step-3)? The answers are:
(a) Both results have equal accuracy when we consider 6-digit accuracy after the decimal point.
(b) Man-made result (6.2468915) of Step-2 is more accurate than the machine-made result (6.2468914) of Step-3 if we consider 7-digit accuracy after the decimal point.
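The digit-by-digit comparison used in Steps 4 and 7 can be written as a small helper. This is only a rough measure under the same convention as the post: it counts agreeing decimals after the point, so 0.19999 and 0.20000 would score 0 even though they are very close. The function name is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Number of leading decimals (after the '.') on which two decimal strings agree.
// Returns -1 if the parts before the decimal point differ or a '.' is missing.
int matching_decimals(const std::string &a, const std::string &b) {
    std::size_t da = a.find('.');
    std::size_t db = b.find('.');
    if (da == std::string::npos || db == std::string::npos ||
        a.compare(0, da, b, 0, db) != 0) {
        return -1;
    }
    int count = 0;
    for (std::size_t i = 1; da + i < a.size() && db + i < b.size(); ++i) {
        if (a[da + i] != b[db + i]) {
            break;
        }
        ++count;
    }
    return count;
}
```

Comparing the man-made sum with the UNO result gives 6 matching decimals, the figure used in (a) above.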

5. Where and how have we lost the accuracy?
To answer this question, we need to understand how a floating-point number (float), that is, a number with an integer part and a fractional part, is represented at the bit level.
(a) When we define a float number with float n1 = 3.12345678901234567890899;, the 32-bit pattern 0x4047E6B7 is stored in 4 consecutive memory locations of the MCU. The bit pattern follows the IEEE-754 binary32 format (Fig-1): 1 sign bit, 8 exponent bits and 23 bits for the fractional part. Together with the implicit leading 1, that gives 24 significant bits, or roughly 7 decimal digits.

Figure-1: binary32 format for the representation of float number
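The bit pattern quoted in (a) can be checked on any machine whose float is IEEE-754 binary32, which covers both AVR and ordinary PCs. A minimal host-side C++ sketch (the helper names are illustrative) that copies the float's bytes into an integer and splits out the three fields of Fig-1:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

// Raw binary32 bit pattern of a float (memcpy avoids aliasing problems).
uint32_t float_bits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The three binary32 fields from Fig-1.
struct Binary32Fields {
    uint32_t sign;      // 1 bit
    uint32_t exponent;  // 8 bits, biased by 127
    uint32_t fraction;  // 23 bits; the leading 1 is implicit, not stored
};

Binary32Fields split_binary32(float f) {
    uint32_t bits = float_bits(f);
    return {bits >> 31, (bits >> 23) & 0xFFu, bits & ((1u << 23) - 1)};
}
```

For n1 this gives sign 0, biased exponent 128 (that is, 2^1) and fraction 0x47E6B7.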

6. How can we improve the accuracy of the machine-made result?
Fig-1 of Step-5 shows that the accuracy can be improved by allocating more bits to the fractional part of the number. This is what the IEEE-754 binary64 format (Fig-2) does: it allocates 52 bits to the fractional part (plus 11 exponent bits and 1 sign bit), which gives 53 significant bits with the implicit leading 1, or roughly 15 to 16 decimal digits.

Figure-2: binary64 format for the representation of float number

(a) In binary64 format, a 64-bit pattern is produced for the number (as per Fig-2) and stored in 8 consecutive memory locations. For example, for the definition double n1 = 3.12345678901234567890899;, the 64-bit pattern 0x4008FCD6E9BA37B3 is stored in memory.

(b) On the Arduino DUE, this 64-bit floating-point type is available through the keyword double.
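Likewise for binary64: a minimal C++ sketch, assuming a 64-bit IEEE-754 double (a Due or a PC; on AVR the static_assert stops compilation). The helper names are illustrative.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "needs a 64-bit double (e.g. Due or PC, not AVR)");

// Raw binary64 bit pattern of a double.
uint64_t double_bits(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Unbiased exponent: the 11-bit field from Fig-2 minus the bias 1023.
int binary64_exponent(double d) {
    return static_cast<int>((double_bits(d) >> 52) & 0x7FF) - 1023;
}

// The 52-bit fraction field; the leading 1 is implicit, not stored.
uint64_t binary64_fraction(double d) {
    return double_bits(d) & ((1ull << 52) - 1);
}
```

For n1 the fraction is 0x8FCD6E9BA37B3. Its top 23 bits are 0x47E6B7, the binary32 fraction from Step-5 (the float conversion happened to round down here), and binary64 keeps 29 more bits below them, which is where the extra digits of Step-7 come from.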

7. Let us check the improvement in accuracy by adding the numbers of Step-1 in binary64 format on an Arduino DUE.

```cpp
void setup()
{
  Serial.begin(9600);
  double n1 = 3.12345678901234567890899;
  double n2 = 3.12343478901234567890996;
  Serial.println(n1 + n2, 53);
  // prints: 6.24689157802469097191533364821225404739379882812500000
}

void loop()
{
}
```

(a) Man-made result: 6.24689157802469135781895 (23-digit accuracy and 23-digit precision when compared to itself)
(b) binary32 format result: 6.24689149856567382812500 (6-digit accuracy and 23-digit precision when compared to (a))
(c) binary64 format result: 6.24689157802469097191533364821225404739379882812500000 (14-digit accuracy and 53-digit precision when compared to (a) and (b))

We observe that the accuracy rises to 14 digits after the decimal point when using binary64 format.

So the Arduino DUE can indeed handle floating-point calculations that demand higher accuracy and higher precision.

yazeedAlshorman:
In my project I need to make some calculations with high-precision numbers. But I noticed that the answers are not correct. The code below shows what I mean.

What are you really trying to calculate?

```cpp
  double l = 32.48750458;
  double n = 32.48751068;
```

What do those numbers represent? 32.48750458 what? Liters of milk? Tons of concrete? Years you have been married?

There might be, and probably is, a way around the problem you are having. But to find a way around it, it would really help if we had more information.