Strange float behaviour on Arduino Mega

I just noticed the following strange difference between C++ and Arduino Mega:

Standard C++:

float myFloat = 0.090650305151939f;
printf ("%zu\n", sizeof (myFloat)); // sizeof yields size_t, so %zu is the portable format
printf ("%.15f", myFloat);

outputs:

4
0.090650305151939

Arduino:

float myFloat = 0.090650305151939f;
Serial.println (sizeof (myFloat));
Serial.println (String (myFloat, 15));

outputs:

4
0.090650305000000

What is the cause for this rounding?

How many significant figures do you expect to get from a 4-byte IEEE 754 floating point value?

A 32 bit float only has ~7 significant digits. Anything printed after that is just noise. Looks like the Arduino handles that differently than whatever the other box was.
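To put numbers on "~7 significant digits", here is a minimal sketch in standard C++ (for the PC side, not the Arduino). It assumes the C library prints and parses with correct rounding; `survivesRoundTrip` is a made-up helper name, not something from the original posts. It prints a float with a given number of significant digits and checks whether parsing that text gives the same float back.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>

// Decimal digits guaranteed to survive decimal -> float -> decimal (6 for binary32).
constexpr int kSafeDigits = std::numeric_limits<float>::digits10;
// Decimal digits needed so that float -> decimal -> float is lossless (9 for binary32).
constexpr int kRoundTripDigits = std::numeric_limits<float>::max_digits10;

// Hypothetical helper: format `value` with the given number of significant
// digits, parse the text again, and report whether the same float comes back.
// Assumes correctly rounded snprintf/strtof.
bool survivesRoundTrip(float value, int significantDigits) {
    char text[64];
    std::snprintf(text, sizeof text, "%.*g", significantDigits, value);
    return std::strtof(text, nullptr) == value;
}
```

For the value in the question, 9 significant digits identify the stored float and 7 do not, so the practical limit sits somewhere in between. Digits printed beyond that depend only on how the printing routine works.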

Read this:
https://docs.arduino.cc/language-reference/en/variables/data-types/float/

Is it after or before the decimal point or including both?

For example:
1234.5678342345678901 -- how many significant digits are there in this single precision float number? This is just for my understanding.

The binary32 format for the above float number is: 0x449A522C which when converted back to float gives: 1234.567871093750000 maintaining only 4-digit accuracy after the decimal point.
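The conversion back from 0x449A522C can be done by hand. The following is a sketch in standard C++ that splits a binary32 bit pattern into its fields and rebuilds the value. `Binary32Fields` and `decodeBinary32` are names invented for this illustration, and the code only handles normal numbers (no zeros, subnormals, infinities or NaNs).

```cpp
#include <cmath>
#include <cstdint>

// Illustrative container for the three binary32 fields plus the decoded value.
struct Binary32Fields {
    bool negative;           // sign bit
    int exponent;            // unbiased exponent (stored value minus 127)
    std::uint32_t fraction;  // the 23 stored fraction bits
    double value;            // exact value, held in a double
};

// Hypothetical helper; assumes `bits` encodes a normal number.
Binary32Fields decodeBinary32(std::uint32_t bits) {
    Binary32Fields f;
    f.negative = (bits >> 31) != 0;
    f.exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127;
    f.fraction = bits & 0x7FFFFFu;
    // Normal numbers have an implicit leading 1 in front of the fraction.
    const double significand = 1.0 + f.fraction / 8388608.0;  // 8388608 = 2^23
    f.value = std::ldexp(significand, f.exponent);
    if (f.negative) f.value = -f.value;
    return f;
}
```

For 0x449A522C this gives an exponent of 10 and a fraction of 0x1A522C, that is (1 + 1724972/8388608) * 1024 = 1234.56787109375.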

@bojan_jurca

There is also the double type for floating point.
The drawback is that some boards map it to a 4 byte float while others handle it (more correctly) as full 8-byte floating points.

Give this snippet a try:

double x;
Serial.println(sizeof(x));

Even then there are differences between the double on an embedded system and on a laptop. IIRC Intel's x87 floating-point unit handles double operations internally in 80 bits to optimize math, and stores the results as 64 bits.
The software emulation of doubles on most boards just does everything in 64 bits, as that already takes enough time.

The concept of significant digits is well-defined:

As with OP's AVR Arduino Mega.

MEGA has no floating point co-processor. Then, is it the IDE/Compiler or the board that maps the given float number into binary32 format?

That value you used above: did you choose it completely randomly or did you copy it from the output of some code running on the same system (PC/laptop)? I suspect it will have been the latter, and if you choose a genuinely random number with 15 decimal places, you will find you still get different results on the Mega compared to the PC/laptop, but neither will exactly match your chosen number.

IIRC the compiler converts compile-time (float) constants to the appropriate binary value. For everything else, it is the AVR floating-point library (math.h) that does the math at run time.

This little answer helps us a lot when dealing with students in an Arduino class.

I'm developing the code on a regular computer (namely with OnlineGDB) because it works faster, and then I just use the results on the Arduino (Mega 2560 in this case). This is how I noticed that they produce different results. When trying to find the root cause of the problem I noticed that it starts with the initialization.

The number was calculated and printed with OnlineGDB. When imported to Arduino it seems to lose some precision. The first thing I did was checking the length of the float on both systems. It is 4 in both cases.

It seems that floats can contain the exact number 0.090650305151939. When converted to double it is 0.09065030515193939209. So all the digits are correct. There is no reason for the Arduino to do the rounding.

Thank you to all that answered. I do understand how floating point arithmetic works. The problem I'm having is compatibility. I suppose all 4 byte floats should follow IEEE 754 regardless of being calculated by HW or SW.

I did some additional testing with binary representation of floats on both systems.

C++:

float myFloat = 0.090650305151939f;
printf ("%zu\n", sizeof (myFloat)); // 4 bytes
printf ("%.15f\n", myFloat);
printf ("%zu\n", sizeof (int)); // 4 bytes
int myInt = *(int *) &myFloat; // copy float bytes to int bytes
cout << myInt << endl; // 1035577054

Arduino:

float myFloat = 0.090650305151939f;

Serial.println (sizeof (myFloat)); // 4 bytes

Serial.println (String (myFloat, 15));

Serial.println (sizeof (long)); // 4 bytes

long myLong = *(long *) &myFloat; // copy float bytes to long bytes

Serial.println (myLong); // 1035577054

In binary, both results are the same. The problem must be with the output, then.
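A side note on the bit comparison: the `*(int *) &myFloat` cast works in practice on both systems here, but it technically breaks C++ strict-aliasing rules. A sketch of the well-defined alternative uses `memcpy`. `floatBits` is a name chosen for this example, and on AVR the headers may need to be `<string.h>` and `<stdint.h>` instead of the C++ ones.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the raw bit pattern of a 4-byte float.
// memcpy copies the bytes without reinterpreting pointers.
std::uint32_t floatBits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 4-byte float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Calling `floatBits(0.090650305151939f)` should give 1035577054 (0x3DB9A6DE), the same number both programs above printed.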

I don't think there is a "problem". When you attempt to print digits beyond the useful significant digits of a float type, the results will be implementation-dependent. I have a feeling you're perhaps also being misled by compiler optimization.

No; for any exact binary floating point value with digits to the right of the decimal point, the last non-zero digit must be a 5: 0.5, 0.25, 0.125, 0.0625 etc.

In this case, the closest exact values are (scroll to the bottom here)

  • 32-bit: 9.065030515193939208984375 / 100
  • 64-bit: 9.0650305151939003511785131195210851728916168212890625 / 100
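The exact 32-bit value can be reproduced on a PC. This is a sketch in standard C++, assuming the C library prints every requested digit exactly (glibc does; other libraries may stop early or pad with zeros). `exactDecimal` is a made-up helper name.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: full decimal expansion of a float.
// Assumes printf produces exact digits instead of padding with zeros.
std::string exactDecimal(float value) {
    char text[256];
    // A binary32 value never needs more than 149 digits after the point.
    std::snprintf(text, sizeof text, "%.149f", value);
    std::string s(text);
    s.erase(s.find_last_not_of('0') + 1);             // drop trailing zeros
    if (!s.empty() && s.back() == '.') s.pop_back();  // "1." becomes "1"
    return s;
}
```

With such a library, `exactDecimal(0.090650305151939f)` gives 0.09065030515193939208984375, which is the 32-bit value listed above and ends in the digit 5.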

Nope.

Put another way, consider a proper, irreducible fraction of the form N/D. The value is exactly representable in binary (meaning with a finite number of bits to the right of the binary point) if and only if D = 2^p, where p = 1, 2, 3, 4, 5, ….
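The N/D rule can be checked mechanically. Below is a sketch in standard C++ with invented helper names: reduce the fraction, then test whether the remaining denominator is a power of two. It only answers whether the binary expansion is finite. Whether the value also fits into the 24-bit significand of a float is a separate question that this sketch does not cover.

```cpp
#include <cstdint>

// Euclid's algorithm; written out to avoid depending on C++17's std::gcd.
std::uint64_t greatestCommonDivisor(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        const std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Hypothetical helper: true if numerator/denominator has a finite binary
// expansion, i.e. the reduced denominator is a power of two. A reduced
// denominator of 1 (an integer) also counts as finite here.
bool hasFiniteBinaryExpansion(std::uint64_t numerator, std::uint64_t denominator) {
    if (denominator == 0) return false;  // not a fraction at all
    const std::uint64_t d = denominator / greatestCommonDivisor(numerator, denominator);
    return (d & (d - 1)) == 0;  // exactly one bit set
}
```

For example, 1/8 and 3/6 pass, while 1/10 and 90650305151939/1000000000000000 (the decimal 0.090650305151939) do not.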

With 32-bit floats, that value and the ones before and after (by decrementing or incrementing the Significand) are approximately

  • 1234.567749 before
  • 1234.567871
  • 1234.567993 after
  • 1234.568115 after that

The value is incrementing by one at the 8th digit, but then skips .5680. (If rounding instead of truncating, there's no .5678) So almost eight significant digits, but really only seven.
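The neighbouring values do not have to be worked out by hand from the Significand. As a sketch in standard C++, `std::nextafter` steps to the adjacent representable float in either direction. `FloatNeighbours` and `neighboursOf` are names made up for this example.

```cpp
#include <cmath>
#include <limits>

// Illustrative pair: the representable floats directly below and above a value.
struct FloatNeighbours {
    float below;
    float above;
};

// Hypothetical helper built on std::nextafter (float overload).
FloatNeighbours neighboursOf(float value) {
    const float inf = std::numeric_limits<float>::infinity();
    return { std::nextafter(value, -inf), std::nextafter(value, inf) };
}
```

For 1234.56787109375f the neighbours are 1234.5677490234375 and 1234.5679931640625, so the step in this range is 2^-13 = 0.0001220703125.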

The difference is with the String constructor. On AVR, try

void setup() {
  Serial.begin(115200);
  float myFloat = 0.090650305151939f;
  Serial.println(sizeof(myFloat));
  Serial.println(String(myFloat, 15));
  Serial.println(myFloat, 15);
  Serial.println(String(9065.03f / 100000, 15));
  Serial.println(9065.03f / 100000, 15);
}


void loop() {}

With Wokwi I get

4
0.090650305000000
0.090650310516357
0.090650305000000
0.090650310516357

The same code with ESP32 all prints the original best-approximate

0.090650305151939

The "not-rounded" output on AVR (from Serial.println without the String constructor) is not the same as the stored value. It's closer to the number that comes after

  • 0.0906502977 before
  • 0.09065030515
  • 0.0906503126 after

but not quite. More on this later, but at least the difference is past the significant digits. The String constructor on both boards -- AVR

String::String(float value, unsigned char decimalPlaces)
{
	init();
	char buf[33];
	*this = dtostrf(value, (decimalPlaces + 2), decimalPlaces, buf);
}

and ESP32

String::String(float value, unsigned int decimalPlaces) {
  init();
  char *buf = (char *)malloc(decimalPlaces + 42);
  if (buf) {
    *this = dtostrf(value, (decimalPlaces + 2), decimalPlaces, buf);
    free(buf);
  } else {
    *this = "nan";
    log_e("No enough memory for the operation.");
  }
}

(those are big buffers, but check out the one for double on ESP32) -- both use dtostrf, but the implementations differ: the ESP32 one versus the AVR one, which apparently calls dtoa_prf. Those two are quite different from each other and also from Print::print, which is the same code on AVR and ESP32. But that code uses double, which is only 32-bit on AVR, so the results deviate.
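The effect of the working type can be tried on a PC. The sketch below is standard C++, modelled on the digit-by-digit approach that Print::print uses for floats as I understand it. It is not a copy of the Arduino source, and `formatDigitByDigit` is an invented name. The working type is a template parameter, so the same steps can run in 32-bit and in 64-bit arithmetic. It assumes a finite, non-negative input.

```cpp
#include <string>

// Hypothetical helper: format `number` with `digits` decimals by repeatedly
// multiplying the fractional part by 10. T is the working type (float or double).
// Assumes a finite, non-negative value whose integer part fits in unsigned long.
template <typename T>
std::string formatDigitByDigit(T number, int digits) {
    // Add half a unit in the last requested place so the output is rounded.
    T rounding = T(0.5);
    for (int i = 0; i < digits; ++i) rounding /= T(10);
    number += rounding;

    const unsigned long intPart = static_cast<unsigned long>(number);
    T remainder = number - static_cast<T>(intPart);

    std::string out = std::to_string(intPart);
    if (digits > 0) out += '.';

    // Peel off one decimal digit per iteration; every multiplication is
    // rounded to the precision of T, so errors build up quickly for float.
    while (digits-- > 0) {
        remainder *= T(10);
        const unsigned int digit = static_cast<unsigned int>(remainder);
        out += static_cast<char>('0' + digit);
        remainder -= static_cast<T>(digit);
    }
    return out;
}
```

With T = double, `formatDigitByDigit<double>(0.090650305151939f, 15)` gives 0.090650305151939, like the ESP32 output. With T = float and strict single-precision evaluation (no fused multiply-add), working through the steps by hand gives 0.090650310516357, the same digits as the AVR line above: the very first multiplication by 10 already has to round, and that error carries into every later digit.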

Thank you.