# conversion double float received to int?

Here’s my problem. My device sends messages with 64-bit IEEE denormalized double float values. I can save the message bytes in an array. I have found a function on this forum that is supposed to take the 8 bytes and bring them down to a 4-byte floating-point number. I’m a bit confused as to how to pass the bytes to the function. I am posting the function code from the forum below.

```
float conv(byte *dn)
{
  union {
    float f;
    byte b[4];
  } fn;
  int expd = ((dn[7] & 127) << 4) + ((dn[6] & 240) >> 4);
  int expf = expd ? (expd - 1024) + 128 : 0;
  fn.b[3] = (dn[7] & 128) + (expf >> 1);
  fn.b[2] = ((expf & 1) << 7) + ((dn[6] & 15) << 3) + ((dn[5] & 0xe0) >> 5);
  fn.b[1] = ((dn[5] & 0x1f) << 3) + ((dn[4] & 0xe0) >> 5);
  fn.b[0] = ((dn[4] & 0x1f) << 3) + ((dn[3] & 0xe0) >> 5);

  return fn.f;
}
```
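In case it helps to see the same transformation spelled out, here is a sketch (not from the thread) that does the identical re-packing with explicit 64-bit integer operations on a desktop compiler. It assumes the 8 bytes arrive least-significant byte first, as on the AVR; exponent overflow/underflow and denormals are not handled:

```cpp
#include <cstdint>
#include <cstring>

// Sketch: 8 little-endian bytes of an IEEE 754 double -> nearest float.
// Assumes a normal (or zero) value; out-of-range exponents are not checked.
float double_bytes_to_float(const uint8_t *dn) {
    // Reassemble the 64-bit pattern, least significant byte first.
    uint64_t bits = 0;
    for (int i = 7; i >= 0; --i) bits = (bits << 8) | dn[i];

    uint64_t sign = bits >> 63;                   // 1 sign bit
    int      expd = (int)((bits >> 52) & 0x7FF);  // 11-bit exponent, bias 1023
    uint64_t mant = bits & 0xFFFFFFFFFFFFFULL;    // 52-bit mantissa

    // Re-bias for single precision (bias 127); keep the top 23 mantissa bits.
    int expf = expd ? (expd - 1023) + 127 : 0;    // same as (expd - 1024) + 128
    uint32_t fbits = ((uint32_t)sign << 31)
                   | ((uint32_t)expf << 23)
                   | (uint32_t)(mant >> 29);

    float f;
    std::memcpy(&f, &fbits, sizeof f);  // type-pun via memcpy, well defined
    return f;
}
```

Note that `(expd - 1024) + 128` in the forum function and `(expd - 1023) + 127` here are the same re-biasing, just written differently.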

I am also trying to format the data to resend in RTCM format. This is also confusing as it states it should be a 38-bit int but says it holds +/- 13,743,895.3471 meters. How does an int type hold fractional information?

Either as a fraction ... stored value / constant ... or as fixed-point.

The two are essentially the same. What differs is the range of the constant divisor.

(2^38) / 13743895.3471 = 2000. So the divisor is 2000.

So the constant is “understood” to be 2000 by any device programmed to read the standard, and the reading device will perform the conversion after it receives the integer value?

For the function do I pass the 8 bytes in like this?

```
for (int i = 0; i < 8; i++) {
  conv(&arrayName[i]);
}
```

So will this pass 8 bytes into the function and return the modified 4 bytes to array 0-3 to be read as a float type? I don’t see how float f can be read out of the function.

JHEnt:
So the constant is “understood” to be 2000 by any device programmed to read the standard, and the reading device will perform the conversion after it receives the integer value?

Yes.

Value From Device / 2000.0 = Value As A Human Would Understand It

(uint38_t)((Value As A Human Would Understand It * 2000.0) + 0.5) = Value As Device Would Understand It

For the function do I pass the 8 bytes in like this?

No…

```
conv(&arrayName);
```

So will this pass 8 bytes into the function and return the modified 4 bytes to array 0-3 to be read as a float type?

Looks like it.

I don’t see how float f can be read out of the function.

The union is the key.

So am I understanding this right?

Do you

```
float misc = conv(&storedArrayName[0]); // where 0 is the first position of the 8 bytes
```

This reads 8 bytes and returns a float value to variable "misc" ? I'm not seeing where the array dn[] is initialized. Or does the "dn" mean something else?

JHEnt: Do you

```
float misc = conv(&storedArrayName[0]); // where 0 is the first position of the 8 bytes
```

Yes.

No. Nothing is "read" or moved around. Arrays are always passed "by reference" (another way of saying "by pointer"). The code you posted above and this...

```
float misc = conv( storedArrayName );
```

...produce the same result. As does this...

```
float misc = conv( &storedArrayName );
```

I prefer the first version because I think it makes it clear that a pointer to the first element of the array is being passed.

and returns a float value to variable "misc" ?

Yes.

I'm not seeing where the array dn[] is initialized. Or does the "dn" mean something else?

dn does not have any special meaning. It is a simple parameter. When the conv function is called, dn becomes a pointer that points to the first byte in the storedArrayName array.

It's kind of like an alias. Outside of the conv function, the section of memory is called storedArrayName . Inside of the conv function, the same section of memory is called dn.

In C(++), arrays and pointers are essentially interchangeable. dn[0] and *dn and *(dn+0) all result in the first byte of the storedArrayName array. dn[1] and *(dn+1) both result in the second byte of the storedArrayName array.
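A quick way to convince yourself of the aliasing described above (the array name here is made up to match the thread):

```cpp
#include <cstdint>

// Stand-in for conv's parameter: dn points at the caller's array,
// so indexing through it reads the very same bytes.
uint8_t read_byte(const uint8_t *dn, int i) {
    return dn[i];  // identical to *(dn + i)
}
```

Calling `read_byte(storedArrayName, 1)` reads exactly the same memory as `storedArrayName[1]`; no bytes are copied when the array is passed.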

Whoever wrote conv really should have prototyped the function like this...

```
float conv(byte dn[8])
{
...
```

...to make it quite clear that an array of eight bytes is expected.

I am considering using a C++ compiler instead of the Arduino IDE because it allows using 64-bit double floats and 64-bit long long ints. So I am thinking if I take a 64-bit float, multiply it by 10000.0 to shift the decimal, then cast it to a 64-bit signed int, it should truncate at the point of precision my application requires.

From there I am still trying to figure out what to do. I need the int to fit into a 38-bit signed two's-complement type. Assuming the measurement fits into 37 bits plus a sign bit, can I just pull the bytes from an array, read the sign bit from the original 64-bit number, and map it onto the 38th bit of the number I need to send? If the value fits into the 37-bit segment, then all higher bits would be 0 anyway, right?

I need the 38-bit int to represent a measurement to 0.0001 meters. The device sends this as a 64-bit float value.

I really need some guidance here.

JHEnt: Here's my problem. My device sends messages with 64-bit IEEE denormalized double float values. I can save the message bytes in an array. I have found a function on this forum that is supposed to take the 8 bytes and bring them down to a 4-byte floating-point number. I'm a bit confused as to how to pass the bytes to the function.

If you are getting denormalized 64-bit floating point, then you have very little hope of processing this on the Arduino, which only gives you access to 32-bit floating-point numbers (double and long double are the same format as float). This is because a 64-bit denormal number is outside the range of a 32-bit floating-point number.

at the point of precision my application requires

Which is?

Range?

I am considering using a C++ compiler instead of the Arduino IDE because it allows using 64-bit double floats and 64-bit long long ints.

What do you intend to do with the resulting .o file, then?

Paul S, I have a C++ ide that allows AVR projects I have played with a little. I also have downloaded ATMEL's IDE but haven't really begun looking into it yet.

Coding Badly, in practice I need to measure to approx +/- 4,000,000.0001 meters. So an int number in the +/- 100,000,000,000 range.

Also, based on the measurement of this output, I would say the 64-bit float is normalized? A different message outputs radians, which would be 0.xxxxxxxx and would be denormalized?

I have a C++ ide that allows AVR projects I have played with a little. I also have downloaded ATMEL's IDE but haven't really begun looking into it yet.

Each of them uses the same compiler that the Arduino IDE uses, which does NOT support 64 bit doubles or ints.

In practice I need to measure to approx +/- 4,000,000.0001 meters.

4 million meters at plus/minus one tenth of a millimeter? Get real.

PaulS, that is the RTCM Special Committee 104 standard for Earth-Centered Earth-Fixed antenna position coordinates. I'm just trying to make the actual measurement fit into their criteria.

So what you're saying is there is no way to directly work with 64-bit numbers on the 8-bit AVR MCU?

I was thinking of moving up to a 32-bit AVR with a built-in FPU, but I didn't want to spend on having custom circuit boards made until I had figured out how to make it work on something I was somewhat familiar with.

So what you're saying is there is no way to directly work with 64-bit numbers on the 8-bit AVR MCU?

Exactly. Nick Gammon put together a library called BigNumbers that provides the capability you seem to be looking for. It has, of course, tradeoffs. Speed being the first thing sacrificed.

JHEnt: Also, based on the measurement of this output, I would say the 64-bit float is normalized? A different message outputs radians, which would be 0.xxxxxxxx and would be denormalized?

In IEEE floating point, except for very small numbers, every number is normalized, which means for non-zero/special numbers the binary exponent is set so that the top mantissa bit is always 1, and in the encoding this bit is implicit and not actually recorded. The formats are:

- 64-bit: 1 sign bit, 11 exponent bits (bias 1023), 52 mantissa bits
- 32-bit: 1 sign bit, 8 exponent bits (bias 127), 23 mantissa bits

A denormal 64-bit value has 0 in the exponent field (a scale of 2^-1022), and then the fractional part is the 52 bits in the mantissa. A denormal 32-bit value has 0 in the exponent field (a scale of 2^-126), and the fractional part is the 23-bit mantissa. So if you extract the top 20 bits of the mantissa from the double and plop them into the bottom 20 of the 23 mantissa bits in the single precision, it would work.

One other thing to watch out for is the AVR is little endian, which means the least significant byte is the first byte. If you are transmitting raw values from another computer, you might need to swap bytes, depending on whether the remote system is big endian or little endian. Typically, if you are transmitting values in standard format, the network order is big endian, so you may need to swap bytes as you are getting them from the wire.
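If the wire order does turn out to be big endian, the fix is simply to reverse the 8 bytes in place before converting. A minimal sketch (the function name is made up):

```cpp
#include <cstdint>

// Reverse an 8-byte buffer in place: converts between big-endian
// network order and the AVR's little-endian in-memory order.
void swap8(uint8_t *b) {
    for (int i = 0; i < 4; ++i) {
        uint8_t t = b[i];
        b[i]     = b[7 - i];
        b[7 - i] = t;
    }
}
```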

So what you're saying is there is no way to directly work with 64-bit numbers on the 8-bit AVR MCU?

Who said that? The "long long" data type is supported, giving 64 bit support.

Note, it occurs to me that while the AVR compiler converts 'double' into a 32-bit representation (same as 'float'), ARM chips will provide a 64-bit IEEE format, and for the OP it might solve a lot of headaches to switch processors. On October 22nd the DUE will be available, and it should be fairly compatible with the current AVR-based Arduinos, but there are various other ARM solutions out there right now that you might need to learn a new IDE for. Or, if you don't need hard realtime, the Raspberry Pi is available.

Now, if you are doing lots and lots of floating point arithmetic, you might want to get a chip with hardware floating point (unfortunately the DUE will not have hardware FP).

So unless there's some way to convert with the bits themselves, it sounds like I need a separate PC program to work with the 64-bit floats. So after I do message order checking I'll have to pass the full message array to a PC to be changed to RTCM standard format, or start with a different embedded processor.

I was looking at the spec sheet for an AVR32UC, which you can get with an FPU built in, but it looks like it is set up for 32-bit access also.

This is also confusing as it states it should be a 38-bit int but says it holds +/- 13,743,895.3471 meters.

(2^38) / 13743895.3471 = 2000. So the divisor is 2000

This thread started going off-track very early.
2^38 / 13,743,895.3471 is 20,000, not 2000.

A simpler way of looking at the problem is to examine the value 2^37 (don’t forget the sign bit) = 137,438,953,472.
Look familiar? Now divide it by 10,000, giving the 1/10 mm resolution already mentioned.
I can’t believe that converting a 64-bit IEEE 754 number to this representation is all that difficult using “long long” arithmetic.
It won’t be fast, but things that use numbers with this sort of range rarely (in my experience) require speedy calculations.

However, if there is a similarly-priced platform that will handle 64-bit IEEE 754 natively, that’s the solution I’d go for.
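A sketch of the fixed-point scaling just described, for a platform that does have 64-bit types: 0.0001 m per LSB, so the constant “understood” by both ends is 10,000 (function names are illustrative, and the result is simply held in an int64_t rather than a true 38-bit field):

```cpp
#include <cstdint>
#include <cmath>

// Human-readable meters -> integer in 0.1 mm units (fits a 38-bit field
// for values up to about +/- 13,743,895.3471 m).
int64_t encode_tenth_mm(double meters) {
    return (int64_t)std::llround(meters * 10000.0);
}

// Integer from the wire -> meters.
double decode_tenth_mm(int64_t fixed) {
    return fixed / 10000.0;
}
```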

So if I could take the 52 numerical bits into a 64-bit long long int (or should I just work with the bytes in a second array?), then read the 11 exponent bits into an int and subtract 1023 from it for the actual exponent. I would need to determine how many places the exponent indicates shifting and add 4 more decimal shifts to the right. This should be the truncation point of the real int number. Then I need to use those bits 37 places left for my integer and add the sign bit.
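For what it's worth, the plan just described can be sketched in plain C++ with integer-only arithmetic (no 64-bit floating point needed). This is only a sketch under assumptions: the name is made up, the value must be normalized and of roughly meter scale (so the computed shift stays in range), and zero, denormals, and out-of-range exponents are not handled:

```cpp
#include <cstdint>

// 8 little-endian bytes of an IEEE 754 double -> integer in 0.0001 m units.
// Assumes a normalized value whose magnitude keeps 0 < s < 64 below.
int64_t double_bytes_to_fixed1e4(const uint8_t *dn) {
    uint64_t bits = 0;
    for (int i = 7; i >= 0; --i) bits = (bits << 8) | dn[i];

    int      sign = (int)(bits >> 63);
    int      e    = (int)((bits >> 52) & 0x7FF) - 1023;        // unbiased exponent
    uint64_t m    = (bits & 0xFFFFFFFFFFFFFULL) | (1ULL << 52); // implicit 1 bit

    // value = m * 2^(e-52); scale by 10000 = 625 * 16, so m*625 still fits
    // in 64 bits and the *16 folds into the shift.
    int s = 52 - e - 4;                                        // remaining right shift
    uint64_t scaled = (m * 625ULL + (1ULL << (s - 1))) >> s;   // round to nearest

    return sign ? -(int64_t)scaled : (int64_t)scaled;
}
```

The result would then be masked down to the low 38 bits for the two's-complement RTCM field.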

OK, that said, I've got my head confused. If I work with it all at the byte/bit level, would the sign bit be the leftmost bit in the lowest-addressed byte of the 8-byte array making up the original float?