16 bit floats?

Rx7man · June 4, 2015, 5:19am

I'm trying to save some space, and some CPU cycles, but I really don't want to have to resort to keeping track of multipliers for integer values.. I have too many to keep track of.

I just need a low-precision floating point number, is there a library available?.. If I had 0.001 resolution near 0, and a max value of let's say 100,000 that would suffice for my application...

michinyon · June 4, 2015, 5:23am

The short answer is, there are no 16 bit floats.

In some contexts, you could use 16 bit fixed point fractional numbers, which are really an integer type. But that is not going to work over the range you suggest.

michinyon · June 4, 2015, 5:28am

If you allocate 4 bits (values 0 to 15) to represent decimal exponents from -4 to +10, and add a 12 bit fixed point fractional binary number representing a value between 0 and 1, then you could implement such a thing. Go ahead, do it yourself. I actually did this a few years ago, and found that it wasn't very useful.

tammytam · June 4, 2015, 10:11am

Not to mention all the faff you have to go through to implement all the mathematical functions you need for your newly found fixed point number. Unless of course you intend on converting from fixed point to floating point as you need to (speedy! not)

Rx7man · June 4, 2015, 2:57pm

Ahh.. too bad. I know I wouldn't be able to write code efficient enough to do it.. figured someone smarter than me must have had the need for them, but perhaps the overhead was too big.

I was thinking you could convert them to 32 bit floats quite easily with a couple of & (bitwise AND) operations, you would only need to increase the bits dedicated to the mantissa and exponent.
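The conversion described above can be sketched with masks and shifts. This is only an illustration, assuming the 16 bit value uses the IEEE 754 half-precision layout (1 sign, 5 exponent, 10 fraction bits); the function name `halfToFloat` is made up for this example and is not from any library mentioned in the thread.

```cpp
#include <stdint.h>
#include <string.h>

// Expand a half-precision bit pattern into a 32 bit float.
// Assumed layout: bit 15 = sign, bits 14..10 = exponent (bias 15),
// bits 9..0 = fraction.
float halfToFloat(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  // move sign to bit 31
  uint32_t exponent = (h >> 10) & 0x1Fu;          // 5 bit exponent field
  uint32_t fraction = h & 0x03FFu;                // 10 bit fraction field
  uint32_t bits;

  if (exponent == 0) {
    if (fraction == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal half: shift until the leading 1 appears, adjusting
      // the exponent, so the float can store it as a normal number.
      exponent = 127 - 15 + 1;
      while ((fraction & 0x0400u) == 0) {
        fraction <<= 1;
        exponent--;
      }
      fraction &= 0x03FFu;  // drop the now-implicit leading 1
      bits = sign | (exponent << 23) | (fraction << 13);
    }
  } else if (exponent == 0x1Fu) {
    bits = sign | 0x7F800000u | (fraction << 13);  // infinity or NaN
  } else {
    // Normal number: re-bias the exponent (15 -> 127), widen the fraction.
    bits = sign | ((exponent + (127 - 15)) << 23) | (fraction << 13);
  }

  float f;
  memcpy(&f, &bits, sizeof f);  // reinterpret the bit pattern as a float
  return f;
}
```

For normal numbers it really is just a few ANDs, shifts and one addition; only zero, subnormals and infinity/NaN need the extra branches.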

MorganS · June 4, 2015, 3:12pm

"Too many" multipliers? Then you're doing something wrong. Most of your fixed-point numbers should be on the same multiplier so that you can add and subtract them easily. Changing multipliers (multiplying or dividing by 10) is actually a slow operation for the Arduino and should be avoided.

16-bit floats do exist but they're not very useful. They have poor precision because the sign and exponent leave only about 10 bits for the mantissa. IEEE half precision stops representing every integer exactly at 2048 and tops out at 65504, so there is no hope of every 0.001 step in 0-100,000.
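A minimal sketch of the "one multiplier for everything" idea, assuming a scale of 1000 (so 0.001 resolution) held in a 32 bit integer. The names `Fixed`, `SCALE`, `fixedFromParts`, `fixedAdd` and `fixedMul` are invented for this example. Note that this keeps the requested range and resolution but does not save RAM compared with a 4 byte float.

```cpp
#include <stdint.h>

// Every value is stored as thousandths, so 1.234 is held as 1234.
// 100,000.000 becomes 100,000,000, which fits in an int32_t.
typedef int32_t Fixed;
const int32_t SCALE = 1000;

// Build a value from its whole part and its thousandths.
Fixed fixedFromParts(int32_t whole, int32_t thousandths) {
  return whole * SCALE + thousandths;
}

// Same multiplier on both sides: addition is a plain integer add.
Fixed fixedAdd(Fixed a, Fixed b) {
  return a + b;
}

// Multiplication needs one rescale; the 64 bit intermediate avoids
// overflow but costs a slow division on an 8 bit AVR.
Fixed fixedMul(Fixed a, Fixed b) {
  return (Fixed)(((int64_t)a * b) / SCALE);
}
```

Because every variable shares `SCALE`, there is only one multiplier to keep track of, and only multiplication and division pay for a rescale.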

Are you sure it's really the floats which are slowing your program down? I've found that floating-point division actually works faster than long-integer division so some of the critical parts of my code are faster with floats. There may be some other loop in your code which can be optimised more productively.

tammytam · June 4, 2015, 3:22pm

I'd be surprised if floats are ever faster than integers on the Arduino.

I think his primary concern was space, a 16bit variable obviously taking up half the space of the 32bit one.

If speed really is your primary concern, and this is a hobby project, then have you considered wiring up an FPU chip to the Arduino?

Maybe post your code, we might have some suggestions on how to make it smaller.

Rx7man · June 4, 2015, 3:55pm

I was saying that I could do with .001 precision from 0-1, .01 from 1-10, and .1 from 10 to 100, etc.. basically if I had 4 significant figures I'd be good, and I might be able to pull it off with just 3.

I don't know that my code is slow, but I'm sure there's lots of room for improvements.. I heard someone say "As long as it's fast enough, you don't need to worry yourself to death optimizing it"

My current code is about 1000 lines, nearly 18K compiled, and I have nearly 2K in global variables (about 20 instances of a class containing 12 floats and a few ints take up a good portion of that: 20x12x4 = 960).

I don't think floats would ever be faster than integers, if you only considered they take 4 bytes instead of 2.. though if you took 4 byte integers and compared them, some math may be pretty close, especially if they have the same exponents.

While we're on math... here's something for those who like having something to ponder..
is there any way of expressing 1 and one third as a non repeating decimal? if so, how? PM me:)

robtillaart · June 4, 2015, 3:57pm

16 bit floats do exist: see "Half-precision floating-point format" on Wikipedia.

16 bits: 1 sign bit 5 exponent and 10 fraction.
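To make the 1/5/10 bit split concrete, here is a rough sketch of packing a 32 bit float into that layout. It is not the code from the attached draft library; `floatToHalf` is a hypothetical name, and the sketch deliberately cuts corners: extra fraction bits are truncated rather than rounded, values too small for a normal half are flushed to zero, and NaN is treated like infinity.

```cpp
#include <stdint.h>
#include <string.h>

// Pack a 32 bit float into half-precision bits:
// bit 15 = sign, bits 14..10 = exponent (bias 15), bits 9..0 = fraction.
uint16_t floatToHalf(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);  // read the float's raw bit pattern

  uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
  // Re-bias the 8 bit float exponent (bias 127) to the half bias of 15.
  int16_t exponent = (int16_t)((bits >> 23) & 0xFFu) - 127 + 15;
  // Keep only the top 10 of the 23 fraction bits (truncation).
  uint16_t fraction = (uint16_t)((bits >> 13) & 0x03FFu);

  if (exponent <= 0) {
    return sign;  // too small for a normal half: flush to zero
  }
  if (exponent >= 31) {
    return (uint16_t)(sign | 0x7C00u);  // too large: infinity
  }
  return (uint16_t)(sign | ((uint16_t)exponent << 10) | fraction);
}
```

With 10 stored fraction bits plus the implicit leading 1, the format carries 11 significant bits, roughly 3 decimal digits. The largest finite value is 65504, and between 32768 and 65504 neighbouring values are 32 apart, so 100,000 overflows to infinity in this sketch.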

I wrote a partial draft library in the past, with two test sketches, but never published it as it is not mature and not tested enough. There is a big chance it is incorrect in some places. It only has a few constructors and comparison operators. My goal was to write some Python (which supports float16) on the PC to check the quality.

You can use at your own risk, and feel free to improve.

Note: as I have little time I will not give support on this draft library

float16.zip (4.04 KB)

tammytam · June 4, 2015, 4:19pm

Just had a play with that lib, you definitely lose accuracy, but of what I've thrown at it, it definitely falls within your ranges, of 4 significant figs.

You'll have to flesh out the /*+- operators though

robtillaart · June 4, 2015, 4:25pm

Rx7man:
is there any way of expressing 1 and one third as a non repeating decimal? if so, how? PM me:)

Please check my fraction class: "Fraction library for Arduino" (Libraries section of the Arduino Forum).

robtillaart · June 4, 2015, 4:36pm

Rx7man:
I was saying that I could do with .001 precision from 0-1, .01 from 1-10, and .1 from 10 to 100, etc.. basically if I had 4 significant figures I'd be good, and I might be able to pull it off with just 3.

if you want to have a decimal format:

you need 1000 steps ==> 10 bits (0..1023, which leaves room for a few special values)
one bit for sign (+/-)
5 bits for exponent, e.g. [-15..15]

smallest number 0.001 * 10^-15
largest number 0.999 * 10^15

Note that many numbers can be written in multiple ways if you use this coding scheme
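The decimal scheme above could look something like this. The bit positions are an assumption made for the example (bit 15 = sign, bits 14..10 = exponent + 15, bits 9..0 = mantissa 0..999), and `dec16Pack` / `dec16ToFloat` are hypothetical names. Mantissa codes 1000..1023 and exponent code 31 are left unhandled here; they are the spare "special values".

```cpp
#include <stdint.h>

// Pack sign, mantissa (0..999, meaning 0.000..0.999) and a decimal
// exponent (-15..15) into 16 bits. The layout is an assumption.
uint16_t dec16Pack(bool negative, uint16_t mantissa, int8_t exponent) {
  return (uint16_t)((negative ? 0x8000u : 0u) |
                    ((uint16_t)(exponent + 15) << 10) |
                    (mantissa & 0x03FFu));
}

// Decode to a float: value = +/- (mantissa / 1000) * 10^exponent.
float dec16ToFloat(uint16_t d) {
  float value = (float)(d & 0x03FFu) / 1000.0f;
  int8_t exponent = (int8_t)((d >> 10) & 0x1Fu) - 15;
  // Apply the power of ten one step at a time (slow, but simple).
  for (; exponent > 0; exponent--) value *= 10.0f;
  for (; exponent < 0; exponent++) value /= 10.0f;
  return (d & 0x8000u) ? -value : value;
}
```

For instance, `dec16Pack(false, 500, 0)` and `dec16Pack(false, 50, 1)` are different bit patterns that both decode to about 0.5, which is the "multiple ways" caveat. 100,000 fits as `dec16Pack(false, 100, 6)`. Every decode costs floating-point multiplies or divides by 10, so this saves RAM rather than CPU time.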

tammytam · June 4, 2015, 4:37pm

Sorry, my earlier post was wrong. It won't support your need for large numbers (100,000), and how could it be expected to, being 16bit :(.

But it was going strong on the 3 sig figs up to 65,000

Rx7man · June 4, 2015, 7:18pm

Rob, that is kinda what I'm looking for... and it has the range I need too.. I looked at the code a bit, I'll have to actually study it more though

Tammytam, a 16 bit float would handle large numbers the same way a 32 bit float handles numbers larger than 2^32.

As far as my riddle-like question before, I perhaps didn't make myself clear.. you aren't allowed to use a fractional expression.. just a decimal point... I did a big VB project on it and proved it can be done

MorganS · June 4, 2015, 8:45pm

Rx7man:
I have nearly 2K in global variables (about 20 instances of a class containing 12 floats and a few ints take up a good portion of that: 20x12x4 = 960).

I don't think floats would ever be faster than integers, if you only considered they take 4 bytes instead of 2.. though if you took 4 byte integers and compared them, some math may be pretty close, especially if they have the same exponents.

So where's the rest of the 2K? Half of your memory is missing somewhere you don't know about. You can usually make a significant improvement by using the F() macro on any constant strings you have.
Don't think. Try it. For values which are big enough to require long ints and you need to do a division (like divide by 10 or 100) inside your innermost loop, floating-point is actually faster on the AVR Arduinos. I don't have an Arduino with me right now to test this but the last time I checked (on an Uno) 100,000 integer divisions took 4,117ms and the same number of floating-point divisions took 3,410ms.

Rx7man · June 5, 2015, 1:00am

I have very few literal strings.. I'm not "missing" memory.. I was just showing an example of just one type of global variable that's taking a lot of space (the class with the floats).. Aside from that I have lots of other global variables.. more floats, more longs, etc.

In VB there's a specific operator for integer division '\' instead of '/'.. is there anything similar in Cpp?

pYro_65 · June 5, 2015, 1:12am

Rx7man:
I have very few literal strings.. I'm not "missing" memory.. I was just showing an example of just one type of global variable that's taking a lot of space (the class with the floats).. Aside from that I have lots of other global variables.. more floats, more longs, etc.

In VB there's a specific operator for integer division '\' instead of '/'.. is there anything similar in Cpp?

If both the left and right operands are a type of integer, then the division is done in the integer domain.

C++ applies the usual arithmetic conversions when it computes expressions. If one operand has a higher rank (float ranks above the integer types), then the other operand is converted to that type.

If you have a float you want to do integer division on, cast it: (int) f or int(f)
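A small illustration of those three cases; the function names are made up for the example.

```cpp
#include <stdint.h>

// Both operands are integers, so '/' discards the fractional part
// (truncation toward zero).
int16_t wholeQuotient(int16_t a, int16_t b) {
  return a / b;
}

// One operand is a float, so the integer is converted to float and
// the division is done in floating point.
float realQuotient(int16_t a, float b) {
  return a / b;
}

// Casting the float to an integer first forces integer division again.
int16_t truncatedQuotient(float a, int16_t b) {
  return (int16_t)a / b;
}
```

So `wholeQuotient(7, 2)` gives 3, `realQuotient(7, 2.0f)` gives 3.5, and `truncatedQuotient(7.9f, 2)` gives 3 because the cast drops the .9 before dividing.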

michinyon · June 5, 2015, 4:01am

My version had 4 exponent bits, but they represented powers of 10, not 2. One sign bit and 11 fractional bits. The sign bit was set for positive numbers and cleared for negative numbers. There is an exact representation for +1 but not for -1, and also it has -0 but not +0.

I'll see if I can find the backup disk with the code for it.
