Hello! Today, I decided to push the limits of the 8-bit AVR that is the heart of the Arduino and decided to write a fixed-point arithmetic library. Initial testing suggested that I could make each 16-bit fixed-point multiplication 2 times faster (~4us per mult compared to ~9us) than that of a floating point multiplication. The comparison may seem unfair because the floats are 32-bit, but I doubt I need at 32-bits at this point, hence 16-bit.
Anyway, I wrote the class for signed fixed-point and some functions and also downloaded a pretty fast assembly multiplication routine for integers here: Arduino – AVR GCC multiplication | Mekonikuv blog. I currently use the signed 16-bit x 16-bit to 32-bit routine.
I'm now going to talk about problems with the code, which is attached to this post.
Initially, the results were pretty good. Multiplications with positive and negative results individually took around ~3 to 4us. The algorithm (important part in my code) I wrote to find the 2 most significant bytes are very similar for both. I then added a conditional statement to check if the initial 32-bit result was positive or negative before passing it to the appropriate function. Oddly, it seemed that this single conditional statement costs over 6us to execute! Also, the positive result multiplication (without the conditional statement) starts to perform weirdly and now takes ~15us! The negative result multiplication still takes ~3us but jumps to ~9us if the conditional statement is included.
Unfortunately, I still only know an insignificant amount of assembly so I cannot see what's happening behind the IDE. But there might be two problems which I am unable to detect. The first is possibly in the positive result MSB selection code: hp16(int32_t &a, uint8_t &b) which is extremely similar to the version for negative results.
The second problem is with how the conditional statement: iv<0?hn16(iv, iq):hp16(iv, iq) is being handled.
If anyone is interested in helping me analyse the code, I'll greatly appreciate it! It'll also be great if anyone has suggestions to make it even faster.
By the way, the various performance times can be checked by changing the value of #define TYPE in fp.h.
Thanks! I haven't tried any other fixed-point libraries since most of them don't seem to match my applications. I looked at some information about fixed-point here and there but most of the algorithm's just intuitive so there should be lots of room for improvement (a great source for brain workouts).
There is a fixed point math implementation in AVR Lib-C that is used by the avr-gcc compiler that is used in the Arduino IDE, stop re-inventing the wheel.
If nobody bothers to make a wheel for themselves once in a while, eventually wheel-making will become an esoteric art. Also, bicycle wheels shouldn't be used on sedans, sedan wheels shouldn't be used on off-roads, off-road wheels shouldn't be used on track cars, track car wheels shouldn't be used on aeroplanes, aeroplane wheels shouldn't be used on bicycles. The concept of a wheel is implemented differently to suit specific needs.
However, I didn't know there's an implementation of fixed-point included in with the IDE. I thank you for that portion of your input.
Senso:
There is a fixed point math implementation in AVR Lib-C that is used by the avr-gcc compiler that is used in the Arduino IDE, stop re-inventing the wheel.
@Senso: This is a hobby site - reinventing the wheel is part of the learning process. The OP took the time to share his work, and others may benefit from it. Some people find this kind of low-level development work to be very interesting, and there is always room for improvement.
@Crappinni: I'll take a look at your implementation this weekend and will get you some feedback. Thanks for sharing.
There is a fixed point math implementation in ... in the Arduino IDE
Indications from AVR Freaks are that it is incomplete. The latest news appears to be from May, 2011: a volunteer's implementation is being rolled into the distribution.
My gut tells me that the fixed-point code shipping with Arduino should be treated with a great deal of suspicion.
I'm not sure about the status of that particular implementation, but I haven't really found much documentation about it. My code doesn't follow a "standard" method of implementation either. The number of fractional bits, Q, changes constantly based on how the bits are filling up the intermediate 32-bit result (since I only want 16-bit multiplications). I wrote it as such so multiplication between numbers with different number of fractional bits can be done with the same function call.
Unfortunately, I still haven't been able to make any progress with the speed issue.