Alternative to floating point computation for Cortex-M3

@Paul

If you unroll your loops, their size will be from here to China haha

I'm kidding. ))

@MrAl

I will do it someday...

Loop unrolling doesn't have to unroll everything. If the loop overhead is 20% of your code duration, then just 8 copies of the loop body will reduce that overhead to about 3% (20 / (8 x 80 + 20) is roughly 0.03) - a significant gain.
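To make that concrete, here is a rough sketch of partial unrolling in C. The function names and the summing task are only made up for illustration; they are not from anyone's code in this thread. The unrolled version pays for one compare-and-branch per eight elements instead of one per element, and a short tail loop handles whatever is left over.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: one compare-and-branch per element. */
int32_t sum_simple(const int16_t *data, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Unrolled by 8: one compare-and-branch per eight elements. */
int32_t sum_unrolled8(const int16_t *data, size_t n)
{
    int32_t acc = 0;
    size_t i = 0;

    /* Main part: eight copies of the loop body per iteration. */
    for (; i + 8 <= n; i += 8) {
        acc += data[i];
        acc += data[i + 1];
        acc += data[i + 2];
        acc += data[i + 3];
        acc += data[i + 4];
        acc += data[i + 5];
        acc += data[i + 6];
        acc += data[i + 7];
    }

    /* Tail: the 0..7 elements that did not fill a block of eight. */
    for (; i < n; i++)
        acc += data[i];

    return acc;
}
```

Whether this actually wins on a given build depends on the compiler and optimization level (many compilers will unroll on their own), so it is worth measuring before keeping it.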

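Also look at Duff's Device. That's a mind-bending C trick: it interleaves a switch statement with a do-while loop so that the leftover iterations are handled by jumping into the middle of the unrolled loop.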
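For anyone who hasn't seen it, this is roughly what it looks like. This is the commonly shown memory-to-memory variant (the original wrote to a single fixed output register instead of incrementing the destination); the name duff_copy and the int16_t element type are just picked for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `count` elements using Duff's Device, unrolled by 8.
 * The switch jumps into the middle of the do-while body so the
 * first pass handles count % 8 elements; every later pass does 8. */
void duff_copy(int16_t *to, const int16_t *from, size_t count)
{
    if (count == 0)
        return;                      /* the device itself assumes count > 0 */

    size_t n = (count + 7) / 8;      /* number of passes through the loop */

    switch (count % 8) {
    case 0: do { *to++ = *from++;    /* fall through on every case */
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

It is legal C, but modern compilers tend to warn about the implicit fall-through, and a plain memcpy or a normal unrolled loop with a tail is usually the more readable choice.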

Of course, if we had to unroll everything the number of lines would exceed the number of atoms in the universe.

Loop overhead is nothing compared to the cost of an inefficient algorithm, and efficiency is always relative, i.e. measured against better algorithms that solve the same problem.

My version is similar to Duff's device, not in the sense of reducing loop-checking overhead, but in the sense that it uses a trick to get to a solution faster: it uses saturated arithmetic to achieve its goal.
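For readers not familiar with the term, here is a minimal sketch of what saturated (saturating) arithmetic means. This is not the code referred to above, which isn't shown in this thread; sat_add16 is a hypothetical helper written only to illustrate the idea. Instead of wrapping around on overflow, the result is clamped to the largest or smallest representable value.

```c
#include <stdint.h>

/* Saturating 16-bit signed add: clamp instead of wrapping on overflow.
 * Hypothetical helper for illustration only. */
int16_t sat_add16(int16_t a, int16_t b)
{
    /* Do the add in 32 bits, where it cannot overflow. */
    int32_t sum = (int32_t)a + (int32_t)b;

    if (sum > INT16_MAX)
        return INT16_MAX;            /* clamp at +32767 */
    if (sum < INT16_MIN)
        return INT16_MIN;            /* clamp at -32768 */
    return (int16_t)sum;
}
```

For example, sat_add16(30000, 10000) gives 32767 rather than a wrapped negative number. The Cortex-M3 has SSAT and USAT instructions for this kind of clamping, though whether a compiler uses them for code like the above depends on the toolchain and settings.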