I have a fair amount of floating point math to do. I don't need large values or great precision. Is there
a way to do a fixed point approximation on the DUE that won't drive me grayer than I am now?
the second example is easier to do than the first.. fixed point implementation of power functions depend largely on the range. you're probably using fixed point for speed, and it would make sense to turn your power function for 1-t into a nice look-up table..
will yield gamma with 8 bit after the comma. inputs are integer.. if you want fixed point inputs than you will have to scale after each multiplication!
in any case, a 32 bit int should be fine for most of this and you should get a huge speed-up..
The pow function is relative expensive, you can first expand the function and the compress it
(pow((1 - t), 3) * P0) + (3 * pow((1 - t), 2) * t * P1) + (3 * (1 - t) * pow(t, 2) * P2) + (pow(t, 3) * P3);
substitution float s = 1-t;
==>
s * s * s * P0 + 3 * s * s * t * P1 + 3 * s * t * t * P2 + t * t * t * P3;
==>
s * s * ( s * P0 + 3 * t * P1) + t * t * ( 3 * s * P2 + t * P3) ;
from: 4 POW + 8 MUL + 5 ADD ==> 10 MUL + 4 ADD (should be a bit faster)
the second one can be optimized by pre-calculate what you can, and replace the division by multiply
void setBarycentric(float x, float y)
{
float t1 = y2 - y3;
float t2 = x3 - x2;
float t3 = x1 - x3;
float t4 = y1 - y3;
float t5 = 1.0 / (t1 * t3 + t2 * t4);
float xx = x - x3;
float yy = y - y3;
float alpha = (t1 * xx + t2 * yy) * t5; // mul is faster than div
float beta = (-t4 * xx + t3 * yy) * t5;
float gamma = 1.0 - alpha - beta;
from: 8 MUL + 22 ADD + 2 DIV ==> 8 MUL + 11 ADD + 1 DIV
so mainly there is 1 division and 11 additions less.
This same tricks can be used to optimize the integer version of the code of earx,
especially the removal of 1 division.
substitution float s = 1-t;
==>
s * s * s * P0 + 3 * s * s * t * P1 + 3 * s * t * t * P2 + t * t * t * P3;
==>
s * s * ( s * P0 + 3 * t * P1) + t * t * ( 3 * s * P2 + t * P3) ;
from: 4 POW + 8 MUL + 5 ADD ==> 10 MUL + 4 ADD (should be a bit faster)
More than a bit faster. With your suggestions, 24 iterations now takes 4 msec instead of 13. So 3 X or so faster. (That is a cubic Bezier curve, BTW.)
I had looked at a floating point benchmark for the DUE that was some 50 floating point operations / msec. This is, I believe, 336. I'm impressed and its plenty fast enough for the job at hand.