Math function: ax^4+bx^3+cx^2+dx+e

Hello everyone,

I’m trying to figure out how to program the following mathematical expression. The input values are a, b, c, d, e, and x.

ax^4 + bx^3 + cx^2 + dx + e

I found the sq(x) function, but I don’t know how it works with the x^4 or x^3 terms.

Can someone explain how?
Thank you.

long quartic( int a, int b, int c, int d, int x, int e)
{
  // ax^4 + bx^3 + cx^2 + dx + e
  // (cast to long first: int is 16-bit on AVR, so the intermediate
  // products overflow quickly otherwise)
  return ((long)a*x*x*x*x)+((long)b*x*x*x)+((long)c*x*x)+((long)d*x)+e;
}

//float version
float quarticf( float a, float b, float c, float d, float x, float e)
{
  return (a*x*x*x*x)+(b*x*x*x)+(c*x*x)+(d*x)+e;
}

I found the sq(x) function, but I don’t know how it works with the x^4 or x^3 terms.

Well, x^3 is “x * sq(x)”, and x^4 is “sq(sq(x))”.
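In code, that composition looks like this (a quick sketch; sq() is Arduino’s square macro, written here as a plain function so the snippet compiles anywhere):

```cpp
#include <cassert>

// Arduino provides sq(x) as a macro; a plain function stands in for it here.
static float sq(float x) { return x * x; }

// ax^4 + bx^3 + cx^2 + dx + e, built from sq():
// x^2 = sq(x), x^3 = x * sq(x), x^4 = sq(sq(x)).
float quartic(float a, float b, float c, float d, float e, float x)
{
    return a * sq(sq(x)) + b * x * sq(x) + c * sq(x) + d * x + e;
}
```

For example, quartic(1, 2, 3, 4, 5, 2.0f) gives 16 + 16 + 12 + 8 + 5 = 57.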

Hey Groove & co.

What a quick response!

Thank you, I'd just discovered it myself by trial and error. :) Anyway, thanks for your help, everyone. :D

If you work with floating point, pow(x,y) from the math library is the function you want to use. It returns x raised to the exponent y.

The reference for the full AVR math library is here:

http://www.nongnu.org/avr-libc/user-manual/group__avr__math.html#g41b41c307b8f96760e9c0c17180b241b

Because the exponent here is a known integer, you don't need pow() at all, and repeated multiplication ends up as a faster algorithm.

If the exponents were unknown as well, then pow(x,y) would be needed, but that's not the case here.

Using a bit of algebra, the quartic can be rearranged as:

(((a*x+b)*x+c)*x+d)*x+e

... which needs only four multiplications and four additions, and therefore ends up as a faster algorithm.
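To see the two forms side by side (a minimal sketch in plain C++; the function names are just for illustration):

```cpp
#include <cassert>

// Naive evaluation: eight multiplications.
float quartic_naive(float a, float b, float c, float d, float e, float x)
{
    return a*x*x*x*x + b*x*x*x + c*x*x + d*x + e;
}

// Horner form: four multiplications, four additions, same result.
float quartic_horner(float a, float b, float c, float d, float e, float x)
{
    return (((a*x + b)*x + c)*x + d)*x + e;
}
```

Both return the same value for the same coefficients; only the operation count differs.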

You may be right, but I wouldn't count on it unless you benchmark. Don't forget that there is no floating-point hardware on the ATmega chip, so any and all floating-point calculations, even basic arithmetic, are compiled into low-level library calls. If you study the assembly code, you may even find that "x*x" compiles into the same kind of call as pow(x,2).

Floating-point libraries have been around for quite some time and they're typically highly optimized for what they do. If you find a function that does what you're after, you're likely to be better off using it than rolling your own.

pow(x,y) actually gets implemented as exp(y*log(x)), which is slow because both exp and log are transcendental functions, and it uses more program memory because it has to pull in the pow, exp, and log library code. The only advantage is that the code is more readable:

r=a*pow(x,4) + b*pow(x,3) + c*pow(x,2) + d*x +e;

I notice the library also has a fused multiply-add function, fma(), which can be combined with the Horner's Rule method I posted above to give:

r=fma(fma(fma(fma(a,x,b),x,c),x,d),x,e);

This is probably the fastest method (but also the least readable!)
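A sketch of that in code, using the standard fma() (std::fma in plain C++; on the Arduino the equivalent comes from avr-libc's math.h):

```cpp
#include <cmath>
#include <cassert>

// Horner's rule with fused multiply-add: each fma(p, x, k) computes p*x + k.
double quartic_fma(double a, double b, double c, double d, double e, double x)
{
    double r = std::fma(a, x, b);   // a*x + b
    r = std::fma(r, x, c);          // (a*x + b)*x + c
    r = std::fma(r, x, d);          // ((a*x + b)*x + c)*x + d
    return std::fma(r, x, e);       // the full quartic
}
```

On hardware with a real FMA instruction this is both faster and slightly more accurate, since each fma rounds only once.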

For small integer powers of variables, it might be worthwhile to write your own function:

float ipow(float var, char exponent)
{
  float result = var;
  for (byte i=2; i<=exponent; i++) {
    result = result * var;
  }
  return result;
}

(note that this doesn’t handle exponent of 0, but it could.)

All the readability of using “pow”, and (almost) all of the efficiency of repeated multiplication vs log/exp operations.
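For example (a compilable copy, with the exponent-0 case added as the note says it could be; byte is Arduino shorthand for unsigned char):

```cpp
#include <cassert>

// Raise var to a small non-negative integer power by repeated multiplication.
float ipow(float var, unsigned char exponent)
{
    if (exponent == 0) return 1.0f;   // x^0 == 1
    float result = var;
    for (unsigned char i = 2; i <= exponent; i++)
        result *= var;
    return result;
}
```

ipow(2.0f, 4) gives 16.0, with four multiplications at most and no exp/log in sight.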

Here’s a benchmark with the various flavors as suggested above (time in microseconds):

Benchmark Sketch
Ver: 1 Time: 7434924 Val: 41143476.00
Ver: 2 Time: 1183496 Val: 41143492.00
Ver: 3 Time: 694020 Val: 41143492.00
Ver: 4 Time: 756992 Val: 41143492.00

Ver 1: a*pow(x,4)+b*pow(x,3)+c*pow(x,2)+d*x+e;
Ver 2: a*x*x*x*x+b*x*x*x+c*x*x+d*x+e;
Ver 3: (((a*x+b)*x+c)*x+d)*x+e;
Ver 4: fma(fma(fma(fma(a,x,b),x,c),x,d),x,e);

Here’s a copy of the benchmark sketch:

void setup()
{
  float r,x;
  float a=1,b=2,c=3,d=4,e=5;
  uint32_t t;
  
  Serial.begin(57600);
  Serial.println("Benchmark Sketch");

  t=micros();
  r=0.0;
  x=1.0;
  for (int i=0;i<10000;i++) {
    //ax4+bx3+cx2+dx+e.
    r+=a*pow(x,4)+b*pow(x,3)+c*pow(x,2)+d*x+e;
    x+=0.001;
  }
  print_r(1,t,r);

  t=micros();
  r=0.0;
  x=1.0;
  for (int i=0;i<10000;i++) {
    //(a*x*x*x*x)+(b*x*x*x)+(c*x*x)+(d*x)+e;
    r+=a*x*x*x*x+b*x*x*x+c*x*x+d*x+e;
    x+=0.001;
  }
  print_r(2,t,r);

  t=micros();
  r=0.0;
  x=1.0;
  for (int i=0;i<10000;i++) {
  //(((a*x+b)*x+c)*x+d)*x+e 
    r+=(((a*x+b)*x+c)*x+d)*x+e;
    x+=0.001;
  }
  print_r(3,t,r);

  t=micros();
  r=0.0;
  x=1.0;
  for (int i=0;i<10000;i++) {
    //r=fma(fma(fma(fma(a,x,b),x,c),x,d),x,e) 
    r+=fma(fma(fma(fma(a,x,b),x,c),x,d),x,e);
    x+=0.001;
  }
  print_r(4,t,r);
}

void print_r(byte ver, uint32_t time, float val)
{
  Serial.print("Ver: ");
  Serial.print(ver,DEC);
  Serial.print(" Time: ");
  Serial.print(micros()-time);
  Serial.print(" Val: ");
  Serial.println(val,2);
}

void loop()
{
}

Edit: I modified it to make sure that the main expression is recalculated every time through the loop. Results suggest that using pow() is indeed the slower version.

Stimmer’s version is closest to the polynomial methods I’ve usually seen in firmware and high-performance software. If you know you have 5 coefficients (x^0 through x^4), then it’s the fastest way to compute the result on most architectures. If you’re not sure how many coefficients you’ll have, a small function can loop through an array of them easily enough, and still get Stimmer’s results.

double polyx(double x, double coeff[], int count)
{
    double y = coeff[0];
    for (int i = 1; i < count; i++)
        y += x * coeff[i];
    return y;
}

Unlike the Arduino, many of the better math processors actually have instructions that do a multiply and an add in one step. Thus, many math problems are converted to coefficient loops like this where possible.

double polyx(double x, double coeff[], int count)

{
   double y = coeff[0];
   for (int i = 1; i < count; i++)
       y += x * coeff[i];
   return y;
}

calculates (a+b+c+d)*x + e !!

Agh, I knew I shouldn't have typed it from memory.

double polyx(double x, double coeff[], int count)
{
    // coeff[0] is the leading coefficient; count is the degree,
    // so coeff[count] is the constant term.
    double term = 1.0;
    double y = 0.0;
    while (count >= 0)   // >= 0, so coeff[0] isn't skipped
    {
            y += coeff[count--] * term;
            term *= x;
    }
    return y;
}
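If you'd rather keep Horner's form with an array, the loop can also run from the leading coefficient down. A sketch (poly_horner is just an illustrative name; here coeff[0] is the leading coefficient and count is the degree):

```cpp
#include <cassert>

// Horner's rule over a coefficient array:
// y = (...((coeff[0]*x + coeff[1])*x + coeff[2])*x + ...) + coeff[count]
double poly_horner(double x, const double coeff[], int count)
{
    double y = coeff[0];
    for (int i = 1; i <= count; i++)
        y = y * x + coeff[i];
    return y;
}
```

With coeff[] = {1, 2, 3, 4, 5} and x = 2, this returns 1*16 + 2*8 + 3*4 + 4*2 + 5 = 57.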

Ver 3: (((a*x+b)*x+c)*x+d)*x+e;

For version 3 (and the code Halley is talking about), don't you have to recompute the coefficients? After all, you end up actually calculating
(a+b+c+d)*x^4 + (a+b+c)*x^3 + etc…
You can come up with new values of a, b, c, d that make the nested equation equivalent, but you have to do some work. (How come the benchmark got the same result? Just because x=1.0 is a bad test?)

Ver 3: (((a*x+b)*x+c)*x+d)*x+e;

For version 3 (and the code Halley is talking about), don't you have to recompute the coefficients? After all, you end up actually calculating (a+b+c+d)*x^4 + (a+b+c)*x^3 + etc...

No, the formula is correct. Expand it step by step: (a*x+b)*x = a*x^2 + b*x, then ((a*x^2 + b*x) + c)*x = a*x^3 + b*x^2 + c*x, and so on. Each multiplication by x raises every term already accumulated by one power, so the coefficients never mix, and you end up with exactly a*x^4 + b*x^3 + c*x^2 + d*x + e with the original a, b, c, d, e.