Faster divide by using multiply shift instead (celsius to fahrenheit example)

Quite a while back I wrote an article on codeproject - Optimizing integer divisions with Multiply Shift in C# - CodeProject - based on earlier work of Jeffrey Sax about replacing a relative expensive integer divide operation (with known divider!) with a faster long multiply and shift operation for a known range of values. In Math notation replace / d ==> * M >> S

Tonight I had a try to see if and how well it works on the Arduino. And yes it did !.

Note 1: As the trick works only for a limited range of values it need to be tested if the values of the used range are calculated right.
Note 2: the optimized code for speed is larger in size as negative values are handled separately. However sometimes the speed improvement is worth the extra size. And sometimes you don't have negative values.

The question remaining is how to find the values for the multiply and shift (M,S). That is solving the equation 1/d == M >> S + e (e = error to be minimized)
Leaving the error out, the formula can be rewritten to 1 << S = M * d, so we need to find an M which multiplied by the known dividor is (approx) a power of two. Turning this around we need to test 32 powers of two to see if an adequate M can be found. That will take some time but can be done fast enough. Maybe I'll write a sketch for that too some day :).

enough theory, time for some practice:

In the code below I use the familiar celsius to fahrenheit conversion as example. This conversion is timed three times for 1273 values from -273 till 1000 in three different ways. The timings are made on a duemillanove and IDE21:

  • multiply shift optimization (14 ms)
  • straightforward as function (20 ms)
  • straightforward inline (20 ms)

So the multiply shift optimized version is 30% faster than the 'normal' division version, despite longer code and the use of longs. However its range is limited (just test 5000C)

A fourth loop is added to show the values of the optimized conversion are identical to the normal conversion for the range -273 - 3000 Celsius, which will be sufficient for most sketches I guess.

Disclaimer: use this technique at own risk, use it with care and only when understood and the performance is really needed ...

Finally some code...

void setup()
{
  Serial.begin(115200);
}

int y;
void loop()
{
  unsigned long t1 = millis();
  for (int i=-273; i< 1000; i++)
  {
    y = CelciusToFahrenheit(i);
  }
  Serial.println(millis() - t1);


  t1 = millis();
  for (int i=-273; i< 1000; i++)
  {
    y = c2f(i);
  }
  Serial.println(millis() - t1);
 
  
  for (int i=-273; i< 1000; i++)
  {
    y =  i * 9 / 5 + 32;
  }
  Serial.println(millis() - t1);  
  
  for (int i=-273; i< 3000; i++)
  {
    if (c2f(i) != CelciusToFahrenheit(i))
      Serial.println(i);
  }
  //Serial.println(millis() - t1);
  Serial.println();
  delay(1000);
}

int c2f(int celcius)
{
  return celcius * 9 /5 + 32;
}

long CelciusToFahrenheit(int celcius)
{
  long x = celcius * 9;        // use register for x
  if (x >=0 )
    return (x * 52429 >> 18) + 32;    //   1 << 18 ~~ 52429 * 5 <> 262144 ~~ 262145
  else
    return -((-x) * 52429 >> 18) + 32;
}

As allways remarks are welcome.

Interesting.

How does it compare with the "straightforward" function compiled with higher optimization levels enabled?

-j