# PING duration to cm division / 29 optimized

Being busy with optimizing divisions lately
I recalled that the PING sensor has to do an division by 29 to convert duration to centimetre. In fact the divisor should be 29.41176, but float divisions are slow so a integer division is mostly used (which are also slow:).

In the code below I have written an optimized integer versions for doing the conversion of duration to cm and to mm.
The code compares the performance.

``````//
//    FILE: divide.ino
//  AUTHOR: Rob Tillaart
//    DATE: 2013-05-11
//
// PUPROSE: fast divide routines for PING sensors
//

unsigned long start = 0;
unsigned long stop = 0;

volatile unsigned long x;

void setup()
{
Serial.begin(115200);

Serial.print("1/29.4 \t");
start = micros();
for (unsigned long i = 0; i < 10000; i++)
{
x = i / 29.41176;
}
stop = micros();
Serial.println((stop - start)/10000.0, 4);
Serial.println("--------------");

Serial.print("i*0.034\t");
start = micros();
for (unsigned long i = 0; i < 10000; i++)
{
x = i * 0.034;
}
stop = micros();
Serial.println((stop - start)/10000.0, 4);
Serial.println("--------------");

Serial.print("1/29\t");
start = micros();
for (unsigned long i = 0; i < 10000; i++)
{
x = i / 29;
}
stop = micros();
Serial.println((stop - start)/10000.0, 4);
Serial.println("--------------");

Serial.print("div29\t");
start = micros();
for (long i = 0; i < 10000; i++)
{
x = div29(i);
}
stop = micros();
Serial.println((stop - start)/10000.0, 4);
Serial.println("--------------");

Serial.print("PING2cm\t");
start = micros();
for (unsigned long i = 0; i < 10000; i++)
{
x = PING2cm(i);
}
stop = micros();
Serial.println((stop - start)/10000.0, 4);
Serial.println("--------------");

Serial.print("PING2mm\t");
start = micros();
for (unsigned long i = 0; i < 10000; i++)
{
x = PING2mm(i);
}
stop = micros();
Serial.println((stop - start)/10000.0, 4);
Serial.println("--------------");

Serial.println();
for (unsigned long i = 0; i < 100000UL; i+=100)
{
Serial.print(i);
Serial.print('\t');
Serial.print(i/29.41176);
Serial.print('\t');
Serial.print(div29(i));
Serial.print('\t');
Serial.print(PING2cm(i));
Serial.print('\t');
Serial.print(PING2mm(i));
Serial.println();
}

Serial.println();
Serial.println("done");

}

void loop()
{
}

/////////////////////////////////////////////////////////////////////////////

uint32_t div29(uint32_t in)
{
// uint32_t x = (in >> 5) + (in >> 9) + (in >> 10) + (in >> 12)+ (in >> 15) + (in >> 16);
uint32_t x = (in >> 5) + (in >> 12);
x = x + (x >> 4) + (in >> 10) + (in >> 15);
return x;
}

uint32_t PING2cm(uint32_t in)
{
// divide by 29.41176 == * 0.034
// uint32_t x = (in >> 5) + (in>> 9) + (in >> 11) + (in >> 12) + (in >> 14);
uint32_t x = ((in >> 1) + (in>> 5) + (in >> 7) + (in >> 8) + (in >> 10)) >> 4;
return x;
}

uint32_t PING2mm(uint32_t in)
{
// divide by 2.941176 == * 0.34;
// uint32_t x = (in >> 2) + (in>>4) + (in>>6) + (in>>7) + (in>>8) + (in>>13);
uint32_t x = (in >> 2) + (in >> 4);
x = x + (x >> 4);
uint32_t t = ((in >> 1) + (in >> 7)) >> 6;
return x+t;
}
``````

## PING2mm 13.5316

conclusions:

1. float multiplication is about twice as fast as division
2. long int division is as slow as the float division
3. optimized div29 is 6% faster than float multiplication
4. PING2CM multiplies * 0.034 in integer math is 15% faster than float multiplication
5. PING2CM is 158% faster than the default 1/29
• PING2MM multiplies * 0.34 in integer math is 27% faster than float multiplication and 186% faster than 1/29

Besides faster, the PING2CM is also more precise:
1/29 has an error about 1.4% compared to the 1/29.41176, where PING2CM has an error of about 3E-6

in fact as the duration is forth and back the division should be by 58.823…
So the code becomes (red are the changes with above)

``````uint32_t PING2cm(uint32_t in)
{
uint32_t x = ((in >> 1) + (in>> 5) + (in >> 7) + (in >> 8) + (in >> 10)) >> [color=red]5[/color];
return x;
}

uint32_t PING2mm(uint32_t in)
{
// divide by 2.941176 == * 0.34;
// uint32_t x = (in >> 2) + (in>>4) + (in>>6) + (in>>7) + (in>>8) + (in>>13);
uint32_t x = (in >> 2) + (in >> 4);
x = x + (x >> 4);
uint32_t t = ((in >> 1) + (in >> 7)) >> 6;
return (x+t)[color=red]>>1[/color];
}
``````

All above use 32 bit math, if the durations are not too big they would fit in an unsigned int (0..65535) // 65535 equals about 10 meters

signature changes to 16 bit, code is very similar as above.

``````// divides by 58.8...
uint16_t PING2cm2(uint16_t in)
{
uint32_t x = ((in >> 1) + (in>> 5) + (in >> 7) + (in >> 8) + (in >> 10)) >> 5;
return x;
}
``````

performance PING2cm2 16bit: 5.4856 us per call - that is 7x faster than the original 1/29 ; code.

1/29 has an error about 1.4% compared to the 1/29.41176, where PING2CM has an error of about 3E-6

I don't think this is important, distance calculation based on speed of sound depends on temperature, and according to Speed of sound - Wikipedia 10 C degree variation would account for 1.8 % error.
Introducing temperature as a factor in equation makes division optimization to "pre-defined" set of coefficients quite complicated .

Rob, if you add 1 more term of in>>11, you will put the overall multiplier at .0340271. The speed of sound at standard T&P is 34029 cm/S putting the number very close.

I disagree with the Magician as I think (that in this situation anyway) faster and more accurate is better. Even though the speed of sound can vary so much, starting with an accurate base means the program can actually account for it if it chooses.

distance calculation based on speed of sound depends on temperature

and on the speed of the objects (mind an incoming train, for example)

@magician,
you are right, speed of sound depends on temperature and humidity too, See - http://arduino.cc/forum/index.php/topic,65205.msg493120.html#msg493120 -

If numbers are variable, these kind of optimizations are indeed difficult to realize, sometimes even impossible.
Implementation is in that sense often a balance of competing requirements mixed with assumptions (The temperature is constant).
The code above gives high weight to the requirement performance.

For the die hards

Adding temperature to the equation can be done after converting to centimeter’s.

The speed of sound has this formula correction for temperature.

``````float sos =  331.45 * sqrt(1 + t/273);  // t in Celsius  source see link above
uint32_t distance = duration * sos / scalefactor;
``````

merging the formulas gives

``````distance = duration * 331.45 * sqrt(1 + t/273) / scalefactor;
``````

if duration in uSec and distance in cm ==> scalefactor is 10000. (*340/10000 <==> /29)

although I used the “wrong” number 340 (thats ~sos@15C ) I can optimize the [duration * 331.45] part in the same way making the total faster,
all be it not detectable anymore as the SQRT and the float division take far more time. But we can bring the float formula to the integer domain.

lets think:

~~sqrt( 1 + t/273 ) / scalefactor ==> ~~
sqrt(256 * (1+t/273)) / scalefactor2 ; (*)
sqrt(256 + (256 * t) / 273) / scalefactor2 ;
256 + 256* t / 273 ~~ 256 + t * 0.937728938 ~~ 256 + (t>>1) + (t>>2) + (t>>3) + (t>>4); // error < 0.1%
as the temperature is maximum - lets say - 125 Celsius (upper limit of a DS18B20) == 400 Kelvin that means that 256 + t * 0.937728938 always lies between 256 and 631.
~~SQRT(256…631) = 16…25, so a lookup table of 10 values can replace the sqrt. ~~
if that is not precise enough replace 256 above with a bigger number in the line marked (*) and redo
~~65556 => SQRT(65536 …96024) ==> 256…309 so ~50 steps ~~
// The ideal number to scale up the inner of the square root is one that makes scalefactor2 a power of 2 of course.
Although not elaborated this shows that all the math can be moved to the integer domain + a lookup table.
distance = duration * 331.45 * sqrt(1 + t/273) / scalefactor;
ideally I would rewrite it too
distance = duration * sqrt(A + t * A/273.15 ) >> B; // where the number A determines the # steps and B controls the precision

update: the above contains a practical flaw as the temperature will in practice between -55C and 125C [limits DS18B20], also the formula is Celsius iso Kelvin. Still the approach would be similar.

A quick test shows that a formula like

``````distance = duration * 331.45 * sqrt(1 + t/273) / scalefactor;
``````

takes about 145 uSeconds if durationa and t are parameters.

I do not know the alternative

``````distance = duration * sqrt(A + t * A/273.15 ) * B;
``````

should be possible in about 70usec (3xmultiply 15 usec, 1xaddition 5 usec, lookup 20usec)

(no it is not on my todo list;)

and on the speed of the objects (mind an incoming train, for example)

No, according to Doppler Effect (I think, it's part of basic school program, at least it was when I've been there) Doppler effect - Wikipedia
train would change a frequency - not speed of the sound wave.
To OP: fast math on 8-bitters is quite interesting brain-teasing amusement, I even write assembly macros for some of my project, where fast int16 multiplication was really a bottle neck (regular arduino can't multiply int16 by int16 correctly, and using int32 x int32 take about 5 usec; assembly macros = 2 usec, gain 2.5 times). I even keep this yours post in my bookmarks: Arduino Forum
I only trying to say, that PING is not the best project where fast division required, except difficulties with temperature, there is 3 milliseconds time for sound to travel just 1 meter (0.5 m distance), so +- 200 useconds math should not be an issue,

I only trying to say, that PING is not the best project where fast division required, except difficulties with temperature, there is 3 milliseconds time for sound to travel just 1 meter (0.5 m distance), so +- 200 useconds math should not be an issue,

I agree , still the way the conversion is normally implemented with a /29 is "rough math" compared to the PING2CM formula.

In fact the divisor should be 29.41176, but float divisions are slow so a integer division is mostly used (which are also slow:)

I read somewhere that floating division can be faster than long division. That sorta makes sense, since for the sorts of numbers “near 1” that we’re talking about here, they’ll be doing nearly the same operations, except the float operation only does the actual division part on 24bits, while the long operation has to loop for all 32bits.

Likewise, a floating multiplication can be faster than a long divide - try multiplying by (1.0/29.41176)

I don’t know how the times will work out if you also have to convert an int to float first…

Now, using the floating point value of 1.0/29.4 in an integer multiplication and doing the normalization explicitly might be interesting (but that ought to be pretty close to what your optimizations are doing, anyway, right?)

An update about the math to include Temp as my thinking yesterday contained some serious flaws.

So lets do the thinking again, and now with focus and real math

``````cm = time * 331.45 * sqrt( 1 + t/273 ) / 10000;
``````

Lets say the temperature is between -55 and 125 Celsius (range of the DS18B20)
that means that sqrt() is between 0.8936 and 1.2074.
as square root is quite linear on such small range we can map T [-55..125] to [915..1237] (*1024)

``````y = 915 + (T+55) * 322/180;  = 915 + (T+55) * 1.7888888;  // do not forget the  / 1024  or >> 10
``````

that makes the formula

``````distance = duration * 331.45 * (915 + (T+55) * 1.7888888) /1024 /10000;
``````

bringing the factors together

``````distance = duration * (915 + (T+55) * 1.7888888) * 331.45 /1024 /10000;
distance = duration * (915 + (T+55) * 1.7888888) * 0.53032 / 16384;
``````

this leads to the following code

``````uint32_t PING2cm3(uint32_t duration, int t)
{
uint32_t x = t+55;
// *1.788888888
x = x + (x>>1) + (x>>2) + (x>>5);
x = x + 915;
x = x * duration;
// * 0.53032
x = (x >> 1) + (x >> 5) + (x >> 8) + (x >> 9);
x = x >> 14;
return x;
}
``````

performance - 25.4152 per call
sqrt() formula - 122.2460 per call (~factor 5)
1/29 formula - 38.7380 per call // has no temp correction

varying duration from 300-100,000 at 20C ==> error 0..1% // at shorter range the error increases

varying temperature 0..125C at constant duration of 10,000 ==> error 0..1% (the variation in SOS == 0..14% )
varying temperature 0..125C at constant duration of 100,000 ==> error 0..1%

Conclusion:
The above formula added temperature correction for PING))) with no performance costs, while the error stays within 1% range.

Note1: error can improved by adding extra factors in the approximation formulas.
Note2: ping itself takes about 300 (5cm) - 10000 (150cm) uSeconds , so for the smaller distances the profit wrt the sqrt formula is substantial.

Hi robtillaart,

thanks for your math skills here. I'll use the final version for my project.
Just one note:

I changed:

``````  x = x >> 14;
return x;
``````

to:

``````  x = x >> 15;
return x;
``````

so I get distance in cm as respone (ping roundtrip time / 2).

Regards,
Robert

thanks for your math skills here. I’ll use the final version for my project.

I have had a new insight yesterday, SQRT(1 + x) == 1 + x/2 for |x| < 0.2
==> (1+x/2)^2 == 1 + x + (x^2)/4 // error (x^2)/4 ==> x=0.2 error = 1% [effective range ± 50C]

``````cm = time * 331.45 * sqrt( 1 + t/273 ) / 10000;
becomes
cm = time * 331.45 * (1 + t/546 )  * 0.0001; // should be substantial faster than the one above, no sqrt and no division
``````

… (ping roundtrip time / 2).

That’s a sign you understood the math

That's a sign you understood the math

Wouldn't say this. :~
But at least I learned bitshifting and masking when you helped me out with the hacrocam.

My project is a robot for SLAM. The Ultrasonic sensors I use for correcting the estimated position (based on odometry) by measuring the distance to known landmarks.
I need constant pinging with good accuracy. So your code should releave the mp a bit.

Robert