Pages: [1]   Go Down
Author Topic: PING duration to cm division / 29 optimized  (Read 992 times)
0 Members and 1 Guest are viewing this topic.
Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12487
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Being busy with optimizing divisions lately smiley-wink
I recalled that the PING sensor has to do an division by 29 to convert duration to centimetre. In fact the divisor should be 29.41176, but float divisions are slow so a integer division is mostly used (which are also slow:).

In the code below I have written an optimized integer versions for doing the conversion of duration to cm and to mm.
The code compares the performance.

Code:
//
//    FILE: divide.ino
//  AUTHOR: Rob Tillaart
//    DATE: 2013-05-11
//
// PUPROSE: fast divide routines for PING sensors
//

unsigned long start = 0;
unsigned long stop = 0;

volatile unsigned long x;

void setup()
{
  Serial.begin(115200);

  Serial.print("1/29.4 \t");
  start = micros();
  for (unsigned long i = 0; i < 10000; i++)
  {
    x = i / 29.41176;
  }
  stop = micros();
  Serial.println((stop - start)/10000.0, 4);
  Serial.println("--------------");


  Serial.print("i*0.034\t");
  start = micros();
  for (unsigned long i = 0; i < 10000; i++)
  {
    x = i * 0.034;
  }
  stop = micros();
  Serial.println((stop - start)/10000.0, 4);
  Serial.println("--------------");


  Serial.print("1/29\t");
  start = micros();
  for (unsigned long i = 0; i < 10000; i++)
  {
    x = i / 29;
  }
  stop = micros();
  Serial.println((stop - start)/10000.0, 4);
  Serial.println("--------------");


  Serial.print("div29\t");
  start = micros();
  for (long i = 0; i < 10000; i++)
  {
    x = div29(i);
  }
  stop = micros();
  Serial.println((stop - start)/10000.0, 4);
  Serial.println("--------------");


  Serial.print("PING2cm\t");
  start = micros();
  for (unsigned long i = 0; i < 10000; i++)
  {
    x = PING2cm(i);
  }
  stop = micros();
  Serial.println((stop - start)/10000.0, 4);
  Serial.println("--------------");


  Serial.print("PING2mm\t");
  start = micros();
  for (unsigned long i = 0; i < 10000; i++)
  {
    x = PING2mm(i);
  }
  stop = micros();
  Serial.println((stop - start)/10000.0, 4);
  Serial.println("--------------");

  Serial.println();
  for (unsigned long i = 0; i < 100000UL; i+=100)
  {
    Serial.print(i);
    Serial.print('\t');
    Serial.print(i/29.41176);
    Serial.print('\t');
    Serial.print(div29(i));
    Serial.print('\t');
    Serial.print(PING2cm(i));
    Serial.print('\t');
    Serial.print(PING2mm(i));
    Serial.println();
  }

  Serial.println();
  Serial.println("done");

}

void loop()
{
}

/////////////////////////////////////////////////////////////////////////////

uint32_t div29(uint32_t in)
{
  // uint32_t x = (in >> 5) + (in >> 9) + (in >> 10) + (in >> 12)+ (in >> 15) + (in >> 16);
  uint32_t x = (in >> 5) + (in >> 12);
  x = x + (x >> 4) + (in >> 10) + (in >> 15);
  return x;
}

uint32_t PING2cm(uint32_t in)
{
  // divide by 29.41176 == * 0.034 
  // uint32_t x = (in >> 5) + (in>> 9) + (in >> 11) + (in >> 12) + (in >> 14);
  uint32_t x = ((in >> 1) + (in>> 5) + (in >> 7) + (in >> 8) + (in >> 10)) >> 4;
  return x;
}

uint32_t PING2mm(uint32_t in)
{
  // divide by 2.941176 == * 0.34;
  // uint32_t x = (in >> 2) + (in>>4) + (in>>6) + (in>>7) + (in>>8) + (in>>13);
  uint32_t x = (in >> 2) + (in >> 4);
  x = x + (x >> 4);
  uint32_t t = ((in >> 1) + (in >> 7)) >> 6;
  return x+t;
}

output:
1/29.4    38.2200
--------------
i*0.034   17.2888
--------------
1/29   38.7452
--------------
div29   16.2968
--------------
PING2cm   14.9772
--------------
PING2mm   13.5316
--------------

conclusions:
1) float multiplication is about twice as fast as division
2) long int division is as slow as the float division
3) optimized div29 is 6% faster than float multiplication
4) PING2CM multiplies * 0.034 in integer math is 15% faster than float multiplication
5) PING2CM is 158% faster than the default 1/29

- PING2MM multiplies * 0.34 in integer math is 27% faster than float multiplication and 186% faster than 1/29

Besides faster, the PING2CM is also more precise:
1/29 has an error about 1.4% compared to the 1/29.41176, where PING2CM has an error of about 3E-6


in fact as the duration is forth and back the division should be by 58.823...
So the code becomes (red are the changes with above)
Code:
uint32_t PING2cm(uint32_t in)
{
  uint32_t x = ((in >> 1) + (in>> 5) + (in >> 7) + (in >> 8) + (in >> 10)) >> [color=red]5[/color]; 
  return x;
}

uint32_t PING2mm(uint32_t in)
{
  // divide by 2.941176 == * 0.34;
  // uint32_t x = (in >> 2) + (in>>4) + (in>>6) + (in>>7) + (in>>8) + (in>>13);
  uint32_t x = (in >> 2) + (in >> 4);
  x = x + (x >> 4);
  uint32_t t = ((in >> 1) + (in >> 7)) >> 6;
  return (x+t)[color=red]>>1[/color];
}
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12487
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

All above use 32 bit math, if the durations are not too big they would fit in an unsigned int (0..65535)  // 65535 equals about 10 meters

signature changes to 16 bit, code is very similar as above.
Code:
// divides by 58.8...
uint16_t PING2cm2(uint16_t in)
{
  uint32_t x = ((in >> 1) + (in>> 5) + (in >> 7) + (in >> 8) + (in >> 10)) >> 5;
  return x;
}
performance PING2cm2  16bit: 5.4856  us per call - that is 7x faster than the original  1/29 ; code.
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Montreal
Offline Offline
Edison Member
*
Karma: 23
Posts: 2487
Per aspera ad astra.
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
1/29 has an error about 1.4% compared to the 1/29.41176, where PING2CM has an error of about 3E-6
I don't think this is important, distance calculation based on speed of sound  depends on temperature, and according to http://en.wikipedia.org/wiki/Sound_speed#Practical_formula_for_dry_air  10 C degree variation would account for 1.8 % error.
 Introducing temperature as a factor in equation makes division optimization to "pre-defined" set of coefficients quite complicated .
Logged

texas
Offline Offline
God Member
*****
Karma: 27
Posts: 862
old, but not dead
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Rob, if you add 1 more term of in>>11, you will put the overall multiplier at .0340271.  The speed of sound at standard T&P is 34029 cm/S putting the number very close.

I disagree with the Magician as I think (that in this situation anyway) faster and more accurate is better.  Even though the speed of sound can vary so much, starting with an accurate base means the program can actually account for it if it chooses.
Logged

Experience, it's what you get when you were expecting something else.

Rapa Nui
Offline Offline
Edison Member
*
Karma: 53
Posts: 1991
Pukao hats cleaning services
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
distance calculation based on speed of sound  depends on temperature
and on the speed of the objects (mind an incoming train, for example) smiley
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12487
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

@magician,
you are right, speed of sound depends on temperature and humidity too, See - http://arduino.cc/forum/index.php/topic,65205.msg493120.html#msg493120 -

If numbers are variable, these kind of optimizations are indeed difficult to realize, sometimes even impossible.
Implementation is in that sense often a balance of competing requirements mixed with assumptions (The temperature is constant).
The code above gives high weight to the requirement performance.

For the die hards smiley-wink

Adding temperature to the equation can be done after converting to centimeter's.

The speed of sound has this formula correction for temperature.  
Code:
float sos =  331.45 * sqrt(1 + t/273);  // t in Celsius  source see link above
uint32_t distance = duration * sos / scalefactor;

merging the formulas gives

Code:
distance = duration * 331.45 * sqrt(1 + t/273) / scalefactor;

if duration in uSec and distance in cm ==> scalefactor is 10000.   (*340/10000 <==> /29)

although I used the "wrong" number 340  (thats ~sos@15C )  I can optimize the [duration * 331.45] part in the same way making the total faster,
all be it not detectable anymore as the SQRT and the float division take far more time. But we can bring the float formula to the integer domain.

lets think:


sqrt( 1 + t/273 ) / scalefactor ==>  
sqrt(256 * (1+t/273)) / scalefactor2 ;  (*)
sqrt(256 + (256 * t) / 273) / scalefactor2 ;

256 + 256* t / 273 ~~  256 + t * 0.937728938 ~~  256 + (t>>1) + (t>>2) + (t>>3) + (t>>4);  // error < 0.1%

as the temperature is maximum - lets say - 125 Celsius (upper limit of a DS18B20) == 400 Kelvin  that means that  256 + t * 0.937728938 always lies between 256 and 631.

SQRT(256..631) = 16..25, so a lookup table of 10 values can replace the sqrt.  
if that is not precise enough replace 256 above with a bigger number in the line marked (*) and redo
65556 => SQRT(65536 ..96024) ==> 256..309  so ~50 steps  

// The ideal number to scale up the inner of the square root is one that makes scalefactor2 a power of 2 of course.


Although not elaborated this shows that all the math can be moved to the integer domain + a lookup table.

distance = duration * 331.45 * sqrt(1 + t/273) / scalefactor;
ideally I would rewrite it too
distance = duration * sqrt(A + t * A/273.15 ) >> B;   // where the number A determines the # steps and B controls the precision


update: the above contains a practical flaw as the temperature will in practice between -55C and 125C [limits DS18B20], also the formula is Celsius iso Kelvin. Still the approach would be similar.

A quick test shows that a formula like
Code:
distance = duration * 331.45 * sqrt(1 + t/273) / scalefactor;
takes about 145 uSeconds if durationa and t are parameters.


I do not know the alternative
Code:
distance = duration * sqrt(A + t * A/273.15 ) * B;
should be possible in about 70usec  (3xmultiply 15 usec, 1xaddition 5 usec, lookup 20usec)

(no it is not on my todo list;)

update - added code tags
« Last Edit: May 12, 2013, 03:16:50 am by robtillaart » Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Montreal
Offline Offline
Edison Member
*
Karma: 23
Posts: 2487
Per aspera ad astra.
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
and on the speed of the objects (mind an incoming train, for example)
No, according to Doppler Effect (I think, it's part of basic school program, at least it was when I've been there) http://en.wikipedia.org/wiki/Dopler_effect
train would change a frequency - not speed of the sound wave.
To OP: fast math on 8-bitters is quite interesting brain-teasing amusement, I even write assembly macros for some of my project, where fast int16 multiplication was really a bottle neck (regular arduino can't multiply int16 by int16 correctly, and using int32 x int32 take about 5 usec;  assembly macros = 2 usec, gain 2.5 times). I even  keep this yours post in my bookmarks:   http://arduino.cc/forum/index.php/topic,60924.0.html
I only trying to say, that PING is not the best project where fast division required, except difficulties with temperature, there is 3 milliseconds time for sound to travel just 1 meter (0.5 m distance), so +- 200 useconds math should not be an issue,
« Last Edit: May 11, 2013, 05:37:15 pm by Magician » Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12487
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
I only trying to say, that PING is not the best project where fast division required, except difficulties with temperature, there is 3 milliseconds time for sound to travel just 1 meter (0.5 m distance), so +- 200 useconds math should not be an issue,
I agree , still the way the conversion is normally implemented with a  /29  is "rough math" compared to the PING2CM formula.
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

SF Bay Area (USA)
Offline Offline
Tesla Member
***
Karma: 106
Posts: 6382
Strongly opinionated, but not official!
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
In fact the divisor should be 29.41176, but float divisions are slow so a integer division is mostly used (which are also slow:)

I read somewhere that floating division can be faster than long division.  That sorta makes sense, since for the sorts of numbers  "near 1" that we're talking about here, they'll be doing nearly the same operations, except the float operation only does the actual division part on 24bits, while the long operation has to loop for all 32bits.

Likewise, a floating multiplication can be faster than a long divide - try multiplying by (1.0/29.41176)

I don't know how the times will work out if you also have to convert an int to float first...

Now, using the floating point value of 1.0/29.4 in an integer multiplication and doing the normalization explicitly might be interesting (but that ought to be pretty close to what your optimizations are doing, anyway, right?)
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12487
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset


An update about the math to include Temp as my thinking yesterday contained some serious flaws.

So lets do the thinking again, and now with focus and real math smiley-wink

Code:
cm = time * 331.45 * sqrt( 1 + t/273 ) / 10000;

Lets say the temperature is between -55 and 125 Celsius (range of the DS18B20)
that means that sqrt() is between 0.8936 and 1.2074.
as square root is quite linear on such small range we can map T [-55..125] to [915..1237] (*1024)

Code:
y = 915 + (T+55) * 322/180;  = 915 + (T+55) * 1.7888888;  // do not forget the  / 1024  or >> 10

that makes the formula

Code:
distance = duration * 331.45 * (915 + (T+55) * 1.7888888) /1024 /10000;

bringing the factors together

Code:
distance = duration * (915 + (T+55) * 1.7888888) * 331.45 /1024 /10000;
distance = duration * (915 + (T+55) * 1.7888888) * 0.53032 / 16384;

this leads to the following code   
Code:
uint32_t PING2cm3(uint32_t duration, int t)
{
  uint32_t x = t+55;
  // *1.788888888
  x = x + (x>>1) + (x>>2) + (x>>5);
  x = x + 915;
  x = x * duration;
  // * 0.53032
  x = (x >> 1) + (x >> 5) + (x >> 8) + (x >> 9);
  x = x >> 14;
  return x;
}
performance    - 25.4152 per call 
sqrt() formula - 122.2460 per call   (~factor 5)
1/29 formula   - 38.7380 per call   // has no temp correction

varying duration from 300-100,000 at 20C ==> error 0..1%      // at shorter range the error increases

varying temperature 0..125C at constant duration of 10,000 ==> error 0..1%  (the variation in SOS == 0..14% )
varying temperature 0..125C at constant duration of 100,000 ==> error 0..1%

Conclusion:
The above formula added temperature correction for PING))) with no performance costs, while the error stays within 1% range.

Note1: error can improved by adding extra factors in the approximation formulas.
Note2: ping itself takes about 300 (5cm) - 10000 (150cm) uSeconds , so for the smaller distances the profit wrt the sqrt formula is substantial.

Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Offline Offline
Full Member
***
Karma: 1
Posts: 152
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Hi robtillaart,

thanks for your math skills here. I'll use the final version for my project.
Just one note:

I changed:
Code:
  x = x >> 14;
  return x;

to:
Code:
  x = x >> 15;
  return x;

so I get distance in cm as respone (ping roundtrip time / 2).

Regards,
Robert
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 170
Posts: 12487
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset


Quote
thanks for your math skills here. I'll use the final version for my project.
Welcome, tell us more about your project!

I have had a new insight yesterday, SQRT(1 + x) == 1 + x/2  for   |x| < 0.2
==>  (1+x/2)^2 == 1 + x + (x^2)/4   // error  (x^2)/4 ==> x=0.2  error = 1%  [effective range +- 50C]

Code:
cm = time * 331.45 * sqrt( 1 + t/273 ) / 10000; 
becomes
cm = time * 331.45 * (1 + t/546 )  * 0.0001; // should be substantial faster than the one above, no sqrt and no division

 
Quote
.... (ping roundtrip time / 2).
That's a sign you understood the math smiley-wink
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Offline Offline
Full Member
***
Karma: 1
Posts: 152
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
That's a sign you understood the math 
Wouldn't say this.  smiley-confuse
But at least I learned bitshifting and masking when you helped me out with the hacrocam.  smiley-grin

My project is a robot for SLAM. The Ultrasonic sensors I use for correcting the estimated position (based on odometry) by measuring the distance to known landmarks.
I need constant pinging with good accuracy. So your code should releave the mp a bit. :-)

Robert
Logged

Pages: [1]   Go Up
Jump to: