Go Down

### Topic: #define versus const and other software performance issues (Read 7667 times)previous topic - next topic

#15
##### Oct 14, 2011, 11:23 pm

When Atmel says the AVR processor is "RISC", they aren't kidding!

#### Nick Gammon

#16
##### Oct 14, 2011, 11:59 pm
Well I suppose they don't need it, and subtracting is arguably more useful than adding.
Please post technical questions on the forum, not by personal message. Thanks!

http://www.gammon.com.au/electronics

#### PaulS

#17
##### Oct 15, 2011, 02:57 am
Quote
If you're working with integers, you can also do bit shifting to divide by 10. If you do something weird like this, make sure you comment what you did.

Please explain how. Bit shifting one place divides by 2. Then, you still need to divide by 5. I can't figure how how to shift any direction to divide by 5.

#### Nick Gammon

#18
##### Oct 15, 2011, 04:47 am
A bit of Googling and I found an answer for at least an approximation. And hey, isn't most stuff an approximation? This test sketch:

Code: [Select]
`const boolean SHOW_RESULTS = false;const unsigned long ITERATIONS = 50000;void setup (){  Serial.begin (115200);  Serial.println ();  unsigned long start;   Serial.println ("By shifting:");    start = millis ();  for (unsigned long n = 0; n < ITERATIONS; n++)  {    volatile unsigned long n2, n4, n32, n64, n102, ans;    n2 = n     << 1;    // n * 2    n4 = n2    << 1;    // n * 4    n32 = n4   << 3;    // n * 32    n64 = n32  << 1;    // n * 64    n102 = n64 + n32 + n4 + n2;  // n * 102    ans = n102 >> 10;  // n * 102 / 1024    if (SHOW_RESULTS)    {      Serial.print ("n = ");      Serial.print (n, DEC);      Serial.print (" n / 10 = ");      Serial.println (ans, DEC);    }  } // end for loop  Serial.print ("time taken = ");  Serial.println (millis () - start, DEC);  Serial.println ("By multiplying and then shifting:");    start = millis ();  for (unsigned long n = 0; n < ITERATIONS; n++)  {    volatile unsigned long ans;    ans = (n * 102) >> 10;  // n * 102 / 1024    if (SHOW_RESULTS)    {      Serial.print ("n = ");      Serial.print (n, DEC);      Serial.print (" n / 10 = ");      Serial.println (ans, DEC);    }  } // end for loop  Serial.print ("time taken = ");  Serial.println (millis () - start, DEC);    Serial.println ("By dividing:");    start = millis ();  for (unsigned long n = 0; n < ITERATIONS; n++)  {    volatile unsigned long ans;    ans = n / 10;    if (SHOW_RESULTS)    {      Serial.print ("n = ");      Serial.print (n, DEC);      Serial.print (" n / 10 = ");      Serial.println (ans, DEC);    }  } // end for loop  Serial.print ("time taken = ");  Serial.println (millis () - start, DEC);} // end setupvoid loop() {}`

As written, I get this:

Code: [Select]
`By shifting:time taken = 771By multiplying and then shifting:time taken = 294By dividing:time taken = 1956`

So, the basic method is to multiply by 102 and then shift right 10 bits (ie. divide by 1024). This is an approximation in the sense that we want to multiply by 0.1 but are actually multiplying by 102/1024 (0.09961).

I tried three ways, the first was to multiply by 102 by doing assorted shifts, the second was to let the compiler do the multiplication, and the third to let the compiler do the division.

The figures show that the division was the slowest (1956 mS), my shifting idea second slowest (771 mS) and the multiply-then-shift fastest (294 mS). In fact the multiply-then-shift is over 6 times as fast, if you can stand the loss of precision.

Obviously this is only really useful if you know the numbers in advance. It's also interesting that my attempt to second-guess the compiler and do a "more efficient" multiply by 102 was actually quite a lot slower than what the compiler generated, left to its own devices.
Please post technical questions on the forum, not by personal message. Thanks!

http://www.gammon.com.au/electronics

#### stimmer

#19
##### Oct 15, 2011, 08:17 pm
Your method would be faster if you hadn't made the intermediates n2, n4, n32, n64 and n102 unnecessarily volatile. You could make it slightly faster still with the chain 2,3,6,96,102.

The reason the multiply method is fastest is because the ATmega has a 2 cycle 8-bit multiply instruction built in. You really want the shift to be 8 or 16 bits as then the shifting can be done for free (the compiler just moves the bytes in the long down by 1 or 2 places). Try (n*6554)>>16
Due VGA library - http://arduino.cc/forum/index.php/topic,150517.0.html

#### Nick Gammon

#20
##### Oct 15, 2011, 11:00 pm
Quote
Your method would be faster if you hadn't made the intermediates n2, n4, n32, n64 and n102 unnecessarily volatile

You are right. I had volatile there to convince the compiler to do the work at all, but making only ans volatile gave me this:

Code: [Select]
`By shifting:time taken = 497By multiplying and then shifting:time taken = 294By dividing:time taken = 1956`

Quote
Try (n*6554)>>16

Good call. I got:

Code: [Select]
`By multiplying and then shifting:time taken = 75`

Closer too. Rather than multiplying by 0.1 we are multiplying by 0.1000061.
Please post technical questions on the forum, not by personal message. Thanks!

http://www.gammon.com.au/electronics

#21
##### Oct 15, 2011, 11:04 pm

Which is about the same number of significant digits expected from float.  Orders of magnitude faster and just as accurate.  What's not to like!

#### robtillaart

#22
##### Oct 15, 2011, 11:06 pm
wrote a shift multiply finder few years back in C# - http://www.codeproject.com/KB/cs/FindMulShift.aspx -
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Go Up

Please enter a valid email to subscribe