Pages: 1 ... 3 4 [5] 6 7 8   Go Down
Author Topic: divmod10() : a fast replacement for /10 and %10 (unsigned)  (Read 20978 times)
0 Members and 1 Guest are viewing this topic.
0
Offline Offline
God Member
*****
Karma: 26
Posts: 625
Always making something...
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

'all' we need to do is document up and make available in a form every one can use easily.
thats that hard part,

How about the Print.cpp and Print.h Tom posted on reply #51 or I posted on #49 and #54 ?

All you have to do to use these is drop them into your Arduino's hardware/arduino/cores/arduino folder (or hardware/teensy/cores/teensy), replacing the Print.cpp and Print.h that are already there.

On a Mac, you'll need to control-click on Arduino and choose "show package contents", then navigate or search until you find the folder with Print.cpp and Print.h.  I believe it's Contents/Resources/Java/hardware/arduino/cores/arduino, but I don't have a Mac handy at the moment to check.

Admittedly, the copy I posted on #54 had test code enabled.  Here's a "final" version with all the test stuff commented, as ready-to-use as it gets.  These are tested with Teensy but probably will work with official Arduino boards.

* Print.cpp (8.64 KB - downloaded 17 times.)
* Print.h (6.77 KB - downloaded 9 times.)
Logged

Melbourne, Australia
Offline Offline
God Member
*****
Karma: 8
Posts: 567
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

When Arduino release a new IDE, etc, those libraries you have upgraded are potentially going to get blown away, and require work to reinstall / overwrite.

One potential solution would be if you could set up pathing so modified libraries were linked if they existed in rather than default libraries.
Logged

Windows serial port monitor: Tellurium | Arduino serial port debugging library: DBG | Cusom LCD char generator | Technical questions will only be answered in forum threads

0
Offline Offline
God Member
*****
Karma: 26
Posts: 625
Always making something...
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

I restarted my exhaustive test, using the asm macro version.  In a few days I should be able to confirm if it matches the C library version for all 2^32 possible inputs....
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13917
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Code:
Unless there's more algorithm magic, I believe it's safe to say this 83 cycle version is the fastest possible implementation.
That's why I asked several posts back if one of the earlier C version - shorter in terms of C - could be faster is asm as it might be less statements.

Code:
In a few days I should be able to confirm if it matches the C library version for all 2^32 possible inputs....
how do you do this test? Generate strings and compare them?
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

0
Offline Offline
God Member
*****
Karma: 26
Posts: 625
Always making something...
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

how do you do this test? Generate strings and compare them?

Compute the quotient and modulus using both and check if either is different.  Basically, like this:

Code:
                q1 = n / 10;
                m1 = n - q1 * 10;
                q2 = n;
                divmod10_asm(q2, tmp, m2);
                if (q1 != q2 || m1 != m2) {
                        errorcount++;
                }

Here is the complete code.   It should run on any AVR-based Arduino compatible board.

* divmod10_asm_test.zip (1.68 KB - downloaded 10 times.)
Logged

0
Offline Offline
God Member
*****
Karma: 26
Posts: 625
Always making something...
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Opps, forgot this.  To run it on non-Teensy boards, you'll also have to add this .h file.  Or you could comment out the stuff that prints dots and blinks the LED, but that would not be any fun.

* elapsedMillis.h (4.2 KB - downloaded 52 times.)
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13917
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset


would it not be faster to
Code:
for(unsigned long(n = 0; n < 0xFFFFFFFF; n++)
{
  divmod10_asm(n, d, m);
  if (n != (10*d + m)) Serial.println(i);
}
as division is quite expensive as we know smiley-wink
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

0
Offline Offline
God Member
*****
Karma: 26
Posts: 625
Always making something...
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

That would be much faster, aside from a couple minor code problems.  But does only comparing the output to itself really form a valid test?

The slow way will complete in a few days, which really isn't very long in the grand scheme of things.
Logged

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13917
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
But does only comparing the output to itself really form a valid test?
no there is a small chance in theory that there exist a (d, m) pair that would also solve the n = 10*d + m equation   (e.g. n = 100, d = 0,  m = 100)
but given the fact that the code is a white box and we know how the code works this chance is minimal.

To prevent this theoretical issue one could add a test for m to be smaller than 10.

Code:
for(unsigned long(n = 0; n < 0xFFFFFFFF; n++)
{
  divmod10_asm(n, d, m);
  if (n != (10*d + m) || m >= 10 ) Serial.println(n);
}
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Leeds, UK
Offline Offline
Edison Member
*
Karma: 80
Posts: 1729
Once the magic blue smoke is released, it won't go back in!
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Even still, using a formula to prove itself is circular reasoning, which is not a proof at all. (http://en.wikipedia.org/wiki/Circular_reasoning)
Logged

~Tom~

0
Offline Offline
God Member
*****
Karma: 26
Posts: 625
Always making something...
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

The verification only needs to be run once, so it's probably not worth optimizing.  It's already been running over 12 hours and will be done in a couple days.

I believe a much better optimization effort would involve looking into why Arduino's Print takes 15 to actually do something with each digit that's generated in only 5 us.
Logged

Offline Offline
God Member
*****
Karma: 32
Posts: 507
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Is there any reason for avoiding using mul in the assembly code? I know not all atmels have mul (I think some attinys don't) but it's there on the Uno and Mega.

Here's a version using mul and fixed point arithmetic:
Code:
void setup() {
  Serial.begin(115200);
}

void loop() {
 
  uint32_t i=1;
  uint32_t q=0;
  uint8_t r=1;
  while(1){
    uint32_t j=i;
    uint8_t k;
    uint8_t t;
    asm volatile(
   
    // this code fragment works out i/10 and i%10 by calculating i*(51/256)*(256/255)/2 == i*51/510 == i/10
   
    // by "j.k" I mean 32.8 fixed point, j is integer part, k is fractional part
    // j.k = ((j+1.0)*51.0)/256.0
    // (we add 1 because we will be using the floor of the result later)
   
      " ldi %2,51     \n\t"
      " mul %A0,%2    \n\t"
      " clr %A0       \n\t"
      " add r0,%2     \n\t"     
      " adc %A0,r1    \n\t"
      " mov %1,r0     \n\t"
      " mul %B0,%2    \n\t"
      " clr %B0       \n\t"
      " add %A0,r0    \n\t"
      " adc %B0,r1    \n\t"
      " mul %C0,%2    \n\t"
      " clr %C0       \n\t"
      " add %B0,r0    \n\t"
      " adc %C0,r1    \n\t"
      " mul %D0,%2    \n\t"
      " clr %D0       \n\t"
      " add %C0,r0    \n\t"
      " adc %D0,r1    \n\t"
      " clr r1        \n\t" 
     
     
    // j.k = j.k*256.0/255.0
    // (actually the result will always be slightly low, which we need because we use the floor of the result)
     
      " add %1,%A0    \n\t"
      " adc %A0,%B0   \n\t"
      " adc %B0,%C0   \n\t"
      " adc %C0,%D0   \n\t"
      " adc %D0,r1    \n\t"
      " add %1,%B0    \n\t"
      " adc %A0,%C0   \n\t"
      " adc %B0,%D0   \n\t"
      " adc %C0,r1    \n\t"
      " adc %D0,r1    \n\t"
      " add %1,%D0    \n\t"
      " adc %A0,r1    \n\t"
      " adc %B0,r1    \n\t"
      " adc %C0,r1    \n\t"
      " adc %D0,r1    \n\t"     
     
   
    // j.k = j.k / 2.0   
   
      " lsr %D0       \n\t"
      " ror %C0       \n\t"
      " ror %B0       \n\t"
      " ror %A0       \n\t"
      " ror %1        \n\t"   
   
   
   // now i*51.0/510.0 < j.k < (i+1.0)*51.0/510.0
   // therefore j == i/10 (since j == floor(j.k))
   // and (k*10)>>8 == i % 10
     
      " ldi %2,10     \n\t"
      " mul %1,%2     \n\t"
      " mov %1,r1     \n\t"
     
      " clr r1        \n\t"


     
      :"+r"(j),"=d"(k),"=d"(t)
      :
      :  );
     
      if((j!=q) || (k!=r)){
        Serial.print(i);Serial.print(" ");
        Serial.print(j);Serial.print(" ");
        Serial.print(k);Serial.println(" ");
      }
      if((i & 16777215)==0){Serial.print(i);Serial.print(" ");Serial.println(millis());
        if(i==0)return;}
      i++;
      r++;if(r==10){r=0;q++;}
     
  }
}

I've tested it on various values but the full test will take 6 hours to run.

My way of testing for correctness is to keep track of the expected quotient and remainder, incrementing the remainder each time the divisor is incremented and when the remainder gets to 10, increase the quotient:

Code:

  uint32_t i=1;
  uint32_t q=0;
  uint8_t r=1;
  while(1){

 [[ work out div/mod 10 of i here and check against q and r]]

      i++;
      r++;if(r==10){r=0;q++;}
     
  }


This is fast and simple, and I hope it's agreed that it is obviously a correct way of checking.
Logged


Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 224
Posts: 13917
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

@stimmer
Can you wrap your code in a similar function as divmod10() and run a performance test for comparison?

please use Toms code as starter - http://forum.arduino.cc/index.php?topic=167414.msg1288779#msg1288779 -
« Last Edit: June 26, 2013, 03:26:58 pm by robtillaart » Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Offline Offline
God Member
*****
Karma: 32
Posts: 507
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

As a normal function it would be dominated by overhead and marshalling. As an inline function or macro it should take 3us. The calculation is 43 instructions long, 5 of which are mul, so it takes 48 cycles in total. In a quick real-world test the result I got was 3.1245us as an inline function.

The full 32-bit range test has now completed without error smiley
Logged


Leeds, UK
Offline Offline
Edison Member
*
Karma: 80
Posts: 1729
Once the magic blue smoke is released, it won't go back in!
View Profile
WWW
 Bigger Bigger  Smaller Smaller  Reset Reset

Lol. This is starting to get insane.

Credit where credit is due, thats an amazing optimisation stimmer  smiley-grin.
Logged

~Tom~

Pages: 1 ... 3 4 [5] 6 7 8   Go Up
Jump to: