integer division timing anomoly

Hi all

I was doing some research into integer division on the Arduino Uno, and trying to determine the fastest methods of solving division.
In readings, I came across this post.
I grabbed and modified the sketch that Nick Gammon posted in reply #4, the ultimate aim being to compile a list of the computations that I need to perform, with their timings.

Here is my modified sketch:

#define DATATYPE byte
#define OPERATION i / j

volatile DATATYPE x=0;
volatile DATATYPE y=0;
//volatile DATATYPE j=7;
unsigned long time;
unsigned long loop_time;

void setup ()
{
    Serial.begin(115200);
    Serial.println ();

    time = micros();
    for (DATATYPE i = 1; i < 101; i++) {
        //for (DATATYPE j = 1; j < 101; j++) {
        for (DATATYPE j = 56; j < 57; j++) {
            y++;
        } // end of for j
    }  // end of for i
    loop_time = micros() - time;
    Serial.print("Loop baseline Completion time: ");
    Serial.print(loop_time);
    Serial.print(" ");

    time = micros();
    for (DATATYPE i = 1; i < 101; i++) {
        //for (DATATYPE j = 1; j < 101; j++) {
        for (DATATYPE j = 56; j < 57; j++) {
          y++;
            x = OPERATION;
        } // end of for j
    }  // end of for i
    time = micros() - time;
    Serial.print("Completion time: ");
    Serial.print(time);
    Serial.print(" Operation time: ");
    time = time - loop_time;
    Serial.println(time);
} // end of setup

void loop() {}

The output of this is:

Loop baseline Completion time: 52 Completion time: 644 Operation time: 592

This essentially gives a time of 5usec for each division operation using bytes.

However, changing two loops from :

for (DATATYPE j = 56; j < 57; j++) {

to

for (DATATYPE j = 57; j < 58; j++) {

gives the following output:

Loop baseline Completion time: 52 Completion time: 108 Operation time: 56

I have tried a few different vales for the inner loop (7, 56, 58, 3) etc

3 gives a similar result, but 7, 11 and 13 do not

Can anyone tell me why there would be such a big difference in the times taken to perform the operation?

Cheers...

Problem #1: You are not only measuring the OPERATION you are also measuring the amount of time it takes for the interrupt service routine to shuffle bytes out the serial port.

Problem #2: You may also be measuring the time it takes for the timer 0 overflow interrupt to run.

Hi.
Thanks for the feedback

I'm not sure that I get you there.
I have 4 micros() function calls, each pair brackets a loop, but does not include the serial output.

Could you clarify that please? Do you mean that I might be facing an overflow issue with micros()?

micros() will not overflow/reset until +/-70 minutes after the program starts.
this program does not take anywhere near that long to compete.

We can come back to interrupts. Before that...

darrob:
I was doing some research into integer division on the Arduino Uno, and trying to determine the fastest methods of solving division.

What leads you to believe there are multiple implementations of integer division with AVR-Libc?

Hi

I have made no such assumption.
I was looking at methods I could use in my code to produce faster results.
It was just an Anomaly that I noticed while playing with that particular sketch

darrob:
I have made no such assumption.

Then why bother performing timing tests on integer division? If there is one implementation and you need to divide two numbers you have one choice.

The timing was to determine a couple of things for myself.
If you looked at the thread I got this sketch from in the first place, you can see that there are speed differences based on the data type used.
I wanted to find the timings for different divisions to see if they could be solved quickly enough for my application.
Regardless of why I am testing, (or even if I should test), I would still be interested to know why the timings on those two particular numbers take a lot less time than the timings on other numbers.

Cheers

darrob:
I would still be interested to know why the timings on those two particular numbers take a lot less time than the timings on other numbers.

Back to interrupts...

Start at the top of your source code. Work your way towards the bottom. At what point within your code can an interrupt service routine run?

darrob:
I was looking at methods I could use in my code to produce faster results.

Avoid using division. If at all possible use a combination of right shifts and multiplication.

And if you are working with integers make sure to avoid overflow or underflow - time isn't everything.

From some tests I did a while ago IIRC an unsigned long division takes much the same time as a float division - which rather surprised me.

...R

Hi
I'm sorry but I'm still not clear what you're suggesting.

Unless I've totally misunderstood how interrupts on the Arduino Uno work, nothing in my code blocks an interrupt, and nothing that an interrupt does will terminate a those loops prematurely

An ISR could run any time, but I can't see how I would be timing timer0 overflow interrupt only in the case of j=3 and j=57.

Robin2:
Avoid using division. If at all possible use a combination of right shifts and multiplication.

And if you are working with integers make sure to avoid overflow or underflow - time isn't everything.

From some tests I did a while ago IIRC an unsigned long division takes much the same time as a float division - which rather surprised me.

...R

Hi

This was one of the reasons that brought start investigating.
Most of the numbers I'll be dealing with are signed long integers, and a comment I remember seeing regarding the length of time re long vs float division got me to try some comparisons.

Hence the code...
Hence the anomaly...
Hence the question :slight_smile:

Hello darrob

For

x = OPERATION;

With option -Os the compiler generates (here):

with 56/57

 6a2:	82 2f       	mov	r24, r18
 6a4:	63 2f       	mov	r22, r19
 6a6:	0e 94 d2 03 	call	0x7a4	; 0x7a4 <__udivmodqi4>
 6aa:	80 93 59 01 	sts	0x0159, r24	; 0x800159 <x>

with 57/58

  6a2:	82 9f       	mul	r24, r18
 6a4:	91 2d       	mov	r25, r1
 6a6:	11 24       	eor	r1, r1
 6a8:	96 95       	lsr	r25
 6aa:	90 93 59 01 	sts	0x0159, r25	; 0x800159 <x>

Regards,
bidouilleelec

What is the purpose of this line

for (DATATYPE j = 56; j < 57; j++)

Why should the datatype for that loop counter change when you want to consider the time for divisions of different datatypes.

I don't see the point of changing the values used in the division calculation. If you suspect that the values involved in the calculation have an effect on the duration of the calculation then try some specific different calculations to test that hypothesis.

Separately, I reckon if I was trying to figure out the time of a calculation I would measure it over 1000 or 10,000 iterations

I would also do some untimed tests to make sure that the answers to the calculations are correct.

...R

bidouilleelec:
Hello darrob

For

x = OPERATION;

With option -Os the compiler generates (here):

with 56/57

 6a2: 82 2f       mov r24, r18

6a4: 63 2f       mov r22, r19
6a6: 0e 94 d2 03 call 0x7a4 ; 0x7a4 <__udivmodqi4>
6aa: 80 93 59 01 sts 0x0159, r24 ; 0x800159




with 57/58


6a2: 82 9f       mul r24, r18
6a4: 91 2d       mov r25, r1
6a6: 11 24       eor r1, r1
6a8: 96 95       lsr r25
6aa: 90 93 59 01 sts 0x0159, r25 ; 0x800159




Regards,
bidouilleelec

Hi bidouilleelec

So in the case of 57/58 (and I suppose 3/4)
The compiler is optimising the code to use reciprocal multiplication and bit shifting
instead of unsigned division

one more instruction, but takes less time.

Thank you very much for that :slight_smile:

What did you use to generate the ASM code?

darrob:
What did you use to generate the ASM code?

%ardudump%avr-objdump.exe --include=%adresseElf% -h -S -d -l %arducomp%%fichierino%.ino.elf > %ardustock%%fichierino%_elf.lst

Hi Robin2

Robin2:
What is the purpose of this line

for (DATATYPE j = 56; j < 57; j++)

When I first started out with this, I found the sketch that Nick Gammon posted.
I just copied it and ran it before starting to play with it.

I wanted, ultimately to test signed and unsigned longs, but I like to make small changes and
run to see that I've not made (too great) mistakes.

anyway, I made some changes to the loops, and decided that I wanted to test dividing by one number.
I chose a number at random, which was 57.

The difference in what I got, to what Nick reported was enough for me to play some more with this section, wanting to understand what was going on, rather than to ignore it.

Robin2:
Why should the datatype for that loop counter change when you want to consider the time for divisions of different datatypes.

As I said, I make small changes and run. My first aim was to change the loops and timings to suit my eventual needs.

Robin2:
I don't see the point of changing the values used in the division calculation. If you suspect that the values involved in the calculation have an effect on the duration of the calculation then try some specific different calculations to test that hypothesis.
Separately, I reckon if I was trying to figure out the time of a calculation I would measure it over 1000 or 10,000 iterations

I would also do some untimed tests to make sure that the answers to the calculations are correct.

...R

Again, small changes and test to see if things work :slight_smile:
The answers (in themselves) are not terribly important at this stage. I just wanted to see how long a division would take.
A small sample is quite fine for me at the moment, just to give me an idea of what was going on.

What it has shown is that the compiler apparently optimises specific denominators into reciprocal multiplication and bitshifts. That may be useful to know and exploit (If I was clever enough to do so)

bidouilleelec:
%ardudump%avr-objdump.exe --include=%adresseElf% -h -S -d -l %arducomp%%fichierino%.ino.elf > %ardustock%%fichierino%_elf.lst

Thank You :slight_smile:

darrob:

Why should the datatype for that loop counter change when you want to consider the time for divisions of different datatypes

As I said, I make small changes and run. My first aim was to change the loops and timings to suit my eventual needs.

I think you are missing my point. IMHO the datatype used for the loop counter should be quite independent from the datatype you are testing.

The answers (in themselves) are not terribly important at this stage.

I would not be content testing code that does not give sensible answers. The compiler might be cleverer than you and optimize out some code.

...R

Robin2:
As I said, I make small changes and run. My first aim was to change the loops and timings to suit my eventual needs.
I think you are missing my point. IMHO the datatype used for the loop counter should be quite independent from the datatype you are testing.

Ah, I (think I) get what you're saying.

The counters serve both as counters and as Dividend and Divisor.
Remember, this was taken from a previous test and is being modified to suit.
The code is simple and easy to read. One define will change what is being tested. Neat!

Robin2:
I would not be content testing code that does not give sensible answers. The compiler might be cleverer than you and optimize out some code.

...R

True. But again, at the time of posing this question, I was more interested in the results of the timing rather than the results of the operation. All I wanted was to find out how long dividing one number by another number took. The compiler did optimise out code, but it turned out to be serendipitous.

Cheers.

darrob:
True. But again, at the time of posing this question, I was more interested in the results of the timing rather than the results of the operation.

You are making the assumption that the two are independent. If I was in your shoes I would like to see proof of that .

...R