Sorting algorithms

Hi,
I am trying to teach my son the difference between the sorting algorithms therefore I did a small program on Arduino IDE that I am running on an Arduino NANO. Besides sorting I am calculating as well the numbers of cycles and the time needed by the algorithm needed to sort the vector. At the end I send the results on the serial port.
The code is:

// Sorting algorithm reverse
int a[200];
int i,j,temp,cycle;
unsigned long micro1,micro2;

void setup() {
Serial.begin (57600);
}

void loop()
{
randomSeed(micros());
for (i=0;i<200;i++)
a = random(-9999,9999);
cycle=1;
micro1 = micros();
for (i=0;i<199;i++)
{

  • for(j=199;j>i;j--)*
  • {*
    _ if (a*>a[j])_
    _
    {_
    _ temp = a ;
    a = a [j];
    a [j] = temp;
    }
    cycle++;
    }
    }
    micro2 = micros();
    for (i=0;i<200;i++)
    {
    Serial.print("a[");
    Serial.print(i);
    Serial.print("]=");_

    _Serial.println(a);
    }
    Serial.print("Sort Duration: ");
    Serial.print(micro2-micro1);
    Serial.println("uS");
    Serial.print("Cycles: ");
    Serial.println(cycle);
    delay (1000);
    }
    and the result of the sorting algorithm is:
    .....
    a[195]=9319
    a[196]=9516
    a[197]=9657
    a[198]=9794
    a[199]=9854
    Sort Duration: 24548uS
    Cycles: 19901
    Now, the strange thing. If I modify a little bit the algorithm and in the sense that in the second iteration loop instead of going from the last element to the first one, i go from the first to the last, although the number of cycles required by the algorithm to sort the vector remains the same, the duration is almost double. i do not understand why. It does not make any sense for me. Below the modified algorithm and the results.
    // Sorting algorithm direct*

    int a[200];
    int i,j,temp,cycle;
    unsigned long micro1,micro2;
    void setup() {
    Serial.begin (57600);
    }
    void loop()
    {
    randomSeed(micros());
    for (i=0;i<200;i++)
    a = random(-9999,9999);
    cycle=1;
    micro1 = micros();
    for (i=0;i<199;i++)
    {
    for(j=i+1;j<200;j++)
    * {
    if (a>a[j])
    {
    temp = a ;
    a = a [j];
    a [j] = temp;
    }
    cycle++;
    }
    }
    micro2 = micros();
    for (i=0;i<200;i++)
    {
    Serial.print("a[");
    Serial.print(i);
    Serial.print("]=");_

    _Serial.println(a);
    }
    Serial.print("Sort Duration: ");
    Serial.print(micro2-micro1);
    Serial.println("uS");
    Serial.print("Cycles: ");
    Serial.println(cycle);
    delay (1000);
    }
    .....*

    a[197]=9783
    a[198]=9827
    a[199]=9959
    Sort Duration: 38388uS
    Cycles: 19901
    Any idea?
    Many thanks.
    Florin_

Two things:

  1. Did you made that code italic on purpose? Don't think so :wink: Please read How to use the forum and edit your post to have code-tags.
  2. Using your e-mail address as a public forum name is a bit stupid... You can probably ask a mod to change it.

And because of 1), reading the code now is to hard for me to find the problem.

Hi, yes you are right. I could have used another name. I have no clue how to change it.
Regarding the code - here you have it in the recommended format:

Direct Sorting

// Sorting algorithm direct
int a[200];
int i,j,temp,cycle;
unsigned long micro1,micro2;

void setup() {
Serial.begin (57600);
}

void loop() 
{
randomSeed(micros());
for (i=0;i<200;i++)
a [i]= random(-9999,9999);
cycle=1;
micro1 = micros();
for (i=0;i<199;i++)
{
  for(j=i+1;j<200;j++)
  {
      if (a[i]>a[j])
      {
        temp = a [i];
        a [i] = a [j];
        a [j] = temp;
      }
        cycle++;     
    }  
}
micro2 = micros();
for (i=0;i<200;i++)
{
Serial.print("a[");
Serial.print(i);
Serial.print("]=");
Serial.println(a[i]);
}
Serial.print("Sort Duration: ");
Serial.print(micro2-micro1);
Serial.println("uS");
Serial.print("Cycles: ");
Serial.println(cycle);
delay (1000);
}

This is the output on the serial monitor
.........
a[195]=9626
a[196]=9671
a[197]=9699
a[198]=9762
a[199]=9869
Sort Duration: 38712uS
Cycles: 19901

and this is the second version and the relevant output of the serial monitor.

// Sorting algorithm reverse

int a[200];
int i,j,temp,cycle;
unsigned long micro1,micro2;

void setup() {
Serial.begin (57600);
}

void loop() 
{
randomSeed(micros());
for (i=0;i<200;i++)
a [i]= random(-9999,9999);
cycle=1;
micro1 = micros();
for (i=0;i<199;i++)
{
  for(j=199;j>i;j--)
  {
      if (a[i]>a[j])
      {
        temp = a [i];
        a [i] = a [j];
        a [j] = temp;
      }
        cycle++;     
    }  
}
micro2 = micros();
for (i=0;i<200;i++)
{
Serial.print("a[");
Serial.print(i);
Serial.print("]=");
Serial.println(a[i]);
}
Serial.print("Sort Duration: ");
Serial.print(micro2-micro1);
Serial.println("uS");
Serial.print("Cycles: ");
Serial.println(cycle);
delay (1000);
}


And this is the result:

.......

a[194]=9588
a[195]=9650
a[196]=9817
a[197]=9856
a[198]=9989
a[199]=9992
Sort Duration: 24420uS
Cycles: 19901

Florin

Run out and buy Sedgwick's "Algorithms in C" Parts 1-4. You might not be able to get big enough datasets on the arduino to really make a difference.

Run both versions 10 times each. I think you'll see which one is faster is different for different runs, which means you're simply seeing the effects of the random distribution of values when you initialize the array. A single run is meaningless. Plus, it appears to me in the first code you posted, the "forward" sort is faster than the "reverse" sort, while in the second code you posted, it is just the opposite, which would prove my point.

Regards,
Ray L.

i did run both algorithms more than 20 times. I have indeed small deviations but consistently one algorithm is 12 ms fastest than the other. Although the number of cycles to sort the vector is the same.
Like when you go in the second loop from the last element to the first one you are faster than when you go from first element to last.
Kindly ask you you to do the same test on your board. You need just the board. Check the result for yourself.

Florin

Well I'm not sure what it is you're expecting to find. The only variable that affects the sort time is the distribution of the data, which affects how many "swaps" it does. NOTHING else is changing, other than possibly some small effects of slightly different compiler optimizations. Try filling the array in reverse order, or try filling it with canned data, rather than random numbers, and you'll see the same kind of difference. Keep in mind the random() function will return EXACTLY the same sequence of numbers every time, for any given seed. The way you're seeding is likely to product pretty consistent results. Try instead seeding when the users types a key, or something similar that will ensure you get very different seeds on each run.

Regards,
Ray L.

What the heck. Here is Table 6.1 from Sedgewick and his comments. As you can see for small data sets the numbers are not very different. (the numbers are times for execution):

Table 6.1
Empirical Study of elementary sorting algorithms
Insertion Sort and selection sort are about twice as fast as bubble sort for small files, but running times grow quadratically (when the file size grows by a factor of 2, the running time grows by a factor of 4). None of the methods for large randomly ordered files - for example, the numbers corresponding to those in this table are less than 2 for the shellsort algorithm given in section 6.6. When comparisons are expensive - for example when the two keys are strings - then insertion sort is much faster than the other two since it uses fewer comparisons. Not included here is when exchanges are expensive, then selection sort is best.
32 bit integer keys | string keys
N S I* I B B* S I B
1000 5 7 4 11 8 13 8 19
2000 21 29 15 45 34 56 31 78
4000 85 119 62 182 138 228 126 321

Key:
S Selection Sort
I* Insertion Sort, exchange based
I Insertion Sort
B Bubble Sort
B* Shaker Sort

Thank you all for your feedback and time.
Still puzzled with the result. I tried with a smaller set of numbers - a vector with the dimension of 20.
I also eliminated the random function, i have the same values in the vector all the time.
The difference remains. I simply do not get it. The algorithm is the same in both cases. I am simply modifying the second iteration loop - instead of high to low i go with the index from low to high. Look below at the results, and are consistent whatever I do. It may be, as one of you mentioned, that the compiler is optimizing differently the code. Anyway thanks again. Here you have the code. If somebody has any idea let me know. Or if you have time, try the code on your board. Unfortunately I have only one NANO.

  1. Method 1 - index in the second loop goes from low to high.
int a[20] = {2,7,3,89,-123,-777,0,7654,234,-8,4,67,89,345,147,2344,55,2345,5,33};
int i,j,temp,cycle;
unsigned long micro1,micro2;

void setup() {
Serial.begin (57600);
}

void loop() 
{
//randomSeed(micros());
//for (i=0;i<200;i++)
//a [i]= random(-9999,9999);
cycle=1;
micro1 = micros();
for (i=0;i<19;i++)
{
[color=red]  for(j=i+1;j<20;j++)[/color]
  {
      if (a[i]>a[j])
      {
        temp = a [i];
        a [i] = a [j];
        a [j] = temp;
      }
        cycle++;     
    }  
}
micro2 = micros();
for (i=0;i<20;i++)
{
Serial.print("a[");
Serial.print(i);
Serial.print("]=");
Serial.println(a[i]);
}
Serial.print("Sort Duration: ");
Serial.print(micro2-micro1);
Serial.println("uS");
Serial.print("Cycles: ");
Serial.println(cycle);
delay (1000);
}


RESULT ON THE SERIAL MONITOR:

a[0]=-777
a[1]=-123
a[2]=-8
a[3]=0
a[4]=2
a[5]=3
a[6]=4
a[7]=5
a[8]=7
a[9]=33
a[10]=55
a[11]=67
a[12]=89
a[13]=89
a[14]=147
a[15]=234
a[16]=345
a[17]=2344
a[18]=2345
a[19]=7654
[color=red]Sort Duration: 316uS
Cycles: 191[/color]
  1. Method 2 - index in the second loop goes from high to low.
int a[20] = {2,7,3,89,-123,-777,0,7654,234,-8,4,67,89,345,147,2344,55,2345,5,33};
int i,j,temp,cycle;
unsigned long micro1,micro2;

void setup() {
Serial.begin (57600);
}

void loop() 
{
//randomSeed(micros());
//for (i=0;i<200;i++)
//a [i]= random(-9999,9999);
cycle=1;
micro1 = micros();
for (i=0;i<19;i++)
{
 [color=red] for(j=19;j>i;j--)[/color]
  {
      if (a[i]>a[j])
      {
        temp = a [i];
        a [i] = a [j];
        a [j] = temp;
      }
        cycle++;     
    }  
}
micro2 = micros();
for (i=0;i<20;i++)
{
Serial.print("a[");
Serial.print(i);
Serial.print("]=");
Serial.println(a[i]);
}
Serial.print("Sort Duration: ");
Serial.print(micro2-micro1);
Serial.println("uS");
Serial.print("Cycles: ");
Serial.println(cycle);
delay (1000);
}

RESULT ON THE SERIAL MONITOR:

a[0]=-777
a[1]=-123
a[2]=-8
a[3]=0
a[4]=2
a[5]=3
a[6]=4
a[7]=5
a[8]=7
a[9]=33
a[10]=55
a[11]=67
a[12]=89
a[13]=89
a[14]=147
a[15]=234
a[16]=345
a[17]=2344
a[18]=2345
a[19]=7654
[color=red]Sort Duration: 240uS
Cycles: 191[/color]

florin.petolea@gmail.com:
Thank you all for your feedback and time.
Still puzzled with the result.

When it comes to sorting, I found out, that the IDE's built-in "qsort()" function is lightning fast.

qsort() is part of the AVR LIBC library on the Atmega hardware platforms.

While not so easy to use (using callback function, pointers), qsort() is an hand-optimized "Quicksort algorithm in Assembler code.

When programming Atmegas, you hardly will find any faster sorting function.

Need an example, how to use the "qsort()" function on an array of int values?
When sorting more than 10 values, §qsort()" should always be the fastest function on Atmega controllers.

Clue: You've counted the number of times through the loop - now count the number of times you do the swap.

I would guess the compiled size of both of your programs are slightly different. If so, try compiling with no optimization and see what happens.

Try reversing the order of the data and run them both again.