AVR vs ARM

Hello,
I got my new DUE and wanted to do some speed tests.

I searched for primes on my Arduinos.
All boards were tested with the same code.
I measured the time with millis().

These are the results (times as hh:mm:ss.mmm):

from 0 to ...   UNO R3 (AVR)    Mega 2560 (AVR)   Due (ARM)
100             00:00:00.044    00:00:00.044      <00:00:00.000
1000            00:00:02.968    00:00:02.977      00:00:00.014
10000           00:03:39.441    00:03:40.180      00:00:01.093
100000          04:48:12.416    04:51:05.177      00:01:23.571
1000000         too long        too long          01:56:23.084

The Uno and Mega differ minimally because their clocks do not run at exactly the same rate (capacitor inaccuracy).

In my view the Due is much better at calculations.
But since there are only a few libraries for it so far, it will probably remain my second choice for quite a while.
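
To put a rough number on "much better", the table rows can be converted to milliseconds and divided. This is only a sketch in plain C++; the helper names toMillis and speedup are my own and not part of the test code.

```cpp
// Convert a table entry (hh:mm:ss.mmm) to milliseconds.
// toMillis and speedup are hypothetical helper names, not part of the test sketch.
unsigned long toMillis(unsigned long h, unsigned long m,
                       unsigned long s, unsigned long ms) {
  return h * 3600000UL + m * 60000UL + s * 1000UL + ms;
}

// How many times faster the second measurement is than the first.
double speedup(unsigned long slowMs, unsigned long fastMs) {
  return static_cast<double>(slowMs) / static_cast<double>(fastMs);
}

// Uno vs. Due, range 0..10000:
//   speedup(toMillis(0, 3, 39, 441), toMillis(0, 0, 1, 93))    is about 200
// Uno vs. Due, range 0..100000:
//   speedup(toMillis(4, 48, 12, 416), toMillis(0, 1, 23, 571)) is about 207
```

The Due is clocked at 84 MHz and the Uno at 16 MHz, a factor of 5.25, so clock speed alone does not explain a factor of roughly 200.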

My test code:

//#########################################################################################
byte initLED = 5;                   //show the "boot"                                    ##
byte runLED = 6;                    //show the "main" running                            ##
byte finishLED = 7;                 //show the program is finished                       ##
byte speaker = 8;                   //the speaker pin                                    ##
byte silentSpInterrupt = 0;         //the interrupt number for the silence button        ##
boolean speakerSilent = true;      //true: speaker is silent; false: speaker sounds      ##
//#########################################################################################
const unsigned long n = 100;     //search range = [0; n]                                 ##
const byte w = 10;      //wordwrap after "w" primes                                      ##
const boolean printSerial = false;   // true: print primes; false: don't print primes    ##
//#########################################################################################

unsigned long amountOfPrim;  //number of primes in the interval I = [0; n]
unsigned long beginTime;     //begin time in ms
unsigned long endTime;       //end time in ms
boolean finish = true;       //give the OK for the speaker

void setup() {
  
  pinMode(initLED, OUTPUT);
  pinMode(runLED, OUTPUT);
  pinMode(finishLED, OUTPUT);
  pinMode(speaker, OUTPUT);
  
  digitalWrite(initLED, HIGH);
  
  Serial.begin(115200);                             //Initializing Serial
  Serial.println("");
  Serial.println("#########################################################");
  Serial.print("Range of numbers:  I = [0; ");
  Serial.print(n);
  Serial.println("]");
  if(printSerial)
    Serial.println("The results are shown (slowly)");
  else
    Serial.println("The results aren't shown (faster)");
    
  Serial.println();
  Serial.println("Start :)");
  Serial.println();
  
  digitalWrite(initLED, LOW);
  digitalWrite(runLED, HIGH);
  
  beginTime = millis();

  amountOfPrim = findPrim(w, n, printSerial);

  endTime = millis();
  
  Serial.println();
  
  unsigned long needTime= endTime-beginTime;              //needed Time in millis
  Serial.print("Time needed:\r\n");
  Serial.print(needTime);
  Serial.println(" ms");
  if (needTime < 36000000)
    Serial.print("0");
  Serial.print(needTime / 3600000);
  needTime = needTime % 3600000;
  Serial.print(":");
  if (needTime < 600000)                                  //no ';' here, otherwise the "0" is always printed
    Serial.print("0");
  Serial.print(needTime / 60000);
  needTime = needTime % 60000;
  Serial.print(":");
  if (needTime < 10000)
    Serial.print("0");
  Serial.print(needTime / 1000);
  needTime = needTime % 1000;
  Serial.print(".");
  if (needTime < 100)
    Serial.print("0");
  if (needTime < 10)
    Serial.print("0");
  Serial.println(needTime);
  
  Serial.println();
  Serial.print("I = [0; ");
  Serial.print(n);
  Serial.print("] includes  -> ");
  Serial.print(amountOfPrim);
  Serial.print(" <- primes.");
  

  
  digitalWrite(runLED, LOW);
  digitalWrite(finishLED, HIGH);
}


void loop(){/*
  while(finish && !speakerSilent){
    tone(speaker, 349);
    delay(500);
    noTone(speaker);
    delay(100);
  }*/
}

unsigned long findPrim(const unsigned long zeilenumbruch,const  unsigned long numberRange, const boolean printSer){
  unsigned long numberOfPrim = 0;
  unsigned long toProfe = 1;
  byte tempZeilenumbruch = 0;
  boolean isPrim = true;
  
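  for (unsigned long i = 1; i < numberRange; i++){   //candidates 2..numberRange; int is only 16 bits on AVR
    
    isPrim = true;             //reset isPrim
    toProfe++;                  //increment toProfe; it starts at 1, so the first candidate is 2

    for (unsigned long j = 2; j < toProfe && isPrim; j++){   //unsigned long: an int would overflow above 32767 on AVR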
      if (toProfe % j == 0){
        isPrim = false;
      }
    }
    
    if(isPrim){
      if (printSer){
        Serial.print(toProfe);                        //print the prime
        Serial.print("; ");
        tempZeilenumbruch++;
        if( tempZeilenumbruch >= zeilenumbruch){      //line break after every "zeilenumbruch" primes
          Serial.println();
          tempZeilenumbruch = 0;
        }
      }
      numberOfPrim++;
    }
  }
  Serial.println();
  return numberOfPrim;
}

JanoNoPro:
The Uno and Mega differ minimally because their clocks do not run at exactly the same rate (capacitor inaccuracy).

Maybe so, but the Mega also pays a price for having more code space: some of the branching instructions take 1 additional cycle compared with smaller AVRs. You make use of these instructions too (CALL, RET, RETI, ...), so a difference in execution time is expected.

Unsigned long is basically the native register size on ARM, and it takes 4 registers on AVR. A lot of the difference in the calculation time, other than the clock speed and the number of instructions you get per clock cycle (which I think is equal, though), comes from the fact that a given 32-bit integer calculation takes one or a few clocks on the ARM, while the AVR spends a bunch of clocks putting together all the pieces of that one calculation. Moving something from memory to a register or from a register to memory is going to be the same. Note that a lot of normal embedded software can do its integer arithmetic in smaller variables, sometimes just unsigned shorts or single bytes (for loops under 256 iterations), so in those cases the results may be more similar. Also, if the bottleneck of your sketch is, for example, driving an I2C bus at a few 100 kHz, you won't see any improvement at all. But otherwise, yes, modern 32-bit processors are faster clock for clock than 8-bit ones. You should see what an FPU does versus floating point in software: similar results.
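
To see how much the variable width matters, the same trial division can be written once and run with different integer types. This is only a sketch: countPrimes is a made-up name, and the type parameter T is my addition, not something from the sketch above. On an AVR I would expect the 16-bit version to be noticeably faster for ranges that fit, and on the ARM I would expect little or no difference, but I have not measured it.

```cpp
#include <stdint.h>

// Counts the primes in [2; limit] by plain trial division.
// Every loop variable and every modulo operand has type T, so the
// arithmetic width can be chosen per call (hypothetical helper).
template <typename T>
uint32_t countPrimes(T limit) {
  uint32_t count = 0;
  for (T candidate = 2; candidate <= limit; candidate++) {
    bool isPrime = true;
    for (T divisor = 2; divisor < candidate && isPrime; divisor++) {
      if (candidate % divisor == 0) {
        isPrime = false;
      }
    }
    if (isPrime) {
      count++;
    }
    if (candidate == limit) {
      break;  // stop before candidate++ can wrap around when limit is T's maximum
    }
  }
  return count;
}
```

For example, countPrimes<uint16_t>(10000) keeps everything in 16 bits, while countPrimes<uint32_t>(10000) does the same work with the 32-bit math used in the original test.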

JanoNoPro:
I was looking for primes with my Arduinos.

You chose something 8-bit AVRs are particularly bad at... (i.e. 32-bit math)