Topic: Benchmarking the Due

semipro

I saw a post on an Arduino forum that benchmarked the Arduino Due against the Mega. I then decided to write my own benchmark that gives you a "score" based on how fast your Arduino is.

Currently, it uses four different for loops: two simply test floating-point performance, one tests trigonometry performance, and the last tests RAM write speed (I am not an expert at this, so reply if you have a better method). The program then adds up the times the four tests take and divides a large macro constant by that total.

I would like to try this with all Arduinos, but as I only have a Due, an Uno, and a Micro, that is impossible.

My uno got a score of 563.
My micro got a score of 559.
My due got a score of 3301.

This shows that the Due is much faster than the standard Arduinos. But when I look at the individual test times, the Due is much faster than the Micro and Uno in everything except trig, where it is only about two or three times as fast. That is still much faster, but does anyone know why the Due is so inefficient at trig calculations?

Here is my code (it is also on GitHub at https://github.com/awesommee333/Arduino-Benchmark):

______________________________________________________________________________________
#define ITER 200000L
#define SCORE_DIVIDER 99999999L
#define UNO_SCORE 563L// I might eventually use this to guess which arduino you are benchmarking
#define MICRO_SCORE 559L
#define DUE_SCORE 3301L
#define MALLOC_SIZE 256
void printFloat(float value, unsigned long precision){
   Serial.print(int(value));
   Serial.print(".");
   unsigned long frac;
   if (value >= 0)
      frac = (value - long(value)) * precision;
   else
      frac = (long(value) - value) * precision;
   Serial.println(frac, DEC);
}
void setup() {
   Serial.begin(57600);
   long Millis=millis();
   long increment = 0;
   float f = 0.0f;
   Serial.println("checking float performance using random");
   for (; increment < ITER; increment++) {
      f = random(0.0f, (float)increment);
   }
   Serial.print("rand performance time taken: ");
   Serial.println(millis() - Millis);
   Millis = millis();
   Serial.println("now calculating PI using Newtonian method...");
   long Millis2 = millis();
   float pi = 0.0f;
   float multiplier = 1.0f;
   for (float incrementf = 0.0f, increment=0; increment < ITER; incrementf+=1.0f, increment++){
      pi += 4.0f / (incrementf*2.0f + 1.0f)*multiplier;
      multiplier *= -1.0f;
   }
   Serial.print("PI estimating time: ");
   Serial.print(millis() - Millis2);
   Millis2 = millis();
   Serial.print(" PI estimate: ");
   Serial.println(pi, 8);
   Serial.println("About to test polar to rectangular coordinate conversion: ");
   long Millis3 = millis();
   float X = 0.0f;
   float Y = 0.0f;
   f = 0.0f;
   for (long i = 0L; i < ITER/3; i++, f+=PI/1000.0f){
      X = cos(f)*((float)i)/1000.0f;
      Y = sin(f)*((float)i) / 1000.0f;
   }
   Serial.print("Polar to rect coords converting time: ");
   Serial.println(millis() - Millis3);
   Serial.print("X: ");
   Serial.println(X, 8);
   Serial.print("Y: ");
   Serial.println(Y, 8);
   Millis3 = millis();
   Serial.println("now testing mem write speed(mallocing and freeing pointers):");
   long Millis4;
   for (long i = 0; i < ITER/4; i++){
      char *j = (char*)malloc(MALLOC_SIZE);
      char *f = j;
      for (int i = 0; i < MALLOC_SIZE; i++){
         *f = 'n';//just some random value to initialize the whole array of chars with
         f++;
      }
      free(j);
   }
   Serial.print("Ram test finished in: ");
   Serial.print(millis() - Millis4);
   Serial.println(" milliseconds");
   Serial.print("malloc size is: ");
   Serial.println(MALLOC_SIZE);
   Millis4 = millis();
   long score = Millis + Millis2 + Millis3+Millis4;
   score = SCORE_DIVIDER / score;
   Serial.println("Your arduino's score is: ");
   Serial.print(score);
}

void loop() {
}
______________________________________________________________________________________

semipro

I was using the debugger in Visual Micro, so the scores for all the Arduinos were worse.
The Due actually had 4304.
The Uno had 590.
The Micro had 486.
The Due took 6824 ms to do the polar-to-rectangular (trig) calculations and the Uno took 15763 ms, so it was still a pretty big improvement, but the Due was only about 2 to 3 times faster there, while its overall score, which includes the polar-to-rectangular calculations, was almost 10 times better. Also, I was kind of surprised by how much faster the RAM test was, since I expected only the CPU to be faster. Or does it take a lot of CPU to initialize variables?

MorganS

Code:
   Serial.print("rand performance time taken: ");
   Serial.println(millis() - Millis);

Oops!

You do know the Due is absolutely terrible at serial comms? I'm surprised that it's not the worst of the bunch if you do the calculation this way.

In case my point is missed: the end time for the loop under test should be taken from millis() immediately after the loop, with no extra code like Serial.print() run before you take that sample.

MarkT

You need to capture the times using micros(), not millis(), and you must have no
calls to Serial inside the timing window or you'll just measure the baudrate!
[ I won't respond to messages, use the forum please ]
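
The advice in the two replies above can be sketched as follows. This is a minimal, hedged example and not the poster's code: nowMicros() is a hypothetical desktop stand-in built on std::chrono so the sketch can be compiled off-board, and timeWorkload, leibnizPi and resultSink are illustrative names. On an Arduino you would call micros() instead, and print with Serial only after the end sample has been taken.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for Arduino's micros(), so this compiles on a desktop.
// On a board, delete this function and call micros() directly.
uint32_t nowMicros() {
    using namespace std::chrono;
    return static_cast<uint32_t>(
        duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count());
}

// Storing the result here keeps the compiler from discarding the workload.
volatile float resultSink;

// Example workload: the alternating series the benchmark uses to estimate pi.
float leibnizPi(long terms) {
    float pi = 0.0f;
    float sign = 1.0f;
    for (long k = 0; k < terms; k++) {
        pi += sign * 4.0f / (2.0f * static_cast<float>(k) + 1.0f);
        sign = -sign;
    }
    return pi;
}

// Times only the workload: the start sample is taken immediately before it
// and the end sample immediately after it, with no printing in between.
uint32_t timeWorkload(long terms) {
    const uint32_t start = nowMicros();
    resultSink = leibnizPi(terms);
    const uint32_t finish = nowMicros();
    return finish - start;  // any Serial output belongs after this point
}
```

The same two variables (start and finish) can be reused for every test, so there is no need for one timer variable per benchmark.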

MorganS

Millis4 doesn't get assigned a value before the RAM test. The 'RAM test' will just measure how long the program has taken up until this point.

You know you can use a variable more than once? Just create a StartTime and FinishTime at the top and use those.

semipro

So I fixed the part of the RAM test where it did not record the start time.
Also, I added a start and an end time, but wouldn't that take the Serial calls into account, and so be slightly less accurate?
I also added one more benchmark, an integer benchmark, and I got results that surprised me a lot. The Due did the benchmark in 243 milliseconds, while the Arduino Uno took almost a minute, making the Due about 213 times better at integer calculations. This does not match any of the other results, so did I do something wrong? (I attached the new code so as not to take up too much space.)
MarkT wrote: "You need to capture the times using micros(), not millis(), and you must have no calls to Serial inside the timing window or you'll just measure the baudrate!"
Wouldn't changing millis() to micros() make the values too big for the longs to store the time accurately?

semipro

#6
Apr 08, 2015, 03:10 am Last Edit: Apr 08, 2015, 05:27 am by semipro
I just looked up how much a long can hold on Arduino, and it is much larger than any micros() value this benchmark will produce, so ignore my previous comment. I will change it to micros().
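
For reference, micros() on Arduino returns an unsigned long, which is 32 bits on both the Uno and the Due, so the counter wraps back to zero after roughly 71 minutes (2^32 microseconds). A benchmark run is far shorter than that, and even across a wrap the elapsed time stays correct as long as the subtraction is done in unsigned arithmetic. A minimal sketch, using uint32_t as a stand-in for Arduino's unsigned long and a hypothetical helper name, elapsedMicros:

```cpp
#include <cstdint>

// uint32_t stands in for Arduino's 32-bit unsigned long.
// Hypothetical helper: elapsed time between two micros() samples.
uint32_t elapsedMicros(uint32_t start, uint32_t finish) {
    // Unsigned subtraction is modulo 2^32, so the result is still the true
    // elapsed time if the counter wrapped once between the two samples.
    return finish - start;
}
```

Storing the samples in a signed long, as the posted code does, halves the usable range, which is one reason to prefer unsigned long for timestamps.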


semipro

Could anyone please explain why the Due is so much faster than the Uno at integer computations, while it is only a few times faster than the Uno at floating-point computations (calculating pi)?
Aren't floating-point numbers just integers raised to some power, so shouldn't the performance change less drastically?
Also, the Due has only about 5 times the clock speed, so wouldn't it be at most 10 times faster than the Uno at long calculations (since the Uno is a 1-byte processor and a long is 4 bytes, while the Due is a 4-byte processor and longs are 8 bytes)?
Also, is the reason it is only 3 times faster at trig that the trig functions return doubles?

bobcousins

The Due has a Flash buffer to optimize access ("memory accelerator" rather than cache), so if you have a tight loop that might make a difference. Floating point will require function calls, which might not utilize the buffer effectively.

http://en.wikipedia.org/wiki/Floating_point

A long is 4 bytes.

The execution time is largely proportional to the number of instructions and memory accesses required. It can be hard to guess from the C source code what assembler is being generated, so it is well worth looking at that. Your integer test is likely done mostly in registers. Smart optimisation could remove some of the steps.

Remember that the 8-bit processor requires a lot of extra code to do 4-byte arithmetic; it's far more than 4 times slower. When you add everything up, 280 times slower is not unreasonable.
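
To illustrate the extra work, here is a rough sketch of a 32-bit addition carried out one byte at a time with a carry, which is roughly the shape of what an 8-bit core has to do (a real AVR compiler emits an add followed by add-with-carry instructions rather than a loop). The function name add32Bytewise is made up for illustration. A 32-bit core does the same addition in one instruction, and division widens the gap further because AVR has no divide instruction at all.

```cpp
#include <cstdint>

// Adds two 32-bit values one byte at a time, propagating the carry,
// the way an 8-bit ALU has to. Illustrative only.
uint32_t add32Bytewise(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    uint16_t carry = 0;
    for (int byteIndex = 0; byteIndex < 4; byteIndex++) {
        const uint16_t byteA = (a >> (8 * byteIndex)) & 0xFFu;
        const uint16_t byteB = (b >> (8 * byteIndex)) & 0xFFu;
        const uint16_t sum = byteA + byteB + carry;  // at most 0x1FF
        result |= static_cast<uint32_t>(sum & 0xFFu) << (8 * byteIndex);
        carry = sum >> 8;  // 0 or 1, fed into the next byte
    }
    return result;  // a carry out of the top byte is dropped, as in 32-bit wraparound
}
```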

I think you are right about float vs double. You could try with sinf and cosf which are the single precision versions.
Please don't PM me asking for help. Ask questions in the forum.
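
A hedged sketch of the single-precision variant suggested above. On the Uno and Micro (AVR), double is the same 32-bit type as float, so sin() and cos() already run in single precision there. On the Due, double is 64 bits, so sin(f) typically promotes the float argument and does the work in double precision. Calling sinf() and cosf() keeps the Due in single precision too. The struct and function names are illustrative and not taken from the original sketch.

```cpp
#include <math.h>

struct Point {
    float x;
    float y;
};

// Polar-to-rectangular conversion using the single-precision math functions.
Point polarToRect(float radius, float angleRadians) {
    Point p;
    p.x = radius * cosf(angleRadians);
    p.y = radius * sinf(angleRadians);
    return p;
}
```

Timing this against the sin()/cos() version on the Due would show how much of the trig gap comes from double precision rather than from the processor itself.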

MorganS

You are not measuring what you think you are measuring. Fix your code and then we can make meaningful comparisons.

First, make sure that your start and end times are NOT including the Serial writes. If you are measuring Serial, then do that, but don't mix in this integer/float code in the middle.

Some of your times you add up in your final benchmark are measuring time-up-until-now. Millis4 is the most obvious to me.

Then you need to be really careful that the compiler hasn't optimised away your loop entirely. for(int x=0;x<32767;x++); will be completely removed by the compiler because you never use the value of x. I think you've avoided this problem, but I can't be sure.
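
One common way to keep a benchmark loop from being optimised away is to store its result in a volatile variable, which the compiler must treat as an observable write. A minimal sketch with made-up names (sink, sumTo); printing the result after the timing window serves the same purpose.

```cpp
// Writes to a volatile object count as observable behaviour,
// so the compiler cannot drop the computation that feeds it.
volatile long sink;

long sumTo(long n) {
    long acc = 0;
    for (long i = 0; i < n; i++) {
        acc += i;
    }
    sink = acc;  // forces the loop's result to be materialised
    return acc;
}
```

This only guarantees that the final value is produced. An optimiser may still simplify how it gets there, so checking the generated assembly remains worthwhile.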

And then we can look at the style: give your variables meaningful names instead of "Millis4". Describe what it contains.

semipro

#11
Apr 13, 2015, 03:00 am Last Edit: Apr 13, 2015, 05:25 am by semipro
So I have changed the code to not include any Serial.print calls when measuring the time it takes to do everything.

Also, I have had issues with the compiler skipping loops whose results were never read, but I had already realised that you have to print the numbers or do something else that reads them.

I have not yet added meaningful names for my variables, as I think something like singleInteger or MultipleIntegers wouldn't be very descriptive either, and they are a bit too long in my opinion. Name suggestions are welcome.

Also, I have added another loop to my code for testing integers. It does a bit of arithmetic on multiple integers, as opposed to the previous one with only one integer. It seems the flash buffer was not as effective for this, as my Due is now only about 119 times faster (as opposed to 213 times faster).

I have not changed my trig functions to cosf and sinf because I think the trig test tests double speed quite well, so I am going to keep it the same for the time being.

With all these extra tests that I have added, the Due now gets about a ten times higher score, as opposed to about a seven times higher score.

Here is my updated code (I have updated the Due and Uno scores, but not the Micro score):
Code:
#define ITER 200000L
#define SCORE_DIVIDER 2147483647L
#define UNO_SCORE 10669L
#define MICRO_SCORE 352L// havent updated yet
#define DUE_SCORE 109414L
#define MALLOC_SIZE 256
void setup() {
  Serial.begin(57600);
//  long StartTime = millis();
  Serial.println("checking float performance using random");
  long Millis = millis();
  long increment = 0;
  float f = 0.0f;
  for (; increment < ITER; increment++) {
    f = random(0.0f, (float)increment);
  }
  Millis = millis() - Millis;
  Serial.print("rand performance time taken: ");
  Serial.println(Millis);
  Serial.println("now calculating PI using Newtonian method...");
  long Millis2 = millis();
  float pi = 0.0f;
  float multiplier = 1.0f;
  for (float incrementf = 0.0f, increment = 0; increment < ITER; incrementf += 1.0f, increment++) {
    pi += 4.0f / (incrementf * 2.0f + 1.0f) * multiplier;
    multiplier *= -1.0f;
  }
  Millis2 = millis() - Millis2;
  Serial.print("PI estimating time:");
  Serial.print(Millis2);
  Serial.print(" PI estimate: ");
  Serial.println(pi, 8);
  Serial.println("About to test polar to rectangular coordinate conversion: ");
  long Millis3 = millis();
  float X = 0.0f;
  float Y = 0.0f;
  f = 0.0f;
  for (long i = 0L; i < ITER / 3; i++, f += PI / 1000.0f) {
    X = cos(f) * ((float)i) / 1000.0f;
    Y = sin(f) * ((float)i) / 1000.0f;
  }
  Millis3 = millis() - Millis3;
  Serial.print("Polar to rect coords converting time: ");
  Serial.println(Millis3);
  Serial.print("X: ");
  Serial.println(X, 8);
  Serial.print("Y: ");
  Serial.println(Y, 8);
  Serial.println("now testing mem write speed(mallocing and freeing pointers):");
  long Millis4 = millis(); // start of the RAM test (was left uninitialized)
  for (long i = 0; i < ITER / 4; i++) {
    char *j = (char*)malloc(MALLOC_SIZE);
    char *f = j;
    for (int i = 0; i < MALLOC_SIZE; i++) {
      *f = 'n';//just some random value to initialize the whole array of chars with
      f++;
    }
    free(j);
  }
  Millis4 = millis()-Millis4;
  Serial.print("Ram test finished in: ");
  Serial.print(Millis4);
  Serial.println(" milliseconds");
  Serial.print("malloc size is: ");
  Serial.println(MALLOC_SIZE);
  long Millis5 = millis();
  long INTEGER_BENCHMARK = random(0L, 2323213);
  Serial.println("Single integer benchmark starting... ");
  for (long i = 0; i < ITER; i++) {
    INTEGER_BENCHMARK *= 10000;
    INTEGER_BENCHMARK /= 5;
    INTEGER_BENCHMARK += 7;
    for (long j = 0; j < 5; j++) {
      INTEGER_BENCHMARK *= 3;
      INTEGER_BENCHMARK /= 4;
    }
  }
  Millis5 = millis()-Millis5;
  Serial.print("Integer time: ");
  Serial.println(Millis5);
  Serial.print("Integer value: ");
  Serial.println(INTEGER_BENCHMARK);
  Serial.println("Multiple integer benchmark starting... ");
  long *Integers=new long[MALLOC_SIZE/sizeof(long)];
  for(int i=0;i<MALLOC_SIZE/sizeof(long);i++)
    Integers[i]=random(0, 0x7FFFFFFFL); // 2^31-1; in C++ '^' is XOR, not exponentiation
  long Millis6=millis();
  for(long i=0;i<ITER/16;i++){
    for(int j=0;j<MALLOC_SIZE/sizeof(long);j++){
      Integers[j]+=8;
      Integers[j]-=250;
      Integers[j]*=8;
      Integers[j]/=7;
    }
  }
  Millis6=millis()-Millis6;
  Serial.print("Multiple integer time: ");
  Serial.println(Millis6);
  Serial.println("Integer values: ");
  for(int i=0;i<MALLOC_SIZE/sizeof(long);i++){
    Serial.print(Integers[i]);
    Serial.print(" ");
  }
  Serial.println();
  delete[] Integers;
  long score = Millis+Millis2+Millis3+Millis4+Millis5+Millis6;
  score = SCORE_DIVIDER / score;
  Serial.println("Your arduino's score is: ");
  Serial.print(score);
}

void loop() {
}

MorganS

semipro wrote: "So I have changed the code to not include any Serial.print calls when measuring the time it takes to do everything."

I'm not going to read any further than this section:
Code:

  long Millis = millis();
  long increment = 0;
  float f = 0.0f;
  Serial.println("checking float performance using random");
  for (; increment < ITER; increment++) {
    f = random(0.0f, (float)increment);
  }
  Millis = millis() - Millis;

semipro

OK, I missed one Serial.print call at the beginning, so I just edited my previous post to make that call before calling millis(). I think I did not miss any more Serial.print calls. Correct me if I am wrong, though!
