
Topic: Interesting benchmark figures from different MCUs

carry_disruptor

Jan 13, 2018, 09:12 pm Last Edit: Jan 14, 2018, 08:12 am by carry_disruptor
So I just wanted to test relative performance of some MCUs and I used this sketch to measure the performance of
1. ATTiny85 - Digispark
2. ATMega2560 - Arduino Mega
3. ESP8266 - Wemos D1 R2
4. ESP32 - Dev board

and the peculiar thing I noticed is that the ATtiny85 crunches numbers faster than the ATmega2560! I would have expected similar or maybe somewhat lower performance from the ATtiny, but according to the test that is not the case. What could be the reason for this? Is the ATtiny core more optimized? Please share your thoughts.

Code:
ATTiny85 - 16 MHz

time for 1 mio. plus calculations in s: 8.56
time for 1 mio. minus calculations in s: 8.64
time for 1 mio. multiplications in s: 22.61
time for 1 mio. divisions in s: 29.62
time for 1 mio. analog reads in s: 108.50



ATMega2560 - 16 MHz

time for 1 mio. plus calculations in s: 9.21
time for 1 mio. minus calculations in s: 9.30
time for 1 mio. multiplications in s: 9.81
time for 1 mio. divisions in s: 31.33
time for 1 mio. analog reads in s: 112.10



ESP8266 - 80 MHz

time for 1 mio. plus calculations in s: 0.64
time for 1 mio. minus calculations in s: 0.76
time for 1 mio. multiplications in s: 1.26
time for 1 mio. divisions in s: 3.44
time for 1 mio. analog reads in s: 94.50



ESP8266 - 160 MHz

time for 1 mio. plus calculations in s: 0.32
time for 1 mio. minus calculations in s: 0.38
time for 1 mio. multiplications in s: 0.63
time for 1 mio. divisions in s: 1.72
time for 1 mio. analog reads in s: 81.20



ESP32 - 240 MHz

time for 1 mio. plus calculations in s: 0.02
time for 1 mio. minus calculations in s: 0.03
time for 1 mio. multiplications in s: 0.04
time for 1 mio. divisions in s: 0.22
time for 1 mio. analog reads in s: 10.30
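
Since the boards run at different clock speeds, it can help to convert each time into average CPU cycles per loop iteration. The helper below is only a sketch for doing that arithmetic; the name `cycles_per_iteration` is made up for illustration, and the result includes loop overhead, not just the float operation.

```cpp
#include <cassert>

// Convert a measured run time into average CPU cycles per loop iteration.
//   seconds:    measured time for the whole loop
//   clock_hz:   CPU clock frequency in Hz
//   iterations: number of loop iterations (1 million in the figures above)
// Hypothetical helper, not part of the original sketch.
double cycles_per_iteration(double seconds, double clock_hz, double iterations) {
    assert(iterations > 0.0);
    return seconds * clock_hz / iterations;
}
```

With the posted add times, this gives about 137 cycles for the ATtiny85 (8.56 s at 16 MHz), about 147 for the ATmega2560 (9.21 s at 16 MHz), and 51.2 for the ESP8266 at both 80 MHz (0.64 s) and 160 MHz (0.32 s). So the ESP8266 arithmetic scales with the clock, while its analog-read times (94.50 s vs 81.20 s) barely move.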

Coding Badly


The supposed output in your post matches none of the prints in your code.

My theory is you ran the wrong sketch.


carry_disruptor

My apologies, I linked the wrong url there. I've fixed the link now. Thank you.

Coding Badly


Serial for the ATmega2560 is asynchronous.  Serial for the ATtiny85 is synchronous.  With one you are also measuring the transmit interrupt time.  With the other you are not.
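
One way to keep leftover serial traffic out of a measurement is to wait for the transmit buffer to drain before starting the clock. The helper below is a minimal sketch of that idea, not code from the original sketch: `timed_run` is a made-up name, and it assumes the serial object has a `flush()` method that blocks until pending output has been sent, as Arduino's `Serial.flush()` does.

```cpp
// Run `work` and return the elapsed time reported by `now_ms`.
// `serial` is flushed first, so bytes still queued from an earlier print
// are not transmitted (and do not fire interrupts) during the timed section.
// Hypothetical helper: works with any object that has a flush() method.
template <typename SerialLike, typename NowMs, typename Work>
unsigned long timed_run(SerialLike& serial, NowMs now_ms, Work work) {
    serial.flush();                         // wait for pending output to finish
    const unsigned long start = now_ms();   // start the clock only after that
    work();                                 // the code being benchmarked
    return now_ms() - start;
}
```

On an Arduino this could be called as `timed_run(Serial, millis, [] { /* benchmark loop */ })`. Comparing the result with and without the `flush()` would show how much the transmit interrupts really contribute.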




westfw

The atmega2560 is in fact slightly slower than the tiny85.  For example, the "rcall" instruction on 2560 needs to push 3 bytes worth of PC and takes 4 cycles, while on the tiny85 it only needs to push 2 bytes and takes 3 cycles.
Whether those differences are enough to account for your timing differences is a separate question.
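
To get a feel for that question, the cycle difference can be turned into seconds. This is only back-of-the-envelope arithmetic; `added_seconds` is a made-up helper name, and the 1 million iterations and 16 MHz come from the figures in the first post.

```cpp
#include <cassert>

// Extra run time in seconds caused by `extra_cycles` additional CPU cycles
// in every one of `iterations` loop iterations at `clock_hz`.
// Hypothetical helper for illustration only.
double added_seconds(double extra_cycles, double iterations, double clock_hz) {
    assert(clock_hz > 0.0);
    return extra_cycles * iterations / clock_hz;
}
```

One extra cycle per iteration adds `added_seconds(1, 1e6, 16e6)` = 0.0625 s. The gap for additions is 9.21 s - 8.56 s = 0.65 s, which corresponds to roughly 10.4 extra cycles per iteration. Whether the float add path really contains that many calls per iteration is not something this thread establishes.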


carry_disruptor

I don't understand how serial would affect the results. Serial is only called after the time needed to crunch the numbers has been calculated, and the results show that pure addition and subtraction are faster on the ATtiny.

pert

There are possibly some extra factors added by using the Digistump hardware package:
  • It enforces the use of avr-gcc 4.8.1-arduino5, which may not be the same version used for the Arduino Mega (depending on which Arduino AVR Boards version you're using).
  • Different compiler flags (depending on which Arduino AVR Boards version you're using).


You could eliminate these factors by using damellis/attiny v1.0.2 since it uses the same compiler version, same compilation recipes, and the same core library as your Arduino AVR Boards package.
http://hlt.media.mit.edu/?p=1695

Koepel

Jan 15, 2018, 03:25 pm Last Edit: Jan 15, 2018, 03:43 pm by Koepel
The ATtiny is probably indeed faster with some float math. A lot depends on the compiler and optimizations and the float library. This test does not just test the cpu, but it is a test for the float library as well.

Did you see that float multiplication takes 22.61 seconds on the ATtiny but only 9.81 seconds on the ATmega2560? Looking at the average of the tests, I would say the ATmega2560 is faster than the ATtiny :P
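
The average can be checked quickly from the posted figures. The snippet below is just that arithmetic; the helper and array names are made up, and it averages the four arithmetic tests only.

```cpp
#include <cassert>
#include <cstddef>

// Arithmetic mean of n timing results (seconds).
double mean_seconds(const double* values, std::size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += values[i];
    }
    return sum / static_cast<double>(n);
}

// Figures copied from the first post: add, subtract, multiply, divide.
const double kAttiny85[]   = {8.56, 8.64, 22.61, 29.62};
const double kAtmega2560[] = {9.21, 9.30, 9.81, 31.33};
```

`mean_seconds(kAttiny85, 4)` comes out at about 17.36 s and `mean_seconds(kAtmega2560, 4)` at about 14.91 s. Including the analog-read times does not change the order (about 35.59 s vs 34.35 s).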

For the Arduino Uno, the numbers are almost the same as for the Arduino Mega 2560.
Code:
time for 1 mio. plus calculations in s: 9.08
time for 1 mio. minus calculations in s: 9.17
time for 1 mio. multiplications in s: 9.69
time for 1 mio. divisions in s: 30.83
time for 1 mio. analog reads in s: 112.00


Adding a delay after Serial.println() does not change a lot. The interrupts by the serial output have little effect.

The actual numbers themselves also determine how long the calculation takes.
Code:
result = result / (dataType)1.00001;    // takes 30.83 seconds
result = result / (dataType)1.1;        // takes 74.70 seconds !

That 74.70 seconds can be lowered to 9.05 seconds with: #pragma GCC optimize ("-ffast-math")
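
For anyone who wants to try this, the pragma goes above the functions it should affect. The snippet below is a sketch, not the original benchmark: `repeated_divide` is a made-up function, and the `volatile` is an assumption added here so the compiler cannot drop the loop. Keep in mind that -ffast-math relaxes IEEE floating-point rules, so results can differ slightly, and whether the speedup quoted above reproduces will depend on the compiler version.

```cpp
// Ask GCC to compile the functions that follow in this file with -ffast-math.
// Other compilers may ignore this pragma.
#pragma GCC optimize ("-ffast-math")

// Divide `start` by `divisor` n times, similar to the division loop
// in the benchmark. Hypothetical stand-in for the original test code.
float repeated_divide(float start, float divisor, unsigned long n) {
    volatile float result = start;  // volatile keeps the loop from being optimized away
    for (unsigned long i = 0; i < n; ++i) {
        result = result / divisor;
    }
    return result;
}
```

Timing `repeated_divide(start, 1.1f, 1000000UL)` with and without the pragma line would be the direct comparison.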

It was hard to lower the 9.08 seconds for a float add.
Using a 16-bit loop variable with two separate for-loops to 50000 got it down to 8.90 seconds. Three nested for-loops with byte loop variables (up to 10, 100 and 100) got it down to 8.85 seconds. That is still not as fast as the ATtiny.

Everything is as expected; the time to call analogRead() is the most interesting number.
