Benchmark STM32 vs ATMega328 (nano) vs SAM3X8E (due) vs MK20DX256 (teensy 3.2)

trycage · August 23, 2018, 3:57pm

moises1953:
Operations in less time than the calibration loop? Not possible. Maybe the time functions' results are being formatted incorrectly.

Arduino Zero (Atmel ATSAMD21G18 48MHz Cortex-M0+)
INT_LOOP(30000) bench...= 116898 microseconds 11.92MIPS
LONG_LOOP(30000) bench...= 116898 microseconds 11.93MIPS
FLOAT_DIV(30000) bench...= 116898 microseconds 0.38MFLOPS
DOUBLE_DIV(30000) bench...= 113126 microseconds 0.27MFLOPS
FLOAT_MUL(30000) bench...= 92387 microseconds 0.33MFLOPS
DOUBLE_MUL(30000) bench...= 116898 microseconds 0.26MFLOPS
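Moises's objection can be turned into an explicit check. This is a minimal sketch, and it assumes (this is not confirmed from the bench source) that the net rate is computed as iterations / (t_op - t_cal) from two micros() readings. `net_rate` and `net_rate_from_times` are hypothetical names. The check refuses to report a rate when the operation loop was not slower than the calibration loop. In the Zero numbers above, for example, the 92387 µs FLOAT_MUL time is below the 116898 µs of the integer loops.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of converting two loop timings into a net operation rate. */
typedef struct {
    bool valid;   /* false when the op loop was not slower than calibration */
    double mops;  /* net millions of operations per second (ops per us) */
} net_rate;

/* Hypothetical helper: iterations / (t_op - t_cal), both times in us as
 * returned by micros(). An operation loop does the calibration loop's work
 * plus the operation, so it cannot legitimately be faster; a zero or
 * negative difference means the measurement or its printout is wrong. */
net_rate net_rate_from_times(uint32_t iterations,
                             uint32_t t_op_us, uint32_t t_cal_us)
{
    net_rate r = { false, 0.0 };
    if (t_op_us <= t_cal_us) {
        return r;  /* impossible reading: flag it instead of printing a number */
    }
    r.valid = true;
    r.mops = (double)iterations / (double)(t_op_us - t_cal_us);
    return r;
}
```

Printing "invalid" for such a line makes a timing or formatting bug visible immediately. Otherwise a meaningless MFLOPS figure would appear in its place.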

At high speed the results are imprecise:
Teensy 3.6 (Cortex-M4 @ 180 MHz). The result of FLOAT_MUL is 181.82 MIPS.
The empty reference loop repeats the following high-level operations:
1) increment
2) compare
3) jump
It takes 502 microseconds for 30000 iterations, so 59.76 Mloops/s. The high-level operation rate is therefore 59.76*3 = 179.28 MIPS.
How is it possible to achieve 181.82 MIPS using FLOAT_MUL? Without optimizations it must be 180 MIPS, or perhaps 179.28.
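The arithmetic in the post can be expressed as a small C sketch. The function names are hypothetical. One iteration per microsecond equals one million iterations per second, so iterations divided by elapsed microseconds gives Mloops/s directly.

```c
#include <stdint.h>

/* Loop rate in millions of iterations per second: iterations per
 * microsecond is already "per million per second", so no scaling. */
double mloops_per_s(uint32_t iterations, uint32_t elapsed_us)
{
    return (double)iterations / (double)elapsed_us;
}

/* "High-level" MIPS as counted in the post: each empty-loop iteration
 * performs ops_per_iter operations (increment, compare, jump = 3). */
double high_level_mips(uint32_t iterations, uint32_t elapsed_us,
                       unsigned ops_per_iter)
{
    return mloops_per_s(iterations, elapsed_us) * ops_per_iter;
}
```

`high_level_mips(30000, 502, 3)` gives about 179.28, below the 180 MIPS the post treats as the ceiling for a 180 MHz core. Because of that ceiling, a FLOAT_MUL figure of 181.82 looks suspicious.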

The operation loops perform an operation plus an assignment, and maybe the assignment time was negligible. Including an assignment to a constant in the LONG calibration loop may be a better approach, as suggested by westfw.
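The reasoning behind adding the assignment to the calibration loop can be written out explicitly. This sketch assumes that per-iteration costs simply add, which a pipelined core does not guarantee. Here $N$ is the iteration count, and $c_{\text{loop}}$, $c_{\text{assign}}$ and $c_{\text{op}}$ are hypothetical per-iteration costs.

```latex
t_{\text{cal}} = N\,(c_{\text{loop}} + c_{\text{assign}}), \qquad
t_{\text{op}}  = N\,(c_{\text{loop}} + c_{\text{assign}} + c_{\text{op}})

c_{\text{op}} = \frac{t_{\text{op}} - t_{\text{cal}}}{N}, \qquad
\text{rate} = \frac{1}{c_{\text{op}}} = \frac{N}{t_{\text{op}} - t_{\text{cal}}}
\quad (\text{ops}/\mu\text{s} = \text{Mops/s when } t \text{ is in } \mu\text{s})
```

If the calibration loop lacks the assignment, the subtraction leaves $N\,(c_{\text{assign}} + c_{\text{op}})$. The reported rate is then too low unless $c_{\text{assign}}$ is negligible, which is the case raised above.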

It may be interesting to measure the assignment time (and MIPS) of different data types.
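The per-type measurement can be sketched as follows, assuming the bench's style of timing N iterations with micros(). The macro `DEFINE_ASSIGN_BENCH` and the `bench_assign_*` functions are hypothetical. `volatile` stops the compiler from dropping the stores or the whole loop. On a PC, the `#else` branch substitutes a `timespec_get()`-based micros() so the code compiles and runs. Those host timings say nothing about the MCU.

```c
#include <stdint.h>

#ifdef ARDUINO
#include <Arduino.h>
#else
#include <time.h>
/* Host-only stand-in for Arduino's micros(), for trying the code off-target. */
static unsigned long micros(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (unsigned long)ts.tv_sec * 1000000UL
         + (unsigned long)(ts.tv_nsec / 1000);
}
#endif

/* One loop per type: assign a constant to a volatile variable n times.
 * Returns elapsed microseconds (unsigned subtraction survives wraparound). */
#define DEFINE_ASSIGN_BENCH(name, type, value)           \
    static volatile type name##_sink;                     \
    unsigned long bench_assign_##name(uint32_t n)         \
    {                                                     \
        unsigned long t0 = micros();                      \
        for (uint32_t i = 0; i < n; i++) {                \
            name##_sink = (value);                        \
        }                                                 \
        return micros() - t0;                             \
    }

DEFINE_ASSIGN_BENCH(i8,  int8_t,  1)
DEFINE_ASSIGN_BENCH(i16, int16_t, 1)
DEFINE_ASSIGN_BENCH(i32, int32_t, 1)
DEFINE_ASSIGN_BENCH(f32, float,   1.0f)
DEFINE_ASSIGN_BENCH(f64, double,  1.0)
```

Subtract the bench's empty-loop time from each result and divide by N to get the per-type assignment cost. Dividing N by the net microseconds instead gives millions of assignments per second. On the ATmega328 (nano), `double` is the same 4-byte type as `float`, so the f32 and f64 results should match there.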

The attachment contains a comparative table of operation MIPS, counting 3 operations per loop.

Thanks Moises. I am grateful you took the time to look at the code.

I wrote the code a while ago (indeed, 180 MHz microcontrollers were not exactly a target).

If I recall correctly, I tried to make all the loops look similar "in structure" to the calibration loop, so I could subtract the loop overhead. FLOAT_MUL should give about 180 MFLOPS on a Cortex-M4 with FPU. I see your points. However, the accuracy is badly undermined by the use of micros() (which has a granularity of 8 microseconds), and a loop of 30000 iterations is probably insufficient. Actually, I think 181.82 MFLOPS is quite close, but quoting that many digits is definitely pointless.
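The micros() granularity argument can be quantified. This sketch rests on one assumption: each interval (end - start) can be off by up to one timer tick, so a net time formed by subtracting the calibration interval can be off by up to two ticks. `rate_bounds_for` and `iterations_needed` are hypothetical helpers. The resolution is a parameter because micros() resolution varies by board. The Arduino reference gives 4 µs on 16 MHz boards and 8 µs on 8 MHz boards.

```c
#include <math.h>
#include <stdint.h>

/* Lowest and highest rate consistent with a net time, in ops per us. */
typedef struct { double low, high; } rate_bounds;

/* delta_us = (t_op - t_cal) is the difference of two intervals; each one,
 * read with a timer of resolution res_us, can be off by up to one tick,
 * so delta_us can be off by up to two ticks. */
rate_bounds rate_bounds_for(uint32_t iterations, double delta_us, double res_us)
{
    double err = 2.0 * res_us;
    rate_bounds b;
    b.low = iterations / (delta_us + err);
    /* If the error could swallow the whole net time, there is no upper bound. */
    b.high = (delta_us > err) ? iterations / (delta_us - err) : INFINITY;
    return b;
}

/* Iterations needed so the worst-case timer error (two ticks) is at most
 * rel_err of the net time, for an expected rate in ops per us. */
uint32_t iterations_needed(double mops_expected, double res_us, double rel_err)
{
    double x = 2.0 * res_us * mops_expected / rel_err;
    uint32_t n = (uint32_t)x;
    if ((double)n < x) {
        n++;  /* round up without needing libm's ceil() */
    }
    return n;
}
```

Suppose the bench divides N by the net time. Then 181.82 MFLOPS with N = 30000 corresponds to a net time of about 165 µs. With an 8 µs resolution, `rate_bounds_for(30000, 165.0, 8.0)` spans roughly 165.7 to 201.3 Mops/s, so only the first digit or two are meaningful. `iterations_needed(180.0, 8.0, 0.01)` asks for about 288000 iterations to bring the timer error under 1%.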

The "DUMMY" assignments were made (if I still recall correctly) because they had some effect on the compiled code. A better programmer would probably have coded directly in assembler, taking care to make all the loops exactly the same (and I am also a lazy programmer most of the time!).

I recall testing the different suggestions (looking at the compiled code), but I did not have time to improve the bench for high speeds without affecting the old results.

Marco
