Benchmark STM32 vs ATMega328 (nano) vs SAM3X8E (due) vs MK20DX256 (teensy 3.2)

Operations in less time than calibration loop?. Not posible. May be invalid formating of time functions.

Arduino Zero (Atmel ATSAMD21G18 48MHz Cortex-M0+)
INT_LOOP(30000) bench...= 116898 microseconds 11.92MIPS
LONG_LOOP(30000) bench...= 116898 microseconds 11.93MIPS
FLOAT_DIV(30000) bench...= 116898 microseconds 0.38MFLOPS
DOUBLE_DIV(30000) bench...= 113126 microseconds 0.27MFLOPS
FLOAT_MUL(30000) bench...= 92387 microseconds 0.33MFLOPS
DOUBLE_MUL(30000) bench...= 116898 microseconds 0.26MFLOPS

At high speed the results are imprecise:
Teensy 3.6 (Cortex M4@180Mhz). The result of FLOAT_MUL is 181.82 MIPS.
The empty reference loop has the following repetitive high level operations:
1)increment
2)compare
3)jump
And takes 502 microsecond for 30000 iterations, so 59.76Mloops. The high level operations MIPS are: 59.76*3=179.28
How is posible to achieve 181.82 MIPS using FLOAT_MUL?. Without optimizations must be 180 MIPS or 179.28 may be.

Operations are operation and asignement, and may be the asignement time was negligible. The inclusion of asignement to a constant in the LONG calibration loop may be a best approach, as sugested by westfw.

May be interesting to measure the asignement time (ad MIPS) of diferent data types

The attach contains a operations MIPS comparative table, asigning 3 operations to a loop

Arduino Benchmark-v0101.doc (22 KB)