
Topic: Benchmark STM32 vs ATMega328 (nano) vs SAM3X8E (due) vs MK20DX256 (teensy 3.2)

trycage

Quote from: westfw
I suggest modifying a volatile variable inside the loop.


Code: [Select]
volatile byte dosomething;   // volatile: every store inside the loop must really happen
  // ...
 for (lc=le; lc<(le+30000); lc++) //this syntax avoids compiler simplifications
  {
      dosomething = 0;
  }

Because null loops are pretty boring.  Then you won't need to be so tricky with your loops, either...

Various "long" variables used to hold timestamps should be "unsigned long"

Thanks westfw. Our initial version of the code included some operations in the INT loop; however, we reasoned that the FOR statement already contained an increment operation. The code uses the INT loop to calibrate the speed of the FLOAT loop, and that is probably good enough for a rough comparison between the platforms we had.

I could probably code a WHILE statement in which the comparison and the increment appear as separate, recognizable operations, but I have the feeling it would not make much difference to the compiler.

Thanks a lot for the input.

mantoui

FYI, here are some more results v1.01 (pragma -O1) on Teensy 3.5/3.6/3.2 and on dragonfly (STM32L4@80MHz, hardware float)
Code: [Select]
t3.6 @180MHz
  INT_LOOP(30000) bench...= 500 microseconds 60.00MIPS
  LONG_LOOP(30000) bench...= 502 microseconds 59.76MIPS
  FLOAT_DIV(30000) bench...= 2503 microseconds 14.99MFLOPS
  DOUBLE_DIV(30000) bench...= 9343 microseconds 3.39MFLOPS
  FLOAT_MUL(30000) bench...= 667 microseconds 181.82MFLOPS
  DOUBLE_MUL(30000) bench...= 7008 microseconds 4.61MFLOPS

t3.6 @120MHz
  INT_LOOP(30000) bench...= 752 microseconds 39.89MIPS
  LONG_LOOP(30000) bench...= 753 microseconds 39.84MIPS
  FLOAT_DIV(30000) bench...= 3756 microseconds 9.99MFLOPS
  DOUBLE_DIV(30000) bench...= 14019 microseconds 2.26MFLOPS
  FLOAT_MUL(30000) bench...= 1001 microseconds 120.97MFLOPS
  DOUBLE_MUL(30000) bench...= 10514 microseconds 3.07MFLOPS

t3.5 @120MHz
  INT_LOOP(30000) bench...= 752 microseconds 39.89MIPS
  LONG_LOOP(30000) bench...= 755 microseconds 39.74MIPS
  FLOAT_DIV(30000) bench...= 3758 microseconds 9.99MFLOPS
  DOUBLE_DIV(30000) bench...= 18797 microseconds 1.66MFLOPS
  FLOAT_MUL(30000) bench...= 1003 microseconds 120.97MFLOPS
  DOUBLE_MUL(30000) bench...= 10529 microseconds 3.07MFLOPS

t3.2 @120MHz
  INT_LOOP(30000) bench...= 751 microseconds 39.95MIPS
  LONG_LOOP(30000) bench...= 755 microseconds 39.74MIPS
  FLOAT_DIV(30000) bench...= 8784 microseconds 3.74MFLOPS
  DOUBLE_DIV(30000) bench...= 17559 microseconds 1.79MFLOPS
  FLOAT_MUL(30000) bench...= 6771 microseconds 4.99MFLOPS
  DOUBLE_MUL(30000) bench...= 10533 microseconds 3.07MFLOPS

dragonfly @80MHz
  INT_LOOP(30000) bench...= 1129 microseconds 26.57MIPS
  LONG_LOOP(30000) bench...= 1129 microseconds 26.57MIPS
  FLOAT_DIV(30000) bench...= 5641 microseconds 6.65MFLOPS
  DOUBLE_DIV(30000) bench...= 21813 microseconds 1.45MFLOPS
  FLOAT_MUL(30000) bench...= 1883 microseconds 39.79MFLOPS
  DOUBLE_MUL(30000) bench...= 16173 microseconds 1.99MFLOPS


trycage

-Updated
Added Arduino Zero and Arduino Pro 1284 (Thanks Budvar10)

gdsports

Adafruit Metro M4 Express (samd51 @120MHz) cache on
 INT_LOOP(30000) bench...= 752 microseconds 39.89MIPS
 LONG_LOOP(30000) bench...= 753 microseconds 39.84MIPS
 FLOAT_DIV(30000) bench...= 3756 microseconds 9.99MFLOPS
 DOUBLE_DIV(30000) bench...= 14022 microseconds 2.26MFLOPS
 FLOAT_MUL(30000) bench...= 1002 microseconds 120.48MFLOPS
 DOUBLE_MUL(30000) bench...= 10516 microseconds 3.07MFLOPS

trycage

@gdsports Thanks!!!!

Then:

-Update
Added Adafruit Metro M4 Express (Thanks gdsports)

moises1953

Operations taking less time than the calibration loop? That is not possible. Perhaps the output of the time functions is formatted incorrectly.

Arduino Zero (Atmel ATSAMD21G18 48MHz Cortex-M0+)
INT_LOOP(30000) bench...= 116898 microseconds 11.92MIPS
LONG_LOOP(30000) bench...= 116898 microseconds 11.93MIPS
FLOAT_DIV(30000) bench...= 116898 microseconds 0.38MFLOPS
DOUBLE_DIV(30000) bench...= 113126 microseconds 0.27MFLOPS
FLOAT_MUL(30000) bench...= 92387 microseconds 0.33MFLOPS
DOUBLE_MUL(30000) bench...= 116898 microseconds 0.26MFLOPS

At high speed the results are imprecise.
On the Teensy 3.6 (Cortex-M4 @ 180 MHz), FLOAT_MUL reports 181.82 MFLOPS.
The empty reference loop repeats three high-level operations:
1) increment
2) compare
3) jump
It takes 502 microseconds for 30000 iterations, i.e. 59.76 million loops per second, so the high-level operation rate is 59.76 × 3 = 179.28 MIPS.
How is it possible to achieve 181.82 MFLOPS with FLOAT_MUL? Without optimizations it should be 180 MIPS, or perhaps 179.28.

Each operation consists of the operation itself plus an assignment, and perhaps the assignment time was negligible. Including an assignment to a constant in the LONG calibration loop may be a better approach, as suggested by westfw.

It may be interesting to measure the assignment time (and MIPS) of different data types.

The attachment contains a comparative table of operation MIPS, counting 3 operations per loop.

moises1953

This is the code of the FDIV loop:
Code: [Select]
  fa=(float)random(1,2);
  fb=(float)random(1,1000);
  fb=0; // this line must be removed
  fg=0;
  le=random(1,2);
  elapsed=micros();
  for (lc=le; lc<(le+30000); lc++)
  {
    fb=fb/fa;
  }
  elapsed=micros()-elapsed;
  elapsed=micros()-elapsed;


If fb is initialized to 0, then every operation is 0/fa, so this initialization must be removed, in the DDIV loop as well.

The variables fg and dg are useless and can be removed.

The addition inside the for condition (le+30000) is extra overhead.

Proposed code for FDIV:
Code: [Select]
  fa=(float)random(1,2);
  fb=(float)random(1,1000);
  le=random(1,2);
  lg=le+30000;              // loop limit computed once, outside the loop
  elapsed=micros();
  for (lc=le; lc<lg; lc++)  // this syntax avoids compiler simplifications?
  {
    fb=fb/fa;
  }
  elapsed=micros()-elapsed;
// compute MIPS and display


The INT loop could be an integer sum (ISUM):
Code: [Select]
  ia=random(1,2);
  ib=random(1,1000);
  le=random(1,2);
  lg=le+30000;
  elapsed=micros();
  for (lc=le; lc<lg; lc++) // this syntax avoids compiler simplifications?
  {
    ib=ib+ia;
  }
  elapsed=micros()-elapsed;
// compute MIPS and display

trycage

Thanks Moises. I am grateful you took the time to look at the code.

I wrote the code a while ago (180 MHz microcontrollers were not exactly a target back then).

If I recall correctly, I tried to make all the loops look similar "in structure" to the calibration loop, so that I could subtract the loop overhead. A float multiplication should give about 180 MFLOPS on a Cortex-M4 with FPU. I see your points; however, the accuracy is seriously undermined by the use of micros() (which has a granularity of 8 microseconds), and a loop of 30000 iterations is probably insufficient. Actually, I think 181.82 MFLOPS is quite close, but the number of digits is definitely pointless.

The "DUMMY" assignments were added (if I recall correctly) because they had some effect on the compiled code. A better programmer would probably have coded directly in assembler, taking care to make all the loops exactly the same (and I am also a lazy programmer most of the time!).

I recall testing the different suggestions (looking at the compiled code), but I did not have time to improve the benchmark for high-speed boards (without affecting the old results).

:)



Marco



phantom_ts


 ESP32
 INT_LOOP(30000) bench...= 1 microseconds 30000.00MIPS
 LONG_LOOP(30000) bench...= 1 microseconds 30000.00MIPS
 FLOAT_DIV(30000) bench...= 6420 microseconds 4.67MFLOPS
 DOUBLE_DIV(30000) bench...= 5036 microseconds 5.96MFLOPS
 FLOAT_MUL(30000) bench...= 501 microseconds 60.00MFLOPS
 DOUBLE_MUL(30000) bench...= 5544 microseconds 5.41MFLOPS

dsyleixa

I also wrote a benchmark for different MCUs, both AVRs and ARMs.
The benchmark performs low- and high-level tests for integers, floats, doubles, bitshift, random, sort, matrix algebra, GPIO r/w, and graphics.
The test will run even without a TFT attached; you may keep the #included Adafruit libs or optionally substitute them with proprietary ones.

Update: the test for the Raspberry Pi has now also been completed.

As AVRs don't feature 64-bit doubles, the 32-bit float test is performed twice, without issuing penalty points though (which admittedly is not fair to the ARM boards ;) )

( ... to be continued ... )

Code: [Select]

test design:
  0   int_Add     50,000,000 int +,- plus counter
  1   int_Mult    10,000,000 int *,/  plus counter
  2   fp32_ops    2,500,000 fp32 mult, transc.  plus counter
  3   fp64_ops    2,500,000 fp64 mult, transc.  plus counter (if N/A: 32bit)
  4   randomize   2,500,000 Mersenne PRNG (+ * & ^ << >>)
  5   matrx_algb  150,000 2D Matrix algebra (mult, det)
  6   arr_sort    1500 shellsort of random array[500]
  7   GPIO toggle 6,000,000 toggle GPIO r/w  plus counter
  8   Graphics    10*8 textlines + 10*8 shapes + 20 clrscr



Reference results (update: now also fully run on the Raspberry Pi):

Arduino MEGA + ILI9225 + Karlson UTFT + Arduino GPIO-r/w
  0     90244  int_Add
  1    237402  int_Mult
  2    163613  fp32_ops(float)
  3    163613  fp32_ops(float=double)
  4    158567  randomize
  5     46085  matrx_algb
  6     23052  arr_sort
  7     41569  GPIO toggle
  8     62109  Graphics   
runtime total: 986254
benchmark:     51




Arduino MEGA + ILI9225 + Karlson UTFT + Register bitRead/Write
  0     90238  int_Add
  1    237387  int_Mult
  2    163602  fp32_ops (float)
  3    163602  fp32_ops (float=double)
  4    158557  randomize
  5     45396  matrx_algb
  6     23051  arr_sort
  7      4528  GPIO_toggle bit r/w
  8     62106  Graphics   
runtime total: 948467
benchmark:     53 


Arduino MEGA + adafruit_ILI9341 Hardware-SPI  Arduino GPIO r/w
  0     90244  int_Add
  1    237401  int_Mult
  2    163612  fp32_ops (float)
  3    163612  fp32_ops (float=double)
  4    158725  randomize
  5     46079  matrx_algb
  6     23051  arr_sort
  7     41947  GPIO toggle
  8      6915  Graphics   
runtime total: 931586
benchmark:     54 
 
 



Arduino/Adafruit M0 + adafruit_ILI9341 Hardware-SPI 
  0      7746  int_Add
  1     15795  int_Mult
  2     89054  fp32_ops
  3    199888  fp64_ops(double)
  4     17675  randomize
  5     18650  matrx_algb
  6      6328  arr_sort
  7      9944  GPIO_toggle
  8      6752  Graphics
runtime total: 371832
benchmark:     134



Arduino DUE + adafruit_ILI9341 Hardware-SPI 
  0      4111  int_Add
  1      1389  int_Mult
  2     29124  fp32_ops(float)
  3     57225  fp64_ops(double)
  4      3853  randomize
  5      4669  matrx_algb
  6      2832  arr_sort
  7     11859  GPIO_toggle
  8      6142  Graphics   
runtime total: 121204
benchmark:     413   



Arduino/Adafruit M4 + adafruit_HX8357 Hardware-SPI 
  0      2253  int_Add
  1       872  int_Mult
  2      2773  fp32_ops (float)
  3     24455  fp64_ops (double)
  4      1680  randomize
  5      1962  matrx_algb
  6      1553  arr_sort
  7      2395  GPIO_toggle
  8      4600  Graphics   
runtime total: 39864
benchmark:     1254   



Arduino/Adafruit ESP32 + adafruit_HX8357 Hardware-SPI 
  0      2308  int_Add
  1       592  int_Mult
  2      1318  fp32_ops
  3     14528  fp64_ops
  4       825  randomize
  5      1101  matrx_algb
  6       687  arr_sort
  7       972  GPIO_toggle
  8      3053  Graphics   
runtime total: 25384
benchmark:     1969


Raspberry Pi:

Raspi 2 (v1): 4x 900MHz,  GPU 400MHz, no CPU overclock, full-HD, openVG:
  0     384  int_Add
  1     439  int_Mult
  2     346  fp32_ops(float)
  3     441  fp64_ops(double)
  4     399  randomize
  5     173  matrx_algb
  6     508  arr_sort
  7     823  GPIO_toggle
  8    2632  graphics
runtime total: 6145
benchmark: 8137   




Edit: updated for Arduino/Adafruit Feather ESP32.



Dave_Rove

Oct 28, 2019, 01:47 pm
Trycage, thanks a lot for posting the list of benchmarks, and for updating them as new boards appeared.

To avoid transcription errors on my part, I grabbed the table as-is and parsed it with Python. I normalized the figures relative to the STM32, then took the mean of the normalized figures for each device and plotted them.

https://pastebin.com/Cg6DeQpj

It seems that the MIPS & MFLOPS were calculated from the timings in the same way for the STM32, Arduino Nano & Due, Teensy LC/3.2/3.2@120MHz, and ESP8266. The timings for the Arduino Zero seem overlong, although its MIPS & MFLOPS look OK. For the Teensy 3.5/3.6/4.0 and Dragonfly, the MIPS and MFLOPS seem to have been calculated from the timings in a different way, giving roughly double the values.

I guess that this comes from getting the timings from different people, and possibly different versions of the benchmark, although it would be nice to figure out where this difference is coming from.

Larger image: https://i.imgur.com/PlJjQ72.png


Dave_Rove

Puzzling over why there was an inconsistency between the two ways of plotting the benchmarks, I've looked at the C++ code of the benchmark and I can see that the Mops figure is simply 30000 divided by the timing. So if the table is accurate, then multiplying the timing by the Mops figure should regenerate that 30000 figure, i.e. the number of times the test was run.

I've written another Python program to parse the benchmark table and highlight in red any row where that product is not within 10% of 30000. Even with such a wide tolerance, many rows fall outside it.

I don't know why that should be so. Anyway, that's enough for today, I'll try to figure that out some other time.

The python parsing code:

https://pastebin.com/VsSmsHzu

The parsed table:

   TEST           TIME        MOPS           TIME*MOPS
   ----           ----        ----           ---------

STM32F103C8T6 72MHz (Cortex-M3)
   INT_LOOP       2924 μs    10.26 Mips       30000.24
   LONG_LOOP      2926 μs    10.25 Mips       29991.50
   FLOAT_DIV     27979 μs     1.20 Mflops    33574.80
   DOUBLE_DIV    38000 μs     0.86 Mflops     32680.00
   FLOAT_MUL     20463 μs     1.71 Mflops    34991.73
   DOUBLE_MUL    25891 μs     1.31 Mflops    33917.21

Arduino Nano (ATMega328 16MHz AVR)
   INT_LOOP       7544 μs     3.98 Mips       30025.12
   LONG_LOOP     13408 μs     2.24 Mips       30033.92
   FLOAT_DIV    154792 μs     0.21 Mflops     32506.32
   DOUBLE_DIV   154800 μs     0.21 Mflops     32508.00
   FLOAT_MUL    156744 μs     0.21 Mflops     32916.24
   DOUBLE_MUL   156736 μs     0.21 Mflops     32914.56

Arduino Zero (Atmel ATSAMD21G18 48MHz Cortex-M0+)
   INT_LOOP     116898 μs    11.92 Mips    1393424.16
   LONG_LOOP    116898 μs    11.93 Mips    1394593.14
   FLOAT_DIV    116898 μs     0.38 Mflops    44421.24
   DOUBLE_DIV   113126 μs     0.27 Mflops     30544.02
   FLOAT_MUL     92387 μs     0.33 Mflops     30487.71
   DOUBLE_MUL   116898 μs     0.26 Mflops     30393.48

Arduino Due (Atmel SAM3X8E 84 MHz Cortex-M3)
   INT_LOOP       1074 μs    27.93 Mips       29996.82
   LONG_LOOP      1107 μs    27.10 Mips       29999.70
   FLOAT_DIV     25859 μs     1.21 Mflops     31289.39
   DOUBLE_DIV    37966 μs     0.81 Mflops     30752.46
   FLOAT_MUL     18659 μs     1.71 Mflops     31906.89
   DOUBLE_MUL    25450 μs     1.23 Mflops     31303.50

Teensy LC (MKL26Z64 Cortex-M0 48MHz)
   INT_LOOP       2508 μs    11.96 Mips       29995.68
   LONG_LOOP      2512 μs    11.94 Mips       29993.28
   FLOAT_DIV     76705 μs     0.40 Mflops     30682.00
   DOUBLE_DIV   101840 μs     0.30 Mflops     30552.00
   FLOAT_MUL     80471 μs     0.38 Mflops     30578.98
   DOUBLE_MUL   106242 μs     0.29 Mflops     30810.18

Teensy 3.2 (MK20DX256 Cortex-M4 96 MHz)
   INT_LOOP        940 μs    31.91 Mips       29995.40
   LONG_LOOP       944 μs    31.78 Mips       30000.32
   FLOAT_DIV     10977 μs     2.99 Mflops     32821.23
   DOUBLE_DIV    21317 μs     1.47 Mflops     31335.99
   FLOAT_MUL      8463 μs     3.99 Mflops    33767.37
   DOUBLE_MUL    13162 μs     2.46 Mflops     32378.52

Teensy 3.2 (MK20DX256 Cortex-M4 72MHz)
   INT_LOOP       1253 μs    23.94 Mips       29996.82
   LONG_LOOP      1256 μs    23.89 Mips       30005.84
   FLOAT_DIV     14635 μs     2.24 Mflops     32782.40
   DOUBLE_DIV    25083 μs     1.26 Mflops     31604.58
   FLOAT_MUL     11288 μs     2.99 Mflops    33751.12
   DOUBLE_MUL    17551 μs     1.84 Mflops     32293.84

ESP8266 esp-12e 160MHz
   INT_LOOP        752 μs    39.89 Mips       29997.28
   LONG_LOOP       751 μs    39.95 Mips       30002.45
   FLOAT_DIV      7500 μs     4.45 Mflops    33375.00
   DOUBLE_DIV     8063 μs     4.10 Mflops    33058.30
   FLOAT_MUL      9938 μs     3.27 Mflops     32497.26
   DOUBLE_MUL    10688 μs     3.02 Mflops     32277.76

ESP8266 esp-12e 80MHz
   INT_LOOP       1504 μs    19.95 Mips       30004.80
   LONG_LOOP      1501 μs    19.99 Mips       30004.99
   FLOAT_DIV     15001 μs     2.22 Mflops    33302.22
   DOUBLE_DIV    16126 μs     2.05 Mflops    33058.30
   FLOAT_MUL     19876 μs     1.63 Mflops     32397.88
   DOUBLE_MUL    21377 μs     1.51 Mflops     32279.27

#From mantoui

teensy3.6 @180mhz
   INT_LOOP        500 μs    60.00 Mips       30000.00
   LONG_LOOP       502 μs    59.76 Mips       29999.52
   FLOAT_DIV      2503 μs    14.99 Mflops    37519.97
   DOUBLE_DIV     9343 μs     3.39 Mflops     31672.77
   FLOAT_MUL       667 μs   181.82 Mflops   121273.94
   DOUBLE_MUL     7008 μs     4.61 Mflops     32306.88

teensy3.6 @120mhz
   INT_LOOP        752 μs    39.89 Mips       29997.28
   LONG_LOOP       753 μs    39.84 Mips       29999.52
   FLOAT_DIV      3756 μs     9.99 Mflops    37522.44
   DOUBLE_DIV    14019 μs     2.26 Mflops     31682.94
   FLOAT_MUL      1001 μs   120.97 Mflops   121090.97
   DOUBLE_MUL    10514 μs     3.07 Mflops     32277.98

teensy3.5@120mhz
   INT_LOOP        752 μs    39.89 Mips       29997.28
   LONG_LOOP       755 μs    39.74 Mips       30003.70
   FLOAT_DIV      3758 μs     9.99 Mflops    37542.42
   DOUBLE_DIV    18797 μs     1.66 Mflops     31203.02
   FLOAT_MUL      1003 μs   120.97 Mflops   121332.91
   DOUBLE_MUL    10529 μs     3.07 Mflops     32324.03

teensy3.2@120mhz
   INT_LOOP        751 μs    39.95 Mips       30002.45
   LONG_LOOP       755 μs    39.74 Mips       30003.70
   FLOAT_DIV      8784 μs     3.74 Mflops     32852.16
   DOUBLE_DIV    17559 μs     1.79 Mflops     31430.61
   FLOAT_MUL      6771 μs     4.99 Mflops    33787.29
   DOUBLE_MUL    10533 μs     3.07 Mflops     32336.31

dragonfly@80MHz    
   INT_LOOP       1129 μs    26.57 Mips       29997.53
   LONG_LOOP      1129 μs    26.57 Mips       29997.53
   FLOAT_DIV      5641 μs     6.65 Mflops    37512.65
   DOUBLE_DIV    21813 μs     1.45 Mflops     31628.85
   FLOAT_MUL      1883 μs    39.79 Mflops    74924.57
   DOUBLE_MUL    16173 μs     1.99 Mflops     32184.27

#From Budvar10

   INT_LOOP       5024 μs     5.97 Mips       29993.28
   LONG_LOOP      8992 μs     3.34 Mips       30033.28
   FLOAT_DIV     96789 μs     0.34 Mflops     32908.26
   DOUBLE_DIV    96800 μs     0.34 Mflops     32912.00
   FLOAT_MUL     98058 μs     0.34 Mflops    33339.72
   DOUBLE_MUL    98059 μs     0.34 Mflops    33340.06

#From gdsports

   INT_LOOP        752 μs    39.89 Mips       29997.28
   LONG_LOOP       753 μs    39.84 Mips       29999.52
   FLOAT_DIV      3756 μs     9.99 Mflops    37522.44
   DOUBLE_DIV    14022 μs     2.26 Mflops     31689.72
   FLOAT_MUL      1002 μs   120.48 Mflops   120720.96
   DOUBLE_MUL    10516 μs     3.07 Mflops     32284.12

Teensy 4.0 @600MHz
   FLOAT_DIV       200 μs   300.00 Mflops    60000.00
   DOUBLE_DIV      201 μs   297.03 Mflops    59703.03
   FLOAT_MUL       150 μs   600.00 Mflops    90000.00
   DOUBLE_MUL      300 μs   150.00 Mflops    45000.00

   INT_LOOP        300 μs   600.00 Mips     180000.00
   LONG_LOOP       300 μs   300.00 Mips      90000.00
   FLOAT_DIV       300 μs   300.00 Mflops    90000.00
