yes, thank you!
Yesterday it was a bit too late to rewrite my code on my own, but I have just now finished updating it myself, with this new function:
float test_float_math32() {   // 2,500,000 32-bit float ops: multiplications and transcendental functions
  volatile float s = (float)PI;
  unsigned long y;
  for (y = 0; y < 500000UL; y++) {   // 5 float operations per pass
    s *= sqrtf(s);
    s = sinf(s);
    s = expf(s);
    s *= s;
  }
  return s;   // debug
}
And I have already found that on M3 and M0, float32 is about 2x as fast as float64 in the float_op test!
My results for M3 and M0 with float32 and float64:
Arduino/Adafruit M0 + adafruit_ILI9341 Hardware-SPI + 32bit float
0 7746 int_Add
1 15795 int_Mult
2 89054 float_op (float)
3 17675 randomize
4 18650 matrx_algb
5 6328 arr_sort
6 9944 GPIO_toggle
7 6752 Graphics
runtime total: 171944
benchmark: 290
Arduino/Adafruit M0 + adafruit_ILI9341 Hardware-SPI + double fp
0 7746 int_Add
1 15795 int_Mult
2 199888 float_op (double)
3 17727 randomize
4 18559 matrx_algb
5 6330 arr_sort
6 9734 GPIO toggle
7 6759 Graphics
runtime total: 282538
benchmark: 176
Arduino DUE + adafruit_ILI9341 Hardware-SPI + 32bit float
0 4111 int_Add
1 1389 int_Mult
2 29124 float_op (float)
3 3853 randomize
4 4669 matrx_algb
5 2832 arr_sort
6 11859 GPIO_toggle
7 6142 Graphics
runtime total: 63979
benchmark: 781
Arduino DUE + adafruit_ILI9341 Hardware-SPI + double fp
0 4111 int_Add
1 1389 int_Mult
2 57225 float_op (double)
3 3852 randomize
4 4666 matrx_algb
5 2833 arr_sort
6 11787 GPIO toggle
7 6143 Graphics
runtime total: 92006
benchmark: 543
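The float32 vs. double ratio can be read straight off the float_op rows above. A minimal sketch of that arithmetic (the function names are mine, chosen only for illustration; the numbers are the float_op values from the four M0 and Due tables):

```c
#include <stdio.h>

// Ratio of double runtime to float runtime for the float_op test.
// Values greater than 1 mean the 32-bit float version was faster.
double fp_speedup(unsigned long t_double, unsigned long t_float) {
    return (double)t_double / (double)t_float;
}

// Prints the ratios for both boards, using the float_op rows above.
void print_fp_speedups(void) {
    printf("M0 : %.2f\n", fp_speedup(199888UL, 89054UL));  // prints "M0 : 2.24"
    printf("Due: %.2f\n", fp_speedup(57225UL, 29124UL));   // prints "Due: 1.96"
}
```

So "2x" holds fairly well for the Due, and the M0 gains slightly more than that.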
For comparison, the Mega2560:
Arduino MEGA + ILI9225 + Karlsen UTFT
0 90244 int_Add
1 237402 int_Mult
2 163613 float_op (float)
3 158567 randomize
4 46085 matrx_algb
5 23052 arr_sort
6 41569 GPIO toggle
7 62109 Graphics
runtime total: 822641
benchmark: 60
I was just about to post this and was surprised to find that you had already done so, and for some additional platforms as well - great!
(and this M4 thing is really amazing!) 8)
So, back to my original question, to summarize: IIUC, the poor M0 fp performance is mostly due to poorly optimized fp code in the M0 core, compared to AVR and the M3 Due; and second, it turned out that using float32 with the XXXf-type fp functions (sqrtf, sinf, expf) can make it about 2x as fast.
That is very valuable to know!
Thanks a lot for your efforts!
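On the point about the f-suffixed functions (sqrtf, sinf, expf): in plain C, the un-suffixed functions from math.h take and return double, so a float argument is promoted and the work is done in 64-bit, while the f-suffixed variants stay in 32-bit. A minimal check of the result types is sketched below. The helper names are mine, and this assumes C without tgmath.h; C++ with overloaded cmath functions may behave differently.

```c
#include <math.h>
#include <stddef.h>

// sizeof does not evaluate its operand; it only reports the size of the result type.
size_t size_of_sqrt_result(void) {
    float x = 2.0f;
    return sizeof(sqrt(x));    // sqrt() returns double, even for a float argument
}

size_t size_of_sqrtf_result(void) {
    float x = 2.0f;
    return sizeof(sqrtf(x));   // sqrtf() returns float
}
```

If the first helper reports the size of a double, a call like sqrt(s) on a float s goes through the double routine.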
PS, edit, off-topic:
Do you think the Adafruit ItsyBitsy M4 Express featuring the ATSAMD51 has an FPU, too? They just write "ATSAMD51 32-bit Cortex M4 core running at 120 MHz, Hardware DSP and floating point support", but they do not write "M4F"...?