- When you compiled on the ARM platforms without using "double", did you also convert all the trig and sqrt function calls to their non-double forms (sinf(), sqrtf(), etc.)? If not, then many of the calculations you're doing are done with doubles anyway. (Also, there's the "-fsingle-precision-constant" compiler option that should probably be used.)
  (Huh. How come there isn't a C++ library thing that overloads these to do the proper thing with whichever argument was provided? Rhetorical question.)
- The CM0, last I looked (gcc 4.8.x?), was the only processor of the three that has unoptimized floating point code. ARM CM3 has an ARM assembly float/double library. AVR has a highly-optimized assembly float library. ARM CM0 has the default gcc float library, written in C and not particularly optimized for any instruction set.
- CM0 doesn't have a hardware divide instruction, and has a somewhat limited multiply instruction (compared to CM3), so it's not clear that it has much of a performance edge over AVR for floating point calculations, in an operation-by-operation comparison.
See also Qfplib: a family of floating-point libraries for ARM Cortex-M cores