Why is the Zero slower than the Uno?

  1. When you compiled on the ARM platforms without using "double", did you also convert all the trig and sqrt function calls to their non-double forms (sinf(), sqrtf(), etc.)? If not, then many of the calculations you're doing are done with doubles anyway. (Also, there's the "-fsingle-precision-constant" compiler option that should probably be used.)
    (Huh. How come there isn't a C++ library thing that overloads these to do the proper thing with whichever argument was provided? Rhetorical question :( )

  2. The CM0, last I looked (gcc 4.8.x?), was the only processor of the three that had unoptimized floating point code. ARM CM3 has an ARM assembly float/double library. AVR has a highly-optimized assembly float library. ARM CM0 has the default gcc float library, written in C and not particularly optimized for any instruction set.

  3. CM0 doesn't have a hardware divide instruction, and has a somewhat limited multiply instruction (compared to CM3), so it's not clear that it has much of a performance edge over AVR for calculating floating point, in an operation-by-operation comparison.
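
To illustrate point 1, here is a minimal sketch of how doubles can sneak into "float" code. The function names (scaled_length_f, scaled_length_mixed) are made up for the example and are not from your sketch. The static_asserts only check the types the compiler picks; they say nothing about speed.

```cpp
#include <math.h>
#include <type_traits>

// Hypothetical example: half the length of a 2-D vector.
// Everything here stays in single precision: sqrtf() takes and returns
// float, and the 'f' suffix keeps the constant a float.
float scaled_length_f(float x, float y) {
    return 0.5f * sqrtf(x * x + y * y);
}

// Same math, but the unsuffixed constant 0.5 is a double, so the multiply
// is done in double precision and converted back to float on return.
float scaled_length_mixed(float x, float y) {
    return 0.5 * sqrtf(x * x + y * y);
}

// Compile-time checks of the resulting expression types.
static_assert(std::is_same<decltype(sqrtf(1.0f)), float>::value,
              "sqrtf returns float");
static_assert(std::is_same<decltype(0.5f * 1.0f), float>::value,
              "float constant * float stays float");
static_assert(std::is_same<decltype(0.5 * 1.0f), double>::value,
              "unsuffixed constant * float is promoted to double");
```

Both functions give the same answer for simple inputs; the difference is only which software float routines get called. The "-fsingle-precision-constant" option mentioned above is aimed at the second case, for code where you can't easily add the 'f' suffixes by hand.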

See also Qfplib: a family of floating-point libraries for ARM Cortex-M cores