Teensy 3.0

..not sure your measurement reflects reality:

//time elasped = 29721 micros = teensy
//time elasped = 47436 micros = Uno

A typical 32-bit float sin() on a CM3 takes ~1050 cycles = ~22 usec @ 48 MHz, so it seems you should get something like 9000 micros max...

CM3 without FPU:
fZ = fX * fY; // 41 cycles
fZ = sqrt(fY); // 624 cycles
fZ = sin(1.23); // 1017 cycles

CM4 with FPU:
fZ = fX * fY; // 6 cycles
fZ = sqrt(fY); // 20 cycles
fZ = sin(1.23); // 124 cycles
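
One thing to watch when timing lines like these: with a constant argument such as sin(1.23), an optimizing compiler may compute the result at build time, so the loop ends up measuring almost nothing. A minimal sketch of a timing loop that guards against this, assuming plain desktop C++ with std::chrono (on the board itself micros() would take its place); timeSinfMicros is a made-up name:

```cpp
#include <chrono>
#include <cmath>

// Time `calls` evaluations of single-precision sin() and return the
// elapsed wall-clock time in microseconds.
double timeSinfMicros(int calls) {
    volatile float input = 1.23f;  // volatile: value is re-read every iteration
    volatile float sink = 0.0f;    // volatile: result must actually be stored
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i) {
        sink = std::sin(input);    // float overload, cannot be folded away
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Dividing the returned value by the number of calls gives a per-call time that can be compared against the cycle counts above, keeping in mind that it also includes the loop overhead.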