First, the Teensy's Kinetis CPU does not have a hardware floating-point unit (FPU). The FPU is optional on the Cortex-M4, and only the high-end K20 processors include one; see the Freescale K20 family fact sheet:
http://www.freescale.com/files/microcontrollers/doc/fact_sheet/KNTSK20FMLYFS.pdf.
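To check at compile time whether a given build actually targets hardware floating point, one option is the __ARM_FP macro from the Arm C Language Extensions. GCC and Clang predefine it only when compiling with hardware floating point enabled, and bit 0x4 signals single-precision support. The function name below is hypothetical. It reflects the compiler flags for the build, not a run-time probe of the chip. On a non-ARM host such as x86 it simply returns 0.

```c
/* Hypothetical helper: returns 1 if this build targets ARM hardware
   single-precision floating point, 0 otherwise.
   __ARM_FP is an ACLE predefined macro; bit 0x4 = single precision.
   If the macro is not defined, #if treats it as 0, so we return 0. */
int has_hw_single_float(void) {
#if defined(__ARM_FP) && (__ARM_FP & 0x4)
    return 1;
#else
    return 0;
#endif
}
```

Built for a Cortex-M4 without an FPU, this should return 0, which is why sinf() ends up running in software.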
The newlib math functions are really old C code, and their execution time depends on the argument values.
Here are two timing examples for the 32-bit sine function, float sinf(float). I ran this code:
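#include <math.h>

const float factor = 0.01f;  // 0.01f for the first run, 1.0f for the second
float sinanswers[401];
float sinarg[401];

void setup() {
  for (int i = 0; i < 400; i++) {
    sinarg[i] = factor * i;
  }
  unsigned long time1 = micros();
  for (int i = 0; i < 400; i++) {
    sinanswers[i] = sinf(sinarg[i]);
  }
  unsigned long time2 = micros();
  while (!Serial) ;  // wait for the USB serial port to open
  Serial.println(time2 - time1);  // elapsed microseconds
}

void loop() {}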
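With factor = 0.01 (arguments from 0.0 to 4.0):
time elapsed = 17110 microseconds, about 43 microseconds per call
With factor = 1.0 (arguments from 0.0 to 400.0):
time elapsed = 105353 microseconds, about 263 microseconds per call, roughly 6 times slower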
The algorithms for 64-bit double are completely different from those for 32-bit float.
Much of this dates back to work in the 1980s on BSD Unix at UC Berkeley. I was at UCB when BSD Unix was developed.
Bill Joy was a key developer of BSD and used it at Sun Microsystems as the base for SunOS, the predecessor of Solaris.