
Topic: Teensy 3.0

Paul Stoffregen

Oct 15, 2012, 03:41 pm Last Edit: Oct 15, 2012, 03:44 pm by Paul Stoffregen Reason: 1
There are indeed some complex things going on with this test.

For example, this takes 138 us:

Code: [Select]

 time1 = micros();
 for (int i = 0; i < 3; i++)
   sinanswers[i] = sin(i);
 time2 = micros();

But this takes 229 us, roughly 1.7 times as long, just because the input is offset by 400.  Clearly sin()'s execution time is not constant.

Code: [Select]

 time1 = micros();
 for (int i = 0; i < 3; i++)
   sinanswers[i] = sin(i+400);
 time2 = micros();

I suspected the slowness was due to computing in double precision.  But I tried changing sin() to sinf(), and amazingly sinf() takes MUCH longer.  Clearly newlib or libgcc is not optimized very well, or some settings aren't quite right.  I need to dig into that.
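For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing helper. It is only an illustration, not the code used for the numbers above: `time_sinf` is a hypothetical name, and the clock is passed in as a parameter so the same function can be given micros() on the board or a fake clock on a PC. The volatile variables are there on the assumption that the compiler might otherwise precompute or drop the sin() calls.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of any library): time n calls of the
// single-precision sine on the arguments start, start+step, start+2*step, ...
// now_us is any callable returning a microsecond tick count.
template <typename Clock>
std::uint32_t time_sinf(float start, float step, int n, Clock now_us) {
  volatile float arg;   // volatile: the argument must really be loaded
  volatile float sink;  // volatile: the result must really be stored
  std::uint32_t t1 = now_us();
  for (int i = 0; i < n; i++) {
    arg = start + step * i;
    float a = arg;
    sink = std::sin(a);  // float overload, i.e. single precision
  }
  std::uint32_t t2 = now_us();
  (void)sink;
  return t2 - t1;  // unsigned subtraction survives one counter wraparound
}
```

On the board, a call such as time_sinf(0.0f, 1.0f, 3, micros) versus time_sinf(400.0f, 1.0f, 3, micros) should reproduce the comparison, assuming micros() returns a 32-bit microsecond count.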


Oct 15, 2012, 04:29 pm Last Edit: Oct 15, 2012, 04:37 pm by pito Reason: 1
And your test with 1000x (actually 500x) sin cos tan (STM32F100 CM3 @48MHz):
Code: [Select]

timer = millis;
         for (i=0;i<500;i++) {
timer = millis - timer;
printf("\rElapsed time float sin cos tan 500x into array: %u millis\n", timer);

Elapsed time float sin cos tan 500x into array: 31 millis

Arrays that big do not fit into my 8 kB of RAM, so I ran 500x; double it for 1000x (= 62 ms, versus your 278 ms). Double it again for a double-precision result.


Oct 15, 2012, 05:56 pm Last Edit: Oct 15, 2012, 06:13 pm by fat16lib Reason: 1
First, the Teensy's Kinetis CPU does not have hardware floating point.  Hardware floating point is optional on the Cortex-M4, and only the high-end K20 processors have it: http://www.freescale.com/files/microcontrollers/doc/fact_sheet/KNTSK20FMLYFS.pdf

The newlib math functions are really old C functions.

newlib execution times depend on the value of the arguments.

Here are two examples for 32-bit sine:
Code: [Select]
float sinf(float);

I ran this code:
Code: [Select]

float sinanswers[401];
float sinarg[401];
 for (int i = 0; i < 400; i++) {
   sinarg[i] = factor*i;
 }
 time1 = micros();
 for (int i = 0; i < 400; i++) {
   sinanswers[i] = sinf(sinarg[i]);
 }
 time2 = micros();

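If factor is 0.01, so the range is 0.0 to 4.0:

time elapsed = 17110 micros

If factor is 1.0, so the range is 0.0 to 400.0:

time elapsed = 105353 micros

That is about 43 us per call in the first case and about 263 us per call in the second, roughly six times slower.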

The algorithms for 64-bit double are totally different from those for 32-bit float.
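A possible workaround, sketched under assumptions and not measured on the Teensy: if the extra time for large arguments comes from the work of reducing them to the primary interval, then folding the argument into one period before the call keeps every call in the small-argument range. `sinf_reduced` is a hypothetical helper, not a newlib function, and whether the fmod step costs less than it saves is untested here.

```cpp
#include <cmath>

// Hypothetical helper (not part of newlib): fold x into [-pi, pi] and then
// take the single-precision sine of the folded value.
// Assumption: |x| is moderate (hundreds, not millions). The float value of
// 2*pi is off by about 1.7e-7, and that error is multiplied by the number
// of periods removed.
inline float sinf_reduced(float x) {
  const float kPi = 3.14159265358979323846f;
  const float kTwoPi = 6.28318530717958647692f;
  float r = std::fmod(x, kTwoPi);  // r is in (-2*pi, 2*pi), same sign as x
  if (r > kPi) {
    r -= kTwoPi;                   // fold (pi, 2*pi) down to (-pi, 0)
  } else if (r < -kPi) {
    r += kTwoPi;                   // fold (-2*pi, -pi) up to (0, pi)
  }
  return std::sin(r);              // float overload of sin
}
```

In the test loop this would replace sinf(sinarg[i]) with sinf_reduced(sinarg[i]). For arguments up to 400 the result should agree with the true sine to within a few 1e-5, which may or may not be acceptable depending on the application.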

Much of this dates back to work in the 1980s on BSD Unix at UC Berkeley.  I was at UCB when BSD Unix was developed.

Bill Joy was a key developer of BSD and used it at Sun Microsystems as the base for Solaris.


I assume that most of us will use fixed-point maths, but for those who have a reason to use float and double, is there an alternative implementation that can be included at compile time, or some other workaround that provides more recent and faster implementations?

Duane B



avr-gcc has floating point algorithms that have been carefully optimized for the AVR architecture.
arm-gcc using newlib presumably has generic algorithms...
