Hello,
As far as I know, the sin() function in these libraries is probably computed with a series expansion such as a Taylor series (a kind of mathematical expansion), which takes a lot of calculations per sample. On top of that, the Cortex-M3 has no floating-point unit, so floating-point math is emulated in software, which makes such calculations even slower.
I would suggest using a lookup table for this function and interpolating between the table entries. Use a spreadsheet such as Excel to generate one cycle of a sine wave, round the values, and copy them into an array such as SineTable[256] (256 samples, for example).
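To give a rough idea, here is a minimal sketch of a table lookup with linear interpolation using integer math only. The function name sine_lookup, the Q15 scaling (values multiplied by 32767) and the 16-bit phase format are my own assumptions for illustration. The table is cut down to 16 entries so the listing stays short; for SineTable[256] you would set TABLE_BITS to 8 and paste in the 256 values from the spreadsheet.

```c
#include <stdint.h>

#define TABLE_BITS 4                       /* 16 entries here; 8 would give SineTable[256] */
#define TABLE_SIZE (1u << TABLE_BITS)
#define FRAC_BITS  (16 - TABLE_BITS)       /* low bits of the 16-bit phase, used to interpolate */

/* One cycle of sine in Q15: round(32767 * sin(2*pi*k/16)), k = 0..15 */
static const int16_t SineTable[TABLE_SIZE] = {
         0,  12539,  23170,  30273,  32767,  30273,  23170,  12539,
         0, -12539, -23170, -30273, -32767, -30273, -23170, -12539
};

/* phase: 0..65535 covers one full cycle (0..2*pi).
 * Returns the sine value in Q15, linearly interpolated between
 * the two nearest table entries. No floating point is used. */
int16_t sine_lookup(uint16_t phase)
{
    uint32_t index = phase >> FRAC_BITS;                 /* table position */
    uint32_t frac  = phase & ((1u << FRAC_BITS) - 1u);   /* distance to next entry */

    int32_t a = SineTable[index];
    int32_t b = SineTable[(index + 1u) & (TABLE_SIZE - 1u)];  /* wraps at end of table */

    /* Division by a power of two instead of a shift, because
     * right-shifting a negative value is implementation-defined in C. */
    return (int16_t)(a + ((b - a) * (int32_t)frac) / (1 << FRAC_BITS));
}
```

With a 256-entry table the error from linear interpolation is already quite small, so a bigger table is usually not needed.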
If you need a sine wave whose frequency you can control, you have to implement a DDS (Direct Digital Synthesis) algorithm. Search the net for the Analog Electronics example of the DDS algorithm.
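The core of a DDS is a phase accumulator: on every sample you add a fixed step to a counter and use the top bits of the counter as the table index. Below is a minimal sketch of that idea. The names dds_t, dds_set_frequency and dds_next_sample are hypothetical, and the 16-entry Q15 table is only there to keep the listing short. Output to a DAC or PWM is left out.

```c
#include <stdint.h>

#define TABLE_BITS 4                       /* 16 entries here; 8 would give SineTable[256] */
#define TABLE_SIZE (1u << TABLE_BITS)

/* One cycle of sine in Q15: round(32767 * sin(2*pi*k/16)), k = 0..15 */
static const int16_t SineTable[TABLE_SIZE] = {
         0,  12539,  23170,  30273,  32767,  30273,  23170,  12539,
         0, -12539, -23170, -30273, -32767, -30273, -23170, -12539
};

typedef struct {
    uint32_t phase;   /* phase accumulator; the full 32-bit range is one cycle */
    uint32_t step;    /* tuning word, added to the phase once per sample */
} dds_t;

/* step = f_out * 2^32 / f_sample, in 64-bit integer math.
 * Only needed when the frequency changes, not on every sample.
 * Assumes f_out_hz is below f_sample_hz / 2. */
void dds_set_frequency(dds_t *dds, uint32_t f_out_hz, uint32_t f_sample_hz)
{
    dds->step = (uint32_t)(((uint64_t)f_out_hz << 32) / f_sample_hz);
}

/* Call once per sample period, for example from a timer interrupt. */
int16_t dds_next_sample(dds_t *dds)
{
    int16_t sample = SineTable[dds->phase >> (32 - TABLE_BITS)];
    dds->phase += dds->step;   /* unsigned overflow wraps at 2^32, i.e. one cycle */
    return sample;
}
```

For example, with a 16 kHz sample rate, dds_set_frequency(&dds, 1000, 16000) gives a step of 2^32 / 16, so the output completes one cycle every 16 samples, which is 1 kHz. The frequency resolution is f_sample / 2^32, so very fine steps are possible.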
Hope that helps.