Phase shift by arduinoFFT is not corrected

Why do you truncate 2π to 2 decimal places when your output has 4 decimal places? Make good use of the precision a float can store, and precompute it as:

const float twoPi = 8 * atan(1);
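To get a feel for how much the truncation costs, here is a minimal sketch in plain C++. It assumes the truncated constant was 6.28; the names `truncatedTwoPi` and `accumulatedError` are made up for illustration and are not from your code.

```cpp
#include <cmath>

// Full-precision 2*pi as a float: atan(1) is pi/4, so 8 * atan(1) is 2*pi.
const float twoPi = 8 * std::atan(1.0f);

// Assumed truncated value (2 decimal places), for comparison only.
const float truncatedTwoPi = 6.28f;

// Hypothetical helper: phase error in radians that builds up after
// `cycles` full turns if the truncated constant is used instead of twoPi.
float accumulatedError(int cycles) {
    return cycles * (twoPi - truncatedTwoPi);
}
```

The difference is about 0.0032 rad per turn, which already shows up in the third decimal place of your output, and it grows with every additional turn you add or subtract.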

Are you sure the problem is with the output of atan2() and it's not upstream?

Try adding a debugging serial print after each intermediate step. Then work the same calculation out on paper (maybe on a reduced data set) with a pocket calculator, and compare the numbers to see whether your formulas are sound.
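As a rough sketch of that kind of step-by-step check, assuming the phase is taken from the real and imaginary parts of a single FFT bin: the helper `debugPhase` below is hypothetical, and `std::printf` stands in for `Serial.print(value, 4)` so the snippet can also be tried on a PC.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: phase of one FFT bin, printing every intermediate
// value so each one can be compared against a hand calculation.
float debugPhase(float re, float im) {
    // Step 1: inputs and magnitude (a near-zero magnitude makes the phase meaningless).
    float magnitude = std::sqrt(re * re + im * im);
    std::printf("re=%.4f im=%.4f magnitude=%.4f\n", re, im, magnitude);

    // Step 2: raw atan2 result, which lies in the range -pi to +pi.
    float phase = std::atan2(im, re);
    std::printf("atan2(im, re)=%.4f rad\n", phase);

    return phase;
}
```

Easy inputs to verify by hand: `debugPhase(1, 1)` should give π/4 ≈ 0.7854, `debugPhase(-1, 0)` should give π ≈ 3.1416, and `debugPhase(0, -1)` should give -π/2 ≈ -1.5708. If these match but your real data does not, the problem is upstream of atan2().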