I don't know if anybody has suggested this to you already, but what you want is called a phase discriminator (also known as a phase detector), and it is seriously easy to do in hardware. Even if you can't take the hardware road, have a look at them as background material.
Your code doesn't work because it isn't hunting for the edges. You are therefore not measuring the time difference between two rising edges (for example), just a time at which both inputs happen to be logic 1.
If your first test waited for a logic 0 and then took a time snapshot at the first logic 1 reading, and you then looked for the same thing on the second input using the same method, you'd get better results. At 40 kHz (a 25 µs period) I would still expect a fair amount of jitter in the readings, and maybe even a complete miss if the phase angle is small.
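To make the edge-hunting idea concrete, here is a rough sketch in C. It is not your code and not tied to any particular board: the function names `find_rising_edge` and `rising_edge_delay` are made up for illustration, the sample arrays stand in for successive reads of your two input pins, and the array index stands in for whatever timestamp your timer gives you.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first rising edge at or
 * after `start`, or -1 if the samples run out first. The index plays the
 * role of the time snapshot. */
long find_rising_edge(const uint8_t *samples, size_t n, size_t start)
{
    size_t i = start;

    /* Step 1: wait for a logic 0, so a level that is already high
     * is not mistaken for an edge. */
    while (i < n && samples[i] != 0) {
        i++;
    }
    /* Step 2: the first logic 1 after that is the rising edge. */
    while (i < n && samples[i] == 0) {
        i++;
    }
    return (i < n) ? (long)i : -1L;
}

/* Hypothetical helper: ticks from a rising edge on input `a` to the next
 * rising edge on input `b`, or -1 if either edge is never seen. */
long rising_edge_delay(const uint8_t *a, const uint8_t *b, size_t n)
{
    long ta = find_rising_edge(a, n, 0);
    if (ta < 0) {
        return -1L;
    }
    /* Start hunting on the second input only after the first edge. */
    long tb = find_rising_edge(b, n, (size_t)ta);
    if (tb < 0) {
        return -1L;
    }
    return tb - ta;
}
```

On real hardware you would swap the array reads for pin reads and the index for a timer read. Note that if the second input rises before the loop gets around to looking at it, this method waits for the following edge instead, which is the "complete miss" case for small phase angles.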