Completely disabling timer interrupts properly

I am trying to have as consistent a time as possible for some sections of code, so I want to disable all timer interrupts. If I understand it correctly cli() should do that.

However, with this code:

  cli();
  PORTD |=  B00000010;
  
  gyro_roll_input  = (gyro_roll_input * 0.8)  + (gyro_roll * 0.2);
  gyro_pitch_input = (gyro_pitch_input * 0.8) + (gyro_pitch * 0.2);
  gyro_yaw_input   = (gyro_yaw_input * 0.8)   + (gyro_yaw * 0.2);

  PORTD &= ~B00000010;
  sei();

I would expect this to give me a constant time between the rise and fall of pin 1, but I am seeing times of 72 to 81 us. On the 16 MHz Arduino I'm using, that is a variation of about 150 clock cycles. Where is that time being spent?
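To make the arithmetic behind that estimate explicit, here is a small sketch. The helper names `cyclesFromMicros` and `jitterCycles` are made up for illustration; the 72 us, 81 us and 16 MHz figures are the ones quoted above. Note that it counts clock cycles, not instructions: many AVR instructions take one cycle, but some take two or more.

```cpp
#include <cstdint>

// CPU clock cycles that elapse in `micros` microseconds at `clockMHz` MHz.
// (Hypothetical helper, for illustration only.)
constexpr uint32_t cyclesFromMicros(uint32_t micros, uint32_t clockMHz) {
    return micros * clockMHz;
}

// Spread between the shortest and longest measured pulse, in clock cycles.
constexpr uint32_t jitterCycles(uint32_t minMicros, uint32_t maxMicros,
                                uint32_t clockMHz) {
    return cyclesFromMicros(maxMicros - minMicros, clockMHz);
}
```

With the numbers from this post, `jitterCycles(72, 81, 16)` comes to 144 cycles, which is where the "about 150" comes from.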

I should mention that I'm using a logic analyzer with a 24 MS/s sample rate, which is not much above the 16 MHz of the Arduino itself. Could that be an issue? Even if my 'scope' is not perfectly accurate, I would still not expect to see such a wide variation in timing.

So the questions are: a) am I right to expect a constant time from this section of code? b) am I expecting too much from this 'scope'?

As usual, I figure things out right after I post a question in forums... It seems that floating point calculations can take different times depending on the values involved. If I add this before the code above, I get consistent times (at least as consistent as I would expect from the resolution of my 'scope').

  gyro_roll_input = 1;
  gyro_pitch_input = 1;
  gyro_yaw_input = 1;
  gyro_roll = 1;
  gyro_pitch = 1;
  gyro_yaw = 1;

Just like a human, the Arduino can calculate 2x2 much faster than 5x13.

The other thing to watch out for is that the compiler is extremely good at optimizing away calculations that don't get used later in the code. There are a lot of embarrassed people who thought they had found a bug when the compiler had correctly determined that an entire for() loop could be removed with no effect on the rest of the code, other than making it much faster.
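One common way to keep a timing test from being optimized away is to store the result in a `volatile` variable, since the compiler has to keep every write to it. A minimal sketch, assuming the same 0.8 / 0.2 filter weights as the original code; `sink`, `filterStep` and `runFilter` are names invented for this example, not anything from the original sketch.

```cpp
#include <cstdint>

// Writes to a volatile object count as observable behaviour, so the
// compiler may not drop them even if nothing ever reads the value.
volatile float sink;

// One step of the low-pass filter from the original post.
float filterStep(float previous, float sample) {
    return previous * 0.8f + sample * 0.2f;
}

// Run the filter `iterations` times. Storing each result in `sink`
// keeps the loop from being removed as dead code.
float runFilter(float start, float sample, uint16_t iterations) {
    float value = start;
    for (uint16_t i = 0; i < iterations; ++i) {
        value = filterStep(value, sample);
        sink = value;
    }
    return value;
}
```

On the Arduino you would call something like `runFilter(gyro_roll_input, gyro_roll, 100)` between the two pin writes. The volatile store adds a few cycles per iteration, but the same few every time.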

iforce2d: As usual, I figure things out right after I post a question in forums... It seems that floating point calculations can take different times depending on the values involved. If I add this before the code above, I get consistent times (at least as consistent as I would expect from the resolution of my 'scope').

  gyro_roll_input = 1;
  gyro_pitch_input = 1;
  gyro_yaw_input = 1;
  gyro_roll = 1;
  gyro_pitch = 1;
  gyro_yaw = 1;

Your scope is sampling 24 times every microsecond. What is the range of execution times now, given constant math values? +/- 40 nanoseconds?
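For reference, the resolution figures work out as in the sketch below. The function names are hypothetical; the 24 MS/s and 16 MHz values are the ones given in the thread. Each edge is only seen at the next sample, so a measured pulse width can be off by up to roughly one sample period.

```cpp
// Time between two logic analyzer samples, in nanoseconds,
// for a sample rate given in megasamples per second.
constexpr double samplePeriodNs(double sampleRateMSps) {
    return 1000.0 / sampleRateMSps;
}

// Duration of one CPU clock cycle, in nanoseconds,
// for a clock frequency given in MHz.
constexpr double cpuCyclePeriodNs(double clockMHz) {
    return 1000.0 / clockMHz;
}
```

`samplePeriodNs(24.0)` is a little under 42 ns and `cpuCyclePeriodNs(16.0)` is 62.5 ns, so the analyzer resolves slightly better than one clock cycle. That is far finer than the 9 us spread seen originally.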