Declaring variables & Program speed

I'm building an arduino based drone that has an IMU loop that runs at 500Hz, see below;

void GetIMU(){
float   gain=0.99; //comp. filter gain
float   filter=0.08; 
uint32_t IMUStartT = micros();
uint16_t  IMUdT = IMUStartT - IMUFinT;      //dT for each computation of the IMU loop
 
if(IMUdT >= IMUlooptime){
  
Wire.beginTransmission(MPU_Address);
Wire.write(0x3B);
Wire.endTransmission();
Wire.requestFrom(MPU_Address,4);
/*int16_t*/ Accel_X = Wire.read()<<8 | Wire.read();
int16_t Accel_Y = Wire.read()<<8 | Wire.read();
Wire.beginTransmission(MPU_Address);
Wire.write(0x3F);
Wire.endTransmission();
Wire.requestFrom(MPU_Address,4);
int16_t Accel_Z = Wire.read()<<8 | Wire.read();
int16_t Temp   =  Wire.read()<<8 | Wire.read();
Wire.beginTransmission(MPU_Address);
Wire.write(0x43);
Wire.endTransmission();
Wire.requestFrom(MPU_Address,6);
int16_t Gyro_X =  Wire.read()<<8 | Wire.read();
int16_t Gyro_Y =  Wire.read()<<8 | Wire.read();
int16_t Gyro_Z =  Wire.read()<<8 | Wire.read();

Accel_X = Accel_X - Accel_X_offset;
Accel_Y = Accel_Y - Accel_Y_offset;
Accel_Z = Accel_Z - Accel_Z_offset;    

Accel_X_Filter = Accel_X_Filter + filter*(Accel_X - Accel_X_Filter);
Accel_Y_Filter = Accel_Y_Filter + filter*(Accel_Y - Accel_Y_Filter);
Accel_Z_Filter = Accel_Z_Filter + filter*(Accel_Z - Accel_Z_Filter);

/*float*/ Acc_Angle_X = atan2(Accel_Y_Filter, sqrt( Accel_X_Filter*Accel_X_Filter + Accel_Z_Filter*Accel_Z_Filter )  )*57.30;
/*float*/ Acc_Angle_Y = atan2(Accel_X_Filter, sqrt( Accel_Y_Filter*Accel_Y_Filter + Accel_Z_Filter*Accel_Z_Filter )  )*57.30;

float IMUdt_Const = (IMUdT*0.0000000152579);        //65LSB/d/s, micros, 1/65540000= 0.00001525.. 
float Xdelta = compangleX + (Gyro_X*IMUdt_Const);    // calculates first bit of comp. filter eq. 
compangleX = gain*Xdelta + (1-gain)*Acc_Angle_X;   
float Ydelta = compangleY + (Gyro_Y*IMUdt_Const);    // calculates first bit of comp. filter eq. 
compangleY = gain*Ydelta + (1-gain)*Acc_Angle_Y;

YawRate = (-Gyro_Z/65.54); 

IMUFinT = IMUStartT;    // leave in IF loop, then only computed when If=true. 
  } 
}

I connected the code up to some hardware and had a sluggish response from the servos which are controlled in another function. So to debug I started printing variables to the serial monitor to watch them in real time. As some of these variables were declared in the sub-routine I then redeclared them globally at the start of the sketch and found that this made a dramatic improvement in the servo response time.

/*float*/ Acc_Angle_X = atan2(Accel_Y_Filter, sqrt( Accel_X_Filter*Accel_X_Filter + Accel_Z_Filter*Accel_Z_Filter )  )*57.30;
/*float*/ Acc_Angle_Y = atan2(Accel_X_Filter, sqrt( Accel_Y_Filter*Accel_Y_Filter + Accel_Z_Filter*Accel_Z_Filter )  )*57.30;

I initially declared one of these variables for Acc_Angle_ globally and one within the loop and one servo was updating instantly and one was updating with a noticeable amount of lag. Once both of these were commented out and declared at the beginning of the sketch, the problem of the servo lag went away.

My question is why?

Knowing this I can hopefully speed it up in other places too. Thanks.

Declaring variables in functions and subroutines calls for some system work, overhead, to allocate and deallocate memory at each call of the function/subroutine. Using a global variable will not call for that system overhead as I think.
Just make sure that no unintended, unauthorized access is made to those globals.

An alternative may be to declare those local variables as static, then they don't get deallocated when the function completes, but are still only accessible by that function.

The allocation lag shouldn’t be very significant, compared to calculating atan2().
Can you post code that includes both cases, and perhaps a timing loop?

The act of printing itself significantly slows down any loop with it in. You might speed things up with the baud rate value, but not as much as leaving the print out altogether.