Using digital.Write without delay from loop time

Hi, so I am trying to run two stepper motors for a self-balancing robot. My problem is that the time it takes to go through the loop() code is longer than the pulse time needed to drive the stepper motors at the desired speed. It takes about 2-4ms to run the loop() code but I need to do digital.Write(HIGH) and LOW with only a 400us gap between them ideally (That is about the max stepper speed I was able to obtain using no other code).

I have looked at using PWM even though it seems that it is not generating the same signals as the alternating digitalWrite signals but it makes a lot of noise in the stepper motor and drives at weird speeds it seemed.

So my question is, what is the best way to drive the stepper motors with the digital write without worrying about the loop time.

Here is my current code:

#include <Arduino_LSM9DS1.h>
//#include <mbed.h>

float x, y, z, xg, yg, zg, AccAngle,angle,speed, delT,err, offset, P, I, D, Command,lastSensorTime, lasttime, dt, serror, deriv, err1, err2, err3;
const int stepPin1 = 2; 
const int dirPin1 = 3; 
const int stepPin2 = 4;
const int dirPin2 = 5;
int i=1;
//mbed::PwmOut d2(P1_11);

void setup() {
  IMU.begin();
  Serial.begin(9600);
  //pinMode(stepPin1,OUTPUT); 
  pinMode(dirPin1,OUTPUT);
  pinMode(stepPin2,OUTPUT); 
  pinMode(dirPin2,OUTPUT);
  digitalWrite(dirPin1,HIGH);
  digitalWrite(dirPin2,LOW);

}

void loop() {


       
if (IMU.gyroscopeAvailable()) {
    IMU.readGyroscope(xg, yg, zg);
  }

if (IMU.accelerationAvailable()) {
    IMU.readAcceleration(x, y, z);
  }
    AccAngle=(atan2(z,-x))*180/PI;
    angle=0.98*(angle+(y/(IMU.gyroscopeSampleRate())))+(0.02*AccAngle);
//Serial.println(err);
    offset=3.0;
    err=angle+offset;
    err3=err2;
    err2=err1;
    err1=err;
    dt=micros()-lasttime;
    lasttime=micros();
    Serial.println(dt);

    //PID Controller
    P=-0.04;
    I=0.0;
    D=0.0;

    serror=serror+(err*dt);
    
    deriv=((3*err1)-(4*err2)+(err3))/(2*dt);

    Command=(P*err)+(I*serror)+(D*deriv);

    accel(Command);
    
}

void velocity(float speed) {
    if (speed<0.0) {
       digitalWrite(dirPin1,HIGH);
       digitalWrite(dirPin2,LOW); 
    }
    else {
       digitalWrite(dirPin1,LOW);
       digitalWrite(dirPin2,HIGH);
    }
    
    if (abs(speed)<=1000.0) {
        delT=4000000.0/(abs(speed));
    } 
    else {
        delT=4000.0; 
    }

    //d2.write(0.1f);
    //d2.period_us(1000);
     
        if (micros()-lastSensorTime>=delT){
            lastSensorTime+=delT;
            
            if (i==2) {
            digitalWrite(stepPin1,HIGH); 
            digitalWrite(stepPin2, HIGH);
            i=1;
            }
            else if (i==1) {
            digitalWrite(stepPin1,LOW);
            digitalWrite(stepPin2,LOW);
            i=2;
            } 
        }
    }
 
void accel(float force) {
    speed=speed+force;
    if (speed>1000){
      speed=1000;
    }
    else if (speed<-1000){
      speed=-1000;
    }
    velocity(speed);
}

And without the glacially slow 9600 serial debug every iteration through loop?

pcbbc:
And without the glacially slow 9600 serial debug every iteration through loop?

How would I get that value? And does it really increase the loop time that much?

each character would take approximately a millisecond to send over 9600.

assuming dt is five characters, plus the added \r\n, you are potentially using up 7ms just to print dt

I removed the Serial and the stepper response still appears to be going the same speed. Is there anything else in my code that could be taking so long? Perhaps the IMU.begin does something over Serial as well? It is the built-in LSM9DS1 from an Arduino Nano 33 BLE.

Is there any way to do the digital.Write without worrying about loop time though?

There are direct register access to CPU pins using 1 or 2 clock cycles, but your Arduino Nano is based on nRF52840 so you might have to dig on the datasheet to figure this out.

I doubt it's the digitalWrite that is slowing you down considering all the floating point operations you are doing

Your interface to the IMU is also I2C. It's possible it's running at the slower 100kHz instead of 400kHz

Have you timed the floating point calculations?

hzrnbgy:
There are direct register access to CPU pins using 1 or 2 clock cycles, but your Arduino Nano is based on nRF52840 so you might have to dig on the datasheet to figure this out.

I doubt it's the digitalWrite that is slowing you down considering all the floating point operations you are doing

Your interface to the IMU is also I2C. It's possible it's running at the slower 100kHz instead of 400kHz

https://content.arduino.cc/assets/Nano_BLE_MCU-nRF52840_PS_v1.1.pdf On page 459 it looks like there is some information about the direct register access but I am pretty lost on how I would go about implementing that to do what I want. Is there any way I could get any help with that? Goal would be to just do digitalWrite HIGH and LOW at different frequencies based on my loop() code.

aarg:
Have you timed the floating point calculations?

I have not. Do you think using int values multiplied by 1000 to replace most of the float values could improve the loop time?

nesser:
I have not. Do you think using int values multiplied by 1000 to replace most of the float values could improve the loop time?

You have the right idea, but that is too simplistic. You need to scale the values according to the expected ranges in order to avoid excessive truncation. While you are examining the code, also look for opportunities to optimize expressions to avoid duplication or unnecessary run time calculations.