AccelStepper slower than sending pulses manually?

I'm trying to use the MultiStepper class from the AccelStepper library to control two stepper motors simultaneously.

void setup() {
  Serial.begin(250000);
  pinMode(STEP_PIN_1, OUTPUT);
  pinMode(DIR_PIN_1, OUTPUT);

  pinMode(STEP_PIN_2, OUTPUT);
  pinMode(DIR_PIN_2, OUTPUT);

  pinMode(LIMIT, INPUT);  // the limit switch is read with digitalRead(), so the pin must be an input

  Serial.println("Start...");
  stepper1.setMaxSpeed(4000);
  stepper1.setMinPulseWidth(20);
  stepper2.setMaxSpeed(4000);
  stepper2.setMinPulseWidth(20);

  steppers.addStepper(stepper1);
  steppers.addStepper(stepper2);
}

Then I try to control the motors using this code:

void command3(bool down){
  Serial.println("Command 3 Starting");
  
  long positions[2];
  if (down) {
    positions[0] = -distance;
    positions[1] = -distance;
  }
  else {
    positions[0] = distance;
    positions[1] = distance;
  }
  steppers.moveTo(positions);
  while(digitalRead(LIMIT) == LOW)
  {
    if (stepper1.distanceToGo() == 0)
    {
      break;
    }
    steppers.run();
  }
  delay(500);
  positions[0] = 0;
  positions[1] = 0;
  steppers.moveTo(positions);
  while(digitalRead(LIMIT) == LOW)
  {
    if (stepper1.distanceToGo() == 0)
    {
      break;
    }
    steppers.run();
  }
  delay(500);
  Serial.println("Command 3 Finished");
}

The code above is an order of magnitude slower than sending the pulses myself:

void command4(bool down){
  Serial.println("Command 4 Starting");
  if (down) {
    digitalWrite(DIR_PIN_1, LOW);
    digitalWrite(DIR_PIN_2, LOW);
  }
  else {
    digitalWrite(DIR_PIN_1, HIGH);
    digitalWrite(DIR_PIN_2, HIGH);
  }

  for (long i = 375000; i>0; i--)
  {
    if (digitalRead(LIMIT) == HIGH)
    {
      break;
    }
    digitalWrite(STEP_PIN_1, LOW);
    digitalWrite(STEP_PIN_1, HIGH);
    digitalWrite(STEP_PIN_2, LOW);
    digitalWrite(STEP_PIN_2, HIGH);
    delayMicroseconds(5);
  }
  Serial.println("Command 4 Finished");
}

Am I using the library wrong or should I just send the pulses myself?

sterisk:
The code above is an order of magnitude slower than sending the pulses myself:

You have not told us how many steps per second the slow and the fast programs can produce.
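
One way to get those numbers is to time each routine and divide the number of steps by the elapsed time. Below is a minimal sketch of that arithmetic in plain C++. The helper names stepsPerSecond and secondsAtRate are made up for illustration, and on the Arduino the elapsed time would be the difference between two micros() readings taken before and after the move. It is also worth noting that setMaxSpeed(4000) in your setup() limits the library to 4000 steps per second, so if your distance is anything like the 375000 steps used in command4, the move would take about 94 seconds at that limit regardless of how fast the library code itself runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average step rate from a step count and elapsed time.
// On an Arduino, elapsedMicros would be the difference of two micros() readings.
double stepsPerSecond(long steps, uint32_t elapsedMicros) {
  assert(elapsedMicros > 0);
  return steps * 1000000.0 / elapsedMicros;
}

// Hypothetical helper: seconds needed to make `steps` steps at a constant rate.
double secondsAtRate(long steps, double rateStepsPerSecond) {
  assert(rateStepsPerSecond > 0.0);
  return steps / rateStepsPerSecond;
}
```

For example, secondsAtRate(375000, 4000.0) gives 93.75 seconds. This arithmetic is done once after the move, so using floating point here costs nothing inside the stepping loop.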

There is no doubt that you can write Arduino code that runs a lot faster than the AccelStepper library. I believe that's mainly because the library uses floating-point maths, which is very slow on boards without floating-point hardware.
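
As an illustration of keeping floating point out of the stepping loop, here is a rough sketch of an integer-only step scheduler. StepTimer and its members are hypothetical names and are not part of AccelStepper. It is written as plain C++ so the logic can be checked off the board. On an Arduino the nowMicros argument would come from micros(), and a true result would be followed by pulsing the step pin. It only handles a constant speed with no acceleration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer-only step scheduler (not part of AccelStepper).
struct StepTimer {
  uint32_t intervalMicros;  // time between steps, e.g. 250 for 4000 steps/s
  uint32_t lastStepMicros;  // timestamp at which the most recent step was due

  StepTimer(uint32_t stepsPerSecond, uint32_t startMicros)
      : intervalMicros(0), lastStepMicros(startMicros) {
    assert(stepsPerSecond > 0 && stepsPerSecond <= UINT32_C(1000000));
    intervalMicros = UINT32_C(1000000) / stepsPerSecond;
  }

  // Returns true when a step is due. The unsigned subtraction stays correct
  // when the microsecond counter wraps around to zero.
  bool stepDue(uint32_t nowMicros) {
    if (nowMicros - lastStepMicros < intervalMicros) return false;
    lastStepMicros += intervalMicros;  // advance by a fixed amount to avoid drift
    return true;
  }
};
```

The interval is computed once with an integer division when the speed is set, so each pass through the loop costs only a subtraction and a comparison.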

...R