Bug: delay not behaving well on MEGA

I've been having problems with the delay() function on my Arduino MEGA. I've been using it for creating delays that last at least a few seconds, but about 3% of the time the delay() function returns prematurely, only after a few milliseconds.

I'm using software version 0016.

The software and hardware are behaving correctly, i.e. it's not restarting and the execution of the program continues as it should, only it does so prematurely.

I have not had this problem with the Duemilanove.

I wrote a workaround, but I felt I should report this issue.

However, note that this workaround does not handle a millis() overflow between the checks (other than checking that the time-difference is a positive number), and I also limited my call-stack to 8 recursions (I don't even know how large the stack on the MEGA 1280 can be).

int totalPause = 0;
int failedPause = 0;
int pauseStackCounter = 0;

void pause(long m) {
  pauseStackCounter++;
  totalPause++;
  long time = millis();
  delay(m);
  long newTime = millis();
  long diff = newTime - time;
  if (diff < m && diff > 0) {
    failedPause++;
    Serial.print("\n-- Time error! --\nDelay: ");
    Serial.print(diff);
    Serial.print("ms out of ");
    Serial.print(m);
    Serial.println("ms");
    
    if (pauseStackCounter < 8) {
      pause(m-diff);
      Serial.println("Fixed.");
      Serial.print("Error rate: ");
      Serial.print(failedPause);
      Serial.print(" / ");
      Serial.println(totalPause);
      Serial.println();
    } else {
      Serial.println("Stack counter too big! Not doing anything.");
    }
  }
  pauseStackCounter--;
}

So this is your work around code? Can you post the actual code that is giving you the problems?

This is my original code:

int counter = 0;

void setup() {
  Serial.begin(9600);
  pinMode(52, OUTPUT);
  Serial.println("Setup.");
}

void loop() {
  digitalWrite(52, HIGH);
  Serial.print("on: ");
  Serial.println(counter);
  delay(3000);
  digitalWrite(52, LOW);
  Serial.print("off: ");
  Serial.println(counter);
  delay(7000);
  counter++;
}

This is my current code:

int counter = 0;
int totalPause = 0;
int failedPause = 0;
int pauseStackCounter = 0;

void setup() {
  Serial.begin(9600);
  pinMode(52, OUTPUT);
  Serial.println("Setup.");
}


void pause(long m) {
  pauseStackCounter++;
  totalPause++;
  long time = millis();
  delay(m);
  long newTime = millis();
  long diff = newTime - time;
  if (diff < m && diff > 0) {
    failedPause++;
    Serial.print("\n-- Time error! --\nDelay: ");
    Serial.print(diff);
    Serial.print("ms out of ");
    Serial.print(m);
    Serial.println("ms");
    
    if (pauseStackCounter < 8) {
      pause(m-diff);
      Serial.println("Fixed.");
      Serial.print("Error rate: ");
      Serial.print(failedPause);
      Serial.print(" / ");
      Serial.println(totalPause);
      Serial.println();
    } else {
      Serial.println("Stack counter too big! Not doing anything.");
    }
  }
  pauseStackCounter--;

}

void loop() {
  digitalWrite(52, HIGH);
  Serial.print("on: ");
  Serial.println(counter);
  pause(3000);
  digitalWrite(52, LOW);
  digitalWrite(52, LOW);
  digitalWrite(52, LOW);
  Serial.print("off: ");
  Serial.println(counter);
  pause(7000);
  counter++;
}

When the delay() routine exits prematurely, it does so after only a few milliseconds (my last example is 5ms). Sometimes it works, sometimes it doesn't. It's seems completely random.

I may be wrong, but my feeling is that the latter call to delay(), the longer one, usually is the one that does this.

Like I said earlier, the execution continues normally, it does increment this "counter" variable as it should and then loops again.

Here's an example of my serial log where I'm running the code with my workaround:

on: 14
off: 14
on: 15
off: 15

-- Time error! --
Delay: 5ms out of 7000ms
Fixed.
Error rate: 1 / 33

on: 16
off: 16

I've tested this many times now, and my error-rate is always somewhere around 3%.

I should mention that my output pin is connected to an LED (actually the LED-input part of an optocoupler) through a resistor. :slight_smile:

Nothing else is connected to the Arduino MEGA board (except the USB cable to my computer).

The delay() function uses the millis() function the same way you do in your pause function and if you look at millis() below - there is a comment about having to disable itnterrupts while reading the timer0_millis counter.

unsigned long millis()
{
      unsigned long m;
      uint8_t oldSREG = SREG;

      // disable interrupts while we read timer0_millis or we might get an
      // inconsistent value (e.g. in the middle of a write to timer0_millis)
      cli();
      m = timer0_millis;
      SREG = oldSREG;

      return m;
}

If interrupts are not disabled, the outcome would be consistent with what your workaround code outputs. In other words delays would be unpredictable. The cli() is meant to disable interrupts, but your code suggests it doesn't.

The cli() function is a macro that executes a single assembly instruction and it seems as if this macro is not properly defined in your copy of version 0016.

I don't have a mega to verify whether this is a general issue for the Arduino MEGA board, but you may want to install 0017 and try the same skecth to see if it behaves differently.

I have solved my problem. This, as most issues, is a classical PEBKAC (User error - Wikipedia). :wink:

The problem was simply electromagnetic interference. I simply couldn't imagine that:

  • The interference was so big
  • The Arduino MEGA would be so sensitive to it (having the opto-coupler and all)

This is my setup:

I modified my code for my experiment:

int counter = 0;
int totalPause = 0;
int failedPause = 0;
int pauseStackCounter = 0;

void setup() {
  Serial.begin(9600);
  pinMode(52, OUTPUT);
  Serial.println("Setup.");
}


void pause(long m) {
  pauseStackCounter++;
  totalPause++;
  long time = millis();
  delay(m);
  long newTime = millis();
  long diff = newTime - time;
  if (diff < m && diff > 0) {
    failedPause++;
    Serial.print("\n-- Time error! --\nDelay: ");
    Serial.print(diff);
    Serial.print("ms out of ");
    Serial.print(m);
    Serial.println("ms");
    
    if (pauseStackCounter < 8) {
      pause(m-diff);
      Serial.println("Fixed.");
      Serial.print("Error rate: ");
      Serial.print(failedPause);
      Serial.print(" / ");
      Serial.println(totalPause);
      Serial.println();
    } else {
      Serial.println("Stack counter too big! Not doing anything.");
    }
  }
  pauseStackCounter--;

}

void loop() {
  for (int i = 0 ; i < 1500 ; i++) {
    digitalWrite(52, HIGH);
    Serial.print("on : ");
    Serial.println(i);
    pause(100);
    digitalWrite(52, LOW);
    digitalWrite(52, LOW);
    digitalWrite(52, LOW);
    Serial.print("off: ");
    Serial.println(i);
    pause(100);
  }
  
  Serial.print("Error rate: ");
  Serial.print(failedPause);
  Serial.print(" / ");
  Serial.println(totalPause);
  Serial.println();
  
  
  for (;;) {
  }
}

The program loops 1500 times. That totals at least 3000 calls to the delay() routine, 2 times per loop (I didn't remove the recursion from my pause() routine, so if delay() exited prematurely it got called again).

I conducted designed my experiment to have 8 steps, one of them was conditioned on another step and it turned out I didn't have to do it. So I ended up doing 7.

1.

  • Arduino connected to optocoupler
  • No power on relay side of optocoupler
  • No power on the pump
  • No Faraday shield between the pump and the rest of the circuitry
  • No ferrite bead on any conductor (the one I used is quite big and when used, the conductor looped once, i.e. it went 2x through the bead)

Error rate: 0 / 3000

2.

  • Like no. 1 + power to relay side of optocoupler

Error rate: 0 / 3000

3.

  • Like no. 1 + no. 2 + power to the pump

Error rate: 28 / 3028 (28 / 3000 if I had removed the recursion)

4.

  • Like no. 1 + no. 2 + no. 3 + Faraday shield (grounded) between the pump and the rest of cir circuitry.

Error rate: 9 / 3009 (9 / 3000 if I had removed the recursion)

5.

  • Like no. 1 + no. 2 + no. 3 + no. 4 + ferrite bead on conductor from optocoupler to relay (number 1 in picture above).

Error rate: 2 / 3002 (2 / 3000 if I had removed the recursion)

6.

  • Like no. 1 + no. 2 + no. 3 + no. 5. I.e. no Faraday shield, only ferrite bead on conductor 1.

Error rate: 2 / 3002 (2 / 3000 if I had removed the recursion)

7.
If no. 2 (having the relay turned on but no the pump) turned out be any different from only having the optocoupler connected:

  • Like no. 1 + no. 2 + ferrite bead on relay conductor (1).

This one I did not do because step no. 2 did not show any difference.

8.

  • Like no. 6, no Faraday shield and ferrite bead now on conductor no. 2 (between Arduino and optocoupler).

Error rate: 0 / 3000

Result:
The best result was to simply put a ferrite bead on conductor no. 2 (experiment no. 8). The delay() routine now behaves well. Other parts of the code were of course subject to failure the whole time, but their execution didn't take as long as the delay and thus were statistically insignificant.

Note: I had already tried putting a capacitor on the conductor between the optocoupler and the relay.

I did not check if my version of the Arduino software does the cli() call or if the macros are any different from those in version 0017.

Thank you for your (wasted) time, hope someone new to this platform finds this post useful. :slight_smile:

I have solved my problem.

I must say it was unexpected to see the issue caused by EMF. Typically this would make systems even more erratic (e.g. restart, issues with upload etc.).

Nothing is better though than finding and fixing an issue.