Bug: delay() off by +1 ms

Calling delay(N) should delay program execution by N milliseconds. Instead, the function always delays by N+1 milliseconds. The error happened to be significant in my application, where 2 ms was very different from 1 ms.

I wrote some sample code which demonstrates the problem and also shows a simple solution (replace "<=" loop test with "<").

I ran the following code on an Arduino Duemilanove, compiled with version 0018 of the software.

/* Test delay and delayMicroseconds */

#define NUM_TESTS 100
#define MAX_DELAY 20

void delay_debugged(unsigned long ms)
{
      unsigned long start = millis();
      
   /* Original from wiring.c:  
      while (millis() - start <= ms)
   */ 
      while (millis() - start < ms)
            ;
}

void setup()   {                
  Serial.begin(115200);
}

void loop()                     
{
   int i, j, k; 
   unsigned long tic, toc;
   float avg_delay;
    
   for (i=1; i<=MAX_DELAY; i++) 
   {
      avg_delay = 0;
      for (j=0; j<NUM_TESTS; j++)
      {
         tic = micros();
         delay(i);
         toc = micros();
         avg_delay += (toc - tic); 
      }
      avg_delay /= (1000.0 * NUM_TESTS);
      Serial.print("delay(");Serial.print(i);Serial.print(")=");
      Serial.println(avg_delay);
   } 

   for (i=1; i<=MAX_DELAY; i++) 
   {
      avg_delay = 0;
      for (j=0; j<NUM_TESTS; j++)
      {
         tic = micros();
         delay_debugged(i);
         toc = micros();
         avg_delay += (toc - tic); 
      }
      avg_delay /= (1000.0 * NUM_TESTS);
      Serial.print("delay_debugged(");Serial.print(i);Serial.print(")=");
      Serial.println(avg_delay);
   } 

   for (i=1; i<=MAX_DELAY; i++) 
   {
      avg_delay = 0;
      for (j=0; j<NUM_TESTS; j++)
      {
         k = 4*i;
         tic = micros();
         delayMicroseconds(k);
         toc = micros();
         avg_delay += (toc - tic); 
      }
      avg_delay /= NUM_TESTS;
      Serial.print("delayMicrosecnds(");Serial.print(k);Serial.print(")=");
      Serial.println(avg_delay);
   } 

   Serial.println("-----------------------------------------------");
   delay(5000);
}

The error happened to be significant in my application, where 2 ms was very different from 1 ms.

For accurate small delays use delayMicroseconds.

Calling delay(N) should delay program execution by N milliseconds. Instead, the function always delays by N+1 milliseconds.

On average, the delay is N + 0.5 milliseconds, not N + 1.0. It is a coincidence that your test always has a delay of N + 1.

I wrote some sample code which demonstrates the problem and also shows a simple solution (replace "<=" loop test with "<").

The change you suggest moves the average to N - 0.5; the delay on average will be less than what was requested.

I suspect the person who wrote the code meant the delay to be at least N milliseconds.

This issue has surfaced a few times on the forum. It would eliminate some confusion if delay called delayMicroseconds for short delays.

The definition of delay(n) was changed a couple of releases ago, and is now "delay AT LEAST n milliseconds" rather than the previous version (which was pretty much "delay no more than n milliseconds"). The new version is more compatible with other programming environments, and generally "safer" for devices that actually need a delay.

Since the millisecond "ticks" are not synchronized with respect to calls to delay(), there is always going to be some error. Unfortunately, for the "typical" loop, where you do "small" amounts of computation after each delay, the error is likely to always be a significant chunk of one millisecond (close to a full millisecond). There is additional error because the clock ticks every 1024 microseconds rather than every 1000 microseconds...
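For reference, the 1024-microsecond figure falls out of the stock timer0 setup; a rough back-of-the-envelope check (assuming the usual prescaler of 64 and the 8-bit overflow, as configured by the core):

/* timer0: 8-bit counter, prescaler 64 (stock core configuration)      */
/* cycles per overflow = 64 * 256 = 16384                              */
/* at 16 MHz (62.5 ns/cycle): 16384 * 62.5 ns = 1024 us per overflow   */
/* at  8 MHz (125 ns/cycle):  16384 * 125 ns  = 2048 us per overflow   */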

For accurate small delays use delayMicroseconds.

That is indeed how I solved the situation for myself. However, it definitely seems that either the function should be updated to match its documentation or the reverse and the documentation updated to match the function's behavior.

The definition of delay(n) was changed a couple of releases ago, and is now "delay AT LEAST n milliseconds" rather than the previous version (which was pretty much "delay no more than n milliseconds").

The re-definition, apparently, has not made it to the documentation. http://arduino.cc/en/Reference/Delay makes no mention of the fact that the delay is in the range [N, N+1] milliseconds. For performance reasons it suggests using delay() only for short periods such as 10 ms. Unfortunately, this is just where the fractional error in delay, 1/N, is going to be large. If the existing implementation is kept then there should be a warning about not using delay for small intervals.

On average, the delay is N + 0.5 milliseconds, not N + 1.0. It is a coincidence that your test always has a delay of N + 1.

I take your point, but in fact this is only true A) if there is a randomized delay of some sort in loop() or B) the average is taken across all possible compiled programs exhibiting different natural delays.

To help explain what I mean consider the following pseudocode.

void loop()
{
   ******   
   C code block with execution time = et
   ******
   delay(N);
}

This is a fairly standard code architecture where the code does something useful and pauses at the end before jumping to the start of loop() to begin work again.

The delay() routine constantly monitors millis() and exits several microseconds after the millis() counter changes. This serves to synchronize loop() with the millis() counter.

Thus, after the first pass, each iteration of loop() starts with the millis counter precisely at some value C. The execution time (et) is often very short. At 16 MHz, for example, 1600 instructions take just 0.1 ms. For shorthand I'll express et as a decimal fraction of a millisecond. Thus, when I hit the final delay(N) statement at the end of loop(), the wall time is C.et (C plus the fraction et). The function delay() waits for the change from C.et to C+1, which contributes (1 - et) worth of delay. It then waits another N milliseconds, for a total delay of N + (1 - et) milliseconds.

It is not a coincidence that my code produced an extra delay of 1 unit. In fact, most code will be off by nearly 1 unit because et is usually small.
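To make the arithmetic concrete, here is the same argument with illustrative numbers (et = 0.1 ms is just an example value):

/* loop() resumes just after a millis tick: wall time C.0, millis() == C     */
/* the code block runs for et = 0.1 ms, so delay(N) is entered at time C.1   */
/* start = millis() reads C                                                  */
/* old test (millis() - start <= N): exits at the tick where                 */
/*    millis() == C + N + 1, i.e. wall time C+N+1.0  ->  delay = N + 0.9 ms  */
/* new test (millis() - start <  N): exits at the tick where                 */
/*    millis() == C + N,     i.e. wall time C+N.0    ->  delay = N - 0.1 ms  */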

If the fractional part of execution time were to exhibit a uniform distribution over the range [0,1] then the average over that uniform distribution is 0.5 and the delay would be N + 0.5 as you suggest.

Such a random distribution might result if one were to consider all the potential programs that users might write and all the different execution times those different programs might have. However, my guess is that the fractional part of execution time is not uniformly distributed but weighted towards small execution times which means the delay will be statistically biased towards +1 ms.

Such a random distribution might result if one were to consider all the potential programs that users might write and all the different execution times those different programs might have

You've proposed a change that affects every Sketch written. For such a change, it is necessary to consider "all the potential programs that users might write".

However, my guess...

And my guess is different than yours.

However, it definitely seems that either the function should be updated to match its documentation or the reverse and the documentation updated to match the function's behavior

It's your thread; which would you like to pursue?

Three options have been presented...

  • documentation updated to match the function's behavior
  • the function should be updated; calling delay(N) should delay program execution by N milliseconds
  • delay calls delayMicroseconds for short delays

the function should be updated; calling delay(N) should delay program execution by N milliseconds

Not practical given the current timer scheme.

delay calls delayMicroseconds for short delays

Not practical because delayMicroseconds() has significant drawbacks (most notably, it disables interrupts.)

delayMicroseconds() has significant drawbacks (most notably, it disables interrupts.)

Interrupt manipulation was removed in 0018.

Are there other drawbacks? (other than the obvious one of potentially breaking existing code)

The "full source" of delay() is quoted below so there shouldn't be much need for speculation on its behaviour.

void delay(unsigned long ms)
{
      unsigned long start = millis();
      
      while (millis() - start <= ms)
            ;
}

The notion of delay() yielding a guaranteed "minimum delay" does not always hold (it does hold, however, for the delayMicroseconds() function). Due to the fraction handling (1024 microseconds per timer0 overflow), millis may increment twice for a single timer0 overflow. As a consequence, a 1 ms delay may return almost immediately, as in the following:

  • timer0 is 1 microsecond away from overflowing
  • delay is called before timer0 overflows
  • delay reads current time (no overflow yet)
  • timer0 overflows (only 1 microsecond has passed) and increments millis twice due to fractions adding up
  • delay reads current time and returns

The average will still be around the "delay + 0.5 ms" mark, but this double increment may be an issue for some applications.
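For context, the double increment comes from the way the overflow interrupt carries the extra 24 us of fraction; a simplified sketch of that scheme for a 16 MHz part (names and constants illustrative rather than a verbatim copy of wiring.c):

#include <avr/interrupt.h>

volatile unsigned long timer0_millis = 0;
static unsigned char timer0_fract = 0;   // accumulated fraction, in 8 us units

ISR(TIMER0_OVF_vect)
{
   // each overflow is 1024 us at 16 MHz: add 1 ms plus 24 us of fraction
   timer0_millis += 1;
   timer0_fract  += 3;                   // 24 us expressed in 8 us units
   if (timer0_fract >= 125) {            // 125 * 8 us = 1000 us accumulated
      timer0_fract -= 125;
      timer0_millis += 1;                // the occasional second increment
   }
}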

The delayMicroseconds() function is a good alternative for short interval timing as it is not affected by above issues.

Huh! Missed that. I can't think of any other issues other than interrupts, at least not in current code (eventually, in a hypothetical multi-tasking Arduino with an actual OS kernel, I can see "delay()" being a blocking system call that lets other tasks run, while delayMicroseconds() continues to be a busy-loop...)

@BenF: Excellent analysis!

@westfw: "Not practical given the current timer scheme." I'm going to take a wild guess and say that BenF agrees. I agree as well.

That leaves two options...

  • documentation updated to match the function's behavior
  • delay calls delayMicroseconds for short delays

@Zack777: Which would you like to pursue?

If delayMicroseconds no longer disables interrupts, is there a reason for ever using millis() in delay?

void accurateDelay(unsigned long ms)
{
   while (ms--)
      delayMicroseconds(998);  // (or whatever)
}

I can't think of a reason.

delayMicroseconds(998);  // (or whatever)

Getting this "... whatever" right may be a bit tricky for long delays, so using a timer reference may be preferred. For example, an 8 MHz Arduino will need a shorter "... whatever" than the 16 MHz flavor. Busy-loop timing and function call overhead may also change between compiler versions.

A combination of using timer overflow count + microseconds may be an option for a "better" milliseconds delay().

Otherwise I think an update to the documentation may go a long way. Additional info to include:

  • average delay N + 0.5 ms
  • error range +/- 1 ms @ 16 MHz, -3 to +2 ms @ 8 MHz

My apologies for taking 2 days to get back to the many comments. I can only work on Arduino-related stuff in the morning and evening.

The open courses of action now are to modify the delay function to improve accuracy or to simply warn users about the delay function (a documentation patch).

I believe the best course of action is, unfortunately for us coders, to produce a better delay function. Most users of Arduino have a goal in mind when using the hardware/software, and that goal is probably not to have to read the entire Atmel datasheet to determine whether their code will behave as expected.

For confirmation of the extreme level of detail required, I need only look at this thread where we've glibly tossed around some rather complicated ideas like probability distributions, function call overhead, interrupt timing, etc. BenF did a nice analysis showing that for the 8MHz devices the expected absolute error is -3/+2 ms. Even if we were to document this, I'm not sure this would help an average user. For example, given the specification of -3/+2 is it immediately obvious how to implement a precise 20 ms delay?

It seems like coders should figure out the answer to that question and solve it once in a library function so that average users can continue to use Arduino as a tool for their own needs. If they really wish to understand microcontrollers and machine language they can always do so, but it shouldn't be required.

To recap the technical situation, we measure time on the Arduino by counting ticks. My original bug filing was about issues in counting and whether we should be counting N or N+1 ticks. BenF pointed out that, in addition, the ticks themselves are not constant (usually 1, sometimes 2 at a time). Thus, we have a two-fold problem.

Sometimes problems are more theoretical than practical and don't really need to be addressed. This might be the case for the inconsistent ticks if they manifested themselves only once a day or once a week. But we're not that lucky. Every timer0 overflow takes 1024 usec on a 16 MHz device, leaving 24 usec of fraction per overflow. The time to overflow the fractional part of the millis counter is therefore just 1000/24, or about 42 overflows (~42 ms). This is happening all the time.

I'll suggest the standard engineer's solution of divide and conquer. For small delays, use a scheme that doesn't depend on the millis counter. For large delays, the fractional error on the 16 MHz device is at most 1/N and quickly becomes very small, so the existing routine is probably fine there.

Here is one possibility derived from westfw's suggestion.

void new_delay(unsigned long ms)
{
   while (ms--) 
   {
#if F_CPU >= 16000000L
      delayMicroseconds(994);
#else
      delayMicroseconds(TBD);
#endif
   }
}

The value in delayMicroseconds was tuned using code I will post below. The fractional error for small delays is .033% which is very tight. This code is also smaller by 24 bytes than the existing delay function. Presumably this is because it doesn't need to instantiate another unsigned long to keep track of the millis() start value and the loop ending condition is simpler as well.

I know there is some concern about using tuned values, but it seems that there is little choice when trying to get extremely precise timings. The delayMicroseconds function already uses code tuned separately for 16 MHz and 8 MHz devices; I borrowed the construction with #if F_CPU from that routine. Presumably, if delayMicroseconds() is calibrated for each release then delay() could be as well. To explore a change in underlying timings, imagine that function call overhead and busy loop timings do change by 2 us (32 cycles at 16 MHz). The fractional error will change by 0.2%. This is still 25 times smaller than the current error of 5% for a small delay of 20 ms.

At least for small delays, this seems like a good solution. We can keep this code for large delays as well, but at some point the fractional error will exceed the absolute error of 1ms when using the millis timer. If accuracy is valued more than code size then we should revert to a millis() based method for long delays.

The crossover point is where the fractional errors are equal: .033% = (1/N)*100, which yields a crossover delay N of about 3030. I tested this and indeed found that the measured delay at 3030 was 3030.917 ms, or very nearly 1 ms in error.

There was some variability in the measured fractional error and the fractional error for the millis() approach is somewhat smaller ( (1-et)/N rather than 1/N). These considerations led me to switch over to the millis counter sooner at a value of 1500 ms.

The code below is nicely accurate, but now 70 bytes larger than the original function. I don't know whether that is acceptable or not. Alternative strategies are also welcome.

/* Test new delay subroutines */

#define NUM_TESTS 100
#define N_MS 20

/*
// Original from wiring.c:  
void new_delay(unsigned long ms)
{
      unsigned long start = millis();
      
      while (millis() - start <= ms)
            ;
}
*/

/*
void new_delay(unsigned long ms)
{
   while (ms--) {
#if F_CPU >= 16000000L
      delayMicroseconds(994);
#else
      delayMicroseconds(TBD);
#endif
   }
}
*/


void new_delay(unsigned long ms)
{
   if (ms < 1500)    // Short delays use microseconds for timing
   {
      while (ms--) 
      {
#if F_CPU >= 16000000L
         delayMicroseconds(994);
#else
         delayMicroseconds(TBD);
#endif
      }
   } else            // Long delays use milliseconds for timing
   {
      unsigned long start = millis();
      
      while (millis() - start < ms)
         ;
   }
}


void setup() 
{                
   Serial.begin(115200);
}

void loop()                     
{
   unsigned long i; 
   unsigned long tic, toc;
   double avg_delay;
    
   avg_delay = 0;
   for (i=0; i<NUM_TESTS; i++)
   {
      tic = micros();
      new_delay(N_MS);
      toc = micros();
      avg_delay += (toc - tic); 
   }

   avg_delay -= 3.262*NUM_TESTS;  // subtract overhead of tic & toc
                                  // measured previously on 16 MHz device
   avg_delay /= (1000.0 * NUM_TESTS);

   Serial.print("delay(");Serial.print(N_MS);Serial.print(")=");
   Serial.println(avg_delay, 3);
   Serial.println("------------------------");
}

For those interested - here's a simple delay function that will do a lot better than the current version in terms of accuracy for small millisecond delays:

void setup()
{
  Serial.begin(57600);
  Serial.println("Delay Test Started");
}

void delay_x(uint32_t millis_delay)
{
  uint16_t micros_now = (uint16_t)micros();

  while (millis_delay > 0) {
    if (((uint16_t)micros() - micros_now) >= 1000) {
      millis_delay--;
      micros_now += 1000;
    }
  }  
}

void loop()
{
  static uint8_t counter = 0;
  
  if (counter <=10 ) {
    uint16_t t = (uint16_t)micros();
    delay_x(counter);
    t = (uint16_t)micros() - t;
    Serial.print(counter, DEC);
    Serial.print(": ");
    Serial.println(t, DEC);
    counter++;
  }
}

The small test application prints the timed delay (in microseconds) for the range 0 to 10 ms.

One advantage of using a timer-synced delay versus a busy loop is that it will account for interrupt delays. That is, interrupts occurring during the wait are accounted for time-wise, whereas for a busy loop the time spent in interrupt processing is added to the delay.

The timer0 resolution is only 4 us (8 us on an 8 MHz ATmega), so we will never challenge delayMicroseconds() for delay accuracy in the microseconds range. For longer delays (milliseconds range), however, we can likely do much better. As such, I think there is room for both a pure microsecond busy loop and a timer-synced millisecond version.
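For reference, the 4 us / 8 us resolution figures are just the timer0 tick period under the assumed prescaler of 64:

/* micros() granularity = timer0 tick = prescaler / F_CPU       */
/* 16 MHz: 64 / 16,000,000 = 4 us per tick                      */
/*  8 MHz: 64 /  8,000,000 = 8 us per tick                      */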

I like BenF's "compromise," depending on how much we trust micros()...

Hi, err... since this is in the hardware part of this forum there is another potential solution...

Instead of trying to make time sync with the system clock, make the system clock sync with time. Timing crystals with a frequency of 16.384 MHz are commonly available. The timers would overflow on exact 1 ms boundaries. The PWM frequency would be exactly 500 Hz. The USART baud rate prescalers (the UBRRn registers) would need recalibrating. Nice clean interrupt code.
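A quick check of the arithmetic behind the 1 ms claim (assuming timer0 keeps its stock prescaler of 64 and 8-bit overflow):

/* timer0 clock     = 16.384 MHz / 64 = 256 kHz                       */
/* overflow period  = 256 counts / 256 kHz = exactly 1.000 ms         */
/* no 24 us remainder to carry, so millis() ticks once per overflow   */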

Something to think about.

I have done some testing of BenF's code solution and I think it is the best. The use of uint16_t types saves memory; overall the code is 20 bytes smaller than the existing delay() function. In addition, the accuracy is impressive: it displays an absolute accuracy of 6 microseconds on average, and the fractional error at the smallest reasonable delay, 1 millisecond, is just 0.6%. I also confirmed that it handles rollovers in the micros() counter correctly; there are no glitches in delay() at the roughly 70-minute rollover point. Adopting this code also means not having to update the delay() documentation.

In short, it's smaller, better, and requires less work from developers. Is there someone with authority to make this change monitoring this bug forum topic? Could you check in his proposed code?

The only change I would make is in the calling convention for delay: BenF has used 'uint32_t millis_delay', and I think it would be preferable to keep the existing 'unsigned long ms', since this matches the prototype of the function in wiring.h.
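For concreteness, a minimal sketch of what the patched function might look like with that one change applied (BenF's scheme with the parameter renamed to match wiring.h; not tested against the core here):

void delay(unsigned long ms)
{
   uint16_t start = (uint16_t) micros();

   while (ms > 0) {
      if (((uint16_t) micros() - start) >= 1000) {
         ms--;
         start += 1000;
      }
   }
}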

I created an issue on Google Code Project Hosting and attached BenF's code...

We'll see...

Why don't you talk to Hans-Juergen Heinrichs about using these and be done with it:
http://www.avrfreaks.net/index.php?module=Freaks%20Academy&func=viewItem&item_id=665&item_type=project

Unlike the standard avr-libc <util/delay.h> routines, these give you very accurate timings with good resolution, even in the sub-microsecond range.

--- bill