My apologies for taking 2 days to get back to the many comments. I can only work on Arduino-related stuff in the morning and evening.
The open courses of action now are to modify the delay function to improve its accuracy, or simply to warn users about its behavior (a documentation patch).
I believe the best course of action is, unfortunately for us coders, to produce a better delay function. Most Arduino users have a goal in mind when they pick up the hardware/software, and that goal probably does not involve reading the entire Atmel datasheet to determine whether their code will behave as expected.
For confirmation of the extreme level of detail required, I need only look at this thread, where we've glibly tossed around some rather complicated ideas: probability distributions, function call overhead, interrupt timing, etc. BenF did a nice analysis showing that for the 8 MHz devices the expected absolute error is -3/+2 ms. Even if we were to document this, I'm not sure it would help an average user. For example, given the specification of -3/+2, is it immediately obvious how to implement a precise 20 ms delay? (Under that spec, delay(20) could complete anywhere between 17 and 22 ms; calling delay(23) guarantees at least 20 ms but may stretch to 25 ms.)
It seems like coders should figure out the answer to that question and solve it once in a library function so that average users can continue to use Arduino as a tool for their own needs. If they really wish to understand microcontrollers and machine language they can always do so, but it shouldn't be required.
To recap the technical situation: we measure time on the Arduino by counting ticks. My original bug filing was about issues in the counting itself, namely whether we should be counting N or N+1 ticks (waiting while millis() - start <= ms spins for one tick more than waiting while it is < ms). BenF pointed out that, in addition, the ticks themselves are not constant (usually 1 ms, sometimes 2 ms at a time). Thus, we have a two-fold problem.
Sometimes problems are more theoretical than practical and don't really need to be addressed. That might be the case for the inconsistent ticks if they manifested themselves only once a day or once a week. But we're not that lucky. On a 16 MHz device, 1024 usec pass for every timer0 overflow, so each overflow deposits an extra 24 usec into the fractional part of the millis counter. That fraction rolls over after 1000/24 ≈ 42 overflows, i.e. roughly every 42 ms. This is happening all the time.
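To make the mechanism concrete, here is a minimal sketch of that bookkeeping. This is my own illustration, not the actual wiring.c source; the names ms_count, fract_us, and on_timer0_overflow are made up:

/* Illustration only -- not the real ISR from wiring.c. */
volatile unsigned long ms_count = 0;  // the millis() counter
unsigned int fract_us = 0;            // leftover microseconds

void on_timer0_overflow(void)         // fires every 1024 us at 16 MHz
{
  ms_count += 1;                      // the usual 1 ms tick
  fract_us += 24;                     // 1024 - 1000 us left over
  if (fract_us >= 1000)               // ~every 42nd overflow...
  {
    fract_us -= 1000;
    ms_count += 1;                    // ...the tick is worth 2 ms
  }
}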
I'll suggest the standard engineer's solution: divide and conquer. For small delays, use a scheme that doesn't depend on the millis counter. For large delays, the fractional error on the 16 MHz device is at most 1 ms out of N, i.e. |1|/N, which quickly becomes very small; so for large delays the existing routine is probably fine.
Here is one possibility, derived from westfw's suggestion:
void new_delay(unsigned long ms)
{
  while (ms--)
  {
#if F_CPU >= 16000000L
    delayMicroseconds(994);   // tuned so one loop pass ~= 1 ms at 16 MHz
#else
    delayMicroseconds(TBD);   // 8 MHz value still to be determined
#endif
  }
}
The value passed to delayMicroseconds was tuned using code I will post below. The fractional error for small delays is 0.033%, which is very tight. This code is also 24 bytes smaller than the existing delay function, presumably because it doesn't need to instantiate another unsigned long to track the millis() start value, and the loop-ending condition is simpler as well.
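To see roughly where 994 comes from (my own back-of-envelope; the ~6 us figure is an assumption, not a measurement): if each loop pass costs about 6 us of overhead for the function call, the ms-- decrement, and the loop branch, then

  994 us busy-wait + ~6 us overhead ≈ 1000 us per pass

leaving the measured residual of about 0.33 us per millisecond, i.e. the 0.033% quoted above.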
I know there is some concern about using tuned values, but there seems to be little choice when trying to get extremely precise timings. The delayMicroseconds function already uses code tuned separately for 16 MHz and 8 MHz devices; I borrowed the #if F_CPU construction from that routine. Presumably, if delayMicroseconds() is calibrated for each release, then delay() could be as well. To explore the effect of a change in the underlying timings, imagine that the function call overhead and busy-loop timing change by 2 us (32 instructions). The fractional error would then change by 2/1000 = 0.2%, still 25 times smaller than the current 5% error for a small delay of 20 ms.
At least for small delays, this seems like a good solution. We could keep this code for large delays as well, but at some point the fractional error will exceed the 1 ms absolute error of the millis timer. If accuracy is valued more than code size, we should revert to a millis()-based method for long delays.
The crossover point is where the fractional errors are equal: 0.033% = (1/N) * 100%, which yields a crossover delay of N = 100/0.033 ≈ 3030 ms. I tested this and indeed found the measured delay at 3030 to be 3030.917 ms, very nearly 1 ms in error.
There was some variability in the measured fractional error, and the fractional error for the millis() approach is somewhat smaller ((1 - et)/N rather than 1/N). These considerations led me to switch over to the millis counter sooner, at a value of 1500 ms.
The code below is nicely accurate, but now 70 bytes larger than the original function. I don't know whether that is acceptable or not. Alternative strategies are also welcome.
/* Test new delay subroutines */

#define NUM_TESTS 100
#define N_MS 20

/*
// Original from wiring.c:
void new_delay(unsigned long ms)
{
  unsigned long start = millis();
  while (millis() - start <= ms)
    ;
}
*/

/*
// Microseconds-only version:
void new_delay(unsigned long ms)
{
  while (ms--)
  {
#if F_CPU >= 16000000L
    delayMicroseconds(994);
#else
    delayMicroseconds(TBD);
#endif
  }
}
*/
void new_delay(unsigned long ms)
{
  if (ms < 1500)   // Short delays use microseconds for timing
  {
    while (ms--)
    {
#if F_CPU >= 16000000L
      delayMicroseconds(994);
#else
      delayMicroseconds(TBD);
#endif
    }
  }
  else             // Long delays use milliseconds for timing
  {
    unsigned long start = millis();
    while (millis() - start < ms)
      ;
  }
}
void setup()
{
  Serial.begin(115200);
}

void loop()
{
  unsigned long i;
  unsigned long tic, toc;
  double avg_delay;

  avg_delay = 0;
  for (i = 0; i < NUM_TESTS; i++)
  {
    tic = micros();
    new_delay(N_MS);
    toc = micros();
    avg_delay += (toc - tic);
  }
  avg_delay -= 3.262 * NUM_TESTS;  // subtract overhead of tic & toc,
                                   // measured previously on a 16 MHz device
  avg_delay /= (1000.0 * NUM_TESTS);

  Serial.print("delay(");
  Serial.print(N_MS);
  Serial.print(")=");
  Serial.println(avg_delay, 3);
  Serial.println("------------------------");
}
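For reference, the 3.262 us tic/toc overhead subtracted above came from a separate measurement. A sketch along these lines could reproduce it; this is my illustration of the idea, not the code actually used, and calibrate_overhead is a made-up name:

void calibrate_overhead()
{
  unsigned long tic, toc, sum = 0;
  int i;
  for (i = 0; i < 1000; i++)
  {
    tic = micros();
    toc = micros();   // empty tic/toc pair: pure measurement overhead
    sum += (toc - tic);
  }
  Serial.print("tic/toc overhead (us) = ");
  Serial.println(sum / 1000.0, 3);
}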