bug in wiring.c millis() fctn, Sc2Card.cpp

It looks to me like the micros()/millis() functions in wiring.c have a somewhat nasty bug
that could lead to hangs after about 50 days of uptime.

The problem is that the 32 bit overflow_count and timer0_millis that are incremented from the
overflow interrupt handler in wining.c can themselves overflow. Here is some 'bc -l' code for overflow_count:

  f_cpu = 16000000
  f_timer = f_cpu / 64
  f_overflow = f_timer / 256
  oos = 2^32 / f_overflow   
  ood = oos / (60*60*24)
  ood
  50.90331610074074074074

Where oos is the numbe of seconds until the overflow overflows, and ood the
same value in days

The timer0_millis value has essentially the same problem and overflows
at about the same rate (since f_overflow ~= 1 kHz).

This bug could probably manifest itself in many different codes that uses the
millis() fctn (or others in wiring.c that use the same counter). One example
is in Sd2Card.cpp, where the SD card interface could trigger a hang
in waitNotBusy(), since it just uses a a differential time measurement based on
millis() to figure out when a timeout should happen.

I could be wrong and the counter gets reset somewhere somehow, but I didn't
find any code that does it.

I think the best fix is to just use an explicit 64 bit type for the overflow
counter. This would generate slightly slower code, but nobody should be
depending no millis() or micros() to be that precise anyway.

Also, I don't see any point in doing any time computation math in the handler.
It saves a tiny bit (and therefore increases precision) in millis()
average case, but doesn't do anything for the worst case, since the interrupt
could trigger during a micros() call anyway and produce latency. It would be
better to follow the general design principle of keeping handlers maximally
simple and just do something like:

  ISR (TIMER0_OVF_vect)
  {
    // Note that we don't need to use an atomic block here, as we're inside
    // an ordinary ISR block, so interrupts are globally deferred anyway.
    timer0_overflow_count++;
  }

And then do all the computation in millis()/micros()/etc().

brittonkerin:
It looks to me like the micros()/millis() functions in wiring.c have a somewhat nasty bug that could lead to hangs after about 50 days of uptime.

What are you saying exactly? The documentation for millis() says:

This number will overflow (go back to zero), after approximately 50 days.

It is known that it will overflow. Why will that cause a hang?

This bug could probably manifest itself in many different codes that uses the millis() fctn (or others in wiring.c that use the same counter).

It is well-documented that if you calculate time differences by subtracting, the modulo-32 bit arithmetic will still give the right result.

One example is in Sd2Card.cpp, where the SD card interface could trigger a hang in waitNotBusy(), since it just uses a a differential time measurement based on millis() to figure out when a timeout should happen.

The waitNotBusy function does a subtraction, as discussed above.

sounds like this guy has an improvement to the code base,

It comes down to do we document things and leave, or do we document and implement improvements.

brittonkerin:
I think the best fix is to just use an explicit 64 bit type for the overflow
counter. This would generate slightly slower code, but nobody should be
depending no millis() or micros() to be that precise anyway.

An improvement which he admits would slow it down. And which he doesn't mention, generates lengthier code.

The fact is, that coded correctly, programs do not "hang after 50 days of uptime". That is pure speculation, and if it did occur, is because the program is not written correctly.

Let me show you something:

volatile unsigned long long foo;

void setup ()
  {
  foo = 22;
  foo ++;
  
  unsigned long long bar = foo;
  
  unsigned long long elapsed = foo - bar;
  }

void loop () {}

This simple program, which uses 64 bit integers, takes 718 bytes of program memory when compiled. Change "long long" to "long" and it takes 560 bytes. That's another 158 bytes for a couple of pieces of arithmetic. Multiply that out for a program that does a lot of timing and you are going to gobble up program memory pretty quickly.

Or, you could just do the arithmetic correctly for the unsigned long that millis() returns, and your program never hangs.

Ohhhhh

this does seem to be a little .

could we agree that the millis() function wraps,
after a time
and that it is documented.

was of dealing with this are to re code the millis function
or to deal with the feature in our own code.

There is no problem with rollover for timeouts. You can even use 16 bit variables for short times like this.

  uint16_t t0 = millis();
  // do something that takes up to a few seconds
  uint16_t t1 = millis();

   // print elapsed time
  Serial.println(t1 - t0);

Here is an example where rollover happens between t0 and t1.

uint16_t t0 = 0XFFFF;
uint16_t t1 = 1;

void setup() {
  Serial.begin(9600);
  Serial.println(t1 - t0);
}

void loop() {}

This sketch prints "2", the correct elapsed time.