millis() function bug and improvement

This is partly a bug report, but mainly (I hope) an improvement in the millis() function resolution that can increase the millis resolution in a factor of 128, or from 9 hours to 50 days.

I came to this when wondering why does the millis() function only works up to 9 hours, and after analyzing the code from wiring.c I got to the conclusion that the formula timer0_overflow_count * 64UL * 2UL / (F_CPU / 128000UL) truncates the unsigned long when multiplying by 128, but because we were working with integers we were forced to this.

But then it occurred to me that I could think in the opposite direction, taking advantage in the fact that we were multiplying by factors of two. So, instead of simplifying the formula, I expanded it like this:

timer0_overflow_count * 64UL * 2UL * 128UL / (F_CPU / 1000UL);
timer0_overflow_count * 16384UL / (F_CPU / 1000UL);
timer0_overflow_count * 65536UL / (F_CPU / 1000UL * 4UL);
timer0_overflow_count * 65536UL / (F_CPU / 250UL);
(timer0_overflow_count / (F_CPU / 250) + (timer0_overflow_count % (F_CPU / 250)) / (F_CPU / 250)) * 65536

Ok, but I'll get to the point: From this formula, and some bit shifting, I came to this function:

#define DIVISOR (unsigned long)(F_CPU / 250UL)
unsigned long hi_millis() {
** return (unsigned long)((timer0_overflow_count / DIVISOR) << 16) + (unsigned long)(((timer0_overflow_count % DIVISOR) << 16) / DIVISOR);**
}

And then, I thought I could even add some rounding:

unsigned long hi_millis() {
** return (unsigned long)((timer0_overflow_count / DIVISOR) << 16) + (unsigned long)((((timer0_overflow_count % DIVISOR) << 16) + DIVISOR / 2UL) / DIVISOR);**
}

I also created a simple program to compare the different values generated by the millis() function, this hi_millis() function and the floating point value that would be calculated by the original formula. I think you will like the values. Of course I didn't use the whole unsigned long range, but skipped to the vecinity where the millis() function wraps.

But I mentioned at the beginning that it was partly a bug report. Well, the widely used delay() is defined like this:

void delay(unsigned long ms) {
** unsigned long start = millis();**
** while (millis() - start < ms) ;**
}

If you analyze this function it seems to work well if the millis() function went the whole unsigned long range. But it doesn't. The function wraps around 34359737 so, if you happen to call the function with a value of, lets say 60000 (one minute), and the start value is 34359730, after seven milliseconds the function wraps and millis() - start becomes 4260607566, which is clearly not less than 60000, so the function, instead of waiting one minute, waits only for 7 milliseconds. The good part is that this only can happen once every nine hours aprox. The bad part is that Murphy's Laws are omnipresent and this will happen when it can do the most damage.

By using the suggested function, you still have this risk because the value of millis wraps two times (because the timer0_overflow_count wraps also, some time later. I think this is inevitable, but with a resolution of 50 days, I think it's much more less probable.

One more thing: I think that, by using some assembler we could avoid having to divide once for the first division and then again for the modulo. At least in the 80x86 assembler you get both the division and reminder (modulo) in one operation, but I am completelly ignorant of assembler for the Arduino. This way the function could become as fast as the original.

Well, that's it. I would love to hear some comments, and would be delighted to be of any help. I'm attaching the source code for the test.

#include <iostream>
#include <iomanip.h>

using namespace std;

#define F_CPU 16000000

// Function millis as defined in \arduino-0010\hardware\cores\arduino\wiring.c

volatile unsigned long timer0_overflow_count;

unsigned long millis() {
      // timer 0 increments every 64 cycles, and overflows when it reaches
      // 256.  we would calculate the total number of clock cycles, then
      // divide by the number of clock cycles per millisecond, but this
      // overflows too often.
      //return timer0_overflow_count * 64UL * 256UL / (F_CPU / 1000UL);

      // instead find 1/128th the number of clock cycles and divide by
      // 1/128th the number of clock cycles per millisecond
      return timer0_overflow_count * 64UL * 2UL / (F_CPU / 128000UL);

}

#define DIVISOR (unsigned long)(F_CPU / 250UL)

unsigned long hi_millis() {

// From timer0_overflow_count * 64UL * 2UL / (F_CPU / 128000UL), you can get: timer0_overflow_count * 65536UL / (F_CPU / 250UL);
// which, with some bit shifting, we can write as (we even can get some rounding by adding DIVISOR / 2):
  return (unsigned long)((timer0_overflow_count / DIVISOR) << 16) + (unsigned long)((((timer0_overflow_count % DIVISOR) << 16) + DIVISOR / 2UL) / DIVISOR);
}

int main (int argc, char *argv[]) {
    
    char buffer[1024];
    unsigned long M1,M2;
    
    timer0_overflow_count = 0xFFFFFFFF / 128 - 10;
    do {

      timer0_overflow_count++;
      M1 = millis();
      M2 = hi_millis();

      cout << "timer0=" << timer0_overflow_count << ", millis=" << M1 << ", hi_millis=" << M2 << ", float=" << fixed << setprecision(2) << timer0_overflow_count * 64.0 * 2.0 / (F_CPU / 128000.0) << ")" << endl;
      if (timer0_overflow_count == 0) break;
      
      if(M1 == 0) {
//      system("PAUSE");
        cout  << endl << "---- millis Rollover ----" << endl;
        cout << endl << "increment timer0_overflow_count by " << 0xFFFFFFFF / 128 - 10 << " (about 9 hrs)" << endl;
        timer0_overflow_count += 0xFFFFFFFF / 128 - 10;
      }

      if (M2 == 0) {
        cout  << endl <<  "============================" << endl << "==== hi_millis Rollover ====" << endl << "============================" << endl << endl;
        system("PAUSE");
      }

    } while (timer0_overflow_count > 0);

    cout  << endl << "**** timer0_overflow_count Rollover ****" << endl;

    system("PAUSE");
    return EXIT_SUCCESS;
}