I was thinking about how we might make the SIG_OVERFLOW0 routine slightly more efficient.
For 16MHz Arduinos, using the 64 prescale and 256 counts gets us close to 1 ms per overflow (actually 1024us). However, we do not need to use an unsigned long for the calculation if we reduce the respective values by their gcd. So, instead of using 64*256=16384 and 16*1000=16000, we could divide both by their gcd of 128 and use 128 and 125 instead.
This has the benefit of only requiring a BYTE accumulator, instead of a long.
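The reduction is easy to check on a desktop compiler. Here is a minimal sketch, assuming two helper names of my own (gcd_u32 and reduce_ratio, neither is part of the Arduino core):

```c
#include <stdint.h>

/* Euclid's algorithm. Hypothetical helper, not part of the Arduino core. */
static uint32_t gcd_u32(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Cycles per overflow and cycles per millisecond after dividing by their gcd. */
typedef struct {
    uint32_t per_overflow;  /* reduced cycles per timer overflow */
    uint32_t per_ms;        /* reduced cycles per millisecond */
} reduced_ratio;

static reduced_ratio reduce_ratio(uint32_t cycles_per_overflow,
                                  uint32_t cycles_per_ms) {
    uint32_t g = gcd_u32(cycles_per_overflow, cycles_per_ms);
    reduced_ratio r = { cycles_per_overflow / g, cycles_per_ms / g };
    return r;
}
```

Calling reduce_ratio(64UL * 256UL, 16000UL) reduces 16384 and 16000 to 128 and 125, so the per-overflow surplus is 3 and the accumulator never has to hold more than 124 + 3 = 127.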
Here is the proposed code:
volatile unsigned char timer0_period_count = 0; // accumulator: 1 byte
volatile unsigned long timer0_millis = 0; // millis stays at unsigned long
SIGNAL(SIG_OVERFLOW0) {
  // timer 0 prescale factor is 64 and the timer overflows at 256
  // 64*256=16384, at 16MHz this is 1024us, 1ms=16000 cycles, the gcd is 128
  // 64*256/128=128, and 16000/128=125, so 128/125 = 1 with a remainder of 3.
  timer0_period_count += 3; // This is equivalent to +128-125, so ..
  timer0_millis++; // .. we are guaranteed 1 ms plus a bit, so count it.
  if (timer0_period_count >= 125) { // But, there might be another millisecond, if so ..
    timer0_period_count -= 125; // .. correct the count, and ..
    timer0_millis++; // .. count the millisecond.
  }
}
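To convince myself the accumulator does not drift, the handler logic can be run off-target. This is only a sketch: the sim_ names are made up for the test, the accumulator is called the same thing throughout, and the comparison is written as >= 125 so that a count of exactly 125 is also corrected.

```c
#include <stdint.h>

static uint8_t  sim_period_count = 0;  /* stands in for timer0_period_count */
static uint32_t sim_millis = 0;        /* stands in for timer0_millis */

/* One simulated timer 0 overflow: 1024 us at 16 MHz, prescale 64, 256 counts. */
static void sim_overflow(void) {
    sim_period_count += 3;          /* surplus of 128 - 125 per overflow */
    sim_millis++;                   /* every overflow is at least 1 ms */
    if (sim_period_count >= 125) {  /* a full extra millisecond has built up */
        sim_period_count -= 125;
        sim_millis++;
    }
}

/* Run n overflows from a clean state and return the millisecond count. */
static uint32_t sim_run(uint32_t n) {
    sim_period_count = 0;
    sim_millis = 0;
    for (uint32_t i = 0; i < n; i++) {
        sim_overflow();
    }
    return sim_millis;
}
```

After 125 overflows (125 * 1024us = 128000us) sim_run returns 128, and after 1000 overflows it returns 1024, with the accumulator back at zero both times.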
Phase Correct:
If we changed to phase correct, then we can do similar. Working in microseconds, each overflow takes 2040us, and the gcd with 1000us is 40:
volatile unsigned char timer0_period_count = 0; // accumulator: 1 byte
volatile unsigned long timer0_millis = 0; // millis stays at unsigned long
SIGNAL(SIG_OVERFLOW0) {
  // timer 0 prescale factor is 64 and the timer overflows at 510
  // 64*510=32640 cycles, at 16MHz this is 2040us, 1ms=1000us, the gcd is 40
  // 2040/40=51, and 1000/40=25, so 51/25 = 2 with a remainder of 1.
  timer0_period_count++; // This is equivalent to +51-2*25, so ..
  timer0_millis += 2; // .. always step by 2 ms, but ...
  if (timer0_period_count >= 25) { // ... need to correct every 25 times
    timer0_period_count = 0; // Exactly one extra millisecond has built up, so reset the count ...
    timer0_millis++; // ... and count the extra millisecond
  }
}
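This is fun. If we used an 8MHz clock, then it would interrupt every 4080us, with a gcd of 40 against 1000us, which gives 102 and 25, i.e. 4 ms with a remainder of 2. So, the code keeps the same shape: on every overflow we add 4 to millis and 2 to the count, and whenever the count reaches 25 we subtract 25 and add one more millisecond.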
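Both phase correct cases can be checked with one small desktop simulation. A sketch only, working in units of 40us (25 units = 1 ms); the pc_ names and the two constants are my own, derived from 2040us = 2 ms + 40us and 4080us = 4 ms + 80us:

```c
#include <stdint.h>

/* Per-overflow constants in units of 40 us, where 25 units make 1 ms. */
typedef struct {
    uint8_t whole_ms;   /* whole milliseconds added on every overflow */
    uint8_t remainder;  /* leftover 40 us units added to the accumulator */
} pc_step;

static const pc_step PC_16MHZ = { 2, 1 };  /* 2040 us = 2 ms + 1 unit  */
static const pc_step PC_8MHZ  = { 4, 2 };  /* 4080 us = 4 ms + 2 units */

/* Simulate n overflows and return the resulting millisecond count. */
static uint32_t pc_run(pc_step step, uint32_t n) {
    uint8_t  count = 0;
    uint32_t millis = 0;
    for (uint32_t i = 0; i < n; i++) {
        count += step.remainder;
        millis += step.whole_ms;
        if (count >= 25) {  /* 25 units of 40 us make one extra millisecond */
            count -= 25;
            millis++;
        }
    }
    return millis;
}
```

Over 25 overflows this gives 51 ms at 16MHz (25 * 2040us = 51000us) and 102 ms at 8MHz (25 * 4080us = 102000us), so neither variant drifts.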
8MHz Phase Correct with prescaler=8:
For 8MHz, we might opt for a prescaler of 8, which would result in interrupts every 510us, with a gcd of 10 against 1000us, which gives 51 and 100. The code would change to:
volatile unsigned char timer0_period_count = 0; // accumulator: 1 byte
volatile unsigned long timer0_millis = 0; // millis stays at unsigned long
SIGNAL(SIG_OVERFLOW0) {
  // timer 0 prescale factor is 8 and the timer overflows at 510
  // 8*510=4080 cycles, at 8MHz this is 510us, the gcd with 1000us is 10
  // 510/10=51, and 1000/10=100, so 51/100 = 0 with a remainder of 51.
  timer0_period_count += 51; // Usually no step, but...
  if (timer0_period_count >= 100) { // ... roughly every 2 times a millisecond has built up
    timer0_period_count -= 100; // Correct the count ...
    timer0_millis++; // ... and increment millis
  } // The leftover 1 in each 51 adds an extra millisecond every 100 overflows
}
Food for thought? :o
Smarter people than I will be needed to generalize it. 
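As a starting point for generalizing, the three constants each handler needs (whole ms per overflow, accumulator increment, accumulator modulus) can be derived from the clock, prescaler and counts per overflow. This is a rough sketch with made-up names (millis_step, derive_step, fits_in_byte), and it assumes the clock frequency is a whole number of kHz:

```c
#include <stdint.h>

typedef struct {
    uint32_t whole_ms;   /* whole milliseconds added on every overflow */
    uint32_t remainder;  /* accumulator increment per overflow */
    uint32_t modulus;    /* accumulator value worth one millisecond */
} millis_step;

/* Euclid's algorithm. Hypothetical helper. */
static uint32_t gcd32(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* ticks_per_overflow is 256 for fast PWM or 510 for phase correct on timer 0.
   Assumes f_cpu_hz is a multiple of 1000. */
static millis_step derive_step(uint32_t f_cpu_hz, uint32_t prescale,
                               uint32_t ticks_per_overflow) {
    uint32_t cycles_per_overflow = prescale * ticks_per_overflow;
    uint32_t cycles_per_ms = f_cpu_hz / 1000UL;
    uint32_t g = gcd32(cycles_per_overflow, cycles_per_ms);
    uint32_t num = cycles_per_overflow / g;  /* reduced overflow period */
    uint32_t den = cycles_per_ms / g;        /* reduced millisecond */
    millis_step s = { num / den, num % den, den };
    return s;
}

/* The accumulator peaks at (modulus - 1) + remainder before correction. */
static int fits_in_byte(millis_step s) {
    return (s.modulus - 1UL) + s.remainder <= 255UL;
}
```

For the cases in this post, derive_step gives (1, 3, 125) for 16MHz/64/256, (2, 1, 25) for 16MHz/64/510, (4, 2, 25) for 8MHz/64/510 and (0, 51, 100) for 8MHz/8/510, and all four pass fits_in_byte. On the AVR itself these would be compile-time constants rather than computed at run time.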
David