 millis() / 16ms resolution?

hi!

i have a question about millis():

can we use

#if F_CPU <= 64UL * 256UL * 1000UL
    timer0_millis += (64UL * 256UL) / (F_CPU / 1000);
    timer0_clock_cycles += (64UL * 256UL) % (F_CPU / 1000);
    if (timer0_clock_cycles >= F_CPU / 1000) {
        timer0_clock_cycles -= F_CPU / 1000;
        timer0_millis++;
    }
#else

#endif

instead of the while loop in SIGNAL(SIG_OVERFLOW0)

btw: timer0_clock_cycles could be an int then…

did it already change since 0012?

possibly it would save some(?) microseconds (4 byte cmp, 4 byte sub, 2 byte cmp/sub instead of 4 byte cmp/sub)?
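
For readers following along, here is a minimal host-side sketch of the proposal above, assuming a 16 MHz board. The names F_CPU_HZ, timer0_state, overflow_once() and run_overflows() are made up for the illustration and are not part of wiring.c.

```c
#include <stdint.h>

/* Assumption for this sketch: a 16 MHz board. */
#define F_CPU_HZ 16000000UL
#define CYCLES_PER_MS (F_CPU_HZ / 1000UL)      /* 16000 */
#define CYCLES_PER_OVERFLOW (64UL * 256UL)     /* prescaler 64, 8-bit timer: 16384 */

/* Hypothetical stand-in for the two globals in wiring.c. */
struct timer0_state {
    uint32_t millis;
    uint16_t clock_cycles;   /* never exceeds 16383 at 16 MHz, so 16 bits suffice */
};

/* One timer0 overflow: constant quotient, constant remainder, one compare. */
static void overflow_once(struct timer0_state *t)
{
    t->millis += CYCLES_PER_OVERFLOW / CYCLES_PER_MS;                    /* +1 */
    t->clock_cycles += (uint16_t)(CYCLES_PER_OVERFLOW % CYCLES_PER_MS);  /* +384 */
    if (t->clock_cycles >= CYCLES_PER_MS) {
        t->clock_cycles -= (uint16_t)CYCLES_PER_MS;
        t->millis++;
    }
}

/* Run n overflows starting from zero and return the resulting state. */
static struct timer0_state run_overflows(uint32_t n)
{
    struct timer0_state t = {0, 0};
    while (n--)
        overflow_once(&t);
    return t;
}
```

After 125 overflows (125 * 16384 = 2,048,000 cycles) this gives exactly 128 ms with no cycles left over, which is what the arithmetic says it should.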

-arne

Arne, this looks interesting and seems worthy of further analysis. If nothing else, it teaches me that the wiring.c line

while (timer0_clock_cycles > clockCyclesPerMicrosecond() * 1000UL) {

should be changed to

while (timer0_clock_cycles >= clockCyclesPerMicrosecond() * 1000UL) {
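
The difference between the two comparisons only shows when the accumulated cycles land exactly on a millisecond boundary. A small host-side simulation, assuming 16 MHz (millis_after() is a hypothetical helper, not code from wiring.c), makes that visible:

```c
#include <stdint.h>

#define CYCLES_PER_MS 16000UL               /* assumes F_CPU = 16 MHz */
#define CYCLES_PER_OVERFLOW (64UL * 256UL)  /* 16384 */

/* Simulate n overflows with the while loop from wiring.c.
 * inclusive == 0 compares with '>' (as in 0012), anything else with '>='.
 * The leftover cycle count is written to *cycles_out. */
static uint32_t millis_after(uint32_t n, int inclusive, uint32_t *cycles_out)
{
    uint32_t cycles = 0, ms = 0;
    while (n--) {
        cycles += CYCLES_PER_OVERFLOW;
        while (inclusive ? cycles >= CYCLES_PER_MS : cycles > CYCLES_PER_MS) {
            cycles -= CYCLES_PER_MS;
            ms++;
        }
    }
    *cycles_out = cycles;
    return ms;
}
```

After 125 overflows exactly 128 ms have passed. The '>' version reports 127 with 16000 cycles still in the accumulator, the '>=' version reports 128 with 0 left. The '>' version catches up on the following overflow, so the error is a one-millisecond lag that lasts for one overflow period.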

I did a cursory test, and your code seems to produce the correct values and is MUCH faster (factor of 20!)

Mikal

HOOOOOO! :-)

wag tail

-Arne

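if timer0_clock_cycles is an unsigned int and F_CPU is less than 32000UL*1000UL,

then we could do without the "#if / #else / #endif",

because: for F_CPU above 64UL*256UL*1000UL the quotient (64UL * 256UL) / (F_CPU / 1000) is 0 and a += 0 is optimized away (IIRC), and after the += timer0_clock_cycles is always less than F_CPU/1000*2, which still fits into 16 bits.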
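
The range argument can be checked with a few lines of C. This is only a sketch: accumulator_upper_bound() and fits_unsigned_16() are hypothetical helpers, and the bound is a safe upper limit, not necessarily a value that is actually reached.

```c
#include <stdint.h>

#define CYCLES_PER_OVERFLOW (64UL * 256UL)   /* 16384 */

/* Upper bound on the accumulator right after the += and before the compare:
 * the stored value is at most cpm - 1, and each overflow adds 16384 % cpm. */
static uint32_t accumulator_upper_bound(uint32_t f_cpu_hz)
{
    uint32_t cpm = f_cpu_hz / 1000UL;   /* clock cycles per millisecond */
    return (cpm - 1UL) + (CYCLES_PER_OVERFLOW % cpm);
}

/* Non-zero if that bound fits into a 16-bit unsigned int. */
static int fits_unsigned_16(uint32_t f_cpu_hz)
{
    return accumulator_upper_bound(f_cpu_hz) <= UINT16_MAX;
}
```

For 16 MHz the bound is 16383 and for 32 MHz it is 48383, both well inside 16 bits. At 50 MHz it is 66383, which no longer fits.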

ohoh somehow the subject is a little bit misleading... :-) it should be "efficiency up from 20% to 30%...?" or so... (the movie "1984" was disturbing for little Arne...)

hey

i did some timing tests, too:

1. i called a copy of that signal handler function 20e6 times, without cli(), so that millis() still worked (the loop counted backwards from 20L*1000000L, because i believed that checking for >0 is an easier loop condition…)… result: 137757 msec

2. i called 20e6 times a further enhanced version… result: 69413msec (1.98 times faster - not 20 times though…)

Here is my newest proposal (diff 0012/…/wiring.c wiring.c-millis):

27c27,32
< volatile unsigned long timer0_clock_cycles = 0;
---
> #if F_CPU > 32500L*1000L
> typedef unsigned long t0cc_t;
> #else
> typedef unsigned int t0cc_t;
> #endif
> volatile t0cc_t timer0_clock_cycles = 0;
32,36c37,50
< // timer 0 prescale factor is 64 and the timer overflows at 256
< timer0_clock_cycles += 64UL * 256UL;
< while (timer0_clock_cycles > clockCyclesPerMicrosecond() * 1000UL) {
< timer0_clock_cycles -= clockCyclesPerMicrosecond() * 1000UL;
< timer0_millis++;
---
> #if F_CPU % 1000 != 0
> #warning F_CPU is no integer multiple of 1000 Hz
> #endif
> // clock cycles per milli second
> #define CPM (F_CPU / 1000UL)
> // timer 0 prescale factor is 64 and the timer overflows at 256
> #define BF (64U * 256U)
> register t0cc_t tmp = timer0_clock_cycles + BF % CPM;
> if (tmp >= CPM) {
> timer0_clock_cycles = tmp - CPM;
> timer0_millis += BF / CPM + 1;
> } else {
> timer0_clock_cycles = tmp;
> timer0_millis += BF / CPM;

bye

Hi Arne,

I have been playing some with your new overflow handler, and mostly have replicated your results. I’ve also validated it by showing that your function yields the same values for timer0_millis and timer0_clock_cycles as the 0012 handler when F_CPU is 16MHz, 8MHz, or even 32MHz.

The timings I get when doing 100 million loops are as follows:

F_CPU = 16000000
Old = 613163ms
Arne’s = 265139ms
2.31 speed improvement

F_CPU = 8000000
Old = 961640ms
Arne’s = 265593ms
3.62 speed improvement

F_CPU = 32000000
Old = 438924ms
Arne’s = 262063ms
1.67 speed improvement
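
The ratios above can be rechecked from the raw timings, and the same numbers give a rough per-call cost. A small sketch (speedup() and us_per_call() are hypothetical helper names; the per-call figure includes the overhead of the timing loop itself):

```c
/* Speed-up factor: old time divided by new time (both in ms). */
static double speedup(unsigned long old_ms, unsigned long new_ms)
{
    return (double)old_ms / (double)new_ms;
}

/* Average time per call in microseconds for a run of `calls` iterations. */
static double us_per_call(unsigned long total_ms, unsigned long calls)
{
    return (double)total_ms * 1000.0 / (double)calls;
}
```

With the 16 MHz figures, speedup(613163, 265139) comes out a little above 2.31, and us_per_call(613163, 100000000) is about 6.13 microseconds per call for the old handler against about 2.65 for the new one.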

The code I used to time and validate your results is as follows. Would you mind trying my sketch to see if you get the same results? If so, I’ll type up a summary and present it to the developer’s forum.

Mikal

volatile unsigned long old_timer0_clock_cycles = 0;
volatile unsigned long old_timer0_millis = 0;
void old_overflow()
{
    old_timer0_clock_cycles += 64UL * 256UL;
    while (old_timer0_clock_cycles > clockCyclesPerMicrosecond() * 1000UL) {
        old_timer0_clock_cycles -= clockCyclesPerMicrosecond() * 1000UL;
        old_timer0_millis++;
    }
}

#if F_CPU > 32500L*1000L
typedef uint32_t t0cc_t;
#else
typedef uint16_t t0cc_t;
#endif
volatile t0cc_t arnes_timer0_clock_cycles = 0;
volatile unsigned long arnes_timer0_millis = 0;

#if F_CPU % 1000 != 0
#warning F_CPU is no integer multiple of 1000 Hz
#endif
// clock cycles per milli second
#define CPM (F_CPU / 1000UL)
// timer 0 prescale factor is 64 and the timer overflows at 256
#define BF (64U * 256U)
void arnes_overflow()
{
    register t0cc_t tmp = arnes_timer0_clock_cycles + BF % CPM;
    if (tmp > CPM) {
        arnes_timer0_clock_cycles = tmp - CPM;
        arnes_timer0_millis += BF / CPM + 1;
    } else {
        arnes_timer0_clock_cycles = tmp;
        arnes_timer0_millis += BF / CPM;
    }
}

void setup()
{
    Serial.begin(9600);
    Serial.println("Hello!");
    unsigned long start = millis();
    for (long i=100000000; i; --i)
        old_overflow();
    start = millis() - start;
    Serial.print("The old overflow function took ");
    Serial.println(start);
    start = millis();
    for (long i=100000000; i; --i)
        arnes_overflow();
    start = millis() - start;
    Serial.print("Arne's overflow function took ");
    Serial.println(start);

    old_timer0_clock_cycles = old_timer0_millis = 0;
    arnes_timer0_clock_cycles = arnes_timer0_millis = 0;

    for (unsigned long i=0; i<100000000; ++i)
    {
        if (old_timer0_clock_cycles != arnes_timer0_clock_cycles || old_timer0_millis != arnes_timer0_millis)
        {
            Serial.println("Mismatch!");
            exit(1);
        }
        old_overflow(); arnes_overflow();
    }
    Serial.println("Both functions calculated the same values OK!");
}

void loop(){}

hi!

it compiled correctly:
Binary sketch size: 2800 bytes (of a 14336 byte maximum)

and this is its output:

Hello!
The old overflow function took 688788
Arne’s overflow function took 347066
Both functions calculated the same values OK!

looks good… is my arduino at 20MHz then?
doesnt fit to ur 16MHz values…

i have a new idea:
we could do just a 2-byte count operation (+1/-1) in the signal handler (only when the counter runs out do we do all the work there) and the time-consuming part (division by CPM, storing the remainder in another volatile variable) in millis()…:

#if F_CPU/1000 > 65535
typedef unsigned long t0c_t;
#else
typedef unsigned int t0c_t;
#endif
#define MOC ( (sizeof(timer0_overflow_counter)==2 ? 65535UL/(F_CPU/1000) : 1) * (F_CPU / 1000))
volatile t0c_t timer0_overflow_counter = MOC;
volatile t0c_t timer0_remainder_cycles = 0;
volatile unsigned long timer0_millis = 0;

SIGNAL(SIG_OVERFLOW0)
{
    timer0_overflow_counter--;
    if (timer0_overflow_counter == 0) {
        timer0_overflow_counter = MOC;
        timer0_millis += (((uint32_t)MOC)*64UL*256UL) / (F_CPU / 1000);
    }
}

unsigned long millis()
{
    register unsigned long m;
    register uint8_t oldSREG = SREG;

    // disable interrupts while we read timer0_millis or we might get an
    // inconsistent value (e.g. in the middle of the timer0_millis++)
    cli();
    m = timer0_millis;
    if (timer0_overflow_counter == MOC)
        SREG = oldSREG;
    else if (timer0_overflow_counter == MOC-1) {
        register unsigned long tmp = (64UL * 256UL)%(F_CPU / 1000) + timer0_remainder_cycles;
        timer0_overflow_counter = MOC;
        if (tmp >= F_CPU / 1000) {
            m += (64UL * 256UL) / (F_CPU / 1000) + 1;
            timer0_millis = m;
            SREG = oldSREG;
            tmp -= F_CPU / 1000;
        } else {
            if ((64UL * 256UL) / (F_CPU / 1000) > 0) {
                m += (64UL * 256UL) / (F_CPU / 1000);
                timer0_millis = m;
            }
            SREG = oldSREG;
        }
        timer0_remainder_cycles = tmp;
    } else {
        register unsigned long tmp = (((unsigned long)(((t0c_t)MOC)-timer0_overflow_counter)) << 16) >> 2; // 64*256 == <<(6+8) == <<(16-2)
        timer0_overflow_counter = MOC;
        tmp += timer0_remainder_cycles;
        register uint32_t d = tmp / (F_CPU / 1000);
        m += d;
        timer0_millis = m;
        SREG = oldSREG;
        timer0_remainder_cycles = tmp - d * (F_CPU / 1000);
        // why no side effect from above div?
    }

    return m;
}

^^^ untested… & changed…
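
Stripped down to its core, the idea of counting in the handler and dividing in millis() might look like the host-side sketch below. It is a simplified illustration assuming 16 MHz, not Arne's code: the names pending_overflows, remainder_cycles, total_millis, on_overflow() and read_millis() are invented here, interrupt masking is left out, and the counter is simply assumed not to wrap between two reads.

```c
#include <stdint.h>

#define CYCLES_PER_MS 16000UL               /* assumes F_CPU = 16 MHz */
#define CYCLES_PER_OVERFLOW (64UL * 256UL)  /* 16384 */

/* Hypothetical state for the sketch; not the variables from the post. */
static uint16_t pending_overflows = 0;   /* bumped by the "interrupt" */
static uint32_t remainder_cycles = 0;    /* cycles not yet worth a full ms */
static uint32_t total_millis = 0;

/* What the overflow interrupt would do: one cheap 2-byte increment. */
static void on_overflow(void)
{
    pending_overflows++;
}

/* What millis() would do: fold the pending overflows into milliseconds.
 * On the real chip this would need interrupts disabled while it runs. */
static uint32_t read_millis(void)
{
    uint32_t cycles = (uint32_t)pending_overflows * CYCLES_PER_OVERFLOW
                      + remainder_cycles;
    pending_overflows = 0;
    total_millis += cycles / CYCLES_PER_MS;
    remainder_cycles = cycles % CYCLES_PER_MS;
    return total_millis;
}
```

The price of the cheap handler is that a 16-bit counter wraps after 65536 overflows, roughly 67 seconds at 16 MHz, if nothing calls millis() in between. The version in the post deals with that by letting the handler do the conversion when its counter runs out.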

bye
arne

Arne, are you running on a Diecimila Arduino? I’m surprised that your numbers are different than mine. (The numbers I get are identical with every run.)

Mikal

yes, that is what the bill said, and what the board says...

maybe my avr-gcc is old?

-arne

I'm using the Arduino 0012 on Windows. You?

i'm using arduino 0012 on linux... gcc version 4.1.2 (Fedora 4.1.2-6.fc9)

how big is ur executable? mine was 2800bytes...

-arne

Mine is 3378 bytes (!), gcc 4.3.0 (the one that comes with Arduino-0012).

Either way, it's a very worthwhile speedup!

Mikal

small code is important, too… and on linux they dont give avr-gcc (i installed it separately)…

what do u say about my new idea?
that division thing…?

-arne

hey yo!

is 50% faster not good enough? i mean: http://svn.berlios.de/viewcvs/arduino/trunk/hardware/cores/arduino/wiring.c?view=markup still uses a "while" loop, which makes only sense for F_CPU near 128MHz... my code doesn't look much more complicated with that "register" trick...

my approach is similar to this older one: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1216294585;start=all (but no plagiarism, because i didn't know of it...)

bye

hm i want to correct this:

still uses a "while" loop, which makes only sense for F_CPU near 128MHz...

the while loop is never necessary and i doubt, if the compiler can "unroll" that loop...
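
To see why a constant add plus a single compare is enough, it helps to count how often the 0012-style while loop actually runs per overflow. A host-side sketch (loop_iterations() is a hypothetical helper written for this check):

```c
#include <stdint.h>

#define CYCLES_PER_OVERFLOW (64UL * 256UL)   /* 16384 */

/* Run n overflows of the 0012-style while loop for a given F_CPU and
 * report the fewest and most loop iterations seen in a single overflow. */
static void loop_iterations(uint32_t f_cpu_hz, uint32_t n,
                            unsigned *min_out, unsigned *max_out)
{
    uint32_t cpm = f_cpu_hz / 1000UL;   /* clock cycles per millisecond */
    uint32_t cycles = 0;
    unsigned lo = ~0u, hi = 0;
    while (n--) {
        unsigned iters = 0;
        cycles += CYCLES_PER_OVERFLOW;
        while (cycles > cpm) {
            cycles -= cpm;
            iters++;
        }
        if (iters < lo) lo = iters;
        if (iters > hi) hi = iters;
    }
    *min_out = lo;
    *max_out = hi;
}
```

Over 1000 overflows the loop runs 2 or 3 times at 8 MHz, 1 or 2 times at 16 MHz, and 0 or 1 times at 32 MHz. The count is always one of two neighbouring values, so the lower one can be added as a compile-time constant and the extra one decided by a single if. It also fits the timings earlier in the thread, where the old handler is slowest at 8 MHz and fastest at 32 MHz.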

-arne