millis() / 16ms resolution?

hi!

i have a question about millis():

can we use

#if F_CPU <= 64UL * 256UL * 1000UL
   timer0_millis += (64UL * 256UL) / (F_CPU / 1000);
   timer0_clock_cycles += (64UL * 256UL) % (F_CPU / 1000);
   if (timer0_clock_cycles >= F_CPU / 1000) {
      timer0_clock_cycles -= F_CPU / 1000;
      timer0_millis++;
   }
#else
   ...
#endif

instead of the while loop in SIGNAL(SIG_OVERFLOW0)?

btw: timer0_clock_cycles could be an int then...

did it already change since 0012?

possibly it would save some(?) microseconds (4 byte cmp, 4 byte sub, 2 byte cmp/sub instead of 4 byte cmp/sub)?

-arne
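To make the arithmetic behind this proposal concrete, here is a minimal host-side sketch in plain C (not AVR code). The struct, the function names and the check routine are illustrative assumptions, not anything from wiring.c. On the AVR the quotient and remainder would be compile-time constants; here cycles_per_ms is a runtime parameter so that several clock speeds can be tried.

```c
#include <stdint.h>

/* Hypothetical host-side model of the handler state; the struct and
   function names are illustrative, not taken from wiring.c. */
typedef struct {
    uint32_t millis;  /* whole milliseconds counted so far */
    uint32_t cycles;  /* leftover clock cycles, kept below cycles_per_ms */
} timer_model;

/* Timer 0: prescale factor 64, overflow every 256 ticks. */
#define CYCLES_PER_OVERFLOW UINT32_C(16384)

/* One overflow, using division and remainder instead of a while loop. */
static void overflow_divmod(timer_model *t, uint32_t cycles_per_ms)
{
    t->millis += CYCLES_PER_OVERFLOW / cycles_per_ms;
    t->cycles += CYCLES_PER_OVERFLOW % cycles_per_ms;
    if (t->cycles >= cycles_per_ms) {   /* at most one carry is possible */
        t->cycles -= cycles_per_ms;
        t->millis++;
    }
}

/* Returns 1 if, after every overflow, the model agrees with the exact
   quotient and remainder of the total elapsed cycles. */
static int model_is_exact(uint32_t f_cpu, uint32_t overflows)
{
    timer_model t = {0, 0};
    uint32_t cycles_per_ms = f_cpu / 1000;
    uint64_t total = 0;
    for (uint32_t n = 0; n < overflows; n++) {
        overflow_divmod(&t, cycles_per_ms);
        total += CYCLES_PER_OVERFLOW;
        if (t.millis != total / cycles_per_ms ||
            t.cycles != total % cycles_per_ms)
            return 0;
    }
    return 1;
}
```

A single carry is enough because both the stored remainder and the per-overflow remainder are below cycles_per_ms, so their sum stays below twice that value.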

Arne, this looks interesting and seems worthy of further analysis. If nothing else, it teaches me that the wiring.c line

      while (timer0_clock_cycles > clockCyclesPerMicrosecond() * 1000UL) {

should be changed to

      while (timer0_clock_cycles >= clockCyclesPerMicrosecond() * 1000UL) {

I did a cursory test, and your code seems to produce the correct values and is MUCH faster (factor of 20!)

Mikal
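The difference between > and >= only shows when the accumulated cycles land exactly on a millisecond boundary. Here is a small host-side sketch in plain C of both loop variants; the struct and function names are hypothetical, and 16000 cycles per millisecond assumes a 16MHz clock. At 16MHz the first such boundary is after 125 overflows, because 125 * 16384 = 128 * 16000.

```c
#include <stdint.h>

/* Hypothetical host-side model; names are illustrative only. */
typedef struct {
    uint32_t millis;
    uint32_t cycles;
} timer_model;

#define CYCLES_PER_OVERFLOW UINT32_C(16384)  /* 64 * 256 */

/* Loop as in the 0012 handler: strict comparison. */
static void overflow_strict(timer_model *t, uint32_t cycles_per_ms)
{
    t->cycles += CYCLES_PER_OVERFLOW;
    while (t->cycles > cycles_per_ms) {
        t->cycles -= cycles_per_ms;
        t->millis++;
    }
}

/* Same loop with the comparison changed to >=. */
static void overflow_inclusive(timer_model *t, uint32_t cycles_per_ms)
{
    t->cycles += CYCLES_PER_OVERFLOW;
    while (t->cycles >= cycles_per_ms) {
        t->cycles -= cycles_per_ms;
        t->millis++;
    }
}

/* Run one of the handlers for a given number of overflows. */
static timer_model run(void (*handler)(timer_model *, uint32_t),
                       uint32_t cycles_per_ms, uint32_t overflows)
{
    timer_model t = {0, 0};
    for (uint32_t n = 0; n < overflows; n++)
        handler(&t, cycles_per_ms);
    return t;
}
```

With the strict comparison, a full millisecond worth of cycles can stay parked in the cycle counter, so the millisecond count lags by one until the next overflow.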

HOOOOOO! :)

wag tail

-Arne

if
timer0_clock_cycles is an unsigned int
and
F_CPU is less than 32000UL*1000UL,

then we could do without the "#if / #else / #endif",

because:
+=0 is optimized away (IIRC)
and
timer0_clock_cycles can't be greater than F_CPU/1000*2.
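The bound above can be checked with a few lines of host-side C. This is only a sketch under the assumption that the handler keeps the stored remainder below F_CPU/1000 (the >= variant); the function names are made up for illustration.

```c
#include <stdint.h>

#define CYCLES_PER_OVERFLOW UINT32_C(16384)  /* 64 * 256 */

/* Largest value the accumulator can reach just before the carry check:
   the biggest stored remainder plus the per-overflow remainder. */
static uint32_t worst_case_sum(uint32_t cycles_per_ms)
{
    return (cycles_per_ms - 1) + CYCLES_PER_OVERFLOW % cycles_per_ms;
}

/* Returns 1 if that worst case still fits a 16-bit unsigned int. */
static int fits_in_uint16(uint32_t f_cpu)
{
    return worst_case_sum(f_cpu / 1000) <= UINT16_MAX;
}
```

For 16MHz the worst case is 15999 + 384 = 16383, well inside 16 bits; the sum is always below 2 * (F_CPU / 1000).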

ohoh
somehow the subject is a little bit misleading... :)
it should be "efficiency up from 20% to 30%...?" or so... (the movie "1984" was disturbing for little Arne...)

hey

i did some timing tests, too:

  1. i called a copy of that signal handler function 20e6 times, without cli(), so that millis() still worked (the loop counted backwards from 20L*1000000L, because i believed that checking for >0 is an easier loop condition...). result: 137757 msec

  2. i called a further enhanced version 20e6 times... result: 69413 msec (1.98 times faster, not 20 times though...)

Here is my newest proposal (diff 0012/.../wiring.c wiring.c-millis):

27c27,32
< volatile unsigned long timer0_clock_cycles = 0;
---
> #if F_CPU > 32500L*1000L
> typedef unsigned long t0cc_t;
> #else
> typedef unsigned int t0cc_t;
> #endif
> volatile t0cc_t timer0_clock_cycles = 0;
32,36c37,50
< // timer 0 prescale factor is 64 and the timer overflows at 256
< timer0_clock_cycles += 64UL * 256UL;
< while (timer0_clock_cycles > clockCyclesPerMicrosecond() * 1000UL) {
< timer0_clock_cycles -= clockCyclesPerMicrosecond() * 1000UL;
< timer0_millis++;
---
> #if F_CPU % 1000 != 0
> #warning F_CPU is no integer multiple of 1000 Hz
> #endif
> // clock cycles per milli second
> #define CPM (F_CPU / 1000UL)
> // timer 0 prescale factor is 64 and the timer overflows at 256
> #define BF (64U * 256U)
> register t0cc_t tmp = timer0_clock_cycles + BF % CPM;
> if (tmp >= CPM) {
> timer0_clock_cycles = tmp - CPM;
> timer0_millis += BF / CPM + 1;
> } else {
> timer0_clock_cycles = tmp;
> timer0_millis += BF / CPM;
bye

Hi Arne,

I have been playing some with your new overflow handler, and mostly have replicated your results. I've also validated it by showing that your function yields the same values for timer0_millis and timer0_clock_cycles as the 0012 handler when F_CPU is 16MHz, 8MHz, or even 32MHz.

The timings I get when doing 100 million loops are as follows:

F_CPU = 16000000
Old = 613163ms
Arne's = 265139ms
2.31 speed improvement

F_CPU = 8000000
Old = 961640ms
Arne's = 265593ms
3.62 speed improvement

F_CPU = 32000000
Old = 438924ms
Arne's = 262063ms
1.67 speed improvement

The code I used to time and validate your results is as follows. Would you mind trying my sketch to see if you get the same results? If so, I'll type up a summary and present it to the developers' forum.

Mikal

volatile unsigned long old_timer0_clock_cycles = 0;
volatile unsigned long old_timer0_millis = 0;
void old_overflow()
{
   old_timer0_clock_cycles += 64UL * 256UL;
   while (old_timer0_clock_cycles > clockCyclesPerMicrosecond() * 1000UL) {
      old_timer0_clock_cycles -= clockCyclesPerMicrosecond() * 1000UL;
      old_timer0_millis++;
   }
}

#if F_CPU > 32500L*1000L
typedef uint32_t t0cc_t;
#else
typedef uint16_t t0cc_t;
#endif
volatile t0cc_t arnes_timer0_clock_cycles = 0;
volatile unsigned long arnes_timer0_millis = 0;

#if F_CPU % 1000 != 0
#warning F_CPU is no integer multiple of 1000 Hz
#endif
// clock cycles per milli second
#define CPM (F_CPU / 1000UL)
// timer 0 prescale factor is 64 and the timer overflows at 256
#define BF (64U * 256U)
void arnes_overflow()
{
   register t0cc_t tmp = arnes_timer0_clock_cycles + BF % CPM;
   if (tmp > CPM) {
      arnes_timer0_clock_cycles = tmp - CPM;
      arnes_timer0_millis += BF / CPM + 1;
   } else {
      arnes_timer0_clock_cycles = tmp;
      arnes_timer0_millis += BF / CPM;
   }
}

void setup()
{
   Serial.begin(9600);
   Serial.println("Hello!");

   // time the old handler
   unsigned long start = millis();
   for (long i=100000000; i; --i)
      old_overflow();
   start = millis() - start;
   Serial.print("The old overflow function took ");
   Serial.println(start);

   // time Arne's handler
   start = millis();
   for (long i=100000000; i; --i)
      arnes_overflow();
   start = millis() - start;
   Serial.print("Arne's overflow function took ");
   Serial.println(start);

   // reset both, then check that they stay in step
   old_timer0_clock_cycles = old_timer0_millis = 0;
   arnes_timer0_clock_cycles = arnes_timer0_millis = 0;

   for (unsigned long i=0; i<100000000; ++i)
   {
      if (old_timer0_clock_cycles != arnes_timer0_clock_cycles || old_timer0_millis != arnes_timer0_millis)
      {
         Serial.println("Mismatch!");
         exit(1);
      }
      old_overflow(); arnes_overflow();
   }
   Serial.println("Both functions calculated the same values OK!");
}

void loop(){}

hi!

it compiled correctly:
Binary sketch size: 2800 bytes (of a 14336 byte maximum)

and this is its output:

Hello!
The old overflow function took 688788
Arne's overflow function took 347066
Both functions calculated the same values OK!

looks good... :)
is my arduino at 20MHz then?
doesn't match ur 16MHz values...

i have a new idea:
we could do just a 2-byte counter update in the signal handler (all the real work happens only when the counter runs out; the code below counts down to zero) and move the time-consuming part (division by CPM, storing the remainder in another volatile variable) into millis()...:

#if F_CPU/1000 > 65535
typedef unsigned long t0c_t;
#else
typedef unsigned int t0c_t;
#endif
#define MOC ( (sizeof(timer0_overflow_counter)==2 ? 65535UL/(F_CPU/1000) : 1) * (F_CPU / 1000))
volatile t0c_t timer0_overflow_counter = MOC;
volatile t0c_t timer0_remainder_cycles = 0;
volatile unsigned long timer0_millis = 0;

SIGNAL(SIG_OVERFLOW0)
{
   timer0_overflow_counter--;
   if (timer0_overflow_counter == 0) {
      timer0_overflow_counter = MOC;
      timer0_millis += (((uint32_t)MOC)*64UL*256UL) / (F_CPU / 1000);
   }
}

unsigned long millis()
{
   register unsigned long m;
   register uint8_t oldSREG = SREG;
  
   // disable interrupts while we read timer0_millis or we might get an
   // inconsistent value (e.g. in the middle of the timer0_millis++)
   cli();
   m = timer0_millis;
   if (timer0_overflow_counter == MOC)
      SREG = oldSREG;
   else if (timer0_overflow_counter == MOC-1) {
      register unsigned long tmp = (64UL * 256UL)%(F_CPU / 1000) + timer0_remainder_cycles;
      timer0_overflow_counter = MOC;
      if (tmp >= F_CPU / 1000) {
         m += (64UL * 256UL) / (F_CPU / 1000) + 1;
         timer0_millis = m;
         SREG = oldSREG;
         tmp -= F_CPU / 1000;
      } else {
         if ((64UL * 256UL) / (F_CPU / 1000) > 0) {
            m += (64UL * 256UL) / (F_CPU / 1000);
            timer0_millis = m;
         }
         SREG = oldSREG;
      }
      timer0_remainder_cycles = tmp;
   } else {
      register unsigned long tmp = (((unsigned long)(((t0c_t)MOC)-timer0_overflow_counter)) << 16) >> 2; // 64*256 == <<(6+8) == <<(16-2)
      timer0_overflow_counter = MOC;
      tmp += timer0_remainder_cycles;
      register uint32_t d = tmp / (F_CPU / 1000);
      m += d;
      timer0_millis = m;
      SREG = oldSREG;
      timer0_remainder_cycles = tmp - d * (F_CPU / 1000);
        // why no side effect from above div?
   }

   return m;
}

^^^ untested... :D & changed...

bye
arne
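The core of this idea (count overflows cheaply in the handler, do the division only when someone asks for the time) can be modelled on a host machine. The following is a deliberately simplified sketch in plain C, not Arne's code: it counts up instead of down from MOC, it has no interrupt masking, and it assumes the time is read before the 16-bit counter wraps. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, simplified host-side model of the deferred approach. */
typedef struct {
    uint16_t pending;    /* overflows seen since the last read */
    uint32_t remainder;  /* leftover cycles from the last read */
    uint32_t millis;     /* whole milliseconds accounted for so far */
} deferred_model;

/* What the overflow handler would do: a cheap 2-byte increment.
   Simplification: no handling of the counter wrapping around. */
static void on_overflow(deferred_model *t)
{
    t->pending++;
}

/* What millis() would do: convert pending overflows to milliseconds. */
static uint32_t read_millis(deferred_model *t, uint32_t cycles_per_ms)
{
    /* 64 * 256 cycles per overflow is a shift by 14 bits */
    uint32_t cycles = ((uint32_t)t->pending << 14) + t->remainder;
    t->pending = 0;
    t->millis += cycles / cycles_per_ms;
    t->remainder = cycles % cycles_per_ms;
    return t->millis;
}
```

The division cost moves from every overflow to every read, so whether this is a net win depends on how often millis() is called compared with how often the timer overflows.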

Arne, are you running on a Diecimila Arduino? I'm surprised that your numbers are different than mine. (The numbers I get are identical with every run.)

Mikal

yes, that is what the bill said, and what the board says...

maybe my avr-gcc is old?

-arne

I'm using the Arduino 0012 on Windows. You?

i'm using arduino 0012 on linux...
gcc version 4.1.2 (Fedora 4.1.2-6.fc9)

how big is ur executable?
mine was 2800bytes...

-arne

Mine is 3378 bytes (!), gcc 4.3.0 (the one that comes with Arduino-0012).

Either way, it's a very worthwhile speedup!

Mikal

small code is important, too... :)
and on linux they don't bundle avr-gcc (i installed it separately)...

what do u say about my new idea?
that division thing...?

-arne

hey yo!

is 50% faster not good enough?
i mean:

still uses a "while" loop, which makes only sense for F_CPU near 128MHz...
my code doesn't look much more complicated with that "register" trick...

my approach is similar to this older one:
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1216294585;start=all
(but no plagiarism, because i didn't know of it...)

bye

hm
i want to correct this:

still uses a "while" loop, which makes only sense for F_CPU near 128MHz...

the while loop is never necessary, and i doubt the compiler can "unroll" that loop...

-arne