realtime clock, microseconds, etc.

fyi, timer0_tics appears to be identical to the former timer0_overflow_count :slight_smile:

#if F_CPU >= 16000000L

I understand what you're trying to do here, but keep in mind that with -Os the avr-gcc compiler is going to replace multiplication by powers of two with left shifting, so it isn't necessary to code it explicitly with shifting. Moreover, the result won't be correct for 20MHz because the multiplication factor should be 3.2 rather than 4.

I would propose the alternate implementation of cycles to microseconds shown below which handles CPU speeds that are factors of 64,000,000 as one case, handles 20MHz as a special case and reports an error at compile time otherwise.

Note that the code for 20MHz will be somewhat slower due to the divide-by-10 operation (in addition to one more shift cycle to perform the multiplication) and it will have a slightly smaller dynamic range. If desired, the calculation for 20MHz could include rounding by adding 5 to the result prior to dividing by 10.

#define F_CPU_MHZ   (F_CPU / 1000000L)
  unsigned long us;
#if ((64 / F_CPU_MHZ) * F_CPU_MHZ) == 64
  us = ((m << 8) + t) * (64 / F_CPU_MHZ);
#elif F_CPU_MHZ == 20
  us = (((m << 8) + t) * 32) / 10;
#else
  #error clock speed not supported
#endif
  return(us);
#undef F_CPU_MHZ
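
For anyone who wants to check the arithmetic on a PC, here is a minimal sketch of the same conversion with the clock passed in at run time. The name ticks_to_us and the use of uint32_t (standing in for the 32-bit unsigned long on AVR) are assumptions for illustration and are not part of the proposed patch.

```c
#include <stdint.h>

/* Hypothetical host-side helper: converts Timer0 ticks (prescaler 64)
 * to microseconds for a clock given in MHz. Returns 0 for clocks that
 * the compile-time version would reject with #error. */
uint32_t ticks_to_us(uint32_t ticks, uint32_t f_cpu_mhz)
{
    if (f_cpu_mhz != 0 && 64 % f_cpu_mhz == 0) {
        /* 1, 2, 4, 8, 16, 32, 64 MHz: whole microseconds per tick */
        return ticks * (64 / f_cpu_mhz);
    }
    if (f_cpu_mhz == 20) {
        /* 3.2 us per tick, computed as * 32 / 10 (truncating) */
        return (ticks * 32) / 10;
    }
    return 0; /* unsupported clock */
}
```

One full Timer0 overflow (256 ticks) comes out as 1024 us at 16 MHz, 2048 us at 8 MHz and 819 us (truncated from 819.2) at 20 MHz.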

I'm ok with a special case for 20mhz as long as millis() and elapsedMillis() are fast.

delayMicroseconds currently supports only 8 or 16mhz. FYI, it is in wiring.c if you want to take a stab at figuring out how to make it 20mhz compatible.

Also might want to think about 1mhz compatibility a bit, I think the avr butterfly (runs on a button cell) is a really neat device and worthy of some consideration as well @ 1mhz.

Obviously it may not be practical to work with every possible frequency and get good performance though. I would generally assert that where microseconds is concerned, performance is also a concern.

@Don, one possible objection to your proposal for the 20MHz case is that micros() will not overflow at 32-bits because of the final division by 10, right?

@dcb: I'm playing (illicitly) with your code at work, and the more I see the more I like! I've been doing some experiments with consecutive calls to (your) micros() and it does indeed seem much, much faster on a 16MHz Arduino. And the wiring.c mods are really quite minimal, aren't they?

It doesn't seem like 1MHz support would be too hard because 1 divides 64.

@mellis: I measured 1 million deltas between consecutive calls to dcb's micros and got these results:

Delta | Count
0us   | 35.7%
4us   | 64.1%
12us  | 0.2%
16us  | 0.02%

Mikal

one possible objection to your proposal for the 20MHz case is that micros() will not overflow at 32-bits because of the final division by 10, right?

Quite so. However, if you don't multiply by 3.2 the result won't have units of microseconds. If, for example, you choose to multiply by 4 (like the 16MHz case) the return value will represent units of 0.8 us.

This issue is why I've favored a routine that returns Timer0 clock cycles instead of microseconds. It is quite simple to implement microsecond-based timing by looking for the equivalent number of Timer0 clock cycles at the prevailing CPU speed. This strategy has the added convenience of allowing either truncation or rounding behavior to be employed (when a difference exists) depending on which is better for a particular application.
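
A minimal sketch of that idea, going from a desired number of microseconds to the equivalent Timer0 tick count. The helper name us_to_ticks and its parameters are hypothetical, and uint32_t stands in for the 32-bit unsigned long on AVR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of Timer0 ticks (prescaler 64) that
 * corresponds to `us` microseconds at `f_cpu_mhz`.
 * ticks = us * f_cpu_mhz / 64, truncated or rounded to nearest.
 * The intermediate product overflows 32 bits for very long intervals
 * (beyond roughly 214 seconds at 20 MHz). */
uint32_t us_to_ticks(uint32_t us, uint32_t f_cpu_mhz, bool round)
{
    uint32_t cycles = us * f_cpu_mhz;   /* CPU cycles in the interval */
    if (round)
        cycles += 32;                   /* half a tick, for rounding */
    return cycles / 64;
}
```

For 1000 us this gives 250 ticks at 16 MHz, and at 20 MHz either 312 (truncated) or 313 (rounded) since the exact value is 312.5. The caller would then compare the difference between two tick readings against that count.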

However, if you don't multiply by 3.2 the result won't have units of microseconds.

I understand the dilemma. You're going to have to lose some bits either at the high end or the low end one way or another.

What about replacing (for 20MHz only)

  us = (((m << 8) + t) * 32) / 10;

with

  us = (((m << 8) + t) / 5) * 16;

With this expression, we'd lose a bit of resolution (because of doing the divide by 5 first), but still overflow at 32 bits (if my analysis is correct).

Mikal

delayMicroseconds currently supports only 8 or 16mhz. FYI, it is in wiring.c if you want to take a stab at figuring out how to make it 20mhz compatible.

I've modified my version so that it works correctly at 20MHz. As shown in the code below, an extra cycle is conditionally added to the delay loop for 20MHz. The code preparing for the delay is slightly different, too, to account for the faster speed. My measurements put it right on the button.

/* Delay for the given number of microseconds.  Assumes an 8, 16 or 20 MHz clock.
 * Disables interrupts, which will disrupt the millis() function if used
 * too frequently. */
void delayMicroseconds(unsigned int us)
{
#define EXTRA_CYCLES
    uint8_t oldSREG;

    // calling avrlib's delay_us() function with low values (e.g. 1 or
    // 2 microseconds) gives delays longer than desired.
    //delay_us(us);

#if F_CPU >= 20000000L
    // for a 20 MHz clock add one extra cycle to the delay loop
#undef EXTRA_CYCLES
#define EXTRA_CYCLES    " nop" "\n\t" // 1 cycle

    // for a one-microsecond delay, simply return.  the overhead
    // of the function call yields a delay of approximately 0.9 us.
    if (--us == 0)
        return;

    // the loop below takes 0.25 microseconds (5 cycles)
    // per iteration, so execute it four times for each microsecond of
    // delay requested.
    us <<= 2;

    // partially compensate for the overhead of getting into and out of the loop
    us -= 3;

#elif F_CPU >= 16000000L
    // for the 16 MHz clock on most Arduino boards

    // for a one-microsecond delay, simply return.  the overhead
    // of the function call yields a delay of approximately 1 1/8 us.
    if (--us == 0)
        return;

    // the following loop takes a quarter of a microsecond (4 cycles)
    // per iteration, so execute it four times for each microsecond of
    // delay requested.
    us <<= 2;

    // account for the time taken in the preceding commands.
    us -= 2;
#else
    // for the 8 MHz internal clock on the ATmega168

    // for a one- or two-microsecond delay, simply return.  the overhead of
    // the function calls takes more than two microseconds.  can't just
    // subtract two, since us is unsigned; we'd overflow.
    if (--us == 0)
        return;
    if (--us == 0)
        return;

    // the following loop takes half of a microsecond (4 cycles)
    // per iteration, so execute it twice for each microsecond of
    // delay requested.
    us <<= 1;
    
    // partially compensate for the time taken by the preceding commands.
    // we can't subtract any more than this or we'd overflow w/ small delays.
    us--;
#endif

    // disable interrupts, otherwise the timer 0 overflow interrupt that
    // tracks milliseconds will make us delay longer than we want.
    oldSREG = SREG;
    cli();

    // busy wait
    __asm__ __volatile__ (
        "1: sbiw %0,1" "\n\t" // 2 cycles
        EXTRA_CYCLES
        "brne 1b" : "=w" (us) : "0" (us) // 2 cycles
    );

    // reenable interrupts.
    SREG = oldSREG;
}
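
The cycle budget behind the three shift amounts can be checked with a small host-side sketch. The function names are hypothetical. The cycle counts are the ones given in the comments of the routine above (sbiw 2, brne 2, nop 1).

```c
#include <stdint.h>

/* Cycles per pass of the busy-wait loop: sbiw (2) + brne taken (2),
 * plus one nop when the 20 MHz variant is selected. */
uint32_t loop_cycles(uint32_t f_cpu_mhz)
{
    return (f_cpu_mhz >= 20) ? 5 : 4;
}

/* Loop passes needed per microsecond of requested delay. */
uint32_t passes_per_us(uint32_t f_cpu_mhz)
{
    return f_cpu_mhz / loop_cycles(f_cpu_mhz);
}
```

This gives 4 passes per microsecond at 20 MHz and at 16 MHz (hence us <<= 2) and 2 passes at 8 MHz (hence us <<= 1).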

With this expression, we'd lose a bit of resolution (because of doing the divide by 5 first), but still overflow at 32 bits (if my analysis is correct).

I'm not sure that I understand what you mean about overflowing at 32 bits. I believe that you either lose range, resolution or both. With the scaling factors that you mentioned the maximum value of ((m << 8) + t) is 0x4FFFFFFB as compared to 0xFFFFFFFF in the other cases. While it is true that the maximum return value will be 0xFFFFFFFF you still need to know the range in order to compute elapsed time when the second data point has a lower value than the first, i.e., when the value has wrapped.
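
To illustrate the point about needing to know the range, here is a simplified sketch. It assumes a counter that steps by one and wraps to zero after a known maximum, which is an idealization of the scaled value being discussed. Both function names and the max_value parameter are hypothetical.

```c
#include <stdint.h>

/* Elapsed count between two readings of a counter that wraps to zero
 * after reaching `max_value` (hypothetical parameter). */
uint32_t elapsed_with_range(uint32_t t0, uint32_t t1, uint32_t max_value)
{
    if (t1 >= t0)
        return t1 - t0;
    /* wrapped: distance to the top, one step for the wrap, then t1 */
    return (max_value - t0) + 1 + t1;
}

/* When the counter uses all 32 bits, unsigned subtraction already
 * gives the right answer modulo 2^32. */
uint32_t elapsed_full_range(uint32_t t0, uint32_t t1)
{
    return t1 - t0;
}
```

With a counter that tops out at 99, readings of 90 and then 5 are 15 steps apart, but plain subtraction does not give 15. It only does so when the counter really wraps at 0xFFFFFFFF.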

Besides that, having micros() return a value with different units also causes portability issues that may be more bothersome than the inherent difference in range.

I reiterate my support for hpticks() (returning Timer0 ticks) because it avoids these problems entirely and, as I indicated earlier, it is easy to convert the result to microseconds using the CPU speed if desired.

Ok, I haven't absorbed the preceding entirely yet, but wondered when the last time someone tried avrlib's delay_us() function was?

And is it long because we call it from a function? In that case a #define delayMicroSeconds(us) delay_us(us)

might fix it?

I [...] wondered when the last time someone tried avrlib's delay_us() function was?

The delay_us() function, which takes a floating point parameter, works great if the parameter to it is a compile-time constant. In that case, the compiler does the real math and produces integral values to use in the inlined code. If the parameter is not constant at compile time, your program grows by a large amount due to the floating point math having to be linked in.

I think the ticks() function is nice, but I think it may be too confusing to include in the core. A tick can be hard to explain, especially because it varies based on the cpu speed. We'd probably want to include a microsToTicks() function or something, which also complicates things. Again, this would be a great function to have a nice implementation of on the playground: Arduino Playground - GeneralCodeLibrary

For the micros() function, is it reasonable to simply count microseconds in the overflow handler? Or is there another way to avoid overflowing at a weird value? Especially with micros() that will overflow relatively quickly, I think it's important that people can just do a simple subtraction.

For the micros() function, is it reasonable to simply count microseconds in the overflow handler?

Unfortunately, the same problems exist at that level. Each time the overflow handler executes represents 256 ticks or 64 * 256 CPU cycles. Attempting to convert either of these quantities to an integral microseconds value will involve the same issues for 20MHz CPU frequency as it does in any of the other methods.
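
The numbers can be checked with a short host-side sketch. The macro and function names are hypothetical. The 64 * 256 cycle count per overflow is the one stated above.

```c
#include <stdint.h>

#define CYCLES_PER_OVERFLOW (64UL * 256UL)  /* prescaler * 8-bit timer */

/* Whole microseconds per Timer0 overflow at the given clock. */
uint32_t us_per_overflow(uint32_t f_cpu_mhz)
{
    return CYCLES_PER_OVERFLOW / f_cpu_mhz;
}

/* CPU cycles left over after the whole microseconds are removed;
 * nonzero means the period is not an integral number of microseconds. */
uint32_t leftover_cycles(uint32_t f_cpu_mhz)
{
    return CYCLES_PER_OVERFLOW % f_cpu_mhz;
}
```

At 16 MHz and 8 MHz the overflow period is exactly 1024 us and 2048 us. At 20 MHz it is 819 us with 4 cycles (0.2 us) left over, which is the fraction that has to be either dropped or carried somewhere.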

You could implement two accumulators - one numerator and one denominator and then do the division when micros() is called. Here again though, you'd have the same problems of range or resolution.

"I reiterate my support for hpticks()"

I think this is where I'm at, can we bring back timer0_overflow_count++ in SIGNAL(SIG_OVERFLOW0)?

It looks like things were optimized for millis, but unintentionally at the expense of the ability to track microseconds efficiently. There is probably enough bandwidth to do both in the timer0 interrupt.

Then the following function would work pretty well for 8 and 16mhz:

unsigned long micros()
{
  unsigned long m, t;
  uint8_t oldSREG = SREG;
  cli();
  t = TCNT0;
  if ((TIFR0 & _BV(TOV0)) && (t == 0))
    t = 256;
  m = timer0_overflow_count;
  SREG = oldSREG;
#if F_CPU >= 16000000L
  return ((m << 8) + t) <<2;
#else
  return ((m << 8) + t) <<3;
#endif  
  
}
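
Here is a host-side model of just the return expression, as a rough check that simple subtraction survives the 32-bit wrap when the scale factor is a power of two. The name micros_model is hypothetical and uint32_t stands in for the 32-bit unsigned long on AVR.

```c
#include <stdint.h>

/* Host-side model of the return expression in micros():
 * m = overflow count, t = TCNT0 reading (0..256),
 * shift = 2 at 16 MHz, 3 at 8 MHz. */
uint32_t micros_model(uint32_t m, uint32_t t, unsigned shift)
{
    return ((m << 8) + t) << shift;
}
```

At 16 MHz, a reading taken at m = 0x3FFFFF, t = 250 yields 0xFFFFFFE8, and one taken 10 ticks later (m = 0x400000, t = 4) yields 0x10 after wrapping. Subtracting the two as unsigned 32-bit values still gives 40 us.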

re: 20mhz, can we address that after we get a working micros() for the target 8 and 16mhz chips? I would like to checkmark at an agreeable solution for 8 and 16mhz if that is ok.

A tick can be hard to explain, especially because it varies based on the cpu speed.

I'm not sure that it is necessary to do so but it is simple enough to say that a tick is 64/F_CPU seconds. It is more important to point out that a given value returned by hpticks() is not particularly useful. What is useful is the difference between two returned values, representing an elapsed time.

The function below returns the number of microseconds between two tick values. For CPU frequencies that are a factor of 64,000,000 the round vs. truncation issue is moot and the third parameter is ignored. For other situations, the third parameter is used to produce the desired effect.

unsigned long elapsedMicroseconds(unsigned long ticks0, unsigned long ticks1, bool round = false);
unsigned long
elapsedMicroseconds(unsigned long ticks0, unsigned long ticks1, bool round)
{
  #define F_CPU_MHZ (F_CPU / 1000000L)
  #define PRESCALER 64
  unsigned long us;
#if (((PRESCALER / F_CPU_MHZ) * F_CPU_MHZ) == PRESCALER)
  // exact result, no rounding factor needs to be applied
  us = (ticks1 - ticks0) * (PRESCALER / F_CPU_MHZ);
#elif (F_CPU_MHZ == 20)
  if (round)
    // result rounded to the nearest microsecond
    us = (((ticks1 - ticks0) * 32) + 5) / 10;
  else
    // result truncated
    us = ((ticks1 - ticks0) * 16) / 5;
#else
  #error CPU frequency not supported
#endif
  return(us);
  #undef PRESCALER
  #undef F_CPU_MHZ
}
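
To see what the round parameter does in practice, here is a host-side copy of only the 20 MHz branch. The name elapsed_us_20mhz is hypothetical and uint32_t stands in for the 32-bit unsigned long on AVR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side copy of the 20 MHz branch of elapsedMicroseconds(). */
uint32_t elapsed_us_20mhz(uint32_t ticks0, uint32_t ticks1, bool round)
{
    uint32_t dt = ticks1 - ticks0;      /* wraps correctly modulo 2^32 */
    if (round)
        return (dt * 32 + 5) / 10;      /* nearest microsecond */
    return (dt * 16) / 5;               /* truncated */
}
```

Three ticks are 9.6 us, so the truncated result is 9 and the rounded result is 10. Five ticks are exactly 16 us, so both variants agree.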

Don, awesome work as usual :slight_smile:

FYI, I started a separate thread on mhz discussions, I think it is a necessary discussion but beyond the scope of this thread, as David pointed out earlier.

http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1226451895/0#0

If we are happy with keeping track of a tick every 1024 microseconds in the interrupt handler and a fast working millis() then let's lock in that progress.

I'm not sure that I understand what you mean about overflowing at 32 bits. I believe that you either lose range, resolution or both. With the scaling factors that you mentioned the maximum value of ((m << 8) + t) is 0x4FFFFFFB as compared to 0xFFFFFFFF in the other cases. While it is true that the maximum return value will be 0xFFFFFFFF you still need to know the range in order to compute elapsed time when the second data point has a lower value than the first, i.e., when the value has wrapped.

Don, I guess all I was trying to argue was that scaling (for the 20MHz case) using y = (x / 5) * 16 is superior to y = (x * 16) / 5 IF you are interested (as David is) in making sure the values of y are evenly distributed throughout the entire range of 32-bit values. Yes, you lose some resolution, but you gain the ability to compute time deltas by simply subtracting them.

The expression ((((m << 8) + t) / 5) * 16) overflows at 0xFFFFFFF0 with resolution 16. Meanwhile, the expression ((((m << 8) + t) * 16) / 5), which is roughly equivalent otherwise, overflows at 0x19999996 (albeit with better resolution). They both, more or less, represent microseconds elapsed.

Here's a brief summary of the tradeoffs between overflow and resolution:

A = 16MHz algorithm
B = 8MHz algorithm
C = 20MHz algorithm with y = x * 16 / 5
D = 20MHz algorithm with y = x / 5 * 16

Scheme | Overflow | Resolution (us)
-----------------------------------
   A    | FFFFFFFC | 4
   B    | FFFFFFF8 | 8
   C    | 19999996 | ~16/5
   D    | FFFFFFF0 | 16

I vote for A, B, and D. :slight_smile: D could easily be applied to elapsedMicroseconds. Just do the division first and then the multiplication.
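
The peak values in the table above can be reproduced with a short host-side sketch. The function names are hypothetical and uint32_t stands in for the 32-bit unsigned long on AVR. One assumption to flag: scheme C is written here in the * 32 / 10 form posted earlier in the thread, because that is the form whose 32-bit intermediate product peaks at 0x19999996. Evaluated literally as (x * 16) / 5 in 32 bits, it would peak at 0x33333330 instead.

```c
#include <stdint.h>

uint32_t scheme_a(uint32_t x) { return x << 2; }          /* 16 MHz */
uint32_t scheme_b(uint32_t x) { return x << 3; }          /* 8 MHz  */
uint32_t scheme_c(uint32_t x) { return (x * 32) / 10; }   /* 20 MHz, multiply first */
uint32_t scheme_d(uint32_t x) { return (x / 5) * 16; }    /* 20 MHz, divide first   */
```

Each scheme reaches its maximum just before its intermediate value wraps: A at x = 0x3FFFFFFF, B at x = 0x1FFFFFFF, C at x = 0x7FFFFFF and D at x = 0x4FFFFFFF.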

I like hpticks() a lot too, by the way. I think I would use it a bunch. Thanks!

Thoughts, anyone?

Mikal

PS: Nice work on "round". :wink:

I was trying to argue was that scaling (for the 20MHz case) using y = (x / 5) * 16 is superior to y = (x * 16) / 5 IF you are interested (as David is) in making sure the values of y are evenly distributed throughout the entire range of 32-bit values.

Perhaps I'm missing something. I don't see how you arrived at the overflow value for y = (x / 5) * 16, nor can I substantiate the claim that the values are evenly distributed over the 32-bit value range. In particular, given that the range of values for x is 0 to 0xffffffff, after applying the conversion function y = (x / 5) * 16, the range of values for y is 0 to 0x33333330, occupying less than one fourth of the range of a 32-bit value.

[From dcb] I would like to checkmark at an agreeable solution for 8 and 16mhz if that is ok.

dcb, I think Don's code from post #35 IS your solution, with the 8 and 16 MHz cases rolled into one. For what it's worth I give these two an enthusiastic CHECK. :wink:

Mikal

Perhaps I'm missing something. I don't see how you arrived at the overflow value for y = (x / 5) * 16, nor can I substantiate the claim that the values are evenly distributed over the 32-bit value range. In particular, given that the range of values for x is 0 to 0xffffffff, after applying the conversion function y = (x / 5) * 16, the range of values for y is 0 to 0x33333330, occupying less than one fourth of the range of a 32-bit value.

Don, the error is in assuming that the maximum value for f(x) = (x / 5) * 16 occurs when x = 0xFFFFFFFF. There are several values for x where the rollover occurs, but 0xFFFFFFFF is not one of them. For example
f(0x4FFFFFFF) = 0xFFFFFFF0
and
f(0x50000000) = 0x0

This table should clarify things, both in terms of overflow and resolution. All values are in hex, and show that the range for y is indeed 0 to 0xFFFFFFFF and evenly spaced.

x        | (x/5)*16 | (x*16)/5
0        | 0        | 0
1        | 0        | 3
2        | 0        | 6
3        | 0        | 9
4        | 0        | C
5        | 10       | 10
6        | 10       | 13
...
60       | 130      | 133
61       | 130      | 136
...
7FFFFF0  | 19999960 | 19999966
7FFFFF1  | 19999960 | 19999969
7FFFFF2  | 19999960 | 1999996C
7FFFFF3  | 19999970 | 19999970
7FFFFF4  | 19999970 | 19999973
7FFFFF5  | 19999970 | 19999976
7FFFFF6  | 19999970 | 19999979
7FFFFF7  | 19999970 | 1999997C
7FFFFF8  | 19999980 | 19999980
7FFFFF9  | 19999980 | 19999983
7FFFFFA  | 19999980 | 19999986
7FFFFFB  | 19999980 | 19999989
7FFFFFC  | 19999980 | 1999998C
7FFFFFD  | 19999990 | 19999990
7FFFFFE  | 19999990 | 19999993
7FFFFFF  | 19999990 | 19999996
8000000  | 19999990 | 0
8000001  | 19999990 | 3
...
4FFFFFF0 | FFFFFFC0 | 19999966
4FFFFFF1 | FFFFFFD0 | 19999969
4FFFFFF2 | FFFFFFD0 | 1999996C
4FFFFFF3 | FFFFFFD0 | 19999970
4FFFFFF4 | FFFFFFD0 | 19999973
4FFFFFF5 | FFFFFFD0 | 19999976
4FFFFFF6 | FFFFFFE0 | 19999979
4FFFFFF7 | FFFFFFE0 | 1999997C
4FFFFFF8 | FFFFFFE0 | 19999980
4FFFFFF9 | FFFFFFE0 | 19999983
4FFFFFFA | FFFFFFE0 | 19999986
4FFFFFFB | FFFFFFF0 | 19999989
4FFFFFFC | FFFFFFF0 | 1999998C
4FFFFFFD | FFFFFFF0 | 19999990 
4FFFFFFE | FFFFFFF0 | 19999993
4FFFFFFF | FFFFFFF0 | 19999996
50000000 | 0        | 0
50000001 | 0        | 3
...

Does that make sense?

Mikal

EDIT: I just realized the fatal flaw in my proposal is that when x itself overflows, there is a discontinuity in f(x).
FFFFFFFF | 33333330 | 19999996
0        | 0 (!)    | 0

Sorry to waste everyone's time on this. The discontinuity obviously defeats my goal of being able to calculate delta = time2 - time1. :-[
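
A small host-side sketch of the discontinuity, for the record. The function names are hypothetical and uint32_t stands in for the 32-bit unsigned long on AVR.

```c
#include <stdint.h>

/* Scheme D: divide first, then multiply. */
uint32_t scheme_d(uint32_t x) { return (x / 5) * 16; }

/* Microsecond delta obtained by plain subtraction of two scaled values. */
uint32_t delta_us(uint32_t x0, uint32_t x1)
{
    return scheme_d(x1) - scheme_d(x0);
}
```

Ten ticks across the point where f(x) itself wraps (x from 0x4FFFFFFB to 0x50000005) give the expected 32 us. The same ten ticks across the point where x wraps (x from 0xFFFFFFFB to 0x5) give 0xCCCCCCF0, which is nowhere near 32.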

M

No sweat Mikal :slight_smile:

Ok, final proposal for micros (leaving the 20mhz can of worms out of it)

Mellis, if you agree:
update wiring.c

add a global variable:
volatile unsigned long timer0_tics = 0;

add
timer0_tics++;
to top of SIGNAL(TIMER0_OVF_vect), leave rest where it is.

add micros function (plus prototype in wiring.h):

unsigned long micros(){
  unsigned long m, t;
  uint8_t oldSREG = SREG;
  cli();
  t = TCNT0;
  if ((TIFR0 & _BV(TOV0)) && (t == 0))
    t = 256;
  m = timer0_tics;
  SREG = oldSREG;
#if ((64 / clockCyclesPerMicrosecond()) * clockCyclesPerMicrosecond()) == 64
  return ((m << 8) + t) * (64 / clockCyclesPerMicrosecond());
#else
  #error clock speed not supported
#endif  
}