
Topic: realtime clock, microseconds, etc.

dcb

Updated micros(); it passes all tests thus far:

Code:

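unsigned long micros()
{
 unsigned long m;
 uint8_t oldSREG = SREG, t;

 cli();                  // read the overflow count and TCNT0 atomically
 m = timer0_tics;        // Timer0 overflows counted by the overflow ISR
 t = TCNT0;

 // If Timer0 has overflowed but its ISR has not run yet (interrupts are
 // off), count the missed overflow. t == 255 is excluded because the
 // overflow may then have happened after TCNT0 was read.
 if ((TIFR0 & _BV(TOV0)) && (t < 255))
   m++;

 SREG = oldSREG;         // restore the previous interrupt state
#if F_CPU >= 16000000L
 return ((m << 8) + t) << 2;  // 4 us per Timer0 tick at 16 MHz (not exact at 20 MHz)
#else
 return ((m << 8) + t) << 3;  // 8 us per Timer0 tick at 8 MHz (prescaler 64)
#endif
}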

mikalhart

#31
Nov 11, 2008, 03:25 pm Last Edit: Nov 11, 2008, 03:31 pm by mikalhart
@dcb--

If you just change your hpticks derivative in reply 24 from

Code:
 unsigned long t0_ticks = (clock_cycles / 64) + (millis * (1000L * clockCyclesPerMicrosecond() / 64)) + t0;
 return ((t0_ticks) * 64L / (F_CPU / 1000000L));


to

Code:
 unsigned long t0_ticks = (clock_cycles / 64UL) + t0;
 return 1000 * millis + t0_ticks * 64UL / clockCyclesPerMicrosecond();


the 268-second micros() overflow problem goes away (overflows at the 32-bit boundary).  (This change avoids the unnecessary translation of millis into the "tick" domain and then back into the "micro" domain.)

This seems ideal to me.  No changes to wiring.c, and all the benefits of Don's and dcb's work.  It also has the added benefit of working with the 20MHz clock (I think).  Do you agree?

Mikal

dcb

Mikal, I do appreciate the investigation, but let me get a read from you on the function in reply 30 first.

Having seen this approach run 5 times faster than the version that leaves wiring.c unchanged, you can imagine I want the fast one :)

mikalhart

On the surface it looks good!  I'll study it enthusiastically later on.  (At some point I need to pretend to be doing "real" work today. :))  

Nice work!  This is fun.

Mikal

dcb

fyi, timer0_tics appears to be identical to the former timer0_overflow_count  ::)

Don Kinzer

Quote
#if F_CPU >= 16000000L

I understand what you're trying to do here, but keep in mind that with -Os the avr-gcc compiler is going to replace multiplication by powers of two with left shifts, so it isn't necessary to code the shifting explicitly.  Moreover, the result won't be correct for 20MHz because the multiplication factor should be 3.2 (64 / 20) rather than 4.

I would propose the alternate implementation of the cycles-to-microseconds conversion shown below, which handles CPU speeds whose whole-MHz value divides 64 (e.g. 1, 2, 4, 8 or 16 MHz) as one case, handles 20MHz as a special case, and reports an error at compile time otherwise.

Note that the code for 20MHz will be somewhat slower due to the divide-by-10 operation (in addition to one more shift cycle to perform the multiplication), and it will have a smaller dynamic range.  If desired, the calculation for 20MHz could include rounding by adding 5 to the product before dividing by 10.
Code:
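#define F_CPU_MHZ   (F_CPU / 1000000L)
 unsigned long us;
#if ((64 / F_CPU_MHZ) * F_CPU_MHZ) == 64
 // the clock in MHz divides 64, so each Timer0 tick is a whole number of us
 us = ((m << 8) + t) * (64 / F_CPU_MHZ);
#elif F_CPU_MHZ == 20
 // 20 MHz: each Timer0 tick is 64 / 20 = 3.2 us, i.e. multiply by 32/10
 us = (((m << 8) + t) * 32) / 10;
#else
 #error clock speed not supported
#endif
 return(us);
#undef F_CPU_MHZ

Don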

ZBasic Microcontrollers
http://www.zbasic.net

dcb

I'm OK with a special case for 20MHz as long as millis() and elapsedMillis() are fast.

delayMicroseconds() currently supports only 8 or 16 MHz. FYI, it is in wiring.c if you want to take a stab at figuring out how to make it 20 MHz compatible.

You might also want to think about 1 MHz compatibility a bit. I think the AVR Butterfly (which runs on a button cell) is a really neat device and worthy of some consideration as well at 1 MHz.

Obviously it may not be practical to support every possible frequency and still get good performance.  I would generally assert that where microseconds are concerned, performance is also a concern.

mikalhart

#37
Nov 11, 2008, 06:30 pm Last Edit: Nov 11, 2008, 06:46 pm by mikalhart
@Don, one possible objection to your proposal for the 20MHz case is that micros() will not overflow at 32-bits because of the final division by 10, right?

@dcb: I'm playing (illicitly) with your code at work, and the more I see the more I like!  I've been doing some experiments with consecutive calls to (your) micros() and it does indeed seem much, much faster on a 16MHz Arduino.  And the wiring.c mods are really quite minimal, aren't they?

It doesn't seem like 1MHz support would be too hard because 1 divides 64.

@mellis: I measured 1 million deltas between consecutive calls to dcb's micros and got these results:

Delta | Share of samples
0us   | 35.7%
4us   | 64.1%
12us  | 0.2%
16us  | 0.02%


Mikal

Don Kinzer

Quote
one possible objection to your proposal for the 20MHz case is that micros() will not overflow at 32-bits because of the final division by 10, right?
Quite so.  However, if you don't multiply by 3.2 the result won't have units of microseconds.  If, for example, you choose to multiply by 4 (like the 16MHz case) the return value will represent units of 0.8uS.

This issue is why I've favored a routine that returns Timer0 clock cycles instead of microseconds.  It is quite simple to implement microsecond-based timing by looking for the equivalent number of Timer0 clock cycles at the prevailing CPU speed.  This strategy has the added convenience of allowing either truncation or rounding behavior to be employed (when a difference exists) depending on which is better for a particular application.
Don


mikalhart

#39
Nov 11, 2008, 07:43 pm Last Edit: Nov 11, 2008, 07:43 pm by mikalhart
Quote
However, if you don't multiply by 3.2 the result won't have units of microseconds.

I understand the dilemma.  You're going to have to lose some bits either at the high end or the low end one way or another.

What about replacing (for 20MHz only)
Code:
 us = (((m << 8) + t) * 32) / 10;
with
Code:
 us = (((m << 8) + t) / 5) * 16;

With this expression, we'd lose a bit of resolution (because of doing the divide by 5 first), but still overflow at 32 bits (if my analysis is correct).

Mikal

Don Kinzer

Quote
delayMicroseconds() currently supports only 8 or 16 MHz. FYI, it is in wiring.c if you want to take a stab at figuring out how to make it 20 MHz compatible.
I've modified my version so that it works correctly at 20MHz.  As shown in the code below, an extra cycle is conditionally added to the delay loop for 20MHz.  The code preparing for the delay is slightly different, too, to account for the faster speed.  My measurements put it right on the button.
Code:
/* Delay for the given number of microseconds.  Assumes an 8, 16 or 20 MHz clock.
* Disables interrupts, which will disrupt the millis() function if used
* too frequently. */
void delayMicroseconds(unsigned int us)
{
#define EXTRA_CYCLES
   uint8_t oldSREG;

   // calling avrlib's delay_us() function with low values (e.g. 1 or
   // 2 microseconds) gives delays longer than desired.
   //delay_us(us);

#if F_CPU >= 20000000L
   // for a 20 MHz clock add one extra cycle to the delay loop
#undef EXTRA_CYCLES
#define EXTRA_CYCLES    " nop" "\n\t" // 1 cycle

   // for a one-microsecond delay, simply return.  the overhead
   // of the function call yields a delay of approximately 0.9 us.
   if (--us == 0)
       return;

   // the loop below takes 0.25 microseconds (5 cycles)
   // per iteration, so execute it four times for each microsecond of
   // delay requested.
   us <<= 2;

   // partially compensate for the overhead of getting into and out of the loop
   us -= 3;

#elif F_CPU >= 16000000L
   // for the 16 MHz clock on most Arduino boards

   // for a one-microsecond delay, simply return.  the overhead
   // of the function call yields a delay of approximately 1 1/8 us.
   if (--us == 0)
       return;

   // the following loop takes a quarter of a microsecond (4 cycles)
   // per iteration, so execute it four times for each microsecond of
   // delay requested.
   us <<= 2;

   // account for the time taken in the preceding commands.
   us -= 2;
#else
   // for the 8 MHz internal clock on the ATmega168

   // for a one- or two-microsecond delay, simply return.  the overhead of
   // the function calls takes more than two microseconds.  can't just
   // subtract two, since us is unsigned; we'd overflow.
   if (--us == 0)
       return;
   if (--us == 0)
       return;

   // the following loop takes half of a microsecond (4 cycles)
   // per iteration, so execute it twice for each microsecond of
   // delay requested.
   us <<= 1;
   
   // partially compensate for the time taken by the preceding commands.
   // we can't subtract any more than this or we'd overflow w/ small delays.
   us--;
#endif

   // disable interrupts, otherwise the timer 0 overflow interrupt that
   // tracks milliseconds will make us delay longer than we want.
   oldSREG = SREG;
   cli();

   // busy wait
   __asm__ __volatile__ (
       "1: sbiw %0,1" "\n\t" // 2 cycles
       EXTRA_CYCLES
       "brne 1b" : "=w" (us) : "0" (us) // 2 cycles
   );

   // reenable interrupts.
   SREG = oldSREG;
}
Don


Don Kinzer

Quote
With this expression, we'd lose a bit of resolution (because of doing the divide by 5 first), but still overflow at 32 bits (if my analysis is correct).
I'm not sure that I understand what you mean about overflowing at 32 bits.  I believe that you either lose range, resolution or both.  With the scaling factors that you mentioned the maximum value of ((m << 8) + t) is 0x4FFFFFFB as compared to 0xFFFFFFFF in the other cases.  While it is true that the maximum return value will be 0xFFFFFFFF you still need to know the range in order to compute elapsed time when the second data point has a lower value than the first, i.e., when the value has wrapped.

Besides that, having micros() return a value with different units also causes portability issues that may be more bothersome than the inherent difference in range.

I reiterate my support for hpticks() (returning Timer0 ticks) because it avoids these problems entirely and, as I indicated earlier, it is easy to convert the result to microseconds using the CPU speed if desired.
Don


dcb

OK, I haven't absorbed the preceding posts entirely yet, but I wondered when the last time someone tried avrlib's delay_us() function was?

And is the delay long because we call it from a function?  If so, might a #define delayMicroseconds(us) delay_us(us) fix it?

Don Kinzer

Quote
I [...] wondered when the last time someone tried avrlib's delay_us() function was?
The delay_us() function, which takes a floating point parameter, works great if the parameter to it is a compile-time constant.  In that case, the compiler does the real math and produces integral values to use in the inlined code.  If the parameter is not constant at compile time, your program grows by a large amount due to the floating point math having to be linked in.
Don


mellis

The ticks() function is nice, but I think it may be too confusing to include in the core.  A tick can be hard to explain, especially because its length varies with the CPU speed.  We'd probably want to include a microsToTicks() function or something, which also complicates things.  Again, this would be a great function to provide as a nice implementation on the playground: http://www.arduino.cc/playground/Main/GeneralCodeLibrary

For the micros() function, is it reasonable to simply count microseconds in the overflow handler?  Or is there another way to avoid overflowing at a weird value?  Especially since micros() will overflow relatively quickly, I think it's important that people can compute elapsed time with a simple subtraction.
