Topic: realtime clock, microseconds, etc.

Don Kinzer

#45
Nov 12, 2008, 12:44 am Last Edit: Nov 12, 2008, 12:50 am by dkinzer Reason: 1
Quote
For the micros() function, is it reasonable to simply count microseconds in the overflow handler?
Unfortunately, the same problems exist at that level.  Each time the overflow handler executes represents 256 ticks or 64 * 256 CPU cycles.  Attempting to convert either of these quantities to an integral microseconds value will involve the same issues for 20MHz CPU frequency as it does in any of the other methods.

You could implement two accumulators - one numerator and one denominator and then do the division when micros() is called.  Here again though, you'd have the same problems of range or resolution.
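
To make the arithmetic concrete, here is a minimal host-side sketch in plain C (not Arduino code) that splits the 64 * 256 CPU cycles of one timer0 overflow into whole microseconds plus a remainder in cycles. The helper names us_per_overflow and cycles_left_over are made up for illustration, and the clock is assumed to be a whole number of MHz.

```c
#include <stdint.h>

/* One timer0 overflow = 256 ticks * 64 (prescaler) CPU cycles. */
#define CYCLES_PER_OVERFLOW (64u * 256u)

/* Whole microseconds contained in one overflow at the given clock (MHz). */
static uint32_t us_per_overflow(uint32_t f_cpu_mhz)
{
    return CYCLES_PER_OVERFLOW / f_cpu_mhz;
}

/* CPU cycles left over once the whole microseconds are removed. */
static uint32_t cycles_left_over(uint32_t f_cpu_mhz)
{
    return CYCLES_PER_OVERFLOW % f_cpu_mhz;
}
```

At 8 and 16 MHz the remainder is zero, so counting microseconds per overflow is exact. At 20 MHz each overflow is 819 whole microseconds plus 4 cycles (0.2 us), and that fraction is what an integer count in the handler would lose or have to carry.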
Don

ZBasic Microcontrollers
http://www.zbasic.net

dcb

"I reiterate my support for hpticks()"

I think this is where I'm at, can we bring back timer0_overflow_count++ in SIGNAL(SIG_OVERFLOW0)?

It looks like things were optimized for millis, but unintentionally at the expense of the ability to track microseconds efficiently.  There is probably enough bandwidth to do both in the timer0 interrupt.

Then the following function would work pretty well for 8 and 16mhz:

Code:

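unsigned long micros()
{
 unsigned long m, t;
 uint8_t oldSREG = SREG;
 cli();                      // read the counter and overflow count atomically
 t = TCNT0;
 // an overflow that is flagged but not yet serviced counts as tick 256
 if ((TIFR0 & _BV(TOV0)) && (t == 0))
   t = 256;
 m = timer0_overflow_count;
 SREG = oldSREG;             // restore the previous interrupt state
#if F_CPU >= 16000000L
 return ((m << 8) + t) << 2; // 16 MHz: 64 / 16 = 4 us per tick
#else
 return ((m << 8) + t) << 3; // 8 MHz: 64 / 8 = 8 us per tick
#endif
}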



re: 20mhz, can we address that after we get a working micros() for the target 8 and 16mhz chips?  I would like to checkmark at an agreeable  solution for 8 and 16mhz if that is ok.

Don Kinzer

Quote
A tick can be hard to explain, especially because it varies based on the cpu speed.
I'm not sure that it is necessary to do so but it is simple enough to say that a tick is 64/F_CPU seconds.  It is more important to point out that a given value returned by hpticks() is not particularly useful.  What is useful is the difference between two returned values, representing an elapsed time.
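
As a minimal sketch of why only the difference matters, the helper below (a hypothetical name, elapsed_ticks, written as plain host-side C) subtracts two readings of a free-running 32-bit tick counter. It assumes the counter wraps at most once between the two readings.

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 32-bit counter.
   Unsigned subtraction is performed modulo 2^32, so the result is still
   correct when the counter wrapped once between the two readings. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}
```

For example, a start reading of 0xFFFFFFF0 and an end reading of 0x00000010 give 0x20 ticks, the same answer as if the counter had never wrapped.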

The function below returns the number of microseconds between two tick values.  For CPU frequencies that are a factor of 64,000,000 the round vs. truncation issue is moot and the third parameter is ignored.  For other situations, the third parameter is used to produce the desired effect.
Code:
unsigned long elapsedMicroseconds(unsigned long ticks0, unsigned long ticks1, bool round = false);
unsigned long
elapsedMicroseconds(unsigned long ticks0, unsigned long ticks1, bool round)
{
 #define F_CPU_MHZ (F_CPU / 1000000L)
 #define PRESCALER 64
 unsigned long us;
#if (((PRESCALER / F_CPU_MHZ) * F_CPU_MHZ) == PRESCALER)
 // exact result, no rounding factor needs to be applied
 us = (ticks1 - ticks0) * (PRESCALER / F_CPU_MHZ);
#elif (F_CPU_MHZ == 20)
 if (round)
   // result rounded to the nearest microsecond
   us = (((ticks1 - ticks0) * 32) + 5) / 10;
 else
   // result truncated
   us = ((ticks1 - ticks0) * 16) / 5;
#else
 #error CPU frequency not supported
#endif
 return(us);
 #undef PRESCALER
 #undef F_CPU_MHZ
}
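
To see what the third parameter changes, here is a small host-side sketch of just the 20 MHz arithmetic from the function above, where one tick is 64/20 = 3.2 us. The names us_truncated and us_rounded are illustrative only, and the inputs are assumed small enough that the multiplication does not overflow 32 bits.

```c
#include <stdint.h>

/* 20 MHz, prescaler 64: one tick is 3.2 us, i.e. 16/5 us. */

/* Truncating conversion: any fraction of a microsecond is dropped. */
static uint32_t us_truncated(uint32_t ticks)
{
    return (ticks * 16u) / 5u;
}

/* Rounding conversion: ticks * 3.2 computed in tenths of a microsecond,
   then rounded to the nearest whole microsecond by adding 5 tenths. */
static uint32_t us_rounded(uint32_t ticks)
{
    return (ticks * 32u + 5u) / 10u;
}
```

Three ticks are 9.6 us, which truncates to 9 and rounds to 10; five ticks are exactly 16 us, so both conversions agree.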
Don

dcb

Don, awesome work as usual :)  

FYI, I started a separate thread on MHz discussions. I think it is a necessary discussion but beyond the scope of this thread, as David pointed out earlier.

http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1226451895/0#0

If we are happy with keeping track of a tick every 1024 microseconds in the interrupt handler and a fast working millis(), then let's lock in that progress.

mikalhart

#49
Nov 12, 2008, 04:34 am Last Edit: Nov 12, 2008, 04:58 am by mikalhart Reason: 1
Quote
I'm not sure that I understand what you mean about overflowing at 32 bits.  I believe that you either lose range, resolution or both.  With the scaling factors that you mentioned the maximum value of ((m << 8) + t) is 0x4FFFFFFB as compared to 0xFFFFFFFF in the other cases.  While it is true that the maximum return value will be 0xFFFFFFFF you still need to know the range in order to compute elapsed time when the second data point has a lower value than the first, i.e., when the value has wrapped.


Don, I guess all I was trying to argue was that scaling (for the 20MHz case) using y = (x / 5) * 16 is superior to y = (x * 16) / 5 IF you are interested (as David is) in making sure the values of y are evenly distributed throughout the entire range of 32-bit values.  Yes, you lose some resolution, but you gain the ability to compute time deltas by simply subtracting them.

The expression ((((m << 8) + t) / 5) * 16) overflows at 0xFFFFFFF0 with resolution 16.  Meanwhile, the expression ((((m << 8) + t) * 16) / 5), which is roughly equivalent otherwise, overflows at 0x19999996 (albeit with better resolution).  They both, more or less, represent microseconds elapsed.

Here's a brief summary of the tradeoffs between overflow and resolution:

A = 16MHz algorithm
B = 8MHz algorithm
C = 20MHz algorithm with y = x * 16 / 5
D = 20MHz algorithm with y = x / 5 * 16

Scheme | Overflow | Resolution (us)
-----------------------------------
  A    | FFFFFFFC | 4
  B    | FFFFFFF8 | 8
  C    | 19999996 | ~16/5
  D    | FFFFFFF0 | 16


I vote for A, B, and D. :)  D could easily be applied to elapsedMicroseconds.  Just do the division first and then the multiplication.

I like hpticks() a lot too, by the way.  I think I would use it a bunch.  Thanks!

Thoughts, anyone?

Mikal

PS: Nice work on "round". ;)

Don Kinzer

Quote
I was trying to argue was that scaling (for the 20MHz case) using y = (x / 5) * 16 is superior to y = (x * 16) / 5 IF you are interested (as David is) in making sure the values of y are evenly distributed throughout the entire range of 32-bit values.
Perhaps I'm missing something.  I don't see how you arrived at the overflow value for y = (x / 5) * 16, nor can I substantiate the claim that the values are evenly distributed over the 32-bit value range.  In particular, given that the range of values for x is 0 to 0xffffffff, after applying the conversion function y = (x / 5) * 16, the range of values for y is 0 to 0x33333330, occupying less than one fourth of the range of a 32-bit value.
Don

mikalhart

#51
Nov 12, 2008, 05:05 am Last Edit: Nov 12, 2008, 05:29 am by mikalhart Reason: 1
Quote
[From dcb] I would like to checkmark at an agreeable solution for 8 and 16mhz if that is ok.


dcb, I think Don's code from post #35 IS your solution, with the 8 and 16 MHz cases rolled into one.  For what it's worth I give these two an enthusiastic CHECK.   ;)

Mikal

mikalhart

#52
Nov 12, 2008, 05:20 am Last Edit: Nov 12, 2008, 05:43 am by mikalhart Reason: 1
Quote
Perhaps I'm missing something.  I don't see how you arrived at the overflow value for y = (x / 5) * 16, nor can I substantiate the claim that the values are evenly distributed over the 32-bit value range.  In particular, given that the range of values for x is 0 to 0xffffffff, after applying the conversion function y = (x / 5) * 16, the range of values for y is 0 to 0x33333330, occupying less than one fourth of the range of a 32-bit value.


Don, the error is in assuming that the maximum value for f(x) = (x / 5) * 16 occurs when x = 0xFFFFFFFF.  There are several values for x where the rollover occurs, but 0xFFFFFFFF is not one of them.  For example
f(0x4FFFFFFF) = 0xFFFFFFF0
and
f(0x50000000) = 0x0

This table should clarify things, both in terms of overflow and resolution.  All values are in hex, and show that the range for y is indeed 0 to 0xFFFFFFFF and evenly spaced.

x        | (x/5)*16 | (x*16)/5
0        | 0        | 0
1        | 0        | 3
2        | 0        | 6
3        | 0        | 9
4        | 0        | C
5        | 10       | 10
6        | 10       | 13
...
60       | 130      | 133
61       | 130      | 136
...
7FFFFF0  | 19999960 | 19999966
7FFFFF1  | 19999960 | 19999969
7FFFFF2  | 19999960 | 1999996C
7FFFFF3  | 19999970 | 19999970
7FFFFF4  | 19999970 | 19999973
7FFFFF5  | 19999970 | 19999976
7FFFFF6  | 19999970 | 19999979
7FFFFF7  | 19999970 | 1999997C
7FFFFF8  | 19999980 | 19999980
7FFFFF9  | 19999980 | 19999983
7FFFFFA  | 19999980 | 19999986
7FFFFFB  | 19999980 | 19999989
7FFFFFC  | 19999980 | 1999998C
7FFFFFD  | 19999990 | 19999990
7FFFFFE  | 19999990 | 19999993
7FFFFFF  | 19999990 | 19999996
8000000  | 19999990 | 0
8000001  | 19999990 | 3
...
4FFFFFF0 | FFFFFFC0 | 19999966
4FFFFFF1 | FFFFFFD0 | 19999969
4FFFFFF2 | FFFFFFD0 | 1999996C
4FFFFFF3 | FFFFFFD0 | 19999970
4FFFFFF4 | FFFFFFD0 | 19999973
4FFFFFF5 | FFFFFFD0 | 19999976
4FFFFFF6 | FFFFFFE0 | 19999979
4FFFFFF7 | FFFFFFE0 | 1999997C
4FFFFFF8 | FFFFFFE0 | 19999980
4FFFFFF9 | FFFFFFE0 | 19999983
4FFFFFFA | FFFFFFE0 | 19999986
4FFFFFFB | FFFFFFF0 | 19999989
4FFFFFFC | FFFFFFF0 | 1999998C
4FFFFFFD | FFFFFFF0 | 19999990
4FFFFFFE | FFFFFFF0 | 19999993
4FFFFFFF | FFFFFFF0 | 19999996
50000000 | 0        | 0
50000001 | 0        | 3
...


Does that make sense?

Mikal

EDIT: I just realized the fatal flaw in my proposal is that when x itself overflows, there is a discontinuity in f(x).

FFFFFFFF | 33333330 | 19999996
0        | 0 (!)    | 0


Sorry to waste everyone's time on this.  The discontinuity obviously defeats my goal of being able to calculate delta = time2 - time1.  :-[
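
The discontinuity can be reproduced off-target with a minimal C sketch. It assumes 32-bit unsigned arithmetic, as unsigned long has on the AVR, and the names scale_d and delta_d are made up for illustration.

```c
#include <stdint.h>

/* Scheme D from the summary above: divide first, then multiply.
   The multiplication is performed modulo 2^32. */
static uint32_t scale_d(uint32_t x)
{
    return (x / 5u) * 16u;
}

/* Elapsed value obtained by simply subtracting two scaled readings. */
static uint32_t delta_d(uint32_t x0, uint32_t x1)
{
    return (uint32_t)(scale_d(x1) - scale_d(x0));
}
```

Across the rollover of the scaled value the subtraction still works: delta_d(0x4FFFFFFB, 0x50000000) is 16 for 5 ticks. Across the rollover of x itself it does not: delta_d(0xFFFFFFFB, 0) is 0xCCCCCCE0 for the same 5 ticks, because scale_d jumps from 0x33333320 back to 0.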

M

dcb

#53
Nov 12, 2008, 05:52 am Last Edit: Nov 12, 2008, 06:29 am by dcb Reason: 1
No sweat Mikal :)

Ok, final proposal for micros (leaving the 20mhz can of worms out of it)

Mellis, if you agree:
update wiring.c

add a global variable:
volatile unsigned long timer0_tics = 0;

add
     timer0_tics++;
to top of SIGNAL(TIMER0_OVF_vect), leave rest where it is.

add micros function (plus prototype in wiring.h):

Code:

unsigned long micros(){
 unsigned long m, t;
 uint8_t oldSREG = SREG;
 cli();
 t = TCNT0;
 if ((TIFR0 & _BV(TOV0)) && (t == 0))
   t = 256;
 m = timer0_tics;
 SREG = oldSREG;
#if ((64 / clockCyclesPerMicrosecond()) * clockCyclesPerMicrosecond()) == 64
 return ((m << 8) + t) * (64 / clockCyclesPerMicrosecond());
#else
 #error clock speed not supported
#endif  
}


mikalhart

#54
Nov 12, 2008, 06:10 am Last Edit: Nov 12, 2008, 06:24 am by mikalhart Reason: 1
With whatever credibility I have left I give this a thumbs up, although I agree with Don's advice to gate it with some #ifdefs.  You want to make sure that whoever eventually does add support for 20MHz doesn't overlook the fact that this code makes some serious assumptions about F_CPU.

Just as an aside, 64 is turning into quite the magic number.  I wonder if it might be prudent to #define PRESCALE 64 in wiring.h, especially since there is talk elsewhere of what might happen if the prescale value ever changed.

Thanks for the impressive work, all.

Mikal

dcb

clockCyclesPerMicrosecond(), done.

PRESCALE, I'll let David global replace on that if so desired.  It may come up in the supported frequencies thread.

Re: #ifdef, OK, I can gate on the 8/16 MHz deal. It looks like that lends itself to end-start on overflow and is about 3 microseconds per call.

mikalhart

[I'll delete this post when you reply, dcb, but I don't understand that last comment.  What does #ifdef have to do with "end-start", "overflow", or changing the call timing?]

M

dcb

you can pm me too :)

They were follow-up tests.  You might be interested in the end-start one: I tested that code at overflow by setting timer0_tics to 4294967296 - 3000 before starting and adding Serial.print(micros0-lastmicros) to the monitor.  The value printed at overflow (499712) was consistent with the other subtractions.
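
A rough host-side model of that test, for anyone who wants to check the wrap behaviour without hardware. It assumes 16 MHz (4 us per tick), a TCNT0 reading of 0 at each sample, and 488 overflows between samples; that last figure is an assumption chosen because 488 * 1024 = 499712, the value reported above. micros_model is a made-up name, not part of wiring.c.

```c
#include <stdint.h>

/* Model of the proposed micros() at 16 MHz.
   m is the overflow count (timer0_tics), t is the TCNT0 reading.
   All arithmetic is modulo 2^32, as unsigned long is on the AVR. */
static uint32_t micros_model(uint32_t m, uint32_t t)
{
    return ((m << 8) + t) * 4u;   /* 64 / 16 = 4 us per tick */
}

/* Difference of two samples, again modulo 2^32 (end - start). */
static uint32_t micros_delta(uint32_t m0, uint32_t m1)
{
    return (uint32_t)(micros_model(m1, 0u) - micros_model(m0, 0u));
}
```

Starting from 4294967296 - 3000, the seventh sample interval runs from an overflow count of 0xFFFFFFB8 to 416, so the count itself wraps inside it. micros_delta(0xFFFFFFB8, 416) still comes out as 499712, the same as an interval with no wrap, because shifting and multiplying by a constant preserve differences modulo 2^32.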


Re: performance, I was just checking it against prior versions; it performs well.




mellis

What about this?  It doesn't require any changes to the overflow handler, and I think it will overflow nicely (although it would be great if someone could test that).  It should provide as much accuracy and precision as a ticks() function at 8 MHz and 16 MHz (since each tick is an integral number of micros), although not as much range.

Code:
unsigned long micros() {
 uint16_t t0;
 unsigned long cc;
 unsigned long m;

 uint8_t sreg = SREG;
 cli();

 t0 = TCNT0;
 if ((TIFR0 & _BV(TOV0)) && (t0 == 0))
   t0 = 256;

 cc = timer0_clock_cycles;
 m = timer0_millis;
 SREG = sreg;
 
 return m * 1000UL + (cc + (t0 * 64)) / clockCyclesPerMicrosecond();
}



dcb

The only problem I have with it is that it is a bit slower: 12us per call vs. 3us for the code above.  We optimized millis in 0013 at the expense of a more optimal micros.  With my proposal, the only change to the interrupt is an increment of the overflow variable.



