0
Offline
God Member
Karma: 0
Posts: 507
« Reply #30 on: November 11, 2008, 09:25:40 am » |
Updated micros(); it passes all tests thus far:

unsigned long micros() {
    unsigned long m, t;
    uint8_t oldSREG = SREG;

    cli();
    t = TCNT0;
    if ((TIFR0 & _BV(TOV0)) && (t == 0))
        t = 256;
    m = timer0_tics;
    SREG = oldSREG;

#if F_CPU >= 16000000L
    return ((m << 8) + t) << 2;
#else
    return ((m << 8) + t) << 3;
#endif
}
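For anyone who wants to check the arithmetic off the board, here is a minimal host-side sketch of the same calculation. It is not the AVR code: the register reads are replaced by plain parameters, and the name micros_from_snapshot() and its arguments are made up for illustration. It assumes the Timer0 prescaler of 64 used in this thread and handles only 8 and 16 MHz.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of micros(): the caller passes in the values
   that the AVR version reads from timer0_tics, TCNT0 and the TOV0 flag. */
uint32_t micros_from_snapshot(uint32_t overflow_count, uint8_t tcnt0,
                              int overflow_pending, uint32_t f_cpu_mhz)
{
    /* This sketch only models the two clock speeds discussed so far. */
    assert(f_cpu_mhz == 8 || f_cpu_mhz == 16);

    uint32_t t = tcnt0;

    /* The counter has rolled over to 0 but the overflow count has not been
       bumped yet, so count a full 256 ticks instead. */
    if (overflow_pending && t == 0)
        t = 256;

    uint32_t ticks = (overflow_count << 8) + t;

    /* One tick is 64 cycles: 4 us at 16 MHz, 8 us at 8 MHz. */
    return (f_cpu_mhz == 16) ? (ticks << 2) : (ticks << 3);
}
```

Calling micros_from_snapshot(3, 10, 0, 16) models three completed overflows plus ten ticks on a 16 MHz board.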
Austin, TX USA
Offline
God Member
Karma: 3
Posts: 992
Arduino rocks
« Reply #31 on: November 11, 2008, 09:25:43 am » |
@dcb-- If you just change your hpticks derivative in reply 24 from

unsigned long t0_ticks = (clock_cycles / 64) + (millis * (1000L * clockCyclesPerMicrosecond() / 64)) + t0;
return ((t0_ticks) * 64L / (F_CPU / 1000000L));

to

unsigned long t0_ticks = (clock_cycles / 64UL) + t0;
return 1000 * millis + t0_ticks * 64UL / clockCyclesPerMicrosecond();

the 268-second micros() overflow problem goes away (overflows at the 32-bit boundary). (This change avoids the unnecessary translation of millis into the "tick" domain and then back into the "micro" domain.)

This seems ideal to me. No changing wiring.c and all the benefits of Don's and dcb's work. It also has the added benefit of working with the 20MHz clock (I think).

Do you agree?

Mikal
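To see where the 268-second figure comes from, here is a small host-side sketch (plain C, not the code from reply 24). It assumes a prescaler of 64 and models the AVR's 32-bit unsigned long with uint32_t; the function names are invented for illustration. At 16 MHz a tick is 4 us, so multiplying the tick count by 64 wraps once the count reaches 2^26 ticks, which is roughly 268 seconds.

```c
#include <assert.h>
#include <stdint.h>

#define CYCLES_PER_TICK ((uint32_t)64)   /* Timer0 prescaler assumed in this thread */

/* First form: all elapsed time goes through the tick domain, so the
   multiply by 64 wraps a 32-bit value after 2^26 ticks. */
uint32_t us_via_ticks(uint32_t total_ticks, uint32_t cycles_per_us)
{
    assert(cycles_per_us != 0);
    return (total_ticks * CYCLES_PER_TICK) / cycles_per_us;
}

/* Second form: milliseconds are scaled directly and only the ticks within
   the current millisecond are converted, so the multiply stays small. */
uint32_t us_direct(uint32_t ms, uint32_t sub_ms_ticks, uint32_t cycles_per_us)
{
    assert(cycles_per_us != 0);
    return (uint32_t)1000 * ms
         + (sub_ms_ticks * CYCLES_PER_TICK) / cycles_per_us;
}
```

With these definitions, us_via_ticks(67108864, 16) wraps back to 0, while us_direct(268436, 0, 16) still returns a sensible value a little past the 268-second mark.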
« Last Edit: November 11, 2008, 09:31:54 am by mikalhart »
0
Offline
God Member
Karma: 0
Posts: 507
« Reply #32 on: November 11, 2008, 09:35:29 am » |
Mikal, I do appreciate the investigation, but let me get a read from you on the function in reply 30 first. Having seen this approach run 5 times faster than the version that leaves wiring.c unchanged, you can imagine I want the fast one.
Austin, TX USA
Offline
God Member
Karma: 3
Posts: 992
Arduino rocks
« Reply #33 on: November 11, 2008, 11:22:16 am » |
On the surface it looks good! I'll study it enthusiastically later on. (At some point I need to pretend to be doing "real" work today.) Nice work! This is fun.

Mikal
0
Offline
God Member
Karma: 0
Posts: 507
« Reply #34 on: November 11, 2008, 11:36:36 am » |
FYI, timer0_tics appears to be identical to the former timer0_overflow_count.
Portland, OR, USA
Offline
Jr. Member
Karma: 0
Posts: 78
« Reply #35 on: November 11, 2008, 11:45:55 am » |
Quote: "#if F_CPU >= 16000000L"

I understand what you're trying to do here, but keep in mind that with -Os the avr-gcc compiler is going to replace multiplication by powers of two with left shifting, so it isn't necessary to code it explicitly with shifting. Moreover, the result won't be correct for 20MHz because the multiplication factor should be 3.2 rather than 4.

I would propose the alternate implementation of cycles to microseconds shown below, which handles CPU speeds (in MHz) that are factors of 64 as one case, handles 20MHz as a special case, and reports an error at compile time otherwise. Note that the code for 20MHz will be somewhat slower due to the divide-by-10 operation (in addition to one more shift cycle to perform the multiplication) and it will have a slightly smaller dynamic range. If desired, the calculation for 20MHz could include rounding by adding 5 to the result prior to dividing by 10.

#define F_CPU_MHZ (F_CPU / 1000000L)
    unsigned long us;
#if ((64 / F_CPU_MHZ) * F_CPU_MHZ) == 64
    us = ((m << 8) + t) * (64 / F_CPU_MHZ);
#elif F_CPU_MHZ == 20
    us = (((m << 8) + t) * 32) / 10;
#else
#error clock speed not supported
#endif
    return(us);
#undef F_CPU_MHZ
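The same case split can be tried on a PC with a runtime version. This is only a sketch: the clock speed is passed as a parameter instead of coming from F_CPU, the name ticks_to_us() is made up, and returning 0 for an unsupported speed stands in for the compile-time #error.

```c
#include <assert.h>
#include <stdint.h>

/* Convert Timer0 ticks (64 CPU cycles each) to microseconds.
   Returns 0 for clock speeds this sketch does not handle; the
   preprocessor version would refuse to compile instead. */
uint32_t ticks_to_us(uint32_t ticks, uint32_t f_cpu_mhz)
{
    assert(f_cpu_mhz != 0);

    if ((64 / f_cpu_mhz) * f_cpu_mhz == 64) {
        /* MHz divides 64 evenly: 1, 2, 4, 8, 16, 32 or 64 MHz. */
        return ticks * (64 / f_cpu_mhz);
    } else if (f_cpu_mhz == 20) {
        /* 64 / 20 = 3.2, written as 32 / 10 to stay in integers. */
        return (ticks * 32) / 10;
    }
    return 0;
}
```

For example, 1000 ticks come out as 4000 us at 16 MHz and 3200 us at 20 MHz.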
0
Offline
God Member
Karma: 0
Posts: 507
« Reply #36 on: November 11, 2008, 12:03:05 pm » |
I'm OK with a special case for 20 MHz as long as millis() and elapsedMillis() are fast.
delayMicroseconds is currently 8 or 16 MHz only. FYI, it is in wiring.c if you want to take a stab at figuring out how to make it 20 MHz compatible.
Also might want to think about 1 MHz compatibility a bit. I think the AVR Butterfly (runs on a button cell) is a really neat device and worthy of some consideration as well at 1 MHz.
Obviously it may not be practical to work with every possible frequency and get good performance, though. I would generally assert that where microseconds are concerned, performance is also a concern.
Austin, TX USA
Offline
God Member
Karma: 3
Posts: 992
Arduino rocks
« Reply #37 on: November 11, 2008, 12:30:12 pm » |
@Don, one possible objection to your proposal for the 20MHz case is that micros() will not overflow at 32-bits because of the final division by 10, right?
@dcb: I'm playing (illicitly) with your code at work, and the more I see the more I like! I've been doing some experiments with consecutive calls to (your) micros() and it does indeed seem much, much faster on a 16MHz Arduino. And the wiring.c mods are really quite minimal, aren't they?
It doesn't seem like 1MHz support would be too hard because 1 divides 64.
@mellis: I measured 1 million deltas between consecutive calls to dcb's micros and got these results:
Delta | Count
0us   | 35.7%
4us   | 64.1%
12us  | 0.2%
16us  | 0.02%
Mikal
« Last Edit: November 11, 2008, 12:46:33 pm by mikalhart »
Portland, OR, USA
Offline
Jr. Member
Karma: 0
Posts: 78
« Reply #38 on: November 11, 2008, 01:31:58 pm » |
Quote: "one possible objection to your proposal for the 20MHz case is that micros() will not overflow at 32-bits because of the final division by 10, right?"

Quite so. However, if you don't multiply by 3.2 the result won't have units of microseconds. If, for example, you choose to multiply by 4 (like the 16MHz case), the return value will represent units of 0.8 us.

This issue is why I've favored a routine that returns Timer0 clock cycles instead of microseconds. It is quite simple to implement microsecond-based timing by looking for the equivalent number of Timer0 clock cycles at the prevailing CPU speed. This strategy has the added convenience of allowing either truncation or rounding behavior to be employed (when a difference exists), depending on which is better for a particular application.
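As a rough illustration of the tick-based approach, the sketch below converts a microsecond interval into Timer0 ticks with either truncation or rounding. The function names are hypothetical, the prescaler of 64 is assumed, and the product us * MHz must fit in 32 bits, so this is only suitable for short intervals.

```c
#include <assert.h>
#include <stdint.h>

/* One Timer0 tick is 64 CPU cycles, so ticks = us * MHz / 64. */

/* Truncating conversion: never waits longer than requested. */
uint32_t us_to_ticks_trunc(uint32_t us, uint32_t f_cpu_mhz)
{
    assert(f_cpu_mhz != 0);
    return (us * f_cpu_mhz) / 64;
}

/* Rounding conversion: adds half a tick (32 cycles) before dividing. */
uint32_t us_to_ticks_round(uint32_t us, uint32_t f_cpu_mhz)
{
    assert(f_cpu_mhz != 0);
    return (us * f_cpu_mhz + 32) / 64;
}
```

At 20 MHz a 5 us interval is 1.5625 ticks, so the two functions return 1 and 2 respectively; at 16 MHz a 100 us interval is exactly 25 ticks either way.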
Austin, TX USA
Offline
God Member
Karma: 3
Posts: 992
Arduino rocks
« Reply #39 on: November 11, 2008, 01:43:38 pm » |
Quote: "However, if you don't multiply by 3.2 the result won't have units of microseconds."

I understand the dilemma. You're going to have to lose some bits either at the high end or the low end one way or another. What about replacing (for 20MHz only)

us = (((m << 8) + t) * 32) / 10;

with

us = (((m << 8) + t) / 5) * 16;

With this expression, we'd lose a bit of resolution (because of doing the divide by 5 first), but still overflow at 32 bits (if my analysis is correct).

Mikal
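The trade-off between the two 20MHz expressions can be checked on a PC. This sketch uses uint32_t to stand in for the AVR's 32-bit unsigned long; the function names are invented here. Multiplying first keeps the low bits but the intermediate ticks * 32 wraps at 2^27 ticks (about 429 seconds at 3.2 us per tick). Dividing first only ever returns multiples of 16 us, but its intermediate stays in range until the tick count reaches 5 * 2^28.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply first: better resolution, but ticks * 32 wraps at 2^27 ticks.
   No range check here, so the wrap can be observed. */
uint32_t us20_mul_first(uint32_t ticks)
{
    return (ticks * 32) / 10;
}

/* Divide first: result is always a multiple of 16 us. The caller must keep
   ticks below 5 * 2^28 (0x50000000) or the multiply by 16 wraps. */
uint32_t us20_div_first(uint32_t ticks)
{
    assert(ticks < 0x50000000u);
    return (ticks / 5) * 16;
}
```

For small inputs the difference is only resolution: 7 ticks give 22 us with the first form and 16 us with the second. At 2^27 ticks the first form has already wrapped to 0 while the second is still counting.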
« Last Edit: November 11, 2008, 01:43:59 pm by mikalhart »
Portland, OR, USA
Offline
Jr. Member
Karma: 0
Posts: 78
« Reply #40 on: November 11, 2008, 03:54:46 pm » |
Quote: "delayMicroseconds is currently 8 or 16 MHz only. FYI, it is in wiring.c if you want to take a stab at figuring out how to make it 20 MHz compatible."

I've modified my version so that it works correctly at 20MHz. As shown in the code below, an extra cycle is conditionally added to the delay loop for 20MHz. The code preparing for the delay is slightly different, too, to account for the faster speed. My measurements put it right on the button.

/* Delay for the given number of microseconds. Assumes an 8, 16 or 20 MHz clock.
 * Disables interrupts, which will disrupt the millis() function if used
 * too frequently. */
void delayMicroseconds(unsigned int us)
{
#define EXTRA_CYCLES
    uint8_t oldSREG;

    // calling avrlib's delay_us() function with low values (e.g. 1 or
    // 2 microseconds) gives delays longer than desired.
    //delay_us(us);

#if F_CPU >= 20000000L
    // for a 20 MHz clock add one extra cycle to the delay loop
#undef EXTRA_CYCLES
#define EXTRA_CYCLES " nop" "\n\t" // 1 cycle

    // for a one-microsecond delay, simply return. the overhead
    // of the function call yields a delay of approximately 0.9 us.
    if (--us == 0)
        return;

    // the loop below takes 0.25 microseconds (5 cycles)
    // per iteration, so execute it four times for each microsecond of
    // delay requested.
    us <<= 2;

    // partially compensate for the overhead of getting into and out of the loop
    us -= 3;

#elif F_CPU >= 16000000L
    // for the 16 MHz clock on most Arduino boards

    // for a one-microsecond delay, simply return. the overhead
    // of the function call yields a delay of approximately 1 1/8 us.
    if (--us == 0)
        return;

    // the following loop takes a quarter of a microsecond (4 cycles)
    // per iteration, so execute it four times for each microsecond of
    // delay requested.
    us <<= 2;

    // account for the time taken in the preceding commands.
    us -= 2;
#else
    // for the 8 MHz internal clock on the ATmega168

    // for a one- or two-microsecond delay, simply return. the overhead of
    // the function calls takes more than two microseconds. can't just
    // subtract two, since us is unsigned; we'd overflow.
    if (--us == 0)
        return;
    if (--us == 0)
        return;

    // the following loop takes half of a microsecond (4 cycles)
    // per iteration, so execute it twice for each microsecond of
    // delay requested.
    us <<= 1;

    // partially compensate for the time taken by the preceding commands.
    // we can't subtract any more than this or we'd overflow w/ small delays.
    us--;
#endif

    // disable interrupts, otherwise the timer 0 overflow interrupt that
    // tracks milliseconds will make us delay longer than we want.
    oldSREG = SREG;
    cli();

    // busy wait
    __asm__ __volatile__ (
        "1: sbiw %0,1" "\n\t" // 2 cycles
        EXTRA_CYCLES
        "brne 1b"             // 2 cycles
        : "=w" (us)
        : "0" (us)
    );

    // reenable interrupts.
    SREG = oldSREG;
}
Portland, OR, USA
Offline
Jr. Member
Karma: 0
Posts: 78
« Reply #41 on: November 11, 2008, 04:22:49 pm » |
Quote: "With this expression, we'd lose a bit of resolution (because of doing the divide by 5 first), but still overflow at 32 bits (if my analysis is correct)."

I'm not sure that I understand what you mean about overflowing at 32 bits. I believe that you either lose range, resolution or both. With the scaling factors that you mentioned, the maximum value of ((m << 8) + t) is 0x4FFFFFFB as compared to 0xFFFFFFFF in the other cases. While it is true that the maximum return value will be 0xFFFFFFFF, you still need to know the range in order to compute elapsed time when the second data point has a lower value than the first, i.e., when the value has wrapped.

Besides that, having micros() return a value with different units also causes portability issues that may be more bothersome than the inherent difference in range. I reiterate my support for hpticks() (returning Timer0 ticks) because it avoids these problems entirely and, as I indicated earlier, it is easy to convert the result to microseconds using the CPU speed if desired.
0
Offline
God Member
Karma: 0
Posts: 507
« Reply #42 on: November 11, 2008, 05:22:46 pm » |
OK, I haven't absorbed the preceding entirely yet, but I wondered when someone last tried avrlib's delay_us() function.
And is it long because we call it from a function? In that case a

#define delayMicroSeconds(us) delay_us(us)

might fix it?
Portland, OR, USA
Offline
Jr. Member
Karma: 0
Posts: 78
« Reply #43 on: November 11, 2008, 05:34:54 pm » |
Quote: "I [...] wondered when the last time someone tried avrlib's delay_us() function was?"

The delay_us() function, which takes a floating point parameter, works great if the parameter to it is a compile-time constant. In that case, the compiler does the real math and produces integral values to use in the inlined code. If the parameter is not constant at compile time, your program grows by a large amount due to the floating point math having to be linked in.
Forum Administrator
Cambridge, MA
Offline
Faraday Member
Karma: 7
Posts: 3532
« Reply #44 on: November 11, 2008, 06:05:36 pm » |
I think the ticks() function is nice, but I think it may be too confusing to include in the core. A tick can be hard to explain, especially because it varies based on the CPU speed. We'd probably want to include a microsToTicks() function or something, which also complicates things. Again, this would be a great function to have a nice implementation of on the playground: http://www.arduino.cc/playground/Main/GeneralCodeLibrary

For the micros() function, is it reasonable to simply count microseconds in the overflow handler? Or is there another way to avoid overflowing at a weird value? Especially with micros(), which will overflow relatively quickly, I think it's important that people can just do a simple subtraction.
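The point about simple subtraction can be shown with a short host-side sketch (the function names are made up for illustration). When the counter wraps at exactly 2^32, unsigned subtraction gives the right elapsed time even across the wrap. When it wraps at some other value, the caller has to know that value and handle the wrap explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed time between two readings of a counter that wraps at 2^32. */
uint32_t elapsed_full_range(uint32_t start, uint32_t now)
{
    return now - start;   /* unsigned subtraction is modulo 2^32 */
}

/* Same for a counter that wraps at an arbitrary modulus, e.g. one whose
   top value is below 0xFFFFFFFF. Both readings must be below the modulus. */
uint32_t elapsed_with_modulus(uint32_t start, uint32_t now, uint32_t modulus)
{
    assert(modulus != 0 && start < modulus && now < modulus);
    return (now >= start) ? (now - start) : (modulus - start + now);
}
```

For instance, elapsed_full_range(0xFFFFFFF0, 0x10) returns 0x20 with no special handling, whereas a counter that wraps at 1000 needs elapsed_with_modulus(990, 10, 1000) to get 20.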