Optimizing milli() and timer implementation.

This is an academic question: I've been reading how the timer is implemented here.

  • timer ticks every 1.024ms, and fraction handling is done in handler
  • milli() disables interrupts to read timer0_millis

For bullet 1, can't we set the counter's overflow to be 250 instead of 256?

For bullet 2, can't we avoid the cli() and implement atomicity by switching between two stores for milli?

The following code would work and be atomic IF the timer_0_data array is aligned so that ONLY the low byte of the address is different. Maybe a compiler macro can enforce alignment.

The down side of this implementation is it uses 1 pointer and two longs instead of two longs and a char.

volatile long timer0_data[2];
volatile long *timer0_milli=timer0_data[0];

SIGNAL(TIMER0_OVF_vect)
{
  long old_time = *timer0_milli;

  // Swap pointer
  timer0_milli = (timer0_milli == &timer0_data[0])? &timer0_data[1] : &timer0_data[0];
  *timer0_milli = old_time+1;
}

unsigned long millis() {
  return *timer0_millis;
}

1 would screw up PWM on those pins where the PWM is driven by TIMER0.

For bullet 1, can't we set the counter's overflow to be 250 instead of 256?

It is easier to detect the overflow of an 8 bit counter when you increment 255, than to do the comparison to 250 at every step.

For bullet 2, can't we avoid the cli() and implement atomicity by switching between two stores for milli?

I assume you meant "two reads". Interrupts are disabled when storing so that is atomic. Writing twice is pointless.

(Those I have worked with call what you propose a "dirty read" in case you want to Google.)

The answer to your question is "yes". However, there is a "but". With AVR processors (single core processors) it is far more efficient to disable interrupts.

For bullet 2, can't we avoid the cli() and implement atomicity by switching between two stores for milli?

How does that help? If the swapping of pointers in millis() is unprotected, you may get half of one pointer and half of the other.

The solution to "dirty reads" is to read the variable twice. If the values are the same then an interrupt could not possibly have occurred (the read was successful).

My guess is that is what @sonyhome was trying to show.

DrAzzy: Indeed, the PWM would have a cycle that's off by 2%. Is it critical? Could it be spec'ed to be as is? If not, I think the time correction should be done in milli(), not in the interrupt handler anyways.

Coding Badly: I am not talking of reading the value twice, no. I talk of storing 2 values.

Nick: The pointer is swapped by the interrupt handler, not milli(). Milli() just dereferences the pointer. Furthermore, alignment solves the issue.

I am talking of storing two timer0 values. The interrupt handler does not modify the value potentially being read by the program, but the other.

We control the addresses of the values via alignment so the pointer swap is atomic: If the address of timer0_data[0] is 0x1230 and timer0_data[1] is 0x1238, then changing the pointer timer0_milli between the two addresses only changes the lower byte of the address 30 <-> 38. In other words since only one byte of the address changes, it is ATOMIC for the program. Therefore the program will always dereference a valid address.

Furthermore the program will always dereference the old timestamp when the interrupt handler triggered at the same time. This eliminates the risk of the program reading the timer that the interrupt handler is updating. This implements an atomic read of the timer for a program without the need to disable interrupts.

If both bullets are implemented, the timer interrupt and milli() should be simpler and faster it seems.

sonyhome:
Nick: The pointer is swapped by the interrupt handler, not milli(). Milli() just dereferences the pointer. Furthermore, alignment solves the issue.

Not good enough. The pointer is two bytes. It can't be swapped atomically.

I can see that if you restrict the pointer change to one byte, the data read would become atomic. However, adding the complexity of that restriction and the pointer swap defeats the goal of "optimization". Normally, that would be the saving of processor time or memory space, or some kind of performance gain to offset some added complexity. But I can't see any of that in this proposal.

sonyhome:
Coding Badly: I am not talking of reading the value twice, no. I talk of storing 2 values.

Which is also pointless.

As I wrote, the way to handle the situation is a "dirty read".

If both bullets are implemented, the timer interrupt and milli() should be simpler and faster it seems.

Please, let us know how that works out. :wink:

aarg:
I can see that if you restrict the pointer change to one byte...

Nope. That "solution" also has a race condition that leads to failure. Solving that new problem degenerates to...

As I wrote, the way to handle the situation is a "dirty read".

Handling alignment of an array in memory is done at compile time, either via a compiler directive or by allocating space for 3 words and setting the pointer to the pair of words that have the same Most Significant Byte. In fact if you cast to uint64_t maybe the compiler would align the address to 64bit.

Coding Badly: explain to me why you think there's a dirty read involved. If you assume the pointer swap is atomic then the solution avoids the program from ever seeing the data while it is being changed.

Nick:
"Not good enough. The pointer is two bytes. It can't be swapped atomically."
Swap is atomic: Only the lower byte of the address changes.
If we were in a multi-threaded context with multiple cores, it would still work, we'd just have to swap at the right time in the signal handler to publish the new value.

Aarg: It saves code complexity, time and does not use that much more space. The restriction is a placement restriction. It can be done via a compiler directive or via a trick, for example:

volatile long timer0_data[2] __attribute__((aligned(64)));

volatile long *timer0_milli = timer0_data[0];

Probably better to leave the code alone, rather than teach thousands of people to use dirty reads. Or would you put that in millis()? By the time you read a long twice, then compare one to the other, and make a decision, I bet it would have been quicker to turn interrupts off and on again. And what if the "dirty read" compares false? You have to do it again. And maybe again. Sounds like a lot of work.

Swap is atomic: Only the lower byte of the address changes.

How do you know that? It might be safer to have an array of two items, and a single-byte index. Then just make the index 0 or 1. That would be atomic.

sonyhome:
If you assume the pointer swap is atomic then the solution avoids the program from ever seeing the data while it is being changed.

An assumption I'm not going to make. It's all very well saying only the low byte changes, but how can you ensure that?

Coding Badly: explain to me why you think there's a dirty read involved.

Apparently, I was not clear. A DIRTY READ IS REQUIRED TO DO WHAT YOU ARE TRYING TO DO. There is no way to avoid it. Every path you take will ultimately lead to a dirty read. Which means you are better served by just doing the simple dirty read.

If you assume...

That right there is the problem. You are making assumptions then failing to confirm they are actually correct.

Here's the next one...

Swap is atomic: Only the lower byte of the address changes.

Prove it. Prove that in all cases the linker places the data the way you expect.

sonyhome:
The down side of this implementation is it uses 1 pointer and two longs instead of two longs and a char.

volatile long timer0_data[2];

volatile long *timer0_milli=timer0_data[0];

SIGNAL(TIMER0_OVF_vect)
{
 long old_time = *timer0_milli;

// Swap pointer
 timer0_milli = (timer0_milli == &timer0_data[0])? &timer0_data[1] : &timer0_data[0];
 *timer0_milli = old_time+1;
}

unsigned long millis() {
 return *timer0_millis;
}

Apart from the fact that your code doesn't compile because of typos (timer0_milli vs timer0_millis, amongst others), I see the flaw now in my response. Ah yes, it is all becoming clear.


sonyhome:
If both bullets are implemented, the timer interrupt and milli() should be simpler and faster it seems.

I didn't actually challenge the "faster" part.

Disassembling, the current implementation of millis() does this:

00000166 <millis>:

unsigned long millis()
{
        unsigned long m;
        uint8_t oldSREG = SREG;
 166:   8f b7           in      r24, 0x3f       ; 63   (1)

        // disable interrupts while we read timer0_millis or we might get an
        // inconsistent value (e.g. in the middle of a write to timer0_millis)
        cli();
 168:   f8 94           cli   (1)
        m = timer0_millis;
 16a:   20 91 14 01     lds     r18, 0x0114   (2)
 16e:   30 91 15 01     lds     r19, 0x0115   (2)
 172:   40 91 16 01     lds     r20, 0x0116   (2)
 176:   50 91 17 01     lds     r21, 0x0117   (2)
        SREG = oldSREG;
 17a:   8f bf           out     0x3f, r24       ; 63   (1)

        return m;
}
 17c:   b9 01           movw    r22, r18   (1)
 17e:   ca 01           movw    r24, r20   (1)
 180:   08 95           ret   (4)

Adding up the cycles (in brackets) and you find the function takes 17 cycles.

Your proposed change (renamed to avoid linker clashes):

unsigned long millis_test() {
  return *timer0_millis_test;
  be:   e0 91 00 01     lds     r30, 0x0100   (2)
  c2:   f0 91 01 01     lds     r31, 0x0101   (2)
  c6:   20 81           ld      r18, Z   (2)
  c8:   31 81           ldd     r19, Z+1        ; 0x01   (2)
  ca:   42 81           ldd     r20, Z+2        ; 0x02   (2)
  cc:   53 81           ldd     r21, Z+3        ; 0x03   (2)
}
  ce:   b9 01           movw    r22, r18   (1)
  d0:   ca 01           movw    r24, r20   (1)
  d2:   08 95           ret   (4)

Your code takes 18 cycles. So it is actually one cycle slower!

Nick,

BTW, sorry I did not actually compile that. It was just pseudo-code to explain the concept.

From your ASM I realize that 32bit words take 8 cycles to load on AVR and are not streamlined D:
FYI: your own analysis points out that millis() is 31 cycles: Gammon Forum : Electronics : Microprocessors : Interrupts
so I was pretty sure this would beat that. :slight_smile:

I still think the signal handler should not do the tick adjustment as it is a critical section. I was thinking
the time saved (well it's not) could be used to move the math into milli(), if the timer can't be changed
to trigger at 250 cycles.

I'm still not sure why it can't...

sonyhome:
I still think the signal handler should not do the tick adjustment as it is a critical section. I was thinking
the time saved (well it's not) could be used to move the math into milli(), if the timer can't be changed
to trigger at 250 cycles.

I'm still not sure why it can't...

OK. Timer 0 is set up for fast PWM in order to support analogWrite, because analogWrite needs to be able to have a duty cycle of 0 to 255.

Now the only PWM modes which support a counter of other than 255 (0xFF) are 5 and 7 from the above chart. However if you use one of those modes it counts up with the "A" side (eg. to 250) and does the PWM on the "B" side (the duty cycle). See Gammon Forum : Electronics : Microprocessors : Timers and counters for screenshots.

So if they implemented your suggestion you would not be able to use the Timer 0 "A" side - in other words you would lose one of the analogWrite pins. And even if that wasn't a problem, the analogWrite duty cycle would now be 0 to 249, not 0 to 255.

The design of the software is a compromise that maximizes the use of the available hardware.

sonyhome:
I still think the signal handler should not do the tick adjustment as it is a critical section. I was thinking
the time saved (well it's not) could be used to move the math into milli(), if the timer can't be changed
to trigger at 250 cycles.

I'm still not sure why it can't...

Maybe it can. The Arduino software is not perfect.

But you should make your case more demonstrably. Don't expect us to write the "better" version and count or measure those precious cycles that you hope to save. You need to convince us. Let's see it. Don't forget to consider the potentially earth-shaking repercussions of adding a 32-bit divide to millis().

By the way, where is that photo taken? I can't quite place it.