oqibidipo:
The shift is by a constant amount so it can be calculated at compile time.
Sorry, I get what you are saying now.
PORTG |= (1 << PG5)
The "(1 << PG5)" part can be calculated at compile time, so only the "PORTG |= " is executed at run time, which would only take a couple of cycles.
Checking the AVR's shift and rotate instructions, as far as I can see they can only shift by one bit position per instruction, so for the operation "1 << N" where N is not a constant, the compiler has to execute N single-bit shifts (typically as a loop), taking at least N cycles to execute.
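As a rough illustration (my own sketch, not actual compiler output): a constant shift like (1 << PG5) simply becomes a literal bit mask at compile time, while a variable shift boils down to a loop of single-bit shifts, which is where the N-dependent cost comes from.

#include <stdint.h>

// Sketch only: roughly what "1 << n" costs when the hardware can shift
// just one bit per instruction (LSL on AVR). Each loop pass is one
// single-bit shift plus loop overhead, so the total time grows with n.
static uint8_t shift_one_bit_at_a_time(uint8_t value, uint8_t n) {
    while (n--) {
        value <<= 1;
    }
    return value;
}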
EDIT: I should have read to the end of the thread before responding. Just saw post #15. Thanks, westfw.
Interesting. Always intrigued by timing issues, I decided to run a test. The code snippet is simply a case in a command parser used to enter the value of n, so the compiler may (or may not) muck with it.
case 48 ... 57:
    n = _ch - 48;
    data = 0;
    _delay_ms(100); // settling
    start_time = nanos();
    data |= 1 << n;
    end_time = nanos();
    duration = calc_nanos(start_time, end_time);
    sprint("** Command test byte data = 0; data |= 1 << ");
    prti(n);
    sprint("; data = ");
    prti(data);
    sprint(" ");
    tab();
    prtd(duration);
    sprint("ns **");
    break;
And the output is
_nanos:328P:1
** Command test sbi 125.00ns **
** Command test cbi 125.00ns **
** Command test nop() 62.50ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 250.00ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 250.00ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 250.00ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 250.00ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 250.00ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 250.00ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 250.00ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 250.00ns **
The first three output lines are simply reference checks.
Now the interesting bit. Declaring n and data as volatile results in (drum roll please),
_nanos:328P:1
** Command test sbi 125.00ns **
** Command test cbi 125.00ns **
** Command test nop() 62.50ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 812.50ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 1062.50ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 1312.50ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 1562.50ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 1812.50ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 2062.50ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 2312.50ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 2562.50ns **
So, even with a variable shift the compiler is still capable of working some magic.
You're supposed to notice the first case. In the second case, with both variables volatile, the compiler is essentially being told 'hands off'. In the first case, with the variables simply declared as non-volatile locals, whatever the optimiser does results in each operation taking only 4 clocks, regardless of how many positions need shifting. Sometimes it's useful (or mandatory) to declare variables as volatile, other times not; it's helpful to know when.
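To make the 'hands off' point concrete, here is a minimal sketch of my own (assuming avr-gcc with optimisation enabled; this is not the test code above): with plain locals the optimiser may keep everything in registers and fold or move the shift, while volatile forces each access to happen exactly as written.

#include <stdint.h>

// Plain locals: the optimiser is free to keep n and data in registers and
// to fold, hoist or reorder the shift, since nothing outside the function
// can observe them.
uint8_t shift_plain(uint8_t n) {
    uint8_t data = 0;
    data |= 1 << n;
    return data;
}

// Volatile: every read of n and every write to data must really happen, in
// order, so the run-time single-bit shift sits right where it is written
// and the cost grows with n.
uint8_t shift_volatile(uint8_t n_in) {
    volatile uint8_t n = n_in;
    volatile uint8_t data = 0;
    data |= 1 << n;
    return data;
}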
Oh, wait. Did you mean that the magic is that the time is constant when n is not declared volatile? Yes, I see. How do you think it could be doing that? Somehow it is working out that n is incrementing each time, and so it can calculate "1 << n" by performing one more shift on the previous value of "1 << n"? That is clever!
Just to round out the results, declaring data and n as global (not volatile),
_nanos:328P:1
** Command test sbi 125.00ns **
** Command test cbi 125.00ns **
** Command test nop() 62.50ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 812.50ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 1125.00ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 1437.50ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 1750.00ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 2062.50ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 2375.00ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 2687.50ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 3000.00ns **
So the compiler cannot do anything there. Further, declaring them global and volatile gives the same results as local and volatile, which is quicker than global and non-volatile (by one clock per shift position).
Yes, that's strange. The global variables are stored in static RAM rather than on the stack, but that should not make any difference to the timing, unless the ATmega had a small, fast cache memory that could save a cycle compared to going to the main 2K of SRAM.
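For reference, a sketch of my own (not from the thread) showing where each flavour of variable typically ends up on an ATmega; there is no data cache on these parts, but the access instructions differ.

#include <stdint.h>

volatile uint8_t g_data;             // global: static SRAM, accessed with LDS/STS (2 cycles each)

void storage_demo(void) {
    volatile uint8_t l_data = 0;     // volatile local: stack frame, accessed with LDD/STD (2 cycles each)
    uint8_t r_data = 1;              // plain local: normally lives in a register (1-cycle operations)
    g_data = l_data + r_data;        // keep everything referenced
}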
What if, with n and data as non-volatile and local, instead of incrementing n from 0 to 7, you go from 7 down to 0, or choose its value randomly each time? Is the compiler able to do any magic then?
PaulRB:
What if, with n and data as non-volatile and local, instead of incrementing n from 0 to 7, you go from 7 down to 0, or choose its value randomly each time? Is the compiler able to do any magic then?
The value of n does not come from a loop; it's keyboard entry, so no joy there.
_nanos:328P:1
** Command test sbi 125.00ns **
** Command test cbi 125.00ns **
** Command test nop() 62.50ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 250.00ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 250.00ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 250.00ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 250.00ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 250.00ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 250.00ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 250.00ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 250.00ns **
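If I had to guess at the mechanism (my speculation, not something established in this thread): because n and data are ordinary locals whose addresses never escape, the compiler is allowed to schedule the n-dependent shift outside the two nanos() calls, leaving only a fixed handful of instructions inside the measured window, something along these lines (reusing the names from the snippet above):

// Hypothetical rearrangement the optimiser is permitted to make when n and
// data are not volatile (a sketch, not actual compiler output):
uint8_t mask = 1 << n;           // n-dependent work done before timing starts
start_time = nanos();
data = 0;
data |= mask;                    // only fixed-cost register work inside the window
end_time = nanos();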