Low level Port Manipulation is not working

oqibidipo:
The shift is by a constant amount so it can be calculated at compile time.

Sorry, I get what you are saying now.

PORTG |= (1 << PG5)

The "(1 << PG5)" part can be calculated at compile time, so only the "PORTG |= " is executed at run time, which would only take a couple of cycles.
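
To illustrate the point about the constant mask, here is a minimal sketch that builds off-target. `fake_portg` and the local `PG5` definition are hypothetical stand-ins for what `<avr/io.h>` provides on a real AVR. The `static_assert` only compiles because `(1 << PG5)` is a constant expression, which is what lets the compiler fold it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins so this builds on a PC.
   On a real AVR both names come from <avr/io.h>. */
#define PG5 5
volatile uint8_t fake_portg;    /* stands in for the PORTG register */

/* (1 << PG5) is an integer constant expression, so it can be
   checked (and folded) at compile time. */
static_assert((1 << PG5) == 0x20, "mask is a compile-time constant");

/* Only the read-modify-write of the register is left for run time. */
void set_pg5(void)
{
    fake_portg |= (1 << PG5);
}
```

Calling `set_pg5()` with the stand-in register cleared should leave it holding 0x20.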

Checking the AVR's shift and rotate instructions, as far as I can see they can only shift by one bit position in a single instruction, so for the operation "1 << N" where N is not a constant, the compiler would have to use N shift or rotate instructions, taking N+ cycles to execute.
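
A rough model of that in plain C, assuming the compiler has nothing better than a one-position shift to work with. The function name is made up for illustration, and this is not the code avr-gcc actually emits, just the shape of the work: one loop pass per bit position.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a byte left by n positions, one position per loop pass,
   the way a single-bit shift instruction has to be repeated
   when n is only known at run time. */
uint8_t shift_left_stepwise(uint8_t value, uint8_t n)
{
    assert(n < 8);                      /* illustrative bound for a byte */
    while (n-- > 0) {
        value = (uint8_t)(value << 1);  /* one single-bit shift per pass */
    }
    return value;
}
```

The number of loop passes equals n, so the run time grows with the shift count.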

EDIT: I should have read to the end of the thread before responding. Just saw post #15. Thanks, westfw.

Interesting. Always intrigued by timing issues, I decided to run a test. The code snippet is simply the case in a command parser that reads the value of n from input, so the compiler may (or may not) muck with it.

case 48 ... 57:                 // ASCII '0'..'9' (GCC case-range extension)
    n = _ch - 48;               // convert the digit character to its value
    data = 0;
    _delay_ms(100);             // settling
    start_time = nanos();
    data |= 1 << n;             // operation under test
    end_time   = nanos();
    duration = calc_nanos(start_time, end_time);
    sprint("** Command test byte data = 0; data |= 1 << ");
    prti(n);
    sprint("; data = ");
    prti(data);
    sprint(" ");
    tab();
    prtd(duration);
    sprint("ns **");
    break;

And the output is

_nanos:328P:1

** Command test sbi 125.00ns **
** Command test cbi 125.00ns **
** Command test nop() 62.50ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 250.00ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 250.00ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 250.00ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 250.00ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 250.00ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 250.00ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 250.00ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 250.00ns **

The first three output lines are simply reference checks.
Now the interesting bit. Declaring n and data as volatile results in (drum roll please),

_nanos:328P:1

** Command test sbi 125.00ns **
** Command test cbi 125.00ns **
** Command test nop() 62.50ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 812.50ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 1062.50ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 1312.50ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 1562.50ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 1812.50ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 2062.50ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 2312.50ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 2562.50ns **

So, even with a variable shift the compiler is still capable of working some magic.

Magic? I don't see it. The line of code is taking 812.5ns (13 cycles) plus 250ns (4 cycles) for each additional shift position.
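
That arithmetic can be written down as a small model. It assumes a 16 MHz clock (62.5 ns per cycle), which matches the nop() reference line; the function names are made up for illustration.

```c
#include <assert.h>

/* Assumption: 16 MHz clock, so one cycle is 62.5 ns (matches nop() = 62.50ns). */
#define NS_PER_CYCLE 62.5

/* Convert a measured duration in nanoseconds to clock cycles. */
double ns_to_cycles(double ns)
{
    return ns / NS_PER_CYCLE;
}

/* Linear model read off the volatile measurements:
   13 cycles base cost plus 4 cycles per shift position. */
double volatile_case_ns(unsigned n)
{
    assert(n <= 7);                     /* the test only covers bits 0..7 */
    return (13 + 4 * n) * NS_PER_CYCLE;
}
```

With n = 0 the model gives 812.5 ns and with n = 7 it gives 2562.5 ns, the same as the first and last measured lines.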

You're supposed to notice the first case. In the second case, with both variables volatile, the compiler is essentially being told 'hands off'. In the first case, with the variables simply declared locally, whatever the compiler does results in each operation taking only 4 clock cycles, regardless of how many positions require shifting. Sometimes it's useful/mandatory to declare variables as volatile, other times not. Helpful to know when.

Oh, wait. Did you mean that the magic is that the time is constant when n is not declared volatile? Yes, I see. How do you think it could be doing that? Somehow it is working out that n is incrementing each time, and so it can calculate "1 << n" by performing one more shift on the previous value of "1 << n"? That is clever!

Normally the drum roll comes before the magic, not after it :wink:

Just to round out the results, declaring data and n as global (not volatile),

_nanos:328P:1

** Command test sbi 125.00ns **
** Command test cbi 125.00ns **
** Command test nop() 62.50ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 812.50ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 1125.00ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 1437.50ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 1750.00ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 2062.50ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 2375.00ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 2687.50ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 3000.00ns **

So the compiler cannot do anything here. Further, though, declaring the variables global and volatile gives the same results as local and volatile, which is quicker than global, non-volatile (by one clock cycle per shift position).
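
The one-cycle difference can be pulled straight out of the posted numbers. A minimal sketch, again assuming 62.5 ns per cycle; the array and function names are made up for illustration, and the values are copied from the outputs above.

```c
#include <assert.h>
#include <stddef.h>

/* Measured durations in ns for n = 0..7, copied from the posted output. */
const double global_ns[8] = {
    812.50, 1125.00, 1437.50, 1750.00, 2062.50, 2375.00, 2687.50, 3000.00
};
const double volatile_ns[8] = {
    812.50, 1062.50, 1312.50, 1562.50, 1812.50, 2062.50, 2312.50, 2562.50
};

/* Average extra clock cycles per additional shift position,
   assuming a 16 MHz clock (62.5 ns per cycle). */
double cycles_per_step(const double *ns, size_t count)
{
    assert(count >= 2);                 /* need at least two samples */
    return (ns[count - 1] - ns[0]) / (double)(count - 1) / 62.5;
}
```

`cycles_per_step(global_ns, 8)` works out to 5 cycles per position and `cycles_per_step(volatile_ns, 8)` to 4, with the same 13-cycle base in both cases.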

Yes, that's strange. The global variables are stored in static data memory rather than on the stack, but that should not make any difference to the timing, unless the ATmega has a small, fast cache memory that can save a cycle compared to going to the main 2K RAM.

What if, with n and data as non-volatile and local, instead of incrementing n from 0 to 7, you go from 7 down to 0, or choose its value randomly each time? Is the compiler able to do any magic then?

PaulRB:
What if, with n and data as non-volatile and local, instead of incrementing n from 0 to 7, you go from 7 down to 0, or choose its value randomly each time? Is the compiler able to do any magic then?

The value of n is not set in a loop; it's keyboard entry, so no joy there.

For your files,

_nanos:328P:1

** Command test sbi	125.00ns **
** Command test cbi	125.00ns **
** Command test nop()	62.50ns **
** Command test byte data = 0; data |= 1 << 4; data = 16 	250.00ns **
** Command test byte data = 0; data |= 1 << 7; data = 128 	250.00ns **
** Command test byte data = 0; data |= 1 << 2; data = 4 	250.00ns **
** Command test byte data = 0; data |= 1 << 6; data = 64 	250.00ns **
** Command test byte data = 0; data |= 1 << 1; data = 2 	250.00ns **
** Command test byte data = 0; data |= 1 << 5; data = 32 	250.00ns **
** Command test byte data = 0; data |= 1 << 3; data = 8 	250.00ns **
** Command test byte data = 0; data |= 1 << 0; data = 1 	250.00ns **

Yes, it's very clever. My theory of how it might work is clearly wrong. Intriguing!

Time for a new theory. Be useful to find out what it's doing.