Input-Output at 8 MHz

Hi friends. This forum has helped me a lot in my search for faster I/O with Arduino. Now I will share something I found that I have not seen posted before. Although the maximum speed at which a bit can be toggled is supposed to be 2.67 MHz, with this simple routine I could reach 8 MHz:
For example, to toggle bit A2 while A5 (and A1) stay high:

cli();                 // disable interrupts so they cannot disturb the timing
int i=0;
for(i = 0; i < 4; i++){
  PORTC = 0x26;        // A2 high (A5 and A1 stay high)
  PORTC = 0x22;        // A2 low
}

I did not test whether other ports can be used in this way. If you raise the number of loop iterations above 4, the routine gets slow (~2.66 MHz). So if you need more than 4 iterations you have to repeat that portion of code. Nesting one loop inside another does not work either.
It worked very nicely for clocking a controller.

I tested it with a Chinese Arduino Uno.

Any comments?

On an ATMega Arduino try this, which also toggles bit 2 of PORTC:

cli();
byte i=0;
for(i = 0; i < 4; i++) PINC = 4;

Also try eliminating the loop, and inlining as many "PINC = 4;" statements as you want. Check the ATMega data sheet to see why this works.
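
For reference, the I/O port section of the ATmega328P data sheet (the chip on the Uno) states that writing a logic one to a bit of PINx toggles the corresponding bit of PORTx. Below is a minimal host-side model of that behavior in plain C. It is not AVR code, and `model_pin_write` is a name made up for this illustration.

```c
#include <stdint.h>

/* Host-side model only (hypothetical helper, not an AVR API):
   returns the new PORT value after `written` is written to the
   matching PIN register. Every 1 bit in `written` flips the
   corresponding PORT bit; 0 bits leave their PORT bit unchanged. */
uint8_t model_pin_write(uint8_t port, uint8_t written)
{
    return (uint8_t)(port ^ written);
}
```

Starting from PORTC = 0x22, one modelled write of 4 gives 0x26 and a second gives 0x22 again, so each "PINC = 4;" flips bit 2 with a single register write.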

Hi Jremington. I see. Are you saying that for small loops the compiler uses these opcodes, and that is why my code runs at 8 MHz?

A statement like:
      PORTC = 0x22;
is translated by the compiler directly into one or two machine instructions and is much, much faster than using digitalWrite().

The technique is called "direct port access" in the Arduino world.

Oh, yes! I already knew that, and that's why I did not use digitalWrite.
But I have read in at least 4 different places (including many posts in a thread on this forum) that the maximum speed with direct port access is 2.67 MHz, given the number of clock cycles needed for each I/O instruction.

Now I see that I am getting 8 MHz, so each toggle of the bit takes just one clock cycle. I will test your code to see how fast it is.
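
As a rough check on those two figures, here is a small host-side sketch in plain C (it is not code for the Arduino itself). It assumes the standard Uno clock of 16 MHz; `F_CPU_HZ` and `toggle_frequency_hz` are names made up for this illustration. One full period of the output needs one write that sets the bit and one that clears it, and the AVR `out` instruction takes one clock cycle, so two back-to-back `out` instructions give a two-cycle period. The 2.67 MHz figure corresponds to six cycles per period, which is what you get once loop overhead or slower instructions are added.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a standard Uno running from a 16 MHz clock. */
#define F_CPU_HZ 16000000UL

/* Frequency of a square wave whose full period (one write high,
   one write low, plus any loop overhead) takes the given number
   of CPU clock cycles. Integer division, so the result is
   truncated to whole hertz. */
uint32_t toggle_frequency_hz(uint32_t cycles_per_period)
{
    assert(cycles_per_period > 0);
    return (uint32_t)(F_CPU_HZ / cycles_per_period);
}
```

Under these assumptions toggle_frequency_hz(2) gives 8000000 and toggle_frequency_hz(6) gives 2666666, matching the 8 MHz and roughly 2.67 MHz mentioned in this thread.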

You can probably make it faster by changing your for loop to

for(byte i = 4; i;--i) {}

That way, the for condition becomes a check for zero - the compiler does not have to subtract 4 from the number to do the check. Note also that I use a predecrement operator. It probably doesn't matter here, but a postdecrement yields the value the variable had before the change, whereas with a predecrement the chip can simply fetch, subtract, store, and jump if not zero.

For small loops, the compiler is "unrolling" your loop, producing:

 96a:   f8 94           cli
 96c:   96 e2           ldi     r25, 0x26       ; 38
 96e:   98 b9           out     0x08, r25       ; 8
 970:   82 e2           ldi     r24, 0x22       ; 34
 972:   88 b9           out     0x08, r24       ; 8
 974:   98 b9           out     0x08, r25       ; 8
 976:   88 b9           out     0x08, r24       ; 8
 978:   98 b9           out     0x08, r25       ; 8
 97a:   88 b9           out     0x08, r24       ; 8
 97c:   98 b9           out     0x08, r25       ; 8
 97e:   88 b9           out     0x08, r24       ; 8

You could produce that as much as you want, explicitly. Probably.

      PORTC = 0x26;
      PORTC = 0x22;
      PORTC = 0x26;
      PORTC = 0x22;
      PORTC = 0x26;
      PORTC = 0x22;
      PORTC = 0x26;
      PORTC = 0x22;
      PORTC = 0x26;
      PORTC = 0x22;

You should also be able to get 8MHz with:

   PINC = 4;
   PINC = 4;
   PINC = 4;
   PINC = 4;
   PINC = 4;

It's a little too hard to predict exactly what the compiler will do; perhaps (since it's invoked with -Os) it will decide to move repeated occurrences INTO a loop. If you're going to count on the behavior, you need to check what the compiler is actually doing by looking at the object code. (And do so again every time something changes. Sigh.)
Or you can use inline assembler. Or real assembly language code in a separate .S file.
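
If writing the repeated statements out by hand gets tedious, the copies can also be generated by the preprocessor. This is only a sketch: `REPEAT_2`, `REPEAT_4` and `REPEAT_8` are hypothetical helper names, not part of the Arduino core, and the compiler is still free to rearrange the result, so the object code needs checking just the same.

```c
/* Hypothetical helper macros, not part of the Arduino core.
   Each one expands to straight-line copies of `stmt`; the macros
   themselves introduce no loop counter. The do { } while (0)
   wrapper only makes each macro behave like a single statement. */
#define REPEAT_2(stmt) do { stmt; stmt; } while (0)
#define REPEAT_4(stmt) do { REPEAT_2(stmt); REPEAT_2(stmt); } while (0)
#define REPEAT_8(stmt) do { REPEAT_4(stmt); REPEAT_4(stmt); } while (0)
```

In a sketch this would be used as "REPEAT_8(PINC = 4);", which expands to eight consecutive "PINC = 4;" statements.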