Clock speed logic

I'm trying to switch some outputs at true increments of the clock speed, i.e. every clock cycle (62.5 ns on the Arduino Uno's 16 MHz clock). I'm having some luck with a very basic program, but things slow down when it gets more complex, e.g. when I loop over a larger variable. I'm curious to know why. An example:

int del = 0;

void setup() {
  DDRB=0xFF;  // make every PORTB pin an output
}

void loop() {
  cli();  // disable interrupts

  do {
    PORTB=0b00000000;
    PORTB=0b00000000;
    PORTB=0b00000000; 
    PORTB=0b00000000; 
    PORTB=0b00000000;
    PORTB=0b00000011;  // turn 2 monitored outputs on to show start of loop
    PORTB=0b00000000;
    del = 3;
    do {
      PORTB=0b00000001;  // toggle an output 'del' times
      PORTB=0b00000000;
      del -= 1;
    } while (del > 0);
    PORTB=0b00000011;  // pulse 2 monitored outputs (on, then off) to show end of loop
    PORTB=0b00000000;


  } while (true);  // infinite loop
}

When monitoring the outputs with an o-scope, the output toggles on every clock cycle, i.e. it produces an 8 MHz square wave. However, if I increase the variable "del" to 4, the toggling slows down to 2 MHz. Any ideas why this is and how to avoid it?

I think your best bet would be to look at the assembly the compiler generates and see what it's doing with the loop. Honestly, I'm surprised the frequency of the output ever gets to 8 MHz. You realize each line of C code doesn't necessarily correspond to one clock cycle?

Yeah, I was impressed that it works at all, and now I want to see how far I can push it :). I was inspired by this post: http://forum.arduino.cc/index.php/topic,36729.0.html

Probably, when del = 3 the compiler unrolled the loop, and when del = 4 it ran the loop as written, so every pass has to set the port, decrement, compare, and jump.
To be sure, it's best to check the assembly in the .elf file.

I just checked - it is unrolling the loop. It also unrolls the loop for me when del=4.

Exactly, Magician. The compiler unrolled the loop and did straight bit banging. It put a 3 in one register and a 0 in the zero register and alternated the out instructions. Once I use 4 for del, it put them in a loop with an adiw, a cpi, a cpc, and a brne instruction. Much slower.
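To see why the looped version lands at exactly 2 MHz, here's a rough cycle count. It assumes a 16 MHz ATmega328P and that each loop pass is just the two out instructions plus the adiw/cpi/cpc/brne reported above; the per-instruction cycle counts come from the AVR instruction set manual, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Cycle counts for the classic AVR core (ATmega328P), per the AVR
   instruction set manual. */
enum {
  CYC_OUT = 1,        /* out  PORTB, rN                */
  CYC_ADIW = 2,       /* adiw on a 16-bit register pair */
  CYC_CPI = 1,        /* cpi  rN, k                    */
  CYC_CPC = 1,        /* cpc  rN, rM                   */
  CYC_BRNE_TAKEN = 2  /* brne when the branch is taken  */
};

/* Frequency of a square wave whose full period takes `cycles` CPU clocks. */
uint32_t wave_hz(uint32_t f_cpu, uint32_t cycles) {
  return f_cpu / cycles;
}

/* Unrolled code: one out for high, one out for low, back to back. */
uint32_t unrolled_period_cycles(void) {
  return CYC_OUT + CYC_OUT;
}

/* Looped code (assumed body): the same two outs plus the loop overhead. */
uint32_t looped_period_cycles(void) {
  return CYC_OUT + CYC_OUT + CYC_ADIW + CYC_CPI + CYC_CPC + CYC_BRNE_TAKEN;
}
```

So the unrolled code spends 2 clocks per period (16 MHz / 2 = 8 MHz) while the loop spends 8 (16 MHz / 8 = 2 MHz), which matches the scope reading. The final pass is one clock shorter because brne falls through instead of branching.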

I think there is a way to set up Timer2 in fast PWM mode with a prescaler of 1. Couldn't you get 1 clock high and 1 clock low out of that?
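A minimal sketch of that Timer2 idea, assuming an ATmega328P Uno at 16 MHz. It uses fast PWM mode 7 (TOP = OCR2A) with OC2A set to toggle on every compare match; the ATmega328P datasheet gives the maximum frequency in that configuration as f_clk/2 when OCR2A = 0, i.e. 8 MHz with one clock high and one clock low. The register and bit names are from avr-libc's <avr/io.h>; the function names are hypothetical. The register setup only compiles with avr-gcc (guarded by __AVR__), while the frequency helper compiles anywhere.

```c
#include <stdint.h>
#ifdef __AVR__
#include <avr/io.h>
#endif

/* OC2A output frequency when Timer2 toggles the pin on each compare match:
   f = f_clk / (2 * N * (1 + OCR2A)), with N the prescaler. */
uint32_t timer2_toggle_hz(uint32_t f_clk, uint16_t prescaler, uint8_t ocr2a) {
  return (uint32_t)(f_clk / (2UL * prescaler * (1UL + ocr2a)));
}

#ifdef __AVR__
/* Start a 50% duty square wave at f_clk/2 on OC2A (PB3, Uno pin 11).
   Note: this takes Timer2 away from tone() and analogWrite() on pins 3/11. */
void timer2_start_fclk_div2(void) {
  DDRB |= _BV(DDB3);                               /* OC2A pin as output        */
  OCR2A = 0;                                       /* TOP = 0: match every tick */
  TCCR2A = _BV(COM2A0) | _BV(WGM21) | _BV(WGM20);  /* toggle OC2A; WGM21:0 = 11 */
  TCCR2B = _BV(WGM22) | _BV(CS20);                 /* mode 7, no prescaling     */
}
#endif
```

Once the timer is running, the pin toggles in hardware, so the output no longer depends on how the compiler arranges the loop, and the CPU is free to do other work.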