Assembly too slow

I'm trying to write cycle-accurate code in assembly, but for some reason every operation takes twice as long as it should. I'm working on an Arduino Leonardo, and this is my sketch:

int main()
{
  // set pin 13 as output
  asm("SBI    0x07, 7   \n\t");
  // toggle pin 13
  asm(
    "_loop:             \n\t" // cycles
    "SBI    0x06, 7     \n\t" // 2
    "NOP                \n\t" // 1
    "NOP                \n\t" // 1
    "CBI    0x06, 7     \n\t" // 2
    "RJMP   _loop       \n\t" // 2
  );
}

The Arduino Leonardo runs at 16 MHz and my loop is 8 cycles long, so I expected a period of 0.5 us. On the oscilloscope, however, the signal period is 1 us, i.e. 8 cycles per 1 us = 8 MHz. When I remove one of the NOPs, the period decreases by 125 ns, which is one cycle at 8 MHz, not at 16 MHz. I don't know what to make of it: I have a 16 MHz oscillator at 5 V, but the chip seems to run at 8 MHz. Does anybody have an explanation?

Did you re-flash the fuses? Maybe you're running on the internal 8 MHz oscillator.

You might be writing to the PINx register rather than the PORTx register. Writing a 1 to a PINx bit toggles the output, so the pin would change state only once per loop, which would explain why the frequency is exactly half of what you expect.

Yes, address 0x06 is PINC - try using PORTC - address 0x08.
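
To see why the wrong address halves the frequency, here is a rough host-side model in plain C (it runs on a PC, not on the AVR). It is only a sketch: the `Mcu` struct and the helpers `sbi`, `cbi`, `nop`, `rjmp`, `period_cycles` and `period_ns` are made up for illustration, the cycle counts are the ones from the comments in the original sketch, and CBI on PINC is assumed to have no effect on the output, which matches the measurement above.

```c
#include <assert.h>
#include <stdint.h>

/* I/O addresses of port C on the Leonardo's ATmega32U4 (pin 13 is PC7). */
enum { PINC_ADDR = 0x06, DDRC_ADDR = 0x07, PORTC_ADDR = 0x08 };

/* Hypothetical, minimal model: only the PORTC latch and a cycle counter. */
typedef struct {
    uint8_t port;          /* modeled PORTC output latch */
    unsigned long cycles;  /* elapsed CPU cycles */
} Mcu;

/* SBI (2 cycles): on PINx a written 1 toggles the PORTx bit, on PORTx it sets it. */
static void sbi(Mcu *m, uint8_t addr, uint8_t bit) {
    if (addr == PINC_ADDR)       m->port ^= (uint8_t)(1u << bit);
    else if (addr == PORTC_ADDR) m->port |= (uint8_t)(1u << bit);
    m->cycles += 2;
}

/* CBI (2 cycles): clears the bit on PORTx; assumed to do nothing on PINx. */
static void cbi(Mcu *m, uint8_t addr, uint8_t bit) {
    if (addr == PORTC_ADDR) m->port &= (uint8_t)~(1u << bit);
    m->cycles += 2;
}

static void nop(Mcu *m)  { m->cycles += 1; }
static void rjmp(Mcu *m) { m->cycles += 2; }

/* Run the loop from the question and return the number of cycles
   between two consecutive rising edges on PC7. */
unsigned long period_cycles(uint8_t addr) {
    Mcu m = {0, 0};
    unsigned long edge[2] = {0, 0};
    int edges = 0;
    while (edges < 2) {
        uint8_t before = m.port;
        sbi(&m, addr, 7);
        if (!(before & 0x80) && (m.port & 0x80)) {
            edge[edges++] = m.cycles;   /* rising edge seen */
        }
        nop(&m);
        nop(&m);
        cbi(&m, addr, 7);
        rjmp(&m);
    }
    return edge[1] - edge[0];
}

/* Convert a cycle count to nanoseconds for a clock given in MHz. */
unsigned long period_ns(unsigned long cycles, unsigned long f_mhz) {
    assert(f_mhz > 0);
    return cycles * 1000UL / f_mhz;
}
```

In this model `period_cycles(PINC_ADDR)` comes out at 16 cycles, i.e. 1 us at 16 MHz, which is what the oscilloscope showed, while `period_cycles(PORTC_ADDR)` gives the intended 8 cycles (0.5 us). It also accounts for the 125 ns step: with one NOP removed the loop is 7 cycles, the toggling version has a 14-cycle period, and 2 cycles at 62.5 ns each is 125 ns. On the real board the fix is simply to use `SBI 0x08, 7` and `CBI 0x08, 7` in the loop.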

majenko:
Yes, address 0x06 is PINC - try using PORTC - address 0x08.

That was it, thanks.