ATmega32xx microcontroller I/O (GPIO) bit-banging up to 8 MHz

Hi all, after many rounds of code optimization for Pin_xx (usually D13) on Arduino UNO and Leonardo boards with 8-bit ATmega328P/32U4 microcontrollers (MCUs), I came up with something interesting.
So far we have all considered empty endless while(1){...} and for(;;){...} loops to be the fastest C/C++ structures, giving the minimal possible delay when bit-banging the I/O ports. Unfortunately they consume more than 2T cycles, and a structure like this:

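void loop()
{
  noInterrupts();         // keep interrupts from stretching the pulses
  for(;;){
    PORTB = B00100000;    // bit 5 (Uno pin 13) high
    PORTB = B11011110;    // bit 5 low
  }                       // jumping back to the top costs extra cycles
}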

will give us a signal on Port B5 (Arduino Uno pin 13) with a frequency below 4 MHz and a distinctly longer level after the 2nd PORTB command.
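The 4 MHz figure can be checked with a little cycle arithmetic. The host-side sketch below is only a model, not measured data: it assumes a 16 MHz clock, that the compiler keeps both constants in registers so each PORTB write becomes a single 1-cycle OUT, and that the loop closes with a 2-cycle RJMP. The names loopCycles, loopFrequencyHz and lastLevelNs are made up for this illustration.

```cpp
#include <cstdint>

// Cycle costs assumed from the AVR instruction set: OUT = 1 cycle, RJMP = 2 cycles.
constexpr std::uint32_t kOutCycles  = 1;
constexpr std::uint32_t kRjmpCycles = 2;

// Cycles for one pass of a loop holding `writes` port writes plus the jump back.
constexpr std::uint32_t loopCycles(std::uint32_t writes) {
  return writes * kOutCycles + kRjmpCycles;
}

// Output frequency in Hz when one pass produces exactly one high/low period.
constexpr std::uint32_t loopFrequencyHz(std::uint32_t clockHz, std::uint32_t writes) {
  return clockHz / loopCycles(writes);
}

// Time in ns the pin holds the level set by the last write of the pass:
// the cycle of that write itself plus the cycles spent in the jump.
constexpr double lastLevelNs(std::uint32_t clockHz) {
  return (kOutCycles + kRjmpCycles) * 1e9 / clockHz;
}
```

Under these assumptions loopFrequencyHz(16000000, 2) is 4000000, so 4 MHz is the ceiling for the two-write loop; anything the compiler adds inside the loop pushes the real figure below that. lastLevelNs(16000000) gives 187.5 ns for the level after the 2nd write, against 62.5 ns for the level after the 1st.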

By brute force, we can write a sequence of tens or hundreds of repetitions that simply assign the register values, like this:

void loop()
{
  noInterrupts();
  for(;;){
    PORTB = B00100000;
    PORTB = B11011110;
    PORTB = B00100000;
    PORTB = B11011110;
    PORTB = B00100000;
    PORTB = B11011110;
    PORTB = B00100000;
    ...
  }
}

The unrolled loop renders a signal pattern with a glitch after the last line, but it reaches the theoretical limit of the MCU (almost what assembler code would achieve) and produces a bit-banged signal of almost 1/2 F(clock), or ~8 MHz in the Arduino Uno/Leonardo case.
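Typing hundreds of identical lines by hand is error-prone, so the repetition can be left to the preprocessor. The following is only a sketch: the REPEAT_* and TOGGLE_PAIR macros and toggleBurst64() are hypothetical names, not part of the Arduino core, and the port values are the ones from the listing written in hex (0x20 = B00100000, 0xDE = B11011110). The stand-in PORTB variable exists only so the file also compiles off-target.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>             // real PORTB register on the ATmega
#else
static volatile uint8_t PORTB;  // host stand-in so the sketch compiles off-target
#endif

// Hypothetical helper macros: paste a statement 2, 8 or 64 times at compile time.
#define REPEAT_2(stmt)  stmt stmt
#define REPEAT_8(stmt)  REPEAT_2(stmt) REPEAT_2(stmt) REPEAT_2(stmt) REPEAT_2(stmt)
#define REPEAT_64(stmt) REPEAT_8(stmt) REPEAT_8(stmt) REPEAT_8(stmt) REPEAT_8(stmt) \
                        REPEAT_8(stmt) REPEAT_8(stmt) REPEAT_8(stmt) REPEAT_8(stmt)

// One high/low pair: bit 5 high, then bit 5 low.
#define TOGGLE_PAIR  PORTB = 0x20; PORTB = 0xDE;

// Emits 64 back-to-back pairs (128 port writes) with no loop in between.
static inline void toggleBurst64() {
  REPEAT_64(TOGGLE_PAIR)
}
```

On the board this would be called as for(;;){ toggleBurst64(); } after noInterrupts(). The glitch at the loop jump remains, and if the compiler decides not to inline the function, the call and return add further cycles to it.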
Check the attached images.

The question is: can we optimize the loop code to make the transition between its end and its start faster?

You posted only code fragments, so the images are meaningless.

The best you can do is two clock cycles: the time required for an IJMP or RJMP instruction to take you from the end of the code back to the beginning.
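To put those two cycles in perspective, here is a small host-side model of the unrolled loop. It assumes each PORTB write is a single 1-cycle OUT, that the listing ends on a write that drives the pin low, and that the jump back costs 2 cycles; unrolledWaveform and averageFrequencyHz are names invented for this sketch.

```cpp
#include <string>

// Per-cycle pin level for one pass of the unrolled loop: 'H' = high, 'L' = low.
// Each high/low pair of writes takes 2 cycles; the jump back then holds the
// last level (assumed low) for 2 more cycles.
std::string unrolledWaveform(int pairs) {
  std::string wave;
  for (int i = 0; i < pairs; ++i) {
    wave += "HL";  // one OUT drives the pin high, the next drives it low
  }
  wave += "LL";    // 2-cycle jump: pin stays low, which is the visible glitch
  return wave;
}

// Average output frequency in Hz: `pairs` periods every (2 * pairs + 2) cycles.
double averageFrequencyHz(double clockHz, int pairs) {
  return clockHz * pairs / (2.0 * pairs + 2.0);
}
```

With 3 pairs the model yields "HLHLHLLL": inside the burst the period is 2 cycles (8 MHz at a 16 MHz clock), and once per pass the low phase is 3 cycles long instead of 1. The average only approaches 8 MHz as the number of pairs grows, for example about 7.92 MHz with 100 pairs; the jump itself cannot be made shorter.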

Lots of discussion here: Maximum pin toggle speed - Frequently-Asked Questions - Arduino Forum
(but essentially both the "while (1)" and "for (;;)" loops are as fast as they can be.)