SPI maximum speed and optimization

The 1.6.x core added a nop into the SPI transfer method with a comment stating that it improves performance at "maximum speed". What is maximum speed?

  inline static uint8_t transfer(uint8_t data) {
    SPDR = data;
    /*
     * The following NOP introduces a small delay that can prevent the wait
     * loop from iterating when running at the maximum speed. This gives
     * about 10% more speed, even if it seems counter-intuitive. At lower
     * speeds it is unnoticed.
     */
    asm volatile("nop");
    while (!(SPSR & _BV(SPIF))) ; // wait
    return SPDR;
  }

Maximum speed would be SPI_CLOCK_DIV2, which is half the processor speed.
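
To put a number on that, here is a minimal sketch of the arithmetic. It assumes the divider simply scales the CPU clock and that a byte takes 8 SPI clocks to shift out. `spiClockHz` and `cpuCyclesPerByte` are hypothetical helpers, not part of the SPI library, and 16 MHz is just the usual Uno clock.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Arduino SPI library):
// SPI clock frequency for a given CPU clock and SPI clock divider.
constexpr uint32_t spiClockHz(uint32_t cpuHz, uint32_t divider) {
  return cpuHz / divider;
}

// Hypothetical helper: CPU cycles spent shifting out one 8-bit byte,
// assuming 8 SPI clocks per byte and `divider` CPU cycles per SPI clock.
constexpr uint32_t cpuCyclesPerByte(uint32_t divider) {
  return 8u * divider;
}

// Assuming a 16 MHz board: SPI_CLOCK_DIV2 gives an 8 MHz SPI clock,
// and one byte occupies roughly 16 CPU cycles.
static_assert(spiClockHz(16000000UL, 2) == 8000000UL, "DIV2 at 16 MHz");
static_assert(cpuCyclesPerByte(2) == 16, "8 bits at 2 CPU cycles each");
```

So at the maximum speed the whole transfer is over in about 16 CPU cycles, which is why a single cycle in the wait loop can matter at all.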

I presume the NOP shifts the timing of the wait loop slightly, so that the SPIF check lands just as the transfer finishes and the loop does not have to branch back one more time, possibly saving a couple of clock cycles.
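
A toy model of that guess, not a measurement: suppose the loop samples the flag once every `pollPeriod` cycles, and the flag goes up at cycle `flagSetCycle` after the write to SPDR. The function name and the numbers used below (a 4-cycle loop, flag set at cycle 17) are made up for illustration and are not taken from the AVR datasheet or from the compiled code.

```cpp
#include <cstdint>

// Toy model (assumed, not measured): the wait loop samples the flag at
// cycles firstPollCycle, firstPollCycle + pollPeriod, ... and exits on the
// first sample taken at or after flagSetCycle. Returns that sample's cycle.
// pollPeriod must be greater than zero.
uint32_t firstPollSeeingFlag(uint32_t flagSetCycle,
                             uint32_t firstPollCycle,
                             uint32_t pollPeriod) {
  uint32_t poll = firstPollCycle;
  while (poll < flagSetCycle) {
    poll += pollPeriod;  // flag not set yet: one more "branch back"
  }
  return poll;
}
```

With these made-up numbers, `firstPollSeeingFlag(17, 0, 4)` returns 20 (no NOP, samples at 0, 4, 8, 12, 16, 20), while `firstPollSeeingFlag(17, 1, 4)` returns 17 (one cycle of NOP, samples at 1, 5, 9, 13, 17). The one-cycle delay costs nothing overall and the loop exits three cycles sooner, which is the kind of effect the comment in the core seems to describe.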

Thanks, that really should have been obvious to me. For some reason I was thinking that it depended on processor speed but obviously it doesn't.