Have I made this hardware SPI transfer as fast as possible?

Apparently I missed Nick's post where he disassembled it and added the cycle count. :confused:

So I did it myself by hand:

    do {       
      SPDR = *--thisLED; WAIT;        
     846:	82 91       	ld	r24, -Z								2
     848:	8e bd       	out	0x2e, r24	; 46							1
     84a:	00 00       	nop										1
     84c:	00 00       	nop										1
     84e:	00 00       	nop										1
     850:	00 00       	nop										1
     852:	00 00       	nop										1
     854:	00 00       	nop										1
     856:	00 00       	nop										1
     858:	00 00       	nop										1
     85a:	00 00       	nop										1
     85c:	00 00       	nop										1
     85e:	00 00       	nop										1
     860:	e4 17       	cp	r30, r20								1
     862:	f5 07       	cpc	r31, r21								1
     864:	81 f7       	brne	.-32     	; 0x846 <_Z10updateLEDsv+0x6c> 		1/2
      SPDR = *--thisLED; WAIT;        
    } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.

    WAIT; // Wait for last byte to finish transfer. 
     866:	00 00       	nop										1
     868:	00 00       	nop										1
     86a:	00 00       	nop										1
     86c:	00 00       	nop										1
     86e:	00 00       	nop										1
     870:	00 00       	nop										1
     872:	00 00       	nop										1
     874:	00 00       	nop										1
     876:	00 00       	nop										1
     878:	00 00       	nop										1
     87a:	00 00       	nop										1
    SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
     87c:	8d b5       	in	r24, 0x2d	; 45							1
     87e:	8f 77       	andi	r24, 0x7F	; 127							1
     880:	8d bd       	out	0x2d, r24	; 45							1

Obviously I can get rid of some of the nops after the loop, but I need to go back over the posts and see what was said to be the theoretical minimum cycle count for an SPI transfer.