Have I made this hardware SPI transfer as fast as possible?

It would be at least 16 clock cycles per byte, because SPI is running at half the clock speed and you have to clock out 8 bits. Someone above suggested a 17th cycle to handle the final clock transition before the next transfer starts.
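To put numbers on that, here is a rough sketch of the arithmetic in C. The function names and the 16 MHz clock in the note below are my own assumptions for illustration, not something from your code; the only inputs taken from this thread are the divide-by-2 SPI clock, the 8 bits per byte, and the optional 17th cycle.

```c
#include <stdint.h>

/* Hypothetical helper: CPU cycles needed to shift one byte out over SPI.
 * divider: CPU clocks per SPI clock (2 when SPI runs at half the clock speed)
 * extra:   additional cycles between bytes (0, or 1 for the suggested 17th) */
static uint32_t spi_cycles_per_byte(uint32_t divider, uint32_t extra)
{
    return divider * 8u + extra;   /* 8 bits, each taking `divider` CPU clocks */
}

/* Hypothetical helper: upper bound on throughput, rounded down to whole bytes. */
static uint32_t spi_max_bytes_per_sec(uint32_t cpu_hz, uint32_t divider, uint32_t extra)
{
    return cpu_hz / spi_cycles_per_byte(divider, extra);
}
```

With an assumed 16 MHz clock, `spi_max_bytes_per_sec(16000000, 2, 0)` gives 1000000 bytes per second at 16 cycles per byte, and `spi_max_bytes_per_sec(16000000, 2, 1)` gives 941176 if the 17th cycle really is needed. Any loop overhead between bytes would come on top of that.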