Try editing SPI.cpp and changing the occurrences of SPI_CSR_DLYBCT(1) to SPI_CSR_DLYBCT(0).
I believe the (1) inserts a delay of 32 clocks between consecutive transfers. I also believe the comment about CS is wrong, since CS is manipulated manually (SPI_CSR_CSAAT), although I'm not sure why that's missing from setClockDivider().
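
To put a rough number on that delay, here is a minimal sketch. It assumes the SAM3X datasheet formula for the SPI_CSR register as I read it (delay between consecutive transfers = 32 * DLYBCT / MCK). The helper names are hypothetical and are not part of SPI.cpp.

```cpp
#include <cstdint>

// Assumption: delay between consecutive transfers = 32 * DLYBCT / MCK,
// per my reading of the SAM3X SPI_CSR description. Not taken from SPI.cpp.
constexpr std::uint32_t dlybctToMckCycles(std::uint8_t dlybct) {
    return 32u * dlybct;  // delay expressed in master-clock (MCK) cycles
}

// Hypothetical helper: the same delay in nanoseconds for a given MCK in Hz.
constexpr double dlybctToNanoseconds(std::uint8_t dlybct, double mckHz) {
    return dlybctToMckCycles(dlybct) * 1e9 / mckHz;
}
```

If MCK is 84 MHz (the Due's nominal clock, assumed here), dlybctToNanoseconds(1, 84e6) comes out to roughly 381 ns of idle time per transfer, while a DLYBCT of 0 gives no added delay under this formula.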