pico,
You need DMA to get the speedup on Due because its SPI controller has no FIFO. The Teensy's SPI FIFO lets it run at almost full speed, 24 MHz, without DMA.
For Due, I wrote a non-DMA SPI driver optimized for SD cards and got about twice the speed of the standard SPI library. The standard SPI library trades speed for features, which is probably a good thing.
I will post this version of SdFat soon. I need to do a lot more stress testing.
I have one puzzle: sometimes a DMA read hangs at 42 MHz. To drive the read, I use one DMA channel to send a stream of 0XFF bytes to the SPI controller. Data is read from the SPI controller using a second DMA channel. I sometimes get a hang when the 0XFF byte is in the same SRAM bank as the receive buffer.
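For anyone not familiar with why a read needs a transmit stream, here is a rough host-side model of the scheme described above. It is only a sketch: MockSpiDevice and spiReadBlock are made-up names for illustration, there are no SAM3X DMAC registers here, and the loop just stands in for the two channels (one re-sending a single fixed 0XFF byte, one storing each received byte at an incrementing address).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for the SD card on the SPI bus.
// SPI is full duplex: every byte the master clocks out shifts one byte in.
struct MockSpiDevice {
    const uint8_t* data;  // bytes the "card" will return
    size_t len;           // number of bytes available
    size_t pos;           // next byte to return
    size_t ffCount;       // number of 0XFF filler bytes received from master

    uint8_t transfer(uint8_t out) {
        if (out == 0xFF) ++ffCount;
        return pos < len ? data[pos++] : 0xFF;  // idle line reads as 0XFF
    }
};

// Model of the two-channel read (assumption: a plain loop replaces DMA).
// "TX channel": source address fixed on one 0XFF byte, never incremented.
// "RX channel": destination address increments through the receive buffer.
size_t spiReadBlock(MockSpiDevice& dev, uint8_t* dst, size_t n) {
    assert(dst != nullptr);
    static const uint8_t ff = 0xFF;  // the single byte the TX side re-sends
    for (size_t i = 0; i < n; ++i) {
        dst[i] = dev.transfer(ff);
    }
    return n;
}
```

Calling spiReadBlock(dev, buf, 512) on this model sends 512 filler bytes and fills buf with whatever the mock card returns. On the real hardware both the fixed 0XFF source byte and the receive buffer live in SRAM, which is where the bank placement mentioned above comes in.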
I need to investigate options. I am using byte transfers for SPI. Other high-speed interfaces like Ethernet and HSMCI can use 32-bit transfers. For receive, I am using a DMA channel that has an eight-entry FIFO. I will try a channel with a 32-entry FIFO.