I ran some simple 1000-byte SPI performance tests with nothing connected to the bus, on both Arduino and Maple (Maple with and without DMA), and then measured block read/write performance with the Arduino SD/RTC shield. The SD tests use readBlock()/writeBlock() with 512-byte blocks. Summary below; throughput is in megabits per second (Mbit/s). Your mileage may vary.
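The timing harness itself isn't shown, so here is a minimal sketch of how a throughput figure relates to elapsed time. It assumes the reported numbers are megabits per second, computed as bits moved divided by elapsed microseconds, and that SPI shifts one bit per clock, so the SPI clock rate is the ceiling. The helper names are hypothetical.

```python
def throughput_mbit_s(num_bytes: int, elapsed_us: float) -> float:
    """Throughput in Mbit/s: bits moved divided by elapsed microseconds (bits/us == Mbit/s)."""
    return num_bytes * 8 / elapsed_us


def wire_time_us(num_bytes: int, spi_clock_mhz: float) -> float:
    """Ideal transfer time: one bit per SPI clock, so bits / MHz gives microseconds."""
    return num_bytes * 8 / spi_clock_mhz


# A 1000-byte transfer at an 18MHz SPI clock cannot finish faster than this.
ideal = wire_time_us(1000, 18.0)
print(f"{ideal:.1f} us")                                 # 444.4 us
print(f"{throughput_mbit_s(1000, ideal):.1f} Mbit/s")    # 18.0 Mbit/s
```

Anything slower than the ideal wire time shows up as throughput below the clock rate, which is how the tables can be read.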

Unconnected SPI performance, 1000-byte transfer (Mbit/s)
SPI clock   Maple SPI    Maple SPI/DMA
            read/write   read/write
1.125MHz    0.85         1.125
2.25MHz     1.35         2.25
4.5MHz      1.8          4.5
9MHz        2.2          8.9
18MHz       2.2          17.8

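A rough way to read the Maple SPI table is to compare each result with the SPI clock (the one-bit-per-clock ceiling) and to work out how much time per byte goes beyond the eight clock periods the bits themselves need. This is a sketch on the posted numbers only; the interpretation of the extra time as per-byte CPU overhead in the non-DMA path is an assumption, since the measurements don't isolate it. The helper names are hypothetical.

```python
# (SPI clock MHz, Maple SPI Mbit/s, Maple SPI/DMA Mbit/s), copied from the table above
MAPLE_SPI = [
    (1.125, 0.85, 1.125),
    (2.25, 1.35, 2.25),
    (4.5, 1.8, 4.5),
    (9.0, 2.2, 8.9),
    (18.0, 2.2, 17.8),
]


def efficiency(measured_mbit_s: float, clock_mhz: float) -> float:
    """Fraction of the theoretical maximum (the SPI clock rate) actually achieved."""
    return measured_mbit_s / clock_mhz


def overhead_us_per_byte(measured_mbit_s: float, clock_mhz: float) -> float:
    """Observed time per byte minus the ideal wire time of 8 clock periods."""
    observed = 8 / measured_mbit_s  # us per byte as measured
    ideal = 8 / clock_mhz           # us per byte on the wire
    return observed - ideal


for clock, spi, dma in MAPLE_SPI:
    print(
        f"{clock:>6} MHz  SPI {efficiency(spi, clock):5.0%}  DMA {efficiency(dma, clock):5.0%}  "
        f"SPI extra time {overhead_us_per_byte(spi, clock):.2f} us/byte"
    )
```

At 18MHz, 2.2 Mbit/s works out to about 3.6 us per byte against roughly 0.44 us of wire time, while DMA stays within a couple of percent of the clock at every speed.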
SD block read/write, 512-byte blocks, Arduino SD/RTC shield on Maple (Mbit/s)
            Maple SPI         Maple SPI/DMA
SPI clock   read     write    read     write
1.125MHz    0.75     0.64     1        0.84
2.25MHz     1.1      0.91     1.86     1.36
4.5MHz      1.4      1.2      3.19     2.0
9MHz        1.6      1.25     5.1      2.6
18MHz       1.6      1.3      7.1      3.06

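For SD work it is often easier to think in time per block than in Mbit/s. A minimal sketch, assuming the figures are Mbit/s averaged over 512-byte blocks, that converts the Maple SD table into average microseconds per block and the DMA speedup; the helpers are hypothetical and the numbers are copied from the table.

```python
BLOCK_BYTES = 512

# (SPI clock MHz, SPI read, SPI write, DMA read, DMA write) in Mbit/s, copied from the SD table above
SD_MAPLE = [
    (1.125, 0.75, 0.64, 1.0, 0.84),
    (2.25, 1.1, 0.91, 1.86, 1.36),
    (4.5, 1.4, 1.2, 3.19, 2.0),
    (9.0, 1.6, 1.25, 5.1, 2.6),
    (18.0, 1.6, 1.3, 7.1, 3.06),
]


def us_per_block(mbit_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Average time for one block: bits in the block divided by Mbit/s gives microseconds."""
    return block_bytes * 8 / mbit_s


def kbytes_per_s(mbit_s: float) -> float:
    """Convert Mbit/s to decimal kilobytes per second (1 Mbit/s = 125 KB/s)."""
    return mbit_s * 1000 / 8


for clock, rd, wr, dma_rd, dma_wr in SD_MAPLE:
    print(
        f"{clock:>6} MHz  read {us_per_block(rd):6.0f} -> {us_per_block(dma_rd):4.0f} us/block "
        f"({dma_rd / rd:.1f}x)  write {us_per_block(wr):6.0f} -> {us_per_block(dma_wr):4.0f} us/block "
        f"({dma_wr / wr:.1f}x)"
    )
```

At 18MHz, DMA takes the average read from about 2560 us to about 577 us per block (roughly 4.4x, or about 887 KB/s), while writes only improve by about 2.4x.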
Arduino (ATmega328, 16MHz), SD built with OPTIMIZE_HARDWARE_SPI (Mbit/s)
            SPI          SD (512-byte blocks)
SPI clock   read/write   read     write
2MHz        1.57         1.4      1.2
4MHz        2.6          2.3      1.7
8MHz        3.8          3.4      2.1
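One more way to read the Arduino numbers is to ask how much of the raw SPI rate survives the SD block layer at each clock. This sketch just divides the SD columns by the SPI column from the Arduino table; the helper name is hypothetical, and any explanation of the gap (for example, the card's own write time) is a guess the data can't confirm.

```python
# (SPI clock MHz, raw SPI read/write, SD read, SD write) in Mbit/s, copied from the Arduino table above
ARDUINO = [
    (2.0, 1.57, 1.4, 1.2),
    (4.0, 2.6, 2.3, 1.7),
    (8.0, 3.8, 3.4, 2.1),
]


def sd_vs_raw(sd_mbit_s: float, raw_mbit_s: float) -> float:
    """Fraction of the raw SPI throughput that remains after the SD block protocol."""
    return sd_mbit_s / raw_mbit_s


for clock, raw, rd, wr in ARDUINO:
    print(
        f"{clock:.0f} MHz  SD read {sd_vs_raw(rd, raw):.0%} of raw SPI  "
        f"SD write {sd_vs_raw(wr, raw):.0%} of raw SPI"
    )
```

Reads stay near 89% of the raw SPI rate at all three clocks, while writes fall from about 76% at 2MHz to about 55% at 8MHz, so faster SPI helps reads much more than writes.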