STM32, Maple and Maple mini port to IDE 1.5.x

Ray, I agree it should be compatible with old code, I think that should be the 1st concern.
If we add a new DMA transfer function with a new name, such as SPI.DMATransfer(), it should not interfere at all with existing code that doesn't use it.

About the callback, I believe it should be easy to implement with the ISR.

I propose something like this:
One TX function: SPI.DMATXTransfer(char *src, uint16_t n_bytes, callback pointer OR optional CS pin to toggle before return); any bytes received during the transfer are dropped.

One RX function: SPI.DMARXTransfer(char *dst, char *src OR a single byte (which is sent over and over, n times), uint16_t n_bytes, callback pointer OR optional CS pin to toggle before return);

Internally, if the user passes a callback pointer, the function sets the ISR to call that callback on interrupt, fires the transfer, returns right away, and notifies the user through the callback. If the user instead passes a CS pin to toggle, it sets the ISR to call a callback defined within the library that toggles the pin for the user. If the CS is passed as a parameter, the function does not return until the transfer is over.

My logic is that if the user does not need to do other work while the DMA is running, a blocking call simplifies the use of the library, especially for new users. We could even remove the CS toggling from here altogether and make the callback pointer optional. If no callback is passed, the function uses its own and blocks until the transfer is complete.
If the user wants to use the callback, then I think CS is better managed outside the DMATransfer function.
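To make the two modes concrete, here is a rough sketch of the control flow I have in mind. It is only a PC-side simulation: FakeSpiDma, dmaTxTransfer, transferCompleteIsr and onTransferDone are names I made up for illustration, not functions in libmaple or the SPI library, and the real version would configure the DMA channel where the comment says so.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical completion callback type (not part of any existing library).
typedef void (*DmaCallback)(void);

// Example user callback: counts completed asynchronous transfers.
static int completedTransfers = 0;
static void onTransferDone() { ++completedTransfers; }

// Stand-in for the SPI + DMA hardware so the flow can be followed on a PC.
struct FakeSpiDma {
    volatile bool busy = false;    // true while a transfer is in flight
    DmaCallback onDone = nullptr;  // user callback, or nullptr for blocking mode
    size_t lastLength = 0;         // length of the most recent transfer

    // On real hardware this would be the DMA transfer-complete ISR.
    void transferCompleteIsr() {
        busy = false;
        if (onDone != nullptr) {
            onDone();
        }
    }

    // With a callback: start the transfer and return right away.
    // Without a callback: block until the transfer has finished.
    void dmaTxTransfer(const uint8_t *src, uint16_t n, DmaCallback cb = nullptr) {
        assert(src != nullptr && !busy);
        lastLength = n;
        onDone = cb;
        busy = true;
        // Real code would set up and enable the DMA channel here.
        if (cb == nullptr) {
            // Simulation only: fire the "interrupt" ourselves. On hardware
            // the ISR runs on its own and the loop below does the waiting.
            transferCompleteIsr();
            while (busy) {
            }
        }
    }
};
```

In this sketch, calling dmaTxTransfer(buf, n) comes back only when the transfer is done, while dmaTxTransfer(buf, n, onTransferDone) comes back immediately with busy still set, and the user handles CS inside or around the callback.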

Just for reference, this is how it is done in the SdFat and the ILI_due libraries right now. The structure is copied from how they had it set up for the Due; it could be simplified further:
https://github.com/greiman/SdFat/blob/master/SdFat/SdSpiSTM32F1.cpp

One more note: why do I want to pass 2 pointers to the RX function, one of them a buffer to send?
Imagine you need to send 4 bytes with a command, and receive 512.
Instead of setting up a 4-byte DMATX and then a 512-byte DMARX sending blanks, you could set up a single 516-byte transfer. In the TX buffer, write your 4 command bytes at the start and fill the rest with blanks. The receive buffer is 516 bytes long too. Then you fire an RX that transmits the 4 command bytes followed by 512 blanks, and receives 4 empty responses to the command bytes followed by the 512 bytes you were expecting. With a single DMA setup you have sent the command and received the data.
Given that I think buffers should be set up once with their maximum size and then reused over and over, I don't see a problem with wasting a bit of memory. You can even set up a sequence of command, n blank bytes, command, n blank bytes if you know in advance that you will need to do those 2 transmissions back to back. If you don't want to use the feature, just send the commands yourself and set the RX to send 00 or FF over and over; it doesn't need a buffer to do that.
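Here is a small sketch of that 516-byte layout, again as a PC-side simulation and not real driver code. buildFrame, fakeFullDuplexTransfer and payload are hypothetical helpers, and the fake slave (0xFF during the command phase, then a counting pattern starting at the last command byte) is invented purely so there is something to receive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const size_t CMD_LEN = 4;                     // command bytes at the start
const size_t DATA_LEN = 512;                  // bytes we expect back
const size_t FRAME_LEN = CMD_LEN + DATA_LEN;  // one 516-byte DMA transfer

// Fill the TX buffer: command first, then filler for the receive phase.
// tx must be at least FRAME_LEN bytes long.
void buildFrame(uint8_t *tx, const uint8_t *cmd, uint8_t filler) {
    std::memcpy(tx, cmd, CMD_LEN);
    std::memset(tx + CMD_LEN, filler, DATA_LEN);
}

// Stand-in for a full-duplex DMA transfer against an invented slave:
// it answers 0xFF while the command is clocked in, then a counting
// pattern that starts at the value of the last command byte.
void fakeFullDuplexTransfer(uint8_t *rx, const uint8_t *tx, size_t n) {
    assert(rx != nullptr && tx != nullptr);
    for (size_t i = 0; i < n; ++i) {
        if (i < CMD_LEN) {
            rx[i] = 0xFF;  // empty response during the command phase
        } else {
            rx[i] = static_cast<uint8_t>(tx[CMD_LEN - 1] + (i - CMD_LEN));
        }
    }
}

// The useful data starts after the empty responses to the command bytes.
const uint8_t *payload(const uint8_t *rx) {
    return rx + CMD_LEN;
}
```

The user would allocate the two 516-byte buffers once, call buildFrame before each read, run one transfer, and then read the data from payload(rx), skipping the first 4 received bytes.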