I agree that having to check whether the DMA is finished before every other transfer is a big waste of time, and I started rewriting the library that way. I made so many small changes trying to squeeze out a bit more speed that I don't remember whether I finished that one, or whether it was that change or something else that ended up losing a bit of speed in the screen test.
About the FIFO: I thought about it, then checked the chip's specifications to see whether we can read where the DMA transfer pointer is at any given time. It's not possible, so even though we could enable circular mode and keep track of where we are writing, we wouldn't be able to know whether we had passed ahead of the DMA controller. According to the specs, you can write the start address and fire the transfer, but the pointer is write-only, not readable. I didn't check, though, whether there is a counter that tells you how many bytes have been transferred so far, so you could write a routine to work out what position that corresponds to within a circular buffer.
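To make the circular buffer idea concrete, here is a rough sketch of the arithmetic such a routine would need. It assumes the controller exposes a readable "transfers remaining" counter, which I have not confirmed; the function names and the `remaining` parameter are made up for illustration and are not part of any existing library.

```cpp
#include <cstddef>
#include <cstdint>

// ASSUMPTION: 'remaining' is the value of a hypothetical readable counter that
// counts down from buf_len as the DMA walks the buffer and reloads in circular
// mode. Returns the index of the next byte the DMA will read.
static size_t dma_read_index(size_t buf_len, size_t remaining) {
    return (buf_len - remaining) % buf_len;
}

// Bytes the CPU may still write before catching up with the DMA.
// One slot is kept empty so "full" and "empty" can be told apart.
static size_t ring_free_space(size_t buf_len, size_t write_idx, size_t read_idx) {
    return (read_idx + buf_len - write_idx - 1) % buf_len;
}

// True if writing 'count' bytes starting at write_idx would not pass
// ahead of the DMA controller's current position.
static bool ring_can_write(size_t buf_len, size_t write_idx,
                           size_t remaining, size_t count) {
    size_t read_idx = dma_read_index(buf_len, remaining);
    return count <= ring_free_space(buf_len, write_idx, read_idx);
}
```

The main program would call `ring_can_write()` before queuing more bytes, and do something else when it returns false.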
Having the SPI DMA transfer as its own function is possibly the best solution, so it can be used for anything.
That way you can use the library to initialize the display and draw lines and text, but to dump a whole bitmap you could call the library function to set the TFT coordinates and fire a DMA; while the DMA is running, prepare the next scan line; when it finishes, set the new TFT coordinates and fire another DMA.
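Roughly, the bitmap dump loop could look like the sketch below, using two scan-line buffers so one can be filled while the other is being sent. Everything here is a stand-in so it runs on a PC: `tft_set_window()`, `spi_dma_start()`, `spi_dma_busy()` and `render_line()` are hypothetical names, the DMA is simulated with a countdown, and the 240-pixel width is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

const int LINE_PIXELS = 240;                 // assumed panel width
static uint16_t line_buf[2][LINE_PIXELS];    // two scan-line buffers (ping-pong)

// --- Hypothetical stand-ins, simulated so the sketch is self-contained ---
static int dma_ticks_left = 0;   // simulated "transfer in progress" countdown
static int lines_sent = 0;       // number of DMA transfers fired
static int last_window_y = -1;   // last row passed to tft_set_window()

static void tft_set_window(int x0, int y0, int x1, int y1) {
    (void)x0; (void)x1; (void)y1;
    last_window_y = y0;          // real code would send the address window commands
}
static void spi_dma_start(const uint16_t *buf, size_t count) {
    (void)buf; (void)count;
    dma_ticks_left = 3;          // real code would program and enable the channel
    lines_sent++;
}
static bool spi_dma_busy() {
    if (dma_ticks_left > 0) { dma_ticks_left--; return true; }
    return false;
}
static void render_line(int y, uint16_t *dst) {
    for (int x = 0; x < LINE_PIXELS; x++) dst[x] = (uint16_t)(x ^ y);  // dummy pattern
}

// Send 'height' scan lines, preparing line y+1 while line y is in flight.
static void dump_bitmap(int height) {
    int cur = 0;
    render_line(0, line_buf[cur]);
    for (int y = 0; y < height; y++) {
        tft_set_window(0, y, LINE_PIXELS - 1, y);
        spi_dma_start(line_buf[cur], LINE_PIXELS);
        if (y + 1 < height) render_line(y + 1, line_buf[cur ^ 1]);  // overlaps the DMA
        while (spi_dma_busy()) { }   // main program waits only after its own work is done
        cur ^= 1;
    }
}
```

The wait sits in the main program rather than inside the transfer function, so the caller decides what to do while the DMA is running.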
In all reality, that's fairly easy to do right now. We can copy the functions I wrote in that library and take the while loop out.
Currently I think it takes only 2 parameters: the buffer pointer and the byte count.
EDIT: Just re-read what you wrote, I said basically the same again.
I only have 1 concern. I don't think the DMA transfer function should wait in a while loop inside; again, that's wasted time. I would rather make it return an error code. The main program can check the return value and know: well, this didn't fire, I'll do something else and try again in X time. Or, if there is nothing else to do, it can keep retrying, and if the transfer doesn't finish within a certain period, for whatever reason, it can call an abort function. That would be better than unexpectedly sitting in a while loop, don't you think?
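Something along these lines is what I have in mind. It is only a sketch: `dma_send_nowait()`, `dma_abort()`, `send_with_retry()` and the status codes are hypothetical names, and the busy flag is simulated in a variable instead of being read from the DMA controller.

```cpp
#include <cstddef>
#include <cstdint>

enum DmaStatus { DMA_OK = 0, DMA_ERR_BUSY = 1 };

// Simulated channel state; on hardware this would come from the controller's flags.
static bool sim_channel_busy = false;

// Hypothetical non-blocking send: same two parameters (buffer pointer, byte
// count), but it returns immediately instead of waiting in a while loop.
static DmaStatus dma_send_nowait(const uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    if (sim_channel_busy) return DMA_ERR_BUSY;   // previous transfer still running
    sim_channel_busy = true;                     // hardware: set address/count, enable channel
    return DMA_OK;
}

// Hypothetical abort: give up on whatever transfer is stuck.
static void dma_abort() { sim_channel_busy = false; }

// Caller-side policy: retry up to max_tries times, then abort and send again.
// tries_used reports how many send attempts were made in total.
static DmaStatus send_with_retry(const uint8_t *buf, size_t len,
                                 int max_tries, int *tries_used) {
    for (int i = 1; i <= max_tries; i++) {
        if (dma_send_nowait(buf, len) == DMA_OK) { *tries_used = i; return DMA_OK; }
        // the main program could do other useful work here before retrying
    }
    dma_abort();
    *tries_used = max_tries + 1;
    return dma_send_nowait(buf, len);
}
```

The retry count stands in for "X time"; a real sketch would more likely compare against a timestamp before calling the abort.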
Re. the SD card: I forked the new sdcard library and added the DMA code to it, but I couldn't get it to work. I only have 2 SD cards around, so I'm not sure whether the problem is the code (most likely not, as it is almost to the letter what I used in the ili due library), the new library not working properly on the Maple, or my SD cards.
Which sdcard library and sketch have you used successfully on the Maple so far? I just want to make sure my SD cards get detected so I can use them for troubleshooting.