In theory with an extremely fast CPU, yes.
How do you mean?
If it takes W cycles to enter an interrupt, X cycles to begin an SPI transfer, Y cycles to complete an SPI transfer and Z cycles to exit an interrupt, and I skip Y by starting the transfer but not waiting for it to finish, then at the very least, the total cost of that interrupt should be reduced by however many cycles it takes to transmit 8 bits at 12 MHz on a 48 MHz processor. Which is probably a lot.
Let's see, 48 MHz / 12 MHz = 4 CPU cycles per SPI clock... And transmitting 8 bits at 12 MHz takes 8 SPI clocks. Times our factor of 4... That's 32 CPU cycles wasted waiting for that byte to finish transmitting if we stick around waiting for the next byte to arrive!
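To sanity-check that arithmetic, here's a minimal sketch. It assumes one bit goes out per SPI clock period and that there's no gap between bytes; the function name is made up for illustration. Under those assumptions, a 48 MHz CPU with a 12 MHz SPI clock spends 32 CPU cycles per byte.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles that elapse while one byte shifts out over SPI.
 * Assumptions: one bit per SPI clock period, no inter-byte gap,
 * and a CPU clock that is an integer multiple of the SPI clock. */
uint32_t cpu_cycles_per_spi_byte(uint32_t cpu_hz, uint32_t spi_hz)
{
    assert(spi_hz > 0 && cpu_hz % spi_hz == 0);

    uint32_t cpu_cycles_per_bit = cpu_hz / spi_hz; /* 48 MHz / 12 MHz = 4 */
    return 8u * cpu_cycles_per_bit;                /* 8 bits per byte */
}
```

Calling `cpu_cycles_per_spi_byte(48000000u, 12000000u)` gives the per-byte figure for the case above. Real hardware may add a few cycles of setup or idle time between bytes, so treat this as a lower bound.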
Am I wrong?
In practice with 8-bit AVR or 32-bit ARM Cortex-M0+, with 8 or 12 Mbit/sec SPI, the CPU speed is far too slow. Especially for 2 general-purpose libraries like SD and a display, just the function exit-entry-exit-entry to get between the 2 unrelated code bases will eat up nearly all the CPU time.
Yes, if you transfer one byte at a time. But the SDFat Library doesn't do that.
I mean you're right... If you transfer a single byte at a time with an interrupt, you're going to waste boatloads of time entering and exiting that interrupt whether you use one or two SPI busses to transfer the data.
And I don't claim to know how to solve that, if you transfer one byte at a time.
But, if one were to write an SPI function that could transfer blocks of data at a time, then I think you could interleave the transfers, initiating the transfer of one byte for one bus, and then initiating the transfer for the second bus, and then repeating, and only exiting the interrupt when the transfer completes.
Maybe?
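Roughly what I have in mind, as a sketch only: the `spi_bus_t` struct and its `start`/`done` hooks are hypothetical stand-ins for each bus's data register and transfer-complete flag, and the `fake_*` bits just simulate a bus that takes a few polls to finish a byte so the loop can be tried on a PC. None of this is taken from SDFat or any real library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus abstraction (not a real library API). */
typedef struct {
    void (*start)(void *ctx, uint8_t byte); /* kick off one byte, return at once */
    bool (*done)(void *ctx);                /* true once that byte has shifted out */
    void *ctx;
} spi_bus_t;

/* Send two blocks on two buses, starting a byte on whichever bus is idle
 * instead of busy-waiting on one bus at a time. */
void spi_interleaved_write(const spi_bus_t *a, const uint8_t *buf_a, size_t len_a,
                           const spi_bus_t *b, const uint8_t *buf_b, size_t len_b)
{
    size_t ia = 0, ib = 0;
    bool busy_a = false, busy_b = false;

    while (ia < len_a || ib < len_b || busy_a || busy_b) {
        if (busy_a && a->done(a->ctx)) busy_a = false;
        if (!busy_a && ia < len_a) { a->start(a->ctx, buf_a[ia++]); busy_a = true; }

        if (busy_b && b->done(b->ctx)) busy_b = false;
        if (!busy_b && ib < len_b) { b->start(b->ctx, buf_b[ib++]); busy_b = true; }
    }
}

/* Simulated bus for host testing: each byte needs 3 polls to "finish". */
typedef struct {
    uint8_t  log[16];    /* bytes sent, in order */
    size_t   count;      /* number of bytes started */
    unsigned polls_left; /* polls remaining until the current byte is done */
} fake_bus_t;

void fake_start(void *ctx, uint8_t byte)
{
    fake_bus_t *f = ctx;
    assert(f->count < sizeof f->log);
    f->log[f->count++] = byte;
    f->polls_left = 3;
}

bool fake_done(void *ctx)
{
    fake_bus_t *f = ctx;
    if (f->polls_left > 0) f->polls_left--;
    return f->polls_left == 0;
}
```

On real hardware the two hooks would boil down to a register write and a status-bit read per bus. Whether the loop overhead actually fits inside the per-byte transfer time is exactly the open question.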
I know the WaveHC lib and the SDFat lib sped up their transfers a great deal by transferring 512 bytes at a time. I also know that when I was working with some others to make SPI transfers on the ATmega as fast as possible, we cheated and skipped checking the transfer complete bit to get that last bit of speed out, and we had to insert a whole lot of NOPs in the loop to make that work. So it stands to reason that there ought to be enough spare cycles in the loop to interleave two transfers. But I have no idea how easy it would be to do this.
Of course, you can try to prove me wrong! Just a small matter of programming, right?
Hey, you said you were curious how I thought it could work. I'm not saying you're wrong. 
And normally I'd be up for the challenge, but I've got my hands full at the moment designing PCBs. I'm just putting some ideas out there!