You don't need buffering to use the streaming multi-block mode. I do it in my binaryLogger on the 328. You just send a multi-block write command to the SD and then write a block each time a block buffer is full. Finally you send a write end command to the SD.
The binaryLogger is an example in fastLoggerBeta20110802.zip (http://code.google.com/p/beta-lib/downloads/list). I will soon post another example that uses this mode to log 100,000 8-bit samples per second from the built-in AVR ADC.
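Here is a rough sketch of the three steps described above (command once, one data packet per full buffer, then a stop token). It is not the binaryLogger code. It only records the bytes the host would put on the SPI bus, and the names Wire, beginMultiBlockWrite, sendBlock and endMultiBlockWrite are made up for illustration. The token and command values are the standard SD SPI-mode ones. The card's responses and busy waits are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SPI bus: it only records bytes sent by the host.
using Wire = std::vector<uint8_t>;

const size_t  kSdBlockSize     = 512;
const uint8_t kCmd25           = 0x40 | 25;  // WRITE_MULTIPLE_BLOCK command byte
const uint8_t kWriteMultiToken = 0xFC;       // start token for each block of a multi-block write
const uint8_t kStopTranToken   = 0xFD;       // stop transmission token

// Step 1: send the multi-block write command once.
// The argument is a block number on SDHC cards and a byte address on
// standard-capacity cards.
void beginMultiBlockWrite(Wire& wire, uint32_t address) {
  wire.push_back(kCmd25);
  for (int shift = 24; shift >= 0; shift -= 8) {
    wire.push_back(static_cast<uint8_t>(address >> shift));
  }
  wire.push_back(0xFF);  // dummy CRC byte (CRC checking is off by default in SPI mode)
  // Real code reads the card's R1 response here before sending data.
}

// Step 2: call this each time a block buffer is full.
void sendBlock(Wire& wire, const uint8_t* data) {
  assert(data != nullptr);
  wire.push_back(kWriteMultiToken);
  wire.insert(wire.end(), data, data + kSdBlockSize);
  wire.push_back(0xFF);  // two dummy CRC bytes
  wire.push_back(0xFF);
  // Real code reads the data response token and waits while the card is busy.
}

// Step 3: end the sequence.
void endMultiBlockWrite(Wire& wire) {
  wire.push_back(kStopTranToken);
  // Real code waits for the card to leave the busy state here as well.
}
```

Calling beginMultiBlockWrite once, sendBlock twice and endMultiBlockWrite once puts 6 + 2 * 515 + 1 = 1037 bytes on the wire.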
The absolute minimum time to write a block, set by the SPI speed, is 520 us. In practice it takes longer, since you must fetch the data, check that the SPI data register is empty, and do loop control. The current SdFat block write function takes about 820 us to write a block, or about 620 KB per second.
This is the max rate for streaming in multi-block raw write mode. It might be possible to improve on this a bit with some clever assembly code in the loop. Currently I have optimized it by sending two bytes per loop iteration.
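A quick arithmetic check of the figures above, as a sketch. The 8 MHz SPI clock is an assumption (a 16 MHz AVR running SPI at half the CPU clock), and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

const uint32_t kBlockBytes = 512;

// Time to clock `bytes` bytes over SPI at `spiHz` bits per second, in microseconds.
uint32_t spiMicros(uint32_t bytes, uint32_t spiHz) {
  assert(spiHz > 0);
  return static_cast<uint32_t>(static_cast<uint64_t>(bytes) * 8u * 1000000u / spiHz);
}

// Sustained throughput in bytes per second when each 512-byte block
// takes `usPerBlock` microseconds (result is truncated to a whole number).
uint32_t bytesPerSecond(uint32_t usPerBlock) {
  assert(usPerBlock > 0);
  return static_cast<uint32_t>(static_cast<uint64_t>(kBlockBytes) * 1000000u / usPerBlock);
}
```

Under that assumption spiMicros(512, 8000000) is 512, so the data bytes alone take 512 us and 520 us leaves only a few byte times for the token, CRC and response. bytesPerSecond(820) is 624390, which matches the roughly 620 KB per second above.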
In a practical application like my fast ADC logger, which uses this mode, the main overhead is the ISRs for the ADC. At 100,000 samples per second that's two interrupts every 10 us: one to clear the timer flag that starts the next conversion, and a conversion-done interrupt to read the data. The SD write is a small part of the overhead. The write needs to be reliable with no random delays, and streaming raw write does that. On a Mega I use thirteen 512-byte buffers to increase reliability. This means a write can take as much as 65,000 us and still not lose data.
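To show how a pool of block buffers buys that slack, here is a minimal sketch of a queue of 512-byte buffers with a producer side (the ADC ISR) and a consumer side (the SD write loop). It is not the logger's actual code. BlockQueue, put, peekFull, release and headroomMicros are hypothetical names, and on a real AVR the shared counters would need to be volatile and accessed with interrupts disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const size_t kBufBytes = 512;

// Fixed pool of N block buffers used as a ring (host-side sketch, not ISR-safe as written).
template <size_t N>
class BlockQueue {
 public:
  // Producer: store one 8-bit sample. Returns false on overrun,
  // meaning every buffer is full and still waiting for the SD card.
  bool put(uint8_t sample) {
    if (fullCount_ == N) return false;
    blocks_[head_][fill_++] = sample;
    if (fill_ == kBufBytes) {  // buffer complete, hand it to the consumer
      fill_ = 0;
      head_ = (head_ + 1) % N;
      ++fullCount_;
    }
    return true;
  }

  // Consumer: oldest full buffer, or nullptr if none is ready.
  const uint8_t* peekFull() const {
    return fullCount_ ? blocks_[tail_] : nullptr;
  }

  // Consumer: call after the buffer returned by peekFull() has been written.
  void release() {
    assert(fullCount_ > 0);
    tail_ = (tail_ + 1) % N;
    --fullCount_;
  }

  size_t fullCount() const { return fullCount_; }

 private:
  uint8_t blocks_[N][kBufBytes];
  size_t head_ = 0;       // buffer currently being filled
  size_t tail_ = 0;       // oldest full buffer
  size_t fill_ = 0;       // bytes stored in the head buffer
  size_t fullCount_ = 0;  // buffers waiting to be written
};

// How long the SD card may stall before `numBuffers` buffers overflow, in microseconds.
uint32_t headroomMicros(uint32_t numBuffers, uint32_t bytesPerSec) {
  assert(bytesPerSec > 0);
  return static_cast<uint32_t>(static_cast<uint64_t>(numBuffers) * kBufBytes * 1000000u / bytesPerSec);
}
```

headroomMicros(13, 100000) gives 66560, in line with the roughly 65,000 us quoted above.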
For normal file operations, extra cache won't pay off on the Arduino. The AVR is at most a few percent as fast as the STM32F4 for data handling. SDIO is a 4-bit bus and much faster, so the Arduino is hopelessly outclassed here also.