The next generation of consumer SD cards will be even slower on Arduino.
Here is the result of running bench on a 64GB SDXC card.
Write 116.69 KB/sec
Maximum latency: 192708 usec, Minimum Latency: 84 usec, Avg Latency: 851 usec
Read 296.70 KB/sec
Maximum latency: 2980 usec, Minimum Latency: 80 usec, Avg Latency: 331 usec
The reason is that new cards use very dense TLC (Triple-Level Cell) NAND flash. Three bits are stored in a single-transistor cell as eight levels of charge in its gate, with adjacent levels differing by fewer than 100 electrons.
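Here is a small C++ sketch showing why packing more bits per cell costs margin. It is only an illustration, not a device model. It assumes the levels are spaced evenly across a fixed charge window, which is a simplification. `chargeLevels` and `levelSpacing` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Charge levels a cell must distinguish to store `bitsPerCell` bits:
// 2 for SLC, 4 for MLC, 8 for TLC.
uint32_t chargeLevels(uint32_t bitsPerCell) {
  assert(bitsPerCell >= 1 && bitsPerCell <= 4);
  return 1u << bitsPerCell;
}

// Gap between adjacent levels as a fraction of the cell's usable charge
// window, ASSUMING evenly spaced levels in a fixed window (a simplification).
double levelSpacing(uint32_t bitsPerCell) {
  return 1.0 / (chargeLevels(bitsPerCell) - 1);
}
```

Under that assumption, going from SLC (1 bit, 2 levels) to TLC (3 bits, 8 levels) shrinks the gap between neighboring levels from the whole window to 1/7 of it.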
These chips are designed for very large contiguous writes, and the Arduino does not have enough RAM to buffer writes that large.
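A common way to give a card larger contiguous writes is to coalesce small writes in RAM and pass them on in multi-block chunks. The sketch below shows the idea in desktop C++. It uses std::vector and std::function, which are not available on AVR boards. `CoalescingWriter` and its `flush` callback are hypothetical names. The callback stands in for whatever multi-block write routine your library provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Collects arbitrary-size writes and hands them to `flush` only as full
// chunks of `blocks` x 512 bytes. `flush` is a hypothetical stand-in for a
// multi-block write routine.
class CoalescingWriter {
 public:
  using FlushFn = std::function<void(const uint8_t*, size_t)>;

  CoalescingWriter(size_t blocks, FlushFn flush)
      : buf_(blocks * 512), flush_(std::move(flush)) {
    assert(blocks > 0);
  }

  void write(const uint8_t* data, size_t len) {
    while (len > 0) {
      size_t n = std::min(len, buf_.size() - used_);
      std::memcpy(buf_.data() + used_, data, n);
      used_ += n;
      data += n;
      len -= n;
      if (used_ == buf_.size()) {  // buffer full: issue one large write
        flush_(buf_.data(), used_);
        used_ = 0;
      }
    }
  }

  // Bytes still waiting in RAM for the next full chunk.
  size_t pending() const { return used_; }

 private:
  std::vector<uint8_t> buf_;
  size_t used_ = 0;
  FlushFn flush_;
};
```

Larger chunks get closer to what the flash is designed for, but every extra block in the buffer costs another 512 bytes of RAM.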
The page size of these chips is much larger than a 512-byte block, so only part of a page is used when a single 512-byte block is written. The rest of the page is "dead", meaning unusable until it is erased.
Eventually the SD controller collects these blocks and rewrites them to a new page. This is a very high-overhead process.
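The C++ sketch below gives a rough way to quantify that waste. The 16 KB page size is an assumption for illustration only. Actual page sizes vary by part and are not given here. The model is the simple worst case from the text, in which each isolated 512-byte write ends up consuming a whole page.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSectorBytes = 512;  // one SD block as written by Arduino

// ASSUMED page size for illustration; real parts differ.
constexpr uint32_t kAssumedPageBytes = 16 * 1024;

// Fraction of a page holding live data after a single 512-byte write,
// in the one-write-per-page worst case.
double pageUtilization(uint32_t pageBytes) {
  assert(pageBytes >= kSectorBytes && pageBytes % kSectorBytes == 0);
  return static_cast<double>(kSectorBytes) / pageBytes;
}

// Bytes of page space consumed (live plus "dead") per byte the host
// wrote, in that same worst case.
uint32_t worstCaseAmplification(uint32_t pageBytes) {
  assert(pageBytes >= kSectorBytes && pageBytes % kSectorBytes == 0);
  return pageBytes / kSectorBytes;
}
```

With a 16 KB page, one 512-byte write fills 1/32 of the page. In this worst case the card uses up 32 bytes of page space per byte written until garbage collection reclaims it.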
In addition, TLC flash is very susceptible to wear, so the Arduino's inefficient use of the device causes a lot of data movement. The erase block size for these flash chips can be 256KB, and moving this much data takes a long time.
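The cost of that data movement can be sketched the same way. The 256KB erase block comes from the text. The page size passed in is an assumption. `bytesToRelocate` is a hypothetical helper, and it ignores the card's internal copy speed because that speed is not known here.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kEraseBlockBytes = 256 * 1024;  // figure from the text
constexpr uint32_t kSectorBytes = 512;

// How many of the Arduino's 512-byte blocks fit in one erase block.
constexpr uint32_t sectorsPerEraseBlock() {
  return kEraseBlockBytes / kSectorBytes;
}

// Bytes the controller must copy elsewhere before it can erase a block
// that still holds `validPages` live pages of `pageBytes` each (ASSUMED
// page size; it must divide the erase block evenly).
uint32_t bytesToRelocate(uint32_t validPages, uint32_t pageBytes) {
  assert(pageBytes > 0 && kEraseBlockBytes % pageBytes == 0);
  assert(validPages <= kEraseBlockBytes / pageBytes);
  return validPages * pageBytes;
}
```

A 256KB erase block holds 512 of the Arduino's 512-byte blocks. With an assumed 16 KB page, a block whose 16 pages are all still live forces the controller to copy the full 256KB before erasing it.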
Even unchanged data gets moved by the wear-leveling algorithms:
Blocks that contain static data with erase counts that begin to lag behind other blocks will be included in the wear-leveling block pool, with the static data being moved to blocks with higher erase counts.
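The static wear-leveling rule above can be illustrated with a simplified selection routine. This is a hypothetical model, not the algorithm any particular SD controller uses. In this model, blocks are tagged Free, Dynamic, or Static. `threshold` is an assumed tuning parameter that sets how far a block's erase count may lag before its static data is moved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class State { Free, Dynamic, Static };

struct Block {
  State state;
  uint32_t eraseCount;
};

struct Move {
  int from;  // static block whose data should move (-1 if none)
  int to;    // free block that receives it (-1 if none)
};

// Picks the least-worn static block and the most-worn free block. If the
// gap in erase counts exceeds `threshold`, the static data moves to the
// worn block so the lightly worn block rejoins the pool.
Move pickStaticMove(const std::vector<Block>& blocks, uint32_t threshold) {
  int coldest = -1;
  int hottestFree = -1;
  for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
    const Block& b = blocks[i];
    if (b.state == State::Static &&
        (coldest < 0 || b.eraseCount < blocks[coldest].eraseCount)) {
      coldest = i;
    }
    if (b.state == State::Free &&
        (hottestFree < 0 || b.eraseCount > blocks[hottestFree].eraseCount)) {
      hottestFree = i;
    }
  }
  if (coldest < 0 || hottestFree < 0) return {-1, -1};
  uint32_t hot = blocks[hottestFree].eraseCount;
  uint32_t cold = blocks[coldest].eraseCount;
  if (hot <= cold || hot - cold <= threshold) return {-1, -1};
  return {coldest, hottestFree};
}
```

Each such move copies data the host never changed. That is one source of the long, unpredictable write latencies.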
The 64GB SDXC card I tested can have an occasional write latency of almost 200 ms.
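The C++ sketch below puts 200 ms in perspective for a data logger by estimating how much data piles up while the card is busy. You supply the data rate as an input. The 192708 usec figure is the maximum latency from the bench run above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSectorBytes = 512;

// Bytes that arrive during a stall of `latencyUs` microseconds at
// `bytesPerSec`, rounded up. 64-bit math avoids overflow in the product.
uint32_t bytesDuringStall(uint32_t bytesPerSec, uint32_t latencyUs) {
  uint64_t scaled = static_cast<uint64_t>(bytesPerSec) * latencyUs;
  return static_cast<uint32_t>((scaled + 999999) / 1000000);
}

// Whole 512-byte blocks of RAM needed to hold that data.
uint32_t blocksDuringStall(uint32_t bytesPerSec, uint32_t latencyUs) {
  uint32_t bytes = bytesDuringStall(bytesPerSec, latencyUs);
  return (bytes + kSectorBytes - 1) / kSectorBytes;
}
```

For example, at an assumed 10,000 bytes/sec, `bytesDuringStall(10000, 192708)` gives 1928 bytes, or 4 blocks (2048 bytes). That is the entire 2 KB of SRAM on an ATmega328P-based Uno, before counting the block that is being written.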
If you want to learn more about the internals of SD cards and modern NAND flash, here are some links:
http://www.eetindia.co.in/STATIC/PDF/200809/EEIOL_2008SEP22_STOR_AN_01.pdf