Chan's website is very good, but most of its material on SD card performance is out of date.
Wear leveling in modern TLC flash can dominate latency. Modern cards move data that has not been changed for a long time to a new physical location. Flash controllers have varying amounts of RAM buffering and use different algorithms, so generalizations like "small cards are faster" just aren't true.
Write latency also depends on the access pattern. Arduinos based on the ATmega328 have only 2 KB of SRAM, so there are not many caching options.
Small cards are often formatted with small clusters, which increases overhead since the FAT must be accessed more often.
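To put a rough number on the cluster size effect, here is a small sketch that counts how many clusters (and so how many FAT entries) a file needs. The function name clustersForFile is made up for this example and is not part of SdFat. How often the FAT is actually read or written also depends on how the library caches FAT sectors, so treat this as a count of allocations, not of card accesses.

```cpp
#include <cstdint>

// Hypothetical helper, not an SdFat function.
// Returns the number of clusters (one FAT entry each) needed to hold
// fileBytes when the volume is formatted with clusterBytes per cluster.
constexpr uint32_t clustersForFile(uint64_t fileBytes, uint32_t clusterBytes) {
  // Round up: a partly filled last cluster still needs its own FAT entry.
  return static_cast<uint32_t>((fileBytes + clusterBytes - 1) / clusterBytes);
}
```

For a 1 MiB log file, 4 KiB clusters need 256 FAT entries while 32 KiB clusters need only 32, so the file system has to go to the FAT eight times as often with the smaller clusters.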
For embedded applications, cards with SLC flash tend to be best. Modern consumer cards use MLC or TLC flash. Industrial SD cards mostly use SLC flash. I have had good luck with this cheap industrial card http://www.newegg.com/Product/Product.aspx?Item=9SIA12K0CT6829
No card has uniform low write latency. All cards have occasional longer latencies. You must design fast data loggers with this in mind to avoid data loss.
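One common way to design for this is to put a FIFO between the sampling code and the SD write, so samples keep accumulating while the card is busy. The sketch below is only an illustration of that idea, not code from SdFat. The class name SampleFifo and its members are invented for this example. It assumes a single producer (for example a timer interrupt) and a single consumer (the main loop), and it uses one-byte indices so that index reads and writes are single operations on an 8-bit AVR.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical fixed-size FIFO for a data logger (not part of SdFat).
// One slot is kept empty to tell "full" from "empty", so it holds N - 1 items.
template <typename T, size_t N>
class SampleFifo {
  static_assert(N >= 2 && N <= 255, "N must fit in a one-byte index");

 public:
  // Producer side: store one sample. If the FIFO is full the sample is
  // dropped, the overrun counter is incremented, and false is returned.
  bool push(const T& v) {
    uint8_t next = static_cast<uint8_t>((head_ + 1) % N);
    if (next == tail_) {
      overruns_ = overruns_ + 1;
      return false;
    }
    buf_[head_] = v;
    head_ = next;
    return true;
  }

  // Consumer side: fetch the oldest sample. Returns false if the FIFO is empty.
  bool pop(T& out) {
    if (tail_ == head_) return false;
    out = buf_[tail_];
    tail_ = static_cast<uint8_t>((tail_ + 1) % N);
    return true;
  }

  // Number of samples currently waiting to be written.
  uint8_t size() const { return static_cast<uint8_t>((head_ + N - tail_) % N); }

  // Number of samples lost because the FIFO was full.
  uint32_t overruns() const { return overruns_; }

 private:
  T buf_[N];
  volatile uint8_t head_ = 0;      // next slot to write
  volatile uint8_t tail_ = 0;      // next slot to read
  volatile uint32_t overruns_ = 0; // dropped sample count
};
```

The FIFO has to hold at least the sample rate times the longest latency you see on your card. For example, at 1000 samples per second and a measured worst case of 100 ms, that is 100 samples. Logging the overrun count at the end of a run is a cheap way to check whether any data was lost.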
I am now working on improved performance for SdFat with Cortex chips and the Mega. Here is a link to some results for the Due with DMA SPI http://arduino.cc/forum/index.php/topic,134512.0.html