SdFat - tuned for 1284p

Hi, I see there are new developments in SdFat (e.g. for Teensy). Let me kindly ask whether there is a similar update for the 1284p (e.g. one that uses its larger RAM for buffers)?

I have modified SdFat so large reads/writes will go faster, and I will soon post a beta.

I changed the way the cache is handled and the SD commands used for large reads and writes.

Increasing internal buffering on AVR is not effective because copying data between buffers costs too many CPU cycles.

I still use a single block cache to hold partial blocks. Most of a large write is sent to SPI directly from user memory.

If you use a size that is a power of two (512 bytes or larger) for a write, all data will be written directly to the SPI bus. Writes will not cross cluster boundaries if you correctly format the card. This allows all data to be written to the SD with a single multi-block write command.
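To illustrate the point about write sizes, here is a minimal sketch of how a write could be split between the single block cache and direct SPI transfers. This is not SdFat code: the names WritePlan and planWrite are hypothetical, and the only assumption taken from the post is that partial 512-byte blocks go through the cache while whole blocks go straight from user memory.

```c
#include <stdint.h>

#define BLOCK_SIZE 512u  /* SD data block size in bytes */

/* Hypothetical illustration only, not part of SdFat. */
typedef struct {
  uint32_t headBytes;    /* bytes that fill up a partly written block via the cache */
  uint32_t directBlocks; /* whole blocks that can go from user memory straight to SPI */
  uint32_t tailBytes;    /* leftover bytes that start a new partial block in the cache */
} WritePlan;

/* Split a write of nbyte bytes starting at file position filePos. */
WritePlan planWrite(uint32_t filePos, uint32_t nbyte) {
  WritePlan p = {0, 0, 0};
  uint32_t offset = filePos % BLOCK_SIZE;  /* position inside the current block */
  if (offset != 0) {
    uint32_t room = BLOCK_SIZE - offset;   /* space left in the current block */
    p.headBytes = nbyte < room ? nbyte : room;
    nbyte -= p.headBytes;
  }
  p.directBlocks = nbyte / BLOCK_SIZE;
  p.tailBytes = nbyte % BLOCK_SIZE;
  return p;
}
```

With this model, planWrite(0, 4096) yields 8 direct blocks and nothing for the cache, and the file position stays block-aligned for the next 4096-byte write. A 100-byte write never produces a direct block, so every byte goes through the cache.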

Here is the improvement for a Mega with a 4096 byte write.

Old SdFat:

Type is FAT16
File size 5MB
Buffer size 4096 bytes
Starting write test. Please wait up to a minute
Write 247.37 KB/sec
Maximum latency: 100604 usec, Minimum Latency: 15084 usec, Avg Latency: 16447 usec

Starting read test. Please wait up to a minute
Read 451.70 KB/sec
Maximum latency: 10168 usec, Minimum Latency: 8844 usec, Avg Latency: 9061 usec

New SdFat:

Type is FAT16
File size 5MB
Buffer size 4096 bytes
Starting write test. Please wait up to a minute
Write 535.71 KB/sec
Maximum latency: 21940 usec, Minimum Latency: 6912 usec, Avg Latency: 7601 usec

Starting read test. Please wait up to a minute
Read 595.53 KB/sec
Maximum latency: 7992 usec, Minimum Latency: 6808 usec, Avg Latency: 6872 usec

Write speed has more than doubled (247 to 536 KB/sec), and maximum write latency dropped from about 100 ms to about 22 ms.

This C function is used to send data to the SPI bus:

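// Send one 512-byte data block, preceded by its start token, over AVR hardware SPI.
void spiSendBlock(uint8_t token, const uint8_t* buf) {
  SPDR = token;  // start shifting out the token
  for (uint16_t i = 0; i < 512; i++) {
    uint8_t b = buf[i];  // fetch the next byte while the previous one is still shifting out
    while (!(SPSR & (1 << SPIF)));  // wait for the previous transfer to finish
    SPDR = b;
  }
  while (!(SPSR & (1 << SPIF)));  // wait for the last byte
}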

It takes about 820 microseconds to send one 512-byte block, so the limit for a write is about 625 KB/sec. Converting it to assembly would probably improve performance a lot. Currently I am more interested in Cortex M ports of SdFat.
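The 625 KB/sec figure follows from dividing the block size by the time per block. A small sketch of that arithmetic is below. The helper name blockRateKBps is hypothetical, and it assumes that KB here means 1000 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: throughput in KB/sec (assuming 1 KB = 1000 bytes)
 * when one block of blockBytes bytes takes usecPerBlock microseconds. */
double blockRateKBps(uint32_t blockBytes, double usecPerBlock) {
  /* bytes per microsecond equals MB/sec, so multiply by 1000 for KB/sec */
  return (blockBytes / usecPerBlock) * 1000.0;
}
```

blockRateKBps(512, 820.0) comes out just under 625 KB/sec. For comparison, if the SPI clock is assumed to run at 8 MHz (half of a 16 MHz CPU clock), one byte takes 1 microsecond on the wire, so a block needs at least 512 microseconds and blockRateKBps(512, 512.0) gives a ceiling of 1000 KB/sec. The gap between those two numbers is the loop overhead that an assembly version could try to recover.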

This will not help 328-based Arduinos doing small writes. Here is the result for an Uno with 100-byte writes:

Type is FAT16
File size 5MB
Buffer size 100 bytes
Starting write test. Please wait up to a minute
Write 179.75 KB/sec
Maximum latency: 109332 usec, Minimum Latency: 84 usec, Avg Latency: 550 usec

Starting read test. Please wait up to a minute
Read 311.84 KB/sec
Maximum latency: 3500 usec, Minimum Latency: 88 usec, Avg Latency: 314 usec