
Topic: SdFatEX - up to 10X faster writes

fat16lib

Can you use higher SD write performance?  A factor of ten better on Due, and double on Uno.

Modern SD cards have very poor performance on most development boards since they only emulate 512-byte block devices.

I have been experimenting with a new high performance class, SdFatEX.  This class uses extended multi-block transfers to get higher performance.

The down side is that a dedicated SPI bus is required since the card must remain selected to maximize the size of multi-block transfers.

This is not a problem on boards like the STM32 Maple Mini, about $5.00 on eBay, since it has two SPI controllers.  It is supported by Arduino for STM32.  The Maple Mini has 18 MHz SPI, but other STM32 boards have up to 50 MHz SPI and get a 10X improvement.

Here are some results for 512 byte transfers with the SdFat bench example.  The card is a SanDisk 16GB Ultra MicroSD.

Due with standard SdFat:
Quote
FreeStack: 91587
Type is FAT32
Card size: 15.93 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SL16G
Version: 8.0
Serial number: 0X91203A25
Manufacturing date: 4/2014

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
340.35,92031,1149,1502
342.19,21583,1147,1494

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1697.68,1230,286,300
1698.26,599,287,299
Due SdFatEX:
Quote
FreeStack: 91579
Type is FAT32
Card size: 15.93 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SL16G
Version: 8.0
Serial number: 0X91203A25
Manufacturing date: 4/2014

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4440.21,7298,110,113
4464.00,6885,110,112

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4616.51,1230,108,109
4620.78,981,108,109
Maple Mini with SdFat:
Quote
FreeStack: 11799
Type is FAT32
Card size: 15.93 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SL16G
Version: 8.0
Serial number: 0X91203A25
Manufacturing date: 4/2014

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
310.83,77589,1273,1645
312.95,21471,1256,1634

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1164.07,872,425,438
1164.07,872,425,438
Maple Mini SdFatEX:
Quote
FreeStack: 11791
Type is FAT32
Card size: 15.93 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SL16G
Version: 8.0
Serial number: 0X91203A25
Manufacturing date: 4/2014

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1946.92,91744,247,260
2038.19,7978,247,249

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2053.26,1511,247,248
2053.26,1325,247,248
Uno standard SdFat:
Quote
FreeStack: 555
Type is FAT32
Card size: 15.93 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SL16G
Version: 8.0
Serial number: 0X91203A25
Manufacturing date: 4/2014

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
243.59,77036,1740,2095
244.59,18120,1740,2086

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
507.63,2000,980,1002
507.74,1996,980,1002
Uno SdFatEX:
Quote
FreeStack: 550
Type is FAT32
Card size: 15.93 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3
OEM ID: SD
Product: SL16G
Version: 8.0
Serial number: 0X91203A25
Manufacturing date: 4/2014

File size 5 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
677.83,75824,728,748
684.61,29980,728,741

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
641.89,2368,784,791
641.97,2368,784,791

Flinkly

where would one find the library to download/implement?  checked github where i pulled your sdfat, and only found sdfat-beta.

thanks

fat16lib

#2
Jul 28, 2016, 10:41 pm Last Edit: Jul 28, 2016, 10:42 pm by fat16lib
I am testing now and it will be in the next SdFat-beta.

It's taking a bit longer than I planned since I decided to restructure SdFat to simplify configuration options.

pito

#3
Aug 05, 2016, 09:07 pm Last Edit: Aug 05, 2016, 09:20 pm by pito
Quote
The Maple Mini has 18 MHz SPI
FYI - the Maple Mini (and the Blue/Red Pills) has got two SPIs - the SPI1 @36MHz max, and the SPI2 @18MHz max.
The SPI1 @36MHz works fine with SdFat beta and CL10 cards.
FYI - MMini @36MHz and SdFat beta, Sandisk Ultra 16GB CL10 UHS-I:
Code:
File size 5 MB
Buffer size 8192 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3073.26,8747,2532,2659
3052.61,9452,2580,2679
3052.61,10577,2536,2679
3048.88,12224,2537,2681
3013.94,14400,2538,2713
3043.31,15502,2537,2687

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2997.67,3104,2724,2732
2995.88,3103,2688,2732
2995.88,3112,2687,2732
2997.67,3103,2687,2732
2995.88,3104,2683,2733
2995.88,3106,2686,2732

fat16lib

#4
Aug 06, 2016, 02:44 am Last Edit: Aug 06, 2016, 03:54 am by fat16lib
pito,

Yes, I pointed out in the original post that Maple Mini has two SPI controllers.

I found not all STM32F103x8/B chips run with SPI1 at 36MHz.  Worse, some occasionally fail.   Check the datasheet, both SPI controllers are rated at 18 MHz max.

Yes you get high performance with 8 KB writes but SdFatEX gets very high performance with 512 byte or smaller writes.

Try 512 byte writes on Maple Mini at 36 MHz.  A buffer pool with 8 KB buffers on a Maple Mini is not practical.

With the SD I used on Maple Mini at 18 MHz, 512 byte writes were 310 KB/sec SdFat and 2,000 KB/sec SdFatEX.
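For scale, an SPI bus moves at most one bit per clock, so 18 MHz caps the payload rate at about 2,250 KB/sec. The sketch below is only that back-of-the-envelope bound: it assumes KB means 1000 bytes and ignores command overhead, CRC bytes, and card busy time. The function names are made up for illustration and are not part of SdFat.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on SPI payload throughput in KB/sec (assuming KB = 1000 bytes),
// with one bit per clock and no protocol overhead.
constexpr double spiLimitKBps(std::uint32_t clockHz) {
  return clockHz / 8.0 / 1000.0;
}

// Fraction of that bound reached by a measured transfer rate.
constexpr double busUtilization(double measuredKBps, std::uint32_t clockHz) {
  return measuredKBps / spiLimitKBps(clockHz);
}

// Checks against the rounded Maple Mini figures quoted above (18 MHz SPI).
inline void checkMapleMiniFigures() {
  assert(spiLimitKBps(18000000) == 2250.0);
  assert(busUtilization(2000.0, 18000000) > 0.88);  // SdFatEX: close to the bus limit
  assert(busUtilization(310.0, 18000000) < 0.14);   // SdFat: mostly waiting on the card
}
```

By this estimate the SdFatEX figure is within roughly 11% of what the bus can carry, while standard SdFat uses under 14% of it.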

On a Nucleo STM32F411 SdFatEX gets 3,985 KB/sec with 50 byte writes and 5,279 KB/sec with 512 byte writes.

Even SDIO doesn't help small writes with SdFat or FatFS.  You must do massive multi-block writes to take advantage of SDIO at 48 or 50 MHz.

pito

#5
Aug 06, 2016, 11:17 am Last Edit: Aug 06, 2016, 11:24 am by pito
Yes, 512-byte writes/reads on the MM with SdFat are quite slow even at 36 MHz. An 8 kB buffer is still feasible on the MM (it has 20 kB of RAM available). But the SD card's write latencies are a much worse issue that we have to cope with :)
I did a naive exercise with stm32f407 SDIO in past and the best results were with 16kB and 32kB buffers.
What is the main difference between SdFat and SdFatEX? The speedup is enormous.

fat16lib

#6
Aug 06, 2016, 03:01 pm Last Edit: Aug 06, 2016, 03:06 pm by fat16lib
Modern SD cards now have a lot of RAM buffering and even understand the FAT structure for a properly formatted card.  

There is a large RAM buffer for a single data stream to the data section of the card.  In very high end cards there are separate smaller RAM buffers for the two FAT areas and a directory block. (Note this is the programmers model - who knows how it is really done.)

If you keep the card selected and access the card properly, you get the advantage of this buffering.

I sent an email to you earlier.  For others who want to understand this:

Quote
You should read the requirements for the latest generation of SD cards.  The SD spec has had requirements for max performance for some time but you could get away with not following it.
 
Look at section 4.13 of the simplified spec.  Too bad I can't give you the full spec.  Be sure to look at "4.13.1.3 Write Performance".
 
Note in table 4-56 that an AU (Allocation Unit) can be as large as 4MB and you should plan for an RU (Recording Unit) size of 512KB.
 
Newer high-end SD cards have controllers that perform extremely well when you follow the spec, but a new $30 card may not perform as well as a five-year-old class 4 card for small transfers.
 
I suspect there is major flash wear when the $30 card is used in the current SdFat.  
 
I have been experimenting with extreme access patterns and some cards appear to fail.  I think it is because they exhaust their pool of erased flash.  They come back to life if you let them rest with power on and apply clock but don't access data.

fat16lib

My Maple Mini boards won't run reliably with SPI1 clock at 36 MHz but I found a cheap, $2.72 with free shipping, ebay STM32F103C8T6 board that will run at 36 MHz.

Here are the results for 512 byte write/read with SdFatEX using a 20 MB file.
Quote
File size 20 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3779.96,11540,132,134

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
3249.88,1312,155,156
I ran the test a number of times and the write latency is never more than 13 ms.

Here is the block number and latency in usec for cases over 1 ms.  SdFatEX uses a second cache block for FAT/directory entries on this board, so the long latencies happen in areas where this cache is written.  This involves a write to each copy of the FAT and a read of the new FAT block.  There are 128 FAT entries in a block and the cluster size is 64 blocks, so this happens at an interval of 128 x 64 = 8192 blocks.

For example, 16064 - 7872 = 8192.
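The same arithmetic as a small C++ sketch. It assumes FAT32 with 4-byte entries and 512-byte blocks (which gives the 128 entries per block mentioned above) and takes the block numbers from the over-1-ms latency list in this post. The names are illustrative only and do not come from SdFat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Geometry assumed from the post: FAT32 (4-byte entries), 512-byte blocks,
// and a cluster size of 64 blocks.
constexpr std::uint32_t kBytesPerBlock = 512;
constexpr std::uint32_t kBytesPerFatEntry = 4;
constexpr std::uint32_t kBlocksPerCluster = 64;

// Number of cluster entries held by one FAT block.
constexpr std::uint32_t fatEntriesPerBlock() {
  return kBytesPerBlock / kBytesPerFatEntry;
}

// Data blocks written before the cluster chain moves into the next FAT block.
constexpr std::uint32_t dataBlocksPerFatBlock() {
  return fatEntriesPerBlock() * kBlocksPerCluster;
}

// Returns true if every consecutive pair of block numbers is `interval` apart.
inline bool evenlySpaced(const std::uint32_t* blocks, std::size_t count,
                         std::uint32_t interval) {
  for (std::size_t i = 1; i < count; ++i) {
    if (blocks[i] - blocks[i - 1] != interval) return false;
  }
  return true;
}

// Blocks with roughly 10 ms latencies in the 36 MHz run reported in this post.
constexpr std::uint32_t kSlowBlocks[] = {7872, 16064, 24256, 32448};

// Confirms the reported slow blocks sit exactly one FAT block apart.
inline void checkFatInterval() {
  assert(fatEntriesPerBlock() == 128);
  assert(dataBlocksPerFatBlock() == 8192);
  assert(evenlySpaced(kSlowBlocks, 4, dataBlocksPerFatBlock()));
}
```

The second group of slow blocks in the list (7936, 16128, 24320, 32512) follows the same 8192-block spacing.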

Wear leveling and erasing flash can cause other long latencies.

Quote
block,usec
0,1052
7872,9400
7936,5490
16064,11540
16128,4775
24256,9819
24320,5403
32448,11380
32512,4754
32513,2246
Here are results for an 18 MHz clock on SPI1.
Quote
File size 20 MB
Buffer size 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2039.13,12739,247,249

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2041.00,1253,249,249
The over 1 ms  latency list.
Quote
block,usec
0,1152
7872,10843
7936,6104
16064,12739
16128,7068
24256,10673
24320,6113
32448,11658
32512,5251

Flinkly

#8
Aug 09, 2016, 11:27 pm Last Edit: Aug 10, 2016, 10:59 pm by Flinkly
Pretty excited for this. 

Moved to an Adafruit WICED feather to get a powerful chip (STM32F205, 120MHz, multiple hardware SPI) to enable fast datalogging (moved from 16MHz Adafruit Pro Trinket(s), my current go-to controller). 

I want to get 500+ samples a second of 8 analog inputs (12-bit ADC, so 6 KB/sec; but 4x that would be great - 24 KB/sec). I'd also like to output some data to 4 OLEDs over SPI (clock of 10 MHz maximum), but first things first.
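As a rough check of those numbers, here is a small C++ sketch of the arithmetic. It assumes the 12-bit samples are packed with no padding, and it borrows the 13 ms worst-case write latency reported earlier in the thread, which was measured on a different board and card. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Logging rate in bytes per second for packed samples (no padding bits).
constexpr std::uint32_t loggingBytesPerSec(std::uint32_t samplesPerSec,
                                           std::uint32_t channels,
                                           std::uint32_t bitsPerSample) {
  return samplesPerSec * channels * bitsPerSample / 8;
}

// Bytes that accumulate while the card is busy for `stallUsec`, rounded up.
// This is the minimum buffering needed to avoid dropping samples.
constexpr std::uint32_t bytesDuringStall(std::uint32_t bytesPerSec,
                                         std::uint32_t stallUsec) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(bytesPerSec) * stallUsec + 999999) / 1000000);
}

// Works through the figures in this post with the assumed 13 ms stall.
inline void checkLoggingBudget() {
  assert(loggingBytesPerSec(500, 8, 12) == 6000);    // 500 samples/sec target
  assert(loggingBytesPerSec(2000, 8, 12) == 24000);  // the 4x target
  assert(bytesDuringStall(24000, 13000) == 312);     // bytes to buffer per stall
}
```

If each 12-bit sample is stored in a 16-bit word instead, the base rate becomes 8000 bytes/sec. Either way, under these assumptions the buffering needed to ride out a stall is a few hundred bytes.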

My other problem is finding a decent micro sd card, but am checking a sandisk ultra in a second.  My new Samsung evo+'s were pretty average/slow for latency.

I'd move away from Arduino, but i'm such a beginner that i just can't.

thanks for the work!

fat16lib

I have posted a version of SdFat-beta with the fast SdFatEX class.

pcborges

Hi, does it also improve read performance?
Thanks
