
Topic: SD Card Performance - Open/close multiple files or read one big file

dtbingle

Working on an automotive project that reads in current gear, pedal position, and engine RPM in realtime and adjusts an active exhaust valve based on those parameters.  Each gear can be assigned a different exhaust valve calibration table.  I will have 10 tables that are [16x11] and 1 table that is [6x1].  Arduino Uno doesn't have enough internal memory for this much data, so I need to store/read these tables on an sd card.

As far as realtime reading from the SD card, which is faster to read:

1) separate files for each valve calibration table (table01.csv, table02.csv, etc)
2) have one big text file with all tables separated with labels
          table01
          ....table data here
          table02
          ....table data here
          ....

Lucario448

Depending on how you encode the data, those tables might fit in the Arduino's program memory; that is, if the values are constant.

The thing is: why do you have to rely on external storage when timing is crucial? SD cards are not as fast as the built-in program (flash) memory, and 28 KB should be plenty for what I think you're storing in those tables.

dtbingle

The space issue I'm running into is that they aren't constant.

While reading vehicle parameters and searching through the tables, the table values remain constant.  However, the Arduino interfaces with a Visual C# GUI where you can modify the table values on the PC and download them via serial, which would update these table values.

I'm not expecting crazy fast read times like <5ms, but ideally to be less than 50ms, worst case no more than 100ms.

Will have to do some SD card read time tests.  Do you think 50ms-100ms read times are possible via these tables being read from the SD card?

Lucario448

Will have to do some SD card read time tests.  Do you think 50ms-100ms read times are possible via these tables being read from the SD card?
Again, it will depend on how you encode the data. If you encode them as binary (a sequence of "raw" bytes), there should be less "processing effort", because parsing text introduces an overhead.

Fortunately, SD cards are faster at reading than at writing; so I estimate the worst latency will be around 10 ms if minimal processing is required.

If at least the table sizes are constant, then I suggest you put them all in a single file. Closing and opening files also creates another overhead. Maybe when having to update the values you'll reopen the file; but for the rest, opening it once (as read-only) is enough.

dtbingle

It seems like I'll be able to achieve the read times I'm after from what you're saying.  I attached a test file of the data that will be stored in these 10 tables just to clarify.

So a separate table will map a vehicle gear to a vlvXX_dgv table.  Let's say this table mapped 3rd gear to vlv01_dgv table.  The arduino would then search through vlv01_dgv table to find the closest engine RPM (Y axis) and pedal % (X axis).  So if engine RPM was 1600 and pedal was at 30%, the arduino would search through vlv01_dgv to find the 0.50 value @ [1600, 30].  Now if the car shifts to a different gear, pedal changes, or engine RPM changes, it will have to continually jump between these vlv01-vlv10 tables finding the closest engine RPM/pedal value.

So if I'm reading from the SD card and the tables have a static size, I would jump around using myFile.seek(location).  For example, if vlv01_dgv started @ byte 5, vlv02_dgv @ byte 200, vlv03_dgv @ byte 395, etc, I could jump between the start of each table using myFile.seek(5), .seek(200), and so on?

EDIT:  Doing a quick test - opening a file, closing it, and reopening it only takes 3-4 ms on average.  Running 1000 iterations of open/close looks like min open/close time is <1 ms and max is 10ms.  Maybe separate files are a viable option.

Lucario448

So if I'm reading from the SD card and the tables have a static size, I would jump around using myFile.seek(location).  For example, if vlv01_dgv started @ byte 5, vlv02_dgv @ byte 200, vlv03_dgv @ byte 395, etc, I could jump between the start of each table using myFile.seek(5), .seek(200), and so on?
Encoded as binary, that's how it works.

Furthermore, looking at the text file you attached, I noticed all tables follow the exact same layout and data meanings; thus you can remove the first row and column because those are there just for reference. If you already know this layout, storing the references is redundant.



Since those tables are 10x15, this means there are 150 values per table and 10 per row (15 rows per table, of course). Knowing this, it is possible to encode tables (aka two-dimensional arrays) into a single sequence of values (aka a unidimensional array, or basically how files work).

The way you can access a specific cell in a unidimensional array (file) treated as a table (two-dimensional array or matrix) is by converting the coordinates into an index (because seek() only accepts one value) with the following formula:

idx = (tn * tw * th + (y * tw + x)) * vs

Where:
  • idx: the resulting index to access a specific cell.
  • tn: table number (zero-relative). Only if the file has multiple tables; otherwise assume it as zero.
  • tw: table's width (column count); it's also the row's size. In your case, it's 10.
  • th: table's height (row count). In your case, it's 15.
  • y: Y coordinate (row number). It's zero-relative, so this value is only valid between 0 and th - 1.
  • x: X coordinate (column number). It's zero-relative, so this value is only valid between 0 and tw - 1.
  • vs: value's size (in bytes). This depends on what type of datum you want to store; more information in the next paragraph.
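
Sketched in plain C++, the formula looks like this (the function name tableCellOffset is just illustrative, not part of any library):

```cpp
#include <cstdint>

// Byte offset of cell (x, y) of table tn inside a file that stores
// raw, headerless tables back to back.
// tw/th: table width/height in cells, vs: bytes per value.
uint32_t tableCellOffset(uint32_t tn, uint32_t tw, uint32_t th,
                         uint32_t y, uint32_t x, uint32_t vs) {
    return (tn * tw * th + (y * tw + x)) * vs;
}
```

The resulting byte offset can be fed straight to seek(); for example, with 10x15 tables of 4-byte values, cell (x = 3, y = 2) of the second table (tn = 1) lands at byte (150 + 23) * 4 = 692.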

I also noticed those values have a decimal point, so are you trying to store decimals, or are those just integers (whole numbers)? If it's the latter, what's the range of values you expect to have?
This is what determines vs in the previous formula (and the valid file size).



Maybe separate files are a viable option.
For your particular application it probably is; but as I expected, it's still an overhead. Changing the file's "cursor" position takes between almost nothing and 2 ms (when the file is fragmented and you seek outside of the currently cached block).

dtbingle

Thanks for taking the time to write that out.  Makes a lot more sense now on how to manage the tables by encoding them into a single array and using that equation for index locations.

To answer a few of the questions you had for me:

1) Although the first row/column in my example are the same between each table, they will be different table-to-table during actual use.  So this brings it back to 16x11.

2) About table data types and expected numbers.  The upper-left cell containing "Y / X" is not important and can be replaced with any filler number matching the data type of the first row to make processing easier.  The first ROW is integer values only, restricted to 0 to 100.  The first COLUMN is integer values only, with an expected range around 500 - 10000.  The main table portion (the 15x10 section) is double values only, restricted to 0.00 to 1.00.

I think opening/closing separate files might be a quicker short term solution, but encoding would yield better performance.

I guess where I'm confused now is how would you handle encoding the table from a two-dimensional array that includes both integers and decimals given that the indexing assumes a fixed vs?

As additional info, the test.txt file attached is an output from my visual C# interface on the computer.  When the data is being sent to the arduino and writing to a text file, it is very easy to modify the output in whatever way is easiest to read after being encoded.  For example, crunching the table into a single dimension could be changed to look like:

[first header row of integers] [first header col of integers] [decimal data row 1] [decimal data row 2] ....
[integer] [integer] [decimal] [decimal] ....

instead of going strictly line-by-line in the text file which would result in

[first header row of integer] [header col integer row 1] [decimal data row 1] [header col integer row 2] [decimal data row 2]...
[integer] [integer] [decimal] [integer] [decimal] ....

Lucario448

1) Although the first row/column in my example are the same between each table, they will be different table-to-table during actual use.  So this brings it back to 16x11.
Values might be different, but layout and dimensions are the same. All have the same percentage scale and the same RPM scale, so what's different?


2) About table data types and expected numbers.  The upperleft cell containing "Y / X" is not important and can be replaced with any filler number to match the data of the first row to make processing easier.  The first ROW is integer values only, restricted to 0 to 100.  The first COLUMN is integer values only, with expected range around 500 - 10000.  The main table portion (the 15x10 section) is double values only, restricted from 0.00 to 1.00.
But are those scales different for each table? If they are different but remain unchanged for a specific table, you can store them in the program memory as a kind of "lookup table".
If even that can vary, then you may store such "meta-data" in the file as a header. The actual table will still be 10x15; however, it will be placed after a header that contains the percentages (aka your first row) and the RPMs (aka your first column).
Since a percentage is never higher than 255, you can store each one as a single byte; and since an RPM is never higher than 65535, you can store each one as two bytes (a word, or AVR's int).

Furthermore, the floating-point implementation on AVR is limited to 32-bit precision (and is still slower than dealing with a long, since there's no FPU and it's implemented in software); so all floating-point values are bound to be encoded as 4 bytes.

If you will combine all tables into a single file, you must also merge the headers if those are required.
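
As a minimal sketch of this encoding idea in plain C++ (putU16 is a hypothetical helper name), a 16-bit RPM can be split into two bytes with shifts:

```cpp
#include <cstdint>

// Write a 16-bit RPM value into a 2-byte buffer, high byte first
// (matching the hex layout shown later in this thread).
void putU16(uint8_t *buf, uint16_t v) {
    buf[0] = (uint8_t)(v >> 8);   // high byte
    buf[1] = (uint8_t)(v & 0xFF); // low byte
}
```

The byte order itself is arbitrary; what matters is that the PC side and the Arduino side agree on one and use it consistently.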


I think opening/closing separate files might be a quicker short term solution, but encoding would yield better performance.
Opening a file requires around 1024 or more bytes' worth of SPI transactions, because the library has to query the FAT to find the required entry that holds the information about the file, especially the "pointer" to its first cluster (physical location), the size, and the chain of occupied clusters if it's fragmented. The name is needed since that's the criterion (search key) used to find it in the FAT.

Subdirectories mean more juggling for the library. In this filesystem, folders are actually special files that contain entries similar to the FAT itself; those entries represent the files/folders they contain.
The "deeper" a file is, the longer it will take to reach, since it's equivalent to opening as many intermediate files as there are subfolders in the path.



I guess where I'm confused now is how would you handle encoding the table from a two-dimensional array that includes both integers and decimals given that the indexing assumes a fixed vs?
You aren't supposed to mix types in the "body" of the file. You can store this meta-data either as a lookup table in the program memory, or as a header in the file itself. Knowing the amounts and types, this meta-data section should have a fixed length as well.



As additional info, the test.txt file attached is an output from my visual C# interface on the computer.  When the data is being sent to the arduino and writing to a text file, it is very easy to modify the output in whatever way is easiest to read after being encoded.  For example, crunching the table into a single dimension could be changed to look like:

[first header row of integers] [first header col of integers] [decimal data row 1] [decimal data row 2] ....
[integer] [integer] [decimal] [decimal] ....

instead of going strictly line-by-line in the text file which would result in

[first header row of integer] [header col integer row 1] [decimal data row 1] [header col integer row 2] [decimal data row 2]...
[integer] [integer] [decimal] [integer] [decimal] ....
Yeah, but the idea isn't to write text; it is to write binary values instead. Why? The conversion from text to a variable at runtime is yet another overhead; on the other hand, reading raw bytes is straightforward since it's just a matter of copying them to the variable's location (in RAM).


Your programming language should implement a base (super)class for output streams that lets you write at the byte level. Whatever you're currently using should have a method or function that writes a single byte to the corresponding stream. From there you can encode tables and headers (if necessary) as a unidimensional array of "raw" data.
If you use the text files as a simple way to change parameters, then you could create another program that converts this file into the mentioned binary ("raw") tables.



PS: I made a little mistake when referring to the FAT.
The FAT (File Allocation Table) does what its name says: keeps track of free space. What the library actually queries first when searching for a file is the "root directory table". A file's information (not its content) is stored either there or inside the data of a subdirectory.

dtbingle

In my example test.txt file, the first row (pedal %) and first column (engine RPM) of every table have the same scales - [0, 10, 20, ... , 100] and [650, 850, 1000, ... , 5500].  However, each table CAN have different pedal % and engine RPM scales.  Yes, the table dimensions remain the same....16x11 overall, 15x10 if you're excluding the header row/column.

All of the table values (including the header row/column) need to be able to be modified during runtime, which I think eliminates the possibility of using PROGMEM.  While the program is running untouched, it's a read-only operation on these tables.  The user has a Visual C# interface, like the attached picture, where they can send a command to pause Arduino operation, write new table values to the controller, then resume its read-only running state.  I'm thinking all of these tables have to be stored on and read from the SD card to allow for changes at runtime, as well as to allow changes made via the PC interface to be kept after disconnecting from the PC and running independently.

With the overhead you described for managing multiple files, the best solution would be using a single file that stores the "meta-data" in the file as a header.  What I'm having trouble wrapping my head around is how the headers are laid out in the text/binary file.

To recap, pedal % (header row) can be encoded as 1 byte, engine RPM (header column) can be encoded as 2 bytes (an int), and the main table body is floating point values encoded as 4 bytes.  For fastest runtime performance, the values would be stored and read from the file as raw bytes.  So for example, I have this array (shortened array for clarity):

[0, 10, 20]

These values would be encoded in the file with 1 byte each: 0 = 00, 10 = 0A, 20 = 14.  In other words, the file would contain:

00 0A 14

Then for RPMs [500, 2000, 5000] as 2 bytes each -> 01 F4  07 D0  13 88
Then the main table body data as floating point is same deal, but 4 bytes each.

So say this small example table body is 3x3 with a header row and header column for 4x4 overall.  It would be encoded to look as follows:
Code: [Select]
As "text"
[X       0      10     20]
[500     0.1    0.1    0.1]
[2000    0.2    0.2    0.2]
[5000    0.3    0.3    0.3]

Byte representation in table format
[X 00 0A 14      ]
[01F4 3dcccccd 3dcccccd 3dcccccd]
[07D0 3e4ccccd 3e4ccccd 3e4ccccd]
[1388 3e99999a 3e99999a 3e99999a]

Encoded in a file
[pedal %] [rpm] [data row 1] [data row 2] [data row 3]
00 0A 14 01 F4 07 D0 13 88 3D CC CC CD 3D CC CC CD 3D CC CC CD 3E 4C CC CD 3E 4C CC CD 3E 4C CC CD 3E 99 99 9A 3E 99 99 9A 3E 99 99 9A



From here, we know the pedal % is a fixed 3 bytes, rpm is 6 bytes, each data row is 12 bytes, for a total of 3+6+12+12+12 = 45 bytes for these miniature tables.  Should be pretty easy to create an indexing equation to jump between table "headers" and data positions.
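
As a sketch, that 45-byte layout could be produced like this in plain C++ (encodeMiniTable is a made-up name; on the Arduino side, the same bytes would just be written out with the SD library's write calls):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Serialize the miniature example table: 3 percentages (1 byte each),
// 3 RPMs (2 bytes each), then 9 floats (4 bytes each) = 45 bytes.
// Multi-byte values are emitted high byte first, matching the hex
// dump above; any consistent byte order would work.
std::vector<uint8_t> encodeMiniTable(const uint8_t pct[3],
                                     const uint16_t rpm[3],
                                     const float data[9]) {
    std::vector<uint8_t> out;
    for (int i = 0; i < 3; i++) out.push_back(pct[i]);
    for (int i = 0; i < 3; i++) {
        out.push_back((uint8_t)(rpm[i] >> 8));
        out.push_back((uint8_t)(rpm[i] & 0xFF));
    }
    for (int i = 0; i < 9; i++) {
        uint32_t bits;
        std::memcpy(&bits, &data[i], 4); // reinterpret float as 32 bits
        for (int s = 24; s >= 0; s -= 8)
            out.push_back((uint8_t)(bits >> s));
    }
    return out;
}
```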

Am I on the right track following your guidance?

Lucario448

All of the table values (including the header row/column), need to be able to be modified during runtime, which I think eliminates the possibility of using PROGMEM.
So file headers are the only option.


The user has a visual C# interface, like the attached picture, where it can send a command to pause arduino operation, write new table values to the controller, then resume its read-only running state.  I'm thinking all of these tables have to be stored and read from the sd card to allow for changes @ runtime, as well as allowing changes made via the pc interface to be kept after disconnecting from the pc and running independently.
Looks like you'll have to use or create routines that encode and decode the binary data in the file.
If the language doesn't implement a way to "de-serialize" (reconstruct from binary) variables, or support direct pointers, you may have to at least retrieve the data as an array of bytes, and then use those to reconstruct the variables with bitwise operations (mostly ORs and left shifts). Something similar happens when encoding (writing).
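
A sketch of that reconstruction in plain C++ (the helper names are illustrative), assuming the high-byte-first order used in the examples above:

```cpp
#include <cstdint>
#include <cstring>

// Rebuild a 16-bit RPM from two raw bytes (high byte first)
// using a shift and an OR.
uint16_t decodeU16(const uint8_t *buf) {
    return (uint16_t)((buf[0] << 8) | buf[1]);
}

// Rebuild a float from four raw bytes (high byte first) by
// reassembling the 32 bits and copying them into a float.
float decodeF32(const uint8_t *buf) {
    uint32_t bits = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
                  | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    float f;
    std::memcpy(&f, &bits, 4); // safe type-punning via memcpy
    return f;
}
```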


With the overhead you described for managing multiple files, the best solution would be using a single file that stores the "meta-data" in the file as a header.  What I'm having trouble wrapping my head around is how the headers are laid out in the text/binary file.
Simple: 10 1-byte values + 15 2-byte values = 40 bytes per header. The first 10 bytes correspond to the percentages, and the remaining 30 to the RPMs (30 / 2 = 15).

Accessing a particular datum in the header is a bit tricky due to the size differences; nevertheless you can get around this by using one of two formulas, depending on whether it's a percentage or an RPM:

Percentage:
idx = tn * 40 + x

Where:
  • idx: the resulting index to access a specific percentage.
  • tn: table number (zero-relative). Only if the file has multiple tables; otherwise assume it as zero.
  • x: datum position. It's zero-relative, so this value is only valid between 0 and 9.



RPM:
idx = ((tn * 40) + 10 + y) * 2

Where:
  • idx: the resulting index to access a specific RPM.
  • tn: table number (zero-relative). Only if the file has multiple tables; otherwise assume it as zero.
  • y: datum position. It's zero-relative, so this value is only valid between 0 and 14.

Here we assume all headers are contiguous and ordered by their corresponding table. After this starts the actual data of the tables, whose layout we already discussed.
Since now we know the datum's size, and the header(s) is (are) in our way, the previous formula has to be rearranged like this:

idx = tc * 40 + (tn * tw * th + (y * tw + x)) * 4
Where:
  • tc: table count (amount). Shouldn't be zero because otherwise it's pointless to say this file stores zero tables.

Adding this new variable you should realize that tn has to be between 0 and tc - 1.
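
Putting the three formulas together as plain C++ helpers (hypothetical names; note that the RPM helper uses the corrected formula from the follow-up post at the end of the thread):

```cpp
#include <cstdint>

// Offsets into a single file laid out as: tc contiguous 40-byte headers
// (10 one-byte percentages + 15 two-byte RPMs each), followed by the
// tc table bodies (10x15 four-byte floats each).

// Offset of percentage x (0..9) in table tn's header.
uint32_t pctOffset(uint32_t tn, uint32_t x) {
    return tn * 40 + x;
}

// Offset of RPM y (0..14) in table tn's header.
uint32_t rpmOffset(uint32_t tn, uint32_t y) {
    return tn * 40 + 10 + y * 2;
}

// Offset of body cell (x, y) of table tn, past all tc headers.
uint32_t cellOffset(uint32_t tc, uint32_t tn, uint32_t y, uint32_t x) {
    return tc * 40 + (tn * 10 * 15 + (y * 10 + x)) * 4;
}
```

For example, with all 10 tables in one file (tc = 10), cell (x = 3, y = 2) of the second table (tn = 1) sits at byte 400 + 692 = 1092.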



To recap, pedal % (header row) can be encoded as 1 byte, engine RPM (header column) can be encoded as 2 bytes (an int), and the main table body is floating point values encoded as 4 bytes.  For fastest runtime performance, the values would be stored and read from the file as raw bytes.  So for example, I have this array (shortened array for clarity):

[0, 10, 20]

These values would be encoded to a text file with 1 byte each.  0 = 00, 10 = 0A, 20 = 14.  In other words, the file would contain:

00 0A 14

Then for RPMs [500, 2000, 5000] as 2 bytes each -> 01 F4  07 D0  13 88
Up to here, we are talking about the header.




Then the main table body data as floating point is same deal, but 4 bytes each.

So say this small example table body is 3x3 with a header row and header column for 4x4 overall.  It would be encoded to look as follows:
Code: [Select]
As "text"
[X       0      10     20]
[500     0.1    0.1    0.1]
[2000    0.2    0.2    0.2]
[5000    0.3    0.3    0.3]

Byte representation in table format
[X 00 0A 14      ]
[01F4 3dcccccd 3dcccccd 3dcccccd]
[07D0 3e4ccccd 3e4ccccd 3e4ccccd]
[1388 3e99999a 3e99999a 3e99999a]
Correct, but remember that the header already puts the percentages and RPMs together in a single "chunk", right before the actual table. Combining tables also requires combining the headers, by simply placing them contiguously.
I don't mean interleaving headers with tables; I mean putting headers with headers and tables with tables, forming two larger chunks: the meta-data and the data.


Code: [Select]
Encoded in a file
[pedal %] [rpm] [data row 1] [data row 2] [data row 3]
00 0A 14 01 F4 07 D0 13 88 3D CC CC CD 3D CC CC CD 3D CC CC CD 3E 4C CC CD 3E 4C CC CD 3E 4C CC CD 3E 99 99 9A 3E 99 99 9A 3E 99 99 9A
Just like that. The [pedal %] [rpm] section is the header, while the rest is the table.


Should be pretty easy to create an indexing equation to jump between table "headers" and data positions.
Ahem, look above; I already did that


Am I on the right track following your guidance?
Yeah. Keep in mind your example was just at a smaller scale.
For the real deal, each header takes 40 bytes, while each table body takes 600 bytes. Thus, the file size validation will be checking that the size is a multiple of 640.
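
That check is a one-liner; as a sketch in plain C++ (isValidTableFile is a made-up name):

```cpp
#include <cstdint>

// A file holding whole tables should be a non-zero multiple of
// 640 bytes: 40 bytes of header + 10*15*4 = 600 bytes of body each.
bool isValidTableFile(uint32_t fileSize) {
    return fileSize > 0 && fileSize % 640 == 0;
}
```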

dtbingle

Ahhh I see why all the table metadata at the top makes more sense compared to the way I have it laid out.

Thanks for all of the help on this.  I think you've given me enough information to finish this project.  Just need to do a little researching on encoding/decoding practices for binary files, but otherwise should be good to go.

Will have to add an asterisk to that statement though.......good chances I'll be posting back after being stuck again haha.

Lucario448

Update: I've made another mistake, this time with the RPM formula. The following is the correct one:
idx = (tn * 40) + 10 + (y * 2)

If you apply the previous one, the resulting index will go far beyond where it's supposed to.
