SD card as a lookup table

Hi you all,

I've been struggling with this for over a week now. I thought I had found the solution, but unfortunately I closed the browser window and haven't been able to find it since!

I'm trying to use a micro SD card as a sort of lookup table, comparing live GPS coordinates with the coordinates on the SD card. If the coordinates match, it should return the air quality values stored with those coordinates on the SD card.

So far I've tried to use the SdFat library: readCSV, readCsvArray and Stream. I've also tried this example: https://www.bethedev.com/2017/01/reading-csv-files-from-sd-card-with.html
However, I haven't really been able to make it work... My attempts have been overwritten, so unfortunately I can't really show any code...

I'm using an Arduino Uno and the Adafruit SD card breakout. It would be great if someone could point me in the right direction. I've got the GPS integration down; it is just the SD card issue...

My dataset, btw, is 2000+ lines of data with Latitude, Longitude, PM2.5 and PM10 columns.

shtrier:
My dataset, btw, is 2000+ lines of data with Latitude, Longitude, PM2.5 and PM10 columns.

Wow that's a lot to work on, I hope processing time doesn't matter because that will take a good while in the worst case (probably almost a minute if it has to reach the end). Binary-encoded data would be way faster, but since you already have the lookup table as plain text, the conversion is just some extra work that the Arduino itself can do if you want.

Anyway, if you want to extract the data of a CSV-formatted line of text, the procedure is the following:

Read the whole line, making sure the destination array is big enough to hold even the longest line:

line[file.readBytesUntil('\r', line, sizeof(line) - 1)] = 0;

// We do it this way because readBytesUntil() does not add the string terminator itself; it only returns the number of bytes read

file.read(); // Discarding the remaining '\n' character

Tokenize (split) the line to obtain its individual parts:

char* part = strtok(line, ","); // Assuming the separator is a comma

Parse or copy those parts according to your needs:

strcpy(anotherString, part); // If you only need to retrieve a substring

int ni = atoi(part); // If you need to parse (convert) it to an integer
long nl = atol(part); // Same, but for a long integer

double dn = atof(part); // If you need to parse (convert) it to a decimal number, with the period (.) as the decimal separator

Retrieve the next part and check if it exists (not reached the end of the line):

part = strtok(NULL, ","); // This is how you get the second part and beyond; note that the first parameter is NULL, unlike in the first call.

if (part) { // The same applies to a while and a for
  // There's still something
} else {
  // There's no more data and the end of the line has been reached
}
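Putting the tokenize and parse steps together, a minimal sketch could look like the following. It assumes the four columns from the first post (latitude, longitude, PM2.5, PM10) in that order; the names AirRecord and parseRecord are made up for illustration, and the line is assumed to be already read and terminated as shown above.

```cpp
#include <stdlib.h>  // atof
#include <string.h>  // strtok

// Hypothetical record layout, following the column order from the first post.
struct AirRecord {
  float lat;
  float lon;
  float pm25;
  float pm10;
};

// Splits one CSV line in place (strtok modifies the buffer) and fills `out`.
// Returns false if the line has fewer than four comma-separated fields.
bool parseRecord(char* line, AirRecord& out) {
  float* fields[4] = { &out.lat, &out.lon, &out.pm25, &out.pm10 };
  char* part = strtok(line, ",");
  for (int i = 0; i < 4; i++) {
    if (!part) return false;         // the line ended early
    *fields[i] = (float)atof(part);  // period (.) as the decimal separator
    part = strtok(NULL, ",");        // NULL continues on the same line
  }
  return true;
}
```

Keep in mind that strtok treats consecutive commas as a single separator, so an empty column would shift the remaining fields.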

Not sure exactly what that tutorial taught you, but this is more or less the way.

Thanks for your answer, Lucario! I'll give it a try later today; unfortunately I was unable to try it out yesterday. I really appreciate your help. You hit the nail on the head with the decimal numbers (atof); that has been a main issue in my previous tries :) I do have some more questions, if you have time:

Lucario448:
Wow that's a lot to work on, I hope processing time doesn't matter because that will take a good while in the worst case (probably almost a minute if it has to reach the end).

From a pure functionality point of view, it would not have to reach the end every time. It would have to match latitude and longitude against the CSV file; if it finds a match, it should return the air quality data from that row of the CSV. The rows (or substrings, I guess) are sorted by {latitude, longitude, PM25, PM10}. Is there a way to do this matching on the fly, so to speak? I guess a for-loop could check each line of the CSV file?

Lucario448:
Binary-encoded data would be way faster, but since you already have the lookup table as plain text, the conversion is just some extra work that the Arduino itself can do if you want.

I don't really know the specifics of binary code; is it a lot to convert? 30 seconds of processing time wouldn't matter much; a minute, however, would probably be at the border of what is desirable.

shtrier:
From a pure functionality point of view, it would not have to reach the end every time.

Of course, I already acknowledged that when I said:

Lucario448:
I hope processing time doesn't matter because that will take a good while in the worst case

That's the sequential search algorithm, which gets linearly slower the further it has to travel. Binary search is far more efficient (assuming random access performs well), but the problem is that the data has to be indexed and sorted by a given criterion.

shtrier:
The rows (or substrings, I guess) are sorted by {latitude, longitude, PM25, PM10}.

As long as the data order remains consistent throughout the whole list, errors should never occur.

shtrier:
Is there a way to do this matching on the fly, so to speak? I guess a for-loop could check each line of the CSV file?

Why not? After the part where you parse the numbers, you simply compare variables.
Be aware, though, that comparing two floating-point variables for equality can be somewhat inaccurate, due to precision limitations and the rounding done by the parser function (atof()). In the case of coordinates, this means a match covers a small "area" (an error margin) rather than a single exact point.
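One common way to deal with this is to make the margin explicit and compare within a tolerance rather than with ==. A minimal sketch; the function name coordsMatch and the tol parameter are assumptions for illustration, not something from a library:

```cpp
#include <math.h>  // fabs

// Returns true when both coordinates are within `tol` degrees of the target.
// `tol` is an assumed parameter; choose it to suit the spacing of your data points.
bool coordsMatch(float lat, float lon, float targetLat, float targetLon, float tol) {
  return fabs(lat - targetLat) <= tol && fabs(lon - targetLon) <= tol;
}
```

For example, coordsMatch(gpsLat, gpsLon, rowLat, rowLon, 0.001f) would accept anything within about 0.001 degrees, which is roughly 110 m in latitude.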

shtrier:
I don't really know the specifics of binary code; is it a lot to convert?

"As binary" means "as stored in RAM" (as a bunch of bytes). For example, a float takes up the same space as a long (both are 32-bit, 4-byte variables on the Uno), but they differ in how the value they hold is encoded.

I say binary is faster because copying bytes is quicker than parsing text. If you save a variable "as it is" somewhere else, then loading it back is straightforward, because you saved it "as it was".
It's as if the machine were reading in its native language (binary), instead of having to translate (text to binary and vice versa) beforehand.

The only disadvantage of this is that it is illegible to humans; you cannot easily change the values with a text editor, or at least not without causing a big mess for the target system.
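To make "as stored in RAM" concrete, here is a minimal sketch that copies a record to and from a plain byte buffer. The AirRecord layout and the function names are assumptions for illustration; on the Arduino the buffer would be written to and read from the file rather than kept in memory.

```cpp
#include <stdint.h>
#include <string.h>  // memcpy

// Hypothetical record layout: four 4-byte floats.
struct AirRecord { float lat, lon, pm25, pm10; };

const size_t RECORD_SIZE = sizeof(AirRecord);  // 16 bytes when float is 4 bytes

// Copies the record's bytes exactly as they sit in RAM; no text conversion.
void encodeRecord(const AirRecord& rec, uint8_t* buf) {
  memcpy(buf, &rec, RECORD_SIZE);
}

// Copies the bytes straight back into a record; no parsing needed.
void decodeRecord(const uint8_t* buf, AirRecord& rec) {
  memcpy(&rec, buf, RECORD_SIZE);
}
```

Because every record has the same size, record number n starts at byte n * RECORD_SIZE in the file.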

shtrier:
30 seconds of processing time wouldn't matter much; a minute, however, would probably be at the border of what is desirable.

Again, mainly three things slow down the process: text tokenizing and parsing, filesystem overhead, and the search algorithm.

The first two can be addressed by binary encoding, since copying bytes is straightforward and the file is usually smaller than its CSV counterpart.
The third one can only be addressed if you go to the trouble of sorting all the records; indexing comes for free with binary encoding, since all records have a fixed length.
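As a rough sketch of what sorted, fixed-length records buy you, here is a binary search over a block of bytes standing in for the file. It assumes the records are sorted by latitude and then longitude; AirRecord, readRecordAt, compareKey and findRecord are hypothetical names, and on the Arduino readRecordAt would be a seek to index * sizeof(AirRecord) followed by a read.

```cpp
#include <stdint.h>
#include <string.h>  // memcpy

struct AirRecord { float lat, lon, pm25, pm10; };  // hypothetical layout

// Stand-in for "seek to the record's offset and read it".
void readRecordAt(const uint8_t* data, long index, AirRecord& out) {
  memcpy(&out, data + index * sizeof(AirRecord), sizeof(AirRecord));
}

// Orders keys by latitude first, then longitude.
int compareKey(float lat, float lon, const AirRecord& r) {
  if (lat < r.lat) return -1;
  if (lat > r.lat) return 1;
  if (lon < r.lon) return -1;
  if (lon > r.lon) return 1;
  return 0;
}

// Returns the index of the matching record (filling `out`), or -1 if absent.
long findRecord(const uint8_t* data, long count, float lat, float lon, AirRecord& out) {
  long lo = 0, hi = count - 1;
  while (lo <= hi) {
    long mid = lo + (hi - lo) / 2;
    readRecordAt(data, mid, out);
    int c = compareKey(lat, lon, out);
    if (c == 0) return mid;
    if (c < 0) hi = mid - 1; else lo = mid + 1;
  }
  return -1;
}
```

With about 2000 records this needs at most around 11 reads instead of up to 2000. Note that it compares keys exactly; for live GPS readings both sides would probably need to be rounded the same way first.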

A database on an UNO with 2 KB of RAM does not make sense IMHO.

If you have WiFi access, the database could be placed on a server. Even if not,
an ESP8266 or ESP32 is a cheap device with a lot more memory, in case the database has to be local.

Another method for quick lookup is hashing.
As you have a giant SD card available, you can create various hash files, either on another computer
or on the Arduino itself (it will be slow, but it will work too).

Example:
Your coordinate file is pollution.csv.
You want to find all points "within 1 km".
You create a pollution1km.hash file containing, for each row,
a hash calculated by truncating lat/lon to 1 km precision and then applying any
hashing function to the result, be it a CRC or even a simple lat XOR lon.
Do not worry about false positives; you will filter them out later anyway.

The shorter the hash, the faster your routine will be. Ideally it fits into 8 bits.
If that yields too many false positives, you can remedy it in several ways, such as a binary hash
(splitting the database into two parts and hashing each part separately).

The hash file can contain only hashes, as their length is fixed. That means the position of a hash (its index) points you to the right CSV row.

The Arduino simply repeats the process: take the GPS coords, hash them the same way, and then search through the hash table.
That gives it the rows of the candidates. Copy those rows to
candidates.csv
Then you perform an ordinary search over this file. It will be far shorter than the original database, so the search will be
quick :)
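A minimal sketch of the idea, assuming a grid of 0.01 degrees (roughly 1 km in latitude; using the same step for longitude is a simplification) and the simple XOR hash mentioned above. The names toCell, cellHash and findCandidates are made up for illustration, and the hash table is shown as an in-memory array rather than a file.

```cpp
#include <stdint.h>
#include <math.h>  // floor

// Truncates a coordinate to a grid cell that is `cellDeg` degrees wide.
int32_t toCell(float deg, float cellDeg) {
  return (int32_t)floor(deg / cellDeg);
}

// 8-bit hash: XOR of the two cell numbers, folded down to one byte.
uint8_t cellHash(float lat, float lon, float cellDeg) {
  uint32_t x = (uint32_t)toCell(lat, cellDeg) ^ (uint32_t)toCell(lon, cellDeg);
  x ^= x >> 16;
  x ^= x >> 8;
  return (uint8_t)x;
}

// Scans the hash table (one byte per CSV row) and stores the row numbers
// whose hash equals `wanted`. Returns how many candidates were stored.
int findCandidates(const uint8_t* hashes, int count, uint8_t wanted,
                   int* rows, int maxRows) {
  int found = 0;
  for (int i = 0; i < count && found < maxRows; i++) {
    if (hashes[i] == wanted) rows[found++] = i;
  }
  return found;
}
```

The candidates still have to be checked against the real coordinates, since different cells can share a hash. A point close to a cell border may also have its nearest neighbours in the adjacent cell, so checking the neighbouring cells as well might be needed.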

Hashing gives you more options: you can create more hash files,
e.g. pollution20km.hash, pollution_peakPM10_1km.hash, etc. The sky is the limit.

The downside of this method is that you need to recreate the hash file each time you change the order of the data in the original file. But hey, appending more data is still possible, and/or using more files, like pollution_2019.csv > pollution_2019_1k.hash

Of course FAT has some limits on file names, but that's just a minor obstacle for such a small project.
And for bigger ones, the file names can be hashes, expanded in filenames.csv :)

Good luck :)
