shtrier:
from a pure functionality point of view it would not have to reach the end every time
Of course, I already acknowledged that when I said:
Lucario448:
I hope processing time doesn't matter because that will take a good while in the worst case
That's the sequential search algorithm, which gets linearly slower the further it has to travel through the data. The most efficient alternative is binary search (assuming good random-access performance), but the catch is that the data has to be indexed and sorted by the lookup criterion.
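Just to picture the difference, here is a minimal sketch of binary search over data already sorted by the lookup key (a handful of made-up latitudes); every step discards half of the remaining records instead of walking through all of them:

#include <stdio.h>

// Returns the index of target in a sorted array, or -1 if it isn't there.
int binarySearch(const float *sortedKeys, int count, float target) {
  int low = 0, high = count - 1;
  while (low <= high) {
    int mid = (low + high) / 2;              // probe the middle record
    if (sortedKeys[mid] < target)      low  = mid + 1;   // discard lower half
    else if (sortedKeys[mid] > target) high = mid - 1;   // discard upper half
    else return mid;                         // hit (exact match kept simple here)
  }
  return -1;                                 // not found
}

int main() {
  float latitudes[] = {55.55f, 55.60f, 55.66f, 55.70f, 55.75f};  // made-up, sorted
  printf("found at index %d\n", binarySearch(latitudes, 5, 55.66f));  // prints 2
  return 0;
}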
shtrier:
The rows, or substrings I guess, are sorted by {latitude, longitude, PM25, PM10}.
As long as that field order remains consistent throughout the whole list, errors should never occur.
shtrier:
Is there a way to make this matching process on the run, so to speak? I guess a for-loop could check each line of the CSV file??
Why not? After the part where you parse the numbers, you simply compare the variables.
Although be aware that comparing two floating-point variables for exact equality can be unreliable, due to precision limitations and the rounding method used by the parser function (aka atof()). In the case of coordinates, this means you end up matching within an error margin (a small "area" that always causes a match) rather than at a single exact point.
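Something along these lines, just as a rough sketch (the field order, the sample values and the 0.0005-degree tolerance are made up for illustration; adjust them to your data):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

int main() {
  char line[] = "55.6761,12.5683,14.2,22.9";   // latitude,longitude,PM25,PM10
  const float wantLat = 55.6760f, wantLon = 12.5684f;
  const float EPS = 0.0005f;                   // error margin (the small "area")

  // tokenize the CSV line and parse each field
  float lat  = atof(strtok(line, ","));
  float lon  = atof(strtok(NULL, ","));
  float pm25 = atof(strtok(NULL, ","));
  float pm10 = atof(strtok(NULL, ","));

  // match within the tolerance instead of testing exact equality
  if (fabsf(lat - wantLat) < EPS && fabsf(lon - wantLon) < EPS) {
    printf("match: PM2.5 = %.1f, PM10 = %.1f\n", pm25, pm10);
  }
  return 0;
}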
shtrier:
I don't really know about the specifics of binary code, is it a lot to convert??
"As binary" means "as stored in RAM" (as a bunch of bytes). For example, a float is actually a long (both are 32-bit or 4-byte variables), but they differ on how the holding value is encoded.
I say binary is faster because copying bytes is quicker than parsing text. If you save a variable "as it is", then loading it back is straightforward, because it comes back exactly "as it was".
It's as if the machine were reading in its native language (binary), instead of having to translate (text to binary and vice versa) beforehand.
The only disadvantage is illegibility for humans: you cannot easily change the values with a text editor, or at least not without making a big mess for the target system.
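For example, saving and loading a record "as it is" could look roughly like this with standard C file functions on a PC (on the Arduino side the SD library's write() and read() would do the same job; the struct layout is just a guess based on your fields):

#include <stdio.h>

struct Record {               // fixed length: sizeof(struct Record) bytes per row
  float latitude;
  float longitude;
  float pm25;
  float pm10;
};

int main() {
  struct Record out = {55.6761f, 12.5683f, 14.2f, 22.9f}, in;

  FILE *f = fopen("records.bin", "wb");
  fwrite(&out, sizeof out, 1, f);     // copy the bytes straight to disk
  fclose(f);

  f = fopen("records.bin", "rb");
  fread(&in, sizeof in, 1, f);        // copy them straight back, no atof() needed
  fclose(f);

  printf("PM2.5 = %.1f\n", in.pm25);
  return 0;
}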
shtrier:
30 seconds of processing time wouldn't matter much; however, a minute would probably touch the border of what is desirable.
Again, mainly three things slow down the process: text tokenizing and parsing, filesystem overhead, and the search algorithm.
The first two can be addressed by binary encoding, since copying bytes is straightforward and the file size is usually smaller than the CSV counterpart.
The third one improves only if you dare to sort all the records; the indexing part can be considered already solved by binary encoding, since every record has a fixed length and record number N always starts at byte N times the record size.
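Putting both ideas together, here is a rough sketch of what the lookup could become (again with standard C file functions and the same made-up record layout; Arduino's seek() and read() would take the place of fseek() and fread()). It assumes the binary file is sorted by latitude:

#include <stdio.h>
#include <math.h>

struct Record { float latitude, longitude, pm25, pm10; };

// Binary search over fixed-length records: jump straight to the middle record
// with fseek(), read it, and discard half of the file on every iteration.
int findByLatitude(FILE *f, long count, float target, struct Record *out) {
  long low = 0, high = count - 1;
  while (low <= high) {
    long mid = (low + high) / 2;
    fseek(f, mid * (long)sizeof(struct Record), SEEK_SET);   // record N at byte N * size
    fread(out, sizeof *out, 1, f);
    if (fabsf(out->latitude - target) < 0.0005f) return 1;   // tolerant match
    if (out->latitude < target) low = mid + 1;
    else                        high = mid - 1;
  }
  return 0;
}

int main() {
  FILE *f = fopen("records.bin", "rb");
  if (!f) return 1;
  fseek(f, 0, SEEK_END);
  long count = ftell(f) / (long)sizeof(struct Record);        // number of records
  struct Record r;
  if (findByLatitude(f, count, 55.6761f, &r))
    printf("PM2.5 = %.1f, PM10 = %.1f\n", r.pm25, r.pm10);
  fclose(f);
  return 0;
}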