Is there a better and quicker way to read a .txt file and find a matching string?

Hello everyone,

I’m trying to read a .txt file (used as a database) and find a matching string in it.
So far it works, but with more than 50,000 lines in the .txt file it takes a couple of seconds to find a match…

The .txt file looks like this:
18429800
18429801
18429802
18429803
18429804
18429805

Here is the code I’m using:

/*
  SD card read/write
 
 This example shows how to read and write data to and from an SD card file 	
 The circuit:
 * SD card attached to SPI bus as follows:
 ** MOSI - pin 11
 ** MISO - pin 12
 ** CLK - pin 13
 ** CS - pin 4
 
 created   Nov 2010
 by David A. Mellis
 modified 9 Apr 2012
 by Tom Igoe
 
 This example code is in the public domain.
 	 
 */
 
#include <SD.h>

File myFile;
char buf[10];

void setup()
{
  // Open serial communications and wait for port to open:
  Serial.begin(9600);
  while (!Serial) {
    ; // wait for serial port to connect. Needed for Leonardo only
  }

  Serial.print("Initializing SD card...");
  // On the Ethernet Shield, CS is pin 4. It's set as an output by default.
  // Note that even if it's not used as the CS pin, the hardware SS pin
  // (10 on most Arduino boards, 53 on the Mega) must be left as an output
  // or the SD library functions will not work.
  pinMode(10, OUTPUT);

  if (!SD.begin(4)) {
    Serial.println("initialization failed!");
    return;
  }
  Serial.println("initialization done.");

  // open the file for reading:
  myFile = SD.open("test.txt");
  if (myFile) {
    Serial.println("test.txt:");
    bool found = false;

    // Each record is assumed to be exactly 10 bytes: 8 digits followed by
    // CR LF (Windows line endings). With LF-only line endings a record is
    // 9 bytes and these reads drift out of step with the lines.
    while (myFile.available()) {
      if (myFile.read(buf, sizeof(buf)) < 8) {
        break; // incomplete record at the end of the file
      }
      if (strncmp(buf, "18438846", 8) == 0) {
        found = true;
        break;
      }
    }
    Serial.println(found ? "Match!" : "No match.");
    // close the file:
    myFile.close();
  } else {
    // if the file didn't open, print an error:
    Serial.println("error opening test.txt");
  }
}

void loop()
{
  // nothing happens after setup
}

saormart:
So far it works, but with more than 50,000 lines in the .txt file it takes a couple of seconds to find a match…

It would be better to break the 50,000-line file up into a series of much smaller files. 50 files with 1,000 records each can be searched a lot faster than one 50,000-record file, provided there is a consistent rule for which file (or small group of files) a given record belongs in, so that only that file has to be scanned.

Another possibility is to have the file contain ranges. The snippet you posted could be one record:
18429800 - 18429805
Then read a record until the end-of-line character ('\n') is encountered, parse it to get the start and end tokens, and use strtoul() to convert them to numbers. Do the same for the target, then check whether it lies between the start and end values. Comparing numbers is faster than comparing strings.
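A minimal sketch of that parsing step, in plain C so it also compiles on a PC. It assumes each record looks exactly like the example above ("START - END", both ends inclusive); the function name `range_contains` is hypothetical. On an AVR-based Arduino `unsigned long` is 32 bits, which is enough for any 8-digit code.

```c
#include <stdlib.h>
#include <stdbool.h>

/* Parses one range record of the form "START - END" and reports whether
   target lies inside it (inclusive). Returns false if the record cannot
   be parsed. */
bool range_contains(const char *record, unsigned long target)
{
    char *end;
    unsigned long start = strtoul(record, &end, 10);
    if (end == record) {
        return false;              /* no leading number */
    }
    while (*end == ' ' || *end == '-') {
        end++;                     /* skip the " - " separator */
    }
    const char *second = end;
    unsigned long stop = strtoul(second, &end, 10);
    if (end == second) {
        return false;              /* no second number */
    }
    return target >= start && target <= stop;
}
```

The target only needs converting with strtoul() once, before the loop; after that each record costs one parse and two integer comparisons.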

Knowing something about the data is really necessary to offer good advice.

Thank you Paul!

I'll do that!

If it's only a lookup of static data, I'd use Paul's approach and split the numbers up so that no single file needs to contain more entries than you can afford to read. For example, you could use the first three digits as a directory name and the next three as a file name, and store the last two digits in numerical order in that file. Finding whether a given number is present can then be done by opening the named file and reading through it until you find the target value or reach a number that is bigger than the target. That's just one way to split the values up; you would be best to choose a split that keeps the number of files and directories, the number of files in each directory and the number of numbers in each file all reasonably small, since shrinking one of them tends to grow the others.
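The digit split described above could be sketched like this in plain C. The `.txt` extension and the helper names `code_to_path` and `sorted_contains` are assumptions for illustration; `sorted_contains()` works on the values already read from the chosen file, which is assumed to hold the last two digits of each stored code in ascending order.

```c
#include <stdio.h>
#include <stdbool.h>

/* Builds the path for a code under the example layout: first three digits
   = directory, next three = file name (".txt" extension assumed).
   18429803 -> "184/298.txt"; that file would then contain the value 3. */
void code_to_path(unsigned long code, char *path, size_t size)
{
    unsigned dir  = (unsigned)(code / 100000UL);          /* digits 1-3 */
    unsigned file = (unsigned)((code / 100UL) % 1000UL);  /* digits 4-6 */
    snprintf(path, size, "%03u/%03u.txt", dir, file);
}

/* Searches the values read from that file (last two digits, ascending),
   stopping early as soon as a value bigger than the target shows up. */
bool sorted_contains(const unsigned char *values, size_t count, unsigned target)
{
    for (size_t i = 0; i < count; i++) {
        if ((unsigned)values[i] == target) {
            return true;
        }
        if ((unsigned)values[i] > target) {
            return false;   /* passed the spot where target would be */
        }
    }
    return false;
}
```

The value to look for in the file is simply `code % 100`.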

Thank you Peter!

Actually, I'm trying to build an access control system. The numbers won't necessarily be sequential; they may be random (all of them will have 8 digits, though).
I was thinking about what you and Paul told me, and it's going to be hard for me to manage random numbers across multiple files...
This project will have a TCP client (a Windows app) that connects to the Arduino and sends the 8-digit codes that have access to the system.
Maybe using a .txt file is a bad idea when the lookup needs to be fast, right?

If anyone has a better idea of how to handle this, I'd really appreciate it!

saormart:
Maybe using a .txt file is a bad idea when the lookup needs to be fast, right?

Well, if you store the numbers as binary values (an 8-digit number fits in a 4-byte unsigned integer), you avoid having to parse each number. But essentially, the scheme of putting different ranges of numbers in different files (effectively, using the file system as an index) seems like the simplest way to do it.
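As an illustration of the binary idea, here is a sketch that stores each code as a 4-byte little-endian integer. The byte order and the names `write_codes` and `binary_file_contains` are assumptions. Every record then has the same size, and a 50,000-entry file shrinks from 500,000 bytes (8 digits plus CR LF per line) to 200,000 bytes.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Writes each code as a 4-byte little-endian record. */
bool write_codes(FILE *f, const uint32_t *codes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        unsigned char rec[4] = {
            (unsigned char)(codes[i]),
            (unsigned char)(codes[i] >> 8),
            (unsigned char)(codes[i] >> 16),
            (unsigned char)(codes[i] >> 24)
        };
        if (fwrite(rec, 1, 4, f) != 4) {
            return false;
        }
    }
    return true;
}

/* Reads 4-byte records from the current position and compares each one
   directly as a number; no text parsing is needed. */
bool binary_file_contains(FILE *f, uint32_t target)
{
    unsigned char rec[4];
    while (fread(rec, 1, 4, f) == 4) {
        uint32_t code = (uint32_t)rec[0]
                      | ((uint32_t)rec[1] << 8)
                      | ((uint32_t)rec[2] << 16)
                      | ((uint32_t)rec[3] << 24);
        if (code == target) {
            return true;
        }
    }
    return false;
}
```

Spelling out the byte order, rather than writing a `uint32_t` with one `fwrite()`, keeps the file readable regardless of which machine created it.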

I assume that you would put logic in your sketch to receive a number, create the corresponding file (if it doesn't already exist) and insert the number into the file at the correct position (if it isn't already present). This would be a lot easier for the Arduino if you did this work on the PC and just sent the set of files over, or even wrote the files directly to an SD card and just installed that in the Arduino.
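If the files are prepared on the PC, the "insert at the correct position if not already present" step could look like this minimal in-memory sketch, run before each array is written out to its file. The name `sorted_insert` and its return codes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Inserts code into an ascending array unless it is already present.
   *count is the number of used slots; capacity is the array size.
   Returns 1 if inserted, 0 if already present, -1 if the array is full. */
int sorted_insert(uint32_t *codes, size_t *count, size_t capacity, uint32_t code)
{
    size_t pos = 0;
    while (pos < *count && codes[pos] < code) {
        pos++;                         /* first element >= code */
    }
    if (pos < *count && codes[pos] == code) {
        return 0;                      /* duplicate: nothing to do */
    }
    if (*count == capacity) {
        return -1;                     /* no room left */
    }
    /* shift the tail up one slot to open a gap at pos */
    memmove(&codes[pos + 1], &codes[pos], (*count - pos) * sizeof codes[0]);
    codes[pos] = code;
    (*count)++;
    return 1;
}
```

Keeping every file sorted this way is what makes the early-exit scan and the binary chop discussed in this thread possible.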

This approach might not work so well if the set of supported numbers is very sparse, but it's still the first approach I'd try.

You could use a hashing system to produce a filename for the file that contains all numbers with that hash, and choose a hashing algorithm that gives you your desired granularity. At its simplest, you could just take the last digit of the number and store all numbers that end in (for example) 0 in "0.txt". Choose two (or more) digits to further reduce the number of entries in each file.
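A sketch of the last-digits hash in plain C; the name `bucket_name` is hypothetical and the `.txt` extension follows the example above. With one digit there are 10 bucket files; with two there are 100, so 50,000 roughly evenly spread codes would average about 500 per file. Names this short also fit the 8.3 file-name limit of the Arduino SD library.

```c
#include <stdio.h>

/* Hashes a code by its last `digits` decimal digits and builds the bucket
   file name, zero-padded to that width: 18429803 with digits = 1 gives
   "3.txt", with digits = 2 gives "03.txt". */
void bucket_name(unsigned long code, int digits, char *name, size_t size)
{
    unsigned long mod = 1;
    for (int i = 0; i < digits; i++) {
        mod *= 10;
    }
    snprintf(name, size, "%0*lu.txt", digits, code % mod);
}
```

Because the last digits of random codes are close to uniformly distributed, this spreads entries evenly without needing any knowledge of the actual number ranges.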

Can you give a bit more detail about what your system as a whole is going to do?

A totally different way of dealing with this kind of problem is:

  1. Ensure that each of the records in the file has the same length.

  2. Sort the file so that the records are in order, say lowest number first (do this when the file is created).

  3. Open the file for random rather than sequential access. You're not reading one char at a time but a block at a time, from wherever you want.

  4. Now apply a binary chop (binary search). Read the middle record in the file; if you went too far, read the record 0.25 of the way in; if you did not go far enough, read at 0.75 of the file length. Each further move is half the previous one: 1/8, then 1/16, then 1/32 and so on.

  5. Read sequentially to get the last bit of the way.

With 50,000 records, 10 chops get you within about 50 records of your target (50,000 / 2^10 ≈ 49).
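The binary chop could be sketched as follows, written against standard C `FILE*` calls so it can be tried on a PC. It assumes a sorted file of 10-byte records (8 digits plus CR LF, like test.txt), and the name `file_binary_search` is hypothetical. Unlike step 5 above, this version simply keeps halving until a single record is left. The Arduino SD library's `File` class has `seek()`, `size()` and `read()`, so the same logic should carry over with small changes.

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define RECORD_SIZE 10   /* assumed: 8 digits + CR LF per line */
#define CODE_LEN     8

/* Binary search over a sorted file of fixed-length records. Every code has
   exactly 8 digits, so strncmp() order matches numeric order. */
bool file_binary_search(FILE *f, const char *target)
{
    if (fseek(f, 0, SEEK_END) != 0) {
        return false;
    }
    long size = ftell(f);
    if (size < 0) {
        return false;
    }
    /* count the last line even if it has no trailing CR LF */
    long records = (size + RECORD_SIZE - CODE_LEN) / RECORD_SIZE;
    long lo = 0, hi = records - 1;
    char buf[CODE_LEN];

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (fseek(f, mid * RECORD_SIZE, SEEK_SET) != 0) {
            return false;
        }
        if (fread(buf, 1, CODE_LEN, f) != CODE_LEN) {
            return false;
        }
        int cmp = strncmp(buf, target, CODE_LEN);
        if (cmp == 0) {
            return true;
        }
        if (cmp < 0) {
            lo = mid + 1;    /* target is in the upper half */
        } else {
            hi = mid - 1;    /* target is in the lower half */
        }
    }
    return false;
}
```

Each probe costs one seek plus an 8-byte read, so 50,000 records need at most 16 probes (2^16 = 65,536).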

Mark

wildbill:
Can you give a bit more detail about what your system as a whole is going to do?

Hello wildbill,
Sorry for my late response…
Basically, I’m building an access control system and working on it part by part.
I really appreciate everyone who’s trying to help me with this!
Could you help me with how to code this? I’m new to Arduino and the C language…
I mean some example code, based on what I posted, showing what I should change to make it better!
Sorry if I’m asking too much of you guys; I know you have to be patient with people like me here…

Thank you all !