Is it possible to find IDs on an SD card?

Can I write IDs directly to the SD card (one ID is about 10 bytes)?
There are about 50-80 thousand IDs.

I need to find, read, and write an ID on the SD card in a short time.

I do not need a FAT library or files. I just need one big area on the SD card in which to find/read/write IDs!

Can you advise me?

If you do not want to go through the hassle of SD card libraries, I would suggest that the easiest option is the uMMC data module from Rogue Robotics.

This takes serial data (at up to 460800 baud, which is faster than the Arduino can manage) and logs it to the SD card.

The module also has a whole raft of other features which allow you to easily list all the files, go to a specific place in a file, create new files, add to files, remove files etc.
It will also happily use SDHC cards, so 50-80 thousand IDs should not be a problem.

All of this is controlled by serial commands (explained on the website) so it is probably the simplest datalogging module for the features you get.


$50 is a high price! I already have an SD reader on the Ethernet Shield, which I must have in my project anyway. In the future I will need 10-100 identical units, and an extra $50 each is too much: $50 × 100.

I hope it is possible to do this with the standard SdFat software and the SD reader on the Ethernet Shield. :wink:

Reading from and writing to the SD card will be the easy part. Finding whether a given ID is already on the card will be the hard part. Without some sort of organization, you will need to read every record on the card to see if it contains the value of interest. Potentially, the last record on the card contains the ID of interest.

If the ID is not there, you WILL have to read every record to learn that.
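To make that cost concrete, here is a minimal sketch of such a linear scan in plain C++, with an in-memory buffer standing in for the card. The 10-byte record size comes from the question; the name findId and the back-to-back record layout are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstring>

// Record size taken from the question: one ID is about 10 bytes.
const std::size_t ID_SIZE = 10;

// Hypothetical helper: scan `count` fixed-size records stored back to back
// in `records` and return the index of the first match, or -1 if absent.
// On a real card the buffer would be filled piece by piece from the SD reader.
long findId(const char *records, std::size_t count, const char *id) {
  for (std::size_t i = 0; i < count; ++i) {
    if (std::memcmp(records + i * ID_SIZE, id, ID_SIZE) == 0) {
      return static_cast<long>(i);  // found after reading i + 1 records
    }
  }
  return -1;  // absent: every one of the `count` records was read
}
```

The miss case is the expensive one: the loop only ends after touching all records.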

The organization that makes this easier is to use multiple files. All the 0xxxxxx records go in 0.dat. All the 1xxxxxx records go in 1.dat. Etc.

Then, finding a record takes 1/10th as long, since there are 1/10 as many records to search. (This assumes a uniform distribution of IDs.)

This can be taken one level further. All the 0xyyyyy IDs go in files in the 0 directory. The x defines the file, and the yyyyy is the data stored in the file. Finding a record becomes easier, since it is in a specific file in a specific directory.
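A rough sketch of that directory/file mapping, assuming IDs are handled as strings of decimal digits. The names idToPath and idRemainder are hypothetical, and the code only builds the path; opening the file is left to whatever SD library is in use.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: build "<first digit>/<second digit>.dat" for an ID
// given as a string of decimal digits, e.g. "0123456" -> "0/1.dat".
// Returns false if the ID is too short or the output buffer is too small.
bool idToPath(const char *id, char *path, std::size_t pathSize) {
  if (std::strlen(id) < 2) return false;
  int n = std::snprintf(path, pathSize, "%c/%c.dat", id[0], id[1]);
  return n > 0 && static_cast<std::size_t>(n) < pathSize;
}

// The remaining digits are what gets stored in (or searched for in) the file.
// Assumes the ID was already accepted by idToPath (at least 2 characters).
const char *idRemainder(const char *id) {
  return id + 2;
}
```

With this layout, the ID "0123456" would be looked up as the record "23456" inside "0/1.dat".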

The advantages are obvious. Many fewer records in any given file. The drawbacks are obvious, too. Many more files to maintain (although the code does all the work) in many more directories.

Add to this the fact that the SD library is not all that efficient at managing files and directories, and you can see that you have your work cut out for you.

If there is any way to offload the checking to a relational database on a PC, you should pursue that approach.

I agree with Paul's proposal: let a central PC do the job. You state that you have Ethernet shields, so they can all communicate with a server. The Arduino could cache the latest n entries, but caching does not always speed things up.
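For what such a cache of the latest n entries might look like, here is a minimal ring-buffer sketch. The struct name IdCache, the slot count of 8, and the fixed 10-byte ID length are assumptions; a real sketch would size it to the RAM that is left.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t ID_LEN = 10;      // ID size from the question
const std::size_t CACHE_SLOTS = 8;  // assumed value of n; tune to available RAM

// Hypothetical ring-buffer cache of the most recently seen IDs.
struct IdCache {
  char slots[CACHE_SLOTS][ID_LEN];
  std::size_t used = 0;  // number of valid slots
  std::size_t next = 0;  // slot that the next insert overwrites

  // Linear check of the cached IDs; cheap because n is small.
  bool contains(const char *id) const {
    for (std::size_t i = 0; i < used; ++i) {
      if (std::memcmp(slots[i], id, ID_LEN) == 0) return true;
    }
    return false;
  }

  // Store an ID, overwriting the oldest entry once the cache is full.
  void remember(const char *id) {
    std::memcpy(slots[next], id, ID_LEN);
    next = (next + 1) % CACHE_SLOTS;
    if (used < CACHE_SLOTS) ++used;
  }
};
```

A miss in the cache still requires the full lookup on the server or the card, so the cache only pays off when the same IDs come back soon after they were last seen.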

Besides the folder trick Paul mentioned, consider the following:
You state that you have fewer than 100,000 IDs, each 10 bytes long. That means that, e.g., 1000 files of about 100 IDs each could do the trick.

Selecting the right file can be done with a hash function that folds the ID to be tested into a number between 0 and 999. Then use that number as the filename under which to store/search the ID. If the hash spreads the IDs evenly, you only need to check about 100 lines to find the ID.

A folding hash function for a 10 digit number might be:

  • split the number into 3 groups of 3 (or 4) digits
  • add the 3 numbers
  • take the sum modulo 1000, et voilà, a reasonable hash

In code (not compiled/tested):

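int SimpleHash(long n)    // can be more efficient
{
  // Note: long is 32 bits on AVR-based Arduinos, so n must be <= 2147483647
  int n1 = n / 1000000;   // leading digits
  long t = n % 1000000;   // remaining 6 digits
  int n2 = t / 1000;      // middle 3 digits
  int n3 = t % 1000;      // last 3 digits
  return (n1 + n2 + n3) % 1000;   // bucket number 0..999
}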

Better hash functions exist, and developing good hash functions is a science in itself. For more about hashing, see Hash function - Wikipedia.

robtillaart, your hash function is more efficient, because the IDs may be similar (sharing several leading or trailing digits). Splitting by digit alone could leave a few files holding 95000 IDs while only 5000 IDs end up in the remaining files.

Yes, I think a hash function is necessary. The result of the hash function should be as unpredictable as possible.

I am satisfied with your answer, so I do not need a database. A hash function is absolutely suitable.

The more you know about the distribution of the IDs, the better you can design your hash function. E.g. if the IDs are 10 digits and the last two are always 00, then those two are not usable for generating the hash value.
A more real-life example: suppose you have 10000 IDs chosen at random between 000000 and 123456. The first digit would then be 0 about 81% of the time and 1 about 19% of the time (the digits 2-9 never occur!), which is not good for a hash function. The other digit positions have a far more even distribution across all digit values, so these can be used.
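One way to check a digit position before trusting it is to count how often each digit value occurs there. The sketch below enumerates every value in a range as a stand-in for a uniform random draw; the name digitHistogram and its parameters are hypothetical.

```cpp
#include <cstddef>

// Count how often each decimal digit (0-9) occurs at position `pos`
// (0 = leftmost) among all `width`-digit, zero-padded values 0..maxValue.
// Hypothetical helper for inspecting an ID range before designing a hash.
// Assumes maxValue is below the largest unsigned long, so the loop ends.
void digitHistogram(unsigned long maxValue, int width, int pos,
                    unsigned long counts[10]) {
  for (int d = 0; d < 10; ++d) counts[d] = 0;

  // Divisor that moves the wanted digit into the ones place.
  unsigned long divisor = 1;
  for (int i = 0; i < width - 1 - pos; ++i) divisor *= 10;

  for (unsigned long v = 0; v <= maxValue; ++v) {
    counts[(v / divisor) % 10]++;
  }
}
```

For the range 000000 to 123456, position 0 comes out as 100000 values starting with 0 and 23457 starting with 1, with nothing for 2-9, while the last position is spread almost evenly over all ten digit values.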

If you know this is the case with the distribution of your IDs, as you imply (several shared first or last digits), then don't use those digits in the hash function.

Also, IDs may include a checksum, e.g. the last digit = (sum of the other digits) % 10. That would introduce a dependency that could disturb hash functions.

In short, if you know of such "problems", your algorithm must ignore these digits.

So the steps to define a Hash algorithm are:

  • select the most volatile digits
  • use them in a mathematical formula that reduces the possible outcomes
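The two steps above can be sketched as follows, assuming the ID is available as a string of decimal digits and that you have already decided which positions are volatile enough. The name selectedDigitHash and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: build a number from the digits of `id` at the given
// positions only (0 = leftmost), then reduce it modulo `buckets`.
// Positions known to be constant, skewed, or a checksum are simply left out.
// Returns -1 if `buckets` is not positive, a position is out of range, or
// the character at a position is not a decimal digit.
int selectedDigitHash(const char *id, const int *positions,
                      std::size_t positionCount, int buckets) {
  if (buckets <= 0) return -1;
  std::size_t len = std::strlen(id);
  unsigned long value = 0;
  for (std::size_t i = 0; i < positionCount; ++i) {
    if (positions[i] < 0 || static_cast<std::size_t>(positions[i]) >= len) {
      return -1;
    }
    char c = id[positions[i]];
    if (c < '0' || c > '9') return -1;
    value = value * 10 + static_cast<unsigned long>(c - '0');
  }
  return static_cast<int>(value % static_cast<unsigned long>(buckets));
}
```

For instance, with an ID of "1234567800" whose last two digits are always 00, choosing positions 4 to 7 uses the digits 5678 and yields bucket 678 out of 1000.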