Storing and compare hex-values in a large array

Currently I am working on a project where I want to check an array that is being delivered as JSON.
I am working with an ESP32.
Comparing an input to the array need to be as fast as possible.
Because the array could eventually have 10,000+ elements, it needs to be efficient for both memory and time.

{
  "userID": "[\"1a69d8c3afaf04d6\",\"b72a427a4d91eb63\",\"5213bcd8fe0e91b6\",\"714996238c2b7536\",\"593d38d7404b2890\",\"a1d181ebca49fe4d\",\"bd6a15a0efbf8af1\",\"c26764b0ae8a2965\",\"706598a41cd5b33e\",\"9deaf2ffcf7da2ac\"]"
}

This JSON will be parsed by ArduinoJson-library.
Now I am wondering in what type of array I should put it in. The array consists of multiple HEX values. The length of the HEX can vary between 4 bytes and 16 bytes. The size of the complete array could eventually have 1000+ elements.

The array will be used as storage which will be compared to an input. I will be using a for-loop to go through the array to compare the input with each element in the array.
I heard comparing it as String-array will eat up a lot of heap, therefore I hope to store it as HEX in an array.

goal is higher. 8 Bytes per element * 10,000 equals 80KB which the ESP32 could easily handle

You can store two hex digits in a byte.

comparing data can be done with a simple

memcmp

But I doubt this is not what your really need.

may be you shouldn't read the total data to memory, but just read and store the parts your really need?
But that would need that you have to provide more context about
what data
you get how
and what you really want to achieve.

Declare an array of unsigned long long. Convert the ASCII hex strings to binary first then store in the corresponding element in the array based on the JSON index. You will also have to convert the user input to binary in order to compare. A for loop iterating through entire array is not the most efficient but it may be sufficient for what you are doing. Alternatively you can sort the array and use a binary search.

Do you have control over the JSON being delivered? If you don't need a 'name' ("0", "1", "2"...) for each element you could save some space by making it a JSON array.

Do you mean 4 to 16 hex digits representing 2 to 4 bytes of binary or do you mean 8 to 32 hex digits representing 4 to 16 bytes of binary?

This is indeed the best option I can think of.
It is really necessary that I store the data in the MCU itself. The ESP32 has 520KB RAM and a JSON with 1000 16-byte-hex strings should be 20KB.

Though the same thing when I was reading it again. Did JSON.stringify but I made of each array an object and now it is just a normal array.

I mean that each element can consist of a maximum of 16 characters.
For example:

{
"userID": [ "0123456789abcdef", "01234abc", "2749badfe", "bcaf145"]
}

The most compact storage would also be the fastest to search. I would not use ArduinoJson to deserialize because it would store the strings of ASCII as ASCII. You can cut the size almost in half by converting each hex string into binary.
The binary would be padded with a leading zero to an even number of bytes. Then add those bytes to the end of a 20k byte buffer and record the address of the string in a list of byte pointers. To search, use each pointer to compare the ID with each element of the list. When a match is found, make sure that the next pointer points to the next byte after the match.

1 Like

Thanks!
I am not familiar with the converting to binary and pointers.
I have managed to compare two char[] with each other by deserialize the json and convert to JsonArray and then the following function.

boolean checkArray(const char* ID){
   File databaseUsers = SPIFFS.open("/userDB.txt", "r");

   DynamicJsonDocument doc(ESP.getMaxAllocHeap() - 1024);
   deserializeJson(doc, databaseUsers);
   doc.shrinkToFit();
   JsonArray array = doc["package"]["allowedTokens"];
   int size = array.size();

   for(int i=0; i<size; i++){
      if(strcmp(ID, array[i]) == 0){
         return true;
      }
   }
   else return false;
}

But scanning through 1300 of elements takes about 2 seconds. Is it going to get much faster when using binary? And how should I do this? The SPIFFS will be updated when there is an MQTT-package with a new json. The parsing of the array with IDs to binary may take a few seconds if it makes the comparing process faster.

Is it 2 seconds just for the loop over the array or does 2 seconds include reading from SPIFFs and deserializing? I would time just the loop.

A pretty good motivation for binary versus sequential (also known as linear) search. Binary search (article) | Algorithms | Khan Academy

Sorting and binary search are so common that C includes functions for them. C library function - qsort() C library function - bsearch()

If the JSON data is sent from a PC, have it sort the array so the ESP32 does not have to.

I send the JSON from Node-Red. In Node-Red I have a sorting-node, but also function-node where I can write my own JavaScript. So sorting it externally is not a problem.

I will dive in the way binary search works.

I will still have to change the data I get from the JSON MQTT and store it as binary in the SPIFFS. All I get from MQTT and deserialized with ArduinoJson will be a const char*.

But storing a value to a (const) char* in a global variable while running the process is tricky due to pointers.
Already got wrong pointers. A const char* with a variable that consisted of a TLS-certificate was displayed in Serial.println while there should another variable displayed.

Would it also be possible to store the hex as decimal in the json, which reduces the size of the json. Then the esp32 can compare the hex as decimal instead of String or char*.

[Additional] I tried splitting the hex into bytes and store them as array of bytes (decimal) inside the JSON. MQTT already buffers binary. When I send the hex as string, it is less size than as I convert it do decimal.

I think I have found a solution with the help of both @johnwasser and @Icebuster.
The values will be sorted so I can use the binary search method.
I will make my own buffer.
Inside the buffer I will have identifiers in the first part of my data as bytes formatted in UTF-8.
I will use one identifier as indexer and I will give the amount of bytes for each UserID-value, so some UserIDs will only consist of 4 bytes and other UserIDs will consist of 7 or 8 Bytes so I need to know how long each ID is. The amount of bytes in this part also gives the amount of IDs are stored.
So if there are four ID-cards stored, with the following codes: A2B3C4D5 FEABDC78 FF12459001 0013948548F37EC2. The indexer should look like: 04 04 05 08 and the database looks like: A2 B3 C4 D5 FE AB DC 78 FF 12 45 90 01 00 13 94 85 48 F3 7E C2.
With the indexer I will be able to find the size of the UserID.

Using the binary search I could skip a step by just only taking the indexers with the right amount of bytes. Then I check the value of the middle one of the indexers, then determine if it is higher or lower and repeat those steps.

Correct me if I am handling the wrong way.

Now I will make a Javascript that will parse everything into a hex-buffer. (If there is already a function in Javascript or Node-Red, let me know :innocent:)

16 hex digits will fit in a uint64_t. uint64_t is unsigned long long which is an unsigned 64 bit integer. sizeof(uint64_t) is 8 bytes.

I do not know enough of JSON format such as how big a number it can store.

"myArray": [123456789012345, 2345678890123456]

In theory this is an array of big numbers without decimal points (integers) so it "should" convert it to a C array of uint64_t. There may be limitations on ArduinoJson on how big an integer it can handle. It may fail for integer values bigger than what can fit in a uint32_t.

"myArray": ["1234567890abcdef", "23456788901abcdef"]

This is an array of string holding hex digits. If ArduinoJson converts this to a C array of pointers to string constants, convert to array of uint64_t.

// Allocate on heap an array of uint64_t
myuint64Array = (uint64_t *)malloc(sizeof(uint64_t) * <number of elements in myArray>

for each index i in myArray
    myuint64Array[i] = strtoull(myArray[i], NULL, 16);

strtoull() converts a string of digits (the 3rd parameter means base 16) to
unsigned long long integer.

I am very curious to see if ESP32 supports uint64_t and bsearch so I gave it a
try. Looks good in a small example.

int MyArraySize;
uint64_t *MyUint64Array;

void setup() {
  Serial.begin(115200);

  const char *myArray[] = {
    "abcdef", "1234567890abcdef", "2345678901abcdef"
  };
  MyArraySize = sizeof(myArray) / sizeof(const char *);
  MyUint64Array = (uint64_t *)malloc(sizeof(uint64_t) * MyArraySize);

  for (int i = 0; i < MyArraySize; i++) {
    MyUint64Array[i] = strtoull(myArray[i], NULL, 16);
    // Show the string version and the converted uint64 version
    Serial.print(myArray[i]);
    Serial.print(" == ");
    Serial.print(MyUint64Array[i], HEX);
    Serial.println('?');
  }
}

int uint64_compare(const void *a, const void *b)
{
  // Do not use substraction as seen in some tutorials since these are unsigned
  if (*(uint64_t*)a > *(uint64_t*)b) return 1;
  if (*(uint64_t*)a < *(uint64_t*)b) return -1;
  // If not > and not <, must be equal
  return 0;
}


void loop() {
  uint64_t searchkey;
  uint64_t *resultkey;

  // Search for every possible value. Sanity check: If "Not Found" is returned there is a bug!
  for (int i = 0; i < MyArraySize; i++) {
    searchkey = MyUint64Array[i];
    resultkey = (uint64_t *)bsearch(&searchkey, MyUint64Array, MyArraySize, sizeof(uint64_t), uint64_compare);
    Serial.print("Search for ");
    Serial.print(searchkey, HEX);
    Serial.print(" bsearch returns: ");
    if (resultkey == NULL)
      Serial.println("Not found");
    else
      Serial.println(*resultkey, HEX);
  }

  // Value not in the array so "Not Found" is the expected result.
  searchkey = 0x1234ULL;
  resultkey = (uint64_t *)bsearch(&searchkey, MyUint64Array, MyArraySize, sizeof(uint64_t), uint64_compare);
  Serial.print("Search for ");
  Serial.print(searchkey, HEX);
  Serial.print(" bsearch returns: ");
  if (resultkey == NULL)
    Serial.println("Not found");
  else
    Serial.println(*resultkey, HEX);

  delay(1000);
}

I got it working!
Finally I can get the binary data parsed right and made a strict indexer of where in the byte array I can find what data.

Now the problem is to store it inside SPIFFS without it being changed... Have been trying and searching for the past two days... But that is for another post on this forum...