Efficient "in array" Check

In my example below the array only has 5 elements.
In practice it will contain up to 100 elements.

long sidArray[] = {16288247, 8862204, 21429836, 1328539, 124157};
int size_sidArray = 5;
long y = 1328539;
bool y_valid;

void setup() {

//  Start Serial
    Serial.begin(9600);
    Serial.println("Serial Started");

    y_valid = false;

    for (int x = 0; x < size_sidArray; x++) {
        if (y == sidArray[x]) y_valid = true;
    } 
    Serial.print(y);
    Serial.print(" = ");
    Serial.println(y_valid);
       
}

void loop() {

}

I have 2 questions:

  1. Is there a more efficient way to check if y is present in the array?
  2. Using my example, how can I break out of the for loop once y has been found to be present and will it greatly add to efficiency especially if y is amongst the first few values of the array?
for (int x = 0; x < size_sidArray; x++) {
        if (y == sidArray[x]) y_valid = true;
    }

try:

for (int x = 0; x < size_sidArray; x++) 
{
  if (y == sidArray[x])
  {
     y_valid = true;
     break;
  }

If you sort your array you can use a binary search to determine if the element is present. That would be a maximum of 8 comparisons for 256 entries.

  1. Is there a more efficient way to check if y is present in the array?

Sorting the array could speed up searching but you have to weigh up the time needed to sort compared to the time to search. All depends on how often the array data changes and how often you search.
Another trick might be to initially only check the low order byte from the 4 byte long and if it does not match then move on to next long (assuming the core compare function does not already do this). This would save checking all 4 bytes per compare.
Create an 8 bit hash from the value and store in another array but this will use more RAM.

  1. Using my example, how can I break out of the for loop once y has been found to be present and will it greatly add to efficiency especially if y is amongst the first few values of the array?

break;
If the amount of entries in the array is dynamic then having an end marker/count beyond the last array entry will speed up searches for no existent values.

johnwasser:
If you sort your array you can use a binary search to determine if the element is present. That would be a maximum of 8 comparisons for 256 entries.

The array values are fixed and can be sorted when created. The values are stored in a txt file on an SD card and are read into an array at start-up.

// function call to the binary search function (listed below)
// for the array shown above
   binarySearch(num, 0, 9, 64);

//Binary Search Function
/ Function accepts an array, the lower bound and upper bound subscripts...
// to be searched, and the key number for which we are searching.
// There is nothing returned.
void binarySearch(apvector <int> &array, int lowerbound, int upperbound, int key)
{
       int position;
       int comparisonCount = 1;    //count the number of comparisons (optional)

       // To start, find the subscript of the middle position.
       position = ( lowerbound + upperbound) / 2;

       while((array[position] != key) && (lowerbound <= upperbound))
       {
              comparisonCount++;
              if (array[position] > key)               // If the number is > key, ..
             {                                                       // decrease position by one.
                    upperbound = position - 1;    
             }                                                      
             else                                                
            {                                                        // Else, increase position by one.
                   lowerbound = position + 1;     
             }
             position = (lowerbound + upperbound) / 2;
       }
      if (lowerbound < = upperbound)
      {
            cout<< "The number was found in array subscript "<< position<<endl<<endl; 
            cout<< "The binary search found the number after " << comparisonCount 
                   << " comparisons.\n";              
            // printing the number of comparisons is optional
       }      
       else
             cout<< "Sorry, the number is not in this array.  The binary search made "
                   <<comparisonCount << " comparisons.";
       return;  // you may also consider returning the subscript
}

I found the above function on the web, which I will customize.
Is it a good example to use as a base?

For an array a 100 elements it will make little or no difference. In fact the cost of the sort may be greater than the saving when you go to find the value. A 16MHz processor has a huge amount of processing power, any excess is just thrown away - so don't worry until ...

Mark

Yes looks like a good binary search.

what do the numbers in the array represent?

Thanks for all the input.
I think “break;” could be helpful.

Nonetheless I managed to create a working binary search function.

holmes4:
For an array a 100 elements it will make little or no difference. In fact the cost of the sort may be greater than the saving when you go to find the value. A 16MHz processor has a huge amount of processing power, any excess is just thrown away - so don’t worry until …
Mark

There is no cost for the sort.
Looking for values will be quite frequent.

robtillaart:
Yes looks like a good binary search.

what do the numbers in the array represent?

Sensor id's.

A comparison sketch to see the speed differences (three searches above and an optimized binary search)

precondition: array must be sorted. (but as the array is static that is done)

//
//    FILE: searchCompare.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.00
// PURPOSE: demo
//    DATE: 2015-11-01
//     URL: http://forum.arduino.cc/index.php?topic=356763.0
//
// Released to the public domain
//

#define ARSIZE 100

long arr[ARSIZE];

uint32_t start;
uint32_t stop;

void setup()
{
  Serial.begin(115200);
  Serial.print("Start ");
  Serial.println(__FILE__);

  for (int i = 0; i < ARSIZE; i++)
  {
    arr[i] = i * 83;  // sorted array
  }

  long se = arr[ARSIZE * 3 / 4];
  volatile int pos1; // volatile to prevent compiler from optimizing

  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = linearSearch(se, arr, ARSIZE);
  stop = micros();
  Serial.print("linearA:\t");
  Serial.println(stop - start);
  Serial.println(pos1);

  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = linearSearchBreak(se, arr, ARSIZE);
  stop = micros();
  Serial.print("linearB:\t");
  Serial.println(stop - start);
  Serial.println(pos1);

  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = binarySearch1(se, arr, ARSIZE);
  stop = micros();
  Serial.print(" binair:\t");
  Serial.println(stop - start);
  Serial.println(pos1);

  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = binarySearch2(se, arr, ARSIZE);
  stop = micros();
  Serial.print(" binair:\t");
  Serial.println(stop - start);
  Serial.println(pos1);

  /////////////////////////////////////////////////////////////////////
  Serial.println("\nELEMENT NOT IN ARRAY\n");
  se = -1;
  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = linearSearch(se, arr, ARSIZE);
  stop = micros();
  Serial.print("linearA:\t");
  Serial.println(stop - start);
  Serial.println(pos1);

  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = linearSearchBreak(se, arr, ARSIZE);
  stop = micros();
  Serial.print("linearB:\t");
  Serial.println(stop - start);
  Serial.println(pos1);

  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = binarySearch1(se, arr, ARSIZE);
  stop = micros();
  Serial.print(" binair:\t");
  Serial.println(stop - start);
  Serial.println(pos1);

  start = micros();
  for (int i = 0; i < 1000; i++) pos1 = binarySearch2(se, arr, ARSIZE);
  stop = micros();
  Serial.print(" binair:\t");
  Serial.println(stop - start);
  Serial.println(pos1);
}

void loop()
{
}

int linearSearch(long val, long arr[], int sz)
{
  int pos = -1;
  for (int i = 0; i < sz; i++)
  {
    if (arr[i] == val)
    {
      pos = i;
    }
  }
  return pos;
}


int linearSearchBreak(long val, long arr[], int sz)
{
  int pos = -1;
  for (int i = 0; i < sz; i++)
  {
    if (arr[i] == val)
    {
      pos = i;
      break;
    }
  }
  return pos;
}


int binarySearch1(long val, long arr[], int sz)
{
  int lowerbound = 0;
  int upperbound = sz;
  int pos = -1;
  int tpos;

  tpos = (lowerbound + upperbound) / 2;
  while ((arr[tpos] != val) && (lowerbound <= upperbound))
  {
    if (arr[tpos] > val)
    {
      upperbound = tpos - 1;
    }
    else
    {
      lowerbound = tpos + 1;
    }
    tpos = (lowerbound + upperbound) / 2;
  }
  if (lowerbound <= upperbound) pos = tpos;
  return pos;
}

// DO NOT USE THIS BS2 AS IT IS NOT CORRECT
int binarySearch2(long val, long arr[], int sz)
{
  int pos = -1;
  int tpos = sz / 2;
  int step = sz / 4;

  while (arr[tpos] != val && step > 0)
  {
    if (arr[tpos] > val)
    {
      tpos -= step;
    }
    else if (arr[tpos] < val)
    {
      tpos += step;
    }
    step /= 2;
  }
  if (arr[tpos] == val) pos = tpos;
  return pos;
}

output:

Start searchCompare.ino
linearA:	135272
75
linearB:	97596
75
 binair:	7896
75
 binair:	7136 *** incorrect
75

ELEMENT NOT IN ARRAY

linearA:	135388
-1
linearB:	128376
-1
 binair:	21588
-1
 binair:	16312   *** incorrect
-1

Binary search is substantial faster, both in finding the value as non-finding.
The optimized binary search is 10-20% faster than proposed binary search.
note this is a testrun for only one of the values. (todo test for all)

update: fixed small typo in code above numbers are approx equal
update: the binarySearch2 failed when tested with all possible values

Thanks Rob.
I just figured out the other binary search, but I guess I better wrap my head around your optimized binary search.

Oops the optimized binary search2 works only for array sizes of powers of 2 :frowning:
updated post above that it is incorrect.
lets try to fix BS2


binarySearch1 does work correctly for all values in the array. (tested with size 400)

int binarySearch1(long val, long arr[], int sz)
{
  int lowerbound = 0;
  int upperbound = sz;
  int pos = -1;
  int tpos;

  tpos = (lowerbound + upperbound) / 2;
  while ((arr[tpos] != val) && (lowerbound <= upperbound))
  {
    if (arr[tpos] > val)
    {
      upperbound = tpos - 1;
    }
    else
    {
      lowerbound = tpos + 1;
    }
    tpos = (lowerbound + upperbound) / 2;
  }
  if (lowerbound <= upperbound) pos = tpos;
  return pos;
}

This is the version I came up with based on the sample I found on the web.
y_valid is a global variable in my sketch.

int binarySearch(long array[], int valLo, int valHi, long val) {

    int index;
 
    index = (valLo + valHi) / 2; // find array mid position

    while((array[index] != val) && (valLo <= valHi)) {
        if (array[index] > val) valHi = index - 1;  // decrease Hi position by 1  
        else valLo = index + 1; // increase Lo position by 1
        index = (valLo + valHi) / 2;
    }

    if (valLo <= valHi) y_valid = true;
    else y_valid = false;
    return index;
}

Took some time to rethink how the strategy for “power 2 sized” arrays could be applied to any array while keeping the performance gain.

In short it cannot,

but with some tweaks I got a slightly faster version than binarysearch1() above.
I share this code for educational purposes only.

// assumes binary search of a bigger array size power 2
// extended use of binary masks
int binarySearch2(long val, long arr[], int sz)
{
  // special case
  if (arr[0] == val) return 0;

  uint16_t pos = 256;  // biggest power of 2 < sz     // sz == 400 in my test
  uint16_t step = 256;

  while (step)
  {
    if (pos >= sz)
    {
      // step /= 2;
      // pos -= step;
      pos ^= step;
      step /= 2;
      pos |= step;
      continue;
    }
    if (arr[pos] < val)
    {
      // step /= 2;
      // pos += step;
      step /= 2;
      pos |= step;
      continue;
    }
    if (arr[pos] > val)
    {
      // step /= 2;
      // pos -= step;      
      pos ^= step;
      step /= 2;
      pos |= step;
      continue;
    }
    return pos;
  }
  return -1;
}

The working:

The algorithm is based upon an assumption that there exist a bigger array which is a power of 2. The algorithm defines a position pos which is the middle element of that bigger array. This is hard coded for an array of size 400 in the above snippet. This makes it less portable. Furthermore the algorithm defines a step size which halves at every iteration.

This works as every number can be written as 2n ± 2n-1 ± … ± 21

e.g. 17 = 16 + 8 - 4 - 2 - 1

except for the number 0 so that index is tested as a special case in the begin.

A problem that will occur is that pos, which is a valid position in the bigger array, is not a valid position in the array under test. So for that case a separate test is made.

instead of adding and subtracting step to/ from pos, bitmask techniques are used as both pos and step are powers of 2. This squeezes each iteration to minimum

for completeness some performance numbers: arraysize 400, every value of the array is tested, so 400 calls

Start searchCompare.ino
linearA:	213156
linearB:	102248
 binair:	9912
 binair:	9360

ELEMENT NOT IN ARRAY

linearA:	212824
linearB:	202380
 binair:	10880
 binair:	10672

 done...

so just 2-5% faster, not enough to justify unmaintainable code :wink: