Find "which bit is high" quickly

Is there a faster and more elegant way to "find which bit is high"? Or more accurately "find the first high bit" as after one is found, it exits, which is fine.

byte test (byte someRandomByte)
{
  for (byte b = 0; b < 8; b++)
  {
    if (bitRead(someRandomByte, b))
      return b;
  }
  return 255;//or whatever
}

You could check to make sure the value is greater than 0 to make sure there is a high bit.

bitRead() isn't a function, it's a Macro that uses bit shifting. I doubt there's a much faster method than using it.

 bitRead(value, bit) (((value) >> (bit)) & 0x01)

Could instead be 8 contigous if statements :

byteRead(someRandomByte);

if (someRandomByte > 127) return (7); // bit 7 is set
if (someRandomByte > 63) return (6); // bit 6 is set
if (someRandomByte > 31) return (5); // bit 5 is set
if (someRandomByte > 15) return (4); // bit 4 is set
if (someRandomByte > 7) return (3); // bit 3 is set
if (someRandomByte > 3) return (2); // bit 2 is set
if (someRandomByte > 1) return (1); // bit 1 is set
if (someRandomByte > 0) return (0); // bit 0 is set

If you truly want fast, and damn the cost, a 256-case switch statement is likely as fast as it gets. But a more reasonable approach with reasonable speed and size would be a binary search:

byte test (byte someRandomByte)
{
byte result = 0;

    if (someRandomByte > 15)
    {
        result = 4;
        xsomeRandomByte >>= 4;
    }
    if (someRandomByte > 3)
    {
        result += 2;
        someRandomByte >>= 2;
    }
    if (someRandomByte)
        result++;
    return result;
}

Regards,
Ray L.

There's some logic errors in Anders53's code, but the if-statements are far faster than the for-loop. Assuming 20microseconds is really too long for you to wait.

if-statements: 4 micros, 0
for loop: 20 micros, 7

tested using this sketch

void setup() {
  // put your setup code here, to run once:
 Serial.begin(9600);
 
 Serial.print("if-statements: ");
 Serial.flush();
 unsigned long microsBefore;
 unsigned long microsAfter;
 int val;
 
 microsBefore = micros();
 val = ifStatements(0x01);
 microsAfter = micros();
 Serial.print(microsAfter - microsBefore);
 Serial.print(" micros, "); 
 Serial.println(val);
 Serial.flush();

 Serial.print("for loop: ");
 Serial.flush();
 microsBefore = micros();
 val = forLoopTest(0x80);
 microsAfter = micros();
 Serial.print(microsAfter - microsBefore);
 Serial.print(" micros, "); 
 Serial.println(val);
 Serial.flush();

while(1); 
  
}

int ifStatements(byte someRandomByte) {

if (someRandomByte > 127) return 7; // bit 7 is set
if (someRandomByte > 63) return 6; // bit 6 is set
if (someRandomByte > 31) return 5; // bit 5 is set
if (someRandomByte > 15) return 4; // bit 4 is set
if (someRandomByte > 7) return 3; // bit 3 is set
if (someRandomByte > 3) return 2; // bit 2 is set
if (someRandomByte > 1) return 1; // bit 1 is set
if (someRandomByte > 0) return 0; // bit 0 is set
return 0;
}

 
 int forLoopTest (byte someRandomByte)
{
  for (byte b = 0; b < 8; b++)
  {
    if (bitRead(someRandomByte, b))
      return b;
  }
  return 255;//or whatever
}

void loop() {
  // put your main code here, to run repeatedly:

}

Not code, just 8 contigous if statements to demonstrate the idea, which you read loud and clear 8)
Assuming you would average 4 if statements to find the high bit, you would have an average time lapse of just 2 µS to run through it.

I can't promise that it's fast (I guess not), but perhaps this might lead to something else that is...

void setup() {
  Serial.begin(9600);
  for (int singleBit = 1; singleBit < 256; singleBit = singleBit << 1)
  {
    Serial.print("singleBit = ");
    Serial.print(singleBit);
    Serial.print(" position = ");
    Serial.println(bitToByteFast (singleBit));
  }
}

void loop() {
}

byte bitToByteFast (byte b)
{
  byte result = (b & 0b10101010) != 0;
  result +=     ((b & 0b11001100) != 0)<<1;
  result +=     ((b & 0b11110000) != 0)<<2;
  
  return result;
}

But it doesn't find the first high bit, it finds a single bit.

If you want to do it by comparison, which does meet the criterion, you have to decide whether you want to have a search that is balanced, or unbalanced. If you start looking "from the top" you will quickly find high values, but low values can take up to the maximum number of comparisons. If you want a balanced search, you can find it in a fixed number(3 in this case) of comparisons, by first testing whether it is greater than half the value, then split again by half of that and so on...

Thanks guys, that answers everything for me. I wasn't sure if there was some basic bit math that would simply calculate it somehow instead of searching.

If you don't mind searching from the least-significant end, try the standard library function ffs(). It's likely to be faster and/or smaller than anything you or I would write in C++.

RayLivingston:
If you truly want fast, and damn the cost, a 256-case switch statement is likely as fast as it gets.

A 256 element lookup table in progmem would solve the problem in ~3 instructions.

Delta_G:
Looks to me like the return value would be the bit position of the most significant bit that is set.

unfortunately, no. Here is some output:

singleBit = 30 position = 7
singleBit = 31 position = 7
singleBit = 32 position = 5
singleBit = 33 position = 5
singleBit = 34 position = 5
singleBit = 35 position = 5
singleBit = 36 position = 7
singleBit = 37 position = 7

By the way, some processors have a "find the first high bit" instruction. But not the AVR.

I came up with

byte findBit(byte b){
    byte i;
    b ^= 0xFF;
    while(b & 0x01){
        i++;
        b >>= 1;
    }
    return i;
}

Compared all the suggestions here

0b1
findBit => 0 in 4us
forLoopTest => 0 in 4us
binarySearch => 1 in 4us
ifStatements => 0 in 4us
bitToByteFast => 0 in 4us
0b10
findBit => 1 in 4us
forLoopTest => 1 in 4us
binarySearch => 1 in 4us
ifStatements => 1 in 4us
bitToByteFast => 1 in 4us
0b100
findBit => 2 in 4us
forLoopTest => 2 in 8us
binarySearch => 3 in 4us
ifStatements => 2 in 4us
bitToByteFast => 2 in 4us
0b1000
findBit => 3 in 4us
forLoopTest => 3 in 8us
binarySearch => 3 in 4us
ifStatements => 3 in 4us
bitToByteFast => 3 in 4us
0b10000
findBit => 4 in 4us
forLoopTest => 4 in 12us
binarySearch => 5 in 4us
ifStatements => 4 in 4us
bitToByteFast => 4 in 4us
0b100000
findBit => 5 in 4us
forLoopTest => 5 in 12us
binarySearch => 5 in 4us
ifStatements => 5 in 4us
bitToByteFast => 5 in 4us
0b1000000
findBit => 6 in 4us
forLoopTest => 6 in 16us
binarySearch => 7 in 4us
ifStatements => 6 in 4us
bitToByteFast => 6 in 4us
0b10000000
findBit => 7 in 4us
forLoopTest => 7 in 20us
binarySearch => 7 in 4us
ifStatements => 7 in 12us
bitToByteFast => 7 in 4us

So all except the forLoopTest from the OP post they are fast. Only thing, binairySearch does not give a correct answer all the time :stuck_out_tongue:

Foot note, micros has a stepsize of 4us. A more accurate way would be with the reading from the timer but I'm to lazy for that 8)

Code for the test:

void setup(){
    byte n, f;
    unsigned long tic;
    unsigned long toc;
    Serial.begin(9600);
    
    for(byte i = 0; i < 8; i++){
        f = 0x01 << i; 
        
        Serial.print("0b");
        Serial.println(f, BIN);
       
        tic = micros(); 
        n = findBit(f);
        toc = micros();
        
        Serial.print("findBit");
        printTicToc(tic, toc, n);
        
        tic = micros(); 
        n = forLoopTest(f);
        toc = micros();
        
        Serial.print("forLoopTest");
        printTicToc(tic, toc, n);
        
        tic = micros(); 
        n = binarySearch(f);
        toc = micros();
        
        Serial.print("binarySearch");
        printTicToc(tic, toc, n);
        
        tic = micros(); 
        n = ifStatements(f);
        toc = micros();
        
        Serial.print("ifStatements");
        printTicToc(tic, toc, n);
        
        tic = micros(); 
        n = bitToByteFast(f);
        toc = micros();
        
        Serial.print("bitToByteFast");
        printTicToc(tic, toc, n);
        
    }
        
       
}

void loop(){
    
}

void printTicToc(unsigned long tic, unsigned long toc, byte n){
    Serial.print(" => ");
    Serial.print(n);
    Serial.print(" in ");
    Serial.print(toc - tic);
    Serial.println("us");
}

byte findBit(byte b){
    byte i;
    b ^= 0xFF;
    while(b & 0x01){
        i++;
        b >>= 1;
    }
    return i;
}
    
byte forLoopTest(byte someRandomByte)
{
  for (byte b = 0; b < 8; b++)
  {
    if (bitRead(someRandomByte, b))
      return b;
  }
  return 255;//or whatever
}

byte binarySearch(byte someRandomByte){
    byte result = 0;

    if (someRandomByte > 15)
    {
        result = 4;
        someRandomByte >>= 4;
    }
    if (someRandomByte > 3)
    {
        result += 2;
        someRandomByte >>= 2;
    }
    if (someRandomByte)
        result++;
    return result;
}

byte ifStatements(byte someRandomByte) {

    if (someRandomByte > 127) return 7; // bit 7 is set
    if (someRandomByte > 63) return 6; // bit 6 is set
    if (someRandomByte > 31) return 5; // bit 5 is set
    if (someRandomByte > 15) return 4; // bit 4 is set
    if (someRandomByte > 7) return 3; // bit 3 is set
    if (someRandomByte > 3) return 2; // bit 2 is set
    if (someRandomByte > 1) return 1; // bit 1 is set
    if (someRandomByte > 0) return 0; // bit 0 is set
    return 0;
}

byte bitToByteFast (byte b)
{
  byte result = (b & 0b10101010) != 0;
  result +=     ((b & 0b11001100) != 0)<<1;
  result +=     ((b & 0b11110000) != 0)<<2;
 
  return result;
}

I wonder how the if method could be so lazy returning bit 7 value .... ???

Quite fun to see a comparison of methods :sunglasses:

The GNU compiler has a builtin macro for counting bits: __builtin_popcount (unsigned int); it is based on an old bit hack, invented by MIT in the early 1970s; if you want the lowest one bit in an 8 bit int, you can do the following:

int lowest(uint8_t b) {
   if (!b) return 0; // no one bit here ...
   b^= b-1; // contains n consecutive one bits where n is the bit position we want;

   // count the bits
   b= ((b>>1)&0x55)+(b&0x55);
   b= ((b>>2)&0x33)+(b&0x33);
   b= ((b>>4)&0x0F)+(b&0x0F);

   return b;
}

This little method doesn't buy you much, but it gets better for 16 or 32 bit words (one or two lines need to be added).

kind regards,

Jos

gardner:
A 256 element lookup table in progmem would solve the problem in ~3 instructions.

The suggested set of 8 tests above, in itself generates 87 bytes of code, so that's 87 bytes gone anyway. The lookup table would take 256 bytes, but would certainly be the fastest. Plus it's more general. You can look up any fact about a byte with a lookup table.

In fact I believe the standard library functions isdigit, isxdigit and so on, share a 256-byte lookup table. One bit says if it is a digit, another if it is a hex digit, another if it is alpha, another if it is a space, and so on.

These, as presented, fail validation...
Original post
Reply #3
Reply #8
Reply #12
Reply #14

I modified these to pass validation...
Original post
Reply #14

Several fail when the parameter is zero which was not included in the validation. I modified the code in Reply #2 to return 255 when the parameter is zero.

Array lookup from SRAM is fastest at an average of 0.377µs.

Array lookup from Flash is second at an average of 0.440µs.

The (modified) code in Reply #2 is third at an average of 1.193µs.

static byte Test2F( byte someRandomByte )
{
  if (someRandomByte > 127) return (7); // bit 7 is set
  if (someRandomByte > 63) return (6); // bit 6 is set
  if (someRandomByte > 31) return (5); // bit 5 is set
  if (someRandomByte > 15) return (4); // bit 4 is set
  if (someRandomByte > 7) return (3); // bit 3 is set
  if (someRandomByte > 3) return (2); // bit 2 is set
  if (someRandomByte > 1) return (1); // bit 1 is set
  if (someRandomByte > 0) return (0); // bit 0 is set
  return( 255 );
}

What is this "validation" of which you speak? :wink:

Array lookup from Flash is second at an average of 0.440µs.

Sounds about right:

 1c4:   e8 e6           ldi     r30, 0x68       ; 104   (1)
 1c6:   f0 e0           ldi     r31, 0x00       ; 0   (1)
 1c8:   e8 0f           add     r30, r24   (1)
 1ca:   f1 1d           adc     r31, r1   (1)
 1cc:   e4 91           lpm     r30, Z   (3)

Array lookup from SRAM is fastest at an average of 0.377µs.

Well, 0.375 µS, but close enough. :slight_smile:

[quote author=Nick Gammon date=1428648012 link=msg=2180553]
In fact I believe the standard library functions isdigit, isxdigit and so on, share a 256-byte lookup table. One bit says if it is a digit, another if it is a hex digit, another if it is alpha, another if it is a space, and so on.[/quote]
avr-libc-1.8.1/libc/stdlib/ctype.S looks to be doing things the hard way. It must be more compact than using the table that would be used in most non-memory-constrained platforms.. Here's an example:

          ASSEMBLY_CLIB_SECTION
          FUNCTION(ispunct)

GLOBAL(ispunct)
        cpse    rHigh, __zero_reg__
1:      rjmp    _U(__ctype_isfalse)
        subi    rLow, ' ' + 1
        subi    rLow, 0x7e - ' '
        brsh    1b                      ; if (!isgraph(c)) return 0
        subi    rLow, lo8(-0x7e - 1)    ; restore rLow
        rcall   _U(isalnum)
        tst     rLow
        brne    1b                      ; if (isalnum(c)) return 0
        ldi     rLow, 1
        ret

          ENDFUNC

Let's call it a "rough" validation (it only checks values with one bit high)...

static void ValidateOne( __FlashStringHelper const * label, TTestFunction ValidateThis )
{
  byte i1;
  byte i2;
  byte i3;
  
  i1 = 1;
  i2 = 0;
  
  while ( i1 != 0 )
  {
    i3 = ValidateThis( i1 );

    if ( i3 != i2 )
    {
      Serial.print( label );
      Serial.write( '\t' );
      Serial.print( i1 );
      Serial.write( '\t' );
      Serial.print( i2 );
      Serial.write( '\t' );
      Serial.print( i3 );
      Serial.print( F( " - validation failed" ) );
      Serial.println();
    }
    i1 = i1 << 1;
    ++i2;
  }
  Serial.flush();
}

I figured if it could pass that test it was close enough to run a timing test.

:grin:

Probably a little bias from not bothering to use a perfectly balanced dataset.

On the plus side, I did not have to look at any assembly. :wink: