counting '1' in a byte

just asking question out there to see if there is any better.

I’m reading a Port directly on a UNO then using a for loop to count how may bits were ‘1’ in the read byte.

Is this the only way of doing it or is there a more efficient way?

byte x = PINB;
byte B_cnt=0;

for(uint8_t i=0; i<8;++i){
if(x&0x01) ++B_cnt;
x>>=1;
}

Define what you mean by efficient !

An array with 256 entries, each with the number of corresponding 1s would be efficient in terms of speed but not memory, unless that does not matter.

@OP

Your codes work well; however, I would be glad should your codes stand at more literate side having comments, treating Port-B as a 6-bit input port as there are only 6 active bits and if() structure, preferably, be followed by {}.

``````void setup()
{
Serial.begin(9600);
DDRB = 0x00;     //inputs: PB5 - PB0 = 010101

byte x = PINB;
byte B_cnt = 0;

for (uint8_t i = 0; i<6; i++)//i < 8; ++i)
{
if (x & 0x01) //checking PINB0, PIB1, ..., PINB5; x remains undisturbed
{
++B_cnt;   //LH found; increase counter
}
x >>= 1;   //x is shifted right by 1-bit and then assigned back to x
}

Serial.println(B_cnt);

}

void loop() {
// put your main code here, to run repeatedly:

}
``````

Alternative Sketch with 7 lines of Codes

``````void setup()
{
Serial.begin(9600);
DDRB = 0x00;     //inputs: PB5 - PB0 = 010101
byte B_cnt = 0;

for (int i = 0; i < 6; i++)
{
{
++B_cnt;
}
}
Serial.println(B_cnt); //shows: 3
}

void loop() {
// put your main code here, to run repeatedly:

}
``````

Alternative Sketch with 7 lines of Codes

The number of lines of source code is not relevant

For instance, is this more efficient than your 7 line code ?

``````void setup()
{Serial.begin(9600);DDRB = 0x00;byte B_cnt = 0;for (int i = 0; i < 6; i++){if (bitRead(PINB, i) == HIGH){++B_cnt;}}Serial.println(B_cnt);}

void loop() {}
``````

UKHeliBob:
Define what you mean by efficient !

An array with 256 entries, each with the number of corresponding 1s would be efficient in terms of speed but not memory, unless that does not matter.

efficient either in terms of speed and memory usage. its any open question on how to count the 1’s!

another way of count the 1’s in the byte would be this:

for(uint8_t i=1; i>0;i<<=1){
if(x&i) ++B_cnt;
}

so… of these 2 (of course other options are welcomed!) which one is more efficient from a compiler perpective (speed or memory). is there an even better way of count the 1’s?

option1
for(uint8_t i=0; i<8;++i){
if(x&0x01) ++B_cnt;
x>>=1;
}

option2
for(uint8_t i=1; i>0;i<<=1){
if(x&i) ++B_cnt;
}

UKHeliBob:
The number of lines of source code is not relevant

For instance, is this more efficient than your 7 line code ?

``````void setup()
``````

{Serial.begin(9600);DDRB = 0x00;byte B_cnt = 0;for (int i = 0; i < 6; i++){if (bitRead(PINB, i) == HIGH){++B_cnt;}}Serial.println(B_cnt);}

void loop() {}

Do you want to say that there is only one line of code in the above quote? I have learnt to count the lines of the source code as:

line that has ; termination
line that is a control structure

The line counting of the compiler is different from that of human counting?

I think it is a little more efficient this way:

``````byte x = PINB;
byte B_cnt = 0;

for (; x; x >>= 1) {
if (x & 1) {
B_cnt++;
}
}
``````

Is this the only way of doing it...

...or is there a more efficient way?

Given the lack of a barrel shifter some variation of your code (e.g. @Whandall's) is very likely the most overall efficient.

If the compiler has a specific optimization then shifting left and comparing to less-than-zero may reduce to the smallest possible code (which will also very likely be the fastest).

good to know! thanks for the feedback @Coding Badly, @Whandall

Comparison of execution times of various Programs Presented so far:

OP’s Option-1 (OP) 4 cycles ----------> 4* 1/16106 = 0.25 us
Post#2 9 cycles ----------> 9
1/16106 = 0.5625 uS
Post#4 (Option-2) 4 cycles ----------> 4
1/16106 = 0.25 us
Post#6 32 cycles ------------> 32
1/16*106 = 2.00 us

To measure execution time of a sketch, the TC1 is made ON at 16 MHz clock just at the beginning of execution. The TC1 is made OFF just after finishing the execution of the program. The content of TCNT1 is directly proportional to program execution time. The codes are given below:

``````void  setup()
{
Serial.begin(9600);
//Timer-1 as a n Internal Pulse Counter----
TCCR1A = 0x00;   //Normal UpCounting Mode
TCCR1B = 0x00;  //TC1 is OFF is OFF
TCNT1 = 0x00;   //Initial Value
TCCR1B = 0x01;  //TC1 is ON with 16 MHz clkTC1
//------------------------
//32 cycles Post#5
DDRB = 0x00;     //inputs: PB5 - PB0 = 010101
byte x = PINB;
byte B_cnt = 0;

for (; x; x >>= 1)
{
if (x & 1)
{
B_cnt++;
}
}

TCCR1B = 0x00;    //TC1 is OFF
Serial.println(TCNT1, DEC);  //execution time as cycles

/* 4 cycles    Post#4
DDRB = 0x00;     //inputs: PB5 - PB0 = 010101
byte x = PINB;
byte B_cnt = 0;

for (uint8_t i = 1; i > 0; i <<= 1)
{
if (x & i) ++B_cnt;
}

/* 9 cycles   Post#2
DDRB = 0x00;     //inputs: PB5 - PB0 = 010101
byte x = PINB;
byte B_cnt = 0;
for (int i = 0; i < 6; i++)
{
{
++B_cnt;
}

/* 4 cycles   OP's Option-1
DDRB = 0x00;     //inputs: PB5 - PB0 = 010101
byte x = PINB;
byte B_cnt = 0;

for (uint8_t i = 0; i < 8; ++i)
{
if (x & 0x01) ++B_cnt;
x >>= 1;
}
*/
//------------------------

}

void loop()
{

}
``````

Have you tried __builtin_popcount()?

DKWatson:
Have you tried __builtin_popcount()?

did not know if that nifty function until now. Thanks for sharing!

If you're so inclined, there's a rather neat function (which may even be behind popcount)

``````uint8_t bitcount(uint8_t b)
{
b = b - ((b >> 1) & 0x55);
b = (b & 0x33) + ((b >> 2) & 0x33);
return (((b + (b >> 4)) & 0x0F) * 0x01);
}
``````

GolamMostafa:
Do you want to say that there is only one line of code in the above quote? I have learnt to count the lines of the source code as:

line that has ; termination
line that is a control structure

The line counting of the compiler is different from that of human counting?

It does not matter how you count the number of lines it does not relate to the efficiency of the code.

As to how the compiler counts lines, try introducing an error into the long line. The compiler will report an error in line 3

``````sketch_sep26b:3: error: 'XHIGH' was not declared in this scope
``````

GolamMostafa:
Do you want to say that there is only one line of code in the above quote? I have learnt to count the lines of the source code as:

line that has ; termination
line that is a control structure

The line counting of the compiler is different from that of human counting?

Or a line terminates in a newline character.

Perhaps you're confusing lines with statements and expressions.

The for and if are not needed:

``````byte x = PINB;
byte B_cnt = 0;

while(x)
{
B_cnt += x & 0x01;
x >>= 1;
}
``````

Regards,
Ray L.

RayLivingston:
The for and if are not needed:

``````byte x = PINB;
``````

byte B_cnt = 0;

while(x)
{
B_cnt += x & 0x01;
x >>= 1;
}

Execution time of your codes is: 27 cycles = 27 1/16106 = 1.6875 us

So far, we have executed time as low as 0.25 us (Post#9).

GolamMostafa:
Execution time of your codes is: 27 cycles = 27 1/16106 = 1.6875 us

So far, we have executed time as low as 0.25 us (Post#9).

You seem to be looking at clocks per iteration, while ignoring the number of iterations. My code will iterate a different number of times, from 0 to 8, depending on the input data, so it does NOT have a single execution time. All the other solutions I saw, except table lookup, will iterate 8 times, even if NONE of those change the output at all.
Regards,
Ray L.

RayLivingston:
All the other solutions I saw, except table lookup, will iterate 8 times, even if NONE of those change the output at all.

Then you did not look at that:

Whandall:
I think it is a little more efficient this way:

``````byte x = PINB;
``````

byte B_cnt = 0;

for (; x; x >>= 1) {
if (x & 1) {
B_cnt++;
}
}

RayLivingston:
You seem to be looking at clocks per iteration, while ignoring the number of iterations. My code will iterate a different number of times, from 0 to 8, depending on the input data, so it does NOT have a single execution time. All the other solutions I saw, except table lookup, will iterate 8 times, even if NONE of those change the output at all.

Thank you very for reminding the very important issue. I have now connected LH at all the input pins of Port-B and then computed the execution time. It is now 1862 cycles (116.375 us). Previous execution time was for the input values of 101000 (B_cnt = 2).

``````void  setup()
{
Serial.begin(9600);
//Timer-1 as a n Internal Pulse Counter----
TCCR1A = 0x00;   //Normal UpCounting Mode
TCCR1B = 0x00;  //T1 is OFF
TCNT1 = 0x00;   //Initial Value
TCCR1B = 0x01;  //16 MHz
//------------------------
//1862 cycles Post#15
DDRB = 0x00;     //inputs: PB5 - PB0 = 111111
byte x = PINB;
Serial.println(x, HEX);//111111 = 3F
byte B_cnt = 0;

while (x)
{
B_cnt += x & 0x01;
x >>= 1;
}

TCCR1B = 0x00;
Serial.println(B_cnt); //6
Serial.println(TCNT1, DEC);  //execution time as cycles
}

void loop()
{

}
``````