Why use & 0xFF when bitshifting?

In a typical case I might be bitshifting a uint16_t to get two bytes:

uint8_t a = (uint8_t)((memAddress >> 8) & 0xFF);
uint8_t b = (uint8_t)(memAddress & 0xFF);

This is fine, but when I'm casting to uint8_t, why do I need to & with 0xFF? It seems to be standard practice, and I'm sure I'm missing something, but I can't find any useful explanation on the net.

Why shouldn't I just:

uint8_t a = (uint8_t)(memAddress >> 8));
uint8_t b = (uint8_t)memAddress;

Thanks in advance!

jmusther:

uint8_t a = (uint8_t)(memAddress >> 8));

uint8_t b = (uint8_t)memAddress;

1. Given that:
int memAddress = 0x1234;

2. I am not personally sure whether this operation: uint8_t b = (uint8_t)memAddress; yields the lower byte of memAddress (0x34), even though 'Serial.print(b, HEX)' shows 0x34. I always write it like this: uint8_t b = (uint8_t)memAddress & 0x00FF; to be sure that I am getting the lower 8 bits.

3. Some users prefer to write this: uint8_t a = (uint8_t)((memAddress >> 8) & 0xFF); rather than this: uint8_t a = (uint8_t)(memAddress >> 8); because they are not sure that a right shift fills in 0s from the left.

4. There are alternative ways to extract the high byte and low byte from a 16-bit value:

int memAddress = 0x1234;
uint8_t a = highByte(memAddress);
uint8_t b = lowByte(memAddress);

It's stripping the higher bits before casting.

This is important if you're dealing with signed numbers. For unsigned, I don't think it makes a difference.
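To make the signed case concrete, here is a minimal host-side sketch in standard C++ (not Arduino-specific; the function names are made up for this illustration). It assumes the high byte ends up in something wider than 8 bits, which is where the mask actually changes the result. When the destination is a uint8_t, the conversion discards the upper bits anyway.

```cpp
#include <cstdint>

// Hypothetical helpers, named only for this illustration.

// High byte of a signed 16-bit value, kept in an int WITHOUT masking.
// For negative inputs the bits above bit 7 are not all zero after the
// shift (the exact result of right-shifting a negative value was
// implementation-defined before C++20).
constexpr int high_unmasked(std::int16_t v) { return v >> 8; }

// Same shift, but masked: only bits 0..7 survive, so the result is 0..255.
constexpr int high_masked(std::int16_t v) { return (v >> 8) & 0xFF; }

// With a uint8_t destination the conversion already drops the upper
// bits, so the mask would add nothing here.
constexpr std::uint8_t high_as_byte(std::int16_t v) {
  return static_cast<std::uint8_t>(v >> 8);
}
```

For example, -0x1234 has the 16-bit pattern 0xEDCC, so high_masked and high_as_byte both give 0xED, while high_unmasked gives some other value with extra high bits set.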

jmusther:
In a typical case I might be bitshifting a uint16_t to get two bytes:

uint8_t a = (uint8_t)((memAddress >> 8) & 0xFF);
uint8_t b = (uint8_t)(memAddress & 0xFF);

This is fine, but when I'm casting to uint8_t, why do I need to & with 0xFF? It seems to be standard practice, and I'm sure I'm missing something, but I can't find any useful explanation on the net.

Why shouldn't I just:

uint8_t a = (uint8_t)(memAddress >> 8));
uint8_t b = (uint8_t)memAddress;

You are absolutely right. In this case there's no point in doing & 0xFF whatsoever. Your version is perfectly correct (aside from some () balance issues). And you can even get rid of the explicit cast:

uint8_t a = memAddress >> 8; // higher-significance byte
uint8_t b = memAddress;      // lower-significance byte

(The code is valid without casts, but some compilers might issue warnings. The casts might be needed to suppress them.)
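A quick way to convince yourself is to put both styles side by side. The following is only a sketch in plain standard C++ (so it can be checked on a PC rather than on the board), and the helper names are invented for the comparison:

```cpp
#include <cstdint>

// Split with the mask and the explicit cast, as in the original question.
constexpr std::uint8_t high_with_mask(std::uint16_t v) {
  return static_cast<std::uint8_t>((v >> 8) & 0xFF);
}
constexpr std::uint8_t low_with_mask(std::uint16_t v) {
  return static_cast<std::uint8_t>(v & 0xFF);
}

// Split relying only on the implicit conversion to uint8_t.
constexpr std::uint8_t high_plain(std::uint16_t v) { return v >> 8; }
constexpr std::uint8_t low_plain(std::uint16_t v) { return v; }

// Reassemble the two bytes to check that nothing was lost.
constexpr std::uint16_t join_bytes(std::uint8_t hi, std::uint8_t lo) {
  return static_cast<std::uint16_t>((hi << 8) | lo);
}
```

For an unsigned 16-bit input both pairs of functions return the same bytes, and join_bytes rebuilds the original value from either pair.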


As for that "standard practice" you are talking about...

In some cases it is just a classic example of a cargo cult. People do it because they saw it somewhere, maybe in a more appropriate context. And now they just mindlessly parrot it without realizing that this is completely unnecessary. In a way, it is like casting the result of malloc in C programming.

In other cases it is just lack of knowledge of language features, which begets fear of "something wrong happening". People feel more safe and confident with this & 0xFF in their code than without it, so they do it just in case. I remember that back in the day some people used to spray floppy disks with a bug spray to ward off computer viruses.

GolamMostafa:
1. Given that:
int memAddress = 0x1234;

2. I am not personally sure if this operation:

uint8_t b = (uint8_t)memAddress;

provides the lower byte of the memAddress, which is 0x34 though ‘Serial.print(b, HEX)’ shows 0x34.

While you are “not personally sure”, the C and C++ programming languages explicitly guarantee that

uint8_t b = memAddress;

initializes b with the value of memAddress modulo 2^8 = 256. And that is 0x34.

Montmorency:

uint8_t b = memAddress;

My co-worker will immediately ask me the question:

memAddress is a 16-bit variable; how can we assign it to an 8-bit variable 'b'? He does not know all that goes on in the background of the C compiler. We need a simple and rational answer that will satisfy the co-worker.

That's why the concept of 'casting' is here. We execute the following code to force the compiler to take the lower 8 bits of memAddress and assign them to the variable 'b'. I know what I am doing, so I am safe?

uint8_t b = (uint8_t)memAddress;

GolamMostafa:
My co-worker will immediately ask me the question:

memAddress is a 16-bit variable; how can we assign it to an 8-bit variable ‘b’? He does not know all that goes on in the background of the C compiler. We need a simple and rational answer that will satisfy the co-worker.

When you attempt to store a value of one type in an object of another type in C++ or C (by assignment or initialization), two outcomes are possible:

  1. The code is invalid (“a compile error”).
  2. The code is perfectly valid. The language will use an implicit conversion, a so-called Standard Conversion, to transform the value from the source type to the destination type.

Standard conversions in C++ are performed implicitly. You don’t have to explicitly invoke them through a cast. You can use a cast if you want, but it is not required. There’s a well-defined finite list of allowable Standard Conversions in C++.

So, when such a type mismatch occurs, the language looks for a suitable Standard Conversion. If a suitable Standard Conversion exists, it is used. If it doesn’t exist, the code is rejected by the compiler.

For example:

int *pi = nullptr;
void *pv1 = pi;    // OK
void *pv2 = 5;     // Error

The initialization of pv1 is valid, since C++ language has Standard Conversion from int * to void *. The initialization of pv2 is invalid, since there’s no Standard Conversion from int to void *.

In our particular case we are initializing a uint8_t variable with a uint16_t value. Standard Conversions that take place in such contexts do exist. They are called Integral Conversions.

The language specification says that the result of such a conversion is the unique value of the destination type that is congruent to the original value modulo 2^N, where N is the bit-width of the destination type. (Congruence is a term from modular arithmetic.)

In this case N = 8, 2^N = 256, and “congruent” means that we simply divide the original value by 256 and take the remainder. This is why this code

uint16_t memAddress = 0x1234;
uint8_t b = memAddress;

is guaranteed to initialize b with 0x1234 % 256 = 0x34.
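Spelled out: 0x1234 = 4660 = 18 × 256 + 52, and 52 = 0x34. If you want to see the three ways of writing that reduction next to each other, here is a small sketch in standard C++ (the function names are hypothetical, chosen just for this comparison):

```cpp
#include <cstdint>

// Implicit integral conversion to an 8-bit unsigned type:
// keeps the value modulo 256.
constexpr std::uint8_t to_byte(std::uint16_t v) { return v; }

// The same reduction written as a remainder and as a bit mask.
constexpr unsigned mod_256(std::uint16_t v) { return v % 256u; }
constexpr unsigned mask_ff(std::uint16_t v) { return v & 0xFFu; }

// The rule also covers signed sources: -1 is congruent to 255 modulo 256.
constexpr std::uint8_t byte_from_int(int v) {
  return static_cast<std::uint8_t>(v);
}
```

All three unsigned variants agree for every input, which is the reason the extra & 0xFF changes nothing when the destination is already a uint8_t.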

The above quote is taken from the C++ standard. The C standard has virtually the same specification, albeit worded differently.

GolamMostafa:
That’s why the concept of ‘casting’ is here.

“Casting” does not change anything here. Casting performs the very same conversion as I described above. The only difference is that with a cast the conversion is done explicitly, and without a cast it is done implicitly. The conversion itself is exactly the same in both cases and the result is exactly the same.

Cool and Karma.

jmusther:
Why shouldn't I just:

uint8_t a = (uint8_t)(memAddress >> 8));

...

Because...

sketch_nov26a:5:41: error: expected ',' or ';' before ')' token

   uint8_t a = (uint8_t)(memAddress >> 8));

Montmorency:
The code is valid without casts, but some compilers might issue warnings.

...might issue errors...

sketch_nov26a:10:44: error: conversion to 'uint8_t {aka unsigned char}' from 'uint16_t {aka unsigned int}' may alter its value [-Werror=conversion]

   uint8_t a = /*(uint8_t)*/(memAddress >> 8);

                                            ^

sketch_nov26a:11:28: error: conversion to 'uint8_t {aka unsigned char}' from 'uint16_t {aka unsigned int}' may alter its value [-Werror=conversion]

   uint8_t b = /*(uint8_t)*/memAddress;

@jmusther, do the person who will maintain your code (the future you) a favour and keep the explicit casts. If nothing else it makes the intent crystal clear and the price for including the casts is zero.

The cost is not zero, the cost is massive. And it is exactly the maintenance cost that will suffer badly. Imagine, for example, what happens if one day someone has to change the types involved. They will not only have to change the declarations, they will also have to locate and change all the casts!

Good programming practice is to keep the code as type-agnostic as possible. This is also known as the DRY principle (Don't Repeat Yourself). Avoid mentioning specific type names in the code unless it is absolutely necessary. Type names should be used in declarations to specify the type of the object being declared. This is where they belong, and nowhere else. Do not use explicit type names for any other purpose besides declaring new entities. And this also means: no annoying unnecessary casts.

Modern C++ is actually making huge efforts to help you evolve in that direction. It introduced a number of new features, like the auto and decltype keywords, which are dedicated to one purpose only: to help you write type-agnostic code, to avoid mentioning specific type names in your code, to help you follow the DRY principle. And this is very good.
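As a rough illustration of what that can look like for the byte-splitting case, here is a sketch where the byte type is named once, in a declaration, and everything else refers back to it through decltype. The AddressBytes struct and split_address function are hypothetical, invented for this example:

```cpp
#include <cstdint>

// Hypothetical record type; the byte type is named here, in the declaration.
struct AddressBytes {
  std::uint8_t hi;
  std::uint8_t lo;
};

// The conversions refer to the members' declared types via decltype, so no
// type name is repeated in the function body. (The conversions are written
// out because brace-initialization does not allow implicit narrowing.)
template <typename T>
constexpr AddressBytes split_address(T value) {
  return AddressBytes{static_cast<decltype(AddressBytes::hi)>(value >> 8),
                      static_cast<decltype(AddressBytes::lo)>(value)};
}
```

A caller would then write something like auto bytes = split_address(memAddress); without mentioning uint8_t at all.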

Some people claim that repetitive references to type names help them to understand and maintain the code, since without them they "forget" what types they are working with. This is a misguided mindset. You are not supposed to constantly remember what types you are working with. You are supposed to learn to read type-agnostic code and feel comfortable with such code. It is not as difficult as it seems.

Constant repetitive mentions of type names in the code are akin to training wheels on a bicycle. You might think they are helping you, but in reality they are impeding you. Learn to ride without training wheels. It is scary at first, but you'll quickly grow to like it.

A fine argument. But an argument that fails to address the topic at hand: should a range reducing typecast be included. I guess that makes your entire post a straw man.

But, I'm a big fan of proper datatypes so, @jmusther, here's a snippet for you to ponder...

typedef uint8_t Montmorency_unnecessary_byte_type_just_in_case_the_AVR_hardware_suddenly_morphs_into_a_Control_Data_6600;
typedef uint16_t memory_addres_t;

void setup( void )
{
  memory_addres_t memAddress = 0x1234;
  auto a = (Montmorency_unnecessary_byte_type_just_in_case_the_AVR_hardware_suddenly_morphs_into_a_Control_Data_6600)(memAddress >> 8);
  auto b = (Montmorency_unnecessary_byte_type_just_in_case_the_AVR_hardware_suddenly_morphs_into_a_Control_Data_6600)memAddress;
  Serial.begin(250000);
  Serial.println(a);
  Serial.println(b);
}

void loop( void ){}

No warnings nor errors and some lovely insulating typedefs. Perfect. Not even @Montmorency would disapprove.

Yes, an "insulating typedef" is another way to approach the problem of maintainability. But the exalted elders have already spoken: generic and type-agnostic code is the way of the future. Hence the massive wave of type-agnostic features in the C++ language from C++11 onwards.

Montmorency:
the cast is unnecessary.

My AVR-GCC compiler disagrees. As implied by the compilation errors in an earlier post.

Only when explicitly told so with -Werror=conversion.

Your argument is invalid.

It means that you (or someone else) deliberately blocked this language feature in compiler setup. I would understand if someone wanted a warning in such cases. But making it an error (!) is unacceptable.

Basically, a compiler setup that issues such error messages is formally "broken" - it violates requirements of C++ language by rejecting perfectly valid code.

Note that modern C++ already has a feature that prevents so-called narrowing conversions:

uint8_t a = { memAddress }; // Error - narrowing conversion
uint8_t b = memAddress;     // No error here
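If you do want the braces, the conversion has to be spelled out. A minimal sketch in standard C++ follows (helper names are made up for the example). As far as I know the standard only requires a diagnostic for the narrowing case, so some compilers may report it as a warning rather than a hard error:

```cpp
#include <cstdint>

// Brace-initialization with an explicit conversion: accepted, because the
// narrowing is no longer implicit.
// Writing std::uint8_t{v >> 8} instead would be diagnosed, since v is not
// a constant and int -> uint8_t is a narrowing conversion.
constexpr std::uint8_t high_braced(std::uint16_t v) {
  return std::uint8_t{static_cast<std::uint8_t>(v >> 8)};
}

// A return statement uses copy-initialization, so the implicit conversion
// is accepted without braces or casts.
constexpr std::uint8_t low_copy_init(std::uint16_t v) { return v; }

// A constant whose value fits in the destination is not narrowing at all.
constexpr std::uint8_t fits_in_byte{0x34};
```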

Given that:

uint8_t y = 0x23;

uint32_t z = y<<24;   // no error, no warning, but gives 0 for z

But,

uint32_t z = (uint32_t)y<<24; //gives expected 0x23000000 for z

Conclusion:
Casting is necessary?

The uint32_t before z is the ‘data type’ and the (uint32_t) before ‘y<<24’ is the ‘cast’; what is the significance of each, and what do they do?

oqibidipo:
Only when explicitly told so with -Werror=conversion.

Yup.

oqibidipo:
Your argument is invalid.

Yup. You win.

Either that or I'm tired of tilting at windmills.

Montmorency:
Basically, a compiler setup that issues such error messages is formally "broken" - it violates requirements of C++ language by rejecting perfectly valid code.

C and C++ are filthy languages. They allow a long list of things guaranteed to crash a program. Which is why there are options to reject "perfectly valid code" that is "suspicious code likely to be buggy".

GolamMostafa:
Given that:

uint8_t y = 0x23;

uint32_t z = y<<24;  // no error, no warning, but gives 0 for z

This is actually undefined behavior. The original value is implicitly promoted to type int, and on AVR int is a 16-bit type. Shifting a 16-bit value by 24 bits is undefined behavior in C and C++.

And GCC, including AVR-GCC, will definitely (!) issue a warning for this, unless it is deliberately suppressed:

See for yourself: Compiler Explorer
warning: left shift count >= width of type [enabled by default]

I get the same warning in my Arduino IDE.

The reason you get no warning in your Arduino IDE is that in your Arduino IDE preferences you have “Compiler warnings” set to “None”. You have effectively suppressed all warnings yourself in the compiler settings. And now you are telling us that there’s “no warning”?

Go to Arduino IDE preferences and set “Compiler warnings” to “Default”. Once you set it to “Default”, try compiling this code again. Report your results to this forum.

(I have no idea what “genius” decided that it should be set to “None” in fresh IDE installs, especially considering that Arduino IDE is intended to be a learning tool.)

GolamMostafa:
But,

uint32_t z = (uint32_t)y<<24; //gives expected 0x23000000 for z

Conclusion:
Casting is necessary?

Sigh…

As it has been stated very clearly many times, casting is necessary when casting is necessary. And casting is UNnecessary when casting is UNnecessary. Each case is different.

In this case casting is indeed the right thing to do. There are many other cases when casting is necessary.
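The pattern is "widen first, then shift". Here is a short sketch in standard C++ with invented helper names. It cannot reproduce the AVR failure on a PC where int is 32 bits, but the widened form behaves the same on both:

```cpp
#include <cstdint>

// Convert to the 32-bit type BEFORE shifting, so the shift is carried out
// in 32 bits regardless of how wide int is on the target.
constexpr std::uint32_t to_top_byte(std::uint8_t y) {
  return static_cast<std::uint32_t>(y) << 24;
}

// The same idea applied to packing four bytes into one 32-bit word.
constexpr std::uint32_t pack_bytes(std::uint8_t b3, std::uint8_t b2,
                                   std::uint8_t b1, std::uint8_t b0) {
  return (static_cast<std::uint32_t>(b3) << 24) |
         (static_cast<std::uint32_t>(b2) << 16) |
         (static_cast<std::uint32_t>(b1) << 8) |
         static_cast<std::uint32_t>(b0);
}
```

The uint32_t in the declaration of z only decides what the finished result is stored in. The cast on y decides the type the shift itself is computed in, and that is the part that matters here.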

The point is that one has to realize what casting does and strive to use casts only when they are necessary. Think and decide. This will help one not to pollute, pollute and overpollute their code with unnecessary and annoying casts.

Coding Badly:
C and C++ are filthy languages. They allow a long list of things guaranteed to crash a program.

If those things actually crashed the program with guaranteed certainty, we'd be living in a perfect world. A crash is obvious. It is an undeniable and unavoidable signal that the program needs fixing. (re: "Let It Crash" approach to program design.)

The problem with undefined behavior in C and C++ is that it does not guarantee to crash the program. Quite the opposite, a faulting program might limp along in undefined state for a very long time, doing God knows what, while giving external observers absolutely no indication that something is not right.

That's the real problem. Not crashes. Crashes are honest. Crashes play nice.

"Suspicious code likely to be buggy" is a very fuzzy category. There's no "one size fits all" kind of "suspicious" that is equally applicable to a newbie and to an experienced developer.

Which is why I again stand with my mouth agape, looking in dumbfounded amazement at the fact that Arduino IDE is shipped with "Compiler Warnings" set to "None" by default. O_o !!!