Is this Illegal Type Punning?

There have been many discussions here about the pointer and union shenanigans people attempt in order to subvert C++'s type system, and the resulting risk of undefined behavior.

My understanding is that the one permitted exception is to cast a pointer to a variable (or array, or struct, etc) to a 'char *' (or 'uint8_t *', etc). You may then read the original variable's bytes using this pointer.

For example (since the LoRaClass class inherits from Stream), you can legally do:

  uint32_t dataToSend;
  dataToSend = getSomeData();
  LoRa.write(reinterpret_cast<const uint8_t*>(&dataToSend), sizeof(dataToSend));

Looking at the Print class's implementation of write() you can see how it reads the data using a 'const uint8_t *':

size_t Print::write(const uint8_t *buffer, size_t size)
{
  size_t n = 0;
  while (size--) {
    if (write(*buffer++)) n++;
    else break;
  }
  return n;
}
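
The read direction described above can be tried off-target with a small sketch. collectBytes() and bytesOf() are hypothetical names, not part of the Arduino core; collectBytes() only stands in for a write() that reads the bytes it is handed. It also assumes uint8_t is an alias for unsigned char, which is the usual case but is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Print::write(const uint8_t*, size_t):
// it only READS the bytes it is given and appends them to a vector.
std::vector<std::uint8_t> collectBytes(const std::uint8_t *buffer, std::size_t size) {
  std::vector<std::uint8_t> out;
  while (size--) {
    out.push_back(*buffer++);
  }
  return out;
}

// View a uint32_t through a byte pointer, as in the LoRa.write() call above.
// Assumption: uint8_t is unsigned char on this toolchain.
std::vector<std::uint8_t> bytesOf(const std::uint32_t &value) {
  return collectBytes(reinterpret_cast<const std::uint8_t *>(&value), sizeof(value));
}
```

The order of the bytes returned depends on the endianness of the machine, so the receiver has to agree on it.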

So, finally, the question that @J-M-L and I were debating in a different thread … is it also legal to write into a variable byte-by-byte, or must you explicitly use memcpy()? Again with the LoRa example, can you legally do:

  uint32_t receivedData;
  LoRa.readBytes(reinterpret_cast<uint8_t*>(&receivedData), sizeof(receivedData));

Looking at the code for readBytes(), you can see that it puts data into the variable byte-by-byte:

size_t Stream::readBytes(char *buffer, size_t length)
{
  size_t count = 0;
  while (count < length) {
    int c = timedRead();
    if (c < 0) break;
    *buffer++ = (char)c;
    count++;
  }
  return count;
}

memcpy() also uses the byte-by-byte technique. So is it OK or must the compiler explicitly see the function memcpy() being called? If the latter, it sure seems to be an inconsistent asymmetry. It also means I couldn't write my own function myMemCpy() that uses the EXACT SAME source code as memcpy() just with a different name. That blows my mind.
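
To make the question concrete, here is a minimal sketch of what such a hand-written copy might look like. The body is an assumption about how a simple memcpy() could be written, not the source of any real library, and roundTrip() is a hypothetical helper added only so the function can be exercised. The sketch does not settle what the standard permits; it only pins down the code being asked about.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hand-written copy: moves n bytes one at a time through
// unsigned char pointers, the way a simple memcpy() might be written.
void *myMemCpy(void *dest, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dest);
  const auto *s = static_cast<const unsigned char *>(src);
  while (n--) {
    *d++ = *s++;
  }
  return dest;
}

// Hypothetical helper: copy a uint32_t into another uint32_t byte by byte.
std::uint32_t roundTrip(std::uint32_t in) {
  std::uint32_t out = 0;
  myMemCpy(&out, &in, sizeof out);
  return out;
}
```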

I know that @PieterP has written on this extensively. Perhaps he has an answer.


wouldn't this also apply to copying structured data into TCP/UDP frames or various other transmission mechanisms that just transmit bytes?

In our discussion my views were:

  • you can always convert to a byte (signed or unsigned char) pointer any object to permit examination of the object representation as an array of bytes

  • you can't reinterpret_cast a uint32_t pointer as a uint8_t pointer and use that to modify the uint32_t because of the strict aliasing rule: an assumption made by the C++ compiler that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. that they are not aliases).

this was my understanding of the specification which states

The purpose of strict aliasing and related rules is to enable type-based alias analysis, which would be decimated if a program can validly create a situation where two pointers to unrelated types (e.g., an int* and a float*) could simultaneously exist and both can be used to load or store the same memory. Thus, any technique that is seemingly capable of creating such a situation necessarily invokes undefined behavior.
When it is needed to interpret the bytes of an object as a value of a different type, std::memcpy or std::bit_cast (since C++20) can be used

so my take was that to meet the standard you need to do

  uint32_t receivedData;
  uint8_t buffer[sizeof receivedData]; // note: sizeof is a compile-time constant here, so this is an ordinary fixed-size array (no VLA, no malloc/free needed)
  LoRa.readBytes(buffer, sizeof receivedData);
  memcpy(&receivedData, buffer, sizeof receivedData);

and let the compiler optimize the intermediary buffer away and just use the receivedData memory bytes, but knowingly.

(even if it's likely to work with our version of GCC)
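
The buffer-then-memcpy approach above can be tried without any radio hardware. In this sketch FakeStream and receiveUint32() are hypothetical names; FakeStream only imitates the byte-by-byte behaviour of readBytes() and is not the Arduino Stream class. Checking the returned count is an addition, on the assumption that a short read should not overwrite the variable.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a Stream: hands out bytes from a fixed array.
struct FakeStream {
  const std::uint8_t *data;
  std::size_t remaining;

  // Copies up to 'length' bytes into 'buffer'; returns how many were copied.
  std::size_t readBytes(std::uint8_t *buffer, std::size_t length) {
    std::size_t count = 0;
    while (count < length && remaining > 0) {
      *buffer++ = *data++;
      --remaining;
      ++count;
    }
    return count;
  }
};

// Receive into a byte buffer first, then memcpy into the typed variable.
// Returns true and fills 'value' only if every byte arrived.
bool receiveUint32(FakeStream &stream, std::uint32_t &value) {
  std::uint8_t buffer[sizeof value];
  if (stream.readBytes(buffer, sizeof buffer) != sizeof buffer) {
    return false;
  }
  std::memcpy(&value, buffer, sizeof value);
  return true;
}
```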


Please let me know if this is an accurate summary of your position ....

You may cast a pointer-to-object into an unsigned char * (or equivalently a uint8_t *, etc) for purposes of peeking at the object's constituent bytes in memory. However, you may not poke new byte values into memory using the uint8_t * to modify the original object and be guaranteed the expected result.

I imagine it does. My question was agnostic to the source of the bytes. I only wanted to know if you could poke those bytes into an object via a type cast pointer. Or, is memcpy() the only entity allowed to do such poking (even if a custom function does it with the same source code).

that's how I read the spec, basically an ugly way of informing the compiler about your intent

(I would add) if you access the data through the original object

or another way of interpreting this is, that by generating the error, the compiler is saying "are you sure you want to do this, tell me it's OK"?

there is no error when using cast below, in other words, the author is telling the compiler "it's OK"

    float f;
    float *pF = &f;
    char  *pC = (char*) &f;

(you might do this to look at, possibly modify the exponent and mantissa values of the float displayed in hex)
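
For looking at the sign, exponent and mantissa specifically, one option that avoids the pointer cast is to memcpy() the float into an integer and mask out the fields. This is only a sketch: floatBits(), FloatFields and splitFloat() are hypothetical names, and it assumes float is a 32-bit IEEE-754 single, which the standard does not guarantee.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Copy the float's bytes into an integer of the same size.
std::uint32_t floatBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Fields of an IEEE-754 single (assumed layout: 1 sign bit,
// 8 exponent bits, 23 mantissa bits).
struct FloatFields {
  std::uint32_t sign;
  std::uint32_t exponent;  // biased exponent, as stored
  std::uint32_t mantissa;  // fraction bits, without the implicit leading 1
};

FloatFields splitFloat(float f) {
  const std::uint32_t bits = floatBits(f);
  return {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```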

Yea, that was implied by my condition of "expected result".

I think whether or not you get an error is irrelevant. There was no error in my original type-punning code using reinterpret_cast. But it is still apparently incorrect per the spec and may cause undefined behavior. So, if that's the case it doesn't matter if there's no compiler error. You still can't do it. We just have to accept that and not try to weasel around it.

so "undefined behavior" == "incorrect" code?

why doesn't the compiler generate an error, not generate code, anytime/every time the compiler is given code resulting in "undefined behavior"?

IMO, yes.

I don't know.

does the compiler always generate "incorrect" code when the spec says it is "undefined"?

i believe you can "correctly" modify the value of a float to achieve some desired result when you know the byte/bit format for the float on the machine being used. this is more than likely some test code used to understand floating format or debug something

i wouldn't expect any guarantees from the compiler and agree it should be described as "undefined", the code is not doing something the compiler is designed to do

perhaps another example of modifying the values of a float using a byte pointer are when the float is written along with other data to some non-32-bit aligned storage media (e.g. flash) or over some other media (e.g. a radio link) as a sequence of bytes.

the packet could be an odd # of bytes comprised of a header identifying the type of message and the payload starting on some non-32-bit aligned byte could be written into a 32-bit align variable.

i'm guessing the "right" thing is to use memcpy(), but i don't see how it is any different than a straight byte copy.
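
The packet case above could look something like the following sketch. The layout (one header byte followed by a float at offset 1) and the names kHeaderSize, kPacketSize, packFloat() and unpackFloat() are made up for illustration. It assumes both ends use the same float format and byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical packet layout: 1 header byte, then a float payload that
// starts at offset 1 and is therefore not 32-bit aligned.
constexpr std::size_t kHeaderSize = 1;
constexpr std::size_t kPacketSize = kHeaderSize + sizeof(float);

// Write the message type and the float's bytes into the packet.
void packFloat(std::uint8_t (&packet)[kPacketSize], std::uint8_t type, float value) {
  packet[0] = type;
  std::memcpy(packet + kHeaderSize, &value, sizeof value);
}

// Copy the unaligned payload bytes into a properly aligned float.
float unpackFloat(const std::uint8_t (&packet)[kPacketSize]) {
  float value;
  std::memcpy(&value, packet + kHeaderSize, sizeof value);
  return value;
}
```

memcpy() has no alignment requirement on either pointer, which is what makes the odd offset harmless here.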

I'm going to change my answer to ... "there's no way of knowing, that's what 'undefined' means"

After researching some more, that's where I think you are wrong. I'm not a compiler writer by any stretch of the imagination. But here's a contrived example to consider based on your proposal:

  float x;
  float y;
  uint8_t *ptr = reinterpret_cast<uint8_t *>(&x);

  x = 462.25;
  *ptr = 6;
  y = x;
  Serial.println(y);

Per what @J-M-L quoted above:
The purpose of strict aliasing and related rules is to enable type-based alias analysis, which would be decimated if a program can validly create a situation where two pointers to unrelated types (e.g., an int* and a float*) could simultaneously exist and both can be used to load or store the same memory. Thus, any technique that is seemingly capable of creating such a situation necessarily invokes undefined behavior.

The compiler "knows" that the byte write via the pointer can't possibly affect the value of 'x'. So, it's perfectly free to (in the name of optimization) rearrange things:

  float x;
  float y;
  uint8_t *ptr = reinterpret_cast<uint8_t *>(&x);

  x = 462.25;
  y = x;
  *ptr = 6;
  Serial.println(y);

That obviously won't produce the same result. But you have no way of knowing that .... Undefined Behavior!

I would vote for this. Code needs to be deterministic. If you can’t trust that what you expect will happen (on any platform or with any compiler) then the code is not correct

Without even rearranging it could keep some data in a register

Sometimes it does not see it.

I have been following this with interest hoping to learn something. Unfortunately some of the discussion is beyond my understanding of C++.

@gfvalvo
Please would you explain in detail what the code in reply #13 is supposed to do, line by line, and why the putative compiled version would produce different behaviour? With my level of knowledge I see no material difference and no possibility of different behaviour.

Thank you.

what result are you expecting? it modifies the value of x

It actually modifies one of the memory locations containing the value of x. However, because of optimization by the compiler, that "new value" of x may not get assigned to y and printed.

Again, because the compiler assumes that the programmer has followed the language rules (which this example violates), it could have legally rearranged the statement order as shown in the second code in that post. Thus, y won't be assigned your new value of x. Or, as @J-M-L pointed out, the compiler is free to assign to y the value of x that it has stored in a processor register ... thus never going to memory to pick up your new value of x. In both cases, y now has the 'old value' of x, not the one resulting from poking into memory.

Again, again, the above can happen because the compiler doesn't know x has been modified because doing so via a punned pointer is illegal.

Again, again, again .... if you'd like it put another way .... the compiler does NOT turn your source code into object code. Rather, it emits object code that will perform as your source code is written assuming that source code abides by the pact you've made with it. That pact is that you will obey the language standard. If you violate that pact (which the example code does), then all bets are off ---> UNDEFINED BEHAVIOR.

but y is the value of x before it is modified

are you suggesting that in the following code, the compiler will set "y" to the original value of "x", before it has been modified using a "valid" ptr because it doesn't know that the value of x is modified?

   float *ptr = &x;

    x = 462.25;
    *ptr = 6;
    y = x;