Is this Illegal Type Punning?

Go back and look at my post. The first code in that post is what the programmer wrote; the second is how the compiler rearranged it. Note how the order is different. The programmer wanted 'y' to get the new value of 'x', but because of compiler optimization, 'y' gets the old value of 'x'.

No, that will work as intended ('y' will get the new value of 'x') because 'x' was modified using a 'float *', not a punned 'uint8_t *' as in the illegal code that I posted.

The thing is, GCC doesn't do anything fancy most of the time, so it's hard to demonstrate the fault.

The C++ documentation gives an example that breaks the spec and is Undefined Behavior (though it would likely still work with GCC):

double d = 0.1;                           // 64 bits double on ESP32
int64_t n;                                // 64 bits integer
n = *reinterpret_cast<std::int64_t*>(&d); // attempting to copy the 8 bytes from our double into our integer by dereferencing a pointer ➜  Undefined behavior
memcpy(&n, &d, sizeof d);            // attempting to copy the 8 bytes from our double into our integer with memcpy ➜ fine
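Wrapped into a reusable helper, the memcpy approach might look like this. It is a minimal sketch, assuming an IEEE 754 `double` that is 8 bytes wide (as on ESP32); the name `doubleBits` is hypothetical. On a C++20 compiler, `std::bit_cast` from `<bit>` does the same job.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw IEEE 754 bit pattern of a double.
// memcpy copies the object representation byte by byte, so no pointer of the
// "wrong" type is ever dereferenced and the aliasing rules are respected.
std::uint64_t doubleBits(double d) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch assumes a 64-bit double");
  std::uint64_t n;
  std::memcpy(&n, &d, sizeof n);
  return n;
}
```

With optimization enabled, GCC and Clang typically turn a fixed-size memcpy like this into a plain register move, so it usually costs nothing compared to the pointer cast.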

PS: I tried this with GCC on an ESP32 platform.

It did not fail, and we got the right 8 bytes into the integer.

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  double d = 0.1;                           // 64 bits double on ESP32 represented with 0x3FB999999999999A in IEEE format
  int64_t n;                                // 64 bits integer
  n = *reinterpret_cast<std::int64_t*>(&d); // attempting to copy the 8 bytes from our double into our integer by dereferencing a pointer ➜  Undefined behavior
  Serial.print("n = 0x");
  Serial.println(n, HEX);

  memcpy(&n, &d, sizeof d);                 // attempting to copy the 8 bytes from our double into our integer with memcpy ➜ fine
  Serial.print("n = 0x");
  Serial.println(n, HEX);
}

void loop() {}

Perry,
Did the discussion subsequent to your question elucidate things? If not, let me know what you'd like further clarification on.

so, in this case, the "undefined behavior" didn't result in incorrect code. it isn't guaranteed to do what you expect, but isn't necessarily wrong.

i think "undefined behavior" suggests "user beware".

having had to write assembler routines so that Pascal code could interact with memory-mapped hardware registers, my jaw dropped when i read about pointers in K&R (no need for assembler routines).

i'm curious about what other languages have pointers like C that can be arbitrarily assigned any memory address? Pascal has pointers, but as far as I know, they can only be set to existing variables defined within the program.

How can you possibly find that to be an acceptable situation? The code may do what you want the first time you build it. But it might fail to do so with the next build after a compiler update. It might even fail inexplicably after a minor source code change in an unrelated section causes the compiler to make different optimization decisions.

I think so.
Is this correct:
x has the value 462.25
ptr points to one of the bytes of x
*ptr = 6 arbitrarily changes that byte of x to 6, thus messing up the value of x
y is given the messed up value of x
printing y should not give 462.25

In the second version y is given the value of x before it is messed up, so changing the behaviour. Or rather, y might or might not be given the value of x before it is messed up, so the behaviour is not predictable, so undefined.

If I got any of that wrong then the wrong bit is the bit I need help with.

Thank you.


It is wrong to rely on undefined behavior to be the right behavior. It will come bite you later :wink:

It absolutely does not: Undefined Behavior means that your code is incorrect. Period, end of discussion.
Please read cppreference: Undefined behavior

IMO, code invoking undefined behavior is even worse than code that contains logic errors or other bugs, because undefined behavior is not localized, it can affect your entire program, and can be almost impossible to debug because the behavior changes depending on the optimization level you're using.


I didn't read the entire discussion, but I'd like to clarify two things:

  1. Any casts to types like float, uint32_t, uint64_t, etc., as in some of the examples in this thread, are invalid if you dereference the result. The exceptions are casts to the original type of the variable (in which case the cast is usually unnecessary) or to its signed/unsigned variant.
    Casts to byte types like (unsigned) char or std::byte are always allowed, so let's focus on that case.
  2. You can also use a pointer-to-byte to write to an object of trivially copyable type; specifically, the standard guarantees the following: [basic.types]

For any object (other than a potentially-overlapping subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (6.7.1) making up the object can be copied into an array of char, unsigned char, or std::byte (17.2.1). If the content of that array is copied back into the object, the object shall subsequently hold its original value.

As far as I can tell, you don't necessarily have to use memcpy, you can manually copy the data byte-by-byte.
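A sketch of that manual byte-by-byte copy, restricted to trivially copyable types as the quoted paragraph requires (`toBytes` and `fromBytes` are hypothetical names, not library functions):

```cpp
#include <cstddef>
#include <type_traits>

// Copy the bytes of a trivially copyable object into a caller-supplied
// unsigned char buffer, one byte at a time (no memcpy).
template <typename T>
void toBytes(const T &obj, unsigned char *out) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only trivially copyable types may be copied bytewise");
  const unsigned char *src = reinterpret_cast<const unsigned char *>(&obj);
  for (std::size_t i = 0; i < sizeof(T); ++i) out[i] = src[i];
}

// Copy the bytes back into the object; per [basic.types] the object then
// holds its original value again.
template <typename T>
void fromBytes(T &obj, const unsigned char *in) {
  static_assert(std::is_trivially_copyable<T>::value,
                "only trivially copyable types may be copied bytewise");
  unsigned char *dst = reinterpret_cast<unsigned char *>(&obj);
  for (std::size_t i = 0; i < sizeof(T); ++i) dst[i] = in[i];
}
```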

The standard makes no distinction between using such a byte pointer to read or to write to the object, it just mentions accessing the object through such a pointer: [basic.lval]

If a program attempts to access (3.1) the stored value of an object through a glvalue whose type is not similar (7.3.5) to one of the following types the behavior is undefined:

  • (11.1) the dynamic type of the object,
  • (11.2) a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
  • (11.3) a char, unsigned char, or std::byte type.
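As a small illustration of bullet (11.2), an `int` may be read through an `unsigned int` glvalue. This is a sketch; the function name is hypothetical:

```cpp
// Rule (11.2): an int may be accessed through an unsigned int glvalue,
// because unsigned int is the unsigned type corresponding to int.
// By contrast, no bullet covers reading a float through a uint32_t *,
// so that access would be undefined behavior.
unsigned int readAsUnsigned(const int &x) {
  return *reinterpret_cast<const unsigned int *>(&x);
}
```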

Definition of access: [defns.access]

3.1 access
〈execution-time action〉 read (7.3.1) or modify (7.6.19, 7.6.1.5, 7.6.2.2) the value of an object
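Putting those rules together for the scenario discussed earlier (a float modified through a byte pointer, then copied into y), a sketch might look like this. The function name is hypothetical. On GCC-based toolchains such as those for AVR and ESP32, uint8_t is typically a typedef for unsigned char, so the same reasoning would cover a uint8_t * (an assumption; the standard itself does not strictly require that typedef).

```cpp
// Overwrite one byte of a float through an unsigned char pointer. Because
// unsigned char is on the list in [basic.lval], this write is a defined
// access, so the following read of x must observe it.
float patchFirstByte(float x, unsigned char value) {
  unsigned char *p = reinterpret_cast<unsigned char *>(&x);
  p[0] = value;  // modifies the lowest-addressed byte of x
  float y = x;   // y gets the new value of x; the compiler may not reorder this
  return y;
}
```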

i don't believe "undefined behavior" means non-repeatable results.

i've found results to be reproducible and reliable for the type and scope of applications i've worked on.

i'm not saying you don't need to understand what the limitations are or just put your faith in the compiler.

It's impossible for there to be undefined behaviour when creating a pointer. The undefined behaviour is a result of dereferencing the pointers; actually reading / writing memory. Your snippet only creates the two pointers.

No. It's worse and literal. "Undefined behaviour" can mean that the compiler you are using today may always generate code that works as you expect. And that the next compiler version may not. Or that the compiler you are using today may sometimes generate code that works as you expect and sometimes not. Or that adding a new library can result in correct code. Or not. Undefined behaviour is precisely that: undefined.

No. Which is what makes "undefined behaviour" very problematic.

I think so. This may help...

Imagine you have a value stored in memory. Let's call it x and say it has the value 462.25. Let's say that's your holiday present in pound sterling.

Let's say we have a poorly coded function that's supposed to transfer x to your bank account. Let's say that function is long and complicated and it includes math to calculate two pointers to x. One pointer is used to verify that x is a valid floating point number. In the process of verification, the value is temporarily altered to -42. The other pointer is used when transferring the amount to your bank account. After transfer the value is set to zero.

How much money actually makes it into your account? (The answer is "undefined".)

The problem is that the compiler is free to reorder statements and is free to hold values in temporary places like registers. The only guarantee is visible side effects. At specific points in the execution it has to appear that the code did what you expected it to do.

For example, let's say our troubled function writes the value of x to a log file. Let's also say that the compiler is aware that one pointer points to x (is an alias). Does it matter if the compiler loads the value of x using x or using the pointer? The answer is "no" because what we observe is the same if the pointer is truly an alias. In both cases we get the correct answer.

Bad things happen when the compiler is not aware of an alias. Continuing the example above, if the compiler is unaware that one or both pointers are aliases for x, you could end up with 42 pounds being removed from your account. The compiler may not realize that -42 was written to x on the verification code path, and then read that value from x early, as an optimization, on the transfer path.
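To make the "compiler unaware of an alias" case concrete, here is the classic strict-aliasing illustration. It is a generic textbook example rather than code from this thread, and the function name is hypothetical:

```cpp
// int and float are not "similar" types, so the compiler may assume that
// i and f never point to the same object. With optimization enabled (GCC's
// -O2 implies -fstrict-aliasing), it is therefore allowed to compile the
// return statement as "return 1" without reloading *i from memory.
int storeThenLoad(int *i, float *f) {
  *i = 1;
  *f = 0.0f;  // assumed not to modify *i
  return *i;  // may be folded to the constant 1
}
```

Called with two distinct objects, the function is well defined and returns 1. If a caller passes the same address for both (for example via `reinterpret_cast<float *>(&someInt)`), the behavior is undefined, and an optimized build may well return 1 even though the memory now holds the bit pattern of 0.0f.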


Living is easy with eyes closed ...

Thanks Pieter. But unfortunately, much of the text from the official documentation that you quoted appears targeted at compiler writers, not regular programmers. So, let me directly re-ask my original question ... hopefully there's a yes / no answer. IS IT OK TO DO THIS:

  uint32_t receivedData;
  LoRa.readBytes(reinterpret_cast<uint8_t*>(receivedData), sizeof(receivedData));

In the above, assume that LoRa is an object of a class that inherits from Stream. And, assume that Stream's implementation of readBytes() is this:

size_t Stream::readBytes(char *buffer, size_t length)
{
  size_t count = 0;
  while (count < length) {
    int c = timedRead();
    if (c < 0) break;
    *buffer++ = (char)c;
    count++;
  }
  return count;
}
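A self-contained sketch of the same pattern, runnable without Arduino hardware. `FakeStream` and `readUint32` are hypothetical; the readBytes() body mirrors the Stream implementation above. Note that the cast is applied to the address of the variable, and the resulting value assumes the sender used the receiver's byte order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an Arduino Stream: serves bytes from a fixed
// array. readBytes() has the same loop shape as Stream::readBytes() above.
class FakeStream {
 public:
  FakeStream(const unsigned char *data, std::size_t size)
      : data_(data), size_(size), pos_(0) {}

  std::size_t readBytes(char *buffer, std::size_t length) {
    std::size_t count = 0;
    while (count < length) {
      int c = timedRead();
      if (c < 0) break;  // no more data available
      *buffer++ = static_cast<char>(c);
      count++;
    }
    return count;
  }

 private:
  int timedRead() { return pos_ < size_ ? data_[pos_++] : -1; }

  const unsigned char *data_;
  std::size_t size_;
  std::size_t pos_;
};

// Read a uint32_t straight from the stream. The cast is applied to &value,
// and char is one of the byte types through which any object may be
// written, so the writes are well defined.
bool readUint32(FakeStream &s, std::uint32_t &value) {
  return s.readBytes(reinterpret_cast<char *>(&value), sizeof value) ==
         sizeof value;
}
```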

I understand where you are coming from. But this is only a behavioural validation. Someone who wants certainty needs to obey the contract rules set out by the language definition. Either that, or know absolutely everything about the compiler. I'm pretty sure even most top notch programmers don't.

Some of my older code has violated these access rules, and worked fine. Luckily, I just haven't required such a feature in a long time, but when I do, I will now pay attention to these rules. It's not really any more difficult than the "easy" way, using unions.

I would prefer you not be the person who writes the code for my autopilot.


Slightly off-topic, but simply creating a pointer can invoke undefined behavior, e.g. [expr.add]

int arr[3];
int *p = arr + 3; // okay
int *q = arr + 4; // Undefined Behavior
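The one-past-the-end pointer that is "okay" above is exactly what a typical loop sentinel relies on. A small sketch (the function name is hypothetical):

```cpp
#include <cstddef>

// Sum an array using a one-past-the-end pointer as the loop sentinel.
// Forming arr + n is allowed as long as it is never dereferenced;
// forming arr + n + 1 would already be undefined behavior.
int sumArray(const int *arr, std::size_t n) {
  int total = 0;
  for (const int *p = arr; p != arr + n; ++p) total += *p;
  return total;
}
```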

Yes.

Strawberry Fields forever.

you might be using my Android Ethernet driver

i truly understand the need to comply with "the spec" when writing such code. my "violations" are mostly when trying to debug problems, when i have to reach into the "innards" of both hardware and software and am convinced of the validity of the approach

we've had this discussion before. it would be more helpful to explain "how to do it per the standard" rather than say "you can't do that because the standard says it's undefined behavior"

(there was one exception where someone suggested instead of casting the const char string (e.g. "henry") to set it to a char * variable, simply define the variable as "const char *")
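A small sketch of that suggestion; `countChar` is a hypothetical helper used only to show a read-only parameter:

```cpp
#include <cstddef>

// A string literal such as "henry" is an array of const char, so the
// natural pointer type is const char *; no cast is needed.
const char *name = "henry";

// Functions that only read a string should likewise take const char *,
// so they accept string literals directly.
std::size_t countChar(const char *s, char c) {
  std::size_t count = 0;
  for (; *s != '\0'; ++s)
    if (*s == c) ++count;
  return count;
}
```

By contrast, `char *p = (char *)"henry";` compiles, but writing through p is undefined behavior, because modifying a string literal is undefined.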

unfortunately, the examples are often contrived rather than practical. i think the other thread involved copying data from a data stream of bytes into structured data and people disliked my approach. apparently memcpy() was the solution (and it apparently compiled to nothing more than a NOP)
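For the stream-to-struct case, a minimal sketch of the memcpy approach might look like this. The `Record` layout (a 2-byte id followed by a 4-byte value, packed, in the receiver's native byte order) and the `parseRecord` name are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record layout received over a byte stream.
struct Record {
  std::uint16_t id;
  std::uint32_t value;
};

// Copy each field out of the raw buffer with memcpy instead of casting the
// buffer to Record *. This avoids the aliasing problem and also sidesteps
// alignment and padding differences between the wire format and the struct.
Record parseRecord(const unsigned char *buf) {
  Record r;
  std::memcpy(&r.id, buf, sizeof r.id);
  std::memcpy(&r.value, buf + sizeof r.id, sizeof r.value);
  return r;
}
```

Copying field by field matters because the compiler typically inserts padding after id, so the struct's layout need not match the packed wire format. With optimization enabled, GCC usually reduces such fixed-size memcpy calls to plain loads, which may be why it looked like a NOP.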

but my understanding of the subtle difference was lost in the haystack of quotes from "the standard"

i'd like a better understanding of why the standard describes things as "undefined" rather than simply declaring the approach unacceptable. i understand that unless a behavior is valid in all possible cases, it needs to be identified as "undefined"

engineering is sometimes about "bending" the rules

Good :+1: Then i got it wrong (again)

You had me convinced.

Should you not use the address of receivedData there?
