Problem with the use of union

In 1990, I used a union for the first time to split a 16-bit data item into two 8-bit data items, based on its name and without knowing the "actual purpose" for the inclusion of this structure in the C language. Since then, I have been successfully using it to obtain the 32-bit binary32 bit pattern of a float. Recently, after hearing comments from the veteran posters in this forum, I have stopped using union for float to/from bit conversion; instead, I am using the memcpy() function.

I would appreciate hearing/seeing examples of the actual purpose of union.

The union was and is a very handy and perfectly legitimate feature of the Fortran programming language, and its purpose is to allow the same memory cells to be accessed in different ways, exactly as the OP wishes to use it (e.g. float versus integers or bytes).

My theory is that the C and later C++ language designers were more or less forced to adopt the union, but they absolutely hate the idea of using a union to get around strong variable typing.

The union works for that on every C or C++ compiler I've tried. But of course no compiler is required to accommodate such usage, so use it with caution.

One of the common purposes is to store a variant type. For example if you were parsing some code, you might have a token structure that could hold identifiers, strings, integers, or floats.

enum TokenType {
  TOKEN_IDENTIFIER,
  TOKEN_STRING,
  TOKEN_INTEGER,
  TOKEN_FLOAT,
  /// etc.
};

struct Token{
  union {
    const char* pIdentifier;
    const char* pString;
    int         integer;
    float       fl;
  } value;
  enum TokenType type;
};

struct Token sampleToken = { .value = { .pIdentifier = "union" }, 
                             .type = TOKEN_IDENTIFIER };

The union field that you want to de-reference depends on the value of type.
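
As a rough sketch of what "depends on the value of type" could look like in practice, the code below switches on the tag before touching the union. The helpers describeToken, makeIdentifier and makeInteger are hypothetical names made up for this illustration, not part of the post above, and the code is written as C++ since that is what Arduino sketches compile as.

```cpp
#include <cassert>
#include <string>

enum TokenType { TOKEN_IDENTIFIER, TOKEN_STRING, TOKEN_INTEGER, TOKEN_FLOAT };

struct Token {
  union {
    const char* pIdentifier;
    const char* pString;
    int         integer;
    float       fl;
  } value;
  enum TokenType type;
};

// Hypothetical helper: builds an identifier token and sets the matching tag.
Token makeIdentifier(const char* name) {
  assert(name != nullptr);
  Token t;
  t.value.pIdentifier = name;
  t.type = TOKEN_IDENTIFIER;
  return t;
}

// Hypothetical helper: builds an integer token and sets the matching tag.
Token makeInteger(int n) {
  Token t;
  t.value.integer = n;
  t.type = TOKEN_INTEGER;
  return t;
}

// Hypothetical helper: reads only the member that the tag says was written.
std::string describeToken(const Token& t) {
  switch (t.type) {
    case TOKEN_IDENTIFIER: return std::string("identifier: ") + t.value.pIdentifier;
    case TOKEN_STRING:     return std::string("string: ") + t.value.pString;
    case TOKEN_INTEGER:    return "integer: " + std::to_string(t.value.integer);
    case TOKEN_FLOAT:      return "float: " + std::to_string(t.value.fl);
  }
  return "unknown";
}
```

Because the tag is always checked first, the code never reads a member other than the one that was last written.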


Been reading this thread with interest as until now I'd never even heard of UNION.

Question: Why not just use * to point at the base variable (no matter what type, the 1st byte is the first byte of whatever type it is) and just read whatever bytes you need from there? Or am I missing something?

This approach was discussed above; it is how memcpy() works.
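
For what it's worth, here is a minimal sketch of that pointer idea in standard C++, with no union involved. The function names floatToBytes, bytesToFloat and roundTripDemo are hypothetical, chosen only for this example. Reading an object through an unsigned char pointer is permitted by the language, and memcpy() is used for the way back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies the bytes of a float into 'out' through a
// byte pointer aimed at the first byte of the variable.
void floatToBytes(float f, unsigned char (&out)[sizeof(float)]) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&f);
  for (std::size_t i = 0; i < sizeof(float); ++i) {
    out[i] = p[i];
  }
}

// Hypothetical helper: rebuilds the float from its bytes with memcpy().
float bytesToFloat(const unsigned char (&in)[sizeof(float)]) {
  float f;
  std::memcpy(&f, in, sizeof f);
  return f;
}

// Usage sketch: the value survives the trip through the byte buffer.
void roundTripDemo() {
  unsigned char buf[sizeof(float)];
  floatToBytes(83.67f, buf);
  assert(bytesToFloat(buf) == 83.67f);
}
```

The order of the bytes in the buffer depends on the endianness of the machine, so both ends of a link have to agree on it.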


Ok, so why use UNION? It seems to obscure more than it clarifies.


memcpy() just copies a source space into a destination space. The union structure allows many variants (variables) of identical capacity to share a common memory space. For example:

union myData
{
  long y;
  float z;
  int myArray1[2];
  byte myArray2[4];
};

In the above example, the union provides an easy way of breaking a 32-bit data item (or a float such as 83.67) into four individual bytes before transmitting it over an 8-bit communication channel such as UART/SPI/I2C.

Ok, I was completely misunderstanding the syntax. In your example, it's not a structure of a long, float, int, and byte; these are effectively alternate type aliases. So you save to, say, the float z, and then your I/O routine accesses myArray2 to read a byte at a time.
Got it, that makes sense: the compiler doesn't go ape when two different sections of code hit the same variable with different implicit types.
Ok, while we're on this, what about casting?

It is a data structure of category union and NOT of category struct.

Casting transforms data from one type to another. In union, I think that casting mechanism is embedded.

I use the following code to transmit a temperature signal (a float data item) over the I2C bus of the UNO:

#include<Wire.h>

union myData
{
  float z;
  byte myArray2[4];
};

myData data;

void setup()
{
  Serial.begin(9600);
  Wire.begin();          //join the I2C bus as master before any transmission
  data.z = 83.67;        //temperature signal acquired by DS18B20 for example
  Wire.beginTransmission(0x13);
  Wire.write(data.myArray2, sizeof data.myArray2); //lower-byte first
  Wire.endTransmission();
}

void loop() 
{
  
}

Gotcha! Starting to make sense. Cxx can do lots of memory tricks that us old ASM kids did when no-one was looking. LOL
For a high-level language, it flies real close to the bare silicon!

This is exactly the type of undefined behaviour that @PieterP is talking about, right? data.myArray2 is not active, so we should not read from it.

I don't think union was ever part of standard Fortran. But even as an extension, not all compilers allowed you to use it for type punning:

Unions and Maps (FORTRAN 77 Language Reference)
When you reference a field in a map, the fields in any previous map become undefined and are succeeded by the fields in the map of the newly referenced field.

Of course: Type punning using unions implicitly undermines the entire type system. In the rare case where that's something you need, you should be very explicit about it.

It is not clear to me why you are so adamant about continuing to use unions for type punning. It has always been a hack, unrelated to the true purpose of the union, and more importantly, safer alternatives that actually express the programmer's intent are available:

  • If you want to access the underlying bytes of an object, use a pointer cast, e.g.

    float f = 123.456;
    Serial.write(reinterpret_cast<const byte *>(&f), sizeof(f));
    

    This is more expressive, shorter than the union hack, and easier to search for in a code base.

  • If you want an object of one type with the same memory representation as an object of another type, use bit_cast or memcpy, e.g.

    // C++20 and later:
    float f = 123.456;
    auto bits_of_f = std::bit_cast<uint32_t>(f);

    // Before C++20:
    float f = 123.456;
    uint32_t bits_of_f;
    memcpy(&bits_of_f, &f, sizeof(f));
    

    This is again more expressive, indicating that you're copying or casting memory representations rather than values, and in the case of bit_cast, there are additional compile-time checks (matching sizes and trivial copyability).


Why do you say that data.myArray2 is not active? What does it mean?

[class.union.general]
In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended (6.7.3). At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.
[...]
When the left operand of an assignment operator involves a member access expression (7.6.1.5) that nominates a union member, it may begin the lifetime of that union member [...]

Roughly speaking, the active member of a union is the one you most recently assigned to.

union U {
  float value;
  const char *string;
};
U u;
u.value = 123.456;
// 'value' is the active member
u.string = "abcdef";
// now 'string' is the active member

And that's where the step to Variant should have been taken, with an explicit note about the type of the active member, including eventual type conversions.

But as it is, the coder has no chance to satisfy the "active member" restriction, and the purpose of a union is just to disable internal type conversions on a reference to any member. It's the coder's decision to use a union for some purpose, with no artificial and almost unsupported restrictions.

That's just my opinion, which stands in contrast to the current C++ specs. IMO the C++ designers should rather have preserved union as a legacy, deprecated, unsafe and unprotected language feature, and introduced a new, fully conforming variant (class) type.

I am now confused about my understanding of the definition of the union type data structure.

My understanding was that the members of a union could be of different data types, but that their individual storage capacity must be identical. Inspired by your example in post #34, I have compiled the following code.

union U 
{
  float value;
  byte y;
};
U u;

Here, the union variable u contains two members of different storage capacity -- 32-bit and 8-bit.

Has the union been designed just to save memory by assigning a common space for the storage of many variables, where only one variable is active at a time? Is that the purpose of union? The code conversion using union is its "logical implementation", as @PieterP has elicited in post #17, and yet a user is discouraged from doing so?

That's the implementation; the designed purpose is questionable. Space is allocated for the largest member, and if a member consists of multiple fields, then alignment gaps etc. may introduce well-known opportunities and problems.
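
A small sketch to illustrate the allocation point, written in standard C++ with uint8_t standing in for Arduino's byte (an assumption made so that the snippet builds off-board). The function name membersShareAddress is hypothetical. It only takes the addresses of the members and never reads the inactive one.

```cpp
#include <cassert>
#include <cstdint>

union U {
  float   value;  // 32-bit member on typical platforms
  uint8_t y;      // 8-bit member; stands in for Arduino's 'byte'
};

// The union must be able to hold its largest member.
static_assert(sizeof(U) >= sizeof(float),
              "a union is at least as large as its largest member");

// Hypothetical helper: checks that both members begin at the same address.
bool membersShareAddress() {
  U u;
  u.value = 1.0f;
  return static_cast<void*>(&u.value) == static_cast<void*>(&u.y);
}

// Usage sketch.
void sizeDemo() {
  assert(membersShareAddress());
}
```

So members of different sizes are allowed; the smaller one simply occupies the first part of the shared storage.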


C++ has std::variant, which is a tagged union.

The “normal”, C-style union is the low-level construct that allows you to implement variant types.

The language reference for union includes such a note:

It is undefined behavior to read from the member of the union that wasn't most recently written.

And both the C++ standard and the official C++ Core Guidelines forbid using unions for type punning:

C++ Core Guidelines: C.183: Don’t use a union for type punning
Reason It is undefined behavior to read a union member with a different type from the one with which it was written. Such punning is invisible, or at least harder to spot than using a named cast. Type punning using a union is a source of errors.

Then you're in luck, because that's exactly what they did:
C-style unions are discouraged, and the standard library provides a variant type.

  • C++ Core Guidelines: C.181: Avoid “naked” unions

    Reason A naked union is a union without an associated indicator which member (if any) it holds, so that the programmer has to keep track. Naked unions are a source of type errors.
    Alternative Wrap a union in a class together with a type field.
    The C++17 variant type (found in <variant>) does that for you.

  • std::variant - cppreference.com

    The class template std::variant represents a type-safe union. An instance of std::variant at any given time either holds a value of one of its alternative types [...]
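
To make that concrete, here is a minimal sketch of std::variant in use, assuming a C++17 toolchain with <variant> available (which may not be the case for every Arduino core). The alias TokenValue and the functions describe and variantDemo are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical alias: a value that is an int, a float, or a string.
using TokenValue = std::variant<int, float, std::string>;

// Hypothetical helper: asks the variant which alternative it holds
// before reading it.
std::string describe(const TokenValue& v) {
  if (std::holds_alternative<int>(v)) {
    return "int: " + std::to_string(std::get<int>(v));
  }
  if (std::holds_alternative<float>(v)) {
    return "float";
  }
  return "string: " + std::get<std::string>(v);
}

// Usage sketch: the variant tracks the active alternative by itself.
void variantDemo() {
  TokenValue v = 42;
  assert(describe(v) == "int: 42");

  v = std::string("union");
  assert(describe(v) == "string: union");

  // Asking for the wrong alternative throws instead of silently punning.
  bool threw = false;
  try {
    (void)std::get<int>(v);
  } catch (const std::bad_variant_access&) {
    threw = true;
  }
  assert(threw);
}
```

Unlike a naked union, a wrong-type access here is reported as an exception rather than being undefined behavior.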

Yes, this is the purpose of the union. ibuildrobots posted a good example of this:

The name union comes from the mathematical union of sets.

For example, the type uint32_t represents the set of all 32-bit integers, or

uint32_t = {n ∊ ℕ | 0 ≤ n < 2³²}.

Similarly, float usually represents the set of all 32-bit IEEE-754 floating point numbers.

Then the type union int_or_float { uint32_t u; float f; } represents the mathematical union of the set of all 32-bit integers and the set of all IEEE-754 floating point numbers,

int_or_float = {n ∊ ℕ | 0 ≤ n < 2³²} ∪ {set of 32-bit IEEE-754 numbers}


I came to know about unions from Venn diagrams in the Statistical Communication Theory course of my 4th-year EEE class some 45 years ago.