Bit packing (union) puzzle

From what I've read, using a union to access bit values saves RAM, since the size of a union equals the size of its largest element. My two example snippets do not bear this out.

// Structure containing union

struct PLCtimer {
#define ton 0
#define rto 1

  public:
    unsigned long pre;
    unsigned long acc;
    unsigned long last;
    union {  // save RAM by combining flags into one byte
      byte allbits;   // allow access of the whole byte.
      struct {   // allow access of individual bits.
        byte retentive: 1;
        byte dn: 1;
        byte en: 1;
        byte tt: 1;
        byte res: 1;
        byte intv: 1;
        byte tmrOS: 1;
        byte osSetup: 1;
      };
    } TMRFlags;
};

byte testVal;

void setup() {
  // put your setup code here, to run once:
  PLCtimer ex1;
  testVal = ex1.TMRFlags.en;

}

void loop() {
  // put your main code here, to run repeatedly:

}
// Structure without union

struct PLCtimer {
#define ton false
#define rto true

  public:
    unsigned long pre;
    unsigned long acc;
    unsigned long last;
    bool retentive; // TON = false, RTO = true.
    bool dn;
    bool en;
    bool tt;
    bool res;
    bool intv;
    bool tmrOS;
    bool osSetup;
}; // end of structure  PLC timer

byte testVal;

void setup() {
  // put your setup code here, to run once:
  PLCtimer ex2;
  testVal = ex2.en;
}

void loop() {
  // put your main code here, to run repeatedly:

}

The compiler reports 478 bytes flash and 10 bytes RAM used for both (on an UNO).

I also removed the struct from the first example and memory usage remains unchanged.

What am I missing?

The :n value is a recommendation. It is not required to be recognized or used.

An enum, for all the bit names, and bitRead() or bitSet() would do what you are trying to do with the union, with the help of a #define macro or two.
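A minimal sketch of what the enum-plus-macro approach could look like. The names FLAG_*, TMR_BIT_READ, TMR_BIT_SET, TMR_BIT_CLEAR, PLCtimerPacked and markDone are made up for illustration; the three macros are stand-ins for Arduino's bitRead(), bitSet() and bitClear() so the code also compiles off-target. uint32_t is used in place of unsigned long, which is 4 bytes on an UNO.

```cpp
#include <cassert>  // only needed for quick off-target checks
#include <cstdint>

// Bit positions within the flag byte, named after the fields in PLCtimer.
enum TimerFlag : uint8_t {
  FLAG_RETENTIVE = 0,
  FLAG_DN,
  FLAG_EN,
  FLAG_TT,
  FLAG_RES,
  FLAG_INTV,
  FLAG_TMR_OS,
  FLAG_OS_SETUP
};

// Hypothetical stand-ins for Arduino's bitRead()/bitSet()/bitClear().
// On an Arduino, the built-in macros can be used instead.
#define TMR_BIT_READ(value, bit)  (((value) >> (bit)) & 0x01)
#define TMR_BIT_SET(value, bit)   ((value) |= (uint8_t)(1U << (bit)))
#define TMR_BIT_CLEAR(value, bit) ((value) &= (uint8_t)~(1U << (bit)))

struct PLCtimerPacked {
  uint32_t pre;
  uint32_t acc;
  uint32_t last;
  uint8_t flags;  // all eight flags share this one byte
};

// Example helper: mark the timer as done and no longer timing.
inline void markDone(PLCtimerPacked &t) {
  TMR_BIT_SET(t.flags, FLAG_DN);
  TMR_BIT_CLEAR(t.flags, FLAG_TT);
}
```

Because the bit positions are chosen explicitly by the enum, the layout does not depend on how the compiler orders bit-fields.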

If you check sizeof TMRFlags in your first example you'll find it to be 1.

If, in example 2, you change your bool to byte, you may find memory use increases. I believe the compiler packs bools into bitfields as part of its optimization.

There's only one instance of PLCtimer, and it's an automatic variable. It doesn't count toward the RAM usage report.

At least it is doing the mapping to bits. LSB first, it appears.

allbits = 1  Individual bits = 1 0 0 0 0 0 0 0
allbits = 2  Individual bits = 0 1 0 0 0 0 0 0
allbits = 4  Individual bits = 0 0 1 0 0 0 0 0
allbits = 8  Individual bits = 0 0 0 1 0 0 0 0
allbits = 16  Individual bits = 0 0 0 0 1 0 0 0
allbits = 32  Individual bits = 0 0 0 0 0 1 0 0
allbits = 64  Individual bits = 0 0 0 0 0 0 1 0
allbits = 128  Individual bits = 0 0 0 0 0 0 0 1
void setup()
{
  Serial.begin(115200);


  PLCtimer ex1;
  // put your setup code here, to run once:
  for (byte i = 1; i > 0 ; i <<= 1)
  {
    ex1.TMRFlags.allbits = i;
    Serial.print("allbits = ");
    Serial.print(ex1.TMRFlags.allbits);
    Serial.print("  Individual bits = ");
    Serial.print(ex1.TMRFlags.retentive);
    Serial.print(" ");
    Serial.print(ex1.TMRFlags.dn);
    Serial.print(" ");
    Serial.print(ex1.TMRFlags.en);
    Serial.print(" ");
    Serial.print(ex1.TMRFlags.tt);
    Serial.print(" ");
    Serial.print(ex1.TMRFlags.res);
    Serial.print(" ");
    Serial.print(ex1.TMRFlags.intv);
    Serial.print(" ");
    Serial.print(ex1.TMRFlags.tmrOS);
    Serial.print(" ");
    Serial.println(ex1.TMRFlags.osSetup);
  }
  testVal = ex1.TMRFlags.en;
}

Thanks for all the replies! I've found an answer - updated snippets posted below.

PaulS:
The :n value is a recommendation. It is not required to be recognized or used.

Interesting. I put the value 11 after one of the bit names and the compiler made no complaint. Odd, since that’s greater than the number of bits in the type. That's a tidbit to be kept.

PaulS:
An enum, for all the bit names, and bitRead() or bitSet() would do what you are trying to do with the union, with the help of a #define macro or two.

Alas, building a #define macro is not yet in my kit.

holmes4:
Err you read wrong
Mark

Ummm, I think I got that one right. Reply #1 in this thread has a link to this other thread which explains in example #3 the bit field idea.

DKWatson:
If you check sizeof TMRFlags in your first example you'll find it to be 1.
If, in example 2, you change your bool to byte, you may find memory use increases. I believe the compiler packs bools into bitfields as part of its optimization.

I substituted bools for bytes in both programs – no difference in RAM usage in either.

christop:
There's only one instance of PLCtimer, and it's an automatic variable. It doesn't count toward the RAM usage report.

I did a little searching on ‘c++ automatic variable’ and didn’t find anything definitive. That’s not saying what I read was not definitive, it’s just saying it wasn’t definitive to me. Sometimes these concepts are hard to grasp!

johnwasser:
At least it is doing the mapping to bits. LSB first, it appears.

Your post and DKWatson's combined with the second link noted above jogged something loose upstairs. I added a print statement to the two snippets and got expected results. LSB first is jarring.

// Structure containing union

struct PLCtimer {
#define ton 0
#define rto 1

  public:
    unsigned long pre;
    unsigned long acc;
    unsigned long last;
    union {  // save RAM by combining flags into one byte
      byte allbits;   // allow access of the whole byte.
      struct {   // allow access of individual bits.
        byte retentive: 1;
        byte dn: 1;
        byte en: 1;
        byte tt: 1;
        byte res: 1;  // unused
        byte intv: 1;   //
        byte tmrOS: 1; //
        byte osSetup: 1;
      };
    } TMRFlags;
};

byte testVal;

void setup() {
  // put your setup code here, to run once:
  Serial.begin(230400);
  PLCtimer ex1;
  testVal = ex1.TMRFlags.en;
  Serial.print(sizeof(ex1));

}

void loop() {
  // put your main code here, to run repeatedly:

}
// Structure without union

struct PLCtimer {
#define ton false
#define rto true

  public:
    unsigned long pre;
    unsigned long acc;
    unsigned long last;
    bool retentive; // TON = false, RTO = true.
    bool dn; // timer has reached preset
    bool en; // timer is enabled to run
    bool tt; // timer is enabled and not done
    bool res; // timer is reset
    bool intv; // self resetting timer
    bool tmrOS; // on one scan when done bit goes on
    bool osSetup;
}; // end of structure  PLC timer

byte testVal;

void setup() {
  // put your setup code here, to run once:
  Serial.begin(230400);
  PLCtimer ex2;
  testVal = ex2.en;
  Serial.print(sizeof(ex2));
}

void loop() {
  // put your main code here, to run repeatedly:

}

Size of ex1 is 13, 4 x 3 for the unsigned longs plus one for the union.

Size of ex2 is 20, 4 x 3 for the unsigned longs (12 total) plus 8 for the bits.

So, using a union does reduce RAM usage.

But it also increases flash use. The version without the union reports 1456 bytes of flash used, while the union version reports 1592 bytes. That is a lot more than I'd expect.

Unions are used to combine different types into the same variable. In your case to be able to access all the bits in one go using allbits (e.g. for easy checking if one of the bits is set or all bits are set) or to access an individual bit using e.g. retentive.
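As a rough illustration of the "all the bits in one go" use, here is a sketch with three hypothetical helpers (anyFlagSet, allFlagsSet, clearAllFlags) and a hypothetical union name TimerFlags. The inner struct is given the name bits here, where the thread's version uses an anonymous struct. Note the assumption: reading a union member other than the one last written is not guaranteed by standard C++, although GCC (and so avr-gcc) documents it as supported.

```cpp
#include <cassert>  // only needed for quick off-target checks
#include <cstdint>

union TimerFlags {
  uint8_t allbits;  // whole-byte view of the flags
  struct {
    uint8_t retentive : 1;
    uint8_t dn : 1;
    uint8_t en : 1;
    uint8_t tt : 1;
    uint8_t res : 1;
    uint8_t intv : 1;
    uint8_t tmrOS : 1;
    uint8_t osSetup : 1;
  } bits;  // per-flag view of the same byte
};

// Relies on the compiler allowing access through either union member
// (a documented GCC behavior, assumed here).
inline bool anyFlagSet(const TimerFlags &f) { return f.allbits != 0; }
inline bool allFlagsSet(const TimerFlags &f) { return f.allbits == 0xFF; }
inline void clearAllFlags(TimerFlags &f) { f.allbits = 0; }
```

If the flags are only ever touched one at a time, the whole-byte view adds nothing and the bit-field struct alone is enough.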

Mostly you’re missing that the “ex1” variable you create is a “local variable”, which means that it’s allocated on the stack at runtime, and doesn’t show up in the compile-time “RAM used” statistic at all…

Your 10 bytes of data is 9 bytes related to keeping track of millis(), and one byte for “testVal.” I’m a little surprised that testVal survives all the way to the elf file. Frequently, avr-gcc optimization is so good at saying “there’s an assignment to testVal, but it’s not ever used” or “ex1.TMRFlags.en is a constant 1; I can tell at compile-time” that figuring out how the compiler works from little test programs like this is nearly impossible.

 avr-nm -S *.elf | grep ' [bBdD] '
0080010a B __bss_end
00800100 B __bss_start
00800100 D _edata
00800100 00000001 B testVal
00800101 00000001 b timer0_fract
00800102 00000004 B timer0_millis
00800106 00000004 B timer0_overflow_count

(this doesn’t mean that the other warnings you’re getting from people aren’t also true. But they aren’t really coming into play.)
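To make the local-versus-global distinction concrete, here is a small sketch. The names globalTimer, runLocalTimer and runGlobalTimer are invented for the example, and the struct is a simplified stand-in for PLCtimer. The assumption is the usual avr-gcc behavior: objects with static storage are placed in .data/.bss and counted in the build report, while automatic (local) objects live on the stack only while their function runs.

```cpp
#include <cassert>  // only needed for quick off-target checks
#include <cstdint>

struct PLCtimer {
  uint32_t pre;
  uint32_t acc;
  uint32_t last;
  uint8_t flags;
};

// Static storage duration: exists for the whole program, is zero-initialized,
// and is the kind of object the compile-time RAM figure counts.
PLCtimer globalTimer;

// Automatic storage duration: created on the stack at each call and gone on
// return, so it never appears in the compile-time RAM figure.
uint32_t runLocalTimer(uint32_t ticks) {
  PLCtimer localTimer = {};  // starts from zero on every call
  localTimer.acc += ticks;
  return localTimer.acc;
}

// Same update on the global: the accumulated value persists between calls.
uint32_t runGlobalTimer(uint32_t ticks) {
  globalTimer.acc += ticks;
  return globalTimer.acc;
}
```

Moving the PLCtimer declaration in the test sketches out of setup() to file scope should make it show up in the RAM report, provided the optimizer does not remove it as unused.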

PaulS:
The :n value is a recommendation. It is not required to be recognized or used.

Yeah, well, embedded C programmers are a bit insistent that they be able to map structures onto IO registers, and all of Atmel’s CMSIS .h files (as one example) would break if bitfields didn’t work in all the compilers they’re expected to work with…

/* -------- PORT_WRCONFIG : (PORT Offset: 0x28) ( /W 32) GROUP Write Configuration -------- */
#if !(defined(__ASSEMBLY__) || defined(__IAR_SYSTEMS_ASM__))
typedef union {
  struct {
    uint32_t PINMASK:16;       /*!< bit:  0..15  Pin Mask for Multiple Pin Configuration */
    uint32_t PMUXEN:1;         /*!< bit:     16  Peripheral Multiplexer Enable      */
    uint32_t INEN:1;           /*!< bit:     17  Input Enable                       */
    uint32_t PULLEN:1;         /*!< bit:     18  Pull Enable                        */
    uint32_t :3;               /*!< bit: 19..21  Reserved                           */
    uint32_t DRVSTR:1;         /*!< bit:     22  Output Driver Strength Selection   */
    uint32_t :1;               /*!< bit:     23  Reserved                           */
    uint32_t PMUX:4;           /*!< bit: 24..27  Peripheral Multiplexing            */
    uint32_t WRPMUX:1;         /*!< bit:     28  Write PMUX                         */
    uint32_t :1;               /*!< bit:     29  Reserved                           */
    uint32_t WRPINCFG:1;       /*!< bit:     30  Write PINCFG                       */
    uint32_t HWSEL:1;          /*!< bit:     31  Half-Word Select                   */
  } bit;                       /*!< Structure used for bit  access                  */
  uint32_t reg;                /*!< Type      used for register access              */
} PORT_WRCONFIG_Type;

(OTOH, see also https://github.com/arduino/ArduinoCore-samd/issues/319)

westfw:
Yeah, well, embedded C programmers are a bit insistent that they be able to map structures onto IO registers, and all of Atmel's CMSIS .h files (as one example) would break if bitfields didn't work in all the compilers they're expected to work with...

I have to wonder if bitfields are supported only on some types - i.e. those of fixed size. A uint8_t is always 8 bits. A short is not. A byte probably is. A bool definitely is not. An int is not.

Usually "int", "short", and "long" are the "native" types, and uint8_t and so on are just system-wide typedefs tucked in stdint.h or similar. I've been using bitfields overlaid on real data since the 80s, before the uintN_t types existed. (And getting burned, too, since, for example, little-endian and big-endian CPUs seem to define their bitfields starting at opposite ends of the larger datatype.)
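Since the bit order is up to the compiler and target, one way to find out what a given toolchain does is to probe it. The sketch below is only an illustration: ProbeBits and firstFieldMask are hypothetical names, and it assumes the two fields pack into a single byte (checked with a static_assert). It copies the byte out with memcpy rather than reading through a union.

```cpp
#include <cassert>  // only needed for quick off-target checks
#include <cstdint>
#include <cstring>

struct ProbeBits {
  uint8_t first : 1;  // declared first
  uint8_t rest : 7;   // remaining bits of the byte
};

// Returns the raw byte produced when only the first-declared bit-field is set:
// 0x01 means fields are allocated LSB first, 0x80 means MSB first.
inline uint8_t firstFieldMask() {
  static_assert(sizeof(ProbeBits) == 1,
                "probe assumes both fields pack into one byte");
  ProbeBits p = {};  // all bits zero
  p.first = 1;
  uint8_t raw;
  std::memcpy(&raw, &p, 1);
  return raw;
}
```

Code that must produce the same layout on every target is usually safer with explicit shifts and masks than with bit-fields.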

dougp:
So, using a union does reduce RAM usage.

It's not the union that reduces RAM usage; using bitfields does. The union simply allows you to "alias" the collection of bitfields as a single byte. If you remove the union but leave the bitfields, you'll find the whole structure to be the same size as with the union.
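A quick way to see this is to compare only the flag portion in its three forms. The type names below (FlagsAsBools, FlagsAsBitfields, FlagsAsUnion) are invented for the example, and the sizes in the comments assume a compiler where bool is one byte and uint8_t bit-fields pack into a single byte, which is what avr-gcc and the mainstream desktop compilers do. The full PLCtimer is left out on purpose, since desktop compilers may pad it for alignment and report something other than the 13 and 20 bytes seen on the UNO.

```cpp
#include <cassert>  // only needed for quick off-target checks
#include <cstddef>
#include <cstdint>

// One bool per flag: eight separate bytes on typical compilers.
struct FlagsAsBools {
  bool retentive, dn, en, tt, res, intv, tmrOS, osSetup;
};

// One bit per flag, no union: the eight fields share one byte.
struct FlagsAsBitfields {
  uint8_t retentive : 1;
  uint8_t dn : 1;
  uint8_t en : 1;
  uint8_t tt : 1;
  uint8_t res : 1;
  uint8_t intv : 1;
  uint8_t tmrOS : 1;
  uint8_t osSetup : 1;
};

// The union adds a whole-byte alias but does not change the size.
union FlagsAsUnion {
  uint8_t allbits;
  FlagsAsBitfields bits;
};

constexpr std::size_t kBoolsSize = sizeof(FlagsAsBools);
constexpr std::size_t kBitfieldsSize = sizeof(FlagsAsBitfields);
constexpr std::size_t kUnionSize = sizeof(FlagsAsUnion);
```

On an Arduino, printing the three constants with Serial.println() from setup() gives the same comparison on the target itself.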

sterretje:
Unions are used to combine different types into the same variable. In your case to be able to access all the bits in one go using allbits (e.g. for easy checking if one of the bits is set or all bits are set) or to access an individual bit using e.g. retentive.

Bit by bit (pun coincidental) more light dawns. Since I don’t need, or intend, to access the bits as a group, allbits can go away. I tested this and sizeof(ex1) is unchanged. I thought I needed allbits since it was part of the example I copied.

christop:
It's not the union that reduces RAM usage; using bitfields does. The union simply allows you to "alias" the collection of bitfields as a single byte. If you remove the union but leave the bitfields, you'll find the whole structure to be the same size as with the union.

Yes, it’s true. Thanks for that!

westfw:
Mostly you're missing that the "ex1" variable you create is a "local variable", which means that it's allocated on the stack at runtime, and doesn't show up in the compile-time "RAM used" statistic at all...
Your 10 bytes of data is 9 bytes related to keeping track of millis(), and one byte for "testVal."…

Ah. That was staring me in the face. Should've seen it sooner.

Thanks again for all the informative replies.