sizeof() not returning the expected size for a struct on rp2040

Here is my program:

#include <Arduino.h>

typedef struct
  {
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
  } SETUP_STRUCT_TYPE;

SETUP_STRUCT_TYPE dyno_setup =
  {              //                                     RP2040  UNO
  4,             // uint32_t RPM_Pulses_Per_Revolution; 4       4
  -75066,        // int32_t  Load_Cell_Tare;            4 = 8   4 = 8
  -66502.082794, // double   Load_Cell_Calibrate;       8 = 16  4 = 12
  1.0,           // double   Lever_Length_Feet;         8 = 24  4 = 16
  1.0,           // double   Efficiency_Factor;         8 = 32  4 = 20
  0x2A40D05E     // uint32_t CRC;                       4 = 36  4 = 24
  };

void setup()
{

  Serial.begin(115200);
  // Wait for serial port to connect. Needed for native Adalogger
  // RP2040 USB port only.
  while(!Serial)
    {
    delay(10);
    }  
  Serial.print("PS sizeof(SETUP_STRUCT_TYPE):                     "); Serial.println(sizeof(SETUP_STRUCT_TYPE));
  Serial.print("PS sizeof(dyno_setup):                            "); Serial.println(sizeof(dyno_setup));
  Serial.print("PS sizeof(dyno_setup.RPM_Pulses_Per_Revolution):  "); Serial.println(sizeof(dyno_setup.RPM_Pulses_Per_Revolution));
  Serial.print("PS sizeof(dyno_setup.Load_Cell_Tare):             "); Serial.println(sizeof(dyno_setup.Load_Cell_Tare));
  Serial.print("PS sizeof(dyno_setup.Load_Cell_Calibrate):        "); Serial.println(sizeof(dyno_setup.Load_Cell_Calibrate));
  Serial.print("PS sizeof(dyno_setup.Lever_Length_Feet):          "); Serial.println(sizeof(dyno_setup.Lever_Length_Feet));
  Serial.print("PS sizeof(dyno_setup.Efficiency_Factor):          "); Serial.println(sizeof(dyno_setup.Efficiency_Factor));
  Serial.print("PS sizeof(dyno_setup.CRC):                        "); Serial.println(sizeof(dyno_setup.CRC));
  Serial.print("PS addr of &dyno_setup:                           "); Serial.println((uint32_t)&dyno_setup, HEX);
  Serial.print("PS addr of &dyno_setup.RPM_Pulses_Per_Revolution: "); Serial.println((uint32_t)&dyno_setup.RPM_Pulses_Per_Revolution, HEX);
  Serial.print("PS addr of &dyno_setup.Load_Cell_Tare:            "); Serial.println((uint32_t)&dyno_setup.Load_Cell_Tare, HEX);
  Serial.print("PS addr of &dyno_setup.Load_Cell_Calibrate:       "); Serial.println((uint32_t)&dyno_setup.Load_Cell_Calibrate, HEX);
  Serial.print("PS addr of &dyno_setup.Lever_Length_Feet:         "); Serial.println((uint32_t)&dyno_setup.Lever_Length_Feet, HEX);
  Serial.print("PS addr of &dyno_setup.Efficiency_Factor:         "); Serial.println((uint32_t)&dyno_setup.Efficiency_Factor, HEX);
  Serial.print("PS addr of &dyno_setup.CRC:                       "); Serial.println((uint32_t)&dyno_setup.CRC, HEX);
}

void loop()
{
}

On an UNO, sizeof() returns as I expect:

Run on a UNO:
PS sizeof(SETUP_STRUCT_TYPE):                     24
PS sizeof(dyno_setup):                            24
PS sizeof(dyno_setup.RPM_Pulses_Per_Revolution):  4
PS sizeof(dyno_setup.Load_Cell_Tare):             4
PS sizeof(dyno_setup.Load_Cell_Calibrate):        4
PS sizeof(dyno_setup.Lever_Length_Feet):          4
PS sizeof(dyno_setup.Efficiency_Factor):          4
PS sizeof(dyno_setup.CRC):                        4
PS addr of &dyno_setup:                           100
PS addr of &dyno_setup.RPM_Pulses_Per_Revolution: 100
PS addr of &dyno_setup.Load_Cell_Tare:            104
PS addr of &dyno_setup.Load_Cell_Calibrate:       108
PS addr of &dyno_setup.Lever_Length_Feet:         10C
PS addr of &dyno_setup.Efficiency_Factor:         110
PS addr of &dyno_setup.CRC:                       114 // 114 - 100 =0x14 = 20 which checks, Add 4 bytes for CRC and we are at 24

On the rp2040 it returns 4 bytes larger than I expect:

PS sizeof(SETUP_STRUCT_TYPE):                     40 // !!  I THINK THIS SHOULD BE 36 !!
PS sizeof(dyno_setup):                            40 // !!  I THINK THIS SHOULD BE 36 !!
PS sizeof(dyno_setup.RPM_Pulses_Per_Revolution):  4
PS sizeof(dyno_setup.Load_Cell_Tare):             4
PS sizeof(dyno_setup.Load_Cell_Calibrate):        8
PS sizeof(dyno_setup.Lever_Length_Feet):          8
PS sizeof(dyno_setup.Efficiency_Factor):          8
PS sizeof(dyno_setup.CRC):                        4 // THESE ADD TO 36 
PS addr of &dyno_setup:                           20000B98
PS addr of &dyno_setup.RPM_Pulses_Per_Revolution: 20000B98
PS addr of &dyno_setup.Load_Cell_Tare:            20000B9C
PS addr of &dyno_setup.Load_Cell_Calibrate:       20000BA0
PS addr of &dyno_setup.Lever_Length_Feet:         20000BA8
PS addr of &dyno_setup.Efficiency_Factor:         20000BB0
PS addr of &dyno_setup.CRC:                       20000BB8  // 20000BB8 - 20000B98 =0x20 = 32 which checks,  Add 4 bytes for CRC and we are at 36

I can code around it, but this is really unexpected.

Is this a known issue? Does the rp2040 align on an 8 byte boundary?

What am I missing?

On AVR, there is no double; you just get a float instead, which is 4 bytes.

You also don't need to upload and run the code to see this; a static_assert is checked at compile time:

struct SETUP_STRUCT_TYPE
{
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
} 
dyno_setup =
{                //                                     RP2040  UNO
  4,             // uint32_t RPM_Pulses_Per_Revolution; 4       4
  -75066,        // int32_t  Load_Cell_Tare;            4 = 8   4 = 8
  -66502.082794, // double   Load_Cell_Calibrate;       8 = 16  4 = 12
  1.0,           // double   Lever_Length_Feet;         8 = 24  4 = 16
  1.0,           // double   Efficiency_Factor;         8 = 32  4 = 20
  0x2A40D05E     // uint32_t CRC;                       4 = 36  4 = 24
};
static_assert(sizeof(double) == 4);
static_assert(offsetof(SETUP_STRUCT_TYPE, Lever_Length_Feet) == 12);

void setup() {}

void loop() {}

Compiles fine on AVR Uno, but not on R4 or ESP32. The latter has better error messages:

sketch_feb17a.ino:19:30: error: static assertion failed
   19 | static_assert(sizeof(double) == 4);
      |               ~~~~~~~~~~~~~~~^~~~
sketch_feb17a.ino:19:30: note: the comparison reduces to '(8 == 4)'
sketch_feb17a.ino:20:62: error: static assertion failed
   20 | static_assert(offsetof(SETUP_STRUCT_TYPE, Lever_Length_Feet) == 12);
      |                                                              ^
sketch_feb17a.ino:20:62: note: the comparison reduces to '(16 == 12)'

ETA: What you see on RP2040 also happens on ESP32

// static_assert(sizeof(double) == 4);
// static_assert(offsetof(SETUP_STRUCT_TYPE, Lever_Length_Feet) == 12);
static_assert(offsetof(SETUP_STRUCT_TYPE, CRC) == 32);
static_assert(sizeof(dyno_setup.CRC) == 4);
static_assert(sizeof(SETUP_STRUCT_TYPE) == 40);  // does not "add up"

Making the struct packed

struct __attribute__((packed)) SETUP_STRUCT_TYPE

fixes it to do what you expect

note: the comparison reduces to '(36 == 40)'
@kenb4 provided the solution. Thanks.

Since the rp2040 is 32-bit architecture and every one of my structure elements is 4 or 8 bytes, it never occurred to me that packing of the structure would be an issue.

Apparently the rp2040 uses some kind of 8-byte alignment on structures. Not the elements inside the structure, but where the next variable will be placed.

sizeof() bafflingly takes this into account -- leading to the situation where the sum of the sizeof() of all the elements does not equal the sizeof() of the struct, even though all the elements are 4 bytes or 8 bytes in size.

Using typedef struct __attribute__((packed)) {. . .} makes sizeof() return the 36 I expected, even though none of the sizes or addresses of the structure's members have changed.

Here is the new code that masks the sizeof() / 8-byte alignment issue:

#include <Arduino.h>

typedef struct __attribute__((packed))
  {
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
  } SETUP_STRUCT_TYPE;

SETUP_STRUCT_TYPE dyno_setup =
  {
  4,             // uint32_t RPM_Pulses_Per_Revolution; 4
  -75066,        // int32_t  Load_Cell_Tare;            4 = 8 
  -66502.082794, // double   Load_Cell_Calibrate;       8 = 16
  1.0,           // double   Lever_Length_Feet;         8 = 24
  1.0,           // double   Efficiency_Factor;         8 = 32
  0x2A40D05E     // uint32_t CRC;                       4 = 36
  };

void setup()
{

  Serial.begin(115200);
  // Wait for serial port to connect. Needed for native Adalogger
  // RP2040 USB port only.
  while(!Serial)
    {
    delay(10);
    }  
  Serial.print("PS sizeof(SETUP_STRUCT_TYPE):                     "); Serial.println(sizeof(SETUP_STRUCT_TYPE));
  Serial.print("PS sizeof(dyno_setup):                            "); Serial.println(sizeof(dyno_setup));
  Serial.print("PS sizeof(dyno_setup.RPM_Pulses_Per_Revolution):  "); Serial.println(sizeof(dyno_setup.RPM_Pulses_Per_Revolution));
  Serial.print("PS sizeof(dyno_setup.Load_Cell_Tare):             "); Serial.println(sizeof(dyno_setup.Load_Cell_Tare));
  Serial.print("PS sizeof(dyno_setup.Load_Cell_Calibrate):        "); Serial.println(sizeof(dyno_setup.Load_Cell_Calibrate));
  Serial.print("PS sizeof(dyno_setup.Lever_Length_Feet):          "); Serial.println(sizeof(dyno_setup.Lever_Length_Feet));
  Serial.print("PS sizeof(dyno_setup.Efficiency_Factor):          "); Serial.println(sizeof(dyno_setup.Efficiency_Factor));
  Serial.print("PS sizeof(dyno_setup.CRC):                        "); Serial.println(sizeof(dyno_setup.CRC));
  Serial.print("PS addr of &dyno_setup:                           "); Serial.println((uint32_t)&dyno_setup, HEX);
  Serial.print("PS addr of &dyno_setup.RPM_Pulses_Per_Revolution: "); Serial.println((uint32_t)&dyno_setup.RPM_Pulses_Per_Revolution, HEX);
  Serial.print("PS addr of &dyno_setup.Load_Cell_Tare:            "); Serial.println((uint32_t)&dyno_setup.Load_Cell_Tare, HEX);
  Serial.print("PS addr of &dyno_setup.Load_Cell_Calibrate:       "); Serial.println((uint32_t)&dyno_setup.Load_Cell_Calibrate, HEX);
  Serial.print("PS addr of &dyno_setup.Lever_Length_Feet:         "); Serial.println((uint32_t)&dyno_setup.Lever_Length_Feet, HEX);
  Serial.print("PS addr of &dyno_setup.Efficiency_Factor:         "); Serial.println((uint32_t)&dyno_setup.Efficiency_Factor, HEX);
  Serial.print("PS addr of &dyno_setup.CRC:                       "); Serial.println((uint32_t)&dyno_setup.CRC, HEX);
}

void loop()
{
}

And the result, with the structure showing 36 bytes as expected.

PS sizeof(SETUP_STRUCT_TYPE):                     36
PS sizeof(dyno_setup):                            36
PS sizeof(dyno_setup.RPM_Pulses_Per_Revolution):  4
PS sizeof(dyno_setup.Load_Cell_Tare):             4
PS sizeof(dyno_setup.Load_Cell_Calibrate):        8
PS sizeof(dyno_setup.Lever_Length_Feet):          8
PS sizeof(dyno_setup.Efficiency_Factor):          8
PS sizeof(dyno_setup.CRC):                        4
PS addr of &dyno_setup:                           20000B98
PS addr of &dyno_setup.RPM_Pulses_Per_Revolution: 20000B98
PS addr of &dyno_setup.Load_Cell_Tare:            20000B9C
PS addr of &dyno_setup.Load_Cell_Calibrate:       20000BA0
PS addr of &dyno_setup.Lever_Length_Feet:         20000BA8
PS addr of &dyno_setup.Efficiency_Factor:         20000BB0
PS addr of &dyno_setup.CRC:                       20000BB8

Interestingly, every single element of the struct is at exactly the same address as before. So while __attribute__((packed)) fixes it, it absolutely does not make sense to me.

It must be that the rp2040 has some 8-byte alignment in some situations for some reason.

It has to do with being able to efficiently access each member, if multiple instances are contiguous, like in an array. This compiles on ESP32

struct {
  uint8_t one;
} one;
static_assert(sizeof(one) == 1);
struct {
  uint16_t two;
} two;
static_assert(sizeof(two) == 2);
struct {
  uint16_t two;
  uint8_t one;
} three;
static_assert(sizeof(three) == 4);
struct {
  uint64_t eight;
  uint8_t one;
} nine;
static_assert(sizeof(nine) == 16);
struct {
  uint32_t first;
  uint32_t second;
  uint32_t third;
} twelve;
static_assert(sizeof(twelve) == 12);

void setup() {}

void loop() {}

BTW, you don't have to #include <Arduino.h> in an .ino -- the builder tacks that on at the top for you.

You are going the same place I was . . . I did some more digging.

It is because I am using doubles (8-bytes) in the structure, that the compiler enforces the 8-byte alignment.

If I replace the doubles with two floats, then the 8-byte alignment is relaxed and sizeof() returns 36 as expected.

That still does not excuse sizeof()'s behavior IMO - but it is not like sizeof() historically has exemplary comportment.

So I guess the general rule for the rp2040 is that the struct will be aligned on the size of its largest element, up to 8?

I suspect that it is due to the compiler supporting ARM processors in general. Some processors have hardware double support and 64 bit data paths that require 8 byte alignment. This requirement is imposed on the RP2040 even tho it only has a 32 bit data path.

sizeof is telling you the effective size of the struct: what it actually occupies, padding included. This is so you can write code that does not have to worry about these kinds of details, and it all adds up. Bytes "wasted" on padding and alignment are secondary.

For example, if you were going to copy ten of them, it would naturally be

memcpy(dst, src, 10 * sizeof(whatever));

and that works regardless of whether the struct is packed.
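
To make the "contiguous instances" point concrete, here is a minimal sketch (the names Unpacked, Packed, and the two k... constants are made up for illustration; the member list is the one from this thread). It assumes a GCC-style compiler that accepts __attribute__((packed)). The idea: array elements sit sizeof(T) bytes apart, so the tail padding is what keeps the doubles in element [1] aligned.

```cpp
#include <stddef.h>
#include <stdint.h>

// Same member layout as SETUP_STRUCT_TYPE in this thread (hypothetical name).
struct Unpacked {
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
};

// Identical members, but packed (GCC/Clang attribute, as used in the thread).
struct __attribute__((packed)) Packed {
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
};

// sizeof is always a whole multiple of the type's alignment.
static_assert(sizeof(Unpacked) % alignof(Unpacked) == 0,
              "size is a multiple of alignment");

// Packing drops the alignment requirement to 1, so no tail padding is needed.
static_assert(alignof(Packed) == 1, "packed struct has alignment 1");
static_assert(sizeof(Packed) == 3 * sizeof(uint32_t) + 3 * sizeof(double),
              "packed size is the plain sum of the members");

// Byte offset of the first double of element [1], measured from the start of
// a two-element array. Elements are sizeof(T) bytes apart by definition.
constexpr size_t kUnpackedSecondDouble =
    sizeof(Unpacked) + offsetof(Unpacked, Load_Cell_Calibrate);
constexpr size_t kPackedSecondDouble =
    sizeof(Packed) + offsetof(Packed, Load_Cell_Calibrate);

// With tail padding, the second element's doubles land on an aligned offset.
static_assert(kUnpackedSecondDouble % alignof(double) == 0,
              "array elements stay aligned");

void setup() {}

void loop() {}
```

Where double is 8 bytes with 8-byte alignment (as reported here for RP2040 and ESP32), the unpacked offset works out to 40 + 8 = 48 and the packed one to 36 + 8 = 44; the latter is not a multiple of 8, which is why the compiler has to treat packed members as potentially misaligned.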

Is eight the "largest-smallest" size? Also, consider nested structs when applicable. On ESP32:

struct {
  __int128 sixteen;
  uint8_t one;
} seventeen;
static_assert(sizeof(seventeen) == 32);

struct Twelve {
  uint64_t eight;
  uint32_t four;
};
static_assert(sizeof(Twelve) == 16);
struct Fortyish {
  Twelve first;
  Twelve second;
  Twelve third;
};
static_assert(sizeof(Fortyish) == 48);
static_assert(offsetof(Fortyish, third) == 32);

void setup() {}

void loop() {}

… tells the truth.

Good thing it does!

a7

It would be useful to have a way to find the data in the struct that is actually used (and would be important to include in the CRC), and the amount of data up to but not including the CRC.

I have this and it does work:

#define SETUP_BYTES_TO_CRC  (sizeof(uint32_t)+sizeof(int32_t)+sizeof(double)+sizeof(double)+sizeof(double))
#define SETUP_BYTES_TO_SAVE (sizeof(uint32_t)+sizeof(int32_t)+sizeof(double)+sizeof(double)+sizeof(double)+sizeof(uint32_t))

but not even close to elegant.
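
One possibly tidier way to get the same two numbers is to let the compiler derive them from the struct itself with offsetof. This is only a sketch; it keeps the macro names from above but turns them into constexpr constants, and it relies on CRC being the last member. Note that offsetof counts any padding that sits between members before CRC (there is none in this particular layout).

```cpp
#include <stddef.h>
#include <stdint.h>

struct SETUP_STRUCT_TYPE {
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
};

// Bytes covered by the CRC: everything that precedes the CRC member.
constexpr size_t SETUP_BYTES_TO_CRC = offsetof(SETUP_STRUCT_TYPE, CRC);

// Bytes worth saving: the above plus the CRC itself. Tail padding after CRC
// is not included, so this can be smaller than sizeof(SETUP_STRUCT_TYPE).
constexpr size_t SETUP_BYTES_TO_SAVE =
    offsetof(SETUP_STRUCT_TYPE, CRC) + sizeof(SETUP_STRUCT_TYPE::CRC);

static_assert(SETUP_BYTES_TO_SAVE <= sizeof(SETUP_STRUCT_TYPE),
              "saved bytes never exceed the struct");

void setup() {}

void loop() {}
```

If a member is added or reordered later, both constants follow automatically, as long as CRC stays last.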

Peace,

I am curious only, no need to get in the crib with me for this, about using a CRC in this context.

a7

The setup can be stored in EEPROM (emulated in flash for the rp2040) by the user.

There is a (very) slight chance that a write could go bad, so I write the CRC with the structure and check it before recalling.

The same structure will also be able to be stored to and recalled from the SD card.

The calibration and zeroing of the load cell might be fussy (disassembling things, using known weights etc) so I do not want to make it easy to lose the data.

As shown in my first reply, use offsetof

static_assert(offsetof(SETUP_STRUCT_TYPE, CRC) == 32);

For a CRC though, not sure if any padding before that CRC would be zeroed if it's a local stack variable (probably not). So you'd need to be careful about intra-struct padding.

OTOH, it is just padding, so the CRC could be "right" to check that useless random padding is consistent.
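
If the padding bytes are going to be fed to the CRC, one common way to make them predictable is to zero the whole object before filling in the members. A small sketch, using a made-up struct (Padded) that has padding between its members on most 32-bit and 64-bit targets, and made-up helper names. One caveat: the language standard does not strictly promise that padding keeps its value across member stores, so treat this as "works in practice with GCC-style compilers" rather than a guarantee.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical struct with padding between members: `value` is typically
// placed at offset 4, leaving three unused bytes after `flag`.
struct Padded {
  uint8_t  flag;
  uint32_t value;
};

// Fill a Padded so that every byte, padding included, has a known value.
inline void make_padded(Padded *out, uint8_t flag, uint32_t value) {
  memset(out, 0, sizeof(*out));  // zero everything first, padding too
  out->flag  = flag;             // then store the real members
  out->value = value;
}

// Byte-wise comparison, padding included (this is what a CRC would "see").
inline bool same_bytes(const Padded &a, const Padded &b) {
  return memcmp(&a, &b, sizeof(Padded)) == 0;
}

void setup() {}

void loop() {}
```

Without the memset, two objects holding the same member values could still differ in their padding bytes, and a CRC over sizeof(Padded) bytes would differ with them.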

Thanks for your help letting me understand what is going on -- I really appreciate it.

I was not aware of offsetof() or static_assert(). I am now, and those will come in handy. Thanks.

I checked the addresses, so I knew the elements are contiguous -- in my particular case.

To make absolutely robust code, I guess you would have to use offsetof() and sizeof() on each element of the structure, and then only use those bytes for the CRC and only write those bytes to the EEPROM, etc. That seems very awkward -- at that point you might as well not use a struct at all.

In any case I have my code running.

I just was really not expecting an alignment of more than 4 bytes on a 32-bit processor. Learning at the school of hard knocks -- tuition: one frustrated and lost morning.

Vanishingly small I would say.

Why bother? Just write / read a ((packed)) struct (with CRC field) to / from flash and move on.

THX, IC.

In a non-packed struct, there is no guarantee about what the padding bytes contain.

But who cares? Just CRC the entire thing; that will still check the integrity of the store-load trip, even if a few unknown bytes go along for the ride.

Smaller is better I guess for fewer errors to be possible.

Same as

a7

Written embedded code for years. Some in products that have shipped more than 1M units. If I can see a path to failure I will trap it.

Different things are important to different people.

My "why bother" comment was referring to going to the trouble of using offsetof() and sizeof() to pick out each field individually for computing the CRC rather than using a ((packed)) struct allowing you to know where all the bytes are. With this:

struct __attribute__((packed)) SETUP_STRUCT_TYPE  {
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
};

... you know to run the CRC over the entire sizeof(SETUP_STRUCT_TYPE) bytes except for the last four. Then shove that value into the CRC field, write to Flash, and move on.
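
As a rough sketch of that sequence: the thread does not say which CRC is in use, so the bitwise CRC-32 below (reflected polynomial 0xEDB88320) is an assumption, and crc32_bytes, seal, and is_valid are made-up helper names. The actual flash or EEPROM write is left out.

```cpp
#include <stddef.h>
#include <stdint.h>

struct __attribute__((packed)) SETUP_STRUCT_TYPE {
  uint32_t RPM_Pulses_Per_Revolution;
  int32_t  Load_Cell_Tare;
  double   Load_Cell_Calibrate;
  double   Lever_Length_Feet;
  double   Efficiency_Factor;
  uint32_t CRC;
};

// Assumed CRC: plain bitwise CRC-32, initial value and final XOR 0xFFFFFFFF.
inline uint32_t crc32_bytes(const void *data, size_t len) {
  const uint8_t *p = static_cast<const uint8_t *>(data);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the bit shifted out was 1.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// CRC over every byte except the trailing CRC field, stored into that field.
inline void seal(SETUP_STRUCT_TYPE *s) {
  s->CRC = crc32_bytes(s, sizeof(*s) - sizeof(s->CRC));
}

// Recompute over the same bytes and compare with the stored value.
inline bool is_valid(const SETUP_STRUCT_TYPE *s) {
  return s->CRC == crc32_bytes(s, sizeof(*s) - sizeof(s->CRC));
}

void setup() {}

void loop() {}
```

Typical use would be seal(&dyno_setup) just before writing sizeof(dyno_setup) bytes out, and is_valid(&dyno_setup) right after reading them back.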

And, of course we don't need to use typedef for structs in C++.

With a non-packed struct like in @Baxsie's first post, you can't say for sure which bytes would contain the CRC field. It might be the last four, it might be the four previous ones. So which bytes do you run the CRC over since it shouldn't include the CRC field itself?

IDK, up to offsetof the CRC?

a7

using packed is quite common in embedded code, and a much better solution than offsetof/etc...

I found this: ( https://developer.arm.com/documentation/dui0491/i/C-and-C---Implementation-Details/Structures--unions--enumerations--and-bitfields )

Structure alignment

The alignment of a nonpacked structure is the maximum alignment required by any of its fields.

So I'd guess that since you use 8-byte doubles, the struct is getting aligned to 8-byte boundaries. I'm not sure exactly why. IIRC, ARM does have some fussiness about alignment of stack frames, but I thought that was mainly an issue with interrupts...
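
The quoted rule can be checked at compile time with alignof. The sketch below uses made-up struct names (WithDoubles, WithFloats) and a made-up helper (round_up); WithFloats mirrors the experiment earlier in the thread where each double was replaced by two floats. The asserts are written relative to alignof(double) and sizeof(double), so they should hold both where double is 8 bytes and where it is 4.

```cpp
#include <stddef.h>
#include <stdint.h>

struct WithDoubles {
  uint32_t a;
  int32_t  b;
  double   c;
  double   d;
  double   e;
  uint32_t crc;
};

// Each double swapped for two floats: same payload where double is 8 bytes.
struct WithFloats {
  uint32_t a;
  int32_t  b;
  float    c[2];
  float    d[2];
  float    e[2];
  uint32_t crc;
};

// Round n up to the next multiple of a.
constexpr size_t round_up(size_t n, size_t a) { return (n + a - 1) / a * a; }

// The struct's alignment is that of its most strictly aligned member.
static_assert(alignof(WithDoubles) == alignof(double), "doubles set the alignment");
static_assert(alignof(WithFloats) == alignof(uint32_t), "4-byte members only");

// sizeof is the sum of the members rounded up to the struct's alignment.
static_assert(sizeof(WithDoubles) ==
                  round_up(3 * sizeof(uint32_t) + 3 * sizeof(double),
                           alignof(WithDoubles)),
              "payload rounded up to alignment");
static_assert(sizeof(WithFloats) == 36, "no tail padding needed");

void setup() {}

void loop() {}
```

With 8-byte, 8-aligned doubles this gives round_up(36, 8) = 40 for WithDoubles, while WithFloats stays at 36, matching what was observed on the rp2040.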