Once again I have seen posts promoting use of __attribute__((packed)) on a struct. I can't stand it anymore, so I have written this rambling post on why I think that is a bad solution.
The packed attribute is commonly used when a poorly designed structure is shared between processors with different bus widths. Lets see how this could happen.
Say you are building a small system where you have a couple of UNOs talking to a MEGA 2560. You build a struct to hold the data something like this. You arranged the data in order you thought of it and the order it is used in your code.
struct StructData {
byte NodeID; // ID of transmitting node
byte seq; // sequence number
int32_t someData;
};
Things have been working fine for a year and now you decide to upgrade the MEGA to an ESP32 to add additional functions. You immediately start getting errors on the data field. You post the problem on the internet and get the response (from ChatGPT??) to sprinkle the magic words _attribute__((packed)) on your structure and things will be fine. You do that and things are working. Problem solved.
struct __attribute__((packed)) StructData {
byte NodeID; // ID of transmitting node
byte seq; // sequence number
int32_t someData;
};
Now lets see what actually happened. We write a little test program to see what the different processors do.
struct StructData {
byte NodeID; // ID of transmitting node
byte seq; // sequence number
int32_t someData;
};
StructData struct1;
void changeStruct(int32_t x){
struct1.someData += x;
}
void setup() {
Serial.begin(115200);
Serial.print("size of StructData ");
Serial.println(sizeof(StructData));
Serial.print("offset to StructData.someData ");
Serial.println(offsetof(StructData ,someData));
changeStruct(100);
// do something with data so it doesn't get optimized out
Serial.println( struct1.someData);
}
void loop() {
}
On the UNO we get
size of StructData 6
offset to StructData.someData 2
100
On the ESP32 we get
size of StructData 8
offset to StructData.someData 4
100
Hmm, it seems that with structures what you see may not be what you get.The UNO and MEGA2560 are 8 bit processors so data can be byte aligned.
The ESP32 is a 32 bit processor and requires 4 byte alignment for 32 bit data so the compiler added a hidden 2 byte pad to get the correct alignment. Something like this.
struct __attribute__((packed)) StructData {
byte NodeID; // ID of transmitting node
byte seq; // sequence number
char hiddenPad[2]; // added bytes for alignment (not actually named)
int32_t someData;
};
Sprinkling the magic __attribute__((packed)) gives us the following. Where the pad bytes are removed from the structure.
ESP32 with __attribute__((packed))
size of StructData 6
offset to StructData.someData 2
100
Both processors now think the data is in the same place so you get the expected result.
A better solution would be to rearrange the struct so that the data has the correct alignment. Something like this where the alignment is the same on both processors.
struct StructData {
int32_t someData;
byte NodeID; // ID of transmitting node
byte seq; // sequence number
};
But how did the ESP32 suddenly handle unaligned data? The magic words didn't suddenly change the processor. Well lets take a look at what actually happens. Generate another test program with 2 structs where the data is correctly aligned and add __attribute__((packed)) to one of them and see what code is generated.
struct StructData {
int32_t someData;
byte NodeID; // ID of transmitting node
byte seq; // sequence number
};
struct __attribute__((packed)) StructDataPack {
int32_t someData;
byte NodeID; // ID of transmitting node
byte seq; // sequence number
};
StructData struct1;
StructDataPack structP;
// a function to reference the data
void changeStruct(int32_t x){
structP.someData += x;
struct1.someData += x;
}
void setup() {
Serial.begin(115200);
changeStruct(100);
// do something with data so it doesn't get optimized out
Serial.println( structP.someData);
Serial.println( struct1.someData);
}
void loop() {
}
Some of the generated code where the fields are used.
void changeStruct(int32_t x){
0: 004136 entry a1, 32
structP.someData += x;
3: 000091 l32r a9, fffc0004 <_Z12changeStructi+0xfffc0004>
6: 0109a2 l8ui a10, a9, 1
9: 000982 l8ui a8, a9, 0
c: 11aa80 slli a10, a10, 8
f: 20aa80 or a10, a10, a8
12: 020982 l8ui a8, a9, 2
15: 118800 slli a8, a8, 16
18: 20a8a0 or a10, a8, a10
1b: 030982 l8ui a8, a9, 3
1e: 018880 slli a8, a8, 24
21: 2088a0 or a8, a8, a10
24: 882a add.n a8, a8, a2
26: 74a880 extui a10, a8, 8, 8
29: 004982 s8i a8, a9, 0
2c: 0149a2 s8i a10, a9, 1
2f: 75a080 extui a10, a8, 16, 8
32: 758880 extui a8, a8, 24, 8
35: 0249a2 s8i a10, a9, 2
38: 034982 s8i a8, a9, 3
struct1.someData += x;
3b: 000091 l32r a9, fffc003c <_Z12changeStructi+0xfffc003c>
3e: 0988 l32i.n a8, a9, 0
40: 882a add.n a8, a8, a2
42: 0989 s32i.n a8, a9, 0
}
44: f01d retw.n
The compiler writers take the __attribute__((packed)) to mean "I screwed up, the data may not be aligned please fix it".
The fix for data that may not be aligned is to generates code to do the unaligned 32 bit load and store a byte at a time. It is not so magic. The packed case take almost 5 times as many instructions as the unpacked case.
In addition the packed case has only 1 short form instruction where in the unpacked case 3 of the 4 instructions are short form.
So __attribute__((packed)) leads to poor design, code bloat and potential performance problems.
By the way if you are using an ARM M0 or M0+ (Pi pico and some NANOs for example) the same thing applies.
void changeStruct(int32_t x){
structP.someData += x;
0: 4a0b ldr r2, [pc, #44] @ (30 <_Z12changeStructl+0x30>)
2: 7853 ldrb r3, [r2, #1]
4: 7811 ldrb r1, [r2, #0]
6: 021b lsls r3, r3, #8
8: 430b orrs r3, r1
a: 7891 ldrb r1, [r2, #2]
c: 0409 lsls r1, r1, #16
e: 4319 orrs r1, r3
10: 78d3 ldrb r3, [r2, #3]
12: 061b lsls r3, r3, #24
14: 430b orrs r3, r1
16: 181b adds r3, r3, r0
18: 0a19 lsrs r1, r3, #8
1a: 7013 strb r3, [r2, #0]
1c: 7051 strb r1, [r2, #1]
1e: 0c19 lsrs r1, r3, #16
20: 0e1b lsrs r3, r3, #24
22: 7091 strb r1, [r2, #2]
24: 70d3 strb r3, [r2, #3]
struct1.someData += x;
26: 4a03 ldr r2, [pc, #12] @ (34 <_Z12changeStructl+0x34>)
28: 6813 ldr r3, [r2, #0]
2a: 181b adds r3, r3, r0
2c: 6013 str r3, [r2, #0]
}
2e: 4770 bx lr