unsigned long to 4 bytes conversion fails after 2^16

The following example code converts an unsigned long into four bytes and back again. The result is sent to the serial monitor at 9600 baud. If the number contained in the unsigned long is less than 2^16, the conversion works; if it is above, it fails. This makes no sense to me, since unsigned long is 32 bits, not 16?!?

What am I doing wrong?

Example code below:

unsigned long Works = 32000 ;
unsigned long NotWork = 33000 ;
byte WorksArray[4];
byte NotWorkArray[4];
unsigned long WorksResult;
unsigned long NotWorkResult;
int i;

void setup() {
 // put your setup code here, to run once:

Serial.begin(9600);
while(!Serial);

for ( i = 0; i < 4; i++) {    // Stuff the four byte from unsigned long
 WorksArray[i]    = (uint32_t) Works >> (i*8);
 NotWorkArray[i] = (uint32_t) NotWork >> (i*8);
}

WorksResult = 0;  
NotWorkResult = 0;
for ( i = 0; i < 4; i++) {  // reassemble unsigned int from bytes
 WorksResult    += WorksArray[i] << (i*8);
 NotWorkResult += NotWorkArray[i] << (i*8);
}

Serial.println(WorksResult);
Serial.println(NotWorkResult);
}

void loop() {
 // put your main code here, to run repeatedly:

}

Insatman:
The following example code converts an unsigned long into four bytes and back again. The result is sent to the serial monitor at 9600 baud. If the number contained in the unsigned long is less than 2^16, the conversion works; if it is above, it fails. This makes no sense to me, since unsigned long is 32 bits, not 16?!?

What am I doing wrong?

This seems terribly complicated for something that could be done with far less code.
```
unsigned long WorksResult;
unsigned long NotWorkResult;
// A cast is needed: a pointer to unsigned long does not convert to byte * implicitly
byte *pWorksArray = (byte *) &WorksResult;
byte *pNotWorkArray = (byte *) &NotWorkResult;

for (uint8_t nI = 0; nI < 4; nI++)
{
  Serial.println(pWorksArray[nI]);
  Serial.println(pNotWorkArray[nI]);
}
```

Alternatively, you could use a union:

```
union
{
   uint32_t nVal;
   uint8_t arrayBytes[4];
```
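As a rough illustration of the pointer approach in this post, here is a host-side sketch in plain C++ (no Arduino `Serial`). Viewing an object through an `unsigned char` pointer is allowed by the language, but the order of the bytes is whatever the machine uses. The function name `copyObjectBytes` is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Copy the in-memory bytes of `value` into `out`, lowest address first.
// The byte order is the host's own, so the result is NOT portable between
// machines with different endianness.
void copyObjectBytes(const uint32_t &value, unsigned char (&out)[sizeof(uint32_t)]) {
    // Reading any object through an unsigned char pointer is permitted.
    const unsigned char *p = reinterpret_cast<const unsigned char *>(&value);
    for (std::size_t i = 0; i < sizeof(uint32_t); i++) {
        out[i] = p[i];
    }
}
```

On a little-endian board such as an AVR, 33000 (0x000080E8) would come out as E8 80 00 00; a big-endian machine would give the reverse order.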

Hi,
Welcome to the forum.

Please read the first post in any forum entitled how to use this forum.
http://forum.arduino.cc/index.php/topic,148850.0.html .
Then look down to item #7 about how to post your code.
It will be formatted in a scrolling window that makes it easier to read.

Thanks.. Tom... :slight_smile:

NotWorkResult += NotWorkArray[i] << (i*8);

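The right-hand side of this line is computed as an int, which is only 16 bits on AVR-based boards, because NotWorkArray[i] is promoted to int before the shift. Cast NotWorkArray[i] to an unsigned long before the shift: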

NotWorkResult += ((unsigned long) NotWorkArray[i]) << (i*8);
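To see why 32000 survives and 33000 does not, here is a small host-side sketch in plain C++. It emulates a 16-bit int by hand, so it is an approximation of what an AVR compiler does and not a copy of it; both function names are invented for the example. With a 16-bit int, the trouble actually starts at 2^15 (32768), because a second byte of 0x80 or more shifted left by 8 sets the sign bit, and the negative result is sign-extended when it is added to the unsigned long.

```cpp
#include <cstdint>

// Emulates evaluating `bytes[i] << (i * 8)` with a 16-bit int: keep only the
// low 16 bits of the shifted value, then sign-extend to 32 bits as happens
// when a negative int is added to an unsigned long.
// ASSUMPTION: shifts of 16 or more (undefined for a 16-bit int) yield 0.
uint32_t joinBytesAs16BitInt(const uint8_t (&bytes)[4]) {
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t low16 = (static_cast<uint32_t>(bytes[i]) << (i * 8)) & 0xFFFFu;
        uint32_t signExtended = (low16 & 0x8000u) ? (low16 | 0xFFFF0000u) : low16;
        result += signExtended;
    }
    return result;
}

// The fix from the post above: widen each byte to 32 bits before shifting.
uint32_t joinBytes(const uint8_t (&bytes)[4]) {
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {
        result |= static_cast<uint32_t>(bytes[i]) << (i * 8);
    }
    return result;
}
```

With the bytes of 33000 (E8 80 00 00), the emulated 16-bit version gives 4294934760 (0xFFFF80E8), while the widened version gives 33000 again. The bytes of 32000 (00 7D 00 00) come back correctly either way.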

boylesg:
Alternative you could use a union:

union
{
   uint32_t nVal;
   uint8_t arrayBytes[4];

No you couldn't. Setting one member of a union and then reading another is undefined behavior. You can only read from the active member.

Pieter

What? That defeats the whole purpose of a union. I’ve never heard that and I’ve used a union many many times to break a value up as a byte array to send over serial or SPI.

"Undefined" is probably the wrong term. I don't have a reference for C++, but from K&R, 2nd Edition, Section 6.8: "..the results are implementation-dependent if something is stored as one type and extracted as another."

gfvalvo:
"Undefined" is probably the wrong term. I don't have a reference for C++, but from K&R, 2nd Edition, Section 6.8: "..the results are implementation-dependent if something is stored as one type and extracted as another."

That I can see because you might have endianness issues or something. No guarantee that the 4 bytes of a long go in any specific order. But I hope it's not truly undefined because that means I need to go fix a LOT of code.

Long and 100% successful experience using unions on a variety of platforms to convert data types suggests that PieterP's admonition can fairly safely be ignored.

cppreference.com explains why:

The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined, and it's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

jremington:
I don't know where PieterP got that idea

It's the C++ standard:
https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior

But I got it from the C++ Core Guidelines:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#Ru-pun

C++ Core Guidelines:
C.183: Don't use a union for type punning
Reason
It is undefined behavior to read a union member with a different type from the one with which it was written. Such punning is invisible, or at least harder to spot than using a named cast. Type punning using a union is a source of errors.
[...]
Note
Unfortunately, unions are commonly used for type punning. We don't consider "sometimes, it works as expected" a strong argument.

The fact that it worked in the cases where you used it is no guarantee that it'll work with different optimization levels, when the Arduino cores update their compilers, or when you use an Arduino core with a different compiler.
The C++ language does not allow it, so no C++ compiler is required to generate sensible code for it.
Of course, some compilers might support it, but that's non-standard.

If you use it for a personal project, for a single board, where you know that it works, it's probably fine.
But on a platform like Arduino, that supports many different processors and compilers, and where you have no control over what compiler is used by different cores, and what compiler options they use, I don't think it's safe to assume that it always works.

Thank you. Why not make a more useful contribution and suggest how to do the conversion in a platform independent manner?

Do you know of any cases where unions do NOT work for conversion?

Insatman:
The following example code converts an unsigned long into four bytes and back again. The result is sent to the serial monitor at 9600 baud. If the number contained in the unsigned long is less than 2^16, the conversion works; if it is above, it fails. This makes no sense to me, since unsigned long is 32 bits, not 16?!?

What am I doing wrong?

By definition, an 'unsigned long' number has this range:
(a) 0 to 4294967295 (decimal)
(b) 0000 0000 0000 0000 0000 0000 0000 0000 to 1111 1111 1111 1111 1111 1111 1111 1111 (binary) (spaces are for clarity)
(c) 00000000 to FFFFFFFF (in hex format)
(d) 0 × 2^0 to 1 × 2^32 - 1.

2^32 - 1 (4294967295) is less than 2^32 (4294967296).

So, the conversion will always be limited to values up to 4294967295.

jremington:
Thank you. Why not make a more useful contribution and suggest how to do the conversion in a platform independent manner?

Casting it to a character type is fine:

  unsigned long value = 0x30313233;
  unsigned char bytes[sizeof(value)];
  memcpy(bytes, reinterpret_cast<const unsigned char *>(&value), sizeof(value));
  for (unsigned char u : bytes)
    Serial.print(u, HEX), Serial.print(' ');
  unsigned long result;
  memcpy(reinterpret_cast<unsigned char *>(&result), bytes, sizeof(result));

  Serial.println(value == result ? "Success!" : "Fail");

If you're sending the value over a network, things get a little more complicated, because then you have to start worrying about Endianness.
Bit shifting works fine, as demonstrated in other replies.
You can also use the htonl and ntohl functions (host byte order to network byte order of longs, and network to host byte order), especially if you're communicating with a computer.

  // Sending
  uint32_t numberToSendHost = 0x30313233;
  uint32_t numberToSendNetwork = htonl(numberToSendHost);
  static_assert(sizeof(uint32_t) == 4, "Expected uint32_t to be 4 bytes wide");
  Serial.write(reinterpret_cast<unsigned char*>(&numberToSendNetwork), sizeof(uint32_t));

  // Receiving
  unsigned char bytes[sizeof(uint32_t)];
  while (Serial.available() < sizeof(uint32_t)); // Add framing checks
  for (unsigned char &b : bytes)
    b = Serial.read();
  uint32_t numberReceivedNetwork;
  static_assert(sizeof(uint32_t) == 4, "Expected uint32_t to be 4 bytes wide");
  memcpy(&numberReceivedNetwork, bytes, sizeof(uint32_t));
  uint32_t numberReceivedHost = ntohl(numberReceivedNetwork);

  Serial.println(numberReceivedHost == numberToSendHost ? "Success!" : "Fail");

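Arduino doesn't define htonl etc., but you can adapt the glibc implementation:

uint32_t htonl (uint32_t x) {
// __BYTE_ORDER__ and __ORDER_*_ENDIAN__ are predefined by GCC and Clang.
// Plain BYTE_ORDER may be undefined on some cores, and an undefined macro
// evaluates to 0 in #if, which would silently select the first branch.
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  return x;
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32 (x);
#else
#error "What kind of system is this?"
#endif
}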

uint32_t ntohl(uint32_t x) {
  return htonl(x);
}
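For comparison, the bit-shifting route mentioned above needs no byte-order macros at all, because shifts work on values and not on memory layout. The following is a minimal sketch in plain C++; the names `toNetworkBytes` and `fromNetworkBytes` are made up for this example and are not part of any Arduino core.

```cpp
#include <cstdint>

// Write `value` into `out` most significant byte first (network byte order),
// regardless of the host's own endianness.
void toNetworkBytes(uint32_t value, uint8_t (&out)[4]) {
    out[0] = static_cast<uint8_t>(value >> 24);
    out[1] = static_cast<uint8_t>(value >> 16);
    out[2] = static_cast<uint8_t>(value >> 8);
    out[3] = static_cast<uint8_t>(value);
}

// Rebuild the value from four bytes in network byte order. Each byte is
// widened to 32 bits before shifting, so a 16-bit int cannot truncate it.
uint32_t fromNetworkBytes(const uint8_t (&in)[4]) {
    return (static_cast<uint32_t>(in[0]) << 24) |
           (static_cast<uint32_t>(in[1]) << 16) |
           (static_cast<uint32_t>(in[2]) << 8) |
           static_cast<uint32_t>(in[3]);
}
```

For 0x30313233 this produces the bytes 30 31 32 33 on any host, which is the same order htonl followed by a raw write would send.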

jremington:
Do you know of any cases where unions do NOT work for conversion?

No, I don't.

Now I see why many, if not most compilers have implemented "as a non-standard language extension, the ability to read inactive members of a union" and will continue to take advantage of its convenience.

But I will keep the warning in mind.

Just as an additional aside, what is the usefulness of a union at all if I can't use it for type punning?

Mods: Do you think the union discussion should split off to a new thread? I feel bad hijacking the OP.

Delta_G:
Just as an additional aside, what is the usefulness of a union at all if I can't use it for type punning?

The Core Guidelines speak of saving memory and creating tagged unions, as an algebraic sum type.

Saving memory is rarely done, as far as I know, especially now that RAM is getting so cheap, and C++ has std::variant as a sum type, which is much easier and safer than tagged unions.

I guess it's a remnant of C, where using unions for type punning is allowed by the standard.
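The tagged-union use mentioned above could look roughly like this. It is only a sketch with invented names (`Reading`, `Kind`, `asFloat`); the point is that a tag records which member was written last, so only the active member is ever read.

```cpp
#include <cstdint>

// A value that is either a raw count or a voltage, never both at once.
struct Reading {
    enum class Kind : uint8_t { Count, Voltage };
    Kind kind;            // the tag: says which union member is active
    union {
        uint32_t count;   // active when kind == Kind::Count
        float voltage;    // active when kind == Kind::Voltage
    };
};

Reading makeCount(uint32_t c) {
    Reading r;
    r.kind = Reading::Kind::Count;
    r.count = c;
    return r;
}

Reading makeVoltage(float v) {
    Reading r;
    r.kind = Reading::Kind::Voltage;
    r.voltage = v;
    return r;
}

// Reads only the member that the tag says was written last.
float asFloat(const Reading &r) {
    switch (r.kind) {
        case Reading::Kind::Count:   return static_cast<float>(r.count);
        case Reading::Kind::Voltage: return r.voltage;
    }
    return 0.0f;  // not reached for valid tags
}
```

Here the two members share storage to save memory, but nothing is reinterpreted: a count is converted to float with an ordinary cast, not by reading its bytes as a float.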

PieterP,

I got started on the ESP8266 thanks to your excellent tutorial, thank you. Reading that a union should not be used to split variables up into bytes came as a bit of a shock, but as you said it I took notice, having already had the benefit of your knowledge. I learned C on PICs and learning to use a union like that was one of my early lessons. I am a bit surprised it's not supposed to work in C++! :confused: . This does leave me with the question of why it might not work, which I would like you or someone to explain, if you know. I see it like this: All the members of a union share the same physical bytes in the memory they occupy. You write to one member, the union is still in scope, you read a different member. Nothing else has written to any member of the union since the immediately preceding write, the original value you just wrote will still be there, unchanged. Reading individual bytes should return what is expected, I don't see how it could not. There is the issue of endianness, and I have made that mistake myself, but that's my mistake, not a problem with the compiler. I just cannot imagine any mechanism for it not working. Do you know?

PerryBebbington:
This does leave me with the question of why it might not work

It might not work because the C++ standard doesn't say that it should work.

The reason is probably to allow for optimizations.

You're right that they occupy the same memory, but many CPUs don't operate on RAM, they have to load it into registers first, and when they're done, they have to write it back. If you have multiple variables pointing to the same memory, it's hard for the compiler to know when to write back or reload the registers to or from RAM, and going to RAM is of course much slower than just using registers or even pipelining all operations.
That's demonstrated in an answer to the Stack Overflow question "What is the strict aliasing rule?"

But why did C allow aliasing through unions, and C++ chose not to? I don't know.
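For what it's worth, one way to reinterpret the bytes of one type as another without union punning or pointer aliasing is to copy them with memcpy, in the same spirit as the casting example earlier in the thread. A minimal sketch, assuming a 32-bit IEEE-754 float (true on common Arduino targets, but an assumption all the same); `floatBits` is a made-up name.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes float is 32 bits wide");

// Return the bit pattern of `f` as an integer. memcpy copies the object
// representation, so the compiler always sees a defined read of `bits`.
uint32_t floatBits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under the IEEE-754 assumption, `floatBits(1.0f)` is 0x3F800000. Compilers typically turn a small fixed-size memcpy like this into a plain register move, so it should not cost more than the union version.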

PieterP:
I guess it's a remnant of C, where using unions for type punning is allowed by the standard.

extern C here we come.

PieterP:
But why did C allow aliasing through unions, and C++ chose not to? I don't know.

It never did.

K&R First Edition:

C11 standard
Annex J.1 Unspecified behavior

The following are unspecified:
...
— The values of bytes that correspond to union members other than the one last stored into (6.2.6.1).