turn array of HEX into one number

hey there
I have a simple question here:

assume we have an array of HEX, say A=[0x11,0x22,0x33]

how can I turn it to a single integer number as n=112233

any help?

Your question is weird: you have 0x11,0x22,0x33 and want 112233. is that in Hex or decimal?

Assuming you want 0x112233.

When storing data in memory, you need to know whether you have a little-endian or big-endian representation and how many bytes are used. Here, with 3 bytes, it’s unclear what you have… it’s usually 2, 4 or 8 bytes.

In a little-endian representation, the LSB is stored first. 0x112233 does not fit in 2 bytes, so it would be a 4-byte variable (a uint32_t) with 00 in front, i.e. 0x00112233, stored in memory as 0x33, 0x22, 0x11, 0x00.

if it’s a little endian representation and you have the right number of bytes (say 4), then it’s pretty easy it’s just casting what you read from the pointer which is defined by the array name.

if it’s a big endian representation then you need to rearrange the bytes and set them where they belong. this is done by shifting the bits left into position. The first element of the array has to go in the MSB, and the fourth element of the array has to go in the LSB.
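To make the byte order described above visible, here is a minimal sketch. The helper names hostIsLittleEndian and byteInMemory are made up for illustration. It copies a uint32_t into a byte buffer with memcpy and looks at which byte lands at the lowest address. It only uses C headers, so it should build both on a PC and in an Arduino sketch.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not from this thread): returns byte `index` of
// `value` exactly as it is laid out in memory on the current machine.
uint8_t byteInMemory(uint32_t value, size_t index) {
  uint8_t bytes[sizeof(value)];
  memcpy(bytes, &value, sizeof(value));  // copy the in-memory representation
  return bytes[index];
}

// Hypothetical helper: true when the least significant byte is stored
// at the lowest address (little endian).
bool hostIsLittleEndian() {
  return byteInMemory(0x00112233UL, 0) == 0x33;
}
```

On a little-endian board such as the UNO, byteInMemory(0x00112233, 0) should give 0x33; on a big-endian machine it would give 0x00.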

Try this code:

uint8_t anArray[] = {0x33, 0x22, 0x11, 0x00}; 

void setup() {
  Serial.begin(115200);

  // LITTLE ENDIAN REPRESENTATION --> 0x112233
  uint32_t v1 = *((uint32_t *) anArray);
  Serial.print(F("LITTLE ENDIAN: Your value is 0x"));
  Serial.println(v1, HEX);

  // BIG ENDIAN REPRESENTATION --> 0x33221100
  uint32_t v2 = ((uint32_t) anArray[0]) << 24; // put the MSB in place
  v2 |= ((uint32_t) anArray[1]) << 16; // add next byte
  v2 |= ((uint32_t) anArray[2]) << 8; // add next byte
  v2 |= ((uint32_t) anArray[3]); // add LSB
  Serial.print(F("AS BIG ENDIAN: Your value is 0x"));
  Serial.println(v2, HEX);
}

void loop() {}

Serial Monitor (@ 115200 bauds) will show

LITTLE ENDIAN: Your value is 0x112233
AS BIG ENDIAN: Your value is 0x33221100

#include <stdio.h>
#include <stdint.h>

int
main ()
{
    uint8_t   a [] = { 0x11, 0x22, 0x33, 0 };
    uint32_t *b    = (uint32_t*) a;

    printf (" %x  %d\n", *(uint32_t*) a, *(uint32_t*) a);
    printf (" %x  %d\n", *b, *b);
}

output

 332211  3351057
 332211  3351057

Guys your help and explanation and codes are amazing and I really appreciate it! thanksssss as alwaysssssss!!!!!!!

J-M-L:

uint8_t anArray[] = {0x33, 0x22, 0x11, 0x00}; 

uint32_t v1 = *((uint32_t *) anArray);

gcjr:

uint8_t   a [] = { 0x11, 0x22, 0x33, 0 };

uint32_t *b    = (uint32_t*) a;

This invokes undefined behavior. The variables anArray and a do not point to variables of type uint32_t, so you cannot convert it to a pointer to uint32_t.

More formally, from https://en.cppreference.com/w/cpp/language/reinterpret_cast:

reinterpret_cast
Any object pointer type T1* can be converted to another object pointer type cv T2*. This is exactly equivalent to static_cast<cv T2*>(static_cast<cv void*>(expression)) (which implies that if T2's alignment requirement is not stricter than T1's, the value of the pointer does not change and conversion of the resulting pointer back to its original type yields the original value). In any case, the resulting pointer may only be dereferenced safely if allowed by the type aliasing rules (see below).
[…]
Type aliasing
Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  • AliasedType and DynamicType are similar.
  • AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
  • AliasedType is std::byte, char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

uint8_t and uint32_t are not similar types, so dereferencing the result of the reinterpret_cast is not allowed.

It might work just fine for a simple isolated example like this, but in a larger code base where you might move some of these casts to separate functions, you will get very hard-to-debug errors that occur only on some platforms with some optimization settings, depending on the optimization opportunities surrounding the problematic casts. For example, adding or removing print statements could entirely change results calculated from reinterpret_cast'ed data.
This is not a theoretical problem, I’m speaking from experience here.


If you are absolutely certain that you want the bit representation of one variable to be copied exactly to a second variable, the only generally valid way to do that is by using memcpy:

uint8_t anArray[] = {0x33, 0x22, 0x11, 0x00}; 
uint32_t v1;
static_assert(sizeof(anArray) == sizeof(v1), "Error: different sizes");
memcpy(&v1, anArray, sizeof(v1));

In C++20 you can use std::bit_cast to do the same thing more elegantly, but it’s just syntactic sugar around memcpy with some compile-time checks.
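A rough sketch of what that could look like follows. The FourBytes wrapper and the toUint32 name are made up for illustration. The code uses std::bit_cast only when the standard library reports it and falls back to memcpy otherwise, so it should still compile with older toolchains:

```cpp
#include <stdint.h>
#include <string.h>

#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>  // defines __cpp_lib_bit_cast when std::bit_cast exists
#  endif
#endif

// Hypothetical wrapper so the four bytes can be passed around as one object.
struct FourBytes {
  uint8_t b[4];
};

// Reinterprets the four bytes as a uint32_t in the host's byte order.
uint32_t toUint32(const FourBytes &src) {
  static_assert(sizeof(FourBytes) == sizeof(uint32_t), "Error: different sizes");
#if defined(__cpp_lib_bit_cast)
  return std::bit_cast<uint32_t>(src);  // C++20 path
#else
  uint32_t v;
  memcpy(&v, &src, sizeof(v));          // pre-C++20 path, same result
  return v;
#endif
}
```

Either branch gives the host-order value, so with {0x33, 0x22, 0x11, 0x00} a little-endian board would return 0x00112233.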

That being said, for integers the most portable way that handles the Endianness of the platform correctly (and in my opinion the best approach) has already been posted by J-M-L, using bit shifts:

uint32_t v2 = uint32_t(anArray[0]);       // put the LSB in place
v2         |= uint32_t(anArray[1]) << 8;  // add next byte
v2         |= uint32_t(anArray[2]) << 16; // add next byte
v2         |= uint32_t(anArray[3]) << 24; // add MSB
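The same idea can be wrapped in a loop so that it also covers OP's 3-byte array. This is only a sketch, and the function names bytesToUintBE and bytesToUintLE are invented here. At most four bytes are consumed because the result is a uint32_t.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: the first array element is the MOST significant byte.
uint32_t bytesToUintBE(const uint8_t *bytes, size_t count) {
  uint32_t value = 0;
  for (size_t i = 0; i < count && i < sizeof(value); ++i) {
    value = (value << 8) | bytes[i];  // make room, then append the next byte
  }
  return value;
}

// Hypothetical helper: the first array element is the LEAST significant byte.
uint32_t bytesToUintLE(const uint8_t *bytes, size_t count) {
  uint32_t value = 0;
  for (size_t i = 0; i < count && i < sizeof(value); ++i) {
    value |= uint32_t(bytes[i]) << (8 * i);  // byte i goes 8*i bits up
  }
  return value;
}
```

With A = {0x11, 0x22, 0x33}, bytesToUintBE(A, 3) gives 0x112233 (what OP asked for) and bytesToUintLE(A, 3) gives 0x332211, regardless of the endianness of the board.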

Pieter

PieterP:
This invokes undefined behavior. The variables anArray and a do not point to variables of type uint32_t, so you cannot convert it to a pointer to uint32_t.

yes you can. i demonstrated that the code compiles and runs as expected. that's the purpose of the cast.

yes C provides the rope to hang yourself. you need to understand what you're doing

@PieterP

I respectfully disagree, it’s definitely not undefined behavior.

reinterpret_cast converts any pointer type to any other pointer type, even of unrelated classes. The operation result is a simple binary copy of the value from one pointer to the other. All pointer conversions are allowed: neither the content pointed nor the pointer type itself is checked.

C and C++ defines an array as a contiguous collection of homogeneous elements that can be accessed using an index. Contiguous => the elements of the array are adjacent to one another in memory with no gaps between them

C/C++ also allow for the name of the array to be seen as a pointer to the first byte of such memory location.

Nothing prevents you in C or C++ from deciding what the memory holds; the language is built for this and gives you full access to the bare metal.

So if I want to say that my pointer now is actually pointing to uint32_t, it’s my right and if I dereference that pointer and read the data then I get what I told the compiler to do. Because the spec guarantees all my bytes were in contiguous memory location, that will always work.

I don’t know why I would “get very hard to debug errors that occur only on some platforms with some optimization settings” as long as you have a compiler that follows the norm.

Might you be confusing this with accessing data within a union?

Could you share an example of such a bogus byte array conversion to a larger integral type?

gcjr:
yes you can. i demonstrated that the code compiles and runs as expected. that’s the purpose of the cast.

It might work in this simple example with your specific compiler and settings. That’s the point of undefined behavior, it might well work in some cases, but it might just as well break in others.

You’re not programming in assembly, but in a higher-level language. C and C++ both have official standards that define the behavior of the “abstract machine”, and compilers use this standard to make certain assumptions about your code, assumptions that allow for optimizations. If you don’t follow the standard, these assumptions no longer hold, and the optimizer might spit out garbage. I don’t think many people fully realize and appreciate how aggressively the optimizers transform your code these days, this is only possible because of well-defined assumptions.

With conversions like the one you posted, you often get lucky when the compiler just reinterprets a value in memory or in a register, but once you start getting reordering because of optimizations, it no longer works.

It is hard to reproduce such behavior using a trivial example because it’s too simple to allow for the complex optimizations that might break it.

gcjr:
yes C provides the rope to hang yourself. you need to understand what you’re doing

Yes, and invoking undefined behavior by dereferencing pointers of the wrong type definitely counts as hanging yourself.

Arduino uses C++, not C, and the general idea of “modern” C++ is to minimize ways to shoot yourself in the foot.
It’s of course still possible to write dangerous code to show off how much you know about memory layout etc. but the truth is that it is in violation of the C++ standard and doesn’t have any benefits over standard-compliant code whatsoever.

In this case, the standard-compliant way to interpret an array of bytes as a 4-byte integer is to use either memcpy or std::bit_cast.
reinterpret_cast may work in 90% of the cases, but it’s wrong and offers no benefits over the alternatives that work in 100% of cases, so why would you use it or teach it to others?

J-M-L:
C and C++ defines an array as a contiguous collection of homogeneous elements that can be accessed using an index. Contiguous => the elements of the array are adjacent to one another in memory with no gaps between them

C/C++ also allow for the name of the array to be seen as a pointer to the first byte of such memory location.

I agree with that.

J-M-L:
Nothing prevents you in C or C++ from deciding what the memory holds; the language is built for this and gives you full access to the bare metal.

But that’s not true. The C++ standard is written in terms of values and objects with storage and lifetime, not bits in memory. You can in general only read a value if that value is within its lifetime.

In your example, there is no uint32_t object at the location that the pointer anArray points to, so you’re not allowed to read it.

Luckily, C++ does allow you to inspect the bit pattern in memory for a given object, but only through pointers of type char, unsigned char and std::byte. Similarly, memcpy can copy the bit pattern of one variable to another.

From the memcpy documentation:

Where strict aliasing prohibits examining the same memory as values of two different types, std::memcpy may be used to convert the values.

That’s why the following is allowed:

uint32_t v1; // lifetime of the object of type uint32_t starts here
memcpy(&v1, anArray, sizeof(v1)); // copy the bit pattern

You can now read the value of v1, because it’s an object that actually exists, its lifetime started at the first line of that snippet.

On the other hand *((uint32_t *)anArray) is not an existing object, the only object that exists at the location of anArray is an array of four uint8_t objects. There is no uint32_t object within its lifetime.

This distinction between objects on the one hand and representation in memory is very important: it allows compilers to optimize code.
Reasoning about objects that are created and destroyed with well-defined lifetimes makes this much, much easier than having to deal with one huge, eternal block of memory.

In assembly, you’re programming an actual machine with actual memory, so you can interpret any memory location however you like as long as you satisfy alignment etc.
In C++, you’re programming an abstract machine that’s defined in terms of objects and lifetime, so if you interpret memory as an object that doesn’t exist to the compiler, the optimizer makes a mess.

So if I want to say that my pointer now is actually pointing to uint32_t, it’s my right and if I dereference that pointer and read the data then I get what I told the compiler to do. Because the spec guarantees all my bytes were in contiguous memory location, that will always work.

No, it is not allowed to dereference such a pointer, this is explained on the page I linked to in my previous reply.

If you want to access the memory, you are allowed to use memcpy or character/byte type pointers, but you cannot cast it to any other types and dereference the result afterwards. It is simply not allowed for the reasons I explained in the previous paragraph.

reinterpret_cast converts any pointer type to any other pointer type, even of unrelated classes. The operation result is a simple binary copy of the value from one pointer to the other. All pointer conversions are allowed: neither the content pointed nor the pointer type itself is checked.

I think you might be confusing performing the actual cast, and dereferencing the result.

Doing the cast itself is fine, dereferencing the result is not.

uint8_t anArray[] = {0x33, 0x22, 0x11, 0x00};
uint32_t *p1 = reinterpret_cast<uint32_t *>(anArray); // valid
Serial.println(*p1);                                  // illegal
uint8_t* p2 = reinterpret_cast<uint8_t *>(p1);        // valid
Serial.println(*p2);                                  // valid

Basically the only thing you’re allowed to do with p1 is cast it back to the original type of anArray, to a type “similar” to the type of anArray, or cast it to a character type for inspection of the bit pattern.

You cannot dereference p1.

@OP

1. We may view your proposition with the help of the following self-explanatory diagram (Fig-1) known as "Memory Mapping" of variables.


Figure-1:

2. We can use the following sketch to transform your 3-byte array of hex numbers into a single variable y which will hold 0x112233. The sketch has been designed based on data manipulation mechanism of Fig-1. (It is tested in Arduino UNO.)

byte A[] = {0x11, 0x22, 0x33};
long int y = 0x00000000;       //to hold 0x112233
void setup() 
{
  Serial.begin(9600);
  //------------------------------------------------------
  long int y2 = (long)A[0]<<16;  //we get: y2 = 0x00110000; shift A[0] around a 32-bit buffer and then put in y2.
  long int y1 = (long)A[1]<<8;  //we get: y1 = 0x00002200
  long int y0 = (long)A[2]<<0;    //we get: y0 = 0x00000033 
  y = y2 + y1 + y0;    //we get: y = 0x00112233
  //------------------------------------------------------
  Serial.print(y, HEX);    //Serial Monitor shows: 112233

}

void loop() 
{
  
}

Feel free to ask for any clarification.

PieterP:
It might work in this simple example with your specific compiler and settings. That's the point of undefined behavior, it might well work in some cases, but it might just as well break in others.

You're not programming in assembly, but in a higher-level language. C and C++ both have official standards that define the behavior of the “abstract machine”, and compilers use this standard to make certain assumptions about your code, assumptions that allow for optimizations.

Arduino uses C++, not C, and the general idea of “modern” C++ is to minimize ways to shoot yourself in the foot.

esoteric

C++ is compatible with C, which means the way C manages data applies to C++ when using C constructs. I understand that a String is abstract; a char array is not. And C is about as close to assembly as you can get.

doesn't the ability to increment a pointer imply and rely on a non-abstract way of managing storage? C is a system language, which makes it suitable for directly accessing machine registers with various implications. That's why it was suitable for writing an operating system with real-time requirements (predictable within limits).

of course what I did assumes an endianness. it's not intended to be used in a library that could end up on some other computer architecture.

within the context of the Arduino, staying within C constructs, i don't see that it's a bad practice for this type of programming.

Can you use std::bit_cast on an Arduino Uno? (It's C++20, isn’t it?)

That being said, I agree that casting to (uint32_t *) is violation of the Strict Aliasing Rule and that the standard blessed method for type punning in both C and C++ is memcpy.

This is heavy and a leap of faith that our compiler/optimizer will recognize the use of memcpy for type punning and optimize it away to end up doing what the cast is doing in our very simple example with native types in a basic array.

I would disagree with

the only object that exists at the location of anArray is an array of four uint8_t objects. There is no uint32_t object within its lifetime.

since anArray in my expression is the automatically decayed version (no longer an uint8_t[4] but it becomes a uint8_t* )

to OP: go for this

uint32_t v1; // lifetime of the object of type uint32_t starts here
memcpy(&v1, anArray, sizeof(v1)); // copy the bit pattern

and you are indeed on the safe side of the spec and of the strict aliasing rule.

Also, the compile optimization certainly takes into account byte boundaries for shifts. So it “knows” that a ‘<<24’ in this context does not require the shifting of an entire 32 bit variable, only a load, logical or, and store of one byte from and to discrete memory locations. Thus this form is not as inefficient as it appears, because it will be implemented as four consecutive byte operations. It might be that memcpy() is more straightforward, hence slightly faster since the logical “or” isn’t required.

uint32_t v2 = uint32_t(anArray[0]);       // put the LSB in place
v2         |= uint32_t(anArray[1]) << 8;  // add next byte
v2         |= uint32_t(anArray[2]) << 16; // add next byte
v2         |= uint32_t(anArray[3]) << 24; // add MSB

maybe it would be less of an issue if a uint32_t were allocated and cast as a byte array

gcjr:
C++ is compatible with C, which means the way C manages data applies to C++ when using C constructs. I understand that a String is abstract; a char array is not. And C is about as close to assembly as you can get.

That’s simply not the case. C++ offers some level of backwards compatibility with C, but they are different languages with different rules. For example, C allows type punning using unions, C++ does not. Arduino compiles your sketches using a C++ compiler, so the only relevant standard in that case is the C++ standard. But I won’t dwell on that.

gcjr:
doesn’t the ability to increment a pointer imply and rely on a non-abstract way of managing storage? C is a system language, which makes it suitable for directly accessing machine registers with various implications. That’s why it was suitable for writing an operating system with real-time requirements (predictable within limits).

Incrementing pointers works because the standard defines how that works. Of course, the standard wasn’t made in some abstract, perfect world, its principles make sense on real-world hardware. But that doesn’t mean that you can just write C like assembly and assume that everything works exactly how you’d expect, you have to abide by the rules of the standard if you want the compiler to produce any meaningful output.

The point here is not what’s intuitive to someone who understands the underlying architecture, it’s about what’s allowed by the language standard.
Dereferencing pointers of the wrong type is simply not allowed in C or C++, so all discussions should either end there, or should be addressed to the standard committees.

J-M-L:
Can you use std::bit_cast on an Arduino Uno? (It's C++20, isn’t it?)

Unfortunately not, the Arduino AVR Core uses GCC 7.3 (IIRC), which supports most of C++17, so it’ll take some time before they move to a C++20-capable compiler, and even more time before they move from C++11 to C++20 in their platform.txt files.
But there’s nothing special about bit_cast, it’s just syntactic sugar around memcpy, as you can see here: std::bit_cast - cppreference.com
It even explicitly states:

reinterpret_cast (or equivalent explicit cast) between pointer or reference types shall not be used to reinterpret object representation in most cases because of the type aliasing rule.
Before std::bit_cast, std::memcpy can be used when it is needed to interpret the object representation as one of another type

J-M-L:
This is heavy and a leap of faith that our compiler/optimizer will recognize the use of memcpy for type punning and optimize it away to end up doing what the cast is doing in our very simple example with native types in a basic array.

Luckily, the compiler developers are aware of this intended use of memcpy, if it’s used for type punning, any reasonably modern compiler will never actually call the C library memcpy function.

In fact, you can verify that all three implementations (memcpy, shifts and reinterpret_cast) generate exactly the same code for OP’s use case: Compiler Explorer

J-M-L:
I would disagree with […] since anArray in my expression is the automatically decayed version (no longer an uint8_t[4] but it becomes a uint8_t*)

A variable of array type can implicitly be converted to a pointer in an expression. The type of the variable itself is still uint8_t[4]; it doesn’t depend on how it’s used in an expression, and decay to a pointer doesn’t change the underlying type.
That’s what allows you to still use the decayed pointer to access other elements beyond the first.
But let’s not get too pedantic about that :)

gcjr:
maybe it would be less of an issue if a uint32_t were allocated and cast as a byte array

Indeed. That’s because casting to a pointer to bytes (or array of bytes) is explicitly allowed by the standard. It’s an exception to the strict aliasing rule: char, unsigned char and std::byte are allowed to alias other types.
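For illustration, a small sketch of that allowed direction. The function name objectBytes is invented here. It walks over the bytes of a uint32_t through a pointer to unsigned char, which the aliasing rules permit:

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: copies the bytes of `value`, in memory order, to `out`.
void objectBytes(const uint32_t &value, uint8_t (&out)[sizeof(uint32_t)]) {
  // Allowed: unsigned char may alias any object type.
  const unsigned char *p = reinterpret_cast<const unsigned char *>(&value);
  for (size_t i = 0; i < sizeof(value); ++i) {
    out[i] = p[i];  // read one byte of the object representation
  }
}
```

For 0x00112233 on a little-endian board, out would hold 0x33, 0x22, 0x11, 0x00. The order is still whatever the platform uses, so this does not remove the endianness question.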

aarg:
Also, the compile optimization certainly takes into account byte boundaries for shifts. So it “knows” that a ‘<<24’ in this context does not require the shifting of an entire 32 bit variable, only a load, logical or, and store of one byte from and to discrete memory locations. Thus this form is not as inefficient as it appears, because it will be implemented as four consecutive byte operations. It might be that memcpy() is more straightforward, hence slightly faster since the logical “or” isn’t required.

At the godbolt link above, you can see that both the shift version and memcpy generate exactly the same code at -O2, -Os and -O3.
What surprised me is that the shift version doesn’t really get optimized at -O1, whereas memcpy is optimized just as well as on the higher levels.

decay to a pointer doesn't change the underlying type

the sheer fact that they allow for reading/writing byte by byte but not 2 or 4 bytes in one go shows that it's a very weak concept. The underlying data are just bytes in memory at the end of the day

Dereferencing pointers of the wrong type is simply not allowed in C or C++, so all discussions should either end there, or should be addressed to the standard committees.

Nothing says it's not allowed. It's possible, it's just UB - which does not mean wrong behavior. it will work until it doesn't.
I can see why it helps the compiler in more complicated situations with complex objects, but I'm pretty sure that 10 years from now the example above (simple bytes in an array) will still work.

memcpy() would not work either if the data was not properly aligned to be read that way, byte by byte (or bus wide).

Anyway - yes there is the norm and then there is what the compiler does.

I'm OK to take chances when it all makes sense; I agree it's better to recommend to newbies the right way of doing things. So that's your point.

J-M-L:
the sheer fact that they allow for reading/writing byte by byte but not 2 or 4 bytes in one go shows that it's a very weak concept. The underlying data are just bytes in memory at the end of the day

Yes, the underlying data are just bytes in memory, but you can't really optimize reads/writes to memory. Optimizing meaningful operations on values/objects with a clear lifetime is much easier.
Being able to access the underlying bytes is of course very useful, which is why this is explicitly allowed in C and C++, but that doesn't mean that you can just read or write to any memory location using other types.

(As a side note, there are situations where using types like char * or uint8_t * ends up being slower because the compiler assumes that it aliases with other variables. Similarly, two pointers of the same type could also alias the same variables. The C language has the restrict keyword to assure the compiler that a pointer doesn't alias anything else, which again allows for optimization.)

J-M-L:
Nothing says it's not allowed. It's possible, it's just UB - which does not mean wrong behavior. it will work until it doesn't.

https://en.cppreference.com/w/cpp/language/ub

undefined behavior

Renders the entire program meaningless if certain rules of the language are violated.

There are no restrictions on the behavior of the program. Examples of undefined behavior are memory accesses outside of array bounds, signed integer overflow, null pointer dereference, [...] access to an object through a pointer of a different type, etc. Compilers are not required to diagnose undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.

Let's say “it's not allowed if you want your program to do anything useful”.
While there is a chance that the behavior won't be wrong, the code is most definitely wrong :)

J-M-L:
I can see why it helps the compiler in more complicated situations with complex objects, but I'm pretty sure that 10 years from now the example above (simple bytes in an array) will still work.

Yes, it will most likely still work for simple integers like this. However, the question is at which point does it stop working? Will reinterpreting floats as integers work? Probably. If arrays work, do structures work as well? Apparently not, at least not always.

I've personally had a lot of problems in a code base that used two different but identical structs for 2-vectors:

struct Point {
  float x, y;
};
struct Vec2f {
  float x, y;
};

Some parts of the code used Point, others used Vec2f, so pointers to Point were just reinterpret_cast'ed to pointers to Vec2f all over the place. That worked just fine until one day it didn't. The most likely cause was a change to the logging system: some values were logged in between the reinterpret_casts. After the change, some of the logger calls were inlined and optimized, and suddenly all Point and Vec2f objects in these functions contained complete garbage values. It was an absolute nightmare to debug.
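As a sketch of how such a code base could avoid the pointer casts (this is not what the original project did, and the function names toVec2f and toVec2fBitwise are made up), the conversion can go through a real Vec2f object, either member by member or with memcpy:

```cpp
#include <string.h>

struct Point {
  float x, y;
};
struct Vec2f {
  float x, y;
};

// Hypothetical helper: member-wise conversion, creates a real Vec2f object.
inline Vec2f toVec2f(const Point &p) {
  Vec2f v;
  v.x = p.x;
  v.y = p.y;
  return v;
}

// Hypothetical helper: same result by copying the bit pattern with memcpy.
inline Vec2f toVec2fBitwise(const Point &p) {
  static_assert(sizeof(Point) == sizeof(Vec2f), "Error: different sizes");
  Vec2f v;                    // lifetime of the Vec2f object starts here
  memcpy(&v, &p, sizeof(v));  // copy the bit pattern
  return v;
}
```

Both versions hand back a Vec2f by value, so no Vec2f pointer ever refers to memory that actually holds a Point.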

I think you're right that it will continue working for simple things like integers, but that's most likely because the compiler developers know that it would break stuff if they changed it.

J-M-L:
I'm OK to take chances when it all makes sense; I agree it's better to recommend to newbies the right way of doing things. So that's your point.

Exactly. I know that it sounds pedantic, and that many people just keep doing it the way they've done it for decades, but I don't think it should be posted as an answer on a forum for beginners who are just learning to program.
Small things like this can cause a lot of grief in the long run, and it doesn't cost anything to learn doing it the right way from the start.

I’m ok with formal strong typing

Struct allows for grouping multiple types and padding / gaps and is the building block of classes - which is not the case of an array. So I would not take the same chance with a struct.

I would not be surprised that the committee had long arguments to come to this weak middle of the road definition with byte access because you just can’t do without it in that language.

you can't really optimize reads/writes to memory

yes you can. Memory buses are getting wider; 128 bits is getting more common, even in consumer devices (my iPad, for example). Reading consecutive bytes one by one on such an architecture would be stupid (luckily not what the compiler does). Software-based prefetching can be useful in loops as well, for example.