Help with bytes, ints, longs

Hello all,

I consider myself a pretty solid programmer but I've hit a brick wall on a simple problem. Don't know if there's something major I'm missing, possibly to do with endianness or memory allocation.

My understanding of C's representation of whole numbers is that it's binary with the least significant bit on the right, and the most significant bit on the left. For signed values, the most significant bit being set indicates a negative number, and the remaining bits of the number are set according to two's complement.

I would therefore expect positive bytes, ints and longs to behave pretty much the same, and negative numbers would differ only in the selection of where the most significant bit is.

In particular I would expect the number 324 stored in a 4-byte long value to have exactly the same lower bytes as the number 324 stored in a 2-byte int value - the bytes being 1 and 68. The long value would just have two extra zero bytes on the left.

I ran into an issue when writing code based on this assumption and I've summarised the problem into a minimal sketch below. Can anyone explain to me why, when this sketch is run, the long value and the int value aren't being reported as 324? How do I have to marshal the bytes into a long or int in order to get the value 324 reported? I may also be failing to understand sprintf.

What I actually see reported is...

Long, Int, Bytes: 1140916224, 17409, 0 0 1 68

//I would expect having the rightmost bytes {1,68} would 
//map to an integer value of 324 (e.g. 256 + 68) for both 
//int and long. The long starts at byteArray[0] and the int starts at byteArray[2]
byte byteArray[] = {0,0,1,68};
static char stringBuffer[64]; //the formatted line below can exceed 50 characters once a 10-digit long is printed

void setup(){
  Serial.begin(9600);
}

void loop(){
  delay(1000);
  long *longPointer = (long*)byteArray;
  int *intPointer = (int*)(byteArray+2);
  sprintf (stringBuffer, "Long, Int, Bytes: %ld, %i, %i %i %i %i --- ;) \r\n", *longPointer, *intPointer, byteArray[0], byteArray[1], byteArray[2], byteArray[3]);
  Serial.print(stringBuffer);
  Serial.println();
}

Sounds like an issue of big-endian vs. little-endian.

You are assuming that the way the bits are represented conceptually as a value is identical to the way the bytes reside in memory. That's not the case. Different processor architectures and compilers use different schemes to represent numeric values in memory.

In your case 1140916224 corresponds to 0x44010000, 17409 corresponds to 0x4401 and 0, 0, 1, 68 correspond to 0x00, 0x00, 0x01, 0x44. So you can see that the least significant byte is stored at the lower memory address. This is referred to as 'little-endian' storage. (The scheme which you assumed is known as 'big-endian'. Some processors use that scheme, but not your Arduino.)

(Edited to correct 'big-endian' to 'little-endian' and vice versa.)
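One way to see the storage order for yourself is to copy a value's bytes out in address order. The sketch below is plain C++ that should also compile on a desktop, not a full Arduino sketch, and the helper names copyBytesInMemoryOrder and hostIsLittleEndian are made up for illustration.

```cpp
#include <stdint.h>
#include <string.h>

// Hypothetical helper: copies the four bytes of `value` into `out`
// in the order they sit in memory (lowest address first).
void copyBytesInMemoryOrder(uint32_t value, uint8_t out[4]) {
  memcpy(out, &value, sizeof value);
}

// Hypothetical helper: true when the least significant byte is stored
// at the lowest address, as it is on AVR.
bool hostIsLittleEndian() {
  uint8_t bytes[4];
  copyBytesInMemoryOrder(1u, bytes);
  return bytes[0] == 1;
}
```

On a little-endian machine, copyBytesInMemoryOrder(324, bytes) should fill bytes with 68, 1, 0, 0; a big-endian machine would give 0, 0, 1, 68 instead.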

Yup, AVRs are little-endian, which means the following array:

{0,0,1,68}

when converted to a long becomes:

68*256^3 + 1*256^2 + 0*256^1 + 0*256^0 = 1140916224

Little-endian means that the least significant byte is placed at the lowest memory address (leftmost in the array initialiser).
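The same sum can be written as a loop, on the assumption that byte i carries a weight of 256^i (a left shift by 8*i bits). The name littleEndianToU32 is hypothetical, and the sketch uses unsigned fixed-width types so the shifts stay well defined.

```cpp
#include <stddef.h>
#include <stdint.h>

// Interprets up to four bytes as a little-endian unsigned value:
// bytes[0] is the least significant byte. Extra bytes are ignored.
uint32_t littleEndianToU32(const uint8_t *bytes, size_t count) {
  uint32_t value = 0;
  for (size_t i = 0; i < count && i < 4; ++i) {
    value |= (uint32_t)bytes[i] << (8 * i);
  }
  return value;
}
```

Fed {0,0,1,68} with a count of 4 this gives 1140916224, fed the last two of those bytes with a count of 2 it gives 17409, and fed {68,1,0,0} it gives 324.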

Thanks, everyone

I can confirm that I've finally been able to serve up my magic value of 324, using the code inlined below, which simply reorders the bytes to {68,1,0,0} and assumes all values use the leftmost bytes first. I'm sure there will be more interesting complexities as I'm trying to communicate Python 'longs' using "7-bit bytes" over serial from Python, which is the reason for all these bitwise shenanigans in the first place, but I have a start.

I suspect if I'd fully reversed the transform from the desktop code (which uses left and right shifting on whole-numbered values to derive individual bytes), then I would be in the clear as this would have seamlessly hidden the architectural differences which I had misunderstood.

//I would expect having the leftmost bytes {68,1} would 
//map to an integer value of 324 (e.g. 68 + 256) for both 
//int and long. Both the long and the int should start at byteArray[0]
byte byteArray[] = {68,1,0,0};
static char stringBuffer[50];

void setup(){
  Serial.begin(9600);
}

void loop(){
  delay(1000);
  long *longPointer = (long*)byteArray;
  int *intPointer = (int*)byteArray;
  sprintf (stringBuffer, "Long, Int, Bytes: %ld, %i, %i %i %i %i --- ;) \r\n", *longPointer, *intPointer, byteArray[0], byteArray[1], byteArray[2], byteArray[3]);
  Serial.print(stringBuffer);
  Serial.println();
}
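The shift-based approach mentioned above might look like the following on the receiving side. This is only a sketch: it assumes the sender transmits four bytes, most significant first, and it leaves out the 7-bit packing entirely because that scheme is not described here. The names encodeBigEndian32 and decodeBigEndian32 are hypothetical.

```cpp
#include <stdint.h>

// Splits a value into four bytes, most significant first,
// mirroring what shift-based desktop code would send.
void encodeBigEndian32(uint32_t value, uint8_t out[4]) {
  out[0] = (uint8_t)(value >> 24);
  out[1] = (uint8_t)(value >> 16);
  out[2] = (uint8_t)(value >> 8);
  out[3] = (uint8_t)value;
}

// Rebuilds the value with shifts, so the result does not depend on
// how the receiving processor lays out a long in memory.
uint32_t decodeBigEndian32(const uint8_t in[4]) {
  return ((uint32_t)in[0] << 24) |
         ((uint32_t)in[1] << 16) |
         ((uint32_t)in[2] << 8) |
         (uint32_t)in[3];
}
```

With this, the original array {0,0,1,68} decodes to 324 on any processor, because no byte is ever read through a long or int pointer.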

The code you wrote can equally be written in another way, using a union:

//Make a new variable type which contains an int, a long, and a 4 byte array all sharing the same memory space:
typedef union {
  byte array[4];
  long longInteger;
  int integer;
} ArrayToInt;

ArrayToInt byteArray = {68,1,0,0}; //the first object in the union declaration is a 4 byte array, so the type can be initialised as an array.
static char stringBuffer[50];

void setup(){
  Serial.begin(9600);
}

void loop(){
  delay(1000);
  sprintf (stringBuffer, "Long, Int, Bytes: %ld, %i, %i %i %i %i --- ;) \r\n", byteArray.longInteger, byteArray.integer, byteArray.array[0], byteArray.array[1], byteArray.array[2], byteArray.array[3]);
  Serial.print(stringBuffer);
  Serial.println();
}
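Another option, not used in this thread, is to copy the bytes into the target variable with memcpy, which avoids both the pointer casts and the union. The sketch below assumes fixed-width types (on AVR, int is 16 bits and long is 32 bits, so int16_t and int32_t match them), and the function names are hypothetical. The result still depends on the processor's byte order, exactly as with the union.

```cpp
#include <stdint.h>
#include <string.h>

// Hypothetical helper: reinterprets four bytes as a 32-bit value
// using the processor's own byte order.
int32_t bytesToInt32(const uint8_t bytes[4]) {
  int32_t value;
  memcpy(&value, bytes, sizeof value);
  return value;
}

// Hypothetical helper: the same for a 16-bit value.
int16_t bytesToInt16(const uint8_t bytes[2]) {
  int16_t value;
  memcpy(&value, bytes, sizeof value);
  return value;
}
```

On a little-endian processor such as the AVR, passing {68,1,0,0} to either function should give 324.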