EEPROM.put() endianness

You seem to believe there is only one "best" choice. There is not. Thousands of processors have been built using "little-endian", and thousands have been built using "big-endian". Both work just fine, and which is "best" is generally a matter of which one a person used first. I have used, and designed, processors using both, and there is NO functional difference. And, in this case, I honestly do not see why on earth you care. Use get and put, and your code will ALWAYS work, regardless of the processor's byte ordering. Not all Arduino processors are the same. Some are big-endian, some are little-endian. The SMART approach is to write your code so that it need not know or CARE what the underlying processor does, and instead focus on more important "problems".
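
To make that concrete, here is a minimal sketch of the round trip I mean (the address and the value are arbitrary; cores that emulate EEPROM, such as the ESP8266/ESP32, additionally need EEPROM.begin()/commit()):

#include <EEPROM.h>

void setup() {
  Serial.begin(115200);

  int16_t saved = -1234;     // arbitrary example value
  EEPROM.put(0, saved);      // the library decides the byte order

  int16_t restored = 0;
  EEPROM.get(0, restored);   // same library, same order: the round trip always matches
  Serial.println(restored);  // prints -1234 regardless of the CPU's endianness
}

void loop() {}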

And quite possibly, at the time the processors were developed, there were sound technical reasons for the little-endian choice.

Could the engineers who made that initial choice have appreciated that they should reverse it because, 45 years later, someone would consider big-endian more appropriate and easier to use?

For every modern little-endian processor you name, I can name an equally modern big-endian processor. IT DOES NOT MATTER! It is a completely arbitrary choice. Some processors will even do both! BOTH architectures have existed since the very beginning of computers, and BOTH still exist today in brand-new designs. NEITHER has any compelling advantage. The two biggest volume processor design companies in the world are Intel and ARM. Intel is little-endian, ARM is big-endian. So what?

I never said that big-endian was "best". From the point of view of a machine it is arbitrary indeed and it does not matter (it may have mattered in the past). But from the point of view of a human, big-endian feels more natural. Even that is just a convention, but a convention that we learned as little children and that we use every day, along with everybody around us. If I want to write down the square of 7 in decimal, I go for 49 (big-endian) instead of 94 (little-endian). I could interpret 94 as the square of 7, but that would require a constant effort and introduce a lot of mistakes as soon as I'm not paying attention and just reading casually.

This is precisely why, without even thinking about it, and before finding out that the Arduino tends to be little-endian, I favoured a big-endian scheme for saving integers to EEPROM. And then came the clash between my bespoke solution and the methods from the EEPROM library. I've said it many times already throughout this thread.

https://www.rfc-editor.org/ien/ien137.txt


Looks like I'll soon be moving to Blefuscu. Thanks for the link, it's an interesting article and a welcome reminder to read or re-read classical literature instead of computer stuff from time to time.

That is simply NOT true! It is true ONLY from the point of view of SOME humans. I, like many others, am perfectly comfortable with EITHER, and I could not care less which one any given processor uses. If you took a poll, you would find there are just as many people who feel big-endian is "more natural" as there are who feel little-endian is "more natural". Don't ascribe your personal bias to the entire species. It is a personal preference, and nothing more.

Are you sure about that? Consider a Teensy 3.2 (based on an ARM Cortex-M4):

void setup() {
  Serial.begin(115200);
  delay(1000);

  // Print the bytes of x in increasing address order;
  // a little-endian CPU shows the least significant byte first.
  uint32_t x = 0x12345678;
  uint8_t *ptr = (uint8_t *) &x;
  for (uint8_t i = 0; i < 4; i++) {
    Serial.print(*(ptr + i), HEX);
    Serial.print(" ");
  }
  Serial.println();
}

void loop() {
}

Output:

78 56 34 12 

Same code run on an Uno:

78 56 34 12 

Really? I don't think there has been an unashamedly big-endian processor since the 68k series. (Not counting the modern RISC processors, which have the memory load/store so divorced from the rest of the instructions that they can be configured either way.) (ARM and MIPS are both configurable at some level, but I think all of the ARM and MIPS microcontrollers I'm aware of have been configured as little-endian in silicon.)

Everyone numbers their bits in little-endian fashion these days, because math: bit 0 has a value of 2**0, etc. So LE for bytes continues in that fashion.

I think LE might have had an advantage for doing multi-precision math on early 8-bit CPUs. To add two n-byte numbers, you have to start with the LSB. In an LE architecture, the LSB is right there where your pointer is pointing. If your instruction set is limited in its ability to do indexed addressing, or math on pointer registers, this is much more convenient than needing your first add to be @(Y+4) + @(X+4).
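
A rough C sketch of that point (purely illustrative; a real 8-bit part would do this with an add-with-carry instruction rather than C):

// Add two 4-byte little-endian numbers held as byte arrays.
// Because the LSB sits at index 0, the carry chain starts exactly
// where the pointers already point, with no offset arithmetic needed.
void add32_le(const uint8_t *a, const uint8_t *b, uint8_t *sum) {
  uint16_t carry = 0;
  for (uint8_t i = 0; i < 4; i++) {
    uint16_t t = (uint16_t)a[i] + b[i] + carry;
    sum[i] = t & 0xFF;
    carry = t >> 8;
  }
}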

TCP/IP networking is all big-endian. I spent too many years making big-endian code run on little-endian chips. (Nowadays, Intel has a bi-endian compiler that does most of the work for you. And if it takes 40ns to fetch a byte from memory, and 0.5ns to swap the bytes around with the BSWAP instruction on a modern x86 CPU, the cost of swapping is essentially noise.)
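
For reference, the portable shift-based swap looks like this (compilers will typically collapse it into a single BSWAP/REV when the target has one):

// Reverse the byte order of a 32-bit value, e.g. to convert between
// big-endian network order and little-endian host order.
// Applying it twice gets you back where you started.
uint32_t swap32(uint32_t v) {
  return ((v & 0x000000FFUL) << 24) |
         ((v & 0x0000FF00UL) <<  8) |
         ((v & 0x00FF0000UL) >>  8) |
         ((v & 0xFF000000UL) >> 24);
}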

ARM processors are actually "bi-endian". They will operate big-endian, little-endian, or both, depending on how the processor logic is configured when the chip is designed. The design package you receive from ARM supports either, or both. So, there is no "one" endian-ness with ARM. The "endian-ness" affects ONLY the bus-interface and address mapping logic of the processor, which are a tiny, tiny part of the overall processor logic. The vast majority of the ARM-based chips I've designed and used over the last 20 years were implemented as big-endian, including the Due, which is the ARM I have used most recently.

I don't think I've ever seen a big endian Cortex-M microcontroller.

The SAM3X8E on the Due is little-endian.

Yes, you're right, the Due is hard-configured for little-endian. I mis-remembered, as I have not used a Due in many years. But the rest still holds - the endian-ness of ARM processors is selectable, and has been for 20+ years, going all the way back to the "Thumb" processors of the '90s. That includes the current Cortex-M series, which, unlike earlier versions, only allows endian-ness to be configured when the processor logic is synthesized during the chip design process. Earlier models allowed endian-ness to be configured at run-time, usually via a configuration pin.

I'm quite sceptical about the usefulness of a poll in such a case. Probably, many of those who vote for little-endian as the more natural system will themselves be biased by their own experience with little-endian architectures, which are indeed more common (standard, even) nowadays. To them, what they use and see most of the time will feel more natural. Also, they may casually read the question and miss the following point...

Me neither. I never argued about processors, by the way. It has always been a matter of picking a way to manually store a 2-byte value to EEPROM [the point], which is itself single-byte based and single-byte addressable, and therefore naturally endianness-agnostic.

Also consider that I went for big-endian before I found out about the little-endianness of the higher-level EEPROM methods. Now that I know, I feel like I lean a little more towards a little-endian approach, if only for the sake of consistency and ease of use. So, oddly enough, I may be more biased today than I was before finding out about the methods, and so will the responders to your poll, unless they all are as genuinely ignorant about the methods as I was when I picked big-endian.

Lastly, what feels more natural and what is more appropriate in a particular case are two very different things, but I'm not sure the responders who cast a quick vote and move on will stop to think about the difference.

But it really should not matter when storing the value to EEPROM, as long as you read back in the same order you store. Storing two bytes (or any number of contiguous bytes) involves a starting address and a length; the actual data contents, and what that data represents in the code, should be irrelevant. The only time I can think it would matter is if you wanted to dump the contents of the EEPROM to a human-readable format, but if you reversed the endianness between EEPROM and memory, a similar dump of memory would not match, which would make things even more confusing.

I agree, and this is the very reason why I favour big-endian in this particular case. I check that the EEPROM is being used correctly by doing raw byte dumps to serial: with big-endian, I have one less thing to worry about. I also don't have to worry about any mismatch between RAM and EEPROM endianness when I only move single bytes around. But an int16_t has 2 bytes, so the question of endianness arises: follow what the RAM does and go little-endian, or favour human readability and go big-endian? For me, the latter makes more sense.
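
Concretely, the big-endian helpers I have in mind are only a few lines. This is just a sketch (the names are arbitrary), built on the core library's EEPROM.update()/read():

#include <EEPROM.h>

// Store an int16_t big-endian: most significant byte at the lower address,
// so a hex dump reads left to right like the number written on paper.
void putInt16BE(int addr, int16_t value) {
  EEPROM.update(addr,     (uint8_t)((uint16_t)value >> 8));  // MSB first
  EEPROM.update(addr + 1, (uint8_t)(value & 0xFF));          // LSB second
}

int16_t getInt16BE(int addr) {
  return (int16_t)(((uint16_t)EEPROM.read(addr) << 8) | EEPROM.read(addr + 1));
}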

@330R
I've been following this rather interesting discussion but not commented until now. I get that you personally have a preference for big-endianness; what I am still unable to understand, despite reading all your posts, is why you think it matters in general or, more specifically, when writing to and reading from an EEPROM.

We in the Greek/Latin-based world write text from left to right. Arabic numerals run right to left (49 starts with a 9 and, due to its size, has an additional digit to its left). That's a weird convention, but you learned it rather early and there is no plan to change it.
It feels very natural to me to store texts with the first character in the first memory location and numbers with the LSB first. However, regarding numbers, where you know the size in memory in advance, I admit that big-endian is a possible way to do it (the wrong way :slight_smile: ).

Little-endianness has got its elegance and is even more coherent: the lowest address takes the byte with the lowest weight. Neat! Big-endianness, on the other hand, has it backwards: lower weight goes to higher address. So, as far as the inner workings of the machine go, I have no real reason to prefer big-endianness: I acknowledge the greater abstract elegance of little-endianness, but I don't actually care that much, as long as things are working.

As soon as I peek inside the machine, however, I see things differently. I tried to explain this in posts #3 and #35. When I do hex dumps of the EEPROM, it's little-endian that seems backward and big-endian that feels more natural, because I dump from address 0x000 up to address 0x3ff, and with big-endian storage I see the heaviest byte first (to the left on the monitor), which is coherent with how I read decimal numbers every day.

P.S. The function I use for dumping emulates the output of GNU's od -t x1, with some minor differences. If I save a string to EEPROM with

EEPROM.put(0, "foobar");

the first line of dump output reads like

0000 66 6F 6F 62 61 72 00 FF FF FF FF FF FF FF FF FF

with the lowest address taking the lowest index of the array: this way I can convert the bytes to ASCII in my head (with a cheat sheet in front of me, of course!) and read the string from left to right, like I do in normal English.
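
(The dump helper itself is nothing fancy. Roughly the sketch below, though my actual function differs in a few details:)

#include <EEPROM.h>

// Dump the whole EEPROM as hex, 16 bytes per line, od -t x1 style:
// the address first, then the bytes in increasing address order.
void dumpEeprom() {
  for (int addr = 0; addr < (int)EEPROM.length(); addr += 16) {
    char prefix[6];
    snprintf(prefix, sizeof(prefix), "%04X ", addr);
    Serial.print(prefix);
    for (int i = 0; i < 16; i++) {
      uint8_t b = EEPROM.read(addr + i);
      if (b < 0x10) Serial.print('0');   // keep two hex digits per byte
      Serial.print(b, HEX);
      Serial.print(' ');
    }
    Serial.println();
  }
}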

But when I do

EEPROM.put(0x40, 0x12345678);

my dump looks like

0040 78 56 34 12 FF FF FF FF FF FF FF FF FF FF FF FF

If a machine could think, it wouldn't understand what all the fuss is about: "I did exactly what you liked the last time, Master: lowest weight to lowest address, exactly like you ordered me to." Yes, but I'm not happy with the result when I look at it. I'm not a machine, and to me, that thing is weird and a potential source of bugs. The problem is: to a machine, the 'f' in "foobar" and the 0x78 in 0x12345678 are the same thing, i.e. the byte that goes to the lowest address. What the bytes represent to me is not the machine's business. But we humans (OK, humans with a Latin alphabet, and I'm one of them) read words and numbers differently: it feels natural to expect the heaviest digits on the left. On the other hand, we have no concept of "heavy digits" when reading words.

So, to me, when the following two conditions are met:

  1. We just want to save multi-byte integers to EEPROM, and
  2. We rely on memory dumps in the form I have shown for debugging

then, big-endianness is better suited for the task because it looks more natural and it is coherent with how we read and write numbers on paper.

Nonetheless, I must admit that this feeling of mine is not as universal as I thought: this discussion has shown it.

I don't get why you think you need to do hex dumps. For context, I was recently trying to write to and read from a Microchip 24xx512 EEPROM. I did do a hex dump from it as I wasn't getting back the data I put in; that is, I was getting back 0xff in every location. It turned out that, because I'd mistyped something, I was not writing to the location I thought I was. For that check, all I needed to know was whether there was data or not: 0xff or something else. As to what the data actually was, I don't care at the byte level. Put it back in RAM where it belongs and check the results are as expected in terms of C variables.

What are you doing that's not covered by that?

I use them as a tool (which I find very useful) to look inside the EEPROM at a low level. If you are curious about my specific case, I'm implementing a wear-levelling algorithm that includes bit-shifting on the values and that moves the values 2 EEPROM cells forward with every save, with wrap-around at the end. I want to observe what is happening inside and whether everything is going according to my plans. Having to wrap my head around little-endianness on top of everything else is something I'd rather avoid, if I can. I also have a function that scans the EEPROM and tells me where the values are and which cells are available for the next writing cycle. For this, too, hex dumps are very useful: if the function sees what my eyes see, then the function's good.
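
Just to give the flavour of the bookkeeping (this is not my actual code, only a placeholder for the "advance two cells, wrap at the end" step; the ring boundaries are made up):

// Hypothetical boundaries of the wear-levelled area.
const int RING_START = 0x010;
const int RING_SIZE  = 0x3F0;   // number of cells in the ring

// Each save lands two cells further on, wrapping back to RING_START.
int nextSlot(int currentSlot) {
  return RING_START + (currentSlot - RING_START + 2) % RING_SIZE;
}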

Your higher-level approach will certainly work, but I work better if I can visualise every cell of memory on my monitor, like a big chessboard, so I can actually see things moving about from one dump to the next. I even went as far as printing a few dumps on paper and drawing circles and writing notes on them with a pencil.
