Difference between char and uint8_t

As @stitech mentioned, it's the compiler that treats char as either signed or unsigned by default.

But an ISA can dictate whether signed or unsigned char is "easier" or more "natural", and C compilers tend to choose the easier route for size and speed. The RISC-V ISA, for example, sign-extends almost everything, so it's more natural to treat char as signed. I'm not as familiar with Arm (the ISA used by the Due), but my guess is that it zero-extends almost everything, which could explain why GCC treats char as unsigned by default on Arm.

I glanced at the datasheet for the processor of the Due. The reasoning doesn't change. Please tell me where the following is wrong.

A 32-bit memory word may contain a 32-bit number, signed or unsigned; it can be part of a larger number (64 or 128 bit); it can contain a 16-bit or an 8-bit number (disadvantage: inefficient use of memory space); or it can contain two 16-bit numbers or four 8-bit numbers (extra execution time for extracting and merging).

The content of that memory word does not give any information about this. If it's an 8-bit number the rest will be filled with '0's (or '1's if negative), but the reverse is not true: a series of '0's or '1's does not necessarily indicate a smaller type.

Depending on the type, computation with the content of that memory word requires different machine instructions to execute. The task of the compiler is to translate the intentions of the programmer into the proper machine instructions. Therefore the compiler needs to know the types; the processor does not.

Is not a memory address always an unsigned number?


The Due has an Arm processor. Not too long ago Arm's instruction set made sign-extending a byte (e.g., when loading a byte from memory) more cumbersome and expensive, in terms of instruction count and/or cycles, than zero-extending it.

Some instructions were added to Arm later to make it easier to sign-extend bytes, but those instructions are still more limited in the addressing modes they can use than the equivalent zero-extending instructions, so it's still somewhat more expensive in some cases to sign-extend than to zero-extend a byte. So I can surmise this is why the GCC developers chose to treat char as unsigned on Arm.

Arm's propensity to zero-extend is also why its designers had to add a BIC (Bit Clear) instruction as a complement to the AND instruction. With an AND-immediate instruction you can't do an AND on just the lower 8 bits while preserving the upper 24 bits of a register since the immediate 8-bit operand is zero-extended. On the other hand, RISC-V sign-extends virtually everything, so you can do "ANDI x1,x2,0xfffffff0" (equivalent to "ANDI x1,x2,~0xf") to mask out the lower 4 bits and preserve the upper 28 bits (the immediate operand is 12 bits wide, 0xff0, and is sign-extended to 0xfffffff0).

I don't see how this contradicts the point I was making: the compiler needs to know the types of the variables to pick the right machine instructions, and at runtime this information has been processed and is not even available for the target processor. That the compiler for the ARM processors apparently differs from the one for the AVR, and therefore handles certain types differently, is not a big surprise.

The answer to the original question, valid for the AVR processors that I work with, valid for the Arduino IDE that I use, is that char and int8_t are interchangeable. And yes, the answer may be different if I choose another processor or another IDE, in which case the question may pop up again. I'll keep using int and uint, just in case.

For now I thank everyone who helped shed light on this topic.

This is a bad solution.
Use char to store a char (like 'a').
It tells the compiler that you represent a char and not a number. It also tells other programmers (including your future self) that you intend to store a character.
Let the compiler take care of how it wants to represent that character as a binary number in memory.


uint8_t can take values from 0 through 255 because it is unsigned. char can take values from -128 through 127. They are not equivalent.

This is not correct, please see the previous posts in this thread, the signedness of char has already been discussed.

I would say that the type char takes values as ASCII characters like: 'a' - 'z', 'A' - 'Z', '0' - '9', punctuation marks, and control characters. For example:

char ch1 = 'w';
char ch2 = '\n';

I was quoting my reference books: Teach Yourself C in 21 Days and The Arduino Cookbook. There is an unsigned char data type.
The Arduino IDE Help on char states that char is for storing ASCII values for characters, and to use byte for unsigned 8-bit data.
This seems to be another confusing topic.

To some extent!
Given:

char ch1 = 0x41;
byte ch2 = 0x41;

What will be the output on the Serial Monitor after the execution of the following codes when both variables contain identical numerical values?

Serial.println(ch1);
Serial.println(ch2);

This is a good lesson that some programming texts are better than others (avoid programming books by Herbert Schildt, for example).

Both of those books are apparently referring to some specific implementation(s) of C or C++. On the Arduino Due (as I pointed out above), char is unsigned so has a range of 0 to 255. The C and C++ standards allow char to be either signed or unsigned by default, and some C/C++ implementations choose one way and other implementations choose another way.

I have read this roughly half-year-old thread with interest. It seems to me that type definitions are not treated that strictly in Arduino code, and the type uint8_t is used for almost everything.

I noticed that pin identifications seem to be just numbers, again with the type uint8_t. I guess I would have chosen them to be an enumeration type, but there are probably reasons to choose plain numbers. When I read the library I see these definitions with extensive use of uint8_t:

void pinMode(uint8_t pin, uint8_t mode)
void digitalWrite(uint8_t pin, uint8_t val)
int digitalRead(uint8_t pin)
void analogReference(uint8_t mode)
int analogRead(uint8_t pin)
void analogWrite(uint8_t pin, int val)

I can also find declarations for pins that are defined as integer variables. This is from the AnalogInput example:

int sensorPin = A0;   // select the input pin for the potentiometer
int ledPin = 13;      // select the pin for the LED
int sensorValue = 0;  // variable to store the value coming from the sensor

void setup() {
  // declare the ledPin as an OUTPUT:
  pinMode(ledPin, OUTPUT);
}

void loop() {
  // read the value from the sensor:
  sensorValue = analogRead(sensorPin);
  // turn the ledPin on
  digitalWrite(ledPin, HIGH);
}

I learned that more precise use of type definitions prevents code errors.

Why Is this extensive use of uint8_t and numbers necessary?

I did not read the whole topic, but I think you are wrong. No, all other types may vary between platforms in C/C++. Moreover, the data type uint8_t is derived.
In C/C++ the data type char is 1 byte of 8 bits, always 1 byte, and it is a basic data type, while uint8_t is not a basic data type.

Programmers using higher languages don’t care how big their numbers are (in general). Whether int is 32 or 64 bits, it’s almost always enough. Some languages don’t even offer the choice to use unsigned numbers.

Programmers working with microcontrollers often work directly with the registers. They realize that pin number can never be negative. Memory addresses can never be negative. And if you’ve got only 2kB of RAM (sometimes even less) you better be careful about using RAM space. And they may want to control the roll-over of their numbers from max to zero, distinct from max to minus max.

In this respect the definition of char as a signed number is somewhat surprising, as ASCII coding doesn't use negative numbers. But Arduino uses only the most basic ASCII characters, never over 127, in which case signed or unsigned is irrelevant.

By the way, pin numbers never exceed 127 and still they’re defined as int.

The C language and its successors were developed with the idea that the maker of a compiler for a specific processor should have the freedom to tailor it to that processor, with the result that the definition of (amongst others) int was left to that maker. For the smaller numbers char, unsigned char and byte were invented, but these were still under the 'freedom' policy, with the resulting ambiguity.

Later the system with uint8_t etc. was created for more precise control.


No, I am not fully right. Yes about uint8_t: it is not a fundamental type. However, as I read the standards, it turns out the type char (signed or unsigned) is "at least" 8 bits long, so it may be larger than 8 bits. Even worse, there is char8_t in the new specifications, which is also at least 8 bits long. OMG.


On the Arduino UNO, there is semantic confusion between the data types char and byte. In the following example, both variables y1 and y2 contain the same operand, but the output meanings are different. I would be glad to hear your opinion/explanation.

void setup()
{
  Serial.begin(9600);
  char y1 = 65;
  byte y2 = 65;
  Serial.println(y1); //shows:A
  Serial.println(y2); //shows: 65
}

void loop() 
{

}

Output:

A
65

This is the definition:

    size_t println(char);
    size_t println(unsigned char, int = DEC);

The compiler distinguishes between the overloaded functions based on data type; byte is unsigned char.
The first println is chosen for char and the second one for byte.


Datatypes like uint64_t are part of the C++ language ( Fixed width integer types (since C++11) - cppreference.com ).

Within the Arduino compiler this is referenced back to an older type, already present in the compiler (typedef unsigned long long int uint64_t;).

The user doesn’t know exactly how this older type is defined (unless he dives deep), but the programmers of Arduino do, giving the user that precise control when he uses the new type.

And this is now independent of which compiler.

The original C data types (char, short, int, long, float, double) were intentionally left "loosely defined" in the language and standards. "char" is "at least 8 bits", a short is at least as long as a char, an int is at least as long as a short, a long is at least as long as an int, etc...
This provided a lot of flexibility, back when there were machines that addressed data in sizes other than "per 8bit byte", but also was annoyingly inexact (I used to work on a machine where a character was normally 7bits, for example.)

When things were standardized, it was decided to add types with exact bitsizes as well. These are normally defined in terms of the old primitive types, as appropriate for the architecture (typedef unsigned char uint8_t;), for reasons that aren't obvious to me. But they don't need to be, I guess.
When you need a particular size of variable, you should use these new types.
(There's also a bunch of other less-popular types covering "fastest variable of at least 8 bits" and similar. See C data types - Wikipedia - they probably SHOULD be more popular. Along with size_t. Sigh.)

AVR code that uses 8-bit variables as integers, because they are smaller and faster and big enough for some purposes, should always use int8_t or uint8_t.

It can be. Consider:

   for (int8_t i = 50; i >= 0; i--)
      myarray[i] = 0;  // zero the first 51 elements of the array  

Accidentally use uint8_t and you will get unexpected behavior: `i >= 0` is always true for an unsigned type, so the loop never terminates (though these days the compiler will probably warn you that the comparison is always true).