char datatype actually unsigned?

The documentation states that the char datatype is signed and can hold values from -128 to 127 http://www.arduino.cc/en/Reference/Char
I noticed that this is not the case. Take this example:

char negative = -1;
char positive = 1;
if (positive < negative)
{   
    Serial.println("Sign Error");
}

The only explanation I can think of for this is that the comparison operator < might automatically typecast to int, in which case the two’s complement sign bit on the char would be ignored.

So what’s the point of using a char instead of an unsigned char if they behave the same?

Negative 1 should be 0x81, positive 1 is 0x01, so taken literally positive is less than negative.

The comparison < does not seem to have any way to know you want a 2’s complement comparison.

Thanks for the reply.

CrossRoads:
Negative 1 should be 0x81, positive 1 is 0x01

Actually, negative 1 should be 0xFF.
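For anyone following along, here is a minimal sketch of the two encodings being mixed up here. It is plain desktop C++ rather than an Arduino sketch, and the helper names (twos_complement_bits, sign_magnitude_bits, check_encodings) are made up for illustration. It relies on the fixed-width types from <cstdint>, which are two's complement by definition.

```cpp
#include <cassert>
#include <cstdint>

// Bit pattern of an 8-bit signed value as it is actually stored
// (two's complement). Converting to an unsigned type is modular,
// so -1 maps to 0xFF.
std::uint8_t twos_complement_bits(std::int8_t value) {
    return static_cast<std::uint8_t>(value);
}

// Bit pattern the same value would have in sign-magnitude form
// (sign in bit 7, magnitude in bits 0..6). Shown only for contrast;
// -128 has no 8-bit sign-magnitude encoding, so avoid passing it.
std::uint8_t sign_magnitude_bits(std::int8_t value) {
    if (value >= 0) {
        return static_cast<std::uint8_t>(value);
    }
    const int magnitude = -static_cast<int>(value);
    return static_cast<std::uint8_t>(0x80 | (magnitude & 0x7F));
}

// Hypothetical self-check comparing the two encodings of -1 and +1.
void check_encodings() {
    assert(twos_complement_bits(-1) == 0xFF);  // what the hardware stores
    assert(sign_magnitude_bits(-1) == 0x81);   // the sign-magnitude guess
    assert(twos_complement_bits(1) == sign_magnitude_bits(1));
}
```

Calling check_encodings() from a test main should pass silently; the two forms only disagree for negative values.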

CrossRoads:
The comparison < does not seem to have any way to know you want a 2’s complement comparison.

It knows I want a 2’s complement comparison for signed integers, so why does it ignore signed chars? Take this example:

char negative_char = -1;
char positive_char = 1;

Serial.print("Char Test: ");
Serial.print(negative_char, HEX);
Serial.print(" should be less than ");
Serial.print(positive_char, HEX);
if (negative_char < positive_char) {   
	Serial.println(" and it is");
} else {
	Serial.println(" but it is NOT!!!");
}

int negative_int = -1;
int positive_int = 1;

Serial.print("Int Test: ");
Serial.print(negative_int, HEX);
Serial.print(" should be less than ");
Serial.print(positive_int, HEX);
if (negative_int < positive_int) {   
	Serial.println(" and it is");
} else {
	Serial.println(" but it is NOT!!!");
}

Output:
Char Test: FF should be less than 1 but it is NOT!!!
Int Test: FFFFFFFF should be less than 1 and it is

I don't know Miles. I do all my stuff as unsigned #s to avoid dealing with that stuff. Sounds like it could be a bug.

What version of IDE/hardware are you using?

I noticed that this is not the case

I noticed that it works fine…

void setup( void )
{
  Serial.begin( 9600 );
}

void loop( void )
{
  char negative = -1;
  char positive = 1;
  
  if (positive < negative)
  {   
      Serial.println( F( "Sign Error") );
  }
  else
  {
    Serial.println( F( "No problem" ) );
  }
  delay( 1000 );
}

Output:

No problem
No problem
No problem
No problem
No problem
No problem
No problem

The only explanation I can think of for this is that the comparison operator < might automatically typecast to int

Theoretically that is what happens. However, it is very likely that the compiler recognizes that there is no difference between comparing the two values as “int” or “char” and, as an optimization, leaves the comparison as “char”.

After a moment of reflection, I suspect the compiler recognizes the fact that the condition is always false and removes the comparison and then-clause as dead code. 1510 versus 1550 bytes of code. Yup, that appears to be the case.

in which case the two’s complement sign bit on the char would be ignored

Wrong. The phrase to search for is “sign extension”.
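A small sketch of what sign extension means in practice, written as plain desktop C++ so it can be tried off the board. The function names (widen_signed, widen_unsigned, check_widening) are invented for this example, and it assumes the usual 8-bit char.

```cpp
#include <cassert>

// Widening a signed char to int copies the sign bit into the new
// high bits (sign extension), so the numeric value is preserved.
int widen_signed(signed char c) {
    return c;
}

// Widening an unsigned char to int fills the new high bits with
// zeros (zero extension), so 0xFF stays 255.
int widen_unsigned(unsigned char c) {
    return c;
}

// Hypothetical self-check: the sign survives promotion to int.
void check_widening() {
    assert(widen_signed(-1) == -1);                  // not 255
    assert(widen_unsigned(0xFF) == 255);             // not -1
    assert(widen_signed(-1) < widen_signed(1));      // signed ordering
    assert(widen_unsigned(1) < widen_unsigned(0xFF));  // unsigned ordering
}
```

So even when the operands of < are promoted to int first, a signed char holding -1 still compares as less than 1.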

In this particular contrived case, the compiler is stepping in.

This snippet:

void setup(void) 
{
  char negative = -1;
  char positive = 1;
  if ( positive < negative )
    Serial.println("I am broken");
}

Compiles down to this wonderful thing:

setup:
	ret

Anyway, I forced it to generate code for these functions:

bool lt(char a,char b)
{
  return (a < b);
}

bool ltu(unsigned char a,unsigned char b)
{
  return (a < b);
}

And the compiler did the right thing. Exactly the same code, but used a BRGE (Branch if greater or equal (signed)) in the first case and BRSH (Branch if same or higher (unsigned)) in the second.

MilesF:
The documentation states that the char datatype is signed and can hold values from -128 to 127 http://www.arduino.cc/en/Reference/Char
I noticed that this is not the case.

I can’t reproduce your problem - although you didn’t say what it printed …

char negative = -1;
char positive = 1;

void setup ()
{
  Serial.begin (115200);
  Serial.println (negative, DEC);
  Serial.println (positive, DEC);
  
  if (positive < negative)
    Serial.println("Sign Error");
  else
    Serial.println("OK");
}

void loop () {
}

That prints:

-1
1
OK

That’s what you expect isn’t it?

If you change it to:

unsigned char negative = -1;
unsigned char positive = 1;

void setup ()
{
  Serial.begin (115200);
  Serial.println (negative, DEC);
  Serial.println (positive, DEC);
  
  if (positive < negative)
    Serial.println("Sign Error");
  else
    Serial.println("OK");
}

void loop () {
}

It prints:

255
1
Sign Error

I expect that as well. The -1 was converted to 0xFF which is indeed not lower than 1.

maniacbug:
In this particular contrived case, the compiler is stepping in.

This snippet:

void setup(void) 

{
 char negative = -1;
 char positive = 1;
 if ( positive < negative )
   Serial.println("I am broken");
}

So, I’m thinking the same thing that you observe: The compiler cleverly saw that you had made assignments and then directly used the values in a relational expression whose result was false, so the compiler decided that, since nothing would be done, it just optimized away the resulting code. I mean, if you want it to do something, you gotta give it something to do, right?

If you had declared either (or both) of the variables as being volatile, it wouldn’t have optimized it away. Of course you still wouldn’t have seen anything in the result, even if it were broken, since you didn’t execute Serial.begin()

Anyhow you don’t have to contrive anything to see the results with Serial.print…

void setup(void) 
{
    Serial.begin(115200);
    char negative = -1;
    char positive = 1;
    if ( positive < negative ) {
        Serial.println("It is broken!");
    }
    else {
        Serial.println("It works as expected!");
    }
}
void loop(){
}

Output:


It works as expected!

If you want to see how the comparison is done, just declare the variables volatile and then look at the output from avr-objdump -d to see the different branch instructions that are used for signed comparison and unsigned comparison.
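As a rough illustration of the volatile suggestion, the sketch below is plain C++ with made-up function names (signed_compare_at_runtime, unsigned_compare_at_runtime, check_runtime_compares). Because the operands are volatile, the compiler has to load them and perform the comparison at run time instead of folding it away, which is what makes the branch instructions show up in a disassembly. It uses signed char and unsigned char explicitly so the result does not depend on how plain char is configured.

```cpp
#include <cassert>

// Signed comparison on values the compiler is not allowed to
// treat as compile-time constants.
bool signed_compare_at_runtime() {
    volatile signed char negative = -1;
    volatile signed char positive = 1;
    return positive < negative;  // 1 < -1 is false
}

// Same bit patterns, but interpreted as unsigned: -1 becomes 255.
bool unsigned_compare_at_runtime() {
    volatile unsigned char negative = static_cast<unsigned char>(-1);
    volatile unsigned char positive = 1;
    return positive < negative;  // 1 < 255 is true
}

// Hypothetical self-check for the two functions above.
void check_runtime_compares() {
    assert(!signed_compare_at_runtime());
    assert(unsigned_compare_at_runtime());
}
```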

Bottom line: I don’t disagree with your methodology or your conclusion about the correctness of program comparisons with signed and unsigned integer data types.

The thing is that I have just been involved with some benchmarking attempts where people made incorrect conclusions based on their ignorance of the marvels of modern optimizing compilers. Sometimes what you put into your source code is not what you expect from the optimized executable.

Regards,

Dave

MilesF:

the comparison operator < might automatically typecast to int

That is not correct. Some compilers may “promote” char to int and unsigned char to unsigned int, but if both have the same signedness, nothing is cast to a different type.

MilesF:
So what’s the point of using a char instead of an unsigned char if they behave the same?

They do not behave the same. You didn’t give us a complete sketch that shows us how you arrived at the conclusion that they do behave the same.

Try this, for example:

    void setup(void) 
    {
        Serial.begin(9600);
    
        char cnegative = -1; // Stored as 0xff
        char cpositive = 1;  // Stored as 0x01
        if (cpositive < cnegative) { // Signed comparison
            Serial.println("cpositive is less than cnegative");
        }
        else {
            Serial.println("cpositive is not less than cnegative");
        }
    
        byte bnegative = -1; // Stored as 0xff
        byte bpositive = 1;  // Stored as 0x01
        if (bpositive < bnegative) { // Unsigned comparison
            Serial.println("bpositive is less than bnegative");
        }
        else {
            Serial.println("bpositive is not less than bnegative");
        }
    
    }
    void loop(){}

Output:


cpositive is not less than cnegative
bpositive is less than bnegative

Regards,

Dave

Footnote:
If people are really interested, they might try comparisons where one operand is signed and the other is unsigned…

It turns out that comparing integer data types for equality is no problem even if they are of different signedness, but other relational expressions are of questionable value (in my opinion). (In other words: In general, don’t use ‘<’ or ‘<=’ or ‘>’ or ‘>=’ to compare signed with unsigned types.)
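To make the footnote concrete, here is a minimal sketch in plain C++. The names mixed_less, safe_less and check_mixed are invented for illustration. When an int is compared with an unsigned int, the int is converted to unsigned first; the cast in mixed_less just spells out that implicit conversion (and keeps the compiler from warning about it). Note that this applies to int and wider types: a signed char and an unsigned char are both promoted to int before the comparison.

```cpp
#include <cassert>

// What `a < b` effectively does when a is int and b is unsigned int:
// a is converted to unsigned, so -1 turns into a very large value.
bool mixed_less(int a, unsigned int b) {
    return static_cast<unsigned int>(a) < b;
}

// One possible way to get the mathematically expected answer:
// any negative value is less than every unsigned value.
bool safe_less(int a, unsigned int b) {
    return a < 0 || static_cast<unsigned int>(a) < b;
}

// Hypothetical self-check showing where the two disagree.
void check_mixed() {
    assert(!mixed_less(-1, 1u));  // surprising: -1 is "not less" than 1
    assert(safe_less(-1, 1u));    // expected answer
    assert(mixed_less(0, 1u) == safe_less(0, 1u));  // agree when a >= 0
}
```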

I appreciate all the replies. I figured out the problem is with my makefile. I ran some of the examples all of you provided using my makefile and continued to get sign errors. Then I ran the examples in the Arduino IDE and they worked. I’ve compiled a few other projects using the makefile and they worked flawlessly, so I was really confused why it was messing up chars. Then I dug a little deeper and found out that it was using the -funsigned-char option. I didn’t even think to look for this, and it slipped by because the makefile is so huge. I attached it in case anyone wants to take a look.
Anyway, sorry to cause a commotion over user-ish error, but I’m glad it’s resolved. At least if anyone else runs into this very obscure issue, they’ll have a better chance of finding a solution with a search. I spent hours searching for a solution before posting and couldn’t find anything.
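For anyone who lands here from a search, one way to check how plain char is configured in a given build is sketched below. It is standard C++ using std::numeric_limits and CHAR_MIN; the function names (plain_char_is_signed, less_signed, check_char_signedness) are made up for this example. Declaring variables as signed char (or int8_t) instead of plain char is one way to keep the comparison signed regardless of a flag such as -funsigned-char.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Both standard checks must agree about plain char in this build.
static_assert(std::numeric_limits<char>::is_signed == (CHAR_MIN < 0),
              "numeric_limits and CHAR_MIN disagree about char");

// True when plain `char` is a signed type in this build. With
// -funsigned-char on the command line this is expected to be false.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Comparison that stays signed no matter how plain char is set up,
// because the parameter type says so explicitly.
bool less_signed(signed char a, signed char b) {
    return a < b;
}

// Hypothetical self-check.
void check_char_signedness() {
    assert(plain_char_is_signed() == (CHAR_MIN < 0));
    assert(less_signed(-1, 1));
    assert(!less_signed(1, -1));
}
```

Printing the result of plain_char_is_signed() at startup would have flagged the makefile difference right away.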

Edit: I think the person who wrote the makefile I’m using might have gotten some inspiration from this page http://www.tty1.net/blog/2008-04-29-avr-gcc-optimisations_en.html
It suggests using -funsigned-char to reduce code size. I don’t see how that would make a difference. Does anyone have an idea why it would?

Makefile (8.4 KB)

The thinking may be that with unsigned char you can store a larger positive numerical value in it and you might not need to use a two-byte integer instead.

There is a discussion of why this option is useful in another context:

http://www.network-theory.co.uk/docs/gccintro/gccintro_71.html