Compiler just got very fussy, when did this change?

I was going through some MIDI code I had written in the past and found that it did not work on Arduino 1.0.5, whereas previously it had under earlier 1.0.x versions. After a lot of messing about, I found that the char data type no longer behaved as it used to.
This code illustrates the problem:-

char channel = 0;
char incoming = 0x90;
//byte incoming = 0x90;

void setup() {
  Serial.begin(9600);
  if (incoming == (0x90 | channel)) {
    Serial.print("A match as expected");
  }
  else {
    Serial.print("No match");
  }
}
void loop() {
}

If you uncomment the line to make the data type byte, it works with no problem.

When did this change and why?

Strange. It also works if you cast incoming to byte in the IF. And it works if you set incoming to 0x7f and check for that value. It looks a lot like the compiler doesn't like negative values.
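
That is, something along these lines (a minimal sketch of the cast I mean):

if ((byte)incoming == (0x90 | channel)) {   // the cast forces the left side back to 144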

It also works if you do this:-

  if( (incoming & 0xff) == (0x90 | channel)) {

Which is how I found it. It is almost as if char is not being defined as a byte.

chars are signed; bytes are unsigned. The value that you have (144) is not in the range that fits in a char (-128 to 127).

PaulS: chars are signed; bytes are unsigned. The value that you have (144) is not in the range that fits in a char (-128 to 127).

But they're not acting like signed, single-byte containers. They are acting like 7-bit containers. Do any of the "big kid" compilers treat 0x90 the same way as the Arduino version?

Isn't the 'missing' bit the sign bit... the high-order bit?

Under OS X, this variant:

#include <stdio.h>

char channel = 0;
char incoming = 0x90;

int main()
{
  if (incoming == (0x90 | channel)) {
    printf("A match as expected\n");
  }
  else {
    printf("No match\n");
  }
}

Gives:

No match

Ditto for Ubuntu.

No match

$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.

If you do this:

  Serial.println ((int) incoming);

It prints:

-112

So your code is asking: Is -112 equal to 0x90 (144)? Well, no it isn't.

So your code is asking: Is -112 equal to 0x90 (144)? Well, no it isn't.

But that means the compiler is expanding the char to more than a byte before making the comparison, because in strictly byte terms -112 is equal to 0x90; they are exactly the same thing.
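
To put that concretely (a quick sketch of mine, not from the original post):

char a = 0x90;
byte b = 0x90;

void setup() {
  Serial.begin(9600);
  // Identical bit patterns in memory, but:
  Serial.println(a == b);            // prints 0: promotes to -112 == 144
  Serial.println((a & 0xFF) == b);   // prints 1: masking forces both sides to 144
}
void loop() { }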

Like I said, it did not use to do this, but now it does.

Yessss. -112 is "exactly the same as" 144 only if you use certain definitions.

I think you need to look at this:

https://www.securecoding.cert.org/confluence/display/seccode/INT02-C.+Understand+integer+conversion+rules

First:

char incoming = 0x90;

Since incoming "can't hold" 0x90 (144) (it can only hold -128 to +127), that line is the same as if you had written:

char incoming = -112;

char, being signed, can hold that.
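
Spelled out, the two's-complement arithmetic behind that is:

  0x90 = 1001 0000 binary = 144 taken as unsigned
  144 - 256 = -112, which is what the signed char actually holds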

Your code has the same problem if you write:

char incoming = 0x90;

void setup() {
  Serial.begin(115200);
  if (incoming == 0x90) {
    Serial.print("A match as expected");
  }
  else {
    Serial.print("No match");
  }
}
void loop() { }

So let's assume you left off the "or" part of the expression.

Now, a literal is considered an int if the value you are giving will fit into an int, which 0x90 (144) does.

So your code is effectively:

char incoming = -112;

void setup() {
  Serial.begin(115200);
  if (incoming == 144) {
    Serial.print("A match as expected");
  }
  else {
    Serial.print("No match");
  }
}
void loop() { }

I understand what you are saying about the underlying bit patterns, but the fact is that the compiler is correctly interpreting what you have written as comparing a one-byte signed number (which happens to be -112) to a two-byte signed number (which happens to be 144), because the sign bit only applies to one of them.

The generated code actually throws in a check of the sign bit, which is probably why it fails what seems to be an obvious test.

According to the C standard, "char" is a separate type, treated differently from "unsigned char" or "signed char". Also, the default signedness of "char" is not specified and is left to the implementation. gcc allows setting the default signedness of "char" through command-line options (perhaps check the command-line options used on your builds against the older setup that worked). I have seen cases where the gcc AVR optimizer misses certain loop and function-call optimizations when using a "char" type vs a uint8_t.
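
You can see that plain "char" really is a third, distinct type with a quick C++ overload test (my own sketch, not anything from the standard text):

void f(char c)          { Serial.println("plain char"); }
void f(signed char c)   { Serial.println("signed char"); }
void f(unsigned char c) { Serial.println("unsigned char"); }

void setup() {
  Serial.begin(9600);
  f('A');                 // picks the plain char overload
  f((signed char)-1);     // picks the signed char overload
  f((unsigned char)200);  // picks the unsigned char overload
}
void loop() { }

All three overloads are legal precisely because the standard treats them as three different types.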

But the bottom line is that "char" should never be used for anything but chars. Use the stdint types; that is what they are for: http://www.nongnu.org/avr-libc/user-manual/group__avr__stdint.html If you want an exact 8-bit value, use uint8_t for an unsigned value and int8_t for a signed one. That will be portable across all C/C++ implementations.
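
For example, the original sketch rewritten with the stdint types might look like this (untested against the poster's MIDI code, but this is the shape of it):

#include <stdint.h>

uint8_t channel  = 0;
uint8_t incoming = 0x90;   // unsigned, so it really holds 144

void setup() {
  Serial.begin(9600);
  if (incoming == (0x90 | channel)) {   // both sides promote to int 144
    Serial.print("A match as expected");
  }
}
void loop() { }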

For maximum speed/efficiency you should use the C99 "fast" data types. Those specify the minimum size needed but give the compiler the freedom to go larger if it can generate faster code; i.e. in many cases on bigger processors a larger "int" will generate better/faster code than an int8_t or uint8_t.
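
For example (the sizes here are only what the implementation is allowed to choose, not guaranteed):

#include <stdint.h>

// At least 8 and 16 bits respectively, but the implementation is
// free to pick wider types where that generates faster code:
uint_fast8_t  smallCounter;
int_fast16_t  runningTotal;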

--- bill

but the fact is that the compiler is correctly interpreting what you have written as comparing a one-byte signed number (which happens to be -112) to a two-byte signed number (which happens to be 144), because the sign bit only applies to one of them.

Yes, but the thing is it never used to do this. It might be strictly correct, but it has only just become so. I am not saying that it is wrong; I am saying it is being over-fussy, especially as it has only just started behaving like this. I didn't see hordes of people complaining it was wrong before.

Now a literal is considered an int,

Yes, that is the root cause: the right-hand side is being expanded to two bytes and the left is not. That is why

if( (incoming & 0xff) == (0x90 | channel)) {

works.
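
Working the masked version through the promotion rules (my annotation, assuming the AVR's 16-bit int):

  incoming          -> promoted to int: -112, bit pattern 0xFF90
  incoming & 0xff   -> 0xFF90 & 0x00FF = 0x0090 = 144
  0x90 | channel    -> 144 | 0 = 144
  144 == 144        -> match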

The generated code actually throws in a check of the sign bit,

Sorry to be picky, but there is no sign bit. Numbers are stored as two's complement, not sign-and-magnitude, so the most significant bit is not a sign bit.

PaulS: chars are signed; bytes are unsigned. The value that you have (144) is not in the range that fits in a char (-128 to 127).

Paul - a very software way of looking at things. From a hardware point of view there is no difference between a signed byte and an unsigned byte. The bit pattern is still the same; the only difference is how you interpret that bit pattern. That same bit pattern in a byte could be interpreted as a one's complement number, a two's complement number, sign-and-magnitude, unsigned, or even as an ASCII value.

According to the C standard, "char" is a separate type, treated differently from "unsigned char" or "signed char".

So the compiler has been wrong all these years; that is my point. I am not arguing about whether this is correct or not.

Since in this case I assume incoming is a byte from the MIDI port, and those bytes can exceed 127 in value, I suggest making "incoming" a byte type. Then the code works.
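
i.e. exactly the commented-out line from the original sketch:

byte incoming = 0x90;   // byte is unsigned, so it really holds 144
                        // and promotes to int 144, matching (0x90 | channel)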

Yes, but the thing is it never used to do this. It might be strictly correct, but it has only just become so.

Indeed. As someone who has supported MUD game servers over the years, it is an occupational hazard that as compilers are "improved" they tend to report as errors things they used to let through, and in your case, it seems, to behave differently. No doubt someone would have complained that treating -112 as equal to +144 was wrong, and they fixed it.

Grumpy_Mike:
When did this change and why?

I can think of two possibilities…

You’re using a Mac? I ask because (I believe) the Arduino software for the Mac does not include the compiler. Maybe the compiler you started with was buggy and did not handle type-promotion correctly (or incorrectly treated char as unsigned).

Typically, C(++) compilers have an option to make char signed or unsigned. It is possible an older version of the Arduino IDE forced char to be unsigned.
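
If memory serves, in gcc (and so avr-gcc) that option is the -fsigned-char / -funsigned-char pair, so it would be worth comparing the flags each IDE version passes:

$ avr-gcc -funsigned-char ...   # plain char behaves like unsigned char
$ avr-gcc -fsigned-char ...     # plain char behaves like signed char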

the Arduino software for the Mac does not include the compiler.

Yeah, it does.

Where else would avr-gcc come from?

The reason for this apparently strange behaviour is a horrible feature of C and C++ called "integral promotion", which is a hangover from the way K&R implemented the original C compiler. When you use an arithmetic, logical or comparison operator on a value that is smaller than type 'int' (such as 'char', 'signed char' or 'unsigned char' - for which 'byte' is an alias), the compiler expands that value to either 'int' or 'unsigned int' before applying the operator. In this case, both operands are converted to type 'int' (because the rules say that 'int' is used in preference to 'unsigned int' if it is large enough to accommodate all the possible values). That is why it ends up comparing -112 with +144.
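
A two-line illustration (assuming plain char is signed, as on the AVR):

char c = 0x90;   // the value 144 does not fit, so c holds -112
int  n = c;      // integral promotion sign-extends: n is -112, not 144
// hence  c == 0x90  compares -112 with the int literal 144, and fails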

In some situations, a signed value is unexpectedly treated as unsigned. For an example, see the beginning of http://eschertech.com/articles/items/art100607.html and the solution given at the end.
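
The classic form of that trap looks something like this (my own minimal example, not the one from the article):

unsigned int limit = 10;
int i = -1;
if (i < limit) {
  // never reached: i is converted to unsigned int for the comparison,
  // becoming 65535 on a 16-bit AVR, which is not less than 10
}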

Mike, there are many long and ugly stories related to the "char" type, and it gets VERY messy when not using ASCII, i.e. with character sets whose characters are larger than 8 bits. The best thing is not to use it for anything but characters.

Something to keep in mind: an "unsigned char" is a number, and a "signed char" is a number. A "char" is not a number; it is a char. And that is where the "problem" originates.

The compiler has rules for how to convert that non-number "char" to a number. Unfortunately, the C standard does not have a hard rule for whether to treat the numeric value of the char as a signed or an unsigned value when converting it to a number, and that is the problem. (Also keep in mind that a "char" is not always 8 bits, which adds additional complexity for other character sets.)
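
A one-line way to see which choice a given compiler made (my own test sketch):

char c = 0x90;   // implementation-defined: holds 144 or -112

void setup() {
  Serial.begin(9600);
  if (c < 0) {
    Serial.print("plain char is signed here");    // avr-gcc's default, I believe
  } else {
    Serial.print("plain char is unsigned here");
  }
}
void loop() { }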

Those who write code declaring a variable as a "char", which is a non-number, but then try to use it as a number can get silently burned by the conversion process.

What might have been better is if the compiler raised an error when the non-number variable was used as a number. At least that way users would have to go in and fix their code to use a true numeric type.

In your case, it is possible that the compiler has changed and has altered the default sign of "char". Around 1.0.3 on Linux (I'm guessing this happened on the Mac as well) the Arduino IDE started shipping the avr compiler with the IDE. Since the IDE uses VERY old tools, it is possible that the compiler you were using before was actually newer than the compiler that ships with the newer IDE. At a minimum it is likely to be different.

I actually go in and rename or remove the compiler directory that ships with the Linux IDE, since there are newer/better versions of the avr compiler tools in the Debian/Ubuntu/Mint repos.

But like I said, it is best never to use "char" for numeric values, since "char" is not really a number.

--- bill