Unexpected unsigned-to-signed promotion

I thought I understood type promotion until today. I ran into a case where an unsigned type was silently promoted to a signed type, causing unexpected results. I know that blaming unexpected results on compiler bugs is one of the first signs that one is losing his grip on reality, but this one has me seriously considering it.

Here's some code:

  byte b = 0x7f;
  unsigned long u = b << 8;
  Serial.print("0x7F <<  8: ");
  Serial.println(u, HEX);
  
  b = 0x80;
  u = b << 8;
  Serial.print("0x80 <<  8: ");
  Serial.println(u, HEX);

which gives me this output:

    0x7F <<  8: 7F00
    0x80 <<  8: FFFF8000

I expected b to be promoted to a 16 bit type prior to the left shift, but I didn't expect the promotion to be from an unsigned type (byte) to a signed type (int). It seems to me that it should have been promoted to an unsigned int.

Of course, if I explicitly cast b to a wider unsigned type prior to the shift (e.g. u = (unsigned long)b << 8;), I get the results I expected (i.e. 0x80 << 8: 8000).

Can anyone provide a logical explanation for this unsigned-to-signed promotion, or is that just the way it is?

--
Togg

P.S. I was using the Arduino IDE version 1.0.3 on Mac OS X 10.6.8, and running the code on an Arduino Duemilanove board.

Constants are ints. So b << 8 is a uint8_t and an int16_t together. The bigger one is int16_t, so the compiler promotes the other to that.

When b is an unsigned long, that is bigger, so the constant is promoted to that.

I would say it's one of the more annoying features of C -- you could say it's a "standards" bug rather than a compiler bug :wink:

WizenedEE:
Constants are ints. So b << 8 is a uint8_t and an int16_t together. The bigger one is int16_t, so the compiler promotes the other to that.

Interesting, and plausible, but note:

u = b << (byte)8;        
u = b << (unsigned int)8;

both result in u being 0xFFFF8000.

--
Togg

I know that blaming unexpected results on compiler bugs is one of the first signs that one is losing his grip on reality ...

How very true.

See this:

https://www.securecoding.cert.org/confluence/display/seccode/INT02-C.+Understand+integer+conversion+rules

Integer types smaller than int are promoted when an operation is performed on them. If all values of the original type can be represented as an int, the value of the smaller type is converted to an int; otherwise, it is converted to an unsigned int.

Both of your operands (b and 8) can be represented as an int, so they are promoted to an int.
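To see where the leading FFFF comes from, here is a rough desktop model of the promotion. It is a sketch only: the function name is made up for illustration, and it assumes a target with a 16-bit int (as on the AVR-based boards) and a 32-bit unsigned long.

```cpp
#include <cstdint>

// Models `unsigned long u = b << 8;` on a target with a 16-bit int
// (an assumption made here to mirror the AVR-based Arduino boards).
std::uint32_t shift_with_16bit_int(std::uint8_t b) {
    // b is promoted to int before the shift. On a desktop that int is
    // wider than 16 bits, so the result is narrowed to int16_t to model
    // the AVR. The narrowing wraps on common compilers; it is only
    // guaranteed to wrap from C++20 on.
    std::int16_t as_int = static_cast<std::int16_t>(b << 8);

    // Converting the signed 16-bit value to a 32-bit unsigned type
    // sign-extends it, which produces the leading FFFF for 0x80.
    return static_cast<std::uint32_t>(as_int);
}
```

With this model, 0x7F comes out as 7F00 and 0x80 comes out as FFFF8000, matching the output in the first post.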

Change b to:

  unsigned int b = 0x7f;

And the code works as expected. This is because an unsigned int can't be promoted to an int.

Not that I knew that before I looked it up. :slight_smile:

Thanks to both of you for the good answers, and extra thanks to Nick for the link to the CERT page on integer conversions. I need to spend some quality time with that one.

Given those answers, I'm a bit surprised that this code:

#include <stdio.h>
#include <inttypes.h>

int main () {
    unsigned char b = 0x80;
    unsigned long u = b << 8;
    printf("0x80 <<  8: %lx\n", u);
    return 0;
}

compiled (with gcc) and run on my Mac desktop displays the expected 0x80 <<  8: 8000. The actual behavior then seems to be implementation dependent.

This issue occurred in the context of assembling a 22 bit value from a device by reading in a byte at a time, like this (simplified):

unsigned long getValue() {
    unsigned long val = 0;
    val = getByte() << 16;
    val |= getByte() << 8;
    val |= getByte();
    ...
    return val;
}

byte getByte() {
    ...
    // read byte from device
    ...
}

Of course there were two problems with this code due to the (at the time unexpected) conversion of byte to int: we got a very large jump in the returned value when the middle byte went from 0x7F to 0x80, and the MSByte was left-shifted out of existence.
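One way to avoid both problems is to widen each byte before shifting it. The following is a minimal sketch under that assumption; combine_bytes is a hypothetical helper that takes the three bytes as arguments, and it does not reproduce the elided parts of the real getValue().

```cpp
#include <cstdint>

// Hypothetical helper: combines three bytes, most significant first,
// into one 32-bit value. Each byte is converted to a 32-bit unsigned
// type before it is shifted, so no shift takes place in a (possibly
// 16-bit) signed int.
std::uint32_t combine_bytes(std::uint8_t high, std::uint8_t mid, std::uint8_t low) {
    std::uint32_t val = static_cast<std::uint32_t>(high) << 16;
    val |= static_cast<std::uint32_t>(mid) << 8;
    val |= low;  // no shift, so the plain promotion is harmless here
    return val;
}
```

Because the shifts are done on a 32-bit unsigned value, the high byte is kept and a middle byte of 0x80 or above no longer sets the upper bits.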

Thanks again, gentlemen.

--
Togg

togg:
... and run on my Mac desktop displays the expected 0x80 <<  8: 8000.

Yes, but on the Mac (and indeed many architectures) an int is 4 bytes. So once promoted to an int, and shifted left 8 bits, the 1-bit isn't shifted into the sign bit, so the result does not get the sign bit set.

On the Mac, try:

#include <stdio.h>
#include <inttypes.h>

int main () {
    unsigned char b = 0x80;
    unsigned long u = b << 24;
    printf("0x80 <<  24: %lx\n", u);
    printf("sizeof (int) = %i\n", (int) sizeof (int));
    return 0;
}

Output:

0x80 <<  24: ffffffff80000000
sizeof (int) = 4
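
The same point can be put as a small check: does the shifted byte land on the sign bit of an int of a given width? This is only an illustrative sketch; the function name and the int_bits parameter are invented for the example, and it looks at the sign bit alone.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if shifting b left by `shift` bits sets the bit that
// would be the sign bit of an int that is `int_bits` wide. The work is
// done in 64 bits so that the model itself cannot overflow.
bool shift_reaches_sign_bit(std::uint8_t b, unsigned shift, unsigned int_bits) {
    assert(int_bits >= 8 && int_bits <= 32 && shift < int_bits);
    std::uint64_t shifted = static_cast<std::uint64_t>(b) << shift;
    std::uint64_t sign_mask = std::uint64_t{1} << (int_bits - 1);
    return (shifted & sign_mask) != 0;
}
```

For 0x80 shifted by 8 this is true with a 16-bit int and false with a 32-bit int; shifted by 24 it is true with a 32-bit int, which is the case shown in the output above.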

As a newcomer to C, I also found this behavior less than "transparent". I've taken to using explicit casts to make things very explicit. For example,

byte Category, Name, Target;
unsigned int ID;
ID = (uint16_t) ((Category << 6 | Name ) << 3 | Target) << 5;

converts the intermediate result to a 16-bit unsigned type before the final shift, thus matching the type of ID (the shifts inside the cast's parentheses are still evaluated as int). While composing I also tend to put in more of these (uint16_t) casts and parentheses than really needed, as I never remember the order-of-precedence rules (or almost anything else for that matter - my biological RAM leaves lots to be desired). Because it makes things look messy, I then clean it up, while making sure that I'm still getting the same result.
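
A more cautious variant widens the first operand up front, so that no step relies on int arithmetic. This is a sketch only: pack_id is a hypothetical name, and the field widths are not stated anywhere in this thread, only the shift amounts.

```cpp
#include <cstdint>

// Hypothetical variant of the ID calculation above. Only the shift
// amounts (6, 3, 5) come from the original expression.
std::uint16_t pack_id(std::uint8_t category, std::uint8_t name, std::uint8_t target) {
    // Start from a 16-bit unsigned value so that, where int is 16 bits,
    // every later step is carried out in unsigned 16-bit arithmetic.
    std::uint16_t id = static_cast<std::uint16_t>(category);
    id = static_cast<std::uint16_t>((id << 6) | name);
    id = static_cast<std::uint16_t>((id << 3) | target);
    id = static_cast<std::uint16_t>(id << 5);
    return id;
}
```

The casts on each line truncate back to 16 bits, so the result is the same whether int is 16 or 32 bits wide.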
Ciao,
Lenny

compiled (with gcc) and run on my Mac desktop displays the expected 0x80 << 8: 8000. The actual behavior then seems to be implementation dependent.

Size (of "int") matters.

OK, I think I've got it, but I have one more surprising (to me) result to explain.

byte b = 0x80;
unsigned long ul = 8;
unsigned long r = b << ul;
Serial.println(r, HEX);

Results in FFFF8000 on my Arduino Duemilanove board.

Since one of the operands to the shift operator is an unsigned long, I would have expected b to be converted to an unsigned long. Then the shift should happen (0x00000080 << 8 = 0x00008000) and the result stored in r. That's obviously not happening here.

Of course, left shifts greater than 32 always result in 0 (in this case), so it's somewhat nonsensical for the type of the shift operand to be unsigned long. Is the compiler optimizing away the byte-to-unsigned long conversion and just doing the standard byte-to-signed int promotion?

--
Togg

togg:
I expected b to be promoted to a 16 bit type prior to the left shift, but I didn't expect the promotion to be from an unsigned type (byte) to a signed type (int). It seems to me that it should have been promoted to an unsigned int.

From the C++ 2003 standard, section 4.5.1 on standard conversions:

"An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int. .... These conversions are called integral promotions."

and from section 5.8.1 on shift operations:

"The operands shall be of integral or enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand."

I agree, it's counter-intuitive.

This shows 0x8000:

  byte b = 0x80;
  unsigned long ul = 256;
  unsigned long r = b * ul;
  Serial.println(r, HEX);

So it appears that what dc42 said is correct (the second quote, about the shift operations).

Plus the next sentence in the standard (which I just downloaded) is very relevant:

"The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand."

So, undefined behaviour. :stuck_out_tongue:

Or is it? Would both operands be promoted to unsigned long? But they can't be or the results would be 0x8000.

You need to distinguish between two types of conversion. Integral promotion is performed on a single operand. In the case of a shift operation, both operands are subject to integral promotion, independently of each other. The type of the right operand does not affect the type of the promoted left operand.

For binary arithmetic (not shift) operators, the "usual arithmetic conversions" are performed. In this case, the type of one operand can affect the type to which the other operand is converted. In particular, if one operand is unsigned int or a longer unsigned type, and the other operand has a signed type that is no wider, then the signed operand will be converted to unsigned. However, an expression of the form byte * byte will still be treated as int * int with the "unsigned" attributes discarded. See section 5, paragraph 9 of the 2003 standard.
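
The difference between the two kinds of conversion can be checked at compile time. The sketch below uses decltype to ask the compiler for the result types; the alias and constant names are made up for the example, and unsigned char stands in for the Arduino byte type. It assumes a platform where int can represent every unsigned char value.

```cpp
#include <type_traits>

// Type of `byte << unsigned long`: only the left operand is considered,
// and it is promoted to int.
using ShiftType = decltype(static_cast<unsigned char>(0x80) << 8UL);

// Type of `byte * unsigned long`: the usual arithmetic conversions
// apply, so the byte is converted to unsigned long.
using MulType = decltype(static_cast<unsigned char>(0x80) * 256UL);

// Type of `byte * byte`: both operands are promoted to int.
using ByteMulType =
    decltype(static_cast<unsigned char>(2) * static_cast<unsigned char>(3));

constexpr bool shift_is_int = std::is_same<ShiftType, int>::value;
constexpr bool mul_is_unsigned_long = std::is_same<MulType, unsigned long>::value;
constexpr bool byte_mul_is_int = std::is_same<ByteMulType, int>::value;

static_assert(shift_is_int, "shift result has the promoted left operand's type");
static_assert(mul_is_unsigned_long, "multiplication converts to unsigned long");
static_assert(byte_mul_is_int, "byte * byte is evaluated as int * int");
```

The expressions inside decltype are never evaluated, so the check says nothing about the values, only about the types the compiler picks.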