Casting uint16_t to uint8_t but >>1 shifts high byte as well as low

#EDIT Arduino Nano ATmega328P /#
I'm saving ADC values in real time, and seeking to A) compress, and B) do it as quickly as possible.
I take the 10-bit ADC value (as uint16_t) then shift right varying amounts to compress down to 8-bit without too much precision loss.
I've created a small project (StrangeShiftRightCode) to demonstrate ...

void setup() {
}
volatile uint8_t res;
volatile uint16_t n=0;
void loop() {
  ++n;
  if(n > 1023) { n = 123; }
  res = (n) >> 1;
  res = ((uint8_t)n) >> 1;
  res = (static_cast<uint8_t>(n)) >> 1;
}

The (snipped) generated code:
(Noting that both the ((uint8_t)n) and static_cast<uint8_t>(n)) are being treated as 16 bit (and arguably signed) for the >> 1 operation.)
There seems to be extra machine instructions there.
Can someone explain why this is?
Maybe I should know, but my C++ & AVR Assembler is maybe rusty.
...

volatile uint8_t res;
volatile uint16_t n=0;
void loop() {
  ++n;
 1b2:	80 91 01 01 	lds	r24, 0x0101	; 0x800101 <n>
 1b6:	90 91 02 01 	lds	r25, 0x0102	; 0x800102 <n+0x1>
 1ba:	01 96       	adiw	r24, 0x01	; 1
 1bc:	90 93 02 01 	sts	0x0102, r25	; 0x800102 <n+0x1>
 1c0:	80 93 01 01 	sts	0x0101, r24	; 0x800101 <n>
.../StrangeShiftRightCode/StrangeShiftRightCode.ino:7
  if(n > 1023) { n = 123; }
 1c4:	80 91 01 01 	lds	r24, 0x0101	; 0x800101 <n>
 1c8:	90 91 02 01 	lds	r25, 0x0102	; 0x800102 <n+0x1>
 1cc:	81 15       	cp	r24, r1
 1ce:	94 40       	sbci	r25, 0x04	; 4
 1d0:	20 f0       	brcs	.+8      	; 0x1da <main+0xb6>
 1d2:	10 93 02 01 	sts	0x0102, r17	; 0x800102 <n+0x1>
 1d6:	00 93 01 01 	sts	0x0101, r16	; 0x800101 <n>
.../StrangeShiftRightCode/StrangeShiftRightCode.ino:8
  res = (n) >> 1;
 1da:	80 91 01 01 	lds	r24, 0x0101	; 0x800101 <n>
 1de:	90 91 02 01 	lds	r25, 0x0102	; 0x800102 <n+0x1>
 1e2:	96 95       	lsr	r25
 1e4:	87 95       	ror	r24
 1e6:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <_edata>
.../StrangeShiftRightCode/StrangeShiftRightCode.ino:9
  res = ((uint8_t)n) >> 1;
 1ea:	80 91 01 01 	lds	r24, 0x0101	; 0x800101 <n>
 1ee:	90 91 02 01 	lds	r25, 0x0102	; 0x800102 <n+0x1>
 1f2:	95 95       	asr	r25
 1f4:	87 95       	ror	r24
 1f6:	8f 77       	andi	r24, 0x7F	; 127
 1f8:	99 27       	eor	r25, r25
 1fa:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <_edata>
.../StrangeShiftRightCode/StrangeShiftRightCode.ino:10
  res = (static_cast<uint8_t>(n)) >> 1;
 1fe:	80 91 01 01 	lds	r24, 0x0101	; 0x800101 <n>
 202:	90 91 02 01 	lds	r25, 0x0102	; 0x800102 <n+0x1>
 206:	95 95       	asr	r25
 208:	87 95       	ror	r24
 20a:	8f 77       	andi	r24, 0x7F	; 127
 20c:	99 27       	eor	r25, r25
 20e:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <_edata>
main():
.../.arduino15/packages/arduino/hardware/avr/1.8.6/cores/arduino/main.cpp:47
 212:	20 97       	sbiw	r28, 0x00	; 0
 214:	71 f2       	breq	.-100    	; 0x1b2 <main+0x8e>
 216:	0e 94 00 00 	call	0	; 0x0 <__vectors>
 21a:	cb cf       	rjmp	.-106    	; 0x1b2 <main+0x8e>

No answer to your question, but to go from 10 bit to 8 bit one should (at least) shift by 2?

  res = (n) >> 1;
  res = ((uint8_t)n) >> 1;
  res = (static_cast<uint8_t>(n)) >> 1;

IIRC casting before shifting throws away the two most significant bits while you want to remove the least significant bits.

  res = (n >> 2);
  res = (uint8_t) (n >> 2);
  res = (static_cast<uint8_t>(n >> 2));

I'm saving ADC values in real time,

Giving your goal, it might be worth to take a look at the library code how the 10 bit ADC is generated, it might be possible to rewrite the analogRead() or equivalent. Unless you need portable code.

Which board? , (assume UNO)?

Try this:

Serial.println( ((uint16_t) analogRead(pin)) >> 2);

better, Instead of truncating, round

Serial.println( ((uint16_t) (analogRead(pin)+2)) >> 2);

1 Like

It's a matter of byte order. If the highest byte resides leftmost in memory then casting a 2 byte value to 1 byte will return the high byte, not the low one.

Not true. At least on a ESP32 where I tested it:

void setup() {
  Serial.begin(115200);
  delay(1000);

  uint16_t x {0x12EF};
  Serial.println(x, HEX);
  Serial.println(static_cast<uint8_t>(x), HEX);
}

void loop() {
}

Output:

12EF
EF

Also, memory doesn't have a "leftmost". It has byte addresses which can be either greater or less than each other.

If value is <256, shift right 1; range 0-127
Else if <512 shift right 2 + whatever; range 128-191
Else shift right 3 then + whatever; range 192-255.
This drops least sig bit then stores rest of number to 7 sig bits, in 8 bits

Yeah eh, if the analogRead() returns 1023, then adding 2 results in 1025, shifting that 2 bits = 256, which is not within an 8 bit variable. which i thought was the goal.

That is just nonsense. when you cast to a smaller variable size, you will always be left with the least significant part unless you shift. Endianness is not relevant, that is just the convention that is used for casting.

This is the correct answer and should be marked as solution.

First, the C-style cast has the same result as the static_cast; and since we're doing C++, use the latter, and you don't need the extra parentheses. Consider

  Serial.println(static_cast<uint8_t>(1023) << 1, HEX);
  uint8_t u8 = 1023;
  u8 <<= 1;
  Serial.println(u8, HEX);

Prints 1FE and FE. The cast does -- well, the opposite of truncation -- decapitation(?) but does not constrain the shift operation. It is handled as just plain int. In contrast, assigning to a variable will lop off the same bits and constrain the shift.

But yeah, the resulting machine code seems like a signed 16-bit shift and then 16-bit mask to clear the nine high bits. The high byte in r25 could be ignored entirely; instead just a clc to clear the carry bit before the ror r24.

The plain n >> 1 is simpler: just 16-bit zero-shift. Then storing the low byte.

So to implement your compression scheme, just do the shifts and adds, then assign. Or write inline assembly.

What exactly did you test?
Do you understand the difference between type cast, conversion and coercion?
C++ allows for close conversion types.

I tested exactly what you claimed:

... and the test returned the value's low byte, not it's high byte.

As @Deva_Rishi pointed out, the result would have been exactly the same had the processor used the Big Endian convention. That's how casting works.

A shift-right-2 in all circumstances makes low values very granular; afterwards 4, 5, 6, & 7 all look the same.
I adopt a variable shift, to preserve about 7 bits of significance. This shifts to the right ...:

  • low(L) values (<256) by 1 (max error of 1)
  • medium(M) values (256 to 511) by 2 [then add 64] (max error of 2)
  • high(H) values (512 to 1023) by 3 [then add 128] (max error of 4)
    That gives 256 encoded values; the extra 64 & 128 are subtracted to decode, and adding 0(L), 2(M) or 4(H) to 'round to middle-ish of range'.
    My unit testing checks all 1024 inputs against all 256 encoded values, both encoding and decoding.

How do you keep track of the shift applied? That requires additional memory, which defeats the purpose.

The 1st shift line is to show how the AVR assembler code for a 16 bit shift.
The 2nd & 3rd lines show (or SHOULD show) the AVR assembler code for an 8 bit shift - BUT IT LOOKS LIKE THE GENERATED CODE IS FAULTY.
BTW that casting to 8 bits only happens when 16 bit value is < 256; save time by chopping off 8 top zero bits!

Some more experimentation, and I THINK i have the answer:

  • The generated AVR machine code ALWAYS DOES A 16BIT SHIFT - even for 8 bit unsigned operand! And similar unnecessary code assigning 16bit to 8bit!
    Note that n8 is 'volatile uint8_t', and n16 is renaming my original 'volatile uint16_t n':
.../UnexpectedBehaviour/StrangeShiftRightCode/StrangeShiftRightCode.ino:12
  n8 = n16;
 212:	80 91 02 01 	lds	r24, 0x0102	; 0x800102 <n16>
 216:	90 91 03 01 	lds	r25, 0x0103	; 0x800103 <n16+0x1>
 21a:	80 93 00 01 	sts	0x0100, r24	; 0x800100 <_edata>
.../UnexpectedBehaviour/StrangeShiftRightCode/StrangeShiftRightCode.ino:13
  res = n8 >> 1;
 21e:	80 91 00 01 	lds	r24, 0x0100	; 0x800100 <_edata>
 222:	90 e0       	ldi	r25, 0x00	; 0
 224:	95 95       	asr	r25
 226:	87 95       	ror	r24
 228:	80 93 01 01 	sts	0x0101, r24	; 0x800101 <res>

PS How do I raise this as a bug in the compiler?

It does not appear to be one. It would be a "bug" if running the resultant object produced behavior that violated the C++ language standard. Does it?

That doesn't explain why a shift right of an 8bit unsigned number (whether high or low part of 16bit is irrelevant) is generating code to do a 16bit shift.
Subsequent simpler(!) testing shows that even a shift right of a standard uint8_t (no casting) generates code for a 16bit shift cf my comment at post#14.

Hmm. Maybe an 'unnecessary inefficiency' then?

"Opportunity for Optimization". Which makes me wonder if there's maybe already a compiler flag ...

Something like this? (sort of logarithmic)

void setup() 
{
  Serial.begin(115200);
  Serial.println(__FILE__);

  for (int i = 0; i < 1023; i+=4)
  {
    int value;
    if (i < 256) value = i / 2;
    else if (i < 512) value = 128 + (i-256)/4;
    else value = 192 + (i-512)/8;
    Serial.print(i);
    Serial.print("\t");
    Serial.println(value);
  }
}

void loop()
{
}

Yes indeed. I reduce the arithmetic with [value = base + (i - offset)/N] by combining base & offset with a bit of algebra; maybe an optimiser will do that.
My actual code is

    if(number0to1023 > 0x1FF){ return (number0to1023 >> 3) + 0x80; }
    if(number0to1023 > 0xFF){ return (number0to1023 >> 2) + 0x40; }
    return ((uint8_t)number0to1023) >> 1;

It's the generated code for that last line that had me scratching my head; I tried to do an 8bit shift (number is < 256) but AVR code is 16bit.
(My earlier comment was on phone w/o access to code)