How many bits is the char data type?

Would someone please try the sketch below? I must be missing something: the variable x is acting as though it were a signed 16-bit integer. I thought char was a signed 8-bit integer.

//Arduino 1.0.5, Arduino Uno.

void setup(void)
{
    Serial.begin(115200);
    char x = 0;
//    int8_t x = 0;    //gives same results as char
//    uint8_t x = 0;   //works as expected
    
    for (long i=0; i<32800; i++) {
        Serial.print(x, DEC);
        Serial.println();
        ++x;
    }
}

void loop(void)
{
}

The output I see:

0
1
2
3
...
32765
32766
32767
-32768
-32767
-32766

Just did the test and got the same result. I then added Serial.print(x); and the output of Serial.print(x, DEC); stays between -128 and 127.

My code that reveals the cause:

void setup(void)
{
    Serial.begin(115200);
    volatile char x=0;
    char y=0;
    for (long i=0; i<512; i++) {
        Serial.print(x, DEC);
        Serial.print('\t');
        Serial.print(y, DEC);
        Serial.println();
        ++x;
        ++y;
    }
}

void loop(void)
{
}

Some output:

0	0
1	1
2	2
...
126	126
127	127
-128	128
-127	129
-126	130
...
-2	254
-1	255
0	256
1	257
2	258

Same result if I do (int)y before sending it to print.
I suspect a compiler optimization bug. BTW, I read the definitions of print. There is no print(char c, int base). The closest is print(unsigned char, int).
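The assembly posted later in the thread calls `_ZN5Print5printEii`, which demangles to `Print::print(int, int)`. A minimal host-side sketch suggests why, using a hypothetical `MockPrint` with just the two overloads mentioned here (it is not the real Arduino `Print` class): passing a `char` where an `int` is expected is an integral promotion, which overload resolution ranks above the integral conversion to `unsigned char`.

```cpp
// Hypothetical stand-in for two Print::print overloads; NOT the Arduino class.
// Each overload returns a tag so we can see which one the compiler picked.
struct MockPrint {
    constexpr int print(unsigned char, int) const { return 1; }  // byte-style overload
    constexpr int print(int, int) const { return 2; }            // int overload
};

constexpr int DEC_BASE = 10;  // stands in for Arduino's DEC (value 10)

// char -> int is an integral promotion; char -> unsigned char is a conversion.
// Promotion ranks higher, so the int overload is selected.
constexpr int which_overload_for_char() {
    return MockPrint{}.print(static_cast<char>(65), DEC_BASE);
}

// An unsigned char argument matches its own overload exactly.
constexpr int which_overload_for_uchar() {
    return MockPrint{}.print(static_cast<unsigned char>(65), DEC_BASE);
}
```

Because the int overload is chosen, whatever value the compiler holds for x is printed as a full int, which is why the output can exceed 127.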

@liudr, thanks for the test; at least it's not just me :wink: Sounds like a good theory. I wonder if all char or int8_t variables are taking up twice as much storage as people think.

I tried it on a few different releases. On Arduino 0022, it works as expected:

0
1
2
3
...
125
126
127
-128
-127
-126

In a variation of the code, I printed the addresses of two char array elements, and they differ by 1, as expected. But the result is the same as with two char variables. I don't have 0022 anymore. You should report this as a bug.

I am reading a book about programming Arduino, and there is a chart with all of the integer types; char is listed as 8-bit.

Could it be that the compiler is using Unicode for the character set?

econjack:
Could it be that the compiler is using Unicode for the character set?

I don't think so. The char array element addresses I printed tell me it's 8-bit.
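In C and C++, `sizeof(char)` is 1 by definition, and `CHAR_BIT` from `<climits>` gives the number of bits in that byte (8 on AVR and on common desktop compilers). A host-side C++ sketch (not an Arduino sketch; the helper names are hypothetical) of the address check described above, comparing a char array with an int16_t array:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>

// sizeof(char) is 1 by definition; CHAR_BIT is the number of bits in that byte.
constexpr std::size_t char_size = sizeof(char);
constexpr int char_bits = CHAR_BIT;

// Range of the exact-width signed 8-bit type (provided only when CHAR_BIT == 8).
constexpr int int8_min = std::numeric_limits<std::int8_t>::min();
constexpr int int8_max = std::numeric_limits<std::int8_t>::max();

// Byte distance between two adjacent array elements, similar to printing the
// addresses of two array elements and subtracting them.
template <typename T>
std::ptrdiff_t byte_distance_of_adjacent_elements() {
    static T arr[2] = {};
    const char* first = reinterpret_cast<const char*>(&arr[0]);
    const char* second = reinterpret_cast<const char*>(&arr[1]);
    return second - first;  // equals sizeof(T)
}

// Quick self-check of the claims above; an assert fires if any fails.
inline void check_char_layout() {
    static_assert(char_size == 1, "sizeof(char) is always 1");
    assert(byte_distance_of_adjacent_elements<char>() == 1);
    assert(byte_distance_of_adjacent_elements<std::int16_t>() == 2);
}
```

An address difference of 1 therefore confirms the storage is one byte; it says nothing about how wide the value is while the compiler holds it in registers.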

liudr:
Same result if I do (int)y before sending it to print.
I suspect a compiler optimization bug. BTW, I read the definitions of print. There is no print(char c, int base). The closest is print(unsigned char, int).

Umm, technically, it is a program error, and the program's behavior is undefined because the value overflows. The ISO standard defines the behavior for unsigned types (i.e., the arithmetic is done modulo 2^N), but the behavior is undefined if a signed type overflows.
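The distinction can be sketched in plain host C++ (not run on an AVR; the helper names are hypothetical). Unsigned arithmetic wraps modulo 2^N, so incrementing a `uint8_t` that holds 255 gives 0. One caveat, hedged: for a `char`, `++x` computes `x + 1` in `int` after integral promotion, so no `int` overflow occurs there; converting the out-of-range result back to a signed 8-bit type is implementation-defined before C++20 (GCC documents reduction modulo 2^N) and modular from C++20 on. Genuinely undefined signed overflow is something like `INT_MAX + 1` in `int` itself.

```cpp
#include <climits>
#include <cstdint>

// Unsigned increment: defined to wrap modulo 2^8, so 255 + 1 -> 0.
constexpr std::uint8_t next_u8(std::uint8_t v) {
    return static_cast<std::uint8_t>(v + 1);  // v + 1 is computed in int, then reduced mod 256
}

// Signed 8-bit increment done the way ++x is specified for char:
// promote to int (no int overflow is possible here), then narrow back.
// Narrowing 128 is implementation-defined before C++20 and modular from
// C++20 on; GCC and Clang both produce -128.
constexpr std::int8_t next_i8(std::int8_t v) {
    return static_cast<std::int8_t>(v + 1);
}

// By contrast, INT_MAX + 1 in int arithmetic is undefined behavior,
// so code that must not overflow has to check before incrementing.
constexpr bool int_increment_is_safe(int v) {
    return v < INT_MAX;
}
```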

The generated code is below:

000000c0 <setup>:
  c0:	cf 93       	push	r28
  c2:	df 93       	push	r29
  c4:	88 e9       	ldi	r24, 0x98	; 152
  c6:	91 e0       	ldi	r25, 0x01	; 1
  c8:	40 e0       	ldi	r20, 0x00	; 0
  ca:	52 ec       	ldi	r21, 0xC2	; 194
  cc:	61 e0       	ldi	r22, 0x01	; 1
  ce:	70 e0       	ldi	r23, 0x00	; 0
  d0:	0e 94 09 01 	call	0x212	; 0x212 <_ZN14HardwareSerial5beginEm>
  d4:	c0 e0       	ldi	r28, 0x00	; 0
  d6:	d0 e0       	ldi	r29, 0x00	; 0
  d8:	88 e9       	ldi	r24, 0x98	; 152
  da:	91 e0       	ldi	r25, 0x01	; 1
  dc:	be 01       	movw	r22, r28
  de:	4a e0       	ldi	r20, 0x0A	; 10
  e0:	50 e0       	ldi	r21, 0x00	; 0
  e2:	0e 94 a9 03 	call	0x752	; 0x752 <_ZN5Print5printEii>
  e6:	88 e9       	ldi	r24, 0x98	; 152
  e8:	91 e0       	ldi	r25, 0x01	; 1
  ea:	0e 94 c9 02 	call	0x592	; 0x592 <_ZN5Print7printlnEv>
  ee:	21 96       	adiw	r28, 0x01	; 1
  f0:	80 e8       	ldi	r24, 0x80	; 128
  f2:	c0 32       	cpi	r28, 0x20	; 32
  f4:	d8 07       	cpc	r29, r24
  f6:	81 f7       	brne	.-32     	; 0xd8 <setup+0x18>
  f8:	df 91       	pop	r29
  fa:	cf 91       	pop	r28
  fc:	08 95       	ret

As you can see, the value of x is kept in a register pair (R28/R29), and it is incremented with adiw (add immediate to word) at address 0xEE. This seems to me to be the bug. Because the value stays in a 16-bit register pair, it bypasses the normal truncation it would get if it were actually stored back into an 8-bit location.
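A host-side model of what the listing suggests (an interpretation of the listing, not the compiler's actual transformation): x lives in r29:r28, is copied into r23:r22 with `movw` for the 16-bit `print(int, int)`, and the loop ends when r29:r28 equals 0x8020 (= 32800, the loop bound), which suggests the compiler also reused x as the loop counter `i`. The hypothetical functions below compute the printed value after n increments when the value is truncated to 8 bits versus kept in a 16-bit register and read as a signed 16-bit int.

```cpp
// Value after n increments if every store truncates to 8 bits
// (what a char held in memory would contain), read back as signed.
constexpr int value_if_truncated(long n) {
    int low8 = static_cast<int>(n & 0xFF);       // keep only the low byte
    return low8 >= 0x80 ? low8 - 0x100 : low8;   // reinterpret as signed 8-bit
}

// Value after n increments if it stays in a 16-bit register pair (adiw never
// truncates to 8 bits) and is then passed to print(int, int) as a 16-bit int.
constexpr int value_if_kept_in_16_bit_register(long n) {
    int low16 = static_cast<int>(n & 0xFFFF);        // what adiw leaves in r29:r28
    return low16 >= 0x8000 ? low16 - 0x10000 : low16; // reinterpret as signed 16-bit
}
```

The second function reproduces the first post's output: 127 is followed by 128, and 32767 is followed by -32768.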

Here you are, it's not a bug.

In the C programming language, signed integer overflow causes undefined behavior, while unsigned integer overflow causes the number to be reduced modulo a power of two, meaning that unsigned integers "wrap around" on overflow.

Do a search for "c++ signed overflow undefined".

Basically, since it is undefined, the compiler is entitled to generate whatever code it wants.

(edit) Like MichaelMeissner said. :slight_smile:

The solution then is to simply replace:

++x;

With either this:

x = (byte)x + 1;

Or this:

x = (x+1) & 0xFF;

Both result in identical code due to compiler optimisation.
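Both replacements do the addition on a value confined to 0..255 (or mask it there) before storing it back into the char. A host-side sketch, assuming Arduino's `byte` is `uint8_t` (as the Arduino core typedefs it); the helper names are hypothetical:

```cpp
#include <cstdint>

using byte = std::uint8_t;  // assumption: mirrors the Arduino core's typedef

// Mirrors "x = (byte)x + 1;": view x as 0..255, add one in int, narrow back.
// Narrowing 128..256 is implementation-defined before C++20; GCC reduces mod 256.
constexpr std::int8_t wrap_increment_via_byte(std::int8_t x) {
    return static_cast<std::int8_t>(static_cast<byte>(x) + 1);
}

// Mirrors "x = (x+1) & 0xFF;": keep only the low 8 bits, then narrow back.
constexpr std::int8_t wrap_increment_via_mask(std::int8_t x) {
    return static_cast<std::int8_t>((x + 1) & 0xFF);
}

// Steps through a full cycle of 256 increments, checking that both helpers
// agree at every step and that the value returns to where it started.
constexpr bool helpers_agree_over_full_cycle() {
    std::int8_t a = 0, b = 0;
    for (int i = 0; i < 256; ++i) {
        a = wrap_increment_via_byte(a);
        b = wrap_increment_via_mask(b);
        if (a != b) return false;
    }
    return a == 0;
}
```

Either form makes the 8-bit wrap explicit in the source, so the optimizer can no longer assume the value keeps growing.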

Thanks, everyone! Wow, very interesting. I found that declaring the variable as volatile makes it behave as I expected. In case you're wondering, I was just trying to demonstrate for a friend what happens when signed and unsigned integers overflow. I expected the signed value to wrap from the maximum value (127) to the minimum value (-128). Kind of funny, as I was ad-libbing at the time and of course got totally confused. I continue to be amazed at the optimization this compiler will do.

While I certainly cannot argue that the observed behaviour does not fit the definition of "undefined" :wink: I never would have expected "We'll promote your variable from 8 bits to 16, and continue to increment it, but when the 16 bits overflows, then we'll just let it go from the maximum value to the minimum value." The joke is certainly on me, hahaha :smiley:

An earlier thread on the same subject

AWOL:
An earlier thread on the same subject

Thanks. I'd done some searching but didn't find that thread. I agree that from a purely theoretical viewpoint of the language, the behaviour is somewhat surprising. But given the specific implementation, and considering the compiler optimizations and hardware characteristics (instruction set), "undefined" in this case just turns out to be this really weird thing. I assumed that I knew what was going to happen, and I did not realize that I was treading into "undefined" territory. A couple of lessons there, for sure!

Nick,

Thanks for showing the assembly. It's clear the compiler didn't do any truncation on the registers. Does the ATmega328 not have an "inc" or "dec" instruction for such simple and often-needed incrementing and decrementing by 1?

Could you also demonstrate how the volatile keyword makes different assembled code? That would be great!

Starting at address 0x40, ldd loads a byte from SRAM (where x lives because it is volatile), subi adds one (by subtracting -1; welcome to RISC!), and std stores the result back into SRAM.

00000000 <setup>:
   0:   0f 93           push    r16
   2:   1f 93           push    r17
   4:   df 93           push    r29
   6:   cf 93           push    r28
   8:   0f 92           push    r0
   a:   cd b7           in      r28, 0x3d       ; 61
   c:   de b7           in      r29, 0x3e       ; 62
   e:   80 e0           ldi     r24, 0x00       ; 0
  10:   90 e0           ldi     r25, 0x00       ; 0
  12:   40 e0           ldi     r20, 0x00       ; 0
  14:   52 ec           ldi     r21, 0xC2       ; 194
  16:   61 e0           ldi     r22, 0x01       ; 1
  18:   70 e0           ldi     r23, 0x00       ; 0
  1a:   0e 94 00 00     call    0       ; 0x0 <setup>
  1e:   19 82           std     Y+1, r1 ; 0x01
  20:   00 e0           ldi     r16, 0x00       ; 0
  22:   10 e0           ldi     r17, 0x00       ; 0
  24:   69 81           ldd     r22, Y+1        ; 0x01
  26:   77 27           eor     r23, r23
  28:   67 fd           sbrc    r22, 7
  2a:   70 95           com     r23
  2c:   80 e0           ldi     r24, 0x00       ; 0
  2e:   90 e0           ldi     r25, 0x00       ; 0
  30:   4a e0           ldi     r20, 0x0A       ; 10
  32:   50 e0           ldi     r21, 0x00       ; 0
  34:   0e 94 00 00     call    0       ; 0x0 <setup>
  38:   80 e0           ldi     r24, 0x00       ; 0
  3a:   90 e0           ldi     r25, 0x00       ; 0
  3c:   0e 94 00 00     call    0       ; 0x0 <setup>
  40:   89 81           ldd     r24, Y+1        ; 0x01
  42:   8f 5f           subi    r24, 0xFF       ; 255
  44:   89 83           std     Y+1, r24        ; 0x01
  46:   0f 5f           subi    r16, 0xFF       ; 255
  48:   1f 4f           sbci    r17, 0xFF       ; 255
  4a:   81 e0           ldi     r24, 0x01       ; 1
  4c:   0c 32           cpi     r16, 0x2C       ; 44
  4e:   18 07           cpc     r17, r24
  50:   01 f4           brne    .+0             ; 0x52 <setup+0x52>
  52:   0f 90           pop     r0
  54:   cf 91           pop     r28
  56:   df 91           pop     r29
  58:   1f 91           pop     r17
  5a:   0f 91           pop     r16
  5c:   08 95           ret

void setup(void)
{
    Serial.begin(115200);
    volatile char x = 0;
    
    for (long i=0; i<300; i++) {
        Serial.print(x, DEC);
        Serial.println();
        ++x;
    }
}

void loop(void)
{
}
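Two idioms in the listing above, modeled in plain C++ assuming 8-bit register arithmetic (the helper names are hypothetical). AVR has no add-immediate instruction for 8-bit registers, so `subi r24, 0xFF` adds one because subtracting 0xFF modulo 256 equals adding 1 (the loop counter in r17:r16 uses the same trick with subi/sbci). The `eor r23,r23` / `sbrc r22,7` / `com r23` sequence at 0x26 sign-extends the 8-bit char in r22 into the 16-bit int in r23:r22 that `print(int, int)` expects, which is where the truncation to -128..127 happens in the volatile version.

```cpp
#include <cstdint>

// subi r, imm: r = r - imm (mod 256). With imm = 0xFF this equals r + 1 (mod 256).
constexpr std::uint8_t avr_subi(std::uint8_t reg, std::uint8_t imm) {
    return static_cast<std::uint8_t>(reg - imm);  // computed in int, reduced mod 256
}

// eor r23,r23 ; sbrc r22,7 ; com r23
// The high byte starts at 0 and is complemented to 0xFF if bit 7 of the low byte is set.
constexpr std::uint8_t avr_sign_extend_high_byte(std::uint8_t low) {
    std::uint8_t high = 0;                         // eor r23, r23
    if (low & 0x80) {                              // sbrc skips com when bit 7 is clear
        high = static_cast<std::uint8_t>(~high);   // com r23
    }
    return high;
}

// Combine r23:r22 into the signed 16-bit value that print(int, int) receives,
// without relying on implementation-defined narrowing.
constexpr int as_int16(std::uint8_t high, std::uint8_t low) {
    int word = (high << 8) | low;                  // 0..65535
    return word >= 0x8000 ? word - 0x10000 : word;
}
```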

That is illuminating! Thanks, Jack. I never learned assembly for any RISC system, just 386 assembly. When I used assembly, I tended to keep things in registers if I could. I read disassembled Turbo C code back in the '90s. It moved data between memory and registers so much that I couldn't stop laughing :wink:

liudr:
That is illuminating! Thanks, Jack. I never learned assembly for any RISC system, just 386 assembly. When I used assembly, I tended to keep things in registers if I could. I read disassembled Turbo C code back in the '90s. It moved data between memory and registers so much that I couldn't stop laughing :wink:

Yeah, same here. I've done a fair amount of assembler in the past on various CISC machines, but never on a RISC machine. So I'm just feeling my way along the walls in the dark here :smiley:

My signature used to be rep movsd; //do it
For anyone who programmed the 386 in assembly under real mode: this doubles the data transfer rate by moving 32 bits per operation. It works great if you're coding a 320*200, 256-color mode game and want to copy your buffer onto the video card. When I was playing with the 32-bit stuff, I had no access to 32-bit assemblers, just 16-bit ones with real-mode debuggers. So once the CPU entered 32-bit "protected mode", I was running blind. I can't count how many times I had to restart my 486, like every minute.

Pfft, you were spoiled with 386 & VGA with all its fancy 32 bits & 256 colors :smiley: