Topic: The char variable

sparkylabs

I have just been reading about the character variable. Reading between the lines, this was a roundabout way of saying that it is a one-byte variable that contains the ASCII code of whatever character is put into it. So how can this variable hold negative as well as positive numbers? This does not make sense: the ASCII code goes from 0 to 255, so why do you even need negative numbers in this variable type?
My shop: www.sparkylabs.co.uk/shop

marco_c

The char is equivalent to an unsigned byte or uint8_t. They take up the same amount of space (8 bits). Whether something is negative or positive depends on the interpretation of the bits, so 0xff can be 255 or -1. Look up 'two's complement' to find out how this works.
Arduino libraries http://arduinocode.codeplex.com
Parola for Arduino http://parola.codeplex.com

Arrch


Quote
I have just been reading about the character variable. Reading between the lines, this was a roundabout way of saying that it is a one-byte variable that contains the ASCII code of whatever character is put into it. So how can this variable hold negative as well as positive numbers? This does not make sense: the ASCII code goes from 0 to 255, so why do you even need negative numbers in this variable type?

It just depends on the context. Signed and unsigned variables are interpreted differently, but they are stored the same way in memory.
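A minimal sketch of that point in plain C++, which should build with a desktop compiler as well as avr-gcc. The helper names `asSigned` and `asUnsigned` are made up for illustration, and the code assumes a two's-complement machine (which the AVR is).

```cpp
#include <string.h>

// Hypothetical helper: copy an 8-bit pattern into a signed one-byte
// variable, then widen it to int so the value is easy to inspect.
int asSigned(unsigned char bits) {
    signed char s;
    memcpy(&s, &bits, 1);  // same bits in memory, now read as signed
    return s;
}

// Hypothetical helper: the same 8-bit pattern read as unsigned.
int asUnsigned(unsigned char bits) {
    return bits;
}
```

With these, `asSigned(0xFF)` gives -1 and `asUnsigned(0xFF)` gives 255, while a plain ASCII letter such as 0x41 ('A') gives 65 either way.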

Nick Gammon


Quote
The char is equivalent to an unsigned byte or uint8_t.


It's actually a signed byte. It is "byte" that is unsigned.

Quote
This does not make sense: the ASCII code goes from 0 to 255, so why do you even need negative numbers in this variable type?


It's a bit of a historical throwback, I think. It doesn't make sense to have "negative" letters, so really the byte (unsigned char) would have been a better choice. But as the others said, when stuffing ASCII codes into a byte (or char) you don't care about the sign because you won't be doing arithmetic on it.
Please post technical questions on the forum, not by personal message. Thanks!

More info:
http://www.gammon.com.au/electronics

marco_c

#4
May 23, 2012, 03:05 am Last Edit: May 23, 2012, 03:07 am by marco_c
Oops!

Actually, a related question from me: how would Arduino support double-byte or Unicode characters now that the interface supports multiple languages? Or is it just not an issue in this environment?

Nick Gammon

I suppose you could put Unicode into an int or long, not sure why you would want to. You would need a suitable display device for there to be much point. Probably I would use UTF-8 if I had to support Unicode - bearing in mind how short we are of RAM.

marco_c

My thinking was around serial characters to the console display in the IDE.

LCD displays would have special characters anyway for non-roman characters. Haven't used a screen 'display' as such, so don't know what drives those.

Jack Christensen

One reason a negative character is desirable is to indicate some condition, error, etc. For example, see the Serial.read() function. When called, it reads the next character available. But what if there are no characters to read? Then it returns -1.

Actually char variables can be declared either as signed or unsigned. In reality they are just a short int. Whether signed or unsigned is the default, and how long a "short" is, is installation dependent.

Code:
//declare some chars
unsigned char a;
signed char b;
MCP79411/12 RTC ... "One Million Ohms" ATtiny kit ... available at http://www.tindie.com/stores/JChristensen/

MarkT


Quote
I suppose you could put Unicode into an int or long, not sure why you would want to. You would need a suitable display device for there to be much point. Probably I would use UTF-8 if I had to support Unicode - bearing in mind how short we are of RAM.


If you were using Kanji, that last statement wouldn't make sense: UTF-8 takes three bytes for most Kanji, so it is less efficient than UTF-16 or other 16-bit encodings.
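The trade-off can be put into numbers with a small sketch. The function names `utf8Bytes` and `utf16Bytes` are hypothetical, and the code assumes it is handed a valid Unicode code point; it only counts bytes and does not encode anything.

```cpp
#include <stdint.h>

// Bytes needed to store one code point in UTF-8.
// Assumes cp is a valid code point (at most 0x10FFFF).
int utf8Bytes(uint32_t cp) {
    if (cp < 0x80)    return 1;  // ASCII range
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;
}

// Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair.
int utf16Bytes(uint32_t cp) {
    return (cp < 0x10000) ? 2 : 4;
}
```

For 'A' (U+0041) UTF-8 needs 1 byte against 2 for UTF-16, while for a Kanji such as U+6F22 it needs 3 against 2. Which encoding saves RAM therefore depends on the text being stored.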
[ I won't respond to messages, use the forum please ]

Nick Gammon

Ah well, horses for courses. I am not in fact using Kanji.

AWOL

#10
May 23, 2012, 09:02 am Last Edit: May 23, 2012, 10:31 am by AWOL
Quote
Actually char variables can be declared either as signed or unsigned. In reality they are just a short int

First part true - in fact, you can tell the compiler whether you want an unqualified "char" to be signed or unsigned.
Second part, false - a "short int" is not the same as "char" on the Arduino.
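Both statements can be checked at compile time. This is only a sketch: `plainCharIsSigned` is a made-up name, and the compiler switches mentioned in the comment are GCC's `-funsigned-char` and `-fsigned-char` options.

```cpp
#include <limits.h>

// True when an unqualified "char" is signed for this compiler and target.
// With GCC (avr-gcc included) the default can be changed on the command
// line with -funsigned-char or -fsigned-char.
bool plainCharIsSigned() {
    return CHAR_MIN < 0;
}

// char is one byte by definition, while short must hold at least 16 bits,
// so on a machine with 8-bit bytes the two cannot be the same size.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(short) * CHAR_BIT >= 16, "short is at least 16 bits");
static_assert(sizeof(int) >= sizeof(short), "int is at least as wide as short");
```

If any of the `static_assert` lines were false for a given target, the sketch would fail to compile rather than misbehave at run time.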

Quote
One reason a negative character is desirable is to indicate some condition, error, etc. For example, see the Serial.read() function.

The return type of "Serial.read" is "int", which is how it deals with returning -1.
Any characters received with the sign bit set (0x80 to 0xFF) are not sign-extended, so are returned as "int"s in the range 0x0080 to 0x00FF.
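A sketch of why the wider return type matters. `FakeSerial` and `countBytes` are hypothetical stand-ins written for this example; they are not the Arduino core's code and only mimic the "byte as int, or -1 when empty" behaviour described above.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a serial receive buffer.
struct FakeSerial {
    const uint8_t *data;
    size_t length;
    size_t pos;

    // Returns the next byte as 0..255, or -1 when nothing is left.
    int read() {
        if (pos >= length) return -1;
        return data[pos++];  // uint8_t widens to int without sign extension
    }
};

// Counts the received bytes. The result of read() stays in an int until
// the -1 check is done, so a real 0xFF byte is not mistaken for "no data".
size_t countBytes(FakeSerial &port) {
    size_t n = 0;
    for (int c = port.read(); c != -1; c = port.read()) {
        ++n;
    }
    return n;
}
```

Had the result been stored in a signed char before the comparison, a received 0xFF would have become -1 and ended the loop early.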
"Pete, it's a fool looks for logic in the chambers of the human heart." Ulysses Everett McGill.
Do not send technical questions via personal messaging - they will be ignored.

michael_x

Quote
tell the compiler whether you want an unqualified "char" to be signed or unsigned

My only attempt (defining a "signed byte" variable) failed, so I assumed there is no "signed" qualifier in Arduino (or avr-gcc), and that char, int and long are signed by default, whether or not that makes sense for a character stored in a char variable.

I understand Serial Monitor is a Java application (where chars are 16-bit Unicode by default), so one would have to check carefully, in both directions and possibly allowing for OS dependencies, how it behaves with non-ASCII characters.
There's no such thing as code pages on Arduino (defining the meaning of a character).
It is a common convenience simply to assume that a char in the range 1...127 represents an ASCII character.

AWOL

Quote
My only try ( defining a "signed byte variable" ) failed,

Not surprising, really, because
Code:
typedef uint8_t byte;

However, nothing at all wrong with
Code:
signed char variable;

Nick Gammon


Quote
so I imagined there is no "signed" qualifier in Arduino ( or avr-gcc )


Try again:

Code:
void setup ()
{
  signed int foo;     // explicitly signed int
  signed long bar;    // explicitly signed long
  signed char fubar;  // explicitly signed char
}

void loop () {}


Gives:

Code:
Binary sketch size: 466 bytes (of a 32256 byte maximum)

Jack Christensen


Quote
Second part, false - a "short int" is not the same as "char" on the Arduino.


Thanks for that, not sure why I thought that. I was actually only referring to the number of bits, but was still wrong: on the Arduino, short ints and ints are both 16 bits. This is of course consistent with the standard, which I believe says only that an int must have at least as many bits as a short int.

Quote
The return type of "Serial.read" is "int", which is how it deals with returning -1.
Any characters received with the sign bit set (0x80 to 0xFF) are not sign-extended, so are returned as "int"s in the range 0x0080 to 0x00FF.


That is as I understood. I wasn't thinking specifically Arduino though, more back in the day when ASCII only had 128 characters and Pluto was still a planet ;-)