signed-unsigned char help

Reading about C, I've come to chars. The author uses

 char c = 'A';

Then when I go through the Arduino ASCII example, the code also does the same.
They both reference signed chars. Is this just because we generally don't use the other characters?
And if so, if I wanted to use the other characters from the set, would I need to use unsigned char?
I see in the output that the binary numbers for the standard characters are longer when using unsigned, which I assume would create larger code if I were to use unsigned for standard text?

Are my assumptions correct?

The 128 ASCII characters cover the positive range of the signed char type, i.e. 0..127.

I can see that, but there are 255 ASCII characters. I notice that if I output from -127 to 128, I get the other half of the table first, as decimal 65408 through 65535, i.e. the larger binary numbers.
That answer has no meaning to my question, sorry.

(Sorry, the end of my question should've said signed, not unsigned.)
See the confusion! I meant: if I wanted to use the non-standard chars, should I then change to unsigned, to reduce code size?

And if I go unsigned from 0 to 255, I get the non-standard characters after the standard ones :confused: with smaller binary numbers.

I can see that, but there are 255 ASCII characters

That statement has no meaning in reality, sorry.

Well, if I print all 255 characters, I see 255 different characters? :s

Checked Wikipedia; now I'm really lost, and I have no intention of becoming a scientist.

I don't understand. Yes, Wikipedia references 7 bits, but we have 8 bits?
And all I want to know is: if I want to use those other 128 characters, which apparently aren't part of ASCII, should I then use unsigned?

The ASCII definition is for 7-bit characters, not 8-bit characters. That dates back to when everyone worked on a serial terminal running at 300 baud, 7 bits, with parity (7-E-1). As communication media became more reliable, parity was dropped and the 8th bit became available for defining extra characters.

There is no single standard which states what those "extra" characters should be. With the release of DOS 3.3, IBM introduced the concept of the "code page", where the user could select the character set for the upper portion of the 8-bit range. Different code pages had different combinations of accented characters to allow better localization in different countries.

Both signed and unsigned char values cover the same 8 bits - the only difference comes when you are performing maths or displaying numeric values. When you're dealing with characters it actually makes no difference which you use, as they are simply a human representation of the underlying 8-bit value.
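For example, in this minimal sketch (the output comments assume an 8-bit AVR board such as the Uno) the two variables hold the exact same bit pattern; only the numeric display differs:

    void setup() {
      Serial.begin(9600);

      signed char   s = -75;    // stored bit pattern: 10110101
      unsigned char u = 0xB5;   // exactly the same bit pattern, read as 181

      Serial.write(s);          // the identical byte goes out either way
      Serial.write(u);
      Serial.println();

      Serial.println(s);        // displayed as a number: -75
      Serial.println(u);        // displayed as a number: 181
    }

    void loop() {}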

Thank you, I do remember mucking about with the extended sets back in those days!
But in the extended set I see here, when using signed, the little TM symbol (dec -103) has a binary number of:
11111111111111111111111110011001

Both signed and unsigned char values cover the same 8 bits

That looks like 32 bits to me; I'm missing something, I don't know what!

What you're missing is something known as "sign extension".

When you print a number using Serial.print(), that value is cast to a long value (32 bits) before being printed. If that value happens to be negative, then bits 31-8 are filled with 1s to "extend" the sign bit all the way to the left, keeping the numeric value the same (also see "two's complement" for how the sign bit works).
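You can see both behaviours side by side in a minimal sketch (reusing the -103 from your TM example; comments show the output on an Uno):

    void setup() {
      Serial.begin(9600);

      char c = -103;   // stored as the 8-bit pattern 10011001

      // Promoting to a wider signed type copies the sign bit into bits 31..8:
      Serial.println((long)c, BIN);           // 11111111111111111111111110011001

      // Going through unsigned char first keeps just the original 8 bits:
      Serial.println((unsigned char)c, BIN);  // 10011001
    }

    void loop() {}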

Well damn, I must remember not to play around with the examples and just run them as they are; that's what's confused me.
The author gets into casting and two's complement further on; I'm getting ahead of myself. I thought my question was pretty straightforward, but I guess I will get the answer further into the book.
Thanks for hanging in there, majenko :wink:

Maybe this will help. The Arduino Uno has no concept of characters. It only knows memory locations and the values located there. So when you compile a program, the compiler takes the characters you type and generates machine code that that specific microprocessor can understand. What you type in as a program can be in English, Chinese, Japanese or whatever. The processor never sees that, as it gets converted before it gets sent.

So let's say you store a character like "A" in your Arduino. Its ASCII representation is dec: 65, hex: 41, oct: 101, bin: 1000001. That's what's stored, and as long as you send it and retrieve it with the same code page, that's what you'll get back. If you send it in ASCII and retrieve it with something else, you may get "A" or ">" or something I can't type.
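You can watch that same stored byte come back out in each of those forms; a minimal sketch:

    void setup() {
      Serial.begin(9600);

      char c = 'A';            // one byte holding the value 65

      Serial.println(c);       // as a character: A
      Serial.println(c, DEC);  // 65
      Serial.println(c, HEX);  // 41
      Serial.println(c, OCT);  // 101
      Serial.println(c, BIN);  // 1000001
    }

    void loop() {}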

Thanks Herbie for the thought, but no, it's the extended set I'm interested in; the ASCII set is the same whether signed or unsigned, it's the extended set that changes relatively. And basically, I wondered whether storing signed extended chars needs more memory (32 bits) than unsigned, which are 8 bits, therefore resulting in larger code.
Like I said, I think I'm getting ahead of myself. I've wasted half a day pondering this, so I'm gonna get back into the book.
Thanks

but no, it's the extended set I'm interested in

For what purpose?

Displaying to an LCD later, when I get that far.

Displaying to an LCD later, when I get that far.

Have you verified that your LCD knows how to display the non-standard ASCII characters? I would NOT expect it to be able to.

Oh OK, well thanks for that, Paul; I will investigate it tomorrow. It has some fancy-looking shield on it, but if that be the case, thanks very much indeed for pointing that out. It will save me some valuable time, I imagine. :slight_smile:

googs: I think you'll find PaulS is right: your LCD will not recognize an extended character set. You keep talking about a 16-bit character set, which is the Unicode character set. While there is probably some LCD device out there that supports the Unicode character set, I don't know what it is. You might try googling "Unicode" and Arduino and see what pops up.

It might display 256 characters, that's certainly possible.

On this page I have a graphical LCD which displays 256 characters, and the representation for all of them.
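If you want to see which glyphs your own module's character ROM holds, a probe sketch along these lines will walk the whole range (the pin numbers are only an assumption, copied from the stock LiquidCrystal example - adjust them to your shield):

    #include <LiquidCrystal.h>

    // Pin wiring taken from the stock LiquidCrystal example - an
    // assumption here, so change it to match your board or shield.
    LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

    void setup() {
      lcd.begin(16, 2);
    }

    void loop() {
      for (int code = 0; code <= 255; code++) {
        lcd.clear();
        lcd.print(code);            // the code, as a decimal number
        lcd.setCursor(0, 1);
        lcd.write((uint8_t)code);   // the glyph the ROM maps to that code
        delay(500);
      }
    }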

This is a pretty big topic; I would Google "code page" or something like that. In essence you are correct that to reasonably represent the numbers 0 to 255 you need to use unsigned characters.
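One classic trap when you do: an unsigned char can never exceed 255, so a <= 255 loop test on one never fails. A minimal sketch of the safe idiom:

    void setup() {
      Serial.begin(9600);

      // Wrong: an unsigned char wraps from 255 back to 0, so this
      // condition is always true and the loop never terminates:
      //   for (unsigned char c = 0; c <= 255; c++) { ... }

      // Right: count with a wider type, narrow at the point of use.
      for (int i = 0; i <= 255; i++) {
        unsigned char c = (unsigned char)i;  // always 0..255, never negative
        Serial.write(c);                     // send the raw byte value
      }
    }

    void loop() {}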

When C was designed back at AT&T, only signed chars were supported, since the original hardware (the PDP-8 and then the PDP-11) only supported signed 8-bit types. In fact, unsigned came in much later. At the time, C was only used with 7-bit US ASCII, so for printable characters it didn't matter whether char was signed or unsigned. As C got ported to different platforms, some of which used EBCDIC in those days (IBM mainframes) and some of which preferred unsigned 8-bit quantities (Data General, PowerPC, and IBM mainframes even when ASCII was used), the language mutated so that char could hold either -128..127 or 0..255, at the vendor's discretion. During the original ANSI C standards proceedings this became an issue, and we added the 'signed' keyword to complement 'unsigned', and both became modifiers. During the proceedings we called char without a modifier "plain char" (or sometimes "don't char").
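That history is still visible today: plain char, signed char, and unsigned char are three distinct types. A small sketch (the plain-char comment assumes an AVR target, where plain char happens to be signed):

    void setup() {
      Serial.begin(9600);

      char          p = 'A';   // "plain" char: signedness is the vendor's choice
      signed char   s = -103;  // always -128..127
      unsigned char u = 153;   // always 0..255; same bit pattern as s

      Serial.println(p);       // A
      Serial.println(s);       // -103
      Serial.println(u);       // 153
    }

    void loop() {}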

At one point, I recall Dennis Ritchie saying that if he could go back in time, one change he would have made to C would have been to make 'char' unsigned from the start, since the world is more than just the US.

since the original hardware (the PDP-8 and then the PDP-11) only supported signed 8-bit types.

My recollection is that the PDP-8 supported signed 12-bit types (the width of the accumulator and the memory width), but it was a very long time ago.

AWOL:

since the original hardware (the PDP-8 and then the PDP-11) only supported signed 8-bit types.

My recollection is that the PDP-8 supported signed 12-bit types (the width of the accumulator and the memory width), but it was a very long time ago.

Whoops, I was misremembering the original C machine. It was the PDP-7, not the PDP-8, which, as you say, is rather different. Sorry about that.

Speaking of the 8: when I was at Data General in the 1980s, some engineers started getting home computers and felt the need to call them 'real' computers, comparing them to the minicomputers we worked on (and of course we minicomputer guys felt compelled to call our systems real systems as well, compared to the mainframe). The guy doing the common code generator for the Data General compilers (including the C compiler front end I wrote) had an interesting definition of a real computer. He said to somebody: OK, put your real home computer on top of his (PDP-8), and it would work fine. Then he would put his computer on top of yours, and the Altair, TRS-80, original IBM PC, etc. would not run afterwards, as it would be crushed to bits. :roll_eyes: