Save a char as int

Hello!

I'm quite new to arduino and C. I'm reading the Jack Purdum Book on programming C for arduino and I came across a question. I've searched the fora and found some answers, but I want to know the why behind it. That's not clearly stated anywhere I think.

The question is:

  char c;
  int num;

  if(Serial.available() > 0){

    c = Serial.read();
    num = (int) c;
    Serial.println(num);
    Serial.println(c);

    }

When typing 5 for example in the serial monitor, num prints 53 and c prints 5. When

num = (int) (c - '0')

and you print num, it displays 5.

What is it that the - '0' does, that makes it a 5 instead of 53.

My little research found that 53 is the ASCII decimal code for 5. But I don't understand what the - '0' does then.

Thanks!

The character for '0' is ascii 48, so 53-48=?
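To spell the arithmetic out, here is a minimal sketch in plain C++ (the same expression works in an Arduino sketch). The helper name digitValue is made up for illustration; the range check is an addition, not part of the original snippet.

```cpp
// Hypothetical helper, not from the book or the Arduino core.
// Returns 0..9 for the characters '0'..'9', or -1 for anything else.
int digitValue(char c) {
  if (c < '0' || c > '9') {
    return -1;
  }
  // ASCII stores '0'..'9' as the consecutive codes 48..57,
  // so subtracting '0' (48) leaves the distance from '0': the digit itself.
  return c - '0';
}
```

With c holding '5' (code 53), digitValue(c) computes 53 - 48 and returns 5.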

wildbill:
The character for '0' is ascii 48, so 53-48=?

So what it is actually doing is printing an ascii 5 (ENQ), and not the character '5'?

Why does it do that? I always thought the arduino "translates" the ascii into a character. So it receives a character '5', translates it into ascii '53', does what it needs to do, translates it back into a character, and prints that. Why does it print ascii?

Sorry if that's a dumb question, but i'm really trying to understand what is happening.

I think the thing confusing the issue is Serial.print. It treats disparate types differently. If you pass it an int, it'll print a number, it doesn't know or care that that int took its value from a char, it's got 53 in it now so I'll print '53'.

When printing a char, it prints the character represented by the ascii code the char contains. In this case the char holds 53, but I know it's a char so I will print '5'.
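Standard C++ streams behave the same way, which makes this easy to try on a PC. The sketch below is only an analogy, assuming std::ostringstream as a stand-in for Serial; asChar and asInt are made-up helper names.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format the stored value as a character.
std::string asChar(char c) {
  std::ostringstream out;
  out << c;  // char overload: emits the symbol whose code is stored in c
  return out.str();
}

// Hypothetical helper: format the very same stored value as a number.
std::string asInt(char c) {
  std::ostringstream out;
  out << static_cast<int>(c);  // int overload: emits the decimal digits of the code
  return out.str();
}
```

For c = '5' the first helper yields the text "5" and the second yields "53": same byte in memory, different print routine.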

Okay, that makes a whole lot more of sense now.

If I understand correctly, when printing a character, printing something like "53" is impossible, cause a character can only have 1 character, so it "translate" the 53 (to 5 in this case).

And when using an int, 53 printing is certainly possible, and it prints just that.

davidsmol:
Okay, that makes a whole lot more of sense now.

If I understand correctly, when printing a character, printing something like "53" is impossible, cause a character can only have 1 character, so it "translate" the 53 (to 5 in this case).

Yes, as long as you hand wave past the existence of Unicode, which is probably reasonable here.

And when using an int, 53 printing is certainly possible, and it prints just that.

Exactly.

in more details:

The print() function has one name but in reality you have multiple different print functions, each with its own ‘signature’ (in short the parameters it expects).

This means that when you call print() with a given type, the compiler picks the associated function and each function can implement the action the way it wants.

These are declared in Print.h (and implemented in Print.cpp), and here are all the versions of the print() function that are advertised

    size_t print(const __FlashStringHelper *);
    size_t print(const String &);
    size_t print(const char[]);
    size_t print(char);
    size_t print(unsigned char, int = DEC);
    size_t print(int, int = DEC);
    size_t print(unsigned int, int = DEC);
    size_t print(long, int = DEC);
    size_t print(unsigned long, int = DEC);
    size_t print(double, int = 2);
    size_t print(const Printable&);

if you pass a variable to print() and the compiler recognizes one of those types directly, then it calls that function.

that’s what happened with your char type and int type. The programmer of the Print class decided that if the type is char, the coder most likely wants to print a character, so the ASCII symbol associated with the value is sent. If the type is int, the coder most likely wants to print a number, so that function converts the value to its ASCII decimal (or HEX, BIN) representation.
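A stripped-down mock can make the mechanism concrete. This is a sketch under assumptions, not the real Print class: MockPrint, DEC_BASE and HEX_BASE are invented names, only two of the overloads are modelled, and only bases 10 and 16 are handled.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Stand-ins for Arduino's DEC and HEX constants (names are hypothetical).
const int DEC_BASE = 10;
const int HEX_BASE = 16;

// Mock of two print() overloads; output is collected in a string instead of a serial port.
struct MockPrint {
  std::string out;

  // Chosen by the compiler when the argument is a char: emit the byte as-is.
  std::size_t print(char c) {
    out += c;
    return 1;
  }

  // Chosen when the argument is an int: convert the value to digit characters first.
  std::size_t print(int n, int base = DEC_BASE) {
    std::ostringstream text;
    if (base == HEX_BASE) {
      text << std::hex << std::uppercase;
    }
    text << n;  // any base other than HEX_BASE falls back to decimal in this mock
    out += text.str();
    return text.str().size();
  }
};
```

Calling print('5') appends the single character 5, print(53) appends the two characters 5 and 3, and print(53, HEX_BASE) appends 35. Neither function inspects a type at run time; the choice was made by the compiler.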

if you have sharp eyes, you’ll notice there is no float or bool type there but if you call print with such a typed variable, the compiler won’t bark at you.

That’s because the compiler is smart enough to change the type of your variable on the fly to a larger ‘compatible’ format. To do this, the compiler applies well-documented recipes (an implicit conversion sequence, see the C++ rules on implicit conversions) to transform your parameter into something else that could work.
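To see which overload an implicit conversion lands on, one can write a set of dummy overloads that mirror the numeric types in the list above and simply report their own name. The function name chosen is hypothetical and the code is plain C++ rather than the Arduino core.

```cpp
#include <string>

// One dummy overload per numeric type from the print() list; each reports which one was picked.
std::string chosen(char)          { return "char"; }
std::string chosen(unsigned char) { return "unsigned char"; }
std::string chosen(int)           { return "int"; }
std::string chosen(unsigned int)  { return "unsigned int"; }
std::string chosen(long)          { return "long"; }
std::string chosen(unsigned long) { return "unsigned long"; }
std::string chosen(double)        { return "double"; }
```

chosen(true) reports "int" because bool is promoted to int, and chosen(1.5f) reports "double" because float is promoted to double. Promotions rank above other conversions, so in both cases exactly one candidate wins.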

If the compiler cannot find an implicit conversion sequence that singles out one best-matching signature, then the compiler stops and complains. That’s the case for example with the unsigned long long type, for which the list above has no overload: converting it to any of the listed integer types is equally good, so no candidate wins. The compiler would complain that

call of overloaded 'println(long long unsigned int&)' is ambiguous

here is a small code to see this in action

void println_ULL(unsigned long long ull)
{
  Serial.print("0x");
  for (int8_t i = 7; i >= 0; --i) {
    uint8_t b = (ull >> (8 * i));
    if (b < 0x10) Serial.write('0');
    Serial.print(b, HEX);
  }
  Serial.println();
}

void setup() {
  Serial.begin(115200);
  Serial.println();
  bool bT = true;
  Serial.println(bT); // prints 1

  float f = 1.2345;
  Serial.println(f); // prints 1.23
  Serial.println(f, 6); // prints 1.234500

  unsigned long long ull = 0xDEADBEEF0BADF003;
  ull = ull + 10; // maths works
  // Serial.println(ull); // error 'call of overloaded 'println(long long unsigned int&)' is ambiguous'
  println_ULL(ull); // but you can still do your own stuff (0xDEADBEEF0BADF00D)
}

void loop() {}

if you uncomment the line // Serial.println(ull); the compiler will bark at you :)

You just answered my last question, I was wondering how it decides what to do with which type of input. I didn't think of looking at the Print.cpp.

Thanks!

glad if that helped !

1. What is a character?
It is an image on the screen (I am talking about text screen and not graphics screen) of a symbol (a - z, A - Z, 0 - 9, punctuation marks like ! and others, special characters like $ and others) of the alphabet set of the English Language.

2. How can we save the image of the symbol 5 in computer memory?
Because a memory location holds 8-bit data (a combination of 0s and 1s), it has been decided that the bit/binary pattern 00110101 would be stored for the image of 5. Why this particular pattern was chosen for 5 is another story. When this bit pattern is fed to a special electronics module, the image 5 appears on the text screen; when the bit pattern 00110110 is fed into that electronics module, the image of 6 appears on the text screen. These patterns are called ASCII Codes (American Standard Code for Information Interchange) for 5 and 6 respectively. The following Table of Fig-1 contains the ASCII codes for all possible printable characters.


Figure-1:

3. What 'data type' is there to declare a variable (ch) that will hold the ASCII code of a character (say 5).
The data type is char. For example:

char ch = 0b00110101;

or

char ch = 0x35; //hex is compact form of binary and is used for convenience. Memory location always hold data in bit form.

or

char ch = 53;

or

char ch = '5';

4. What code/command should we use to direct the value of ch of Step-3 into the electronics module of Step-2 so that the image 5 appears on the text screen.

Serial.print(ch);

5. What will appear on the text screen if we choose data type byte for the example of Step-3 and execute the Serial.print() code?

byte ch = 0x35;  
Serial.print(ch);

Now we need to discuss a little bit on the functional mechanism of print() method.
(1) The print() method checks the data type; if it is char, the print() method directs 0x35 to the electronics module of Step-2; as a result, the image 5 appears on the text screen.

(2) The print() method checks the data type; if it is byte, the print() method is transformed to the following form (this is my conceptual understanding):

Serial.print(ch, DEC);  //DEC is for DECimal base = 10 base

The numerical value of ch (0x35) is converted into the equivalent decimal value of 53 (3 x 16^1 + 5 x 16^0 = 48 + 5 = 53) and then the images of 5 and 3 are made to appear on the text screen by executing the following codes:

Serial.write(0x35);   //to see 5 on screen, ASCII code of 5 (0x35) is to put to electronics module
Serial.write(0x33);    //to see 3 on screen, ASCII code of 3 (0x33) is to put to electronics module

The above two codes could be written using print() methods in the following way:

Serial.print((char)0x35);   //appears 5 ; known as casting to character
Serial.print((char)0x33);            //appears 3
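The conversion from the value 0x35 to the two characters 5 and 3 can be sketched as repeated division by 10. This is an illustration in plain C++ of the general technique, not the code of the Arduino core; decimalText is a made-up name.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: build the decimal text for a byte value, one digit at a time.
std::string decimalText(std::uint8_t value) {
  std::string text;
  do {
    // value % 10 is the lowest decimal digit; adding '0' (0x30) turns it into its ASCII code.
    char digit = static_cast<char>('0' + value % 10);
    text.insert(text.begin(), digit);  // digits come out lowest first, so prepend
    value /= 10;
  } while (value != 0);
  return text;
}
```

For 0x35 the loop produces the digit 3 first (53 % 10), then 5 (5 % 10), giving the text "53", that is the bytes 0x35 followed by 0x33.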

6. What are the differences among the following four declarations of Step-3?

char ch = 0b00110101;
char ch = 0x35;
char ch = 53;
char ch = '5';    //putting opening/closing single quote across a single character

There are no differences; they are the same. The declaration char ch = '5'; is friendlier than these declarations: char ch = 0b00110101;, char ch = 0x35;, and char ch = 53;.
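The equality can be checked by the compiler itself. A small sketch, using the bit pattern 00110101 from Step-3 and assuming a compiler that accepts binary literals (C++14, or GCC as used by the Arduino IDE); allSpellingsMatch is a made-up name.

```cpp
// Compile-time checks: the build fails if any two spellings differ.
static_assert(0b00110101 == 0x35, "binary and hex spell the same value");
static_assert(0x35 == 53, "hex and decimal spell the same value");
static_assert(53 == '5', "53 is the ASCII code of the character 5");

// Hypothetical helper: returns true when all four spellings initialise a char identically.
bool allSpellingsMatch() {
  char a = 0b00110101;
  char b = 0x35;
  char c = 53;
  char d = '5';
  return a == b && b == c && c == d;
}
```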

7. What is the meaning of the following declaration? (To what value the following expression would be evaluated?)

int num = (int) (ch - '0');    //assume char ch = 0x35
==> int num = (int) (0x35 -0x30);   //'0' is evaluated to 0x30 as per Step-6
==> int num = (int)(0x05);

ch - '0' is evaluated to 0x05 (binary 00000101), a value that fits in 8 bits. The destination variable num has been declared as an int, which is 16-bit on the Uno; so, the value is stored with 8 zeros to the left of 0x05 (00000000 00000101). Making this conversion explicit is called casting, and it is done by putting the keyword int surrounded by a pair of parentheses before the expression.

@golam careful in how you phrase this.

The print() method checks the data type

No.... The print method does not do that. It just gets the data with the type as indicated in its signature.

The compiler does select the right one (the one with the right signature) possibly after an implicit conversion sequence.

Read above

Side note which might be useful:

Also, int num = (int) (ch - '0'); //assume char ch = 0x35 is not fully equivalent to int num = (int) (0x35 - 0x30); in terms of how the maths is done, but the cast is not necessary, as the conversion to the destination type is implicit.

The reason why the two are not equivalent (if you lose sight of the types) is that if you dig further into the standard, you’ll read that the type of the integer literal is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used (and that list starts with int, not char).

So if you were to write 0x35 instead of the char variable, it would not be handled as a byte but already as an int (so 2 or 4 bytes depending on your arduino), and thus ‘0’ would get promoted at compile time into an int as well to perform the subtraction. The result would thus be an int already.
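The types involved can be inspected at compile time. The sketch below is plain C++ and the name toDigit is made up. One nuance it shows: under the usual integer promotions, two char operands are also promoted to int before the subtraction, so the result type is int in both forms; what differs is only the type of the operands going in.

```cpp
#include <type_traits>

char ch = 0x35;

// In C++ a character literal such as '0' has type char...
static_assert(std::is_same<decltype('0'), char>::value, "character literal is a char");
// ...while an integer literal such as 0x35 has type int.
static_assert(std::is_same<decltype(0x35), int>::value, "integer literal is an int");
// Both subtractions are carried out in int, whatever the operand types were.
static_assert(std::is_same<decltype(ch - '0'), int>::value, "char - char yields int");
static_assert(std::is_same<decltype(0x35 - 0x30), int>::value, "int - int yields int");

// No cast needed: the expression is already an int. (Hypothetical helper name.)
int toDigit(char c) {
  return c - '0';
}
```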

I wrote that as an example of a quick way to convert an ASCII character digit into a numeric value. JML is correct that the cast is not necessary, but I'm not a big fan of "silent casts". Usually pouring the contents of a 1-byte data bucket (i.e., a char) into a 2-byte bucket (an int) is not a problem. However, reverse the process and pour the data content of a 2-byte bucket into a 1-byte bucket has the potential to slop 1 byte of data on the floor. Most modern compilers do the cast for you. That's not always been the case.

Old C compilers could get fussy on silent casts, so I just always use the cast as a form of documentation. It also makes it clear to the reader that you did intend to promote the data item.
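The bucket analogy can be put in a few lines. This is a sketch in plain C++ with made-up function names; it uses uint8_t for the small bucket because the result of converting to an unsigned type is fully defined (the value modulo 256).

```cpp
#include <cstdint>

// Pouring a small bucket into a big one: every byte value fits in an int.
int pourIntoInt(std::uint8_t small) {
  return static_cast<int>(small);  // cast written out purely as documentation
}

// Pouring a big bucket into a small one: only the low 8 bits survive.
std::uint8_t pourIntoByte(int big) {
  return static_cast<std::uint8_t>(big);  // the high byte is "slopped on the floor"
}
```

pourIntoInt(200) gives back 200, whereas pourIntoByte(0x1234) keeps only 0x34.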

Sure a cast does not hurt - it clarifies intent and the compiler would just do the same thing anyway

But too much of a good intent can make code clumsy - so leveraging something that is documented and part of the spec does not hurt me as long as you know what you do.

I see this being done very regularly here when many do char c = Serial.read(); for example. read() returns an int not a byte so this is the typical example you described of trying to fit a larger type in a smaller one.

Some strongly typed languages would not allow such “freedom”

@J-M-L: I do see that (Serial.read()) all the time and I'm guilty of it, too. Back in the 1980's, my company produced an MSDOS C compiler that had a "Picky Flag", which was like a built-in Lint. We never saw any published code that passed picky-level 9. It was so fussy, that we rarely ran it above Level 4. Level 9 would not even allow you to use printf() without using its return value. I would call that "clumsy", but revealing a silent cast is "pure documentation" as the compiler will do it anyway. However, when you're teaching a bunch of beginners, I'd rather err on the "documentation" side.

yeah - fair points. I think we agree. it's all about the right dose of a good thing and depends on your specifics.

J-M-L:
I see this being done very regularly here when many do

char c = Serial.read();

for example. read() returns an int not a byte so this is the typical example you described of trying to fit a larger type in a smaller one.

From the above fact, could it be inferred that the Serial FIFO Buffer is a word (16-bit) organized space? The execution of 'char c = Serial.read();' instruction brings out from Buffer a 'data item' of 16-bit size of which the lower byte is the character itself and is saved in variable c; the upper byte is the 'status word' which is discarded. A 64-byte long Serial Buffer can hold only 32 characters.

From the above fact, could it be inferred that the Serial FIFO Buffer is a word (16-bit) organized space?

No.

Why would you want to infer, when you can know?

GolamMostafa:
From the above fact, could it be inferred that the Serial FIFO Buffer is a word (16-bit) organized space?

no, just that the programmer chose to return 16 bits so that -1 (0xFFFF on UNO) would mean 'there was nothing to read' and otherwise return the byte read in the LSB.
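A small mock shows why the wider return type is useful even though the buffer itself stores single bytes. MockSerial is a hypothetical desktop stand-in built on std::queue, not the Arduino core's implementation.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical stand-in for the serial receive path: the buffer holds plain bytes.
struct MockSerial {
  std::queue<std::uint8_t> buffer;

  int available() const {
    return static_cast<int>(buffer.size());
  }

  // Returns 0..255 for a received byte, or -1 when nothing is waiting.
  int read() {
    if (buffer.empty()) {
      return -1;  // sentinel that no byte value can collide with
    }
    std::uint8_t b = buffer.front();
    buffer.pop();
    return b;  // the byte is widened to int on return; the buffer stays byte-organized
  }
};
```

Because read() returns an int, a received data byte 0xFF comes back as 255 and stays distinguishable from the -1 "nothing to read" marker. If the result were narrowed to char before being tested, the two cases could no longer be told apart, which is why checking available() first (as in the original sketch) matters.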