
Topic: Printing and understanding international chars

VilluV

#10
Nov 29, 2010, 09:47 pm Last Edit: Nov 29, 2010, 10:01 pm by villuv Reason: 1
Open the sketch source file with a hex editor and check how the char is stored there. It must be the same code that you are using in your program. If it is not, the file is still not in the correct encoding. If two or more bytes are used for that char, it is still in UTF-8.

If it is correct, then the next step I'd take is to peek into the compiled .o file with a hex editor (or a disassembler, like: http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1290209328/16#16 ) and see how that char is stored there.

PS:
How to change default encoding in Arduino IDE, I don't (yet) know, maybe someone else can help with that. If it is not possible, create some .h file with your string constants with some other editor and include this file in your sketch, but don't open/edit it with IDE.

(oh the joys of i18n...  ;) )

EDIT: Maybe I'm reading your source code wrong, but I think that the character you're trying to display is not in the right place in the character map array. It must be at the position it has in the ISO character map, minus 33, but right now it seems to be at the place of 'c'.
If you just want to test whether the characters are stored with the right codes, and don't want to create the full array yet, add a special condition in the 'if' clause that tests for just this character code and gives the correct array index for it. If that works, you can move on to the full-blown character set and remove the condition.

(your character has code 0xC3, so it has to end up at position 195 - 33 = 162 in the char map array)
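
The index arithmetic above could be sketched roughly like this. It assumes, as described in this post, that the first entry of the character map belongs to code 33, and it assumes a compiler with C++11 support. The names `FIRST_GLYPH_CODE`, `glyphIndex`, and `glyphIndexForTest` are made up for illustration and are not from the original sketch.

```cpp
#include <stdint.h>

// Assumption from the post: the first entry of the character map
// corresponds to character code 33, so index = code - 33.
constexpr uint8_t FIRST_GLYPH_CODE = 33;

// Full mapping: character code -> index into the character map array.
// Returns -1 for codes below 33, which have no entry in this layout.
constexpr int glyphIndex(uint8_t code) {
  return code >= FIRST_GLYPH_CODE ? code - FIRST_GLYPH_CODE : -1;
}

// Temporary test mapping (hypothetical helper): only code 0xC3 is
// redirected to the slot where the glyph currently sits in the array;
// every other code uses the normal mapping.
constexpr int glyphIndexForTest(uint8_t code, int testSlot) {
  return code == 0xC3 ? testSlot : glyphIndex(code);
}
```

For example, `glyphIndex(0xC3)` gives 162, while `glyphIndexForTest(0xC3, glyphIndex('c'))` points at the slot of 'c' (index 66), where the glyph seems to be stored right now. Once the full array is in place, the test helper can be dropped.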

Abfahrt

Quote
Open the sketch source file with a hex editor and check how the char is stored there. It must be the same code that you are using in your program. If it is not, the file is still not in the correct encoding. If two or more bytes are used for that char, it is still in UTF-8.


You are right. It is stored as CE 93 (two bytes). It should be C1.
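
For reference, the two bytes reported above can be decoded by hand. The sketch below only handles two-byte UTF-8 sequences and assumes a compiler with C++11 support; the function names are made up for illustration.

```cpp
#include <stdint.h>

// True if `lead` starts a two-byte UTF-8 sequence (bit pattern 110xxxxx)
// and `next` is a continuation byte (bit pattern 10xxxxxx).
// Note: this simple check does not reject the overlong lead bytes
// 0xC0 and 0xC1.
constexpr bool isTwoByteUtf8(uint8_t lead, uint8_t next) {
  return (lead & 0xE0) == 0xC0 && (next & 0xC0) == 0x80;
}

// Combines the 5 payload bits of the lead byte with the 6 payload bits
// of the continuation byte into a Unicode code point.
constexpr uint16_t decodeTwoByteUtf8(uint8_t lead, uint8_t next) {
  return static_cast<uint16_t>(((lead & 0x1F) << 6) | (next & 0x3F));
}
```

With the bytes CE 93, `isTwoByteUtf8` is true and `decodeTwoByteUtf8` yields 0x0393 (decimal 915), the code point of the Greek capital letter gamma. So the file really is UTF-8 and not a single-byte encoding.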

Quote
How to change default encoding in Arduino IDE, I don't (yet) know, maybe someone else can help with that. If it is not possible, create some .h file with your string constants with some other editor and include this file in your sketch, but don't open/edit it with IDE.


Can you give a simple example? What do you mean with string constants?

Thanks for your help!

VilluV

#12
Nov 29, 2010, 10:14 pm Last Edit: Nov 29, 2010, 10:20 pm by villuv Reason: 1
What I meant is basically this:
create a file called, for example, messages.h
Code:

#define SOME_MSG "blah blah"
#define OTHER_MSG "oh yea"

etc.
Edit this with an editor that can save files in an ISO-8859 encoding.

And now in your sketch, use:
Code:

#include "messages.h"

...

do_something_with_string(SOME_MSG);



Added bonus: If this works, you can have separate header files for different languages and include the one you need for translated versions of your software without touching other files!

But what I'm not sure about is how the C++ preprocessor handles files with different encodings; maybe it doesn't work as I expect...

EDIT: You may have to sneak the following parameter in to the compiler:
Code:

-finput-charset="iso-8859-7"

But I don't yet know how to do that with Arduino IDE...
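
One way to sidestep both the file encoding and the compiler flag might be to write the non-ASCII bytes as hex escapes, so the header stays pure ASCII. This is a different technique from the one described above, shown only as a sketch: `GAMMA_MSG` and `byteAt` are hypothetical names, 0xC3 is the code mentioned earlier in this thread, and `constexpr` assumes a compiler with C++11 support.

```cpp
// messages.h (pure ASCII, so the editor's encoding no longer matters)
// "\xC3" is a single byte with the value 0xC3. Splitting the literal in
// two keeps the following letters from being read as more hex digits;
// adjacent string literals are joined by the compiler.
#define GAMMA_MSG "\xC3" "abc"

// Reads one byte of a message as an unsigned value (0..255).
constexpr unsigned char byteAt(const char *msg, int i) {
  return static_cast<unsigned char>(msg[i]);
}
```

Here `GAMMA_MSG` is four bytes long plus the terminating zero, and `byteAt(GAMMA_MSG, 0)` is 0xC3 no matter which encoding the IDE uses when it saves the sketch.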

Abfahrt

#13
Nov 30, 2010, 12:14 pm Last Edit: Nov 30, 2010, 12:19 pm by giannoug Reason: 1
I think I made some progress. I will try not to mess with file encodings and special compiler parameters.

When you cast a char to an integer, the number represents the character's position on the ASCII table. For example, the sketch below will print:

Quote

C
67


Code:

void setup() {
  Serial.begin(9600);

  char latin = 'C';

  Serial.println(latin);       // prints the character itself
  Serial.println((int)latin);  // prints its numeric code
}

void loop() { }


If I change the char to 'Γ', it prints:

Quote

"
-108


Why -108? If I make the char unsigned it prints 148. The sketch I want to print Greek chars with should work with those numbers (instead of the ISO-8859-7 ones). My questions now are: why does it print a negative number? Is it because it's an 8-bit unsigned value and I declared it as signed? In what encoding are the chars stored?

Thanks for your answers, you really helped :)

VilluV

Yes, it is negative because you're using the signed char data type, where the most significant bit is used as the sign bit (range -128..127). Use unsigned char for the range 0..255.
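
A small sketch of the difference, assuming an 8-bit char and a compiler with C++11 support; the helper names `signedCode` and `charCode` are made up for illustration:

```cpp
// Reads the 8 bits of c as a signed value (-128..127).
constexpr int signedCode(char c) {
  return static_cast<signed char>(c);
}

// Reads the same 8 bits as an unsigned value (0..255). Going through
// unsigned char before widening to int avoids the sign extension.
constexpr int charCode(char c) {
  return static_cast<unsigned char>(c);
}
```

For the byte 0x94, `signedCode('\x94')` is -108 and `charCode('\x94')` is 148, the two numbers from the post above. For plain ASCII such as 'C' both give 67.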

I think that the IDE saves files in UTF-8 encoding by default, but it might also use the system's default encoding. On Linux you can check the default from the LANG or LC_ALL environment variables; I don't know how to check that on Windows or OS X.
