Ä Ö Ü Problem

Hello,

I've got a problem with the German umlauts Ä, Ö, Ü and the strange "ß".

The project transforms a string called "SATZ" (a whole sentence) into phonemes (the basic units of spoken language) and then does something like text-to-speech (or rather text-to-spelling) by playing files on an MP3 module. It works so far; only the special letters Ä, Ö, Ü and ß don't work. It is probably a problem with string conversion or with how strings are encoded. As I'm not a native C/C++ speaker (more an artist, still learning by doing), I'm asking for your help.

Here is the part of the code that converts the string:

#include <SoftwareSerial.h>

#define rxPin 15 // actually, I am not using a rxPin
#define txPin 14
SoftwareSerial MP3Serial =  SoftwareSerial(rxPin, txPin); // set up a new serial port


//------------------ VARIABLES FOR GRAPHEME-PHONEME ---------
String URSATZ = "Test a sentence with Ä, Ö, Ü, ß: Möhrensuppe mit Äpfeln und süßen Blümchen.";
String SATZ = URSATZ;

String PHON = " ";
int MINIMI = 0;
int MP3 = 0;
int LENGTH = 400;

void setup() {
  SATZ = SATZ.toLowerCase();
  
  pinMode(txPin, OUTPUT);
  MP3Serial.begin(9600);
  
  Serial.begin(9600); 
  delay(1000);
}

void loop() {
  GraphToPhon(); // conversion
  PlayMP3(); // sound output
  delay(LENGTH);
}

void GraphToPhon() {
  
  MP3 = 0;
  MINIMI = 1;
  PHON = "";
  
  String S3 = SATZ.substring(0,3);
  String S2 = SATZ.substring(0,2);
  String S1 = SATZ.substring(0,1);
  
if(S3.equals("sch") == true) {PHON = "SCH"; MINIMI = 3; MP3 = 37; LENGTH = 400;}
if(S3.equals("rrh") == true) {PHON = "R"; MINIMI = 3;  MP3 = 18; LENGTH = 330;}
if(S3.equals("cch") == true) {PHON = "K"; MINIMI = 3;  MP3 = 11; LENGTH = 330;}
if(S3.equals("ail") == true) {PHON = "AI"; MINIMI = 3;  MP3 = 30; LENGTH = 400;}
if(S3.equals("ieh") == true) {PHON = "I"; MINIMI = 3;  MP3 = 9; LENGTH = 400;}
if(S3.equals("eau") == true) {PHON = "O"; MINIMI = 3;  MP3 = 15; LENGTH = 400;}
if(S3.equals("chs") == true || S3.equals("cks") == true ) {PHON = "X"; MINIMI = 3;  MP3 = 23; LENGTH = 40;}

if (MINIMI < 3) {
  if(S2.equals("tt") == true || S2.equals("th") == true || S2.equals("dt") == true) {PHON = "T"; MINIMI = 2; MP3 = 20; LENGTH = 330;}
  if(S2.equals("nn") == true) {PHON = "N"; MINIMI = 2; MP3 = 14; LENGTH = 335;}
  if(S2.equals("ss") == true || S2.equals("zz") == true) {PHON = "S"; MINIMI = 2; MP3 = 19; LENGTH = 400;}
  if(S2.equals("rr") == true || S2.equals("rh") == true) {PHON = "R"; MINIMI = 2; MP3 = 18; LENGTH = 330;}
  if(S2.equals("ll") == true) {PHON = "L"; MINIMI = 2; MP3 = 12; LENGTH = 330;}
  if(S2.equals("ff") == true || S2.equals("ph") == true) {PHON = "F"; MINIMI = 2; MP3 = 6; LENGTH = 380;}
  if(S2.equals("gg") == true || S2.equals("gh") == true) {PHON = "G"; MINIMI = 2; MP3 = 7; LENGTH = 330;}
  if(S2.equals("ck") == true || S2.equals("kk") == true) {PHON = "K"; MINIMI = 2; MP3 = 11; LENGTH = 330;}
  if(S2.equals("mm") == true) {PHON = "M"; MINIMI = 2; MP3 = 13; LENGTH = 330;}
  if(S2.equals("bb") == true) {PHON = "B"; MINIMI = 2; MP3 = 2; LENGTH = 330;}
  if(S2.equals("dd") == true) {PHON = "D"; MINIMI = 2; MP3 = 4; LENGTH = 330;}
  if(S2.equals("er") == true) {PHON = "ER"; MINIMI = 2; MP3 = 28; LENGTH = 390;}
  if(S2.equals("ts") == true || S2.equals("tz") == true || S2.equals("zz") == true) {PHON = "Z"; MINIMI = 2; MP3 = 24; LENGTH = 330;}
  if(S2.equals("aa") == true || S2.equals("ah") == true) {PHON = "A"; MINIMI = 2;  MP3 = 1; LENGTH = 350;} 
  if(S2.equals("pp") == true) {PHON = "P"; MINIMI = 2; MP3 = 16; LENGTH = 330;} 
  if(S2.equals("ei") == true) {PHON = "EI"; MINIMI = 2; MP3 = 29; LENGTH = 450;} 
  if(S2.equals("ai") == true) {PHON = "AI"; MINIMI = 2; MP3 = 30; LENGTH = 450;}
  if(S2.equals("ng") == true || S2.equals("nk") == true) {PHON = "NG"; MINIMI = 2; MP3 = 31; LENGTH = 350;}
  if(S2.equals("ie") == true || S2.equals("ih") == true) {PHON = "I"; MINIMI = 2; MP3 = 9; LENGTH = 350;}  
  if(S2.equals("ch") == true) {PHON = "CH"; MINIMI = 2; MP3 = 32; LENGTH = 400;} 
  if(S2.equals("eh") == true || S2.equals("ee") == true) {PHON = "E"; MINIMI = 2; MP3 = 5; LENGTH = 350;}
  if(S2.equals("äh") == true) {PHON = "AE"; MINIMI = 2; MP3 = 25; LENGTH = 450;}
  if(S2.equals("uh") == true || S2.equals("ou") == true) {PHON = "U"; MINIMI = 2; MP3 = 21; LENGTH = 450;}
  if(S2.equals("au") == true || S2.equals("ow") == true) {PHON = "AU"; MINIMI = 2; MP3 = 33; LENGTH = 390;}  
  if(S2.equals("oh") == true || S2.equals("oo") == true) {PHON = "O"; MINIMI = 2; MP3 = 15; LENGTH = 450;}
  if(S2.equals("üh") == true) {PHON = "UE"; MINIMI = 2; MP3 = 27; LENGTH = 450;}
  if(S2.equals("eu") == true || S2.equals("oi") == true || S2.equals("äu") == true || S2.equals("oy") == true) {PHON = "OI"; MINIMI = 2; MP3 = 34; LENGTH = 400;}
  if(S2.equals("ks") == true) {PHON = "X"; MINIMI = 2; MP3 = 23; LENGTH = 330;}
  if(S2.equals("öh") == true) {PHON = "OE"; MINIMI = 2; MP3 = 26; LENGTH = 400;}
  if(S2.equals("en") == true) {PHON = "EN"; MINIMI = 2; MP3 = 35; LENGTH = 380;}
  if(S2.equals("pf") == true) {PHON = "PF"; MINIMI = 2; MP3 = 36; LENGTH = 330;}
  if(S2.equals("qu") == true) {PHON = "Q"; MINIMI = 2; MP3 = 17; LENGTH = 410;}
}

if (MINIMI < 2) {

  if(S1.equals("a") == true) {PHON = "A"; MP3 = 1; LENGTH = 350;}
  if(S1.equals("b") == true) {PHON = "B"; MP3 = 2; LENGTH = 330;}
  if(S1.equals("c") == true) {PHON = "C"; MP3 = 3; LENGTH = 330;}
  if(S1.equals("d") == true) {PHON = "D"; MP3 = 4; LENGTH = 355;}
  if(S1.equals("e") == true) {PHON = "E"; MP3 = 5; LENGTH = 350;}
  if(S1.equals("f") == true || S1.equals("v") == true) {PHON = "F"; MP3 = 6; LENGTH = 370;}
  if(S1.equals("g") == true) {PHON = "G"; MP3 = 7; LENGTH = 340;}
  if(S1.equals("h") == true) {PHON = "H"; MP3 = 8; LENGTH = 335;}
  if(S1.equals("i") == true || S1.equals("y") == true) {PHON = "I"; MP3 = 9; LENGTH = 340;}
  if(S1.equals("j") == true) {PHON = "J"; MP3 = 10; LENGTH = 340;}
  if(S1.equals("k") == true) {PHON = "K"; MP3 = 11; LENGTH = 340;}
  if(S1.equals("l") == true) {PHON = "L"; MP3 = 12; LENGTH = 330;}
  if(S1.equals("m") == true) {PHON = "M"; MP3 = 13; LENGTH = 330;}
  if(S1.equals("n") == true) {PHON = "N"; MP3 = 14; LENGTH = 335;}
  if(S1.equals("o") == true) {PHON = "O"; MP3 = 15; LENGTH = 370;}
  if(S1.equals("p") == true) {PHON = "P"; MP3 = 16; LENGTH = 330;}
  if(S1.equals("q") == true) {PHON = "Q"; MP3 = 17; LENGTH = 380;}
  if(S1.equals("r") == true) {PHON = "R"; MP3 = 18; LENGTH = 350;}
  if(S1.equals("s") == true || S1.equals("ß") == true) {PHON = "S"; MP3 = 19; LENGTH = 385;}
  if(S1.equals("t") == true) {PHON = "T"; MP3 = 20; LENGTH = 340;}
  if(S1.equals("u") == true) {PHON = "U"; MP3 = 21; LENGTH = 370;}
  if(S1.equals("w") == true) {PHON = "W"; MP3 = 22; LENGTH = 335;}
  if(S1.equals("x") == true) {PHON = "X"; MP3 = 23; LENGTH = 340;}
  if(S1.equals("z") == true) {PHON = "Z"; MP3 = 24; LENGTH = 340;}
  if(S1.equals("ä") == true) {PHON = "AE"; MP3 = 25; LENGTH = 380;}
  if(S1.equals("ö") == true) {PHON = "OE"; MP3 = 26; LENGTH = 380;}
  if(S1.equals("ü") == true) {PHON = "UE"; MP3 = 27; LENGTH = 380;}
  
  if(S1.equals(" ") == true || S1.equals(",") == true || S1.equals(".") == true || S1.equals("!") == true) {PHON = "STOP"; MP3 = 0; LENGTH = 650;}
  MINIMI = 1;
}

  if (SATZ.length() <= 3) {
    SATZ = SATZ + "    ";
  }
  SATZ = SATZ.substring(MINIMI);
  
  if(S3.equals("   ") == true) {SATZ = URSATZ; delay(1000);}
}


void PlayMP3() {

int MP3ONE = 0;

Serial.println(PHON+"   "+MP3);

MP3Serial.print(0x7E,BYTE);
MP3Serial.print(0x07,BYTE);
MP3Serial.print(0xB0,BYTE);
MP3Serial.print(0x30,BYTE);
MP3Serial.print(0x31,BYTE);
MP3Serial.print(0x30,BYTE);
if (MP3 < 40 && MP3 >= 30) {MP3Serial.print(0x33,BYTE); MP3ONE = MP3-30;}
if (MP3 < 30 && MP3 >= 20) {MP3Serial.print(0x32,BYTE); MP3ONE = MP3-20;}
if (MP3 < 20 && MP3 >= 10) {MP3Serial.print(0x31,BYTE); MP3ONE = MP3-10;}
if (MP3 < 10 && MP3 >= 0) {MP3Serial.print(0x30,BYTE); MP3ONE = MP3;}
if (MP3ONE == 0) {MP3Serial.print(0x30,BYTE);}
if (MP3ONE == 1) {MP3Serial.print(0x31,BYTE);}
if (MP3ONE == 2) {MP3Serial.print(0x32,BYTE);}
if (MP3ONE == 3) {MP3Serial.print(0x33,BYTE);}
if (MP3ONE == 4) {MP3Serial.print(0x34,BYTE);}
if (MP3ONE == 5) {MP3Serial.print(0x35,BYTE);}
if (MP3ONE == 6) {MP3Serial.print(0x36,BYTE);}
if (MP3ONE == 7) {MP3Serial.print(0x37,BYTE);}
if (MP3ONE == 8) {MP3Serial.print(0x38,BYTE);}
if (MP3ONE == 9) {MP3Serial.print(0x39,BYTE);}
MP3Serial.print(0x7E,BYTE);

}

This looks like a character-encoding problem. I just copied your code sample into Arduino, saved the file and then looked at a hexdump of that file. The conclusion is that the Arduino editor is quite happily saving the file in UTF-8 (for me at least).

Basically, if the Arduino editor on your computer is also working in UTF-8, it simply means that every umlaut and the eszett is stored as two bytes instead of one. As a simple workaround, you could move every comparison that includes one of these characters up to the next longer substring, so that the extra byte is counted. E.g. change

if(S2.equals("öh") == true) {PHON = "OE"; MINIMI = 2; MP3 = 26; LENGTH = 400;}

to

if(S3.equals("öh") == true) {PHON = "OE"; MINIMI = 3; MP3 = 26; LENGTH = 400;}

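and move it to the first block of comparisons. In the same way, the single-character comparisons for ä, ö, ü and ß become S2 comparisons with MINIMI = 2, and "äu" becomes an S3 comparison.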
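
To make the byte counting concrete, here is a minimal sketch in plain C++ (it should also compile inside an Arduino sketch). The helper names startsWith and bytesToSkip are made up for illustration and are not part of the Arduino String API. The UTF-8 bytes are spelled out as hex escapes so the result doesn't depend on how the file itself is saved.

```cpp
#include <string.h>

// UTF-8 bytes are written as hex escapes so the example does not depend
// on the encoding the source file happens to be saved in.
const char UTF8_OE[]   = "\xC3\xB6";       // "ö"  -> 2 bytes
const char UTF8_OE_H[] = "\xC3\xB6" "h";   // "öh" -> 3 bytes

// Hypothetical helper: does `text` start with the bytes of `pattern`?
bool startsWith(const char *text, const char *pattern) {
  return strncmp(text, pattern, strlen(pattern)) == 0;
}

// Hypothetical helper: number of bytes to skip after a match, i.e. the
// value MINIMI would need for this pattern.
size_t bytesToSkip(const char *pattern) {
  return strlen(pattern);
}
```

Here bytesToSkip(UTF8_OE_H) comes out as 3 and bytesToSkip(UTF8_OE) as 2, which is why "öh" only matches against S3 and a lone "ö" only against S2.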

My preferred solution would be to change the encoding you're using to ISO-8859-1 (Western Europe), but how to do that depends greatly on which operating system you are using and which text editors you have at your disposal or are comfortable with.
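
If changing the editor's encoding turns out to be awkward, a different route to the same result is to leave the file in UTF-8 and convert the text to ISO-8859-1 at run time. This is only a sketch of that alternative, not something the sketch above already does. The function name utf8ToLatin1 is my own, it works on a plain char buffer rather than an Arduino String, and it only understands the two-byte sequences that map into ISO-8859-1.

```cpp
#include <stddef.h>

// Converts a NUL-terminated UTF-8 string to ISO-8859-1 in place.
// Only code points U+0080..U+00FF (lead bytes 0xC2 and 0xC3) are
// handled; any other non-ASCII sequence is replaced by a single '?'.
// Returns the new length in bytes.
size_t utf8ToLatin1(char *s) {
  unsigned char *in = (unsigned char *)s;
  unsigned char *out = in;
  while (*in) {
    unsigned char c = *in++;
    if (c < 0x80) {
      *out++ = c;  // plain ASCII, copied unchanged
    } else if ((c == 0xC2 || c == 0xC3) && (*in & 0xC0) == 0x80) {
      // two-byte sequence: 110000xx 10yyyyyy -> xxyyyyyy
      *out++ = (unsigned char)(((c & 0x03) << 6) | (*in++ & 0x3F));
    } else {
      *out++ = '?';  // outside ISO-8859-1 or malformed
      while ((*in & 0xC0) == 0x80) {
        in++;  // skip any continuation bytes that belong to it
      }
    }
  }
  *out = '\0';
  return (size_t)(out - (unsigned char *)s);
}
```

After such a conversion every umlaut is one byte again, but the string literals in the comparisons would then have to be ISO-8859-1 bytes too, for example "\xF6" for ö.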

Something I forgot to mention is that toLowerCase() won't work for these characters either, so you may want to write your own. If you get things working in ISO-8859-1, the conversion is as simple as checking whether the character code lies in one of the uppercase ranges (0x41 to 0x5A for A to Z, and 0xC0 to 0xDE except 0xD7 for the accented capitals) and, if it does, adding 32 to make it lowercase. This is a convenient property of the way the ASCII and ISO-8859-1 tables were designed.

See the Wikipedia article on ISO/IEC 8859-1 to see what I'm talking about.
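
A minimal sketch of such a hand-written lowercase function, assuming the text is held in a plain char buffer. All three function names are made up. The last function goes beyond what I described above: it applies the same +32 offset to the second byte of the UTF-8 sequences that start with 0xC3, in case you stay with UTF-8.

```cpp
// Lowercases one ISO-8859-1 character code.
// Uppercase ranges: 'A'..'Z' (0x41..0x5A) and 0xC0..0xDE, except 0xD7,
// which is the multiplication sign. 0xDF (ß) is left unchanged.
unsigned char latin1ToLower(unsigned char c) {
  if ((c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7)) {
    return (unsigned char)(c + 32);
  }
  return c;
}

// Lowercases a NUL-terminated ISO-8859-1 string in place.
void latin1ToLowerInPlace(char *s) {
  for (; *s; ++s) {
    *s = (char)latin1ToLower((unsigned char)*s);
  }
}

// Lowercases a NUL-terminated UTF-8 string in place. Handles ASCII and
// the two-byte sequences starting with 0xC3 (U+00C0..U+00FF) only;
// everything else is left as it is.
void utf8ToLowerInPlace(char *s) {
  unsigned char *p = (unsigned char *)s;
  while (*p) {
    if (*p >= 'A' && *p <= 'Z') {
      *p = (unsigned char)(*p + 32);
    } else if (*p == 0xC3 && p[1] >= 0x80 && p[1] <= 0x9E && p[1] != 0x97) {
      p[1] = (unsigned char)(p[1] + 32);  // e.g. C3 84 (Ä) -> C3 A4 (ä)
      ++p;  // step over the lead byte; the increment below skips the second
    }
    ++p;
  }
}
```

The ß needs no special case in either variant, because it has no uppercase partner in this table and falls outside the ranges being shifted.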

Hi,

Unfortunately I did not manage to get my code saved as ISO-8859-1; I don't know exactly how to do it. I tried several text editors (for example Notepad++ on Windows 98 :fearful:) and changed the encoding to ISO-8859-1 (I'm actually not sure whether it finally got saved the right way), but when I uploaded it to my Arduino, I got the following error:

GRAPHEM_PHONEM_TALK_MOVE_06.pde contains unrecognized characters.
If this code was created with an older version of Processing,
you may need to use Tools -> Fix Encoding & Reload to update
the sketch to use UTF-8 encoding. If not, you may need to
delete the bad characters to get rid of this warning.

So I gave up and tried your workaround, and it works so far. I converted my string to lowercase by hand, but I'm sure I'll find a better way for that in the future, as the input might change.

THANKS !

P