String to int extended-ASCII code issue

Hello everyone!

I have an issue that has been giving me headaches for days now.

I have an A6 module connected to an ESP8266. I can get it to send SMS, but only with characters from the ASCII table (0x00 to 0x7F).

I realized that if I want to use the extended ASCII table (0x00 to 0xFF), I need to send each character's integer representation to the SoftwareSerial.


[Image: extended ASCII table. Source: Wikipedia]

Here’s the big issue:

I pass a string to my function:

sendSMS(phoneNumber, "étrange");

What I receive in my SMS:

“[blank_space]trange”

Here is the part of my function that breaks the string down into the integer representation of each char and sends it to the SoftwareSerial. Note that “text” is the “étrange” String:

for (int i = 0; i < text.length(); i++) {
     mySoftwareSerial->write((int)text.charAt(i));
}

What is really strange is that if I write directly to the SoftwareSerial like this:

mySoftwareSerial->write((int)233); // Writes "é" to the serial

I successfully receive the “é” character by SMS.

Does anyone have an idea what the problem is? I suspect the extended ASCII characters are being wrongly converted to int by my for loop…

Thank you!

Maybe you’re receiving Unicode instead of extended ASCII?

The code below might help with debugging.

// print the length of the text to serial port
Serial.print("text length = ");
Serial.println(text.length());
for (int i = 0; i < text.length(); i++) {
     mySoftwareSerial->write((int)text.charAt(i));
     // print the hex values to serial port; cast to byte so that values
     // above 0x7F are not sign-extended on boards where char is signed
     Serial.print((byte)text.charAt(i), HEX);
     Serial.print(" ");
}
Serial.println("====");

PS
Why the cast to int? I don’t think it is necessary.
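On the cast question: `charAt()` returns a `char`, and whether plain `char` is signed is implementation-defined (it is signed on AVR-based Arduinos). Where it is signed, `(int)` turns 0xC3 into -61, which `Serial.print(..., HEX)` typically shows as FFFFFFC3. A minimal host-side C++ sketch of the difference; `rawByte` and `naiveInt` are hypothetical helper names used only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Return the raw byte value of a char as 0..255, regardless of whether
// plain char is signed or unsigned on the current compiler/target.
inline uint8_t rawByte(char c) {
  return static_cast<uint8_t>(c);
}

// Plain cast to int: sign-extends when char is signed, so (char)0xC3
// becomes -61 on such targets instead of 195 (0xC3).
inline int naiveInt(char c) {
  return static_cast<int>(c);
}
```

Casting to `uint8_t` (Arduino's `byte`) gives the same value on every target, which is what you want both for printing and for comparisons such as `>= 0x80`.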

Hello, thank you for your answer.

You are totally right: my "é" character is converted into two bytes (0xC3 0xA9).

The full output in the serial monitor for the String "étrange !" is: C3 A9 74 72 61 6E 67 65 20 21
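Those two bytes are the UTF-8 encoding of U+00E9. For characters in the range U+0080 to U+00FF, the UTF-8 lead byte is always 0xC2 or 0xC3, and the decoded code point equals the byte value in ISO-8859-1 (Latin-1), which matches the 233 = é that worked when written directly. A minimal sketch of the bit arithmetic, assuming the A6 module interprets bytes as Latin-1 (an assumption based only on the 233 test); `utf8PairToLatin1` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Decode a two-byte UTF-8 sequence (110xxxxx 10yyyyyy) into its code point
// and return it as a single Latin-1 byte. Only lead bytes 0xC2 and 0xC3
// encode U+0080..U+00FF; anything else is rejected.
// Returns true and stores the byte in `out` on success.
bool utf8PairToLatin1(uint8_t lead, uint8_t cont, uint8_t &out) {
  if (lead != 0xC2 && lead != 0xC3) return false;  // outside the Latin-1 range
  if ((cont & 0xC0) != 0x80) return false;         // not a continuation byte
  // Code point = low 5 bits of the lead byte followed by low 6 bits of cont.
  out = static_cast<uint8_t>(((lead & 0x1F) << 6) | (cont & 0x3F));
  return true;
}
```

For 0xC3 0xA9: (0x03 << 6) | 0x29 = 0xC0 | 0x29 = 0xE9 = 233. Compared with a lookup table this covers the whole Latin-1 block without listing each character, but it cannot map anything outside it (for example €, which is three bytes in UTF-8).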

Is there any way to "force" it to be ASCII?

Thank you so much for your help with debugging.

What provides the text that you want to send?

Either you need to change it there, or you can create some form of lookup table that translates the two bytes to extended ASCII: if a byte exceeds 0x7F, look up the combination of that byte and the next one in the table and translate it.

Below is a simple example using a lookup table. I based the lookup on http://www.fileformat.info/info/charset/UTF-8/list.htm and your image.

// struct defining a lookup entry
struct LOOKUP
{
  byte utf8[2];
  byte extended;
};

// lookup table
LOOKUP lut[]
{
  {{0xC3, 0xA8}, 232},  // e-grave
  {{0xC3, 0xA9}, 233},  // e-acute
};

// received text (UTF-8 bytes), null-terminated so that strlen() works
char text[] = {(char)0xC3, (char)0xA9, 't', 'r', 'a', 'n', 'g', 'e', '\0'};   // "étrange";

void setup()
{
  Serial.begin(57600);

  Serial.println(strlen(text));
  Serial.println(text);
  for (uint8_t cnt = 0; cnt < strlen(text); cnt++)
  {
    Serial.print((byte)text[cnt], HEX);
    Serial.print(" ");
  }
  Serial.println();

  for (uint8_t cnt = 0; cnt < strlen(text); cnt++)
  {
    if ((byte)text[cnt] >= 0x80)
    {
      byte lutIn[2];
      memcpy(lutIn, &text[cnt], 2);
      for (uint16_t lutCnt = 0; lutCnt < sizeof(lut) / sizeof(lut[0]); lutCnt++)
      {
        if (memcmp(lutIn, lut[lutCnt].utf8, 2) == 0)
        {
          Serial.print(lut[lutCnt].extended, HEX);
          cnt++;
          break;
        }
      }
    }
    else
    {
      Serial.print((byte)text[cnt], HEX);
    }
    Serial.print(" ");
  }
  Serial.println();
}

void loop()
{

}

You will have to complete the lookup table ;)
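The same table can be wrapped in a function that converts a whole buffer before it is sent, instead of only printing hex. A minimal sketch in portable C++ (so it can be tested off the board); `utf8ToExtended` is a hypothetical name, and replacing each unknown character with '?' is an assumed policy, not something from this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One lookup entry: a two-byte UTF-8 sequence and its extended-ASCII byte.
struct LOOKUP {
  uint8_t utf8[2];
  uint8_t extended;
};

// Extend this table with the characters you need.
const LOOKUP lut[] = {
  {{0xC3, 0xA8}, 232},  // e-grave
  {{0xC3, 0xA9}, 233},  // e-acute
};

// Convert the null-terminated UTF-8 string `in` into `out` (capacity
// `outSize`, including the terminator). ASCII bytes are copied, known
// two-byte sequences are replaced by their table value, and each unknown
// non-ASCII character becomes a single '?'.
// Returns the number of bytes written, excluding the terminator.
size_t utf8ToExtended(const char *in, char *out, size_t outSize) {
  if (outSize == 0) return 0;
  size_t o = 0;
  for (size_t i = 0; in[i] != '\0' && o + 1 < outSize; i++) {
    uint8_t c = static_cast<uint8_t>(in[i]);
    if (c < 0x80) {
      out[o++] = in[i];                  // plain ASCII: copy as-is
      continue;
    }
    if ((c & 0xC0) == 0x80) continue;    // continuation byte of an unknown char: skip
    uint8_t mapped = '?';                // fallback for sequences not in the table
    if (in[i + 1] != '\0') {             // a table match needs a second byte
      for (size_t k = 0; k < sizeof(lut) / sizeof(lut[0]); k++) {
        if (memcmp(&in[i], lut[k].utf8, 2) == 0) {
          mapped = lut[k].extended;
          i++;                           // consume the continuation byte
          break;
        }
      }
    }
    out[o++] = static_cast<char>(mapped);
  }
  out[o] = '\0';
  return o;
}
```

On the board, the converted buffer could then be written byte by byte to the SoftwareSerial in place of the original text.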

(Quoting: "unicode instead of extended ascii")

If you don't do it often and don't mind ugly code, you could do:

sendSMS(phoneNumber, "\xE9trange"); // 0xE9 == 233 == é
sendSMS(phoneNumber, "\351trange");  // and that's octal

I don't believe there is a way to use escaped decimal numbers :(
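One pitfall with the hex form: a hex escape keeps consuming hex digits, so writing "école" as "\xE9cole" is parsed as the single escape \xE9c (out of range for a char) followed by "ole". Octal escapes stop after at most three digits, and adjacent string literals are concatenated, so either of the forms below works. A small plain C++ check; the variable names are illustrative only:

```cpp
#include <cassert>
#include <cstring>

// "école" in Latin-1, written two ways. "\xE9cole" would NOT work: the hex
// escape would swallow the 'c' (a hex digit) and become \xE9c.
const char ecoleHex[] = "\xE9" "cole";  // adjacent literals are concatenated
const char ecoleOct[] = "\351cole";     // octal escapes stop after 3 digits
```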

I think that the problem is that the text comes from somewhere else.