Convert GSM 7-bit encoded text to UTF-8

Hi,

I'm receiving SMS messages using the Adafruit FONA GSM module. My problem is the character encoding of the SMS messages. They appear to come in a 7-bit encoding and I need them in UTF-8. This causes problems with non-ASCII characters like the German "ö".

Does anyone have a method to convert the encoding?

Thanks!

I have no experience with this, but (based on the tables in the link you provided and my understanding of how the encoding works) you might be able to solve this with a lookup table; it will consume some memory.

byte lookup[][2] =
{
  {0x40, 0x00},   // GSM 0x00 '@' (1 UTF-8 byte)
  {0xC2, 0xA3},   // GSM 0x01 '£' (2 UTF-8 bytes)
  {0x24, 0x00},   // GSM 0x02 '$' (1 UTF-8 byte)
  ...
  ...
};

The received GSM character is the index into the array: GSM character 0 selects the first element, GSM character 1 the second, and so on.
A value of 0x00 indicates that the byte does not need to be printed (single-byte characters have 0x00 as their second byte).

Note that the table must be sequential and you can't have missing entries. E.g. an entry for the escape character (0x1B) must be there; it can be {0x00, 0x00}.

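In the code below, gsmChar is the received character. Serial.write() is used because Serial.print() on a byte prints its decimal value instead of sending the raw byte.

if(lookup[gsmChar][0] != 0x00)
{
  // send first byte
  Serial.write(lookup[gsmChar][0]);
}

if(lookup[gsmChar][1] != 0x00)
{
  // send second byte
  Serial.write(lookup[gsmChar][1]);
}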
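The same lookup-table idea, sketched in plain C++ so it can be tried on a PC (std::string instead of Serial). Two assumptions that are not from this thread: the message is already unpacked to one GSM code per byte, and each entry is stored as a UTF-8 string literal instead of a {byte, byte} pair, so entries of different lengths need no 0x00 padding. The names kGsmToUtf8 and gsmToUtf8 are made up for this example, and escape sequences (0x1B plus the following code) are simply skipped.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// GSM 03.38 default alphabet: index = GSM code, value = UTF-8 bytes.
// 0x1B (escape) maps to "" because extension characters are not handled here.
static const char* const kGsmToUtf8[128] = {
  "@", "\xC2\xA3", "$", "\xC2\xA5", "\xC3\xA8", "\xC3\xA9", "\xC3\xB9", "\xC3\xAC",       // 0x00-0x07 @ £ $ ¥ è é ù ì
  "\xC3\xB2", "\xC3\x87", "\n", "\xC3\x98", "\xC3\xB8", "\r", "\xC3\x85", "\xC3\xA5",     // 0x08-0x0F ò Ç LF Ø ø CR Å å
  "\xCE\x94", "_", "\xCE\xA6", "\xCE\x93", "\xCE\x9B", "\xCE\xA9", "\xCE\xA0", "\xCE\xA8", // 0x10-0x17 Δ _ Φ Γ Λ Ω Π Ψ
  "\xCE\xA3", "\xCE\x98", "\xCE\x9E", "", "\xC3\x86", "\xC3\xA6", "\xC3\x9F", "\xC3\x89",   // 0x18-0x1F Σ Θ Ξ ESC Æ æ ß É
  " ", "!", "\"", "#", "\xC2\xA4", "%", "&", "'",                                         // 0x20-0x27 (¤ at 0x24)
  "(", ")", "*", "+", ",", "-", ".", "/",                                                 // 0x28-0x2F
  "0", "1", "2", "3", "4", "5", "6", "7",                                                 // 0x30-0x37
  "8", "9", ":", ";", "<", "=", ">", "?",                                                 // 0x38-0x3F
  "\xC2\xA1", "A", "B", "C", "D", "E", "F", "G",                                          // 0x40-0x47 (¡ at 0x40)
  "H", "I", "J", "K", "L", "M", "N", "O",                                                 // 0x48-0x4F
  "P", "Q", "R", "S", "T", "U", "V", "W",                                                 // 0x50-0x57
  "X", "Y", "Z", "\xC3\x84", "\xC3\x96", "\xC3\x91", "\xC3\x9C", "\xC2\xA7",              // 0x58-0x5F Ä Ö Ñ Ü §
  "\xC2\xBF", "a", "b", "c", "d", "e", "f", "g",                                          // 0x60-0x67 (¿ at 0x60)
  "h", "i", "j", "k", "l", "m", "n", "o",                                                 // 0x68-0x6F
  "p", "q", "r", "s", "t", "u", "v", "w",                                                 // 0x70-0x77
  "x", "y", "z", "\xC3\xA4", "\xC3\xB6", "\xC3\xB1", "\xC3\xBC", "\xC3\xA0",              // 0x78-0x7F ä ö ñ ü à
};

// Convert unpacked GSM 7-bit codes (one code per byte) to a UTF-8 string.
std::string gsmToUtf8(const uint8_t* codes, std::size_t len) {
  std::string out;
  for (std::size_t i = 0; i < len; ++i) {
    uint8_t c = codes[i] & 0x7F;  // only the low 7 bits are meaningful
    if (c == 0x1B) {              // escape: skip it and the extension code after it
      ++i;
      continue;
    }
    out += kGsmToUtf8[c];         // append 1 or 2 UTF-8 bytes
  }
  return out;
}
```

On the Arduino itself you would rather Serial.print() each table entry directly, since std::string may not be available (e.g. on AVR-based boards).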

If you want to be able to handle extended GSM characters, you will need to check whether 0x1B (escape) is received; the character after it comes from the extension table. The easiest way to map those might be a switch/case.

// mode must keep its value between received characters (e.g. a global or static variable)
if(mode == 0 && gsmChar == 0x1B)
{
  // escape received: the next character is an extended gsm character
  mode = 1;
}
else if(mode == 0)
{
  if(lookup[gsmChar][0] != 0x00)
  {
    // send first byte
    Serial.write(lookup[gsmChar][0]);
  }

  if(lookup[gsmChar][1] != 0x00)
  {
    // send second byte
    Serial.write(lookup[gsmChar][1]);
  }
}
else
{
  // handle extended gsm characters
  ...
  ...
  // after handling an extended character, reset mode to normal
  mode = 0;
}
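For the "handle extended gsm characters" branch, one way to do the switch/case is a function that maps the code following 0x1B to its UTF-8 bytes. This is a sketch in plain C++ based on the GSM 03.38 default extension table; gsmExtendedToUtf8 is a hypothetical name. Note that the euro sign needs three UTF-8 bytes, so it would not fit the two-byte lookup table.

```cpp
#include <cstdint>

// Map the code that follows a GSM escape (0x1B) to UTF-8, using the
// GSM 03.38 default extension table. Returns nullptr for codes not listed.
const char* gsmExtendedToUtf8(uint8_t code) {
  switch (code) {
    case 0x0A: return "\f";            // form feed
    case 0x14: return "^";
    case 0x28: return "{";
    case 0x29: return "}";
    case 0x2F: return "\\";
    case 0x3C: return "[";
    case 0x3D: return "~";
    case 0x3E: return "]";
    case 0x40: return "|";
    case 0x65: return "\xE2\x82\xAC";  // euro sign, 3 UTF-8 bytes
    default:   return nullptr;         // unknown: let the caller decide
  }
}
```

In the else branch you would print the returned string (if it is not nullptr) before resetting mode to 0.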

The above demonstrates the idea; it is not compiled or tested. If you can show your code, you can get more tailored advice.

@sterretje thanks for the quick reply. Things are even easier for me: I don't need to cover all high-ASCII values, only 7 characters (öäüÖÄÜß). I ignore all others.

I'm still confused about the values I received. My test SMS contained "mit mehr öäü". According to the GSM encoding table I linked before, that should be:

{6D,69,74,20,6D,65,68,72,20...} followed by the three umlaut codes {7C,7B,7E}.
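Since only seven characters matter here, a switch/case over their GSM codes may be enough instead of a full table. In the GSM 03.38 default alphabet, ö, ä and ü are 0x7C, 0x7B and 0x7E (as listed above), and Ö, Ä, Ü and ß are 0x5C, 0x5B, 0x5E and 0x1E. A sketch in plain C++, with the hypothetical name gsmUmlautToUtf8:

```cpp
#include <cstdint>

// Only the seven characters needed here: GSM 03.38 code -> UTF-8 bytes.
// Returns nullptr for every other code so the caller can ignore it.
const char* gsmUmlautToUtf8(uint8_t code) {
  switch (code) {
    case 0x7C: return "\xC3\xB6";  // ö
    case 0x7B: return "\xC3\xA4";  // ä
    case 0x7E: return "\xC3\xBC";  // ü
    case 0x5C: return "\xC3\x96";  // Ö
    case 0x5B: return "\xC3\x84";  // Ä
    case 0x5E: return "\xC3\x9C";  // Ü
    case 0x1E: return "\xC3\x9F";  // ß
    default:   return nullptr;
  }
}
```

Passing the remaining codes through unchanged is only safe for characters that GSM and ASCII share (letters, digits, space and most punctuation), not for codes such as 0x00 ('@') or 0x11 ('_').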

With this little method:

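void printASCIICode(const char * myString) {
  // print each byte of the string in hex, one per line
  for(size_t i = 0; myString[i] != '\0'; i++) {
    // cast to uint8_t so bytes >= 0x80 are not sign-extended where char is signed
    Serial.println((uint8_t)myString[i], HEX);
  }
}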

I got this:

6D
69
74
20
6D
65
68
72
20
F6
E4
FC

Why do I get these high numbers (decimal 246, 228 and 252) for the characters that should be 7C, 7B and 7E?

The relevant part of my code to retrieve the SMS:

//any data available from the FONA?
do {
  //Read the notification into fonaNotificationBuffer
  *bufPtr = fona.read();
  delay(1);
} while ((*bufPtr++ != '\n') && (fona.available()) && (++charCount < (sizeof(fonaNotificationBuffer)-1)));

*bufPtr = 0; //Add a terminal NULL to the notification string

if (1 == sscanf(fonaNotificationBuffer, "%*c%*c+CMTI: " FONA_PREF_SMS_STORAGE ",%d", &slot)) {

  // Retrieve SMS sender phone number.
  if (! fona.getSMSSender(slot, callerIDbuffer, 31)) {
    Serial.print(F("Didn't find SMS message in slot "));
    MYDEBUG_PRINTLN(slot);
  }
  MYDEBUG_PRINT(F("FROM: ")); MYDEBUG_PRINTLN(callerIDbuffer);

  // Retrieve SMS value.
  if (fona.readSMS(slot, smsBuffer, 250, &smslen)) { // pass in buffer and max len!
    strcat(msgText, smsBuffer); // the msg
    strcat(msgText, "#");
    strcat(msgText, callerIDbuffer); // the caller's number
    emptyCharBuffer();

    // delete the original msg after it is processed
    deleteSingleSMS(slot);
  }
  // ...
}

Shouldn't this method do the job? It doesn't; I see no change in the message:

void printASCIICode(char * myString) {
  for(unsigned i=0;i<strlen(myString);i++) {
    Serial.println(myString[i],HEX);
    if(myString[i] == 0xf6) {
      Serial.println("Replace");
      myString[i] = 246; // oe = 0xc3,0xb6
    }
  }
}

Oh, I just saw that 0xf6 = 246... so I'm replacing the character with an identical value. Smart...
But why does the original character arrive as 0xf6 and not display as "ö", while a UTF-8 "ö" also gives me 0xf6 when I check its hex value, and that one is displayed?!
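The received bytes F6, E4 and FC are the ISO-8859-1 (Latin-1) codes of ö, ä and ü, which are also their Unicode code points U+00F6, U+00E4 and U+00FC; that may be the 0xF6 seen when looking up the hex value of "ö". A UTF-8 display, however, expects ö as the two bytes 0xC3 0xB6 (the value already noted in the comment in printASCIICode), so replacing one byte in place cannot work. Assuming the modem or library really delivers Latin-1 (this is not confirmed in the thread), a minimal conversion into a separate buffer could look like this; latin1ToUtf8 is a hypothetical helper, written in plain C++ without Arduino calls.

```cpp
#include <cstddef>

// Copy a NUL-terminated ISO-8859-1 (Latin-1) string into out as UTF-8.
// Bytes below 0x80 are copied as-is; bytes 0x80-0xFF become two bytes.
// Returns false (out still NUL-terminated) if out is too small.
bool latin1ToUtf8(const char* in, char* out, std::size_t outSize) {
  if (outSize == 0) return false;
  std::size_t o = 0;
  for (std::size_t i = 0; in[i] != '\0'; ++i) {
    unsigned char c = static_cast<unsigned char>(in[i]);  // avoid sign issues with plain char
    std::size_t need = (c < 0x80) ? 1 : 2;
    if (o + need >= outSize) {  // keep room for the terminating NUL
      out[o] = '\0';
      return false;
    }
    if (c < 0x80) {
      out[o++] = static_cast<char>(c);
    } else {
      out[o++] = static_cast<char>(0xC0 | (c >> 6));    // e.g. 0xF6 -> 0xC3
      out[o++] = static_cast<char>(0x80 | (c & 0x3F));  // e.g. 0xF6 -> 0xB6
    }
  }
  out[o] = '\0';
  return true;
}
```

On the Arduino you could call it on smsBuffer with a second char array about twice its size and then Serial.print() the result.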

Does anyone have an idea why the received char with the value 0xf6 is not displayed as "ö" in the Serial Monitor or on the connected display, while an "ö" that I put in the code returns the same hex value and is displayed in both the Serial Monitor and the display? This should be a very basic thing, I just don't get it... :frowning: