conversion UTF-8 <-> GSM


Does anyone have conversion routines to convert a GSM-encoded SMS string to a UTF-8 string?
I need this to place an SMS into a string buffer for the serial monitor, including special characters such as öäüÖÄÜ.
I also need the other direction: converting a UTF-8 string to a GSM-encoded string.


Is there a specific format for SMS messages?
Can you try to send an SMS yourself with special characters? Perhaps you can let the Arduino print the hexadecimal values, so you can see what kind of encoding is used.
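A minimal sketch of that hex-dump idea, written as plain C++ so it can be tried off the board. `toHex` is a hypothetical helper name, not part of any Arduino or SIM800 library. On the Arduino itself, calling `Serial.print(b, HEX)` for each received byte gives the same information.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

// Hypothetical helper: format each byte of a buffer as two uppercase
// hex digits, separated by single spaces.
std::string toHex(const char* data, size_t len) {
    assert(data != nullptr || len == 0);
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (size_t i = 0; i < len; ++i) {
        unsigned char b = static_cast<unsigned char>(data[i]);
        if (i > 0) out += ' ';
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

For example, the UTF-8 bytes of "ä" come out as `C3 A4`, while the same letter in the GSM 7-bit default alphabet is the single code `7B`, so the dump shows at a glance which encoding the modem delivered.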

I'm using a SIM800L modem, where the SMS text in the AT command uses the GSM character set.
See also:

Arduino uses UTF-8 in the serial monitor. For debugging I want to convert it.
Also, when I use the String type, which I guess also uses UTF-8 encoding, I want to convert to the GSM 7-bit format for sending an SMS via AT command. I want to use some of the special characters in the GSM 7-bit alphabet, like äöüÖÄÜ, which are encoded differently in the two character sets.
So I'm looking for a fast string conversion function.

The Arduino String data type does not use any encoding. UTF-8 is not part of it. String.charAt() and String.length() work on bytes and do not take UTF-8 into account.

So it is the normal English GSM 03.38, with no extra UTF-16 characters and no special language?

I'm afraid you are out of luck :(
You have to write the code yourself. I think that a conversion table is the most common way. The escape character makes it harder, but that can be included in the table. You might even have to use PROGMEM for the table.
Others want it as well: Unicode problem · Issue #18 · cristiansteib/Sim800l · GitHub
The SIM800 is able to send an email in UTF-8, but that will probably not be possible to use with a cell phone.
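The conversion-table approach mentioned above could look roughly like this. It is a sketch under assumptions, not a finished library: the names `GsmEntry`, `kGsmTable`, `utf8ToGsm` and `gsmToUtf8` are made up for illustration, the table covers only a handful of characters (the German ones plus `@`, `$`, `_` and `€`), and it works on unpacked 7-bit codes, one per byte, as used in text mode. It is written as portable C++ with `std::string`; on an Arduino the same logic would work on `char` buffers.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string>

// One table row: Unicode code point, GSM 03.38 code, and whether the code
// lives in the extension table (sent as 0x1B followed by the code).
struct GsmEntry { uint16_t unicode; uint8_t gsm; bool escaped; };

// Partial table (assumption: only these characters are needed). On an
// AVR-based Arduino the array could be moved to PROGMEM.
static const GsmEntry kGsmTable[] = {
    {0x0040, 0x00, false},  // @
    {0x0024, 0x02, false},  // $
    {0x005F, 0x11, false},  // _
    {0x00DF, 0x1E, false},  // ß
    {0x00C4, 0x5B, false},  // Ä
    {0x00D6, 0x5C, false},  // Ö
    {0x00DC, 0x5E, false},  // Ü
    {0x00E4, 0x7B, false},  // ä
    {0x00F6, 0x7C, false},  // ö
    {0x00FC, 0x7E, false},  // ü
    {0x20AC, 0x65, true},   // € (extension table)
};

// ASCII characters whose code is identical in the GSM default alphabet.
static bool sameInBoth(uint32_t c) {
    return c == '\n' || c == '\r' || (c >= 0x20 && c <= 0x23) ||
           (c >= 0x25 && c <= 0x3F) || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

static uint8_t byteAt(const std::string& s, size_t i) {
    return static_cast<uint8_t>(s[i]);
}

// UTF-8 -> GSM 7-bit codes, one code per output byte (not packed).
// Unsupported or malformed input becomes `fallback`.
std::string utf8ToGsm(const std::string& in, char fallback = '?') {
    assert(sameInBoth(static_cast<uint8_t>(fallback)));  // must be valid GSM
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        uint8_t b0 = byteAt(in, i);
        uint32_t cp = 0xFFFD;  // stays U+FFFD if the sequence is not decoded
        size_t n = 1;
        if (b0 < 0x80) {
            cp = b0;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < in.size()) {
            cp = ((b0 & 0x1Fu) << 6) | (byteAt(in, i + 1) & 0x3Fu);
            n = 2;
        } else if ((b0 & 0xF0) == 0xE0 && i + 2 < in.size()) {
            cp = ((b0 & 0x0Fu) << 12) | ((byteAt(in, i + 1) & 0x3Fu) << 6) |
                 (byteAt(in, i + 2) & 0x3Fu);
            n = 3;
        } else {  // 4-byte or malformed sequence: skip its continuation bytes
            while (i + n < in.size() && (byteAt(in, i + n) & 0xC0) == 0x80) ++n;
        }
        i += n;
        const GsmEntry* hit = nullptr;
        for (const GsmEntry& e : kGsmTable) {
            if (e.unicode == cp) { hit = &e; break; }
        }
        if (hit != nullptr) {
            if (hit->escaped) out += '\x1B';
            out += static_cast<char>(hit->gsm);
        } else {
            out += sameInBoth(cp) ? static_cast<char>(cp) : fallback;
        }
    }
    return out;
}

// GSM 7-bit codes (not packed) -> UTF-8. Codes missing from the table become '?'.
std::string gsmToUtf8(const std::string& in) {
    std::string out;
    for (size_t i = 0; i < in.size(); ++i) {
        uint8_t code = byteAt(in, i);
        bool escaped = (code == 0x1B && i + 1 < in.size());
        if (escaped) code = byteAt(in, ++i);
        uint32_t cp = (!escaped && sameInBoth(code)) ? code : '?';
        for (const GsmEntry& e : kGsmTable) {
            if (e.gsm == code && e.escaped == escaped) { cp = e.unicode; break; }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The table is searched linearly because it is small; with the full GSM 03.38 alphabet a 128-entry array indexed by GSM code would probably be faster for the GSM-to-UTF-8 direction. One caveat: `@` maps to GSM code 0x00, which a NUL-terminated C string cannot carry, so buffers need an explicit length.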

Thanks for the reply.
But what encoding is used by String() when I do String("ä"), if not UTF-8?
At least the serial monitor uses UTF-8. Try it yourself with:


It will show you an "ä", which is the UTF-8 byte sequence 0xC3 0xA4.

The SIM800 cannot use UTF-8. According to the datasheet, it supports these character sets:

I'm still looking for a conversion function. I'm wondering if I'm the only one who has this issue.

As far as I know, the String("ä") takes the UTF-8 bytes, but the String class does not even know that it is a single UTF-8 character. It sees a few bytes, that's all.
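To illustrate the difference between bytes and characters, here is a small sketch in plain C++. `utf8Length` is a hypothetical helper, not an Arduino function; it assumes well-formed UTF-8 and counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Count UTF-8 code points in a NUL-terminated string by skipping
// continuation bytes. strlen() on the same string counts bytes instead.
size_t utf8Length(const char* s) {
    assert(s != nullptr);
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For "ä" stored as the bytes 0xC3 0xA4, `strlen` reports 2 while `utf8Length` reports 1, which is the same mismatch you get from a byte-based length function.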

There have been a number of issues with UTF-8 and the Arduino IDE: issues with a temporary *.ino file, issues on Windows and Linux, and issues with the Serial Monitor. They have been fixed over the last few years, so the source code and the Serial Monitor should all be UTF-8.

You are not the only one that wants this conversion, but it seems that no one has made it for the Arduino yet.