The String class stores a UTF-8 representation of your characters, so when you extract the cString you just get a char[N] array, that is, individual bytes in the UTF-8 encoding.
Unicode currently defines 1,114,112 code points (17 planes of 2^16 code points each). A code point is something rather abstract: it can be a glyph, a formatting code, or just reserved for future use. The point is, you have tons of them!
UTF-8 uses an 8-bit variable-length encoding scheme that encodes each Unicode code point using one to four bytes. Since a single 8-bit code unit can only represent 256 values, a code point is represented by a sequence of one to four code units (one to four bytes) to cover all code points, and not all byte sequences are valid. The representation is unambiguous: the lead byte tells you how many bytes to read to get the code point.
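To show what "the lead byte tells you" means in practice, here is a minimal sketch in C (the function name `utf8_seq_len` is mine, not a standard API):

```c
#include <stddef.h>

/* A minimal sketch: derive the sequence length from the UTF-8 lead byte.
   Returns 0 for bytes that cannot start a sequence (continuation
   bytes 0x80..0xBF and the invalid leads 0xC0, 0xC1, 0xF5..0xFF). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead <= 0x7F)                  return 1;  /* 0xxxxxxx (ASCII) */
    if (lead >= 0xC2 && lead <= 0xDF)  return 2;  /* 110xxxxx         */
    if (lead >= 0xE0 && lead <= 0xEF)  return 3;  /* 1110xxxx         */
    if (lead >= 0xF0 && lead <= 0xF4)  return 4;  /* 11110xxx         */
    return 0;                                     /* invalid lead byte */
}
```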
UTF-16 uses a 16-bit variable-length encoding scheme that encodes each Unicode code point using either 2 or 4 bytes (never 1 or 3): code points up to U+FFFF fit in a single 16-bit code unit, everything above needs a pair of code units called a surrogate pair.
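Building the surrogate pair is mechanical; here is a sketch of that step (again, `utf16_encode` is a name I made up for illustration):

```c
#include <stdint.h>
#include <stddef.h>

/* A minimal sketch: encode one code point as UTF-16 code units.
   Code points up to U+FFFF fit in a single 16-bit unit; anything
   above is split into a high/low surrogate pair. Returns the number
   of units written (1 or 2). Rejecting the reserved surrogate range
   U+D800..U+DFFF in the input is omitted for brevity. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp <= 0xFFFF) {                          /* BMP: one code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate  */
    return 2;
}
```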
If you want to send a UTF-16 representation, your function needs to decode the UTF-8 representation and construct the UTF-16 one, where 2 or 4 bytes are always used for each character even if only one was needed in UTF-8 (there is a sketch of the full conversion at the end).
Decoding UTF-8 (finding out how many bytes are used) is not difficult, but you need the mapping to 16 or 32 bits for the characters. Because of the way it uses the first byte of multi-byte sequences, UTF-8 needs 3 bytes for some characters (everything from U+0800 up to U+FFFF) that require only 2 bytes in UTF-16.
That’s where the challenge is, but it does work out for most characters you’ll want to use: go to the bit representation, extract the Unicode code point, and move it to the other representation.
For ASCII characters it’s easy: they fit in one byte and the most significant bit is always 0 in UTF-8. "A" in ASCII is hex 0x41; in UTF-8 it is also 0x41, and there is a straight mapping to UTF-16 with 2 bytes as 0x0041.
When you get out of ASCII it’s a bit more complicated. For example, "À" in UTF-8 takes two bytes, 0xC3 0x80, while in UTF-16 it is 0x00C0.
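Worked out at the bit level, that conversion looks like this:

```c
#include <stdio.h>

int main(void)
{
    /* "À" (U+00C0) arrives as the UTF-8 bytes 0xC3 0x80.
       Strip the 110xxxxx / 10xxxxxx control bits and recombine: */
    unsigned int cp = ((0xC3u & 0x1Fu) << 6)   /* keep the low 5 bits */
                    |  (0x80u & 0x3Fu);        /* keep the low 6 bits */
    printf("U+%04X\n", cp);                    /* prints U+00C0       */
    return 0;
}
```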
You get from one to the other by removing the control bits of UTF-8 and rebuilding the code point value, following the table below (a decoding sketch follows the table).
| code point | UTF-8 | possible 1st byte | # bits to code |
|---|---|---|---|
| U+0000 to U+007F | 0xxxxxxx (ASCII) | 00 to 7F | 7 |
| U+0080 to U+07FF | 110xxxxx 10xxxxxx | C2 to DF | 5+6=11 |
| U+0800 to U+FFFF | 1110xxxx 10xxxxxx 10xxxxxx | E0 to EF | 4+6+6=16 |
| U+10000 to U+10FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | F0 to F4 | 3+6+6+6=21 |
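Here is the table turned into a decoding sketch (`utf8_decode` is an illustrative name; full validation of continuation bytes, overlong forms, and surrogates is omitted for brevity):

```c
#include <stdint.h>
#include <stddef.h>

/* A minimal sketch: read one UTF-8 sequence, strip the control bits
   per the table above, and rebuild the code point. Returns the number
   of bytes consumed, or 0 on an invalid lead byte. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] <= 0x7F) {                     /* 0xxxxxxx */
        *cp = s[0];
        return 1;
    }
    if (s[0] >= 0xC2 && s[0] <= 0xDF) {     /* 110xxxxx 10xxxxxx */
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if (s[0] >= 0xE0 && s[0] <= 0xEF) {     /* 1110xxxx 10xxxxxx 10xxxxxx */
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6)
            |  (s[2] & 0x3F);
        return 3;
    }
    if (s[0] >= 0xF0 && s[0] <= 0xF4) {     /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        *cp = ((uint32_t)(s[0] & 0x07) << 18)
            | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6)
            |  (s[3] & 0x3F);
        return 4;
    }
    return 0;                               /* invalid lead byte */
}
```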
Extracting the code point with this mapping always gives you the right value; for code points up to U+FFFF that value is directly the UTF-16 code, and for the rare ones above you build a surrogate pair as shown earlier. That should be enough for your needs.
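Putting the pieces together, a minimal sketch of the whole conversion, assuming the hypothetical `utf8_decode` and `utf16_encode` helpers sketched above are in scope:

```c
/* Convert a NUL-terminated UTF-8 C string into UTF-16 code units.
   Assumes `out` is large enough (worst case: one 16-bit unit per
   input byte). Returns the number of UTF-16 units written. */
static size_t utf8_to_utf16(const unsigned char *in, uint16_t *out)
{
    size_t n = 0;
    while (*in) {
        uint32_t cp;
        size_t used = utf8_decode(in, &cp);   /* sketch after the table */
        if (used == 0) break;                 /* invalid input: stop    */
        uint16_t units[2];
        size_t k = utf16_encode(cp, units);   /* surrogate-pair sketch  */
        for (size_t i = 0; i < k; i++) out[n++] = units[i];
        in += used;
    }
    return n;
}
```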