How to convert hex UTF-16 values to a String in Arduino

I'm working on a function to display Arabic words correctly on an OLED/LCD. (Each Arabic letter has up to four different forms: isolated, initial, medial, and final.)
I have a table (the Map array) of Arabic letters in their different forms. After identifying each letter in the Arabic text, I need to replace it with the correct form. My question is: how do I copy the Unicode characters from the Map table into a String variable (pBuffer)?

For example, to write the word "باب" you need to select the form of each letter from the Map table and place it in a String variable to send to the OLED/LCD.

How do I select characters from the Map table and put them together in a String variable?
I think I need a function that converts the hex codes from the Map table to characters.

...
const char16_t Map[][5] PROGMEM = {

     /* code, isolated, initial, medial, final */
    {0x0621, 0xFE80, 0x0000, 0x0000, 0x0000 },  //1 /* HAMZA ء*/
    {0x0622, 0xFE81, 0x0000, 0x0000, 0xFE82 },  //2/* ALEF_MADDA آ*/
    {0x0623, 0xFE83, 0x0000, 0x0000, 0xFE84 },  //3/* ALEF_HAMZA_ABOVE أ*/
    {0x0624, 0xFE85, 0x0000, 0x0000, 0xFE86 },  //4/* WAW_HAMZA ؤ*/
    {0x0625, 0xFE87, 0x0000, 0x0000, 0xFE88 },  //5/* ALEF_HAMZA_BELOW إ*/
    {0x0626, 0xFE89, 0xFE8B, 0xFE8C, 0xFE8A },  //6/* YEH_HAMZA ئ*/
    {0x0627, 0xFE8D, 0x0000, 0x0000, 0xFE8E },  //7/* ALEF ا*/
    {0x0628, 0xFE8F, 0xFE91, 0xFE92, 0xFE90 }   //8/* BEH ب*/
};

String pBuffer;
pBuffer += ((char)(Map[7][2]));
pBuffer += ((char)(Map[6][5]));
pBuffer += ((char)(Map[7][3]));

u8g2.setCursor(5, 20);
u8g2.print(pBuffer);
...

Thanks.

First of all:

The values in your Map are not stored in hexadecimal. A notation (decimal, hexadecimal, etc.) is only needed to write a value down as text. Because C++ source code is text, you have to pick a notation for the numbers, and here you chose hex by prefixing the values with 0x. In memory the values no longer have a notation; they are just numbers. (The hardware stores them in binary, but that’s irrelevant here.)

So you don’t need to “convert hex code”. The more likely reason your code isn’t working is that you downcast from 16-bit characters (char16_t) to 8-bit characters (char). Arabic characters need 16 bits, but you throw away half of them by casting to “char”.
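To make the effect of that cast concrete, here is a minimal sketch in plain C++ (the helper name `byteAfterCharCast` is hypothetical, invented for this illustration). It assumes the usual behaviour of common compilers, where narrowing to an 8-bit char keeps only the low byte.

```cpp
#include <cstdint>

// Hypothetical helper: returns the single byte that is left after a
// 16-bit code unit has been narrowed to an 8-bit char.
uint8_t byteAfterCharCast(char16_t c) {
    char narrowed = static_cast<char>(c);   // high byte is discarded here
    return static_cast<uint8_t>(narrowed);  // view the remaining byte as 0..255
}
```

For example, 0xFE91 (BEH, initial form) is reduced to 0x91, and 0x0628 (BEH) is reduced to 0x28, which is the ASCII code of "(". Neither byte identifies the Arabic letter any more.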

Also, maybe the u8g2 object you’re using doesn’t support 16-bit characters at all, in which case what you want won’t be possible. I’m not sure about that, though. If it does support 16-bit characters, you will have to send them to it through some other method.

Another solution is to draw the Arabic characters graphically yourself (with pixels), but that’s a lot more work, of course.

I found this interesting link:

http://utfcpp.sourceforge.net/

Look for the "utf8::utf16to8" example there. It seems to be what you are trying to do.

I found the library here:

https://sourceforge.net/projects/utfcpp/files/utf8cpp_2x/Release%202.3.4/

With any luck it is also compatible with Arduino.

Good luck!
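If pulling in a library is not an option, the same kind of UTF-16 to UTF-8 conversion can be sketched by hand. The code below is only an illustration in plain C++, not the utfcpp implementation: the names `appendUtf8` and `toUtf8` are hypothetical, it uses std::string where an Arduino sketch would use String, and it assumes every code unit is in the Basic Multilingual Plane (no surrogate pairs), which holds for the Arabic values in the Map table.

```cpp
#include <string>

// Append one BMP code unit to a UTF-8 byte string.
// Assumption: c is not part of a surrogate pair.
void appendUtf8(std::string &out, char16_t c) {
    if (c < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(c);
    } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {                              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Convert a whole UTF-16 string (BMP only) to UTF-8.
std::string toUtf8(const std::u16string &in) {
    std::string out;
    for (char16_t c : in) {
        appendUtf8(out, c);
    }
    return out;
}
```

With this, 0xFE91 becomes the three bytes EF BA 91 instead of being cut down to a single byte. The resulting byte string is what a display library that accepts UTF-8 text would need to receive.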

thomasvt:
First of all:

The values in your Map are not stored in hexadecimal. A notation (decimal, hexadecimal, etc.) is only needed to write a value down as text. Because C++ source code is text, you have to pick a notation for the numbers, and here you chose hex by prefixing the values with 0x. In memory the values no longer have a notation; they are just numbers. (The hardware stores them in binary, but that's irrelevant here.)

So you don't need to "convert hex code". The more likely reason your code isn't working is that you downcast from 16-bit characters (char16_t) to 8-bit characters (char). Arabic characters need 16 bits, but you throw away half of them by casting to "char".

Also, maybe the u8g2 object you're using doesn't support 16-bit characters at all, in which case what you want won't be possible. I'm not sure about that, though. If it does support 16-bit characters, you will have to send them to it through some other method.

Thanks for the explanations.
I have done almost all of the work; the only remaining problem is displaying the Arabic text.
The same project has been done in other languages, but C++ is more restrictive.

https://github.com/Naheel-Azawy/c-arabic-reshaper

https://github.com/soimy/arabic-persian-reshaper

thomasvt:
Another solution is to draw the Arabic characters graphically yourself (with pixels), but that's a lot more work, of course.

I implemented this method. See the link below.
(This method requires a font design.)

https://github.com/idreamsi/arduino-persian-reshaper

But I need to design a general-purpose function, once and for all, so that it can be used with Arduino and with other libraries (for example, u8g2).
The u8g2 library also supports UTF-8, but it does not join Arabic letters into connected words, so the text has to be reshaped first.
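As a rough illustration of what such a reshaping function could look like, here is a minimal sketch in plain C++ built on the same five-column table layout as the Map array in the question. The names `kForms`, `findRow`, and `reshape` are hypothetical, the table is kept in ordinary memory rather than PROGMEM, and only the eight letters listed in the question are covered. It assumes a letter can join the following letter only if it has an initial form, and can join the preceding letter only if it has a final form.

```cpp
#include <cstddef>
#include <string>

// Columns: code, isolated, initial, medial, final. 0x0000 = form does not exist.
static const char16_t kForms[][5] = {
    {0x0621, 0xFE80, 0x0000, 0x0000, 0x0000},  // HAMZA
    {0x0622, 0xFE81, 0x0000, 0x0000, 0xFE82},  // ALEF_MADDA
    {0x0623, 0xFE83, 0x0000, 0x0000, 0xFE84},  // ALEF_HAMZA_ABOVE
    {0x0624, 0xFE85, 0x0000, 0x0000, 0xFE86},  // WAW_HAMZA
    {0x0625, 0xFE87, 0x0000, 0x0000, 0xFE88},  // ALEF_HAMZA_BELOW
    {0x0626, 0xFE89, 0xFE8B, 0xFE8C, 0xFE8A},  // YEH_HAMZA
    {0x0627, 0xFE8D, 0x0000, 0x0000, 0xFE8E},  // ALEF
    {0x0628, 0xFE8F, 0xFE91, 0xFE92, 0xFE90},  // BEH
};
static const std::size_t kFormCount = sizeof(kForms) / sizeof(kForms[0]);

// Return the table row for a base letter, or nullptr if it is not listed.
static const char16_t *findRow(char16_t code) {
    for (std::size_t i = 0; i < kFormCount; ++i) {
        if (kForms[i][0] == code) return kForms[i];
    }
    return nullptr;
}

// Replace each listed base letter with its contextual form.
// Input and output are both in logical (typing) order.
std::u16string reshape(const std::u16string &in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t *cur = findRow(in[i]);
        if (!cur) {               // not an Arabic letter we know: copy unchanged
            out += in[i];
            continue;
        }
        const char16_t *prev = (i > 0) ? findRow(in[i - 1]) : nullptr;
        const char16_t *next = (i + 1 < in.size()) ? findRow(in[i + 1]) : nullptr;
        // The previous letter joins forward only if it has an initial form.
        bool joinsPrev = prev && prev[2] != 0 && cur[4] != 0;
        // This letter joins forward only if the next one has a final form.
        bool joinsNext = next && cur[2] != 0 && next[4] != 0;
        if (joinsPrev && joinsNext)  out += cur[3];  // medial
        else if (joinsPrev)          out += cur[4];  // final
        else if (joinsNext)          out += cur[2];  // initial
        else                         out += cur[1];  // isolated
    }
    return out;
}
```

For "باب" (0x0628, 0x0627, 0x0628) this selects initial BEH, final ALEF, and isolated BEH. The sketch ignores ligatures such as LAM-ALEF, diacritics, and right-to-left ordering for the display, all of which a complete reshaper would have to handle. Two details of the question's code are worth noting here: the column indices run from 0 to 4, so `Map[6][5]` reads past the end of the row, and on AVR boards a PROGMEM table is normally read with `pgm_read_word(&Map[row][col])` rather than by indexing it directly.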

sherzaad:
I found this interesting link:

http://utfcpp.sourceforge.net/

Look for the "utf8::utf16to8" example there. It seems to be what you are trying to do.

I found the library here:

https://sourceforge.net/projects/utfcpp/files/utf8cpp_2x/Release%202.3.4/

With any luck it is also compatible with Arduino.

Good luck!

Thank you.
I will check.