Problem with UTF-8 or Unicode support on any display

I am new to Arduino programming and my English is not good, so sorry for any mistakes.

Recently I decided to make a project with an Arduino Uno/Mega and a display such as st7735, ssd1302, sh1106, nokia5110 or any other display, because I have a collection of all of them.

In my project I need to show some Bengali words such as "আমার সোনার", "বাংলা আমি", "মুহিত আব্দ", etc.

So I searched for a library that can display Unicode as well as ASCII letters and sentences.

I tested Adafruit_GFX, which did not work for me. Finally I found a library from olikraus, U8g2_for_Adafruit_GFX. This is an amazing library for me,

because it supports many native languages like Bengali, with a huge number of fonts and icons.

But when I try to show the Bengali word "আমার" on my ST7735 display, the output is আম ার , and when I try to show the Bengali word "মুহিত", the output is ম ুহ িত .

olikraus gave me a solution to this problem, and it works fine for most Bengali letters and words. He solved it by reducing the distance between the given letters. The code is attached below:

#include <Adafruit_GFX.h>
#include <Adafruit_ST7735.h>
#include <SPI.h>
#include <U8g2_for_Adafruit_GFX.h>

#define TFT_CS   8
#define TFT_RST  7
#define TFT_DC   6

Adafruit_ST7735 tft = Adafruit_ST7735(TFT_CS, TFT_DC, TFT_RST);
U8G2_FOR_ADAFRUIT_GFX u8g2_for_adafruit_gfx;

const uint16_t distance_adjust_table[] = 
{
  /* first char, second char, gap reduction value */
  0x09AE, 0x09BE, 12, /* reduce distance between ম  and া  by 12 */
  /* add more pairs here... */
  0x09B8, 0x09CB, 14, //reduce distance between স and ো
  0x09A8, 0x09BE, 12, //reduce distance between ন and া
  0x09AE, 0x09BF, 16, //reduce distance between ম and ি
  0x09AC, 0x09BE, 12, //reduce distance between ব and া
  0x09BE, 0x0982, 12, //reduce distance between া and ং
  0x09B2, 0x09BE, 12, //reduce distance between ল and া
  0x09AC, 0x09CD, 12, //reduce distance between ব and ্
  0x09CD, 0x09A6, 12, //reduce distance between ্ and দ
  /* this line terminates the table */
  0xffff, 0xffff, 0xffff
};

/* get distance from the distance table */
uint16_t get_distance_adjust(uint16_t e1, uint16_t e2)
{
  uint16_t i;
  i = 0;
  for(;;)
  {
    if ( distance_adjust_table[i] == 0x0ffff )
      break;
    if ( distance_adjust_table[i] == e1 && distance_adjust_table[i+1] == e2 )
      return distance_adjust_table[i+2];
    i+=3;
  }
  return 0;
}

int16_t draw_string(U8G2_FOR_ADAFRUIT_GFX &u8g2, int16_t x, int16_t y, const char *str)
{
  uint16_t e_prev = 0x0ffff;
  uint16_t e;
  uint16_t delta, adjust, sum;
  
  delta = 0;
  adjust = 0;
  u8g2.utf8_state = 0;
  
  sum = 0;
  for(;;)
  {
    e = u8g2.utf8_next((uint8_t)*str);
    if ( e == 0x0ffff )
      break;
    str++;
    if ( e != 0x0fffe )
    {
      delta = u8g2_GetGlyphWidth(&(u8g2.u8g2), e);      
      adjust = get_distance_adjust(e_prev, e);
      e_prev = e;
      u8g2_DrawGlyph(&(u8g2.u8g2), x-adjust, y, e);
      x += delta-adjust;
      sum += delta-adjust;    
    }
  }
  return sum;
}

void setup(void) {
  tft.initR(INITR_BLACKTAB);      // Init ST7735S chip, black tab
  tft.setRotation(0);
  tft.fillScreen(ST77XX_BLACK);
  u8g2_for_adafruit_gfx.begin(tft);
}

void loop() {
  u8g2_for_adafruit_gfx.setFontMode(1);                 // use u8g2 transparent mode (this is default)
  u8g2_for_adafruit_gfx.setFontDirection(0);            // left to right (this is default)
  u8g2_for_adafruit_gfx.setForegroundColor(ST77XX_WHITE);      // apply Adafruit GFX color
  u8g2_for_adafruit_gfx.setCursor(0,20);                // start writing at this position
  u8g2_for_adafruit_gfx.setFont(u8g2_font_unifont_t_bengali);  // select Bengali font
  draw_string(u8g2_for_adafruit_gfx, 0,20, "আমার সোনার");
  draw_string(u8g2_for_adafruit_gfx, 0,40, "বাংলা আমি");          // draw Bengali string
  draw_string(u8g2_for_adafruit_gfx, 0,60, "আব্দ ল্লাহ"); 
}

In Bengali there are some compound letters like " ব্দ ", which is formed from "ব+্+দ", and "ম্ম", which is formed from "ম+্+ম". If this type of compound letter works, then I can complete my main project perfectly.

Now I need to display compound letters of this type, like " ব্দ " and "ম্ম". I think this is a more difficult task than the previous one, and I don't know how to do it.

If someone knows how to do this kind of thing, please let me know.

@muhit114474, although there is basically nothing wrong with where you posted your topic, it has been moved to the Displays section of the forum where it might get the attention it needs.


@muhit114474, welcome!

You already use transparent mode, so for compound letters, drawing one glyph on top of the other should be easy. Try zero distance, or maybe a negative distance.
Meaning: reduce the distance by the width of the previous symbol.
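
A minimal sketch of that suggestion, written as plain C++ so the arithmetic can be checked without a display. The helper name `layout_glyphs` and the glyph width of 16 are assumptions for illustration only; the positions follow the same rule as `draw_string` in the first post (draw at `x - adjust`, then advance by `delta - adjust`).

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical helper: compute where each glyph would be drawn.
// widths[i]  : glyph width (the "delta" in draw_string)
// adjusts[i] : gap reduction applied before glyph i
// draw_x[i]  : receives the x position at which glyph i would be drawn
// Returns the total advance, like draw_string does.
int16_t layout_glyphs(const uint16_t *widths, const uint16_t *adjusts,
                      size_t n, int16_t x, int16_t *draw_x)
{
  int16_t sum = 0;
  for (size_t i = 0; i < n; i++) {
    int16_t delta = (int16_t)widths[i];
    int16_t adjust = (int16_t)adjusts[i];
    draw_x[i] = (int16_t)(x - adjust);   // same position as u8g2_DrawGlyph(..., x-adjust, ...)
    x = (int16_t)(x + delta - adjust);   // advance the pen
    sum = (int16_t)(sum + delta - adjust);
  }
  return sum;
}
```

With widths {16, 16} and adjusts {0, 16}, both glyphs get the same x position, so in transparent mode the second one is drawn on top of the first.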

I am also a fan of U8g2_for_Adafruit_GFX.

Jean-Marc

Aha, these are trigrams. So you may need an additional table for trigrams.
Sometimes I am tempted to give a quick answer, when the true experts are too slow to answer.


@sterretje

although there is basically nothing wrong with where you posted your topic, it has been moved to the Displays section of the forum where it might get the attention it needs.

Sorry for choosing the wrong category. Thank you very much.

@ZinggJM Thank you for your reply.

You are right. But as I am new to this, it may be quite hard for me to make an additional table for trigrams. I will try my best to follow your instructions.

If someone gives me some hints or an example, it will be easier for me.

I don't dare to give an example or code snippet, as I make too many errors if I don't test.
But the idea I have is: use a (separate) distance adjust table for the symbols to combine, and explicitly test for the combining symbol, provided this symbol is only used for this purpose.

@ZinggJM
Is there another way to show the combined letters, such as using the hex value of the combined letter, for example "\U00000444"? olikraus gave me this hint.
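
For context, an escape like "\U00000444" names one code point by its hex value, and the string then holds that code point as UTF-8 bytes (assuming the usual UTF-8 setup of the Arduino toolchain), the same bytes as when the character is typed directly. So a compound letter such as "ব্দ" would still be three code points when written this way. A rough sketch of that encoding step; `encode_utf8` is a hypothetical helper, not part of U8g2_for_Adafruit_GFX:

```cpp
#include <stdint.h>
#include <stddef.h>

// Encode one code point as UTF-8 (no range validation).
// out must have room for at least 5 bytes.
// Returns the number of bytes written, not counting the terminating '\0'.
size_t encode_utf8(uint32_t cp, char *out)
{
  size_t n = 0;
  if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
    out[n++] = (char)cp;
  } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
    out[n++] = (char)(0xC0 | (cp >> 6));
    out[n++] = (char)(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out[n++] = (char)(0xE0 | (cp >> 12));
    out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[n++] = (char)(0x80 | (cp & 0x3F));
  } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out[n++] = (char)(0xF0 | (cp >> 18));
    out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[n++] = (char)(0x80 | (cp & 0x3F));
  }
  out[n] = '\0';
  return n;
}
```

For example, 0x0444 becomes the two bytes D1 84, and 0x09AC (ব, as listed in the distance table) becomes the three bytes E0 A6 AC.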

I don't understand your question. I will take a look at the issue you raised to try to understand. But I am no font expert in any way. I assumed the symbols in question are really trigrams, "ব+্+দ", and that the '+্+' is one Unicode character that changes the look of the combination of the symbols on both sides, so that it differs from the look the symbols would have when directly adjacent. And I assumed the '+্+' is not normally used separately and alone.

How many Unicode characters are these "ব+্+দ"?

But I am busy for now, sorry.
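
The question of how many Unicode characters make up "ব+্+দ" can be checked with a small decoder. This is only a sketch: `decode_utf8` and `conjunct_bda` are hypothetical names, the decoder does no validation, and the byte values are derived from the code points 0x09AC, 0x09CD and 0x09A6 that appear in the distance table of the first post.

```cpp
#include <stdint.h>
#include <stddef.h>

// Decode a UTF-8 string into code points (no validation of the input).
// Returns the number of code points written to out (at most max_out).
size_t decode_utf8(const char *s, uint32_t *out, size_t max_out)
{
  size_t count = 0;
  while (*s != '\0' && count < max_out) {
    uint8_t b = (uint8_t)*s++;
    uint32_t cp;
    int extra;                                        // continuation bytes to read
    if (b < 0x80)      { cp = b;        extra = 0; }  // 1-byte sequence
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }  // 2-byte sequence
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }  // 3-byte sequence
    else               { cp = b & 0x07; extra = 3; }  // 4-byte sequence
    while (extra-- > 0 && *s != '\0')
      cp = (cp << 6) | ((uint8_t)*s++ & 0x3F);
    out[count++] = cp;
  }
  return count;
}

// "ব্দ" written as explicit UTF-8 bytes, so the result does not depend on
// how the editor saves the source file.
static const char conjunct_bda[] = "\xE0\xA6\xAC" "\xE0\xA7\x8D" "\xE0\xA6\xA6";
```

Decoding `conjunct_bda` gives three code points: 0x09AC (ব), 0x09CD (্) and 0x09A6 (দ).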

@ZinggJM You are exactly right. The '+্+' is one Unicode character that changes the look of the combined letters on both sides, and the combined letter looks different from the symbols placed directly adjacent.

But I am busy for now, sorry.

Oh, OK. If you have any free time, please give me some hints or an example for making the trigram table. That would help me a lot.

If you don't have enough time to test it, no problem: I will test it myself and let you know whether it works.

Else you would indeed need a replacement unicode for it.

Yes, but I don't know how to do that. I am waiting for an expert like you.

I think you have 2 options.

Either you create a table with trigrams and distance adjustment, with '+্+' as the middle character, and also search through this table.

Or you create a table for trigrams, containing only the characters to combine with distance adjustment, and search through it whenever you encounter a '+্+'. In this case you draw the symbol for '+্+' if the adjacent characters are not found in the table.
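
The second option could be sketched roughly like this. Everything here is an assumption for illustration: the names `ConjunctEntry`, `find_conjunct` and `plan_glyphs` are hypothetical, the adjust values are placeholders that would need tuning on the real display, and overlaying two glyphs may not reproduce the true conjunct shape. The code points are the ones from the distance table in the first post, with 0x09CD as the '্' sign.

```cpp
#include <stdint.h>
#include <stddef.h>

static const uint16_t VIRAMA = 0x09CD;   // the '্' sign

// One entry per pair of letters that may be combined around a '্'.
struct ConjunctEntry { uint16_t before; uint16_t after; uint16_t adjust; };

static const ConjunctEntry conjunct_table[] = {
  { 0x09AC, 0x09A6, 8 },   // ব + ্ + দ (adjust value is a placeholder)
  { 0x09AE, 0x09AE, 8 },   // ম + ্ + ম (adjust value is a placeholder)
};

// Search the table; returns nullptr if the pair is not listed.
const ConjunctEntry *find_conjunct(uint16_t before, uint16_t after)
{
  const size_t n = sizeof(conjunct_table) / sizeof(conjunct_table[0]);
  for (size_t i = 0; i < n; i++)
    if (conjunct_table[i].before == before && conjunct_table[i].after == after)
      return &conjunct_table[i];
  return nullptr;
}

// What to draw: a code point and the gap reduction to apply before it.
struct PlannedGlyph { uint16_t code; uint16_t adjust; };

// Turn decoded code points into a draw plan. A '্' whose neighbours are in
// the table is skipped and the following letter gets the table's adjust;
// otherwise the '্' is kept and drawn like any other glyph.
// out must have room for n entries. Returns the number of planned glyphs.
size_t plan_glyphs(const uint16_t *in, size_t n, PlannedGlyph *out)
{
  size_t count = 0;
  size_t i = 0;
  while (i < n) {
    if (in[i] == VIRAMA && i > 0 && i + 1 < n) {
      const ConjunctEntry *e = find_conjunct(in[i - 1], in[i + 1]);
      if (e != nullptr) {
        out[count].code = in[i + 1];
        out[count].adjust = e->adjust;
        count++;
        i += 2;              // skip the '্' and the letter just planned
        continue;
      }
    }
    out[count].code = in[i];
    out[count].adjust = 0;
    count++;
    i++;
  }
  return count;
}
```

Each planned glyph could then be drawn the way `draw_string` does it, using the planned `adjust` in place of the value from `get_distance_adjust`.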

This will terminate my effort to try to support you, as I have other pending work to do.

Jean-Marc

This forum section might be for you: Jobs and Paid Consultancy

But I don't know how to make a table for trigrams and do the distance adjustment.

Can you give an example of a trigram table for one compound letter? Then I will be able to make it for the other letters.

@muhit114474

In your first post you provide the code that so far works for you. I would expect that anyone would be able to extend that code with the table for trigrams, including you.

But maybe you need to make clear that you invite any reader to help you, not just me or Oliver.

But in Bengali there are some compound letters like " ব্দ ", which is formed from "ব+্+দ", and "ম্ম", which is formed from "ম+্+ম". If this type of compound letter works, then I can complete my main project perfectly.

At first sight, the first case looked like it can be done by combining the two glyphs, using distance adjust. But in fact the wanted symbol needs a different glyph. For the second case this is more obvious. My coding attempt so far failed, and will never produce the desired result anyway.


I got code working for my approach somewhat, as far as that approach is usable.

See Problem with "u8g2_font_unifont_t_bengali" · Issue #33 · olikraus/U8g2_for_Adafruit_GFX (github.com)

Jean-Marc

This "Problem" is far more complex than is obvious from the original post.

See e.g. Developing OpenType Fonts for Bengali Script - Typography | Microsoft Docs

Jean-Marc

I was not allowed to post this addition as a separate reply. I don't like this restriction!
