SOLVED: UTF8 to extended ASCII conversion partially fails

Hello,

In a sketch for an ESP I use the following code, based on an example from the Arduino Playground:

uint8_t utf8Ascii(uint8_t ascii) {                      
  static uint8_t cPrev;
  uint8_t c = '\0';
  
  if (ascii < 0x80 || ascii == degCascii || ascii == degFascii) {
    cPrev = '\0';
    c = ascii;
  } else {
    switch (cPrev) {
      case 0xC2: c = ascii;  break;
      case 0xC3: c = ascii | 0xC0;  break;
      case 0x82: if (ascii==0xAC) c = 0x80;   // Euro symbol special case
    }
    cPrev = ascii;                                     // save last char
  }
  return(c);
}


String utf8AsciiStr(char * s) {
                                                           
  uint8_t c, k = 0;
  
  while (*s != '\0') {
    c = utf8Ascii(*s++);
    if (c != '\0')
      tmpBuffer[k++] = c;
  }
  tmpBuffer[k] = '\0';
  return(tmpBuffer);
}

I set all relevant char arrays, including "tmpBuffer", to a size of 500, but the conversion only works up to a maximum length of 243 characters. If I try to convert a longer array, the beginning is cut off and only the last 243 characters of the array are converted.

Can someone explain why this happens - is there any limitation with this conversion function?
I also tried larger array and buffer sizes, but that didn't help.

Update:
Having done some further testing, I can report that the limit is not exactly 243 characters; the strange behaviour starts at about 250 characters. Sometimes the beginning of the string is cut off, sometimes only the last character is shown; it seems to depend on the "kind" of characters being converted. Single characters separated by spaces (a b c d) seem to exceed the limit, whereas words (abcd) lead to the opposite.
This is really strange, and looking at the conversion function I have no explanation for what causes this limitation. Memory? I increased the buffer size significantly, without effect. Or could it be a timing problem?
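A likely explanation, which the thread itself never pins down: in utf8AsciiStr() the write index is declared as `uint8_t c, k = 0;`, so k can only hold the values 0..255. After the 256th output character, `k++` wraps back to 0 and new characters overwrite the start of tmpBuffer, and the final `tmpBuffer[k] = '\0'` lands at (output length mod 256). That is consistent with the reported symptoms (beginning cut off, sometimes only the last character left), and it explains why enlarging tmpBuffer cannot help: no index above 255 is ever written. Because UTF-8 lead bytes are dropped, the output length, and with it the cut-off point, may also vary with the input text. The two helpers below, copy_with_uint8_index() and copy_with_size_t_index(), are hypothetical illustrations (not code from the thread) that reproduce the loop with and without the 8-bit index.

```c
#include <stdint.h>
#include <string.h>

// Hypothetical helper mirroring the loop in utf8AsciiStr():
// the write index k is a uint8_t, so it can only hold 0..255.
size_t copy_with_uint8_index(const char *src, char *dst)
{
    uint8_t k = 0;
    while (*src != '\0') {
        dst[k++] = *src++;   // after index 255, k wraps to 0 and overwrites the start
    }
    dst[k] = '\0';           // the terminator lands at (length % 256)
    return strlen(dst);
}

// The fix: use an index type wide enough for the whole buffer.
size_t copy_with_size_t_index(const char *src, char *dst)
{
    size_t k = 0;
    while (*src != '\0') {
        dst[k++] = *src++;
    }
    dst[k] = '\0';
    return k;
}
```

With a 300-character input, the uint8_t version keeps only the last 44 characters (300 - 256), while the size_t version keeps all 300. In the original function, declaring k as `size_t` (or any type large enough for tmpBuffer) would remove the limit.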

Update 2 - I got it running.

I used a code snippet grabbed somewhere on the web without understanding exactly how it works, which wasn't a good idea...
Unlike the example code from the Arduino Playground above, "my" function used an additional variable "tmpBuffer". Without knowing the exact reason, I found out that this additional variable caused the malfunction.
So I took the "in-place conversion" example from the Arduino Playground, which was also a good opportunity to get rid of the String class. I made a small change because I also want to keep the unconverted array:

uint8_t utf8Ascii(uint8_t ascii) {                                    //converts a single character
  static uint8_t cPrev;
  uint8_t c = '\0';
  
  if (ascii < 0x80 || ascii == degCascii || ascii == degFascii) {
    cPrev = '\0';
    c = ascii;
  } else {
    switch (cPrev) {
      case 0xC2: c = ascii;  break;
      case 0xC3: c = ascii | 0xC0;  break;
      case 0x82: if (ascii==0xAC) c = 0x80;   // Euro symbol special case
    }
    cPrev = ascii;                            // save last char
  }
  return(c);
}



// Picks every character from source, converts it and writes it into destination.
void convert(char* source, char* destination)
{
        int k=0;
        char c;
        for (int i=0; i<strlen(source); i++)
        {
                c = utf8Ascii(source[i]);
                if (c!=0)
                        destination[k++]=c;
        }
        destination[k]=0;
}

So if you want to convert an array, just call convert(--source-array--, --destination-array--);
In case you want to overwrite the source array you can use convert(--source-array--, --source-array--);
Perhaps this can help someone in the future.

for (int i=0; i<strlen(source); i++)

Can be improved to:

for (int i=0; source[i]; i++)

Otherwise strlen() counts the characters in source again for every character you convert, which makes the loop O(n²).

Also, since the convert function isn't modifying source, you should mark it as const:

void convert(const char* source, char* destination)
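Putting both suggestions together, a self-contained sketch of the conversion could look like this. It reuses utf8Ascii() from the thread with two assumptions: the degCascii/degFascii pass-through is left out (those constants are defined elsewhere in the original sketch), and the ASCII test is written as `ascii < 0x80` so that every 7-bit character passes through. Because each input byte produces at most one output byte, the write index never overtakes the read index, which is why calling convert(buf, buf) for in-place conversion is safe.

```c
#include <stdint.h>

// utf8Ascii() as posted in the thread, minus the degCascii/degFascii pass-through.
// Returns 0 for bytes that should be dropped (lead bytes, unknown sequences).
uint8_t utf8Ascii(uint8_t ascii)
{
    static uint8_t cPrev;                 // previous byte of the current UTF-8 sequence
    uint8_t c = '\0';

    if (ascii < 0x80) {                   // 7-bit ASCII passes through unchanged
        cPrev = '\0';
        c = ascii;
    } else {
        switch (cPrev) {
            case 0xC2: c = ascii; break;                      // U+0080..U+00BF
            case 0xC3: c = ascii | 0xC0; break;               // U+00C0..U+00FF
            case 0x82: if (ascii == 0xAC) c = 0x80; break;    // Euro sign (E2 82 AC)
        }
        cPrev = ascii;
    }
    return c;
}

// convert() with both suggestions applied: source is const, and the loop stops
// at the terminating '\0' instead of calling strlen() on every iteration.
void convert(const char *source, char *destination)
{
    int k = 0;
    for (int i = 0; source[i]; i++) {
        uint8_t c = utf8Ascii((uint8_t)source[i]);
        if (c != 0)
            destination[k++] = (char)c;   // k never exceeds i, so in-place use is safe
    }
    destination[k] = '\0';
}
```

For example, after `char buf[32] = "Gr\xC3\xBC\xC3\x9F" "e";` a call to `convert(buf, buf);` leaves the single-byte codes for "Grüße" (0xFC for ü, 0xDF for ß) in buf.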

Thanks a lot for your hints, christop!

I'm facing a new problem and I haven't been able to solve it yet:
The JSON-formatted char strings I receive from an API sometimes contain the strange character "…". In Unicode it is known as "U+2026 horizontal ellipsis"; as far as I know there is no ASCII code for this character.
I run an LED matrix (MAX7219) and use the function described above to display special characters like ä, ö, ü, ß, € etc. I'd like to replace the "…" with one or more characters that cause no problems (+ or +++, for example). This should be implemented in the described function; I tried it, but unfortunately without any success. How could I change this function to achieve my goal?

uint8_t utf8Ascii(uint8_t ascii) {                                    //converts a single character
  static uint8_t cPrev;
  uint8_t c = '\0';
  
  if (ascii < 0x80 || ascii == degCascii || ascii == degFascii) {
    cPrev = '\0';
    c = ascii;
  } else {
    switch (cPrev) {
      case 0xC2: c = ascii;  break;
      case 0xC3: c = ascii | 0xC0;  break;
      case 0x82: if (ascii==0xAC) c = 0x80;   // Euro symbol special case
    }
    cPrev = ascii;                            // save last char
  }
  return(c);
}

Thanks in advance for your help!
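To see which bytes utf8Ascii() actually receives for "…", it helps to encode U+2026 by hand. Code points from U+0800 to U+FFFF take three bytes in UTF-8, laid out as 1110xxxx 10xxxxxx 10xxxxxx. The helper utf8_encode3() below is hypothetical, an illustration only and not part of the thread's code. U+2026 comes out as E2 80 A6, and the Euro sign U+20AC as E2 82 AC, which is why the existing code checks for cPrev == 0x82 followed by 0xAC.

```c
#include <stdint.h>

// Hypothetical helper: encodes a code point in the range U+0800..U+FFFF as
// three UTF-8 bytes (1110xxxx 10xxxxxx 10xxxxxx). Surrogates are not rejected.
void utf8_encode3(uint16_t cp, uint8_t out[3])
{
    out[0] = (uint8_t)(0xE0 | (cp >> 12));          // lead byte: top 4 bits
    out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));  // continuation: middle 6 bits
    out[2] = (uint8_t)(0x80 | (cp & 0x3F));         // continuation: low 6 bits
}
```

Calling `utf8_encode3(0x2026, b)` fills b with 0xE2, 0x80, 0xA6, so the byte arriving right before 0xA6 is 0x80; that is the value a new case in the switch would have to key on.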

I got help and found a solution.

I use this conversion function together with the MD_Parola library for an LED matrix. This library supports custom font sets, which can easily be created with a font editor.

With this in mind, and knowing that the "horizontal ellipsis" is character number 133 in my own font set, you can follow the solution posted by the creator of the library.
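A minimal sketch of how such a mapping could be added to utf8Ascii(), assuming the target code is 133 (0x85) as in the poster's font. This is not the library author's posted solution, which is not reproduced here, and ELLIPSIS_CODE is a hypothetical name. 133 is also the position of "…" in Windows-1252, the same code page in which the Euro sign is 0x80. The ellipsis arrives as E2 80 A6, so the new case fires when the previous byte was 0x80 and the current one is 0xA6. The Euro case now needs a `break`, since another case follows it. As in the thread's version, only the last two bytes are checked, not the E2 lead byte; the degCascii/degFascii pass-through is omitted and the ASCII test uses `< 0x80`.

```c
#include <stdint.h>

// Hypothetical target code for U+2026 in a custom font (133 = 0x85, as in the
// poster's font and in Windows-1252). Adjust to match your own font set.
#define ELLIPSIS_CODE 133

// utf8Ascii() extended for U+2026 HORIZONTAL ELLIPSIS (UTF-8: E2 80 A6).
uint8_t utf8Ascii(uint8_t ascii)
{
    static uint8_t cPrev;                   // previous byte of the current sequence
    uint8_t c = '\0';                       // 0 = drop this byte

    if (ascii < 0x80) {                     // 7-bit ASCII passes through unchanged
        cPrev = '\0';
        c = ascii;
    } else {
        switch (cPrev) {
            case 0xC2: c = ascii; break;            // U+0080..U+00BF
            case 0xC3: c = ascii | 0xC0; break;     // U+00C0..U+00FF
            case 0x82:                              // E2 82 AC = Euro sign
                if (ascii == 0xAC) c = 0x80;
                break;                              // needed now that a case follows
            case 0x80:                              // E2 80 A6 = horizontal ellipsis
                if (ascii == 0xA6) c = ELLIPSIS_CODE;
                break;
        }
        cPrev = ascii;
    }
    return c;
}
```

If the font had no ellipsis glyph, defining ELLIPSIS_CODE as '+' would substitute a single plus sign. A multi-character replacement such as "+++" does not fit this one-byte-in, one-byte-out function and would have to be handled in convert() instead.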