strcmp or strcasecmp: Unicode string comparison

Hello everyone,

The two functions mentioned in the title don't seem to work with UTF-8 strings. Is there another way to compare UTF-8 strings?

I think strcmp does work. It compares bytes, so two identical UTF-8 strings compare as equal.

Try this:

const char * str1 = "éàô";
const char * str2 = "hello";
const char * str3 = "éàô";
const char * str4 = "ÉÀÔ";


void setup() {
  Serial.begin(115200);
  Serial.print("comparing 1 and 2 => "); Serial.println(strcmp(str1, str2));
  Serial.print("comparing 1 and 3 => "); Serial.println(strcmp(str1, str3));
  Serial.print("comparing (no case) 1 and 4 => "); Serial.println(strcasecmp(str1, str4));
}

void loop() {}

You should get no match (non-zero) for the first comparison, a match (output 0) for the second, and no match for the last one. Case-insensitive comparison most likely fails because the C library's toupper()/tolower() work on single bytes and do not understand UTF-8 multi-byte sequences.

The Unicode standard specifies a set of rules for folding characters to a common (lower-case) form, and it's not simple. If you want to sort UTF-8 strings, it's even more complicated; see the Unicode Collation Algorithm.
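If you only need case-insensitive matching for a few scripts, one minimal sketch is to decode each UTF-8 sequence into a code point and fold only the ranges you care about. The helper names (`nextCodePoint`, `simpleFold`, `utf8CaseCmp`) are hypothetical, and this is deliberately NOT full Unicode case folding: it covers ASCII, the Latin-1 letters À to Þ, and the unaccented Greek capitals Α to Ω (plus final sigma), and it ignores accented Greek capitals, multi-character folds, and collation order.

```cpp
#include <stdint.h>

// Decode one UTF-8 sequence starting at s and advance s past it.
// Minimal: a stray or invalid lead byte is returned unchanged as its own value.
static uint32_t nextCodePoint(const char *&s) {
  const unsigned char c = static_cast<unsigned char>(*s++);
  if (c < 0x80) return c;                                     // 1 byte: ASCII
  int extra;
  uint32_t cp;
  if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; }  // 2-byte sequence
  else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }  // 3-byte sequence
  else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }  // 4-byte sequence
  else return c;                                              // stray/invalid byte
  // Consume continuation bytes (10xxxxxx); stops early at '\0' or bad input.
  while (extra-- > 0 && (static_cast<unsigned char>(*s) & 0xC0) == 0x80) {
    cp = (cp << 6) | (static_cast<unsigned char>(*s++) & 0x3F);
  }
  return cp;
}

// Simple one-to-one lower-case mapping for a few ranges only (NOT full Unicode folding).
static uint32_t simpleFold(uint32_t cp) {
  if (cp >= 'A' && cp <= 'Z') return cp + 0x20;                        // ASCII
  if (cp >= 0x00C0 && cp <= 0x00DE && cp != 0x00D7) return cp + 0x20; // Latin-1 À..Þ, skipping ×
  if (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) return cp + 0x20; // Greek Α..Ω (U+03A2 unassigned)
  if (cp == 0x03C2) return 0x03C3;                                     // final sigma ς -> σ
  return cp;
}

// Case-insensitive comparison of two UTF-8 strings, code point by code point.
// Returns 0 on match, otherwise -1/1 by folded code point value (not linguistic order).
int utf8CaseCmp(const char *a, const char *b) {
  while (*a && *b) {
    const uint32_t ca = simpleFold(nextCodePoint(a));
    const uint32_t cb = simpleFold(nextCodePoint(b));
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  if (*a) return 1;   // a is longer
  if (*b) return -1;  // b is longer
  return 0;
}
```

With this, `utf8CaseCmp("ÉÀÔ", "éàô")` and `utf8CaseCmp("ΕΝΗΜΕΡΩΣΗ", "ενημερωση")` both return 0. Anything beyond these ranges needs a proper case-folding table or a library such as ICU, which is usually too large for a small microcontroller.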


Well, that's weird. Your example seems to work fine, but when I tried something similar, it didn't work.

I decoded a UCS-2 SMS, and while it printed exactly the string I wanted on the Serial Monitor, strcmp didn't match it.
Maybe I need to test it more.
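To see why a decoded UCS-2 message turns into a longer byte string, here is a minimal sketch of UCS-2 to UTF-8 encoding. It is NOT PDUlib's code: `ucs2ToUtf8` is a hypothetical helper, and it assumes that `getText()` hands back UTF-8 (the Serial Monitor showing the Greek text correctly suggests so, but that is an assumption).

```cpp
#include <stddef.h>
#include <stdint.h>

// Encode `count` UCS-2 code units as a null-terminated UTF-8 string.
// Returns the number of bytes written (excluding '\0'), or (size_t)-1 if `out`
// is too small, in which case `out` is left as an empty string.
// UCS-2 has no surrogate pairs, so each unit is encoded on its own.
size_t ucs2ToUtf8(const uint16_t *in, size_t count, char *out, size_t outSize) {
  if (outSize == 0) return (size_t)-1;
  size_t n = 0;
  for (size_t i = 0; i < count; ++i) {
    const uint16_t u = in[i];
    const size_t need = (u < 0x80) ? 1 : (u < 0x800) ? 2 : 3;  // UTF-8 length of this unit
    if (n + need + 1 > outSize) {       // keep room for the terminator
      out[0] = '\0';
      return (size_t)-1;
    }
    if (need == 1) {                    // U+0000..U+007F: 0xxxxxxx
      out[n++] = static_cast<char>(u);
    } else if (need == 2) {             // U+0080..U+07FF (includes Greek): 110xxxxx 10xxxxxx
      out[n++] = static_cast<char>(0xC0 | (u >> 6));
      out[n++] = static_cast<char>(0x80 | (u & 0x3F));
    } else {                            // U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
      out[n++] = static_cast<char>(0xE0 | (u >> 12));
      out[n++] = static_cast<char>(0x80 | ((u >> 6) & 0x3F));
      out[n++] = static_cast<char>(0x80 | (u & 0x3F));
    }
  }
  out[n] = '\0';
  return n;
}
```

Each Greek capital (U+0391 to U+03A9) becomes two bytes, so the 9-character word ΕΝΗΜΕΡΩΣΗ encodes to 18 bytes, starting with 0xCE 0x95 for Ε.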

Thanks!

Share more info, such as the exact byte array of the message.

You need to be careful with Unicode characters: a single character may take several bytes of storage.
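A quick way to see the difference between bytes and characters is to count both. This is a minimal sketch assuming valid UTF-8 input; `utf8CodePoints` is a hypothetical helper name. It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the other bytes counts code points.

```cpp
#include <stddef.h>
#include <string.h>  // strlen(), which counts bytes, for comparison

// Count code points in a UTF-8 string by skipping continuation bytes (10xxxxxx).
// Assumes valid UTF-8 input.
size_t utf8CodePoints(const char *s) {
  size_t count = 0;
  for (; *s; ++s) {
    if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++count;  // lead or ASCII byte
  }
  return count;
}
```

For example, `strlen("ΕΝΗΜΕΡΩΣΗ")` is 18 while `utf8CodePoints("ΕΝΗΜΕΡΩΣΗ")` is 9; for "UPDATE" both are 6. Buffer sizes and byte comparisons therefore have to be based on bytes, not on the number of visible characters.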

Allow me to explain.

I've made this snippet that shows exactly what I'm aiming for:

char smsMessage[140];  // SMS message text
memset(smsMessage, '\0', sizeof(smsMessage)); // not strictly needed: snprintf() always null-terminates
snprintf(smsMessage, sizeof(smsMessage), "%s", mypdu.getText()); // mypdu.getText() returns the message decoded from UCS-2
Serial.println(smsMessage); // prints "ΕΝΗΜΕΡΩΣΗ" or "UPDATE"

if (strcmp(smsMessage, "ΕΝΗΜΕΡΩΣΗ") == 0 || strcmp(smsMessage, "UPDATE") == 0) {
  // valid command
}

This doesn't work.

I also tried this without any luck.

if (strcmp_P(smsMessage, PSTR("ΕΝΗΜΕΡΩΣΗ")) == 0 || strcmp_P(smsMessage, PSTR("UPDATE")) == 0) {
}

If the message is in English, it works, but if it is in Greek (UTF-8), the comparison fails.

Print out the actual hex value of each byte of the SMS message, do the same for the text you are comparing it against, and then you can see where it goes wrong.
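A minimal sketch of that debugging step, written in portable C++ so it can also be tried on a PC: one hypothetical helper formats a string's bytes as hex, another reports the index of the first differing byte. Both names (`hexBytes`, `firstMismatch`) are illustrative, not part of any library; on an Arduino the formatted buffer would simply be passed to `Serial.println()`.

```cpp
#include <stddef.h>
#include <stdio.h>

// Format every byte of s as two hex digits separated by spaces, e.g. "CE 95 CE 9D".
// Stops early (without splitting a byte) if out is too small.
// On Arduino the buffer could then be printed with Serial.println(out).
void hexBytes(const char *s, char *out, size_t outSize) {
  if (outSize == 0) return;
  out[0] = '\0';
  size_t n = 0;
  for (; *s; ++s) {
    const size_t need = (n ? 3 : 2) + 1;   // optional space + two digits + '\0'
    if (n + need > outSize) break;
    n += snprintf(out + n, outSize - n, n ? " %02X" : "%02X",
                  static_cast<unsigned>(static_cast<unsigned char>(*s)));
  }
}

// Index of the first byte where a and b differ, or -1 if they are identical.
long firstMismatch(const char *a, const char *b) {
  long i = 0;
  while (a[i] != '\0' && a[i] == b[i]) ++i;
  return (a[i] == b[i]) ? -1 : i;
}
```

Dumping `smsMessage` and the literal "ΕΝΗΜΕΡΩΣΗ" this way would show, for instance, whether the decoded text carries extra bytes (such as a trailing carriage return) or uses a different encoding than the string literal. The thread does not say which, if either, is the actual cause.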

Can you share that? (Along with the actual bytes it was decoded from.)

This is a function of PDUlib.
You can find it here: GitHub - mgaman/PDUlib (Encode/Decode PDU strings for use with most GSM modems; both 7-bit and 16-bit alphabets are supported).

I'll first try what David recommends and report back with my results.

Thanks again!
