Serial won't read UTF-8

Hello,

I've been trying to read serial data that contains some UTF-8 characters, but I can't get it to work. If I type the string into the serial monitor it shows correctly, but if I receive the string over serial it shows something completely different.

This is from an external program, which shows the message I receive and the correct conversion:
image

This is what my Arduino reads and prints:

You typed byte (Decimal): 21, Hex: 0x15
Non-printable character.
You typed byte (Decimal): 64, Hex: 0x40
Character: @
You typed byte (Decimal): 50, Hex: 0x32
Character: 2
You typed byte (Decimal): 0, Hex: 0x0
Non-printable character.

This is the sketch I'm using:

void setup() {
  // Initialize serial communication at 1200 baud rate
  Serial.begin(1200);
}

void loop() {
  // Check if data is available to read from Serial Monitor
  if (Serial.available() > 0) {
    // Read the incoming byte
    byte incomingByte = Serial.read();
    
    // Print the byte value in decimal and hexadecimal
    Serial.print("You typed byte (Decimal): ");
    Serial.print(incomingByte, DEC);
    Serial.print(", Hex: 0x");
    Serial.println(incomingByte, HEX);
    
    // Check if the byte is a printable character
    if (incomingByte >= 32 && incomingByte <= 126) {
      Serial.print("Character: ");
      Serial.println((char)incomingByte);
    } else {
      Serial.println("Non-printable character.");
    }
  }
}

UTF-8 is used instead.

Based on your example, it looks like what you're receiving isn't a string at all, just some kind of byte sequence. I think you need to better describe what you do to reproduce that input of bytes, and what you mean by "can't get it to work".

Just as an example, remember that the serial monitor (like any generic terminal emulator) isn't the best tool for debugging byte communications, because it is basically "human oriented" (i.e. it sends and receives "human-readable" characters). So how are you sending 0x0 or 0x15 bytes over the serial monitor?

Anyway, if you're sure the incoming data is UTF-8 (or Unicode...) characters, you need to treat them as such: Arduino can't directly manage Unicode characters, so you need some kind of translation. But all this strictly depends on what we're talking about and what you need to do with the received UTF-8 string...
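To make "some kind of translation" concrete, here is a minimal sketch of a byte-at-a-time UTF-8 decoder, assuming the incoming data really is UTF-8. `Utf8Decoder` and `feed()` are hypothetical names, not part of the Arduino core; the idea is to pass each byte from `Serial.read()` to `feed()` and act only when it reports a complete character. It does not reject overlong encodings or surrogates, so treat it as a starting point rather than a full validator.

```cpp
#include <stdint.h>

// Hypothetical helper (not an Arduino API): incremental UTF-8 decoder.
struct Utf8Decoder {
  uint32_t codePoint = 0;  // code point being assembled
  uint8_t remaining = 0;   // continuation bytes still expected

  // Feed one received byte.
  // Returns  1 when codePoint holds a complete character,
  //          0 when more bytes are needed,
  //         -1 when the byte does not fit a UTF-8 sequence.
  int feed(uint8_t b) {
    if (remaining == 0) {
      if (b < 0x80) { codePoint = b; return 1; }  // plain ASCII
      if ((b & 0xE0) == 0xC0) { codePoint = b & 0x1F; remaining = 1; return 0; }
      if ((b & 0xF0) == 0xE0) { codePoint = b & 0x0F; remaining = 2; return 0; }
      if ((b & 0xF8) == 0xF0) { codePoint = b & 0x07; remaining = 3; return 0; }
      return -1;  // stray continuation byte or invalid lead byte
    }
    if ((b & 0xC0) != 0x80) {  // expected a 10xxxxxx continuation byte
      remaining = 0;
      return -1;
    }
    codePoint = (codePoint << 6) | (b & 0x3F);
    remaining--;
    return (remaining == 0) ? 1 : 0;
  }
};
```

In a sketch, you would keep one `Utf8Decoder` as a global and call `feed((uint8_t)Serial.read())` inside the `Serial.available()` check, printing `codePoint` whenever the result is 1.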

Please do not change the title of your topic.

If Serial Monitor cannot handle UTF-8, you have to use some other output device. Most pixel display libraries support at least the BMP (Basic Multilingual Plane) encoded as UTF-8.

A character outside the ASCII range, when encoded as UTF-8, usually becomes two, but maybe three or four, bytes, all of them outside the ASCII range. This is by design. So every byte you receive for such a character should be 0x80 or greater. So yes, what you're getting is "completely different".
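As a rough illustration of that rule, the following sketch classifies a single received byte by its top bits. `Utf8Kind` and `classifyUtf8Byte()` are hypothetical names introduced here, not anything from the Arduino core, and the function only looks at the bit pattern: it does not flag lead bytes that are never legal in practice, such as 0xC0 or 0xC1.

```cpp
#include <stdint.h>

// Hypothetical helper: what role can this byte play in a UTF-8 stream?
enum Utf8Kind {
  UTF8_ASCII,         // 0xxxxxxx: a complete one-byte character
  UTF8_LEAD2,         // 110xxxxx: starts a two-byte character
  UTF8_LEAD3,         // 1110xxxx: starts a three-byte character
  UTF8_LEAD4,         // 11110xxx: starts a four-byte character
  UTF8_CONTINUATION,  // 10xxxxxx: follows a lead byte
  UTF8_INVALID        // 11111xxx: never appears in UTF-8
};

Utf8Kind classifyUtf8Byte(uint8_t b) {
  if (b < 0x80) return UTF8_ASCII;
  if ((b & 0xC0) == 0x80) return UTF8_CONTINUATION;
  if ((b & 0xE0) == 0xC0) return UTF8_LEAD2;
  if ((b & 0xF0) == 0xE0) return UTF8_LEAD3;
  if ((b & 0xF8) == 0xF0) return UTF8_LEAD4;
  return UTF8_INVALID;
}
```

Applied to the bytes in the original post (0x15, 0x40, 0x32, 0x00), every one comes back as `UTF8_ASCII`, so none of them can be part of a multi-byte character.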

Squinting at your screenshot, the first character is þ (thorn), code point 0xFE. If you send all-ASCII abc, does that work? Then if you change that to aþc, what do you get instead of the expected output below?

You typed byte (Decimal): 97, Hex: 0x61
Character: a
You typed byte (Decimal): 195, Hex: 0xC3
Non-printable character.
You typed byte (Decimal): 190, Hex: 0xBE
Non-printable character.
You typed byte (Decimal): 99, Hex: 0x63
Character: c

C3 BE is the UTF-8 encoding for FE.

Those "extended" characters are by mutual agreement between the sender and receiver. You said it works in the Serial Monitor. It's the one that decides to send the text using UTF-8. If instead the sender uses Latin1 (ISO-8859-1) or Windows-1252 (for example), it would just send the 0xFE as-is.
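To show the difference between the two conventions, here is a small sketch that turns one Latin-1 (ISO-8859-1) byte into its UTF-8 form. `latin1ToUtf8()` is a hypothetical helper, not a library function. It relies on Latin-1 byte values being identical to Unicode code points 0x00 to 0xFF; that is not true for Windows-1252 in the 0x80 to 0x9F range, which would need a lookup table instead.

```cpp
#include <stdint.h>

// Hypothetical helper: convert one Latin-1 byte to UTF-8.
// Writes 1 or 2 bytes into out and returns how many were written.
uint8_t latin1ToUtf8(uint8_t b, uint8_t out[2]) {
  if (b < 0x80) {
    out[0] = b;  // ASCII is the same in both encodings
    return 1;
  }
  out[0] = static_cast<uint8_t>(0xC0 | (b >> 6));    // lead byte: 110000xx
  out[1] = static_cast<uint8_t>(0x80 | (b & 0x3F));  // continuation: 10xxxxxx
  return 2;
}
```

For the thorn, a Latin-1 sender puts the single byte 0xFE on the wire, while this conversion yields the two bytes 0xC3 0xBE that a UTF-8 sender would transmit.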

That you're getting random-ish bytes, including 0x00 (NUL), points toward some other more fundamental problem.

It seems you are interfacing with a serial device driver on Windows that logs IRP_MJ_READ requests, indicating it is handling data transmission from a connected peripheral, possibly using a binary protocol at the kernel level.

However, 1200 baud feels suspicious because it is extremely slow. Are you sure about that?

Agreed, and even trying to convert the hex sequence to Unicode throws an error.
So it doesn't look like Unicode characters at all, and the OP should either investigate that communication further or describe what he's trying to do.

Which UTF code did you expect?

None, because that hex sequence doesn't look like a valid Unicode encoding, so I think it's just some kind of binary response.
