shows some characters okay, and some not. The result is attached.
The same sketch shows different wrong characters every time.
Type 'i' and <enter> to show it.
Removing the 'F()' macro doesn't help.
With Arduino 1.5.8, Arduino Mega 2560, Linux 14.04 64-bit.
At 1200 baud every UTF-8 character shows wrong, while at 115200 baud almost 90% show okay. That is very strange.
Adding delays or using Serial.write() does not help.
Does Arduino support UTF-8 or not? Or only now and then?
Works perfectly for me using Arduino IDE 1.5.6 R2 on Windows 8.1 64-bit, tested on two Arduino Mega 2560 clones: one from Sainsmart and one from "DCcEle" (which, unlike other clones or the original board, uses a CH340 UART). I tried different baud rates and all worked perfectly!
Thanks everyone for looking into this.
I tested it with other serial terminal applications. Some support UTF-8, some don't, but the result is always consistent.
Only certain versions of the Arduino serial monitor show this random mix of correct and incorrect output.
I would call this: UTF-8 to the serial monitor is not supported and buggy.
I can't attach a file in the Playground, and I need to do that because of a bug in "Get code". So I attach the file here and link to it from the Playground: Arduino Playground - UTF-8
otherwise I got an additional space (or maybe it's a '00' char) after each char (within the terminal of the Arduino IDE under Win7).
I could not figure out why that is the case.
That example shows the extra bytes that are part of the wide string, which is why its contents differ from what you get with the F() macro. But this is essentially what wide characters are. The problem is that your serial monitor is treating the data as ASCII. If you save the text into a UTF file or view it in a UTF-enabled editor, the characters will display correctly.
The way you have it outlined is how you could use it in code (if Serial.write accepted wchar_t). Except as you may notice, the second and fifth character do not print properly.
Serial.write() only sends a single byte at a time (pgm_read_byte was used too). And using wchar_t as the pointer type makes the pointer arithmetic step two bytes per increment: ((wchar_t*)arr + idx)
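To make the pointer arithmetic point concrete, here is a minimal desktop C++ sketch. It assumes a 2-byte wide character as on AVR, so uint16_t stands in for wchar_t (which is often 4 bytes on desktop compilers). The names Wide and byteOffset are made up for illustration and are not part of any Arduino API.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for AVR's 2-byte wchar_t (assumption: desktop wchar_t may be wider).
using Wide = std::uint16_t;

// Hypothetical helper: how many bytes does ((T*)base + idx) move away from base?
template <typename T>
std::size_t byteOffset(const void *base, std::size_t idx) {
    const T *p = static_cast<const T *>(base) + idx;  // arithmetic in units of sizeof(T)
    return static_cast<std::size_t>(
        reinterpret_cast<const unsigned char *>(p) -
        static_cast<const unsigned char *>(base));
}
```

With a Wide array, byteOffset<Wide>(arr, 1) is 2 while byteOffset<unsigned char>(arr, 1) is 1, so reading one byte at each wide index skips every second byte.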
Writing out 2 bytes for each character is not UTF-8; that's UTF-16LE (or UCS-2). Characters like the copyright symbol only happen to print correctly because the serial monitor is interpreting the bytes as some 8-bit (non-UTF-8) character encoding, perhaps ISO-8859-1 or Windows-1252.
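A rough sketch of the difference, in plain C++ rather than an Arduino sketch. The function names utf16leBytes and utf8TwoByte are hypothetical, and the second one only covers code points from U+0080 to U+07FF, which is enough for © (U+00A9) and µ (U+00B5).

```cpp
#include <cstdint>
#include <string>

// Serialize one code point below U+10000 (no surrogates) as UTF-16LE: low byte first.
std::string utf16leBytes(std::uint16_t cp) {
    std::string out;
    out += static_cast<char>(cp & 0xFF);
    out += static_cast<char>(cp >> 8);
    return out;
}

// Encode a code point in the range U+0080..U+07FF as its two-byte UTF-8 sequence.
std::string utf8TwoByte(std::uint16_t cp) {
    std::string out;
    out += static_cast<char>(0xC0 | (cp >> 6));    // 110xxxxx: upper 5 bits
    out += static_cast<char>(0x80 | (cp & 0x3F));  // 10xxxxxx: lower 6 bits
    return out;
}
```

For © this gives A9 00 as UTF-16LE and C2 A9 as UTF-8. An ISO-8859-1 terminal shows A9 as © followed by a stray 00 byte, which would match the extra "space" after each character reported above.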
Yes, you are right, thanks for pointing that out.
The examples here make it clear:
--> UTF-8 - Wikipedia
(UTF-8 characters are 1 to 4 bytes long; the € sign, for example, is three bytes long [E2 82 AC])
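The length of a UTF-8 character can be read off its first byte. A minimal sketch in plain C++, with a made-up function name (utf8SequenceLength); it is simplified and does not reject the invalid lead bytes 0xC0, 0xC1 or 0xF5 to 0xF7.

```cpp
#include <cstddef>
#include <cstdint>

// Length in bytes of the UTF-8 sequence that starts with `lead`.
// Returns 0 for a continuation byte (10xxxxxx) or for 0xF8 and above.
std::size_t utf8SequenceLength(std::uint8_t lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}
```

For the € sign, the first byte E2 gives 3, and the following bytes 82 and AC are continuation bytes.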
In the meantime I found out that the Arduino IDE works seamlessly with UTF-8.
That means all of the following work with UTF-8 strings of variable byte length:
the F() macro
the PROGMEM modifier
the PSTR() macro
strcpy, strncpy
strcpy_P
String class (including length(), which counts bytes)
sizeof (with an array of chars)
// Test for normal strings with UTF-8
// Public Domain

char three[] = "3µV";
const char four[] PROGMEM = "4µ€ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱";
String five = "5µF";
char six[] = "60€";
String seven = "70€₡₢₣₤₥₦";
char buffer[80];   // large enough for 'four' (79 bytes including the zero terminator)

void setup()
{
  Serial.begin( 9600);
#if defined (__AVR_ATmega32U4__)
  while(!Serial); // For Leonardo, wait for serial port
#endif

  Serial.println("\n+++++++++++++++++++++++++++++++++++++++++");
  Serial.println("Use a serial terminal that supports UTF-8");
  Serial.println(F("1µ€ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱")); // Good, text in flash

  // copy a string from flash memory to a buffer.
  sprintf_P( buffer, PSTR("2µH")); // Good, text in flash
  Serial.println( buffer);

  // copy a string in ram to a buffer
  strcpy( buffer, three); // Good
  Serial.println( buffer);

  // add one to strlen for the zero terminator
  strncpy( buffer, three, strlen(three) + 1); // Good, strlen counts the bytes of a UTF-8 string
  Serial.println( buffer);

  strcpy_P( buffer, four); // Good, text in flash with PROGMEM
  Serial.println( buffer);

  // copy a string in flash to buffer byte for byte
  for( size_t i = 0 ; i < sizeof( four) ; i++) // Good, sizeof counts bytes, including the zero terminator
  {
    buffer[i] = pgm_read_byte( four + i);
  }
  Serial.println( buffer);

  Serial.println( five); // Good, a String class with UTF-8 character

  // strlen and String.length() report bytes, not characters
  Serial.print( "array of char: \"");
  Serial.print( six);
  Serial.print( "\", strlen=");
  Serial.println( strlen(six));
  Serial.print( "String object: \"");
  Serial.print( seven);
  Serial.print( "\", String.length()=");
  Serial.println( seven.length());

  Serial.println( "+++++++++++++++++++++++++++++++++++++++++");
  Serial.println( "Enter a UTF-8 character and press <enter>");
  Serial.println( "The hexadecimal value will be displayed.");
}

void loop()
{
  if( Serial.available())
  {
    Serial.print( "You have entered: ");
    delay(100); // allow the rest of the line to be received.
    while( Serial.available())
    {
      byte c = Serial.read();
      if( c != '\r' && c != '\n') // ignore trailing CR and LF
      {
        if( c < 0x10) // pad single hex digits with a leading zero
          Serial.print( "0");
        Serial.print( c, HEX);
        Serial.print( ", ");
      }
    }
    Serial.println();
  }
}
This leads to the following (correct) output on a UTF-8 capable terminal:
(note the strlen and String.length() outputs, which count bytes, not characters)
+++++++++++++++++++++++++++++++++++++++++
Use a serial terminal that supports UTF-8
1µ€ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱
2µH
3µV
3µV
4µ€ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱
4µ€ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱
5µF
array of char: "60€", strlen=5
String object: "70€₡₢₣₤₥₦", String.length()=23
+++++++++++++++++++++++++++++++++++++++++
Enter a UTF-8 character and press <enter>
The hexadecimal value will be displayed.
You have entered: E2, 82, AC,
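Since strlen and String.length() report bytes, the number of visible characters has to be counted separately if it is needed. A minimal sketch in plain C++, assuming well-formed UTF-8 input; the function name utf8CodePoints is made up for illustration and is not part of the Arduino core.

```cpp
#include <cstddef>
#include <cstring>

// Count code points in a zero-terminated UTF-8 string by skipping
// continuation bytes (10xxxxxx). Assumes the input is well-formed.
std::size_t utf8CodePoints(const char *s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;  // this byte starts a new character
        }
    }
    return n;
}
```

For "60€" strlen gives 5 and this function gives 3; for "70€₡₢₣₤₥₦" the byte length is 23 and the character count is 9. Combining characters are counted as separate code points.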