Non-standard compiler behaviour?

Hi,

I've been trying to code some strings, including non-printable characters, to send to a serial port for testing purposes. I see that:

char messageC[] = "\x2c1234567\x2c\x0d";

is interpreted by the Arduino 1.8.3 desktop environment as a 3-character string (plus null terminator) equivalent to
"\x67\x2C\x0D"

The C / C++ references I am familiar with all state that \x is used to convert a 2-digit hexadecimal number. I would expect the example shown to compile the same as
",1234567,\x0d" because \x2c equates to a comma, or
"\x2c\x31\x32\x33\x34\x35\x36\x37\x2c\x0d"

I discovered this because I wanted to replace an incorrect checksum character in a string, specifically the character after \x01, with A. I didn't want to look up that A is \x41. I should have been able to put A in the string directly after \x01 (i.e. "\x01A"), but the compiler would not create 2 characters, rendering all my prior C knowledge irrelevant. I cannot find escape sequences covered specifically anywhere in the Arduino documentation, so I'm not sure why the behaviour should differ from that of documented compilers.

Many thanks for everyone's efforts in getting Arduino to be such a great usable tool. Perhaps this can help it be perfect.

Hexadecimal escape sequences have no length limit and terminate at the first character that is not a valid hexadecimal digit. If the value represented by a single hexadecimal escape sequence does not fit the range of values represented by the character type used in this string literal (char, char16_t, char32_t, or wchar_t), the result is unspecified.

http://en.cppreference.com/w/cpp/language/escape

So it is well-documented, expected behaviour.

I can not find escape sequences specifically in any Arduino documentation

Arduino uses standard gcc. They decline to fully document standard gcc…

The C / C++ references I am familiar with all state that \x is used to convert a 2 digit hexadecimal number. I would expect the example shown to compile the same as
",1234567,\x0d" because \x2c equates to a comma,

Really? Do you have an example compiler that does what you expect? All the gccs that I have issue a warning and presumably do the same thing as the AVR gcc, and clang/llvm rejects it outright with an error (see below). Microchip xc8 1.33 is silent but generates the same 0x672c0d00 string.

BillW-MacOSX-2<4989> /usr/local/CrossPack-AVR-20100115/bin/avr-gcc -c xxx.c
xxx.c:1:19: warning: hex escape sequence out of range
BillW-MacOSX-2<4990> /usr/local/avr8-Atmel-3.6.0.487/bin/avr-gcc -c xxx.c
xxx.c:1:19: warning: hex escape sequence out of range
 char messageC[] = "\x2c1234567\x2c\x0d";
                   ^
BillW-MacOSX-2<4991> gcc-fsf-4.7 -c xxx.c
xxx.c:1:19: warning: hex escape sequence out of range [enabled by default]
BillW-MacOSX-2<4992> clang --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

BillW-MacOSX-2<4994> clang -c xxx.c
xxx.c:1:20: error: hex escape sequence out of range
char messageC[] = "\x2c1234567\x2c\x0d";
                   ^~~~~~~~~~~
1 error generated.

Yes, I've encountered that same problem. The hex escape is greedy. The fix:

char messageC[] = "\x2c""1234567\x2c\x0d";

Thanks for your replies.

I had already found the reference in Whandall's link and read as far as:
"\nnn arbitrary octal value byte nnn
\xnn arbitrary hexadecimal value byte nn
\unnnn ..."
without noticing the footnote he quotes. My reading simply confirmed the long-time incorrect assumption I have apparently held. Many thanks for quoting the footnote and clarifying the matter.

I started C by reading Kernighan & Ritchie and using a Borland C cross-compiler. Checking a Borland C compiler manual from 1987, I do note that their implementation limited the number of digits in hex escape sequences, in the same way that the length of octal escape sequences is limited. That would appear to contradict "C: A Reference Manual", 3rd edition (1991), which cautions that hex escape sequences are greedy (without actually using the word "greedy").

More recently I have used CCS C and Microchip XC compilers without being tripped up by this issue.

As far as the fix goes, personally I would avoid "\x2c""1234567\x2c\x0d" and would consider "\x2c" "1234567\x2c\x0d" instead (with a space added), because I'm too used to seeing an embedded "" sequence used in another language to represent a literal quote character (where C uses \").

Thanks for your feedback. I have learnt something.