String concatenation problems with +

In my last project I tried to send data between an Arduino UNO and my PC through an Adafruit HUZZAH ESP8266 module, reading sensors, controlling servos etc.
To allow for further error handling, both boards processed the incoming string data, for example using its first char as mode selection, and at the end of each state the data was reassembled and sent to the next device.

The problem was that the reassembled string was different from the original one. For the reassembly I used the String() command, putting the pieces together from a bunch of char and String constants and variables (by constant I mean "string" and 'char') with the + operator in between them. The error was present on both the UNO and the ESP8266.

I wrote a test sketch specifically to uncover what the problem was and how it behaves. I think it is an inconsistency in how the + operator works within the String() command.

The code, including the Serial monitor output:

char ch = 'A'; // Variable stored char
String str = "asd"; // Variable stored string
String conc; // String for concatenation

bool test = true; // Run test bit

void setup() {
  Serial.begin(115200);
}

void loop() {
  if (test){ // Runs only once
    
////////////  COMPARISON OF PREMADE AND AUTO CONCATENATION //////////////////
    conc = String(ch+'_'+str+"_");
    Serial.print(conc); // BUG HERE
    Serial.print('\n');
    Serial.print(ch+'_'+str+"_"); // BUG HERE
    Serial.print('\n');
/////////// FIX BY STRING CONVERSION /////////////////////
    Serial.print(String(ch)+'_'+str+"_");
    Serial.print('\n');
    Serial.print('\n');
/////////// TESTING SINGLE CHAR CONVERSION //////////////////////
    Serial.print('A');
    Serial.print('\n');
    Serial.print(ch);
    Serial.print('\n');
    Serial.write('A');
    Serial.print('\n');
    Serial.write(ch);
    Serial.print('\n');
    Serial.print('\n');
/////////// TESTING WITHOUT CHAR VARIABLE /////////////////////
    Serial.print('A'+'_'+str+"_"); // BUG HERE
    Serial.print('\n');
    Serial.print('\n');
/////////// TESTING WITHOUT VARIABLES ///////////////////
    //Serial.print('A'+'_'+"asd"+"_"); SYNTAX ERROR
    //Serial.print('\n');
    //Serial.print('\n');
/////////// TESTING WITH ONLY STRING CONSTANTS //////////
    //Serial.print("A"+"_"+"asd"+"_"); SYNTAX ERROR
    //Serial.print('\n');
    //Serial.print('\n');
/////////// TESTING WITH ONLY CHAR CONSTANTS /////////////
    Serial.print('A'+'_'+'a'+'s'+'d'+'_'); // BUG HERE
    Serial.print('\n');
    Serial.print('\n');
/////////// TESTING WITH NO ADJACENT CHARS /////////////
    Serial.print('A'+"asd"+'B'+str); // BUG HERE
    Serial.print('\n');
    Serial.print('A'+"asd"+str+'B'); // BUG HERE
    Serial.print('\n');
    Serial.print('\n');
/////////// TESTING WITH STRING START /////////////
    Serial.print(str+'A'+'_'+'a'+'s'+'d'+'_');
    Serial.print('\n');
    Serial.print("asd"+'A'+'_'+'a'+'s'+'d'+'_'); // BUG HERE
    Serial.print('\n');
    Serial.print(str+'A');
    Serial.print('\n');
    Serial.print("asd"+'A'); // BUG HERE
    Serial.print('\n');
    Serial.print(str+ch);
    Serial.print('\n');
    Serial.print("asd"+ch); // BUG HERE
    Serial.print('\n');
    Serial.print('\n');
////////// TESTING STRING CONST+VAR ///////////////
    Serial.print("asd"+str);
    Serial.print('\n');
    Serial.print(str+"asd");
    Serial.print('\n');
    Serial.print('\n');

    test = false;
  }
}

// CONCLUSION_1: Starting String() with a char (variable or constant) will add further
// characters to it with byte value, instead of concatenation until the first String
// variable (but not constant).

// CONCLUSION_2: Concatenating a string constant with char(s) (variable or constant), with
// the string constant and at least one char coming before any String variable, will cause
// the data to disappear or collapse.

// CONCLUSION_3: Adding more than one string constant to concatenate before a string variable
// will result in a data type mismatch error.

/* SERIAL MONITOR RESULTS (115200 baud):

160asd_
160asd_
A_asd_

A
A
A
A

160asd_

567

A
A
A
A

160asd_

567

asd
asdB

asdA_asd_
[long run of unreadable garbage characters spanning several lines, omitted]
asdA

asdA


asdasd
asdasd

 */

I think the main problem is that, until a String variable or a freshly converted String appears in the expression, the + operator adds the components numerically instead of concatenating them. After a String variable, + always concatenates.

I made three observations in the test, which I repeat here:

  1. Before a String variable (or fresh conversion) appears in the expression, all char variables and constants are added together by their byte values instead of being concatenated. The sum is then turned into a string, as in a number-to-string conversion.

  2. A string constant cannot stop the addition of chars; instead, all the data up to that point will vanish or break.

  3. Only one string constant can come before a String variable in the expression, otherwise compilation fails with a data type mismatch error.
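
Observations 2 and 3 look like ordinary C++ pointer arithmetic: a string constant such as "asd" decays to a const char* pointer, so adding a char to it moves the pointer by that char's numeric value instead of appending anything, and adding two such pointers is not allowed at all. The following is a minimal sketch in plain C++ for a PC, not Arduino code; skipIntoLiteral and checkExamples are names made up for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// "asd" is an array of 4 const chars: 'a', 's', 'd', '\0'.
// In an expression it decays to a pointer, so "asd" + n means
// "the address n bytes after the 'a'", not "append n".
std::string skipIntoLiteral(int offset) {
    const char* literal = "asd";
    // Only offsets 0..3 stay inside the array. Anything larger is
    // undefined behaviour, so this sketch refuses to read there.
    if (offset < 0 || offset > static_cast<int>(std::strlen(literal))) {
        return "<out of bounds>";
    }
    return std::string(literal + offset);
}

void checkExamples() {
    assert(skipIntoLiteral(1) == "sd");  // pointer moved past 'a'
    assert(skipIntoLiteral(3).empty());  // pointer sits on the '\0'
    // 'A' has the value 65, so "asd" + 'A' would point far outside the array.
    assert(skipIntoLiteral('A') == "<out of bounds>");
    // "asd" + "_" does not compile: two pointers cannot be added.
}
```

With "asd" + 'A' the offset would be 65, far past the 4 bytes of the constant, which would explain both the empty lines and the garbage in the Serial monitor output.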

I'm not really sure whether all this is a bug or simply a consequence of automatic type conversion conflicting with concatenation.
Both features make sense: evaluating an expression and turning the result into a string within a single line, and concatenating lots of various data into a string in a single line. However, the two obviously conflict because they both use the "+" operator.

I have since solved my problem by converting the leading chars to String, but it took quite some time to even find out what was going wrong (though that was partly because of how inexperienced I am).

Order of operations issue...

More details:
In Serial.print(ch+'_'+str+"_"); // BUG HERE, all of the "plus" operators have the same precedence, so evaluation proceeds from left to right. ch + '_' is char plus char, and is well defined, but it does NOT result in a string (as you discovered). By C rules both chars are promoted, so it results in an int. I'm not really sure whether int+string concatenates the character onto the string, or whether it will produce a nice number for you:
'A' + String("BCD") might be "65BCD" ?

So it all makes sense, even if it's annoying and not what you expected.
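
A small sketch of that left-to-right evaluation, written in plain C++ for a PC rather than for the Arduino. std::string and std::to_string stand in for the Arduino String class here, and the number-to-text step is an assumption chosen to match the 160asd_ output reported in the first post; leftToRight and checkExamples are made-up names.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Both chars are promoted to int before the addition takes place,
// so the type of 'A' + '_' is int, not char and not a string.
static_assert(std::is_same<decltype('A' + '_'), int>::value,
              "char + char yields int");

// Mimics ch + '_' + str + "_" evaluated from left to right.
std::string leftToRight(char ch, const std::string& str) {
    int sum = ch + '_';  // for 'A': 65 + 95 = 160
    // Assumption: the int is converted to its decimal text and only
    // from here on does + mean concatenation.
    return std::to_string(sum) + str + "_";
}

void checkExamples() {
    assert('A' + '_' == 160);
    assert(leftToRight('A', "asd") == "160asd_");
}
```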

westfw:
So it all makes sense, even if it's annoying and not what you expected.

I know that it makes some sense; that's what I wrote at the end. Still, it's pretty damn annoying.
Since the + operator is redefined in the operation after it concatenates at least once instead of addition, and even adding multiple chars after that, the resulting string would be OK.

That's why I decided to interpret this as a bug, because it only happens to the starting chars. I don't know whether this can be fixed at all, due to the evaluation order working as you stated.

westfw:
I'm not really sure whether int+string concatenates the character onto the string, or whether it will produce a nice number for you
'A' + String("BCD") might be "65BCD" ?

It will become a string, like you said, with "65BCD" as the result.

westfw:
So it all makes sense, even if it's annoying and not what you expected.

One of the most important reasons people choose Arduino is that it's not annoying to program, unlike the usual microcontrollers.
Again, I was reluctant to write about this issue at all, but this was my main reason after all.

It is not a good idea to use the String (capital S) class on an Arduino as it can cause memory corruption in the small memory on an Arduino. Just use cstrings - char arrays terminated with 0.

...R

There is no real problem; the C or C++ language and compiler do not embed a String class concept by default and thus there is no automatic promotion to the String class when you work with things that could be String. The compiler and language default to standard rules.

This is actually already documented; see this part:

Caution: You should be careful about concatenating multiple variable types on the same line, as you may get unexpected results. For example:....

If you want to ensure the + operation works as String concatenation, you need to ensure the elements the + operates on are of the String class. Then there is no ambiguity for the compiler about your intent.
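
As a rough illustration of that rule, here is a plain C++ sketch for a PC in which std::string stands in for the Arduino String class (an assumption of this example; buildMessage and checkExamples are made-up names). Because the leftmost operand is already a string object, every + that follows is a concatenation.

```cpp
#include <cassert>
#include <string>

// Builds "<ch>_<str>_". The char is turned into a one-character
// string first, so no integer addition can happen further right.
std::string buildMessage(char ch, const std::string& str) {
    return std::string(1, ch) + '_' + str + "_";
}

void checkExamples() {
    assert(buildMessage('A', "asd") == "A_asd_");
}
```

On the Arduino side the corresponding expression from the test sketch is String(ch)+'_'+str+"_", which printed A_asd_.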

As Robin states, it's not a good idea anyway to use the String class (capital S) on small micro-controllers. The memory toll and risk of fragmentation are pretty high depending on how the class is used; you are usually better off using c-strings (null-terminated char arrays) and the associated functions (and others such as those from cstdlib).
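
For comparison, a minimal sketch of the c-string approach using snprintf from the standard C library, written as plain C++ for a PC. The helper formatMessage, its parameters and checkExamples are hypothetical, and the buffer sizes are arbitrary; the point is only that the buffer has a fixed size chosen by the caller, so nothing is allocated dynamically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Writes "<ch>_<text>_" into a caller-supplied buffer.
// Returns true if the whole message fitted, false if it was
// truncated or formatting failed.
bool formatMessage(char* out, std::size_t outSize, char ch, const char* text) {
    int needed = std::snprintf(out, outSize, "%c_%s_", ch, text);
    return needed >= 0 && static_cast<std::size_t>(needed) < outSize;
}

void checkExamples() {
    char buffer[16];
    bool fitted = formatMessage(buffer, sizeof buffer, 'A', "asd");
    assert(fitted);
    assert(std::strcmp(buffer, "A_asd_") == 0);

    char tiny[4];  // too small for 6 characters plus the terminating '\0'
    bool fittedTiny = formatMessage(tiny, sizeof tiny, 'A', "asd");
    assert(!fittedTiny);
    assert(std::strcmp(tiny, "A_a") == 0);  // truncated but still terminated
}
```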

You can read this blog post for context on the String class

Why was the topic moved, was it really that off? And why wasn't there any notification about it? It took me an hour to find it.

Robin2:
It is not a good idea to use the String (capital S) class on an Arduino as it can cause memory corruption in the small memory on an Arduino.

Why is it an option then? It's in the Arduino language documentation, without any mention of possible corruption issues.

J-M-L:
This is actually already documented see the part...

Good thing that when I Googled the string issue, it did not even show up in the first 3 pages... If this is such an important issue, maybe it should be noted as such, not on the side page of a side page, as an example.
Even the page of the "+" operator does not mention any possible issues, just the example page for it.

It also uses a terrible example. In my case the two appearing problems were:

  1. The + was implemented as numeric addition on chars before a String came in line.
  2. The number-to-string conversion was applied to the ASCII values of the chars, instead of the chars being used as such, before a String came in line.

Showing any of this would be much clearer than only saying that improper initialization may sometimes cause problems.

It is not a good idea to use the String [class...]

Why is it an option then?

Because C-style strings are "moderately difficult" to "awful", compared to the rest of the Arduino core features. :(
(Among other things, you'll need to understand pointers. And the dynamic memory allocation problems associated with Strings don't actually go away, you just have to either do them yourself, or set things up to avoid them, both of which have their own problems...)

char pointers and non-null-terminated c-strings can also cause memory challenges if you don't use them properly... Used correctly, or for small transient programs, or on larger boards such as ESP or ARM based ones (where you have more wiggle room in memory), Strings are an option.

As with everything in programming, it is better to understand the side effects (if any) of what you use when you get a layer of abstraction from the hardware. The devil often lies in the hidden part.

And when it comes to concatenation, it's really about C++ and what is part of the standard and what is implementation-dependent. You need to do your homework there or use strongly typed elements (some modern languages will force you to do so) to remove any ambiguity. The compiler is perfectly right to perform maths with chars and the + operator.

(And not sure how you searched but google gave me the link I shared in the first few results - not sure what terms I used to google for it)

pizagb:
Why was the topic moved, was it really that off?

Where was it before this - it certainly seems to be in the right place now.

And why wasn't there any notification about it? It took me an hour to find it.

I always bookmark all Threads that I start.

Until recently, when a Thread was moved it left a residual link in its original location. I have said elsewhere that it would be a good idea to reinstate that.

...R

Since the + operator is redefined in the operation after it concatenates at least once instead of addition, and even adding multiple chars after that, the resulting string would be OK.

The first part of that statement is WRONG. The + operator is NOT redefined. It is overloaded by the String class FOR String instances ONLY.

You REALLY need to look at how the operator is defined. In particular, is it designed to have a String on only one side of the operator? Or is it properly overloaded to allow a String on either side?
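
To make the difference between "redefined" and "overloaded" concrete, here is a small plain C++ sketch for a PC. Text is a made-up stand-in type, not the Arduino String class, and the sketch makes no claim about which of these overloads the real String class provides.

```cpp
#include <cassert>
#include <string>

// Hypothetical string-like type used only for this illustration.
struct Text {
    std::string value;
};

// Overload chosen when a Text is on the left and a char on the right.
Text operator+(const Text& lhs, char rhs) {
    return Text{lhs.value + rhs};
}

// Overload chosen when a char is on the left and a Text on the right.
Text operator+(char lhs, const Text& rhs) {
    return Text{std::string(1, lhs) + rhs.value};
}

void checkExamples() {
    Text t{"asd"};
    assert((t + 'A').value == "asdA");  // Text on the left
    assert(('A' + t).value == "Aasd");  // Text on the right
    // No Text operand here, so neither overload is considered and the
    // built-in integer addition is used: 65 + 95.
    assert('A' + '_' == 160);
}
```

The built-in + for chars is never replaced; the overloads are simply additional candidates that only apply when at least one operand has the class type.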

Why was the topic moved, was it really that off? And why wasn't there any notification about it? It took me an hour to find it.

Things move around :)

If you want to find one of your posts, just click on your name and the show posts link and you'll see all of yours. You can then just click on a title to get to your post, wherever it is.