Is the String function family inherently dangerous?

I don't want to start a religious war or anything, so please forgive what might be a loaded question. I know people can get very passionate about coding style.

I seem to have a buffer overrun issue in a sketch, and some of my googling revealed that... some people feel pretty strongly that String family functions are bad news, should not be used, don't manage memory properly, and possibly rot your teeth and make your hair fall out. And can lead to buffer overruns.

Is this one of those "old Turk" things? There are always traditionalists in coding style as in everything else. Or is it really something to beware of? Is the Arduino implementation of String sloppy?

I use String a lot, because I come to Arduinistan from the world of weakly-typed high level scripting languages, with really strong string processing functions. Trying to manipulate strings as char pointers etc feels so clunky to me, almost like going back to assembler. So I do use String everywhere because it's quick and powerful and easy and feels less alien than wrestling with all that type transformation and pointer management.

But if String is bad news, like "a disaster waiting to happen," then I guess I should grit my teeth and start eliminating it from my handwriting. So I appeal for opinions:

  1. Is String dangerous even when used correctly?
  2. Is it very easy to use it incorrectly?
  3. What are the most common mis-uses of it, if so, that I should be looking for in my code as I hunt for the horrid memory management goof that has stopped my project dead in its tracks?

PS I do have all compiler warnings turned on, and I make sure I have either eliminated or at least well understood every warning before I throw code at the Arduino.

Tazling:

  1. Is String dangerous even when used correctly?
  2. Is it very easy to use it incorrectly?
  3. What are the most common mis-uses of it, if so, that I should be looking for in my code as I hunt for the horrid memory management goof that has stopped my project dead in its tracks?

Yes. Microcontrollers don't have megs of RAM to allocate/deallocate/garbage collect, so using String will most certainly fragment your memory.

I got some popcorn ready. I'm only replying so I can get this on my updated topics list.

The Evils of Strings tells why Strings are discouraged in the Arduino world and how to use string functions in their place and will answer your questions.

Related question: are strings still bad, even on Due?

westfw:
Related question: are strings still bad, even on Due?

Or indeed on the ESP32 with its abundance of RAM.

I have applications running on Due that dynamically allocate thousands of message buffers per second, and have never had a problem, even running for many, many hours.

Regards,
Ray L.

blh64:
Yes. Microcontrollers don't have megs of RAM to allocate/deallocate/garbage collect, so using String will most certainly fragment your memory.

I think that is the key point. Rather than saying "String is Evil on Arduino", I think a general rule like
"if dynamic memory used is more than x% of total memory available then String is not advisable"
would be more useful.

but how to define 'x' is beyond me...

As described in The Evils of Strings referenced above, memory fragmentation is the biggest issue you will confront. If you have less than 2k of RAM before you run out, this can be difficult to manage, especially if you are not aware of what is going on under the hood. That is the main reason why character array strings are preferred by many: you can control what is happening.

However, you could write code that will work indefinitely using Strings. It all depends on the code and how the memory comes and goes from the heap.

So to answer your questions:

  1. Is String dangerous even when used correctly?

No, but it can be difficult to use it correctly, especially for beginners, in a low RAM environment.

  2. Is it very easy to use it incorrectly?

Yes.

  3. What are the most common mis-uses of it, if so, that I should be looking for in my code as I hunt for the horrid memory management goof that has stopped my project dead in its tracks?

You should try using the freemem() function (search for it) to print the amount of memory available in the heap. If you see that dropping over time then you have a memory leak.
I would then convert the Strings to C char arrays and see when the problem goes away. This is not a hard job, just requires some additional knowledge about the library functions to use. All are very well documented as part of the standard C libraries.

What an excellent answer marco_c.

As it so happens I have found the source of my memory stomp. I had some bizarre behaviour in Serial.print that was not quite, but almost reproducible. Hmmm says I, that looks like buffer overrun or what would be a SEGV with a crashdump in a kinder, better world (I do miss gdb). So after a bit of reading I suspected my Strings. The usual suspects as you might say.

However, an hour or two of head scratching and code pruning this evening has revealed the (embarrassing) error and it has nothing to do with String, but with a struct array that thought it was big enough and was not :)

I found some useful code in the process, which I will append here as it seems like others could benefit. There’s a library called MemoryFree, but it doesn’t work with the Due; this code seems to:

#include <stdlib.h>
#include <stdio.h>
#include <malloc.h>

extern char _end;                    // end of static data, provided by the linker
extern "C" char *sbrk(int i);
char *ramstart=(char *)0x20070000;   // start of SRAM on the Due
char *ramend=(char *)0x20088000;     // end of SRAM on the Due

void showMem() {
 // Sample the heap end, stack pointer and allocator stats on every call
 // so that the printed numbers are current.
 char *heapend=sbrk(0);
 register char * stack_ptr asm ("sp");
 struct mallinfo mi=mallinfo();

 Serial.print("\nDynamic ram used: "); Serial.println(mi.uordblks);
 Serial.print("Program static ram used: "); Serial.println(&_end - ramstart);
 Serial.print("Stack ram used: "); Serial.println(ramend - stack_ptr);
 Serial.print("My guess at free mem: ");  Serial.println(stack_ptr - heapend + mi.fordblks);
}

I confess this is mostly gibberish to me, but I pasted it into my sketch and found it interesting to watch the numbers as I built up my struct array. My whole sketch uses less than 10 percent of Due memory so I don’t think I’m in much danger of “the stack and the heap overlapping” :)

sherzaad:
I think that is the key point. Rather than saying "String is Evil on Arduino", I think a general rule like
"if dynamic memory used is more than x% of total memory available then String is not advisable"
would be more useful.

but how to define 'x' is beyond me...

I am pretty sure 'x' should be 50 or less, then you are way safer. Actually, I have a couple of projects on site running on an Arduino Mega 2560 with a bunch of Strings in them; dynamic memory reads 30% upon compiling, and I have had no issues in the 6+ months they have been on site.

I have played with a couple of ESP8266s and never used a single char pointer/array in them; the dynamic memory reads 31% upon compiling. A whole bunch of Strings are in there and I have never had issues with it.

I once tried to go for 'string' (lower case 's') in a Nano + SIM800L project. I tested it by receiving an SMS and displaying it on the serial monitor, which worked pretty well for a couple of minutes; then the SMS was displayed truncated. In short, the use of 'string' was not that reliable at all, so I gave up and went back to my old-time friend 'String'.

  1. What are the most common mis-uses of it,
String mybuffer = String("");
   :
if (Serial.available()) {
    mybuffer += Serial.read();
       :
}

Absolutely terrible!
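For contrast, here is a minimal sketch of a bounded alternative to the pattern above. Growing a String one character at a time can force the heap block to be resized over and over, and since Serial.read() returns an int, += will most likely append the byte's decimal value rather than the character. The LineBuffer type, its name and its 32-byte capacity are all made up for illustration. It simply refuses input once the buffer is full.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity line buffer; the name and the 32-byte size
// are illustrative and not taken from the thread.
struct LineBuffer {
    static const std::size_t kCapacity = 32;  // includes the terminating '\0'
    char data[kCapacity];
    std::size_t length = 0;

    LineBuffer() { data[0] = '\0'; }

    // Appends one character and keeps the buffer null-terminated.
    // Returns false, storing nothing, when the buffer is already full.
    bool push(char c) {
        if (length + 1 >= kCapacity) return false;
        data[length++] = c;
        data[length] = '\0';
        return true;
    }

    // Empties the buffer so it can be reused for the next line.
    void clear() { length = 0; data[0] = '\0'; }
};
```

In a sketch, one would presumably call push(static_cast<char>(Serial.read())) while Serial.available() > 0, and decide explicitly what to do when push returns false.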

"if dynamic memory used is more than x% of total memory available then String is not advisable"

This is a key statement. For all I know, it's not so much String that is broken as the avr-libc implementation of malloc() (the general "dynamically allocate some memory" function). It could be that malloc() is just too simple for String in terms of coalescing free memory, in the name of saving code space (remember that avr-libc dates to when many AVRs didn't have much flash space, either). malloc() implementations can get ... arbitrarily complex (and then you get to add "fast_malloc()" because malloc() has gotten too slow... :( ). And of course, the avr-libc malloc() doesn't get improvements because "it sorta works and we don't want to make it worse" and "no one really uses dynamic memory allocation on small memory systems anyway."

I suppose it would be interesting to instrument malloc() and String() and see what's really happening inside...

KASSIMSAMJI:
I once tried to go for 'string' (lower case 's') in a Nano + SIM800L project. I tested it by receiving an SMS and displaying it on the serial monitor, which worked pretty well for a couple of minutes; then the SMS was displayed truncated. In short, the use of 'string' was not that reliable at all, so I gave up and went back to my old-time friend 'String'.

That’s probably because you did not know what you were doing, and it has nothing to do with c-strings, which are extremely reliable and have been around for decades...

srnet:
Or indeed on the ESP32 with its abundance of RAM.

On an ESP they are not so bad. I have no hard feelings about them. Mind you, all webserver functions take char* as arguments, so you'll be converting some of the time, but the String library for the ESP compiler is way more flexible. It is something to keep in mind that big Strings are fine on an ESP but can cause memory overruns on a normal Arduino, specifically global Strings that are increased locally.

Delta_G:
I got some popcorn ready. I'm only replying so I can get this on my updated topics list.

Any refills needed?

marco_c:
You should try using the freemem() function (search for it) to print the amount of memory available in the heap. If you see that dropping over time then you have a memory leak.

But can you use it to get an indication of the extent of memory fragmentation?

It seems to me that memory fragmentation is bad even on a microcontroller with lots of memory. That only means it will take longer before the fragmentation causes a problem.
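To make the fragmentation question concrete: a single "free memory" number cannot show fragmentation, because what matters for the next allocation is the largest contiguous free block, not the total. Below is a toy model only. The Block and FreeStats types are hypothetical and have nothing to do with how avr-libc or any real allocator stores its free list. It just computes both figures for a heap described as a list of blocks in address order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model: one entry per heap block, listed in address order.
struct Block {
    std::size_t size;
    bool used;
};

struct FreeStats {
    std::size_t totalFree;    // sum of all free bytes
    std::size_t largestFree;  // longest run of adjacent free bytes
};

FreeStats freeStats(const std::vector<Block>& heap) {
    FreeStats s{0, 0};
    std::size_t run = 0;  // length of the current run of adjacent free blocks
    for (const Block& b : heap) {
        if (b.used) {
            run = 0;  // a used block ends the current free run
            continue;
        }
        run += b.size;
        s.totalFree += b.size;
        s.largestFree = std::max(s.largestFree, run);
    }
    return s;
}
```

With three free 100-byte blocks separated by two small used blocks, this reports 300 bytes free in total but only 100 bytes usable for a single allocation, which is the situation a plain free-memory printout would hide.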

You also might want to do a comparison between the overhead from the use of string vs. String in your application. It can be quite a significant difference. Whether that matters would depend on how much memory you have to waste.

Deva_Rishi:
the String library for the ESP compiler is way more flexible.

How is it more flexible? I haven't heard anything about that before.

I don't know the ins and outs of the how and why (I haven't seen the code), but using large Strings (more than 256 bytes), also as global variables, and increasing them in size at any place is not a problem at all. I suspect the library is either declaring the Strings in a separate part of memory or managing the memory a lot more precisely. There really is just a lot more memory. On an Arduino I have had issues above 256 bytes even when extending a local String in a function (which on a Nano is 1/8th of all the available memory, mind you), while on my ESPs I've never encountered any problems at all. Anyway, since for a webpage you'd normally put all the HTML data in a String first and then call server.send() (hey, that one does take a String), one would expect it to deal with it more elegantly.

I believe that ESP8266 has a version of the standard “full” C++ STL “string” library.
(and a better malloc.)

(alas, I find “standard STL libraries” to be nearly unreadable, so it’s hard to tell how it works.)
Here’s the code for appending a char to a string from the esp8266 install (I think):

  template<typename _CharT, typename _Traits, typename _Alloc>
    basic_string<_CharT, _Traits, _Alloc>&
    basic_string<_CharT, _Traits, _Alloc>::
    append(size_type __n, _CharT __c)
    {
      if (__n)
	{
	  _M_check_length(size_type(0), __n, "basic_string::append");	  
	  const size_type __len = __n + this->size();
	  if (__len > this->capacity() || _M_rep()->_M_is_shared())
	    this->reserve(__len);
	  _M_assign(_M_data() + this->size(), __n, __c);
	  _M_rep()->_M_set_length_and_sharable(__len);
	}
      return *this;
    }

pert:
It seems to me that memory fragmentation is bad even on a microcontroller with lots of memory. That only means it will take longer before the fragmentation causes a problem.

This is key. Both this and the fact that reserving space for a String can improve matters mean that your testing is affected.

Yesterday, a poster presented a sketch using String that crashed in two to three weeks. Assuming that that sketch was improved with reserve, how long now should the test period be to be convinced that everything's fine?

Worse, what if only an error condition or occasional combination of events causes fragmentation? Using Strings on something you want to run for a long time means that there's a risk that they'll eventually break your system. If that matters, Strings were a bad choice.

So on something unimportant, that doesn't need to run for days, sure, use String if you want to. Even on a complex long running piece of code, you may get away with it. But why take the risk if you don't have to?
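For readers unfamiliar with what reserving space does, here is a small desktop-side illustration. It uses std::string rather than Arduino's String, which has its own reserve() method, so treat it as an analogy under that assumption and not as a description of the Arduino implementation. The function name appendsWithoutRegrowth is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Illustrative only: reserves space up front, appends `count` characters one
// at a time, and reports whether the capacity stayed the same throughout.
bool appendsWithoutRegrowth(std::size_t count) {
    std::string s;
    s.reserve(count);  // one allocation request, made before any appending
    const std::size_t capacityAfterReserve = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s += 'x';
    }
    // If the capacity never changed, the buffer was not regrown while appending.
    return s.size() == count && s.capacity() == capacityAfterReserve;
}
```

On common standard library implementations this returns true, meaning the appends reuse the one reserved block. That is the benefit of reserve, and also why a sketch that uses it can appear healthy for much longer before any remaining fragmentation shows up.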

westfw:
I believe that ESP8266 has a version of the standard “full” C++ STL “string” library.
(and a better malloc.)

(alas, I find “standard STL libraries” to be nearly unreadable, so it’s hard to tell how it works.)

If I'm not mistaken, the ESP8266 (in the Arduino environment) uses its own String class, declared in WString.h, and it uses concat (not append).

The code you posted comes from the templated base class named basic_string, and no one works with this class directly :) It is for more complete systems, with reference counting etc. In that case string functionality in the standard library lives in the <string> header file, which includes 3 different string classes: the basic_string you posted, and the two that developers actually use:

typedef basic_string<char> string;
typedef basic_string<wchar_t> wstring;

string is used for standard narrow-character (ASCII, UTF-8) strings.
wstring is used for wide-character (wchar_t) strings. Despite the similar name, Arduino's WString.h is not this wstring.

All the functionality of these strings is implemented in the basic_string class, and string and wstring get access to it directly through the magic of templates.

(The code you posted, though, is one of the underlying append methods. Usually you just call append with a string to concatenate, not with a count and a character.)

The only “tough” part in what you posted is || _M_rep()->_M_is_shared(), and I believe this is there to handle shared deep data and allow for copy-on-write (i.e. sharing memory between instances referring to the same string until you need to differentiate them). The Arduino environment makes a full copy of the deep data, so it does not need this.