The HATRED for String objects - "To String, or not to String"

I've seen it so often now that it seems the community as a whole is anti string, to the extent that it annoys members here...

Why not remove it (as in disable it by default, so that you have to explicitly enable it somehow)? Obviously nobody likes it.
Why not replace it with a few handy char routines instead?

Why not remove it

Mostly because the problems are not all with the String class. Its design is fine. It is the implementation that leaves a bit to be desired. The real problem, though, is that the free() function has a fundamental flaw that needs to be addressed.

Once that is fixed, the String class will be marginal. It will be fine for small strings - a dozen characters read from the serial port, for instance.

replace it with a few handy char routines instead?

Like strcat(), strcpy(), strtok(), strcmp(), you mean? These already exist, and are well documented, and can be used instead, today. No change needed.

You can get away with using the String Class with some care taken.. usually.

Problem is that many people don't know there's a reason to take care until it bites them, and then they post about a problem with crashes. You don't see many posts from people who managed to stay within limits, like the 2K RAM limit.

We will see a lot of this forever. People who learn C++ on PCs generally don't learn C strings and have no clue what goes on in the magic box, let alone why a smaller box should be different.

I've seen it so often now that it seems the community as a whole is anti string

No, not anti-string at all.

Very, very anti-String.

Problem is that many people don't know there's a reason to take care until it bites them, and then they post about a problem with crashes. You don't see many posts from people who managed to stay within limits, like the 2K RAM limit.

This is what I have found. I've been trying to 'teach myself' how to code, I have no background in Electronics or coding. I see something like a string that seems to make sense if used in the way I want to use it. Two weeks later... no, start again. It breaks and I had no way of knowing why, or what to use instead.

The big caveat with C strings is making sure you don't write past the end of the array. You -must- code for that yourself either explicitly or implicitly (like never writing any string too long, not good for general use but okay for special cases).
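
One way to code for that explicitly is to pass the size of the destination array along with it. The helper below is only a sketch; the name copy_bounded and its return convention are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without ever writing past
   dst[dst_size - 1]. Returns 1 if the whole string fit, 0 if it was cut. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    int fits = (len < dst_size);    /* one byte is needed for the zero */
    if (!fits) {
        len = dst_size - 1;         /* keep only what fits */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';                /* always terminate the copy */
    return fits;
}
```

With a char buf[8], copying "hello world" through this helper leaves "hello w" in buf and returns 0, so the caller can tell that the text was truncated.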

Set yourself up with some bookmarks, it's good to have references.

These are the library modules used in the AVR C++ Arduino uses:
http://www.nongnu.org/avr-libc/user-manual/modules.html

This is the C string library page:
http://www.nongnu.org/avr-libc/user-manual/group__avr__string.html

You #include <string.h> to use those functions. All the names are shorthand, you get to know them.

Some easy basics you can do most simple things with:
strlen() is string length
strcpy() is string copy, it puts the terminating zero at the end of the copy. It is string = string.
strstr() is string string, it searches for a substring within a string
strcmp() is string compare, it tells you if one string is less than, equal to, or greater than another, good for sorting
strcat() string concatenation, adds one string to the end of another
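
A rough sketch of those basics working together. The helper names join_words and contains are invented for this example; only the str functions themselves come from string.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build "<first> <second>" in out.
   Returns the resulting length, or 0 if out is too small. */
size_t join_words(char *out, size_t out_size,
                  const char *first, const char *second)
{
    assert(out != NULL && first != NULL && second != NULL);

    /* both words + one space + the terminating zero */
    size_t needed = strlen(first) + 1 + strlen(second) + 1;
    if (needed > out_size) {
        return 0;               /* refuse rather than overflow */
    }
    strcpy(out, first);         /* out = first */
    strcat(out, " ");           /* out = out + " " */
    strcat(out, second);        /* out = out + second */
    return strlen(out);
}

/* Hypothetical helper: 1 if needle occurs anywhere in haystack. */
int contains(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}
```

The size check before strcpy() and strcat() matters because neither function knows how big the destination array is.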

You will see some with an n in the middle. The n tells you that character count is used.

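strncpy() is strcpy for up to n characters. It does NOT put a zero at the end if the source is n characters or longer; if the source is shorter, it pads the rest of the n bytes with zeros.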
strncpy is perfect for writing over part of a string with another, in BASIC it is mid$()
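
A sketch of that overwrite-in-the-middle use. The helper name overwrite_at is hypothetical. It relies on strncpy() writing no terminating zero when the count is no larger than the length of the source, so the rest of the target string is left alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite characters of text, starting at index pos,
   with the characters of patch. It never writes at or beyond the existing
   terminating zero. Returns the number of characters overwritten. */
size_t overwrite_at(char *text, size_t pos, const char *patch)
{
    assert(text != NULL && patch != NULL);

    size_t text_len = strlen(text);
    if (pos >= text_len) {
        return 0;                   /* nothing to overwrite there */
    }
    size_t room = text_len - pos;   /* characters left before the zero */
    size_t n = strlen(patch);
    if (n > room) {
        n = room;                   /* clip the patch to what fits */
    }
    strncpy(text + pos, patch, n);  /* n <= strlen(patch): no zero written */
    return n;
}
```

For instance, applying it to "temp=000 C" at index 5 with "123" gives "temp=123 C".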

Not simple but very useful is strtok(), string token, that you can use to parse strings with.
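
A minimal strtok() sketch, assuming a comma-separated command line such as "LED,13,on" (that format and the name split_fields are made up for the example). Note that strtok() writes zeros into the string it parses and skips empty fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place on commas and store up to
   max_fields pointers to the pieces in fields. Returns the field count.
   The pointers refer into line itself, so line must stay alive. */
size_t split_fields(char *line, char *fields[], size_t max_fields)
{
    assert(line != NULL && fields != NULL);

    size_t count = 0;
    char *token = strtok(line, ",");    /* first call: pass the string */
    while (token != NULL && count < max_fields) {
        fields[count++] = token;
        token = strtok(NULL, ",");      /* later calls: pass NULL */
    }
    return count;
}
```

No memory is allocated here: the fields are just pointers into the original array, which is why the array has to be writable (not a string literal).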

Also don't forget the mem (memory) functions, the 2 most basic:
memset(), to set some number of bytes equal to a given value
memmove(), copies bytes and is safe to use when the destination overlaps the source
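
A sketch of both in use. The helper names delete_chars and clear_buffer are hypothetical; the point is that removing characters from the middle of a string shifts bytes within the same array, which is exactly the overlapping case memmove() handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove count characters starting at pos by sliding
   the tail of the string left. Returns the new length. */
size_t delete_chars(char *text, size_t pos, size_t count)
{
    assert(text != NULL);

    size_t len = strlen(text);
    if (pos >= len) {
        return len;                 /* nothing to delete */
    }
    if (count > len - pos) {
        count = len - pos;          /* clip to the end of the string */
    }
    /* Source and destination overlap, so use memmove, not memcpy.
       The + 1 moves the terminating zero along with the tail. */
    memmove(text + pos, text + pos + count, len - pos - count + 1);
    return len - count;
}

/* Hypothetical helper: zero a whole buffer, leaving an empty string. */
void clear_buffer(char *buf, size_t size)
{
    assert(buf != NULL);
    memset(buf, 0, size);
}
```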

There's more of all of them. And if you don't see what you want then remember that C strings are just 1 dimension byte arrays you can easily process in loops without needing any library whatsoever. Those functions are only for convenience once you understand how C strings work.
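
To show the loop approach, here is a sketch that uses no string library at all. Both helper names are invented, and the upper-casing assumes plain ASCII text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count how often ch appears, walking the array
   until the terminating zero. */
size_t count_char(const char *text, char ch)
{
    assert(text != NULL);

    size_t count = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == ch) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: convert ASCII lowercase letters to uppercase
   in place, using a pointer instead of an index. */
void to_upper_ascii(char *text)
{
    assert(text != NULL);

    for (; *text != '\0'; text++) {
        if (*text >= 'a' && *text <= 'z') {
            *text = (char)(*text - 'a' + 'A');
        }
    }
}
```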

One function C strings don't have that C++ String Class does is a function to tell you where the data actually is. That's because with C strings you don't need one, ^^ YOU tell the function where the data is and where it goes ^^ and IT doesn't go anywhere else, unlike mind-of-their-own String objects.

Hope this helps with your jitters. The territory is really quite simple and rock solid stable.

Me, I love strings. Use them all the time and never have a problem with them. I hate chars.

cjdelphi:
Why not remove it (as in disable it by default, so that you have to explicitly enable it somehow)?

Instead of removing strings, they just have to make about a 1-line change to the library code in free() that has a bug in it. Then some, at least, of the problems will go away.

You potentially still have problems with memory fragmentation, but as some posters have observed, these do not always bite you.

If stability is important then I don't think it is a good idea to use dynamic memory allocation on systems without a robust memory management system, unless you understand the patterns of allocation and deallocation that will occur. I doubt that it would ever be practical to implement a generally robust memory management system within the constraints of an Arduino.

At the same time, dynamic helper classes such as String are IMO going to be particularly useful to novices because they take away the pain of dealing with buffers and pointers and so on. So the people most likely to use this are also the ones least likely to understand when and how to use dynamic allocation safely.

Given that one of the main goals of Arduino seems to be to make development accessible to novices, this strikes me as a step in the wrong direction.

stuarthooper:

Problem is that many people don't know there's a reason to take care until it bites them, and then they post about a problem with crashes. You don't see many posts from people who managed to stay within limits, like the 2K RAM limit.

This is what I have found. I've been trying to 'teach myself' how to code, I have no background in Electronics or coding. I see something like a string that seems to make sense if used in the way I want to use it. Two weeks later... no, start again. It breaks and I had no way of knowing why, or what to use instead.

The problem with the String class on the Arduino is that it assumes sophisticated memory management (like a reference-counting or other garbage-collector). You don't have that on a 2K microcontroller, so eventually intermediate results (usually of string-concatenation) bung up memory and the processor crashes. Compilers cannot analyse every possible execution of your code so they cannot determine every case of where a String object becomes unreferenced and thus can be freed.

Using char * rather than String forces the programmer to handle string lifecycles and memory allocation, and the programmer usually knows how the program is meant to run and when a char array can be reused.

Compare with the String class in the Java language - this does have a garbage-collector to recover dead objects so you can have nice intuitive String operations and memory runs out only if you genuinely are hanging on to too much stuff - you never have to call free() and it all just works. But that runs on systems with MB of RAM and ROM...

What Nick said, a small fix to free() would do wonders for use of the String class.

Just plain 'strings' aren't a problem at all. It's the Strings that trip so many people up.

Using char * rather than String forces the programmer to handle string lifecycles and memory allocation, and the programmer usually knows how the program is meant to run and when a char array can be reused.

It forces an MCU programmer to know about the hardware which is blasphemy to purist comp-sci dweebs. How many programmers does it take to change a light bulb? Can't be done, it's a hardware problem!

I'm going to agree with GoForSmoke here. Tests in other threads have shown that, with free() fixed, code that previously crashed regularly worked indefinitely.

Whilst it is possible to fragment and run out of memory with only 2 KB to spare, code that allocates (and then frees) strings of the same size all the time will not suffer from it.

That is especially true if you avoid string concatenation, which is probably the worst offender in causing fragmentation (that is, building up a string by adding one character to it at a time). In any case, this sort of string-building is necessarily slow.

The main problem with Strings is that, on a machine with limited RAM, you can run into trouble because you didn't properly account for how (worst case) memory might be used, and for how you will deal with the condition that results. On small, simple programs you won't see much of a problem, but as programs grow, total memory management BY THE PROGRAMMER becomes more and more the major requirement. One call to a function that uses Strings without having memory properly cleaned up and things crash. Is that the fault of Strings, or bad programming practice by a programmer who didn't keep track of his free memory?

It is interesting to note that in the early days all programmers had memory usage tables, and when they wrote a program they documented exactly how memory would be used because it was tight. Once again we are using a system with tight memory constraints, and we expect the software to do the programmer's job. If you want unlimited memory, don't use an Arduino.

Oh - it wasn't all that many years ago that there existed machines with 4Kx8bit memory, for program AND data, and we made it work... 16K x 8 bit cost $80.00 US.

Oh - it wasn't all that many years ago that there existed machines with 4Kx8bit memory, for program AND data, and we made it work... 16K x 8 bit cost $80.00 US.

And before that in the minicomputer era a 16K x 16 bit core memory board would set you back some thousands of dollars.

Lefty

For example
void loop()
{
    char c[10];  // local array, lives on the stack
    int n;       // local int, lives on the stack

}

//what happens exactly here

With each iteration of the loop, do the local variables (c and n) get created and recreated over and over again? At the time I did not know, so I simply made them global variables (under whatever names I used in my project),
so I knew I was never going to fill the memory with undestroyed pointers and unused memory allocations (just to be on the safe side).

So with each loop, would all the local variables be destroyed along with any memory allocation?

In your example the stack is incremented* by 12 (10 plus 2 for the int) to make room for those variables. They are thus uninitialized because they have whatever was on the stack. When loop exits that stack space is reclaimed.

If you make them global the stack doesn't get altered but you have 12 bytes less available for the stack because they have to go somewhere.

The stack and heap share the same piece of memory (RAM) starting at wherever your global variables end. The heap grows upwards and the stack grows downwards from the top of memory. If they happen to collide: trouble!

* decremented really, because the stack grows downwards.
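
A small sketch of the difference, using two made-up functions in place of loop(). An ordinary local gets fresh stack space on every call and gives it back on return, while a static (like a global) is stored once, outside the stack, and keeps its value between calls.

```c
/* Hypothetical example: n is an ordinary local, so it is re-created on the
   stack and re-initialised on every call, then reclaimed on return. */
int count_local(void)
{
    int n = 0;
    n++;
    return n;           /* the previous call's n no longer exists */
}

/* Hypothetical example: n is static, so it lives with the globals,
   is initialised once, and survives between calls. */
int count_static(void)
{
    static int n = 0;
    n++;
    return n;
}
```

Calling count_local() repeatedly always returns 1, while count_static() returns 1, 2, 3 and so on. Neither version leaves anything behind on the stack after it returns.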

I don't know if there's any time savings, but a global gets allocated once, while making non-static locals or passing a load of parameters to a function happens over and over.

True, a stack-based variable will take a couple of machine cycles to reserve the extra stack. If you only ever need the one copy, a global (or static) variable would suit your needs.

Ok, guys. For your reference, let me clear up how memory works :wink: (for those who know already, this is for those who don't.)

There are basically two types of "dynamic" memory: the "stack", and the "heap".

The stack can be thought of like a pile of coins. Coins are put on to the pile in order, and then removed afterwards in reverse order.

Any "local" variables, along with any variables passed to functions, are placed on the stack, used within the function they are defined in, and then removed from the stack again.

The stack starts at the top of the memory space, and grows downwards.

Then there is the "heap". This is more like a mound of coins. Coins can be put into the mound wherever they will fit. If there is no room inside the mound for them, then they are dropped on the top of the mound. Coins can be pulled out from anywhere in the mound. The same goes for variables. These are all "dynamically allocated" variables: anything that uses "malloc()" or "free()", and any classes or functions which use these functions within themselves.

The heap is located at the bottom of the memory area.

The two main issues with the heap are memory leaks and memory fragmentation.

Memory leaks occur when some function places coins into the mound, and then forgets that it put them there, so they never get taken out again. Variables created in the heap must be removed after use, or the heap will just grow and grow and grow.

Memory fragmentation occurs when small variables are removed from the heap to be replaced by larger variables. The space left by the small variable is too small to accept the larger variable, so the larger variable is instead placed on the top of the heap, causing the heap to grow. This most often occurs when working with strings and you want to add something to the end of a string (concatenate) or join two strings together.
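
The effect can be shown with a toy model. This is only a sketch: a 32-byte pretend heap with one in-use flag per byte and a first-fit search. All names and sizes are invented, and this is not how the real malloc() keeps its records.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SIZE 32

/* Toy model only: one flag per byte of the pretend heap, 1 = in use. */
static unsigned char in_use[HEAP_SIZE];

/* First fit: return the offset of the first free run of size bytes,
   marking it as used, or -1 if no run is big enough. */
int toy_alloc(size_t size)
{
    assert(size > 0);

    size_t run = 0;                         /* length of current free run */
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        run = in_use[i] ? 0 : run + 1;
        if (run == size) {
            size_t start = i + 1 - size;
            for (size_t j = start; j <= i; j++) {
                in_use[j] = 1;
            }
            return (int)start;
        }
    }
    return -1;
}

/* Mark size bytes starting at offset as free again. */
void toy_free(int offset, size_t size)
{
    assert(offset >= 0 && (size_t)offset + size <= HEAP_SIZE);
    for (size_t j = 0; j < size; j++) {
        in_use[(size_t)offset + j] = 0;
    }
}

/* One past the highest byte in use: how far the heap has grown. */
size_t toy_heap_top(void)
{
    size_t top = 0;
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (in_use[i]) {
            top = i + 1;
        }
    }
    return top;
}
```

Allocate three 4-byte blocks (offsets 0, 4 and 8), free the middle one, then ask for 6 bytes: the 4-byte hole is too small, so the new block lands at offset 12 and the top of the heap moves from 12 to 18 even though 4 bytes are sitting unused in the middle.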

As the heap gets more and more fragmented and grows bigger and bigger over time, and as more and more local variables get pushed into the stack, there is a big risk that the heap and the stack will both grow so big that they meet in the middle. When this happens you have problems. Big problems. This is when crashes occur.

You can get "memory de-fragmentation" programs for the PC which effectively re-arrange the heap to remove the holes in it. This makes programs run faster, as they can allocate bigger chunks of memory, and helps to reduce the overall size of the heap. Quite useful. There is nothing (as far as I have seen) like it for micro-controllers - mainly due to the lack of space in the first place :wink:

So, the issue with the String class isn't with the String class itself - it's a very useful class (in the right situations). The issue is with dynamically allocated memory causing fragmentation in the heap, which overflows and crashes into the stack.

So yes, a fix to "free()" will improve things, but on a small micro-controller with very limited RAM space dynamic memory allocation in itself is something to be shunned whenever possible.


So, the issue with the String class isn't with the String class itself - it's a very useful class (in the right situations). The issue is with dynamically allocated memory causing fragmentation in the heap, which overflows and crashes into the stack.

Part of the problem IS with the String class itself. When you want to append one character to an existing String, the length of the new String is the length of the old String plus the length of the String to append. That amount of space is allocated, and the old String is copied there, then the String to be appended is tacked on, then the old String's space is freed.

Append another character, and the whole process is repeated, because the String has no room to grow.

On other platforms, extra space is allocated, so that there is some room to grow. Perhaps 10 extra bytes, so you can add 10 characters, before a malloc/copy/free operation is required again.
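
A sketch of the difference that slack makes. TextBuf and its functions are hypothetical and are not the Arduino String class; realloc() stands in for the malloc/copy/free sequence, and the slack of 10 is just the figure mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer, for illustration only. */
typedef struct {
    char    *data;      /* zero-terminated text, or NULL while empty */
    size_t   length;    /* characters stored, not counting the zero */
    size_t   capacity;  /* bytes currently allocated */
    size_t   slack;     /* extra bytes requested each time it grows */
    unsigned reallocs;  /* how many times the storage had to grow */
} TextBuf;

void textbuf_init(TextBuf *t, size_t slack)
{
    assert(t != NULL);
    t->data = NULL;
    t->length = 0;
    t->capacity = 0;
    t->slack = slack;
    t->reallocs = 0;
}

/* Append one character. Returns 1 on success, 0 if out of memory. */
int textbuf_append(TextBuf *t, char ch)
{
    assert(t != NULL);

    size_t needed = t->length + 2;      /* old text + new char + zero */
    if (needed > t->capacity) {
        size_t new_capacity = needed + t->slack;
        char *grown = realloc(t->data, new_capacity);
        if (grown == NULL) {
            return 0;                   /* old storage is still valid */
        }
        t->data = grown;
        t->capacity = new_capacity;
        t->reallocs++;
    }
    t->data[t->length++] = ch;
    t->data[t->length] = '\0';
    return 1;
}

void textbuf_free(TextBuf *t)
{
    assert(t != NULL);
    free(t->data);
    textbuf_init(t, t->slack);
}
```

Appending 30 characters one at a time grows the storage 30 times with a slack of 0, but only 3 times with a slack of 10. The price is up to 10 bytes of RAM held in reserve, which is a real cost on a 2K part.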