Optimizing String usage - print()

In a nutshell, the key notion in this debate is your appetite for understanding memory management for collections of objects, and the associated trade-offs.

You can go with "static, fixed size" or "dynamic variable size" memory allocation.

Strings are not the only buffers you’ll need (think about binary protocols, or just collections of values in arrays, vectors, matrices, graphs, and any other type of data set). So for those of us with enough programming experience and willingness to understand what’s under the hood, the answer is clear: you need to know how to handle the static, fixed-size case, as it’s a pattern you’ll see a lot in C or C++.

Once you master that pattern, you’ll see there are many functions to handle the fixed-size constraint. You can just code with those, or decide you are tired of working at that level and write your own set of helper functions or classes to encapsulate the frequent operations. You do this by building on the knowledge you acquired and carefully addressing the potential edge cases and overflow issues. This is also what led to more classes in C++, like the containers.

This encapsulation comes at a cost: a few more bytes, a few more computing cycles, and possibly a non-universal API, which means less portable code and knowledge that is specific to one library.

When you work on small microcontrollers, sometimes every byte and every cycle matters, and if you are a diligent and careful programmer you can save a bit here and there to fit into the hardware constraints.

In some cases, understanding what’s happening in memory is a matter of life and death. It’s no joke. So for those of us who have been exposed to such responsibility, or who just care about not crashing, you can understand why there is a strong position on staying in control with proven functions.

But here is the crux: the native functions, or any library built on top of static usage, won’t handle edge cases for you. The good ones just give you a way to catch the issue. How to handle it is for the programmer to solve, and lazy / bad / tired coders just don’t code for it, thinking "it will never happen in a normal use case, and too bad for you if you did not follow what’s expected". That’s without counting the hackers looking for exactly this... :imp:

Amongst the edge cases, buffer overflow is one of the most common issues developers face and need to code for. With the price of memory going down and more capable hardware around, some thought that the best way to solve this edge case was to not have the programmer worry about it: if the buffer is too small, just make it bigger.

And that works until you hit memory limits…

On modern 64-bit computers with an OS, gigabytes of RAM and virtual memory, this limit is quite far away, and you only get a speed impact if you have to swap a lot.

On small microcontrollers with no OS and where you play with actual RAM and no virtualization it’s another story.

One question then is whether the library you use catches the exception that there is no memory available for buffer expansion and how this is presented to the developer to handle. (try/catch)
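As a rough sketch of what "presenting the failure to the developer" can look like without try/catch (small Arduino cores are typically built with exceptions disabled), a growing buffer can simply return false when the allocation fails. The GrowableText struct and appendGrow function are hypothetical names for this example and not how any particular library does it.

```cpp
#include <cstdlib>
#include <cstring>
#include <cstddef>

// Hypothetical growable text buffer, for illustration only.
struct GrowableText {
  char *data;            // heap block, or nullptr while empty
  std::size_t length;    // characters stored, excluding '\0'
  std::size_t capacity;  // bytes allocated, including room for '\0'
};

// Appends src, growing the heap block if needed.
// Returns false if memory could not be obtained; the old content is kept.
bool appendGrow(GrowableText &t, const char *src) {
  if (src == nullptr) return false;
  std::size_t add = std::strlen(src);
  std::size_t needed = t.length + add + 1;
  if (needed > t.capacity) {
    char *bigger = static_cast<char *>(std::realloc(t.data, needed));
    if (bigger == nullptr) return false;  // no block large enough
    t.data = bigger;
    t.capacity = needed;
  }
  std::memcpy(t.data + t.length, src, add + 1);  // copies the '\0' too
  t.length += add;
  return true;
}
```

The important part is the early return: the caller gets a chance to react instead of carrying on with a string that silently did not change.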

One challenge you have with the Arduino String class is that this is left uncaught: the operation fails silently and can either crash or just not do anything, leading the programmer to believe all went fine.

(The other challenge is that your code could actually have been fine with a larger static buffer but rules for dynamic allocations lead to a situation where there was no block of memory large enough to accommodate the expansion - that’s the challenge of poking holes into the heap and having no garbage collector)

➜ if you use the String class you can’t catch the memory limit issue and thus your code is at risk, you are no longer in control.

It all boils down to: is it a risk you can live with?

So as I understand:

Using c-strings requires a deeper understanding of the language, but when used effectively, they represent the most efficient approach. This is especially crucial on low-memory microcontrollers, where the code needs to be as optimized as possible.

The String class was introduced by Arduino years ago as a way to avoid using c-strings. Strings are not that "optimized" but they are simpler to use. They can be used in more advanced projects, provided you follow some usage guidelines so as not to break the program.

In this forum most people are against the usage of String, which is justified by multiple reasons.

Wow thank you so much for such a well explained response. I think I have really understood the use case scenario about all this and where the real problem resides.

In my particular project, I don't currently face memory, hacking, or 'life or death' issues, so perhaps the change isn't truly worth it at the moment. But I'm sure I'll consider it for future projects.

Above all, I've realized that c-strings are a milestone in my dev career.

Horses for courses. The ESP8266 you mentioned in your first post is not a low-memory device, and its WiFi support libraries use Strings extensively.
Now if you were programming an ATtiny micro, that would be a different discussion.
If you are looking at c-string methods, check out the 'safe' versions of strcpy and strcat, i.e. strlcpy and strlcat.
ESP8266 and ESP32 support these 'safe' versions.

I think there is some fallacy in calling those 'safe'.

It could be argued (and it has been) that strlcpy and strlcat make truncation errors easier for a programmer to ignore and thus can introduce more bugs than they remove.

They are more secure because they won't lead to a buffer overflow, and they return a length that can be used to check whether an overflow would have occurred had the operation been completed. But when they fail to do what was expected, the developer still needs to catch that and decide how to handle the buffer overflow that did not happen.

So the point is: if you need to test for a length anyway, why not do it before calling the function rather than after, when your destination buffer has already been modified?
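A minimal sketch of the "test before calling" approach, using only standard functions. The name copyIfFits is made up for this example; the point is that the destination is left untouched when the source does not fit, so there is no truncated half-result to clean up.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical helper: copies src into dest only if it fits entirely,
// including the terminating '\0'. On failure dest is not modified.
bool copyIfFits(char *dest, std::size_t destSize, const char *src) {
  if (dest == nullptr || src == nullptr) return false;
  std::size_t len = std::strlen(src);
  if (len >= destSize) return false;  // no room for the text plus '\0'
  std::memcpy(dest, src, len + 1);    // length already checked, plain copy
  return true;
}
```

Whether refusing the copy is better than truncating depends on the application; this only shows that the check can come first.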

To do that for strlcat you need to keep track of the current length of the string.
In which case you might as well use SafeString which keeps track of this internally for you.
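To make the "keep track of the current length" idea concrete, here is a small sketch of a fixed-capacity buffer that stores its length next to the characters. This is not SafeString's code or API; FixedText, kCapacity and append are names invented for the illustration.

```cpp
#include <cstring>
#include <cstddef>

// Assumed capacity for the example, including room for the '\0'.
const std::size_t kCapacity = 16;

// Hypothetical fixed-capacity text buffer that remembers its own length,
// so appending never has to rescan the existing text.
struct FixedText {
  char data[kCapacity];
  std::size_t length;  // characters stored, excluding '\0'

  FixedText() : length(0) { data[0] = '\0'; }

  // Appends src only if it fits entirely; otherwise changes nothing.
  bool append(const char *src) {
    if (src == nullptr) return false;
    std::size_t add = std::strlen(src);
    if (add >= kCapacity - length) return false;  // would overflow
    std::memcpy(data + length, src, add + 1);     // copies the '\0' too
    length += add;
    return true;
  }
};
```

With the length stored, the pre-check for an append is a single comparison, which is the bookkeeping the post above refers to.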

@diegomm27
I'm still curious what your real-world example looks like and what you really mean when you ask for "Optimizing String usage - print()".

What code do you want to optimize?

Yes - it’s good knowledge to have as it applies to any buffer management.

Whether you use higher-level functions and libraries once you understand the constraints and consequences is up to each programmer and context, and to the price you are willing to pay (memory / CPU / dependencies / control over the code) to get such features or not.

Personally I don’t see the point, and the false sense of safety implied by the name of the library is misleading: newbies who don’t get what’s going on will likely not test for length or overflow, or catch errors as needed. So the code will crash not because of memory issues but because they didn’t understand that edge cases needed to be handled anyway.

My opinion is that any wannabe C/C++ programmer should get that skill under their belt before going to non-standard functions, and newbies with throwaway / simple short code are served somewhat appropriately by the bundled String class.

But this was already discussed so I’ll stop there.

Well, my .ino currently has around 1000 lines of code and multiple header files. The thing is that I have been using Strings in all my code, leading to String reallocation on multiple occasions, which is a real pain when you are debugging over the serial port.

Actually this does not affect the device's performance, it is just a matter of improving both the project and my programming skills.

That's why I'm asking about your use cases.
For example, if you just have a debug printout of several values, you don't need to concatenate a String; you can just use Streaming.h and stream directly to the Serial interface.

Something like

Serial << "Debug: " << "i=" << i << F(" floatVariable=") << floatVariable;

As you can see, you can "combine" different types of data ... it is just one single stream of data sent to Serial.
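For the curious, the chained << line above can work without any intermediate buffer because each << simply forwards one value to print() and hands the same output object back. The sketch below shows the idea on a desktop compiler; CountingSink is a made-up stand-in for an Arduino Print-like object, and this is an approximation of the technique, not the actual Streaming.h source.

```cpp
#include <cstdio>
#include <cstddef>

// Hypothetical stand-in for an Arduino Print-like sink, so the idea can be
// tried on a desktop compiler. It writes to stdout and counts characters.
struct CountingSink {
  std::size_t written = 0;
  void print(const char *s) { written += static_cast<std::size_t>(std::printf("%s", s)); }
  void print(int v)         { written += static_cast<std::size_t>(std::printf("%d", v)); }
  void print(double v)      { written += static_cast<std::size_t>(std::printf("%.2f", v)); }
};

// Each << forwards one value straight to print() and returns the sink,
// so a chain of << never builds a combined string in memory.
template <typename T>
CountingSink &operator<<(CountingSink &sink, T value) {
  sink.print(value);
  return sink;
}
```

Written this way, `sink << "i=" << 42 << " f=" << 1.5;` turns into four separate print() calls, one per value.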

Even if you need a buffer for other interfaces, Streaming.h is much more efficient than the String class (or any of its derivatives) I have seen so far.

Or if you handle an Ethernet Client on an Arduino ... see the StreamLib library from @Juraj which is perfect for the Arduino webclient.

Or the PString.h library ...

It all depends on what you want to solve :wink:

I will check it out :wink: that's looking good

Here is Stroustrup's advice on strcpy() -- Don't use it.

You must be new to "code developing".

What does he suggest as an alternative ?

Did he mention strlcpy() or strlcat()? (Which is what I referred to)

That comment "don't use strcpy() unsafe (potential range error)" seemed to be an aside on one slide and not followed up in particular.

However the talk is about safety and he does say you need to have range checking for strings (i.e. c-strings small s) and arrays and that libraries/classes should be used to shield the user from the "unsafe" low level stuff.

One slide says to "Use containers" and "Avoid subscripting raw pointers"
Another slide is titled "Being careful" doesn't scale

So I assume he is thinking of a container that holds the array length in addition to the array and provides access to the array without raw subscripting and with range checking, null checking etc.

see https://www.youtube.com/watch?v=I8UvQKvOSSw for his 1.5hr talk and slides
Also see "C++ creator rebuts White House warning" (InfoWorld) for an overview and other links.

Can I ask, then, why you used those functions in the SafeString library

if they are so unsafe?

(Don't bother answering, it's a rhetorical question. Although I might still have used strlcpy for the sake of it, I agree with your code and it just makes sense to do it this way.)