String()

Forever I read about the evil of the String class around here. It seems, from what I'm reading, the worst it does is waste RAM by allocating big buffers no matter what it needs.

Wouldn't it be possible for someone to write a replacement String class that has a much smaller dynamic footprint? 'Cause, I see all the stuff it does and, really, it looks like a pretty handy set of tools.

-jim lee

An empty String requires a buffer pointer, a length and an allocation size; a total of six bytes. Hard to see any savings there.

The problem is not so much the waste, but the mess it leaves behind.

The bad thing about the String class is not the memory usage but the memory fragmentation you get. It tries to allocate blocks of memory with different sizes and over time your memory will be fragmented, meaning it is scattered with small blocks of free memory.

After some time, new memory allocation requests will fail, because the free memory is scattered across your whole memory map and there isn't a single block large enough to satisfy the request (EDIT: upon reading the avr-libc user manual, adjacent free blocks will be merged by free()). So you can run into a situation where you have, for example, 1K of free memory, but it is split into blocks of 100 bytes. Now you want to allocate 200 bytes from your 1K of free memory, but you can't, because the 100-byte blocks are scattered and cannot be joined into one 200-byte block.
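To put numbers on that scenario, here is a minimal desktop C++ sketch (not Arduino code). HeapModel and makeFragmentedHeap are made-up names, and the 8-byte live allocations between the holes are an arbitrary assumption. It only models which bytes are in use, not how avr-libc's malloc() really works.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a heap: one flag per byte, true = in use.
// This is only a picture of the idea, not avr-libc's data structure.
struct HeapModel {
    std::vector<bool> used;

    explicit HeapModel(std::size_t size) : used(size, false) {}

    // Mark the bytes [offset, offset + len) as allocated.
    void occupy(std::size_t offset, std::size_t len) {
        for (std::size_t i = offset; i < offset + len && i < used.size(); ++i) {
            used[i] = true;
        }
    }

    // Total number of free bytes, wherever they are.
    std::size_t totalFree() const {
        std::size_t n = 0;
        for (bool u : used) {
            if (!u) ++n;
        }
        return n;
    }

    // Length of the longest run of consecutive free bytes.
    std::size_t largestFreeBlock() const {
        std::size_t best = 0, run = 0;
        for (bool u : used) {
            run = u ? 0 : run + 1;
            if (run > best) best = run;
        }
        return best;
    }
};

// Builds the situation from the post: ten 100-byte holes, each followed
// by a small live allocation (8 bytes, an arbitrary choice).
HeapModel makeFragmentedHeap() {
    const std::size_t hole = 100, live = 8, holes = 10;
    HeapModel heap(holes * (hole + live));
    for (std::size_t i = 0; i < holes; ++i) {
        heap.occupy(i * (hole + live) + hole, live);
    }
    return heap;
}
```

With this layout, totalFree() reports 1000 bytes while largestFreeBlock() reports only 100, so a 200-byte request cannot be satisfied.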

Does the Arduino have the ability to combine two adjacent free memory blocks?

-jim lee

If that happens, yeah. But it's not going to move non-empty stuff. And the fragmentation is the main issue.

I'm still in favor of a fixed-size String implementation. No memory fragmentation, but the ease of use of a String object.

Yes, at least free does it and I suspect delete will invoke free under the hood.

Is there any way to read the memory map of an Arduino? Something that could give a listing of allocated and free blocks? Something you could use to draw a memory map with?

-jim lee

Yes you can do that:

  1. Set up a pointer 'currentPointer' to the first address of the memory region that you want to inspect.
  2. Set up a pointer 'endPointer' to the address just past the last byte of that region.
  3. While currentPointer does not equal endPointer, do:
    dereference currentPointer and send the value over serial
    increment currentPointer by one.

You will get the exact content of the memory region. Now you have to figure out where your memory blocks are by looking at __malloc_heap_start and __malloc_heap_end (you could set your current and end pointers to these addresses). You will also need to collect the addresses of every memory block you have allocated. Subtract two bytes from such an address and you have the memory block header. The header consists of a two-byte number indicating the size of the memory block. The free blocks can be inspected by walking the freelist: every free memory block contains the address of the next free memory block. Somewhere you have a pointer to the free memory block with the smallest size, but I am not sure where you can find that.
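The dump loop and the header lookup could be sketched roughly like this in plain C++. dumpRegion and blockSizeBefore are hypothetical helper names, endPointer is treated as one past the last byte so that the final byte is included, and little-endian byte order is assumed for the size header (as on AVR). On an Arduino you would send each byte with Serial.print() instead of collecting it in a string.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats the bytes in [currentPointer, endPointer) as two-digit hex
// values, each followed by a space.
std::string dumpRegion(const uint8_t *currentPointer, const uint8_t *endPointer) {
    std::string out;
    char buf[4];  // two hex digits, one space, terminating NUL
    while (currentPointer != endPointer) {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(*currentPointer));
        out += buf;
        ++currentPointer;
    }
    return out;
}

// Reads the two-byte size header that sits directly before the address
// handed out for an allocated block (assumption: little-endian, 2 bytes).
uint16_t blockSizeBefore(const uint8_t *payload) {
    return static_cast<uint16_t>(payload[-2] | (payload[-1] << 8));
}
```

To try it on a PC, fill a small uint8_t array with a fake header followed by some payload bytes and pass pointers into that array.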

I suggest you read up on the matter in the avr-libc-user-manual-1.6.7 in chapter 3.4 (Implementation details).

There has to be some way free() knows how to recycle something. Or is it using the allocated header to tell it what to free? (because the user is holding the addresses)

So basically I can "see" everything that's free. So everything else would be allocated in some way.

Right?

If so, that's a great start!

-jim lee

Look into using the String reserve() function, which allows you to allocate a buffer in memory for manipulating Strings. I start off with a larger String buffer than needed, say 100, and, as I develop the program, I print the length of the String held in the buffer to get an idea of how big the final buffer should be, and pare my reserve number down.

To add to the buffer use concat(). To clear the buffer I use sStringBuffer = "";
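Here is a rough way to see the effect on a PC. This sketch uses std::string as a stand-in for Arduino's String (an assumption made for illustration; the two classes are not identical), with reserve() and += playing the roles of reserve() and concat(). countReallocations is a made-up helper that counts how often the underlying buffer moves.

```cpp
#include <cstddef>
#include <string>

// Appends `pieces` ten-character fragments to a string that reserved
// `reserveBytes` up front, and counts how often the buffer moved.
std::size_t countReallocations(std::size_t reserveBytes, std::size_t pieces) {
    std::string buffer;
    buffer.reserve(reserveBytes);
    const char *where = buffer.data();
    std::size_t moves = 0;
    for (std::size_t i = 0; i < pieces; ++i) {
        buffer += "0123456789";        // stands in for sStringBuffer.concat(...)
        if (buffer.data() != where) {  // the buffer was reallocated
            ++moves;
            where = buffer.data();
        }
    }
    return moves;
}
```

As long as the reserved size covers everything that gets appended, the buffer never moves. Without a reservation it has to grow at least once on the way to 100 characters.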

OK, it is still running; I just checked on one of my projects using String that has been running for 8+ months continuously.

jimLee:
There has to be some way free() knows how to recycle something. Or is it using the allocated header to tell it what to free? (because the user is holding the addresses)

Exactly. When you call free you pass in the memory address of the allocated block. Free will then take a look at the size of the block to check adjacent memory blocks and free it by adding it to the free list. It will not override the block with say 0xFF, because that is not unnecessary work. Marking it as free by inserting it to the freelist is enough.
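As an illustration of "add it to the free list and merge with neighbours", here is a toy model in desktop C++. FreeList and its member names are invented for this sketch, and it keeps the list in a std::map, whereas avr-libc threads the list through the free blocks themselves. Take it as a picture of the idea only.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// Toy free list: maps the start offset of each free block to its length.
class FreeList {
public:
    // Returns a block to the list and merges it with adjacent free blocks.
    // The block's contents are never touched; only the bookkeeping changes.
    void release(std::size_t offset, std::size_t length) {
        auto next = blocks.lower_bound(offset);
        // Merge with a free block that ends exactly where this one starts.
        if (next != blocks.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                length += prev->second;
                blocks.erase(prev);
            }
        }
        // Merge with a free block that starts exactly where this one ends.
        if (next != blocks.end() && offset + length == next->first) {
            length += next->second;
            blocks.erase(next);
        }
        blocks[offset] = length;
    }

    std::size_t blockCount() const { return blocks.size(); }

    std::size_t largestBlock() const {
        std::size_t best = 0;
        for (const auto &b : blocks) {
            if (b.second > best) best = b.second;
        }
        return best;
    }

private:
    std::map<std::size_t, std::size_t> blocks;
};
```

Releasing the blocks at offsets 0 and 200 (100 bytes each) leaves two separate free blocks. Releasing the block at offset 100 afterwards joins all three into a single 300-byte block.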

jimLee:
So basically I can "see" everything that's free. So everything else would be allocated in some way.

Yes.

What is this .data & .bss memory? I assume .data is your globals what is .bss though?

@Idahowalker, so.. If you're reasonably careful, String seems to work just fine then?

-jim lee

It will not override the block with say 0xFF, because that is not unnecessary work.

Too many negatives, I feel.

As long as Strings are allocated and de-allocated in a LIFO (last in, first out) fashion, there are no problems. However, in most assignments or concatenations, this is not the case.

Imagine the simple case of concatenating the Strings "Hello, ", "World" and '!':

String res = String("Hello, ") + "World" + '!';

There's much more going on than you might think at first sight:

  1. String("Hello, ") is constructed and allocates two blocks of heap memory*. This is String 0.
    Memory (X = allocated block, · = free block):

    ┌── String 0
    XX

  2. A new String is constructed to store the result of the concatenation String("Hello, ") + "World". The heap memory doesn't change yet. This is String 1.

  3. String 0 is assigned to String 1. This means that two blocks of memory are allocated by String 1.
    Memory:

    ┌───── String 0
    │  ┌── String 1
    XX XX

  4. The string literal "World" is concatenated to String 1 using a StringSumHelper. Another block of memory is allocated by String 1.
    Memory:

    ┌────── String 0
    │  ┌─── String 1
    XX XXX

  5. The character '!' is concatenated to String 1. In this case, no further allocation is needed, because the String class had allocated some more than it really needed.

  6. String res is constructed. It doesn't allocate any memory yet. This is String 2.

  7. String 1 (the temporary result of the concatenation) is assigned to String 2. This means that String 2 has to allocate three blocks of memory.
    Memory:

    ┌────────── String 0
    │  ┌─────── String 1
    │  │   ┌─── String 2
    XX XXX XXX

  8. String 1 is destructed and its three blocks of heap memory are de-allocated.
    Memory:

    ┌────────── String 0
    │  ┌─────── String 1 (destructed)
    │  │   ┌─── String 2
    XX ··· XXX

  9. String 0 is destructed and its two blocks of heap memory are de-allocated.
    Memory:

    ┌────────── String 0 (destructed)
    │  ┌─────── String 1 (destructed)
    │  │   ┌─── String 2
    ·· ··· XXX

As you can see, there are now 5 blocks of free memory squeezed between the start of the heap and the String res. If you want to allocate more than 5 blocks of memory, you have to increase the heap size; you cannot use the free space at the start of the heap because it's not large enough.
This is called heap fragmentation, and it's a huge problem on microcontrollers with little RAM.

A simple solution would be to scope the concatenation in such a way that the assignment to the result variable happens after the temporary Strings have been destructed:

    String res;
    {
      String temp = String("Hello, ") + "World" + '!';
      res = temp;
    }

Obviously, this is rather cumbersome.
Also, the peak heap usage will be higher, because there are more temporaries.

Another approach that overcomes the problem is pre-allocating space. In this case, you essentially get all String functionality, but with the memory benefits of fixed-length char arrays (as long as you don't have to increase the buffer size during the lifetime of the reserved String).

    String reserved;
    reserved.reserve(32);
    reserved = String("Hello, ") + "World" + '!';

For more information, check my previous post on the topic as well as this repository to check String memory layout yourself: ArduinoStringExperiments.

Pieter

(*) I tested this on a 64-bit computer, so the pointer size (minimum free list entry size) is 8 bytes. On an 8-bit Arduino, this will be only 2 bytes.

PieterP:
Another approach that overcomes the problem is pre-allocating space. In this case, you essentially get all String functionality, but with the memory benefits of fixed-length char arrays (as long as you don't have to increase the buffer size during the lifetime of the reserved String).

    String reserved;
    reserved.reserve(32);
    reserved = String("Hello, ") + "World" + '!';

Oh, that is really great. I have a purpose for that straight away (on an ESP with plenty of memory, though); this is going to save me quite a lot of time rewriting some things. Just to be clear: if I've reserved an amount for a String, can I do all sorts of things without causing fragmentation, as long as I don't exceed the reserved size?

AWOL:
Too many negatives, I feel.

Yes, this should be: It will not override the block with say 0xFF, because that is unnecessary work. :)

Idahowalker:
OK, it is still running; I just checked on one of my projects using String that has been running for 8+ months continuously.

Of course you can use the String class. You just have to be very careful when you do so. Most of the people requesting help are beginners and they might not know about the implications of String, so in general "don't use String" seems to be good advice. But you are right, with the necessary caution String can be used.

jimLee:
What is this .data & .bss memory? I assume .data is your globals what is .bss though?

.data contains initialized static data, .bss contains uninitialized static data. The bss-section is guaranteed to be initialized to 0 before your program enters the main function.

This will go into .data

// global scope
int i = 5;
const char* text = "Hello World";

And this will go into .bss

// global scope
int i;
const char* text;

There's no such thing as "uninitialized static data".
There is explicitly initialised static data, or there is zeroed static data.

AWOL:
There's no such thing as "uninitialized static data".
There is explicitly initialised static data, or there is zeroed static data.

You are right, so uninitialized data would be local variables I guess?

LightuC:
You are right, so uninitialized data would be local variables I guess?

No, local data lives on the stack.

...unless it is local and static.
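To illustrate that last point, a minimal sketch in plain C++. The function names are made up, and the section placement in the comments is what a typical toolchain does with such variables rather than something the language itself promises.

```cpp
// `calls` is local in scope but has static storage duration, so it is
// zero-initialized before the program starts and keeps its value between
// calls. With no initializer it would typically end up in .bss.
unsigned nextCallNumber() {
    static unsigned calls;  // starts at 0 without an explicit initializer
    return ++calls;
}

// `answer` has an explicit non-zero initializer, so it would typically
// end up in .data instead.
int storedAnswer() {
    static int answer = 42;
    return answer;
}
```

Neither variable lives on the stack, even though both are declared inside a function.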