Hi there, I encountered a problem that really puzzles me. When formatting and printing a string longer than 13 characters, the output is completely garbled. Please find the code example and print output below.
I'm really curious about the cause and the solution.
Take note of what @lastchancename has already posted: a C string (char array) created as shown above, e.g. textline13, does not occupy 13 but 14 characters, because the null character \0 is added to mark the end of the array. textline13 uses 14 characters, textline14 uses 15.
You can find a lot of tutorials regarding the use of C++ String objects and C strings; here are some examples
As @lastchancename stated, the conversion specification %s is for char arrays, not a String.
If warnings are enabled, the compiler will display
sketch_aug7a.ino:9:17: warning: format '%s' expects argument of type 'char*', but argument 3 has type 'const String' [-Wformat=]
9 | sprintf(text, "%s\n", textline13);
You can use the String::c_str() method, which returns a pointer to a null-terminated C-style char array representation of a String,
e.g.
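A minimal sketch of that fix, written so it also builds off-target: std::string stands in for Arduino's String (an assumption, both expose c_str()), and formatLine is a hypothetical helper name, not something from the original sketch. It also uses snprintf rather than sprintf so the write is bounded by the buffer size.

```cpp
#include <cstdio>
#include <string>

// Format a string object into a caller-supplied buffer via "%s".
// std::string is a stand-in for Arduino's String (assumption);
// formatLine is a hypothetical helper used only for illustration.
int formatLine(char *out, std::size_t outSize, const std::string &line) {
    // c_str() yields the const char* that "%s" expects.
    // snprintf never writes more than outSize bytes, including the '\0'.
    return std::snprintf(out, outSize, "%s\n", line.c_str());
}
```

On an Arduino the same idea would be `snprintf(text, sizeof(text), "%s\n", textline13.c_str());`, which passes the pointer that %s expects instead of the String object itself.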
And the reason it works for up to 13 characters is due to Small String Optimization. The exact numbers can vary with implementation, but
- a String object is 16 bytes
- if the text -- always null-terminated -- is short enough
  - all the bytes are at the beginning
  - c_str() returns the same pointer as the address of the String itself
- once the text no longer fits, the String object allocates space on the heap (and a little extra) to fit the text, and c_str() will return a pointer to that
  - that pointer can be stored at the beginning of the object, which is what you see printed
  - it will repeatedly allocate a new bigger block, copy the bytes over, and free the older block as the text grows, potentially causing heap fragmentation
    - this can be mitigated by calling String::reserve() up front
SSO (an unfortunately overloaded initialism) avoids using the heap, which pays off since many strings are short/small.
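To make the two modes concrete, here is a toy small-string class. It is only a sketch of the general technique, not the layout of any real String implementation: TinyString, kInlineSize and startsAtObject are hypothetical names, and the inline size of 14 bytes is picked purely so that 13 characters fit and 14 do not, mirroring what was observed in this thread.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Toy illustration of Small String Optimization (hypothetical class).
class TinyString {
public:
    explicit TinyString(const char *s) {
        std::size_t n = std::strlen(s);
        if (n < kInlineSize) {
            // Text plus '\0' fits: store it inside the object itself.
            std::memcpy(u_.inlineBuf, s, n + 1);
            isInline_ = true;
        } else {
            // Too long: the object now starts with a heap pointer instead.
            char *p = static_cast<char *>(std::malloc(n + 1));
            if (p) {
                std::memcpy(p, s, n + 1);
                u_.heap = p;
                isInline_ = false;
            } else {
                u_.inlineBuf[0] = '\0';  // allocation failed: empty string
                isInline_ = true;
            }
        }
    }
    ~TinyString() { if (!isInline_) std::free(u_.heap); }
    TinyString(const TinyString &) = delete;
    TinyString &operator=(const TinyString &) = delete;

    bool isInline() const { return isInline_; }
    const char *c_str() const { return isInline_ ? u_.inlineBuf : u_.heap; }

    // True when the text begins at the object's own address, which is the
    // case in which treating the object as a char* happens to "work".
    bool startsAtObject() const {
        return static_cast<const void *>(c_str()) ==
               static_cast<const void *>(this);
    }

private:
    static constexpr std::size_t kInlineSize = 14;  // assumption, see above
    union {
        char *heap;                   // heap mode: pointer at offset 0
        char inlineBuf[kInlineSize];  // SSO mode: text at offset 0
    } u_;
    bool isInline_;
};
```

With this toy class, a 13-character text stays inline and startsAtObject() is true, while a 14-character text moves to the heap and the first bytes of the object are a pointer rather than text.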
Yes; for example, on ESP32 it's quite a bit more complicated:
// Contains the string info when we're not in SSO mode
struct _ptr {
  char *buff;
  uint32_t cap;
  uint32_t len;
};
// This allows strings up up to 11 (10 + \0 termination) without any extra space.
enum { SSOSIZE = sizeof(struct _ptr) + 4 - 1 };  // Characters to allocate space for SSO, must be 12 or more
struct _sso {
  char buff[SSOSIZE];
  unsigned char len : 7;  // Ensure only one byte is allocated by GCC for the bitfields
  unsigned char isSSO : 1;
} __attribute__((packed));  // Ensure that GCC doesn't expand the flag byte to a 32-bit word for alignment issues
#ifdef BOARD_HAS_PSRAM
enum { CAPACITY_MAX = 3145728 };
#else
enum { CAPACITY_MAX = 65535 };
#endif
union {
  struct _ptr ptr;
  struct _sso sso;
};
It's a union of two structs, one for SSO mode and one for heap mode. BTW, that comment for SSOSIZE saying "up up to 11 (10 + \0" is wrong, since sizeof (struct _ptr) is likely either 10 or 12. At some point, the + 4 must have been added without updating the comment (or fixing the "up up" typo). Hmm, apparently this was copied over from the ESP8266 implementation, with the same typo:
// This allows strings up up to 12 (11 + \0 termination) without any extra space.
enum { SSOSIZE = sizeof(struct _ptr) + 4 }; // Characters to allocate space for SSO, must be 12 or more
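The size arithmetic for the ESP32 variant can be checked on a desktop compiler with a small model. This is a sketch under stated assumptions: pointers are taken to be 32 bits wide on the target, so the pointer is modelled as a uint32_t to get the same numbers on a 64-bit host; PtrModel, SsoModel and SSOSIZE_MODEL are hypothetical names; and the packed attribute requires GCC or Clang.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for struct _ptr, assuming a target with 32-bit pointers.
struct PtrModel {
    std::uint32_t buff;  // would be `char *buff` on the target (assumption)
    std::uint32_t cap;
    std::uint32_t len;
};

// Same expression as the quoted ESP32 header, applied to the model.
enum { SSOSIZE_MODEL = sizeof(PtrModel) + 4 - 1 };

struct SsoModel {
    char buff[SSOSIZE_MODEL];
    unsigned char len : 7;
    unsigned char isSSO : 1;
} __attribute__((packed));

// Compile-time checks of the arithmetic under the assumptions above.
static_assert(sizeof(PtrModel) == 12, "three 32-bit fields");
static_assert(SSOSIZE_MODEL == 15, "12 + 4 - 1");
static_assert(sizeof(SsoModel) == 16, "15 buffer bytes plus one flag byte");
```

Under these assumptions the inline buffer is 15 bytes and the whole object is 16 bytes, which fits the 16-byte figure mentioned earlier and does not fit the "11 (10 + \0" in the comment.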
Anyway, AVR does not have SSO, which may be because it will "waste" bytes for small strings and make all String objects larger; so instead it always uses the heap. That makes the heap fragmentation even worse, making the argument to avoid String on AVR stronger.
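Where a String is used anyway, the effect of the up-front reserve mentioned earlier in the thread can be sketched off-target. std::string is used here as a stand-in for Arduino's String (an assumption; both provide reserve()), and countReallocations is a hypothetical helper that counts capacity changes as a proxy for reallocations.

```cpp
#include <cstddef>
#include <string>

// Append `count` characters one at a time and count how often the
// capacity changes, i.e. how often the buffer had to be reallocated.
// std::string stands in for Arduino's String (assumption).
std::size_t countReallocations(std::size_t count, bool reserveFirst) {
    std::string s;
    if (reserveFirst) {
        s.reserve(count);  // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s += 'x';
        if (s.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = s.capacity();
        }
    }
    return reallocations;
}
```

Calling it with reserveFirst set to true yields no capacity changes during the appends, while the unreserved run grows the buffer several times; how many times depends on the library's growth policy.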