Simple String sprintf fails on length>13

Hi there, I've run into a problem that really puzzles me. When I format and print a String longer than 13 characters, the output is garbled. The code example and its serial output are below.

I'm really curious about the cause and the solution.

void setup() 
{
  const String textline13 = "1234512345123";
  const String textline14 = "12345123451234";
  char text[100]; 

  Serial.begin(115200);

  sprintf(text, "%s\n", textline13); 
  Serial.print(text);   
  sprintf(text, "%s\n", textline14); 
  Serial.print(text);
}

void loop() 
{
}

The generated output:
1234512345123
8��?

Remember to leave space for the trailing null character.
sprintf() wasn’t designed for Strings - use a char array.
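
To make both points concrete, here is a minimal sketch in plain C++ (it uses only C headers, so it should also compile inside a sketch, though that is untested here). The helper name formatLine is made up for illustration and is not part of any Arduino API. It uses snprintf() so the output buffer, trailing null character included, can never be overrun.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper for illustration only.
// Writes text followed by '\n' into out and reports whether it fit.
// outSize is the full buffer size, including room for the '\0'.
bool formatLine(char *out, size_t outSize, const char *text) {
    assert(out != nullptr && text != nullptr && outSize > 0);
    const int needed = snprintf(out, outSize, "%s\n", text);
    // snprintf returns the length without the '\0', so the result fits
    // only if that length is strictly smaller than the buffer size.
    return needed >= 0 && static_cast<size_t>(needed) < outSize;
}
```

When the function returns false the buffer still holds a null-terminated, truncated string, so it is safe to print but incomplete.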

Hi @peterstruik,

the main reason for failure is (as posted by @lastchancename) the use of a String variable in sprintf().

If you want to stick with String objects you can handle this by using the .c_str() method of a String object:

void setup() 
{
  const String textline13 = "1234512345123";
  const String textline14 = "12345123451234";
  char text[100]; 

  Serial.begin(115200);

  sprintf(text, "%s\n", textline13.c_str()); 
  Serial.print(text);   
  sprintf(text, "%s\n", textline14.c_str()); 
  Serial.print(text);
}

void loop() 
{
}

This works.

The cause of the strange characters is that sprintf() is handed the String object rather than a pointer to its character array, so %s reads the object's internal bytes as if they were text.

P.S.: The use of a C string (char array) would generally be the preferred solution for microcontrollers due to their limited memory:

  const char textline13[] = "1234512345123";
  const char textline14[] = "12345123451234";

Take note of what @lastchancename has already posted: a C string (char array) created as shown above, e.g. textline13, does not occupy 13 but 14 bytes, because the null character \0 is added to mark the end of the string. textline13 uses 14 bytes, textline14 uses 15.
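
The extra byte can be checked directly. A minimal sketch, assuming the two arrays are defined at file scope as above; the function name checkSizes is made up for illustration:

```cpp
#include <assert.h>
#include <string.h>

const char textline13[] = "1234512345123";
const char textline14[] = "12345123451234";

// Compile-time check: sizeof counts the terminating '\0'.
static_assert(sizeof(textline13) == 14, "13 characters + terminator");
static_assert(sizeof(textline14) == 15, "14 characters + terminator");

// Run-time cross-check: strlen() stops before the '\0',
// so it is always one less than the array size.
void checkSizes() {
    assert(strlen(textline13) + 1 == sizeof(textline13));
    assert(strlen(textline14) + 1 == sizeof(textline14));
}
```

The same rule applies to any destination buffer: it needs one byte more than the longest text it should hold.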

You can find a lot of tutorials on the use of C++ String objects and C strings; here are some examples:

https://www.studytonight.com/c/string-and-character-array.php

https://www.programiz.com/cpp-programming/strings

As @lastchancename stated, the conversion specification %s is for char arrays, not a String.
If warnings are enabled, the compiler will display:

sketch_aug7a.ino:9:17: warning: format '%s' expects argument of type 'char*', but argument 3 has type 'const String' [-Wformat=]
    9 |   sprintf(text, "%s\n", textline13);

You can use the String::c_str() method, which returns a pointer to a null-terminated C-style char array representation of the String,
e.g.

void setup() 
{
  const String textline13 = "1234512345123";
  const String textline14 = "12345123451234";
  char text[100]; 

  Serial.begin(115200);

  sprintf(text, "%s\n", textline13.c_str()); 
  Serial.print(text);   
  sprintf(text, "%s\n", textline14.c_str()); 
  Serial.print(text);
}

void loop() 
{
}

When run, the serial monitor displays:

1234512345123
12345123451234

Thank you all for your prompt answers. They explain the phenomenon very clearly.

And the reason it works for up to 13 characters is due to Small String Optimization. The exact numbers can vary with implementation, but

  • a String object is 16 bytes
  • if the text -- always null-terminated -- is short enough
    • all the bytes are at the beginning
    • c_str() returns the same pointer as the address of the String itself
  • once the text no longer fits, the String object allocates space on the heap (and a little extra) to fit the text, and c_str() will return a pointer to that
    • that pointer can be stored at the beginning of the object, which is what you see printed
  • it will repeatedly allocate a new bigger block, copy the bytes over, and free up the older block as the text grows, potentially causing heap fragmentation
    • this can be mitigated by calling String.reserve up front
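
The list above can be sketched with a toy class. This is a hypothetical illustration only, not the Arduino String source: the name ToyString, the 14-byte inline buffer (picked to mirror the 13-character limit seen in this thread) and the separate flag byte are all assumptions, and it leaves out growing and reserve entirely.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical toy class; real implementations differ in layout and limits.
class ToyString {
public:
    // Assumed inline capacity: 13 characters + '\0'.
    static constexpr size_t kInlineBytes = 14;

    explicit ToyString(const char *text) : onHeap_(false) {
        assert(text != nullptr);
        data_.inlineBuf[0] = '\0';               // start as an empty inline string
        const size_t bytes = strlen(text) + 1;   // text plus '\0'
        if (bytes <= kInlineBytes) {
            memcpy(data_.inlineBuf, text, bytes); // short: stored inside the object
            return;
        }
        char *block = static_cast<char *>(malloc(bytes));
        if (block == nullptr) return;            // allocation failed: stay empty
        memcpy(block, text, bytes);              // long: stored on the heap
        data_.heap = block;                      // object now starts with a pointer
        onHeap_ = true;
    }

    ~ToyString() { if (onHeap_) free(data_.heap); }

    ToyString(const ToyString &) = delete;
    ToyString &operator=(const ToyString &) = delete;

    bool isInline() const { return !onHeap_; }
    const char *c_str() const { return onHeap_ ? data_.heap : data_.inlineBuf; }

private:
    union {
        char  inlineBuf[kInlineBytes];  // used while the text fits
        char *heap;                     // used once it no longer fits
    } data_;                            // first member, so it sits at offset 0
    bool onHeap_;
};
```

For a short text, c_str() and the address of the object are the same, so reading the object's bytes as text happens to work. For a long text, the first bytes of the object are a pointer, and reading them as text gives garbage.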

SSO (an unfortunately overloaded initialism) avoids using the heap, which pays off since many strings are short/small.

I'm surprised. As far as I know it's 6 bytes but it will more than likely depend on the architecture.

From the AVR core (Wstring.h)

Something that I'm missing?

What a marvelous example of in-depth knowledge. Respect!

Yes; for example, on ESP32 it's quite a bit more complicated:

// Contains the string info when we're not in SSO mode
struct _ptr { 
    char *   buff;
    uint32_t cap;
    uint32_t len;
};
// This allows strings up up to 11 (10 + \0 termination) without any extra space.
enum { SSOSIZE = sizeof(struct _ptr) + 4 - 1 }; // Characters to allocate space for SSO, must be 12 or more
struct _sso {
    char     buff[SSOSIZE];
    unsigned char len   : 7; // Ensure only one byte is allocated by GCC for the bitfields
    unsigned char isSSO : 1;
} __attribute__((packed)); // Ensure that GCC doesn't expand the flag byte to a 32-bit word for alignment issues
#ifdef BOARD_HAS_PSRAM
enum { CAPACITY_MAX = 3145728 }; 
#else
enum { CAPACITY_MAX = 65535 }; 
#endif
union {
    struct _ptr ptr;
    struct _sso sso;
};

It's a union of two structs: SSO mode or not. BTW, the comment on SSOSIZE saying "up up to 11 (10 + \0 termination)" is wrong, since sizeof(struct _ptr) is likely either 10 or 12. At some point, the + 4 must have been added without updating the comment (or fixing the "up up" typo). Hmm, apparently this was copied over from the ESP8266 implementation, with the same typo:

// This allows strings up up to 12 (11 + \0 termination) without any extra space.
enum { SSOSIZE = sizeof(struct _ptr) + 4 }; // Characters to allocate space for SSO, must be 12 or more
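
The arithmetic can be checked with a small stand-in. This assumes a 32-bit target where a pointer is 4 bytes, modelled here as a uint32_t so the numbers come out the same on a desktop compiler. The names PtrModel32, kSsoSizeEsp32 and kSsoSizeEsp8266 are made up for this sketch.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for struct _ptr on a 32-bit target (assumption:
// the char* is 4 bytes, so it is modelled as a uint32_t).
struct PtrModel32 {
    uint32_t buff;
    uint32_t cap;
    uint32_t len;
};

constexpr size_t kPtrSize        = sizeof(PtrModel32);
constexpr size_t kSsoSizeEsp32   = kPtrSize + 4 - 1;  // expression from the ESP32 snippet
constexpr size_t kSsoSizeEsp8266 = kPtrSize + 4;      // expression from the ESP8266 snippet

static_assert(kPtrSize == 12, "three 4-byte fields");
static_assert(kSsoSizeEsp32 == 15, "12 + 4 - 1");
static_assert(kSsoSizeEsp8266 == 16, "12 + 4");
```

Under that assumption the two expressions give 15 and 16 bytes of inline buffer, neither of which lines up with the 11 or 12 mentioned in the comments.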

Anyway, AVR does not have SSO, which may be because SSO would "waste" bytes for small strings and make every String object larger; so instead it always uses the heap. That makes heap fragmentation even worse, which strengthens the argument for avoiding String on AVR.
