An Arduino String (capital S) is an object that owns the storage for a heap-allocated character array.
When you write `String s = "Hello, world";`, the String allocates memory and copies all characters of the string literal into that memory. When `s` is destroyed, its memory is deallocated. When you create a copy of a String variable, the character array it owns is copied as well. This requires an extra allocation, and as a result there are now two copies of the same characters in memory.
Most of the time, this copy is redundant and just wastes memory. That's why it's better to pass a String to a function by reference (e.g. `const String &`) rather than by value.
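Here is a sketch of the two ways to pass a string. It is written in standard C++ so it can be compiled on a PC. It uses `std::string` as a stand-in for Arduino's `String`, which is an assumption for illustration, since `std::string` isn't available on every Arduino board. Like `String`, `std::string` owns its characters and duplicates them when the object is copied. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// std::string stands in for Arduino's String here: both own their
// character storage and duplicate it when the object is copied.

// By value: `s` is a second copy of the caller's string, with its own
// storage, created on every call even though we only read from it.
std::size_t lengthByValue(std::string s) {
    return s.length();
}

// By const reference: `s` refers to the caller's object, so nothing is
// copied, and `const` guarantees the function only reads from it.
std::size_t lengthByRef(const std::string &s) {
    return s.length();
}
```

In an Arduino sketch, the reference version's parameter would be written as `const String &s`.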
C-strings are different: they are just pointers to a character array that was allocated somewhere else.
When you write `const char *s = "Hello, world";`, the compiler stores the string literal in a character array in the static memory of the program. The variable `s` is then just a pointer to that array. It doesn't own the array; it just points to it (to its first character). Copying such a pointer is very cheap, so you can easily pass the pointer as a function argument.
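To make the cost difference concrete, here is a small sketch in standard C++. The names `greeting` and `passThrough` are hypothetical. The literal occupies a 13-byte array (12 characters plus the terminating `'\0'`), while the pointer is only as large as an address. Passing the pointer to a function copies just that address.

```cpp
// The literal is stored as a char array in static memory;
// `greeting` only holds the address of its first character.
const char *greeting = "Hello, world";

// The array holds 12 characters plus the terminating '\0'.
static_assert(sizeof("Hello, world") == 13, "literal array includes the terminator");

// The pointer itself is just an address, however long the text is.
static_assert(sizeof(greeting) == sizeof(const char *), "greeting is only a pointer");

// Passing a C-string copies the pointer, never the characters:
// `s` holds the same address as the caller's pointer.
const char *passThrough(const char *s) {
    return s;
}
```

On an 8-bit AVR board such as the Uno, a pointer is 2 bytes, regardless of how long the string it points to is.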
Most functions that manipulate C-strings assume that the pointer points to the first element of an array of characters, not to a single character. The end of the string is marked by a special null character, `'\0'`.
```cpp
void print(const char *string) {
    while (*string != '\0') {  // as long as the current character is not the null character
        Serial.print(*string); // print the character
        ++string;              // advance the pointer to the next character in the array
    }
}

void setup() {
    Serial.begin(115200);
    print("Hello");
}

void loop() {} // every Arduino sketch needs a loop() function, even an empty one
```
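The terminator is an ordinary element of the array. The following sketch in standard C++ writes out the characters of `"Hello"` one by one; the array name `spelledOut` is hypothetical. It shows that the literal is really six `char`s, the last one being `'\0'`:

```cpp
#include <cstring>

// The same characters as the literal "Hello", written out explicitly.
// Without the final '\0', functions like strlen would run past the end.
const char spelledOut[] = {'H', 'e', 'l', 'l', 'o', '\0'};

// Both arrays contain six elements: five letters and the terminator.
static_assert(sizeof(spelledOut) == 6, "five letters plus the terminator");
static_assert(sizeof("Hello") == 6, "the literal has the same six elements");
```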
When entering the `print` function, `string` points to the beginning of the character array:
```
H e l l o \0
⇡
string
```
This means that dereferencing `string` using `*string` or `string[0]` (both are equivalent) will yield the character `'H'`.
`'H'` is not the null character, so we enter the loop, print `'H'`, and advance the pointer. The pointer `string` now points to the second element of the character array:
```
H e l l o \0
  ⇡
  string
```
`*string` now yields the character `'e'`, and so on, until `string` points to the null character `'\0'` and the condition of the `while` loop is no longer satisfied.
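The same loop can count characters instead of printing them. Below is a sketch in standard C++ using the hypothetical name `countCharacters`; the standard library already provides this as `strlen`. When the loop stops, `string` points at the `'\0'`, and the distance the pointer has travelled is the length of the string.

```cpp
#include <cstddef>

// Walks the pointer forward exactly like print(), but counts instead of printing.
std::size_t countCharacters(const char *string) {
    const char *start = string;  // remember where the array begins
    while (*string != '\0') {    // stop at the null character
        ++string;                // advance to the next character
    }
    // `string` now points at the '\0'; the pointer difference is the
    // number of characters before it.
    return static_cast<std::size_t>(string - start);
}
```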
Note that you should use only one of the two forms of dereferencing at a time, either `*string` or `string[0]`, not both. `*string[0]` would try to dereference a `char`, which doesn't compile.
The brackets `[]` (the subscript operator) are just syntactic sugar for pointer arithmetic followed by a dereference (Member access operators - cppreference.com):
"The built-in subscript expression `E1[E2]` is exactly identical to the expression `*(E1 + E2)`"
This means that `string[0]` is equivalent to `*(string + 0)`, or just `*string`.
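The following standard C++ sketch checks that equivalence. All three helper names are hypothetical. The two index-based functions read the same character in two spellings, and the third covers the index-0 special case:

```cpp
#include <cstddef>

// The same character, read with the subscript operator...
char viaSubscript(const char *text, std::size_t i) { return text[i]; }

// ...and with the pointer arithmetic it is defined in terms of.
char viaArithmetic(const char *text, std::size_t i) { return *(text + i); }

// Index 0 needs no arithmetic at all: text[0] == *(text + 0) == *text.
char firstChar(const char *text) { return *text; }
```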
Why are pointers used for C-strings? I don't know exactly, but C++ evolved from C, and C didn't have references. C also can't pass arrays to functions by value: C arrays automatically decay to pointers when passed as an argument. So the natural way to pass an array of characters to a function was to use a pointer (`char *`).
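You can observe array-to-pointer decay directly. In the following standard C++ sketch (the name `countChars` is hypothetical), the parameter is written as an array, but the compiler adjusts its type to `const char *`. The function therefore only receives the address of the first element and has to find the end by looking for `'\0'`:

```cpp
#include <cstddef>
#include <type_traits>

// Written as an array parameter, but adjusted by the compiler to `const char *`.
std::size_t countChars(const char text[]) {
    std::size_t n = 0;
    while (text[n] != '\0') {  // the array size is unknown here: rely on '\0'
        ++n;
    }
    return n;
}

// The function's type confirms the decay: it takes a plain pointer.
static_assert(std::is_same<decltype(countChars), std::size_t(const char *)>::value,
              "array parameter is really a pointer parameter");
```

In the caller, `sizeof` still sees the whole array (6 bytes for `"Hello"`). Inside the function, only the pointer is left.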
You're not allowed to write to the memory of a string literal, so the pointer must point to read-only characters. That's why `const char *` is often used instead of just `char *`. It's good practice to always use `const` if your function is not going to write to the string argument.
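The following sketch in standard C++ shows what `const` buys you; both function names are hypothetical. A read-only function can accept string literals directly. A function that writes needs a writable array, such as a local `char` array initialized from a literal, which copies the characters into it.

```cpp
#include <cctype>
#include <cstddef>

// Only reads the characters, so the parameter is `const char *`:
// string literals and writable arrays can both be passed.
std::size_t countUppercase(const char *text) {
    std::size_t count = 0;
    for (; *text != '\0'; ++text) {
        if (std::isupper(static_cast<unsigned char>(*text))) {
            ++count;
        }
    }
    return count;
}

// Writes to the characters, so the parameter is a plain `char *`.
// Passing a string literal here is not allowed; the compiler should complain.
void toUpperInPlace(char *text) {
    for (; *text != '\0'; ++text) {
        *text = static_cast<char>(std::toupper(static_cast<unsigned char>(*text)));
    }
}
```

For example, `char buffer[] = "Hello"; toUpperInPlace(buffer);` works, because `buffer` is a writable copy of the literal.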
For example, the standard function `strcpy` has the signature `char *strcpy(char *dest, const char *src)`. This indicates that the function will only read from `src`, but it will write to `dest`. The compiler checks this: for example, it will complain if you pass a read-only string literal as the destination of a copy.
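Here is a minimal sketch of calling `strcpy` safely in standard C++; the buffer and function names are hypothetical. The destination must be a writable array large enough for all characters plus the `'\0'`, which `strcpy` also copies. The size check is this sketch's own addition, because `strcpy` itself performs no bounds checking.

```cpp
#include <cstddef>
#include <cstring>

// Writable destination: room for up to 15 characters plus the terminator.
char destination[16];

// Copies `source` into `destination` only if it fits, terminator included.
// Returns false instead of overflowing the buffer.
bool copyGreeting(const char *source) {
    if (std::strlen(source) + 1 > sizeof(destination)) {
        return false;
    }
    std::strcpy(destination, source);  // reads from source, writes to destination
    return true;
}
```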
(The `print` function above is just an example: you can pass `string` to `Serial.print()` directly, and it will know how to print it.)