Confused about 'sizeof' function

Unfortunately, I couldn't find an answer to my sizeof question, so here is yet another topic about sizeof.

"The sizeof operator returns the number of bytes in a variable type, or the number of bytes occupied by an array."

With the following code I expected the output 4, 3 and 7, which I believe are the number of bytes in the corresponding strings:

 Serial.print("sizeof data is "); Serial.println(sizeof("data"));
 Serial.print("sizeof lol is  "); Serial.println(sizeof("lol"));
 Serial.print("sizeof test123 is "); Serial.println(sizeof("test123"));

However the results are:

sizeof data is 5
sizeof lol is  4
sizeof test123 is 8

Could someone explain why the results are actually 1 higher than what I expected?

Thanks in advance!

See: https://en.wikipedia.org/wiki/Null-terminated_string

When the text "lol" is in memory, how would the software know how long it is?
Some computer languages store the length in front of it, so you get 3 lol in memory.
The C language puts a zero terminator at the end, so you get lol 0x00.
It is not the ASCII character for zero ('0'), but really zero: all bits are zero. It can be written as '\0'.
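To make the terminator visible, here is a minimal sketch in plain C++ (not an Arduino sketch, so no Serial calls). The helper names storageBytes, visibleChars and checkThreadExamples are made up for illustration; the idea is only that sizeof counts the terminator while strlen does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Bytes occupied by a string literal, terminator included.
// N is deduced from the literal's array type, which is what sizeof reports.
template <std::size_t N>
constexpr std::size_t storageBytes(const char (&)[N]) {
    return N;
}

// Visible characters only: strlen counts up to, but not including, the '\0'.
inline std::size_t visibleChars(const char* text) {
    return std::strlen(text);
}

// Hypothetical helper: re-checks the numbers reported in this thread.
inline void checkThreadExamples() {
    assert(storageBytes("data") == 5);
    assert(storageBytes("lol") == 4);
    assert(storageBytes("test123") == 8);
    assert(visibleChars("lol") == 3);
    assert("lol"[3] == '\0');  // the extra byte is the zero terminator
}
```

On an Arduino the same functions could be called from setup() and the results printed with Serial.println.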

Because in C/C++ char arrays include an additional terminator byte.

Update: to be correct, I should have said C-strings, not char arrays.

not char arrays, c-strings :slight_smile: (string literal)

char anArray[10]; // will have 10 bytes not 11
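A small sketch of that distinction, in plain C++ rather than Arduino code. The function names are hypothetical; the point is that a declared size is taken as-is, while an array initialized from a string literal gets its size from the literal, terminator included.

```cpp
#include <cassert>
#include <cstddef>

// A char array with an explicit size: sizeof reports exactly that size.
inline std::size_t plainArrayBytes() {
    char anArray[10];          // exactly 10 bytes; nothing is appended
    return sizeof(anArray);
}

// A char array sized by its initializer: the literal's '\0' is copied too.
inline std::size_t literalInitBytes() {
    char text[] = "data";      // 4 letters + '\0' = 5 bytes
    return sizeof(text);
}

// Hypothetical helper that checks both cases.
inline void checkArraySizes() {
    assert(plainArrayBytes() == 10);
    assert(literalInitBytes() == 5);
}
```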

indeed
Inaccurate wording is my problem

and we are a tough crowd :slight_smile:

Thanks @Koepel and @b707.

It seems I was confused because I thought "" and '' were the same. Now I know that "" creates a C-string and '' creates a single character.

So sizeof('data') is actually 4, which is what I expected.

It will be one, which I would have expected.

Serial.print("sizeof data is "); Serial.println(sizeof('data'));

result:

sizeof data is 4

the correct term for "data" in the code presented would be a string literal, as @J-M-L explained.

If you know that it creates a SINGLE char, why did you decide to put four characters inside ''?

Then I stand corrected, but the size of such strange literals is rather pointless.

'' is a character constant, so if you are not using Unicode it is nearly always stored in a byte.

I would avoid the use of multicharacter literals because, "The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined."

Good question :stuck_out_tongue:

So '' is only meant to be used with single chars? Because it works with multiple chars, as shown.

@hubanl
'data' is an ambiguous construct, do not use it

No, it is a character literal.

it's more complicated than that in C++ until we get to C++23 :slight_smile:

You would get a warning about a multi-character character constant

You get 4 not because data has 4 letters, but because you actually got an int and were probably running on a 32-bit architecture. It would have been 2 on an UNO.

Read "Character constant" on cppreference.com.

Notes

Multicharacter constants were inherited by C from the B programming language. Although not specified by the C standard, most compilers (MSVC is a notable exception) implement multicharacter constants as specified in B: the values of each char in the constant initialize successive bytes of the resulting integer, in big-endian zero-padded right-adjusted order, e.g. the value of '\1' is 0x00000001 and the value of '\1\2\3\4' is 0x01020304.

In C++, encodable ordinary character literals have type char, rather than int.
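The type difference can be checked at compile time. This is a rough sketch in plain C++, assuming a compiler that accepts multicharacter literals (GCC and Clang do, usually with a multi-character constant warning); the names kSingleCharBytes, kMultiCharBytes and checkLiteralSizes are invented for the example. It only looks at the size, because the value of 'data' is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// In C++ an ordinary one-character literal has type char.
static_assert(std::is_same<decltype('d'), char>::value, "'d' is a char");
constexpr std::size_t kSingleCharBytes = sizeof('d');   // sizeof(char), so 1

// A multicharacter literal has type int, so its size is sizeof(int),
// no matter how many characters are written between the quotes.
static_assert(std::is_same<decltype('data'), int>::value, "'data' is an int");
constexpr std::size_t kMultiCharBytes = sizeof('data');

// Hypothetical helper: the result follows the platform's int size
// (4 on typical 32-bit boards, 2 on an AVR-based UNO).
inline void checkLiteralSizes() {
    assert(kSingleCharBytes == 1);
    assert(kMultiCharBytes == sizeof(int));
}
```

So the 4 printed earlier matches the letter count of 'data' only by coincidence; sizeof('ab') gives the same result on the same board.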

The theoretical problem with using it as shown is that it may not port to another compiler with the same behavior.

Thanks everyone for the answers! It's all clear to me now :smiley:

I've had that thought many times ... always mistaken ... lol

I've been doing this since the 70's ... B was my first language...

Have fun...

:smiley_cat:
