Question regarding struct, String and malloc

I've been teaching myself about pointers, structs, and malloc, and I have an example sketch that highlights a mystery. In my example, I have a simple struct with a String, a bool, and an int. I set values in a struct that is simply created as a global, and the 'name.variable' syntax works as expected. I also created a pointer to a struct, and the 'pointer->variable' syntax works as expected for the String, bool, and int. Again, all good. Lastly, I used malloc to get a chunk of memory for the same struct. I can successfully assign and retrieve the bool and the int. My problem is that I can assign the String but cannot retrieve it as a legible String.
I know that malloc is not good practice on microprocessors; I came across this just as an experiment. I've been testing this with an ESP32 on the 2.0 RC3 IDE.
Here is my example sketch.



struct test1 {
  bool   found{true};
  int    value1{10};
  String name = "Howdy Doody"; 
} foo,  bar, *barPtr, *cat;



void setup() {
  Serial.begin(115200);
  cat = (struct test1*)malloc(sizeof(struct test1) );
  Serial.print("cat address-->");
  Serial.println((int)cat/4);         // Malloc seems to give a working address
  cat->name = "dog";                  // cat->name is assigned 'dog' (or so I thought)
  Serial.print("cat->name address -->");
  Serial.println((int)&cat->name/4);  //Address of cat->name
  //
  if (cat->name == "dog") Serial.println("cat->name is equal to dog");   //This test should be true, just assigned above
  cat->found = false;                 //This assignment works as expected
  cat->value1 = 5;                    //This assignment works as expected
  foo.name = "Maxwell";               //This assignment works as expected
  //
  barPtr = &bar;                      //Test of struct pointer without Malloc involved
  barPtr->name = "Winnie";
  //
  Serial.print("cat->name -->");
  // This is the offending statement
  Serial.println(cat->name);          //Why doesn't this print 'dog' ?
  //
  Serial.print("cat->name address -->");
  Serial.println((int)&cat->name/4);  //making sure still pointed at where 'dog' was stored
  Serial.print("cat->value1 -->");    
  Serial.println( cat->value1);       // int value prints correctly
  Serial.print("cat->found -->");
  Serial.println( cat->found);        // boolean value prints correctly
  // values from a global struct 'test1' work correctly
  Serial.print("foo.name -->");
  Serial.println( foo.name);         // This correctly prints 'Maxwell'
  Serial.print("foo.found -->");
  Serial.println( foo.found);
  Serial.print("foo.value1 -->");
  Serial.println( foo.value1);
  // test of pointer to struct bar - no malloc involved
  Serial.print("barPtr->name -->");    
  Serial.println( barPtr->name);      // Other pointer to struct name works as expected
  Serial.print("barPtr->value1 -->");
  Serial.println( barPtr->value1);
  free(cat);
}

void loop() {
  // put your main code here, to run repeatedly:

}

I hope this is readable. I've tried to eliminate every issue other than malloc with a String in a struct. I've been at this for three hours, and it remains a mystery why the struct allocated with malloc seemingly won't work. Any thoughts from minds greater than mine?

Maybe you're confusing microprocessors with languages.

Does the fact that a String does not have a fixed size have anything to do with it?

The problem with creating an object instance with malloc() is that the object constructor is never called so the object's memory contains whatever garbage was sitting in memory. The proper way to allocate memory for an object is "new". The proper way to get rid of the object is "delete".

Structs count as objects.

I think the syntax you want is:
cat = new struct test1;
and
delete cat;

Unless you're writing low-level memory allocators or containers and know exactly what you're doing, you should never use malloc and free in C++.

The correct way, as mentioned by johnwasser, is to use new/delete. But I'll go on to say that you should avoid using naked new/delete.

Instead, your first choice for dynamic allocation should be to use standard library containers like std::vector (or more specific containers depending on your application). If you just need a pointer to a single heap-allocated object, use smart pointers like std::unique_ptr.

#include <memory>

struct test1 {
  bool   found{true};
  int    value1{10};
  String name = "Howdy Doody";
};

void setup() {
  Serial.begin(115200);
  auto cat = std::unique_ptr<test1>(new test1);
  // Use make_unique if you have C++14:
  // auto cat = std::make_unique<test1>();
  cat->name = "Maxwell";
  Serial.println(cat->found);   // prints "1" (a bool is printed as 1 or 0)
  Serial.println(cat->value1);  // "10"
  Serial.println(cat->name);    // "Maxwell"
} // memory for the cat is automatically released here
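The post above names std::vector as the first choice but only shows std::unique_ptr. A minimal sketch of the container route follows, under two assumptions: it uses std::string in place of Arduino's String so it compiles with a desktop C++ compiler, and makeCats is a hypothetical helper name. The point is that the vector both allocates the memory and runs each element's constructor, so every element's string member really exists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: std::string stands in for Arduino's String;
// the object-lifetime rules are the same.
struct test1 {
  bool        found{true};
  int         value1{10};
  std::string name = "Howdy Doody";
};

// Hypothetical helper: returns n heap-allocated test1 values.
// std::vector allocates the storage AND constructs each element,
// so every element starts with the default member values.
std::vector<test1> makeCats(std::size_t n) {
  std::vector<test1> cats(n);   // n fully constructed test1 objects
  if (n > 0) {
    cats[0].name = "Maxwell";   // safe: the string member really exists
  }
  return cats;                  // storage is released when the vector is destroyed
}
```

On an ESP32 the same pattern should work with String, since that toolchain provides the C++ standard library; treat that as an expectation rather than something verified here.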

Don't repeat struct: the compiler already knows that test1 is the name of a struct, so just use new test1.


PieterP,
Thank you for your thoughtful reply. I tried your suggested code and, of course, it works perfectly. I'm new to C++, having spent my professional career with 370 assembly, Smalltalk, Java, and Python. I was simply writing a sample to test my understanding of pointers, structs, malloc, and casts (and yes, I understand malloc is a bad idea on a microprocessor with limited memory). All the C++ types and the need for casting make me long for the typeless bliss of Python.
The mystery for me was that I could use malloc to allocate memory for a struct and successfully store and retrieve a bool and an int, but not a String. It is an academic question, as I never intended to use malloc in my project. I suppose I'll consider it one of life's little unsolved mysteries. Thanks again.

Good question. I don't know, but a pointer to a struct not using malloc works just fine. A mystery...

Does malloc also invoke new() for the String in the new instance of your struct?
Maybe not...
...could be tested with a homemade class with a print statement in the constructor...
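That test is easy to write. A minimal sketch, assuming a hypothetical Probe class that counts constructor and destructor calls instead of printing them, so the result can be checked without a serial port:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical test class: counts constructor and destructor calls.
struct Probe {
  static int constructed;
  static int destroyed;
  Probe()  { ++constructed; }
  ~Probe() { ++destroyed; }
};
int Probe::constructed = 0;
int Probe::destroyed   = 0;

// malloc() only reserves raw bytes: no constructor runs, so no Probe exists.
// The pointer is never used as a Probe; doing so would be undefined behavior.
void allocateWithMalloc() {
  void *raw = std::malloc(sizeof(Probe));
  std::free(raw);
}

// new reserves memory AND runs the constructor;
// delete runs the destructor and then releases the memory.
void allocateWithNew() {
  Probe *p = new Probe;
  delete p;
}
```

On the ESP32, putting Serial.println() calls in the constructor and destructor instead of the counters should show the same thing: nothing is printed for malloc(), and one line each for new and delete.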

malloc gives you an area of memory, nothing more. You point cat to it, casting it to a struct test1. The int and bool are simple types that just occupy RAM, so you can use them with impunity. String is more complicated: specifically, it contains a pointer to memory that was allocated (with malloc) when it was created.

That creation process never happened, but you assured the compiler (with the cast) that there is a String there, and it believed you. So you have a String object whose internal pointer holds whatever garbage was in that memory. Using it is undefined behavior.

Heap Memory Allocation - ESP32 - ESP-IDF Programming Guide latest documentation (espressif.com)

Not a mystery at all. malloc() does exactly what it says on the tin: it gives you a chunk of raw memory. It just allocates memory; there is nothing in that memory yet.
You don't just want raw memory, you want a value of type test1. There is no such value in the memory returned by malloc(). You have to create that value first before you can use it.


In your code, you get a pointer to raw memory, and then forcibly tell the compiler to ignore all type checking, and force it to believe that an object of type test1 exists at that memory location, even though that's not the case. That happens because of this cast:

  cat = (struct test1*)malloc(sizeof(struct test1) );
  //    ^^^^^^^^^^^^^^^--- this cast is a very bad idea

This is one reason why you shouldn't use C-style casts, and why you should be very careful when explicitly casting pointers in general: by doing so, you could be silencing all kinds of warnings and errors, and your code becomes simply invalid (it invokes undefined behavior).


Let's now look at the correct implementation using malloc (which you shouldn't use in real code; it's just for illustration purposes):

First, you allocate memory:

void *raw_mem = malloc(sizeof(test1));

raw_mem now just points to a chunk of memory that is large enough to potentially store a value of type test1. There is no such value yet, so you cannot access it.

Next, you have to create a value of type test1 so that you can actually use it.

test1 *cat = new(raw_mem) test1;

This is done using the new keyword (this form is called placement new and needs #include <new>). You tell new what memory to use to store the value.
new will create the value by invoking the constructor of the test1 class and initialize all its members (e.g. also calling the String constructor).

Now that the value is created, you can use it:

cat->name = "Maxwell";

When you're done with it, you have to destroy the value by calling the destructor:

cat->~test1();

The value is now gone, you can no longer use it, but the memory is still there. So the last step is to give the memory back to the system:

free(raw_mem);

The main takeaway of this example is that memory allocation and the creation/destruction of a value in that memory are completely separate things.
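Putting the four steps together, a self-contained sketch might look like the following. It assumes std::string instead of Arduino's String so it builds with any C++ compiler, and mallocWalkthrough is a hypothetical name. Note the #include <new>, which declares placement new.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>      // declares placement new
#include <string>

// Assumption: std::string stands in for Arduino's String.
struct test1 {
  bool        found{true};
  int         value1{10};
  std::string name = "Howdy Doody";
};

// Runs the separate steps from the walkthrough and returns the stored
// name so the caller can check it. Illustration only.
std::string mallocWalkthrough() {
  // 1. Allocate raw memory: no test1 object exists here yet.
  void *raw_mem = std::malloc(sizeof(test1));
  if (raw_mem == nullptr) {
    return "";                      // allocation failed
  }

  // 2. Create the object in that memory. Placement new runs the
  //    constructor, including the std::string member's constructor.
  test1 *cat = new (raw_mem) test1;

  // 3. Use the object.
  cat->name = "Maxwell";
  std::string result = cat->name;   // copy out before destroying the object

  // 4a. Destroy the object (runs ~test1, which frees the string's buffer).
  cat->~test1();
  // 4b. Give the raw memory back.
  std::free(raw_mem);

  return result;
}
```

Skipping step 2 is exactly the bug in the original sketch; skipping step 4a would leak the string's internal buffer even though free() is called.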

As you can see, manually calling malloc() is a huge pain, which is why you shouldn't use it in C++. Instead, you use new and delete:

  • new will allocate memory and create a value
  • delete will destroy the value, and deallocate the memory

The example above is then reduced to:

// Allocate memory and create value:
test1 *cat = new test1;
// Use value
cat->name = "Maxwell";
// Destroy value and deallocate memory
delete cat;

While this still requires manual memory management, it's much cleaner than the malloc() approach above.

Problem might be that you should do:
String name = "dog";
This will reserve memory to hold the String.
But then you somehow need to make this available to your struct.
I foresee that the String (reference) will be on the stack and might go out of scope...
Altogether, enough reasons why you should not use malloc here...

Thanks. Your answer makes perfect sense.

While it is true that so-called implicit-lifetime types (e.g. simple structs and fundamental types) can have their lifetime begin implicitly when using malloc, I don't think this applies when they are part of a non-implicit-lifetime type.

Accessing the int and bool members may seem to work in this case, but it's probably undefined behavior.

Either way, let's agree that you shouldn't use malloc like this. Take the guesswork out of it by just using new, which always does the correct thing, regardless of whether something is an implicit-lifetime type or not.

Again, Thanks. I appreciate the explanation. Light is beginning to dawn...

As a somewhat related observation, in the "but I thought I was only writing C code" department:
This syntax, where the initialization values are specified in the type definition:

struct test1 {
  bool   found{true};
  int    value1{10};
  String name = "Howdy Doody"; 
} foo,  bar, *barPtr, *cat;

would not have been permitted in normal C, not even in an "older" version like:

struct test1 {
  int   found = 1;
  int    value1 = 10;
  char name[20] = "Howdy Doody"; 
} foo;

This ability to lump structure initialization with the type, instead of just with the variable declarations, is a really important change that happened in C++ (IMHO). The old C paradigm was just really painful whenever you had complex data structures with a lot of default values that you were content to keep.
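To make the contrast concrete, here is a small sketch compiled as C++. The first struct carries its defaults in the type, as in the snippet above. test1_c and TEST1_C_DEFAULT are hypothetical names illustrating one common C workaround: keep a separately initialized "default" instance and copy it wherever defaults are wanted.

```cpp
#include <cassert>
#include <cstring>

// C++ style: the defaults live in the type, so every instance gets them.
struct test1 {
  int  found = 1;
  int  value1 = 10;
  char name[20] = "Howdy Doody";
};

// C style: the type itself carries no defaults...
struct test1_c {
  int  found;
  int  value1;
  char name[20];
};

// ...so the values are spelled out in an initializer and
// copied from this instance whenever a "default" object is needed.
static const struct test1_c TEST1_C_DEFAULT = {1, 10, "Howdy Doody"};
```

With the C++ version, a plain `test1 a;` is enough; with the C version, forgetting to copy TEST1_C_DEFAULT leaves a local variable's members uninitialized.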

Thank you for the reply. As I noted in a previous response, my programming background is 10 years with 370 Assembly (old guy), followed by Smalltalk, Java, and Python. Personally, I find C++ is like a dog's breakfast: a little bit of everything, and confounding. I may yet try switching to MicroPython. I'm learning C++ to complete my overly ambitious Arduino project.
Again, thanks for your POV. My understanding of types, casts, and initialization has improved.

In Arduino UNO programming and interfacing with peripheral devices, do you see any good use for the versatile features of C++? Almost all Arduino-related interfacing jobs could be done with standard C and the sensor-specific library functions.

When you can drive an LED by directly calling the digitalWrite() function, why should you feel it necessary to declare a class to drive the LED? Someone said that C++ should be brought in when the code exceeds about 50,000 lines.

That does not mean that I am discouraging you from learning C++. Keep learning C++, and at the same time find a suitable Arduino project that really needs the object-oriented features of C++.

Absolutely ANYTHING can be written in standard C, including OOP concepts and constructs. But OOP makes the code more modular, more re-usable, and more maintainable, even for relatively small applications. A horse will get you where you need to go, but that does not mean there's no reason to own a car. If you don't see the utility and value of OOP and c++, then you really don't understand their basic concepts, and why OOP dominates the software world. The software we all use and depend on every day simply could not exist without it.


50,000 lines is a lot of code!
Use C++ when appropriate, not because your code has 50,000+ lines.
First learn to use functions... then OOP.
