[Solved] Best practices for debugging memory corruption and free() woes?

Hello all! I've been a long-time silent admirer of the sheer number of libraries and the knowledge out there, and I must humbly break that silence with a request for debugging tips!

I've read the musings of 'dark side' XD microcontroller programming with malloc() and free(), and all their pitfalls such as fragmentation, out of memory conditions and rumours of print() and println() shenanigans. The issue I've encountered isn't really corruption, but free() doing some odd things like magically freeing 100-400 bytes of data from a malloc'ed pointer of 2 bytes.

I've been working on a program that talks to a serial device via Serial1 and to the console on Serial. It uses an event queue system which executes a task when it is due (checking for a heartbeat, sending a timed command to the device, requesting the state of a control, etc).

An event's lifecycle follows this process:

1: call eventQueue->addEvent(time, function pointer, a small byte* array)
We've now saved the time, a function pointer and byte pointer.

2: call eventQueue->checkExpire();
1: When an event is due to be run, we call the function pointer, and hand it the packet of data
2: The function may re-add itself with a new time and a new packet of data
3: We free() the old packet of data and destroy the event.
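
The lifecycle above could be sketched roughly as follows. The original queue code was not posted, so every name here (Event, EventQueue, heartbeat, fakeNow, the 8-slot limit) is an assumption made for illustration, and fakeNow stands in for millis(). The point of the sketch is the ownership rule: the queue owns each packet and frees it exactly once, after the callback returns.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical types and names; not the poster's actual code.
typedef void (*EventFn)(uint8_t *pkt);

struct Event {
    uint32_t due;   // time at which the event should fire
    EventFn  fn;    // callback to run
    uint8_t *pkt;   // malloc'ed packet, owned by the queue
    bool     used;  // is this slot occupied?
};

const int MAX_EVENTS = 8;

class EventQueue {
public:
    EventQueue() {
        for (int i = 0; i < MAX_EVENTS; ++i) slots[i].used = false;
    }

    // Takes ownership of pkt. Returns false if the queue is full.
    bool addEvent(uint32_t due, EventFn fn, uint8_t *pkt) {
        for (int i = 0; i < MAX_EVENTS; ++i) {
            if (!slots[i].used) {
                slots[i].due = due;
                slots[i].fn = fn;
                slots[i].pkt = pkt;
                slots[i].used = true;
                return true;
            }
        }
        return false;
    }

    // Runs every event due at `now` and returns how many ran.
    int checkExpire(uint32_t now) {
        int ran = 0;
        for (int i = 0; i < MAX_EVENTS; ++i) {
            if (slots[i].used && slots[i].due <= now) {
                Event e = slots[i];     // copy first: the callback may re-add itself
                slots[i].used = false;  // release the slot before calling out
                e.fn(e.pkt);
                free(e.pkt);            // the old packet is freed once, here only
                ++ran;
            }
        }
        return ran;
    }

private:
    Event slots[MAX_EVENTS];
};

EventQueue eq;
uint32_t fakeNow = 0;  // stand-in for millis()
int heartbeats = 0;

// Example callback: counts a heartbeat, then re-adds itself 100 ticks later
// with a freshly allocated 2-byte packet.
void heartbeat(uint8_t *pkt) {
    ++heartbeats;
    uint8_t *next = (uint8_t *)malloc(2);
    if (next == NULL) return;  // out of memory: do not re-arm
    next[0] = pkt[0];
    next[1] = (uint8_t)(pkt[1] + 1);
    if (!eq.addEvent(fakeNow + 100, heartbeat, next)) free(next);
}
```

Because the slot is released before the callback runs, a re-added event lands in a slot at or before the current index and is not executed twice in the same pass.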

There are two events in our queue, each with a 2 byte data packet, firing rapidly at 10-100 times a second, and all seems to go well. The little byte arrays have pointers that cycle through 0xf70, 0xf74, 0xf78 and then back to 0xf70, so fragmentation doesn't seem to be a worry. (if I remember right, each pointer created with malloc() has a 2 byte header array to help free() know how to handle it)

Our free memory hovers at 4694 bytes, give or take 4 bytes depending on when we measured it. Here is how the queue normally chugs along:

eq->checkExpire:  Calling function 0x210C with data pkt 0xF70
 eq->addEvent: func 0x210C pkt 0xF74
 pop: free() ptr: 0xF70 mem before: 4694 mem after: 4698

eq->checkExpire:  Calling function 0x210C with data pkt 0xF74
 eq->addEvent: func 0x210C pkt 0xF78
 pop: free() ptr: 0xF74 mem before: 4694 mem after: 4698

eq->checkExpire:  Calling function 0x210C with data pkt 0xF78
 eq->addEvent: func 0x210C pkt 0xF70
 pop: free() ptr: 0xF78 mem before: 4694 mem after: 4698

.. and so on.

However, here's the part that has taken a day of debugging to narrow down! When a "seemingly-unrelated" piece of logic alters a variable in an independent, fixed data structure, things take a sour turn very quickly. As you can see, our event system measures the free memory immediately before and immediately after freeing the data packet. We expect to regain 4 bytes, but we suddenly start seeing huge chunks of 200-400 bytes of 'freed' memory from a 2-byte allocation.

readSensors: Processed input from Sensor 5

eq->checkExpire:  Calling function 0x210C with data pkt 0xF70
 eq->addEvent: func 0x210C pkt 0xF74
 pop: free() ptr: 0xF70 mem before: 4694b mem after: 5160

checkMemory: ***HALT*** MEMORY CORRUPTION: More free memory than after setup() finished:
                       5160b avail, setup(): 4680b

Before I coded the checkMemory function to watch out for this, these events would run on and on, and every free request would blow up another 100-400 bytes of memory, until we showed ridiculous free memory sizes like 16kb (on a Mega, with a program consuming 42kb of its flash memory!). After that we'd crash or lock up when malloc() started returning bogus pointers and generally throwing hissy fits.

So my humble question is: what debugging tips could you recommend for tracking down the source of free() misbehaving so badly, as well as general memory corruption? (I didn't include any code here, because this issue has defeated my somewhat-passable debug-fu, and I need to improve! :))

P.S.: The checkMemory function permanently allocates two 512-byte blocks of memory, one at the start of setup() and one at the end. Every byte is initialized to 0x1 and the bytes are summed once every few main loops in the program, so each block's sum must always equal its number of bytes. Annoyingly, there has been NO observed corruption of these bytes! Just free() being very rebellious.
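
A guard block of this kind might look like the sketch below. The names (GUARD_SIZE, makeGuard, guardSumOk, guardBytesOk) are hypothetical, since the original checkMemory code was not posted. It also shows one limitation worth knowing: a sum check cannot see two changes that cancel each other out, whereas a per-byte comparison can. Either way, guard blocks only detect stray writes into their own bytes; they say nothing about the allocator's own bookkeeping, which may be why they stayed clean here.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

const size_t GUARD_SIZE = 512;  // assumed size, matching the post

// Allocate a guard block and fill every byte with 0x01.
// Returns NULL if malloc() fails.
uint8_t *makeGuard() {
    uint8_t *g = (uint8_t *)malloc(GUARD_SIZE);
    if (g != NULL) memset(g, 0x01, GUARD_SIZE);
    return g;
}

// Sum check as described in the post: the total must equal the byte count.
bool guardSumOk(const uint8_t *g) {
    if (g == NULL) return false;
    uint32_t sum = 0;
    for (size_t i = 0; i < GUARD_SIZE; ++i) sum += g[i];
    return sum == GUARD_SIZE;
}

// Stricter variant: every single byte must still be 0x01.
bool guardBytesOk(const uint8_t *g) {
    if (g == NULL) return false;
    for (size_t i = 0; i < GUARD_SIZE; ++i) {
        if (g[i] != 0x01) return false;
    }
    return true;
}
```

If two bytes are overwritten with 0 and 2, the sum is unchanged and guardSumOk() still passes, while guardBytesOk() reports the damage.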

Specs:
Arduino Mega 2560, 42kb used flash memory.
Serial0 via USB, Serial1 via pins 18/19
Copious amounts of print(), println() and __FlashStringHelper
Arduino IDE 1.0.3

There's a bug in free. Read about it here: Google Code Archive - Long-term storage for Google Code Project Hosting.

I do wonder though whether the seemingly unrelated piece of code isn't quite as unrelated as it appears.

Best advice is never to call free() in embedded systems running on microcontrollers, and either never call malloc() at all, or call it only in the initialization phase. Aside from any bugs there may be in malloc/free, using dynamic memory can result in memory fragmentation, which eventually results in running out of RAM.

Thanks for the responses! :slight_smile:

wildbill:
There's a bug in free. Read about it here: Google Code Archive - Long-term storage for Google Code Project Hosting.

I do wonder though whether the seemingly unrelated piece of code isn't quite as unrelated as it appears.

I was hoping that particular issue wasn't applicable to me... but it looks like it was the case!

You would be quite right to wonder about the 'unrelated code' :stuck_out_tongue: I checked over the code another 10 times, scratched my head a similar number of times, then applied the patch to the Arduino core for malloc/free from here: malloc(), realloc().. the dark side powerful it is - #60 by system - Programming Questions - Arduino Forum

So far, I am no longer able to reproduce the free() misbehavior with the same branch of logic and combination of print statements as before. Previously I had tried IDE 1.0.1 and 1.0.3, with the same problems in both.

Though I'm now curious why an IDE at a production level of release has an almost year-old core bug, which has likely contributed to the general advice to avoid sometimes really useful general memory management abilities for an entire prototyping platform :fearful:

Though I'm now curious why an IDE at a production level of release has an almost year-old core bug

As you can see from the link I posted, you're certainly not the only one who feels that way!

Probably due to time and effort priorities, and because it would mean fixing something you should not be using anyway.

I'd rather other things were addressed including getting the Tutorials that use the String Class out of the Learning section (they teach BAD HABITS) and replace with C string array examples.

dc42:
Best advice is never to call free() in embedded systems running on microcontrollers, and either never call malloc() at all, or call it only in the initialization phase. Aside from any bugs there may be in malloc/free, using dynamic memory can result in memory fragmentation, which eventually results in running out of RAM.

+1 with extreme agreement!

Writing MCU code as if it were PC code ignores the hardware.
It's possible for small things, but it is very bad practice that hurts the user in the not-so-long run.

An interesting take on uC memory management :stuck_out_tongue:

I do think it has its pros and cons, but I feel the benefits can outweigh some of the risks, provided malloc/free works as advertised, and you take care to avoid thrashing the free list ;p

I'm probably looking at this from a wholly different angle, but thrifty malloc/free memory management on a microcontroller makes even more sense to me on these types of platforms, especially if your codebase on the platform has many logic branches, states and data structures. It feels cleaner to allocate only the memory those particular branches of logic need at any particular state in the program.

For example, in a state-based methodology, your code may have a setup state where it's verifying all the sensors, calibrating base levels, monitoring for interference and performing serial init routines on all connected devices. That state can need data and structures that will no longer be relevant for the rest of the program's lifecycle, so it would make sense to free all that up, since repurposing/recasting the same batch of memory from a structure to a string array, for example, could make things less flexible for you. Concurrency also becomes a dangerous and complicated minefield if all your functions are vying for the same fixed blocks of memory. You'd need to be careful about functions calling functions, and keep very close track of which fixed blocks each of those functions is going to use.

Chances are there will be many situations where you're storing a cardboard box in a 90% empty part of the room, and since you aren't allowed to use room dividers to store other things in that same room, things just got more complicated for you: you need some homegrown code to manually manage all your space. Or (if the core functions were implemented right, and were efficient in re-integrating smaller unused blocks back into superblocks), you can use decades-old, time-tested methods and functions to manage that for you ;p

2-8kb of memory feels like a precious resource to be used on a strictly as-needed basis :slight_smile:

For a summary of the reasons why dynamic memory management is almost never used in critical embedded systems, see Escher Technologies Articles on Formal Verification.

Your example of data structures used only during initialization would in most cases be easily solved by declaring the temporary data structures locally in the setup function (i.e. allocating them on the stack).
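
As a rough illustration of that suggestion, the sketch below keeps its scratch data in a local array and stores only the final result globally. The names (readSensor, calibrateSensors, baseline) and the sizes are invented for the example, and readSensor is a stand-in for a real read such as analogRead().

```cpp
#include <assert.h>
#include <stdint.h>

const uint8_t NUM_CHANNELS = 4;
const uint8_t NUM_SAMPLES = 16;

// Hypothetical stand-in for a real sensor read.
int16_t readSensor(uint8_t channel) {
    return (int16_t)(100 + channel);
}

// The only data that has to outlive the setup phase.
int16_t baseline[NUM_CHANNELS];

void calibrateSensors() {
    // 4 * 16 * 2 = 128 bytes of scratch space on the stack.
    // It is released automatically when this function returns.
    int16_t samples[NUM_CHANNELS][NUM_SAMPLES];

    for (uint8_t ch = 0; ch < NUM_CHANNELS; ++ch) {
        for (uint8_t s = 0; s < NUM_SAMPLES; ++s) {
            samples[ch][s] = readSensor(ch);
        }
    }

    // Reduce the scratch data to one average per channel.
    for (uint8_t ch = 0; ch < NUM_CHANNELS; ++ch) {
        int32_t sum = 0;
        for (uint8_t s = 0; s < NUM_SAMPLES; ++s) sum += samples[ch][s];
        baseline[ch] = (int16_t)(sum / NUM_SAMPLES);
    }
}
```

Calling calibrateSensors() from setup() needs no malloc() or free(); the trade-off is that the scratch array must fit in the stack space available at that point.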

Again DC42 hits the point!

You can even allocate a block/buffer and use it differently by different parts of the code that never ever step on each other. Use pointers and structs/class objects.
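
One way to read that advice is a single static scratch area with different views for different phases, for example via a union. This is only a sketch under that assumption; CalibrationData, RxLine, Scratch and averageBaseline are made-up names, and it is safe only if the two phases never overlap in time.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

// View used only during the setup phase (hypothetical layout).
struct CalibrationData {
    int16_t baseline[8];
    uint8_t count;
};

// View used only in the main loop (hypothetical layout).
struct RxLine {
    char    text[32];
    uint8_t len;
};

// Both views share the same statically allocated bytes.
union Scratch {
    CalibrationData cal;
    RxLine          rx;
};

static Scratch scratch;

// Average of the first `count` baseline values, or 0 if there are none.
int16_t averageBaseline(const CalibrationData &c) {
    if (c.count == 0) return 0;
    int32_t sum = 0;
    for (uint8_t i = 0; i < c.count; ++i) sum += c.baseline[i];
    return (int16_t)(sum / c.count);
}
```

The union costs only as much RAM as its largest member, instead of the sum of both structures, and nothing is ever allocated or freed at run time.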

It's constant instantiation and destruction that leads to problems, even in many PC programs.

Hmmm, this is good food for thought, those are some interesting tips and pitfalls you both present. I suppose my approach would begin to encounter unexpected problems like these as the program codebase grows, objects get larger and allocations get more frequent and varied.

Looks like I'll need to brush up my C++ fu to figure out some rudimentary garbage collecting/compacting, smart pointers and cooking up some new/delete methods :astonished:

Thanks for the tips ;p

Since your current system seems to need four two-byte chunks of memory, I'd be inclined to declare a static array of structs, each including a flag to indicate in use/free, and manage their use myself. In this case you could likely afford to make it many times bigger than four entries to deal with edge cases. Full-on garbage collection seems overkill for a microcontroller.
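
A minimal sketch of such a pool, assuming 2-byte packets and a pool of 16 entries (both numbers, and the names Packet, packetAlloc, packetFree and packetsInUse, are assumptions for illustration):

```cpp
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

struct Packet {
    uint8_t data[2];  // the 2-byte payload handed to callbacks
    bool    inUse;    // false = free, true = handed out
};

const int POOL_SIZE = 16;

// Static storage is zero-initialized, so every entry starts out free.
static Packet pool[POOL_SIZE];

// Hand out the first free packet, or NULL if the pool is exhausted.
uint8_t *packetAlloc() {
    for (int i = 0; i < POOL_SIZE; ++i) {
        if (!pool[i].inUse) {
            pool[i].inUse = true;
            return pool[i].data;
        }
    }
    return NULL;
}

// Return a packet to the pool. Pointers that did not come from the pool,
// or that are already free, are rejected instead of corrupting anything.
bool packetFree(uint8_t *p) {
    for (int i = 0; i < POOL_SIZE; ++i) {
        if (pool[i].data == p && pool[i].inUse) {
            pool[i].inUse = false;
            return true;
        }
    }
    return false;
}

// Number of packets currently handed out; handy for leak checks.
int packetsInUse() {
    int n = 0;
    for (int i = 0; i < POOL_SIZE; ++i) {
        if (pool[i].inUse) ++n;
    }
    return n;
}
```

Every block has the same size, so the pool cannot fragment, and a double free or a bad pointer shows up as a false return value rather than as silent damage to a free list.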