Hello all! I've been a long-time silent admirer of the sheer number of libraries and amount of knowledge out there, and I must humbly break that silence with a request for debugging tips!
I've read the musings on the 'dark side' XD of microcontroller programming with malloc() and free(), and all their pitfalls, such as fragmentation, out-of-memory conditions and rumours of print() and println() shenanigans. The issue I've encountered isn't really corruption, but free() doing some odd things, like magically freeing 100-400 bytes of data from a malloc'ed pointer of 2 bytes.
I've been working on a program that talks to a serial device via Serial1 and to the console via Serial. I created an event queue system which executes a task when it is due (checking for a heartbeat, sending a timed command to the device, requesting the state of a control, etc.).
An event's lifecycle follows this process:
1: Call eventQueue->addEvent(time, function pointer, a small byte* array).
   We've now saved the time, a function pointer and a byte pointer.
2: Call eventQueue->checkExpire():
   a: When an event is due to run, we call the function pointer and hand it the packet of data.
   b: The function may re-add itself with a new time and a new packet of data.
   c: We free() the old packet of data and destroy the event.
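This is not my actual code, just a rough sketch of the lifecycle above so the ownership rules are easier to see. Only addEvent() and checkExpire() are names from my program; the fixed-size slot array, MAX_EVENTS, pending(), the callback signature (which receives the current time) and the heartbeat() example are all assumptions made for illustration.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical callback signature: receives the current time and the packet.
typedef void (*EventFn)(unsigned long now, uint8_t* pkt);

struct Event {
  unsigned long due;  // time at which the event should fire
  EventFn fn;         // function to call when due
  uint8_t* pkt;       // malloc'ed packet, owned by the queue once added
  bool used;          // is this slot occupied?
};

const int MAX_EVENTS = 8;  // assumed fixed capacity

class EventQueue {
 public:
  EventQueue() {
    for (int i = 0; i < MAX_EVENTS; i++) {
      slots[i].pkt = NULL;
      slots[i].used = false;
    }
  }

  // Step 1: save the time, function pointer and packet pointer.
  // The queue takes ownership of pkt. Returns false if the queue is full.
  bool addEvent(unsigned long due, EventFn fn, uint8_t* pkt) {
    assert(fn != NULL);
    for (int i = 0; i < MAX_EVENTS; i++) {
      if (slots[i].used) continue;
      slots[i].due = due;
      slots[i].fn = fn;
      slots[i].pkt = pkt;
      slots[i].used = true;
      return true;
    }
    return false;
  }

  // Step 2: run every due event, then free its packet exactly once.
  int checkExpire(unsigned long now) {
    int fired = 0;
    for (int i = 0; i < MAX_EVENTS; i++) {
      if (!slots[i].used || slots[i].due > now) continue;
      slots[i].fn(now, slots[i].pkt);  // the callback may re-add itself
      free(slots[i].pkt);              // release the old packet
      slots[i].pkt = NULL;             // guard against a second free()
      slots[i].used = false;           // destroy the event
      fired++;
    }
    return fired;
  }

  // Number of events still waiting in the queue.
  int pending() const {
    int n = 0;
    for (int i = 0; i < MAX_EVENTS; i++) {
      if (slots[i].used) n++;
    }
    return n;
  }

 private:
  Event slots[MAX_EVENTS];
};

EventQueue eventQueue;
int heartbeatCount = 0;

// Example callback: re-adds itself 100 time units later with a new packet.
void heartbeat(unsigned long now, uint8_t* pkt) {
  heartbeatCount++;
  uint8_t* next = (uint8_t*)malloc(2);
  if (next == NULL) return;
  next[0] = pkt[0];
  next[1] = (uint8_t)(pkt[1] + 1);
  if (!eventQueue.addEvent(now + 100, heartbeat, next)) free(next);
}
```

In this sketch the old slot stays occupied while the callback runs, so a re-added event always gets a different slot and a fresh packet, and each packet is freed in exactly one place.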
There are two events in our queue, each with a 2-byte data packet, firing rapidly at 10-100 times a second, and all seems to go well. The little byte arrays have pointers that cycle through 0xf70, 0xf74, 0xf78 and then back to 0xf70, so fragmentation doesn't seem to be a worry. (If I remember right, each block allocated with malloc() carries a 2-byte header that tells free() how to handle it, which would explain the 4-byte spacing for a 2-byte packet.)
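The header remark could be checked directly. Here is a minimal sketch, assuming the allocator stores a 2-byte little-endian size field immediately before the pointer it returns (as I understand avr-libc's malloc() to do; this is an implementation detail, not a documented API). chunkSize() and headerLooksSane() are hypothetical helper names.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reads the 16-bit little-endian size field assumed to sit in the two
// bytes just before p. Only meaningful for pointers returned by an
// allocator that really uses this layout.
uint16_t chunkSize(const void* p) {
  assert(p != NULL);
  const uint8_t* b = (const uint8_t*)p;
  return (uint16_t)(b[-2] | (b[-1] << 8));
}

// True if the size field still holds the value we expect for this packet.
bool headerLooksSane(const void* p, uint16_t expected) {
  return p != NULL && chunkSize(p) == expected;
}
```

Printing chunkSize(pkt) just before each free() would show whether the size field still reads 2, or whether something has overwritten it with a much larger value.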
Our free memory hovers at 4694 bytes, give or take 4 bytes depending on when we measured it. Here is how the queue normally chugs along:
eq->checkExpire: Calling function 0x210C with data pkt 0xF70
eq->addEvent: func 0x210C pkt 0xF74
pop: free() ptr: 0xF70 mem before: 4694 mem after: 4698
eq->checkExpire: Calling function 0x210C with data pkt 0xF74
eq->addEvent: func 0x210C pkt 0xF78
pop: free() ptr: 0xF74 mem before: 4694 mem after: 4698
eq->checkExpire: Calling function 0x210C with data pkt 0xF78
eq->addEvent: func 0x210C pkt 0xF70
pop: free() ptr: 0xF78 mem before: 4694 mem after: 4698
.. and so on.
However, here's the part that has taken a day of debugging to narrow down! When a "seemingly-unrelated" piece of logic alters a variable in an independent, fixed data structure, things take a sour turn very quickly. As you can see, our event system measures the free memory immediately before and immediately after freeing the data packet. We expect to regain 4 bytes, but we suddenly start seeing huge chunks of 200-400 bytes of 'freed' memory, from a 2-byte pointer.
readSensors: Processed input from Sensor 5
eq->checkExpire: Calling function 0x210C with data pkt 0xF70
eq->addEvent: func 0x210C pkt 0xF74
pop: free() ptr: 0xF70 mem before: 4694b mem after: 5160
checkMemory: ***HALT*** MEMORY CORRUPTION: More free memory than after setup() finished:
5160b avail, setup(): 4680b
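The before/after measurement behind these log lines can be sketched roughly as follows. This is an illustration, not my real code: checkedFree(), FreeReport and FreeMemFn are hypothetical names, and the free-memory routine is passed in as a function pointer so the sketch does not depend on any particular way of measuring the heap.

```cpp
#include <assert.h>
#include <stdlib.h>

// Hypothetical: any function that returns the current free memory in bytes.
typedef long (*FreeMemFn)();

struct FreeReport {
  long before;  // free memory just before free()
  long after;   // free memory just after free()
  bool ok;      // true if the gain was within the expected range
};

// Frees p and checks that free memory grew by no more than
// expectedGain + slack bytes (and did not shrink).
FreeReport checkedFree(void* p, FreeMemFn freeMem, long expectedGain, long slack) {
  assert(freeMem != NULL);
  FreeReport r;
  r.before = freeMem();
  free(p);
  r.after = freeMem();
  long gain = r.after - r.before;
  r.ok = (gain >= 0) && (gain <= expectedGain + slack);
  return r;
}
```

For a 2-byte packet the expected gain would be 4 bytes, so a jump from 4694 to 5160 (466 bytes) would be reported as not ok.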
Before I coded the checkMemory function to watch out for this, these events would run on and on, and every free() request would blow up another 100-400 bytes of memory, until we saw ridiculous free memory sizes like 16 KB (on a Mega with only 8 KB of SRAM, running a program that uses 42 KB of its flash memory!). After that we'd crash or lock up when malloc() started returning bogus pointers and generally throwing hissy fits.
So my humble question is: what debugging tips could you recommend for tracking down the source of free() misbehaving so badly, as well as general memory corruption? (I haven't included my actual project code here, because this issue has defeated my somewhat-passable debug-fu, and I need to improve! :))
P.S.: The checkMemory function permanently allocates two 512-byte blocks of memory, one at the start of setup() and one at the end. They are initialized to 0x1 and the sum of their bytes is totaled up once every few main loops in the program, so each sum must always equal the number of bytes. Annoyingly, there has been NO observed corruption of these bytes! Just free() being very rebellious.
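For reference, the sentinel-block check works roughly like this. It is a minimal sketch rather than my exact code; sentinelCreate(), sentinelIntact() and SENTINEL_LEN are hypothetical names.

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

const size_t SENTINEL_LEN = 512;  // size of each sentinel block in bytes

// Allocates one sentinel block and fills every byte with 0x01.
// Returns NULL if the allocation fails.
uint8_t* sentinelCreate() {
  uint8_t* block = (uint8_t*)malloc(SENTINEL_LEN);
  if (block != NULL) memset(block, 0x01, SENTINEL_LEN);
  return block;
}

// Sums the block; since every byte should be 0x01, the sum must equal
// the number of bytes.
bool sentinelIntact(const uint8_t* block) {
  assert(block != NULL);
  unsigned long sum = 0;
  for (size_t i = 0; i < SENTINEL_LEN; i++) sum += block[i];
  return sum == SENTINEL_LEN;
}
```

A sum can miss corruption that cancels out (one byte dropping to 0 while another rises to 2), so comparing each byte against 0x01 would be stricter. Either way, the check only notices stray writes that land inside the two blocks themselves.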
Specs:
Arduino Mega 2560, 42 KB of flash memory used.
Serial0 via USB, Serial1 via pins 18/19
Copious amounts of print(), println() and __FlashStringHelper
Arduino IDE 1.0.3