Out-of-memory check

I have seen a few problems recently that are, or look to be, situations where the application has simply run out of memory and clobbered itself.

I would like to propose an out-of-memory check that could be incorporated into the wiring logic. What I have in mind is for the timer0 interrupt handler to have a couple of instructions in it to test whether the SP and __brkval have crossed. In this event, I would have control branch to a function that blinks the pin 13 LED in a characteristic way.
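
For what it's worth, here is a rough sketch of the test itself, written as a plain C function so it can be tried off-target. The function name and the heap_start fallback are my own assumptions: avr-libc leaves __brkval at 0 until the first malloc(), so the check falls back to the start of the heap in that case. It treats "SP has reached the top of the heap" as a collision, which is slightly conservative.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the stack pointer has reached or passed the top of
 * the heap. Assumes the AVR layout: heap grows upward, stack grows downward.
 * brkval == 0 means malloc() has not been called yet, so the heap is
 * still empty and its top is heap_start.
 */
static bool stack_hit_heap(uintptr_t sp, uintptr_t brkval, uintptr_t heap_start)
{
    uintptr_t heap_top = (brkval != 0) ? brkval : heap_start;
    return sp <= heap_top;
}

/*
 * On a real AVR target the call inside the handler might look like this
 * (not compiled here; oom_blink() is a hypothetical handler):
 *
 *   extern char *__brkval;
 *   extern char __heap_start;
 *   if (stack_hit_heap(SP, (uintptr_t)__brkval, (uintptr_t)&__heap_start))
 *       oom_blink();
 */
```

The comparison is only a couple of instructions once inlined, so the cost per timer tick should be small.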

The user experience is that an OOM (that persists for more than 1,000 microseconds) would halt the program and generate a diagnostic blink pattern akin to a PC POST code. Maybe this could be generalized into a lightweight monitor that handles other catastrophic bugs in a more graceful way that beginners could deal with -- though I can't think what others would qualify right now.

At some point I will have a crack at implementing this -- I don't think it would be very difficult. My question is: what do folks think about the idea? Has it been tried before and shown to be a waste of time? My worry is that the timer0 interrupt doesn't run all that often and there are a million and one ways a program could bork itself long before the timer had a chance to see and respond to the situation.

I've got a simple monitor that runs in the background on the watchdog timer and I had thought of adding this memory test, but it only runs every 16 ms and, as you say, a lot of shit can hit the fan in that time.


Rob

An effective strategy for detecting this kind of failure is to fill the void with a known value (e.g. Microsoft uses 0xCC; I prefer to use a histogram of the target application). If the first byte below __brkval is not the expected value, the stack and heap have crossed. The problem with using only the next address below __brkval is that the stack sometimes grows in chunks, essentially skipping sections of memory. To compensate, you might require several bytes below __brkval to be the expected value. This, unfortunately, takes memory away from the application.
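
A minimal sketch of the fill-and-check idea, simulated on a plain byte array so it runs anywhere. The names, the guard depth of 4 and the choice of 0xCC are all assumptions for illustration. It also assumes the guard bytes sit in the gap immediately past the top of the heap, on the side the stack grows toward.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE   0xCC  /* example value from the post; any known value works */
#define GUARD_BYTES 4     /* assumed guard depth; costs this many bytes of RAM */

/* Paint the free gap ram[heap_top .. stack_low - 1] with the fill value.
 * heap_top is the first free offset above the heap, stack_low is the
 * lowest offset currently used by the stack. */
static void paint_free_region(uint8_t *ram, size_t heap_top, size_t stack_low)
{
    if (stack_low > heap_top)
        memset(ram + heap_top, FILL_BYTE, stack_low - heap_top);
}

/* True if any of the GUARD_BYTES bytes starting at guard_start no longer
 * holds the fill value, i.e. something has written into the guard zone. */
static bool guard_damaged(const uint8_t *ram, size_t guard_start)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (ram[guard_start + i] != FILL_BYTE)
            return true;
    }
    return false;
}
```

Checking several bytes rather than one is what catches a stack that jumped over the first guard byte. A write that happens to store the fill value itself would still go unnoticed.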

The problem with periodic probes is that they are postmortem. By the time the failure has been detected, the offending code is very likely not running. I suspect a much more effective strategy would be to modify malloc (and its ilk) to check for a failure.
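
To illustrate what checking inside the allocator could look like, here is a toy sketch. It is not avr-libc's malloc; it is a made-up bump allocator over a simulated 64-byte pool, with hypothetical names throughout, and it ignores alignment and free(). The point is only that the failure is reported at the moment of the offending request, while the offending code is still running.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 64                   /* simulated RAM shared by heap and stack */

static uint8_t  pool[POOL_SIZE];
static size_t   heap_top   = 0;          /* plays the role of __brkval */
static size_t   stack_low  = POOL_SIZE;  /* lowest offset used by the simulated stack */
static unsigned oom_events = 0;          /* counts reported failures */

/* Hypothetical failure hook: a real build might halt and blink an LED here. */
static void on_oom(size_t requested)
{
    (void)requested;
    oom_events++;
}

/* Hands out memory only if the heap would stay below the stack;
 * otherwise reports the failure immediately and returns NULL. */
static void *checked_alloc(size_t size)
{
    if (heap_top > stack_low || size > stack_low - heap_top) {
        on_oom(size);
        return NULL;
    }
    void *p = &pool[heap_top];
    heap_top += size;
    return p;
}
```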

Finally, providing default behaviour (e.g. blinking an LED) is a good idea but to be widely accepted there needs to be a way to override the default behaviour. For example, someone controlling a large motor may want to turn the motor off if a failure occurs.
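
One way to make the default overridable is a handler pointer that sketches can replace. This is only a sketch under my own assumptions: all of the names are hypothetical, and the "LED" and "motor" are plain variables standing in for real hardware.

```c
#include <stddef.h>

typedef void (*oom_handler_t)(void);

static int led_blinking = 0;  /* stand-in for the pin 13 blink pattern */
static int motor_on     = 1;  /* stand-in for a motor driver output */

/* Default behaviour: start the diagnostic blink. */
static void default_oom_handler(void)
{
    led_blinking = 1;
}

/* Example user override: shut the motor off instead of blinking. */
static void motor_off_handler(void)
{
    motor_on = 0;
}

static oom_handler_t oom_handler = default_oom_handler;

/* Install a custom handler; passing NULL restores the default. */
static void set_oom_handler(oom_handler_t h)
{
    oom_handler = (h != NULL) ? h : default_oom_handler;
}

/* Called by whatever detects the failure. */
static void report_oom(void)
{
    oom_handler();
}
```

Someone driving a motor would call set_oom_handler(motor_off_handler) once during setup; everyone else gets the blink without doing anything.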

This is kind of reminiscent of some Corewars programs I wrote a long time ago...

dsacmul:
This is kind of reminiscent of some Corewars programs I wrote a long time ago...

LOL - now you're really showing our age...

XD