I've been working on and off on a complex clock project for ages (2+ years) and it mainly works but when testing for long periods of time... it just hangs.
Sometimes it happens after 6+ hours and other times after around 2. It varies a lot because the clock does a lot less sometimes depending on the time of day. It does not tell the time vocally on certain hours. Each hour of the week has a bit field that determines what the clock can and can't do. When testing I just let it do everything regardless.
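A minimal sketch of how a per-hour permission table like this could look, assuming one byte of flags for each of the 168 hours in a week. The flag names, `hour_flags`, `allow_everything` and `allowed()` are all hypothetical, not the names used in the real project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature flags, one bit each. */
enum {
    CAN_SPEAK_TIME = 1u << 0,
    CAN_ANIMATE    = 1u << 1,
    CAN_CHIME      = 1u << 2
};

#define HOURS_PER_WEEK (7 * 24)

/* One flag byte per hour of the week; zero means nothing is allowed. */
static uint8_t hour_flags[HOURS_PER_WEEK];

/* Test override: when true, every feature is allowed at every hour. */
static bool allow_everything = false;

/* Map day (0..6) and hour (0..23) to an index 0..167. */
static int week_hour(int day, int hour)
{
    assert(day >= 0 && day < 7 && hour >= 0 && hour < 24);
    return day * 24 + hour;
}

/* True if the given feature may run in the given hour. */
static bool allowed(int day, int hour, uint8_t flag)
{
    if (allow_everything)
        return true;
    return (hour_flags[week_hour(day, hour)] & flag) != 0;
}
```

One thing a table like this makes easy is replaying, on the bench, the exact hour in which a hang was seen, instead of always testing with everything enabled.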
There is a lot of hardware connected to it so there are plenty of places for the normal conflicts and buggy code and libraries to show their faces.
The hardware includes :-
- MCP4921 for sound
- PAM8403 Audio amp
- SPI Micro SD Card and highly modified/optimised library
- RTC1307
- Serial GPS module (I get the current date and time from GPS at the start, and use location when available later)
- SSD1322 256*64*4 bit OLED with custom library and DMA access
- PIR Infrared detector
- BME280 over I2C for humidity, temperature and pressure
With TEST defined I now show a lot of debug information while running, and at first glance the malloc/free calls look fine (I allocate dynamically because the wipes that clear the old minute are loaded from SD card).
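One way to go beyond a first glance is to count allocations that have not been freed yet and print that number with the other debug output. This is only a sketch: `dbg_malloc`, `dbg_free` and the two counters are hypothetical wrappers around the standard `malloc` and `free`, and it assumes all allocations in the project can be routed through them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static long live_allocs = 0;  /* allocations made but not yet freed */
static long peak_allocs = 0;  /* highest value live_allocs has reached */

/* malloc wrapper that counts successful allocations. */
static void *dbg_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        live_allocs++;
        if (live_allocs > peak_allocs)
            peak_allocs = live_allocs;
    }
    return p;
}

/* free wrapper; the assert trips if there are more frees than allocations. */
static void dbg_free(void *p)
{
    if (p != NULL) {
        assert(live_allocs > 0);
        live_allocs--;
        free(p);
    }
}
```

If `live_allocs` returns to the same value after every minute change, a leak in that path is unlikely. A value that creeps upward over hours would point at a leak. Note that this counts blocks only, so it would not reveal heap fragmentation.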
I may have to go through all the libraries and make sure every while loop that exits on a hardware event also has a reasonable timeout.
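A bounded wait of that kind could be sketched as below. Everything here is an assumption for illustration: `now_ms()` stands in for whatever millisecond tick the platform provides (it is faked with a counter so the sketch runs on a PC), and `ready_after` / `never_ready` stand in for polling a hardware status flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fake millisecond clock: advances by 1 ms on every call.
   On real hardware this would read a timer tick instead. */
static uint32_t fake_ms = 0;
static uint32_t now_ms(void) { return fake_ms++; }

typedef bool (*ready_fn)(void *ctx);

/* Poll ready(ctx) until it returns true or timeout_ms has elapsed.
   Returns true if the event arrived, false on timeout.
   The unsigned subtraction stays correct when the tick counter wraps. */
static bool wait_ready(ready_fn ready, void *ctx, uint32_t timeout_ms)
{
    assert(ready != NULL);
    uint32_t start = now_ms();
    while (!ready(ctx)) {
        if ((uint32_t)(now_ms() - start) >= timeout_ms)
            return false;
    }
    return true;
}

/* Stand-in for a hardware flag that becomes ready after N polls. */
static bool ready_after(void *ctx)
{
    int *polls_left = ctx;
    if (*polls_left == 0)
        return true;
    (*polls_left)--;
    return false;
}

/* Stand-in for a device that never responds. */
static bool never_ready(void *ctx)
{
    (void)ctx;
    return false;
}
```

The caller has to handle the `false` case (log it, skip the operation, or reinitialise the peripheral), otherwise the timeout only moves the hang somewhere else.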
I've also seen the I2C bus fail to initialize a lot, to the point where I will replace the I2C RTC and temperature sensor with SPI ones in the future, which will also enable faster DMA access.
The other problem I have is that I use a LOT of resource files on SD card. If any of these fail to load or are corrupt, it should fail gracefully (i.e. not play a specific sound or show a specific bitmap resource).
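A minimal sketch of that kind of graceful failure, assuming standard C file I/O as a stand-in for the SD card library. `load_resource`, `play_sound_if_available`, the status codes and the 4096-byte buffer are hypothetical; a real version would also validate the file contents (header, length, checksum) in whatever format the resources use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { RES_OK, RES_MISSING, RES_TOO_BIG, RES_READ_ERROR } res_status;

/* Read a whole file into buf (capacity cap). On RES_OK, *len_out holds the
   number of bytes read. On any failure, *len_out is left untouched. */
static res_status load_resource(const char *path, unsigned char *buf,
                                size_t cap, size_t *len_out)
{
    assert(path != NULL && buf != NULL && len_out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return RES_MISSING;

    size_t n = fread(buf, 1, cap, f);
    res_status st = RES_OK;
    if (ferror(f))
        st = RES_READ_ERROR;
    else if (n == cap && fgetc(f) != EOF)
        st = RES_TOO_BIG;          /* more data than the buffer can hold */
    fclose(f);

    if (st == RES_OK)
        *len_out = n;
    return st;
}

/* Returns 1 if the sound was loaded (and would be played), 0 if skipped. */
static int play_sound_if_available(const char *path)
{
    static unsigned char buf[4096];
    size_t len = 0;

    if (load_resource(path, buf, sizeof buf, &len) != RES_OK || len == 0)
        return 0;                  /* skip this sound, keep the clock running */

    /* Hand buf and len to the audio code here. */
    return 1;
}
```

The point of the status code is that the caller decides what "gracefully" means for each resource, and a failed load can be logged with its reason rather than silently ignored.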
After starting up I have 67K of free heap and it will dip to 24K when 3 animations are loaded.
I'm currently looking into setting a watchdog function that I assume will get called even in an infinite loop type scenario. If possible I need to find out if it's stalling in the same function or at a minimum reset the clock in the event of a freeze.
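On many microcontrollers a hardware watchdog resets the chip when it is not fed in time rather than calling a function, though some parts offer an early-warning interrupt; which applies here depends on the MCU. Either way, one common approach to finding the stall is a "breadcrumb": store a checkpoint ID before each risky operation and read it back after the reset. The sketch below assumes the variable can be placed in RAM that the startup code does not clear (toolchain-specific, not shown). The names `crumb_t`, `crumb_mark`, `crumb_take` and the checkpoint IDs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CRUMB_MAGIC 0xC10CC10Cu

/* Hypothetical checkpoint IDs, one per suspect area. */
enum { CP_NONE = 0, CP_GPS, CP_SD_READ, CP_OLED_DMA, CP_I2C_SENSOR, CP_AUDIO };

typedef struct {
    uint32_t magic;       /* CRUMB_MAGIC when checkpoint is valid */
    uint32_t checkpoint;  /* last checkpoint reached before the reset */
} crumb_t;

/* ASSUMPTION: on real hardware this must live in a no-init RAM section so
   it survives a watchdog reset. Here it is an ordinary static variable. */
static crumb_t crumb;

/* Record that execution reached checkpoint cp. */
static void crumb_mark(uint32_t cp)
{
    assert(cp != CP_NONE);  /* CP_NONE is reserved for "no crumb" */
    crumb.checkpoint = cp;
    crumb.magic = CRUMB_MAGIC;
}

/* Call once at boot: returns the last checkpoint recorded before the
   reset, or CP_NONE if there is no valid crumb, then clears it. */
static uint32_t crumb_take(void)
{
    uint32_t cp = (crumb.magic == CRUMB_MAGIC) ? crumb.checkpoint : CP_NONE;
    crumb.magic = 0;
    crumb.checkpoint = CP_NONE;
    return cp;
}
```

In use, the main loop would call `crumb_mark(CP_SD_READ)` before an SD read, `crumb_mark(CP_OLED_DMA)` before a display transfer and so on, and the boot code would show or log `crumb_take()`. If the same ID comes back after several freezes, that narrows down where to look; if it varies, memory corruption or an interrupt problem becomes more likely.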
Another thing: it is running off a USB power supply. For one round of testing I commented out the DAC output that happens in an interrupt, and the freeze still happened.
In normal operation, if the clock detects no one for 20 minutes it shuts down and goes idle, as there is no point showing and telling the time etc. if no one is there.
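The idle rule above could be sketched like this, assuming a free-running 32-bit millisecond tick. `on_motion` and `should_idle` are hypothetical names; the unsigned subtraction is there so the comparison stays correct when the tick counter wraps (roughly every 49.7 days for a 32-bit millisecond counter).

```c
#include <stdbool.h>
#include <stdint.h>

#define IDLE_TIMEOUT_MS (20u * 60u * 1000u)  /* 20 minutes */

static uint32_t last_motion_ms;  /* tick value at the last PIR trigger */

/* Call whenever the PIR detector reports motion. */
static void on_motion(uint32_t now_ms)
{
    last_motion_ms = now_ms;
}

/* True once 20 minutes have passed with no motion. */
static bool should_idle(uint32_t now_ms)
{
    return (uint32_t)(now_ms - last_motion_ms) >= IDLE_TIMEOUT_MS;
}
```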
Any ideas where I should start or look next?
I'm thinking a watchdog is the way to go... more than anything I need to know where it's crashing, and I hope it's not just a random place because of leaks, hardware faults or interrupts during PCM sound output.