
Topic: My Due project just... stops after a few hours

Hoek

I've been working on and off on a complex clock project for ages (2+ years) and it mainly works but when testing for long periods of time... it just hangs.

Sometimes it happens after 6+ hours and other times after around 2. It varies a lot because the clock does a lot less sometimes depending on the time of day. It does not tell the time vocally on certain hours. Each hour of the week has a bit field that determines what the clock can and can't do. When testing I just let it do everything regardless.

There is a lot of hardware connected to it so there are plenty of places for the normal conflicts and buggy code and libraries to show their faces.

The hardware includes :-
MCP4921 for sound
PAM8403 Audio amp
SPI Micro SD Card and highly modified/optimised library
RTC1307
Serial GPS module (I get the current date and time from GPS at the start) and use location when available later
SSD1322 256*64*4 bit OLED with custom library and DMA access
PIR Infrared detector
BME280 over I2C for humidity, temperature and pressure


When __TEST__ is defined I show a lot of debug information while running, and at first glance the malloc/free calls look fine (I use dynamic wipes loaded from the SD card to wipe the old minute).

I may have to go through all the code and libraries and make sure every while loop that exits on a hardware event also has a reasonable timeout.
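A minimal sketch of what such a timeout could look like. The helper name `waitWithTimeout` and its parameters are hypothetical, not taken from any library; the clock is passed in so the same code can use `micros` on the board or a fake clock in a test.

```cpp
#include <cstdint>

// Spin until ready() returns true or timeoutUs microseconds have passed.
// nowUs is the time source; on an Arduino this would be micros (assumption).
// Returns true if the event happened, false if the wait timed out.
template <typename ReadyFn, typename ClockFn>
bool waitWithTimeout(ReadyFn ready, ClockFn nowUs, uint32_t timeoutUs)
{
    const uint32_t start = nowUs();
    while (!ready()) {
        // Unsigned subtraction stays correct when the counter wraps around.
        if (static_cast<uint32_t>(nowUs() - start) >= timeoutUs) {
            return false;
        }
    }
    return true;
}
```

A driver loop that currently spins on a status flag could call this instead and treat a `false` return as "hardware did not answer", skipping that operation rather than hanging.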

The I2C bus has also failed to initialize often enough that I will replace the I2C RTC and temperature sensor with SPI ones in the future, which will also enable faster DMA access.

The other problem I have is that I use a LOT of resource files on the SD card. If any of these fail to load or are corrupt, it should fail gracefully (i.e. just not play that specific sound or show that bitmap resource).

After starting up I have 67K of free heap and it will dip to 24K when 3 animations are loaded.

I'm currently looking into setting a watchdog function that I assume will get called even in an infinite loop type scenario. If possible I need to find out if it's stalling in the same function or at a minimum reset the clock in the event of a freeze.
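A watchdog reset alone would not say where the code stalled, so one option is to leave a breadcrumb at each stage of the main loop. The following is only a sketch: `StallTracker` and the stage names are hypothetical, and it assumes something that still runs during a hang (for example a periodic timer interrupt) calls `stalled()` and reports or stores the last stage.

```cpp
#include <cstdint>

// Hypothetical stage IDs for the main loop; the names are illustrative only.
enum Stage : uint32_t { STAGE_IDLE = 0, STAGE_GPS, STAGE_SD_LOAD, STAGE_RENDER };

struct StallTracker {
    volatile uint32_t stage;        // last stage the main loop entered
    volatile uint32_t enteredAtMs;  // timestamp of that entry

    StallTracker() : stage(STAGE_IDLE), enteredAtMs(0) {}

    // Call at the start of each stage in the main loop.
    void enter(uint32_t s, uint32_t nowMs) { stage = s; enteredAtMs = nowMs; }

    // Meant to be called from outside the main loop (assumption: a timer
    // interrupt), since a stuck loop cannot check itself.
    bool stalled(uint32_t nowMs, uint32_t limitMs) const {
        return static_cast<uint32_t>(nowMs - enteredAtMs) > limitMs;
    }
};
```

If the same stage shows up every time `stalled()` fires, the hang is probably in one function; if it varies, leaks, hardware faults or interrupt interactions become more likely.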

The other thing is that it's running off a USB power supply. For one round of testing I commented out the DAC output that happens in an interrupt, and the freeze still happened.

In normal operation, if the clock detects no one for 20 minutes it shuts down and goes idle, as there's no point showing and telling the time if no one is there.

Any ideas where I should start or look next?

Thinking a watchdog is the way to go... more than anything I need to know where it's crashing, and hope it's not just a random place because of leaks, hardware faults or interrupts during PCM sound output.



ard_newbie

It seems to be a very complex project, resulting in multiple potential sources of issues. I2C may be one of them because I2C on a Sam3x is sensitive to EMI. I recently summarized some possible workarounds to recover from an I2C bus lock-up in this forum.

Hoek

It seems to be a very complex project, resulting in multiple potential sources of issues. I2C may be one of them because I2C on a Sam3x is sensitive to EMI. I recently summarized some possible workarounds to recover from an I2C bus lock-up in this forum.
Yeah, I2C has been unreliable (i.e. at times it just won't initialize... then suddenly works like a charm again). I will eventually get the SPI versions of the real-time clock and BME280 connected up, which will have the benefit of using DMA.


I might have found the bug... the code before the change was not overflow-proof, and if the timing was unlucky around the overflow it had a chance to freeze.

In the main loop most "stuff" happens very fast, and as I update the screen once each loop I try to make it happen roughly every 25 ms (25,000 µs).

Code:

void loop()
{
    uint32_t t0 = micros();

    // code guts here

    render_screen();

    while (micros() - t0 < 25000)
    {
       // finished all our work in this loop early so get to spin.... weeeeeeeee
    }
}
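To see why the subtraction form in the loop above survives the counter wrapping, here is a small comparison. The pre-change code is not shown in the thread, so `elapsedUnsafe` is only a guess at a typical non-overflow-proof pattern; both function names are hypothetical.

```cpp
#include <cstdint>

// Overflow-safe: unsigned subtraction wraps modulo 2^32, so now - t0 is the
// true elapsed time as long as less than one full wrap has passed.
bool elapsedSafe(uint32_t now, uint32_t t0, uint32_t interval)
{
    return static_cast<uint32_t>(now - t0) >= interval;
}

// Not overflow-safe: comparing against an absolute deadline breaks near the
// wrap. If t0 + interval wraps, the wait ends too early; if "now" wraps past
// the deadline before it is checked, the wait can last until the counter
// comes all the way around again.
bool elapsedUnsafe(uint32_t now, uint32_t t0, uint32_t interval)
{
    return now >= static_cast<uint32_t>(t0 + interval);
}
```

For example, with t0 = 0xFFFF0000 and a loop body that runs long enough for the counter to wrap to 5, `elapsedSafe` reports that the 25,000 µs have passed, while `elapsedUnsafe` keeps waiting, which looks like a freeze.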

pjrc

#3
Aug 25, 2018, 01:50 pm Last Edit: Aug 25, 2018, 01:52 pm by pjrc
I2C on a Sam3x is sensitive to EMI.
We had similar problems on Teensy 3.6 (similar 32 bit hardware, about twice the speed of Due).  It seems I2C sometimes gets tricked into believing another I2C master is trying to claim the bus.  It can't tell the difference between another master pulling SDA or SCL low versus noise doing the same.

The worst problem involves slave chips getting "stuck" when they are transmitting a low data bit on SDA and expecting more pulses on SCL, but Teensy has released both lines because it believes another master took control.  Teensy's I2C responds so much faster than the slow speed of many I2C chips, so the other chip never hears the short SCL pulse and gets stuck waiting forever.  Even if Teensy reboots, the I2C chip remains stuck, so only a power cycle restores communication.

Ultimately this patch to the Wire library was needed to work around the problems.  A stuck bus is detected and 9 dummy clock pulses are sent, to get any I2C slave chips unstuck.
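A rough sketch of the dummy-clock idea, not the actual Wire patch. The `Bus` type and its methods (`sclLow`, `sclRelease`, `sdaLow`, `sdaRelease`, `sdaIsHigh`, `delayHalfBit`) are hypothetical; on real hardware they would have to drive the SDA/SCL pins in open-drain fashion. This variant stops clocking as soon as SDA is released and then issues a STOP, which is an assumption about how one might do it.

```cpp
#include <cstdint>

// Bus is any type providing the hypothetical methods named above.
// Returns true if SDA is free (high) at the end, false if still stuck.
template <typename Bus>
bool recoverI2cBus(Bus& bus)
{
    if (bus.sdaIsHigh()) {
        return true;                      // nothing is holding the bus
    }
    for (int pulse = 0; pulse < 9; ++pulse) {
        bus.sclLow();                     // one dummy clock pulse
        bus.delayHalfBit();
        bus.sclRelease();
        bus.delayHalfBit();
        if (bus.sdaIsHigh()) {
            break;                        // the slave has let go of SDA
        }
    }
    if (!bus.sdaIsHigh()) {
        return false;                     // still stuck after 9 pulses
    }
    // STOP condition: SDA rises while SCL is high.
    bus.sclLow();     bus.delayHalfBit();
    bus.sdaLow();     bus.delayHalfBit();
    bus.sclRelease(); bus.delayHalfBit();
    bus.sdaRelease(); bus.delayHalfBit();
    return bus.sdaIsHigh();
}
```

A `false` return would mean the slave is still holding SDA, in which case only a power cycle of that chip is likely to help, as described above.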

Dunno if this is the same as on Due, but maybe this explanation can help?

ard_newbie

#4
Aug 25, 2018, 03:32 pm Last Edit: Aug 25, 2018, 03:36 pm by ard_newbie
Dunno if this is the same as on Due, but maybe this explanation can help?

Yes, this is the same issue and the same fix (9 dummy clock pulses to reset the I2C bus).

https://forum.arduino.cc/index.php?topic=560415.0
