Custom Zero-based board unstable, need help

Hi members,

I built my own board for a GPS tracking device, using the SAMD21G18A from the Zero board.

Basically everything works fine, except stability.
I have multiple boards running, but from time to time a board hangs.
It just stops doing anything. When I restart the controller, it works fine again!
The problem is hard to reproduce. Even when I stress the board it runs stable, and then out of nowhere, a few days later, it stops working. It does not happen on all boards, and some run stable for weeks.

I implemented a small test backdoor: a button interrupt toggles a hardware pin, so I can check whether the code is stuck in an infinite loop.
When the board hangs, the pin no longer toggles! So that is an indicator, but of what?

I know it will be hard for you to give me hints, but I do not know where or how to look for this issue.

I tried to wire everything like the Zero schematic to avoid hardware issues (reset component values, crystal, and so on).
The SAMD fuses are stock (e.g. BOD off).

I already monitor the free memory. That was an issue at the beginning, so I implemented an automatic soft reset on low memory. That works fine, but it does not solve the problem described above. There must be something else.

It is a pretty big project: I use an ESP8266 for WiFi (Serial2), a SIM868 for GSM + GPS driven by AT commands (Serial1), an OLED display via I2C, a button, a vibration motor, an LED, and so on.
The controller sleeps most of the time and is woken by interrupts (such as the RTC).

Maybe a power issue? I use a LiPo cell and various capacitors.
I would expect the controller to reset if the supply voltage dips. Or could that cause a hang?

I would be thankful for any hint or idea. Thanks!

Sounds like it could be crystal trouble to me.

You said you did everything as close as possible to the schematic, but what does your actual layout look like?

I’ve had problems very much like this because I placed my crystal a little too far away and the additional capacitance from my longer traces buggered everything up.
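To put rough numbers on the trace-capacitance point, here is a minimal sketch of the usual load capacitance estimate: the two load capacitors in series, plus stray capacitance from traces and pins. The function name and the example values are hypothetical, not taken from the Zero schematic or from your board, so compare against the load capacitance in your crystal's datasheet.

```cpp
#include <cassert>

// Effective load capacitance seen by the crystal, in pF.
// c1Pf, c2Pf: the two load capacitors to ground.
// strayPf:    estimated trace + pin capacitance (an assumption, usually a few pF).
double effectiveLoadPf(double c1Pf, double c2Pf, double strayPf) {
    assert(c1Pf > 0.0 && c2Pf > 0.0);  // capacitor values must be positive
    // Series combination of C1 and C2, plus the stray capacitance in parallel.
    return (c1Pf * c2Pf) / (c1Pf + c2Pf) + strayPf;
}
```

With hypothetical 22 pF capacitors and 3 pF of stray capacitance this gives 14 pF. Every extra pF from longer traces adds directly to that figure, which is how a layout can push the oscillator away from the load the crystal is specified for.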

By the way, you definitely CAN get your micro to hang if you have a brown out in power delivery and don't have BOD turned on. Without BOD, it's possible for the supply voltage of your micro to fall enough that the digital logic starts entering undefined states internally, in which case there's no way to predict what will happen, but it will usually hang. Without BOD on, the micro will never reset itself.

If your project can pick up where it left off if it experiences a reset, you might want to turn BOD on after all.
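If you do turn BOD on, it helps to log why the micro last reset, so you can tell a brownout reset from a power-on or watchdog reset. Below is a minimal sketch of a decoder for the SAMD21 reset cause register. The bit masks are my reading of the PM RCAUSE layout in the SAMD21 datasheet, so please verify them there. The function name is hypothetical, and I am assuming you would feed it `PM->RCAUSE.reg` from the CMSIS headers early in `setup()`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Turn a raw reset cause byte into a readable string such as "BOD33".
// Masks assumed from the SAMD21 PM RCAUSE register; check your datasheet.
std::string describeResetCause(uint8_t rcause) {
    struct Flag { uint8_t mask; const char* name; };
    static const Flag flags[] = {
        {0x01, "POR"},    // power-on reset
        {0x02, "BOD12"},  // brownout on the 1.2 V core supply
        {0x04, "BOD33"},  // brownout on the 3.3 V supply
        {0x10, "EXT"},    // external reset pin
        {0x20, "WDT"},    // watchdog timeout
        {0x40, "SYST"},   // software-requested system reset
    };
    std::string out;
    for (const Flag& f : flags) {
        if (rcause & f.mask) {
            if (!out.empty()) out += '|';  // separate multiple set flags
            out += f.name;
        }
    }
    return out.empty() ? std::string("UNKNOWN") : out;
}

// Host-side sanity check of the mapping above.
void selfCheck() {
    assert(describeResetCause(0x04) == "BOD33");
    assert(describeResetCause(0x20) == "WDT");
}
```

On the board you would print or store something like `describeResetCause(PM->RCAUSE.reg)` at startup. If "BOD33" shows up after one of those random incidents, the supply is the thing to chase.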