Strange resets happening on Arduino MEGA (2560)

I am working on a project with an Arduino MEGA (2560) and an HMI touch display. Everything was fine until I last flashed new firmware to the Arduino. Now many of my subroutines end with an Arduino reset instead of returning to where they were called from. In fact, I can't even get from void setup() to void loop(). If I explicitly add loop(); at the end of void setup(), it does reach void loop(), which makes this very strange.

I have already ruled out the watchdog, since some subroutines wait for user input and the Arduino does not reset while waiting. There is no electrical issue either. I thought of the call stack and that I might be nesting subroutines too deeply, but that cannot explain the transition from setup() to loop(). I checked the failing subroutines with a serial print to the debug port (USB) at the end of each one: they execute to the very end, but instead of returning they reset the Arduino. I also thought something might have happened to my board, so I programmed another one and got the same result. I get no errors or warnings during compilation.

I have run out of ideas and I have to solve this soon. I would appreciate any idea that might point me in the right direction.

This is from my main code:

#include "Definitions.h"
#include "CommonFunc.h"
#include "RW_BBS_COM.h"
#include "RW_HMI_COM.h"
#ifdef DEBUG
    #include "DEBUG.h"
#endif // DEBUG

void setup() {
    unsigned int btn = 0;       // Variable for pressed button ID
    
    // Connect serial ports
    #ifdef DEBUG
        ConnectToPC();
    #endif // DEBUG
    BBS_Serial.begin(BBSBaud);  // default at 8bits data, no parity check, 1 stop bit
    BBS_Serial.setTimeout(25);  // Set timeout for reading bytes from BBSxx
    HMI_Serial.begin(HMIBaud);  // default at 8bits data, no parity check, 1 stop bit
    HMI_Serial.setTimeout(7);   // Set timeout for reading bytes from HMI
    
    InitHMI();                  // Initialize HMI

    if (ConnectBBS()){          // Try connecting to BBSxx
        GenData = ReadGEN();    // Read General Information from BBS
        ReadBBSProfile();       // Read full profile and store in ActiveProfile

        #ifdef DEBUG
            PrintGENData();
            Debug_Serial.println("Wait HMI input? y/n:");
        #endif // DEBUG
        
        // Wait for user to press OK on the HMI and go to main loop    
        while (btn != BtnOK_p1){                     // Wait until OK button pressed on page 1
            // Wait for input from HMI
            while (!HMI_Serial.available()){
                delay(100);                         // Wait for user response
                #ifdef DEBUG                        // In Debug mode respond to user command to stop waiting on HMI response
                    char UserInput;
            
                    if (Debug_Serial.available()){
                        UserInput = Debug_Serial.read();
                        if (UserInput == 'n')
                           break;                  // Break the HMI waiting loop and go to main loop
                    }   
                #endif // DEBUG
            }
            HMI_Serial.readBytes(HMIinput, sizeof(HMIinput));   // Get HMI report
            btn = ReadBtn();                                    // Read button ID
            #ifdef DEBUG
                btn = BtnOK_p1;                                 // Don't wait more in DEBUG
            #endif // DEBUG
        }
        ShowImage(ImgMainMenu);                                 // After user reads the startup info and presses OK, show the main menu
        AdvProfileNum = 0;                                      // No advanced profile loaded yet so set it to 0
    }
    else{
        // Display message on HMI that connection failed and ask for reset
        // Add function to write with white font with no background and no refresh as on this screen we have no Status bar
        #ifdef DEBUG
            Debug_Serial.println("BBS connect error. Restart!");
        #endif // DEBUG
        while (true) // Loop here until user decides to restart the system
        {}
    }
    loop(); // >>>>>>>>> IF THIS IS NOT HERE INSTEAD OF GOING TO loop() ARDUINO RESETS AND START OVER
}

void loop() {
    unsigned int btn = 0;                               // Variable for pressed button

    #ifdef DEBUG
        /*DebugCycle:
        Debug_Serial.println("Reading profile...");
        ReadBBSProfile();
        PrintProfile();
        Debug_Serial.println("Write back? y/n:");
        while (!Debug_Serial.available())
            delay(100);                                 // Wait for response
        if (Debug_Serial.read() == 'y'){
            Debug_Serial.println("Writing profile...");
            WriteBBSProfile();
            PrintBBSerr();
            Debug_Serial.println();
            Debug_Serial.println("<<< Write end >>>");
            Debug_Serial.println();
        }
        Debug_Serial.println("Repeat? y/n:");
        while (!Debug_Serial.available())
            delay(100);                                 // Wait for response
        if (Debug_Serial.read() == 'y')
            goto DebugCycle;                            // Repeat if requested */
    #endif // DEBUG
    
    #ifdef DEBUG
        CheckHMI:                                       // HMI interaction loop start (used in Debug mode only to skip profile read/write repetition)
    #endif // DEBUG
    while (!HMI_Serial.available()){delay(100);}        // Wait for user interaction
    btn = 0;                                            // Clear button variable
    HMI_Serial.readBytes(HMIinput, sizeof(HMIinput));   // Get HMI report
    btn = ReadBtn();                                    // Read button ID
    #ifdef DEBUG
        Debug_Serial.println(btn, DEC);                 // Print the data from HMI
    #endif // DEBUG
    switch (btn){
        // Main Menu page
        // >>>FOLLOWING FUNCTION RETURNS CORRECTLY TO loop()
        case BtnGEN_p2:           {ShowGENinfo();               break;}   // Display General Info page and update indicators
        // >>>FOLLOWING FUNCTION ENDS WITH RESETS
        case BtnBeginner_p2:      {ShowBeginner();              break;}   // Display Beginner Mode page and update indicators
        // >>>FOLLOWING FUNCTION ENDS WITH RESETS
        case BtnAdvanced_p2:      {ShowAdvanced(advBAS);        break;}   // Display Advanced Mode page (Basic Settings) and update indicators
        // >>>FOLLOWING FUNCTION RETURNS CORRECTLY TO loop()
        case BtnAdvProf_p2:       {ShowAdvProf();               break;}   // Display Advanced Profiles page and update indicators
        default: break;                                 // Communication error or button unsupported by this firmware
    }

    #ifdef DEBUG
        goto CheckHMI;                                  // If in Debug mode don't go back to profile read/write test after HMI interaction
    #endif // DEBUG
}

You are probably corrupting the stack. That can show up only when a function tries to return, after writes to global variables
have clobbered the return addresses. Google "arduino freememory".

"Global variables use 1984 bytes (24%) of dynamic memory, leaving 6208 bytes for local variables. Maximum is 8192 bytes." This is what I get after the build, so I assumed I had more than enough memory. I will see what FreeMemory has to say. The only logical explanation is that the stack is being overwritten or has overflowed. Really frustrating, but that's what happens when I write more than 1000 lines of code without testing on the Arduino, relying only on the compiler and logic. I can't recreate the project in my head, including the stack, in real time. :)
Thanks for the advice!

If the 'report' is ASCII text, the line below is at least a suspect.

HMI_Serial.readBytes(HMIinput, sizeof(HMIinput));   // Get HMI report

readBytes does not append a terminating nul and if you receive sizeof(HMIinput) bytes, there is no place for a terminating nul (unless it's in the message).

After that, your code's behaviour is unpredictable. Try to print the text, or use functions like strlen, strcpy or strcat on it, and you have problems.
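To illustrate the point about the missing terminator, here is a minimal sketch that reads at most one byte less than the buffer holds and then terminates the text itself. ReadReport and FakeStream are hypothetical names invented for this example; FakeStream only stands in for a serial port so the code can be tried off the board. The one library behaviour relied on is that readBytes(buffer, length) returns the number of bytes it stored.

```cpp
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for a serial port, for off-board testing only.
// It mimics readBytes(): copy up to `length` bytes, return the count.
struct FakeStream {
    const char *data;
    size_t remaining;

    size_t readBytes(char *buffer, size_t length) {
        size_t n = remaining < length ? remaining : length;
        memcpy(buffer, data, n);
        data += n;
        remaining -= n;
        return n;
    }
};

// Hypothetical helper: reads at most bufSize - 1 bytes so that one slot
// is always left for the terminating nul, then writes that nul.
// Returns the number of bytes read (not counting the nul).
template <typename StreamT>
size_t ReadReport(StreamT &port, char *buf, size_t bufSize) {
    if (bufSize == 0) {
        return 0;                       // no room even for the nul
    }
    size_t n = port.readBytes(buf, bufSize - 1);
    buf[n] = '\0';                      // safe: n <= bufSize - 1
    return n;
}
```

On the board the call would presumably look like ReadReport(HMI_Serial, HMIinput, sizeof(HMIinput)), assuming HMIinput is a char array. If the report is binary rather than text, the terminator is not needed, but the returned byte count still is.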

HMIinput is a 64-byte global variable, so its size is always the same. What the HMI returns varies in length, but is always less than 64 bytes. It usually returns ID numbers of pressed buttons, selected menu items and things like that. I never print directly what I receive from the HMI. I also set the timeout for this serial port so readBytes won't wait long to fill all 64 bytes, since it never gets that many at once.
Actually, I get the reset even with subroutines that only send data to the HMI and don't read anything from it.

Sometimes a fresh set of eyes can spot a problem that you can no longer see. Post ALL of your code, and someone will probably spot the array out of bounds issue in 10 seconds.

I would, but it's not an open source project, and it's more than 15 files with more than 2000 lines of code. Anyone who can spot a problem in all this code in 10 seconds will be a lucky man. :)

You are overwriting the stack in setup, so 'just' check all routines called from there.

    loop(); // >>>>>>>>> IF THIS IS NOT HERE INSTEAD OF GOING TO loop() ARDUINO RESETS AND START OVER

Penoff:
but it's not open source project

Bad luck.

This line I added only to be able to go to void loop(), because if it is not there Arduino resets. I did it for testing only. I know it's not supposed to be in the code at all. I was looking for information on how deep the stack is. I was used to an 8-level hardware stack on some 8-bit PICs, but this MCU keeps its stack in RAM. I try not to go deeper than 4-5 levels, but I might have made a mistake somewhere.

Penoff:
This line I added only to be able to go to void loop(), because if it is not there Arduino resets.

That's exactly my point, the stack (the return address for setup) is corrupted when setup ends.

I meant that it was not there before. I added it after the stack stopped working properly. I wanted to investigate where it breaks and whether it is only at one place in my code. I just wanted to get further in the code and that was the only way. The Arduino IDE has no simulator, so I didn't have much of a choice and had to improvise. Since it doesn't want to go to loop() on its own it will be a little easier to track the issues as I will have much more limited code to investigate.

I just found my suspect, which is used by many subroutines. Copy-pasting a function and changing its purpose doesn't always end well. I will have to try this with the board tonight.
This is the commonly used subroutine:

void CopyByteArray(byte ByteArrayIn[], byte ByteArrayOut[], byte StartIn, byte StartOut, byte Length){
    byte cnt; // Counter variable
    for (cnt = 0; cnt <= Length; cnt++) {ByteArrayOut[cnt + StartOut] = ByteArrayIn[cnt + StartIn];};
}

I call it often like this:

CopyByteArray(FrameEnd, CMD, 0, 4, sizeof(FrameEnd));

FrameEnd is fixed at 4 bytes and never changes. Let's say CMD is 8 bytes and the first 4 (bytes 0-3) are already filled. Then the copy function writes to indices 4 through 8, because cnt runs from 0 to 4 inclusive (the for loop uses <= instead of <). This means it writes one byte past the end of CMD, and also reads one byte past the end of FrameEnd. I might have been lucky until now that it didn't cause any unexpected results, but my luck has finally run out. Late night coding plays tricks sometimes. :)
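For reference, a minimal sketch of the copy routine with the loop bound corrected from <= to <. The name CopyByteArraySafe and the extra OutSize parameter are assumptions added for illustration and are not part of the original code; the parameter lets the function refuse a copy that would run past the end of the destination instead of silently overwriting whatever follows it. If the destination is a local array, what follows it may be saved registers or a return address, which would fit the symptom of functions that run to the end and then fail to return.

```cpp
#include <stddef.h>
#include <stdint.h>

typedef uint8_t byte;   // Arduino.h already provides this; kept so the sketch also builds off-target

// Copies Length bytes from ByteArrayIn[StartIn...] to ByteArrayOut[StartOut...].
// OutSize (hypothetical extra parameter) is the total size of ByteArrayOut.
// Returns false and copies nothing if the copy would not fit.
bool CopyByteArraySafe(const byte ByteArrayIn[], byte ByteArrayOut[], size_t OutSize,
                       byte StartIn, byte StartOut, byte Length) {
    if ((size_t)StartOut + Length > OutSize) {
        return false;                           // would write past the end of ByteArrayOut
    }
    for (byte cnt = 0; cnt < Length; cnt++) {   // '<' copies exactly Length bytes
        ByteArrayOut[cnt + StartOut] = ByteArrayIn[cnt + StartIn];
    }
    return true;
}
```

The call from above would then become CopyByteArraySafe(FrameEnd, CMD, sizeof(CMD), 0, 4, sizeof(FrameEnd)). Note that sizeof(CMD) gives the array size only where CMD is visible as an array, not where it has already been passed into a function as a pointer.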

Penoff:
Since it doesn't want to go to loop() on its own it will be a little easier to track the issues as I will have much more limited code to investigate.

I disagree. There is a critical error in the setup code, I would try to fix that first.

You say you ruled out the watchdog, but the only real way of ruling it out is to turn it off completely. Maybe the watchdog is being fed in an ISR while you wait for user input (I have seen that sort of thing before).

The watchdog isn't used (I might add one at a later stage, but I don't really need it for now). The device isn't doing anything else while waiting for user input. This is basically a terminal tool that configures a device from user input. It uses 2 serial ports, one for the HMI and one for the device being configured (plus one for debug).
I edited my comment above and didn't see that anyone had replied in the meantime. I will check whether fixing the copy function solves my problem. I hope it does.
Thanks for the advice!

Just fixed that bad for loop. That fixed most of my problems. Now I just have to find what else is acting weird and why. Thanks for all your advice!