[SOLVED]How to troubleshoot and detect a possible memory leak?

Hi all,

My sketch (pretty big - 100kb complied on Mega) uses number of global, local and static variables.
Below is a fraction of the globals :

unsigned long timePB;                            // holds the time the address push button is pressed
char inputChar[10];                            // Manual Input from Serial port 
byte count=1;                                  // used for sending parts of received packets to the Log function
byte timerExecuted[21];                        // Stores the state of each of the timers
int input;                                     // Temporary variables to hold user input from serial/telnet
int input2;
char filename[16];                             // stores filename for log
byte sendPending=0;                             // Indicates if SendFeedback function has postponed bus transmission because of bus congestion
unsigned long packetsDropped=0;                // Number of packets dropped due to bus congestion
byte SkipTelnetInput=0;                        // When raised (1) Telnet input is skipped becasue a command is in process
byte SDCARDValid=0;                            // SD card status
byte warned=0;                                 // Telnet warning flag
byte messages=0;                               // register to hold active messages

byte statusRun =0;                             // System running status register
byte statusGen =0;
byte statusMech=0;                             // Status register for Mechanical systems


byte switchDevices [10];                       // Holds switching devices 
byte devInterNo=0;                             // Device index to interrogate from array above

unsigned long overflowpackets=0;                // register to hold number of packets with additional AA start characters
unsigned long totalSent=0;                     // Number of command packets  sent

unsigned long totalRetries[4] ={0,0,0,0};      // Number of packets that had to be resent 

unsigned long ledflash=1;

Now, totalRetries[0] gets assigned the value of 6315870 with no obvious reason.
Please note that the value above is always the same and doesn't seem to change over the sketch execution. I have removed all calculations that assign values to that variable and its contents still the same.
I therefore suspect a memory leak. Possibly some array subscript going out of range.

Is there a way to pinpoint the course of this problem ?
How does the compiler assign memory locations to the arrays?
For example which variable would be held at the memory location just before totalRetries[0] on the memory map?

Update:
Found something but cant exactly explain it . The following seems to be causing the problem:

Consider this stripped down snippet

static byte sendingTries[]={1,1,1,1,1};


 if  (sendingTries[index] <4 )
     
    {
      totalRetries[sendingTries[index])++;
       sendingTries[index]++ ;
     }

Any comments?

      totalRetries[sendingTries[index])++;

Using the count as an index seems strange to me.

Whandall:

      totalRetries[sendingTries[index])++;

Using the count as an index seems strange to me.

The purpose of the above is to count the number of transmission retries (due to packets lost ) on an RS485 bus.

so sendingTries[index] takes the value of 1 to 3 depending on the 'round' of tramsission.

ie if this is the first re-transmission the value of totalRetries[1] would increment and so on.

The system keeps track of the last four packets hence the value of index which can range from 0 to 3.

Could the value of index ever be greater than 4?

totalRetries[sendingTries[index])++;This doesn't compile anyway due to the 2x '[' and 1 ']' & 1 ')' so to figure out if it actually does what it should do we'd need to see the exact copy. Point is though if 'index' is ever bigger then '3' what will happen ?

No, Here’s the whole thing :

static byte sendingTries[5]={1,1,1,1,1};
int resendTimeout=400;

for (byte index =0; index<5; index++){ 
    if ( millis()- replyTimeout[index] > resendTimeout * sendingTries[index]  &&  replyTimeout[index] > 0 && sendingTries[index] <4) {         
      
       resendCommand(index);
       totalRetries[sendingTries[index]]++;
       sendingTries[index]++;
       
       return;                                                                                                  
     }
   }

Basicaly what it does is, check all 4 available slots for timeouts. If any of them have timed out, it resends that packet, and increments counters.

Now the interesting part (I just found that out) is that this ‘memory leak’ happens even if the above code doesn’t get executed at all!!
If I comment out " totalRetries[byte(sendingTries[index])]++; ", then everything is fine.
If the line is left there , even if not executed (because the If clause is not true), then totalRetries[0] somehow gets the value 6315870 straight away.

You should look up the meaning of 'memory leak'. :wink:

Maybe the pattern that is written over the variable gives you a hint.

6315870 = 0x605F5E = 0x5E '^', 0x5F '_', 0x60 '`', 0x00 ' '

Looks like a prefill of a buffer or a list of delimiters.

If I comment out " totalRetries[byte(sendingTries[index])]++; ", then everything is fine.
If the line is left there , even if not executed (because the If clause is not true), then totalRetries[0] somehow gets the value 6315870 straight away.

it must be the only reference to totalRetries within that function, it is very possible that somewhere else (outside of the snippet that you posted) within that function lies the key to the issue. Btw millis()- replyTimeout[index] > resendTimeout * sendingTries[index]  &&  replyTimeout[index] > 0 && sendingTries[index] <4)for efficiencies sake i would do the test in the order that makes them go false the quickest, and for overviews sake i would ‘brace’ every condition between the ‘&&’

have you tried using this line instead ?(totalRetries[sendingTries[index]])++;

Deva_Rishi:
have you tried using this line instead ?(totalRetries[sendingTries[index]])++;

Why should one?

Whandall:
Why should one?

yeah , no sorry... no need..

Watcher:
If the line is left there , even if not executed (because the If clause is not true), then totalRetries[0] somehow gets the value 6315870 straight away.

What are the four things declared before totalRetries? static data is included. (You may have to generate a MAP file or use AVR-NM to answer the question.)

No, Here's the whole thing :

For some (incorrect) definition of "whole", maybe.

What are the four things declared before totalRetries? static data is included

totalRetries is global. The following are just before it in globals

byte switchDevices [10];                   
byte devInterNo=0;                             
unsigned long overflowpackets=0;               
unsigned long totalSent=0;

Within the same function the following are also defined:

 static byte sendingTries[5]={1,1,1,1,1};
 byte  dayNow = currentDay;
 bool replyPending=false;
 static unsigned long timeForUpdate=0;  
 static unsigned long timeForDeviceInterrogation=0; 
 int resendTimeout;

(You may have to generate a MAP file or use AVR-NM to answer the question.)

This I have to lookup. Never used it before...

it must be the only reference to totalRetries within that function, it is very possible that somewhere else (outside of the snippet that you posted) within that function lies the key to the issue. Btw

Thought of that too. But no, I did a search :slight_smile:

for efficiencies sake i would do the test in the order that makes them go false the quickest

,

Forced them all false and tried it...

and for overviews sake i would 'brace' every condition between the '&&'

Used to to that but I now find the additional 'brace' just extra clutter and dont do it any more...(personal taste i guess)

For some (incorrect) definition of "whole", maybe.

Point taken PaulS.
However I tried to put the minimum code necessary for people to go through within reasonable amount of time without having to go though some 100kb of compiled code spanning over 19 file (IDE) tabs. :confused:

Watcher:
If the line is left there , even if not executed (because the If clause is not true), then totalRetries[0] somehow gets the value 6315870 straight away.

Ah. Then you need to divide-and-conquer to identify the offending section.

Riddle your code with calls to (something like) this...

void AssertNot6315870( void )
{
  if ( totalRetries[0] == 6315870 )
  {
    Serial.println( F("FAULT!") );
    // abort();
    // exit();
    // while ( true );
  }
}

...until you have bracketed the problem.

Make sense?

Riddle your code with calls to (something like) this...

You mean put a "fault detection " IF clause in every function till I find when/where exactly totalRetries[0] == 6315870 ??.. including the setup()..

Essentially, yes.