Arduino resets unexpectedly - NOT running out of SRAM

Hi guys,

Like several before me, I’ve come up against the problem of unexpected resets when using Arduino. I have read extensively on the forum and elsewhere to try and get to the bottom of this without success, and there is a history of threads that tail off without any obvious satisfactory conclusion. But there is a pattern emerging, that points towards a library problem, and IMHO it needs to be investigated thoroughly. I’m an experienced IT person, and I’d be happy to play a part, but my C skills are not up to doing it alone. Perhaps a God member can suggest the right course of action.

For completeness I give you the following background details, but as you will see later, I think these are largely irrelevant, since the same symptoms appear in a wide variety of situations (and no, it doesn’t seems to be a lack of memory problem).

I have a recent bog standard Uno with 2k of SRAM. I am writing software to control multiple heating elements on (eventually) more than one kiln, using a shield by Ocean Controls to multiplex readings from several thermocouples. The Arduino will read the temperatures and control the power to the elements by cycling a set of solid state relays – a simple project, but critical since failure could involve the fire department.

On the software side, I am writing a task scheduler to handle the various kiln control tasks, other tasks for communicating with the user (me) linked via the USB to my PC, and potentially other PIN related tasks. I’m using Excel with VBA on the PC side, but I am deliberately keeping Excel at arms length from the comms, and have found Gobetwino to be a suitable intermediary.

My code is currently 12.5kB and includes:
#include <SPI.h>
#include <avr/pgmspace.h>
#include <EEPROM.h>

Recently I started getting odd print outs on the serial monitor. (I always assume it is me. In 40 years I have often sworn at the makers of this or that language or utility only to find out later it was my own fault, and of course that may still be the case. But this time I have considered it from every angle and maybe it’s a genuine problem with the infrastructure – hardware or libraries. )

The first thing of course was to pepper it with diagnostic statements. But the weird thing was that the diagnostic statements themselves were getting corrupted. Also the injection of a diagnostic statement (or any other statement) in one place would sometimes fix the problem where it occurred, only for it to manifest itself in some other form elsewhere. In short the problem did not seem to be related either to the bit of code where the Serial.prints started going bizarre, or where the system eventually froze.

The forum threads were full of references to running out of SRAM. I transferred my biggest array to EEPROM. I then incorporated code from the forum (many thanks) to test for free memory, and all is well – at the point where the bizarre stuff started, there were 986 Bytes free, (Data segment was 374, Bss 425, Heap 96, and Stack 166). But of course, true to form, when I put in this memory usage code, the problem shifted!

I don’t use maloc. I’m not doing anything sneaky at all.

Next was a thorough look at the use of arrays and pointers within the programme. Any statements that looked even remotely risky were surrounded by diagnostics to reveal out of range subscripts, but nothing turned up. (As a C newbie I remain a little nervous here!).

Next was a more detailed examination of the evidence on the screen. I noticed that fixed strings were being misrepresented on the screen – a Serial.print of “KILN” that was correctly printed early in the session would eventually become “**LN” where ** could be anything, but the ** would usually be the same on the next iteration through the loop, or become ***. A look at the C technical documentation, (and confirmed using the free memory functions), shows that fixed strings are stored in the Data area of memory, well away from the Stack, so if there is genuine overwriting of fixed strings going on, it is pretty catastrophic. Perhaps a copy is put on the stack prior to printing, and it is the copy that is corrupted? Whatever.

Shortly after the first of the screen corruptions appears, the system will either hang or perform a reset.

I suspected Gobetwino, but it fails using the Serial Monitor alone (sorry Gobetwino)

I suspected the shield, but it persists when the shield is removed.(sorry Ocean)

I then read every thread I could find talking about restarts, and that’s when I became convinced of a deeper issue. The configurations reported involved several varieties of Arduino, different types of shield, and different libraries included. There were however several things in common to two or more tales:

  1. Timing of the restart - several tales involved a delay of about 20 seconds or so before the system resets
  2. the use in general of the serial port
  3. size of the code – largish in some cases like mine
  4. the movability of the problem when lines of code are added/subtracted
    a. “I can resolve this by altering a line that is in itself correct and has worked before enabling the Ethernet functionality. After rewriting a bit the unit restarts at a later point in time.”
    b. “Altering the code makes the error disappear or at least change”
  5. crashes even when there is little load on the comms

and on the human front, a whole heap of frustration and wasted time going down blind alleys. Several users have found work arounds and given up trying to solve their problems, others I’m sure have simply quit the arena, which is a shame for Arduino.

The clincher for me that it might be library related was the report by “cshotton” who described a set of symptoms very similar indeed to mine (on a very different configuration),

Whenever I call (lcd.printAt(0,1,"text") the display shows some random digits in stead of 'text'.)

and who eventually discovered that the problems went away when he stopped using Serial commands completely. Unfortunately I cant do this since I rely on the Serial monitor to see what’s going on.

It is difficult to point the finger at Serial, since most people will be using it, and the obvious question will be; if this is where the problem lies, why aren’t more people finding it and reporting it??? It’s a bit like finally having to suspect your mother of stealing cookies from the cookie jar – it doesn’t seem right somehow - it is too awful to contemplate a problem so close to the heart of the Arduino project.

Perhaps it is a combination thing: using Serial in a largish program. Certainly it came about for me and one other person as we “expanded the code”. Perhaps the linker gets it wrong. Perhaps none of these, but it needs running to ground, or there will be a succession of people giving up in frustration, most of whom we’ll never hear about.

Can I interest some God member to take this on and sort it once and for all. If it turns out to be my silly indexing bug, I’ll buy you dinner.

Happy to send all the code to anyone who wants to pursue this.

PS absolutely LOVE the Arduino!
Kenny

Long post, short on code.

Did you put your fixed strings n PROGMEM?

Hi - Haven't used PROGMEM at all

Regarding lack of code I didnt want to post hundreds of lines (as per forum advice). Would normally post smaller code showing the essence of the problem, but the problem only seems to occur with large amounts of code!! Do you want to see the lot?

Post the code as an attachment. Don't know what we are supposed to look into without it.

kenny_devon:
I have read extensively on the forum and elsewhere to try and get to the bottom of this without success, and there is a history of threads that tail off without any obvious satisfactory conclusion.

Sometimes we make a suggestion, which is probably right, and the original poster neither confirms nor denies that it worked. In other words, no response.

Sometimes the OP spits the dummy after an incredibly short time (eg. 2 hours) and says "well if no-one is going to help I'm going to MicroChip" or something bizarre like that.

  1. size of the code – largish in some cases like mine
  2. the movability of the problem when lines of code are added/subtracted
    a. “I can resolve this by altering a line that is in itself correct and has worked before enabling the Ethernet functionality. After rewriting a bit the unit restarts at a later point in time.”
    b. “Altering the code makes the error disappear or at least change”
  3. crashes even when there is little load on the comms

Honestly, all of that sounds suspiciously like memory issues to me. And you don't have to touch malloc to run into memory issues. Something most people don't really think about is:

Serial.print("This is a test.");

takes up 16 bytes of SRAM. Anywhere you use a static string in your code like that takes up SRAM. Also, the String class is just plain bad. Don't use it. Don't use any Libraries that use it.

I've written a number of larger projects, two that required a Mega for it's additional memory (the largest was 67k binary). Used a variety of libraries from a variety of sources. All of the intermittent reset issues I've come across fell into two categories:

  1. Ran out of SRAM.
  2. Power issues (dirty power, or lack of current capacity).

All that said, without seeing code, it's all just shooting in the dark.

Have you tested with the solid state relays disconnected and ruled out switching transients?

Do you have lots of nested calls somewhere that could use up that stack space and overwrite the heap? (also calls to functions with a lot of local variables?)

Thanks for your various replies

You asked for the code - you got it, warts and all. There's still a lot of junky diagnostic stuff in there, and it's very much work in progress. It wont let me post it all in one post, so parts 2 and 3 will follow immediately.

MarkT: The SSRs aren't connected yet. The only things on the Uno are the Ocean shield, and a thermocouple.

There is no recursion, and there is no sign of memory problems - the memory figures i gave in the original posting were from as deep as it gets in my code, and there were still 900 bytes free.

jraskell: I know it looks walks and smells like memory loss, but I've run the figures and they look OK. I even went back to a bare sketch and gradually added back all my code, running the showMem routine each time just to convince myself that the figures were "sensible". I then ran the full package and reproduced the problem with showMem in use at various points - there was no problem. When you look at the code you'll see it's frugal on actual text strings.

You're point blank condemnation of the String class has me really worried. I use them everywhere. I know they're expensive in program space, and in heap, but that expense is included in my free memory results, and I find them much easier to work with. As far as stack is concerned I dont think I'm getting anywhere near a problem. If Strings are that flakey why are they on the Arduino reference pages withou any health warning? What in particular goes wrong with them? Is there somewhere to read up on the issues?

Dirty power ?? Interesting. I generally take it form the USB. I'm not running anything off it except Arduino and the shield. Could the PC be giving flakey power? How could I check? But my resets occur after evidence of data corruption. I would think a power outage would cause an instant reset, not a situation that builds over many seconds, then resets.

I will emphasize one thing - my symptoms are virtually identical to another user with a completely different configuration, purpose, etc. I dont think its the specifics of my configuration that are the issue, but I could be wrong.

Part 1

// INCLUDES
#include <SPI.h>
#include <avr/pgmspace.h>
#include <EEPROM.h>

extern unsigned int __data_start;
extern unsigned int __data_end;
extern unsigned int __bss_start;
extern unsigned int __bss_end;
extern unsigned int __heap_start;
extern void *__brkval;

int PCcomm=1;  
int gp_all=0;
boolean echo=true;
boolean logging=true;
boolean realTimeTempUpdates=false;

//DEFINES
// pins
#define MAX_PINS  13 // on Arduino Uno

// instructions
#define TASK_FORCE_FULL_DATA_OFFLOAD 150
#define TASK_FORCE_DATA_OFFLOAD 151
#define STD_INSTRUCTION_INTERVAL 4000  // 4 secs between looking for new instructions
// scheduled tasks
#define MAX_SCHEDULED_TASKS  10 // on Arduino Uno
#define TASK_SCHEDULED_TASK1  1
#define TASK_GET_INSTRUCTIONS 2
#define TASK_SCHEDULE_DATA_OFFLOAD 3
#define TASK_READ_TEMPERATURES 5
#define TASK_SCHEDULED_TASK2  10

// unscheduled tasks
#define MAX_UNSCHEDULED_TASKS  10 // on Arduino Uno
#define TASK_TEST_UNSCHEDULED 1
#define TASK_DATA_OFFLOAD 4

// structure/other
#define maxFieldsPerRecord  10 
#define MAX_DATA_BYTES 500
#define mcp1 5

// temperature/MUX
int MAX_CHANNELS=4;
#define CS_TEMP  9 // MAX6674/6675 /CS Line
#define MUX_EN   7 // ADG608 MUX Enable
#define MUX_A0   4 // ADG608 Addr0
#define MUX_A1   5 // ADG608 Arrd1
#define MUX_A2   6 // ADG608 Addr2

// programming
#define YES true
#define NO false
#define ON true
#define OFF false
/////////// DECLARATIONS AND INITIAL VALUES

// GLOBALS

// globals for tracking processes and reading serial using PCcomm
int GBT_serInLen = 15;
char GBT_serInString[15];
int GBT_pId =0;

// new globals

  long loop_count=0;
  long timeAvailable;

  // declare, initialise temperature configuration variables
  //  int mcp1=MAX_CHANNELS+1; //add 1 so that index will match channel number
  byte channelsInUse=B00001000; // channel 4 for testing

// PINS
  int pinsToRead=0; // no pin action unless one of these flags is set 
  int pinAction[MAX_PINS]; 
  int pinActionType[MAX_PINS]; 
  int pins; //hold value of all pins in use (up to 16)  
  boolean pinExecution;  // note that ANY execution has happened
  boolean execute;  // note to execute function
  int pinReading, pinPrev;
  
// APPOINTMENTS
  boolean scheduledExecution;
  unsigned long nextAppointment=0; //next scheduled task in millis
  long appointments[MAX_SCHEDULED_TASKS+1]; 
  long appReturn;
  
// UNSCHEDULED
  boolean unscheduledExecution;
  int unscheduled=0;  // 16 flags available
  int managed=0; // 16 flags available - initially unset
  boolean managedTask;
  int minUseful[MAX_UNSCHEDULED_TASKS+1];  // set value in millis here for managed tasks
  int maxTaken[MAX_UNSCHEDULED_TASKS+1];  // set value here in millis for unmanaged tasks
  // initial values
      // none for now 

void initial_values(){
  // PINS
  //  setBit(pinsToRead, 13);// pin 13 for testing
  pinAction[13]=3;  //gives index of routine to handle state
  pinActionType[13]=1;  //0 action if LOW, 1 Acion if HIGH, 2 action if HIGH TO LOW, 3 action if LOW to HIGH

  // SCHEDULED
//  appointments[TASK_SCHEDULED_TASK1]=millis()+6000; 
//  appointments[TASK_SCHEDULED_TASK2]=millis()+7000; 
  appointments[TASK_SCHEDULE_DATA_OFFLOAD]=millis()+2000;  
  appointments[TASK_GET_INSTRUCTIONS]=millis()+2000;  

  appointments[TASK_READ_TEMPERATURES]=millis()+2000; 
  nextAppointment=appointments[TASK_READ_TEMPERATURES]; 

  // UNSCHEDULED
  setBit(managed,TASK_DATA_OFFLOAD-200);
  minUseful[TASK_DATA_OFFLOAD]=3000;
}
      
void initialise_arrays(){
  // most moved elsewhere
  iint(pinAction,MAX_PINS,0); //gives index of routine to handle state
  iint(pinActionType,MAX_PINS,0); //0 action if LOW, 1 Acion if HIGH, 2 action if HIGH TO LOW, 3 action if LOW to HIGH
  lint(appointments,MAX_SCHEDULED_TASKS,0); 
}


/////////// TEST FUNCTIONS

void pinHandler3(int pinReading)  {psl("pin action 3");  pnl(pinReading);}
long scheduledTask1(){psl("in scheduled task 1"); return 1000;}
long scheduledTask2(){psl("in scheduled task 2");return 5000; }



// UNSCHEDULED TASKS  - TEST
int unscheduledTask1(int timeAvailable) {
  ps("in unscheduled task 1");
}

////// MAIN LOOP  -- PIN MANAGEMENT   ----NOT USED YET

void pinStuff(){
    pinExecution=false;
    if (pinsToRead) {  // any bit set?
      for (int i=1;i<=MAX_PINS;i++){
        if (readBit(pinsToRead, i)){
          pinReading=digitalRead(i);
          pinPrev=readBit(pins,i);
          switch (pinActionType[i]) {
            case 0: execute=(pinReading==LOW); break;
            case 1: execute=(pinReading==HIGH); break; 
            case 2: execute=(pinReading==LOW)& (pinPrev==HIGH); break;// HIGH to LOW
            case 3: execute=(pinReading==HIGH)& (pinPrev==LOW); break; // LOW to HIGH
            case 4: execute=true; break;// always execute
            default: stop("PI1");
          }
          if (execute) {
            pinExecution=true;
            switch (pinAction[i]) {
              case 3: pinHandler3(pinReading); break;
              default: stop("PI2");
            }
          }
          assignBit(pins, i, digitalRead(i)); // use fresh digital read in case action routine has changed it
          psn("after assign", pins);
        } 
      }
    } // if pins to read
}

/////// INSTRUCTION MANAGEMENT

String listenForInstruction(){
  String inst;
  char sbuffer[10];
  static int instSerial=0;
  sendGobetwinoCommand("KILNINST",itoa(instSerial+1,sbuffer,10));
  inst=String(GBT_serInString) ;
  if ((inst=="-2") || (inst=="-3")) return "";  // gives -3 at end of file, -2 beyond end
  else if (inst=="-1") {alert("No Inst"); return "";}    // no instruction file
  instSerial+=1;
  if (echo)  sendKDmsg("EC", inst);
  return inst;
}

Code Part 2

long getInstructions(){
  boolean badCode=NO;
  boolean standardResponse=false;  // gets trumped by badCode
  int indexBody=0;
  int dely=0;
  char sw;
  char instType[3];  // last one for the null
  String response, responseCode,instBody;
  String st=listenForInstruction();
  if (st=="") response="";
  else if  ((st.length()<4)||(st.substring(0,2) != ">>")) response="NV"+st;   // Not Valid : must be ">>" = going out from PC
  else {     // found a string signalled as emanating from PC with enough length to be a valid instruction
    responseCode=st.substring(2,4);  // by default set response code to incoming instruction code
    responseCode.toCharArray(instType,3); // split for use in switches  3 gives space for null!!!
    instBody=st.substring(4);

    switch (instType[0]) {
      case 'D':  // temperature data
        switch (instType[1]) {
          case 'S': tempStuff(TASK_FORCE_FULL_DATA_OFFLOAD); break;   // code OS  Offload from Start  
          case 'C': tempStuff(TASK_FORCE_DATA_OFFLOAD); break;        // code OC  Offload from Current
          case 'R': realTimeTempUpdates=ON; standardResponse=YES; break;
          case 'r': realTimeTempUpdates=OFF; standardResponse=YES;  break;
          default: badCode=true; break;
        } 
        break;
      case 'l': logging=OFF;  standardResponse=YES; break; // stop logging
      case 'L': logging=ON;  standardResponse=YES; break; // start logging - remove logging override
      case 'R': break; // new ramp setting
      case 'P':
        if  (instType[1]=='B') {standardResponse=YES; break;}// simply play back message
        else if (instType[1]=='C') {
          // stuff for performance curves
        }
        else {
          badCode=YES;
        }
        break;
      case 'k': break;// temporary switch off kiln
      case 'K': break;// resume kiln
      case 'S': break; // start a program
      case 's': break; // stop a program
      case 'C': break; // new configuration settings
      case 'E': echo=ON;  response="Echo On"; break;
      case 'e': echo=OFF; response="Echo Off";  break;
      case 'W':                             // wait before looking for next instruction
        standardResponse=YES;
        dely=findPosInt(instBody, indexBody);
        sw=instBody.charAt(indexBody);
        if (dely==0) {badCode=YES; break;}
        switch (sw){
          case 'S':  
          case 's':  break;   // already in seconds 
          case 'M':  
          case 'm': dely *= 60; break; 
          default: badCode=YES; 
        }
        break;
      default: badCode=YES;
    }
  }
  if (badCode) sendKDmsg("BadCode",st);
  else if (standardResponse) sendKDmsg("OK",responseCode+":"+instBody);
  else if (response>"") sendKDmsg(responseCode,response);

  if (dely>0) return long(dely)*1000; else return STD_INSTRUCTION_INTERVAL;
}

////// SCHEDULED TASK MANAGEMENT

void scheduledStuff(){
// now the scheduled tasks    
    scheduledExecution=false;
    if (nextAppointment) { // will remain 0 unless set
      if (millis()>=nextAppointment) {  // time for action
        // identify culprit
        for (int i=1;i<=MAX_SCHEDULED_TASKS;i++){
          if (appointments[i]==nextAppointment){ //is this the culprit
            // do it and set next appointment for culprit
            switch (i) {
              case TASK_GET_INSTRUCTIONS: appReturn=getInstructions(); break; 
              case TASK_SCHEDULED_TASK2: appReturn=scheduledTask2(); break; 
              case TASK_SCHEDULED_TASK1: appReturn=scheduledTask1(); break; 
              case TASK_SCHEDULE_DATA_OFFLOAD: appReturn=tempStuff(100+TASK_SCHEDULE_DATA_OFFLOAD); break; 
              case TASK_READ_TEMPERATURES: appReturn=tempStuff(100+TASK_READ_TEMPERATURES); break;
              default: stop("TS2");
            }
            appointments[i]=millis()+appReturn;// update next appointment overall
            nextAppointment=appointments[i];
            for (int j=1;j<=MAX_SCHEDULED_TASKS;j++) {
              // 0 in appointments means no appointment
              if (appointments[j]>0) nextAppointment=min(appointments[j],nextAppointment);
            }
            scheduledExecution=true;
            break; // out of for????
          } // if the culprit
        }// loop through possible culprits        
      } // if time left
    } // if appointment exists
}

///////// UNSCHEDULED TASKS
void unscheduledStuff(){
  // now do unscheduled, but managed tasks
    unscheduledExecution=false;  // dont seem to need this if BREAK works as spected
    if (unscheduled) { // will remain 0 unless set
      // identify culprit
      for (int i=1;i<=MAX_UNSCHEDULED_TASKS;i++){
        if (readBit(unscheduled,i)){ // work to do
          // is there time ??
          timeAvailable=nextAppointment-millis();
          managedTask=readBit(managed,i);
          if ((managedTask && (timeAvailable>minUseful[i])) || 
             (!managedTask && ((millis()+maxTaken[i])<nextAppointment))) {
            // do it 
            unscheduledExecution=true;
            switch (i) {
              case TASK_TEST_UNSCHEDULED: 
                assignBit(unscheduled,i,unscheduledTask1(timeAvailable)); break; // do it, and note if there's more to do
              case TASK_DATA_OFFLOAD: 
                assignBit(unscheduled,i,tempStuff(200+TASK_DATA_OFFLOAD)); break;
              default: stop("UN1");
            }
            break; // out of for????
          } // if time available
        } // if the culprit
      }// loop through possible tasks

    } // if any unscheduled tasks
}


////////SERIAL COMMUNICATIONS  -- READING

String readSerialString(char *strArray,long timeOut) {
  //read a string from the serial and store it in an array  - with timeout
   long startTime=millis();
   int i=0;
   while (!Serial.available()) {
      if (millis()-startTime >= timeOut) return ""; 
   }
   while (Serial.available() && (i < (GBT_serInLen-1)) ) {  // made it -1 to allow for null
      if (i<0) stop("SR1");   
      if (i>13) stop("SR2");
      strArray[i] = Serial.read();
      delay(50);
      i++;
   }
//   Serial.println("in readstring i is");
//   Serial.println(i);
   
   strArray[i]= '\0';  // null on end to terminate string
   return String(strArray);
}

////////SERIAL COMMUNICATIONS  -- WRITING

  // Gobetwino layer
  
int sendGobetwinoCommand(String commandType,String content) {
     Serial.print("#S|");
     Serial.print(commandType);
     Serial.print("|[");
     Serial.print(content);
     Serial.println("]#");
     // wait up to 1000 ms for answer from Gobetwino, answer will be in GBT_serInString , answer is 0 if all is OK
     readSerialString(GBT_serInString , 1000);
     //Deal with answer here - omitted in this example
     return 9999;  //unused for now
}

  // KD layer

void sendKDflag(String msgType) { sendGobetwinoCommand("KILNLOG","^^"+msgType);}  // ^^ means just a flagcomming back in to PC
void sendKDmsg(String msgType, String msg) { 
  if (logging) sendGobetwinoCommand("KILNLOG","<<"+msgType+msg);// << means comming back in to PC
}  

void sendKDdata(String data){sendKDmsg("DA", data);}

String prepareNum(int r) {
  // convert to String and append comma
  char buffer[5];
  return String(itoa(r, buffer, 10)) + ",";   // comma separator, even at the end
}
//// END OF SERIAL COMMS

////                MULTIPLEXOR

void setupOceanMUX() {
  pinMode(CS_TEMP,OUTPUT); // MAX6675/6674 /CS Line
  pinMode(MUX_EN,OUTPUT); // Enable pin on ADG608
  pinMode(MUX_A0,OUTPUT); // A0 on ADG608
  pinMode(MUX_A1,OUTPUT); // A1 on ADG608
  pinMode(MUX_A2,OUTPUT); // A2 on ADG608

  digitalWrite(CS_TEMP,HIGH); // Set MAX7765 /CS High
  digitalWrite(MUX_EN,HIGH); // Enable on

  SPI.begin(); // Init SPI
}

void Set_Mux_Channel(byte chan)
{ // channel is 1 to 8
  // addresses of mux are 0 to 7
  byte chan2=chan-1;

  // place address bits on pins: 1 is HIGH, 0 is LOW  
  digitalWrite(MUX_A0,(chan2 & B00000001)>>0); 
  digitalWrite(MUX_A1,(chan2 & B00000010)>>1);
  digitalWrite(MUX_A2,(chan2 & B00000100)>>2);
}
//  END OF MUX


////// GENERAL PURPOSE
void stop(char* s) {  Serial.println("");  Serial.println("STOPPED IN ");  Serial.println(s);  while (1) {}}

void psn(String sss, long vvval) { ps(sss);  ps("[");   pn(vvval);   psl("]");}
void ps(String s) { if (!(PCcomm) || gp_all) Serial.print(s);}
void pn(long s) { if (!(PCcomm) || gp_all) Serial.print(s);}
void pnl(long s) {if (!(PCcomm) || gp_all) Serial.println(s);}
void psl(String s) {if (!(PCcomm) || gp_all) Serial.println(s);}

int findPosInt(String s, int &pointer) {
// retuns an integer, or -1 if none found
// pointer is advanced beyond integer
    int val=0;
    char c=s.charAt(pointer);
    while (c<='9' && c>= '0'){  
      val=val*10+c-48;  // 48 is ASCII for zero
      pointer+=1;
      c=s.charAt(pointer);
    }
  return val;
}

Code part 3

char readChar(String s, int &pointer) {
// retuns single character at pointer
// pointer is advanced
  char c=s.charAt(pointer);
  pointer+=1;
  return c;
} 

void iint(int a[], int sze, int val) {
  for(int x=0;x<sze;x++) a[x]=val;  // NB largest index is sze - 1 
}  
void lint(long a[], int sze, int val) {
  for(int x=0;x<sze;x++) a[x]=val;  // NB largest index is sze - 1 
}  
void ipbyte(byte a[], int sze, int val) {
  for(int x=0;x<sze;x++) a[x]=val;  // NB largest index is sze - 1 
}  

int setBit(int &B,int b) {  B= B | 1<<(b-1);  return B;}
  // sets bit b in B
  // returns modified B for immediate use

boolean readBit(int B,int b) {return (B & 1<<(b-1))>0;}

int unsetBit(int &B,int b) {  B = B & ~(1<<(b-1));  return B;}
  // unsets bit b in B
  // returns modified B for immediate use

int flipBit(int &B,int b) {  B = B ^ (1<<(b-1));  return B;}
  // flips bit b in B
  // returns modified B for immediate use

int assignBit(int &B,int b, int val) {
  // assigns val to bit b in B
  // returns modified B for immediate use
  if (val==1) return setBit(B,b);
  if (val==0) return unsetBit(B,b);
  stop("AB"); 
}

void alert(String s) {sendKDmsg("!!", s);}

void showMemory(String where){
  // Free, Data, Bss, Heap, Stack
  int heapend=(int)__brkval; 
  if (heapend==0) heapend=(int)&__bss_end;
  Serial.print(where);
  Serial.print("\nF:"); Serial.print( (int) SP           - heapend, DEC );
  Serial.print(" D:");  Serial.print( (int) &__data_end  - (int) &__data_start, DEC );
  Serial.print(" B:");  Serial.print( (int) &__bss_end   - (int) &__bss_start, DEC ); 
  Serial.print(" H:");  Serial.print( heapend            - (int) &__heap_start, DEC );
  Serial.print(" S:");  Serial.println( (int) RAMEND     - (int) SP, DEC );
 }

void setup(){ 
  Serial.begin(9600);
  delay(5000);

  showMemory("Start");
  pinMode(13,OUTPUT);

  initialise_arrays();
  initial_values();
  setupOceanMUX();  
  sendKDflag("*****************");

}
  
void loop() {
  while (true) {
    // loop time is 25 ms with 1 pin check, no appointments
    loop_count+=1;
   //psn("loop",loop_count);
   //Serial.println(loop_count);
    pinStuff();
    if (pinExecution) continue;  // ie back to start of loop

    scheduledStuff();
    if (scheduledExecution) continue; // back to start of loop

    unscheduledStuff();
    //    sendTemperatureRecord(record,record_count);

  }
}

///////////////  TEMPERATURE PROCESSING

long tempStuff(int functionSelector){
  // declare, initialise temperature recording variables
//showMemory("IN TEMPSTUFF");
  static int temp[mcp1]; 
  static int st_temp[mcp1]; 
  static int st_slope[mcp1]; 
  static byte n_ramp[mcp1]; 
  static int recordCount=0;  // goes from 1
  static int prevRecordCount=0;  // goes from 1
  static int rp=-1; // data byte record pointer
  static unsigned long reading_count=0;

  // init arrays  
//  iint(temp, mcp1, 0);   // current temp 
//  iint(st_temp, mcp1, 0); // starting temp for this group of readings 
//  iint(st_slope, mcp1, 0); // starting slope for this group of readings 
//  iint(n_ramp, mcp1, 0); //number of readings at current slope

  if (functionSelector==100+TASK_READ_TEMPERATURES) {  

    // declare, initialise temperature working variablesF
      int temperature;
      int slope;  // change from previous temp
      int slope_difference_limit=2;
      int slope_difference=0;   // differencefrom initial slope
      reading_count += 1; 
      pn(reading_count);
      psl("----------------------");
      for (int i=1; i<=MAX_CHANNELS; i++) {
        if ((channelsInUse >> (i-1)) != 1)  continue; 
        psn("Channel ",i);
        Set_Mux_Channel(i);
        temperature = Read_Temperature(); 
        if(temperature == -1) {
          psl("N/C");
        }
        else {
          pn(temperature); 
          psl(" DegC");
        }
        slope=temperature-temp[i]; // during last interval
        slope_difference=slope-st_slope[i];  // difference from starting slope, not from last slope reading
        if ((abs(slope_difference)>slope_difference_limit) ||
        (n_ramp[i]==255))  //run out of space in intervals byte 
        {
          recordCount+=1;
          // store the latest group
          EEPROM.write(rp+1,n_ramp[i]);  // byte 1 = number of intervals
          if (temperature < 0) {  // usually a broken channel
            EEPROM.write(rp+2,0); 
            EEPROM.write(rp+3,0); 
          }
          else {
            // channel goes in left most 3 bits, high bits of temp in balance
            EEPROM.write(rp+2,(i<<5)+(st_temp[i]>>8));
            EEPROM.write(rp+3,st_temp[i] & 255);  // least significant byte
          }
          psl("D");for (int j=1;j<=3;j++){pn(rp+j);ps(":");pnl(EEPROM.read(rp+j));}
          rp += 3;  // update record buffer pointer

          // record the new starting points
          st_temp[i]=temperature;
          st_slope[i]=slope;
          n_ramp[i]=0;  // will be incremented to 1
        } // if a new record is required
        // 
        temp[i]=temperature;
        n_ramp[i] += 1;
        if (realTimeTempUpdates) sendKDmsg("TP",String(i)+","+String(temperature));
        psl("");
        
      } //for each channel
      return 10000;
    }  // tempStuff - TASK_READ_TEMPERATURES
    
    
    else if (functionSelector==100+ TASK_SCHEDULE_DATA_OFFLOAD){
      // assuming for now that records will stay in EEPROM
      psl("in DataOffload");
      psn("nrec",recordCount);
      psn("prev",prevRecordCount);
      if (recordCount-prevRecordCount>0) {
        setBit(unscheduled,TASK_DATA_OFFLOAD);
      }
      return 10000; // delay before next check 
    }

    else if (functionSelector==TASK_FORCE_FULL_DATA_OFFLOAD) {
      prevRecordCount==0;  // back to start
      setBit(unscheduled,TASK_DATA_OFFLOAD);
      return 0;
    }

    else if (functionSelector==TASK_FORCE_DATA_OFFLOAD) {
      setBit(unscheduled,TASK_DATA_OFFLOAD);
      return 0;
    }

    else if (functionSelector==200+TASK_DATA_OFFLOAD) {
      // records start with 1
      int recLen=3;
      int val[3];
      int stByte;
      String s;
      char buffer[8];
      int temp;
      // construct one row at a time
      for (int i=prevRecordCount+1;i<=recordCount;i++)  
      { // records start with 1
        stByte=(i-1)*recLen;
        s="";
        val[1]=EEPROM.read(stByte);  // number of intervals 
        val[2]=EEPROM.read(stByte+1);  // mix of channel and temp
        val[3]=EEPROM.read(stByte+2);  // low bits of temp
        // unpack bytes 2 and 3
        val[3]+=(int)((val[2] & B11111)<<8); 
        val[2]=val[2]>>5;  // channel is 3 highest bits
        for (int j=1;j<=3;j++) {s+=prepareNum(val[j]); }
        sendKDdata(s);
      } // for each record
      prevRecordCount=recordCount;
      return 0;
    }
    
    stop("TS");  // should have returned by now

} // end of TEMPSTUFF

int Read_Temperature(void){
  // this function does not require access to shared temperature variables
  unsigned int temp_reading;
  digitalWrite(CS_TEMP,LOW); // Set MAX7765 /CS Low
  delay(5);
  digitalWrite(CS_TEMP,HIGH); // Set MAX7765 /CS High
  delay(200); // wait for conversion to finish..
  // read result
  digitalWrite(CS_TEMP,LOW); // Set MAX7765 /CS Low
  delay(1);
  temp_reading = SPI.transfer(0xff) << 8;  
  temp_reading += SPI.transfer(0xff);  
  digitalWrite(CS_TEMP,HIGH); // Set MAX7765 /CS High
  delay(1);
  // check result
  if(bitRead(temp_reading,2) == 1) return(-1); // Failed / NC Error
  else return((int)(temp_reading >> 5)); //Convert to Degc
}
/// END OF TEMPERATURE PROCESSING

Did you eliminate all hardware possibilities? If you designed your own board, is there a pullup resistor on the reset pin? If you are using the FTDI is there a decoupling capacitor o the DTR line?.

From software, are there any recursive calls that can eat up the stack? In this case the stack and heap will collide and cause all kinds of interesting things.

just some random thoughts.

Good luck

forgot the most obvious one, is your power clean? Are you sure there are no conditions where you might exceed 250 ma before the PTC shuts off the arduino for a couple of micro seconds?

From my understanding, the String class uses dynamic memory allocation and while it may not eat up all the memory, it fragments the memory to the point that the effect is just the same even though there is technically still a bunch of free memory.

I agree with jraskell and Delta_G. You have a classic case of stack overflow and/or SRAM corruption. The very fact that adding debugging statements moves the problem elsewhere screams of stack/SRAM problems and using the String library makes it smell like one too.
The problem is finding and fixing it.
I'd suggest that, where you can, you remove the use of Strings or at least try to abbreviate them, even if only temporarily.

Pete

kenny_devon:
You're point blank condemnation of the String class has me really worried. I use them everywhere. I know they're expensive in program space, and in heap, but that expense is included in my free memory results, and I find them much easier to work with.

When I said to upload the sketch as an attachment, I meant to click on "Additional Options" and upload the whole thing. That saves multiple posts.

Anyway, as soon as I see the String class, I can see a very large potential problem. Especially here:

String listenForInstruction(){
  String inst;

Every time this function is called it makes a String, and then it returns one. So this is thrashing strings. There was a bug report a while back about free not properly freeing memory, which may possibly not be fixed yet.

Even if it is, the reports of free memory can be misleading. You might conceivably have 900 bytes free but they may be 450 x 2 byte slots. And it crashes when it needs a 3 byte slot.

I don't care how easy they are to work with, that will be the problem, almost certainly. Stuff like this could be easily rewritten to not use the String class:

inst=String(GBT_serInString) ;
  if ((inst=="-2") || (inst=="-3")) return "";  // gives -3 at end of file, -2 beyond end

eg.

if (strcmp (GBT_serInString, "-2") == 0 || strcmp (GBT_serInString, "-3") == 0)
  return "";

kenny_devon:
If Strings are that flakey why are they on the Arduino reference pages withou any health warning? What in particular goes wrong with them? Is there somewhere to read up on the issues?

I presume you read this (sticky)? Read this before posting a programming question

On that page it mentions:

Hint: The String class tends to gobble up memory. Try to avoid using that, especially in larger programs.

Guys - many thanks for the input.

Nick I read your sticky, but managed to gloss over the hint - sorry. I'm now terrified of Strings! Damn. I'm going to do a series of tests in the morning.

I'll use a 9volt battery at all times, hopefully to eliminate voltage loss as possible cause. I'm going to take the MUX and Gobetwino out of the picture for now. Then I'm going to let it fail (hopefully!) using only the serial Monitor. This my starting point. Then the tests...

Comment out Serial.read(s) - 1 line only - easy test
Comment out all but one Serial.print - again easy to do - will eliminate potential problem but will still show the resets
remove all Strings -- aagghh

maybe some ZZZs will help

Hi guys

Did a load of tests. Strings are indeed the culprit.

It is nothing to do with the size of the program - even tiny programs can fail.
It is nothing to do with recursion, deep nesting, or the general expense of the String overhead.
It is nothing to do with fragmentation of the heap.

It is simply that the implementation of Strings has a bug which corrupts the Free List (and some fixed string data in the process). You can see it by printing out the heap, and by walking the Free List.

When the corruption occurs, it can cause system failure (system hangs, or system resets), or it can have no apparent effect. If you are printing to the Serial Monitor (or another device on the serial port), you may get a warning of impending collapse – fixed character strings will have one or more initial characters replaced by nulls. Eg A loop containing Serial.println(‘”Hello World”); will start producing “©©llo World”. (the ©© will be little squares).

I’ll give you initial evidence and analysis if I get time tomorrow. Then I’ll try put some limits on when this corruption does and doesn’t occur. It may only occur where String parameters are passed.

Nick - you mentioned an old bug report. Who does the debugging? Where are the bug reports reported- they may save me some time.

Many thanks for steering me in the right direction.

I just came across this thread and reading through to the solution I wanted to add - I ran into a similar problem with String objects, and it seems that whatever caused it, it was introduced with version 1.0 - I have an Arduino Mega that had been running a sketch fine for over 6 months, compiled on 022. I had used Strings quite a bit, but there were never any problems.

Then after 1.0 came out, I updated all my libraries and made some changes to the project running on the Mega. After that, I started getting random hangs and restarts, and there was almost always a 'clue' just before it died. As Kenny said, there would be a string with the first character lost. At first it would happen in one spot, so I'd 'fix' or change that bit of code, then it would happen somewhere else, etc. till finally I rewrote the sketch to remove every last String object.

After removing all the Strings, it's been rock-solid ever since (over 3 months now without any problems.)

Cheers!