Problem code hangs after a few hours - Xbee / Uno / WIZ820io

Hi,

I am very new to C; in fact this is my first Arduino-based project. The hardware works OK, but I am having trouble working out why my controller code hangs after running for anything between a few hours and 24+ hours.

The project consists of:

  1. Remote Device - Atmega 328 / Xbee S2 / 10 x DS18B20 / 2 x NC reed switches
  2. Controller Device - Atmega 328 / Xbee S2 / WIZ820io module -> posted to remote website

As you will see, the code is mostly documented example code that I have hacked together, so each component (XBee, NTP, Ethernet) should work on its own. However, either the way I have combined them or the inefficiency of the code is causing it to hang. I suspect it is running out of memory, but I have no idea how to work out why, when, or at what point it hangs.

Any help, pointers, or ideas on how to solve this, where the inefficiencies are, and how to fix them would be greatly appreciated.

I have eliminated a power problem by connecting a 1 A supply; it made no difference.

Thanks in advance

#include <SPI.h>
#include <Ethernet.h>
#include <EthernetUdp.h>
#include <XBee.h>
#include <SoftwareSerial.h>
#include <Time.h>

XBee xbee = XBee();

ZBRxResponse rx = ZBRxResponse();

// Enter a MAC address for your Wiz820io below.
byte mac[] = {
  0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED};

// server to connect to
char serverName[] = "www.domain.com";

// NTP server address (za.pool.ntp.org)
IPAddress timeServer(196, 25, 1, 9);

// Initialize the Ethernet client library
EthernetClient client;

// UDP instance to send and receive packets over UDP
EthernetUDP Udp;

// Set to offset of your local time (+2hrs in sec's)
const long timeZoneOffset = +7200L; //60sec*60mins*2hrs

// Syncs to NTP server every 1hr 3600 secs
unsigned int ntpSyncTime = 3600;

// Keeps track of how long ago we updated the NTP server
unsigned long ntpLastUpdate = 0;

// local port to listen for UDP packets
unsigned int localPort = 8888;

// NTP time stamp is in the first 48 bytes of the message
const int NTP_PACKET_SIZE= 48; 

//buffer to hold incoming and outgoing packets 
byte packetBuffer[NTP_PACKET_SIZE]; 

// Define SoftSerial RX/TX pins
// Arduino DIO8 (pin14) to RX of usb-serial device
// Arduino DIO9 (pin15) to TX of usb-serial device
SoftwareSerial portOne(9, 8);

// received data,
// 1 byte tx status, 1 byte each door, 4 bytes each sensor
uint8_t data[43] = { //10 sensors * 4 + 1 tx status + 2 doors
  0,       // tx event
  0,       // door1          
  0,       // door2
  0,0,0,0, // sensor01
  0,0,0,0, // sensor02
  0,0,0,0, // sensor03
  0,0,0,0, // sensor04
  0,0,0,0, // sensor05
  0,0,0,0, // sensor06
  0,0,0,0, // sensor07
  0,0,0,0, // sensor08
  0,0,0,0, // sensor09
  0,0,0,0 // sensor10
};

// union to convert float to byte string
union u_tag {
  uint8_t b[4];
  float f;
} 
u;

// number sensors defined
const int numSensorDefined = 10;

// query string
String query;

// this xbee coordinators 64 bit address
String add64 = "xxxxxxxx";

void setup() 
{
  // start xbee on serial port
  xbee.begin(9600);

  // Start software serial
  portOne.begin(9600);
  
  delay(1000);
  
  portOne.println();
  portOne.println("Start Up Controller ... ");

  // start the Ethernet & UDP connection:
  int i = 0;
  int DHCP = 0;
  DHCP = Ethernet.begin(mac);

  // Try to get dhcp settings 30 times before giving up
  while(DHCP == 0 && i < 30){
    delay(1000);
    DHCP = Ethernet.begin(mac);
    i++;
  }

  if(!DHCP){
    portOne.println("DHCP Failed");
    for(;;); //Infinite loop because DHCP Failed
  }
  portOne.println("DHCP Success");

  // print your local IP address:
  portOne.print("My IP address: ");
  print_ip();

  // Update time via NTP server
  updateTime();

  // send startup notification
  startUp();

}

void loop() 
{  
  // Update time via NTP server every X amount of secs 
  // as determined by ntpSyncTime
  if(now() - ntpLastUpdate > ntpSyncTime) {
    updateTime();
  }

  // check on DHCP
  checkDHCP();

  // read incoming xbee packets
  readData();
}

Sorry, could not post all the code in one post, limited to 9600 chars

// try update time
void updateTime()
{
  int trys = 0;

  while(!getTimeAndDate() && trys < 10){
    trys++;
  }

  if(trys < 10){
    portOne.println("NTP Server Update Success");
  } 
  else {
    portOne.println("NTP Server Update Failed");
  }
}

// get timestamp from NTP server
int getTimeAndDate() 
{
  int flag = 0;
  Udp.begin(localPort);

  // send an NTP packet to a time server
  sendNTPpacket(timeServer);

  delay(1000);
  if (Udp.parsePacket()){
    // We've received a packet, read the data from it

      // read the packet into the buffer
    Udp.read(packetBuffer,NTP_PACKET_SIZE); 

    //the timestamp starts at byte 40 of the received packet and is 
    // four bytes,or two words, long. First, extract the two words:
    unsigned long highWord, lowWord, epoch;
    highWord = word(packetBuffer[40], packetBuffer[41]);
    lowWord = word(packetBuffer[42], packetBuffer[43]);  

    // combine the four bytes (two words) into a long integer
    // this is NTP time (seconds since Jan 1 1900):
    epoch = highWord << 16 | lowWord;

    // now convert NTP time into Unix time:
    // Unix time starts on Jan 1 1970. In seconds, that's 2208988800:
    epoch = epoch - 2208988800 + timeZoneOffset;
    flag=1;
    setTime(epoch);
    ntpLastUpdate = now();
  }
  return flag;
}

// send an NTP request to the time server at the given address 
void sendNTPpacket(IPAddress& address)
{
  // set all bytes in the buffer to 0
  memset(packetBuffer, 0, NTP_PACKET_SIZE); 
  // Initialize values needed to form NTP request
  // (see URL above for details on the packets)
  packetBuffer[0] = 0b11100011;   // LI, Version, Mode
  packetBuffer[1] = 0;     // Stratum, or type of clock
  packetBuffer[2] = 6;     // Polling Interval
  packetBuffer[3] = 0xEC;  // Peer Clock Precision
  // 8 bytes of zero for Root Delay & Root Dispersion
  packetBuffer[12]  = 49; 
  packetBuffer[13]  = 0x4E;
  packetBuffer[14]  = 49;
  packetBuffer[15]  = 52;

  // all NTP fields have been given values, now
  // you can send a packet requesting a timestamp: 		   
  Udp.beginPacket(address, 123); //NTP requests are to port 123
  Udp.write(packetBuffer,NTP_PACKET_SIZE);
  Udp.endPacket(); 
}

void readData()
{
  // read any available packet data
  xbee.readPacket();

  // check if packet was received
  if (xbee.getResponse().isAvailable()) {
    if (xbee.getResponse().getApiId() == ZB_RX_RESPONSE) {
      // fill our zb rx class
      xbee.getResponse().getZBRxResponse(rx);

      // read payload into data array, never writing past the end of data[]
      for (int i = 0; i < rx.getDataLength() && i < (int)sizeof(data); i++) {
        data[i] = rx.getData(i);
      }

      // build query from packet data
      buildQuery();
    } 
  }
}

void buildQuery()
{
  // build query string
  query="";

  // add timestamp to query
  query += "?timestamp=";
  String ts = timeStamp();  
  query += ts;

  // where did this packet originate from?
  // get remote sending xbee 64 bit address
  String id = get64Add();

  // add address to query
  query += "&serial=";
  query += id;

  // add tx status flag to query
  query += "&txflag=";
  query += data[0];

  // add doors to query
  query += "&door1=";
  query += data[1];
  query += "&door2=";
  query += data[2];

  // add sensors to query
  query += "&sensor=";

  int j = 3; // starts after tx flag and doors

  for (int i = 0; i < numSensorDefined; i++) {

    // convert temps from byte array back to floats
    for (int k = 0; k < 4; k++){
      u.b[k] = data[j + k];
    }
    j = j + 4; // advance to the next sensor's 4 bytes

    // convert float to string, adding
    // a comma between sensor values
    float f = u.f;
    char t[7];
    dtostrf(f, 1,1,t);
    query += t;

    // only add comma if more values to come
    if (i != numSensorDefined - 1) {
      query += "%2C";
    }

  }

  // post data to website
  postData();  
}

void postData()
{
  if (client.connect(serverName,80)) {
    portOne.print("Successfully connected to ");
    portOne.println(serverName);
    portOne.println();

    // send POST request to website
    client.print("POST /logger/remoteAdd");
    client.print(query);
    client.println(" HTTP/1.1");
    client.println("Host: www.domain.com");
    client.println();

    // prints POST request out for debugging
    portOne.print("POST /logger/remoteAdd");
    portOne.print(query);
    portOne.println(" HTTP/1.1");
    portOne.println("Host: www.domain.com");
    portOne.println();
  }

  if (client.connected()) {
    portOne.print("Disconnecting from ");
    portOne.println(serverName);
    client.stop();
  }
}

// create string timestamp
String timeStamp()
{

  String ts, y, n, d, h, m, s;

  y = String(year());
  n = padDigits(month());
  d = padDigits(day());
  h = padDigits(hour());
  m = padDigits(minute());
  s = padDigits(second());

  ts = "";
  ts += y;
  ts += "-";
  ts += n;
  ts += "-";
  ts += d;
  ts += "+";
  ts += h;
  ts += "%3A";
  ts += m;
  ts += "%3A";
  ts += s;

  return ts;
}

// pad with "0" if needed
String padDigits(int digits)
{
  String d = "";
  if(digits < 10) {
    d += "0";
  }
  d += String(digits);
  return d;  
}

// get unique 64bit remote xbee address
String get64Add() {

  String remoteAddress;
  remoteAddress = String(rx.getRemoteAddress64().getMsb(), HEX);
  remoteAddress += String(rx.getRemoteAddress64().getLsb(), HEX); 

  // we only need address low
  //remoteAddress = String(rx.getRemoteAddress64().getLsb(), HEX); 

  return remoteAddress;

}

void print_ip(){
  for (byte thisByte = 0; thisByte < 4; thisByte++) {
    // print the value of each byte of the IP address:
    portOne.print(Ethernet.localIP()[thisByte], DEC);
    portOne.print(".");
  }
  portOne.println();
}

void checkDHCP() {

  /*
   returns:
   0: nothing happened
   1: renew failed
   2: renew success
   3: rebind fail
   4: rebind success
   */
  int status = Ethernet.maintain();

  if(status){
    portOne.println("A renew or rebind event happened while maintaining the dhcp lease");
    portOne.print(status, DEC);
    portOne.println(" was the returned status");
    portOne.print("IP address after renew/rebind:");
    print_ip();
  }
}

void startUp() {

  // build query string
  query="";

  // add timestamp to query
  query += "?timestamp=";
  String ts = timeStamp();  
  query += ts;

  // add address to query
  query += "&serial=";
  query += add64;

  // add tx status flag to query
  query += "&txflag=";
  query += data[0];

  // add doors to query
  query += "&door1=";
  query += data[1];
  query += "&door2=";
  query += data[2];

  // add sensors to query data[3 - 43]
  query += "&sensor=";

  // post data to website
  postData();
}

SoftwareSerial jumps out at me right away. It can't send and receive at the same time and has interrupts disabled during tx/rx. At 9600 baud or less, Timer0 interrupts will be missed and cause issues with delay(), millis() and perhaps micros(). On top of that, any characters received while sending will be lost. Could any of this explain your mysterious hangs?

If you only need one additional serial port, I recommend you use AltSoftSerial. It saved the day for me, but it will take over a timer (Timer1 on the Uno, I believe), disabling PWM on the associated pin. It has hard-coded pins, but they happen to be 8 and 9, which are the ones you are using.

http://www.pjrc.com/teensy/td_libs_AltSoftSerial.html

It might make sense actually, now that I think of it, when I had CuteCom running to look at the serial messages the hanging seemed to happen more often and quicker.

At the end of the day I only need SoftwareSerial to print info and errors. In a live environment it is not needed, as the controller runs in the background and has no user interface; it is just a conduit to get the data to the website. All logging, data manipulation, etc. is done on the website.

I will try disabling SoftwareSerial and commenting out all print statements and see what happens. If that helps, I will then try AltSoftSerial.

Cheers :slight_smile:

Hmmm, now that I think about it some more, I don't think that is it. SoftwareSerial is on pins 8/9 and is TX only; nothing is being sent to the micro via this port.
The XBee is set up on the default hardware serial port (pins 0/1), so I think I am correct in saying there should be no conflict there?

It's good that you are only sending, since you won't experience data loss on that port. Still, since you are sending at 9600 baud, it keeps interrupts disabled for the duration of each byte being sent. This would delay interrupt routines for more than 1 ms (an eternally long time for an ISR to wait). You would be better off increasing the baud rate on the software serial port: the faster the baud rate, the less time it takes to send a byte and get on with things.
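To put rough numbers on that, here is a small sketch of the arithmetic. It assumes the usual 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bit times per byte), and the function name `byteTimeMicros` is made up for illustration. It is plain C++ and does not call into SoftwareSerial at all.

```cpp
// Time on the wire for one byte, in microseconds.
// Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop = 10 bit times.
double byteTimeMicros(long baud) {
    const double bitsPerByte = 10.0;
    return bitsPerByte * 1000000.0 / static_cast<double>(baud);
}
```

Under those assumptions a byte takes about 1042 us at 9600 baud and about 87 us at 115200 baud. On a 16 MHz Uno the Timer0 overflow behind millis() fires roughly every 1024 us, so at 9600 baud a single byte already blocks interrupts for longer than one timer tick.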

Sorry, my last post was based on ignorance and lack of Arduino experience. Reading the notes on the AltSoftSerial website, it would appear that I should use it rather than the default SoftwareSerial library, correct?

JeffS:
Sorry, my last post was based on ignorance and lack of Arduino experience. Reading the notes on the AltSoftSerial website, it would appear that I should use it rather than the default SoftwareSerial library, correct?

I honestly believe so. I've only been messing with the arduino for about a month, but I've been tinkering with micros for a very long time. If you can use it (only one port), then AltSoftSerial is much better from a technical standpoint.

Sorry, I am being a bit thick here. When you say if I can get away with using only one port, do you mean:
Don't use the hardware serial port 0/1, and use AltSoftSerial on 8/9 for communication with both the XBee and the serial port?

No, I'm sorry. Always use the hardware ports first. SoftwareSerial allows you to create multiple instances for multiple ports. AltSoftSerial is pretty well tied to the hardware (since it uses hardware features to operate, such as interrupts and the output compare facility to set pins at the right time during transmit), so it only creates one port (for TX and RX, of course).

Well, it has been running now for almost 24 hours without a hiccup, so it looks like I might have solved the problem.
Just to confirm what I have done, for anyone using SoftwareSerial with a similar problem of code hanging
after a number of hours:

I increased the baud rate on the SoftwareSerial port from 9600 to 115200. It was also suggested that I use the AltSoftSerial
library instead; I have not done so yet, but will if the problem manifests again.

Thanks to @afremont for posting the solution

It ran for about 3 days then hung again....

I have removed all print calls and removed the SoftwareSerial lib to see if that prevents the hangs.
I only need the print calls when setting up and debugging, it is not needed when running live.

I tried the AltSoftSerial lib but could not get it to work properly, the calls to client.print() that posts
data to website contained half garbled messages, the query part should be something like:

"?timestamp=2012-10-23+15%3A10%3A13&serial=13c20032465765&sensor=-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0"

Instead, the first part of the query is OK and half the sensor readings are gibberish.

"?timestamp=2012-10-23+15%3A10%3A13&serial=13c20032465765&sensor=-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C\B4\00\00\00\\B4\00\00\00\B4\00\00\00"

I think I have a flaw in my code that might be causing this. If removing the serial library and print calls
does not fix the hangs, then I have to assume that is the case.

Any ideas on what I might be doing wrong with AltSoftSerial would be appreciated.

I don't think it's AltSoftSerial if you downloaded it from pjrc.com. That looks like a 32 bit int to me. Is the number 180 meaningful? I'd say it's quite possible that your stack and heap have met each other and you are out of RAM.

Sorry, that was just me writing the type of rubbish that was there, not the actual output. Would the actual gibberish help you say what might be happening?
If so, I will run the AltSoftSerial lib again and copy the exact output.

It hung again after only a few hours, and that was with no Software Serial library included, all calls to serial.print removed.

So I have something else wrong, I just have no idea on how to fault find it.....

Yes library from pjrc.com

You might want to dig around for some of the methods to determine how much free RAM you have left. Removing the serial library should have freed some up, so maybe look harder at the stack possibly running into or over the heap. It's also always possible to have a pointer running amok somewhere.
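One widely circulated way to do that on AVR boards is sketched below. Treat it as an assumption-laden example rather than an official API: `__heap_start` and `__brkval` are avr-libc internals, the name `freeRam` is just the conventional one, and the non-AVR branch is a stand-in I added so the snippet still compiles on a PC.

```cpp
// Rough free-RAM estimate: the gap between the top of the heap and the
// current stack position. Only meaningful on AVR boards such as the Uno.
#if defined(__AVR__)
int freeRam() {
    extern int __heap_start, *__brkval;  // avr-libc linker/allocator symbols
    int v;  // a local, so its address is close to the current stack pointer
    return (int)&v - (__brkval == 0 ? (int)&__heap_start : (int)__brkval);
}
#else
// Host fallback (assumption, for off-target builds): -1 means "unknown".
int freeRam() {
    return -1;
}
#endif
```

Printing the result at a few points in loop(), and especially inside the deepest function calls, should show whether the number keeps shrinking or dips close to zero.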

If you find that you should have plenty of RAM, you could define an array and initialize it to some specific value other than 0 at runtime. Then integrity check it at the start of every loop() to look for damage. If you have spare pins, use LEDs to try and determine what it was doing or where it was before it hung.
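A minimal sketch of that integrity check follows. The names `guard`, `guardInit` and `guardIntact`, the 16-byte size and the 0xA5 pattern are all arbitrary choices for illustration, not anything taken from your code.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical guard block, filled with a known non-zero pattern at start-up.
const uint8_t GUARD_PATTERN = 0xA5;
const size_t GUARD_SIZE = 16;
uint8_t guard[GUARD_SIZE];

// Call once from setup().
void guardInit() {
    memset(guard, GUARD_PATTERN, GUARD_SIZE);
}

// Call at the top of every loop(); returns false if any byte was overwritten.
bool guardIntact() {
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (guard[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

If guardIntact() ever returns false, something has written outside its own memory, and lighting an LED at that point narrows down when it happened. A global like this only catches damage that happens to land on the array, so a clean result does not prove the rest of RAM is fine.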

You could buy a Dragon debugger. :wink:

As to pointers in my code, I don't use them, as I have not yet mastered them.
Yes, I have been looking at code to see what is happening with available RAM.

I came across the F() macro, which is used to help with memory issues. That might let me keep using the print functions,
but I still need to fix the underlying code problem.

Serial.println("This string is in SRAM and FLASH");
Serial.println(F("This string is not in SRAM, just FLASH."));

Thanks for the pointers :slight_smile: will play some and see what I can come up with

Just in case you were interested, the actual gibberish that was being printed was this:

POST /logger/remoteAdd?timestamp=2013-03-18+10%3A21%3A44&seri\0x08 1\0x00\0x00\0x00\0x08\0x02\0x00A 1\0x00 1\0x006\0x00\0x04\0x05\0xe4\0x08\0x9e w\0x00\0x04\0x00\0x00\0x01\0xb6\0x08\0x1e\0xf7 \0x92\0x00\0x1ex\0x08\0xb6\0x00 HTTP/1.1

Should be:

POST /logger/remoteAdd?timestamp=2012-10-23+15%3A10%3A13&serial=13c20032465765&sensor=-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0%2C-127.0 HTTP/1.1

I would put my money on this:

// query string
String query;

The String data type has been unreliable, and causes weird crashes. I use a character array instead.

Maybe I'm wrong, but I think the XBee library works in API mode, which is not the default mode of the XBee modules. If you want to use them in API mode (in your case you are using Series 2 modules), you have to install the API firmware on the XBee modules. Alternatively, the simplest way is to use the XBees in transparent mode, like a normal serial transmitter/receiver.

@Conguito
Thanks, the XBees are in API mode and that part of the code works great. It's off-the-shelf code, so I don't think there is much I can do to make it work any better.

@SurferTim
Thanks, you have pretty much hit the nail on the head. In my very limited-knowledge way, I have narrowed it down to running out of memory.
I think I proved this by:

a. Removing the SoftwareSerial lib and all serial.print calls: the code runs without hanging.
b. Running available-RAM check calls at different points: it went as low as 124 at one point.
Then, after putting all the serial stuff back:
c. Putting all serial.print fixed strings into flash using F(): the lowest was 564.

It is now running with all my original serial stuff plus additional RAM checks and serial prints, and it appears to be running OK: 12 hours and counting.

I think you are right about the query string. I suspect that a lot of the code I wrote to assemble that string from the XBee data, timestamp, etc. is very inefficient.
I just do not quite know how to refactor it. This is my first C project, so I have a lot to learn about the basics and structure of the language.

Any pointers on converting my query string to a character array?

Afterthought, I thought text strings in C were character arrays?