Arduino randomly stops working after a few days

The Problem:

Arduino Mega 2560 randomly stops sending information to my server after 1-3 days of functioning

The Situation:

  • Board is Arduino Mega 2560 w/ Ethernet Shield
  • This Arduino has 6x DHT-22 sensors which gather temperature and humidity data. The data is compiled into a JSON payload and sent over MQTT (using Ethernet) to my Node.js server (which stores it in a database, displays it in Grafana, etc.)
  • The Arduino is currently being powered by an AC to DC 9V 1A plug
  • Operating in temperatures around 24-30 degrees Celsius
  • Pressing the "reset" button on the board will make it reconnect and return to normal without having to unpower the board. So I can just toggle reset every day but that isn't a viable long-term solution.

What I've tried:

Hardware

  • I've tried 3 different Arduino Mega 2560 boards
  • I've tried a different power supply (current one is specified above in "The Situation", and the first one overheated the 1st Arduino board after about a month of use... lol)
  • Thought it could be the network cutting out, but I've tried directly unplugging the Ethernet from the board, waiting an hour, and reconnecting it. And it will reconnect to the server fine.

Software

  • I thought it was a memory issue/leak, but testing shows that's not the case. The stackAvailable() function (defined near the end of the code below) returns how many bytes are free in memory, and I've been including that in the payload JSON to log it to the DB/Grafana. The amount of memory available stays consistent all the way until it stops sending data.
  • Replaced all Strings with char[] (did this over a month ago)
  • Tried both strcpy vs strncpy (scraping the bottom of the barrel for solutions)
  • Bit unrelated, but the ArduinoMqttClient library has a limit to the payload size (it cuts off anything exceeding it), to fix that you must directly change the source code of the library (I did this a while ago, so I've since forgotten where/how exactly)

Schematic

Code

// Program Libraries
#include <SPI.h>                // Serial Peripheral Interface
#include <Ethernet.h>           // Ethernet
#include <ArduinoJson.h>        // JSON
#include <ArduinoMqttClient.h>  // MQTT

// Sensor Libraries
#include <DHT.h>  // DHT Sensors

bool debug = true;    // Print debug logs
bool readings = true; // Print data logs

//
// PROGRAM VARIABLES
//

int loopDelay = 3000;            // MUST BE GREATER THAN 2000 (2 seconds) - Due to limitation of sensors
char payload[500] = {};          // JSON Payload variable
char recievedMessage[500] = "";  // Inbound MQTT Payload variable

//
// NETWORK VARIABLES
//

byte mac[] = { 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF };  // This Device's Mac Address

IPAddress server(192, 168, 1, X);  // Server's IP Address
int port = 1883;                   // Server's MQTT Port
IPAddress ip(192, 168, 1, X);      // This Device's IP Address
IPAddress gateway(127, 0, 0, 1);   // Default Gateway

EthernetClient eclient;            // Ethernet Client
MqttClient client(eclient);        // MQTT Client using the EthernetClient, 256byte payload size limit
char mqttAuthUser[] = "USERNAME";  // MQTT USER
char mqttAuthPass[] = "PASSWORD";  // MQTT PASSWORD
char listenTopic[] = "topic/subtopic";
char postTopic[] = "topic/subtopic";
char deviceID[] = "DEVICE_ID";

//
// SENSOR VARIABLES
//

// TEMPERATURE+HUMIDITY SENSORS
#define PIN_DHT_F1 8
#define PIN_DHT_F2 3
#define PIN_DHT_F3 2
#define PIN_DHT_B1 7
#define PIN_DHT_B2 6
#define PIN_DHT_B3 5
#define DHT_TYPE DHT22
DHT dhtF1(PIN_DHT_F1, DHT_TYPE);  // Front-1 Temperature Sensor (top)
DHT dhtF2(PIN_DHT_F2, DHT_TYPE);  // Front-2 Temperature Sensor (middle)
DHT dhtF3(PIN_DHT_F3, DHT_TYPE);  // Front-3 Temperature Sensor (bottom)
DHT dhtB1(PIN_DHT_B1, DHT_TYPE);  // Back-1  Temperature Sensor (top)
DHT dhtB2(PIN_DHT_B2, DHT_TYPE);  // Back-2  Temperature Sensor (middle)
DHT dhtB3(PIN_DHT_B3, DHT_TYPE);  // Back-3  Temperature Sensor (bottom)

// Sensors: dhtF1, dhtF2, dhtF3
float lastFrontHumidity[3] = { 999.999, 999.999, 999.999 };
float lastFrontTemp[3] = { 999.999, 999.999, 999.999 };
// NOTE: 999.999 is used as a "filter" on the server's backend which are ignored  
// Sensors: dhtB1, dhtB2, dhtB3
float lastBackHumidity[3] = { 999.999, 999.999, 999.999 };
float lastBackTemp[3] = { 999.999, 999.999, 999.999 };


void SetupNetwork() {
  Ethernet.begin(mac, ip, gateway);  // Begin Ethernet connection (NOTE: the 3-argument overload treats the third argument as the DNS server, not the gateway)
  Ethernet.setDnsServerIP(ip);       // Set DNS server IP (here: this Device's own IP)
  client.setClient(eclient);         // Set MQTT Client as the EthernetClient
  // client.setUsernamePassword(mqttAuthUser, mqttAuthPass);  // Set Username & Password for MQTT Connection
  client.onMessage(NewMqttMessage);  // MQTT Messages are passed through the NewMqttMessage Function
  client.connect(server, port);      // Connect to MQTT Broker
  client.subscribe(listenTopic);     // Set MQTT subscribed topic

  delay(1000);

  if (debug) {
    // Display this Device's IP Address
    Serial.print("\nArduino's IP --> ");
    Serial.println(Ethernet.localIP());
    Serial.println();

    // Display this Device is attempting to connect to the MQTT Server & Port
    Serial.print("connecting to ");
    Serial.print(server);
    Serial.print(":");
    Serial.print(port);
    Serial.println("...");
  }

  // If MQTT Server is connected, print the Server IP and TOPIC it is set to publish to
  if (client.connect(server, port)) {
    Serial.print("connected to ");
    Serial.print(eclient.remoteIP());
    Serial.print(":");
    Serial.println(port);

    Serial.print("Publishing to: ");
    Serial.println(postTopic);

    Serial.print("Subscribing to: ");
    Serial.println(listenTopic);
    Serial.println('\n');
  }

  else {
    Serial.print("MQTT connection failed! Error code = ");
    Serial.println(client.connectError());
  }
}

void MaintainNetwork() {
  Ethernet.maintain();
  if (!client.connected())  // Checks if this Device is not connected to Server
  {
    if (debug)
      Serial.println("---Disconnected! Attempting to Reconnect---");
    SetupNetwork();
  }
}


// Incoming MQTT Message Handler
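void NewMqttMessage(int messageSize) {
  Serial.println("NEW MESSAGE!");
  size_t len = 0;  // Number of characters stored so far

  // Append one byte at a time, leaving room for the terminating '\0'.
  // (recievedMessage + (char)client.read() is pointer arithmetic, not string concatenation.)
  while (client.available()) {
    char c = (char)client.read();
    if (len < sizeof(recievedMessage) - 1) {
      recievedMessage[len++] = c;
    }
  }
  recievedMessage[len] = '\0';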

  if (debug) {
    Serial.println("");
    Serial.print("Received a message with topic '");
    Serial.print(client.messageTopic());
    Serial.print("', length ");
    Serial.print(messageSize);
    Serial.println(" bytes:");
    Serial.println(recievedMessage);
    Serial.println();
  }

  //MessageHandler(recievedMessage); // Function does not exist yet - code just here for when I implement it in the future
}


void SetupTemperatureSensors() {
  dhtF1.begin();
  dhtF2.begin();
  dhtF3.begin();
  dhtB1.begin();
  dhtB2.begin();
  dhtB3.begin();
}

void CheckDHT(DHT &sen, int i) {    // Pass by reference so the sensor object is not copied on every call
  float h = sen.readHumidity();     // Get humidity %
  float t = sen.readTemperature();  // Get temperature (Celsius)
  delay(100);
  // IF h AND t BOTH HAVE VALUE:
  if (!isnan(h) && !isnan(t)) {
    if (i >= 3) {  // if index >= 3 then it is a back sensor
      i -= 3;
      lastBackHumidity[i] = h;
      lastBackTemp[i] = t;
    } else {  // if not, use normal index as its a front sensor
      lastFrontHumidity[i] = h;
      lastFrontTemp[i] = t;
    }
  } else { // Yes im aware i can remove the else statement and make t and h just = 999.999 if invalid, but... "dont fix what isnt broken"
    if (i >= 3) {  // if index >= 3 then it is a back sensor
      i -= 3;
      lastBackHumidity[i] = 999.999;
      lastBackTemp[i] = 999.999;
    } else {  // if not, use normal index as its a front sensor
      lastFrontHumidity[i] = 999.999;
      lastFrontTemp[i] = 999.999;
    }
  }
}

void UpdateTemperatureHumidityData() {
  for (int dhtSenIndex = 0; dhtSenIndex < 6; dhtSenIndex++) { // Loop and update all DHT sensors 
    switch (dhtSenIndex) {
      case 0:
        CheckDHT(dhtF1, dhtSenIndex);
        break;
      case 1:
        CheckDHT(dhtF2, dhtSenIndex);
        break;
      case 2:
        CheckDHT(dhtF3, dhtSenIndex);
        break;
      case 3:
        CheckDHT(dhtB1, dhtSenIndex);
        break;
      case 4:
        CheckDHT(dhtB2, dhtSenIndex);
        break;
      case 5:
        CheckDHT(dhtB3, dhtSenIndex);
        break;
      default:
        Serial.println("Function UpdateTemperatureHumidityData() switch statement hit default somehow?!");
        break;
    }
  }
}

void SetupSensors() {
  SetupTemperatureSensors();
  // More functions would be here for other sensors (not implemented yet)
}

void UpdateSensorData() {
  UpdateTemperatureHumidityData();
  // More functions would be here for other sensors (not implemented yet)
}

void CompilePayload() {
  /*strcpy*/ strncpy(payload, "", 500);  // Not sure what is better to use (strcpy vs strncpy)

  JsonDocument doc;
  doc["type"] = "new_data";
  doc["device"] = deviceID;
  JsonObject data = doc.createNestedObject("data");
  JsonObject temperature_sensors = data.createNestedObject("temperature_sensors");  //dht
  JsonObject humidity_sensors = data.createNestedObject("humiditiy_sensors");       //dht
  if (debug) { doc["free_ram"] = stackAvailable(); }                                // Used for debugging (shows remaining memory)

  // TEMPERATURE
  temperature_sensors["F1"] = lastFrontTemp[0];
  temperature_sensors["F2"] = lastFrontTemp[1];
  temperature_sensors["F3"] = lastFrontTemp[2];
  temperature_sensors["B1"] = lastBackTemp[0];
  temperature_sensors["B2"] = lastBackTemp[1];
  temperature_sensors["B3"] = lastBackTemp[2];

  // HUMIDITY
  humidity_sensors["F1"] = lastFrontHumidity[0];
  humidity_sensors["F2"] = lastFrontHumidity[1];
  humidity_sensors["F3"] = lastFrontHumidity[2];
  humidity_sensors["B1"] = lastBackHumidity[0];
  humidity_sensors["B2"] = lastBackHumidity[1];
  humidity_sensors["B3"] = lastBackHumidity[2];

  serializeJson(doc, payload);  // Set JSON data to payload variable

  if (readings) {
    Serial.println(payload);
  }
}

void SendData() {
  client.subscribe(postTopic);  // Subscribe to the PostTopic

  // Send the payload to the Server
  client.beginMessage(postTopic);
  client.print(payload);
  client.print("");
  client.endMessage();
  client.subscribe(listenTopic);
}

unsigned int stackAvailable()  // FOR DEBUGGING - TESTING RETURNS 7948 (in empty sketch) THIS PROGRAM RETURNS ~5219 bytes
{
  extern int __heap_start, *__brkval;
  unsigned int v;
  return (unsigned int)&v - (__brkval == 0 ? (unsigned int)&__heap_start : (unsigned int)__brkval);
}

void setup() {
  Serial.begin(9600);
  if (debug) readings = true;
  SetupNetwork();
  SetupSensors();
}


void loop() {
  // The loop waits loopDelay ms between passes (the sensors need at least 2 seconds between reads)
  // During each loop all data will be collected, compiled into a JSON, and sent over MQTT
  MaintainNetwork();
  UpdateSensorData();
  CompilePayload();
  SendData();
  delay(loopDelay);
}

I've been trying to solve this problem for around 2 months now and I can't figure out what is causing it to stop sending data: whether it is something with the network or MQTT, the Arduino board, or the code.
This is my first post on the forums, but I've gone through here enough by now that I know what you guys like to see (ALL the code, schematics, etc) so hopefully I've captured all the information you need.

It's a bit of a big post so a massive thanks to those willing to help :slight_smile: :folded_hands:


Powering the Mega with 9 volt and then using its 5 volt converter as a powersupply is basically wrong.
The Ethernet shield plus the 6 sensors are suspected of overloading the onboard 5 volt converter.
The breadboard in the picture tells nothing.
Your third hardware test indicates a failure outside the controller. Else it would not restart.

Try installing test serial.print at strategic places to find out where the execution gets stuck.

Hi, @tc-rtd
Welcome to the forum.

Do you have a DMM? Digital MultiMeter?

Thanks.. Tom.... :smiley: :+1: :coffee: :australia:

Powering the Mega with 9 volt and then using its 5 volt converter as a powersupply is basically wrong.

I assumed this could be a problem, would you recommend using a different powersupply or instead using an alternate powersource to power the sensors, or both?

The breadboard in the picture tells nothing

What information are you looking to find from it? It's a simple circuit (which adds to the frustration of it not working)

Your third hardware test indicates a failure outside the controller. Else it would not restart.

Are you suggesting that it could be either the network or MQTT broker that is causing the arduino to freeze?

Try installing test serial.print at strategic places to find out where the execution gets stuck.

The arduino works as expected (until freezing), the code I sent is a cleaned up version (without the 300 print statements i had :smile:). Using my laptop I can monitor the serial output but how would I do this without my laptop connected? Is there a better way other than just sending my "print statements" over mqtt and using the server to log that? Reconnecting my laptop after the arduino stops working will "reboot" the arduino as it will start working again.

Thanks for the welcome fellow aussie :slight_smile:
Yes I do have a DMM... somewhere, I'll have to look around for it :laughing:

If simply plugging the laptop into the arduino fixes the problem that suggests that it's power related.

What I dont understand with it being a power problem, is that it still works as intended for days before randomly stopping. What in regards to power could be causing it to suddenly stop? There's no reason for any spikes of power consumption in the system so I can't imagine why suddenly the power exceeds/undermines the board's capacity. I'm not saying it isn't power related (I think it could be), but I don't know if its over or under power capacity (as its inputting 9 volts, and the Mega2560 requires 7-12 volts). Do I need to power the DHT22 sensors separately, or change power supply? From my understanding it is within specifications.

There is more to battery specifications than volts.

How many amps/milliamps does your circuit use? How does that compare to your battery's discharge rate? And what happens if you use too much?

I'm not using a battery, rather just a 9V power supply plug, you are right that amps are an important factor as well.

  • The power supply provides 1 A
  • The mega's chip from my understanding is rated for 200mA
  • The arduino supports 40mA max on a single pin
  • The DHT22 sensor requires at max 2.5mA (while requesting data)

The program checks and stores the temp/humidity data sequentially, so only 1 sensor should be checked at a time.

Adding all this up, we're still well under the halfway point of the available current from the power supply.
Noted; I am not the best when it comes to hardware and circuitry (im much more of a programmer than electrical engineer) and I probably lack some fundamental knowledge of power. So if I am mistaken, please do correct me :slight_smile:

Hi,
Can you measure the current to the Mega and the input voltage?

This way we can determine the power loss in the onboard 5V linear regulator.

Thanks.. Tom.... :smiley: :+1: :coffee: :australia:

Sorry for the late reply, my knowledge with DMMs is very limited, and I don't know how to safely (without destroying the DMM or the circuit/components) test the voltage and current.
With some research and testing on a spare board, my understanding is to have one of the pins touching a GND whilst the other to a power? I tested this in the image below


The DMM read 8.8V and 0.3A from the plug socket. (I connected the pins to where the two arrows are)

Is this the correct way to measure current/voltage into the board? If so I'll give it a test on the proper circuit.

Note: this is using the same power supply as the main circuit, just a different board (with another ethernet shield, however its not being used so idk if its drawing any extra power + no sensors), so if the results here still answer your question, awesome :slight_smile:

That's the correct way to measure the voltage. You could also have used either of the other two solder pads under the barrel socket as ground.

But you cannot measure current this way. I'm not sure what that 0.3A reading you got means.

To measure current, you have to get the current to be measured to pass through the multimeter. So you need to break the circuit at some point and use the multimeter probes to complete the circuit again. Not sure how to do this with your PSU.

If you set your multimeter on current mode and connect the probes to the places you indicated, you would cause a short-circuit and I would expect a much higher reading than 0.3A

You seem to be concerned with the power (current) consumption of the Mega and the sensors. But you seem to be ignoring the Ethernet shield. That probably consumes more current than all the others combined, possibly around 150mA.

Does the Ethernet shield have a barrel socket of its own? If not, does it have its own regulator which feeds off the Mega's barrel socket?

The chip on the Ethernet shield runs at 3.3V and will probably have its own 3.3V regulator on-board. If it takes 9V from its own or the Mega's barrel socket, that regulator will need to dissipate (9-3.3)*0.150 = 0.855 Watts as heat. That could cause overheating problems, especially if there is restricted airflow.

You could consider replacing your PSU for a 7.5V model.

Can’t help personally, but in view of your comment I wonder if there are any clues within the 32 issues outstanding in the GitHub repository for that library?

How much variation is there in the time before it shuts down? Does it ever happen within an hour or two of starting up or being reset, or is it always at least a few days? Is the time before failure always exactly the same, or nearly so? Does it ever run for a very long time before failing, like a week or so?

It seems to me that if you hit reset after it bogs down, and it then continues to run for days, it's probably not a power problem. Does the regulator ever get hot?

I just wonder if you are running up against some kind of automatic time out on MQTT or something, and your code is just waiting for something that's never going to happen. Maybe it would make sense to set up the watchdog timer. Then it would at least reset quickly when it dies.
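
Setting up the watchdog timer as suggested could look roughly like this on an AVR board such as the Mega 2560. It is a minimal sketch, assuming avr-libc's `<avr/wdt.h>`. `armWatchdog` and `kickWatchdog` are hypothetical wrapper names, and the `#else` branch only provides host stand-ins so the logic compiles off-device.

```cpp
#if defined(__AVR__)
#include <avr/wdt.h>  // wdt_enable(), wdt_reset(), WDTO_8S
#else
// Host stand-ins (hypothetical) so the logic can be compiled off-device.
#define WDTO_8S 9
static bool watchdogArmed = false;
static unsigned long watchdogKicks = 0;
inline void wdt_enable(int /*timeout*/) { watchdogArmed = true; }
inline void wdt_reset() { ++watchdogKicks; }
#endif

// Call once at the end of setup(): arm the watchdog with an 8 second timeout.
void armWatchdog() {
  wdt_enable(WDTO_8S);
}

// Call once at the top of every loop() pass. If anything later in the
// pass blocks for longer than the timeout, the timer expires and the
// board resets itself.
void kickWatchdog() {
  wdt_reset();
}
```

In the sketch from the first post, `armWatchdog()` would go at the end of `setup()` and `kickWatchdog()` at the top of `loop()`. One pass of that loop (sensor reads plus the 3 second delay) has to finish inside the timeout, otherwise the board would reset during normal operation.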

The arduino usually stays connected for around 24-48 hours, sometimes more, sometimes less. There's no pattern to it; from what I can tell it is genuinely random. On the old power supply the board would get very hot (it literally burnt and stopped working lol) - but with the current 9V one it operates fine and isn't getting overheated. I remember in the past my system would automatically disconnect after exactly 8 hours due to a software timeout (of MQTT broker or SQL but i forgot which one it was), but i've long since changed the timeout in the linux machine's configs for it and fixed that.

I have a "theory" that the MQTT broker on the server gets hung for some reason, and in return the arduino gets stuck as well. Might be worth implementing a check to see if the MQTT client is connected before sending data? I'm really running out of ideas :frowning:

Haven't looked much into watchdog timer, will need to take a look at it next week. I have been considering using the RESET pin to just reset the board after 24 hours, but that's not a proper fix to whatever this problem is.

I wont be able to look at it until mid-next week (am away), but yeah the ethernet shield pulls its power from the mega. However neither the mega or the ethernet shield are overheating (from what i can tell), even after hours of usage.

So you got a damaged board (Mega?), but it's running fine on 9 V you say? What voltage did the old PSU output, according to the specs and according to your measurements?

No that board is fried (no longer works), I am using a different one now (sorry I should've made that clear).
From memory it was a 12V power supply.

I'm starting to think @ShermanP might be on the right path with it being a potential MQTT issue, as the random disconnects don't act like how I would expect a power issue to act. I might add some extra checks in my code to make sure MQTT is connected before sending data, but my code should be doing that every loop anyway...
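
A check like the one described could be sketched as below. This is only an illustration: `sendIfConnected` and `SendResult` are hypothetical names, and it assumes the client's `beginMessage()` and `endMessage()` return non-zero on success, which is my understanding of ArduinoMqttClient. It reports which step failed, but it would not by itself stop a call that blocks inside the library.

```cpp
#include <assert.h>

enum class SendResult { Sent, NotConnected, BeginFailed, EndFailed };

// Publishes only when the client reports a live connection, and reports
// which step failed. Client is any type offering connected(),
// beginMessage(), print() and endMessage().
template <typename Client>
SendResult sendIfConnected(Client &client, const char *topic, const char *payload) {
  assert(topic != nullptr && payload != nullptr);
  if (!client.connected()) return SendResult::NotConnected;
  if (!client.beginMessage(topic)) return SendResult::BeginFailed;
  client.print(payload);
  if (!client.endMessage()) return SendResult::EndFailed;
  return SendResult::Sent;
}
```

In `SendData()` this would be called as `sendIfConnected(client, postTopic, payload)`, and any result other than `SendResult::Sent` could be printed to Serial or used to trigger a reconnect.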

I think you might be right: ArduinoMqttClient - Issue 14
Will need to wait until im back to make changes, but hopefully its one of the solutions in here