MQTT locks up about once every 2 months

I have a sketch that reads the frequency of my off-grid electrical supply and then sends MQTT messages to turn appliances on and off. It also receives MQTT messages. Every so often the MQTT side seems to lock up, and it will neither receive nor transmit messages. I don't have anything connected to the serial port, so I can't see why this is happening. It has happened about 3 times in the last 6 months, and each time I have to power down the Arduino and restart it.

I want to add a bit of code that monitors the MQTT connection and reboots the Arduino if it sees a lock-up. I'm not sure how best to do it; I've searched around but haven't found anything suitable.

Any ideas?

Posting your current sketch would be a good start

I haven't posted it because it is over 2000 lines long, and I thought it would just complicate things.
I'm not looking for the error, as I don't think it's within the sketch, just a pointer to a way to get the Arduino to reboot.

The problem could be as simple as writing outside the bounds of an array, but as we don't know whether the sketch even uses arrays, or even which Arduino board it is running on, it is difficult to provide help.
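
In an MQTT sketch, a typical place for that kind of out-of-bounds write is a message callback that copies the incoming payload into a fixed-size buffer without checking its length. A minimal sketch of a bounded copy, in plain C++ so it can be tried on a PC; copyPayload is a made-up helper name for illustration, not part of PubSubClient:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy an MQTT payload into a fixed-size char buffer.
// Copies at most capacity - 1 bytes and always NUL-terminates the result,
// so an oversized message is truncated instead of overrunning the buffer.
// Returns the number of payload bytes actually copied.
size_t copyPayload(char *dest, size_t capacity,
                   const unsigned char *payload, size_t length) {
  if (capacity == 0) {
    return 0;  // nowhere to write, not even the terminator
  }
  size_t n = (length < capacity - 1) ? length : capacity - 1;
  std::memcpy(dest, payload, n);
  dest[n] = '\0';
  return n;
}
```

Because the result is NUL-terminated, it can be compared with strcmp() or converted with atoi() without building a String from the payload.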

I've cut the sketch down to give an idea of how it's set up. It's on a Mega. The sketch continues to run: it's still turning on and off the things that are hardwired to it, and the LCD screen still changes. It's just the MQTT that stops.

#include <Arduino.h>
#include <FreqMeasure.h>
#include <elapsedMillis.h>
#include <Wire.h>
#include <LiquidCrystal_I2C.h>
#include <SPI.h>
#include <Ethernet.h>
#include <PubSubClient.h>

// Update these with values suitable for your hardware/network.
byte mac[]    = {  0xDE, 0xED, 0xAB, 0xEF, 0xFE, 0xDE };
IPAddress ip(192, 168, 3, 183);
IPAddress server(192, 168, 3, 196);

EthernetClient ethClient;
PubSubClient client(ethClient);

long lastReconnectAttempt = 0;

// Set the pins on the I2C chip used for LCD connections:
//                    addr, en,rw,rs,d4,d5,d6,d7,bl,blpol
LiquidCrystal_I2C lcd(0x3F, 2, 1, 0, 4, 5, 6, 7, 3, POSITIVE);                        // Set the LCD I2C address
//LiquidCrystal_I2C lcd(0x27, 2, 1, 0, 4, 5, 6, 7, 3, POSITIVE);


const byte noOfTriacs = 15;                                                           // <-- total number of ssr's inc pwm
const byte noOfPWMTriacs = 6;                                                         // <-- total number of pwm triacs

float frequencyHigh = 50.35;                                                          // frequency at which next SSR will turn on. Note PWM starts 0.15 Hz below this.
float frequencyLow = 50.02;                                                           // frequency at which next SSR will turn off
float interTriacDelayOn = 1.5;                                                        // additional seconds before next higher SSR can come on (0.1 sec already in sketch)
float interTriacDelayOff = 2;                                                         // additional seconds before next lower SSR can go off (0.1 sec already in sketch)

//float freqCal = 0.045;                                                                // calibration value Hz Desk spare
float freqCal = 0.005;                                                                // calibration value Hz Utility

const byte pinForTrigger[noOfTriacs] = {40, 26, 27, 28, 29, 40, 40, 40, 30, 31, 40, 32, 33, 34, 35};    // none PWM SSR's (pin 40 is just to allocate a pin and not used)
const byte pwmPin[noOfPWMTriacs] = {12, 8, 44, 11, 7, 40};                                               // PWM SSR's (last one not used)

const byte outputModePoolSelectorPin = 22;                                            // digital
const byte outputModeGridSelectorPin = 23;                                            // digital

int pwmglobalPin = 24;                                                                //global on/off for pwm SSR's

String temp_str;
bool temp_strb;
char temp[50];


// Timers for timed loads

const long delayOn[noOfTriacs] = {0, 0, 60, 90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};                      // delay (seconds) for turning on loads 0,1,2,3 etc remember inc. all ssr's
const long delayOff[noOfTriacs] = {0, 0, 1800, 1800, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};                // delay (seconds) for turning off loads 0,1,2,3 etc remember inc. all ssr's


enum outputGridModes {CONNECTED, DISCONNECTED};
//
boolean runTimeSelectionOfGridstatusIsEnabled = true;
enum outputGridModes outputGridMode = DISCONNECTED;
//
double sum = 0;
int count = 0;

enum outputPoolModes {OFF, ON};
//
boolean runTimeSelectionOfPoolstatusIsEnabled = true;
enum outputPoolModes outputPoolMode = ON;
//
boolean GridMQQT = false;
//
boolean DHW = true;                                       // For selection by MQTT of Domestic hot water
boolean DHWBoost = false;                                 // For selection by MQTT of Domestic hot water boost
boolean DHWPriority = false;
boolean airCon = true;                                    // For selection by MQTT of Air Con
boolean poolOn1 = true;                                   // For selection by MQTT of Pool
boolean poolOn2 = true;                                   // For selection by MQTT of Pool
boolean poolOn3 = true;                                   // For selection by MQTT of Pool
boolean delayPool1 = false;                               // for pool1 timer run out increase load
boolean delayPool2 = false;                               // for pool2 timer run out increase load
boolean utility = true;                                   // For selection by MQTT of Utility heater
boolean study = true;                                     // For selection by MQTT of study heater
boolean DHW2 = true;                                      // For selection by MQTT of domestic hot water bottom
boolean landing = true;                                   // For selection by MQTT of landing heater
boolean kitchen = true;                                   // For selection by MQTT of kitchen heater
boolean living = true;                                    // For selection by MQTT of living room
boolean office = true;                                    // For selection by MQTT of office heater
boolean extra = true;                                     // For selection by MQTT of extra heater
boolean workshop = true;                                  // For selection by MQTT of workshop heater
boolean extra2 = true;                                    // For selection by MQTT of extra2 heater
boolean resetArduino = false;

//
byte received_payload[128];
//
//
//
elapsedMillis timer0;
elapsedMillis timer1;
elapsedMillis timer2;                                                                 // start delay reset
#define interval0 150                                                                 // the interval 0 in mS for Frequency measurement time period
#define interval1 1000                                                                // the interval 1 in mS for SSR delays
#define interval2 30000                                                               // start delay for reset 30 seconds
//
int delayAirConMqqt = 0;                                                              // for resending aircon MQTT every 10 minutes (600 seconds)
//
enum triacStates {TRIAC_OFF, TRIAC_ON};                                               // the external trigger device is active high.
enum pwmStates {PWM_OFF, PWM_ON};                                                     // the external trigger device is active high.
//
int x = 0;                                                                            // to ignore first results
int y = 0;                                                                            // for PWM (used for next PWM ssr to come on)
int yold = -1;
int z = 0;                                                                            // to reduce display frequency of PWM and frequency
int v = 0;                                                                            // to reduce MQTT postings when grid on
int PWMValue = 0;
int PWMValueOld = 0;
int w = 0;                                                                            // for reset
int r = 0;                                                                            // check for increase load of 3 cycles
int s = 0;                                                                            // for MQTT sending frequency
//
enum triacStates triacState[noOfTriacs];
enum pwmStates pwmState[noOfTriacs];
//
long cycleCountTimerOn[noOfTriacs];                                                   // for timed on and off of loads
long cycleCountTimerOff[noOfTriacs];
long cycleCount = 0;
long cycleCountAtLastTransition = 0;
//
static byte countpool = 0;
static byte countgrid = 0;
float frequency;

boolean reconnect()
{
  if (!client.connected())
  {
    Serial.print("MQTT connect failed, error code = ");
    Serial.println (client.state());
    Serial.println("Attempting MQTT connection...");
  }

  if (client.connect("ArduinoFrequencyWired"))
  {
    // Once connected, publish an announcement...
    client.publish("frequency", "ArduinoFrequencyWired");
    Serial.println("MQTT Connected");
    // ... and resubscribe
    client.subscribe("frequency/+");
  }

  return client.connected();
}

void(* resetFunc) (void) = 0;                                //declare reset function at address 0


void callback(char* topic, byte* payload, unsigned int length)
{
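  // Clamp the copy so an oversized payload cannot write past the end of received_payload
  unsigned int copyLength = (length < sizeof(received_payload)) ? length : sizeof(received_payload);
  memcpy(received_payload, payload, copyLength);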
  Serial.println();
  Serial.print("Message arrived [");                         // handle message arrived
  Serial.print(topic);
  Serial.println("] ");

  String string;
  for (unsigned int i = 0; i < length; i++)                  // unsigned to match the type of length
  {
    string += ((char)payload[i]);
  }

  int detail = string.toInt();                               // convert payload as an Integer


  if ( strcmp(topic, "frequency/reset") == 0 )
  {
    Serial.print("Resetting ");
    Serial.println(string);

    if (string == "true")
    {
      resetArduino = true;
      client.publish("frequencywired/reset", "Resetting", true);
      client.publish("Pylontech/reset", "true", true);
      delay(2000);
      resetFunc();                                                 //call reset
    }
  }

}

void setup()
{
  lcd.begin(20, 4);
  lcd.clear();

  Serial.begin(115200);
  FreqMeasure.begin();

  client.setServer(server, 1883);
  client.setCallback(callback);

  Ethernet.begin(mac, ip);
  //delay(1500);
  lastReconnectAttempt = 0;
}


void loop()
{
  if (!client.connected())
  {
    long now = millis();
    if (now - lastReconnectAttempt > 5000)
    {
      lastReconnectAttempt = now;
      // Attempt to reconnect
      if (reconnect())
      {
        lastReconnectAttempt = 0;
      }
    }
  }
  else
  {
    // Client connected
    client.loop();
  }

  
}

// end of loop()




void checkMQQT()
{
  if (DHWBoost == true && triacState[0] == TRIAC_OFF)
  {
    triacState[0] = TRIAC_ON;
    analogWrite(pwmPin[0], 255);
    cycleCountTimerOff[0] = 0;
    cycleCountTimerOn[0] = 0;
    Serial.print ("SSR on ");
    Serial.println (0);
    char buffer[30];
    sprintf(buffer, "frequencywired/%d", 0);
    client.publish(buffer, "ON", true);
  }

}

A full sketch could work wonders....

Does the code you posted exhibit the problem? If not, add small pieces of the original code back into it one at a time. When it breaks, you'll have a pretty good idea of what caused it.

It sounds like it could be a millis() rollover handling problem, which would show up every ~50 days (the 32-bit millisecond counter wraps after about 49.7 days). However, your code sample is incomplete.
See the Norwegian Creations tutorial "Arduino Tutorial: Avoiding the Overflow Issue When Using millis() and micros()" for how to do it and how not to do it.

Good idea, but that could take years as it only happens every couple of months!

Thank you, you could well be right if it also happens with elapsedMillis. My timer0 and timer1 both have resets in the code, but timer2 didn't (it does now).

I'd still like to have some sort of auto reboot though.

            long now = millis();

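Using a signed variable for a timing variable is not a good idea. millis() returns an unsigned long, so the variables that hold its value should be unsigned long too.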
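
To make the two timing points raised in this thread concrete, here is a minimal sketch in plain C++ (so it can be run on a PC), with uint32_t and int32_t standing in for the Mega's unsigned long and long. The function names are made up for illustration. It assumes a two's complement target, which the AVR is. Whether the signed comparison is what actually caused the lock-up is not established here; the sketch only shows how each form behaves.

```cpp
#include <cstdint>

// Rollover-safe: unsigned subtraction wraps modulo 2^32, so (now - last)
// is the true elapsed time even after the counter has wrapped past zero.
bool intervalElapsed(uint32_t now, uint32_t last, uint32_t interval) {
  return static_cast<uint32_t>(now - last) >= interval;
}

// Fragile: (last + interval) can itself wrap, so close to the rollover
// this can report "elapsed" long before the interval has really passed.
bool intervalElapsedFragile(uint32_t now, uint32_t last, uint32_t interval) {
  return now >= last + interval;
}

// The pattern quoted above: millis() stored in a signed 32-bit variable and
// compared against lastReconnectAttempt, which the sketch resets to 0 after
// a successful connect. Once millis() exceeds 2^31 - 1 (about 24.9 days)
// the signed value is negative and the test stays false until the counter
// wraps at about 49.7 days.
bool reconnectDueSigned(uint32_t millisNow, int32_t interval) {
  const int32_t now = static_cast<int32_t>(millisNow);
  const int32_t lastReconnectAttempt = 0;
  return now - lastReconnectAttempt > interval;
}
```

In the sketch itself the safe form amounts to declaring now and lastReconnectAttempt as unsigned long and keeping the comparison as a subtraction.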

Thanks, I'm sure that came from an example sketch.

My MQTT server is a Raspberry Pi. I use a Python program on the Pi as a message manager.

When I reset the Pi and the broker comes back up, the MQTT tokens issued to the 18 ESP32s are no longer valid. I do not wish to find and reset 18 ESP32s, so I use an 'MQTT watchdog' on the ESPs. Once a second the Python program sends out a ping of time info on a topic called mqttOK.

void fmqttWatchDog( void * paramater )
{
  int UpdateImeTrigger = 86400; //seconds in a day
  int UpdateTimeInterval = 86300; // 1st time update in 100 counts
  int maxNonMQTTresponse = 120;
  for (;;)
  {
    vTaskDelay( 1000 );
    if ( mqttOK >= maxNonMQTTresponse )
    {
      ESP.restart();
    }
    xSemaphoreTake( sema_mqttOK, portMAX_DELAY );
    mqttOK++;
    xSemaphoreGive( sema_mqttOK );
    UpdateTimeInterval++; // trigger new time get
    if ( UpdateTimeInterval >= UpdateImeTrigger )
    {
      TimeSet = false; // sets doneTime to false to get an updated time after a days count of seconds
      UpdateTimeInterval = 0;
    }
  }
  vTaskDelete( NULL );
}

Above is my MQTT watchdog routine which runs once a second and increments a variable. When the variable reaches a count then the ESP32 is reset.

In this code snippet from an MQTT parser:

void fparseMQTT( void *pvParameters )
{
  struct stu_message px_message;
  for (;;)
  {
    if ( xQueueReceive(xQ_Message, &px_message, portMAX_DELAY) == pdTRUE )
    {
      xSemaphoreTake( sema_mqttOK, portMAX_DELAY );
      mqttOK = 0;
      xSemaphoreGive( sema_mqttOK );

When the MQTT OK signal is received the parser sets the mqttOK count to 0. As long as the broker is working all is fine. When I reset the Pi, from an upgrade, and the Broker restarts, the MQTT tokens are no longer valid, the ESP32's reset after a time, reconnect, get a new token, and things move along.

Thanks, I also run my broker on a Pi but I've never noticed a problem after a reboot of it. I use wired ethernet though rather than wifi.

I had thought of getting Node-RED to send out a message, say every minute, and having the Mega reboot if it misses so many in a row. My other option was the reverse: the Mega sends the message, and Node-RED announces through Alexa that it hasn't received it so I can reboot the Mega manually. Both of these rely on something else, though, and I'd like the Mega to do it within itself. As the Mega is still turning things on and off through its hard-wired connections, I don't want it to go into, say, a continuous reboot loop if the Pi failed.
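
One way to keep the check inside the Mega is a loopback heartbeat: the Mega publishes to a topic it is itself subscribed to, and treats a long silence as a dead link. The broker sends a client's own message back to it when the topic matches one of its subscriptions, so nothing besides the broker is involved. Below is a minimal sketch of just the decision logic, in plain C++ so it can be run on a PC. LinkWatchdog, LinkAction and the timing values are all made up for illustration. The wait between recovery attempts doubles up to a cap, so a broker that is down produces occasional retries rather than a continuous reset loop.

```cpp
#include <cstdint>

enum class LinkAction { None, SendHeartbeat, RestartLink };

// Hypothetical loopback-heartbeat watchdog. All times are in milliseconds
// and use unsigned subtraction, so millis() rollover is handled.
class LinkWatchdog {
 public:
  LinkWatchdog(uint32_t heartbeatMs, uint32_t timeoutMs, uint32_t maxBackoffMs)
      : heartbeatMs_(heartbeatMs),
        timeoutMs_(timeoutMs),
        maxBackoffMs_(maxBackoffMs),
        backoffMs_(timeoutMs) {}

  // Call from the MQTT callback when the looped-back heartbeat arrives.
  void onHeartbeatSeen(uint32_t now) {
    lastSeen_ = now;
    backoffMs_ = timeoutMs_;  // link is healthy again: forget the backoff
  }

  // Call on every pass through loop() with the current millis() value.
  LinkAction poll(uint32_t now) {
    if (now - lastSeen_ >= backoffMs_) {
      lastSeen_ = now;  // restart the silence timer
      // Double the wait before the next recovery attempt, up to the cap.
      backoffMs_ = (backoffMs_ > maxBackoffMs_ / 2) ? maxBackoffMs_
                                                    : backoffMs_ * 2;
      return LinkAction::RestartLink;
    }
    if (now - lastSent_ >= heartbeatMs_) {
      lastSent_ = now;
      return LinkAction::SendHeartbeat;
    }
    return LinkAction::None;
  }

 private:
  uint32_t heartbeatMs_;
  uint32_t timeoutMs_;
  uint32_t maxBackoffMs_;
  uint32_t backoffMs_;
  uint32_t lastSeen_ = 0;
  uint32_t lastSent_ = 0;
};
```

In the sketch, SendHeartbeat could map to something like client.publish("frequency/heartbeat", "1") (a hypothetical topic, chosen because it matches the existing "frequency/+" subscription), with the callback calling onHeartbeatSeen(millis()) when that topic arrives. RestartLink could map to client.disconnect() followed by Ethernet.begin(mac, ip), or to resetFunc() if a full reboot is preferred. Restarting only the network side leaves the hard-wired loads running.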

I'm pretty certain you were right. After putting a reset on my timer2 it hasn't happened since. Thank you.
