Simple Sketch randomly crashes after many hours

I recently put together system to detect the LED flashes (pulses) on my smart meter using an LM393 light detector board and a Seeed Studio Xiao ESP32-C6. I wrote my own Sketch (below). It's fairly straightforward with the meat being in the loop() routine.
It measures the milliseconds between each pulse - The LED pulses ON for about 17ms and this in turn, through the LM393 makes the GPIO Input go LOW. The pulse rate is 3200/kWh.
It increments the meter each pulse and holds the pulse duration so that it can be reported over MQTT every 5s. Note also that I turn the inbuilt LED on with each flash.

It works fine and I get the data into Home Assistant OK.

The problem I'm having is that the Sketch appears to crash at a random time after boot; typically many hours or even a couple of days. When it crashes there is no more reporting and the inbuilt LED has always been ON. A power off/on cycle get the Sketch working again.

I can't work out the problem so hoping someone here can spot something.

Thanks

PS. I originally tried to use an ISR to capture the pulses but found it very unstable hence reverted to try and catch it simply in a permanent loop. Also, I know there is an ESPHome method for doing this but I don't get on with ESPHome so haven't tried it.

Here's the full Sketch...

#include <PubSubClient.h>
#include <WiFi.h>
#define PULSE 18 // GPIO18 = D10 Running on a Seeed Studio Xiao ESP32-C6

const char *ssid = "my SSID";          // Change this to your WiFi SSID
const char *password = "my PWD";  // Change this to your WiFi password
String payload; 
char output[110];
char* topic = "tele/pulsemeter";
bool sense;
bool pulseWas = HIGH;
float power;
float meter = 0; // Volatile is needed for  variables used in the ISR
unsigned long millisReport = 0;
unsigned long millisLast;
unsigned long millisNow;
unsigned long pulseLen;

WiFiClient wifiClient;
PubSubClient mqttClient(wifiClient);

void meausureAndPublish() {
  // Calculate Power sensor value. Meter reading is already valid.
  power = 1125/float(pulseLen); // W Based on the latest 2 pulses.
  if ( power < 0.0 || power > 20.0) return; // To avoid sending ridiculous values
  payload = "{\"Meter\": ";
  payload += meter;
  payload += ",\"Power\": ";
  payload += power;
  payload +="}";

  // Report power and energy values
  Serial.print(topic);
  Serial.print(" ");
  Serial.println(payload);
  if (!mqttClient.connected()) {
    reconnect();
  }
  for (int i = 0; i <= payload.length(); i++) output[i] = payload[i];
  mqttClient.publish(topic, output);
  mqttClient.loop();      
}

void setup() {
  pinMode(PULSE, INPUT_PULLUP);
  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(115200);

  // We start by connecting to a WiFi network
  Serial.println();
  Serial.println("******************************************************");
  Serial.print("Connecting to ");
  Serial.println(ssid);

  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());

  mqttClient.setServer("my HA IP", 1883); // Identify MQTT broker
}

void loop() {
  /* The meter Pulses 3200 times per kWh. The pulses (Lows) are about 17ms long.
     I cannot see any evidence of bouncing on my cheap scope!
     This first if block reacts to when the led flashes on and off.
     If pulseWas == HIGH, then the LED was off.
  */
  if(pulseWas){  // HIGH or True (ie LED was off), check if gone LOW (ie LED on)
    if(digitalRead(PULSE) == LOW){  // LED is on
      delayMicroseconds(50); // Debounce
      if(digitalRead(PULSE) == LOW){  // Debounce OK. LED still on and should have settled. We have a pulse
        pulseWas = LOW;
        digitalWrite(LED_BUILTIN, LOW); // Turn led on to follow the flashing
        meter += 0.0003125; // Increment Meter - 3200 flashes per kWh
        millisNow = millis();  // Local - Note millis() will not increase at all in ISR
        pulseLen = millisNow - millisLast; // Pulse length in milliseconds
        millisLast = millisNow;  // Reset millisLast
      }
    }
  }
  else if(digitalRead(PULSE) == HIGH){  // Wait for led to turn off
    delayMicroseconds(50); // Debounce
    if(digitalRead(PULSE) == HIGH){  // Debounce OK. Pulse over
      pulseWas = HIGH;
      digitalWrite(LED_BUILTIN, HIGH); // Turn built in led off to follow the flashing
    }
  }
  
  // Call the function to report power and energy every 5 seconds
  if((millis() - millisReport) >=5000){
    millisReport = millis();
    if(pulseLen >=0 ) meausureAndPublish(); // Don't report in the event of a millis overflow
  }
  if(millis() < 4000) millisReport = 0; // Ensures smooth operation when millis overflower every ~50 days
}

void reconnect() {
  // Loop until we're reconnected
    // Attempt to connect
  while (!mqttClient.connected()) {
    Serial.print("Attempting MQTT connection...");
    if (mqttClient.connect("mqttClient", "my MQTT name", "my MQTT PWD")) {
      Serial.println("connected");
      mqttClient.setBufferSize(256); 
    } else {
      Serial.print("failed, rc=");
      Serial.print(mqttClient.state());
      Serial.println(" try again in 5 seconds");
      delay(5000); // Wait 5 seconds before retrying
    }
  }
}

Could you keep the Xiao ESP32-C6 connected to the Serial monitor until it crashes and report what you see in the Serial monitor? It would be interesting to see if you get disconnected and then stuck in a reconnect loop for example as you don't attempt to join the WiFi network again in the reconnect() function.

I usually add callbacks to my ESP32 programs to handle the WiFi events

I have a global boolean to keep track of my WiFi status

bool wifiOK = false;

and define callback functions

void wifiStationConnected(WiFiEvent_t event, WiFiEventInfo_t info) {
  Serial.println("...Network connection established");
}

void wifiStationGotIP(WiFiEvent_t event, WiFiEventInfo_t info) {
  Serial.print("Got IP address = ");
  Serial.println(WiFi.localIP());
  wifiOK = true;
}

void wifiStationDisconnected(WiFiEvent_t event, WiFiEventInfo_t info) {
  wifiOK = false;
  Serial.print("Disconnected from network. (reason: ");
  Serial.println(info.wifi_sta_disconnected.reason);
  Serial.println(")\nAttempting to reconnect.");
  WiFi.begin(ssid, wifiPassword);
}

and I register them in setup()

void setup() {
  Serial.begin(115200);
  Serial.println("Connecting to WiFi network");
  WiFi.onEvent(wifiStationConnected, WiFiEvent_t::ARDUINO_EVENT_WIFI_STA_CONNECTED);
  WiFi.onEvent(wifiStationGotIP, WiFiEvent_t::ARDUINO_EVENT_WIFI_STA_GOT_IP);
  WiFi.onEvent(wifiStationDisconnected, WiFiEvent_t::ARDUINO_EVENT_WIFI_STA_DISCONNECTED);
  WiFi.begin(ssid, wifiPassword);
}

When attempting to connect to MQTT, you would first check if wifiOK is true otherwise there is no chance the attempt will work.

What does these lines do?

that's the long version for

  mqttClient.publish(topic, payload.c_str());

for those who don't understand cStrings.

:slight_smile:


also it was my understanding that it would be a good idea to keep mqttClient.loop(); in the loop() directly rather than calling it only from time to time.

I notice you have mixed float and int math, just for giggles, change the int constants to float simply by adding .0 to the end.

:shushing_face:

There is one capital S String. I am led to believe these are not a problem for the ESP32.

Nevertheless, eliminating it would be very easy and remove any possibility that it is an issue.

@Wingnut can use sprintf() and dtostrf() to good effect here.

a7

Just saying that there is no need for both the cString and the String. One is enough

When its mixed math as you call it, the compiler implicitly promotes the int to a float so it’s not really needed.

Really, I have always thought it didn't, and in fact, I recall one or two discussions about bugs caused by mixed math. That must be newish.

No it has always been the case if you have a math operation between type A and B the smaller type is promoted into the larger type before the operation is conducted.

The bugs typically comme from doing int maths and expecting a float result because the destination is a float

That is

float x = 1 / 3; // int maths  ➜   0.0 not 0.3333333

Or having an overflow (esp on small Arduino where an int fits on 2 bytes) in an intermediary operation even if one element does have the decimal dot

For example on UNO

float f = (600 * 1000) / 5.5; // error 600 x 1000 overflows

In that latter example writing 600.0 would save the day as the 1000 would also be promoted to a float and then everything is done using floating point arithmetic.

IDK. 600000 / 5.5 is 109090.91.

float f = (600 * 1000) / 5.5;

is 1850.18.

I was leery of the idea that 1000 is promoted. It appears not to be; the 600 * 1000 overflows and the result is incorrect.

void setup() {
  Serial.begin(115200);
  Serial.println("\nWait a miunute...\n");

  float f = 600000.0 / 5.5;

  Serial.println(f);

  float g = (600 * 1000) / 5.5;

  Serial.println(g);
}

void loop() {
}

and the output

Wait a miunute...

109090.91
1850.18

Bottom line: trust but verify twice.

a7

power is declared float. Will that work? I am so confused now. But the good news is I almost never run into it in my code as I always use a cast if I am in doubt, then I test that line by itself.

Just for the record, what board did you use?

Haha, the wrong board. THX for asking.

On the right board the result is accurate all case. But this does not mean the 1000 was made a float; the calculation is integer 600000 divided by float 5.5. That is why it is kicked into the higher gear.

And this comment

// error 600 x 1000 overflows

is not true for the ESP32.

We can make it by writing

  short yy = 600;
  short tt = 1000;
  short rr = yy * tt;

  Serial.println(rr);

a7

well that's what I said

and


Let me restate

ON A UNO (2 bytes int)

  • if you do (600 * 1000) both 600 and 1000 are seen as int (they fit 2 bytes) so the multiplication is conducted as a 2 byte int and overflows.
  • if you do (600.0 * 1000) then 600 is a float and 1000 in an int and the C++ rule states that as float is a larger type, 1000 will be implicitly promoted into that type and thus the expression is a float worth ~600, 000.0 (at float approximation)

makes sense ?

void setup() {
  int a = 600; 
  float f = 600;
  int b = 1000;
  float r1 = a * b;
  float r2 = f * b;

  Serial.begin(115200);
  Serial.println(r1);
  Serial.println(r2);
}

void loop() {}

10176.00
600000.00

Is any previously used buffer space implicitly deleted in reconnect() ?

   if (mqttClient.connect("mqttClient", "my MQTT name", "my MQTT PWD")) {
      Serial.println("connected");
      mqttClient.setBufferSize(256); 
   }

So that's a division between an int and a float (since pulseLen is cast into a float). Since the float is the larger type, 1125 will be implicitly cast to float and the operation will deliver the expected result.

Without the cast, since pulseLen is an unsigned long, the rule would be that 1125 will be implicitly cast to unsigned long and the math will be conducted in integral math (ie truncating the result) and then saving the truncated result into power ➜ we would have lost the decimals.

yes setBufferSize does the right thing (malloc if there was no buffer, realloc otherwise)

Yes, everything except the part where 1000 was made into a float before, presumably, being added to 600.

Nevermind. Sun blind. :expressionless:

a7