Slow performance ESP8266 MQTT pubsub client

Hi,

I'm using the pub sub client on an ESP8266 to receive MQTT messages. When sending messages every 300ms or slower from my python application to the MQTT broker, the Arduino application receives the messages quite consistently:

[52192] Message arrived: node/47/ext/193/field/switch [52404] Message arrived: node/47/ext/193/field/switch [52697] Message arrived: node/47/ext/193/field/switch [52981] Message arrived: node/47/ext/193/field/switch [53284] Message arrived: node/47/ext/193/field/switch [53581] Message arrived: node/47/ext/193/field/switch [53882] Message arrived: node/47/ext/193/field/switch [54182] Message arrived: node/47/ext/193/field/switch [54482] Message arrived: node/47/ext/193/field/switch [54783] Message arrived: node/47/ext/193/field/switch [55083] Message arrived: node/47/ext/193/field/switch [55384] Message arrived: node/47/ext/193/field/switch [55685] Message arrived: node/47/ext/193/field/switch [55985] Message arrived: node/47/ext/193/field/switch [56285] Message arrived: node/47/ext/193/field/switch [56588] Message arrived: node/47/ext/193/field/switch [56886] Message arrived: node/47/ext/193/field/switch

However, when increasing the rate to 200ms, very inconsistent behavior is happening:

[153020] Message arrived: node/47/ext/193/field/switch [153295] Message arrived: node/47/ext/193/field/switch [153570] Message arrived: node/47/ext/193/field/switch [153571] Message arrived: node/47/ext/193/field/switch [153765] Message arrived: node/47/ext/193/field/switch [153964] Message arrived: node/47/ext/193/field/switch [154164] Message arrived: node/47/ext/193/field/switch [154394] Message arrived: node/47/ext/193/field/switch [154669] Message arrived: node/47/ext/193/field/switch [154919] Message arrived: node/47/ext/193/field/switch [155195] Message arrived: node/47/ext/193/field/switch [155196] Message arrived: node/47/ext/193/field/switch [155444] Message arrived: node/47/ext/193/field/switch [155694] Message arrived: node/47/ext/193/field/switch [155946] Message arrived: node/47/ext/193/field/switch [155967] Message arrived: node/47/ext/193/field/switch [156194] Message arrived: node/47/ext/193/field/switch [156445] Message arrived: node/47/ext/193/field/switch [156695] Message arrived: node/47/ext/193/field/switch [156944] Message arrived: node/47/ext/193/field/switch [157194] Message arrived: node/47/ext/193/field/switch [157194] Message arrived: node/47/ext/193/field/switch [157444] Message arrived: node/47/ext/193/field/switch [157694] Message arrived: node/47/ext/193/field/switch [157945] Message arrived: node/47/ext/193/field/switch [157970] Message arrived: node/47/ext/193/field/switch [158193] Message arrived: node/47/ext/193/field/switch [158444] Message arrived: node/47/ext/193/field/switch [158694] Message arrived: node/47/ext/193/field/switch [158944] Message arrived: node/47/ext/193/field/switch [159194] Message arrived: node/47/ext/193/field/switch [159195] Message arrived: node/47/ext/193/field/switch [159444] Message arrived: node/47/ext/193/field/switch [159694] Message arrived: node/47/ext/193/field/switch

It looks like the application can't keep up, because there is at least 250ms between each package (or if 2 packages are combined it much less of course). However, I checked the timings and yield() and the loop function of the pub sub client are called at least every millisecond (and sometimes faster). I also checked if my broker was the problem, but I checked this with another python program subscribing without any timing issues. The package payload is neither big: only 10 bytes.

Is this normal behavior?

Welcome to the forum.

Please read the How to get the best out of this forum

It will help you get more attention and better answers e.g., by posting your sketch (use code tags), information about your exact hardware e.g. a link to the datasheet or product page, ...

I'm not sure what I can post more about the problem. Every detail is in the topic.

I did found something about the "esp8266 TCP tick timer" which apperently is set to 250ms. There is not much documentation about it and neither cant I find any other questions related to it. But it seems I'm hitting this limit, because a faster update rate of 250ms doesn't seem to be possible.

The fault may not lay with the MCU, there is an entire chain of events that needs to happen once the publication leaves the MCU that may be beyond your control.

Is the MQTT Broker local or public on the Internet? That makes a difference, especially with public brokers that do not want to be overflowed with info, like a DDS attack.

And remember Python, using PAHO is a lot better at doing PubSubClient then the PubSubClient library for Arduino. The PubSubClient for Arduino is a lightweight when compared to PAHO.

The bus timer for the is 80Mhz for the ESP8266 and ESP32.

ESP8266 NO-OS API

Idahowalker:
The fault may not lay with the MCU, there is an entire chain of events that needs to happen once the publication leaves the MCU that may be beyond your control.

Is the MQTT Broker local or public on the Internet? That makes a difference, especially with public brokers that do not want to be overflowed with info, like a DDS attack.

And remember Python, using PAHO is a lot better at doing PubSubClient then the PubSubClient library for Arduino. The PubSubClient for Arduino is a lightweight when compared to PAHO.

The bus timer for the is 80Mhz for the ESP8266 and ESP32.

ESP8266 NO-OS API

I'm running a mosquito server locally on my network. I want to clear out that I'm talking about receiving new information on a topic on the mcu, not publishing from the mcu. While I get that TCP is a bit slower due sending acknowledge messages back, I can't really believe it's limited to max 4 messages a second. My previous application which was using UDP could easily handle 10+ messages a second without hickup. Here we can clearly see it's combining messages every 250ms if they are send faster.

Oh, receiving. One of my ESP32’s is getting 11 messages per second.

Let’s see the code, well formatted, in code tags.

Receiving is based on client.loop() and how ‘fast’ client loop runs.

My threaded task to run client loop, I’ve ran it faster then 10ms but found errors begun to pile up.

////
/*
  Important to not set vTaskDelay to less then 10. Errors begin to develop with the MQTT and network connection.
  makes the initial wifi/mqtt connection and works to keeps those connections open.
*/
void MQTTkeepalive( void *pvParameters )
{
  sema_MQTT_KeepAlive   = xSemaphoreCreateBinary();
  xSemaphoreGive( sema_MQTT_KeepAlive ); // found keep alive can mess with a publish, stop keep alive during publish
  MQTTclient.setKeepAlive( 90 ); // setting keep alive to 90 seconds makes for a very reliable connection, must be set before the 1st connection is made.
  TickType_t xLastWakeTime = xTaskGetTickCount();
  const TickType_t xFrequency = 250; //delay for ms
  for (;;)
  {
    //check for a is-connected and if the WiFi 'thinks' its connected, found checking on both is more realible than just a single check
    if ( (wifiClient.connected()) && (WiFi.status() == WL_CONNECTED) )
    {
      xSemaphoreTake( sema_MQTT_KeepAlive, portMAX_DELAY ); // whiles MQTTlient.loop() is running no other mqtt operations should be in process
      MQTTclient.loop();
      xSemaphoreGive( sema_MQTT_KeepAlive );
    }
    else {
      log_i( "MQTT keep alive found MQTT status % s WiFi status % s", String(wifiClient.connected()), String(WiFi.status()) );
      if ( !(wifiClient.connected()) || !(WiFi.status() == WL_CONNECTED) )
      {
        connectToWiFi();
      }
      connectToMQTT();
    }
    xLastWakeTime = xTaskGetTickCount();
    vTaskDelayUntil( &xLastWakeTime, xFrequency );
  }
  vTaskDelete ( NULL );
}

Maybe the "TCP timer tick" is faster on the ESP32. The only thing I can find in the documentation of the esp8266 is:

In order to lower the power comsumption, ESP8266 changes the TCP timer tick from 250ms to 3s in light-sleep mode, which leads to increased timeout for TCP timer. Therefore, the modem-sleep or deep-sleep mode should be used where there is a requirement for the accurancy of the TCP timer.

The way I see this is that normally, when not running in light sleep mode, the TCP timer tick is 250ms.

Here is the PubSubClient loop function:

boolean PubSubClient::loop() {
    if (connected()) {
        unsigned long t = millis();
        if ((t - lastInActivity > this->keepAlive*1000UL) || (t - lastOutActivity > this->keepAlive*1000UL)) {
            if (pingOutstanding) {
                this->_state = MQTT_CONNECTION_TIMEOUT;
                _client->stop();
                return false;
            } else {
                this->buffer[0] = MQTTPINGREQ;
                this->buffer[1] = 0;
                _client->write(this->buffer,2);
                lastOutActivity = t;
                lastInActivity = t;
                pingOutstanding = true;
            }
        }
        if (_client->available()) {
            uint8_t llen;
            uint16_t len = readPacket(&llen);
            uint16_t msgId = 0;
            uint8_t *payload;
            if (len > 0) {
                lastInActivity = t;
                uint8_t type = this->buffer[0]&0xF0;
                if (type == MQTTPUBLISH) {
                    if (callback) {
                        uint16_t tl = (this->buffer[llen+1]<<8)+this->buffer[llen+2]; /* topic length in bytes */
                        memmove(this->buffer+llen+2,this->buffer+llen+3,tl); /* move topic inside buffer 1 byte to front */
                        this->buffer[llen+2+tl] = 0; /* end the topic as a 'C' string with \x00 */
                        char *topic = (char*) this->buffer+llen+2;
                        // msgId only present for QOS>0
                        if ((this->buffer[0]&0x06) == MQTTQOS1) {
                            msgId = (this->buffer[llen+3+tl]<<8)+this->buffer[llen+3+tl+1];
                            payload = this->buffer+llen+3+tl+2;
                            callback(topic,payload,len-llen-3-tl-2);

                            this->buffer[0] = MQTTPUBACK;
                            this->buffer[1] = 2;
                            this->buffer[2] = (msgId >> 8);
                            this->buffer[3] = (msgId & 0xFF);
                            _client->write(this->buffer,4);
                            lastOutActivity = t;

                        } else {
                            payload = this->buffer+llen+3+tl;
                            callback(topic,payload,len-llen-3-tl);
                        }
                    }
                } else if (type == MQTTPINGREQ) {
                    this->buffer[0] = MQTTPINGRESP;
                    this->buffer[1] = 0;
                    _client->write(this->buffer,2);
                } else if (type == MQTTPINGRESP) {
                    pingOutstanding = false;
                }
            } else if (!connected()) {
                // readPacket has closed the connection
                return false;
            }
        }
        return true;
    }
    return false;
}

As you can see the client loop triggers your callback. Things like serial printing from your callback can greatly slow down your ability to receive subscriptions.

My MQTT callback

void IRAM_ATTR mqttCallback(char* topic, byte * payload, unsigned int length)
{
  memset( x_message.payload, '\0', 150 ); // clear payload char buffer
  x_message.topic = ""; //clear topic string buffer
  x_message.topic = topic; //store new topic
  int i = 0; // extract payload
  for ( i; i < length; i++)
  {
    x_message.payload[i] = ((char)payload[i]);
  }
  x_message.payload[i] = '\0';
  xQueueOverwrite( xQ_Message, (void *) &x_message );// send data to queue
} // void mqttCallback(char* topic, byte* payload, unsigned int length)

Lean and mean, everything added to the callback is another thing slowing down the systems ability to receive a new packet.

Yes, I get that. But I verified that the receive function is not delaying the application. At least every 1ms yield() and the loop function of the pub sub client is called (With outputting those values over serial, which slows it even more down). So the application definitely gives the ESP8266 time to process incoming messages and let the pub sub client process them.

Good luck.

Thanks…

What does ESPFRSSIF say about your theory?

And I'll make a guess that it's not the issue, if ESPRESSIF responds.

I don't know what ESPRESSIF say about the issue. I'm asking here to figure it out, maybe other users have similar issues. You are comparing it with the esp32 which is a false comparison. Also I'm talking about timing specific. I do receive all messages of course, but not really timed equally.

I temporary changed the application back to UDP which receives the same data on the same interval with the same handle function. No issue now again, i'm receiving the messages perfectly timed. Maybe I should switch back to UDP then ..

It might be the no_delay option in the TCP client, which is set to false. I could give that a try by setting it to true.

Actually the I/O of the ESP32 is built upon the same structure and uses many of the same parts and code as the ESP8266, the main difference being the ESP32 has more modules.

The comparison is justified.

What has I/O to do with it? The esp32 is a dual core MCU and is clocked higher.. it's totally not a fair comparison.... It's simply a different soic so no you cannot compare that.