MQTT disconnecting, how to diagnose?

Hey all.
I'm pretty new to MQTT so was wondering if I can get some help with diagnosing a connectivity issue.

My setup is as follows:

  • 2 units of Arduino Mega, each with an Ethernet Shield 2
  • Home Assistant Green
  • All connected via Cat5e/Cat6 through a network switch and router, with manually assigned DHCP addresses
  • Arduino using PubSubClient for MQTT, mostly default settings. QoS 1.

I noticed that the unit disconnects from the broker quite quickly after switching on and sending a few commands, and it is not able to reconnect despite my attempts to do so in code. It's fine again once I reset the unit. The connection seems to stay alive (quite a while longer) if I don't send any commands. The broker still seems to think the client is connected and continues to send MQTT requests, but the Arduino is unresponsive.

TCP/IP connection seems to be fine as per the Arduino code, but I have no direct method of testing this.

I've read up about the keepalive options in the client, but from all descriptions it should recover by reconnecting after the timeout even if the connection drops. This doesn't appear to be the case. I have no idea how I can detect the PINGREQ and PINGRESP messages.

I have downloaded MQTT Explorer, which only seems to document the exchange between the client and the broker, and I can't work out how to use it to see whether the connection is still live.

Any ideas? I understand I'm not providing much, and I'm not particularly good with network issues so if anyone has guidance as to what tools/methods to use to look further into this, that would be a great start.

Much appreciated.

How are you generating the client id?
You cannot use the same id for two boards.

I have two different client IDs ("arduino1" and "arduino2"). I have also tried using only one unit powered at a time and it seems to be the same issue.

Strangely, I can't repeat this issue on my prototype board, which is identical in every way except the specific unit details (MAC, clientID, IP), and the fact that it's hooked up outside of the completed box. I've flashed all units multiple times with the same code, and it's still showing the same issue.

Perhaps reading the pinned post called 'How to get the most from this forum' would be a good place to start. We need to SEE something in a specific way.