Serial Comms fails on Restart

I have a project that I've been working on to remotely control an Amateur Radio amplifier located in the basement. Over 90% of it is working, but I have a bug that I can't figure out.

I have an M5Stack Core2 that is in the shack and sends 8v Power and Serial Comms to an Arduino NANO, located at the amplifier. The comms works GREAT, most of the time.

Both the M5Stack and NANO are running a 200ms Cycle Time (I don't see any over-runs). Loop Execution time is typically 20ms for both...

In trying to simulate loss & re-connection of comms, I cycle power to the NANO, but after that the comms does NOT work as it should!

M5Stack code to start comms with the NANO:

  NanoPort.end(true);
  M5.delay(200);
  NanoPort.begin(38400, SERIAL_8N1, NanoRxPin, NanoTxPin);  //Serial.begin() will clear the buffer
  //Check the communications with the NANO, once it boots up.
  //  We use the Volts to know when the NANO has sent Power to the Amp, and the voltage has been set
  byte Count = 0;
  do {
    //After 2 seconds, change the Display to indicate we are Powering up the Nano.
    String SecondLine = String(Mode);
    SecondLine = SecondLine + " Count:" + String(Count) + "/100";
    LcdLines("Wait for NANO Com", SecondLine);  //LcdLines prints to M5Stack display.
    Count++;

    //Need to send Comms to the NANO to wake it up.
    Send5CheckSum(ModePowerTurnedOn, 0, 0);  //We send Mode, Band and Bypass...
    M5.delay(310);  //Can't be real short, should be about equal or greater than NANO cycle time.  NOTE that I've tried several values, but it doesn't seem to make a difference!!

    Mode = ReceiveMode();
    Serial.print(F("Startup Comms ReceiveMode = ")); Serial.println(ModeToString(Mode));
  } while ((Mode == 0) && (Count < 100));

M5Stack Comms routine:

// Sent To NANO
bool Send5CheckSum(byte Mode, int Band, bool Bypass) {
  // We send the Mode, Band and Bypass variables to the Nano.
  //   Return true = Failed.
  unsigned long Tmr = millis();
  String StringToSend = "";

  //MODE, 2 digits ONLY. (Length=2)
  // To get the "+" String concatenation to work, you need to start out with [StringToSend = StringToSend + Mode;]
  //NANO NOTE: NANO does not support Modes > ModeOverTemp (12).  NANO just forces itself to ModeReceive when greater than 12.
  if (Mode < 10) StringToSend = StringToSend + "0" + Mode;
  else StringToSend = StringToSend + Mode;
  //Serial.print(F("Sending Mode = ")); Serial.println(ModeToString(Mode));

  //Band, 2 digits only  (Length=4)
  if (Band < 10) StringToSend = StringToSend + "0" + Band;
  else StringToSend = StringToSend + Band;

  //Bypass, 1 Digit Only, 0 or 1, 1 Digit.  (Length=5)
  StringToSend = StringToSend + Bypass;

  //Sum the Mode + Band + Byp
  int Sum = Mode + Band + int(Bypass);
  // 2 digits:
  if (Sum < 10) StringToSend = StringToSend + "0" + Sum;
  else StringToSend = StringToSend + Sum;

  //Add a terminating newline ('\n'), called "CR" elsewhere in this code. (Adds 1 more char.)
  StringToSend = StringToSend + "\n";

  if (StringToSend.length() != 8) {  //I'm NOT Seeing this error!!!
    Serial.print("StringLength does NOT equal 8 (with CR & CheckSum):   "); Serial.print(StringToSend);
    Serial.print("  String.Length: ");  Serial.print(StringToSend.length());
    Serial.print("  Mode(2): ");    Serial.print(Mode);
    Serial.print("  Band(2): ");    Serial.print(Band);
    Serial.print("  Bypass(1): ");    Serial.print(Bypass);
    Serial.print("  Sum(2): ");    Serial.println(Sum);
    return true;  //Programming Error!
  }
  //NOTE: If I execute the following LINES, it works!!!!!
//  unsigned long Tmr = millis();
//   //Serial.print(F("WHY DOES IT WORK WHEN I PRINT THIS???"));
//   Serial.print(F("String:  '"));  Serial.print(StringToSend);
//   Serial.print("'  Mode: ");  Serial.print(ModeToString(Mode));
//   Serial.print("  Band: ");  Serial.print(Band);
//   Serial.print("  Bypass: ");  Serial.print(Bypass);
//   Serial.print("  Sum: ");  Serial.print(Sum);
//   Serial.println();
//   Serial.print(F(" Time = ")); Serial.println(millis() - Tmr);


  //Send the string through the Comm Port.
  for (unsigned int i = 0; i < StringToSend.length(); i++) {
    NanoPort.write(StringToSend[i]);  // Push each char 1 by 1 on each loop pass
    Serial.print(StringToSend[i]);
  }
  return false;  //Sent OK
}
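
For reference, the frame that Send5CheckSum assembles by hand (2-digit Mode, 2-digit Band, 1-digit Bypass, 2-digit sum, then '\n') can be sketched with fixed-width formatting. This is only an illustration in plain C++, not the code from the sketch above: `encodeFrame` is a hypothetical name, and the choice to reject any combination whose sum needs more than two digits is an assumption (in the original, such a sum would produce a 9-character string and be caught by the length check).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the original sketch): builds the
// 8-character frame "MMBBPSS\n" described in the comments above.
// Returns an empty string if a field would not fit its fixed width.
std::string encodeFrame(int mode, int band, bool bypass) {
    if (mode < 0 || mode > 99 || band < 0 || band > 99) return "";
    const int flag = bypass ? 1 : 0;
    const int sum = mode + band + flag;
    if (sum > 99) return "";  // assumption: the checksum must stay at two digits
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%02d%02d%d%02d\n", mode, band, flag, sum);
    return std::string(buf);
}
```

With Mode 1, Band 0 and Bypass 0 this yields the characters 0100001 followed by the newline, which matches the digits seen in the debug output further down.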

The NANO Receive code is:

bool Receive5CheckSum(byte &Mode, int &Band, bool &Act1_Byp0) {
  //returns true for failed comms, false for successful.
  Serial.println("ST....");
  String RcvString = "";
  char inChar;
  int Sum = 0;

  while (M5Port.available()) {
    inChar = (char)M5Port.read();  // get the new byte:
    //delay(1);
    if (inChar == '\n') {  // newline terminator (printed as "CR" below); we don't add it to the string
      Serial.println(F("CR"));
      break;               //Break out of the while loop
    }
    RcvString += inChar;  // add it to the inputString:
    Serial.print(RcvString);
  }

  if (RcvString.length() != 7) {
    if (RcvString.length() > 0) {
      Serial.print(F("From M5, RcvString Length NOT 7: ")); Serial.print(RcvString); Serial.print(F("  RcvString Length: ")); Serial.println(RcvString.length());
    }
    Serial.print(F(" Rcved Short: Time = ")); Serial.println(millis() - Tmrr);
    return true;
  }

 Serial.print(RcvString); Serial.print(F("  RcvString Length: ")); Serial.println(RcvString.length());

  //Mode, 2 Digits
  String tmp = RcvString.substring(0, 2);
  Mode = tmp.toInt();

  //Band, 2 Digits
  tmp = RcvString.substring(2, 4);
  Band = tmp.toInt();

  //Act1_Byp0, 1 Digit
  tmp = RcvString.substring(4, 5);
  Act1_Byp0 = tmp.toInt();

  //CheckSum 2 Digits
  tmp = RcvString.substring(5, 7);
  Sum = tmp.toInt();

  //Make sure the Sum is correct
  if ((Mode + Band + Act1_Byp0) == Sum) {
    //Serial.print(F("Sum OK Mode=")); Serial.print(Mode); Serial.print(F(" Band=")); Serial.print(Band); Serial.print(F(" Act1_Byp0=")); Serial.print(Act1_Byp0); Serial.print(F(" Sum=")); Serial.print(Sum); Serial.print(F("  RcvString = ")); Serial.println(RcvString);
    Serial.print(F(" Sum OK: Time = ")); Serial.println(millis() - Tmrr);
    return false;
  } else {
    Serial.print(F("Sum Error Mode="));
    Serial.print(Mode);
    Serial.print(F(" Band="));
    Serial.print(Band);
    Serial.print(F(" Act1_Byp0="));
    Serial.print(Act1_Byp0);
    Serial.print(F(" Sum="));
    Serial.println(Sum);

    return true;
  }
}
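
One thing worth noting about the parsing above: Arduino's String::toInt() returns 0 when the text is not a number, so a 7-character body of non-digits would parse as all zeros and pass the sum check. A stricter decode can be sketched as follows. This is plain C++ for illustration; `Frame` and `decodeFrame` are hypothetical names, and unlike Receive5CheckSum this sketch returns true on success.

```cpp
#include <cctype>
#include <string>

// Hypothetical container for the three decoded fields.
struct Frame {
    int mode;
    int band;
    bool active;
};

// Hypothetical helper: parses a 7-character body "MMBBPSS" (terminator
// already removed). Returns true only if every character is a digit,
// the bypass flag is 0 or 1, and the two-digit sum matches.
bool decodeFrame(const std::string& body, Frame& out) {
    if (body.size() != 7) return false;
    for (char c : body) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    const int mode = (body[0] - '0') * 10 + (body[1] - '0');
    const int band = (body[2] - '0') * 10 + (body[3] - '0');
    const int flag = body[4] - '0';
    const int sum = (body[5] - '0') * 10 + (body[6] - '0');
    if (flag > 1) return false;
    if (mode + band + flag != sum) return false;
    out = Frame{mode, band, flag == 1};
    return true;
}
```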

The Receive String Length should be 7!!!
What I'm getting is that the first read shows 4 characters and the next read shows 3 characters!
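
A 4-character read followed by a 3-character read is what one would expect if the receive loop runs while a frame is still arriving: `while (M5Port.available())` ends as soon as the buffer is momentarily empty, so the partial string is thrown away and the rest of the frame shows up on the next call. One possible way to tolerate that, sketched here in plain C++ under that assumption, is to keep the partial frame between calls and only act when the terminator arrives. `FrameAssembler` and `feed` are hypothetical names, not part of the original code.

```cpp
#include <cstddef>
#include <string>

constexpr std::size_t kFrameLen = 7;  // body length without the '\n'

// Hypothetical helper: collects bytes across calls and hands back a
// frame body only when a '\n' terminates exactly kFrameLen characters.
class FrameAssembler {
public:
    // Feed one received byte. Returns true and fills `frame` when a
    // complete body has just been terminated; otherwise returns false.
    bool feed(char c, std::string& frame) {
        if (c == '\n') {
            const bool ok = !overflow_ && partial_.size() == kFrameLen;
            if (ok) frame = partial_;
            partial_.clear();   // every terminator is a resync point
            overflow_ = false;
            return ok;
        }
        if (partial_.size() >= kFrameLen) {
            overflow_ = true;   // too long: ignore until the next '\n'
            return false;
        }
        partial_ += c;
        return false;
    }

private:
    std::string partial_;
    bool overflow_ = false;
};
```

On the NANO the calling loop would look roughly like `while (M5Port.available()) { if (assembler.feed((char)M5Port.read(), body)) { /* parse body */ } }`, with `assembler` declared outside the function so that it survives between cycles. The first stray tail is discarded at its terminator, and the following head and tail are joined into one body.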

My Debug Output shows: (2 cycles shown)

ST....
19:40:59.078 -> 0000000001CR
19:40:59.078 -> From M5, RcvString Length NOT 7: 0001 RcvString Length: 4
19:40:59.117 -> Rcved Short: Time = 7
19:40:59.117 -> ST....
19:40:59.117 -> 001010From M5, RcvString Length NOT 7: 010 RcvString Length: 3 //NO CR to break out of the loop!!!
19:40:59.117 -> Rcved Short: Time = 9
19:40:59.117 -> Return2 was true, Nothing read...
19:40:59.117 -> ExecuteTime is: 20
19:40:59.278 -> ST....
19:40:59.278 -> 0000000001CR
19:40:59.278 -> From M5, RcvString Length NOT 7: 0001 RcvString Length: 4
19:40:59.313 -> Rcved Short: Time = 7
19:40:59.313 -> ST....
19:40:59.313 -> 001010From M5, RcvString Length NOT 7: 010 RcvString Length: 3

Like I mentioned, these routines work correctly. I use them when I start and run the system (the M5Stack turns on a RELAY which sends Power to the NANO, and uses these exact same routines to establish Comms). It's ONLY when I disconnect the cable to get the Error and then re-connect (Power is cycled to the NANO) that it fails as you see above.

I'm stumped!
Suggestions?

Sir Michael

What error detection and recovery is built into your communications protocol?

I guess I don’t understand the question. As you can see, I do have a checksum as the last digits and ending the comm with a CR.

I’m using PostNeoSWSerial on the NANO and a hardware comm port on the M5Stack.

The receive routine, after the reboot, gets 3 characters and exits the loop, then gets the last 4 characters, finds the CR and exits the loop correctly.

Sir Michael

And what does the code do if the checksum is wrong? What if there is no message for a certain time? Does the receiver send back an acknowledgement message? Does the sender wait for and verify that an acknowledgement message was received?
When starting the devices, which sends first and who waits for that first message?

All these things must be taken care of before any communications system can automatically recover from errors and power interruptions.

The comms is updated every 200ms. If there is a checksum error, or if the message is too short/long, the data is ignored and just waits for the next message. There is no direct handshaking (no ack).

If the comms goes away (loss of 50 comm packets in a row), the M5Stack (same for the NANO) goes into ModeError. This is where I'm testing and running into the problem. (I simulate the error by disconnecting the cable.) I have a button on the M5 that, when pressed, disconnects the Power to the NANO and tries to re-establish comms.
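
The "50 lost packets in a row" rule described above can be sketched as a small counter that is reset by any good packet. This is an illustration in plain C++, not the code running on either board; `LinkMonitor` and `update` are hypothetical names.

```cpp
// Hypothetical helper: tracks consecutive failed cycles and reports
// when the configured limit has been reached.
class LinkMonitor {
public:
    explicit LinkMonitor(unsigned limit) : limit_(limit) {}

    // Call once per comms cycle with the result of that cycle.
    // Returns true while the link should be treated as lost.
    bool update(bool packetOk) {
        if (packetOk) {
            misses_ = 0;              // any good packet clears the count
        } else if (misses_ < limit_) {
            ++misses_;                // saturate at the limit
        }
        return misses_ >= limit_;
    }

private:
    unsigned limit_;
    unsigned misses_ = 0;
};
```

With a 200 ms cycle and a limit of 50, the link would be declared lost after roughly 10 seconds without a valid packet.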

On initialization, the M5Stack turns on power to the NANO, sends the Mode "ModePowerTurnedOn" and waits for the NANO to return anything other than ModeOff (a '0'). Once the NANO gets the comms packet and returns "ModePowerTurnedOn", the M5 continues on and the system works.

I suspect the problem is on the M5Stack side, in that if I UN-comment some "Serial.print..." lines, the comms recovers correctly.

I do keep track of the Error rates:
NANO:
ComCount:4000 ErrCount:163 Percent:4.08 ReXmtCount = 7 ReXmtCount % = 0.18
M5Stack:
ComCount:4000 ErrCount:19 Percent:0.47 ReXmtCount = 12 ReXmtCount % = 0.30

The "ReXmtCount" is where I always do a second read of the buffer to make sure that one or the other didn't get out of sync, to get the latest data from the other side...

status = Receive20CheckSum(state.mode, nanoBand, nanoBypass, state.volts, state.ampTemp, state.fanOutput, state.fwdPower, state.refPower);
 
  //If there was data from the first read, try again, keep the buffer from the NANO flushed and data up to date.
  status2 = Receive20CheckSum(state.mode, nanoBand, nanoBypass, state.volts, state.ampTemp, state.fanOutput, state.fwdPower, state.refPower);
  
  if (status2 == false) { //Successful read
    //Serial.println(F("######### Got Data from 2nd Receive20CheckSum"));  
    ReXmtCount++; 
    }

This is NOT used when I'm trying to establish comms on startup, or error startup.

Incidentally, just to try it, I did change from #include <PostNeoSWSerial.h> back to #include <SoftwareSerial.h>, and it didn't fix the problem. PostNeoSWSerial.h is a drop-in replacement for SoftwareSerial.h.

Sir Michael

swSerial (of any kind !!) can be a solution when you lack a hardware Serial port, but if it doesn't behave as you want it to, it is just best to ditch it. Since you are using hwSerial only for debug, you could consider swapping these over and adding an extra USB to TTL converter to get the debug info to the Serial port. An even better option would be to swap the nano over for a Pro-micro, which has UART1 exposed on pins 0 & 1 and UART0 is used as a (native) USB port which you could then use for debug.

Or you could try and solve your software serial related issue, but again; If swSerial is a solution, it is, but if it causes issues, then it just isn't a proper solution.

It takes around 1.5 seconds for the Nano to start up and for the serial ports to become active, perhaps a little longer for a SoftwareSerial port to get initialised. Despite the comment, there seems to be no 2 second delay here?

Then again, no real idea what:

Mode = ReceiveMode();

does, and it does look like the routine can go up to 100 times around the loop....

I did wonder, whether as part of the comms protocol, could the Nano not send something to the M5Stack (or even signal via a GPIO pin) to indicate that it is ready? The M5Stack could wait to receive some pre-defined sequence (or GPIO state) before attempting to send its data?

Well, I think that I found a solution...

Previously, I didn't open the Comm Port on the M5Stack until I actually started the connection, and I would 'end' the comm port on shutdown. Then, when Comms to the NANO was lost, I closed and re-opened the comm port to clear the buffer. That seems to have been the problem!!! (It's the SAME open() command.)

I found that if I opened the comm port in the setup() routine and NOT close it, it will pickup the comms after the Error restart as it should.
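
If the only reason for closing and re-opening the port was to clear stale bytes, one alternative is to leave the port open and simply read away whatever is waiting. The sketch below is plain C++ for illustration: `drainInput` is a hypothetical helper, and `FakePort` is only a stand-in with the same available()/read() pair that an Arduino serial port offers. Whether this alone would have avoided the failure was not tested here.

```cpp
#include <deque>

// Stand-in for a serial port, for illustration only.
struct FakePort {
    std::deque<int> rx;  // bytes waiting in the receive buffer
    int available() const { return static_cast<int>(rx.size()); }
    int read() {
        if (rx.empty()) return -1;  // same convention as Arduino: -1 if empty
        const int b = rx.front();
        rx.pop_front();
        return b;
    }
};

// Hypothetical helper: discards everything waiting in the receive
// buffer while leaving the port open. Returns the number of bytes dropped.
template <typename Port>
int drainInput(Port& port) {
    int dropped = 0;
    while (port.available() > 0) {
        port.read();
        ++dropped;
    }
    return dropped;
}
```

On the M5Stack the same loop could be run against NanoPort at the point where the code used to call end() and begin().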

I was concerned about keeping the comm port open, even when the amp is off, but I realized that I have the M5Stack programmed to go into a 'deep sleep' after it's been off for 60 seconds. When you wake it, it goes through setup() again and re-opens the port...

I'll have to test it some more to make sure there are no other 'gotchas'. I tried BitSeeker's suggestion to add a delay after the 'open()', but it didn't work. Incidentally, the

Mode = ReceiveMode();

is just a shortened version of the receive message to return ONLY the Mode from the NANO, to see if it received the Mode that I sent.

Anyway, thanks for the bandwidth.

By the way Deva, I had considered using your suggestion on the NANO to use the Hardware Comm Port and try that, but looks like I don't need to. The problem was in the M5Stack and the error-rates that I'm seeing with the PostNeoSWSerial driver are very acceptable for me.

Sir Michael