[SOLVED]Non blocking serial packet receive functio returns null after some time

Hi all!

I have been using successfully for a few years (!) now a function using blocking code that receives incoming packets from hardware serial port (via RS485 driver) and I have recently tried to rewrite this function using non-blocking code shown below.
Few words about the protocol:

-Packet begins with two consecutive 0xAA bytes.
-The next byte indicates the total length of the packet excluding the first two 0xAA bytes.

  • The actual data bytes then follow
  • The packet ends with 2 CRC check bytes

The function below works fine for some time, then for some reason stops returning any received bytes.
Note that the program doesn't freeze and the rest of the functions continue to work.
So , I suspect that some situation arises in the incoming stream that has not been taken care of.

Any help appreciated

So here s my code :

bool checkIncData_SM() {
   
  static int index;
  static enum  {IDLE, AA1, LEN, DATA} state;
  static byte tempPacket[MAX_PACKET_SIZE];                    // max size defined as 78
  byte c;

  if (Serial1.available() < 1 ) { return false;  } // No data; stay in the current state
  while (Serial1.available()>0) {
  c = Serial1.read();                             // read the available data
  
  switch (state) {
  
  case IDLE:                                      // between packets; wait for first 0xAA
    if (c == 0xAA) {
      state = AA1;
      }
   
   break;                                         // Otherwise stay in IDLE.
  
  
  case AA1:                                      // Say first AA, waiting for the second
      if (c == 0xAA) {
        state = LEN;
        
   
      } else
        {state = IDLE;return false;}                // some sort of error
  
      break;
      
  
  case LEN:

    if (c == 0xAA) {break;}                           // Additional AA are ignored 
    if (c > MAX_PACKET_SIZE  || c==0)  {              // in case packet size is received out of range or 0, packet is droped 
     
    
     sprintf( replyBuffer, "Wrong Packet Rcvd Length: %d",c);
     logSDCard(replyBuffer,strlen(replyBuffer),2);
     overflowpackets++;
     state = IDLE;
     c=0;
    // break;
     return false;
    }
    else{
    tempPacket[0] = c;             // save length
    index = 1;
    state = DATA;
    break;
    }
    
  case DATA:
    tempPacket[index++] = c;        // save data from pkt
    if (index == tempPacket[0]) {   // Was that the last byte of the packet?
      state = IDLE;                        // ready to start another packet.
      index = 1;  
      if (Check_crc(tempPacket) )

  // If pkt is good, copy pkt to global IncPacket array
   // and return true to the main loop


       {for (int x=0; x<tempPacket[0]; x++) {IncPacket[x]=tempPacket[x];}       
        return true;}                             
      else {return false;}                     
       
    }
   
    break;
  } // end of switch
 
}
 return false;   // nothing to do yet, return null
}
  if (Serial1.available() < 1 ) { return false;  } // No data; stay in the current state

No support for timeout?

From what I can tell this sequence reaches a poisoned state...

  AA AA 01 ...

I believe this sequence will cause trouble and should be rejected...

  AA AA 02 ...

Could you please elaborate on the need for timeout?

The function is called in the main loop :

if (checkIncData_SM() )    {     ........   }

I believe this sequence will cause trouble and should be rejected...

I can see what you mean...

On the other hand the above scenario has never caused any problems in the blocking version of the code...

Still I will make changes to exclude it...

Packet size check line now changed to :

if (c > MAX_PACKET_SIZE || c<3)  {
 .....
  }

Watcher:
Could you please elaborate on the need for timeout?

As written, your device could receive a partial packet (AA AA 07) that is the result of line noise or a sender dropping out, followed by a delay, then a complete packet. The complete usable packet will be dumped at the CRC check. Including a timeout would allow checkIncData_SM to reset when it becomes clear the sender is not going to finish making it less likely to lose valuable data.

Reporting line noise / sender dropping out can be very handy when debugging communications.

There are applications that require detecting communications failures as part of a recovery process (e.g. shutdown the gas compressing engine if the pressure sensor fails to send fresh data so all of Coke County does not catch on fire).

Watcher:
On the other hand the above scenario has never caused any problems in the blocking version of the code...

Given the fact the two versions are different making direct comparisons is specious.

Posting the blocking version may help identify the problem.

Personal note: In the future, please reformat the code. It is difficult to evaluate as posted.

Including a timeout would allow checkIncData_SM to reset when it becomes clear the sender is not going to finish making it less likely to lose valuable data.

Now I see what you meant!

Given the variable length of packets and the random nature of packet transmission, its not easy to calculate the required max timeout though..

This is the original blocking code (auto-formatted :slight_smile: )

bool checkIncData () {

  byte inByte1 = 0;
  byte inByte2 = 0;


  if (Serial1.available() < 2) return false;   // wait till at least 2 bytes are received
  inByte1 = Serial1.read();
  inByte2 = Serial1.read();

  if  (inByte1 != 0xAA  || inByte2 != 0xAA ) return false;

  
  while (Serial1.available() == 0) { };
  inByte1 = Serial1.read();                        // Read Length of incoming packet
  

  if  (inByte1 > MAX_PACKET_SIZE || inByte1 < 0) {

    
    sprintf( replyBuffer, "Wrong Packet Rcvd Size: %d Orgigin %d", inByte1, 0);
    logSDCard(replyBuffer, strlen(replyBuffer), 2);
    overflowpackets++;
    return false;
  }

  IncPacket[0] = inByte1;                                      // Save packet length
  for (int x = 1; x < IncPacket[0]; x++) {

    while (Serial1.available() == 0) { };
    IncPacket[x] = Serial1.read();
  }

  //  ****** Do Authentication CRC check *******

  bool passed = Check_crc(IncPacket );

  return passed ;

}

Watcher:
auto-formatted :slight_smile:

Appreciated.

I will give it a gander tomorrow.

The code in Serial Input Basics is non blocking and I have never experienced a hang-up when using it.

...R

The code in Serial Input Basics is non blocking and I have never experienced a hang-up when using it.

Perhaps the OP has led to a misunderstanding .

The code above doesn't hang-up.
It just doesn't return a valid packet after working (seemingly) flawlessly for a certain amount of time ranging from a few hours to almost a day.

Watcher:
Perhaps the OP has led to a misunderstanding .

The code above doesn't hang-up.
It just doesn't return a valid packet after working (seemingly) flawlessly for a certain amount of time ranging from a few hours to almost a day.

My code should reliably produce whatever is received. If it happens to be invalid then it would be the fault of external forces - i.e. not sent correctly or corrupted during transmission.

In my experience it is a useful debugging tool to be able to view the received data before trying to parse it.

...R

My code should reliably produce whatever is received.

Are you referring to the code in example 6 ?

In my experience it is a useful debugging tool to be able to view the received data before trying to parse it.

I agree.

I do have an option to enable printout of the received packets on the serial monitor ONCE they have passed CRC validation.

The issue here however is that the problem arises before this point and it is of sporadic nature. Therefore its not practical to monitor all traffic on the (busy) bus (rs485) till the program stops working. Need to figure out some method to only print out data when the problem arises.

What happens in the rest of the sketch ? (more or less, basically do you disable interrupts anywhere ? )

What happens in the rest of the sketch ? (more or less, basically do you disable interrupts anywhere ? )

Well, the sketch is quite big and resides on a MEGA, it includes a webserver, telnet server, ftp server, and others

But no I don't use interrupts, nor disable them.
(unless of course this is done within some library I am using )

Watcher:
Are you referring to the code in example 6 ?

I did not have any particular example in mind - I was leaving it to you to identify which of them is most suitable for your situation.

...R

Watcher:
This is the original blocking code (auto-formatted :slight_smile: )

I assume the goal is the non-blocking version. Correct?

In any case, this sequence is a problem...

...
  if (Serial1.available() < 2) return false;   // wait till at least 2 bytes are received

  inByte1 = Serial1.read();
  inByte2 = Serial1.read();

  if  (inByte1 != 0xAA  || inByte2 != 0xAA ) return false;
...

Odd bytes of noise followed by AA AA result in a potentially valid packet being discarded.

The technique in the non-blocking version is much more robust (wait for AA, check for AA, discard extraneous AA).

While the blocking version is more robust in recovering from an invalid length (1 or 2 bytes) it does contain a bug. The code tries to validate a CRC that may not be present.