UART Parsing

@cattledog, I am going to take a stab at it and then I will post results here for comment.

@cattledog any suggestions on the most efficient way to look at 3 bytes in a rolling window?

A circular buffer :)

Quickly hacked together: this will scan for a byte sequence, here the character sequence "magic", arriving on the Serial line (at 115200 baud).

const uint8_t magicSequence[] = {'m', 'a', 'g', 'i', 'c'};  
const uint8_t circularBufferSize = sizeof(magicSequence) ; 

// or if you are dealing with ASCII, it's easier to type as 
// const uint8_t magicSequence[] = "magic";
// in which case do 
// const uint8_t circularBufferSize = sizeof(magicSequence) -1; // -1 to account for the trailing NULL char


uint8_t circularBuffer[circularBufferSize];

boolean gotMagicSequence(bool forceStart = false, Stream& aStream = Serial) // if called with no parameters, will use Serial and continue scanning
{
  boolean magicFound = false;
  static byte index = 0;

  if (forceStart) {
    memset(circularBuffer, 0xFF, circularBufferSize); // fill the buffer with 0xFF; use a value that cannot match your sequence
    index = 0;
    return false;
  }

  if (aStream.available() > 0) {
    circularBuffer[index] = (uint8_t) aStream.read();
    magicFound = true;
    for (uint8_t i = 0; i < circularBufferSize; i++) {
      if (circularBuffer[(index + 1 + i) % circularBufferSize] != magicSequence[i]) {
        magicFound = false;
        break; // no need to continue, no match
      }
    }
    if (++index == circularBufferSize) index = 0; 
  }
  return magicFound;
}


void setup()
{
  Serial.begin(115200);
  gotMagicSequence(true); // initialize parser
}

void loop()
{
  if (gotMagicSequence()) {
    Serial.println("You've got some 'magic' dude!");
    gotMagicSequence(true); // re-initialize parser
  }
}

I don't think that is correct. You are always looking forward to see if you match the magic sequence. When a new byte arrives, you should start your search at the current index minus the length of the magic sequence (with proper modulo), so it will match as soon as the final magic byte arrives.

blh64:
I don't think that is correct.

Have you tried it? Just launch it with the Serial console open at 115200 and type "hello world this is magic, yes magic !!!", then send it; you should get 2 hits.

A circular buffer is... circular, so you can start wherever you want as long as you compare the right bytes. If they all match, you are good to go.

Remember that my circular buffer has exactly the size needed for the magic phrase; maybe that's what you missed. With that size, stepping forward by one from the index (with modulo) lands on the same slot as stepping back by the sequence length minus one (with modulo), and stepping back is more complicated because you don't want the modulo of a negative number.
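To make the index arithmetic concrete, here is a small host-side sketch in plain C++ (not Arduino code). The helper name windowStart is made up for illustration; it assumes the index points at the newest byte, as it does in gotMagicSequence() at the time of the comparison.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: slot of the oldest byte of an n-byte window whose
// newest byte sits at `index` in a circular buffer of `size` slots.
// Adding `size` before subtracting keeps the unsigned value from wrapping.
std::size_t windowStart(std::size_t index, std::size_t size, std::size_t n) {
  assert(n >= 1 && n <= size);  // the window must fit in the buffer
  return (index + size - (n - 1)) % size;
}
```

When n equals size, windowStart(index, size, size) is the same slot as (index + 1) % size, which is why the buffer in the sketch above can simply start comparing at index + 1.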

The overall problem here is I need to validate that the parser is working, as I know the chip is working correctly.

Does this same circular buffer work for byte arrays?

Here is some code which finds the AAAA12 preamble and then reads the dataPacket. It builds off the dual start marker serial reading state machine provided by @blh64

const byte packetBytes = 19; // payload length (lengthID = 0x12 = 18) + 1 for the checksum
byte dataPacket[packetBytes];

const byte preambleBytes = 3;
byte preamble[preambleBytes];
int recvState = 0;
boolean newPreamble = false;
boolean newPacket = false;

void setup() {
  Serial.begin(57600);
  Serial2.begin(57600);
  Serial.println("<Arduino is ready>");
}

void loop() {
  findPreamble();
  readPacket();
  showPacket();
}

enum {START1, START2, LENGTH};

/*
   states
   START1 == waiting for startMarker1 to arrive, discard everything else
   START2 == startMarker1 has arrived, next byte should be startMarker2, else reset
   LENGTH == startMarker2 has arrived, next byte will be packet length byte, want 0x12
*/

void findPreamble() {
  if (newPreamble == false)
  {
    static byte ndx = 0;
    const byte startMarker1 = 0xAA;
    const byte startMarker2 = 0xAA;
    static byte lengthID = 0x12;
    byte rc;
    if (Serial2.available() > 0)
    {
      rc = Serial2.read();
      switch (recvState) {
        case START1:
          if (rc == startMarker1) {
            preamble[ndx] = rc;
            ndx++;
            recvState = START2;
          }
          break;

        case START2:
          if (rc == startMarker2) {
            preamble[ndx] = rc;
            ndx++;
            recvState = LENGTH;
          }
          else {
            // not sequential start markers
            ndx = 0;
            recvState = START1;
          }
          break;

        case LENGTH:
          if (rc == lengthID)
          {
            preamble[ndx] = rc;
            for (byte j = 0; j < preambleBytes; j++)
            {
              Serial.print(preamble[j], HEX);
            }
            newPreamble = true;
            // start from START1 again when the next preamble is searched for
            ndx = 0;
            recvState = START1;
          }
          else //did not receive AAAA12
          {
            recvState = START1;
            ndx = 0;
          }
          break;
      }
    }
  }
}

void readPacket()
{
  if (newPreamble == true)
  {
    if (Serial2.available() > 0)
    {
      Serial2.readBytes(dataPacket, packetBytes);
      newPacket = true;
    }
  }
}

void showPacket() {
  if (newPacket == true) {
    Serial.print("This just in ... ");
    for (int i = 0; i < packetBytes; i++) {
      if (dataPacket[i] < 0x10 )
        Serial.print( "0" );
      Serial.print(byte(dataPacket[i]), HEX);
    }
    Serial.println();
    newPacket = false;
    newPreamble = false;
    memset(dataPacket, 0, sizeof(dataPacket));
  }
}

@cattledog!!!!! Yes, this is working great!!!!!! Now off to test. I am sure more questions will follow.

@cattledog your approach will fail in some cases because your state machine does not trace back through its history when it gets an unwanted byte. You can't reliably recognize a sequence of n bytes with n>2 unless you remember at least n bytes. The issue is that you jump right back to the start of the state machine when you don't get what you expect, even though you might have been off by just 1 byte.

For example your code will fail to receive AA AA AA 12 .
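A quick way to see this without the hardware is a host-side replica of the reset-on-mismatch logic (plain C++, a sketch only; naiveFindsPreamble is a made-up name and the bytes come from an array instead of Serial2):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum State { START1, START2, LENGTH };

// Returns true if the reset-on-mismatch state machine reports the
// AA AA 12 preamble anywhere in `data`.
bool naiveFindsPreamble(const uint8_t* data, std::size_t len) {
  State state = START1;
  for (std::size_t i = 0; i < len; ++i) {
    const uint8_t rc = data[i];
    switch (state) {
      case START1:
        if (rc == 0xAA) state = START2;
        break;
      case START2:
        state = (rc == 0xAA) ? LENGTH : START1;
        break;
      case LENGTH:
        if (rc == 0x12) return true;
        state = START1;  // a third AA is thrown away here
        break;
    }
  }
  return false;
}
```

Fed AA AA 12 it reports the preamble; fed AA AA AA 12 it does not, because the third AA sends it back to START1 and the following 12 is then ignored.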

Worse, if you are unlucky and the payload contains the magic sequence as data (and there is no CRC, or the CRC in the byte stream happens to be correct by chance), you will get a false positive.

My code above with the circular buffer works with any byte sequence you define; it does not need to be ASCII. I used ASCII in the example because that is what you type in from the console for testing. In your case you would do

const uint8_t magicSequence[] = {0xAA, 0xAA, 0x12}; // expected preamble
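For testing off the board, the same rolling-window comparison can be wrapped in a small struct that is fed one byte at a time. This is only a host-side sketch in plain C++; PreambleScanner, preambleSeq and feed() are names invented here, and on the Arduino the byte would come from the stream's read().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const uint8_t preambleSeq[] = {0xAA, 0xAA, 0x12};  // expected preamble
const std::size_t windowSize = sizeof(preambleSeq);

struct PreambleScanner {
  uint8_t window[windowSize];
  std::size_t index;

  PreambleScanner() { reset(); }

  void reset() {
    std::memset(window, 0xFF, windowSize);  // 0xFF is not part of the preamble
    index = 0;
  }

  // Store one received byte, then compare the window from oldest to newest
  // against the preamble. Returns true when the last 3 bytes match.
  bool feed(uint8_t rc) {
    window[index] = rc;
    bool match = true;
    for (std::size_t i = 0; i < windowSize; ++i) {
      if (window[(index + 1 + i) % windowSize] != preambleSeq[i]) {
        match = false;
        break;
      }
    }
    index = (index + 1) % windowSize;
    return match;
  }
};
```

Feeding AA, AA, AA, 12 returns false three times and then true on the 12, which is the case the reset-on-mismatch state machine misses.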

The function call returns true once the magic sequence has been received, not consuming the next byte that is not part of the sequence. So you can use it as a replacement for findPreamble() and keep the rest of the code.

Note that this is still only partially OK: you may occasionally lose two messages at boot time (or, if you are truly unlucky, get a false positive when the CRC matches by chance) if you start listening in the middle of a payload and find the preamble inside the data. You need to be truly unlucky for that, but it could happen. Having a way to check the coherence of the data (attribute ranges, etc.) can help discard unwanted false positives.

If you keep a long enough history buffer (2 messages long), then after identifying a false positive you could trace back to where you made the incorrect assumption about the preamble and grab the next one in the history; that will likely get you in sync. That's more complicated to code, though, and memory-heavy if messages have variable lengths.

For example your code will fail to receive AA AA AA 12 .

+1 @J-M-L

Yes, when the checksum from the previous packet is AA, the AAAA12 preamble is not found and the circular buffer method would be superior.

I'm not certain this is a real problem with the data from this sensor. If you scan the previously attached data file called correct.txt, there are 4475 cases of AAAA04 and 12 of AAAA12.

There are 19 instances of AAAAAA04, but no instances of AAAAAA12.

If the AA checksum is a random occurrence (19/4475 is about 1/235, which is reasonably close to the 1/256 chance of any particular byte value), then the combined probability of seeing AAAAAA12 is very small. Further, the consequence of missing an AAAAAA12 sequence for the one second before the next AAAA12 sequence comes along may not be important.
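As a back-of-the-envelope check of that estimate, the expected count can be computed from the three numbers quoted from correct.txt. The sketch below is plain C++ and rests on an assumption that is mine, not measured: the byte in front of an AAAA12 preamble is AA at the same rate as in front of AAAA04. The function name is invented for illustration.

```cpp
#include <cassert>

// Expected number of AAAAAA12 sequences in a capture, ASSUMING the byte
// before an AAAA12 preamble is AA at the same rate as observed before the
// other preamble (aaPrefixedSeen out of preamblesSeen).
double expectedAaPrefixed(unsigned targetPreambles,
                          unsigned aaPrefixedSeen,
                          unsigned preamblesSeen) {
  return targetPreambles *
         (static_cast<double>(aaPrefixedSeen) / preamblesSeen);
}
```

With the counts above, expectedAaPrefixed(12, 19, 4475) comes out at roughly 0.05, so finding no AAAAAA12 in that file is what you would expect.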

The function call returns true once the magic sequence has been received, not consuming the next byte that is not part of the sequence. So you can use it as a replacement for findPreamble() and keep the rest of the code.

Yes, I would try to incorporate the circular buffer as @J-M-L suggests, and retain the readBytes() method for packet reading.

Your next adventure will be spent with the checksum and parsing of the actual data you want from the 18 byte payload.
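As a starting point for the checksum step, here is a hedged host-side sketch in plain C++. The rule it checks is an ASSUMPTION for illustration only (checksum byte = bitwise inverse of the low 8 bits of the payload sum); nothing in this thread confirms that this is what the sensor uses, so check the datasheet and swap in the real rule. checksumMatches is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMED rule, not confirmed for this sensor: the byte following the
// payload equals the inverted low byte of the sum of the payload bytes.
// `packet` holds payloadLen payload bytes followed by one checksum byte.
bool checksumMatches(const uint8_t* packet, std::size_t payloadLen) {
  uint8_t sum = 0;
  for (std::size_t i = 0; i < payloadLen; ++i) {
    sum = static_cast<uint8_t>(sum + packet[i]);  // keep only the low 8 bits
  }
  return static_cast<uint8_t>(~sum) == packet[payloadLen];
}
```

With the 19-byte dataPacket from the code above, the call would be checksumMatches(dataPacket, packetBytes - 1), again only if the assumed rule turns out to be the right one.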

Yes, my point is purely theoretical. Most of the time you'd be safe with such an approach, especially when nothing is safety-critical or when you control when the flow of data starts (i.e. you send a command and expect an answer) and can reissue the command if the answer was corrupted.

Also note that when N=3 (the number of bytes in the leading sequence) and the first two bytes are the same you can easily roll back at stage 1 or 2 based on what you got

At stage 0, if I get AA I move to stage 1.

At stage 1, if I get AA I move to stage 2; anything else means going back to stage 0.

At stage 2, if I get 12 I'm good; if it's AA I just stay at the same stage; anything else means going back to stage 0.

So in your code that would be just one extra test

          else // did not receive AAAA12
          {
            if (rc != startMarker2) { // another AA: stay in LENGTH; anything else: start over
              recvState = START1;
              ndx = 0;
            }
          }
          break;

If you can’t afford to miss good frames then a look back / look ahead method can be useful.

Also note that when N=3 (the number of bytes in the leading sequence) and the first two bytes are the same you can easily roll back at stage 1 or 2 based on what you got

Yes. Easy as pie.

When looking for the length byte you can finish/accept if the byte is what you want, reset to START1 if it is not correct, or stay in the LENGTH state if you receive AA. This allows for any number of sequential AAs.

        case LENGTH:
          if (rc == lengthID) // all good
          {
            preamble[ndx] = rc;
            for (byte j = 0; j < preambleBytes; j++)
            {
              Serial.print(preamble[j], HEX);
            }
            Serial.print(" ");
            newPreamble = true;
            // reset to START1 here, ready for the next preamble
            ndx = 0;
            recvState = START1;
          }
          else // did not receive AAAA12:
               // go back to START1 if not AA, hold at LENGTH if another AA
          {
            if (rc != startMarker2)
            {
              ndx = 0;
              recvState = START1;
            }
          }
          break;
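To convince yourself that holding at LENGTH handles runs of AA, the three states can be exercised on a PC with a few test bytes. This is a host-side sketch in plain C++ only; PreambleFsm and feed() are names invented here and stand in for findPreamble() reading from Serial2.

```cpp
#include <cassert>
#include <cstdint>

enum State { START1, START2, LENGTH };

struct PreambleFsm {
  State state = START1;

  // Feed one received byte; returns true when AA AA 12 has just completed.
  bool feed(uint8_t rc) {
    switch (state) {
      case START1:
        if (rc == 0xAA) state = START2;
        break;
      case START2:
        state = (rc == 0xAA) ? LENGTH : START1;
        break;
      case LENGTH:
        if (rc == 0x12) {
          state = START1;  // ready for the next preamble
          return true;
        }
        if (rc != 0xAA) state = START1;  // another AA holds at LENGTH
        break;
    }
    return false;
  }
};
```

Feeding AA AA AA AA 12 returns true on the final 12, while AA AA 55 12 never returns true.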