Efficient Validation of Serial Data

For a while, I have been receiving serial data the same way. Serial data is often a control mechanism for an ATmega328P that, in order of priority, 1) keeps accurate time, 2) multiplexes an LED matrix, and 3) receives serial and does everything else. The timekeeping is done in an ISR, but the matrix update happens every 1388 microseconds (checked against micros()). For this reason, I can’t dwell on any other part of the program for too long. Serial commands have thus far been fixed width, partly for compatibility, but also to eliminate delimiter characters and reduce the overall string width. To receive them, I count characters and compare each one against a condition. The width is usually 6-9 characters: an initializing byte, 1-3 parameter bytes, and a numerical value in ASCII. It looks like this…

void updateBCD() {
  static unsigned long serialInTimestamp;
  static byte charsRead = 0;
  static char serialBuffer[6] = {'\0'};
  if (charsRead > 0) {
    if (millis() - serialInTimestamp >= 100UL) {
      memset(serialBuffer, '\0', sizeof serialBuffer);
      charsRead = 0;
    }
  }
  if (Serial.available()) {
    int c = Serial.read();
    if (charsRead == 0) {
      if (c == 'T') {// initializer
        serialInTimestamp = millis();
        serialBuffer[charsRead] = c;
        charsRead++;
      }
      else {
        memset(serialBuffer, '\0', sizeof serialBuffer);
        charsRead = 0;
      }
    }
    else if (charsRead == 1) {
      if (c == 'D') {// parameter
        serialBuffer[charsRead] = c;
        charsRead++;
      }
      else {
        memset(serialBuffer, '\0', sizeof serialBuffer);
        charsRead = 0;
      }
    }
    else if (charsRead >= 2 && charsRead <= 5) {
      if (c >= '0' && c <= '9') {
        serialBuffer[charsRead] = c;
        charsRead++;
      }
      else {
        memset(serialBuffer, '\0', sizeof serialBuffer);
        charsRead = 0;
      }
    }
  }
  if (charsRead > 5) {
    if (serialBuffer[0] > 0) {
      byte receivedBcdValue[4];
      receivedBcdValue[0] = (serialBuffer[2] - '0');
      receivedBcdValue[1] = (serialBuffer[3] - '0');
      receivedBcdValue[2] = (serialBuffer[4] - '0');
      receivedBcdValue[3] = (serialBuffer[5] - '0');
      if (receivedBcdValue[0] != bcdValue[0] || receivedBcdValue[1] != bcdValue[1] || receivedBcdValue[2] != bcdValue[2] || receivedBcdValue[3] != bcdValue[3]) {
        bcdValue[0] = receivedBcdValue[0];
        bcdValue[1] = receivedBcdValue[1];
        bcdValue[2] = receivedBcdValue[2];
        bcdValue[3] = receivedBcdValue[3];
      }
    }
    memset(serialBuffer, '\0', sizeof serialBuffer);
    charsRead = 0;
  }
}

Each time the project changes function or control, the command string changes, so for each project I have to code this a little differently. It can be time-consuming to code.

Ideally, I would like to write a class with a constructor that establishes the fixed width and valid pattern, and a method that is called, like this function, to receive each byte of serial data and validate it.
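
A minimal sketch of what such a class might look like, assuming a made-up pattern syntax in which '#' stands for any ASCII digit and every other character must match literally. The names FixedWidthParser, feed(), reset(), count() and buffer() are hypothetical and not taken from any library. Unlike the function above, this version lets a rejected byte start a new command if it matches the first pattern character, and it leaves the timeout to the caller.

```cpp
#include <stdint.h>

// Hypothetical fixed-width command validator (illustration only).
// Pattern syntax is an assumption: '#' matches any ASCII digit,
// every other pattern character must match the received byte exactly.
class FixedWidthParser {
public:
  static const uint8_t kMaxWidth = 16;  // longest command this sketch accepts

  explicit FixedWidthParser(const char *pattern)
    : pattern_(pattern), width_(0), count_(0) {
    // The fixed width is simply the pattern length, capped at kMaxWidth.
    while (width_ < kMaxWidth && pattern_[width_] != '\0') width_++;
    buffer_[0] = '\0';
  }

  // Feed one received byte. Returns true only when this byte completes
  // a valid command; the command can then be read from buffer().
  bool feed(char c) {
    if (width_ == 0) return false;
    if (count_ >= width_) count_ = 0;  // previous command done, start over
    if (!matches(pattern_[count_], c)) {
      count_ = 0;
      // Let the rejected byte begin a new command if it fits position 0.
      if (!matches(pattern_[0], c)) return false;
    }
    buffer_[count_++] = c;
    if (count_ < width_) return false;
    buffer_[count_] = '\0';  // terminate so buffer() is a valid C string
    return true;
  }

  void reset() { count_ = 0; }               // e.g. on a receive timeout
  uint8_t count() const { return count_; }   // bytes accepted so far
  const char *buffer() const { return buffer_; }

private:
  static bool matches(char p, char c) {
    return (p == '#') ? (c >= '0' && c <= '9') : (c == p);
  }

  const char *pattern_;
  uint8_t width_;
  uint8_t count_;
  char buffer_[kMaxWidth + 1];
};
```

With a parser declared as FixedWidthParser parser("TD####"), the receive function would shrink to something like: if Serial.available() and parser.feed(Serial.read()) returns true, convert buffer()[2] through buffer()[5] to BCD. The 100 ms timeout from the original function could be kept by calling reset() whenever count() is non-zero and the timestamp has expired.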

I have looked at various regular expression tutorials, but most of them seem to have the goal of returning the pattern match in a large body of text.

Can the way I am doing this be improved?

You could have your class take a pointer to a callback function that does this validation. It could be called with charNum and RcdChar as parameters and return true/false. If false, it is invalid and the class resets itself to start over.

Although, I don’t see a huge benefit since this callback function will have to always change…
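
To make the callback idea concrete, here is a rough sketch under some assumptions: the names CallbackReceiver, CharValidator and validateTD are invented for illustration, and the buffer size is arbitrary. The class only counts bytes and stores them; all command-specific knowledge lives in the callback, which receives charNum and rcdChar and returns true or false.

```cpp
#include <stdint.h>

// Hypothetical callback signature: position within the command, received char.
typedef bool (*CharValidator)(uint8_t charNum, char rcdChar);

class CallbackReceiver {
public:
  static const uint8_t kMaxWidth = 16;  // arbitrary limit for this sketch

  CallbackReceiver(uint8_t width, CharValidator isValid)
    : width_(width), isValid_(isValid), count_(0) {
    if (width_ > kMaxWidth) width_ = kMaxWidth;
    buffer_[0] = '\0';
  }

  // Feed one byte. Returns true when this byte completes a valid command.
  // An invalid byte is discarded and the receiver starts over.
  bool feed(char c) {
    if (count_ >= width_) count_ = 0;  // previous command done, start over
    if (width_ == 0 || !isValid_(count_, c)) {
      count_ = 0;
      return false;
    }
    buffer_[count_++] = c;
    if (count_ < width_) return false;
    buffer_[count_] = '\0';
    return true;
  }

  const char *buffer() const { return buffer_; }

private:
  uint8_t width_;
  CharValidator isValid_;
  uint8_t count_;
  char buffer_[kMaxWidth + 1];
};

// Example validator for the 'T', 'D', four-digit command in the original post.
bool validateTD(uint8_t charNum, char rcdChar) {
  if (charNum == 0) return rcdChar == 'T';
  if (charNum == 1) return rcdChar == 'D';
  return rcdChar >= '0' && rcdChar <= '9';
}
```

A receiver for the original command would be declared as CallbackReceiver receiver(6, validateTD). As noted, the callback still has to be rewritten per project, but the counting, buffering and reset logic would no longer be repeated.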

FYI… you don’t need to constantly reset your serialBuffer (memset) since charsRead dictates where things are stored. Resetting that to zero is sufficient.

You also do an unnecessary check near the end

 if (charsRead > 5) {
    if (serialBuffer[0] > 0) {

serialBuffer[0] must always be ‘T’ in this case, or you would not be at that place in the code.

You can simplify this process by using SerialTransfer.h to automatically packetize, parse, and validate your serial data. SerialTransfer also allows you to customize the format of the packets (to include changing the overall length of the packet) - just what you’re looking for.

It’s fast, non-blocking, validates the data, everything you need in an easy to use library.

Installation Guide

I would almost rather make charsRead derivative of strlen to eliminate the redundancy than take away the memset. It goes back to a multi-point to multi-point wireless system I had coded for a factory halfway across the US… I made a mistake in the code and essentially wrote over the NULL terminator. When this happens, any of the bytes ever written to a part of the string (because luckily I never wrote over the final NULL) would become part of the new string. I had to fly across the country, paying for my own travel and lodging, to fix the mistake. memset ensures the buffer is erased once the data is no longer needed. I can’t find a more efficient way to do that.

SerialTransfer uses delimiters and byte stuffing, which I explained in the beginning were things I cannot afford. I haven’t written this off, because I value the source code. I would like to use some of the source code in a library for fixed-width, if I may.

Do you really need a packet every 1388 microseconds? I'm not necessarily doubting since I'm not familiar with your project, but still.

If you use 1M baud, you could theoretically process about 173 bytes every 1388 microseconds (counting 8 bits per byte; closer to 138 once 8N1 start and stop bits are included) - plenty of processing for a modest SerialTransfer packet.

Either way, feel free to use the source code as you wish as long as it isn't for any nefarious purposes (mandatory disclaimer, lmao)

Yes, every 1388 microseconds... makes it 120Hz refresh on the entire matrix, which eliminates moire patterns when videographed at most common camera framerates. Even if it seems arbitrarily ridiculous, it's a written specification that I have to meet.
I can't use 1M baud... 9600 baud, sometimes even 1158 baud, to be compatible with some existing equipment. At 9600 baud, 1 byte is sent every 1041.66 microseconds, so we aren't trying to keep it brief to fit it between refresh. The sentence structure and format come from a legacy protocol that was in place in the systems before adding LED digital readouts.
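
The byte-time figures in this exchange can be checked with a little arithmetic. The sketch below assumes 8N1 framing by default (10 bits on the wire per byte: start, 8 data, stop); the helper names microsPerByte and bytesPerInterval are made up for illustration.

```cpp
#include <stdint.h>

// Microseconds needed to transmit one byte at a given baud rate.
// bitsPerByte defaults to 10, which assumes 8N1 framing.
constexpr double microsPerByte(uint32_t baud, uint8_t bitsPerByte = 10) {
  return 1000000.0 * bitsPerByte / baud;
}

// Whole bytes that can arrive within one interval of the given length.
constexpr uint32_t bytesPerInterval(uint32_t baud, double intervalMicros,
                                    uint8_t bitsPerByte = 10) {
  return static_cast<uint32_t>(intervalMicros / microsPerByte(baud, bitsPerByte));
}
```

Under that assumption, microsPerByte(9600) comes out at roughly 1041.67, and bytesPerInterval(9600, 1388.0) is 1, so at most one byte arrives per matrix refresh. At 1M baud, bytesPerInterval(1000000, 1388.0) is 138 with 8N1 framing, or 173 if only the 8 data bits are counted.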

Doing things the hard way I suppose... Good luck