Need veterans opinions on Serial parsing code

Hi all, I've written a code for parsing serial data from the input and I'm pretty happy with it so far. However, this is going to be part of a bigger project which other people are going to be using within my university down the line and I was wondering if anyone would be able to point out some rookie errors in my code, or maybe a better alternative.

Essentially, the Arduino will be parsing commands from the Serial port and so i've written a quick Arduino code that will store any info typed in the Serial Monitor to a string. When this function is being used in the bigger project I will use masterCommand to tell the Arduino what I want it to do.

Here is the code:

#define CR 13
#define LF 10

//SERIAL_RX_BUFFER_SIZE has a default value of 64
char masterCommand[SERIAL_RX_BUFFER_SIZE];
bool commandAvailable = false;

void setup() {
  Serial.begin(57600);
  Serial.print(F("Buffer in max: "));
  Serial.println(SERIAL_RX_BUFFER_SIZE);

  delay(100);
}

void loop() {

  readSerial(masterCommand);

  if(commandAvailable)
  {
    Serial.println(F("Recieved command:"));
    Serial.println(masterCommand);
    Serial.print(F("Character count: "));
    Serial.println(strlen(masterCommand));
    Serial.println();
    
    //Need to do this to ensure s is empty for next message
    strncpy(masterCommand, "", SERIAL_RX_BUFFER_SIZE);

    commandAvailable = false;
  }
  
}

//Store any available data from Serial Input into *s
void readSerial(char *s) {

  //If nothing is available then return (i.e. do nothing)
  if (!Serial.available())
    return;

  static uint8_t i = 0;


  //While loop to continually read available data in serial buffer and store it into *s
  //If 'i' exceeds the maximum length of the input buffer, reset 'i' to 0 and clear any
  //previously recieved data from *s. 
  //If last two chars in *s are the termination characters, [\r \n], then break out of loop
  while (Serial.available() > 0) {

    s[i] = Serial.read();

    //using '%' ensures 'i' will never exceed length of *s
    i = (i+1)%(SERIAL_RX_BUFFER_SIZE);

    //if i returns to 0, clear *s as there has been a data overflow
    if(i == 0)
      strncpy(s, "", SERIAL_RX_BUFFER_SIZE);

    //Safety, i don't want next part of loop to check s[i-2] and s[i-1]
    //if i is 1 or 2 as it will be checking random memory location?
    if(i<2)
      continue;

    //If terminating char is at the end of string then break out of loop
    if (s[i - 2] == CR&&s[i - 1] == LF)
      break;
  }

  //if we're out of the loop but the last two characters aren't [\r \n] then 
  //exit function. This ensures the incoming message is only parsed once
  //the termination characters have been recieved
  if (s[i - 2] != CR&&s[i - 1] != LF)
    return;

  //remove final two chars from *s, which in this case must be the CR and LF
  strncpy(&s[strlen(s) - 2], "", 2); 

  i = 0;//Reset 'i' for next input string

  commandAvailable = true;
  
  return;

}

I'm aware that at the moment I have hardcoded my termination characters to be \r and \n, but I'm planning to always use these two as the termination characters so I'm quite happy with this.

Can anyone see any flaws with my code that could lead to problems down the line?

P.S. To check the code, make sure the line termination option in the Serial Monitor is 'Both NL & CR'

some random comments:

  • you don't maintain a trailing null char, that's a bad thing when playing with cStrings functions like strlen()
  • is there a requirement for your buffer to have the same size as the Serial buffer?
  • Received not Recieved
  • if you received two commands in a row rapidly, then \r\n might actually be within the incoming Serial buffer and you miss the first command as you check only when while (Serial.available() > 0) { is no longer true

This is not very optimal

just do masterCommand[0] = '\0';

I would suggest to study Serial Input Basics to handle this

If you pass in the buffer to read the command into, you should pass in its length too. Either use a static buffer and fixed size, or allow a dynamic buffer with dynamic size - not a mix of the two.

That's completely surprizing behaviour, wrapping the buffer. The best you can do if the buffer overflows is throw away the excess characters, and perhaps return an error code. Silently truncating the beginning of the command could turn one command into a completely different one, which sounds dangerous to me.

if you actually double check what's done after, OP detects the overflow and empty the command... so a good plain old if () {...} else {...} to test if we are are the end of the buffer would be more efficient than running a modulo all the time...

Some notes:

  1. You already know its weakest point. If there is only a CR or LF or neither, then it might not work.

  2. Are you using the SERIAL_RX_BUFFER_SIZE from the Serial library ? I suggest to have your own value.

  3. You don't need CR and LF, they are '\r' and '\n'.

  4. You don't need a global variable 'commandAvailable', if the function readSerial() would return a bool value.

  5. The buffer is called masterCommand, can you give it a better name ? What about commandBuffer, or inputBuffer or buffer.

  6. In my opinion, you are overthinking the code. Follow the KISS rule. If you don't need the % operator, then don't use it.

i++;
if( i == SERIAL_RX_BUFFER_SIZE)
{
  i = SERIAL_RX_BUFFER_SIZE - 1;  // set index to last character
}
  1. As soon as CR or LF is received, you know that the command has finished. You don't even have to store them, so you don't have to remove them later on.

  2. Is it always safe ? Does it always return a zero-terminated string ? Because I have mixed feelings with this: strncpy(&s[strlen(s) - 2], "", 2);

thankyou everyone, i'll respond as best as I can, but there are a couple of things I am confused by:

Ah yes, i think I see what you mean, could that simply be fixed by ensuring i index for the command string counts up to SERIAL_RX_BUFFER_SIZE - 2 as apposed to SERIAL_RX_BUFFER_SIZE -1?

not really, I thought it would be a good idea to use SERIAL_RX_BUFFER_SIZE as that ensured that I wasn't declearing a string with more characterts than the Serial port could ever store before being read.

my understanding of SERIAL_RX_BUFFER_SIZE is that the value, in this case, 64, is the maximum number of characters in the serial buffer and if the Arduino cannot have more than 64 waiting before they are read. i.e., I'm under the impression that Serial.available() can reach a maximum value of SERIAL_RX_BUFFER_SIZE, am I wrong there?

aside from that, the size of masterCommand is arbitrary.

From my understanding of the code, the final if() statement in the while()loop will ensure that no more characters are read once the first termination characters have been received. So, in the case of my code, I was under the impression that if two commands were received rapidly then the second command would be waiting in the serial buffer and will only be read after the initial command has finished executing

won't that just set the first character to '\0? I thought strncpy(s, "", SERIAL_RX_BUFFER_SIZE); would set every character (or at least SERIAL_RX_BUFFER_SIZE characters) to '\0'.

Yes, i agree this is a better approach, that way (as @Koepel has highlighted), I would be able to avoid the awkward situation of the Arduino never receiving the termination characters and waiting forever

I think in my case it would be as the code can only ever reach this point if the final two characters in *s are '\r' and '\n'. If if was an empty string then I wouldn't be comfortable using it either!

You are right, I missed that you were doing the if (s[i - 2] == CR && s[i - 1] == LF) test twice


Note that when you overflow, you do empty the command but start accumulating the next data that comes in, which might trigger something bad.

if that can inspire you, this is my version of this need:

const size_t maxCommandSize = 100;  // whatever makes sense for your application
char commandBuffer[maxCommandSize + 1]; // +1 for cString trailing null char

// empty the Serial buffer until nothing comes in for 'minTimeout' ms (and don't block more than 'maxWaitTime') 
void cleanupSerial(const uint32_t minTimeout, const uint32_t maxWaitTime = 2000) {
  uint32_t startTime = millis();
  uint32_t lastByteSeen = startTime;

  while (millis() - lastByteSeen < minTimeout)
    if (millis() - startTime >= maxWaitTime) break;
    else while (Serial.read() != -1) lastByteSeen = millis();
}

bool isCommandReady() {
  bool commandComplete = false;
  static size_t commandIndex = 0;
  int r = Serial.read(); 
  if (r != -1) {        // -1 if nothing to read
    if (r != '\r') {    // ignore CR
      if (r != '\n') {
        if (commandIndex < maxCommandSize) {
          commandBuffer[commandIndex++] = r;
        } else {
          cleanupSerial(20, 1000); // max 20ms between 2 consecutive bytes, waiting at max 1s
          strlcpy(commandBuffer, "error: buffer overlfow", sizeof commandBuffer);
          commandIndex = 0;
          commandComplete = true;
        }
      } else {
        commandBuffer[commandIndex] = '\0'; // maintain a cString
        commandIndex = 0;
        commandComplete = true;
      }
    }
  }
  return commandComplete;
}

void setup() {
  Serial.begin(115200);
}

void loop() {
  if (isCommandReady()) {
    Serial.print(F("Got: "));
    Serial.println(commandBuffer);
  }
}

upon buffer overflow I replace the command by an error message and try to get rid of what's left in the Serial buffer (or what's coming for some time) by calling my function cleanupSerial().

1 Like

The cleanupSerial() function is a cool way of clearing the serial input! and the idea of copying an error code to the string if the buffer overflows is also a good idea.

Thank you for the example code, the declaration at the beginning for the commandBuffer (char commandBuffer[maxCommandSize + 1];) looks like good practice for the future to ensure there is always space for the null character, so I'll keep note of that too.

glad if it can be useful!

First of all, I would like to appreciate your efforts to design a sketch that fantastically parses/prints a string once both CR and NL ('\r' and '\n') are detected; where,'\n' arrives as a last control charcater as a part of design of the Serial Monitor. I have tested your sketch and it works fine.

I have throughly studied your sketch line-by-line to understand the logic you have applied; however, I have not understood all. For example:

Once CR and LF have been detected, the while() loop will be exited and then the task would be of printing the array after inserting the null-byte at the appropriate place which you have done well.

However, before doing the above printing task, you have included the following lines in your sketch whose rationality I have not fully understood though they are neede for your sketch to work.

if (s[i - 2] != CR && s[i - 1] != LF)
    return;

Just for reference, I am giving below a short/simplified sketch that accepts/prints a string from the Serial Monitor once both CR ('\r') and LF ('\n') are detected.

char masterCommand[64];
int i = 0;
bool flag = false;

void setup()
{
  Serial.begin(57600);  //Bd must be as high as possible
}

void loop()
{
  while (Serial.available() > 0)
  {
    masterCommand[i] = Serial.read();
    if ((masterCommand[i] == '\n')) //Newline option at the Line ending tab of Serial Monitor
    {
      flag = true;
      break;
    }
    i++;
  }
  if (flag == true)
  {
    if (masterCommand[i - 1] == '\r')//CR = 0x0D
    {
      masterCommand[i - 1] = '\0'; //null-byte at position of '\r'
      Serial.println(masterCommand);
      memset(masterCommand, 0x00, 64);
      flag = false;
      i = 0;
    }
    else
    {
      memset(masterCommand, 0x00, 64);
      flag = false;
      i = 0;
    }
  }
}

Hi @GolamMostafa, thank you for the code! your method definitely seems more elagent than my orignial attempt. After playing around with it I found that it could also be simplified a bit further! Here is my revision of the example code you have provided:

#define BUFFER_MAX 64

char masterCommand[BUFFER_MAX];
int i = 0;
bool flag = false;

void cleanupSerial(const uint32_t minTimeout, const uint32_t maxWaitTime = 2000) {
  uint32_t startTime = millis();
  uint32_t lastByteSeen = startTime;

  while (millis() - lastByteSeen < minTimeout)
    if (millis() - startTime >= maxWaitTime) break;
    else while (Serial.read() != -1) lastByteSeen = millis();
}

void setup()
{
  Serial.begin(57600);  //Bd must be as high as possible
}

void loop()
{
  while (Serial.available() > 0)
  {
    masterCommand[i] = Serial.read();
    if ((masterCommand[i] == '\n')&&(masterCommand[i-1] == '\r')) //Newline option at the Line ending tab of Serial Monitor
    {
      flag = true;
      break;
    }
    
    i++;

    if(i < BUFFER_MAX)
      continue;
      
    Serial.println(F("Error: buffer overflow"));
    cleanupSerial(20,1000);
    memset(masterCommand, 0x00, BUFFER_MAX);
    i = 0;
  }

  if(flag == true)
  {
    //set \r and \n to \0, similar to my strncpy() method
    masterCommand[i - 1] = '\0'; //null-byte at position of '\r'
    masterCommand[i]     = '\0'; //null-byte at position of '\n'
    Serial.println(masterCommand);
    
    i = 0;
    memset(masterCommand, 0x00, BUFFER_MAX);
    flag = false;
  }
  
}

(note, i also added in the cleanUpSerial() function provided by @J-M-L earlier)

I found that the final if() statement could be reduced as some commands were being repeated in both the if() and else{} sections of the code, like setting flag = false and memset() for example.

Another simplification has come from using:

if ((masterCommand[i] == '\n')&&(masterCommand[i-1] == '\r')) { ... }

and is similar to the line I use in my code that you are asking about.

Essentially, it is only setting flag as true and breaking out of the while loop when both the previous character is '\r'and the current character is '\n'. in my original code, I was telling the code to exit the function if the previous characters were not what I wanted. So, I was doing this:

if ((masterCommand[i] != '\n')&&(masterCommand[i-1] != '\r')) 
     return;

//code that executes if terminating characters are present

return; //Return after everything is done

which is the same as:

if ((masterCommand[i] == '\n')&&(masterCommand[i-1] == '\r')) 
{
    //code that executes if terminating characters are present
}

return; //Return after everything is done

Although both of the above snippets are the same, the advantage of the first method is that it avoids executing a lot of code inside an if() statement, which is good practice for programming

1 Like

The real challenge in a command parser is to accept multiple ‘similar’ commands, mixed case, and a variable number of parameters… string or numeric, while rejecting incorrect combinations, and making a best guess to prompt the user what the error is.

I have a chunk of code that I reuse…
Accept a char buffer up until the CR is received.
Handle BS as needed.
THIS IS the 1st cOmmAnd

Trim, and Remove duplicate spaces/tabs and LFs if they’re not required.
Convert to all lowercase for parsing later…
Use strtok() to separate the ‘words’ into an array of parameters p - using SP, or other delimiters.
Now I have the whole command line as separated parameters…
[this] [is] [the] [1st] [command]

Now, scan my list of known keywords for matches with each of the parameters from left to right…
Each subsequent parameter is evaluated in context of the parameter preceding it…
If it’s a numeric variable

e.g. (VOLUME UP) is interpreted differently than (VOLUME 5), or (VOLUME +4)

This continues from param[0] through param[n] to identify and execute a very specific command with each term evaluated in context of what your asking or telling the ‘interpreter’

For example…
both
(SET VOLUME +8) is just as valid as (SHOW volume)

If I type (SET VOLUME), the parser will respond
SET VOLUME (?) unknown…
because it had no third value to act on.

This can be fine tuned with partial matches that are unambiguous..
e.g SHow, VOLume… etc.

Enjoy, there’s your homework.

Excellent work! This is how knowledge evolves. You have made an intial attempt to which @J-M-L has added valuable ingredients, @GolamMostafa, and others have also contributed some stuff. Finally, you have consolidated them all to get a sketch of your liking.

However, I prefer to like Arduino style of sketching; where, all global variables go at the top of the setup() function and the user-defined functions go below the loop() function.

Moreover, is it not enough to have one null-charcater at position '\r' to stop printing?

Just a side note - That’s OK to do if your command language is not using UTF8 extended charsets.

« Définir arrêter après Noël »
Might end up somewhat scrambled if you don’t test for ascii input

The latter is a bad advice/habit to give as this is NOT the C++ norm unless you pre declare the functions (which the IDE attempts to do for you but sometimes fail).

It's like in real life. If I tell you "Go talk to Fred", you will ask "who is Fred?". But if I had introduced Fred to you earlier, then you would know. And if there is more than one Fred, then you need some way to qualify this even further (last name, context, description...)

As you teach I believe it’s important to explain that the language requires that « stuff » need to be declared before they are used (and you can then explain the pre declaration, what the IDE does, the extern keyword, a function signature, and what a linker does or why we create and include .h files etc. It removes the black magic aspect and helps students understand what’s really happening).

1 Like

@J-M-L … Good point…
I use this mostly for simple ASCII command line parsers, but the next iteration should handle that if I live long enough !

Noted down with thanks.

It is very nice to see that all the code is going into the same direction (gather data in a buffer, set a flag when command is received, keep track of millis).
When the zero-terminator is set at the right position at the right time, then there is no need for memset(). I made a millis_serial_timeout.ino that accepts CR or LF or both (in any order) or none.

1 Like

No problem! I'm glad it can be of use.

Yes, the '\0' at the '\r' position will indeed stop the printing as it indicates the end of the string. I just like changing the '\n' to a '\0' as well because it feels weird to me to have a string end and have characters left over, but that is purely down to my own preference/ocd and isn't needed haha

1 Like