Serial Port - Buffering questions - "sliding" / ring buffer performance

Hello everyone,

for a device I made, I need to use a "sliding (or Ring) buffer to compare incoming data against a given string.
Hoever, I tend to lose characters if the opposite side is sending continuously. I know the serial buffer of the Atmega328P should be internally 64 byte, and I can receive up to 75 or 76 characters in a row without problems. Then it starts to skip single characters on reception, and I will miss my expected string to react upon.

Here is the related piece of code - note there is a "setupMode" path that's uncritical, since I have to switch between a "Setup" mode where I can enter configurations interactively via serial line, that will be parsed for commands and parameters, and a "Run" mode that does no parsing but just looks for the one expected string and reacts to it if found - only the latter one is of interest here.

...

// some globals used:

#define SER_LEN 20
String strbuf=""; 
String ser_expect = "expected_string"; 
String ser_send = "some_answer";
...

void doSerialPort() {
 
  char ch = ' ';

  // read one character from serial: ...
  while (Serial.available()) {
    ch = Serial.read();
    if ( setupMode ) {
      // ...
      // ... do parsing etc.
      // ... clear "strbuf" in the end again
      // ... not of interest, works well ...
    }
    else {
      // Run Mode - fill buffer to SER_LEN maximum, else rotate left:
      if ( strbuf.length() < SER_LEN ) {
        strbuf += ch;
      }
      else {
        strbuf = strbuf.substring(1, strbuf.length()) + ch;
      }
      if (ser_expect != "") {
      // try to find "ser_expect" in serial read buffer:
      if ( (strbuf.indexOf(ser_expect) >= 0 ) || ( ser_expect == "*") ) {
        // expected string found - check if some timer needs notification
        for (int i=0; i<NUM_TIMERS; i++) {
          if ( tmr[i].assign & B00000100) {
            tmr[i].events |= B00000100;
          }
        }
        // then clear serial read buffer:
        strbuf = "";
        if (ser_send != "") {
          // if there is something defined to send after having received the matching code, send it now:
          Serial.println(ser_send);
        }        
      }
     }
    }
  }
}

From pure logic, the code does work like it should. If characters come in one by one, everything is fine.
Now, I'd expect that a) using the convenient but bloaty "String" variables make things slow, and b) the shifting around of "strbuf" in general is too slow and will cause characters to be missed.

I'd be glad to see some hints how to squeeze the most performance out of this "else ...." code section, in order to not miss a character (at 115200Bd).

BR

stargar

Please post full code.

This has nothing to do with the String class being slow; it's actually fast. The rest of your code more than likely slows the reception down resulting in a buffer overflow of the software RX buffer (the 64 bytes). Note that that is a software buffer, not a hardware buffer in the 328P.

Your ring is implemented poorly. It seems a little like "schlemeil the painter". You are copying the entire string many times over.

Think of waiting by the mail slot at home and someone sends you a letter once per week. You are loking for "expected_string". If the first letter that comes through the slot is not an "e" then throw it away. Don't store it for later.

If you are waiting for several different messages then there may be several valid first letters. Then you might need to store only the matching letters. Usually you don't even do that because you always start the machine-to-machine messages with a single start char such as $ or <

Hello everyone,

I digged further into it and put together a runnable sketch. In "real life" this is a triple programmalble timer for laboratory use. Don't bother about some functionality that is not strictly required or just dropped here - I just replaced the loop() with something very minimal, and left out parts that are of no interest for time being.

SEE ATTACHMENT!

Main point of interest is still the doSerialPort() part. I have added some diagnostic output in the end of it.

Now, start the sketch, and throw the following string at it IN ONE CHUNK, i.e. copy and paste it into e.g. PuTTY, or fire it from the Serial Monitor in Arduino IDE:

12345678901234567890123456789012345678901234567890123456789012345678901234567890

I see the following output from that:

12Chars read per turn: 2
345678Chars read per turn: 6
901234Chars read per turn: 6
567890123456789012345678901234567890123456789012345678901234567890
 PARAMETER STRING ERROR

Chars read per turn: 68

That looks surprising, and not deterministic... any explanations?
Maybe the timer interrupt is coming in our way? I use the "MsTimer2" librar for a 1 millisecond interval.

As for the buffer shifting:
Morgan, you're not quite right... what I do is to first fill the buffer up to 20 characters, check the complete buffer after every new character if my "expected_string" is now contained, and after reaching 20 chars, shift the buffer 1 char to the left and append the new char at the end.

So, using an example size of 5 chars for demonstration, reading in "abcdefghij..." you would see this:
a
ab
abc
abcd
abcde
bcdef
cdefg
...

Let's say I'm expecting "def" to be contained, this will be the case after the 6th step.

Best regards

stargar

serial_testing_1.ino (16.2 KB)

MorganS:
Your ring is implemented poorly. It seems a little like "schlemeil the painter".

LOL. I had forgotten about him.

...R

OK, I opened up the code.

The 8-bit signed type which you are looking for is called int8_t, not char. You would more often see the unsigned version uint8_t but each of those 8/16/32/64 bit types has both signed and unsigned versions.

Which Arduino is this for? If it's for one with native USB serial such as a Micro or Due, then you're working with the USB buffers, which can contain 68 characters, unlike the hardware serial buffers that (usually) only hold 64 characters.

The USB system will package up the characters into packets, trying to optimise the performance of the link. That's how you get a large number seemingly appearing at once. It's actually working at a much higher baud rate than the 115200 that you asked for. That number is ignored when using native USB serial.

Robin2:
LOL. I had forgotten about him.

...R

Thanks, Robin. New to me!

Hello MorganS,

The 8-bit signed type which you are looking for is called int8_t, not char. You would more often see the unsigned version uint8_t but each of those 8/16/32/64 bit types has both signed and unsigned versions.

Which exact part are you referring to? I mean, "char" is a signed 8-bit type on Arduino Uno/Nano.
"byte" would be the unsigned one. And what's wrong with receiving "char"'s on the serial interface?

I'm on Arduino Uno (clone), basically, but also testing on Nano (clones).
You're probably right about mentioning the type of serial interface. The original Uno has that Atmega 16U2 (or so), while my Uno clones have something else - see here:

http://www.produktinfo.conrad.com/datenblaetter/1300000-1399999/001378656-sp-01-en-C_CONTROL_OPEN.pdf
or here:

I just found out that this chip has a 576 byte send and 576 byte receive buffer.
Still it looks surprising that it will first send 1 or 2 chars only, then 2x 6 chars, then the rest.
Test results stem from this Uno clone.

The el-cheapo Nano clone has the chinese CH340G, and I also run with a separate FTDI232-based Uno shield directly attached to pins 0/1 in the final device. So there may surely be differences in handling from the hardware side.
But overall, all these chips do not have a side channel to communicate with the Atmel 328P, right?
So the Serial.available() command can still only tell what it has seen coming in over pins 0/1 ...

Obviously, my assumption is wrong that when sending a continuous stream, or hunk of data, over the serial interface, it will be received equally continuously by the Arduino, and can be handled without leaving the receiving routine. Maybe I should try to prolong the receiving routine artificially to take at least one additional character duration @115200Bd, after one char has been received. So actually I'm rather too fast than too slow.

BR

stargar

Have a look at the examples in Serial Input Basics - simple reliable ways to receive data. You can change the value of numChars if you need to receive more than 32 chars.

An Arduino can work very much faster than serial data can arrive so there should be no trouble receiving long messages even with the normal 64 byte Serial Input Buffer.

...R

Maybe I should try to prolong the receiving routine artificially to take at least one additional character duration @115200Bd, after one char has been received.

That would be exactly the wrong way to do it. Your Arduino would spend all its time waiting for "one more" and never actually do anything else.

My comment about int8_t was just referring to the long comment in the code which explains why char is used in one of the buffers.

On the Uno and Nano then the discussion about USB packets doesn't apply. It doesn't matter if it has a 16U2 or a cheap anything on the front end. The chip which you are programming can't tell the difference.

The pattern of 2-6-6-LOTS is definitely weird. The lots of characters is a really long time in Arduino terms. If you have a delay(100) in there, then that would be understandable. I think it is something to do with your ring buffer algorithm using slow String operations. Convert it to strings and never move any data - use a pointer or index into the ring buffer or (better) change the algorithm so it doesn't have to store characters which were already examined and recognised/discarded.

You probably want "KMP string search on a stream", which I can find several useful descriptions and ada implementations, but nothing set up for the really austere arduino environment.
Sliding your string is a bad idea, it's much better to implement a st ring compare that wraps around the circle of a buffer...

Good grief.

  1. Even if you get the matching to work, your sketch will not run very long. The ways you are using the String class are going to cause dynamic memory fragmentation fairly quickly. It will freeze or restart at random times, or otherwise produce strange results. Ditch the String class.

  2. Just use a state machine to watch for the expected string as the characters arrive:

char    expected[30] = "def"; // watch for this
size_t  expectedLen  = strlen( expected );
size_t  matchCount   = 0;     // how many chars have matched so far


void setup()
{
  Serial.begin( 9600 );
}


void loop()
{
  if (Serial.available())
  {
    char c = Serial.read();

    if (foundExpected( c )) {
      Serial.print( F("Found ") );
      Serial.println( expected );
    }
  }

}


bool foundExpected( char c )
{
  bool found = false;

  if (c == expected[ matchCount ]) {

    matchCount++;
    if (matchCount == expectedLen) {
      found      = true;
      matchCount = 0;   // reset for next time
    }

  } else {
    matchCount = 0; // didn't match, start over
  }

  return found;
}

This doesn't even require storing the received data. matchCount is the Finite-State Machine "state" variable. It "remembers" how many characters have matched so far.

This saves lots of time, because you don't have to store each character, then search all the previous characters. Not only that, searching the previous characters would have compared each character multiple times (hence the schlemeil reference).

You can put this quick test at the point the characters are being read. It could set a flag that is used somewhere else to send the response string.

Cheers,
/dev

P.S. There is one extra thing you'll need if the match string contains substrings of itself. It's not typical, but you should be aware that watching for "defdefg" would not match an input of "defdefdefg".

For those interested, here's the extra testing for "backtracking" in the match function:

char    expected[30] = "defdefg"; // watch for this
size_t  expectedLen  = strlen( expected );
size_t  matchCount   = 0;     // how many chars have matched so far


void setup()
{
  Serial.begin( 9600 );
}


void loop()
{
  if (Serial.available())
  {
    char c = Serial.read();

    if (foundExpected( c )) {
      Serial.print( F("Found ") );
      Serial.println( expected );
    }
  }

}


bool foundExpected( char c )
{
  bool found = false;

  if (c == expected[ matchCount ]) {

    matchCount++;
    if (matchCount == expectedLen) {
      found      = true;
      matchCount = 0;   // reset for next time
    }

  } else {

    // Backtrack
    int origIndex = matchCount;
    while (matchCount > 0) {

      // First, search for the current char
      matchCount--;
      if (c == expected[matchCount]) {

        // Then check the previously found characters
        //   for the current submatch.
        size_t diff = origIndex - matchCount;
        size_t i;
        for (i = 0; i < matchCount; i++) {
          if (expected[i] != expected[i + diff])
            break;
        }

        // If we successfully got through the previous loop,
        //   we found a previous submatch.
        if (i == matchCount) {
          matchCount++;
          break;
        }
      }
    }
  }

  return found;
}

It's loosely based on the Stream method findMulti, which blocks. :stuck_out_tongue:

Hello everyone,

thans for your comments. (I'm just typing on my tablet so please apologise for some typos...)
I was thinking of the "counting without buffering" method myself, but then found just what you wrote in the last 2 replies - I couldn't directly have catered for , say, a sequence of equal characters in a row. Think of a "poor man's progress bar" out of a couple of dots, followed by something else. I probably need to react to the correct number of dots only - think of a device startup that signals success or failure by such a thing.
So the last post looks like a solution, though I ha e not yet thorughly gone through it yet.
I have made other interesting experiments.So I swithced to "string" instead of "String" at that point, wrote a "shift left" rotuine of my own, and brought the time to shift a 20-char string left down from 200microsecs to about 50microsecs. In this context, I noticed how much faster a do...while loop can be over a for... loop .
I'll be back when I reviewed the suggested code fully.

Thanks again so far for all the suggestions. I think dropping Strings completely is a good thing, though I need to reprogram many comfort functions manually... will take some time.

all the best

stargar

Hello all,

just to tell you all I've revamped my code now, and it works SO MUCH better using just "strings" vs. formerly "Strings".
The downside is, you must dig deeper into the bits, and ALWAYS be aware of the sizes of your arrays, and properly attaching a '\0' to the end manually in various cases.
And then, you must know in advance what sizes you would expect. Dynamic sizing is out of scope here.

One thing that I would want to ask from the regulars/Arduino Gurus here, would be to add a comprehensive chapter upon the "string" capabilities to the main Reference documentation. I know this is largely C/C++ stuff, and this is also how I got along by referring to various other sources on the net. But it would really help to ease things if you'd have a description for the "string" (char *) functions as well as for the "String" methods.
Just as well, I miss a description of the mathematical functions such as "exp" and "pow" and so on.
Really would like to see some reference to these.

Thank you all for your effort - you made my day, and saved my week :slight_smile:

Kind regards

stargar