How can I read bytes until end of text delimiter?

If I have to read data via a serial interface, I usually use Serial.readBytesUntil(), as explained in this example (here reading until a line feed \n is found):

const int BUFFER_SIZE = 100;
char buf[BUFFER_SIZE];

void setup() {
  Serial.begin(9600); // opens serial port, sets data rate to 9600 bps
}

void loop() {
  // check if data is available
  if (Serial.available() > 0) {
    // read the incoming bytes:
    int rlen = Serial.readBytesUntil('\n', buf, BUFFER_SIZE);

    // prints the received data
    Serial.print("I received: ");
    for(int i = 0; i < rlen; i++)
      Serial.print(buf[i]);
  }
}

However, now I have to read until end of text (ETX) is reached. Unlike line feed (\n), ETX has no dedicated escape sequence of its own, since it is a non-printable control character.

I tried to pass ETX in HEX representation which is 0x03 (int rlen = Serial.readBytesUntil(0x03, buf, BUFFER_SIZE);) however it did not work and I'm not sure how Serial.readBytesUntil() handles the representation of the char passed.

Any ideas?

See Serial input basics - updated

@UKHeliBob I already scanned that post but could not find a solution for my problem. What section do you want to reference?

Use '\x03' (hex) or '\003' (octal)

https://en.cppreference.com/w/cpp/language/escape
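As a quick sanity check, here is a minimal sketch in plain C++ showing that the hex escape, the octal escape, and the integer literal all denote the same byte, so any of them can be passed as the terminator. The helper name `etxSpellingsAgree` is made up for illustration only.

```cpp
#include <cassert>

// Hex escape, octal escape, and the plain integer literal all denote byte value 3.
static_assert('\x03' == '\003', "hex and octal escapes agree");
static_assert('\x03' == 0x03, "escape and integer literal agree");

// Hypothetical helper so the same comparison can also be made at runtime.
bool etxSpellingsAgree() {
  const char terminator = 0x03;  // the value after conversion to char
  return terminator == '\x03' && terminator == '\003';
}
```

So if reading until 0x03 appears not to work, the spelling of the terminator is unlikely to be the cause.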

On the transmitter (sender) side, the following statements should be executed in sequence so that the receiver detects 0x03 (the ASCII code of ETX, Fig-1).

char myData[] = "abcd"; //for example
#define ETX 0x03
Serial.print(myData);
Serial.write(ETX);


Figure-1:

I tried but it did not work.

I tried the initial example (read bytes until \n) with Termite as the terminal, using these settings:

[screenshot of the Termite settings]

and \n in the overall string but got:

What am I doing wrong during debugging?

It looks like you typed "bulb\nsomethingdifferent" into Termite. That is what Termite sent to the Arduino and that is what Arduino echoed back. To me, that means that Termite is not treating the backslash as an escape character.

Try turning on the "Append LF" feature. Linefeed (LF) is another name for New Line ('\n'). Then when you send "bulb" (followed by Enter) and "somethingdifferent" (followed by Enter) they will show up in separate reads.
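To illustrate the difference, here is a small host-side sketch in plain C++ (not Arduino code; the names `typedText`, `withLineFeed` and `bytesBeforeLineFeed` are made up for this example). Typing a backslash followed by n sends two printable bytes, while a real line feed is the single byte 0x0A.

```cpp
#include <cassert>
#include <cstring>

// What the terminal sends when backslash + n is typed literally: two printable bytes.
const char typedText[] = "bulb\\nsomethingdifferent";
// What arrives when a real line feed (0x0A) separates the two words.
const char withLineFeed[] = "bulb\nsomethingdifferent";

// Returns how many bytes precede the first line feed, or -1 if there is none.
int bytesBeforeLineFeed(const char* text) {
  const char* lf = std::strchr(text, '\n');
  return lf ? static_cast<int>(lf - text) : -1;
}
```

In the first string there is no line feed for readBytesUntil('\n', ...) to stop at, so the read only ends when the timeout expires.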

Does this also work if I want to read until \x03?

Yes, in the same manner, but with the terminator you need ('\n' is equivalent to 0x0A, i.e. decimal 10).

int rlen = Serial.readBytesUntil(0x03, buf, BUFFER_SIZE);

Or, as @johnwasser wrote:

int rlen = Serial.readBytesUntil('\x03', buf, BUFFER_SIZE);

@cotestatnt @johnwasser

const int BUFFER_SIZE = 100;
char buf[BUFFER_SIZE];

void setup() {
  Serial.begin(9600); // opens serial port, sets data rate to 9600 bps
}

void loop() {
  // read the incoming bytes until default timeout 1000ms:
  int rlen = Serial.readBytesUntil(0x03, buf, BUFFER_SIZE);

  if (rlen != 0) {
    // prints the received data
    Serial.print("I received: ");
    for(int i = 0; i < rlen; i++)
      Serial.print(buf[i]);
  }

}

The function works like this:

0x62 0x6c 0x75 0x62 0x03 0x62 0x6c 0x61 is blub 0x03 bla

However this is not reading until! It's reading until and reading since! One would expect that the "bla" part is not placed into the buffer.

A workaround is to stick to readBytes():

const int BUFFER_SIZE = 100;
char buf[BUFFER_SIZE];

void setup() {
  Serial.begin(9600); // opens serial port, sets data rate to 9600 bps
}

void loop() {
  // read the incoming bytes (returns when buf is full or after the default 1000 ms timeout):
  int rlen = Serial.readBytes(buf, BUFFER_SIZE);

  if (rlen != 0) {
    // prints the received data
    Serial.print("I received: ");
    for(int i = 0; i < rlen; i++) {
      if (buf[i] == 0x03)
        break;
      Serial.print(buf[i]);
    }
  }
}

I think it is working as it should. Two passes through loop() occurred here. On the first pass, readBytesUntil() stops at 0x03, so buf receives 0x62 0x6c 0x75 0x62 (blub) and rlen is 4; "I received: blub" is then printed. But the serial receive buffer is not empty: it still contains 0x62 0x6c 0x61. So on the second pass, readBytesUntil() reads the rest of the contents until it times out, and thus prints another "I received: bla".

So, as you found out, you have to read the whole contents of the serial buffer and empty it before parsing the data. A better alternative is to use NeoHW Serial, which gives you control over what is actually written into the buffer that you define.

I disagree. If a method is called readBytesUntil() I expect that the method "fires" via rlen != 0 only once and that it does only put chars into the buffer until the delimiter is found.

I could not find "NeoHW Serial" in the libs section of the docs. As your recommendations are workarounds for the method not functioning as its name suggests, I'll simply implement readBytes() myself. Anyway... thanks for helping out.

Yes, you are correct: it fires only once within the same pass through loop(). However, two passes occurred in your code, hence rlen becomes non-zero again on the second pass. What I was pointing out is that readBytesUntil() is working as it should. Your sample code is wrong to begin with, yet you are expecting a correct result.

The flow of the code is ....

  1. You go into void loop(); nothing happens, as the hardware serial ring buffer does not yet contain any data.
  2. You send blub, ETX, bla; the hardware serial ring buffer now contains all of it.
  3. Serial.readBytesUntil() reads until ETX and stops reading from the serial ring buffer. It discards the ETX, copies "blub" into buf and returns 4 as rlen; however, the serial ring buffer still contains "bla".
  4. "I received: " is then sent out to the serial port.
  5. "blub" is then sent out to the serial port.
  6. So you get "I received: blub" on the serial monitor.
  7. It goes back into the loop; the serial ring buffer still contains "bla".
  8. At this point, Serial.readBytesUntil() copies the remaining "bla" into buf and returns after it times out.
  9. Your buf now contains the new value "bla" and rlen is 3.
  10. "I received: " is then sent out to the serial port.
  11. "bla" is then sent out to the serial port.
  12. So two passes through loop() occurred before the hardware serial ring buffer became empty.
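The steps above can be reproduced off the board with a small host-side model. This is only a sketch in plain C++: `FakeSerial` and `twoPasses` are invented for illustration and are not Arduino classes, and the timeout is modelled simply as "no more bytes pending". The stopping rules are meant to mirror the behaviour described above: stop at the terminator (which is discarded), on timeout, or when the destination is full.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Host-side stand-in for the serial receive buffer; NOT the Arduino class.
struct FakeSerial {
  std::deque<char> pending;

  void inject(const std::string& bytes) {
    pending.insert(pending.end(), bytes.begin(), bytes.end());
  }

  // Models a timed read: returns -1 when nothing arrives before the timeout.
  int timedRead() {
    if (pending.empty()) return -1;
    unsigned char c = static_cast<unsigned char>(pending.front());
    pending.pop_front();
    return c;
  }

  // Stops on the terminator (discarded), on timeout, or when dest is full.
  std::size_t readBytesUntil(char terminator, char* dest, std::size_t length) {
    std::size_t count = 0;
    while (count < length) {
      int c = timedRead();
      if (c < 0 || static_cast<char>(c) == terminator) break;
      dest[count++] = static_cast<char>(c);
    }
    return count;
  }
};

// Runs two consecutive loop() passes over the bytes "blub", ETX, "bla"
// and records what each pass placed into buf.
std::string twoPasses() {
  FakeSerial serial;
  serial.inject(std::string("blub\x03") + "bla");
  char buf[100];
  std::string log;
  for (int pass = 0; pass < 2; ++pass) {
    std::size_t rlen = serial.readBytesUntil('\x03', buf, sizeof buf);
    log += "[" + std::string(buf, rlen) + "]";
  }
  return log;
}
```

In this model the first pass yields "blub" and the second yields "bla". The return value alone does not tell the caller whether a read ended at the terminator or at the timeout.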

In your workaround code, the first pass copies the entire contents of the serial ring buffer into buf, so on the second pass there is no data left to read from the hardware serial ring buffer. However, if you send more than 100 characters, you get the same result.

Yes, but that's exactly what I'm talking about. "bla" is not followed by 0x03. If I had sent blub0x03bla0x03, this behaviour would of course have been OK.

Anyway... in my code I'm explicitly checking for intermediate delimiters and simply read the rest of the input buffer without considering it. I'm interested in an intermediate string containing a value which is prefixed and postfixed with a delimiter. The code looks somewhat like

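  // beforeValueDelimiter and afterValueDelimiter are chars defined elsewhere
  char bufferOfInterest[255];
  int byteCount = 0;
  enum State { idle, value, finished };
  State state = idle;

  while (Serial1.available() > 0) {
    char c = Serial1.read();  // every state consumes exactly one byte
    if (state == idle) {
      // skip everything up to and including the opening delimiter
      if (c == beforeValueDelimiter) {
        state = value;
      }
    }
    else if (state == value) {
      if (c == afterValueDelimiter) {
        state = finished;  // the closing delimiter itself is not stored
      }
      else if (byteCount < (int)sizeof(bufferOfInterest) - 1) {
        bufferOfInterest[byteCount] = c;  // bounds-checked, leaves room for '\0'
        byteCount++;
      }
    }
    // state == finished: the byte was read and is discarded to empty the buffer
  }
  bufferOfInterest[byteCount] = '\0';  // terminate so it can be used as a C string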

In the case of the second loop, Serial.readBytesUntil() times out.

Yes, of course. However, in my understanding of a clean implementation of readBytesUntil(), if there is a timeout the data received should not be considered valid and the buffer should be cleared. Otherwise it's not reading until an explicit delimiter.
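For anyone who wants that stricter behaviour, one possible approach is to collect bytes one at a time and report a frame only when the delimiter itself has arrived. The following is a minimal sketch in plain C++ so it can be tried off the board; `FrameAccumulator` and `framesIn` are hypothetical names, and the 99-byte capacity and the "drop oversized frames" policy are assumptions, not anything the Arduino core provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reports a frame only when the terminator has been
// seen, never because of a timeout.
class FrameAccumulator {
 public:
  explicit FrameAccumulator(char terminator) : terminator_(terminator) {}

  // Feed one received byte. Returns true exactly when a complete,
  // non-truncated frame is ready; frame() is valid until the next feed().
  bool feed(char c) {
    if (c == terminator_) {
      bool complete = !overflowed_;
      buffer_[length_] = '\0';
      length_ = 0;
      overflowed_ = false;
      return complete;
    }
    if (length_ < kCapacity) {
      buffer_[length_++] = c;
    } else {
      overflowed_ = true;  // assumed policy: an oversized frame is dropped
    }
    return false;
  }

  // Call this after a timeout to throw away an unterminated partial frame.
  void discardPartial() { length_ = 0; overflowed_ = false; }

  const char* frame() const { return buffer_; }

 private:
  static constexpr std::size_t kCapacity = 99;
  char buffer_[kCapacity + 1] = {};
  std::size_t length_ = 0;
  bool overflowed_ = false;
  char terminator_;
};

// Feeds a byte sequence and returns every complete ETX-terminated frame,
// each followed by '\n'. Unterminated trailing bytes are not reported.
std::string framesIn(const std::string& bytes) {
  FrameAccumulator acc('\x03');
  std::string out;
  for (char c : bytes) {
    if (acc.feed(c)) out += std::string(acc.frame()) + "\n";
  }
  return out;
}
```

On an Arduino, the same class could presumably be fed with Serial.read() while Serial.available() > 0, calling discardPartial() when no byte has arrived for longer than whatever timeout you choose.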