Implementing a binary serial protocol

Hello

I'm working on a serial protocol to send some data back and forth between an ESP8266 and an Arduino or Teensy.
Some context: I'm building a control surface, so the ESP handles the WiFi connection, hosts a webpage, creates UDP connections etc. The Arduino takes care of I/O: it controls motorized faders, keeps track of rotary encoder positions, reads analog values from potentiometers ... (more info here)

First the ESP sends setup information to the Arduino, e.g. "read analog values on pins A1-A8".
If those values change, the Arduino will send back the new value to the ESP.
The ESP can also send new fader positions to the Arduino, and the Arduino should then adjust the position of the motorized faders accordingly.

The messages look like this:
1lll cccc 0ddd dddd ... 0ddd dddd 0ppp pppp
Where lll = length, cccc = command, ddddddd = data and ppppppp = parity (simple XOR)

If the error check fails during the setup phase, the ESP resends the message; if it fails for a normal change-of-value packet, the packet is simply ignored.
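As a rough illustration of the frame layout described above, here is a minimal sketch that builds a packet of any length into a plain byte buffer instead of using one struct per length. The function name `buildFrame` and its parameters are hypothetical (not part of the attached sketches), and the parity is assumed to be the XOR of the header and all data bytes, masked to 7 bits, as in the structs below. It is plain C++ so it can be tried on a PC; on the ESP the resulting buffer would be handed to Serial.write().

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not taken from the original sketches.
// Writes one frame to `out`: header 1lllcccc, then `len` data bytes with
// MSB = 0, then a 7-bit XOR parity byte.
// Returns the number of bytes written, or 0 if the arguments do not fit.
std::size_t buildFrame(uint8_t command, const uint8_t *data, std::size_t len,
                       uint8_t *out, std::size_t outSize) {
  if (command > 0x0F || len > 7 || outSize < len + 2) {
    return 0;  // command is 4 bits, length is 3 bits
  }
  const uint8_t header =
      static_cast<uint8_t>(0x80 | (len << 4) | command);
  out[0] = header;
  uint8_t parity = header & 0x7F;
  for (std::size_t i = 0; i < len; i++) {
    const uint8_t d = data[i] & 0x7F;  // only the header may have MSB = 1
    out[1 + i] = d;
    parity ^= d;
  }
  out[1 + len] = parity;  // MSB is 0 because every term was masked to 7 bits
  return len + 2;
}
```

Masking each data byte as it is copied replaces the loop over the raw struct bytes, and the same function covers one-byte, two-byte and fader messages.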

I'd like some help to actually implement this protocol. I've made a prototype, but there's a lot of room for improvement, so I am open to any constructive criticism.

I've included the most important parts of my (working) code, and added the rest as attachments.

// ESP8266 (master)

#define UINT unsigned int

struct OneByteMsg {
  UINT command : 4;
  UINT length : 3;
  UINT start : 1;

  UINT data1 : 7;
  UINT continue1 : 1;

  UINT xorCheck : 7;
  UINT stop : 1;

  OneByteMsg(int cmd, int d1) : command(cmd), data1(d1) {
    start = 1;
    length = 1;
    continue1 = 0;
    stop = 0;
    xorCheck = 0x7F & ( (length << 4 | command ) ^ (data1) );
  }
};

struct TwoByteMsg {
  UINT command : 4;
  UINT length : 3;
  UINT start : 1;

  UINT data1 : 7;
  UINT continue1 : 1;

  UINT data2 : 7;
  UINT continue2 : 1;

  UINT xorCheck : 7;
  UINT stop : 1;

  TwoByteMsg(int cmd, int d1, int d2) : command(cmd), data1(d1), data2(d2) {
    start = 1;
    length = 2;
    continue1 = continue2 = 0;
    stop = 0;
    xorCheck = 0x7F & ( (length << 4 | command ) ^ (data1) ^ (data2) );
  }
};

struct faderMsg {
  UINT command : 4;
  UINT length : 3;
  UINT start : 1;

  UINT motor1 : 7;
  UINT continue1 : 1;

  UINT motor2 : 7;
  UINT continue2 : 1;

  UINT motorPWM : 7;
  UINT continue3 : 1;

  UINT fader : 7;
  UINT continue4 : 1;

  UINT touch : 7;
  UINT continue5 : 1;

  UINT xorCheck : 7;
  UINT stop : 1;

  faderMsg(int m1, int m2, int mp, int fd, int tch) {

    motor1 = 0x7F & m1;
    motor2 = 0x7F & m2;
    motorPWM = 0x7F & mp;
    fader = 0x7F & fd;
    touch = 0x7F & tch;

    start = 1;
    for (int i = 1; i < sizeof(*this); i++) { // all data bytes have MSB = 0
      ((uint8_t*)(this))[i] &=  0b01111111;
    }

    command = FADER;
    length = sizeof(*this) - 2;

    xorCheck = 0;
    for (int i = 0; i < sizeof(*this) - 1; i++) { // XOR all bytes together for a parity check
      xorCheck ^= ((uint8_t*)(this))[i] & 0x7F;
    }
  }
};
// ESP8266 (master)

faderMsg fdr(24, 25, 26, 27, 28);

  do {
    Serial.write((uint8_t*)&fdr, 7);
    delay(10);
  } while(Serial.read() != ACK);
// Arduino (slave)

uint8_t values[8];
int index = 0;
int length = 0;
int command = 0;
int check = 0;
void loop() {
  if (Serial1.available() > 0) {
    uint8_t read = Serial1.read();
    if (read >> 7 & 1 == 1) { // if it's a start byte (MSB = 1)
      index = 0;
      length = read >> 4 & 0b111;
      command = read & 0b1111;
    } else if (index == length) { // Last byte (parity check)
      if (read ^ check == 0) { // parity OK
        parseMsg(length, command, values);
      }
    } else { // normal data byte
      values[index++] = read;
      check ^= read;
    }    
  }
}
// Arduino (slave)
void parseMsg(int length, int command, uint8_t* values) {
  Serial.println("Message received:\r\n-----------------");
  switch (command) {
    
    case OUT:
      if (length == OUT_LEN) {
        // ...
        sendAck();
      }
      break;
      
    case IN:
      if (length == IN_LEN) {
        // ...
        sendAck();
      }
      break;

     // ...
  }
}

Things I would like to improve:

  • Right now there's a separate struct for every different length of packet. The struct for a packet with 2 data bytes is nearly identical to a packet with only one byte of data, apart from the extra byte, obviously.
    Also, I explicitly set the MSB of every data byte to zero in the constructor, this is just nasty and doesn't seem right. Is there an easier way?
  • As mentioned before, in the setup phase, the ESP should resend the message if it doesn't get an acknowledgement back. I'm currently using a do ... while loop for this purpose, but I think there are better ways to implement this.
  • Of course, other comments/improvements are very much appreciated as well.
    A link with more information about implementing a binary protocol, or maybe an example would be really helpful. I've done some research but I didn't find much useful information.

Thanks a lot,
Pieter

ESP8266_Arduino_protocol_master.ino (2.68 KB)

ESP8266_Arduino_protocol_slave.ino (2.93 KB)

Is there any value in having messages of different lengths?

I would be very tempted to make all my messages as long as the longest message and just pad the unused bytes with some data.

I would also write my code to receive a complete message before I tried to see what is inside it. Have a look at the examples in Serial Input Basics - they can be adapted to deal with binary data without much trouble.

Life will be much simpler if you can reduce the command to a single byte or character.

...R

Thank you for your reply!

Robin2:
Is there any value in having messages of different lengths?

I would be very tempted to make all my messages as long as the longest message and just pad the unused bytes with some data.

It's just that 99% of the messages have 1 or 2 data bytes, so it seems a bit "wasteful" to make every message 7 bytes long just because I might need it in 1% of the cases. And splitting the long message up into smaller packets would make everything (especially the receiver part) a lot more complex.

Robin2:
Life will be much simpler if you can reduce the command to a single byte or character.

What exactly do you mean? The command itself is only 4 bits right now, and the whole header fits in a single byte as well.

Having all messages the same length helps to recognize partial messages.
You should also consider a fixed preamble and postamble that do not match any possible data in the messages.
This ensures that the message you got was complete at the start and end.
Dwight

dwightthinker:
Having all messages the same length helps to recognize partial messages.
You should also consider a fixed preamble and postamble that do not match any possible data in the messages.
This ensures that the message you got was complete at the start and end.
Dwight

I have a fixed preamble, i.e. my start byte is the only byte that has MSB = 1, it also specifies the length of the packet. So if a byte is lost and the message is incomplete, I'll know, because the length of the received packet does not match the length specified in the first byte. Do I still need a unique stop byte, or is this enough to recognize partial messages?
You could argue that the length bits could be corrupted, but then the parity check will most likely fail as well.

What if reception is lost in the middle of the first message, or right after the first bit?
Even simple 8-bit async uses both a start and a stop bit, and that is for only 8 bits of data, or 9 bits with parity.
You intend to send a long string with a single parity bit.
Most would use a CRC for anything longer than a byte.
It is remarkable what random noise can do.
Dwight
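For reference, a bitwise CRC-8 is only a few lines. The sketch below assumes the common polynomial 0x07 (x^8 + x^2 + x + 1) with an initial value of 0, no bit reflection and no final XOR; the thread does not say which CRC to use, so treat these parameters as one possible choice. Note that the result is a full 8-bit value, so in a framing where only the start byte may have MSB = 1 it would have to be split over two bytes (or a 7-bit CRC used instead).

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-8, polynomial 0x07, init 0x00, no reflection, no final XOR.
// (Parameter choice is an assumption; any CRC both sides agree on works.)
uint8_t crc8(const uint8_t *data, std::size_t len) {
  uint8_t crc = 0;
  for (std::size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      // Shift left; if the bit shifted out was 1, subtract (XOR) the polynomial.
      crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x07)
                         : static_cast<uint8_t>(crc << 1);
    }
  }
  return crc;
}
```

With these parameters, running the CRC over the message followed by its own CRC byte gives 0, which makes the check on the receiving side a single comparison.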

Ok, I'll use CRC.

But what about the actual coding part? Any comments on that?

It can be done.
Dwight

I meant the entire program, not just the error checking ;).

What about using I2C? That has explicitly signaled start and stop conditions, and since it's a bus protocol you can have multiple Arduino slaves with a single ESP master if you need.

It's a little difficult to implement I2C over a wireless link. :)

Personally, I don't like codes which require the length to be specified in advance. You have to assemble a message in a buffer and then count the buffer and modify the first part of the buffer. I want to send it already, not tie up memory.

Producing an NMEA message (or any of Robin's simple serial protocols) doesn't require a buffer. Just send a start char ($) and then squirt out the characters. You don't even have to know how many there are after it's been sent.

With a length header, getting back in sync from a missed header can be difficult. If it's the length byte which is damaged in transmission, then you can end up waiting a long time to get to the end of the impossible 65536-byte packet. So I feel like there's a lot more processing and checking to be done.

NMEA with the checksum at the end is very easy to process. Just collect the characters in a buffer (after the initial $ is seen) and then when you get a checksum, your message is finished and can be checked for validity.
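To make the NMEA-style approach concrete, here is a small sketch of a sender that keeps only a running XOR while it emits characters, plus a receiver-side check. The names `SentenceWriter` and `sentenceValid` are made up for this example, and the `sent` string merely stands in for the serial port so the code can run on a PC; on an Arduino, `put()` would write the character to Serial instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

static const char kHexDigits[] = "0123456789ABCDEF";

// Hypothetical streaming sender: no message buffer and no length field,
// only a one-byte running checksum.
struct SentenceWriter {
  std::string sent;      // stand-in for the serial port (assumption)
  uint8_t checksum = 0;  // XOR of every character after '$'

  void begin() { checksum = 0; sent += '$'; }
  void put(char c) { checksum ^= static_cast<uint8_t>(c); sent += c; }
  void end() {
    sent += '*';
    sent += kHexDigits[checksum >> 4];
    sent += kHexDigits[checksum & 0x0F];
    sent += "\r\n";
  }
};

// Checks a received "$...*HH" sentence (CR/LF already stripped).
bool sentenceValid(const std::string &s) {
  if (s.size() < 4 || s[0] != '$') return false;
  const std::size_t star = s.size() - 3;
  if (s[star] != '*') return false;
  uint8_t sum = 0;
  for (std::size_t i = 1; i < star; i++) sum ^= static_cast<uint8_t>(s[i]);
  return s[star + 1] == kHexDigits[sum >> 4] &&
         s[star + 2] == kHexDigits[sum & 0x0F];
}
```

The sender never needs to know the message length in advance; the receiver only has to collect characters from '$' until the two checksum digits have arrived.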

MorganS:
It's a little difficult to implement I2C over a wireless link. :)

I don't see anywhere where it says the ESP-Arduino link is wireless.

As mentioned before, in the setup phase, the ESP should resend the message if it doesn't get an acknowledgement back. I'm currently using a do ... while loop for this purpose, but I think there are better ways to implement this.

That code has the potential to hang the sender forever if the receiver is not connected or fails to reply in time.
One option is to use a counter and only try N times. A second option is to implement a small statemachine. In the first state you send the data, in the second state you wait for a reply and check for an ACK and use a timeout (!).

There is currently no indication in the sender that a communication error occurred; might be useful.

Looking at your receiver code, you need to reply with error codes if errors are detected. That can be a XOR error or a timeout (full message not received in time).
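A minimal sketch of the state machine with a retry counter and a timeout could look like the following. All names (`AckSender`, `kMaxAttempts`, `kAckTimeoutMs`) and the values 3 and 50 ms are assumptions for illustration. Time is passed in as a parameter so the logic can be tested on a PC; on the ESP it would be the value of millis(), and the caller transmits the message whenever `update()` returns true.

```cpp
#include <cstdint>

const uint8_t kMaxAttempts = 3;     // assumed retry limit
const uint32_t kAckTimeoutMs = 50;  // assumed time to wait for an ACK

enum class TxState { Idle, WaitAck, Done, Failed };

// Hypothetical non-blocking replacement for the do ... while loop.
struct AckSender {
  TxState state = TxState::Idle;
  uint32_t sentAt = 0;
  uint8_t attempts = 0;

  // Call right after transmitting the message for the first time.
  void start(uint32_t now) {
    state = TxState::WaitAck;
    attempts = 1;
    sentAt = now;
  }

  // Call once per loop(). Returns true if the caller must resend now.
  bool update(uint32_t now, bool ackReceived) {
    if (state != TxState::WaitAck) return false;
    if (ackReceived) {
      state = TxState::Done;
      return false;
    }
    if (now - sentAt < kAckTimeoutMs) return false;  // still waiting
    if (attempts >= kMaxAttempts) {
      state = TxState::Failed;  // give up and let the caller report the error
      return false;
    }
    attempts++;
    sentAt = now;
    return true;  // timed out: resend
  }
};
```

Because `update()` never blocks, the sender can keep servicing WiFi and the web page while it waits, and the `Failed` state gives it a place to signal a communication error.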

PieterP:
so it seems a bit "wasteful" to make every message 7 bytes long,

What exactly would you be wasting? The Arduino will be doing 16 million instructions per second whether or not you make use of them.

I thought your command was 4 bytes.

There was a time when I thought it would be "sensible" to transmit binary data and use structs etc. Now, however, my strong preference is to send data in human readable characters unless that makes it impossible to achieve the required performance. Sending data as text makes debugging soooooo much easier.

...R

Thank you all for your interesting comments!

Jiggy-Ninja:
What about using I2C? That has explicitly signaled start and stop conditions, and since it's a bus protocol you can have multiple Arduino slaves with a single ESP master if you need.

I've thought about I²C, but I decided to use serial because the ESP doesn't have hardware I²C, it's slower, the slave can't send to the master when it pleases, the master has to actively poll the slave for changes, and I didn't have the intention to use multiple slaves.
Should I pick I²C instead of serial? Are there other advantages as well?

MorganS:
It's a little difficult to implement I2C over a wireless link. :)

Jiggy-Ninja:
I don't see anywhere where it says the ESP-Arduino link is wireless.

It's a normal wired hardware serial link, so I could indeed use I²C if I wanted to ;).

MorganS:
Personally, I don't like codes which require the length to be specified in advance. You have to assemble a message in a buffer and then count the buffer and modify the first part of the buffer. I want to send it already, not tie up memory.

Producing an NMEA message (or any of Robin's simple serial protocols) doesn't require a buffer. Just send a start char ($) and then squirt out the characters. You don't even have to know how many there are after it's been sent.

I don't really think that's a problem here, since assembling the message is not too hard (data comes from a JSON file stored in the ESP's flash memory or directly from an analog or digital reading). I don't see much harm in storing the message in memory before sending it. Why should I avoid that? When the function they are declared in returns, their memory will be freed, right?

MorganS:
With a length header, getting back in sync from a missed header can be difficult. If it's the length byte which is damaged in transmission, then you can end up waiting a long time to get to the end of the impossible 65536-byte packet. So I feel like there's a lot more processing and checking to be done.

Every time the start byte is received, a new message is started, it doesn't matter whether the last message was finished or not, it won't wait for it to finish. If the message is complete and not corrupted, it's executed when the last byte is received. I know what the last byte was because of the length specified in the start byte, and by counting the received bytes. When the length is wrong, the message is simply ignored, and the receiver just waits for the next start byte.
(N.B: the header is only 1 byte long, and the length portion of it is only 3 bits wide: start byte = 0b1LLLCCCC where 0bLLL = length and 0bCCCC = command. The MSB makes it easier to detect a start byte, it starts with a one, all other bytes after that start with a zero.)

sterretje:
That code has the potential to hang the sender forever if the receiver is not connected or fails to reply in time.
One option is to use a counter and only try N times. A second option is to implement a small statemachine. In the first state you send the data, in the second state you wait for a reply and check for an ACK and use a timeout (!).

There is currently no indication in the sender that a communication error occurred; might be useful.

I'm aware of that; it's just pointless for the ESP to continue if the Arduino doesn't respond. But you're right, I'll add a timeout and a counter: if it fails N times, I'll show a message on my webpage and blink some LEDs.

sterretje:
Looking at your receiver code, you need to reply with error codes if errors are detected. That can be a XOR error or a timeout (full message not received in time).

That's a good point, I'll try to add it!
Again, is there an example of an implementation of a similar protocol, where I can get some ideas for my own code? That would be really helpful.

Robin2:
What exactly would you be wasting? The Arduino will be doing 16 million instructions per second whether or not you make use of them.

Resources on the ESP8266: It drives things like LED VU-meters as well, it's not super timing critical, but a lag of a couple of milliseconds is very annoying. And if it gets slower, the lag on the buttons and motorized faders will become noticeable as well. A lot of time goes into parsing the received UDP packets (currently my loop takes between 1700µs and 9500µs without communication with the Arduino, depending on the size of the UDP packet), so I'm afraid that adding another 'text-based' protocol would add a lot of overhead because of the longer messages, parsing and converting numbers to text and then back to numbers.

Or is the time I lose with this compared to a binary protocol negligible?

Not sure why you picked the ACK value that you did; from memory 0xAA. Usually ACK is represented by 0x00.

Anyway you already have a check if the checksum is correct; add an else and send CRCERROR back. Same for length, send a LENERROR back. Up to you to define the values.

If you are struggling to get work completed in the available time then binary data would certainly help because the data arrives "ready to use".

However I suspect you lose some of that benefit if you have to figure out binary messages of different lengths or if you have to parse single bytes to extract separate data from the two different nibbles.

I don't think there will be an appreciable saving in CPU cycles by receiving (say) 5 bytes rather than 10 bytes.

I would use a protocol that includes start- and end-markers and receive a complete message before trying to analyze or verify anything.

Of course I am assuming that the other code in your program is very efficient and could not be speeded up to make time for easier serial communication. :)

...R
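The "receive a complete message first" idea with start- and end-markers can be adapted to binary data roughly as follows. This is only a sketch under assumptions: the marker values 0xF0 and 0xF7, the 8-byte payload limit and the name `MarkerReceiver` are chosen for the example, and it relies on data bytes always having MSB = 0 so that the markers can never appear inside a message. On the Arduino, `feed()` would be called with each byte read from Serial1.

```cpp
#include <cstddef>
#include <cstdint>

const uint8_t kStartMarker = 0xF0;  // assumed value, MSB = 1
const uint8_t kEndMarker = 0xF7;    // assumed value, MSB = 1
const std::size_t kMaxPayload = 8;  // assumed maximum message size

// Hypothetical receiver: collects bytes between the markers and only
// reports a message once the end marker has arrived.
struct MarkerReceiver {
  uint8_t payload[kMaxPayload];
  std::size_t count = 0;
  bool receiving = false;

  // Feed one received byte. Returns true when payload[0..count) holds a
  // complete message that is ready to be checked and parsed.
  bool feed(uint8_t b) {
    if (b == kStartMarker) {  // always restart, even in mid-message
      receiving = true;
      count = 0;
      return false;
    }
    if (!receiving) return false;  // ignore noise between messages
    if (b == kEndMarker) {
      receiving = false;
      return true;
    }
    if ((b & 0x80) || count == kMaxPayload) {
      receiving = false;  // invalid byte or overflow: drop the message
      return false;
    }
    payload[count++] = b;
    return false;
  }
};
```

Checksum verification and command parsing then operate on the finished payload, separate from the byte-by-byte reception.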

PieterP:
Every time the start byte is received, a new message is started, it doesn't matter whether the last message was finished or not, it won't wait for it to finish. If the message is complete and not corrupted, it's executed when the last byte is received. I know what the last byte was because of the length specified in the start byte, and by counting the received bytes. When the length is wrong, the message is simply ignored, and the receiver just waits for the next start byte.
(N.B: the header is only 1 byte long, and the length portion of it is only 3 bits wide: start byte = 0b1LLLCCCC where 0bLLL = length and 0bCCCC = command. The MSB makes it easier to detect a start byte, it starts with a one, all other bytes after that start with a zero.)

No.

If the last message had a length of 255 because the length field was corrupted then the thing will sit and wait when it only gets 20 bytes. The sender sees that it didn't get an ACK, so it sends the same message again. You don't recognise the start byte because you are still trying to read a long message.

If you choose a start byte which is always identifiable as being a control character and not a data character, then you don't need to send any length byte.

MorganS:
No.

If the last message had a length of 255 because the length field was corrupted then the thing will sit and wait when it only gets 20 bytes. The sender sees that it didn't get an ACK, so it sends the same message again. You don't recognise the start byte because you are still trying to read a long message.

If you choose a start byte which is always identifiable as being a control character and not a data character, then you don't need to send any length byte.

I beg to differ: If the length field is somehow corrupted and is longer than the actual message, the receiver is waiting for more data, so it doesn't send an acknowledgement, so the sender times out, and sends the same message again. This new transmission starts with a start byte (MSB = 1), so the receiver just drops the previous (unfinished) message, resets the index, the target length and the checksum (this line is missing in my original code) and starts receiving the new message. If the lengths and checksums match, it sends an acknowledgement, and the sender moves on to the next packet. (I've tried this.)
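The behaviour described above (every start byte drops the unfinished message and resets index, length and checksum) can be sketched as follows. The names `FrameReceiver` and `RxResult` are invented for this example, and the parity is assumed to cover the header as well as the data bytes, as in the sender structs. One detail worth noting: in C++, `==` binds more tightly than `^`, so the parity comparison needs explicit parentheses.

```cpp
#include <cstddef>
#include <cstdint>

enum class RxResult { Incomplete, Ok, ParityError };

// Hypothetical receiver for frames of the form 1lllcccc, data..., parity.
struct FrameReceiver {
  uint8_t values[7];  // the 3-bit length field allows at most 7 data bytes
  uint8_t length = 0;
  uint8_t command = 0;
  uint8_t check = 0;
  uint8_t index = 0;
  bool active = false;

  // Feed one received byte; on the Arduino this would come from Serial1.read().
  RxResult feed(uint8_t b) {
    if (b & 0x80) {  // start byte: drop any unfinished message and resync
      active = true;
      index = 0;
      length = (b >> 4) & 0x07;
      command = b & 0x0F;
      check = b & 0x7F;  // reset the checksum, seeded with the header bits
      return RxResult::Incomplete;
    }
    if (!active) return RxResult::Incomplete;  // no start byte seen yet
    if (index == length) {                     // this byte is the parity byte
      active = false;
      return ((b ^ check) == 0) ? RxResult::Ok : RxResult::ParityError;
    }
    values[index++] = b;  // normal data byte
    check ^= b;
    return RxResult::Incomplete;
  }
};
```

Returning a distinct `ParityError` result also gives the caller a place to send an error reply instead of an ACK, if that turns out to be useful.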

If the sender sends the wrong length, it's a completely different story, and in that case you would be absolutely right: it will keep on sending data and never get an acknowledgement. (But the same goes for sending or calculating a wrong checksum.)

Anyway, thank you all for the interesting new insights. I'm currently writing a new piece of code based on your suggestions. Using dedicated start and stop bytes seems like a good idea indeed, and I'll get rid of the length field: all 'setup' messages will be 5 data bytes long, and after the initialization (a special command), the protocol switches to fixed-length messages with only 2 data bytes.

Robin2:
However I suspect you lose some of that benefit if you have to figure out binary messages of different lengths or if you have to parse single bytes to extract separate data from the two different nibbles.

Do some bitwise &'s and bitshifts really take that much cpu time?

Robin2:
Of course I am assuming that the other code in your program is very efficient and could not be speeded up to make time for easier serial communication. :)

I'm trying to optimize it as much as I can :) but most of the time goes into parsing the OSC messages using the OSC library included with Teensyduino. Maybe I'll dive into the library and create my own parsing function, because there's a lot of functionality of the library that I don't really need.

sterretje:
Not sure why you picked the ACK value that you did; from memory 0xAA. Usually ACK is represented by 0x00.

Anyway you already have a check if the checksum is correct; add an else and send CRCERROR back. Same for length, send a LENERROR back. Up to you to define the values.

I don't really see why the sender should care about why it failed, the important thing is that it sends the message again, right? What am I missing here?

PieterP:
Do some bitwise &'s and bitshifts really take that much cpu time?

In my world, that is the wrong question.

The question I would ask is the inverse ...
"Am I so tight for time that I must forsake the simplicity of using whole byte values?"

...R