Ideas and suggestions for byte based serial protocols

When using serial communication I almost always use string/ASCII data with Robin's start and end markers function. This method has proved to be very reliable. Now I want to try to reduce the characters/bytes being transmitted, and I am looking for techniques/methods for using bytes instead of ASCII. At present I have no real need for this beyond doing it to learn.

What are the preferred protocols, and how would you go about labeling data to differentiate it while minimizing the number of bytes used? As an example, I have several sensors and devices connected to an Arduino. Data from this Arduino is transmitted to a second Arduino for data logging. The data has identifying labels such as T for temperature, H for humidity, P for pressure, etc., and a typical data set would be [T1,025]. What methods and techniques can be employed to send the same values using the minimum number of bytes, while still ensuring that:
1 - I can reliably find the start of a data set
2 - I can identify the sensor or what the data is
3 - I can recover the actual value.

For starters, you could:

  • Get rid of the identifying label
  • Send the data as binary

For this to work you need to make sure you reconstruct the data in the same order in which it was sent.


A master can ask for data. That would be the most reliable way as you will know when data is coming.

Protocol-wise:
1) Add a length indicator so the receiver (master) can count.
2) Add a CRC or checksum to check for corruption.
3) You can still use start and end markers; however, they are no guarantee that you actually found the beginning or the end of the message. Length counting and a timeout mechanism are your friends in that case (a minimal sketch follows below).
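
As an illustration of those three points, here is a minimal sketch of one possible frame layout: a start marker, a length byte, the payload, and a simple additive checksum. The marker value and checksum choice are my own assumptions for illustration, not a standard.

const uint8_t FRAME_START = 0x7E;    // arbitrary start-marker value chosen for this illustration

// Send a payload framed as: start marker, length, payload bytes, checksum
void sendFrame(const uint8_t *payload, uint8_t len) {
  uint8_t sum = len;
  Serial.write(FRAME_START);
  Serial.write(len);                 // length indicator so the receiver knows how much to count
  for (uint8_t i = 0; i < len; i++) {
    Serial.write(payload[i]);
    sum += payload[i];
  }
  Serial.write(sum);                 // simple additive checksum; a CRC would catch more errors
}

The receiver reads the length, counts that many payload bytes, verifies the checksum, and discards the frame if the checksum fails or a timeout expires before the frame completes.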

For Nextion to Arduino / PIC I devised:
3 unique bytes that I knew would never appear in the data, these being 0xa5 0xa5 0xa5, followed by a fixed 5 bytes of data, soon to become 7 bytes for a particular application I am working on.

I also have a PIC measuring my mains voltage; it is connected directly to the mains, so it has to send its output via serial through an opto-isolator. As the result of the measurement is 4 bytes (uint32_t), which could contain any value whatsoever, the only way I could think of to get a unique starting value was to use 5 bytes to indicate the start, so I have 0x5a 0x5a 0x5a 0x5a 0x5a followed by 4 bytes of data. You might point out that this is wasteful and you'd have a point, except I don't need to transmit much data and all I am wasting is time on a serial port that would otherwise be idle.
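
A rough sketch of what the receiving end of such a scheme could look like (my illustration, not the author's actual code, and it inherits the shortcomings mentioned below): count consecutive 0x5a bytes and, once five have been seen, treat the next four bytes as the uint32_t value.

// Hypothetical handler for a received mains measurement
void handleMainsValue(uint32_t value) {
  // do something with the measurement (placeholder)
}

// Hypothetical receiver for the 0x5a 0x5a 0x5a 0x5a 0x5a + 4 data bytes frame
void pollMainsReading() {
  static uint8_t startCount = 0;     // consecutive 0x5a bytes seen so far
  static uint8_t dataCount = 0;      // data bytes collected after the start sequence
  static uint32_t value = 0;
  while (Serial.available()) {
    uint8_t b = Serial.read();
    if (startCount < 5) {            // still hunting for the start sequence
      startCount = (b == 0x5a) ? startCount + 1 : 0;
      dataCount = 0;
      value = 0;
    } else {
      value = (value << 8) | b;      // byte order is an assumption; both ends must agree
      if (++dataCount == 4) {
        handleMainsValue(value);
        startCount = 0;              // go back to hunting for the next start sequence
      }
    }
  }
}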

Both the above were designed for very specific purposes and both relied on fixed data lengths. Both have shortcomings that would probably mean they are not useful anywhere else. The point I am trying to illustrate is that moving away from a text-based, ASCII-oriented system of data transmission brings its own problems, which you will have to consider and take into account in the particular application you are designing.

MartynC:
When using serial communication I almost always use string/ASCII data with Robin's start and end markers function. This method has proved to be very reliable. Now I want to try to reduce the characters/bytes being transmitted, and I am looking for techniques/methods for using bytes instead of ASCII. At present I have no real need for this beyond doing it to learn.

Although my examples are based on text the exact same technique can be used with byte values if you are prepared not to use 2 values in your data - for example keep 254 and 255 for the start and end markers. If that forces you to treat 253 as the max rather than 255, you have only lost 2/256, or less than 1%, of the range.

However that doesn't really work if you want to send multi-byte binary values - for example a 2-byte int. The way I would do that is only a little more complex. I would use 3 values for marker bytes - 253, 254 and 255. As before 254 and 255 are the start and end bytes. And 253 is used as a special marker to let the system know that the value following it is not a marker. So, for example, to send a data value of 254 I would convert that to 253 254 etc.
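
A minimal sketch of that byte-stuffing idea on the sending side, using 254 as the start marker, 255 as the end marker and 253 as the escape byte (the function name is mine):

const uint8_t START_MARKER  = 254;
const uint8_t END_MARKER    = 255;
const uint8_t ESCAPE_MARKER = 253;

// Send arbitrary binary data, escaping any byte that could be mistaken for a marker
void sendStuffed(const uint8_t *data, size_t len) {
  Serial.write(START_MARKER);
  for (size_t i = 0; i < len; i++) {
    if (data[i] >= ESCAPE_MARKER) {  // 253, 254 and 255 must all be escaped
      Serial.write(ESCAPE_MARKER);   // tells the receiver "the next byte is data, not a marker"
    }
    Serial.write(data[i]);
  }
  Serial.write(END_MARKER);
}

On the receiving side, a 253 simply means "take the next byte literally", so a data value of 254 arrives as 253 254 and is unpacked back to 254.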

An in-between system that I find more useful is to convert numbers to a version of base-64 coded characters. That means everything is still using human readable text but 2 characters can store up to 4160 and 3 characters can store up to 266304. Converting those 2 or 3 character sequences back to a number is very much faster than using atoi().
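
Robin's exact character set isn't spelled out here, so the following is only an illustration of the general idea, assuming 6 bits per character with '0' (ASCII 48) as the base character; this fixed two-character version covers values 0 to 4095.

// Encode a value 0..4095 as two printable characters, 6 bits per character
void sendPacked(uint16_t value) {
  Serial.write((char)('0' + ((value >> 6) & 0x3F)));  // high 6 bits
  Serial.write((char)('0' + (value & 0x3F)));         // low 6 bits
}

// Decode the two characters back into a number; far cheaper than atoi() on decimal text
uint16_t decodePacked(char hi, char lo) {
  return ((uint16_t)(hi - '0') << 6) | (uint16_t)(lo - '0');
}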

A downside of NOT sending multi-byte data (i.e. an int or a long) as text is that you use up more space for very small values. For example a long will always be 4 bytes but you could send a value up to 999 in 3 chars.

...R

A simple way is to use the most significant bit to indicate the first byte of a packet. The same approach is successfully used in the MIDI protocol.

For example, if you want 16 sensors with a resolution of 10 bits, you could use the following scheme:

1 s s s s h h h   0 l l l l l l l

Where s is a number between 0 and 15 indicating which sensor the data is coming from, h are the three highest bits of the measurement, and l are the seven least significant bits of the measurement.

// Pack the sensor number and 10-bit value into two bytes and send them
void send(uint8_t sensor, uint16_t value) {
  Serial.write(0x80 | ((sensor & 0x0F) << 3) | ((value >> 7) & 0x07)); // 1 ssss hhh
  Serial.write(value & 0x7F);                                          // 0 lllllll
}
void handleSensorMeasurement(uint8_t sensor, uint16_t value) {
  // Do something with the received data
}

void loop() {
  static uint8_t sensor = 0xFF;
  static uint16_t value;
  if (Serial.available()) {
    uint8_t databyte = Serial.read();
    if (databyte & 0x80) { // most significant bit is set
      sensor = (databyte >> 3) & 0x0F;
      value = (databyte & 0x07) << 7;
    } else if (sensor != 0xFF) { // most significant bit is not set, and first byte was received correctly
      value |= databyte;
      handleSensorMeasurement(sensor, value);
      sensor = 0xFF;
    }
  }
}

You could even optimize this by removing the last line (sensor = 0xFF), so if you send multiple measurements for one sensor, and if only the least significant bits change, you can omit the first byte.

If you want to send larger amounts of arbitrary binary data, I think an elegant solution is to use the SLIP protocol. It uses an end marker to separate messages, and an escape character if the end marker occurs in the data.
An explanation and a C implementation can be found in RFC1055.
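
For reference, RFC 1055 defines END = 0xC0 and ESC = 0xDB (with ESC_END = 0xDC and ESC_ESC = 0xDD for escaping). A minimal send-side sketch adapted to Arduino's Serial, rather than the RFC's own C code, could look like this:

const uint8_t SLIP_END     = 0xC0;  // marks the end of a packet
const uint8_t SLIP_ESC     = 0xDB;  // escape character
const uint8_t SLIP_ESC_END = 0xDC;  // ESC ESC_END encodes a literal 0xC0
const uint8_t SLIP_ESC_ESC = 0xDD;  // ESC ESC_ESC encodes a literal 0xDB

void slipSend(const uint8_t *data, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (data[i] == SLIP_END) {
      Serial.write(SLIP_ESC);
      Serial.write(SLIP_ESC_END);
    } else if (data[i] == SLIP_ESC) {
      Serial.write(SLIP_ESC);
      Serial.write(SLIP_ESC_ESC);
    } else {
      Serial.write(data[i]);
    }
  }
  Serial.write(SLIP_END);           // terminate the packet
}

RFC 1055 also suggests sending an END byte before the packet to flush any line noise the receiver may have accumulated.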

Pieter

The asynchronous serial communication protocol (UART Communication) has been developed based on the motivation that the digits of a data byte would be transmitted in their ASCII forms. As a result, the numerical data will remain confined within 0x30 - 0x39 and 0x41 - 0x46. What a fantastic protocol -- you use a control character as the beginning of a 'transmission frame', and you GO! You may think that the number of bytes is doubled; but look at the available baud rates (even 2 MHz). If you are fond of transmitting bytes (aka binary), go for SPI or I2C at a very reduced transfer speed. It is not fair to torture the UART Communication to convey binary data (except the control bytes) for which it is not designed.

GolamMostafa:
It is not fair to torture the UART to convey binary data (except the control bytes) for which it is not designed.

Why not?
The UART hardware can send any sequence of bytes perfectly fine. AFAIK, it doesn't care whether it's ASCII or anything else.

GolamMostafa:
The asynchronous serial communication protocol (UART Communication) has been developed based on the motivation that the digits of a data byte would be transmitted in their ASCII forms.

Sorry, but that is just incorrect.

The serial protocol is equally capable of transmitting any byte value from 0 to 255.

...R

Robin2:
The serial protocol is equally capable of transmitting any byte value from 0 to 255.

The I2C bus can execute the Wire.print() function very well; but many veterans of this Forum discourage the use of the I2C bus for string transmission.

If you write the receiver to expect a string, then print() can be used to send the value. But, that is NOT the normal way that I2C is used. I2C is used to write values, not strings, to other devices.

For the same reason, UART should not be used to transfer natural binary data though it has that capability.

There is also a serious technical problem in sending binary coded data. The data could be anything from 0x00 to 0xFF; if so, which code should be used as the 'start mark' of the 'transmission frame'? The code chosen (say, 0x3A for ':', which is widely used in Intel-hex frame transmission) might itself appear as a byte of the data frame. There is a chance that the reception program gets confused in detecting the 'start mark' because of the duplication of codes. If the ASCII codes for the data bytes are sent instead, it is absolutely safe, as the control bytes (from which the start mark is taken) lie outside 0x30 - 0x39 and 0x41 - 0x46, which are used for the ASCII representations of the digits of the binary data bytes.
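
To make that concrete, here is a small sketch of this ASCII-hex style of framing (the ':' start mark is borrowed from Intel hex, but the frame layout itself is only an illustration): every data byte is sent as two hex digits, so the start mark can never collide with the payload.

// Send a payload as ':' followed by two ASCII hex digits per byte and a newline.
// Because the payload is encoded as 0-9/A-F, the ':' start mark never appears in the data.
void sendHexFrame(const uint8_t *data, size_t len) {
  const char hexDigits[] = "0123456789ABCDEF";
  Serial.write(':');                         // start mark, outside the hex-digit range
  for (size_t i = 0; i < len; i++) {
    Serial.write(hexDigits[data[i] >> 4]);   // high nibble
    Serial.write(hexDigits[data[i] & 0x0F]); // low nibble
  }
  Serial.write('\n');                        // end of frame
}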

The problem of sending binary data using UART Port has already been mentioned in Post#5.

Although my examples are based on text the exact same technique can be used with byte values if you are prepared not to use 2 values in your data - for example keep 254 and 255 for the start and end markers. If that forces you to treat 253 as the max rather than 255, you have only lost 2/256, or less than 1%, of the range.

GolamMostafa:
For the same reason, UART should not be used to transfer natural binary data though it has that capability.

You can not draw that conclusion.

I do agree that it can make life more difficult. But it's not impossible.

GolamMostafa:
For the same reason, UART should not be used to transfer natural binary data though it has that capability.

You are just digging yourself deeper and deeper into a hole.

The Serial hardware doesn't care what you send - it just sees bytes, it has no idea whether the byte 65 is a number or code for the letter 'A'.

Or a code representing anything else. I am thinking of a coding system in which 65 will mean turn the LED on pin 33 OFF.

...R

I think the idea that because something has, or is perceived to have, been designed for this or that purpose, or to be used in this or that way, it should only be used as the designer intended, is very limiting indeed. Even more so with a product like Arduino which is intended for people to play with and find new ways of doing things on. UARTs have been around for a very long time, as has ASCII. I doubt the original designers of either are still around to ask about their intentions when they devised them, but I can't believe they intended that people many years on would not be finding new and different ways to use them to do whatever it is they are trying to do. Be creative, find new ways to use old tools, that's the way we progress, not by saying "you can't use that like that because it wasn't intended to be used in that way".

sterretje:
A master can ask for data. That would be the most reliable way as you will know when data is coming.

Protocol-wise:
1) Add a length indicator so the receiver (master) can count.
2) Add a CRC or checksum to check for corruption.
3) You can still use start and end markers; however, they are no guarantee that you actually found the beginning or the end of the message. Length counting and a timeout mechanism are your friends in that case.

The request-send protocol is about the safest fast way I know of. The data target can wait until no data has arrived for two character times (e.g. > 160 us at 115200 baud) before requesting a binary block. Length and CRC are industry standard for digitally interfaced meters.

The data target needs to be able to request a resend in the case of corrupt blocks. As someone used to mention often here, serial has no guarantees.
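
A rough sketch of the request-resend idea on the master side, assuming made-up command bytes, a fixed block length known in advance, and a simple additive checksum (none of this is a standard, just an illustration):

const uint8_t CMD_REQUEST = 0x01;   // hypothetical "send me a block" command
const uint8_t CMD_RESEND  = 0x02;   // hypothetical "last block was corrupt" command

// Request a block and wait up to timeoutMs for expectedLen payload bytes plus a checksum byte.
// Returns true only if a block with a valid checksum was received into buf.
bool requestBlock(uint8_t *buf, uint8_t expectedLen, uint16_t timeoutMs) {
  Serial.write(CMD_REQUEST);
  uint32_t start = millis();
  uint8_t received = 0;
  uint8_t sum = 0;
  while (millis() - start < timeoutMs && received < expectedLen + 1) {
    if (Serial.available()) {
      uint8_t b = Serial.read();
      if (received < expectedLen) {
        buf[received] = b;
        sum += b;
      } else if (b != sum) {        // final byte is the checksum
        Serial.write(CMD_RESEND);   // ask the sender to repeat the block
        return false;
      }
      received++;
    }
  }
  return received == expectedLen + 1;  // false if the timeout expired mid-block
}

The caller can retry a few times before giving up; the sender's side simply waits for CMD_REQUEST, sends the block plus checksum, and repeats it when CMD_RESEND comes back.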

Robin2:
You are just digging yourself deeper and deeper into a hole.

This is your post; check what you have said about the problem of sending messages in binary format.

Robin2:
Although my examples are based on text the exact same technique can be used with byte values if you are prepared not to use 2 values in your data - for example keep 254 and 255 for the start and end markers. If that forces you to treat 253 as the max rather than 255, you have only lost 2/256, or less than 1%, of the range.

You have requested not to send 0xFE and 0xFF, which are my valid data; I want to send them over the UART Port reliably.

GolamMostafa:
The asynchronous serial communication protocol (UART Communication) has been developed based on the motivation that the digits of a data byte would be transmitted in their ASCII forms.

I disagree.

UART stands for Universal Asynchronous Receiver/Transmitter. It’s not a communication protocol, but a physical circuit in a microcontroller (or a stand-alone IC). A UART’s main purpose is to transmit and receive serial data.

UART - also known as an Asynchronous Communication Interface Adapter (ACIA) - was developed to get rid of parallel cables.

Serial communication is used for all long-haul communication and most computer networks, where the cost of cable and synchronization difficulties make parallel communication impractical. Serial computer buses are becoming more common even at shorter distances, as improved signal integrity and transmission speeds in newer serial technologies have begun to outweigh the parallel bus's advantage of simplicity.

ASCII has nothing to do with this.

A protocol - which is what the OP is asking about - would then need to handle how you know where a frame starts and where it ends. There are many techniques for doing so, involving magic numbers, CRCs, etc.

There is value in a textual protocol; here is a good read on this.

GolamMostafa:
This is your post and check what you have said about the problem of sending message in binary format.
You have requested not to send 0xFE and 0xFF which are my valid data; I want to send them over the UART Port reliably.

This is just getting f***ing ridiculous. STOP. If you had read the next paragraph in the Reply from which you quoted you would have seen how to deal with that.

And, in any case, you are completely confusing the capability of the hardware with the way in which the hardware might be used in a particular case.

...R

Robin2:
This is just getting f***ing ridiculous. STOP. If you had read the next paragraph in the Reply from which you quoted you would have seen how to deal with that.

Depending on the context, even a word could be autonomous; never mind about a line or a paragraph or a page or a Chapter or the whole book. You want to win -- I say you are the winner.

If you have tagged ASCII data, like "!T102A99B0C123.5\n\r", you have a certain amount of redundancy that allows you to notice errors (if you're paying attention). If you get "!T76&^!@zz13Cxy\r" instead, you know something has gone wrong.

The usual "more efficient transmission" scheme would be to transmit a fixed-format "packet" of binary data, where your T, A, B, and C are always there, but are indicated only by their position in the packet:

packetStart
one byte of T
one byte of A
one byte of B
four bytes of (floating point) C, in some previously-agreed-upon order
packetEnd

But you have no more redundancy, so you probably add some sort of error-checking code, like a checksum or CRC, as well.
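
A sketch of such a fixed-format packet, assuming made-up start/end marker values and an XOR checksum (the struct layout mirrors the list above; none of the specific values are a standard):

const uint8_t PACKET_START = 0x02;  // arbitrary start marker (assumption)
const uint8_t PACKET_END   = 0x03;  // arbitrary end marker (assumption)

struct __attribute__((packed)) SensorPacket {
  uint8_t t;        // one byte of T
  uint8_t a;        // one byte of A
  uint8_t b;        // one byte of B
  float   c;        // four bytes of C; both ends must agree on byte order and float format
};

void sendPacket(const SensorPacket &p) {
  const uint8_t *bytes = (const uint8_t *)&p;
  uint8_t check = 0;
  Serial.write(PACKET_START);
  for (size_t i = 0; i < sizeof(p); i++) {
    Serial.write(bytes[i]);
    check ^= bytes[i];              // simple XOR checksum; a CRC would catch more errors
  }
  Serial.write(check);
  Serial.write(PACKET_END);
}

Because the payload is raw binary, the marker bytes can also appear inside the data; with a fixed-length packet the receiver leans on the length, the checksum and a timeout to resynchronize, which is exactly the sort of extra bookkeeping discussed below.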

Binary data works just fine through UARTs. Previously "widely used" full-binary protocols that ran over UARTs include X/Y/ZModem (CP/M- and MSDOS-era file transfers), AX.25 (ham packet radio), SLIP (Serial Line IP), ARAP (AppleTalk Remote Access), Xremote (don't ask), and PPP (like SLIP, only multiprotocol and with many more features).

Occasionally, something "in between" the sender and receiver will object to certain binary bytes (0x00, software flow-control characters, end-of-line characters), or will only have 7-bit capability, or you want "packetStart" and "packetEnd" to NEVER appear in the data itself, and then you have to do something a bit tricky. But this is NOT due to the UART technology itself...

OTOH...

I want to try and reduce the characters/bytes being transmitted

Why? History has shown that text-style communication has a lot of advantages. Modern Internet formats and protocols like HTML, XML, JSON, SMTP, HTTP, and many others are text based for most of their transactions.
If you want faster transmission for most Arduino-scale things, it is easier, and just as effective, to bump up the baud rate as it is to reduce the number of characters sent.