Problem with Arduino/Linux serial communication: receiving bytes that were never sent

Hi all,

First post, long-time lurker.

I am tracking down some strange issues I came across while developing a serial protocol for communication between an Arduino board (an Uno in my case) and a Linux host (actually macOS, but using the POSIX C library that is the default on Linux and Unix-like systems).

I have added code examples and output to make this post easy to follow and to allow others to reproduce.

I have isolated the problem into a very short sketch and a very short C program for debugging purposes.

  • sketch on Arduino: The sketch continuously sends the same byte over Serial to the host as fast as it can in an endless loop. The byte remains the same during the execution of the sketch. However, the next time the sketch runs, the next byte is used. For example, during the first run the byte 'a' is continuously sent, during the second run the byte 'b' is continuously sent, during the third run the byte 'c', and so on. This is done by storing the byte in EEPROM and updating it on every run.
  • program on host: The program on the host opens the serial port (hence resetting the Arduino), sets the correct serial port options (8N1 raw mode), performs 4 reads dumping the data in the console, and closes the serial port.

One would (maybe naively) expect that during the first run of the program on the host the byte 'a' is observed, during the second run the byte 'b', during the third run 'c', and so on. However, this is not always the case. Although this behavior is frequently observed, I am also observing
1/ bytes in the current run that belong to the previous run, and
2/ bytes in the current run that have never been produced by the board.

Programs and output:

I first reset the byte to send in the EEPROM using the sketch below:

#include <EEPROM.h>
void setup() {
  EEPROM.write(0, 'a');
}
void loop() {}

This is the sketch I use to send the bytes. It reads the byte to send from EEPROM, updates it for the next run of the sketch, and then sends the byte continuously and as fast as it can, checking Serial.availableForWrite().

#include <EEPROM.h>
byte b;
void setup() {
  Serial.begin(9600);
  b = EEPROM.read(0);
  EEPROM.write(0, b + 1);
}
void loop() {
  while (Serial.availableForWrite() > 0) {
    Serial.write(b);
  }
}

This is the C program that is used on the host:

// to compile:
// cc -Wall serial.c -o serial
//
// to run:
// ./serial /dev/cu.usbmodem1421 (replace /dev/cu.usbmodem1421 with your device)

#include <assert.h>
#include <stdio.h>
#include <ctype.h>

#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

// hexdump
// used to print a hex and ascii dump of the read data, code not relevant in this context
void hexdump(const void * buffer, size_t size)
{
  for (size_t o = 0; o < size; o += 16) {
    printf("%04zx ", o); // %zx is the matching conversion for size_t
    for (size_t i = o; i < o + 16; ++i) {
      if (i < size) {
        unsigned char uc = ((unsigned char *) buffer)[i];
        printf(" %02x", uc);
      }
      else printf("   ");
    }    
    printf("  ");
    for (size_t i = o; i < o + 16; ++i) {
      if (i < size) {
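        // isprint() requires a value representable as unsigned char (e.g. 0xf5 must not be negative)
        unsigned char c = ((unsigned char *) buffer)[i];
        printf("%c", isprint(c) ? c : '?');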
      }
      else printf(" ");
    }
    printf("\n");
  }
}

int main(int argc, char* argv[])
{
  assert(argc == 2);
  int rv;
  
  // open
  int fd = open(argv[1], O_RDWR | O_NOCTTY);
  assert(fd != -1);
  //usleep(1000); // uncommenting "fixes" the problem, see below
  struct termios termios;
  rv = tcgetattr(fd, &termios);
  assert(rv != -1);
  cfmakeraw(&termios);
  cfsetspeed(&termios, B9600);
  rv = tcsetattr(fd, TCSANOW, &termios);
  assert(rv != -1);
  rv = tcflush(fd, TCIOFLUSH); 
  assert(rv != -1);
  
  // 4 reads
  char buf[256];
  int nbyte;
  printf("\n");
  for (int j = 0; j < 4; ++j) {
    nbyte = read(fd, buf, sizeof(buf));
    assert(nbyte != -1);
    printf("read %d bytes, hexdump: \n\n", nbyte);
    hexdump(buf, nbyte);
    printf("\n");
  }
  
  // close
  rv = close(fd);
  assert(rv != -1);
}

I am frequently observing the behavior described in 1/, for example below, where 'u' was expected but first a run of 't' bytes was received (given the setup, very likely from the previous run of the sketch).

XXX-MacBook-Pro:computer XXX$ ./serial /dev/cu.usbmodem1421

read 111 bytes, hexdump: 

0000  74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74  tttttttttttttttt
0010  74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74  tttttttttttttttt
0020  74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74  tttttttttttttttt
0030  74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74  tttttttttttttttt
0040  74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74  tttttttttttttttt
0050  74 74 74 74 74 74 74 74 74 74 74 74 74 74 74 74  tttttttttttttttt
0060  74 74 74 74 74 74 74 74 74 74 74 74 74 74 74     ttttttttttttttt 

read 2 bytes, hexdump: 

0000  75 75                                            uu              

read 4 bytes, hexdump: 

0000  75 75 75 75                                      uuuu            

read 4 bytes, hexdump: 

0000  75 75 75 75                                      uuuu

And somewhat less frequently I observe the behavior described in 2/, where bytes were received that were definitely never sent by the Arduino (note the two f5 bytes in the first read that show as ? in the ASCII part of the dump):

XXX-MacBook-Pro:computer XXX$ ./serial /dev/cu.usbmodem1421

read 212 bytes, hexdump: 

0000  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0010  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0020  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0030  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0040  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0050  75 75 75 f5 75 75 75 75 75 75 75 75 75 75 75 75  uuu?uuuuuuuuuuuu
0060  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0070  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0080  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
0090  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
00a0  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
00b0  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
00c0  75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 75  uuuuuuuuuuuuuuuu
00d0  75 75 75 f5                                      uuu?            

read 4 bytes, hexdump: 

0000  76 76 76 76                                      vvvv            

read 4 bytes, hexdump: 

0000  76 76 76 76                                      vvvv            

read 4 bytes, hexdump: 

0000  76 76 76 76                                      vvvv
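
A quick check on the stray bytes above (a sketch, not a diagnosis): XOR-ing the expected byte with the received one shows which bits differ. `flipped_bits` is a hypothetical helper name, not part of the program above.

```c
/* Return a mask of the bits that differ between the byte that was
 * sent and the byte that was received (hypothetical helper). */
unsigned char flipped_bits(unsigned char sent, unsigned char received)
{
    return (unsigned char) (sent ^ received);
}
```

For this dump, `flipped_bits(0x75, 0xf5)` is `0x80`, so only the most significant bit differs. Since a UART sends the least significant bit first, that is the last data bit before the stop bit. This would be consistent with a disturbance on the line near the end of a character, but it does not prove it.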

As for 1/ I assume that there can be bytes on the line and in the buffers while the Arduino is being reset. So this might be normal behavior.

I did some further digging with ioctl calls, and the data actually arrives some time after the open() call. A short sleep of 1000 us (!) seems to resolve this issue: it allows the data to arrive, and the subsequent tcflush makes it disappear. Without the sleep, the tcflush flushes the buffer before the data has arrived, and the data shows up at the first read(). But still, the usleep() feels bad and the time to sleep is just a wild guess.
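
An alternative to guessing a sleep time, sketched under the assumption that the board goes silent for a while after the reset (for example while the bootloader runs): keep discarding input until nothing has arrived for `quiet_ms` milliseconds. `drain_until_quiet` and `quiet_ms` are hypothetical names. `select()` is used rather than `poll()` because `poll()` on macOS has historically had problems with device files.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Read and discard input from fd until no data has arrived for
 * quiet_ms milliseconds. Returns the number of bytes discarded,
 * or -1 on error. Hypothetical helper, not part of the original program. */
long drain_until_quiet(int fd, int quiet_ms)
{
    char scratch[256];
    long discarded = 0;

    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fd, &readable);

        /* select() may modify the timeout, so rebuild it every round */
        struct timeval quiet = {
            .tv_sec = quiet_ms / 1000,
            .tv_usec = (quiet_ms % 1000) * 1000
        };

        int ready = select(fd + 1, &readable, NULL, NULL, &quiet);
        if (ready == -1) return -1;
        if (ready == 0) return discarded;   /* line stayed quiet */

        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n == -1) return -1;
        if (n == 0) return discarded;       /* end of file */
        discarded += n;
    }
}
```

In the program above this could stand in for the `usleep()` and `tcflush()` pair, for example `drain_until_quiet(fd, 50)` after `tcsetattr()`, where 50 ms is itself only a guess. With a sender that never pauses the function never returns, so a real version would also want an overall deadline.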

I have no explanation for the strange bytes though. So my question is whether anyone can give me more insight into all of this, especially the reason for receiving bytes that have not been sent. What could cause this?

All of this does imply that making a robust serial protocol requires careful design (error detection, synchronization, ...) and is definitely not an "arduino style beginner level" task. But again, I am not looking for help in this direction, I am looking for an explanation of the strange bytes. I am also planning to test on different hosts and with other boards.

Maybe I should also look into the code for the Arduino GUI serial monitor.

Keep the serial channel open for the whole communication and only restart it if the communication fails.

I bet you will see no ghost characters in that setup (or you have a hardware problem on the serial line).

Did some more testing ...

  • I am getting the same behavior on a different POSIX platform (Raspberry Pi). It therefore seems unlikely this is due to a damaged serial port on the host side.

  • I have also observed the same behavior with the Arduino IDE serial monitor (I did need many tries). It therefore seems unlikely the C program is at fault. (Along the way I found out that if you open and close the serial monitor many times the Arduino IDE crashes ...)

So one possibility is that the Arduino board is damaged. But this is unlikely as well since all other sketches I upload - using that very serial port - seem to work fine.

So then this must be "normal" behavior. It seems that the opening of the serial connection and the reset of the board corrupt the data that is on the line. Does not sound crazy.

I can live with and work around that. I thought it was a bug. I am still not 100% convinced though. Might look into it some more.

At least this will be a useful environment to test the error detection and framing of the serial protocol :wink:
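
For illustration only, a minimal sketch of the kind of framing check such a test could exercise. The frame layout, the start marker `0x7e`, and the name `frame_is_valid` are all assumptions made up for this example, not the protocol being developed in this thread.

```c
#include <stddef.h>

#define FRAME_START 0x7e  /* hypothetical start-of-frame marker */

/* Frame layout (an assumption for illustration):
 *   [FRAME_START] [length] [payload: length bytes] [checksum]
 * where checksum is the XOR of the length byte and all payload bytes.
 * Returns 1 if `frame` holds exactly one well-formed frame, 0 otherwise. */
int frame_is_valid(const unsigned char *frame, size_t size)
{
    if (size < 3 || frame[0] != FRAME_START)
        return 0;

    size_t length = frame[1];
    if (size != length + 3)
        return 0;

    unsigned char checksum = frame[1];
    for (size_t i = 0; i < length; ++i)
        checksum ^= frame[2 + i];

    return checksum == frame[2 + length];
}
```

A frame `7e 02 75 75 02` passes, while the same frame with one payload byte changed to `f5` is rejected. An XOR checksum catches any single flipped bit but misses some multi-bit errors, so a CRC would be the stronger choice in a real protocol.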

@Whandall: That is indeed what I am planning to do. I still need to disable the reset on serial connect on the board, however. This is how I observed this behavior: every time I open() the port, the board resets. I am not sure if it is possible to open() the port without toggling the DTR line on POSIX. So I guess I will have to modify the board or add a resistor / capacitor.
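
On the question of DTR and open(): POSIX termios has the `HUPCL` flag, which controls whether the modem control lines are lowered when the port is closed for the last time. Below is a minimal sketch that clears it so that DTR is left as it is after close(). `keep_dtr_on_close` is a hypothetical name. Whether this is enough to avoid a reset on the next open() depends on the driver and the board and is an assumption here, and it does nothing about the first open().

```c
#include <termios.h>

/* Clear HUPCL so the modem control lines (including DTR) are not
 * lowered on the last close of the port.
 * Returns 0 on success, -1 on error. Hypothetical helper. */
int keep_dtr_on_close(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) == -1)
        return -1;

    tio.c_cflag &= ~(tcflag_t) HUPCL;

    return tcsetattr(fd, TCSANOW, &tio);
}
```

It would be called once on an open port, before the port is closed for the first time.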

areslagae:
I still need to disable the reset on serial connect on the board however.

Why do you think you have to?

areslagae:
every time I open() the board resets.

That is expected behaviour and gives your communication a clean start point.

Just open the connection once.

Whandall:
Why do you think you have to?

In my application the Arduino is acting as a standalone controller for a system. However, through its serial interface (i) it reports upstream about what it is doing and (ii) it can accept external commands that control the system. I have several of these Arduino controllers connected to a single board computer (BBBlack or RPi), which in turn allows for higher-level functionality, networking, ... The idea is indeed that they are permanently connected, but they should keep on functioning when the single board computer crashes, is rebooted, updated, ... Hence I am considering disabling the reset on connect. This is a permanent setup, so connect once does not work. At some point the single board computer will have to be reset.

Whandall:
That is expected behaviour and gives your communication a clean start point.

Well, no, that is my point exactly. There is definitely not a clean start point. These controllers continuously send status data. After a connect the board resets, but I still get a bunch of data (up to several hundred bytes) from the previous run. I can manage, but a board reset on the open() call definitely is not a clean start. A clean start is what I also expected to begin with, which led me to believe all of this was a bug.

If given explicit control over the reset procedure this could be somewhat more elegantly solved:

  • open port
  • pull DTR low (or high, don't remember)
  • board is in reset
  • wait and flush all data from serial line while board in reset, no new data is generated
  • pull DTR high
  • clean start point
    Not sure if this can be done though. You'd need to disable DTR functionality during the open() call.
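
The sequence above could be sketched roughly as follows, assuming the driver supports the `TIOCMBIS`/`TIOCMBIC` modem-control ioctls (they exist on Linux and macOS). The function names, the `reset_state` parameter, and the hold time are hypothetical. Which DTR state triggers the reset is left as a parameter because the list above leaves it open. The sketch does not prevent the toggle that open() itself performs, and whether the board stays in reset while the line is held depends on the board's reset circuit.

```c
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/* Assert (asserted != 0) or deassert (asserted == 0) the DTR line.
 * Returns 0 on success, -1 on error. Hypothetical helper. */
int set_dtr(int fd, int asserted)
{
    int bits = TIOCM_DTR;
    return ioctl(fd, asserted ? TIOCMBIS : TIOCMBIC, &bits);
}

/* Put DTR into reset_state, wait hold_us microseconds, discard
 * whatever is queued in the driver, then put DTR into the opposite
 * state. Returns 0 on success, -1 on error. Hypothetical helper. */
int hold_flush_release(int fd, int reset_state, unsigned int hold_us)
{
    if (set_dtr(fd, reset_state) == -1)
        return -1;

    usleep(hold_us);

    if (tcflush(fd, TCIOFLUSH) == -1)
        return -1;

    return set_dtr(fd, !reset_state);
}
```

It would be called right after the port has been configured, for example `hold_flush_release(fd, 1, 100000)`, where both arguments are guesses that would need checking against the actual board.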

What is currently happening is:

  • open port pulls DTR low and high resetting the device
  • data generated before reset is still coming in from wire or more likely from internal buffers. The fact that this is serial over usb probably does not help either.

You want to hardware-band-aid a bad design.

Sorry, but I don't want to help you go that direction.

Good luck with your project.

@Whandall:

I am definitely willing to consider other designs and I am definitely open to suggestions. I am here to learn ...

What exactly do you consider bad design? Disabling the reset on connect? Or the setup I am describing? What would you consider a good design in this case? Using a softserial perhaps and leaving the main serial unchanged?

areslagae:
What exactly do you consider bad design?

  • Opening and closing ports without need (this is what creates part of your problem).
    You are using C on the host, which can easily keep the connection open.
  • Piling up bytes in serial communication.
    100+ bytes at 9600 baud means no read for 100 ms.
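
The arithmetic behind that figure, as a small sketch: with 8N1 framing each byte takes 10 bit times on the wire (1 start, 8 data, 1 stop). `transmit_time_ms` is a hypothetical helper name.

```c
/* Time in whole milliseconds needed to transmit `bytes` bytes at
 * `baud` bits per second with 8N1 framing (10 bits per byte).
 * Hypothetical helper; the result is truncated, not rounded. */
long transmit_time_ms(long bytes, long baud)
{
    const long bits_per_byte = 10; /* 1 start + 8 data + 1 stop */
    return bytes * bits_per_byte * 1000 / baud;
}
```

`transmit_time_ms(100, 9600)` gives 104, and the 212-byte first read shown earlier corresponds to about 220 ms of transmission at that rate.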

areslagae:
Hence I am considering disabling the reset on connect. This is a permanent setup, so connect once does not work. At some point the single board computer will have to be reset.

One way to avoid the reset when the PC opens the serial port is to connect to the Arduino Rx Tx and GND pins with a USB-TTL cable.

...R

@Whandall

Of course, I fully agree. My explanation must not have been clear, my apologies. The programs I posted are only to illustrate and test a very specific situation, they are definitely not the real code. I simply made the smallest programs that illustrate the issue at hand.
1/ The single board computer will simply keep the connection open as long as possible.
2/ The software on the single board computer has a separate thread for every serial port. Then probably message queues to communicate with other threads.
However, even in this case, there will be situations in which the single board computer will have to be rebooted or the software on it will have to be restarted. In that case I will have the issue I sketched above. Also in the case where the Arduino is reset.

@Robin2

Thanks for the suggestion. This is a better solution than disabling the reset. I was planning on experimenting with an ARDUINO USB 2 SERIAL MICRO and/or an FTDI cable using a SoftwareSerial secondary port but it might indeed be better to simply use the Rx and Tx pins of Serial1. I was also considering adding a second serial port only for debugging purposes since the primary serial port will be used for comms.

The ARDUINO USB 2 SERIAL MICRO seems interesting since the firmware is open. I was also considering doing a test with an FTDI cable to see if there are significant performance differences. It would be good to have low latency in my application. I have not really tested, but I seem to remember I was seeing round trip times of 4 ms for very small packets.

areslagae:
I was planning on experimenting with an ARDUINO USB 2 SERIAL MICRO and/or an FTDI cable using a SoftwareSerial secondary port but it might indeed be better to simply use the Rx and Tx pins of Serial1. I was also considering adding a second serial port only for debugging purposes since the primary serial port will be used for comms.

The ARDUINO USB 2 SERIAL MICRO seems interesting since the firmware is open. I was also considering doing a test with an FTDI cable to see if there are significant performance differences.

I don't understand that. What is the "ARDUINO USB 2 SERIAL MICRO"?

It would be good to have a low latency in my application. I have not really tested but I seem to remember I was seeing round trip times for very small packets of 4 ms.

USB can perform poorly with small packets. IIRC it waits for 64 bytes or until a timeout occurs.

From a more general point of view, if you are trying to design a system in which the host single board computer (SBC) may need to be reset and, separately (and perhaps not at the same time), one or more Arduinos may need to be reset, then your software on both the SBC and the Arduinos must allow for all of that. That can get quite complex. It may actually simplify the system if all the Arduinos are reset whenever the SBC is reset and an Arduino crash is solved simply by resetting the SBC. If there is a risk of data loss then some form of non-volatile memory could be used, but that then requires some special recovery code to gather that data and bring the system up to date without recovering the same data twice or three times.

...R

@Robin2:

Thanks for your suggestions.

With ARDUINO USB 2 SERIAL MICRO I mean this Arduino product:

The USB serial firmware is open and can be modified (e.g. for lower latency; it looks like one can simply adjust the timeout timer), although I do not think it will be needed.

Indeed, the software might get complex, but I think I can manage. The Arduinos should be running at all times since they are the primary controllers for the attached sensor/actuator system and the critical functionality (think of a thermostat as a simple analogy). Without them the system is unusable. So I was thinking of using the watchdog timer to make sure that they are always responsive. The SBC only adds higher-level functionality that is somewhat less critical (e.g. monitoring over the network, remote control, some communication between Arduino boards for less critical functionality; in the same analogy, think of remotely controlling the thermostat, setting all thermostats to a specific temperature, graphing temperatures, ...). So temporary loss of that functionality is not extremely critical.

areslagae:
With ARDUINO USB 2 SERIAL MICRO I mean this Arduino product:
Accessories — Arduino Official Store

That seems to be just another version of a USB-TTL converter. AFAIK the 16U2 is the device used on the Uno and Mega for USB-TTL purposes.

If you want high-performance USB operation I suggest you use a Leonardo or Micro both of which use the 32U4 microprocessor and communicate over USB at the full USB speed. They also have the interesting side effect that they do NOT reset when the PC opens the Serial port.

...R