Data loss when sending to native USB port (SerialUSB)

I noticed that when sending 64 or more bytes from a pc to the Due's native USB port, bytes get lost.
This sketch reproduces it:

static const int led = 13;

void setup() {
  // initialize serial:
  SerialUSB.begin(9600);
  pinMode(led, OUTPUT);     
}

void loop() {
  // if there's any serial available, read it:
  int i = 0;
  while (SerialUSB.available() > 0) {
    SerialUSB.read();
    i++;
  }
  SerialUSB.println(i);
  digitalWrite(led, HIGH);
  delay(250);
  digitalWrite(led, LOW);
  delay(250);
}

If you send say 80 bytes to the serial port (e.g. simply via an echo to /dev/ttyACM1), the sketch prints series of numbers like this : 0 0 0 63 0 0, which indicates that (at least) 17 bytes are lost.

It can easily be fixed in CDC.cpp: the code defines a ring buffer of size **CDC_**SERIAL_BUFFER_SIZE (==512). But the code that uses it uses the SERIAL_BUFFER_SIZE macro which is 64. So replacing SERIAL_BUFFER_SIZE with CDC_SERIAL_BUFFER_SIZE in CDC.cpp fixed the problem for me (I wanted to run ArduinoISP which requires receiving 128 byte messages from avrdude).

It will go wrong if you feed the due more than 511 bytes at once. Therefore it would be better to use USB's built in handshaking mechanism (make the device reply with NAK or NYET until it has room to store more incoming data). The same was done for the Leonardo. I could look into how to do that with the due's otg controller, but if someone already knows, that would save time...

PeterVH,

PeterVH:
It will go wrong if you feed the due more than 511 bytes at once. Therefore it would be better to use USB's built in handshaking mechanism (make the device reply with NAK or NYET until it has room to store more incoming data). The same was done for the Leonardo. I could look into how to do that with the due's otg controller, but if someone already knows, that would save time...

Thanks for the suggestion I've fixed the buffer macros. Can you point me out how the Leonardo implements the (NAK,NYET) handshaking?

The idea is to consume only as much data from the usb controller as you can store. As long as you don't have room for new data, the usb controller must be instructed to reply NACK or NYET to the host (it does that autonomously).

On the sam, the usb controller sends these NAKs/NYETs as long you don't "return" the fifo (or rather the memory bank(s) attached to it) to the the contoller. This is done by the udd_ack_fifocon(CDC_RX) function. There is an excellent diagram in the data sheet at pg. 1092 (40.5.2.13 Management of OUT Endpoints).

I gave it a try. In USBCore.cpp in USB_ISR():

 ...
#ifdef CDC_ENABLED
        if (Is_udd_endpoint_interrupt(CDC_RX))
        {
                udd_ack_out_received(CDC_RX);

                // Handle received bytes
                if (USBD_Available(CDC_RX))  // (this test is probably not needed)
                        SerialUSB.accept();

//              udd_ack_fifocon(CDC_RX);  // <-- don't return the fifo to the controller here, 
                                                               // do it from accept() if we could store all bytes
                  ...
        }

In CDC.cpp:

void Serial_::accept(void)
{
        ring_buffer *buffer = &cdc_rx_buffer;
        uint32_t i = (uint32_t)(buffer->head+1) % CDC_SERIAL_BUFFER_SIZE;

        // if we should be storing the received character into the location
        // just before the tail (meaning that the head would advance to the
        // current location of the tail), we're about to overflow the buffer
        // and so we don't write the character or advance the head.
        while (i != buffer->tail) {
                uint32_t c;
                if (!USBD_Available(CDC_RX)) {
                        udd_ack_fifocon(CDC_RX);
                        break;
                }
                c = USBD_Recv(CDC_RX);
                // c = UDD_Recv8(CDC_RX & 0xF);
                buffer->buffer[buffer->head] = c;
                buffer->head = i;

                i = (i + 1) % CDC_SERIAL_BUFFER_SIZE;
        }
}

In read(), after we consumed some data we must call accept again, thi might return the fifo to the controller:

int Serial_::read(void)
{
        ring_buffer *buffer = &cdc_rx_buffer;

        // if the head isn't ahead of the tail, we don't have any characters
        if (buffer->head == buffer->tail)
        {
                return -1;
        }
        else
        {
                unsigned char c = buffer->buffer[buffer->tail];
                buffer->tail = (unsigned int)(buffer->tail + 1) % CDC_SERIAL_BUFFER_SIZE;
                if (USBD_Available(CDC_RX))
                        accept();
                return c;
        }
}

It does not yet work correctly: it misses every 512th byte. The best test is to use Stimmer's loopback sketch Arduino Due - Serial speed? - #18 by stimmer - Arduino Due - Arduino Forum
Cat a file of e.g. 8KB to /dev/ttyACM1. Capture the reply also via cat:
cat /dev/ttyACM1 > txt. A diff shows you miss every 512th byte.

I don't think it is some off by one error in the circular buffer because if I define the CDC_SERIAL_BUFFER_SIZE as 64, I still loose each 512th byte.

I think it is somewhere in the code accessing the rx endpoint, but the low level api's are complex and there is some redundancy in them.

To be continued...
I thought I already share this as a starting point, maybe somebody else sees what is wrong...

You have a race condition - under certain circumstances accept() can be executing twice simultaneously in the main loop (called by read() ) and in the interrupt handler. This causes the data loss.

The following change greatly reduces the problem:

void Serial_::accept(void)
{
  static int r=0;
  if(r)return;
  r=1;
         ring_buffer *buffer = &cdc_rx_buffer;
        uint32_t i = (uint32_t)(buffer->head+1) % CDC_SERIAL_BUFFER_SIZE;

        // if we should be storing the received character into the location
        // just before the tail (meaning that the head would advance to the
        // current location of the tail), we're about to overflow the buffer
        // and so we don't write the character or advance the head.
        while (i != buffer->tail) {
                uint32_t c;
                if (!USBD_Available(CDC_RX)) {
                        udd_ack_fifocon(CDC_RX);
                        break;
                }
                c = USBD_Recv(CDC_RX);
                // c = UDD_Recv8(CDC_RX & 0xF);
                buffer->buffer[buffer->head] = c;
                buffer->head = i;

                i = (i + 1) % CDC_SERIAL_BUFFER_SIZE;
        }
   r=0;
       
}

It's still theoretically possible for accept() to get called twice even with that guard in place although it hasn't happened to me yet in over 20 megabytes of testing. There is probably a better way to solve the problem. But I think we are getting somewhere :slight_smile:

PeterVH, stimmer,

awesome! can't wait to check this one.

stimmer:
It's still theoretically possible for accept() to get called twice even with that guard in place

Using synchronization primitives could help in this case?
I'm talking about:

unsigned int __ldrex(volatile void *ptr);
int __strex(unsigned int val, volatile void *ptr);
void __clrex(void);

From the datasheet (paragraph 12.5.7):

12.5.7.1
A Load-Exclusive instruction
Used to read the value of a memory location, requesting exclusive access to that location.

12.5.7.2
A Store-Exclusive instruction
Used to attempt to write to the same memory location, returning a status bit to a register. If this
bit is:
0: it indicates that the thread or process gained exclusive access to the memory, and the write
succeeds,
1: it indicates that the thread or process did not gain exclusive access to the memory, and no
write is performed,
The pairs of Load-Exclusive and Store-Exclusive instructions are:
• the word instructions LDREX and STREX
• the halfword instructions LDREXH and STREXH
• the byte instructions LDREXB and STREXB.
Software must use a Load-Exclusive instruction with the corresponding Store-Exclusive
instruction.

It works great for me!

I'll merge this before the coming release 1.5.2.

Thank you guys!

Thanks, that's useful... I'm not sure I'm using the instructions exactly right but the following seems to work:

void Serial_::accept(void)
{
  static uint8_t r=0;
  int t=__LDREXB(&r);
  if(t)return;
  if(__STREXB(1,&r))return;
         ring_buffer *buffer = &cdc_rx_buffer;
        uint32_t i = (uint32_t)(buffer->head+1) % CDC_SERIAL_BUFFER_SIZE;

        // if we should be storing the received character into the location
        // just before the tail (meaning that the head would advance to the
        // current location of the tail), we're about to overflow the buffer
        // and so we don't write the character or advance the head.
        while (i != buffer->tail) {
                uint32_t c;
                if (!USBD_Available(CDC_RX)) {
                        udd_ack_fifocon(CDC_RX);
                        break;
                }
                c = USBD_Recv(CDC_RX);
                // c = UDD_Recv8(CDC_RX & 0xF);
                buffer->buffer[buffer->head] = c;
                buffer->head = i;

                i = (i + 1) % CDC_SERIAL_BUFFER_SIZE;
        }
   r=0;      
}

(cmaglie - I just saw your code, looks good, I'll try it out)

It's not amazingly fast (I am getting a 7MByte file through in about 50s) but it is reliable - no diff in 50MBytes of testing so far :slight_smile:

cmaglie:
It works great for me!

Fixed SerialUSB data handshake when host sends a lot of data (PeterVH, stimmer) by cmaglie · Pull Request #1255 · arduino/Arduino · GitHub

I'll merge this before the coming release 1.5.2.

Thank you guys!

I've been testing this code - no errors in 250Mbytes of data so far :smiley:

Update: No errors in 500Mbytes of data in ~ 1 hour :grin:

Update 2: All 888888888 bytes transferred in 1h48m24s, no errors 8)

Woohoo!
I just got back from work and noticed the problem is diagnosed, fixed, tested and committed in git-hub!
Thanks guys.

Nevertheless I still feel we should think about the approach taken to avoid entering accept() from the isr. This is what happens without the guards:

  • bytes arrive for the 1st time from the host
  • the isr fills up the serial rx buffer, it can't free the fifo because it can only write 511 bytes and it has 512 bytes available.
  • the sketch performs SerialUSB.read() so it reads 1 byte.
  • so we end up in accept() which calls USBD_Recv(). This function automatically frees the fifo if we read the last byte from it! I think this is where the problem lies. This causes a new irq to arrive while we are still in our loop and did not yet update the head ptr of the rx buffer.

Therefore I tried with the function UDD_Recv8(): it does not automatically free the fifo, so we can finish our loop and call udd_ack_fifocon(CDC_RX) when we are done accessing the state in rx buffer.

I could not see the problem yesterday, but now, after seeing your fix, I can. Following code works without guards:

void Serial_::accept(void)
{
        ring_buffer *buffer = &cdc_rx_buffer;
        uint32_t i = (uint32_t)(buffer->head+1) % CDC_SERIAL_BUFFER_SIZE;

        if (!USBD_Available(CDC_RX))
                return;

        // if we should be storing the received character into the location
        // just before the tail (meaning that the head would advance to the
        // current location of the tail), we're about to overflow the buffer
        // and so we don't write the character or advance the head.
        while (i != buffer->tail) {
                // UDD_Recv8() does not automatically free fifo
                uint32_t c = UDD_Recv8(CDC_RX & 0xF); 
                buffer->buffer[buffer->head] = c;
                buffer->head = i;

                if (!USBD_Available(CDC_RX)) {
                        udd_ack_fifocon(CDC_RX);
                        break;
                }
                i = (i + 1) % CDC_SERIAL_BUFFER_SIZE;
        }
}

Safer would be to disable the rx interrupt during accept(). The various low level api's like USBD_Recv() already do something like that via the LockEP class. (though I don't like the idea to use a class for this)

Let us further think about this...

It's not amazingly fast (I am getting a 7MByte file through in about 50s) but it is reliable

Indeed, consuming these bytes one by one is really inefficient. But now we have something reliable, we can think about optimization strategies.
B.t.w if we define CDC_SERIAL_BUFFER_SIZE as 513, would this be noticable? (then we have room to accept an entire fifo bank).

513-byte buffers might cause the compiler to calculate all those % operators by division. In any case I did try the code with 2048-byte buffers, but got data loss for some unknown reason. I'll have to look into that further (I can't think of any logical reason why it should happen). Shortening the buffer down to 4 bytes surprisingly didn't cause much of a performance hit.

We aren't being bottlenecked by write speed - I get nearly 10MByte/sec from the Due to the computer and transferred 800MBytes with perfect reliability.

Transferring more than one byte at a time is probably the best idea. I noticed there is a UDD_Recv(ep,*data,len) which can be used to get multiple bytes at once, and maybe we could have an overloaded SerialUSB::read(*data,len) too.

Update: PeterVH, I tried your fix without the guard but I can't get it to work reliably. It works for 100KB or so, then corruption, then nothing. Are there any other changes I have to make elsewhere?

Update: PeterVH, I tried your fix without the guard but I can't get it to work reliably. It works for 100KB or so, then corruption, then nothing.

I can confirm the behavior. I had only tested with an 8KB file. With an 170K file I have corruption too: it goes well for some time but then each byte gets sent several times.

I don't understand. I also tried with replacing the guard code with:

irqflags_t flags = cpu_irq_save();
... body of accept ...
cpu_irq_restore(flags);

The idea was instead of avoiding the isr to enter the routine, to avoid an isr alltogether by disabling interrupts. It does not work either.

The good thing is that I can also confirm the version in github works ok (tested with a 170KB file).

I will do tests with ArduinoISP also.

Fasten your seatbelts 8)

I have been trying to debug and simplify the code, and also add in multi-byte reads. The (modified) loopback test now runs at 4.3 MB/s (that's about 45 million baud). My 888888888-byte test passes through in about 3 min 30 sec. I also added in some code to control the LEDs. I did this by removing the troublesome ringbuffer code and using UDD_ calls directly (My reasoning was, since there is 2 banks of USB fifo for buffering, why do we need more?)

I do seem to be getting occasional errors but only about once every several gigabytes, which makes it hard to debug as I don't know if it's a problem with the code or the cable or with the way I am testing it. Testing so far indicates it is a problem with transmission (SerialUSB.write())

The new loopback test with multi-byte read looks like this:

void setup() {
  Serial.begin(115200);
  SerialUSB.begin(0);
}
int s=0;
int t=0;
void loop() {
  int l;
  byte b[2048]; // 512 might be enough
  if(l=SerialUSB.available()){
    l=SerialUSB.read(b,l);
    SerialUSB.write(b,l);
    s+=l;
  }
  if((millis()-t) > 1000){t=millis();Serial.println(s);}
}

I've attached the three changed USB files as there are many changes.

The way I am testing on linux is as follows:

seq 99999999 >o1
stty -F /dev/ttyACM1 sane raw -iexten -echo -echoe -echok -echoctl -echoke -onlcr min 1
time cat o1 >/dev/ttyACM1 & cat /dev/ttyACM1 >o2
# ctrl-c when it is finished
cmp o1 o2

I've also tested with gtkterm (which has been 100% reliable)

Please test thoroughly! I am especially interested to see if removing the ringbuffer causes performance degradation in any situation - I haven't found anything yet which runs slower despite having less buffering. Also I am unable to test on Windows or Mac.

CDC.cpp (7.83 KB)

USBAPI.h (6.99 KB)

USBCore.cpp (22.7 KB)

Hi Stimmer,

Would it be possible for you to make the same test but on Programming Port USB ?

Thank you, Albert

Albert: not really - programming port serial doesn't have a direct multi-byte read, it is too slow to need one. And an 888888888 byte test would take over 21 hours to run :slight_smile:

If you need a programming port loopback sketch use MultiSerialMega and connect pins 18 and 19. As for the 16u2 firmware it is almost the same as the Uno firmware and I have tested that to death 2 years ago - it can be tested by uploading Blink and linking pins 0 and 1.

Ok Stimmer I see now but I'm confused, in other threads there were issues with USB routines, are there just on native USB ?

As i've reported when moving an arduino MEGA code to arduino DUE, the USB link between Java running on my MacBook air and DUE would never work, in that case i was using USB programing port.

Maybe it would help someone makes clear thread on USB programming port and USB native port, different type of issues, have they been solved in new IDE...

I'm lost now :~

Hi Stimmer,

I found kicking out the circular buffer a great demonstration "out of the box" thinking.

I have tested your implementation with ArduinoISP. I think this is a good real life use scenario of the serial port, with lots of messages of different sizes (though relatively small) in both directions. I powered my leonardo from the due's 3v3 and modified ArduinoISP to use SerialUSB instead of Serial. It worked well: I could upload download the bootloader to/from the leonardo several times.

Nevertheless dropping the serial buffer also has consequences for feeding the due lots of small messages. Each message is received in a separate bank and you only have two of them. Take for example this shell script:

n=0
while true; do

        echo "$n" > /dev/ttyACM1
        echo $?
        sleep 1
        let "n = $n + 1"
done

I ran this sketch on the due:

static const int led = 13;
static const int button = 3;

void setup() {
  // initialize serial:
  SerialUSB.begin(9600);
  Serial.begin(115200);
  pinMode(led, OUTPUT);     
  pinMode(button, INPUT);
  digitalWrite(button, HIGH); 
}

void loop() {
  // if there's any serial available, read it:
  int i = 0;
  if (digitalRead(button)) {
    while (SerialUSB.available() > 0) {
      char c = SerialUSB.read();
      Serial.write(c);
      i++;
    }
  }
  Serial.print("i=");
  Serial.println(i);
  digitalWrite(led, HIGH);
  delay(500);
  digitalWrite(led, LOW);
  delay(500);
}

If you press the button for a few seconds, some of the messages are lost.
Odd enough the return code of echo is always 0.

I am not sure where the bytes get lost. And maybe it can be prevented with stty calls, but a circular buffer would prevent this. A circular buffer which you fill up with memcpy would also make sense as an implementation.

Interesting - I am not losing any data with that script. What happens for me is that echo blocks until ttyACM1 accepts the data. I expect that there is some mysterious stty setting which controls this.

However, it isn't really the right way to use a serial port in a script. The echo command opens the serial port, writes to it, then closes it again each time it executes. That's not what you'd do in, say, a C++ program, where you'd open the serial port once, write to it periodically, and close it at the end of the program. To do this in a shell script you'd use a fifo or a fd, for example:

exec 3<>/dev/ttyACM1
n=0
while true; do

        echo "$n" >&3
        echo $?
        sleep 1
        let "n = $n + 1"
done

This doesn't lose any data for me either, and it buffers the data on the computer rather than blocking.

What happens for me is that echo blocks until ttyACM1 accepts the data.

Well, that is what I expected to happen for me too when I wrote the script. (And on another pc (linux 3.5 vs. 3.0.0)), I did actually observe echo blocking). The point I wanted to make is that having no circular buffer, one could temporarily block a program on the pc.

But you are right. Closing and reopening the file every time causes the os to flush the file and this makes echo block. If a program would write to ttyACM1 in a normal way, the os would buffer the output... So it looks like we can indeed live without the buffer in the Due.

I read you have Albert's test running. Is it easy to test with SerialUSB?

Yes, Albert's program does work with my SerialUSB. In SimpleArduinoSerPro3 change "Serial." to "SerialUSB." in all 5 places. Then in setup() after SerialUSB.begin(BAUD_RATE); add this line: delay(1000);
I'll have a think about buffers. I'm sure I could reintroduce a buffer with not too much loss of speed. But right now I am sick at the sight of USB code :grin: So I'm going to take a break for a week or two and try some other projects which are more fun.