Data loss when sending to native USB port (SerialUSB)

Thanks, that's useful... I'm not sure I'm using the instructions exactly right but the following seems to work:

void Serial_::accept(void)
{
  static uint8_t r=0;
  int t=__LDREXB(&r);
  if(t)return;
  if(__STREXB(1,&r))return;
         ring_buffer *buffer = &cdc_rx_buffer;
        uint32_t i = (uint32_t)(buffer->head+1) % CDC_SERIAL_BUFFER_SIZE;

        // if we should be storing the received character into the location
        // just before the tail (meaning that the head would advance to the
        // current location of the tail), we're about to overflow the buffer
        // and so we don't write the character or advance the head.
        while (i != buffer->tail) {
                uint32_t c;
                if (!USBD_Available(CDC_RX)) {
                        udd_ack_fifocon(CDC_RX);
                        break;
                }
                c = USBD_Recv(CDC_RX);
                // c = UDD_Recv8(CDC_RX & 0xF);
                buffer->buffer[buffer->head] = c;
                buffer->head = i;

                i = (i + 1) % CDC_SERIAL_BUFFER_SIZE;
        }
   r=0;      
}

(cmaglie - I just saw your code, looks good, I'll try it out)

It's not amazingly fast (I am getting a 7MByte file through in about 50s) but it is reliable - no diff in 50MBytes of testing so far :slight_smile:

cmaglie:
It works great for me!

Fixed SerialUSB data handshake when host sends a lot of data (PeterVH, stimmer) by cmaglie · Pull Request #1255 · arduino/Arduino · GitHub

I'll merge this before the coming release 1.5.2.

Thank you guys!

I've been testing this code - no errors in 250Mbytes of data so far :smiley:

Update: No errors in 500Mbytes of data in ~ 1 hour :grin:

Update 2: All 888888888 bytes transferred in 1h48m24s, no errors 8)

Woohoo!
I just got back from work and noticed the problem is diagnosed, fixed, tested and committed in git-hub!
Thanks guys.

Nevertheless I still feel we should think about the approach taken to avoid entering accept() from the isr. This is what happens without the guards:

  • bytes arrive for the 1st time from the host
  • the isr fills up the serial rx buffer, it can't free the fifo because it can only write 511 bytes and it has 512 bytes available.
  • the sketch performs SerialUSB.read() so it reads 1 byte.
  • so we end up in accept() which calls USBD_Recv(). This function automatically frees the fifo if we read the last byte from it! I think this is where the problem lies. This causes a new irq to arrive while we are still in our loop and did not yet update the head ptr of the rx buffer.

Therefore I tried with the function UDD_Recv8(): it does not automatically free the fifo, so we can finish our loop and call udd_ack_fifocon(CDC_RX) when we are done accessing the state in rx buffer.

I could not see the problem yesterday, but now, after seeing your fix, I can. Following code works without guards:

void Serial_::accept(void)
{
        ring_buffer *buffer = &cdc_rx_buffer;
        uint32_t i = (uint32_t)(buffer->head+1) % CDC_SERIAL_BUFFER_SIZE;

        if (!USBD_Available(CDC_RX))
                return;

        // if we should be storing the received character into the location
        // just before the tail (meaning that the head would advance to the
        // current location of the tail), we're about to overflow the buffer
        // and so we don't write the character or advance the head.
        while (i != buffer->tail) {
                // UDD_Recv8() does not automatically free fifo
                uint32_t c = UDD_Recv8(CDC_RX & 0xF); 
                buffer->buffer[buffer->head] = c;
                buffer->head = i;

                if (!USBD_Available(CDC_RX)) {
                        udd_ack_fifocon(CDC_RX);
                        break;
                }
                i = (i + 1) % CDC_SERIAL_BUFFER_SIZE;
        }
}

Safer would be to disable the rx interrupt during accept(). The various low level api's like USBD_Recv() already do something like that via the LockEP class. (though I don't like the idea to use a class for this)

Let us further think about this...

It's not amazingly fast (I am getting a 7MByte file through in about 50s) but it is reliable

Indeed, consuming these bytes one by one is really inefficient. But now we have something reliable, we can think about optimization strategies.
B.t.w if we define CDC_SERIAL_BUFFER_SIZE as 513, would this be noticable? (then we have room to accept an entire fifo bank).

513-byte buffers might cause the compiler to calculate all those % operators by division. In any case I did try the code with 2048-byte buffers, but got data loss for some unknown reason. I'll have to look into that further (I can't think of any logical reason why it should happen). Shortening the buffer down to 4 bytes surprisingly didn't cause much of a performance hit.

We aren't being bottlenecked by write speed - I get nearly 10MByte/sec from the Due to the computer and transferred 800MBytes with perfect reliability.

Transferring more than one byte at a time is probably the best idea. I noticed there is a UDD_Recv(ep,*data,len) which can be used to get multiple bytes at once, and maybe we could have an overloaded SerialUSB::read(*data,len) too.

Update: PeterVH, I tried your fix without the guard but I can't get it to work reliably. It works for 100KB or so, then corruption, then nothing. Are there any other changes I have to make elsewhere?

Update: PeterVH, I tried your fix without the guard but I can't get it to work reliably. It works for 100KB or so, then corruption, then nothing.

I can confirm the behavior. I had only tested with an 8KB file. With an 170K file I have corruption too: it goes well for some time but then each byte gets sent several times.

I don't understand. I also tried with replacing the guard code with:

irqflags_t flags = cpu_irq_save();
... body of accept ...
cpu_irq_restore(flags);

The idea was instead of avoiding the isr to enter the routine, to avoid an isr alltogether by disabling interrupts. It does not work either.

The good thing is that I can also confirm the version in github works ok (tested with a 170KB file).

I will do tests with ArduinoISP also.

Fasten your seatbelts 8)

I have been trying to debug and simplify the code, and also add in multi-byte reads. The (modified) loopback test now runs at 4.3 MB/s (that's about 45 million baud). My 888888888-byte test passes through in about 3 min 30 sec. I also added in some code to control the LEDs. I did this by removing the troublesome ringbuffer code and using UDD_ calls directly (My reasoning was, since there is 2 banks of USB fifo for buffering, why do we need more?)

I do seem to be getting occasional errors but only about once every several gigabytes, which makes it hard to debug as I don't know if it's a problem with the code or the cable or with the way I am testing it. Testing so far indicates it is a problem with transmission (SerialUSB.write())

The new loopback test with multi-byte read looks like this:

void setup() {
  Serial.begin(115200);
  SerialUSB.begin(0);
}
int s=0;
int t=0;
void loop() {
  int l;
  byte b[2048]; // 512 might be enough
  if(l=SerialUSB.available()){
    l=SerialUSB.read(b,l);
    SerialUSB.write(b,l);
    s+=l;
  }
  if((millis()-t) > 1000){t=millis();Serial.println(s);}
}

I've attached the three changed USB files as there are many changes.

The way I am testing on linux is as follows:

seq 99999999 >o1
stty -F /dev/ttyACM1 sane raw -iexten -echo -echoe -echok -echoctl -echoke -onlcr min 1
time cat o1 >/dev/ttyACM1 & cat /dev/ttyACM1 >o2
# ctrl-c when it is finished
cmp o1 o2

I've also tested with gtkterm (which has been 100% reliable)

Please test thoroughly! I am especially interested to see if removing the ringbuffer causes performance degradation in any situation - I haven't found anything yet which runs slower despite having less buffering. Also I am unable to test on Windows or Mac.

CDC.cpp (7.83 KB)

USBAPI.h (6.99 KB)

USBCore.cpp (22.7 KB)

Hi Stimmer,

Would it be possible for you to make the same test but on Programming Port USB ?

Thank you, Albert

Albert: not really - programming port serial doesn't have a direct multi-byte read, it is too slow to need one. And an 888888888 byte test would take over 21 hours to run :slight_smile:

If you need a programming port loopback sketch use MultiSerialMega and connect pins 18 and 19. As for the 16u2 firmware it is almost the same as the Uno firmware and I have tested that to death 2 years ago - it can be tested by uploading Blink and linking pins 0 and 1.

Ok Stimmer I see now but I'm confused, in other threads there were issues with USB routines, are there just on native USB ?

As i've reported when moving an arduino MEGA code to arduino DUE, the USB link between Java running on my MacBook air and DUE would never work, in that case i was using USB programing port.

Maybe it would help someone makes clear thread on USB programming port and USB native port, different type of issues, have they been solved in new IDE...

I'm lost now :~

Hi Stimmer,

I found kicking out the circular buffer a great demonstration "out of the box" thinking.

I have tested your implementation with ArduinoISP. I think this is a good real life use scenario of the serial port, with lots of messages of different sizes (though relatively small) in both directions. I powered my leonardo from the due's 3v3 and modified ArduinoISP to use SerialUSB instead of Serial. It worked well: I could upload download the bootloader to/from the leonardo several times.

Nevertheless dropping the serial buffer also has consequences for feeding the due lots of small messages. Each message is received in a separate bank and you only have two of them. Take for example this shell script:

n=0
while true; do

        echo "$n" > /dev/ttyACM1
        echo $?
        sleep 1
        let "n = $n + 1"
done

I ran this sketch on the due:

static const int led = 13;
static const int button = 3;

void setup() {
  // initialize serial:
  SerialUSB.begin(9600);
  Serial.begin(115200);
  pinMode(led, OUTPUT);     
  pinMode(button, INPUT);
  digitalWrite(button, HIGH); 
}

void loop() {
  // if there's any serial available, read it:
  int i = 0;
  if (digitalRead(button)) {
    while (SerialUSB.available() > 0) {
      char c = SerialUSB.read();
      Serial.write(c);
      i++;
    }
  }
  Serial.print("i=");
  Serial.println(i);
  digitalWrite(led, HIGH);
  delay(500);
  digitalWrite(led, LOW);
  delay(500);
}

If you press the button for a few seconds, some of the messages are lost.
Odd enough the return code of echo is always 0.

I am not sure where the bytes get lost. And maybe it can be prevented with stty calls, but a circular buffer would prevent this. A circular buffer which you fill up with memcpy would also make sense as an implementation.

Interesting - I am not losing any data with that script. What happens for me is that echo blocks until ttyACM1 accepts the data. I expect that there is some mysterious stty setting which controls this.

However, it isn't really the right way to use a serial port in a script. The echo command opens the serial port, writes to it, then closes it again each time it executes. That's not what you'd do in, say, a C++ program, where you'd open the serial port once, write to it periodically, and close it at the end of the program. To do this in a shell script you'd use a fifo or a fd, for example:

exec 3<>/dev/ttyACM1
n=0
while true; do

        echo "$n" >&3
        echo $?
        sleep 1
        let "n = $n + 1"
done

This doesn't lose any data for me either, and it buffers the data on the computer rather than blocking.

What happens for me is that echo blocks until ttyACM1 accepts the data.

Well, that is what I expected to happen for me too when I wrote the script. (And on another pc (linux 3.5 vs. 3.0.0)), I did actually observe echo blocking). The point I wanted to make is that having no circular buffer, one could temporarily block a program on the pc.

But you are right. Closing and reopening the file every time causes the os to flush the file and this makes echo block. If a program would write to ttyACM1 in a normal way, the os would buffer the output... So it looks like we can indeed live without the buffer in the Due.

I read you have Albert's test running. Is it easy to test with SerialUSB?

Yes, Albert's program does work with my SerialUSB. In SimpleArduinoSerPro3 change "Serial." to "SerialUSB." in all 5 places. Then in setup() after SerialUSB.begin(BAUD_RATE); add this line: delay(1000);
I'll have a think about buffers. I'm sure I could reintroduce a buffer with not too much loss of speed. But right now I am sick at the sight of USB code :grin: So I'm going to take a break for a week or two and try some other projects which are more fun.

PeterVH:
I read you have Albert's test running. Is it easy to test with SerialUSB?

Please note that the real author of SerPro is Alvaro (on arduino forum Arduino Forum) who build some interesting applications here, one being arduinoscope running on UNO and MEGA, see old posts on arduino forum like http://arduino.cc/forum/index.php/topic,7762.0.html

The topic of secure USB serial communication with arduino's was always tricky so Alvaro came up with SerPro. What happened is that my scientific and extreme electronics project required very safe RX/TX error-free so Alvaro added 16 CRC's along with HDLC protocol and other stuff as we both are telco engineers with critical software.

For some of you interested in non-java SerPro like GTK or C or C++, you'll find a mine of info directly at Alvaro's github GitHub - alvieboy/arduino-serpro: Serial Protocol (packet/frame based) library for Arduino intercommunication

The IDE version i've published on my website has removed the arduino-oscilloscope hence not using anymore ADC's link.

In fact, Alvaro is building a new arduino tech based on ZPU, namely you create your own softcore uC inside a FPGA with dedicated instructions, like emulate part of the Atmega2560 you want keep and build low level instructions not found in the market
http://www.alvie.com/zpuino/

Regarding ElectroShaman.jar, I've made for my own projects and might publish on github if there is more interest.

stimmer:
Fasten your seatbelts 8)

I have been trying to debug and simplify the code, and also add in multi-byte reads. The (modified) loopback test now runs at 4.3 MB/s (that's about 45 million baud). My 888888888-byte test passes through in about 3 min 30 sec. I also added in some code to control the LEDs. I did this by removing the troublesome ringbuffer code and using UDD_ calls directly (My reasoning was, since there is 2 banks of USB fifo for buffering, why do we need more?)

I do seem to be getting occasional errors but only about once every several gigabytes, which makes it hard to debug as I don't know if it's a problem with the code or the cable or with the way I am testing it. Testing so far indicates it is a problem with transmission (SerialUSB.write())

The new loopback test with multi-byte read looks like this:

void setup() {

Serial.begin(115200);
 SerialUSB.begin(0);
}
int s=0;
int t=0;
void loop() {
 int l;
 byte b[2048]; // 512 might be enough
 if(l=SerialUSB.available()){
   l=SerialUSB.read(b,l);
   SerialUSB.write(b,l);
   s+=l;
 }
 if((millis()-t) > 1000){t=millis();Serial.println(s);}
}




I've attached the three changed USB files as there are many changes.

The way I am testing on linux is as follows:


seq 99999999 >o1
stty -F /dev/ttyACM1 sane raw -iexten -echo -echoe -echok -echoctl -echoke -onlcr min 1
time cat o1 >/dev/ttyACM1 & cat /dev/ttyACM1 >o2

ctrl-c when it is finished

cmp o1 o2



I've also tested with gtkterm (which has been 100% reliable)

Please test thoroughly! I am especially interested to see if removing the ringbuffer causes performance degradation in any situation - I haven't found anything yet which runs slower despite having less buffering. Also I am unable to test on Windows or Mac.

Sorry to revive an old topic, but,

How do you achieve that speed? (4.3MB's) ?

I dont have linux to test your sending-code but i have one made on VB.NET which is fairly simple, made a USB packet of 100KB and receive on the arduino, it takes 700ms to do so, which matches this tests:

http://dangerousprototypes.com/2013/06/03/usb-virtual-serial-benchmarks/

That say that the arduino serialUSB speed is around 110/130kbs and it matches my speed.

My receiving code is:

void setup() {
  // put your setup code here, to run once:
      SerialUSB.begin(1200000);
      delay(2000);
      //Wait for start-mark packet
      while (SerialUSB.available() < 1){}

      unsigned long ChronoOne = micros();      
      int readed = 0;
      int total_bytes = 100000;      
      while (readed < total_bytes)
      {
          while (SerialUSB.available() > 1){SerialUSB.read();readed = readed + 1;}
      }
      
      ChronoOne = micros() - ChronoOne;
      SerialUSB.println("T:" + String(ChronoOne));    //Total bytes received           
      SerialUSB.println("R:" + String(readed));         //Time it took
      delay(50000);
}

void loop() {
  // put your main code here, to run repeatedly: 
  
}

Sending code on VB.NET

        Dim dt1970 = DateTime.Now

        Dim bArraySend2(1) As Byte
        bArraySend2(0) = 1
        SerialPort1.Write(bArraySend2, 0, bArraySend2.Length) 'Start Packet


        Dim bArraySend(100012) As Byte
        bArraySend(0) = 255
        bArraySend(1) = 47
        SerialPort1.Write(bArraySend, 0, bArraySend.Length)

        Dim now = DateTime.Now
        MsgBox("done:" & (now - dt1970).TotalMilliseconds.ToString())

I cant get any faster than this, any ideas? Using 1.5.2

Did you replace the three files Stimmer attached to his post?

I can't explain it any better than I already did - I modified the Arduino source code to add a multi-byte read method and removed some unnecessary buffering. The speedup comes from using multi-byte reads, how that works is explained quite well on the blog off the page you linked to. Note that I have overloaded the read method instead of overriding readBytes, the signature is the same as readBytes though.

I don't think my modification would get accepted into a future Arduino release as it is too much of a change. But it is interesting to note the benchmark results, at the moment the Due is 8x slower than a Teensy, with my code it would be over 4x faster :slight_smile: (The reason is that the Due can use USB 2.0 High Speed)

I've modified my code so that readBytes now also uses the multi-byte read and now Paul Stoffregen's benchmark will run as intended. Here are the results:

~/arduino/sketchbook/usb_serial_receive/host_software$ ./receive_test /dev/ttyACM0 
port /dev/ttyACM0 opened
ignoring startup buffering...
ignoring startup buffering...
ignoring startup buffering...
ignoring startup buffering...
ignoring startup buffering...
Bytes per second = 7385524
Bytes per second = 7273609
Bytes per second = 7414731
Bytes per second = 7309051
Bytes per second = 7339450
Bytes per second = 7260407
Bytes per second = 7387343
Bytes per second = 7384615
Bytes per second = 7293946
Bytes per second = 7390073
Bytes per second = 7265682
Bytes per second = 7386434
Bytes per second = 7385524
Bytes per second = 7275373
Bytes per second = 7384615
Average bytes per second = 7342425

$) $) $) $) $)
I've reattached the 3 files that need to be changed (in the arduino/hardware/arduino/sam/cores/arduino/USB directory)

USBAPI.h (7.07 KB)

CDC.cpp (7.83 KB)

USBCore.cpp (22.7 KB)

duh, sorry, i thought this was implemented in the current version since this thread was from jan.

Alright, i used the new files and with your read-example it works alright, but with the original read method (reading 1 by 1, my posted receiving-code) i only receive 512b out of 100k and gets stuck, i assume this might be because every time i read the buffer with read() or read(b,1) it does not read just 1 byte and it drops all the data received so i end up with only 512b out of 100k.

Is there a way to make it backwards compatible with the current read method, so we can use the old-read-1-by-1 and when needed fast speed for fast transfer use this one