UNO serial latency

hey all

I'm fighting with some serial latency issues on my duemilanove arduino. I read that the new UNO should have lower latency - but what does this mean?

no more funny ftdi-usb-buffer-or-not issues with the serial device? did someone do a serial speed test on duemilanove vs uno?

regards
michu

The USB packets are 64 bytes, at least in these cases. FTDI uses 62 bytes for data and 2 for flags. CDC (used by Uno's 8u2 chip) uses all 64 and a separate mechanism for flags.

When you send characters to the FTDI chip (at whatever baud rate you've configured) they need to be put into a USB packet. If you keep sending without any delays, every 62 will be sent together in 1 packet.

The common latency issue happens when you send a relatively short message that doesn't completely fill 1 packet. The FTDI chip has a timeout: if it does not receive another character from you, it sends whatever bytes it has buffered in a less-than-full-size packet.

The FTDI chip's timeout can be configured by the driver, though it appears a default of 16 or 8 ms is common. I'm not sure how to change this setting, but I've seen info published about loading special drivers that can configure this setting. I'm sure you'll find that info if you search well. Since you're already using FTDI, this is the path that doesn't involve buying anything else.

Looking at the Uno's 8u2 firmware in "Arduino-usbserial.c", it appears timer0 is used to implement a similar function, though the way it's written seems very strange to me. That timer is used in 8 bit CTC mode with a divide-by-256 prescaler, which results in a 4.1 ms timeout with 16 MHz crystal.

If you wanted to reduce the latency, you could try recompiling with a faster prescaler setting (look for the line that writes to TCCR0B), or you could try more ambitious edits to the code.

The 8u2 firmware code seems very odd to me. Maybe I'm just missing something?

As the bytes are received (at some slow baud rate), they're piled up in a buffer. There's a "nearly full" trigger threshold of 96 bytes, at which point they are copied to USB. There's no point leaving bytes in memory and not trying to push them out to a USB packet once 64 are buffered, yet that appears to be the way the code is designed. Luckily, that bad scenario seems to be impossible due to other design issues with the timer.

The timer appears to trigger every 4.1 ms. It does not get reset upon each byte received. So in addition to checking for 96 bytes buffered, the 8u2 firmware appears to send whatever is buffered every 4.1 ms.

At 115200 baud, you can transmit 47 characters in 4.1 ms. So it appears to be impossible to hit the 96 byte threshold. It also seems impossible to completely fill a 64 byte packet.

Since the timer isn't reset for each byte received, that 4.1 ms cycle could happen to end right as you've sent just a byte or two, or after many are buffered. It's impossible to know if a byte you've just sent will wait 4.1 ms or just narrowly made the end of a 4.1 ms window.

Well, unless I've missed something in the code? This seems like a really weird design. If anyone else knows better how it really works, I'd sure like to hear.

The good news for Uno, it seems, is you'll never have to wait longer than 4.1 ms from sending bytes to when they actually go on the USB bus (*but see below for USB host controller stuff). That's a lot better than the FTDI default at 8 or 16 ms.

Of course, you can edit the code and reflash the 8u2 chip. It's not necessarily easy, but it is possible. Changing the FTDI setting may or may not be easier.

The rest of this message could probably be interpreted as a shameless plug for a product, and for full disclosure it's one I am affiliated with, so if that sort of thing offends, now would be the point to stop reading....

The other alternative you have for low latency would be the Teensy USB board using the Teensyduino add-ons for Arduino. On Teensy, Serial.print() writes directly to USB without any slow serial link. The bytes go directly into USB packets, which are always sent at 12 Mbit/sec, regardless of the baud rate setting.

Teensyduino, like the FTDI chip, implements a timeout. If you haven't written enough to completely fill a packet, it gets transmitted automatically about 3 ms later. However, because the USB is on-chip, you can use Serial.send_now() to make it transmit the partial packet immediately. The combination of fast writing to the USB packet buffer and immediate partial packet transmit is the absolute best you can achieve for low-latency.

Of course, USB is a shared bus. So "immediate" really means "the next occasion the PC's host controller allocates bandwidth and issues an IN token". Usually that is a very short time, but it can vary if other bandwidth heavy USB devices are active.

The final latency step happens on the PC, between the time the host controller chip has stored the packet in memory and issues an interrupt. Most drivers respond to the interrupt quickly, but the operating system may or may not actually run your application that is waiting for the data until some time later. That's entirely an issue on the operating system, but if latency is critical, you might need to look into real-time scheduling for your application.

But of course you first would want to look into reducing the FTDI timeout or Uno's 8u2 timer0 stuff, or Teensy's Serial.send_now(), so your data isn't sitting in a buffer waiting many milliseconds before being sent as a not-completely-full USB packet.

wow, now that I would call a huge reply :wink: - thanks.

I know about the ftdi usb buffer size - but i'm interested in a real life round trip latency comparison between duemilanove and uno, example:

send 92 bytes serial data to arduino, reply with a 4 byte serial packet. I have a round trip time of 20ms on a duemilanove - and would be interested in the time the uno needs.

Changing the FTDI setting may or may not be easier
I guess not on mac osx - there is a windows registry entry to change the timeout, but I didn't find anything for mac osx.

Teensy USB board...
that sounds interesting, like the latency time the uno needs.

again, thanks paul for your huge reply!
cheers

send 92 bytes serial data to arduino, reply with a 4 byte serial packet. I have a round trip time of 20ms on a duemilanove

If you post this code, I (and others) could give it a try and post the results. Remember we'd need the code on the PC/Mac side as well as the Arduino.

hey paul, that would be fantastic!

arduino firmware:
http://code.google.com/p/neorainbowduino/source/browse/trunk/arduinoFw/neoLed/neoLed.pde

you need processing and this library:
http://code.google.com/p/neorainbowduino/downloads/detail?name=neorainbowduino.jar&can=1&q=#makechanges

and this would be the test sketch:
http://code.google.com/p/neorainbowduino/source/browse/trunk/processingLib/src/com/neophob/lib/rainbowduino/test/TestRoundtrip.java

thanks in advance!

cheers

I tried running your processing code, but it seems to depend on libraries I don't have. Right now it's complaining about "com.neophob". I can run some tests, but I really don't have a lot of time to fiddle with getting this app running.

Could you export the test program as a stand alone app?

Ok, I did a little fiddling. Instead of processing, I wrote a tiny C program. I'll attach the code below.

It sends the 7 byte ping command and measures the time until the 4 byte ack is received.

I tested on Mac OS-X 10.5.8 on a 2.4 GHz (Intel) Macbook. Here's what I got:

For Duemilanove:

port /dev/cu.usbserial-A800daD3 opened, waiting for board to boot up
sending 7 bytes, read 4 bytes, elased: 11.75 ms
sending 7 bytes, read 4 bytes, elased: 15.96 ms
sending 7 bytes, read 4 bytes, elased: 15.96 ms
sending 7 bytes, read 4 bytes, elased: 15.96 ms
sending 7 bytes, read 4 bytes, elased: 15.87 ms
sending 7 bytes, read 4 bytes, elased: 16.02 ms
sending 7 bytes, read 4 bytes, elased: 15.96 ms
sending 7 bytes, read 4 bytes, elased: 15.96 ms
sending 7 bytes, read 4 bytes, elased: 15.96 ms
sending 7 bytes, read 4 bytes, elased: 15.96 ms

For Uno:

port /dev/cu.usbmodem411 opened, waiting for board to boot up
sending 7 bytes, read 4 bytes, elased: 3.56 ms
sending 7 bytes, read 4 bytes, elased: 3.96 ms
sending 7 bytes, read 4 bytes, elased: 3.96 ms
sending 7 bytes, read 4 bytes, elased: 3.97 ms
sending 7 bytes, read 4 bytes, elased: 3.97 ms
sending 7 bytes, read 4 bytes, elased: 3.96 ms
sending 7 bytes, read 4 bytes, elased: 3.96 ms
sending 7 bytes, read 4 bytes, elased: 4.98 ms
sending 7 bytes, read 4 bytes, elased: 3.96 ms
sending 7 bytes, read 4 bytes, elased: 3.96 ms

For Teensy (with Serial.send_now() added)

port /dev/cu.usbmodem12341 opened, waiting for board to boot up
sending 7 bytes, read 4 bytes, elased: 0.85 ms
sending 7 bytes, read 4 bytes, elased: 0.95 ms
sending 7 bytes, read 4 bytes, elased: 0.87 ms
sending 7 bytes, read 4 bytes, elased: 1.03 ms
sending 7 bytes, read 4 bytes, elased: 0.98 ms
sending 7 bytes, read 4 bytes, elased: 0.96 ms
sending 7 bytes, read 4 bytes, elased: 0.96 ms
sending 7 bytes, read 4 bytes, elased: 0.96 ms
sending 7 bytes, read 4 bytes, elased: 0.96 ms
sending 7 bytes, read 4 bytes, elased: 0.96 ms

Here is the (admittedly ugly and quickly tossed together) C program:

// compile with: gcc -O2 -Wall -o latency_test latency_test.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <fcntl.h>
#include <poll.h>
#include <termios.h>
#include <unistd.h>

#define PORT "/dev/cu.usbserial-A800daD3"       // Duemilanove
//#define PORT "/dev/cu.usbmodem411"            // Uno
//#define PORT "/dev/cu.usbmodem12341"          // Teensy

//#define PORT "/dev/ttyUSB0"                   // Duemilanove on Linux
//#define PORT "/dev/ttyACM0"                   // Uno or Teensy on Linux
#define BAUD B115200

void die(const char *format, ...) __attribute__ ((format (printf, 1, 2)));


int main()
{
        int r, fd, count;
        struct termios tinfo;
        unsigned char buf[7];
        struct pollfd fds;
        struct timeval begin, end;
        double elapsed;

        fd = open(PORT, O_RDWR);
        if (fd < 0) die("unable to open port %s\n", PORT);
        if (tcgetattr(fd, &tinfo) < 0) die("unable to get serial parms\n");
        if (cfsetspeed(&tinfo, BAUD) < 0) die("error in cfsetspeed\n");
        if (tcsetattr(fd, TCSANOW, &tinfo) < 0) die("unable to set baud rate\n");

        printf("port %s opened, waiting for board to boot up\n", PORT);
        sleep(3);

        for (count=0; count < 10; count++) {
                // send the ping request
                buf[0] = 1;
                buf[1] = 0;
                buf[2] = 1;
                buf[3] = 4;
                buf[4] = 0x10;
                buf[5] = 2;
                buf[6] = 0x20;
                printf("sending 7 bytes");
                gettimeofday(&begin, NULL);
                r = write(fd, buf, 7);
                if (r != 7) die("unable to write, r = %d\n", r);

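                // wait for a response (give up after 500 ms instead of blocking in read)
                fds.fd = fd;
                fds.events = POLLIN;
                r = poll(&fds, 1, 500);
                if (r <= 0) die("\nno response within 500 ms\n");
                r = read(fd, buf, 4);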
                gettimeofday(&end, NULL);
                printf(", read %d bytes", r);
                if (r != 4) die ("unable to read 4 bytes\n");

                elapsed = (double)(end.tv_sec - begin.tv_sec) * 1000.0;
                elapsed += (double)(end.tv_usec - begin.tv_usec) / 1000.0;
                printf(", elased: %.2f ms\n", elapsed);
        }
        close(fd);
        return 0;
}

void die(const char *format, ...)
{
        va_list args;
        va_start(args, format);
        vfprintf(stderr, format, args);
        va_end(args);
        exit(1);
}

hey paul, thank you very much for your work. the results are very interesting, it looks like uno's latency is much better than the duemilanove latency. not (yet) as fast as the teensy - but close enough!

do you mind if i copy your results and put it in a blog post?

cheers

Sure, that's fine. It's ok to copy the code too if you like.

Something to keep in mind is this benchmark is only for the 7 byte TX, 4 byte RX case, and I only tested on Mac OS-X 10.5, and only for this specific C program. (I tried Linux briefly but had a lot of trouble getting Uno to work reliably). If you transmit 96 bytes, at 115200 baud between the chips, that'll add about 8.3 ms. 96 bytes might change things for Teensy too, since that will become more than 1 USB packet.

How the software issues I/O operations can also matter greatly. In this test, I did a single 7 byte write, a single poll, and a single 4 byte read. In other testing I did with Puredata a few months ago, results varied quite a lot on each operating system. The size of the actual writes from the app to the operating system matters greatly (only OS-X will combine smaller I/O operations together... Windows and Linux will happily pass single byte writes to the USB host controller, making horribly inefficient use of the bandwidth). Processing may or may not do things efficiently. It's amazing how much difference it can make, especially on Windows due to what seems to be poor driver design. The point is differently designed software could attempt the same communication and get completely different results, especially if it issues single-byte I/O operations.

So when blogging, please remember the caveat that this is a very narrow benchmark.

thanks again, if someone is interested: http://www.neophob.com/2010/11/arduino-serial-latency/.

cheers and thanks paul!

Latency on virtual USB com ports is typically a tradeoff between CPU load and response time. That is, low latency is likely to put significantly more load on the host CPU (Windows/Linux). For this reason manufacturers may choose a default latency that balances these concerns (load versus responsiveness).

On Windows, FTDI latency defaults to 16ms, but can easily be reconfigured (port properties) to any value from as low as 1ms to meet special requirements.

In short, this is driver related and not a limiting factor with Uno versus Duemilanove. Nor is this a matter of old versus new technology. It is not really hardware related at all - it is simply a matter of different default values used in the driver. If you need low latency with FTDI, just change the value in port properties to whatever your requirements dictate.

thanks ben for clarifying this - however there are some differences between ftdi and Atmega8U2 regarding buffer handling - or not?

and you might be right about the ftdi driver on WINDOWS - but there is no "clean" solution on macosx for example. or did i miss something?

cheers

Below is an image of the FTDI port properties dialog on Windows. The driver will certainly have a similar latency timer implemented for Mac, but I don't know how you can get access to it.

another question, ben you wrote:

In short, this is driver related and not a limiting factor with Uno versus Duemilanove. Nor is this a matter of old versus new technology. It is not really hardware related at all - it is simply a matter of different default values used in the driver. If you need low latency with FTDI, just change the value in port properties to whatever your requirements dictate.

however the arduino website says (Dinner is Ready | Arduino Blog):

We replaced the aging FTDI chipset with a custom made usb-serial converter built with an Atmel ATmega8U2 this provides lower latency and doesn't require to install any drivers on mac and linux

so what's true now? is the latency lower because of the better stock driver, or are there any hardware related pros?

cheers

The lion's share of the total latency (with FTDI or Uno) is due to the USB-serial chip waiting for more bytes to arrive before it finally times out and sends a less-than-full packet.

It appears the FTDI chip and Uno's 8u2 work similarly, but the default is 16 ms vs 4 ms. It also looks like both simply send a packet every time interval instead of implementing an actual timeout. But I didn't thoroughly explore that on either. On Uno, where the source code is available, it's pretty clear that's how it works.

Teensy also implements a timeout (3 ms), which is a true timeout (it is reset back to zero if you write more data, rather than always flushing data on a fixed interval), but since the USB is on chip, you can manually flush the buffer the instant you know all of your response has been written. Sending the partial packet on command is the best you can do to achieve minimal latency (well, writing directly into the USB packet buffer as fast as your code can run, instead of 115200 baud, also helps...)

In theory, you could send the packet on command on Uno's 8u2 by connecting an extra wire between the '328 and 8u2 chip. You'd modify the 8u2 code to sense a change on that pin, and of course immediately send any partial packet, just like Serial.send_now() does on Teensy.

Or perhaps instead of connecting an extra wire, the 8u2 code could be modified to detect a specific "end of message" pattern in the data, or some other protocol-specific message framing, and send the packet as soon as possible.

I believe that is what was meant by the announcement on Uno, that because the 8u2 chip is programmable and the code is open source (it was published several days after release), you can modify Uno to work in ways that are simply impossible with FTDI (but already implemented on Teensy and easy to use from your sketch).

The tests with Teensy show what is probably the "best case" you could hope to achieve by completely eliminating this timeout. Maybe I'll investigate why it's still 1 ms (Teensy is definitely a lot faster than 1 ms), though I'm pretty sure it's a Mac USB host controller driver issue, where a new transfer can't happen until the next USB frame, even if the last one took far less than 1 ms.

I know this is a shameless plug... but if you need low latency, why not just try using Teensy? Not only is the latency 4 times less, but the cost is about half as much. You'd probably need to tweak pin numbers or other minor details, but I already ran your code in ping mode! (ok, end shameless plug)

Then again, it might be interesting to play with the 8u2 code.......

Or maybe someone knows a way to change the FTDI timeout on Mac OS-X ??