Why is Uploading any Sketch to the Due so Slow?

After compiling, of course, the Upload process goes very slowly.
I am using the Programming USB Port.

Even small 10k sketches (after compiling) are slow, and larger (say 100k)
sketches are... very slow. Wait, wait... for flashing, and then
wait, wait... again for verification.

Is the baud rate set to 9600 for this flashing?

I use the Serial Monitor at 115200 baud, and that seems to
work just fine, after I set the baud rate in the Serial Monitor
to match the Serial.begin(115200) in my sketch.

Is there any way to make the upload of the sketch faster?
Thanks, Gary

I’ve been curious about this too. Since it’s a Sunday morning, when I’m theoretically not actually working, I hooked up my scope to take a quick look.

Here’s the first activity. Those bits are about 11 us wide, so it sure looks like 115200 baud.
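
The scope reading can be sanity-checked with a bit of arithmetic. This is only a minimal sketch: the list of candidate rates and the name `nearest_baud` are my own illustration, not taken from any tool. A bit at 115200 baud lasts about 8.7 us and one at 57600 lasts about 17.4 us, so an eyeballed 11 us lands closest to 115200.

```c
#include <stddef.h>

/* Return the common baud rate whose bit period is closest to a measured
 * bit width in microseconds. The candidate list is an assumption chosen
 * for illustration. */
long nearest_baud(double bit_width_us)
{
    static const long rates[] = { 9600, 19200, 38400, 57600, 115200, 230400 };
    long best = rates[0];
    double best_err = -1.0;

    for (size_t i = 0; i < sizeof rates / sizeof rates[0]; i++) {
        double period_us = 1e6 / (double)rates[i];  /* one bit time */
        double err = period_us - bit_width_us;
        if (err < 0) err = -err;
        if (best_err < 0 || err < best_err) {
            best_err = err;
            best = rates[i];
        }
    }
    return best;
}
```

Calling `nearest_baud(11.0)` picks 115200 from this list, which matches the guess from the scope trace.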

Here’s the first 1 second of activity. Lots of little bursts of something happening, but long dead times between them.

It gets much more interesting about 2.1 seconds after the first falling edge. This looks like it’s probably the actual data transfer. Even then, check out the lengthy gaps. It’s spending almost as much time waiting as it is actually moving bytes… and that’s after it took its time (2.1 seconds) to even begin sending.

My guess is the slowness comes from several sources.

Obviously it's not even beginning data transfer for 2 seconds. Why, I have no idea.

My guess is those gaps are latency added mostly by the firmware in the 16u2 chip. I believe the upload protocol, which is created by Atmel and permanently burned into a non-upgradeable ROM inside the SAM3X chip, involves a command-response approach. That's never great, since there's always some latency (especially on Windows) from the USB frame times. But the 16u2 probably adds much more, because it probably waits for a timeout before sending any buffered serial data as a partial USB packet.

The protocol itself involves ASCII encoding of data, and 115200 baud is only about 11 kbytes/sec, so even without the dead times, the speed can't be great with serial.
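
To put rough numbers on that, here is a small sketch of the best-case arithmetic. It assumes 8N1 framing (10 bits on the wire per byte), and the expansion factor for the ASCII encoding is a parameter because I don't know the exact encoding; 2.0 would correspond to plain hex and is only an assumption. The function names are made up for this example.

```c
/* Bytes per second on an asynchronous serial link.
 * bits_per_frame is 10 for 8N1: start bit + 8 data bits + stop bit. */
double serial_bytes_per_sec(long baud, int bits_per_frame)
{
    return (double)baud / bits_per_frame;
}

/* Best-case seconds to move payload_bytes with no dead time at all.
 * chars_per_byte models the ASCII encoding overhead (assumed value,
 * e.g. 2.0 if every payload byte became two hex characters). */
double best_case_seconds(long payload_bytes, long baud,
                         int bits_per_frame, double chars_per_byte)
{
    double wire_bytes = (double)payload_bytes * chars_per_byte;
    return wire_bytes / serial_bytes_per_sec(baud, bits_per_frame);
}
```

With these assumptions, 115200 baud gives 11520 bytes/sec, so a 100k sketch needs roughly 8.7 seconds of pure wire time with no encoding overhead and about twice that if each byte is doubled.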

Upload protocols and speeds are something of an obsession of mine. For Teensy 1.0 & 2.0, I used USB control transfers, but also still a 1-at-a-time approach requiring the control transfer's final ACK. That's reasonably fast, but still far from optimal. For Teensy 3.0, I tried a new approach that allows streaming with substantial buffering by the bootloader, and an ACK/NAK approach to allow the transmitter to sense the board's ability to accept more data. It turned out a lot of the limitation in speed was due to the operating system's latency in scheduling the userspace program to run. The streaming, buffering and ACK/NAK solves that problem and lets the upload happen at a speed paced only by the flash write timing (if you try a Teensy3, I think you'll be impressed how fast 100K uploads). Still, even with all those measures, detecting the request to upload, disconnecting from the USB and re-enumerating are slow, taking about 1 to 2 seconds. At least with Teensy they are, since the USB disconnects. I have a few ideas about speeding that up... but they're extremely difficult. Did I mention I tend to obsess about upload protocols and speeds?
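
A toy timing model shows why streaming with buffering helps so much. This is a sketch under simplifying assumptions, not a description of any real bootloader: every block takes the same time to transfer and to write, and the round-trip latency is a single fixed number. All names and any numbers you plug in are illustrative.

```c
/* One block at a time: each block pays transfer + flash write + a full
 * round-trip latency before the next block may start. */
double stop_and_wait_seconds(int blocks, double transfer_s,
                             double write_s, double latency_s)
{
    if (blocks <= 0) return 0.0;
    return blocks * (transfer_s + write_s + latency_s);
}

/* Streaming with buffering: transfers overlap flash writes like a
 * two-stage pipeline, so the slower stage sets the pace and the
 * latency is paid roughly once rather than per block. */
double streaming_seconds(int blocks, double transfer_s,
                         double write_s, double latency_s)
{
    if (blocks <= 0) return 0.0;
    double pace = transfer_s > write_s ? transfer_s : write_s;
    return latency_s + transfer_s + write_s + (blocks - 1) * pace;
}
```

For example, with 100 blocks, 1 ms transfer, 2 ms write and 5 ms latency (made-up values), the first model gives 0.8 seconds and the second about 0.2 seconds, which is close to the flash write time alone.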

Anyway, for Arduino Due, the upload speed could probably be improved considerably if someone put a lot of programming work into making the 16u2 chip more aware of the upload protocol. Actually, it probably has no need to be aware of the PC-to-Due stream... it's the Due-to-PC responses that it could recognize. If it could detect those and quickly transmit a partial USB packet, rather than sitting there waiting for more data, you'd probably see a substantial boost in speed. It might also be possible to increase the baud rate. I don't know what baud rates Atmel's bootloader can support, but the 16u2 is theoretically capable of up to 2 Mbit/sec. In practice, a 16 MHz AVR can do about 0.5 Mbit/sec with well-written C code while also polling the USB stuff.
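
The flush decision described above could look something like the sketch below. It is purely hypothetical and is not the actual 16u2 firmware: the terminator byte is passed in as a parameter because I don't know how the bootloader's responses really end, and the packet size and timeout constants are assumed values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USB_PACKET_SIZE    64  /* assumed bulk endpoint size */
#define IDLE_TIMEOUT_TICKS 4   /* assumed fallback timeout, arbitrary units */

/* Decide whether buffered Due-to-PC bytes should go out as a USB packet
 * now. `terminator` is a hypothetical end-of-response marker. */
bool should_flush(const uint8_t *buf, size_t len,
                  unsigned idle_ticks, uint8_t terminator)
{
    if (len == 0) return false;                    /* nothing to send */
    if (len >= USB_PACKET_SIZE) return true;       /* full packet ready */
    if (buf[len - 1] == terminator) return true;   /* response looks complete */
    return idle_ticks >= IDLE_TIMEOUT_TICKS;       /* otherwise wait for timeout */
}
```

The point of the third check is that a complete response goes out immediately as a partial packet, instead of waiting for the idle timeout on every command.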

But ultimately, some of the slowness is the non-optimal protocol Atmel designed, both the non-binary data format and the command-response nature. Unfortunately, there's nothing you can do about those issues, since the bootloader is permanently burned into a ROM on the chip that can never be upgraded.

Obviously it's not even beginning data transfer for 2 seconds. Why, I have no idea.

This is probably the time it takes to erase the entire Flash in the Due.

Unfortunately, these delays are all workarounds for some upload issues with the SAM-BA bootloader, and the bootloader itself is burned into the SAM3X ROM and cannot be changed in any way.

If you want a complete list of the patches to bossac, you can take a look here:

Atmel actually uses another trick to improve upload speed in their client: they use SAM-BA to upload a small app into the SAM3X SRAM and then run it. That app takes over the CPU and does the real flashing in an efficient way.

BTW, it's quite complex and needs some stack machinery to work. Arduino is not going to change the way code is uploaded anytime soon.

When Massimo gave me one of the early Due betas (Maker Faire in May 2011), I remember playing with it that first week before the beta site opened. I didn’t know about bossac. I didn’t have ANY code from anyone. I only had Atmel’s datasheet. So as a first experiment starting from nothing, I wrote this shell script to blink some LEDs:

#! /bin/sh

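# SAM-BA Blinky - Blink LEDs on Arduino Due's digital pins 0 to 5
# Written by Paul Stoffregen, paul@pjrc.com
# this code is in the public domain

# NOTE: the variable definitions were lost from the original post. The
# values below are a reconstruction and should be treated as assumptions:
# the port name and pause are guesses, and the commands are derived from
# the SAM3X PIO register map (SODR at offset 0x30 sets a pin, CODR at
# offset 0x34 clears it) using the same pin bits as the setup writes below.
port=/dev/ttyACM0   # assumed serial device for the Due
pause=100000        # assumed delay between steps, in microseconds

digitalWrite_0_high="W400E0E30,00000100#"
digitalWrite_0_low="W400E0E34,00000100#"
digitalWrite_1_high="W400E0E30,00000200#"
digitalWrite_1_low="W400E0E34,00000200#"
digitalWrite_2_high="W400E1030,02000000#"
digitalWrite_2_low="W400E1034,02000000#"
digitalWrite_3_high="W400E1230,10000000#"
digitalWrite_3_low="W400E1234,10000000#"
digitalWrite_4_high="W400E0E30,20000000#"
digitalWrite_4_low="W400E0E34,20000000#"
digitalWrite_5_high="W400E1230,02000000#"
digitalWrite_5_low="W400E1234,02000000#"
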
./slowecho "W400E0EE4,50494F00#" > $port
./slowecho "W400E0E00,00000200#" > $port
./slowecho "W400E0E34,00000200#" > $port
./slowecho "W400E0E00,00000100#" > $port
./slowecho "W400E0E10,00000100#" > $port
./slowecho "W400E0E10,00000200#" > $port
./slowecho "W400E1010,02000000#" > $port
./slowecho "W400E1210,10000000#" > $port
./slowecho "W400E0E10,20000000#" > $port
./slowecho "W400E1210,02000000#" > $port

while [ 1 ]
do
  ./slowecho $digitalWrite_0_high > $port
  usleep $pause
  ./slowecho $digitalWrite_1_high > $port
  usleep $pause
  ./slowecho $digitalWrite_2_high > $port
  usleep $pause
  ./slowecho $digitalWrite_3_high > $port
  usleep $pause
  ./slowecho $digitalWrite_4_high > $port
  usleep $pause
  ./slowecho $digitalWrite_5_high > $port
  usleep $pause
  ./slowecho $digitalWrite_0_low > $port
  usleep $pause
  ./slowecho $digitalWrite_1_low > $port
  usleep $pause
  ./slowecho $digitalWrite_2_low > $port
  usleep $pause
  ./slowecho $digitalWrite_3_low > $port
  usleep $pause
  ./slowecho $digitalWrite_4_low > $port
  usleep $pause
  ./slowecho $digitalWrite_5_low > $port
  usleep $pause
done
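
The command strings in the script all share one shape: `W`, an 8-digit hex address, a comma, an 8-digit hex value, and a closing `#`. As a minimal sketch, a helper that builds such a string could look like this. The function name is made up, and the format is inferred only from the strings in the script, not from Atmel's documentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Build a word-write command in the "W<addr>,<value>#" shape seen in
 * the script. Returns the number of characters snprintf would write
 * (19 when the buffer is large enough). */
int format_write_word(char *out, size_t out_size,
                      uint32_t addr, uint32_t value)
{
    return snprintf(out, out_size, "W%08lX,%08lX#",
                    (unsigned long)addr, (unsigned long)value);
}
```

For instance, address 0x400E0EE4 with value 0x50494F00 reproduces the first command the script sends.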

I remember spending several hours with it sort-of working, but crashing. It turned out you can’t just send stuff to Atmel’s SAM-BA bootloader at native USB speed. It crashes!

So I wrote this little “slowecho” replacement for normal unix “echo”. With this, the LEDs blink.

// slowecho - like echo, but slowly, for certain bootloaders that can't
// accept data quickly, even when running on a fast processor and
// communicating with USB protocols that have end-to-end flow control.
// this code is in the public domain
// compile with:
//    gcc -Wall -O2 -o slowecho slowecho.c

#include <stdio.h>
#include <unistd.h>

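int main(int argc, char **argv)
{
        const char *p;

        if (argc < 2) return 0;
        for (p = argv[1]; *p; p++) {
                // NOTE: the loop body was lost from the original post; this
                // is a reconstruction, and the delay is an assumed value.
                putchar(*p);
                fflush(stdout);  // push each character out immediately
                usleep(1000);    // assumed pause between characters
        }
        return 0;
}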

Looks like Atmel never fixed the bugs in their bootloader?

Sadly, no. And the worst thing is that even if they fixed it right now, there are already a ton of SAM3X chips in the wild with the old one burned in...

If Atmel ever did fix it, maybe bossac could begin by detecting the bootloader version?

The idea of uploading an optimized bootloader to RAM is actually pretty crafty. If someone ever did go to all that trouble and added it all to a new version of bossac, wouldn’t you consider accepting the contribution? Of course, I think the odds of anyone ever going to all that trouble are pretty slim. But maybe someone could convince Atmel to release their bootloader helper code under an open source license, so someone could use it as a starting point?

But really, I just obsess too much about data transfer speeds and protocols. I should probably stop now and get back to more urgent work…