Teensy 3.0

LOL, I'm just now seeing this thread.... been far too busy shipping the rewards and writing software!

I was wondering if anyone would ever notice those C++ classes?! I did some looking at libraries that use SPI, and sadly most of them directly access the AVR registers. The official SPI library arrived relatively late in the development of Arduino, so it hasn't been widely adopted. It also changed its own API at least once, causing at least one library author to dump it and go directly to the registers. The existing SPI library also isn't much of an abstraction (e.g., it has no way to support the FIFO, DMA, or automatic chip select signals). Fortunately, the compiler optimizes away pretty much all of the C++ stuff because it's all inline functions. The SPCR part isn't highly efficient, but the data register and status flag compile to the equivalent register accesses. It was pretty painful having to clear the FIFO every time SPDR is written, but that's necessary to faithfully emulate the AVR registers.....

For your SdFat library, at least making good use of the FIFO should be much faster. Would you prefer to put the Freescale registers directly into your SdFat library, or work with a new SPI library that supports the FIFOs and other features (and might be adaptable to other new chips with similar SPI features)?

Paul, I am doing a major redesign of SdFat to use better caching and faster SD commands so large writes/reads will be much faster.

I plan to support SPI and 4-bit SDIO on various Cortex M chips. I also want to make SdFat RTOS friendly when using DMA.

I would love to have a better low level SPI library for each chip.

I need a way to restore the SPI speed and mode each time I access the SD. I need single byte read and write for sending commands, receiving status, and polling for busy.

I need fast block read and write routines. These could use a fifo or DMA.
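
To make the wish list above concrete, here is a rough sketch of what such a per-chip interface might look like. Every name in it (SpiSettings, SdSpiPort, FakeSpiPort, sdReadBlock) is made up for illustration and does not come from SdFat or any existing SPI library; the fake port only exists so the interface can be tried on a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not an existing SdFat or Arduino API.
struct SpiSettings {
  uint8_t rate;  // 0 = fastest, same idea as spiRate in the AVR code
  uint8_t mode;  // SPI mode 0..3
};

class SdSpiPort {
 public:
  virtual ~SdSpiPort() {}
  virtual void apply(const SpiSettings& s) = 0;         // restore speed and mode
  virtual uint8_t receive() = 0;                        // single byte in
  virtual void send(uint8_t b) = 0;                     // single byte out
  virtual void receive(uint8_t* buf, size_t n) = 0;     // block in (FIFO or DMA)
  virtual void send(const uint8_t* buf, size_t n) = 0;  // block out (FIFO or DMA)
};

// Host-side stand-in so the interface can be exercised without hardware.
class FakeSpiPort : public SdSpiPort {
 public:
  SpiSettings current = {0, 0};
  std::vector<uint8_t> sent;
  void apply(const SpiSettings& s) override { current = s; }
  uint8_t receive() override { return 0xFF; }  // pretend the card is idle
  void send(uint8_t b) override { sent.push_back(b); }
  void receive(uint8_t* buf, size_t n) override {
    for (size_t i = 0; i < n; i++) buf[i] = receive();
  }
  void send(const uint8_t* buf, size_t n) override {
    for (size_t i = 0; i < n; i++) send(buf[i]);
  }
};

// Every SD access re-applies the card's settings first, since another
// device on the bus may have changed them in the meantime.
void sdReadBlock(SdSpiPort& spi, const SpiSettings& sd, uint8_t* buf, size_t n) {
  assert(buf != nullptr);
  spi.apply(sd);
  spi.receive(buf, n);
}
```

A real port would replace FakeSpiPort with one class per chip, with the block calls backed by the FIFO or DMA.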

I am ready to start testing with some prototype SPI functions I have done but for some reason my Teensy 3.0 has not arrived in California yet.

Edit: I need the equivalent of these AVR functions.

//------------------------------------------------------------------------------
/**
 * Initialize hardware SPI
 * Set SCK rate to F_CPU/pow(2, 1 + spiRate) for spiRate [0,6]
 */
static void spiInit(uint8_t spiRate) {
  // See avr processor documentation
  SPCR = (1 << SPE) | (1 << MSTR) | (spiRate >> 1);
  SPSR = spiRate & 1 || spiRate == 6 ? 0 : 1 << SPI2X;
}
//------------------------------------------------------------------------------
/** SPI receive a byte */
static uint8_t spiRec() {
  SPDR = 0XFF;
  while (!(SPSR & (1 << SPIF)));
  return SPDR;
}
//------------------------------------------------------------------------------
/** SPI read data - only one call so force inline */
static inline __attribute__((always_inline))
  void spiRead(uint8_t* buf, uint16_t nbyte) {
  if (nbyte-- == 0) return;
  SPDR = 0XFF;
  for (uint16_t i = 0; i < nbyte; i++) {
    while (!(SPSR & (1 << SPIF)));
    uint8_t b = SPDR;
    SPDR = 0XFF;
    buf[i] = b;
  }
  while (!(SPSR & (1 << SPIF)));
  buf[nbyte] = SPDR;
}
//------------------------------------------------------------------------------
/** SPI send a byte */
static void spiSend(uint8_t b) {
  SPDR = b;
  while (!(SPSR & (1 << SPIF)));
}
//------------------------------------------------------------------------------
/** SPI send block - only one call so force inline */
static inline __attribute__((always_inline))
  void spiSendBlock(uint8_t token, const uint8_t* buf) {
  SPDR = token;
  for (uint16_t i = 0; i < 512; i++) {
    uint8_t b = buf[i];
    while (!(SPSR & (1 << SPIF)));
    SPDR = b;
  }
  while (!(SPSR & (1 << SPIF)));
}
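
As a cross-check of the "F_CPU/pow(2, 1 + spiRate)" comment in spiInit() above, the following host-side sketch recomputes the divider from the same two register expressions. It assumes the usual classic-AVR meaning of the bits (SPR1:SPR0 select /4, /16, /64, /128 and SPI2X doubles the clock); the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Divider selected by the SPR1:SPR0 bits (low two bits of SPCR).
static const uint16_t kSprDivider[4] = {4, 16, 64, 128};

// SCK divider that spiInit() ends up selecting for a given spiRate.
uint16_t avrSpiDivider(uint8_t spiRate) {
  assert(spiRate <= 6);
  uint8_t spr = spiRate >> 1;                     // value written to SPR1:SPR0
  bool spi2x = !((spiRate & 1) || spiRate == 6);  // same test as the SPSR line
  uint16_t div = kSprDivider[spr];
  return spi2x ? div / 2 : div;                   // SPI2X doubles the clock
}

// SCK frequency in Hz for a given CPU clock.
uint32_t avrSpiClockHz(uint32_t fCpu, uint8_t spiRate) {
  return fCpu / avrSpiDivider(spiRate);
}
```

For a 16 MHz AVR this gives 8 MHz at spiRate 0 and 125 kHz at spiRate 6.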

fat16lib:
Edit: I need the equivalent of these AVR functions.

I can do those, using the fifo for good speed.

Is there a version of your library which already uses these? Or some test code that calls them to do something simple, like read and print the MBR or Volume ID sector?

I need some sort of test code that I can compile and run.

This beta of SdFat uses the above functions: SdFatBeta20120825.zip http://code.google.com/p/beta-lib/downloads/list.

The functions are at the top of Sd2Card.cpp

It also has the following function to initialize AVR SPI pins:

/**
 * initialize SPI pins
 */
static void spiBegin() {
  pinMode(MISO, INPUT);
  pinMode(MOSI, OUTPUT);
  pinMode(SCK, OUTPUT);
  // SS must be in output mode even if it is not chip select
  pinMode(SS, OUTPUT);
  // set SS high - may be chip select for another SPI device
#if SET_SPI_SS_HIGH
  digitalWrite(SS, HIGH);
#endif  // SET_SPI_SS_HIGH
}

This version of SdFat does not have the new stuff to speed up large reads and writes. That involves changes to use multi-block SD commands.

I'm looking at the beta code now....

One minor but important point for compiling on 32 bit platforms is the use of packed structs. By default, the compiler will align 32 bit types to 4 byte boundaries on 32 bit processors. That's definitely not what you want in SdFatStructs.h. It's necessary to add "__attribute__((packed))" to each struct definition, so the compiler packs the struct as intended.

For example:

struct masterBootRecord {
           /** Code Area for master boot program. */
  uint8_t  codeArea[440];
           /** Optional Windows NT disk signature. May contain boot code. */
  uint32_t diskSignature;
           /** Usually zero but may be more boot code. */
  uint16_t usuallyZero;
           /** Partition tables. */
  part_t   part[4];
           /** First MBR signature byte. Must be 0X55 */
  uint8_t  mbrSig0;
           /** Second MBR signature byte. Must be 0XAA */
  uint8_t  mbrSig1;
} __attribute__((packed));
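
To see what the attribute changes, here is a small host-side comparison. The partition entry layout is the standard 16-byte MBR entry, but the struct and field names are illustrative stand-ins rather than copies from SdFatStructs.h, and firstSectorOf() is a made-up helper. It assumes GCC or Clang on a little-endian machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative stand-ins, not the real SdFat definitions.
struct PartPlain {
  uint8_t  boot;
  uint8_t  beginChs[3];
  uint8_t  type;
  uint8_t  endChs[3];
  uint32_t firstSector;
  uint32_t totalSectors;
};
struct PartPacked {
  uint8_t  boot;
  uint8_t  beginChs[3];
  uint8_t  type;
  uint8_t  endChs[3];
  uint32_t firstSector;
  uint32_t totalSectors;
} __attribute__((packed));

struct MbrPlain {
  uint8_t   codeArea[440];
  uint32_t  diskSignature;
  uint16_t  usuallyZero;
  PartPlain part[4];   // needs 4-byte alignment, so padding appears before it
  uint8_t   mbrSig0;
  uint8_t   mbrSig1;
};
struct MbrPacked {
  uint8_t    codeArea[440];
  uint32_t   diskSignature;
  uint16_t   usuallyZero;
  PartPacked part[4];  // sits at byte 446, as on disk
  uint8_t    mbrSig0;
  uint8_t    mbrSig1;
} __attribute__((packed));

static_assert(sizeof(MbrPacked) == 512, "packed MBR must match a 512-byte sector");
static_assert(offsetof(MbrPacked, part) == 446, "partition table starts at byte 446");

// Reads the first-sector field of partition i from a raw sector.
// Assumes a little-endian CPU, matching the on-disk byte order.
uint32_t firstSectorOf(const uint8_t* sector, int i) {
  assert(i >= 0 && i < 4);
  MbrPacked mbr;
  memcpy(&mbr, sector, sizeof(mbr));
  return mbr.part[i].firstSector;
}
```

On a compiler that aligns 32 bit types to 4 bytes, the plain version puts part[] at offset 448 and grows past 512 bytes, so every field after the padding is read from the wrong place. On AVR nothing needs more than byte alignment, which is why the code worked there without the attribute.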

The 32 bit compiler does not like many things in your iostream headers. :frowning:

I'll work on this more later today or tomorrow. For now, I need to focus on getting the rest of the kickstarter rewards shipped.

Sorry, I forgot to mention the packed attribute for FAT structs.

I replaced the name fpos_t with FatPos_t

I had problems with types like uint16_t in extractors and inserters so I changed them to C types int, long...
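
The thread does not say what the compiler errors were. One likely cause (an assumption on my part) is that the fixed-width typedefs map to different built-in types on each platform: on AVR int16_t is int and int32_t is long, while on the ARM toolchain int16_t is short and int32_t is long, which leaves plain int without an exact overload. The sketch below uses a made-up MiniOut class, not the SdFat ostream, to show that overloading on the built-in types resolves cleanly for literals and for the fixed-width typedefs alike.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mini-inserter; it only records which overload was selected.
enum Picked { kInt, kUInt, kLong, kULong };

struct MiniOut {
  Picked last = kInt;
  MiniOut& operator<<(int)           { last = kInt;   return *this; }
  MiniOut& operator<<(unsigned int)  { last = kUInt;  return *this; }
  MiniOut& operator<<(long)          { last = kLong;  return *this; }
  MiniOut& operator<<(unsigned long) { last = kULong; return *this; }
};

// Reports which overload a value of type T ends up using.
template <typename T>
Picked pick(T value) {
  MiniOut out;
  out << value;
  return out.last;
}

// Quick self-check that plain literals resolve without ambiguity.
void checkLiterals() {
  assert(pick(5) == kInt);
  assert(pick(5UL) == kULong);
}
```

Types narrower than int, such as int16_t on a 32 bit target, reach the int overload through integral promotion, so they need no overload of their own.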

The diff file that makes SdFat compile is:

diff -rb ArduinoOldVer/SdFat/SdBaseFile.cpp Arduino/libraries/SdFat/SdBaseFile.cpp
335c335
< void SdBaseFile::getpos(fpos_t* pos) {
---
> void SdBaseFile::getpos(FatPos_t* pos) {
1110c1110
<   fpos_t pos;
---
>   FatPos_t pos;
1794c1794
< void SdBaseFile::setpos(fpos_t* pos) {
---
> void SdBaseFile::setpos(FatPos_t* pos) {
diff -rb ArduinoOldVer/SdFat/SdBaseFile.h Arduino/libraries/SdFat/SdBaseFile.h
32c32
<  * \struct fpos_t
---
>  * \struct FatPos_t
36c36
< struct fpos_t {
---
> struct FatPos_t {
41c41
<   fpos_t() : position(0), cluster(0) {}
---
>   FatPos_t() : position(0), cluster(0) {}
201c201
<   void getpos(fpos_t* pos);
---
>   void getpos(FatPos_t* pos);
205c205
<   void setpos(fpos_t* pos);
---
>   void setpos(FatPos_t* pos);
diff -rb ArduinoOldVer/SdFat/SdStream.h Arduino/libraries/SdFat/SdStream.h
122c122
<   void getpos(fpos_t* pos) {SdBaseFile::getpos(pos);}
---
>   void getpos(FatPos_t* pos) {SdBaseFile::getpos(pos);}
138c138
<   void setpos(fpos_t* pos) {SdBaseFile::setpos(pos);}
---
>   void setpos(FatPos_t* pos) {SdBaseFile::setpos(pos);}
185c185
<   void getpos(fpos_t* pos) {SdBaseFile::getpos(pos);}
---
>   void getpos(FatPos_t* pos) {SdBaseFile::getpos(pos);}
193c193
<   void setpos(fpos_t* pos) {SdBaseFile::setpos(pos);}
---
>   void setpos(FatPos_t* pos) {SdBaseFile::setpos(pos);}
diff -rb ArduinoOldVer/SdFat/bufstream.h Arduino/libraries/SdFat/bufstream.h
61c61
<   void getpos(fpos_t *pos) {
---
>   void getpos(FatPos_t *pos) {
72c72
<   void setpos(fpos_t *pos) {
---
>   void setpos(FatPos_t *pos) {
diff -rb ArduinoOldVer/SdFat/istream.cpp Arduino/libraries/SdFat/istream.cpp
70c70
<   fpos_t pos;
---
>   FatPos_t pos;
143c143
<   fpos_t endPos;
---
>   FatPos_t endPos;
231c231
<   fpos_t pos;
---
>   FatPos_t pos;
264c264
<   fpos_t endPos;
---
>   FatPos_t endPos;
323c323
<   fpos_t pos;
---
>   FatPos_t pos;
384c384
<   fpos_t pos;
---
>   FatPos_t pos;
407c407
<   fpos_t pos;
---
>   FatPos_t pos;
diff -rb ArduinoOldVer/SdFat/istream.h Arduino/libraries/SdFat/istream.h
138a139
> 
144c145
<   istream &operator>>(int16_t& arg) {
---
>   istream &operator>>(int& arg) {
153c154
<   istream &operator>>(uint16_t& arg) {
---
>   istream &operator>>(unsigned int& arg) {
162c163
<   istream &operator>>(int32_t& arg) {
---
>   istream &operator>>(long& arg) {
171c172
<   istream &operator>>(uint32_t& arg) {
---
>   istream &operator>>(unsigned long& arg) {
256c257
<   int16_t getch(fpos_t* pos) {
---
>   int16_t getch(FatPos_t* pos) {
264c265
<   virtual void getpos(fpos_t* pos) = 0;
---
>   virtual void getpos(FatPos_t* pos) = 0;
271c272
<   virtual void setpos(fpos_t* pos) = 0;
---
>   virtual void setpos(FatPos_t* pos) = 0;
diff -rb ArduinoOldVer/SdFat/ostream.cpp Arduino/libraries/SdFat/ostream.cpp
158a159,160
>   putStr(str);
>   /*
164a167
>   */
diff -rb ArduinoOldVer/SdFat/ostream.h Arduino/libraries/SdFat/ostream.h
137c137
<   ostream &operator<< (int16_t arg) {
---
>   ostream &operator<< (int arg) {
145c145
<   ostream &operator<< (uint16_t arg) {
---
>   ostream &operator<< (unsigned int arg) {
153c153
<   ostream &operator<< (int32_t arg) {
---
>   ostream &operator<< (long arg) {
161c161
<   ostream &operator<< (uint32_t arg) {
---
>   ostream &operator<< (unsigned long arg) {

I suggest you try the bench example. Remove this line to get it to compile:

  cout << pstr("Free RAM: ") << FreeRam() << endl;

retrolefty:
Well lets just not tell them. :smiley:

Well, given that Kickstarter has now changed the rules, Teensy 3.0 as it was funded would not have been allowed under the new rules, so it may not make any difference whether you tell them or not. Paul himself has spoken out against the new rules, and I imagine hardware projects like the Teensy will just go elsewhere.

Kickstarter can go back to being a tip jar to fund art projects, as it evidently wants to. As I said on their blog, I wish them a good life, but I have stopped looking at KS for interesting tech projects to fund. The only thing I do on KS now is check on updates to the 4 tech products I recently backed (RadioBlock, Teensy, Digispark, and JumpShot) as well as the one non-tech product (Deck of Extraordinary Voyages).

I believe the recent rule change was motivated mainly to protect Kickstarter from liability, rather than to protect backers from failed projects. I haven't seen anything really conclusive about the Hanfree lawsuit, but the plaintiff publicly said he asked the court to rule the nature of the transaction was an ordinary sale. I'm an engineer, not an attorney, so I'm not going to speculate what that might mean.

It could also be purely coincidental that the "not a store" rule change came about just as the Hanfree lawsuit was making its way through the legal system.

Here's the failed Hanfree project, where you can read all the ugly details in the comments.

http://www.kickstarter.com/projects/831303939/hanfree-ipad-accessory-use-the-ipad-hands-free/comments

It's the legal analog of Brooks' law: "Adding lawyers to an already bad situation is only going to make things worse."

Just wanted to let others know of my progress so far on using my new Teensy 3.0 board. I had to order a micro USB cable and it arrived yesterday. I had previously loaded the modified IDE and the Teensy driver thingee, so I just attached the Teensy to the cable and plugged it in. The PC seemed to be happy with the attachment and an LED on the Teensy was blinking away, so I assumed they ship it with the blink sketch loaded. I opened the IDE, selected the Teensy 3.0 board, loaded the minimum sketch example and it uploaded. I was a little surprised when the Teensy loader pop-up window sprung up, as I had no idea how the Teensy works with the Arduino IDE, but the loader has an option to follow a script log and it all seemed to be working correctly. The IDE results window says something about the compile size being zero bytes, but the Teensy loader pop-up log shows all the correct size info and a lot of other stuff. Anyway, the Teensy board did stop blinking its LED, so everything seemed to upload and run OK. I then loaded the blink sketch example in the IDE and hit upload, and everything worked again and the board did indeed start blinking its LED again.

So I guess the report is that the Teensy 3.0 seems to work right out of the box as designed, even for this software-installing-challenged kind of guy that I am. I still haven't a clue what I might do with this board yet. And Paul seems to be releasing a new IDE version every other day to add some new Arduino library update, so it seems kind of silly to rush into anything. But it's a great little product with a lot of promise ahead for it, I think. I kind of hope a Teensy forum might start up to help support this product, if one is not already around somewhere?

I'm still getting over the shock of how....well teensy this thing is, so small.

Lefty

@Lefty - glad it's working. I fixed the size reporting in beta4. This evening I'm going to publish beta5, with a master-only port of Wire (slave mode to be filled in next week), and Serial1, Serial2, and Serial3 working.

@fat16lib - I applied the patch. Now the code compiles. :slight_smile: It's not quite working, but that's probably a bug on my end. I'm investigating now. Will try to get those 24 Mbit/sec optimized routines written for you.

Lefty

I'm still getting over the shock of how....well teensy this thing is, so small.

I know how you feel. When I got my first Teensy in my hands I was also going like "this small?" And indeed it all worked great out of the box and the blink sample is preloaded (this is part of Paul testing the boards).

I would really advise Arduino to look at what Paul is doing (more often). For instance: even though the Teensy 2 uses the same chip as the Leonardo, it does not have the 2 com ports issue, which is very distracting. The Due has 2 com ports. I guess if they had used HalfKay (the window that popped up during upload) like Teensy, they would not need 2 com ports.

Best regards
Jantje

My Teensy is on its way. I got a friend to order from Ireland where it has arrived, looking forward to it finishing the final leg of its journey to Dubai next week.

Duane B.

I've been working with the SdFat beta for the last hour, digging into why it's not working. Turns out __attribute__((packed)) bit me yet again. I forgot I'd started from a clean copy to apply the patch.

Paul,

I mailed you a version with mods for faster reads/writes. It compiles for Teensy 3.0 but is not tested on Teensy 3.0 since I have not received the replacement for the missing Teensy 3.0 that Robin sent yesterday.

The results for an AVR Mega with 4096 byte writes and reads are promising.

Type any character to start
Free RAM: 2666
Type is FAT16
File size 5MB
Buffer size 4096 bytes
Starting write test. Please wait up to a minute
Write 536.40 KB/sec
Maximum latency: 10336 usec, Minimum Latency: 6908 usec, Avg Latency: 7592 usec

Starting read test. Please wait up to a minute
Read 595.04 KB/sec
Maximum latency: 7984 usec, Minimum Latency: 6804 usec, Avg Latency: 6877 usec

Here's my first attempt at 24 Mbit/sec speed in SdFat. It still needs work (block send isn't working at all), but even this makes a pretty substantial speedup.

//------------------------------------------------------------------------------
/**
 * Initialize hardware SPI
 * Set SCK rate to F_CPU/pow(2, 1 + spiRate) for spiRate [0,6]
 */
static void spiInit(uint8_t spiRate) {
  // See avr processor documentation
#if defined(USE_NATIVE_MK20DX128) && 1
  SIM_SCGC6 |= SIM_SCGC6_SPI0;
  SPI0_MCR = SPI_MCR_MDIS | SPI_MCR_HALT;
  // spiRate = 0 : 24 or 12 Mbit/sec
  // spiRate = 1 : 12 or 6 Mbit/sec
  // spiRate = 2 : 6 or 3 Mbit/sec
  // spiRate = 3 : 3 or 1.5 Mbit/sec
  // spiRate = 4 : 1.5 or 0.75 Mbit/sec
  // spiRate = 5 : 250 kbit/sec
  // spiRate = 6 : 125 kbit/sec
  uint32_t ctar;
  switch (spiRate) {
   case 0: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_DBR | SPI_CTAR_BR(0); break;
   case 1: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_BR(0); break;
   case 2: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_BR(1); break;
   case 3: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_BR(2); break;
   case 4: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_BR(3); break;
#if F_BUS == 48000000
   case 5: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_PBR(1) | SPI_CTAR_BR(5); break;
   default: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_PBR(1) | SPI_CTAR_BR(6);
#elif F_BUS == 24000000
   case 5: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_PBR(1) | SPI_CTAR_BR(4); break;
   default: ctar = SPI_CTAR_FMSZ(7) | SPI_CTAR_PBR(1) | SPI_CTAR_BR(5);
#else
#error "MK20DX128 bus frequency must be 48 or 24 MHz"
#endif
  }
  SPI0_CTAR0 = ctar;
  SPI0_MCR = SPI_MCR_MSTR;
  CORE_PIN11_CONFIG = PORT_PCR_DSE | PORT_PCR_MUX(2);
  CORE_PIN12_CONFIG = PORT_PCR_MUX(2);
  CORE_PIN13_CONFIG = PORT_PCR_DSE | PORT_PCR_MUX(2);
#else
  SPCR = (1 << SPE) | (1 << MSTR) | (spiRate >> 1);
  SPSR = spiRate & 1 || spiRate == 6 ? 0 : 1 << SPI2X;
#endif
}
//------------------------------------------------------------------------------
/** SPI receive a byte */
static inline __attribute__((always_inline))
  uint8_t spiRec() {
#if defined(USE_NATIVE_MK20DX128) && 1
  SPI0_MCR = SPI_MCR_MSTR | SPI_MCR_CLR_RXF;
  SPI0_SR = SPI_SR_TCF;
  SPI0_PUSHR = 0xFF;
  while (!(SPI0_SR & SPI_SR_TCF)) ;
  return SPI0_POPR;
#else
  SPDR = 0XFF;
  while (!(SPSR & (1 << SPIF)));
  return SPDR;
#endif
}
//------------------------------------------------------------------------------
/** SPI read data - only one call so force inline */
static inline __attribute__((always_inline))
  void spiRead(uint8_t* buf, uint16_t nbyte) {
#if defined(USE_NATIVE_MK20DX128) && 1
  SPI0_MCR = SPI_MCR_MSTR | SPI_MCR_CLR_RXF;
  uint32_t status, txcount=0, rxcount=0;
  while (txcount < nbyte) {
    status = SPI0_SR;
    if (((status >> 12) & 15) < 4) {
      SPI0_PUSHR = 0xFF;
      txcount++;
    }
    if (((status >> 4) & 15) > 0) {
      *buf++ = SPI0_POPR;
      rxcount++;
    }
  }
  while (rxcount < nbyte) {
    status = SPI0_SR;  // re-read the FIFO counters on every pass
    if (((status >> 4) & 15) > 0) {
      *buf++ = SPI0_POPR;
      rxcount++;
    }
  }
#else
  if (nbyte-- == 0) return;
  SPDR = 0XFF;
  for (uint16_t i = 0; i < nbyte; i++) {
    while (!(SPSR & (1 << SPIF)));
    buf[i] = SPDR;
    SPDR = 0XFF;
  }
  while (!(SPSR & (1 << SPIF)));
  buf[nbyte] = SPDR;
#endif
}
//------------------------------------------------------------------------------
/** SPI send a byte */
static inline __attribute__((always_inline))
  void spiSend(uint8_t b) {
#if defined(USE_NATIVE_MK20DX128) && 1
  SPI0_SR = SPI_SR_TCF;
  SPI0_PUSHR = b;
  while (!(SPI0_SR & SPI_SR_TCF)) ;
#else
  SPDR = b;
  while (!(SPSR & (1 << SPIF)));
#endif
}
//------------------------------------------------------------------------------
/** SPI send block - only one call so force inline */
static inline __attribute__((always_inline))
  void spiSendBlock(uint8_t token, const uint8_t* buf) {
#if defined(USE_NATIVE_MK20DX128) && 0  // This does not work... why??
  uint32_t status, txcount=0;
  SPI0_SR = SPI_SR_TCF;
  SPI0_PUSHR = token;
  while (txcount < 512) {
    status = SPI0_SR;
    if (((status >> 12) & 15) < 4) {
      SPI0_PUSHR = *buf++;
      txcount++;
    }
  }
  while (1) {
    status = SPI0_SR;
    if (((status >> 12) & 15) == 0) break;
  }
  while (!(SPI0_SR & SPI_SR_TCF)) ;
#else
  SPDR = token;
  for (uint16_t i = 0; i < 512; i += 2) {
    while (!(SPSR & (1 << SPIF)));
    SPDR = buf[i];
    while (!(SPSR & (1 << SPIF)));
    SPDR = buf[i + 1];
  }
  while (!(SPSR & (1 << SPIF)));
#endif
}
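
For checking the CTAR values in spiInit() above, here is a small host-side calculator. The divider tables are as I recall them from the K20 reference manual (PBR selects /2, /3, /5, /7; BR selects /2, /4, /6, /8, /16, /32 and so on), so treat them as an assumption and verify against the manual. dspiClockHz is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>

// Divider tables recalled from the K20 reference manual (assumption: verify).
static const uint32_t kPbrPrescaler[4] = {2, 3, 5, 7};
static const uint32_t kBrScaler[16] = {2, 4, 6, 8, 16, 32, 64, 128, 256, 512,
                                       1024, 2048, 4096, 8192, 16384, 32768};

// SCK = (F_BUS / prescaler) * (1 + DBR) / scaler
uint32_t dspiClockHz(uint32_t fBus, bool dbr, uint8_t pbr, uint8_t br) {
  assert(pbr < 4 && br < 16);
  uint64_t numerator = (uint64_t)fBus * (dbr ? 2u : 1u);
  return (uint32_t)(numerator / (kPbrPrescaler[pbr] * kBrScaler[br]));
}
```

With F_BUS at 48 MHz, cases 0 to 2 come out at 24, 12 and 6 MHz, matching the comments. If the tables are right, BR(2) and BR(3) give 4 MHz and 3 MHz, and the two PBR(1) cases give 500 kHz and 250 kHz, so the comments for spiRate 3 and up may be worth double-checking against the register values.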

I received my Teensy 3.0 yesterday, it is awesome! I downloaded the latest beta and got the board going on first try! I have used Teensy 2.0 before so maybe that has a bit to do with it working so quickly but, I am impressed!

So far, I have just changed and reloaded the blink code a few times, and found the reload of a sketch is blazing fast!

Here is the latest link to the beta code:

http://www.pjrc.com/teensy/beta/arduino-1.0.1-teensy3-beta5-linux32.tar.gz

http://www.pjrc.com/teensy/beta/arduino-1.0.1-teensy3-beta5-linux64.tar.gz

http://www.pjrc.com/teensy/beta/arduino-1.0.1-teensy3-beta5-macos.zip

http://www.pjrc.com/teensy/beta/arduino-1.0.1-teensy3-beta5-windows.zip

Paul should keep a map of how far Teensy 3.0 has spread, can anyone beat Dubai ?

Duane B

I'm glad Teensy 3 is working out well, especially the fast upload process.

I actually put months of work into optimizing the upload speed. I spent quite a bit of time analyzing the Teensy 2.0 upload process. The 2.0 bootloader is quite simple (it needs to be, since it fits entirely within 512 bytes of code). It receives a chunk of data, then programs it to the flash, then sends an ACK. That works well, but it doesn't achieve the best speed. First, the entire chip needs to be erased when the first write request shows up. Erasing the chip earlier isn't an option, because users expect to be able to go into bootloader mode, but if nothing is transmitted the chip is expected to remain unmodified. So there's a lag while erasing, and then while writing the first chunk. Then when the ACK is sent, the software on the PC sends the next chunk. Because it's a user space program, it's subjected to ordinary scheduling delays, which can average several milliseconds before the next chunk is sent. Over the span of writing many chunks, those USB latencies and operating system scheduling delays add up. More time is spent waiting than writing.

In 3.0, I implemented quite a lot of buffering in RAM. So while the chip is erasing, several chunks of data are buffered into RAM. Once programming begins, there's plenty of data buffered to keep the speed constrained only by the flash writing. The operating system's userspace scheduling delays result in bursts of incoming data to refill the buffers. With enough on-chip buffering, the flash writing never stalls waiting for more data. So the upload speed runs at very close to the maximum possible speed imposed only by the flash itself. I used the chip's fast DMA engine to copy from the buffers to the flash controller, to minimize the non-writing time. But overall, buffering in the chip's RAM is the key to avoiding delay.
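
The difference between the two schemes can be sketched with a toy timing model. This is not the bootloader's code: the struct, the function, and every number fed into it are hypothetical, and the model simply assumes the PC pays its scheduling delay only when it had to wait for a free buffer slot, and otherwise streams chunks back to back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All values are made up for illustration; nothing here was measured.
struct UploadParams {
  size_t   chunks;       // number of data chunks in the image
  uint64_t eraseUs;      // chip erase time, started by the first chunk
  uint64_t writeUs;      // time to program one chunk into flash
  uint64_t hostDelayUs;  // PC scheduling delay after it has been kept waiting
  uint64_t xferUs;       // USB transfer time for one chunk
};

// Total upload time when the device can hold `slots` chunks in RAM.
// slots == 1 models ACK-per-chunk; larger values model RAM buffering.
uint64_t uploadUs(const UploadParams& p, size_t slots) {
  assert(slots >= 1);
  std::vector<uint64_t> done(p.chunks);  // when each chunk finishes programming
  uint64_t lastArrival = 0;
  for (size_t i = 0; i < p.chunks; i++) {
    // Chunk i needs the slot that chunk (i - slots) occupied.
    uint64_t slotFree = (i >= slots) ? done[i - slots] : 0;
    uint64_t start = lastArrival;  // streaming: send right after the last one
    if (slotFree > lastArrival) start = slotFree + p.hostDelayUs;  // had to wait
    uint64_t arrival = start + p.xferUs;
    // Flash is busy until the previous chunk is done; chunk 0 pays for the erase.
    uint64_t flashFree = (i == 0) ? arrival + p.eraseUs : done[i - 1];
    done[i] = std::max(arrival, flashFree) + p.writeUs;
    lastArrival = arrival;
  }
  return p.chunks ? done.back() : 0;
}
```

With 100 chunks, a 100 ms erase, 2 ms per flash write, a 3 ms host delay and 0.1 ms per transfer, the one-slot case takes 607 ms while eight slots take about 300 ms, which is essentially erase time plus pure flash writing.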

There's another speedup I've designed, which isn't currently in use on Teensy 3.0. While the loading speed is nearly as fast as the flash memory allows, there is about a 1 second delay (longer on some windows machines) from the instant the compiler produces the .hex file to the upload actually beginning. When Teensy reboots into bootloader mode, the USB disconnects, and then it reappears as the new device which accepts the download. Leonardo does the same. Each operating system does USB enumeration slightly differently, but there are delays associated with USB reset signals and other USB stuff, which add up to about 1 second or more.

My hope is to eliminate all that delay by "rebooting" into the bootloader code without disconnecting the USB. The support for it is all in place in Teensy 3.0, so I hope to enable this with a software update sometime next year (2013). But this is a lot of difficult software work to implement successfully on the computer side. The loader needs to deal with requesting the "reboot" and then checking to see if it was successful. There are a number of complications with basically switching a USB device at runtime without going through the proper enumeration process. If it doesn't work, of course the fallback is to disconnect. Success or failure needs to be determined in a matter of milliseconds to be of any value. All that needs to be done times 3, because each operating system has dramatically different low-level USB APIs. But if it is successful, the opportunity for speedup is pretty incredible. That 1+ second USB enumeration is now the single largest delay (assuming your machine is fast and the build-dependency patch is avoiding full recompile). I want to turn that last 1 second delay into only single-digit milliseconds!

Yes, I'm obsessed with upload speed optimization.......