Measuring and Improving Serial Performance / Serial.flush() Blocks Forever on Nano 33 BLE Sense

Let me explain my application so my goals make more sense (:

I'm trying to read the data from all the sensors on the Nano 33 BLE Sense and transmit it in real time, as fast as possible.

I want to set each sensor's sample rate to its maximum:

  • 476 Sps for accelerometer & gyroscope xyz.
  • 80 Sps for the magnetometer xyz.
  • 12.5 Sps for the humidity and temperature.
  • 16000 Sps for the microphone.

Every 62.5 us (one microphone sample period, 1/16000 s) I want to Serial.write a buffer consisting of the following concatenation:

  • buffer = microseconds + ax + ay + az + gx + gy + gz + mx + my + mz + humidity + temp + microphone.
  • sizeof(buffer) = 54 bytes.
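
To make the 54-byte figure concrete, here is a minimal sketch of one way the buffer could be laid out as a packed struct. The names `SensorFrame` and `serializeFrame` are placeholders of mine, not from any library, and the field order simply follows the list above. Without packing, the compiler would typically pad this struct to 56 bytes because of the 8-byte timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical frame layout; field names and order are assumptions.
#pragma pack(push, 1)        // no padding, so the struct is exactly the wire format
struct SensorFrame {
    uint64_t microseconds;   // 8 bytes: timestamp
    float    accel[3];       // 12 bytes: ax, ay, az
    float    gyro[3];        // 12 bytes: gx, gy, gz
    float    mag[3];         // 12 bytes: mx, my, mz
    float    humidity;       // 4 bytes
    float    temperature;    // 4 bytes
    int16_t  microphone;     // 2 bytes: one PCM sample
};
#pragma pack(pop)

static_assert(sizeof(SensorFrame) == 54, "frame must be exactly 54 bytes");

// Copy one frame into a raw byte buffer and return the number of bytes written.
inline size_t serializeFrame(const SensorFrame& frame, uint8_t* out) {
    assert(out != nullptr);
    std::memcpy(out, &frame, sizeof(frame));
    return sizeof(frame);
}
```

On the board, the resulting bytes could then be handed to `Serial.write(out, sizeof(SensorFrame))`. The byte order on the wire is whatever the MCU uses natively, so the receiving side has to assume the same layout.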

The goal is to achieve a USB data transfer time (call this usbDTTime) less than 62.5 us.
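
As a rough sanity check on that goal, the arithmetic below works out the sustained throughput it implies. The constant and function names are illustrative. For reference, the nRF52840's USB peripheral is full-speed, with a 12 Mbit/s signalling rate, and protocol overhead means the usable payload rate is lower than that.

```cpp
#include <cassert>

// Values taken from the post; names are illustrative.
constexpr double kMicSampleRateHz  = 16000.0;  // fastest sensor sets the frame rate
constexpr double kBytesPerTransfer = 54.0;

// Time available per frame, in microseconds.
constexpr double framePeriodUs() { return 1e6 / kMicSampleRateHz; }

// Sustained payload throughput needed to send one frame per period.
constexpr double requiredBytesPerSecond() { return kBytesPerTransfer * kMicSampleRateHz; }
constexpr double requiredMbitPerSecond()  { return requiredBytesPerSecond() * 8.0 / 1e6; }

// Time to move one frame at a given sustained payload rate, in microseconds.
inline double transferTimeUs(double bytesPerSecond) {
    assert(bytesPerSecond > 0.0);
    return kBytesPerTransfer * 1e6 / bytesPerSecond;
}
```

So the target is 864000 bytes/s, or about 6.9 Mbit/s of payload, which is more than half of the raw full-speed USB signalling rate.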

Here is the pseudo code:

#include "myLibrary.h" //Contains temp data and current data buffers, 54 bytes each.

unsigned long startClock;

void setup() {
    while (!Serial);
    mySetup();
}

void loop() {
    //Non-blocking method will Serial.write all of the current data buffer...
    //54*8 data bits + 2*54 framing bits = 540 bits, counting a start and a stop bit per byte
    //(UART-style estimate; native USB CDC sends packets and adds no start/stop bits):
    mySerialBufferWrite();
    //Start timer:
    startClock = micros();
    //Fill temp data buffer with new sensor readings...
    //should take much less than 62.5 us to complete:
    fillTempDataBuffer();
    //Wait on async mySerialBufferWrite() method to rejoin:
    mySerialBufferWriteFlush(); //We should still have a few us left to do this last thing:
    //Move data from temp buffer to current buffer:
    moveTempToCurrentBuffer();
    //In case we have a little time left over, wait until the full 62.5 us is up:
    //micros() returns whole microseconds, so 62 approximates the 62.5 us period:
    while (micros() - startClock < 62);
}
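
The temp/current buffer handoff in the pseudo code could avoid copying 54 bytes every cycle by swapping which buffer is active instead. This is only a sketch of that idea under my own assumptions: `PingPongBuffer` and its members are hypothetical names, and it does nothing to guard against swapping while a transfer is still in flight.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kFrameBytes = 54;

// Two fixed buffers: one is being transmitted while the other is being filled.
struct PingPongBuffer {
    uint8_t storage[2][kFrameBytes] = {};
    uint8_t active = 0;  // index of the buffer currently being transmitted

    uint8_t* current() { return storage[active]; }      // hand this to Serial.write
    uint8_t* temp()    { return storage[active ^ 1]; }  // fill this with new readings

    // Exchange roles in constant time instead of copying the data.
    void swap() {
        assert(active < 2);
        active ^= 1;
    }
};
```

In the loop above, `moveTempToCurrentBuffer()` would then reduce to a single `swap()` call, made only after the flush has confirmed the previous write is complete.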

My problem is that when I run this speed-test sketch with the Nano 33 connected to my PC (micro-USB to a USB-A 3.0 port)...

//Preliminary Notes:
//* Need to calculate the total # of bytes we will need per data transfer:
//  * BYTES_PER_TRANSFER = MICROSECONDS_BYTES + ACCELEROMETER_XYZ_BYTES + GYROSCOPE_XYZ_BYTES + MAGNETOMETER_XYZ_BYTES + HUMIDITY_BYTES + TEMPERATURE_BYTES + MICROPHONE_BYTES = 8 + 3*12 + 2*4 + 2 = 54 bytes/transfer.
//    * MICROSECONDS_BYTES = sizeof(uint64_t) = 8.
//    * ACCELEROMETER_XYZ_BYTES = GYROSCOPE_XYZ_BYTES = MAGNETOMETER_XYZ_BYTES = 3*sizeof(float) = 3*4 = 12.
//    * HUMIDITY_BYTES = TEMPERATURE_BYTES = sizeof(float) = 4.
//    * MICROPHONE_BYTES = sizeof(short) = 2.

#include "mbed.h"
#include "nrf_delay.h"
#include "nrf_gpio.h"

#define OUTPUT_PIN NRF_GPIO_PIN_MAP(1,11)

const byte BYTE_TO_SEND = 170; //b'10101010'.
const int BYTES_PER_TRANSFER = 54; //Refer to preliminary notes for calculation description.

unsigned long startClock;
unsigned long endClock;

byte buffer[BYTES_PER_TRANSFER];

void setup(){
  Serial.begin(1000000); //The baud rate has no effect over native USB CDC on the Nano 33 BLE Sense.
  while (!Serial); //Wait for serial port to connect. Needed for native USB CDC on Nano 33 BLE Sense.

  nrf_gpio_cfg_output(OUTPUT_PIN); //Configure pin as digital output for measuring time of events on scope.
  
  memset(buffer, BYTE_TO_SEND, sizeof(buffer));

  //Measured 300 ns on scope:
  togglePinTwice();
  delay(10);

  //Measured 9.2 us on scope:
  togglePinTwiceAndCallMicrosOnce();
  delay(10);

  //Measured 17.6 us on scope:
  togglePinTwiceAndCallMicrosTwice();
  delay(10);

  //Measured 88 us on scope and 78 us from micros difference:
  togglePinTwiceCallMicrosTwiceAndWriteBytes(); //This result makes sense bc 88 - 9.2 = 78.8 us (which is basically 78 us).
  delay(10);

  //Measured 68 us on scope:
  togglePinTwiceAndWriteBytes();
  
  //startClock and endClock were last set in togglePinTwiceCallMicrosTwiceAndWriteBytes():
  unsigned long elapsedTime = endClock - startClock;
  Serial.println("");
  Serial.print(elapsedTime);
  Serial.println(" us");
}
  
void loop(){}

void togglePinTwice(){
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

void togglePinTwiceAndCallMicrosOnce() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  startClock = micros();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

void togglePinTwiceAndCallMicrosTwice() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  startClock = micros();
  endClock = micros();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

void togglePinTwiceCallMicrosTwiceAndWriteBytes() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  startClock = micros();
  Serial.write(buffer, sizeof(buffer)); //36047-37163 bytes/s on portenta h7 Mac.
  Serial.flush();
  endClock = micros();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

void togglePinTwiceAndWriteBytes() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  Serial.write(buffer, sizeof(buffer));
  Serial.flush();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

...I measure on my oscilloscope that it takes 68 us to fully transfer the 54-byte buffer, 5.5 us longer than my 62.5 us budget ):
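
Breaking down the scope readings quoted in the sketch comments gives a rough picture of where the time goes. The snippet below only does that subtraction; the names are illustrative and the inputs are the measured values from above. One thing it suggests is that each micros() call costs roughly 8 to 9 us on this board, which is a sizeable slice of a 62.5 us budget.

```cpp
#include <cassert>

// Scope readings from the sketch comments, in microseconds.
constexpr double kTogglesOnlyUs      = 0.3;   // togglePinTwice()
constexpr double kTogglesOneMicrosUs = 9.2;   // togglePinTwiceAndCallMicrosOnce()
constexpr double kTogglesTwoMicrosUs = 17.6;  // togglePinTwiceAndCallMicrosTwice()
constexpr double kTogglesWriteUs     = 68.0;  // togglePinTwiceAndWriteBytes()
constexpr double kBudgetUs           = 62.5;
constexpr double kBytesPerTransfer   = 54.0;

// Approximate cost of each micros() call, isolated by subtraction.
constexpr double firstMicrosCostUs()  { return kTogglesOneMicrosUs - kTogglesOnlyUs; }
constexpr double secondMicrosCostUs() { return kTogglesTwoMicrosUs - kTogglesOneMicrosUs; }

// Serial.write + Serial.flush alone, and how far that overshoots the budget.
constexpr double writeAndFlushUs() { return kTogglesWriteUs - kTogglesOnlyUs; }
constexpr double overBudgetUs()    { return kTogglesWriteUs - kBudgetUs; }

// Throughput implied by a single write-and-flush of one frame.
inline double achievedBytesPerSecond(double transferUs) {
    assert(transferUs > 0.0);
    return kBytesPerTransfer * 1e6 / transferUs;
}
```

At 68 us per frame this works out to roughly 794000 bytes/s, against the 864000 bytes/s needed to keep up with the microphone.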

Any ideas how to speed this up?