Micros() is taking over 500 cycles to execute on Nano 33 BLE Sense

I worked pretty hard to make a simple sketch proving how slow the micros() method is on the Nano 33 BLE Sense:

#include "mbed.h"
#include "nrf_delay.h"
#include "nrf_gpio.h"

#define OUTPUT_PIN NRF_GPIO_PIN_MAP(1,11)

unsigned long startClock;
unsigned long stopClock;
unsigned long arithmeticMinuend = 4294967295;
unsigned long arithmeticSubtrahend = 2147483648;
unsigned long arithmeticDifference = 0;
unsigned long timeDifference1 = 0;
unsigned long timeDifference2 = 0;

//Assume 1 to 4 clock cycles (15.625 to 62.5 ns) is required for every...
//* call of micros();
//* subtraction of two unsigned longs.
//Therefore we should not be able to measure the time to execute either task by using the micros() - startClock technique, as...
//* micros() - startClock = 0 in either case.

void setup() {
  Serial.begin(115200);
  while (!Serial);

  nrf_gpio_cfg_output(OUTPUT_PIN);

  startClock = micros();
  stopClock = micros();
}

void loop() {
  timeDifference1 = microsecondsToSubtract2LongsOnce();
  timeDifference2 = microsecondsToSubtract2LongsTwice();
  Serial.print("timeDifference1 = ");
  Serial.print(timeDifference1);
  Serial.print(" us.\ntimeDifference2 = ");
  Serial.print(timeDifference2);
  Serial.println(" us.");
  
  microsecondsToSubtract2LongsOnceMeasuredWithScope();
  delay(100);
  microsecondsToSubtract2LongsTwiceMeasuredWithScope();
  delay(100);
  microsecondsToToggleOutputPinMeasuredWithScope();
  
  while(1) {
    
  }
}

//Prints 8 us:
unsigned long microsecondsToSubtract2LongsOnce() {
  startClock = micros();
  stopClock = micros();
  return stopClock - startClock;
}

//Prints 9 us:
unsigned long microsecondsToSubtract2LongsTwice() {
  startClock = micros();
  arithmeticDifference = arithmeticMinuend - arithmeticSubtrahend;
  stopClock = micros();
  return stopClock - startClock;
}

//Measured 18 us on scope:
void microsecondsToSubtract2LongsOnceMeasuredWithScope() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  startClock = micros();
  stopClock = micros();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

//Measured 18 us on scope:
void microsecondsToSubtract2LongsTwiceMeasuredWithScope() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  startClock = micros();
  arithmeticDifference = arithmeticMinuend - arithmeticSubtrahend;
  stopClock = micros();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

//Measured 500 ns on scope:
void microsecondsToToggleOutputPinMeasuredWithScope() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

I compare time measurements I made with oscilloscope to the ones I made using micros(). Doing the math we get that the # of clock cycles required for one micros() call = (0.5*(18000 - 500) ns)/15.625 ns = 560 clock cycles!!!

Can someone give an example for measuring time on this board which doesn't require more than 10 clock cycles???

Please take a look at this same discussion on SO; I have discovered a strategy which is an order of magnitude faster than this, using mBed's us_ticker_read method. Please let me know if you know of a faster solution which will beat my 42 clock cycles. I would love to hear about it!

1 Like

For short intervals, you should be able to read cycle counts from either the ARM Core SysTick Timer, or the ARM DWT Cycle Counter (you'd probably have to enable the DWT first.)

        startTime = SysTick->VAL; // 24bit counter, probably resets at ~1ms.  Counts Down!
    // or
        start = DWT->CYCCNT;  //CYCCNT wraps at 32bits; never resets, counts up.
    // With:
      // Somewhere in system init
      CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
      DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;

I don't know how those will be affected by (or affect) low power modes and such.

1 Like

I will try this out today, thank you (:

Let me start by saying THANK YOU so much for kicking my strategy's ***! (:

Here are the results (and respective scope measurement to back them up):

//Preliminary Notes:
//* Need to calculate the total # of bytes we will need per data transfer:
//  * BYTES_PER_TRANSFER = MICROSECONDS_BYTES + ACCELEROMETER_XYZ_BYTES + GYROSCOPE_XYZ_BYTES + MAGNETOMETER_XYZ_BYTES + HUMIDITY_BYTES + TEMPERATURE_BYTES + MICROPHONE_BYTES = 8 + 3*12 + 2*4 + 2 = 54 bytes/transfer.
//    * MICROSECONDS_BYTES = sizeof(uint64_t) = 8.
//    * ACCELEROMETER_XYZ_BYTES = GYROSCOPE_XYZ_BYTES = MAGNETOMETER_XYZ_BYTES = 3*sizeof(float) = 3*4 = 12.
//    * HUMIDITY_BYTES = TEMPERATURE_BYTES = sizeof(float) = 4.
//    * MICROPHONE_BYTES = sizeof(short) = 2.

#include "mbed.h"
#include "nrf.h"
#include "nrf_delay.h"
#include "nrf_gpio.h"

#define OUTPUT_PIN NRF_GPIO_PIN_MAP(1,11)

const float MICROS_PER_CYCLE = 0.015625; //Nano 33 BLE Sense operates at 64 MHz -> 15.625 ns/cc.
const byte BYTE_TO_SEND = 170; //b'10101010'.
const int BYTES_PER_TRANSFER = 54; //Refer to preliminary notes for calculation description.

unsigned long startClock;
unsigned long endClock;

byte buffer[BYTES_PER_TRANSFER];

volatile uint32_t stopWatchTime1;
volatile uint32_t stopWatchTime2;
volatile uint32_t stopWatchTime3;

void setup(){
  Serial.begin(1000000); //Does nothing on the Nano 33 BLE Sense.
  while (!Serial); //Wait for serial port to connect. Needed for native USB CDC on Nano 33 BLE Sense.

  stopWatchInit();
  
  nrf_gpio_cfg_output(OUTPUT_PIN); //Configure pin as digital output for measuring time of events on scope.
  
  memset(buffer, BYTE_TO_SEND, sizeof(buffer));

  //Measured 300 ns on scope and printed 190 ns.
  togglePinTwice();
  delay(10);

  //Measured 840 ns on scope and printed 730 ns:
  togglePinTwiceAndCallMicrosTwice();
  delay(10);

  //Measured ~68 us on scope and printed 67.69 us:
  togglePinTwiceCallMicrosTwiceAndWriteBytes(); //This result makes sense bc 88 - 9.2 = 78.8 us (which is basically 78 us).
  delay(10);

  //Measured ~68 us on scope:
  togglePinTwiceAndWriteBytes();

  Serial.println("");
  Serial.print(MICROS_PER_CYCLE*((float)stopWatchTime1));
  Serial.println(" us");
  Serial.println("");
  Serial.print(MICROS_PER_CYCLE*((float)stopWatchTime2));
  Serial.println(" us");
  Serial.println("");
  Serial.print(MICROS_PER_CYCLE*((float)stopWatchTime3));
  Serial.println(" us");
}
  
void loop(){}

void togglePinTwice(){
  stopWatchStart();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  stopWatchTime1 = stopWatchGet();
}

void togglePinTwiceAndCallMicrosTwice() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  stopWatchStart();
  stopWatchTime2 = stopWatchGet();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

void togglePinTwiceCallMicrosTwiceAndWriteBytes() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  stopWatchStart();
  Serial.write(buffer, sizeof(buffer)); //36047-37163 bytes/s on portenta h7 Mac.
  Serial.flush();
  stopWatchTime3 = stopWatchGet();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}

void togglePinTwiceAndWriteBytes() {
  nrf_gpio_pin_toggle(OUTPUT_PIN);
  Serial.write(buffer, sizeof(buffer));
  Serial.flush();
  nrf_gpio_pin_toggle(OUTPUT_PIN);
}




void stopWatchInit() {
  CoreDebug->DEMCR |= 0x01000000; //Enable the use of DWT.
}

void stopWatchStart() {
  DWT->CYCCNT = 0; //Reset cycle counter.
  DWT->CTRL |= 0x1; //Enable cycle counter.
}

uint32_t stopWatchGet() {
  return DWT->CYCCNT; //Number of clock cycles stored in count variable.
}

Have a great night!