nRF52840 micros() issue

I want to do some ADC with a board that uses the nRF52840 (specifically the promicro version) and I started with the timing code for it but am seeing some odd behaviour when I check the timing with micros(). Whether this is an artefact of how micros() works or something else is what I would like to find out.

Code:

#include <Arduino.h>
#include <Adafruit_TinyUSB.h>
#include <nrf_timer.h>

volatile uint8_t fired = 0;
uint16_t loopcount = 0;
unsigned long lastTime;

void setup() {
  Serial.begin(115200);
  delay (2000);
  
  NRF_TIMER4->TASKS_STOP = 1;
  NRF_TIMER4->TASKS_CLEAR = 1;

  NRF_TIMER3->MODE = TIMER_MODE_MODE_Timer;
  NRF_TIMER4->BITMODE = TIMER_BITMODE_BITMODE_16Bit << TIMER_BITMODE_BITMODE_Pos; 
  NRF_TIMER4->PRESCALER = 0 << TIMER_PRESCALER_PRESCALER_Pos;   // No prescaling (16 MHz clock; ~62.5 ns per tick)
  NRF_TIMER4->CC[0] = 64000;       // Set compare value = 64000 at 16MHz = 250Hz
  NRF_TIMER4->INTENSET = TIMER_INTENSET_COMPARE0_Enabled << TIMER_INTENSET_COMPARE0_Pos;
  NRF_TIMER4->SHORTS = TIMER_SHORTS_COMPARE0_CLEAR_Enabled << TIMER_SHORTS_COMPARE0_CLEAR_Pos;
  NRF_TIMER3->EVENTS_COMPARE[0] = 0;  // Clear any pending compare event

  NVIC_EnableIRQ(TIMER4_IRQn);
  NRF_TIMER4->TASKS_START = 1; // Start
}

extern "C" void  TIMER4_IRQHandler() {
  NRF_TIMER4->EVENTS_COMPARE[0] = 0;
  fired++;
}

void loop() {
  loopcount++;
  if (fired > 249) { 
    unsigned long CurrentTime = micros();
    unsigned long ElapsedTime = CurrentTime - lastTime;
    lastTime = CurrentTime;

//    uint16_t adcvalue = analogRead(16);

    Serial.printf("%d %d %d\n", fired, ElapsedTime, loopcount);
    fired = 0;
    loopcount = 0;
  }
}

So I am firing an interrupt every 4ms and incrementing a flag in the ISR, and in the main loop I'm polling as fast as possible to check when the flag gets to 250 and when it does, I check the time with micros() and save the time in a variable before resetting the counters. Next time the main loop detects 250 ISR counts I print the time that has passed.

So firstly as the ISR and loop() are asynchronous I would expect a bit of jitter in the time that passed, I expect it to be about 1 second but since the loop executes about every 24uS I would expect the time that passed from loop to loop to vary by around this amount (but it does not).

And then also I am seeing every now and then a large delay of around 1ms (so instead of reporting 1000000uS the loop is reporting 1000976uS or so).

250 1000000 41643
250 1000000 41738
250 1000977 41734
250 1000000 41745
250 1000000 41752
250 1000000 41202
250 1000000 41665
250 1000000 41636
250 1000000 41700

Maybe this is because micros() does something odd but this is expected behaviour and I should not trust it, but maybe the whole processor stops for a millisecond now and then in which case my ADC will be thrown off and I'll need to make huge code contortions to manage it as far as I can see.

Can anyone throw any light on this? I should say that I've already used (what I think are) higher level libraries like the software timer one but had similar problems with them all so ended up doing it as low level as I could find info on for maximum control but still no joy. Hence starting to think its something to do with micros() and this is simply not able to actually measure microseconds but is estimating them or rounding them or something.

TIA, Pete

This looks strange to me

In general, the micros values doesn't look realistic. Note that the loopcount exhibits noticeable jitter, while the micros doesn't. The fact that micros outputs exact as 1 million with an accuracy of 1 µs is also abnormal.
The µs values looks like to be updated once per ms and were calculated by simply multiplying millis by 1000. This also can explains why it changed by 977 at a single jump.

Yes I agree. The loopcount looks exactly what I would expect, since this loop is free running. But the exactly (to the uS) elapsedtime is weird - impossible really since the loop is not synchronised to the interrupt and the loop takes 24uS. Your hypothesis of the millis being multiplied by 1000 makes sense but what about the 977 (this exact value varies by 2 or 3 iirc but its fairly stable). I don't see that fitting into the multiply by 1000 scenario but maybe I'm missing something.

It does seem like its micros() thats to fault here and much less likely to be the interrupt. Maybe since this chip is new to Arduino they bodged the implementation of micros() for now and intend to come back to it or something in the future.

Probably what I should do is somehow count clock pulses. Not sure exactly how to do this other than using a counter again and there is a call that retrieves the current count. Only problem is I'm already using one counter and I'm not sure how many there are available (the system takes one or more IIRC but I think there are 4, will need to check).

If there are spares I should probably rewrite this without micros() and using the counter and see if the problem persists.

@sonofcy I know that looks weird, when you're familiar with the syntax of the atmega Arduino stuff much of this does to me anyway. There aren't many references to timer code on the web for this chip but after quite a lot of research this does seem to be the right way to set these timers up (although I am very much still rookie with this).

On ArduinoCore-avr millis() counts the overflow of a timer. This timer runs at 1.024ms. Since millis() cannot return fractions it runs too slow. To correct this, the fraction is collected and causes millis() to jump by 2 every now and than (approx every 42ms).

Maybe the NRF core does something similar and you see the correction of millis().

LGTM

You have an interrupt vector table. This is a list of function pointers.
TIMER4_IRQn is the index where you can find the pointer to TIMER4_IRQHandler.

So I removed the micros() call and replaced it with a second 32 bit / 1MHz timer which just counts up for an hour or so (1MHz at 32 bits is about 1 hour 10 mins). Its possible to call a function to return the current count so this can act as a 1MHz clock.

I cant be totally certain that this is not also 'pausing' and so faking the results (since its triggered by the main clock if there was a problem that low down, it could affect both this timer and the everything else etc). But it seems really unlikely that this is the case as its and internal function of the chip implemented in hardware AFAIK and unless some code actually calls it to pause nothing is going to make it do so.

This gives very different results, exactly the results that would be expected.

ADC, fired, ElapsedTime, loopcount
118172 250 999997 39437
119284 250 999998 38960
119783 250 1000001 39390
118835 250 999975 38621
118706 250 1000028 39509
118118 250 999996 37832
118263 250 1000002 39491
118664 250 999999 36158
119221 250 1000001 39184
119641 250 1000002 39180
119816 250 1000001 39013
119700 250 999997 39473
119638 250 999998 38905
119127 250 1000002 39445
118580 250 1000000 38060
118102 250 999999 39018
117688 250 1000004 38653
117707 250 999999 39331
117838 250 999997 39132
118171 250 1000002 39316
118728 250 999999 38847
119125 250 1000002 39374
119522 250 1000000 39098
118437 250 999998 39340
118109 250 1000000 39433
117253 250 999998 37856
117316 250 1000001 39415
118334 250 1000000 39392
119079 250 999999 39519
118592 250 1000001 39305
117589 250 999999 39268
117274 250 1000000 38818
117638 250 1000002 39483
118034 250 999974 39232
119014 250 1000026 39328
118034 250 999974 39232
119014 250 1000026 39328
119347 250 1000002 39397
119482 250 999998 39510
119304 250 1000000 39448
118768 250 1000000 39025
118696 250 999998 39509
118584 250 1000000 39292
118983 250 1000000 39482
119034 250 1000005 38305
119184 250 999999 39246
118882 250 999997 38313
118636 250 1000004 39335
118130 250 999999 39403
117780 250 999997 39174
118427 250 1000002 37882
119410 250 1000001 39273
119988 250 1000000 39480
119311 250 1000000 39064
118356 250 1000000 39124
118164 250 1000000 39175
118701 250 1000000 39003
119448 250 999999 38115
119518 250 999997 38133
120028 250 1000005 39447
119347 250 999996 39300
118995 250 1000006 39075
118366 250 999997 39219

Note that I slipped some ADC calls in there but it was the same before adding them. They just show that the ADC does not massively affect the timings. They are just summed and printed to stop them being optimised out but I will be stuffing them into an array eventually.

Theres a little jitter to and fro on the ElapsedTime, but it averages out to zero which is expected. Occasionally up to about +-28uS which is about right for a loop function thats running at about 38000 Hz. Again jitter in the loopcount as expected.

To me this all looks completely fine now, but could I ask if anyone else sees any issues now?

Clearly micros() is not of any use on the nRF52840 though anyway.

#include <Arduino.h>
#include <Adafruit_TinyUSB.h>
#include <nrf_timer.h>

volatile uint8_t fired = 0;
uint8_t lastfired = 0;
uint16_t loopcount = 0;
uint32_t lastTime;
uint32_t adcsum = 0;

void setup() {
  Serial.begin(115200);
  delay (2000);
  
  NRF_TIMER3->TASKS_STOP = 1;
  NRF_TIMER3->TASKS_CLEAR = 1;

  NRF_TIMER3->MODE = TIMER_MODE_MODE_Timer;
  NRF_TIMER3->BITMODE = TIMER_BITMODE_BITMODE_32Bit << TIMER_BITMODE_BITMODE_Pos; 
  NRF_TIMER3->PRESCALER = 4 << TIMER_PRESCALER_PRESCALER_Pos;   // 4 bits prescaling = 8? (16 MHz clock; ~62.5 ns per tick)
  NRF_TIMER3->CC[0] = 4294967200;       // Set compare value near end of 32 bits
  NRF_TIMER3->INTENSET = TIMER_INTENSET_COMPARE0_Enabled << TIMER_INTENSET_COMPARE0_Pos;
  NRF_TIMER3->SHORTS = TIMER_SHORTS_COMPARE0_CLEAR_Enabled << TIMER_SHORTS_COMPARE0_CLEAR_Pos;
  NRF_TIMER3->EVENTS_COMPARE[0] = 0;  // Clear any pending compare event

  NRF_TIMER4->TASKS_STOP = 1;
  NRF_TIMER4->TASKS_CLEAR = 1;

  NRF_TIMER4->MODE = TIMER_MODE_MODE_Timer;
  NRF_TIMER4->BITMODE = TIMER_BITMODE_BITMODE_16Bit << TIMER_BITMODE_BITMODE_Pos; 
  NRF_TIMER4->PRESCALER = 0 << TIMER_PRESCALER_PRESCALER_Pos;   // No prescaling (16 MHz clock; ~62.5 ns per tick)
  NRF_TIMER4->CC[0] = 64000;       // Set compare value = 64000 at 16MHz = 250Hz
  NRF_TIMER4->INTENSET = TIMER_INTENSET_COMPARE0_Enabled << TIMER_INTENSET_COMPARE0_Pos;
  NRF_TIMER4->SHORTS = TIMER_SHORTS_COMPARE0_CLEAR_Enabled << TIMER_SHORTS_COMPARE0_CLEAR_Pos;
  NRF_TIMER4->EVENTS_COMPARE[0] = 0;  // Clear any pending compare event

  NVIC_EnableIRQ(TIMER4_IRQn);
  NRF_TIMER3->TASKS_START = 1; // Start
  NRF_TIMER4->TASKS_START = 1; // Start
}

extern "C" void  TIMER4_IRQHandler() {
  NRF_TIMER4->EVENTS_COMPARE[0] = 0;
  fired++;
}

void loop() {

  if (lastfired != fired) {
    uint16_t adcvalue = analogRead(16);
    adcsum += adcvalue;
    lastfired = fired;
  }

  loopcount++;
  if (fired > 249) { 
    // Capture the current TIMER1 count into CC[1].
    NRF_TIMER3->TASKS_CAPTURE[1] = 1;
    uint32_t counter = NRF_TIMER3->CC[1];

    uint32_t ElapsedTime = counter - lastTime;
    lastTime = counter;

    Serial.printf("%d %d %d %d\n", adcsum, fired, ElapsedTime, loopcount);
    fired = 0;
    loopcount = 0;
    adcsum = 0;
  }
}


@Rintin thats interesting thanks for the detail. Something I did notice was that there didn't seem to be any exact perodicity to the extra 1ms delay, but I didn't set up to carefully measure that. I just printed the time in uS on single lines, with newlines when the time was greater than 500uS late and observed that the newlines happened at very different and quite random logging line lengths. Still, it seemed to be worst at the start of execution and become more regular later. I would be interested to look at the source for this stuff but not sure where to look. Maybe the mechanism would be obvious. I saw the same effect in the nRF52Timerinterrupt library with a frequency of about 250 Hz interrupt rate. When I changed the rate by a couple of Hz up or down the effect saw a very large change (IE at maybe 248 Hz the 1000970 uS delay was happening at a 1:4 ratio instead of a 1:50 ratio at 252Hz). I guess this does show some kind of 'beat frequency' effect similar to what you explained.

Anyway reading the source code would probably be better than speculating, but I am glad my other test with the counter seems to be fine, I can go ahead and continue with the application I was working on. Just got to remember to not use milis() on this platform.

Just to close this up - I didn't realise but this board runs freeRTOS (I think all arduino cores run freeRTOS - the pcook one I am using certainly does but all the others I looked at did too).

So freeRTOS uses a tick frequency of 1kHz and in delay.h in the source for the pcook BSP I found this:

static inline uint32_t micros( void )
{
  // Use DWT cycle count if it is enabled, otherwise use rtos tick
  return dwt_enabled() ? (DWT->CYCCNT / 64) : tick2us(xTaskGetTickCount());
}

DWT is "Debug, Watchdog, and Trace" but I have not played with it. Its a direct system clock counter by all accounts. Perhaps reliable but I'm not sure what happens to it when you put the cpu to sleep as the 64MHz clock gets sleeped too. Anyway tick2us must be a x1000 multiplier since it takes ticks at 1kHz and outputs microseconds. But clearly the resolution is going to be 1000us. Meaning it can't be used as a microsecond timer.

So this seems to confirm, if you want anything better than 1millisecond resolution, use a counter one way or the other - figure out the DWT thing or use an appropriate counter for your sleep level. I am using the 32kHz clock now so that will be all I can do but its still better than 1kHz.