Is there an Intel RDTSC equivalent clock counter on Due SAM3X8E microcontroller?

Alternative question:
Is it possible to measure time on Due with better precision than with "micros()" [sub microsecond]?

http://www.atmel.com/Images/Atmel-11057-32-bit-Cortex-M3-Microcontroller-SAM3X-SAM3A_Datasheet.pdf#page=217
shows "time stamping" as part of the ITM module.

Searching for "clock" gives >1800 hits and "counter" >500, too many to step through.

Hermann.

I looked into the implementation of "micros()" below and found "GetTickCount()" used.

".arduino15/packages/arduino/hardware/sam/1.6.4/system/libsam/source/timetick.c"
says "Get current Tick Count, in ms."

Now I ran this little sketch on Due:

void setup() {
  volatile unsigned long t0,t1;
  Serial.begin(9600);
  Serial.println(GetTickCount());
  Serial.println(GetTickCount());
  t0=micros();
  t1=micros();
  Serial.println(t0);
  Serial.println(t1);
}

void loop() {}

Output after pressing Reset button several times is this:

2
2
2015
2017
2
2
2015
2017
2
2
2015
2017
2
2
2016
2017
2
2
2015
2017
2
2
2016
2017

So "t1=micros()" seems to take between 1 and 2 microseconds.

This is from
".arduino15/packages/arduino/hardware/sam/1.6.4/cores/arduino/wiring.c":

// Interrupt-compatible version of micros
// Theory: repeatedly take readings of SysTick counter, millis counter and SysTick interrupt pending flag.
// When it appears that millis counter and pending is stable and SysTick hasn't rolled over, use these 
// values to calculate micros. If there is a pending SysTick, add one to the millis counter in the calculation.
uint32_t micros( void )
{
    uint32_t ticks, ticks2;
    uint32_t pend, pend2;
    uint32_t count, count2;

    ticks2  = SysTick->VAL;
    pend2   = !!((SCB->ICSR & SCB_ICSR_PENDSTSET_Msk)||((SCB->SHCSR & SCB_SHCSR_SYSTICKACT_Msk)))  ;
    count2  = GetTickCount();

    do {
        ticks=ticks2;
        pend=pend2;
        count=count2;
        ticks2  = SysTick->VAL;
        pend2   = !!((SCB->ICSR & SCB_ICSR_PENDSTSET_Msk)||((SCB->SHCSR & SCB_SHCSR_SYSTICKACT_Msk)))  ;
        count2  = GetTickCount();
    } while ((pend != pend2) || (count != count2) || (ticks < ticks2));

    return ((count+pend) * 1000) + (((SysTick->LOAD  - ticks)*(1048576/(F_CPU/1000000)))>>20) ;
    // this is an optimization to turn a runtime division into two compile-time divisions and 
    // a runtime multiplication and shift, saving a few cycles
}

This seems to indicate that there is no register for simply reading out the clock cycles since system start, as with RDTSC.
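
The comment at the end of micros() describes replacing a runtime division with a multiply and shift. Here is a minimal host-side sketch of that arithmetic, assuming F_CPU is 84000000 and the SysTick period is 1 ms. The names kCpuHz, kScale, ticks_to_us_div and ticks_to_us_shift are made up for illustration and do not appear in wiring.c.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for the Due: 84 MHz core clock, 1 ms SysTick period.
constexpr uint32_t kCpuHz = 84000000u;
constexpr uint32_t kTicksPerMs = kCpuHz / 1000u;             // 84000
constexpr uint32_t kScale = 1048576u / (kCpuHz / 1000000u);  // 2^20 / 84, truncated

// Reference conversion: one runtime division by 84.
uint32_t ticks_to_us_div(uint32_t elapsed_ticks) {
    assert(elapsed_ticks < kTicksPerMs);
    return elapsed_ticks / (kCpuHz / 1000000u);
}

// Conversion in the style of micros(): multiply by a precomputed
// constant, then shift right by 20 bits instead of dividing.
uint32_t ticks_to_us_shift(uint32_t elapsed_ticks) {
    assert(elapsed_ticks < kTicksPerMs);  // keeps the product below 2^32
    return (elapsed_ticks * kScale) >> 20;
}
```

Because the truncated constant is slightly smaller than 2^20/84, the shifted result can come out one microsecond lower than the plain division for some inputs (84 ticks gives 0 rather than 1, for example).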

Hermann.

There is a "delayMicroseconds()" function that works.

This little sketch delays pin flipping by 500000μs=0.5s several times:

void setup() {
  pinMode(7,OUTPUT);
  digitalWrite(7,HIGH);
  delayMicroseconds(500000);
  digitalWrite(7,LOW);
  delayMicroseconds(500000);
  digitalWrite(7,HIGH);
  delayMicroseconds(500000);
  digitalWrite(7,LOW);
  delayMicroseconds(500000);
  digitalWrite(7,HIGH);
  delayMicroseconds(500000);
  digitalWrite(7,LOW);
  delayMicroseconds(500000);
  digitalWrite(7,HIGH);
  delayMicroseconds(500000);
  digitalWrite(7,LOW);
  delayMicroseconds(500000);
}

void loop() {}

The oscilloscope confirms that "delayMicroseconds()" does the right thing:

After fixing a bug of my own, it also works when inserted between "micros()" calls as in the previous sketch:

...
  t0=micros();
  t1=micros();
  delayMicroseconds(100);
  t2=micros();
  t3=micros();
...

Results in this output:

2016
2017
2119
2120

Seems I need to understand in detail how "delayMicroseconds()" and "micros()" work to answer my original question (is sub microsecond precision time measurement possible with Due?).

From ".arduino15/packages/arduino/hardware/sam/1.6.4/cores/arduino/wiring.h":

/**
 * \brief Pauses the program for the amount of time (in microseconds) specified as parameter.
 *
 * \param dwUs the number of microseconds to pause (uint32_t)
 */
static inline void delayMicroseconds(uint32_t) __attribute__((always_inline, unused));
static inline void delayMicroseconds(uint32_t usec){
    /*
     * Based on Paul Stoffregen's implementation
     * for Teensy 3.0 (http://www.pjrc.com/)
     */
    if (usec == 0) return;
    uint32_t n = usec * (VARIANT_MCK / 3000000);
    asm volatile(
        "L_%=_delayMicroseconds:"       "\n\t"
        "subs   %0, #1"                 "\n\t"
        "bne    L_%=_delayMicroseconds" "\n"
        : "+r" (n) :
    );
}
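
The divisor 3000000 in the code above suggests that the subs/bne loop is assumed to cost three clock cycles per iteration. Here is a small host-side sketch of the resulting iteration count, assuming VARIANT_MCK is 84000000 on the Due. The names kVariantMck, kCyclesPerIteration, delay_loop_iterations and kMaxDelayUs are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kVariantMck = 84000000u;   // assumed VARIANT_MCK on the Due
constexpr uint32_t kCyclesPerIteration = 3u;  // assumption implied by the divisor 3000000

// Loop iterations per microsecond: 84000000 / 3000000 = 28.
constexpr uint32_t kIterationsPerUs =
    kVariantMck / (kCyclesPerIteration * 1000000u);

// Largest delay whose iteration count still fits in 32 bits.
constexpr uint32_t kMaxDelayUs = UINT32_MAX / kIterationsPerUs;

// Number of subs/bne iterations the loop would run for a given delay.
uint32_t delay_loop_iterations(uint32_t usec) {
    assert(usec <= kMaxDelayUs);  // beyond this the 32-bit product wraps
    return usec * kIterationsPerUs;
}
```

Under these assumptions the 500000μs delays in the sketch run 14,000,000 iterations each, well below the point where the 32-bit multiplication would wrap.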

Hermann.

Hello there

Is it possible to measure time on Due with better precision than with "micros()" [sub microsecond]?

Here is the code I use to measure times below 1 µs:

float t;
void setup() {
  // put your setup code here, to run once:
Serial.begin(250000);
}

void loop() {
 /* 
    
  Measure time for sequences which last less than 2 ms
  The system is initialized with SysTick_Config (SystemCoreClock / 1000)

  */


        t=SysTick->VAL;       
        Serial.println("tadaaaaa !");
        t -= SysTick->VAL ;                                               // SysTick->VAL counts down 
       
        Serial.print("nb of ticks = "); Serial.println(t);    
            
        if (t<0) t = t + SystemCoreClock/pow(10,3);                       // If one roll over, then t<0
        t= (t-1)/(SystemCoreClock/pow(10,9));                             // minus an offset of 1 clock cycle to measure
        
        Serial.print(" Time to execute this code "); 
        Serial.print(t);Serial.println( " ns");      
        
       delay(3000);        
  


}

A longer interval can be measured if SysTick_Config is called with a value greater than SystemCoreClock / 1000 (this has some side effects, and the reload value must fit in the 24-bit counter, so at most 2^24-1), which can be handy for debugging.

With the default SysTick_Config initialization, 1 tick = 1 clock cycle.
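
The subtraction and rollover handling in the sketch above can also be done in integer arithmetic. This is a minimal host-side sketch, assuming SystemCoreClock is 84000000 and the default 1 ms SysTick period. The names kCoreHz, kPeriod, elapsed_ticks and ticks_to_ns are invented for illustration, and unlike the sketch above it does not subtract a one-cycle offset.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCoreHz = 84000000u;        // assumed SystemCoreClock
constexpr uint32_t kPeriod = kCoreHz / 1000u;  // ticks per SysTick period (1 ms)

// Ticks elapsed between two reads of the down-counter.
// Handles at most one rollover between the reads.
uint32_t elapsed_ticks(uint32_t start, uint32_t end) {
    assert(start < kPeriod && end < kPeriod);
    return (start >= end) ? (start - end) : (start + kPeriod - end);
}

// Convert ticks to whole nanoseconds (truncated), using 64-bit math
// so that the intermediate product cannot overflow.
uint64_t ticks_to_ns(uint32_t ticks) {
    return (static_cast<uint64_t>(ticks) * 1000000000ull) / kCoreHz;
}
```

For example, elapsed_ticks(10, 83990) treats the second reading as taken after one rollover and returns 20. Anything that spans more than one full period cannot be recovered from the two readings alone.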

Thank you, that is really cool !

And you answered the question:
SysTick->VAL is the RDTSC equivalent.
It ticks at SystemCoreClock (=84,000,000 Hz), so the resolution is one clock cycle, or 11.90476ns!

I wanted to test your measurement and replaced your Serial.println() statement with 0 to 9 increments of unsigned long variables:

        t=SysTick->VAL;       
        //Serial.println("tadaaaaa !");
        ++c1;
        ++c2;
        ++c3;
        ++c4;
        ++c5;
        ++c6;
        ++c7;
        ++c8;
        ++c9;
        t -= SysTick->VAL ;                                               // SysTick->VAL counts down

I compiled with -O3 (optimized for speed) instead of -Os:

The first measurements confirm that you really read "clock ticks used" (plus 1):

0: 1
1: 5
2: 6
3: 8
4: 9
5: 13
6: 13 (!)
7: 108
8: 99 (!!)
9: 140

Seems that SAM microcontroller cannot handle more than 6 unsigned longs in registers.

Hermann.

The "SysTick" counter is a common feature on all Cortex-M processors (it's part of the CPU core, rather than being considered a "peripheral"). It's a 24bit down counter, normally clocked at the CPU frequency (sometimes there are other clocking options, but full CPU rate is the default). It is automatically reloaded with a configurable value when it reaches zero, and can generate an interrupt.
The Arduino usage is typical: the reload value is set to the number of CPU clocks in 1ms (84000 for the 84MHz Due), and then the SysTick interrupt is used to count the number of milliseconds that have elapsed.
The total number of ticks that have elapsed since initialization is thus (millis*84000 + (84000-SysTick->VAL)). For shorter time periods, if you're lucky (or if you handle wrapping), you can just look at SysTick->VAL.

I believe Due also has a bunch of "TC" style timers that aren't normally used. These can be clocked at least up to FCPU/2...
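
The formula for the total tick count can be sketched on the host like this. It assumes 84000 ticks per millisecond and takes the millisecond count and the counter value as plain arguments; on real hardware the two would have to be read as a consistent pair, which is what the retry loop in micros() is for. The names kTicksPerMs and total_cycles are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kTicksPerMs = 84000u;  // CPU clocks per millisecond at 84 MHz

// Total CPU clocks since SysTick was initialised, following the formula
// millis*84000 + (84000 - VAL). A 64-bit result avoids wrap-around.
uint64_t total_cycles(uint32_t millis, uint32_t systick_val) {
    assert(systick_val < kTicksPerMs);
    return static_cast<uint64_t>(millis) * kTicksPerMs
         + (kTicksPerMs - systick_val);
}
```

A 32-bit result would wrap after 2^32 / 84,000,000, roughly 51 seconds, which is why the sketch returns a 64-bit value.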

Seems that SAM microcontroller cannot handle more than 6 unsigned longs in registers.

Seems about right. There are 8 GP, generally accessible registers in cortex architectures, another 5 that are "somewhat accessible" on CM3/4 (32bit instructions rather than 16bit) and harder on M0 (doesn't have 32bit instructions. I think MOV can still access the high registers, and maybe some others.) Some of those may not be used because they're used by the function calling convention.
I'm a little surprised that the jump is so big between 6 and 7 variables; you'd have to look at the code produced to see why. I'd think a non-register local variable (you didn't include your variable declarations) would get incremented in 4 or 5 clocks (load, add, store.)

Thanks for all the good information!

I'd think a non-register local variable (you didn't include your variable declarations) would
get incremented in 4 or 5 clocks (load, add, store.)

These are the first 3 lines of the sketch I used; the unsigned long variables are declared at sketch scope:

float t;
unsigned long c1=0,c2=0,c3=0,c4=0,c5=0,c6=0,c7=0,c8=0,c9=0;
void setup() {

I will investigate the generated code later.

The reason why I asked for sub microsecond timer resolution was this:
(laser) light travels 1m in 3.3ns, or 3.6m per Due clock tick.

I have had some laser sensors for quite some time:

And today I learned how to split a laser beam at the edge of a small mirror:

So with one sensor near the mirror and the other 10m away, both connected to the Arduino Due (same cable length) and each triggering an interrupt after the Due turns the laser on, it might be possible to measure the speed of light (provided the 1st interrupt service routine does not take too long; it is just "t0=SysTick->VAL;", and the second ISR is "t1=SysTick->VAL;") ...

Hermann.

P.S: Just tested, 1 more mirror far away brings back the laser beam easily and no long cables are needed. Living room plus kitchen give me 2x9m=18 meters, which the light will travel in about 5 Due clock ticks.
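
The distance-per-tick figures can be checked with a few lines of host-side code. This sketch assumes the vacuum speed of light and an 84 MHz clock. The names kSpeedOfLight, metres_per_tick and ticks_for_path are invented for illustration.

```cpp
#include <cassert>

constexpr double kSpeedOfLight = 299792458.0;  // m/s in vacuum (air is marginally slower)

// Distance light travels during one clock period.
double metres_per_tick(double clock_hz) {
    assert(clock_hz > 0.0);
    return kSpeedOfLight / clock_hz;
}

// Number of clock periods light needs to cover a given path.
double ticks_for_path(double path_metres, double clock_hz) {
    return path_metres / metres_per_tick(clock_hz);
}
```

With clock_hz = 84.0e6 this gives a little under 3.6 m per tick and just over 5 ticks for the 18 m path, in line with the numbers quoted in the post.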

I made progress and built a 2x7m prototype in living room:

The right laser sensor is lit from the shown mirror, the middle laser sensor via mirror 7m away on other side of the room.

Next I found that the laser cannot be driven with 3.3V, but the laser sensor can (which allows use with the Due w/o level shifters):

Next I used a single laser sensor with Arduino Due and this sketch:

// Measure Speed of Light
//
#include "bitlash.h"

volatile uint32_t t0=0, t1=0;

void int4(void) {
   t0 = SysTick->VAL;
 }

void int5(void) {
   t1 = SysTick->VAL;
}

void resetVars(void) {
    t0=t1=0;
}

void ts(void) {
    Serial.print(t0);
    Serial.print(" ");
    Serial.println(t1);
}

void setup(void) {
    initBitlash(57600);		// must be first to initialize serial port

    addBitlashFunction("re", (bitlash_function) resetVars);

    addBitlashFunction("ts", (bitlash_function) ts);

    attachInterrupt(digitalPinToInterrupt(4), int4, RISING);
    attachInterrupt(digitalPinToInterrupt(5), int5, RISING);
}

void loop(void) {
    runBitlash();
}

In order to test how long processing of the (short) interrupt service routines (ISR) takes, I connected the laser sensor to both pins, D4 and D5. I did that because I read this:

If your sketch uses multiple ISRs, only one can run at a time, other interrupts will be executed after the current one finishes ...

This is a sample session; I lit the sensor 3 times (triggering RISING each time):

bitlash here! v2.0 (c) 2013 Bill Roy -type HELP- 1000 bytes free
> ts
0 0
> ts
4840 4877
> ts
83440 83477
> ts
49048 49085
>

Regardless of the implications for my two laser sensor measurement plan, having measured 37 clock cycles for the ISR is nice.

So this is not gonna work in our living room with 14m, because the (first) ISR needs 37 Arduino clock cycles (consistently). The laser beam would need to travel a distance longer than 1000/84*37/3.3=133.5m. Looking at the laser beam after 14m at the kitchen door in the photo above (with the shadow of the laser sensor) we can see that it is not a small beam anymore but is much wider. I am not sure whether adding another mirror at the kitchen door and making the beam travel 14m 10 times will leave enough light for the laser sensor to trigger (after 140m).
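
The minimum path length can be recomputed with the exact speed of light instead of the rounded 3.3ns per meter. This is a rough host-side sketch, assuming an 84 MHz clock and the vacuum value of c. The names min_path_metres and passes_needed are hypothetical.

```cpp
#include <cassert>
#include <cmath>

constexpr double kSpeedOfLight = 299792458.0;  // m/s in vacuum

// Path length light covers while the first ISR is still running.
double min_path_metres(unsigned latency_cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return static_cast<double>(latency_cycles) / clock_hz * kSpeedOfLight;
}

// Whole number of room lengths needed to reach at least that path.
unsigned passes_needed(double path_metres, double room_metres) {
    assert(room_metres > 0.0);
    return static_cast<unsigned>(std::ceil(path_metres / room_metres));
}
```

For 37 cycles at 84 MHz this comes to about 132 m, slightly less than the 133.5m obtained with the rounded figure, and still 10 passes of 14 m.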

Next I will switch the beam on from the Due (level shifter needed) and work with a single laser sensor.
I will try to determine the laser startup time from multiple measurements (14m, 12m, 10m, ..., 2m).

Hermann.