nanos() [time elapsed since program start in nanoseconds]?

hi everyone!

In my current application I'd like to have access to timing in the 10 to 100 ns range. So I started digging around in the micros() source code and came up with the following:

uint32_t nanos( void )
{
    uint32_t ticks;

    // re-read VAL if the counter wrapped (COUNTFLAG set) since the
    // last read of CTRL, so we don't return a torn reading
    do {
        ticks = SysTick->VAL;
    } while (SysTick->CTRL & SysTick_CTRL_COUNTFLAG_Msk);

    // GetTickCount() is millis()
    uint32_t load = (SysTick->LOAD + 1 - ticks);
    uint32_t milliS = GetTickCount() * 1000000;
    uint32_t microS = load / (SystemCoreClock / 1000000) * 1000;
    uint32_t nanoS = load / (SystemCoreClock / 10000000); // units of 100 ns; one tick at 84 MHz is ~11.9 ns
    //Serial.print(milliS); Serial.print(" "); Serial.print(microS); Serial.print(" "); Serial.println(nanoS);
    return milliS + microS + nanoS;
}


It's essentially the micros() source code, except that for the return value I changed the divisions by factors of 1000. Could someone please comment on whether this is the right approach?

On a side note: isn't division notoriously slow on the Due? If so, is there a better way to optimize this code for performance?

Multiplication and division are done in hardware on the Cortex-M3. Multiplication is single-cycle; I don't know the figure for division, though.