Nano: millis() sparadically running backwards?

I am having an issue with trying to time an external 1PPS pulse. My goal is to measure the microcontroller's clock drift against a known good external source (in this case, a GPS module), so that when GPS lock is lost I have a quick and dirty way to keep somewhat accurate time until the next GPS lock (without having to buy an RTC).

For the most part, my solution appears to work on the bench. My test setup includes two nanos: one driving a 1PPS signal (just a mock up of an external source) and one running my measurement code. My issue is that every once in a while, I get a spurious update that gets a measured elapsed time of -20ms. When this happens, I end up getting an extra interrupt and measuring about ~1 second short, skewing my correction factor.

My relevant logs:

ClockCorrection::update  obs=25   max=60
ClockCorrection::update  obs=26   max=60
ClockCorrection::lost frame! delta=65536
ClockCorrection::update  obs=27   max=60
ClockCorrection::lost frame! delta=4294902763
ClockCorrection::update  obs=28   max=60
ClockCorrection::update  obs=29   max=60
ClockCorrection::update  obs=30   max=60

I have attached complete source to this post. The "ExternalClockSource" sketch is the ~1PPS output nano, "ClockCorrection" is the nano that is doing the measurement and emitting the above logs.
ClockCorrection.zip (3.1 KB)

Apologies, ignore the comment in ClockCorrection::update() that says "This check needed to counter correction of timer0". I forgot to clean up that comment when I reverted to a simple != check.

millis() sparadically running backwards?

No it doesn't, what ever it does it does not run backwards.

You may want to read this before you proceed:-
how to get the best out of this forum

It will tell you how to post code correctly using code tabs.

Many members use mobile devices and they will not cope with a .zip file and what it contains. So by posting code properly there more people who will be able to help you.

Please post the code here to avoid the need to do a download to see it

Thanks Mike. I am sure it is not running backwards and is rather a problem with my code, but that is the best descriptor for what is being observed.

This code involves a class and an interrupt service routine. I figured it would be too long to post but I will post the individual files here shortly.

Here is the demo sketch (the ClockCorrection class is coming up shortly, please be patient)

#include "Arduino.h"

#define L_EXTERNAL_CLOCK_PIN   3 /* D3 */

#include "ClockCorrection.h"

ClockCorrection clockCorrection;

unsigned long lastBlinkMillis;
bool blink;
volatile bool start;
bool started;

void /*ICACHE_RAM_ATTR*/ isr() // INTERRUPT SERVICE ROUTINE
{
  ClockCorrectionInterruptHandler();
  start = true;
}

void setup() {
  // put your setup code here, to run once:

  Serial.begin(9600);
  Serial.println("ClockCorrection demo starting");

  lastBlinkMillis = millis();
  start = false;
  started = false;

  clockCorrection.begin();

  pinMode(13, OUTPUT);
  pinMode(L_EXTERNAL_CLOCK_PIN, INPUT_PULLUP);
  attachInterrupt(digitalPinToInterrupt(L_EXTERNAL_CLOCK_PIN), isr, RISING);
}

void loop() {
  // put your main code here, to run repeatedly:
  unsigned long localMillis;

  clockCorrection.update();

  if (start)
  {
    //localMillis = millis();
    localMillis = clockCorrection.getMillis();
    
    if (!started)
    {
      // Sync up clocks at first go
      lastBlinkMillis = localMillis;
      started = true;
      blink = true;
    }
    else
    {    
      if (localMillis - lastBlinkMillis >= 500)
      {
        blink = !blink;
        lastBlinkMillis = localMillis;
        digitalWrite(13, blink ? HIGH : LOW);    
      }        
    }
  }
}

ClockCorrection.h:

#ifndef CLOCKCORRECTION_H
#define CLOCKCORRECTION_H

#include "Arduino.h"

typedef unsigned long T_TIME;

extern volatile T_TIME gClockCorrectionLastMillis;

void ClockCorrectionInterruptHandler();

/* Proper decorations for ISR routines */

#ifdef ICACHE_RAM_ATTR
#define INTERRUPT_SERVICE_CALLEE ICACHE_RAM_ATTR
#else
#define INTERRUPT_SERVICE_CALLEE /**/
#endif

class ClockCorrection
{
  private:
    T_TIME           mLastInterruptMillis;
    T_TIME           mFirstInterruptMillis;
    unsigned int     mInterruptIntervalTarget;
    unsigned int     mInterruptSampleCountTarget;

    unsigned int     mInterruptsObserved;

    unsigned int     mTimeoutInterval;

    double           mCorrectionFactor;

    void reset(bool isTimeout);
    void calculateCorrectionFactor(T_TIME finalMillisValue);

    void trace(const char *szMsg, ...)
    {
      static char log_msg[256];
  
      va_list args;
      va_start(args, szMsg);
      vsprintf(log_msg, szMsg, args);
      va_end(args);
        
      if (!strcmp(szMsg, "."))
       Serial.print(szMsg);
      else
      {
        Serial.write("ClockCorrection::");
        Serial.println(log_msg);
      }
    }

  public:
  
    ClockCorrection(void);

    void begin();

    /**
     * Should be called fairly frequently from the sketch's loop() function
     */
    void update();

    /**
     * Returns the corrected millisecond count
     */
    T_TIME getMillis();
};

#endif

And ClockCorrection.cpp, the module that is ultimately generating the logs:

#include "ClockCorrection.h"

/**
 * gClockCorrectionLastMillis - A global variable to hold the value of millis()
 * at the last time the external clock went HIGH
 */
volatile T_TIME gClockCorrectionLastMillis;

/**
 * ClockCorrectionInterruptHandler
 * 
 * To be invoked by the sketch's interrupt handler once per external clock signal
 * (the author suggests the rising edge of that signal), this simply saves off
 * the current value of millis for later comparison
 */
void INTERRUPT_SERVICE_CALLEE ClockCorrectionInterruptHandler()
{
  gClockCorrectionLastMillis = millis();
}




ClockCorrection::ClockCorrection(void)
{
  // Set some defaults for now
  mInterruptIntervalTarget = 1000;
  mInterruptSampleCountTarget = 60;
  mTimeoutInterval = 500;
  mCorrectionFactor = 1.0;

  reset(false);
}

void ClockCorrection::begin()
{
  
}

void ClockCorrection::update()
{
  static T_TIME isrMillis, delta;

  // Snapshot global register to prevent odd behavior if the ISR fires
  // during execution of this routine
  isrMillis = gClockCorrectionLastMillis;
  delta = isrMillis - mLastInterruptMillis;
  
  if (isrMillis != mLastInterruptMillis)
  {
    if (delta > 1100)
    {
      trace("lost frame! delta=%lu", delta);
    }
    
    if (mInterruptsObserved == 0)
    {
      mFirstInterruptMillis = isrMillis;
    }
    
    trace("update  obs=%i   max=%i", mInterruptsObserved, mInterruptSampleCountTarget);

    if (mInterruptsObserved >= mInterruptSampleCountTarget)
    {
      calculateCorrectionFactor(isrMillis);
      reset(false);
      mFirstInterruptMillis = isrMillis;
      mInterruptsObserved = 1;
    }
    else
    {
      mInterruptsObserved++;      
    }

    mLastInterruptMillis = isrMillis;
  }
  /*else if (delta > 2 * mInterruptIntervalTarget)
  {
    // Interrupt timed out, flag and reset
    trace("timeout! delta=%lu", delta);
    reset(true);

    mLastInterruptMillis = isrMillis;
  }*/
}

T_TIME ClockCorrection::getMillis()
{
  return millis() * mCorrectionFactor;
}

/****** private members ******/

void ClockCorrection::reset(bool isTimeout)
{
  mInterruptsObserved = 0;
}

void ClockCorrection::calculateCorrectionFactor(T_TIME finalMillisValue)
{
  static T_TIME localElapsedTime,
                expectedElapsedTime;
  static char   logMsg[256];

  localElapsedTime    = finalMillisValue - mFirstInterruptMillis;
  expectedElapsedTime = mInterruptIntervalTarget * mInterruptSampleCountTarget;

  mCorrectionFactor = (double)expectedElapsedTime / (double)localElapsedTime;

  sprintf(logMsg, "localElapsedTime    = %lu", localElapsedTime);
  Serial.println(logMsg);
  
  sprintf(logMsg, "expectedElapsedTime = %lu", expectedElapsedTime);
  Serial.println(logMsg);

  Serial.print("M                   = ");
  Serial.println(mCorrectionFactor, 10);
}

I think I may have found my issue. I forgot that the nano is an 8-bit processor, and is therefore copying 1 byte at a time when executing:

isrMillis = gClockCorrectionLastMillis;

This explains the -20... isrMillis (being a 4 byte ulong) is being torn during the copy. I have updated the code to be:

  noInterrupts();
  isrMillis = gClockCorrectionLastMillis;
  interrupts();

And so far I have not had any missed frames.

Can you please explain your solution?

The solution to my problem is that the variable "gClockCorrectionLastMillis" (type unsigned long, aka 4 bytes) was getting overwritten during the copy into isrMillis. Take a look at what is happening on the instruction level, if we assume the value of gClockCorrectionLastMillis to be 0xAABBCCDD, then the instruction:

isrMillis = gClockCorrectionLastMillis;

is equivalent to:

char *dest = (char *)isrMillis;
char *src = (char *)gClockCorrectionLastMillis;

dest[0] = src[0]; // "0xAA" Natively, the arduino is copying one byte at a time
// The interrupt can happen here
dest[1] = src[1]; // "0xBB"
// The interrupt can happen here
dest[2] = src[2]; // "0xCC"
// The interrupt can happen here
dest[3] = src[3]; // "0xDD"

So if the interrupt fires during this line and sets to gClockCorrectionLastMillis to 0xFFFFFFFF, isrMillis can end up any of the values 0xAABBCCFF, 0xAABBFFFF, or 0xAAFFFFFF.

The moral of the story is, never assume that "X = Y" is an atomic operation!

I think you still have the issue that millis() increments by two sometimes. That's because it actually runs a bit slow. But you can replace the default millis() with your own millis timer0 interrupt that increments at exactly 1ms intervals.

/*
This is a replacement for the ISR(TIMER0_OVF_vect)
interrupt that drives millis(). It disables the OVF
interrupt, enables the COMPA interrupt, sets OCR0A to 249,
and changes Timer0 to CTC mode.  This results in an
interrupt every 250 timer clock cycles, which is exactly 1ms
for a 16MHz crystal, or 2ms for an 8MHz crystal.  The new ISR
increments millis, but since the interrupt rate is exactly
correct, no periodic double increment of millis is needed.

Using this code probably means you can't do any analog
writes that use Timer0, which would include pins 5 and 6.
*/

extern volatile unsigned long timer0_millis;   //these defined in wiring.c
extern volatile unsigned long timer0_overflow_count;

volatile unsigned long MILLIS_INCB = (64 * 250) / (F_CPU / 1000);  // ms between timer overflows

const int cycleTime = 500;                 // flash LED every second
int LEDcount = cycleTime;
unsigned long oldMillis = millis();
unsigned long newMillis = 0;

void setup() {                             //Set up alternate interrupt
                                           //   at 249 on timer0
                                           
  cli();                                   // disable interrupts while doing this

  TCCR0A = 0;                              // set entire TCCR0A register to 0
  TCCR0B = 0;                              // same for TCCR0B
  TCNT0  = 0;                              // initialize timer0 count to 0

  OCR0A = 249;                             // set top of counter
  TIMSK0 &= ~bit(TOIE0);                   // disable overflow interrupt
  TCCR0A |= bit(WGM01);                    // turn on CTC mode
  TCCR0B |= (bit(CS01)+bit(CS00));         // Set CS00&CS01 bits for prescaler = 64
  TIMSK0 |= bit(OCIE0A);                   // enable timer compare interrupt

  sei();                                   // enable interrupts

  pinMode(13,OUTPUT);
  digitalWrite(13,HIGH);

}


void loop() {                               // flashes LED for 1/2 second
                                            //    every second

  newMillis = millis();
  if ((newMillis - oldMillis) > 0) {
    oldMillis = newMillis;
    LEDcount -= 1;
    if (LEDcount == 0) {
      digitalWrite(13,!digitalRead(13));    // invert pin 13 state
      LEDcount = cycleTime;
    }   
  }
}


ISR(TIMER0_COMPA_vect) {                    // this is the new ISR - much
                                            //   simpler than original
  
  timer0_millis += MILLIS_INCB;
  timer0_overflow_count++;

}

Thanks for the code, but the point of this code is that the long-term drift of the CPU clock is accounted for. It won't be perfect, but it should be better than just using millis() to manage a local clock during loss of gps sync. It assumes that ambient temperature (and therefore clock drift factor) is reasonably stable, and is not meant for weeks or months of gps loss. This code is going into an NTP server.

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.