Millis Accuracy Again

@afremont,

I did read what you said. But I don't see how it applies in my situation.

  1. I don't need my routine to run at precisely 1000 ms intervals

  2. I don't need to (and don't) keep track of how long atan takes

  3. I don't see how keeping track of the rollover, etc. is any more accurate than reading millis() directly.

It's not that I don't appreciate the help, but that I don't understand why your approach is better than reading millis().

To reiterate: I do not need my routine (or event, or what-not) to happen at fixed intervals. I just need to know when it happens.

You keep saying you need to know what time it is and how many seconds there are in a sidereal day. I gave you a mechanism that will precisely keep track of seconds as time passes. All you have to do is add one to a counter each time the while loop finishes. micros() will roll over every 72 minutes so you might possibly need to keep that in mind.

Agreed. But the mechanism also relies on millis(). Or micros() for that matter.

How is it superior to just reading one of the above functions directly?

And if we are agreed that millis() / micros() are drifting, wouldn't the above approach also drift?

From my understanding of the stability of crystals and resonators, they drift based on temperature. If the temp is fairly constant, their deviation from rated frequency should be more-or-less the same, and not all over the place.

I also don't understand why accumulating a count yourself (instead of relying on the millis() routine to do that) is any better than using millis() directly.

If millis() is losing ticks, then any routine that relies on millis() would also lose ticks.

No, I'm relying on the long-term accuracy of millis(). In the long run it is accurate, as the ISR is self-compensating for the 1.024 ms interrupt interval. Any drift using what I gave you is from the oscillator, and cannot be stopped without going to a more accurate oscillator such as a TCXO, or a super-accurate reference such as a Chronodot.
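For reference, this is roughly what the AVR core's Timer0 overflow handler in wiring.c does (simplified, and not meant to be compiled into a sketch since the core already owns this vector; the constants are the core's MILLIS_INC, FRACT_INC and FRACT_MAX for a 16 MHz part):

// Each Timer0 overflow is 1024 uS (16 MHz, /64 prescaler, 256 counts).
// The handler adds 1 ms and banks the 24 uS remainder in 8 uS units;
// whenever the bank holds a full millisecond, it pays out one more.
volatile unsigned long timer0_millis = 0;
static unsigned char timer0_fract = 0;

ISR(TIMER0_OVF_vect) {
  unsigned long m = timer0_millis;
  unsigned char f = timer0_fract;

  m += 1;           // MILLIS_INC: whole ms per 1024 uS overflow
  f += 3;           // FRACT_INC: the 24 uS remainder, in 8 uS units
  if (f >= 125) {   // FRACT_MAX: a full banked millisecond
    f -= 125;
    m += 1;
  }

  timer0_millis = m;
  timer0_fract = f;
  // (the real handler also bumps an overflow counter used by micros())
}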

EDIT: Catching up with your edit. If anything is causing a long enough delay to cause the Timer0 ISR to miss a tick, then that is a severe problem. The only thing I know of that does that is SoftwareSerial, and possibly the IR library.

In my experience, temperature drift of the ceramic resonator is quite small over room-temperature variations. Perhaps I've been lucky, but my two Unos are within 300 ppm and 50 ppm respectively of being on the dot.

Yes. And that's why I don't understand your approach.

Right now I am reading millis() directly. So you could say I am also relying on the long-term accuracy of millis(). And while it's true that the ceramic resonator in my Mega is not that good, I'm also seeing drift in millis() with the Max32 which has a +/- 30ppm crystal.

Again - I'm not doing anything special with millis(). I'm just reading it. And it's losing time.

Why would your approach result in better time-keeping when it also relies on millis()?

I'll have to dig through your code and see how you're handling it. Notice how I never use the value returned by millis() to adjust the roll time. I always adjust it by exactly 1000, and in the long run it's very accurate. I only use the millis() value for comparison and to prime the roll time initially.
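Something like this, to make the pattern concrete (a minimal sketch; the once-per-second work is a placeholder):

// Schedule by adding exactly 1000 to the roll time instead of re-reading
// millis(); a late iteration then self-corrects instead of accumulating error.
unsigned long rollTime;
unsigned long seconds = 0;

void setup() {
  rollTime = millis() + 1000;   // prime the roll time: the only priming read
}

void loop() {
  if ((long)(millis() - rollTime) >= 0) {   // comparison only, rollover-safe
    rollTime += 1000;   // always advance by exactly 1000
    seconds++;          // long-term accuracy is that of the oscillator
  }
}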

If your T1 and T2 times are like 1 or 2 ms apart, then using millis() will not cut it. If they are on the order of seconds apart, then it will probably be OK, though there will always be error (39 out of 40 times, anyway) in the value returned by millis(). This is where micros() shines, as it doesn't have this springy behavior.
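For what it's worth, measuring a short T1/T2 gap with micros() would look like this (someShortEvent() is a placeholder for whatever is being timed):

// micros() resolves to 4 uS on a 16 MHz AVR, versus the 1 ms steps (on a
// 1.024 ms tick) that millis() gives you.
unsigned long timeEventMicros() {
  unsigned long t1 = micros();
  someShortEvent();                 // placeholder: the thing being timed
  unsigned long t2 = micros();
  return t2 - t1;   // unsigned subtraction stays correct across rollover
}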

At any rate, if your problems are truly from oscillator drift or initial inaccuracy, then you'll have to solve that with a hardware upgrade.

I already mentioned above where millis() is used: in the "encoder" tab, where I read tcnv and _tstart.

Those are the only (important) uses of millis().

T1 and T2 are very far apart. T1 gets set when the encoder gets calibrated (which only happens once).

T2 is basically the current value of millis().

This is what I don't understand, the value of millis() deviates from wall clock time by an ever-increasing amount (I posted a link to an Excel spreadsheet very early in this thread that shows the razor-straight slope of the deviation).

The only thing I can think of that is causing missed ticks is the SPI library. I haven't looked inside it, but in my read_encoder I read the ADC 128 times very rapidly (at a 2 MHz SPI clock). This procedure takes about 4 ms. If the SPI library disables interrupts while bit-banging, that would result in a lot of missed ticks.

A linear deviation from the wall clock seems to make sense if the oscillator is off frequency OR if you have a bug that introduces a consistent error.

I would think that no matter how many times you sample the ADC, interrupts would be enabled between each call. A ~2 ms SPI ISR time sounds unlikely.

At this point, I would do what it takes to find out as precisely as you can what your resonator is running at on one of your boards, by running a time-keeping routine like I described for 24 hours and just seeing how far off it is. For every second that it's in error after 24 hours, that's roughly 10 ppm of error (there are 86,400 seconds in a day, so 1 s of error works out to about 11.6 ppm). Or do you have an accurate scope or frequency counter? Once you know what your base error is, then we can determine whether your results are to be expected or whether they are out of line because of a bug.

EDIT: If you have an RTC or even a GPS that can generate a 1pps pulse, I can give you a sketch that will precisely measure the pulse length (to 1 µs) using the hardware capture. You can then take this measurement and get a decent idea of how far off your resonator is. For example, if you get consistent measurements of 999700 µs for a one-second GPS pulse, then you know you're off by 300 ppm.

It is definitely something to try. But the deviation is not the same (as noted above, sometimes it's -2000 ppm, sometimes -1000 ppm, other times +1000 ppm). And I have neither scope nor counter. I'd like to get one of the TBS1022s though. $)

Actually I already bought some of the RTCs. I just can't work on them as I need to travel for the next week. But realistically the effort of putting in an RTC is less than digging around the code. And the added cost to the solution isn't that much, considering that it relies on an encoder that costs anywhere from $350 to $700. So a $5 RTC and crystal are insignificant.

Now if I still see drift after putting in the RTC.. then it would have to be a software bug. :astonished:

Check my EDIT on the last post (I do this a lot, so keep that in mind) about using a GPS output pulse to measure the accuracy of your resonator.

EDIT: You should check out the Rigol scopes too, they're pretty good and you can get 100MHz bandwidth for $400.

The RTCs I got (PCF8583) do output a 1 Hz pulse. I was thinking of just using that instead of bothering with the I2C bus (it would save pins, and less code is needed).

GPS.. I have thought about it. But I don't like the idea of having to rely on a GPS fix. The Chronodot costs about the same.

Thing is - I don't need GPS or even Chronodot accuracy. 50 ppm would be good enough! So in theory the Max32 should have no trouble achieving that with its 30 ppm crystal. But for some strange reason millis() doesn't give correct results even on that system. I can't help thinking it must be some software bug... but I can't find it.
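For the 1 Hz idea, I was thinking of something like this (a sketch only: pin 2 and the FALLING edge are my assumptions, and the pull-up is there because the PCF8583 interrupt output is open-drain, if I'm reading the datasheet right):

// Counting seconds from the RTC's 1 Hz output instead of trusting millis().
volatile unsigned long rtcSeconds = 0;

void onRtcTick() {
  rtcSeconds++;               // one increment per RTC second
}

void setup() {
  pinMode(2, INPUT_PULLUP);   // open-drain output needs a pull-up
  attachInterrupt(0, onRtcTick, FALLING);   // external interrupt 0 = pin 2
}

void loop() {
  noInterrupts();             // a 32-bit read isn't atomic on AVR
  unsigned long s = rtcSeconds;
  interrupts();
  // use s wherever a timestamp is needed
}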

I was only thinking of using the GPS temporarily to measure the accuracy of your board. Here is my standard sketch that measures the pulse length (the whole period) on pin 8 and outputs the number of microseconds since the last capture. The first reading output is noise, but all other output will be very precisely measured by the hardware. No software can introduce any jitter into the measurement, since the input capture is done completely in hardware.

#include "Arduino.h"

volatile unsigned t1capval = 0;
volatile unsigned t1ovfcnt = 0;
volatile unsigned long t1time;
volatile unsigned long t1last = 0;

#define BUFFER_SIZE 32

volatile unsigned long buffer[BUFFER_SIZE];
volatile uint8_t head = 0;    // byte-wide so loop() reads them atomically
volatile uint8_t tail = 0;

void setup() {

  Serial.begin(9600);

  TCCR1A = 0x0;    // put timer1 in normal mode
  TCCR1B = 0x2;    // prescaler /8: one count per 0.5 uS at 16 MHz

  // clear any pending capture or overflow interrupts
  TIFR1 = (1<<ICF1) | (1<<TOV1);
  // enable input capture and overflow interrupts
  TIMSK1 |= (1<<ICIE1) | (1<<TOIE1);

  pinMode(8, INPUT);   // ICP1: this is where to feed the signal in
}

void loop() {

  if(head != tail) {
    head = (head + 1) % BUFFER_SIZE;
    Serial.println(buffer[head]);
  }

}

ISR(TIMER1_OVF_vect) {

   t1ovfcnt++;              // keep track of overflows

}


ISR(TIMER1_CAPT_vect) {

  unsigned long t1temp;
  unsigned long ovf;

  // combine overflow count with capture value to create a 32 bit count,
  // calculate how long it has been since the last capture, and stick the
  // result in the global variable t1time with 1 uS precision

  t1capval = ICR1;
  ovf = t1ovfcnt;
  // if an overflow is pending and this capture happened just after the
  // rollover, the overflow ISR hasn't run yet, so account for it here
  if ((TIFR1 & (1<<TOV1)) && (t1capval < 0x8000))
    ovf++;
  t1temp = (ovf << 16) | t1capval;
  t1time = (t1temp - t1last) >> 1;  // convert 0.5 uS counts to whole uS
  t1last = t1temp;

  tail = (tail + 1) % BUFFER_SIZE;
  buffer[tail] = t1time;
}
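To use it, feed the 1pps (or the RTC's 1 Hz output) into pin 8. A perfect timebase would print 1000000 every second; a consistent 999700 would mean the board's oscillator is running about 300 ppm slow, and consistently high readings would mean fast.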

I won't have time to look into the code you posted until later, but if you haven't already done so I suggest it would be worth your time writing a minimal sketch that demonstrates the problem in the simplest way you can.

I assume that a simple sketch that just calls millis() repeatedly and prints the result won't reproduce the problem, because it didn't for me.

Something else you're doing within the sketch must be triggering it, and the suggestions that it's caused by interrupts being blocked seem like the most likely explanation. However, that would not occur on a well-behaved system.

You may find it's something that can be provoked by doing SPI writes, or SPI reads, or something else. If you can figure out by trial and error what the key factor is, that would help us understand the cause and get us closer to finding a resolution.

PeterH:
I won't have time to look into the code you posted until later, but if you haven't already done so I suggest it would be worth your time writing a minimal sketch that demonstrates the problem in the simplest way you can.

I assume that a simple sketch that just calls millis() repeatedly and prints the result won't reproduce the problem, because it didn't for me.

Something else you're doing within the sketch must be triggering it, and the suggestions that it's caused by interrupts being blocked seem like the most likely explanation. However, that would not occur on a well-behaved system.

You may find it's something that can be provoked by doing SPI writes, or SPI reads, or something else. If you can figure out by trial and error what the key factor is, that would help us understand the cause and get us closer to finding a resolution.

Exactly. He's getting way too much variation to conclude that millis() is the cause.

GoForSmoke:
Exactly. He's getting way too much variation to conclude that millis() is the cause.

Exactly X2

I took a quick look at your code and offer these suggestions:

  1. delay() is bad, and it appears your code is planted there. Try using a non-blocking delay (see the sketch below this list).
  2. You are running lots of ISRs (Serial, SPI, ADC, to identify a few), any of which could be blocking. Try eliminating or minimizing some and see if your timer accuracy improves.
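For (1), the usual non-blocking shape looks something like this (a sketch only; PERIODMILLIS is borrowed from your code, and the periodic work is a placeholder):

// Non-blocking delay: test elapsed time on each pass through loop() instead
// of stalling in delay(), so the rest of the sketch keeps running.
const unsigned long PERIODMILLIS = 500;   // placeholder period
unsigned long lastRun = 0;

void setup() {
  // normal setup work here
}

void loop() {
  if (millis() - lastRun >= PERIODMILLIS) {
    lastRun += PERIODMILLIS;   // advance by the period, not from millis()
    // ... periodic work goes here ...
  }
  // ... other work runs freely here, never blocked ...
}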

orly_andico:
Note that the version of the code posted above uses exttimer_millis()

...which...

  1. Is flawed.
  2. Will be no more or less accurate than millis().

Stop using it.

Hi orly_andico

I think it's a software related problem. You are doing some strange things with millis()...

I guess this is your loop:

void loop() {  
  long tstart = exttimer_millis();

  handler_called++;

  if ((handler_called % THINKPERIOD) != 0) {
    do_autoguider();
  } 
  else {
    handler();
  }

  long tcnv = (exttimer_millis() - tstart) + 1;

  if (tcnv < PERIODMILLIS) {
    delay(PERIODMILLIS - tcnv);
  }
}

a delay in the main loop and an increment to keep track of when to do things. That is not how I would have done it.

void exttimer_init() {
  Timer1.initialize(10000);    // 10 milliseconds
  Timer1.attachInterrupt(exttimer_callback);
}

void exttimer_callback() {
  _mymillis += 10;
}

The exttimer has a resolution of 10 ms. That is a lot, considering you want to compute the tcnv average here:

void read_encoder(long &A, long &B, long &tcnv) {
  int reading;
  int i;

  long t0, t1;

  t0 = exttimer_millis();

  // this should finish in 5ms or less @ 32ksps
  for (i = 0; i < OVERSAMPLING; i++) {
    reading = read_adc(1);
    A += reading;

    reading = read_adc(2);
    B += reading;
  }

  A = A / OVERSAMPLING;
  B = B / OVERSAMPLING;

  t1 = exttimer_millis();
  
  // tcnv should be in milliseconds
  tcnv = (t0 + t1) / 2;
}

By the way, your average calculation is a disaster just waiting to happen. What happens if you forget to set the variables that &A and &B refer to to 0 prior to calling the routine? You get a strange average...
Edit: This is happening in calibrate(), where encoderA and encoderB are set to 0 outside the while loop.

Btw, can you please explain how the calibrate() routine works? It seems like a strange way to get the mean values from the encoders while keeping the outliers out. It does not follow Dixon's test for outliers.

Edit, edit: I have been thinking a bit more about that calibration. I think you need to rethink it. You are oversampling the encoders 64 times and then returning the average as the reading value. If that value is an outlier, then many of the 64 readings must be outliers. You should test for outliers among the raw values returned from the encoders.

-Fletcher

Hi all,

An update.

  1. I changed the encoder reading to an interquartile mean (following STMicro Application Note 3964, "How to design a simple temperature measurement application using the STM32L-DISCOVERY"); the idea is sketched at the end of this post.

  2. I got rid of the delay() in the main loop and replaced it with a do-nothing loop

  tcnv = 0;
  while (tcnv < PERIODMILLIS) {
    tcnv = (micros() - tstart) / 1000UL;
  }

The 2nd part seems to have made all the timing drift problems go away!

This is really strange, because in Unix-land where I come from, sleep() is good - and spinning the CPU is bad. But it seems in Arduino-land the opposite is true.
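For the record, the interquartile mean from (1) boils down to something like this (a sketch: read_adc() is the same helper as in the code above, and 16 samples is just illustrative):

// Interquartile mean: sort the samples, discard the top and bottom quartiles,
// and average the middle half, so outliers land in the discarded quartiles.
#define NSAMPLES 16

long interquartileMean() {
  int s[NSAMPLES];
  for (int i = 0; i < NSAMPLES; i++)
    s[i] = read_adc(1);              // same read_adc() as the encoder code

  // insertion sort; N is small, so this is cheap
  for (int i = 1; i < NSAMPLES; i++) {
    int v = s[i];
    int j = i - 1;
    while (j >= 0 && s[j] > v) { s[j + 1] = s[j]; j--; }
    s[j + 1] = v;
  }

  long sum = 0;
  for (int i = NSAMPLES / 4; i < (3 * NSAMPLES) / 4; i++)
    sum += s[i];                     // middle half only
  return sum / (NSAMPLES / 2);
}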

orly_andico:
2) I got rid of the delay() in the main loop and replaced it with a do-nothing loop
The 2nd part seems to have made all the timing drift problems go away!

I'm glad that you have solved the problem.

I agree it is strange that the change cured it. However, since you never did post a test case that demonstrated the problem, I've never been able to reproduce it and have no way to investigate it myself, so I'm afraid you're on your own.

In Unix-land you have a multi-tasking OS. With Arduino you write your own tasking.

Nice to see you finally dumped the code-blocking delay that was screwing your results up.
How many times was that suggested in how many ways?