Millis Accuracy Again

I already mentioned above where millis() is used: in the "encoder" tab, where I read tcnv and _tstart.

Those are the only (important) uses of millis().

T1 and T2 are very far apart. T1 gets set when the encoder gets calibrated (which only happens once).

T2 is basically the current value of millis().

This is what I don't understand: the value of millis() deviates from wall-clock time by an ever-increasing amount (I posted a link to an Excel spreadsheet very early in this thread that shows the razor-straight slope of the deviation).

The only thing I can think of that could be causing missed ticks is the SPI library. I haven't looked inside it, but in my read_encoder I read the ADC 128 times very rapidly (at a 2 MHz SPI clock). This procedure takes about 4 ms. If the SPI library disables interrupts for the duration of the transfer, that would result in a lot of missed ticks.

A linear deviation from the wall clock seems to make sense if the oscillator is off frequency OR if you have a bug that introduces a consistent error.

I would think that no matter how many times you sample the ADC, interrupts would be enabled between each call. A ~2 ms SPI ISR time sounds unlikely.

At this point, I would do what it takes to find out as precisely as you can what your resonator is running at on one of your boards, by running a timekeeping routine, as I described, for 24 hours and just seeing how far off it is. Every second of error after 24 hours is roughly 12 ppm (1 s / 86400 s). Or do you have an accurate scope or frequency counter? Once you know what your base error is, we can determine whether your results are to be expected or out of line because of a bug.

EDIT: If you have an RTC or even a GPS that can generate a 1 pps pulse, I can give you a sketch that will precisely measure the pulse length (to 1 µs) using the hardware input capture. You can then take this measurement and get a decent idea of how far off your resonator is. For example, if you get consistent measurements of 999700 µs for a one-second GPS pulse, then you know you're off by 300 ppm.

It is definitely something to try. But the deviation is not the same (as noted above, sometimes it's -2000ppm, sometimes -1000ppm, other times +1000ppm). And I have neither scope nor counter. I'd like to get one of the TBS1022's though. $)

Actually I already bought some of the RTCs. I just can't work on them as I need to travel for the next week. But realistically the effort of putting in an RTC is less than digging around the code. And the added cost to the solution isn't that much, considering that it relies on an encoder that costs anywhere from $350 to $700. So a $5 RTC and crystal are insignificant.

Now if I still see drift after putting in the RTC.. then it would have to be a software bug. :astonished:

Check my EDIT on the last post (I do this a lot, so keep that in mind) about using a GPS output pulse to measure the accuracy of your resonator.

EDIT: You should check out the Rigol scopes too, they're pretty good and you can get 100MHz bandwidth for $400.

The RTCs I got (PCF8583) do output a 1 Hz pulse. I was thinking of just using that instead of bothering with the I2C bus (it would save pins, and less code is needed).

GPS.. I have thought about it. But I don't like the idea of having to rely on a GPS fix. The Chronodot costs about the same.

Thing is, I don't need GPS or even ChronoDot accuracy. 50 ppm would be good enough! So in theory the Max32 should have no trouble achieving that with its 30 ppm crystal. But for some strange reason millis() doesn't give correct results even on that system. I can't help thinking it must be some software bug, but I can't find it.

I was only thinking of using the GPS temporarily to measure the accuracy of your board. Here is my standard sketch: it measures the pulse length (the whole period) on pin 8 and outputs the number of microseconds since the last capture. The first reading output is noise, but all other output will be very precisely measured by the hardware. No software can introduce any jitter into the measurement, since the input capture is done completely in hardware.

#include "Arduino.h"

volatile unsigned t1capval = 0;
volatile unsigned t1ovfcnt = 0;
volatile unsigned long t1time;
volatile unsigned long t1last = 0;

#define BUFFER_SIZE 32

volatile unsigned long int buffer[BUFFER_SIZE];
volatile int head = 0;
volatile int tail = 0;

void setup() {

  Serial.begin(9600);  

  TCCR1A = 0x0;    // put timer1 in normal mode
  TCCR1B = 0x2;    // change prescaler to divide clock by 8

  // clear any pending capture or overflow interrupts
  TIFR1 = (1<<ICF1) | (1<<TOV1);
  // Enable input capture and overflow interrupts
  TIMSK1 |= (1<<ICIE1) | (1<<TOIE1);
  
  pinMode(8, INPUT);   // This is where to feed the signal in
}

void loop() {

  if(head != tail) {
    head = (head + 1) % BUFFER_SIZE;
    Serial.println(buffer[head]);
  }
  
}

ISR(TIMER1_OVF_vect) {
  
   t1ovfcnt++;              // keep track of overflows

}


ISR(TIMER1_CAPT_vect) {
  
  unsigned long t1temp;

  // combine overflow count with capture value to create 32 bit count
  //  calculate how long it has been since the last capture
  //   stick the result in the global variable t1time in 1uS precision

  t1capval = ICR1;
  t1temp = ((unsigned long)t1ovfcnt << 16) | t1capval;
  t1time = (t1temp - t1last) >> 1;  // 0.5uS ticks (2MHz clock) -> whole uS
  t1last = t1temp;
  
  tail = (tail + 1) % BUFFER_SIZE;
  buffer[tail] = t1time;
}

I won't have time to look into the code you posted until later, but if you haven't already done so I suggest it would be worth your time writing a minimal sketch that demonstrates the problem in the simplest way you can.

I assume that a simple sketch that just calls millis() repeatedly and prints the result won't reproduce the problem, because it didn't for me.

Something else you're doing within the sketch must be triggering it, and the suggestions that it's interrupt overflow seem like the most likely explanation. However, that would not occur on a well-behaved system.

You may find it's something that can be provoked by doing SPI writes, or SPI reads, or something else. If you can figure out by trial and error what the key factor is, that would help us understand the cause and get us closer to finding a resolution.

PeterH:
I won't have time to look into the code you posted until later, but if you haven't already done so I suggest it would be worth your time writing a minimal sketch that demonstrates the problem in the simplest way you can.

I assume that a simple sketch that just calls millis() repeatedly and prints the result won't reproduce the problem, because it didn't for me.

Something else you're doing within the sketch must be triggering it, and the suggestions that it's interrupt overflow seem like the most likely explanation. However, that would not occur on a well-behaved system.

You may find it's something that can be provoked by doing SPI writes, or SPI reads, or something else. If you can figure out by trial and error what the key factor is, that would help us understand the cause and get us closer to finding a resolution.

Exactly. He's getting way too much variation to conclude that millis() is the cause.

GoForSmoke:
Exactly. He's getting way too much variation to conclude that millis() is the cause.

Exactly X2

I did a quick look at your code and offer these suggestions:

  1. delay() blocks, and it appears your code is parked there. Try using a non-blocking delay instead.
  2. You are running lots of ISRs (Serial, SPI, ADC, to identify a few) which could be blocking each other. Try eliminating or minimizing some and see if your timer accuracy improves.

orly_andico:
Note that the version of the code posted above uses exttimer_millis()

...which...

  1. Is flawed.
  2. Will be no more or less accurate than millis().

Stop using it.

Hi orly_andico

I think it's a software-related problem. You are doing some strange things with millis()...

I guess this is your loop:

void loop() {  
  long tstart = exttimer_millis();

  handler_called++;

  if ((handler_called % THINKPERIOD) != 0) {
    do_autoguider();
  } 
  else {
    handler();
  }

  long tcnv = (exttimer_millis() - tstart) + 1;

  if (tcnv < PERIODMILLIS) {
    delay(PERIODMILLIS - tcnv);
  }
}

A delay in the main loop and an increment to keep track of when to do things. That is not how I would have done it.

void exttimer_init() {
  Timer1.initialize(10000);    // 10 milliseconds
  Timer1.attachInterrupt(exttimer_callback);
}

void exttimer_callback() {
  _mymillis += 10;
}

The exttimer has a resolution of 10 ms. That is a lot, considering you want to compute a tcnv average here:

void read_encoder(long &A, long &B, long &tcnv) {
  int reading;
  int i;

  long t0, t1;

  t0 = exttimer_millis();

  // this should finish in 5ms or less @ 32ksps
  for (i = 0; i < OVERSAMPLING; i++) {
    reading = read_adc(1);
    A += reading;

    reading = read_adc(2);
    B += reading;
  }

  A = A / OVERSAMPLING;
  B = B / OVERSAMPLING;

  t1 = exttimer_millis();
  
  // tcnv should be in milliseconds
  tcnv = (t0 + t1) / 2;
}

By the way, your average calculation is a disaster just waiting to happen. What happens if the variables that &A and &B refer to are not set to 0 prior to calling the routine? ..... strange average .....
Edit: This is happening in calibrate(), where encoderA and encoderB are set to 0 outside the while loop.

Btw, can you please explain how the calibrate() routine works? It seems like a strange way to get the mean values from the encoders and keep the outliers out. It does not follow Dixon's test for outliers.

Edit, edit: I have been thinking a bit more about that calibration. I think you need to rethink it. You are oversampling the encoders 64 times and then returning the average as the reading value. If this value is an outlier, then numerous of the 64 readings must be outliers. You should test for outliers among the raw values returned from the encoders.

-Fletcher

Hi all,

An update.

  1. I changed the encoder reading to an interquartile mean (following the STMicro Application Note 3964 "How to design a simple temperature measurement application using the STM32L-DISCOVERY")

  2. I got rid of the delay() in the main loop and replaced it with a do-nothing loop

  tcnv = 0;
  while (tcnv < PERIODMILLIS) {
    tcnv = (micros() - tstart) / 1000UL;
  }

The 2nd part seems to have made all the timing drift problems go away!

This is really strange, because in Unix-land, where I come from, sleep() is good and spinning the CPU is bad. But it seems that in Arduino-land the opposite is true.

orly_andico:
2) I got rid of the delay() in the main loop and replaced it with a do-nothing loop
The 2nd part seems to have made all the timing drift problems go away!

I'm glad that you have solved the problem.

I agree it is strange that the change cured it. However, since you never did post a test case that demonstrated the problem, I've never been able to reproduce it and have no way to investigate it myself, so I'm afraid you're on your own.

In Unix-land you have a multi-tasking OS. With Arduino you write your own tasking.

Nice to see you finally dumped the code-blocking delay that was screwing your results up.
How many times was that suggested in how many ways?

GoForSmoke,

True, getting rid of delay() was suggested several times. It was just very counter-intuitive.

Back in Unix-land, using sleep() allows other processes to run (or lets the CPU go into low-power mode if there are no runnable processes), but using a do-nothing loop spins the CPU, which keeps power consumption up.

While the Arduino is single-tasking, I would have thought that allowing it to power-down when doing nothing was good. Or maybe delay() isn't smart enough for that.. or the AVR CPU doesn't have that capability. The STM32 CPU's do have the capability to clock down to 4MHz and even go into deep sleep, then wake up on interrupt.

(incidentally I will probably port this code to an STM32 going forward, the boards are laughably cheap and the processors significantly more powerful)

orly_andico:
GoForSmoke,

True, getting rid of delay() was suggested several times. It was just very counter-intuitive.

Back in Unix-land, using sleep() allows other processes to run (or lets the CPU go into low-power mode if there are no runnable processes), but using a do-nothing loop spins the CPU, which keeps power consumption up.

While the Arduino is single-tasking, I would have thought that allowing it to power-down when doing nothing was good. Or maybe delay() isn't smart enough for that.. or the AVR CPU doesn't have that capability. The STM32 CPU's do have the capability to clock down to 4MHz and even go into deep sleep, then wake up on interrupt.

(incidentally I will probably port this code to an STM32 going forward, the boards are laughably cheap and the processors significantly more powerful)

Strange how things change when you have 2 million times less RAM...

orly_andico:
GoForSmoke,

True, getting rid of delay() was suggested several times. It was just very counter-intuitive.

Back in Unix-land, using sleep() allows other processes to run (or lets the CPU go into low-power mode if there are no runnable processes), but using a do-nothing loop spins the CPU, which keeps power consumption up.

While the Arduino is single-tasking, I would have thought that allowing it to power-down when doing nothing was good. Or maybe delay() isn't smart enough for that.. or the AVR CPU doesn't have that capability. The STM32 CPU's do have the capability to clock down to 4MHz and even go into deep sleep, then wake up on interrupt.

(incidentally I will probably port this code to an STM32 going forward, the boards are laughably cheap and the processors significantly more powerful)

That is true but whether powered down or using blocking code like delay, the Arduino cannot also be watching the I/O pins or sensors. A multitasking system can if the watching is done with a different task.

I look at the ability to watch and time as the 'attention' of the computer and write my state machines with that as a budget. I know that the attention budget of an UNO or MEGA is far more than enough to do the job you set -if- nothing else is grabbing too many cycles in between, like may be or have been happening with the print load you also have.

I am just up from 'can't sleep' and not remembering details, but will you be running the project while connected to a PC? If so, then all the math and verbose output can be done on the PC using the free PC language that the Arduino IDE is based on, Processing. If you have both worlds then you might as well have the best of both worlds. I even wonder if there's a way to get an accurate timing signal from the PC, possibly through the parallel port using Linux.

GoForSmoke:
Nice to see you finally dumped the code-blocking delay that was screwing your results up.

... by replacing it with a home-made equivalent which is also blocking. The only reason I can see for this to 'correct' the problem is that the code relies on the actual elapsed time being exactly what was specified, which would be exactly the sort of bug we'd be looking for to explain these symptoms. If we'd got a simple test case that demonstrated the problem, we might have been able to point the bug out. Since we don't, the only conclusion I can come to is that it's a bug somewhere in the sketch.

Of course, even up to version 1.5.2 beta of the core, millis() can sometimes miss a Timer0 count rollover.

the micros() call handles the missed counter ISR trigger, but millis() does not.

An issue exists on the Arduino bug tracker, but as it's been so long, it refers to older IDE versions, where both micros() and millis() could lose counts.

http://code.google.com/p/arduino/issues/detail?id=187

Although this issue is old, and newer IDE versions don't need the fix applied for micros(), I've modified my code to let it correct millis() depending on which board I compile my own code for.

Darryl