Consistently inconsistent results in timing code

I'm working on a project that involves measuring the intervals between a series of short (~10 µs) low-high-low pulses. The code below works, but I noticed that the first interval comes out about 50 clock cycles longer than all of the subsequent ones. (Sometimes, though, the difference is close to zero or around a hundred cycles.)

This sketch is running on a Duemilanove and for testing purposes I'm using another Arduino as a pulse generator. I've looked at the test pulses on my scope and I'm pretty confident that the intervals are more consistent than this.

The sketch works well enough for my purposes, but there's obviously something going on that I don't understand, so I'd appreciate any help shedding light on it. General feedback on my approach is welcome, too.

(Thank you, Nick Gammon, for the examples on your website. This is my first foray into interrupts and timers and your code was super helpful.)

const int outPin = PD2; //pin 2
const int inPin = PD7;  // pin 7

volatile unsigned int overflowCount;
unsigned int timer1CounterValue;
unsigned int overflowCountCopy;


unsigned long t1Count[] = {0,0,0,0}; //measuring four splits between five signals
unsigned long t1ovf[] = {0,0,0,0};

void setup() {
  pinMode(inPin, INPUT); 
  Serial.begin(9600);
 
}

void loop() {
  scan5();
  delay(1000);

}

ISR (TIMER1_OVF_vect) {
  ++overflowCount; // count number of Counter1 overflows
}  // end of TIMER1_OVF_vect

void scan5() {
  overflowCount = 0;
  TCCR1A = 0;
  TCCR1B = 0;

  TIFR1 = bit (TOV1);
  TIMSK1 = bit (TOIE1); // interrupt on Timer 1 overflow 
  
  // start Timer 1, no prescaler
  TCCR1B =  bit (CS10); //Set CS12:0 to 001 for no prescaling

  for (int i = 0; i < 4; i++){
    //wait for first rising edge:
    // read-only test: writing a 1 to a PIND bit toggles the matching PORTD bit
    // (the pull-up on an input pin), so use & rather than &=
    while ((PIND & _BV(inPin)) == 0) {/*wait*/}

    TCNT1 = 0; //signal line went high so reset counter to zero
    overflowCount = 0;
    
    // do nothing while pin stays high
    while ((PIND & _BV(inPin)) == _BV(inPin)) {/*wait*/}
  
    // wait for inPin to go high again
    while ((PIND & _BV(inPin)) == 0) {/*wait*/}
    timer1CounterValue = TCNT1;
    
    overflowCountCopy = overflowCount;
    overflowCount = 0;
    //check for interrupt while getting TCNT1
    if ((TIFR1 & bit(TOV1)) && timer1CounterValue < 256){
      overflowCountCopy++;     
    }

    t1Count[i] = timer1CounterValue;
    t1ovf[i] = overflowCountCopy;
    
  }

  for (int i = 0; i < 4; i++){
    float microSeconds;
    unsigned long totalCounts;
 
    totalCounts = (t1ovf[i] << 16) + t1Count[i];
    microSeconds = float(totalCounts)/16;
    
    Serial.print("Overflow count: "); Serial.print(t1ovf[i]); Serial.print("\tT1 count: ");
    Serial.print(t1Count[i]);
    Serial.print(" -- "); Serial.print(microSeconds); Serial.println(" microseconds");
  }

  Serial.println((long)(t1Count[0] - t1Count[1])); // show discrepancy between measurements (signed, so it can go negative)
  Serial.println("****************************************************");
}

There is no need to reinitialize the Timer1 settings every time scan5() runs. Put them in setup().

You need to disable interrupts while making this copy, then enable them again.

    overflowCountCopy = overflowCount;

It would be helpful to state the pulse rate that is being counted, as there are lots of other potential timing problems in that code.

Can you explain why it makes sense to declare microSeconds as float, which has only 6-7 digits of accuracy?
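To make the 6-7 digit limit concrete: a float on the AVR Arduino is, as far as I know, a 32-bit IEEE-754 single with a 24-bit significand (and double is the same size there, so switching to double would not help). The sketch below is plain host C++ rather than Arduino code, and roundTripThroughFloat is a hypothetical helper name. It shows that integers above 2^24 no longer survive a trip through float.

```cpp
#include <cassert>
#include <cstdint>

// Store a clock count in a float and read it back as an integer.
// A 32-bit float has a 24-bit significand, so every integer up to
// 2^24 (16,777,216) is exact; above that, values get rounded to the
// nearest representable float.
uint32_t roundTripThroughFloat(uint32_t counts) {
  float f = static_cast<float>(counts);  // may round
  return static_cast<uint32_t>(f);
}
```

Every count up to 2^24 (about 1.05 s of 16 MHz clock cycles) converts exactly, so the damage only shows up for long intervals, or when the printed value is rounded to two decimal places, which is what Serial.print(float) does by default.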

Thanks for the recommendations - changes made.

Re. the pulse rate: the pulses themselves (~10 microseconds) are short compared to the intervals, which are up to 100 milliseconds but no less than a millisecond, so the rate is at most about 1 kHz. Considering the amount of time I've got to work with between signals, I realize I probably don't need to use interrupts or direct timer access, so this is in large part a learning experiment.

I didn't think of the precision limitation of floats. If I use a long instead of a float for microSeconds, won't I also lose some precision? Is this a situation where I might want to multiply by a larger number first and then divide?
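Multiplying before dividing does work, and with a power-of-two divisor the result can be kept exact. A minimal host-C++ sketch, assuming 16 Timer1 counts per microsecond (16 MHz clock, no prescaler); countsToMicros is a hypothetical name. Since one count is 1/16 µs = 0.0625 µs, the remainder times 625 gives the fraction in ten-thousandths of a microsecond with no rounding at all:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Convert a raw Timer1 count (assumed 16 counts per microsecond) into an
// exact decimal string using integer math only.
std::string countsToMicros(uint32_t counts) {
  uint32_t wholeMicros = counts / 16;       // integer microseconds
  uint32_t fraction = (counts % 16) * 625;  // 0..9375 ten-thousandths of a us
  char buf[24];
  std::snprintf(buf, sizeof buf, "%lu.%04lu",
                static_cast<unsigned long>(wholeMicros),
                static_cast<unsigned long>(fraction));
  return std::string(buf);
}
```

For example, countsToMicros(1600013) gives "100000.8125". On the Arduino itself you would print the two integers with Serial.print instead of snprintf, padding the fraction to four digits (625 must print as 0625).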

As this is very much a learning project, I would definitely like to know more about what the other potential timing problems might be.

We strongly recommend always using unsigned long integers for microsecond and millisecond quantities.
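One concrete reason, sketched in host C++ (uint32_t stands in for the AVR's 32-bit unsigned long; elapsed is a hypothetical helper): unsigned subtraction is defined modulo 2^32, so now - start still gives the right interval after a free-running counter such as micros() rolls over.

```cpp
#include <cassert>
#include <cstdint>

// Interval between two readings of a free-running 32-bit unsigned counter.
// Unsigned arithmetic wraps modulo 2^32, so the result is correct even if
// the counter rolled over once between the two readings.
uint32_t elapsed(uint32_t start, uint32_t now) {
  return now - start;
}
```

This only holds for intervals shorter than the counter's full range, which for micros() is 2^32 µs, about 71.6 minutes.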

If you want to retain precision, don't divide! At the moment, Timer1 is counting the number of Arduino clock pulses between events. You can't do better than that, except by using an Arduino with a more accurate clock.

I don't really see the point in keeping the "overflow count" and the "count" in separate arrays. You can combine them, as you do here:
    totalCounts = (t1ovf[i] << 16) + t1Count[i];
and store the results in a four-element array of unsigned long integers. Each of those numbers is the total number of clock cycles between events.
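A minimal sketch of that combination in host C++, with uint32_t standing in for the AVR's 32-bit unsigned long (totalClockCycles and splits are hypothetical names): the overflow count supplies the upper 16 bits and the TCNT1 reading the lower 16.

```cpp
#include <cassert>
#include <cstdint>

// Merge the software overflow count and a 16-bit TCNT1 reading into one
// 32-bit total of clock cycles.
uint32_t totalClockCycles(uint16_t overflows, uint16_t timerValue) {
  return (static_cast<uint32_t>(overflows) << 16) + timerValue;
}

// One combined value per interval: four splits between five pulses.
uint32_t splits[4] = {};
```

For instance, 24 overflows plus a TCNT1 reading of 27,136 is 24 * 65,536 + 27,136 = 1,600,000 cycles, which is 100 ms at 16 MHz.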

What is the maximum count you expect between events?

Great! I actually made that change last night. I now have:

    splits[pulseCount] = (overflowCountCopy << 16) + timer1CounterValue;

Where splits is an array of unsigned longs. After much head-scratching, I realized that overflowCountCopy also has to be an unsigned long because of the left shift. Am I right in thinking that this is faster than multiplying by 65536 (2^16)?
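The reason: on the AVR, int and unsigned int are only 16 bits wide, so an unsigned int shifted left by 16 has nowhere to keep its bits. Strictly, shifting by the full width of the type is undefined behaviour; the host-C++ sketch below (hypothetical helper names) just truncates to 16 bits to show the information being lost, next to the widened version:

```cpp
#include <cassert>
#include <cstdint>

// Illustrates a 16-bit result: the shifted bits fall off the top,
// so whatever the overflow count was, nothing useful remains.
uint16_t shiftIn16Bits(uint16_t overflowCount) {
  return static_cast<uint16_t>(static_cast<uint32_t>(overflowCount) << 16);
}

// Widening to 32 bits (unsigned long on the AVR) before shifting keeps the value.
uint32_t shiftIn32Bits(uint16_t overflowCount) {
  return static_cast<uint32_t>(overflowCount) << 16;
}
```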

For my actual project I'm planning on measuring up to 100 pulses (99 intervals). The pulses shouldn't be much more than 50 milliseconds apart. Counting the number of pulses is also important, so for now I'm thinking of setting a timeout to stop measuring if an interval exceeds 100 milliseconds. So the maximum count would be 1,600,000 (16 counts per microsecond * 100,000 microseconds). Looks like I've got lots of headroom here (2^32/16/1,000,000 = about 268 seconds).
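The same sizing arithmetic as compile-time constants, assuming a 16 MHz clock with no prescaler (the constant names are hypothetical). It also shows that the shortest 1 ms interval stays under one Timer1 overflow and the longest needs only 24, so a 16-bit overflow counter is plenty:

```cpp
#include <cassert>
#include <cstdint>

// Assumed timing: 16 MHz clock, Timer1 prescaler 1 -> 16 counts per microsecond.
constexpr uint32_t kCountsPerMicro = 16;

// Shortest expected interval (1 ms): 16,000 counts, less than one 65,536-count wrap.
constexpr uint32_t kMinCounts = kCountsPerMicro * 1000;

// Timeout (100 ms): 1,600,000 counts, i.e. 24 complete Timer1 overflows.
constexpr uint32_t kMaxCounts = kCountsPerMicro * 100000;
constexpr uint32_t kMaxOverflows = kMaxCounts >> 16;

// Longest interval a 32-bit total can represent, in whole seconds (268).
constexpr uint64_t kHeadroomSeconds =
    (uint64_t{1} << 32) / (uint64_t{kCountsPerMicro} * 1000000);
```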

Am I right in thinking that this is faster than multiplying by 65536 (2^16)?

Much faster.


Theoretically. But the compiler is smart enough to produce the same (efficient) code for both cases, as long as you're multiplying by a constant...
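The two forms are interchangeable for unsigned values. A host-C++ sketch of the equivalence (hypothetical function names); with optimization enabled, GCC-family compilers typically turn a multiply by a constant power of two into shifts, though the exact instructions depend on the compiler and target:

```cpp
#include <cassert>
#include <cstdint>

// Shift form.
uint32_t viaShift(uint32_t overflows) { return overflows << 16; }

// Multiply form: 65536 == 2^16, so this yields the same value (mod 2^32).
uint32_t viaMultiply(uint32_t overflows) { return overflows * UINT32_C(65536); }
```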