Jitter in the main() loop?

system · December 28, 2011, 5:25am

I've been running this program:

unsigned short buf[200];
long prev = 0;
unsigned int n = 0;

void setup() {
  pinMode(13, OUTPUT);
  digitalWrite(13, HIGH);
  Serial.begin(115200);
  delay(50);
  digitalWrite(13, LOW);
}

void loop() {
  long us = micros();
  long diff = us - prev;
  if (diff > 199) {
    diff = 199;
  }
  buf[diff] += 1;
  prev = us;
  n += 1;
  if (n == 50000) {
    Serial.print("us,N\r\n");
    for (int i = 0; i < 200; ++i) {
      if (buf[i]) {
        Serial.print(i);
        Serial.print(",");
        Serial.print(buf[i]);
        Serial.print("\r\n");
      }
    }
    delay(1000);
    prev = micros();
    n = 0;
  }
}

This calculates a distribution of latency of executing loop() as a histogram. I would expect this to either put all the data into one bucket, or to distribute the samples across two buckets (if there is some granularity in the counter and you get time aliasing).

However, I get something really weird:

us,N
8,237
12,50468
16,1871
20,815
199,1

This tells me that the loop() function, and/or the counter, may jitter a fair bit, for some reason. Note that I wait for the serial port to drain after dumping the data, so I don't think it's the Serial "driver."
If there is timing aliasing (say, the micros() counter is quantized to 4 us), then I would expect the samples to be spread across only two buckets.

Microcontrollers are supposed to be deterministic! This is on an R2 Uno, with no peripherals. Why is this happening?

Coding_Badly · December 28, 2011, 6:01am

Microcontrollers are supposed to be deterministic!

The AVR8 processor is perfectly deterministic. Everything is synchronized to the system clock and each machine instruction takes a precise number of clock cycles to execute.

jwatte:
If there is timing aliasing (say, the micros() counter is quantized to 4 us), then I would expect the samples to be spread across only two buckets.

There is code in the core and code in your sketch that depends on the previous state of the hardware (previous runs of loop). So long as those dependencies exist, it will be extremely difficult for you to test the "determinism". Some examples...

There is a branch in the timer 0 overflow handler. Branch taken means loop runs just a bit longer than expected.
The delay(1000); in your sketch can leave loop at a different synchronization point relative to timer 0 than previous runs of loop.
The Serial calls in your sketch can leave loop at a different synchronization point relative to timer 0 than previous runs of loop.

To make the situation even more complicated, if either your code or the timer 0 handler uses a prime number of clock cycles to run then the full cycle will be extremely long. In other words, it could take hours, days, months, years, forever for your test to produce two identical runs.

nickgammon · December 28, 2011, 6:10am

I was about to test your program with interrupts off, but of course then you couldn't time anything!

Well, as one of the regular posters here has in his signature "measurement changes behaviour".

Isn't this something to do with quantum theory? Or someone's cat? You would get more precise behaviour if you didn't measure the behaviour.

system · December 28, 2011, 6:37am

The point of the delay() is to remove the jitter from the Serial port -- it was jittering up to 40 microseconds before I inserted that!

I had assumed that millis() and micros() would be wrappers to bare CPU instructions that read internal counters. It sounds like this is not the case. Instead, the Arduino library sounds like a thin veil on top of the hardware, with undocumented side effects that I have to guess at.
I don't like guessing.
I suppose I can go look at the source or something...

I almost wish there was a 100 MHz part that came in DIP so I could throw cycles at the problem without breaking the bank.

nickgammon · December 28, 2011, 6:49am

You don't have to guess. The source for millis() and micros() is there somewhere. I usually do a "find in files" to find them.

There is a timer interrupt that catches the overflow of the timer used by millis() and micros(). That adds to a counter and mucks around a bit. If the timer fires at the "wrong" time your loop will jitter.

You can remove the jitter by disabling interrupts, if your code will work adequately with that done.

Here, I found it in wiring.c:

SIGNAL(TIMER0_OVF_vect)
{
	// copy these to local variables so they can be stored in registers
	// (volatile variables must be read from memory on every access)
	unsigned long m = timer0_millis;
	unsigned char f = timer0_fract;

	m += MILLIS_INC;
	f += FRACT_INC;
	if (f >= FRACT_MAX) {
		f -= FRACT_MAX;
		m += 1;
	}

	timer0_fract = f;
	timer0_millis = m;
	timer0_overflow_count++;
}

So not only does the interrupt periodically fire, causing jitter, but the "if" test inside it, if met, will cause additional jitter because it causes more instructions to be executed.

Coding_Badly · December 28, 2011, 6:52am

jwatte:
I almost wish there was a 100 MHz part that came in DIP so I could throw cycles at the problem without breaking the bank

"the problem"? What problem is that?

nickgammon · December 28, 2011, 6:58am

12,50468
16,1871
20,815

It takes around 3.5 us to enter an ISR and about the same to leave it. I read that as follows: some of the time you entered, or left, the ISR during the timing period (4 us), and some of the time you did both (8 us). That sounds about right.

system · December 28, 2011, 7:31am

Correct me if I'm wrong, but wouldn't it be impossible to just enter or leave the ISR? Unless there are two execution cores, that is. If you take the interrupt, you have to complete it before the main code gets its time back, so the minimum disruption would be 7 us. (Push and pop take two cycles each, and there's half a dozen of each in an ISR -- grumble! How hard would it be to provide a second set of registers for ISRs, anyway?)

Anyway, I'm re-thinking my design, working towards a loop that can run with interrupts off in the timing sensitive parts. Also, I'm replacing digitalWrite (a whopping 78 instructions!) with some port banging myself, too. Now, where did I put that avr-gcc intrinsics reference ...

nickgammon · December 28, 2011, 8:55am

During your timing portion, the code might already be inside the ISR and leave it partway through, or it might enter the ISR towards the end. So yes, I think it is completely possible for an ISR to only partially affect your timing.

system · December 28, 2011, 6:13pm

I want to be able to toggle output pins, driving communications hardware, ideally with microsecond precision (but I can tolerate a handful of microseconds of jitter) based on some input that arrives asynchronously. With the Atmega328p, I can do the driving with interrupts turned off, but then I can't receive data at the same time. For prototyping, I can probably live with this.
At 100 MHz, there would be enough cycles to do everything polled at the same time -- output and input, polled. Or run the input on interrupts, and the output polled but with interrupts on, and take a small amount of jitter when interrupts arrive.

This guy runs at 96 MHz and is price competitive with an Uno, but there is no DIP version:
http://parts.digikey.com/1/parts/1950073-board-lpcxpresso-lpc1768-1769-om13000.html
I could conceivably even use the built-in SPI DMA hardware, ignoring the clock, and just using the bit-out signal...

Hmm, the 328 does have some SPI capability, but not DMA. I wonder if it at least has a FIFO? (... goes off reading data sheets some more)

Coding_Badly · December 28, 2011, 7:04pm

some input that arrives asynchronously

From?

Grumpy_Mike · December 28, 2011, 8:02pm

I wonder if it at least has a FIFO

No it hasn't.

The problem here is that you have a system with all sorts of asynchronous events taking place that makes precise timing not possible.
There are three internal timers and for the best precision you should use these. But this is the price you pay for having a system and not just a raw processor.

nickgammon · December 28, 2011, 8:47pm

jwatte:
I want to be able to toggle output pins, driving communications hardware, ideally with microsecond precision (but I can tolerate a handful of microseconds of jitter) based on some input that arrives asynchronously.

Perhaps if you describe the actual requirement, rather than "nice to have"? Most comms stuff is tolerant of some delays, as it works in real-life situations. Even quite fast processors (e.g. modern Macs, PCs, Linux) running at 3 GHz still have to service interrupts. If you said you wanted to work with fast USB, yes, you probably can't get that to work on the bare board. But then there are USB interface chips, so that isn't a particular worry. Ditto for Ethernet.

system · December 29, 2011, 5:51am

Argh! Couldn't they have spared the handful dozen gates for a one-byte FIFO for that SPI interface?

In my target system, the controller decodes IR input on a wide range of bands (16 kHz through 60 kHz carriers), and sends the decoded data out the serial port.
At the same time, it needs to also receive commands from a serial port (where I can control the command rate, so I can manage the interrupt load) and hard-wired buttons.
While sending commands with high precision, I can let the hardware serial back up, and delay processing button presses (but I'll probably want to OR together the input bits received during the time.)
The end result is automation of a number of different IR remote control protocols.

And it's funny you should mention high-speed PCs -- ten to fifteen years ago, I spent years working on an operating system that drove interrupt latencies on then-standard PC hardware with a general-purpose GUI down below milliseconds (for media production.) Even with modern hardware, neither Windows, nor Linux, nor MacOS will get to those levels. That's because they do many things at once, and use "cheapest possible" design instead of dedicated circuits for many things.

In effect, I want to use a microcontroller as the dedicated circuit for what I want to do. If that SPI interface had at least one byte of FIFO, then I probably could do it just fine (mashing in another output byte when the FIFO runs dry) but as it is, I have to take a 4 us interrupt every 8 bits, which means that every 8th bit I send out essentially gets extended by 4 us or so.

The reason I have such strict tolerances is because I need to generate the carrier wave for the IR modulation, at between 16 and 60 kHz. If I want to stick with an Atmega328P, I may have to use a separately programmable timer to generate the carrier, and then use the Arduino only as a gate for that carrier (it's all Manchester coded AM -- at least I don't have to do FM in software).

So, a 555, with a variable timing resistor, might get me there. But then the external circuitry is looking a lot hairier, and maybe I should go with some of the bigger boys that have DMA to SPI for seamless modulation.

I can build a state machine to do exactly what I need, and count cycles from interrupt handlers to figure out what my budget is -- for example, I can bang a byte to the SPI, then enable interrupts, then immediately disable interrupts; as long as the longest interrupt handler is shorter than the time to send one byte out the SPI, I'm good. With 4 us pulse width (not ideal), this means < 30 us interrupt latency, which can be done on the current board. However, that's 4 us pulse width, not 1 us. With a device that runs faster, and has better circuitry for generating the pulse forms I care about (DMA, say), my target would probably be easier to reach.

Maybe the solution really is an LPC1768 for communications and smarts, and a 328P that just does pulse generation, using SPI for receive (which has a one-byte buffer).

nickgammon · December 29, 2011, 5:55am

Argh! Couldn't they have spared the handful dozen gates for a one-byte FIFO for that SPI interface?

Page 167 of the data sheet:

The system is single buffered in the transmit direction and double buffered in the receive direction. This means that bytes to be transmitted cannot be written to the SPI Data Register before the entire shift cycle is completed. When receiving data, however, a received character must be read from the SPI Data Register before the next character has been completely shifted in. Otherwise, the first byte is lost.

So there is a two-byte receive buffer, and a one-byte send buffer.

nickgammon · December 29, 2011, 6:03am

There's also an interrupt "SPI Interrupt Flag". It looks like you could use that to stuff the next byte into the SPI buffer when the previous one has been sent.

nickgammon · December 29, 2011, 6:05am

You could look into the Kemani CPLD Key or his similar products to use a CPLD as a high-speed interface.

http://majolsurf.net/wordpress/?page_id=1302

system · December 29, 2011, 6:38pm

But what I want is a one byte FIFO to feed that one byte buffer!
Also note I described the interrupt solution. It essentially extends every eighth bit by a handful of microseconds because of the time taken to service the interrupt and start the next byte.
Really, a variable frequency timer might be best here...

system · December 30, 2011, 12:01am

So here's where I'm at.

Spinning up an LPC1768 is significant new effort, what with a new toolchain, harder final production (surface mount only), etc. I'll go with a carrier generator modulated by the Arduino, as I have less stringent requirements for the modulator (10 us jitter is OK).

Trying to control R2 of a 7555 to generate a stable carrier requires more than just a JFET or BJT -- expensive analog stuff. I want a programmable timer!

Something like an 8254 would work. But those are expensive and require wide parallel data interfacing.
The cheapest programmable timer in DIP I can find is an ATtiny for $1.19. But that's a new toolchain again.

A 328 is less than $4. Add a crystal or resonator, some resistors/caps, and a socket. I now have a programmable timer I can talk SPI or I2C or UART to! I might even be able to share crystal between the two in the final implementation.

Coding_Badly · December 30, 2011, 12:09am

ATtiny ... But that's a new toolchain again.

How so?
