Serial.print messing with Timer Interrupts

I am currently implementing a somewhat timing critical thingy. Basically it uses timer2 to phase lock to a DCF77 signal.

It relies on msTimer2 to generate 1000 ticks per second. Clock jitter and long term stability is no concern. I compensate for this. The only thing that I really need is that on overage I get a tick each 16000 (+/- 100ppm) clock cycles. Obviously interrupts that are processed while timer2 triggers will introduce jitter into this schedule. This is something I understand and that is fine with me. Also it is clear that any means that blocks interrupts for more than 1ms will sooner or later block 2 timer 2 interrupts in a row and cause that I will lose timer2 ticks. Also understood.

Now here is what happens: if I let my code run everything is fine. But if I add Serial.print statements outside of all interrupts and if the statements print more than ~100 chars per second, then I start to miss some of the timer2 ticks. After 300s this is something in the order of magnitude of 50 ticks. So I assume I roughly lose about one 1ms tick per 6s.

BTW: I stopped timer 0, so the only interrupt sources I can think of are:
analog conversion,
timer 2,
serial transmission (sending, no receive)

Has anyone an idea what exactly could be blocking long enough to cause this issue?

It is not that it kills my code. Once I drop the (debug) print statements it works fine. What bugs me is that I can not find out why this happens. Has anyone any ideas what could cause such a behaviour?

analog conversion,

Should only be enabled if you enabled it. Did you?

if the statements print more than ~100 chars per second

What baud rate?

115200 baud

The interesting part is: it does not matter if I print 100 cps or 300cps. The impact on lost ticks is imperceptible. I am really puzzled.

You say you are printing at 115200 baud. At that rate each character will be clocked out at 87 uS. How much Serial.print are you doing at one time? The Serial.prints are actually sent in an ISR, so they will be affecting interrupts.

Argh - I think we are getting closer. In HardwareSerial.cpp it says

void serialEventRun(void)
{
#ifdef serialEvent_implemented
  if (Serial.available()) serialEvent();
#endif
#ifdef serialEvent1_implemented
  if (Serial1.available()) serialEvent1();
#endif
#ifdef serialEvent2_implemented
  if (Serial2.available()) serialEvent2();
#endif
#ifdef serialEvent3_implemented
  if (Serial3.available()) serialEvent3();
#endif
}


#if !defined(USART0_UDRE_vect) && defined(USART1_UDRE_vect)
// do nothing - on the 32u4 the first USART is USART1
#else
#if !defined(UART0_UDRE_vect) && !defined(UART_UDRE_vect) && !defined(USART0_UDRE_vect) && !defined(USART_UDRE_vect)
  #error "Don't know what the Data Register Empty vector is called for the first UART"
#else
#if defined(UART0_UDRE_vect)
ISR(UART0_UDRE_vect)
#elif defined(UART_UDRE_vect)
ISR(UART_UDRE_vect)
#elif defined(USART0_UDRE_vect)
ISR(USART0_UDRE_vect)
#elif defined(USART_UDRE_vect)
ISR(USART_UDRE_vect)
#endif
{
  if (tx_buffer.head == tx_buffer.tail) {
	// Buffer empty, so disable interrupts
#if defined(UCSR0B)
    cbi(UCSR0B, UDRIE0);
#else
    cbi(UCSRB, UDRIE);
#endif
  }
  else {
    // There is more data in the output buffer. Send the next byte
    unsigned char c = tx_buffer.buffer[tx_buffer.tail];
    tx_buffer.tail = (tx_buffer.tail + 1) % SERIAL_BUFFER_SIZE;
	
  #if defined(UDR0)
    UDR0 = c;
  #elif defined(UDR)
    UDR = c;
  #else
    #error UDR not defined
  #endif
  }
}
#endif
#endif

#ifdef USART1_UDRE_vect
ISR(USART1_UDRE_vect)
{
  if (tx_buffer1.head == tx_buffer1.tail) {
	// Buffer empty, so disable interrupts
    cbi(UCSR1B, UDRIE1);
  }
  else {
    // There is more data in the output buffer. Send the next byte
    unsigned char c = tx_buffer1.buffer[tx_buffer1.tail];
    tx_buffer1.tail = (tx_buffer1.tail + 1) % SERIAL_BUFFER_SIZE;
	
    UDR1 = c;
  }
}
#endif

#ifdef USART2_UDRE_vect
ISR(USART2_UDRE_vect)
{
  if (tx_buffer2.head == tx_buffer2.tail) {
	// Buffer empty, so disable interrupts
    cbi(UCSR2B, UDRIE2);
  }
  else {
    // There is more data in the output buffer. Send the next byte
    unsigned char c = tx_buffer2.buffer[tx_buffer2.tail];
    tx_buffer2.tail = (tx_buffer2.tail + 1) % SERIAL_BUFFER_SIZE;
	
    UDR2 = c;
  }
}
#endif

#ifdef USART3_UDRE_vect
ISR(USART3_UDRE_vect)
{
  if (tx_buffer3.head == tx_buffer3.tail) {
	// Buffer empty, so disable interrupts
    cbi(UCSR3B, UDRIE3);
  }
  else {
    // There is more data in the output buffer. Send the next byte
    unsigned char c = tx_buffer3.buffer[tx_buffer3.tail];
    tx_buffer3.tail = (tx_buffer3.tail + 1) % SERIAL_BUFFER_SIZE;
	
    UDR3 = c;
  }
}
#endif

However I would have expected that this is not an issue because it returns after each character. So now my theory: my prints are to fast --> the serial buffer may sometimes be completely full --> serial interrupts will occur virtually each 87uS. Timer 2 Interrupts occur each 1000uS. Of course sometimes serial ISR may be triggered while timer 2 ISR is busy. So far so good.

This is OK. But what I do not understand: how would this make me lose 1 timer 2 tick? I just do not get it.

So now my theory: my prints are to fast

Too frequent, maybe. Not too fast.

This might be a good time to show your code. I understand that with 87 uS per output character you would fit 11 or so of them into 1000 uS, and timer interrupts are higher priority than serial interrupts. So really, you should potentially only be slightly out. Maybe you are doing something else as well?

OK, but here comes the disclaimer. This is work in progress, the code is nowhere close to done. It requires an open collector DCF77 module @ pin 19 and is assumes that all pins have leds connected to them ("Blinkenlight Shield").

The code is in a transitional state. So it is NOT a DCF77 decoder yet. It is also not yet documented. But you asked for it :wink: Please do not distribute or ask for in depth comments. I will document it in my Blinkenlightblog once it is finished. The code works as expected without the debug calls. Once the debug calls are in it slowly gets out of sync. If I remove the print statements it stays in sync.

Blinkenlight_C0X_DCF77_decode_with_ms_samples.tar.gz (8.43 KB)

So far I can't see anything obvious. I note that you are doing analogRead inside an ISR, so the time left (with interrupts enabled) is now down to 900 uS or less. Still, with serial interrupts every 87 uS that shouldn't push you over the 900 uS limit. Can you time (eg. with strobing pins and a logic analyzer) exactly how long the ISR takes? Maybe it is a lot longer than you think.

How do you know you are missing the ticks?

I know because it is a phase locked loop that locks to DCF77. If I reduce the prints the lock stays exactly where it should. If I have all debug prints active the lock starts to drift out of phase. If I increase the band width while debugging the lock is able to follow faster and thus stays in phase. So I assume my code behaves as expected.

The point is I noticed it when I tried to tune the lock parameters. If the parameters would have been correct the crystal would be off by ~300ppm. I double checked with a frequency counter and the crystal is accurate to 10ppm. I checked it with the ISR active. Then I noticed that the behaviour changes if I change the print statements.

This is how I assume that the print statements cause the issue.

Can you time how long the ISR is active? Toggle a pin at the start and the end of it, measure the difference.

Hi Nick, here is the toggle code:

#include <MsTimer2.h>
#include "dcf77_demodulator.h"

const uint8_t dcf77_sample_pin = 19; // A5
const uint8_t dcf77_analog_sample_pin = 5;

void process_one_sample() {
    digitalWrite(17,HIGH);
    const uint8_t sampled_data = analogRead(dcf77_analog_sample_pin)>200? 1: 0;
    
    DCF77_Demodulator::detector(sampled_data);
    
    digitalWrite(17,LOW);
}

void setup() { 
    Serial.begin(115200);  

    pinMode(dcf77_sample_pin, INPUT);
    digitalWrite(dcf77_sample_pin, HIGH);
    
    DCF77_Demodulator::setup();
    MsTimer2::set(1, process_one_sample);
    MsTimer2::start();

    // disable timer0
    TIMSK0 = 0;

    pinMode(17, OUTPUT);
}

void loop() {
    DCF77_Demodulator::debug();
    DCF77_Decoder::debug();    
}

Because I was lazy I just took a picture of the attached LED with the same trick as described here: Shutter Speed | Blinkenlight (see attachment).

The ticks are of course 1000us apart. As you can clearly see the duty cycle is below 50%. So this should not be causing the issue. I also can not see any significant jitter.

I would rather have figures than dots. I am trying to rule in, or rule out, a hypothesis that the time taken in the ISR is long enough that, if delayed a bit by a serial interrupt, and then it takes long enough to overshoot the next 1000 uS, then it is delayed a bit more, that you might eventually creep over 2 x 1000 uS intervals.

Today I checked with a DSO: ISR duration is almost exactly 123 microseconds for each and every period the DSO captured. I tried to trigger on >130 microsecond hi pulses but got not trigger condition. --> no pulse exceeds 130 microseconds

Then I triggered on >950 microseconds low pulses --> it does not trigger --> seems it does not drop any interrupt.

I conclude that this looks like a bug in my code and the ISR seems to work fine. However this would imply that adding the prints somehow makes my code glitch. Very strange.

I can't really explain it.

If you were doing the Serial prints inside an ISR, yes it would be explained. But you don't appear to be doing that.


BTW I would use the variable "max" with caution as max is defined in Arduino.h:

#define max(a,b) ((a)>(b)?(a):(b))

Without the parentheses you get away with it, but it is a bit dubious.


I conclude that this looks like a bug in my code and the ISR seems to work fine. However this would imply that adding the prints somehow makes my code glitch.

Perhaps you are altering some global variable you forgot about?

Processor?

Finally I found it out. It is a very subtle bug in MsTimer2:

ISR(TIMER2_OVF_vect) {
#if defined (__AVR_ATmega168__) || defined (__AVR_ATmega48__) || defined (__AVR_ATmega88__) || defined (__AVR_ATmega328P__) || (__AVR_ATmega1280__)
	TCNT2 = MsTimer2::tcnt2;
#elif defined (__AVR_ATmega128__)
	TCNT2 = MsTimer2::tcnt2;
#elif defined (__AVR_ATmega8__)
	TCNT2 = MsTimer2::tcnt2;
#endif
	MsTimer2::_overflow();
}

If you remove the macros and only look at mega168 then you get

ISR(TIMER2_OVF_vect) {
	TCNT2 = MsTimer2::tcnt2;
	MsTimer2::_overflow();
}

What is going on here? The library runs timer2 with a prescaler of 64. Thus the timer will tick with 250 kHz. So every 250 ticks an interrupt must be triggered to achieve the desired rate of one interrupt per millisecond. Instead of using CTC mode the library uses "normal mode". Thus the timer will process 256 ticks between to interrupts but it should use only 250. In order to compensate for this the ISR will advance TCNT2 immediately to the value of MsTimer2::tcnt2; which happens to be 6.

If TCNT2 == 0 at the start of the ISR, then of course this is equivalent to skipping 6 ticks. The result is the desired delay. However if TCNT2 > 0 at the start of the ISR, then it will skip less ticks. In the most extreme case TCNT2 == 255 setting it to 6 would result in 249 additional ticks instead of 6 skipped ticks.

But why is this an issue if I use Serial.print??? In order to find out I disassembled my compiled code. Here is what I found:

ISR(TIMER2_OVF_vect) {
 22c:	1f 92       	push	r1				cycles:	2	total:  2
 22e:	0f 92       	push	r0				cycles:	2	total:  4
 230:	0f b6       	in	r0, 0x3f	; 63		cycles:	1	total:  5
 232:	0f 92       	push	r0				cycles:	2	total:  7
 234:	11 24       	eor	r1, r1				cycles:	1	total:  8
 236:	2f 93       	push	r18				cycles:	2	total: 10
 238:	3f 93       	push	r19				cycles:	2	total: 12
 23a:	4f 93       	push	r20				cycles:	2	total: 14
 23c:	5f 93       	push	r21				cycles:	2	total: 16
 23e:	6f 93       	push	r22				cycles:	2	total: 18
 240:	7f 93       	push	r23				cycles:	2	total: 20
 242:	8f 93       	push	r24				cycles:	2	total: 22
 244:	9f 93       	push	r25				cycles:	2	total: 24
 246:	af 93       	push	r26				cycles:	2	total: 26
 248:	bf 93       	push	r27				cycles:	2	total: 28
 24a:	ef 93       	push	r30				cycles:	2	total: 30
 24c:	ff 93       	push	r31				cycles:	2	total: 32
#if defined (__AVR_ATmega168__) || defined (__AVR_ATmega48__) || defined (__AVR_ATmega88__) || defined (__AVR_ATmega328P__) || (__AVR_ATmega1280__)
	TCNT2 = MsTimer2::tcnt2;
 24e:	80 91 1b 01 	lds	r24, 0x011B			cycles:	2	total: 34
 252:	90 91 1c 01 	lds	r25, 0x011C			cycles:	2	total: 36
 256:	80 93 b2 00 	sts	0x00B2, r24			cycles:	2	total: 38
#elif defined (__AVR_ATmega128__)
	TCNT2 = MsTimer2::tcnt2;
#elif defined (__AVR_ATmega8__)
	TCNT2 = MsTimer2::tcnt2;
#endif
	MsTimer2::_overflow();
 25a:	0e 94 d7 00 	call	0x1ae	; 0x1ae <_ZN8MsTimer29_overflowEv>

So MsTimer2 takes 38 cpu cycles + ISR trigger overhead till it can set TCNT2. So everything that delays the MsTimer2 ISR for more than 24 cpu cycles will drag it down.

Looking into the serial ISR I find

000004b8 <__vector_19>:
#elif defined(USART0_UDRE_vect)
ISR(USART0_UDRE_vect)
#elif defined(USART_UDRE_vect)
ISR(USART_UDRE_vect)
#endif
{
 4b8:	1f 92       	push	r1			cycles:	2	total:  2
 4ba:	0f 92       	push	r0			cycles:	2	total:  4
 4bc:	0f b6       	in	r0, 0x3f	; 63	cycles:	1	total:  5
 4be:	0f 92       	push	r0			cycles:	2	total:  7
 4c0:	11 24       	eor	r1, r1			cycles:	1	total:  8
 4c2:	2f 93       	push	r18			cycles:	2	total: 10	
 4c4:	3f 93       	push	r19			cycles:	2	total: 12
 4c6:	5f 93       	push	r21			cycles:	2	total: 14
 4c8:	6f 93       	push	r22			cycles:	2	total: 16
 4ca:	7f 93       	push	r23			cycles:	2	total: 18
 4cc:	8f 93       	push	r24			cycles:	2	total: 20
 4ce:	9f 93       	push	r25			cycles:	2	total: 22
 4d0:	af 93       	push	r26			cycles:	2	total: 24
 4d2:	bf 93       	push	r27			cycles:	2	total: 26
 4d4:	ef 93       	push	r30			cycles:	2	total: 28
 4d6:	ff 93       	push	r31			cycles:	2	total: 30
  if (tx_buffer.head == tx_buffer.tail) {
 4d8:	20 91 a1 01 	lds	r18, 0x01A1		cycles:	2	total: 32
 4dc:	30 91 a2 01 	lds	r19, 0x01A2		cycles:	2	total: 34
 4e0:	80 91 a3 01 	lds	r24, 0x01A3		cycles:	2	total: 36
 4e4:	90 91 a4 01 	lds	r25, 0x01A4		cycles:	2	total: 38
 4e8:	28 17       	cp	r18, r24		cycles:	1	total: 39
 4ea:	39 07       	cpc	r19, r25		cycles:	1	total: 40
 4ec:	31 f4       	brne	.+12     	; 0x4fa <__vector_19+0x42> cycles:	1-2	total: 41-42
	// Buffer empty, so disable interrupts
#if defined(UCSR0B)
    cbi(UCSR0B, UDRIE0);
 4ee:	80 91 c1 00 	lds	r24, 0x00C1
 4f2:	8f 7d       	andi	r24, 0xDF	; 223
 4f4:	80 93 c1 00 	sts	0x00C1, r24
 4f8:	16 c0       	rjmp	.+44     	; 0x526 <__vector_19+0x6e>
    cbi(UCSRB, UDRIE);
#endif
  }
  else {
    // There is more data in the output buffer. Send the next byte
    unsigned char c = tx_buffer.buffer[tx_buffer.tail];
 4fa:	e0 91 a3 01 	lds	r30, 0x01A3		cycles:	2	total: 44
 4fe:	f0 91 a4 01 	lds	r31, 0x01A4		cycles:	2	total: 46
 502:	ef 59       	subi	r30, 0x9F	; 159	cycles:	1	total: 47
 504:	fe 4f       	sbci	r31, 0xFE	; 254	cycles:	1	total: 48
 506:	20 81       	ld	r18, Z			cycles:	1	total: 49
    tx_buffer.tail = (tx_buffer.tail + 1) % SERIAL_BUFFER_SIZE;
 508:	80 91 a3 01 	lds	r24, 0x01A3		cycles:	2	total: 51	
 50c:	90 91 a4 01 	lds	r25, 0x01A4		cycles:	2	total: 53
 510:	01 96       	adiw	r24, 0x01	; 1	cycles:	2	total: 55
 512:	60 e4       	ldi	r22, 0x40	; 64	cycles:	1	total: 56
 514:	70 e0       	ldi	r23, 0x00	; 0	cycles:	1	total: 57
 516:	0e 94 f0 03 	call	0x7e0	; 0x7e0 <__divmodhi4>	cycles:	2	total: 59 + time for the subroutine
 51a:	90 93 a4 01 	sts	0x01A4, r25		cycles:	2	total: 61
 51e:	80 93 a3 01 	sts	0x01A3, r24		cycles:	2	total: 63
	
  #if defined(UDR0)
    UDR0 = c;
 522:	20 93 c6 00 	sts	0x00C6, r18		cycles:	2	total: 65
    UDR = c;
  #else
    #error UDR not defined
  #endif
  }
}
 526:	ff 91       	pop	r31			cycles:	2	total: 67
 528:	ef 91       	pop	r30			cycles:	2	total: 69
 52a:	bf 91       	pop	r27			cycles:	2	total: 71
 52c:	af 91       	pop	r26			cycles:	2	total: 73
 52e:	9f 91       	pop	r25			cycles:	2	total: 75
 530:	8f 91       	pop	r24			cycles:	2	total: 77	
 532:	7f 91       	pop	r23			cycles:	2	total: 79
 534:	6f 91       	pop	r22			cycles:	2	total: 81
 536:	5f 91       	pop	r21			cycles:	2	total: 83
 538:	3f 91       	pop	r19			cycles:	2	total: 85
 53a:	2f 91       	pop	r18			cycles:	2	total: 87	
 53c:	0f 90       	pop	r0			cycles:	2	total: 89
 53e:	0f be       	out	0x3f, r0	; 63	cycles:	1	total: 90
 540:	0f 90       	pop	r0			cycles:	2	total: 92
 542:	1f 90       	pop	r1			cycles:	2	total: 94
 544:	18 95       	reti				cycles:	4	total: 98

Obviously 98 is greater than 24 and I did not even count the cycles used inside of "call 0x7e0 ; 0x7e0 <__divmodhi4>".

In order to verify my hypthesis I replace MsTimer2 with the following code:

ISR(TIMER2_COMPA_vect) {
    process_one_sample();
}

void initTimer2() {
    // Timer 2 CTC mode, prescaler 64
    TCCR2B = (1<<WGM22) | (1<<CS22);
    TCCR2A = (1<<WGM21);
    
    // 249 + 1 == 250 == 250 000 / 1000 ==  (16 000 000 / 64) / 1000
    OCR2A = 249;
    
    // enable Timer 2 interrupts
    TIMSK2 = (1<<OCIE2A);    
}

Yeah!!! Now only 30ppm deviation. This is within the expected tolerance of the crystal. Bug understood and fixed :slight_smile:

I will contact the author of the library to fix this for the whole community.

One afterthought: I do not like to have ISR driven serial output. But now matter if this is a good idea or not, the library is full with code like

int i = (unsigned int)(buffer->head + 1) % SERIAL_BUFFER_SIZE;

Since the Arduino does have no hardware division this is very costly both in terms of CPU and memory. A much better way to wrap a counter is:

int i = (unsigned int)(buffer->head);
i = i<SERIAL_BUFFER_SIZE-1? i+1: 0;

Thank for the follow-up and amazingly thorough analysis!