Serial.print messing with Timer Interrupts

So far I can't see anything obvious. I note that you are doing analogRead inside an ISR, so the time left (with interrupts enabled) is now down to 900 uS or less. Still, with serial interrupts every 87 uS that shouldn't push you over the 900 uS limit. Can you time (eg. with strobing pins and a logic analyzer) exactly how long the ISR takes? Maybe it is a lot longer than you think.

How do you know you are missing the ticks?

I know because it is a phase locked loop that locks to DCF77. If I reduce the prints the lock stays exactly where it should. If I have all debug prints active the lock starts to drift out of phase. If I increase the band width while debugging the lock is able to follow faster and thus stays in phase. So I assume my code behaves as expected.

The point is I noticed it when I tried to tune the lock parameters. If the parameters would have been correct the crystal would be off by ~300ppm. I double checked with a frequency counter and the crystal is accurate to 10ppm. I checked it with the ISR active. Then I noticed that the behaviour changes if I change the print statements.

This is how I assume that the print statements cause the issue.

Can you time how long the ISR is active? Toggle a pin at the start and the end of it, measure the difference.

Hi Nick, here is the toggle code:

#include <MsTimer2.h>
#include "dcf77_demodulator.h"

const uint8_t dcf77_sample_pin = 19; // A5
const uint8_t dcf77_analog_sample_pin = 5;

void process_one_sample() {
    digitalWrite(17,HIGH);
    const uint8_t sampled_data = analogRead(dcf77_analog_sample_pin)>200? 1: 0;
    
    DCF77_Demodulator::detector(sampled_data);
    
    digitalWrite(17,LOW);
}

void setup() { 
    Serial.begin(115200);  

    pinMode(dcf77_sample_pin, INPUT);
    digitalWrite(dcf77_sample_pin, HIGH);
    
    DCF77_Demodulator::setup();
    MsTimer2::set(1, process_one_sample);
    MsTimer2::start();

    // disable timer0
    TIMSK0 = 0;

    pinMode(17, OUTPUT);
}

void loop() {
    DCF77_Demodulator::debug();
    DCF77_Decoder::debug();    
}

Because I was lazy I just took a picture of the attached LED with the same trick as described here: Shutter Speed | Blinkenlight (see attachment).

The ticks are of course 1000us apart. As you can clearly see the duty cycle is below 50%. So this should not be causing the issue. I also can not see any significant jitter.

I would rather have figures than dots. I am trying to rule in, or rule out, a hypothesis that the time taken in the ISR is long enough that, if delayed a bit by a serial interrupt, and then it takes long enough to overshoot the next 1000 uS, then it is delayed a bit more, that you might eventually creep over 2 x 1000 uS intervals.

Today I checked with a DSO: ISR duration is almost exactly 123 microseconds for each and every period the DSO captured. I tried to trigger on >130 microsecond hi pulses but got not trigger condition. --> no pulse exceeds 130 microseconds

Then I triggered on >950 microseconds low pulses --> it does not trigger --> seems it does not drop any interrupt.

I conclude that this looks like a bug in my code and the ISR seems to work fine. However this would imply that adding the prints somehow makes my code glitch. Very strange.

I can't really explain it.

If you were doing the Serial prints inside an ISR, yes it would be explained. But you don't appear to be doing that.


BTW I would use the variable "max" with caution as max is defined in Arduino.h:

#define max(a,b) ((a)>(b)?(a):(b))

Without the parentheses you get away with it, but it is a bit dubious.


I conclude that this looks like a bug in my code and the ISR seems to work fine. However this would imply that adding the prints somehow makes my code glitch.

Perhaps you are altering some global variable you forgot about?

Processor?

Finally I found it out. It is a very subtle bug in MsTimer2:

ISR(TIMER2_OVF_vect) {
#if defined (__AVR_ATmega168__) || defined (__AVR_ATmega48__) || defined (__AVR_ATmega88__) || defined (__AVR_ATmega328P__) || (__AVR_ATmega1280__)
	TCNT2 = MsTimer2::tcnt2;
#elif defined (__AVR_ATmega128__)
	TCNT2 = MsTimer2::tcnt2;
#elif defined (__AVR_ATmega8__)
	TCNT2 = MsTimer2::tcnt2;
#endif
	MsTimer2::_overflow();
}

If you remove the macros and only look at mega168 then you get

ISR(TIMER2_OVF_vect) {
	TCNT2 = MsTimer2::tcnt2;
	MsTimer2::_overflow();
}

What is going on here? The library runs timer2 with a prescaler of 64. Thus the timer will tick with 250 kHz. So every 250 ticks an interrupt must be triggered to achieve the desired rate of one interrupt per millisecond. Instead of using CTC mode the library uses "normal mode". Thus the timer will process 256 ticks between to interrupts but it should use only 250. In order to compensate for this the ISR will advance TCNT2 immediately to the value of MsTimer2::tcnt2; which happens to be 6.

If TCNT2 == 0 at the start of the ISR, then of course this is equivalent to skipping 6 ticks. The result is the desired delay. However if TCNT2 > 0 at the start of the ISR, then it will skip less ticks. In the most extreme case TCNT2 == 255 setting it to 6 would result in 249 additional ticks instead of 6 skipped ticks.

But why is this an issue if I use Serial.print??? In order to find out I disassembled my compiled code. Here is what I found:

ISR(TIMER2_OVF_vect) {
 22c:	1f 92       	push	r1				cycles:	2	total:  2
 22e:	0f 92       	push	r0				cycles:	2	total:  4
 230:	0f b6       	in	r0, 0x3f	; 63		cycles:	1	total:  5
 232:	0f 92       	push	r0				cycles:	2	total:  7
 234:	11 24       	eor	r1, r1				cycles:	1	total:  8
 236:	2f 93       	push	r18				cycles:	2	total: 10
 238:	3f 93       	push	r19				cycles:	2	total: 12
 23a:	4f 93       	push	r20				cycles:	2	total: 14
 23c:	5f 93       	push	r21				cycles:	2	total: 16
 23e:	6f 93       	push	r22				cycles:	2	total: 18
 240:	7f 93       	push	r23				cycles:	2	total: 20
 242:	8f 93       	push	r24				cycles:	2	total: 22
 244:	9f 93       	push	r25				cycles:	2	total: 24
 246:	af 93       	push	r26				cycles:	2	total: 26
 248:	bf 93       	push	r27				cycles:	2	total: 28
 24a:	ef 93       	push	r30				cycles:	2	total: 30
 24c:	ff 93       	push	r31				cycles:	2	total: 32
#if defined (__AVR_ATmega168__) || defined (__AVR_ATmega48__) || defined (__AVR_ATmega88__) || defined (__AVR_ATmega328P__) || (__AVR_ATmega1280__)
	TCNT2 = MsTimer2::tcnt2;
 24e:	80 91 1b 01 	lds	r24, 0x011B			cycles:	2	total: 34
 252:	90 91 1c 01 	lds	r25, 0x011C			cycles:	2	total: 36
 256:	80 93 b2 00 	sts	0x00B2, r24			cycles:	2	total: 38
#elif defined (__AVR_ATmega128__)
	TCNT2 = MsTimer2::tcnt2;
#elif defined (__AVR_ATmega8__)
	TCNT2 = MsTimer2::tcnt2;
#endif
	MsTimer2::_overflow();
 25a:	0e 94 d7 00 	call	0x1ae	; 0x1ae <_ZN8MsTimer29_overflowEv>

So MsTimer2 takes 38 cpu cycles + ISR trigger overhead till it can set TCNT2. So everything that delays the MsTimer2 ISR for more than 24 cpu cycles will drag it down.

Looking into the serial ISR I find

000004b8 <__vector_19>:
#elif defined(USART0_UDRE_vect)
ISR(USART0_UDRE_vect)
#elif defined(USART_UDRE_vect)
ISR(USART_UDRE_vect)
#endif
{
 4b8:	1f 92       	push	r1			cycles:	2	total:  2
 4ba:	0f 92       	push	r0			cycles:	2	total:  4
 4bc:	0f b6       	in	r0, 0x3f	; 63	cycles:	1	total:  5
 4be:	0f 92       	push	r0			cycles:	2	total:  7
 4c0:	11 24       	eor	r1, r1			cycles:	1	total:  8
 4c2:	2f 93       	push	r18			cycles:	2	total: 10	
 4c4:	3f 93       	push	r19			cycles:	2	total: 12
 4c6:	5f 93       	push	r21			cycles:	2	total: 14
 4c8:	6f 93       	push	r22			cycles:	2	total: 16
 4ca:	7f 93       	push	r23			cycles:	2	total: 18
 4cc:	8f 93       	push	r24			cycles:	2	total: 20
 4ce:	9f 93       	push	r25			cycles:	2	total: 22
 4d0:	af 93       	push	r26			cycles:	2	total: 24
 4d2:	bf 93       	push	r27			cycles:	2	total: 26
 4d4:	ef 93       	push	r30			cycles:	2	total: 28
 4d6:	ff 93       	push	r31			cycles:	2	total: 30
  if (tx_buffer.head == tx_buffer.tail) {
 4d8:	20 91 a1 01 	lds	r18, 0x01A1		cycles:	2	total: 32
 4dc:	30 91 a2 01 	lds	r19, 0x01A2		cycles:	2	total: 34
 4e0:	80 91 a3 01 	lds	r24, 0x01A3		cycles:	2	total: 36
 4e4:	90 91 a4 01 	lds	r25, 0x01A4		cycles:	2	total: 38
 4e8:	28 17       	cp	r18, r24		cycles:	1	total: 39
 4ea:	39 07       	cpc	r19, r25		cycles:	1	total: 40
 4ec:	31 f4       	brne	.+12     	; 0x4fa <__vector_19+0x42> cycles:	1-2	total: 41-42
	// Buffer empty, so disable interrupts
#if defined(UCSR0B)
    cbi(UCSR0B, UDRIE0);
 4ee:	80 91 c1 00 	lds	r24, 0x00C1
 4f2:	8f 7d       	andi	r24, 0xDF	; 223
 4f4:	80 93 c1 00 	sts	0x00C1, r24
 4f8:	16 c0       	rjmp	.+44     	; 0x526 <__vector_19+0x6e>
    cbi(UCSRB, UDRIE);
#endif
  }
  else {
    // There is more data in the output buffer. Send the next byte
    unsigned char c = tx_buffer.buffer[tx_buffer.tail];
 4fa:	e0 91 a3 01 	lds	r30, 0x01A3		cycles:	2	total: 44
 4fe:	f0 91 a4 01 	lds	r31, 0x01A4		cycles:	2	total: 46
 502:	ef 59       	subi	r30, 0x9F	; 159	cycles:	1	total: 47
 504:	fe 4f       	sbci	r31, 0xFE	; 254	cycles:	1	total: 48
 506:	20 81       	ld	r18, Z			cycles:	1	total: 49
    tx_buffer.tail = (tx_buffer.tail + 1) % SERIAL_BUFFER_SIZE;
 508:	80 91 a3 01 	lds	r24, 0x01A3		cycles:	2	total: 51	
 50c:	90 91 a4 01 	lds	r25, 0x01A4		cycles:	2	total: 53
 510:	01 96       	adiw	r24, 0x01	; 1	cycles:	2	total: 55
 512:	60 e4       	ldi	r22, 0x40	; 64	cycles:	1	total: 56
 514:	70 e0       	ldi	r23, 0x00	; 0	cycles:	1	total: 57
 516:	0e 94 f0 03 	call	0x7e0	; 0x7e0 <__divmodhi4>	cycles:	2	total: 59 + time for the subroutine
 51a:	90 93 a4 01 	sts	0x01A4, r25		cycles:	2	total: 61
 51e:	80 93 a3 01 	sts	0x01A3, r24		cycles:	2	total: 63
	
  #if defined(UDR0)
    UDR0 = c;
 522:	20 93 c6 00 	sts	0x00C6, r18		cycles:	2	total: 65
    UDR = c;
  #else
    #error UDR not defined
  #endif
  }
}
 526:	ff 91       	pop	r31			cycles:	2	total: 67
 528:	ef 91       	pop	r30			cycles:	2	total: 69
 52a:	bf 91       	pop	r27			cycles:	2	total: 71
 52c:	af 91       	pop	r26			cycles:	2	total: 73
 52e:	9f 91       	pop	r25			cycles:	2	total: 75
 530:	8f 91       	pop	r24			cycles:	2	total: 77	
 532:	7f 91       	pop	r23			cycles:	2	total: 79
 534:	6f 91       	pop	r22			cycles:	2	total: 81
 536:	5f 91       	pop	r21			cycles:	2	total: 83
 538:	3f 91       	pop	r19			cycles:	2	total: 85
 53a:	2f 91       	pop	r18			cycles:	2	total: 87	
 53c:	0f 90       	pop	r0			cycles:	2	total: 89
 53e:	0f be       	out	0x3f, r0	; 63	cycles:	1	total: 90
 540:	0f 90       	pop	r0			cycles:	2	total: 92
 542:	1f 90       	pop	r1			cycles:	2	total: 94
 544:	18 95       	reti				cycles:	4	total: 98

Obviously 98 is greater than 24 and I did not even count the cycles used inside of "call 0x7e0 ; 0x7e0 <__divmodhi4>".

In order to verify my hypthesis I replace MsTimer2 with the following code:

ISR(TIMER2_COMPA_vect) {
    process_one_sample();
}

void initTimer2() {
    // Timer 2 CTC mode, prescaler 64
    TCCR2B = (1<<WGM22) | (1<<CS22);
    TCCR2A = (1<<WGM21);
    
    // 249 + 1 == 250 == 250 000 / 1000 ==  (16 000 000 / 64) / 1000
    OCR2A = 249;
    
    // enable Timer 2 interrupts
    TIMSK2 = (1<<OCIE2A);    
}

Yeah!!! Now only 30ppm deviation. This is within the expected tolerance of the crystal. Bug understood and fixed :slight_smile:

I will contact the author of the library to fix this for the whole community.

One afterthought: I do not like to have ISR driven serial output. But now matter if this is a good idea or not, the library is full with code like

int i = (unsigned int)(buffer->head + 1) % SERIAL_BUFFER_SIZE;

Since the Arduino does have no hardware division this is very costly both in terms of CPU and memory. A much better way to wrap a counter is:

int i = (unsigned int)(buffer->head);
i = i<SERIAL_BUFFER_SIZE-1? i+1: 0;

Thank for the follow-up and amazingly thorough analysis!

Interesting! Here, you mean?

void MsTimer2::_overflow() {
	count += 1;
	
	if (count >= msecs && !overflowing) {
		overflowing = 1;
		count = 0;
		(*func)();
		overflowing = 0;
	}
}

So it relies on not missing those "overflow" ticks? I see what you mean.

No. It relies on skipping 6 cycles by setting TCNT2 to 6. However this implies that TCNT2 would be 0 which is not necessarily the case. It never misses any of the 1ms ticks. It is only sometimes some cycles to late. As I said: this is very subtle and it took me ~4 weeks to figure this one out.

To be more precise: the bug is:

TCNT2 = MsTimer2::tcnt2;

Actually the underlying issue is to use "normal" mode instead of CTC mode.

As calculated here?

	tcnt2 = 256 - (int)((float)F_CPU * 0.001 / prescaler);

Hmm. I see. Well I don't like fiddling with counters anyway, I believe they "belong" to the timer, especially if it is running.

Well done, BTW.

It didn't occur to me to look in the library, nor did it occur to me that the library-writer would use a method like that which relied on things happening "just right".

This is another issue. Compare

tcnt2 = 256 - (int)((float)F_CPU * 0.001 / prescaler);

to

TCNT2 = MsTimer2::tcnt2;

C is case sensitive. The first line sets tcnt2 in the MsTimer2 namespace. The second line changes the timer register. So he does not rely on things happening just right. He just created a race condition. This happens even to the best :wink: The tricky part is to find and eliminate them.

Thanks for the compliments. I am proud of this analysis as well. Now the way is paved to the most resilent DIY DCF77 clock ever. The whole story will take probably 4 months of blogging :slight_smile:

I understand about the variables, I was just wondering how you got TCNT2 = 6;

Since this:

256 - (16000000 * 0.001 / 64) == 6

I presume the prescaler was 64.

BTW on this page here I set up Timer2 to generate an interrupt at exactly 1 mS intervals. It didn't take a lot of code, and since the timer itself counted to 1 mS you didn't have this overflow issue.

As I already wrote: I have code for this:

ISR(TIMER2_COMPA_vect) {
    process_one_sample();
}

void initTimer2() {
    // Timer 2 CTC mode, prescaler 64
    TCCR2B = (1<<WGM22) | (1<<CS22);
    TCCR2A = (1<<WGM21);
    
    // 249 + 1 == 250 == 250 000 / 1000 ==  (16 000 000 / 64) / 1000
    OCR2A = 249;
    
    // enable Timer 2 interrupts
    TIMSK2 = (1<<OCIE2A);    
}

However I would have preferred the library approach. That way I would not have to explain to my readers how the timers actually work. Creating a recursively phase locked DCF clock will be more than enough to swallow.

Once again, my compliments on not only solving it, but attempting to solve it in the first place.

Many people would say "near enough is good enough" or "it must be a processor bug" but you stuck with it, had confidence in your ability to find it, and did so. It gives one confidence that your other code will be equally well-written. :slight_smile:

In the last 20 years I have seen only one processor bug that caused a program to fail. So in doubt I always assume code is to blame. What made this one particular difficult was the length of my ISR. So thought that the ISR might be to blame as ISRs have a high chance to cause trouble. Thus I was ~3 weeks on the wrong track.