Interrupt handling question

My project today requires a timer ISR updating pixels, use of millis() for timing/scheduling purposes, and reception of Serial strings (short, but several characters) at 115200 baud.

For an Arduino Nano running at the usual 16 MHz, about how long will the millis interrupt take? (My suspicion is < 5 us; the function it performs is pretty trivial.) And how long will it take for a character to be received and queued by Serial? (My suspicion is < 10 us, or setting baud to 1000000 wouldn't work reliably.)

Any answers would be appreciated; what I'm looking for, though, are answers like "18 us, give or take, depending on the path through the ISR". Even "your suspicions are conservative" would be adequate.

The reason is, I may have to enable, then disable, interrupts within a timer interrupt response; it's got to have a 1.3 ms 'block' of code that executes to completion, to feed data to 32 WS2812 pixels. Right now, we do that all within one interrupt response, but it means we cannot receive Serial at anything better than 4800, due to overruns of the UART. While that baud is acceptable, I'm exploring alternatives, because extending the number of pixels further naturally reduces the usable baud further.
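For scale, here is a rough sketch of how long one character occupies the wire at a given baud rate, assuming the usual 8N1 framing (1 start + 8 data + 1 stop = 10 bits). `charTimeUs` is a name made up for this illustration, and how many character times the receiver can ride out before an overrun depends on the UART's hardware buffering, which is not modelled here.

```cpp
// Time on the wire for one serial character, in microseconds.
// Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit.
constexpr double charTimeUs(unsigned long baud) {
    return 10.0 * 1e6 / static_cast<double>(baud);
}
```

At 4800 baud a character takes about 2083 us, which is longer than the 1.3 ms block. At 115200 baud it takes about 87 us, so the same block spans roughly 15 character times.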

If the answers to my questions are small enough, I could break the 1.3 ms up by sending each pixel's data with an enable/disable interrupts action between each one. I just have to avoid breaks long enough to trigger the 'reset' state in the pixels (> 200 us, as I read the datasheet).
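A minimal sketch of that per-pixel structure, not the project's actual code. The bit-bang routine and the two interrupt hooks are passed in as parameters, so every name here (`Pixel`, `sendFrame`, `sendPixel`, `irqOff`, `irqOn`) is hypothetical; on an AVR the hooks would typically be `noInterrupts()` and `interrupts()`.

```cpp
#include <stddef.h>
#include <stdint.h>

struct Pixel { uint8_t g, r, b; };   // WS2812 takes its data in green, red, blue order

// Sends n pixels, opening a short interrupt window after each one.
// sendPixel(p) : hypothetical bit-bang routine; must run with interrupts off
// irqOff/irqOn : hooks that disable / enable interrupts
template <typename SendPixel, typename IrqOff, typename IrqOn>
void sendFrame(const Pixel* px, size_t n,
               SendPixel sendPixel, IrqOff irqOff, IrqOn irqOn) {
    for (size_t i = 0; i < n; ++i) {
        irqOff();            // timing-critical part: 24 bits, no interruptions
        sendPixel(px[i]);
        irqOn();             // pending ISRs (millis, UART RX) can run here
    }
}
```

On AVR the instruction following `sei` always executes before a pending interrupt is served, so the window needs at least one instruction between enabling and disabling; the loop overhead provides that here. If this runs inside an ISR, enabling interrupts also allows nesting, which needs its own care.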

I think this will work, though if someone sees a fundamental 'oops', such as "you may not vary the clock rate of the data to the WS2812 in that manner", please speak up.
TIA

Stop abusing interrupts. ISRs set flags or do other simple things; everything else has to be done in the main thread.

Wow. Thanks for the advice. All I asked for was a number, not a lecture. Sheesh.

I believe the compiler inserts code before what appears in the ISR body to push a number of registers onto the stack, and then pop them back off at the end. So that will take some time, but I don't have a feel for how much.

Note that it is possible to replace the millis timer0 interrupt with one you write, and to include other things besides the millis update so as to avoid the push/pull overhead from multiple interrupts.

But I think @DrDiettrich is right. If you move the WS2812 updating into the loop(), then nothing will slow that down except for very short ISRs to service the timer and UART, which probably won't be noticeable.

Anyway, if you wanted to measure how long it actually takes to service an interrupt, I think you could set timer1 to run at the full 16MHz processor rate, and set up its "capture" interrupt. The timer count at the instant the trigger occurs could be compared to the count in the middle of your capture ISR. Double that would be close to the time it takes to service an interrupt (the total time interrupts will be disabled).
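The arithmetic for that measurement could look like the sketch below. The helper names are made up for illustration; on an ATmega328P `start` would come from the capture register `ICR1` and `now` from reading `TCNT1` inside the ISR, with Timer1 assumed to run unprescaled at 16 MHz.

```cpp
#include <stdint.h>

// Ticks elapsed between two readings of a free-running 16-bit counter.
// Unsigned 16-bit subtraction stays correct across one wrap-around.
constexpr uint16_t elapsedTicks(uint16_t start, uint16_t now) {
    return static_cast<uint16_t>(now - start);
}

// Converts timer ticks to microseconds for a timer clocked at timerHz.
constexpr double ticksToUs(uint16_t ticks, unsigned long timerHz) {
    return ticks * 1e6 / static_cast<double>(timerHz);
}
```

At 16 MHz each tick is 62.5 ns, so a difference of 16 ticks corresponds to 1 us.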

In case it might be useful, here's "blink" code that replaces the millis interrupt with a new one that interrupts exactly once per millisecond, and takes less time to service.

/*
This features a replacement for the ISR(TIMER0_OVF_vect)
interrupt that drives millis(). It disables the OVF
interrupt, enables the COMPA interrupt, sets OCR0A to 249,
and changes Timer0 to CTC mode.  This results in an
interrupt every 250 timer clock cycles, which is exactly 1ms
for a 16MHz crystal.  For 8MHz, OCR0A is set to 124.  The new
ISR increments millis, but since the interrupt rate is exactly
correct, no periodic double increment of millis is needed.

Using this code probably means you can't do any analog
writes that use Timer0, which would include pins 5 and 6.

Millis() should work as normal at 16MHz or 8MHz.
The effect on micros() is unknown.
*/

extern volatile unsigned long timer0_millis;   //these defined in wiring.c
extern volatile unsigned long timer0_overflow_count;

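const byte MILLIS_INCB = (64 * 250) / (F_CPU / 1000);  // ms per 250 timer ticks at /64 prescale (1 at 16MHz, 2 at 8MHz)

const unsigned long cycleTime = 500;       // toggle LED every 500 ms (one flash per second)
unsigned long oldMillis = 0;               // millis() value at the last toggle
unsigned long newMillis = 0;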

void setup() {                             //Set up alternate interrupt
                                           //   at 249 on timer0

  cli();                                   // disable interrupts while doing this

  TCCR0A = 0;                              // set entire TCCR0A register to 0
  TCCR0B = 0;                              // same for TCCR0B
  TCNT0  = 0;                              // initialize timer0 count to 0

  OCR0A   = (250/MILLIS_INCB) - 1;         // set top of counter (249 for 16MHz, 124 for 8MHz)
  TIMSK0 &= ~bit(TOIE0);                   // disable overflow interrupt
  TCCR0A |= bit(WGM01);                    // turn on CTC mode
  TCCR0B |= bit(CS01) | bit(CS00);         // Set CS00&CS01 bits for prescaler = 64
  TIMSK0 |= bit(OCIE0A);                   // enable timer compare interrupt

  sei();                                   // enable interrupts

  pinMode(13,OUTPUT);
  digitalWrite(13,HIGH);

}


void loop() {                               // flashes LED for 1/2 second
                                            //    every second

  newMillis = millis();
  if ((newMillis - oldMillis) >= cycleTime) {   // >= so a late loop pass cannot miss the toggle
    oldMillis = newMillis;
    digitalWrite(13,!digitalRead(13));    // invert pin 13 state
  }
}


ISR(TIMER0_COMPA_vect) {                    // this is the new ISR - much
                                            //   simpler than original

  timer0_millis++;
  timer0_overflow_count++;                  // probably not needed

}

You should post an overview about your project.

Why is a smaller baudrate not sufficient?
Depending on how many neopixels you want to expand to as the maximum, and on whether you are planning animations on all the pixels or not, suggestions can be made.

Using multiple Neopixel channels would shorten the transfer time.

Using a different microcontroller like an RP2040, which has internal hardware (the PIO module) that can do the bit-banging to the neopixels, would solve the problem too.

Here is the Timer0 ISR. I count 56 instructions, many of which are two-cycle instructions. If we assume that all of the instructions are 2-cycle, and they are all executed, this will take about 7 microseconds. (It is, alas, not particularly well-optimized code, especially WRT maintaining both a millisecond count AND a total interrupt count.)

ISR(TIMER0_OVF_vect)
 6b6:	1f 92       	push	r1
 6b8:	0f 92       	push	r0
 6ba:	0f b6       	in	r0, 0x3f	; status
 6bc:	0f 92       	push	r0
 6be:	11 24       	eor	r1, r1
 6c0:	2f 93       	push	r18
 6c2:	3f 93       	push	r19
 6c4:	8f 93       	push	r24
 6c6:	9f 93       	push	r25
 6c8:	af 93       	push	r26
 6ca:	bf 93       	push	r27
 6cc:	80 91 45 01 	lds	r24, 0x0145	;  <timer0_millis>
 6d0:	90 91 46 01 	lds	r25, 0x0146	;  <timer0_millis+0x1>
 6d4:	a0 91 47 01 	lds	r26, 0x0147	;  <timer0_millis+0x2>
 6d8:	b0 91 48 01 	lds	r27, 0x0148	;  <timer0_millis+0x3>
 6dc:	30 91 44 01 	lds	r19, 0x0144	;  <__data_end>
 6e0:	23 e0       	ldi	r18, 0x03
 6e2:	23 0f       	add	r18, r19
 6e4:	2d 37       	cpi	r18, 0x7D	; 125
 6e6:	58 f5       	brcc	.+86     	; 0x73e <__vector_16+0x88>
 6e8:	01 96       	adiw	r24, 0x01
 6ea:	a1 1d       	adc	r26, r1
 6ec:	b1 1d       	adc	r27, r1
 6ee:	20 93 44 01 	sts	0x0144, r18	;  <__data_end>
 6f2:	80 93 45 01 	sts	0x0145, r24	;  <timer0_millis>
 6f6:	90 93 46 01 	sts	0x0146, r25	;  <timer0_millis+0x1>
 6fa:	a0 93 47 01 	sts	0x0147, r26	;  <timer0_millis+0x2>
 6fe:	b0 93 48 01 	sts	0x0148, r27	;  <timer0_millis+0x3>
 702:	80 91 49 01 	lds	r24, 0x0149	;  <timer0_overflow_count>
 706:	90 91 4a 01 	lds	r25, 0x014A	;  <timer0_overflow_count+0x1>
 70a:	a0 91 4b 01 	lds	r26, 0x014B	;  <timer0_overflow_count+0x2>
 70e:	b0 91 4c 01 	lds	r27, 0x014C	;  <timer0_overflow_count+0x3>
 712:	01 96       	adiw	r24, 0x01
 714:	a1 1d       	adc	r26, r1
 716:	b1 1d       	adc	r27, r1
 718:	80 93 49 01 	sts	0x0149, r24	;  <timer0_overflow_count>
 71c:	90 93 4a 01 	sts	0x014A, r25	;  <timer0_overflow_count+0x1>
 720:	a0 93 4b 01 	sts	0x014B, r26	;  <timer0_overflow_count+0x2>
 724:	b0 93 4c 01 	sts	0x014C, r27	;  <timer0_overflow_count+0x3>
 728:	bf 91       	pop	r27
 72a:	af 91       	pop	r26
 72c:	9f 91       	pop	r25
 72e:	8f 91       	pop	r24
 730:	3f 91       	pop	r19
 732:	2f 91       	pop	r18
 734:	0f 90       	pop	r0
 736:	0f be       	out	0x3f, r0	; 63
 738:	0f 90       	pop	r0
 73a:	1f 90       	pop	r1
 73c:	18 95       	reti
 73e:	26 e8       	ldi	r18, 0x86	; 134
 740:	23 0f       	add	r18, r19
 742:	02 96       	adiw	r24, 0x02
 744:	a1 1d       	adc	r26, r1
 746:	b1 1d       	adc	r27, r1
 748:	d2 cf       	rjmp	.-92     	; 0x6ee <__vector_16+0x38>
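The estimate given before the listing can be written out as a small helper. This is only the upper-bound arithmetic from the post (every instruction counted as two cycles), under a made-up name; it does not include the few cycles the hardware spends entering the interrupt and jumping through the vector table.

```cpp
// Upper bound on execution time, in microseconds, if every instruction
// took cyclesPerInstr clock cycles on a CPU running at cpuHz.
constexpr double worstCaseUs(unsigned instructions,
                             unsigned cyclesPerInstr,
                             unsigned long cpuHz) {
    return instructions * cyclesPerInstr * 1e6 / static_cast<double>(cpuHz);
}
```

For example, `worstCaseUs(56, 2, 16000000UL)` gives 7.0, matching the figure above. A tighter number would need a per-instruction cycle count along the path actually taken.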

And here is the UART receive interrupt code. It's actually somewhat shorter (no 32-bit data to deal with!): about 42 instructions, or at worst about 5.25 us.

  ISR(USART0_RXC_vect)
 652:	1f 92       	push	r1
 654:	0f 92       	push	r0
 656:	0f b6       	in	r0, 0x3f
 658:	0f 92       	push	r0
 65a:	11 24       	eor	r1, r1
 65c:	2f 93       	push	r18
 65e:	8f 93       	push	r24
 660:	9f 93       	push	r25
 662:	ef 93       	push	r30
 664:	ff 93       	push	r31
 666:	e0 91 5d 01 	lds	r30, 0x015D	;  <Serial+0x10>
 66a:	f0 91 5e 01 	lds	r31, 0x015E	;  <Serial+0x11>
 66e:	80 81       	ld	r24, Z
 670:	e0 91 63 01 	lds	r30, 0x0163	;  <Serial+0x16>
 674:	f0 91 64 01 	lds	r31, 0x0164	;  <Serial+0x17>
 678:	82 fd       	sbrc	r24, 2
 67a:	1b c0       	rjmp	.+54     	; 0x6b2 <__vector_18+0x60>
 67c:	90 81       	ld	r25, Z
 67e:	80 91 66 01 	lds	r24, 0x0166	;  <Serial+0x19>
 682:	8f 5f       	subi	r24, 0xFF	; 255
 684:	8f 73       	andi	r24, 0x3F	; 63
 686:	20 91 67 01 	lds	r18, 0x0167	;  <Serial+0x1a>
 68a:	82 17       	cp	r24, r18
 68c:	41 f0       	breq	.+16     	; 0x69e <__vector_18+0x4c>
 68e:	e0 91 66 01 	lds	r30, 0x0166	;  <Serial+0x19>
 692:	f0 e0       	ldi	r31, 0x00	; 0
 694:	e3 5b       	subi	r30, 0xB3	; 179
 696:	fe 4f       	sbci	r31, 0xFE	; 254
 698:	95 8f       	std	Z+29, r25	; 0x1d
 69a:	80 93 66 01 	sts	0x0166, r24	;  <Serial+0x19>
 69e:	ff 91       	pop	r31
 6a0:	ef 91       	pop	r30
 6a2:	9f 91       	pop	r25
 6a4:	8f 91       	pop	r24
 6a6:	2f 91       	pop	r18
 6a8:	0f 90       	pop	r0
 6aa:	0f be       	out	0x3f, r0
 6ac:	0f 90       	pop	r0
 6ae:	1f 90       	pop	r1
 6b0:	18 95       	reti
 6b2:	80 81       	ld	r24, Z
 6b4:	f4 cf       	rjmp	.-24     	; 0x69e <__vector_18+0x4c>

On a related note, the UART TX interrupt is much worse.
For reasons that aren't immediately obvious, the TX ISR is not able to inline the C++ code, and SO it has to save/restore an extra 7 registers to obey the rules of the ABI (an extra 1.75 us, just for that).

That is very widely regarded as a Bad Idea. I agree with @DrDiettrich.

Not answering the question about timing.

A carefully designed serial protocol will handle that. The basics that I use:

  1. Sender sends one byte.
    Receiver echoes/acknowledges.
  2. Sender waits for echo/acknowledge.
    If not all bytes transferred, back to (1).
  3. Done.
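The sender side of those steps might be sketched as below. `Link` stands for any type offering `write(uint8_t)` and `bool read(uint8_t&)`, with `read` returning false on timeout; that interface and the name `sendWithEcho` are assumptions made for this example, not part of any Arduino API.

```cpp
#include <stddef.h>
#include <stdint.h>

// Sends len bytes one at a time, waiting for each to be echoed back.
// Returns the number of bytes the receiver confirmed.
template <typename Link>
size_t sendWithEcho(Link& link, const uint8_t* data, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        link.write(data[i]);                      // (1) send one byte
        uint8_t echo;
        if (!link.read(echo) || echo != data[i]) {
            return i;                             // (2) no echo or wrong echo: stop
        }
    }
    return len;                                   // (3) done
}
```

Because the sender never has more than one byte in flight, the receiver cannot be overrun however long its interrupts stay blocked; the cost is throughput, since every byte waits for a round trip.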

Yes, but I'm limited on that front, the protocol is governed by others. So I must ensure I don't allow overrun by excessive baud, or too-long interrupt blocking. My original approach (overloading an existing interrupt routine with an occasional burst) works fine, but on reflection, it will be worth moving the code over to loop(), and if proven necessary, blocking interrupts for as long as required (likely not at all) - it will depend on the true tolerance of the WS2812 for stretching the clocking.
TBD.
Thanks, on reflection, to @DrDiettrich for making me reconsider the loop() approach - the original interrupt overload works perfectly, but the impact on serial speeds is detrimental. We'll see how the WS2812 handles it.
Thanks also to @westfw for answering the question more thoroughly. The two interrupts appear to be short enough that they will not impact the WS2812 if they occur separately, but it will remain to be seen if it tolerates those if they occur back-to-back. I'll have to introduce some simulated 'blackouts' to test that.
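As a back-of-envelope check of the back-to-back case, the two upper-bound ISR times from this thread can be summed and compared with whatever gap the pixels turn out to tolerate. The function name is made up, and the threshold is a parameter rather than a constant because the true tolerance is exactly what remains to be tested.

```cpp
// True if the worst-case stall between two pixels (both ISRs firing
// back-to-back) stays under the pixel's reset threshold. All times in us.
constexpr bool gapIsSafe(double timerIsrUs, double uartIsrUs,
                         double resetThresholdUs) {
    return (timerIsrUs + uartIsrUs) < resetThresholdUs;
}
```

With the figures above, 7 us + 5.25 us = 12.25 us, so `gapIsSafe(7.0, 5.25, 200.0)` is true under the 200 us reading of the datasheet. The estimate covers only the millis and UART RX interrupts; the TX interrupt or any other enabled ISR would add to the sum.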