Difference in time between interrupt start on Mega (2560) and Nano Every (4809)

I have moved a larger program from the traditional Arduinos (UNO, MEGA) to the Nano Every (4809). Basically everything works. However, on my logic analyser I noticed quite a difference in the time between an interrupt being raised and the ISR being entered. On the classical Arduinos it takes less than one microsecond (when compiling with board: "Arduino Mega or Mega 2560"), but on the Every it takes about 6 microseconds (when compiling with MegaCoreX and board: "ATMega 4809 / Pinout: Nano Every") or even 8 microseconds (when compiling with board: "Arduino Nano Every" / Registers Emulation: "None").

I made a test setup with an Arduino Mega 2560 and an Arduino Nano Every. The input signal that triggers the interrupt is in both cases identical. I wrote the following test code:

#define InputSignalPin 19 // INT2 on the Mega 2560 - PF3 on the Nano Every

void test_interrupt(void) {
  // ISR Start
#if defined(__AVR_ATmega2560__)
  PORTF |= (1<<2);
  PORTF &= ~(1<<2);
#endif
#if defined(__AVR_ATmega4809__)
  PORTF.OUT |= PIN2_bm;
  PORTF.OUT &= ~(PIN2_bm);
#endif
}

void setup() {
  // Use PF2 as output pin for the logic analyser
  #if defined(__AVR_ATmega2560__)
  DDRF  |= (1<<2);  // Set PF2 as output
  #endif
  #if defined(__AVR_ATmega4809__)
  PORTF.DIR |= (1 << PIN2_bp);
  #endif

  // Set the Interrupt pin as input
  pinMode(InputSignalPin, INPUT);
  attachInterrupt(digitalPinToInterrupt(InputSignalPin), test_interrupt, FALLING );
}


void loop() {
}

Any idea what might cause this very long time between the time the interrupt signal appears, and the moment the ISR becomes active on the new Arduino Nano Every?

[quote]On the classical Arduino's it takes less than one microsec
[/quote]

I am somewhat suspicious of your measurements. See this discussion by Nick Gammon on interrupt latency.
https://gammon.com.au/interrupts

He found that it took about 3 microseconds to enter the interrupt when using the attachInterrupt() syntax.

The time between recognition of an interrupt and the start of the ISR is used to save the controller state and initialize the ISR environment. On a Mega without an OS this only means pushing some registers onto the stack. On controllers with some kind of RTOS, more system state has to be saved, possibly the interrupt handler has to be looked up, and in the worst case all user code is interpreted (Lua?) and the interpreter state must be saved as well. That's why faster controllers can be slower at interrupt handling than small dumb controllers.

You can find out more about the interrupt timing with two independent interrupts triggered at the same time, each toggling a pin on and off. On most (single core...) controllers the interrupts are serialized, and you can see the beginning and end of the first ISR followed by a delay and then the beginning and end of the second ISR. This way you can measure the time for initialization, execution and finalization of an interrupt handler.

True, but the ATmega4809 boards used by the OP do not run an RT OS.

But yeah, the 4809 "attachInterrupt" on an arbitrary pin is going to be a lot slower than the 2560 code on one of the External Interrupt pins.
The 2560 External Interrupt is a dedicated interrupt for that pin, so it can go pretty directly to the handler.
On the other hand, the 4809 has essentially "enhanced" pin-change interrupts (added edge control), which means it gets one interrupt per port and then has to scan the status to see which pin actually interrupted. Which would be pretty slow, even if optimally written, which it's not :frowning:
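To make the difference concrete, here is a minimal host-side sketch (plain C++, not the actual Arduino core code) of what a shared per-port handler has to do: walk over the interrupt flags and call whatever callback is attached to each flagged pin. The names dispatchPortInterrupt, onPin3 and hitsPin3 are made up for illustration. A dedicated external-interrupt vector skips this loop entirely.

```cpp
#include <cassert>
#include <cstdint>

using voidFuncPtr = void (*)();

// Example callback with a visible side effect (hypothetical, for illustration only).
int hitsPin3 = 0;
void onPin3() { ++hitsPin3; }

// Model of a shared per-port handler: one callback slot per pin of the port.
// intFlags has one bit per pin; returns how many callbacks were invoked.
int dispatchPortInterrupt(std::uint8_t intFlags, voidFuncPtr const table[8]) {
    assert(table != nullptr);
    int called = 0;
    for (std::uint8_t pin = 0; pin < 8; ++pin) {
        // Every bit position is tested, even when only one pin can interrupt.
        if ((intFlags & (1u << pin)) && table[pin] != nullptr) {
            table[pin]();
            ++called;
        }
    }
    return called;
}
```

With only pin 3 flagged (intFlags = 0x08) the loop still tests all eight bit positions before it returns, which is where the extra cycles go.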

But that system supplied handler has to check and fork to the currently attached user handler.

Port based PCINT are handled almost the same on every architecture. If only one pin is interrupt enabled on a port then an interrupt can occur only if that pin was activated, no further arbitration required. It's a matter of project management and coding.

"fork" is a pretty heavy word for what is only a function call (via pointer-to-function) on an AVR.

[quote]Port based PCINT are handled almost the same on every architecture. If only one pin is interrupt enabled on a port then an interrupt can occur only if that pin was activated, no further arbitration required.
[/quote]
Sure, but the ISR used with attachInterrupt() can't assume "only one pin enabled" and has to scan all the pins that MIGHT be enabled. You could do a lot better by providing your own ISR with that "inside knowledge", but in the Arduino world attachInterrupt() already owns those vectors, making it tricky.

(I'm not talking theory, here. I'm talking about the way the AVR interrupt code IS in Arduino...)

Please do not confuse PCINT and external interrupts. A PCINT for a port is handled exclusively in a user ISR, while external interrupts get a handler attached dynamically to the system ISR.

If it is not clear in a project which PCINT pins are used then the project manager should be fired.

Like I said: the 4809 doesn't have "external interrupts" - only a variation of pin-change interrupts, and the Arduino system provides an API for them via attachInterrupt()
(Isn't there also an attachInterrupt()-like library for pin-change interrupts on the older AVRs as well? (yeah - this one: PinChangeInterrupt - Arduino Reference ))

It's all much nicer on the 32bit chips where you can at least theoretically move the vectors into RAM and actually change the first-level ISR. (although I don't think any Arduino code does that :frowning: )

Huh. If you were willing to say "the Every/UW2 supports 5 interrupts on pins 2, 3, 4, 5, and 8" you could get functionality and performance equivalent to the traditional ATmegas (because those are on separate ports A, F, C, B, and E, respectively.) But then you couldn't do anything interrupt-like on the other pins.
(well, you could do the Analog pins (Port D), I guess.)

First, thanks for your reactions. I now realise that the differences in reaction time are indeed caused by the differences in the "interrupt architecture".
The traditional AVR processors have a limited number of "real hardware interrupt" pins, each associated with its own ISR vector and thus its own routine. This makes the 2560 relatively fast.
On the newer processors, like the 4809 (MegaCoreX) and the AVR128DA (DxCore), all pins of a port share the same ISR vector and routine. Therefore the ISR behind the Arduino attachInterrupt() call has to do extra work to determine which pin caused the interrupt, and thus which routine to call for the user-specific part of the handling.

Since my code is supposed to go into a library, I can't make any assumptions about possible other pin interrupts from other libraries or the main sketch. Therefore (as already mentioned in the reactions above), I can't write my own variant of attachInterrupt().

One solution might be to use the Event or the Configurable Custom Logic (CCL) system of these newer processors. There I can bind a pin to the CCL / Event logic. I have to check, however, if this also allows me to directly call my own interrupt handling functions. I'll dive into that. Any thoughts on that are appreciated. :grinning:

It looks like there is ONE CCL interrupt. Is that enough? (I guess you could do more pins by implementing your own scanner to see which pin had changed, and still be quicker than the default attachInterrupt() code.)

Indeed this moves the problem from the pin interrupt routine (attachInterrupt()) to the CCL interrupt routine. The advantage might be that CCLs are (still?) hardly used. Since there are no existing Arduino functions for CCL, the CCL Interrupt vector is still "free" (not in use by some Arduino Code).

While searching I found an interesting explanation on SpenceKonde's DxCore pages on GitHub for the long time attachInterrupt() takes. He also gives some solutions that basically say that (for time-critical code) you had better avoid attachInterrupt() and write the ISR yourself.

An alternative to CCL (or the Event system) could be to separate my library into a slow part (for people who want to choose any available pin), and a fast part, in which an entire Port is reserved for my library and handled by my library. On the 4809 possible candidates are Port B and Port E, which have a limited number of pins anyway.

If you want to experiment, I've made some changes to the MegaCoreX implementation of the attachInterrupt() ISRs to make everything faster. (MegaCoreX only; it seems to have drifted from the Arduino core...)

This hasn't been tested yet; just had its object code peered at pretty carefully.

  1. Implements one "fast" interrupt per port (pins 2, 3, 4, 5, 8, and A5 (19) by default.) These are checked "first", and should be almost as fast as the external interrupt pins on the older AVRs.
  2. Refactor the ISRs to make better use of things that are known at compile time, instead of computing them at run time. For example, the PORTA ISR knows that it's on PORTA, so we don't need to compute that from "PA"
  3. Fiddle with the loop to use fewer registers. And some other stuff. port_interrupt_handler() is now 47 bytes. (used to be 98 bytes!) (overall, the code grows a bit, because the ~30 bytes added to the individual ISRs is repeated 6 times...)
  4. Shortcut the bit test loop if no more bits are left.
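Items 1 and 4 can be sketched roughly as follows. This is a host-side model in plain C++ under stated assumptions, not the code from the zip: handlePort, ScanStats and noopHandler are hypothetical names, and a real ISR would read the port's INTFLAGS register instead of taking the flags as an argument.

```cpp
#include <cassert>
#include <cstdint>

using voidFuncPtr = void (*)();

struct ScanStats {
    int iterations = 0;  // how often the scan loop body ran
    int calls = 0;       // how many callbacks were invoked
};

// Placeholder callback (hypothetical).
void noopHandler() {}

ScanStats handlePort(std::uint8_t flags, std::uint8_t fastPin, voidFuncPtr const* table) {
    assert(table != nullptr && fastPin < 8);
    ScanStats stats;
    const std::uint8_t fastMask = static_cast<std::uint8_t>(1u << fastPin);

    // Item 1: test the designated "fast" pin first, before any loop.
    if ((flags & fastMask) && table[fastPin] != nullptr) {
        table[fastPin]();
        ++stats.calls;
        flags = static_cast<std::uint8_t>(flags & ~fastMask);
    }

    // Item 4: the scan stops as soon as no flag bits are left.
    for (std::uint8_t pin = 0; flags != 0; ++pin) {
        ++stats.iterations;
        if ((flags & 1u) && table[pin] != nullptr) {
            table[pin]();
            ++stats.calls;
        }
        flags = static_cast<std::uint8_t>(flags >> 1);
    }
    return stats;
}
```

If only the fast pin is flagged, the loop body never runs. With pins 2 and 3 of the port both flagged and pin 3 as the fast pin, the loop stops after three iterations instead of eight.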

faster_interrupts.zip (9.0 KB)

Thanks. I will play with it and report back.

In the meantime I've tried to do some precise measurements, using 4 boards.

Measurement Setup
The first two boards show the overhead created by attachInterrupt(): board 1 has a traditional ATmega328 processor (on an Arduino UNO board); board 2 has a modern ATmega4809 (on an Arduino Nano Every board).
The last two boards show the overhead when the ISR is set up directly through the registers. This is the fastest approach possible.
Board 3 is again a "traditional" ATmega (2560 on a Mega board); however, the ISR is now "hardcoded" (by setting EIMSK and EICRA and defining ISR(INT2_vect)). Board 4 has a new ATmega processor (4808 on a Nano Thinary board), and as on the third board the ISR is hardcoded.

The results are shown in the attachment.
At the top you see the input signal; a falling edge triggers the ISR.

Results using attachInterrupt()
On a standard Arduino UNO, the user code within the ISR starts executing after 3.1 microseconds. Because of the additional attachInterrupt() overhead on the new processors (such as the 480x and the DxCore parts), this time more than doubles on newer boards like the Nano Every: the first user code within the ISR starts after 6.8 microseconds.

Hardcoded ISR
On a standard Arduino Mega, the user code within the ISR starts executing after 1.2 microseconds. On newer processors, such as the one on the Nano Every, it takes 1.6 microseconds, which is a little longer.

Conclusion
attachInterrupt() is already quite expensive on traditional ATmega processors, since it extends the time between the moment the ISR is triggered and the start of the user code within the ISR by roughly a factor of 2.5. On new processors the overhead of attachInterrupt() grows to about a factor of 4. Whereas this may be OK for applications in which humans push buttons, the overhead may not be acceptable for applications that have an interrupt every 100 microseconds.
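The factors follow directly from the four measurements above. A small sketch of the arithmetic in plain C++ (the function names are made up; the 100 microsecond period is the example from the conclusion, and only the entry latency is counted, not the run time of the ISR body):

```cpp
#include <cassert>

// Ratio between the entry latency with attachInterrupt() and with a hardcoded ISR.
double overheadFactor(double withAttachInterruptUs, double hardcodedUs) {
    assert(hardcodedUs > 0.0);
    return withAttachInterruptUs / hardcodedUs;
}

// Fraction of each interrupt period that is spent before user code starts.
double shareOfPeriod(double latencyUs, double periodUs) {
    assert(periodUs > 0.0);
    return latencyUs / periodUs;
}
```

overheadFactor(3.1, 1.2) is about 2.6, overheadFactor(6.8, 1.6) is 4.25, and shareOfPeriod(6.8, 100.0) is about 0.068, so close to 7% of every 100 microsecond period is gone before the first line of user code runs.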

For completeness, here is the testcode:

void test_interrupt(void) {
#if defined(__AVR_ATmega328P__)
  PORTD |= (1<<2);
  PORTD &= ~(1<<2);
#endif
#if defined(__AVR_ATmega4809__)
  PORTF.OUT |= PIN2_bm;
  PORTF.OUT &= ~(PIN2_bm);
#endif
#if defined(__AVR_ATmega4808__)
  PORTF.OUT |= PIN2_bm;
  PORTF.OUT &= ~(PIN2_bm);
#endif
}

#if defined(__AVR_ATmega2560__)
ISR(INT2_vect) {
  PORTF |= (1<<2);
  PORTF &= ~(1<<2);
}
#endif

#if defined(__AVR_ATmega4808__)
ISR(PORTF_PORT_vect) {
  PORTF.INTFLAGS=8;                    // we know only PF3 has an interrupt, so that's the only flag that could be set.
  PORTF.OUTSET = PIN2_bm;              // turn PF2 output on  - Instruction takes 1 CPU cycle
  PORTF.OUTCLR = PIN2_bm;              // turn PF2 output off - Instruction takes 1 CPU cycle 
}
#endif


void setup() {
  #if defined(__AVR_ATmega328P__)
  pinMode(2, OUTPUT);                  // Set PD2 as output
  pinMode(3, INPUT_PULLUP);            // Set PD3 as input (INT1, Digital pin 3)
  attachInterrupt(digitalPinToInterrupt(3), test_interrupt, FALLING );
  #warning UNO
  #endif

  #if defined(__AVR_ATmega4809__)
  PORTF.DIR |= (1 << PIN2_bp);         // Set PF2 as output
  pinMode(19, INPUT_PULLUP);           // Set PF3 as input (Digital pin 19)
  attachInterrupt(digitalPinToInterrupt(19), test_interrupt, FALLING );
  #warning 4809
  #endif
  
  #if defined(__AVR_ATmega2560__)
  DDRF  |= (1<<2);                     // Set PF2 as output
  EIMSK |= (1<<INT2);                  // Enable INT2 (PD2 - Digital pin 19)
  EICRA |= (1<<ISC21);                 // Falling
  #warning MEGA
  #endif
  
  #if defined(__AVR_ATmega4808__)
  PORTF.DIR |= (1 << PIN2_bp);         // Set PF2 as output
  pinMode(PIN_PF3,INPUT_PULLUP);       // Set PF3 as input. 
  PORTF.PIN3CTRL=0b00001011;           // PULLUPEN=1, ISC=0x3 => trigger falling
  #warning 4808
  #endif
}


void loop() {
}

I downloaded your code, but it does not compile.
I guess you forgot to include "wiring_private.h" in the attachment.... :wink:

Funny that you use the same pins and "#define approach" as I did for my measurements. :grinning:

Nice that you start from that; I'm also using MegaCoreX as well as the DxCore as a starting point.
Question: do you believe that your code should replace the MegaCoreX / DxCore attachInterrupt() functions?

Huh. It compiles here using either MegaCoreX 1.0.8 or 1.0.9. wiring_private.h should be part of the core: MegaCoreX/megaavr/cores/coreX-corefiles/wiring_private.h at master · MCUdude/MegaCoreX · GitHub

(I know that it does NOT work with the "Arduino MegaAvr" core)

[quote]do you believe that your code should replace the MegaCoreX / DxCore attachInterrupt() functions?
[/quote]
I think so. I started with WInterrupts.c (which is relatively nicely partitioned.)
Note that attachInterrupt() itself doesn't change at all; only the ISR functions.

I'm off on a 30th (!) anniversary trip, so it'll be about a week before I can look at this more.

Hi

I just updated from MegaCoreX 1.0.8 to MegaCoreX 1.0.9 but still get quite some errors. I tried using Pinout: "48 pin standard" as well as "Nano Every".

The error messages (see below) are a bit outside my comfort zone.
But first enjoy the anniversary trip :slight_smile:

The errors that I get are:

fastint2.c: In function '__vector_6':
fastint2.c:153:30: warning: passing argument 1 of 'port_interrupt_handler' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
port_interrupt_handler(&intFunc[portnum*8], intf);
^
fastint2.c:158:1: note: in expansion of macro 'IMPLEMENT_ISR'
IMPLEMENT_ISR(PORTA_PORT_vect, PA, PORTA)
^~~~~~~~~~~~~
fastint2.c:120:13: note: expected 'void (**)(void)' but argument is of type 'void (* volatile*)(void)'
static void port_interrupt_handler(voidFuncPtr *funcTab, uint8_t int_flags)
^~~~~~~~~~~~~~~~~~~~~~
fastint2.c: In function '__vector_34':
fastint2.c:153:30: warning: passing argument 1 of 'port_interrupt_handler' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
port_interrupt_handler(&intFunc[portnum*8], intf);
^
fastint2.c:159:1: note: in expansion of macro 'IMPLEMENT_ISR'
IMPLEMENT_ISR(PORTB_PORT_vect, PB, PORTB)
^~~~~~~~~~~~~
fastint2.c:120:13: note: expected 'void (**)(void)' but argument is of type 'void (* volatile*)(void)'
static void port_interrupt_handler(voidFuncPtr *funcTab, uint8_t int_flags)
^~~~~~~~~~~~~~~~~~~~~~
fastint2.c: In function '__vector_24':
fastint2.c:153:30: warning: passing argument 1 of 'port_interrupt_handler' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
port_interrupt_handler(&intFunc[portnum*8], intf);
^
fastint2.c:160:1: note: in expansion of macro 'IMPLEMENT_ISR'
IMPLEMENT_ISR(PORTC_PORT_vect, PC, PORTC)
^~~~~~~~~~~~~
fastint2.c:120:13: note: expected 'void (**)(void)' but argument is of type 'void (* volatile*)(void)'
static void port_interrupt_handler(voidFuncPtr *funcTab, uint8_t int_flags)
^~~~~~~~~~~~~~~~~~~~~~
fastint2.c: In function '__vector_20':
fastint2.c:153:30: warning: passing argument 1 of 'port_interrupt_handler' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
port_interrupt_handler(&intFunc[portnum*8], intf);
^
fastint2.c:161:1: note: in expansion of macro 'IMPLEMENT_ISR'
IMPLEMENT_ISR(PORTD_PORT_vect, PD, PORTD)
^~~~~~~~~~~~~
fastint2.c:120:13: note: expected 'void (**)(void)' but argument is of type 'void (* volatile*)(void)'
static void port_interrupt_handler(voidFuncPtr *funcTab, uint8_t int_flags)
^~~~~~~~~~~~~~~~~~~~~~
fastint2.c: In function '__vector_35':
fastint2.c:153:30: warning: passing argument 1 of 'port_interrupt_handler' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
port_interrupt_handler(&intFunc[portnum*8], intf);
^
fastint2.c:162:1: note: in expansion of macro 'IMPLEMENT_ISR'
IMPLEMENT_ISR(PORTE_PORT_vect, PE, PORTE)
^~~~~~~~~~~~~
fastint2.c:120:13: note: expected 'void (**)(void)' but argument is of type 'void (* volatile*)(void)'
static void port_interrupt_handler(voidFuncPtr *funcTab, uint8_t int_flags)
^~~~~~~~~~~~~~~~~~~~~~
fastint2.c: In function '__vector_29':
fastint2.c:153:30: warning: passing argument 1 of 'port_interrupt_handler' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
port_interrupt_handler(&intFunc[portnum*8], intf);
^
fastint2.c:163:1: note: in expansion of macro 'IMPLEMENT_ISR'
IMPLEMENT_ISR(PORTF_PORT_vect, PF, PORTF)
^~~~~~~~~~~~~
fastint2.c:120:13: note: expected 'void (**)(void)' but argument is of type 'void (* volatile*)(void)'
static void port_interrupt_handler(voidFuncPtr *funcTab, uint8_t int_flags)
^~~~~~~~~~~~~~~~~~~~~~

I'm back. And I think those warnings are OK.
I will try to do some testing here, and maybe eliminate some of the warnings as well.
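The diagnostics say that the argument &intFunc[...] has type 'void (* volatile*)(void)' while port_interrupt_handler() expects 'void (**)(void)', i.e. the callback table is volatile but the parameter is not. One way to get rid of that, sketched here as host-side C++ rather than the actual fastint2.c source, is to keep the qualifier in the parameter type. The table size of 16 and the names callIfAttached, demoHandler and demoCalls are made up for illustration. Note that C++ rejects the qualifier-dropping call outright, whereas the C compiler only warns.

```cpp
#include <cassert>

using voidFuncPtr = void (*)();

// Stand-in for the core's callback table (assumed volatile, as the diagnostic implies).
static volatile voidFuncPtr intFunc[16] = {};

// Example callback with a visible side effect (hypothetical).
static int demoCalls = 0;
static void demoHandler() { ++demoCalls; }

// The parameter type keeps 'volatile', so &intFunc[n] can be passed
// without discarding the qualifier.
static bool callIfAttached(volatile voidFuncPtr* slot) {
    assert(slot != nullptr);
    voidFuncPtr fn = *slot;  // single volatile read into a local copy
    if (fn == nullptr) {
        return false;
    }
    fn();
    return true;
}
```

Copying the slot into a local before the call means the table entry is read exactly once, even if it is changed elsewhere in the meantime.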

(I did notice that the .zip file ended up with some autosave files and such that may have confused the IDE - it sounds like you got past that.)

While the fastint2.c file is theoretically a replacement for the core WInterrupts file, the current source is designed to compile as a standalone sketch - using different function names should cause the fastint2.c code to be included instead of the core file.

There is a problem with this test code: the ISR bodies are not comparable, so the comparison is unfair.

On the ATmega4809, manipulating the port directly through the PORTF.OUT register is compiled to store instructions (STS/STD).
These need working registers, which adds PUSH and POP instructions to save and restore them.
Your ATmega328P code, by contrast, is compiled to SBI/CBI instructions, which need no working registers and are therefore fast.
No unnecessary PUSH/POP is generated either.
(In that case the code does not need R0, R1 and SREG to be saved either, but that is added automatically unless ISR_NAKED is used.)
To get the same on the ATmega4809, you should use the VPORTF.OUT register.

EDIT:
In addition, the PUSH and SBI/CBI instructions take 2 cycles on the ATmega328P but only 1 cycle on the ATmega4809.
Since the number of cycles from the interrupt being triggered to the jump at the interrupt vector (RJMP/JMP) being executed is the same for both, the ATmega4809 handles interrupts faster than the ATmega328P (or 2560) when the ISR is written without attachInterrupt().


The assembly list is shown for reference.

Arduino MEGA

ISR(INT2_vect) {
  PORTF |= (1 << 2);
  PORTF &= ~(1 << 2);
}
000001a4 <__vector_3>:
 1a4:	1f 92       	push	r1
 1a6:	0f 92       	push	r0
 1a8:	0f b6       	in	r0, 0x3f	; 63
 1aa:	0f 92       	push	r0
 1ac:	11 24       	eor	r1, r1
 1ae:	8a 9a       	sbi	0x11, 2	; 17
 1b0:	8a 98       	cbi	0x11, 2	; 17
 1b2:	0f 90       	pop	r0
 1b4:	0f be       	out	0x3f, r0	; 63
 1b6:	0f 90       	pop	r0
 1b8:	1f 90       	pop	r1
 1ba:	18 95       	reti

19 cycles + reti


Arduino Nano Every

ISR(PORTF_PORT_vect) {
  PORTF.OUTSET = PIN2_bm;
  PORTF.OUTCLR = PIN2_bm;
  PORTF.INTFLAGS = 8;
}
000001a6 <__vector_29>:
 1a6:	1f 92       	push	r1
 1a8:	0f 92       	push	r0
 1aa:	0f b6       	in	r0, 0x3f	; 63
 1ac:	0f 92       	push	r0
 1ae:	11 24       	eor	r1, r1
 1b0:	8f 93       	push	r24
 1b2:	ef 93       	push	r30
 1b4:	ff 93       	push	r31
 1b6:	e0 ea       	ldi	r30, 0xA0	; 160
 1b8:	f4 e0       	ldi	r31, 0x04	; 4
 1ba:	84 e0       	ldi	r24, 0x04	; 4
 1bc:	85 83       	std	Z+5, r24	; 0x05
 1be:	86 83       	std	Z+6, r24	; 0x06
 1c0:	88 e0       	ldi	r24, 0x08	; 8
 1c2:	81 87       	std	Z+9, r24	; 0x09
 1c4:	ff 91       	pop	r31
 1c6:	ef 91       	pop	r30
 1c8:	8f 91       	pop	r24
 1ca:	0f 90       	pop	r0
 1cc:	0f be       	out	0x3f, r0	; 63
 1ce:	0f 90       	pop	r0
 1d0:	1f 90       	pop	r1
 1d2:	18 95       	reti

28 cycles + reti


Arduino Nano Every

ISR(PORTF_PORT_vect) {
  VPORTF.OUT |= PIN2_bm;
  VPORTF.OUT &= ~(PIN2_bm);
  VPORTF.INTFLAGS |= 8;
}
000001a6 <__vector_29>:
 1a6:	1f 92       	push	r1
 1a8:	0f 92       	push	r0
 1aa:	0f b6       	in	r0, 0x3f	; 63
 1ac:	0f 92       	push	r0
 1ae:	11 24       	eor	r1, r1
 1b0:	aa 9a       	sbi	0x15, 2	; 21
 1b2:	aa 98       	cbi	0x15, 2	; 21
 1b4:	bb 9a       	sbi	0x17, 3	; 23
 1b6:	0f 90       	pop	r0
 1b8:	0f be       	out	0x3f, r0	; 63
 1ba:	0f 90       	pop	r0
 1bc:	1f 90       	pop	r1
 1be:	18 95       	reti

15 cycles + reti
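To relate these counts to the logic analyser traces, the cycles can be converted to time. A minimal sketch, assuming both boards run at 16 MHz (the stock clock of the Mega 2560 and the Nano Every; pass a different fCpuHz for other settings). The counts cover only the instructions shown in the listings, so the interrupt response time and the reti come on top. cyclesToMicroseconds is a hypothetical helper name.

```cpp
#include <cassert>

// Convert a CPU cycle count to microseconds for a given clock frequency.
// The 16 MHz default is an assumption about the boards used here.
double cyclesToMicroseconds(unsigned cycles, double fCpuHz = 16e6) {
    assert(fCpuHz > 0.0);
    return static_cast<double>(cycles) * 1e6 / fCpuHz;
}
```

At 16 MHz this gives 1.1875 microseconds for the 19-cycle Mega ISR, 1.75 microseconds for the 28-cycle PORTF version and 0.9375 microseconds for the 15-cycle VPORTF version.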