My 2 cents...
No two interrupts are serviced simultaneously. If the code handling one interrupt runs long enough that the next interrupt arrives before it finishes, the second interrupt waits for the first to complete.
Similarly, if the main-line code spends a lot of time with interrupts disabled, then an interrupt can wait before it is executed.
I do not recall reading about the priority of interrupt delivery in the datasheet. Then again, I was reading with different questions in mind, and the datasheet has 566 pages, so you might find what you are looking for there.
Given it does not make much practical difference, it might be 'unspecified', meaning the priority can vary between versions of the CPU without notice.
If the timing of an interrupt delivery is really super critical - then you can really only have one interrupt source in the system.
Once an interrupt starts running, its priority relative to the next interrupt is not relevant.
The second interrupt ALWAYS waits until the first completes. The priority only makes a difference once there are two interrupts waiting - at which point the higher priority interrupt waits for a shorter time. At this point we have 3 interrupts running one after the other... and perhaps by the time they are done another interrupt is fired.
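To make the waiting concrete, here is a rough host-side sketch in plain C (it runs on a PC, not on the AVR). The irq_t type, simulate(), and the rule that a lower priority number is served first are all made up for illustration; real hardware may order pending interrupts differently.

```c
#include <assert.h>

/* Hypothetical model, not AVR code. All times are in CPU cycles and are
   assumed to be non-negative. Lower priority number = served first
   (an assumption of this sketch only). */
typedef struct {
    long arrival;   /* cycle at which the request is raised */
    long run;       /* cycles the handler takes             */
    int  priority;  /* lower number wins when several wait  */
    long start;     /* filled in by simulate()              */
    int  done;      /* filled in by simulate()              */
} irq_t;

/* Serve the requests one at a time, never nesting: a running handler
   always finishes before the next one starts. */
static void simulate(irq_t *q, int n)
{
    long now = 0;
    assert(n > 0);
    for (int served = 0; served < n; served++) {
        /* If nothing is pending yet, idle until the earliest request. */
        long earliest = -1;
        for (int i = 0; i < n; i++)
            if (!q[i].done && (earliest < 0 || q[i].arrival < earliest))
                earliest = q[i].arrival;
        if (earliest > now)
            now = earliest;

        /* Among the requests already raised, pick the best priority. */
        int pick = -1;
        for (int i = 0; i < n; i++) {
            if (q[i].done || q[i].arrival > now)
                continue;
            if (pick < 0 || q[i].priority < q[pick].priority)
                pick = i;
        }
        q[pick].start = now;
        q[pick].done = 1;
        now += q[pick].run;   /* handler runs to completion */
    }
}
```

For example, take A arriving at cycle 0 and running for 100 cycles (priority 2), B arriving at cycle 10 (priority 1) and C arriving at cycle 20 (priority 0), each running for 50 cycles. The model starts A at 0, C at 100 and B at 150. Even the highest priority request, C, waits 80 cycles for A to finish; priority only decides that C goes ahead of B.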
If you want super fast interrupts using GNU C/C++, the interrupt latency can be reduced to a few cycles, provided you disable the prologue, manage your register use carefully, limit your code, code in inline assembler, and do not use the Arduino library. *** It is probably simpler to buy a faster processor. ***
The bare metal interrupt for Bill's example is...
// 3 cycles for pin sync
// 3 cycles in the interrupt vector
ISR(INT0_vect, ISR_NAKED)   // naked: no prologue, so r0, r1, SREG etc. are not saved
{
    LED_OFF();  // results in CBI, which does not affect SREG; two or three cycles here
    // cannot do much more here, as no registers are available
    reti();     // a naked ISR has no epilogue, so it must return explicitly
}
So about 1/2 the number of cycles.
If you want to use interrupts as intended in the Arduino environment, to allow your code to be informed of something that happened in the real world, without having to check the state of a pin all over the place, then the extra 50+ cycles should not matter.
They represent about 3 (16 MHz) or 6 (8 MHz) microseconds...
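The arithmetic behind those figures is just cycles divided by clock frequency. A minimal sketch in plain C for a PC (cycles_to_us is a name invented here, not part of the Arduino or avr-libc API):

```c
#include <assert.h>

/* Convert a CPU cycle count to microseconds for a clock given in Hz.
   Hypothetical helper for illustration only. */
static double cycles_to_us(unsigned long cycles, unsigned long f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    return (double)cycles * 1e6 / (double)f_cpu_hz;
}
```

With 50 cycles this gives 3.125 microseconds at 16 MHz and 6.25 microseconds at 8 MHz, which is where the rounded 3 and 6 come from.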
============
Look up the ATmega168 on the Atmel website. There is a link to the instruction set there.
There are special instructions for accessing bytes with a variety of address ranges - it is hard to give a flat statement of the number of instructions per line of code.
The best way is to look at the avr-objdump output for your compiled code... if you search for avr-objdump on the forum you will find discussions on how to do this on your platform.