Well in my test, the generated code for this sketch:
volatile byte savedPort;
byte bar;

void foo ()
{
  bar = savedPort;
}

ISR (PCINT0_vect)
{
  savedPort = PINB;
  foo ();
}  // end of PCINT0_vect

void setup ()
{
  // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
}

void loop () {}
had only this prologue before the port was saved:
ISR (PCINT0_vect)
 100:   1f 92         push  r1
 102:   0f 92         push  r0
 104:   0f b6         in    r0, 0x3f    ; 63
 106:   0f 92         push  r0
 108:   11 24         eor   r1, r1
 10a:   8f 93         push  r24
{
  savedPort = PINB;
 10c:   83 b1         in    r24, 0x03   ; 3
 10e:   80 93 00 01   sts   0x0100, r24
That isn't much worse than you could do in hand-written assembler. You have to save the status register (SREG, read here from I/O address 0x3f), and before you can do that you have to save R0, the register it gets copied into.
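To put a number on that prologue, here is a rough tally in plain C++ (it runs on a PC, not on the Arduino). The per-instruction cycle counts are my reading of the AVR instruction set for this class of chip (PUSH 2, IN 1, EOR 1), and the 16 MHz clock is an assumption about the board, so check both against your datasheet. The constant and function names are made up for the illustration.

```cpp
// Cycle costs assumed for a classic AVR core (verify against the datasheet).
constexpr int kPushCycles = 2;
constexpr int kInCycles   = 1;
constexpr int kEorCycles  = 1;

// Cycles spent in the compiler-generated prologue shown in the listing,
// i.e. everything that executes before "in r24, 0x03" reads the port.
constexpr int prologueCycles()
{
  return kPushCycles    // push r1
       + kPushCycles    // push r0
       + kInCycles      // in r0, 0x3f  (copy SREG)
       + kPushCycles    // push r0      (save SREG)
       + kEorCycles     // eor r1, r1   (zero register)
       + kPushCycles;   // push r24
}

// Convert a cycle count to microseconds for a given clock frequency in Hz.
constexpr double cyclesToMicros(int cycles, double clockHz)
{
  return cycles * 1000000.0 / clockHz;
}
```

Under those assumptions the prologue costs 10 cycles, which is 0.625 microseconds at 16 MHz. That excludes the time the hardware takes to enter the interrupt and jump through the vector table.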
Now, I'd caution you against getting too carried away with shaving nanoseconds off ISRs. I did a few tests earlier, using this code:
ISR (PCINT0_vect)
{
  PORTB = 4;  // turn on pin 10
}  // end of PCINT0_vect

void setup ()
{
  digitalWrite (9, HIGH);  // pullup
  // pin change interrupts
  PCMSK0 = _BV (PCINT1);  // only want pin 9
  PCIFR  = _BV (PCIF0);   // clear any outstanding interrupts
  PCICR |= _BV (PCIE0);   // enable pin change interrupts for PCINT0..7
  pinMode (10, OUTPUT);
  digitalWrite (10, LOW);
}

void loop () {}
Measuring the time between pin 9 going low (I touched it to ground) and D10 being brought high, which the ISR does as promptly as I could manage, I got these figures on consecutive tests:
1.2500 µs
1.4375 µs
1.5625 µs
1.3750 µs
That's a spread of 0.3125 µs (5 clock cycles at 16 MHz) in what should be a repeatable experiment! I think at least 4 of those can be accounted for by the fact that main does a CALL to loop, and CALL takes 4 clock cycles. Once an instruction starts, it has to finish before the interrupt can be serviced. The 5th is probably down to exactly when the pin changed relative to the clock edges.
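The arithmetic behind those figures can be checked with a few lines of plain C++ (run on a PC, not on the board). This is only a sketch: the 16 MHz clock is inferred from 0.3125 microseconds being 5 cycles, and the names `kClockHz`, `microsToCycles` and `jitterCycles` are invented for the example.

```cpp
// Assumed system clock: 16 MHz, i.e. 62.5 ns per cycle.
constexpr double kClockHz = 16000000.0;

// Convert a (non-negative) duration in microseconds to the nearest whole
// number of clock cycles.
constexpr long microsToCycles(double micros)
{
  return static_cast<long>(micros * kClockHz / 1000000.0 + 0.5);
}

// Spread between the fastest and slowest measured response, in cycles.
constexpr long jitterCycles(double fastestMicros, double slowestMicros)
{
  return microsToCycles(slowestMicros - fastestMicros);
}
```

With the measurements above, the four responses work out to 20, 23, 25 and 22 cycles, and `jitterCycles(1.25, 1.5625)` gives the 5-cycle spread.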
So you already have something like 5 clock cycles of "jitter", and that is without doing anything else. For example, Timer 0 will cause an interrupt. Whether it is higher or lower priority than your pin change isn't the point: once its ISR starts, it has to finish. So that could be another 5 or 6 µs down the drain. And if your code calls millis(), that turns interrupts off briefly, which delays things too.
So with all these variables, whilst it is nice to design for a fast response, all this assembler code might be a bit of overkill.
The longer the delay between the interrupt and the read of the pin state, the less likely it is (under certain circumstances, such as switch bounce) that the port read reflects the state of the port at the time of the interrupt.
With pin change interrupts, which pin changed can be deduced, up to a point, by comparing the current port value to the previous one. Of course the pin could change back quickly, but switches don't tend to bounce that fast. And for other interrupts (e.g. a falling level interrupt), if it fired you know what the new state is.
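Comparing the current value to the previous one might look something like this. It is a minimal sketch in plain C++ rather than code from my tests: the function names are hypothetical, and `previous` and `current` stand for two snapshots of a port such as PINB.

```cpp
#include <stdint.h>

// Bits that differ between two snapshots of the same port.
constexpr uint8_t changedPins(uint8_t previous, uint8_t current)
{
  return static_cast<uint8_t>(previous ^ current);
}

// Bits that were high in the previous snapshot and are low now.
constexpr uint8_t fallenPins(uint8_t previous, uint8_t current)
{
  return static_cast<uint8_t>(previous & ~current);
}

// Bits that were low in the previous snapshot and are high now.
constexpr uint8_t risenPins(uint8_t previous, uint8_t current)
{
  return static_cast<uint8_t>(~previous & current);
}
```

In an ISR you would pass in the last saved value and a fresh read of the port, then save the fresh read for next time. For pin 9 (bit 1 of port B) going from high to low, `fallenPins(0x02, 0x00)` has bit 1 set.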