Looking at the generated assembler for my ISR:
// SPI interrupt routine
ISR (SPI_STC_vect)
118: 1f 92 push r1
11a: 0f 92 push r0
11c: 0f b6 in r0, 0x3f ; 63
11e: 0f 92 push r0
120: 11 24 eor r1, r1
122: 8f 93 push r24
124: 9f 93 push r25
126: ef 93 push r30
128: ff 93 push r31
{
byte c = SPDR; // grab byte from SPI Data Register
12a: 9e b5 in r25, 0x2e ; 46
// add to buffer if room
if (pos < sizeof buf)
12c: 80 91 76 01 lds r24, 0x0176
130: 84 36 cpi r24, 0x64 ; 100
132: 78 f4 brcc .+30 ; 0x152 <__vector_17+0x3a>
{
buf [pos++] = c;
134: 80 91 76 01 lds r24, 0x0176
138: e8 2f mov r30, r24
13a: f0 e0 ldi r31, 0x00 ; 0
13c: ee 5e subi r30, 0xEE ; 238
13e: fe 4f sbci r31, 0xFE ; 254
140: 90 83 st Z, r25
142: 8f 5f subi r24, 0xFF ; 255
144: 80 93 76 01 sts 0x0176, r24
// example: newline means time to process buffer
if (c == '\n')
148: 9a 30 cpi r25, 0x0A ; 10
14a: 19 f4 brne .+6 ; 0x152 <__vector_17+0x3a>
process_it = true;
14c: 81 e0 ldi r24, 0x01 ; 1
14e: 80 93 77 01 sts 0x0177, r24
} // end of room available
} // end of interrupt routine SPI_STC_vect
152: ff 91 pop r31
154: ef 91 pop r30
156: 9f 91 pop r25
158: 8f 91 pop r24
15a: 0f 90 pop r0
15c: 0f be out 0x3f, r0 ; 63
15e: 0f 90 pop r0
160: 1f 90 pop r1
162: 18 95 reti
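I count about 32 instructions in there. Several of them (push, pop, lds, sts and st) take two clock cycles each on classic 8-bit AVR cores, so the usual path through the ISR comes to roughly 50 clock cycles, plus the 4 to enter the interrupt and the 4 to leave. So although this does little more than store the data in an array, it takes more clock cycles than we have to hand.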
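As a rough cross-check, the tally can be done mechanically. The sketch below is only that, a sketch: it assumes the per-instruction costs from the AVR instruction set manual for a classic 8-bit core (push, pop, lds, sts and st at 2 cycles, a taken branch at 2, most other instructions at 1), and it follows the common path where the byte is stored and is not a newline. The struct, table and function names are made up for illustration. Because several instructions cost two cycles, the result is noticeably higher than a plain instruction count.

```c
#include <stddef.h>

/* One entry per instruction in the listing, with its ASSUMED cycle cost
   (classic 8-bit AVR core; check the instruction set manual for your part). */
struct insn { const char *text; unsigned cycles; };

static const struct insn prologue[] = {
    {"push r1", 2}, {"push r0", 2}, {"in r0, 0x3f", 1}, {"push r0", 2},
    {"eor r1, r1", 1}, {"push r24", 2}, {"push r25", 2},
    {"push r30", 2}, {"push r31", 2},
};

/* Common path: room in the buffer, byte is not '\n'. */
static const struct insn body[] = {
    {"in r25, 0x2e", 1}, {"lds r24, 0x0176", 2}, {"cpi r24, 0x64", 1},
    {"brcc (not taken)", 1}, {"lds r24, 0x0176", 2}, {"mov r30, r24", 1},
    {"ldi r31, 0x00", 1}, {"subi r30, 0xEE", 1}, {"sbci r31, 0xFE", 1},
    {"st Z, r25", 2}, {"subi r24, 0xFF", 1}, {"sts 0x0176, r24", 2},
    {"cpi r25, 0x0A", 1}, {"brne (taken)", 2},
};

static const struct insn epilogue[] = {
    {"pop r31", 2}, {"pop r30", 2}, {"pop r25", 2}, {"pop r24", 2},
    {"pop r0", 2}, {"out 0x3f, r0", 1}, {"pop r0", 2}, {"pop r1", 2},
};

/* Add up the cycle column of one table. */
static unsigned sum_cycles(const struct insn *v, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i].cycles;
    return total;
}

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Cycles spent between interrupt entry and the reti. */
unsigned isr_cycles_common_path(void)
{
    return sum_cycles(prologue, COUNT(prologue))
         + sum_cycles(body, COUNT(body))
         + sum_cycles(epilogue, COUNT(epilogue));
}

/* Same, plus the 4 cycles to enter the interrupt and the 4 for reti. */
unsigned isr_cycles_with_overhead(void)
{
    return isr_cycles_common_path() + 4 + 4;
}
```

Calling isr_cycles_common_path() from a small test program on a PC gives the figure for the ISR itself, and isr_cycles_with_overhead() adds the entry and exit cost. The newline path costs a couple of cycles more (the brne falls through to an ldi and an sts), and the buffer-full path is shorter.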