Fast enough?

Greetings all

I'd like to interface a Nano 3.0 to the calibrator interface on a Sevcon Millipak traction controller. The hardware is easy: the Sevcon outputs clock and data lines at TTL levels, so that's a no-brainer. My plan is to connect the clock to an interrupt pin, trigger on the falling edge, and read the data line in the interrupt. The possible gotcha is speed: the clock pulses are roughly 13-14 µs apart (approx. 74,500 Hz), and a packet is 36 bits long. The clock stops when the packet completes, packets repeat at intervals of at least 30 ms, and I don't have to catch every packet. So, considering just one packet for now, will this board be able to catch and stash the bits fast enough?
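As a rough sanity check on those figures, here is a small host-side C sketch (not Arduino code; the macro and function names are hypothetical) that turns the approximate 74,500 Hz clock into a bit period, a packet length, and the share of each 30 ms repeat interval during which the clock is actually running.

```c
/* Figures from the post; the clock rate is approximate. */
#define CLOCK_HZ        74500.0   /* bit clock, Hz */
#define BITS_PER_PACKET 36
#define REPEAT_MS       30.0      /* minimum packet repeat interval */

/* Time between falling clock edges, in microseconds. */
double bit_period_us(double clock_hz)
{
    return 1e6 / clock_hz;
}

/* How long the clock runs for one complete packet, in microseconds. */
double packet_duration_us(double clock_hz, int bits)
{
    return bits * bit_period_us(clock_hz);
}

/* Percentage of the repeat interval during which the clock is active. */
double busy_percent(double clock_hz, int bits, double repeat_ms)
{
    return 100.0 * packet_duration_us(clock_hz, bits) / (repeat_ms * 1000.0);
}
```

With these numbers a packet lasts roughly 0.48 ms, so even at the fastest repeat rate the line is idle about 98% of the time; only the spacing between individual bits is tight.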

I've had three thoughts on the interrupt service routine. First:

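volatile unsigned char x[36];  // one byte per received bit
volatile int index = 0;        // next free slot; int is 2 bytes on the AVR,
                               // so read it in loop() with interrupts disabled

// Attach with attachInterrupt(digitalPinToInterrupt(clockPin), isr1, FALLING);
// clockPin (hypothetical name) must be pin 2 or 3 on a Nano; pin is the data input.
void isr1(void)
{
  if (index < 36)                   // ignore extra clocks rather than overrun x[]
    x[index++] = digitalRead(pin);  // convenient, but takes a few µs per call
}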

Downside: eats a lot of precious RAM (36 bytes just for the buffer).
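If the one-byte-per-bit array is used, the 36 samples can be collapsed into a single integer once the packet is complete, so the array only serves as a capture buffer. A host-side C sketch, assuming the first bit received is the most significant one (the post does not say which bit order the Millipak uses); `pack_bits` is a hypothetical helper:

```c
#include <stdint.h>

#define PACKET_BITS 36

/* Pack one-byte-per-bit samples (first received bit in bits[0]) into an
   integer. The first bit ends up as bit 35, the last as bit 0. Any non-zero
   sample counts as 1, matching digitalRead() returning HIGH or LOW. */
uint64_t pack_bits(const unsigned char bits[], int count)
{
    uint64_t value = 0;
    for (int i = 0; i < count; i++) {
        value = (value << 1) | (bits[i] ? 1u : 0u);
    }
    return value;
}
```

On the Nano this would run in loop() after the ISR has collected 36 bits, never inside the ISR: 64-bit arithmetic is comparatively expensive on an 8-bit AVR, which is harmless during the idle gap between packets.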

Idea #2:

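volatile unsigned char x[5];   // 40 bits, enough for 36
volatile int bitcount = 0;

void isr2(void)
{
  // Shift the whole 40-bit buffer left by one, carrying each byte's top bit
  // into the byte above it, then put the new bit in bit 0 of x[0].
  x[4] = (x[4] << 1) | (x[3] >> 7);
  x[3] = (x[3] << 1) | (x[2] >> 7);
  x[2] = (x[2] << 1) | (x[1] >> 7);
  x[1] = (x[1] << 1) | (x[0] >> 7);
  x[0] = (x[0] << 1) | digitalRead(pin);
  bitcount++;
}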

Advantages: uses a lot less RAM.
Disadvantages: slower (too slow?)
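An alternative to the five-byte shift chain is to write each bit straight into its final byte, using the bit count as an index, so each interrupt touches one byte instead of five. This is a host-side C sketch of that technique with hypothetical names (`store_bit`, `reset_packet`, `packet`); note that the layout differs from isr2(): here the first bit received becomes the MSB of packet[0].

```c
#include <stdint.h>

#define PACKET_BITS 36

/* Hypothetical capture state: 5 bytes hold 40 bits, enough for 36.
   bitcount is a single byte, so the AVR reads and writes it atomically. */
volatile uint8_t packet[(PACKET_BITS + 7) / 8];
volatile uint8_t bitcount = 0;

/* ISR body: bit n lands in byte n / 8 at position 7 - n % 8.
   Clocks beyond PACKET_BITS are ignored instead of overrunning the buffer. */
void store_bit(uint8_t level)
{
    if (bitcount < PACKET_BITS) {
        if (level) {
            packet[bitcount >> 3] |= (uint8_t)(0x80u >> (bitcount & 7u));
        }
        bitcount++;
    }
}

/* Clear the buffer before the next packet. On the AVR, call this from
   loop() with interrupts disabled so the ISR cannot run halfway through. */
void reset_packet(void)
{
    for (uint8_t i = 0; i < sizeof packet; i++) {
        packet[i] = 0;
    }
    bitcount = 0;
}
```

The buffer must start out cleared, because store_bit() only ever sets bits; that is why reset_packet() runs between packets.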

And lastly:

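volatile unsigned char bit;
volatile unsigned char bitavailableflag = 0;  // one byte, so updates are atomic
unsigned char x[5];
int bitcount = 0;

void isr3(void)
{
  bit = digitalRead(pin);
  bitavailableflag = 1;
}

void loop(void)
{
  // ...
  if (bitavailableflag == 1)
  {
     // This must be reached within one bit period (~13 µs) every time,
     // or the next clock edge overwrites 'bit' before it is stored.
     x[4] = (x[4] << 1) | (x[3] >> 7);
     x[3] = (x[3] << 1) | (x[2] >> 7);
     x[2] = (x[2] << 1) | (x[1] >> 7);
     x[1] = (x[1] << 1) | (x[0] >> 7);
     x[0] = (x[0] << 1) | bit;
     bitavailableflag = 0;
     bitcount++;
  }
  // ...
}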

Advantages: fastest ISR.
Concerns: uses more memory than #2 but less than #1; will loop() get to each bit fast enough?
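The risk with the third idea is that loop() must notice the flag within one bit period every time; if anything else in loop() takes longer, the next edge silently overwrites the previous bit. Below is a host-side sketch of the same flag handshake with a hypothetical overrun counter added, so that a late loop() can at least be detected (isr3_sample, take_bit and overruns are illustrative names, not from the post):

```c
#include <stdint.h>

/* Hypothetical shared state between the ISR and loop(). */
volatile uint8_t bit_value = 0;
volatile uint8_t bit_available = 0;
volatile uint8_t overruns = 0;   /* bits overwritten before loop() read them */

/* ISR body: as in isr3(), but count the case where loop() has not yet
   consumed the previous bit. */
void isr3_sample(uint8_t level)
{
    if (bit_available) {
        overruns++;
    }
    bit_value = level;
    bit_available = 1;
}

/* loop() side: returns 1 and stores the bit in *out if one is waiting.
   On the AVR, wrap the body in noInterrupts()/interrupts() so an edge
   cannot arrive between reading bit_value and clearing the flag. */
int take_bit(uint8_t *out)
{
    if (!bit_available) {
        return 0;
    }
    *out = bit_value;
    bit_available = 0;
    return 1;
}
```

If overruns changes during a packet, that packet is corrupt and can be dropped; since not every packet has to be caught, that may be acceptable, but frequent overruns would point back towards storing the bit inside the ISR.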

Any thoughts and advice would be welcome.

Board?

Nano 3.0 (I mentioned that): ATmega328 @ 16 MHz

ATmega328 @ 16MHz

1 / 16000000 = 6.25E-08 seconds per clock cycle (most AVR instructions take one cycle).
1 / 74500 = 1.34228E-05 seconds per bit.
1.34228E-05 / 6.25E-08 = ~214 clock cycles per bit.
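The same budget in integer form, for example as a quick sanity check. This is a host-side sketch with a hypothetical function name; it counts clock cycles rather than instructions, since some AVR instructions take two or more cycles.

```c
#include <stdint.h>

/* Whole CPU clock cycles between two bit-clock edges (rounded down). */
uint32_t cycles_per_bit(uint32_t cpu_hz, uint32_t bit_hz)
{
    return cpu_hz / bit_hz;
}
```

On the Nano itself the Arduino build defines F_CPU (16000000 here), so cycles_per_bit(F_CPU, 74500) gives the same 214. Interrupt entry and exit, and every call to digitalRead(), are paid for out of this budget.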

sizeof(x) + sizeof(index) = 36 + 2 = 38 bytes (int is 2 bytes on the AVR).
Total SRAM = 2048 bytes.
38 / 2048 = 1.9% of available memory.
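The memory figure can be checked with sizeof, as long as it is applied to the whole array (sizeof(x)); sizeof(x[36]) would only give the size of one element. Because int is 2 bytes on the AVR but usually 4 on a desktop, this host-side sketch uses fixed-width stand-ins (hypothetical names capture and capture_index) so it reproduces the AVR answer anywhere.

```c
#include <stdint.h>

/* Stand-ins for the first idea's globals; int16_t matches the AVR's 16-bit int. */
uint8_t capture[36];
int16_t capture_index;

#define AVR_SRAM_BYTES 2048u   /* ATmega328 SRAM */

/* Bytes used by the capture buffer plus its index. */
unsigned capture_bytes(void)
{
    return (unsigned)(sizeof(capture) + sizeof(capture_index));
}

/* Share of the ATmega328's SRAM, in percent. */
double capture_percent(void)
{
    return 100.0 * capture_bytes() / AVR_SRAM_BYTES;
}
```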

214 cycles per bit is a bit snug, since interrupt overhead and the digitalRead() call both come out of it, but 1.9% of available memory leaves plenty of free space. Go with the simplest and fastest solution (the first).