Delay Needed for External Memory on MEGA 2560

Background:
I'm using a MEGA 2560 to interface with hardware that was originally designed to be controlled by a 68HC11. I can't go into a lot of detail on what the hardware is, but the important part is that I need to interface with, among other things, a parallel 32K NV SRAM chip. For ease of manipulation, I sometimes need to combine two consecutive bytes into a single word. The easy way to do this would be to use word():

NewWord = word(HighByte, LowByte);

This seemed to work most of the time, but one value that I needed to compare against a constant kept getting flagged as bad.
If I went back and dumped it to the serial port byte by byte, it was correct. I tried different ways to create a word from two bytes, even breaking it down into individual steps like:

word temp;
temp = pointer_to_byte[Address_of_high_byte];
temp = temp << 8;
temp = temp | pointer_to_byte[Address_of_low_byte];

I still had the problem.
When I added some debugging code, it magically worked!

word temp;
temp = pointer_to_byte[Address_of_high_byte];
Serial.println(temp, HEX);
temp = temp << 8;
temp = temp | pointer_to_byte[Address_of_low_byte];

This confused me for a while until I tried replacing

Serial.println(temp, HEX);

with

delayMicroseconds(1);

This also fixed the problem, and brings me (finally) to my questions:

Why do I need a 1 µs delay between byte reads? If the memory or latch is too slow, why does the first byte read always work, even without a delay? I haven't gone through the timing diagrams yet, but I suspect my latch (an older 74HC573) is too slow. The NVRAM is 70 ns, so it should be fine. If I can dig up a 74AHC573, I'll try that. The old hardware was designed for a clock below 2 MHz.

I originally set up my external memory like this:

  // Init External Memory
  XMCRB = 0x00; // all 64K, no bus keeper
  XMCRA = 0x8F; // 10001111

  // Bit 7 (SRE) = 1 to enable
  // Bits 6-4 (SRL2:0):
  //   000 = all one sector

  // Wait states:
  //   00 = none
  //   11 = wait two cycles during read/write and wait one cycle before driving out new address
  // Bits 3-2 (SRW11, SRW10) - wait states for upper sector
  // Bits 1-0 (SRW01, SRW00) - wait states for lower sector

Is there any other way to slow down the external memory interface? I don't want to change the system clock, as that would affect all sorts of other things.
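For scale, assume the Mega's standard 16 MHz clock. The slowest wait-state setting (11: two extra cycles during the strobe plus one before the next address) only adds a few cycles per access. That is much shorter than the roughly 1 µs delay that made the reads reliable:

```latex
T_{clk} = \frac{1}{16\,\text{MHz}} = 62.5\,\text{ns}
\qquad
t_{wait,\max} = (2 + 1)\,T_{clk} = 187.5\,\text{ns}
\qquad
\frac{1\,\mu\text{s}}{62.5\,\text{ns}} = 16\ \text{cycles}
```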

Thanks

I can't go into a lot of detail on what the hardware is, but the important part is that I need to interface with, among other things, a parallel 32K NV SRAM chip.

How is it connected to the Mega? Long ribbon cable?

The A/D bus uses short (~4") wires of equal length. I used jumper wires soldered in place for those (I have a protoshield for the MEGA). For the other connections, I switched to thinner wire that I cut and stripped myself, because the jumpers were a tad too big and were ripping up traces when pushed through the holes where the 68HC11 socket used to be.

The ATmega datasheet does caution against using a 74HC-series latch, so that's looking more likely to be the cause.

  XMCRA=0x8F;//10001111

...appears to be correct for what you are trying to do.