PORT write vs PIN read timing

Uno at 16MHz - possibly '328 hardware issue/feature

I had a problem with data corruption in this configuration:-
RAM is CY7C199 12ns access time on pins 2..9 for data
Address is from an external counter clocked by CS (pin10) rising
Without the 'dummy' setting of CS I get corruption of the PINB bits
Is there a delay in writing to the PORT relative to reading the PIN?
@16MHz cycle = 62.5ns so plenty of time apparently
Is the read at the start but the write at the end of a cycle?

unsigned char Read_RAM(void)
{
  unsigned char bread;
  PORTB &= ~CS_MSK;          // put CS low
  PORTB &= ~CS_MSK;          // 'dummy' for extra port pin response time?
  bread = (PINB & PB_MSK) | (PIND & PD_MSK);       // Read the data
  PORTB |=  CS_MSK;           // put CS high - autoincrement address
  return(bread);
}

It works OK with the code shown but I would appreciate an explanation.
Is there a better way?
(delayMicroseconds(2) also worked but I think took longer)

Thanks (Hope this is the right place)

Is it possible the problem is with the external address counter? If I understand correctly, you're taking the CS line low, then back high to increment the counter. How long does that line need to be low before it's brought back high again to have the counter increment correctly? And then how long before the increment propagates through the counter?

So far as I know, the Uno timing is straightforward.

ShermanP:
Is it possible the problem is with the external address counter?

Probably, so we need to see its datasheet in order to figure out the minimum clock pulse width (and the related maximum clock frequency).

I don't know how the compiler translates "PORTB &= ~CS_MSK;", but I presume it might take 4 machine instructions:

// Don't take this as actual assembly code, this is just a pseudo-code

in(PORTB, aRegister); // PORTB is volatile, so it is always re-read
ldi(~CS_MSK, anotherRegister); // ~CS_MSK is compiled as a hard-coded constant
and(aRegister, anotherRegister); // Result is written to the first parameter
out(PORTB, aRegister);

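Each of these four instructions takes one CPU cycle on the AVR, so this part should take 4 cycles, or 250 ns at 16 MHz. With a single-bit constant mask and optimisation enabled, avr-gcc normally emits a single cbi instruction instead, which takes 2 cycles (125 ns). Is 125 to 250 ns too fast for your binary counter? That's the question.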

On the other hand, a PINx read should be faster because it involves fewer steps (if the stored value is kept in a register).

theVariable = PINB;

roughly it does this:

// Don't take this as actual assembly code, this is just a pseudo-code

in(PINB, aRegister);

// This extra step will for sure occur if the variable is declared with the volatile keyword
sts(theVariable, aRegister); // a store to RAM; out only addresses I/O registers

That is only two instructions in the worst case: the in takes one CPU cycle and the store to RAM takes two, so between 62.5 and 187.5 ns at 16 MHz.
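
As a rough cross-check of cycle-count estimates like these, the sketch below converts instruction cycles to time at the Uno's 16 MHz clock. The per-instruction cycle counts are assumptions taken from the AVR instruction set manual for the ATmega328P class of core (in, out, ldi and and at one cycle each; cbi, sbi and sts at two) and should be verified there. The function and constant names are made up for illustration.

```c
/* Uno system clock, as stated in the thread. */
#define F_CPU_HZ 16000000.0

/* Assumed cycle counts for a few AVR instructions (check the
   instruction set manual for the exact core in use). */
enum {
    CYC_IN  = 1,  /* read an I/O register            */
    CYC_OUT = 1,  /* write an I/O register           */
    CYC_LDI = 1,  /* load an immediate constant      */
    CYC_AND = 1,  /* register AND register           */
    CYC_NOP = 1,  /* no operation                    */
    CYC_CBI = 2,  /* clear one bit in a low I/O reg  */
    CYC_SBI = 2,  /* set one bit in a low I/O reg    */
    CYC_STS = 2   /* store a register to RAM         */
};

/* Convert a number of CPU cycles to nanoseconds at F_CPU_HZ. */
double cycles_to_ns(unsigned cycles)
{
    return (double)cycles * 1e9 / F_CPU_HZ;
}
```

With these numbers, the four-instruction read-modify-write sequence comes to cycles_to_ns(CYC_IN + CYC_LDI + CYC_AND + CYC_OUT), and a single cbi to cycles_to_ns(CYC_CBI).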

Thanks to all who have replied.
Some extra info I probably should have included:-

Address counter is 74HC590 x2
CS clock to both devices (Counter and latch) active on rising edge
(I know about the 1 clock delay between counter and latch output)

25degC 5V spec for clock setup/width 20ns
Ard pin output CS falling (only enables RAM output)
to RAM data available ~12ns max
[Ard Read data]
Ard pin output CS rising (Address counter clocked here)
to address valid 58ns max
[go round to next read]

Physical trace lengths <2cm

There does seem to be some pattern in that LSB(PB0) is always 1
and (PB1) shows 00110011......
(Test pattern is 'ABCDEFGHIJKLMNOPQRSTUVWXYZ....')
Other bits (PD2..7) seem OK
Maybe PB is read first in my code?

As the counter has the Return, Process and Call to go through there should be time for it to stabilise.
(Probably < 60ns needed)
There is only the RAM access time after CS low (~12ns) needed before reading.

I don't know the ins and outs of the Mega port structure, but in some other devices the input is synchronised to a clock, which effectively delays the detection of the pin state (a NOP is needed).
Otherwise it acts as if the CS port drive cannot swing the pin fast enough to get the RAM to send its data in time, especially if the write to the port happens at the end of a cycle and the read at the start.

I can't see that counter or RAM speed is a problem in this configuration so long as there is at least 1 clock cycle (@16MHz) between writing and reading PORTB.
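
The figures quoted above can be checked against the 62.5 ns cycle with a small helper. This is only a sketch: the helper name is hypothetical, and the delays passed to it are the ones given in this post (12 ns RAM access, 20 ns clock width, 58 ns clock to address valid).

```c
/* One CPU cycle at 16 MHz, in picoseconds, to keep the maths in integers. */
#define CLOCK_PERIOD_PS 62500UL

/* Smallest whole number of CPU cycles that covers delay_ns. */
unsigned long cycles_needed(unsigned long delay_ns)
{
    return (delay_ns * 1000UL + CLOCK_PERIOD_PS - 1UL) / CLOCK_PERIOD_PS;
}
```

Each of the three delays fits within a single cycle, which supports the view that the external parts are not the bottleneck.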

Attached: schematic of ARD, counter and RAM.

Thanks again for your interest.

Checked out the Mega328 datasheet and found that it does indeed have a synchroniser on the PIN inputs. This inserts a delay of between half a cycle and 1.5 cycles between the pin changing and the new value being readable, so a read in the cycle immediately after the PORT write still returns the old pin state.
Thus my extra PORT write (or a NOP) is necessary as a short delay.
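
A simplified model of that synchroniser, as I read the datasheet description: the pin is latched half a cycle before the PINx register updates, and an in instruction reads the register as it stood at the start of its own cycle. The sketch below assumes this model, ignores pin output delay and setup time, and uses hypothetical function names.

```c
/* One CPU cycle at 16 MHz. */
#define CLOCK_PERIOD_NS 62.5

/* Latest time, in ns after the clock edge on which the port write takes
   effect, by which the pin must be stable for an `in` that follows after
   `gap_cycles` intervening cycles to see the new value.
   A negative result means the read samples the pin before the write. */
double pin_deadline_ns(unsigned gap_cycles)
{
    return ((double)gap_cycles - 0.5) * CLOCK_PERIOD_NS;
}

/* Non-zero if data that becomes valid data_valid_ns after the write
   is seen by a read issued gap_cycles later. */
int read_sees_data(double data_valid_ns, unsigned gap_cycles)
{
    return data_valid_ns <= pin_deadline_ns(gap_cycles);
}
```

Under this model a back-to-back write and read (gap of 0) can never see the RAM data, while a gap of one cycle leaves about 31 ns, enough for the 12 ns RAM access. It would also explain why only the PB bits were corrupted: the PIND read comes one cycle later.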

While searching I just found:

  asm ("nop");

as a way of putting in a 1 cycle delay.
Should make the code more understandable.
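
Putting that together, the read routine might look like the sketch below. The mask values are assumptions inferred from the wiring described earlier (data on Uno pins 2..9, i.e. PD2..PD7 plus PB0..PB1, and CS on pin 10, i.e. PB2); the original definitions were not posted. The host stand-ins in the #else branch are hypothetical and only exist so the code also compiles off-target.

```c
#ifdef __AVR__
#include <avr/io.h>
#else
/* Stand-ins for the port registers when compiling on a PC. */
static volatile unsigned char PORTB, PINB, PIND;
#endif

/* Assumed masks, inferred from the described wiring. */
#define CS_MSK 0x04u   /* PB2 = Uno pin 10, RAM chip select / counter clock */
#define PB_MSK 0x03u   /* PB0..PB1 = data bits 0..1 */
#define PD_MSK 0xFCu   /* PD2..PD7 = data bits 2..7 */

unsigned char Read_RAM(void)
{
    unsigned char bread;

    PORTB &= ~CS_MSK;              /* CS low: RAM drives the data pins     */
    __asm__ __volatile__("nop");   /* one cycle for the PINx synchroniser  */
    bread  = PINB & PB_MSK;        /* low two data bits                    */
    bread |= PIND & PD_MSK;        /* upper six data bits                  */
    PORTB |= CS_MSK;               /* CS high: counter advances the address */
    return bread;
}
```

Writing the two port reads as separate statements fixes their order, so it is clear that the PINB read is the one immediately after the nop.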