Setting bits of shift register not working as intended

I have been trying all weekend to find the problem with this code and board.
I know it is a lot to read, but I hope someone can help me here, because I am really at a loss by now.

The whole thing controls a 16x16 RGB matrix split into four 8x8 matrices. Setting the bits for the colors works fine, but the row multiplexing doesn’t seem to work: the first row is displayed on all 8 rows of the segment. I have gone over the boards several times, compared them with my drawing, and checked whether any lines are connected that should not be. Everything is the way it should be.
From my point of view the code should work. At least I can’t find any error in it, so I don’t see where my mistake is.

The first attachment is a Fritzing image of the ATmega328 board.
The white wires in it are connected to the row board.
The second attachment shows the row board, which uses a shift register.
The third attachment is the complete sketch code.
The fourth attachment shows the pinout of the ATmega328.

The relevant parts for the rows are these:

void loop() {
  if (displayOn) {
    cli();
    while(true) {
      for(int row = 0; row < 8; row++) {
        PORTB &= ~masterResetByte; // masterReset
        PORTB |= masterResetByte; // masterReset
        
        writeDataRegister(row, r);
        writeDataRegister(row, g);
        writeDataRegister(row, b);
        writeRowRgister(row);
        
        PORTD |= latchByte; // latch
        PORTD &= ~latchByte; // latch
        
        __asm__("nop\n\t");
      }
      if (UCSR0A & _BV(RXC0)) { // check uart  (register name changes per port)
        break;  // looks like there is data.  Break out of loop to handle it
      }
    }
    sei();
  }
}

void writeRowRgister(int row) {
  for(int col = 0; col < 8; col++){
    if(row == col) {
      PORTD |= rowDataByte; // rowData
    } else {
      PORTD &= ~rowDataByte; // rowData
    }
    PORTB |= rowClockByte; // rowClock
    PORTB &= ~rowClockByte; // rowClock
  }
}

atmega.PNG

rowController.PNG

matrixController74HC595.ino (10.1 KB)

I have triple-checked the bits I set in the internal registers too, and as far as I can tell they match the pins I use on the board.

OK, it seems I figured it out. The code was too fast: the ATmega was toggling the pins faster than the shift registers could cope with. So after adding

#define NOP __asm__ __volatile__ ("nop\n\t")

and then a few

NOP;

between setting the data bits and toggling the clock pins, I got the expected result.