16-bit shift register, parallel out...?

I would use SPI.transfer() to send the data to the cube; hardware SPI is way faster than the software-based shiftOut().
Connect SCK to the shift register's clock input and MOSI to its serial data input, and use SS as the shift register latch pin.

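digitalWrite (SS, LOW);   // hold the latch low while shifting
SPI.transfer (byte1);     // first byte out
SPI.transfer (byte2);     // second byte out
digitalWrite (SS, HIGH);  // latch all 16 bits to the outputs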
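
For context, here is a rough sketch of how those four lines might sit in a complete program. It assumes two daisy-chained 8-bit registers of the 74HC595 type that latch on the rising edge of SS, and the default SPI settings. The names highByteOf, lowByteOf and writeRegister16, and the test pattern 0xA55A, are made up for illustration. Which byte has to go first depends on how your registers are chained, so swap them if the outputs come out reversed.

```cpp
#include <stdint.h>

// Hypothetical helpers (not from the original post): split a 16-bit
// pattern into the two bytes that go out over SPI.
uint8_t highByteOf(uint16_t value) { return static_cast<uint8_t>(value >> 8); }
uint8_t lowByteOf(uint16_t value)  { return static_cast<uint8_t>(value & 0xFF); }

#ifdef ARDUINO
#include <SPI.h>

// Shift 16 bits out, then raise SS so the registers latch them.
// Assumption: the high byte is sent first; swap the two transfers
// if your chain is wired the other way round.
void writeRegister16(uint16_t value) {
  digitalWrite(SS, LOW);
  SPI.transfer(highByteOf(value));
  SPI.transfer(lowByteOf(value));
  digitalWrite(SS, HIGH);
}

void setup() {
  pinMode(SS, OUTPUT);       // SS doubles as the latch pin
  digitalWrite(SS, HIGH);
  SPI.begin();               // takes over SCK and MOSI
  writeRegister16(0xA55A);   // arbitrary test pattern
}

void loop() {}
#endif
```

Wrapping the transfer in one function keeps the latch handling in a single place, so the rest of the cube code only deals with 16-bit patterns.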