The PISO SRs will work just fine; the only question is speed when using user C code, and that seems to be open to conjecture at present. Certainly if the function were written in assembler it would be fast enough. Unfortunately the Arduino doesn't have a shiftIn() function to match shiftOut(); a library function like that might have been written in assembly (OK, unlikely, but possible).
shiftOut() is slow; it is NOT coded in assembly.
There is no need to improve the speed
The PISO approach is NOT fine, as A LOT of hardware is involved.
This will need good PCB design.
Not to mention it's a BIG design; the free version of Eagle won't handle it.
Whatever you do, this is going to be a large wiring job, though.
There are two kinds of PISO that apply, the 74HC165 and the 74HC597, and there is no difference between them for your application.
What about the 74HC595?
Most 74*165 or 74*597 chips should work: HCT, LS, AS, F, etc.