SPI is a shared bus with the devices wired in parallel: every device gets SCK, MISO, and MOSI connected in parallel, with a unique chip select for each device (a 'device' can be a group of daisy-chained shift registers). Each device needs a tristate output controlled by its chip select so that it does not drive MISO when not selected; one gate of a 74HC125 can do this, for example.
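To see why the tristate matters, here is a rough toy model of the shared MISO line in Python. It is only a sketch: the function names and the `HIGH_Z` marker are made up for illustration, and it assumes an active-low chip select feeding an active-low output enable, as on a 74HC125 gate.

```python
# Toy model of a shared MISO net. All names here are hypothetical.
HIGH_Z = None  # a tristated (disabled) output contributes nothing to the line


def device_output(data_bit, cs_n):
    """Output of one device behind a tristate buffer.

    cs_n is the active-low chip select: 0 = selected, 1 = not selected.
    """
    return data_bit if cs_n == 0 else HIGH_Z


def resolve_miso(outputs):
    """Combine the outputs of every device wired to the same MISO line."""
    drivers = [o for o in outputs if o is not HIGH_Z]
    if not drivers:
        return HIGH_Z  # nobody selected: the line is undriven
    if len(drivers) > 1:
        raise RuntimeError("bus contention: more than one device driving MISO")
    return drivers[0]


# Device A selected, device B deselected: the master reads A's bit.
miso = resolve_miso([device_output(1, cs_n=0), device_output(0, cs_n=1)])
```

With both chip selects low at once, `resolve_miso` raises, which is the contention the tristate buffer is there to prevent.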
You really need a 0.1uF cap at each device's VCC pin in addition to the 1uF cap on the power bus.
Your explanation does not provide insight as to when data is captured and when it is read out.