I have to say that I have a hard time understanding your schematic.
I would tie all the clock pins (pin 10) together to your SCK pin, and all the parallel/serial control (latch) pins (pin 9) together to your latch pin. Then chain the data: the 1st CD4021's serial output (pin 3) goes to the 2nd CD4021's serial input (pin 11), and so on down the chain. The last CD4021's serial output goes to your MCU's serial input pin.
The rest of the code would be something like this:
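unsigned char cd4021_read(void) {
  unsigned char mask = 0x80; //msb first: the first bit out is Q8
  unsigned char tmp = 0;
  do {
    digitalWrite(SCK, LOW);
    if (digitalRead(SDIN)) tmp |= mask; //read the bit currently on the serial output
    digitalWrite(SCK, HIGH); //rising edge shifts the next bit out
    mask = mask >> 1;
  } while (mask);
  return tmp;
}
void cd4021_sample(void) {
  digitalWrite(LATCH, HIGH); //pin 9 high: load the parallel inputs
  digitalWrite(LATCH, LOW); //pin 9 low: back to serial shifting
}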
...
In your user code:
...
cd4021_sample(); //load the parallel bits into the shift registers
tmp3 = cd4021_read(); //tmp3 has the bits from the last cd4021
tmp2 = cd4021_read(); //tmp2 has the bits from the 2nd-to-last cd4021
tmp1 = cd4021_read(); //tmp1 has the bits from the 1st cd4021
...
You can certainly put it in a loop if you have lots of registers to read.
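Here is a rough sketch of what that loop could look like. Treat it as an illustration and not tested hardware code: NUM_CHIPS, the pin numbers, the name cd4021_read_chain and all the sim_* bits are my own assumptions. The digitalWrite / digitalRead at the top are host-side stand-ins that imitate a chain of three chips so the loop can be tried on a PC; on a real board you would delete them (along with the LOW/HIGH and pin enums) and use the Arduino ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LOW = 0, HIGH = 1 };
enum { SCK = 2, LATCH = 3, SDIN = 4 }; /* hypothetical pin numbers */
#define NUM_CHIPS 3                    /* hypothetical chain length */

/* --- host-side stand-in for the hardware (assumption, not Arduino code) --- */
static uint8_t sim_inputs[NUM_CHIPS]; /* parallel inputs; index 0 = 1st chip */
static uint8_t sim_regs[NUM_CHIPS];   /* shift register contents */
static int sim_sck = LOW;

static void digitalWrite(int pin, int level) {
    if (pin == LATCH && level == HIGH) {
        /* latch high: copy the parallel inputs into the registers */
        for (int i = 0; i < NUM_CHIPS; i++) sim_regs[i] = sim_inputs[i];
    } else if (pin == SCK) {
        if (level == HIGH && sim_sck == LOW) {
            /* rising clock edge: shift the whole chain one bit toward the mcu */
            for (int i = NUM_CHIPS - 1; i > 0; i--)
                sim_regs[i] = (uint8_t)((sim_regs[i] << 1) | (sim_regs[i - 1] >> 7));
            sim_regs[0] = (uint8_t)(sim_regs[0] << 1);
        }
        sim_sck = level;
    }
}

static int digitalRead(int pin) {
    (void)pin; /* only SDIN is ever read: msb of the last chip in the chain */
    return (sim_regs[NUM_CHIPS - 1] >> 7) & 1;
}
/* --- end of stand-in --- */

static uint8_t cd4021_read(void) {
    uint8_t mask = 0x80, tmp = 0;
    do {
        digitalWrite(SCK, LOW);
        if (digitalRead(SDIN)) tmp |= mask;
        digitalWrite(SCK, HIGH);
        mask >>= 1;
    } while (mask);
    return tmp;
}

static void cd4021_sample(void) {
    digitalWrite(LATCH, HIGH);
    digitalWrite(LATCH, LOW);
}

/* Sample once, then read the chain; out[0] ends up holding the 1st chip. */
static void cd4021_read_chain(uint8_t *out, size_t count) {
    assert(out != NULL);
    cd4021_sample();
    /* the last chip's byte arrives first, so fill the array back to front */
    for (size_t i = count; i-- > 0; )
        out[i] = cd4021_read();
}
```

You would call it as cd4021_read_chain(buf, NUM_CHIPS) with a uint8_t buf[NUM_CHIPS]. Filling the array from the back keeps buf[0] lined up with the 1st chip, same as tmp1 above.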
I would be surprised if you run out of fan-out capabilities when you wire up the chips correctly.