What are the limitations of expanding inputs and outputs using shift registers?

I'm currently using two 8-bit parallel-in, serial-out (PISO) shift registers to read and decode digits on 7-segment displays. It has me thinking: how would I use this method for serial data inputs, with a separate serial signal on each parallel input of the shift register? It would be like multiplexing, but instead of reading one input stream at a time, switching between them, and returning a byte for each input, the register would return bytes that each hold one bit from every input stream.

The advantage over multiplexing would be that it requires only 3 pins (clock, load, and data) no matter how many inputs there are. A multiplexer would require n select pins (plus a data pin) for 2^n inputs.

The inputs would need a clock signal. Assuming they all share a common clock, that signal could go into the load pin. That clock would then be multiplied by the number of parallel inputs and fed to the clock input of the register and also to the Arduino. Then only two pins would be needed on the Arduino: clock and data. Possibly only the data pin would be needed if the clock rate is known, but I'm not sure whether that is feasible.

The data retrieved could be put into a matrix and turned 90 degrees to produce the original inputs.
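
To make the "turn the matrix 90 degrees" step concrete, here is a minimal sketch of it as a bit-matrix transpose in plain C++. The function name `transposeSamples` is hypothetical, and the code assumes that bit i of each byte read from the register is the level on parallel input i and that each stream sends its most significant bit first; a different wiring or bit order would need the indices adjusted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// samples[t] is the byte shifted out of the register at sample time t.
// Assumption: bit i of each byte is the level seen on parallel input i.
// Returns one byte per input, with the earliest sample ending up in the
// most significant position once 8 samples have been processed.
std::array<std::uint8_t, 8> transposeSamples(const std::vector<std::uint8_t>& samples) {
    assert(samples.size() <= 8);  // each output byte holds at most 8 samples
    std::array<std::uint8_t, 8> streams{};
    for (std::uint8_t sample : samples) {
        for (std::size_t input = 0; input < 8; ++input) {
            const unsigned bit = (sample >> input) & 1u;
            // Append the newest bit on the right of this input's byte.
            streams[input] = static_cast<std::uint8_t>((streams[input] << 1) | bit);
        }
    }
    return streams;
}
```

For example, if input 0 carries the byte 0xA5 and input 1 is held high, the eight samples read from the register are 3, 2, 3, 2, 2, 3, 2, 3, and the function returns 0xA5 for input 0 and 0xFF for input 1.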

The sequence diagram might not be totally accurate but I think it helps get the idea across.

The only limit on the number of inputs I can see is the clock speed, since the required shift clock scales linearly with the number of inputs. For an n-bit register, the output clock needs to be n times faster than the input clock. That might be a job for a PLL or something, but at what point is it better to just use another Arduino instead of all this?
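
The scaling can be written down as a rough budget check. This is only a sketch: the function names are hypothetical, and it ignores the time taken by the load pulse and any setup and hold requirements of the register.

```cpp
#include <cassert>
#include <cstdint>

// Shift clock needed so that all registerBits are shifted out within one
// period of the shared input clock. Ignores the load pulse duration.
std::uint64_t requiredShiftClockHz(std::uint64_t inputClockHz, unsigned registerBits) {
    assert(registerBits > 0);
    return inputClockHz * registerBits;
}

// True if the required shift clock does not exceed what the reader can handle.
bool fitsShiftClock(std::uint64_t inputClockHz, unsigned registerBits,
                    std::uint64_t maxShiftClockHz) {
    return requiredShiftClockHz(inputClockHz, registerBits) <= maxShiftClockHz;
}
```

With 16 inputs each running at 9600 bits per second, the shift clock would need to be at least 153,600 Hz.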

Is it feasible to design a general-purpose input expander like this? I assume a similar method using SIPO (serial-in, parallel-out) registers could be used to expand the outputs.

Search Google. They already exist.

As you said, the limitation is how fast you want to read the inputs. You can read a single input more than 10,000 times per second, but if you expand it to 100 inputs, then you only get about 100 readings per second on each one. For buttons and LEDs, that's probably okay.

It's usually better to use another Arduino if the inputs are in local groups. Then you put one Arduino per group (say, per building in a multi-building campus) and transmit the data to the master over a long-distance serial link. If everything is local, like a single box with 100 wires coming out of it for an art project, then port expanders are great. If the inputs are all distributed (one in each street light on campus), then it's a trickier problem.

The shift clock is the SPI clock, which by default is the CPU clock divided by 4 (CLK/4), so each bit takes 4 CPU cycles before the next one is read.
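
As a rough illustration of what CLK/4 means for scan rate, the sketch below computes an upper bound. The function names are hypothetical, the 16 MHz CPU clock in the example is an assumption (typical of Uno-class boards), and the result ignores the load pulse and all software overhead between bytes, so real figures will be lower.

```cpp
#include <cassert>
#include <cstdint>

// SPI bit clock for a given CPU clock and SPI clock divider.
std::uint32_t spiBitClockHz(std::uint32_t cpuClockHz, std::uint32_t divider) {
    assert(divider > 0);
    return cpuClockHz / divider;
}

// Upper bound on complete scans per second when inputCount bits are
// shifted in per scan. Ignores the load pulse and software overhead.
std::uint32_t maxScansPerSecond(std::uint32_t bitClockHz, std::uint32_t inputCount) {
    assert(inputCount > 0);
    return bitClockHz / inputCount;
}
```

Assuming a 16 MHz CPU, CLK/4 gives a 4 MHz bit clock, so 100 inputs could in principle be scanned at most 40,000 times per second before overhead is counted.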