"con", yes, but interesting discussion 8)
It's NOT easy to address 1000 discrete I/O interface pins in Arduino, even with a whole boardful of shift registers. However, 20 Mega2560s can instantly address those 1000 I/O pins directly with no external hardware.
How is a boardful of shift registers different in this case than a rackful of 20 Megas?
I think I'd rather have the shift registers - 125 SPI.transfer() calls to read all 1000 bits in while writing all 1000 bits out. No messing with an RS485 bus to read from 20 different slaves.
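A rough sketch of that exchange, just to show the shape of the loop. The names exchangeChain and kChainBytes are made up for illustration, and the transfer function is passed in so the loop can be tried on a PC. On an Arduino you would pass something like `[](uint8_t b) { return SPI.transfer(b); }`. The latch/load strobes for the registers depend on the parts used and are left out.

```cpp
#include <stddef.h>
#include <stdint.h>

// 1000 bits / 8 bits per byte = 125 bytes in the daisy chain (hypothetical name).
constexpr size_t kChainBytes = 125;

// Shift one full frame through the chain: every byte clocked out to the
// output registers clocks one byte in from the input registers at the same time.
// `transfer` is any callable taking a uint8_t and returning the byte received;
// on an Arduino that would wrap SPI.transfer().
template <typename Transfer>
void exchangeChain(Transfer transfer,
                   const uint8_t (&out)[kChainBytes],
                   uint8_t (&in)[kChainBytes]) {
    for (size_t i = 0; i < kChainBytes; ++i) {
        in[i] = transfer(out[i]);
    }
}
```

One call moves all 1000 outputs and all 1000 inputs, which is the whole appeal over polling 20 slaves one after another.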
I have code running now that sends out 45 bytes at a 20 kHz rate; 125 bytes would slow down the overall rate.
At 8 MHz, 125 bytes take 125 µs on the wire, and 1/125 µs = 8 kHz. In practice it's just over 1 µs per byte, so the rate would be a little less than that.
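The arithmetic, as a small sketch that will compile on a PC. It assumes the 8 MHz SPI clock of a 16 MHz Arduino and ignores the per-byte software overhead, so the result is a ceiling, not a measurement. The function names are only for illustration.

```cpp
// Assumed: SPI clock = F_CPU / 2 = 8 MHz on a 16 MHz Arduino.
constexpr double kSpiClockHz = 8e6;
constexpr int kBitsPerByte = 8;

// Ideal time on the wire for `bytes` bytes, in microseconds
// (no allowance for loop or call overhead between bytes).
constexpr double idealFrameMicros(int bytes) {
    return bytes * kBitsPerByte / kSpiClockHz * 1e6;
}

// Upper bound on how many full frames per second can be shifted, in Hz.
constexpr double maxFrameRateHz(int bytes) {
    return 1e6 / idealFrameMicros(bytes);
}
```

idealFrameMicros(125) comes out at 125 µs, so maxFrameRateHz(125) is 8 kHz. For the 45-byte case the same sums give 45 µs and a ceiling of about 22.2 kHz, which fits with it running at 20 kHz.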
I doubt that interfacing to 20 slaves to read 7 bytes from each could achieve similar speeds. You'd have to go to faster interface speeds than the 8 MHz SPI that a 16 MHz Arduino supports.