Thanks for the replies, and sorry for the delay!
Is there any reason for not using the SPI hardware?
I didn't know about SPI, thanks for pointing it out. If I got it right, there's a single SPI on the Uno, and it can push data at 8 Mbit/s. I believe the transfer() call to SPI is asynchronous, so that data loads and looping can be done in parallel on the CPU while the SPI hardware feeds the data to the pins?
How much data are you talking about here? Where is it coming from? My limited experience with shift registers is as serial in / parallel out devices i.e. one shift register per pin. Are you writing to multiple shift registers simultaneously?
I'm looking to control ~300 RGB LEDs, where each RGB channel takes 8-bit PWM data. This is a rotating LED cylinder with a radius of ~15 cm, and I'm looking for ~5 mm resolution, so the cylindrical "display" size is 28200 pixels. I would like to have a refresh rate of 20 fps, so that would require 28200*20*3*8 bits/sec = ~13.5 Mbit/s of data pushed from the main board. Of course the specs are adjustable to whatever is feasible to implement, but the higher the better. While I would like to push the data directly to the shift registers, I think I need additional microcontrollers in between, because the cylinder probably needs to rotate faster than the actual update rate to maintain a steady image (e.g. 100 fps).
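To double-check that arithmetic, here's a minimal C sketch of the calculation. The function name is just something I made up for illustration, and it only models raw pixel data with no protocol or latching overhead.

```c
#include <stdint.h>

/* Hypothetical helper: raw bits per second needed to stream every pixel
 * of every frame. Ignores any latch/protocol overhead. */
uint32_t required_bits_per_sec(uint32_t pixels, uint32_t fps,
                               uint32_t channels, uint32_t bits_per_channel)
{
    /* 64-bit intermediate so larger displays don't overflow mid-calculation */
    uint64_t bits = (uint64_t)pixels * fps * channels * bits_per_channel;
    return (uint32_t)bits;
}
```

Calling required_bits_per_sec(28200, 20, 3, 8) gives 13536000, i.e. about 13.5 Mbit/s.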
Theoretically. Note that the ATmega328 on the Uno only has one IO port (PORTD) with a full 8 pins, and those pins include the serial rx/tx pins.
I checked that it should take 8 cycles to move one byte (8 bits) of data from an array to PORTD. If I unroll the loop a few times, I could shave off a couple of cycles for a higher transfer rate:
1) read data from memory to a register (2 cycles)
2) out the register to port D (1 cycle)
3) out 1 to clock pin (1 cycle)
4) out 0 to clock pin (1 cycle)
5) increment data address counter (1 cycle)
6) compare and loop (2 cycles)
So at 16 MHz, 8 bits every 8 cycles should give a transfer speed of 16 Mbit/s.
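Here's a rough C sketch of that cycle budget, including what unrolling might buy. The cycle counts are the ones from my list above; the function name is made up, and I'm assuming that unrolling only spreads the "compare and loop" cost over several bytes while everything else is still paid per byte.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte cycle costs, taken from the step list in the post. */
enum {
    CYCLES_LOAD      = 2, /* read data from memory to a register */
    CYCLES_OUT_DATA  = 1, /* out the register to port D */
    CYCLES_CLOCK_HI  = 1, /* out 1 to the clock pin */
    CYCLES_CLOCK_LO  = 1, /* out 0 to the clock pin */
    CYCLES_INCREMENT = 1, /* increment data address counter */
    CYCLES_LOOP      = 2  /* compare and loop, paid once per loop pass */
};

/* Hypothetical helper: estimated bits per second when `unroll` bytes are
 * written per loop pass. Assumption: only the compare-and-loop cost is
 * shared between the unrolled bytes. */
uint32_t port_bits_per_sec(uint32_t cpu_hz, uint32_t unroll)
{
    assert(unroll > 0);
    uint64_t per_byte = CYCLES_LOAD + CYCLES_OUT_DATA + CYCLES_CLOCK_HI
                      + CYCLES_CLOCK_LO + CYCLES_INCREMENT;
    uint64_t cycles_per_pass = per_byte * unroll + CYCLES_LOOP;
    uint64_t bits_per_pass = 8u * (uint64_t)unroll;
    return (uint32_t)((uint64_t)cpu_hz * bits_per_pass / cycles_per_pass);
}
```

With unroll = 1 this reproduces the 16 Mbit/s figure, and with unroll = 4 it comes out at about 19.7 Mbit/s under the same assumptions.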