Multiple Data, Clock, Latch pins

It seems that, in theory, we can set up multiple sets of data, clock, and latch pins. The reason for doing this is to avoid one long daisy chain of shift registers.

I have 60 shift registers, and there are 18 usable pins on the Arduino. 18/3 = 6, so I can set up six sections, each with its own data, clock, and latch pin.

That means I can assign 10 shift registers per section.

Is it possible to code Arduino this way?

If you use the SPI hardware you can load a string of shift registers at eight million bits per second. At that rate it may be faster to load all 60 shift registers via SPI than to load 10 registers with shiftOut().
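As a rough check on that claim, here is a back-of-the-envelope sketch of the ideal load time for a chain. It assumes 8-bit registers (74HC595-style) and no gaps between bytes; the function names are made up for illustration.

```cpp
#include <cstdint>

// Bits needed to fill a chain of 8-bit shift registers (assumed 595-style).
constexpr uint32_t chainBits(uint32_t registers) {
  return registers * 8u;
}

// Ideal load time in microseconds at a given serial clock rate, ignoring
// any gap between bytes (real code will be somewhat slower than this).
constexpr double loadMicros(uint32_t registers, double clockHz) {
  return chainBits(registers) * 1e6 / clockHz;
}
```

loadMicros(60, 8e6) works out to 60 µs for all 60 registers. To compare against shiftOut(), plug in a bit rate you have actually measured on your board; I'm not quoting one here.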

You can share two of the three pins across all the strings of shift registers. If they share clock and data, you only need a unique “Latch” pin for each string. That is how SPI shares the bus: MOSI and SCK (Data and Clock) are shared, but the SlaveSelect (Latch) pin is unique. With 18 pins, the two shared lines leave 16 latch pins, so you could have 16 strings of shift registers.
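To see why a shared data/clock pair with separate latches works, here is a small host-side model (plain C++, not Arduino code). It assumes 595-style registers with a separate shift stage and output stage; SharedBus and its members are hypothetical names for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model: several strings of 595-style registers sharing data
// and clock, each string with its own latch line.
template <std::size_t Strings, std::size_t RegsPerString>
struct SharedBus {
  static_assert(RegsPerString > 0, "each string needs at least one register");

  // shift[s][0] is the register nearest the data pin on string s.
  std::array<std::array<uint8_t, RegsPerString>, Strings> shift{};
  std::array<std::array<uint8_t, RegsPerString>, Strings> outputs{};

  // Clock one byte out. Data and clock are shared, so every string shifts
  // the byte in, whether or not we intend to latch it there.
  void shiftByte(uint8_t b) {
    for (auto &s : shift) {
      for (std::size_t i = RegsPerString - 1; i > 0; --i) s[i] = s[i - 1];
      s[0] = b;
    }
  }

  // Pulse one string's latch: only that string copies its shift stage to
  // its output pins.
  void latch(std::size_t string) { outputs[string] = shift[string]; }

  // Load one string: send its bytes, far register first, then latch it.
  void load(std::size_t string,
            const std::array<uint8_t, RegsPerString> &data) {
    for (std::size_t i = RegsPerString; i > 0; --i) shiftByte(data[i - 1]);
    latch(string);
  }
};
```

Loading string 0 and then string 1 leaves each string's outputs holding its own data, even though both shift stages saw every byte. Only the latch pulse decides which string's outputs change.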

Can't the SS be chained one to the next? CLK to CLK, MISO of each shift register goes to MOSI of the next, last MOSI to the AVR MISO (or don't bother, but you could cycle bytes) and SS latches the whole chain at once. With power and ground, 6 wires should do the set.

I've read here that an UNO should be able to hit 2 MB/sec (16Mb/sec) on SPI. Is that wrong?

With SPI:
SCK goes to all devices in parallel.
Latch (SS) can go to all devices in parallel - MOSI then goes to first device's serial in, its serial out goes to the next device, etc.
For long strings, it can make sense to break the chain up and use for example 4 unique latch signals. MOSI then goes to the first device in each string.

SPI max clock speed is system clock/2, so 8 MHz for SCK on a 16 MHz board.
There are ways to get close to 8 Mbit/s (1 Mbyte/s). You have to avoid loops, write to SPDR (the SPI data register) directly (I pull the data from an array), and code 15 NOPs after each register write to wait out the transfer so the next one can start immediately.
If you watch the data on a scope, you then see the 8 SCK clocks, a short gap where the register write occurs, then the next burst of clocks.
Search for "20 KHz clock" for a post I made yesterday evening with a screenshot showing that.
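The unrolled-write approach described above might look roughly like this. It is a sketch, not the poster's actual code: it assumes SPI has already been set up as master at clock/2 (for example with SPI.begin() and SPI.setClockDivider(SPI_CLOCK_DIV2)), the names spiStart, SPI_WAIT, and sendFour are made up, and the non-AVR branch is only a stand-in so the logic can be compiled on a PC.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>
// Start a transfer by writing the SPI data register directly.
static inline void spiStart(uint8_t b) { SPDR = b; }
// 15 NOPs (the count quoted above) to wait out the 8 SCK clocks.
#define SPI_WAIT() __asm__ __volatile__( \
    "nop\n\tnop\n\tnop\n\tnop\n\tnop\n\t" \
    "nop\n\tnop\n\tnop\n\tnop\n\tnop\n\t" \
    "nop\n\tnop\n\tnop\n\tnop\n\tnop\n\t")
#else
// Host stand-in (assumption, for off-target checking only): log the bytes.
#include <vector>
static std::vector<uint8_t> spiLog;
static inline void spiStart(uint8_t b) { spiLog.push_back(b); }
#define SPI_WAIT() ((void)0)
#endif

// Send four bytes from an array as straight-line code, with no loop, so
// there is no branch overhead between one transfer and the next.
static inline void sendFour(const uint8_t *data) {
  spiStart(data[0]); SPI_WAIT();
  spiStart(data[1]); SPI_WAIT();
  spiStart(data[2]); SPI_WAIT();
  spiStart(data[3]); SPI_WAIT();
}
```

For a longer chain you would repeat the write/wait pair once per byte, then pulse the latch. The NOP count may need tuning on real hardware, since it depends on how many cycles the compiler spends fetching each byte.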

Hmm, didn’t post what I thought I had. The clock in the top trace is 20 kHz; the second trace is the SCK for 42 bytes of data, taking ~47.5 µs to transfer. 1 Mbyte/s would have taken 42 µs.
I can post a shot of the SCKs when I get home if I remember.
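Working the scope numbers above through, as a quick sketch. The 16 MHz CPU clock is an assumption (an Uno-class board), and the constant and function names are just for illustration.

```cpp
// Scope numbers quoted above: 42 bytes in about 47.5 µs.
constexpr double kBytes = 42.0;
constexpr double kMeasuredMicros = 47.5;
constexpr double kCpuMHz = 16.0;  // assumption: 16 MHz board such as an Uno

// Time spent per byte, including the gap where the register write happens.
constexpr double microsPerByte() { return kMeasuredMicros / kBytes; }

// The same figure expressed in CPU clock cycles.
constexpr double cyclesPerByte() { return microsPerByte() * kCpuMHz; }

// Effective throughput in Mbyte/s (bytes per microsecond).
constexpr double mbytesPerSec() { return kBytes / kMeasuredMicros; }
```

That comes to about 1.13 µs per byte, or roughly 18 CPU cycles against the 16 that eight SCK clocks take at clock/2, for an effective rate of about 0.88 Mbyte/s. The couple of extra cycles per byte are the short gap visible between bursts.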

It’s still very, very fast serial compared to 115200 baud including start and stop bits or I2C for all that.

I don’t see why they can’t all be on one latch. They’d be coordinated with no extra code necessary.
If it’s a matter of 1 TTL output should only drive 10 TTL inputs then isn’t that why there are transistors?

Thank you, everyone, for your help.

CrossRoads:
With SPI:
SCK goes to all devices in parallel.

There’s little chance you’ll get away with a fanout of 60 for SCLK!

This is screaming out for a clock distribution network that’s been carefully thought out.

How (physically) large is the daisy chain meant to be? What kind of wiring between sections was being imagined?

The '328P has plenty of drive current and HC shift registers need very little, so it's just capacitance that could be a problem. So maybe slow the clock down some, or add a buffer or two, depending on how things are spread out. I have clocked 20 TPIC6B595 shift registers via SPI across two of these boards at the default speed (4 MHz), 10 on each, with the SPI signals carried from one board to the next, the second having just shift registers.