different speeds, serial in / serial out register

I am working on a project that overlays some graphics on a high-definition VGA screen.

The prototype is working well with a bucketful of HCT chips, but now I want to go back to doing it with the Arduino.

The problem is the speed at which a high-definition screen switches. The actual horizontal switching therefore runs from a pair of 74LS165 parallel-in/serial-out shift registers. While I was testing the speed and resolution, their inputs were set with preset links. That works fine, and I have the sync sorted out.

I will now feed the 16 bits from a pair of 595 serial-in/parallel-out registers driven by the micro (which gets us back out of warp-factor speeds).

It seems a long way round, but I can read from the 165 chips 16 times (that is about 350 microseconds) while the 595s are refreshed and latched once (I hope).

(My pixels are 16 lines high.)
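
To make the idea concrete, here is a rough host-side C++ model of the scheme (it runs on a PC, not on the Arduino). It is only a sketch under assumptions: the names Dual595, Dual165, scanLine and rowStaysStable are made up for illustration, and feeding one new bit per scan line is an arbitrary choice, since the real micro could send all 16 bits in a burst at any point during the row. The point it shows is that the 595 has a separate storage latch, so its parallel outputs do not move while the shift stage is being loaded, and the 165s can be reloaded from them on every scan line.

```cpp
#include <cassert>
#include <cstdint>

// Model of two cascaded 595s: a 16-bit shift stage plus a separate
// 16-bit storage latch that drives the parallel output pins.
struct Dual595 {
    uint16_t shiftStage = 0;  // changes on every serial clock
    uint16_t outputs = 0;     // changes only when latch() is called

    void clockIn(bool bit) {
        shiftStage = static_cast<uint16_t>((shiftStage << 1) | (bit ? 1u : 0u));
    }
    void latch() { outputs = shiftStage; }
};

// Model of two cascaded 165s: one parallel load, then 16 serial reads.
struct Dual165 {
    uint16_t stage = 0;

    void parallelLoad(uint16_t pins) { stage = pins; }
    bool clockOut() {
        bool msb = (stage & 0x8000u) != 0;
        stage = static_cast<uint16_t>(stage << 1);
        return msb;
    }
};

// One scan line: reload the 165s from the 595 output pins, then shift the
// 16 bits out. Returns the bits in the order they appeared on the wire.
uint16_t scanLine(const Dual595& sipo, Dual165& piso) {
    piso.parallelLoad(sipo.outputs);
    uint16_t seen = 0;
    for (int i = 0; i < 16; ++i) {
        seen = static_cast<uint16_t>((seen << 1) | (piso.clockOut() ? 1u : 0u));
    }
    return seen;
}

// Displays one pixel row (linesPerRow scan lines) of whatever is currently
// latched, while trickling `next` into the 595 shift stage MSB first, one bit
// per scan line. Latches `next` at the end of the row. Returns true if every
// scan line read back the old pattern, i.e. the slow load never disturbed
// the fast output.
bool rowStaysStable(Dual595& sipo, Dual165& piso, uint16_t next,
                    int linesPerRow = 16) {
    assert(linesPerRow >= 16);  // need at least 16 lines to load 16 bits
    const uint16_t current = sipo.outputs;
    bool stable = true;
    for (int line = 0; line < linesPerRow; ++line) {
        if (line < 16) {
            sipo.clockIn(((next >> (15 - line)) & 1u) != 0);
        }
        if (scanLine(sipo, piso) != current) {
            stable = false;
        }
    }
    sipo.latch();  // new pattern appears only from the next row onwards
    return stable;
}
```

In this model, calling rowStaysStable once per pixel row returns true every time, and the next call to scanLine returns the newly latched pattern. On real hardware the latch pulse would still need to land between scan lines (for example during horizontal blanking), which the model does not check.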

(I will try running the ShiftOut example tonight with the two 595s driving LEDs, to see how long a cycle takes.)

Before I start hooking up the 16 wires between the parallel pins of the 4 chips, does anyone know of a simple 16-bit serial-in/latch/serial-out chip that can do the same thing? That is, the output needs to be readable from the latch many times while the serial input is still loading.

I did a search and think one of the old UART chips might do it, but I got a bit lost with the way they are addressed.