They flash as the bits clock through to the final state.
That's usually an issue, I would think, although it looks nice.
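Here is a small simulation of why the outputs flash. It assumes a plain SIPO chip with no output latch, data shifted MSB first, and each clock moving the existing bits one place toward Q7. The function name visibleStates is hypothetical. It records what the outputs show after every clock pulse while a new byte is shifted in.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one 8-bit SIPO register with NO output latch:
// every clock pulse is immediately visible on Q0..Q7.
// Assumption: data goes in MSB first, entering at Q0, and each clock
// shifts the existing bits one place toward Q7.
std::vector<std::uint8_t> visibleStates(std::uint8_t current, std::uint8_t data) {
    std::vector<std::uint8_t> states;
    for (int bit = 7; bit >= 0; --bit) {
        const unsigned in = (data >> bit) & 1u;
        current = static_cast<std::uint8_t>((current << 1) | in);
        states.push_back(current);  // what the LEDs show after this clock
    }
    return states;  // after 8 clocks, states.back() == data
}
```

Shifting 0x81 into a cleared register passes through 0x01, 0x02, ... 0x40 before settling on 0x81, so a lit LED appears to walk across the outputs. On a part with a storage latch, only the final byte reaches the outputs, when the latch is strobed.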

Also, with a 374 you would have to wire each Q to the next D, wouldn't you?
There are a thousand SIPO registers out there; look at the TLC5916 and TLC5927 if you want to drive LEDs directly.
The point is that they (almost) all work the same way, so I don't think that should concern you as the writer of a library, unless you are providing tutorials with diagrams.
I gather you define the number of chips and then, for example, set output 56 high. That sounds nice and simple.
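The following is a minimal sketch of the bookkeeping that implies. It assumes 8-bit chips and that output n maps to chip n / 8, bit n % 8, so output 56 is bit 0 of the eighth chip (index 7). The ShiftChain class and its methods are hypothetical. It also leaves out the step that actually clocks the image out to the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a chain of 8-bit SIPO shift registers.
// Assumption: output n lives on chip n / 8, bit n % 8 (bit 0 = Q0).
class ShiftChain {
public:
    explicit ShiftChain(std::size_t numChips) : bits_(numChips, 0) {}

    std::size_t numOutputs() const { return bits_.size() * 8; }

    // Set or clear one output in the RAM image of the chain.
    void setOutput(std::size_t output, bool high) {
        assert(output < numOutputs());
        const std::size_t chip = output / 8;
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (output % 8));
        if (high) {
            bits_[chip] = static_cast<std::uint8_t>(bits_[chip] | mask);
        } else {
            bits_[chip] = static_cast<std::uint8_t>(bits_[chip] & ~mask);
        }
    }

    bool getOutput(std::size_t output) const {
        assert(output < numOutputs());
        return ((bits_[output / 8] >> (output % 8)) & 1u) != 0;
    }

    std::uint8_t chipByte(std::size_t chip) const {
        assert(chip < bits_.size());
        return bits_[chip];
    }

private:
    std::vector<std::uint8_t> bits_;  // one byte per chip
};
```

With eight chips defined, setOutput(56, true) leaves chip 7 holding 0x01 and every other chip untouched.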
One of my pet gripes with the normal digitalWrite() is that setting a lot of pins means calling it a dozen times. This would be the same but even worse, because each call may have to do 100-odd shifts.
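The cost can be put into numbers under two assumptions: 8-bit chips, and a hypothetical per-pin write that re-shifts the entire chain on every call. The function names here are illustrative only. A 13-chip chain has 104 outputs (the "100-odd" above), so each call costs 104 clock pulses, and a dozen such calls cost 1248.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cost model: a single-pin write re-shifts the whole chain,
// costing one clock pulse per output, i.e. 8 per chip.
constexpr std::size_t clocksPerWrite(std::size_t numChips) {
    return 8 * numChips;
}

// Setting `pins` outputs one call at a time re-shifts the chain every time.
constexpr std::size_t clocksForPinWrites(std::size_t numChips, std::size_t pins) {
    return pins * clocksPerWrite(numChips);
}
```

By the same model, updating 8 pins with one byte-wide call would cost a single 104-clock pass instead of eight.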
I would like to see an extension that writes a whole byte to one of the chips, e.g.
digitalWriteByte(chip_num, val);
and maybe one that writes at an arbitrary output offset and is not tied to chip boundaries:
digitalWriteByteOffset(offset, val);
Both of these would be trivial to write and would add some value, I think.
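Both calls can be sketched against an in-memory image of the chain, with one byte per chip. This assumes output n is bit n % 8 of chip n / 8, and that a separate step (not shown) shifts the image out to the chips. digitalWriteByte() and digitalWriteByteOffset() follow the signatures suggested above. The ChainImage class and its other members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical RAM image of a SIPO chain: one byte per 8-bit chip.
// Assumption: output n is bit (n % 8) of chip (n / 8).
class ChainImage {
public:
    explicit ChainImage(std::size_t numChips) : bytes_(numChips, 0) {}

    // Replace all 8 outputs of one chip in a single call.
    void digitalWriteByte(std::size_t chip_num, std::uint8_t val) {
        assert(chip_num < bytes_.size());
        bytes_[chip_num] = val;
    }

    // Write 8 consecutive outputs starting at any output number; the byte
    // may straddle two chips. Bit 0 of val goes to output `offset`.
    void digitalWriteByteOffset(std::size_t offset, std::uint8_t val) {
        assert(offset + 8 <= bytes_.size() * 8);
        for (unsigned i = 0; i < 8; ++i) {
            setBit(offset + i, ((val >> i) & 1u) != 0);
        }
    }

    std::uint8_t chipByte(std::size_t chip) const {
        assert(chip < bytes_.size());
        return bytes_[chip];
    }

private:
    // Set or clear a single output in the image.
    void setBit(std::size_t output, bool high) {
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (output % 8));
        std::uint8_t& b = bytes_[output / 8];
        b = high ? static_cast<std::uint8_t>(b | mask)
                 : static_cast<std::uint8_t>(b & ~mask);
    }

    std::vector<std::uint8_t> bytes_;
};
```

For example, writing 0xFF at offset 12 sets the top four outputs of chip 1 and the bottom four of chip 2, which the chip-aligned digitalWriteByte() cannot do in one call.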
______
Rob