16x16 RGB LEDs with 96 registers: good idea?

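It depends on how you want to address them.
If the chips share the bus, you need a separate CS line to each device; then you can address each chip individually. That would be the fastest option, because an update only has to go to the chips that actually changed.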
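A rough simulation of the shared-bus option, assuming 74HC595-style chips where the per-chip CS line acts as the latch (RCLK): every chip sees the same DATA and CLOCK, but only the chip whose latch is pulsed updates its outputs. The names (`write_chip`, `clock_bit`, `pulse_latch`) are made up for illustration, and the arrays stand in for real GPIO writes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 96 74HC595-style chips, DATA and CLOCK wired to all of them,
   plus one latch/CS line per chip (96 pins, which is the catch). */
#define NUM_CHIPS 96

static uint8_t shift_stage[NUM_CHIPS]; /* internal shift register of each chip */
static uint8_t outputs[NUM_CHIPS];     /* latched outputs driving the LEDs */
static unsigned bits_clocked;          /* bus traffic, for comparison */

/* Clock one bit into every chip on the shared bus. */
static void clock_bit(int bit)
{
    for (int c = 0; c < NUM_CHIPS; c++)
        shift_stage[c] = (uint8_t)((shift_stage[c] << 1) | (bit & 1));
    bits_clocked++;
}

/* Pulse one chip's latch/CS line: only that chip updates its outputs. */
static void pulse_latch(int chip)
{
    outputs[chip] = shift_stage[chip];
}

/* Update a single chip: 8 clocks (MSB first) plus one latch pulse. */
void write_chip(int chip, uint8_t value)
{
    assert(chip >= 0 && chip < NUM_CHIPS);
    for (int i = 7; i >= 0; i--)
        clock_bit((value >> i) & 1);
    pulse_latch(chip);
}
```

Changing one chip costs 8 clock pulses no matter how many chips there are; the price is one latch line per chip.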
If they are daisy-chained, you're more or less forced to send out all 96 bytes (16 rows x 6 bytes) every time.
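Where the 96 comes from: 16 x 16 LEDs x 3 colours = 768 bits = 96 bytes, so each row of 16 RGB LEDs is 48 bits, or 6 bytes. Below is a small sketch of the daisy-chain case, assuming one bit per colour channel (no PWM) and 8-bit chips whose serial outputs feed the next chip. The bus is simulated with an array, and `set_pixel`/`refresh` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH 16
#define HEIGHT 16
#define FRAME_BYTES (WIDTH * HEIGHT * 3 / 8) /* 768 bits = 96 bytes */

enum { RED = 0, GREEN = 1, BLUE = 2 };

static uint8_t frame[FRAME_BYTES]; /* one bit per LED colour channel */
static uint8_t chain[FRAME_BYTES]; /* simulated contents of the 96 chips */
static unsigned bytes_sent;

/* Set or clear one colour channel of pixel (x, y).
   Row y occupies bytes y*6 .. y*6+5, i.e. 16 rows x 6 bytes. */
void set_pixel(int x, int y, int color, int on)
{
    assert(x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT);
    assert(color >= RED && color <= BLUE);
    unsigned bit = (unsigned)((y * WIDTH + x) * 3 + color);
    uint8_t mask = (uint8_t)(1u << (bit % 8));
    if (on)
        frame[bit / 8] |= mask;
    else
        frame[bit / 8] &= (uint8_t)~mask;
}

/* Shift one byte into the head of the chain: every chip
   passes its old byte on to the next chip. */
static void shift_byte(uint8_t b)
{
    memmove(&chain[1], &chain[0], FRAME_BYTES - 1);
    chain[0] = b;
    bytes_sent++;
}

/* No addressing in a daisy chain: the whole frame goes out every time.
   The last byte is sent first so that frame[k] ends up in chip k. */
void refresh(void)
{
    for (int i = FRAME_BYTES - 1; i >= 0; i--)
        shift_byte(frame[i]);
    /* real hardware: pulse the single shared latch line here */
}
```

Toggling a single LED still means calling `refresh()` and pushing all 96 bytes through the chain.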

Maybe a compromise: daisy-chain them, but in groups of 4? That works well with a '328.
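A possible shape for the compromise. "Groups of 4" isn't spelled out, so this sketch assumes four daisy chains of 24 chips each, sharing DATA and CLOCK with one latch line per chain (6 pins total), and only resends chains that have changed. The 74HC595-style latching, the dirty-flag scheme and the names (`set_byte`, `flush`) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 96 chips split into NUM_CHAINS daisy chains that share
   DATA and CLOCK, each chain with its own latch line. */
#define FRAME_BYTES 96
#define NUM_CHAINS 4
#define CHAIN_LEN (FRAME_BYTES / NUM_CHAINS) /* 24 chips per chain */

static uint8_t frame[FRAME_BYTES];   /* desired state, one byte per chip */
static uint8_t latched[FRAME_BYTES]; /* simulated chip outputs */
static uint8_t stage[CHAIN_LEN];     /* shift stages; every chain sees the same bus */
static uint32_t dirty;               /* bit c set => chain c must be resent */
static unsigned bytes_sent;

/* Change one chip's byte and mark its chain as dirty. */
void set_byte(int index, uint8_t value)
{
    assert(index >= 0 && index < FRAME_BYTES);
    if (frame[index] != value) {
        frame[index] = value;
        dirty |= (uint32_t)1 << (index / CHAIN_LEN);
    }
}

/* One byte on the shared bus; each stage passes its byte down the chain. */
static void shift_byte(uint8_t b)
{
    for (int i = CHAIN_LEN - 1; i > 0; i--)
        stage[i] = stage[i - 1];
    stage[0] = b;
    bytes_sent++;
}

/* Pulse latch line c: only chain c copies the stages to its outputs. */
static void pulse_latch(int c)
{
    for (int i = 0; i < CHAIN_LEN; i++)
        latched[c * CHAIN_LEN + i] = stage[i];
}

/* Resend only the chains that changed, last byte first. */
void flush(void)
{
    for (int c = 0; c < NUM_CHAINS; c++) {
        if (!(dirty & ((uint32_t)1 << c)))
            continue;
        for (int i = CHAIN_LEN - 1; i >= 0; i--)
            shift_byte(frame[c * CHAIN_LEN + i]);
        pulse_latch(c);
    }
    dirty = 0;
}
```

With this layout a single-pixel change costs 24 bytes instead of 96. If "groups of 4" means four chips per chain instead, set `NUM_CHAINS` to 24: updates drop to 4 bytes, but you need 24 latch lines.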