What is the layout of your matrix? For example: 32x32, 10x100, 8x125, or 16x62?
32x32 is probably the most efficient: 4 shift registers along each side.
The 4 along the "side" of the matrix source 20 mA of current to the anodes of each of the 32 rows,
and the 4 along the "bottom" of the matrix control 32 transistors, each sinking up to 640 mA (32 LEDs × 20 mA) from its column, one column at a time.
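To see why 32x32 comes out ahead, here is a rough budget for each candidate layout. This is a minimal sketch, assuming 8-bit serial-in/parallel-out shift registers, rows sourced and columns sunk one at a time as described above, and 20 mA per LED. The function name `matrix_budget` is hypothetical.

```python
import math

LED_CURRENT_MA = 20  # per-LED current from the post
REGISTER_BITS = 8    # assumption: 8-bit shift registers (one output per bit)


def matrix_budget(rows, cols):
    """Hypothetical helper: return (shift registers, peak column sink current
    in mA, duty cycle) for a matrix whose rows are sourced and whose columns
    are sunk one at a time."""
    registers = math.ceil(rows / REGISTER_BITS) + math.ceil(cols / REGISTER_BITS)
    column_current_ma = rows * LED_CURRENT_MA  # worst case: every LED in the active column lit
    duty_cycle = 1 / cols                      # each column is driven 1/cols of the time
    return registers, column_current_ma, duty_cycle


for rows, cols in [(32, 32), (10, 100), (8, 125), (16, 62)]:
    regs, column_ma, duty = matrix_budget(rows, cols)
    print(f"{rows}x{cols}: {regs} registers, {column_ma} mA per column, duty {duty:.4f}")
```

With these assumptions 32x32 needs 8 registers, versus 15 for 10x100, 17 for 8x125, and 10 for 16x62. The trade-off is the highest peak column current (640 mA), far more than a logic-level shift register output can sink directly, which is why each column goes through its own transistor.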
Spread the LEDs out however you'd like.
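The one-column-at-a-time scan can be sketched as building one 64-bit word per column for a daisy chain of all 8 registers. This assumes the 4 row (anode) registers sit at the high end of the chain and the 4 column (transistor) registers at the low end, bytes shifted out most significant first, and a latch pulse after each full word; real bit order depends on your wiring. `scan_words` and the framebuffer layout are hypothetical.

```python
ROWS = 32
COLS = 32
ROW_MASK = (1 << ROWS) - 1


def scan_words(framebuffer):
    """Yield one 64-bit word per column.

    framebuffer[c] is an int whose bit r is set when LED (row r, column c)
    should be on. The high ROWS bits drive the row (anode) registers; the
    low COLS bits drive the column transistor registers.
    """
    for c in range(COLS):
        row_bits = framebuffer[c] & ROW_MASK  # anodes to source while column c is active
        col_select = 1 << c                   # one-hot: only transistor c conducts
        yield (row_bits << COLS) | col_select


# Example: a checkerboard pattern, converted to the 8 bytes to shift out per column.
checkerboard = [0x55555555 if c % 2 == 0 else 0xAAAAAAAA for c in range(COLS)]
for word in scan_words(checkerboard):
    data = word.to_bytes(8, "big")  # shift these out MSB first, then pulse the latch
```

Refreshing all 32 columns fast enough to avoid visible flicker is left to the microcontroller loop; each LED is only lit 1/32 of the time, which is worth keeping in mind when judging brightness.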