Treating each 8x8 matrix as an individual matrix would require 32 shift registers, and seems like an overcomplication and a waste of hardware. That's not really the question here though (I have most of that worked out).
Whoops, that's true :-)
I am trying to find a shift register that can source enough current to get nice and bright LEDs while reducing hardware (i.e. tons of transistors and their resistors).
After some quick googling, I found the A6282, a 16-Channel Constant-Current LED Driver. http://www.alldatasheet.com/datasheet-pdf/pdf/239892/ALLEGRO/A6282.html Unfortunately (or not?), it doesn't seem to be available in a DIP package. http://search.digikey.com/scripts/DkSearch/dksus.dll?lang=en&site=US&WT.z_homepage_link=hp_go_button&KeyWords=A6282&x=0&y=0
I haven't searched extensively, so there are probably others out there (as is evident from the first link, under the "related electronics part number" headline).
But if this is not an option, I'm not sure you can avoid transistors of some kind entirely. And resistors need not take up much space, especially if mounted vertically or if you use SMDs (and they're cheap too).
I see you have most of it worked out, and I'm not telling you how to do things at all. I just have some thoughts on the subject. No guarantee regarding correctness either. I'm really not trying to give bad advice here, so if others know better, feel free to jump in!
As I see it, the "basic" setup of a 32 by 32 matrix (with single-color LEDs, or with common-anode RG / RGB LEDs) needs 32 transistors minimum (surprise :p). That is if you wire the LED matrix with the anodes at the rows and the cathodes at the columns, with the TPIC6B595 also at the cathode (column) end. The transistors then go at the rows, sourcing current as needed. Each row transistor needs to be able to supply at least 32 times the current of a single LED/color, of course, at least pulsed, since all 32 LEDs in a row can be lit at once. I'm assuming this will be more than the 500 mA pulsed capability of the TPIC6B595, hence not using it as the row multiplexer (which would require another column driver setup).
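To put rough numbers on that, here is a minimal sketch of the row-current estimate. The 20 mA per LED is purely my assumption for illustration (check your LED's datasheet and your multiplexing scheme); the 500 mA is the pulsed figure mentioned above, and the function names are made up for this example.

```python
# Rough worst-case row current for a multiplexed LED matrix.
# LED_CURRENT_MA is an assumed value for illustration, not a measured one.

COLUMNS = 32                # LEDs per row in the 32 by 32 matrix
LED_CURRENT_MA = 20         # ASSUMPTION: peak current per LED/color, in mA
TPIC6B595_PULSED_MA = 500   # pulsed figure quoted in this post, in mA


def row_peak_current_ma(columns, led_current_ma, colors=1):
    """Worst case: every LED (and every color) in the active row is lit."""
    return columns * led_current_ma * colors


def needs_row_transistor(columns, led_current_ma, driver_limit_ma, colors=1):
    """True if the row current exceeds what the driver could supply itself."""
    return row_peak_current_ma(columns, led_current_ma, colors) > driver_limit_ma


if __name__ == "__main__":
    peak = row_peak_current_ma(COLUMNS, LED_CURRENT_MA)
    print(f"Worst-case row current: {peak} mA")
    print("Row transistor needed:",
          needs_row_transistor(COLUMNS, LED_CURRENT_MA, TPIC6B595_PULSED_MA))
```

With RG or RGB LEDs sharing a row, pass `colors=2` or `colors=3` and the row current scales accordingly, which only strengthens the case for a beefy transistor per row.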
In short: the rows are transistorized. The columns could also have transistors of course, but I'm guessing the TPIC6B595 is enough on its own (which also lowers the component count). Not that some smaller transistors need take that much space. Better yet, use some transistor array ICs (google it), like the M54513P or similar (that one has 8 transistors, 50 mA each, so not enough for the rows though).