Driving 32 LEDs as a matrix with TLC5916

I need to drive 32 LEDs. Each needs about 20 mA of current, but my goal is to minimize the total current used.

I am also avoiding the MAX7219 since it is an expensive chip (and I need to build a bunch of these).

My original plan was to just use 4 x TLC5916INE4 (each is an 8-bit shift register with constant-current sinks) and shift out the 32 LED values.
I could also try PWMing the Enable pin to reduce the total current.
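
To make the plan above concrete, here is a rough sketch of the two pieces: packing the 32 LED states into the four bytes that get shifted out, and estimating the average current when the Enable pin is PWMed. The names `packLeds` and `averageCurrentmA` are made up for illustration, and the byte order assumes the chips are daisy-chained with the byte for the last chip sent first. That depends on the wiring, so treat it as an assumption.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit LED pattern into four bytes,
// one per 8-bit driver. Assumption: out[0] is sent first and ends up
// in the chip furthest down the chain, so it carries bits 31..24.
std::array<uint8_t, 4> packLeds(uint32_t leds) {
    std::array<uint8_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<uint8_t>(leds >> (8 * (3 - i)));
    }
    return out;
}

// Hypothetical helper: average total LED current in mA when the outputs
// are enabled for duty/255 of the time (same 0..255 scale as analogWrite).
// Assumption: every lit LED draws the same constant current while enabled.
double averageCurrentmA(int ledsOn, double perLedmA, uint8_t duty) {
    return ledsOn * perLedmA * duty / 255.0;
}
```

On the Arduino, each byte could then go out with `shiftOut(dataPin, clockPin, MSBFIRST, b)`, followed by a pulse on the latch pin. As far as I can tell from the datasheet the TLC5916 output enable is active low, so the PWM value written to that pin would need to be inverted; check this before relying on it.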

Plan B:
Create an LED matrix configuration of 4 rows x 8 columns.
The 8 columns could be sunk by the TLC5916INE4, so then I need to drive the 4 rows. What is the best way to do this?
I can't do it straight off the Arduino pins because each pin can only source 40 mA.
Could I just use another shift register? If so, which one could source enough current?
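
The numbers behind the row-drive problem can be worked out as below. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes all 8 columns in a row can be lit at once at the full 20 mA and that the 4 rows are scanned with equal on-time.

```cpp
// Peak current in mA a row driver has to source when every column
// in that row is lit (assumption: each column sinks perLedmA).
constexpr int rowPeakmA(int columns, int perLedmA) {
    return columns * perLedmA;
}

// Average current in mA through one LED when each row is only on
// for 1/rows of the scan period (assumption: equal time per row).
constexpr double avgLedmA(int perLedmA, int rows) {
    return static_cast<double>(perLedmA) / rows;
}

// True if a single pin limited to pinLimitmA could source a full row.
constexpr bool pinCanDriveRow(int columns, int perLedmA, int pinLimitmA) {
    return rowPeakmA(columns, perLedmA) <= pinLimitmA;
}
```

With 8 columns at 20 mA, `rowPeakmA(8, 20)` gives 160 mA, which is four times the 40 mA pin figure, so each row needs a transistor or a high-side driver of some kind. The multiplexing also means each LED only averages about 5 mA, so it will look dimmer than in the direct-drive plan.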

For driving the rows, see my project: Econo Monome

However, I would go with the TLC5916INE4 solution, as you don't need to multiplex them.

Thanks for that link! There are some useful parts in there I could use in other projects. I think for now I will do what you suggest and stick to the 4 SRs. It should be less of a burden on the code too.