I'd rewire it to use 6 PNP transistors (e.g. BC327) to source the columns, and either a TLC5940 or 2x 74HC595 + 2x ULN2803 + resistors to sink the rows. The 6 PNP transistors can be driven (each via a series resistor to its base) either from 6 Arduino pins, or from 3 pins and a 74HC138 3-to-8 decoder.
If using the TLC5940, do the sums to make sure you won't exceed its power dissipation rating at the LED current you want to use.
I was trying to keep the discrete components to a minimum, so maybe a MIC2981? http://micrel.com/_PDF/mic2981.pdf It's a current source driver array (I have no experience with this exact one) and goes for just over $2 on Digikey. It would bring six transistors + six resistors down to one 18-pin DIP chip and socket. And yes, I would like to use six pins from the Arduino to scan instead of a counter, shift register, etc.
Looking at the datasheet, it looks like I need some data I don't have, such as the supply current (Icc) into the TLC. The equation is on page 15: http://www.ti.com/lit/ds/slvs515c/slvs515c.pdf
The problem with that chip is its Vce(sat), i.e. the voltage you lose across it, which is around 2 V.
The dissipation is basically the maximum current you want to drive each LED at, times the number of LEDs you are driving (16 if you are using all the pins), times the voltage drop across each output (i.e. 5 V less the voltage lost in the column driver, less the forward voltage of the LED), plus the power drawn by the TLC5940 internals.
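To make those sums concrete, here is a rough sketch of the calculation in Python. The function name and all the numbers in the example are assumptions for illustration only: a 2 V LED forward voltage, 20 mA per LED, and a 25 mA placeholder for Icc. Take the real Icc and the package's dissipation limit from the TLC5940 datasheet, and the forward voltage from your LED's datasheet.

```python
def tlc5940_dissipation_w(v_supply, v_column_drop, v_led, i_led_a,
                          n_outputs=16, i_cc_a=0.025):
    """Estimate worst-case TLC5940 dissipation in watts.

    v_supply      -- supply voltage feeding the column drivers and the TLC (V)
    v_column_drop -- voltage lost in the column driver (V)
    v_led         -- LED forward voltage (V), assumed value
    i_led_a       -- maximum current per LED (A)
    n_outputs     -- number of outputs in use (16 if all pins are used)
    i_cc_a        -- TLC5940 supply current (A); placeholder, check the datasheet
    """
    # Voltage left across each TLC5940 output once the column driver
    # and the LED have taken their share.
    v_out = v_supply - v_column_drop - v_led
    if v_out < 0:
        raise ValueError("not enough supply voltage to drive the LED")

    # Power burned in the outputs, plus the power drawn by the chip internals.
    p_outputs = i_led_a * n_outputs * v_out
    p_internal = v_supply * i_cc_a
    return p_outputs + p_internal


if __name__ == "__main__":
    # Example with the roughly 2 V drop of the MIC2981 as the column driver.
    p = tlc5940_dissipation_w(v_supply=5.0, v_column_drop=2.0,
                              v_led=2.0, i_led_a=0.020)
    print(f"Estimated dissipation: {p:.3f} W")
```

Note that a larger drop in the column driver actually reduces the dissipation in the TLC5940, but it also eats into the headroom the outputs need to regulate the current, so check the minimum output voltage in the datasheet as well.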