After a day of testing, I think the problem is the TLC5916s, not the 74AC138 3:8 decoders or the MOSFETs.
If I run the interrupt routine at 50 kHz with no TLC5916 commands (so all it's doing is switching from one MOSFET to the next), the circuit draws only 1% less current than displaying a single row with no updates.
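For reference, the bare interrupt routine is roughly this (a minimal sketch, assuming the 74AC138's A/B/C select inputs sit on PD2-PD4; the actual pins and timer setup will differ):

```cpp
#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint8_t row = 0;

// Timer1 compare-match ISR firing at 50 kHz: advance to the next row
// by driving the 74AC138's A/B/C select inputs (assumed on PD2-PD4).
ISR(TIMER1_COMPA_vect) {
  row = (row + 1) & 0x07;                       // wrap through rows 0-7
  PORTD = (PORTD & ~(0x07 << 2)) | (row << 2);  // update only the select bits
}
```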
I added 150 Ω resistors on the gate pins and it didn't seem to make a difference one way or the other, but I'm going to leave them in the design because I trust MarkT more than I trust myself.
Once I started adding the TLC5916 code back in, I saw the current drop off. Just adding the code that pulses the Output Enable pin reduces brightness by about 8%, and the SPI and latch code reduces it another 14%. When I increased the current by reducing the RExt resistors, the dropoff got even worse; this non-linear dropoff explains why I'm seeing the LEDs pulling only 57 mA on average instead of 100 mA.
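For context, the per-row update that causes the dropoff looks roughly like this (a sketch with hypothetical pin assignments; I blank the outputs while shifting and latching so a partial pattern never shows):

```cpp
#include <SPI.h>

const uint8_t OE_PIN = 9;   // TLC5916 /OE, active low (hypothetical pin)
const uint8_t LE_PIN = 10;  // TLC5916 LE latch enable (hypothetical pin)

// Shift a new column pattern into the TLC5916 and latch it to the outputs.
// The outputs are blanked (/OE high) for the whole update, so every
// microsecond spent here is subtracted from the row's on-time.
void updateRow(uint8_t pattern) {
  digitalWrite(OE_PIN, HIGH);  // blank outputs during the update
  SPI.transfer(pattern);       // clock in the new column data
  digitalWrite(LE_PIN, HIGH);  // latch the shift register to the outputs
  digitalWrite(LE_PIN, LOW);
  digitalWrite(OE_PIN, LOW);   // outputs back on for the rest of the slot
}
```

At 50 kHz each row slot is only 20 µs, so even a microsecond or two of blanking per update is a noticeable slice of the duty cycle; that would account for the fixed part of the loss, but not the non-linear part.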
My guess is the TLC5916s do their constant-current regulation by starting out at a low current and slowly ramping up, and pulsing them this fast gives them less time to settle.
All these tests were at 5 V. Now that I know the problem isn't the decoders or the MOSFET selection, I believe I can safely test higher voltages, as long as I stay within the thermal limits of the chips.
Thanks to everybody for your input.