8x8x8 multiplexed LED cube with an Arduino Mega 2560

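1, 2, 3: Yes, an Arduino. Connections would be via SPI, as I previously described.
Specifically (Uno pin numbers, with the Mega 2560 hardware SPI pins in parentheses):
D13/SCK (D52 on a Mega 2560) to SRCK on all chips
D11/MOSI (D51 on a Mega 2560) to SerDataIn on the 1st chip, then SerOut goes to SerDataIn of the next chip, and so on down the chain
D10/SS (D53 on a Mega 2560) to RCK on all chips
SerClr/ to +5V on all chips
OE/ to Gnd on all chips. You could also experiment with connecting OE/ to a PWM output instead, for dimming.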
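
To make the daisy-chain wiring above concrete, here is a rough model in plain C++ (not an Arduino sketch) of how bytes move through the chain. It assumes 9 chips (8 for the cathodes plus 1 for the layer transistors); the names `ShiftChain`, `transferByte` and `latch` are made up for illustration. In a real sketch `transferByte` would correspond to one `SPI.transfer()` call, and `latch` to pulsing the RCK line.

```cpp
#include <stddef.h>
#include <stdint.h>

// Assumption: 8 cathode registers + 1 layer register, all daisy-chained.
const size_t kChips = 9;

// Byte-level model of the chain. Index 0 is the chip wired to MOSI.
struct ShiftChain {
    uint8_t shift[kChips];    // contents of each chip's shift stage
    uint8_t outputs[kChips];  // what each chip's output pins currently show

    ShiftChain() {
        for (size_t i = 0; i < kChips; ++i) { shift[i] = 0; outputs[i] = 0; }
    }

    // Eight SRCK clocks: every chip hands its old byte to the next chip
    // via SerOut -> SerDataIn, and the first chip takes the new byte.
    void transferByte(uint8_t b) {
        for (size_t i = kChips - 1; i > 0; --i) shift[i] = shift[i - 1];
        shift[0] = b;
    }

    // Rising edge on RCK: all chips copy their shift stage to the outputs
    // at the same moment, so the LEDs never see half-shifted data.
    void latch() {
        for (size_t i = 0; i < kChips; ++i) outputs[i] = shift[i];
    }
};
```

The point to take away is the ordering: the first byte sent ends up in the chip farthest from the Arduino, so the data for the last chip in the chain has to be sent first.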
4. 64 LEDs * 20mA = 1.28A for a fully lit layer. I don't know of a PNP equivalent to the ULN2803 that can handle this.
I would suggest a P-channel MOSFET such as this

I also realize some resistors are missing. See the corrected schematic below.
5. Yes, software does the multiplexing as I described earlier: one layer's transistor is turned on at a time, while each cathode is pulled low for an LED that should be on and left undriven for one that should be off.
With each layer on for 4ms, one pass over all 8 layers takes 32ms, which gives a refresh rate of about 31Hz. Better results may be seen if the transistor shift register is written separately from the cathode shift registers; you'd have to experiment some and see.
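
The timing arithmetic and the layer selection can be sketched like this. It is plain C++ rather than a full sketch, and the helper names are hypothetical. It also assumes that a 1 bit in the layer register turns that layer's transistor on, which depends on how the P-channel MOSFET gates are actually driven.

```cpp
#include <stdint.h>

const int kLayers = 8;

// Refresh rate of the whole cube when each layer is lit for onTimeMs
// milliseconds in turn: one full pass takes kLayers * onTimeMs.
double refreshHz(double onTimeMs) {
    return 1000.0 / (kLayers * onTimeMs);
}

// Fraction of the time any single LED is actually lit.
double dutyCycle() {
    return 1.0 / kLayers;
}

// One-hot byte for the layer register.
// Assumption: bit n set means "layer n transistor on".
uint8_t layerSelect(int layer) {
    return static_cast<uint8_t>(1u << layer);
}
```

For example, `refreshHz(4.0)` gives 31.25, and `layerSelect(3)` gives 0x08. Each LED is only on 1/8 of the time, so the cube will look dimmer than a single LED driven continuously at the same current.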
6. I would keep the image of the cube in a 64-byte array (8 bytes per layer, one bit per LED), writing out 8 bytes at a time for a layer using SPI.
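
One possible layout for that 64-byte array, as a minimal sketch. The mapping here (byte `z*8 + y` holds row y of layer z, bit x is the column) is my assumption; the real mapping depends on how the columns are wired to the shift register outputs. The function names are made up for illustration.

```cpp
#include <stdint.h>

// Assumed layout: image[z * 8 + y] is row y of layer z, bit x = column x.

// Turn a single LED on or off in the cube image.
void setVoxel(uint8_t image[64], int x, int y, int z, bool on) {
    uint8_t mask = static_cast<uint8_t>(1u << x);
    if (on) {
        image[z * 8 + y] |= mask;
    } else {
        image[z * 8 + y] &= static_cast<uint8_t>(~mask);
    }
}

// Read back the state of a single LED.
bool getVoxel(const uint8_t image[64], int x, int y, int z) {
    return ((image[z * 8 + y] >> x) & 1u) != 0;
}

// Copy out the 8 cathode bytes for layer z. In a real sketch these are
// the 8 bytes that would go out over SPI while that layer is selected.
void layerBytes(const uint8_t image[64], int z, uint8_t out[8]) {
    for (int y = 0; y < 8; ++y) {
        out[y] = image[z * 8 + y];
    }
}
```

Animation code then only touches the array with `setVoxel`, and the refresh code only reads it one layer at a time, so the two stay independent.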

The cube would draw 1.28A with a whole layer turned on at 20mA per LED, so a 5V, 2A supply would be sufficient:
http://www.dipmicro.com/store/DCA-0520

I suppose one could also use 4x TLC5940 or 4x WS2803 for the cathodes, and have PWM capability per channel. You'd have to send out more data though: at least a full byte per column for 256-level brightness control, versus just 8 bytes per layer for on/off only.
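
To put rough numbers on the extra data, here is the arithmetic as a small sketch. The function names are hypothetical, and it only counts payload bits for 64 columns, ignoring any unused channels or chip-specific framing.

```cpp
const int kColumns = 64;  // cathode columns per layer
const int kLayers = 8;

// Bytes to send for one layer at a given number of bits per column.
int bytesPerLayer(int bitsPerColumn) {
    return kColumns * bitsPerColumn / 8;
}

// Bytes to send for one full pass over all layers.
int bytesPerRefresh(int bitsPerColumn) {
    return kLayers * bytesPerLayer(bitsPerColumn);
}
```

So on/off control is `bytesPerLayer(1)` = 8 bytes per layer and 64 bytes per pass, while 8-bit brightness is `bytesPerLayer(8)` = 64 bytes per layer and 512 bytes per pass. The TLC5940's grayscale data is 12 bits per channel, so the figure for that chip would be higher still.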

The MAX7219 controls both the anodes and the cathodes, and I haven't thought of a way to multiplex that. I would think you'd need something like 8 transistors between the MAX7219 and the anodes (or cathodes) of each layer, to be able to isolate the control for that layer. Then you'd have to turn the layers on/off in software and write out the data for each layer in software.