8x8x8 multiplexed LED cube with an Arduino Mega 2560

"If the software messes up the multiplexing and lights up all the 512 LEDs, I don't know what's going to happen..."
Nothing. Any COTS switching power supply will just shutdown as it sees an overcurrent situation occurring, same as if you shorted the output.

Your latest questions:
Resistors = inexpensive resistors, 1/8W, carbon composition. Value will depend on the LED color & the current you want to put thru them.
If have a 5V source, and the anode transistor has 0.45V across it, and the cathode shif register has 0.25V across it, that leaves the remaining voltage across the LED and the resistor.
For and LED with Vf of 3.2V way, and using 20mA as the current, then:
(5V - .45 - .2v - 3.2)/.02 = 55 ohm. 56 ohm is a standard value.

For the transistors, the base/gate resistor from the shift register is probably not needed (no need to limit current into the shift register, that will be set by the pullup resistor), and the pullup resistor can be smallish, say 220 ohm.

Caps are just 16/25/50V 0.1uF caps. I use these a lot
http://www.dipmicro.com/store/C5K10-50

P-channel MOSFET -
Gate connects to the shift register
Source connects to +5
Drain connects to the LEDs
(odd naming, I know).

30Hz - Movies were only 24 Hz for the longest time, and TV was 30 Hz. Good enough?

PWM connected to OE/ on the shift registers could be used for dimming. I would connect to the cathode shift registers where there is less current flow.

You are making too much of the "multiplexing complexity" I think.

// some stuff you need anyway
# include<SPI.h>
unsigned long currentmillis();  // millis() and micros() are type unsigned long, 0x00000000 to 0xFFFFFFFF
unsigned long previousmillis();
unsigned long duration = 4;  // time in mS to display each digit
// declare other variables, and 2 arrays, one to hold the anode bit selection, one to hold the 64 bytes of the array

void setup(){
// do all the setup stuff, pinModes & stuff.
SPI.begin();
};

void loop(){
// start the multiplexing:
currentmillis = millis();  // capture the "time"
if ( (currentmillis - previousmillis)>=duration){  // 4 mS gone by yet?
previousmillis = previousmillis  + duration;  // set for next time check

x=x+1;  // break down anode for:next loop so it can run within the time check
if (x==8){
x=0;  // reset  after passing 7
}
// turn off existing anode (see note below)
digitalWrite(anodeSS, LOW);
SPI.transfer(0xFF);  // all 1s so no MOSFET is on
digitalWrite(anodeSS, HIGH);

// set up cathodes
digitalWrite (cathodeSS, LOW);
for (y=0; y<8; y=y+1){
SPI.transfer(dataArray[(x*8)+y]); // so 0-7, then 8-15, 16-23, etc., up to 55-63
}
// (perhaps ditch earlier anode write & pull later one back to here, so all shift registers are updated together with one SS pin?)

digitalWrite (cathodeSS, HIGH);

// now turn on one anode (see note above)
digitalWrite(anodeSS, LOW);
SPI.transfer(anodeArray[x]);  // Array holds B00000001,  B00000010,  B00000100, B00001000, B00010000, B00100000, B0100000, B10000000
digitalWrite(anodeSS, HIGH);  // with high = output pulled low

} // end time check

// do other stuff if desired while waiting for next time check, like receive serial data, update dataArray contents, etc
} // end void loop

This may need a little tweaking, but that's basically it.
I used seperate SS for the Anode vs the Cathodes, it may be that you can string them all together with one SS, have to play some and see which looks better.