Typically this is done with 8 shift registers for the columns, and a 9th for the layer selection.
If you use CD74AC164 to drive the columns high, and NPN transistors or N-channel MOSFETs to pull the cathodes low one at a time, then you can use SPI.transfer() to send out a byte to turn off all cathodes, send out 8 bytes for the anodes, and do one more transfer to turn on a cathode. Pull the cathode data from an array:
byte cathodeArray[] = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80}; // one bit per layer, 8 layers
and then a larger array for the layers,
byte anodeArray[64]; // 8 bytes/layer x 8 layers
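As a rough sketch of how the rest of your code might fill that array, the helper below turns one LED on or off given its column, row and layer. The name setVoxel and the bit order (bit x of byte 8*z + y) are assumptions for illustration only; the real mapping depends on how your shift registers are wired. It uses standard C++ types so it can be tried off the board; on an Arduino, uint8_t is the same as byte.

```cpp
#include <cstdint>

uint8_t anodeArray[64]; // 8 bytes/layer x 8 layers

// Hypothetical helper, not part of any library.
// x = bit within the byte (0-7), y = byte within the layer (0-7), z = layer (0-7).
// This bit order is an assumption; match it to your wiring.
void setVoxel(uint8_t x, uint8_t y, uint8_t z, bool on) {
  if (x > 7 || y > 7 || z > 7) { return; } // ignore out-of-range coordinates
  uint8_t index = (uint8_t)((8 * z) + y);  // 8 bytes per layer
  uint8_t mask = (uint8_t)(1u << x);       // single bit for this column
  if (on) {
    anodeArray[index] |= mask;             // set the bit, LED on
  } else {
    anodeArray[index] &= (uint8_t)~mask;   // clear the bit, LED off
  }
}
```

For example, setVoxel(3, 2, 1, true) sets bit 3 of anodeArray[10].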
Then set up your code to periodically send out the data - sending 10 bytes will take maybe 12-15uS if done properly and arrays are used.
If each layer is left on for 4mS, the whole array can be refreshed every 32mS, for a 30+Hz refresh rate.
During the 4mS on time for each layer, your code can be doing things to update the anode array - reading pots, buttons, etc.
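If you want to try other on times, the refresh arithmetic above can be checked with a few lines like these. The function names are made up for this example, and the 15uS figure is just the upper end of the estimate above, not a measurement.

```cpp
#include <cstdint>

// Time for one full pass through all layers, in microseconds.
constexpr uint32_t framePeriodMicros(uint32_t onTimeMicros, uint32_t layers) {
  return onTimeMicros * layers;
}

// Refresh rate in Hz for a given per-layer on time.
constexpr double refreshRateHz(uint32_t onTimeMicros, uint32_t layers) {
  return 1000000.0 / framePeriodMicros(onTimeMicros, layers);
}

// Fraction of each layer's on time spent shifting data out.
constexpr double transferOverhead(uint32_t transferMicros, uint32_t onTimeMicros) {
  return static_cast<double>(transferMicros) / onTimeMicros;
}
```

With 4000 uS per layer and 8 layers this gives 32000 uS per frame, or 31.25 Hz, and a 15 uS transfer uses well under 1% of each layer's on time, so nearly all of it is left for other work.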
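// all time related variables are unsigned long
unsigned long currentMicros;
unsigned long previousMicros = 0;
unsigned long elapsedMicros;
const unsigned long onTime = 4000UL; // 4mS per layer
byte layer = 0; // current layer, 0 to 7
void loop(){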
currentMicros = micros(); // capture current "time"
elapsedMicros = currentMicros - previousMicros; // how long has it been since last capture?
if (elapsedMicros >= onTime){ // if long enough, 4000UL = 4mS, move to next layer
previousMicros = previousMicros + onTime; // setup time for the next layer
// turn off prior layer's cathode, using direct port manipulation for chip select
// assumes PORTD-2 = cathode chip select, PORTD-3 = anode chip select
PORTD = PORTD & 0b11111011; // clear bit 2
SPI.transfer(0); // 0's from shift register output turn off cathode transistors
PORTD = PORTD | 0b00000100; // set bit 2
layer = layer +1;
if (layer == 8){layer = 0;} // keep track of which layer, 0 to 7, reset as needed
// now send out anode data, groups of 8 from 64 byte array
PORTD = PORTD & 0b11110111; // clear bit 3
SPI.transfer(anodeArray[(8*layer)+0]); // do the math: 0,8,16,24,32,40,48,56
SPI.transfer(anodeArray[(8*layer)+1]); // do the math: 1,9,17,25,33,41,49,57
SPI.transfer(anodeArray[(8*layer)+2]); // etc
SPI.transfer(anodeArray[(8*layer)+3]); // don't do this in a loop - each pass adds 12uS
SPI.transfer(anodeArray[(8*layer)+4]); // set SPI divisors to 2 in setup for 8 Mbit/sec transfers
SPI.transfer(anodeArray[(8*layer)+5]); // don't forget 0.1uF cap from each shift register's
SPI.transfer(anodeArray[(8*layer)+6]); // VCC pin to nearest Gnd.
SPI.transfer(anodeArray[(8*layer)+7]);
PORTD = PORTD | 0b00001000; // set bit 3
// and turn on a cathode
PORTD = PORTD & 0b11111011; // clear bit 2
SPI.transfer(cathodeArray[layer]); // a single 1 bit goes high to turn on this layer's transistor
PORTD = PORTD | 0b00000100; // set bit 2
} // end time check & array update
// now do other stuff - read pots, make array updates, etc.
} // end loop
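One detail worth seeing in isolation is why the time check uses subtraction on unsigned values: it keeps working when micros() rolls over, which happens roughly every 70 minutes with a 32-bit counter. The sketch below isolates that check so it can be tried off the board. layerDue is a name invented for this example, and uint32_t stands in for the 32-bit unsigned long of an AVR Arduino.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the time check in loop().
// Returns true when at least onTime microseconds have passed since previous.
bool layerDue(uint32_t current, uint32_t previous, uint32_t onTime) {
  uint32_t elapsed = current - previous; // unsigned subtraction wraps modulo 2^32
  return elapsed >= onTime;
}
```

If previous was captured 4096 uS before the rollover and current has just wrapped to 0, the subtraction still yields 4096, so the layer change is not missed. Adding onTime to previousMicros, rather than copying currentMicros into it, also keeps the layer period from drifting when loop() is a little late.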