Pointers for a multi-Arduino design to control a large LED cube

I have successfully built a couple of 8x8x8 RGB LED cubes, driven by a single Mega.
I'm now contemplating building a 16x16x16 RGB LED cube using the same approach as for the smaller cube.

One thing that does concern me, however, is the significant increase in the size of the arrays needed to drive this larger cube, and the increased time required for the Arduino to send them to the cube over SPI.
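To put rough numbers on that concern, here is a back-of-the-envelope sketch. The 4-bit brightness per colour and the 8 MHz SPI clock (the fastest a 16 MHz Mega can produce) are my assumptions for illustration, and the function names are made up; substitute the real values from the existing 8x8x8 code.

```cpp
#include <cstdint>

// Assumed parameters, not taken from the working 8x8x8 build.
constexpr uint32_t kColours = 3;        // R, G, B
constexpr uint32_t kBitsPerColour = 4;  // assumed brightness depth per colour
constexpr uint32_t kSpiHz = 8000000;    // 8 MHz: F_CPU/2 on a 16 MHz Mega

// Bytes needed to hold one full frame of an edge x edge x edge cube.
constexpr uint32_t frameBytes(uint32_t edge, uint32_t bitsPerColour) {
    return edge * edge * edge * kColours * bitsPerColour / 8;
}

// Microseconds to clock that many bytes out over SPI,
// ignoring per-byte software overhead and latch pulses.
constexpr uint32_t frameMicros(uint32_t bytes, uint32_t spiHz) {
    return static_cast<uint32_t>(static_cast<uint64_t>(bytes) * 8 * 1000000 / spiHz);
}

// Under these assumptions:
//   8x8x8    -> frameBytes(8, 4)  ==  768 bytes, about  768 us per frame
//   16x16x16 -> frameBytes(16, 4) == 6144 bytes, about 6144 us per frame
```

So the larger cube needs eight times the data per frame. Under these assumptions a single frame buffer would take up most of the Mega's 8 KB of SRAM, before any double buffering.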

One thought I had was to break the cube down into multiple sections and develop a distributed approach, but this is outside my experience, hence this request for pointers.

I was thinking of upgrading the boards from Megas to Dues (I'll need to change all the 595 chips to run at 3.3 V rather than 5 V). To split the program up, I was contemplating using one Due to run the animation logic and prepare the large arrays. It would then send these arrays (here's where I need the help) to four additional Due boards (or Megas, if that's possible given the difference in logic voltage). Each of these four boards would read its part of the array and drive an 8x8 section of the cube over SPI.
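To make the split concrete, this is roughly how I picture the animation board carving one frame into four per-board pieces. It is only a sketch: the one-byte-per-LED layout, the board numbering, and all the names (`Frame`, `TileFrame`, `locate`, `splitFrame`) are assumptions for illustration, not existing code.

```cpp
#include <cstdint>
#include <cassert>

constexpr uint8_t kEdge = 16;  // full cube edge
constexpr uint8_t kTile = 8;   // each driver board owns an 8x8 footprint, all 16 layers

// Assumed layout: one byte per LED per colour plane, indexed [z][y][x].
struct Frame     { uint8_t v[kEdge][kEdge][kEdge]; };
struct TileFrame { uint8_t v[kEdge][kTile][kTile]; };

struct TileAddress {
    uint8_t board;   // 0..3: which driver board owns this column
    uint8_t localX;  // 0..7 inside that board's section
    uint8_t localY;  // 0..7 inside that board's section
};

// Map a column (x, y) of the full cube to a driver board and local coordinates.
constexpr TileAddress locate(uint8_t x, uint8_t y) {
    return TileAddress{
        static_cast<uint8_t>((y / kTile) * (kEdge / kTile) + (x / kTile)),
        static_cast<uint8_t>(x % kTile),
        static_cast<uint8_t>(y % kTile)};
}

// Copy one full frame into four per-board buffers, ready to be sent out.
void splitFrame(const Frame& full, TileFrame parts[4]) {
    for (uint8_t z = 0; z < kEdge; ++z) {
        for (uint8_t y = 0; y < kEdge; ++y) {
            for (uint8_t x = 0; x < kEdge; ++x) {
                const TileAddress a = locate(x, y);
                parts[a.board].v[z][a.localY][a.localX] = full.v[z][y][x];
            }
        }
    }
}
```

With this numbering, column (12, 9) belongs to board 3 at local position (4, 1). Each board would then only ever receive a quarter of the frame.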

Can I ask for some help on which direction to look in for the initial communication from the animation board to the multiple cube driver boards? Speed is of the essence here, as I want fast animations.

Appreciate the help.