Buffer SCK, SS, and MOSI out to each string. Use 6 chip selects if there are enough I/O pins. Otherwise, let the MAX7219s act as the data-in to data-out buffer, and buffer just SS and SCK.
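Here is a rough sketch of what the "MAX7219 as data buffer" option means on the software side, assuming the chips in a string are daisy-chained DOUT to DIN. To write one chip you clock out a 16-bit word for every chip in the string, with No-Op (register 0x00) for the ones you don't want to touch. The function name buildChainFrame and the device numbering are made up for illustration, not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// MAX7219 register address 0x00 is No-Op: a chip that latches it changes nothing.
constexpr std::uint8_t kNoOp = 0x00;

// Build the bytes to clock out (MSB first) for one string of daisy-chained MAX7219s.
// Assumption: target 0 is the chip whose DIN is wired to the (buffered) MOSI line,
// and higher numbers sit further down the DOUT -> DIN chain.
// The first word shifted out ends up in the farthest chip, so the frame is
// built far end first.
std::vector<std::uint8_t> buildChainFrame(std::size_t chainLength,
                                          std::size_t target,
                                          std::uint8_t reg,
                                          std::uint8_t value) {
    assert(target < chainLength);
    std::vector<std::uint8_t> frame;
    frame.reserve(chainLength * 2);
    for (std::size_t dev = chainLength; dev-- > 0;) {
        if (dev == target) {
            frame.push_back(reg);    // register address byte
            frame.push_back(value);  // data byte
        } else {
            frame.push_back(kNoOp);  // No-Op address
            frame.push_back(0x00);   // data is ignored for No-Op
        }
    }
    return frame;
}
```

On real hardware you would pull that string's SS low, send the frame bytes in order over SPI, then raise SS so every chip in the string latches its word at once.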
3-to-8 decoders generally drive 1 of 8 outputs low. That doesn't help on data, but it would help on chip select.
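To show why a decoder suits chip select but not data, here is a small model of a generic 3-to-8 decoder with active-low outputs. It is a sketch only: the name decoderOutputs and the single enable input are assumptions for illustration, so check the datasheet of whatever part you actually use.

```cpp
#include <cassert>
#include <cstdint>

// Model of a generic 3-to-8 decoder with active-low outputs.
// Bit n of the result is the level on output n (1 = high, 0 = low).
// Assumption: one enable input; when disabled, every output idles high,
// so no string is selected.
std::uint8_t decoderOutputs(std::uint8_t address, bool enabled) {
    assert(address < 8);
    if (!enabled) {
        return 0xFF;  // all outputs high, nothing selected
    }
    // Exactly one output goes low: the one matching the 3-bit address.
    return static_cast<std::uint8_t>(~(1u << address));
}
```

With 6 strings, addresses 0 to 5 would each pull one SS line low using only 3 address pins plus an enable. The outputs can never carry independent bit patterns, which is why it is no use on the data lines.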
For the buffers, pick a part with good drive current to get across lots of PCB traces and wiring, vs wimpier HC parts:
