Is there a way to push out a single 32-bit value to 4 MAX7219s?

I have four MAX7219 modules side by side (and would like to expand to a total of 16, using 2 separate SPI pins for 2 sets of 8 displays), but it seems the only way to write data is 8 bits at a time, using lc.setRow on each module in turn. Is there a library mod that would handle a longer row so that it gets pushed to all units at once? Say I send a 32-bit row: it would automatically be chopped into 4x 8 bits and sent to each of the 4 displays. Manually splitting rows into 4 sets of bytes takes a few clock cycles, and if I could push out an entire 64-bit row at a time, it might go faster?

Or is that a limitation of the MAX7219, so that any library change to handle the splitting would use about the same number of clock cycles as splitting the line into individual bytes myself?

The limitation is the 8-bit architecture of the Arduino: its SPI hardware only sends 8 bits at a time. However, sending two lots of 8 bits back to back is effectively the same as sending 16 bits, and four lots of 8 bits is 32 bits.

It is up to your software to arrange this.
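To give an idea of what "arranging it in software" could look like, here is a rough sketch rather than a tested driver. Keep in mind that each MAX7219 actually wants 16 bits per write (a register address byte, then a data byte), so one row across 4 chips is 8 bytes on the wire. The names kChips, kCsPin, rowByte, frameByte and sendRow are made up for this example, pin 10 for LOAD/CS is an assumption, and which chip counts as "leftmost" depends on your wiring.

```cpp
#include <stdint.h>

constexpr unsigned kChips = 4;  // assumed chain length

// Data byte for chip `chip` (0 = first byte sent) from a 32-bit row, MSB first.
constexpr uint8_t rowByte(uint32_t row, unsigned chip) {
    return static_cast<uint8_t>(row >> (8u * (kChips - 1u - chip)));
}

// Byte `i` of the 2 * kChips byte frame: even bytes carry the row register
// address (`digit`, 1..8), odd bytes carry that chip's slice of the row.
constexpr uint8_t frameByte(uint8_t digit, uint32_t row, unsigned i) {
    return (i % 2u == 0u) ? digit : rowByte(row, i / 2u);
}

#if defined(ARDUINO)
#include <SPI.h>
const uint8_t kCsPin = 10;  // assumed LOAD/CS pin

// Write one row across all four chips; they latch together when CS rises.
// Assumes SPI.begin() and pinMode(kCsPin, OUTPUT) were called in setup().
void sendRow(uint8_t digit, uint32_t row) {
    digitalWrite(kCsPin, LOW);
    for (unsigned i = 0; i < 2u * kChips; ++i) {
        SPI.transfer(frameByte(digit, row, i));
    }
    digitalWrite(kCsPin, HIGH);
}
#endif
```

The first address/data pair sent ends up in the chip farthest from the Arduino, because each chip passes older bits along to the next one. The shift and mask work is a handful of instructions per byte, so the splitting itself is unlikely to be where the time goes.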

"and if I could push out entire 64 bits row at a time it might go faster?"

No, as I said, it is down to the SPI hardware. However, if you want it faster, check what speed the SPI is running at. The default is, I think, 4 MHz on a 16 MHz AVR board, but it will go faster if the MAX7219 will take it (the datasheet rates it for a serial clock of up to 10 MHz).
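For a feel of how much the clock speed matters, here is a small back-of-envelope sketch. shiftTimeNs and kFastSettings are names invented for this example, and the figure ignores the per-byte software overhead around each transfer. The guarded part shows how the speed is requested with the Arduino SPI library; 10 MHz is the MAX7219 datasheet maximum, and the library picks the fastest speed the board supports that does not exceed the request, so a 16 MHz AVR should end up at 8 MHz.

```cpp
#include <stdint.h>

// Time in nanoseconds to clock out `bits` bits at an SPI clock of `hz`.
constexpr uint32_t shiftTimeNs(uint32_t bits, uint32_t hz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(bits) * 1000000000ull) / hz);
}

#if defined(ARDUINO)
#include <SPI.h>
// Requested settings: MSB first, SPI mode 0 (data sampled on the rising edge).
const SPISettings kFastSettings(10000000, MSBFIRST, SPI_MODE0);

// Wrap each burst of SPI.transfer() calls like this:
//   SPI.beginTransaction(kFastSettings);
//   ... pull CS low, transfer the bytes, raise CS ...
//   SPI.endTransaction();
#endif
```

By this estimate, the 64 bits needed to write one row to 4 chips take about 64 microseconds at 1 MHz and 8 microseconds at 8 MHz. Long or breadboard wiring may not cope with the higher speeds, so it is worth stepping up gradually.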

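Your performance problems are not caused by limitations of the Arduino hardware or the MAX7219 chip. Both are capable of dramatically faster performance. The problem here is the LedControl library you are using. It's a basic "demonstration" library and is very inefficient when driving multiple MAX7219 chips: as far as I can tell, it bit-bangs the data in software instead of using hardware SPI, and every setRow call clocks out a full chain's worth of bytes just to update one chip.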

So my recommendation would be to use hardware SPI to drive all 16 MAX7219 chips in a single chain, and to choose a better library to control them (such as Parola), or to use no library at all (if that's more appropriate for your project).
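If you go the no-library route, the chips need a few setup registers written before they display anything. Here is a minimal sketch of that, not a finished driver. The register addresses are as I remember them from the MAX7219 datasheet, so double-check them. RegWrite, kInit, kInitCount, kChainLen, kCsPin and writeAll are names made up for this example, and pin 10 for LOAD/CS and the intensity value are assumptions.

```cpp
#include <stdint.h>

struct RegWrite { uint8_t reg; uint8_t value; };

// Setup writes, applied to every chip in the chain.
constexpr RegWrite kInit[] = {
    {0x0F, 0x00},  // display test: off
    {0x09, 0x00},  // decode mode: none (raw rows for an LED matrix)
    {0x0B, 0x07},  // scan limit: all 8 rows
    {0x0A, 0x04},  // intensity, 0x00..0x0F (arbitrary choice here)
    {0x0C, 0x01},  // shutdown register: 1 = normal operation
};
constexpr unsigned kInitCount = sizeof(kInit) / sizeof(kInit[0]);

#if defined(ARDUINO)
#include <SPI.h>
const unsigned kChainLen = 16;  // assumed: all 16 chips in one chain
const uint8_t kCsPin = 10;      // assumed LOAD/CS pin

// Send the same register/value pair to every chip, then latch once.
void writeAll(uint8_t reg, uint8_t value) {
    SPI.beginTransaction(SPISettings(10000000, MSBFIRST, SPI_MODE0));
    digitalWrite(kCsPin, LOW);
    for (unsigned i = 0; i < kChainLen; ++i) {
        SPI.transfer(reg);
        SPI.transfer(value);
    }
    digitalWrite(kCsPin, HIGH);  // rising edge latches all chips at once
    SPI.endTransaction();
}

void setup() {
    pinMode(kCsPin, OUTPUT);
    digitalWrite(kCsPin, HIGH);
    SPI.begin();
    for (unsigned i = 0; i < kInitCount; ++i) {
        writeAll(kInit[i].reg, kInit[i].value);
    }
}

void loop() {}
#endif
```

Writing a row then follows the same pattern with a different data byte per chip: one CS low, 16 address/data pairs, one CS high.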