How are you controlling it? One way would be to use a MAX7219 for each 8x8 piece, so 32x16 needs 8 chips. They're easy to connect up and easy to control: each 8x8 is multiplexed at about 800 Hz, and all your code has to do is write data to registers via SPI commands to update the display. You can daisy-chain them or control each one individually; I chose to select each individually in this example. Your wiring would be a little more intense, having to duplicate it for 8 matrices.
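To make the "write data to registers via SPI" part concrete, here is a rough sketch of the byte packets involved. The register addresses come from the MAX7219 datasheet; the function names are made up for illustration, and the actual SPI transfer and chip-select handling are left out because they depend on your board.

```c
#include <stddef.h>
#include <stdint.h>

/* MAX7219 register addresses (datasheet register map). */
enum {
    MAX7219_NOOP         = 0x00,
    MAX7219_DIGIT0       = 0x01, /* digits 0..7 live at 0x01..0x08 */
    MAX7219_DECODE_MODE  = 0x09,
    MAX7219_INTENSITY    = 0x0A,
    MAX7219_SCAN_LIMIT   = 0x0B,
    MAX7219_SHUTDOWN     = 0x0C,
    MAX7219_DISPLAY_TEST = 0x0F
};

/* Pack one register write as the 16-bit word the chip shifts in,
   MSB first: address byte, then data byte. (Hypothetical helper.) */
void max7219_packet(uint8_t reg, uint8_t value, uint8_t out[2])
{
    out[0] = reg & 0x0F; /* upper four bits are don't-care */
    out[1] = value;
}

/* Start-up writes for one chip driving a bare 8x8 matrix.
   Writes 5 packets (10 bytes) into out and returns the byte count. */
size_t max7219_init_packets(uint8_t intensity, uint8_t *out)
{
    const uint8_t seq[5][2] = {
        { MAX7219_DISPLAY_TEST, 0x00 },                    /* test mode off    */
        { MAX7219_SCAN_LIMIT,   0x07 },                    /* scan all 8 lines */
        { MAX7219_DECODE_MODE,  0x00 },                    /* raw bits, no BCD */
        { MAX7219_INTENSITY,    (uint8_t)(intensity & 0x0F) },
        { MAX7219_SHUTDOWN,     0x01 }                     /* normal operation */
    };
    size_t n = 0;
    for (size_t i = 0; i < 5; i++) {
        max7219_packet(seq[i][0], seq[i][1], &out[n]);
        n += 2;
    }
    return n;
}

/* One packet per line of an 8x8 bitmap: 8 packets, 16 bytes. */
size_t max7219_row_packets(const uint8_t rows[8], uint8_t *out)
{
    size_t n = 0;
    for (uint8_t r = 0; r < 8; r++) {
        max7219_packet((uint8_t)(MAX7219_DIGIT0 + r), rows[r], &out[n]);
        n += 2;
    }
    return n;
}
```

With individually selected chips, each 2-byte packet would go out with that chip's select (LOAD/CS) line held low and then raised to latch it. Whether a "digit" register ends up being a row or a column of the matrix depends on how the 8x8 is wired to the chip.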
So you need 48 IO: 16 or 32 of them set up as PWM outputs, and the other 32 or 16 set up to sink or source current for the selected column being driven. Or split it in half into two 16x16 sections, with 32 IO needed per section, each with 16 PWMs and 16 sinks/sources. Brightness will be improved with the second option, and depending on the CPLD, perhaps a x16 dual-port RAM for each half could be used.
The other thing you have to consider is how the data is coming from memory. You'll have bytes in an array representing fonts or a picture or something. How will you send that out to the shift registers or the CPLD or whatever to be displayed? And how will you manipulate it to make changes?
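As one possible answer to the "bytes in an array" question, here is a minimal sketch of a 1-bit-per-pixel buffer organised as eight 8x8 tiles, one byte per tile row. The tile order (left to right, top row of tiles first), the MSB-is-leftmost bit order, and all the names are assumptions for illustration; a real layout has to match how the matrices are actually wired.

```c
#include <stdint.h>
#include <string.h>

#define DISPLAY_W 32
#define DISPLAY_H 16
#define TILES_X   (DISPLAY_W / 8)
#define TILES_Y   (DISPLAY_H / 8)
#define NUM_TILES (TILES_X * TILES_Y) /* 8 tiles of 8x8 */

/* One byte per matrix row, eight rows per tile: 64 bytes in total. */
typedef struct {
    uint8_t rows[NUM_TILES][8];
} Bitmap;

void bitmap_clear(Bitmap *b)
{
    memset(b, 0, sizeof *b);
}

/* Turn one pixel on or off; out-of-range coordinates are ignored. */
void bitmap_set(Bitmap *b, int x, int y, int on)
{
    if (x < 0 || x >= DISPLAY_W || y < 0 || y >= DISPLAY_H)
        return;
    int tile = (y / 8) * TILES_X + (x / 8);     /* which 8x8 piece     */
    uint8_t mask = (uint8_t)(0x80u >> (x % 8)); /* MSB = leftmost LED  */
    if (on)
        b->rows[tile][y % 8] |= mask;
    else
        b->rows[tile][y % 8] &= (uint8_t)~mask;
}

/* Copy an 8x8 font glyph (one byte per row, MSB = leftmost pixel)
   so that its top-left corner lands at (x0, y0). */
void bitmap_draw_glyph(Bitmap *b, const uint8_t glyph[8], int x0, int y0)
{
    for (int r = 0; r < 8; r++)
        for (int c = 0; c < 8; c++)
            bitmap_set(b, x0 + c, y0 + r, (glyph[r] >> (7 - c)) & 1);
}
```

Going through a set-pixel helper is slow but lets a glyph land at any offset, including across tile boundaries. Once the buffer is composed, each tile's eight bytes can be sent to its own driver chip as-is.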
I don't know, it seems to be a complicated way to replicate the 16 outputs of a TLC5940, for example.
How many pairs of nippy-cutters did you go through for all that? Once you get that humming, you're going to see the effect of inconsistency in brightness and in die and lens alignment (manufacturing variation). One by one the LEDs all seem about the same, but as a group the differences are more noticeable.
The uC is going to compose frames in that dual-port memory. Each frame is going to be 512 bytes, in other words 8 bits of grayscale for each of the 32x16 pixels. The CPLD is simply a dumb driver; it's going to display whatever is in the current frame buffer.
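A rough sketch of what the uC side of that could look like, with a plain array standing in for the dual-port RAM. The row-major addressing (address = y * 32 + x), the use of two frames that get swapped, and all the names here are assumptions for illustration; the real address map is whatever the CPLD design ends up reading.

```c
#include <stdint.h>

#define FB_W        32
#define FB_H        16
#define FRAME_BYTES (FB_W * FB_H) /* 512 bytes, one gray byte per pixel */

/* Two frames: the CPLD scans one out while the uC draws into the other. */
typedef struct {
    uint8_t frames[2][FRAME_BYTES];
    int     shown; /* index of the frame currently being displayed */
} FrameStore;

/* Frame the uC is free to draw into (the one not being displayed). */
uint8_t *fb_draw_frame(FrameStore *fs)
{
    return fs->frames[fs->shown ^ 1];
}

/* Store an 8-bit gray level for one pixel; out-of-range is ignored. */
void fb_put(uint8_t *frame, int x, int y, uint8_t gray)
{
    if (x < 0 || x >= FB_W || y < 0 || y >= FB_H)
        return;
    frame[y * FB_W + x] = gray;
}

/* Make the freshly drawn frame the displayed one. On real hardware
   this would be a select bit or register the CPLD samples. */
void fb_flip(FrameStore *fs)
{
    fs->shown ^= 1;
}
```

Drawing into the hidden frame and flipping afterwards means the driver never shows a half-composed picture, at the cost of 1 KB of memory instead of 512 bytes.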