You're going to need a little more planning than this. First, forget the TLC5940; you can't use it with this display. Your display is a matrix of 8*8 RGB LEDs, with 8 anode rows and 3*8 cathode columns (one for each colour). The TLC5940 and similar chips are designed to drive 16 individual single-colour LEDs or 5+1/3 RGB LEDs. It handles the PWM for each LED for you, but it can't deal with a matrix.
If you want a ready-made IC able to drive an LED matrix, you would need to look for something like the MAX6952, although this one can only drive 5 columns, so you would need 24/5 = 4+4/5 (in practice 5) per display to get all colours. It gives you 15 levels of brightness, which it regulates via the current through the LED. I guess there will be some models for 8*8 matrices available, but I never bothered to look for them.
If you want to regulate the colour intensity via PWM, here are a few back-of-the-envelope calculations to estimate the available time.
Let's say the desired LED refresh rate for your display is 25 times per second.
You have 8 rows and 24 columns for your 8*8 display. Since only one row is lit at a time, the maximum brightness will be 1/8 of that of an LED running constantly.
You need to load the shift register at least 8*25 = 200 times per second, and you need to output 200*24 = 4800 bits per second towards the shift register.
If you want to add 8 levels of brightness, you will need to be able to split the timeslot of each LED into 8 parts. That means you have to refresh each LED 25*8 = 200 times per second, so you're going to need to output 4800*8 = 38400 bits per second. For 16 levels it'll be twice as much, 76800 bits per second.
In the first case, you have 26 µs or 416 clock ticks on a 16 MHz Arduino for each bit. With 16 levels, it'll be half that time. This will be tough, but not impossible.
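To make it easier to play with the numbers, here is the same calculation as a small Python sketch. The function name `refresh_budget` and its parameters are made up for this example; the defaults are the 8 rows, 24 columns, 25 Hz and 16 MHz assumed above.

```python
def refresh_budget(rows=8, columns=24, frame_rate=25, levels=1, cpu_hz=16_000_000):
    """Back-of-the-envelope timing for a row-multiplexed LED matrix."""
    # Every frame scans each row once per brightness timeslot.
    loads_per_second = rows * frame_rate * levels
    # Every load shifts out one bit per column.
    bits_per_second = loads_per_second * columns
    return {
        "loads_per_second": loads_per_second,
        "bits_per_second": bits_per_second,
        "us_per_bit": 1_000_000 / bits_per_second,
        # Whole CPU clock ticks available per bit (rounded down).
        "ticks_per_bit": cpu_hz // bits_per_second,
    }


for levels in (1, 8, 16):
    budget = refresh_budget(levels=levels)
    print(levels, budget["bits_per_second"],
          round(budget["us_per_bit"], 1), budget["ticks_per_bit"])
```

With `levels=8` this gives the 38400 bits per second and 416 ticks per bit mentioned above. Changing `frame_rate` or `columns` shows quickly how little slack is left.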
And finally, to make it all worse, human perception reacts logarithmically to intensity, not linearly. So if you go from 1 timeslot to 2, the LED will appear 100% brighter, from 2 timeslots to 3 only 50% brighter, and from 7 timeslots to 8 just 14% brighter.
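The percentages are easy to check with a few lines of Python. The second function is only my own illustration of one possible workaround (it is not something your hardware gives you for free): choose the on-times so that each level is a constant factor brighter than the previous one. Both function names are made up for this sketch.

```python
def relative_steps(timeslots=8):
    """Percentage increase in on-time when going from n to n+1 timeslots."""
    return [100.0 * ((n + 1) / n - 1) for n in range(1, timeslots)]


def geometric_slots(levels=8, total_slots=128):
    """On-times (in timeslots) with a constant ratio between neighbouring levels.

    Assumption for illustration: equal ratios look like roughly even steps.
    """
    ratio = total_slots ** (1 / (levels - 1))
    return [round(ratio ** i) for i in range(levels)]


print([round(step) for step in relative_steps()])
print(geometric_slots())
```

Note the price: 8 evenly perceived levels spaced by a factor of 2 need 128 timeslots instead of 8, which multiplies the bit rate by 16.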
Now if you move to 4 displays, you have two options: more rows or more columns. Whichever way you go, your available time will shrink and your bit rates will explode. The only way to keep this under control is to have one controller per matrix, so that the matrices are driven in parallel.
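A rough sketch of what 4 displays on a single controller would cost, using the same assumptions as before (25 Hz, 8 brightness levels, 16 MHz clock). The helper `chained_budget` is hypothetical and only repeats the earlier arithmetic for different matrix shapes.

```python
CPU_HZ = 16_000_000  # 16 MHz Arduino, as assumed in the earlier estimate


def chained_budget(rows, columns, frame_rate=25, levels=8):
    """Bit rate, CPU ticks per bit and best-case duty cycle for one big matrix."""
    bits_per_second = rows * frame_rate * levels * columns
    return {
        "bits_per_second": bits_per_second,
        "ticks_per_bit": CPU_HZ // bits_per_second,
        # Only one row is lit at a time, so this is the maximum brightness.
        "max_duty": 1 / rows,
    }


single = chained_budget(rows=8, columns=24)   # one 8*8 RGB display
wide = chained_budget(rows=8, columns=96)     # 4 displays side by side
tall = chained_budget(rows=32, columns=24)    # 4 displays stacked

for name, budget in (("single", single), ("wide", wide), ("tall", tall)):
    print(name, budget)
```

Both layouts end up with four times the bit rate and a quarter of the time per bit. Stacking the displays additionally cuts the maximum brightness from 1/8 to 1/32.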
A completely different option is to use some kind of dual-ported video RAM to refresh the LEDs. A few kB are more than enough for this application. That's basically what it was invented for, and the Arduino would only need to write the data into the right places, with no real-time constraints. A simple approach is to increment the address counter on each tick that advances the row shift register, and to let each bit of the byte drive one column. If you want PWM on the colours, write 8 or 16 versions of each byte, each with the right PWM phase. The circuit for the refresh can then be completely independent of the Arduino. That's what the first home computers in the 80s were doing.
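To give an idea of what the Arduino would have to write into such a memory, here is a sketch in Python of one possible layout. Everything about the layout is my assumption: 3 bytes per row (24 columns), rows stored one after the other, and the whole 8-row picture repeated once per PWM phase, so that a free-running address counter just steps through the memory linearly. The name `build_refresh_memory` is made up.

```python
ROWS, COLUMNS, PHASES = 8, 24, 8
BYTES_PER_ROW = COLUMNS // 8  # 3 bytes carry the 24 column bits of one row


def build_refresh_memory(intensity):
    """Expand intensity[row][col] (0..PHASES) into one bit pattern per PWM phase.

    Assumed layout: address = (phase * ROWS + row) * BYTES_PER_ROW + col // 8.
    """
    memory = bytearray(PHASES * ROWS * BYTES_PER_ROW)
    for phase in range(PHASES):
        for row in range(ROWS):
            base = (phase * ROWS + row) * BYTES_PER_ROW
            for col in range(COLUMNS):
                # A column is on during the first `intensity` phases.
                if intensity[row][col] > phase:
                    memory[base + col // 8] |= 1 << (col % 8)
    return bytes(memory)


image = [[0] * COLUMNS for _ in range(ROWS)]
image[0][0] = 3    # dim: on for 3 of the 8 phases
image[7][23] = 8   # full brightness: on in every phase
memory = build_refresh_memory(image)
print(len(memory))
```

With 8 phases the whole picture fits in 192 bytes, so even 16 phases and several displays stay well within a few kB.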
I hope I gave you some ideas on how to proceed.
Korman