16x16 RGB LEDs with 96 registers, good idea?

First time poster, if I sound impolite, please smack me as needed. It is not intended.

My plan: a 16x16 array of RGB LEDs, common cathode.
I have an Arduino Mega 2560, so heavy I/O pin usage is not a problem (for now).
I assume that most LED arrays use shift registers, turning on one row/column at a time, and depend on a high refresh rate for a persistence-of-vision display.
So, I intend to use 96 8-bit registers / D latches to allow a consistent display, and the Arduino will only update the selected register as needed. I will update each register through 8 I/O pins from the Arduino, if it is actually worth the reduction in time delay, if any.
I know this will be expensive, as 96 8-bit D latches will cost almost as much as 256 RGB LEDs.
However, my questions are:
Will having registers remember each pixel color significantly increase performance (as seen by the eye)?
Will this be worth the price for the maximum performance it can obtain compared to the row-scanning method?
If there is anything else worth mentioning, I would like to hear it.

Will having registers remember each pixel color significantly increase performance

It will be brighter because you don't need to multiplex it, but the unit will take more current.

So, I intend to use 96 8-bit registers / D latches to allow a consistent display,

It is not so easy to scale up electronics like that. For a start, each logic output from the Arduino can only drive 20 or so inputs, so you will need some buffer chips. It needs proper layout on strip board with lots of decoupling and a good power supply. Assuming each LED takes 20mA, a 16 X 16 matrix will take 16 * 16 * 3 * 0.02 = 15.36 Amps, which is quite a lot.
Also 96 X 8 is 768 Arduino outputs, a touch more than a Mega has free, and I am not sure you can drive 8 LEDs directly off the D-type latch.
Apart from that the plan is flawless.
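
The two estimates above can be reproduced with a few lines. This is only a back-of-envelope sketch in Python; the function names are made up for illustration, and the 20mA per LED element is the assumption stated above, not a measured value.

```python
def static_drive_current_amps(rows=16, cols=16, channels=3, amps_per_led=0.02):
    """Worst-case supply current with every LED element lit at once (no multiplexing)."""
    return rows * cols * channels * amps_per_led


def latch_outputs(num_latches=96, bits_per_latch=8):
    """Total latch outputs, one per LED element (red, green or blue die)."""
    return num_latches * bits_per_latch


if __name__ == "__main__":
    print(f"{static_drive_current_amps():.2f} A worst case")
    print(f"{latch_outputs()} latch outputs")
```

Real LEDs behind current-limiting resistors may well draw less than 20mA each, so treat the result as an upper bound for sizing the supply.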

I have a 36A 5V power supply to power the LEDs. And I am planning to use up to 15 8-bit demuxes to access each register independently (and part of the plan is to update only the changing pixels/registers selectively).
Also, all 8 inputs of the registers will share the same 8 outputs from the Arduino.

Other than the fact that it will get expensive, I still want to know if I can obtain "higher" framerate for smoother animation if I do it this way.

Are you planning to do PWM as well, or is this just a 6 color + black variant?

I still want to know if I can obtain "higher" framerate for smoother animation if I do it this way.

Not particularly, you will have a big problem just working out what part of the image to update, it is much easier to refresh it all.

all 8 inputs of the registers will share the same 8 outputs from the Arduino.

So just remember what I said about buffering.

You can't really control the colors with those registers, you will only be able to make them red, yellow, green, blue, violet, cyan, white, and off.
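
To see where those eight states come from: each RGB LED gets three on/off bits from the latch, one per color. A small Python sketch that enumerates them (the name table is just for illustration):

```python
from itertools import product

# Color produced by each (red, green, blue) on/off combination.
COLOR_NAMES = {
    (0, 0, 0): "off",
    (1, 0, 0): "red",
    (0, 1, 0): "green",
    (0, 0, 1): "blue",
    (1, 1, 0): "yellow",
    (1, 0, 1): "violet",
    (0, 1, 1): "cyan",
    (1, 1, 1): "white",
}


def on_off_colors():
    """List every color reachable with one on/off bit per channel."""
    return [COLOR_NAMES[bits] for bits in product((0, 1), repeat=3)]


if __name__ == "__main__":
    for name in on_off_colors():
        print(name)
```

Anything in between (orange, pink, dimmed colors) needs PWM or some other way of varying the brightness per channel.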

You could however theoretically use 48 TLC5940's, but that will be expensive (100,- if you order them from China), and you will need a LOT of decoupling.

I'm currently building a big board with 256 warm white LEDs to light my room, and with only 4 TLC's I already had to decouple a lot, but it could be possible.

You could however theoretically use 48 TLC5940's

Well the data sheet says

The maximum number of cascading TLC5940 devices depends on the application system and is in the range of 40 devices.

So you are pushing it with 48.
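
For reference, the figure of 48 follows from the TLC5940 having 16 outputs per chip. A quick sketch of the count, with a hypothetical function name:

```python
TLC5940_CHANNELS = 16  # constant-current outputs per TLC5940


def drivers_needed(rows=16, cols=16, channels_per_pixel=3,
                   outputs_per_driver=TLC5940_CHANNELS):
    """Number of driver chips needed to give every LED element its own output."""
    total_outputs = rows * cols * channels_per_pixel
    # Ceiling division without floating point.
    return -(-total_outputs // outputs_per_driver)


if __name__ == "__main__":
    print(drivers_needed(), "TLC5940s for a 16x16 RGB matrix")
```

Splitting the chips over more than one chain would keep each chain well under the roughly 40 devices quoted from the data sheet.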

If I were tasked to do this I would do either of two things, depending on budget / performance demands:

A)

Make it modular with one controller per board and one more chip to distribute data. 8x8 RGB (PWM-ed) is known to work with shift registers (done that) and it certainly works very well with LED drivers. If you want to force everything into one micro it will be pretty busy and probably won't have much time left for other stuff (user interaction).

B)

Use a much beefier micro with lots of reserves (like xmos).

I think architecturally, I would run 12 groups of 8 shift registers in series, with each group having its own SlaveSelect line, and then update a group using eight SPI.transfer commands. Cuts way down on the number of parts, your data could be stored as 96 bytes in an array, and just update the section that changed.

Or arrange other ways: 4 groups of 4x4 with 4 SS lines, update each group as something changed.
16 groups of 2x2. Whatever.
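
The "just update the section that changed" part could look roughly like this. It is a Python sketch of the bookkeeping only, not Arduino code; the frame layout (group g owns bytes 8*g to 8*g+7) and the function name are assumptions for illustration.

```python
GROUPS = 12          # one SlaveSelect line per group
BYTES_PER_GROUP = 8  # eight shift registers in series per group
FRAME_BYTES = GROUPS * BYTES_PER_GROUP  # 96 bytes, one bit per LED element


def changed_groups(old_frame, new_frame):
    """Return the indices of groups whose 8 bytes differ between two frames."""
    if len(old_frame) != FRAME_BYTES or len(new_frame) != FRAME_BYTES:
        raise ValueError("frames must be exactly 96 bytes")
    dirty = []
    for group in range(GROUPS):
        start = group * BYTES_PER_GROUP
        end = start + BYTES_PER_GROUP
        if bytes(old_frame[start:end]) != bytes(new_frame[start:end]):
            dirty.append(group)
    return dirty


if __name__ == "__main__":
    old = bytes(FRAME_BYTES)
    new = bytearray(old)
    new[10] = 0b00000111  # change one byte inside group 1
    print(changed_groups(old, new))
```

On the Arduino side, each index returned would mean pulling that group's SlaveSelect low, sending its 8 bytes with SPI.transfer, and releasing the line again.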

madworm:
Are you planning to do PWM as well, or is this just a 6 color + black variant?

I will do PWM if I can, but the best I can do is a quick switch of ON and OFF per refresh cycle. So, by design, I am not planning for it to have true PWM going to the LEDs, unless there is an easy way to implement it.

Grumpy_Mike:

I still want to know if I can obtain "higher" framerate for smoother animation if I do it this way.

Not particularly, you will have a big problem just working out what part of the image to update, it is much easier to refresh it all.

I might have the computer tell the Arduino exactly which part will change, if needed. But this is just a secondary objective.

Inevitableavoidance:
You can't really control the colors with those registers, you will only be able to make them red, yellow, green, blue, violet, cyan, white, and off.
You could however theoretically use 48 TLC5940's, but that will be expensive (100,- if you order them from China), and you will need a LOT of decoupling.
I'm currently building a big board with 256 warm white LEDs to light my room, and with only 4 TLC's I already had to decouple a lot, but it could be possible.

Thanks for showing me a new method. I am not sure how to control it for now, but my guess is that it will truly get messy when it comes to 256 RGB LEDs. At this rate, I might as well use 4 Colorduinos with the 8x8 RGB LED matrix.

madworm:
Make it modular with one controller per board and one more chip to distribute data. 8x8 RGB (PWM-ed) is known to work with shift registers (done that) and it certainly works very well with LED drivers. If you want to force everything into one micro it will be pretty busy and probably won't have much time left for other stuff (user interaction).

You are probably spot on about that. I will need to think about the computer-Arduino interaction part.

CrossRoads:
I think architecturally, I would run 12 groups of 8 shift registers in series, with each group having its own SlaveSelect line, and then update a group using eight SPI.transfer commands. Cuts way down on the number of parts, your data could be stored as 96 bytes in an array, and just update the section that changed.

Or arrange other ways: 4 groups of 4x4 with 4 SS lines, update each group as something changed.
16 groups of 2x2. Whatever.

Do you think that using shift registers will be "fast" enough to update 256 RGB LEDs? I am not experienced, but I just have a feeling that I will not get a good refresh rate (my target is a minimum of 30 FPS). I believe that this method will give the same brightness as using D-latch registers. Also, what is the relative price of a register compared to a shift register?

Ok, I miscounted. 16 x 16 x 3 for RGB = 768 bytes.
SPI runs at 4 MHz.
4,000,000 / 8 = 500,000 bytes/second; 500,000 / 768 bytes = about 650 frames/second
Throw in loss for other stuff going on, you can still hit 30 frames/second easy.

Get a chip with more memory; a '328 has only 2K of RAM, and storing your image in 768 bytes of it could be an issue.
Something like a '1284 with 16K of RAM.
16 x 16 x RGB is really more like 48 x 16, I'd go the multiplexing route with 48 anode drivers and 16 cathode current sinks using >1A transistors (48 x 0.02A = 0.96A per cathode)
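
The frame-rate estimate above is easy to rerun for other clock speeds or frame sizes. A Python sketch, assuming the SPI bus is the only bottleneck and ignoring all per-byte overhead; the function name is made up for illustration:

```python
def spi_frames_per_second(spi_clock_hz=4_000_000, frame_bytes=768):
    """Upper bound on full-frame refreshes per second over SPI.

    768 bytes assumes one byte per color channel (16 x 16 x 3);
    with one on/off bit per channel the frame shrinks to 96 bytes.
    """
    bytes_per_second = spi_clock_hz / 8  # 8 clock cycles per byte
    return bytes_per_second / frame_bytes


if __name__ == "__main__":
    print(f"{spi_frames_per_second():.0f} frames/s with 768-byte frames")
    print(f"{spi_frames_per_second(frame_bytes=96):.0f} frames/s with 96-byte frames")
```

Even if real-world overhead eats most of that, there is a lot of headroom above a 30 frames/second target.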

CrossRoads:
Ok, I miscounted. 16 x 16 x 3 for RGB = 768 bytes.
SPI runs at 4 MHz.
4,000,000 / 8 = 500,000 bytes/second; 500,000 / 768 bytes = about 650 frames/second
Throw in loss for other stuff going on, you can still hit 30 frames/second easy.

Get a chip with more memory; a '328 has only 2K of RAM, and storing your image in 768 bytes of it could be an issue.
Something like a '1284 with 16K of RAM.
16 x 16 x RGB is really more like 48 x 16, I'd go the multiplexing route with 48 anode drivers and 16 cathode current sinks using >1A transistors (48 x 0.02A = 0.96A per cathode)

I have an Arduino Mega 2560; its 8 KB of RAM should be sufficient?

Just a side note on the non-persistent light method: I am trying to put a capacitor and resistor combo next to each LED anode, in the hope that it will retain some brightness while it is disconnected. Has anyone tried this before?

I don't see how that works unless you add a diode as well to prevent the drive line from just discharging the cap as soon as the charge line turns off.
Might as well go with a bank of 12 7219/7221's to control 64 LEDs each.
8 SPI.transfer()s to each to update the display. Techone just bought 70 of them for 50 cents each.

CrossRoads:
I don't see how that works unless you add a diode as well to prevent the drive line from just discharging the cap as soon as the charge line turns off.
Might as well go with a bank of 12 7219/7221's to control 64 LEDs each.
8 SPI.transfer()s to each to update the display. Techone just bought 70 of them for 50 cents each.

I am looking at this eBay listing: 5 x MAX7219CNG LED Display Driver IC.
The data sheet says that it is designed for common cathode displays. Right now I am trying to think of a way to link 3 MAX7219CNGs in sync, because the RGB LEDs are themselves common cathode.

You are mixing up the use of the term common cathode to describe the way an RGB LED is wired with the term common cathode which describes the way a matrix is wired.
This page will explain the fundamentals of multiplexing:-
http://www.thebox.myzen.co.uk/Workshop/LED_Matrix.html

Yeah, I think I gave him a bum steer on the 7219, wasn't thinking in terms of the RGB LED being one device, but as 3 separate devices, which it is not.

Still, if you had 16 7219s , each driving 16 RGB LEDs, 2 across with a common cathode and 8 down with common anodes, that would help.
Need 16 7219's, as you can't share an RGB LED across 2 7219's.

A1  A2  A3  A4  A5  A6  A7  A8
R   G   B   R   G   B   NC  NC   cathode 1   (repeat 8 times across)
R   G   B   R   G   B   NC  NC   cathode 2
R   G   B   R   G   B   NC  NC   cathode 3
R   G   B   R   G   B   NC  NC   cathode 4
R   G   B   R   G   B   NC  NC   cathode 5
R   G   B   R   G   B   NC  NC   cathode 6
R   G   B   R   G   B   NC  NC   cathode 7
R   G   B   R   G   B   NC  NC   cathode 8

Repeat 2 times down

for a 16 x 16 RGB array
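
To make the layout above concrete, here is a Python sketch of how a pixel coordinate could map onto it. The chip numbering (0 to 15, left to right, then top to bottom), the 0-based coordinates and the function name are assumptions for illustration; only the 2-across, 8-down arrangement per 7219 and the A1 to A6 color order come from the table.

```python
def locate_pixel(x, y):
    """Map pixel (x, y), 0-based from the top-left corner, to (chip, cathode, anodes).

    Each chip covers a block 2 LEDs wide and 8 LEDs tall,
    so the 16 x 16 array is 8 chips across and 2 chips down.
    """
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("pixel is outside the 16 x 16 array")
    chip_col = x // 2                 # 8 chips across
    chip_row = y // 8                 # 2 chips down
    chip = chip_row * 8 + chip_col    # assumed numbering, 0..15
    cathode = y % 8 + 1               # cathode 1..8 within the chip
    first_anode = (x % 2) * 3 + 1     # A1 for the left LED, A4 for the right LED
    anodes = {"R": first_anode, "G": first_anode + 1, "B": first_anode + 2}
    return chip, cathode, anodes


if __name__ == "__main__":
    print(locate_pixel(0, 0))
    print(locate_pixel(15, 15))
```

A7 and A8 are never returned, matching the NC columns in the table.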

You may also see noise on the LEDs when you have long wires carrying the shift registers' control lines.
You need damping resistors, but I don't have concrete advice for your circuit, just an idea of what is causing the "noise" if you encounter any.
These shift registers can typically handle at least 20 MHz; however, the higher the frequency, the more difficult it is to drive them, especially with a long chain of registers.

30 Amps is really a lot to handle. However, it probably won't be as much as 30 Amps for 256 RGB LEDs, even non-multiplexed. You'd really have to measure a smaller matrix and scale up.

I suggest at first to build only a part of the matrix, and see how you get along with it.
If you fully developed it, build the rest of the matrix.

I also suggest adding a larger storage capacitor (100uF or the like) to each of the shift registers, and using a "ring" circuit for the power supply (not only one long bus).

"Eventually you will also see noise on the LEDs when you have long wires supporting the shifting registers control lines."

If the 7219/7221s are wired up next to each other with the SPI lines wired nice & neat, I don't think that would be a problem.

Each 7219 only turns on up to 6 LEDs at a time in my configuration (2 RGB LEDs per cathode 'row'), it multiplexes them at 800 Hz, flicker free.
So 120mA x 16 '7221s = 1.92A. Not much at all really.
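
The same figure as a small Python sketch, in case you want to try other LED currents. The 20mA per lit LED element and the function name are assumptions; the 6 elements lit per chip at any instant comes from the 2-LEDs-per-cathode layout described above.

```python
def max7219_bank_current_amps(chips=16, leds_on_per_chip=6, amps_per_led=0.02):
    """Peak LED current for a bank of multiplexing drivers.

    Each chip lights only one cathode row at a time, so at most
    leds_on_per_chip LED elements per chip draw current at any instant.
    """
    return chips * leds_on_per_chip * amps_per_led


if __name__ == "__main__":
    print(f"{max7219_bank_current_amps():.2f} A for 16 chips")
```

That is about one eighth of the 15.36 Amps estimated earlier in the thread for driving all 768 LED elements continuously, which is the price paid in brightness for multiplexing.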

The code could then just update the data that changed, and not have to refresh the whole display every frame cycle.

I didn't have this information; maybe it was already agreed to use the 7219/7221s.

I see this phenomenon when I have the controller on a different PCB than the LED matrix.
10cm to 15cm of wire is enough at MHz frequencies to encounter things like crosstalk and noise pickup.
I sometimes use frequencies up to 8 MHz, not continuously but for some transitions.

Nothing has been agreed, it's all design speculation so far.

I have not tried separating the uC from a '7221 by more than an inch or two, and then updating registers once a second. The displays themselves are all broken out using 12" ribbon cable from the 7221 to the display. That info is switched at 800 Hz. And the 7221 uses edge control to reduce EMI.

I'll have to take your word on the longer separation of the SPI lines. If you didn't run the lines inside a shielded cable, I could see EMI being generated.