8 * 16 RGB LED MATRIX

What level of control do you want for these LEDs?
Plain on/off control of each R, G and B channel gives 8 states (red, green, blue, cyan, magenta, yellow, white and black/off) and is the most reasonable option with shift registers.
Keep in mind the 595 can't source or sink much current. If you want to drive multiple LEDs at the same time (you do), you need an individual buffer transistor on EACH output (row/column) to supply enough current. 8 LEDs * 20 mA = 160 mA, which is far more than a single 595 output can supply on its own.

RGB LEDs come as common anode (CA) or common cathode (CC). Ideally you want LEDs that are not common at all, with isolated R, G and B anodes and cathodes, so you can drive all 3 colours at the same time. If they are common, you'll have to drive the reds, then the greens, then the blues separately.

Are you driving the columns AND rows with shift registers? If so, you need to address them all, every frame.
An output frame would look like:
//output red
latch low
redColumn16b, greenColumn16b, blueColumn16b (6 bytes, with only 1 bit set among them at a time), then 8 bits of data for that colour
latch high
For example: {01000000,00000000,00000000,00000000,00000000,00000000,10101010} //only one colour column is driven at a time, and the last byte holds the LED data for it.
You could do it more efficiently, I suppose, with 8-bit column selects and 16-bit (unsigned int) colour rows, though you then work with 16-bit values instead of bytes; your choice. Since a maximum of 16 LEDs can then be on at a time instead of 8, the current each driver has to handle changes too.
How much other code do you plan to run? Do you need all the other Arduino pins? You could easily skip one register by driving 8 pins directly, which also gets those 8 bits out faster than shifting them. You can set an entire port of pins with one line of code (such as the 'row' byte).
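The latch low / shift 7 bytes / latch high sequence above can be tried out on a PC before wiring anything. This is only a rough sketch: Chain595 and sendFrame are made-up names for a host-side model of seven daisy-chained 595s, not Arduino functions, and it assumes the bytes are shifted MSB first (as shiftOut with MSBFIRST would do).

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side model of seven daisy-chained 595s (not Arduino code).
struct Chain595 {
    std::array<uint8_t, 7> shift{};    // shift stages, index 0 = register nearest the Arduino
    std::array<uint8_t, 7> outputs{};  // storage registers, i.e. what the output pins show

    // One clock pulse: each register hands its top bit to the next register in the chain.
    void clockBit(bool bit) {
        for (int r = 6; r > 0; --r) {
            shift[r] = static_cast<uint8_t>((shift[r] << 1) | (shift[r - 1] >> 7));
        }
        shift[0] = static_cast<uint8_t>((shift[0] << 1) | (bit ? 1 : 0));
    }

    // Shift one byte in, most significant bit first.
    void shiftByte(uint8_t value) {
        for (int b = 7; b >= 0; --b) {
            clockBit(((value >> b) & 1) != 0);
        }
    }

    // Latch going high: copy the shift stages to the outputs all at once.
    void latch() { outputs = shift; }
};

// Send one 7-byte frame: shift all bytes while the latch is low, then latch.
void sendFrame(Chain595& chain, const std::array<uint8_t, 7>& frame) {
    for (uint8_t b : frame) {
        chain.shiftByte(b);
    }
    chain.latch();
}
```

Note that the first byte you shift out ends up in the register furthest down the chain, and the last byte ends up in the register nearest the Arduino, so the wiring order decides which register should carry the colour data.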

The code will be very similar to the code for the 8x8.

I HIGHLY recommend working through the 595 shift register tutorial.

It would also be good to learn bitwise math, because shifting out is done in bits, not bytes. Your RGB matrix can be stored in 3 arrays of 16 bytes (48 bytes total).

byte red[16]; //each byte represents a row of 8 bits of that colour; 16 entries, index 0-15
byte blue[16];
byte green[16];

byte frame[7]; //7 byte frame to be shifted out

//to simplify setting individual column bits, you could declare a lookup table of single-bit patterns
byte bits[8] = {1,2,4,8,16,32,64,128}; //binary equivalent of single bits

A simple for loop would count 0-15.
1 byte represents a row of R, or G, or B.
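As a small taste of the bitwise math involved, here is a minimal sketch of switching single LEDs on and off in those arrays. setPixel and getPixel are names I made up for illustration, and uint8_t stands in for Arduino's byte so it also compiles on a PC.

```cpp
#include <cstdint>

// The three colour planes: 16 bytes each, one bit per LED (48 bytes total).
uint8_t red[16];
uint8_t green[16];
uint8_t blue[16];

// Turn one LED of one colour plane on or off.
// column is 0-15 (which byte), row is 0-7 (which bit inside that byte).
void setPixel(uint8_t plane[16], int column, int row, bool on) {
    uint8_t mask = static_cast<uint8_t>(1u << row);      // single bit, same idea as bits[row]
    if (on) {
        plane[column] |= mask;                           // OR sets the bit
    } else {
        plane[column] &= static_cast<uint8_t>(~mask);    // AND with the inverse clears it
    }
}

// Read back whether one LED of a colour plane is on.
bool getPixel(const uint8_t plane[16], int column, int row) {
    return ((plane[column] >> row) & 1u) != 0;
}
```

Setting the same column and row in red and green gives you yellow at that position, and so on for the other mixed colours.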

In your main loop you need to handle:
-setting an animation pattern (a static pattern to begin with), one frame at a time of course
-outputting the pattern
I'd consider generating frame[] with 2 nested for loops: column and colour.

The first 6 bytes are set using the bits[] array; the 7th byte is the colour data.

byte temp[2]; //the column select, split into 2 bytes
for (int column = 0; column < 16; column++)
{
  temp[0] = 0; //convert the column into 2 bytes
  temp[1] = 0;
  if (column < 8) {temp[0] = bits[column];}
  else {temp[1] = bits[column - 8];}

  for (int colour = 0; colour < 3; colour++)
  {
    //clear the frame so only one colour column is selected at a time
    for (int i = 0; i < 7; i++) {frame[i] = 0;}

    if (colour == 0) { //red
      frame[0] = temp[0];
      frame[1] = temp[1];
      frame[6] = red[column];
    }

    if (colour == 1) { //green
      frame[2] = temp[0];
      frame[3] = temp[1];
      frame[6] = green[column];
    }

    if (colour == 2) { //blue
      frame[4] = temp[0];
      frame[5] = temp[1];
      frame[6] = blue[column];
    }

    //shift out frame[] here: latch low, 7 bytes, latch high
  } //end for colour
} //end for column
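
If you want to check the frame layout on a PC before touching hardware, the same logic can be packed into a function. This is just a sketch under my own assumptions: buildFrame, buildRefresh and the Colour enum are made-up names, a shift replaces the bits[] lookup, and the byte order (red select, green select, blue select, data) follows the frame described earlier.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

enum Colour { RED = 0, GREEN = 1, BLUE = 2 };

// Build the 7-byte frame for one column (0-15) of one colour.
// Bytes 0-1: red column select, 2-3: green, 4-5: blue, byte 6: the 8 data bits.
std::array<uint8_t, 7> buildFrame(int column, Colour colour, uint8_t data) {
    std::array<uint8_t, 7> frame{};                             // starts out all zero
    uint8_t select = static_cast<uint8_t>(1u << (column % 8));  // same value as bits[column % 8]
    std::size_t index = static_cast<std::size_t>(colour) * 2 + (column < 8 ? 0 : 1);
    frame[index] = select;                                      // exactly one select bit is set
    frame[6] = data;
    return frame;
}

// One full refresh: 16 columns x 3 colours = 48 frames, in the order the loops visit them.
// On the Arduino you would shift each frame out immediately rather than store them all.
std::vector<std::array<uint8_t, 7>> buildRefresh(const uint8_t red[16],
                                                 const uint8_t green[16],
                                                 const uint8_t blue[16]) {
    std::vector<std::array<uint8_t, 7>> frames;
    for (int column = 0; column < 16; column++) {
        frames.push_back(buildFrame(column, RED, red[column]));
        frames.push_back(buildFrame(column, GREEN, green[column]));
        frames.push_back(buildFrame(column, BLUE, blue[column]));
    }
    return frames;
}
```

Calling buildFrame(6, RED, 0xAA) gives the same bytes as the example frame further up: 01000000 first, 10101010 last, zeros in between.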