Can this RGB 74HC595 matrix code be made faster?

Hello, I've been working on this code for quite a while and am wondering if it can be made any faster. I have the hardware working as well, but I wonder if this is the best I can do, as I can only get 16 brightness steps for each color of each RGB LED.
Basically I have a 10x8 array of LED class elements, each with 3 byte variables holding the r, g and b values that represent the brightness of the LED. In my loop code I iterate through the array, setting the brightness of the colors (generated from music by 2 MSGEQ7s). I have an ISR which handles one row per call: for each color it checks whether a "PWM" pseudo-counter falls within the bounds derived from that array value and, if so, sets the matching bit in my chip array (all bits are cleared at the start of the ISR). When I get to the top of the matrix I increment the PWM counter by 16 and go back to the bottom. The ISR is on timer 1 and used in CTC mode so I can change the calling frequency easily. I'm also using direct port writes to speed up the output on the pins.
Here is the code and I'm open to suggestions.
Thanks
Mike

ISR(TIMER3_COMPA_vect)
{  
  volatile static byte pwmCntr=0,rowCntr=0,led=0;
  volatile byte pinCtr=1,d; // set index to chips and pins
  // clear chip array
  for(volatile byte cl=0;cl<NUM_CHIPS;cl++) // zero every chip byte before rebuilding it
  {
    chip[cl]=0;
  }
  // move data from array to shift register chips

  // START OF PWM ROUTINE
  for(volatile byte y=0;y<=NUM_COLS;y++)
  {
    for(volatile byte x=0;x<3;x++)
    {
      d=(Matrix[rowCntr][y].color[x])/2;
      //if(pwmCntr<(Matrix[rowCntr][y].color[x])) // turn on LED? at start of PWM cycle
      if((pwmCntr>(125-d)) && (pwmCntr<(125+d))) // turn on LED?  in the center of the PWM cycle
      {
        bitSet(chip[pinCtr/8],pinCtr%8); // set bit of chip array on
      } //end if <pwmCntr 
      pinCtr++;
    } // end x
  } // end y
  // SPI transfer to shift registers
  setLow(SRCLK_PORT, SRCLK_PIN);
  for(volatile byte sp=0;sp<NUM_CHIPS;sp++) //write changed data to chips
  {
    SPI.transfer(chip[sp]);
  } // end sp
  // disable Slave Select
  setHigh(SRCLK_PORT,SRCLK_PIN);
  // I'm using a 74HC164 to drive a UDN2982 to drive the rows
  // Select next row of LEDs
  setLow(CLK64_PORT, CLK64_PIN);  
  if (rowCntr==0) // start new cycle on rows
  {
    setHigh(DATA64_PORT,DATA64_PIN);
  }
  else
  {
    setLow(DATA64_PORT,DATA64_PIN);
  }
  // end if(rowCtr
  setHigh(CLK64_PORT, CLK64_PIN); 
  rowCntr++;
  if(rowCntr>(NUM_ROWS-1)) //reset to start 
  {  
    if(led==0) // TOGGLE PIN 13 as a test light
    {
      setLow(LEDP_PORT,LEDP_PIN);
      led=1;
    }
    else
    {
      setHigh(LEDP_PORT,LEDP_PIN);
      led=0;
    } // IF LED 
    rowCntr=0;
    pwmCntr+=16;
    if(pwmCntr>=255) //reset pwm counter
    {
      pwmCntr=0;
    }  
  }
} // end ISR

Hard to say, as you haven't posted ALL your code, so we have no idea what the setHigh and setLow functions look like. Are you using direct port addressing here?
Do not declare variables inside the ISR, as that takes time; but if you do, there is no need to declare them volatile when they are only used inside the ISR.
You seem to do all the refreshing inside that one ISR all at once. Just do one row at a time; then the time between ISR calls is spent with some LEDs on, and it will look better.
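
To illustrate the point about volatile: the qualifier is only needed for data shared between the ISR and the main code, and it forces a memory access on every use. Here is a minimal sketch of the counter bookkeeping from the end of the posted ISR with plain statics. The helper name advanceCounters and the value of NUM_ROWS are assumptions for illustration; the post does not show the real defines.

```cpp
#include <cassert>
#include <stdint.h>

const uint8_t NUM_ROWS = 8;   // assumed value; the post does not show the #define
const uint8_t PWM_STEP = 16;  // step size used in the posted ISR

// Touched only by the ISR, so plain (non-volatile) statics are enough and the
// compiler is free to keep them in registers while the handler runs.
static uint8_t rowCntr = 0;
static uint8_t pwmCntr = 0;

// Hypothetical helper: the counter bookkeeping done at the end of each ISR call.
void advanceCounters() {
  assert(rowCntr < NUM_ROWS);  // invariant: the row index is always in range
  rowCntr++;
  if (rowCntr >= NUM_ROWS) {
    rowCntr = 0;
    pwmCntr += PWM_STEP;  // a byte wraps from 240 to 0, so no explicit reset is needed
  }
}
```

With these values the PWM counter steps through 0, 16, ... 240 and wraps by itself, which is where the 16 brightness levels come from. Declaring the two counters as static locals inside the ISR would work the same way and keeps them out of the global namespace.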

Divisions in your inner loop?
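
One possible way to take the arithmetic out of the inner loop is to compute the on/off window once, when a brightness value is written, and leave only two byte compares in the ISR. The sketch below is an assumption-laden illustration, not the poster's code: the Window struct and the makeWindow, fillWindows and isOn helpers are hypothetical names. It reproduces the posted test (125-d < pwmCntr < 125+d, with d = value/2) using inclusive bounds so both ends fit in a byte.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

const uint8_t PWM_CENTER = 125;  // centre of the PWM cycle in the posted ISR

// Hypothetical precomputed window: the LED is on while lo <= pwmCntr <= hi.
struct Window {
  uint8_t lo;
  uint8_t hi;
};

// Same arithmetic as the posted test, done once per brightness update.
Window makeWindow(uint8_t value) {
  uint8_t d = value >> 1;  // value / 2, at most 127
  Window w;
  w.lo = (d > PWM_CENTER) ? 0 : (uint8_t)(PWM_CENTER + 1 - d);  // clamp at 0
  w.hi = (uint8_t)(PWM_CENTER - 1 + d);                         // at most 251
  return w;
}

// Called from the main loop whenever a row's brightness values change.
void fillWindows(const uint8_t *values, Window *out, size_t count) {
  assert(values != NULL && out != NULL);
  for (size_t i = 0; i < count; i++) {
    out[i] = makeWindow(values[i]);
  }
}

// All the ISR has left to do per colour channel: two byte compares.
inline bool isOn(const Window &w, uint8_t pwmCntr) {
  return pwmCntr >= w.lo && pwmCntr <= w.hi;
}
```

The cost is two bytes of RAM per colour channel instead of one. Since dividing an unsigned byte by 2 normally compiles to a shift, the gain from this alone may be modest; the larger saving is likely to come from dropping volatile and the per-pin /8 and %8.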

SPI transfer after all data is prepared? (The point of SPI in hardware is that it does its job in parallel with the main CPU.)
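
A rough sketch of what overlapping the two could look like: start shifting out byte n, build byte n+1 while the hardware is busy, and only then wait for the transfer to finish. Everything here is an assumption for illustration. spiStart and spiWait are host-side stand-ins that record bytes so the logic can be run on a PC, and buildChipByte and sendRowPipelined are hypothetical helpers, not part of the posted code or the SPI library.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Host-side stand-ins (assumptions) so the sketch runs off-target. On an ATmega
// spiStart() would be `SPDR = b;` and spiWait() would be
// `while (!(SPSR & _BV(SPIF))) {}`.
static uint8_t sentBytes[16];
static size_t sentCount = 0;

void spiStart(uint8_t b) {
  assert(sentCount < sizeof(sentBytes));  // capture buffer must not overflow
  sentBytes[sentCount++] = b;
}
void spiWait() {}

// Hypothetical helper: packs 8 on/off flags into one shift-register byte,
// bit 0 first, matching bitSet(chip[pin/8], pin%8) in the posted ISR.
uint8_t buildChipByte(const bool *flags) {
  uint8_t out = 0;
  for (uint8_t bit = 0; bit < 8; bit++) {
    if (flags[bit]) out |= (uint8_t)(1u << bit);
  }
  return out;
}

// Sends numChips bytes, building byte n+1 while byte n is still shifting out.
void sendRowPipelined(const bool *flags, uint8_t numChips) {
  if (numChips == 0) return;
  uint8_t next = buildChipByte(flags);
  for (uint8_t chip = 0; chip < numChips; chip++) {
    spiStart(next);  // hardware starts clocking this byte out
    if (chip + 1 < numChips) {
      next = buildChipByte(flags + 8 * (chip + 1));  // overlaps with the transfer
    }
    spiWait();  // make sure the byte has gone before starting the next one
  }
}
```

How much this buys depends on the SPI clock: at a fast clock a byte takes only a handful of CPU cycles, so the work done in the gap has to be short to fit.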

Oh, yes, there's a lot you can do...

Thanks for the replies. Grumpy Mike, I'm especially honoured you should cast your eye upon my feeble scratchings.
The setHigh and setLow are macros I pinched from some code for the TLC5940 by Matt Pandia; I gave up on the TLCs after letting out so much magic smoke my room looked like Beijing. I also moved the variable initialisation out to setup (doesn't this break the ethos of no global variables?). I have also moved the division out to where the matrix is filled, as I was already doing a /4 to get from 1024 to 255 and a /8 wouldn't be noticed. I thought that processing 1 "page" at a time at each step of PWM was a good trade-off between speed and the overhead involved in entering and leaving the ISR. I'm at work now so will try the results when I get home.
Cheers
Mike