I'm confused by your rather complex code to address the 595 chip. Why not just use SPI? A single SPI transfer will do it. Check out my page here:
Example code to ripple out bits:
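#include <SPI.h>

const byte LATCH = 10;  // SS pin, driving the 595's latch (ST_CP)

void setup ()
{
  SPI.begin ();  // on AVR boards this also makes SS (pin 10) an output
}  // end of setup

byte c;  // counter whose bits get shifted out

void loop ()
{
  c++;
  digitalWrite (LATCH, LOW);
  SPI.transfer (c);            // clock out all 8 bits in one transfer
  digitalWrite (LATCH, HIGH);  // rising edge copies them to the outputs
  delay (20);
}  // end of loop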
As others have pointed out, the 595 has 8 bits not 9:
for ( int bitToSet = 0; bitToSet <=8; bitToSet++ ) {
                        ^^^^^^^^^^^^
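For what it's worth, here is a rough sketch of the corrected bound, assuming the loop is meant to turn on one output at a time (I'm guessing at the intent). The names NUM_OUTPUTS, patternForBit and buildWalkingBit are made up for illustration and are not from your code. With `< 8` the index runs 0 to 7, which matches Q0 to Q7 on a single 595:

```cpp
#include <stdint.h>

const int NUM_OUTPUTS = 8;  // a single 74HC595 has outputs Q0..Q7

// Hypothetical helper: the byte to shift out so that only output
// `bitToSet` is high. Indexes outside 0..7 give 0 (nothing lit).
uint8_t patternForBit (int bitToSet)
{
  if (bitToSet < 0 || bitToSet >= NUM_OUTPUTS)
    return 0;
  return (uint8_t) (1u << bitToSet);
}

// Hypothetical helper: fills `patterns` (room for NUM_OUTPUTS bytes)
// with one byte per output and returns how many were written.
int buildWalkingBit (uint8_t * patterns)
{
  int count = 0;
  // "< NUM_OUTPUTS", not "<= 8": exactly 8 passes, indexes 0..7
  for (int bitToSet = 0; bitToSet < NUM_OUTPUTS; bitToSet++)
    patterns [count++] = patternForBit (bitToSet);
  return count;
}
```

On the Arduino, each of those bytes would go out with SPI.transfer between the LATCH low and LATCH high writes, the same way `c` does in the sketch above.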