How to efficiently push a 1 (bit) across a 74HC595 shift register

I've got a question about shift registers and a trick I had in mind which isn't panning out as I thought it would.

I have an LED matrix in which I drive the anode columns high one at a time and sink all the cathodes together every time. I could do it the traditional way and completely refresh the anode shift registers every time, but that seems redundant, since all I'm doing is pushing a single 1 (high) across the pins with every new loop. Since the shift register is rather good at doing just that (pulse the SH_CP clock pin 11 high and low), I figured it should be super efficient to enter a high bit at the start of the line and then just clock it along every loop, while keeping a counter so that I know when the high bit has reached the end.
Two things went differently than I had expected, and only one of them had me puzzled.

The first, easy one: I put a single 1 on the data line, clocked that in and set the latch high, copying it to the output pins. I then clocked it through. What I expected was a single LED going from pin 1, to 2, to 3, to 4 etc. (like a Knight Rider thing), but instead every consecutive LED turned on: first LED 1, then LEDs 1 and 2, then 1, 2 and 3, etc. That made sense: the data line was still high, so every clock pulse shifted in another 1 behind the one that was moving up. So I shifted a 1 into the data line, clocked that in, and then a 0. Now it worked! When clocking, the 1 moved along, and the trailing 0 as well, changing the previous 1 to a 0 while doing so. Great success. Me happy.
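Both results follow from how the 74HC595 samples its data pin: on every rising edge of SH_CP, whatever level is on DS at that moment is shifted into the first stage and every other bit moves one place along. Leaving DS high after the first 1 therefore shifts in another 1 on each clock-only step. The sketch below is a hypothetical host-side C++ model, not Arduino code; the `Hc595` struct and both helper functions are illustrative names. Bit 0 stands for QA, matching the post's notation of data shifting in from the right.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of one 74HC595 (logic only, no timing).
// On each rising edge of SH_CP every bit moves one place toward QH and the
// level on DS at that moment enters QA (bit 0). A rising edge on ST_CP copies
// the shift stage to the output pins.
struct Hc595 {
  std::uint8_t shiftStage = 0;  // internal shift register
  std::uint8_t outputs = 0;     // storage register driving QA..QH

  void clockPulse(bool ds) {  // one SH_CP pulse with DS held at `ds`
    shiftStage = static_cast<std::uint8_t>((shiftStage << 1) | (ds ? 1 : 0));
  }
  void latch() { outputs = shiftStage; }  // one ST_CP pulse
};

// First experiment: a 1 is clocked in and DS is left high, so every
// "clock only" pulse shifts in another 1.
std::uint8_t runDataLeftHigh(int clockOnlyPulses) {
  assert(clockOnlyPulses >= 0);
  Hc595 sr;
  sr.clockPulse(true);
  for (int i = 0; i < clockOnlyPulses; ++i) sr.clockPulse(true);
  sr.latch();
  return sr.outputs;
}

// Second experiment: a 1 followed by a 0; DS then stays low, so only zeros
// follow the single 1 along the register.
std::uint8_t runOneThenZero(int clockOnlyPulses) {
  assert(clockOnlyPulses >= 0);
  Hc595 sr;
  sr.clockPulse(true);
  sr.clockPulse(false);
  for (int i = 0; i < clockOnlyPulses; ++i) sr.clockPulse(false);
  sr.latch();
  return sr.outputs;
}
```

With these helpers, `runDataLeftHigh(2)` returns `0b00000111` (three LEDs lit), while `runOneThenZero(1)` returns `0b00000100` (a single LED, one place further on).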

I thought it would be trivial to apply this to daisy chaining, i.e. that I could just shift that 1 and 0 across multiple shift registers by only clocking, but... that doesn't work! I can't explain why, but as soon as I start only clocking (i.e. not shifting out any data), the second shift register shows the same data as the first one. Let me explain with numbers:

Begin state. Two shift registers, shifting in data from the right:
00000000 00000000

I shift in a 1...
00000000 00000001

... and then a 0
00000000 00000010

... and latch it. The bit sequence is now set on the output pins. I don't shift out any new data but only pulse the clock high and low again, also latching in between (to copy the new position of the bits to the output pins). I would now expect this:
00000000 00000100

but instead this happens:
00000100 00000100

Another latch low, clock high, clock low, latch high, and then:
00001000 00001000

In a way that IS logical, but why did the second shift register get that 1 in there in the first place!?

Apparently I'm missing something in the way shift registers work. Why isn't my idea working? Is it possible at all? It would really make things more efficient.
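For comparison, here is what an ideal two-chip chain should do. With QH' (pin 9) of the first 74HC595 wired to DS (pin 14) of the second and SH_CP/ST_CP shared, the pair behaves like one 16-bit shift register, so the 1 should only appear in the second chip after it has moved through all eight outputs of the first. The model below is a hypothetical host-side sketch (`Hc595Chain2` and `oneThenZeroThenClock` are illustrative names). If real hardware deviates from it, that suggests something outside the logic, such as wiring or a noisy clock or latch line, rather than a flaw in the idea itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of two daisy-chained 74HC595s: QH' (pin 9) of the first
// chip feeds DS (pin 14) of the second, and SH_CP and ST_CP are shared. Logically
// that is one 16-bit shift register: bits 0-7 are the first chip's outputs
// (the right-hand byte in the post), bits 8-15 the second chip's.
struct Hc595Chain2 {
  std::uint16_t shiftStages = 0;
  std::uint16_t outputs = 0;

  void clockPulse(bool ds) {  // one shared SH_CP pulse; first chip's DS held at `ds`
    shiftStages = static_cast<std::uint16_t>((shiftStages << 1) | (ds ? 1 : 0));
  }
  void latch() { outputs = shiftStages; }  // one shared ST_CP pulse

  std::uint8_t firstChip() const { return static_cast<std::uint8_t>(outputs & 0xFF); }
  std::uint8_t secondChip() const { return static_cast<std::uint8_t>(outputs >> 8); }
};

// Shift in a 1 and then a 0, then give `clockOnlyPulses` pulses with DS low,
// and latch. Returns the 16 output bits.
std::uint16_t oneThenZeroThenClock(int clockOnlyPulses) {
  assert(clockOnlyPulses >= 0);
  Hc595Chain2 chain;
  chain.clockPulse(true);
  chain.clockPulse(false);
  for (int i = 0; i < clockOnlyPulses; ++i) chain.clockPulse(false);
  chain.latch();
  return chain.outputs;
}
```

Here `oneThenZeroThenClock(1)` gives `0x0004`, i.e. `00000000 00000100` in the post's notation, and the 1 first reaches the second chip after seven clock-only pulses (`0x0100`). Latching between pulses, as in the post, does not change the final pattern.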

How are your shift registers wired?

Use SPI.transfer: it writes a new byte into SPDR and lets the internal high-speed hardware take care of the shifting.
That is more efficient than bit-banging a 1 and then a 0 out on a data pin in code and then doing the same for the clock.
Use direct port manipulation for the latch.

Have the anode data in an array, and the cathode data in an array:

#include <SPI.h>

byte anodeArray[] = {0b00000001, 0b00000010, 0b00000100, 0b00001000, 0b00010000, 0b00100000, 0b01000000, 0b10000000};
byte cathodeArray[8]; // whatever the data is, assumed to be changing

void setup() {
  pinMode(10, OUTPUT); // say D10 (PB2 on an Uno) used for latch
  SPI.begin();
}

void loop() {
  // time check using millis or micros - time for an update?
  for (byte x = 0; x < 8; x = x + 1) {
    PORTB = PORTB & 0b11111011; // D10 (latch) low, leave rest alone
    SPI.transfer(cathodeArray[x]);
    SPI.transfer(anodeArray[x]);
    PORTB = PORTB | 0b00000100; // D10 high: rising edge copies both bytes to the outputs
  }
}
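A point worth checking in this loop is which byte ends up where. Each `SPI.transfer()` clocks eight bits through the chain, so the byte sent first (here the cathode byte) is pushed on into the chip fed from QH', and the byte sent last stays in the chip wired to MOSI; within each chip, bit 7 lands on QH and bit 0 on QA. The host-side sketch below models this, assuming the Arduino default SPI bit order (MSB first); `shiftByteMsbFirst` and `latchedAfterTwoTransfers` are hypothetical names. With this transfer order, the cathode driver would have to be the farther chip in the chain.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of SPI.transfer() feeding two chained 74HC595s,
// assuming MSB-first bit order (the Arduino default). Bits 0-7 of the chain are
// the chip on MOSI (nearest), bits 8-15 the chip fed from its QH' (farthest).
std::uint16_t shiftByteMsbFirst(std::uint16_t chain, std::uint8_t value) {
  for (int bit = 7; bit >= 0; --bit) {
    bool ds = ((value >> bit) & 1) != 0;  // level on MOSI/DS for this SCK edge
    chain = static_cast<std::uint16_t>((chain << 1) | (ds ? 1 : 0));
  }
  return chain;
}

// Mirrors one pass of the loop: latch low, transfer two bytes, latch high.
// After 16 clocks every stage holds new data, so the starting value is irrelevant.
// Returns what the rising latch edge copies to the 16 outputs.
std::uint16_t latchedAfterTwoTransfers(std::uint8_t first, std::uint8_t second) {
  std::uint16_t chain = 0;
  chain = shiftByteMsbFirst(chain, first);
  chain = shiftByteMsbFirst(chain, second);
  return chain;
}
```

For example, `latchedAfterTwoTransfers(0xA5, 0x01)` returns `0xA501`: the first byte (`0xA5`) sits in the far chip and the second (`0x01`) in the near one.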

Jiggy-Ninja:
How are your shift registers wired?

Like in this shiftOut tutorial:

@CrossRoads thanks for the code snippet. I'll be using SPI in the next code iteration to gain speed, but that doesn't change the basic question of whether it's possible at all. I have an update though, which answers that question with "yes, it's possible".

First off, my report was incorrect. The 'copy' LED wasn't lit exactly 8 bits further down; it was 7 bits!

I shift in a 0, a 1 and then a 0 before only proceeding with clocking and latching:

First 0:

00000000 00000000 (all's well)

The 1:

00000000 00000001 (all's well)

The 0:

00000000 10000010 (wait.. WHAT!?)

Now I don't shift in anything anymore, I just clock (and also latch to put the new values on the outputs):

00000001 00000100

Here are the next 14 steps, as expected:

00000010 00001000
00000100 00010000
00001000 00100000
00010000 01000000
00100000 10000000
01000001 00000000
10000010 00000000
00000100 00000000
00001000 00000000
00010000 00000000
00100000 00000000
01000000 00000000
10000000 00000000
00000000 00000000

Unfortunately I can't explain how this happens, but I changed my code just a little and now my trick works as intended. I'm no longer sending in a 010 sequence and only clocking after that; instead I shift in a 1 the first time and 0s after that. Saves on shifting out big time!

The shift register is connected to pins 2, 3 and 4 (PORTD):

// Pins 2, 3 and 4 are PORTD bits 2-4 on an Uno; which pin does what is assumed here, adjust to your wiring
const byte cols_data  = 2;
const byte cols_clock = 3;
const byte cols_latch = 4;

void setColumn_faster(byte c) {
  PORTD &= ~_BV(cols_latch);   // latch low

  // If at first position shift out a 0, otherwise a 1. This is inverted logic since I'm using PNP
  // transistors: driving a PNP base high means no current, driving it low switches the column on
  if (c == 0) {
    PORTD &= ~_BV(cols_data);  // data 0: clear only the data bit (PNP on, column driven high)
  } else {
    PORTD |= _BV(cols_data);   // data 1: set the data bit (PNP off, column not driven)
  }

  PORTD |= _BV(cols_clock);    // clock high: DS is shifted in, all other bits move one place
  PORTD &= ~_BV(cols_clock);   // clock low

  PORTD |= _BV(cols_latch);    // latch high: copy the shifted bits to the output pins
}
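The trick only works if the column counter wraps at exactly the length of the chain: the active bit falls off the end of the last register on the same clock that the next active bit is shifted in. The sketch below checks that on a host computer, assuming two chained registers (16 columns) and the active-low PNP logic of `setColumn_faster()`; `ColumnScanner`, `activeColumn` and `kColumns` are hypothetical names, not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: two chained 74HC595s, i.e. 16 columns.
const int kColumns = 16;

// Hypothetical host-side model of the "shift in one active bit, then only clock"
// scheme. Columns are active low (PNP drivers), so the active column is the
// single 0 bit in the latched pattern.
struct ColumnScanner {
  std::uint16_t chain = 0xFFFF;  // all outputs high: all columns off

  // Same decision as setColumn_faster(): DS low only for column 0, high otherwise.
  std::uint16_t setColumn(int c) {
    assert(c >= 0 && c < kColumns);
    bool ds = (c != 0);
    chain = static_cast<std::uint16_t>((chain << 1) | (ds ? 1 : 0));
    return chain;  // the pattern the latch copies to the outputs
  }
};

// Index of the single low bit, or -1 if there is not exactly one.
int activeColumn(std::uint16_t pattern) {
  int found = -1;
  for (int i = 0; i < kColumns; ++i) {
    if (((pattern >> i) & 1) == 0) {
      if (found != -1) return -1;  // a second low bit: two columns on
      found = i;
    }
  }
  return found;
}
```

Calling `setColumn(c)` for `c = 0, 1, ..., 15` and then starting again at 0 leaves exactly one low bit, at position `c`, after every call. If the counter wrapped at a different value, some patterns would have no active column or two of them.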

They should have included 0.1µF ceramic decoupling capacitors from the I.C. VCC pins to ground!

You didn't happen to copy that catastrophic blunder of putting a 1µF capacitor across the latch line in that tutorial, did you?

No-one knows how it got there - it clearly should have been across Vcc and ground.

The comments on the page clearly show the cap was added by design. Agree, it should not be there, we have asked numerous times to have it removed.

Evil design?

No, I think poor understanding:
"Notice the 0.1"f capacitor on the latchPin, if you have some flicker when the latch pin pulses you can use a capacitor to even it out."

CrossRoads:
"Notice the 0.1"f capacitor on the latchPin, if you have some flicker when the latch pin pulses you can use a capacitor to even it out."

That really is rubbish!

I can imagine that a capacitor would provide crude debouncing for operating the latch from pushbuttons, and perhaps that is what they had in mind, though it would actually be necessary on the clock line. This comment however makes no sense at all.

And on the diagram, it is a 1µF capacitor, not a '0.1"f', whatever that may be.

Dumb, dumb, dumb. Not at all good for newbies for sure.

I have a plan to get it changed. 8)
But it will take about two months; wait and see whether I succeed.