To explain further, you want a byte of data (8 bits) that is held or calculated in your sketch to appear on the 8 parallel output pins of the shift register. To get it there, the 8 bits have to be sent from the Arduino to the shift register one bit at a time, in serial form. The shift register then reassembles the 8 bits into a byte, which appears on its 8 output pins. To send the bits, the Arduino has to break the byte down into its individual bits, and there are two ways to do that.
One is to use the standard shiftOut() function, which is just a small piece of code that does the job in software. With this method, you can use any pins you like for the data pin and the clock pin.
The second way is to use a part of the Arduino's built-in hardware called the SPI interface. The downside of this method is that you can only use the pins that the SPI hardware is connected to inside the chip (on an Uno, pin 11 for data and pin 13 for clock). The upside is that the work is done by dedicated hardware rather than by software, so it is much faster. I haven't measured it, but it is probably at least 20x as fast.
For a single shift register, the extra speed of SPI probably won't make a noticeable difference, depending on what your circuit is doing. But shift registers can be chained together to give more outputs, and every register in the chain adds another 8 bits that must be sent on each update. With 10 chained together, you might still not notice the extra speed of SPI. But with 100 chained together, that is 800 bits per update, so shiftOut() would get really slow, and SPI would be much faster and more efficient.