The Arduino pin that drives the clock signal would have to drive the clock inputs of all 45 595s. That is too much for a single Arduino pin: it cannot produce nice clean edges into such a load at the speed needed. This is the reason for the buffer: the pin drives only a few inputs of the 125s, and the buffers in turn each drive only a few inputs of the 595s.
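To put rough numbers on the loading argument, here is a minimal back-of-the-envelope sketch. It treats the driver and its load as a simple lumped RC circuit, which ignores ringing and reflections on long wires, so it only shows the trend. All component values (input capacitance, wiring capacitance per input, driver output resistance, six loads per buffer output) are assumptions picked for illustration, not datasheet figures, and the function names are made up for this example.

```python
import math


def buffer_outputs_needed(total_inputs, loads_per_output):
    """Buffer outputs required so that no output drives more than loads_per_output inputs."""
    return math.ceil(total_inputs / loads_per_output)


def rise_time_ns(n_inputs, c_in_pf, c_wire_per_input_pf, r_out_ohm):
    """Approximate 10-90% rise time of an RC-limited edge: t = 2.2 * R * C."""
    c_total_pf = n_inputs * (c_in_pf + c_wire_per_input_pf)
    # ohm * pF gives picoseconds; multiply by 1e-3 to get nanoseconds
    return 2.2 * r_out_ohm * c_total_pf * 1e-3


if __name__ == "__main__":
    # Assumed values, for illustration only
    C_IN_PF = 5.0      # capacitance of one logic input
    C_WIRE_PF = 10.0   # wiring capacitance added per connected input
    R_OUT_OHM = 40.0   # effective output resistance of the driver
    TOTAL = 45         # number of 595 clock inputs (from the text above)
    PER_OUTPUT = 6     # assumed number of 595s hanging on one buffer output

    direct = rise_time_ns(TOTAL, C_IN_PF, C_WIRE_PF, R_OUT_OHM)
    buffered = rise_time_ns(PER_OUTPUT, C_IN_PF, C_WIRE_PF, R_OUT_OHM)
    outputs = buffer_outputs_needed(TOTAL, PER_OUTPUT)

    print(f"one pin driving all {TOTAL} inputs: ~{direct:.1f} ns rise time")
    print(f"one buffer output driving {PER_OUTPUT} inputs: ~{buffered:.1f} ns rise time")
    print(f"buffer outputs needed: {outputs}")
```

The edge time in this model scales with the number of inputs hanging on one output, so splitting the 45 inputs into small groups behind buffers shortens each edge by roughly the same factor. The Arduino pin itself then only sees the handful of buffer inputs.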