Why use buffers rather than transistors to help with fan-out?

I am using 24 shift registers (TPIC6B595N) and I need something to deal with fan-out on the RCK and SRCK buses. From searching and posting, people tend to recommend a buffer for the job (I was going to use the SN74AC244NE4). However, I am curious why fast transistors, like the FDC855N, aren't recommended instead. Can anyone explain?

Many thanks!

Not an EE, but I would venture to say that board real estate would become a major factor. How many discrete transistors can you place in the area occupied by a single SMD '244? If you use one transistor for each signal, you do not have a driven rail-to-rail output. The '244 you referenced has eight channels that drive to 0 V and +5 V with ±24 mA output current, and a propagation delay of no more than 7.5 ns.
I'm sure someone could cobble together a discrete buffer, but given the circuit complexity of a discrete rail-to-rail buffer, is it really worth the effort?

Buffers can both source and sink current ("push-pull outputs"). A single transistor will only do one or the other.
Buffers are also designed around the appropriate logic input levels, and are thus more immune to noise. By the time you implement the equivalent features of a buffer in a discrete circuit, you'll have exceeded the cost of a buffer IC.
There ARE buffers with fewer than eight channels that are cheaper and smaller than an AC244. Try a 74xx125.
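To put a rough number on the push-pull point: a single N-channel MOSFET such as the FDC855N, used as a low-side switch, can only pull the clock line low, so the high level has to come from a pull-up resistor charging the line capacitance. The sketch below compares that RC edge with an actively driven edge. The 1 kΩ pull-up, the 200 pF load and the constant 24 mA model of the buffer output are illustrative assumptions, not datasheet figures.

```python
import math

def pullup_rise_time(r_pullup_ohm: float, c_load_f: float) -> float:
    """10%-90% rise time when only a resistor charges the line (transistor off).

    v(t) = V * (1 - exp(-t / RC)), so t90 - t10 = RC * ln(9), about 2.2 RC.
    """
    return r_pullup_ohm * c_load_f * math.log(9)

def driven_edge_time(c_load_f: float, v_swing: float, i_drive_a: float) -> float:
    """10%-90% edge time of a push-pull output, crudely modelled as a constant
    current source/sink: t = C * dV / I over the middle 80% of the swing."""
    return c_load_f * 0.8 * v_swing / i_drive_a

# Illustrative, assumed values (not taken from any datasheet).
C_LOAD = 200e-12   # total clock-line capacitance, 200 pF
R_PULLUP = 1e3     # pull-up resistor for a single-transistor open-drain stage
I_DRIVE = 24e-3    # buffer output current, treated as constant
V_SWING = 5.0      # 0 V to 5 V logic

if __name__ == "__main__":
    t_rc = pullup_rise_time(R_PULLUP, C_LOAD)
    t_pp = driven_edge_time(C_LOAD, V_SWING, I_DRIVE)
    print(f"pull-up rising edge: {t_rc * 1e9:.0f} ns")
    print(f"push-pull edge:      {t_pp * 1e9:.0f} ns")
```

With these assumed numbers it prints about 439 ns for the resistor-driven rising edge versus about 33 ns for the driven edge. A smaller pull-up speeds up the edge, but then the transistor has to sink more static current whenever the line is low.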

The term "fan-out" comes from bipolar TTL technology; it is not directly applicable to unipolar MOS logic, whose inputs draw practically no static current.

Long lines and many inputs increase the capacitance that has to be driven by an output. The higher the capacitance, the more current is required to switch from one logic level to the other, or the longer it takes to switch at a given source/sink current.
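The capacitance argument can be turned into a quick estimate with I = C · dV/dt. In this sketch the per-pin input capacitance (10 pF) and wiring capacitance (50 pF) are assumed placeholders; substitute the values from the TPIC6B595 and driver datasheets and your own layout.

```python
# Assumed placeholder values, not datasheet figures.
N_LOADS = 24        # SRCK (or RCK) inputs hanging on one clock line
C_IN = 10e-12       # assumed capacitance per input pin, F
C_WIRING = 50e-12   # assumed capacitance of the traces/wires, F
V_SWING = 5.0       # logic swing, V

def total_capacitance(n_loads: int, c_in: float, c_wiring: float) -> float:
    """All inputs sit on the same line, so their capacitances add up."""
    return n_loads * c_in + c_wiring

def current_for_edge(c_load: float, dv: float, t_edge: float) -> float:
    """Current needed to slew c_load by dv within t_edge (I = C * dV/dt)."""
    return c_load * dv / t_edge

def edge_for_current(c_load: float, dv: float, i_drive: float) -> float:
    """Switching time at a given source/sink current (t = C * dV / I)."""
    return c_load * dv / i_drive

def average_current(c_load: float, dv: float, f_clk: float) -> float:
    """Average supply current spent charging the line every cycle: C * V * f."""
    return c_load * dv * f_clk

if __name__ == "__main__":
    c = total_capacitance(N_LOADS, C_IN, C_WIRING)
    print(f"load: {c * 1e12:.0f} pF")
    print(f"current for a 10 ns edge: {current_for_edge(c, V_SWING, 10e-9) * 1e3:.0f} mA")
    print(f"edge at 24 mA: {edge_for_current(c, V_SWING, 24e-3) * 1e9:.0f} ns")
    print(f"average current at 1 MHz: {average_current(c, V_SWING, 1e6) * 1e3:.2f} mA")
```

With these assumptions the line is about 290 pF; a 10 ns edge would need roughly 145 mA, while 24 mA gives an edge of about 60 ns. Either more drive current or a slower clock is needed as the load grows.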

You can simply reduce the clock frequency to make many shift registers work without any additional circuitry. This also compensates for the delay of the data signal that ripples through the daisy-chained shift registers. Depending on the refresh rate the driven circuits need (a multiplexed LED array?), a low clock rate will not cause any unwanted effects.
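How low the clock can go follows from the number of bits per update and the number of updates per second. The sketch assumes every update reloads all 24 eight-bit registers; the 50 updates/s static case and the 8-row, 100 Hz multiplex case are hypothetical examples, and the overhead factor is an assumed margin for RCK pulses and software gaps.

```python
BITS_PER_REGISTER = 8  # the TPIC6B595 is an 8-bit shift register

def min_clock_hz(n_registers: int, updates_per_second: float,
                 overhead: float = 1.0) -> float:
    """Lowest shift clock that still delivers every update in time.

    Assumes each update shifts the full chain. overhead > 1 leaves room
    for latch (RCK) pulses, software gaps and similar.
    """
    bits_per_update = n_registers * BITS_PER_REGISTER
    return bits_per_update * updates_per_second * overhead

if __name__ == "__main__":
    # Hypothetical static display updated 50 times per second.
    print(min_clock_hz(24, 50))
    # Hypothetical multiplexed array: 8 rows, 100 Hz frame rate,
    # full chain reloaded for every row -> 800 updates per second.
    print(min_clock_hz(24, 8 * 100))
```

The first case needs only 9.6 kHz and the second about 154 kHz, both far below what the registers can accept, which is why a slow clock is often good enough.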

Alternatively, you can insert a driver or a simple non-inverting gate into the clock lines after one or more registers, to delay the clock signal according to the ripple delay. To work out the details, you should have a scope at hand to measure the slopes, the ripple, and the resulting required clock delays.

As already mentioned, line drivers built from discrete components become much bigger and more expensive than integrated circuits. That's because of the complementary push/pull output stage, which needs one transistor for each direction, plus possibly more transistors to invert the signal twice and to drive the output stage.

Another solution is to use multiple register chains, up to a star topology with no daisy-chaining at all. If you can spare some more pins to drive the data inputs of multiple register chains at the same time, the clock rate can be decreased as well. Instead of 1 chain of 24 registers at 1 MHz, you can feed 2 chains of 12 registers at 500 kHz, or 4 chains of 6 registers at 250 kHz, to transfer the same number of bits in the same time.
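The arithmetic behind those three configurations can be checked with a small sketch. It assumes all chains share SRCK/RCK and each chain has its own data pin, so one clock edge moves one bit into every chain at once.

```python
import math

BITS_PER_REGISTER = 8  # TPIC6B595: 8 bits per register

def transfer_time_s(total_registers: int, n_chains: int, f_clk_hz: float) -> float:
    """Time to load every register when n_chains are shifted in parallel.

    Assumes a shared clock and one data pin per chain; the longest chain
    (ceil of the split) sets the number of clock cycles needed.
    """
    regs_per_chain = math.ceil(total_registers / n_chains)
    return regs_per_chain * BITS_PER_REGISTER / f_clk_hz

if __name__ == "__main__":
    for chains, f_clk in [(1, 1_000_000), (2, 500_000), (4, 250_000)]:
        t = transfer_time_s(24, chains, f_clk)
        print(f"{chains} chain(s) at {f_clk / 1e3:g} kHz: {t * 1e6:.0f} us")
```

All three lines report 192 us: halving the chain length halves the bits each chain needs, so the clock can be halved for the same update time.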