Well, you say "this seems to be the way to go", but my experiments showed me that using these functions doesn't give as smooth an upgrade path to faster processors as I had hoped. And as westfw pointed out, they won't necessarily give your code a big boost when you move to a faster processor. And indeed I found that a single port manipulation took 0.28us on a 16MHz Nano and 0.28us on an 80MHz esp.
So to answer your question, I'll assume you will be sticking with AVR processors for now, because I don't have a good answer otherwise.
Even across different processors in the AVR family, Arduino Pin X is not necessarily the same bit on the same port. Worse still, Arduino Pins X & Y might be on the same port on one AVR processor but on different ports on another.
To deal with those situations, your code needs to call digitalPinToPort(), portOutputRegister() and digitalPinToBitMask() for each Arduino pin it uses, and store the results in separate variables. For example:
#define CLK 2
#define DATA 3
#define LATCH 4

// portOutputRegister() returns a pointer to the port's output register,
// so the port variables must be pointers, not plain uint8_t values.
volatile uint8_t *clkPort, *dataPort, *latchPort;
uint8_t clkBit, dataBit, latchBit;

void setup() {
  clkPort = portOutputRegister(digitalPinToPort(CLK));
  clkBit = digitalPinToBitMask(CLK);
  dataPort = portOutputRegister(digitalPinToPort(DATA));
  dataBit = digitalPinToBitMask(DATA);
  latchPort = portOutputRegister(digitalPinToPort(LATCH));
  latchBit = digitalPinToBitMask(LATCH);
}
and so on.
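As a rough sketch of what the "and so on" might look like: once you have a register pointer and a bit mask for each pin, driving a pin high or low is a read-modify-write on that register. The helper names below (pinHigh, pinLow, shiftOutMsbFirst) are my own and not part of the Arduino core. I'm also assuming 8-bit port registers as on AVR, and that the pins have already been set to OUTPUT with pinMode().

```cpp
#include <stdint.h>

// Hypothetical helpers (not part of the Arduino core).
// 'port' is the pointer returned by portOutputRegister(),
// 'mask' is the value returned by digitalPinToBitMask().
static inline void pinHigh(volatile uint8_t *port, uint8_t mask) {
  *port |= mask;              // set this pin's bit, leave the others alone
}

static inline void pinLow(volatile uint8_t *port, uint8_t mask) {
  *port &= (uint8_t)~mask;    // clear this pin's bit, leave the others alone
}

// Clock out one byte, most significant bit first.
static void shiftOutMsbFirst(volatile uint8_t *dPort, uint8_t dMask,
                             volatile uint8_t *cPort, uint8_t cMask,
                             uint8_t value) {
  for (uint8_t i = 0; i < 8; i++) {
    if (value & 0x80) pinHigh(dPort, dMask);
    else              pinLow(dPort, dMask);
    pinHigh(cPort, cMask);    // rising clock edge
    pinLow(cPort, cMask);     // falling clock edge
    value <<= 1;              // move the next bit into the top position
  }
}
```

With the port variables declared as volatile uint8_t pointers, you would call shiftOutMsbFirst(dataPort, dataBit, clkPort, clkBit, 0xA5) and then pulse the latch with pinHigh(latchPort, latchBit) followed by pinLow(latchPort, latchBit). One caveat: because the register address is only known at run time, the |= and &= are unlikely to compile to a single instruction, so if an interrupt handler writes to the same port you may want to wrap them in noInterrupts() / interrupts().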