I then went back to the Nano to see if the pointer version worked OK:
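#define PIN 2
uint8_t *myPort;       // pointer to the output register of the pin's port
uint8_t myPinBit;      // bit mask for the pin within that port
...
*myPort |= myPinBit;   // set the pin HIGH
*myPort &= ~myPinBit;  // set the pin LOW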
It did work, and the result was:
digitalWrite() 3.59us, Port Manipulation 0.41us which is 8.76 times faster
So going through a pointer slows down the direct port manipulation, but it works either way, with or without the pointer.
Using the "volatile" keyword slowed it down a little further:
digitalWrite() 3.59us, Port Manipulation 0.54us which is 6.70 times faster
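For reference, the volatile variant can be sketched roughly as below. This is a minimal host-side sketch, not the exact benchmark code: `fakePort` is a hypothetical stand-in for the hardware register so the snippet compiles off the board, and `pinHigh()` and `pinLow()` are illustrative helper names. On the Nano itself, the pointer and mask would presumably be obtained with the Arduino macros `portOutputRegister(digitalPinToPort(PIN))` and `digitalPinToBitMask(PIN)`, but that is an assumption about the elided setup code. The `volatile` qualifier forces the compiler to really perform every read and write of the register, which is likely where the small extra cost comes from.

```cpp
#include <stdint.h>

#define PIN 2

// Hypothetical stand-in for a hardware output register (e.g. PORTD on the Nano).
static volatile uint8_t fakePort = 0;

// volatile: every access through this pointer must actually be performed,
// so the compiler cannot cache or drop the register reads and writes.
volatile uint8_t *myPort = &fakePort;

// Assumption: the pin maps to the bit of the same number (D2 is PD2 on the Nano).
uint8_t myPinBit = (uint8_t)(1u << PIN);

// Read-modify-write that sets only this pin's bit.
void pinHigh() { *myPort |= myPinBit; }

// Read-modify-write that clears only this pin's bit.
void pinLow() { *myPort &= (uint8_t)~myPinBit; }
```

Calling `pinHigh()` followed by `pinLow()` toggles only bit 2 and leaves the other bits of the port untouched, which is the same effect as the two statements timed above.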