My goal is to send data bit by bit out one pin (serially), and I'll use another pin to clock it by writing that pin high and low alternately.
Did you know shiftOut() does this already?
And shiftIn() for bringing bits back in?
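For reference, here is a rough sketch of what an MSB-first shift-out does under the hood, written so it also compiles off the board. The names writePin, pinLog, bitBangOut and decodeLog, and the pin numbers 11 and 13, are made up for illustration; writePin just records what would be written, and on a real Arduino you would call digitalWrite() there instead.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the hardware: each "pin write" is logged
// instead of driving a real pin. On an Arduino, replace writePin()
// with digitalWrite(pin, level ? HIGH : LOW).
struct PinEvent {
    int pin;
    bool level;
};

std::vector<PinEvent> pinLog;

void writePin(int pin, bool level) {
    pinLog.push_back({pin, level});
}

// Send one byte, most significant bit first: put the bit on the data
// pin, then pulse the clock pin high and back low.
void bitBangOut(int dataPin, int clockPin, uint8_t value) {
    for (int bit = 7; bit >= 0; --bit) {
        writePin(dataPin, ((value >> bit) & 1) != 0);
        writePin(clockPin, true);
        writePin(clockPin, false);
    }
}

// Play the receiver: sample the data level on every rising clock edge
// and rebuild the byte, first sampled bit ending up as the MSB.
uint8_t decodeLog(const std::vector<PinEvent>& log, int dataPin, int clockPin) {
    uint8_t received = 0;
    bool dataLevel = false;
    bool clockLevel = false;
    for (const PinEvent& e : log) {
        if (e.pin == dataPin) {
            dataLevel = e.level;
        } else if (e.pin == clockPin) {
            if (e.level && !clockLevel) {
                received = static_cast<uint8_t>((received << 1) | (dataLevel ? 1 : 0));
            }
            clockLevel = e.level;
        }
    }
    return received;
}
```

Calling bitBangOut(11, 13, 0xA5) logs 24 pin writes (one data write and two clock writes per bit), and decodeLog(pinLog, 11, 13) gives back 0xA5. On the board itself the built-in equivalent is shiftOut(dataPin, clockPin, MSBFIRST, value), so you normally don't need to write the loop yourself.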
If you want to do it really fast, at 8 MHz clock rates, then SPI.transfer() is the way to go.